Short-Time Fourier Transforms

Motivation

In many scenarios, the time-domain or frequency-domain alone are not enough to adequately describe the signal. In some signals, the frequency content of the signals changes with time. As a result, we want to describe signals in both time and frequency simultaneously. The transform used to accomplish this is known as the short-time Fourier transform.

The Short-time Fourier Transform (DTFT version)

The Transform

Mathematically, the basic short-time Fourier transform (using the DTFT to transform into frequency) can be expressed by $$ \begin{align} X(m,\omega) = \sum_{n=-\infty}^{\infty} x[n] w[n - m M] e^{-j \omega n} \; , \end{align} $$ where $w[n]$ is a rectangular window function $$ \begin{align} w[n] = u[n] - u[n-M] \; . \end{align} $$ Note that in this scenario, $m$ is a discrete variable and $\omega$ is continuous.

Intuitively, this algorithm multiples our signal $x[n]$ by a length-$M$ window that starts at sample $m M$. Only $M$ non-zero samples are observed after this multiplication. The DTFT of these samples is then computed and placed into index $m$ of $X(m,\omega)$.

Time-Frequency Uncertainty

One of the greatest challenges with short-time Fourier transform is that you cannot achieve infinite resolution in both time and frequency. To perfectly resolve a single frequency in the frequency domain requires the time domain signal be observed from $-\infty$ to $+\infty$. By windowing the data, we are inherently limiting the frequency resolution.

Mathematically, we can represent this by evaluate the expression for $X(m,\omega)$. This expression can be expressed as: $$ \begin{align} X(m,\omega) &= \frac{1}{2\pi} \left( X(\omega) \circledast W(\omega) \right) e^{-j \omega m M } \; , \end{align} $$ where $$ \begin{align} W(\omega) &= e^{-j \omega (M-1)/2} \frac{\sin(\omega M/2)}{\sin(\omega /2)} \\ \end{align} $$

Show proof

Here, $W(\omega)$ is a sinc-like function. Hence, the original frequency response $X(\omega)$ is convolved by a sinc function, blurring (and thus reducing the resolution of) that original frequency response. As $M$ gets larger (i.e., as the time resolution becomes more coarse), the width of $W(\omega)$ gets smaller (i.e., the frequency resolution becomes more fine).

The Short-time Fourier Transform (DFT version)

The Transform

Mathematically, the basic short-time Fourier transform (using the DTFT to transform into frequency) can be expressed by $$ \begin{align} X(m,k) = \sum_{n=-\infty}^{\infty} x[n] w[n - m M] e^{-j \frac{2 \pi}{M} k n} \; , \end{align} $$ where $w[n]$ is a rectangular window function $$ \begin{align} w[n] = u[n] - u[n-M] \; . \end{align} $$ Note that in this scenario, both $m$ and $k$ are discrete variables.

Intuitively, this algorithm multiples our signal $x[n]$ by a length-$M$ window that starts at sample $m W$. Only $M$ non-zero samples are observed after this multiplication. The M-length DFT of these samples is then computed and placed into index $m$ of $X(m,k)$.