From the flow-graph of the FFT algorithm shown in Figure 7.6, X(k) are computed
by a series of butterfly computations with a single complex multiplication per butterfly
network. Note that some of the butterfly computations require multiplications by
À 1
(such as 2-point FFT in the first stage) that do not require multiplication in practical
implementation, thus avoiding roundoff errors.
Figure 7.6 shows that the computation of N-point FFT requires M log
2
N stages.
There are N/2 butterflies in the first stage, N/4 in the second stage, and so on. Thus the
total number of butterflies required to produce an output sample is
N
2
N
4
Á Á Á 2 1 2
MÀ1
2
MÀ2
Á Á Á 2 1
2
MÀ1
1
1
2
Á Á Á
1
2
MÀ1
4
5
2
MÀ1
MÀ1
m0
1
2
m
2
M
1 À
1
2
M
4
5
N À 1:
7:4:1
The quantization errors introduced at the mth stage appear at the output after propaga-
tion through m À 1 stages, while getting multiplied by the twiddle factors at each
subsequent stage. Since the magnitude of the twiddle factor is always unity, the vari-
ances of the quantization errors do not change while propagating to the output. If we
assume that the quantization errors in each butterfly are uncorrelated with the errors in
other butterflies, the total number of roundoff error sources contributing to the output
is 4N À 1. Therefore the variance of the output roundoff error is
s
2
e
4N À 1
2
À2B
12
%
N2
À2B
3
:
7:4:2
As mentioned earlier, some of the butterflies do not require multiplications in practical
implementation, thus the total roundoff error is less than the one given in (7.4.2).
The definition of DFT given in (7.1.3) shows that we can scale the input sequence
with the condition
jxnj <
1
N
7:4:3
to prevent the overflow at the output because je
Àj2p=Nkn
j 1. For example, in a 1024-
point FFT, the input data must be shifted right by 10 bits. If the original data is 16-bit,
the effective wordlength after scaling is reduced to only 6 bits. This worst-case scaling
substantially reduces the resolution of the FFT results.
Instead of scaling the input samples by 1/N at the beginning, we can scale the signals
at each stage since the FFT algorithm consists of a sequence of stages. Figure 7.5 shows
that we can avoid overflow within the FFT by scaling the input at each stage by 1/2
(right shift one bit in a fixed-point hardware) because the outputs of each butterfly
involve the addition of two numbers. That is, we shift right the input by 1bit, perform
the first stage of FFT, shift right that result by 1bit, perform the second stage of FFT, and
so on. This unconditional scaling process does not affect the signal level at the output of
IMPLEMENTATION CONSIDERATIONS
335