In this thesis we present an adaptive software library for the Fast Fourier transform (FFT) for different hardware architectures. The library consists of a number of composable blocks of code called codelets, each computing a part of the transform. The actual FFT algorithm used by the code is determined at runtime by selecting the fastest strategy among all possible strategies that can be put together for a given transform size and the distribution of data. We have tested the library for performance on the IBM-SP2, SGI-2000, HP-Exemplar and Intel Pentium systems. The library is shown to be portable, adaptive and flexible.
The optimization of the FFT library is performed on two levels. The low level optimization involves generation of highly efficient codelets for small sizes of transforms. These codelets are optimized for the number of arithmetic operations and memory access on each platform and are shown to perform well on all architectures. The high level optimization involves selection of an execution plan for different size transform. Large size transforms are created by using a computationally efficient combination of the codelets in the library. Performance data for both levels of optimization are presented.
We also present an efficient automatic method of generating the library
using a special--purpose compiler. The code generator is written in C and
generates a library of codelets in C. The code generator is shown
to be flexible and extensible and the entire library can be generated in
a matter of seconds.