Examples of Using Multi-Threading for FFT Computation

The following sample program shows how to employ internal threading in Intel MKL for FFT computation (see case "a" in “Number of user threads”).

To specify the number of threads inside Intel MKL, set the MKL_NUM_THREADS environment variable as follows:

set MKL_NUM_THREADS = 1 for single-threaded mode;

set MKL_NUM_THREADS = 4 for multi-threaded mode.

Note that the configuration parameter DFTI_NUMBER_OF_USER_THREADS must be equal to its default value 1.
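
As a minimal sketch, a program could confirm which MKL_NUM_THREADS value it was started with before it calls the FFT routines. The helper names parse_thread_count and requested_mkl_threads, the fallback argument, and the sanity cap of 1024 are hypothetical and not part of Intel MKL. The library reads the variable on its own, so this code only reports the setting.

```c
#include <stdlib.h>

/* Parse a thread-count string such as "4". Return fallback if the string
   is missing, malformed, or not a positive number. */
static int parse_thread_count(const char *value, int fallback)
{
    char *end = NULL;
    long n;

    if (value == NULL || *value == '\0')
        return fallback;
    n = strtol(value, &end, 10);
    if (*end != '\0' || n <= 0 || n > 1024)  /* 1024 is an arbitrary sanity cap */
        return fallback;
    return (int)n;
}

/* Hypothetical helper: report the MKL_NUM_THREADS value visible to this
   process. The fallback is the caller's choice, not the library default. */
static int requested_mkl_threads(int fallback)
{
    return parse_thread_count(getenv("MKL_NUM_THREADS"), fallback);
}
```

Calling requested_mkl_threads(1) at the start of main() and printing the result is one way to check that the environment matches the mode you intended.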

Using Intel MKL Internal Threading Mode

 
#include "mkl_dfti.h"

int main ()
{
    float x[200][100];
    DFTI_DESCRIPTOR_HANDLE fft;
    MKL_LONG len[2] = {200, 100};
    // initialize x
    DftiCreateDescriptor ( &fft, DFTI_SINGLE, DFTI_REAL, 2, len );
    DftiCommitDescriptor ( fft );
    DftiComputeForward ( fft, x );
    DftiFreeDescriptor ( &fft );
    return 0;
}
 

The following examples, “Using Parallel Mode with Multiple Descriptors Initialized in a Parallel Region” and “Using Parallel Mode with Multiple Descriptors Initialized in One Thread”, illustrate a parallel customer program in which each descriptor instance is used in only one thread (see cases "b" and "c" in “Number of user threads”).

Specify the number of threads for Example “Using Parallel Mode with Multiple Descriptors Initialized in a Parallel Region” like this:

set MKL_NUM_THREADS = 1 for Intel MKL to work in the single-threaded mode (recommended);

set OMP_NUM_THREADS = 4 for the customer program to work in the multi-threaded mode.

The configuration parameter DFTI_NUMBER_OF_USER_THREADS must have its default value of 1.

Using Parallel Mode with Multiple Descriptors Initialized in a Parallel Region

Note that this example can be transformed into a program that is single-threaded at the customer level but uses parallel mode within Intel MKL (case "a"). To achieve this, set the parameter DFTI_NUMBER_OF_TRANSFORMS = 4 and the corresponding parameter DFTI_INPUT_DISTANCE = 5000, which is the number of elements (50 x 100) in each transform.

C code for the example is as follows:

#include "mkl_dfti.h"
#include <omp.h>
#define ARRAY_LEN(a) (sizeof(a)/sizeof((a)[0]))
int main ()
{
    // 4 OMP threads, each does 2D FFT 50x100 points
    MKL_Complex8 x[4][50][100];
    int nth = ARRAY_LEN(x);
    MKL_LONG len[2] = {ARRAY_LEN(x[0]), ARRAY_LEN(x[0][0])};
    int th;
    // assume x is initialized and do 2D FFTs
#pragma omp parallel for shared(len, x)
    for (th = 0; th < nth; th++)
    {
        DFTI_DESCRIPTOR_HANDLE myFFT;

        DftiCreateDescriptor (&myFFT, DFTI_SINGLE, DFTI_COMPLEX, 2, len);
        DftiCommitDescriptor (myFFT);
        DftiComputeForward (myFFT, x[th]);
        DftiFreeDescriptor (&myFFT);
    }
    return 0;
}

Fortran code for the example is as follows:

program fft2d_private_descr_main
  use mkl_dfti

  integer nth, len(2)
! 4 OMP threads, each does 2D FFT 50x100 points
  parameter (nth = 4, len = (/50, 100/))
  complex x(len(2)*len(1), nth)

  type(dfti_descriptor), pointer :: myFFT
  integer th, myStatus

! assume x is initialized and do 2D FFTs
!$OMP PARALLEL DO SHARED(len, x) PRIVATE(myFFT, myStatus)
  do th = 1, nth
    myStatus = DftiCreateDescriptor (myFFT, DFTI_SINGLE, DFTI_COMPLEX, 2, len)
    myStatus = DftiCommitDescriptor (myFFT)
    myStatus = DftiComputeForward (myFFT, x(:, th))
    myStatus = DftiFreeDescriptor (myFFT)
  end do
!$OMP END PARALLEL DO
end
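
The values 4 and 5000 quoted in the note for this example follow from the shape of the array x. The sketch below derives them with sizeof so that they stay consistent if the array dimensions change. The complex8 type is a stand-in for MKL_Complex8 so that the code builds without the Intel MKL headers, and batch_layout and layout_of_x are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Stand-in for MKL_Complex8 (a pair of floats); an assumption that lets
   this sketch build without the Intel MKL headers. */
typedef struct { float real; float imag; } complex8;

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical container for the two batch settings. */
typedef struct {
    size_t number_of_transforms;  /* value for DFTI_NUMBER_OF_TRANSFORMS */
    size_t input_distance;        /* value for DFTI_INPUT_DISTANCE */
} batch_layout;

/* Four 2D data sets of 50x100 points stored back to back. */
static complex8 x[4][50][100];

static batch_layout layout_of_x(void)
{
    batch_layout b;

    /* One transform per leading index of x. */
    b.number_of_transforms = ARRAY_LEN(x);
    /* Elements from the start of x[0] to the start of x[1]: 50 * 100. */
    b.input_distance = sizeof(x[0]) / sizeof(x[0][0][0]);
    return b;
}

/* With Intel MKL, a program would pass these values (cast to MKL_LONG) to
   DftiSetValue for DFTI_NUMBER_OF_TRANSFORMS and DFTI_INPUT_DISTANCE before
   DftiCommitDescriptor, then call DftiComputeForward once on x. */
```

In that single-threaded variant the OpenMP loop disappears, and one descriptor and one compute call cover all four data sets.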

Specify the number of threads for Example “Using Parallel Mode with Multiple Descriptors Initialized in One Thread” like this:

set MKL_NUM_THREADS = 1 for Intel MKL to work in the single-threaded mode (obligatory);

set OMP_NUM_THREADS = 4 for the customer program to work in the multi-threaded mode.

The configuration parameter DFTI_NUMBER_OF_USER_THREADS must have the default value of 1.

Using Parallel Mode with Multiple Descriptors Initialized in One Thread

C code for the example is as follows:

#include "mkl_dfti.h"
#include <omp.h>
#define ARRAY_LEN(a) (sizeof(a)/sizeof((a)[0]))
int main ()
{
    // 4 OMP threads, each does 2D FFT 50x100 points
    MKL_Complex8 x[4][50][100];
    int nth = ARRAY_LEN(x);
    MKL_LONG len[2] = {ARRAY_LEN(x[0]), ARRAY_LEN(x[0][0])};
    DFTI_DESCRIPTOR_HANDLE FFT[ARRAY_LEN(x)];
    int th;

    for (th = 0; th < nth; th++)
        DftiCreateDescriptor (&FFT[th], DFTI_SINGLE, DFTI_COMPLEX, 2, len);
    for (th = 0; th < nth; th++)
        DftiCommitDescriptor (FFT[th]);
    // assume x is initialized and do 2D FFTs
#pragma omp parallel for shared(FFT, x)
    for (th = 0; th < nth; th++)
        DftiComputeForward (FFT[th], x[th]);
    for (th = 0; th < nth; th++)
        DftiFreeDescriptor (&FFT[th]);
    return 0;
}

Fortran code for the example is as follows:

program fft2d_array_descr_main
  use mkl_dfti

  integer nth, len(2)
! 4 OMP threads, each does 2D FFT 50x100 points
  parameter (nth = 4, len = (/50, 100/))
  complex x(len(2)*len(1), nth)

  type thread_data
    type(dfti_descriptor), pointer :: FFT
  end type thread_data
  type(thread_data) :: workload(nth)

  integer th, status, myStatus

  do th = 1, nth
    status = DftiCreateDescriptor (workload(th)%FFT, DFTI_SINGLE, DFTI_COMPLEX, 2, len)
    status = DftiCommitDescriptor (workload(th)%FFT)
  end do
! assume x is initialized and do 2D FFTs
!$OMP PARALLEL DO SHARED(len, x, workload) PRIVATE(myStatus)
  do th = 1, nth
    myStatus = DftiComputeForward (workload(th)%FFT, x(:, th))
  end do
!$OMP END PARALLEL DO
  do th = 1, nth
    status = DftiFreeDescriptor (workload(th)%FFT)
  end do
end

Using Parallel Mode with a Common Descriptor

The following Example “Using Parallel Mode with a Common Descriptor” illustrates a parallel customer program with a common descriptor used in several threads (see case "d" in “Number of user threads”).

In this case, the number of threads, as well as any other configuration parameter, must not be changed after the descriptor has been committed by the DftiCommitDescriptor() function.

C code for the example is as follows:

#include "mkl_dfti.h"
#include <omp.h>
#define ARRAY_LEN(a) (sizeof(a)/sizeof((a)[0]))
int main ()
{
    // 4 OMP threads, each does 2D FFT 50x100 points
    MKL_Complex8 x[4][50][100];
    int nth = ARRAY_LEN(x);
    MKL_LONG len[2] = {ARRAY_LEN(x[0]), ARRAY_LEN(x[0][0])};
    DFTI_DESCRIPTOR_HANDLE FFT;
    int th;

    DftiCreateDescriptor (&FFT, DFTI_SINGLE, DFTI_COMPLEX, 2, len);
    DftiSetValue (FFT, DFTI_NUMBER_OF_USER_THREADS, nth);
    DftiCommitDescriptor (FFT);
    // assume x is initialized and do 2D FFTs
#pragma omp parallel for shared(FFT, x)
    for (th = 0; th < nth; th++)
        DftiComputeForward (FFT, x[th]);
    DftiFreeDescriptor (&FFT);
    return 0;
}

Fortran code for the example is as follows:

program fft2d_shared_descr_main
  use mkl_dfti

  integer nth, len(2)
! 4 OMP threads, each does 2D FFT 50x100 points
  parameter (nth = 4, len = (/50, 100/))
  complex x(len(2)*len(1), nth)
  type(dfti_descriptor), pointer :: FFT

  integer th, status, myStatus

  status = DftiCreateDescriptor (FFT, DFTI_SINGLE, DFTI_COMPLEX, 2, len)
  status = DftiSetValue (FFT, DFTI_NUMBER_OF_USER_THREADS, nth)
  status = DftiCommitDescriptor (FFT)
! assume x is initialized and do 2D FFTs
!$OMP PARALLEL DO SHARED(len, x, FFT) PRIVATE(myStatus)
  do th = 1, nth
    myStatus = DftiComputeForward (FFT, x(:, th))
  end do
!$OMP END PARALLEL DO
  status = DftiFreeDescriptor (FFT)
end
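
The C listings in this topic discard the status values returned by the DFTI functions. The sketch below shows one way a parallel loop might record a status per thread and inspect the results afterwards. The function fake_compute is a hypothetical stand-in for a call such as DftiComputeForward, assumed to return 0 on success as DFTI_NO_ERROR does, and run_all and NTH are illustrative names.

```c
#define NTH 4  /* number of independent transforms, one per OpenMP thread */

/* Hypothetical stand-in for a DFTI call: returns 0 on success and a
   nonzero status when the transform with index fail_at is requested. */
static long fake_compute(int th, int fail_at)
{
    return (th == fail_at) ? 1L : 0L;
}

/* Run NTH independent computations in parallel. Return the index of the
   first computation that failed, or -1 if all of them succeeded. */
static int run_all(int fail_at)
{
    long status[NTH];
    int th;

#pragma omp parallel for shared(status)
    for (th = 0; th < NTH; th++)
        status[th] = fake_compute(th, fail_at);  /* each iteration writes its own slot */

    /* Inspect the results serially once the parallel region has ended. */
    for (th = 0; th < NTH; th++)
        if (status[th] != 0)
            return th;
    return -1;
}
```

Because each iteration writes only its own array element, the loop needs no locking. The same pattern applies whether the descriptors are private to each thread or shared.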

Copyright © 1994 - 2011, Intel Corporation. All rights reserved.