The following Intel MKL function domains are threaded:
- Direct sparse solver.
- LAPACK. For the list of threaded routines, see Threaded LAPACK Routines.
- Level1 and Level2 BLAS. For the list of threaded routines, see Threaded BLAS Level1 and Level2 Routines.
- All Level 3 BLAS and all Sparse BLAS routines except Level 2 Sparse Triangular solvers.
- All mathematical VML functions.
- FFT. For the list of FFT transforms that can be threaded, see Threaded FFT Problems.
In the following list, ? stands for a precision prefix of each flavor of the respective routine and may have the value of s, d, c, or z.
The following LAPACK routines are threaded:
A number of other LAPACK routines, which are based on threaded LAPACK or BLAS routines, make effective use of parallelism:
?gesv, ?posv, ?gels, ?gesvd, ?syev, ?heev, cgegs/zgegs, cgegv/zgegv, cgges/zgges, cggesx/zggesx, cggev/zggev, cggevx/zggevx, and so on.
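The `?` placeholder convention can be illustrated with a short sketch. The helper name `expand_flavors` is hypothetical and not part of Intel MKL; it only performs the textual substitution, assuming the usual LAPACK meaning of the prefixes (s and d for single- and double-precision real, c and z for single- and double-precision complex). Not every routine exists in all four flavors (for example, `?syev` is real-only and `?heev` is complex-only), so the set of prefixes is left as a parameter.

```python
# Hypothetical helper: expands a name such as "?gesv" into concrete flavors.
PRECISION_PREFIXES = ("s", "d", "c", "z")


def expand_flavors(pattern, prefixes=PRECISION_PREFIXES):
    """Replace a leading '?' with each precision prefix.

    Names without a leading '?' (for example "cgegs") are returned unchanged.
    """
    if not pattern.startswith("?"):
        return [pattern]
    return [prefix + pattern[1:] for prefix in prefixes]


if __name__ == "__main__":
    print(expand_flavors("?gesv"))
    # Real-only routine: restrict the prefixes explicitly.
    print(expand_flavors("?syev", prefixes=("s", "d")))
```

Passing the prefixes explicitly keeps the sketch from producing names of routines that do not exist.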
In the following list, ? stands for a precision prefix of each flavor of the respective routine and may have the value of s, d, c, or z.
The following routines are threaded for Intel® Core™2 Duo and Intel® Core™ i7 processors:
The following characteristics of a specific problem determine whether your FFT computation may be threaded:
Most FFT problems are threaded. In particular, computation of multiple transforms in one call (number of transforms > 1) is threaded. Details of which transforms are threaded follow.
One-dimensional (1D) transforms
1D transforms are threaded in many cases.
1D complex-to-complex (c2c) transforms of size N using interleaved complex data layout are threaded under the following conditions depending on the architecture:
| Architecture | Conditions |
|---|---|
| Intel® 64 | N is a power of 2, log2(N) > 9, the transform is double-precision out-of-place, and input/output strides equal 1. |
| IA-32 | N is a power of 2, log2(N) > 13, and the transform is single-precision. |
| IA-32 | N is a power of 2, log2(N) > 14, and the transform is double-precision. |
| Any | N is composite, log2(N) > 16, and input/output strides equal 1. |
1D real-to-complex and complex-to-real transforms are not threaded.
1D complex-to-complex transforms using split-complex layout are not threaded.
Prime-size complex-to-complex 1D transforms are not threaded.
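The 1D complex-to-complex conditions listed above can be written as a single predicate. This is a minimal sketch that only transcribes those rules; the function name, the argument names, and the architecture strings `"intel64"` and `"ia32"` are assumptions made for illustration and are not part of the Intel MKL API. The library's actual behavior may also depend on its version and on the number of threads available.

```python
import math


def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


def is_composite(n):
    """True if n has a divisor other than 1 and itself (trial division)."""
    if n < 4:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return True
    return False


def c2c_1d_may_be_threaded(n, arch, precision, in_place=False,
                           unit_strides=True, interleaved=True):
    """Transcription of the documented 1D c2c threading conditions.

    arch is "intel64" or "ia32"; precision is "single" or "double".
    log2(N) > k is tested as N > 2**k to stay in integer arithmetic.
    """
    if not interleaved:
        return False  # split-complex layout is not threaded

    pow2 = is_power_of_two(n)

    # Intel 64: power of 2, log2(N) > 9, double precision,
    # out-of-place, unit strides.
    if (arch == "intel64" and pow2 and n > 2**9 and precision == "double"
            and not in_place and unit_strides):
        return True

    # IA-32: power of 2, with a precision-dependent size threshold.
    if arch == "ia32" and pow2:
        if precision == "single" and n > 2**13:
            return True
        if precision == "double" and n > 2**14:
            return True

    # Any architecture: composite N, log2(N) > 16, unit strides.
    # Prime sizes fail this test and are therefore reported as not threaded.
    return n > 2**16 and unit_strides and is_composite(n)


if __name__ == "__main__":
    print(c2c_1d_may_be_threaded(1024, "intel64", "double"))
    print(c2c_1d_may_be_threaded(1024, "intel64", "single"))
    print(c2c_1d_may_be_threaded(131071, "intel64", "double"))
```

Real-to-complex and complex-to-real 1D transforms are outside the scope of this predicate, since they are stated to be not threaded regardless of size.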
Multidimensional transforms
All multidimensional transforms on large-volume data are threaded.