Intel® Math Kernel Library (Intel® MKL) 10.3 Release Notes

This document provides a general summary of new features and important notes about the Intel® Math Kernel Library (Intel® MKL) software product.

For the latest information regarding Intel MKL, see the online resources and documents described below.

Links to documentation, help, and code samples can be found on the main Intel MKL product page. For technical support, visit the Intel MKL technical support forum and review the articles in the Intel MKL knowledgebase.

Please register your product using your preferred email address. This helps Intel recognize you as a valued customer in the support forum and ensures that you will be notified of product updates. If you have any questions about how your email address is used for software product registration, you can read Intel's Online Privacy Notice Summary.

What's New in Intel® MKL 10.3 update 5

BLAS: Improved performance: {S,C,Z}TRSM for processors with Intel® Advanced Vector Extensions (Intel® AVX); {S,D}GEM2VU for processors with Intel AVX as well as the Intel® Core™ i7 processor and the Intel® Xeon® processor 5500 series
BLAS: Improved scaling: ?TRMV for large matrices on all architectures; DGEMM for odd numbers of threads on Intel® Xeon® processor 5400 series
LAPACK: Included LAPACK 3.3.1 extensions and the respective LAPACKE interfaces
LAPACK: Improved the performance of ?SYGST and ?HEGST used in generalized eigenvalue problems
LAPACK: Improved the performance of the inverse of an LU factored matrix (?GETRI)
PARDISO: Added the ability to solve with the transpose and the conjugate transpose (A^T x = b and A^H x = b), which makes it easier to support the compressed sparse column (CSC) format
PARDISO: Improved out-of-core PARDISO performance when memory requirements slightly exceed available memory, using the MKL_PARDISO_OOC_MAX_SWAP_SIZE environment variable and in-core PARDISO
Optimization Solvers: Added Inf and NaN checks in the RCI Trust-Region solvers
FFTs: Improved the performance of 3D FFTs on small cubes from 2x2x2 to 10x10x10 for all supported precisions and types on all Intel® processors supporting Intel® SSE3 and later
FFT examples: Re-designed example programs to cover common use cases for Intel MKL DFTI and FFTW
VSL: Improved the performance of the single precision MT19937 and MT2203 basic random number generators on the Intel® Core™ i7-2600 processor on 64-bit operating systems
VSL: Improved the performance of the integer version of the SOBOL quasi-random number generator on the Intel® Core™ i7-2600 processor and Intel® Xeon® processor 5400 series
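
The PARDISO item above says that transpose solves make CSC support easier. As we understand it, this is because the CSC arrays of a matrix A are exactly the CSR arrays of A^T. CSC data can therefore be passed to a CSR-based solver as A^T and solved with the transpose option. The plain C sketch below checks that identity on a small 2 x 3 matrix. It does not call Intel MKL, and the helper `csr_matvec` and the array names are hypothetical.

```c
/* Hypothetical helper: y = M * x for an nrows-row matrix M stored in
   zero-based CSR form (row_ptr, col_idx, val). */
void csr_matvec(int nrows, const int *row_ptr, const int *col_idx,
                const double *val, const double *x, double *y)
{
    for (int i = 0; i < nrows; ++i) {
        double sum = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            sum += val[k] * x[col_idx[k]];
        y[i] = sum;
    }
}

/* A = [1 0 2]
       [0 3 4]   (2 x 3), stored column by column (zero-based CSC):
   column 0: (row 0, 1); column 1: (row 1, 3); column 2: (row 0, 2), (row 1, 4).
   Read as CSR with 3 rows, the same arrays describe A^T (3 x 2). */
const int    a_csc_col_ptr[] = {0, 1, 2, 4};
const int    a_csc_row_idx[] = {0, 1, 0, 1};
const double a_csc_val[]     = {1.0, 3.0, 2.0, 4.0};
```

Calling `csr_matvec(3, a_csc_col_ptr, a_csc_row_idx, a_csc_val, x, y)` with x = {1, 1} gives y = {1, 3, 6}, which is A^T x. No data was copied or transposed.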

What's New in Intel® MKL 10.3 update 4

BLAS: Improved DTRMM performance on Intel® Xeon® processors 5400 and later
BLAS: Improved DTRSM performance on all 64-bit enabled processors, especially processors with Intel® Advanced Vector Extensions (Intel® AVX)
LAPACK: Incorporated bug fixes from the LAPACK 3.3.1 release
OOC PARDISO: Improved the estimate of the amount of memory needed in out-of-core operation
FFT: Improved 1D real FFT scaling through improved threading
FFT: Updated C and Fortran FFT examples to use the new single dynamic library linking model
VML: Improved performance of the single precision Enhanced Performance version of the real Hypot and complex Abs functions and of the complex Arg, Div, Mul, MulByConj functions for all accuracy modes on Intel® Xeon® processors 5600 and 7500 series, and the Intel® Core™ i7-2600 processor
Service functions: Improvements and additions to the Intel MKL service functions (see the online release notes for more information)
Bug fixes

What's New in Intel® MKL 10.3 update 3

BLAS: Improved multi-threaded performance of DSYRK, DTRSM, and DGEMM on Intel® Xeon® processor 5400 series running 32-bit Windows*
LAPACK: Implemented LAPACK 3.3 from Netlib, including the Cosine-Sine decomposition, improved linear equation solvers for symmetric and Hermitian matrices, and auxiliary functions
PARDISO: Zero-based permutation vectors are now accepted as input
PARDISO: Added documentation for the pardisoinit() routine
PARDISO: Improved performance of serial PARDISO with multiple right-hand sides (RHS)
PARDISO: Independent control of parallelism in the solve step, for improved performance on small matrices (see the description of iparm(25))
PARDISO: Reduced backward substitution, which allows a partial solution to be computed for a full RHS (see the description of iparm(31))
FFT: Implemented Real FFT transforms for 3 to 7 dimensions
FFT: Parallelized multi-dimensional complex transforms using split-complex data represented as two real arrays
Cluster FFTs: Extended FORTRAN 90 interface to real-to-complex transforms and included new examples
VML: Added new complex Pack/Unpack functions and real Gamma/LGamma functions
VML: Improved performance on Intel® Xeon® processor 5600 series and processors supporting Intel® Advanced Vector Extensions (Intel® AVX) for the following: all functions when operating on short vectors (<100), all functions when operating on unaligned input vectors, the sPow2o3 function, and the enhanced performance (EP) version of complex Add and Sub
VSL: Added functions for saving random number generator (RNG) streams to memory and restoring them from memory
VSL: Added new UniformBits32 and UniformBits64 functions
VSL: Extended the number of unique streams supported by the MT2203 Basic RNG from 1024 to 6024
Bug fixes
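
The VML Pack/Unpack item refers to two operations: gathering strided vector elements into a contiguous vector, and scattering them back. The plain C sketch below shows the strided form of those semantics for complex data, using C99 `double complex`. The names `pack_strided` and `unpack_strided` are hypothetical. They are not the Intel MKL function names or signatures, which are listed in the Intel MKL Reference Manual.

```c
#include <complex.h>

/* Hypothetical gather: y[i] = a[i * inca] for i = 0..n-1. */
void pack_strided(int n, const double complex *a, int inca, double complex *y)
{
    for (int i = 0; i < n; ++i)
        y[i] = a[i * inca];
}

/* Hypothetical scatter: y[i * incy] = a[i] for i = 0..n-1;
   elements of y between the strided positions are left untouched. */
void unpack_strided(int n, const double complex *a, double complex *y, int incy)
{
    for (int i = 0; i < n; ++i)
        y[i * incy] = a[i];
}
```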

What's New in Intel® MKL 10.3 update 2

BLAS: Improved performance of transposition functions on the Intel® Xeon® processor 5600 series
BLAS: Added examples for transposition routines
FFT: Added Fortran examples showing how to reduce application footprint by linking only functions with the desired precision
FFT: Added check for stride consistency on in-place real transforms with CCE storage
FFT: Expanded threading to new cases for multi-dimensional transforms
VSL: Improved performance of Multivariate Gaussian random number generator for single- and double-precision on 4-core Intel® Xeon® processors 5500 series
VML: Improved performance of in-place operation of Add, Mul, and Sub functions on the Intel® Xeon® processor 5500 series
Bug fixes

What's New in Intel® MKL 10.3 update 1

PARDISO/DSS: Added true F90 overloaded API (see the Intel MKL reference manual for more information)
PARDISO: Made the statistical reporting more reader-friendly
Sparse BLAS: Improved performance of ?BSRMM functions on Intel® Core™ i7 processors
FFTs: Added support for negative strides
FFT examples: Added examples for split-complex FFTs in C and Fortran using both the DFTI and FFTW3 interfaces
VML: Improved performance of real in-place Add/Sub/Mul/Sqr functions on systems supporting SSE2 and SSE3
Poisson Library: Changed the default behavior of the Poisson library functions from sequential to threaded operation
Bug fixes
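
With a negative stride, a transform walks its data backwards in memory. The sketch below assumes the DFTI stride convention: an offset to the first element, followed by a stride for each dimension. Under that convention, a negative stride needs an offset large enough to keep every index non-negative. `strided_index` and `gather_1d` are hypothetical helpers, not Intel MKL functions.

```c
#include <stddef.h>

/* Hypothetical helper: memory index of logical element k (0-based) in a
   1D layout described by an offset and a (possibly negative) stride. */
ptrdiff_t strided_index(ptrdiff_t offset, ptrdiff_t stride, ptrdiff_t k)
{
    return offset + k * stride;
}

/* Copy n logically ordered elements out of buf. With stride = -1 and
   offset = n - 1, this reads buf in reverse order. */
void gather_1d(const double *buf, ptrdiff_t offset, ptrdiff_t stride,
               ptrdiff_t n, double *out)
{
    for (ptrdiff_t k = 0; k < n; ++k)
        out[k] = buf[strided_index(offset, stride, k)];
}
```

For example, `gather_1d(buf, 3, -1, 4, out)` on buf = {10, 20, 30, 40} yields out = {40, 30, 20, 10}.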

What's New in Intel® MKL 10.3

BLAS
- New functions for computing two matrix-vector products at once: [D/S]GEM2VU, [Z/C]GEM2VC
- New functions for computing mixed precision general matrix-vector products: [DZ/SC]GEMV
- New function for computing the sum of two scaled vectors: *AXPBY
- Intel® AVX optimizations in key functions: SMP LINPACK, level 3 BLAS, DDOT, DAXPY
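
For reference, ?AXPBY computes y := alpha*x + beta*y. ?GEM2VU computes two products with the same matrix: y1 := alpha*A*x1 + beta*y1 and y2 := alpha*A^T*x2 + beta*y2. The plain C sketch below spells out those semantics for unit-stride, column-major, double-precision data. The names `ref_daxpby` and `ref_dgem2vu` are hypothetical. The library routines take additional stride and leading-dimension arguments, which are described in the Intel MKL Reference Manual.

```c
/* Hypothetical reference for ?AXPBY semantics: y := alpha*x + beta*y. */
void ref_daxpby(int n, double alpha, const double *x, double beta, double *y)
{
    for (int i = 0; i < n; ++i)
        y[i] = alpha * x[i] + beta * y[i];
}

/* Hypothetical reference for ?GEM2VU semantics. A is m x n, column-major,
   with leading dimension lda:
     y1 := alpha*A*x1   + beta*y1   (x1 has n entries, y1 has m)
     y2 := alpha*A^T*x2 + beta*y2   (x2 has m entries, y2 has n)
   Both products are accumulated during a single pass over A. */
void ref_dgem2vu(int m, int n, double alpha, const double *a, int lda,
                 const double *x1, const double *x2, double beta,
                 double *y1, double *y2)
{
    for (int i = 0; i < m; ++i)
        y1[i] *= beta;
    for (int j = 0; j < n; ++j) {
        double dot = 0.0;                    /* accumulates (A^T x2)_j */
        for (int i = 0; i < m; ++i) {
            double aij = a[i + j * lda];
            y1[i] += alpha * aij * x1[j];    /* column j's share of A*x1 */
            dot += aij * x2[i];
        }
        y2[j] = beta * y2[j] + alpha * dot;
    }
}
```

The single pass over A is why fusing the two products can help: A is read from memory once instead of twice.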
LAPACK
- New C interfaces for LAPACK supporting row-major ordering
- Integrated Netlib LAPACK 3.2.2, including one new computational routine (*GEQRFP) and two new auxiliary routines (*GEQR2P and *LARFGP), as well as the earlier LAPACK 3.2.1 update
- Intel® AVX optimizations in key functions: DGETRF, DPOTRF, DGEQRF
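
The new C interfaces accept row-major arrays, while classic LAPACK expects column-major storage. The sketch below shows the two index mappings involved. An m x n row-major array with leading dimension lda (lda >= n) stores element (i, j) at i*lda + j. A column-major array (lda >= m) stores it at i + j*lda. The helper names are hypothetical. The copy routine shows the kind of layout conversion a row-major wrapper around a column-major routine may need to perform. It is not a description of the Intel MKL internals.

```c
#include <stddef.h>

/* Offset of element (i, j) in a row-major array with leading dimension lda. */
size_t row_major_offset(size_t i, size_t j, size_t lda) { return i * lda + j; }

/* Offset of element (i, j) in a column-major array with leading dimension lda. */
size_t col_major_offset(size_t i, size_t j, size_t lda) { return i + j * lda; }

/* Hypothetical helper: copy an m x n row-major matrix a (lda >= n)
   into column-major storage b (ldb >= m). */
void row_to_col_major(size_t m, size_t n, const double *a, size_t lda,
                      double *b, size_t ldb)
{
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < n; ++j)
            b[col_major_offset(i, j, ldb)] = a[row_major_offset(i, j, lda)];
}
```

For example, the 2 x 3 row-major buffer {1, 2, 3, 4, 5, 6} becomes {1, 4, 2, 5, 3, 6} in column-major order.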
PARDISO
- Improved performance of factor and solve steps in multi-core environments
- Introduced the ability to solve for sparse right-hand sides and to perform partial solves, which produce a partial solution vector
- Improved performance of the out-of-core (OOC) factorization step
- Support for zero-based (C-style) array indexing
- Zeros on the diagonal of the matrix are no longer required in sparse data structures for symmetric matrices
- New ILP64 PARDISO interface allows the use of both LP64 and ILP64 versions when linked to the LP64 libraries
- The memory required for storing files on the disk in OOC mode can now be estimated just after reordering
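
Zero-based (C-style) indexing removes the need to shift CSR arrays that were built in C. Code that still produces one-based (Fortran-style) arrays can convert them with a routine like the plain C sketch below. `csr_one_to_zero_based` is a hypothetical helper. As we understand it, the indexing mode itself is selected through the PARDISO iparm array; see the Intel MKL Reference Manual for the exact entry.

```c
/* Hypothetical helper: convert CSR row pointers (n + 1 entries) and
   column indices (nnz entries) from one-based to zero-based, in place. */
void csr_one_to_zero_based(int n, int *row_ptr, int *col_idx)
{
    int nnz = row_ptr[n] - row_ptr[0];   /* one-based: row_ptr[0] == 1 */
    for (int i = 0; i <= n; ++i)
        row_ptr[i] -= 1;
    for (int k = 0; k < nnz; ++k)
        col_idx[k] -= 1;
}
```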
Sparse BLAS
- Format conversion functions now support all data types (single and double precision for real and complex data) and can return sorted or unsorted arrays
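
In the format conversion item, "sorted" output means that the column indices within each CSR row appear in increasing order. The plain C sketch below converts a row-major dense matrix to zero-based CSR. Its rows come out sorted because each row is scanned left to right. It is not the Intel MKL conversion routine: `dense_to_csr` is a hypothetical name, and the caller is assumed to supply arrays large enough to hold every non-zero.

```c
/* Hypothetical helper: convert an m x n row-major dense matrix to zero-based
   CSR. row_ptr needs m + 1 entries; col_idx and val need room for every
   non-zero. Returns the number of non-zeros stored. */
int dense_to_csr(int m, int n, const double *dense,
                 int *row_ptr, int *col_idx, double *val)
{
    int nnz = 0;
    for (int i = 0; i < m; ++i) {
        row_ptr[i] = nnz;
        for (int j = 0; j < n; ++j) {       /* left to right: sorted columns */
            double v = dense[i * n + j];
            if (v != 0.0) {
                col_idx[nnz] = j;
                val[nnz] = v;
                ++nnz;
            }
        }
    }
    row_ptr[m] = nnz;
    return nnz;
}
```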
FFTs
- New MPI FFTW 3.3alpha1 wrappers cover new cluster functionality
- Improved load-balancing of cluster FFTs provides improved performance
- Intel AVX optimizations in all 1D/2D/3D FFTs
- Improved performance of 2D and 3D mixed-radix FFTs for single and double precision data for all systems supporting the SSE4.2 instruction set
- Support for split-complex data represented as two real arrays introduced for 2D/3D FFTs
- Support for 1D complex-to-complex transforms of large prime lengths
- Introduced hybrid parallelism (MPI + OpenMP*) for cluster 1D complex transforms, and increased performance for vector lengths that are a multiple of the number of MPI processes
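
Split-complex data stores the real and imaginary parts of a complex sequence in two separate real arrays rather than interleaving them. The plain C sketch below converts between the two layouts. The function names are hypothetical. How a split layout is declared to the FFT interface is covered in the Intel MKL Reference Manual.

```c
#include <stddef.h>

/* Hypothetical helper: interleaved {re0, im0, re1, im1, ...} -> split arrays. */
void interleaved_to_split(size_t n, const double *z, double *re, double *im)
{
    for (size_t k = 0; k < n; ++k) {
        re[k] = z[2 * k];
        im[k] = z[2 * k + 1];
    }
}

/* Hypothetical helper: split arrays -> interleaved layout. */
void split_to_interleaved(size_t n, const double *re, const double *im, double *z)
{
    for (size_t k = 0; k < n; ++k) {
        z[2 * k] = re[k];
        z[2 * k + 1] = im[k];
    }
}
```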
VML
- A new function for computing (ax+b)/(cy+d) where a, b, c, and d are scalars, and x and y are real vectors: v[s/d]LinearFrac()
- Intel AVX optimizations for real functions
- A new mode for setting denormals to zero, overflow support for complex vectors, and, for every VML function, a new variant with an additional parameter for setting the accuracy mode
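
The new v[s/d]LinearFrac() function evaluates (ax+b)/(cy+d) element by element. The plain C reference below spells out that formula in double precision. `linear_frac_ref` and its argument order are hypothetical; the actual vdLinearFrac signature is given in the Intel MKL Reference Manual.

```c
/* Hypothetical reference: r[i] = (a*x[i] + b) / (c*y[i] + d), i = 0..n-1.
   Zero denominators get no special handling here; IEEE arithmetic
   produces inf or NaN in that case. */
void linear_frac_ref(int n, const double *x, const double *y,
                     double a, double b, double c, double d, double *r)
{
    for (int i = 0; i < n; ++i)
        r[i] = (a * x[i] + b) / (c * y[i] + d);
}
```

For example, with a = 2, b = 1, c = 1, d = 1, x = {1, 2} and y = {1, 3}, the result is r = {1.5, 1.25}.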
VSL
- A set of new Summary Statistics functions has been added, covering basic statistics, covariance and correlation, pooled, group, partial, and robust covariance/correlation, quantiles and streaming quantiles, an outlier detection algorithm, and support for missing values
  - Performance optimized algorithms: MI algorithm for support of missing values, TBS algorithm for computation of robust covariance, BACON algorithm for detection of outliers, ZW algorithm for computation of quantiles (streaming data case), and 1PASS algorithm for computation of pooled covariance
- Improved performance of SFMT19937 Basic Random Number Generator (BRNG)
- Intel® AVX optimizations: MT19937 and MT2203 BRNGs
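
The Summary Statistics functions compute quantities such as means and variances, including over data that arrives in blocks. As a point of reference only, the sketch below uses Welford's one-pass update for the mean and variance. This is a generic textbook method. It is not necessarily what the Intel MKL 1PASS or streaming algorithms do, and `running_stats` and its functions are hypothetical.

```c
/* Hypothetical accumulator for Welford's one-pass mean/variance update. */
typedef struct {
    long   count;
    double mean;
    double m2;     /* sum of squared deviations from the current mean */
} running_stats;

void stats_init(running_stats *s) { s->count = 0; s->mean = 0.0; s->m2 = 0.0; }

/* Incorporate one observation without storing earlier ones. */
void stats_push(running_stats *s, double x)
{
    s->count += 1;
    double delta = x - s->mean;
    s->mean += delta / (double)s->count;
    s->m2 += delta * (x - s->mean);
}

/* Unbiased sample variance; returns 0.0 for fewer than two observations. */
double stats_variance(const running_stats *s)
{
    return s->count > 1 ? s->m2 / (double)(s->count - 1) : 0.0;
}
```

Pushing 1, 2, and 3 gives a mean of 2 and a sample variance of 1.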
Documentation: Product documentation is available in the Microsoft Help Viewer* 1.x format that integrates with Microsoft Visual Studio* 2010
Added runtime dispatching dynamic libraries, which allow linking to a single interface library that loads dependent libraries dynamically at runtime, based on runtime CPU detection and/or library function calls
The custom dynamic libraries builder now uses the runtime dispatching dynamic libraries on the Linux* and Mac OS* X operating systems
A new directory structure has been established to simplify integration of Intel MKL with the Intel® Parallel Studio XE family of products and directories formerly designated as "em64t" are now designated by the "intel64" tag
Intel® Itanium® architecture (IA-64) support is not included in this release. Intel® MKL 10.2 is the latest release for IA-64
The sparse solver functionality has been fully integrated into the core Intel MKL libraries and the libraries with "solver" in the filename have been removed from the product

Notices

The Intel MKL GNU Multiple Precision* (GMP) function interfaces will be removed in a future library release.
The timing function mkl_set_cpu_frequency() is deprecated. Please use mkl_get_max_cpu_frequency(), mkl_get_clocks_frequency(), and mkl_get_cpu_frequency() as described in the Intel® MKL Reference Manual.

Product Contents

The Intel® Math Kernel Library (Intel® MKL) version 10.3 and its updates consist of three installation packages: one for both IA-32 and Intel® 64 architectures, one for IA-32 only, and one for Intel® 64 architecture only.

Technical Support

If you did not register your Intel software product during installation, please do so now at the Intel® Software Development Products Registration Center. Registration entitles you to free technical support, product updates and upgrades for the duration of the support term.

For general information about Intel technical support, product updates, user forums, FAQs, tips and tricks and other support questions, please visit http://www.intel.com/software/products/support/.

Note: If your distributor provides technical support for this product, please contact them rather than Intel.

For technical information about Intel MKL, including FAQs, tips and tricks, and other support information, please visit the Intel MKL forum: http://software.intel.com/en-us/forums/intel-math-kernel-library/ and browse the Intel MKL knowledge base: http://software.intel.com/en-us/articles/intel-mkl-kb/all/.

Attributions

As referenced in the End User License Agreement, attribution requires, at a minimum, prominently displaying the full Intel product name (e.g. "Intel® Math Kernel Library") and providing a link/URL to the Intel® MKL homepage (http://www.intel.com/software/products/mkl) in both the product documentation and website.

The original versions of the BLAS from which that part of Intel® MKL was derived can be obtained from http://www.netlib.org/blas/index.html.

The original versions of LAPACK from which that part of Intel® MKL was derived can be obtained from http://www.netlib.org/lapack/index.html. The authors of LAPACK are E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. Our FORTRAN 90/95 interfaces to LAPACK are similar to those in the LAPACK95 package at http://www.netlib.org/lapack95/index.html. All interfaces are provided for pure procedures.

The original versions of ScaLAPACK from which that part of Intel® MKL was derived can be obtained from http://www.netlib.org/scalapack/index.html. The authors of ScaLAPACK are L. S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley.

PARDISO in Intel® MKL is compliant with the 3.2 release of PARDISO that is freely distributed by the University of Basel. It can be obtained at http://www.pardiso-project.org.

Some FFT functions in this release of Intel® MKL have been generated by the SPIRAL software generation system (http://www.spiral.net/) under license from Carnegie Mellon University. The authors of SPIRAL are Markus Puschel, Jose Moura, Jeremy Johnson, David Padua, Manuela Veloso, Bryan Singer, Jianxin Xiong, Franz Franchetti, Aca Gacic, Yevgen Voronenko, Kang Chen, Robert W. Johnson, and Nick Rizzolo.

License Definitions

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL(R) PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.

Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or by visiting Intel's Web Site.

Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. See http://www.intel.com/products/processor_number for details.

This document contains information on products in the design phase of development.

BunnyPeople, Celeron, Celeron Inside, Centrino, Centrino Atom, Centrino Atom Inside, Centrino Inside, Centrino logo, Core Inside, FlashFile, i960, InstantIP, Intel, Intel logo, Intel386, Intel486, IntelDX2, IntelDX4, IntelSX2, Intel Atom, Intel Atom Inside, Intel Core, Intel Inside, Intel Inside logo, Intel. Leap ahead., Intel. Leap ahead. logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel SingleDriver, Intel SpeedStep, Intel StrataFlash, Intel Viiv, Intel vPro, Intel XScale, Itanium, Itanium Inside, MCS, MMX, Oplus, OverDrive, PDCharm, Pentium, Pentium Inside, skoool, Sound Mark, The Journey Inside, Viiv Inside, vPro Inside, VTune, Xeon, and Xeon Inside are trademarks of Intel Corporation in the U.S. and other countries.

*Other names and brands may be claimed as the property of others.

Optimization Notice

Intel® compilers, associated libraries and associated development tools may include or utilize options that optimize for instruction sets that are available in both Intel® and non-Intel microprocessors (for example SIMD instruction sets), but do not optimize equally for non-Intel microprocessors. In addition, certain compiler options for Intel compilers, including some that are not specific to Intel micro-architecture, are reserved for Intel microprocessors. For a detailed description of Intel compiler options, including the instruction sets and specific microprocessors they implicate, please refer to the "Intel® Compiler User and Reference Guides" under "Compiler Options." Many library routines that are part of Intel® compiler products are more highly optimized for Intel microprocessors than for other microprocessors. While the compilers and libraries in Intel® compiler products offer optimizations for both Intel and Intel-compatible microprocessors, depending on the options you select, your code and other factors, you likely will get extra performance on Intel microprocessors.

Intel® compilers, associated libraries and associated development tools may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include Intel® Streaming SIMD Extensions 2 (Intel® SSE2), Intel® Streaming SIMD Extensions 3 (Intel® SSE3), and Supplemental Streaming SIMD Extensions 3 (Intel® SSSE3) instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors.

While Intel believes our compilers and libraries are excellent choices to assist in obtaining the best performance on Intel® and non-Intel microprocessors, Intel recommends that you evaluate other compilers and libraries to determine which best meet your requirements. We hope to win your business by striving to offer the best performance of any compiler or library; please let us know if you find we do not.

Notice revision #20101101
