Intel® Math Kernel Library (Intel® MKL) 10.3 Release Notes
This document provides a general summary of new features and important notes about the Intel® Math Kernel Library (Intel® MKL) software product.
Please see the following links to the online resources and documents for the latest information regarding Intel MKL:
Links to documentation, help, and code samples can be found on the main
Intel MKL product page. For technical support visit the
Intel MKL technical support forum and review the articles in the
Intel MKL knowledgebase.
Please register your product using your preferred email address. This helps Intel recognize you as a valued customer in the support forum and insures that you will be notified of product updates. You can read
Intel's Online Privacy Notice Summary if you have any questions regarding the use of your email address for software product registration.
What's New in Intel® MKL 10.3 update 5
- BLAS: Improved performance: {S,C,Z}TRSM for processors with Intel® Advanced Vector Extensions (Intel® AVX); {S,D}GEM2VU for processors with Intel AVX as well as the Intel® Core™ i7 processor and the Intel® Xeon® processor 5500 series
- BLAS: Improved scaling: ?TRMV for large matrices on all architectures; DGEMM for odd numbers of threads on Intel® Xeon® processor 5400 series
- LAPACK: Included LAPACK 3.3.1 extensions and the respective LAPACKE interfaces
- LAPACK: Improved the performance of ?SYGST and ?HEGST used in generalized eigenvalue problems
- LAPACK: Improved the performance of the inverse of an LU factored matrix (?GETRI)
- PARDISO: Added transpose and conjugate transpose solve capability (ATx=b and AHx=b); facilitates compressed sparse column (CSC) format support
- PARDISO: Improve out-of-core PARDISO performance when the memory requirements slightly exceed available memory using MKL_PARDISO_OOC_MAX_SWAP_SIZE environment variable and in-core PARDISO
- Optimization Solvers: Added Inf and NaN checks in the RCI Trust-Region solvers
- FFTs: Improved the performance of 3D FFTs on small cubes from 2x2x2 to 10x10x10 for all supported precisions and types on all Intel® processors supporting Intel® SSE3 and later
- FFT examples: Re-designed example programs to cover common use cases for Intel MKL DFTI and FFTW
- VSL: Improved the performance of the single precision MT19937 and MT2203 basic random number generators on the Intel® Core™ i7-2600 processor on 64-bit operating systems
- VSL: Improved the performance of the integer version of the SOBOL quasi-random number generator on the Intel® Core™ i7-2600 processor and Intel® Xeon® processor 5400 series
What's New in Intel® MKL 10.3 update 4
- BLAS: Improved DTRMM performance on Intel® Xeon® processors 5400 and later
- BLAS: Improved DTRSM performance on all 64-bit enabled processors, especially processors with Intel® Advanced Vector Extensions (Intel® AVX)
- LAPACK: Incorporated bug fixes from the LAPACK 3.3.1 release
- OOC PARDISO: Improved the estimate of the amount of memory needed in out-of-core operation
- FFT: Improved 1D real FFT scaling through improved threading
- FFT: Updated C and Fortran FFT examples to use the new single dynamic library linking model
- VML: Improved performance of the single precision Enhanced Performance version of the real Hypot and complex Abs functions and of the complex Arg, Div, Mul, MulByConj functions for all accuracy modes on Intel® Xeon® processors 5600 and 7500 series, and the Intel® Core™ i7-2600 processor
- Service functions: Improvements and additions to the Intel MKL service functions (see the online release notes for more information)
- Bug fixes
What's New in Intel® MKL 10.3 update 3
- BLAS: Improved multi-threaded performance of DSYRK, DTRSM, and DGEMM on Intel® Xeon® processor 5400 series running 32-bit Windows*
- LAPACK: Implemented LAPACK 3.3 from netlib including Cosine-Sine decomposition, improved linear equations solvers for symmetric and Hermitian matrices and auxiliary functions
- PARDISO: 0-based permutation vectors are now allowed at input
- PARDISO: Documentation for the pardisoinit() routine
- PARDISO: Improved performance of serial PARDISO with multiple right-hand sides (RHS)
- PARDISO: Independent control for parallelism in the solve step for improved performance on small matrices—see description of iparm(25)
- PARDISO: Reduced backward substitution—allows partial solution computation for a full RHS—see description of iparm(31)
- FFT: Implemented Real FFT transforms for 3 to 7 dimensions
- FFT: Parallelized multi-dimensional complex transforms using split-complex data represented as two real arrays
- Cluster FFTs: Extended FORTRAN 90 interface to real-to-complex transforms and included new examples
- VML: Added new complex Pack/Unpack functions and real Gamma/LGamma functions
- VML: Improved performance on Intel® Xeon® processor 5600 series and processors supporting Intel® Advanced Vector Extensions (Intel® AVX) for the following: all functions when operating on short vectors (<100), all functions when operating on unaligned input vectors, the sPow2o3 function, and the enhanced performance (EP) version of complex Add and Sub
- VSL: Functions for saving/restoring random number generator (RNG) streams to/from memory
- VSL: Added new UniformBits32 and UniformBits64 functions
- VSL: Extended the number of unique streams supported by the MT2203 Basic RNG from 1024 to 6024
- Bug fixes
What's New in Intel® MKL 10.3 update 2
- BLAS: Improved performance of transposition functions on the Intel® Xeon® processor 5600 series
- BLAS: Added examples for transposition routines
- FFT: Added Fortran examples showing how to reduce application footprint by linking only functions with the desired precision
- FFT: Added check for stride consistency on in-place real transforms with CCE storage
- FFT: Expanded threading to new cases for multi-dimensional transforms
- VSL: Improved performance of Multivariate Gaussian random number generator for single- and double-precision on 4-core Intel® Xeon® processors 5500 series
- VML: Improved performance of in-place operation of Add, Mul, and Sub functions on the Intel® Xeon® processor 5500 series
- Bug fixes
What's New in Intel® MKL 10.3 update 1
- PARDISO/DSS: Added true F90 overloaded API (see the Intel MKL reference manual for more information)
- PARDISO: Improved the statistical reporting to be more reader friendly
- Sparse BLAS: Improved performance of ?BSRMM functions on Intel® Core™ i7 processors
- FFTs: Support for negative strides
- FFT examples: Added examples for split-complex FFTs in C and Fortran using both the DFTI and FFTW3 interfaces
- VML: Improved performance of real in-place Add/Sub/Mul/Sqr functions on systems supporting SSE2 and SSE3
- Poisson Library: Changed the default behavior of the Poisson library functions from sequential to threaded operation
- Bug fixes
What's New in Intel® MKL 10.3
- BLAS
- New functions for computing 2 matrix-vector products at once: [D/S]GEM2VU, [Z/C]GEM2VC
- New functions for computing mixed precision general matrix-vector products: [DZ/SC]GEMV
- New function for computing the sum of two scaled vectors: *AXPBY
- Intel® AVX optimizations in key functions: SMP LINPACK, level 3 BLAS, DDOT, DAXPY
- LAPACK
- New C interfaces for LAPACK supporting row-major ordering
- Integrated Netlib LAPACK 3.2.2 including one new computational routine (*GEQRFP) and two new auxiliary routines (*GEQR2P and *LARFGP) and the earlier LAPACK 3.2.1 update
- Intel® AVX optimizations in key functions: DGETRF, DPOTRF, DGEQRF
- PARDISO
- Improved performance of factor and solve steps in multi-core environments
- Introduced the ability to solve for sparse right-hand sides and perform partial solves—produces partial solution vector
- Improved performance of the out-of-core (OOC) factorization step
- Support for zero-based (C-style) array indexing
- Zeros on the diagonal of the matrix are no longer required in sparse data structures for symmetric matrices
- New ILP64 PARDISO interface allows the use of both LP64 and ILP64 versions when linked to the LP64 libraries
- The memory required for storing files on the disk in OOC mode can now be estimated just after reordering
- Sparse BLAS
- Format conversion functions now support all data types (single and double precision for real and complex data) and can return sorted or unsorted arrays
- FFTs
- New MPI FFTW 3.3alpha1 wrappers cover new cluster functionality
- Improved load-balancing of cluster FFTs provides improved performance
- Intel AVX optimizations in all 1D/2D/3D FFTs
- Improved performance of 2D and 3D mixed-radix FFTs for single and double precision data for all systems supporting the SSE4.2 instruction set
- Support for split-complex data represented as two real arrays introduced for 2D/3D FFTs
- Support for 1D complex-to-complex transforms of large prime lengths
- Introduced Hybrid parallelism (MPI + OpenMP*) on cluster 1D complex transforms and increased performance on vector lengths which are a multiple of the number of MPI processes
- VML
- A new function for computing (ax+b)/(cy+d) where a, b, c, and d are scalars, and x and y are real vectors: v[s/d]LinearFrac()
- Intel AVX optimizations for real functions
- A new mode for setting denormals to zero, overflow support for complex vectors, and for every VML function a new function with an additional parameter for setting the accuracy mode
- VSL
- A set of new Summary Statistics functions was added covering basic statistics, covariance and correlation, pooled, group, partial, and robust covariance/correlation, quantiles and streaming quantiles, outliers detection algorithm, and missing values support
- Performance optimized algorithms: MI algorithm for support of missing values, TBS algorithm for computation of robust covariance, BACON algorithm for detection of outliers, ZW algorithm for computation of quantiles (streaming data case), and 1PASS algorithm for computation of pooled covariance
- Improved performance of SFMT19937 Basic Random Number Generator (BRNG)
- Intel® AVX optimizations: MT19937 and MT2203 BRNGs
- Documentation: Product documentation is available in the Microsoft Help Viewer* 1.x format that integrates with Microsoft Visual Studio* 2010
- Added runtime dispatching dynamic libraries allowing link to a single interface library which loads dependent libraries dynamically at runtime depending on runtime CPU detection and/or library function calls
- The custom dynamic libraries builder now uses the runtime dispatching dynamic libraries on the Linux* and Mac OS* X operating systems
- A new directory structure has been established to simplify integration of Intel MKL with the Intel® Parallel Studio XE family of products and directories formerly designated as "em64t" are now designated by the "intel64" tag
- Intel® Itanium® architecture (IA-64) support is not included in this release. Intel® MKL 10.2 is the latest release for IA-64
- The sparse solver functionality has been fully integrated into the core Intel MKL libraries and the libraries with "solver" in the filename have been removed from the product
Notices
- The Intel MKL GNU Multiple Precision* (GMP) function interfaces will be removed in a future library release.
- The timing function
mkl_set_cpu_frequency()
is deprecated. Please use mkl_get_max_cpu_frequency()
, mkl_get_clocks_frequency()
, and mkl_get_cpu_frequency()
as described in the Intel® MKL Reference Manual.
Product Contents
The Intel® Math Kernel Library (Intel® MKL) version 10.3 and updates consists of three installation packages: one package for both IA-32 and Intel® 64 architectures, one for IA-32 only, and one for Intel® 64 architecture only.
Technical Support
If you did not register your Intel software product during installation, please do so now at the Intel® Software Development Products Registration Center. Registration entitles you to free technical support, product updates and upgrades for the duration of the support term.
For general information about Intel technical support, product updates, user forums, FAQs, tips and tricks and other support questions, please visit http://www.intel.com/software/products/support/.
Note: If your distributor provides technical support for this product, please contact them rather than Intel.
For technical information about Intel MKL, including FAQ's, tips and tricks, and other support information, please visit the Intel MKL forum: http://software.intel.com/en-us/forums/intel-math-kernel-library/ and browse the Intel MKL knowledge base: http://software.intel.com/en-us/articles/intel-mkl-kb/all/.
Attributions
As referenced in the End User License Agreement, attribution requires, at a minimum, prominently displaying the full Intel product name (e.g. "Intel® Math Kernel Library") and providing a link/URL to the Intel® MKL homepage (http://www.intel.com/software/products/mkl) in both the product documentation and website.
The original versions of the BLAS from which that part of Intel® MKL was derived can be obtained from http://www.netlib.org/blas/index.html.
The original versions of LAPACK from which that part of Intel® MKL was derived can be obtained from http://www.netlib.org/lapack/index.html. The authors of LAPACK are E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. Our FORTRAN 90/95 interfaces to LAPACK are similar to those in the LAPACK95 package at http://www.netlib.org/lapack95/index.html. All interfaces are provided for pure procedures.
The original versions of ScaLAPACK from which that part of Intel® MKL was derived can be obtained from http://www.netlib.org/scalapack/index.html. The authors of ScaLAPACK are L. S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley.
PARDISO in Intel® MKL is compliant with the 3.2 release of PARDISO that is freely distributed by the University of Basel. It can be obtained at http://www.pardiso-project.org.
Some FFT functions in this release of Intel® MKL have been generated by the SPIRAL software generation system (http://www.spiral.net/) under license from Carnegie Mellon University. The Authors of SPIRAL are Markus Puschel, Jose Moura, Jeremy Johnson, David Padua, Manuela Veloso, Bryan Singer, Jianxin Xiong, Franz Franchetti, Aca Gacic, Yevgen Voronenko, Kang Chen, Robert W. Johnson, and Nick Rizzolo.
License Definitions
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL(R) PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR.
Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.
The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or by visiting Intel's Web Site.
Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. See http://www.intel.com/products/processor_number for details.
This document contains information on products in the design phase of development.
BunnyPeople, Celeron, Celeron Inside, Centrino, Centrino Atom, Centrino Atom Inside, Centrino Inside, Centrino logo, Core Inside, FlashFile, i960, InstantIP, Intel, Intel logo, Intel386, Intel486, IntelDX2, IntelDX4, IntelSX2, Intel Atom, Intel Atom Inside, Intel Core, Intel Inside, Intel Inside logo, Intel. Leap ahead., Intel. Leap ahead. logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel SingleDriver, Intel SpeedStep, Intel StrataFlash, Intel Viiv, Intel vPro, Intel XScale, Itanium, Itanium Inside, MCS, MMX, Oplus, OverDrive, PDCharm, Pentium, Pentium Inside, skoool, Sound Mark, The Journey Inside, Viiv Inside, vPro Inside, VTune, Xeon, and Xeon Inside are trademarks of Intel Corporation in the U.S. and other countries.
*Other names and brands may be claimed as the property of others.
Optimization Notice |
Intel® compilers, associated libraries and associated development tools may include or utilize options that optimize for instruction sets that are available in both Intel® and non-Intel microprocessors (for example SIMD instruction sets), but do not optimize equally for non-Intel microprocessors. In addition, certain compiler options for Intel compilers, including some that are not specific to Intel micro-architecture, are reserved for Intel microprocessors. For a detailed description of Intel compiler options, including the instruction sets and specific microprocessors they implicate, please refer to the "Intel® Compiler User and Reference Guides" under "Compiler Options." Many library routines that are part of Intel® compiler products are more highly optimized for Intel microprocessors than for other microprocessors. While the compilers and libraries in Intel® compiler products offer optimizations for both Intel and Intel-compatible microprocessors, depending on the options you select, your code and other factors, you likely will get extra performance on Intel microprocessors.
Intel® compilers, associated libraries and associated development tools may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include Intel® Streaming SIMD Extensions 2 (Intel® SSE2), Intel® Streaming SIMD Extensions 3 (Intel® SSE3), and Supplemental Streaming SIMD Extensions 3 (Intel® SSSE3) instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors.
While Intel believes our compilers and libraries are excellent choices to assist in obtaining the best performance on Intel® and non-Intel microprocessors, Intel recommends that you evaluate other compilers and libraries to determine which best meet your requirements. We hope to win your business by striving to offer the best performance of any compiler or library; please let us know if you find we do not.
Notice revision #20101101
|
Copyright © 2002-2011, Intel Corporation. All rights reserved.