The OpenMP* software responds to the environment variable OMP_NUM_THREADS. Intel MKL also has other mechanisms to set the number of threads, such as the MKL_NUM_THREADS or MKL_DOMAIN_NUM_THREADS environment variables (see Using Additional Threading Control).
Make sure that the relevant environment variables have the same and correct values on all the nodes. Intel MKL versions 10.0 and higher no longer set the default number of threads to one, but depend on the OpenMP libraries used with the compiler to set the default number. For the threading layer based on the Intel compiler (libmkl_intel_thread.a), this value is the number of CPUs according to the OS.
Avoid over-prescribing the number of threads, which may occur, for instance, when the number of MPI ranks per node and the number of threads per node are both greater than one. The product of MPI ranks per node and the number of threads per node should not exceed the number of physical cores per node.
The best way to set an environment variable, such as OMP_NUM_THREADS, is your login environment. Remember that changing this value on the head node and then doing your run, as you do on a shared-memory (SMP) system, does not change the variable on all the nodes because mpirun starts a fresh default shell on all the nodes. To change the number of threads on all the nodes, in .bashrc, add a line at the top, as follows:
OMP_NUM_THREADS=1; export OMP_NUM_THREADS
You can run multiple CPUs per node using MPICH. To do this, build MPICH to enable multiple CPUs per node. Be aware that certain MPICH applications may fail to work perfectly in a threaded environment (see the Known Limitations section in the Release Notes. If you encounter problems with MPICH and setting of the number of threads is greater than one, first try setting the number of threads to one and see whether the problem persists.