To benchmark a cluster, follow the sequence of steps below (some of them are optional). Pay special attention to the iterative steps 3 and 4. They make a loop that searches for HPL parameters (specified in HPL.dat) that enable you to reach the top performance of your cluster.
You may run nodeperf.c (included in the distribution) to see the performance of DGEMM on all the nodes.
Compile nodeperf.c with your MPI and Intel MKL. For example:
mpiicc -O3 nodeperf.c -L$MKLPATH $MKLPATH/libmkl_intel_lp64.a \
    -Wl,--start-group $MKLPATH/libmkl_sequential.a \
    $MKLPATH/libmkl_core.a -Wl,--end-group -lpthread
Launching nodeperf on all the nodes is especially helpful in a very large cluster. It lets you quickly identify a potential problem spot without making numerous small MP LINPACK runs around the cluster in search of the bad node. nodeperf goes through all the nodes, one at a time, and reports the DGEMM performance followed by a host identifier. The higher the reported DGEMM performance, the faster that node ran, so a node that reports a markedly lower figure than its peers is the likely problem spot.
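As an illustration of how the per-node figures might be compared, the sketch below flags hosts whose DGEMM rate falls well below the cluster median. It does not parse nodeperf output; the `results` mapping, the host names, the sample rates, and the 10% tolerance are all assumptions made for this example.

```python
from statistics import median


def flag_slow_nodes(results, tolerance=0.10):
    """Return hosts whose DGEMM rate is more than `tolerance` below the median.

    `results` maps a host identifier to its measured DGEMM rate in any
    consistent unit. The mapping format and the 10% default are
    assumptions for illustration, not nodeperf behavior.
    """
    if not results:
        return []
    typical = median(results.values())
    cutoff = typical * (1.0 - tolerance)
    return sorted(host for host, rate in results.items() if rate < cutoff)


# Hypothetical measurements: node03 is clearly behind its peers.
sample = {"node01": 101.2, "node02": 99.8, "node03": 71.5, "node04": 100.4}
print(flag_slow_nodes(sample))
```

The median is used rather than the mean so that one very slow node does not drag down the reference value it is compared against.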
Edit HPL.dat to fit your cluster needs.
Read through the HPL documentation for ideas on this. Note, however, that you should use at least 4 nodes.
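Two of the values you edit in HPL.dat are the problem size N and the process grid P x Q. The sketch below shows one common way to derive starting values. It relies on the fact that HPL factors an N x N matrix of 8-byte doubles. The 80% memory fraction and the preference for P <= Q are widely used rules of thumb rather than recommendations from this guide, and the function names are hypothetical.

```python
import math


def suggest_n(total_mem_bytes, nb, mem_fraction=0.80):
    """Largest N that is a multiple of nb and whose N x N matrix of
    8-byte doubles fits in mem_fraction of the total memory.

    mem_fraction=0.80 is an assumed rule of thumb, not a fixed rule.
    """
    budget_elements = int(total_mem_bytes * mem_fraction) // 8
    n_max = math.isqrt(budget_elements)
    return (n_max // nb) * nb


def process_grids(nprocs):
    """All (P, Q) pairs with P * Q == nprocs and P <= Q."""
    return [
        (p, nprocs // p)
        for p in range(1, math.isqrt(nprocs) + 1)
        if nprocs % p == 0
    ]


# Hypothetical cluster: 4 nodes with 64 GiB each, block size NB = 192.
print(suggest_n(4 * 64 * 2**30, nb=192))
print(process_grids(16))
```

Treat these values as a starting point for the search in the following steps, not as the final answer.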
Make an HPL run, using compile options such as ASYOUGO, ASYOUGO2, or ENDEARLY to aid in your search. These options give you insight into the performance sooner than HPL normally would.
When doing so, follow these recommendations:
Use MP LINPACK, which is a patched version of HPL, to save time in the search.
All performance-intrusive features are compile-time options in MP LINPACK. That is, if you do not use the new options to reduce search time, these features are disabled. The primary purpose of the additions is to assist you in finding solutions.
HPL requires a long time to search for many different parameters. In MP LINPACK, the goal is to get the best possible number.
Because the input is not fixed, there is a large parameter space to search. An exhaustive search of all possible inputs is impractically large even for a powerful cluster. MP LINPACK optionally prints performance information as it proceeds. You can also terminate a run early.
Save time by compiling with -DENDEARLY -DASYOUGO2 and using a negative threshold. Do not use a negative threshold on the final run that you intend to submit as a Top500 entry. Set the threshold in line 13 of the HPL 2.0 input file HPL.dat.
If you are going to run a problem to completion, do it with -DASYOUGO.
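The threshold edit described above can be scripted when you switch often between search runs and a final run. The sketch below rewrites line 13 of the HPL.dat text so that the threshold is negative or positive. It assumes the usual HPL 2.0 layout in which line 13 holds a numeric value followed by a text label; the function name is hypothetical.

```python
def set_threshold_sign(hpl_dat_text, negative=True):
    """Return HPL.dat text with the sign of the line-13 threshold set.

    Assumes line 13 looks like '16.0         threshold': a number
    followed by a free-text label. All other lines are left unchanged.
    """
    lines = hpl_dat_text.splitlines()
    if len(lines) < 13:
        raise ValueError("expected at least 13 lines in HPL.dat")
    fields = lines[12].split(None, 1)
    if not fields:
        raise ValueError("line 13 is empty")
    magnitude = abs(float(fields[0]))
    value = -magnitude if negative else magnitude
    label = fields[1] if len(fields) > 1 else ""
    # Left-align the value in a 12-character column, then the label.
    lines[12] = f"{value:<12} {label}".rstrip()
    return "\n".join(lines) + "\n"
```

Call it with `negative=True` for search runs and with `negative=False` before the run you intend to submit.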
Using the quick performance feedback, return to step 3 and iterate until you are sure that the performance is as good as possible.
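The iteration over steps 3 and 4 can be pictured as a simple search loop. In the sketch below, `quick_run` is a hypothetical callable standing in for "edit HPL.dat, make a short run with the early-feedback options, and read the reported rate". Nothing here launches HPL; the candidate values and the stand-in function are invented for illustration.

```python
def tune(candidates, quick_run):
    """Try each candidate parameter set and keep the best-performing one.

    `candidates` is an iterable of parameter dictionaries and `quick_run`
    returns an estimated rate for one of them. Both are assumptions.
    """
    best_params, best_rate = None, float("-inf")
    for params in candidates:
        rate = quick_run(params)
        if rate > best_rate:
            best_params, best_rate = params, rate
    return best_params, best_rate


# Stand-in for a real run: pretends that NB = 192 gives the best rate.
def fake_quick_run(params):
    return 1000.0 - abs(params["NB"] - 192)


grid = [{"NB": nb, "P": 2, "Q": 2} for nb in (128, 160, 192, 224, 256)]
print(tune(grid, fake_quick_run))
```

In practice you would narrow the candidate list after each pass instead of sweeping one fixed grid.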