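Linpack Test Manual (1): Voltaire InfiniBand

Step 1: Install Voltaire MPI (bundled with the HCA driver)

Install the HCA driver: copy the driver package VoltaireOFED-5.1.3.1_5-k2.6.18-92.el5-x86_64.tar.bz2 to /root and run:

    tar -jxvf VoltaireOFED-5.1.3.1_5-k2.6.18-92.el5-x86_64.tar.bz2
    cd VoltaireOFED-5.1.3.1_5-k2.6.18-92.el5-x86_64
    ./install.sh

After installation, check that Voltaire MPI is available:

    which mpicc

If this prints /opt/vltmpi/OPENIB/mpi/bin/mpicc, the installation is working and you can continue with the next step.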
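If you want to script the check above, here is a minimal Python sketch. It only compares the first `mpicc` that `shutil.which` finds on PATH with the path given in this manual; the function name `mpicc_ok` is hypothetical, and a positive result says nothing about whether the InfiniBand fabric itself works.

```python
import shutil
from typing import Optional

# Expected location of the Voltaire MPI compiler wrapper, as given in this manual.
EXPECTED_MPICC = "/opt/vltmpi/OPENIB/mpi/bin/mpicc"


def mpicc_ok(expected: str = EXPECTED_MPICC, path: Optional[str] = None) -> bool:
    """Return True if the first mpicc found on the search path is the expected one.

    `path` overrides the search string (useful for testing); when it is None,
    the current process's PATH is used, just like `which mpicc` in a shell.
    """
    found = shutil.which("mpicc", path=path)
    return found == expected


if __name__ == "__main__":
    found = shutil.which("mpicc")
    if mpicc_ok():
        print(f"OK: {found}")
    else:
        print(f"Unexpected mpicc: {found!r}; re-check the HCA driver / Voltaire MPI install")
```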
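Step 2: Install the math library (GotoBLAS)

Copy the math library package GotoBLAS-1.26.tar.gz to /hpc and run:

    tar -zxvf GotoBLAS-1.26.tar.gz
    cd GotoBLAS

For a 32-bit build:

    ./quickbuild.32bit

For a 64-bit build:

    ./quickbuild.64bit

When the build finishes, three files are created in the current directory. Their names depend on your CPU type, for example:

    libgoto.a
    libgoto_core2p-r1.14.a
    libgoto_core2p-r1.14.so

libgoto.a is the math library to link against; note its path (here /hpc/GotoBLAS).

Step 3: Install the Linpack test package (hpl.tgz)

Copy the Linpack package hpl.tgz to /hpc and run:

    tar -zxvf hpl.tgz
    cd hpl
    cd setup
    cp ./Make.Linux_PII_FBLAS /hpc/hpl/Make.test
    cd ..
    pwd

pwd should print /hpc/hpl. Then edit the makefile:

    vi Make.test

Change the following entries (ARCH must match the suffix used in make arch=test):

    ARCH     = test
    TOPdir   = /hpc/hpl
    INCdir   = $(TOPdir)/include
    BINdir   = $(TOPdir)/bin/$(ARCH)
    LIBdir   = $(TOPdir)/lib/$(ARCH)
    MPdir    = /opt/vltmpi/OPENIB/mpi
    MPinc    = -I$(MPdir)/include
    MPlib    = $(MPdir)/lib/libmpich.a
    LAdir    = /hpc/GotoBLAS
    LAlib    = $(LAdir)/libgoto.a
    CC       = /opt/vltmpi/OPENIB/mpi/bin/mpicc
    LINKER   = /opt/vltmpi/OPENIB/mpi/bin/mpif77

Save the file and build:

    make arch=test

This creates the directory /hpc/hpl/bin/test. Change into it with cd bin/test; it contains two files, HPL.dat and xhpl.

Edit HPL.dat. P, Q, NB and N (Ns) can be tuned for your system, with two constraints. First, N must not exceed sqrt((memory per compute node in bytes × number of compute nodes) / 8) × 0.8, where 8 is the size in bytes of one double-precision matrix element; a larger N can push the test into the swap partition or exhaust memory and hang the nodes. Second, P × Q = number of processes = number of cores.

Example for 16 compute nodes, each with 8 GB of memory and 8 cores (128 cores in total):

    HPLinpack benchmark input file
    Innovative Computing Laboratory, University of Tennessee
    HPL.out      output file name (if any)
    6            device out (6=stdout,7=stderr,file)
    1            # of problems sizes (N)
    100000       Ns
    1            # of NBs
    192          NBs
    0            PMAP process mapping (0=Row-,1=Column-major)
    1            # of process grids (P x Q)
    8            Ps
    16           Qs
    16.0         threshold
    1            # of panel fact
    0            PFACTs (0=left, 1=Crout, 2=Right)
    1            # of recursive stopping criterium
    2            NBMINs (>= 1)
    1            # of panels in recursion
    2            NDIVs
    1            # of recursive panel fact.
    0            RFACTs (0=left, 1=Crout, 2=Right)
    1            # of broadcast
    0            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
    1            # of lookahead depth
    0            DEPTHs (>=0)
    2            SWAP (0=bin-exch,1=long,2=mix)
    64           swapping threshold
    0            L1 in (0=transposed,1=no-transposed) form
    0            U  in (0=transposed,1=no-transposed) form
    1            Equilibration (0=no,1=yes)
    8            memory alignment in double (> 0)

After editing HPL.dat, create a file named hostlist listing the nodes to run on, with one line per core (each node name repeated once for every core on that node).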
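The sizing rule for N and the P × Q constraint can be checked with a short Python sketch. It assumes "8 GB" means 8 × 2^30 bytes and ignores memory used by the operating system, so treat the result as an upper bound. The helper names are hypothetical, and rounding N down to a multiple of NB is an optional extra (a common tuning habit), not part of this manual's rule.

```python
import math


def max_problem_size(mem_per_node_bytes: int, nodes: int, nb: int = 0,
                     fraction: float = 0.8) -> int:
    """Upper bound for HPL's N using this manual's rule of thumb:
    N <= sqrt(mem_per_node * nodes / 8) * 0.8 (8 bytes per double).
    If nb > 0, the bound is additionally rounded down to a multiple of NB
    (an assumption, not required by the manual).
    """
    n = int(math.sqrt(mem_per_node_bytes * nodes / 8) * fraction)
    if nb > 0:
        n -= n % nb
    return n


def check_grid(p: int, q: int, nodes: int, cores_per_node: int) -> bool:
    """P x Q must equal the number of processes (one process per core here)."""
    return p * q == nodes * cores_per_node


if __name__ == "__main__":
    GiB = 1024 ** 3
    # Example cluster from this manual: 16 nodes, 8 GB and 8 cores each.
    print("N upper bound:", max_problem_size(8 * GiB, 16))
    print("N upper bound, multiple of NB=192:", max_problem_size(8 * GiB, 16, nb=192))
    print("P=8, Q=16 matches 128 cores:", check_grid(8, 16, 16, 8))
```

For the example cluster this gives a bound of 104857, so the N = 100000 used in the HPL.dat example stays below it.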
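Example for 8-core nodes:

    vi hostlist

    cn01
    cn01
    cn01
    cn01
    cn01
    cn01
    cn01
    cn01
    cn02
    cn02
    cn02
    cn02
    cn02
    cn02
    cn02
    cn02
    ...
    cn16
    cn16
    cn16
    cn16
    cn16
    cn16
    cn16
    cn16

Save the file, then start the Linpack test with:

    mpirun_ssh -hostfile ./hostlist -np 128 ./xhpl

The results are printed when the computation finishes.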
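Typing 128 lines by hand is error-prone, so here is a hypothetical Python helper that prints a hostlist in the layout above. It assumes node names follow the cn01 ... cn16 pattern of this example; adjust `prefix` and `width` for other naming schemes.

```python
from typing import List


def hostlist_lines(nodes: int, cores_per_node: int, prefix: str = "cn",
                   width: int = 2) -> List[str]:
    """One line per core: each node name repeated cores_per_node times,
    with nodes numbered from 1 and zero-padded (cn01, cn02, ...)."""
    lines: List[str] = []
    for i in range(1, nodes + 1):
        name = f"{prefix}{i:0{width}d}"
        lines.extend([name] * cores_per_node)
    return lines


if __name__ == "__main__":
    # 16 nodes x 8 cores = 128 lines, matching -np 128.
    print("\n".join(hostlist_lines(16, 8)))
```

Usage, with a hypothetical script name: `python make_hostlist.py > hostlist`. The number of lines written is the value to pass to `-np`.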
Example output:

    ============================================================================
    HPLinpack 1.0a  --  High-Performance Linpack benchmark  --   January 20, 2004
    Written by A. Petitet and R. Clint Whaley,  Innovative Computing Labs.,  UTK
    ============================================================================

    An explanation of the input/output parameters follows:
    T/V    : Wall time / encoded variant.
    N      : The order of the coefficient matrix A.
    NB     : The partitioning blocking factor.
    P      : The number of process rows.
    Q      : The number of process columns.
    Time   : Time in seconds to solve the linear system.
    Gflops : Rate of execution for solving the linear system.

    The following parameter values will be used:

    N      :  103000
    NB     :     168
    PMAP   : Row-major process mapping
    P      :       8
    Q      :      16
    PFACT  :    Left
    NBMIN  :       2
    NDIV   :       2
    RFACT  :    Left
    BCAST  :   1ring
    DEPTH  :       0
    SWAP   : Mix (threshold = 64)
    L1     : transposed form
    U      : transposed form
    EQUIL  : yes
    ALIGN  : 8 double precision words

    ----------------------------------------------------------------------------

    - The matrix A is randomly generated for each test.
    - The following scaled residual checks will be computed:
       1) ||Ax-b||_oo / ( eps * ||A||_1  * N        )
       2) ||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  )
       3) ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo )
    - The relative machine precision (eps) is taken to be          1.110223e-16
    - Computational tests pass if scaled residuals are less than           16.0

    ============================================================================
    T/V                N    NB     P     Q               Time             Gflops
    ----------------------------------------------------------------------------
    WR00L2L2      103000   168     8    16             682.09          1.068e+03
    ----------------------------------------------------------------------------
    ||Ax-b||_oo / ( eps * ||A||_1  * N        ) =        0.0020002 ...... PASSED
    ||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =        0.0026000 ...... PASSED
    ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =        0.0004820 ...... PASSED
    ============================================================================

    Finished      1 tests with the following results:
                  1 tests completed and passed residual checks,
                  0 tests completed and failed residual checks,
                  0 tests skipped because of illegal input values.
    ----------------------------------------------------------------------------

    End of Tests.
    ============================================================================
    Cleaning up all processes ...done.

This output comes from a run with N = 103000 and NB = 168, not the N and NB values in the HPL.dat example above. The measured performance in this run is 1.068e+03 Gflops, i.e. about 1.068 × 10^12 floating-point operations per second (1.068 TFLOPS).
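To sanity-check a run, the Python sketch below parses the result row, recomputes Gflops from N and Time, and confirms that every residual line reports PASSED below the 16.0 threshold. The operation count 2/3*N^3 + 3/2*N^2 is my understanding of what HPL's test driver uses; the regular expressions assume the output layout shown above and are not an official HPL parser.

```python
import re
from typing import Dict

# Result row as printed by HPL, e.g.
# "WR00L2L2      103000   168     8    16             682.09          1.068e+03"
ROW = re.compile(
    r"^(W\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+([\d.]+)\s+([\d.eE+-]+)$")
# Residual line, e.g. "... ) =        0.0020002 ...... PASSED"
RESID = re.compile(r"=\s*([\d.eE+-]+)\s*\.+\s*(PASSED|FAILED)")


def parse_result(line: str) -> Dict[str, float]:
    """Extract N, NB, P, Q, Time and Gflops from one HPL result row."""
    m = ROW.match(line.strip())
    if m is None:
        raise ValueError(f"not an HPL result row: {line!r}")
    _variant, n, nb, p, q, t, gf = m.groups()
    return {"N": int(n), "NB": int(nb), "P": int(p), "Q": int(q),
            "time": float(t), "gflops": float(gf)}


def hpl_gflops(n: int, seconds: float) -> float:
    """Rate in Gflops from the operation count 2/3*N^3 + 3/2*N^2."""
    ops = (2.0 / 3.0) * n ** 3 + 1.5 * n ** 2
    return ops / seconds / 1e9


def residuals_ok(text: str, threshold: float = 16.0) -> bool:
    """True if residual lines exist and all are PASSED with values < threshold."""
    found = RESID.findall(text)
    return bool(found) and all(
        float(value) < threshold and status == "PASSED" for value, status in found)


if __name__ == "__main__":
    row = "WR00L2L2      103000   168     8    16             682.09          1.068e+03"
    r = parse_result(row)
    print("reported Gflops:", r["gflops"])
    print("recomputed Gflops: %.2f" % hpl_gflops(int(r["N"]), r["time"]))
```

For the run above, the recomputed rate is about 1068 Gflops, which agrees with the reported 1.068e+03.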