
Parallel Computing Methods for Phase-Field Simulation


That Should Have Worked!
#include <pat_api.h>
...
void Complex_Jacobi(...) {
    ...
    int PAT_ID, ierr;
    PAT_ID = 41;
    ierr = PAT_region_begin(PAT_ID, "communication");
    MPI_Internal_Communicate(...);
A draft version was quickly implemented by the project team (Victor Chan) and tested.
Profiling the Code with CrayPAT
– Measure before you optimize
– Can use sampling or tracing
– Using CrayPAT is simple: load the module, re-compile to build instrumented code, re-run
– CrayPAT can trace only a specified group, e.g. mpi, io, heap, fftw, ...
> module load perftools
> make clean
> make
> pat_build -g mpi pfc_jacobi.exe
> aprun -n 48 pfc_jacobi.exe+pat
> pat_report -o profile.txt <output_data>.xf
Complex Iterative Jacobi Solver
Hadley, G. R., "A complex Jacobi iterative method for the indefinite Helmholtz equation," J. Comput. Phys. 203 (2005) 358-370
Project Background
– Project Team (University of Michigan): Katsuyo Thornton (P.I.), Victor Chan
– Phase-field-crystal (PFC) formulation to study the dynamics of various metal systems
– Original in-house code written in C++
– Has been run on 2D and 3D systems
– Solves multiple Helmholtz equations, a reduction, then an explicit time step
Replaced HYPRE with a modification of the standard Jacobi method
The update u^(n+1) is computed from u^n; the second derivatives (∂²u/∂x², ∂²u/∂y²) are evaluated with centered differences.
CrayPAT Workaround
– Use the API for "fine grain" instrumentation
– Add PAT_region_{begin/end} calls to most subroutines
– After narrowing down to a couple of major subroutines, split the labels into "computation" and "communication"
– Communication subroutines eventually dominate at a certain MPI size
MPI_Boundary_Communicate(...);
ierr = PAT_region_end(PAT_ID);

PAT_ID = 42;
ierr = PAT_region_begin(PAT_ID, "computation");
for (int i = 1; i < size.L1 + 1; i++) {
    for (int j = 1; j < size.L2 + 1; j++) {
        for (int k = 1; k < size.L3 + 1; k++) {
            residual(i,j,k) = (1.0/D) * (...);
        }
    }
}
ierr = PAT_region_end(PAT_ID);
Decrease the time-to-solution to 1 sec / time step
– Strong scaling: decrease time-to-solution with an increasing number of processes and a fixed problem size
– Exploit other parallelism (with OpenMP?)
– Investigate a better preconditioner
– Different method (library?) to solve the equations
Easily parallelized and low memory requirement
Convergence rate depends on resolution but is roughly constant from problem to problem; a larger problem (with similar resolution) should not increase the iteration count.
Hybrid MPI + OpenMP Approach to Improve the Scalability of a Phase-Field-Crystal Code