Advanced Computer Architecture, Chapter 3
4) Utilization
It is the ratio of the achieved speed to the peak speed of a given computer.
A sequential application executing on a single MPP processor has a utilization ranging from 5% to 40%, typically 8% to 35%.
A parallel application executing on multiple processors has a utilization ranging from 1% to 35%, typically 4% to 20%.
Some benchmarks can reach higher utilization, for example: ASCI White Pacific IBM SP POWER3 (375MHz) reaches U = 7.226/12.3 = 58.7%, and the NEC Earth Simulator reaches U = 35.8/40.96 = 87.4%.
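As a quick check of the arithmetic above, here is a minimal sketch that recomputes both utilization figures. The function name `utilization` is an illustrative choice, not something from the slides; the speeds are the Tflop/s values quoted in the text.

```python
def utilization(achieved: float, peak: float) -> float:
    """Return the achieved speed as a fraction of the peak speed."""
    if peak <= 0:
        raise ValueError("peak speed must be positive")
    return achieved / peak


# Achieved and peak speeds in Tflop/s, as quoted in the text above.
asci_white = utilization(7.226, 12.3)
earth_simulator = utilization(35.8, 40.96)

print(f"ASCI White:      U = {asci_white:.1%}")
print(f"Earth Simulator: U = {earth_simulator:.1%}")
```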
(2) According to macro or micro:
– Macro benchmark → measures the performance of the system as a whole
– Micro benchmark → measures the performance of a specific aspect, such as CPU speed, memory access time, I/O speed, OS performance, or networking
§3.1.1 Micro Benchmarks
Name                 Measuring
LINPACK (Top 500)    Numerical computing (linear algebra)
LMBENCH              System calls and data movement operations in Unix
STREAM               Memory bandwidth
SPEC89, SPEC92, SPEC95 (CPU-intensive applications):
– The SPEC95 CPU benchmarks are the most famous SPEC benchmarks, widely used by vendors and users
– They measure the CPU speed, the cache/memory system, and the compiler as a whole
Chapter 3. Performance Metrics and Benchmarks
§3.1 System and Application Benchmarks
1. Definition of Benchmark: A benchmark is a performance testing program that supposedly captures the processing and data movement characteristics of a class of applications.
A benchmark suite = a set of benchmark programs + a set of specific rules governing the test conditions and procedures, including:
– the tested platform environment
– the input data
– the output results
– the performance metrics
§3.1.3 Business and TPC Benchmarks
TPC: Transaction Processing Performance Council
The most popular benchmark for commercial applications is the TPC-C benchmark.
3. Classification of benchmarks
(1) According to application classes:
– scientific computing
– commercial applications
– network services
– multimedia applications
– signal processing
1) Execution Time
Run the user's application on the target machine and measure the elapsed wall-clock time. This approach is sometimes difficult to apply.
Execution time is critical to some applications, such as real-time applications.
Execution time alone does not give much of a clue to the true performance of the machine.
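Measuring elapsed wall-clock time can be sketched as follows. This is only an illustration: `workload` is a hypothetical stand-in for the user's application, and taking the best of several repeats is one common convention for reducing noise, not something the slides prescribe.

```python
import time


def measure_wall_clock(func, *args, repeats=3):
    """Run func(*args) `repeats` times; return the best elapsed wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        func(*args)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


def workload(n):
    # Hypothetical stand-in for "the user's application".
    return sum(i * i for i in range(n))


elapsed = measure_wall_clock(workload, 100_000)
print(f"best of 3 runs: {elapsed:.6f} s")
```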
Rank  Number of processors  Rmax (Tflops)  Rpeak (Tflops)  Nmax     Country
1     5120                  35.8           40.96           1075200  Japan
2     8192                  13.88          20.48           633000   USA
3     2304                  7.634          11.06           3?0000   USA
4     8192                  7.304          12.288          518096   USA
4'    8192
52    512 (Xeon 2GHz)
3) System Throughput
Throughput is defined as the number of jobs processed per unit time. It is usually used when multiple jobs are executed simultaneously.
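The definition above amounts to a single division. The sketch below uses made-up numbers (120 jobs completed in 60 seconds) purely for illustration; the function name is an assumption.

```python
def throughput(jobs_completed: int, elapsed_seconds: float) -> float:
    """Return the number of jobs processed per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return jobs_completed / elapsed_seconds


# Hypothetical batch: 120 jobs finished within a 60-second window.
jobs_per_second = throughput(120, 60.0)
print(f"throughput = {jobs_per_second} jobs/s")
```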
§3.3 Basic Performance Metrics
Workload Metrics
– Execution time: depends on the algorithm, data structure, input data, platform, and language
– Instruction count: depends on the input data, platform (RISC, CISC), and compiler
– Floating-point count: normally it is independent
Nov. 2006 Top500
June 2003: the first four of the Top 500

Rank  Computer                          Number of processors
1     NEC Earth-Simulator               5120
2     HP SC ES45                        8192
3     MCR Linux Cluster, Xeon 2.4GHz    2304
4     IBM ASCI White, SP Power3         8192
4'    Shenzhou IV, Alpha 800MHz         8192
52    Lenovo DeepComp 1800              512 (Xeon 2GHz)
§3.1.4 SPEC Benchmark Family
SPEC (Standard Performance Evaluation Corporation) emphasizes developing real-application benchmarks that closely reflect the actual workload.
SPEC defines a few (2 in many cases) metrics that measure the overall performance of the entire system.
– 8 integer programs → SPECint95
– 10 floating-point programs → SPECfp95
– All SPEC95 results are expressed as ratios compared to a Sun SPARCstation 10/40, the reference machine
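To illustrate what "expressed as ratios" means, the sketch below divides the reference machine's run time by the tested machine's run time for each program, then combines the ratios with a geometric mean, which is how SPEC composite figures are formed. All run times here are invented for illustration, and the function names are assumptions.

```python
import math


def spec_ratio(reference_seconds: float, measured_seconds: float) -> float:
    """Ratio for one program: how many times faster than the reference machine."""
    return reference_seconds / measured_seconds


def geometric_mean(values):
    """Geometric mean, computed via logarithms for numerical stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical run times in seconds (reference machine vs. machine under test).
reference_times = [200.0, 400.0, 800.0]
measured_times = [100.0, 100.0, 100.0]

ratios = [spec_ratio(r, m) for r, m in zip(reference_times, measured_times)]
composite = geometric_mean(ratios)
print("per-program ratios:", ratios)
print(f"composite (geometric mean): {composite:.2f}")
```

The geometric mean keeps the composite independent of which machine is chosen as the reference, which is one reason it is preferred over the arithmetic mean for ratio data.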
Rank  Rmax (Tflops)  Rpeak (Tflops)  Nmax    Country
4'    N.A.           13.107          N.A.    China
52    1.046          2.048           153600  China
§3.1.2 Parallel Computing Benchmarks
1. The NPB Suite: NAS Parallel Benchmarks
2. PARKBENCH: PARallel Kernels and BENCHmarks
3. The Parallel STAP Suite: Space-Time Adaptive Processing
6) Performance/cost
It is defined as the ratio of the speed to the purchase price, expressed in Gflop/s per $M.
Sustained performance/cost should be used, not peak performance/cost.
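The difference between peak and sustained performance/cost can be seen in a small sketch. The machine below is entirely hypothetical (1000 Gflop/s peak, 200 Gflop/s sustained, priced at $2M); only the formula comes from the definition above.

```python
def performance_per_cost(gflops: float, price_million_usd: float) -> float:
    """Return speed per unit price, in Gflop/s per $M."""
    if price_million_usd <= 0:
        raise ValueError("price must be positive")
    return gflops / price_million_usd


# Hypothetical machine: numbers are made up for illustration.
peak_gflops = 1000.0
sustained_gflops = 200.0
price_million_usd = 2.0

peak_ratio = performance_per_cost(peak_gflops, price_million_usd)
sustained_ratio = performance_per_cost(sustained_gflops, price_million_usd)
print(f"peak:      {peak_ratio} Gflop/s per $M")
print(f"sustained: {sustained_ratio} Gflop/s per $M")
```

With these numbers the peak figure overstates the sustained value fivefold, which is why the sustained figure is the one to compare.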
Performance: how to measure
1) Execution Time
2) Processing Speed
3) System Throughput
4) Utilization
5) Cost-effectiveness
6) Performance/cost
• These performance requirements could lead to quite different conclusions for the same application on the same computer platform