HACCKernels: A Benchmark for HACC's Particle Force Kernels The Hardware/Hybrid Accelerated Cosmology Code (HACC), a cosmology N-body-code framework, is designed to run efficiently on diverse computing architectures and to scale to millions of cores and beyond. The gravitational force is the only significant force between particles at cosmological scales, and, in HACC, this force is divided into two components: a long-range component and a short-range component. The long-range component is handled using a distributed grid-based solver, and the short-range component is by more-direct particle-particle computations. On many systems, a tree-based multipole approximation is used to further reduce the computational complexity of the short-range force. The inner-most computation is a direct N^2 particle-particle force calculation of the short-range part of the gravitational force. It is this inner-most calculation that consumes most of the simulation time, is computationally bound, and is what is represented by this benchmark. Because this inner-most force calculation is algorithmically isolated from the overall scale of the problem, the parameters don't need to be adjusted to represent the workload on different machine scales (e.g. petascale or exascale). For more information on HACC, see: Salman Habib, et al. HACC: Simulating Sky Surveys on State-of-the-Art Supercomputing Architectures. New Astronomy Volume 42, January 2016, pp. 49-65. http://doi.org/10.1016/j.newast.2015.06.003 https://arxiv.org/abs/1410.2805 The benchmark can be compiled using cmake (or make directly using Makefile.simple) and then run like this: $ ./HACCKernels Maximum OpenMP Threads: 1 Iterations: 2000 Gravity Short-Range-Force Kernel (4th Order): 26307.2 -122.385 -1369.32: 4.45269 s Gravity Short-Range-Force Kernel (5th Order): 26297.5 -123.056 -1368.67: 4.51347 s Gravity Short-Range-Force Kernel (6th Order): 26297.6 -123.225 -1368.66: 4.8256 s The accumulated acceleration in each direction for all particles in the last iteration, which is a function of the total number of iterations, is printed as a diagnostic. It should be similar for all polynomial kernel orders. If you'd like the benchmark only to display deterministic output (i.e. omitting information on the number of threads, timing, and the like), then define the preprocessor symbol VERIFICATION_OUTPUT_ONLY when compiling. You can enable this option when configuring by passing -DVERIFICATION_OUTPUT_ONLY=ON to cmake. Compared to the older HACCmk procurement benchmark (https://asc.llnl.gov/CORAL-benchmarks/#haccmk), this benchmark: * More closely matches the parallelization scheme used by the production code. * Uses a more-realistic distribution of interaction-list lengths and out-of-bounds particles. * Includes 4th-, 5th-, and 6th-order kernels. For more information, contact: Hal Finkel <hfinkel@anl.gov>
Adrian Pope
authored
Name | Last commit | Last update |
---|---|---|
CMakeLists.txt | ||
COPYING | ||
GravityForceKernel.cpp | ||
HACCKernels.h | ||
Makefile.simple | ||
README | ||
main.cpp |