hacc
HACCKernels

Repository



HACCKernels: A Benchmark for HACC's Particle Force Kernels

The Hardware/Hybrid Accelerated Cosmology Code (HACC), a cosmology N-body-code
framework, is designed to run efficiently on diverse computing architectures
and to scale to millions of cores and beyond. The gravitational force is the
only significant force between particles at cosmological scales, and, in HACC,
this force is divided into two components: a long-range component and a
short-range component. The long-range component is handled using a distributed
grid-based solver, and the short-range component is by more-direct
particle-particle computations. On many systems, a tree-based multipole
approximation is used to further reduce the computational complexity of the
short-range force. The inner-most computation is a direct N^2 particle-particle
force calculation of the short-range part of the gravitational force. It is this
inner-most calculation that consumes most of the simulation time, is
computationally bound, and is what is represented by this benchmark.

Because this inner-most force calculation is algorithmically isolated from the
overall scale of the problem, the parameters don't need to be adjusted to
represent the workload on different machine scales (e.g. petascale or
exascale).

For more information on HACC, see:

Salman Habib, et al. HACC: Simulating Sky Surveys on State-of-the-Art
Supercomputing Architectures. New Astronomy Volume 42, January 2016, pp. 49-65.
http://doi.org/10.1016/j.newast.2015.06.003
https://arxiv.org/abs/1410.2805

The benchmark can be compiled using cmake (or make directly using
Makefile.simple) and then run like this:

$ ./HACCKernels 
Maximum OpenMP Threads: 1
Iterations: 2000
Gravity Short-Range-Force Kernel (4th Order): 26307.2 -122.385 -1369.32: 4.45269 s
Gravity Short-Range-Force Kernel (5th Order): 26297.5 -123.056 -1368.67: 4.51347 s
Gravity Short-Range-Force Kernel (6th Order): 26297.6 -123.225 -1368.66: 4.8256 s

The accumulated acceleration in each direction for all particles in the last
iteration, which is a function of the total number of iterations, is printed as
a diagnostic. It should be similar for all polynomial kernel orders.

If you'd like the benchmark only to display deterministic output (i.e.
omitting information on the number of threads, timing, and the like), then
define the preprocessor symbol VERIFICATION_OUTPUT_ONLY when compiling.
You can enable this option when configuring by passing
-DVERIFICATION_OUTPUT_ONLY=ON to cmake.

Compared to the older HACCmk procurement benchmark
(https://asc.llnl.gov/CORAL-benchmarks/#haccmk), this benchmark:

 * More closely matches the parallelization scheme used by the production code.
 * Uses a more-realistic distribution of interaction-list lengths and
   out-of-bounds particles.
 * Includes 4th-, 5th-, and 6th-order kernels.

For more information, contact: Hal Finkel <hfinkel@anl.gov>