GenericIO

GenericIO is a write-optimized library for writing self-describing scientific data files on large-scale parallel file systems.

Reference

Habib, et al., HACC: Simulating Future Sky Surveys on State-of-the-Art Supercomputing Architectures, New Astronomy, 2015 (http://arxiv.org/abs/1410.2805).

Obtaining the Source Code

The most recent version of the source is available by cloning this repository:

git clone https://xgitlab.cels.anl.gov/hacc/genericio.git

There is also a history of code releases: 2019-04-17 / 2017-09-25 / 2016-08-29 / 2016-04-12 / 2015-06-08


Building Executables / C++ Library

The executables and libgenericio can be built either with CMake (minimum version 3.10) or with GNU Make. The following executables will be built:

  • frontend/GenericIOPrint: print data to stdout (non-MPI version)
  • frontend/GenericIOVerify: verify and try reading data (non-MPI version)
  • mpi/GenericIOBenchmarkRead: read benchmark; works on data written with GenericIOBenchmarkWrite
  • mpi/GenericIOBenchmarkWrite: write benchmark
  • mpi/GenericIOPrint: print data to stdout
  • mpi/GenericIORewrite: rewrite data with a different number of ranks
  • mpi/GenericIOVerify: verify and try reading data

Using CMake

Note that the executables and libraries will be located in build/<frontend/mpi>. CMake uses the compilers specified by the CC and CXX environment variables.

mkdir build && cd build
cmake ..
make -j4

Using Make

Make will create the executables / libraries under the main directory. Edit the CC, CXX, MPICC, and MPICXX variables in the GNUmakefile to change the compiler.

make

Installing the Python Library

The pygio library is pip-installable and works with mpi4py.

Requirements

Currently, CMake version >= 3.11.0 is required to fetch dependencies during configuration. The pygio module also requires MPI libraries that CMake's FindMPI can locate. The compiler needs to support C++17 (make sure that CC and CXX point to the correct compiler).
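
As a quick pre-flight check, the sketch below compares the installed CMake version against the 3.11.0 minimum stated above. The helper name version_ge is made up for illustration, and the script assumes a `sort` that supports `-V` (GNU coreutils). It does not test C++17 support or MPI discovery.

```shell
# Hypothetical helper: succeeds when version $1 is at least version $2.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

required="3.11.0"
if command -v cmake >/dev/null 2>&1; then
    # The first line of `cmake --version` reads "cmake version X.Y.Z".
    found="$(cmake --version | head -n 1 | awk '{print $3}')"
    if version_ge "$found" "$required"; then
        echo "cmake $found satisfies >= $required"
    else
        echo "cmake $found is older than $required"
    fi
else
    echo "cmake not found on PATH"
fi
```

If the check reports an older version, upgrading CMake before running pip avoids a failure partway through configuration.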

Install

The Python library can be installed by running pip in the main folder:

pip install .

It will use the compilers referred to by the CC and CXX environment variables. If the compiler supports OpenMP, the library will be threaded. Make sure to set OMP_NUM_THREADS to an appropriate value, in particular when using multiple MPI ranks per node.
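
One common way to choose the thread count is to divide the cores on a node evenly among the MPI ranks placed there. The sketch below does that arithmetic. The node layout (64 cores, 8 ranks per node) is a made-up example, not a recommendation from the GenericIO authors.

```shell
# Hypothetical node layout: adjust both values for your machine and job.
cores_per_node=64
ranks_per_node=8

# Integer division: each MPI rank gets an equal share of the node's cores.
threads_per_rank=$(( cores_per_node / ranks_per_node ))

# Guard against layouts with more ranks than cores, where the division yields 0.
[ "$threads_per_rank" -ge 1 ] || threads_per_rank=1

export OMP_NUM_THREADS="$threads_per_rank"
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

Export the variable in the same shell or job script that launches the MPI program so that every rank inherits it.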


Output file partitions (subfiles)

If you're running on an IBM BG/Q supercomputer, the number of subfiles (partitions) is chosen automatically based on the I/O nodes. Otherwise, by default, the GenericIO library picks the number of subfiles using a fairly naive hostname-based hashing scheme. This works reasonably well on small clusters, but not on larger systems. On a larger system, you might want to set these environment variables:

GENERICIO_PARTITIONS_USE_NAME=0
GENERICIO_RANK_PARTITIONS=256

Here, the number of partitions (256 above) determines the number of subfiles used. If you're using a Lustre file system, for example, a good number of files satisfies:

# of files * stripe count  ~ # OSTs

On Titan, for example, there are 1008 OSTs, and a default stripe count of 4, so we use approximately 256 files.
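
The rule of thumb above can be worked through in a job script. This is a minimal sketch that uses the Titan figures quoted in the text. The variable names num_osts, stripe_count, and approx_files are illustrative, and the final value of 256 is the rounded figure from the text rather than something the script derives.

```shell
# Figures quoted above for Titan; replace with your file system's values.
num_osts=1008
stripe_count=4

# files * stripe_count ~ OSTs  =>  files ~ OSTs / stripe_count
approx_files=$(( num_osts / stripe_count ))
echo "suggested number of subfiles: about $approx_files"

# Disable the hostname-based scheme and set the partition count explicitly.
# 256 is the rounded value used in the text, close to the 252 computed above.
export GENERICIO_PARTITIONS_USE_NAME=0
export GENERICIO_RANK_PARTITIONS=256
```

Both variables need to be exported in the environment of the job that writes the data, before the MPI program is launched.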