genericio

    Michael Buehlmann authored
    0ca283fd

    GenericIO

    GenericIO is a write-optimized library for writing self-describing scientific data files on large-scale parallel file systems.

    Reference

    Habib, et al., HACC: Simulating Future Sky Surveys on State-of-the-Art Supercomputing Architectures, New Astronomy, 2015 (http://arxiv.org/abs/1410.2805).

    Obtaining the Source Code

    The most recent version of the source code is available by cloning this repository:

    git clone https://xgitlab.cels.anl.gov/hacc/genericio.git

    Earlier code releases are also available: 2019-04-17 / 2017-09-25 / 2016-08-29 / 2016-04-12 / 2015-06-08


    Building Executables / C++ Library

    The executables and libgenericio can be built either with CMake (minimum version 3.10) or with GNU Make. The following executables will be built:

    • frontend/GenericIOPrint print data to stdout (non-MPI version)
    • frontend/GenericIOVerify verify and try reading data (non-MPI version)
    • mpi/GenericIOBenchmarkRead reading benchmark, works on data written with GenericIOBenchmarkWrite
    • mpi/GenericIOBenchmarkWrite writing benchmark
    • mpi/GenericIOPrint print data to stdout
    • mpi/GenericIORewrite rewrite data with a different number of ranks
    • mpi/GenericIOVerify verify and try reading data

    Using CMake

    The executables and libraries will be located in build/frontend and build/mpi. CMake will use the compilers specified by the CC and CXX environment variables.

    mkdir build && cd build
    cmake ..
    make -j4

    Using Make

    Make will create the executables / libraries under the main directory. Edit the CC, CXX, MPICC, and MPICXX variables in the GNUmakefile to change the compiler.

    make

    Installing the Python Library

    The pygio library is pip-installable and works with mpi4py.

    Requirements

    Currently, CMake version >= 3.11.0 is required to fetch dependencies during configuration. The pygio module also requires MPI libraries that CMake's FindMPI can locate. The compiler needs to support C++17 (make sure that CC and CXX point to the correct compiler).
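
The requirements above can be partly checked before running pip. The following is a minimal sketch, not part of GenericIO: the helper names (`parse_cmake_version`, `cmake_is_new_enough`, `check_environment`) are hypothetical, and it only checks the CMake version and whether CC and CXX are set. It does not verify C++17 support or that FindMPI will succeed.

```python
import os
import re
import shutil
import subprocess

# Minimum CMake version stated in the requirements above.
MIN_CMAKE = (3, 11, 0)


def parse_cmake_version(text):
    """Extract (major, minor, patch) from `cmake --version` output, or None."""
    match = re.search(r"cmake version (\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2)), int(match.group(3) or 0))


def cmake_is_new_enough(text, minimum=MIN_CMAKE):
    """Return True if the version found in `text` is at least `minimum`."""
    version = parse_cmake_version(text)
    return version is not None and version >= minimum


def check_environment():
    """Return a list of human-readable warnings (empty if nothing was found)."""
    warnings = []
    cmake = shutil.which("cmake")
    if cmake is None:
        warnings.append("cmake was not found on PATH")
    else:
        result = subprocess.run([cmake, "--version"], capture_output=True, text=True)
        if not cmake_is_new_enough(result.stdout):
            warnings.append("cmake is older than 3.11.0 or its version is unreadable")
    for var in ("CC", "CXX"):
        if not os.environ.get(var):
            warnings.append(f"{var} is not set; the default compiler will be used")
    return warnings


if __name__ == "__main__":
    for line in check_environment() or ["no problems found"]:
        print(line)
```

Running the script prints one line per potential problem, which can be reviewed before invoking pip.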

    Install

    The Python library can be installed by running pip in the main folder:

    pip install .

    The build uses the compilers specified by the CC and CXX environment variables. If the compiler supports OpenMP, the library will be threaded. Make sure to set OMP_NUM_THREADS to an appropriate value, in particular when using multiple MPI ranks per node.
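
One simple way to choose OMP_NUM_THREADS is to divide the cores on a node evenly among the MPI ranks placed on it, so that the ranks do not oversubscribe the node. The sketch below illustrates that heuristic. The function name `threads_per_rank` and the node size in the example (64 cores, 8 ranks) are assumptions for illustration and do not come from GenericIO.

```python
def threads_per_rank(cores_per_node, ranks_per_node):
    """Split the cores of one node evenly among its MPI ranks (at least 1 each)."""
    if cores_per_node < 1 or ranks_per_node < 1:
        raise ValueError("cores_per_node and ranks_per_node must be positive")
    return max(1, cores_per_node // ranks_per_node)


# Hypothetical node: 64 cores shared by 8 MPI ranks.
n_threads = threads_per_rank(64, 8)

# Print a line that can be pasted into a job script before the launch command.
print(f"export OMP_NUM_THREADS={n_threads}")
```

The variable should be set in the job environment before the Python process starts, since OpenMP runtimes typically read it at initialization.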


    Output file partitions (subfiles)

    If you're running on an IBM BG/Q supercomputer, the number of subfiles (partitions) is chosen automatically based on the I/O nodes. Otherwise, by default, the GenericIO library picks the number of subfiles using a fairly naive hostname-based hashing scheme. This works reasonably well on small clusters, but not on larger systems. On a larger system, you might want to set these environment variables:

    GENERICIO_PARTITIONS_USE_NAME=0
    GENERICIO_RANK_PARTITIONS=256

    Here, the value of GENERICIO_RANK_PARTITIONS (256 above) determines the number of subfiles used. If you're using a Lustre file system, for example, the optimal number of files satisfies:

    # of files * stripe count  ~ # OSTs

    On Titan, for example, there are 1008 OSTs and a default stripe count of 4, so we use approximately 256 files (1008 / 4 = 252).
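
The rule of thumb above can be turned into a small calculation. The sketch below divides the number of OSTs by the stripe count and rounds the result to the nearest power of two, which reproduces the 256 used for Titan. Rounding to a power of two is an assumption made here for convenience, not something GenericIO requires, and the function names are hypothetical. Only the two environment variable names come from the text above.

```python
import math


def suggest_partitions(num_osts, stripe_count):
    """Estimate a subfile count from: files * stripe_count ~ num_osts."""
    if num_osts < 1 or stripe_count < 1:
        raise ValueError("num_osts and stripe_count must be positive")
    target = num_osts / stripe_count  # 1008 / 4 = 252 for the Titan example
    # Assumption: round to the nearest power of two (252 -> 256).
    return 2 ** max(0, round(math.log2(target)))


def partition_env(num_partitions):
    """Build the environment settings described above for a given subfile count."""
    return {
        "GENERICIO_PARTITIONS_USE_NAME": "0",
        "GENERICIO_RANK_PARTITIONS": str(num_partitions),
    }


# Titan example from the text: 1008 OSTs, default stripe count of 4.
n_files = suggest_partitions(1008, 4)
for name, value in partition_env(n_files).items():
    print(f"export {name}={value}")
```

The printed lines can be pasted into a job script so that the variables are set before the application starts.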