# GenericIO

GenericIO is a write-optimized library for writing self-describing scientific data files on large-scale parallel file systems.

## Reference

Habib, et al., HACC: Simulating Future Sky Surveys on State-of-the-Art Supercomputing Architectures, New Astronomy, 2015 (http://arxiv.org/abs/1410.2805).

## Obtaining the Source Code

The most recent version of the source is available by cloning this repo:

```bash
git clone https://xgitlab.cels.anl.gov/hacc/genericio.git
```

There is also a history of code [releases](https://xgitlab.cels.anl.gov/hacc/genericio/-/releases):

- [2019-04-17](https://xgitlab.cels.anl.gov/hacc/genericio/-/releases/20190417)
- [2017-09-25](https://xgitlab.cels.anl.gov/hacc/genericio/-/releases/20170925)
- [2016-08-29](https://xgitlab.cels.anl.gov/hacc/genericio/-/releases/20160829)
- [2016-04-12](https://xgitlab.cels.anl.gov/hacc/genericio/-/releases/20160412)
- [2015-06-08](https://xgitlab.cels.anl.gov/hacc/genericio/-/releases/20150608)

-----

## Building Executables / C++ Library

The executables and ``libgenericio`` can be built either with [CMake](https://cmake.org/) (minimum version 3.10) or with [GNU Make](https://www.gnu.org/software/make/).

The following executables will be built:

- ``frontend/GenericIOPrint`` print data to stdout (non-MPI version)
- ``frontend/GenericIOVerify`` verify and try reading data (non-MPI version)
- ``mpi/GenericIOBenchmarkRead`` reading benchmark, works on data written with ``GenericIOBenchmarkWrite``
- ``mpi/GenericIOBenchmarkWrite`` writing benchmark
- ``mpi/GenericIOPrint`` print data to stdout
- ``mpi/GenericIORewrite`` rewrite data with a different number of ranks
- ``mpi/GenericIOVerify`` verify and try reading data

**Using CMake**

Note that the executables / libraries will be located in ``build/<frontend/mpi>``. CMake will use the compilers pointed to by the ``CC`` and ``CXX`` environment variables.

```bash
mkdir build && cd build
cmake ..
make -j4
```

**Using Make**

Make will create the executables / libraries under the main directory. Edit the ``CC``, ``CXX``, ``MPICC``, and ``MPICXX`` variables in the GNUmakefile to change the compiler.

```bash
make
```

## Installing the Python Library

The `pygio` library is pip-installable and works with `mpi4py`.

**Requirements**

- Currently, a **CMake version >= 3.11.0** is required to fetch dependencies during configuration.
- The ``pygio`` module also requires MPI libraries to be findable by CMake's FindMPI.
- The compiler needs to support **C++17** (make sure that ``CC`` and ``CXX`` point to the correct compiler).

**Install**

The Python library can be installed by running pip in the **main folder**:

```bash
pip install .
```

It will use the compilers referred to by the ``CC`` and ``CXX`` environment variables. If the compiler supports OpenMP, the library will be threaded. Make sure to set ``OMP_NUM_THREADS`` to an appropriate value, in particular when using multiple MPI ranks per node.

-----

## Output file partitions (subfiles)

If you're running on an IBM BG/Q supercomputer, the number of subfiles (partitions) is chosen automatically based on the I/O nodes. Otherwise, by default, the GenericIO library picks the number of subfiles based on a fairly naive hostname-based hashing scheme. This works reasonably well on small clusters, but not on larger systems. On a larger system, you might want to set these environment variables:

```bash
GENERICIO_PARTITIONS_USE_NAME=0
GENERICIO_RANK_PARTITIONS=256
```

where the number of partitions (256 above) determines the number of subfiles used. If you're using a Lustre file system, for example, an optimal number of files is:

```
# of files * stripe count ~ # OSTs
```

On Titan, for example, there are 1008 OSTs, and a default stripe count of 4, so we use approximately 256 files.
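
As a rough illustration of the Lustre rule of thumb above, the following Python sketch estimates a partition count from the number of OSTs and the stripe count, then prints the two environment variables as `export` lines. The helper names `suggested_partitions` and `partition_env` are hypothetical and not part of GenericIO or `pygio`. Rounding to the nearest power of two is an assumption made here to match the "approximately 256" figure for Titan, since the rule itself only asks for an approximate match.

```python
import math


def suggested_partitions(num_osts: int, stripe_count: int,
                         round_to_pow2: bool = True) -> int:
    """Estimate a subfile count from: # of files * stripe count ~ # OSTs."""
    if num_osts <= 0 or stripe_count <= 0:
        raise ValueError("num_osts and stripe_count must be positive")
    # Solve the rule of thumb for the number of files (at least one).
    estimate = max(1, num_osts // stripe_count)
    if round_to_pow2:
        # Assumption for illustration only: snap to the nearest power of two.
        estimate = 2 ** round(math.log2(estimate))
    return estimate


def partition_env(num_partitions: int) -> dict:
    """Build the environment settings named in the section above."""
    return {
        "GENERICIO_PARTITIONS_USE_NAME": "0",
        "GENERICIO_RANK_PARTITIONS": str(num_partitions),
    }


if __name__ == "__main__":
    # Titan example from the text: 1008 OSTs, default stripe count of 4.
    n = suggested_partitions(num_osts=1008, stripe_count=4)
    for key, value in partition_env(n).items():
        print(f"export {key}={value}")
```

With `round_to_pow2=False` the Titan numbers give the unrounded quotient of 1008 / 4. Treat either value as a starting point to tune for your file system, not as a guaranteed optimum.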