GenericIO merge requests
https://git.cels.anl.gov/hacc/genericio/-/merge_requests
2024-01-31T15:44:47-06:00

flushAll() returns result and can be called multiple times
https://git.cels.anl.gov/hacc/genericio/-/merge_requests/14
2024-01-31T15:44:47-06:00, Bogdan Nicolae

The original VELOC-GIO code introduced the flushAll() call that waits for async I/O to finish and shuts down VELOC. It was meant to be called by HACC at the end of the execution. This merge request changes the semantics: flushAll() now only waits for async I/O to finish and returns the result (a bool indicating whether all I/O operations finished successfully). A separate shutdown() call is needed to shut down VELOC (needed before exit).

expose readDims and readCoords to python interface
https://git.cels.anl.gov/hacc/genericio/-/merge_requests/13
2023-11-15T14:49:56-06:00, Michael Buehlmann

additional capabilities for python interface:
- `pygio.read_dims` -> get MPI topology of source ranks
- `pygio.read_coords` -> get MPI coordinates of specific rank

Fix CI
https://git.cels.anl.gov/hacc/genericio/-/merge_requests/12
2023-10-20T11:23:36-05:00, Michael Buehlmann

This should fix issues with the CI pipeline and new versions of Debian (docker)

some improvements to the python interface
https://git.cels.anl.gov/hacc/genericio/-/merge_requests/11
2023-09-19T14:53:31-05:00, Michael Buehlmann

- the python package now contains the git version with which it was
compiled
- instead of "from pygio import *", now use named imports
(helps with IDE suggestions)
- add "eff_rank" to docs and "read_num_elems"

remove rebalance_sourceranks function
https://git.cels.anl.gov/hacc/genericio/-/merge_requests/10
2023-09-07T15:38:05-05:00, Michael Buehlmann

(was added 2 years ago and did not turn out to be useful.
Also causing problems on some compilers, e.g. on theta)

add hasVariable function to GenericIO
https://git.cels.anl.gov/hacc/genericio/-/merge_requests/9
2023-09-07T15:38:32-05:00, Michael Buehlmann

added a convenience function to GenericIO to check if a variable is defined in a file

change default genericio file partitions
https://git.cels.anl.gov/hacc/genericio/-/merge_requests/8
2023-03-03T20:05:58-06:00, Michael Buehlmann

changed the setNaturalDefaultPartition to not do node-name partitions by default. Instead, if neither `GENERICIO_PARTITIONS_USE_NAME` nor `GENERICIO_RANK_PARTITIONS` is set, it will do one file per rank up to 256 ranks. For higher rank numbers, it will emit a warning and assign the ranks round-robin to 256 files.

Assignee: Nicholas Frontiere

update SZ version
https://git.cels.anl.gov/hacc/genericio/-/merge_requests/7
2023-05-16T08:42:32-05:00, Michael Buehlmann

porting changes from gio-sz branch of HACC to genericio
Will require validation before merging into master. Currently, it compiles with Makefile and CMake

Assignee: Adrian Pope

added initial VELOC support
https://git.cels.anl.gov/hacc/genericio/-/merge_requests/6
2022-10-21T17:09:00-05:00, Bogdan Nicolae

VELOC support can be compiled into GIO by defining VELOC_INSTALL_DIR in GNUMakefile (which defines -DGENERICIO_WITH_VELOC used by the preprocessor). It is activated by defining the GENERICIO_USE_VELOC=<path_to_cfg_file> environment variable.

Safe-Guards preventing segfaults for incorrect inputs (python interface)
https://git.cels.anl.gov/hacc/genericio/-/merge_requests/5
2022-06-08T04:14:29-05:00, Michael Buehlmann

- throw an error for missing variables and multiple-defined variables
- explicitly define the stride for numpy arrays (under some circumstances, the stride was ill-defined and an error was thrown)
- update the pybind11 library to the latest bugfix release

Assignee: Michael Buehlmann

Updated Docs, New default for python module
https://git.cels.anl.gov/hacc/genericio/-/merge_requests/4
2022-04-17T13:27:25-05:00, Michael Buehlmann

I'm starting this merge now since there are already quite a few changes and we can discuss (and potentially fix) them here before merging.
Main changes:
-------------
- Added Documentation (using sphinx and readthedocs theme)
- Added CI scripts (thanks @bgutierrez-garcia !)
- updated/extended the python interface

Documentation
-------------
The documentation source is located in `./docs`. It is compiled with [`sphinx`](https://www.sphinx-doc.org/en/master/).
- the pages are `*.rst` files (restructured text)
- the main index is in `index.rst`, and there are 3 subfolders `executables/`, `python/`, `cpp/` containing the specific parts of the code
- configuration is in `conf.py`, which should not need to be edited in the future
A preview of the compiled documentation from my fork can be found on [the hep server](https://www.hep.anl.gov/CPACdocs/genericio).
Note: I moved some documentation that was previously in the `README.md` file to the docs, with references to the documentation in the README file.
### Python Docs
The python part of the docs is partially auto-generated from the doc-strings in the python module (see e.g. [References](https://www.hep.anl.gov/CPACdocs/genericio/python/readwrite/#references)).
The rest of it is more in a tutorial style with links to the references.
The Python class-interface currently doesn't have proper documentation. But it also shouldn't be used in my opinion.
### C++ Docs
The C++ docs are split into 2 parts:
- how to add GenericIO as a dependency using CMake, which should make it very easy to create new libraries and executables that use GenericIO, without having to manually copy the GenericIO code into the new source code. In fact, I already used it for the [monofonIC](https://bitbucket.org/ohahn/monofonic/src/master/) initial condition generator.
- how to use the `gio::GenericIO` class. This part is currently missing as I'm lacking the motivation to write this documentation...
### Executables
I listed all the executables and added the usage print as well as options they take. It could use some more details I think, but I'm not using any of these and therefore probably am the wrong person to extend the docs.
I also listed all the environment variables that are being read by GenericIO, including BLOSC variables that will impact the compression. Since I have no idea what most of these do, that part of the documentation currently just contains the variables without any description.

CI scripts
----------
Ben helped me get a docker-based `gitlab-runner` working (on one of the HEP servers). The configuration is located in `./.gitlab-ci.yml`. Whenever something is pushed (currently to all branches, we should probably limit it to main/master), the following tasks are executed:
- compilation using the `Makefile`
- compilation using `CMake`
- building the python library
- building the docs
- deploying the docs to the webserver
If any of these tasks fails, the subsequent tasks won't be executed, and if you subscribe to alerts, you will get an email notification. If I remember correctly, the deployment of the docs happens in 2 stages, with the `gitlab-runner` copying it to some temporary directory and a `cronjob` checking and updating the `www` folder on the webserver at a regular interval. Ben is the boss.

Updates to the Python interfaces
--------------------------------
I deprecated the simple ("old") interface by moving
- `python` -> `legacy_python`
- `new_python` -> `python`
I also added docstrings to the most used functions (they will show up in e.g. jupyter) and I added some parameters that were previously not passed through the python interface (e.g. `eff_rank`) and static functions (e.g. `setNaturalDefaultPartition`).
I also added a `pyproject.toml` file to comply with the latest python best-practices. It should help with error messages if CMake is not installed / not in the correct version, and it potentially can even install CMake. In the future, we may be able to upload GenericIO to [pypi](https://pypi.org), so that users can simply run `pip install pygio` without cloning/updating/... the git repo.

Final words
-----------
I hope this is useful, especially as more students and external collaborators will be using HACC data. A final note: nothing in the C++ library has changed, so everything should work exactly as before.
I currently marked it as "Draft"; we can remove that once everyone is happy with the changes.

Assignee: Michael Buehlmann

non-MPI version for (new) Python interface (only affects new_python)
https://git.cels.anl.gov/hacc/genericio/-/merge_requests/3
2021-01-19T14:12:38-06:00, Michael Buehlmann

Always compile a non-MPI version of the (new) python library, which can be loaded by setting the GENERICIO_NO_MPI env variable. Also add the `PrintStats` boolean to the python interface.
Background: on some systems, MPI is not supported on the login node (cooley) or on jupyterhub instances (summit), and the "new" python module cannot be loaded on these systems. This commit compiles two python modules (with and without MPI, or only without MPI if MPI is not available during compilation).
By default, if available, the MPI version is loaded. The non-MPI version can be forced to be loaded as follows:
```python
import os
os.environ['GENERICIO_NO_MPI'] = 'True'
import pygio
```
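The selection rule described above (MPI build by default when available, non-MPI build when `GENERICIO_NO_MPI` is set) can be sketched in plain Python. This is only an illustration: `choose_backend` and the labels `"mpi"` / `"nompi"` are hypothetical names, not part of pygio, and the sketch assumes that the mere presence of the variable forces the non-MPI build.

```python
def choose_backend(environ, mpi_build_available):
    """Return "mpi" or "nompi" following the rule described above.

    Hypothetical helper for illustration only; not part of pygio.
    Assumption: any value of GENERICIO_NO_MPI forces the non-MPI build.
    """
    if "GENERICIO_NO_MPI" in environ:
        return "nompi"
    # Without an override, prefer the MPI build if it was compiled.
    return "mpi" if mpi_build_available else "nompi"


# Default: the MPI build is preferred when it exists.
print(choose_backend({}, mpi_build_available=True))
# Forced fallback, e.g. on a login node without MPI support.
print(choose_backend({"GENERICIO_NO_MPI": "True"}, mpi_build_available=True))
# Compiled without MPI: only the non-MPI module can be loaded.
print(choose_backend({}, mpi_build_available=False))
```

Passing the environment as an argument instead of reading `os.environ` inside the function keeps the rule easy to test.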
The writing of GenericIO files in the python module is not supported if MPI is disabled.

python interface updates and balancing option
https://git.cels.anl.gov/hacc/genericio/-/merge_requests/2
2021-01-07T12:18:13-06:00, Michael Buehlmann

The first two commits are updates to the (new) python interface:
* allow setting the mismatch behavior (redistribute, allowed, disallowed) for more fine-grained control
* update dependencies (pybind11 to version 2.6.1)
The third commit is a function in `GenericIO:rebalanceSourceRanks` that I originally wrote upon a request by Patricia. It is useful if the ranks that wrote the file were very unbalanced, which can happen for some analysis output.
* The function redistributes the storage-ranks such that each reading-rank has a similar amount of data to read.
* The reading can be faster or slower, depending on the machine and data.
* The function only has an effect if MPI is being used and `MismatchBehavior::MismatchRedistribute` is selected.
* The function needs to be manually called after `openAndReadHeader`
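To illustrate the balancing idea, here is a minimal sketch that assigns source ranks to reading ranks greedily: largest first, always to the reader with the least data so far. This is not necessarily the algorithm used by the rebalancing function, which the description above does not spell out; `balance_source_ranks` and its arguments are hypothetical names.

```python
import heapq


def balance_source_ranks(source_sizes, n_readers):
    """Greedy largest-first assignment of source ranks to readers.

    source_sizes: number of elements written by each source rank.
    Returns one list of source-rank indices per reading rank.
    Hypothetical illustration; not the GenericIO implementation.
    """
    assignment = [[] for _ in range(n_readers)]
    # Min-heap of (elements assigned so far, reader index).
    heap = [(0, reader) for reader in range(n_readers)]
    heapq.heapify(heap)
    # Visit the source ranks from largest to smallest.
    order = sorted(range(len(source_sizes)),
                   key=lambda i: source_sizes[i], reverse=True)
    for src in order:
        load, reader = heapq.heappop(heap)  # least-loaded reader
        assignment[reader].append(src)
        heapq.heappush(heap, (load + source_sizes[src], reader))
    return assignment


# A very unbalanced file: source rank 0 holds almost half of the data.
sizes = [900, 50, 40, 30, 500, 480]
plan = balance_source_ranks(sizes, n_readers=2)
loads = [sum(sizes[i] for i in ranks) for ranks in plan]
print(plan, loads)
```

In this example the two readers end up with 990 and 1010 elements. Assigning contiguous thirds of the source ranks in file order would give 950 and 1050.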
In case we don't want the last commit in the main branch, I can re-submit the merge-request with the first two commits only. However, since the rebalancing function only has an effect if manually called, it will not break/change any existing behavior.

CMakeLists and new python interface
https://git.cels.anl.gov/hacc/genericio/-/merge_requests/1
2020-08-21T16:07:43-05:00, Michael Buehlmann

I added a `CMakeLists.txt` that sets up the same targets as the GNUmakefile, and a python interface based on the [pybind11 project](https://pybind11.readthedocs.io/en/stable/).
We can use this pull request to discuss the changes and potential issues. Patricia and I tested part of it on roomba and cooley so far.
The CMake compatibility will allow GenericIO to be included as a nested project in other projects that use CMake (maybe HACC in the future, and I'll test it with an initial condition generator code).
I tried to use modern CMake as much as possible, setting compile/link options on a target level. One thing that is different between the GNUmakefile and this: I compile BLOSC and GenericIO as a (static) library, which is then linked to the executables. I don't think this will cause any performance issues. In the future, we may be able to include BLOSC as a nested subproject too?
I'll do some more testing on different systems, Patricia is working on some benchmarks.