annbatch
A data loader and I/O utilities for mini-batched loading of on-disk AnnData files,
co-developed by Lamin Labs and scverse.
annbatch lets you train models on terabyte-scale collections of AnnData files that do not fit
into memory, while keeping your GPU fed with high-throughput, shuffled mini-batches.
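One common way to serve shuffled mini-batches from data too large for memory is to read contiguous chunks of rows from disk in random order, pool a few chunks in an in-memory buffer, and shuffle rows within that buffer before cutting batches. This keeps disk reads sequential while still mixing rows from different parts of the collection. The stdlib-only sketch below illustrates that general idea. It is not annbatch's implementation or API (see the API reference for that): `chunked_shuffled_batches` and the `read_rows(start, stop)` callable are hypothetical names, and `read_rows` merely stands in for a contiguous slice of an on-disk array.

```python
import random
from typing import Callable, Iterator, List, Sequence


def chunked_shuffled_batches(
    n_rows: int,
    read_rows: Callable[[int, int], Sequence],  # hypothetical contiguous on-disk read
    chunk_size: int,
    chunks_per_buffer: int,
    batch_size: int,
    seed: int = 0,
) -> Iterator[List]:
    """Yield shuffled mini-batches while only ever reading contiguous row ranges."""
    rng = random.Random(seed)
    # Start offsets of contiguous chunks; their order is shuffled once per epoch.
    starts = list(range(0, n_rows, chunk_size))
    rng.shuffle(starts)
    for i in range(0, len(starts), chunks_per_buffer):
        # Fill an in-memory buffer from a few randomly chosen chunks.
        buffer: List = []
        for start in starts[i : i + chunks_per_buffer]:
            buffer.extend(read_rows(start, min(start + chunk_size, n_rows)))
        # Shuffle rows across the pooled chunks, then cut mini-batches.
        # The last batch of each buffer may be smaller than batch_size.
        rng.shuffle(buffer)
        for j in range(0, len(buffer), batch_size):
            yield buffer[j : j + batch_size]


# Usage: a Python list stands in for 1,000 rows (cells) stored on disk.
data = list(range(1000))
batches = list(
    chunked_shuffled_batches(
        n_rows=len(data),
        read_rows=lambda start, stop: data[start:stop],
        chunk_size=64,
        chunks_per_buffer=4,
        batch_size=32,
    )
)
```

Larger buffers (more chunks per buffer) give better mixing at the cost of memory; smaller chunks give more randomness at the cost of more, shorter reads.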
New to annbatch? Check out the installation guide and pick the right extras.
A hands-on notebook: convert your .h5ad files and stream shuffled mini-batches.
An in-depth tour of preprocessing, chunked loading, sampling and benchmarks.
The API reference contains a detailed description of the annbatch API.
Need help? Reach out on the scverse forum to get your questions answered.
Found a bug? Interested in contributing? Check out the source on GitHub.
Citation
If you use annbatch in your work, please cite the annbatch publication:
annbatch unlocks terabyte-scale training of biological data in anndata
Gold, I., Fischer, F., Arnoldt, L., Wolf, F. A., & Theis, F. J. (2026). annbatch unlocks terabyte-scale training of biological data in anndata. arXiv. https://doi.org/10.48550/arxiv.2604.01949
@article{Gold_2026,
  author = {Gold, I. and Fischer, F. and Arnoldt, L. and Wolf, F. A. and Theis, F. J.},
  title = {annbatch unlocks terabyte-scale training of biological data in anndata},
  journal = {arXiv},
  year = {2026},
  doi = {10.48550/arxiv.2604.01949},
  eprint = {2604.01949},
  archivePrefix = {arXiv},
  url = {https://doi.org/10.48550/arxiv.2604.01949}
}