annbatch

A data loader and I/O utilities for mini-batched loading of on-disk AnnData files, co-developed by Lamin Labs and scverse.

annbatch lets you train models on terabyte-scale collections of AnnData files that do not fit into memory, while keeping your GPU fed with high-throughput, shuffled mini-batches.
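The general idea behind streaming shuffled mini-batches from data that does not fit into memory can be sketched in plain NumPy. This is an illustrative sketch only and does not use or reproduce the annbatch API: the function `chunked_shuffled_batches` and its parameters (`chunk_size`, `chunks_per_load`, `batch_size`) are hypothetical names chosen for this example, and an in-memory array stands in for an on-disk store. Rows are read in contiguous chunks, a few randomly chosen chunks are pooled, and the pool is shuffled before it is cut into mini-batches.

```python
import numpy as np


def chunked_shuffled_batches(n_obs, chunk_size, chunks_per_load, batch_size, rng):
    """Yield arrays of row indices that form shuffled mini-batches.

    Hypothetical helper, not part of annbatch. Rows are grouped into
    contiguous chunks, a few randomly chosen chunks are pooled, and the
    pool is shuffled in memory before being split into batches.
    """
    starts = np.arange(0, n_obs, chunk_size)
    rng.shuffle(starts)  # visit the chunks in random order
    for i in range(0, len(starts), chunks_per_load):
        # Pool the row indices of a few contiguous chunks.
        pool = np.concatenate(
            [np.arange(s, min(s + chunk_size, n_obs))
             for s in starts[i : i + chunks_per_load]]
        )
        rng.shuffle(pool)  # shuffle within the pooled chunks
        for j in range(0, len(pool), batch_size):
            yield pool[j : j + batch_size]


# In-memory stand-in for an on-disk matrix: row r starts with the value 4 * r.
X = np.arange(1000 * 4, dtype=np.float32).reshape(1000, 4)
rng = np.random.default_rng(0)
batches = [X[idx] for idx in chunked_shuffled_batches(len(X), 32, 4, 64, rng)]
# Recover the row ids from the first column to check coverage.
seen = (np.concatenate(batches)[:, 0] // 4).astype(int)
```

Every row appears exactly once per pass, while each read touches only a handful of contiguous chunks. Larger values of `chunks_per_load` bring the order closer to a full shuffle at the cost of more memory per load.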

[Figure: annbatch data-loading speed compared to other dataloaders]

Installation

New to annbatch? Check out the installation guide and pick the right extras.

Installation

Quickstart

A hands-on notebook: convert your .h5ad files and stream shuffled mini-batches.

Quickstart annbatch

User guide

An in-depth tour of preprocessing, chunked loading, sampling and benchmarks.

Detailed Walkthrough

API reference

The API reference contains a detailed description of the annbatch API.

API

Discussion

Need help? Reach out on the scverse forum to get your questions answered.

https://discourse.scverse.org/

GitHub

Found a bug? Interested in contributing? Check out the source on GitHub.

https://github.com/scverse/annbatch

Citation

If you use annbatch in your work, please cite the annbatch publication:

Gold, I., Fischer, F., Arnoldt, L., Wolf, F. A., & Theis, F. J. (2026). annbatch unlocks terabyte-scale training of biological data in anndata. arXiv. https://doi.org/10.48550/arxiv.2604.01949

@article{Gold_2026,
    author        = {Gold, I. and Fischer, F. and Arnoldt, L. and Wolf, F. A. and Theis, F. J.},
    title         = {annbatch unlocks terabyte-scale training of biological data in anndata},
    journal       = {arXiv},
    year          = {2026},
    doi           = {10.48550/arxiv.2604.01949},
    eprint        = {2604.01949},
    archivePrefix = {arXiv},
    url           = {https://doi.org/10.48550/arxiv.2604.01949}
}