Installation

annbatch requires Python 3.12 or newer. If you don’t have Python installed, we recommend installing uv, which can install and manage Python versions for you.
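To confirm that an interpreter meets the 3.12 requirement before installing, a small check along these lines can help. This is only a sketch: the helper name `python_is_supported` is made up for illustration and is not part of annbatch.

```python
import sys

# Minimum interpreter version stated in this section.
MIN_PYTHON = (3, 12)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version is 3.12 or newer.

    Hypothetical helper for illustration; not provided by annbatch.
    """
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if python_is_supported() else "too old for annbatch"
    print(f"Python {current}: {status}")
```

Run it with the interpreter you intend to install into, since several Python versions may coexist on one machine.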

PyPI

Install the latest release of annbatch from PyPI:

pip install "annbatch[zarrs]"

Important

zarrs-python provides the performance boost that makes the sharded data produced by our preprocessing functions practical to load from a local filesystem, so we recommend installing the zarrs extra and using it whenever you work with local filesystems. Otherwise, be sure to install the [remote] extra for zarr-python so that you can use zarr.storage.ObjectStore for the best remote performance.
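Installing the zarrs extra does not by itself say how the pipeline gets switched on. The sketch below assumes the configuration key documented by zarrs-python (`codec_pipeline.path` set to `zarrs.ZarrsCodecPipeline`); this section does not state whether annbatch applies it for you, and the wrapper `enable_zarrs_pipeline` is a hypothetical helper.

```python
def enable_zarrs_pipeline() -> bool:
    """Try to point zarr-python at the zarrs codec pipeline.

    Returns True if the setting was applied, False if zarr or zarrs
    cannot be imported (for example, the zarrs extra is not installed).
    Hypothetical helper; the config key is taken from the zarrs-python docs.
    """
    try:
        import zarr
        import zarrs  # noqa: F401  (checked only to confirm it is installed)
    except ImportError:
        return False
    # Assumption: this is the setting zarrs-python documents for zarr-python 3.
    zarr.config.set({"codec_pipeline.path": "zarrs.ZarrsCodecPipeline"})
    return True


if __name__ == "__main__":
    print("zarrs pipeline enabled:", enable_zarrs_pipeline())
```

Returning a boolean rather than raising keeps the same script usable on machines that only read remote stores and never install zarrs.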

Optional dependencies

annbatch ships several extras that you can mix and match:

| Extra | What it adds |
| --- | --- |
| zarrs | High-performance zarr codec pipeline via zarrs-python for local filesystems; strongly recommended. |
| torch | Yields batches as zero-copy torch.Tensor objects. |
| cupy-cuda12 | GPU acceleration via cupy for CUDA 12; highly recommended for CUDA systems. |
| cupy-cuda13 | GPU acceleration via cupy for CUDA 13; highly recommended for CUDA systems. |

cupy accelerates handling of the data via preload_to_gpu once it has been read off disk, and it does not need to be used in conjunction with torch. cupy is also compatible with ROCm (AMD) devices, although we do not provide an extra for installing it.

To install several extras at once:

pip install "annbatch[zarrs,torch,cupy-cuda13]"

(Replace cupy-cuda13 with the extra matching your local CUDA version.)
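After installing, it can be useful to see which optional packages actually landed in the environment. The sketch below probes for them without importing them; the import names (`zarrs`, `torch`, `cupy`) are assumed to be the top-level modules behind each extra, and `installed_extras` is a hypothetical helper, not an annbatch function.

```python
from importlib.util import find_spec

# Assumed top-level import name for each extra described above.
EXTRA_MODULES = {
    "zarrs": "zarrs",
    "torch": "torch",
    "cupy-cuda12 / cupy-cuda13": "cupy",
}


def installed_extras() -> dict[str, bool]:
    """Map each extra to whether its module can be located.

    find_spec only locates the module; it does not import it, so this
    stays cheap even for heavy packages such as torch.
    """
    return {
        extra: find_spec(module) is not None
        for extra, module in EXTRA_MODULES.items()
    }


if __name__ == "__main__":
    for extra, present in installed_extras().items():
        print(f"{extra:28s} {'installed' if present else 'missing'}")
```

Both CUDA extras share one row because they are assumed to expose the same `cupy` module, so this check cannot tell which CUDA build is installed.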

Important

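Always quote the package specifier ("annbatch[zarrs,torch]") and do not put spaces between the extras. Shells such as bash and zsh treat square brackets as glob patterns and split arguments on spaces. An unquoted annbatch[zarrs,torch] is rejected by zsh with a "no matches found" error, and bash passes it through unchanged only when no file in the current directory matches the pattern. Written as annbatch[zarrs, torch], the specifier is split into two separate arguments and the install fails.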
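The two failure modes can be illustrated from Python's standard library. This is a rough model rather than a real shell: `shlex` mimics POSIX word splitting only (it does no globbing), and `fnmatch` mimics how a glob pattern is matched against file names.

```python
import fnmatch
import shlex

# Quoted: pip receives one intact requirement string.
quoted = shlex.split('pip install "annbatch[zarrs,torch]"')

# Unquoted with a space: the specifier is split into two broken arguments.
spaced = shlex.split("pip install annbatch[zarrs, torch]")

# As a glob, [zarrs,torch] is a character class that matches ONE character,
# so the pattern matches a file named "annbatchz" ...
matches_file = fnmatch.fnmatchcase("annbatchz", "annbatch[zarrs,torch]")
# ... and does not match its own literal text.
matches_self = fnmatch.fnmatchcase("annbatch[zarrs,torch]", "annbatch[zarrs,torch]")

if __name__ == "__main__":
    print(quoted)        # ['pip', 'install', 'annbatch[zarrs,torch]']
    print(spaced)        # ['pip', 'install', 'annbatch[zarrs,', 'torch]']
    print(matches_file)  # True
    print(matches_self)  # False
```

Quoting sidesteps both problems, because the shell then performs neither word splitting nor pattern matching on the specifier.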