annbatch.ChunkSampler#

class annbatch.ChunkSampler(chunk_size, preload_nchunks, batch_size, *, replacement=False, num_samples=None, shuffle=False, drop_last=False, mask=None, rng=None)#

Chunk-based sampler for batched data access.

Deprecated since version 0.1.0: Use RandomSampler (for shuffled access) or SequentialSampler (for ordered access) instead.

This is the monolithic sampler that powers both RandomSampler and SequentialSampler. It supports without-replacement sampling (shuffled or unshuffled) and with-replacement sampling.

Parameters:
batch_size int

Number of observations per batch.

chunk_size int

Size of each chunk, i.e. the number of consecutive observations covered by each yielded chunk.

mask slice | None (default: None)

A slice defining the observation range to sample from (start:stop).

shuffle bool (default: False)

Whether to shuffle chunk and index order.

preload_nchunks int

Number of chunks to load per iteration.

drop_last bool (default: False)

Whether to drop the last incomplete batch.

rng Generator | None (default: None)

Random number generator for shuffling. Note that torch.manual_seed() has no effect on reproducibility here; pass a seeded numpy.random.Generator to control randomness.

replacement bool (default: False)

If True, draw random chunks with replacement, allowing the same observations to appear more than once.

num_samples int | None (default: None)

Total number of observations to draw. When None (the default), it equals the size of the effective observation range. When set, it must be positive; with replacement=False it must also be less than the number of observations that would otherwise be yielded.
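The following is a minimal sketch of how the parameters above could interact in the without-replacement case. It is not annbatch's implementation: the function name sketch_chunk_batches is hypothetical, it yields plain index arrays rather than LoadRequest objects, and carrying leftover observations from one load to the next is an assumption made for illustration.

```python
import numpy as np


def sketch_chunk_batches(n_obs, chunk_size, preload_nchunks, batch_size,
                         shuffle=False, drop_last=False, mask=None, rng=None):
    """Illustrative only: yield index batches built from preloaded chunks."""
    rng = np.random.default_rng() if rng is None else rng
    # Resolve the observation range; mask=None means the full range.
    mask = slice(None) if mask is None else mask
    start, stop, _ = mask.indices(n_obs)
    # Split the range into contiguous chunks of at most chunk_size observations.
    chunks = [np.arange(s, min(s + chunk_size, stop))
              for s in range(start, stop, chunk_size)]
    if shuffle:
        # Shuffle the order in which chunks are visited.
        chunks = [chunks[i] for i in rng.permutation(len(chunks))]
    leftover = np.empty(0, dtype=np.int64)
    for i in range(0, len(chunks), preload_nchunks):
        # "Load" preload_nchunks chunks at once, then batch within them.
        loaded = np.concatenate(chunks[i:i + preload_nchunks])
        if shuffle:
            # Shuffle the indices inside the loaded group as well.
            loaded = rng.permutation(loaded)
        pool = np.concatenate([leftover, loaded])
        n_full = len(pool) // batch_size * batch_size
        for b in range(0, n_full, batch_size):
            yield pool[b:b + batch_size]
        # Assumption: observations that do not fill a batch roll over.
        leftover = pool[n_full:]
    if len(leftover) and not drop_last:
        yield leftover


batches = list(sketch_chunk_batches(10, chunk_size=4, preload_nchunks=2, batch_size=3))
print([b.tolist() for b in batches])
```

With shuffle=False the batches cover the range in order; with drop_last=True the final short batch is omitted.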

Attributes table#

batch_size

The batch size for data loading.

mask

The observation range this sampler operates on.

rng

The random number generator used by this sampler.

shuffle

Whether data is shuffled.

Methods table#

n_iters(n_obs)

Return the number of batches.

sample(n_obs)

Sample load requests given the total number of observations.

validate(n_obs)

Validate the sampler configuration against the loader's n_obs.

Attributes#

ChunkSampler.batch_size#

The batch size for data loading.

ChunkSampler.mask#

The observation range this sampler operates on.

ChunkSampler.rng#

The random number generator used by this sampler.
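As a small illustration of why a seeded numpy.random.Generator gives reproducible ordering, the sketch below uses only NumPy. The final comment shows how such a generator would be passed according to the signature documented on this page; it is left as a comment so the example runs without annbatch installed.

```python
import numpy as np

# Two generators created from the same seed produce the same stream.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

order_a = rng_a.permutation(8)
order_b = rng_b.permutation(8)
print(np.array_equal(order_a, order_b))  # True

# Per the signature above, the generator is supplied via the rng keyword, e.g.
# ChunkSampler(chunk_size, preload_nchunks, batch_size, shuffle=True,
#              rng=np.random.default_rng(42))
```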

ChunkSampler.shuffle#

Whether data is shuffled.

Methods#

ChunkSampler.n_iters(n_obs)#

Return the number of batches.

Parameters:
n_obs int

The total number of observations available.

Return type:

int

Returns:

int – The total number of batches this sampler will produce.
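The batch count can be sketched with the usual batching arithmetic. This assumes the common convention that drop_last=True discards a trailing partial batch and drop_last=False keeps it; the helper sketch_n_batches is hypothetical, and n_samples stands for the number of observations actually drawn (the effective range, or num_samples when set).

```python
def sketch_n_batches(n_samples, batch_size, drop_last=False):
    """Illustrative only: number of batches for n_samples observations."""
    if drop_last:
        # Only full batches are counted.
        return n_samples // batch_size
    # Ceiling division: a trailing partial batch counts as one batch.
    return (n_samples + batch_size - 1) // batch_size


print(sketch_n_batches(10, 3))                  # 4
print(sketch_n_batches(10, 3, drop_last=True))  # 3
```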

ChunkSampler.sample(n_obs)#

Sample load requests given the total number of observations.

The base implementation calls validate() and then yields from _sample().

Parameters:
n_obs int

The total number of observations available.

Yields:

LoadRequest – Load requests for batching data.

Return type:

Iterator[LoadRequest]
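For the replacement=True mode, one way to picture "drawing random chunks with replacement" is sketched below. This is an assumption-laden illustration, not the library's code: the function name is hypothetical, it returns plain slices instead of LoadRequest objects, and rounding the number of draws up to cover num_samples is a choice made here.

```python
import numpy as np


def sketch_chunks_with_replacement(n_obs, chunk_size, num_samples, rng):
    """Illustrative only: pick chunk slices at random, repeats allowed."""
    # Number of distinct chunks the observation range splits into.
    n_chunks = (n_obs + chunk_size - 1) // chunk_size
    # Enough draws to cover at least num_samples observations (assumption).
    n_draws = (num_samples + chunk_size - 1) // chunk_size
    # Independent draws, so the same chunk can be selected more than once.
    picks = rng.integers(0, n_chunks, size=n_draws)
    return [slice(int(p) * chunk_size, min((int(p) + 1) * chunk_size, n_obs))
            for p in picks]


rng = np.random.default_rng(0)
for chunk in sketch_chunks_with_replacement(n_obs=10, chunk_size=4, num_samples=12, rng=rng):
    print(chunk)
```

Because each draw is independent, the same observations can appear more than once, which matches the behaviour described for the replacement parameter.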

ChunkSampler.validate(n_obs)#

Validate the sampler configuration against the loader’s n_obs.

Parameters:
n_obs int

The total number of observations in the loader.

Raises:

ValueError – If the sampler configuration is invalid for the given n_obs.

Return type:

None