annbatch.abc.Sampler#

class annbatch.abc.Sampler#

Base sampler class.

Samplers control how data is batched and loaded from the underlying datasets.

Attributes table#

batch_size

The batch size for data loading.

mask

The observation range this sampler operates on.

rng

The random number generator used by this sampler.

shuffle

Whether data is shuffled.

Methods table#

_sample(n_obs)

Implementation of the sample method.

n_iters(n_obs)

Return the number of batches.

sample(n_obs)

Sample load requests given the total number of observations.

validate(n_obs)

Validate the sampler configuration against the given n_obs.

Attributes#

Sampler.batch_size#

The batch size for data loading.

Note

This property is used only when splits is not supplied in the annbatch.types.LoadRequest. When splits are provided explicitly, they determine the batch boundaries instead.

Returns:

int The number of observations per batch.

Sampler.mask#

The observation range this sampler operates on.

Sampler.rng#

The random number generator used by this sampler.

Sampler.shuffle#

Whether data is shuffled.

If batch_size is provided and annbatch.types.LoadRequest.splits is not, this property determines whether the data loaded into memory is shuffled.

Shuffling of on-disk data is up to the user and is controlled by the chunks parameter of annbatch.types.LoadRequest.

Returns:

bool True if data should be shuffled, False otherwise.
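The interplay between batch_size, shuffle, and rng when no splits are supplied can be sketched as follows. This is a minimal illustration, not annbatch's implementation: the function name split_loaded_indices is hypothetical, and random.Random stands in for whatever generator type rng actually holds.

```python
import random


def split_loaded_indices(
    n_loaded: int, batch_size: int, shuffle: bool, rng: random.Random
) -> list[list[int]]:
    """Hypothetical helper: derive batch splits for data already in memory."""
    order = list(range(n_loaded))
    if shuffle:
        # Shuffle only the in-memory order; on-disk order is untouched.
        rng.shuffle(order)
    # Cut the (possibly shuffled) order into consecutive batch_size pieces.
    return [order[i : i + batch_size] for i in range(0, n_loaded, batch_size)]


shuffled = split_loaded_indices(10, 4, shuffle=True, rng=random.Random(0))
ordered = split_loaded_indices(10, 4, shuffle=False, rng=random.Random(0))
```

In this sketch every loaded observation appears in exactly one split regardless of shuffle; only the assignment of observations to batches changes.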

Methods#

abstractmethod Sampler._sample(n_obs)#

Implementation of the sample method.

This method is called by the sample method to perform the actual sampling after validation has passed.

Parameters:
n_obs int

The total number of observations available.

Yields:

LoadRequest – Load requests for batching data.

Return type:

Iterator[LoadRequest]

abstractmethod Sampler.n_iters(n_obs)#

Return the number of batches.

Parameters:
n_obs int

The total number of observations available.

Return type:

int

Returns:

int The total number of batches this sampler will produce.
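As a rough illustration of what a concrete n_iters might compute, the sketch below counts fixed-size batches over n_obs observations. The function n_iters_sketch and its drop_last flag are assumptions made for this example; the base class does not prescribe either.

```python
import math


def n_iters_sketch(n_obs: int, batch_size: int, drop_last: bool = False) -> int:
    """Hypothetical batch count for fixed-size batches over n_obs observations."""
    if drop_last:
        # Discard the trailing partial batch, if any.
        return n_obs // batch_size
    # Keep the trailing partial batch as its own (smaller) batch.
    return math.ceil(n_obs / batch_size)


full_and_partial = n_iters_sketch(10, 4)
full_only = n_iters_sketch(10, 4, drop_last=True)
```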

Sampler.sample(n_obs)#

Sample load requests given the total number of observations.

The base implementation simply calls validate() and then yields from _sample().

Parameters:
n_obs int

The total number of observations available.

Yields:

LoadRequest – Load requests for batching data.

Return type:

Iterator[LoadRequest]
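The validate-then-delegate flow described for sample() can be sketched with stand-in classes. Everything here is illustrative: SketchSampler and ContiguousSampler are hypothetical names, the code does not import annbatch, and a plain dict with a "chunks" key is used as a placeholder for a real LoadRequest.

```python
from abc import ABC, abstractmethod
from collections.abc import Iterator


class SketchSampler(ABC):
    """Stand-in for the documented base class; not annbatch's actual code."""

    def sample(self, n_obs: int) -> Iterator[dict]:
        # Validate first, then delegate to the subclass implementation.
        self.validate(n_obs)
        yield from self._sample(n_obs)

    @abstractmethod
    def validate(self, n_obs: int) -> None: ...

    @abstractmethod
    def _sample(self, n_obs: int) -> Iterator[dict]: ...


class ContiguousSampler(SketchSampler):
    """Hypothetical subclass yielding one contiguous slice per batch."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size

    def validate(self, n_obs: int) -> None:
        if n_obs <= 0:
            raise ValueError("n_obs must be positive")

    def _sample(self, n_obs: int) -> Iterator[dict]:
        for start in range(0, n_obs, self.batch_size):
            stop = min(start + self.batch_size, n_obs)
            # The dict layout is a placeholder for a real LoadRequest.
            yield {"chunks": [slice(start, stop)]}


requests = list(ContiguousSampler(batch_size=4).sample(10))
```

Because sample() in this sketch is a generator, validate() runs when iteration starts rather than when sample() is called.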

abstractmethod Sampler.validate(n_obs)#

Validate the sampler configuration against the given n_obs.

This method is called at the start of each sample() call. Override this method to add custom validation for sampler parameters.

Parameters:
n_obs int

The total number of observations in the loader.

Raises:

ValueError – If the sampler configuration is invalid for the given n_obs.

Return type:

None
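A custom validation step might look like the sketch below. It assumes, purely for illustration, that mask is a slice over observations; the function validate_sketch and the specific checks are hypothetical and not taken from annbatch.

```python
def validate_sketch(n_obs: int, batch_size: int, mask: slice) -> None:
    """Hypothetical checks a validate() override could perform."""
    if batch_size <= 0:
        raise ValueError(f"batch_size must be positive, got {batch_size}")
    # Treat open-ended slice bounds as covering the full observation range.
    start = 0 if mask.start is None else mask.start
    stop = n_obs if mask.stop is None else mask.stop
    if not 0 <= start < stop <= n_obs:
        raise ValueError(
            f"mask [{start}, {stop}) does not fit within n_obs={n_obs}"
        )


# A valid configuration returns None without raising.
result = validate_sketch(n_obs=100, batch_size=32, mask=slice(0, 100))
```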