sfaira.data.StoreSingleFeatureSpace.checkout
- StoreSingleFeatureSpace.checkout(idx: Optional[numpy.ndarray] = None, batch_size: int = 1, retrieval_batch_size: int = 128, map_fn=None, obs_keys: Optional[List[str]] = None, return_dense: bool = True, randomized_batch_access: bool = False, random_access: bool = False, batch_schedule: str = 'base', **kwargs) → sfaira.data.store.carts.single.CartSingle
Returns an instance of a generator class over observations in the contained data sets.
Multiple such instances can be emitted by a single store class and point to data stored in this store class. Effectively, these generators are heavily reduced pointers to the data in an instance of self. A common use case is the instantiation of a training data generator and a validation data generator over a data subset defined in this class.
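The training/validation use case described above could be sketched as follows. This is a minimal sketch, not sfaira's own recipe: the helper names `split_indices` and `checkout_train_val` are hypothetical, the number of observations `n_obs` is assumed to be known to the caller, and only keyword arguments listed in the signature above are passed to `checkout`.

```python
import numpy as np


def split_indices(n_obs, val_fraction=0.2, seed=0):
    """Split the global index [0, n_obs) into sorted train and validation arrays."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_obs)
    n_val = int(round(n_obs * val_fraction))
    # Sorting keeps each subset in storage order; shuffling is left to checkout().
    return np.sort(perm[n_val:]), np.sort(perm[:n_val])


def checkout_train_val(store, n_obs, obs_keys, batch_size=1, retrieval_batch_size=128):
    """Emit two carts from one store (hypothetical helper, not part of sfaira)."""
    idx_train, idx_val = split_indices(n_obs)
    # Training cart: batches are read in randomized order.
    cart_train = store.checkout(
        idx=idx_train,
        batch_size=batch_size,
        retrieval_batch_size=retrieval_batch_size,
        obs_keys=obs_keys,
        randomized_batch_access=True,
    )
    # Validation cart: default (non-randomized) access over the held-out indices.
    cart_val = store.checkout(
        idx=idx_val,
        batch_size=batch_size,
        retrieval_batch_size=retrieval_batch_size,
        obs_keys=obs_keys,
    )
    return cart_train, cart_val
```

Both carts point at the same store, so the split costs two index arrays rather than two copies of the data.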
- Parameters
idx – Global idx to query from store. This is an array of indices into a contiguous index over all observations in self.adata_by_key, ordered as in a hypothetical concatenation along the keys of self.adata_by_key. If None, all observations are selected.
batch_size – Number of observations to yield in each access (generator invocation).
retrieval_batch_size – Number of observations read from disk in each batched access (data-backend generator invocation).
map_fn – Map function to apply to the output tuple of the raw generator. Each draw i from the generator is then:
yield map_fn(x[i, var_idx], obs[i, obs_keys])
obs_keys – .obs columns to return in the generator. These have to be a subset of the columns available in self.adata_by_key.
return_dense – Whether to force returning the count data .X as dense batches. This allows more efficient feature indexing if the store is sparse (column indexing on CSR matrices is slow).
randomized_batch_access – Whether to randomize the order of batches during reading (in the generator). This lifts the need for a shuffle buffer on the generator. However, batch composition stays unchanged over epochs unless there are overhangs in retrieval_batch_size in the raw data files, which often happens and results in modest changes in batch composition. Do not use randomized_batch_access and random_access together.
random_access – Whether to fully shuffle observations before batched access takes place. May slow down access compared to randomized_batch_access and to no randomization. Do not use randomized_batch_access and random_access together.
batch_schedule –
A valid batch schedule name or a class that inherits from BatchDesignBase.
"basic": sfaira.data.store.batch_schedule.BatchDesignBasic
"balanced": sfaira.data.store.batch_schedule.BatchDesignBalanced
"blocks": sfaira.data.store.batch_schedule.BatchDesignBlocks
"full": sfaira.data.store.batch_schedule.BatchDesignFull
class: batch_schedule needs to be a class (not instance), subclassing BatchDesignBase.
kwargs – Keyword arguments passed to the chosen idx_generator.
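The difference between randomized_batch_access and random_access described above can be illustrated with plain index arrays. This is a sketch of the semantics as documented here, not sfaira's implementation; the function names are hypothetical and overhangs at file boundaries are ignored.

```python
import numpy as np


def randomized_batch_order(n_obs, retrieval_batch_size, rng):
    """Shuffle the order of contiguous retrieval batches; batch contents stay fixed."""
    starts = np.arange(0, n_obs, retrieval_batch_size)
    rng.shuffle(starts)  # only the order in which batches are read changes
    return [np.arange(s, min(s + retrieval_batch_size, n_obs)) for s in starts]


def random_access_order(n_obs, retrieval_batch_size, rng):
    """Shuffle all observations first, then cut the permutation into batches."""
    perm = rng.permutation(n_obs)
    return [perm[s:s + retrieval_batch_size] for s in range(0, n_obs, retrieval_batch_size)]


rng = np.random.default_rng(0)
for batch in randomized_batch_order(10, 4, rng):
    print("randomized_batch_access:", batch)  # each batch is a contiguous run
for batch in random_access_order(10, 4, rng):
    print("random_access:", batch)  # each batch mixes observations from anywhere
```

In the first mode every read is a contiguous slice, which is why it is the cheaper option on disk; in the second, each batch may touch scattered positions.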
- Returns
Generator function which yields batch_size observations at every invocation. The generator returns a tuple of (.X, .obs).
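A map_fn receives the two elements of this tuple and returns whatever the downstream consumer expects. The sketch below is illustrative only: the function name `log1p_with_labels` and the .obs column "cell_type" are hypothetical, the expression batch is assumed to be dense (return_dense=True), and the .obs batch is assumed to support column lookup by key, as a pandas DataFrame or a dict of arrays would.

```python
import numpy as np


def log1p_with_labels(x, obs):
    """Example map_fn: log-transform the counts and extract one label column."""
    # Assumes x is a dense 2D batch of counts (observations x features).
    x_out = np.log1p(np.asarray(x, dtype=np.float32))
    # "cell_type" is a hypothetical column; it must be listed in obs_keys.
    labels = np.asarray(obs["cell_type"])
    return x_out, labels
```

Such a function would be passed as `map_fn=log1p_with_labels` together with `obs_keys=["cell_type"]`.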