sfaira.data.StoreSingleFeatureSpace.checkout

StoreSingleFeatureSpace.checkout(idx: Optional[numpy.ndarray] = None, batch_size: int = 1, retrieval_batch_size: int = 128, map_fn=None, obs_keys: Optional[List[str]] = None, return_dense: bool = True, randomized_batch_access: bool = False, random_access: bool = False, batch_schedule: str = 'base', **kwargs) → sfaira.data.store.carts.single.CartSingle

Returns an instance of a generator class over observations in the contained data sets.

Multiple such instances can be emitted by a single store class and point to data stored in this store class. Effectively, these generators are heavily reduced pointers to the data in an instance of self. A common use case is the instantiation of a training data generator and a validation data generator over a data subset defined in this class.
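The training/validation use case can be sketched as a split of the global observation index into two disjoint subsets, each of which would then be passed to its own checkout call. This is a minimal illustration, not sfaira code: the data set names and sizes in n_obs_by_key are hypothetical, plain lists stand in for numpy arrays, and the commented-out calls at the end assume a store object that is not defined here.

```python
import random

# Hypothetical per-data-set observation counts, keyed like self.adata_by_key.
n_obs_by_key = {"dataset_a": 5, "dataset_b": 3, "dataset_c": 4}

# Global index: positions along a hypothetical concatenation of all data sets,
# taken in key order.
n_total = sum(n_obs_by_key.values())
global_idx = list(range(n_total))

# Shuffle a copy once with a fixed seed, then split 75 % / 25 %.
rng = random.Random(0)
shuffled = global_idx[:]
rng.shuffle(shuffled)
n_train = int(0.75 * n_total)
train_idx = sorted(shuffled[:n_train])
val_idx = sorted(shuffled[n_train:])

# With a real store (assumed, not defined here), each subset would be passed
# as a numpy array to its own checkout call, for example:
# train_gen = store.checkout(idx=numpy.asarray(train_idx), batch_size=1)
# val_gen = store.checkout(idx=numpy.asarray(val_idx), batch_size=1)
```

Both generators would then point at the same underlying store while covering non-overlapping observations.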

Parameters
  • idx – Global idx to query from store. This is an array with indices corresponding to a continuous index along all observations in self.adata_by_key, ordered along a hypothetical concatenation along the keys of self.adata_by_key. If None, all observations are selected.

  • batch_size – Number of observations to yield in each access (generator invocation).

  • retrieval_batch_size – Number of observations read from disk in each batched access (data-backend generator invocation).

  • map_fn – Map function to apply to the output tuple of the raw generator. Each draw i from the generator is then: yield map_fn(x[i, var_idx], obs[i, obs_keys])

  • obs_keys – .obs columns to return in the generator. These have to be a subset of the columns available in self.adata_by_key.

  • return_dense – Whether to force return count data .X as dense batches. This allows more efficient feature indexing if the store is sparse (column indexing on csr matrices is slow).

  • randomized_batch_access – Whether to randomize batches during reading (in the generator). This lifts the need for a shuffle buffer on the generator. However, batch composition stays unchanged over epochs unless there are overhangs in retrieval_batch_size in the raw data files, which often happens and results in modest changes in batch composition. Do not use randomized_batch_access and random_access together.

  • random_access – Whether to fully shuffle observations before batched access takes place. May slow down access compared to randomized_batch_access and to no randomization. Do not use randomized_batch_access and random_access together.

  • batch_schedule

    A valid batch schedule name or a class that inherits from BatchDesignBase.

    • "basic": sfaira.data.store.batch_schedule.BatchDesignBasic

    • "balanced": sfaira.data.store.batch_schedule.BatchDesignBalanced

    • "blocks": sfaira.data.store.batch_schedule.BatchDesignBlocks

    • "full": sfaira.data.store.batch_schedule.BatchDesignFull

    • class: batch_schedule needs to be a class (not instance), subclassing BatchDesignBase.

  • kwargs – Keyword arguments for the chosen idx_generator.
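The difference between the two randomization flags can be illustrated with a small pure-Python sketch. It is one reading of the parameter descriptions above, not sfaira's implementation: the helper names retrieval_batches and access_order and the mode strings are hypothetical, and the sketch assumes that randomized batch access shuffles the order of retrieval batches while random access shuffles individual observations before batching.

```python
import random


def retrieval_batches(idx, retrieval_batch_size):
    """Split idx into consecutive chunks, as if each chunk were one disk read."""
    return [
        idx[i:i + retrieval_batch_size]
        for i in range(0, len(idx), retrieval_batch_size)
    ]


def access_order(idx, retrieval_batch_size, mode="none", seed=0):
    """Return the list of retrieval batches in the order they would be read.

    mode is a hypothetical stand-in for the two boolean flags:
    "none", "randomized_batch_access" or "random_access".
    """
    rng = random.Random(seed)
    idx = list(idx)
    if mode == "random_access":
        # Shuffle single observations first, then cut into retrieval batches.
        rng.shuffle(idx)
        return retrieval_batches(idx, retrieval_batch_size)
    batches = retrieval_batches(idx, retrieval_batch_size)
    if mode == "randomized_batch_access":
        # Keep each batch intact and only shuffle the order of the batches.
        rng.shuffle(batches)
    return batches


idx = list(range(10))
by_batch = access_order(idx, retrieval_batch_size=4, mode="randomized_batch_access")
by_obs = access_order(idx, retrieval_batch_size=4, mode="random_access")
```

In this sketch, by_batch always contains the same three chunks (including the shorter overhang chunk) in a shuffled order, whereas by_obs mixes observations across chunks, which is why fully random access would be expected to touch more scattered positions on disk.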

Returns

Generator function which yields batch_size observations at every invocation. The generator returns a tuple of (.X, .obs).
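How map_fn reshapes the (.X, .obs) tuple can be sketched with a toy generator. Everything here is an assumption made for illustration: toy_generator is not part of sfaira, nested lists stand in for the count matrix, a dict of lists stands in for the .obs table, and to_model_input is a hypothetical map function.

```python
def toy_generator(x, obs, obs_keys, map_fn=None):
    """Yield one observation at a time, mimicking batch_size=1."""
    for i in range(len(x)):
        x_i = x[i]
        # Restrict the metadata to the requested .obs columns.
        obs_i = {key: obs[key][i] for key in obs_keys}
        if map_fn is None:
            yield x_i, obs_i
        else:
            yield map_fn(x_i, obs_i)


# Toy count matrix (3 observations, 3 features) and metadata columns.
x = [[0, 3, 1], [2, 0, 0], [1, 1, 4]]
obs = {
    "cell_type": ["T cell", "B cell", "T cell"],
    "organ": ["lung", "lung", "blood"],
}


def to_model_input(x_i, obs_i):
    """Hypothetical map function: float features as input, cell type as target."""
    return [float(v) for v in x_i], obs_i["cell_type"]


raw = list(toy_generator(x, obs, obs_keys=["cell_type"]))
mapped = list(toy_generator(x, obs, obs_keys=["cell_type"], map_fn=to_model_input))
```

Without map_fn, each draw is the plain (features, metadata) tuple; with it, each draw is whatever the map function returns, which makes it a convenient place to build model inputs and targets.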