sfaira.data.StoreSingleFeatureSpace

class sfaira.data.StoreSingleFeatureSpace(adata_by_key: Dict[str, anndata._core.anndata.AnnData], indices: Dict[str, numpy.ndarray], obs_by_key: Union[None, Dict[str, dask.dataframe.core.DataFrame]] = None, data_source: str = 'X')

Data set group class tailored to data access requirements common in high-performance computing (HPC).

This class does not inherit from DatasetGroup because it relies entirely on the cached objects. It is centred around two attributes, .adata_by_key and .indices.

.adata_by_key is a dictionary, keyed by data set id, of backed anndata instances that point to individual h5ads. This dictionary is initialised with all h5ads in the store. As the store is sub-setted, key-value pairs are deleted from this dictionary.

.indices is a dictionary whose keys correspond to the keys in .adata_by_key. Each value is an index vector of the observations in the corresponding anndata instance that are still kept. These index vectors are a form of lazy slicing that does not require loading or re-writing the data sets. As the store is sub-setted, a key-value pair is deleted from this dictionary if no observations from that key match the sub-setting. If only some observations from a key match the sub-setting, the index vector in the corresponding value is reduced. All data retrieval operations work on .indices: for example, generators run over these indices when retrieving observations.
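The bookkeeping described above can be illustrated with a small toy model. This is only a sketch of the idea and not sfaira's implementation: plain Python lists stand in for the backed anndata instances and for the numpy index vectors, and the names `subset_indices`, `iterate_observations`, the `"organ"` attribute, and the example data are all hypothetical. Unlike the store, which deletes entries from its own dictionaries, the helper here returns a new index dictionary.

```python
# Toy model of lazy slicing via per-data-set index vectors.
# NOT sfaira code: lists replace backed AnnData objects and numpy arrays.

def subset_indices(obs_by_key, indices, attr_key, values):
    """Return a reduced copy of `indices` keeping only matching observations.

    obs_by_key: dict mapping data set key -> list of per-observation dicts
    indices:    dict mapping data set key -> list of retained row positions
    """
    new_indices = {}
    for key, idx in indices.items():
        kept = [i for i in idx if obs_by_key[key][i].get(attr_key) in values]
        if kept:  # a key with no matching observation is dropped entirely
            new_indices[key] = kept
    return new_indices


def iterate_observations(data_by_key, indices):
    """Yield (key, row) pairs for retained observations only."""
    for key, idx in indices.items():
        for i in idx:
            yield key, data_by_key[key][i]


# Hypothetical example data: two data sets with three observations each.
data_by_key = {
    "ds1": [[1, 0], [0, 2], [3, 3]],
    "ds2": [[5, 5], [6, 0], [0, 7]],
}
obs_by_key = {
    "ds1": [{"organ": "lung"}, {"organ": "liver"}, {"organ": "lung"}],
    "ds2": [{"organ": "liver"}, {"organ": "liver"}, {"organ": "liver"}],
}

# Initially every observation of every data set is exposed.
indices = {key: list(range(len(rows))) for key, rows in data_by_key.items()}

# Sub-setting only touches the index vectors; the data itself is never copied.
indices = subset_indices(obs_by_key, indices, attr_key="organ", values={"lung"})
selected = list(iterate_observations(data_by_key, indices))
```

In this sketch, `ds2` disappears from `indices` because none of its observations match, while the index vector of `ds1` shrinks to the matching rows. The underlying data in `data_by_key` is left untouched, which is the property that makes this kind of slicing cheap for large on-disk data sets.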

Attributes

`adata_by_key`	Anndata instance for each selected data set in store, sub-setted by selected cells.
`adata_memory_footprint`	Memory footprint of data set k in MB.
`data_by_key`	Data matrix for each selected data set in store, sub-setted by selected cells.
`dataset_weights`
`genome_container`
`idx`	Global indices.
`indices`	Indices of observations that are currently exposed in adata of this instance.
`n_obs`	Number of observations selected in store.
`n_vars`	Number of selected features per organism in store.
`obs_by_key`
`organism`	Organism of store.
`organisms_by_key`	Data set-wise organism label as dictionary of data set keys.
`shape`
`var`
`var_names`	Feature names of selected genes by organism in store.
`data_source`

Methods

`checkout`([idx, batch_size, ...])	Yields an instance of a generator class over observations in the contained data sets.
`get_subset_idx`(attr_key, values, excluded_values)	Get indices of subset list of adata objects based on cell-wise properties.
`load_config`(fn)	Loads a config file and recreates a data sub-setting.
`subset`(attr_key[, values, excluded_values, ...])	Subset list of adata objects based on cell-wise properties.
`write_config`(fn)	Writes a config file that describes the current data sub-setting.
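As a rough illustration of how the methods listed above might be combined, the following sketch sub-sets a store and then records the selection in a config file. It relies only on the member names and parameters shown in the tables above (`subset`, `write_config`, `n_obs`). The helper `subset_and_save` is hypothetical, and it assumes that `subset` narrows the selection of the instance it is called on, as the class description suggests. How a store instance is obtained is not covered here.

```python
def subset_and_save(store, attr_key, values, config_fn):
    """Sub-set a store-like object and persist the resulting selection.

    Hypothetical helper: `store` is assumed to expose subset(),
    write_config() and n_obs as listed in the tables above.
    """
    # Assumption: subset() narrows the selection held by `store` itself.
    store.subset(attr_key=attr_key, values=values)
    # Record the current sub-setting so that load_config() can recreate it.
    store.write_config(config_fn)
    # Number of observations that remain selected.
    return store.n_obs
```

A config written this way could later be passed to `load_config` on a store built from the same data to recreate the sub-setting.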