sfaira.data.StoreSingleFeatureSpace
- class sfaira.data.StoreSingleFeatureSpace(adata_by_key: Dict[str, anndata._core.anndata.AnnData], indices: Dict[str, numpy.ndarray], obs_by_key: Union[None, Dict[str, dask.dataframe.core.DataFrame]] = None, data_source: str = 'X')
Data set group class tailored to data access requirements common in high-performance computing (HPC).
This class does not inherit from DatasetGroup because it relies entirely on the cached objects. This class is centred around .adata_by_key and .indices.
.adata_by_key is a dictionary (by id) of backed anndata instances that point to individual h5ads. This dictionary is initialised with all h5ads in the store. As the store is sub-setted, key-value pairs are deleted from this dictionary.
.indices has keys that correspond to the keys in .adata_by_key. Each value is an index vector of the observations in the corresponding anndata instance that are still kept. These index vectors are a form of lazy slicing that does not require loading or re-writing the data sets. As the store is sub-setted, a key-value pair is deleted from this dictionary if no observations from that key match the sub-setting. If only some of the observations from a key match the sub-setting operation, the index vector stored under that key is reduced accordingly. All data retrieval operations work on .indices: for example, generators run over these indices when retrieving observations.
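The index bookkeeping described above can be sketched in a few lines. This is a minimal illustration and not sfaira's implementation: it uses plain Python lists in place of numpy index vectors and backed anndata instances, and the data set keys, the cell labels and the helper subset_indices are all hypothetical.

```python
from typing import Dict, List

# Hypothetical stand-in for per-data-set cell metadata (assumption: one label per cell).
obs_by_key: Dict[str, List[str]] = {
    "dataset_a": ["T cell", "B cell", "T cell"],
    "dataset_b": ["neuron", "neuron"],
    "dataset_c": ["B cell", "T cell"],
}

# Before any sub-setting, every observation of every data set is exposed.
indices: Dict[str, List[int]] = {
    key: list(range(len(labels))) for key, labels in obs_by_key.items()
}


def subset_indices(
    indices: Dict[str, List[int]],
    obs_by_key: Dict[str, List[str]],
    value: str,
) -> Dict[str, List[int]]:
    """Keep only indices whose label equals `value`; drop keys that end up empty."""
    reduced = {}
    for key, idx in indices.items():
        kept = [i for i in idx if obs_by_key[key][i] == value]
        if kept:  # a key with no matching observations is removed entirely
            reduced[key] = kept
    return reduced


# Only the index vectors change; the underlying data is never copied or re-written.
indices = subset_indices(indices, obs_by_key, "T cell")
n_obs = sum(len(idx) for idx in indices.values())
```

In this sketch, dataset_b disappears from the dictionary because none of its cells match, while the index vectors of the other two data sets shrink to the matching cells.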
Attributes
Anndata instance for each selected data set in store, sub-setted by selected cells.
Memory foot-print of data set k in MB.
Data matrix for each selected data set in store, sub-setted by selected cells.
Global indices.
Indices of observations that are currently exposed in adata of this instance.
Number of observations selected in store.
Number of selected features per organism in store.
Organism of store.
Data set-wise organism label as dictionary of data set keys.
Feature names of selected genes by organism in store.
Methods
checkout([idx, batch_size, ...]): Yields an instance of a generator class over observations in the contained data sets.
get_subset_idx(attr_key, values, excluded_values): Gets indices of a subset of the adata objects based on cell-wise properties.
load_config(fn): Loads a config file and recreates a data sub-setting.
subset(attr_key[, values, excluded_values, ...]): Subsets the list of adata objects based on cell-wise properties.
write_config(fn): Writes a config file that describes the current data sub-setting.