sfaira.data.StoreSingleFeatureSpace

class sfaira.data.StoreSingleFeatureSpace(adata_by_key: Dict[str, anndata._core.anndata.AnnData], indices: Dict[str, numpy.ndarray], obs_by_key: Union[None, Dict[str, dask.dataframe.core.DataFrame]] = None, data_source: str = 'X')

Data set group class tailored to data access requirements common in high-performance computing (HPC).

This class does not inherit from DatasetGroup because it relies entirely on the cached objects. It is centred around .adata_by_key and .indices.

.adata_by_key is a dictionary, keyed by data set id, of backed anndata instances that each point to an individual h5ad file. This dictionary is initialised with all h5ads in the store. As the store is sub-setted, key-value pairs are deleted from this dictionary.

.indices has keys that correspond to the keys in .adata_by_key. Each value is an index vector of the observations in the corresponding anndata instance that are still kept. These index vectors are a form of lazy slicing that does not require loading or re-writing the data sets. As the store is sub-setted, a key-value pair is deleted from this dictionary if no observations of that key match the sub-setting. If only a subset of the observations of a key matches, the index vector in the corresponding value is reduced. All data retrieval operations work on .indices: for example, generators run over these indices when retrieving observations.
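The index-vector bookkeeping described above can be illustrated with a minimal sketch. It does not use the sfaira API: the helper names subset_indices and iter_batches, the data set keys, and the "organ" attribute are hypothetical, and plain NumPy arrays stand in for the observation tables of backed anndata instances.

```python
import numpy as np

# Toy stand-ins for per-data-set observation annotations (hypothetical keys and values).
obs_by_key = {
    "dataset_a": {"organ": np.array(["lung", "liver", "lung", "lung"])},
    "dataset_b": {"organ": np.array(["liver", "liver"])},
    "dataset_c": {"organ": np.array(["lung", "brain", "lung"])},
}

# Initially, every observation of every data set is exposed.
indices = {key: np.arange(len(obs["organ"])) for key, obs in obs_by_key.items()}


def subset_indices(indices, obs_by_key, attr_key, values):
    """Keep only observations whose annotation attr_key is in values.

    Keys without any matching observation are dropped; for the remaining
    keys the index vector is reduced. No data matrix is loaded or copied.
    """
    new_indices = {}
    for key, idx in indices.items():
        mask = np.isin(obs_by_key[key][attr_key][idx], values)
        kept = idx[mask]
        if kept.size > 0:
            new_indices[key] = kept
    return new_indices


def iter_batches(indices, batch_size):
    """Yield (key, index batch) pairs by running over the retained indices."""
    for key, idx in indices.items():
        for start in range(0, len(idx), batch_size):
            yield key, idx[start:start + batch_size]


lung_indices = subset_indices(indices, obs_by_key, "organ", ["lung"])
batches = list(iter_batches(lung_indices, batch_size=2))
```

In this sketch, "dataset_b" disappears from lung_indices because none of its observations match, while the other two keys keep reduced index vectors. A consumer would use each yielded batch to slice the corresponding backed object only at retrieval time.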

Attributes

adata_by_key

Anndata instance for each selected data set in store, sub-setted by selected cells.

adata_memory_footprint

Memory footprint of data set k in MB.

data_by_key

Data matrix for each selected data set in store, sub-setted by selected cells.

dataset_weights

genome_container

idx

Global indices.

indices

Indices of observations that are currently exposed in adata of this instance.

n_obs

Number of observations selected in store.

n_vars

Number of selected features per organism in store.

obs_by_key

organism

Organism of store.

organisms_by_key

Organism label of each data set, as a dictionary keyed by data set key.

shape

var

var_names

Feature names of selected genes by organism in store.

data_source

Methods

checkout([idx, batch_size, ...])

Yields an instance of a generator class over observations in the contained data sets.

get_subset_idx(attr_key, values, excluded_values)

Gets the indices of a subset of the list of adata objects based on cell-wise properties.

load_config(fn)

Loads a config file and recreates a data sub-setting.

subset(attr_key[, values, excluded_values, ...])

Subsets the list of adata objects based on cell-wise properties.

write_config(fn)

Writes a config file that describes the current data sub-setting.