sfaira.data.DatasetSuperGroup

class sfaira.data.DatasetSuperGroup(dataset_groups: Union[None, List[sfaira.data.dataloaders.base.dataset_group.DatasetGroup], List[sfaira.data.dataloaders.base.dataset_group.DatasetSuperGroup]])

Container for multiple DatasetGroup instances.

Used to manipulate structured dataset collections. Primarily designed for this kind of manipulation; convert to a DatasetGroup via flatten() for more functionality.

Attributes

adata

adata_ls

additional_annotation_key

datasets

Returns DatasetGroup (rather than self = DatasetSuperGroup) containing all listed data sets.

ids

fn_backed

dataset_groups

Methods

collapse_counts()

Collapse count matrix along duplicated index.
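A minimal sketch of what collapsing along a duplicated index can mean, using plain Python lists rather than sfaira objects. It assumes that duplicated labels are combined by summation, which the summary above does not state; the helper name `collapse_duplicate_columns` and the gene labels are hypothetical.

```python
from typing import Dict, List, Tuple


def collapse_duplicate_columns(
    matrix: List[List[int]], index: List[str]
) -> Tuple[List[List[int]], List[str]]:
    """Sum the columns of `matrix` that share a label in `index`.

    Hypothetical helper for illustration; summation is an assumption.
    """
    # Assign each unique label an output column, in first-seen order.
    position: Dict[str, int] = {}
    for label in index:
        position.setdefault(label, len(position))

    # Accumulate every input column into the output column of its label.
    collapsed = [[0] * len(position) for _ in matrix]
    for row_out, row_in in zip(collapsed, matrix):
        for label, value in zip(index, row_in):
            row_out[position[label]] += value
    return collapsed, list(position)


# Two cells (rows), three features (columns); "GENE1" appears twice.
counts = [[1, 2, 3], [0, 5, 4]]
genes = ["GENE1", "GENE2", "GENE1"]
collapsed, unique_genes = collapse_duplicate_columns(counts, genes)
print(unique_genes, collapsed)
```

After collapsing, each label appears once and the matrix has one column per unique label.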

download(**kwargs)

extend_dataset_groups(dataset_groups)

flatten()

Returns DatasetGroup (rather than self = DatasetSuperGroup) containing all listed data sets.
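The relationship between a super group and the flat group returned by flatten() can be illustrated with toy containers. This is a sketch of the idea only: `ToyGroup` and `ToySuperGroup` are hypothetical stand-ins, not sfaira classes, and the data set ids are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyGroup:
    # Hypothetical stand-in for a DatasetGroup: data set id -> data set.
    datasets: Dict[str, str] = field(default_factory=dict)


@dataclass
class ToySuperGroup:
    # Hypothetical stand-in for a DatasetSuperGroup: a list of groups.
    dataset_groups: List[ToyGroup] = field(default_factory=list)

    def flatten(self) -> ToyGroup:
        # Merge the data sets of every group into a single flat group.
        merged: Dict[str, str] = {}
        for group in self.dataset_groups:
            merged.update(group.datasets)
        return ToyGroup(datasets=merged)


super_group = ToySuperGroup([
    ToyGroup({"ds_a": "lung study", "ds_b": "liver study"}),
    ToyGroup({"ds_c": "brain study"}),
])
flat = super_group.flatten()
print(sorted(flat.datasets))
```

The result is a single group, so the group boundaries of the super group are no longer visible in it.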

get_gc([genome])

load([annotated_only, load_raw, ...])

Loads the data sets into anndata objects.

load_config(fn)

Loads a config file and recreates a data sub-setting.

ncells([annotated_only])

ncells_bydataset([annotated_only])

List of lists holding the length of each data set, grouped by data set group.

ncells_bydataset_flat([annotated_only])

Flattened list holding the length of each data set.
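The difference between the nested and the flattened result can be shown with plain lists. The numbers below are invented, and the sketch assumes that the "length" of a data set is its number of cells, as the method names suggest.

```python
from itertools import chain

# Hypothetical cell counts: one inner list per data set group,
# one entry per data set in that group.
ncells_by_group = [[1200, 850], [430], [96, 2048, 77]]

# Flattened view: one entry per data set, group boundaries dropped.
ncells_flat = list(chain.from_iterable(ncells_by_group))

# Total number of cells across all data sets.
ncells_total = sum(ncells_flat)
print(ncells_flat, ncells_total)
```

The nested form keeps track of which group each data set belongs to; the flat form is convenient when only per-data-set sizes matter.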

project_celltypes_to_ontology([...])

Project free text cell type names to ontology based on mapping table.

remove_duplicates([supplier_hierarchy])

Remove duplicate data loaders from super group, e.g.

set_dataset_groups(dataset_groups)

show_summary()

streamline_features([match_to_release, ...])

Subset and sort genes to genes defined in an assembly or genes of a particular type, such as protein coding.
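Subsetting and sorting genes against a reference can be sketched on a small list-based matrix. The helper `match_to_reference` is hypothetical and not part of sfaira; in particular, zero-filling reference genes that are missing from the data is an assumption made for this example.

```python
from typing import List


def match_to_reference(
    matrix: List[List[int]], genes: List[str], reference: List[str]
) -> List[List[int]]:
    """Reorder and subset columns so they follow `reference`.

    Hypothetical helper: genes not in `reference` are dropped, and
    reference genes absent from `genes` are zero-filled (assumption).
    """
    # Column position of each gene currently in the matrix.
    column = {gene: i for i, gene in enumerate(genes)}
    return [
        [row[column[gene]] if gene in column else 0 for gene in reference]
        for row in matrix
    ]


# Two cells (rows); the columns are in an arbitrary gene order.
counts = [[5, 1, 7], [0, 2, 9]]
genes = ["B", "X", "A"]
reference = ["A", "B", "C"]  # stands in for the genes of an assembly
streamlined = match_to_reference(counts, genes, reference)
print(streamlined)
```

Every matrix streamlined this way shares the same column order, which makes data sets directly comparable feature by feature.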

streamline_metadata([schema, clean_obs, ...])

Streamline the adata instance in each group and each data set to output format.

subset(key, values)

Subset list of adata objects based on match to values in key property.
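Subsetting by a key property can be pictured as a filter over per-data-set metadata. The sketch below uses a plain dictionary; the property names, data set ids, and the helper `subset_by_property` are hypothetical and do not reflect sfaira's internal representation.

```python
from typing import Dict, List

# Hypothetical per-data-set metadata; keys and values are illustrative.
datasets: Dict[str, Dict[str, str]] = {
    "ds_a": {"organism": "human", "organ": "lung"},
    "ds_b": {"organism": "mouse", "organ": "lung"},
    "ds_c": {"organism": "human", "organ": "liver"},
}


def subset_by_property(
    data: Dict[str, Dict[str, str]], key: str, values: List[str]
) -> Dict[str, Dict[str, str]]:
    # Keep a data set only if its `key` property matches one of `values`.
    return {
        ds_id: meta for ds_id, meta in data.items() if meta.get(key) in values
    }


selected = subset_by_property(datasets, key="organ", values=["lung"])
print(sorted(selected))
```

Here whole data sets are kept or dropped based on a data-set-level property; filtering individual cells is a separate operation.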

subset_cells(key, values)

Subset list of adata objects based on cell-wise properties.

write_config(fn)

Writes a config file that describes the current data sub-setting.

write_distributed_store(dir_cache[, ...])

Write data set into a format that allows distributed access to data set on disk.