sfaira.data.DatasetGroupDirectoryOriented

class sfaira.data.DatasetGroupDirectoryOriented(file_base: str, data_path: Optional[str] = None, meta_path: Optional[str] = None, cache_path: Optional[str] = None)

Attributes

adata

adata_ls

additional_annotation_key

collection_id

doi

Propagates DOI annotation from contained datasets.
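A minimal sketch of how a group-level attribute such as doi can be propagated from the contained data sets. The `Dataset` class, its `doi` field and `group_doi` are hypothetical stand-ins, not sfaira's API. The sketch assumes each data set holds one DOI or a list of DOIs and that the group returns their sorted, de-duplicated union. The DOIs are made up.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Dataset:
    # Hypothetical stand-in for a single data set; not the sfaira class.
    doi: Union[str, List[str]]


def group_doi(datasets: Dict[str, Dataset]) -> List[str]:
    """Collect DOIs from all data sets into a sorted list without duplicates."""
    dois = set()
    for ds in datasets.values():
        # Normalise a single DOI string to a one-element list.
        values = [ds.doi] if isinstance(ds.doi, str) else ds.doi
        dois.update(values)
    return sorted(dois)


group = {
    "d1": Dataset(doi="10.1000/a"),
    "d2": Dataset(doi=["10.1000/a", "10.1000/b"]),
}
print(group_doi(group))  # ['10.1000/a', '10.1000/b']
```

The same aggregation pattern would plausibly apply to supplier.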

ids

ontology_celltypes

Its use might be replaced by ontology_container_sfaira in the future.

ontology_container_sfaira

supplier

Propagates supplier annotation from contained datasets.

Methods

clean_ontology_class_maps()

Finalises processed class maps of free text labels to ontology classes.

collapse_counts()

Collapse count matrix along duplicated index.
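A minimal sketch of collapsing a count matrix along a duplicated feature index, assuming duplicated features are merged by summing their counts (for example after gene versions have been stripped). It uses plain lists instead of AnnData, and `collapse_counts_sketch` is a hypothetical name, not sfaira's implementation.

```python
from typing import Dict, List, Tuple


def collapse_counts_sketch(
    counts: List[List[int]], var_names: List[str]
) -> Tuple[List[List[int]], List[str]]:
    """Sum columns of a cells x features matrix that share a feature name.

    Output features keep the order of their first occurrence.
    """
    # Map each unique feature name to its output column position.
    position: Dict[str, int] = {}
    for name in var_names:
        if name not in position:
            position[name] = len(position)
    collapsed = []
    for row in counts:
        new_row = [0] * len(position)
        for value, name in zip(row, var_names):
            new_row[position[name]] += value
        collapsed.append(new_row)
    return collapsed, list(position)


counts = [[1, 2, 3], [4, 5, 6]]
names = ["ENSG1", "ENSG2", "ENSG1"]
print(collapse_counts_sketch(counts, names))
# ([[4, 2], [10, 5]], ['ENSG1', 'ENSG2'])
```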

download(**kwargs)

load([annotated_only, load_raw, ...])

Load all datasets in group (option for temporary loading).

ncells([annotated_only])

ncells_bydataset([annotated_only])
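ncells and ncells_bydataset carry no description here. One plausible reading, sketched below purely as an assumption, is that ncells_bydataset reports the cell count of each data set and ncells their total, with annotated_only restricting both to data sets that carry cell type annotation. `DatasetInfo` and both functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DatasetInfo:
    # Hypothetical per-data-set summary; not a sfaira class.
    n_obs: int
    annotated: bool


def ncells_bydataset_sketch(
    datasets: Dict[str, DatasetInfo], annotated_only: bool = False
) -> Dict[str, int]:
    """Cell count per data set, optionally only for annotated data sets."""
    return {
        key: ds.n_obs
        for key, ds in datasets.items()
        if ds.annotated or not annotated_only
    }


def ncells_sketch(
    datasets: Dict[str, DatasetInfo], annotated_only: bool = False
) -> int:
    """Total cell count across the (optionally filtered) data sets."""
    return sum(ncells_bydataset_sketch(datasets, annotated_only).values())


group = {"a": DatasetInfo(100, True), "b": DatasetInfo(50, False)}
print(ncells_sketch(group))                       # 150
print(ncells_sketch(group, annotated_only=True))  # 100
```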

obs_concat([keys])

Returns concatenation of all .obs.
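A minimal sketch of concatenating per-data-set observation tables, representing each .obs as a list of row dicts rather than a pandas DataFrame. It assumes that `keys` selects the columns to keep and that a column missing from a data set is filled with None; adding a `dataset_id` column to record provenance is also an assumption of this sketch. All names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def obs_concat_sketch(
    obs_by_dataset: Dict[str, List[dict]],
    keys: Optional[Sequence[str]] = None,
) -> List[dict]:
    """Stack all observation rows, optionally keeping only `keys`."""
    rows = []
    for dataset_id, obs in obs_by_dataset.items():
        for row in obs:
            selected = dict(row) if keys is None else {k: row.get(k) for k in keys}
            # Record which data set each cell came from.
            rows.append({**selected, "dataset_id": dataset_id})
    return rows


obs = {
    "d1": [{"cell_type": "T cell", "organ": "blood"}],
    "d2": [{"cell_type": "B cell"}],
}
for row in obs_concat_sketch(obs, keys=["organ"]):
    print(row)
# {'organ': 'blood', 'dataset_id': 'd1'}
# {'organ': None, 'dataset_id': 'd2'}
```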

project_celltypes_to_ontology([...])

Project free text cell type names to ontology based on mapping table.
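A minimal sketch of projecting free text cell type labels onto ontology classes with a mapping table. The table is a plain dict here, and returning unmapped labels unchanged is an assumption of this sketch rather than documented behaviour. The function name and the free text labels are made up for illustration.

```python
from typing import Dict, List


def project_to_ontology_sketch(
    labels: List[str], class_map: Dict[str, str]
) -> List[str]:
    """Replace each free text label by its mapped ontology class.

    Labels without an entry in `class_map` are returned unchanged.
    """
    return [class_map.get(label, label) for label in labels]


class_map = {"T cells": "T cell", "Bcell": "B cell"}
print(project_to_ontology_sketch(["T cells", "Bcell", "unknown"], class_map))
# ['T cell', 'B cell', 'unknown']
```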

show_summary()

streamline_features([match_to_release, ...])

Subset and sort genes to the genes defined in an assembly, or to genes of a particular type such as protein coding.

:param match_to_release: Which genome annotation release to map the feature space to. Note that Ensembl assemblies are usually named as Organism.Assembly.Release; this is the Release string. Can be:
    - str: Provide the name of the release.
    - dict: Mapping of organism to name of the release (see str format). Chooses the release for each data set based on its organism annotation.
:param remove_gene_version: Whether to remove the version number after the period sometimes found in Ensembl gene IDs.
:param subset_genes_to_type: Type(s) to subset to. Can be a single type, a list of types, or None. Types can be:
    - None: All genes in the assembly.
    - "protein_coding": All protein coding genes in the assembly.
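The options of streamline_features can be illustrated with a small sketch. `resolve_release` mirrors the documented str and dict forms of match_to_release, `strip_gene_version` drops an Ensembl version suffix, and `subset_by_type` keeps genes of the requested type(s). The function names, the gene-to-type table, the organism keys and the release strings are all hypothetical; this is not the sfaira implementation.

```python
from typing import Dict, List, Sequence, Union


def resolve_release(match_to_release: Union[str, Dict[str, str]], organism: str) -> str:
    """Return the release for one data set from a str or an organism -> release dict."""
    if isinstance(match_to_release, str):
        return match_to_release
    return match_to_release[organism]


def strip_gene_version(gene_id: str) -> str:
    """Drop a version suffix such as '.15' from an Ensembl gene ID."""
    return gene_id.split(".")[0]


def subset_by_type(
    genes: Dict[str, str], types: Union[None, str, Sequence[str]]
) -> List[str]:
    """Keep gene IDs whose type is in `types`; None keeps every gene."""
    if types is None:
        return list(genes)
    if isinstance(types, str):
        types = [types]
    return [gene for gene, gene_type in genes.items() if gene_type in types]


releases = {"human": "104", "mouse": "104"}
print(resolve_release(releases, "mouse"))         # 104
print(strip_gene_version("ENSG00000139618.15"))   # ENSG00000139618
genes = {"ENSG01": "protein_coding", "ENSG02": "lncRNA"}
print(subset_by_type(genes, "protein_coding"))    # ['ENSG01']
```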

streamline_metadata([schema, clean_obs, ...])

Streamline the adata instance in each data set to output format.

subset(key, values)

Subset list of adata objects based on sample-wise properties.

subset_cells(key, values)

Subset list of adata objects based on cell-wise properties.
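The difference between subset (sample-wise) and subset_cells (cell-wise) can be sketched with plain Python structures: a sample-wise filter keeps or drops whole data sets by a data-set-level property, while a cell-wise filter keeps only the matching cells inside each data set. The `Dataset` dataclass and both functions are hypothetical, and dropping data sets left with no cells is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class Dataset:
    # Hypothetical: data-set-level properties plus per-cell annotation rows.
    props: Dict[str, str]
    obs: List[Dict[str, str]] = field(default_factory=list)


def subset_sketch(
    datasets: Dict[str, Dataset], key: str, values: Sequence[str]
) -> Dict[str, Dataset]:
    """Keep whole data sets whose property `key` is in `values`."""
    return {k: ds for k, ds in datasets.items() if ds.props.get(key) in values}


def subset_cells_sketch(
    datasets: Dict[str, Dataset], key: str, values: Sequence[str]
) -> Dict[str, Dataset]:
    """Keep only cells whose annotation `key` is in `values`; drop empty data sets."""
    result = {}
    for k, ds in datasets.items():
        kept = [cell for cell in ds.obs if cell.get(key) in values]
        if kept:
            result[k] = Dataset(props=ds.props, obs=kept)
    return result


group = {
    "d1": Dataset({"organ": "lung"}, [{"cell_type": "T cell"}, {"cell_type": "B cell"}]),
    "d2": Dataset({"organ": "liver"}, [{"cell_type": "hepatocyte"}]),
}
print(list(subset_sketch(group, "organ", ["lung"])))           # ['d1']
print(subset_cells_sketch(group, "cell_type", ["T cell"])["d1"].obs)
# [{'cell_type': 'T cell'}]
```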

write_backed(adata_backed, genome, idx[, ...])

Load the data set group into a slice of a backed anndata object.

write_distributed_store(dir_cache[, ...])

Write data set into a format that allows distributed access to data set on disk.

write_ontology_class_maps(fn, attrs[, ...])

Write cell type maps of free text cell types to ontology classes.
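A minimal sketch of writing a map of free text cell types to ontology classes as a tab-separated file with the standard csv module. The header names `source` and `target` and the function name are assumptions for illustration, not necessarily the file layout sfaira uses.

```python
import csv
import os
import tempfile
from typing import Dict


def write_class_map_sketch(fn: str, class_map: Dict[str, str]) -> None:
    """Write one row per free text label with its ontology class, sorted by label."""
    with open(fn, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(["source", "target"])  # hypothetical header
        for label, ontology_class in sorted(class_map.items()):
            writer.writerow([label, ontology_class])


path = os.path.join(tempfile.mkdtemp(), "celltype_map.tsv")
write_class_map_sketch(path, {"T cells": "T cell", "Bcell": "B cell"})
```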