sfaira.data.DatasetBase
class sfaira.data.DatasetBase(data_path: Optional[str] = None, meta_path: Optional[str] = None, cache_path: Optional[str] = None, load_func=None, dict_load_func_annotation=None, yaml_path: Optional[str] = None, sample_fn: Optional[str] = None, sample_fns: Optional[List[str]] = None, additional_annotation_key: Optional[str] = None, **kwargs)
Attributes
Return all information necessary to cite the data set.
All publication DOIs associated with the study, namely the journal publication and the preprint.
The preprint publication (secondary) DOI associated with the study.
The main DOI associated with the study which is the journal publication if available, otherwise the preprint.
The journal publication (main) DOI associated with the study.
Data download website(s).
Meta data download website(s).
Whether the data set was loaded into memory.
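The relationship between the main, journal and preprint DOIs described above can be sketched as follows. This is a stand-alone illustration, not sfaira code: the function names `pick_main_doi` and `all_dois` and the DOI strings are hypothetical placeholders.

```python
from typing import List, Optional


def pick_main_doi(doi_journal: Optional[str], doi_preprint: Optional[str]) -> Optional[str]:
    """Return the journal DOI if available, otherwise the preprint DOI."""
    return doi_journal if doi_journal is not None else doi_preprint


def all_dois(doi_journal: Optional[str], doi_preprint: Optional[str]) -> List[str]:
    """Return every DOI that is set, journal publication first."""
    return [d for d in (doi_journal, doi_preprint) if d is not None]


# Placeholder DOIs, used only for illustration.
journal = "10.1000/example-journal"
preprint = "10.1101/example-preprint"

main_with_journal = pick_main_doi(journal, preprint)
main_preprint_only = pick_main_doi(None, preprint)
```

With both DOIs set, the journal DOI is treated as the main one; when only a preprint exists, the preprint takes that role.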
Methods
clear()
Remove loaded .adata to reduce memory footprint.

Collapse count matrix along duplicated index.

download(**kwargs)

get_ontology(k)

load([load_raw, allow_caching])
param remove_gene_version: Remove gene version string from ENSEMBL ID so that different versions in different data sets are superimposed.

load_meta(fn)

project_free_to_ontology(attr)
Project free text cell type names to ontology based on mapping table.

Load class maps of free text class labels to ontology classes.

set_dataset_id([idx])

streamline_features([match_to_release, ...])
Subset and sort genes to genes defined in an assembly or genes of a particular type, such as protein coding.

streamline_metadata([schema, clean_obs, ...])
Streamline the adata instance to a defined output schema.

subset_cells(key, values)
Subset list of adata objects based on cell-wise properties.

write_distributed_store(dir_cache[, ...])
Write data set into a format that allows distributed access to data set on disk.

write_meta([fn_meta, dir_out])
Write meta data object for data set.

write_ontology_class_maps(fn, attrs[, ...])
Load class maps of ontology-controlled field to ontology classes.
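Two of the descriptions above, removing the gene version string from ENSEMBL IDs and collapsing the count matrix along a duplicated index, are closely related: once version suffixes are dropped, several columns can share one gene ID. The sketch below illustrates the idea in plain Python. It is not sfaira's implementation; the helper names are hypothetical, and summing duplicated columns is an assumption made here for illustration.

```python
from typing import Dict, List, Tuple


def strip_gene_version(ensembl_id: str) -> str:
    """Drop a trailing '.<version>' suffix from an ENSEMBL ID."""
    return ensembl_id.split(".", 1)[0]


def collapse_duplicated_genes(
    gene_ids: List[str], counts: List[List[int]]
) -> Tuple[List[str], List[List[int]]]:
    """Sum the columns of a cells x genes matrix that share a gene ID.

    Assumption: duplicates are merged by summation.
    """
    # Map each unique ID to its output column, keeping first-seen order.
    column_of: Dict[str, int] = {}
    for gid in gene_ids:
        column_of.setdefault(gid, len(column_of))

    collapsed = [[0] * len(column_of) for _ in counts]
    for i, row in enumerate(counts):
        for gid, value in zip(gene_ids, row):
            collapsed[i][column_of[gid]] += value
    return list(column_of), collapsed


# Two versions of the same gene plus one other gene; two cells.
versioned_ids = ["ENSG00000141510.17", "ENSG00000141510.18", "ENSG00000012048.23"]
counts = [[1, 2, 3], [0, 4, 5]]

ids, collapsed = collapse_duplicated_genes(
    [strip_gene_version(g) for g in versioned_ids], counts
)
```

Here the first two columns end up under the same unversioned ID, so their counts are merged into one column per cell.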