sfaira.data.DatasetBase

class sfaira.data.DatasetBase(data_path: Optional[str] = None, meta_path: Optional[str] = None, cache_path: Optional[str] = None, load_func=None, dict_load_func_annotation=None, yaml_path: Optional[str] = None, sample_fn: Optional[str] = None, sample_fns: Optional[List[str]] = None, additional_annotation_key: Optional[str] = None, **kwargs)

Attributes

`additional_annotation_key`
`annotated`
`assay_differentiation`
`assay_sc`
`assay_type_differentiation`
`author`
`bio_sample`
`bio_sample_obs_key`
`cache_fn`
`cell_line`
`cell_type`
`celltypes_universe`
`citation`	Return all information necessary to cite the data set.
`data_dir`
`default_embedding`
`development_stage`
`directory_formatted_doi`
`disease`
`doi`	All publication DOIs associated with the study: the journal publication and the preprint.
`doi_cleaned_id`
`doi_journal`	The journal publication (main) DOI associated with the study.
`doi_main`	The main DOI associated with the study, which is the journal publication if available, otherwise the preprint.
`doi_preprint`	The preprint publication (secondary) DOI associated with the study.
`download_url_data`	Data download website(s).
`download_url_meta`	Meta data download website(s).
`ethnicity`
`feature_reference`
`feature_type`
`id`
`individual`
`loaded`	Whether the data set was loaded into memory.
`meta`
`meta_fn`
`ncells`
`ontology_class_maps`
`organ`
`organism`
`primary_data`
`sample_source`
`sex`
`source`
`source_doi`
`state_exact`
`tech_sample`
`tech_sample_obs_key`
`title`
`year`
`adata`
`data_dir_base`
`meta_path`
`cache_path`
`genome`
`supplier`
`layer_counts`
`layer_processed`
`layer_spliced_counts`
`layer_spliced_processed`
`layer_unspliced_counts`
`layer_unspliced_processed`
`layer_velocity`
`gm`
`treatment`
`assay_sc_obs_key`
`assay_differentiation_obs_key`
`assay_type_differentiation_obs_key`
`cell_type_obs_key`
`development_stage_obs_key`
`disease_obs_key`
`ethnicity_obs_key`
`gm_obs_key`
`individual_obs_key`
`organ_obs_key`
`organism_obs_key`
`sample_source_obs_key`
`sex_obs_key`
`source_doi_obs_key`
`state_exact_obs_key`
`treatment_obs_key`
`feature_id_var_key`
`feature_reference_var_key`
`feature_symbol_var_key`
`feature_type_var_key`
`spatial_x_coord_obs_key`
`spatial_y_coord_obs_key`
`spatial_z_coord_obs_key`
`vdj_vj_1_obs_key_prefix`
`vdj_vj_2_obs_key_prefix`
`vdj_vdj_1_obs_key_prefix`
`vdj_vdj_2_obs_key_prefix`
`vdj_c_call_obs_key_suffix`
`vdj_consensus_count_obs_key_suffix`
`vdj_d_call_obs_key_suffix`
`vdj_duplicate_count_obs_key_suffix`
`vdj_j_call_obs_key_suffix`
`vdj_junction_obs_key_suffix`
`vdj_junction_aa_obs_key_suffix`
`vdj_locus_obs_key_suffix`
`vdj_productive_obs_key_suffix`
`vdj_v_call_obs_key_suffix`
`load_raw`
`mapped_features`
`remove_gene_version`
`subset_gene_type`
`streamlined_meta`
`sample_fn`
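
The relationship between the three DOI attributes described above (`doi_main` is the journal publication if available, otherwise the preprint) can be sketched in plain Python. This is only an illustration of the stated rule, not sfaira's implementation: the helper names `pick_main_doi` and `all_dois` are hypothetical, and the DOI strings are made-up placeholders.

```python
from typing import List, Optional


def pick_main_doi(doi_journal: Optional[str], doi_preprint: Optional[str]) -> Optional[str]:
    """Hypothetical helper: prefer the journal DOI, fall back to the preprint DOI."""
    return doi_journal if doi_journal else doi_preprint


def all_dois(doi_journal: Optional[str], doi_preprint: Optional[str]) -> List[str]:
    """Hypothetical helper: collect every DOI that is actually set."""
    return [d for d in (doi_journal, doi_preprint) if d]


# Placeholder DOIs, not real publications.
journal = "10.1000/journal.example"
preprint = "10.1101/preprint.example"

main_with_journal = pick_main_doi(journal, preprint)   # journal DOI wins
main_preprint_only = pick_main_doi(None, preprint)     # falls back to preprint
both = all_dois(journal, preprint)
```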

Methods

`clear`()	Remove loaded .adata to reduce memory footprint.
`collapse_counts`()	Collapse count matrix along duplicated index.
`download`(**kwargs)
`get_ontology`(k)
`load`([load_raw, allow_caching])	Parameter `remove_gene_version`: remove the gene version string from ENSEMBL IDs so that different versions in different data sets are superimposed.
`load_meta`(fn)
`project_free_to_ontology`(attr)	Project free text cell type names to ontology based on mapping table.
`read_ontology_class_maps`(fns)	Load class maps of free text class labels to ontology classes.
`set_dataset_id`([idx])
`show_summary`()
`streamline_features`([match_to_release, ...])	Subset and sort genes to genes defined in an assembly or genes of a particular type, such as protein coding.
`streamline_metadata`([schema, clean_obs, ...])	Streamline the adata instance to a defined output schema.
`subset_cells`(key, values)	Subset list of adata objects based on cell-wise properties.
`write_distributed_store`(dir_cache[, ...])	Write data set into a format that allows distributed access to data set on disk.
`write_meta`([fn_meta, dir_out])	Write meta data object for data set.
`write_ontology_class_maps`(fn, attrs[, ...])	Write class maps of ontology-controlled fields to ontology classes.
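
Two of the summaries above describe operations that naturally chain: removing the gene version string from ENSEMBL IDs, and collapsing a count matrix along a duplicated index. The sketch below illustrates the idea with the standard library only. It does not call sfaira, the function names are hypothetical, the version suffixes are illustrative, and it assumes that "collapse" means summing the counts of columns that share an ID.

```python
from typing import Dict, List, Tuple


def strip_gene_version(ensembl_id: str) -> str:
    """Drop a trailing '.<version>' suffix from an ENSEMBL ID."""
    return ensembl_id.split(".", 1)[0]


def collapse_by_id(ids: List[str], counts: List[List[int]]) -> Tuple[List[str], List[List[int]]]:
    """Sum the columns of a cells x genes matrix that share an ID.

    Assumption: duplicates are merged by summation; first-seen order is kept.
    """
    order: List[str] = []
    columns: Dict[str, List[int]] = {}
    for j, gene_id in enumerate(ids):
        if gene_id not in columns:
            columns[gene_id] = []
            order.append(gene_id)
        columns[gene_id].append(j)
    collapsed = [[sum(row[j] for j in columns[g]) for g in order] for row in counts]
    return order, collapsed


# Two versions of the same gene ID plus one other gene (versions are illustrative).
ids = ["ENSG00000141510.17", "ENSG00000141510.18", "ENSG00000012048.23"]
counts = [[1, 2, 3],
          [0, 4, 5]]

stripped = [strip_gene_version(i) for i in ids]
unique_ids, collapsed = collapse_by_id(stripped, counts)
```

After stripping, the first two columns carry the same ID, so collapsing leaves one column per gene.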