sfaira.data.DatasetBase

class sfaira.data.DatasetBase(data_path: Optional[str] = None, meta_path: Optional[str] = None, cache_path: Optional[str] = None, load_func=None, dict_load_func_annotation=None, yaml_path: Optional[str] = None, sample_fn: Optional[str] = None, sample_fns: Optional[List[str]] = None, additional_annotation_key: Optional[str] = None, **kwargs)

Attributes

additional_annotation_key

annotated

assay_differentiation

assay_sc

assay_type_differentiation

author

bio_sample

bio_sample_obs_key

cache_fn

cell_line

cell_type

celltypes_universe

citation

Return all information necessary to cite the data set.

data_dir

default_embedding

development_stage

directory_formatted_doi

disease

doi

All publication DOIs associated with the study, i.e. the journal publication and the preprint.

doi_cleaned_id

doi_journal

The journal publication (main) DOI associated with the study.

doi_main

The main DOI associated with the study, which is the journal publication if available, otherwise the preprint.

doi_preprint

The preprint publication (secondary) DOI associated with the study.

download_url_data

Data download website(s).

download_url_meta

Meta data download website(s).

ethnicity

feature_reference

feature_type

id

individual

loaded

Whether the data set was loaded into memory.

meta

meta_fn

ncells

ontology_class_maps

organ

organism

primary_data

sample_source

sex

source

source_doi

state_exact

tech_sample

tech_sample_obs_key

title

year

adata

data_dir_base

meta_path

cache_path

genome

supplier

layer_counts

layer_processed

layer_spliced_counts

layer_spliced_processed

layer_unspliced_counts

layer_unspliced_processed

layer_velocity

gm

treatment

assay_sc_obs_key

assay_differentiation_obs_key

assay_type_differentiation_obs_key

cell_type_obs_key

development_stage_obs_key

disease_obs_key

ethnicity_obs_key

gm_obs_key

individual_obs_key

organ_obs_key

organism_obs_key

sample_source_obs_key

sex_obs_key

source_doi_obs_key

state_exact_obs_key

treatment_obs_key

feature_id_var_key

feature_reference_var_key

feature_symbol_var_key

feature_type_var_key

spatial_x_coord_obs_key

spatial_y_coord_obs_key

spatial_z_coord_obs_key

vdj_vj_1_obs_key_prefix

vdj_vj_2_obs_key_prefix

vdj_vdj_1_obs_key_prefix

vdj_vdj_2_obs_key_prefix

vdj_c_call_obs_key_suffix

vdj_consensus_count_obs_key_suffix

vdj_d_call_obs_key_suffix

vdj_duplicate_count_obs_key_suffix

vdj_j_call_obs_key_suffix

vdj_junction_obs_key_suffix

vdj_junction_aa_obs_key_suffix

vdj_locus_obs_key_suffix

vdj_productive_obs_key_suffix

vdj_v_call_obs_key_suffix

load_raw

mapped_features

remove_gene_version

subset_gene_type

streamlined_meta

sample_fn
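The relationship between the DOI attributes described above (doi, doi_journal, doi_preprint, doi_main) can be illustrated with a minimal sketch. The helper names pick_main_doi and all_dois and the example DOI strings are hypothetical and are not part of sfaira; the sketch only mirrors the documented rule that the main DOI is the journal publication if available, otherwise the preprint.

```python
from typing import List, Optional


def pick_main_doi(doi_journal: Optional[str], doi_preprint: Optional[str]) -> Optional[str]:
    """Hypothetical helper: prefer the journal DOI, fall back to the preprint DOI."""
    return doi_journal if doi_journal else doi_preprint


def all_dois(doi_journal: Optional[str], doi_preprint: Optional[str]) -> List[str]:
    """Hypothetical helper: collect every DOI that is actually set."""
    return [d for d in (doi_journal, doi_preprint) if d]


# Made-up DOI strings, used for illustration only.
journal = "10.1000/example.journal"
preprint = "10.1101/example.preprint"

main_with_journal = pick_main_doi(journal, preprint)   # journal DOI wins
main_preprint_only = pick_main_doi(None, preprint)     # falls back to the preprint
every_doi = all_dois(journal, preprint)                # both DOIs, journal first
```

A data set that only has a preprint would therefore report the preprint DOI as its main DOI under this reading.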

Methods

clear()

Remove loaded .adata to reduce memory footprint.

collapse_counts()

Collapse count matrix along duplicated index.

download(**kwargs)

get_ontology(k)

load([load_raw, allow_caching])

Parameter remove_gene_version: Remove the gene version string from ENSEMBL IDs so that different versions in different data sets are superimposed.

load_meta(fn)

project_free_to_ontology(attr)

Project free text cell type names to ontology based on mapping table.

read_ontology_class_maps(fns)

Load class maps of free text class labels to ontology classes.

set_dataset_id([idx])

show_summary()

streamline_features([match_to_release, ...])

Subset and sort genes to genes defined in an assembly or genes of a particular type, such as protein coding.

streamline_metadata([schema, clean_obs, ...])

Streamline the adata instance to a defined output schema.

subset_cells(key, values)

Subset list of adata objects based on cell-wise properties.

write_distributed_store(dir_cache[, ...])

Write data set into a format that allows distributed access to data set on disk.

write_meta([fn_meta, dir_out])

Write meta data object for data set.

write_ontology_class_maps(fn, attrs[, ...])

Write class maps of ontology-controlled fields to ontology classes.
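Two of the summaries above, removing the gene version from ENSEMBL IDs and collapsing a count matrix along a duplicated index, describe simple transformations that a short sketch can make concrete. The code below is plain Python and does not call sfaira; the function names strip_gene_version and collapse_duplicated are hypothetical, the version numbers in the example IDs are arbitrary, and the sketch assumes that counts of duplicated features are summed, which may differ from what the library does internally.

```python
from typing import Dict, List, Tuple


def strip_gene_version(ensembl_id: str) -> str:
    """Drop the version suffix, e.g. 'ENSG00000141510.17' -> 'ENSG00000141510'."""
    return ensembl_id.split(".", 1)[0]


def collapse_duplicated(ids: List[str], counts: List[List[int]]) -> Tuple[List[str], List[List[int]]]:
    """Sum the columns of a cells-by-genes matrix that share the same feature ID.

    Assumption: duplicates are merged by summation; first-seen order is kept.
    """
    order: List[str] = []
    position: Dict[str, int] = {}
    for gene_id in ids:
        if gene_id not in position:
            position[gene_id] = len(order)
            order.append(gene_id)
    collapsed = [[0] * len(order) for _ in counts]
    for row_idx, row in enumerate(counts):
        for gene_id, value in zip(ids, row):
            collapsed[row_idx][position[gene_id]] += value
    return order, collapsed


# Two versions of the same gene plus one other gene; two cells (rows).
raw_ids = ["ENSG00000141510.16", "ENSG00000141510.17", "ENSG00000012048.23"]
raw_counts = [[1, 2, 3], [0, 4, 5]]

stripped_ids = [strip_gene_version(g) for g in raw_ids]
unique_ids, collapsed_counts = collapse_duplicated(stripped_ids, raw_counts)
```

After stripping versions, the first two columns refer to the same gene and are merged, so the example ends with two features instead of three.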