sfaira.data.DatasetSuperGroup.streamline_features
- DatasetSuperGroup.streamline_features(match_to_release: Optional[Union[str, Dict[str, str]]] = None, remove_gene_version: bool = True, subset_genes_to_type: Union[None, str, List[str]] = None, schema: Optional[str] = None)
Subset and sort genes to those defined in an assembly, or to genes of a particular type such as protein coding.
- Parameters
match_to_release –
Which genome annotation release to map the feature space to. Ensembl assemblies are usually named Organism.Assembly.Release; this argument takes the Release string. Can be:
str: Name of the release.
dict: Mapping of organism to release name (see str format). The release for each data set is chosen based on its organism annotation.
remove_gene_version –
Whether to remove the version number that sometimes follows the gene ID in Ensembl identifiers (written after a period, as in <gene_id>.<version>).
subset_genes_to_type –
Type(s) to subset to. Can be a single type or a list of types or None. Types can be:
None: All genes in assembly.
"protein_coding": All protein coding genes in assembly.
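The following is a minimal, self-contained sketch of the behaviour the parameters above describe. It is not sfaira's implementation. The helper names (resolve_release, strip_gene_version, streamline_gene_ids), the toy gene IDs, the release strings, and the "lncRNA" type label are all assumptions made for illustration. How genes missing from a data set are handled is not stated on this page, so the sketch only reports them as None.

```python
from typing import Dict, List, Optional, Tuple, Union


def resolve_release(
    match_to_release: Optional[Union[str, Dict[str, str]]],
    organism: str,
) -> Optional[str]:
    """Pick the release string for one data set (hypothetical helper)."""
    # A plain string (or None) applies to every data set unchanged.
    if match_to_release is None or isinstance(match_to_release, str):
        return match_to_release
    # A dict is looked up by the data set's organism annotation.
    return match_to_release[organism]


def strip_gene_version(gene_id: str) -> str:
    """Drop a trailing version suffix such as '.4' from a gene ID."""
    return gene_id.split(".", 1)[0]


def streamline_gene_ids(
    observed: List[str],
    reference: Dict[str, str],
    remove_gene_version: bool = True,
    subset_genes_to_type: Union[None, str, List[str]] = None,
) -> List[Tuple[str, Optional[int]]]:
    """Return the target genes in reference order (hypothetical helper).

    `reference` maps gene ID to gene type and stands in for an assembly.
    Each result pairs a target gene with its index in `observed`,
    or None if the data set does not contain that gene.
    """
    # Normalise the type filter: None keeps every gene in the reference.
    if subset_genes_to_type is None:
        allowed = None
    elif isinstance(subset_genes_to_type, str):
        allowed = {subset_genes_to_type}
    else:
        allowed = set(subset_genes_to_type)

    if remove_gene_version:
        observed = [strip_gene_version(g) for g in observed]

    # Remember the first position of each observed gene.
    position: Dict[str, int] = {}
    for i, gene in enumerate(observed):
        position.setdefault(gene, i)

    # Dicts preserve insertion order, so this follows the reference order.
    target = [g for g, t in reference.items() if allowed is None or t in allowed]
    return [(g, position.get(g)) for g in target]


# Toy inputs, invented for this example only.
toy_reference = {
    "ENSG_TOY_1": "protein_coding",
    "ENSG_TOY_2": "protein_coding",
    "ENSG_TOY_3": "lncRNA",
}
toy_observed = ["ENSG_TOY_3.1", "ENSG_TOY_1.4"]

coding_only = streamline_gene_ids(
    toy_observed, toy_reference, subset_genes_to_type="protein_coding"
)
all_genes = streamline_gene_ids(toy_observed, toy_reference)
```

In this sketch, passing subset_genes_to_type=None keeps every gene of the toy reference, while "protein_coding" keeps only the two genes with that type. The output order always follows the reference, not the input data.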