sfaira.data.DatasetGroupDirectoryOriented.streamline_features

DatasetGroupDirectoryOriented.streamline_features(match_to_release: Optional[Union[str, Dict[str, str]]] = None, remove_gene_version: bool = True, subset_genes_to_type: Union[None, str, List[str]] = None, schema: Optional[str] = None)

Subset and sort the feature space to the genes defined in a genome assembly, or to genes of a particular type such as protein-coding genes.

Parameters

match_to_release

Which genome annotation release to map the feature space to. Note that Ensembl assemblies are usually named as Organism.Assembly.Release; this is the Release string. Can be:

  • str: Name of the release.

  • dict: Mapping of organism to release name (in the str format). The release is chosen for each data set based on its organism annotation.

remove_gene_version

Whether to remove the version suffix after the period that is sometimes found in Ensembl gene IDs.
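The per-data-set behaviour of match_to_release and remove_gene_version can be illustrated with a minimal sketch. It assumes that a str release applies to every data set, that a dict is looked up by the data set's organism, and that a gene version is the ".N" suffix of an Ensembl ID. The helpers resolve_release and strip_gene_version, the organism keys, and the release string "104" are hypothetical and illustrative only; they are not part of the sfaira API.

```python
from typing import Dict, List, Optional, Union


def resolve_release(
    match_to_release: Optional[Union[str, Dict[str, str]]],
    organism: str,
) -> Optional[str]:
    """Pick the annotation release for one data set (hypothetical helper)."""
    if match_to_release is None or isinstance(match_to_release, str):
        # A single release name (or None) applies to every data set.
        return match_to_release
    # A dict maps organism -> release; an unknown organism raises KeyError here.
    return match_to_release[organism]


def strip_gene_version(gene_ids: List[str]) -> List[str]:
    """Drop the '.N' version suffix from Ensembl gene IDs (hypothetical helper)."""
    return [gid.split(".")[0] for gid in gene_ids]


# Illustrative organism keys and release names, not sfaira's actual values.
releases = {"Homo sapiens": "104", "Mus musculus": "104"}
print(resolve_release(releases, "Mus musculus"))  # 104
print(resolve_release("104", "Homo sapiens"))  # 104
print(strip_gene_version(["ENSG00000141510.16", "ENSMUSG00000059552"]))
```

Using a dict lets a group that mixes organisms map each data set to a release of its own organism, while a plain string is the shorthand for single-organism groups.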

subset_genes_to_type

Type(s) to subset to. Can be a single type, a list of types, or None. Types can be:

  • None: All genes in assembly.

  • "protein_coding": All protein-coding genes in assembly.
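A toy sketch of subsetting and sorting by gene type follows. It assumes the assembly can be represented as an ordered mapping from gene ID to gene biotype (a stand-in for the annotation parsed from an Ensembl GTF), that the target feature order follows the assembly, and that genes absent from a data set are filled with zeros. The helpers target_features and streamline_row and the toy assembly are hypothetical, not sfaira's implementation.

```python
from typing import Dict, List, Union


def target_features(
    assembly: Dict[str, str],
    subset_genes_to_type: Union[None, str, List[str]] = None,
) -> List[str]:
    """Gene IDs to keep, in assembly order (hypothetical helper)."""
    if subset_genes_to_type is None:
        # None: all genes in the assembly.
        return list(assembly)
    # Accept a single type or a list of types.
    if isinstance(subset_genes_to_type, str):
        types = {subset_genes_to_type}
    else:
        types = set(subset_genes_to_type)
    return [gid for gid, biotype in assembly.items() if biotype in types]


def streamline_row(counts: Dict[str, float], features: List[str]) -> List[float]:
    """Reorder one cell's counts to the target features (hypothetical helper).

    Genes missing from the data set are filled with 0.0 (an assumption);
    genes outside the target features are dropped.
    """
    return [counts.get(gid, 0.0) for gid in features]


# Toy assembly: gene ID -> biotype, in assembly order.
assembly = {
    "ENSG00000141510": "protein_coding",
    "ENSG00000228630": "lncRNA",
    "ENSG00000139618": "protein_coding",
}
features = target_features(assembly, "protein_coding")
print(features)  # ['ENSG00000141510', 'ENSG00000139618']
print(streamline_row({"ENSG00000139618": 3.0, "ENSG00000228630": 1.0}, features))  # [0.0, 3.0]
```

Deriving one shared feature list and reordering every data set onto it is what makes the data sets in a group column-compatible after streamlining.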