sfaira.data.DatasetGroupDirectoryOriented.load

DatasetGroupDirectoryOriented.load(annotated_only: bool = False, load_raw: bool = False, allow_caching: bool = True, processes: int = 1, func=None, kwargs_func: Optional[dict] = None, verbose: int = 0, **kwargs)

Load all data sets in the group (optionally only temporarily).

Note: This method automatically subsets the group to the data sets for which input files were found.
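The subsetting step can be pictured as a filter over the group's data sets. The sketch below is a hypothetical illustration, not sfaira's implementation: it assumes each data set is identified by an ID mapped to the list of input file paths its loader would read, and it treats a data set as available only if all of those files exist.

```python
import os
from typing import Dict, List


def subset_to_available(data_files: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Keep only data sets whose input files all exist on disk.

    `data_files` is a hypothetical mapping from data set ID to the file paths
    that data set's loader would read.
    """
    return {
        dataset_id: paths
        for dataset_id, paths in data_files.items()
        if all(os.path.isfile(p) for p in paths)
    }
```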

This method also allows temporarily loading data sets to execute a function on them (supply func). In this setting, each data set is removed from memory after the function has been executed on it.
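The temporary-loading pattern amounts to a load, apply, release loop. This is a minimal sketch under stated assumptions: `ToyDataset`, its `load`/`clear` methods and `load_temporarily` are hypothetical stand-ins, not sfaira classes or functions.

```python
from typing import Any, Callable, Dict, Optional


class ToyDataset:
    """Hypothetical stand-in for a data set: holds data only once loaded."""

    def __init__(self, dataset_id: str, n_cells: int):
        self.id = dataset_id
        self._n_cells = n_cells
        self.data: Optional[list] = None

    def load(self) -> None:
        self.data = list(range(self._n_cells))

    def clear(self) -> None:
        self.data = None


def load_temporarily(
    datasets: Dict[str, ToyDataset],
    func: Callable[..., Any],
    kwargs_func: Optional[dict] = None,
) -> Dict[str, Any]:
    """Load each data set, run func on it, then drop its data from memory."""
    kwargs_func = kwargs_func or {}
    results = {}
    for dataset_id, ds in datasets.items():
        ds.load()
        results[dataset_id] = func(ds, **kwargs_func)
        ds.clear()  # release the loaded data before moving to the next data set
    return results
```

Only one data set is held in memory at a time, which is the point of temporary loading when the whole group would not fit in memory at once.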

param annotated_only

param load_raw

param allow_caching

param processes

Number of processes to parallelise loading over. Uses Python multiprocessing if > 1, a plain for loop otherwise.
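The processes switch can be sketched as a dispatch between `multiprocessing.Pool.map` and a serial loop. `run_loading` and `fake_load` are hypothetical names used for illustration; the sketch does not reproduce sfaira's internal loading code.

```python
import multiprocessing
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_loading(load_one: Callable[[T], R], items: Sequence[T], processes: int = 1) -> List[R]:
    """Apply load_one to every item, using a process pool if processes > 1."""
    if processes > 1:
        # load_one must be a picklable, module-level function for Pool.map.
        with multiprocessing.Pool(processes=processes) as pool:
            return pool.map(load_one, items)
    # Serial fallback: an ordinary for loop in the current process.
    return [load_one(item) for item in items]


def fake_load(dataset_id: str) -> str:
    """Hypothetical loader used only for illustration."""
    return f"loaded {dataset_id}"
```

Note that with process-based parallelism, anything passed to the worker (including a func) generally has to be picklable, so module-level functions are safer than lambdas.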

param func

Function to run on loaded data sets. func should take a single positional argument, which is a Dataset instance, plus any keyword arguments supplied via kwargs_func. The return value can be empty:

    def func(dataset, **kwargs_func):
        # code manipulating dataset and generating output x
        return x
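A concrete func might compute a per-data-set summary. This sketch assumes the data set exposes an AnnData-like object as `dataset.adata` with an `n_obs` attribute; `count_cells` and its `min_cells` keyword are hypothetical, and a `SimpleNamespace` stands in for a real Dataset.

```python
from types import SimpleNamespace


def count_cells(dataset, min_cells: int = 0):
    """Return the number of observations, or None if below min_cells.

    Assumes `dataset.adata.n_obs` exists (AnnData-like object).
    """
    n_obs = dataset.adata.n_obs
    return n_obs if n_obs >= min_cells else None


# Stand-in object mimicking a loaded data set, for illustration only.
toy = SimpleNamespace(adata=SimpleNamespace(n_obs=120))
print(count_cells(toy, min_cells=100))  # prints 120
```

With a real group, such a function would be passed as `load(func=count_cells, kwargs_func={"min_cells": 100})`, so that `min_cells` reaches func as a keyword argument.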

param kwargs_func

Kwargs of func.

param verbose

Verbosity of the report on loading failures.

  • 0: no indication of failure

  • 1: warning indicating which data set failed

  • 2: as 1, with the error report included in the warning

  • 3: reporting as in 2, but aborts with OSError
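The four verbosity levels can be expressed as a small escalation ladder. `report_failure` is a hypothetical helper written to mirror the documented behaviour, not a function from sfaira.

```python
import warnings


def report_failure(dataset_id: str, err: Exception, verbose: int = 0) -> None:
    """Report a failed load according to the documented verbosity levels."""
    if verbose <= 0:
        return  # level 0: no indication of failure
    msg = f"data set {dataset_id} failed to load"
    if verbose >= 2:
        msg += f": {err!r}"  # level 2+: include the error report
    warnings.warn(msg)  # level 1+: warn about the failed data set
    if verbose >= 3:
        raise OSError(msg) from err  # level 3: abort after reporting
```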

Parameters
  • remove_gene_version – Remove gene version string from ENSEMBL ID so that different versions in different data sets are superimposed.

  • match_to_reference – Name of the reference genome, or False to keep the original feature space.

  • load_raw – Loads unprocessed version of data if available in data loader.

  • allow_caching – Whether to allow method to cache adata object for faster re-loading.
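The effect of remove_gene_version can be illustrated on Ensembl gene IDs, whose version is the suffix after the final dot (e.g. `ENSG00000139618.15`). `strip_ensembl_version` is a hypothetical helper; splitting on the first "." is an assumption that holds for standard Ensembl gene IDs.

```python
from typing import List


def strip_ensembl_version(gene_ids: List[str]) -> List[str]:
    """Drop the '.<version>' suffix so different versions map to one ID."""
    return [g.split(".")[0] for g in gene_ids]


# Two data sets annotated with different versions of the same gene.
a = strip_ensembl_version(["ENSG00000139618.15"])
b = strip_ensembl_version(["ENSG00000139618.17"])
print(set(a) & set(b))  # prints {'ENSG00000139618'}
```

After stripping, the two data sets share the feature `ENSG00000139618`, which is what allows their versions to be superimposed.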