API

Import sfaira as:

import sfaira

Data: data

Data loaders

The sfaira data zoo API.

Dataset classes used for development:

data.DatasetBase([data_path, meta_path, ...])

data.DatasetGroup(datasets[, collection_id])

Container class that co-manages multiple datasets and wraps Dataset() methods so that they do not need to be called directly.

data.DatasetGroupDirectoryOriented(file_base)

data.DatasetSuperGroup(dataset_groups)

Container for multiple DatasetGroup instances.

Interactive data class to use a loaded data object in the context of sfaira tools:

data.DatasetInteractive(data[, ...])

Dataset universe to interact with all data loader classes:

data.Universe([data_path, meta_path, ...])

Stores

We distinguish stores for a single feature space, for example a single organism, from stores for multiple feature spaces. Critically, data from multiple feature spaces can be represented as one data array per feature space. load_store represents a directory of datasets as an instance of a multi-feature space store and discovers all feature spaces present. This store can be subset to a single-feature space store if, for example, only data from a single organism is desired. The core API exposed to users is:

data.load_store(cache_path[, store_format, ...])

Instantiates a distributed store class.

Store classes for a single feature space:

data.StoreSingleFeatureSpace(adata_by_key, ...)

Data set group class tailored to data access requirements common in high-performance computing (HPC).

Store classes for multiple feature spaces:

data.StoreMultipleFeatureSpaceBase(stores)

Umbrella class for a dictionary over multiple instances of DistributedStoreSingleFeatureSpace.

data.StoresAnndata(adatas)

data.StoresDao(cache_path[, columns])

data.StoresH5ad(cache_path[, in_memory])
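The split between single and multiple feature space stores described above can be pictured with a toy sketch in plain Python. The classes ToySingleStore and ToyMultiStore, the method subset_to_organism and the gene names are hypothetical stand-ins invented for illustration; they are not sfaira classes and do not reproduce the load_store interface. The sketch only shows a dictionary of per-organism stores that can be narrowed to one feature space.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToySingleStore:
    """Hypothetical stand-in for a store over one feature space."""
    organism: str
    genes: List[str]  # the feature space shared by all datasets in this store
    datasets: Dict[str, List[List[float]]] = field(default_factory=dict)

    @property
    def n_obs(self) -> int:
        # Total number of observations (rows) across all datasets.
        return sum(len(x) for x in self.datasets.values())


@dataclass
class ToyMultiStore:
    """Hypothetical stand-in for a store over several feature spaces."""
    stores: Dict[str, ToySingleStore]

    @property
    def organisms(self) -> List[str]:
        # Feature spaces discovered in this collection.
        return sorted(self.stores)

    def subset_to_organism(self, organism: str) -> ToySingleStore:
        # Narrow the multi-feature space store to a single feature space.
        return self.stores[organism]


# Toy data: one data array per dataset, one feature space per organism.
human = ToySingleStore("human", ["GENE_A", "GENE_B"], {"ds1": [[1.0, 0.0], [0.0, 2.0]]})
mouse = ToySingleStore("mouse", ["Gene_a", "Gene_b", "Gene_c"], {"ds2": [[3.0, 1.0, 0.0]]})
multi = ToyMultiStore({"human": human, "mouse": mouse})

single = multi.subset_to_organism("human")
print(multi.organisms, single.n_obs)
```

Keeping one array per feature space avoids forcing genes from different organisms into a common matrix, which is why subsetting returns a store with its own gene list.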

Carts

Stores represent on-disk data collections and perform operations such as subsetting. Ultimately, they are often used to emit data objects, which are “carts”. Carts are specific to the underlying store’s data format and expose iterators, data matrices and adaptors to the data pipelines of machine learning frameworks, such as tensorflow and torch. Again, carts can cover one or multiple feature spaces.

data.store.carts.CartSingle(obs_idx, ...[, ...])

Cart for a DistributedStoreSingleFeatureSpace().

data.store.carts.CartMulti(carts[, intercalated])

Cart for a DistributedStoreMultipleFeatureSpaceBase().
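The idea of a cart as an object that emits batches, and of a multi-cart that combines several carts, can be sketched as follows. ToyCart and ToyCartMulti are hypothetical classes written for this illustration, not sfaira's implementation. The sketch also assumes that the intercalated flag means alternating emission between carts rather than exhausting one cart after another, which the text above does not spell out.

```python
from itertools import chain, zip_longest
from typing import Iterator, List, Sequence


class ToyCart:
    """Hypothetical cart: emits selected rows of an in-memory matrix in batches."""

    def __init__(self, x: Sequence[Sequence[float]], obs_idx: Sequence[int], batch_size: int):
        self.x = x
        self.obs_idx = list(obs_idx)
        self.batch_size = batch_size

    def iterator(self) -> Iterator[List[Sequence[float]]]:
        # Walk over the selected observation indices in fixed-size steps.
        for i in range(0, len(self.obs_idx), self.batch_size):
            yield [self.x[j] for j in self.obs_idx[i:i + self.batch_size]]


class ToyCartMulti:
    """Hypothetical cart over several carts, e.g. one per feature space."""

    def __init__(self, carts: List[ToyCart], intercalated: bool = False):
        self.carts = carts
        self.intercalated = intercalated

    def iterator(self) -> Iterator[List[Sequence[float]]]:
        its = [c.iterator() for c in self.carts]
        if not self.intercalated:
            # Sequential: exhaust each cart before moving to the next one.
            return chain(*its)
        # Assumed meaning of "intercalated": take one batch from each cart in turn.
        missing = object()
        return (b for group in zip_longest(*its, fillvalue=missing)
                for b in group if b is not missing)


x_a = [[0.0], [1.0], [2.0], [3.0]]
x_b = [[10.0], [11.0]]
cart_a = ToyCart(x_a, obs_idx=[0, 1, 2, 3], batch_size=2)
cart_b = ToyCart(x_b, obs_idx=[0, 1], batch_size=2)

sequential = list(ToyCartMulti([cart_a, cart_b]).iterator())
alternating = list(ToyCartMulti([cart_a, cart_b], intercalated=True).iterator())
print(sequential)
print(alternating)
```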

The emission of data from cart iterators and adaptors is controlled by batch schedules, which direct how data is released from the underlying data matrix:

data.store.batch_schedule.BatchDesignBase(...)

Manages distribution of selected indices for a given data object over subsequent batches.

data.store.batch_schedule.BatchDesignBasic(...)

Standard batched access to data.

data.store.batch_schedule.BatchDesignBalanced(...)

Balanced batches across metadata partitions of data.

data.store.batch_schedule.BatchDesignBlocks(...)

Metadata-defined blocks of observations in each batch.

data.store.batch_schedule.BatchDesignFull(...)

Emits full dataset as a single batch in each query.

For most purposes related to stochastic optimisation, BatchDesignBasic is chosen.
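To make the differences between these schedules concrete, here is a minimal sketch in plain Python. The function names (batches_basic, batches_balanced, batches_blocks, batches_full) and their arguments are invented for illustration and are not sfaira's implementation. They mimic one plausible reading of the descriptions above on a list of observation indices; in particular, the balanced variant simply picks a metadata partition uniformly before drawing an observation, and the blocks variant emits one batch per block.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def batches_basic(idx: Sequence[int], batch_size: int, seed: int = 0) -> List[List[int]]:
    """Shuffle all indices once, then cut them into consecutive batches."""
    rng = random.Random(seed)
    shuffled = list(idx)
    rng.shuffle(shuffled)
    return [shuffled[i:i + batch_size] for i in range(0, len(shuffled), batch_size)]


def _group_by_label(idx: Sequence[int], labels: Sequence[str]) -> Dict[str, List[int]]:
    # Collect observation indices per metadata partition.
    groups: Dict[str, List[int]] = defaultdict(list)
    for i in idx:
        groups[labels[i]].append(i)
    return groups


def batches_balanced(idx, labels, batch_size: int, n_batches: int, seed: int = 0):
    """Pick a partition uniformly for every draw, so rare partitions are over-sampled."""
    rng = random.Random(seed)
    groups = _group_by_label(idx, labels)
    keys = sorted(groups)
    return [[rng.choice(groups[rng.choice(keys)]) for _ in range(batch_size)]
            for _ in range(n_batches)]


def batches_blocks(idx, labels) -> List[List[int]]:
    """Emit one batch per metadata-defined block of observations."""
    groups = _group_by_label(idx, labels)
    return [groups[k] for k in sorted(groups)]


def batches_full(idx) -> List[List[int]]:
    """Emit all selected indices as a single batch."""
    return [list(idx)]


labels = ["T cell"] * 6 + ["B cell"] * 2  # toy metadata column, one label per observation
idx = list(range(len(labels)))

basic = batches_basic(idx, batch_size=3)
balanced = batches_balanced(idx, labels, batch_size=4, n_batches=5)
blocks = batches_blocks(idx, labels)
full = batches_full(idx)
print(len(basic), len(balanced), len(blocks), len(full))
```

The basic schedule visits every observation exactly once per pass, which is the usual requirement for epochs in stochastic optimisation; the balanced schedule samples with replacement and so does not give that guarantee.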

Estimator classes: estimators

Estimator classes from the sfaira model zoo API for advanced use.

estimators.EstimatorKeras()

Estimator base class for keras models.

estimators.EstimatorKerasCelltype(data, ...)

Estimator class for the cell type model.

estimators.EstimatorKerasEmbedding(data, ...)

Estimator class for the embedding model.

Model classes: models

Model classes from the sfaira model zoo API for advanced use.

Cell type models

Classes that wrap tensorflow cell type predictor models.

models.celltype.CellTypeMarker(in_dim, out_dim)

Marker gene-based cell type classifier: learns whether or not each gene exceeds the required threshold and learns cell type assignment as a linear combination of these marker gene presence probabilities.

models.celltype.CellTypeMlp(in_dim, out_dim)

Multi-layer perceptron to predict cell type.

models.celltype.CellTypeMlpVersioned(...[, ...])
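As a rough numerical illustration of the CellTypeMarker description above, the sketch below computes a forward pass in plain Python. The sigmoid gating, the softmax output and all parameter names (thresholds, sharpness, weights, bias) are assumptions chosen for illustration, and the parameter values are made up rather than trained; the actual layer definitions in sfaira may differ.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def softmax(z: List[float]) -> List[float]:
    # Subtract the maximum for numerical stability.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def marker_forward(x: List[float], thresholds: List[float], sharpness: List[float],
                   weights: List[List[float]], bias: List[float]) -> List[float]:
    # Step 1: per-gene probability that expression exceeds its (learnable) threshold.
    presence = [sigmoid(a * (xi - t)) for xi, t, a in zip(x, thresholds, sharpness)]
    # Step 2: each cell type score is a linear combination of presence probabilities.
    logits = [b + sum(w * p for w, p in zip(row, presence))
              for row, b in zip(weights, bias)]
    # Step 3: normalise scores to cell type probabilities.
    return softmax(logits)


# Toy example: three genes, two cell types; gene 0 marks type 0, gene 1 marks type 1.
x = [5.0, 0.0, 0.1]                 # expression of one cell
thresholds = [1.0, 1.0, 1.0]        # hypothetical values
sharpness = [4.0, 4.0, 4.0]         # hypothetical values
weights = [[3.0, 0.0, 0.0],
           [0.0, 3.0, 0.0]]
bias = [0.0, 0.0]

probs = marker_forward(x, thresholds, sharpness, weights, bias)
print(probs)
```

Because only gene 0 exceeds its threshold in this toy cell, the first cell type receives most of the probability mass.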

Embedding models

Classes that wrap tensorflow embedding models.

models.embedding.ModelKerasAe(in_dim[, ...])

Combines the encoder and decoder into an end-to-end model for training.

models.embedding.ModelAeVersioned(...[, ...])

models.embedding.ModelKerasVae(in_dim[, ...])

models.embedding.ModelVaeVersioned(...[, ...])

models.embedding.ModelKerasLinear(in_dim[, ...])

models.embedding.ModelLinearVersioned(...[, ...])

models.embedding.ModelKerasVaeIAF(in_dim[, ...])

models.embedding.ModelVaeIAFVersioned(...[, ...])

models.embedding.ModelKerasVaeVamp(in_dim[, ...])

models.embedding.ModelVaeVampVersioned(...)

Train: train

The interface for training sfaira-compatible models.

Trainer classes

Classes that wrap estimator classes to use in grid search training.

train.TrainModelCelltype(model_path, data, ...)

train.TrainModelEmbedding(model_path, data)

Grid search summaries

Classes to pool evaluation metrics across fits in a grid search.

train.GridsearchContainer(source_path, cv)

train.SummarizeGridsearchCelltype(...[, ...])

train.SummarizeGridsearchEmbedding(...[, ...])

Versions: versions

The interface for sfaira metadata management.

Genomes

Genome management.

versions.genomes.GenomeContainer([organism, ...])

Container class for a genome annotation for a specific release.

Metadata

Dataset metadata management. Base classes to manage ontology files:

versions.metadata.Ontology()

versions.metadata.OntologyList(terms, **kwargs)

Basic unordered ontology container.

versions.metadata.OntologyHierarchical()

Basic ordered ontology container.

versions.metadata.OntologyObo(obo, **kwargs)

versions.metadata.OntologyOboCustom(obo, ...)
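The difference between an unordered term list and a hierarchical ontology can be sketched in a few lines. ToyOntologyList and ToyOntologyHierarchical, together with their methods, are hypothetical names for this illustration and do not mirror the sfaira class interfaces; the three cell-type terms form a hand-written toy hierarchy rather than content loaded from an ontology file.

```python
from typing import Dict, List, Set


class ToyOntologyList:
    """Hypothetical flat container: only membership queries are possible."""

    def __init__(self, terms: List[str]):
        self.terms: Set[str] = set(terms)

    def contains(self, term: str) -> bool:
        return term in self.terms


class ToyOntologyHierarchical(ToyOntologyList):
    """Hypothetical hierarchical container: terms are linked to their parents."""

    def __init__(self, parents: Dict[str, List[str]]):
        super().__init__(list(parents))
        self.parents = parents

    def ancestors(self, term: str) -> Set[str]:
        # Walk upwards through the parent links and collect every term reached.
        seen: Set[str] = set()
        stack = list(self.parents[term])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(self.parents[node])
        return seen

    def is_descendant_or_self(self, query: str, reference: str) -> bool:
        return query == reference or reference in self.ancestors(query)


onto = ToyOntologyHierarchical({
    "cell": [],
    "lymphocyte": ["cell"],
    "T cell": ["lymphocyte"],
})
print(onto.contains("T cell"))
print(onto.is_descendant_or_self("T cell", "lymphocyte"))
print(onto.is_descendant_or_self("lymphocyte", "T cell"))
```

A hierarchy of this kind is what allows a coarse label and a fine label to be compared, which a flat list cannot express.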

Ontology-specific classes:

versions.metadata.OntologyCellosaurus([recache])

versions.metadata.OntologyCl(branch[, ...])

versions.metadata.OntologyHsapdv(branch[, ...])

versions.metadata.OntologyMondo(branch[, ...])

versions.metadata.OntologyMmusdv(branch[, ...])

versions.metadata.OntologyUberon(branch[, ...])

Class wrapping cell type ontology for predictor models:

versions.metadata.CelltypeUniverse(cl, ...)

Cell type universe (list) and ontology (hierarchy) container class.

Topologies

Model topology management.

versions.topologies.TopologyContainer(...[, ...])

Class interface for a YAML-style defined model topology that loads a genome container tailored to the model.

User interface: ui

This sub-module gives users access to the model zoo, including model query from remote servers. This API is designed to be used in analysis workflows and does not require any understanding of the way models are defined and stored.

ui.UserInterface([custom_repo, sfaira_repo, ...])

This class performs dataset handling and coordinates estimators for the different model types. The class documentation includes example code for obtaining a UMAP plot of the embedding created from your data, annotated with cell-type labels.