API¶
Import sfaira as:
import sfaira
Data: data¶
Data loaders¶
The sfaira data zoo API.
Classes representing datasets, used for development:
|
|
|
Container class that co-manages multiple data sets, removing the need to call Dataset() methods directly by wrapping them. |
|
|
|
Container for multiple DatasetGroup instances. |
Interactive data class to use a loaded data object in the context of sfaira tools:
|
Dataset universe to interact with all data loader classes:
|
Stores¶
We distinguish stores for a single feature space, which could for example be a single organism, from those for multiple feature spaces.
Critically, data from multiple feature spaces can be represented as one data array per feature space.
In load_store, we represent a directory of datasets as an instance of a multi-feature space store and discover all feature spaces present.
This store can be subset to a single-feature space store if, for example, only data corresponding to a single organism is desired.
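The relationship between the two kinds of store can be pictured as a dictionary of single-feature space stores keyed by feature space. The following is a toy sketch of that idea only, not sfaira's implementation: `ToySingleFeatureSpaceStore` and `toy_load_store` are hypothetical names, and the dataset dictionaries are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToySingleFeatureSpaceStore:
    """Hypothetical stand-in for a store with one shared feature space."""
    organism: str
    genes: list                                # feature names (columns)
    rows: list = field(default_factory=list)   # one expression vector per cell


def toy_load_store(datasets):
    """Group datasets by organism, mimicking discovery of feature spaces."""
    stores = {}
    for ds in datasets:
        store = stores.setdefault(
            ds["organism"],
            ToySingleFeatureSpaceStore(ds["organism"], ds["genes"]),
        )
        # Datasets of the same organism share genes, so rows can be stacked.
        store.rows.extend(ds["x"])
    return stores


# Invented example data: two human datasets and one mouse dataset.
datasets = [
    {"organism": "Homo sapiens", "genes": ["CD3E", "MS4A1"], "x": [[1, 0], [0, 2]]},
    {"organism": "Mus musculus", "genes": ["Cd3e", "Ms4a1"], "x": [[3, 0]]},
    {"organism": "Homo sapiens", "genes": ["CD3E", "MS4A1"], "x": [[5, 1]]},
]

multi_store = toy_load_store(datasets)     # one entry per feature space
human_store = multi_store["Homo sapiens"]  # subset to a single feature space
```

Here the multi-feature space object holds one array-like collection per organism, and subsetting to one organism is a dictionary lookup.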
The core API exposed to users is:
|
Instantiates a distributed store class. |
Store classes for a single feature space:
|
Data set group class tailored to data access requirements common in high-performance computing (HPC). |
Store classes for multiple feature spaces:
Umbrella class for a dictionary over multiple instances of DistributedStoreSingleFeatureSpace. |
Carts¶
Stores represent on-disk data collections and perform operations such as subsetting. Ultimately, they are often used to emit data objects, which are “carts”. Carts are specific to the underlying store’s data format and expose iterators, data matrices and adaptors to machine learning framework data pipelines, such as tensorflow and torch data. Again, carts can cover one or multiple feature spaces.
|
Cart for a DistributedStoreSingleFeatureSpace(). |
|
Cart for a DistributedStoreMultipleFeatureSpaceBase(). |
The emission of data from cart iterators and adaptors is controlled by batch schedules, which direct how data is released from the underlying data matrix:
Manages distribution of selected indices for a given data object over subsequent batches. |
|
Standard batched access to data. |
|
Balanced batches across meta data partitions of data. |
|
Meta data-defined blocks of observations in each batch. |
|
Emits full dataset as a single batch in each query. |
For most purposes related to stochastic optimisation, BatchDesignBasic is chosen.
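To make the difference between the batch schedules concrete, the sketch below re-creates the four behaviours described above in plain Python. It is an illustration under stated assumptions, not the sfaira classes: all function names and parameters (`batch_size`, `per_group`, `seed`) are hypothetical.

```python
import random


def basic_batches(idx, batch_size, shuffle=True, seed=0):
    """Standard batching: optionally shuffle, then cut into consecutive chunks."""
    idx = list(idx)
    if shuffle:
        random.Random(seed).shuffle(idx)
    return [idx[i:i + batch_size] for i in range(0, len(idx), batch_size)]


def _partition(idx, labels):
    """Group observation indices by their metadata label."""
    groups = {}
    for i in idx:
        groups.setdefault(labels[i], []).append(i)
    return groups


def balanced_batch(idx, labels, per_group, seed=0):
    """One batch with the same number of draws (with replacement) per label."""
    rng = random.Random(seed)
    batch = []
    for members in _partition(idx, labels).values():
        batch.extend(rng.choices(members, k=per_group))
    return batch


def block_batches(idx, labels):
    """One batch per metadata-defined block of observations."""
    return list(_partition(idx, labels).values())


def full_batch(idx):
    """The full set of selected indices as a single batch."""
    return [list(idx)]


# Invented metadata: four T cells and two B cells.
labels = ["T", "T", "T", "T", "B", "B"]
idx = range(len(labels))

basic = basic_batches(idx, batch_size=4)
balanced = balanced_batch(idx, labels, per_group=2)
blocks = block_batches(idx, labels)
full = full_batch(idx)
```

The basic schedule is the natural fit for stochastic optimisation because every observation is visited once per epoch in random order, whereas the balanced schedule trades that property for equal representation of each metadata partition.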
Estimator classes: estimators¶
Estimator classes from the sfaira model zoo API for advanced use.
Estimator base class for keras models. |
|
|
Estimator class for the cell type model. |
|
Estimator class for the embedding model. |
Model classes: models¶
Model classes from the sfaira model zoo API for advanced use.
Cell type models¶
Classes that wrap tensorflow cell type predictor models.
|
Marker gene-based cell type classifier: Learns whether or not each gene exceeds a required threshold and learns cell type assignment as a linear combination of these marker gene presence probabilities. |
|
Marker gene-based cell type classifier: Learns whether or not each gene exceeds a required threshold and learns cell type assignment as a linear combination of these marker gene presence probabilities. |
|
Multi-layer perceptron to predict cell type. |
|
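The marker gene-based classifier described above can be read as a two-step forward pass: a per-gene presence probability, followed by a linear combination over genes. The sketch below illustrates that reading in plain Python. It is not the sfaira model code: the sigmoid parameterisation, the `sharpness` parameter, the softmax output and all numeric values are assumptions, and in a trained model the thresholds and weights would be learned.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    # Subtract the maximum for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def marker_forward(x, thresholds, weights, biases, sharpness=1.0):
    """Toy forward pass of a marker gene-based cell type classifier.

    x:          expression value per gene
    thresholds: per-gene threshold (assumed learnable)
    weights:    one row of per-gene weights for each cell type
    biases:     one bias per cell type
    """
    # Step 1: probability that each gene exceeds its threshold.
    presence = [sigmoid(sharpness * (xi - ti)) for xi, ti in zip(x, thresholds)]
    # Step 2: cell type score as a linear combination of presence probabilities.
    logits = [
        sum(w * p for w, p in zip(row, presence)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


# Invented example: gene 0 marks cell type 0, gene 1 marks cell type 1.
probs = marker_forward(
    x=[5.0, 0.0],
    thresholds=[1.0, 1.0],
    weights=[[4.0, -4.0], [-4.0, 4.0]],
    biases=[0.0, 0.0],
)
```

Because only gene 0 exceeds its threshold in this example, the first cell type receives the higher probability.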
Embedding models¶
Classes that wrap tensorflow embedding models.
|
Combines the encoder and decoder into an end-to-end model for training. |
Train: train¶
The interface for training sfaira compatible models.
Trainer classes¶
Classes that wrap estimator classes to use in grid search training.
|
|
|
Grid search summaries¶
Classes to pool evaluation metrics across fits in a grid search.
|
|
|
|
|
Versions: versions¶
The interface for sfaira metadata management.
Genomes¶
Genome management.
|
Container class for a genome annotation for a specific release. |
Metadata¶
Dataset metadata management. Base classes to manage ontology files:
|
Basic unordered ontology container. |
Basic ordered ontology container. |
Ontology-specific classes:
|
|
|
|
|
|
|
|
|
|
|
Class wrapping cell type ontology for predictor models:
|
Cell type universe (list) and ontology (hierarchy) container class. |
Topologies¶
Model topology management.
|
Class interface for a YAML-style defined model topology that loads a genome container tailored to the model. |
User interface: ui¶
This sub-module gives users access to the model zoo, including model query from remote servers. This API is designed to be used in analysis workflows and does not require any understanding of the way models are defined and stored.
|
This class performs data set handling and coordinates estimators for the different model types. The class documentation includes example code to obtain a UMAP embedding plot of the embedding created from your data with cell-type labels. |