sfaira.data.DatasetGroupDirectoryOriented.write_distributed_store

DatasetGroupDirectoryOriented.write_distributed_store(dir_cache: Union[str, os.PathLike], store_format: str = 'dao', dense: bool = False, compression_kwargs: dict = {}, chunks: Optional[int] = None)

Write data set into a format that allows distributed access to data set on disk.

Stores are useful for distributed access to data sets, in many settings this requires some streamlining of the data sets that are accessed. Use .streamline_* before calling this method to streamline the data sets. This method writes a separate file for each data set in this object.

Parameters
  • dir_cache – Directory to write cache in.

  • store_format

    Disk format for objects in cache. Recommended is “dao”.

    • ”h5ad”: Allows access via backed .h5ad.
      Note on compression: .h5ad supports sparse data, which already gives good compression and fast
      row-wise access if the files are stored as CSR, so further compression is potentially not necessary.

    • ”dao”: Distributed access optimised format, recommended for batched access in optimisation, for example.

  • dense – Whether to write a sparse or dense store; this choice is enforced homogeneously across all data sets.

  • compression_kwargs

    Compression keyword arguments to pass to h5py or zarr. For store_format==”h5ad”, see also anndata.AnnData.write_h5ad:

    • compression,

    • compression_opts.

    For store_format==”dao”, see also sfaira.data.write_dao which relays kwargs to zarr.hierarchy.create_dataset:

    • compressor

    • overwrite

    • order

    and others.

  • chunks – Chunk size along the observation axis of the zarr array, see the anndata.AnnData.write_zarr documentation. Only relevant for store_format==”dao”. The feature dimension of each chunk always spans the full feature space. Uses the zarr default chunking across both axes if None.
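The workflow described above (streamline first, then write the store) can be sketched as follows. This is a hedged illustration, not the definitive API: the constructor arguments, the paths, and the streamlining schema shown here are assumptions, and exact argument names may vary between sfaira versions.

```python
import sfaira

# Assumed setup: a dataset group built from a directory of loaders.
# The constructor argument names and paths below are illustrative assumptions.
dsg = sfaira.data.DatasetGroupDirectoryOriented(
    file_base="/path/to/loader/module",  # assumption
    data_path="/path/to/raw/data",       # assumption
)
dsg.load()

# Streamline the data sets before writing the store, so that all data sets
# share a common feature space and metadata schema (see the note above).
# The "cellxgene" schema name is an assumption for illustration.
dsg.streamline_features(schema="cellxgene")
dsg.streamline_metadata(schema="cellxgene")

# Write one "dao" store file per data set. compression_kwargs is relayed
# to zarr; chunks=128 sets the observation-axis chunk size, while the
# feature axis of each chunk always spans the full feature space.
dsg.write_distributed_store(
    dir_cache="/path/to/store",
    store_format="dao",
    dense=True,
    chunks=128,
)
```

Dense writing with a moderate observation-axis chunk size is a common choice for batched access during optimisation, since each batch read then touches only a few contiguous chunks.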