sfaira.data.DatasetGroupDirectoryOriented.write_distributed_store

DatasetGroupDirectoryOriented.write_distributed_store(dir_cache: Union[str, os.PathLike], store_format: str = 'dao', dense: bool = False, compression_kwargs: dict = {}, chunks: Optional[int] = None)

Write data set into a format that allows distributed access to data set on disk.

Stores are useful for distributed access to data sets, in many settings this requires some streamlining of the data sets that are accessed. Use .streamline_* before calling this method to streamline the data sets. This method writes a separate file for each data set in this object.

Parameters
  • dir_cache – Directory to write cache in.

  • store_format

    Disk format for objects in cache. Recommended is “dao”.

    • ”h5ad”: Allows access via backed .h5ad.
      Note on compression: .h5ad supports sparse data, which already gives good compression and fast
      row-wise access if the files are stored as CSR, so further compression is potentially not necessary.

    • ”dao”: Distributed access optimised format, recommended for batched access in optimisation, for example.

  • dense – Whether to write a sparse or dense store; this choice is enforced homogeneously across all data sets.

  • compression_kwargs

    Compression keyword arguments to pass to h5py or zarr. For store_format==”h5ad”, see also anndata.AnnData.write_h5ad:

    • compression,

    • compression_opts.

    For store_format==”dao”, see also sfaira.data.write_dao which relays kwargs to zarr.hierarchy.create_dataset:

    • compressor

    • overwrite

    • order

    and others.

  • chunks – Chunk size along the observation axis of the zarr array, see the anndata.AnnData.write_zarr documentation. Only relevant for store_format==”dao”. The feature dimension of each chunk always spans the full feature space. Uses the zarr default chunking across both axes if None.
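The workflow described above (streamline first, then write the store) can be sketched as follows. This is a hedged illustration, not the definitive API: the constructor arguments, the paths, and the streamlining schema shown here are assumptions, and exact argument names may vary between sfaira versions.

```python
import sfaira

# Assumed setup: a dataset group built from a directory of loaders.
# The constructor argument names and paths below are illustrative assumptions.
dsg = sfaira.data.DatasetGroupDirectoryOriented(
    file_base="/path/to/loader/module",  # assumption
    data_path="/path/to/raw/data",       # assumption
)
dsg.load()

# Streamline the data sets before writing the store, so that all data sets
# share a common feature space and metadata schema (see the note above).
# The "cellxgene" schema name is an assumption for illustration.
dsg.streamline_features(schema="cellxgene")
dsg.streamline_metadata(schema="cellxgene")

# Write one "dao" store file per data set. compression_kwargs is relayed
# to zarr; chunks=128 sets the observation-axis chunk size, while the
# feature axis of each chunk always spans the full feature space.
dsg.write_distributed_store(
    dir_cache="/path/to/store",
    store_format="dao",
    dense=True,
    chunks=128,
)
```

Dense writing with a moderate observation-axis chunk size is a common choice for batched access during optimisation, since each batch read then touches only a few contiguous chunks.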