sfaira.data.store.carts.CartMulti.adaptor
- CartMulti.adaptor(generator_type: str, dataset_kwargs: Optional[dict] = None, shuffle_buffer: int = 0, repeat: int = 1, **kwargs)
The adaptor turns a python base generator into a different iterable object, defined by generator_type.
- Parameters
generator_type –
Type of output iterable.
- python base generator (no change to .generator)
- tensorflow dataset: This dataset is defined on a python iterator.
- Important:
This only returns the tf.data.Dataset.from_generator(). You need to define the input pipeline (e.g. .batch(), .prefetch()) on top of this data set.
- pytorch: We distinguish torch.data.Dataset and torch.data.DataLoader on top of either.
The Dataset vs DataLoader distinction is made by the suffix: no suffix for a Dataset, “-loader” suffix for a DataLoader. The distinction between Dataset and IterableDataset defines whether the object is defined directly on a dask array or based on a python iterator over a dask array. Note that the python iterator can implement favorable remote access schemata, but the torch.data.Dataset generally causes less trouble in out-of-the-box usage.
torch.data.Dataset: “torch” prefix, i.e. “torch” or “torch-loader”
torch.data.IterableDataset: “torch-iter” prefix, i.e. “torch-iter” or “torch-iter-loader”
- Important:
For model training in pytorch you need the “-loader” suffix. You can specify the arguments passed to torch.utils.data.DataLoader via the dataset_kwargs dictionary.
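The map-style vs iterable-style distinction above can be sketched without a torch dependency. The following is a hypothetical pure-python analogue (not sfaira or torch code): a map-style dataset supports random access into an array-like store via __getitem__, while an iterable-style dataset only streams items from a python iterator, mirroring the “torch” vs “torch-iter” naming.

```python
class MapStyleDataset:
    """Analogue of torch.data.Dataset: random access via __getitem__ and __len__."""

    def __init__(self, array):
        self.array = array  # in sfaira's case this would be a dask array

    def __len__(self):
        return len(self.array)

    def __getitem__(self, idx):
        return self.array[idx]


class IterableStyleDataset:
    """Analogue of torch.data.IterableDataset: sequential access only."""

    def __init__(self, generator_fn):
        # generator_fn: zero-argument callable returning a fresh iterator
        self.generator_fn = generator_fn

    def __iter__(self):
        return self.generator_fn()


map_ds = MapStyleDataset([10, 20, 30])
iter_ds = IterableStyleDataset(lambda: iter([10, 20, 30]))
print(map_ds[1])      # random access works on the map-style dataset
print(list(iter_ds))  # the iterable-style dataset can only be streamed
```

The trade-off noted above falls out of this shape: the iterator-backed variant can wrap arbitrary (e.g. remote) access schemes, while the array-backed variant gives samplers and loaders direct random access.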
dataset_kwargs – Dict. Parameters to pass to the constructor of the torch Dataset. Only relevant if generator_type is in [‘torch’, ‘torch-loader’].
shuffle_buffer – int. If shuffle_buffer > 0, use a shuffle buffer of size shuffle_buffer to shuffle the output of self.iterator (this option is useful when using randomized_batch_access in the DaskCart).
repeat – int. Number of times to repeat the dataset before the underlying generator runs out of samples. If repeat <= 0, the dataset is repeated forever.
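A shuffle buffer trades memory for shuffling quality: it holds up to shuffle_buffer items of the stream, yields a random one, and refills from the underlying iterator, so only items within roughly buffer_size positions of each other can swap order. A minimal sketch of the mechanism (not sfaira's implementation, which may differ):

```python
import random


def shuffle_buffer(iterator, buffer_size, seed=0):
    """Yield items from `iterator` in a locally shuffled order.

    Keeps at most `buffer_size` items in memory; each yielded item is
    drawn at random from the buffer, which is then refilled.
    """
    rng = random.Random(seed)
    buffer = []
    for item in iterator:
        buffer.append(item)
        if len(buffer) >= buffer_size:
            # Buffer is full: emit one random buffered item.
            yield buffer.pop(rng.randrange(len(buffer)))
    # Source exhausted: drain the remaining buffered items.
    rng.shuffle(buffer)
    yield from buffer


out = list(shuffle_buffer(iter(range(10)), buffer_size=4))
print(sorted(out) == list(range(10)))  # every item appears exactly once
```

This is why the option pairs well with randomized batch access: batches arrive in a coarse random order, and the buffer adds fine-grained shuffling on top.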
- Returns
Modified iterable (see generator_type).
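The repeat semantics can be sketched as follows (a hypothetical pure-python analogue, not sfaira's code): the base generator is re-created and exhausted repeat times, or cycled indefinitely when repeat <= 0.

```python
import itertools


def repeat_generator(generator_fn, repeat):
    """Chain `repeat` fresh runs of a generator; repeat <= 0 means forever.

    `generator_fn` is a zero-argument callable returning a fresh iterator,
    mirroring a python base generator that is re-instantiated per pass.
    """
    if repeat <= 0:
        while True:  # repeat the dataset forever
            yield from generator_fn()
    else:
        for _ in range(repeat):
            yield from generator_fn()


make = lambda: iter([1, 2, 3])
print(list(repeat_generator(make, repeat=2)))  # two full passes: [1, 2, 3, 1, 2, 3]
# An endless stream must be truncated explicitly, e.g. with islice:
print(list(itertools.islice(repeat_generator(make, repeat=0), 7)))
```

Note that with repeat <= 0 the returned iterable never raises StopIteration, so downstream consumers (e.g. a training loop) must bound the number of steps themselves.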