class d3m.container.dataset.ComputeDigest(value)

Bases: d3m.utils.Enum

Enumeration of the possible approaches to computing a dataset digest.

ALWAYS = 'ALWAYS'
NEVER = 'NEVER'
ONLY_IF_MISSING = 'ONLY_IF_MISSING'
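
The member names suggest how each policy might be interpreted by a loader or saver. The following is a minimal standalone sketch, not the d3m implementation: `DigestPolicy` and `should_compute_digest` are hypothetical names, and the semantics (including those of the `ONLY_IF_MISSING` member that appears as the default of `load`) are assumptions inferred from the names alone.

```python
from enum import Enum
from typing import Optional


class DigestPolicy(Enum):
    """Hypothetical stand-in that mirrors the ComputeDigest member names."""
    ALWAYS = 'ALWAYS'
    NEVER = 'NEVER'
    ONLY_IF_MISSING = 'ONLY_IF_MISSING'


def should_compute_digest(policy: DigestPolicy, existing_digest: Optional[str]) -> bool:
    """Decide whether to compute a digest (assumed semantics, see lead-in)."""
    if policy is DigestPolicy.ALWAYS:
        return True
    if policy is DigestPolicy.NEVER:
        return False
    # ONLY_IF_MISSING: compute only when no digest is already available.
    return existing_digest is None


# For each policy: (decision without a stored digest, decision with one).
decisions = {
    policy.name: (
        should_compute_digest(policy, None),
        should_compute_digest(policy, 'abc123'),
    )
    for policy in DigestPolicy
}
print(decisions)
```

Keeping the decision in one small function makes the policy easy to test independently of any file access.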
class d3m.container.dataset.Dataset(resources, metadata=None, *, load_lazy=None, generate_metadata=False, check=True, source=None, timestamp=None)

Bases: dict

A class representing a dataset.

Internally, it is a dictionary containing multiple resources (e.g., tables).

  • resources (Mapping) – A map from resource IDs to resources.

  • metadata (d3m.metadata.base.DataMetadata) – Metadata associated with the data.

  • load_lazy (Optional[Callable[[Dataset], None]]) – If constructing a lazy dataset, calling this function will read all the data and convert the dataset to a non-lazy one.

  • generate_metadata (bool) – Automatically generate and update the metadata.

  • check (bool) – DEPRECATED: argument ignored.

  • source (Optional[Any]) – DEPRECATED: argument ignored.

  • timestamp (Optional[datetime]) – DEPRECATED: argument ignored.
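
The `load_lazy` callable described above can be pictured as a deferred loading hook on a dict-based container. The sketch below is a simplified stand-in under stated assumptions, not the d3m class: `LazyResources`, `read_everything`, and the resource ID `'learningData'` are made up for illustration.

```python
from typing import Callable, Dict, List, Optional


class LazyResources(dict):
    """Hypothetical stand-in for a lazily loaded, dict-based container."""

    def __init__(
        self,
        resources: Dict[str, List],
        load_lazy: Optional[Callable[['LazyResources'], None]] = None,
    ) -> None:
        super().__init__(resources)
        self._load_lazy = load_lazy

    def is_lazy(self) -> bool:
        # The container is lazy for as long as a loading callback is pending.
        return self._load_lazy is not None

    def load_lazy(self) -> None:
        # Run the callback once, then drop it so the container becomes non-lazy.
        if self._load_lazy is None:
            return
        callback, self._load_lazy = self._load_lazy, None
        callback(self)


def read_everything(container: LazyResources) -> None:
    # A real callback would read files; this one fills in fixed rows.
    container['learningData'] = [[0, 'a'], [1, 'b']]


container = LazyResources({'learningData': []}, load_lazy=read_everything)
was_lazy = container.is_lazy()
container.load_lazy()
print(was_lazy, container.is_lazy(), container['learningData'])
```

Clearing the callback before calling it means a second `load_lazy()` call is a no-op, which is one reasonable way to make the conversion idempotent.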

Return type

None

Return type

Dict[str, Dict[ColumnReference, List[ColumnReference]]]


Builds the relations graph for the dataset.

Each key in the output corresponds to a resource (table). The value under a key is the list of edges that table has. Each edge is represented by a tuple of five elements. For example, the edge (resource_id, True, index_1, index_2, custom_state) means that there is a foreign key that points to table resource_id: the index_1 column in the current table points to the index_2 column in the table resource_id.

custom_state is an empty dict when returned from this method, but allows users of this graph to store custom state there.


Returns the relations graph in adjacency representation.

Return type

Dict[str, List[Tuple[str, bool, int, int, Dict]]]
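
To make the adjacency representation concrete, here is a small standalone sketch that builds a graph of the documented shape by hand and walks its edges. The table names, column indices, and the `describe_edges` helper are made up for illustration, and only an edge with the flag set to True is shown, matching the example in the text.

```python
from typing import Dict, List, Tuple

# Edge layout as described above:
# (resource_id, flag, index_1, index_2, custom_state)
Edge = Tuple[str, bool, int, int, Dict]
RelationsGraph = Dict[str, List[Edge]]

# Made-up sample: column 1 of 'orders' points to column 0 of 'customers'.
graph: RelationsGraph = {
    'orders': [('customers', True, 1, 0, {})],
    'customers': [],
}


def describe_edges(relations: RelationsGraph) -> List[str]:
    """Render every edge as 'table[column] -> target[column]'."""
    lines = []
    for table, edges in relations.items():
        for target, flag, from_column, to_column, state in edges:
            # custom_state starts empty, so callers may annotate it freely.
            state['visited'] = True
            lines.append(f'{table}[{from_column}] -> {target}[{to_column}]')
    return lines


descriptions = describe_edges(graph)
print(descriptions)
```

Because `custom_state` is an ordinary dict stored inside the edge tuple, annotations written during one traversal remain visible to later ones.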


Returns whether this dataset instance is lazy, that is, whether some of its data has not been loaded yet.


True if this dataset instance is lazy.

Return type

bool

classmethod load(dataset_uri, *, dataset_id=None, dataset_version=None, dataset_name=None, lazy=False, compute_digest=<ComputeDigest.ONLY_IF_MISSING: 'ONLY_IF_MISSING'>, strict_digest=False, handle_score_split=True)

Tries to load a dataset from dataset_uri using all registered dataset loaders.

  • dataset_uri (str) – A URI to load.

  • dataset_id (Optional[str]) – Override dataset ID determined by the loader.

  • dataset_version (Optional[str]) – Override dataset version determined by the loader.

  • dataset_name (Optional[str]) – Override dataset name determined by the loader.

  • lazy (bool) – If True, load only top-level metadata and not the whole dataset.

  • compute_digest (ComputeDigest) – When to compute a digest over the data.

  • strict_digest (bool) – If True, raise an exception when the computed digest does not match the one provided in metadata.

  • handle_score_split (bool) – If True and a scoring dataset has target values in a separate file, merge them in.


A loaded dataset.
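
The idea of trying every registered loader can be sketched with a small registry. This is an illustration under stated assumptions rather than d3m's code: `ExampleLoader`, `Registry`, and the `can_load` method name are hypothetical, and the first-match-wins order is an assumption.

```python
from typing import Dict, List


class ExampleLoader:
    """Hypothetical loader interface: can_load() and load() are assumed names."""

    def __init__(self, scheme: str) -> None:
        self.scheme = scheme

    def can_load(self, dataset_uri: str) -> bool:
        return dataset_uri.startswith(self.scheme + '://')

    def load(self, dataset_uri: str) -> Dict[str, str]:
        return {'loaded_by': self.scheme, 'uri': dataset_uri}


class Registry:
    """Hypothetical holder of a class-level list of loaders."""

    loaders: List[ExampleLoader] = []

    @classmethod
    def register_loader(cls, loader: ExampleLoader) -> None:
        cls.loaders.append(loader)

    @classmethod
    def load(cls, dataset_uri: str) -> Dict[str, str]:
        # Ask each registered loader in turn; the first that accepts the URI wins.
        for loader in cls.loaders:
            if loader.can_load(dataset_uri):
                return loader.load(dataset_uri)
        raise ValueError(f'No registered loader can load: {dataset_uri}')


Registry.register_loader(ExampleLoader('file'))
Registry.register_loader(ExampleLoader('https'))
result = Registry.load('https://example.com/datasetDoc.json')
print(result)
```

Keeping the loaders in a class-level list lets new formats be supported by registration alone, without changing the dispatch logic.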

Return type

Dataset


Read all the data and convert the dataset to a non-lazy one.

Return type

None

classmethod register_loader(loader)

Registers a new dataset loader.


loader (Loader) – An instance of the loader class implementing a new loader.

Return type

None

classmethod register_saver(saver)

Registers a new dataset saver.


saver (Saver) – An instance of the saver class implementing a new saver.

Return type

None

save(dataset_uri, *, compute_digest=<ComputeDigest.ALWAYS: 'ALWAYS'>, preserve_metadata=True)

Tries to save the dataset to dataset_uri using all registered dataset savers.

  • dataset_uri (str) – A URI to save to.

  • compute_digest (ComputeDigest) – When to compute a digest over the data during saving.

  • preserve_metadata (bool) – If True, store the dataset's metadata as well when saving.

Return type

None


Generates a new Dataset that keeps only the selected row indices of its DataFrame resources.


row_indices_to_keep (Mapping[str, Sequence[int]]) – A mapping from resource ID to a sequence of row indices to keep. If a resource ID is missing, the whole resource is kept.


Returns a new Dataset.
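
The per-resource selection rule can be illustrated without d3m or pandas. In the sketch below, plain lists of rows stand in for DataFrames, and the `select_rows` helper and the resource IDs are hypothetical names chosen for this example.

```python
from typing import Dict, List, Mapping, Sequence

# A list of rows stands in for a DataFrame in this sketch.
Table = List[List[object]]


def select_rows(
    resources: Mapping[str, Table],
    row_indices_to_keep: Mapping[str, Sequence[int]],
) -> Dict[str, Table]:
    """Return new tables that keep only the requested rows per resource."""
    selected = {}
    for resource_id, rows in resources.items():
        if resource_id in row_indices_to_keep:
            selected[resource_id] = [rows[index] for index in row_indices_to_keep[resource_id]]
        else:
            # Resources that are not mentioned are kept whole.
            selected[resource_id] = list(rows)
    return selected


resources = {
    'learningData': [[0, 'a'], [1, 'b'], [2, 'c']],
    'lookup': [['a', 'alpha'], ['b', 'beta']],
}
subset = select_rows(resources, {'learningData': [0, 2]})
print(subset)
```

The input mapping is left untouched, which mirrors the documented behavior of producing a new dataset instead of modifying the existing one.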

Return type

Dataset

to_json_structure(*, canonical=False)

Returns only a top-level dataset description.

Return type


loaders: List[d3m.container.dataset.Loader] = [<d3m.container.dataset.D3MDatasetLoader object>, <d3m.container.dataset.CSVLoader object>, <d3m.container.dataset.SklearnExampleLoader object>, <d3m.container.dataset.OpenMLDatasetLoader object>]
metadata: d3m.metadata.base.DataMetadata
savers: List[d3m.container.dataset.Saver] = [<d3m.container.dataset.D3MDatasetSaver object>]