d3m.container.dataset¶
-
class
d3m.container.dataset.ComputeDigest(value)[source]¶ Bases: d3m.utils.Enum
Enumeration of possible approaches to computing a dataset digest.
-
class
d3m.container.dataset.Dataset(resources, metadata=None, *, load_lazy=None, generate_metadata=False, check=True, source=None, timestamp=None)[source]¶ Bases: dict
A class representing a dataset.
Internally, it is a dictionary containing multiple resources (e.g., tables).
- Parameters
resources (Mapping) – A map from resource IDs to resources.
metadata (d3m.metadata.base.DataMetadata) – Metadata associated with the data.
load_lazy (Optional[Callable[[Dataset], None]]) – If constructing a lazy dataset, calling this function will read all the data and convert the dataset to a non-lazy one.
generate_metadata (bool) – Automatically generate and update the metadata.
check (bool) – DEPRECATED: argument ignored.
timestamp (Optional[datetime]) – DEPRECATED: argument ignored.
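The role of load_lazy can be illustrated with a small standalone sketch. It does not use d3m: LazyResources, load_all, read_everything and the "learningData" resource ID are hypothetical names chosen for illustration, and the real Dataset class may trigger the callback differently.

```python
from typing import Any, Callable, Mapping, Optional


class LazyResources(dict):
    """Hypothetical stand-in for a lazily loaded, dict-based dataset."""

    def __init__(
        self,
        resources: Mapping[str, Any],
        *,
        load_lazy: Optional[Callable[["LazyResources"], None]] = None,
    ) -> None:
        super().__init__(resources)
        # Assumption: the instance counts as lazy while a callback is set.
        self._load_lazy = load_lazy

    def is_lazy(self) -> bool:
        return self._load_lazy is not None

    def load_all(self) -> None:
        """Hypothetical helper: read all data and become non-lazy."""
        if self._load_lazy is not None:
            # Clear the callback first so it runs at most once.
            callback, self._load_lazy = self._load_lazy, None
            callback(self)


def read_everything(dataset: LazyResources) -> None:
    # Stand-in for real I/O: fill in the resource contents.
    dataset["learningData"] = [[0, "a"], [1, "b"]]


lazy = LazyResources({}, load_lazy=read_everything)
was_lazy = lazy.is_lazy()
lazy.load_all()
```

The callback receives the dataset itself, which matches the Callable[[Dataset], None] type given for load_lazy: it mutates the instance in place instead of returning new data.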
-
get_relations_graph()[source]¶ Builds the relations graph for the dataset.
Each key in the output corresponds to a resource/table. The value under a key is the list of edges this table has. An edge is represented by a tuple of five elements. For example, if the edge is (resource_id, True, index_1, index_2, custom_state), it means that there is a foreign key that points to table resource_id. Specifically, the index_1 column in the current table points to the index_2 column in the table resource_id. custom_state is an empty dict when returned from this method, but allows users of this graph to store custom state there.
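As a rough illustration of the structure described above, the following sketch walks a hand-written graph of the same shape. The graph literal, the resource IDs and the describe_edges helper are hypothetical and are not produced by d3m. The boolean second element is unpacked but not interpreted, since the text does not spell out its meaning.

```python
from typing import Any, Dict, List, Tuple

# (resource_id, flag, index_1, index_2, custom_state), as in the example above.
Edge = Tuple[str, bool, int, int, Dict[str, Any]]

# Hand-written example graph; not output from get_relations_graph().
graph: Dict[str, List[Edge]] = {
    "orders": [("customers", True, 1, 0, {})],
    "customers": [],
}


def describe_edges(graph: Dict[str, List[Edge]]) -> List[str]:
    """Render each edge as 'table[column] -> table[column]'."""
    lines = []
    for resource_id, edges in graph.items():
        for target_id, _flag, index_1, index_2, custom_state in edges:
            # custom_state starts empty and is free for the caller to use.
            custom_state["visited"] = True
            lines.append(f"{resource_id}[{index_1}] -> {target_id}[{index_2}]")
    return lines


descriptions = describe_edges(graph)
```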
-
is_lazy()[source]¶ Return whether this dataset instance is lazy and not all data has been loaded.
- Returns
True if this dataset instance is lazy.
- Return type
bool
-
classmethod
load(dataset_uri, *, dataset_id=None, dataset_version=None, dataset_name=None, lazy=False, compute_digest=<ComputeDigest.ONLY_IF_MISSING: 'ONLY_IF_MISSING'>, strict_digest=False, handle_score_split=True)[source]¶ Tries to load dataset from dataset_uri using all registered dataset loaders.
- Parameters
dataset_uri (str) – A URI to load.
dataset_id (Optional[str]) – Override dataset ID determined by the loader.
dataset_version (Optional[str]) – Override dataset version determined by the loader.
dataset_name (Optional[str]) – Override dataset name determined by the loader.
lazy (bool) – If True, load only top-level metadata and not whole dataset.
compute_digest (ComputeDigest) – Compute a digest over the data?
strict_digest (bool) – If computed digest does not match the one provided in metadata, raise an exception?
handle_score_split (bool) – If a scoring dataset has target values in a separate file, merge them in?
- Returns
A loaded dataset.
- Return type
Dataset
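The idea of trying registered loaders in turn can be sketched without d3m as follows. The SketchLoader interface, its can_load and load methods, and the two concrete loaders are assumptions made for this illustration. The actual Loader class and the way the library selects a loader may differ.

```python
from typing import Any, Dict, List


class SketchLoader:
    """Hypothetical loader interface; the real Loader class may differ."""

    def can_load(self, dataset_uri: str) -> bool:
        raise NotImplementedError

    def load(self, dataset_uri: str) -> Dict[str, Any]:
        raise NotImplementedError


class CSVSketchLoader(SketchLoader):
    def can_load(self, dataset_uri: str) -> bool:
        return dataset_uri.endswith(".csv")

    def load(self, dataset_uri: str) -> Dict[str, Any]:
        return {"uri": dataset_uri, "loader": "csv"}


class JSONSketchLoader(SketchLoader):
    def can_load(self, dataset_uri: str) -> bool:
        return dataset_uri.endswith(".json")

    def load(self, dataset_uri: str) -> Dict[str, Any]:
        return {"uri": dataset_uri, "loader": "json"}


loaders: List[SketchLoader] = []


def register_loader(loader: SketchLoader) -> None:
    loaders.append(loader)


def load(dataset_uri: str) -> Dict[str, Any]:
    # Try loaders in registration order; the first one that accepts wins.
    for loader in loaders:
        if loader.can_load(dataset_uri):
            return loader.load(dataset_uri)
    raise ValueError(f"No registered loader can load: {dataset_uri}")


register_loader(JSONSketchLoader())
register_loader(CSVSketchLoader())
loaded = load("file:///tmp/data.csv")
```

Keeping the loaders in a list means registration order decides which loader wins when more than one accepts a URI.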
-
classmethod
register_loader(loader)[source]¶ Registers a new dataset loader.
- Parameters
loader (Loader) – An instance of the loader class implementing a new loader.
- Return type
None
-
classmethod
register_saver(saver)[source]¶ Registers a new dataset saver.
- Parameters
saver (Saver) – An instance of the saver class implementing a new saver.
- Return type
None
-
save(dataset_uri, *, compute_digest=<ComputeDigest.ALWAYS: 'ALWAYS'>, preserve_metadata=True)[source]¶ Tries to save dataset to dataset_uri using all registered dataset savers.
- Parameters
dataset_uri (str) – A URI to save to.
compute_digest (ComputeDigest) – Compute digest over the data when saving?
preserve_metadata (bool) – When saving a dataset, store its metadata as well?
- Return type
None
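The interplay between a digest policy and a strict check can be sketched as below. This is an interpretation based only on the names ALWAYS and ONLY_IF_MISSING and on the strict_digest description. DigestPolicy, resolve_digest and the choice of SHA-256 over raw bytes are assumptions for illustration, not the library's implementation.

```python
import enum
import hashlib
from typing import Optional


class DigestPolicy(enum.Enum):
    """Hypothetical mirror of the two ComputeDigest values shown in the signatures."""

    ONLY_IF_MISSING = "ONLY_IF_MISSING"
    ALWAYS = "ALWAYS"


def resolve_digest(
    data: bytes,
    stored: Optional[str],
    policy: DigestPolicy,
    strict: bool = False,
) -> str:
    # Assumed reading of ONLY_IF_MISSING: trust an existing digest as-is.
    if policy is DigestPolicy.ONLY_IF_MISSING and stored is not None:
        return stored
    # Assumption: SHA-256 over the raw bytes stands in for the real digest.
    computed = hashlib.sha256(data).hexdigest()
    if strict and stored is not None and computed != stored:
        raise ValueError("Computed digest does not match the stored digest.")
    return computed


fresh = resolve_digest(b"abc", None, DigestPolicy.ONLY_IF_MISSING)
kept = resolve_digest(b"abc", "stale", DigestPolicy.ONLY_IF_MISSING)
recomputed = resolve_digest(b"abc", "stale", DigestPolicy.ALWAYS)
```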
-
select_rows(row_indices_to_keep)[source]¶ Generates a new Dataset that keeps only the given row indices in its DataFrame resources.
-
to_json_structure(*, canonical=False)[source]¶ Returns only a top-level dataset description.
- Return type
-
loaders: List[d3m.container.dataset.Loader] = [<d3m.container.dataset.D3MDatasetLoader object>, <d3m.container.dataset.CSVLoader object>, <d3m.container.dataset.SklearnExampleLoader object>, <d3m.container.dataset.OpenMLDatasetLoader object>][source]¶
-
metadata: d3m.metadata.base.DataMetadata[source]¶