d3m.container.dataset
-
class d3m.container.dataset.ComputeDigest(value)
Bases: d3m.utils.Enum
Enumeration of possible approaches to computing dataset digest.
-
class d3m.container.dataset.Dataset(resources, metadata=None, *, load_lazy=None, generate_metadata=False, check=True, source=None, timestamp=None)
Bases: dict
A class representing a dataset.
Internally, it is a dictionary containing multiple resources (e.g., tables).
- Parameters
resources (Mapping) – A map from resource IDs to resources.
metadata (d3m.metadata.base.DataMetadata) – Metadata associated with the data.
load_lazy (Optional[Callable[[Dataset], None]]) – If constructing a lazy dataset, calling this function will read all the data and convert the dataset to a non-lazy one.
generate_metadata (bool) – Automatically generate and update the metadata.
check (bool) – DEPRECATED: argument ignored.
timestamp (Optional[datetime]) – DEPRECATED: argument ignored.
-
get_relations_graph()
Builds the relations graph for the dataset.
Each key in the output corresponds to a resource/table. The value under a key is the list of edges this table has. Each edge is represented by a tuple of five elements. For example, if the edge is (resource_id, True, index_1, index_2, custom_state), it means that there is a foreign key that points to table resource_id. Specifically, the index_1 column in the current table points to the index_2 column in the table resource_id. custom_state is an empty dict when returned from this method, but allows users of this graph to store custom state there.
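The edge layout described above can be illustrated without the library. The following is a minimal sketch that walks a hand-written graph in the documented shape; the resource IDs `learningData` and `customers`, the helper `describe_edges`, and the `"seen"` key are hypothetical names chosen for illustration, and the sample data is not output from a real dataset.

```python
from typing import Any, Dict, List, Tuple

# Edge layout as documented: (resource_id, flag, index_1, index_2, custom_state).
Edge = Tuple[str, bool, int, int, Dict[str, Any]]
RelationsGraph = Dict[str, List[Edge]]


def describe_edges(graph: RelationsGraph) -> List[str]:
    """Render every edge of a relations graph as a readable string."""
    lines = []
    for table_id in sorted(graph):
        for other_id, _flag, index_1, index_2, custom_state in graph[table_id]:
            # custom_state starts out empty and is ours to use; here we
            # simply record that this edge has been visited.
            custom_state["seen"] = True
            lines.append(f"{table_id}[{index_1}] -> {other_id}[{index_2}]")
    return lines


# Hand-written sample in the documented shape (hypothetical resource IDs).
sample_graph: RelationsGraph = {
    "learningData": [("customers", True, 1, 0, {})],
    "customers": [],
}

for line in describe_edges(sample_graph):
    print(line)
```

With a real dataset, the same function could presumably be applied to the value returned by `get_relations_graph()`.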
-
is_lazy()
Return whether this dataset instance is lazy and not all data has been loaded.
- Returns
True if this dataset instance is lazy.
- Return type
bool
-
classmethod load(dataset_uri, *, dataset_id=None, dataset_version=None, dataset_name=None, lazy=False, compute_digest=<ComputeDigest.ONLY_IF_MISSING: 'ONLY_IF_MISSING'>, strict_digest=False, handle_score_split=True)
Tries to load dataset from dataset_uri using all registered dataset loaders.
- Parameters
dataset_uri (str) – A URI to load.
dataset_id (Optional[str]) – Override dataset ID determined by the loader.
dataset_version (Optional[str]) – Override dataset version determined by the loader.
dataset_name (Optional[str]) – Override dataset name determined by the loader.
lazy (bool) – If True, load only top-level metadata and not whole dataset.
compute_digest (ComputeDigest) – Compute a digest over the data?
strict_digest (bool) – If computed digest does not match the one provided in metadata, raise an exception?
handle_score_split (bool) – If a scoring dataset has target values in a separate file, merge them in?
- Returns
A loaded dataset.
- Return type
Dataset
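As a rough illustration of a lazy load, the sketch below builds a `file://` URI from a path and passes it to `Dataset.load` with `lazy=True`. It assumes that one of the registered loaders accepts `file://` URIs and that `d3m` is installed; the helpers `to_file_uri` and `load_lazily` and the example path are hypothetical.

```python
from urllib.parse import quote


def to_file_uri(absolute_path: str) -> str:
    """Turn an absolute POSIX path into a percent-encoded file:// URI."""
    if not absolute_path.startswith("/"):
        raise ValueError("expected an absolute POSIX path")
    return "file://" + quote(absolute_path)


def load_lazily(dataset_doc_path: str):
    """Load only top-level metadata for the dataset at the given path."""
    # Imported here so that to_file_uri stays usable without d3m installed.
    from d3m.container.dataset import Dataset

    # Assumption: a registered loader recognizes this file:// URI.
    dataset = Dataset.load(to_file_uri(dataset_doc_path), lazy=True)
    # is_lazy() reports whether some of the data has not been loaded yet.
    print("lazy:", dataset.is_lazy())
    return dataset


# Hypothetical path, shown only to demonstrate the URI form.
print(to_file_uri("/data/example/datasetDoc.json"))
```

Keeping the `d3m` import inside `load_lazily` is a convenience for this sketch only; ordinary code would import it at module level.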
-
classmethod register_loader(loader)
Registers a new dataset loader.
- Parameters
loader (Loader) – An instance of the loader class implementing a new loader.
- Return type
None
-
classmethod register_saver(saver)
Registers a new dataset saver.
- Parameters
saver (Saver) – An instance of the saver class implementing a new saver.
- Return type
None
-
save(dataset_uri, *, compute_digest=<ComputeDigest.ALWAYS: 'ALWAYS'>, preserve_metadata=True)
Tries to save dataset to dataset_uri using all registered dataset savers.
- Parameters
dataset_uri (str) – A URI to save to.
compute_digest (ComputeDigest) – Compute digest over the data when saving?
preserve_metadata (bool) – When saving a dataset, store its metadata as well?
- Return type
None
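A load-then-save round trip might look like the sketch below. It spells out the documented default keyword values of `save` for clarity. The helpers `require_file_uri` and `copy_dataset` are hypothetical, and the sketch assumes that the registered loaders and savers accept `file://` URIs and that `d3m` is installed.

```python
from urllib.parse import urlparse


def require_file_uri(uri: str) -> str:
    """Return uri unchanged if it uses the file:// scheme, else raise."""
    if urlparse(uri).scheme != "file":
        raise ValueError(f"not a file URI: {uri!r}")
    return uri


def copy_dataset(source_uri: str, target_uri: str) -> None:
    """Load a dataset fully, then save it under a new URI."""
    # Imported here so that require_file_uri works without d3m installed.
    from d3m.container.dataset import ComputeDigest, Dataset

    dataset = Dataset.load(require_file_uri(source_uri))
    # These keyword values match the documented defaults of save().
    dataset.save(
        require_file_uri(target_uri),
        compute_digest=ComputeDigest.ALWAYS,
        preserve_metadata=True,
    )
```

Validating the scheme up front is a design choice of this sketch, not something the library is documented to require.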
-
select_rows(row_indices_to_keep)
Generate a new Dataset from the row indices for DataFrames.
-
to_json_structure(*, canonical=False)
Returns only a top-level dataset description.
- Return type
-
loaders: List[d3m.container.dataset.Loader] = [<d3m.container.dataset.D3MDatasetLoader object>, <d3m.container.dataset.CSVLoader object>, <d3m.container.dataset.SklearnExampleLoader object>, <d3m.container.dataset.OpenMLDatasetLoader object>]
-
metadata: d3m.metadata.base.DataMetadata