d3m.contrib.openml.crawler

d3m.contrib.openml.crawler.crawl_openml(save_dir, task_types, *, data_pipeline, data_params=None, context, random_seed=0, volumes_dir=None, scratch_dir=None, runtime_environment=None, max_tasks=None, ignore_tasks=[], ignore_datasets=[], dataset_resolver=None, problem_resolver=None, compute_digest=ComputeDigest.ONLY_IF_MISSING, strict_digest=False)

A function that crawls OpenML tasks and corresponding datasets and converts them to D3M datasets and problems.
Parameters

- save_dir (str) – A directory in which to save datasets and problems.
- task_types (Sequence[OpenMLTaskType]) – Task types to crawl.
- data_pipeline (Pipeline) – A data preparation pipeline used for splitting.
- data_params (Optional[Dict[str, str]]) – A dictionary of hyper-parameters for the data preparation pipeline.
- context (Context) – In which context to run pipelines.
- random_seed (int) – A random seed to use for every run. This controls all randomness during the run.
- volumes_dir (Optional[str]) – Path to a directory with static files required by primitives.
- scratch_dir (Optional[str]) – Path to a directory to store any temporary files needed during execution.
- runtime_environment (Optional[RuntimeEnvironment]) – A description of the runtime environment.
- max_tasks (Optional[int]) – Maximum number of tasks to crawl; no limit if None or 0.
- dataset_resolver (Optional[Callable]) – A dataset resolver to use.
- problem_resolver (Optional[Callable]) – A problem description resolver to use.
- compute_digest (ComputeDigest) – Compute a digest over the data?
- strict_digest (bool) – If a computed digest does not match the one provided in metadata, raise an exception?
Returns

A boolean set to true if there was an error during the call.

Return type

bool
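A usage sketch (not part of the generated reference): the pipeline file name and the OpenMLTaskType member name below are assumptions for illustration, as is running in Context.TESTING.

    from d3m.contrib.openml import crawler
    from d3m.metadata import base as metadata_base
    from d3m.metadata import pipeline as pipeline_module

    # Load the data preparation (splitting) pipeline; the file name is a
    # placeholder for whatever splitting pipeline description you use.
    with open('data_preparation_pipeline.json') as pipeline_file:
        data_pipeline = pipeline_module.Pipeline.from_json(pipeline_file)

    # "SUPERVISED_CLASSIFICATION" as an OpenMLTaskType member is an assumption.
    had_error = crawler.crawl_openml(
        save_dir='openml_d3m_datasets',
        task_types=[crawler.OpenMLTaskType.SUPERVISED_CLASSIFICATION],
        data_pipeline=data_pipeline,
        context=metadata_base.Context.TESTING,
        max_tasks=10,  # Crawl at most 10 tasks; None or 0 means no limit.
    )

    if had_error:
        print('At least one task failed to crawl.')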
d3m.contrib.openml.crawler.crawl_openml_handler(arguments, *, pipeline_resolver=None, dataset_resolver=None, problem_resolver=None)

Return type

None
d3m.contrib.openml.crawler.crawl_openml_task(datasets, task_id, save_dir, *, data_pipeline, data_params=None, context, random_seed=0, volumes_dir=None, scratch_dir=None, runtime_environment=None, dataset_resolver=None, problem_resolver=None, compute_digest=ComputeDigest.ONLY_IF_MISSING, strict_digest=False)

A function that crawls an OpenML task and the corresponding dataset, performs the split using a data preparation pipeline, and stores the splits as a D3M dataset and problem description.
Parameters

- datasets (Dict[str, str]) – A mapping between known dataset IDs and their paths. Is updated in-place.
- task_id (int) – An integer representing an OpenML task ID to crawl and convert.
- save_dir (str) – A directory in which to save datasets and problems.
- data_pipeline (Pipeline) – A data preparation pipeline used for splitting.
- data_params (Optional[Dict[str, str]]) – A dictionary of hyper-parameters for the data preparation pipeline.
- context (Context) – In which context to run pipelines.
- random_seed (int) – A random seed to use for every run. This controls all randomness during the run.
- volumes_dir (Optional[str]) – Path to a directory with static files required by primitives.
- scratch_dir (Optional[str]) – Path to a directory to store any temporary files needed during execution.
- runtime_environment (Optional[RuntimeEnvironment]) – A description of the runtime environment.
- dataset_resolver (Optional[Callable]) – A dataset resolver to use.
- problem_resolver (Optional[Callable]) – A problem description resolver to use.
- compute_digest (ComputeDigest) – Compute a digest over the data?
- strict_digest (bool) – If a computed digest does not match the one provided in metadata, raise an exception?
Return type

None
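A usage sketch for crawling a single task (the task ID, file name, and context are illustrative assumptions); note that the datasets mapping is updated in-place with any newly converted datasets.

    from d3m.contrib.openml import crawler
    from d3m.metadata import base as metadata_base
    from d3m.metadata import pipeline as pipeline_module

    with open('data_preparation_pipeline.json') as pipeline_file:
        data_pipeline = pipeline_module.Pipeline.from_json(pipeline_file)

    # Known dataset IDs mapped to their paths; updated in-place by the call.
    datasets = {}

    # 31 is only an illustrative OpenML task ID.
    crawler.crawl_openml_task(
        datasets=datasets,
        task_id=31,
        save_dir='openml_d3m_datasets',
        data_pipeline=data_pipeline,
        context=metadata_base.Context.TESTING,
    )

    print(datasets)  # Includes entries for any datasets converted by this call.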