d3m.contrib.openml.crawler

d3m.contrib.openml.crawler.crawl_openml(save_dir, task_types, *, data_pipeline, data_params=None, context, random_seed=0, volumes_dir=None, scratch_dir=None, runtime_environment=None, max_tasks=None, ignore_tasks=[], ignore_datasets=[], dataset_resolver=None, problem_resolver=None, compute_digest=ComputeDigest.ONLY_IF_MISSING, strict_digest=False)

A function that crawls OpenML tasks and corresponding datasets and converts them to D3M datasets and problems.
Parameters

- save_dir (str) – A directory in which to save datasets and problems.
- task_types (Sequence[OpenMLTaskType]) – Task types to crawl.
- data_pipeline (Pipeline) – A data preparation pipeline used for splitting.
- data_params (Optional[Dict[str, str]]) – A dictionary of hyper-parameters for the data preparation pipeline.
- context (Context) – In which context to run pipelines.
- random_seed (int) – A random seed to use for every run. This controls all randomness during the run.
- volumes_dir (Optional[str]) – Path to a directory with static files required by primitives.
- scratch_dir (Optional[str]) – Path to a directory to store any temporary files needed during execution.
- runtime_environment (Optional[RuntimeEnvironment]) – A description of the runtime environment.
- max_tasks (Optional[int]) – Maximum number of tasks to crawl; no limit if None or 0.
- dataset_resolver (Optional[Callable]) – A dataset resolver to use.
- problem_resolver (Optional[Callable]) – A problem description resolver to use.
- compute_digest (ComputeDigest) – Compute a digest over the data?
- strict_digest (bool) – If a computed digest does not match the one provided in metadata, raise an exception?
Returns

A boolean set to true if there was an error during the call.

Return type

bool
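A usage sketch (not part of the generated reference): the pipeline file name and the OpenMLTaskType member name below are assumptions for illustration, as is running in Context.TESTING.

    from d3m.contrib.openml import crawler
    from d3m.metadata import base as metadata_base
    from d3m.metadata import pipeline as pipeline_module

    # Load the data preparation (splitting) pipeline; the file name is a
    # placeholder for whatever splitting pipeline description you use.
    with open('data_preparation_pipeline.json') as pipeline_file:
        data_pipeline = pipeline_module.Pipeline.from_json(pipeline_file)

    # "SUPERVISED_CLASSIFICATION" as an OpenMLTaskType member is an assumption.
    had_error = crawler.crawl_openml(
        save_dir='openml_d3m_datasets',
        task_types=[crawler.OpenMLTaskType.SUPERVISED_CLASSIFICATION],
        data_pipeline=data_pipeline,
        context=metadata_base.Context.TESTING,
        max_tasks=10,  # Crawl at most 10 tasks; None or 0 means no limit.
    )

    if had_error:
        print('At least one task failed to crawl.')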
d3m.contrib.openml.crawler.crawl_openml_handler(arguments, *, pipeline_resolver=None, dataset_resolver=None, problem_resolver=None)

Return type

None
d3m.contrib.openml.crawler.crawl_openml_task(datasets, task_id, save_dir, *, data_pipeline, data_params=None, context, random_seed=0, volumes_dir=None, scratch_dir=None, runtime_environment=None, dataset_resolver=None, problem_resolver=None, compute_digest=ComputeDigest.ONLY_IF_MISSING, strict_digest=False)

A function that crawls an OpenML task and the corresponding dataset, performs the split using a data preparation pipeline, and stores the splits as a D3M dataset and problem description.
Parameters

- datasets (Dict[str, str]) – A mapping between known dataset IDs and their paths. Is updated in-place.
- task_id (int) – An integer representing an OpenML task ID to crawl and convert.
- save_dir (str) – A directory in which to save datasets and problems.
- data_pipeline (Pipeline) – A data preparation pipeline used for splitting.
- data_params (Optional[Dict[str, str]]) – A dictionary of hyper-parameters for the data preparation pipeline.
- context (Context) – In which context to run pipelines.
- random_seed (int) – A random seed to use for every run. This controls all randomness during the run.
- volumes_dir (Optional[str]) – Path to a directory with static files required by primitives.
- scratch_dir (Optional[str]) – Path to a directory to store any temporary files needed during execution.
- runtime_environment (Optional[RuntimeEnvironment]) – A description of the runtime environment.
- dataset_resolver (Optional[Callable]) – A dataset resolver to use.
- problem_resolver (Optional[Callable]) – A problem description resolver to use.
- compute_digest (ComputeDigest) – Compute a digest over the data?
- strict_digest (bool) – If a computed digest does not match the one provided in metadata, raise an exception?
Return type

None
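A usage sketch for crawling a single task (the task ID, file name, and context are illustrative assumptions); note that the datasets mapping is updated in-place with any newly converted datasets.

    from d3m.contrib.openml import crawler
    from d3m.metadata import base as metadata_base
    from d3m.metadata import pipeline as pipeline_module

    with open('data_preparation_pipeline.json') as pipeline_file:
        data_pipeline = pipeline_module.Pipeline.from_json(pipeline_file)

    # Known dataset IDs mapped to their paths; updated in-place by the call.
    datasets = {}

    # 31 is only an illustrative OpenML task ID.
    crawler.crawl_openml_task(
        datasets=datasets,
        task_id=31,
        save_dir='openml_d3m_datasets',
        data_pipeline=data_pipeline,
        context=metadata_base.Context.TESTING,
    )

    print(datasets)  # Includes entries for any datasets converted by this call.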