d3m.contrib.openml.crawler

d3m.contrib.openml.crawler.crawl_openml(save_dir, task_types, *, data_pipeline, data_params=None, context, random_seed=0, volumes_dir=None, scratch_dir=None, runtime_environment=None, max_tasks=None, ignore_tasks=[], ignore_datasets=[], dataset_resolver=None, problem_resolver=None, compute_digest=<ComputeDigest.ONLY_IF_MISSING: 'ONLY_IF_MISSING'>, strict_digest=False)
A function that crawls OpenML tasks and corresponding datasets and converts them to D3M datasets and problems.
- Parameters
  - save_dir (str) – A directory in which to save datasets and problems.
  - task_types (Sequence[OpenMLTaskType]) – Task types to crawl.
  - data_pipeline (Pipeline) – A data preparation pipeline used for splitting.
  - data_params (Optional[Dict[str, str]]) – A dictionary that contains the hyper-parameters for the data preparation pipeline.
  - context (Context) – In which context to run pipelines.
  - random_seed (int) – A random seed to use for every run. This controls all randomness during the run.
  - volumes_dir (Optional[str]) – Path to a directory with static files required by primitives.
  - scratch_dir (Optional[str]) – Path to a directory to store any temporary files needed during execution.
  - runtime_environment (Optional[RuntimeEnvironment]) – A description of the runtime environment.
  - max_tasks (Optional[int]) – Maximum number of tasks to crawl; no limit if None or 0.
  - ignore_tasks (Sequence[int]) – A list of OpenML task IDs to skip.
  - ignore_datasets (Sequence[str]) – A list of OpenML dataset IDs to skip.
  - dataset_resolver (Optional[Callable]) – A dataset resolver to use.
  - problem_resolver (Optional[Callable]) – A problem description resolver to use.
  - compute_digest (ComputeDigest) – Compute a digest over the data?
  - strict_digest (bool) – If computed digest does not match the one provided in metadata, raise an exception?
- Returns
A boolean set to True if there was an error during the call.
- Return type
bool
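
A minimal usage sketch, assuming a data preparation (splitting) pipeline description is available as a local JSON file; the file path, output directory, and the OpenMLTaskType member name are illustrative assumptions, not part of this reference:

    # Sketch: crawl up to 10 OpenML classification tasks into D3M form.
    from d3m.contrib.openml import crawler
    from d3m.metadata import base as metadata_base
    from d3m.metadata import pipeline as pipeline_module

    # Load the data preparation pipeline used to split each dataset
    # (hypothetical local file).
    with open('data_preparation_pipeline.json') as pipeline_file:
        data_pipeline = pipeline_module.Pipeline.from_json(pipeline_file)

    had_error = crawler.crawl_openml(
        'openml',  # save_dir
        [crawler.OpenMLTaskType.SUPERVISED_CLASSIFICATION],  # assumed member name
        data_pipeline=data_pipeline,
        context=metadata_base.Context.TESTING,
        max_tasks=10,
    )
    if had_error:
        print('At least one task failed to crawl.')
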
d3m.contrib.openml.crawler.crawl_openml_handler(arguments, *, pipeline_resolver=None, dataset_resolver=None, problem_resolver=None)
- Return type
None

d3m.contrib.openml.crawler.crawl_openml_task(datasets, task_id, save_dir, *, data_pipeline, data_params=None, context, random_seed=0, volumes_dir=None, scratch_dir=None, runtime_environment=None, dataset_resolver=None, problem_resolver=None, compute_digest=<ComputeDigest.ONLY_IF_MISSING: 'ONLY_IF_MISSING'>, strict_digest=False)
A function that crawls an OpenML task and its corresponding dataset, splits it using a data preparation pipeline, and stores the splits as a D3M dataset and problem description.
- Parameters
  - datasets (Dict[str, str]) – A mapping between known dataset IDs and their paths. Is updated in-place.
  - task_id (int) – An integer representing an OpenML task ID to crawl and convert.
  - save_dir (str) – A directory in which to save datasets and problems.
  - data_pipeline (Pipeline) – A data preparation pipeline used for splitting.
  - data_params (Optional[Dict[str, str]]) – A dictionary that contains the hyper-parameters for the data preparation pipeline.
  - context (Context) – In which context to run pipelines.
  - random_seed (int) – A random seed to use for every run. This controls all randomness during the run.
  - volumes_dir (Optional[str]) – Path to a directory with static files required by primitives.
  - scratch_dir (Optional[str]) – Path to a directory to store any temporary files needed during execution.
  - runtime_environment (Optional[RuntimeEnvironment]) – A description of the runtime environment.
  - dataset_resolver (Optional[Callable]) – A dataset resolver to use.
  - problem_resolver (Optional[Callable]) – A problem description resolver to use.
  - compute_digest (ComputeDigest) – Compute a digest over the data?
  - strict_digest (bool) – If computed digest does not match the one provided in metadata, raise an exception?
- Return type
None
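
A minimal usage sketch for a single task, assuming the same illustrative pipeline file as above; the task ID and paths are hypothetical:

    # Sketch: crawl one OpenML task; "datasets" is updated in-place with
    # any newly resolved dataset paths.
    import typing

    from d3m.contrib.openml import crawler
    from d3m.metadata import base as metadata_base
    from d3m.metadata import pipeline as pipeline_module

    with open('data_preparation_pipeline.json') as pipeline_file:  # hypothetical path
        data_pipeline = pipeline_module.Pipeline.from_json(pipeline_file)

    datasets: typing.Dict[str, str] = {}

    crawler.crawl_openml_task(
        datasets,
        59,  # illustrative OpenML task ID
        'openml',  # save_dir
        data_pipeline=data_pipeline,
        context=metadata_base.Context.TESTING,
    )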