d3m.metadata.pipeline module

class d3m.metadata.pipeline.Pipeline(pipeline_id=None, *, context=None, created=None, source=None, name=None, description=None)[source]

Bases: object

Class representing a pipeline.

id[source]

A unique ID to identify this pipeline.

created[source]

Timestamp of pipeline creation in UTC timezone.

source[source]

Description of source.

name[source]

Name of the pipeline.

description[source]

Description of the pipeline.

users[source]

Users associated with the pipeline.

inputs[source]

A sequence of input descriptions which provide names for pipeline inputs.

outputs[source]

A sequence of output descriptions which provide data references for pipeline outputs.

steps[source]

A sequence of steps defining this pipeline.

Parameters
  • pipeline_id (Optional[str]) – Optional ID for the pipeline. If not provided, it is automatically generated.

  • context (Optional[Context]) – DEPRECATED: argument ignored.

  • created (Optional[datetime]) – Optional timestamp of pipeline creation in UTC timezone. If not provided, the current time will be used.

  • source (Optional[Dict]) – Description of source. Optional.

  • name (Optional[str]) – Name of the pipeline. Optional.

  • description (Optional[str]) – Description of the pipeline. Optional.

add_input(name=None)[source]

Add an input to the pipeline.

Parameters

name (Optional[str]) – Optional human friendly name for the input.

Returns

Data reference for the input added.

Return type

str

add_output(data_reference, name=None)[source]

Add an output to the pipeline.

Parameters
  • data_reference (str) – Data reference to use as an output.

  • name (Optional[str]) – Optional human friendly name for the output.

Returns

Data reference for the output added.

Return type

str

add_step(step)[source]

Add a step to the sequence of steps in the pipeline.

Parameters

step (StepBase) – A step to add.

Return type

None
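
The methods above are typically used together to build a pipeline programmatically. Below is a minimal, hedged sketch of that flow; it assumes the common dataset_to_dataframe primitive is installed on the system and is only one way to construct a pipeline.

    from d3m import index
    from d3m.metadata.base import ArgumentType
    from d3m.metadata.pipeline import Pipeline, PrimitiveStep

    # Create an empty pipeline with a single named input.
    pipeline = Pipeline()
    pipeline.add_input(name='inputs')

    # Step 0: convert the input Dataset to a DataFrame.
    step_0 = PrimitiveStep(primitive=index.get_primitive(
        'd3m.primitives.data_transformation.dataset_to_dataframe.Common'))
    step_0.add_argument(name='inputs', argument_type=ArgumentType.CONTAINER, data_reference='inputs.0')
    step_0.add_output('produce')
    pipeline.add_step(step_0)

    # Expose the step's output as the pipeline output.
    pipeline.add_output(name='output', data_reference='steps.0.produce')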

add_user(user_description)[source]

Add a description of a user to the list of users associated with the pipeline.

Parameters

user_description (Dict) – User description.

Return type

None

check(*, allow_placeholders=False, standard_pipeline=True, input_types=None)[source]

Check if the pipeline is a valid pipeline.

It supports checking against non-resolved primitives and pipelines, but in that case checking will be very limited. Make sure you use a strict resolver to ensure full checking of this pipeline and any sub-pipelines.

Raises an exception if check fails.

Parameters
  • allow_placeholders (bool) – Do we allow placeholders in a pipeline?

  • standard_pipeline (bool) – Check it as a standard pipeline (inputs are Dataset objects, output is a DataFrame)?

  • input_types (Optional[Dict[str, type]]) – A map of types available as inputs. If provided, overrides standard_pipeline.

Return type

None
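
A hedged usage sketch of check follows; it assumes pipeline is an already-constructed Pipeline whose single input is a DataFrame, and the 'inputs.0' key format for input_types is an assumption about how input data references are named.

    from d3m import container

    # Validate a pipeline that may still contain placeholder steps, using an explicit
    # input type map instead of the standard Dataset-in/DataFrame-out check.
    pipeline.check(
        allow_placeholders=True,
        input_types={'inputs.0': container.DataFrame},
    )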

created: datetime.datetime = None[source]
description: str = None[source]
equals(pipeline, *, strict_order=False, only_control_hyperparams=False)[source]

Check if the two pipelines are equal in the sense of isomorphism.

Parameters
  • pipeline (~P) – A pipeline instance.

  • strict_order (bool) – If true, we will treat inputs of Set hyperparameters as a list, and the order of primitives is determined by their step indices. Otherwise we will try to sort the contents of Set hyperparameters so that their order does not matter, and we will use topological sorting to determine the order of nodes.

  • only_control_hyperparams (bool) – If true, equality checks will not happen for any hyperparameters that are not of the ControlParameter semantic type, i.e. there will be no checks for hyperparameters that are specific to the hyperparameter optimization phase and not part of the logic of the pipeline.

Notes

This algorithm checks if the two pipelines are equal in the sense of isomorphism by solving a graph isomorphism problem. The general graph isomorphism problem is not known to be solvable in polynomial time, nor is it known to be NP-complete. However, our pipelines are DAGs, so we can check their isomorphism in polynomial time.

The complexity of this algorithm is around \(O((V + E)\log V)\), where \(V\) is the number of steps in the pipeline and \(E\) is the number of output references. It greedily assigns unique orders to all nodes layer by layer, followed by a topological sort using DFS. From this we obtain a unique, hashable and comparable tuple representing the structure of the pipeline, which is also a unique representation of the pipeline's equivalence class under isomorphism.

Return type

bool

classmethod from_json(string_or_file, *, resolver=None, strict_digest=False)[source]
Return type

~P

classmethod from_json_structure(pipeline_description, *, resolver=None, strict_digest=False)[source]
Return type

~P

classmethod from_yaml(string_or_file, *, resolver=None, strict_digest=False)[source]
Return type

~P
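
A hedged sketch of loading a pipeline description from disk (the file name is assumed):

    from d3m.metadata.pipeline import Pipeline, Resolver

    # A strict resolver makes digest mismatches in the loaded description fail loudly.
    resolver = Resolver(strict_digest=True)

    with open('pipeline.json') as pipeline_file:
        pipeline = Pipeline.from_json(pipeline_file, resolver=resolver, strict_digest=True)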

get_all_hyperparams()[source]

Returns pipeline’s hyper-parameters as a list of hyper-parameters for each step, in order of steps.

Returns

A list of hyper-parameters configurations, one for each step, covering all hyper-parameters of that step.

get_available_data_references(for_step=None)[source]

Returns a set of data references provided by existing steps (and pipeline inputs).

Those data references can be used by subsequent steps as their inputs.

Parameters

for_step – Instead of using all existing steps, use only the steps before for_step.

Returns

A set of data references.
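
A short sketch, assuming pipeline is the pipeline built in the construction sketch above:

    # Data references a newly added step could consume at this point.
    available = pipeline.get_available_data_references()
    # For one pipeline input and one step with a 'produce' output, this would be
    # something like {'inputs.0', 'steps.0.produce'}.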

get_digest()[source]
Return type

str

get_exposable_outputs()[source]
Return type

AbstractSet[str]

get_free_hyperparams()[source]

Returns pipeline’s hyper-parameters which have not been fixed by the pipeline as a list of free hyper-parameters for each step, in order of steps.

Returns

A list of hyper-parameters configurations, one for each step, covering the free hyper-parameters of that step.

get_producing_outputs()[source]

Returns a set of recursive data references of all values produced by the pipeline during its run.

This represents outputs of each step of the pipeline, the outputs of the pipeline itself, but also exposable outputs of any sub-pipeline. The latter are prefixed with the step prefix, e.g., steps.1.steps.4.produce is steps.4.produce output of a sub-pipeline step with index 1.

Outputs of sub-pipelines are represented twice, as an output of the step and as an output of the sub-pipeline. This is done because not all outputs of a sub-pipeline are necessarily exposed as outputs of a step: they might not be used in the outer pipeline, but the sub-pipeline still defines them.

A primitive might have additional produce methods which could be called but they are not listed among step’s outputs. Data references related to those produce methods are not returned.

Returns

A set of recursive data references.

has_placeholder()[source]

Returns True if the pipeline has a placeholder step, either in the pipeline itself or in any sub-pipeline.

Returns

True if the pipeline has a placeholder step.

Return type

bool

hash(*, strict_order=False, only_control_hyperparams=False)[source]

Get the hash value of a pipeline. It simply hashes the unique representation of the equivalence class of a pipeline in the sense of isomorphism.

Parameters
  • strict_order (bool) – If true, we will treat inputs of Set hyperparameters as a list, and the order of primitives is determined by their step indices. Otherwise we will try to sort the contents of Set hyperparameters so that their order does not matter, and we will use topological sorting to determine the order of nodes.

  • only_control_hyperparams (bool) – If true, equality checks will not happen for any hyperparameters that are not of the ControlParameter semantic type, i.e. there will be no checks for hyperparameters that are specific to the hyperparameter optimization phase and not part of the logic of the pipeline.

Return type

int
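
A hedged sketch of comparing two pipelines structurally with equals and hash (file names are assumed):

    from d3m.metadata.pipeline import Pipeline, Resolver

    resolver = Resolver()

    with open('pipeline_a.json') as file_a, open('pipeline_b.json') as file_b:
        pipeline_a = Pipeline.from_json(file_a, resolver=resolver)
        pipeline_b = Pipeline.from_json(file_b, resolver=resolver)

    # Isomorphism-based equality, ignoring tuning-specific hyper-parameters.
    if pipeline_a.equals(pipeline_b, only_control_hyperparams=True):
        # Pipelines in the same equivalence class hash to the same value
        # when hash() is called with matching flags.
        assert (pipeline_a.hash(only_control_hyperparams=True)
                == pipeline_b.hash(only_control_hyperparams=True))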

id: str = None[source]
inputs: typing.List[typing.Dict] = None[source]
name: str = None[source]
outputs: typing.List[typing.Dict] = None[source]
replace_step(index, replacement_step)[source]

Replace an existing step (generally a placeholder) with a new step (generally a subpipeline). It makes sure that all inputs are available at that point in the pipeline, and all outputs needed later from this step stay available after replacement.

If the old pipeline (the one before the step replacement) has already been made public under some ID, make sure that the new pipeline (the one with the replaced step) gets a new, different ID before making it public.

Parameters
  • index (int) – Index of the step to replace.

  • replacement_step (StepBase) – A new step.

Return type

None
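
A hedged sketch of filling a placeholder step; the step index, primitive path, and data references are assumptions, and a real replacement would also need all of the primitive's required arguments wired up.

    import uuid

    from d3m import index
    from d3m.metadata.base import ArgumentType
    from d3m.metadata.pipeline import PrimitiveStep

    replacement = PrimitiveStep(primitive=index.get_primitive(
        'd3m.primitives.classification.random_forest.SKlearn'))  # assumes this primitive is installed
    replacement.add_argument(name='inputs', argument_type=ArgumentType.CONTAINER,
                             data_reference='steps.0.produce')
    replacement.add_output('produce')

    # Replace the placeholder assumed to sit at step index 1.
    pipeline.replace_step(1, replacement)

    # Give the modified pipeline a new ID before publishing it.
    pipeline.id = str(uuid.uuid4())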

source: typing.Dict = None[source]
steps: typing.List[StepBase] = None[source]
to_json(file=None, *, nest_subpipelines=False, canonical=False, **kwargs)[source]
Return type

Optional[str]

to_json_structure(*, nest_subpipelines=False, canonical=False)[source]
Return type

Dict

to_yaml(file=None, *, nest_subpipelines=False, canonical=False, **kwargs)[source]
Return type

Optional[str]
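
A hedged sketch of serializing a pipeline (the output file name is assumed):

    # Without a file argument, to_json returns the JSON document as a string.
    json_string = pipeline.to_json(nest_subpipelines=True)

    # With a file argument, the document is written out instead.
    with open('pipeline.yml', 'w') as yaml_file:
        pipeline.to_yaml(yaml_file, nest_subpipelines=True)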

users: typing.List[typing.Dict] = None[source]
class d3m.metadata.pipeline.Resolver(*, strict_resolving=False, strict_digest=False, pipeline_search_paths=None, respect_environment_variable=True, load_all_primitives=True, primitives_blocklist=None)[source]

Bases: object

A resolver to resolve primitives and pipelines.

It resolves primitives from available primitives on the system, and resolves pipelines from files in pipeline search paths.

strict_resolving[source]

If resolved pipeline or primitive does not fully match specified primitive reference, raise an exception?

strict_digest[source]

When loading pipelines or primitives, if computed digest does not match the one provided in metadata, raise an exception?

pipeline_search_paths[source]

A list of paths to directories with pipelines to resolve from. Their files should be named <pipeline id>.json, <pipeline id>.yml, or <pipeline id>.yaml.

Parameters
  • strict_resolving (bool) – If resolved pipeline or primitive does not fully match specified primitive reference, raise an exception?

  • strict_digest (bool) – When loading pipelines or primitives, if computed digest does not match the one provided in metadata, raise an exception?

  • pipeline_search_paths (Optional[Sequence[str]]) – A list of paths to directories with pipelines to resolve from. Their files should be named <pipeline id>.json, <pipeline id>.yml, or <pipeline id>.yaml.

  • respect_environment_variable (bool) – Also use (colon-separated) pipeline search paths from the PIPELINES_PATH environment variable?

  • load_all_primitives (bool) – Load all primitives before attempting to resolve them. If False, any primitive used in a pipeline has to be loaded before calling the resolver.

  • primitives_blocklist (Optional[Collection[str]]) – A collection of primitive path prefixes to not (try to) load.
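
A hedged configuration sketch; the search path is an assumed local directory of <pipeline id>.json, .yml, or .yaml files.

    from d3m.metadata.pipeline import Resolver

    resolver = Resolver(
        strict_resolving=True,
        strict_digest=True,
        pipeline_search_paths=['/path/to/pipelines'],
        load_all_primitives=False,  # if False, primitives used must already be loaded before resolving
    )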

get_pipeline(pipeline_description)[source]
Return type

Optional[Pipeline]

classmethod get_pipeline_class()[source]
Return type

Type[Pipeline]

get_primitive(primitive_description)[source]
Return type

Optional[Type[PrimitiveBase]]

pipeline_search_paths: typing.List[str] = None[source]
strict_digest: bool = None[source]
strict_resolving: bool = None[source]
class d3m.metadata.pipeline.NoResolver(*, strict_resolving=False, strict_digest=False, pipeline_search_paths=None, respect_environment_variable=True, load_all_primitives=True, primitives_blocklist=None)[source]

Bases: d3m.metadata.pipeline.Resolver

A resolver which never resolves anything.

pipeline_search_paths = None[source]
strict_digest = None[source]
strict_resolving = None[source]
class d3m.metadata.pipeline.PrimitiveStep(primitive_description=None, *, primitive=None, resolver=None)[source]

Bases: d3m.metadata.pipeline.StepBase

Class representing a primitive step in a pipeline's execution.

primitive_description[source]

A description of the primitive specified for this step. Available if primitive could not be resolved.

primitive[source]

A primitive class associated with this step.

outputs[source]

A list of method names providing outputs for this step.

hyperparams[source]

A map of fixed hyper-parameters to their values, which are set as part of the pipeline and should not be tuned during hyper-parameter tuning.

arguments[source]

A map between an argument name and its description. The description contains a data reference to an output of a prior step (or a pipeline input).

users[source]

Users associated with the primitive.

Parameters
  • primitive_description (Optional[Dict]) – A description of the primitive specified for this step. Allowed only if primitive is not provided.

  • primitive (Optional[Type[PrimitiveBase]]) – A primitive class associated with this step. If not provided, resolved using resolver from primitive_description.

add_argument(name, argument_type, data_reference)[source]

Associate a data reference with an argument of this step (and the underlying primitive).

Parameters
  • name (str) – Argument name.

  • argument_type (Any) – Argument type.

  • data_reference (Union[str, Sequence[str]]) – Data reference or a list of data references associated with this argument.

Return type

None

add_hyperparameter(name, argument_type, data)[source]

Associate a value with a hyper-parameter of this step (and the underlying primitive).

Parameters
  • name (str) – Hyper-parameter name.

  • argument_type (Any) – Argument type.

  • data (Any) – Data reference associated with this hyper-parameter, a list of data references, or the value itself.

Return type

None
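
A hedged sketch of fixing hyper-parameters on a step; the primitive path and hyper-parameter name are assumptions and depend on which primitives are installed.

    from d3m import index
    from d3m.metadata.base import ArgumentType
    from d3m.metadata.pipeline import PrimitiveStep

    step = PrimitiveStep(primitive=index.get_primitive(
        'd3m.primitives.classification.random_forest.SKlearn'))

    # A literal value, fixed as part of the pipeline and excluded from tuning.
    step.add_hyperparameter(name='n_estimators', argument_type=ArgumentType.VALUE, data=100)
    # Instead of a literal, a hyper-parameter can also be wired to another step's
    # output (for example with ArgumentType.DATA and a data reference).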

add_output(output_id)[source]

Define an output from this step.

The underlying primitive can have multiple produce methods, but not all of them have to be defined as outputs of the step.

Parameters

output_id (str) – A name of the method producing this output.

Return type

None

add_user(user_description)[source]

Add a description of a user to the list of users associated with the primitive.

Parameters

user_description (Dict) – User description.

Return type

None

arguments: typing.Dict[str, typing.Dict] = None[source]
check_add(existing_steps, available_data_references)[source]

Checks if a step can be added given existing steps and available data references to provide to the step. It also checks if the state of a step is suitable to be added at this point.

Raises an exception if check fails.

Parameters
  • existing_steps (Sequence[StepBase]) – Steps already in the pipeline.

  • available_data_references (AbstractSet[str]) – A set of available data references.

Return type

None

classmethod from_json_structure(step_description, *, resolver=None)[source]
Return type

~SP

get_all_hyperparams()[source]

Returns step’s hyper-parameters.

Returns

Hyper-parameters configuration for all hyper-parameters, or a list of those.

get_free_hyperparams()[source]

Returns step’s hyper-parameters which have not been fixed by the pipeline.

Returns

Hyper-parameters configuration for free hyper-parameters, or a list of those.

get_input_data_references()[source]
Return type

AbstractSet[str]

get_output_data_references()[source]
Return type

AbstractSet[str]

get_primitive_hyperparams()[source]
Return type

Hyperparams

get_primitive_id()[source]
Return type

str

classmethod get_step_type()[source]
Return type

PipelineStepType

hyperparams: typing.Dict[str, typing.Dict] = None[source]
outputs: typing.List[str] = None[source]
primitive: typing.Type[base.PrimitiveBase] = None[source]
primitive_description: typing.Dict = None[source]
to_json_structure()[source]
Return type

Dict

users: typing.List[typing.Dict] = None[source]
class d3m.metadata.pipeline.SubpipelineStep(pipeline_description=None, *, pipeline=None, resolver=None)[source]

Bases: d3m.metadata.pipeline.StepBase

Class representing a sub-pipeline step in a pipeline's execution.

index[source]

An index of the step among steps in the pipeline.

resolver[source]

Resolver to use.

Parameters

resolver (Optional[Resolver]) – Resolver to use.

add_input(data_reference)[source]
Return type

None

add_output(output_id)[source]

Define an output from this step.

The underlying pipeline can have multiple outputs, but not all of them have to be defined as outputs of the step. They can be skipped using None.

Parameters

output_id (Optional[str]) – ID to be used in the data reference, mapping the pipeline's outputs in order. If None, this pipeline output is ignored and not mapped to a data reference.

Return type

None
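
A hedged sketch of embedding an existing pipeline as one step of an outer pipeline; inner_pipeline and outer_pipeline are assumed, already-constructed Pipeline objects.

    from d3m.metadata.pipeline import SubpipelineStep

    sub_step = SubpipelineStep(pipeline=inner_pipeline)
    sub_step.add_input('steps.0.produce')  # maps to the inner pipeline's inputs, in order
    sub_step.add_output('predictions')     # exposed as a steps.<index>.predictions data reference
    outer_pipeline.add_step(sub_step)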

check_add(existing_steps, available_data_references)[source]

Checks if a step can be added given existing steps and available data references to provide to the step. It also checks if the state of a step is suitable to be added at this point.

Raises an exception if check fails.

Parameters
  • existing_steps (Sequence[StepBase]) – Steps already in the pipeline.

  • available_data_references (AbstractSet[str]) – A set of available data references.

Return type

None

classmethod from_json_structure(step_description, *, resolver=None)[source]
Return type

~SS

get_all_hyperparams()[source]

Returns step’s hyper-parameters.

Returns

Hyper-parameters configuration for all hyper-parameters, or a list of those.

get_free_hyperparams()[source]

Returns step’s hyper-parameters which have not been fixed by the pipeline.

Returns

Hyper-parameters configuration for free hyper-parameters, or a list of those.

get_input_data_references()[source]
Return type

AbstractSet[str]

get_output_data_references()[source]
Return type

AbstractSet[str]

get_pipeline_id()[source]
Return type

str

classmethod get_step_type()[source]
Return type

PipelineStepType

index = None[source]
resolver = None[source]
to_json_structure(*, nest_subpipelines=False)[source]
Return type

Dict

class d3m.metadata.pipeline.PlaceholderStep(resolver=None)[source]

Bases: d3m.metadata.pipeline.StepBase

Class representing a placeholder step in a pipeline's execution.

index[source]

An index of the step among steps in the pipeline.

resolver[source]

Resolver to use.

Parameters

resolver (Optional[Resolver]) – Resolver to use.

add_input(data_reference)[source]
Return type

None

add_output(output_id)[source]
Return type

None
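
A hedged sketch of reserving a spot in a pipeline to be filled in later (for example via replace_step); pipeline is an assumed, already-constructed Pipeline.

    from d3m.metadata.pipeline import PlaceholderStep

    placeholder = PlaceholderStep()
    placeholder.add_input('steps.0.produce')  # data the eventual replacement will consume
    placeholder.add_output('produce')         # output id later steps can reference
    pipeline.add_step(placeholder)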

check_add(existing_steps, available_data_references)[source]

Checks if a step can be added given existing steps and available data references to provide to the step. It also checks if the state of a step is suitable to be added at this point.

Raises an exception if check fails.

Parameters
  • existing_steps (Sequence[StepBase]) – Steps already in the pipeline.

  • available_data_references (AbstractSet[str]) – A set of available data references.

Return type

None

classmethod from_json_structure(step_description, *, resolver=None)[source]
Return type

~SL

get_all_hyperparams()[source]

Returns step’s hyper-parameters.

Returns

Hyper-parameters configuration for all hyper-parameters, or a list of those.

get_free_hyperparams()[source]

Returns step’s hyper-parameters which have not been fixed by the pipeline.

Returns

Hyper-parameters configuration for free hyper-parameters, or a list of those.

get_input_data_references()[source]
Return type

AbstractSet[str]

get_output_data_references()[source]
Return type

AbstractSet[str]

classmethod get_step_type()[source]
Return type

PipelineStepType

index = None[source]
resolver = None[source]
to_json_structure()[source]
Return type

Dict