d3m.metadata.pipeline module

class d3m.metadata.pipeline.Pipeline(pipeline_id=None, *, context=None, created=None, source=None, name=None, description=None)[source]

Bases: object

Class representing a pipeline.

id[source]

A unique ID to identify this pipeline.

created[source]

Timestamp of pipeline creation in UTC timezone.

source[source]

Description of source.

name[source]

Name of the pipeline.

description[source]

Description of the pipeline.

users[source]

Users associated with the pipeline.

inputs[source]

A sequence of input descriptions which provide names for pipeline inputs.

outputs[source]

A sequence of output descriptions which provide data references for pipeline outputs.

steps[source]

A sequence of steps defining this pipeline.

Parameters
  • pipeline_id (Optional[str]) – Optional ID for the pipeline. If not provided, it is automatically generated.

  • context (Optional[Context]) – DEPRECATED: argument ignored.

  • created (Optional[datetime]) – Optional timestamp of pipeline creation in UTC timezone. If not provided, the current time will be used.

  • source (Optional[Dict]) – Description of source. Optional.

  • name (Optional[str]) – Name of the pipeline. Optional.

  • description (Optional[str]) – Description of the pipeline. Optional.

add_input(name=None)[source]

Add an input to the pipeline.

Parameters

name (Optional[str]) – Optional human friendly name for the input.

Returns

Data reference for the input added.

Return type

str

add_output(data_reference, name=None)[source]

Add an output to the pipeline.

Parameters
  • data_reference (str) – Data reference to use as an output.

  • name (Optional[str]) – Optional human friendly name for the output.

Returns

Data reference for the output added.

Return type

str

add_step(step)[source]

Add a step to the sequence of steps in the pipeline.

Parameters

step (StepBase) – A step to add.

Return type

None
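
The methods above are typically used together to build a pipeline programmatically. Below is a minimal, hedged sketch of that flow; it assumes the common dataset_to_dataframe primitive is installed on the system and is only one way to construct a pipeline.

    from d3m import index
    from d3m.metadata.base import ArgumentType
    from d3m.metadata.pipeline import Pipeline, PrimitiveStep

    # Create an empty pipeline with a single named input.
    pipeline = Pipeline()
    pipeline.add_input(name='inputs')

    # Step 0: convert the input Dataset to a DataFrame.
    step_0 = PrimitiveStep(primitive=index.get_primitive(
        'd3m.primitives.data_transformation.dataset_to_dataframe.Common'))
    step_0.add_argument(name='inputs', argument_type=ArgumentType.CONTAINER, data_reference='inputs.0')
    step_0.add_output('produce')
    pipeline.add_step(step_0)

    # Expose the step's output as the pipeline output.
    pipeline.add_output(name='output', data_reference='steps.0.produce')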

add_user(user_description)[source]

Add a description of a user to the list of users associated with the pipeline.

Parameters

user_description (Dict) – User description.

Return type

None

check(*, allow_placeholders=False, standard_pipeline=True, input_types=None)[source]

Check if the pipeline is a valid pipeline.

It supports checking against non-resolved primitives and pipelines, but in that case checking will be very limited. Make sure you use a strict resolver to ensure full checking of this pipeline and any sub-pipelines.

Raises an exception if check fails.

Parameters
  • allow_placeholders (bool) – Do we allow placeholders in a pipeline?

  • standard_pipeline (bool) – Check it as a standard pipeline (inputs are Dataset objects, output is a DataFrame)?

  • input_types (Optional[Dict[str, type]]) – A map of types available as inputs. If provided, overrides standard_pipeline.

Return type

None
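
A hedged usage sketch of check follows; it assumes pipeline is an already-constructed Pipeline whose single input is a DataFrame, and the 'inputs.0' key format for input_types is an assumption about how input data references are named.

    from d3m import container

    # Validate a pipeline that may still contain placeholder steps, using an explicit
    # input type map instead of the standard Dataset-in/DataFrame-out check.
    pipeline.check(
        allow_placeholders=True,
        input_types={'inputs.0': container.DataFrame},
    )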

created: datetime.datetime = None[source]
description: str = None[source]
equals(pipeline, *, strict_order=False, only_control_hyperparams=False)[source]

Check if the two pipelines are equal in the sense of isomorphism.

Parameters
  • pipeline (~P) – A pipeline instance.

  • strict_order (bool) – If true, we will treat inputs of Set hyperparameters as a list, and the order of primitives is determined by their step indices. Otherwise we will try to sort the contents of Set hyperparameters so that their order does not matter, and we will use topological sorting to determine the order of nodes.

  • only_control_hyperparams (bool) – If true, equality checks will not happen for any hyperparameters that are not of the ControlParameter semantic type, i.e. there will be no checks for hyperparameters that are specific to the hyperparameter optimization phase and not part of the logic of the pipeline.

Notes

This algorithm checks if the two pipelines are equal in the sense of isomorphism by solving a graph isomorphism problem. The general graph isomorphism problem is not known to be solvable in polynomial time, nor is it known to be NP-complete. However, our pipelines are DAGs, so we can check their isomorphism in polynomial time.

The complexity of this algorithm is around \(O((V + E)\log V)\), where \(V\) is the number of steps in the pipeline and \(E\) is the number of output references. It greedily assigns unique orders to all nodes layer by layer, followed by a topological sort using DFS. From this we obtain a unique, hashable and comparable tuple representing the structure of the pipeline, which is also a unique representation of the pipeline's equivalence class under isomorphism.

Return type

bool

classmethod from_json(string_or_file, *, resolver=None, strict_digest=False)[source]
Return type

~P

classmethod from_json_structure(pipeline_description, *, resolver=None, strict_digest=False)[source]
Return type

~P

classmethod from_yaml(string_or_file, *, resolver=None, strict_digest=False)[source]
Return type

~P
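
A hedged sketch of loading a pipeline description from disk (the file name is assumed):

    from d3m.metadata.pipeline import Pipeline, Resolver

    # A strict resolver makes digest mismatches in the loaded description fail loudly.
    resolver = Resolver(strict_digest=True)

    with open('pipeline.json') as pipeline_file:
        pipeline = Pipeline.from_json(pipeline_file, resolver=resolver, strict_digest=True)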

get_all_hyperparams()[source]

Returns pipeline’s hyper-parameters as a list of hyper-parameters for each step, in order of steps.

Returns

A list of hyper-parameters configurations, one for each step, covering all hyper-parameters of that step.

get_available_data_references(for_step=None)[source]

Returns a set of data references provided by existing steps (and pipeline inputs).

Those data references can be used by subsequent steps as their inputs.

Parameters

for_step – Instead of using all existing steps, use only the steps before for_step.

Returns

A set of data references.
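
A short sketch, assuming pipeline is the pipeline built in the construction sketch above:

    # Data references a newly added step could consume at this point.
    available = pipeline.get_available_data_references()
    # For one pipeline input and one step with a 'produce' output, this would be
    # something like {'inputs.0', 'steps.0.produce'}.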

get_digest()[source]
Return type

str

get_exposable_outputs()[source]
Return type

AbstractSet[str]

get_free_hyperparams()[source]

Returns pipeline’s hyper-parameters which have not been fixed by the pipeline as a list of free hyper-parameters for each step, in order of steps.

Returns

A list of hyper-parameters configurations, one for each step, covering the free hyper-parameters of that step.

get_producing_outputs()[source]

Returns a set of recursive data references of all values produced by the pipeline during its run.

This represents outputs of each step of the pipeline, the outputs of the pipeline itself, but also exposable outputs of any sub-pipeline. The latter are prefixed with the step prefix, e.g., steps.1.steps.4.produce is steps.4.produce output of a sub-pipeline step with index 1.

Outputs of sub-pipelines are represented twice, as an output of the step and as an output of the sub-pipeline. This is done because not all outputs of a sub-pipeline are necessarily exposed as outputs of a step: they might not be used in the outer pipeline, but the sub-pipeline still defines them.

A primitive might have additional produce methods which could be called but they are not listed among step’s outputs. Data references related to those produce methods are not returned.

Returns

A set of recursive data references.

has_placeholder()[source]

Returns True if the pipeline has a placeholder step, either in the pipeline itself or in any sub-pipeline.

Returns

True if the pipeline has a placeholder step.

Return type

bool

hash(*, strict_order=False, only_control_hyperparams=False)[source]

Get the hash value of a pipeline. It simply hashes the unique representation of the equivalence class of a pipeline in the sense of isomorphism.

Parameters
  • strict_order (bool) – If true, we will treat inputs of Set hyperparameters as a list, and the order of primitives is determined by their step indices. Otherwise we will try to sort the contents of Set hyperparameters so that their order does not matter, and we will use topological sorting to determine the order of nodes.

  • only_control_hyperparams (bool) – If true, equality checks will not happen for any hyperparameters that are not of the ControlParameter semantic type, i.e. there will be no checks for hyperparameters that are specific to the hyperparameter optimization phase and not part of the logic of the pipeline.

Return type

int
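
A hedged sketch of comparing two pipelines structurally with equals and hash (file names are assumed):

    from d3m.metadata.pipeline import Pipeline, Resolver

    resolver = Resolver()

    with open('pipeline_a.json') as file_a, open('pipeline_b.json') as file_b:
        pipeline_a = Pipeline.from_json(file_a, resolver=resolver)
        pipeline_b = Pipeline.from_json(file_b, resolver=resolver)

    # Isomorphism-based equality, ignoring tuning-specific hyper-parameters.
    if pipeline_a.equals(pipeline_b, only_control_hyperparams=True):
        # Pipelines in the same equivalence class hash to the same value
        # when hash() is called with matching flags.
        assert (pipeline_a.hash(only_control_hyperparams=True)
                == pipeline_b.hash(only_control_hyperparams=True))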

id: str = None[source]
inputs: typing.List[typing.Dict] = None[source]
name: str = None[source]
outputs: typing.List[typing.Dict] = None[source]
replace_step(index, replacement_step)[source]

Replace an existing step (generally a placeholder) with a new step (generally a subpipeline). It makes sure that all inputs are available at that point in the pipeline, and all outputs needed later from this step stay available after replacement.

If the old pipeline (the one before the step replacement) has already been made public under some ID, make sure that the new pipeline (the one with the replaced step) gets a new, different ID before making it public.

Parameters
  • index (int) – Index of the step to replace.

  • replacement_step (StepBase) – A new step.

Return type

None
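
A hedged sketch of filling a placeholder step; the step index, primitive path, and data references are assumptions, and a real replacement would also need all of the primitive's required arguments wired up.

    import uuid

    from d3m import index
    from d3m.metadata.base import ArgumentType
    from d3m.metadata.pipeline import PrimitiveStep

    replacement = PrimitiveStep(primitive=index.get_primitive(
        'd3m.primitives.classification.random_forest.SKlearn'))  # assumes this primitive is installed
    replacement.add_argument(name='inputs', argument_type=ArgumentType.CONTAINER,
                             data_reference='steps.0.produce')
    replacement.add_output('produce')

    # Replace the placeholder assumed to sit at step index 1.
    pipeline.replace_step(1, replacement)

    # Give the modified pipeline a new ID before publishing it.
    pipeline.id = str(uuid.uuid4())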

source: typing.Dict = None[source]
steps: typing.List[StepBase] = None[source]
to_json(file=None, *, nest_subpipelines=False, canonical=False, **kwargs)[source]
Return type

Optional[str]

to_json_structure(*, nest_subpipelines=False, canonical=False)[source]
Return type

Dict

to_yaml(file=None, *, nest_subpipelines=False, canonical=False, **kwargs)[source]
Return type

Optional[str]
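
A hedged sketch of serializing a pipeline (the output file name is assumed):

    # Without a file argument, to_json returns the JSON document as a string.
    json_string = pipeline.to_json(nest_subpipelines=True)

    # With a file argument, the document is written out instead.
    with open('pipeline.yml', 'w') as yaml_file:
        pipeline.to_yaml(yaml_file, nest_subpipelines=True)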

users: typing.List[typing.Dict] = None[source]
class d3m.metadata.pipeline.Resolver(*, strict_resolving=False, strict_digest=False, pipeline_search_paths=None, respect_environment_variable=True, load_all_primitives=True, primitives_blocklist=None)[source]

Bases: object

A resolver to resolve primitives and pipelines.

It resolves primitives from available primitives on the system, and resolves pipelines from files in pipeline search paths.

strict_resolving[source]

If resolved pipeline or primitive does not fully match specified primitive reference, raise an exception?

strict_digest[source]

When loading pipelines or primitives, if computed digest does not match the one provided in metadata, raise an exception?

pipeline_search_paths[source]

A list of paths to directories with pipelines to resolve from. Their files should be named <pipeline id>.json, <pipeline id>.yml, or <pipeline id>.yaml.

Parameters
  • strict_resolving (bool) – If resolved pipeline or primitive does not fully match specified primitive reference, raise an exception?

  • strict_digest (bool) – When loading pipelines or primitives, if computed digest does not match the one provided in metadata, raise an exception?

  • pipeline_search_paths (Optional[Sequence[str]]) – A list of paths to directories with pipelines to resolve from. Their files should be named <pipeline id>.json, <pipeline id>.yml, or <pipeline id>.yaml.

  • respect_environment_variable (bool) – Also use (colon-separated) pipeline search paths from the PIPELINES_PATH environment variable?

  • load_all_primitives (bool) – Load all primitives before attempting to resolve them. If False, any primitive used in a pipeline has to be loaded before calling the resolver.

  • primitives_blocklist (Optional[Collection[str]]) – A collection of primitive path prefixes to not (try to) load.
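
A hedged configuration sketch; the search path is an assumed local directory of <pipeline id>.json, .yml, or .yaml files.

    from d3m.metadata.pipeline import Resolver

    resolver = Resolver(
        strict_resolving=True,
        strict_digest=True,
        pipeline_search_paths=['/path/to/pipelines'],
        load_all_primitives=False,  # if False, primitives used must already be loaded before resolving
    )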

get_pipeline(pipeline_description)[source]
Return type

Optional[Pipeline]

classmethod get_pipeline_class()[source]
Return type

Type[Pipeline]

get_primitive(primitive_description)[source]
Return type

Optional[Type[PrimitiveBase]]

pipeline_search_paths: typing.List[str] = None[source]
strict_digest: bool = None[source]
strict_resolving: bool = None[source]
class d3m.metadata.pipeline.NoResolver(*, strict_resolving=False, strict_digest=False, pipeline_search_paths=None, respect_environment_variable=True, load_all_primitives=True, primitives_blocklist=None)[source]

Bases: d3m.metadata.pipeline.Resolver

A resolver which never resolves anything.

pipeline_search_paths = None[source]
strict_digest = None[source]
strict_resolving = None[source]
class d3m.metadata.pipeline.PrimitiveStep(primitive_description=None, *, primitive=None, resolver=None)[source]

Bases: d3m.metadata.pipeline.StepBase

Class representing a primitive step in a pipeline's execution.

primitive_description[source]

A description of the primitive specified for this step. Available if primitive could not be resolved.

primitive[source]

A primitive class associated with this step.

outputs[source]

A list of method names providing outputs for this step.

hyperparams[source]

A map of fixed hyper-parameters to their values, which are set as part of the pipeline and should not be tuned during hyper-parameter tuning.

arguments[source]

A map between an argument name and its description. The description contains a data reference to an output of a prior step (or a pipeline input).

users[source]

Users associated with the primitive.

Parameters
  • primitive_description (Optional[Dict]) – A description of the primitive specified for this step. Allowed only if primitive is not provided.

  • primitive (Optional[Type[PrimitiveBase]]) – A primitive class associated with this step. If not provided, resolved using resolver from primitive_description.

add_argument(name, argument_type, data_reference)[source]

Associate a data reference with an argument of this step (and the underlying primitive).

Parameters
  • name (str) – Argument name.

  • argument_type (Any) – Argument type.

  • data_reference (Union[str, Sequence[str]]) – Data reference or a list of data references associated with this argument.

Return type

None

add_hyperparameter(name, argument_type, data)[source]

Associate a value with a hyper-parameter of this step (and the underlying primitive).

Parameters
  • name (str) – Hyper-parameter name.

  • argument_type (Any) – Argument type.

  • data (Any) – Data reference associated with this hyper-parameter, a list of data references, or the value itself.

Return type

None
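
A hedged sketch of fixing hyper-parameters on a step; the primitive path and hyper-parameter name are assumptions and depend on which primitives are installed.

    from d3m import index
    from d3m.metadata.base import ArgumentType
    from d3m.metadata.pipeline import PrimitiveStep

    step = PrimitiveStep(primitive=index.get_primitive(
        'd3m.primitives.classification.random_forest.SKlearn'))

    # A literal value, fixed as part of the pipeline and excluded from tuning.
    step.add_hyperparameter(name='n_estimators', argument_type=ArgumentType.VALUE, data=100)
    # Instead of a literal, a hyper-parameter can also be wired to another step's
    # output (for example with ArgumentType.DATA and a data reference).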

add_output(output_id)[source]

Define an output from this step.

The underlying primitive can have multiple produce methods, but not all of them have to be defined as outputs of the step.

Parameters

output_id (str) – A name of the method producing this output.

Return type

None

add_user(user_description)[source]

Add a description of a user to the list of users associated with the primitive.

Parameters

user_description (Dict) – User description.

Return type

None

arguments: typing.Dict[str, typing.Dict] = None[source]
check_add(existing_steps, available_data_references)[source]

Checks if a step can be added given existing steps and available data references to provide to the step. It also checks if the state of a step is suitable to be added at this point.

Raises an exception if check fails.

Parameters
  • existing_steps (Sequence[StepBase]) – Steps already in the pipeline.

  • available_data_references (AbstractSet[str]) – A set of available data references.

Return type

None

classmethod from_json_structure(step_description, *, resolver=None)[source]
Return type

~SP

get_all_hyperparams()[source]

Returns step’s hyper-parameters.

Returns

Hyper-parameters configuration for all hyper-parameters, or a list of those.

get_free_hyperparams()[source]

Returns step’s hyper-parameters which have not been fixed by the pipeline.

Returns

Hyper-parameters configuration for free hyper-parameters, or a list of those.

get_input_data_references()[source]
Return type

AbstractSet[str]

get_output_data_references()[source]
Return type

AbstractSet[str]

get_primitive_hyperparams()[source]
Return type

Hyperparams

get_primitive_id()[source]
Return type

str

classmethod get_step_type()[source]
Return type

PipelineStepType

hyperparams: typing.Dict[str, typing.Dict] = None[source]
outputs: typing.List[str] = None[source]
primitive: typing.Type[base.PrimitiveBase] = None[source]
primitive_description: typing.Dict = None[source]
to_json_structure()[source]
Return type

Dict

users: typing.List[typing.Dict] = None[source]
class d3m.metadata.pipeline.SubpipelineStep(pipeline_description=None, *, pipeline=None, resolver=None)[source]

Bases: d3m.metadata.pipeline.StepBase

Class representing a sub-pipeline step in a pipeline's execution.

index[source]

An index of the step among steps in the pipeline.

resolver[source]

Resolver to use.

Parameters

resolver (Optional[Resolver]) – Resolver to use.

add_input(data_reference)[source]
Return type

None

add_output(output_id)[source]

Define an output from this step.

The underlying pipeline can have multiple outputs, but not all of them have to be defined as outputs of the step. They can be skipped using None.

Parameters

output_id (Optional[str]) – ID to be used in the data reference, mapping the pipeline's outputs in order. If None, this pipeline output is ignored and not mapped to a data reference.

Return type

None
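
A hedged sketch of embedding an existing pipeline as one step of an outer pipeline; inner_pipeline and outer_pipeline are assumed, already-constructed Pipeline objects.

    from d3m.metadata.pipeline import SubpipelineStep

    sub_step = SubpipelineStep(pipeline=inner_pipeline)
    sub_step.add_input('steps.0.produce')  # maps to the inner pipeline's inputs, in order
    sub_step.add_output('predictions')     # exposed as a steps.<index>.predictions data reference
    outer_pipeline.add_step(sub_step)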

check_add(existing_steps, available_data_references)[source]

Checks if a step can be added given existing steps and available data references to provide to the step. It also checks if the state of a step is suitable to be added at this point.

Raises an exception if check fails.

Parameters
  • existing_steps (Sequence[StepBase]) – Steps already in the pipeline.

  • available_data_references (AbstractSet[str]) – A set of available data references.

Return type

None

classmethod from_json_structure(step_description, *, resolver=None)[source]
Return type

~SS

get_all_hyperparams()[source]

Returns step’s hyper-parameters.

Returns

Hyper-parameters configuration for all hyper-parameters, or a list of those.

get_free_hyperparams()[source]

Returns step’s hyper-parameters which have not been fixed by the pipeline.

Returns

Hyper-parameters configuration for free hyper-parameters, or a list of those.

get_input_data_references()[source]
Return type

AbstractSet[str]

get_output_data_references()[source]
Return type

AbstractSet[str]

get_pipeline_id()[source]
Return type

str

classmethod get_step_type()[source]
Return type

PipelineStepType

index = None[source]
resolver = None[source]
to_json_structure(*, nest_subpipelines=False)[source]
Return type

Dict

class d3m.metadata.pipeline.PlaceholderStep(resolver=None)[source]

Bases: d3m.metadata.pipeline.StepBase

Class representing a placeholder step in a pipeline's execution.

index[source]

An index of the step among steps in the pipeline.

resolver[source]

Resolver to use.

Parameters

resolver (Optional[Resolver]) – Resolver to use.

add_input(data_reference)[source]
Return type

None

add_output(output_id)[source]
Return type

None
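
A hedged sketch of reserving a spot in a pipeline to be filled in later (for example via replace_step); pipeline is an assumed, already-constructed Pipeline.

    from d3m.metadata.pipeline import PlaceholderStep

    placeholder = PlaceholderStep()
    placeholder.add_input('steps.0.produce')  # data the eventual replacement will consume
    placeholder.add_output('produce')         # output id later steps can reference
    pipeline.add_step(placeholder)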

check_add(existing_steps, available_data_references)[source]

Checks if a step can be added given existing steps and available data references to provide to the step. It also checks if the state of a step is suitable to be added at this point.

Raises an exception if check fails.

Parameters
  • existing_steps (Sequence[StepBase]) – Steps already in the pipeline.

  • available_data_references (AbstractSet[str]) – A set of available data references.

Return type

None

classmethod from_json_structure(step_description, *, resolver=None)[source]
Return type

~SL

get_all_hyperparams()[source]

Returns step’s hyper-parameters.

Returns

Hyper-parameters configuration for all hyper-parameters, or a list of those.

get_free_hyperparams()[source]

Returns step’s hyper-parameters which have not been fixed by the pipeline.

Returns

Hyper-parameters configuration for free hyper-parameters, or a list of those.

get_input_data_references()[source]
Return type

AbstractSet[str]

get_output_data_references()[source]
Return type

AbstractSet[str]

classmethod get_step_type()[source]
Return type

PipelineStepType

index = None[source]
resolver = None[source]
to_json_structure()[source]
Return type

Dict