d3m.metadata.pipeline
-
class d3m.metadata.pipeline.NoResolver(*, strict_resolving=False, strict_digest=False, pipeline_search_paths=None, respect_environment_variable=True, load_all_primitives=True, primitives_blocklist=None)
Bases: d3m.metadata.pipeline.Resolver
A resolver which never resolves anything.
-
pipeline_search_paths: Sequence[str]
A list of paths to directories with pipelines to resolve from. Their files should be named <pipeline id>.json, <pipeline id>.yml, or <pipeline id>.yaml.
-
class d3m.metadata.pipeline.Pipeline(pipeline_id=None, *, context=None, created=None, source=None, name=None, description=None)
Bases: object
Class representing a pipeline.
- Parameters
pipeline_id (Optional[str]) – Optional ID for the pipeline. If not provided, it is automatically generated.
created (datetime.datetime) – Optional timestamp of pipeline creation in UTC timezone. If not provided, the current time will be used.
source (Optional[Dict]) – Description of source. Optional.
name (Optional[str]) – Name of the pipeline. Optional.
description (Optional[str]) – Description of the pipeline. Optional.
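The defaults for the ID and the creation timestamp can be pictured with a small stand-in class. This is a sketch under stated assumptions, not d3m code: `PipelineHeader` is a hypothetical name, and generating the ID as a random UUID (version 4) is an assumption, since the text only says the ID is "automatically generated". The UTC default for `created` follows the parameter description.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class PipelineHeader:
    """Hypothetical stand-in for a pipeline's identifying fields (not a d3m class)."""

    # Assumed default: a random UUID (version 4) when no ID is given.
    pipeline_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    # Default creation time: the current time, timezone-aware and in UTC.
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    source: Optional[Dict] = None
    name: Optional[str] = None
    description: Optional[str] = None


header = PipelineHeader(name="example")
print(header.pipeline_id, header.created.isoformat())
```

Using `default_factory` ensures that each instance gets a fresh ID and timestamp. A plain default would be evaluated only once, when the class is defined.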
-
add_step(step)
Add a step to the sequence of steps in the pipeline.
- Parameters
step (StepBase) – A step to add.
- Return type
None
-
add_user(user_description)
Add a user description to the list of users associated with the pipeline.
- Parameters
user_description (Dict) – User description.
- Return type
None
-
check(*, allow_placeholders=False, standard_pipeline=True, input_types=None)
Check if the pipeline is a valid pipeline.
It supports checking against non-resolved primitives and pipelines, but in that case the checks are very limited. Use a strict resolver to ensure full checking of this pipeline and any sub-pipelines.
Raises an exception if the check fails.
- Return type
None
-
equals(pipeline, *, strict_order=False, only_control_hyperparams=False)
Check if the two pipelines are equal in the sense of isomorphism.
- Parameters
pipeline (~P) – A pipeline instance.
strict_order (bool) – If true, we treat inputs of Set hyper-parameters as a list, and the order of primitives is determined by their step indices. Otherwise, we sort the contents of Set hyper-parameters so that their order does not matter, and we use topological sorting to determine the order of nodes.
only_control_hyperparams (bool) – If true, equality checks are performed only for hyper-parameters of the ControlParameter semantic type, i.e. there are no checks for hyper-parameters that are specific to the hyper-parameter optimization phase and not part of the logic of the pipeline.
Notes
This algorithm checks if the two pipelines are equal in the sense of isomorphism by solving a graph isomorphism problem. The general graph isomorphism problem is not known to be solvable in polynomial time, nor is it known to be NP-complete. However, pipelines are not arbitrary graphs: they are DAGs whose steps carry labels (primitives, argument names, hyper-parameters), which makes a polynomial-time check possible.
The complexity of this algorithm is around \(O((V + E)\log V)\), where \(V\) is the number of steps in the pipeline and \(E\) is the number of output references. It greedily assigns unique orders to all nodes layer by layer and then performs a topological sort using DFS. This yields a unique, hashable, and comparable tuple representing the structure of the pipeline, which is also a unique representation of the pipeline's equivalence class in the sense of isomorphism.
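The idea of reducing a pipeline to a hashable, order-independent tuple can be illustrated with a simpler, swapped-in technique: Merkle-style structural hashing. Each step is described by its primitive, its hyper-parameters, and the descriptions of its inputs, rather than by its index. This is not the d3m algorithm, and the step model (`primitive`, `hyperparams`, `arguments` keys) is hypothetical. It also has known limits compared with a true isomorphism check. It cannot tell one shared step apart from two identical copies of it, and it ignores steps that do not feed the listed outputs. List values stand in for Set hyper-parameters.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical, simplified step model (not d3m's data structures):
# {"primitive": str, "hyperparams": {name: value}, "arguments": {name: data reference}}
Step = Dict[str, Any]


def canonical_form(steps: List[Step], outputs: List[str], strict_order: bool = False) -> Tuple:
    """Return a hashable structural signature of the given pipeline outputs."""
    memo: Dict[int, Tuple] = {}

    def normalize(value: Any) -> Any:
        # Lists stand in for Set hyper-parameters; unless strict_order is set,
        # their contents are sorted so that their order does not matter.
        if isinstance(value, list):
            items = [normalize(item) for item in value]
            return tuple(items) if strict_order else tuple(sorted(items, key=repr))
        return value

    def reference_signature(reference: str) -> Tuple:
        parts = reference.split(".")
        if parts[0] == "inputs":
            return ("input", int(parts[1]))
        # "steps.<index>.<method>": describe the producing step, not its index.
        return (step_signature(int(parts[1])), parts[2])

    def step_signature(index: int) -> Tuple:
        if index not in memo:
            step = steps[index]
            hyperparams = tuple(sorted((name, normalize(value)) for name, value in step["hyperparams"].items()))
            arguments = tuple(sorted((name, reference_signature(ref)) for name, ref in step["arguments"].items()))
            node = (step["primitive"], hyperparams, arguments)
            # With strict_order, the step index becomes part of the step's identity.
            memo[index] = ((index,) + node) if strict_order else node
        return memo[index]

    return tuple(reference_signature(ref) for ref in outputs)


pipeline_a = [
    {"primitive": "impute", "hyperparams": {}, "arguments": {"inputs": "inputs.0"}},
    {"primitive": "select", "hyperparams": {"columns": [2, 1]}, "arguments": {"inputs": "inputs.0"}},
    {"primitive": "join", "hyperparams": {}, "arguments": {"left": "steps.0.produce", "right": "steps.1.produce"}},
]
# Same structure: the two independent steps are swapped and the set is listed differently.
pipeline_b = [
    {"primitive": "select", "hyperparams": {"columns": [1, 2]}, "arguments": {"inputs": "inputs.0"}},
    {"primitive": "impute", "hyperparams": {}, "arguments": {"inputs": "inputs.0"}},
    {"primitive": "join", "hyperparams": {}, "arguments": {"left": "steps.1.produce", "right": "steps.0.produce"}},
]
print(canonical_form(pipeline_a, ["steps.2.produce"]) == canonical_form(pipeline_b, ["steps.2.produce"]))  # True
```

Because the result is a plain tuple, passing it to Python's built-in `hash` gives a hash value that agrees with this notion of equality.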
-
classmethod from_json(string_or_file, *, resolver=None, strict_digest=False)
- Return type
~P
-
classmethod from_json_structure(pipeline_description, *, resolver=None, strict_digest=False)
- Return type
~P
-
classmethod from_yaml(string_or_file, *, resolver=None, strict_digest=False)
- Return type
~P
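The `string_or_file` parameter of `from_json` suggests that the method accepts either a JSON string or a file. A minimal sketch of that dispatch uses the standard `json` module. Here `load_pipeline_description` is a hypothetical helper, and anything that is not a `str` is assumed to be a readable text file object. The tiny description is illustrative only and is not the full pipeline schema. The parsed dictionary is, presumably, the kind of structure `from_json_structure` takes.

```python
import io
import json
from typing import Any, Dict, TextIO, Union


def load_pipeline_description(string_or_file: Union[str, TextIO]) -> Dict[str, Any]:
    """Hypothetical helper: parse a JSON pipeline description from a string or a file."""
    if isinstance(string_or_file, str):
        return json.loads(string_or_file)
    # Otherwise assume a readable text file object.
    return json.load(string_or_file)


document = '{"id": "example", "inputs": [{"name": "inputs"}], "outputs": [{"data": "steps.0.produce"}], "steps": []}'
from_string = load_pipeline_description(document)
from_file = load_pipeline_description(io.StringIO(document))
print(from_string == from_file)  # True
```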
-
get_all_hyperparams()
Returns the pipeline’s hyper-parameters as a list with the hyper-parameters of each step, in step order.
- Returns
A list containing, for each step, the hyper-parameters configuration of all its hyper-parameters.
-
get_available_data_references(for_step=None)
Returns a set of data references provided by existing steps (and pipeline inputs).
Those data references can be used by subsequent steps as their inputs.
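Data references name values by where they come from. A minimal sketch, assuming pipeline inputs are referenced as `inputs.<index>` and step outputs as `steps.<step index>.<output id>`, the latter matching references such as `steps.4.produce` used elsewhere on this page. It also assumes that `for_step` limits the result to references produced before that step index. The helper `available_data_references` and its arguments are hypothetical.

```python
from typing import List, Optional, Set


def available_data_references(num_inputs: int, step_outputs: List[List[str]], for_step: Optional[int] = None) -> Set[str]:
    """Hypothetical sketch of the reference naming scheme, not d3m's implementation."""
    # Pipeline inputs are referenced as "inputs.<index>".
    references = {f"inputs.{index}" for index in range(num_inputs)}
    # Only steps before `for_step` (or all steps, if it is not given) contribute.
    limit = len(step_outputs) if for_step is None else for_step
    for index, outputs in enumerate(step_outputs[:limit]):
        # Each step output is referenced as "steps.<step index>.<output id>".
        references.update(f"steps.{index}.{output_id}" for output_id in outputs)
    return references


print(sorted(available_data_references(1, [["produce"], ["produce"]], for_step=1)))
# ['inputs.0', 'steps.0.produce']
```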
-
get_exposable_outputs()
Returns a set of recursive data references of all values exposable by the pipeline during its run.
This represents exposable outputs of each step of the pipeline and exposable outputs of any sub-pipeline. The latter are prefixed with the step prefix, e.g., steps.1.steps.4.produce is the steps.4.produce output of a sub-pipeline step with index 1.
Outputs of sub-pipelines are represented twice: as an output of the step and as an output of the sub-pipeline. This is because a sub-pipeline can define outputs that are not exposed as outputs of the step, for example when the outer pipeline does not use them.
A primitive might have additional produce methods which could be called but are not listed among the step’s outputs. Data references related to those produce methods are returned as well. If you do not want those, use get_producing_outputs instead.
- Returns
A set of recursive data references.
-
get_free_hyperparams()
Returns the pipeline’s hyper-parameters which have not been fixed by the pipeline, as a list with the free hyper-parameters of each step, in step order.
- Returns
A list containing, for each step, the hyper-parameters configuration of its free hyper-parameters.
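The relation between all and free hyper-parameters can be sketched per step: a hyper-parameter is free when the pipeline has not fixed its value. In this minimal, hypothetical model, each step's hyper-parameters are a plain dict from name to a textual description of its configuration, and the fixed values are a dict from name to value. The function `free_hyperparams` is not part of d3m.

```python
from typing import Any, Dict, List


def free_hyperparams(all_hyperparams: List[Dict[str, Any]], fixed: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Per step, keep only the hyper-parameters the pipeline has not fixed (hypothetical model)."""
    return [
        {name: config for name, config in step_all.items() if name not in step_fixed}
        for step_all, step_fixed in zip(all_hyperparams, fixed)
    ]


all_hps = [{"alpha": "float in [0, 1]", "max_iter": "int >= 1"}, {"k": "int >= 1"}]
fixed_hps = [{"alpha": 0.5}, {}]
print(free_hyperparams(all_hps, fixed_hps))
# [{'max_iter': 'int >= 1'}, {'k': 'int >= 1'}]
```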
-
get_producing_outputs()
Returns a set of recursive data references of all values produced by the pipeline during its run.
This represents outputs of each step of the pipeline, the outputs of the pipeline itself, and also producing outputs of any sub-pipeline. The latter are prefixed with the step prefix, e.g., steps.1.steps.4.produce is the steps.4.produce output of a sub-pipeline step with index 1.
Outputs of sub-pipelines are represented twice: as an output of the step and as an output of the sub-pipeline. This is because a sub-pipeline can define outputs that are not exposed as outputs of the step, for example when the outer pipeline does not use them.
A primitive might have additional produce methods which could be called but are not listed among the step’s outputs. Data references related to those produce methods are not returned. If you need those as well, use get_exposable_outputs instead.
- Returns
A set of recursive data references.
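The recursive prefixing, and the difference between producing and exposable outputs, can be sketched on a hypothetical nested-dict model. Each step lists its `outputs`. A primitive step may also list every produce method it has under `methods`. A sub-pipeline step carries its inner `pipeline`. The step output ID `result` is made up for illustration, and pipeline-level outputs are left out for brevity. This is not d3m's implementation.

```python
from typing import Any, Dict, Set


def recursive_references(pipeline: Dict[str, Any], include_unlisted_methods: bool = False, prefix: str = "") -> Set[str]:
    """Collect step output references, prefixing those of sub-pipelines (hypothetical model)."""
    references: Set[str] = set()
    for index, step in enumerate(pipeline["steps"]):
        step_prefix = f"{prefix}steps.{index}."
        names = set(step["outputs"])
        if include_unlisted_methods:
            # Produce methods the primitive has but the step does not list as outputs.
            names |= set(step.get("methods", []))
        references.update(step_prefix + name for name in names)
        if "pipeline" in step:
            # Sub-pipeline outputs appear again, nested under the step prefix.
            references |= recursive_references(step["pipeline"], include_unlisted_methods, step_prefix)
    return references


inner = {"steps": [{"outputs": ["produce"]} for _ in range(5)]}
outer = {
    "steps": [
        {"outputs": ["produce"], "methods": ["produce", "produce_feature_importances"]},
        {"outputs": ["result"], "pipeline": inner},
    ]
}
producing = recursive_references(outer)
exposable = recursive_references(outer, include_unlisted_methods=True)
print("steps.1.steps.4.produce" in producing)  # True
print(sorted(exposable - producing))  # ['steps.0.produce_feature_importances']
```

Passing `include_unlisted_methods=True` mirrors the exposable variant, which also returns references for produce methods that the step does not list.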
-
has_placeholder()
Returns True if the pipeline has a placeholder step, either in the pipeline itself or in any sub-pipeline.
- Returns
True if the pipeline has a placeholder step.
-
hash(*, strict_order=False, only_control_hyperparams=False)
Get the hash value of a pipeline. It hashes the unique representation of the pipeline’s equivalence class in the sense of isomorphism.
- strict_order:
If true, we treat inputs of Set hyper-parameters as a list, and the order of primitives is determined by their step indices. Otherwise, we sort the contents of Set hyper-parameters so that their order does not matter, and we use topological sorting to determine the order of nodes.
- only_control_hyperparams:
If true, only hyper-parameters of the ControlParameter semantic type are taken into account, i.e. hyper-parameters that are specific to the hyper-parameter optimization phase, and not part of the logic of the pipeline, are ignored.
-
replace_step(index, replacement_step)
Replace an existing step (generally a placeholder) with a new step (generally a sub-pipeline). It makes sure that all inputs are available at that point in the pipeline, and that all outputs needed later from this step stay available after replacement.
If the old pipeline (before the step was replaced) has already been made public under some ID, make sure that the new pipeline (with the replaced step) gets a different ID before making it public.
- Parameters
index (int) – Index of the step to replace.
replacement_step (StepBase) – A new step.
- Return type
None
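The two guarantees of replace_step, that inputs are available and that needed outputs stay available, can be expressed as set checks. This is a minimal sketch on a hypothetical model in which each step lists the data references it consumes (`arguments`) and the output IDs it provides (`outputs`), with references named `inputs.<index>` and `steps.<index>.<output id>`. Pipeline-level outputs that reference the replaced step are not considered here. `check_replacement` is not a d3m function.

```python
from typing import Dict, List, Set


def check_replacement(steps: List[Dict], num_inputs: int, index: int, replacement: Dict) -> None:
    """Raise ValueError if `replacement` cannot take the place of steps[index] (hypothetical model)."""
    # References available to the replaced step: pipeline inputs and outputs of earlier steps.
    available: Set[str] = {f"inputs.{i}" for i in range(num_inputs)}
    for earlier, step in enumerate(steps[:index]):
        available.update(f"steps.{earlier}.{output}" for output in step["outputs"])
    missing_inputs = set(replacement["arguments"]) - available
    if missing_inputs:
        raise ValueError(f"inputs not available: {sorted(missing_inputs)}")
    # Outputs of the replaced step that later steps still consume.
    prefix = f"steps.{index}."
    needed = {ref[len(prefix):] for step in steps[index + 1:] for ref in step["arguments"] if ref.startswith(prefix)}
    missing_outputs = needed - set(replacement["outputs"])
    if missing_outputs:
        raise ValueError(f"outputs no longer provided: {sorted(missing_outputs)}")


steps = [
    {"arguments": ["inputs.0"], "outputs": ["produce"]},
    {"arguments": ["steps.0.produce"], "outputs": ["produce"]},  # e.g. a placeholder
    {"arguments": ["steps.1.produce"], "outputs": ["produce"]},
]
check_replacement(steps, num_inputs=1, index=1, replacement={"arguments": ["steps.0.produce"], "outputs": ["produce"]})
try:
    check_replacement(steps, num_inputs=1, index=1, replacement={"arguments": ["inputs.0"], "outputs": ["other"]})
except ValueError as error:
    print(error)  # outputs no longer provided: ['produce']
```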
-
created: datetime.datetime
Timestamp of pipeline creation in UTC timezone.
-
inputs: List[Dict]
A sequence of input descriptions which provide names for pipeline inputs.
-
class d3m.metadata.pipeline.PlaceholderStep(resolver=None)
Bases: d3m.metadata.pipeline.StepBase
Class representing a placeholder step in a pipeline’s execution.
- Parameters
resolver (d3m.metadata.pipeline.Resolver) – Resolver to use.
-
check_add(existing_steps, available_data_references)
Checks whether a step can be added, given the existing steps and the data references available to provide to it. It also checks whether the step’s state is suitable for adding it at this point.
Raises an exception if the check fails.
- Parameters
existing_steps (Sequence[StepBase]) – Steps already in the pipeline.
available_data_references (AbstractSet[str]) – A set of available data references.
- Return type
None
-
get_all_hyperparams()
Returns the step’s hyper-parameters.
- Returns
Hyper-parameters configuration for all hyper-parameters, or a list of those.
-
get_free_hyperparams()
Returns the step’s hyper-parameters which have not been fixed by the pipeline.
- Returns
Hyper-parameters configuration for free hyper-parameters, or a list of those.
-
resolver: d3m.metadata.pipeline.Resolver
Resolver to use.
-
class d3m.metadata.pipeline.PrimitiveStep(primitive_description=None, *, primitive=None, resolver=None)
Bases: d3m.metadata.pipeline.StepBase
Class representing a primitive execution step in a pipeline’s execution.
- Parameters
primitive_description (Optional[Dict]) – A description of the primitive specified for this step. Allowed only if primitive is not provided.
primitive (Optional[Type[d3m.primitive_interfaces.base.PrimitiveBase]]) – A primitive class associated with this step. If not provided, it is resolved using resolver from primitive_description.
-
add_argument(name, argument_type, data=<d3m.metadata.pipeline.NOT_SET_TYPE object>, data_reference=None)
Associate a data reference with an argument of this step (and the underlying primitive).
-
add_hyperparameter(name, argument_type, data)
Associate a value with a hyper-parameter of this step (and the underlying primitive).
-
add_output(output_id)
Define an output from this step.
The underlying primitive can have multiple produce methods, but not all of them have to be defined as outputs of the step.
- Parameters
output_id (str) – A name of the method producing this output.
- Return type
None
-
add_user(user_description)
Add a user description to the list of users associated with the primitive.
- Parameters
user_description (Dict) – User description.
- Return type
None
-
check_add(existing_steps, available_data_references)
Checks whether a step can be added, given the existing steps and the data references available to provide to it. It also checks whether the step’s state is suitable for adding it at this point.
Raises an exception if the check fails.
- Parameters
existing_steps (Sequence[StepBase]) – Steps already in the pipeline.
available_data_references (AbstractSet[str]) – A set of available data references.
- Return type
None
-
get_all_hyperparams()
Returns the step’s hyper-parameters.
- Returns
Hyper-parameters configuration for all hyper-parameters, or a list of those.
-
get_free_hyperparams()
Returns the step’s hyper-parameters which have not been fixed by the pipeline.
- Returns
Hyper-parameters configuration for free hyper-parameters, or a list of those.
-
arguments: Dict[str, Dict]
A map between argument name and its description. The description contains a data reference of an output of a prior step (or a pipeline input).
-
hyperparams: Dict[str, Dict]
A map of fixed hyper-parameters to their values, which are set as part of a pipeline and should not be tuned during hyper-parameter tuning.
-
primitive: Optional[Type[d3m.primitive_interfaces.base.PrimitiveBase]]
A primitive class associated with this step.
-
class d3m.metadata.pipeline.Resolver(*, strict_resolving=False, strict_digest=False, pipeline_search_paths=None, respect_environment_variable=True, load_all_primitives=True, primitives_blocklist=None)
Bases: object
A resolver to resolve primitives and pipelines.
It resolves primitives from available primitives on the system, and resolves pipelines from files in pipeline search paths.
- Parameters
strict_resolving (bool) – If a resolved pipeline or primitive does not fully match the specified primitive reference, raise an exception?
strict_digest (bool) – When loading pipelines or primitives, if the computed digest does not match the one provided in metadata, raise an exception?
pipeline_search_paths (Sequence[str]) – A list of paths to directories with pipelines to resolve from. Their files should be named <pipeline id>.json, <pipeline id>.yml, or <pipeline id>.yaml.
respect_environment_variable (bool) – Also use (colon-separated) pipeline search paths from the PIPELINES_PATH environment variable?
load_all_primitives (bool) – Load all primitives before attempting to resolve them. If False, any primitive used in a pipeline has to be loaded before calling the resolver.
primitives_blocklist (Optional[Collection[str]]) – A collection of primitive path prefixes to not (try to) load.
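Combining the explicit search paths with the colon-separated PIPELINES_PATH variable can be sketched as follows. The ordering, with explicit paths first and then environment entries, is an assumption, as is skipping empty entries. `effective_search_paths` is a hypothetical helper, and its `environ` parameter exists only so that the example can be tested without touching the real environment.

```python
import os
from typing import List, Mapping, Optional, Sequence


def effective_search_paths(
    pipeline_search_paths: Optional[Sequence[str]] = None,
    respect_environment_variable: bool = True,
    environ: Mapping[str, str] = os.environ,
) -> List[str]:
    """Hypothetical: explicit paths first (assumed order), then entries of PIPELINES_PATH."""
    paths = list(pipeline_search_paths or [])
    if respect_environment_variable:
        # PIPELINES_PATH is colon-separated; empty entries (e.g. "a::b") are skipped.
        paths.extend(path for path in environ.get("PIPELINES_PATH", "").split(":") if path)
    return paths


print(effective_search_paths(["/a"], environ={"PIPELINES_PATH": "/b::/c"}))  # ['/a', '/b', '/c']
```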
-
pipeline_search_paths: Sequence[str]
A list of paths to directories with pipelines to resolve from. Their files should be named <pipeline id>.json, <pipeline id>.yml, or <pipeline id>.yaml.
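Given the file naming convention for pipeline search paths, looking up a pipeline by ID amounts to probing each directory for the three accepted file names. This is a minimal sketch. `find_pipeline_file` is hypothetical, and the precedence (directories in order, then .json before .yml before .yaml) is an assumption, not documented resolver behavior.

```python
import os
import tempfile
from typing import Optional, Sequence

# Accepted extensions for "<pipeline id>" files; their precedence here is assumed.
EXTENSIONS = (".json", ".yml", ".yaml")


def find_pipeline_file(pipeline_id: str, search_paths: Sequence[str]) -> Optional[str]:
    """Hypothetical lookup of <pipeline id>.json/.yml/.yaml in the given directories."""
    for directory in search_paths:
        for extension in EXTENSIONS:
            candidate = os.path.join(directory, pipeline_id + extension)
            if os.path.isfile(candidate):
                return candidate
    return None


with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    # Create an empty "abc.yaml" only in the second directory.
    open(os.path.join(second, "abc.yaml"), "w").close()
    found = find_pipeline_file("abc", [first, second])
    print(found == os.path.join(second, "abc.yaml"))  # True
```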
-
class d3m.metadata.pipeline.SubpipelineStep(pipeline_description=None, *, pipeline=None, resolver=None)
Bases: d3m.metadata.pipeline.StepBase
Class representing a sub-pipeline step in a pipeline’s execution.
- Parameters
resolver (d3m.metadata.pipeline.Resolver) – Resolver to use.
-
add_output(output_id)
Define an output from this step.
The underlying pipeline can have multiple outputs, but not all of them have to be defined as outputs of the step. Outputs can be skipped using None.
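Skipping sub-pipeline outputs with None suggests that step outputs are matched to sub-pipeline outputs by position. Under that assumption, a minimal sketch of the resulting mapping follows. `map_subpipeline_outputs` and the output ID `predictions` are hypothetical, and step outputs are assumed to be referenced as `steps.<step index>.<output id>`.

```python
from typing import Dict, List, Optional


def map_subpipeline_outputs(step_index: int, output_ids: List[Optional[str]], subpipeline_outputs: List[str]) -> Dict[str, str]:
    """Hypothetical: pair step output IDs with sub-pipeline outputs by position, skipping None."""
    mapping = {}
    for output_id, inner_reference in zip(output_ids, subpipeline_outputs):
        if output_id is None:
            continue  # This sub-pipeline output is not exposed by the step.
        mapping[f"steps.{step_index}.{output_id}"] = inner_reference
    return mapping


print(map_subpipeline_outputs(1, [None, "predictions"], ["steps.2.produce", "steps.4.produce"]))
# {'steps.1.predictions': 'steps.4.produce'}
```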
-
check_add(existing_steps, available_data_references)
Checks whether a step can be added, given the existing steps and the data references available to provide to it. It also checks whether the step’s state is suitable for adding it at this point.
Raises an exception if the check fails.
- Parameters
existing_steps (Sequence[StepBase]) – Steps already in the pipeline.
available_data_references (AbstractSet[str]) – A set of available data references.
- Return type
None
-
get_all_hyperparams()
Returns the step’s hyper-parameters.
- Returns
Hyper-parameters configuration for all hyper-parameters, or a list of those.
-
get_free_hyperparams()
Returns the step’s hyper-parameters which have not been fixed by the pipeline.
- Returns
Hyper-parameters configuration for free hyper-parameters, or a list of those.
-
resolver: d3m.metadata.pipeline.Resolver
Resolver to use.