Advanced Pipelines
TODO: Document that custom/additional fields are allowed (which are part of digest). Document _prefix fields (which are not part of digest).
TODO: Document sub-pipeline step. Document how data references for sub-pipelines are done.
TODO: Document placeholder step.
TODO: Document resolving of pipelines (by filename based on ID in the pipeline search path).
Interaction with Problem Description
TODO: Passing true targets and LUPI through semantic types from the problem description.
Container types
All input and output (container) values passed between primitives should
expose a Sequence protocol (a sequence of samples) and provide a metadata
attribute holding their metadata.
The d3m.container module exposes the following standard types:
Dataset – a class representing datasets, including D3M datasets, implemented in the d3m.container.dataset module
DataFrame – pandas.DataFrame with support for the metadata attribute, implemented in the d3m.container.pandas module
ndarray – numpy.ndarray with support for the metadata attribute, implemented in the d3m.container.numpy module
List – a standard list with support for the metadata attribute, implemented in the d3m.container.list module
List can be used to create a simple list container.
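The idea of a sequence of samples that also carries a metadata attribute can be sketched with the standard library alone. The MetadataList class below is a hypothetical stand-in written for illustration. It is not the d3m.container.list implementation, and it uses a plain dict where the real containers use their own metadata object.

```python
from collections.abc import Sequence


class MetadataList(list):
    """Hypothetical stand-in for a list container; NOT the d3m implementation."""

    def __init__(self, iterable=(), metadata=None):
        super().__init__(iterable)
        # Assumption: a plain dict is used for metadata, for illustration only.
        self.metadata = dict(metadata or {})


samples = MetadataList([1.5, 2.5, 4.0], metadata={"name": "readings"})

# A list subclass satisfies the Sequence protocol, so indexing, len()
# and iteration all operate per sample.
is_sequence = isinstance(samples, Sequence)
first_sample = samples[0]
sample_count = len(samples)
```

Subclassing list keeps every ordinary list operation available while adding one extra attribute, which is the shape the container descriptions above suggest.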
It is strongly encouraged to use the DataFrame container type for
primitives which do not have strong reasons to use something else
(Datasets to operate on initial pipeline input, or optimized
high-dimensional packed data in ndarrays, or lists to pass as
values to hyper-parameters). This makes it easier to operate just on
columns without type casting while the data is being transformed to make
it useful for models.
When deciding which container type to use for the inputs and outputs of a
primitive, also consider where in the pipeline your primitive is expected
to sit. Generally, pipelines tend to have primitives operating on Dataset
at the beginning, then on DataFrame, and finally convert to ndarray.
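The typical progression can be illustrated without the d3m containers. In this rough sketch, plain Python structures stand in for the real types: a dict of tables for Dataset, a list of row dicts for DataFrame, and a list of float lists for ndarray. All function names and the sample data are hypothetical and chosen only for illustration.

```python
# Stand-in for a Dataset: resource ID -> table of string-valued rows.
dataset = {
    "learningData": [
        {"d3mIndex": "0", "height": "1.5", "width": "2.0"},
        {"d3mIndex": "1", "height": "3.0", "width": "4.5"},
    ],
}


def dataset_to_table(dataset, resource_id):
    """Early stage: select one tabular resource from the dataset."""
    return [dict(row) for row in dataset[resource_id]]


def parse_columns(table, columns):
    """Middle stage: work column by column, parsing strings into floats."""
    return [{**row, **{c: float(row[c]) for c in columns}} for row in table]


def table_to_matrix(table, columns):
    """Late stage: pack the selected columns into a dense numeric matrix."""
    return [[row[c] for c in columns] for row in table]


table = dataset_to_table(dataset, "learningData")
typed = parse_columns(table, ["height", "width"])
matrix = table_to_matrix(typed, ["height", "width"])
```

Each stage hands a narrower, more model-ready structure to the next. A primitive's input and output types usually follow from which of these stages it belongs to.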
Data types
Container types can contain values of the following types:
container types themselves
Python builtin primitive types:
str, bytes, bool, float, int, dict (consider using typing.Dict, typing.NamedTuple, or TypedDict), NoneType
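The three suggested alternatives to a bare dict differ in how much structure they declare. This is a short sketch and not taken from the d3m code base: the Point and Record names are hypothetical, and TypedDict is imported from typing here, which assumes Python 3.8 or later.

```python
from typing import Dict, NamedTuple, TypedDict


class Point(NamedTuple):
    """Fixed set of named, typed fields; instances are immutable tuples."""
    x: float
    y: float


class Record(TypedDict):
    """Fixed set of typed keys; instances are ordinary dicts at runtime."""
    label: str
    score: float


# typing.Dict declares uniform key and value types for an open set of keys.
counts: Dict[str, int] = {"cat": 2, "dog": 1}

point = Point(x=1.0, y=2.0)
record: Record = {"label": "cat", "score": 0.9}
```

All three annotations are checked only by static type checkers. At runtime counts and record are plain dict objects, while point is a tuple with named fields.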
Placeholders
Placeholders can be used to define pipeline templates to be used outside of the metalearning context. A placeholder is replaced with a pipeline step to form a pipeline. Restrictions may apply to placeholders, for example on their number, their position, and their allowed inputs and outputs.
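Replacing a placeholder with a concrete step can be sketched on a simplified template. Everything here is an assumption made for illustration: the dict layout, the "PLACEHOLDER" and "PRIMITIVE" type strings, the step names, the fill_placeholder helper, and the rule that a template holds exactly one placeholder. None of it is the d3m pipeline API.

```python
import copy

PLACEHOLDER = "PLACEHOLDER"  # assumed marker for a placeholder step


def fill_placeholder(template, replacement_step):
    """Return a new pipeline with the single placeholder step replaced.

    Hypothetical helper: it enforces an assumed restriction of exactly
    one placeholder per template.
    """
    positions = [
        index
        for index, step in enumerate(template["steps"])
        if step["type"] == PLACEHOLDER
    ]
    if len(positions) != 1:
        raise ValueError(
            f"expected exactly one placeholder, found {len(positions)}"
        )
    # Copy so that the template can be reused with other replacement steps.
    pipeline = copy.deepcopy(template)
    pipeline["steps"][positions[0]] = copy.deepcopy(replacement_step)
    return pipeline


template = {
    "steps": [
        {"type": "PRIMITIVE", "name": "parse_columns"},
        {"type": PLACEHOLDER},
    ],
}
pipeline = fill_placeholder(template, {"type": "PRIMITIVE", "name": "classifier"})
```

Returning a copy rather than mutating the template lets the same template be filled repeatedly with different steps.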