Advanced Pipelines
TODO: Document that custom/additional fields are allowed (they are part of the digest). Document _prefix fields (they are not part of the digest).
TODO: Document the sub-pipeline step. Document how data references for sub-pipelines are done.
TODO: Document the placeholder step.
TODO: Document how pipelines are resolved (by filename, based on the pipeline ID, in the pipeline search path).
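The first TODO note says that custom fields are part of a pipeline's digest and _prefix fields are not. The following Python sketch illustrates that rule under several stated assumptions. It assumes the digest is a SHA-256 hash over a canonical JSON serialization. It also assumes that keys starting with "_" (at any depth) and the top-level "digest" field itself are removed before hashing. The function names and hashing details are hypothetical and do not reproduce the actual d3m implementation.

```python
import hashlib
import json
from typing import Any


def strip_private_fields(value: Any) -> Any:
    """Recursively drop dict keys starting with "_" (assumed to be excluded from the digest)."""
    if isinstance(value, dict):
        return {
            key: strip_private_fields(item)
            for key, item in value.items()
            if not (isinstance(key, str) and key.startswith("_"))
        }
    if isinstance(value, list):
        return [strip_private_fields(item) for item in value]
    return value


def compute_pipeline_digest(description: dict) -> str:
    """Hypothetical digest: SHA-256 over canonical JSON of the pipeline description.

    Custom fields are kept, so they change the digest. "_"-prefixed fields and the
    "digest" field itself are dropped first (an assumption of this sketch).
    """
    content = {key: item for key, item in description.items() if key != "digest"}
    content = strip_private_fields(content)
    # sort_keys and compact separators make the serialization independent of key order.
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


base = {"id": "example-pipeline", "steps": []}
with_custom = dict(base, my_custom_field="note")  # custom field: digest changes
with_private = dict(base, _debug_info="ignored")  # _prefix field: digest unchanged
print(compute_pipeline_digest(base) == compute_pipeline_digest(with_private))
print(compute_pipeline_digest(base) == compute_pipeline_digest(with_custom))
```

Because the serialization is canonical, two descriptions that differ only in key order or in _prefix fields hash to the same value.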
Interaction with Problem Description
TODO: Document passing true targets and LUPI (learning using privileged information) data through semantic types from the problem description.
Container types
All input and output (container) values passed between primitives should expose the Sequence protocol (a sequence of samples) and provide a metadata attribute with metadata.
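The following Python sketch shows the two requirements above: a value that behaves as a sequence of samples and also carries a metadata attribute. MetadataList and is_container_like are hypothetical names. A plain dict stands in for metadata here, which is a simplification; the real d3m container types carry structured metadata objects rather than dicts.

```python
from collections.abc import Sequence
from typing import Any, Dict, Optional


class MetadataList(list):
    """Hypothetical container: a list (one element per sample) with a metadata attribute.

    Subclassing list provides the Sequence protocol (len, indexing, iteration).
    """

    def __init__(self, samples=(), metadata: Optional[Dict[str, Any]] = None) -> None:
        super().__init__(samples)
        # Simplification: metadata is a plain dict in this sketch.
        self.metadata = dict(metadata or {})


def is_container_like(value: Any) -> bool:
    """Check both requirements: the Sequence protocol and a metadata attribute."""
    return isinstance(value, Sequence) and hasattr(value, "metadata")


samples = MetadataList([{"x": 1}, {"x": 2}], metadata={"dimension": {"length": 2}})
print(len(samples), samples[0], is_container_like(samples))
print(is_container_like([1, 2, 3]))  # a plain list has no metadata attribute
```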
The d3m.container module exposes the following standard types:
- Dataset – a class representing datasets, including D3M datasets, implemented in the d3m.container.dataset module
- DataFrame – pandas.DataFrame with support for the metadata attribute, implemented in the d3m.container.pandas module
- ndarray – numpy.ndarray with support for the metadata attribute, implemented in the d3m.container.numpy module
- List – a standard list with support for the metadata attribute, implemented in the d3m.container.list module
List can be used to create a simple list container.
Primitives are strongly encouraged to use the DataFrame container type unless they have a strong reason to use something else. Examples of such reasons include using Dataset to operate on the initial pipeline input, ndarray for optimized high-dimensional packed data, or List to pass values to hyper-parameters. DataFrame makes it easier to operate on individual columns without type casting while the data is being transformed into a form that is useful for models.
When deciding which container type to use for a primitive's inputs and outputs, also consider where in the pipeline your primitive is expected to be placed. Generally, pipelines have primitives operating on Dataset at the beginning, then use DataFrame, and then convert to ndarray.
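The following Python sketch shows how this typical ordering turns into a compatibility constraint between adjacent steps. Each step's output type must match the next step's input type. The Dataset, DataFrame and ndarray classes are empty stand-ins for the d3m.container types. The step names and the check_chain helper are hypothetical illustrations, not part of the d3m API.

```python
from typing import List, Tuple


# Empty stand-ins for d3m.container.Dataset, DataFrame and ndarray;
# only their identity matters for this sketch.
class Dataset:
    """Stand-in for d3m.container.Dataset."""


class DataFrame:
    """Stand-in for d3m.container.DataFrame."""


class ndarray:
    """Stand-in for d3m.container.ndarray."""


# A hypothetical step description: (name, input container type, output container type).
Step = Tuple[str, type, type]


def check_chain(steps: List[Step]) -> List[str]:
    """List every place where a step's output type does not fit the next step's input type.

    An empty result means the chain is type-compatible.
    """
    problems = []
    for (prev_name, _, prev_out), (next_name, next_in, _) in zip(steps, steps[1:]):
        if not issubclass(prev_out, next_in):
            problems.append(
                f"{prev_name} outputs {prev_out.__name__}, "
                f"but {next_name} expects {next_in.__name__}"
            )
    return problems


# The usual ordering: Dataset first, then DataFrame, then ndarray.
typical = [
    ("dataset_to_dataframe", Dataset, DataFrame),
    ("column_parser", DataFrame, DataFrame),
    ("dataframe_to_ndarray", DataFrame, ndarray),
]
misplaced = [
    ("column_parser", DataFrame, DataFrame),
    ("dataset_to_dataframe", Dataset, DataFrame),
]
print(check_chain(typical))
print(check_chain(misplaced))
```

Using issubclass rather than an identity check lets a step accept any subclass of its declared input type.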
Data types
Container types can contain values of the following types:
- container types themselves
- Python builtin primitive types:
  - str
  - bytes
  - bool
  - float
  - int
  - dict (consider using typing.Dict, typing.NamedTuple, or TypedDict)
  - NoneType
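The following Python sketch checks recursively that a value is built only from the types listed above. A plain list stands in for "container types themselves" as a hypothetical substitute; a real check would use the d3m.container types instead. typing.NamedTuple instances are accepted because the list recommends them, and the sketch checks their fields recursively. Other tuples and types not in the list, such as set, are rejected.

```python
from typing import Any, NamedTuple, Tuple

# Builtin primitive types from the list above; type(None) is NoneType.
ALLOWED_PRIMITIVE_TYPES: Tuple[type, ...] = (str, bytes, bool, float, int, dict, type(None))

# Hypothetical stand-in for the d3m container types (Dataset, DataFrame, ndarray, List).
CONTAINER_TYPES: Tuple[type, ...] = (list,)


def is_allowed_value(value: Any) -> bool:
    """Recursively check that a value uses only the allowed types."""
    if isinstance(value, CONTAINER_TYPES):
        return all(is_allowed_value(item) for item in value)
    # NamedTuple instances are tuples whose class defines _fields.
    if isinstance(value, tuple) and hasattr(type(value), "_fields"):
        return all(is_allowed_value(item) for item in value)
    if isinstance(value, dict):
        return all(is_allowed_value(k) and is_allowed_value(v) for k, v in value.items())
    return isinstance(value, ALLOWED_PRIMITIVE_TYPES)


class Point(NamedTuple):
    x: float
    y: float


print(is_allowed_value([Point(1.0, 2.0), {"label": "a", "score": None}]))
print(is_allowed_value([{1, 2}]))  # a set is not one of the listed types
```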
Placeholders
Placeholders can be used to define pipeline templates for use outside of the metalearning context. A placeholder is replaced with a pipeline step to form a complete pipeline. Restrictions may apply to placeholders, for example to their number, their position, and their allowed inputs and outputs.
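The following Python sketch shows how a placeholder in a template could be replaced by a concrete step to form a pipeline. The template layout below is a simplified, hypothetical dict structure. The "PLACEHOLDER" step type and the "steps.0.produce" data reference are assumptions modeled loosely on pipeline descriptions, not the d3m schema itself. The sketch also makes two further assumptions: a limit on the number of placeholders, and that each replacement step inherits the placeholder's inputs.

```python
import copy
from typing import Any, Dict, List

# Hypothetical, simplified template: a list of steps, each a dict with a "type" key.
template: Dict[str, Any] = {
    "steps": [
        {"type": "PRIMITIVE", "primitive": "dataset_to_dataframe"},
        {"type": "PLACEHOLDER", "inputs": ["steps.0.produce"], "outputs": ["produce"]},
    ]
}


def fill_placeholders(
    template: Dict[str, Any],
    replacements: List[Dict[str, Any]],
    max_placeholders: int = 1,
) -> Dict[str, Any]:
    """Return a copy of the template with each placeholder replaced, in order,
    by the next replacement step. The template itself is left unchanged."""
    pipeline = copy.deepcopy(template)
    indices = [i for i, step in enumerate(pipeline["steps"]) if step["type"] == "PLACEHOLDER"]
    # Example restriction on the number of placeholders (an assumption of this sketch).
    if len(indices) > max_placeholders:
        raise ValueError(f"template has {len(indices)} placeholders, at most {max_placeholders} allowed")
    if len(replacements) != len(indices):
        raise ValueError(f"expected {len(indices)} replacement steps, got {len(replacements)}")
    for index, replacement in zip(indices, replacements):
        # The replacement inherits the placeholder's inputs so the data flow is preserved.
        pipeline["steps"][index] = dict(replacement, inputs=list(pipeline["steps"][index]["inputs"]))
    return pipeline


pipeline = fill_placeholders(template, [{"type": "PRIMITIVE", "primitive": "my_classifier"}])
print([step["type"] for step in pipeline["steps"]])
print(pipeline["steps"][1]["inputs"])
```

Working on a deep copy keeps the template reusable, so the same template can be filled with different replacement steps.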