d3m.base.primitives

class d3m.base.primitives.DatasetSplitPrimitiveBase(*, hyperparams, random_seed=0, docker_containers=None, volumes=None, temporary_directory=None)[source]

Bases: d3m.primitive_interfaces.generator.GeneratorPrimitiveBase[[d3m.container.list.List, typing.Params], typing.Hyperparams]

A base class for primitives which fit on a Dataset object to produce splits of that Dataset when producing. There are two produce methods: produce and produce_score_data. Both take as input a list of non-negative integers identifying which Dataset splits to return.

This class is parameterized using only two type variables, Params and Hyperparams.
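The call order described above (set the Dataset, fit, then ask for splits by index) can be sketched without the d3m package. ToySplitter below is a hypothetical stand-in, not the library class: it uses a plain list of rows in place of a Dataset, returns plain lists instead of CallResult objects, and its interleaved splitting rule is chosen only for illustration.

```python
from typing import List, Tuple


class ToySplitter:
    """Hypothetical stand-in that mimics the set_training_data / fit / produce flow."""

    def __init__(self, *, n_splits: int) -> None:
        self._n_splits = n_splits
        self._rows: List[str] = []
        self._splits: List[Tuple[List[str], List[str]]] = []

    def set_training_data(self, *, dataset: List[str]) -> None:
        # A list of rows plays the role of the Dataset to split.
        self._rows = list(dataset)

    def fit(self) -> None:
        # Precompute every split: split i holds out each row whose
        # position modulo n_splits equals i.
        self._splits = []
        for i in range(self._n_splits):
            score = self._rows[i::self._n_splits]
            train = [row for j, row in enumerate(self._rows) if j % self._n_splits != i]
            self._splits.append((train, score))

    def produce(self, *, inputs: List[int]) -> List[List[str]]:
        # One training part per requested split index.
        return [self._splits[i][0] for i in inputs]

    def produce_score_data(self, *, inputs: List[int]) -> List[List[str]]:
        # One scoring part per requested split index.
        return [self._splits[i][1] for i in inputs]


splitter = ToySplitter(n_splits=2)
splitter.set_training_data(dataset=["a", "b", "c", "d"])
splitter.fit()
train_sets = splitter.produce(inputs=[0, 1])
score_sets = splitter.produce_score_data(inputs=[0, 1])
```

Passing the same list of integers to both produce methods yields matching training and scoring parts for each split.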

abstract produce(*, inputs, timeout=None, iterations=None)[source]

For each input integer, creates a Dataset split and produces the training Dataset object. This Dataset object should then be used to fit (train) the pipeline.

Parameters
  • inputs (List) – The inputs of shape [num_inputs, …].

  • timeout (Optional[float]) – A maximum time this primitive should take to produce outputs during this method call, in seconds.

  • iterations (Optional[int]) – How many internal iterations the primitive should do.

Returns

The outputs of shape [num_inputs, …] wrapped inside CallResult.

Return type

CallResult[List]

abstract produce_score_data(*, inputs, timeout=None, iterations=None)[source]

For each input integer, creates a Dataset split and produces the scoring Dataset object. This Dataset object should then be used to test the pipeline and score the results.

Output Dataset objects do not have targets redacted and are not directly suitable for testing.

Return type

CallResult[List]

abstract set_training_data(*, dataset)[source]

Sets training data of this primitive, the Dataset to split.

Parameters

dataset (Dataset) – The dataset to split.

Return type

None

docker_containers: Dict[str, d3m.primitive_interfaces.base.DockerContainer][source]

A dict mapping Docker image keys from primitive’s metadata to (named) tuples containing container’s address under which the container is accessible by the primitive, and a dict mapping exposed ports to ports on that address.

hyperparams: Hyperparams[source]

Hyperparams passed to the constructor.

logger: ClassVar[logging.Logger][source]

Primitive’s logger. Available as a class attribute. This gets automatically set to primitive’s logger in metaclass.

metadata: ClassVar[d3m.metadata.base.PrimitiveMetadata][source]

Primitive’s metadata. Available as a class attribute. Primitive author should provide all fields which cannot be determined automatically inside the code. In this way metadata is close to the code and it is easier for consumers to make sure metadata they are using is really matching the code they are using. PrimitiveMetadata class updates itself with metadata about code and other things it can extract automatically.

random_seed: int[source]

Random seed passed to the constructor.

temporary_directory: Optional[str][source]

An absolute path to a temporary directory a primitive can use to store any files for the duration of the current pipeline run phase. Directory is automatically cleaned up after the current pipeline run phase finishes.

volumes: Dict[str, str][source]

A dict mapping volume keys from primitive’s metadata to file and directory paths where downloaded and extracted files are available to the primitive.

class d3m.base.primitives.FileReaderPrimitiveBase(*, hyperparams)[source]

Bases: d3m.primitive_interfaces.transformer.TransformerPrimitiveBase[[d3m.container.pandas.DataFrame, d3m.container.pandas.DataFrame], d3m.base.primitives.FileReaderHyperparams]

A primitive base class for reading files referenced in columns.

Primitives using this base class must implement:

  • _supported_media_types: A sequence of supported media types such as audio/mpeg, image/jpeg, etc.

  • _file_structural_type: Structural type of the file contents after being read such as container.ndarray, container.DataFrame, etc.

  • _file_semantic_types: A sequence of semantic types to be applied to the produced column.

  • metadata: Primitive Metadata.

  • _read_fileuri: The function which describes how to load each file. This function must load one file at a time.
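As a rough illustration of the members listed above, the following sketch defines them on a hypothetical stand-in class that does not inherit from the d3m base class. The (metadata, fileuri) signature of _read_fileuri, the semantic type URI, and the read_all helper are assumptions made for this example; the metadata member is omitted because it requires the d3m package.

```python
import pathlib
import tempfile
from typing import List, Sequence
from urllib.parse import urlparse
from urllib.request import url2pathname


class ToyTextReader:
    """Hypothetical stand-in; the real base class is what calls _read_fileuri."""

    # Media types this reader accepts.
    _supported_media_types: Sequence[str] = ("text/plain",)
    # Type of the value produced for each file.
    _file_structural_type = str
    # Illustrative semantic type for the produced column (an assumption).
    _file_semantic_types: Sequence[str] = ("http://schema.org/Text",)

    def _read_fileuri(self, metadata: dict, fileuri: str) -> str:
        # Load exactly one file per call, as the contract requires.
        path = url2pathname(urlparse(fileuri).path)
        with open(path, encoding="utf-8") as handle:
            return handle.read()

    def read_all(self, fileuris: Sequence[str]) -> List[str]:
        # Hypothetical helper standing in for the base class's produce loop.
        return [self._read_fileuri({}, uri) for uri in fileuris]


with tempfile.TemporaryDirectory() as tmp:
    note = pathlib.Path(tmp) / "note.txt"
    note.write_text("hello", encoding="utf-8")
    contents = ToyTextReader().read_all([note.as_uri()])
```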

produce(*, inputs, timeout=None, iterations=None)[source]

Produce primitive’s best choice of the output for each of the inputs.

The output value should be wrapped inside CallResult object before returning.

In many cases producing an output is a quick operation in comparison with fit, but not all cases are like that. For example, a primitive can start a potentially long optimization process to compute outputs. timeout and iterations can serve as a way for a caller to guide the length of this process.

Ideally, a primitive should adapt its call to try to produce the best outputs possible inside the time allocated. If this is not possible and the primitive reaches the timeout before producing outputs, it should raise a TimeoutError exception to signal that the call was unsuccessful in the given time. The state of the primitive after the exception should be as if the method call had never happened, and the primitive should continue to operate normally. The purpose of timeout is to give a primitive the opportunity to cleanly manage its state instead of having its execution interrupted from outside. Maintaining stable internal state should take precedence over respecting the timeout (the caller can terminate a misbehaving primitive from outside anyway). If a longer timeout would produce different outputs, then CallResult’s has_finished should be set to False.
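One possible way to honor the timeout contract is to snapshot mutable state before doing any work and restore it if the deadline passes. The sketch below assumes a hypothetical ToyAccumulator class with a single piece of state (seen); it is not taken from d3m and returns a plain list rather than a CallResult.

```python
import copy
import time
from typing import List, Optional


class ToyAccumulator:
    """Hypothetical primitive-like class whose produce call mutates internal state."""

    def __init__(self) -> None:
        self.seen: List[int] = []

    def produce(self, *, inputs: List[int], timeout: Optional[float] = None) -> List[int]:
        deadline = None if timeout is None else time.monotonic() + timeout
        # Snapshot the state so a timed-out call can be undone.
        snapshot = copy.deepcopy(self.seen)
        outputs: List[int] = []
        try:
            for index, value in enumerate(inputs):
                self.seen.append(value)
                outputs.append(value * 2)
                unfinished = index + 1 < len(inputs)
                if unfinished and deadline is not None and time.monotonic() >= deadline:
                    raise TimeoutError("produce did not finish within the timeout")
            return outputs
        except TimeoutError:
            # Restore the state as if the call had never happened, then re-raise.
            self.seen = snapshot
            raise


acc = ToyAccumulator()
first = acc.produce(inputs=[1, 2])
try:
    acc.produce(inputs=[3, 4, 5], timeout=0.0)
    timed_out = False
except TimeoutError:
    timed_out = True
```

After the failed call, acc.seen holds only the values from the first, successful call, so the object can keep operating normally.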

Some primitives have internal iterations (for example, optimization iterations). For those, the caller can specify how many internal iterations the primitive should do before returning outputs. Primitives should make iterations as small as reasonable. If iterations is None, then there is no limit on how many iterations the primitive should do, and the primitive should choose the best number of iterations on its own (potentially controlled through hyper-parameters). If iterations is a number, the primitive has to do that number of iterations, if possible. timeout should still be respected, and potentially fewer iterations can be done because of that. Primitives with internal iterations should make CallResult contain correct values.

For primitives which do not have internal iterations, any value of iterations means that they should run fully, respecting only timeout.
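The iterations semantics can be illustrated with a small iterative computation. In this sketch, ToyCallResult is a hypothetical stand-in for CallResult (the iterations_done field name is an assumption), and toy_produce runs Newton's method for a square root purely as an example of a process with internal iterations; target is assumed to be positive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyCallResult:
    """Hypothetical stand-in for CallResult; field names are assumptions."""
    value: float
    has_finished: bool
    iterations_done: Optional[int]


def toy_produce(target: float, *, iterations: Optional[int] = None,
                tolerance: float = 1e-9) -> ToyCallResult:
    """Approximate sqrt(target) with Newton's method (target must be positive)."""
    estimate = target if target > 1 else 1.0
    done = 0
    while True:
        converged = abs(estimate * estimate - target) <= tolerance
        if iterations is None and converged:
            break  # No limit given: stop when the method itself is satisfied.
        if iterations is not None and done >= iterations:
            break  # Limit given: do exactly that many iterations.
        estimate = 0.5 * (estimate + target / estimate)
        done += 1
    return ToyCallResult(value=estimate, has_finished=converged, iterations_done=done)


limited = toy_produce(9.0, iterations=1)
unlimited = toy_produce(9.0)
```

With iterations=1 the result reports has_finished as False because more iterations would change the output; with iterations=None the call runs until it converges.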

If the primitive should have been fitted before this method is called but has not been, it should raise a PrimitiveNotFittedError exception.

Parameters
  • inputs (DataFrame) – The inputs of shape [num_inputs, …].

  • timeout (Optional[float]) – A maximum time this primitive should take to produce outputs during this method call, in seconds.

  • iterations (Optional[int]) – How many internal iterations the primitive should do.

Returns

The outputs of shape [num_inputs, …] wrapped inside CallResult.

Return type

CallResult[DataFrame]

class d3m.base.primitives.TabularSplitPrimitiveBase(*, hyperparams, random_seed=0)[source]

Bases: d3m.base.primitives.DatasetSplitPrimitiveBase[d3m.base.primitives.TabularSplitPrimitiveParams, typing.Hyperparams]

A primitive base class for splitting tabular datasets.

Primitives using this base class must implement:

  • _get_splits: The function which describes how to split the tabular dataset.
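A splitting rule of the kind _get_splits is meant to describe might look like the following seeded k-fold sketch. The function name toy_get_splits, its arguments, and its return shape (a list of train/test index pairs) are assumptions for illustration and may not match the actual signature expected by the base class.

```python
import random
from typing import List, Tuple


def toy_get_splits(num_rows: int, *, number_of_folds: int,
                   random_seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, test_indices) pairs for a shuffled k-fold split."""
    indices = list(range(num_rows))
    # A dedicated Random instance keeps the split reproducible for a given seed.
    random.Random(random_seed).shuffle(indices)
    splits = []
    for fold in range(number_of_folds):
        # Every number_of_folds-th shuffled index goes to this fold's test set.
        test = sorted(indices[fold::number_of_folds])
        held_out = set(test)
        train = [i for i in range(num_rows) if i not in held_out]
        splits.append((train, test))
    return splits


splits = toy_get_splits(10, number_of_folds=5, random_seed=42)
```

Each row index lands in exactly one test set, and repeating the call with the same seed gives the same splits.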

__getstate__()[source]

Return type

dict

__setstate__(state)[source]

Return type

None

fit(*, timeout=None, iterations=None)[source]

This function computes everything in advance, including generating the relation graph.

Parameters
  • timeout (Optional[float]) – A maximum time this primitive should be fitting during this method call, in seconds.

  • iterations (Optional[int]) – How many internal iterations the primitive should do.

Returns

A CallResult with None value.

Return type

CallResult[None]

fit_multi_produce(*, produce_methods, inputs, dataset, timeout=None, iterations=None)[source]

A method calling fit and after that multiple produce methods at once.

Parameters
  • produce_methods (Sequence[str]) – A list of names of produce methods to call.

  • inputs (List) – The inputs given to all produce methods.

  • dataset (Dataset) – The dataset given to set_training_data.

  • timeout (Optional[float]) – A maximum time this primitive should take to both fit the primitive and produce outputs for all produce methods listed in produce_methods argument, in seconds.

  • iterations (Optional[int]) – How many internal iterations the primitive should do for both fitting and producing outputs of all produce methods.

Returns

A dict of values for each produce method wrapped inside MultiCallResult.

Return type

MultiCallResult
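Conceptually, fit_multi_produce chains set_training_data, fit, and each named produce method. The sketch below shows that chaining on a hypothetical ToyMultiSplitter class; it returns a plain dict keyed by method name rather than a MultiCallResult, ignores timeout and iterations, and uses a leave-one-row-out rule only for illustration.

```python
from typing import Dict, List, Sequence


class ToyMultiSplitter:
    """Hypothetical stand-in; split i holds out row i of the dataset."""

    def __init__(self) -> None:
        self._rows: List[str] = []
        self._fitted = False

    def set_training_data(self, *, dataset: List[str]) -> None:
        self._rows = list(dataset)
        self._fitted = False

    def fit(self) -> None:
        self._fitted = True

    def produce(self, *, inputs: List[int]) -> List[List[str]]:
        # Training part: every row except the held-out one.
        return [[row for j, row in enumerate(self._rows) if j != i] for i in inputs]

    def produce_score_data(self, *, inputs: List[int]) -> List[List[str]]:
        # Scoring part: only the held-out row.
        return [[self._rows[i]] for i in inputs]

    def fit_multi_produce(self, *, produce_methods: Sequence[str], inputs: List[int],
                          dataset: List[str]) -> Dict[str, List[List[str]]]:
        # Fit once, then call each requested produce method with the same inputs.
        self.set_training_data(dataset=dataset)
        self.fit()
        return {name: getattr(self, name)(inputs=inputs) for name in produce_methods}


result = ToyMultiSplitter().fit_multi_produce(
    produce_methods=["produce", "produce_score_data"],
    inputs=[1],
    dataset=["a", "b", "c"],
)
```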

get_params()[source]

Returns parameters of this primitive.

Parameters are all parameters of the primitive which can potentially change during the lifetime of a primitive. Parameters which cannot change are passed through the constructor.

Parameters should include all data which is necessary to create a new instance of this primitive behaving exactly the same as this instance, when the new instance is created by passing the same parameters to the class constructor and calling set_params.

No other arguments to the method are allowed (except for private arguments).

Returns

An instance of parameters.

Return type

TabularSplitPrimitiveParams
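The get_params and set_params contract amounts to a round trip: a new instance built with the same constructor arguments and given the exported parameters should behave like the original. The sketch below uses hypothetical ToyParams and ToyPrimitive classes and a made-up fitted state to show that round trip; none of these names come from d3m.

```python
import copy
from typing import Dict, List, Optional


class ToyParams(dict):
    """Hypothetical stand-in for a Params container."""


class ToyPrimitive:
    def __init__(self, *, hyperparams: Dict[str, int], random_seed: int = 0) -> None:
        # Values that never change after construction go through the constructor.
        self.hyperparams = hyperparams
        self.random_seed = random_seed
        # State that changes during the lifetime of the instance.
        self._splits: Optional[List[List[int]]] = None

    def fit(self) -> None:
        folds = self.hyperparams["number_of_folds"]
        self._splits = [[fold, self.random_seed] for fold in range(folds)]

    def get_params(self) -> ToyParams:
        # Copy so later changes to this instance do not leak into the export.
        return ToyParams(splits=copy.deepcopy(self._splits))

    def set_params(self, *, params: ToyParams) -> None:
        self._splits = copy.deepcopy(params["splits"])


original = ToyPrimitive(hyperparams={"number_of_folds": 3}, random_seed=7)
original.fit()

clone = ToyPrimitive(hyperparams={"number_of_folds": 3}, random_seed=7)
clone.set_params(params=original.get_params())
```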

produce(*, inputs, timeout=None, iterations=None)[source]

For each input integer, creates a Dataset split and produces the training Dataset object. This Dataset object should then be used to fit (train) the pipeline.

Parameters
  • inputs (List) – The inputs of shape [num_inputs, …].

  • timeout (Optional[float]) – A maximum time this primitive should take to produce outputs during this method call, in seconds.

  • iterations (Optional[int]) – How many internal iterations the primitive should do.

Returns

The outputs of shape [num_inputs, …] wrapped inside CallResult.

Return type

CallResult[List]

produce_score_data(*, inputs, timeout=None, iterations=None)[source]

For each input integer, creates a Dataset split and produces the scoring Dataset object. This Dataset object should then be used to test the pipeline and score the results.

Output Dataset objects do not have targets redacted and are not directly suitable for testing.

Return type

CallResult[List]

set_params(*, params)[source]

Sets parameters of this primitive.

Parameters are all parameters of the primitive which can potentially change during the lifetime of a primitive. Parameters which cannot change are passed through the constructor.

No other arguments to the method are allowed (except for private arguments).

Parameters

params (TabularSplitPrimitiveParams) – An instance of parameters.

Return type

None

set_training_data(*, dataset)[source]

Sets training data of this primitive, the Dataset to split.

Parameters

dataset (Dataset) – The dataset to split.

Return type

None

docker_containers: Dict[str, d3m.primitive_interfaces.base.DockerContainer][source]

A dict mapping Docker image keys from primitive’s metadata to (named) tuples containing container’s address under which the container is accessible by the primitive, and a dict mapping exposed ports to ports on that address.

hyperparams: Hyperparams[source]

Hyperparams passed to the constructor.

logger: ClassVar[logging.Logger][source]

Primitive’s logger. Available as a class attribute. This gets automatically set to primitive’s logger in metaclass.

metadata: ClassVar[d3m.metadata.base.PrimitiveMetadata][source]

Primitive’s metadata. Available as a class attribute. Primitive author should provide all fields which cannot be determined automatically inside the code. In this way metadata is close to the code and it is easier for consumers to make sure metadata they are using is really matching the code they are using. PrimitiveMetadata class updates itself with metadata about code and other things it can extract automatically.

random_seed: int[source]

Random seed passed to the constructor.

temporary_directory: Optional[str][source]

An absolute path to a temporary directory a primitive can use to store any files for the duration of the current pipeline run phase. Directory is automatically cleaned up after the current pipeline run phase finishes.

volumes: Dict[str, str][source]

A dict mapping volume keys from primitive’s metadata to file and directory paths where downloaded and extracted files are available to the primitive.