D3M Developer Documentation

Version

2022.5.6.dev0

This is the developer documentation for Data Driven Discovery of Models (D3M). Its target audience is anyone who wants to build upon or extend the technologies developed as part of the program, primarily developers and researchers interested in AutoML. If you are not familiar with the program, read about it on its main page.

Core Package

The D3M core package provides the interface for primitives, the data types of values that can be passed between them during execution, the pipeline language, the metadata associated with those values, and a reference runtime. It also contains other useful code for writing primitives, generating pipelines, and running them. You are reading its documentation.

AutoML RPC Protocol

D3M also provides a standard gRPC protocol for communicating with AutoML systems. It serves as a common interface to any D3M-compatible AutoML system. It is documented in its own repository.

Datasets

The D3M program provides many datasets in a uniform structure. The format of those datasets is described in this repository.

Metalearning Database

Every pipeline that is run with the reference runtime produces a record of that run, called a pipeline run. These pipeline runs (together with metadata about the input datasets and problem description) are stored in a centralized, shared metalearning database, building towards a large metalearning dataset. Ideally, all those pipeline runs are fully reproducible. Documentation is here.
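To make the idea of a pipeline run record more concrete, here is a minimal sketch in plain Python. It does not use the D3M API or the actual pipeline run schema: `run_pipeline`, `digest`, the record keys, and the two toy steps are all hypothetical names chosen for illustration. It only shows the general shape of the idea, a runtime that executes steps in order and emits a record with enough metadata (digests of inputs and outputs) to check that a rerun reproduces the same result.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical stand-in for a primitive: a named function from value to value.
Step = Tuple[str, Callable[[Any], Any]]


def digest(value: Any) -> str:
    """Return a stable SHA-256 digest of a JSON-serializable value."""
    payload = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_pipeline(
    steps: List[Step], dataset: Any, problem: Dict[str, Any]
) -> Tuple[Any, Dict[str, Any]]:
    """Run the steps in order and return (output, run record).

    The record layout is an assumption made for this sketch, not the
    schema used by the D3M reference runtime.
    """
    record: Dict[str, Any] = {
        "pipeline": {"steps": [name for name, _ in steps]},
        "dataset_digest": digest(dataset),
        "problem": problem,
        "steps": [],
    }
    value = dataset
    for name, func in steps:
        value = func(value)
        record["steps"].append({"name": name, "status": "SUCCESS"})
    record["output_digest"] = digest(value)
    return value, record


# Two toy steps standing in for primitives.
steps: List[Step] = [
    ("drop_missing", lambda rows: [r for r in rows if r is not None]),
    ("scale", lambda rows: [r / 10 for r in rows]),
]

output, run_record = run_pipeline(steps, [10, None, 30], {"task": "regression"})
print(json.dumps(run_record, indent=2))
```

Running the same steps on the same dataset yields the same digests, which is the property a shared database of runs relies on when it aims for reproducibility.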

Docker

The D3M program has many moving pieces and many primitives with many dependencies. Getting them all to work together correctly can be tricky. This is why we provide Docker images with all primitives and dependencies installed, configured to work both with and without GPUs. Download a Docker image and the datasets, and you are ready to run some pipelines. More about Docker images here.