d3m.base.utils

d3m.base.utils.combine_columns(inputs, column_indices, columns_list, *, return_result, add_index_columns)

Appends new columns to existing columns, replaces existing columns, or creates a new result from the new columns, depending on the return_result argument, which can be append, replace, or new.

add_index_columns controls whether, when creating a new result, primary index columns should be added if they are not already among the columns.

inputs is the DataFrame to which columns are appended or in which columns are replaced. When creating a new result, it is the DataFrame from which a primary index column can be taken.

column_indices specifies which columns in inputs were used to create columns_list, and which columns should be replaced when replacing.

columns_list is a list of DataFrames which together represent the new columns. It is a list so that columns can be prepared one at a time, without having to concatenate them all unnecessarily.

Top-level metadata in columns_list is ignored, except when creating a new result. In that case, top-level metadata from the first element in the list is used.
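The append and new modes described above can be modeled on plain lists of column names. The following is a minimal sketch, not the d3m implementation: combine_new_or_append_sketch and its index_column_names parameter are hypothetical, metadata handling is left out, and placing the index columns first in the new result is an assumption.

```python
from itertools import chain


def combine_new_or_append_sketch(input_columns, index_column_names, columns_list,
                                 *, return_result, add_index_columns):
    """Model 'append' and 'new' on lists of column names (hypothetical helper).

    input_columns: column names of the inputs.
    index_column_names: names that count as primary index columns.
    columns_list: list of lists of names, one inner list per DataFrame.
    """
    # columns_list is a list of pieces; flatten it only at the end.
    new_columns = list(chain.from_iterable(columns_list))

    if return_result == "append":
        return list(input_columns) + new_columns

    if return_result == "new":
        has_index = any(name in index_column_names for name in new_columns)
        if add_index_columns and not has_index:
            # Assumption: index columns taken from inputs are placed first.
            index = [name for name in input_columns if name in index_column_names]
            return index + new_columns
        return new_columns

    raise ValueError(f"Mode not covered by this sketch: {return_result!r}")


inputs = ["d3mIndex", "a", "b"]
pieces = [["x"], ["y", "z"]]
appended = combine_new_or_append_sketch(
    inputs, {"d3mIndex"}, pieces, return_result="append", add_index_columns=False)
fresh = combine_new_or_append_sketch(
    inputs, {"d3mIndex"}, pieces, return_result="new", add_index_columns=True)
```

Here appended keeps all input columns followed by the new ones, while fresh contains only the new columns plus the index column taken from inputs.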

When column_indices columns are being replaced with columns_list, existing metadata in column_indices columns is not preserved but replaced with metadata in columns_list. Ideally, metadata for columns_list has been constructed by copying source metadata from column_indices columns and modifying it as necessary to adapt it to the new columns. columns_list can also have completely new metadata if this is more reasonable, but in that case any custom additional metadata on the column_indices columns is lost when they are replaced.

column_indices and columns_list do not have to match in number of columns. Columns are first replaced in order for matching indices and columns. If there are then more column_indices than columns in columns_list, the additional column_indices columns are removed. If there are more columns in columns_list than column_indices, the additional columns_list columns are inserted after the last replaced column.

If column_indices is empty, then the replacing behavior is equivalent to appending.
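The replacement rules above can be illustrated on a plain list of column names. This is a minimal sketch of the documented behavior, not the d3m implementation: replace_columns_sketch is a hypothetical helper, new_columns stands for the already flattened columns_list, and "the last replaced column" is taken to be the last entry of column_indices.

```python
def replace_columns_sketch(columns, column_indices, new_columns):
    """Model the 'replace' mode on a list of column names (hypothetical helper)."""
    # Empty column_indices: replacing is equivalent to appending.
    if not column_indices:
        return list(columns) + list(new_columns)

    result = list(columns)
    paired = min(len(column_indices), len(new_columns))

    # Replace in order for matching indices and columns (zip stops at the shorter).
    for index, new_column in zip(column_indices, new_columns):
        result[index] = new_column

    if len(new_columns) > len(column_indices):
        # Extra new columns are inserted after the last replaced column.
        position = column_indices[-1] + 1
        result[position:position] = new_columns[paired:]
    elif len(column_indices) > len(new_columns):
        # Extra indexed columns have no replacement and are removed.
        to_remove = set(column_indices[paired:])
        result = [c for i, c in enumerate(result) if i not in to_remove]

    return result


columns = ["a", "b", "c", "d"]
same_count = replace_columns_sketch(columns, [1, 2], ["x", "y"])
fewer_new = replace_columns_sketch(columns, [1, 2], ["x"])
more_new = replace_columns_sketch(columns, [1], ["x", "y"])
no_indices = replace_columns_sketch(columns, [], ["x"])
```

The four calls cover the cases in the text: equal counts, surplus indices (column c is dropped), surplus new columns (y is inserted right after x), and empty column_indices (x is appended).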

Return type

DataFrame

d3m.base.utils.combine_columns_metadata(inputs, column_indices, columns_list, *, return_result, add_index_columns)

Analogous to combine_columns but operates only on metadata.

Return type

DataMetadata

d3m.base.utils.construct_file_uri(location_base_uris, filename)

Constructs the file URI from the location_base_uris values and a filename (which should be in POSIX format). Generally, filename comes from a column of a collection resource.
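As an illustration of combining a base URI with a POSIX-style filename, here is a minimal sketch. It is not the d3m implementation: construct_file_uri_sketch is a hypothetical helper, the choice of the first base URI is an assumption (the description above does not say how one is selected), and no percent-encoding is applied.

```python
def construct_file_uri_sketch(location_base_uris, filename):
    """Join a base URI and a POSIX-style relative filename (hypothetical helper)."""
    if not location_base_uris:
        raise ValueError("At least one base URI is required.")

    # Assumption: the first base URI is the one to use.
    base = location_base_uris[0]

    # Make sure exactly one slash separates the base from the filename.
    if not base.endswith("/"):
        base += "/"
    return base + filename.lstrip("/")


uri = construct_file_uri_sketch(["file:///data/media/"], "images/cat.png")
```

With these example values (both made up for illustration), uri is "file:///data/media/images/cat.png".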

Return type

str

d3m.base.utils.get_columns_to_use(metadata, use_columns, exclude_columns, can_use_column)

A helper function which computes a list of columns to use and a list of columns to ignore, given use_columns, exclude_columns, and a can_use_column function which should return True when a column can be used.
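The selection logic can be sketched without d3m by passing the number of columns directly instead of metadata. This is a minimal sketch under stated assumptions: get_columns_to_use_sketch and its column_count parameter are hypothetical, and it assumes that a non-empty use_columns takes precedence over exclude_columns, which the description above does not spell out.

```python
from typing import Callable, List, Sequence, Tuple


def get_columns_to_use_sketch(
    column_count: int,
    use_columns: Sequence[int],
    exclude_columns: Sequence[int],
    can_use_column: Callable[[int], bool],
) -> Tuple[List[int], List[int]]:
    """Split candidate column indices into usable and ignored (hypothetical helper)."""
    # Assumption: if use_columns is given, it defines the candidates
    # and exclude_columns is not consulted.
    candidates = list(use_columns)
    if not candidates:
        # Otherwise start from all columns and drop the excluded ones.
        candidates = [i for i in range(column_count) if i not in exclude_columns]

    columns_to_use = []
    columns_not_to_use = []
    for index in candidates:
        if can_use_column(index):
            columns_to_use.append(index)
        else:
            columns_not_to_use.append(index)
    return columns_to_use, columns_not_to_use


def is_even(index: int) -> bool:
    # Stand-in predicate; a real one would inspect column metadata.
    return index % 2 == 0


from_all = get_columns_to_use_sketch(5, [], [1], is_even)
from_subset = get_columns_to_use_sketch(5, [1, 2], [], is_even)
```

The returned pair matches the Tuple[List[int], List[int]] return type: the first list holds the columns that passed can_use_column, the second those that were considered but rejected.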

Return type

Tuple[List[int], List[int]]

d3m.base.utils.get_tabular_resource(dataset, resource_id, *, pick_entry_point=True, pick_one=True, has_hyperparameter=True)

Return type

Tuple[str, DataFrame]

d3m.base.utils.get_tabular_resource_metadata(dataset, resource_id, *, pick_entry_point=True, pick_one=True)

Return type

Union[int, str, ALL_ELEMENTS_TYPE]

d3m.base.utils.sample_rows(dataset, main_resource_id, main_resource_indices_to_keep, relations_graph, *, delete_recursive=False)

Return type

Dataset