d3m.container.pandas module

class d3m.container.pandas.DataFrame(data=None, metadata=None, index=None, columns=None, dtype=None, copy=False, *, generate_metadata=False, check=True, source=None, timestamp=None)

Bases: pandas.core.frame.DataFrame

Extended pandas.DataFrame with the metadata attribute.

Parameters
  • data (Union[Sequence, Mapping, None]) – Anything array-like to create an instance from.

  • metadata (Optional[Dict[str, Any]]) – Optional initial metadata for the top-level of the data frame, or top-level metadata to be updated if data is another instance of this data frame class.

  • index (Union[Index, Sequence, Mapping, None]) – Index to use for resulting frame.

  • columns (Union[Index, Sequence, Mapping, None]) – Column labels to use for resulting frame.

  • dtype (Union[str, dtype, ExtensionDtype, None]) – Data type to force.

  • copy (bool) – Copy data from inputs.

  • generate_metadata (bool) – Automatically generate and update the metadata.

  • check (bool) – DEPRECATED: argument ignored.

  • source (Optional[Any]) – DEPRECATED: argument ignored.

  • timestamp (Optional[datetime]) – DEPRECATED: argument ignored.

metadata: metadata_base.DataMetadata

Metadata associated with the data frame.

append_columns(right, *, use_right_metadata=False)

Appends all columns of right to the right side of this DataFrame, together with their column metadata.

Top-level metadata of the right DataFrame is ignored rather than merged, unless use_right_metadata is set, in which case the top-level metadata of this DataFrame is ignored and that of right is used instead.

Return type

~D

horizontal_concat(right, *, use_index=True, remove_second_index=True, use_right_metadata=False)

Similar to append_columns, but by default it respects primary index columns.

When there are multiple primary index columns, it uses heuristics to match them up, but in general it aligns samples by all primary index columns.

It is required that both inputs have the same number of samples.

Return type

~D

insert_columns(columns, at_column_index)

Inserts all columns from columns before the column at at_column_index in this DataFrame, pushing the existing columns from that position onward to the right.

E.g., at_column_index == 0 means inserting columns at the beginning of this DataFrame.

Top-level metadata of columns is ignored.

Return type

~D
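
The positional behavior described for insert_columns can be sketched with plain Python lists standing in for column labels. This is only a model of the index arithmetic, not d3m code: the function name insert_columns_model and the labels are hypothetical, and metadata handling is left out entirely.

```python
from typing import List, Sequence


def insert_columns_model(
    existing: Sequence[str], new: Sequence[str], at_column_index: int
) -> List[str]:
    """Hypothetical list-based model of the column order after insert_columns."""
    result = list(existing)
    # Slice assignment places `new` before position `at_column_index`
    # and shifts everything from that position onward to the right.
    result[at_column_index:at_column_index] = new
    return result


labels = ["a", "b", "c"]
at_start = insert_columns_model(labels, ["x", "y"], 0)   # ['x', 'y', 'a', 'b', 'c']
in_middle = insert_columns_model(labels, ["x", "y"], 2)  # ['a', 'b', 'x', 'y', 'c']
```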

remove_columns(column_indices)

Removes the given columns, together with their metadata, and returns a DataFrame without them.

Raises an exception if no columns would be left after the removal.

Return type

~D
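
As a rough illustration of the remove_columns contract, the following models it on a list of column labels. The helper name remove_columns_model is hypothetical, and because the documentation only states that "an exception" is raised, the use of ValueError here is an assumption rather than the library's actual exception type.

```python
from typing import List, Sequence


def remove_columns_model(
    existing: Sequence[str], column_indices: Sequence[int]
) -> List[str]:
    """Hypothetical list-based model of the columns left after remove_columns."""
    to_remove = set(column_indices)
    remaining = [label for i, label in enumerate(existing) if i not in to_remove]
    if not remaining:
        # Assumption: the real exception type is not documented above.
        raise ValueError("No columns would be left after removal.")
    return remaining


labels = ["a", "b", "c", "d"]
kept = remove_columns_model(labels, [1, 3])  # ['a', 'c']
```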

replace_columns(columns, column_indices, *, copy=True)

Replaces columns listed in column_indices with columns, in order, in this DataFrame.

column_indices and columns do not have to contain the same number of columns. Columns are first replaced in order, pairing each index with a column. If there are then more column_indices than columns, the columns at the remaining column_indices are removed. If there are more columns than column_indices, the remaining columns are inserted after the last replaced column.

If column_indices is empty, then the behavior is equivalent to calling append_columns.

Top-level metadata of columns is ignored.

Return type

~D
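
The three cases described for replace_columns (equal counts, surplus indices, surplus columns) plus the empty column_indices case can be walked through on plain lists of labels. This is a sketch of the documented rules only, under the assumption that column_indices contains distinct, valid positions; replace_columns_model is a hypothetical name, and the real method also handles data and metadata, which this model ignores.

```python
from typing import List, Sequence


def replace_columns_model(
    existing: Sequence[str], new: Sequence[str], column_indices: Sequence[int]
) -> List[str]:
    """Hypothetical list-based model of the column order after replace_columns."""
    if not column_indices:
        # Documented as equivalent to append_columns.
        return list(existing) + list(new)

    result = list(existing)
    paired = min(len(new), len(column_indices))

    # Step 1: replace in order for the matching pairs.
    for i in range(paired):
        result[column_indices[i]] = new[i]

    if len(column_indices) > len(new):
        # Step 2a: surplus indices are removed. Deleting from the highest
        # position down keeps the remaining positions valid.
        for index in sorted(column_indices[paired:], reverse=True):
            del result[index]
    elif len(new) > len(column_indices):
        # Step 2b: surplus new columns go right after the last replaced column.
        last = column_indices[-1]
        result[last + 1:last + 1] = new[paired:]

    return result


labels = ["a", "b", "c", "d"]
same_count = replace_columns_model(labels, ["x", "y"], [1, 2])     # ['a', 'x', 'y', 'd']
more_indices = replace_columns_model(labels, ["x"], [1, 2])        # ['a', 'x', 'd']
more_columns = replace_columns_model(labels, ["x", "y", "z"], [1])  # ['a', 'x', 'y', 'z', 'c', 'd']
no_indices = replace_columns_model(labels, ["x"], [])              # ['a', 'b', 'c', 'd', 'x']
```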

select_columns(columns, *, allow_empty_columns=False)

Returns a new DataFrame with data and metadata only for the given columns. Columns are renumbered according to their position in the columns list. Top-level metadata stays unchanged, except that the length of the columns dimension is updated to the number of selected columns.

For example, if columns is [3, 6, 5], the output DataFrame has three columns, [0, 1, 2], with data and metadata mapped from column 3 to 0, 6 to 1, and 5 to 2.

This also allows columns to be duplicated.

Return type

~D
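
The renumbering rule of select_columns, including the [3, 6, 5] example and column duplication, can be reproduced on a list of labels. As before, this is a minimal model rather than d3m code: select_columns_model and the label names are made up for illustration, and the allow_empty_columns argument is not modeled.

```python
from typing import List, Sequence


def select_columns_model(existing: Sequence[str], columns: Sequence[int]) -> List[str]:
    """Hypothetical list-based model of the column order after select_columns."""
    # Output position k holds the input column columns[k],
    # so the result is renumbered 0, 1, 2, ... in selection order.
    return [existing[index] for index in columns]


labels = ["c0", "c1", "c2", "c3", "c4", "c5", "c6"]
selected = select_columns_model(labels, [3, 6, 5])  # ['c3', 'c6', 'c5']
duplicated = select_columns_model(labels, [1, 1])   # ['c1', 'c1']
```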

to_csv(path_or_buf=None, sep=',', na_rep='', float_format=None, columns=None, header=True, index=False, **kwargs)

Extends pandas.DataFrame with a better default method for writing DataFrames to CSV files. If the header argument is not explicitly provided, column names are derived from the metadata of the DataFrame. By default, DataFrame indices are not written.

Parameters
  • path_or_buf (Union[IO[Any], str, None]) – File path or object. If None is provided, the result is returned as a string.

  • sep (str) – String of length 1. Field delimiter for the output file.

  • na_rep (str) – Missing data representation.

  • float_format (Optional[str]) – Format string for floating point numbers.

  • columns (Optional[Sequence]) – Columns to write.

  • header (Union[bool, Sequence[str]]) – Write out the column names. If a list of strings is given, it is assumed to be aliases for the column names.

  • index (bool) – Write row names (index).

  • kwargs (Any) – Other arguments.

Return type

Optional[str]