d3m.container.pandas
-
class d3m.container.pandas.DataFrame(data=None, metadata=None, index=None, columns=None, dtype=None, copy=False, *, generate_metadata=False, check=True, source=None, timestamp=None)
Bases: pandas.core.frame.DataFrame
Extended pandas.DataFrame with the metadata attribute.
- Parameters
data (Union[Sequence, Mapping, None]) – Anything array-like to create an instance from.
metadata (d3m.metadata.base.DataMetadata) – Optional initial metadata for the top-level of the data frame, or top-level metadata to be updated if data is another instance of this data frame class.
index (pandas.core.indexes.base.Index) – Index to use for resulting frame.
columns (pandas.core.indexes.base.Index) – Column labels to use for resulting frame.
dtype (Union[dtype, str, ExtensionDtype, None]) – Data type to force.
copy (bool) – Copy data from inputs.
generate_metadata (bool) – Automatically generate and update the metadata.
check (bool) – DEPRECATED: argument ignored.
timestamp (Optional[datetime]) – DEPRECATED: argument ignored.
-
append_columns(right, *, use_right_metadata=False)
Appends all columns from right to the right of this DataFrame, together with all metadata of columns.
Top-level metadata of the right DataFrame is ignored, not merged, unless use_right_metadata is set, in which case top-level metadata of this DataFrame is ignored and the one from right is used instead.
- Return type
~D
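The choice between the two top-level metadata sources can be modeled with plain dictionaries. This is only a sketch of the documented behavior, not d3m's implementation: the function name `append_columns_model` and the dictionary layout (`"top_level"`, `"columns"`) are hypothetical stand-ins for a DataFrame and its metadata.

```python
def append_columns_model(left, right, use_right_metadata=False):
    """Model of the documented behavior on plain dicts (hypothetical layout)."""
    # Exactly one top-level metadata source is kept; the two are never merged.
    top_level = right["top_level"] if use_right_metadata else left["top_level"]
    return {
        "top_level": dict(top_level),
        # Columns of `right` always land to the right of the existing ones.
        "columns": list(left["columns"]) + list(right["columns"]),
    }


left = {"top_level": {"name": "left"}, "columns": ["a", "b"]}
right = {"top_level": {"name": "right"}, "columns": ["c"]}

default_result = append_columns_model(left, right)
right_meta_result = append_columns_model(left, right, use_right_metadata=True)
```

In both calls the resulting column order is `a`, `b`, `c`; only the surviving top-level metadata differs.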
-
horizontal_concat(right, *, use_index=True, remove_second_index=True, use_right_metadata=False)
Similar to append_columns, but by default it respects primary index columns.
It uses some heuristics to match up primary index columns when there are several of them, but generally it aligns samples by all primary index columns.
It is required that both inputs have the same number of samples.
- Return type
~D
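The difference from a purely positional append is the alignment by index. The sketch below models that idea on lists of `(index, value)` pairs with a single index column. It is not d3m's implementation: `horizontal_concat_model` is a hypothetical name, the multi-index heuristics are not modeled, and dropping the right-hand index from the output is an assumption about what `remove_second_index=True` means.

```python
def horizontal_concat_model(left_rows, right_rows):
    """Align right-hand values to left-hand rows by index value (toy model)."""
    # The documentation requires the same number of samples on both sides.
    if len(left_rows) != len(right_rows):
        raise ValueError("Both inputs must have the same number of samples.")
    right_by_index = {index: value for index, value in right_rows}
    # Keep the left index once; the right index is dropped (assumed behavior).
    return [(index, value, right_by_index[index]) for index, value in left_rows]


left_rows = [(10, "a"), (11, "b"), (12, "c")]
right_rows = [(12, "z"), (10, "x"), (11, "y")]  # same samples, different order

aligned = horizontal_concat_model(left_rows, right_rows)
```

A positional append would have paired `"a"` with `"z"`; aligning by index pairs `"a"` with `"x"` instead.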
-
insert_columns(columns, at_column_index)
Inserts all columns from columns before the at_column_index column in this DataFrame, pushing all existing columns to the right.
E.g., at_column_index == 0 means inserting columns at the beginning of this DataFrame.
Top-level metadata of columns is ignored.
- Return type
~D
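The positional rule can be illustrated on a plain list of column labels. This is a minimal sketch of the documented semantics, not the library's code; `insert_columns_model` is a hypothetical helper and metadata handling is left out.

```python
def insert_columns_model(frame_columns, columns, at_column_index):
    """Insert `columns` before position `at_column_index` (toy model)."""
    result = list(frame_columns)
    # Slice assignment on an empty slice pushes existing columns to the right.
    result[at_column_index:at_column_index] = columns
    return result


at_start = insert_columns_model(["a", "b", "c"], ["X", "Y"], 0)
in_middle = insert_columns_model(["a", "b", "c"], ["X", "Y"], 2)
print(at_start)   # ['X', 'Y', 'a', 'b', 'c']
print(in_middle)  # ['a', 'b', 'X', 'Y', 'c']
```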
-
remove_columns(column_indices)
Removes columns from the DataFrame and returns a new one without them; all metadata for the removed columns is removed as well.
It raises an exception if no columns would be left after the removal.
- Return type
~D
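A sketch of the removal rule on a plain list of column labels follows. It only models the documented behavior: `remove_columns_model` is a hypothetical name, and `ValueError` is an assumed exception type, since the documentation does not say which exception is raised.

```python
def remove_columns_model(frame_columns, column_indices):
    """Drop the columns at `column_indices`; refuse to return zero columns."""
    to_remove = set(column_indices)
    remaining = [
        column for position, column in enumerate(frame_columns)
        if position not in to_remove
    ]
    if not remaining:
        # Assumed exception type; the documentation only says "an exception".
        raise ValueError("No columns would be left after removing columns.")
    return remaining


kept = remove_columns_model(["a", "b", "c"], [1])
print(kept)  # ['a', 'c']
```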
-
replace_columns(columns, column_indices, *, copy=True)
Replaces columns listed in column_indices with columns, in order, in this DataFrame.
column_indices and columns do not have to match in number of columns. Columns are first replaced in order for matching indices and columns. If there are then more column_indices than columns, the additional column_indices columns are removed. If there are more columns than column_indices columns, the additional columns are inserted after the last replaced column.
If column_indices is empty, the behavior is equivalent to calling append_columns.
Top-level metadata of columns is ignored.
- Return type
~D
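The three cases described above (pairwise replacement, surplus indices, surplus columns) can be modeled on a plain list of column labels. This is a sketch under stated assumptions, not d3m's implementation: `replace_columns_model` is a hypothetical name, `column_indices` is assumed to contain no duplicates, "the last replaced column" is read as the last entry of `column_indices`, and metadata and the `copy` argument are not modeled.

```python
def replace_columns_model(frame_columns, columns, column_indices):
    """Toy model of the documented replacement rules on a list of labels."""
    result = list(frame_columns)
    if not column_indices:
        # Documented special case: equivalent to appending the columns.
        return result + list(columns)

    paired = min(len(columns), len(column_indices))
    # Step 1: replace pairwise, in order, while both lists have entries.
    for new_column, index in zip(columns, column_indices):
        result[index] = new_column

    if len(column_indices) > len(columns):
        # Step 2a: leftover indices have no replacement, so they are removed.
        leftover = set(column_indices[paired:])
        result = [c for i, c in enumerate(result) if i not in leftover]
    elif len(columns) > len(column_indices):
        # Step 2b: leftover columns go right after the last replaced column.
        last_replaced = column_indices[paired - 1]
        result[last_replaced + 1:last_replaced + 1] = columns[paired:]
    return result


frame = ["a", "b", "c", "d"]
fewer_columns = replace_columns_model(frame, ["X"], [1, 3])
more_columns = replace_columns_model(frame, ["X", "Y", "Z"], [1])
no_indices = replace_columns_model(frame, ["X"], [])
print(fewer_columns)  # ['a', 'X', 'c']
print(more_columns)   # ['a', 'X', 'Y', 'Z', 'c', 'd']
print(no_indices)     # ['a', 'b', 'c', 'd', 'X']
```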
-
select_columns(columns, *, allow_empty_columns=False)
Returns a new DataFrame with data and metadata only for given columns. Moreover, columns are renumbered based on the position in columns list. Top-level metadata stays unchanged, except for updating the length of the columns dimension to the number of columns.
So if columns is [3, 6, 5] then the output DataFrame will have three columns, [0, 1, 2], mapping data and metadata for columns 3 to 0, 6 to 1 and 5 to 2.
This also allows duplication of columns.
- Return type
~D
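The renumbering in the `[3, 6, 5]` example can be reproduced on a plain list, where the list position plays the role of the column index. This is only a model of the documented mapping; `select_columns_model` is a hypothetical helper, and metadata and `allow_empty_columns` are not modeled.

```python
def select_columns_model(frame_columns, columns):
    """Output position i holds whatever sat at input position columns[i]."""
    return [frame_columns[index] for index in columns]


frame = ["c0", "c1", "c2", "c3", "c4", "c5", "c6"]

selected = select_columns_model(frame, [3, 6, 5])
print(selected)  # ['c3', 'c6', 'c5'] -> now at positions 0, 1, 2

# Repeating an index duplicates that column in the output.
duplicated = select_columns_model(frame, [1, 1])
print(duplicated)  # ['c1', 'c1']
```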
-
to_csv(path_or_buf=None, sep=',', na_rep='', float_format=None, columns=None, header=True, index=False, **kwargs)
Extends pandas.DataFrame to provide a better default method for writing DataFrames to CSV files. If the header argument is not explicitly provided, column names are derived from metadata of the DataFrame. By default, DataFrame indices are not written.
- Parameters
path_or_buf (Union[IO[Any], str, Path, None]) – File path or object; if None is provided, the result is returned as a string.
sep (str) – String of length 1. Field delimiter for the output file.
na_rep (str) – Missing data representation.
float_format (Optional[str]) – Format string for floating point numbers.
header (Union[bool, Sequence[str]]) – Write out the column names. If a list of strings is given, it is assumed to be aliases for the column names.
index (bool) – Write row names (index).
kwargs (Any) – Other arguments.
- Return type
-
metadata: d3m.metadata.base.DataMetadata
Metadata associated with the data frame.