d3m.container.pandas
-
class d3m.container.pandas.DataFrame(data=None, metadata=None, index=None, columns=None, dtype=None, copy=False, *, generate_metadata=False, check=True, source=None, timestamp=None)
Bases: pandas.core.frame.DataFrame
Extended pandas.DataFrame with the metadata attribute.
- Parameters
data (Union[Sequence, Mapping, None]) – Anything array-like to create an instance from.
metadata (d3m.metadata.base.DataMetadata) – Optional initial metadata for the top-level of the data frame, or top-level metadata to be updated if data is another instance of this data frame class.
index (pandas.core.indexes.base.Index) – Index to use for resulting frame.
columns (pandas.core.indexes.base.Index) – Column labels to use for resulting frame.
dtype (Union[dtype, str, ExtensionDtype, None]) – Data type to force.
copy (bool) – Copy data from inputs.
generate_metadata (bool) – Automatically generate and update the metadata.
check (bool) – DEPRECATED: argument ignored.
timestamp (Optional[datetime]) – DEPRECATED: argument ignored.
-
append_columns(right, *, use_right_metadata=False)
Appends all columns from right to the right of this DataFrame, together with all metadata of columns.
Metadata at the top-level of the right DataFrame is ignored, not merged, except if use_right_metadata is set, in which case top-level metadata of this DataFrame is ignored and the one from right is used instead.
- Return type
~D
-
horizontal_concat(right, *, use_index=True, remove_second_index=True, use_right_metadata=False)
Similar to append_columns, but by default it respects primary index columns.
It uses heuristics to match up primary index columns when there are several of them, but generally it aligns samples by all primary index columns.
Both inputs are required to have the same number of samples.
- Return type
~D
-
insert_columns(columns, at_column_index)
Inserts all columns from columns before the at_column_index column in this DataFrame, pushing all existing columns to the right.
E.g., at_column_index == 0 means inserting columns at the beginning of this DataFrame.
Top-level metadata of columns is ignored.
- Return type
~D
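The positional rule for insert_columns can be illustrated with a plain-Python model. This is only a sketch of the column ordering described above, not d3m's implementation: the helper name insert_columns_model is hypothetical, columns are represented by their labels, and data and metadata handling are left out.

```python
from typing import List, Sequence


def insert_columns_model(
    existing: Sequence[str], new: Sequence[str], at_column_index: int
) -> List[str]:
    """Hypothetical helper: models only the resulting column order."""
    existing = list(existing)
    # Everything at or after at_column_index is pushed to the right.
    return existing[:at_column_index] + list(new) + existing[at_column_index:]


# Inserting at index 0 places the new columns at the beginning.
print(insert_columns_model(["a", "b", "c"], ["x", "y"], 0))
# Inserting at index 2 places them before the column currently at position 2.
print(insert_columns_model(["a", "b", "c"], ["x", "y"], 2))
```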
-
remove_columns(column_indices)
Removes columns from the DataFrame and returns one without them. Metadata for the removed columns is removed as well.
It raises an exception if no columns would be left after removing columns.
- Return type
~D
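The behavior of remove_columns, including the guard against removing every column, can be sketched on a plain list of column labels. The helper name remove_columns_model is hypothetical, and the use of ValueError is an assumption: the text above only says that an exception is thrown, not which one.

```python
from typing import List, Sequence


def remove_columns_model(
    existing: Sequence[str], column_indices: Sequence[int]
) -> List[str]:
    """Hypothetical helper: models only which columns remain."""
    to_remove = set(column_indices)
    remaining = [col for i, col in enumerate(existing) if i not in to_remove]
    if not remaining:
        # Assumption: the concrete exception type is not stated in the docs.
        raise ValueError("No columns would be left after removing columns.")
    return remaining


print(remove_columns_model(["a", "b", "c", "d"], [1, 3]))
```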
-
replace_columns(columns, column_indices, *, copy=True)
Replaces columns listed in column_indices with columns, in order, in this DataFrame.
column_indices and columns do not have to match in number of columns. Columns are first replaced in order for matching indices and columns. If there are then more column_indices than columns, the additional column_indices columns are removed. If there are more columns than column_indices columns, the additional columns are inserted after the last replaced column.
If column_indices is empty, then the behavior is equivalent to calling append_columns.
Top-level metadata of columns is ignored.
- Return type
~D
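The three cases described for replace_columns (equal counts, surplus indices, surplus columns) are easier to follow on a small model. The sketch below works on lists of column labels only and is not d3m's implementation. The helper name replace_columns_model is hypothetical, and it assumes that column_indices holds distinct, valid positions.

```python
from typing import List, Sequence


def replace_columns_model(
    existing: Sequence[str], new: Sequence[str], column_indices: Sequence[int]
) -> List[str]:
    """Hypothetical helper: models only the resulting column order."""
    result = list(existing)
    new = list(new)
    column_indices = list(column_indices)

    if not column_indices:
        # Empty column_indices: equivalent to appending the new columns.
        return result + new

    paired = min(len(new), len(column_indices))
    # Replace in order for matching indices and columns.
    for index, column in zip(column_indices, new):
        result[index] = column

    if len(new) > len(column_indices):
        # Surplus new columns go right after the last replaced column.
        last = column_indices[-1]
        result[last + 1:last + 1] = new[paired:]
    elif len(column_indices) > len(new):
        # Surplus indices: those columns are removed.
        surplus = set(column_indices[paired:])
        result = [col for i, col in enumerate(result) if i not in surplus]
    return result


cols = ["a", "b", "c", "d"]
print(replace_columns_model(cols, ["x", "y"], [1, 3]))       # equal counts
print(replace_columns_model(cols, ["x"], [1, 3]))            # surplus indices
print(replace_columns_model(cols, ["x", "y", "z"], [1]))     # surplus columns
print(replace_columns_model(cols, ["x"], []))                # append
```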
-
select_columns(columns, *, allow_empty_columns=False)
Returns a new DataFrame with data and metadata only for the given columns. Moreover, columns are renumbered based on their position in the columns list. Top-level metadata stays unchanged, except for updating the length of the columns dimension to the number of columns.
For example, if columns is [3, 6, 5], then the output DataFrame will have three columns, [0, 1, 2], with data and metadata for column 3 mapped to 0, 6 to 1, and 5 to 2.
This also allows duplicating columns.
- Return type
~D
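The renumbering in select_columns can be made concrete with a plain-Python model of the [3, 6, 5] example. The helper name select_columns_model is hypothetical. It tracks only column labels and the new-to-old position mapping, and it does not model the allow_empty_columns flag.

```python
from typing import Dict, List, Sequence, Tuple


def select_columns_model(
    existing: Sequence[str], columns: Sequence[int]
) -> Tuple[List[str], Dict[int, int]]:
    """Hypothetical helper: returns selected labels and a new -> old index map."""
    selected = [existing[old] for old in columns]
    mapping = {new: old for new, old in enumerate(columns)}
    return selected, mapping


labels = ["c0", "c1", "c2", "c3", "c4", "c5", "c6"]
selected, mapping = select_columns_model(labels, [3, 6, 5])
print(selected)  # ['c3', 'c6', 'c5']
print(mapping)   # {0: 3, 1: 6, 2: 5}

# Listing the same index twice duplicates that column.
duplicated, _ = select_columns_model(labels, [1, 1])
print(duplicated)  # ['c1', 'c1']
```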
-
to_csv(path_or_buf=None, sep=',', na_rep='', float_format=None, columns=None, header=True, index=False, **kwargs)
Extends pandas.DataFrame to provide a better default method for writing DataFrames to CSV files. If the header argument is not explicitly provided, column names are derived from the metadata of the DataFrame. By default, DataFrame indices are not written.
- Parameters
path_or_buf (Union[IO[Any], str, Path, None]) – File path or object. If None is provided, the result is returned as a string.
sep (str) – String of length 1. Field delimiter for the output file.
na_rep (str) – Missing data representation.
float_format (Optional[str]) – Format string for floating point numbers.
header (Union[bool, Sequence[str]]) – Write out the column names. If a list of strings is given, it is assumed to be aliases for the column names.
index (bool) – Write row names (index).
kwargs (Any) – Other arguments.
- Return type
-
metadata: d3m.metadata.base.DataMetadata
Metadata associated with the data frame.