Extractors (battdat.io)#

Tools for reading external formats into BatteryDataset objects and exporting data to disk.

Base Classes (b.io.base)#

Base classes for battery data import and export tools

class battdat.io.base.CycleTestReader#

Bases: DatasetFileReader

Template class for reading the files output by battery cell cyclers

Adds logic for reading cycling time series from a list of files.

output_class#

alias of CellDataset

read_dataset(group: Sequence[Path | str] = (), metadata: BatteryMetadata | None = None) CellDataset#

Parse a set of files into a CellDataset

Parameters:
  • group – List of files to parse as part of the same test. Ordered sequentially

  • metadata – Metadata for the battery, should adhere to the BatteryMetadata schema

Returns:

Dataset containing the information from all files

read_file(file: str, file_number: int = 0, start_cycle: int = 0, start_time: int = 0) DataFrame#

Generate a DataFrame containing the data in this file

The dataframe will be in our standard format

Parameters:
  • file – Path to the file

  • file_number – Number of file, in case the test is spread across multiple files

  • start_cycle – Index to use for the first cycle, in case test is spread across multiple files

  • start_time – Test time to use for the start of the test, in case test is spread across multiple files

Returns:

Dataframe containing the battery data in a standard format
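The offsets in read_file()'s signature exist so that a test spread across several files can be stitched into one continuous frame. A minimal sketch of that loop, using plain pandas and assuming hypothetical column names (cycle_number, test_time) and a stand-in parse_one() parser rather than the real read_file() implementation:

```python
import io

import pandas as pd


def parse_one(raw: str) -> pd.DataFrame:
    """Hypothetical single-file parser returning the standard columns."""
    return pd.read_csv(io.StringIO(raw))


def read_group(raw_files: list) -> pd.DataFrame:
    """Combine files from one test, offsetting cycle index and test time."""
    frames = []
    start_cycle, start_time = 0, 0.0
    for file_number, raw in enumerate(raw_files):
        df = parse_one(raw)
        df['cycle_number'] += start_cycle
        df['test_time'] += start_time
        df['file_number'] = file_number
        frames.append(df)
        # The next file continues after the last cycle and timestamp seen here
        start_cycle = df['cycle_number'].max() + 1
        start_time = df['test_time'].max()
    return pd.concat(frames, ignore_index=True)


file_a = "cycle_number,test_time\n0,0.0\n0,1.0\n1,2.0\n"
file_b = "cycle_number,test_time\n0,0.0\n1,1.5\n"
combined = read_group([file_a, file_b])
# The second file's cycles become 2 and 3, and its times start at 2.0 s
```

The same pattern explains why read_file() accepts start_cycle and start_time rather than always numbering from zero.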

class battdat.io.base.DatasetFileReader#

Bases: DatasetReader

Tool which reads datasets written to files

Provide an identify_files() function to filter out files likely to be in this format, or a group() function to find related files when data are split across multiple files.

group(files: str | Path | List[str | Path], directories: List[str | Path] | None = None, context: dict | None = None) Iterator[tuple[str | Path, ...]]#

Identify groups of files and directories that should be parsed together

Will create groups using only the files and directories included as input.

The input list of files should contain all files that could be read by this extractor, which may include many false positives.

Parameters:
  • files – List of files to consider grouping

  • directories – Any directories to consider grouping as well

  • context – Context about the files

Yields:

Groups of files

identify_files(path: str | Path, context: dict | None = None) Iterator[tuple[str | Path]]#

Identify all groups of files likely to be compatible with this reader

Uses the group() function to determine groups of files that should be parsed together.

Parameters:
  • path – Root of the directory tree to search for files

  • context – Context about the files

Yields:

Groups of eligible files
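The filtering half of identify_files() can be sketched with the standard library alone. This is an illustrative stand-in, not the battdat implementation: a real identify_files() would also delegate to group() so related files are yielded together, and the helper name and extension filter here are our own.

```python
import os
import tempfile
from pathlib import Path


def identify_candidates(root, extensions=('.csv',)):
    """Walk a directory tree and yield files with a matching extension.

    Sketch of the filtering step only; grouping of related files is omitted.
    """
    for dirpath, _, filenames in os.walk(root):
        for name in sorted(filenames):
            path = Path(dirpath) / name
            if path.suffix.lower() in extensions:
                yield path


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, 'a.csv').touch()
    Path(tmp, 'b.txt').touch()
    found = [p.name for p in identify_candidates(tmp)]
# Only 'a.csv' survives the extension filter
```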

class battdat.io.base.DatasetReader#

Bases: object

Base class for tools which read battery data as a BatteryDataset

All readers must implement a function which receives battery metadata as input and produces a completed battdat.data.BatteryDataset as an output.

Subclasses provide additional suggested operations useful when working with data from common sources (e.g., file systems, web APIs)

output_class#

Type of dataset to output

alias of BatteryDataset

read_dataset(metadata: BatteryMetadata | dict | None = None, **kwargs) BatteryDataset#

Parse a set of inputs into a BatteryDataset

Parameters:

metadata – Metadata for the battery

Returns:

Dataset holding all available information about the battery

class battdat.io.base.DatasetWriter#

Bases: object

Tool which exports data from a BatteryDataset to disk in a specific format

export(dataset: BatteryDataset, path: str | Path)#

Write the dataset to disk in a specific path

All files from the dataset must be placed in the provided directory

Parameters:
  • dataset – Dataset to be exported

  • path – Output path

Arbin (b.io.arbin)#

Extractor for Arbin-format files

class battdat.io.arbin.ArbinReader#

Bases: CycleTestReader

Parser for reading from Arbin-format files

Expects the files to be in CSV format

group(files: str | List[str], directories: List[str] | None = None, context: dict | None = None) Iterator[Tuple[str, ...]]#

Identify groups of files and directories that should be parsed together

Will create groups using only the files and directories included as input.

The input list of files should contain all files that could be read by this extractor, which may include many false positives.

Parameters:
  • files – List of files to consider grouping

  • directories – Any directories to consider grouping as well

  • context – Context about the files

Yields:

Groups of files

read_file(file: str, file_number: int = 0, start_cycle: int = 0, start_time: float = 0) DataFrame#

Generate a DataFrame containing the data in this file

The dataframe will be in our standard format

Parameters:
  • file – Path to the file

  • file_number – Number of file, in case the test is spread across multiple files

  • start_cycle – Index to use for the first cycle, in case test is spread across multiple files

  • start_time – Test time to use for the start of the test, in case test is spread across multiple files

Returns:

Dataframe containing the battery data in a standard format

Battery Archive (b.io.ba)#

Tools for streamlining upload to Battery Archive

class battdat.io.ba.BatteryArchiveWriter(chunk_size: int = 100000)#

Bases: DatasetWriter

Export data into CSV files that follow the format definitions used in BatteryArchive

The exporter writes files for each table in the Battery Archive SQL schema with column names matched to their definitions.

chunk_size: int = 100000#

Maximum number of rows to write to disk in a single CSV file
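The role of chunk_size can be illustrated with a short pandas sketch. The helper name below is hypothetical and not how BatteryArchiveWriter is actually implemented; it only shows the splitting logic implied by the attribute.

```python
import os
import tempfile

import pandas as pd


def write_in_chunks(df: pd.DataFrame, out_dir: str, base_name: str,
                    chunk_size: int = 100_000) -> list:
    """Write `df` as one or more CSV files of at most `chunk_size` rows each."""
    paths = []
    for i, start in enumerate(range(0, len(df), chunk_size)):
        path = os.path.join(out_dir, f'{base_name}_{i}.csv')
        df.iloc[start:start + chunk_size].to_csv(path, index=False)
        paths.append(path)
    return paths


df = pd.DataFrame({'voltage': range(250)})
with tempfile.TemporaryDirectory() as tmp:
    written = write_in_chunks(df, tmp, 'timeseries', chunk_size=100)
    n_files = len(written)
# 250 rows at 100 rows per chunk produce 3 files
```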

export(dataset: CellDataset, path: Path)#

Write the dataset to disk in a specific path

All files from the dataset must be placed in the provided directory

Parameters:
  • dataset – Dataset to be exported

  • path – Output path

write_cycle_stats(cell_id: str, data: DataFrame, path: Path)#

Write the cycle stats to disk

Parameters:
  • cell_id – Name of the cell

  • data – Cycle stats dataframe

  • path – Path to the output directory

write_metadata(cell_id: str, metadata: BatteryMetadata, path: Path)#

Write the metadata into a JSON file

Parameters:
  • cell_id – ID for the cell

  • metadata – Metadata to be written

  • path – Path in which to write the data

write_timeseries(cell_id: str, data: DataFrame, path: Path)#

Write the time series dataset

Parameters:
  • cell_id – Name for the cell, used as a foreign key to map between tables

  • data – Time series data to write to disk

  • path – Root path for writing cycling data

Battery Data Hub (b.io.batterydata)#

Parse from the CSV formats of batterydata.energy.gov

class battdat.io.batterydata.BDReader(store_all: bool = False)#

Bases: DatasetFileReader

Read data from the batterydata.energy.gov CSV format

Every cell in batterydata.energy.gov is stored as two separate CSV files: “<cell_name>-summary.csv” for the cycle-level summaries and “<cell_name>-raw.csv” for the time series measurements. Metadata is held in an Excel file, “metadata.xlsx,” in the same directory.

group(files: str | List[str], directories: List[str] | None = None, context: dict | None = None) Iterator[Tuple[str, ...]]#

Identify groups of files and directories that should be parsed together

Will create groups using only the files and directories included as input.

The input list of files should contain all files that could be read by this extractor, which may include many false positives.

Parameters:
  • files – List of files to consider grouping

  • directories – Any directories to consider grouping as well

  • context – Context about the files

Yields:

Groups of files
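The grouping this reader performs follows the “-summary.csv” / “-raw.csv” naming convention described above. A rough, standard-library sketch of that pairing (the helper name is ours, not BDReader's internals):

```python
from collections import defaultdict
from pathlib import Path


def pair_cell_files(files):
    """Pair '<cell_name>-summary.csv' and '<cell_name>-raw.csv' by cell name."""
    cells = defaultdict(dict)
    for f in map(Path, files):
        for suffix in ('-summary', '-raw'):
            if f.stem.endswith(suffix):
                # Strip the suffix to recover the cell name used as the key
                cells[f.stem[:-len(suffix)]][suffix.lstrip('-')] = f
    return dict(cells)


groups = pair_cell_files([
    'cellA-summary.csv', 'cellA-raw.csv',
    'cellB-summary.csv', 'cellB-raw.csv',
    'metadata.xlsx',  # not a per-cell file; ignored by the pairing
])
```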

read_dataset(group: List[str], metadata: BatteryMetadata | dict | None = None) CellDataset#

Parse a set of files into a CellDataset

Parameters:
  • group – List of files to parse as part of the same test

  • metadata – Metadata for the battery

Returns:

Dataset holding all available information about the battery

store_all: bool = False#

Store all data from the original files, even columns we have not defined

battdat.io.batterydata.convert_eis_data(input_df: DataFrame) DataFrame#

Rename the columns from an NREL-standard set of EIS data to our names and conventions

Parameters:

input_df – NREL-format raw data

Returns:

EIS data in battdat format

battdat.io.batterydata.convert_raw_signal(input_df: DataFrame, store_all: bool) DataFrame#

Convert a raw signal dataframe to one using battdat names and conventions

Parameters:
  • input_df – Initial NREL-format dataframe

  • store_all – Whether to store columns even if we have not defined their names

Returns:

DataFrame in the battdat format

battdat.io.batterydata.convert_summary(input_df: DataFrame, store_all: bool) DataFrame#

Convert the summary dataframe to a format using battdat names and conventions

Parameters:
  • input_df – Initial NREL-format dataframe

  • store_all – Whether to store columns even if we have not defined their names

Returns:

DataFrame in the battdat format

battdat.io.batterydata.generate_metadata(desc: dict, associated_ids: Iterable[str] = ()) BatteryMetadata#

Assemble the battery metadata for a dataset

The metadata are the same for every cell in a single dataset and are available by querying the https://batterydata.energy.gov/api/3/action/package_show?id={dataset_id} endpoint of the Battery Data Hub.

Parameters:
  • desc – Data from the CKAN metadata response

  • associated_ids – List of other resources associated with this dataset, such as the DOIs of papers.

Returns:

Metadata for the cell provenance and construction
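The endpoint above can be queried with any HTTP client. A minimal helper that builds the request URL (the function name is ours; fetching and JSON parsing are left out to keep the sketch offline):

```python
def package_show_url(dataset_id: str) -> str:
    """Build the CKAN package_show URL for a Battery Data Hub dataset."""
    return ('https://batterydata.energy.gov/api/3/action/package_show'
            f'?id={dataset_id}')


url = package_show_url('my-dataset')
# The JSON response's 'result' field is the CKAN description passed
# to generate_metadata() as `desc` (an assumption based on the text above).
```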

HDF5 (b.io.hdf)#

Read and write from battery-data-toolkit’s HDF format

class battdat.io.hdf.HDF5Reader#

Bases: DatasetReader

Read datasets from a battery-data-toolkit format HDF5 file

The HDF5 format permits multiple datasets in a single HDF5 file so long as they all share the same metadata.

Access these datasets through the read_from_hdf() method.

read_dataset(path: str | Path, metadata: BatteryMetadata | dict | None = None) BatteryDataset#

Read the default dataset and all subsets from an HDF5 file

Use read_from_hdf() for more control over reads.

Parameters:
  • path – Path to the HDF file

  • metadata – Metadata to use in place of any found in the file

Returns:

Dataset read from the file

read_from_hdf(file: File, prefix: int | str | None, subsets: Collection[str] | None = None)#

class battdat.io.hdf.HDF5Writer(complevel: int = 0, complib: str = 'zlib')#

Bases: DatasetWriter

Interface to write HDF5 files in battery-data-toolkit’s layout

The export() method writes a dataset file with the default settings, assuming a single dataset per HDF5 file.

Use write_to_hdf() to store multiple datasets in the same file with a different “prefix” for each.

complevel: int = 0#

Compression level for data. A value of 0 disables compression.

complib: str = 'zlib'#

Specifies the compression library to be used.

export(dataset: BatteryDataset, path: str | Path)#

Write the dataset to disk in a specific path

All files from the dataset must be placed in the provided directory

Parameters:
  • dataset – Dataset to be exported

  • path – Output path

write_to_hdf(dataset: BatteryDataset, file: File, prefix: str | None)#

Add a dataset to an already-open HDF5 file

Parameters:
  • dataset – Dataset to be added

  • file – PyTables file object in which to save the data

  • prefix – Prefix used when storing the data. Use prefixes to store multiple cells in the same HDF5 file

battdat.io.hdf.as_hdf5_object(path_or_file: str | Path | File, **kwargs) File#

Open a path as a PyTables file object if not done already.

Keyword arguments are used when creating a store from a new file

Parameters:

path_or_file – Either the path to a file or an already open File (in which case this function does nothing)

Yields:

A file object that will close on exit from the with context, if a path was provided.

battdat.io.hdf.inspect_hdf(file: File) Tuple[BatteryMetadata, Set[str | None]]#

Gather the metadata describing all datasets and the names of datasets within an HDF5 file

Parameters:

file – HDF5 file to read from

Returns:

  • Metadata from this file

  • List of names of datasets stored within the file (prefixes)

battdat.io.hdf.make_numpy_dtype_from_pandas(df: DataFrame) dtype#

Generate a Numpy dtype from a Pandas dataframe

Parameters:

df – Dataframe to be converted

Returns:

  • Structured dtype of the data

battdat.io.hdf.read_df_from_table(table: Table) DataFrame#

Read a dataframe from a table

Parameters:

table – Table to read from

Returns:

Dataframe containing the contents

battdat.io.hdf.write_df_to_table(file: File, group: Group, name: str, df: DataFrame, filters: Filters | None = None, expected_rows: int | None = None) Table#

Write a dataframe to an HDF5 file

Parameters:
  • file – File to be written to

  • group – Group which holds the associated datasets

  • name – Name of the dataset

  • df – DataFrame to write

  • filters – Filters to apply to data entering table

  • expected_rows

Returns:

Table object holding the dataset

MACCOR (b.io.maccor)#

Extractor for MACCOR

class battdat.io.maccor.MACCORReader#

Bases: CycleTestReader, DatasetFileReader

Parser for reading from MACCOR-format files

Expects the files to be ASCII files with a .### extension. The group() operation will consolidate files such that all with the same prefix (i.e., everything except the numerals in the extension) are treated as part of the same experiment.

group(files: str | List[str], directories: List[str] | None = None, context: dict | None = None) Iterator[Tuple[str, ...]]#

Identify groups of files and directories that should be parsed together

Will create groups using only the files and directories included as input.

The input list of files should contain all files that could be read by this extractor, which may include many false positives.

Parameters:
  • files – List of files to consider grouping

  • directories – Any directories to consider grouping as well

  • context – Context about the files

Yields:

Groups of files
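The prefix-based consolidation described above (everything except the numerals in the extension) can be sketched with the standard library. The function name is ours, not MACCORReader's internals:

```python
from itertools import groupby
from pathlib import PurePath


def group_by_prefix(files):
    """Group files whose names differ only in a numeric extension (.001, .002, ...)."""
    def prefix(f):
        p = PurePath(f)
        # Files with a purely numeric extension share the stem as their prefix
        return str(p.with_suffix('')) if p.suffix[1:].isdigit() else str(p)

    for _, members in groupby(sorted(files, key=prefix), key=prefix):
        yield tuple(sorted(members))


groups = list(group_by_prefix(['run1.001', 'run1.002', 'run2.001']))
# 'run1.001' and 'run1.002' share the prefix 'run1' and are grouped together
```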

read_file(file: str, file_number: int = 0, start_cycle: int = 0, start_time: int = 0) DataFrame#

Generate a DataFrame containing the data in this file

The dataframe will be in our standard format

Parameters:
  • file – Path to the file

  • file_number – Number of file, in case the test is spread across multiple files

  • start_cycle – Index to use for the first cycle, in case test is spread across multiple files

  • start_time – Test time to use for the start of the test, in case test is spread across multiple files

Returns:

Dataframe containing the battery data in a standard format

Parquet (b.io.parquet)#

Read and write from battery-data-toolkit’s parquet format

class battdat.io.parquet.ParquetReader#

Bases: DatasetFileReader

Read parquet files formatted according to battery-data-toolkit standards

Mirrors ParquetWriter. Expects each constituent table to be in a separate parquet file and to have the metadata stored in the file-level metadata of the parquet file.

read_dataset(paths: str | Path | Collection[str | Path], metadata: BatteryMetadata | dict | None = None) BatteryDataset#

Read a set of parquet files into a BatteryDataset

Parameters:
  • paths – Either the path to a single directory of files, or a list of files to parse

  • metadata – Metadata which will overwrite what is available in the files

Returns:

Dataset including all subsets

class battdat.io.parquet.ParquetWriter(overwrite: bool = False, write_options: ~typing.Dict[str, ~typing.Any] = <factory>)#

Bases: DatasetWriter

Write to parquet files in the format specification of battery-data-toolkit

Writes all data to the same directory with a separate parquet file for each table. The battery metadata, column schemas, and write date are all saved in the file-level metadata for each file.

export(dataset: BatteryDataset, path: Path)#

Write the dataset to disk in a specific path

All files from the dataset must be placed in the provided directory

Parameters:
  • dataset – Dataset to be exported

  • path – Output path

overwrite: bool = False#

Whether to overwrite existing data

write_options: Dict[str, Any]#

Options passed to write_table().

battdat.io.parquet.inspect_parquet_files(path: str | Path) BatteryMetadata#

Read the metadata from a collection of Parquet files

Parameters:

path – Path to a directory of parquet files

Returns:

Metadata from one of the files