Extractors (battdat.io)#

Tools for reading external formats into BatteryDataset objects and exporting data to disk.

Base Classes (b.io.base)#

Base classes for battery data import and export tools

class battdat.io.base.CycleTestReader#

Bases: DatasetFileReader

Template class for reading the files output by battery cell cyclers

Adds logic for reading cycling time series from a list of files.

output_class#

alias of CellDataset

read_dataset(group: Sequence[Path | str] = (), metadata: BatteryMetadata | None = None) CellDataset#

Parse a set of files into a CellDataset

Parameters:
  • group – List of files to parse as part of the same test. Ordered sequentially

  • metadata – Metadata for the battery, should adhere to the BatteryMetadata schema

Returns:

Dataset containing the information from all files

read_file(file: str, file_number: int = 0, start_cycle: int = 0, start_time: int = 0) DataFrame#

Generate a DataFrame containing the data in this file

The dataframe will be in our standard format

Parameters:
  • file – Path to the file

  • file_number – Number of file, in case the test is spread across multiple files

  • start_cycle – Index to use for the first cycle, in case test is spread across multiple files

  • start_time – Test time to use for the start of the test, in case test is spread across multiple files

Returns:

Dataframe containing the battery data in a standard format
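The offsets in read_file()'s signature exist so that a test spread across several files can be stitched into one continuous frame. A minimal sketch of that loop, using plain pandas and assuming hypothetical column names (cycle_number, test_time) and a stand-in parse_one() parser rather than the real read_file() implementation:

```python
import io

import pandas as pd


def parse_one(raw: str) -> pd.DataFrame:
    """Hypothetical single-file parser returning the standard columns."""
    return pd.read_csv(io.StringIO(raw))


def read_group(raw_files: list) -> pd.DataFrame:
    """Combine files from one test, offsetting cycle index and test time."""
    frames = []
    start_cycle, start_time = 0, 0.0
    for file_number, raw in enumerate(raw_files):
        df = parse_one(raw)
        df['cycle_number'] += start_cycle
        df['test_time'] += start_time
        df['file_number'] = file_number
        frames.append(df)
        # The next file continues after the last cycle and timestamp seen here
        start_cycle = df['cycle_number'].max() + 1
        start_time = df['test_time'].max()
    return pd.concat(frames, ignore_index=True)


file_a = "cycle_number,test_time\n0,0.0\n0,1.0\n1,2.0\n"
file_b = "cycle_number,test_time\n0,0.0\n1,1.5\n"
combined = read_group([file_a, file_b])
# The second file's cycles become 2 and 3, and its times start at 2.0 s
```

The same pattern explains why read_file() accepts start_cycle and start_time rather than always numbering from zero.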

class battdat.io.base.DatasetFileReader#

Bases: DatasetReader

Tool which reads datasets written to files

Provide an identify_files() function to filter out files likely to be in this format, or a group() function to find related files when data are split across multiple files.

group(files: str | Path | List[str | Path], directories: List[str | Path] | None = None, context: dict | None = None) Iterator[tuple[str | Path, ...]]#

Identify groups of files and directories that should be parsed together

Will create groups using only the files and directories included as input.

The input list of files should contain all files that could be read by this extractor, which may include many false positives.

Parameters:
  • files – List of files to consider grouping

  • directories – Any directories to consider grouping as well

  • context – Context about the files

Yields:

Groups of files

identify_files(path: str | Path, context: dict | None = None) Iterator[tuple[str | Path]]#

Identify all groups of files likely to be compatible with this reader

Uses the group() function to determine groups of files that should be parsed together.

Parameters:
  • path – Root of the directory tree to search for files

  • context – Context about the files

Yields:

Groups of eligible files
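The filtering half of identify_files() can be sketched with the standard library alone. This is an illustrative stand-in, not the battdat implementation: a real identify_files() would also delegate to group() so related files are yielded together, and the helper name and extension filter here are our own.

```python
import os
import tempfile
from pathlib import Path


def identify_candidates(root, extensions=('.csv',)):
    """Walk a directory tree and yield files with a matching extension.

    Sketch of the filtering step only; grouping of related files is omitted.
    """
    for dirpath, _, filenames in os.walk(root):
        for name in sorted(filenames):
            path = Path(dirpath) / name
            if path.suffix.lower() in extensions:
                yield path


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, 'a.csv').touch()
    Path(tmp, 'b.txt').touch()
    found = [p.name for p in identify_candidates(tmp)]
# Only 'a.csv' survives the extension filter
```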

class battdat.io.base.DatasetReader#

Bases: object

Base class for tools which read battery data as a BatteryDataset

All readers must implement a function which receives battery metadata as input and produces a completed battdat.data.BatteryDataset as an output.

Subclasses provide additional suggested operations useful when working with data from common sources (e.g., file systems, web APIs)

output_class#

Type of dataset to output

alias of BatteryDataset

read_dataset(metadata: BatteryMetadata | dict | None = None, **kwargs) BatteryDataset#

Parse a set of inputs into a BatteryDataset

Parameters:

metadata – Metadata for the battery

Returns:

Dataset holding all available information about the battery

class battdat.io.base.DatasetWriter#

Bases: object

Tool which exports data from a BatteryDataset to disk in a specific format

export(dataset: BatteryDataset, path: str | Path)#

Write the dataset to disk in a specific path

All files from the dataset must be placed in the provided directory

Parameters:
  • dataset – Dataset to be exported

  • path – Output path

Arbin (b.io.arbin)#

Extractor for Arbin-format files

class battdat.io.arbin.ArbinReader#

Bases: CycleTestReader

Parser for reading from Arbin-format files

Expects the files to be in CSV format

group(files: str | List[str], directories: List[str] | None = None, context: dict | None = None) Iterator[Tuple[str, ...]]#

Identify groups of files and directories that should be parsed together

Will create groups using only the files and directories included as input.

The input list of files should contain all files that could be read by this extractor, which may include many false positives.

Parameters:
  • files – List of files to consider grouping

  • directories – Any directories to consider grouping as well

  • context – Context about the files

Yields:

Groups of files

read_file(file: str, file_number: int = 0, start_cycle: int = 0, start_time: float = 0) DataFrame#

Generate a DataFrame containing the data in this file

The dataframe will be in our standard format

Parameters:
  • file – Path to the file

  • file_number – Number of file, in case the test is spread across multiple files

  • start_cycle – Index to use for the first cycle, in case test is spread across multiple files

  • start_time – Test time to use for the start of the test, in case test is spread across multiple files

Returns:

Dataframe containing the battery data in a standard format

Battery Archive (b.io.ba)#

Tools for streamlining upload to Battery Archive

class battdat.io.ba.BatteryArchiveWriter(chunk_size: int = 100000)#

Bases: DatasetWriter

Export data into CSV files that follow the format definitions used in BatteryArchive

The exporter writes files for each table in the Battery Archive SQL schema with column names matched to their definitions.

chunk_size: int = 100000#

Maximum number of rows to write to disk in a single CSV file
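The role of chunk_size can be illustrated with a short pandas sketch. The helper name below is hypothetical and not how BatteryArchiveWriter is actually implemented; it only shows the splitting logic implied by the attribute.

```python
import os
import tempfile

import pandas as pd


def write_in_chunks(df: pd.DataFrame, out_dir: str, base_name: str,
                    chunk_size: int = 100_000) -> list:
    """Write `df` as one or more CSV files of at most `chunk_size` rows each."""
    paths = []
    for i, start in enumerate(range(0, len(df), chunk_size)):
        path = os.path.join(out_dir, f'{base_name}_{i}.csv')
        df.iloc[start:start + chunk_size].to_csv(path, index=False)
        paths.append(path)
    return paths


df = pd.DataFrame({'voltage': range(250)})
with tempfile.TemporaryDirectory() as tmp:
    written = write_in_chunks(df, tmp, 'timeseries', chunk_size=100)
    n_files = len(written)
# 250 rows at 100 rows per chunk produce 3 files
```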

export(dataset: CellDataset, path: Path)#

Write the dataset to disk in a specific path

All files from the dataset must be placed in the provided directory

Parameters:
  • dataset – Dataset to be exported

  • path – Output path

write_cycle_stats(cell_id: str, data: DataFrame, path: Path)#

Write the cycle stats to disk

Parameters:
  • cell_id – Name of the cell

  • data – Cycle stats dataframe

  • path – Path to the output directory

write_metadata(cell_id: str, metadata: BatteryMetadata, path: Path)#

Write the metadata into a JSON file

Parameters:
  • cell_id – ID for the cell

  • metadata – Metadata to be written

  • path – Path in which to write the data

write_timeseries(cell_id: str, data: DataFrame, path: Path)#

Write the time series dataset

Parameters:
  • cell_id – Name for the cell, used as a foreign key to map between tables

  • data – Time series data to write to disk

  • path – Root path for writing cycling data

Battery Data Hub (b.io.batterydata)#

Parse from the CSV formats of batterydata.energy.gov

class battdat.io.batterydata.BDReader(store_all: bool = False)#

Bases: DatasetFileReader

Read data from the batterydata.energy.gov CSV format

Every cell in batterydata.energy.gov is stored as two separate CSV files: “<cell_name>-summary.csv” for the cycle-level summaries and “<cell_name>-raw.csv” for the time series measurements. Metadata is held in an Excel file, “metadata.xlsx,” in the same directory.

group(files: str | List[str], directories: List[str] | None = None, context: dict | None = None) Iterator[Tuple[str, ...]]#

Identify groups of files and directories that should be parsed together

Will create groups using only the files and directories included as input.

The input list of files should contain all files that could be read by this extractor, which may include many false positives.

Parameters:
  • files – List of files to consider grouping

  • directories – Any directories to consider grouping as well

  • context – Context about the files

Yields:

Groups of files
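The grouping this reader performs follows the “-summary.csv” / “-raw.csv” naming convention described above. A rough, standard-library sketch of that pairing (the helper name is ours, not BDReader's internals):

```python
from collections import defaultdict
from pathlib import Path


def pair_cell_files(files):
    """Pair '<cell_name>-summary.csv' and '<cell_name>-raw.csv' by cell name."""
    cells = defaultdict(dict)
    for f in map(Path, files):
        for suffix in ('-summary', '-raw'):
            if f.stem.endswith(suffix):
                # Strip the suffix to recover the cell name used as the key
                cells[f.stem[:-len(suffix)]][suffix.lstrip('-')] = f
    return dict(cells)


groups = pair_cell_files([
    'cellA-summary.csv', 'cellA-raw.csv',
    'cellB-summary.csv', 'cellB-raw.csv',
    'metadata.xlsx',  # not a per-cell file; ignored by the pairing
])
```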

read_dataset(group: List[str], metadata: BatteryMetadata | dict | None = None) CellDataset#

Parse a set of files into a CellDataset

Parameters:
  • group – List of files to parse as part of the same test

  • metadata – Metadata for the battery

Returns:

Dataset holding all available information about the battery

store_all: bool = False#

Store all data from the original files, even columns we have not defined

battdat.io.batterydata.convert_eis_data(input_df: DataFrame) DataFrame#

Rename the columns from an NREL-standard set of EIS data to our names and conventions

Parameters:

input_df – NREL-format raw data

Returns:

EIS data in battdat format

battdat.io.batterydata.convert_raw_signal(input_df: DataFrame, store_all: bool) DataFrame#

Convert a raw signal dataframe to one using battdat names and conventions

Parameters:
  • input_df – Initial NREL-format dataframe

  • store_all – Whether to store columns even if we have not defined their names

Returns:

DataFrame in the battdat format

battdat.io.batterydata.convert_summary(input_df: DataFrame, store_all: bool) DataFrame#

Convert the summary dataframe to a format using battdat names and conventions

Parameters:
  • input_df – Initial NREL-format dataframe

  • store_all – Whether to store columns even if we have not defined their names

Returns:

DataFrame in the battdat format

battdat.io.batterydata.generate_metadata(desc: dict, associated_ids: Iterable[str] = ()) BatteryMetadata#

Assemble the battery metadata for a dataset

The metadata are the same for every cell in a single dataset and are available by querying the https://batterydata.energy.gov/api/3/action/package_show?id={dataset_id} endpoint of the Battery Data Hub.

Parameters:
  • desc – Data from the CKAN metadata response

  • associated_ids – List of other resources associated with this dataset, such as the DOIs of papers.

Returns:

Metadata for the cell provenance and construction
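The endpoint above can be queried with any HTTP client. A minimal helper that builds the request URL (the function name is ours; fetching and JSON parsing are left out to keep the sketch offline):

```python
def package_show_url(dataset_id: str) -> str:
    """Build the CKAN package_show URL for a Battery Data Hub dataset."""
    return ('https://batterydata.energy.gov/api/3/action/package_show'
            f'?id={dataset_id}')


url = package_show_url('my-dataset')
# The JSON response's 'result' field is the CKAN description passed
# to generate_metadata() as `desc` (an assumption based on the text above).
```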

HDF5 (b.io.hdf)#

Read and write from battery-data-toolkit’s HDF format

class battdat.io.hdf.HDF5Reader#

Bases: DatasetReader

Read datasets from a battery-data-toolkit format HDF5 file

The HDF5 format permits multiple datasets in a single HDF5 file so long as they all share the same metadata.

Access these datasets through the read_from_hdf() method.

read_dataset(path: str | Path, metadata: BatteryMetadata | dict | None = None) BatteryDataset#

Read the default dataset and all subsets from an HDF5 file

Use read_from_hdf() for more control over reads.

Parameters:
  • path – Path to the HDF file

  • metadata – Metadata to use in place of any found in the file

Returns:

Dataset read from the file

read_from_hdf(file: File, prefix: int | str | None, subsets: Collection[str] | None = None)#

class battdat.io.hdf.HDF5Writer(complevel: int = 0, complib: str = 'zlib')#

Bases: DatasetWriter

Interface to write HDF5 files in battery-data-toolkit’s layout

The export() method writes a dataset file with the default settings, assuming a single dataset per HDF5 file.

Use write_to_hdf() to store multiple datasets in the same file with a different “prefix” for each.

complevel: int = 0#

Compression level for data. A value of 0 disables compression.

complib: str = 'zlib'#

Specifies the compression library to be used.

export(dataset: BatteryDataset, path: str | Path)#

Write the dataset to disk in a specific path

All files from the dataset must be placed in the provided directory

Parameters:
  • dataset – Dataset to be exported

  • path – Output path

write_to_hdf(dataset: BatteryDataset, file: File, prefix: str | None)#

Add a dataset to an already-open HDF5 file

Parameters:
  • dataset – Dataset to be added

  • file – PyTables file object in which to save the data

  • prefix – Prefix used when storing the data. Use prefixes to store multiple cells in the same HDF5 file

battdat.io.hdf.as_hdf5_object(path_or_file: str | Path | File, **kwargs) File#

Open a path as a PyTables file object if not done already.

Keyword arguments are used when creating a store from a new file

Parameters:

path_or_file – Either the path to a file or an already open File (in which case this function does nothing)

Yields:

A file object that will close on exit from the with context, if a path was provided.

battdat.io.hdf.inspect_hdf(file: File) Tuple[BatteryMetadata, Set[str | None]]#

Gather the metadata describing all datasets and the names of datasets within an HDF5 file

Parameters:

file – HDF5 file to read from

Returns:

  • Metadata from this file

  • List of names of datasets stored within the file (prefixes)

battdat.io.hdf.make_numpy_dtype_from_pandas(df: DataFrame) dtype#

Generate a Numpy dtype from a Pandas dataframe

Parameters:

df – Dataframe to be converted

Returns:

  • Structured dtype of the data

battdat.io.hdf.read_df_from_table(table: Table) DataFrame#

Read a dataframe from a table

Parameters:

table – Table to read from

Returns:

Dataframe containing the contents

battdat.io.hdf.write_df_to_table(file: File, group: Group, name: str, df: DataFrame, filters: Filters | None = None, expected_rows: int | None = None) Table#

Write a dataframe to an HDF5 file

Parameters:
  • file – File to be written to

  • group – Group which holds the associated datasets

  • name – Name of the dataset

  • df – DataFrame to write

  • filters – Filters to apply to data entering table

  • expected_rows

Returns:

Table object holding the dataset

MACCOR (b.io.maccor)#

Extractor for MACCOR

class battdat.io.maccor.MACCORReader#

Bases: CycleTestReader, DatasetFileReader

Parser for reading from MACCOR-format files

Expects the files to be ASCII files with a .### extension. The group() operation will consolidate files such that all with the same prefix (i.e., everything except the numerals in the extension) are treated as part of the same experiment.

group(files: str | List[str], directories: List[str] | None = None, context: dict | None = None) Iterator[Tuple[str, ...]]#

Identify groups of files and directories that should be parsed together

Will create groups using only the files and directories included as input.

The input list of files should contain all files that could be read by this extractor, which may include many false positives.

Parameters:
  • files – List of files to consider grouping

  • directories – Any directories to consider grouping as well

  • context – Context about the files

Yields:

Groups of files
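The prefix-based consolidation described above (everything except the numerals in the extension) can be sketched with the standard library. The function name is ours, not MACCORReader's internals:

```python
from itertools import groupby
from pathlib import PurePath


def group_by_prefix(files):
    """Group files whose names differ only in a numeric extension (.001, .002, ...)."""
    def prefix(f):
        p = PurePath(f)
        # Files with a purely numeric extension share the stem as their prefix
        return str(p.with_suffix('')) if p.suffix[1:].isdigit() else str(p)

    for _, members in groupby(sorted(files, key=prefix), key=prefix):
        yield tuple(sorted(members))


groups = list(group_by_prefix(['run1.001', 'run1.002', 'run2.001']))
# 'run1.001' and 'run1.002' share the prefix 'run1' and are grouped together
```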

read_file(file: str, file_number: int = 0, start_cycle: int = 0, start_time: int = 0) DataFrame#

Generate a DataFrame containing the data in this file

The dataframe will be in our standard format

Parameters:
  • file – Path to the file

  • file_number – Number of file, in case the test is spread across multiple files

  • start_cycle – Index to use for the first cycle, in case test is spread across multiple files

  • start_time – Test time to use for the start of the test, in case test is spread across multiple files

Returns:

Dataframe containing the battery data in a standard format

Parquet (b.io.parquet)#

Read and write from battery-data-toolkit’s parquet format

class battdat.io.parquet.ParquetReader#

Bases: DatasetFileReader

Read parquet files formatted according to battery-data-toolkit standards

Mirrors ParquetWriter. Expects each constituent table to be in a separate parquet file and to have the metadata stored in the file-level metadata of the parquet file.

read_dataset(paths: str | Path | Collection[str | Path], metadata: BatteryMetadata | dict | None = None) BatteryDataset#

Read a set of parquet files into a BatteryDataset

Parameters:
  • paths – Either the path to a single directory of files, or a list of files to parse

  • metadata – Metadata which will overwrite what is available in the files

Returns:

Dataset including all subsets

class battdat.io.parquet.ParquetWriter(overwrite: bool = False, write_options: ~typing.Dict[str, ~typing.Any] = <factory>)#

Bases: DatasetWriter

Write to parquet files in the format specification of battery-data-toolkit

Writes all data to the same directory with a separate parquet file for each table. The battery metadata, column schemas, and write date are all saved in the file-level metadata for each file.

export(dataset: BatteryDataset, path: Path)#

Write the dataset to disk in a specific path

All files from the dataset must be placed in the provided directory

Parameters:
  • dataset – Dataset to be exported

  • path – Output path

overwrite: bool = False#

Whether to overwrite existing data

write_options: Dict[str, Any]#

Options passed to write_table().

battdat.io.parquet.inspect_parquet_files(path: str | Path) BatteryMetadata#

Read the metadata from a collection of Parquet files

Parameters:

path – Path to a directory of parquet files

Returns:

Metadata from one of the files