Extractors (battdat.io)#
Tools for reading external formats into BatteryDataset objects and exporting data to disk.
Base Classes (b.io.base)#
Base classes for battery data import and export tools
- class battdat.io.base.CycleTestReader#
Bases: DatasetFileReader
Template class for reading the files output by battery cell cyclers
Adds logic for reading cycling time series from a list of files.
- output_class#
alias of CellDataset
- read_dataset(group: Sequence[Path | str] = (), metadata: BatteryMetadata | None = None) CellDataset #
Parse a set of files into a CellDataset
- Parameters:
group – List of files to parse as part of the same test. Ordered sequentially
metadata – Metadata for the battery, should adhere to the BatteryMetadata schema
- Returns:
Dataset containing the information from all files
- read_file(file: str, file_number: int = 0, start_cycle: int = 0, start_time: int = 0) DataFrame #
Generate a DataFrame containing the data in this file
The dataframe will be in our standard format
- Parameters:
file – Path to the file
file_number – Number of the file, in case the test is spread across multiple files
start_cycle – Index to use for the first cycle, in case test is spread across multiple files
start_time – Test time to use for the start of the test, in case test is spread across multiple files
- Returns:
Dataframe containing the battery data in a standard format
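Example: a minimal sketch of how the file_number, start_cycle, and start_time arguments chain reads across a multi-file test. The file names are hypothetical, and the 'cycle_number'/'test_time' column names assume battdat's standard raw-data schema; in practice, read_dataset() performs this bookkeeping for you.

```python
import pandas as pd
from battdat.io.arbin import ArbinReader  # any concrete CycleTestReader works

reader = ArbinReader()
frames = []
start_cycle, start_time = 0, 0.
for i, path in enumerate(['test-part1.csv', 'test-part2.csv']):  # hypothetical files
    df = reader.read_file(path, file_number=i, start_cycle=start_cycle, start_time=start_time)
    # Continue cycle indices and test time where the previous file ended
    start_cycle = int(df['cycle_number'].max()) + 1
    start_time = float(df['test_time'].max())
    frames.append(df)
data = pd.concat(frames, ignore_index=True)
```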
- class battdat.io.base.DatasetFileReader#
Bases:
DatasetReader
Tool which reads datasets written to files
Provide an identify_files() function to filter out files likely to be in this format, or a group() function to find related files if data are often split into multiple files.
- group(files: str | Path | List[str | Path], directories: List[str | Path] | None = None, context: dict | None = None) Iterator[tuple[str | Path, ...]] #
Identify groups of files and directories that should be parsed together
Will create groups using only the files and directories included as input.
The list of candidate files contains everything that could be read by this extractor, which may include many false positives.
- Parameters:
files – List of files to consider grouping
directories – Any directories to consider grouping as well
context – Context about the files
- Yields:
Groups of files
- identify_files(path: str | Path, context: dict | None = None) Iterator[tuple[str | Path]] #
Identify all groups of files likely to be compatible with this reader
Uses the group() function to determine groups of files that should be parsed together.
- Parameters:
path – Root of the directory tree to search for compatible files
context – Context about the files
- Yields:
Groups of eligible files
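As a usage sketch, a file-based reader can scan a directory tree and parse each group it finds. The directory name is hypothetical; any DatasetFileReader subclass (e.g., ArbinReader below) follows the same pattern.

```python
from battdat.io.arbin import ArbinReader

reader = ArbinReader()
# Scan a directory tree and parse each group of compatible files
for group in reader.identify_files('./raw-data'):
    dataset = reader.read_dataset(group)
```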
- class battdat.io.base.DatasetReader#
Bases: object
Base class for tools which read battery data as a BatteryDataset
All readers must implement a function which receives battery metadata as input and produces a completed battdat.data.BatteryDataset as an output. Subclasses provide additional suggested operations useful when working with data from common sources (e.g., file systems, web APIs).
- output_class#
Type of dataset to output
alias of BatteryDataset
- read_dataset(metadata: BatteryMetadata | dict | None = None, **kwargs) BatteryDataset #
Parse a set of inputs into a BatteryDataset
- Parameters:
metadata – Metadata for the battery
- Returns:
Dataset holding all available information about the battery
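A minimal sketch of the subclass contract. The CellDataset(raw_data=..., metadata=...) constructor call and the column names are assumptions about battdat.data, not part of this interface.

```python
import pandas as pd
from battdat.data import CellDataset
from battdat.io.base import DatasetReader


class ConstantCurrentReader(DatasetReader):
    """Hypothetical reader that fabricates a constant-current time series."""

    output_class = CellDataset

    def read_dataset(self, metadata=None, **kwargs):
        # Build a raw-data table using battdat's standard column names (assumed)
        raw = pd.DataFrame({
            'test_time': [0.0, 1.0, 2.0],
            'current': [1.0, 1.0, 1.0],
            'voltage': [3.5, 3.6, 3.7],
        })
        return self.output_class(raw_data=raw, metadata=metadata)
```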
- class battdat.io.base.DatasetWriter#
Bases: object
Tool which exports data from a BatteryDataset to disk in a specific format
- export(dataset: BatteryDataset, path: str | Path)#
Write the dataset to disk at a specific path
All files from the dataset must be placed in the provided directory
- Parameters:
dataset – Dataset to be exported
path – Output path
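For example, any reader can be paired with any writer; here a hypothetical Arbin CSV is converted to the toolkit's Parquet layout (file and directory names are placeholders).

```python
from battdat.io.arbin import ArbinReader
from battdat.io.parquet import ParquetWriter

dataset = ArbinReader().read_dataset(['cycling-test.csv'])  # hypothetical file
ParquetWriter().export(dataset, './exported-cell')  # one parquet file per table
```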
Arbin (b.io.arbin)#
Extractor for Arbin-format files
- class battdat.io.arbin.ArbinReader#
Bases: CycleTestReader
Parser for reading from Arbin-format files
Expects the files to be in CSV format
- group(files: str | List[str], directories: List[str] | None = None, context: dict | None = None) Iterator[Tuple[str, ...]] #
Identify groups of files and directories that should be parsed together
Will create groups using only the files and directories included as input.
The list of candidate files contains everything that could be read by this extractor, which may include many false positives.
- Parameters:
files – List of files to consider grouping
directories – Any directories to consider grouping as well
context – Context about the files
- Yields:
Groups of files
- read_file(file: str, file_number: int = 0, start_cycle: int = 0, start_time: float = 0) DataFrame #
Generate a DataFrame containing the data in this file
The dataframe will be in our standard format
- Parameters:
file – Path to the file
file_number – Number of the file, in case the test is spread across multiple files
start_cycle – Index to use for the first cycle, in case test is spread across multiple files
start_time – Test time to use for the start of the test, in case test is spread across multiple files
- Returns:
Dataframe containing the battery data in a standard format
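A sketch of grouping Arbin exports before parsing them; the CSV file names are hypothetical.

```python
from battdat.io.arbin import ArbinReader

reader = ArbinReader()
files = ['channel-1.csv', 'channel-2.csv']  # hypothetical Arbin CSV exports
for group in reader.group(files):
    dataset = reader.read_dataset(group)
```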
Battery Archive (b.io.ba)#
Tools for streamlining upload to Battery Archive
- class battdat.io.ba.BatteryArchiveWriter(chunk_size: int = 100000)#
Bases: DatasetWriter
Export data into CSV files that follow the format definitions used in Battery Archive
The exporter writes files for each table in the Battery Archive SQL schema with column names matched to their definitions.
- export(dataset: CellDataset, path: Path)#
Write the dataset to disk at a specific path
All files from the dataset must be placed in the provided directory
- Parameters:
dataset – Dataset to be exported
path – Output path
- write_cycle_stats(cell_id: str, data: DataFrame, path: Path)#
Write the cycle stats to disk
- Parameters:
cell_id – Name of the cell
data – Cycle stats dataframe
path – Path to the output directory
- write_metadata(cell_id: str, metadata: BatteryMetadata, path: Path)#
Write the metadata into a JSON file
- Parameters:
cell_id – ID for the cell
metadata – Metadata to be written
path – Path in which to write the data
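A sketch of converting a battery-data-toolkit HDF5 file into Battery Archive upload files; the paths are hypothetical, and chunk_size is shown at a non-default value.

```python
from pathlib import Path

from battdat.io.ba import BatteryArchiveWriter
from battdat.io.hdf import HDF5Reader

dataset = HDF5Reader().read_dataset('cell.h5')  # hypothetical toolkit HDF5 file
BatteryArchiveWriter(chunk_size=50_000).export(dataset, Path('./ba-upload'))
```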
Battery Data Hub (b.io.batterydata)#
Parse from the CSV formats of batterydata.energy.gov
- class battdat.io.batterydata.BDReader(store_all: bool = False)#
Bases: DatasetFileReader
Read data from the batterydata.energy.gov CSV format
Each battery in batterydata.energy.gov is stored as two separate CSV files: “<cell_name>-summary.csv” for the cycle-level summaries and “<cell_name>-raw.csv” for the time-series measurements. Metadata is held in an Excel file, “metadata.xlsx,” in the same directory.
- group(files: str | List[str], directories: List[str] | None = None, context: dict | None = None) Iterator[Tuple[str, ...]] #
Identify groups of files and directories that should be parsed together
Will create groups using only the files and directories included as input.
The list of candidate files contains everything that could be read by this extractor, which may include many false positives.
- Parameters:
files – List of files to consider grouping
directories – Any directories to consider grouping as well
context – Context about the files
- Yields:
Groups of files
- read_dataset(group: List[str], metadata: BatteryMetadata | dict | None = None) CellDataset #
Parse a set of files into a CellDataset
- Parameters:
group – List of files to parse as part of the same test
metadata – Metadata for the battery
- Returns:
Dataset holding all available information about the battery
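A sketch of reading a Battery Data Hub download; the directory name is hypothetical, and store_all=True retains columns for which battdat has no defined names.

```python
from battdat.io.batterydata import BDReader

reader = BDReader(store_all=True)
# Each group pairs a cell's "-summary.csv" and "-raw.csv" files
for group in reader.identify_files('./nrel-download'):
    dataset = reader.read_dataset(list(group))
```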
- battdat.io.batterydata.convert_eis_data(input_df: DataFrame) DataFrame #
Rename the columns from an NREL-standard set of EIS data to our names and conventions
- Parameters:
input_df – NREL-format raw data
- Returns:
EIS data in battdat format
- battdat.io.batterydata.convert_raw_signal(input_df: DataFrame, store_all: bool) DataFrame #
Convert a raw signal dataframe to one using battdat names and conventions
- Parameters:
input_df – Initial NREL-format dataframe
store_all – Whether to store columns even if we have not defined their names
- Returns:
DataFrame in the battdat format
- battdat.io.batterydata.convert_summary(input_df: DataFrame, store_all: bool) DataFrame #
Convert the summary dataframe to a format using battdat names and conventions
- Parameters:
input_df – Initial NREL-format dataframe
store_all – Whether to store columns even if we have not defined their names
- Returns:
DataFrame in the battdat format
- battdat.io.batterydata.generate_metadata(desc: dict, associated_ids: Iterable[str] = ()) BatteryMetadata #
Assemble the battery metadata for a dataset
The metadata for every cell in a single dataset are the same and are available by querying the https://batterydata.energy.gov/api/3/action/package_show?id={dataset_id} endpoint of Battery Data Hub.
- Parameters:
desc – Data from the CKAN metadata response
associated_ids – List of other resources associated with this dataset, such as the DOIs of papers.
- Returns:
Metadata for the cell provenance and construction
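A sketch of assembling metadata from the endpoint quoted above. The dataset id and DOI are hypothetical placeholders; pulling the record from the 'result' key assumes the standard CKAN response envelope.

```python
import requests

from battdat.io.batterydata import generate_metadata

# Hypothetical dataset id for the package_show endpoint quoted above
url = 'https://batterydata.energy.gov/api/3/action/package_show?id=my-dataset'
desc = requests.get(url).json()['result']  # CKAN wraps the record in 'result'
metadata = generate_metadata(desc, associated_ids=['10.0000/hypothetical-doi'])
```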
HDF5 (b.io.hdf)#
Read and write from battery-data-toolkit’s HDF format
- class battdat.io.hdf.HDF5Reader#
Bases: DatasetReader
Read datasets from a battery-data-toolkit format HDF5 file
The HDF5 format permits multiple datasets in a single HDF5 file so long as they all share the same metadata.
Access these datasets through the read_from_hdf() function.
- read_dataset(path: str | Path, metadata: BatteryMetadata | dict | None = None) BatteryDataset #
Read the default dataset and all subsets from an HDF5 file
Use read_from_hdf() for more control over reads.
- Parameters:
path – Path to the HDF file
metadata – Metadata to use in place of any found in the file
- Returns:
Dataset read from the file
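Reading the default dataset is a single call; the path is hypothetical.

```python
from battdat.io.hdf import HDF5Reader

dataset = HDF5Reader().read_dataset('cell.h5')  # hypothetical path
```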
- class battdat.io.hdf.HDF5Writer(complevel: int = 0, complib: str = 'zlib')#
Bases: DatasetWriter
Interface to write HDF5 files in battery-data-toolkit’s layout
The export() method writes a dataset file with the default settings, assuming a single dataset per HDF5 file. Use write_to_hdf() to store multiple datasets in the file with a different “prefix” for each.
- export(dataset: BatteryDataset, path: str | Path)#
Write the dataset to disk at a specific path
All files from the dataset must be placed in the provided directory
- Parameters:
dataset – Dataset to be exported
path – Output path
- write_to_hdf(dataset: BatteryDataset, file: File, prefix: str | None)#
Add a dataset to an already-open HDF5 file
- Parameters:
dataset – Dataset to be added
file – PyTables file object in which to save the data
prefix – Prefix used when storing the data. Use prefixes to store multiple cells in the same HDF5 file
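A sketch of storing two cells in one file under different prefixes. Here cell_a and cell_b stand for BatteryDataset objects built elsewhere, and the mode='a' keyword is assumed to be among the arguments as_hdf5_object() forwards when opening a new file.

```python
from battdat.io.hdf import HDF5Writer, as_hdf5_object

writer = HDF5Writer(complevel=5)
# cell_a and cell_b are BatteryDataset objects built elsewhere (hypothetical)
with as_hdf5_object('batch.h5', mode='a') as file:
    writer.write_to_hdf(cell_a, file, prefix='cell_a')
    writer.write_to_hdf(cell_b, file, prefix='cell_b')
```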
- battdat.io.hdf.as_hdf5_object(path_or_file: str | Path | File, **kwargs) File #
Open a path as a PyTables file object if not done already.
Keyword arguments are used when creating a store from a new file
- Parameters:
path_or_file – Either the path to a file or an already open File (in which case this function does nothing)
- Yields:
A file that will close on exit from the with context, if a path was provided
- battdat.io.hdf.inspect_hdf(file: File) Tuple[BatteryMetadata, Set[str | None]] #
Gather the metadata describing all datasets and the names of datasets within an HDF5 file
- Parameters:
file – HDF5 file to read from
- Returns:
Metadata from this file
List of names of datasets stored within the file (prefixes)
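A sketch of listing the contents of a multi-dataset file; the path is hypothetical, and a None entry in the returned prefixes denotes the default dataset.

```python
import tables

from battdat.io.hdf import inspect_hdf

with tables.open_file('batch.h5') as file:  # hypothetical path
    metadata, prefixes = inspect_hdf(file)
print(sorted(p if p is not None else '<default>' for p in prefixes))
```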
- battdat.io.hdf.make_numpy_dtype_from_pandas(df: DataFrame) dtype #
Generate a Numpy dtype from a Pandas dataframe
- Parameters:
df – Dataframe to be converted
- Returns:
Structured dtype of the data
- battdat.io.hdf.read_df_from_table(table: Table) DataFrame #
Read a dataframe from a table
- Parameters:
table – Table to read from
- Returns:
Dataframe containing the contents
- battdat.io.hdf.write_df_to_table(file: File, group: Group, name: str, df: DataFrame, filters: Filters | None = None, expected_rows: int | None = None) Table #
Write a dataframe to an HDF5 file
- Parameters:
file – File to be written to
group – Group which holds the associated datasets
name – Name of the dataset
df – DataFrame to write
filters – Filters to apply to data entering table
expected_rows – Expected number of rows in the table, used by PyTables to optimize storage
- Returns:
Table object holding the dataset
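A sketch of round-tripping a dataframe through the low-level table helpers; the file name is hypothetical, and file.root serves as the destination group.

```python
import pandas as pd
import tables

from battdat.io.hdf import read_df_from_table, write_df_to_table

df = pd.DataFrame({'test_time': [0.0, 1.0], 'voltage': [3.5, 3.6]})
with tables.open_file('tables.h5', mode='w') as file:  # hypothetical path
    table = write_df_to_table(file, file.root, 'raw_data', df)
    roundtrip = read_df_from_table(table)
```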
MACCOR (b.io.maccor)#
Extractor for MACCOR
- class battdat.io.maccor.MACCORReader#
Bases: CycleTestReader, DatasetFileReader
Parser for reading from MACCOR-format files
Expects the files to be ASCII files with a .### extension. The group() operation will consolidate files such that all with the same prefix (i.e., everything except the numerals in the extension) are treated as part of the same experiment.
- group(files: str | List[str], directories: List[str] | None = None, context: dict | None = None) Iterator[Tuple[str, ...]] #
Identify groups of files and directories that should be parsed together
Will create groups using only the files and directories included as input.
The list of candidate files contains everything that could be read by this extractor, which may include many false positives.
- Parameters:
files – List of files to consider grouping
directories – Any directories to consider grouping as well
context – Context about the files
- Yields:
Groups of files
- read_file(file: str, file_number: int = 0, start_cycle: int = 0, start_time: int = 0) DataFrame #
Generate a DataFrame containing the data in this file
The dataframe will be in our standard format
- Parameters:
file – Path to the file
file_number – Number of the file, in case the test is spread across multiple files
start_cycle – Index to use for the first cycle, in case test is spread across multiple files
start_time – Test time to use for the start of the test, in case test is spread across multiple files
- Returns:
Dataframe containing the battery data in a standard format
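A sketch of the prefix-based grouping; the file names are hypothetical. 'cell42.001' and 'cell42.002' share a prefix, so they are yielded as one group.

```python
from battdat.io.maccor import MACCORReader

reader = MACCORReader()
for group in reader.group(['cell42.001', 'cell42.002']):  # hypothetical files
    dataset = reader.read_dataset(group)
```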
Parquet (b.io.parquet)#
Read and write from battery-data-toolkit’s parquet format
- class battdat.io.parquet.ParquetReader#
Bases: DatasetFileReader
Read parquet files formatted according to battery-data-toolkit standards
Mirrors ParquetWriter. Expects each constituent table to be in a separate parquet file and to have the metadata stored in the file-level metadata of the parquet file.
- read_dataset(paths: str | Path | Collection[str | Path], metadata: BatteryMetadata | dict | None = None) BatteryDataset #
Read a set of parquet files into a BatteryDataset
- Parameters:
paths – Either the path to a single directory of files, or a list of files to parse
metadata – Metadata which will overwrite what is available in the files
- Returns:
Dataset including all subsets
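Reading back a directory written by ParquetWriter is one call; the directory name is hypothetical.

```python
from battdat.io.parquet import ParquetReader

dataset = ParquetReader().read_dataset('./exported-cell')  # hypothetical directory
```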
- class battdat.io.parquet.ParquetWriter(overwrite: bool = False, write_options: ~typing.Dict[str, ~typing.Any] = <factory>)#
Bases: DatasetWriter
Write to parquet files in the format specification of battery-data-toolkit
Writes all data to the same directory with a separate parquet file for each table. The battery metadata, column schemas, and write date are all saved in the file-level metadata for each file.
- export(dataset: BatteryDataset, path: Path)#
Write the dataset to disk at a specific path
All files from the dataset must be placed in the provided directory
- Parameters:
dataset – Dataset to be exported
path – Output path
- write_options: Dict[str, Any]#
Options passed to write_table().
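A sketch of tuning the Parquet output. The compression option is assumed to be a valid keyword for the underlying write_table() call, and dataset stands for a BatteryDataset built elsewhere.

```python
from battdat.io.parquet import ParquetWriter

writer = ParquetWriter(overwrite=True, write_options={'compression': 'zstd'})
writer.export(dataset, './exported-cell')  # `dataset` built elsewhere (hypothetical)
```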
- battdat.io.parquet.inspect_parquet_files(path: str | Path) BatteryMetadata #
Read the metadata from a collection of Parquet files
- Parameters:
path – Path to a directory of parquet files
- Returns:
Metadata from one of the files
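Checking provenance without loading the tables; the directory name is hypothetical.

```python
from battdat.io.parquet import inspect_parquet_files

metadata = inspect_parquet_files('./exported-cell')
```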