Extractors (`batdata.extractors`)

Data extraction tools

Base Class (`b.extractors.base`)

Base class for a battery data extractor

class batdata.extractors.base.BatteryDataExtractor(eps: float = 1e-10)

Bases: object

Base class for data extractors

Implementing an Extractor

At a minimum, define the generate_dataframe() method, which produces a DataFrame containing the time-series data with standardized column names.

If the data format contains additional metadata or cycle-level features, override parse_to_dataframe() so that it adds those data after parsing the time-series results.

Override identify_files() or group() to find related files if the data are often split across multiple files.

generate_dataframe(file: str, file_number: int = 0, start_cycle: int = 0, start_time: int = 0) → DataFrame

Generate a DataFrame containing the data in this file

The dataframe will be in our standard format

Parameters:

file – Path to the file
file_number – Number of file, in case the test is spread across multiple files
start_cycle – Index to use for the first cycle, in case test is spread across multiple files
start_time – Test time to use for the start of the test, in case test is spread across multiple files

Returns:

Dataframe containing the battery data in a standard format
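The file_number, start_cycle, and start_time parameters exist so that a test split across several files can be stitched into one continuous record. The sketch below illustrates that bookkeeping with plain lists of row dictionaries instead of DataFrames. The helper names (offset_rows, stitch_files), the row keys, and the rule for choosing the next file's offsets are assumptions for illustration, not batdata's implementation.

```python
from typing import Dict, List

Row = Dict[str, float]


def offset_rows(rows: List[Row], file_number: int, start_cycle: int,
                start_time: float) -> List[Row]:
    """Shift one file's rows so they continue where the previous file ended."""
    shifted = []
    for row in rows:
        new_row = dict(row)
        new_row["cycle_number"] = row["cycle_number"] + start_cycle
        new_row["test_time"] = row["test_time"] + start_time
        new_row["file_number"] = file_number
        shifted.append(new_row)
    return shifted


def stitch_files(files: List[List[Row]]) -> List[Row]:
    """Concatenate per-file rows, carrying cycle and time offsets forward."""
    combined: List[Row] = []
    start_cycle, start_time = 0, 0.0
    for file_number, rows in enumerate(files):
        shifted = offset_rows(rows, file_number, start_cycle, start_time)
        combined.extend(shifted)
        if shifted:
            # Hypothetical rule: the next file starts one cycle after the last
            # cycle seen here, and at the last recorded test time.
            start_cycle = int(shifted[-1]["cycle_number"]) + 1
            start_time = shifted[-1]["test_time"]
    return combined
```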

group(files: str | List[str], directories: List[str] | None = None, context: dict | None = None) → Iterator[tuple[str, ...]]

Identify groups of files and directories that should be parsed together

Creates groups using only the files and directories provided as input.

The list of files includes _all_ files that could be read by this extractor, which may include many false positives.

Parameters:

files – List of files to consider grouping
directories – Any directories to consider grouping as well
context – Context about the files

Yields:

Groups of files

identify_files(path: str, context: dict | None = None) → Iterator[tuple[str]]

Identify all groups of files likely to be compatible with this extractor

Uses the group() function to determine groups of files that should be parsed together.

Parameters:

path – Root directory containing the files to group
context – Context about the files

Yields:

Groups of eligible files
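identify_files() delegates the grouping decision to group(). One way that hand-off could look is sketched below using os.walk. The free functions identify_files and csv_singletons here are hypothetical stand-ins (the real methods live on the extractor class), and grouping each directory level independently is an assumption of this sketch.

```python
import os
from typing import Callable, Iterator, List, Optional, Tuple

GroupFn = Callable[[List[str], List[str], Optional[dict]], Iterator[Tuple[str, ...]]]


def identify_files(path: str, group: GroupFn,
                   context: Optional[dict] = None) -> Iterator[Tuple[str, ...]]:
    """Walk a directory tree and let `group` decide which paths belong together."""
    for root, dirs, files in os.walk(path):
        file_paths = [os.path.join(root, f) for f in sorted(files)]
        dir_paths = [os.path.join(root, d) for d in sorted(dirs)]
        # Assumption: each directory level is grouped independently
        yield from group(file_paths, dir_paths, context)


def csv_singletons(files, directories=None, context=None):
    """Toy grouping rule: every .csv file forms its own group."""
    for f in files:
        if f.endswith(".csv"):
            yield (f,)
```

A real group() would typically be more selective, for example pairing files that share a name prefix, but the walk-then-group structure stays the same.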

parse_to_dataframe(group: List[str], metadata: BatteryMetadata | dict | None = None) → BatteryDataset

Parse a set of files into a BatteryDataset

Parameters:

group – List of files to parse as part of the same test, ordered sequentially
metadata – Metadata for the battery, which should adhere to the BatteryMetadata schema

Returns:

BatteryDataset containing the information from all files

Arbin (`b.extractors.arbin`)

Extractor for Arbin-format files

class batdata.extractors.arbin.ArbinExtractor(eps: float = 1e-10)

Bases: BatteryDataExtractor

Parser for reading from Arbin-format files

Expects the files to be in CSV format
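Reading an Arbin CSV export mostly comes down to mapping its column headers onto standardized names. The sketch below uses only the standard csv module. The Arbin headers and the target names in ARBIN_COLUMNS are assumptions for illustration; check both against your own exports and the batdata schema, and note that read_arbin_csv is a hypothetical helper, not part of ArbinExtractor.

```python
import csv
import io
from typing import Dict, List

# Assumed Arbin header -> assumed standardized column name
ARBIN_COLUMNS = {
    "Test_Time(s)": "test_time",
    "Cycle_Index": "cycle_number",
    "Current(A)": "current",
    "Voltage(V)": "voltage",
}


def read_arbin_csv(text: str) -> List[Dict[str, float]]:
    """Parse Arbin-style CSV text, keeping only the mapped columns as floats."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        # Unmapped columns (e.g. Data_Point) are dropped in this sketch
        rows.append({new: float(record[old])
                     for old, new in ARBIN_COLUMNS.items() if old in record})
    return rows
```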

generate_dataframe(file: str, file_number: int = 0, start_cycle: int = 0, start_time: float = 0) → DataFrame

Generate a DataFrame containing the data in this file

The dataframe will be in our standard format

Parameters:

file – Path to the file
file_number – Number of file, in case the test is spread across multiple files
start_cycle – Index to use for the first cycle, in case test is spread across multiple files
start_time – Test time to use for the start of the test, in case test is spread across multiple files

Returns:

Dataframe containing the battery data in a standard format

group(files: str | List[str], directories: List[str] | None = None, context: dict | None = None) → Iterator[Tuple[str, ...]]

Identify groups of files and directories that should be parsed together

Creates groups using only the files and directories provided as input.

The list of files includes _all_ files that could be read by this extractor, which may include many false positives.

Parameters:

files – List of files to consider grouping
directories – Any directories to consider grouping as well
context – Context about the files

Yields:

Groups of files

Battery Data Hub (`b.extractors.batterydata`)

Parse from the CSV formats of batterydata.energy.gov

class batdata.extractors.batterydata.BDExtractor(store_all: bool = False)

Bases: BatteryDataExtractor

Read data from the batterydata.energy.gov CSV format

Each cell on batterydata.energy.gov is stored as two separate CSV files: “<cell_name>-summary.csv” for the cycle-level summaries and “<cell_name>-raw.csv” for the time-series measurements. Metadata is held in an Excel file, “metadata.xlsx”, in the same directory.
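Because each cell is split across a summary file and a raw file, grouping amounts to matching file names on their shared <cell_name> prefix. A minimal sketch of that matching, assuming groups are ordered summary first, then raw, with metadata.xlsx appended when it sits in the same directory. pair_cell_files is a hypothetical helper; BDExtractor.group() may order or filter files differently.

```python
import os
from typing import Dict, Iterable, Iterator, Tuple

SUFFIXES = ("-summary.csv", "-raw.csv")


def pair_cell_files(files: Iterable[str]) -> Iterator[Tuple[str, ...]]:
    """Group '<cell_name>-summary.csv' and '<cell_name>-raw.csv' by cell name."""
    files = list(files)
    # Directories that contain a metadata workbook among the candidate files
    metadata_dirs = {os.path.dirname(p) for p in files
                     if os.path.basename(p) == "metadata.xlsx"}
    by_cell: Dict[Tuple[str, str], Dict[str, str]] = {}
    for path in files:
        directory, name = os.path.split(path)
        for suffix in SUFFIXES:
            if name.endswith(suffix):
                key = (directory, name[: -len(suffix)])
                by_cell.setdefault(key, {})[suffix] = path
    for (directory, _cell), found in sorted(by_cell.items()):
        group = [found[s] for s in SUFFIXES if s in found]
        if directory in metadata_dirs:
            group.append(os.path.join(directory, "metadata.xlsx"))
        yield tuple(group)
```

Incomplete pairs (for example a raw file without its summary) are still yielded here; rejecting them instead is an equally reasonable design choice.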

group(files: str | List[str], directories: List[str] | None = None, context: dict | None = None) → Iterator[Tuple[str, ...]]

Identify groups of files and directories that should be parsed together

Creates groups using only the files and directories provided as input.

The list of files includes _all_ files that could be read by this extractor, which may include many false positives.

Parameters:

files – List of files to consider grouping
directories – Any directories to consider grouping as well
context – Context about the files

Yields:

Groups of files

parse_to_dataframe(group: List[str], metadata: BatteryMetadata | dict | None = None) → BatteryDataset

Parse a set of files into a BatteryDataset

Parameters:

group – List of files to parse as part of the same test, ordered sequentially
metadata – Metadata for the battery, which should adhere to the BatteryMetadata schema

Returns:

BatteryDataset containing the information from all files

store_all: bool = False

Store all columns from the original data, even those we have not defined

batdata.extractors.batterydata.convert_eis_data_to_batdata(input_df: DataFrame) → DataFrame

Rename the columns from an NREL-standard set of EIS data to our names and conventions

Parameters:

input_df – NREL-format raw data

Returns:

EIS data in batdata format

batdata.extractors.batterydata.convert_raw_signal_to_batdata(input_df: DataFrame, store_all: bool) → DataFrame

Convert a raw time-series dataframe to one using batdata names and conventions

Parameters:

input_df – Initial NREL-format dataframe
store_all – Whether to store columns even if we have not defined their names

Returns:

DataFrame in the batdata format

batdata.extractors.batterydata.convert_summary_to_batdata(input_df: DataFrame, store_all: bool) → DataFrame

Convert the summary dataframe to a format using batdata names and conventions

Parameters:

input_df – Initial NREL-format dataframe
store_all – Whether to store columns even if we have not defined their names

Returns:

DataFrame in the batdata format
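The store_all flag decides what happens to columns that have no batdata name. The sketch below shows that rule on a plain dictionary of columns rather than a DataFrame. The entries in COLUMN_MAP are hypothetical placeholders, not the actual NREL or batdata column names, and keeping unknown columns under their original names is an assumption of this sketch.

```python
from typing import Dict, List

# Hypothetical NREL -> batdata column names, for illustration only
COLUMN_MAP = {
    "Cycle_Index": "cycle_number",
    "Discharge_Capacity": "capacity_discharge",
}


def convert_columns(table: Dict[str, List[float]],
                    store_all: bool) -> Dict[str, List[float]]:
    """Rename known columns; keep unknown ones only when store_all is True."""
    output: Dict[str, List[float]] = {}
    for name, values in table.items():
        if name in COLUMN_MAP:
            output[COLUMN_MAP[name]] = values
        elif store_all:
            # Unknown column is kept under its original name
            output[name] = values
    return output
```

With pandas, the same rule would usually be a DataFrame.rename followed by selecting either the renamed columns only or all columns.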

batdata.extractors.batterydata.generate_metadata(desc: dict, associated_ids: Iterable[str] = ()) → BatteryMetadata

Assemble the battery metadata for a dataset

The metadata for a single dataset are all the same and available by querying the https://batterydata.energy.gov/api/3/action/package_show?id={dataset_id} endpoint of Battery Data Hub.

Parameters:

desc – Data from the CKAN metadata response
associated_ids – List of other resources associated with this dataset, such as the DOIs of papers.

Returns:

Metadata for the cell provenance and construction
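The desc argument comes from the CKAN package_show endpoint mentioned above. A minimal sketch of building that request URL and unpacking the standard CKAN action response (a JSON object with "success" and "result" keys) is shown below. The helper names are hypothetical, and the assumption is that the "result" object is what generate_metadata expects as desc.

```python
import json
from urllib.parse import urlencode

CKAN_BASE = "https://batterydata.energy.gov/api/3/action/package_show"


def package_show_url(dataset_id: str) -> str:
    """Build the CKAN package_show URL for one dataset."""
    return f"{CKAN_BASE}?{urlencode({'id': dataset_id})}"


def parse_package_show(body: str) -> dict:
    """Return the dataset description from a CKAN action response body."""
    response = json.loads(body)
    if not response.get("success", False):
        raise ValueError(f"CKAN request failed: {response.get('error')}")
    return response["result"]
```

To query the live service, pass the URL to urllib.request.urlopen, decode the body, and hand it to parse_package_show; the parsing step is kept separate so it can be tested without network access.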
