Extractors (batdata.extractors)#

Data extraction tools

Base Class (b.extractors.base)#

Base class for a battery data extractor

class batdata.extractors.base.BatteryDataExtractor(eps: float = 1e-10)#

Bases: object

Base class for data extractors

Implementing an Extractor#

At a minimum, define the generate_dataframe() method, which produces a DataFrame containing the time-series data with standardized column names.

If the data format contains additional metadata or cycle-level features, override parse_to_dataframe() so that it adds those data after parsing the time-series results.

Provide an identify_files() or group() function to find related files if data are often split into multiple files.
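As a minimal sketch of the steps above, the following subclass implements only generate_dataframe(). To stay runnable without batdata installed, it inherits from a stand-in class, `BaseExtractorStandIn`, rather than the real BatteryDataExtractor. The CSV layout and the column names `test_time`, `voltage`, `current`, and `cycle_number` are assumptions made for illustration, not a statement of the library's schema.

```python
import io

import pandas as pd


class BaseExtractorStandIn:
    """Hypothetical stand-in for batdata.extractors.base.BatteryDataExtractor."""

    def generate_dataframe(self, file, file_number=0, start_cycle=0, start_time=0):
        raise NotImplementedError()


class SimpleCSVExtractor(BaseExtractorStandIn):
    """Hypothetical extractor for a CSV with columns: time_s, volts, amps, cycle."""

    # Assumed mapping from the source column names to standardized ones
    column_map = {
        "time_s": "test_time",
        "volts": "voltage",
        "amps": "current",
        "cycle": "cycle_number",
    }

    def generate_dataframe(self, file, file_number=0, start_cycle=0, start_time=0):
        df = pd.read_csv(file).rename(columns=self.column_map)
        # Shift the clock and cycle index so data from later files follow earlier ones
        df["test_time"] = df["test_time"] - df["test_time"].iloc[0] + start_time
        df["cycle_number"] = df["cycle_number"] - df["cycle_number"].iloc[0] + start_cycle
        df["file_number"] = file_number
        return df


# An in-memory buffer stands in for a path on disk
example = io.StringIO("time_s,volts,amps,cycle\n0,3.0,1.0,1\n10,3.1,1.0,1\n20,3.2,-1.0,2\n")
frame = SimpleCSVExtractor().generate_dataframe(
    example, file_number=1, start_cycle=5, start_time=100.0
)
```

A real extractor would subclass BatteryDataExtractor directly and would typically also provide group() if a test spans several files.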

generate_dataframe(file: str, file_number: int = 0, start_cycle: int = 0, start_time: int = 0) DataFrame#

Generate a DataFrame containing the data in this file

The dataframe will be in our standard format

Parameters:
  • file – Path to the file

  • file_number – Number of the file, in case the test is spread across multiple files

  • start_cycle – Index to use for the first cycle, in case test is spread across multiple files

  • start_time – Test time to use for the start of the test, in case test is spread across multiple files

Returns:

Dataframe containing the battery data in a standard format
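The start_cycle and start_time arguments let a caller stitch several files into one continuous record. The sketch below shows one way that bookkeeping could work from the caller's side. The `read_rows` helper and the `cycle_number` and `test_time` keys are hypothetical stand-ins, and this is not necessarily how parse_to_dataframe() is implemented.

```python
def read_rows(rows, file_number=0, start_cycle=0, start_time=0.0):
    """Hypothetical stand-in for generate_dataframe that works on in-memory rows.

    Each input row is (cycle, time), both counted from the start of that file.
    """
    first_cycle, first_time = rows[0]
    return [
        {
            "file_number": file_number,
            "cycle_number": cycle - first_cycle + start_cycle,
            "test_time": time - first_time + start_time,
        }
        for cycle, time in rows
    ]


def read_all(files):
    """Read files in order, continuing the cycle count and clock from the previous file."""
    output = []
    start_cycle, start_time = 0, 0.0
    for file_number, rows in enumerate(files):
        parsed = read_rows(rows, file_number, start_cycle, start_time)
        output.extend(parsed)
        # Assumption: the next file begins a new cycle right where this one ended
        start_cycle = parsed[-1]["cycle_number"] + 1
        start_time = parsed[-1]["test_time"]
    return output


combined = read_all([
    [(0, 0.0), (0, 5.0), (1, 10.0)],  # first file
    [(0, 0.0), (1, 5.0)],             # second file restarts its own counters
])
```

Whether a new file continues the previous cycle or starts a new one depends on the cycler, so the `+ 1` above is a choice made for this sketch.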

group(files: str | List[str], directories: List[str] | None = None, context: dict | None = None) Iterator[tuple[str, ...]]#

Identify groups of files and directories that should be parsed together

Will create groups using only the files and directories included as input.

The list of files contains _all_ files that could be read by this extractor, which may include many false positives.

Parameters:
  • files – List of files to consider grouping

  • directories – Any directories to consider grouping as well

  • context – Context about the files

Yields:

Groups of files

identify_files(path: str, context: dict | None = None) Iterator[tuple[str]]#

Identify all groups of files likely to be compatible with this extractor

Uses the group() function to determine groups of files that should be parsed together.

Parameters:
  • path – Root of directory to group together

  • context – Context about the files

Yields:

Groups of eligible files
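One way to picture the relationship between identify_files() and group() is a directory walk that hands each directory's contents to a grouping function. The sketch below is an assumption about that flow, not the library's implementation. `identify_files_sketch` and `group_by_extension` are hypothetical names.

```python
import os
import tempfile
from typing import Callable, Iterator, List, Tuple


def group_by_extension(
    files: List[str], directories: List[str], extension: str = ".csv"
) -> Iterator[Tuple[str, ...]]:
    """Hypothetical grouping rule: every file with the given extension is its own group."""
    for file in sorted(files):
        if file.lower().endswith(extension):
            yield (file,)


def identify_files_sketch(
    path: str, group: Callable = group_by_extension
) -> Iterator[Tuple[str, ...]]:
    """Walk `path` and apply the grouping function to the contents of each directory."""
    for root, dirs, files in os.walk(path):
        full_files = [os.path.join(root, f) for f in files]
        full_dirs = [os.path.join(root, d) for d in dirs]
        yield from group(full_files, full_dirs)


# Build a small directory tree to demonstrate
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "run2"))
    for name in ["a.csv", "notes.txt", os.path.join("run2", "b.csv")]:
        open(os.path.join(tmp, name), "w").close()
    found = sorted(os.path.relpath(g[0], tmp) for g in identify_files_sketch(tmp))
```

Here `found` holds the relative paths of the two CSV files, and the text file is skipped by the grouping rule.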

parse_to_dataframe(group: List[str], metadata: BatteryMetadata | dict | None = None) BatteryDataset#

Parse a set of files into a Pandas dataframe

Parameters:
  • group – List of files to parse as part of the same test. Ordered sequentially

  • metadata – Metadata for the battery, should adhere to the BatteryMetadata schema

Returns:

BatteryDataset containing the information from all files

Arbin (b.extractors.arbin)#

Extractor for Arbin-format files

class batdata.extractors.arbin.ArbinExtractor(eps: float = 1e-10)#

Bases: BatteryDataExtractor

Parser for reading from Arbin-format files

Expects the files to be in CSV format

generate_dataframe(file: str, file_number: int = 0, start_cycle: int = 0, start_time: float = 0) DataFrame#

Generate a DataFrame containing the data in this file

The dataframe will be in our standard format

Parameters:
  • file – Path to the file

  • file_number – Number of the file, in case the test is spread across multiple files

  • start_cycle – Index to use for the first cycle, in case test is spread across multiple files

  • start_time – Test time to use for the start of the test, in case test is spread across multiple files

Returns:

Dataframe containing the battery data in a standard format

group(files: str | List[str], directories: List[str] | None = None, context: dict | None = None) Iterator[Tuple[str, ...]]#

Identify groups of files and directories that should be parsed together

Will create groups using only the files and directories included as input.

The list of files contains _all_ files that could be read by this extractor, which may include many false positives.

Parameters:
  • files – List of files to consider grouping

  • directories – Any directories to consider grouping as well

  • context – Context about the files

Yields:

Groups of files

Battery Data Hub (b.extractors.batterydata)#

Parse from the CSV formats of batterydata.energy.gov

class batdata.extractors.batterydata.BDExtractor(store_all: bool = False)#

Bases: BatteryDataExtractor

Read data from the batterydata.energy.gov CSV format

Each cell in batterydata.energy.gov is stored as two separate CSV files: “<cell_name>-summary.csv” for the cycle-level summaries and “<cell_name>-raw.csv” for the time-series measurements. Metadata are held in an Excel file, “metadata.xlsx”, in the same directory.
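The naming convention above suggests how related files could be grouped. The following sketch pairs summary and raw files by cell name. It is an illustration under that assumption, not BDExtractor's actual group() logic, and `pair_cell_files` is a hypothetical helper.

```python
import os
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def pair_cell_files(files: List[str]) -> Iterator[Tuple[str, ...]]:
    """Group "<cell_name>-summary.csv" and "<cell_name>-raw.csv" by directory and cell name."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for file in files:
        directory, name = os.path.split(file)
        for suffix in ("-summary.csv", "-raw.csv"):
            if name.endswith(suffix):
                cell_name = name[: -len(suffix)]
                groups[(directory, cell_name)].append(file)
    # Files that match neither suffix (such as metadata.xlsx) are left out
    for key in sorted(groups):
        yield tuple(sorted(groups[key]))


pairs = list(pair_cell_files([
    "data/cell_2-raw.csv",
    "data/cell_1-summary.csv",
    "data/metadata.xlsx",
    "data/cell_1-raw.csv",
]))
```

In this example, `cell_1` yields a two-file group while `cell_2`, which has no summary file, yields a group containing only its raw file.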

group(files: str | List[str], directories: List[str] | None = None, context: dict | None = None) Iterator[Tuple[str, ...]]#

Identify groups of files and directories that should be parsed together

Will create groups using only the files and directories included as input.

The list of files contains _all_ files that could be read by this extractor, which may include many false positives.

Parameters:
  • files – List of files to consider grouping

  • directories – Any directories to consider grouping as well

  • context – Context about the files

Yields:

Groups of files

parse_to_dataframe(group: List[str], metadata: BatteryMetadata | dict | None = None) BatteryDataset#

Parse a set of files into a Pandas dataframe

Parameters:
  • group – List of files to parse as part of the same test. Ordered sequentially

  • metadata – Metadata for the battery, should adhere to the BatteryMetadata schema

Returns:

BatteryDataset containing the information from all files

store_all: bool = False#

Store all columns from the original data, even those for which we have not defined a name

batdata.extractors.batterydata.convert_eis_data_to_batdata(input_df: DataFrame) DataFrame#

Rename the columns from an NREL-standard set of EIS data to our names and conventions

Parameters:

input_df – NREL-format raw data

Returns:

EIS data in batdata format

batdata.extractors.batterydata.convert_raw_signal_to_batdata(input_df: DataFrame, store_all: bool) DataFrame#

Convert a raw time-series dataframe to one using batdata names and conventions

Parameters:
  • input_df – Initial NREL-format dataframe

  • store_all – Whether to store columns even if we have not defined their names

Returns:

DataFrame in the batdata format

batdata.extractors.batterydata.convert_summary_to_batdata(input_df: DataFrame, store_all: bool) DataFrame#

Convert the summary dataframe to a format using batdata names and conventions

Parameters:
  • input_df – Initial NREL-format dataframe

  • store_all – Whether to store columns even if we have not defined their names

Returns:

DataFrame in the batdata format
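The effect of store_all in the converters above can be pictured as a rename followed by an optional filter. This is a sketch under assumed column names: the `KNOWN_COLUMNS` mapping and the `convert_sketch` function are hypothetical and do not reflect the actual NREL or batdata column names.

```python
import pandas as pd

# Hypothetical mapping from source column names to standardized names
KNOWN_COLUMNS = {
    "Cycle_Index": "cycle_number",
    "Discharge_Capacity_Ah": "discharge_capacity",
}


def convert_sketch(input_df: pd.DataFrame, store_all: bool) -> pd.DataFrame:
    """Rename known columns and keep the unrecognized ones only if store_all is set."""
    output = input_df.rename(columns=KNOWN_COLUMNS)
    if not store_all:
        # Drop anything that does not have a standardized name
        output = output[list(KNOWN_COLUMNS.values())]
    return output


source = pd.DataFrame({
    "Cycle_Index": [1, 2],
    "Discharge_Capacity_Ah": [1.1, 1.0],
    "Vendor_Flag": [0, 0],
})
trimmed = convert_sketch(source, store_all=False)
full = convert_sketch(source, store_all=True)
```

With store_all=False the unrecognized `Vendor_Flag` column is dropped, and with store_all=True it is carried through under its original name.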

batdata.extractors.batterydata.generate_metadata(desc: dict, associated_ids: Iterable[str] = ()) BatteryMetadata#

Assemble the battery metadata for a dataset

The metadata for a single dataset are all the same and available by querying the https://batterydata.energy.gov/api/3/action/package_show?id={dataset_id} endpoint of Battery Data Hub.

Parameters:
  • desc – Data from the CKAN metadata response

  • associated_ids – List of other resources associated with this dataset, such as the DOIs of papers.

Returns:

Metadata for the cell provenance and construction
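The endpoint mentioned above follows the CKAN `package_show` pattern. A minimal sketch of building the request URL and fetching the response with the standard library is given below. `package_show_url` and `fetch_description` are hypothetical helpers, the dataset ID is a placeholder, and the assumption that the `result` field of the response is what generate_metadata() expects as desc is not confirmed by the text.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

API_TEMPLATE = "https://batterydata.energy.gov/api/3/action/package_show?id={dataset_id}"


def package_show_url(dataset_id: str) -> str:
    """Fill the dataset ID into the endpoint, escaping characters unsafe in a URL."""
    return API_TEMPLATE.format(dataset_id=quote(dataset_id, safe=""))


def fetch_description(dataset_id: str) -> dict:
    """Query the endpoint (requires network access) and return the `result` payload.

    Assumption: the `result` field is the description to pass on as `desc`.
    """
    with urlopen(package_show_url(dataset_id), timeout=30) as response:
        payload = json.load(response)
    return payload["result"]


# Placeholder ID for illustration only; no request is made here
url = package_show_url("example-dataset")
```

The returned dictionary could then be passed to generate_metadata() together with any associated identifiers, if the assumption about the payload holds.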