Extractors (batdata.extractors)
Data extraction tools
Base Class (b.extractors.base)
Base class for a battery data extractor
- class batdata.extractors.base.BatteryDataExtractor(eps: float = 1e-10)
Bases: object
Base class for data extractors
Implementing an Extractor
At minimum, define the generate_dataframe() method, which produces a dataframe containing the time-series data with standardized column names.
If the data format contains additional metadata or cycle-level features, override parse_to_dataframe() so that it adds those data after parsing the time-series results. Provide an identify_files() or group() method to find related files if data are often split across multiple files.
- generate_dataframe(file: str, file_number: int = 0, start_cycle: int = 0, start_time: int = 0) → DataFrame
Generate a DataFrame containing the data in this file
The dataframe will be in our standard format
- Parameters:
file – Path to the file
file_number – Index of the file, in case the test is spread across multiple files
start_cycle – Index to use for the first cycle, in case the test is spread across multiple files
start_time – Test time to use for the start of the test, in case the test is spread across multiple files
- Returns:
Dataframe containing the battery data in a standard format
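The following is a minimal sketch of how a generate_dataframe() implementation might honor the file_number, start_cycle, and start_time arguments. It is not the batdata implementation: the input headers, the standardized names (test_time, voltage, current, cycle_number, file_number), and the choice to renumber cycles relative to the first cycle in the file are all assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical vendor headers mapped to assumed standardized column names
COLUMN_MAP = {"Time(s)": "test_time", "Volts": "voltage", "Amps": "current", "Cycle": "cycle_number"}


def generate_dataframe(file, file_number: int = 0, start_cycle: int = 0, start_time: float = 0) -> pd.DataFrame:
    """Read one file and shift its counters so a multi-file test stays continuous."""
    raw = pd.read_csv(file)
    # Keep only the columns we know how to name, under their standardized names
    df = raw.rename(columns=COLUMN_MAP)[list(COLUMN_MAP.values())].copy()
    # Renumber cycles so the first cycle in this file gets index start_cycle
    df["cycle_number"] = df["cycle_number"] - df["cycle_number"].min() + start_cycle
    # Offset the clock by the test time at which this file begins
    df["test_time"] = df["test_time"] + start_time
    df["file_number"] = file_number
    return df


sample = io.StringIO("Time(s),Volts,Amps,Cycle\n0,3.0,1.0,1\n10,3.1,1.0,1\n20,3.2,-1.0,2\n")
frame = generate_dataframe(sample, file_number=1, start_cycle=5, start_time=100.0)
print(frame)
```

Passing the previous file's last cycle index plus one and its final test time lets consecutive files line up on a single timeline.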
- group(files: str | List[str], directories: List[str] | None = None, context: dict | None = None) → Iterator[tuple[str, ...]]
Identify groups of files and directories that should be parsed together
Will create groups using only the files and directories included as input.
The list of files contains _all_ files that could be read by this extractor, which may include many false positives.
- Parameters:
files – List of files to consider grouping
directories – Any directories to consider grouping as well
context – Context about the files
- Yields:
Groups of files
- identify_files(path: str, context: dict | None = None) → Iterator[tuple[str]]
Identify all groups of files likely to be compatible with this extractor
Uses the group() function to determine groups of files that should be parsed together.
- Parameters:
path – Root of the directory tree to search for groups of files
context – Context about the files
- Yields:
Groups of eligible files
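A minimal sketch of the directory walk described above, assuming identify_files() simply hands the contents of every directory found by os.walk to group(). The group() shown here is a hypothetical stand-in that treats each CSV file as its own group; a real extractor would supply its own grouping logic.

```python
import os
from typing import Iterator, List, Optional, Tuple


def group(files: List[str], directories: Optional[List[str]] = None,
          context: Optional[dict] = None) -> Iterator[Tuple[str, ...]]:
    """Hypothetical stand-in for an extractor's group(): one group per CSV file."""
    for path in sorted(files):
        if path.lower().endswith(".csv"):
            yield (path,)


def identify_files(path: str, context: Optional[dict] = None) -> Iterator[Tuple[str, ...]]:
    """Walk the tree under `path` and let group() decide what belongs together."""
    for root, dirs, files in os.walk(path):
        # group() works on full paths, so join names with the current directory
        full_files = [os.path.join(root, name) for name in files]
        full_dirs = [os.path.join(root, name) for name in dirs]
        yield from group(full_files, full_dirs, context)
```

Iterating over identify_files("data/") would then yield one tuple per CSV file anywhere under data/, with non-CSV files filtered out by the stand-in group().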
- parse_to_dataframe(group: List[str], metadata: BatteryMetadata | dict | None = None) → BatteryDataset
Parse a set of files into a BatteryDataset
- Parameters:
group – List of files to parse as part of the same test, ordered sequentially
metadata – Metadata for the battery, which should adhere to the BatteryMetadata schema
- Returns:
Dataset containing the information from all files
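The sketch below illustrates the sequential parsing that parse_to_dataframe() implies: each file starts at the cycle index and test time where the previous one ended. The helpers read_one() and parse_group() are hypothetical, the input files are assumed to restart their counters at zero, and the result is a plain pandas DataFrame rather than a BatteryDataset.

```python
import io
from typing import List

import pandas as pd


def read_one(file, file_number: int, start_cycle: int, start_time: float) -> pd.DataFrame:
    """Hypothetical per-file reader; assumes each file restarts cycles and time at 0."""
    df = pd.read_csv(file)
    df["cycle_number"] = df["cycle_number"] + start_cycle
    df["test_time"] = df["test_time"] + start_time
    df["file_number"] = file_number
    return df


def parse_group(group: List) -> pd.DataFrame:
    """Parse files in order, starting each one where the previous one stopped."""
    frames = []
    start_cycle, start_time = 0, 0.0
    for file_number, file in enumerate(group):
        df = read_one(file, file_number, start_cycle, start_time)
        frames.append(df)
        # The next file begins after the last cycle and at the last time seen
        start_cycle = int(df["cycle_number"].max()) + 1
        start_time = float(df["test_time"].max())
    return pd.concat(frames, ignore_index=True)


files = [
    io.StringIO("test_time,cycle_number\n0,0\n50,1\n"),
    io.StringIO("test_time,cycle_number\n0,0\n30,0\n"),
]
combined = parse_group(files)
print(combined)
```

This is why the files in a group must be ordered sequentially: the offsets for each file depend on everything parsed before it.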
Arbin (b.extractors.arbin)
Extractor for Arbin-format files
- class batdata.extractors.arbin.ArbinExtractor(eps: float = 1e-10)
Bases: BatteryDataExtractor
Parser for reading from Arbin-format files
Expects the files to be in CSV format
- generate_dataframe(file: str, file_number: int = 0, start_cycle: int = 0, start_time: float = 0) → DataFrame
Generate a DataFrame containing the data in this file
The dataframe will be in our standard format
- Parameters:
file – Path to the file
file_number – Index of the file, in case the test is spread across multiple files
start_cycle – Index to use for the first cycle, in case the test is spread across multiple files
start_time – Test time to use for the start of the test, in case the test is spread across multiple files
- Returns:
Dataframe containing the battery data in a standard format
- group(files: str | List[str], directories: List[str] | None = None, context: dict | None = None) → Iterator[Tuple[str, ...]]
Identify groups of files and directories that should be parsed together
Will create groups using only the files and directories included as input.
The list of files contains _all_ files that could be read by this extractor, which may include many false positives.
- Parameters:
files – List of files to consider grouping
directories – Any directories to consider grouping as well
context – Context about the files
- Yields:
Groups of files
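Because the list passed to group() may contain many false positives, an Arbin-oriented grouping can screen candidates before yielding them. The sketch below is hypothetical: it keeps files that end in .csv and whose header row contains Test_Time(s), a column name assumed here to appear in Arbin CSV exports, and yields each surviving file as its own group.

```python
import csv
from typing import Iterator, List, Optional, Tuple, Union

# Column assumed to appear in the header of Arbin CSV exports (hypothetical check)
REQUIRED_HEADER = "Test_Time(s)"


def looks_like_arbin(path: str) -> bool:
    """Cheap screen: a .csv extension plus the expected column in the header row."""
    if not path.lower().endswith(".csv"):
        return False
    with open(path, newline="") as fp:
        header = next(csv.reader(fp), [])
    return REQUIRED_HEADER in header


def group(files: Union[str, List[str]], directories: Optional[List[str]] = None,
          context: Optional[dict] = None) -> Iterator[Tuple[str, ...]]:
    """Yield each plausible Arbin file as a single-file group."""
    if isinstance(files, str):
        files = [files]
    for path in files:
        if looks_like_arbin(path):
            yield (path,)
```

Checking the extension first avoids opening files that clearly cannot be CSV; only the header row is read for the rest.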
Battery Data Hub (b.extractors.batterydata)
Parse data from the CSV formats used by batterydata.energy.gov
- class batdata.extractors.batterydata.BDExtractor(store_all: bool = False)
Bases: BatteryDataExtractor
Read data from the batterydata.energy.gov CSV format
Each cell in batterydata.energy.gov is stored as two separate CSV files: “<cell_name>-summary.csv” for the cycle-level summaries and “<cell_name>-raw.csv” for the time-series measurements. Metadata are held in an Excel file, “metadata.xlsx”, in the same directory.
- group(files: str | List[str], directories: List[str] | None = None, context: dict | None = None) → Iterator[Tuple[str, ...]]
Identify groups of files and directories that should be parsed together
Will create groups using only the files and directories included as input.
The list of files contains _all_ files that could be read by this extractor, which may include many false positives.
- Parameters:
files – List of files to consider grouping
directories – Any directories to consider grouping as well
context – Context about the files
- Yields:
Groups of files
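Given the naming convention above, one plausible grouping pairs each “<cell_name>-summary.csv” with the matching “<cell_name>-raw.csv” in the same directory. This is a sketch under that assumption, not the BDExtractor source; it ignores “metadata.xlsx” and drops cells missing either file.

```python
import os
from collections import defaultdict
from typing import Dict, Iterator, List, Optional, Tuple, Union

SUFFIXES = {"summary": "-summary.csv", "raw": "-raw.csv"}


def group(files: Union[str, List[str]], directories: Optional[List[str]] = None,
          context: Optional[dict] = None) -> Iterator[Tuple[str, ...]]:
    """Pair <cell_name>-summary.csv with <cell_name>-raw.csv in the same directory."""
    if isinstance(files, str):
        files = [files]
    # (directory, cell_name) -> {"summary": path, "raw": path}
    cells: Dict[Tuple[str, str], Dict[str, str]] = defaultdict(dict)
    for path in files:
        directory, name = os.path.split(path)
        for kind, suffix in SUFFIXES.items():
            if name.endswith(suffix):
                cells[(directory, name[: -len(suffix)])][kind] = path
    for key in sorted(cells):
        parts = cells[key]
        # Skip cells that are missing either half of the pair
        if len(parts) == 2:
            yield (parts["summary"], parts["raw"])
```

Keying on the directory as well as the cell name keeps identically named cells from different datasets apart.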
- parse_to_dataframe(group: List[str], metadata: BatteryMetadata | dict | None = None) → BatteryDataset
Parse a set of files into a BatteryDataset
- Parameters:
group – List of files to parse as part of the same test, ordered sequentially
metadata – Metadata for the battery, which should adhere to the BatteryMetadata schema
- Returns:
Dataset containing the information from all files
- batdata.extractors.batterydata.convert_eis_data_to_batdata(input_df: DataFrame) → DataFrame
Rename the columns from an NREL-standard set of EIS data to our names and conventions
- Parameters:
input_df – NREL-format raw data
- Returns:
EIS data in batdata format
- batdata.extractors.batterydata.convert_raw_signal_to_batdata(input_df: DataFrame, store_all: bool) → DataFrame
Convert a raw time-series dataframe to one using batdata names and conventions
- Parameters:
input_df – Initial NREL-format dataframe
store_all – Whether to store columns even if we have not defined their names
- Returns:
DataFrame in the batdata format
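The store_all flag can be read as a choice between keeping only renamed columns or keeping everything. The sketch below shows that behavior on a toy frame; the column mapping in KNOWN_COLUMNS is hypothetical and does not reproduce the NREL-to-batdata mapping used by the real converters.

```python
import pandas as pd

# Hypothetical mapping from NREL-style headers to batdata-style names
KNOWN_COLUMNS = {"Test_Time (s)": "test_time", "Voltage (V)": "voltage", "Current (A)": "current"}


def convert_columns(input_df: pd.DataFrame, store_all: bool) -> pd.DataFrame:
    """Rename known columns; keep unrecognized ones only when store_all is True."""
    output = input_df.rename(columns=KNOWN_COLUMNS)
    if not store_all:
        known = set(KNOWN_COLUMNS.values())
        output = output[[c for c in output.columns if c in known]]
    return output


raw = pd.DataFrame({"Test_Time (s)": [0, 1], "Voltage (V)": [3.0, 3.1], "Mystery": [7, 8]})
print(convert_columns(raw, store_all=False).columns.tolist())  # ['test_time', 'voltage']
print(convert_columns(raw, store_all=True).columns.tolist())   # ['test_time', 'voltage', 'Mystery']
```

Mapping entries absent from the input (here "Current (A)") are silently skipped, since DataFrame.rename ignores missing keys by default.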
- batdata.extractors.batterydata.convert_summary_to_batdata(input_df: DataFrame, store_all: bool) → DataFrame
Convert the summary dataframe to a format using batdata names and conventions
- Parameters:
input_df – Initial NREL-format dataframe
store_all – Whether to store columns even if we have not defined their names
- Returns:
DataFrame in the batdata format
- batdata.extractors.batterydata.generate_metadata(desc: dict, associated_ids: Iterable[str] = ()) → BatteryMetadata
Assemble the battery metadata for a dataset
The metadata for a single dataset are all the same and are available by querying the https://batterydata.energy.gov/api/3/action/package_show?id={dataset_id} endpoint of Battery Data Hub.
- Parameters:
desc – Data from the CKAN metadata response
associated_ids – List of other resources associated with this dataset, such as the DOIs of papers.
- Returns:
Metadata for the cell provenance and construction
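The desc argument comes from the CKAN package_show endpoint named above. A minimal, stdlib-only sketch of fetching it follows; it assumes desc corresponds to the "result" object of the CKAN response, which CKAN wraps as {"success": ..., "result": ...}. The helper names package_show_url() and fetch_description() are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CKAN_ENDPOINT = "https://batterydata.energy.gov/api/3/action/package_show"


def package_show_url(dataset_id: str) -> str:
    """Build the package_show query URL for one dataset."""
    return f"{CKAN_ENDPOINT}?{urlencode({'id': dataset_id})}"


def fetch_description(dataset_id: str) -> dict:
    """Download the dataset description (network access required)."""
    with urlopen(package_show_url(dataset_id), timeout=30) as response:
        payload = json.load(response)
    # CKAN wraps the action output as {"success": ..., "result": {...}}
    if not payload.get("success"):
        raise RuntimeError(f"CKAN request failed for {dataset_id!r}")
    return payload["result"]
```

Calling fetch_description() requires network access to Battery Data Hub; package_show_url() can be checked offline.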