Streaming Battery Data
======================

Many battery datasets are too large to fit in the memory of a single computer.
Such data can be read or written incrementally using the streaming module of battery-data-toolkit,
:mod:`battdat.streaming`.

Reading Data as a Stream
------------------------

The battery-data-toolkit allows streaming the raw time series data from a file in the HDF5 file format.

Stream the data either as individual rows with :func:`~battdat.streaming.iterate_records_from_file`
or as all rows belonging to each cycle with :func:`~battdat.streaming.iterate_cycles_from_file`.

Both functions return a Python generator that retrieves chunks of data from the HDF5 file incrementally.
The generator can be advanced one item at a time

.. code-block:: python

    from battdat.streaming import iterate_records_from_file

    row_iter = iterate_records_from_file('example.h5')
    row = next(row_iter)
    do_something_per_timestep(row)

or as part of a for loop.

.. code-block:: python

    from battdat.streaming import iterate_cycles_from_file

    for cycle in iterate_cycles_from_file('example.h5'):
        do_something_per_cycle(cycle)

Reading full cycles from a file yields either a single :class:`~pandas.DataFrame` when reading a single table,
a dictionary of ``DataFrames`` when reading several tables,
or a full :class:`~battdat.data.BatteryDataset`,
depending on the options ``key`` and ``make_dataset``.

.. code-block:: python

    # Read as a single DataFrame
    df = next(iterate_cycles_from_file('example.h5', key='raw_data'))

    # Read multiple tables as a dictionary
    dict_of_df = next(iterate_cycles_from_file('example.h5', key=['raw_data', 'cycle_stats']))

    # Read all tables as a Dataset
    dataset = next(iterate_cycles_from_file('example.h5', key=None, make_dataset=True))

Streaming Data to a File
------------------------

Write large datasets into battery-data-toolkit-compatible formats incrementally
using the :class:`~battdat.streaming.hdf5.HDF5Writer`.

Create the writer by providing the path to the HDF5 file and the metadata to be written,
then open it with Python's ``with`` syntax.

.. code-block:: python

    from battdat.schemas import BatteryMetadata
    from battdat.streaming.hdf5 import HDF5Writer

    metadata = BatteryMetadata(name='example')
    with HDF5Writer('streamed.h5', metadata=metadata) as writer:
        for time, current, voltage in data_stream:
            writer.write_row({'test_time': time, 'current': current, 'voltage': voltage})

The writer buffers rows and only writes to disk once enough rows have been collected
or when exiting the ``with`` block signals the end of the data stream.
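
The buffering behavior described above can be illustrated with a small stand-alone sketch.
``BufferedRowWriter`` below is a hypothetical class written only for this illustration:
it uses the standard library and a CSV file, it is not how ``HDF5Writer`` is implemented,
and its ``buffer_size`` and ``flush_count`` names are assumptions rather than options of the toolkit.
The sketch only shows the general pattern of holding rows in memory until a threshold is reached
and flushing the remainder when the ``with`` block exits.

.. code-block:: python

    import csv
    from pathlib import Path
    from tempfile import TemporaryDirectory


    class BufferedRowWriter:
        """Hypothetical illustration (not part of battdat): buffer rows, write them in chunks."""

        def __init__(self, path, fieldnames, buffer_size=3):
            self.path = Path(path)
            self.fieldnames = fieldnames
            self.buffer_size = buffer_size  # assumed threshold for this sketch
            self._buffer = []
            self.flush_count = 0

        def __enter__(self):
            # Start a fresh file that contains only the header row
            with self.path.open('w', newline='') as fp:
                csv.DictWriter(fp, self.fieldnames).writeheader()
            return self

        def __exit__(self, exc_type, exc, tb):
            # Leaving the with block signals the end of the stream
            self.flush()
            return False

        def write_row(self, row):
            # Hold the row in memory; touch the disk only when the buffer is full
            self._buffer.append(row)
            if len(self._buffer) >= self.buffer_size:
                self.flush()

        def flush(self):
            if not self._buffer:
                return
            with self.path.open('a', newline='') as fp:
                csv.DictWriter(fp, self.fieldnames).writerows(self._buffer)
            self._buffer.clear()
            self.flush_count += 1


    def count_rows_on_disk(path):
        """Count the data rows (excluding the header) currently stored in the file."""
        with Path(path).open(newline='') as fp:
            return sum(1 for _ in csv.DictReader(fp))


    with TemporaryDirectory() as tmp:
        out = Path(tmp) / 'streamed.csv'
        rows_on_disk = []  # rows visible on disk after each call to write_row
        with BufferedRowWriter(out, ['test_time', 'current', 'voltage'], buffer_size=3) as writer:
            for i in range(7):
                writer.write_row({'test_time': float(i), 'current': 1.0, 'voltage': 3.7})
                rows_on_disk.append(count_rows_on_disk(out))
        final_rows = count_rows_on_disk(out)

In this sketch the file grows in steps of ``buffer_size`` rows while the loop runs,
and the last partial chunk reaches the disk only once the ``with`` block exits.
A practical consequence of this pattern is that rows still held in the buffer are not visible to readers of the file
until the next flush.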