kedro.extras.datasets.pandas.AppendableExcelDataSet

class kedro.extras.datasets.pandas.AppendableExcelDataSet(filepath, load_args=None, save_args=None)
Bases: kedro.io.core.AbstractDataSet
AppendableExcelDataSet loads data from, and saves data to, a local Excel file opened in append mode, so saving writes into the existing workbook instead of replacing the whole file. It uses pandas to handle the Excel file.

Example:
```python
from kedro.extras.datasets.pandas import AppendableExcelDataSet
from kedro.extras.datasets.pandas import ExcelDataSet
import pandas as pd

data_1 = pd.DataFrame({'col1': [1, 2], 'col2': [4, 5], 'col3': [5, 6]})
data_2 = pd.DataFrame({'col1': [7, 8], 'col2': [5, 7]})

# Both data sets point at the same file: the regular one creates the
# workbook, the appendable one adds a sheet to it.
regular_ds = ExcelDataSet(filepath="/tmp/test.xlsx")
appendable_ds = AppendableExcelDataSet(
    filepath="/tmp/test.xlsx",
    save_args={"sheet_name": "my_sheet"},
    load_args={"sheet_name": "my_sheet"},
)

regular_ds.save(data_1)     # writes data_1 to the default sheet
appendable_ds.save(data_2)  # appends data_2 as "my_sheet"
reloaded = appendable_ds.load()
assert data_2.equals(reloaded)
```
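To make the difference between the two data sets in the example explicit, the sketch below models a workbook as a plain dict mapping sheet names to rows. It is a toy model, not the pandas or openpyxl implementation: `store`, `open_for_write`, `open_for_append` and `write_sheet` are hypothetical names. Write mode (roughly what the regular data set does) starts from an empty workbook, while append mode requires the file to exist and keeps its sheets. This sketch simply refuses to overwrite an existing sheet; how the real writer handles a sheet-name clash depends on the pandas version.

```python
from typing import Dict, List

# Toy model (hypothetical): a "file system" mapping paths to workbooks,
# where a workbook maps sheet names to rows.
Workbook = Dict[str, List[list]]
store: Dict[str, Workbook] = {}


def open_for_write(path: str) -> Workbook:
    # Write mode: always start from an empty workbook, discarding any
    # sheets the file held before.
    store[path] = {}
    return store[path]


def open_for_append(path: str) -> Workbook:
    # Append mode: the file must already exist; its sheets are kept.
    if path not in store:
        raise FileNotFoundError(path)
    return store[path]


def write_sheet(book: Workbook, sheet_name: str, rows: List[list]) -> None:
    # This sketch refuses to overwrite an existing sheet.
    if sheet_name in book:
        raise ValueError(f"sheet {sheet_name!r} already exists")
    book[sheet_name] = rows


# Mirrors the example: a regular save, then an appending save.
write_sheet(open_for_write("/tmp/test.xlsx"), "Sheet1", [[1, 4, 5], [2, 5, 6]])
write_sheet(open_for_append("/tmp/test.xlsx"), "my_sheet", [[7, 5], [8, 7]])
print(sorted(store["/tmp/test.xlsx"]))  # ['Sheet1', 'my_sheet']
```

Both sheets survive because the second write opened the existing workbook instead of creating a new one.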
Attributes
AppendableExcelDataSet.DEFAULT_LOAD_ARGS
AppendableExcelDataSet.DEFAULT_SAVE_ARGS
Methods
AppendableExcelDataSet.__init__(filepath[, …]): Creates a new instance of AppendableExcelDataSet pointing to an existing local Excel file to be opened in append mode.
AppendableExcelDataSet.exists(): Checks whether a data set's output already exists by calling the provided _exists() method.
AppendableExcelDataSet.from_config(name, config): Create a data set instance using the configuration provided.
AppendableExcelDataSet.load(): Loads data by delegation to the provided load method.
AppendableExcelDataSet.release(): Release any cached data.
AppendableExcelDataSet.save(data): Saves data by delegation to the provided save method.
DEFAULT_LOAD_ARGS = {'engine': 'openpyxl'}

DEFAULT_SAVE_ARGS = {'index': False}
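These dictionaries are starting points: per the constructor's parameter descriptions, every default is kept unless the caller supplies that key. A minimal sketch of such a merging rule, assuming a copy-then-update approach (the helper name `merge_args` is hypothetical and not part of Kedro's API):

```python
import copy
from typing import Any, Dict, Optional

DEFAULT_LOAD_ARGS: Dict[str, Any] = {"engine": "openpyxl"}
DEFAULT_SAVE_ARGS: Dict[str, Any] = {"index": False}


def merge_args(defaults: Dict[str, Any],
               overrides: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: user-supplied keys win, other defaults survive."""
    merged = copy.deepcopy(defaults)  # never mutate the class-level defaults
    if overrides is not None:
        merged.update(overrides)
    return merged


load_args = merge_args(DEFAULT_LOAD_ARGS, {"sheet_name": "my_sheet"})
save_args = merge_args(DEFAULT_SAVE_ARGS, None)
print(load_args)  # {'engine': 'openpyxl', 'sheet_name': 'my_sheet'}
print(save_args)  # {'index': False}
```

Copying first keeps one instance's arguments from leaking into the shared class attribute.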
__init__(filepath, load_args=None, save_args=None)
Creates a new instance of AppendableExcelDataSet pointing to an existing local Excel file to be opened in append mode.

Parameters:
- filepath (str): Filepath in POSIX format to an existing local Excel file.
- load_args (Optional[Dict[str, Any]]): Pandas options for loading Excel files. All available arguments are listed at https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html. All defaults are preserved except "engine", which is set to "openpyxl".
- save_args (Optional[Dict[str, Any]]): Pandas options for saving Excel files. All available arguments are listed at https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_excel.html. All defaults are preserved except "index", which is set to False. Options for the ExcelWriter can be passed under the "writer" key; its available arguments are listed at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.ExcelWriter.html. Note: the mode option of ExcelWriter is always set to "a" and cannot be overridden.

Return type: None
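The save_args description implies a split: the "writer" entry is meant for pandas.ExcelWriter, everything else for DataFrame.to_excel, and the writer's mode is pinned to "a". The sketch below shows one way such a split could work; `split_save_args` is a hypothetical helper, and this is not necessarily how Kedro implements it.

```python
import copy
from typing import Any, Dict, Optional, Tuple


def split_save_args(
    save_args: Optional[Dict[str, Any]],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper returning (writer_args, to_excel_args)."""
    to_excel_args: Dict[str, Any] = {"index": False}  # DEFAULT_SAVE_ARGS
    user_args = copy.deepcopy(save_args) if save_args else {}
    # Options under "writer" are meant for pandas.ExcelWriter.
    writer_args = user_args.pop("writer", {})
    to_excel_args.update(user_args)
    # Append mode is enforced: any user-supplied "mode" is overwritten.
    writer_args["mode"] = "a"
    return writer_args, to_excel_args


writer_args, to_excel_args = split_save_args(
    {"sheet_name": "my_sheet", "writer": {"mode": "w", "engine": "openpyxl"}}
)
print(writer_args)    # {'mode': 'a', 'engine': 'openpyxl'}
print(to_excel_args)  # {'index': False, 'sheet_name': 'my_sheet'}
```

Forcing the mode after reading the user's options is what makes "cannot be overridden" hold even when a caller passes "mode": "w".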
exists()
Checks whether a data set's output already exists by calling the provided _exists() method.

Return type: bool
Returns: Flag indicating whether the output already exists.
Raises: DataSetError if the underlying exists method raises an error.
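The public exists() is a thin wrapper around a subclass hook. A minimal sketch of that pattern, assuming the wrapper turns any exception from _exists() into DataSetError; `DataSetError`, `SketchDataSet` and `LocalFileDataSet` here are local, hypothetical stand-ins, not imports from kedro.io, and the file-presence check is only an illustrative choice of hook.

```python
import os
import tempfile


class DataSetError(Exception):
    """Local stand-in for kedro.io.core.DataSetError."""


class SketchDataSet:
    """Hypothetical base class illustrating the exists()/_exists() split."""

    def exists(self) -> bool:
        try:
            return self._exists()
        except Exception as exc:
            # Any failure in the hook surfaces as a DataSetError.
            raise DataSetError(f"Failed during exists check: {exc}") from exc

    def _exists(self) -> bool:
        raise NotImplementedError


class LocalFileDataSet(SketchDataSet):
    def __init__(self, filepath: str) -> None:
        self._filepath = filepath

    def _exists(self) -> bool:
        # Hypothetical check: the output exists if the file is present.
        return os.path.isfile(self._filepath)


with tempfile.NamedTemporaryFile(suffix=".xlsx") as tmp:
    print(LocalFileDataSet(tmp.name).exists())  # True
print(LocalFileDataSet("/no/such/file.xlsx").exists())  # False
```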
classmethod from_config(name, config, load_version=None, save_version=None)
Create a data set instance using the configuration provided.

Parameters:
- name (str): Data set name.
- config (Dict[str, Any]): Data set config dictionary.
- load_version (Optional[str]): Version string to be used for the load operation if the data set is versioned. Has no effect on the data set if versioning was not enabled.
- save_version (Optional[str]): Version string to be used for the save operation if the data set is versioned. Has no effect on the data set if versioning was not enabled.

Return type: AbstractDataSet
Returns: An instance of an AbstractDataSet subclass.
Raises: DataSetError if the function fails to create the data set from its config.
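from_config builds a data set from a dictionary, typically one catalog entry. The sketch below illustrates the idea under simplifying assumptions: the config carries a "type" key naming the class, and the remaining keys become constructor arguments. `REGISTRY`, `from_config` and `FakeExcelDataSet` are all hypothetical; Kedro itself resolves the "type" string to an importable class and also handles versioning, which this sketch ignores.

```python
from typing import Any, Dict, Type


class DataSetError(Exception):
    """Local stand-in for kedro.io.core.DataSetError."""


class FakeExcelDataSet:
    """Hypothetical data set used only to illustrate construction."""

    def __init__(self, filepath: str, load_args=None, save_args=None) -> None:
        self.filepath = filepath
        self.load_args = load_args or {}
        self.save_args = save_args or {}


# Hypothetical registry; Kedro instead imports the class named by "type".
REGISTRY: Dict[str, Type] = {"pandas.AppendableExcelDataSet": FakeExcelDataSet}


def from_config(name: str, config: Dict[str, Any]) -> Any:
    config = dict(config)  # do not mutate the caller's config
    class_name = config.pop("type", None)
    if class_name not in REGISTRY:
        raise DataSetError(f"Cannot create data set '{name}': "
                           f"unknown type {class_name!r}")
    try:
        return REGISTRY[class_name](**config)
    except TypeError as exc:  # e.g. a misspelled constructor argument
        raise DataSetError(f"Cannot create data set '{name}': {exc}") from exc


ds = from_config("my_excel", {
    "type": "pandas.AppendableExcelDataSet",
    "filepath": "/tmp/test.xlsx",
    "save_args": {"sheet_name": "my_sheet"},
})
print(type(ds).__name__, ds.filepath)  # FakeExcelDataSet /tmp/test.xlsx
```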
load()
Loads data by delegation to the provided load method.

Return type: Any
Returns: Data returned by the provided load method.
Raises: DataSetError if the underlying load method raises an error.
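A sketch of the delegation described for load(), assuming the wrapper re-raises a DataSetError from the hook unchanged and wraps any other exception. `SketchLoader` and its in-memory payload are hypothetical stand-ins; for this class the real hook would read the Excel file through pandas.

```python
from typing import Any, Optional


class DataSetError(Exception):
    """Local stand-in for kedro.io.core.DataSetError."""


class SketchLoader:
    """Hypothetical data set showing the load()/_load() delegation."""

    def __init__(self, payload: Any = None,
                 error: Optional[Exception] = None) -> None:
        self._payload = payload
        self._error = error

    def load(self) -> Any:
        try:
            return self._load()
        except DataSetError:
            raise  # already meaningful: pass it through untouched
        except Exception as exc:
            raise DataSetError(f"Failed while loading data: {exc}") from exc

    def _load(self) -> Any:
        # Stand-in for the real hook, which would read the Excel file.
        if self._error is not None:
            raise self._error
        return self._payload


print(SketchLoader(payload=[1, 2]).load())  # [1, 2]
```

Re-raising DataSetError as-is avoids wrapping an already descriptive error a second time.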
release()
Release any cached data.

Raises: DataSetError if the underlying release method raises an error.
Return type: None
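release() matters for data sets that keep loaded data in memory. This page does not say whether AppendableExcelDataSet caches anything, so the sketch below uses a hypothetical caching data set, `CachingSketch`, to show the contract: after release(), the next load() has to read from the source again.

```python
from typing import Any, Callable

_MISSING = object()  # sentinel meaning "nothing cached"


class CachingSketch:
    """Hypothetical data set that caches the result of its first load."""

    def __init__(self, read: Callable[[], Any]) -> None:
        self._read = read
        self._cache: Any = _MISSING
        self.reads = 0  # counts how often the underlying source is hit

    def load(self) -> Any:
        if self._cache is _MISSING:
            self._cache = self._read()
            self.reads += 1
        return self._cache

    def release(self) -> None:
        # Drop cached data so the next load re-reads the source.
        self._cache = _MISSING


ds = CachingSketch(read=lambda: {"col1": [7, 8]})
ds.load()
ds.load()
print(ds.reads)  # 1
ds.release()
ds.load()
print(ds.reads)  # 2
```

A sentinel is used instead of None so that a source which legitimately returns None is still cached.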
save(data)
Saves data by delegation to the provided save method.

Parameters:
- data (Any): The value to be saved by the provided save method.

Raises: DataSetError if the underlying save method raises an error.
Return type: None