kedro.io

Description

kedro.io provides functionality to read and write to a number of data sets. At the core of the library is AbstractDataSet, the base class that all data set implementations extend.
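
Every data set, regardless of backing storage, exposes the same load, save and exists methods. A minimal sketch of this uniform interface, using MemoryDataSet so no files or credentials are needed:

    from kedro.io import MemoryDataSet

    # Every AbstractDataSet implementation exposes load(), save() and exists().
    data_set = MemoryDataSet(data=[1, 2, 3])
    assert data_set.exists()

    data_set.save([4, 5, 6])
    print(data_set.load())  # [4, 5, 6]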

Data Catalog

kedro.io.DataCatalog([data_sets, feed_dict, …]) DataCatalog stores instances of AbstractDataSet implementations to provide load and save capabilities from anywhere in the program.
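A sketch of registering data sets in a catalog and loading/saving through it; the CSV path below is hypothetical and assumed to exist in the project:

    from kedro.io import CSVLocalDataSet, DataCatalog, MemoryDataSet

    catalog = DataCatalog(
        data_sets={
            # hypothetical path -- assumes this CSV exists in the project
            "shuttles": CSVLocalDataSet(filepath="data/01_raw/shuttles.csv"),
            "scratch": MemoryDataSet(),
        }
    )

    shuttles = catalog.load("shuttles")       # pandas DataFrame read from the CSV
    catalog.save("scratch", shuttles.head())  # store an intermediate result in memory
    print(catalog.list())                     # names of all registered data sets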

Data Sets

kedro.io.CSVLocalDataSet(filepath[, …]) CSVLocalDataSet loads and saves data to a local CSV file.
kedro.io.CSVHTTPDataSet(fileurl[, auth, …]) CSVHTTPDataSet loads data over HTTP(S) and parses it as a pandas DataFrame.
kedro.io.CSVS3DataSet(filepath[, …]) CSVS3DataSet loads and saves data to a file in S3.
kedro.io.HDFLocalDataSet(filepath, key[, …]) HDFLocalDataSet loads and saves data to a local HDF file.
kedro.io.HDFS3DataSet(filepath, key[, …]) HDFS3DataSet loads and saves data to an S3 bucket.
kedro.io.JSONLocalDataSet(filepath[, …]) JSONLocalDataSet encodes data as JSON and saves it to a local file, or reads in and decodes an existing JSON file.
kedro.io.JSONDataSet(filepath[, load_args, …]) JSONDataSet loads/saves data from/to a JSON file using an underlying filesystem (e.g. local, S3, GCS).
kedro.io.LambdaDataSet(load, save[, exists, …]) LambdaDataSet loads and saves data to a data set.
kedro.io.MemoryDataSet([data, copy_mode]) MemoryDataSet loads and saves data from/to an in-memory Python object.
kedro.io.ParquetLocalDataSet(filepath[, …]) ParquetLocalDataSet loads and saves data to a local Parquet file.
kedro.io.PartitionedDataSet(path, dataset[, …]) PartitionedDataSet loads and saves partitioned file-like data using the underlying dataset definition.
kedro.io.IncrementalDataSet(path, dataset[, …]) IncrementalDataSet inherits from PartitionedDataSet, which loads and saves partitioned file-like data using the underlying dataset definition.
kedro.io.PickleLocalDataSet(filepath[, …]) PickleLocalDataSet loads and saves a Python object to a local pickle file.
kedro.io.PickleS3DataSet(filepath[, …]) PickleS3DataSet loads and saves a Python object to a pickle file on S3.
kedro.io.SQLTableDataSet(table_name, credentials) SQLTableDataSet loads data from a SQL table and saves a pandas DataFrame to a table.
kedro.io.SQLQueryDataSet(sql, credentials[, …]) SQLQueryDataSet loads data from a provided SQL query.
kedro.io.TextLocalDataSet(filepath[, …]) TextLocalDataSet loads and saves unstructured text files.
kedro.io.ExcelLocalDataSet(filepath[, …]) ExcelLocalDataSet loads and saves data to a local Excel file.
kedro.io.CachedDataSet(dataset[, version, …]) CachedDataSet is a dataset wrapper which caches the saved data in memory, so that the user avoids I/O operations with slow storage media.
kedro.io.DataCatalogWithDefault([data_sets, …]) A DataCatalog with a default DataSet implementation for any data set which is not registered in the catalog.

Additional AbstractDataSet implementations can be found in kedro.contrib.io.
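
Of the entries above, PartitionedDataSet behaves differently from the single-file data sets: its load() returns a dictionary mapping partition identifiers to callables, so each partition is only read when its callable is invoked. A hedged sketch, assuming a hypothetical folder of CSV partitions at data/01_raw/shipments/:

    from kedro.io import CSVLocalDataSet, PartitionedDataSet

    # Hypothetical folder of CSV files: data/01_raw/shipments/*.csv
    shipments = PartitionedDataSet(
        path="data/01_raw/shipments",
        dataset=CSVLocalDataSet,  # underlying dataset definition used per partition
    )

    partitions = shipments.load()  # {partition_id: load_callable}
    for partition_id, load_partition in sorted(partitions.items()):
        df = load_partition()      # the file is read only when the callable runs
        print(partition_id, len(df))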

Errors

kedro.io.DataSetAlreadyExistsError DataSetAlreadyExistsError is raised by the DataCatalog class when trying to add a data set which already exists in the DataCatalog.
kedro.io.DataSetError DataSetError is raised by AbstractDataSet implementations when their input/output methods fail.
kedro.io.DataSetNotFoundError DataSetNotFoundError is raised by the DataCatalog class when trying to use a non-existent data set.
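
A short sketch of how these errors surface in practice; DataSetNotFoundError and DataSetAlreadyExistsError both subclass DataSetError:

    from kedro.io import DataCatalog, DataSetError, DataSetNotFoundError, MemoryDataSet

    catalog = DataCatalog(data_sets={"scratch": MemoryDataSet()})

    try:
        catalog.load("missing")   # no such entry in the catalog
    except DataSetNotFoundError as exc:
        print("Unknown data set:", exc)

    try:
        catalog.load("scratch")   # registered, but nothing has been saved to it yet
    except DataSetError as exc:
        print("Load failed:", exc)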

Base Classes

kedro.io.AbstractDataSet AbstractDataSet is the base class for all data set implementations.
kedro.io.AbstractVersionedDataSet(filepath, …) AbstractVersionedDataSet is the base class for all versioned data set implementations.
kedro.io.AbstractTransformer AbstractTransformer is the base class for all transformer implementations.
kedro.io.Version This namedtuple is used to provide load and save versions for versioned data sets.
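
Custom data sets subclass AbstractDataSet and implement _load, _save and _describe (and optionally _exists); the public load/save/exists methods wrap these with error handling. A hedged sketch of a hypothetical Feather-backed data set (the class name and file format are illustrative, not part of kedro.io):

    from pathlib import Path
    from typing import Any, Dict

    import pandas as pd

    from kedro.io import AbstractDataSet


    class FeatherLocalDataSet(AbstractDataSet):
        """Hypothetical example: load/save a pandas DataFrame as a local Feather file.

        Requires a Feather backend such as pyarrow to be installed.
        """

        def __init__(self, filepath: str):
            self._filepath = Path(filepath)

        def _load(self) -> pd.DataFrame:
            return pd.read_feather(self._filepath)

        def _save(self, data: pd.DataFrame) -> None:
            data.to_feather(self._filepath)

        def _exists(self) -> bool:
            return self._filepath.is_file()

        def _describe(self) -> Dict[str, Any]:
            return dict(filepath=self._filepath)

Versioned data sets instead extend AbstractVersionedDataSet and accept a Version namedtuple, whose load and save fields pin the versions used for loading and saving respectively.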