
Welcome to Kedro’s documentation!
Introduction
Get Started
Tutorial
Kedro Project Setup
Data Catalog
- The Data Catalog
- Using the Data Catalog within Kedro configuration
- Specifying the location of the dataset
- Data Catalog `*_args` parameters
- Using the Data Catalog with the YAML API
- Creating a Data Catalog YAML configuration file via CLI
- Adding parameters
- Feeding in credentials
- Loading multiple datasets that have similar configuration
- Transcoding datasets
- Transforming datasets
- Versioning datasets and ML models
- Using the Data Catalog with the Code API
- Kedro IO
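The Data Catalog entries above cover registering datasets by name and loading or saving them through one interface. As a purely conceptual sketch (plain Python, not Kedro's actual `DataCatalog` API), a catalog is a registry mapping dataset names to load/save callables:

```python
# Conceptual sketch of a data catalog: a registry mapping dataset
# names to load/save callables. This is a hypothetical stand-in for
# illustration, not Kedro's DataCatalog implementation.
class Catalog:
    def __init__(self):
        self._datasets = {}

    def add(self, name, load, save):
        self._datasets[name] = (load, save)

    def load(self, name):
        return self._datasets[name][0]()

    def save(self, name, data):
        self._datasets[name][1](data)

# In-memory dataset entry for illustration.
store = {}
catalog = Catalog()
catalog.add(
    "example",
    load=lambda: store["example"],
    save=lambda d: store.__setitem__("example", d),
)
catalog.save("example", [1, 2, 3])
```

Pipeline code then refers to datasets only by name, which is what lets the YAML API swap storage locations or formats without touching node code.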
Nodes and Pipelines
- Nodes
- Pipelines
- Modular pipelines
- What are modular pipelines?
- How do I create a modular pipeline?
- Recommendations
- How to share a modular pipeline
- A modular pipeline example template
- How to connect existing pipelines
- How to use a modular pipeline twice
- How to use a modular pipeline with different parameters
- How to clean up a modular pipeline
- Run a pipeline
- Slice a pipeline
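The Nodes and Pipelines sections above describe wiring functions together by named inputs and outputs. A minimal conceptual sketch of that idea (plain Python, not Kedro's `node`/`Pipeline` API; all names here are hypothetical):

```python
# Conceptual sketch: a node is a function plus named inputs/outputs;
# the "pipeline" resolves run order from whichever inputs are
# available. Not Kedro's actual API.
def node(func, inputs, outputs):
    return {"func": func, "inputs": inputs, "outputs": outputs}

def run_pipeline(nodes, data):
    data = dict(data)
    pending = list(nodes)
    while pending:
        for n in pending:
            # A node is runnable once all its named inputs exist.
            if all(i in data for i in n["inputs"]):
                results = n["func"](*(data[i] for i in n["inputs"]))
                if len(n["outputs"]) == 1:
                    results = (results,)
                data.update(zip(n["outputs"], results))
                pending.remove(n)
                break
        else:
            raise ValueError("pipeline has unresolvable inputs")
    return data

pipeline = [
    node(lambda x: [v * 2 for v in x], ["raw"], ["doubled"]),
    node(lambda x: sum(x), ["doubled"], ["total"]),
]
out = run_pipeline(pipeline, {"raw": [1, 2, 3]})
# out["total"] == 12
```

Because ordering is derived from dataset names rather than declared explicitly, the same mechanism supports slicing a pipeline (run only the nodes needed for a given output) and composing modular pipelines.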
Extend Kedro
- Common use cases
- Hooks
- Custom datasets
- Scenario
- Project setup
- The anatomy of a dataset
- Implement the `_load` method with `fsspec`
- Implement the `_save` method with `fsspec`
- Implement the `_describe` method
- The complete example
- Integration with `PartitionedDataSet`
- Versioning
- Thread-safety
- How to handle credentials and different filesystems
- How to contribute a custom dataset implementation
- Kedro plugins
- Create a Kedro starter
- Dataset transformers (deprecated)
- Decorators (deprecated)
Logging
Development
Deployment
Tools Integration
- Build a Kedro pipeline with PySpark
- Centralise Spark configuration in `conf/base/spark.yml`
- Initialise a `SparkSession` in custom project context class
- Use Kedro’s built-in Spark datasets to load and save raw data
- Use `MemoryDataSet` for intermediary `DataFrame`
- Use `MemoryDataSet` with `copy_mode="assign"` for non-`DataFrame` Spark objects
- Tips for maximising concurrency using `ThreadRunner`
- Use Kedro with IPython and Jupyter Notebooks/Lab
Resources
API Docs
- `kedro`: Kedro is a framework that makes it easy to build robust and scalable data pipelines by providing uniform project templates, data abstraction, configuration and pipeline assembly.