
Introduction

  • What is Kedro?
    • Learn how to use Kedro
    • Assumptions

Get started

  • Installation prerequisites
    • Virtual environments
      • conda
      • venv (instead of conda)
      • pipenv (instead of conda)
  • Install Kedro
    • Verify a successful installation
    • Install a development version
  • A “Hello World” example
    • Node
    • Pipeline
    • DataCatalog
    • Runner
    • Hello Kedro!

Make a project

  • Create a new project
    • Create a new project interactively
    • Create a new project from a configuration file
    • Initialise a git repository
  • Iris dataset example project
    • Create the example project
      • Project directory structure
        • conf/
        • data/
        • src/
      • What best practice should I follow to avoid leaking confidential data?
    • Run the example project
    • Under the hood: Pipelines and nodes
  • Kedro starters
    • How to use Kedro starters
      • Starter aliases
    • List of official starters
    • Starter versioning
    • Use a starter in interactive mode
    • Use a starter with a configuration file
  • Standalone use of the DataCatalog
    • Introduction
    • Usage
    • Content
    • Create a full Kedro project

Tutorial

  • Kedro spaceflights tutorial
    • Kedro project development workflow
      • 1. Set up the project template
      • 2. Set up the data
      • 3. Create the pipeline
      • 4. Package the project
    • Optional: Git workflow
      • Create a project repository
      • Submit your changes to GitHub
  • Set up the spaceflights project
    • Create a new project
    • Install dependencies
    • Configure the project
  • Set up the data
    • Add your datasets to data
    • Register the datasets
      • csv
      • xlsx
    • Custom data
  • Create a pipeline
    • Data processing pipeline
      • Generate a new pipeline template
      • Add node functions
      • Assemble nodes into the data processing pipeline
      • Update the project pipeline
      • Test the example
      • Visualise the pipeline
      • Persist pre-processed data
      • Extend the data processing pipeline
      • Persist the model input table
      • Test the example
      • Use kedro viz --autoreload
    • Data science pipeline
      • Create the data science pipeline
      • Configure the input parameters
      • Register the dataset
      • Assemble the data science pipeline
      • Update the project pipeline
      • Test the pipelines
    • Kedro runners
    • Slice a pipeline
  • Visualise pipelines
    • Install Kedro-Viz
    • Visualise a whole pipeline
    • Exit an open visualisation
    • Visualise layers
    • Share a pipeline
    • Visualise Plotly charts in Kedro-Viz
  • Namespace pipelines
    • Adding a namespace to the data_processing pipeline
      • Why do we need to provide explicit inputs and outputs?
    • Adding namespaces to the data_science pipeline
      • Let’s explain what’s going on here
    • Nesting modular pipelines
  • Set up experiment tracking
    • Set up a project
    • Set up the session store
    • Set up tracking datasets
    • Set up your nodes and pipelines to log metrics
    • Generate the Run data
    • Access run data and compare runs
  • Package a project
    • Add documentation to your project
    • Package your project
      • Docker, Airflow and deployment

Kedro project setup

  • Dependencies
    • Project-specific dependencies
    • Install project-specific dependencies
    • Workflow dependencies
      • Install dependencies related to the Data Catalog
        • Install dependencies at a group-level
        • Install dependencies at a type-level
  • Configuration
    • Configuration root
    • Local and base configuration environments
    • Additional configuration environments
    • Template configuration
      • Jinja2 support
    • Parameters
      • Load parameters
      • Specify parameters at runtime
      • Use parameters
    • Credentials
      • AWS credentials
    • Configure kedro run arguments
  • Lifecycle management with KedroSession
    • Overview
    • Create a session
  • Project settings

Data Catalog

  • The Data Catalog
    • Using the Data Catalog within Kedro configuration
    • Specifying the location of the dataset
    • Data Catalog *_args parameters
    • Using the Data Catalog with the YAML API
    • Creating a Data Catalog YAML configuration file via CLI
    • Adding parameters
    • Feeding in credentials
    • Loading multiple datasets that have similar configuration
    • Transcoding datasets
      • A typical example of transcoding
      • How does transcoding work?
    • Versioning datasets and ML models
    • Using the Data Catalog with the Code API
      • Configuring a Data Catalog
      • Loading datasets
        • Behind the scenes
      • Viewing the available data sources
      • Saving data
        • Saving data to memory
        • Saving data to a SQL database for querying
        • Saving data in Parquet
  • Kedro IO
    • Error handling
    • AbstractDataSet
    • Versioning
      • version namedtuple
      • Versioning using the YAML API
      • Versioning using the Code API
      • Supported datasets
    • Partitioned dataset
      • Partitioned dataset definition
        • Dataset definition
        • Partitioned dataset credentials
      • Partitioned dataset load
      • Partitioned dataset save
      • Incremental loads with IncrementalDataSet
        • Incremental dataset load
        • Incremental dataset save
        • Incremental dataset confirm
        • Checkpoint configuration
        • Special checkpoint config keys

Nodes and pipelines

  • Nodes
    • How to create a node
      • Node definition syntax
      • Syntax for input variables
      • Syntax for output variables
    • **kwargs-only node functions
    • How to tag a node
    • How to run a node
  • Pipelines
    • How to build a pipeline
      • How to tag a pipeline
      • How to merge multiple pipelines
      • Information about the nodes in a pipeline
      • Information about pipeline inputs and outputs
    • Bad pipelines
      • Pipeline with bad nodes
      • Pipeline with circular dependencies
  • Modular pipelines
    • What are modular pipelines?
      • Key concepts
    • How do I create a modular pipeline?
      • What does the kedro pipeline create command do?
      • Ensuring portability
      • Providing modular pipeline specific dependencies
    • Using the modular pipeline() wrapper to provide overrides
    • Combining disconnected pipelines
    • Using a modular pipeline multiple times
    • How to use a modular pipeline with different parameters
  • Micro-packaging
    • Package a micro-package
    • Package multiple micro-packages
    • Pull a micro-package
      • Providing fsspec arguments
    • Pull multiple micro-packages
  • Run a pipeline
    • Runners
      • SequentialRunner
      • ParallelRunner
        • Multiprocessing
        • Multithreading
    • Custom runners
    • Load and save asynchronously
    • Run a pipeline by name
    • Run pipelines with IO
    • Output to a file
  • Slice a pipeline
    • Slice a pipeline by providing inputs
    • Slice a pipeline by specifying nodes
    • Slice a pipeline by specifying final nodes
    • Slice a pipeline with tagged nodes
    • Slice a pipeline by running specified nodes
    • How to recreate missing outputs

Extend Kedro

  • Common use cases
    • Use Case 1: How to add extra behaviour to Kedro’s execution timeline
    • Use Case 2: How to integrate Kedro with additional data sources
    • Use Case 3: How to add or modify CLI commands
    • Use Case 4: How to customise the initial boilerplate of your project
  • Hooks
    • Introduction
    • Concepts
      • Hook specification
        • CLI hooks
      • Hook implementation
        • Registering your Hook implementations with Kedro
        • Disable auto-registered plugins’ Hooks
    • Common use cases
      • Use Hooks to extend a node’s behaviour
      • Use Hooks to customise the dataset load and save methods
    • Under the hood
    • Hooks examples
      • Add memory consumption tracking
      • Add data validation
      • Add observability to your pipeline
      • Add metrics tracking to your model
      • Modify node inputs using before_node_run hook
  • Custom datasets
    • Scenario
    • Project setup
    • The anatomy of a dataset
    • Implement the _load method with fsspec
    • Implement the _save method with fsspec
    • Implement the _describe method
    • The complete example
    • Integration with PartitionedDataSet
    • Versioning
    • Thread-safety
    • How to handle credentials and different filesystems
    • How to contribute a custom dataset implementation
  • Kedro plugins
    • Overview
    • Example of a simple plugin
    • Working with click
    • Project context
    • Initialisation
    • global and project commands
    • Suggested command convention
    • Hooks
    • CLI Hooks
    • Contributing process
    • Supported Kedro plugins
    • Community-developed plugins
  • Create a Kedro starter
    • How to create a Kedro starter
    • Configuration variables
      • Example Kedro starter

Logging

  • Logging
    • Configure logging
    • Use logging
    • Logging for anyconfig
  • Experiment tracking
    • Enable experiment tracking
    • Community solutions

Development

  • Set up Visual Studio Code
    • Advanced: For those using venv / virtualenv
    • Setting up tasks
    • Debugging
      • Advanced: Remote Interpreter / Debugging
    • Configuring the Kedro catalog validation schema
  • Set up PyCharm
    • Set up Run configurations
    • Debugging
    • Advanced: Remote SSH interpreter
    • Advanced: Docker interpreter
    • Configure Python Console
    • Configuring the Kedro catalog validation schema
  • Kedro’s command line interface
    • Autocompletion (optional)
    • Invoke Kedro CLI from Python (optional)
    • Kedro commands
    • Global Kedro commands
      • Get help on Kedro commands
      • Confirm the Kedro version
      • Confirm Kedro information
      • Create a new Kedro project
      • Open the Kedro documentation in your browser
    • Customise or override project-specific Kedro commands
      • Project setup
        • Build the project’s dependency tree
        • Install all package dependencies
      • Run the project
        • Modifying a kedro run
      • Deploy the project
      • Pull a micro-package
      • Project quality
        • Build the project documentation
        • Lint your project
        • Test your project
      • Project development
        • Modular pipelines
        • Registered pipelines
        • Datasets
        • Data Catalog
        • Notebooks
  • Debugging
    • Introduction
    • Debugging Node
    • Debugging Pipeline

Deployment

  • Deployment guide
    • Deployment choices
  • Single-machine deployment
    • Container-based
      • How to use a container registry
    • Package-based
    • CLI-based
      • Use GitHub workflow to copy your project
      • Install and run the Kedro project
  • Distributed deployment
    • 1. Containerise the pipeline
    • 2. Convert your Kedro pipeline into the target platform’s primitives
    • 3. Parameterise the runs
    • 4. (Optional) Create starters
  • Deployment with Argo Workflows
    • Why would you use Argo Workflows?
    • Prerequisites
    • How to run your Kedro pipeline using Argo Workflows
      • Containerise your Kedro project
      • Create Argo Workflows spec
      • Submit Argo Workflows spec to Kubernetes
      • Kedro-Argo plugin
  • Deployment with Prefect
    • Prerequisites
    • How to run your Kedro pipeline using Prefect
      • Convert your Kedro pipeline to Prefect flow
      • Run Prefect flow
  • Deployment with Kubeflow Pipelines
    • Why would you use Kubeflow Pipelines?
    • Prerequisites
    • How to run your Kedro pipeline using Kubeflow Pipelines
      • Containerise your Kedro project
      • Create a workflow spec
      • Authenticate Kubeflow Pipelines
      • Upload workflow spec and execute runs
  • Deployment with AWS Batch
    • Why would you use AWS Batch?
    • Prerequisites
    • How to run a Kedro pipeline using AWS Batch
      • Containerise your Kedro project
      • Provision resources
        • Create IAM Role
        • Create AWS Batch job definition
        • Create AWS Batch compute environment
        • Create AWS Batch job queue
      • Configure the credentials
      • Submit AWS Batch jobs
        • Create a custom runner
        • Set up Batch-related configuration
        • Update CLI implementation
      • Deploy
  • Deployment to a Databricks cluster
    • Prerequisites
    • Running Kedro project from a Databricks notebook
      • 1. Project setup
      • 2. Install dependencies and run locally
      • 3. Create a Databricks cluster
      • 4. Create GitHub personal access token
      • 5. Create a GitHub repository
      • 6. Push Kedro project to the GitHub repository
      • 7. Configure the Databricks cluster
      • 8. Run your Kedro project from the Databricks notebook
      • 9. Using the Kedro IPython Extension
      • 10. Running Kedro-Viz on Databricks
  • How to integrate Amazon SageMaker into your Kedro pipeline
    • Why would you use Amazon SageMaker?
    • Prerequisites
    • Prepare the environment
      • Install SageMaker package dependencies
      • Create SageMaker execution role
      • Create S3 bucket
    • Update the Kedro project
      • Create the configuration environment
      • Update the project hooks
      • Update the data science pipeline
        • Create node functions
        • Update the pipeline definition
      • Create the SageMaker entry point
    • Run the project
    • Cleanup
  • How to deploy your Kedro pipeline with AWS Step Functions
    • Why would you run a Kedro pipeline with AWS Step Functions
    • Strategy
    • Prerequisites
    • Deployment process
      • Step 1. Create new configuration environment to prepare a compatible DataCatalog
      • Step 2. Package the Kedro pipeline as an AWS Lambda-compliant Docker image
      • Step 3. Write the deployment script
      • Step 4. Deploy the pipeline
    • Limitations
    • Final thought
  • How to deploy your Kedro pipeline on Apache Airflow with Astronomer
    • Strategy
    • Prerequisites
    • Project Setup
    • Deployment process
      • Step 1. Create new configuration environment to prepare a compatible DataCatalog
      • Step 2. Package the Kedro pipeline as an Astronomer-compliant Docker image
      • Step 3. Convert the Kedro pipeline into an Airflow DAG with kedro airflow
      • Step 4. Launch the local Airflow cluster with Astronomer
    • Final thought
  • Deployment to a Dask cluster
    • Why would you use Dask?
    • Prerequisites
    • How to distribute your Kedro pipeline using Dask
      • Create a custom runner
      • Update CLI implementation
      • Deploy
        • Set up Dask and related configuration

Tools integration

  • Build a Kedro pipeline with PySpark
    • Centralise Spark configuration in conf/base/spark.yml
    • Initialise a SparkSession in custom project context class
    • Use Kedro’s built-in Spark datasets to load and save raw data
    • Spark and Delta Lake interaction
    • Use MemoryDataSet for intermediary DataFrame
    • Use MemoryDataSet with copy_mode="assign" for non-DataFrame Spark objects
    • Tips for maximising concurrency using ThreadRunner
  • Use Kedro with IPython and Jupyter
    • Why use a Notebook?
    • Kedro IPython extension
      • Managed Jupyter instances
    • Kedro variables: catalog, context, pipelines and session
      • catalog
      • context
      • pipelines
      • session
    • Kedro and Jupyter
      • Manage Jupyter kernels
      • Use an alternative Jupyter client
      • Convert functions from Jupyter Notebooks into Kedro nodes
      • Kedro-Viz line magic

FAQs

  • Frequently asked questions
    • What is Kedro?
    • Who maintains Kedro?
    • What are the primary advantages of Kedro?
    • How does Kedro compare to other projects?
    • What is data engineering convention?
    • How do I upgrade Kedro?
    • How can I use a development version of Kedro?
    • How can I find out more about Kedro?
    • How can I cite Kedro?
    • How can I get my question answered?
  • Kedro architecture overview
    • Kedro project
    • Kedro starter
    • Kedro library
    • Kedro framework
    • Kedro extension
  • Kedro Principles
    • 1. Modularity at the core 📦
    • 2. Grow beginners into experts 🌱
    • 3. User empathy without unfounded assumptions 🤝
    • 4. Simplicity means bare necessities 🍞
    • 5. There should be one obvious way of doing things 🎯
    • 6. A sprinkle of magic is better than a spoonful of it ✨
    • 7. Lean process and lean product 👟

Resources

  • Images and icons
    • White background
      • Icon
      • Icon with text
    • Black background
      • Icon
      • Icon with text
  • Kedro glossary
    • Data Catalog
    • Data engineering vs Data science
    • Kedro
    • KedroContext
    • KedroSession
    • Kedro-Viz
    • Layers (data engineering convention)
    • Modular pipeline
    • Node
    • Node execution order
    • Pipeline
    • Pipeline slicing
    • Runner
    • Starters
    • Tags
    • Workflow dependencies

Contribute to Kedro

  • Introduction
  • Guidelines for contributing developers
    • Introduction
    • Before you start: development set up
    • Get started: areas of contribution
      • core contribution process
      • extras contribution process
    • Create a pull request
      • Hints on pre-commit usage
      • Developer Certificate of Origin
    • Need help?
  • Backwards compatibility & breaking changes
    • When should I make a breaking change?
    • The Kedro release model
  • Contribute to the Kedro documentation
    • How do I rebuild the documentation after I make changes to it?
      • Set up to build Kedro documentation
      • Build the documentation
    • Extend Kedro documentation
      • Add new pages
      • Move or remove pages
      • Create a pull request
      • Help!
    • Kedro documentation style guide
      • Language
      • Formatting
      • Links
      • Capitalisation
      • Bullets
      • Notes
      • Kedro lexicon
      • Style
  • Join the Technical Steering Committee
    • Responsibilities of a maintainer
      • Product development
      • Community management
    • Requirements to become a maintainer
    • Application process
    • Voting process
      • Other issues or proposals
      • Adding or removing maintainers

API documentation

  • kedro
    • kedro.config
      • kedro.config.ConfigLoader
      • kedro.config.TemplatedConfigLoader
      • kedro.config.MissingConfigException
    • kedro.extras
      • kedro.extras.datasets
        • kedro.extras.datasets.api.APIDataSet
        • kedro.extras.datasets.biosequence.BioSequenceDataSet
        • kedro.extras.datasets.dask.ParquetDataSet
        • kedro.extras.datasets.email.EmailMessageDataSet
        • kedro.extras.datasets.geopandas.GeoJSONDataSet
        • kedro.extras.datasets.holoviews.HoloviewsWriter
        • kedro.extras.datasets.json.JSONDataSet
        • kedro.extras.datasets.matplotlib.MatplotlibWriter
        • kedro.extras.datasets.networkx.GMLDataSet
        • kedro.extras.datasets.networkx.GraphMLDataSet
        • kedro.extras.datasets.networkx.JSONDataSet
        • kedro.extras.datasets.pandas.CSVDataSet
        • kedro.extras.datasets.pandas.ExcelDataSet
        • kedro.extras.datasets.pandas.FeatherDataSet
        • kedro.extras.datasets.pandas.GBQQueryDataSet
        • kedro.extras.datasets.pandas.GBQTableDataSet
        • kedro.extras.datasets.pandas.GenericDataSet
        • kedro.extras.datasets.pandas.HDFDataSet
        • kedro.extras.datasets.pandas.JSONDataSet
        • kedro.extras.datasets.pandas.ParquetDataSet
        • kedro.extras.datasets.pandas.SQLQueryDataSet
        • kedro.extras.datasets.pandas.SQLTableDataSet
        • kedro.extras.datasets.pandas.XMLDataSet
        • kedro.extras.datasets.pickle.PickleDataSet
        • kedro.extras.datasets.pillow.ImageDataSet
        • kedro.extras.datasets.plotly.JSONDataSet
        • kedro.extras.datasets.plotly.PlotlyDataSet
        • kedro.extras.datasets.redis.PickleDataSet
        • kedro.extras.datasets.spark.DeltaTableDataSet
        • kedro.extras.datasets.spark.SparkDataSet
        • kedro.extras.datasets.spark.SparkHiveDataSet
        • kedro.extras.datasets.spark.SparkJDBCDataSet
        • kedro.extras.datasets.tensorflow.TensorFlowModelDataset
        • kedro.extras.datasets.text.TextDataSet
        • kedro.extras.datasets.tracking.JSONDataSet
        • kedro.extras.datasets.tracking.MetricsDataSet
        • kedro.extras.datasets.yaml.YAMLDataSet
      • kedro.extras.extensions
        • kedro.extras.extensions.ipython
      • kedro.extras.logging
        • kedro.extras.logging.color_logger
    • kedro.framework
      • kedro.framework.cli
        • kedro.framework.cli.catalog
        • kedro.framework.cli.cli
        • kedro.framework.cli.hooks
        • kedro.framework.cli.jupyter
        • kedro.framework.cli.micropkg
        • kedro.framework.cli.pipeline
        • kedro.framework.cli.project
        • kedro.framework.cli.registry
        • kedro.framework.cli.starters
        • kedro.framework.cli.utils
      • kedro.framework.context
        • kedro.framework.context.KedroContext
        • kedro.framework.context.KedroContextError
      • kedro.framework.hooks
        • kedro.framework.hooks.manager
        • kedro.framework.hooks.markers
        • kedro.framework.hooks.specs
      • kedro.framework.project
        • kedro.framework.project.configure_logging
        • kedro.framework.project.configure_project
        • kedro.framework.project.validate_settings
      • kedro.framework.session
        • kedro.framework.session.session
        • kedro.framework.session.store
      • kedro.framework.startup
        • kedro.framework.startup.bootstrap_project
        • kedro.framework.startup.ProjectMetadata
    • kedro.io
      • kedro.io.AbstractDataSet
      • kedro.io.AbstractVersionedDataSet
      • kedro.io.DataCatalog
      • kedro.io.LambdaDataSet
      • kedro.io.MemoryDataSet
      • kedro.io.PartitionedDataSet
      • kedro.io.IncrementalDataSet
      • kedro.io.CachedDataSet
      • kedro.io.Version
      • kedro.io.DataSetAlreadyExistsError
      • kedro.io.DataSetError
      • kedro.io.DataSetNotFoundError
    • kedro.pipeline
      • kedro.pipeline.node
      • kedro.pipeline.modular_pipeline.pipeline
      • kedro.pipeline.Pipeline
      • kedro.pipeline.node.Node
      • kedro.pipeline.modular_pipeline.ModularPipelineError
    • kedro.runner
      • kedro.runner.run_node
      • kedro.runner.AbstractRunner
      • kedro.runner.ParallelRunner
      • kedro.runner.SequentialRunner
      • kedro.runner.ThreadRunner
    • kedro.utils
      • kedro.utils.load_obj

Python Module Index

  • kedro
    • kedro.config
    • kedro.extras
    • kedro.extras.datasets
    • kedro.extras.extensions
    • kedro.extras.extensions.ipython
    • kedro.extras.logging
    • kedro.extras.logging.color_logger
    • kedro.framework
    • kedro.framework.cli
    • kedro.framework.cli.catalog
    • kedro.framework.cli.cli
    • kedro.framework.cli.hooks
    • kedro.framework.cli.hooks.manager
    • kedro.framework.cli.hooks.markers
    • kedro.framework.cli.hooks.specs
    • kedro.framework.cli.jupyter
    • kedro.framework.cli.micropkg
    • kedro.framework.cli.pipeline
    • kedro.framework.cli.project
    • kedro.framework.cli.registry
    • kedro.framework.cli.starters
    • kedro.framework.cli.utils
    • kedro.framework.context
    • kedro.framework.hooks
    • kedro.framework.hooks.manager
    • kedro.framework.hooks.markers
    • kedro.framework.hooks.specs
    • kedro.framework.project
    • kedro.framework.session
    • kedro.framework.session.session
    • kedro.framework.session.store
    • kedro.framework.startup
    • kedro.io
    • kedro.pipeline
    • kedro.runner
    • kedro.utils

Revision 8b357c9b.

Built with Sphinx using a theme provided by Read the Docs.