Kedro 0.17.6 documentation

Introduction

  • What is Kedro?
    • Learn how to use Kedro
    • Assumptions

Get started

  • Installation prerequisites
    • Virtual environments
      • conda
      • venv (instead of conda)
      • pipenv (instead of conda)
  • Install Kedro
    • Verify a successful installation
    • Install a development version
  • A “Hello World” example
    • Node
    • Pipeline
    • DataCatalog
    • Runner
    • Hello Kedro!
  • Create a new project
    • Create a new project interactively
    • Create a new project from a configuration file
    • Initialise a git repository
  • Iris dataset example project
    • Create the example project
      • Project directory structure
        • conf/
        • data/
        • src/
      • What best practice should I follow to avoid leaking confidential data?
    • Run the example project
    • Under the hood: Pipelines and nodes
  • Kedro starters
    • How to use Kedro starters
      • Starter aliases
    • List of official starters
    • Starter versioning
    • Use a starter in interactive mode
    • Use a starter with a configuration file
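The "Hello World" entries above (Node, Pipeline, DataCatalog, Runner) can be sketched without installing anything. The following is a plain-Python conceptual sketch, not the Kedro API itself: the dict-based "node specs", the `catalog` dict, and the inline loop stand in for Kedro's `node`, `DataCatalog`, and `SequentialRunner`, and the dataset names (`my_salutation`, `my_message`) are illustrative.

```python
# Conceptual sketch of the four "Hello World" building blocks,
# in plain Python (no Kedro dependency required).

def return_greeting():
    return "Hello"

def join_statements(greeting):
    return f"{greeting} Kedro!"

# A "node" pairs a pure function with named inputs and outputs.
nodes = [
    {"func": return_greeting, "inputs": [], "outputs": "my_salutation"},
    {"func": join_statements, "inputs": ["my_salutation"], "outputs": "my_message"},
]

# The "catalog" maps dataset names to data (here: in memory).
catalog = {}

# A sequential "runner": resolve each node's inputs from the
# catalog, call the function, save the output back.
for spec in nodes:
    args = [catalog[name] for name in spec["inputs"]]
    catalog[spec["outputs"]] = spec["func"](*args)

print(catalog["my_message"])  # → Hello Kedro!
```

In real Kedro code the same flow uses `kedro.pipeline.node`, `kedro.pipeline.Pipeline`, `kedro.io.DataCatalog` and `kedro.runner.SequentialRunner`, as covered in the "A “Hello World” example" section.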

Tutorial

  • Kedro spaceflights tutorial
    • Kedro project development workflow
      • 1. Set up the project template
      • 2. Set up the data
      • 3. Create the pipeline
      • 4. Package the project
    • Optional: Git workflow
      • Create a project repository
      • Submit your changes to GitHub
  • Set up the spaceflights project
    • Create a new project
    • Install project dependencies with kedro install
      • More about project dependencies
      • Add and remove project-specific dependencies
    • Configure the project
  • Set up the data
    • Add your datasets to data
      • reviews.csv
      • companies.csv
      • shuttles.xlsx
    • Register the datasets
      • csv
      • xlsx
    • Custom data
  • Create a pipeline
    • Data processing pipeline
      • Node functions
      • Assemble nodes into the data processing pipeline
      • Update the project pipeline
      • Test the example
      • Persist pre-processed data
      • Extend the data processing pipeline
      • Test the example
    • Data science pipeline
      • Update dependencies
      • Create a data science node
      • Configure the input parameters
      • Register the dataset
      • Assemble the data science pipeline
      • Update the project pipeline
      • Test the pipelines
    • Kedro runners
    • Slice a pipeline
  • Package a project
    • Add documentation to your project
    • Package your project
      • Docker and Airflow
  • Visualise pipelines
    • Install Kedro-Viz
    • Visualise a whole pipeline
    • Exit an open visualisation
    • Visualise layers
    • Share a pipeline

Kedro project setup

  • Dependencies
    • Project-specific dependencies
    • kedro install
    • Workflow dependencies
      • Install dependencies related to the Data Catalog
        • Install dependencies at a group-level
        • Install dependencies at a type-level
  • Configuration
    • Configuration root
    • Local and base configuration environments
    • Additional configuration environments
    • Template configuration
      • Jinja2 support
    • Parameters
      • Load parameters
      • Specify parameters at runtime
      • Use parameters
    • Credentials
      • AWS credentials
    • Configure kedro run arguments
  • Lifecycle management with KedroSession
    • Overview
    • Create a session
  • The mini-kedro Kedro starter
    • Introduction
    • Usage
    • Content

Data Catalog

  • The Data Catalog
    • Using the Data Catalog within Kedro configuration
    • Specifying the location of the dataset
    • Data Catalog *_args parameters
    • Using the Data Catalog with the YAML API
    • Creating a Data Catalog YAML configuration file via CLI
    • Adding parameters
    • Feeding in credentials
    • Loading multiple datasets that have similar configuration
    • Transcoding datasets
      • A typical example of transcoding
      • How does transcoding work?
    • Transforming datasets
      • Applying built-in transformers
      • Transformer scope
    • Versioning datasets and ML models
    • Using the Data Catalog with the Code API
      • Configuring a Data Catalog
      • Loading datasets
        • Behind the scenes
      • Viewing the available data sources
      • Saving data
        • Saving data to memory
        • Saving data to a SQL database for querying
        • Saving data in Parquet
  • Kedro IO
    • Error handling
    • AbstractDataSet
    • Versioning
      • version namedtuple
      • Versioning using the YAML API
      • Versioning using the Code API
      • Supported datasets
    • Partitioned dataset
      • Partitioned dataset definition
        • Dataset definition
        • Partitioned dataset credentials
      • Partitioned dataset load
      • Partitioned dataset save
      • Incremental loads with IncrementalDataSet
        • Incremental dataset load
        • Incremental dataset save
        • Incremental dataset confirm
        • Checkpoint configuration
        • Special checkpoint config keys
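The Data Catalog topics above revolve around YAML dataset declarations. As a hedged sketch (dataset names and file paths follow the spaceflights tutorial; adjust them to your own project), a minimal `conf/base/catalog.yml` might look like:

```yaml
# conf/base/catalog.yml — illustrative entries only
companies:
  type: pandas.CSVDataSet
  filepath: data/01_raw/companies.csv

shuttles:
  type: pandas.ExcelDataSet
  filepath: data/01_raw/shuttles.xlsx
  load_args:
    sheet_name: 0
```

Credentials, versioning, and `*_args` parameters extend these same entries, as described in the sections listed above.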

Nodes and pipelines

  • Nodes
    • How to create a node
      • Node definition syntax
      • Syntax for input variables
      • Syntax for output variables
    • How to tag a node
    • How to run a node
  • Pipelines
    • How to build a pipeline
      • How to tag a pipeline
      • How to merge multiple pipelines
      • Information about the nodes in a pipeline
      • Information about pipeline inputs and outputs
    • Bad pipelines
      • Pipeline with bad nodes
      • Pipeline with circular dependencies
  • Modular pipelines
    • What are modular pipelines?
    • How do I create a modular pipeline?
    • Recommendations
    • How to share a modular pipeline
      • Package a modular pipeline
        • Package multiple modular pipelines
      • Pull a modular pipeline
        • Pull multiple modular pipelines
    • A modular pipeline example template
      • Configuration
      • Datasets
    • How to connect existing pipelines
    • How to use a modular pipeline twice
    • How to use a modular pipeline with different parameters
    • How to clean up a modular pipeline
  • Run a pipeline
    • Runners
      • SequentialRunner
      • ParallelRunner
        • Multiprocessing
        • Multithreading
    • Custom runners
    • Load and save asynchronously
    • Run a pipeline by name
    • Run pipelines with IO
    • Output to a file
  • Slice a pipeline
    • Slice a pipeline by providing inputs
    • Slice a pipeline by specifying nodes
    • Slice a pipeline by specifying final nodes
    • Slice a pipeline with tagged nodes
    • Slice a pipeline by running specified nodes
    • How to recreate missing outputs
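Runners and pipeline slicing both rest on one idea: node execution order is derived from declared inputs and outputs, not written by hand. The sketch below is plain Python (no Kedro dependency) showing how such an order can be computed with a topological sort; the node and dataset names mirror the spaceflights tutorial but are illustrative.

```python
# Conceptual sketch: derive a node execution order from declared
# inputs/outputs, as a runner must do before executing a pipeline.
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

nodes = {
    "preprocess_companies": {"inputs": ["companies"], "outputs": ["preprocessed_companies"]},
    "preprocess_shuttles": {"inputs": ["shuttles"], "outputs": ["preprocessed_shuttles"]},
    "create_model_input": {
        "inputs": ["preprocessed_companies", "preprocessed_shuttles"],
        "outputs": ["model_input"],
    },
}

# Map each dataset to the node that produces it.
producers = {out: name for name, spec in nodes.items() for out in spec["outputs"]}

# A node depends on every node that produces one of its inputs;
# inputs with no producer (raw data) impose no ordering.
graph = {
    name: {producers[i] for i in spec["inputs"] if i in producers}
    for name, spec in nodes.items()
}

order = list(TopologicalSorter(graph).static_order())
print(order)  # create_model_input is always last
```

Slicing a pipeline (by inputs, nodes, or tags) amounts to selecting a subset of this dependency graph and re-deriving the order for it.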

Extend Kedro

  • Common use cases
    • Use Case 1: How to add extra behaviour to Kedro’s execution timeline
    • Use Case 2: How to integrate Kedro with additional data sources
    • Use Case 3: How to add CLI commands that are reusable across projects
    • Use Case 4: How to customise the initial boilerplate of your project
  • Hooks
    • Introduction
    • Concepts
      • Hook specification
        • Execution timeline Hooks
        • Registration Hooks
      • Hook implementation
        • Registering your Hook implementations with Kedro
        • Disable auto-registered plugins’ Hooks
    • Common use cases
      • Use Hooks to extend a node’s behaviour
      • Use Hooks to customise the dataset load and save methods
    • Under the hood
    • Hooks examples
      • Add memory consumption tracking
      • Add data validation
      • Add observability to your pipeline
      • Add metrics tracking to your model
      • Modify node inputs using before_node_run hook
  • Custom datasets
    • Scenario
    • Project setup
    • The anatomy of a dataset
    • Implement the _load method with fsspec
    • Implement the _save method with fsspec
    • Implement the _describe method
    • The complete example
    • Integration with PartitionedDataSet
    • Versioning
    • Thread-safety
    • How to handle credentials and different filesystems
    • How to contribute a custom dataset implementation
  • Kedro plugins
    • Overview
    • Example of a simple plugin
    • Working with click
    • Project context
    • Initialisation
    • global and project commands
    • Suggested command convention
    • Hooks
    • CLI Hooks
    • Contributing process
    • Supported Kedro plugins
    • Community-developed plugins
  • Create a Kedro starter
    • How to create a Kedro starter
    • Configuration variables
      • Example Kedro starter
  • Dataset transformers (deprecated)
    • Develop your own dataset transformer
  • Decorators (deprecated)
    • How to apply a decorator to nodes
    • How to apply multiple decorators to nodes
    • How to apply a decorator to a pipeline
    • Kedro decorators

Logging

  • Logging
    • Configure logging
    • Use logging
    • Logging for anyconfig
  • Experiment tracking
    • Enable experiment tracking
    • Community solutions

Development

  • Set up Visual Studio Code
    • Advanced: For those using venv / virtualenv
    • Setting up tasks
    • Debugging
      • Advanced: Remote Interpreter / Debugging
    • Configuring the Kedro catalog validation schema
  • Set up PyCharm
    • Set up Run configurations
    • Debugging
    • Advanced: Remote SSH interpreter
    • Advanced: Docker interpreter
    • Configure Python Console
    • Configuring the Kedro catalog validation schema
  • Kedro’s command line interface
    • Autocompletion (optional)
    • Invoke Kedro CLI from Python (optional)
    • Kedro commands
    • Global Kedro commands
      • Get help on Kedro commands
      • Confirm the Kedro version
      • Confirm Kedro information
      • Create a new Kedro project
      • Open the Kedro documentation in your browser
    • Project-specific Kedro commands
      • Project setup
        • Build the project’s dependency tree
        • Install all package dependencies
      • Run the project
        • Modifying a kedro run
      • Deploy the project
      • Pull a modular pipeline
      • Project quality
        • Build the project documentation
        • Lint your project
        • Test your project
      • Project development
        • Modular pipelines
        • Datasets
        • Data Catalog
        • Notebooks
  • Linting your Kedro project
  • Debugging
    • Introduction
    • Debugging Node
    • Debugging Pipeline

Deployment

  • Deployment guide
    • Deployment choices
  • Single-machine deployment
    • Container based
      • How to use container registry
    • Package based
    • CLI based
      • Use GitHub workflow to copy your project
      • Install and run the Kedro project
  • Distributed deployment
    • 1. Containerise the pipeline
    • 2. Convert your Kedro pipeline into targeted platform’s primitives
    • 3. Parameterise the runs
    • 4. (Optional) Create starters
  • Deployment with Argo Workflows
    • Why would you use Argo Workflows?
    • Prerequisites
    • How to run your Kedro pipeline using Argo Workflows
      • Containerise your Kedro project
      • Create Argo Workflows spec
      • Submit Argo Workflows spec to Kubernetes
      • Kedro-Argo plugin
  • Deployment with Prefect
    • Prerequisites
    • How to run your Kedro pipeline using Prefect
      • Convert your Kedro pipeline to Prefect flow
      • Run Prefect flow
  • Deployment with Kubeflow Pipelines
    • Why would you use Kubeflow Pipelines?
    • Prerequisites
    • How to run your Kedro pipeline using Kubeflow Pipelines
      • Containerise your Kedro project
      • Create a workflow spec
      • Authenticate Kubeflow Pipelines
      • Upload workflow spec and execute runs
  • Deployment with AWS Batch
    • Why would you use AWS Batch?
    • Prerequisites
    • How to run a Kedro pipeline using AWS Batch
      • Containerise your Kedro project
      • Provision resources
        • Create IAM Role
        • Create AWS Batch job definition
        • Create AWS Batch compute environment
        • Create AWS Batch job queue
      • Configure the credentials
      • Submit AWS Batch jobs
        • Create a custom runner
        • Set up Batch-related configuration
        • Update CLI implementation
      • Deploy
  • Deployment to a Databricks cluster
    • Prerequisites
    • Run the Kedro project with Databricks Connect
      • 1. Project setup
      • 2. Install dependencies and run locally
      • 3. Create a Databricks cluster
      • 4. Install Databricks Connect
      • 5. Configure Databricks Connect
      • 6. Copy local data into DBFS
      • 7. Run the project
    • Run Kedro project from a Databricks notebook
      • Extra requirements
      • 1. Create Kedro project
      • 2. Create GitHub personal access token
      • 3. Create a GitHub repository
      • 4. Push Kedro project to the GitHub repository
      • 5. Configure the Databricks cluster
      • 6. Run your Kedro project from the Databricks notebook
  • How to integrate Amazon SageMaker into your Kedro pipeline
    • Why would you use Amazon SageMaker?
    • Prerequisites
    • Prepare the environment
      • Install SageMaker package dependencies
      • Create SageMaker execution role
      • Create S3 bucket
    • Update the Kedro project
      • Create the configuration environment
      • Update the project hooks
      • Update the data science pipeline
        • Create node functions
        • Update the pipeline definition
      • Create the SageMaker entry point
    • Run the project
    • Cleanup
  • How to deploy your Kedro pipeline with AWS Step Functions
    • Why would you run a Kedro pipeline with AWS Step Functions?
    • Strategy
    • Prerequisites
    • Deployment process
      • Step 1. Create new configuration environment to prepare a compatible DataCatalog
      • Step 2. Package the Kedro pipeline as an AWS Lambda-compliant Docker image
      • Step 3. Write the deployment script
      • Step 4. Deploy the pipeline
    • Limitations
    • Final thought
  • How to deploy your Kedro pipeline on Apache Airflow with Astronomer
    • Strategy
    • Prerequisites
    • Project Setup
    • Deployment process
      • Step 1. Create new configuration environment to prepare a compatible DataCatalog
      • Step 2. Package the Kedro pipeline as an Astronomer-compliant Docker image
      • Step 3. Convert the Kedro pipeline into an Airflow DAG with kedro airflow
      • Step 4. Launch the local Airflow cluster with Astronomer
    • Final thought

Tools integration

  • Build a Kedro pipeline with PySpark
    • Centralise Spark configuration in conf/base/spark.yml
    • Initialise a SparkSession in custom project context class
    • Use Kedro’s built-in Spark datasets to load and save raw data
      • spark.DeltaTableDataSet
      • spark.SparkDataSet
      • spark.SparkJDBCDataSet
      • spark.SparkHiveDataSet
    • Spark and Delta Lake interaction
    • Use MemoryDataSet for intermediary DataFrame
    • Use MemoryDataSet with copy_mode="assign" for non-DataFrame Spark objects
    • Tips for maximising concurrency using ThreadRunner
  • Use Kedro with IPython and Jupyter Notebooks/Lab
    • Why use a Notebook?
    • Kedro and IPython
      • Load DataCatalog in IPython
        • Dataset versioning
    • Kedro and Jupyter
    • How to use context
      • Run the pipeline
      • Parameters
      • Load/Save DataCatalog in Jupyter
      • Additional parameters for session.run()
    • Global variables
    • Convert functions from Jupyter Notebooks into Kedro nodes
    • IPython extension
    • IPython loader
      • Installation
      • Prerequisites
      • Troubleshooting and FAQs
        • How can I stop my notebook terminating?
        • Why can’t I run kedro jupyter notebook?
        • How can I reload the session, context, catalog and startup_error variables?
      • Kedro-Viz and Jupyter

FAQs

  • Frequently asked questions
    • What is Kedro?
    • Who maintains Kedro?
    • What are the primary advantages of Kedro?
    • How does Kedro compare to other projects?
    • What is data engineering convention?
    • How do I upgrade Kedro?
    • How can I use a development version of Kedro?
    • How can I find out more about Kedro?
    • How can I cite Kedro?
    • How can I get my question answered?
  • Kedro architecture overview
    • Kedro project
    • Kedro starter
    • Kedro library
    • Kedro framework
    • Kedro extension
  • Kedro Principles
    • 1. Modularity at the core 📦
    • 2. Grow beginners into experts 🌱
    • 3. User empathy without unfounded assumptions 🤝
    • 4. Simplicity means bare necessities 🍞
    • 5. There should be one obvious way of doing things 🎯
    • 6. A sprinkle of magic is better than a spoonful of it ✨
    • 7. Lean process and lean product 👟

Resources

  • Images and icons
    • White background
      • Icon
      • Icon with text
    • Black background
      • Icon
      • Icon with text
  • Kedro glossary
    • Data Catalog
    • Data engineering vs Data science
    • Kedro
    • KedroContext
    • KedroSession
    • Kedro-Viz
    • Layers (data engineering convention)
    • Modular pipeline
    • Node
    • Node execution order
    • Pipeline
    • Pipeline slicing
    • Runner
    • Starters
    • Tags
    • Workflow dependencies

Contribute to Kedro

  • Introduction
  • Guidelines for contributing developers
    • Introduction
    • Before you start: development set up
    • Get started: areas of contribution
      • core contribution process
      • extras contribution process
    • Create a pull request
      • PEP-8 Standards (pylint and flake8)
      • Unit tests, 100% coverage (pytest, pytest-cov)
      • E2E tests (behave)
      • Others
      • Hints on pre-commit usage
    • Need help?
      • First timers only
      • How to contribute to an open source project on GitHub
  • Backwards compatibility & breaking changes
    • When should I make a breaking change?
    • The Kedro release model
  • Contribute to the Kedro documentation
    • How do I rebuild the documentation after I make changes to it?
      • Set up to build Kedro documentation
      • Build the documentation
    • Extend Kedro documentation
      • Add new pages
      • Move or remove pages
      • Create a pull request
      • Help!
    • Kedro documentation style guide
      • Language
      • Formatting
      • Links
      • Capitalisation
      • Bullets
      • Notes
      • Kedro lexicon
      • Style

API documentation

  • kedro
    • kedro.config
      • kedro.config.ConfigLoader
      • kedro.config.TemplatedConfigLoader
      • kedro.config.MissingConfigException
    • kedro.extras
      • kedro.extras.datasets
        • kedro.extras.datasets.api.APIDataSet
        • kedro.extras.datasets.biosequence.BioSequenceDataSet
        • kedro.extras.datasets.dask.ParquetDataSet
        • kedro.extras.datasets.email.EmailMessageDataSet
        • kedro.extras.datasets.geopandas.GeoJSONDataSet
        • kedro.extras.datasets.holoviews.HoloviewsWriter
        • kedro.extras.datasets.json.JSONDataSet
        • kedro.extras.datasets.matplotlib.MatplotlibWriter
        • kedro.extras.datasets.networkx.NetworkXDataSet
        • kedro.extras.datasets.pandas.CSVDataSet
        • kedro.extras.datasets.pandas.ExcelDataSet
        • kedro.extras.datasets.pandas.AppendableExcelDataSet
        • kedro.extras.datasets.pandas.FeatherDataSet
        • kedro.extras.datasets.pandas.GBQQueryDataSet
        • kedro.extras.datasets.pandas.GBQTableDataSet
        • kedro.extras.datasets.pandas.GenericDataSet
        • kedro.extras.datasets.pandas.HDFDataSet
        • kedro.extras.datasets.pandas.JSONDataSet
        • kedro.extras.datasets.pandas.ParquetDataSet
        • kedro.extras.datasets.pandas.SQLQueryDataSet
        • kedro.extras.datasets.pandas.SQLTableDataSet
        • kedro.extras.datasets.pickle.PickleDataSet
        • kedro.extras.datasets.pillow.ImageDataSet
        • kedro.extras.datasets.plotly.JSONDataSet
        • kedro.extras.datasets.plotly.PlotlyDataSet
        • kedro.extras.datasets.spark.DeltaTableDataSet
        • kedro.extras.datasets.spark.SparkDataSet
        • kedro.extras.datasets.spark.SparkHiveDataSet
        • kedro.extras.datasets.spark.SparkJDBCDataSet
        • kedro.extras.datasets.tensorflow.TensorFlowModelDataset
        • kedro.extras.datasets.text.TextDataSet
        • kedro.extras.datasets.tracking.JSONDataSet
        • kedro.extras.datasets.tracking.MetricsDataSet
        • kedro.extras.datasets.yaml.YAMLDataSet
      • kedro.extras.decorators
        • kedro.extras.decorators.memory_profiler
        • kedro.extras.decorators.retry_node
      • kedro.extras.extensions
        • kedro.extras.extensions.ipython
      • kedro.extras.logging
        • kedro.extras.logging.color_logger
      • kedro.extras.transformers
        • kedro.extras.transformers.memory_profiler
        • kedro.extras.transformers.time_profiler
    • kedro.framework
      • kedro.framework.cli
        • kedro.framework.cli.catalog
        • kedro.framework.cli.cli
        • kedro.framework.cli.hooks
        • kedro.framework.cli.jupyter
        • kedro.framework.cli.pipeline
        • kedro.framework.cli.project
        • kedro.framework.cli.registry
        • kedro.framework.cli.starters
        • kedro.framework.cli.utils
      • kedro.framework.context
        • kedro.framework.context.KedroContext
        • kedro.framework.context.KedroContextError
      • kedro.framework.hooks
        • kedro.framework.hooks.manager
        • kedro.framework.hooks.markers
        • kedro.framework.hooks.specs
      • kedro.framework.project
        • kedro.framework.project.configure_project
        • kedro.framework.project.validate_settings
      • kedro.framework.session
        • kedro.framework.session.session
        • kedro.framework.session.store
      • kedro.framework.startup
        • kedro.framework.startup.bootstrap_project
        • kedro.framework.startup.ProjectMetadata
    • kedro.io
      • kedro.io.AbstractDataSet
      • kedro.io.AbstractVersionedDataSet
      • kedro.io.AbstractTransformer
      • kedro.io.DataCatalog
      • kedro.io.LambdaDataSet
      • kedro.io.MemoryDataSet
      • kedro.io.PartitionedDataSet
      • kedro.io.IncrementalDataSet
      • kedro.io.CachedDataSet
      • kedro.io.DataCatalogWithDefault
      • kedro.io.Version
      • kedro.io.DataSetAlreadyExistsError
      • kedro.io.DataSetError
      • kedro.io.DataSetNotFoundError
    • kedro.pipeline
      • kedro.pipeline.node
      • kedro.pipeline.modular_pipeline.pipeline
      • kedro.pipeline.Pipeline
      • kedro.pipeline.node.Node
      • kedro.pipeline.decorators
        • kedro.pipeline.decorators.log_time
      • kedro.pipeline.modular_pipeline.ModularPipelineError
    • kedro.runner
      • kedro.runner.run_node
      • kedro.runner.AbstractRunner
      • kedro.runner.ParallelRunner
      • kedro.runner.SequentialRunner
      • kedro.runner.ThreadRunner
    • kedro.utils
      • kedro.utils.load_obj
    • kedro.versioning
      • kedro.versioning.journal
        • kedro.versioning.journal.Journal
        • kedro.versioning.journal.JournalFileHandler

Python Module Index

- kedro
    kedro.config
    kedro.extras
    kedro.extras.datasets
    kedro.extras.decorators
    kedro.extras.decorators.memory_profiler
    kedro.extras.decorators.retry_node
    kedro.extras.extensions
    kedro.extras.extensions.ipython
    kedro.extras.logging
    kedro.extras.logging.color_logger
    kedro.extras.transformers
    kedro.extras.transformers.memory_profiler
    kedro.extras.transformers.time_profiler
    kedro.framework
    kedro.framework.cli
    kedro.framework.cli.catalog
    kedro.framework.cli.cli
    kedro.framework.cli.hooks
    kedro.framework.cli.hooks.manager
    kedro.framework.cli.hooks.markers
    kedro.framework.cli.hooks.specs
    kedro.framework.cli.jupyter
    kedro.framework.cli.pipeline
    kedro.framework.cli.project
    kedro.framework.cli.registry
    kedro.framework.cli.starters
    kedro.framework.cli.utils
    kedro.framework.context
    kedro.framework.hooks
    kedro.framework.hooks.manager
    kedro.framework.hooks.markers
    kedro.framework.hooks.specs
    kedro.framework.project
    kedro.framework.session
    kedro.framework.session.session
    kedro.framework.session.store
    kedro.framework.startup
    kedro.io
    kedro.pipeline
    kedro.pipeline.decorators
    kedro.runner
    kedro.utils
    kedro.versioning
    kedro.versioning.journal

Revision 319a9177.

Built with Sphinx using a theme provided by Read the Docs.