Kedro starters are used to create projects that contain code to run as-is, or to adapt and extend. They provide pre-defined example code and configuration that can be reused, for example:
- As example code for a typical Kedro project
- To add a docker-compose setup to launch Kedro next to a monitoring stack
- To add deployment scripts and CI/CD setup for your targeted infrastructure
A Kedro starter is a Cookiecutter template that contains the boilerplate code for a Kedro project. You can create your own starters for reuse within a project or team, as described in the documentation about how to create a Kedro starter.
How to use Kedro starters
To create a Kedro project using a starter, apply the --starter flag to kedro new as follows:
kedro new --starter=<path-to-starter>
<path-to-starter> can be a local directory or a VCS repository, as long as Cookiecutter supports it.
For example, to create a project from the PySpark starter repository on GitHub:
kedro new --starter=https://github.com/quantumblacklabs/kedro-starter-pyspark.git
If no starter is provided to kedro new, the default Kedro template will be used, as documented in “Creating a new project”.
We provide aliases for common starters maintained by the Kedro team so that users don’t have to specify the full path. For example, to create a project using the pyspark alias:
kedro new --starter=pyspark
To list all the aliases we support:
kedro starter list
List of official starters
The Kedro team maintains the following starters to bootstrap new Kedro projects:
- mini-kedro: A minimal setup that uses the traditional Iris dataset with Kedro’s DataCatalog, a core component of Kedro. This starter is useful in the exploratory phase of a project. For more information, please read the Mini-Kedro guide.
- pandas-iris: The Kedro Iris dataset example project
- pyspark-iris: An alternative Kedro Iris dataset example, using PySpark
- pyspark: The configuration and initialisation code for a Kedro pipeline using PySpark
- spaceflights: The spaceflights tutorial example code
Each starter project encodes our recommended Kedro best practices.
By default, Kedro will use the latest version available in the repository, but if you want to use a specific version of a starter, you can pass a --checkout argument to the command as follows:
kedro new --starter=pyspark --checkout=0.1.0
The --checkout value points to a branch, tag or commit in the starter repository.
Under the hood, the value is passed to the --checkout flag in Cookiecutter.
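When project creation is scripted, it can help to assemble the command as an argument list before running it. The following is a minimal sketch, not part of Kedro: build_kedro_new_command is a hypothetical helper that only combines the --starter and --checkout flags described above and does not execute anything.

```python
import shlex
from typing import List, Optional


def build_kedro_new_command(starter: str, checkout: Optional[str] = None) -> List[str]:
    """Assemble the argument list for `kedro new` (hypothetical helper, not a Kedro API)."""
    # The starter can be an alias, a local directory or a VCS repository URL.
    command = ["kedro", "new", f"--starter={starter}"]
    if checkout is not None:
        # A branch, tag or commit in the starter repository.
        command.append(f"--checkout={checkout}")
    return command


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(build_kedro_new_command("pyspark", checkout="0.1.0")))
```

The returned list could then be passed to subprocess.run(command, check=True) on a machine where Kedro is installed.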
Use a starter in interactive mode
By default, when you create a new project using a starter, kedro new launches by asking a few questions. You will be prompted to provide the following variables:
- project_name: A human-readable name for your new project
- repo_name: A name for the directory that holds your project repository
- python_package: A Python package name for your project package (see Python package naming conventions)
This mode assumes that the starter doesn’t require any additional configuration variables.
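To illustrate how the three variables relate to each other, the sketch below derives candidate repo_name and python_package values from a project_name and checks the package name. Both suggest_names and is_valid_package_name are hypothetical helpers, and the rules they apply (hyphens for the directory, lowercase letters, digits and underscores for the package) are assumptions based on common Python naming conventions, not a description of what Kedro itself does.

```python
import re

# Assumed convention: lowercase letters, digits and underscores,
# not starting with a digit.
_PACKAGE_NAME = re.compile(r"^[a-z_][a-z0-9_]*$")


def suggest_names(project_name: str) -> dict:
    """Derive candidate repo_name and python_package values (illustrative only)."""
    cleaned = project_name.strip()
    # Split on whitespace, underscores and hyphens, dropping empty pieces.
    words = [w for w in re.split(r"[\s_-]+", cleaned.lower()) if w]
    return {
        "project_name": cleaned,
        "repo_name": "-".join(words),
        "python_package": "_".join(words),
    }


def is_valid_package_name(name: str) -> bool:
    """Check a name against the assumed package naming convention."""
    return bool(_PACKAGE_NAME.match(name))


if __name__ == "__main__":
    names = suggest_names("My PySpark Project")
    print(names)
    print(is_valid_package_name(names["python_package"]))
```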
Use a starter with a configuration file
Kedro also allows you to specify a configuration file to create a project. Use the --config flag alongside the starter as follows:
kedro new --config=my_kedro_pyspark_project.yml --starter=pyspark
This option is useful when the starter requires more configuration than is required by the interactive mode.
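As a rough illustration, a configuration file could be generated from a script. The sketch below renders a flat set of variables as simple YAML lines; render_config is a hypothetical helper, and the assumption that the file uses the same variable names as the interactive prompts (project_name, repo_name, python_package) should be checked against the starter you use, which may require further keys.

```python
from pathlib import Path


def render_config(variables: dict) -> str:
    """Render flat variables as YAML `key: "value"` lines (illustrative only)."""
    lines = []
    for key, value in variables.items():
        # Escape backslashes and double quotes so the quoted value stays valid YAML.
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{key}: "{escaped}"')
    return "\n".join(lines) + "\n"


def write_config(path: str, variables: dict) -> Path:
    """Write the rendered configuration to `path` and return it as a Path."""
    target = Path(path)
    target.write_text(render_config(variables), encoding="utf-8")
    return target


if __name__ == "__main__":
    # Assumed variable names, taken from the interactive prompts.
    example = {
        "project_name": "My PySpark Project",
        "repo_name": "my-pyspark-project",
        "python_package": "my_pyspark_project",
    }
    # Print rather than write, so running the sketch leaves no files behind.
    print(render_config(example), end="")
```

Calling write_config("my_kedro_pyspark_project.yml", example) would produce a file that can be passed to the --config flag shown above.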