Set up the spaceflights tutorial project
In this section, we discuss the project set-up phase, which is the first part of the standard development workflow. The setup steps are as follows:
Create a new project with kedro new
Install project dependencies with pip install -r src/requirements.txt
Configure the following in the conf folder:
Credentials and any other sensitive information
Create a new project
If you have not yet set up Kedro, do so by following the guidelines to install Kedro.
We recommend that you use the same version of Kedro that was most recently used to test this tutorial (0.18.4).
In your terminal window, navigate to the folder where you want to store the project and type the following to create an empty project:
kedro new
Alternatively, if you want to include a complete set of working example code within the project, generate the project from the Kedro starter for the spaceflights tutorial:
kedro new --starter=spaceflights
For either option, when prompted for a project name, enter Kedro Tutorial. When Kedro has created your project, you can navigate to the project root directory.
Kedro projects have a requirements.txt file that specifies their dependencies. This makes projects shareable by keeping Python packages and their versions consistent across environments.
The generic project template bundles some typical dependencies in src/requirements.txt. Here’s a typical example, although you may find that the version numbers differ slightly depending on your version of Kedro:
# code quality packages
black==22.1.0 # Used for formatting code with `kedro lint`
flake8>=3.7.9, <5.0 # Used for linting code with `kedro lint`
ipython==7.0 # Used for an IPython session with `kedro ipython`
isort~=5.0 # Used for linting code with `kedro lint`
nbstripout~=0.4 # Strips the output of a Jupyter Notebook and writes the outputless version to the original file

# notebook tooling
jupyter~=1.0 # Used to open a Kedro-session in Jupyter Notebook & Lab
jupyterlab~=3.0 # Used to open a Kedro-session in Jupyter Lab

# Pytest + useful extensions
pytest-cov~=3.0 # Produces test coverage reports
pytest-mock>=1.7.1, <2.0 # Wrapper around the mock package for easier use with pytest
pytest~=6.2 # Testing framework for Python code
You can learn more about project dependencies in the Kedro documentation.
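The listing above mixes exact pins (==), ranges (>=, <) and compatible-release specifiers (~=). As a rough illustration of how a specifier such as isort~=5.0 can be read, here is a minimal standard-library sketch. The helper names (parse_release, compatible_release_bounds, satisfies_compatible_release) are hypothetical and not part of Kedro or pip, and the sketch assumes plain numeric versions only; pip's own resolver handles many more cases.

```python
from typing import Tuple

Release = Tuple[int, ...]


def parse_release(version: str) -> Release:
    """Turn a plain numeric version such as '5.10.1' into (5, 10, 1)."""
    return tuple(int(part) for part in version.split("."))


def compatible_release_bounds(version: str) -> Tuple[Release, Release]:
    """Return (inclusive lower bound, exclusive upper bound) for '~=version'."""
    lower = parse_release(version)
    if len(lower) < 2:
        raise ValueError("'~=' needs at least two release segments, e.g. '5.0'")
    # Drop the last segment, then bump the new last segment by one:
    # ~=5.0 allows >=5.0 and <6, while ~=1.4.5 allows >=1.4.5 and <1.5.
    prefix = lower[:-1]
    upper = prefix[:-1] + (prefix[-1] + 1,)
    return lower, upper


def _pad(release: Release, width: int) -> Release:
    """Right-pad with zeros so that (5,) and (5, 0) compare as equal."""
    return release + (0,) * (width - len(release))


def satisfies_compatible_release(candidate: str, spec_version: str) -> bool:
    """Check whether 'candidate' satisfies the specifier '~=spec_version'."""
    lower, upper = compatible_release_bounds(spec_version)
    found = parse_release(candidate)
    width = max(len(found), len(lower), len(upper))
    return _pad(lower, width) <= _pad(found, width) < _pad(upper, width)


print(satisfies_compatible_release("5.10.1", "5.0"))  # True
print(satisfies_compatible_release("6.0", "5.0"))     # False
```

Read this way, isort~=5.0 accepts any 5.x release but not 6.0, which is why the versions that pip installs may differ slightly from one machine to another.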
Add dependencies to the project
The dependencies above might be sufficient for some projects, but for this tutorial, you must add some extra requirements. These requirements will enable us to work with different data formats (including CSV, Excel, and Parquet) and to visualise the pipeline.
If you are using the tutorial created by the spaceflights starter, you can omit the copy/paste, but it’s worth opening src/requirements.txt to inspect the contents.
Add the following lines to your src/requirements.txt file:
kedro[pandas.CSVDataSet, pandas.ExcelDataSet, pandas.ParquetDataSet]==0.18.4 # Specify optional Kedro dependencies
kedro-viz~=5.0 # Visualise your pipelines
scikit-learn~=1.0 # For modelling in the data science pipeline
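If you would rather script this step than paste the lines by hand, the following is one possible sketch that uses only the Python standard library. The function name add_missing_requirements and the constant TUTORIAL_REQUIREMENTS are hypothetical, not part of Kedro. The sketch appends only the lines that are not already in the file, so it is safe to run more than once.

```python
from pathlib import Path
from typing import Iterable, List

# Requirement lines quoted from this tutorial (trailing comments omitted).
TUTORIAL_REQUIREMENTS = [
    "kedro[pandas.CSVDataSet, pandas.ExcelDataSet, pandas.ParquetDataSet]==0.18.4",
    "kedro-viz~=5.0",
    "scikit-learn~=1.0",
]


def add_missing_requirements(path: Path, requirements: Iterable[str]) -> List[str]:
    """Append each requirement not already in the file; return the lines added."""
    existing_text = path.read_text(encoding="utf-8") if path.exists() else ""
    # Compare on the part of each line before any trailing '#' comment.
    existing = {line.split("#", 1)[0].strip() for line in existing_text.splitlines()}
    added = [req for req in requirements if req.strip() not in existing]
    if added:
        # Start on a fresh line if the file does not already end with a newline.
        needs_newline = bool(existing_text) and not existing_text.endswith("\n")
        with path.open("a", encoding="utf-8") as handle:
            handle.write(("\n" if needs_newline else "") + "\n".join(added) + "\n")
    return added
```

From the project root directory, you could call add_missing_requirements(Path("src/requirements.txt"), TUTORIAL_REQUIREMENTS) and inspect the returned list to see what was added.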
Install the dependencies
To install all the project-specific dependencies, run the following from the project root directory:
pip install -r src/requirements.txt
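After the install finishes, you may want to confirm that the tutorial's extra packages are present in the active environment. The following is a minimal sketch, assuming Python 3.8 or later (for importlib.metadata); the function name installed_versions is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(distributions: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    report: Dict[str, Optional[str]] = {}
    for name in distributions:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


# Distribution names taken from the requirements added in this tutorial.
for name, found in installed_versions(["kedro", "kedro-viz", "scikit-learn"]).items():
    print(f"{name}: {found or 'not installed'}")
```

A package reported as not installed usually means that the install command ran in a different virtual environment from the one you are checking.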
Optional: configuration and logging
You may want to store credentials such as usernames and passwords if they are needed for specific data sources used by the project.
To do this, add them to conf/local/credentials.yml (some examples are included in that file for illustration).
You can find additional information in the advanced documentation on configuration.
You might also want to set up logging at this stage of the workflow, but we do not use it in this tutorial.