kedro.pipeline.node

kedro.pipeline.node(func, inputs, outputs, *, name=None, tags=None, confirms=None, namespace=None)

Create a node in the pipeline by providing a function to be called along with variable names for inputs and/or outputs.

Parameters:
  • func (Callable) – A function that corresponds to the node logic. The function should have at least one input or output.

  • inputs (str | list[str] | dict[str, str] | None) – The name, or list of names, of the variables used as inputs to the function. The number of names should match the number of arguments in the definition of the provided function. When a dict[str, str] is provided, its keys are the function's argument names and its values are the variable names passed to those arguments.

  • outputs (str | list[str] | dict[str, str] | None) – The name, or list of names, of the variables used as outputs of the function. The number of names should match the number of outputs returned by the provided function. When a dict[str, str] is provided, its keys are the named outputs the function returns and its values are the variable names they are stored under.

  • name (str | None) – Optional node name, used when displaying the node in logs and visualisations.

  • tags (str | Iterable[str] | None) – Optional set of tags to be applied to the node.

  • confirms (str | list[str] | None) – Optional name, or list of names, of the datasets that should be confirmed. This results in a call to the confirm() method of the corresponding dataset instance. The specified dataset names do not need to be present in the node inputs or outputs.

  • namespace (str | None) – Optional node namespace.

Return type:

Node

Returns:

A Node object with mapped inputs, outputs and function.

Example:

import pandas as pd
import numpy as np
from kedro.pipeline import node

def clean_data(cars: pd.DataFrame,
               boats: pd.DataFrame) -> dict[str, pd.DataFrame]:
    return dict(cars_df=cars.dropna(), boats_df=boats.dropna())

def halve_dataframe(data: pd.DataFrame) -> list[pd.DataFrame]:
    return np.array_split(data, 2)

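nodes = [
    # Inputs given as a list are passed positionally; the outputs dict maps
    # the keys of the returned dict to dataset names.
    node(clean_data,
         inputs=['cars2017', 'boats2017'],
         outputs=dict(cars_df='clean_cars2017',
                      boats_df='clean_boats2017')),
    # A single input given as a string; list outputs are matched by position.
    node(halve_dataframe,
         'clean_cars2017',
         ['train_cars2017', 'test_cars2017']),
    # The inputs dict maps the argument name `data` to a dataset name.
    node(halve_dataframe,
         dict(data='clean_boats2017'),
         ['train_boats2017', 'test_boats2017'])
]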
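
The mapping rules for inputs and outputs described under Parameters can be illustrated without Kedro installed. The following is a minimal pure-Python sketch, not Kedro's implementation: run_like_a_node, the catalog dict, and the three small functions are hypothetical names invented for this illustration. It only mimics how the str, list, and dict forms might be resolved against a plain dictionary of named values.

```python
from typing import Any, Callable


def run_like_a_node(func: Callable, inputs, outputs, data: dict) -> dict:
    """Hypothetical helper (not part of Kedro): resolve `inputs` against
    `data`, call `func`, and return the results keyed by `outputs`."""
    # Resolve inputs: None -> no arguments, str -> one positional argument,
    # dict -> keyword arguments (keys are argument names),
    # list -> positional arguments in order.
    if inputs is None:
        result = func()
    elif isinstance(inputs, str):
        result = func(data[inputs])
    elif isinstance(inputs, dict):
        result = func(**{arg: data[name] for arg, name in inputs.items()})
    else:
        result = func(*[data[name] for name in inputs])

    # Resolve outputs: None -> nothing stored, str -> the whole return value,
    # dict -> pick named entries from the returned dict,
    # list -> match the returned sequence by position.
    if outputs is None:
        return {}
    if isinstance(outputs, str):
        return {outputs: result}
    if isinstance(outputs, dict):
        return {name: result[key] for key, name in outputs.items()}
    values = list(result)
    if len(values) != len(outputs):
        raise ValueError("number of output names must match returned values")
    return dict(zip(outputs, values))


def split_name(full_name: str) -> dict[str, str]:
    first, last = full_name.split(" ", 1)
    return {"first": first, "last": last}


def join(a: str, b: str) -> str:
    return f"{a} {b}"


def swap(a: Any, b: Any) -> tuple[Any, Any]:
    return b, a


catalog = {"raw_name": "Ada Lovelace"}
# str input, dict outputs
catalog.update(run_like_a_node(
    split_name, "raw_name",
    {"first": "first_name", "last": "last_name"}, catalog))
# dict inputs, str output
catalog.update(run_like_a_node(
    join, {"a": "last_name", "b": "first_name"}, "reversed_name", catalog))
# list inputs, list outputs
catalog.update(run_like_a_node(
    swap, ["first_name", "last_name"], ["x", "y"], catalog))
```

In this sketch the dict forms decouple the names used inside the function from the names used in the surrounding pipeline, which is the same reason the example above passes dict(data='clean_boats2017') rather than renaming the function's argument.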