kedro.pipeline.node

kedro.pipeline.node(func, inputs, outputs, *, name=None, tags=None)

Create a node in the pipeline by providing a function to be called, along with the names of the variables used as its inputs and/or outputs.

Parameters:
  • func (Callable) – The function that implements the node logic. It should have at least one input or output.
  • inputs (Union[None, str, List[str], Dict[str, str]]) – The name, or list of names, of the variables passed to the function as inputs. The number of names should match the number of arguments in the function definition. When a Dict[str, str] is provided, each key is a function argument name and each value is the name of the variable passed to that argument.
  • outputs (Union[None, str, List[str], Dict[str, str]]) – The name, or list of names, of the variables under which the function's return values are stored. The number of names should match the number of values the function returns. When a Dict[str, str] is provided, each key is a key of the dictionary the function returns and each value is the name of the variable that entry is stored under.
  • name (Optional[str]) – Optional node name used when displaying the node in logs or other visualisations.
  • tags (Optional[Iterable[str]]) – Optional set of tags to be applied to the node.
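The mapping rules for inputs and outputs can be sketched in plain Python. `call_node_function` below is a hypothetical, simplified helper written only for illustration; it is not part of Kedro's API and not how Kedro implements nodes. It assumes variables are held in an ordinary dict keyed by name.

```python
from typing import Any, Callable, Dict, List, Optional, Union

# Hypothetical alias mirroring the documented parameter types.
Names = Union[None, str, List[str], Dict[str, str]]


def call_node_function(
    func: Callable[..., Any],
    inputs: Names,
    outputs: Names,
    catalog: Dict[str, Any],
) -> Dict[str, Any]:
    """Simplified sketch of the documented mapping rules (not Kedro's code)."""
    # Bind inputs: None -> no arguments, str -> one positional argument,
    # list -> positional arguments in order, dict -> keyword arguments
    # where each key is an argument name and each value a variable name.
    if inputs is None:
        result = func()
    elif isinstance(inputs, str):
        result = func(catalog[inputs])
    elif isinstance(inputs, list):
        result = func(*(catalog[name] for name in inputs))
    else:
        result = func(**{arg: catalog[name] for arg, name in inputs.items()})

    # Map outputs: None -> nothing is stored, str -> the whole return value,
    # list -> one name per returned element, dict -> each key selects an
    # entry of the returned dict and each value names where it is stored.
    if outputs is None:
        return {}
    if isinstance(outputs, str):
        return {outputs: result}
    if isinstance(outputs, list):
        if len(outputs) != len(result):
            raise ValueError("number of output names must match returned values")
        return dict(zip(outputs, result))
    return {name: result[key] for key, name in outputs.items()}


# Usage with toy functions and plain Python data.
catalog: Dict[str, Any] = {"raw": [3, None, 1, None, 2]}


def drop_missing(values: List[Optional[int]]) -> Dict[str, List[int]]:
    return {"clean": [v for v in values if v is not None]}


def split_in_two(data: List[int]) -> List[List[int]]:
    half = (len(data) + 1) // 2
    return [data[:half], data[half:]]


catalog.update(call_node_function(drop_missing, "raw", {"clean": "clean_raw"}, catalog))
catalog.update(call_node_function(split_in_two, {"data": "clean_raw"}, ["train", "test"], catalog))
print(catalog["train"], catalog["test"])  # [3, 1] [2]
```

The dict form of `inputs` is what allows arguments to be bound by name rather than by position, and the dict form of `outputs` lets a function return a dictionary whose keys are renamed to variable names.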
Return type:

Node

Returns:

A Node object with mapped inputs, outputs and function.

Example:

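from typing import Dict, List

import pandas as pd
from kedro.pipeline import node


def clean_data(cars: pd.DataFrame,
               boats: pd.DataFrame) -> Dict[str, pd.DataFrame]:
    # Drop rows with missing values; the dict keys are the named outputs
    return dict(cars_df=cars.dropna(), boats_df=boats.dropna())


def halve_dataframe(data: pd.DataFrame) -> List[pd.DataFrame]:
    # Split the rows into two parts (the first part gets the extra row if odd)
    half = (len(data) + 1) // 2
    return [data.iloc[:half], data.iloc[half:]]


nodes = [
    # List inputs are passed positionally; dict outputs rename the returned keys
    node(clean_data,
         inputs=['cars2017', 'boats2017'],
         outputs=dict(cars_df='clean_cars2017',
                      boats_df='clean_boats2017')),
    # A single input name is passed as the only argument
    node(halve_dataframe,
         'clean_cars2017',
         ['train_cars2017', 'test_cars2017']),
    # Dict inputs bind the argument name 'data' to a variable name
    node(halve_dataframe,
         dict(data='clean_boats2017'),
         ['train_boats2017', 'test_boats2017'])
]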
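Because nodes are connected only through variable names, the `clean_cars2017` and `clean_boats2017` outputs of the first node are what allow the two halving nodes to run afterwards. The sketch below illustrates this with a hypothetical `run_in_dependency_order` helper (not a Kedro API) and plain-list stand-ins for the DataFrames, assuming a dict of named variables serves as the data store. It repeatedly runs whichever node has all of its inputs available.

```python
from typing import Any, Callable, Dict, List, Tuple, Union

# Hypothetical stand-in for a node: (function, inputs, outputs).
Names = Union[str, List[str], Dict[str, str]]
NodeSpec = Tuple[Callable[..., Any], Names, Names]


def _input_names(inputs: Names) -> List[str]:
    # Collect the variable names a node reads, whatever form inputs takes.
    if isinstance(inputs, str):
        return [inputs]
    if isinstance(inputs, dict):
        return list(inputs.values())
    return list(inputs)


def run_in_dependency_order(nodes: List[NodeSpec],
                            catalog: Dict[str, Any]) -> List[str]:
    """Run each node once all of its inputs exist in catalog (hypothetical)."""
    pending = list(nodes)
    executed: List[str] = []
    while pending:
        # Find the first pending node whose inputs are all available.
        index = next((i for i, spec in enumerate(pending)
                      if all(name in catalog for name in _input_names(spec[1]))),
                     None)
        if index is None:
            raise RuntimeError("remaining nodes have inputs nothing produces")
        func, inputs, outputs = pending.pop(index)
        # Dict inputs become keyword arguments; str/list inputs are positional.
        if isinstance(inputs, dict):
            result = func(**{arg: catalog[name] for arg, name in inputs.items()})
        else:
            result = func(*(catalog[name] for name in _input_names(inputs)))
        # Store the return value under the declared output names.
        if isinstance(outputs, str):
            catalog[outputs] = result
        elif isinstance(outputs, dict):
            catalog.update({name: result[key] for key, name in outputs.items()})
        else:
            catalog.update(zip(outputs, result))
        executed.append(func.__name__)
    return executed


# Plain-list stand-ins for the example's DataFrame functions.
def clean_data(cars: List[Any], boats: List[Any]) -> Dict[str, List[Any]]:
    return dict(cars_df=[c for c in cars if c is not None],
                boats_df=[b for b in boats if b is not None])


def halve(data: List[Any]) -> List[List[Any]]:
    half = (len(data) + 1) // 2
    return [data[:half], data[half:]]


# Deliberately listed with the consumers before the producer.
nodes: List[NodeSpec] = [
    (halve, "clean_cars2017", ["train_cars2017", "test_cars2017"]),
    (halve, {"data": "clean_boats2017"}, ["train_boats2017", "test_boats2017"]),
    (clean_data, ["cars2017", "boats2017"],
     {"cars_df": "clean_cars2017", "boats_df": "clean_boats2017"}),
]
catalog: Dict[str, Any] = {"cars2017": [1, None, 2, 3], "boats2017": [None, 4, 5]}
order = run_in_dependency_order(nodes, catalog)
print(order)  # ['clean_data', 'halve', 'halve']
print(catalog["train_cars2017"], catalog["test_cars2017"])  # [1, 2] [3]
```

Kedro derives node order from the same input and output names when nodes are assembled into a pipeline, so the order in which nodes are listed does not matter; this helper only mimics that idea in a few lines.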