kedro.runner.ParallelRunner

class kedro.runner.ParallelRunner(max_workers=None, is_async=False)[source]

Bases: kedro.runner.runner.AbstractRunner

ParallelRunner is an AbstractRunner implementation. It can be used to run the Pipeline in parallel groups formed by toposort.

Methods

ParallelRunner.__init__([max_workers, is_async]) Instantiates the runner by creating a Manager.
ParallelRunner.create_default_data_set(ds_name) Factory method for creating the default data set for the runner.
ParallelRunner.run(pipeline, catalog[, run_id]) Run the Pipeline using the DataSet``s provided by ``catalog and save results back to the same objects.
ParallelRunner.run_only_missing(pipeline, …) Run only the missing outputs from the Pipeline using the DataSet``s provided by ``catalog and save results back to the same objects.
__init__(max_workers=None, is_async=False)[source]

Instantiates the runner by creating a Manager.

Parameters:
  • max_workers (Optional[int]) – Number of worker processes to spawn. If not set, calculated automatically based on the pipeline configuration and CPU core count.
  • is_async (bool) – If True, the node inputs and outputs are loaded and saved asynchronously with threads. Defaults to False.
Raises:

ValueError – bad parameters passed

create_default_data_set(ds_name)[source]

Factory method for creating the default data set for the runner.

Parameters:ds_name (str) – Name of the missing data set
Return type:_SharedMemoryDataSet
Returns:An instance of an implementation of _SharedMemoryDataSet to be used for all unregistered data sets.
run(pipeline, catalog, run_id=None)

Run the Pipeline using the DataSet``s provided by ``catalog and save results back to the same objects.

Parameters:
  • pipeline (Pipeline) – The Pipeline to run.
  • catalog (DataCatalog) – The DataCatalog from which to fetch data.
  • run_id (Optional[str]) – The id of the run.
Raises:

ValueError – Raised when Pipeline inputs cannot be satisfied.

Return type:

Dict[str, Any]

Returns:

Any node outputs that cannot be processed by the DataCatalog. These are returned in a dictionary, where the keys are defined by the node outputs.

run_only_missing(pipeline, catalog)

Run only the missing outputs from the Pipeline using the DataSet``s provided by ``catalog and save results back to the same objects.

Parameters:
  • pipeline (Pipeline) – The Pipeline to run.
  • catalog (DataCatalog) – The DataCatalog from which to fetch data.
Raises:

ValueError – Raised when Pipeline inputs cannot be satisfied.

Return type:

Dict[str, Any]

Returns:

Any node outputs that cannot be processed by the DataCatalog. These are returned in a dictionary, where the keys are defined by the node outputs.