kedro.runner.ParallelRunner

class kedro.runner.ParallelRunner(max_workers=None, is_async=False)[source]

ParallelRunner is an AbstractRunner implementation. It can be used to run the Pipeline in parallel groups formed by toposort. Please note that this runner implementation validates dataset using the _validate_catalog method, which checks if any of the datasets are single process only using the _SINGLE_PROCESS dataset attribute.

Methods

create_default_data_set(ds_name)

Factory method for creating the default data set for the runner.

run(pipeline, catalog[, run_id])

Run the Pipeline using the DataSet``s provided by ``catalog and save results back to the same objects.

run_only_missing(pipeline, catalog)

Run only the missing outputs from the Pipeline using the DataSet``s provided by ``catalog and save results back to the same objects.

__init__(max_workers=None, is_async=False)[source]

Instantiates the runner by creating a Manager.

Parameters
  • max_workers (Optional[int]) – Number of worker processes to spawn. If not set, calculated automatically based on the pipeline configuration and CPU core count. On windows machines, the max_workers value cannot be larger than 61 and will be set to min(61, max_workers).

  • is_async (bool) – If True, the node inputs and outputs are loaded and saved asynchronously with threads. Defaults to False.

Raises

ValueError – bad parameters passed

create_default_data_set(ds_name)[source]

Factory method for creating the default data set for the runner.

Parameters

ds_name (str) – Name of the missing data set

Return type

_SharedMemoryDataSet

Returns

An instance of an implementation of _SharedMemoryDataSet to be used for all unregistered data sets.

run(pipeline, catalog, run_id=None)

Run the Pipeline using the DataSet``s provided by ``catalog and save results back to the same objects.

Parameters
  • pipeline (Pipeline) – The Pipeline to run.

  • catalog (DataCatalog) – The DataCatalog from which to fetch data.

  • run_id (Optional[str]) – The id of the run.

Raises

ValueError – Raised when Pipeline inputs cannot be satisfied.

Return type

Dict[str, Any]

Returns

Any node outputs that cannot be processed by the DataCatalog. These are returned in a dictionary, where the keys are defined by the node outputs.

run_only_missing(pipeline, catalog)

Run only the missing outputs from the Pipeline using the DataSet``s provided by ``catalog and save results back to the same objects.

Parameters
  • pipeline (Pipeline) – The Pipeline to run.

  • catalog (DataCatalog) – The DataCatalog from which to fetch data.

Raises

ValueError – Raised when Pipeline inputs cannot be satisfied.

Return type

Dict[str, Any]

Returns

Any node outputs that cannot be processed by the DataCatalog. These are returned in a dictionary, where the keys are defined by the node outputs.