kedro.runner.ParallelRunner¶
-
class
kedro.runner.
ParallelRunner
(max_workers=None, is_async=False)[source]¶ Bases:
kedro.runner.runner.AbstractRunner
ParallelRunner
is anAbstractRunner
implementation. It can be used to run thePipeline
in parallel groups formed by toposort.Attributes
Methods
Factory method for creating the default data set for the runner.
ParallelRunner.run
(pipeline, catalog[, run_id])Run the
Pipeline
using theDataSet``s provided by ``catalog
and save results back to the same objects.ParallelRunner.run_only_missing
(pipeline, …)Run only the missing outputs from the
Pipeline
using theDataSet``s provided by ``catalog
and save results back to the same objects.-
__init__
(max_workers=None, is_async=False)[source]¶ Instantiates the runner by creating a Manager.
- Parameters
max_workers (
Optional
[int
]) – Number of worker processes to spawn. If not set, calculated automatically based on the pipeline configuration and CPU core count. On windows machines, the max_workers value cannot be larger than 61 and will be set to min(61, max_workers).is_async (
bool
) – If True, the node inputs and outputs are loaded and saved asynchronously with threads. Defaults to False.
- Raises
ValueError – bad parameters passed
-
create_default_data_set
(ds_name)[source]¶ Factory method for creating the default data set for the runner.
- Parameters
ds_name (
str
) – Name of the missing data set- Return type
_SharedMemoryDataSet
- Returns
An instance of an implementation of _SharedMemoryDataSet to be used for all unregistered data sets.
-
run
(pipeline, catalog, run_id=None)¶ Run the
Pipeline
using theDataSet``s provided by ``catalog
and save results back to the same objects.- Parameters
pipeline (
Pipeline
) – ThePipeline
to run.catalog (
DataCatalog
) – TheDataCatalog
from which to fetch data.run_id (
Optional
[str
]) – The id of the run.
- Raises
ValueError – Raised when
Pipeline
inputs cannot be satisfied.- Return type
Dict
[str
,Any
]- Returns
Any node outputs that cannot be processed by the
DataCatalog
. These are returned in a dictionary, where the keys are defined by the node outputs.
-
run_only_missing
(pipeline, catalog)¶ Run only the missing outputs from the
Pipeline
using theDataSet``s provided by ``catalog
and save results back to the same objects.- Parameters
pipeline (
Pipeline
) – ThePipeline
to run.catalog (
DataCatalog
) – TheDataCatalog
from which to fetch data.
- Raises
ValueError – Raised when
Pipeline
inputs cannot be satisfied.- Return type
Dict
[str
,Any
]- Returns
Any node outputs that cannot be processed by the
DataCatalog
. These are returned in a dictionary, where the keys are defined by the node outputs.
-