kedro.runner.ThreadRunner¶
-
class
kedro.runner.
ThreadRunner
(max_workers=None, is_async=False)[source]¶ Bases:
kedro.runner.runner.AbstractRunner
ThreadRunner
is anAbstractRunner
implementation. It can be used to run thePipeline
in parallel groups formed by toposort using threads.Methods
ThreadRunner.__init__
([max_workers, is_async])Instantiates the runner. ThreadRunner.create_default_data_set
(ds_name)Factory method for creating the default data set for the runner. ThreadRunner.run
(pipeline, catalog[, run_id])Run the Pipeline
using theDataSet``s provided by ``catalog
and save results back to the same objects.ThreadRunner.run_only_missing
(pipeline, catalog)Run only the missing outputs from the Pipeline
using theDataSet``s provided by ``catalog
and save results back to the same objects.-
__init__
(max_workers=None, is_async=False)[source]¶ Instantiates the runner.
Parameters: - max_workers (
Optional
[int
]) – Number of worker processes to spawn. If not set, calculated automatically based on the pipeline configuration and CPU core count. - is_async (
bool
) – If True, set to False, because ThreadRunner doesn’t support loading and saving the node inputs and outputs asynchronously with threads. Defaults to False.
Raises: ValueError
– bad parameters passed- max_workers (
-
create_default_data_set
(ds_name)[source]¶ Factory method for creating the default data set for the runner.
Parameters: ds_name ( str
) – Name of the missing data setReturn type: AbstractDataSet
Returns: An instance of an implementation of AbstractDataSet to be used for all unregistered data sets.
-
run
(pipeline, catalog, run_id=None)¶ Run the
Pipeline
using theDataSet``s provided by ``catalog
and save results back to the same objects.Parameters: - pipeline (
Pipeline
) – ThePipeline
to run. - catalog (
DataCatalog
) – TheDataCatalog
from which to fetch data. - run_id (
Optional
[str
]) – The id of the run.
Raises: ValueError
– Raised whenPipeline
inputs cannot be satisfied.Return type: Dict
[str
,Any
]Returns: Any node outputs that cannot be processed by the
DataCatalog
. These are returned in a dictionary, where the keys are defined by the node outputs.- pipeline (
-
run_only_missing
(pipeline, catalog)¶ Run only the missing outputs from the
Pipeline
using theDataSet``s provided by ``catalog
and save results back to the same objects.Parameters: - pipeline (
Pipeline
) – ThePipeline
to run. - catalog (
DataCatalog
) – TheDataCatalog
from which to fetch data.
Raises: ValueError
– Raised whenPipeline
inputs cannot be satisfied.Return type: Dict
[str
,Any
]Returns: Any node outputs that cannot be processed by the
DataCatalog
. These are returned in a dictionary, where the keys are defined by the node outputs.- pipeline (
-