Isolation Group and Python Processor

Dear *,

We are using isolation groups for our production line, and within the production line there are quite a few Python processors. We are running into the issue that the Python processors hit timeouts when there are many runs in the queue.

I wonder: when isolation groups are used, do runs that are still waiting in the queue already reserve Python connections?

To rephrase: how do isolation groups work exactly? Is a workflow started and then made to wait until the preceding one has finished, so that all processors have already started and are waiting at some point (especially important for Python / R and Flexible REST API processors)?


When you add workflows to the same isolation group, the feature ensures that these workflows are executed one after the other by adding them to an internal queue. The feature prioritizes maximum job throughput over fairness.
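As a purely conceptual sketch of that queuing behavior (not the platform's actual implementation): each isolation group acts like a FIFO queue with a single worker, so a workflow's processors do not start, and do not claim any resources, while its job is still waiting in the queue.

```python
import queue
import threading

# Conceptual sketch only -- not the platform's actual implementation.
# One FIFO queue per isolation group; a single worker drains it, so
# workflows in the same group run strictly one after the other.
class IsolationGroup:
    def __init__(self, name):
        self.name = name
        self._jobs = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, workflow_fn):
        # A queued job only holds a reference to the workflow; none of its
        # processors start executing (or allocate connections) while waiting.
        self._jobs.put(workflow_fn)

    def _run(self):
        while True:
            workflow_fn = self._jobs.get()
            try:
                workflow_fn()  # the previous job must finish before this starts
            finally:
                self._jobs.task_done()


group = IsolationGroup("production-line")
group.submit(lambda: print("workflow A"))
group.submit(lambda: print("workflow B"))  # starts only after A has finished
```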

You can add one or more workflows to an isolation group by adding a tag with the prefix "!isolation:" to a workflow. Be aware that the prefix must be written in lower case.
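For example (the group name here is made up for illustration), tagging two workflows with `!isolation:production-line` should place them in the same isolation group, so their runs queue up behind each other; note the lower-case prefix.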

Isolation groups should behave the same for all processors; there is no special treatment for Python processors.


@kristina.dess to add to @1Peter 's statement, I can confirm that there is no such thing as “reserving Python or other execution environments upfront”. The resources (i.e. the execution environment, or more precisely the connection to the execution environment) are allocated only once the Python processor reaches its execute state. This does not depend on the execution context of the WF itself (microservice, manual execution, PL execution, scheduled execution, or isolation group). Moreover, the timeout only starts ticking after the script has been sent to the execution environment controller; it does, however, include the time spent waiting for the connection. This is partly a safeguard to avoid congestion of the system due to a lack of Python resources.
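To make that timing explicit, here is a minimal sketch (function and class names are made up; this is not the real controller API): the connection is requested only when the processor executes, the timeout window opens once the script has been handed to the controller, and the wait for a free connection counts against that same window.

```python
import time

class PythonProcessorTimeout(Exception):
    pass

# Names are illustrative only; this is not the actual platform API.
def execute_python_processor(script, controller, timeout_s):
    # Nothing is reserved before this point: the processor asks for a
    # connection only when it reaches its execute state.
    controller.submit(script)                 # script handed to the controller
    deadline = time.monotonic() + timeout_s   # timeout starts ticking here

    # Waiting for a free execution-environment connection counts
    # against the same deadline ...
    conn = controller.acquire_connection(until=deadline)
    if conn is None:
        raise PythonProcessorTimeout("no connection became free in time")

    # ... and so does the actual script run.
    return conn.run(script, until=deadline)
```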

If you have trouble with timeouts caused by Python processors running in parallel, isolation groups can indeed come to the rescue (by queuing the jobs of WFs with Python processors in them until the previous executions have finished).

Another option is to have the limit on the number of Python scripts running in parallel increased. Of course, this also means that the Python execution environments have to be given more resources to cope with the higher demand.
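The effect of that limit can be sketched roughly like this (a generic semaphore illustration; the real limit is a platform setting, not something you set in a script): only a fixed number of scripts run at once, everything else waits for a slot, and that waiting eats into the processor timeout, so a higher limit means less waiting but more load on the execution environments.

```python
import threading

# Generic illustration of a parallelism limit; the real limit is a
# platform setting, not something you configure in a script like this.
MAX_PARALLEL_PYTHON_SCRIPTS = 4            # raising this reduces queuing ...
slots = threading.BoundedSemaphore(MAX_PARALLEL_PYTHON_SCRIPTS)

def run_script(script, timeout_s):
    # Waiting for a free slot counts toward the timeout, so with many
    # concurrent runs and a low limit, scripts start timing out.
    if not slots.acquire(timeout=timeout_s):
        raise TimeoutError("no free Python execution slot in time")
    try:
        return script()                    # ... but more parallel scripts also
    finally:                               # need more resources on the
        slots.release()                    # execution environments
```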

And yet another option is to avoid externally executed scripts in the first place. I know that it sometimes seems handier to have a Python script do the mundane tasks, since it offers better tools for control flow. However, each Python script execution means a change in data arrangement: to give the user the flexibility mentioned above, the data has to be (materialized and) collected so it can be supplied to the Python script. This harms performance and produces IO overhead. Moreover, you then have to take care of data arrangement, memory and parallelism yourself, things that Spark usually manages for you.
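As a generic illustration of that trade-off (assuming the data lives in a Spark DataFrame; the path and column names are made up): a simple transformation expressed directly in Spark stays distributed, whereas routing the same logic through an external Python script forces the data to be collected and re-arranged first.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("orders.parquet")   # path and columns made up for illustration

# Kept in Spark: the engine handles data arrangement, memory and parallelism.
with_margin = df.withColumn("margin", F.col("revenue") - F.col("cost"))

# Pushed to an external Python script instead, the same logic forces the
# data to be materialized and collected first (IO overhead, single machine):
rows = df.select("revenue", "cost").toPandas()   # full collect to the driver
rows["margin"] = rows["revenue"] - rows["cost"]
```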