Execution of the same workflow with different versions

I am wondering how workflow execution happens in the background in the following situation:

If I have a workflow with several versions (each version has its own input data table) and I start all the different versions at the same time, do all versions of this workflow execute in parallel or one after another?

Thanks in advance for your answers!

Hi,
I would assume that Spark does not distinguish between starting two different workflows and starting the same workflow with different versions. It makes no difference to Spark, as long as ONE DATA does not do some magic there. :slight_smile:


If you want to suppress the parallel execution of workflows, you can try out the isolation groups feature.


We did a small experiment and found that the data table save processor determines the fate of the execution. If we start the same WF (possibly different versions) at the same time, the runs execute in parallel when the data table save processor is set to "APPEND" mode. On the other hand, if the data table save processor is set to "REPLACE", execution happens one after another (only the saving part of the WF).

It is actually slightly different. If the workflows are triggered to run in parallel, each data table save performs its operations on the physical storage as soon as the Spark execution reaches that part of the execution plan.
This explicitly means that the save operations can occur in parallel. The important part is that the update of our metadata happens only after the operations on the physical filesystem have completed, and the last such update wins: it updates the reference to the physical representation.

For APPEND operations this does not matter, as at the end you get a physical representation that contains the appended rows from all APPEND operations. For REPLACE, however, only the physical representation that was written last is what you as a user see as the result of all the executions.
If you concurrently APPEND and REPLACE to the same data table, it is only guaranteed that the result is consistent from the perspective of one of the respective WFs' executions (the data from the APPEND *or* the data from the REPLACE). But it is guaranteed that the data is consistent, and not a wild mix from different executions (which required the implementation of an explicit wrapper for Postgres data tables).
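To make that concrete, here is a minimal, hypothetical sketch in plain Python (not ONE DATA's actual code) of the behaviour described above: physical writes happen as soon as execution reaches the save, the metadata reference is only swapped afterwards, and for REPLACE the last swap wins while APPENDs accumulate:

```python
import threading

# Hypothetical model of the described save semantics -- not ONE DATA's code.
metadata_lock = threading.Lock()
metadata = {}  # data table name -> reference to its physical representation
storage = {}   # physical location -> list of rows

def save_replace(table, location, rows):
    # The physical write happens as soon as execution reaches the save;
    # concurrent saves can run this step in parallel.
    storage[location] = list(rows)
    # Metadata is updated only after the physical write completes;
    # the last update wins and determines what users see.
    with metadata_lock:
        metadata[table] = location

def save_append(table, rows):
    # APPEND: every concurrent append ends up in the final physical
    # representation, so the order of concurrent saves does not matter.
    with metadata_lock:
        location = metadata.setdefault(table, "loc-append")
        storage.setdefault(location, []).extend(rows)

# Two concurrent REPLACE saves: both physical writes complete, but only
# the reference that was swapped in last is visible afterwards.
writers = [
    threading.Thread(target=save_replace, args=("sales", "loc-a", [1, 2])),
    threading.Thread(target=save_replace, args=("sales", "loc-b", [3, 4])),
]
for t in writers:
    t.start()
for t in writers:
    t.join()
print(storage[metadata["sales"]])  # rows from whichever REPLACE finished last
```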

Note also: the number of WFs that can actually execute in parallel is generally limited by the number of available workers and by the scheduling mode (FAIR vs. FIFO). Also, the number of concurrent WF executions from a single schedule is limited to two (due to a bug in the implementation; it should be 1).
As Kai noted, you can also explicitly prevent concurrency with isolation groups.
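For reference, in plain Spark the scheduling mode mentioned above is selected via the `spark.scheduler.mode` property (FIFO is the default); how ONE DATA exposes this setting may differ:

```python
from pyspark.sql import SparkSession

# FAIR scheduling lets concurrently submitted jobs share executors in a
# round-robin fashion instead of queueing behind each other (FIFO, the default).
spark = (
    SparkSession.builder
    .appName("concurrent-workflows")
    .config("spark.scheduler.mode", "FAIR")
    .getOrCreate()
)
```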
