Parallel Python Executions

Is there any (soft or hard) limit on parallel Python executions (ONE DATA Server config property "pythonService.allowedConnections")?

Are there any risks when setting it too high? What would be the worst thing to happen?

There are no limits as far as I know. It really depends on the available memory and CPU, and on what the Python Processors are used for. There is no official rule of thumb or anything like that.

The risks really depend on the setup:

  • If ONE DATA Server and PyData share resources (memory & CPU), a crash of PyData due to a high number of parallel executions might also bring down ONE DATA Server (& other services).
  • If they do not share resources, PyData might crash on its own; ONE DATA Server would probably notice the crash (when it does not receive a response when calling /info on PyData) and fail the WF for which it ran the Python Execution.

Is there maybe also a possibility to monitor PyData, or to monitor how many Python scripts are executed in parallel?
E.g. is it possible to get this info from the OD API?

If you are running PyData in a Docker container, you might be able to monitor the resources used by PyData (if you have DevOps access), but that won’t tell you how many scripts are executed in parallel.
There is no OD API endpoint that returns such info, but such a feature should be quite feasible.

Additionally, I forgot to mention this before: “pythonService.allowedConnections” is not the only property that should be configured. There is also a property on PyData’s side to throttle parallel executions, called “max-concurrent-containers”.

Is this actually still in use? It sounds as if this could be a relict of the “old” PyData that spawned a container for each executed Python Processor; the recent version does not do that anymore (in that case the setting would most likely be named something like “max-concurrent-subprocesses”, if it is still in use).

Hi @christoph.schober , yes, that property was used in the past to limit the number of containers created by PyData. However, as you said, PyData now creates subprocesses instead. The property was not renamed, since doing so would break backward compatibility; the same property is now used to limit the creation of subprocesses.
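Conceptually, a cap like “max-concurrent-containers” acts as a semaphore around subprocess creation: further executions block until a slot frees up. A minimal illustrative sketch (this is not PyData’s actual implementation; the cap value and helper names are made up here):

```python
import subprocess
import sys
import threading
from concurrent.futures import ThreadPoolExecutor

# Illustrative cap, analogous to max-concurrent-containers (value assumed)
MAX_CONCURRENT = 2
_slots = threading.Semaphore(MAX_CONCURRENT)

# Bookkeeping to demonstrate that the cap is respected
_lock = threading.Lock()
_running = 0
peak = 0

def run_script(code: str) -> int:
    """Run a Python snippet in a subprocess, never exceeding the cap."""
    global _running, peak
    with _slots:  # blocks while MAX_CONCURRENT subprocesses are active
        with _lock:
            _running += 1
            peak = max(peak, _running)
        try:
            return subprocess.run([sys.executable, "-c", code]).returncode
        finally:
            with _lock:
                _running -= 1

# Eight worker threads submit six executions, but at most two
# subprocesses ever run at the same time.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(run_script, ["import time; time.sleep(0.1)"] * 6))
```

After the run, `peak` stays at or below `MAX_CONCURRENT`, which is the effect the PyData-side throttle has on parallel Python executions.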


One additional note: For setting max-concurrent-containers, the environment variable ALLOWED_CONNECTIONS needs to be set (e.g. in the environment section of the pydata component in the helm chart override file).
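For illustration, such an override could look like the following sketch (the component name, structure, and value are assumptions — check them against your actual Helm chart values file):

```yaml
# Hypothetical Helm chart override snippet (structure assumed)
pydata:
  environment:
    # Must be set for max-concurrent-containers to take effect
    ALLOWED_CONNECTIONS: "10"
```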


One additional note: Setting the environment variable ALLOWED_CONNECTIONS for PyData only takes effect starting from PyData version 1.5.1.
