Hey Lucas,
there are multiple possible limitations in play. In general, an upload goes through this route:
ingress -> reverse-proxy -> onedata-server -> faas-server -> function-registry
As for which ones could interfere here:
The first likely one is that the registry is set to only have a 10Gi persistent volume by default.
This is set by the helm chart, in the function registry section:
functionRegistry:
  persistence:
    size: 10Gi
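If that turns out to be the bottleneck, bumping the size in your values file and upgrading the release should be enough; the 50Gi below is just an illustrative number, and note that resizing an existing PVC requires the storage class to support volume expansion:

functionRegistry:
  persistence:
    size: 50Gi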
In the case of a Docker environment, I'm not quite sure whether any limit is set for that.
Generally speaking, each function takes up about 65-70M; that should give you an estimate of how much free space is left if you cannot check the volume's usage directly in your environment (at ~70M apiece, around 150 functions would already fill the 10Gi default).
The second limitation could be the overall NGINX ingress.
This is set manually per environment; from the looks of it the Hyper Hyper template uses 5G here, but I can see some environments using 10G. Additionally, the ingress has a time limit for any transfer as well, which seems to be set to somewhere between 8.5m and 2h depending on the environment. It is likely that you run out of time if yours is set to something like 30m.
These are set via the ingress annotations in the helm chart:
ingress:
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: 5G
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "510"
Again, for a Docker environment I am not quite sure how it is done, but it should be similar with the body size and timeouts; in that case the keys are client_max_body_size, proxy_send_timeout, and proxy_read_timeout.
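A minimal sketch of what that could look like in the proxy's nginx.conf, assuming the proxy forwards to the onedata-server (the upstream address and the values are illustrative, adjust them to your setup):

server {
    listen 80;

    location / {
        # allow request bodies up to 5G (0 disables the check entirely)
        client_max_body_size 5G;

        # give slow transfers an hour before nginx closes the connection
        proxy_send_timeout 3600s;
        proxy_read_timeout 3600s;

        # hypothetical upstream, point this at your onedata-server
        proxy_pass http://onedata-server:8080;
    }
}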
The third one I am not sure interferes here, but our reverse proxy solution with Traefik may close the connection as well.
In this case, the config map reverse-proxy-cm has the configuration, and for each middleware (faas-server-mw, function-registry-mw) extra options can be added. While nothing is set here by default, buffering.maxRequestBodyBytes would control the maximum size of the request, like:
http:
  middlewares:
    faas-server-mw:
      buffering:
        # Set ~4GiB for the max request size
        maxRequestBodyBytes: 4295000000
When used, Traefik will try to buffer the request in memory if there is space available; if not, it will use the filesystem to forward the data. This can add extra time to the whole process, as filesystems are generally slower.
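If that memory spill-over is a concern, the same buffering middleware also takes memRequestBodyBytes, which caps how much of the body Traefik holds in memory before falling back to disk (the values here are illustrative):

http:
  middlewares:
    faas-server-mw:
      buffering:
        # ~4GiB max request size, as above
        maxRequestBodyBytes: 4295000000
        # keep at most 512MiB in memory, anything larger spills to the filesystem
        memRequestBodyBytes: 536870912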
The timeout here can be set via the forwarding timeouts, but those do not seem to be set by default.
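If you want to rule that out, this is roughly what setting them could look like in the same file-provider config; note that the transport name faas-server-transport is hypothetical, and the service definition would still need to reference it via loadBalancer.serversTransport:

http:
  serversTransports:
    faas-server-transport:
      forwardingTimeouts:
        # time allowed to establish the connection to the backend
        dialTimeout: 30s
        # 0 disables the limit on waiting for the backend's response headers
        responseHeaderTimeout: 0s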
The fourth one is hard to track down, but each service has a memory request and can also have a limit. The request is usually around 1-2GiB. Without a limit, a service can grow until the host system runs out of memory; otherwise the limit is the maximum it can use.
What happens is that each chunk of data, or the whole request, is held in memory before it can be processed, so if any of these limits is reached it can break the flow.
The limits and requests can be set in the helm chart for each affected part of the system:
reverseProxy:
  resources:
    # here
onedata:
  server:
    resources:
      # here
functions:
  server:
    resources:
      # here
functionRegistry:
  resources:
    # here
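As an example, giving the faas-server more headroom could look like this (the numbers are illustrative, not a recommendation):

functions:
  server:
    resources:
      requests:
        memory: 2Gi
      limits:
        # enough room to buffer a multi-GB upload in memory
        memory: 6Gi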
To investigate your situation, can you please navigate to the functions panel, open the inspector (right-click -> inspect) on the Network tab, try uploading the environment again, save the request into a file, and send it to me, @lukas.mueller, or @claudiu.moldovan.
This is how it looks in Firefox, for example, but it is very similar in Chrome. With Safari, you would need to open the preferences and enable “Show Develop menu in menu bar” within the Advanced tab.
This would tell us what kind of error occurred, so we may be able to track it down right away.
Additionally, I am curious about your Dockerfile, as a 3.14GB image is quite big and I am not quite sure how it is possible to reach that just by adding Python packages.
Thank you for reporting this issue!