The following scenario:
The user did joins on quite large datatables and set the result table config to 999,999,999. The following error occurred: "Exception in processing: IllegalStateException: SparkContext has been shutdown", and the instances crashed.

- Did the error happen due to the huge amount of data in the Result Table processor?
- What is the maximum amount of data that can and should be displayed within the RT?
- Is it possible to set a server limit for result tables to prevent this scenario?
Hi Kristina,
- The reason the error happened can only be confirmed after a very close look at the logs of the ONE DATA server, but yes, such a scenario can lead to a SparkContext shutdown (possibly because Spark could not obtain the memory it needed and the whole instance ran out of memory).
- There is no specific maximum amount of data that can or should be displayed within the RT. It mainly depends on the number of columns, the average size of each record/row, and the memory and CPU available to Spark. Because result table results are saved to the database (if I remember correctly), the resources of the database server play a role here too.
- There is no dedicated property in the ONE DATA server to specify such a limit, but the Spark properties might help: Configuration - Spark 3.0.2 Documentation (please take care to look up the right Spark version). A small sketch follows below.
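For example, a cap on how much result data the driver will accept could look like the following in spark-defaults.conf. The values are purely illustrative, and whether ONE DATA passes these properties through to its Spark jobs is an assumption you would need to verify for your installation:

    # Abort an action (e.g. collecting result table rows) once the serialized
    # results exceed this size, instead of letting the driver run out of memory.
    spark.driver.maxResultSize  2g
    # Give the driver enough headroom for the results it is allowed to collect.
    spark.driver.memory         8g

With spark.driver.maxResultSize set, an oversized collect fails that one job with a descriptive error instead of taking the whole SparkContext (and possibly the instance) down with it.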
Best regards,
Jean Pierre
Just some additions to what JayP wrote:
- You should not use the result table for more than a few thousand rows; everything else should be put into datatables (a rough size estimate follows below):
  - the performance difference compared to Parquet and Postgres datatables is negligible
  - result tables heavily clutter the ONE DATA-internal database
  - large and/or many result tables cause your workflow jobs to take significantly longer to load (both on the server and the client)
- An upper limit on the number of collected rows is being considered, exactly for the reasons you described; the only problem is that it would be a breaking change.
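To put "a few thousand rows" into perspective with a rough, purely illustrative estimate (actual sizes depend heavily on the data types): 5,000 rows with 50 columns at roughly 20 bytes per value is only about 5 MB (5,000 x 50 x 20 bytes), which a result table handles comfortably. A 999,999,999-row result at the same width would be on the order of 1 TB that has to be collected to the Spark driver and stored in the internal database, which no realistic driver or database sizing will accommodate.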