As far as I know, ONE DATA stores a Data Table's meta information (name, creator, timestamps, AA rules, etc.), as well as references (file names/paths/locations) to the parquet files containing the Data Table's records, in the (Postgres) database. The actual data (the records) is stored in the parquet files (one per partition) on the filesystem. Please correct me if I am wrong here.
What happens if I execute a workflow that loads a Data Table, but the parquet files on the filesystem cannot be found (e.g. because the files were removed manually/externally)?
You should receive an error that informs you about the missing file. Before the actual loading, a validation step performs such sanity checks.
If the whole Data Table was deleted, you will get an error at the Load Processor.
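Purely as an illustration (this is not ONE DATA code), a cheap pre-load sanity check of that kind could look roughly like the sketch below, assuming the Data Table's metadata resolves to a base directory on the filesystem:

```python
import os

def validate_table_storage(table_root: str) -> None:
    """Illustrative sketch of a cheap sanity check a Load Processor could run
    before any Spark job is started. `table_root` is a hypothetical base
    directory resolved from the Data Table's metadata, not an actual ONE DATA
    API. Individual partition files are deliberately not checked here - see
    the note on Spark's laziness further down.
    """
    if not os.path.isdir(table_root):
        raise FileNotFoundError(f"Data Table storage not found: {table_root}")
```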
If only parts of the data (certain partitions) are missing (for whatever reason - mostly a faulty FS or concurrent deletions, the latter of which can be countered with isolation groups), you will get an error farther down the workflow, namely at the first Spark Action that triggers materialization of the data and therefore the loading of the physical files.
While the error at the Load Processor is quite user-friendly, the messages for missing parts can be a bit more confusing (but they still clearly state that a file was missing).
The reason behind this is Spark's laziness - it does not check everything at the beginning, but only when the actual data has to be loaded. This makes sense especially for large Data Tables with many partitions, since an upfront check of every partition would involve I/O overhead proportional to the partition count.
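To make the laziness concrete, here is a small self-contained PySpark sketch (again, not ONE DATA code; the path and app name are made up) that mimics a partition file disappearing after the read has been planned - the read and the transformations succeed, and the failure only surfaces at the first action:

```python
import glob
import os

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("lazy-demo").getOrCreate()

path = "/tmp/lazy_demo_table"  # hypothetical stand-in for a Data Table's storage location

# Write a small partitioned table (several parquet part files).
spark.range(0, 100_000).repartition(8).write.mode("overwrite").parquet(path)

# Defining the DataFrame and adding transformations does not read the row data yet.
df = spark.read.parquet(path).filter("id % 2 = 0")

# Simulate a manual/external deletion of one partition file.
os.remove(glob.glob(os.path.join(path, "part-*.parquet"))[0])

# Only the action forces materialization, so only here does Spark try to open
# the physical files - and it fails with a FileNotFoundException-style error.
try:
    print(df.count())
except Exception as e:
    print("Failed at the action, not at the read:", type(e).__name__)

spark.stop()
```

That mirrors why a completely missing Data Table is caught early at the Load Processor, while a missing partition only shows up at the first Spark Action.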