It seems it is currently not possible to use null values in custom input tables. However, for unit testing there are cases where we want to test with null values.
The workaround currently used is to define empty strings, which is similar, but not the same.
Is there another option to solve this, or would this require a feature change?
The major difficulty, and probably the reason why this is currently unsupported, is distinguishing between the string "null" (null as a word) and an actual null value. I wouldn't claim that it's impossible, but you would need a new feature if you want to put NULLs into a custom input table.
SQL treats NULL and empty strings very differently in certain cases, so I would not recommend substituting one for the other, especially not in unit testing.
As a workaround, you can create NULL values using a LEFT JOIN in a Double Input Query where the IDs in your join do not match. It's easy to do, but of course very annoying.
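A sketch of the idea (table and column names are hypothetical): because the join condition never matches, every column taken from the right-hand side comes back as NULL.

```sql
-- input_a holds the real test rows; input_b can be any second input.
-- A join condition that never matches forces all b.* columns to NULL.
SELECT a.*, b.some_column AS null_column
FROM input_a a
LEFT JOIN input_b b
  ON a.id = b.id AND 1 = 0
```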
@Viola you can use an Extended Mathematical Operations Processor with CAST(null AS DOUBLE) (or any desired target data type) to add a null-valued column to your data.
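For illustration, the expression could be used like this (table and alias names are made up; only the CAST(null AS DOUBLE) part is from the post above):

```sql
-- append a null-valued DOUBLE column to every row
SELECT *, CAST(null AS DOUBLE) AS null_col
FROM input_table
```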
Unfortunately, the null value raises a red flag in many processors, since they cannot cope with such data. Query processors should not have any problems with it, but other processors enforce a non-null schema that is violated by such rows (causing either a schema exception or a NullPointerException in executed Spark functions; don't worry, the exceptions will be caught, but they will break the execution of the affected sub-graph of the Workflow).
If you want to mix null and non-null values, a union processor can be used (or a query for more sophisticated operations). It's a bit more effort than a CIT, but it yields real nulls.
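A minimal sketch of the union approach (column names and types are assumptions): the non-null rows come from the real input, and a single all-null row is appended.

```sql
-- mix real rows with an explicit null row
SELECT id, score FROM input_table
UNION ALL
SELECT CAST(null AS INT), CAST(null AS DOUBLE)
```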
And to add another fun way of getting true "nulls": a query processor that selects a NULL literal will create a column full of NULLs.
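The original query was not preserved here, but it was presumably something along these lines (column alias and target type are illustrative):

```sql
-- one column consisting only of NULLs, one per input row
SELECT CAST(null AS STRING) AS null_column
FROM input_table
```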
Or use a Python processor as a data generator instead of a custom input table. This will also give you some "NaNs" to play around with, if desired, for numeric columns.
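A minimal sketch of such a generator, assuming the processor works with pandas DataFrames (the column names and the exact way the processor consumes the frame are assumptions):

```python
import numpy as np
import pandas as pd

# Build a small test DataFrame containing real nulls alongside regular
# values. In an object column, None is stored as a real missing value;
# in a float column, np.nan serves as the numeric "NaN".
df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["alice", None, "carol"],   # real null in a string column
    "score": [1.5, np.nan, 3.0],        # NaN in a numeric column
})

# pandas recognizes both None and NaN as missing values
print(df.isna().sum())
```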