Show a set of column names in an assert statement

Dear One Data Community,

I would like to display a set of column names within an error message in an assert processor. I am just wondering how that can be done?

So far, I have been trying to do that with workflow variables, but I have not managed to get that working yet.

In an SQL Query processor I have the following SQL statements

SELECT distinct f.col_name as unexpected_columns
FROM firstInputTable as f
WHERE f.col_name NOT in (SELECT s.origin_column_name from secondInputTable as s)

At a later step, I would then like to display the unexpected_columns within an assert’s error message.

Is there any way to do that, like by setting the unexpected_columns as a value of a workflow variable and then embedding the workflow variable within the assert error using the ‘@global_variable_name@’ syntax?

1 Like

ONE DATA variables work more like constants during the workflow execution, that means that once the workflow execution started, the variables are fixed. They can only be changed for the next input again.

Possible workarounds: If the number of error cases is finite (and small enough), you can work with the if/else processor and create one path for each error case and put a dedicated Assert processor for each error case.

Other than that, it sounds like a potential new feature to me to integrate column values or SQL queries into the assertion message.

Unfortunately the number of columns to be checked is pretty large (30/40ish), so listing them by hand is not really a viable solution in this case.

It would be nice to have an integration of SQL within assert statements or being able to set global variables within SQL queries in One Data though.

1 Like

Still concerning the same topic: is it possible to set a workflow variable in Python or R using a statement like the following one?

'@workflow_var@' = "my_value"

I was trying to do that yesterday evening, but was getting an error related to invalid literals being set.

EDIT: Apparently, this is not possible in the current version of ONE DATA

At the end of the day, I resorted to using a Python processor to display the list of columns missing with no global variable involved by raising an ImportError in my workflow:

list_missing_columns = list(dataset["missing_col_name"])

#Let's just get the list of missing columns 
str_origin_columns_missing = ' | '.join(str(e) for e in list_missing_columns)

error_message_display = "The following columns were not found in the input table: "  + str_origin_columns_missing + " |"

raise ImportError(error_message_display)