I have a string column whose entries are formatted as “MM/dd/yyyy”, “dd.MM.yyyy”, or “MM/dd/yy”, depending on the customer’s uploaded file and the locale of their PC.
(For example, customer 1 provides a file in the format “MM/dd/yyyy”, while customer 2 uploads a file in the format “dd.MM.yyyy”. A single file never mixes formats.)
Is there any way in a ONE DATA workflow to accept multiple date formats and convert my string column to a date column?
I don’t think there is a built-in feature that you can use.
My approach would be to use Query processors that filter the rows by pattern with regexes, one branch per format, and then convert each branch with its matching date format.
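To make the idea concrete, here is a minimal sketch in plain Python (used only for illustration; in the workflow the same logic would live in the Query processors). The names `PATTERNS`, `detect_format` and `parse_column` are hypothetical, and the sketch assumes, as stated in the question, that one file contains only one format.

```python
import re
from datetime import datetime

# Hypothetical mapping: one regex per expected pattern, plus the
# strptime format that corresponds to it.
PATTERNS = [
    (re.compile(r"^\d{2}/\d{2}/\d{4}$"), "%m/%d/%Y"),    # MM/dd/yyyy
    (re.compile(r"^\d{2}\.\d{2}\.\d{4}$"), "%d.%m.%Y"),  # dd.MM.yyyy
    (re.compile(r"^\d{2}/\d{2}/\d{2}$"), "%m/%d/%y"),    # MM/dd/yy
]


def detect_format(values):
    """Return the strptime format whose regex matches every value, else None."""
    for regex, fmt in PATTERNS:
        if all(regex.match(v) for v in values):
            return fmt
    return None


def parse_column(values):
    """Convert a column of date strings that all share one known format."""
    fmt = detect_format(values)
    if fmt is None:
        raise ValueError("no single known date format matches all values")
    return [datetime.strptime(v, fmt).date() for v in values]
```

For instance, `parse_column(["31.12.2023"])` picks the dotted pattern, while a column mixing slashes and dots raises an error instead of being converted silently.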
[opinionated] If possible, I would try to identify the input format by some parameter other than the format syntax. For example, if you can identify the customer and each customer always uses exactly one format, I would split the workflow into branches by customer instead.
Filtering purely by the regexes might allow erroneous data to slip in unnoticed.
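A small illustration of that risk, again as a hedged Python sketch rather than ONE DATA syntax (`US_DATE` and `is_valid_us_date` are hypothetical names): a digits-only regex for “MM/dd/yyyy” also accepts impossible dates, so an actual parsing step is needed to catch them.

```python
import re
from datetime import datetime

# Shape check only: two digits, slash, two digits, slash, four digits.
US_DATE = re.compile(r"^\d{2}/\d{2}/\d{4}$")


def is_valid_us_date(value):
    """True only if the value has the MM/dd/yyyy shape AND is a real date."""
    if not US_DATE.match(value):
        return False
    try:
        datetime.strptime(value, "%m/%d/%Y")
    except ValueError:
        # The shape matched, but month or day is out of range.
        return False
    return True
```

Here `US_DATE.match("13/45/2023")` succeeds even though there is no month 13, whereas `is_valid_us_date("13/45/2023")` rejects it. Note that even strict parsing cannot tell whether a value like “03/04/2023” was meant as March 4 or as April 3, which is another reason to determine the format per customer where possible.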