Dear OD Users,
I would like to URL encode strings in a data table.
First I tried the reflect function which caused an error in OD.
Now I am using:
SELECT i.col1, encode(i.col1, 'UTF-8') AS url_encoded FROM inputTable i
but the url_encoded strings do not look like URL-encoded strings. I was expecting more '%' characters.
When I run the same strings through an online URL-encoding tester, the results look different.
Can someone tell me whether this URL encoding is correct, or how to do it correctly?
That would help me a lot.
I can at least give you the reason why this happens: the return type of the encode function is actually binary, which for some reason gets converted back to a string here…
Details can be found in Spark’s code: stringExpressions.scala in apache/spark on GitHub (commit fb9d706784557ef0fe9e17d59b7096374658954e).
Bottom line: as far as I can tell, this never did what the Spark documentation says.
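To illustrate the difference between a character-set encoding and a URL (percent) encoding, here is a minimal Python sketch. Python's str.encode is used only as an analogy for what Spark's encode appears to do; the sample string is made up for illustration.

```python
from urllib.parse import quote

text = "a b&c=ü"  # hypothetical sample value

# Character-set encoding: turns the string into raw UTF-8 bytes.
# No percent signs are introduced at this stage.
raw = text.encode("utf-8")

# URL (percent) encoding: every character outside the unreserved set
# is replaced by %XX escapes of its UTF-8 bytes.
url_encoded = quote(text, safe="")

print(raw)          # b'a b&c=\xc3\xbc'
print(url_encoded)  # a%20b%26c%3D%C3%BC
```

If the bytes from the first step are simply read back as a string, plain ASCII input comes out unchanged, which would explain why the result has no '%' characters.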
I’ve tried my own map function, which uses
java.net.URLEncoder.encode(<value>, "UTF-8") for the conversion, and it works as expected. Since there is currently no way to use this code in a query (reflection is blacklisted, for a good reason!), I’m afraid this functionality is out of reach.
A small Python script inside the workflow can work around this limitation. If your data volume is too large for that, you can instead try a chain of regex-replace steps, but this is cumbersome in its own way. In the end, if you really need this particular functionality on a large data table, let the product people know!
There are two possibilities:
- Create a dedicated processor
- Whitelist some reflective calls
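For the Python route, a minimal sketch could look like the following. It assumes the rows are available as a list of dictionaries; how the table is actually passed into and out of a script in OD is not shown here, and the helper names url_encode and add_url_encoded are hypothetical. The column names col1 and url_encoded are taken from the query above.

```python
from urllib.parse import quote


def url_encode(value):
    """Percent-encode a single cell value; missing values stay None."""
    if value is None:
        return None
    # safe="" also escapes "/", which quote() would otherwise leave as-is.
    return quote(str(value), safe="")


def add_url_encoded(rows, source="col1", target="url_encoded"):
    """Return new rows with an extra column holding the encoded value."""
    return [{**row, target: url_encode(row.get(source))} for row in rows]


# Hypothetical sample data standing in for the input table.
rows = [{"col1": "hello world"}, {"col1": "a/b?c=d"}, {"col1": None}]

for row in add_url_encoded(rows):
    print(row)
```

Note that quote writes a space as %20. If the output should look closer to java.net.URLEncoder, which writes a space as "+", urllib.parse.quote_plus is the nearer match, although the two still differ for a few characters such as "~" and "*".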
Thank you very much for your answer. I’ll try a simple Python script first. We may come back to the product people if we need the conversion to be faster.
Thanks a lot!