CSV Upload with German Umlauts

I would like to upload some data tables as CSV files to ONE DATA. These tables have string columns that might contain German Umlauts (like Ä, ö, ß, …). The CSV files are UTF-8 encoded, and if I open them locally with a text editor, the Umlauts are displayed correctly. After the upload, the strings are broken: the Umlauts are replaced by other special characters.

Which encoding must I choose to display German Umlauts correctly?
Is there a way to specify the encoding during upload?

Hi @kai.geukes !
I assume that you upload the CSV files using ONE DATA’s built-in uploader. While I have no idea how to set up the encoding correctly there (if it is possible at all), I have had good experiences with the data upload custom component for Apps. In at least one case, it was able to upload string columns containing German-specific characters successfully.

For reference: the Python SDK does not handle this either when used like this:

  response = od.datatables.create(name="MyNewFile",
                                  headers=list(upload_df.columns),
                                  content=data_with_utf_16_signs,
                                  project_id=project_id,
                                  locked=DatasetLock.OPEN, 
                                  create_fallback=False,  
                                  description="My new table" 
                                  )

:confused:

In general, ONE DATA seems to expect a CSV file encoded as UTF-8 with BOM (see "Byte order mark" on Wikipedia). I suspect this comes from both Java and Excel using this variant of UTF-8.

Python, Notepad++, and many other tools, on the other hand, write plain UTF-8 (without a BOM) by default, which leads ONE DATA to “misinterpret” the file.
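To make the difference concrete, here is a small sketch in plain Python showing that the two variants differ only by three leading bytes, and what typically happens when plain UTF-8 bytes are read with a single-byte encoding. Decoding as cp1252 is only an illustration of that kind of misreading; which encoding ONE DATA actually falls back to is an assumption on my side.

```python
import codecs

text = "Größe"

plain = text.encode("utf-8")         # plain UTF-8, no BOM
with_bom = text.encode("utf-8-sig")  # same bytes, prefixed with the BOM

# The BOM is the three bytes EF BB BF at the very start of the file.
print(codecs.BOM_UTF8)                       # the BOM constant
print(with_bom.startswith(codecs.BOM_UTF8))  # BOM variant starts with it
print(plain.startswith(codecs.BOM_UTF8))     # plain variant does not

# Illustration only: reading UTF-8 bytes with a single-byte encoding
# turns every Umlaut into two unrelated characters.
garbled = plain.decode("cp1252")
print(garbled)
```

If the broken strings you see after the upload look similar to the last output, a missing BOM is a likely cause.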

I have not used it for a while, but if I remember correctly, you have different options for uploading Umlauts to ONE DATA:

In Python:
Save your file with encoding “utf-8-sig” (Python's name for UTF-8 with BOM) or “latin-1” (the “old” ISO-8859-1).
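A minimal sketch of the Python option, using only the standard library. The file name, the sample rows, and the helper `write_csv_with_bom` are made up for illustration; adapt the delimiter and columns to your table.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical sample data containing Umlauts
rows = [
    ["name", "city"],
    ["Jörg Müller", "Köln"],
    ["Anna Groß", "Düsseldorf"],
]

def write_csv_with_bom(path, rows):
    """Write rows as CSV encoded as UTF-8 with a leading BOM."""
    # newline="" is what the csv module documentation recommends
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f).writerows(rows)

out_path = Path(tempfile.gettempdir()) / "umlauts_bom.csv"
write_csv_with_bom(out_path, rows)

# Reading with "utf-8-sig" strips the BOM again
with open(out_path, encoding="utf-8-sig", newline="") as f:
    read_back = list(csv.reader(f))
```

If your data is in a pandas DataFrame, `upload_df.to_csv(path, index=False, encoding="utf-8-sig")` should have the same effect.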

In Notepad++ or other tools that allow this: Change the file encoding to UTF-8-BOM and save the file again.
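If you have many files, the same conversion can be scripted instead of done by hand in an editor. This is only a sketch: `add_bom` is a hypothetical helper, and it assumes the source file really is UTF-8 (with or without BOM).

```python
import tempfile
from pathlib import Path

def add_bom(src, dst):
    """Re-save a UTF-8 file as UTF-8 with BOM."""
    # Reading with "utf-8-sig" drops an existing BOM, so running this
    # twice does not produce a double BOM. newline="" keeps the
    # original line endings untouched.
    with open(src, "r", encoding="utf-8-sig", newline="") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8-sig", newline="") as f:
        f.write(text)

# Small demo with a temporary file
tmp_dir = Path(tempfile.mkdtemp())
src = tmp_dir / "plain.csv"
dst = tmp_dir / "with_bom.csv"
src.write_bytes("id;name\r\n1;Jörg\r\n".encode("utf-8"))

add_bom(src, dst)
```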
