Hi community,
I am loading data from an Oracle database with the Database Connection Load processor. Some columns have types like DECIMAL(38,10) in Oracle.
I then save the data into a ONE DATA data table X using “Create or append”. The initial saving (create table) works fine and the columns are cast to double.
However, when I run the workflow again to append the data to the existing table, I get the error:
Column COL_X can’t be casted from DecimalType(38,10) to DoubleType to match the schema of the existing Data Table.
However, the append did work in the past when I loaded the already existing data table X, joined it with the new input data from Oracle (to check which data from Oracle is not yet present in the existing data table X), selected only the Oracle columns from the join, and then appended the Oracle input to X.
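Roughly, that intermediate step looked like the following sketch (the input names inputTable1/inputTable2 and the join key ID are placeholders, not the actual names from my workflow):

```sql
-- inputTable1 = existing data table X (ONE DATA types)
-- inputTable2 = new input from the Oracle DB (Oracle types)
-- keep only the Oracle rows whose key is not yet present in X,
-- and select only the Oracle columns
SELECT t2.*
FROM inputTable2 t2
LEFT JOIN inputTable1 t1
  ON t1.ID = t2.ID
WHERE t1.ID IS NULL
```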
What could be the reason that the append works with such an intermediate join, while it fails when appending directly?
Furthermore, after some appends in the past, I am now not able to load data table X anymore (neither as a sample nor in a workflow) and get an error like:
Could not fetch Data Table by id
[ "Job aborted due to stage failure: Task 0 in stage 2129.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2129.0 (TID 27122, instance-cropped-by-me, executor 6): org.apache.spark.sql.execution.QueryExecutionException: Parquet column cannot be converted in file maprfs:///path-cropped-by-me.parquet/part-00006-7729607d-7fd5-44e8-88b4-86ac503f3097-c000.snappy.parquet. Column: [Col_X], Expected: decimal(38,10), Found: DOUBLE
What could be the reason that I am not able to load this data, even though I never got errors when appending it?
cc: @Viola, @ali
From a quick look, the Database Connection Load does not cast to the workflow-supported data types but takes whatever comes from the Oracle DB. You could verify that with a “describe inputTable” in a Query processor after the Database Connection Load. A workaround would be to explicitly cast to double afterwards.
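For example (just a sketch), in a Query processor directly after the Database Connection Load:

```sql
-- show the Spark types that arrive from Oracle,
-- e.g. decimal(38,10) instead of double for the affected columns
DESCRIBE inputTable
```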
Our processors (including Data Table Save) only expect to operate on StringType, LongType, TimestampType or DoubleType.
I would be tempted to consider that a bug if that is the case.

Yes, you are right. Directly after the “Database Connection Load” processor I get the Oracle datatypes instead of ONE DATA types.
With the “describe” command I was also able to verify: after joining this data with Oracle types to data with ONE DATA datatypes and selecting only the former (Oracle-typed) columns, the output has ONE DATA datatypes. That is why I was able to save it previously.
I still have the problem that, after saving it successfully, I am not able to load the data with a normal “Data Table Load” processor; I get this error:
Exception during workflow execution (SparkException: Job aborted due to stage failure: Task 0 in stage 28512.0 failed 4 times, most recent failure: Lost task 0.3 in stage 28512.0 (TID 1367395, blabla.eu, executor 40): org.apache.spark.sql.execution.QueryExecutionException: Parquet column cannot be converted in file maprfs:///environments/blablabla/b5b63084-e6f8-4383-a266-4347ae43e326.parquet/part-00006-7729607d-7fd5-44e8-88b4-86ac503f3097-c000.snappy.parquet. Column: [COL_NAME], Expected: decimal(38,10), Found: DOUBLE at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:187) at
So although saving was successful, it seems some information about the Oracle datatypes is still stored, which leads to an error when simply trying to load the data.
Btw: doing “describe inputTable” in a query after the load processor shows me ONE DATA datatypes, and row counts work as well; only a simple Result Table after the load processor already gives the error above. Do you have any idea whether it is possible to retrieve the data, or is it lost?
Have you tried converting your data to double directly after loading it and then saving it afterwards? As you have already extracted the data types, you “only” need the Query Helper Processor and some SQL.
Or is converting those columns also not possible?
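A minimal sketch of what I mean, assuming the loaded data arrives as inputTable and COL_X / COL_Y stand for the affected DECIMAL(38,10) columns (placeholder names; your real column list would come from the extracted data types):

```sql
-- cast the decimal columns to double so the save only sees
-- workflow-supported types (DoubleType instead of DecimalType(38,10))
SELECT
  CAST(COL_X AS DOUBLE) AS COL_X,
  CAST(COL_Y AS DOUBLE) AS COL_Y
  -- , the remaining columns unchanged ...
FROM inputTable
```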
Hi Kai,
Good idea indeed. Unfortunately, converting the columns using the Query Helper or the ONE DATA “Datatype Conversion” processor gives me the same error as soon as I put a Result Table or a Data Table Save processor afterwards (saving to a new data table). The Python processor is no help here either; I get the same error when adding a Single-Input Python processor after the Data Table Load…
It seems there is no way to retrieve the data anymore, but thanks for your help!
That’s strange. I got the error “There is a type mismatch between the input schema and the stored schema: ROLE_ID (DecimalType(38,10) <-> DoubleType)” in a Save & Replace processor, and a simple CAST(ROLE_ID as Double) in the Extended Mathematical Processor solved the issue. Of course, I needed to delete my first data table and save a new one first. But after that, I can replace this table and also load and transform it as usual.
Though the column seems to be stored as DECIMAL(38,10) in Oracle, it only contains INT values; maybe that makes it easier for ONE DATA.
Yes, explicitly casting the datatypes and saving to a new table is a workaround; however, I found no way to retrieve the already existing data table.
@peter.dahlberg I would then create a bugticket for that: https://jira.intranet.onelogic.de/browse/QA-2894
FYI:
A probably related error occurred in a project of mine when using a Database Connection Load against Oracle and feeding the data directly into a Python-Script Single Input processor.
We created a bugticket for that as well:
https://jira.intranet.onelogic.de/browse/QA-2928