After an ONE DATA update a formerly working inner join between a table A (n rows) and table B (1 row) fails with the error shown below.
Question 1: Why is this a cross join for Spark?
Edit: Is it because currently its a 1on1 join? And even if Table A contains more rows the result of any inner/left join will be equivalent to a cross join?
Table A currently contains a single row:
role | attrA | attrB
template | 3 | 5
Table B always contains a single row (basically a couple of template columns that should be added to each row in Table A)
role | attrC | attrD
template | 8 | 9
An inner join is then performed on column “role” using the inner join processor of ONE DATA, leading to the error.
Question 2: Are there any best-practices regarding the setting of this config? I don’t know if it was True
before the ONE DATA update or if it got changed due to Spark upgrades (perhaps?), but is it safe to set it to True
or are there any downsides like exploding joins crashing the instance?
The MNWE workflow and error:
"AnalysisException: Detected implicit cartesian product for INNER join between logical plans Filter (isnotnull(role#103153) && (role#103153 = template)) +- LogicalRDD [role#103153, page_template#103154, layout_template#103155, table_element_json#103156, header_template#103157], false and Project [app_template#103004, study_id#103005, version#103006L, Operator_app_id#103007, Manager_app_id#103008, Viewer_app_id#103009] +- Relation[app_template#103004,study_id#103005,version#103006L,Operator_app_id#103007,Manager_app_id#103008,Viewer_app_id#103009,__ROW_ID__#103010] JDBCRelation(frt_acc95ccc_a506_422e_a7d4_717831699dff) [numPartitions=1] Join condition is missing or trivial. Either: use the CROSS JOIN syntax to allow cartesian products between these relations, or: enable implicit cartesian products by setting the configuration variable spark.sql.crossJoin.enabled=true;"