[ https://issues.apache.org/jira/browse/SPARK-50475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cedric Cuypers resolved SPARK-50475.
------------------------------------
    Resolution: Not A Bug

The issue was caused by a Java version mismatch between the driver (client mode) and 
the executors (pre-built Docker image).
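For anyone hitting a similar error: a quick sanity check is to compare the Java major version reported by the driver with the ones reported by the executors. The helpers below are an illustrative sketch, not part of Spark's API; the commented lines show one way the version strings might be collected from a live session (assuming `java` is on the PATH inside the executor image).

```python
# Illustrative sketch: detect a driver/executor Java major-version mismatch.
# Neither helper is part of Spark's API.

def parse_java_major(version):
    """Return the major version from strings like '1.8.0_392' or '17.0.9'."""
    parts = version.split(".")
    major = int(parts[0])
    # Pre-Java-9 strings ('1.8.0_392') carry the real major in the second field.
    return int(parts[1].split("_")[0]) if major == 1 and len(parts) > 1 else major

def versions_consistent(driver_version, executor_versions):
    """True when every executor reports the same major version as the driver."""
    driver_major = parse_java_major(driver_version)
    return all(parse_java_major(v) == driver_major for v in executor_versions)

# With a live session the inputs might be gathered like this (untested sketch):
# driver_ver = spark.sparkContext._jvm.System.getProperty("java.version")
# exec_vers = (spark.sparkContext
#              .parallelize(range(spark.sparkContext.defaultParallelism))
#              .map(lambda _: subprocess.run(["java", "-version"],
#                                            capture_output=True, text=True).stderr)
#              .collect())
```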

> NullPointerException when saving a df with 'noop' format after joining wide 
> dfs
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-50475
>                 URL: https://issues.apache.org/jira/browse/SPARK-50475
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.4.0, 3.5.3
>         Environment: Context:
>  * The issue appears with Spark 3.4.0 and 3.5.3; it does not appear with Spark 
> 3.2.0 or 3.3.0
>  * Spark runs on Kubernetes; data is read from S3
>  
> Overview of spark context (sensitive items are redacted):
> {code:java}
> spark.sparkContext.getConf().getAll()
>  [('spark.eventLog.enabled', 'true'),
>  ('spark.kubernetes.driverEnv.AWS_REGION', 'eu-west-1'),
>  ('spark.network.crypto.enabled', 'true'),
>  ('spark.kubernetes.allocation.batch.size', '10'),
>  ('spark.kubernetes.container.image.pullSecrets', 'ecr-credentials'),
>  ('spark.hadoop.fs.s3a.bucket.XXX.server-side-encryption.key',
>   'arn:aws:kms:eu-west-1:XXX:key/XXX'),
>  ('spark.kubernetes.executor.podNamePrefix',
>   'XXX'),
>  ('spark.hadoop.fs.s3a.server-side-encryption-algorithm', 'SSE-KMS'),
>  ('spark.driver.extraJavaOptions',
>   '-Djava.net.preferIPv6Addresses=false -XX:+IgnoreUnrecognizedVMOptions 
> --add-opens=java.base/java.lang=ALL-UNNAMED 
> --add-opens=java.base/java.lang.invoke=ALL-UNNAMED 
> --add-opens=java.base/java.lang.reflect=ALL-UNNAMED 
> --add-opens=java.base/java.io=ALL-UNNAMED 
> --add-opens=java.base/java.net=ALL-UNNAMED 
> --add-opens=java.base/java.nio=ALL-UNNAMED 
> --add-opens=java.base/java.util=ALL-UNNAMED 
> --add-opens=java.base/java.util.concurrent=ALL-UNNAMED 
> --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED 
> --add-opens=java.base/sun.nio.ch=ALL-UNNAMED 
> --add-opens=java.base/sun.nio.cs=ALL-UNNAMED 
> --add-opens=java.base/sun.security.action=ALL-UNNAMED 
> --add-opens=java.base/sun.util.calendar=ALL-UNNAMED 
> --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED 
> -Djdk.reflect.useDirectMethodHandle=false'),
>  ('spark.serializer', 'org.apache.spark.serializer.KryoSerializer'),
>  ('spark.hadoop.fs.s3a.endpoint.region', 'eu-west-1'),
>  ('spark.executor.instances', '10'),
>  ('spark.hadoop.fs.s3.impl', 'org.apache.hadoop.fs.s3a.S3AFileSystem'),
>  ('spark.hadoop.fs.s3a.bucket.XXX.server-side-encryption-algorithm',
>   'SSE-KMS'),
>  ('spark.sql.parquet.compression.codec', 'snappy'),
>  ('spark.eventLog.dir',
>   's3a://XXX'),
>  ('spark.sql.adaptive.enabled', 'False'),
>  ('spark.kubernetes.container.image.pullPolicy', 'Always'),
>  ('spark.kubernetes.driver.annotation.iam.amazonaws.com/role',
>   'XXX'),
>  ('spark.executor.memory', '4g'),
>  ('spark.sql.session.timeZone', 'CET'),
>  ('spark.executor.id', 'driver'),
>  ('spark.executor.extraJavaOptions',
>   '-Djava.net.preferIPv6Addresses=false -XX:+IgnoreUnrecognizedVMOptions 
> --add-opens=java.base/java.lang=ALL-UNNAMED 
> --add-opens=java.base/java.lang.invoke=ALL-UNNAMED 
> --add-opens=java.base/java.lang.reflect=ALL-UNNAMED 
> --add-opens=java.base/java.io=ALL-UNNAMED 
> --add-opens=java.base/java.net=ALL-UNNAMED 
> --add-opens=java.base/java.nio=ALL-UNNAMED 
> --add-opens=java.base/java.util=ALL-UNNAMED 
> --add-opens=java.base/java.util.concurrent=ALL-UNNAMED 
> --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED 
> --add-opens=java.base/sun.nio.ch=ALL-UNNAMED 
> --add-opens=java.base/sun.nio.cs=ALL-UNNAMED 
> --add-opens=java.base/sun.security.action=ALL-UNNAMED 
> --add-opens=java.base/sun.util.calendar=ALL-UNNAMED 
> --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED 
> -Djdk.reflect.useDirectMethodHandle=false -Dlog4j.debug=true 
> -Dlog4j.logger.org.apache.hadoop=DEBUG'),
>  ('spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version', '2'),
>  ('spark.driver.host', 'XXX'),
>  ('spark.sql.warehouse.dir',
>   'file:XXX'),
>  ('spark.jars',
>   'XXX'),
>  ('spark.sql.sources.partitionColumnTypeInference.enabled', 'false'),
>  ('spark.sql.debug.maxToStringFields', '1000'),
>  ('spark.hadoop.fs.s3a.server-side-encryption.key',
>   'arn:aws:kms:eu-west-1:XXX:key/XXX'),
>  ('spark.authenticate', 'true'),
>  ('spark.hadoop.fs.s3a.multiobjectdelete.enable', 'false'),
>  ('spark.app.initial.archive.urls',
>   'XXX'),
>  ('spark.hadoop.fs.s3a.aws.credentials.provider',
>   'com.amazonaws.auth.DefaultAWSCredentialsProviderChain'),
>  ('spark.app.name', 'XXX'),
>  ('spark.kubernetes.executor.request.cores', '1000m'),
>  ('spark.kubernetes.pyspark.pythonVersion', '3'),
>  ('spark.io.encryption.enabled', 'true'),
>  ('spark.serializer.objectStreamReset', '100'),
>  ('spark.archives',
>   'XXX'),
>  ('spark.kubernetes.executor.limit.cores', '1000m'),
>  ('spark.sql.pyspark.jvmStacktrace.enabled', 'True'),
>  ('spark.submit.deployMode', 'client'),
>  ('spark.kubernetes.driver.request.cores', '1000m'),
>  ('spark.driver.cores', '1'),
>  ('spark.repl.local.jars',
>   'XXX'),
>  ('spark.app.submitTime', '1733211972479'),
>  ('spark.sql.avro.compression.codec', 'snappy'),
>  ('spark.app.startTime', '1733211972711'),
>  ('spark.logConf', 'true'),
>  ('spark.master', 'k8s://https://kubernetes.default.svc.cluster.local'),
>  ('spark.kubernetes.namespace', 'dbe'),
>  ('spark.kubernetes.executor.annotation.iam.amazonaws.com/role',
>   'XXX'),
>  ('spark.app.id', 'XXX'),
>  ('spark.driver.port', '40163'),
>  ('spark.app.initial.jar.urls',
>   'XXX'),
>  ('spark.kubernetes.container.image',
>   'XXX.dkr.ecr.eu-west-1.amazonaws.com/public/spark-k8s:3.4.0'),
>  ('spark.rdd.compress', 'True'),
>  ('spark.driver.memory', '2g'),
>  ('spark.submit.pyFiles', ''),
>  ('spark.kubernetes.authenticate.driver.serviceAccountName',
>   'XXX'),{code}
>            Reporter: Cedric Cuypers
>            Priority: Blocker
>
> When joining two dataframes and saving the result with the 'noop' format, Spark throws a NullPointerException.
>  
> {code:python}
> from pyspark.sql.functions import lit
> import logging
> from pyspark.sql.functions import col
> from pyspark.sql import functions as F
> 
> PREFIX_PP = "KBC__GDPR__PP__"
> path = "s3://XXX/feature/dfpa/s3/partycompanyinvestmentdigiscorefeature.parquet"
> pdate = "20241022"
> #psource = "ADOBECLICK"
> #psource = "NEWSFEED"
> psource = "ADVICES"
> input_df = spark.read.parquet(path+"/pdate="+pdate+"/psource="+psource).withColumn("pdate",lit(pdate)).withColumn("psource",lit(psource))
> df = input_df
> pp_df = spark.read.parquet("s3://XXX/master/dfpa/s3/partyprivacy.parquet/pdate=20241125/psource=DCTL_BIS/")
> party_privacy_columns = ['ANL_COOK_CST_TS',
>  'MDL_COOK_CST_TS',
>  'CLN_CDT_ETD_DRT_MKT_NFNC_PDC_IC',
>  'DRT_MKT_NFNC_PDC_COOK_CST_TS',
>  'DRT_MKT_COOK_CST_IC',
>  'DATA_DTH_IC',
>  'CLN_CDT_ETD_DRT_MKT_IC',
>  'CFT_COOK_CST_IC',
>  'DRT_MKT_FNC_PDC_COOK_CST_TS',
>  'CLN_CDT_ETD_DRT_MKT_FNC_PDC_IC',
>  'DRT_MKT_NFNC_PDC_COOK_CST_IC',
>  'CLN_CDT_CMC_MDL_IC',
>  'MDL_COOK_CST_IC',
>  'DRT_MKT_FNC_PDC_COOK_CST_IC',
>  'CLN_CDT_CVN_IC',
>  'CLN_CDT_ETD_CMC_PFL_IC',
>  'ANL_COOK_CST_IC',
>  'DRT_MKT_COOK_CST_TS',
>  'NFNC_PDC_PCM_CST_IC',
>  'ACS_PSD2_CST_IC',
>  'CFT_COOK_CST_TS',
>  'CLN_CDT_BAS_DRT_MKT_IC',
>  'CLN_CDT_PTC_MKT_SVY_IC',
>  'FNC_PDC_PCM_CST_IC',
>  'CLN_CDT_BAS_CMC_PFL_IC',
>  'CPN_NO'] + [pp_join_field]
> pp_df = pp_df.select(*party_privacy_columns)
> for col_name in pp_df.columns:  # rename pp columns
>     new_col_name = PREFIX_PP + col_name
>     pp_df = pp_df.withColumnRenamed(col_name, new_col_name)
> data_controller_column = "CPN_NO"
> col_cc_join_field = f"{PREFIX_PP}CPN_NO"
> # Let's try to select only the columns needed to join
> print(len(df.columns))
> nr_cols_keep = 130  # Fails at 125
> #pp_df = pp_df.select(col(col_pp_join_field),col(col_cc_join_field))
> df = df.select(col(party_identifier_attrib),col(data_controller_column),*df.columns[:nr_cols_keep])
> print(df.where((col(party_identifier_attrib).isNull()) | (col(party_identifier_attrib) == "")).count())
> print(df.where((col(data_controller_column).isNull()) | (col(data_controller_column) == "")).count())
> # Let's try to drop records from pp_df that have null for col_pp_join_field or col_cc_join_field
> pp_df = pp_df.dropna(how='any',subset=[col_pp_join_field,col_cc_join_field])
> df = df.dropna(how='any',subset=[party_identifier_attrib,data_controller_column])
> print(df_unknown_customers.count())
> df_unknown_customers.printSchema()
> df_unknown_customers.write.format("noop").mode("overwrite").save()
> print("unknown customers ok")
> df_known_customers = df.join(pp_df,
>         (col(party_identifier_attrib).eqNullSafe(col(col_pp_join_field)))
>         & (col(data_controller_column).eqNullSafe(col(col_cc_join_field))),
>         how="inner",
>        )
> #df_known_customers.cache()
> print(df_known_customers.count())
> #df_known_customers.explain(mode="formatted")
> df_known_customers.printSchema()
> df_known_customers.write.format("noop").mode("overwrite").save()
> print("known customers ok")
> {code}
> Full stacktrace:
> {code:java}
> ---------------------------------------------------------------------------
> Py4JJavaError                             Traceback (most recent call last)
> Cell In[5], line 105
>     103 #df_known_customers.explain(mode="formatted")
>     104 df_known_customers.printSchema()
> --> 105 df_known_customers.write.format("noop").mode("overwrite").save()
>     106 print("known customers ok")
> File 
> ~/.conda/envs/demo_oktopuss_datalink-env_34/lib/python3.11/site-packages/pyspark/sql/readwriter.py:1396,
>  in DataFrameWriter.save(self, path, format, mode, partitionBy, **options)
>    1394     self.format(format)
>    1395 if path is None:
> -> 1396     self._jwrite.save()
>    1397 else:
>    1398     self._jwrite.save(path)
> File 
> ~/.conda/envs/demo_oktopuss_datalink-env_34/lib/python3.11/site-packages/py4j/java_gateway.py:1322,
>  in JavaMember.__call__(self, *args)
>    1316 command = proto.CALL_COMMAND_NAME +\
>    1317     self.command_header +\
>    1318     args_command +\
>    1319     proto.END_COMMAND_PART
>    1321 answer = self.gateway_client.send_command(command)
> -> 1322 return_value = get_return_value(
>    1323     answer, self.gateway_client, self.target_id, self.name)
>    1325 for temp_arg in temp_args:
>    1326     if hasattr(temp_arg, "_detach"):
> File 
> ~/.conda/envs/demo_oktopuss_datalink-env_34/lib/python3.11/site-packages/pyspark/errors/exceptions/captured.py:169,
>  in capture_sql_exception.<locals>.deco(*a, **kw)
>     167 def deco(*a: Any, **kw: Any) -> Any:
>     168     try:
> --> 169         return f(*a, **kw)
>     170     except Py4JJavaError as e:
>     171         converted = convert_exception(e.java_exception)
> File 
> ~/.conda/envs/demo_oktopuss_datalink-env_34/lib/python3.11/site-packages/py4j/protocol.py:326,
>  in get_return_value(answer, gateway_client, target_id, name)
>     324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
>     325 if answer[1] == REFERENCE_TYPE:
> --> 326     raise Py4JJavaError(
>     327         "An error occurred while calling {0}{1}{2}.\n".
>     328         format(target_id, ".", name), value)
>     329 else:
>     330     raise Py4JError(
>     331         "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
>     332         format(target_id, ".", name, value))
> Py4JJavaError: An error occurred while calling o800.save.
> : java.util.concurrent.ExecutionException: java.lang.NullPointerException
>       at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>       at java.util.concurrent.FutureTask.get(FutureTask.java:206)
>       at 
> org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:209)
>       at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeBroadcast$1(SparkPlan.scala:208)
>       at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:246)
>       at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>       at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:243)
>       at 
> org.apache.spark.sql.execution.SparkPlan.executeBroadcast(SparkPlan.scala:204)
>       at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doExecute(BroadcastHashJoinExec.scala:142)
>       at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:195)
>       at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:246)
>       at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>       at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:243)
>       at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:191)
>       at 
> org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2(WriteToDataSourceV2Exec.scala:384)
>       at 
> org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2$(WriteToDataSourceV2Exec.scala:382)
>       at 
> org.apache.spark.sql.execution.datasources.v2.OverwriteByExpressionExec.writeWithV2(WriteToDataSourceV2Exec.scala:266)
>       at 
> org.apache.spark.sql.execution.datasources.v2.V2ExistingTableWriteExec.run(WriteToDataSourceV2Exec.scala:360)
>       at 
> org.apache.spark.sql.execution.datasources.v2.V2ExistingTableWriteExec.run$(WriteToDataSourceV2Exec.scala:359)
>       at 
> org.apache.spark.sql.execution.datasources.v2.OverwriteByExpressionExec.run(WriteToDataSourceV2Exec.scala:266)
>       at 
> org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
>       at 
> org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
>       at 
> org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
>       at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98)
>       at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:118)
>       at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:195)
>       at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:103)
>       at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
>       at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
>       at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
>       at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:94)
>       at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
>       at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
>       at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
>       at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:31)
>       at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
>       at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
>       at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
>       at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
>       at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:488)
>       at 
> org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:94)
>       at 
> org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:81)
>       at 
> org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:79)
>       at 
> org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:133)
>       at 
> org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:856)
>       at 
> org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:318)
>       at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:247)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>       at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
>       at py4j.Gateway.invoke(Gateway.java:282)
>       at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>       at py4j.commands.CallCommand.execute(CallCommand.java:79)
>       at 
> py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
>       at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
>       at java.lang.Thread.run(Thread.java:750)
> Caused by: java.lang.NullPointerException
>       at 
> org.apache.spark.util.io.ChunkedByteBuffer.$anonfun$getChunks$1(ChunkedByteBuffer.scala:181)
>       at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
>       at 
> scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
>       at 
> scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
>       at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
>       at scala.collection.TraversableLike.map(TraversableLike.scala:286)
>       at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
>       at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
>       at 
> org.apache.spark.util.io.ChunkedByteBuffer.getChunks(ChunkedByteBuffer.scala:181)
>       at 
> org.apache.spark.util.io.ChunkedByteBufferInputStream.<init>(ChunkedByteBuffer.scala:278)
>       at 
> org.apache.spark.util.io.ChunkedByteBuffer.toInputStream(ChunkedByteBuffer.scala:174)
>       at 
> org.apache.spark.sql.execution.SparkPlan.decodeUnsafeRows(SparkPlan.scala:409)
>       at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeCollectIterator$2(SparkPlan.scala:457)
>       at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
>       at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
>       at 
> org.apache.spark.sql.execution.joins.UnsafeHashedRelation$.apply(HashedRelation.scala:476)
>       at 
> org.apache.spark.sql.execution.joins.HashedRelation$.apply(HashedRelation.scala:160)
>       at 
> org.apache.spark.sql.execution.joins.HashedRelationBroadcastMode.transform(HashedRelation.scala:1163)
>       at 
> org.apache.spark.sql.execution.joins.HashedRelationBroadcastMode.transform(HashedRelation.scala:1151)
>       at 
> org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:148)
>       at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:217)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       ... 1 more {code}
>  
>  
> The schema of the wide dataframe:
>  
> {code:java}
> root
>  |-- PTY_CLT_NO: string (nullable = true)
>  |-- CPN_NO: string (nullable = true)
>  |-- PTY_CPN_GUID_NO: string (nullable = true)
>  |-- PTY_CLT_NO: string (nullable = true)
>  |-- PTY_CLT_NO_TKN_NM: string (nullable = true)
>  |-- CPN_NO: string (nullable = true)
>  |-- ALL_ADV_NO: integer (nullable = true)
>  |-- DGT_ADV_NO: integer (nullable = true)
>  |-- DGT_ADV_READ_NO: integer (nullable = true)
>  |-- DGT_ADV_SGN_NO: integer (nullable = true)
>  |-- DGT_ADV_DCD_NO: integer (nullable = true)
>  |-- ALL_ADV_MH_NO: integer (nullable = true)
>  |-- DGT_ADV_MH_NO: integer (nullable = true)
>  |-- DGT_ADV_READ_MH_NO: integer (nullable = true)
>  |-- DGT_ADV_SGN_MH_NO: integer (nullable = true)
>  |-- DGT_ADV_DCD_MH_NO: integer (nullable = true)
>  |-- ALL_ADV_YR_NO: integer (nullable = true)
>  |-- DGT_ADV_YR_NO: integer (nullable = true)
>  |-- DGT_ADV_READ_YR_NO: integer (nullable = true)
>  |-- DGT_ADV_SGN_YR_NO: integer (nullable = true)
>  |-- DGT_ADV_DCD_YR_NO: integer (nullable = true)
>  |-- DGT_ADV_RATE: decimal(13,2) (nullable = true)
>  |-- DGT_ADV_RATE_CD: integer (nullable = true)
>  |-- DGT_ADV_READ_RATE: decimal(13,2) (nullable = true)
>  |-- DGT_ADV_READ_RATE_CD: integer (nullable = true)
>  |-- DGT_ADV_DCD_RATE: decimal(13,2) (nullable = true)
>  |-- DGT_ADV_DCD_RATE_CD: integer (nullable = true)
>  |-- DGT_ADV_RATE_MH: decimal(13,2) (nullable = true)
>  |-- DGT_ADV_RATE_MH_DIFF: decimal(13,2) (nullable = true)
>  |-- DGT_ADV_RATE_TRD: decimal(13,2) (nullable = true)
>  |-- DGT_ADV_RATE_TRD_CD: string (nullable = true)
>  |-- DGT_ADV_SGN_RATE: decimal(13,2) (nullable = true)
>  |-- DGT_ADV_SGN_RATE_CD: integer (nullable = true)
>  |-- DGT_ADV_READ_RATE_MH: decimal(13,2) (nullable = true)
>  |-- DGT_ADV_READ_RATE_MH_DIFF: decimal(13,2) (nullable = true)
>  |-- DGT_ADV_READ_RATE_TRD: decimal(13,2) (nullable = true)
>  |-- DGT_ADV_READ_RATE_TRD_CD: string (nullable = true)
>  |-- DGT_ADV_SGN_RATE_MH: decimal(13,2) (nullable = true)
>  |-- DGT_ADV_SGN_RATE_MH_DIFF: decimal(13,2) (nullable = true)
>  |-- DGT_ADV_SGN_RATE_TRD: decimal(13,2) (nullable = true)
>  |-- DGT_ADV_SGN_RATE_TRD_CD: string (nullable = true)
>  |-- DGT_ADV_DCD_RATE_MH: decimal(13,2) (nullable = true)
>  |-- DGT_ADV_DCD_RATE_MH_DIFF: decimal(13,2) (nullable = true)
>  |-- DGT_ADV_DCD_RATE_TRD: decimal(13,2) (nullable = true)
>  |-- DGT_ADV_DCD_RATE_TRD_CD: string (nullable = true)
>  |-- TTL_DG_SCR: string (nullable = true)
>  |-- TTL_DG_SCR_CD: string (nullable = true)
>  |-- TTL_DG_SCR_TRD: string (nullable = true)
>  |-- TTL_DG_SCR_TRD_CD: string (nullable = true)
>  |-- OLD_AVL_CLT_DATA_DT: date (nullable = true)
>  |-- CFT_COOK_CST_IC: integer (nullable = true)
>  |-- ITA_REF_DT: date (nullable = true)
>  |-- ALL_NFC_NO: integer (nullable = true)
>  |-- NFC_SHW_NO: integer (nullable = true)
>  |-- NFC_CLK_NO: integer (nullable = true)
>  |-- NFC_DEL_NO: integer (nullable = true)
>  |-- ALL_NFC_YR_NO: integer (nullable = true)
>  |-- NFC_SHW_YR_NO: integer (nullable = true)
>  |-- NFC_CLK_YR_NO: integer (nullable = true)
>  |-- NFC_DEL_YR_NO: integer (nullable = true)
>  |-- NFC_CLK_RATE: decimal(13,2) (nullable = true)
>  |-- NFC_CLK_RATE_CD: integer (nullable = true)
>  |-- ALL_NFC_MH_NO: integer (nullable = true)
>  |-- NFC_SHW_MH_NO: integer (nullable = true)
>  |-- NFC_CLK_MH_NO: integer (nullable = true)
>  |-- NFC_DEL_MH_NO: integer (nullable = true)
>  |-- NFC_CLK_RATE_MH: decimal(13,2) (nullable = true)
>  |-- NFC_CLK_RATE_MH_DIFF: integer (nullable = true)
>  |-- NFC_CLK_RATE_TRD: decimal(13,2) (nullable = true)
>  |-- NFC_CLK_RATE_TRD_CD: string (nullable = true)
>  |-- ACI_INV_HP_DB_NO: integer (nullable = true)
>  |-- ACI_INV_PRT_VAL_NO: integer (nullable = true)
>  |-- ACI_INV_PRT_PERF_NO: integer (nullable = true)
>  |-- ACI_INV_POS_LST_NO: integer (nullable = true)
>  |-- ACI_INV_PRD_DL_NO: integer (nullable = true)
>  |-- ACI_INV_SPRD_NO: integer (nullable = true)
>  |-- ACI_INV_STRY_NO: integer (nullable = true)
>  |-- ACI_INV_RSK_NO: integer (nullable = true)
>  |-- ACI_INV_HP_DB_YR_NO: integer (nullable = true)
>  |-- ACI_INV_PRT_VAL_YR_NO: integer (nullable = true)
>  |-- ACI_INV_PRT_PERF_YR_NO: integer (nullable = true)
>  |-- ACI_INV_POS_LST_YR_NO: integer (nullable = true)
>  |-- ACI_INV_PRD_DL_YR_NO: integer (nullable = true)
>  |-- ACI_INV_SPRD_YR_NO: integer (nullable = true)
>  |-- ACI_INV_STRY_YR_NO: integer (nullable = true)
>  |-- ACI_INV_RSK_YR_NO: integer (nullable = true)
>  |-- ACI_INV_HP_DB_MH_NO: integer (nullable = true)
>  |-- ACI_INV_PRT_VAL_MH_NO: integer (nullable = true)
>  |-- ACI_INV_PRT_PERF_MH_NO: integer (nullable = true)
>  |-- ACI_INV_POS_LST_MH_NO: integer (nullable = true)
>  |-- ACI_INV_PRD_DL_MH_NO: integer (nullable = true)
>  |-- ACI_INV_SPRD_MH_NO: integer (nullable = true)
>  |-- ACI_INV_STRY_MH_NO: integer (nullable = true)
>  |-- ACI_INV_RSK_MH_NO: integer (nullable = true)
>  |-- ACI_INV_HP_DB_MH_DIFF: integer (nullable = true)
>  |-- ACI_INV_PRT_VAL_MH_DIFF: integer (nullable = true)
>  |-- ACI_INV_PRT_PERF_MH_DIFF: integer (nullable = true)
>  |-- ACI_INV_POS_LST_MH_DIFF: integer (nullable = true)
>  |-- ACI_INV_PRD_DL_MH_DIFF: integer (nullable = true)
>  |-- ACI_INV_SPRD_MH_DIFF: integer (nullable = true)
>  |-- ACI_INV_STRY_MH_DIFF: integer (nullable = true)
>  |-- ACI_INV_RSK_MH_DIFF: integer (nullable = true)
>  |-- ACI_INV_HP_DB_YR_CD: integer (nullable = true)
>  |-- ACI_INV_PRT_VAL_YR_CD: integer (nullable = true)
>  |-- ACI_INV_PRT_PERF_YR_CD: integer (nullable = true)
>  |-- ACI_INV_POS_LST_YR_CD: integer (nullable = true)
>  |-- ACI_INV_PRD_DL_YR_CD: integer (nullable = true)
>  |-- ACI_INV_SPRD_YR_CD: integer (nullable = true)
>  |-- ACI_INV_STRY_YR_CD: integer (nullable = true)
>  |-- ACI_INV_RSK_YR_CD: integer (nullable = true)
>  |-- ACI_INV_HP_DB_TRD: decimal(13,2) (nullable = true)
>  |-- ACI_INV_PRT_VAL_TRD: decimal(13,2) (nullable = true)
>  |-- ACI_INV_PRT_PERF_TRD: decimal(13,2) (nullable = true)
>  |-- ACI_INV_POS_LST_TRD: decimal(13,2) (nullable = true)
>  |-- ACI_INV_PRD_DL_TRD: decimal(13,2) (nullable = true)
>  |-- ACI_INV_SPRD_TRD: decimal(13,2) (nullable = true)
>  |-- ACI_INV_STRY_TRD: decimal(13,2) (nullable = true)
>  |-- ACI_INV_RSK_TRD: decimal(13,2) (nullable = true)
>  |-- ACI_INV_HP_DB_TRD_CD: string (nullable = true)
>  |-- ACI_INV_PRT_VAL_TRD_CD: string (nullable = true)
>  |-- ACI_INV_PRT_PERF_TRD_CD: string (nullable = true)
>  |-- ACI_INV_POS_LST_TRD_CD: string (nullable = true)
>  |-- ACI_INV_PRD_DL_TRD_CD: string (nullable = true)
>  |-- ACI_INV_SPRD_TRD_CD: string (nullable = true)
>  |-- ACI_INV_STRY_TRD_CD: string (nullable = true)
>  |-- ACI_INV_RSK_TRD_CD: string (nullable = true)
>  |-- GDPR_REF_DT: date (nullable = true)
>  |-- SRC_CD: string (nullable = true)
>  |-- SRC_KEY: string (nullable = true)
>  |-- META: string (nullable = true)
>  |-- pdate: string (nullable = false)
>  |-- psource: string (nullable = false) {code}
> The schema of the pp_df dataframe:
> {code:java}
> root
>  |-- KBC__GDPR__PP__ANL_COOK_CST_TS: timestamp (nullable = true)
>  |-- KBC__GDPR__PP__MDL_COOK_CST_TS: timestamp (nullable = true)
>  |-- KBC__GDPR__PP__CLN_CDT_ETD_DRT_MKT_NFNC_PDC_IC: integer (nullable = true)
>  |-- KBC__GDPR__PP__DRT_MKT_NFNC_PDC_COOK_CST_TS: timestamp (nullable = true)
>  |-- KBC__GDPR__PP__DRT_MKT_COOK_CST_IC: integer (nullable = true)
>  |-- KBC__GDPR__PP__DATA_DTH_IC: integer (nullable = true)
>  |-- KBC__GDPR__PP__CLN_CDT_ETD_DRT_MKT_IC: integer (nullable = true)
>  |-- KBC__GDPR__PP__CFT_COOK_CST_IC: integer (nullable = true)
>  |-- KBC__GDPR__PP__DRT_MKT_FNC_PDC_COOK_CST_TS: timestamp (nullable = true)
>  |-- KBC__GDPR__PP__CLN_CDT_ETD_DRT_MKT_FNC_PDC_IC: integer (nullable = true)
>  |-- KBC__GDPR__PP__DRT_MKT_NFNC_PDC_COOK_CST_IC: integer (nullable = true)
>  |-- KBC__GDPR__PP__CLN_CDT_CMC_MDL_IC: integer (nullable = true)
>  |-- KBC__GDPR__PP__MDL_COOK_CST_IC: integer (nullable = true)
>  |-- KBC__GDPR__PP__DRT_MKT_FNC_PDC_COOK_CST_IC: integer (nullable = true)
>  |-- KBC__GDPR__PP__CLN_CDT_CVN_IC: integer (nullable = true)
>  |-- KBC__GDPR__PP__CLN_CDT_ETD_CMC_PFL_IC: integer (nullable = true)
>  |-- KBC__GDPR__PP__ANL_COOK_CST_IC: integer (nullable = true)
>  |-- KBC__GDPR__PP__DRT_MKT_COOK_CST_TS: timestamp (nullable = true)
>  |-- KBC__GDPR__PP__NFNC_PDC_PCM_CST_IC: integer (nullable = true)
>  |-- KBC__GDPR__PP__ACS_PSD2_CST_IC: integer (nullable = true)
>  |-- KBC__GDPR__PP__CFT_COOK_CST_TS: timestamp (nullable = true)
>  |-- KBC__GDPR__PP__CLN_CDT_BAS_DRT_MKT_IC: integer (nullable = true)
>  |-- KBC__GDPR__PP__CLN_CDT_PTC_MKT_SVY_IC: integer (nullable = true)
>  |-- KBC__GDPR__PP__FNC_PDC_PCM_CST_IC: integer (nullable = true)
>  |-- KBC__GDPR__PP__CLN_CDT_BAS_CMC_PFL_IC: integer (nullable = true)
>  |-- KBC__GDPR__PP__CPN_NO: string (nullable = true)
>  |-- KBC__GDPR__PP__PTY_CLT_NO: string (nullable = true) {code}
>  
> The key columns have the same type in both dfs.
>  
>  
> There are no nulls (Python None) in the key columns of either dataframe.
>  
>  
> The issue only appears for a specific 'psource=ADVICES' partition; it doesn't 
> appear for the other 'psource' partitions (which have the exact same schema).
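> One way to check whether the failure is tied to the broadcast exchange visible in the stack trace is to disable automatic broadcast joins and rerun. This is a debugging sketch, not a fix; 'spark' is the session above:
> {code:python}
> # Hypothetical debugging step: force a sort-merge join instead of a
> # broadcast hash join. If the NPE disappears, the failure sits on the
> # broadcast path.
> spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)
> {code}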
>  
> The physical execution plan of the joined df:
>  
> {code:java}
> df_known_customers.explain(mode="formatted")
> == Physical Plan ==
> BroadcastHashJoin Inner BuildLeft (9)
> :- BroadcastExchange (4)
> :  +- Project (3)
> :     +- Filter (2)
> :        +- Scan parquet  (1)
> +- * Project (8)
>    +- * Filter (7)
>       +- * ColumnarToRow (6)
>          +- Scan parquet  (5)
> (1) Scan parquet 
> Output [128]: [PTY_CPN_GUID_NO#21, PTY_CLT_NO#22, PTY_CLT_NO_TKN_NM#23, 
> CPN_NO#24, ALL_ADV_NO#25, DGT_ADV_NO#26, DGT_ADV_READ_NO#27, 
> DGT_ADV_SGN_NO#28, DGT_ADV_DCD_NO#29, ALL_ADV_MH_NO#30, DGT_ADV_MH_NO#31, 
> DGT_ADV_READ_MH_NO#32, DGT_ADV_SGN_MH_NO#33, DGT_ADV_DCD_MH_NO#34, 
> ALL_ADV_YR_NO#35, DGT_ADV_YR_NO#36, DGT_ADV_READ_YR_NO#37, 
> DGT_ADV_SGN_YR_NO#38, DGT_ADV_DCD_YR_NO#39, DGT_ADV_RATE#40, 
> DGT_ADV_RATE_CD#41, DGT_ADV_READ_RATE#42, DGT_ADV_READ_RATE_CD#43, 
> DGT_ADV_DCD_RATE#44, DGT_ADV_DCD_RATE_CD#45, DGT_ADV_RATE_MH#46, 
> DGT_ADV_RATE_MH_DIFF#47, DGT_ADV_RATE_TRD#48, DGT_ADV_RATE_TRD_CD#49, 
> DGT_ADV_SGN_RATE#50, DGT_ADV_SGN_RATE_CD#51, DGT_ADV_READ_RATE_MH#52, 
> DGT_ADV_READ_RATE_MH_DIFF#53, DGT_ADV_READ_RATE_TRD#54, 
> DGT_ADV_READ_RATE_TRD_CD#55, DGT_ADV_SGN_RATE_MH#56, 
> DGT_ADV_SGN_RATE_MH_DIFF#57, DGT_ADV_SGN_RATE_TRD#58, 
> DGT_ADV_SGN_RATE_TRD_CD#59, DGT_ADV_DCD_RATE_MH#60, 
> DGT_ADV_DCD_RATE_MH_DIFF#61, DGT_ADV_DCD_RATE_TRD#62, 
> DGT_ADV_DCD_RATE_TRD_CD#63, TTL_DG_SCR#64, TTL_DG_SCR_CD#65, 
> TTL_DG_SCR_TRD#66, TTL_DG_SCR_TRD_CD#67, OLD_AVL_CLT_DATA_DT#68, 
> CFT_COOK_CST_IC#69, ITA_REF_DT#70, ALL_NFC_NO#71, NFC_SHW_NO#72, 
> NFC_CLK_NO#73, NFC_DEL_NO#74, ALL_NFC_YR_NO#75, NFC_SHW_YR_NO#76, 
> NFC_CLK_YR_NO#77, NFC_DEL_YR_NO#78, NFC_CLK_RATE#79, NFC_CLK_RATE_CD#80, 
> ALL_NFC_MH_NO#81, NFC_SHW_MH_NO#82, NFC_CLK_MH_NO#83, NFC_DEL_MH_NO#84, 
> NFC_CLK_RATE_MH#85, NFC_CLK_RATE_MH_DIFF#86, NFC_CLK_RATE_TRD#87, 
> NFC_CLK_RATE_TRD_CD#88, ACI_INV_HP_DB_NO#89, ACI_INV_PRT_VAL_NO#90, 
> ACI_INV_PRT_PERF_NO#91, ACI_INV_POS_LST_NO#92, ACI_INV_PRD_DL_NO#93, 
> ACI_INV_SPRD_NO#94, ACI_INV_STRY_NO#95, ACI_INV_RSK_NO#96, 
> ACI_INV_HP_DB_YR_NO#97, ACI_INV_PRT_VAL_YR_NO#98, ACI_INV_PRT_PERF_YR_NO#99, 
> ACI_INV_POS_LST_YR_NO#100, ACI_INV_PRD_DL_YR_NO#101, ACI_INV_SPRD_YR_NO#102, 
> ACI_INV_STRY_YR_NO#103, ACI_INV_RSK_YR_NO#104, ACI_INV_HP_DB_MH_NO#105, 
> ACI_INV_PRT_VAL_MH_NO#106, ACI_INV_PRT_PERF_MH_NO#107, 
> ACI_INV_POS_LST_MH_NO#108, ACI_INV_PRD_DL_MH_NO#109, ACI_INV_SPRD_MH_NO#110, 
> ACI_INV_STRY_MH_NO#111, ACI_INV_RSK_MH_NO#112, ACI_INV_HP_DB_MH_DIFF#113, 
> ACI_INV_PRT_VAL_MH_DIFF#114, ACI_INV_PRT_PERF_MH_DIFF#115, 
> ACI_INV_POS_LST_MH_DIFF#116, ACI_INV_PRD_DL_MH_DIFF#117, 
> ACI_INV_SPRD_MH_DIFF#118, ACI_INV_STRY_MH_DIFF#119, ACI_INV_RSK_MH_DIFF#120, 
> ACI_INV_HP_DB_YR_CD#121, ACI_INV_PRT_VAL_YR_CD#122, 
> ACI_INV_PRT_PERF_YR_CD#123, ACI_INV_POS_LST_YR_CD#124, 
> ACI_INV_PRD_DL_YR_CD#125, ACI_INV_SPRD_YR_CD#126, ACI_INV_STRY_YR_CD#127, 
> ACI_INV_RSK_YR_CD#128, ACI_INV_HP_DB_TRD#129, ACI_INV_PRT_VAL_TRD#130, 
> ACI_INV_PRT_PERF_TRD#131, ACI_INV_POS_LST_TRD#132, ACI_INV_PRD_DL_TRD#133, 
> ACI_INV_SPRD_TRD#134, ACI_INV_STRY_TRD#135, ACI_INV_RSK_TRD#136, 
> ACI_INV_HP_DB_TRD_CD#137, ACI_INV_PRT_VAL_TRD_CD#138, 
> ACI_INV_PRT_PERF_TRD_CD#139, ACI_INV_POS_LST_TRD_CD#140, 
> ACI_INV_PRD_DL_TRD_CD#141, ACI_INV_SPRD_TRD_CD#142, ACI_INV_STRY_TRD_CD#143, 
> ACI_INV_RSK_TRD_CD#144, GDPR_REF_DT#145, SRC_CD#146, SRC_KEY#147, META#148]
> Batched: false
> Location: InMemoryFileIndex 
> [s3://XXX/feature/dfpa/s3/partycompanyinvestmentdigiscorefeature.parquet/pdate=20241022/psource=ADVICES]
> ReadSchema: 
> struct<PTY_CPN_GUID_NO:string,PTY_CLT_NO:string,PTY_CLT_NO_TKN_NM:string,CPN_NO:string,ALL_ADV_NO:int,DGT_ADV_NO:int,DGT_ADV_READ_NO:int,DGT_ADV_SGN_NO:int,DGT_ADV_DCD_NO:int,ALL_ADV_MH_NO:int,DGT_ADV_MH_NO:int,DGT_ADV_READ_MH_NO:int,DGT_ADV_SGN_MH_NO:int,DGT_ADV_DCD_MH_NO:int,ALL_ADV_YR_NO:int,DGT_ADV_YR_NO:int,DGT_ADV_READ_YR_NO:int,DGT_ADV_SGN_YR_NO:int,DGT_ADV_DCD_YR_NO:int,DGT_ADV_RATE:decimal(13,2),DGT_ADV_RATE_CD:int,DGT_ADV_READ_RATE:decimal(13,2),DGT_ADV_READ_RATE_CD:int,DGT_ADV_DCD_RATE:decimal(13,2),DGT_ADV_DCD_RATE_CD:int,DGT_ADV_RATE_MH:decimal(13,2),DGT_ADV_RATE_MH_DIFF:decimal(13,2),DGT_ADV_RATE_TRD:decimal(13,2),DGT_ADV_RATE_TRD_CD:string,DGT_ADV_SGN_RATE:decimal(13,2),DGT_ADV_SGN_RATE_CD:int,DGT_ADV_READ_RATE_MH:decimal(13,2),DGT_ADV_READ_RATE_MH_DIFF:decimal(13,2),DGT_ADV_READ_RATE_TRD:decimal(13,2),DGT_ADV_READ_RATE_TRD_CD:string,DGT_ADV_SGN_RATE_MH:decimal(13,2),DGT_ADV_SGN_RATE_MH_DIFF:decimal(13,2),DGT_ADV_SGN_RATE_TRD:decimal(13,2),DGT_ADV_SGN_RATE_TRD_CD:string,DGT_ADV_DCD_RATE_MH:decimal(13,2),DGT_ADV_DCD_RATE_MH_DIFF:decimal(13,2),DGT_ADV_DCD_RATE_TRD:decimal(13,2),DGT_ADV_DCD_RATE_TRD_CD:string,TTL_DG_SCR:string,TTL_DG_SCR_CD:string,TTL_DG_SCR_TRD:string,TTL_DG_SCR_TRD_CD:string,OLD_AVL_CLT_DATA_DT:date,CFT_COOK_CST_IC:int,ITA_REF_DT:date,ALL_NFC_NO:int,NFC_SHW_NO:int,NFC_CLK_NO:int,NFC_DEL_NO:int,ALL_NFC_YR_NO:int,NFC_SHW_YR_NO:int,NFC_CLK_YR_NO:int,NFC_DEL_YR_NO:int,NFC_CLK_RATE:decimal(13,2),NFC_CLK_RATE_CD:int,ALL_NFC_MH_NO:int,NFC_SHW_MH_NO:int,NFC_CLK_MH_NO:int,NFC_DEL_MH_NO:int,NFC_CLK_RATE_MH:decimal(13,2),NFC_CLK_RATE_MH_DIFF:int,NFC_CLK_RATE_TRD:decimal(13,2),NFC_CLK_RATE_TRD_CD:string,ACI_INV_HP_DB_NO:int,ACI_INV_PRT_VAL_NO:int,ACI_INV_PRT_PERF_NO:int,ACI_INV_POS_LST_NO:int,ACI_INV_PRD_DL_NO:int,ACI_INV_SPRD_NO:int,ACI_INV_STRY_NO:int,ACI_INV_RSK_NO:int,ACI_INV_HP_DB_YR_NO:int,ACI_INV_PRT_VAL_YR_NO:int,ACI_INV_PRT_PERF_YR_NO:int,ACI_INV_POS_LST_YR_NO:int,ACI_INV_PRD_DL_YR_NO:int,ACI_INV_SPRD_YR_NO:int,ACI_INV_STRY_YR_NO:int,AC
I_INV_RSK_YR_NO:int,ACI_INV_HP_DB_MH_NO:int,ACI_INV_PRT_VAL_MH_NO:int,ACI_INV_PRT_PERF_MH_NO:int,ACI_INV_POS_LST_MH_NO:int,ACI_INV_PRD_DL_MH_NO:int,ACI_INV_SPRD_MH_NO:int,ACI_INV_STRY_MH_NO:int,ACI_INV_RSK_MH_NO:int,ACI_INV_HP_DB_MH_DIFF:int,ACI_INV_PRT_VAL_MH_DIFF:int,ACI_INV_PRT_PERF_MH_DIFF:int,ACI_INV_POS_LST_MH_DIFF:int,ACI_INV_PRD_DL_MH_DIFF:int,ACI_INV_SPRD_MH_DIFF:int,ACI_INV_STRY_MH_DIFF:int,ACI_INV_RSK_MH_DIFF:int,ACI_INV_HP_DB_YR_CD:int,ACI_INV_PRT_VAL_YR_CD:int,ACI_INV_PRT_PERF_YR_CD:int,ACI_INV_POS_LST_YR_CD:int,ACI_INV_PRD_DL_YR_CD:int,ACI_INV_SPRD_YR_CD:int,ACI_INV_STRY_YR_CD:int,ACI_INV_RSK_YR_CD:int,ACI_INV_HP_DB_TRD:decimal(13,2),ACI_INV_PRT_VAL_TRD:decimal(13,2),ACI_INV_PRT_PERF_TRD:decimal(13,2),ACI_INV_POS_LST_TRD:decimal(13,2),ACI_INV_PRD_DL_TRD:decimal(13,2),ACI_INV_SPRD_TRD:decimal(13,2),ACI_INV_STRY_TRD:decimal(13,2),ACI_INV_RSK_TRD:decimal(13,2),ACI_INV_HP_DB_TRD_CD:string,ACI_INV_PRT_VAL_TRD_CD:string,ACI_INV_PRT_PERF_TRD_CD:string,ACI_INV_POS_LST_TRD_CD:string,ACI_INV_PRD_DL_TRD_CD:string,ACI_INV_SPRD_TRD_CD:string,ACI_INV_STRY_TRD_CD:string,ACI_INV_RSK_TRD_CD:string,GDPR_REF_DT:date,SRC_CD:string,SRC_KEY:string,META:string>
> (2) Filter
> Input [128]: [PTY_CPN_GUID_NO#21, PTY_CLT_NO#22, PTY_CLT_NO_TKN_NM#23, 
> CPN_NO#24, ALL_ADV_NO#25, DGT_ADV_NO#26, DGT_ADV_READ_NO#27, 
> DGT_ADV_SGN_NO#28, DGT_ADV_DCD_NO#29, ALL_ADV_MH_NO#30, DGT_ADV_MH_NO#31, 
> DGT_ADV_READ_MH_NO#32, DGT_ADV_SGN_MH_NO#33, DGT_ADV_DCD_MH_NO#34, 
> ALL_ADV_YR_NO#35, DGT_ADV_YR_NO#36, DGT_ADV_READ_YR_NO#37, 
> DGT_ADV_SGN_YR_NO#38, DGT_ADV_DCD_YR_NO#39, DGT_ADV_RATE#40, 
> DGT_ADV_RATE_CD#41, DGT_ADV_READ_RATE#42, DGT_ADV_READ_RATE_CD#43, 
> DGT_ADV_DCD_RATE#44, DGT_ADV_DCD_RATE_CD#45, DGT_ADV_RATE_MH#46, 
> DGT_ADV_RATE_MH_DIFF#47, DGT_ADV_RATE_TRD#48, DGT_ADV_RATE_TRD_CD#49, 
> DGT_ADV_SGN_RATE#50, DGT_ADV_SGN_RATE_CD#51, DGT_ADV_READ_RATE_MH#52, 
> DGT_ADV_READ_RATE_MH_DIFF#53, DGT_ADV_READ_RATE_TRD#54, 
> DGT_ADV_READ_RATE_TRD_CD#55, DGT_ADV_SGN_RATE_MH#56, 
> DGT_ADV_SGN_RATE_MH_DIFF#57, DGT_ADV_SGN_RATE_TRD#58, 
> DGT_ADV_SGN_RATE_TRD_CD#59, DGT_ADV_DCD_RATE_MH#60, 
> DGT_ADV_DCD_RATE_MH_DIFF#61, DGT_ADV_DCD_RATE_TRD#62, 
> DGT_ADV_DCD_RATE_TRD_CD#63, TTL_DG_SCR#64, TTL_DG_SCR_CD#65, 
> TTL_DG_SCR_TRD#66, TTL_DG_SCR_TRD_CD#67, OLD_AVL_CLT_DATA_DT#68, 
> CFT_COOK_CST_IC#69, ITA_REF_DT#70, ALL_NFC_NO#71, NFC_SHW_NO#72, 
> NFC_CLK_NO#73, NFC_DEL_NO#74, ALL_NFC_YR_NO#75, NFC_SHW_YR_NO#76, 
> NFC_CLK_YR_NO#77, NFC_DEL_YR_NO#78, NFC_CLK_RATE#79, NFC_CLK_RATE_CD#80, 
> ALL_NFC_MH_NO#81, NFC_SHW_MH_NO#82, NFC_CLK_MH_NO#83, NFC_DEL_MH_NO#84, 
> NFC_CLK_RATE_MH#85, NFC_CLK_RATE_MH_DIFF#86, NFC_CLK_RATE_TRD#87, 
> NFC_CLK_RATE_TRD_CD#88, ACI_INV_HP_DB_NO#89, ACI_INV_PRT_VAL_NO#90, 
> ACI_INV_PRT_PERF_NO#91, ACI_INV_POS_LST_NO#92, ACI_INV_PRD_DL_NO#93, 
> ACI_INV_SPRD_NO#94, ACI_INV_STRY_NO#95, ACI_INV_RSK_NO#96, 
> ACI_INV_HP_DB_YR_NO#97, ACI_INV_PRT_VAL_YR_NO#98, ACI_INV_PRT_PERF_YR_NO#99, 
> ACI_INV_POS_LST_YR_NO#100, ACI_INV_PRD_DL_YR_NO#101, ACI_INV_SPRD_YR_NO#102, 
> ACI_INV_STRY_YR_NO#103, ACI_INV_RSK_YR_NO#104, ACI_INV_HP_DB_MH_NO#105, 
> ACI_INV_PRT_VAL_MH_NO#106, ACI_INV_PRT_PERF_MH_NO#107, 
> ACI_INV_POS_LST_MH_NO#108, ACI_INV_PRD_DL_MH_NO#109, ACI_INV_SPRD_MH_NO#110, 
> ACI_INV_STRY_MH_NO#111, ACI_INV_RSK_MH_NO#112, ACI_INV_HP_DB_MH_DIFF#113, 
> ACI_INV_PRT_VAL_MH_DIFF#114, ACI_INV_PRT_PERF_MH_DIFF#115, 
> ACI_INV_POS_LST_MH_DIFF#116, ACI_INV_PRD_DL_MH_DIFF#117, 
> ACI_INV_SPRD_MH_DIFF#118, ACI_INV_STRY_MH_DIFF#119, ACI_INV_RSK_MH_DIFF#120, 
> ACI_INV_HP_DB_YR_CD#121, ACI_INV_PRT_VAL_YR_CD#122, 
> ACI_INV_PRT_PERF_YR_CD#123, ACI_INV_POS_LST_YR_CD#124, 
> ACI_INV_PRD_DL_YR_CD#125, ACI_INV_SPRD_YR_CD#126, ACI_INV_STRY_YR_CD#127, 
> ACI_INV_RSK_YR_CD#128, ACI_INV_HP_DB_TRD#129, ACI_INV_PRT_VAL_TRD#130, 
> ACI_INV_PRT_PERF_TRD#131, ACI_INV_POS_LST_TRD#132, ACI_INV_PRD_DL_TRD#133, 
> ACI_INV_SPRD_TRD#134, ACI_INV_STRY_TRD#135, ACI_INV_RSK_TRD#136, 
> ACI_INV_HP_DB_TRD_CD#137, ACI_INV_PRT_VAL_TRD_CD#138, 
> ACI_INV_PRT_PERF_TRD_CD#139, ACI_INV_POS_LST_TRD_CD#140, 
> ACI_INV_PRD_DL_TRD_CD#141, ACI_INV_SPRD_TRD_CD#142, ACI_INV_STRY_TRD_CD#143, 
> ACI_INV_RSK_TRD_CD#144, GDPR_REF_DT#145, SRC_CD#146, SRC_KEY#147, META#148]
> Condition : atleastnnonnulls(2, PTY_CLT_NO#22, CPN_NO#24)
> (3) Project
> Output [132]: [PTY_CLT_NO#22, CPN_NO#24, PTY_CPN_GUID_NO#21, PTY_CLT_NO#22, 
> PTY_CLT_NO_TKN_NM#23, CPN_NO#24, ALL_ADV_NO#25, DGT_ADV_NO#26, 
> DGT_ADV_READ_NO#27, DGT_ADV_SGN_NO#28, DGT_ADV_DCD_NO#29, ALL_ADV_MH_NO#30, 
> DGT_ADV_MH_NO#31, DGT_ADV_READ_MH_NO#32, DGT_ADV_SGN_MH_NO#33, 
> DGT_ADV_DCD_MH_NO#34, ALL_ADV_YR_NO#35, DGT_ADV_YR_NO#36, 
> DGT_ADV_READ_YR_NO#37, DGT_ADV_SGN_YR_NO#38, DGT_ADV_DCD_YR_NO#39, 
> DGT_ADV_RATE#40, DGT_ADV_RATE_CD#41, DGT_ADV_READ_RATE#42, 
> DGT_ADV_READ_RATE_CD#43, DGT_ADV_DCD_RATE#44, DGT_ADV_DCD_RATE_CD#45, 
> DGT_ADV_RATE_MH#46, DGT_ADV_RATE_MH_DIFF#47, DGT_ADV_RATE_TRD#48, 
> DGT_ADV_RATE_TRD_CD#49, DGT_ADV_SGN_RATE#50, DGT_ADV_SGN_RATE_CD#51, 
> DGT_ADV_READ_RATE_MH#52, DGT_ADV_READ_RATE_MH_DIFF#53, 
> DGT_ADV_READ_RATE_TRD#54, DGT_ADV_READ_RATE_TRD_CD#55, 
> DGT_ADV_SGN_RATE_MH#56, DGT_ADV_SGN_RATE_MH_DIFF#57, DGT_ADV_SGN_RATE_TRD#58, 
> DGT_ADV_SGN_RATE_TRD_CD#59, DGT_ADV_DCD_RATE_MH#60, 
> DGT_ADV_DCD_RATE_MH_DIFF#61, DGT_ADV_DCD_RATE_TRD#62, 
> DGT_ADV_DCD_RATE_TRD_CD#63, TTL_DG_SCR#64, TTL_DG_SCR_CD#65, 
> TTL_DG_SCR_TRD#66, TTL_DG_SCR_TRD_CD#67, OLD_AVL_CLT_DATA_DT#68, 
> CFT_COOK_CST_IC#69, ITA_REF_DT#70, ALL_NFC_NO#71, NFC_SHW_NO#72, 
> NFC_CLK_NO#73, NFC_DEL_NO#74, ALL_NFC_YR_NO#75, NFC_SHW_YR_NO#76, 
> NFC_CLK_YR_NO#77, NFC_DEL_YR_NO#78, NFC_CLK_RATE#79, NFC_CLK_RATE_CD#80, 
> ALL_NFC_MH_NO#81, NFC_SHW_MH_NO#82, NFC_CLK_MH_NO#83, NFC_DEL_MH_NO#84, 
> NFC_CLK_RATE_MH#85, NFC_CLK_RATE_MH_DIFF#86, NFC_CLK_RATE_TRD#87, 
> NFC_CLK_RATE_TRD_CD#88, ACI_INV_HP_DB_NO#89, ACI_INV_PRT_VAL_NO#90, 
> ACI_INV_PRT_PERF_NO#91, ACI_INV_POS_LST_NO#92, ACI_INV_PRD_DL_NO#93, 
> ACI_INV_SPRD_NO#94, ACI_INV_STRY_NO#95, ACI_INV_RSK_NO#96, 
> ACI_INV_HP_DB_YR_NO#97, ACI_INV_PRT_VAL_YR_NO#98, ACI_INV_PRT_PERF_YR_NO#99, 
> ACI_INV_POS_LST_YR_NO#100, ACI_INV_PRD_DL_YR_NO#101, ACI_INV_SPRD_YR_NO#102, 
> ACI_INV_STRY_YR_NO#103, ACI_INV_RSK_YR_NO#104, ACI_INV_HP_DB_MH_NO#105, 
> ACI_INV_PRT_VAL_MH_NO#106, ACI_INV_PRT_PERF_MH_NO#107, 
> ACI_INV_POS_LST_MH_NO#108, ACI_INV_PRD_DL_MH_NO#109, ACI_INV_SPRD_MH_NO#110, 
> ACI_INV_STRY_MH_NO#111, ACI_INV_RSK_MH_NO#112, ACI_INV_HP_DB_MH_DIFF#113, 
> ACI_INV_PRT_VAL_MH_DIFF#114, ACI_INV_PRT_PERF_MH_DIFF#115, 
> ACI_INV_POS_LST_MH_DIFF#116, ACI_INV_PRD_DL_MH_DIFF#117, 
> ACI_INV_SPRD_MH_DIFF#118, ACI_INV_STRY_MH_DIFF#119, ACI_INV_RSK_MH_DIFF#120, 
> ACI_INV_HP_DB_YR_CD#121, ACI_INV_PRT_VAL_YR_CD#122, 
> ACI_INV_PRT_PERF_YR_CD#123, ACI_INV_POS_LST_YR_CD#124, 
> ACI_INV_PRD_DL_YR_CD#125, ACI_INV_SPRD_YR_CD#126, ACI_INV_STRY_YR_CD#127, 
> ACI_INV_RSK_YR_CD#128, ACI_INV_HP_DB_TRD#129, ACI_INV_PRT_VAL_TRD#130, 
> ACI_INV_PRT_PERF_TRD#131, ACI_INV_POS_LST_TRD#132, ACI_INV_PRD_DL_TRD#133, 
> ACI_INV_SPRD_TRD#134, ACI_INV_STRY_TRD#135, ACI_INV_RSK_TRD#136, 
> ACI_INV_HP_DB_TRD_CD#137, ACI_INV_PRT_VAL_TRD_CD#138, 
> ACI_INV_PRT_PERF_TRD_CD#139, ACI_INV_POS_LST_TRD_CD#140, 
> ACI_INV_PRD_DL_TRD_CD#141, ACI_INV_SPRD_TRD_CD#142, ACI_INV_STRY_TRD_CD#143, 
> ACI_INV_RSK_TRD_CD#144, GDPR_REF_DT#145, SRC_CD#146, SRC_KEY#147, META#148, 
> 20241022 AS pdate#277, ADVICES AS psource#408]
> Input [128]: [PTY_CPN_GUID_NO#21, PTY_CLT_NO#22, PTY_CLT_NO_TKN_NM#23, 
> CPN_NO#24, ALL_ADV_NO#25, DGT_ADV_NO#26, DGT_ADV_READ_NO#27, 
> DGT_ADV_SGN_NO#28, DGT_ADV_DCD_NO#29, ALL_ADV_MH_NO#30, DGT_ADV_MH_NO#31, 
> DGT_ADV_READ_MH_NO#32, DGT_ADV_SGN_MH_NO#33, DGT_ADV_DCD_MH_NO#34, 
> ALL_ADV_YR_NO#35, DGT_ADV_YR_NO#36, DGT_ADV_READ_YR_NO#37, 
> DGT_ADV_SGN_YR_NO#38, DGT_ADV_DCD_YR_NO#39, DGT_ADV_RATE#40, 
> DGT_ADV_RATE_CD#41, DGT_ADV_READ_RATE#42, DGT_ADV_READ_RATE_CD#43, 
> DGT_ADV_DCD_RATE#44, DGT_ADV_DCD_RATE_CD#45, DGT_ADV_RATE_MH#46, 
> DGT_ADV_RATE_MH_DIFF#47, DGT_ADV_RATE_TRD#48, DGT_ADV_RATE_TRD_CD#49, 
> DGT_ADV_SGN_RATE#50, DGT_ADV_SGN_RATE_CD#51, DGT_ADV_READ_RATE_MH#52, 
> DGT_ADV_READ_RATE_MH_DIFF#53, DGT_ADV_READ_RATE_TRD#54, 
> DGT_ADV_READ_RATE_TRD_CD#55, DGT_ADV_SGN_RATE_MH#56, 
> DGT_ADV_SGN_RATE_MH_DIFF#57, DGT_ADV_SGN_RATE_TRD#58, 
> DGT_ADV_SGN_RATE_TRD_CD#59, DGT_ADV_DCD_RATE_MH#60, 
> DGT_ADV_DCD_RATE_MH_DIFF#61, DGT_ADV_DCD_RATE_TRD#62, 
> DGT_ADV_DCD_RATE_TRD_CD#63, TTL_DG_SCR#64, TTL_DG_SCR_CD#65, 
> TTL_DG_SCR_TRD#66, TTL_DG_SCR_TRD_CD#67, OLD_AVL_CLT_DATA_DT#68, 
> CFT_COOK_CST_IC#69, ITA_REF_DT#70, ALL_NFC_NO#71, NFC_SHW_NO#72, 
> NFC_CLK_NO#73, NFC_DEL_NO#74, ALL_NFC_YR_NO#75, NFC_SHW_YR_NO#76, 
> NFC_CLK_YR_NO#77, NFC_DEL_YR_NO#78, NFC_CLK_RATE#79, NFC_CLK_RATE_CD#80, 
> ALL_NFC_MH_NO#81, NFC_SHW_MH_NO#82, NFC_CLK_MH_NO#83, NFC_DEL_MH_NO#84, 
> NFC_CLK_RATE_MH#85, NFC_CLK_RATE_MH_DIFF#86, NFC_CLK_RATE_TRD#87, 
> NFC_CLK_RATE_TRD_CD#88, ACI_INV_HP_DB_NO#89, ACI_INV_PRT_VAL_NO#90, 
> ACI_INV_PRT_PERF_NO#91, ACI_INV_POS_LST_NO#92, ACI_INV_PRD_DL_NO#93, 
> ACI_INV_SPRD_NO#94, ACI_INV_STRY_NO#95, ACI_INV_RSK_NO#96, 
> ACI_INV_HP_DB_YR_NO#97, ACI_INV_PRT_VAL_YR_NO#98, ACI_INV_PRT_PERF_YR_NO#99, 
> ACI_INV_POS_LST_YR_NO#100, ACI_INV_PRD_DL_YR_NO#101, ACI_INV_SPRD_YR_NO#102, 
> ACI_INV_STRY_YR_NO#103, ACI_INV_RSK_YR_NO#104, ACI_INV_HP_DB_MH_NO#105, 
> ACI_INV_PRT_VAL_MH_NO#106, ACI_INV_PRT_PERF_MH_NO#107, 
> ACI_INV_POS_LST_MH_NO#108, ACI_INV_PRD_DL_MH_NO#109, ACI_INV_SPRD_MH_NO#110, 
> ACI_INV_STRY_MH_NO#111, ACI_INV_RSK_MH_NO#112, ACI_INV_HP_DB_MH_DIFF#113, 
> ACI_INV_PRT_VAL_MH_DIFF#114, ACI_INV_PRT_PERF_MH_DIFF#115, 
> ACI_INV_POS_LST_MH_DIFF#116, ACI_INV_PRD_DL_MH_DIFF#117, 
> ACI_INV_SPRD_MH_DIFF#118, ACI_INV_STRY_MH_DIFF#119, ACI_INV_RSK_MH_DIFF#120, 
> ACI_INV_HP_DB_YR_CD#121, ACI_INV_PRT_VAL_YR_CD#122, 
> ACI_INV_PRT_PERF_YR_CD#123, ACI_INV_POS_LST_YR_CD#124, 
> ACI_INV_PRD_DL_YR_CD#125, ACI_INV_SPRD_YR_CD#126, ACI_INV_STRY_YR_CD#127, 
> ACI_INV_RSK_YR_CD#128, ACI_INV_HP_DB_TRD#129, ACI_INV_PRT_VAL_TRD#130, 
> ACI_INV_PRT_PERF_TRD#131, ACI_INV_POS_LST_TRD#132, ACI_INV_PRD_DL_TRD#133, 
> ACI_INV_SPRD_TRD#134, ACI_INV_STRY_TRD#135, ACI_INV_RSK_TRD#136, 
> ACI_INV_HP_DB_TRD_CD#137, ACI_INV_PRT_VAL_TRD_CD#138, 
> ACI_INV_PRT_PERF_TRD_CD#139, ACI_INV_POS_LST_TRD_CD#140, 
> ACI_INV_PRD_DL_TRD_CD#141, ACI_INV_SPRD_TRD_CD#142, ACI_INV_STRY_TRD_CD#143, 
> ACI_INV_RSK_TRD_CD#144, GDPR_REF_DT#145, SRC_CD#146, SRC_KEY#147, META#148]
> (4) BroadcastExchange
> Input [132]: [PTY_CLT_NO#22, CPN_NO#24, PTY_CPN_GUID_NO#21, PTY_CLT_NO#22, 
> PTY_CLT_NO_TKN_NM#23, CPN_NO#24, ALL_ADV_NO#25, DGT_ADV_NO#26, 
> DGT_ADV_READ_NO#27, DGT_ADV_SGN_NO#28, DGT_ADV_DCD_NO#29, ALL_ADV_MH_NO#30, 
> DGT_ADV_MH_NO#31, DGT_ADV_READ_MH_NO#32, DGT_ADV_SGN_MH_NO#33, 
> DGT_ADV_DCD_MH_NO#34, ALL_ADV_YR_NO#35, DGT_ADV_YR_NO#36, 
> DGT_ADV_READ_YR_NO#37, DGT_ADV_SGN_YR_NO#38, DGT_ADV_DCD_YR_NO#39, 
> DGT_ADV_RATE#40, DGT_ADV_RATE_CD#41, DGT_ADV_READ_RATE#42, 
> DGT_ADV_READ_RATE_CD#43, DGT_ADV_DCD_RATE#44, DGT_ADV_DCD_RATE_CD#45, 
> DGT_ADV_RATE_MH#46, DGT_ADV_RATE_MH_DIFF#47, DGT_ADV_RATE_TRD#48, 
> DGT_ADV_RATE_TRD_CD#49, DGT_ADV_SGN_RATE#50, DGT_ADV_SGN_RATE_CD#51, 
> DGT_ADV_READ_RATE_MH#52, DGT_ADV_READ_RATE_MH_DIFF#53, 
> DGT_ADV_READ_RATE_TRD#54, DGT_ADV_READ_RATE_TRD_CD#55, 
> DGT_ADV_SGN_RATE_MH#56, DGT_ADV_SGN_RATE_MH_DIFF#57, DGT_ADV_SGN_RATE_TRD#58, 
> DGT_ADV_SGN_RATE_TRD_CD#59, DGT_ADV_DCD_RATE_MH#60, 
> DGT_ADV_DCD_RATE_MH_DIFF#61, DGT_ADV_DCD_RATE_TRD#62, 
> DGT_ADV_DCD_RATE_TRD_CD#63, TTL_DG_SCR#64, TTL_DG_SCR_CD#65, 
> TTL_DG_SCR_TRD#66, TTL_DG_SCR_TRD_CD#67, OLD_AVL_CLT_DATA_DT#68, 
> CFT_COOK_CST_IC#69, ITA_REF_DT#70, ALL_NFC_NO#71, NFC_SHW_NO#72, 
> NFC_CLK_NO#73, NFC_DEL_NO#74, ALL_NFC_YR_NO#75, NFC_SHW_YR_NO#76, 
> NFC_CLK_YR_NO#77, NFC_DEL_YR_NO#78, NFC_CLK_RATE#79, NFC_CLK_RATE_CD#80, 
> ALL_NFC_MH_NO#81, NFC_SHW_MH_NO#82, NFC_CLK_MH_NO#83, NFC_DEL_MH_NO#84, 
> NFC_CLK_RATE_MH#85, NFC_CLK_RATE_MH_DIFF#86, NFC_CLK_RATE_TRD#87, 
> NFC_CLK_RATE_TRD_CD#88, ACI_INV_HP_DB_NO#89, ACI_INV_PRT_VAL_NO#90, 
> ACI_INV_PRT_PERF_NO#91, ACI_INV_POS_LST_NO#92, ACI_INV_PRD_DL_NO#93, 
> ACI_INV_SPRD_NO#94, ACI_INV_STRY_NO#95, ACI_INV_RSK_NO#96, 
> ACI_INV_HP_DB_YR_NO#97, ACI_INV_PRT_VAL_YR_NO#98, ACI_INV_PRT_PERF_YR_NO#99, 
> ACI_INV_POS_LST_YR_NO#100, ACI_INV_PRD_DL_YR_NO#101, ACI_INV_SPRD_YR_NO#102, 
> ACI_INV_STRY_YR_NO#103, ACI_INV_RSK_YR_NO#104, ACI_INV_HP_DB_MH_NO#105, 
> ACI_INV_PRT_VAL_MH_NO#106, ACI_INV_PRT_PERF_MH_NO#107, 
> ACI_INV_POS_LST_MH_NO#108, ACI_INV_PRD_DL_MH_NO#109, ACI_INV_SPRD_MH_NO#110, 
> ACI_INV_STRY_MH_NO#111, ACI_INV_RSK_MH_NO#112, ACI_INV_HP_DB_MH_DIFF#113, 
> ACI_INV_PRT_VAL_MH_DIFF#114, ACI_INV_PRT_PERF_MH_DIFF#115, 
> ACI_INV_POS_LST_MH_DIFF#116, ACI_INV_PRD_DL_MH_DIFF#117, 
> ACI_INV_SPRD_MH_DIFF#118, ACI_INV_STRY_MH_DIFF#119, ACI_INV_RSK_MH_DIFF#120, 
> ACI_INV_HP_DB_YR_CD#121, ACI_INV_PRT_VAL_YR_CD#122, 
> ACI_INV_PRT_PERF_YR_CD#123, ACI_INV_POS_LST_YR_CD#124, 
> ACI_INV_PRD_DL_YR_CD#125, ACI_INV_SPRD_YR_CD#126, ACI_INV_STRY_YR_CD#127, 
> ACI_INV_RSK_YR_CD#128, ACI_INV_HP_DB_TRD#129, ACI_INV_PRT_VAL_TRD#130, 
> ACI_INV_PRT_PERF_TRD#131, ACI_INV_POS_LST_TRD#132, ACI_INV_PRD_DL_TRD#133, 
> ACI_INV_SPRD_TRD#134, ACI_INV_STRY_TRD#135, ACI_INV_RSK_TRD#136, 
> ACI_INV_HP_DB_TRD_CD#137, ACI_INV_PRT_VAL_TRD_CD#138, 
> ACI_INV_PRT_PERF_TRD_CD#139, ACI_INV_POS_LST_TRD_CD#140, 
> ACI_INV_PRD_DL_TRD_CD#141, ACI_INV_SPRD_TRD_CD#142, ACI_INV_STRY_TRD_CD#143, 
> ACI_INV_RSK_TRD_CD#144, GDPR_REF_DT#145, SRC_CD#146, SRC_KEY#147, META#148, 
> pdate#277, psource#408]
> Arguments: HashedRelationBroadcastMode(List(coalesce(input[0, string, true], 
> ), isnull(input[0, string, true]), coalesce(input[1, string, true], ), 
> isnull(input[1, string, true])),false), [plan_id=476]
> (5) Scan parquet 
> Output [27]: [PTY_CLT_NO#805, CPN_NO#806, MDL_COOK_CST_IC#808, 
> MDL_COOK_CST_TS#809, DRT_MKT_FNC_PDC_COOK_CST_IC#810, 
> DRT_MKT_FNC_PDC_COOK_CST_TS#811, DRT_MKT_NFNC_PDC_COOK_CST_IC#812, 
> DRT_MKT_NFNC_PDC_COOK_CST_TS#813, DRT_MKT_COOK_CST_IC#815, 
> DRT_MKT_COOK_CST_TS#816, CFT_COOK_CST_IC#829, CFT_COOK_CST_TS#830, 
> ANL_COOK_CST_IC#831, ANL_COOK_CST_TS#832, FNC_PDC_PCM_CST_IC#840, 
> NFNC_PDC_PCM_CST_IC#841, DATA_DTH_IC#872, ACS_PSD2_CST_IC#931, 
> CLN_CDT_CMC_MDL_IC#956, CLN_CDT_BAS_CMC_PFL_IC#957, 
> CLN_CDT_BAS_DRT_MKT_IC#958, CLN_CDT_ETD_DRT_MKT_FNC_PDC_IC#959, 
> CLN_CDT_ETD_DRT_MKT_NFNC_PDC_IC#960, CLN_CDT_ETD_DRT_MKT_IC#961, 
> CLN_CDT_ETD_CMC_PFL_IC#962, CLN_CDT_CVN_IC#963, CLN_CDT_PTC_MKT_SVY_IC#964]
> Batched: true
> Location: InMemoryFileIndex 
> [s3://XXX/master/dfpa/s3/partyprivacy.parquet/pdate=20241125/psource=DCTL_BIS]
> ReadSchema: 
> struct<PTY_CLT_NO:string,CPN_NO:string,MDL_COOK_CST_IC:int,MDL_COOK_CST_TS:timestamp,DRT_MKT_FNC_PDC_COOK_CST_IC:int,DRT_MKT_FNC_PDC_COOK_CST_TS:timestamp,DRT_MKT_NFNC_PDC_COOK_CST_IC:int,DRT_MKT_NFNC_PDC_COOK_CST_TS:timestamp,DRT_MKT_COOK_CST_IC:int,DRT_MKT_COOK_CST_TS:timestamp,CFT_COOK_CST_IC:int,CFT_COOK_CST_TS:timestamp,ANL_COOK_CST_IC:int,ANL_COOK_CST_TS:timestamp,FNC_PDC_PCM_CST_IC:int,NFNC_PDC_PCM_CST_IC:int,DATA_DTH_IC:int,ACS_PSD2_CST_IC:int,CLN_CDT_CMC_MDL_IC:int,CLN_CDT_BAS_CMC_PFL_IC:int,CLN_CDT_BAS_DRT_MKT_IC:int,CLN_CDT_ETD_DRT_MKT_FNC_PDC_IC:int,CLN_CDT_ETD_DRT_MKT_NFNC_PDC_IC:int,CLN_CDT_ETD_DRT_MKT_IC:int,CLN_CDT_ETD_CMC_PFL_IC:int,CLN_CDT_CVN_IC:int,CLN_CDT_PTC_MKT_SVY_IC:int>
> (6) ColumnarToRow [codegen id : 1]
> Input [27]: [PTY_CLT_NO#805, CPN_NO#806, MDL_COOK_CST_IC#808, 
> MDL_COOK_CST_TS#809, DRT_MKT_FNC_PDC_COOK_CST_IC#810, 
> DRT_MKT_FNC_PDC_COOK_CST_TS#811, DRT_MKT_NFNC_PDC_COOK_CST_IC#812, 
> DRT_MKT_NFNC_PDC_COOK_CST_TS#813, DRT_MKT_COOK_CST_IC#815, 
> DRT_MKT_COOK_CST_TS#816, CFT_COOK_CST_IC#829, CFT_COOK_CST_TS#830, 
> ANL_COOK_CST_IC#831, ANL_COOK_CST_TS#832, FNC_PDC_PCM_CST_IC#840, 
> NFNC_PDC_PCM_CST_IC#841, DATA_DTH_IC#872, ACS_PSD2_CST_IC#931, 
> CLN_CDT_CMC_MDL_IC#956, CLN_CDT_BAS_CMC_PFL_IC#957, 
> CLN_CDT_BAS_DRT_MKT_IC#958, CLN_CDT_ETD_DRT_MKT_FNC_PDC_IC#959, 
> CLN_CDT_ETD_DRT_MKT_NFNC_PDC_IC#960, CLN_CDT_ETD_DRT_MKT_IC#961, 
> CLN_CDT_ETD_CMC_PFL_IC#962, CLN_CDT_CVN_IC#963, CLN_CDT_PTC_MKT_SVY_IC#964]
> (7) Filter [codegen id : 1]
> Input [27]: [PTY_CLT_NO#805, CPN_NO#806, MDL_COOK_CST_IC#808, 
> MDL_COOK_CST_TS#809, DRT_MKT_FNC_PDC_COOK_CST_IC#810, 
> DRT_MKT_FNC_PDC_COOK_CST_TS#811, DRT_MKT_NFNC_PDC_COOK_CST_IC#812, 
> DRT_MKT_NFNC_PDC_COOK_CST_TS#813, DRT_MKT_COOK_CST_IC#815, 
> DRT_MKT_COOK_CST_TS#816, CFT_COOK_CST_IC#829, CFT_COOK_CST_TS#830, 
> ANL_COOK_CST_IC#831, ANL_COOK_CST_TS#832, FNC_PDC_PCM_CST_IC#840, 
> NFNC_PDC_PCM_CST_IC#841, DATA_DTH_IC#872, ACS_PSD2_CST_IC#931, 
> CLN_CDT_CMC_MDL_IC#956, CLN_CDT_BAS_CMC_PFL_IC#957, 
> CLN_CDT_BAS_DRT_MKT_IC#958, CLN_CDT_ETD_DRT_MKT_FNC_PDC_IC#959, 
> CLN_CDT_ETD_DRT_MKT_NFNC_PDC_IC#960, CLN_CDT_ETD_DRT_MKT_IC#961, 
> CLN_CDT_ETD_CMC_PFL_IC#962, CLN_CDT_CVN_IC#963, CLN_CDT_PTC_MKT_SVY_IC#964]
> Condition : atleastnnonnulls(2, PTY_CLT_NO#805, CPN_NO#806)
> (8) Project [codegen id : 1]
> Output [27]: [ANL_COOK_CST_TS#832 AS KBC__GDPR__PP__ANL_COOK_CST_TS#1160, 
> MDL_COOK_CST_TS#809 AS KBC__GDPR__PP__MDL_COOK_CST_TS#1188, 
> CLN_CDT_ETD_DRT_MKT_NFNC_PDC_IC#960 AS 
> KBC__GDPR__PP__CLN_CDT_ETD_DRT_MKT_NFNC_PDC_IC#1216, 
> DRT_MKT_NFNC_PDC_COOK_CST_TS#813 AS 
> KBC__GDPR__PP__DRT_MKT_NFNC_PDC_COOK_CST_TS#1244, DRT_MKT_COOK_CST_IC#815 AS 
> KBC__GDPR__PP__DRT_MKT_COOK_CST_IC#1272, DATA_DTH_IC#872 AS 
> KBC__GDPR__PP__DATA_DTH_IC#1300, CLN_CDT_ETD_DRT_MKT_IC#961 AS 
> KBC__GDPR__PP__CLN_CDT_ETD_DRT_MKT_IC#1328, CFT_COOK_CST_IC#829 AS 
> KBC__GDPR__PP__CFT_COOK_CST_IC#1356, DRT_MKT_FNC_PDC_COOK_CST_TS#811 AS 
> KBC__GDPR__PP__DRT_MKT_FNC_PDC_COOK_CST_TS#1384, 
> CLN_CDT_ETD_DRT_MKT_FNC_PDC_IC#959 AS 
> KBC__GDPR__PP__CLN_CDT_ETD_DRT_MKT_FNC_PDC_IC#1412, 
> DRT_MKT_NFNC_PDC_COOK_CST_IC#812 AS 
> KBC__GDPR__PP__DRT_MKT_NFNC_PDC_COOK_CST_IC#1440, CLN_CDT_CMC_MDL_IC#956 AS 
> KBC__GDPR__PP__CLN_CDT_CMC_MDL_IC#1468, MDL_COOK_CST_IC#808 AS 
> KBC__GDPR__PP__MDL_COOK_CST_IC#1496, DRT_MKT_FNC_PDC_COOK_CST_IC#810 AS 
> KBC__GDPR__PP__DRT_MKT_FNC_PDC_COOK_CST_IC#1524, CLN_CDT_CVN_IC#963 AS 
> KBC__GDPR__PP__CLN_CDT_CVN_IC#1552, CLN_CDT_ETD_CMC_PFL_IC#962 AS 
> KBC__GDPR__PP__CLN_CDT_ETD_CMC_PFL_IC#1580, ANL_COOK_CST_IC#831 AS 
> KBC__GDPR__PP__ANL_COOK_CST_IC#1608, DRT_MKT_COOK_CST_TS#816 AS 
> KBC__GDPR__PP__DRT_MKT_COOK_CST_TS#1636, NFNC_PDC_PCM_CST_IC#841 AS 
> KBC__GDPR__PP__NFNC_PDC_PCM_CST_IC#1664, ACS_PSD2_CST_IC#931 AS 
> KBC__GDPR__PP__ACS_PSD2_CST_IC#1692, CFT_COOK_CST_TS#830 AS 
> KBC__GDPR__PP__CFT_COOK_CST_TS#1720, CLN_CDT_BAS_DRT_MKT_IC#958 AS 
> KBC__GDPR__PP__CLN_CDT_BAS_DRT_MKT_IC#1748, CLN_CDT_PTC_MKT_SVY_IC#964 AS 
> KBC__GDPR__PP__CLN_CDT_PTC_MKT_SVY_IC#1776, FNC_PDC_PCM_CST_IC#840 AS 
> KBC__GDPR__PP__FNC_PDC_PCM_CST_IC#1804, CLN_CDT_BAS_CMC_PFL_IC#957 AS 
> KBC__GDPR__PP__CLN_CDT_BAS_CMC_PFL_IC#1832, CPN_NO#806 AS 
> KBC__GDPR__PP__CPN_NO#1860, PTY_CLT_NO#805 AS KBC__GDPR__PP__PTY_CLT_NO#1888]
> Input [27]: [PTY_CLT_NO#805, CPN_NO#806, MDL_COOK_CST_IC#808, 
> MDL_COOK_CST_TS#809, DRT_MKT_FNC_PDC_COOK_CST_IC#810, 
> DRT_MKT_FNC_PDC_COOK_CST_TS#811, DRT_MKT_NFNC_PDC_COOK_CST_IC#812, 
> DRT_MKT_NFNC_PDC_COOK_CST_TS#813, DRT_MKT_COOK_CST_IC#815, 
> DRT_MKT_COOK_CST_TS#816, CFT_COOK_CST_IC#829, CFT_COOK_CST_TS#830, 
> ANL_COOK_CST_IC#831, ANL_COOK_CST_TS#832, FNC_PDC_PCM_CST_IC#840, 
> NFNC_PDC_PCM_CST_IC#841, DATA_DTH_IC#872, ACS_PSD2_CST_IC#931, 
> CLN_CDT_CMC_MDL_IC#956, CLN_CDT_BAS_CMC_PFL_IC#957, 
> CLN_CDT_BAS_DRT_MKT_IC#958, CLN_CDT_ETD_DRT_MKT_FNC_PDC_IC#959, 
> CLN_CDT_ETD_DRT_MKT_NFNC_PDC_IC#960, CLN_CDT_ETD_DRT_MKT_IC#961, 
> CLN_CDT_ETD_CMC_PFL_IC#962, CLN_CDT_CVN_IC#963, CLN_CDT_PTC_MKT_SVY_IC#964]
> (9) BroadcastHashJoin
> Left keys [4]: [coalesce(PTY_CLT_NO#22, ), isnull(PTY_CLT_NO#22), 
> coalesce(CPN_NO#24, ), isnull(CPN_NO#24)]
> Right keys [4]: [coalesce(KBC__GDPR__PP__PTY_CLT_NO#1888, ), 
> isnull(KBC__GDPR__PP__PTY_CLT_NO#1888), coalesce(KBC__GDPR__PP__CPN_NO#1860, 
> ), isnull(KBC__GDPR__PP__CPN_NO#1860)]
> Join type: Inner
> Join condition: None {code}
>  
>  
> If we select only a limited set of columns from the wide df before joining, 
> the issue does not occur.
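> As context for that observation, a minimal sketch of the "narrow before join" variant that works (df names and the selected feature column are placeholders, not our actual code; the join condition mirrors the null-safe keys in the plan above):
> {code:java}
> from pyspark.sql.functions import col
>
> # Hypothetical sketch: keep only the join keys plus the features actually
> # needed, so the broadcast side of the join stays narrow.
> narrow_df = wide_df.select("PTY_CLT_NO", "CPN_NO", "TTL_DG_SCR")
> joined_df = narrow_df.join(
>     pp_df,
>     (col("PTY_CLT_NO").eqNullSafe(col("KBC__GDPR__PP__PTY_CLT_NO")))
>     & (col("CPN_NO").eqNullSafe(col("KBC__GDPR__PP__CPN_NO"))),
>     how="inner",
> )
> joined_df.write.format("noop").mode("overwrite").save()  # no NPE this way
> {code}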
>  
> I tested with the exact same code/data against the following spark versions 
> (these are the ones made available in our corporate environment):
>  * 3.2.0 => no issue
>  * 3.3.0 => no issue
>  * 3.4.0 => NullPointerException
>  * 3.5.3 => NullPointerException
> I'd guess some kind of execution-plan optimization for wide dataframes was 
> introduced in Spark >= 3.4, but it fails in this specific case.
>  
> I can't share the actual dataframes, but I can share statistics upon request. 
> I'm still trying to reproduce the issue with a dummy/fabricated dataframe 
> (that I can share).
>  
>  
> *UPDATE*
> I've managed to trigger the issue with fake/generated data; see the code 
> snippets below.
>  
> {code:java}
> base_path = "s3://XXX/cases/sdbe/test_spark_issue"
> pp_df_path = f"{base_path}/dummy_party_privacy/pdate=20241204"
> dummy_df_path = f"{base_path}/dummy_df/pdate=20241204/psource=A"
> party_id_col = "PTY_CLT_NO"
> data_controller_column = "CPN_NO"
> PREFIX_PP = "KBC__GDPR__PP__"
> nr_records = 500 * 1000
> nr_cols = 200 {code}
> {code:java}
> # Generate & write pp_df
> from pyspark.sql.types import DateType, IntegerType, StringType, StructField, StructType
> import random
> from datetime import date
> 
> pp_non_join_cols = [
>     'ANL_COOK_CST_TS', 'MDL_COOK_CST_TS', 'CLN_CDT_ETD_DRT_MKT_NFNC_PDC_IC',
>     'DRT_MKT_NFNC_PDC_COOK_CST_TS', 'DRT_MKT_COOK_CST_IC', 'DATA_DTH_IC',
>     'CLN_CDT_ETD_DRT_MKT_IC', 'CFT_COOK_CST_IC', 'DRT_MKT_FNC_PDC_COOK_CST_TS',
>     'CLN_CDT_ETD_DRT_MKT_FNC_PDC_IC', 'DRT_MKT_NFNC_PDC_COOK_CST_IC',
>     'CLN_CDT_CMC_MDL_IC', 'MDL_COOK_CST_IC', 'DRT_MKT_FNC_PDC_COOK_CST_IC',
>     'CLN_CDT_CVN_IC', 'CLN_CDT_ETD_CMC_PFL_IC', 'ANL_COOK_CST_IC',
>     'DRT_MKT_COOK_CST_TS', 'NFNC_PDC_PCM_CST_IC', 'ACS_PSD2_CST_IC',
>     'CFT_COOK_CST_TS', 'CLN_CDT_BAS_DRT_MKT_IC', 'CLN_CDT_PTC_MKT_SVY_IC',
>     'FNC_PDC_PCM_CST_IC', 'CLN_CDT_BAS_CMC_PFL_IC',
> ]
> 
> def generate_dict():
>     return {c: random.getrandbits(1) if c.endswith("IC") else date.today()
>             for c in pp_non_join_cols}
> 
> pp_data = [generate_dict() | {party_id_col: i, data_controller_column: "0001"}
>            for i in range(nr_records)]
> 
> pp_schema = StructType(
>     [
>         StructField(c, DateType() if c.endswith("TS") else IntegerType() if c.endswith("IC") else StringType())
>         for c in pp_non_join_cols
>     ] + [
>         StructField(party_id_col, IntegerType()),
>         StructField(data_controller_column, StringType()),
>     ]
> )
> 
> pp_df = spark.createDataFrame(pp_data, pp_schema)
> # pp_df.show()
> print(f"Writing partyprivacy to path '{pp_df_path}'")
> pp_df.write.parquet(pp_df_path, mode="overwrite")
> print(f"Done writing partyprivacy to path '{pp_df_path}'")
> {code}
>  
> {code:java}
> # Generate dummy df
> dummy_df_non_join_cols = [f"col_{i}" for i in range(nr_cols)]
> 
> def generate_dummy_record():
>     return {f"col_{i}": random.choice([None, "blabla"]) for i in range(nr_cols)}
> 
> dummy_df_data = [generate_dummy_record() | {party_id_col: i, data_controller_column: "0001"}
>                  for i in range(nr_records)]
> 
> dummy_df_schema = StructType(
>     [
>         StructField(c, StringType()) for c in dummy_df_non_join_cols
>     ] + [
>         StructField(party_id_col, IntegerType()),
>         StructField(data_controller_column, StringType()),
>     ]
> )
> 
> dummy_df = spark.createDataFrame(dummy_df_data, dummy_df_schema)
> # dummy_df.show()
> print(f"Writing dummy df to path '{dummy_df_path}'")
> dummy_df.write.parquet(dummy_df_path, mode="overwrite")
> print(f"Done writing dummy df to path '{dummy_df_path}'") {code}
>  
> {code:java}
> from pyspark.sql.functions import col
> 
> input_df = spark.read.parquet(dummy_df_path)
> pp_input_df = spark.read.parquet(pp_df_path)
> 
> for col_name in pp_input_df.columns:  # rename pp columns
>     new_col_name = PREFIX_PP + col_name
>     pp_input_df = pp_input_df.withColumnRenamed(col_name, new_col_name)
> 
> party_identifier_attrib = "PTY_CLT_NO"
> pp_join_field = "PTY_CLT_NO"
> col_pp_join_field = f"{PREFIX_PP}{pp_join_field}"
> data_controller_column = "CPN_NO"
> col_cc_join_field = f"{PREFIX_PP}CPN_NO"
> 
> df_known_customers = input_df.join(
>     pp_input_df,
>     (col(party_identifier_attrib).eqNullSafe(col(col_pp_join_field)))
>     & (col(data_controller_column).eqNullSafe(col(col_cc_join_field))),
>     how="inner",
> )
> 
> print(df_known_customers.count())
> df_known_customers.write.format("noop").mode("overwrite").save()
> print("known customers ok"){code}
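> Speculative mitigation, not a verified fix: since the NPE in the trace below surfaces in BroadcastExchangeExec, forcing the planner away from a broadcast join might sidestep it:
> {code:java}
> # Untested assumption: disable automatic broadcast joins so the planner
> # picks a sort-merge join instead of a BroadcastExchange.
> spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)
> {code}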
> {noformat}
> ---------------------------------------------------------------------------
> Py4JJavaError                             Traceback (most recent call last)
> Cell In[59], line 24
>      17 df_known_customers = input_df.join(pp_input_df,
>      18         
> (col(party_identifier_attrib).eqNullSafe(col(col_pp_join_field)))
>      19         & 
> (col(data_controller_column).eqNullSafe(col(col_cc_join_field))),
>      20         how="inner",
>      21        )
>      23 print(df_known_customers.count())
> ---> 24 df_known_customers.write.format("noop").mode("overwrite").save()
>      25 print("known customers ok")
> File 
> ~/.conda/envs/gdpr_spark_issue_34/lib/python3.11/site-packages/pyspark/sql/readwriter.py:1396,
>  in DataFrameWriter.save(self, path, format, mode, partitionBy, **options)
>    1394     self.format(format)
>    1395 if path is None:
> -> 1396     self._jwrite.save()
>    1397 else:
>    1398     self._jwrite.save(path)
> File 
> ~/.conda/envs/gdpr_spark_issue_34/lib/python3.11/site-packages/py4j/java_gateway.py:1322,
>  in JavaMember.__call__(self, *args)
>    1316 command = proto.CALL_COMMAND_NAME +\
>    1317     self.command_header +\
>    1318     args_command +\
>    1319     proto.END_COMMAND_PART
>    1321 answer = self.gateway_client.send_command(command)
> -> 1322 return_value = get_return_value(
>    1323     answer, self.gateway_client, self.target_id, self.name)
>    1325 for temp_arg in temp_args:
>    1326     if hasattr(temp_arg, "_detach"):
> File 
> ~/.conda/envs/gdpr_spark_issue_34/lib/python3.11/site-packages/pyspark/errors/exceptions/captured.py:169,
>  in capture_sql_exception.<locals>.deco(*a, **kw)
>     167 def deco(*a: Any, **kw: Any) -> Any:
>     168     try:
> --> 169         return f(*a, **kw)
>     170     except Py4JJavaError as e:
>     171         converted = convert_exception(e.java_exception)
> File 
> ~/.conda/envs/gdpr_spark_issue_34/lib/python3.11/site-packages/py4j/protocol.py:326,
>  in get_return_value(answer, gateway_client, target_id, name)
>     324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
>     325 if answer[1] == REFERENCE_TYPE:
> --> 326     raise Py4JJavaError(
>     327         "An error occurred while calling {0}{1}{2}.\n".
>     328         format(target_id, ".", name), value)
>     329 else:
>     330     raise Py4JError(
>     331         "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
>     332         format(target_id, ".", name, value))
> Py4JJavaError: An error occurred while calling o1515.save.
> : java.util.concurrent.ExecutionException: java.lang.NullPointerException
>       at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>       at java.util.concurrent.FutureTask.get(FutureTask.java:206)
>       at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:209)
>       at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeBroadcast$1(SparkPlan.scala:208)
>       at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:246)
>       at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>       at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:243)
>       at org.apache.spark.sql.execution.SparkPlan.executeBroadcast(SparkPlan.scala:204)
>       at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doExecute(BroadcastHashJoinExec.scala:142)
>       at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:195)
>       at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:246)
>       at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>       at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:243)
>       at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:191)
>       at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2(WriteToDataSourceV2Exec.scala:384)
>       at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2$(WriteToDataSourceV2Exec.scala:382)
>       at org.apache.spark.sql.execution.datasources.v2.OverwriteByExpressionExec.writeWithV2(WriteToDataSourceV2Exec.scala:266)
>       at org.apache.spark.sql.execution.datasources.v2.V2ExistingTableWriteExec.run(WriteToDataSourceV2Exec.scala:360)
>       at org.apache.spark.sql.execution.datasources.v2.V2ExistingTableWriteExec.run$(WriteToDataSourceV2Exec.scala:359)
>       at org.apache.spark.sql.execution.datasources.v2.OverwriteByExpressionExec.run(WriteToDataSourceV2Exec.scala:266)
>       at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
>       at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
>       at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
>       at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98)
>       at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:118)
>       at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:195)
>       at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:103)
>       at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
>       at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
>       at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
>       at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:94)
>       at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
>       at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
>       at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
>       at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:31)
>       at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
>       at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
>       at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
>       at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
>       at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:488)
>       at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:94)
>       at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:81)
>       at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:79)
>       at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:133)
>       at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:856)
>       at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:318)
>       at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:247)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>       at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
>       at py4j.Gateway.invoke(Gateway.java:282)
>       at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>       at py4j.commands.CallCommand.execute(CallCommand.java:79)
>       at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
>       at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
>       at java.lang.Thread.run(Thread.java:750)
> Caused by: java.lang.NullPointerException
>       at org.apache.spark.util.io.ChunkedByteBuffer.$anonfun$getChunks$1(ChunkedByteBuffer.scala:181)
>       at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
>       at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
>       at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
>       at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
>       at scala.collection.TraversableLike.map(TraversableLike.scala:286)
>       at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
>       at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
>       at org.apache.spark.util.io.ChunkedByteBuffer.getChunks(ChunkedByteBuffer.scala:181)
>       at org.apache.spark.util.io.ChunkedByteBufferInputStream.<init>(ChunkedByteBuffer.scala:278)
>       at org.apache.spark.util.io.ChunkedByteBuffer.toInputStream(ChunkedByteBuffer.scala:174)
>       at org.apache.spark.sql.execution.SparkPlan.decodeUnsafeRows(SparkPlan.scala:409)
>       at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeCollectIterator$2(SparkPlan.scala:457)
>       at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
>       at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
>       at org.apache.spark.sql.execution.joins.HashedRelation$.apply(HashedRelation.scala:153)
>       at org.apache.spark.sql.execution.joins.HashedRelationBroadcastMode.transform(HashedRelation.scala:1163)
>       at org.apache.spark.sql.execution.joins.HashedRelationBroadcastMode.transform(HashedRelation.scala:1151)
>       at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:148)
>       at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:217)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       ... 1 more{noformat}
> 
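The resolution attributes this NullPointerException to a Java version mismatch between the driver (client mode) and the executors (pre-built Docker image). When diagnosing such a mismatch, one wrinkle is that `java.version` strings changed format with JEP 223: Java 8 and earlier report a legacy `1.8.0_292` style, while Java 9+ report `11.0.15` or `17`. A minimal stdlib-only sketch of normalizing both styles for comparison (the helper name and the version values are hypothetical):

```python
def java_feature_release(version: str) -> int:
    """Extract the Java feature release number from a java.version string.

    Pre-Java 9 strings look like "1.8.0_292" (feature release 8);
    Java 9+ strings look like "11.0.15" or "17" (feature release first).
    """
    parts = version.split(".")
    if parts[0] == "1":  # legacy "1.x" scheme, used through Java 8
        return int(parts[1])
    return int(parts[0])

# Example: flag a driver/executor mismatch like the one in this issue.
driver, executor = "17.0.2", "1.8.0_292"  # hypothetical versions
if java_feature_release(driver) != java_feature_release(executor):
    print(f"JVM mismatch: driver={driver}, executor={executor}")
# → JVM mismatch: driver=17.0.2, executor=1.8.0_292
```

In practice the two strings would come from `java -version` (or the `java.version` system property) inside the driver process and the executor image respectively; comparing only the feature release avoids false alarms on patch-level differences.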



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
