This is expected and is caused by lazy evaluation in Spark. A spatial join
result is only fully materialized when you write it to external storage or
when another operator consumes it (e.g., count). Until then, the query has
not finished its computation, so the time you measure for the write also
includes the join itself. For example, running ".show(20)" on a spatial
join result is significantly faster because it only computes the first 20
rows of the result.
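
A minimal sketch of how you could observe this yourself. It uses an ordinary
equi-join on generated ids as a stand-in for the spatial join, so the data,
the object name LazyJoinTiming and the helper timeMs are hypothetical and
not taken from your job. The "noop" sink assumes Spark 3.0 or later; it runs
the whole query but discards the rows, so its time is roughly the cost of
the computation without any csv or jdbc output.

```scala
import org.apache.spark.sql.SparkSession

object LazyJoinTiming {
  // Hypothetical helper: run a block and return elapsed wall-clock milliseconds.
  def timeMs[A](block: => A): Long = {
    val start = System.nanoTime()
    block
    (System.nanoTime() - start) / 1000000L
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("lazy-join-timing")
      .master("local[*]")
      .getOrCreate()

    // Stand-in inputs (assumption): two generated id columns instead of spatial layers.
    val left = spark.range(0L, 1000000L).withColumnRenamed("id", "l_gid")
    val right = spark.range(0L, 1000000L).withColumnRenamed("id", "b_gid")

    // Defining the join only builds a query plan; no rows are computed here.
    val t0 = System.nanoTime()
    val dfContain = left.join(right, left("l_gid") === right("b_gid"))
    val defineMs = (System.nanoTime() - t0) / 1000000L

    // show(20) only needs enough work to produce 20 rows.
    val showMs = timeMs { dfContain.show(20) }

    // count() consumes the whole result, so the full join runs.
    val countMs = timeMs { dfContain.count() }

    // The noop sink (Spark 3.0+) executes the full query and discards the output.
    val noopMs = timeMs { dfContain.write.format("noop").mode("overwrite").save() }

    println(s"define=${defineMs}ms show=${showMs}ms count=${countMs}ms noop=${noopMs}ms")
    spark.stop()
  }
}
```

If the noop time is close to what you see for csv or jdbc, most of the time
is spent in the join rather than in the output format.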



On Thu, Sep 1, 2022 at 8:16 PM 可为 <[email protected]> wrote:

> Hello everyone,
> I have a question to consult. After making a spatial
> join of two layers, I output the result using csv format or jdbc.
> //csv format
> dfContain.write.format("csv").save("file:///F:/data/result")
> //jdbc format
> dfContain.write
>   .mode("append")
>   .option("createTableColumnTypes", "l_gid integer, b_gid integer")
>   .jdbc(jdbcUrl, "result", prop)
>
>
> I found that it takes longer to output the result than to compute it.
> Why? Which method can output faster, other than csv or jdbc? Any suggestions
> will be appreciated.
