This is normal and is caused by lazy evaluation in Spark. A spatial join result is only fully materialized when you write it to external storage or when it is consumed by another operator that needs every row (e.g., count). Until then, the query has not actually finished computing, so the time you attribute to writing the output includes most of the join itself. For example, running ".show(20)" on a spatial join result is significantly faster because Spark only computes enough of the result to return the first 20 rows.
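
If you want to see the two costs separately, something along these lines may help. It is only a rough sketch: it assumes dfContain is the join result from your snippet, and timed, measureJoinAndWrite and outputPath are hypothetical names I made up for illustration. The idea is to force the join with count() on a persisted DataFrame first, then time the write on its own.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.storage.StorageLevel

// Hypothetical helper: runs a block and prints how long it took.
def timed[T](label: String)(block: => T): T = {
  val start = System.nanoTime()
  val result = block
  println(f"$label: ${(System.nanoTime() - start) / 1e9}%.1f s")
  result
}

// Hypothetical wrapper; dfContain is assumed to be the spatial join result.
def measureJoinAndWrite(dfContain: DataFrame, outputPath: String): Unit = {
  // persist() is lazy as well; count() is the action that actually runs the join.
  val joined = dfContain.persist(StorageLevel.MEMORY_AND_DISK)
  timed("spatial join (compute)") { joined.count() }

  // The join result is now cached, so this mostly measures the write itself.
  timed("write (csv)") { joined.write.format("csv").save(outputPath) }

  joined.unpersist()
}
```

You would call it as measureJoinAndWrite(dfContain, "file:///F:/data/result"). Caching the full result costs memory and possibly disk, so this is meant for diagnosis rather than as a permanent part of the job.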

On Thu, Sep 1, 2022 at 8:16 PM 可为 <[email protected]> wrote:
> Hello everyone,
> I have a question. After performing a spatial join of two layers, I output
> the result using the csv format or jdbc.
>
> //csv format
> dfContain.write.format("csv").save("file:///F:/data/result")
>
> //jdbc format
> dfContain.write
>   .mode("append")
>   .option("createTableColumnTypes", "l_gid integer, b_gid integer")
>   .jdbc(jdbcUrl, "result", prop)
>
> I found that it takes longer to output the result than to compute it.
> Why? Which method can output faster, other than csv or jdbc? Any suggestions
> will be appreciated.
