myfjdthink opened a new issue, #49:
URL: https://github.com/apache/doris-spark-connector/issues/49

   ### Search before asking
   
   - [X] I had searched in the 
[issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and 
found no similar issues.
   
   
   ### Version
   
   1.1.1
   
   ### What's Wrong?
   
   Read a table from Doris:
   ```py
   df_origin = spark.read.format("doris")\
       .option("doris.table.identifier", "db.nft_transactions")\
       .option("doris.fenodes", "")\
       .option("user", "root")\
       .option("password", "password") \
       .option("doris.filter.query", "block_timestamp >= '2022-06-01' and block_timestamp < '2022-06-03'") \
       .option("doris.read.field", "block_timestamp,marketplace_slug") \
       .option("doris.batch.size", 40000) \
       .load() 
   df_origin.show()
   ```
   output:
   
   +----------------+-------------------+
   |marketplace_slug|    block_timestamp|
   +----------------+-------------------+
   |      aavegotchi|2022-06-01 00:15:02|
   |      aavegotchi|2022-06-01 00:15:14|
   |      aavegotchi|2022-06-01 00:15:26|
   |      aavegotchi|2022-06-01 00:15:38|
   |      aavegotchi|2022-06-01 00:18:50|
   |      aavegotchi|2022-06-01 00:20:26|
   |      aavegotchi|2022-06-01 00:21:10|
   
   doris connector log
   
   > 22/08/25 03:08:47 DEBUG org.apache.doris.spark.sql.ScalaDorisRowRDD: Query 
SQL Sending to Doris FE is: 'select `marketplace_slug`,`block_timestamp` from 
`db`.`nft_transactions` where block_timestamp >= '2022-06-01' and 
block_timestamp < '2022-06-03''.
   
   Building on this, we apply an additional where filter:
   ```python
   df_origin.where("marketplace_slug = 'opensea'").show()
   ```
   The Doris connector pushes this where condition down to Doris.
   
   doris connector log
   
   > Query SQL Sending to Doris FE is: 'select 
`marketplace_slug`,`block_timestamp` from 
`gaia_data__origin_data`.`nft_transactions` where (`marketplace_slug` is not 
null) and (`marketplace_slug` = 'opensea')'.
   
   As the log shows, the condition pushed down to Doris drops the earlier block_timestamp filter, so the final query result is:
   
   +----------------+-------------------+
   |marketplace_slug|    block_timestamp|
   +----------------+-------------------+
   |         opensea|2019-08-01 00:06:57|
   |         opensea|2019-08-01 00:17:42|
   |         opensea|2019-08-01 00:19:20|
   |         opensea|2019-08-01 00:38:02|
   |         opensea|2019-08-01 00:47:38|
   |         opensea|2019-08-01 00:59:39|
   
   The result contains rows from dates we did not expect.
   
   
   ### What You Expected?
   
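   1. The where pushdown should be combined with the doris.filter.query condition. This requires parsing SQL, which is somewhat troublesome.
   2. Provide an option to disable where pushdown, so that Spark performs the where filter itself.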
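
As a rough illustration of option 1, the sketch below ANDs the raw `doris.filter.query` string with the pushed-down predicates by treating the raw string as one opaque, parenthesized expression. The function name `combine_where` is hypothetical and is not part of the connector. The sketch assumes the raw filter is a well-formed boolean expression (balanced parentheses, no trailing comment), in which case full SQL parsing may not be needed.

```python
from typing import Iterable, Optional


def combine_where(filter_query: Optional[str], pushed_filters: Iterable[str]) -> str:
    """AND a raw doris.filter.query string with pushed-down predicates.

    Hypothetical helper, not connector code. Every part is wrapped in
    parentheses so that an `or` inside one part cannot change the meaning
    of the others.
    """
    parts = []
    # Treat the user-supplied filter as a single opaque expression.
    if filter_query and filter_query.strip():
        parts.append(f"({filter_query.strip()})")
    # Append each predicate that Spark pushed down.
    parts.extend(f"({p.strip()})" for p in pushed_filters if p.strip())
    return " and ".join(parts)


where_clause = combine_where(
    "block_timestamp >= '2022-06-01' and block_timestamp < '2022-06-03'",
    ["`marketplace_slug` is not null", "`marketplace_slug` = 'opensea'"],
)
print(where_clause)
```

Until the connector does something along these lines, a string built this way could presumably be passed by hand as `doris.filter.query` in place of a later `.where(...)` call, since the first log above shows that option being sent to Doris unchanged.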
   
   ### How to Reproduce?
   
   See above.
   
   ### Anything Else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

