chuang-wang-pre opened a new issue, #314:
URL: https://github.com/apache/doris-spark-connector/issues/314

   ### Search before asking
   
   - [x] I had searched in the [issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Version
   
   spark-doris-connector: 25.0.1
   doris: 3.0.0
   spark: 3.0.1
   
   
   ### What's Wrong?
   
   ```
   import org.apache.spark.sql.SparkSession
   import org.apache.spark.sql.functions.{col, lit}

   // feNodes, user, password, selectedColumns and log are defined elsewhere
   // in the job and were not included in this snippet.
   val dorisTableIdentifier = "doris_db.doris_table"
   val hiveTableName = "hive_db.hive_table"
   val timeColumn = "ctime"
   val selectedColumnsStr = args(5).trim
   val startTime = "2025-05-06 00:00:00"
   val endTime = "2025-05-07 00:00:00"

   val appName = s"doris-to-hive-$hiveTableName"
   val spark = SparkSession.builder()
     .appName(appName)
     .enableHiveSupport()
     .getOrCreate()

   // 1. read data from doris
   val dorisDF = spark.read
     .format("doris")
     .option("doris.fenodes", feNodes)
     .option("doris.table.identifier", dorisTableIdentifier)
     .option("user", user)
     .option("password", password)
     .load()
     .filter(col(timeColumn) >= lit(startTime) && col(timeColumn) < lit(endTime)) // limit timespan
     .select(selectedColumns.map(col): _*) // select columns

   // The DataFrame is not cached, so each count() re-runs the read from Doris.
   log.info("doris data count: {}", dorisDF.count())

   Thread.sleep(1000)
   log.info("doris data count: {}", dorisDF.count())

   Thread.sleep(5000)
   log.info("doris data count: {}", dorisDF.count())

   dorisDF.createOrReplaceTempView("doris_data_detail")

   // 2. write to hive
   val insertSql =
     s"""
        |INSERT OVERWRITE TABLE $hiveTableName PARTITION (pt='20250410000000')
        |SELECT
        |$selectedColumnsStr
        |FROM doris_data_detail
        |""".stripMargin
   log.info("insert hive sql: {}", insertSql)
   spark.sql(insertSql)

   spark.stop()
   ```
   I used this code to copy data from Doris to Hive and found that the Hive table contained fewer rows than the Doris table, so I added logging to record the row count of the DataFrame. The same DataFrame returns a different count on each call. The log is as follows:
   ```
   25/05/07 19:43:56 INFO Doris2HiveTask$: doris data count: 68684
   25/05/07 19:43:59 INFO Doris2HiveTask$: doris data count: 97918
   25/05/07 19:44:05 INFO Doris2HiveTask$: doris data count: 99903
   ```
   The row count in Doris:
   
   <img width="688" alt="Image" src="https://github.com/user-attachments/assets/3e01790b-bc10-4f76-bfe9-13cb1bcc3555" />
   Why did this happen? Is this a bug?
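
   For anyone trying to reproduce this, the repeated-count check can be factored into a small helper. This is only a minimal sketch in plain Scala with no Spark dependency: `CountStability`, `sample` and `deltas` are hypothetical names, and the iterator in `main` merely replays the three logged counts as a stand-in for `() => dorisDF.count()`.

```scala
// Hypothetical helper for illustration; not part of the connector or of the original job.
object CountStability {

  /** Calls `count` `runs` times, pausing `pauseMs` between calls, and returns the results in order. */
  def sample(runs: Int, pauseMs: Long)(count: () => Long): Seq[Long] =
    (0 until runs).map { i =>
      if (i > 0) Thread.sleep(pauseMs)
      count()
    }

  /** Differences between consecutive samples; all zeros means the count was stable. */
  def deltas(samples: Seq[Long]): Seq[Long] =
    samples.zip(samples.drop(1)).map { case (a, b) => b - a }

  def main(args: Array[String]): Unit = {
    // Stand-in for () => dorisDF.count(): replays the counts from the log above.
    val logged = Iterator(68684L, 97918L, 99903L)
    val samples = sample(runs = 3, pauseMs = 0L)(() => logged.next())
    println(s"samples=$samples deltas=${deltas(samples)}")
  }
}
```

   In the real job the call would look like `CountStability.sample(3, 1000L)(() => dorisDF.count())`. With the logged values the consecutive differences are 29234 and 1985 rows, so the count was still rising between runs.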
   
   
   ### What You Expected?
   
   An explanation of why repeated counts on the same DataFrame return different values.
   
   ### How to Reproduce?
   
   _No response_
   
   ### Anything Else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

