JNSimba opened a new pull request, #63618:
URL: https://github.com/apache/doris/pull/63618

   ### What problem does this PR solve?
   
   Problem Summary:
   
   When a Postgres CDC streaming job ingests rows whose timestamp / date 
columns hold historical values (pre-1970 with sub-millisecond precision, or 
pre-1582 / pre-1901 dates), two independent bugs in cdc-client cause data 
corruption or a task crash:
   
   1. `DebeziumJsonDeserializer.convertTimestamp` uses signed `/` and `%` on 
negative `micros` / `nanos`, producing a negative `nanoOfMillisecond` and 
tripping Flink `TimestampData`'s `checkArgument(nanoOfMillisecond >= 0)`. 
Result: the ingestion task crashes whenever a pre-1970 timestamp with 
sub-millisecond precision flows through (e.g. `1969-12-31 23:59:59.999123`).
   
   2. The snapshot path reads column values via `rs.getObject()`, which routes 
through PG JDBC's `TimestampUtils` + `GregorianCalendar`. For pre-1582 
timestamps the Julian-to-Gregorian cutover in `GregorianCalendar` shifts 
values by N days; for pre-1901 timestamps the JVM time zone's LMT offset 
shifts values by the LMT difference (e.g. ~343s in `Asia/Shanghai`). Result: 
the same PG value (e.g. `0001-01-01 00:00:00`) yields different Doris values 
depending on whether the row was synced via snapshot or via binlog.
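   
   The two failure modes can be reproduced in isolation with plain JDK 
classes. The sketch below is illustrative only: `HistoricalValueDemo` and its 
helper names are hypothetical and are not taken from cdc-client, and the 
calendar part only shows the 1582 cutover, not the LMT offset.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Hypothetical demo class; not part of cdc-client.
final class HistoricalValueDemo {

    // Microseconds since the epoch for a wall-clock value interpreted as UTC.
    static long toEpochMicros(LocalDateTime utc) {
        return utc.toEpochSecond(ZoneOffset.UTC) * 1_000_000L + utc.getNano() / 1_000;
    }

    // Truncating split: `/` and `%` round toward zero, so a negative input
    // produces a negative remainder.
    static long[] truncatingSplit(long micros) {
        return new long[] {micros / 1_000, (micros % 1_000) * 1_000};
    }

    // Day of month reached by adding one day to 1582-10-04 in
    // GregorianCalendar, which switches from Julian to Gregorian rules there.
    static int dayAfterCutoverEve() {
        GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.clear();
        cal.set(1582, Calendar.OCTOBER, 4);
        cal.add(Calendar.DAY_OF_MONTH, 1);
        return cal.get(Calendar.DAY_OF_MONTH);
    }

    public static void main(String[] args) {
        long micros = toEpochMicros(LocalDateTime.of(1969, 12, 31, 23, 59, 59, 999_123_000));
        long[] parts = truncatingSplit(micros);
        System.out.println("micros=" + micros
                + " millis=" + parts[0] + " nanoOfMilli=" + parts[1]);

        System.out.println("GregorianCalendar: 1582-10-04 + 1 day -> day "
                + dayAfterCutoverEve());
        System.out.println("LocalDate:         1582-10-04 + 1 day -> "
                + LocalDate.of(1582, 10, 4).plusDays(1));
    }
}
```

   For `1969-12-31 23:59:59.999123` the truncating split yields a negative 
nanosecond part, which is the kind of value a `nanoOfMillisecond >= 0` check 
rejects. The calendar lines show that `GregorianCalendar` and `java.time` 
disagree about which dates exist around October 1582.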
   
   This PR fixes both:
   
   1. Use `Math.floorDiv` / `Math.floorMod` so the millisecond / nanosecond 
split stays valid for negative epoch values.
   2. Dispatch `TIMESTAMP` / `TIMESTAMPTZ` / `DATE` columns through 
`LocalDateTime` / `OffsetDateTime` / `LocalDate` in the snapshot reader, 
bypassing `GregorianCalendar` entirely. Preserve the legacy 
`Timestamp(Long.MAX/MIN_VALUE)` sentinel for `+/-infinity`.
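   
   A minimal sketch of both ideas, under stated assumptions: the class and 
method names are hypothetical, the dispatch keys on the Postgres type names 
`timestamp` / `timestamptz` / `date` as an assumed way to identify the column 
type, and the `+/-infinity` sentinel handling is omitted. It is not the code 
from this PR.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;

// Hypothetical sketch class; not part of cdc-client.
final class HistoricalValueFixSketch {

    // Floor-based split of epoch microseconds into {millis, nanoOfMilli}.
    // floorMod never returns a negative value for a positive divisor, so
    // nanoOfMilli stays in [0, 999_000] even for pre-1970 inputs.
    static long[] splitMicros(long micros) {
        long millis = Math.floorDiv(micros, 1_000L);
        long nanoOfMilli = Math.floorMod(micros, 1_000L) * 1_000L;
        return new long[] {millis, nanoOfMilli};
    }

    // Same split for epoch nanoseconds; nanoOfMilli is in [0, 999_999].
    static long[] splitNanos(long nanos) {
        return new long[] {
            Math.floorDiv(nanos, 1_000_000L),
            Math.floorMod(nanos, 1_000_000L)
        };
    }

    // Assumed snapshot-side helper: request java.time types through the
    // JDBC 4.2 getObject(int, Class) overload instead of the untyped
    // getObject(int), so no java.util.Calendar conversion is involved.
    static Object readTemporal(ResultSet rs, int column, String pgTypeName)
            throws SQLException {
        switch (pgTypeName) {
            case "timestamp":
                return rs.getObject(column, LocalDateTime.class);
            case "timestamptz":
                return rs.getObject(column, OffsetDateTime.class);
            case "date":
                return rs.getObject(column, LocalDate.class);
            default:
                return rs.getObject(column);
        }
    }

    public static void main(String[] args) {
        // 1969-12-31 23:59:59.999123 UTC is 877 microseconds before the epoch.
        long[] parts = splitMicros(-877L);
        System.out.println("millis=" + parts[0] + " nanoOfMilli=" + parts[1]);
    }
}
```

   With the floor-based split, `-877` microseconds becomes `-1` ms plus 
`123000` ns, which still represents the same instant while keeping the 
nanosecond part non-negative.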
   
   ### Release note
   
   Fix Postgres CDC streaming job ingestion crash and value drift for 
historical-date timestamp / date columns.
   
   ### Check List (For Author)
   
   - Test
       - [x] Regression test
       - [x] Unit Test
   
   - Behavior changed:
       - [x] No.
   
   - Does this need documentation?
       - [x] No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

