dataroaring commented on PR #60116:
URL: https://github.com/apache/doris/pull/60116#issuecomment-4124008983

   ## Test Coverage Analysis
   
   The regression tests validate the golden-path end-to-end, which is valuable 
for integration confidence. However, the error surface is largely untested. 
Here's a detailed mapping of code paths vs. test coverage.
   
   ### What's Covered
   
   | # | Scenario | Verdict |
   |---|----------|---------|
   | 1 | MySQL: read from specific binlog offset (file+pos) | **Covered** — 
asserts C1/D1 rows |
   | 2 | MySQL: reject `initial` offset | **Covered** |
   | 3 | PostgreSQL: reject `initial` offset | **Covered** |
   | 4 | PostgreSQL: `latest` offset executes | **Weak** — result logged but 
never asserted |
   
   ### Gap 1: Zero validation error tests
   
   `CdcStreamTableValuedFunction.validate()` has 4 required-field checks. None 
are tested:
   
   ```java
   if (!properties.containsKey(DataSourceConfigKeys.JDBC_URL))   // ❌ not tested
   if (!properties.containsKey(DataSourceConfigKeys.TYPE))        // ❌ not 
tested
   if (!properties.containsKey(DataSourceConfigKeys.TABLE))       // ❌ not 
tested
   if (!properties.containsKey(DataSourceConfigKeys.OFFSET))      // ❌ not 
tested
   ```
   
   Suggest adding at least:
   ```groovy
   test {
       sql """select * from cdc_stream("type" = "mysql")"""
       exception "jdbc_url is required"
   }
   ```
   
   Additionally, `database` is **not validated** but **required downstream** by 
`getRemoteDbName()` — a missing `database` produces a confusing NPE instead of 
a clear error. Both validation and test are missing.
   
   ### Gap 2: Only 1 of 5 offset modes tested for MySQL
   
   | Offset Mode | MySQL | PostgreSQL |
   |-------------|-------|------------|
   | `initial` | ✅ (error) | ✅ (error) |
   | `earliest` | ❌ | ❌ |
   | `latest` | ❌ | ⚠️ (no assertion) |
   | JSON `{file, pos}` | ✅ | ❌ |
   | Malformed/invalid string | ❌ | ❌ |
   
   `earliest` goes through a completely different code path and should have at 
least one test.
   
   ### Gap 3: PostgreSQL result never asserted
   
   ```groovy
   def result = sql """select * from cdc_stream(... "offset" = 'latest')"""
   log.info("result:", result)   // logged but NEVER asserted
   ```
   
   This only verifies "no exception thrown." Even if PG slot creation makes 
exact offset-based tests difficult, the result should at minimum be checked for 
non-null or expected shape.
   
   ### Gap 4: Only INSERT tested — no UPDATE/DELETE
   
   Both tests only insert rows. CDC fundamentally captures INSERT, UPDATE, and 
DELETE. The Debezium deserialization path for `Envelope.Operation.UPDATE` and 
`DELETE` is completely untested through the TVF.
   
   ### Gap 5: No data type coverage
   
   Test tables use only `varchar` + `int`. Not tested:
   - `DATETIME`/`TIMESTAMP` — CDC timestamp handling is notoriously tricky
   - `DECIMAL`/`NUMERIC` — precision loss risk
   - `NULL` values
   
   Compare with the existing `test_streaming_mysql_job_all_type.groovy` which 
has comprehensive type coverage.
   
   ### Gap 6: No error handling tests for connectivity/auth failures
   
   | Error Scenario | Code Path | Test |
   |---|---|---|
   | Unreachable JDBC URL | `getTableColumns()` → connection failure | ❌ |
   | Wrong credentials | `getTableColumns()` → auth failure | ❌ |
   | Non-existent table | `throw new AnalysisException("Table does not exist")` 
| ❌ |
   | Invalid `type` (e.g., "oracle") | `DataSourceType.valueOf()` → 
IllegalArgumentException | ❌ |
   
   ### Gap 7: No unit tests for new utility methods
   
   | Method | Branches | Unit Test |
   |---|---|---|
   | `ConfigUtil.getTableList()` | 4 (include_tables / single table / table 
with comma / neither) | ❌ |
   | `ConfigUtil.getServerId(String)` | signature changed from `long` to 
`String` | ❌ |
   | `PipelineCoordinator.isLong()` | critical TVF-vs-job discriminator | ❌ |
   | `PipelineCoordinator.generateMeta()` | 3 branches (latest/earliest, JSON, 
unsupported) | ❌ |
   
   ### Summary
   
   | Category | Covered / Total | Coverage |
   |---|---|---|
   | Happy path (MySQL offset modes) | 1 / 5 | ~20% |
   | Happy path (PostgreSQL offset modes) | 0 / 4 (1 runs but unasserted) | ~0% 
|
   | Validation errors | 0 / 5 | 0% |
   | DML operations (INSERT/UPDATE/DELETE) | 1 / 3 | 33% |
   | Data types | 1 basic / many | ~10% |
   | Connectivity/auth errors | 0 / 4 | 0% |
   | New utility method unit tests | 0 / 4 methods | 0% |
   
   For a feature that connects to external databases over JDBC and streams data 
through HTTP, the error paths are arguably more important to test than the 
happy paths — they're what users encounter when something goes wrong.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to