dataroaring commented on PR #60116:
URL: https://github.com/apache/doris/pull/60116#issuecomment-4124008983
## Test Coverage Analysis
The regression tests validate the golden-path end-to-end, which is valuable
for integration confidence. However, the error surface is largely untested.
Here's a detailed mapping of code paths vs. test coverage.
### What's Covered
| # | Scenario | Verdict |
|---|----------|---------|
| 1 | MySQL: read from specific binlog offset (file+pos) | **Covered** —
asserts C1/D1 rows |
| 2 | MySQL: reject `initial` offset | **Covered** |
| 3 | PostgreSQL: reject `initial` offset | **Covered** |
| 4 | PostgreSQL: `latest` offset executes | **Weak** — result logged but
never asserted |
### Gap 1: Zero validation error tests
`CdcStreamTableValuedFunction.validate()` has 4 required-field checks. None
are tested:
```java
if (!properties.containsKey(DataSourceConfigKeys.JDBC_URL)) // ❌ not tested
if (!properties.containsKey(DataSourceConfigKeys.TYPE)) // ❌ not
tested
if (!properties.containsKey(DataSourceConfigKeys.TABLE)) // ❌ not
tested
if (!properties.containsKey(DataSourceConfigKeys.OFFSET)) // ❌ not
tested
```
Suggest adding at least:
```groovy
test {
sql """select * from cdc_stream("type" = "mysql")"""
exception "jdbc_url is required"
}
```
Additionally, `database` is **not validated** but **required downstream** by
`getRemoteDbName()` — a missing `database` produces a confusing NPE instead of
a clear error. Both validation and test are missing.
### Gap 2: Only 1 of 5 offset modes tested for MySQL
| Offset Mode | MySQL | PostgreSQL |
|-------------|-------|------------|
| `initial` | ✅ (error) | ✅ (error) |
| `earliest` | ❌ | ❌ |
| `latest` | ❌ | ⚠️ (no assertion) |
| JSON `{file, pos}` | ✅ | ❌ |
| Malformed/invalid string | ❌ | ❌ |
`earliest` goes through a completely different code path and should have at
least one test.
### Gap 3: PostgreSQL result never asserted
```groovy
def result = sql """select * from cdc_stream(... "offset" = 'latest')"""
log.info("result:", result) // logged but NEVER asserted
```
This only verifies "no exception thrown." Even if PG slot creation makes
exact offset-based tests difficult, the result should at minimum be checked for
non-null or expected shape.
### Gap 4: Only INSERT tested — no UPDATE/DELETE
Both tests only insert rows. CDC fundamentally captures INSERT, UPDATE, and
DELETE. The Debezium deserialization path for `Envelope.Operation.UPDATE` and
`DELETE` is completely untested through the TVF.
### Gap 5: No data type coverage
Test tables use only `varchar` + `int`. Not tested:
- `DATETIME`/`TIMESTAMP` — CDC timestamp handling is notoriously tricky
- `DECIMAL`/`NUMERIC` — precision loss risk
- `NULL` values
Compare with the existing `test_streaming_mysql_job_all_type.groovy` which
has comprehensive type coverage.
### Gap 6: No error handling tests for connectivity/auth failures
| Error Scenario | Code Path | Test |
|---|---|---|
| Unreachable JDBC URL | `getTableColumns()` → connection failure | ❌ |
| Wrong credentials | `getTableColumns()` → auth failure | ❌ |
| Non-existent table | `throw new AnalysisException("Table does not exist")`
| ❌ |
| Invalid `type` (e.g., "oracle") | `DataSourceType.valueOf()` →
IllegalArgumentException | ❌ |
### Gap 7: No unit tests for new utility methods
| Method | Branches | Unit Test |
|---|---|---|
| `ConfigUtil.getTableList()` | 4 (include_tables / single table / table
with comma / neither) | ❌ |
| `ConfigUtil.getServerId(String)` | signature changed from `long` to
`String` | ❌ |
| `PipelineCoordinator.isLong()` | critical TVF-vs-job discriminator | ❌ |
| `PipelineCoordinator.generateMeta()` | 3 branches (latest/earliest, JSON,
unsupported) | ❌ |
### Summary
| Category | Covered / Total | Coverage |
|---|---|---|
| Happy path (MySQL offset modes) | 1 / 5 | ~20% |
| Happy path (PostgreSQL offset modes) | 0 / 4 (1 runs but unasserted) | ~0%
|
| Validation errors | 0 / 5 | 0% |
| DML operations (INSERT/UPDATE/DELETE) | 1 / 3 | 33% |
| Data types | 1 basic / many | ~10% |
| Connectivity/auth errors | 0 / 4 | 0% |
| New utility method unit tests | 0 / 4 methods | 0% |
For a feature that connects to external databases over JDBC and streams data
through HTTP, the error paths are arguably more important to test than the
happy paths — they're what users encounter when something goes wrong.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]