anshul98ks123 opened a new pull request, #18495:
URL: https://github.com/apache/pinot/pull/18495
## Issue
`AbstractIndexType#convertToNewFormat` (called via
`TableConfigUtils.createTableConfigFromOldFormat`) unconditionally overwrites
every `FieldConfig.indexes[prettyName]` with the verbose typed-POJO unwrap —
even on columns the user already supplied in new-format slim shape.
Repro: user posts `{"forward":{"compressionCodec":"SNAPPY"}}`. The
deserializer fills missing primitive `@JsonCreator` params with `0`/`false`.
`configValue.toJsonNode()` re-serializes via the bean serializer and emits
every key. `currentIndexes.set(prettyName, …)` then clobbers the user's slim
shape with a verbose 7-key blob:
```json
{
"disabled": false,
"encodingType": "DICTIONARY",
"compressionCodec": "SNAPPY",
"deriveNumDocsPerChunk": false,
"rawIndexWriterVersion": 4,
"targetMaxChunkSize": "1MB",
"targetDocsPerChunk": 1000
}
```
This is load-bearing for downstream surfaces that round-trip user-supplied
`FieldConfig.indexes` through the migration step and back to the client / ZK —
the slim shape never survives.
## Fix
Make the write **gap-filling**: when the column already carries a JsonNode
at `FieldConfig.indexes[prettyName]`, preserve it verbatim. Pure-legacy inputs
still translate to new format unchanged.
```diff
- ObjectNode currentIndexes = fieldConfig.getIndexes().isNull()
- ? new ObjectMapper().createObjectNode()
- : new ObjectMapper().valueToTree(fieldConfig.getIndexes());
- currentIndexes.set(getPrettyName(), configValue.toJsonNode());
+ ObjectNode currentIndexes = fieldConfig.getIndexes().isNull()
+ ? MAPPER.createObjectNode()
+ : MAPPER.valueToTree(fieldConfig.getIndexes());
+ JsonNode existing = currentIndexes.get(getPrettyName());
+ if (existing != null && !existing.isNull()) {
+ // Column already carries a new-format JsonNode for this index type
— preserve the
+ // user's shape verbatim. Legacy-only inputs fall through to the
set() branch below.
+ continue;
+ }
+ currentIndexes.set(getPrettyName(), configValue.toJsonNode());
```
`ObjectMapper` is also hoisted to `private static final` (was allocated
per-column).
## Behavior matrix
| Case | Before | After |
|---|---|---|
| Pure legacy (e.g. `bloomFilterColumns=[c1]`) | Translated to new format
(verbose). | Same — falls through to the `set()` branch. |
| Pure new-format slim (e.g. `{"forward":{"compressionCodec":"SNAPPY"}}`) |
**Overwritten** with verbose typed-POJO unwrap. | Preserved verbatim. |
| Both formats, different index types (e.g. slim `bloom` + legacy
`noDictionaryColumns`) | Both translate; legacy fattens its own entry. | Both
translate independently; user's slim shape stays slim. |
| Both formats, same column + same index type |
`ConfigDeclaredTwiceException` (merged deserializer). | Same — gap-fill runs
after the deserializer, so the throw still fires. |
| Cross-type inference (e.g. inverted → dictionary auto-create) | Verbose
written. | Verbose written (key was absent). Same. |
No SPI signature changes. No config-key renames. Strictly more conservative
than today.
## Testing
New test class `ConvertToNewFormatPreservesSlimTest` (21 tests) pins the
full contract:
- **Pure new-format preservation:** `forward`/`bloom` slim shapes preserved;
explicit value equal to today's default preserved; explicit `false` preserved;
empty `{}` object preserved.
- **Pure-legacy translation (back-compat):** `bloomFilterColumns`,
`noDictionaryColumns`, `invertedIndexColumns` all still translate.
- **Both-formats coexistence:** different index types coexist with each
shape independent; same column + same type still raises
`ConfigDeclaredTwiceException`.
- **Idempotency:** migration twice == migration once, for both new-format
and legacy inputs.
- **Multi-column independence:** slim/legacy/new-column scenarios on the
same `TableConfig`.
- **Multiple index types on one column:** `forward` + `bloom` + `inverted`
all preserved.
- **Legacy cleanup:** `indexingConfig.*` fields cleared post-migration even
when user slim shape is preserved.
- **Default-config skip:** user supplies a config equal to type default →
skipped (`continue` branch); JsonNode preserved.
- **Cross-type coverage:** `range`, `text`, `json` slim shapes preserved
(each subclass shares the base codepath).
- **End-to-end Jackson round-trip:** serialize → deserialize the migrated
`TableConfig` and confirm the slim shape survives.
Adjacent test suites still pass: `TableConfigUtilsTest` (56),
`DictionaryIndexConfigTest` (10), `IndexServiceTest` (1), `VectorIndexTest`,
`H3IndexTest`, `JsonIndexTest`, `RangeIndexTest`, `TextIndexTest`,
`FstIndexTest`, `ForwardIndexTypeTest`, `DictionaryIndexTypeTest`,
`BloomFilterTypeTest`, `InvertedIndexTypeTest`, `NullValueIndexTest` — 120
across the index-type sweep.
`./mvnw spotless:apply checkstyle:check license:format license:check -pl
pinot-segment-spi,pinot-segment-local` clean.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]