anshul98ks123 opened a new pull request, #18495:
URL: https://github.com/apache/pinot/pull/18495

   ## Issue
   
   `AbstractIndexType#convertToNewFormat` (called via 
`TableConfigUtils.createTableConfigFromOldFormat`) unconditionally overwrites 
every `FieldConfig.indexes[prettyName]` with the verbose typed-POJO unwrap — 
even on columns the user already supplied in new-format slim shape.
   
   Repro: user posts `{"forward":{"compressionCodec":"SNAPPY"}}`. The 
deserializer fills missing primitive `@JsonCreator` params with `0`/`false`. 
`configValue.toJsonNode()` re-serializes via the bean serializer and emits 
every key. `currentIndexes.set(prettyName, …)` then clobbers the user's slim 
shape with a verbose 7-key blob:
   
   ```json
   {
     "disabled": false,
     "encodingType": "DICTIONARY",
     "compressionCodec": "SNAPPY",
     "deriveNumDocsPerChunk": false,
     "rawIndexWriterVersion": 4,
     "targetMaxChunkSize": "1MB",
     "targetDocsPerChunk": 1000
   }
   ```
   
   This is load-bearing for downstream surfaces that round-trip user-supplied 
`FieldConfig.indexes` through the migration step and back to the client / ZK — 
the slim shape never survives.
   
   ## Fix
   
   Make the write **gap-filling**: when the column already carries a JsonNode 
at `FieldConfig.indexes[prettyName]`, preserve it verbatim. Pure-legacy inputs 
still translate to new format unchanged.
   
   ```diff
   -      ObjectNode currentIndexes = fieldConfig.getIndexes().isNull()
   -          ? new ObjectMapper().createObjectNode()
   -          : new ObjectMapper().valueToTree(fieldConfig.getIndexes());
   -      currentIndexes.set(getPrettyName(), configValue.toJsonNode());
   +      ObjectNode currentIndexes = fieldConfig.getIndexes().isNull()
   +          ? MAPPER.createObjectNode()
   +          : MAPPER.valueToTree(fieldConfig.getIndexes());
   +      JsonNode existing = currentIndexes.get(getPrettyName());
   +      if (existing != null && !existing.isNull()) {
   +        // Column already carries a new-format JsonNode for this index type 
— preserve the
   +        // user's shape verbatim. Legacy-only inputs fall through to the 
set() branch below.
   +        continue;
   +      }
   +      currentIndexes.set(getPrettyName(), configValue.toJsonNode());
   ```
   
   `ObjectMapper` is also hoisted to `private static final` (was allocated 
per-column).
   
   ## Behavior matrix
   
   | Case | Before | After |
   |---|---|---|
   | Pure legacy (e.g. `bloomFilterColumns=[c1]`) | Translated to new format 
(verbose). | Same — falls through to the `set()` branch. |
   | Pure new-format slim (e.g. `{"forward":{"compressionCodec":"SNAPPY"}}`) | 
**Overwritten** with verbose typed-POJO unwrap. | Preserved verbatim. |
   | Both formats, different index types (e.g. slim `bloom` + legacy 
`noDictionaryColumns`) | Both translate; legacy fattens its own entry. | Both 
translate independently; user's slim shape stays slim. |
   | Both formats, same column + same index type | 
`ConfigDeclaredTwiceException` (merged deserializer). | Same — gap-fill runs 
after the deserializer, so the throw still fires. |
   | Cross-type inference (e.g. inverted → dictionary auto-create) | Verbose 
written. | Verbose written (key was absent). Same. |
   
   No SPI signature changes. No config-key renames. Strictly more conservative 
than today.
   
   ## Testing
   
   New test class `ConvertToNewFormatPreservesSlimTest` (21 tests) pins the 
full contract:
   
   - **Pure new-format preservation:** `forward`/`bloom` slim shapes preserved; 
explicit value equal to today's default preserved; explicit `false` preserved; 
empty `{}` object preserved.
   - **Pure-legacy translation (back-compat):** `bloomFilterColumns`, 
`noDictionaryColumns`, `invertedIndexColumns` all still translate.
   - **Both-formats coexistence:** different index types coexist with each 
shape independent; same column + same type still raises 
`ConfigDeclaredTwiceException`.
   - **Idempotency:** migration twice == migration once, for both new-format 
and legacy inputs.
   - **Multi-column independence:** slim/legacy/new-column scenarios on the 
same `TableConfig`.
   - **Multiple index types on one column:** `forward` + `bloom` + `inverted` 
all preserved.
   - **Legacy cleanup:** `indexingConfig.*` fields cleared post-migration even 
when user slim shape is preserved.
   - **Default-config skip:** user supplies a config equal to type default → 
skipped (`continue` branch); JsonNode preserved.
   - **Cross-type coverage:** `range`, `text`, `json` slim shapes preserved 
(each subclass shares the base codepath).
   - **End-to-end Jackson round-trip:** serialize → deserialize the migrated 
`TableConfig` and confirm the slim shape survives.
   
   Adjacent test suites still pass: `TableConfigUtilsTest` (56), 
`DictionaryIndexConfigTest` (10), `IndexServiceTest` (1), `VectorIndexTest`, 
`H3IndexTest`, `JsonIndexTest`, `RangeIndexTest`, `TextIndexTest`, 
`FstIndexTest`, `ForwardIndexTypeTest`, `DictionaryIndexTypeTest`, 
`BloomFilterTypeTest`, `InvertedIndexTypeTest`, `NullValueIndexTest` — 120 
across the index-type sweep.
   
   `./mvnw spotless:apply checkstyle:check license:format license:check -pl 
pinot-segment-spi,pinot-segment-local` clean.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to