jnioche commented on PR #1868:
URL: https://github.com/apache/stormcrawler/pull/1868#issuecomment-4228296484
Looks good, here are a couple of possible issues flagged by Claude
**1. Timestamp serialization format change (MAJOR — data-compat risk)**
The new module writes timestamps as ISO-8601 strings where the legacy
module writes epoch-millis longs:
- persistence/StatusUpdaterBolt.java:254 — doc.put("nextFetchDate",
nextFetch.get().toInstant().toString())
vs legacy persistence/StatusUpdaterBolt.java:252 —
builder.timeField("nextFetchDate", nextFetch.get()) (epoch millis)
- metrics/MetricsConsumer.java:153 — doc.put("timestamp",
timestamp.toInstant().toString())
- metrics/MetricsReporter.java — same pattern for the metric timestamp
field.
Impact depends on index mapping:
- Indexes using the default date format
(strict_date_optional_time||epoch_millis) will accept both — no breakage.
- Indexes with a custom format: epoch_millis mapping (some deployments use
this for compactness) will reject writes from the new module.
- Mixing writers (legacy + new on the same index) can produce fields with
both shapes, which is still queryable but surprising.
Recommendation: either match legacy (toInstant().toEpochMilli()) or
document the format change in the module README and make sure the example
mappings under src/test/resources/{status,metrics,indexer}.mapping declare a
format that
accepts ISO dates.
**2. responseBufferSize config key silently dropped (MINOR regression)**
Legacy external/opensearch/.../OpenSearchConnection.java:283 reads
opensearch.*.responseBufferSize (default 100 MB) and sets it on the HTTP
client. The new OpenSearchConnection.java does not reference it. Users who
tuned this for large
bulk responses will silently lose the setting on migration. The legacy
sample external/opensearch/opensearch-conf.yaml documents the key in four
places.
Recommendation: either port the setting to the HC5 client builder or list
it as a removed key in the new module's README.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]