dpol1 commented on PR #1868:
URL: https://github.com/apache/stormcrawler/pull/1868#issuecomment-4229245346
> Thanks @dpol1 Looks good, here are a couple of possible issues flagged by
Claude
>
> **1. Timestamp serialization format change (MAJOR — data-compat risk)**
>
> The new module writes timestamps as ISO-8601 strings where the legacy
module writes epoch-millis longs:
>
> * persistence/StatusUpdaterBolt.java:254 — doc.put("nextFetchDate",
nextFetch.get().toInstant().toString())
> vs legacy persistence/StatusUpdaterBolt.java:252 —
builder.timeField("nextFetchDate", nextFetch.get()) (epoch millis)
>
> * metrics/MetricsConsumer.java:153 — doc.put("timestamp",
timestamp.toInstant().toString())
>
> * metrics/MetricsReporter.java — same pattern for the metric timestamp
field.
>
>
> Impact depends on index mapping:
>
> * Indexes using the default date format
(strict_date_optional_time||epoch_millis) will accept both — no breakage.
>
> * Indexes with a custom format: epoch_millis mapping (some deployments
use this for compactness) will reject writes from the new module.
>
> * Mixing writers (legacy + new on the same index) can produce fields
with both shapes, which is still queryable but surprising.
>
>
> Recommendation: either match legacy (toInstant().toEpochMilli()) or
document the format change in the module README and make sure the example
mappings under src/test/resources/{status,metrics,indexer}.mapping declare a
format that accepts ISO dates.
I tried switching to toEpochMilli(), but it immediately broke
AggregationSpout (which expects a String) and caused the tests to fail against
the date_optional_time mapping. I thought an hybrid approch which covers both
but I don't think is solid - Maybe I forgot something but I've reverted to
.toInstant().toString().
>
> **2. responseBufferSize config key silently dropped (MINOR regression)**
>
> Legacy external/opensearch/.../OpenSearchConnection.java:283 reads
opensearch.*.responseBufferSize (default 100 MB) and sets it on the HTTP
client. The new OpenSearchConnection.java does not reference it. Users who
tuned this for large bulk responses will silently lose the setting on
migration. The legacy sample external/opensearch/opensearch-conf.yaml documents
the key in four places.
>
> Recommendation: either port the setting to the HC5 client builder or list
it as a removed key in the new module's README.
Yes properly documented now and for the sniffer as well!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]