markrmiller commented on issue #13797: URL: https://github.com/apache/lucene/issues/13797#issuecomment-3215941226
Uwe, thanks for weighing in, and for the pointers to `IndexUpgrader` and the expert `DirectoryReader#open(.., minSupportedMajor, ..)`. I want to make sure I understand your position: are you opposed to the proposal itself, or mainly highlighting that power users already have a path today? I’m asking because the core goal here is to make the safe path the *default* path for everyone, without encouraging “metadata surgery.”

# Why not ask users to overwrite/“trick” the created version?

Overwriting the index’s created-version (or using the expert API to sidestep the open check) puts risk on users in ways that are hard to audit:

* **Silent break risk vs explicit guardrails.** Some on-disk changes don’t hard-fail; they manifest as subtle scoring shifts, offsets/positions oddities, or retrieval regressions. If we bless multi-major opens *only when no on-disk break occurred*, the **author of the break** must consciously bump `MIN_SUPPORTED_MAJOR` in the same PR. That creates a crisp signal to users, rather than leaving users to guess whether their “hop” is safe.
* **Accessibility.** As Robert noted, the expert API path works for sophisticated deployments, but ordinary users don’t reach for it (and shouldn’t have to). Making the policy explicit removes the “tribal knowledge” hurdle.
* **Operational clarity.** “My index *opens* but results look off” is far harder to debug than “Lucene refused to open because created-version < minimum.” The latter points you straight to reindex; the former can burn days.
* **We aren’t expanding promises.** This doesn’t extend the backcompat window. We still reserve the right to bump the constant if support costs spike or if a real on-disk break lands. We’re just aligning the open gate with *actual* breaks, not calendar majors.

# Why support the policy change?

Breaks like norms layout changes are rare; most majors don’t alter on-disk format.
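To make the open gate concrete, here is a rough sketch of what the check could look like under this policy. It is not Lucene's actual code: the class `LazyOpenGate`, the method `canOpen`, and the version numbers are all hypothetical, chosen only to show a minimum that tracks the last real on-disk break instead of "latest major minus one."

```java
/** Illustrative sketch only; names and numbers are hypothetical, not Lucene's API. */
public final class LazyOpenGate {

    /** Hypothetical current major version of the library. */
    static final int LATEST_MAJOR = 11;

    /**
     * Hypothetical "lazy" minimum: the major in which the last real on-disk
     * break landed. Under the proposed policy, the author of the next break
     * bumps this constant in the same PR; a new major alone does not move it.
     */
    static final int MIN_SUPPORTED_MAJOR = 9;

    /** Returns true if an index created by the given major may be opened. */
    static boolean canOpen(int createdMajor) {
        return createdMajor >= MIN_SUPPORTED_MAJOR && createdMajor <= LATEST_MAJOR;
    }

    /** Fails loudly instead of opening an index that predates the last break. */
    static void checkCreatedVersion(int createdMajor) {
        if (!canOpen(createdMajor)) {
            throw new IllegalStateException(
                "Index created by major " + createdMajor
                    + " is outside the supported range ["
                    + MIN_SUPPORTED_MAJOR + ", " + LATEST_MAJOR + "]; reindex required");
        }
    }

    public static void main(String[] args) {
        // Report which created-version majors pass the gate in this sketch.
        for (int major = 8; major <= LATEST_MAJOR; major++) {
            System.out.println("created by " + major + " -> canOpen=" + canOpen(major));
        }
    }
}
```

The point of the sketch is the failure mode: an index older than the last real break is refused with an explicit message, while one created after it opens even across more than one major.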
Keeping `MIN_SUPPORTED_MAJOR` pinned to the last *real* break:

* **Removes unnecessary reindexing** when nothing incompatible changed.
* **Reduces upgrade friction** for users doing multi-major hops *that are actually safe*.
* **Keeps developer cost low**: the **author of any on-disk break** bumps the constant in the same PR.
* **Plays nicely with faster major cadence** (e.g., aligning with new Java features) without penalizing users on storage and downtime.
* **Doesn’t balloon codecs.** We still ship current + previous readers.

# Complementary to reindexing (not a substitute)

I’m +1 on background reindexing for long-term modernization. This type of upgrade is targeted at the many installs where reindexing is very costly and that cost outweighs the desire for the performance/features that one would gain access to via a reindex. :bicyclist:

Uwe, are you fundamentally against adopting the “lazy `MIN_SUPPORTED_MAJOR`” policy (bump only on true on-disk breaks)?

Mark