The technical answer: Undefined and not guaranteed.
Sure, you can experiment and see what the effects "happen" to be in any
given release, and maybe they don't change much between most releases, but
there is no guarantee that changing the schema while keeping the existing
data (that is, without deleting the index directory contents and doing a
full reindex) will actually be benign or behave as you expect.
As a general rule, don't change the schema without also deleting the index
directory and doing a full reindex! Of course, we all know not to walk on
thin ice, but a lot of people will try it anyway - and maybe most of the
time the results happen to be benign.
OTOH, you could file a Jira issue to propose that the effects of changing
the schema but keeping the existing data be precisely defined and
documented, but even a documented behavior could still change from release
to release.
From a practical perspective for your original question: If you suddenly add
a field, there is no guarantee what will happen when you try to access that
field for existing documents, or what will happen if you "update" existing
documents. Sure, people can talk about what "happens to be true today", but
there is no guarantee for the future. Similarly for deleting a field from
the schema, there is no guarantee about the status of existing data, even
though people can chatter about "what it seems to do today."
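One way to act on this in client code is to treat any field added after
documents were indexed as optional, rather than assuming it is present. A
minimal sketch, assuming documents arrive as plain dictionaries (as in a
parsed JSON response) and using a made-up field name "summary";
field_or_default is a hypothetical helper, and whether older documents omit
the field, return it empty, or do something else is exactly the part that is
not guaranteed.

```python
def field_or_default(doc, name, default=None):
    """Return doc[name] if it is present and non-empty, else the default."""
    value = doc.get(name)
    # Treat a missing, empty-string, or empty-list value the same way,
    # since we cannot rely on which of these an older document yields.
    if value is None or value == "" or value == []:
        return default
    return value


# Hypothetical documents: one indexed before the field existed, one after.
old_doc = {"id": "1", "title": "indexed before the schema change"}
new_doc = {"id": "2", "title": "indexed after", "summary": "has the new field"}

summaries = [
    field_or_default(d, "summary", default="(none)") for d in (old_doc, new_doc)
]
```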
Generally, you should design your application around contracts and what is
guaranteed to be true, not what happens to be true from experiments or even
experience. Granted, that is the theory, and sometimes you do need to rely
on experimentation, folklore, and spotty or ambiguous documentation, but to
the extent possible, it is best to avoid relying on undocumented behavior
that no contract covers.
One question I asked long ago and never received an answer to: what is the
best practice for doing a full reindex? Is it sufficient to first delete by
query with "*:*", or do the Solr index directory contents, or even the
directory itself, need to be explicitly deleted first? I believe it is the
latter, but the former "seems" to work, most of the time. Deleting the
directory itself "seems" to be the best answer, to date - but no guarantees!
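For reference, the first option (deleting everything that matches "*:*")
can be sketched as a request to Solr's XML update handler. This is only a
sketch: the base URL and core name are placeholders for your own setup, and
build_delete_all_request is a hypothetical helper name, not part of any Solr
client library. It only builds the request and does not send it; whether a
delete-by-query is enough before a full reindex is still the open question.

```python
import urllib.request


def build_delete_all_request(base_url, core):
    """Build (but do not send) a delete-by-query request matching all docs."""
    # commit=true asks Solr to commit the delete as part of this request.
    url = f"{base_url.rstrip('/')}/{core}/update?commit=true"
    body = b"<delete><query>*:*</query></delete>"
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


# Placeholder URL and core name; adjust for your installation.
req = build_delete_all_request("http://localhost:8983/solr", "collection1")

# To actually send it (this deletes every document in the core):
# urllib.request.urlopen(req)
```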
-- Jack Krupansky
-----Original Message-----
From: Dotan Cohen
Sent: Tuesday, May 28, 2013 5:21 AM
To: solr-user@lucene.apache.org
Subject: What exactly happens to extant documents when the schema changes?
When adding or removing a text field to/from the schema and then
restarting Solr, what exactly happens to extant documents? Is the
schema only consulted when Solr writes a document, therefore extant
documents are unaffected?
Considering that Solr supports dynamic fields, my experimentation with
removing and adding fields to the schema has shown almost no change in
the extant index results returned.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com