The technical answer: Undefined and not guaranteed.

Sure, you can experiment and see what the effects "happen" to be in a given release, and maybe they don't tend to change (too much) between most releases. But there is no guarantee that any given "change the schema but keep the existing data, without deleting the directory contents and doing a full reindex" will actually be benign or do what you expect.

As a general proposition, when it comes to changing the schema and not 
deleting the directory and doing a full reindex, don't do it! Of course, we 
all know not to try to walk on thin ice, but a lot of people will try to do 
it anyway - and maybe it happens that most of the time the results are 
benign.

OTOH, you could file a Jira to propose that the effects of changing the 
schema but keeping the existing data be precisely defined and 
documented, but even then the behavior could still change from release to release.

From a practical perspective, for your original question: if you suddenly add a field, there is no guarantee what will happen when you try to access that field in existing documents, or what will happen if you "update" existing documents. Sure, people can talk about what "happens to be true today", but there is no guarantee for the future. Similarly, if you delete a field from the schema, there is no guarantee about the status of the existing data, even though people can chatter about "what it seems to do today."
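
As a rough illustration of not leaning on "what happens to be true today", here is a minimal sketch of client code that treats a newly added field as optional when reading documents back. It assumes, purely for illustration, that documents indexed before the schema change simply come back without the field, which is exactly the kind of behavior that is not guaranteed. The field name "summary", the sample documents, and the helper read_optional_field are all hypothetical.

```python
from typing import Any, Mapping, Optional


def read_optional_field(doc: Mapping[str, Any], field: str,
                        default: Optional[Any] = None) -> Any:
    """Return the field's value if the document has it, else the default.

    Never raises for a missing field, so documents indexed before the
    field existed are handled the same way as newer ones.
    """
    return doc.get(field, default)


# Hypothetical documents, shaped like entries in a search response.
old_doc = {"id": "1", "title": "indexed before the schema change"}
new_doc = {"id": "2", "title": "indexed after", "summary": "has the new field"}

for doc in (old_doc, new_doc):
    print(doc["id"], read_optional_field(doc, "summary", default="<missing>"))
```

This only protects the client from a missing field. It says nothing about how the field is searched, sorted, or updated on the server side, where a full reindex is still the safe route.
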
Generally, you should design your application around contracts and what is 
guaranteed to be true, not what happens to be true from experiments or even 
experience. Granted, that is the theory and sometimes you do need to rely on 
experimentation and folklore and spotty or ambiguous documentation, but to 
the extent possible, it is best to avoid explicitly trying to rely on 
undocumented, uncontracted behavior.

One question I asked long ago and never received an answer to: what is the best 
practice for doing a full reindex - is it sufficient to first do a delete of 
"*:*", or do the Solr index directory contents, or even the directory 
itself, need to be explicitly deleted first? I believe it is the latter, but 
the former "seems" to work, most of the time. Deleting the directory itself 
"seems" to be the best answer, to date - but no guarantees!
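
For reference, the delete of "*:*" mentioned above is a delete-by-query sent to Solr's update handler. The sketch below only builds that request and prints it without sending anything. The base URL, the core name "collection1", and the helper build_delete_all_request are assumptions for illustration, and as noted above this approach only "seems" to work rather than being a guaranteed clean slate.

```python
import urllib.request


def build_delete_all_request(solr_base_url: str, core: str) -> urllib.request.Request:
    """Build (but do not send) a delete-by-query request matching all documents."""
    # commit=true asks Solr to commit the delete as part of this request.
    url = f"{solr_base_url.rstrip('/')}/{core}/update?commit=true"
    # XML update message: delete every document matching the query *:*
    body = b"<delete><query>*:*</query></delete>"
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


# Assumed host and core name; adjust for your installation.
req = build_delete_all_request("http://localhost:8983/solr", "collection1")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Sending it would be a matter of passing req to urllib.request.urlopen against a running Solr instance, followed by the full reindex. Deleting the index directory itself is a filesystem operation done with Solr stopped, and is not shown here.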

-- Jack Krupansky

-----Original Message----- From: Dotan Cohen
Sent: Tuesday, May 28, 2013 5:21 AM
To: solr-user@lucene.apache.org
Subject: What exactly happens to extant documents when the schema changes?

When adding or removing a text field to/from the schema and then
restarting Solr, what exactly happens to extant documents? Is the
schema only consulted when Solr writes a document, therefore extant
documents are unaffected?

Considering that Solr supports dynamic fields, my experimentation with
removing and adding fields to the schema has shown almost no change in
the extant index results returned.

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com
