On Tue, May 28, 2013 at 3:58 PM, Jack Krupansky <j...@basetechnology.com> wrote:
> The technical answer: Undefined and not guaranteed.
>

I was afraid of that!

> Sure, you can experiment and see what the effects "happen" to be in any
> given release, and maybe they don't tend to change (too much) between most
> releases, but there is no guarantee that any given "change schema but keep
> existing data without a delete of directory contents and full reindex" will
> actually be benign or what you expect.
>
> As a general proposition, when it comes to changing the schema and not
> deleting the directory and doing a full reindex, don't do it! Of course, we
> all know not to try to walk on thin ice, but a lot of people will try to do
> it anyway - and maybe it happens that most of the time the results are
> benign.
>

In the case of this particular application, reindexing really is overly
burdensome as the application is performing hundreds of writes to the
index per minute. How might I gauge how much spare I/O Solr could commit
to a reindex? All the data that I need is in fact in stored fields. Note
that because the social media application that feeds our Solr index is
global, there are no 'off hours'.

> OTOH, you could file a Jira to propose that the effects of changing the
> schema but keeping the existing data should be precisely defined and
> documented, but, that could still change from release to release.
>

Seems like a lot of effort to document, for little benefit. I'm not going
to file it. I would like to know, though, is the schema consulted at index
time, query time, or both?

> From a practical perspective for your original question: If you suddenly add
> a field, there is no guarantee what will happen when you try to access that
> field for existing documents, or what will happen if you "update" existing
> documents. Sure, people can talk about what "happens to be true today", but
> there is no guarantee for the future. Similarly for deleting a field from
> the schema, there is no guarantee about the status of existing data, even
> though people can chatter about "what it seems to do today."
>
> Generally, you should design your application around contracts and what is
> guaranteed to be true, not what happens to be true from experiments or even
> experience. Granted, that is the theory and sometimes you do need to rely on
> experimentation and folklore and spotty or ambiguous documentation, but to
> the extent possible, it is best to avoid explicitly trying to rely on
> undocumented, uncontracted behavior.
>

Thanks. The application does change (added features) and we do not want to
lose old data.

> One question I asked long ago and never received an answer: what is the best
> practice for doing a full reindex - is it sufficient to first do a delete of
> "*:*", or does the Solr index directory contents or even the directory
> itself need to be explicitly deleted first? I believe it is the latter, but
> the former "seems" to work, most of the time. Deleting the directory itself
> "seems" to be the best answer, to date - but no guarantees!
>

I don't have an answer for that, sorry!

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com