Simply put, trying to cut corners and intuit what would be OK when
changing the schema by NOT reindexing from scratch when you are
_not_ completely familiar with the low-level details of Lucene is a recipe
for problems, as you are finding out and as Shawn explained.

Think of it this way. The schema.xml is the theory; what's actually _in_
the segments is the reality. Lucene does not impose any uniformity
at all. Solr does, based on the schema file, but only "by convention",
i.e. by creating Lucene fields in a predictable, uniform way. That means
that after a schema change, new segments can be written under wholly new
assumptions that aren't reconcilable with the old segments.
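To make the "theory vs. reality" point concrete, here is a toy model, nothing more. It is NOT Lucene's API: `Segment`, `field_has_docvalues` and `check_facetable` are hypothetical names. The only thing it models is that each segment records how a field was written at the moment it was written, while the schema can later promise something different.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Toy stand-in for a Lucene segment (hypothetical, not a real API).

    field_has_docvalues records, per field name, whether the field was
    written with docValues when this segment was created. In this model
    it never changes after the segment is written.
    """
    name: str
    field_has_docvalues: dict[str, bool] = field(default_factory=dict)


def check_facetable(schema: dict[str, bool], segments: list[Segment], fld: str) -> list[str]:
    """Return names of segments whose recorded docValues setting for `fld`
    disagrees with what the current schema promises."""
    expected = schema[fld]
    bad = []
    for seg in segments:
        # Segments that never saw the field have no entry; this toy skips them.
        if fld in seg.field_has_docvalues and seg.field_has_docvalues[fld] != expected:
            bad.append(seg.name)
    return bad


# Segment _0 was written under the old schema: MediaType without docValues.
old_seg = Segment("_0", {"MediaType": False, "title": False})
# The schema is edited to enable docValues; new docs land in a new segment.
schema = {"MediaType": True, "title": False}
new_seg = Segment("_1", {"MediaType": True, "title": False})

print(check_facetable(schema, [old_seg, new_seg], "MediaType"))  # ['_0']
```

The schema says one thing, segment `_0` says another, and nothing short of rewriting `_0` from source data reconciles them in this model.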

And the fact that you've deleted the docs from A and B means nothing. All
that really happened is that the docs were _marked_ as deleted. The
underlying segments still hold the old data (and assumptions), so
traces of the original definitions remain in the segment files and are
possibly incompatible with the new docs written to new segments.
Like Shawn, I have no real clue whether even optimizing would make
any difference, so my take would be: don't go there.
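A second toy model, again with hypothetical names rather than Lucene's real classes, shows what "marked as deleted" means: a delete only flags a doc id, and the stored data (including the field it used) is still physically in the segment. It deliberately says nothing about what an optimize/merge does, since that is exactly the open question above.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Toy segment: stored docs plus a set of deleted doc ids (hypothetical)."""
    docs: dict[int, dict]
    deleted: set[int] = field(default_factory=set)

    def delete(self, doc_id: int) -> None:
        # A delete only flags the doc; the stored data stays in the segment.
        self.deleted.add(doc_id)

    def live_docs(self) -> list[int]:
        # What searches see: every doc not flagged as deleted.
        return [d for d in self.docs if d not in self.deleted]

    def fields_on_disk(self) -> set[str]:
        # Field names still physically present, deleted docs included.
        return {f for doc in self.docs.values() for f in doc}


seg = Segment(docs={
    1: {"source": "A", "MediaType": "video"},
    2: {"source": "C", "title": "x"},
})
seg.delete(1)
print(seg.live_docs())                # [2]
print(sorted(seg.fields_on_disk()))   # ['MediaType', 'source', 'title']
```

Doc 1 no longer shows up in results, yet `MediaType` is still part of the segment's on-disk reality.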

This is one of those things that you really have to "just live with" with
Solr/Lucene.

Best,
Erick

On Fri, Jul 24, 2015 at 3:57 PM, Shawn Heisey <apa...@elyograg.org> wrote:
> On 7/24/2015 3:48 PM, shamik wrote:
>> Here's the part I'm not able to understand. For example, I have sources A, B,
>> C and D in the index. Each source contains "n" documents. Out of
>> these, a bunch of documents in A and B are tagged with MediaType. I took the
>> following steps:
>>
>> 1. Delete all documents tagged with MediaType for A and B. Documents from C
>> and D are not touched.
>>
>> 2. Reindex the documents that were tagged with MediaType
>>
>> 3. Run Optimization
>>
>> Still, I keep seeing this exception. Does this mean that content from C and D
>> is impacted even though it is not tagged with MediaType?
>
> Do any docs from C and D have that field?  Never mind whether you need
> to run your operation on them ... do they have the field?  If so, then
> when the facet code (which knows about the schema and the fact that it
> has docValues) looks at those segments, they do not have *any* docValues
> tagging for that field.  This likely would cause big explosions.  This
> lack of docValues tagging probably survives an optimize.
>
> Even if they don't have the field, there may be something about the
> Lucene format that the docValues support just doesn't like when the
> original docs were indexed without docValues on that field.
>
> Rebuilding the *entire* index is recommended for most schema changes,
> especially those like docValues that affect very low-level code
> implementations.  Solr hides lots of low-level Lucene details from the
> administrator, but makes use of those details to do its job.  Making
> sure your config and schema match what was present when the index was
> built is sometimes critical.
>
> Thanks,
> Shawn
>
