Never mind. Anything that didn't merge old segments but just threw them
away when empty (which was my idea) could require as much disk space as
the index currently occupies, so it doesn't help your disk-constrained
situation.

Best,
Erick

On Thu, Oct 12, 2017 at 8:06 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> If it's _only_ on a particular replica, here's what you could do:
> Just DELETEREPLICA on it, then ADDREPLICA to bring it back. You can
> define the "node" parameter on ADDREPLICA to get it back on the same
> node. Then the normal replication process would pull the entire index
> down from the leader.
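>
> Roughly like this (collection/shard/replica names are made up here;
> check your cluster state for the real ones):
>
> solr_server:port/solr/admin/collections?action=DELETEREPLICA&collection=collection1&shard=shard1&replica=core_node5
> solr_server:port/solr/admin/collections?action=ADDREPLICA&collection=collection1&shard=shard1&node=solr_server:port_solr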
>
> My bet, though, is that this wouldn't really fix things. While it fixes
> the particular case you've noticed, I'd guess others would pop up. You
> can see which replicas return what by firing individual queries at the
> particular replica in question with &distrib=false, something like
> solr_server:port/solr/collection1_shard1_replica1/query?distrib=false&blah
> blah blah
>
>
> bq: It is exceedingly unfortunate that reindexing the data on only that
> shard probably won't end up fixing the problem
>
> Well, we've been working on the DWIM (Do What I Mean) feature for years,
> but progress has stalled.
>
> How would that work? You have two segments with vastly different
> characteristics for a field. You could change the type, the
> multiValued-ness, the analysis chain; there's no end to the things that
> could go wrong. Fixing them actually _is_ impossible given how Lucene
> is structured.
>
> Hmmmm, you've now given me a brainstorm I'll suggest on the JIRA
> system after I talk to the dev list....
>
> Consider indexed=true stored=false. After stemming, "running" can be
> indexed as "run". At merge time you have no way of knowing that
> "running" was the original term, so you simply couldn't fix it on
> merge, not to mention that the performance penalty would be...er...
> severe.
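>
> A quick Lucene sketch of why that's lossy (the field name and analyzer
> choice here are just for illustration, not from your setup):
>
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.en.EnglishAnalyzer;
> import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
>
> public class StemmingIsLossy {
>   public static void main(String[] args) throws Exception {
>     Analyzer analyzer = new EnglishAnalyzer();
>     // Run the raw text through the analysis chain the way an
>     // indexed-only (stored=false) field would be at index time.
>     try (TokenStream ts = analyzer.tokenStream("body", "running runs ran")) {
>       CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
>       ts.reset();
>       while (ts.incrementToken()) {
>         // Prints "run", "run", "ran". The index only ever sees these
>         // stemmed terms, so a merge can't recover "running" to
>         // re-analyze it under a new field definition.
>         System.out.println(term);
>       }
>       ts.end();
>     }
>   }
> }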
>
> Best,
> Erick
>
> On Thu, Oct 12, 2017 at 5:53 AM, Chris Ulicny <culicny@iq.media> wrote:
>> I thought that decision would come back to bite us somehow. At the
>> time, we didn't have enough space available to do a fresh reindex
>> alongside the old collection, so the only course of action available
>> was to index over the old one, and for the vast majority of our use it
>> worked as expected.
>>
>> We're planning on upgrading to version 7 at some point in the near future
>> and will have enough space to do a full, clean reindex at that time.
>>
>> bq: This can propagate through all following segment merges IIUC.
>>
>> It is exceedingly unfortunate that reindexing the data on only that
>> shard probably won't end up fixing the problem.
>>
>> Out of curiosity, are there any good write-ups or documentation on how
>> two (or more) Lucene segments are merged, or is it just worth looking
>> at the source code to figure that out?
>>
>> Thanks,
>> Chris
>>
>> On Wed, Oct 11, 2017 at 6:55 PM Erick Erickson <erickerick...@gmail.com>
>> wrote:
>>
>>> bq: ...but the collection wasn't emptied first....
>>>
>>> This is what I'd suspect is the problem. Here's the issue: Segments
>>> aren't merged identically on all replicas. So at some point you had
>>> this field indexed without docValues, changed that and re-indexed. But
>>> the segment merging could "read" the first segment it's going to merge
>>> and think it knows about docValues for that field, when in fact that
>>> segment had the old (non-DV) definition.
>>>
>>> This would not necessarily be the same on all replicas even on the _same_
>>> shard.
>>>
>>> This can propagate through all following segment merges IIUC.
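>>>
>>> If you want to see it directly, here's a small Lucene sketch (index
>>> path and field name are placeholders) that prints the docValues type
>>> each segment recorded for a field; segments disagreeing with each
>>> other is the smoking gun:
>>>
>>> import java.nio.file.Paths;
>>> import org.apache.lucene.index.DirectoryReader;
>>> import org.apache.lucene.index.FieldInfo;
>>> import org.apache.lucene.index.LeafReaderContext;
>>> import org.apache.lucene.store.FSDirectory;
>>>
>>> public class SegmentDocValuesCheck {
>>>   public static void main(String[] args) throws Exception {
>>>     // Open the replica's index directory read-only.
>>>     try (DirectoryReader reader = DirectoryReader.open(
>>>         FSDirectory.open(Paths.get("/path/to/core/data/index")))) {
>>>       // Each leaf is one segment; each keeps its own FieldInfos.
>>>       for (LeafReaderContext ctx : reader.leaves()) {
>>>         FieldInfo fi = ctx.reader().getFieldInfos().fieldInfo("market");
>>>         System.out.println(ctx.reader() + " docValuesType="
>>>             + (fi == null ? "(field absent)" : fi.getDocValuesType()));
>>>       }
>>>     }
>>>   }
>>> }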
>>>
>>> So my bet is that if you index into a new collection, everything will
>>> be fine. You can also just delete everything first, but I usually
>>> prefer a new collection so I'm absolutely and positively sure that the
>>> above can't happen.
>>>
>>> Best,
>>> Erick
>>>
>>> On Wed, Oct 11, 2017 at 12:51 PM, Chris Ulicny <culicny@iq.media> wrote:
>>> > Hi,
>>> >
>>> > We've run into a strange issue with our deployment of SolrCloud 6.3.0.
>>> > Essentially, a standard facet query on a string field usually comes back
>>> > empty when it shouldn't. However, every now and again the query actually
>>> > returns the correct values. This is only affecting a single shard in our
>>> > setup.
>>> >
>>> > The behavior pattern generally looks like the query works properly
>>> > when it hasn't been run recently, and then returns nothing after the
>>> > query seems to have been cached (< 50ms QTime). Wait a while and you
>>> > get the correct result followed by blanks. It doesn't matter which
>>> > replica of the shard is queried; the results are the same.
>>> >
>>> > The general query in question looks like
>>> > /select?q=*:*&facet=true&facet.field=market&rows=0&fq=<some filters>
>>> >
>>> > The field is defined in the schema as <field name="market" type="string"
>>> > docValues="true"/>
>>> >
>>> > There are numerous other fields defined similarly, and they do not
>>> > exhibit the same behavior when used as the facet.field value. They
>>> > consistently return the right results on the shard in question.
>>> >
>>> > If we add facet.method=enum to the query, we get the correct results
>>> > every time (though slower). So our assumption is that something is
>>> > sporadically working when the fc method is chosen by default.
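>>> >
>>> > (i.e. the same query as above with the method forced:
>>> > /select?q=*:*&facet=true&facet.field=market&facet.method=enum&rows=0&fq=<some
>>> > filters>)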
>>> >
>>> > A few other notes about the collection. This collection is not
>>> > freshly indexed, but has not had any particularly bad failures beyond
>>> > follower replicas going down due to PKIAuthentication timeouts (has
>>> > been fixed). It has also had a full reindex after a schema change
>>> > added docValues to some fields (including the one above), but the
>>> > collection wasn't emptied first. We are using the composite router to
>>> > co-locate documents.
>>> >
>>> > Currently, our plan is just to reindex all of the documents on the
>>> > affected shard to see if that fixes the problem. Any ideas on what
>>> > might be happening or ways to troubleshoot this are appreciated.
>>> >
>>> > Thanks,
>>> > Chris
>>>
