Re: Empty rows from /export?

David Hastings Fri, 31 May 2019 11:03:21 -0700

> Ah. So docValues are managed by Solr outside of Lucene. Interesting.

I was under the impression that docValues are in Lucene. I think he is just
saying that an optimize is not a re-index: it takes the files that already
exist in your index, rearranges them, and removes deletions. An optimize
doesn't re-read the schema and re-index content.
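
To make that concrete, here is a toy sketch in Python. It is not Lucene or
Solr code: the Doc class and the force_merge function are hypothetical names
made up for illustration. It only models the point above, that a merge copies
what is already on disk and drops deletions, without ever looking at the
schema.

```python
# Toy model only: Doc and force_merge are hypothetical, not Lucene or Solr APIs.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Doc:
    stored_id: str              # value kept in the stored field
    doc_value: Optional[str]    # docValues entry; None if indexed before the schema change
    deleted: bool = False


def force_merge(segments: List[List[Doc]]) -> List[List[Doc]]:
    """Combine all segments into one and drop deleted docs.

    There is deliberately no schema argument: existing structures are copied
    as they are, so a doc without a docValues entry still has none afterwards.
    """
    merged = [doc for segment in segments for doc in segment if not doc.deleted]
    return [merged]


# Segment written before docValues was enabled, plus one written after.
old_segment = [Doc("a", None), Doc("b", None), Doc("c", None, deleted=True)]
new_segment = [Doc("d", "d")]

index = force_merge([old_segment, new_segment])
missing = [doc.stored_id for doc in index[0] if doc.doc_value is None]
print(missing)  # ['a', 'b']
```

In this model the old documents survive the merge but still have nothing to
return for a docValues-only read, which would match the empty rows described
further down the thread.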


On Fri, May 31, 2019 at 1:59 PM Walter Underwood <wun...@wunderwood.org>
wrote:

> Ah. So docValues are managed by Solr outside of Lucene. Interesting.
>
> That actually answers a question I had not asked yet. I was curious if it
> was safe to change the id field to docValues without reindexing if we never
> sorted on it. It looks like fetching the value won’t work until everything
> is reindexed.
>
> It seems like this would be a useful thing to have supported, migrating a
> field to docValues.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On May 31, 2019, at 5:00 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> >
> > bq. but I optimized all the cores, which should rewrite every segment as
> > docValues.
> >
> > Not true. Optimize is a Lucene level force merge. Dealing with segments,
> > i.e. merging and the like, is a low-level Lucene operation and Lucene has
> > no notion of a schema. So a change you made to the schema is irrelevant to
> > merging.
> >
> > You have to have something at the Solr level that does some magic for
> > this to work. Take a look at UninvertDocValuesMergePolicyFactory if you
> > have Solr 7.0 or later. WARNING: I haven’t used that personally, and I do
> > not know what the behavior would be on an index that is “mixed”, i.e. one
> > that already has segments with some docs having DV entries and some not.
> >
> > Best,
> > Erick
> >
> >> On May 31, 2019, at 12:35 AM, Walter Underwood <wun...@wunderwood.org> wrote:
> >>
> >> That field was changed to docValues, but I optimized all the cores,
> >> which should rewrite every segment as docValues.
> >>
> >> wunder
> >> Walter Underwood
> >> wun...@wunderwood.org
> >> http://observer.wunderwood.org/  (my blog)
> >>
> >>> On May 30, 2019, at 7:37 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> >>>
> >>> This is odd. The only reason I know of that would happen is if there
> >>> were no docValues for that field in those documents. By any chance were
> >>> docValues added to an existing index without totally reindexing into a new
> >>> collection?
> >>>
> >>> What happens if you just query the collection rather than the
> >>> individual core? I’m thinking using a streaming expression as a check...
> >>>
> >>>> On May 30, 2019, at 6:41 PM, Walter Underwood <wun...@wunderwood.org> wrote:
> >>>>
> >>>> 3/4 of the documents I’m getting back from /export are empty. This
> >>>> collection has four shards, so I’m querying the leader core on each shard
> >>>> with /export. The results start like this:
> >>>>
> >>>> {"numFound":912370,"docs":[{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},
> >>>>
> >>>> The final 1/4 of the results have UUIDs (the ID type). The id field
> >>>> is stored as docValues. This is the URL.
> >>>>
> >>>> http://hostname:8983/solr/decks_shard1_replica1/export?q=id:*&distrib=false&shards=shard1&fl=id&sort=id+asc
> >>>>
> >>>> Running 6.6.2, Solr Cloud. The total number of non-null ids from all
> >>>> four shards is a bit less than 1/4 of the document count.
> >>>>
> >>>> Any ideas about what is going on?
> >>>>
> >>>> wunder
> >>>> Walter Underwood
> >>>> wun...@wunderwood.org
> >>>> http://observer.wunderwood.org/  (my blog)
> >>>>
> >>>
> >>
> >
>
>
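
The check in the original question above (how many rows from /export come
back without an id) could be done with a few lines of Python once the
response has been saved and parsed. This is only a sketch: summarize_export
is a hypothetical helper, the sample body is made up for illustration, and
the real response may wrap the "docs" list differently depending on the Solr
version.

```python
import json


def summarize_export(docs, field="id"):
    """Count how many exported rows lack the given field (i.e. came back as {})."""
    empty = sum(1 for doc in docs if field not in doc)
    return {"total": len(docs), "empty": empty, "with_value": len(docs) - empty}


# Made-up sample shaped like the output quoted in the thread, not real data.
sample = json.loads('{"numFound":4,"docs":[{},{},{},{"id":"example-id"}]}')

summary = summarize_export(sample["docs"])
print(summary)  # {'total': 4, 'empty': 3, 'with_value': 1}
```

Running this against each shard's output would show whether the empty rows
line up with documents indexed before docValues was enabled on the field.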
