> Ah. So docValues are managed by Solr outside of Lucene. Interesting.

I was under the impression that docValues are in Lucene. I think he is just saying that an optimize is not a re-index: it takes the actual files that already exist in your index, rearranges them, and removes deletions. An optimize doesn't re-read the schema and re-index content.
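If it helps, here is a rough Python sketch for checking how many documents in an /export response come back without a value for the field. The helper names (export_url, count_empty_docs) and the sample body are made up for illustration. The URL parameters mirror the ones in the original report quoted below, and I am assuming the docs may sit either at the top level of the JSON or under a "response" key.

```python
import json
from urllib.parse import urlencode


def export_url(base, core, field):
    """Build a non-distributed /export URL for one core, sorted on `field`.

    `base` and `core` are placeholders supplied by the caller.
    """
    params = {
        "q": f"{field}:*",
        "distrib": "false",
        "fl": field,
        "sort": f"{field} asc",
    }
    return f"{base}/{core}/export?{urlencode(params)}"


def count_empty_docs(body_text, field):
    """Return (total, missing) for the docs in an /export JSON body."""
    body = json.loads(body_text)
    # Assumption: the result may or may not be wrapped in a "response" object.
    result = body.get("response", body)
    docs = result["docs"]
    missing = sum(1 for doc in docs if field not in doc)
    return len(docs), missing


# Hypothetical sample shaped like the output in the original report.
sample = '{"numFound":4,"docs":[{},{},{},{"id":"example-uuid"}]}'
print(count_empty_docs(sample, "id"))  # (4, 3)
```

Running this against each shard leader before and after a full reindex should show whether the empty documents are the ones indexed before the field was switched to docValues.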
On Fri, May 31, 2019 at 1:59 PM Walter Underwood <wun...@wunderwood.org> wrote:

> Ah. So docValues are managed by Solr outside of Lucene. Interesting.
>
> That actually answers a question I had not asked yet. I was curious if it was safe to change the id field to docValues without reindexing if we never sorted on it. It looks like fetching the value won’t work until everything is reindexed.
>
> It seems like this would be a useful thing to have supported, migrating a field to docValues.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
> > On May 31, 2019, at 5:00 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> >
> > bq. but I optimized all the cores, which should rewrite every segment as docValues.
> >
> > Not true. Optimize is a Lucene level force merge. Dealing with segments, i.e. merging and the like, is a low-level Lucene operation and Lucene has no notion of a schema. So a change you made to the schema is irrelevant to merging.
> >
> > You have to have something at the Solr level that does some magic for this to work. Take a look at UninvertDocValuesMergePolicyFactory if you have Solr 7.0 or later. WARNING: I haven’t used that personally, and I do not know what the behavior would be on an index that is “mixed”, i.e. one that already has segments with some docs having DV entries and some not.
> >
> > Best,
> > Erick
> >
> >> On May 31, 2019, at 12:35 AM, Walter Underwood <wun...@wunderwood.org> wrote:
> >>
> >> That field was changed to docValues, but I optimized all the cores, which should rewrite every segment as docValues.
> >>
> >> wunder
> >> Walter Underwood
> >> wun...@wunderwood.org
> >> http://observer.wunderwood.org/ (my blog)
> >>
> >>> On May 30, 2019, at 7:37 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> >>>
> >>> This is odd. The only reason I know of that would happen is if there were no docValues for that field in those documents. By any chance were docValues added to an existing index without totally reindexing into a new collection?
> >>>
> >>> What happens if you just query the collection rather than the individual core? I’m thinking using a streaming expression as a check…..
> >>>
> >>>> On May 30, 2019, at 6:41 PM, Walter Underwood <wun...@wunderwood.org> wrote:
> >>>>
> >>>> 3/4 of the documents I’m getting back from /export are empty. This collection has four shards, so I’m querying the leader core on each shard with /export. The results start like this:
> >>>>
> >>>> {"numFound":912370,"docs":[{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},
> >>>>
> >>>> The final 1/4 of the results have UUIDs (the ID type). The id field is stored as docValues. This is the URL.
> >>>>
> >>>> http://hostname:8983/solr/decks_shard1_replica1/export?q=id:*&distrib=false&shards=shard1&fl=id&sort=id+asc
> >>>>
> >>>> Running 6.6.2, Solr Cloud. The total number of non-null ids from all four shards is a bit less than 1/4 of the document count.
> >>>>
> >>>> Any ideas about what is going on?
> >>>>
> >>>> wunder
> >>>> Walter Underwood
> >>>> wun...@wunderwood.org
> >>>> http://observer.wunderwood.org/ (my blog)