bq: my index size grew by 20%. Is this expected?

Yes. But don't worry about it ;). Basically, you've serialized to disk the
"uninverted" form of the field. But that is accessed through Lucene by
MMapDirectory, see:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

If you don't use DocValues, the uninverted version is built in Java's
memory, which is much more expensive for a variety of reasons. What you
lose in disk size you gain in a lower JVM footprint, fewer GC problems,
etc.

But the implication is, indeed, that you should use DocValues for fields
you intend to facet and/or sort on. If you only search, it's just
wasted space.

Best,
Erick

On Fri, May 27, 2016 at 6:25 AM, Steven White <swhite4...@gmail.com> wrote:
> Thank you Erick for pointing out DocValues. I re-indexed my data
> with it set to true and my index size grew by 20%. Is this expected?
>
> Hi Nick, I'm not clear about SOLR-7495. Are you saying I should not use
> docValues=true if: type="int" and multiValued="true"? I'm on Solr 5.2.1.
> Thanks.
>
> Steve
>
> On Thu, May 26, 2016 at 9:29 PM, Nick D <ndrake0...@gmail.com> wrote:
>
>> Although you did mention that you won't need to sort and you are using
>> multiValued=true. On the off chance you do change something like
>> multiValued=false docValues=false then this will come into play:
>>
>> https://issues.apache.org/jira/browse/SOLR-7495
>>
>> This has been a rather large pain to deal with in terms of faceting. (The
>> Lucene change that caused a number of issues is also referenced in this
>> Jira.)
>>
>> Nick
>>
>>
>> On Thu, May 26, 2016 at 11:45 AM, Erick Erickson <erickerick...@gmail.com>
>> wrote:
>>
>> > I always prefer ints to strings; they can't help but take
>> > up less memory, comparing two ints is much faster than
>> > two strings, etc. Although Lucene can play some tricks
>> > to make that less noticeable.
>> >
>> > Although if these are just a few values, it'll be hard to
>> > actually measure the perf difference.
>> >
>> > And if it's a _lot_ of unique values, you have other problems
>> > than the int/string distinction. Faceting on very high
>> > cardinality fields is something that can have performance
>> > implications.
>> >
>> > But I'd certainly add docValues="true" to the definition no matter
>> > which you decide on.
>> >
>> > Best,
>> > Erick
>> >
>> > On Wed, May 25, 2016 at 9:29 AM, Steven White <swhite4...@gmail.com>
>> > wrote:
>> > > Hi everyone,
>> > >
>> > > I will be faceting on integer data and I'm wondering if there is any
>> > > difference in how I design my schema. I have no need to sort or use
>> > > range facets. Given this, in terms of Lucene performance and index
>> > > size, does it make any difference if I use:
>> > >
>> > > #1: <field name="FACET_ID" type="string" multiValued="true"
>> > > indexed="true" required="true" stored="false"/>
>> > >
>> > > Or
>> > >
>> > > #2: <field name="FACET_ID" type="int" multiValued="true" indexed="true"
>> > > required="true" stored="false"/>
>> > >
>> > > (notice how I changed the "type" from "string" to "int" in #2)
>> > >
>> > > Thanks in advance.
>> > >
>> > > Steve
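
For reference, here is a minimal sketch of the docValues suggestion from this thread applied to option #2 of the original question. The only change is the added docValues="true" attribute. This assumes the "int" fieldType in your schema supports docValues (the stock Trie-based int in Solr 5.x should), and a full re-index is needed after changing it.

```xml
<!-- Sketch only: option #2 from the original question with docValues enabled.
     Assumption: the "int" fieldType in this schema supports docValues.
     Changing this attribute requires re-indexing the data. -->
<field name="FACET_ID" type="int" multiValued="true" indexed="true"
       required="true" stored="false" docValues="true"/>
```

The same attribute can be added to the type="string" variant (#1) if you go that way instead. Either way, expect the index on disk to grow somewhat, as discussed at the top of this message.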