I am using the schema from Solr 5, which does not have any field with docValues enabled. In fact, to ensure that everything is the same as in Solr 5 (except for the breaking changes), I am also using the solrconfig.xml from Solr 5, with schemaFactory set to ClassicIndexSchemaFactory so that the schema.xml from Solr 5 is used.

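
For anyone who wants to double-check the same thing on their own schema, here is a minimal sketch in Python. It only lists elements that carry an explicit docValues="true" attribute; it does not resolve what a field inherits from its fieldType or any version-dependent defaults. The function name elements_with_docvalues is made up for illustration and is not part of Solr.

```python
import xml.etree.ElementTree as ET

# Element names in schema.xml that can carry a docValues attribute.
SCHEMA_TAGS = ("field", "dynamicField", "fieldType")


def elements_with_docvalues(schema_xml):
    """Return (tag, name) pairs for elements with an explicit docValues="true".

    schema_xml is the text of a schema.xml file. Only explicit attributes are
    checked; inherited or default settings are NOT resolved (assumption: an
    explicit attribute is what you want to audit first).
    """
    root = ET.fromstring(schema_xml)
    hits = []
    for elem in root.iter():
        if elem.tag in SCHEMA_TAGS and elem.get("docValues", "").lower() == "true":
            hits.append((elem.tag, elem.get("name")))
    return hits
```

Reading the file and passing its text in, e.g. elements_with_docvalues(open("schema.xml").read()), should return an empty list if no element enables docValues explicitly.
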
On Tue, Feb 21, 2017 at 11:33 AM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:

> Did you reuse the schema or rebuilt it on top of the latest examples?
> Because the latest example schema enabled docValues for strings on the
> fieldType level.
>
> I would do a diff of the schemas to see what changed. If they look
> very different and you are looking for tools to normalize/extract
> elements from schemas, you may find my latest Revolution presentation
> useful for that:
> https://www.slideshare.net/arafalov/rebuilding-solr-6-examples-layer-by-layer-lucenesolrrevolution-2016
> (e.g. slide 20). There is also the video there at the end.
>
> Regards,
>    Alex.
> ----
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
>
> On 21 February 2017 at 11:18, Mike Thomsen <mikerthom...@gmail.com> wrote:
> > Correct me if I'm wrong, but heavy use of doc values should actually blow
> > up the size of your index considerably if they are in fields that get sent
> > a lot of data.
> >
> > On Tue, Feb 21, 2017 at 10:50 AM, Pratik Patel <pra...@semandex.net> wrote:
> >
> >> Thanks for the reply. I can see that in solr 6, more than 50% of the index
> >> directory is occupied by ".nvd" file extension. It is something related to
> >> norms and doc values.
> >>
> >> On Tue, Feb 21, 2017 at 10:27 AM, Alexandre Rafalovitch <arafa...@gmail.com>
> >> wrote:
> >>
> >> > Did you look in the data directories to check what index file extensions
> >> > contribute most to the difference? That could give a hint.
> >> >
> >> > Regards,
> >> >     Alex
> >> >
> >> > On 21 Feb 2017 9:47 AM, "Pratik Patel" <pra...@semandex.net> wrote:
> >> >
> >> > > Here is the same question in stackOverflow for better format.
> >> > >
> >> > > http://stackoverflow.com/questions/42370231/solr-dynamic-field-blowing-up-the-index-size
> >> > >
> >> > > Recently, I upgraded from solr 5.0 to solr 6.4.1. I can run my app fine
> >> > > but the problem is that index size with solr 6 is way too large. In
> >> > > solr 5, index size was about 15GB and in solr 6, for the same data, the
> >> > > index size is 300GB! I am not able to understand what contributes to
> >> > > such huge difference in solr 6.
> >> > >
> >> > > I have been able to identify a field which is blowing up the size of
> >> > > index. It is as follows.
> >> > >
> >> > > <dynamicField name="*_note" type="text_general" indexed="true"
> >> > >     stored="true" multiValued="true" />
> >> > >
> >> > > <field name="textproperty" type="text_general" indexed="true"
> >> > >     stored="false" multiValued="true" />
> >> > > <copyField source="*_note" dest="textproperty"/>
> >> > >
> >> > > When this field is commented out, the index size reduces to less than
> >> > > 10GB.
> >> > >
> >> > > This field is of type text_general. Following is the definition of this
> >> > > type.
> >> > >
> >> > > <fieldType name="text_general" class="solr.TextField"
> >> > >     positionIncrementGap="100">
> >> > >   <analyzer type="index">
> >> > >     <charFilter class="solr.HTMLStripCharFilterFactory" />
> >> > >     <tokenizer class="solr.StandardTokenizerFactory"/>
> >> > >     <filter class="solr.LowerCaseFilterFactory"/>
> >> > >     <charFilter class="solr.PatternReplaceCharFilterFactory"
> >> > >         pattern="((?m)[a-z]+)'s" replacement="$1s" />
> >> > >     <filter class="solr.WordDelimiterFilterFactory"
> >> > >         protected="protwords.txt" generateWordParts="1"
> >> > >         generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> >> > >         catenateAll="0" splitOnCaseChange="0"/>
> >> > >     <filter class="solr.KStemFilterFactory" />
> >> > >     <filter class="solr.StopFilterFactory" ignoreCase="true"
> >> > >         words="C:/Users/pratik/Desktop/solr-6.4.1_playground/solr-6.4.1/server/solr/collection1/conf/stopwords.txt"
> >> > >         />
> >> > >   </analyzer>
> >> > >   <analyzer type="query">
> >> > >     <charFilter class="solr.HTMLStripCharFilterFactory" />
> >> > >     <tokenizer class="solr.StandardTokenizerFactory"/>
> >> > >     <filter class="solr.LowerCaseFilterFactory"/>
> >> > >     <charFilter class="solr.PatternReplaceCharFilterFactory"
> >> > >         pattern="((?m)[a-z]+)'s" replacement="$1s" />
> >> > >     <filter class="solr.WordDelimiterFilterFactory"
> >> > >         protected="protwords.txt" generateWordParts="1"
> >> > >         generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> >> > >         catenateAll="0" splitOnCaseChange="0"/>
> >> > >     <filter class="solr.KStemFilterFactory" />
> >> > >     <filter class="solr.StopFilterFactory" ignoreCase="true"
> >> > >         words="C:/Users/pratik/Desktop/solr-6.4.1_playground/solr-6.4.1/server/solr/collection1/conf/stopwords.txt"
> >> > >         />
> >> > >   </analyzer>
> >> > > </fieldType>
> >> > >
> >> > > Few things which I did to debug this issue:
> >> > >
> >> > >    - I have ensured that field type definition is same as what I was
> >> > >    using in solr 5 and it is also valid in version 6. This field type
> >> > >    considers a list of "stopwords" to be ignored during indexing. I have
> >> > >    supplied the same list of stopwords which we were using in solr 5. I
> >> > >    have verified that path of this file is correct and it is being
> >> > >    loaded fine in solr admin UI. When I analyse these fields using
> >> > >    "Analysis" tab of the solr admin UI, I can see that stopwords are
> >> > >    being filtered out. However, when I query with some of these
> >> > >    stopwords, I do get the results back which makes me think that
> >> > >    probably stopwords are being indexed.
> >> > >
> >> > > Any idea what could increase the size of index by so much in solr 6?
> >> > >
> >> >
> >>
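
The suggestion in the thread to check which index file extensions contribute most to the size can be scripted. Below is a minimal sketch in Python that walks an index directory and totals file sizes per extension. The function names size_by_extension and report are made up for illustration, and the directory you pass in is whatever your core's data/index path happens to be. As a reading aid: in the Lucene codecs of this era, .nvd/.nvm files hold norms, while docValues live in .dvd/.dvm files, so a dominant .nvd share would point at norms rather than docValues.

```python
import os
from collections import defaultdict


def size_by_extension(index_dir):
    """Sum file sizes in bytes under index_dir, grouped by file extension."""
    totals = defaultdict(int)
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            # Files such as "segments_5" have no extension; group them together.
            ext = os.path.splitext(name)[1] or "(none)"
            totals[ext] += os.path.getsize(os.path.join(root, name))
    return dict(totals)


def report(index_dir):
    """Print extensions sorted by total size, largest first, with their share."""
    totals = size_by_extension(index_dir)
    grand_total = sum(totals.values()) or 1  # avoid division by zero on empty dirs
    for ext, size in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        share = 100.0 * size / grand_total
        print(f"{ext:10} {size / 1024 ** 2:12.1f} MB {share:6.1f}%")
```

Running report() once against the Solr 5 index directory and once against the Solr 6 one, then comparing the two listings side by side, should show which file type accounts for the growth.
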