Correct me if I'm wrong, but heavy use of doc values should actually blow up the size of your index considerably if they are in fields that get sent a lot of data.
On Tue, Feb 21, 2017 at 10:50 AM, Pratik Patel <pra...@semandex.net> wrote:

> Thanks for the reply. I can see that in solr 6, more than 50% of the index
> directory is occupied by ".nvd" file extension. It is something related to
> norms and doc values.
>
> On Tue, Feb 21, 2017 at 10:27 AM, Alexandre Rafalovitch
> <arafa...@gmail.com> wrote:
>
> > Did you look in the data directories to check what index file extensions
> > contribute most to the difference? That could give a hint.
> >
> > Regards,
> > Alex
> >
> > On 21 Feb 2017 9:47 AM, "Pratik Patel" <pra...@semandex.net> wrote:
> >
> > > Here is the same question in stackOverflow for better format.
> > >
> > > http://stackoverflow.com/questions/42370231/solr-dynamic-field-blowing-up-the-index-size
> > >
> > > Recently, I upgraded from solr 5.0 to solr 6.4.1. I can run my app fine
> > > but the problem is that index size with solr 6 is way too large. In
> > > solr 5, index size was about 15GB and in solr 6, for the same data, the
> > > index size is 300GB! I am not able to understand what contributes to
> > > such huge difference in solr 6.
> > >
> > > I have been able to identify a field which is blowing up the size of
> > > index. It is as follows.
> > >
> > > <dynamicField name="*_note" type="text_general" indexed="true"
> > >     stored="true" multiValued="true" />
> > >
> > > <field name="textproperty" type="text_general" indexed="true"
> > >     stored="false" multiValued="true" />
> > > <copyField source="*_note" dest="textproperty"/>
> > >
> > > When this field is commented out, the index size reduces to less than
> > > 10GB.
> > >
> > > This field is of type text_general. Following is the definition of this
> > > type.
> > >
> > > <fieldType name="text_general" class="solr.TextField"
> > >     positionIncrementGap="100">
> > >   <analyzer type="index">
> > >     <charFilter class="solr.HTMLStripCharFilterFactory" />
> > >     <tokenizer class="solr.StandardTokenizerFactory"/>
> > >     <filter class="solr.LowerCaseFilterFactory"/>
> > >     <charFilter class="solr.PatternReplaceCharFilterFactory"
> > >         pattern="((?m)[a-z]+)'s" replacement="$1s" />
> > >     <filter class="solr.WordDelimiterFilterFactory"
> > >         protected="protwords.txt" generateWordParts="1"
> > >         generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> > >         catenateAll="0" splitOnCaseChange="0"/>
> > >     <filter class="solr.KStemFilterFactory" />
> > >     <filter class="solr.StopFilterFactory" ignoreCase="true"
> > >         words="C:/Users/pratik/Desktop/solr-6.4.1_playground/solr-6.4.1/server/solr/collection1/conf/stopwords.txt"
> > >         />
> > >   </analyzer>
> > >   <analyzer type="query">
> > >     <charFilter class="solr.HTMLStripCharFilterFactory" />
> > >     <tokenizer class="solr.StandardTokenizerFactory"/>
> > >     <filter class="solr.LowerCaseFilterFactory"/>
> > >     <charFilter class="solr.PatternReplaceCharFilterFactory"
> > >         pattern="((?m)[a-z]+)'s" replacement="$1s" />
> > >     <filter class="solr.WordDelimiterFilterFactory"
> > >         protected="protwords.txt" generateWordParts="1"
> > >         generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> > >         catenateAll="0" splitOnCaseChange="0"/>
> > >     <filter class="solr.KStemFilterFactory" />
> > >     <filter class="solr.StopFilterFactory" ignoreCase="true"
> > >         words="C:/Users/pratik/Desktop/solr-6.4.1_playground/solr-6.4.1/server/solr/collection1/conf/stopwords.txt"
> > >         />
> > >   </analyzer>
> > > </fieldType>
> > >
> > > Few things which I did to debug this issue:
> > >
> > > - I have ensured that field type definition is same as what I was
> > >   using in solr 5 and it is also valid in version 6. This field type
> > >   considers a list of "stopwords" to be ignored during indexing. I have
> > >   supplied the same list of stopwords which we were using in solr 5. I
> > >   have verified that path of this file is correct and it is being
> > >   loaded fine in solr admin UI. When I analyse these fields using
> > >   "Analysis" tab of the solr admin UI, I can see that stopwords are
> > >   being filtered out. However, when I query with some of these
> > >   stopwords, I do get the results back which makes me think that
> > >   probably stopwords are being indexed.
> > >
> > > Any idea what could increase the size of index by so much in solr 6?
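
The check suggested earlier in the thread (seeing which index file extensions contribute most to the size) can be scripted. This is only a minimal sketch, not something anyone in the thread posted: it assumes the index lives in a local directory you can read, and the function names `size_by_extension` and `report` are made up for illustration. Pass it the core's index directory, whatever that path is in your install.

```python
import os
import sys
from collections import defaultdict


def size_by_extension(index_dir):
    """Return {extension: total_bytes} for every file under index_dir."""
    totals = defaultdict(int)
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            # Files such as "segments_N" have no extension; group them together.
            ext = os.path.splitext(name)[1] or "(none)"
            totals[ext] += os.path.getsize(os.path.join(root, name))
    return dict(totals)


def report(index_dir):
    """Print extensions sorted by total size, largest first."""
    totals = size_by_extension(index_dir)
    grand_total = sum(totals.values()) or 1  # avoid dividing by zero on an empty dir
    for ext, size in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        share = 100.0 * size / grand_total
        print(f"{ext:>8}  {size / 1024 ** 2:12.1f} MB  {share:5.1f}%")


if __name__ == "__main__":
    # Usage: python index_sizes.py <path-to-index-directory>
    if len(sys.argv) > 1:
        report(sys.argv[1])
```

Running it against both the solr 5 and the solr 6 index directories and comparing the two reports should show which extension accounts for the growth. For reading the output: in Lucene's file naming, ".nvd"/".nvm" hold norms, while doc values are stored in ".dvd"/".dvm".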