What is the point of a unique indexed field? If, for all of your fields, each value can match only one possible document, you don't need length normalization, scoring, or a search engine at all... why not just use a HashMap?
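To make the HashMap suggestion concrete, here is a minimal sketch, assuming all you need is an exact-match lookup from a unique value to the one document that holds it. The class name, method names, and the field name `unique_serial` are hypothetical (the field name just follows the `unique_*` pattern from the schema quoted below); nothing here comes from Solr's API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: exact-match lookup of a document id by (field, value),
// as an alternative to indexing unique values in a search engine.
class UniqueValueLookup {
    // Outer key: field name. Inner map: unique value -> document id.
    private final Map<String, Map<String, String>> docIdByFieldAndValue = new HashMap<>();

    /** Registers a unique value; returns false if the value is already taken for that field. */
    public boolean put(String field, String value, String docId) {
        return docIdByFieldAndValue
                .computeIfAbsent(field, f -> new HashMap<>())
                .putIfAbsent(value, docId) == null;
    }

    /** Returns the document id for an exact match, or null if there is none. */
    public String find(String field, String value) {
        Map<String, String> byValue = docIdByFieldAndValue.get(field);
        return byValue == null ? null : byValue.get(value);
    }

    public static void main(String[] args) {
        UniqueValueLookup lookup = new UniqueValueLookup();
        lookup.put("unique_serial", "SN-001", "doc-1");
        lookup.put("unique_serial", "SN-002", "doc-2");
        System.out.println(lookup.find("unique_serial", "SN-001"));
        System.out.println(lookup.find("unique_serial", "SN-999"));
    }
}
```

Since there is no tokenization or ranking involved, a lookup like this stores no norms at all, which is the point of the question above. Whether it fits depends on what else you need from the index.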

On Thu, Nov 10, 2011 at 7:42 AM, Ivan Hrytsyuk <ihryts...@softserveinc.com> wrote:
> Hello everyone,
>
> We have large index size in case norms are enabled.
>
> schema.xml:
>
> type declaration:
> <fieldType name="simpleTokenizer" class="solr.TextField"
>            positionIncrementGap="100" omitNorms="false">
>   <analyzer>
>     <tokenizer class="solr.KeywordTokenizerFactory" />
>   </analyzer>
> </fieldType>
>
> fields declaration:
> <field name="id" stored="true" indexed="true" required="true"
>        type="string" />
> <field name="name" stored="true" indexed="true" type="string" />
> <dynamicField name="unique_*" stored="false" indexed="true"
>               type="simpleTokenizer" multiValued="false" />
>
> For 5000 documents (every document has 2 unique fields, 2*5000=10000
> unique fields in index), index size is 48.24 MB.
> But if we enable omitting norms (omitNorms="true"), index size is 0.56
> MB.
>
> Next, if we increase number of unique fields per document to 3
> (3*5000=15000 unique fields in index) we receive: 72.23 MB and 0.70 MB
> respectively.
> And if we increase number of documents to 10000 ( 3*10000 unique fields
> in index) we receive: 287.54 MB and 1.44 MB respectively.
>
> We've prepared test application to reproduce mentioned behavior. It can
> be downloaded here:
> https://bitbucket.org/coldserenity/solr-large-index-with-norms
>
> Could anyone point out if size of index is as expected in mentioned
> cases? And if it's, what configuration can be applied to reduce size of
> index.
>
> Thank you in advance, Ivan
>

--
lucidimagination.com