On 7/2/2013 1:58 PM, Ali, Saqib wrote:
Thanks Shawn.

Here is the text_general type definition. We would like to bring the
storage requirement for those 500KB content documents down to a minimum. We
just need basic full-text search.

Thanks!!! :)




<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>

Unless you have a huge number of synonyms, or the synonyms you have defined are used a LOT in your index, that should not result in much term expansion. I have no way to know how much space things will actually take, but from what I have seen, a 500KB input field will probably take a little less than 500KB of disk space, unless it is composed almost entirely of unique terms.
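
If you want a rough feel for how "unique" the terms in one of your documents are, something like the following Python sketch could help. It is only an approximation: the regex tokenizer is a crude stand-in for StandardTokenizerFactory, the function name unique_term_ratio is made up for this example, and the ratio it reports does not translate directly into index size on disk.

```python
import re

def unique_term_ratio(text, stopwords=frozenset()):
    """Return (total_tokens, unique_terms, ratio) for a piece of text.

    Rough imitation of the analyzer chain above: split on word
    characters, lowercase, drop stopwords. Not Lucene's real analysis.
    """
    tokens = [t for t in re.findall(r"\w+", text.lower()) if t not in stopwords]
    unique = set(tokens)
    # A ratio close to 1.0 means almost every token is a distinct term.
    ratio = len(unique) / len(tokens) if tokens else 0.0
    return len(tokens), len(unique), ratio

if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog the fox"
    total, unique, ratio = unique_term_ratio(sample, stopwords={"the"})
    print(f"tokens={total} unique={unique} ratio={ratio:.3f}")
```

Ordinary prose tends to repeat terms heavily, so the ratio is usually well below 1.0. Content such as lists of IDs or hashes pushes it toward 1.0, which is the "almost entirely unique terms" case.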

Thanks,
Shawn
