On 7/2/2013 1:58 PM, Ali, Saqib wrote:
Thanks Shawn.
Here is the text_general type definition. We would like to bring the
storage requirement down to a minimum for those 500KB content documents. We
just need basic full-text search.
Thanks!!! :)
<fieldType name="text_general" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory"
            synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
Unless you have a huge number of synonyms or the synonyms that you have
defined are used a LOT in your index, that should not result in a whole
lot of term expansion. I have no way to know how much actual space
things will take, but from what I have seen, a 500KB input field will
probably take a little bit less than 500KB of disk space, unless it is
almost entirely composed of unique terms.
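
If you only need to search the text and never need to return or
highlight it, one thing that may help is to index the field without
storing the original value. This is only a rough sketch: the field name
"content" is made up for illustration, and it assumes you have some other
stored field (such as a unique key) to identify the matching documents.

```xml
<!-- Hypothetical field declaration for schema.xml; "content" is an
     illustrative name, not taken from your schema. -->
<!-- indexed="true": the text is analyzed and searchable.
     stored="false": the original 500KB value is not kept in the index,
     so it cannot be returned in results or used for highlighting. -->
<field name="content" type="text_general" indexed="true" stored="false"/>
```

With a field like this, the space used should come mainly from the
inverted index for the terms rather than from a stored copy of the
original text.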
Thanks,
Shawn