Tweaking SOLR memory and cull facet words

phiroc Fri, 27 Mar 2015 03:16:45 -0700

Hi,

my SOLR 5 solrconfig.xml file contains the following lines:


    <!-- Faceting defaults -->
    <str name="facet">on</str>
    <str name="facet.field">text</str>
    <str name="facet.mincount">100</str>


where the 'text' field contains thousands of words.

When I start SOLR, the search engine takes several minutes to index the words 
in the 'text' field (although loading the browse template later only takes a 
few seconds because the 'text' field has already been indexed).

Here are my questions:

- should I increase SOLR's JVM memory to make initial indexing faster?

e.g., SOLR_JAVA_MEM="-Xms1024m -Xmx204800m" in solr.in.sh

- how can I cull facet words according to certain criteria (length, case, 
etc.)? For instance, my facets are the following:

    application (22427)
    inytapdf0 (22427)
    pdf (22427)
    the (22334)
    new (22131)
    herald (21983)
    york (21975)
    paris (21780)
    a (21692)
    and (21298)
    of (21288)
    i (21247)
    in (21062)
    to (20918)
    on (20899)
    m (20857)
    by (20733)
    de (20664)
    for (20580)
    at (20417)
    with (20371) 
...

Obviously, words such as "the", "i", "to", "m", etc. should not be indexed. 
Furthermore, I don't care about common nouns. I am only interested in the 
names of people and places.
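
For instance, I was wondering whether a field type along the following lines 
in schema.xml would remove the stop words and the very short tokens. This is 
only a sketch based on my assumptions: the name "text_facet" is made up, 
stopwords.txt would have to list the unwanted words, and the min/max values 
are arbitrary. I assume the documents would have to be re-indexed afterwards, 
since the facet values come from the indexed terms.

    <!-- Hypothetical field type; the name "text_facet" is an assumption -->
    <fieldType name="text_facet" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- Drop the words listed in stopwords.txt ("the", "of", "to", etc.) -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <!-- Drop very short tokens such as "i", "a" and "m"; min="3" is arbitrary -->
        <filter class="solr.LengthFilterFactory" min="3" max="50"/>
      </analyzer>
    </fieldType>

As far as I can tell, this would still not limit the facets to people and 
location names, so I am not sure it is the right approach.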


Many thanks.

Philippe
