Hello All,

We recently upgraded from Solr 6.6 to Solr 7.7.2 and have since seen memory
spikes that eventually caused either an OOM or almost 100% utilization of
the available memory. After trying a few things (increasing the JVM heap,
making sure docValues were set for all sort and facet fields, in case
the fieldCache was blowing up), I was able to isolate a single query that
would fully exhaust the available memory and effectively render the
instance dead. After applying a timeAllowed value to the query and
shortening the query phrase (on longer queries containing synonyms the
system would crash without logging the warning), I was able to identify the
following warning in the logs:

o.a.s.s.SolrIndexSearcher Query: <____very long synonym expansion____>

the request took too long to iterate over terms. Timeout: timeoutAt:
812182664173653 (System.nanoTime(): 812182715745553),
TermsEnum=org.apache.lucene.codecs.blocktree.SegmentTermsEnum@7a0db441

I have narrowed the problem down to the way synonyms are being expanded in
combination with phrase slop.

With ps=5, the synonyms produce 4096 possible permutations of the phrase
being searched, looking similar to:
ngs_title:"bereavement leave type build bereavement leave type data p"~5
ngs_title:"bereavement leave type build bereavement bereavement type data p"~5
ngs_title:"bereavement leave type build bereavement jury duty type data p"~5
ngs_title:"bereavement leave type build bereavement maternity leave type data p"~5
ngs_title:"bereavement leave type build bereavement paternity type data p"~5
ngs_title:"bereavement leave type build bereavement paternity leave type data p"~5
ngs_title:"bereavement leave type build bereavement adoption leave type data p"~5
ngs_title:"bereavement leave type build jury duty maternity leave type data p"~5
ngs_title:"bereavement leave type build jury duty paternity type data p"~5
ngs_title:"bereavement leave type build jury duty paternity leave type data p"~5
ngs_title:"bereavement leave type build jury duty adoption leave type data p"~5
ngs_title:"bereavement leave type build jury duty absence type data p"~5
ngs_title:"bereavement leave type build maternity leave leave type data p"~5
ngs_title:"bereavement leave type build maternity leave bereavement type data p"~5
ngs_title:"bereavement leave type build maternity leave jury duty type data p"~5

....
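
To illustrate why the number of phrases grows so quickly, here is a minimal
sketch in Python (not Solr code). It assumes a hypothetical synonym group of
8 alternatives, loosely based on the terms visible in the expansion above;
the real synonyms.txt and the exact number of expanded positions are not
shown in this message. Each position that matches the group multiplies the
number of phrase variants by the group size.

```python
from itertools import product
from math import prod

# Hypothetical synonym group (an assumption for illustration only).
LEAVE_SYNONYMS = [
    "leave", "bereavement", "jury duty", "maternity leave",
    "paternity", "paternity leave", "adoption leave", "absence",
]


def count_expansions(slots):
    """Number of distinct phrases when one alternative is picked per slot."""
    return prod(len(alternatives) for alternatives in slots)


def expand_phrase(slots):
    """Materialize every phrase variant (only sensible for small inputs)."""
    return [" ".join(choice) for choice in product(*slots)]


# Two expanded positions: 8 * 8 = 64 phrase variants.
two_slots = [LEAVE_SYNONYMS, ["type"], ["build"], LEAVE_SYNONYMS, ["type"], ["data"], ["p"]]
print(count_expansions(two_slots))  # 64

# Four expanded positions with 8 alternatives each: 8 ** 4 = 4096.
four_slots = [LEAVE_SYNONYMS] * 4
print(count_expansions(four_slots))  # 4096
```

Under these assumptions, four expanded positions are already enough to reach
4096 variants, and each variant becomes its own sloppy phrase query.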

Previously, in Solr 6, that same query with the same synonyms (and query
analysis chain) would produce a parsedQuery like the following when using
&ps=5:
DisjunctionMaxQuery(((ngs_field_description:\"leave leave type build leave
leave type data ? p leave leave type type.enabled\"~5)^3.0 |
(ngs_title:\"leave leave type build leave leave type data ? p leave leave
type type.enabled\"~5)^10.0)

In Solr 6, the synonym expansion wasn't being applied to the
DisjunctionMaxQuery that is added to adjust rankings with phrase slop.

In general, the parsed queries differ between 6 and 7, with some new
`spanNears` showing up, but they don't create the memory consumption issues
that I have seen when a large synonym expansion happens along with a ps
parameter.

I didn't see much in the release notes about synonym changes (other than
sow=false being the default in version 7).
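
One way to compare the parsed queries under the two sow settings is to send
the same request twice with debug output enabled. The sketch below only
builds the request URLs. The host, collection name, qf value, and the use of
edismax are assumptions for illustration; the pf boosts are taken from the
Solr 6 parsedQuery above, and sow, ps, timeAllowed, and debug are standard
Solr request parameters.

```python
from urllib.parse import urlencode

# Hypothetical host and collection name; replace with your own.
BASE_URL = "http://localhost:8983/solr/mycollection/select"


def build_debug_url(query, sow, base_url=BASE_URL):
    """Build a select URL that returns the parsed query in the debug section."""
    params = {
        "q": query,
        "defType": "edismax",                          # assumed query parser
        "qf": "ngs_title ngs_field_description",       # assumed query fields
        "pf": "ngs_title^10 ngs_field_description^3",  # boosts seen in the parsedQuery
        "ps": 5,                                       # phrase slop under test
        "sow": "true" if sow else "false",             # split-on-whitespace toggle
        "timeAllowed": 5000,                           # milliseconds; guards against runaway queries
        "debug": "query",                              # include parsedquery in the response
        "rows": 0,                                     # only the debug output is of interest
    }
    return f"{base_url}?{urlencode(params)}"


for sow in (True, False):
    print(build_debug_url("bereavement leave type", sow))
```

Diffing the parsedquery from the two responses should show whether the
phrase-boost part of the query is where the expansion appears.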

The field being operated on has the following query analysis chain:

<analyzer type="query">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>

I am not sure whether phrase slop changed so that it now takes synonyms
into account, or whether there is a way to disable that kind of expansion.
I am also not sure whether this is related to SOLR-10980
<https://issues.apache.org/jira/plugins/servlet/mobile#issue/SOLR-10980>;
it does seem to be related, but that issue references Solr 6, which does
not do the expansion.

Any help would be greatly appreciated.

Nick
