Are you able to re-index a subset into a new collection? For control over timeouts, I'd suggest Postman, curl, or some other non-browser client.
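A minimal sketch of driving the query from a script rather than the admin UI (host, collection, and field names below are placeholders, not from your setup): this way the client-side timeout is whatever you set, and `debug=timing` asks Solr for a per-component timing breakdown that may help isolate the expensive stage.

```python
# Sketch: query Solr from a script so the client controls the timeout.
# Host, collection, and field name are hypothetical -- substitute your own.
import urllib.parse

SOLR_SELECT = "http://localhost:8983/solr/mycollection/select"

def build_query_url(field, term, rows=10):
    """Build a /select URL with Solr's timing debug output enabled."""
    params = {
        "q": f"{field}:{term}",
        "rows": rows,
        "debug": "timing",  # per-component prepare/process times in the response
        "wt": "json",
    }
    return SOLR_SELECT + "?" + urllib.parse.urlencode(params)

url = build_query_url("text_deep_cjk_ja", "ジエチルアミノヒドロキシベンゾイル安息香酸ヘキシル")
# import urllib.request
# urllib.request.urlopen(url, timeout=600)  # generous timeout, no browser limit
```

With a long `timeout` on the request, the query can run to completion (or fail with a real error) instead of the admin console giving up first.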
On Wed, Jan 2, 2019 at 2:55 PM Webster Homer <webster.ho...@milliporesigma.com> wrote:

> We are still having serious problems with our SolrCloud failing due to
> this problem. The problem is clearly data related.
> How can I determine which documents are being searched? Is it possible to
> get Solr/Lucene to output the docids being searched?
>
> I believe this is a Lucene bug, but I need to narrow the focus to a
> smaller number of records, and I'm not certain how to do that efficiently.
> Are there debug parameters that could help?
>
> -----Original Message-----
> From: Webster Homer <webster.ho...@milliporesigma.com>
> Sent: Thursday, December 20, 2018 3:45 PM
> To: solr-user@lucene.apache.org
> Subject: Query kills Solrcloud
>
> We are experiencing almost nightly Solr crashes due to Japanese queries.
> I've been able to determine that one of our field types seems to be the
> culprit. When I run a much-reduced version of the query against our DEV
> SolrCloud, I see memory usage jump from less than 1 GB to 5 GB using only
> a single field in the query. The collection is fairly small, ~411,000
> documents, of which only ~25,000 have searchable Japanese fields. I have
> been able to simplify the query to run against a single Japanese field in
> the schema. The JVM memory jumps from less than 1 GB to close to 5 GB,
> and back down. The QTime is 36959, which seems high for 2,500 documents;
> indeed, the single field I'm using in my test case has 2,031 documents.
>
> I extended the query to 5 fields and watched the memory usage in the Solr
> admin console. Memory usage goes to almost 6 GB with a QTime of 100909.
> The console shows connection errors, and when I look at the Cloud graph,
> all the replicas on the node where I submitted the query are down. In DEV
> the replicas eventually recover. In production, with the full query,
> which has many more fields in the qf parameter, the SolrCloud dies.
> One example query term:
> ジエチルアミノヒドロキシベンゾイル安息香酸ヘキシル
>
> This is the field type that we have defined:
>
> <fieldtype name="text_deep_cjk" class="solr.TextField"
>            positionIncrementGap="10000" autoGeneratePhraseQueries="false">
>   <analyzer type="index">
>     <!-- remove spaces between CJK characters -->
>     <charFilter class="solr.PatternReplaceCharFilterFactory"
>                 pattern="([\p{IsHangul}\p{IsHan}\p{IsKatakana}\p{IsHiragana}]+)\s+(?=[\p{IsHangul}\p{IsHan}\p{IsKatakana}\p{IsHiragana}])"
>                 replacement="$1"/>
>     <tokenizer class="solr.ICUTokenizerFactory"/>
>     <!-- normalize width before bigram, as e.g. half-width dakuten combine -->
>     <filter class="solr.CJKWidthFilterFactory"/>
>     <!-- Transform Traditional Han to Simplified Han -->
>     <filter class="solr.ICUTransformFilterFactory" id="Traditional-Simplified"/>
>     <!-- Transform Hiragana to Katakana just as was done for Endeca -->
>     <filter class="solr.ICUTransformFilterFactory" id="Hiragana-Katakana"/>
>     <!-- NFKC, case folding, diacritics removed -->
>     <filter class="solr.ICUFoldingFilterFactory"/>
>     <filter class="solr.CJKBigramFilterFactory" han="true" hiragana="true"
>             katakana="true" hangul="true" outputUnigrams="true"/>
>   </analyzer>
>
>   <analyzer type="query">
>     <!-- remove spaces between CJK characters -->
>     <charFilter class="solr.PatternReplaceCharFilterFactory"
>                 pattern="([\p{IsHangul}\p{IsHan}\p{IsKatakana}\p{IsHiragana}]+)\s+(?=[\p{IsHangul}\p{IsHan}\p{IsKatakana}\p{IsHiragana}])"
>                 replacement="$1"/>
>     <tokenizer class="solr.ICUTokenizerFactory"/>
>     <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true"
>             tokenizerFactory="solr.ICUTokenizerFactory"/>
>     <filter class="solr.CJKWidthFilterFactory"/>
>     <!-- Transform Traditional Han to Simplified Han -->
>     <filter class="solr.ICUTransformFilterFactory" id="Traditional-Simplified"/>
>     <!-- Transform Hiragana to Katakana just as was done for Endeca -->
>     <filter class="solr.ICUTransformFilterFactory" id="Hiragana-Katakana"/>
>     <!-- NFKC, case folding, diacritics removed -->
>     <filter class="solr.ICUFoldingFilterFactory"/>
>     <filter class="solr.CJKBigramFilterFactory" han="true" hiragana="true"
>             katakana="true" hangul="true" outputUnigrams="true"/>
>   </analyzer>
> </fieldtype>
>
> Why is searching even one field of this type so expensive?
> I suspect this is data related, as other queries return in far less than
> a second. What are good strategies for determining which documents are
> causing the problem? I'm new to debugging Solr, so I could use some help.
> I'd like to reduce the number of records to a minimum to create a small
> dataset that reproduces the problem.
> Right now our only option is to stop using this field type, but it does
> improve the relevancy of searches that don't cause Solr to crash.
>
> It would be a great help if the Solr console did not time out on these
> queries; is there a way to turn off the timeout?
> We are running Solr 7.2

--
http://www.the111shift.com