Hi Salman,

I personally do not perform stopword removal. So are you saying CommonGramsFilter is not useful without CommonGramsQueryFilter? If so, would you like to add a comment to Confluence explaining this?
https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions#FilterDescriptions-CommonGramsFilter

On Tuesday, December 10, 2013 1:17 PM, Salman Akram <salman.ak...@northbaysolutions.net> wrote:

Thanks!! Using CommonGramsQueryFilter resolved the issue. This was not there in 1.4.1 and, for some reason, was also not in the SOLR 4 Release Notes that we studied before upgrading.

On Tue, Dec 10, 2013 at 9:55 AM, Ahmet Arslan <iori...@yahoo.com> wrote:

> Hi Salman,
>
> I have never used the common grams filter, but I remember there are two
> classes in this family: CommonGramsFilter and CommonGramsQueryFilter. It
> seems that CommonGramsQueryFilter is what you are after.
>
> http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/commongrams/CommonGramsQueryFilter.html
>
> http://khaidoan.wikidot.com/solr-common-gram-filter
>
> On Tuesday, December 10, 2013 6:43 AM, Salman Akram <salman.ak...@northbaysolutions.net> wrote:
>
> We used that syntax in 1.4.1, when Surround was not part of SOLR and we had
> to register it. I didn't know that it is now part of SOLR. Anyway, this is a
> red herring, since I have removed Surround entirely and the issue remains.
>
> Below is the debug info when I give a simple phrase query containing common
> words with the default query parser. What I don't understand is why it is
> including single tokens as well. I have also included the relevant config
> part below.
>
> "rawquerystring": "Contents:\"only be\"",
> "querystring": "Contents:\"only be\"",
> "parsedquery": "MultiPhraseQuery(Contents:\"(only only_be) be\")",
> "parsedquery_toString": "Contents:\"(only only_be) be\"",
>
> "QParser": "LuceneQParser",
>
> =====
>
> <fieldtype name="text" class="solr.TextField">
>   <analyzer>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StandardFilterFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.CommonGramsFilterFactory" words="commonwords.txt" ignoreCase="true"/>
>   </analyzer>
> </fieldtype>
>
> On Mon, Dec 9, 2013 at 7:46 PM, Erik Hatcher <erik.hatc...@gmail.com> wrote:
>
> > But again, as Ahmet mentioned… it doesn't look like the surround query
> > parser is actually being used. The debug output also mentions the query
> > parser used, but that part wasn't provided below. One thing to note here:
> > the surround query parser is not available in 1.4.1. It also looks like
> > you're surrounding your query with angle brackets, as it says the query
> > string is {!surround}<Contents:"only be">, which is not correct syntax.
> > And one of the most important things to note here is that the surround
> > query parser does NOT use the analysis chain of the field; see
> > <http://wiki.apache.org/solr/SurroundQueryParser#Limitations>. In short,
> > you're going to have to do some work to get common grams factored into a
> > surround query (such as maybe calling the analysis request handler to
> > "parse" the query before sending it to the surround query parser).
> >
> > Erik
> >
> > On Dec 9, 2013, at 9:36 AM, Salman Akram <salman.ak...@northbaysolutions.net> wrote:
> >
> > > Yup, on debugging I found that it's coming from the Analyzer. We are
> > > using Standard Analyzer. It seems to be a SOLR 4 issue with Common
> > > Grams. Not sure if it's a bug or I am missing some config.
> > >
> > > On Mon, Dec 9, 2013 at 2:03 PM, Ahmet Arslan <iori...@yahoo.com> wrote:
> > >
> > >> Hi Salman,
> > >> I am confused, because with surround no analysis is applied at query
> > >> time. I suspect that the surround query parser is not kicking in. You
> > >> should see SrndQuery or something like it in the parsed query section.
> > >>
> > >> On Monday, December 9, 2013 6:24 AM, Salman Akram <salman.ak...@northbaysolutions.net> wrote:
> > >>
> > >> All,
> > >>
> > >> I posted this sub-issue along with another issue a few days back, but
> > >> maybe it was not obvious, so I am posting it in a separate thread.
> > >>
> > >> We recently migrated to SOLR 4.6. We use Common Grams, but queries with
> > >> words in the CG list have slowed down. On debugging we found that for CG
> > >> words the parser is also adding the individual tokens of those words to
> > >> the query, which ends up slowing it down. Below is an example:
> > >>
> > >> Query = "only be"
> > >>
> > >> Here is what debug shows. I have highlighted in red the part that
> > >> differs between the two versions, i.e. SOLR 4.6 is making it a
> > >> MultiPhraseQuery and adding individual tokens too. Can someone help?
> > >>
> > >> SOLR 4.6 (takes 20 secs)
> > >> <str name="rawquerystring">{!surround}<Contents:"only be"></str>
> > >> <str name="querystring">{!surround}<Contents:"only be"></str>
> > >> <str name="parsedquery">MultiPhraseQuery(Contents:"(only only_be) be")</str>
> > >> <str name="parsedquery_toString">Contents:"(only only_be) be"</str>
> > >>
> > >> SOLR 1.4.1 (takes 1 sec)
> > >> <str name="rawquerystring">{!surround}<Contents:"only be"></str>
> > >> <str name="querystring">{!surround}<Contents:"only be"></str>
> > >> <str name="parsedquery">Contents:only_be</str>
> > >> <str name="parsedquery_toString">Contents:only_be</str>
> > >>
> > >> --
> > >> Regards,
> > >>
> > >> Salman Akram
> > >
> > > --
> > > Regards,
> > >
> > > Salman Akram
>
> --
> Regards,
>
> Salman Akram

--
Regards,
Salman Akram
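
For reference, the fix reported at the top of this thread (using CommonGramsQueryFilter at query time) can be sketched as a field type with separate index and query analyzers. This is only a minimal sketch, not the final config Salman used, which was not posted. It assumes the same tokenizer, filters, and commonwords.txt file as the field type quoted earlier in the thread, and changes only the query side to use solr.CommonGramsQueryFilterFactory.

```xml
<fieldtype name="text" class="solr.TextField">
  <!-- Index time: CommonGramsFilter emits the single tokens plus the
       common-word bigrams (e.g. only, only_be, be). -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.CommonGramsFilterFactory" words="commonwords.txt"
            ignoreCase="true"/>
  </analyzer>
  <!-- Query time: CommonGramsQueryFilter keeps a single token only when it
       is not part of a bigram, so the phrase is not expanded with the
       individual common words. Assumed to reuse the same word list. -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.CommonGramsQueryFilterFactory" words="commonwords.txt"
            ignoreCase="true"/>
  </analyzer>
</fieldtype>
```

With a split like this, a phrase query such as "only be" would be expected to parse to something close to the 1.4.1 output shown above (Contents:only_be) rather than a MultiPhraseQuery. Checking the parsedquery entry in the debug output is the way to confirm it on a given setup.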