OK, I understand how those words are tokenized by the different tokenizer factories. My question is how I can have Solr analyze and search for "test" AND "pdf". Solr 1.4 returns results for "test" AND "pdf", and I want Solr 3.6 to do the same (Solr 3.6 currently returns results for "test" OR "pdf").
Any idea?

2012/5/17 Jack Krupansky <j...@basetechnology.com>

> The query may be the same, but your analyzers are radically different.
>
> Just a hunch, but maybe GosenTokenizerFactory is treating the "." as a
> space. In 1.4 you were using SenTokenizerFactory. Or maybe
> GosenBasicFormFilterFactory is treating the "." as a space. In any case, my
> hunch is that "test.pdf" gets to WDF as two separate tokens, which is the
> query that is generated on 3.6.
>
> To debug, remove the filters starting with WDF and see if the "." was
> still there before WDF was invoked. No need to reindex, just reload Solr
> and look at the parsed query for test.pdf .
>
> -- Jack Krupansky
>
> -----Original Message----- From: Katsuyoshi NOGUCHI
> Sent: Wednesday, May 16, 2012 6:03 AM
> To: solr-user@lucene.apache.org
> Subject: Dismax query results vary on Solr1.4 and 3.6.
>
> Hi, guys! I need some advice.
>
> When sending the same dismax query to Solr 1.4 and 3.6,
> query results of search words analyzed by WordDelimiterFilterFactory are
> different as below:
>
> [Search Word]
> test.pdf
>
> [Result]
> Solr1.4: Search results are analyzed by "test" AND "pdf"
> Solr3.6: Search results are analyzed by "test" OR "pdf"
>
> In Solr3.6, how can I receive the same result of "test" AND "pdf" as in
> Solr 1.4?
>
> [Japanese Analyzer]
> Solr1.4 -> Sen
> Solr3.6 -> lucene-gosen
>
>
> Here are some examples of debug results in solrAdmin:
> /*solrAdmin debug result-1.4*/
> <lst name="debug">
>   <str name="rawquerystring">test.pdf</str>
>   <str name="querystring">test.pdf</str>
>   <str name="parsedquery">
>     +DisjunctionMaxQuery((fcontent_tsn_is:"test pdf" | fname_tbg_is:"test pdf")) ()
>   </str>
>   <str name="parsedquery_toString">
>     +(fcontent_tsn_is:"test pdf" | fname_tbg_is:"test pdf") ()
>   </str>
>   …
>   <str name="QParser">DisMaxQParser</str>
>   …
> </lst>
>
> /*solrAdmin debug result-3.6*/
> <lst name="debug">
>   <str name="rawquerystring">test.pdf</str>
>   <str name="querystring">test.pdf</str>
>   <str name="parsedquery">
>     +DisjunctionMaxQuery(((fcontent_tsn_is:test fcontent_tsn_is:pdf) | (fname_tbg_is:test fname_tbg_is:pdf)))
>   </str>
>   <str name="parsedquery_toString">
>     +((fcontent_tsn_is:test fcontent_tsn_is:pdf) | (fname_tbg_is:test fname_tbg_is:pdf))
>   </str>
>   ...
>   <str name="QParser">ExtendedDismaxQParser</str>
>   …
> </lst>
>
>
> The following are the request handlers used in Solr1.4/3.6:
>
> /*solrconfig.xml-1.4*/
> <requestHandler name="dismax" class="solr.SearchHandler" >
>   <lst name="defaults">
>     <str name="defType">dismax</str>
>     <str name="echoParams">explicit</str>
>     <str name="q.alt">*:*</str>
>     <str name="qf">fcontent_tsn_is^1.0 fname_tbg_is^1.0 </str>
>   </lst>
> </requestHandler>
>
> /*solrconfig.xml-3.6*/
> <requestHandler name="dismax" class="solr.SearchHandler" >
>   <lst name="defaults">
>     <str name="defType">edismax</str>
>     <str name="echoParams">explicit</str>
>     <str name="q.alt">*:*</str>
>     <str name="qf">content_tsn_is^1.0 name_tbg_is^1.0</str>
>   </lst>
> </requestHandler>
>
>
> The following are the schemas used in Solr1.4/3.6:
> /*schema.xml-1.4*/
> <fieldType name="text_sen" class="solr.TextField">
>   <analyzer>
>     <tokenizer class="solrbook.analysis.SenTokenizerFactory" />
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>       words="stopwords.txt" enablePositionIncrements="true" />
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>       generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>       catenateAll="0" splitOnCaseChange="0"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.TrimFilterFactory" />
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>       tokenizerFactory="solrbook.analysis.SenTokenizerFactory"
>       ignoreCase="true"
>       expand="true"/>
>   </analyzer>
> </fieldType>
>
> <fields>
>   <dynamicField name="*_tsn_is" type="text_sen" indexed="true"
>     stored="true" compressed="false" termVectors="true" termPositions="true"
>     termOffsets="true" />
>   <dynamicField name="*_tbg_is" type="text_bigram" indexed="true"
>     stored="true" compressed="false" termVectors="true" termPositions="true"
>     termOffsets="true" />
> </fields>
>
> <solrQueryParser defaultOperator="AND"/>
>
> /*schema.xml-3.6*/
> <fieldType name="text_sen" class="solr.TextField">
>   <analyzer>
>     <charFilter class="solr.MappingCharFilterFactory"
>       mapping="ja-mapping.txt"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <tokenizer class="solr.GosenTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>       words="stopwords.txt" enablePositionIncrements="true" />
>     <filter class="solr.GosenBasicFormFilterFactory" />
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>       generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>       catenateAll="0" splitOnCaseChange="0"/>
>     <filter class="solr.TrimFilterFactory" />
>   </analyzer>
> </fieldType>
>
> <fields>
>   <dynamicField name="*_tsn_is" type="text_sen" indexed="true"
>     stored="true" compressed="false" termVectors="true" termPositions="true"
>     termOffsets="true" />
>   <dynamicField name="*_tbg_is" type="text_bigram" indexed="true"
>     stored="true" compressed="false" termVectors="true" termPositions="true"
>     termOffsets="true" />
> </fields>
>
> <solrQueryParser defaultOperator="AND"/>
>
>
> Regards.
>
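
For reference, here is a minimal sketch of the debugging step Jack suggested, assuming it is applied to the 3.6 text_sen field type quoted above. Everything is copied from that schema; the only change is that WordDelimiterFilterFactory and the filter after it are commented out. This is a temporary diagnostic edit, not a fix.

```xml
<!-- Diagnostic sketch only: 3.6 text_sen with WDF and the following filter disabled -->
<fieldType name="text_sen" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="ja-mapping.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <tokenizer class="solr.GosenTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.GosenBasicFormFilterFactory" />
    <!-- Temporarily disabled to see whether "test.pdf" is already split before WDF runs:
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.TrimFilterFactory" />
    -->
  </analyzer>
</fieldType>
```

After reloading Solr, the parsedquery entry in the debug output for test.pdf (requested with debugQuery=on) should show whether the "." is still present at that point or whether the term has already been split into two tokens.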