If you want test.pdf to be treated as the phrase "test pdf",
setting autoGeneratePhraseQueries="true" on the text_sen fieldType might work.
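
For reference, here is a minimal sketch of where that attribute would go, assuming the Solr 3.6 text_sen fieldType from the schema.xml quoted below. The analyzer chain is elided and would stay as it is:

```xml
<!-- Sketch only: autoGeneratePhraseQueries is set on the fieldType element itself. -->
<fieldType name="text_sen" class="solr.TextField"
           autoGeneratePhraseQueries="true">
  <analyzer>
    <!-- existing charFilter, tokenizer and filters unchanged -->
  </analyzer>
</fieldType>
```

With this set, when the analyzer splits test.pdf into several tokens, the query parser should build a phrase query from them rather than separate optional terms. As far as I know this only affects query parsing, so checking parsedquery in the debug output after a reload should show whether it helps.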

Regards,
Shinichiro Abe

On 2012/05/17, at 10:39, Katsuyoshi NOGUCHI wrote:

> OK, I understand how those words are tokenized by the different tokenizer
> factories.
> My question is how I can have Solr analyze and search for "test" AND
> "pdf".
> Solr 1.4 gives results for "test" AND "pdf", and I want Solr 3.6 to do the same.
> (Solr 3.6 gives results for "test" OR "pdf".)
> 
> Any idea?
> 
> 2012/5/17 Jack Krupansky <j...@basetechnology.com>
> 
>> The query may be the same, but your analyzers are radically different.
>> 
>> Just a hunch, but maybe GosenTokenizerFactory is treating the "." as a
>> space. In 1.4 you were using SenTokenizerFactory. Or maybe
>> GosenBasicFormFilterFactory is treating the "." as a space. In any case, my
>> hunch is that "test.pdf" gets to WDF as two separate tokens, which is the
>> query that is generated on 3.6.
>> 
>> To debug, remove the filters starting with WDF and see if the "." is
>> still there before WDF is invoked. No need to reindex, just reload Solr
>> and look at the parsed query for test.pdf.
>> 
>> -- Jack Krupansky
>> 
>> -----Original Message----- From: Katsuyoshi NOGUCHI
>> Sent: Wednesday, May 16, 2012 6:03 AM
>> To: solr-user@lucene.apache.org
>> Subject: Dismax query results vary on Solr1.4 and 3.6.
>> 
>> Hi, guys! I need some advice.
>> 
>> When sending the same dismax query to Solr 1.4 and 3.6,
>> the results for search words analyzed by WordDelimiterFilterFactory
>> differ, as shown below:
>> 
>> [Search Word]
>> test.pdf
>> 
>> [Result]
>> Solr 1.4: the search is analyzed as "test" AND "pdf"
>> Solr 3.6: the search is analyzed as "test" OR "pdf"
>> 
>> In Solr 3.6, how can I get the same "test" AND "pdf" result as in
>> Solr 1.4?
>> 
>> [Japanese Analyzer]
>> Solr 1.4 -> Sen
>> Solr 3.6 -> lucene-gosen
>> 
>> 
>> Here are some examples of debug results in solrAdmin:
>> /*solrAdmin debug result-1.4*/
>> <lst name="debug">
>> <str name="rawquerystring">test.pdf</str>
>> <str name="querystring">test.pdf</str>
>> <str name="parsedquery">
>> +DisjunctionMaxQuery((fcontent_tsn_is:"test pdf" | fname_tbg_is:"test pdf")) ()
>> </str>
>> <str name="parsedquery_toString">
>> +(fcontent_tsn_is:"test pdf" | fname_tbg_is:"test pdf") ()
>> </str>
>> …
>> <str name="QParser">DisMaxQParser</str>
>> …
>> </lst>
>> 
>> /*solrAdmin debug result-3.6*/
>> <lst name="debug">
>> <str name="rawquerystring">test.pdf</str>
>> <str name="querystring">test.pdf</str>
>> <str name="parsedquery">
>> +DisjunctionMaxQuery(((fcontent_tsn_is:test fcontent_tsn_is:pdf) | (fname_tbg_is:test fname_tbg_is:pdf)))
>> </str>
>> <str name="parsedquery_toString">
>> +((fcontent_tsn_is:test fcontent_tsn_is:pdf) | (fname_tbg_is:test
>> fname_tbg_is:pdf))
>> </str>
>> ...
>> <str name="QParser">ExtendedDismaxQParser</str>
>> …
>> </lst>
>> 
>> 
>> The following are the request handlers used in Solr 1.4 and 3.6:
>> 
>> /*solrconfig.xml-1.4*/
>> <requestHandler name="dismax" class="solr.SearchHandler" >
>> <lst name="defaults">
>> <str name="defType">dismax</str>
>> <str name="echoParams">explicit</str>
>> <str name="q.alt">*:*</str>
>> <str name="qf">fcontent_tsn_is^1.0 fname_tbg_is^1.0 </str>
>> </lst>
>> </requestHandler>
>> 
>> /*solrconfig.xml-3.6*/
>> <requestHandler name="dismax" class="solr.SearchHandler" >
>> <lst name="defaults">
>> <str name="defType">edismax</str>
>> <str name="echoParams">explicit</str>
>> <str name="q.alt">*:*</str>
>> <str name="qf">content_tsn_is^1.0 name_tbg_is^1.0</str>
>> </lst>
>> </requestHandler>
>> 
>> 
>> The following are the schemas used in Solr 1.4 and 3.6:
>> /*schema.xml-1.4*/
>> <fieldType name="text_sen" class="solr.TextField">
>> <analyzer>
>> <tokenizer class="solrbook.analysis.SenTokenizerFactory" />
>> <filter class="solr.StopFilterFactory" ignoreCase="true"
>> words="stopwords.txt" enablePositionIncrements="true" />
>> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>> catenateAll="0" splitOnCaseChange="0"/>
>> <filter class="solr.LowerCaseFilterFactory"/>
>> <filter class="solr.TrimFilterFactory" />
>> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>> tokenizerFactory="solrbook.analysis.SenTokenizerFactory"
>> ignoreCase="true"
>> expand="true"/>
>> </analyzer>
>> </fieldType>
>> 
>> <fields>
>> <dynamicField name="*_tsn_is"    type="text_sen"   indexed="true"
>> stored="true"   compressed="false" termVectors="true" termPositions="true"
>> termOffsets="true"  />
>> <dynamicField name="*_tbg_is"    type="text_bigram"   indexed="true"
>> stored="true"   compressed="false" termVectors="true" termPositions="true"
>> termOffsets="true"  />
>> </fields>
>> 
>> <solrQueryParser defaultOperator="AND"/>
>> 
>> /*schema.xml-3.6*/
>> <fieldType name="text_sen" class="solr.TextField">
>> <analyzer>
>> <charFilter class="solr.MappingCharFilterFactory"
>> mapping="ja-mapping.txt"/>
>> <filter class="solr.LowerCaseFilterFactory"/>
>> <tokenizer class="solr.GosenTokenizerFactory"/>
>> <filter class="solr.StopFilterFactory" ignoreCase="true"
>> words="stopwords.txt" enablePositionIncrements="true" />
>> <filter class="solr.GosenBasicFormFilterFactory" />
>> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>> catenateAll="0" splitOnCaseChange="0"/>
>> <filter class="solr.TrimFilterFactory" />
>> </analyzer>
>> </fieldType>
>> 
>> <fields>
>> <dynamicField name="*_tsn_is"    type="text_sen"   indexed="true"
>> stored="true"   compressed="false" termVectors="true" termPositions="true"
>> termOffsets="true"  />
>> <dynamicField name="*_tbg_is"    type="text_bigram"   indexed="true"
>> stored="true"   compressed="false" termVectors="true" termPositions="true"
>> termOffsets="true"  />
>> </fields>
>> 
>> <solrQueryParser defaultOperator="AND"/>
>> 
>> 
>> Regards.
>> 
