Thank you, Marco.  The debug output looks like this:
<str name="rawquerystring">title_jpn:2001年</str>
<str name="querystring">title_jpn:2001年</str>
<str name="parsedquery">PhraseQuery(title_jpn:"2001 年")</str>
<str name="parsedquery_toString">title_jpn:"2001 年"</str>
<lst name="explain"/>
<str name="QParser">LuceneQParser</str>

Does this mean the standard query parser does send the
raw query string to the Analyzer and, because the Analyzer
returned more than one token, builds a phrase query?
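
To make the question concrete, here is a toy model of the behavior I think
the parsedquery line suggests. It is only a sketch under my own assumptions,
not Solr or Lucene code: toy_analyzer and build_query are made-up names, and
the regex splitter is a crude stand-in for a real morphological analyzer.

```python
# Toy model only: NOT Solr/Lucene source code. All names are hypothetical.
import re


def toy_analyzer(text):
    """Stand-in analyzer: splits runs of digits from runs of other characters."""
    return re.findall(r"\d+|[^\d\s]+", text)


def build_query(field, raw_term, analyzer=toy_analyzer):
    """Analyze one unquoted term and pick a query shape from the token count."""
    tokens = analyzer(raw_term)
    if not tokens:
        return None  # nothing survived analysis
    if len(tokens) == 1:
        return f"{field}:{tokens[0]}"  # single token: plain term query
    # More than one token: the tokens are joined into a phrase query.
    return f'{field}:"{" ".join(tokens)}"'


print(build_query("title_jpn", "2001年"))  # title_jpn:"2001 年"
```

If this model is right, an unquoted 2001年 and a quoted "2001年" should end
up as the same phrase query.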

I guess the cause of my problem is somewhere else.


On Mar 17, 2010, at 1:05 AM, Marco Martinez wrote:

> Hello,
> 
> You can see what happens with this search (which analyzers are used for
> this field and what their output is) by using the analysis page of the
> default Solr web page. I assume you are using the same analyzers and
> tokenizers for indexing and searching on this field in your schema.
> 
> Regards,
> 
> 
> Marco Martínez Bautista
> 
> 
> 
> 2010/3/17 Teruhiko Kurosaka <k...@basistech.com>
> 
>> It seems that Solr's query parser doesn't pass a single term query
>> to the Analyzer for the field. For example, if I give it
>> 2001年 (year 2001 in Japanese), the searcher returns 0 hits,
>> but if I enclose it in double quotes, it returns hits.
>> In this experiment, I configured schema.xml so that
>> the field in question will use the morphological Analyzer
>> my company makes that is capable of splitting 2001年
>> into two tokens 2001 and 年.  I am guessing that this
>> Analyzer is called ONLY IF the term is a phrase.
>> Is my observation correct?
>> 
>> If so, is there any configuration parameter that I can tweak
>> to force every query on the text fields to be processed by
>> the Analyzer?
>> 
>> One might ask why users don't put a space between 2001 and 年.
>> Well, if they are clearly two separate words, people do that.
>> But 年 works more like a suffix in this case, and in many
>> Japanese speakers' minds, 2001年 seems like one token, so
>> many people won't.  (Remember that Japanese speakers don't use
>> spaces in normal writing.)  Forcing use of the Analyzer would also
>> be useful for compound word handling, which is often desirable
>> for languages like German.

----
Teruhiko "Kuro" Kurosaka
RLP + Lucene & Solr = powerful search for global contents
