A search for a single character will only return hits if that character
makes up a whole word, and only if the tokenizer recognizes that character
as a word. It's just like in other languages, where a search for "p" won't
return documents with the word "apple".

If I were you, I would go into the Solr admin UI and start playing around
with the analysis tool. You can paste a phrase in there and it will show
you what tokens that phrase will be broken into. I think that will give you
a better understanding of why you are getting these search results.
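
If you'd rather script it, the same analysis is exposed over HTTP through the
field analysis request handler. Something along these lines should work,
assuming the stock /analysis/field handler is still enabled in your
solrconfig.xml and your core is named collection1 (adjust both to match your
setup):

  curl http://localhost:8983/solr/collection1/analysis/field \
       --data-urlencode "analysis.fieldtype=text_ja" \
       --data-urlencode "analysis.fieldvalue=検索エンジンを更新する" \
       --data-urlencode "analysis.query=更"

The response shows the token stream each stage of the analyzer chain produces
for the field value and for the query, so you can see exactly where the single
character fails to match.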

You also haven't mentioned which version of Solr you are using. Can you
include the definition of your text_ja field type as well?
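
For comparison, the text_ja boilerplate that ships with the example schema
looks roughly like this (quoting from memory, so treat it as a sketch and
check it against your actual schema.xml):

  <fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false">
    <analyzer>
      <tokenizer class="solr.JapaneseTokenizerFactory" mode="search"/>
      <filter class="solr.JapaneseBaseFormFilterFactory"/>
      <filter class="solr.JapanesePartOfSpeechStopFilterFactory" tags="lang/stoptags_ja.txt"/>
      <filter class="solr.CJKWidthFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ja.txt"/>
      <filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

The key piece is JapaneseTokenizerFactory (Kuromoji), which does morphological
analysis rather than n-gramming, so a bare 更 only matches documents where the
analyzer emitted 更 as a token of its own. text_cjk instead runs text through
CJKBigramFilterFactory, so every adjacent pair of characters becomes a token,
which is usually more forgiving for substring-style searching.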

- Hayden


On Thu, Mar 21, 2013 at 7:01 AM, Van Tassell, Kristian <
kristian.vantass...@siemens.com> wrote:

> I’m trying to set up our search index to handle Japanese data, and while
> some searches yield results, others do not. This is especially true for
> shorter search terms.
>
> For example, searching for this term: 更
>
> Yields no results even though I know it appears in the text. I understand
> that this character alone may not be a full word without further context,
> and thus, perhaps it should not return a hit(?).
>
> What about putting a star after it? 更*
>
> Should that return hits? I had been using the text_ja boilerplate setup,
> but wonder if a bigram field type (text_cjk) might work better for my
> testing, since I don't speak Japanese. Thanks in advance for any insight!
>
>
