A search for a single character will only return hits if that character is indexed as a whole word on its own, which in turn depends on whether the tokenizer recognizes it as a word. It's the same in other languages: a search for "p" won't return documents containing the word "apple".
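
For reference, the text_ja type that ships in the 4.x example schema looks
roughly like this (I'm going from memory, so double-check the schema.xml
bundled with your release, and your own definition may well differ, which is
why I'm asking for it below):

  <fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false">
    <analyzer>
      <!-- Kuromoji morphological tokenizer decides the word boundaries -->
      <tokenizer class="solr.JapaneseTokenizerFactory" mode="search"/>
      <filter class="solr.JapaneseBaseFormFilterFactory"/>
      <filter class="solr.JapanesePartOfSpeechStopFilterFactory" tags="lang/stoptags_ja.txt"/>
      <filter class="solr.CJKWidthFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ja.txt"/>
      <filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

The important piece is the Kuromoji tokenizer: it decides what counts as a
word, so a lone 更 will only match if Kuromoji emits it as a token by itself.
The analysis screen will show you exactly what it does with your text.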
If I were you, I would go into the Solr admin UI and start playing around
with the analysis tool. You can paste a phrase in there and it will show you
what tokens that phrase will be broken into. I think that will give you a
better understanding of why you are getting these search results.

You also don't mention which version of Solr you are using. Can you also
include the definition of your text_ja field type?

- Hayden

On Thu, Mar 21, 2013 at 7:01 AM, Van Tassell, Kristian <
kristian.vantass...@siemens.com> wrote:

> I'm trying to set up our search index to handle Japanese data, and while
> some searches yield results, others do not. This is especially true the
> smaller the search term.
>
> For example, searching for this term: 更
>
> Yields no results even though I know it appears in the text. I understand
> that this character alone may not be a full word without further context,
> and thus, perhaps it should not return a hit(?).
>
> What about putting a star after it? 更*
>
> Should that return hits? I had been using the text_ja boilerplate setup,
> but wonder if a bigram (text_cjk) may work better for my non-Japanese
> speaking testing phase. Thanks in advance for any insight!
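
On the text_cjk question above: the bigram type in the stock example schema
is roughly the following (again, verify against the schema that ships with
your Solr version):

  <fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <!-- StandardTokenizer emits CJK ideographs as single-character tokens -->
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <!-- normalize half-width/full-width forms -->
      <filter class="solr.CJKWidthFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- join adjacent CJK characters into overlapping bigrams -->
      <filter class="solr.CJKBigramFilterFactory"/>
    </analyzer>
  </fieldType>

It indexes overlapping two-character shingles instead of dictionary words, so
recall is generally higher and precision lower than with the Kuromoji-based
text_ja, which can be a reasonable trade-off while you are testing without
being able to read the results.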