Re: looking for documentation on solr.JapaneseTokenizerFactory

Erick Erickson Mon, 27 Jun 2016 22:17:26 -0700

There's some more information in the reference guide, see:
https://cwiki.apache.org/confluence/display/solr/Language+Analysis


NOTE: I would _strongly_ urge you to go to the upper-left corner,
follow the link for downloading older versions, and pull down the
4.10 guide. The online guide is a bold attempt to put _current_
documentation in one place, so it tracks the latest release rather
than 4.10.

Unfortunately I don't have any direct experience with setting that
up, so I'm afraid that's all the help I can offer.

Best,
Erick

On Mon, Jun 27, 2016 at 5:59 PM, Micheal Cooper <micheal.coo...@oist.jp> wrote:
> I have a vendor-supplied Solr 4.10 setup for multisite search that indexes
> two large Drupal 7 sites with content in Japanese, English, and
> Undefined.
>
> The English searches are OK, but the Japanese does not work well at all. The 
> vendors are in the US, so it is understandable that they cannot really test 
> it for themselves.
>
> I am trying to fix this config before setting userdict, synonyms, stopwords,
> and the like. There is obviously a problem with the tokenization.
>
> I have searched Google in English and Japanese and Safari Books in English, 
> but I cannot find a definitive page or tutorial on setting up Solr with 
> Kuromoji (JapaneseTokenizerFactory) correctly, and the official documentation 
> is not helpful. The comments for text_ja in the config say "See 
> http://wiki.apache.org/solr/JapaneseLanguageSupport for more on Japanese 
> language support," but when you go there, it just says, "This page will 
> contain various information on Japanese support in Lucene/Solr 3.6 & 4.0, but 
> it currently just a filler...".
>
> Does anyone have a good source of info for setting up Solr for Japanese 
> content?
>
> Micheal
>
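
One way to narrow down the tokenization problem described in the quoted message is to look at the tokens the text_ja field type actually produces. The Solr admin UI has an Analysis screen for this, and the same information is available over HTTP from the field analysis request handler. The sketch below only builds the request URL. It assumes the handler is registered at /analysis/field in solrconfig.xml (worth checking in a vendor-supplied config), and the host, port, and core name "collection1" are placeholders, not values from this thread.

```python
from urllib.parse import urlencode


def field_analysis_url(base_url, core, field_type, text):
    """Build a URL for Solr's field analysis request handler.

    Assumptions: the handler is registered at /analysis/field in
    solrconfig.xml, and base_url/core are placeholders for your setup.
    """
    params = {
        "analysis.fieldtype": field_type,   # e.g. the text_ja type from schema.xml
        "analysis.fieldvalue": text,        # sample text to run through the index analyzer
        "wt": "json",                       # ask for a JSON response
    }
    # urlencode percent-encodes the Japanese text as UTF-8.
    return "{}/{}/analysis/field?{}".format(
        base_url.rstrip("/"), core, urlencode(params)
    )


if __name__ == "__main__":
    # Hypothetical host and core name; substitute your own.
    print(field_analysis_url(
        "http://localhost:8983/solr", "collection1", "text_ja", "東京都に住んでいます"
    ))
```

Opening the printed URL in a browser (or fetching it with any HTTP client) should show the output of the tokenizer and each filter in turn, which makes it easier to tell whether the problem is in JapaneseTokenizerFactory itself or in a later filter.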
