looking for documentation on solr.JapaneseTokenizerFactory

Micheal Cooper Mon, 27 Jun 2016 21:46:26 -0700

I have a vendor-supplied Solr 4.10 setup for multisite search that indexes 
two large Drupal 7 sites with content in Japanese, English, and Undefined.


English searches work fine, but Japanese searches work poorly. The vendors 
are in the US, so it is understandable that they cannot really test Japanese 
search for themselves.

I am trying to fix this config before setting up the user dictionary, 
synonyms, stopwords, and the like. There is obviously a problem with the 
tokenization.

I have searched Google in English and Japanese and Safari Books in English, but 
I cannot find a definitive page or tutorial on setting up Solr with Kuromoji 
(JapaneseTokenizerFactory) correctly, and the official documentation is not 
helpful. The comments for text_ja in the config say "See 
http://wiki.apache.org/solr/JapaneseLanguageSupport for more on Japanese 
language support," but when you go there, it just says, "This page will contain 
various information on Japanese support in Lucene/Solr 3.6 & 4.0, but it 
currently just a filler...".
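
For reference, the text_ja field type in the stock Solr 4.x example schema 
looks roughly like the sketch below. I am assuming the vendor's config is 
based on it, which I have not confirmed, and the lang/ file paths are those 
of the stock example rather than necessarily ours.

```xml
<!-- Sketch of the stock text_ja field type from the Solr 4.x example schema.
     Assumption: the vendor config resembles this; file paths may differ. -->
<fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100"
           autoGeneratePhraseQueries="false">
  <analyzer>
    <!-- Kuromoji morphological tokenizer; mode may be normal, search, or extended -->
    <tokenizer class="solr.JapaneseTokenizerFactory" mode="search"/>
    <!-- Reduces inflected verbs and adjectives to their dictionary form -->
    <filter class="solr.JapaneseBaseFormFilterFactory"/>
    <!-- Removes tokens with the listed part-of-speech tags -->
    <filter class="solr.JapanesePartOfSpeechStopFilterFactory"
            tags="lang/stoptags_ja.txt"/>
    <!-- Normalizes full-width romaji to half-width and half-width kana to full-width -->
    <filter class="solr.CJKWidthFilterFactory"/>
    <!-- Removes common stopwords -->
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_ja.txt"/>
    <!-- Strips the trailing long-sound mark from katakana words of 4+ characters -->
    <filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4"/>
    <!-- Lower-cases romaji characters -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

If the Japanese fields on our sites are not actually mapped to a type like 
this one, that alone could explain the poor tokenization.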

Does anyone have a good source of info for setting up Solr for Japanese content?

Micheal
