Very nice. Thank you. My non-Japanese-speaking devs had configured Solr to use CJK analysis for indexing and the Whitespace Tokenizer for search, which does not work at all because Japanese does not separate words with whitespace. I was able to find settings that seem to be working well.
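
As a rough illustration of the kind of settings meant here (not the exact configuration from this thread), a Kuromoji-based field type in schema.xml might look like the sketch below. It is modeled on the stock text_ja example and the Japanese section of the reference guide, with JapaneseIterationMarkCharFilterFactory left out. The stoptags and stopwords file paths are assumptions based on the files shipped with the example configuration; adjust them to whatever exists in your core's conf directory.

```xml
<!-- schema.xml: sketch of a Japanese text field type using Kuromoji. -->
<!-- The lang/*.txt paths are assumptions; they must exist under the core's conf directory. -->
<fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false">
  <!-- One analyzer with no type attribute: used for both indexing and searching. -->
  <analyzer>
    <!-- Morphological tokenizer; "search" mode also splits long compounds. -->
    <tokenizer class="solr.JapaneseTokenizerFactory" mode="search"/>
    <!-- Reduce inflected verbs and adjectives to their base form. -->
    <filter class="solr.JapaneseBaseFormFilterFactory"/>
    <!-- Drop tokens by part-of-speech tag (particles and the like). -->
    <filter class="solr.JapanesePartOfSpeechStopFilterFactory" tags="lang/stoptags_ja.txt"/>
    <!-- Normalize full-width Latin to half-width and half-width kana to full-width. -->
    <filter class="solr.CJKWidthFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ja.txt"/>
    <!-- Strip the trailing long-vowel mark from katakana words of 4+ characters. -->
    <filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Because a single analyzer is declared without a type attribute, the same chain runs at index time and at query time, which avoids the kind of index/search mismatch described above. Content indexed under the old analysis has to be reindexed before the change takes effect.
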

For reference for other knowledge-seekers: I contacted the company that donated Kuromoji (the JapaneseTokenizer in Lucene that Solr uses), and they directed me to
https://cwiki.apache.org/confluence/display/solr/Language+Analysis#LanguageAnalysis-Japanese
which has info for v6.

The only problem I had was that JapaneseIterationMarkCharFilterFactory does not seem to be available in v4.10, so I just removed it. It is an edge case, and I can look into it later.

The other thing to be careful of is loading the library. I could not reload the core because Solr could not load Kuromoji, and I found that the directory containing it was not being loaded in solrconfig.xml. When I tried the default relative-path method, it did not work. It seems to have something to do with the Lucene libraries. The Japanese blog I found recommended using an absolute path, so I put that in the ‘config’ section that loads library directories, and it worked.

Here are some links that also helped:
https://cwiki.apache.org/confluence/display/solr/Language+Analysis#LanguageAnalysis-Japanese
http://d.hatena.ne.jp/kahnn/20130828/1377645204
http://blog.flect.co.jp/labo/2012/10/solr40schemaxml-bf12.html

Micheal

On 2016/06/28, 16:10, "Alexandre Rafalovitch" <arafa...@gmail.com> wrote:

Have you seen http://discovery-grindstone.blogspot.com.au/ ?
It is a series of articles on setting up CJK for library content.

Regards,
   Alex.
----
Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/

On 28 June 2016 at 10:59, Micheal Cooper <micheal.coo...@oist.jp> wrote:
> I have a vendor-supplied Solr 4.10 set up for multisite search which indexes
> two large Drupal 7 sites which have content in Japanese, English, and
> Undefined.
>
> The English searches are OK, but the Japanese does not work well at all. The
> vendors are in the US, so it is understandable that they cannot really test
> it for themselves.
>
> I am trying to fix this config before setting userdict, synonyms, stopwords,
> and the like. There is obviously a problem with the Tokenization.
>
> I have searched Google in English and Japanese and Safari Books in English,
> but I cannot find a definitive page or tutorial on setting up Solr with
> Kuromoji (JapaneseTokenizerFactory) correctly, and the official documentation
> is not helpful. The comments for text_ja in the config say "See
> http://wiki.apache.org/solr/JapaneseLanguageSupport for more on Japanese
> language support," but when you go there, it just says, "This page will
> contain various information on Japanese support in Lucene/Solr 3.6 & 4.0, but
> it currently just a filler...".
>
> Does anyone have a good source of info for setting up Solr for Japanese
> content?
>
> Micheal
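
For the library-loading issue described in the reply at the top of this thread, the directive in question is the lib element in solrconfig.xml. The following is only a sketch: the directory shown is a placeholder, not a path taken from this thread, and the regex attribute is an optional filter added here for illustration.

```xml
<!-- solrconfig.xml, inside the top-level <config> element. -->
<!-- /path/to/kuromoji-jars is a hypothetical placeholder; replace it with the
     absolute path of the directory that actually contains the Kuromoji jar. -->
<lib dir="/path/to/kuromoji-jars" regex=".*\.jar" />
```

A relative dir value is resolved against the core's instance directory, so a relative path copied from another setup can silently point at the wrong place. An absolute path sidesteps that, at the cost of tying the config to one machine's directory layout.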