Re: looking for documentation on solr.JapaneseTokenizerFactory

2016-06-28 Thread Micheal Cooper
The very cool people at Atilika, the company that donated the JapaneseTokenizer to Lucene and Solr, just sent me a great slide deck that you should see if you are interested in Japanese search: https://speakerdeck.com/atilika/japanese-linguistics-in-lucene-and-solr Micheal On 2016/06/28, 17:03,

Re: looking for documentation on solr.JapaneseTokenizerFactory

2016-06-28 Thread Micheal Cooper
Very nice. Thank you. My non-Japanese devs had set Solr to use CJK analysis for indexing and the Whitespace Tokenizer for search, which does not work at all because Japanese does not use whitespace. I was able to find settings that seem to be working well. For reference for other knowledge-seekers: I cont
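To see why mixing CJK analysis at index time with whitespace tokenization at query time fails, here is a minimal sketch. The two helpers are hypothetical, simplified stand-ins and not Solr's actual WhitespaceTokenizer or CJK bigram code. The sample sentence and query are illustrative assumptions. The index side stores overlapping two-character grams. The query side, split on whitespace, keeps the whole Japanese query as one token, which never matches any stored bigram.

```python
# Simplified illustration only: these hypothetical helpers approximate, but are
# not, Solr's WhitespaceTokenizer and CJK bigram analysis.


def whitespace_tokens(text: str) -> list[str]:
    """Split on whitespace, roughly what a whitespace tokenizer does."""
    return text.split()


def cjk_bigrams(text: str) -> list[str]:
    """Emit overlapping two-character grams for each whitespace-free run."""
    grams: list[str] = []
    for run in text.split():
        if len(run) == 1:
            grams.append(run)  # a lone character is kept as a unigram
        else:
            grams.extend(run[i:i + 2] for i in range(len(run) - 1))
    return grams


document = "東京都に住んでいます"  # "I live in Tokyo" (sample text, assumed)
query = "東京都"                    # "Tokyo Metropolis" (sample query, assumed)

index_terms = set(cjk_bigrams(document))  # index-time analysis: bigrams
query_terms = whitespace_tokens(query)    # query-time analysis: whitespace split

print(query_terms)                                        # ['東京都']
print(all(t in index_terms for t in query_terms))         # False: no match
print(all(t in index_terms for t in cjk_bigrams(query)))  # True: same analysis on both sides
```

Using the same analysis at index and query time makes the query match. The bigram 京都 ("Kyoto") also ends up among the index terms, though, so a bigram-analyzed query for Kyoto would hit this Tokyo sentence. False hits of that kind are one reason to consider a dictionary-based tokenizer such as the JapaneseTokenizer.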

Re: looking for documentation on solr.JapaneseTokenizerFactory

2016-06-28 Thread Alexandre Rafalovitch
Have you seen http://discovery-grindstone.blogspot.com.au/ ? It is a series of articles on setting up CJK for library content. Regards, Alex. Newsletter and resources for Solr beginners and intermediates: http://www.solr-start.com/ On 28 June 2016 at 10:59, Micheal Cooper wrote: > I hav

Re: looking for documentation on solr.JapaneseTokenizerFactory

2016-06-27 Thread Erick Erickson
There's some more information in the reference guide, see: https://cwiki.apache.org/confluence/display/solr/Language+Analysis NOTE: I would _strongly_ urge you to go to the upper-left corner and follow the link for downloading older versions and pulling down the 4.10 guide. It's a bold attempt to

looking for documentation on solr.JapaneseTokenizerFactory

2016-06-27 Thread Micheal Cooper
I have a vendor-supplied Solr 4.10 setup for multisite search which indexes two large Drupal 7 sites which have content in Japanese, English, and Undefined. The English searches are OK, but the Japanese does not work well at all. The vendors are in the US, so it is understandable that they cann