Thank you Yasufumi!
It looks like userdict_ja.txt could be a good way for us to go.
I wonder, though, whether there is a more generic solution to this problem. E.g.,
has anyone researched a list of commonly desired
decompoundings that the Kuromoji statistics miss? I tried searching onl
Hi,
There are two solutions as far as I know.
1. Use the userDictionary attribute
This is the common and safe way, I think.
Add the userDictionary attribute to your tokenizer configuration and define the
userDictionary file as follows.
Tokenizer:
userDictionary (lang/userdict_ja.txt in the above setting):
日本人,日本
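As a rough illustration of what such an entry does, here is a minimal Python sketch. It assumes the four-field CSV layout of Lucene's Kuromoji user dictionary (surface,segmentation,readings,part-of-speech, with space-separated segments and readings). The reading ニホン ジン and the カスタム名詞 tag in the sample entry are illustrative only, and the greedy longest-match lookup (parse_user_dict, segment) is a hypothetical simplification, not Kuromoji's real lattice search. In Solr, the file is referenced through the userDictionary attribute of solr.JapaneseTokenizerFactory.

```python
# Hypothetical sketch: NOT Kuromoji's algorithm, just the effect of a user entry.
# Assumed entry layout: surface,segmentation,readings,part-of-speech


def parse_user_dict(text):
    """Map each surface form to its list of space-separated segments."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(",")
        surface, segmentation = fields[0], fields[1]
        entries[surface] = segmentation.split()
    return entries


def segment(sentence, entries):
    """Greedy longest match: user entries win, other characters pass through."""
    tokens, i = [], 0
    longest = max((len(s) for s in entries), default=0)
    while i < len(sentence):
        for size in range(min(longest, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if candidate in entries:
                tokens.extend(entries[candidate])  # emit the user-defined split
                i += size
                break
        else:
            tokens.append(sentence[i])  # no entry matched: emit one character
            i += 1
    return tokens


# Illustrative entry; reading and part-of-speech are assumptions.
USER_DICT = "日本人,日本 人,ニホン ジン,カスタム名詞\n"

if __name__ == "__main__":
    entries = parse_user_dict(USER_DICT)
    indexed = set(segment("日本人", entries))
    # Both query terms now exist as separate indexed tokens.
    print("日本" in indexed, "人" in indexed)
```

Because 日本人 is indexed as the two tokens 日本 and 人, a query for either term can match the document, which is the behaviour asked for in the original question.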
Hi SOLR Community,
I have an example of a basic Japanese indexing/recall scenario which I am
trying to support, but cannot get to work.
The scenario is: I would like 日本人 (Japanese person) to be matched by
either 日本 (Japan) or 人 (person). Currently, this does not work. My
Japanese text