mocobeta commented on PR #12517:
URL: https://github.com/apache/lucene/pull/12517#issuecomment-2016769417

   Hi, sorry for my late reply.
   I quickly checked the size of the built dictionary. The latest UniDic is fairly (to 
me, insanely) large: its total size is 1.6 GB.
   https://clrd.ninjal.ac.jp/unidic/back_number.html#unidic_cwj
   
   The kuromoji jar built with unidic-cwj-3.1.1-full comes to 442 MB. 
Besides the size, I think we should consider performance. I'm worried that 
there could be a significant impact on analysis/indexing speed. Do you have any 
benchmark results on that?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

