mocobeta commented on PR #12517: URL: https://github.com/apache/lucene/pull/12517#issuecomment-2016769417
Hi, sorry for my late reply.

I quickly checked the built dictionary size. The latest UniDic is fairly (to me, insanely) large: its total size is 1.6 GB. https://clrd.ninjal.ac.jp/unidic/back_number.html#unidic_cwj

The kuromoji jar built with unidic-cwj-3.1.1-full ends up at 442 MB.

Besides the size, I think we should consider performance. I'm worried that there could be a significant impact on analysis/indexing speed. Do you have any benchmark results on that?