[ https://issues.apache.org/jira/browse/LUCENE-9220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17037665#comment-17037665 ]
Robert Muir commented on LUCENE-9220:
-------------------------------------

Of course tests won't pass! Beyond that, this thing gets massively slower because it won't have our fixes to unnecessary reflection, string creation, etc.

Also, Lucene has Armenian and Estonian, neither of which is currently in the snowball repo (one is on the website though, and the other has a PR). So we have to generate and enable stemmers for those languages. We also support stemmers that are disabled by default (KP, german2, lovins), so we have to enable and generate those too.

Finally there is the mixed tab/space indentation, the lack of license headers, and the lack of javadocs; it all adds up to make it quite the pain in the ass.

I think instead of patching *generated* code we should patch snowball itself and try to send the fixes to them upstream. It seems reasonable they would want consistent whitespace, docs, licensing, better performance, etc. For example, currently methodhandle patching will fail because the generated structure has changed, but it's a one-liner to fix this in their C-code generator, and easier to maintain that way, even as a patch.
I have made such changes here: https://github.com/rmuir/snowball/commit/2e1433394ef02ee248127c8e3485d9cbc395d577

> Upgrade Snowball version to 2.0
> -------------------------------
>
>                 Key: LUCENE-9220
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9220
>             Project: Lucene - Core
>          Issue Type: Wish
>            Reporter: Nguyen Minh Gia Huy
>            Priority: Major
>
> When working with Snowball-based stemmers, I realized that Lucene is
> currently [using a pre-compiled version of
> Snowball|https://lucene.apache.org/core/8_4_1/analyzers-common/org/apache/lucene/analysis/snowball/package-summary.html],
> that seems from 12 years ago:
> https://github.com/snowballstem/snowball/tree/e103b5c257383ee94a96e7fc58cab3c567bf079b
> Snowball has just released v2.0 in 10/2019 with many improvements, new
> supported languages (Arabic, Indonesian…) and new features (stringdef
> notation for Unicode codepoints…). Details of the changes could be found
> here: https://github.com/snowballstem/snowball/blob/master/NEWS. I think
> these changes of Snowball could give a promising positive impact on Lucene.
> I wonder when Lucene should upgrade Snowball to the latest version (v2.0).
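
To make the reflection point in the comment concrete, here is a minimal Java sketch of the general technique: resolving a stemmer routine once into a `MethodHandle` instead of looking it up reflectively on each call. This is not the Lucene patch or Snowball's generated code. The names `MethodHandleSketch`, `ToyStemmer`, and `r_strip_s` are hypothetical and exist only for illustration.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class MethodHandleSketch {

    /** Hypothetical stand-in for a generated stemmer class (not real Snowball output). */
    public static class ToyStemmer {
        private final StringBuilder current = new StringBuilder();

        public void setCurrent(String s) {
            current.setLength(0);
            current.append(s);
        }

        public String getCurrent() {
            return current.toString();
        }

        /** Toy routine: drops one trailing 's' and reports whether it changed anything. */
        public boolean r_strip_s() {
            int n = current.length();
            if (n > 1 && current.charAt(n - 1) == 's') {
                current.setLength(n - 1);
                return true;
            }
            return false;
        }
    }

    // Resolved once at class-initialization time, so no by-name lookup happens per call.
    private static final MethodHandle STRIP_S = resolve("r_strip_s");

    /** Looks up a no-argument, boolean-returning routine on ToyStemmer by name. */
    static MethodHandle resolve(String name) {
        try {
            return MethodHandles.lookup()
                .findVirtual(ToyStemmer.class, name, MethodType.methodType(boolean.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot resolve " + name, e);
        }
    }

    /** Runs the toy routine through the cached handle and returns the resulting string. */
    static String stem(String word) {
        ToyStemmer stemmer = new ToyStemmer();
        stemmer.setCurrent(word);
        try {
            // invokeExact requires the call-site types to match the handle type exactly.
            boolean changed = (boolean) STRIP_S.invokeExact(stemmer);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
        return stemmer.getCurrent();
    }
}
```

Because the handle's lookup depends on the method's name and signature, a change in the shape of the generated code breaks the lookup. That is the kind of breakage the comment suggests fixing in the generator rather than in the generated files.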