[ 
https://issues.apache.org/jira/browse/LUCENE-9220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17037665#comment-17037665
 ] 

Robert Muir commented on LUCENE-9220:
-------------------------------------

Of course tests won't pass! Otherwise this thing gets massively slower because 
it won't have our fixes for unnecessary reflection, string creation, etc.

Also, Lucene has Armenian and Estonian, neither of which is currently in the 
Snowball repo (one is on the website though, and the other has a PR). So we 
have to generate and enable stemmers for those languages. We also support 
stemmers that are disabled by default (KP, german2, lovins), so we have to 
enable and generate those too.

Finally, there is the mixed tab/space indentation, the lack of license headers, 
and the lack of javadocs. It all adds up to make it quite the pain in the ass.

I think instead of patching *generated* code we should patch snowball itself 
and try to send the fixes to them upstream. It seems reasonable they would want 
consistent whitespace, docs, licensing, better performance, etc.

For example, the MethodHandle patching will currently fail because the 
generated structure has changed, but it's a one-liner to fix this in their 
C-code generator, and easier to maintain that way, even as a patch.

I have made such changes here: 
https://github.com/rmuir/snowball/commit/2e1433394ef02ee248127c8e3485d9cbc395d577
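
As a rough illustration of the reflection point above (this is not the code 
from that commit): the sketch below assumes "methodhandle patching" means 
resolving a stemmer routine once as a java.lang.invoke.MethodHandle instead of 
doing a reflective lookup on every call. The class MethodHandleDispatchSketch, 
the nested ToyStemmer, and the routine r_strip_s are all hypothetical names 
made up for this example.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical sketch; not Lucene's or Snowball's actual generated code.
class MethodHandleDispatchSketch {

    // Toy stand-in for a generated stemmer with one boolean routine.
    static final class ToyStemmer {
        private final StringBuilder current = new StringBuilder();

        void setCurrent(String word) {
            current.setLength(0);
            current.append(word);
        }

        String getCurrent() {
            return current.toString();
        }

        // Toy routine: strips one trailing 's' from words longer than one char.
        boolean r_strip_s() {
            int n = current.length();
            if (n > 1 && current.charAt(n - 1) == 's') {
                current.setLength(n - 1);
                return true;
            }
            return false;
        }
    }

    // Resolved once at class initialization, not on every stemming call.
    private static final MethodHandle STRIP_S;

    static {
        try {
            STRIP_S = MethodHandles.lookup().findVirtual(
                ToyStemmer.class, "r_strip_s", MethodType.methodType(boolean.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static String stem(String word) {
        ToyStemmer stemmer = new ToyStemmer();
        stemmer.setCurrent(word);
        try {
            // invokeExact needs the call site to match (ToyStemmer)boolean,
            // hence the explicit cast on the result.
            boolean changed = (boolean) STRIP_S.invokeExact(stemmer);
            // A real stemmer would use 'changed' to decide what to try next.
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
        return stemmer.getCurrent();
    }
}
```

With this sketch, MethodHandleDispatchSketch.stem("tests") returns "test". A 
change in the shape of the generated classes would break a lookup like the one 
in the static block, which is why fixing it in the generator is attractive.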


> Upgrade Snowball version to 2.0
> -------------------------------
>
>                 Key: LUCENE-9220
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9220
>             Project: Lucene - Core
>          Issue Type: Wish
>            Reporter: Nguyen Minh Gia Huy
>            Priority: Major
>
> When working with Snowball-based stemmers, I realized that Lucene is 
> currently [using a pre-compiled version of 
> Snowball|https://lucene.apache.org/core/8_4_1/analyzers-common/org/apache/lucene/analysis/snowball/package-summary.html]
> that seems to be from 12 years ago: 
> https://github.com/snowballstem/snowball/tree/e103b5c257383ee94a96e7fc58cab3c567bf079b
> Snowball released v2.0 in 10/2019 with many improvements, newly 
> supported languages (Arabic, Indonesian…) and new features (stringdef 
> notation for Unicode codepoints…). Details of the changes can be found 
> here: https://github.com/snowballstem/snowball/blob/master/NEWS. I think 
> these Snowball changes could have a positive impact on Lucene.
> I wonder when Lucene should upgrade Snowball to the latest version (v2.0).


