[GitHub] [lucene] Ankou76ers opened a new issue, #12210: ArrayIndexOutOfBoundsException in OpenNLPSentenceBreakIterator

2023-03-22 Thread via GitHub
Ankou76ers opened a new issue, #12210: URL: https://github.com/apache/lucene/issues/12210 ### Description When calling [preceding ](https://github.com/apache/lucene/blob/0782535017c9e737350e96fb0f53457c7b8ecf03/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/OpenNL

[GitHub] [lucene] Ankou76ers commented on issue #12210: ArrayIndexOutOfBoundsException in OpenNLPSentenceBreakIterator

2023-03-22 Thread via GitHub
Ankou76ers commented on issue #12210: URL: https://github.com/apache/lucene/issues/12210#issuecomment-1479267433 [OpenNLPSentenceBreakIterator.patch.txt](https://github.com/apache/lucene/files/11038808/OpenNLPSentenceBreakIterator.patch.txt) -- This is an automated message from the Ap

[GitHub] [lucene] rmuir commented on pull request #12169: Introduced the Word2VecSynonymFilter

2023-03-22 Thread via GitHub
rmuir commented on PR #12169: URL: https://github.com/apache/lucene/pull/12169#issuecomment-1479280860 > Hi @rmuir, first of all, I deeply appreciate the time you are taking to help us on this issue, thank you for that. When I said "I'll leave the BoostAttribute discussion for another time

[GitHub] [lucene] alessandrobenedetti commented on pull request #12169: Introduced the Word2VecSynonymFilter

2023-03-22 Thread via GitHub
alessandrobenedetti commented on PR #12169: URL: https://github.com/apache/lucene/pull/12169#issuecomment-1479292833 Hi @rmuir, we definitely don't want to ignore improvement recommendations, rest assured. Sorry if I am pedantic, I just want to understand why it shouldn't be used for add

[GitHub] [lucene] rmuir commented on pull request #12169: Introduced the Word2VecSynonymFilter

2023-03-22 Thread via GitHub
rmuir commented on PR #12169: URL: https://github.com/apache/lucene/pull/12169#issuecomment-1479301623 > Hi @rmuir, we definitely don't want to ignore improvement recommendations, rest assured. > Sorry if I am pedantic, I just want to understand why it shouldn't be used for additional th

[GitHub] [lucene] rmuir commented on pull request #12169: Introduced the Word2VecSynonymFilter

2023-03-22 Thread via GitHub
rmuir commented on PR #12169: URL: https://github.com/apache/lucene/pull/12169#issuecomment-1479306693 > Allowing negative numbers shouldn't be an issue, the vector similarity score is 0<=x<=1. You don't understand. Some day, someone may want to add such safety as a check to the attr

[GitHub] [lucene] romseygeek commented on pull request #12169: Introduced the Word2VecSynonymFilter

2023-03-22 Thread via GitHub
romseygeek commented on PR #12169: URL: https://github.com/apache/lucene/pull/12169#issuecomment-1479323710 Would your worries be assuaged if we created a separate `QueryBoostAttribute` class @rmuir? Then `QueryBuilder` can use that, and we can add checks for negative boosts, javadocs that

[GitHub] [lucene] rmuir commented on pull request #12169: Introduced the Word2VecSynonymFilter

2023-03-22 Thread via GitHub
rmuir commented on PR #12169: URL: https://github.com/apache/lucene/pull/12169#issuecomment-1479355672 > Would your worries be assuaged if we created a separate `QueryBoostAttribute` class @rmuir? Then `QueryBuilder` can use that, and we can add checks for negative boosts, javadocs that say

[GitHub] [lucene] alessandrobenedetti commented on pull request #12169: Introduced the Word2VecSynonymFilter

2023-03-22 Thread via GitHub
alessandrobenedetti commented on PR #12169: URL: https://github.com/apache/lucene/pull/12169#issuecomment-1479357302 Ok, let me rephrase my question then: Let's assume: - we don't care why that class was originally created and its JavaDocs comments - we don't intend to modify it

[GitHub] [lucene] rmuir commented on pull request #12169: Introduced the Word2VecSynonymFilter

2023-03-22 Thread via GitHub
rmuir commented on PR #12169: URL: https://github.com/apache/lucene/pull/12169#issuecomment-1479361208 i explained it to you multiple times on this issue and i feel the javadoc explanation is already good. Please read it!

[GitHub] [lucene] rmuir commented on pull request #12169: Introduced the Word2VecSynonymFilter

2023-03-22 Thread via GitHub
rmuir commented on PR #12169: URL: https://github.com/apache/lucene/pull/12169#issuecomment-1479362609 i'm -1 for this PR. analyzers shouldn't be mixing with query boosts at all. it is a mixing of concerns that we should avoid: please refactor to be something other than an analyzer.

[GitHub] [lucene] alessandrobenedetti commented on pull request #12169: Introduced the Word2VecSynonymFilter

2023-03-22 Thread via GitHub
alessandrobenedetti commented on PR #12169: URL: https://github.com/apache/lucene/pull/12169#issuecomment-1479386226 @rmuir bear in mind a veto should be motivated and I have seen honestly zero practical motivation so far. Given that, I definitely don't have time for sterile discussions a

[GitHub] [lucene] mkhludnev commented on pull request #12169: Introduced the Word2VecSynonymFilter

2023-03-22 Thread via GitHub
mkhludnev commented on PR #12169: URL: https://github.com/apache/lucene/pull/12169#issuecomment-1479396426 > i'm -1 for this PR. analyzers shouldn't be mixing with query boosts at all. it is a mixing of concerns that we should avoid: please refactor to be something other than an analyzer.

[GitHub] [lucene] magibney commented on pull request #12207: simplify PrefixQuery to avoid requiring Automaton

2023-03-22 Thread via GitHub
magibney commented on PR #12207: URL: https://github.com/apache/lucene/pull/12207#issuecomment-1479836168 In fact, even for the `wikimedium10m` case, benefits are apparent for broader matches. The improvements are modest, but demonstrable even for these generic cases. And again, in general

[GitHub] [lucene] david-sitsky commented on issue #12185: Using DirectIODirectory results in BufferOverflowException

2023-03-22 Thread via GitHub
david-sitsky commented on issue #12185: URL: https://github.com/apache/lucene/issues/12185#issuecomment-1480390389 As an aside, in some standard benchmark tests I run with our product, I have found the final optimisation of Lucene indexes after all the data has been indexed took 36 seconds