[GitHub] [lucene] javanna commented on pull request #11793: Prevent PointValues from returning null for ghost fields

2022-11-09 Thread GitBox
javanna commented on PR #11793: URL: https://github.com/apache/lucene/pull/11793#issuecomment-1308433716 @jpountz would you have time to take another look at this please? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [lucene] mohamedniyaz1996 opened a new issue, #11908: Get scores along with Spellcheck suggestions

2022-11-09 Thread GitBox
mohamedniyaz1996 opened a new issue, #11908: URL: https://github.com/apache/lucene/issues/11908 ### Description As we know, lucene spellcheck [suggestsimilar()](https://lucene.apache.org/core/6_0_1/suggest/org/apache/lucene/search/spell/SpellChecker.html#suggestSimilar-java.lang.Strin
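The issue asks for a score alongside each suggestion. Here is a standalone sketch that does not use Lucene's API. It assumes a normalized edit-distance similarity in [0, 1], where 1 means identical, which is one common way to rank spelling candidates. The class `SuggestionScorer` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: scores spelling suggestions by normalized Levenshtein similarity. */
final class SuggestionScorer {

  /** A suggestion paired with its similarity score (1.0 = identical). */
  static final class Scored {
    final String word;
    final float score;

    Scored(String word, float score) {
      this.word = word;
      this.score = score;
    }
  }

  /** Classic two-row dynamic-programming Levenshtein distance. */
  static int distance(String a, String b) {
    int[] prev = new int[b.length() + 1];
    int[] curr = new int[b.length() + 1];
    for (int j = 0; j <= b.length(); j++) prev[j] = j;
    for (int i = 1; i <= a.length(); i++) {
      curr[0] = i;
      for (int j = 1; j <= b.length(); j++) {
        int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
        curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
      }
      int[] tmp = prev; // swap rows: the finished row becomes "previous"
      prev = curr;
      curr = tmp;
    }
    return prev[b.length()];
  }

  /** Similarity in [0, 1]: 1 - distance / length of the longer string. */
  static float similarity(String a, String b) {
    int maxLen = Math.max(a.length(), b.length());
    if (maxLen == 0) return 1f;
    return 1f - (float) distance(a, b) / maxLen;
  }

  /** Scores each candidate against the misspelled input, best first. */
  static List<Scored> score(String input, List<String> candidates) {
    List<Scored> out = new ArrayList<>();
    for (String c : candidates) out.add(new Scored(c, similarity(input, c)));
    out.sort((x, y) -> Float.compare(y.score, x.score)); // descending by score
    return out;
  }
}
```

For example, scoring `"lucine"` against `"lucene"` gives 1 - 1/6 ≈ 0.83, so it ranks ahead of a candidate that needs three edits.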

[GitHub] [lucene] benwtrent commented on pull request #11905: Fix integer overflow when seeking the vector index for connections

2022-11-09 Thread GitBox
benwtrent commented on PR #11905: URL: https://github.com/apache/lucene/pull/11905#issuecomment-1308664186 @rmuir i took your test, modified it slightly (changing number of vectors and the assertion). It ran for 3.5 hours and failed on the old code in the exactly correct spot (overflowing o

[GitHub] [lucene] rmuir commented on pull request #11905: Fix integer overflow when seeking the vector index for connections

2022-11-09 Thread GitBox
rmuir commented on PR #11905: URL: https://github.com/apache/lucene/pull/11905#issuecomment-1308697281 yeah, it's your search at the end that triggers the issue, because checkindex on vectors is currently too wimpy and never calls seek(). This is also an issue that we must address here.
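The patch itself isn't quoted here, so this is only a hedged illustration of the bug class. If a byte offset is computed as `int * int`, the product wraps once it exceeds `Integer.MAX_VALUE`, and a later seek lands at a negative or wrong position. Widening one operand to `long` before multiplying avoids this. The names `ordinal` and `bytesPerNode` are hypothetical.

```java
/** Illustrates int overflow in offset arithmetic; all names are hypothetical. */
final class OffsetMath {

  /** Buggy: the multiplication happens in 32-bit int and is only then widened to long. */
  static long offsetOverflowing(int ordinal, int bytesPerNode) {
    return ordinal * bytesPerNode;
  }

  /** Fixed: casting one operand first makes the whole product 64-bit. */
  static long offsetSafe(int ordinal, int bytesPerNode) {
    return (long) ordinal * bytesPerNode;
  }
}
```

With 20,000,000 entries of 128 bytes, the true offset is 2,560,000,000. The int version instead yields -1,734,967,296.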

[GitHub] [lucene] donnerpeter opened a new pull request, #11909: hunspell: introduce FragmentChecker to speed up ModifyingSuggester

2022-11-09 Thread GitBox
donnerpeter opened a new pull request, #11909: URL: https://github.com/apache/lucene/pull/11909 add NGramFragmentChecker to quickly check whether insertions/replacements produce strings that are even possible in the language
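Here is a minimal sketch of the idea. It assumes a simple design and is not the PR's actual code. First, collect every n-gram that occurs in dictionary words. Then reject a candidate if its edited region contains an n-gram never seen in the language, which saves the more expensive dictionary lookup. The class `NGramFilter` and its method are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical n-gram filter: flags candidates containing n-grams unseen in the dictionary. */
final class NGramFilter {
  private final int n;
  private final Set<String> known = new HashSet<>();

  NGramFilter(int n, List<String> dictionaryWords) {
    this.n = n;
    for (String w : dictionaryWords) {
      for (int i = 0; i + n <= w.length(); i++) known.add(w.substring(i, i + n));
    }
  }

  /**
   * Returns true if some n-gram overlapping the edited range [start, end) is unknown,
   * meaning the candidate cannot be a word and can be skipped cheaply.
   */
  boolean hasImpossibleFragment(String candidate, int start, int end) {
    int from = Math.max(0, start - n + 1);             // first n-gram that reaches into the edit
    int to = Math.min(candidate.length() - n, end - 1); // last n-gram that starts inside it
    for (int i = from; i <= to; i++) {
      if (!known.contains(candidate.substring(i, i + n))) return true;
    }
    return false;
  }
}
```

Only n-grams touching the edit are checked, because the rest of the word is unchanged from the input.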

[GitHub] [lucene] rmuir opened a new issue, #11910: improve error-prone configuration for int-overflow bugs

2022-11-09 Thread GitBox
rmuir opened a new issue, #11910: URL: https://github.com/apache/lucene/issues/11910 ### Description As mentioned by @dweiss and @benwtrent in #11905, some static analysis could help here. Maybe it can force us to do a pass reviewing other sketchy possible bugs like this in the code

[GitHub] [lucene] jpountz commented on pull request #11875: Usability improvements for timeout support in IndexSearcher

2022-11-09 Thread GitBox
jpountz commented on PR #11875: URL: https://github.com/apache/lucene/pull/11875#issuecomment-1309017859 I'm somewhat familiar with this code, I wonder if it could be refactored in such a way that it would directly leverage Lucene's timeout support, e.g. can the logic that uses live docs to

[GitHub] [lucene] jpountz commented on issue #11702: Multi-Value Support for Binary DocValues [LUCENE-10666]

2022-11-09 Thread GitBox
jpountz commented on issue #11702: URL: https://github.com/apache/lucene/issues/11702#issuecomment-1309023675 > e.g., the ESQL query language/engine proposed by Elastic. I don't think ESQL is going to be different from existing faceting support: it will still want to use ordinals when

[GitHub] [lucene] benwtrent commented on issue #11910: improve error-prone configuration for int-overflow bugs

2022-11-09 Thread GitBox
benwtrent commented on issue #11910: URL: https://github.com/apache/lucene/issues/11910#issuecomment-1309035065 I think static analysis will be a significant help. So, I did a quick check of turning on `IntLongMath` for `error-prone` and it is noisy. It's flagging things like t
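Roughly, `IntLongMath` flags int arithmetic that may overflow before its result is widened to `long`. When a value must stay an `int`, the JDK's `Math.multiplyExact` and `Math.toIntExact` offer a different trade-off: they throw `ArithmeticException` instead of wrapping silently. This is a small sketch, and the helper names are hypothetical.

```java
/** Hypothetical helpers that fail loudly on overflow instead of wrapping silently. */
final class CheckedMath {

  /** Product of two ints; throws ArithmeticException if it does not fit in an int. */
  static int product(int count, int size) {
    return Math.multiplyExact(count, size);
  }

  /** Narrows a long offset back to int; throws ArithmeticException if out of range. */
  static int narrow(long offset) {
    return Math.toIntExact(offset);
  }
}
```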

[GitHub] [lucene] rmuir commented on issue #11910: improve error-prone configuration for int-overflow bugs

2022-11-09 Thread GitBox
rmuir commented on issue #11910: URL: https://github.com/apache/lucene/issues/11910#issuecomment-1309038831 maybe we could even turn it on, and dump its output here as a one-time thing? we can just take a pass through it, and filter out the noisy ones. I realize it isn't ideal and doe

[GitHub] [lucene] reta commented on pull request #11875: Usability improvements for timeout support in IndexSearcher

2022-11-09 Thread GitBox
reta commented on PR #11875: URL: https://github.com/apache/lucene/pull/11875#issuecomment-1309053182 > Then it should be possible to use the existing `IndexSearcher#search` logic as-is? Thanks @jpountz, I think that would be the best option for everyone (Lucene / OpenSearch / Elasti

[GitHub] [lucene] reta closed pull request #11875: Usability improvements for timeout support in IndexSearcher

2022-11-09 Thread GitBox
reta closed pull request #11875: Usability improvements for timeout support in IndexSearcher URL: https://github.com/apache/lucene/pull/11875

[GitHub] [lucene] reta commented on issue #11874: Usability improvements for timeout support in IndexSearcher

2022-11-09 Thread GitBox
reta commented on issue #11874: URL: https://github.com/apache/lucene/issues/11874#issuecomment-1309053848 Closing was "won't do"

[GitHub] [lucene] reta closed issue #11874: Usability improvements for timeout support in IndexSearcher

2022-11-09 Thread GitBox
reta closed issue #11874: Usability improvements for timeout support in IndexSearcher URL: https://github.com/apache/lucene/issues/11874

[GitHub] [lucene] dweiss commented on issue #11910: improve error-prone configuration for int-overflow bugs

2022-11-09 Thread GitBox
dweiss commented on issue #11910: URL: https://github.com/apache/lucene/issues/11910#issuecomment-1309095897 I can understand where it signals a problem but can't determine the domain (the v - 2 can underflow if you pass v small enough)... The 2* 2 case is odd, I'm surprised it doesn't see

[GitHub] [lucene] dweiss commented on a diff in pull request #11909: hunspell: introduce FragmentChecker to speed up ModifyingSuggester

2022-11-09 Thread GitBox
dweiss commented on code in PR #11909: URL: https://github.com/apache/lucene/pull/11909#discussion_r1018232665 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/FragmentChecker.java: ## @@ -0,0 +1,35 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] [lucene] benwtrent commented on pull request #11905: Fix integer overflow when seeking the vector index for connections

2022-11-09 Thread GitBox
benwtrent commented on PR #11905: URL: https://github.com/apache/lucene/pull/11905#issuecomment-1309129512 Updated the monster test. Ran about 2hrs on my laptop (but I was working, in virtual meetings, etc. during the entire run). Confirmed it fails without this patch. This p

[GitHub] [lucene] rmuir commented on pull request #11905: Fix integer overflow when seeking the vector index for connections

2022-11-09 Thread GitBox
rmuir commented on PR #11905: URL: https://github.com/apache/lucene/pull/11905#issuecomment-1309198710 nice. i'm fine with the changes. We can open another issue to fix the checkindex stuff. I do really think we should do that before releasing to look for more trouble. Also good if we can g

[GitHub] [lucene] rmuir commented on pull request #11905: Fix integer overflow when seeking the vector index for connections

2022-11-09 Thread GitBox
rmuir commented on PR #11905: URL: https://github.com/apache/lucene/pull/11905#issuecomment-1309200544 and thanks for help battling the testing. it will get better!

[GitHub] [lucene] rmuir opened a new issue, #11911: improve checkindex to be more thorough for vectors (e.g. test seeking)

2022-11-09 Thread GitBox
rmuir opened a new issue, #11911: URL: https://github.com/apache/lucene/issues/11911 ### Description Currently checkindex only `next()'s` through the vectors. We should do some seeking as well. It doesn't have to be intense, e.g. we could do 64 seeks (if there's 20,000,000 docs in the
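Here is a hedged sketch of the "a few seeks spread across the index" idea: pick a fixed number of evenly spaced target doc IDs. How each target is then visited (for example via a `DocIdSetIterator`'s `advance(int)`) is left out. `SeekTargets` is hypothetical, and the seek count is just a parameter.

```java
/** Hypothetical helper: picks a fixed number of evenly spaced doc IDs to seek to. */
final class SeekTargets {

  /** Returns up to numSeeks ascending doc IDs spread across [0, maxDoc). */
  static int[] evenlySpaced(int maxDoc, int numSeeks) {
    int count = Math.min(numSeeks, maxDoc);
    int[] targets = new int[count];
    for (int i = 0; i < count; i++) {
      // long arithmetic: i * maxDoc can exceed Integer.MAX_VALUE on large indexes
      targets[i] = (int) ((long) i * maxDoc / count);
    }
    return targets;
  }
}
```

For 20,000,000 docs and 64 seeks, the targets are 0, 312,500, 625,000, ..., 19,687,500.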

[GitHub] [lucene] rmuir commented on issue #11702: Multi-Value Support for Binary DocValues [LUCENE-10666]

2022-11-09 Thread GitBox
rmuir commented on issue #11702: URL: https://github.com/apache/lucene/issues/11702#issuecomment-1309244025 > Curious why you think a document can't have multiple locations? Why wouldn't the geo (wkt, json, wkb, protobuf) specification then not have Multi geometry types? The reason they do

[GitHub] [lucene] nknize commented on issue #11702: Multi-Value Support for Binary DocValues [LUCENE-10666]

2022-11-09 Thread GitBox
nknize commented on issue #11702: URL: https://github.com/apache/lucene/issues/11702#issuecomment-1309269456 > because in the real world objects can only exist in one place at a time. Except in geo search / analysis this depends on [spatial resolution](https://en.wikipedia.org/wiki/Spa

[GitHub] [lucene] nknize commented on issue #11702: Multi-Value Support for Binary DocValues [LUCENE-10666]

2022-11-09 Thread GitBox
nknize commented on issue #11702: URL: https://github.com/apache/lucene/issues/11702#issuecomment-1309274884 > Therefore, we created a prototype that implements multi-valued binary docvalues which works well. However, having some support for this use case directly in Lucene is preferable, b
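One common way to emulate multi-valued binary doc values on top of a single-valued binary field is to pack all of a document's values into one byte array. This sketch uses fixed-width int lengths for simplicity; a real encoding would more likely use variable-length ints. It is not necessarily how the prototype mentioned above works, and `MultiBinaryCodec` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical codec: packs several binary values into one byte[] for a single-valued field. */
final class MultiBinaryCodec {

  /** Layout: [count:int] then, per value, [length:int][bytes]. */
  static byte[] encode(List<byte[]> values) {
    int size = Integer.BYTES;
    for (byte[] v : values) size += Integer.BYTES + v.length;
    ByteBuffer buf = ByteBuffer.allocate(size);
    buf.putInt(values.size());
    for (byte[] v : values) {
      buf.putInt(v.length);
      buf.put(v);
    }
    return buf.array();
  }

  /** Reverses encode(), returning the values in their original order. */
  static List<byte[]> decode(byte[] packed) {
    ByteBuffer buf = ByteBuffer.wrap(packed);
    int count = buf.getInt();
    List<byte[]> out = new ArrayList<>(count);
    for (int i = 0; i < count; i++) {
      byte[] v = new byte[buf.getInt()];
      buf.get(v);
      out.add(v);
    }
    return out;
  }
}
```

The cost of this workaround is that every read decodes the whole blob, which is part of why native support is attractive.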

[GitHub] [lucene] donnerpeter commented on a diff in pull request #11909: hunspell: introduce FragmentChecker to speed up ModifyingSuggester

2022-11-09 Thread GitBox
donnerpeter commented on code in PR #11909: URL: https://github.com/apache/lucene/pull/11909#discussion_r1018356474 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/FragmentChecker.java: ## @@ -0,0 +1,35 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [lucene] rmuir opened a new issue, #11912: Can we improve MMapDir's exceptions for invalid offsets?

2022-11-09 Thread GitBox
rmuir opened a new issue, #11912: URL: https://github.com/apache/lucene/issues/11912 ### Description For #11905 bug, as an example, the user may get a generic exception "Seek Past EOF". Can we improve it to include the bogus position? For example, If you can see that offset is
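Here is a hedged sketch of a more informative bounds check. It reports the requested position, the file length, and the resource, rather than a bare "seek past EOF". A negative position is a strong hint of int overflow, as in #11905. `SeekCheck` and its message format are hypothetical, not the wording Lucene adopted.

```java
import java.io.EOFException;

/** Hypothetical bounds check that reports the offending position and the file length. */
final class SeekCheck {

  static void checkSeek(String resource, long pos, long length) throws EOFException {
    if (pos < 0) {
      // negative positions usually mean an offset computation overflowed
      throw new IllegalArgumentException("seek to negative position " + pos + ": " + resource);
    }
    if (pos > length) {
      throw new EOFException("seek past EOF: pos=" + pos + " vs length=" + length + ": " + resource);
    }
  }
}
```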

[GitHub] [lucene] donnerpeter commented on a diff in pull request #11909: hunspell: introduce FragmentChecker to speed up ModifyingSuggester

2022-11-09 Thread GitBox
donnerpeter commented on code in PR #11909: URL: https://github.com/apache/lucene/pull/11909#discussion_r1018359280 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/NGramFragmentChecker.java: ## @@ -0,0 +1,181 @@ +/* + * Licensed to the Apache Software Foun

[GitHub] [lucene] dweiss commented on a diff in pull request #11909: hunspell: introduce FragmentChecker to speed up ModifyingSuggester

2022-11-09 Thread GitBox
dweiss commented on code in PR #11909: URL: https://github.com/apache/lucene/pull/11909#discussion_r1018368778 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/FragmentChecker.java: ## @@ -0,0 +1,35 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] [lucene] donnerpeter commented on a diff in pull request #11909: hunspell: introduce FragmentChecker to speed up ModifyingSuggester

2022-11-09 Thread GitBox
donnerpeter commented on code in PR #11909: URL: https://github.com/apache/lucene/pull/11909#discussion_r1018377196 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/FragmentChecker.java: ## @@ -0,0 +1,35 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [lucene] uschindler commented on issue #11912: Can we improve MMapDir's exceptions for invalid offsets?

2022-11-09 Thread GitBox
uschindler commented on issue #11912: URL: https://github.com/apache/lucene/issues/11912#issuecomment-1309351422 As discussed in chat an hour ago: - for MemorySegmentIndexInput it already has a reworked Exception code where also those suppress unused warnings are gone due to some trick: w

[GitHub] [lucene] uschindler commented on issue #11912: Can we improve MMapDir's exceptions for invalid offsets?

2022-11-09 Thread GitBox
uschindler commented on issue #11912: URL: https://github.com/apache/lucene/issues/11912#issuecomment-1309352460 Maybe include that in 9.4.3 ?

[GitHub] [lucene] uschindler commented on issue #11912: Can we improve MMapDir's exceptions for invalid offsets?

2022-11-09 Thread GitBox
uschindler commented on issue #11912: URL: https://github.com/apache/lucene/issues/11912#issuecomment-1309354880 Will do that tomorrow; a PR will come. We can decide to add it to the 9.4 branch to make later debugging of broken code with recent vectors easier.

[GitHub] [lucene] stevenschlansker opened a new issue, #11913: lucene-replicator PrimaryNode unsafely publishes reference during construction

2022-11-09 Thread GitBox
stevenschlansker opened a new issue, #11913: URL: https://github.com/apache/lucene/issues/11913 ### Description In the lucene-replicator module, the PrimaryNode does some initialization work in the constructor. It starts with an IndexWriter provided by the application author. At line
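The issue's details are truncated here, so this is a generic illustration of the hazard rather than the replicator code. If a constructor registers `this` somewhere other threads can reach, those threads may observe a partially constructed object. Final-field visibility guarantees only hold when `this` does not escape the constructor. A common remedy is a private constructor plus a static factory that publishes the object only after construction completes. `EventSource` and `SafeNode` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical event source whose listeners might be invoked from other threads. */
final class EventSource {
  private final List<Consumer<String>> listeners = new ArrayList<>();

  synchronized void register(Consumer<String> listener) {
    listeners.add(listener);
  }

  synchronized void fire(String event) {
    for (Consumer<String> l : listeners) l.accept(event);
  }
}

/** Safe pattern: finish constructing first, then publish from a static factory. */
final class SafeNode {
  private final String name;
  volatile String lastEvent;

  private SafeNode(String name) {
    this.name = name;
    // Unsafe alternative: registering this::onEvent here would let `this` escape
    // before construction finishes, so a listener call could see a half-built object.
  }

  static SafeNode create(String name, EventSource source) {
    SafeNode node = new SafeNode(name);
    source.register(node::onEvent); // publish only after the object is fully built
    return node;
  }

  private void onEvent(String event) {
    lastEvent = name + ":" + event;
  }
}
```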

[GitHub] [lucene] jtibshirani commented on issue #11911: improve checkindex to be more thorough for vectors (e.g. test seeking)

2022-11-09 Thread GitBox
jtibshirani commented on issue #11911: URL: https://github.com/apache/lucene/issues/11911#issuecomment-1309460506 This is a good idea! As a note -- in order to exercise all parts of the file format, we'll have to perform a kNN search too through `KnnVectorsReader#search`. If we only load `V

[GitHub] [lucene] jpountz opened a new issue, #11914: Remove QueryTimeout#isTimeoutEnabled?

2022-11-09 Thread GitBox
jpountz opened a new issue, #11914: URL: https://github.com/apache/lucene/issues/11914 ### Description I don't quite understand why `QueryTimeout` has an `isTimeoutEnabled` method. If the timeout is not enabled, why configure a timeout at all? ### Version and environment details
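To make the question concrete, here is a sketch of a timeout that needs only one method. "No timeout" is expressed by not passing a timeout at all, for example `if (timeout != null && timeout.shouldExit())`, rather than by a separate enabled flag. `Timeout` is a local stand-in, not Lucene's `QueryTimeout` interface, and `DeadlineTimeout` is hypothetical. The clock is injectable so the behavior can be tested without sleeping.

```java
import java.util.function.LongSupplier;

/** Local stand-in for a query timeout with a single method. */
interface Timeout {
  /** Returns true once the search should stop collecting. */
  boolean shouldExit();
}

/** Deadline-based timeout; in production the clock would typically be System::nanoTime. */
final class DeadlineTimeout implements Timeout {
  private final long deadlineNanos;
  private final LongSupplier clock;

  DeadlineTimeout(long budgetNanos, LongSupplier clock) {
    this.clock = clock;
    this.deadlineNanos = clock.getAsLong() + budgetNanos;
  }

  @Override
  public boolean shouldExit() {
    // subtraction-based comparison stays correct even if nanoTime values wrap around
    return clock.getAsLong() - deadlineNanos >= 0;
  }
}
```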

[GitHub] [lucene] jpountz commented on issue #11868: Add a FilterIndexOutput

2022-11-09 Thread GitBox
jpountz commented on issue #11868: URL: https://github.com/apache/lucene/issues/11868#issuecomment-1309896931 Let's do `FilterIndexInput` at the same time?
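The value of a `Filter*` class is that it holds the delegation boilerplate, so a subclass overrides only the calls it cares about. Here is a minimal sketch of that pattern using hypothetical types (`Output`, `FilterOutput`), not Lucene's `IndexOutput` API. The same shape would apply to an input-side filter.

```java
/** Hypothetical minimal output abstraction standing in for an index output. */
abstract class Output {
  abstract void writeByte(byte b);

  abstract long getFilePointer();
}

/** Delegating wrapper: forwards every call to the wrapped output. */
class FilterOutput extends Output {
  protected final Output delegate;

  FilterOutput(Output delegate) {
    this.delegate = delegate;
  }

  @Override
  void writeByte(byte b) {
    delegate.writeByte(b);
  }

  @Override
  long getFilePointer() {
    return delegate.getFilePointer();
  }
}

/** Example subclass: counts bytes while still delegating the actual writes. */
final class CountingOutput extends FilterOutput {
  long count;

  CountingOutput(Output delegate) {
    super(delegate);
  }

  @Override
  void writeByte(byte b) {
    count++;
    super.writeByte(b);
  }
}

/** Simple in-memory Output used as the wrapped delegate. */
final class MemoryOutput extends Output {
  private final java.io.ByteArrayOutputStream bytes = new java.io.ByteArrayOutputStream();

  @Override
  void writeByte(byte b) {
    bytes.write(b);
  }

  @Override
  long getFilePointer() {
    return bytes.size();
  }
}
```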