[GitHub] [lucene] akhgeek30 opened a new issue, #11864: ArrayIndexOutOfBoundException

2022-10-20 Thread GitBox
akhgeek30 opened a new issue, #11864: URL: https://github.com/apache/lucene/issues/11864 ### Description Steps to reproduce 1. Query = abc-ghi 2. Create a synonym file as Synonym.txt = { abc,def ghi,jkl } 3. Schema to be followed managed-schema

[GitHub] [lucene] iverase opened a new pull request, #11865: Fix duplicate entry in CHANGES.txt

2022-10-20 Thread GitBox
iverase opened a new pull request, #11865: URL: https://github.com/apache/lucene/pull/11865 Seems to be a leftover from the last commit. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [lucene] iverase merged pull request #11865: Fix duplicate entry in CHANGES.txt

2022-10-20 Thread GitBox
iverase merged PR #11865: URL: https://github.com/apache/lucene/pull/11865

[GitHub] [lucene] benwtrent commented on a diff in pull request #11860: GITHUB-11830 Better optimize storage for vector connections

2022-10-20 Thread GitBox
benwtrent commented on code in PR #11860: URL: https://github.com/apache/lucene/pull/11860#discussion_r1000640297 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsReader.java: ## @@ -0,0 +1,505 @@ +/* + * Licensed to the Apache Software Foundation (AS

[GitHub] [lucene] mikemccand commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-10-20 Thread GitBox
mikemccand commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r1000886175 ## lucene/misc/src/java/org/apache/lucene/misc/store/ByteTrackingIndexOutput.java: ## @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation (ASF) unde

[GitHub] [lucene] mikemccand commented on pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-10-20 Thread GitBox
mikemccand commented on PR #11796: URL: https://github.com/apache/lucene/pull/11796#issuecomment-1285872091 This looks great to me! I love all the engagement (83+ comments!) and how it iterated to such a simple solution. I left a small comment for a follow-on issue ... and it looks like `

[GitHub] [lucene] NightOwl888 opened a new issue, #11866: On many analyzers, the getDefaultStopSet() method returns a modifiable set, contrary to the docs

2022-10-20 Thread GitBox
NightOwl888 opened a new issue, #11866: URL: https://github.com/apache/lucene/issues/11866 ### Description Several of the analyzers state that they are supposed to return an unmodifiable `CharArraySet`, but the set that is returned is writable, as you can see in the source. h

[GitHub] [lucene] rmuir commented on issue #11866: On many analyzers, the getDefaultStopSet() method returns a modifiable set, contrary to the docs

2022-10-20 Thread GitBox
rmuir commented on issue #11866: URL: https://github.com/apache/lucene/issues/11866#issuecomment-1285890056 The example is not correct. `WordlistLoader.getSnowballWordSet()` returns an unmodifiableSet.

[GitHub] [lucene] benwtrent commented on a diff in pull request #11860: GITHUB-11830 Better optimize storage for vector connections

2022-10-20 Thread GitBox
benwtrent commented on code in PR #11860: URL: https://github.com/apache/lucene/pull/11860#discussion_r1000928799 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsReader.java: ## @@ -0,0 +1,505 @@ +/* + * Licensed to the Apache Software Foundation (AS

[GitHub] [lucene] NightOwl888 commented on issue #11866: On many analyzers, the getDefaultStopSet() method returns a modifiable set, contrary to the docs

2022-10-20 Thread GitBox
NightOwl888 commented on issue #11866: URL: https://github.com/apache/lucene/issues/11866#issuecomment-1285916552 I attempted to modify it, and it is succeeding. ``` SoraniAnalyzer.getDefaultStopSet().Add("foo33") // returns true ```
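The disagreement in this thread comes down to whether a returned set rejects writes. A minimal sketch of that check follows, using only JDK collections rather than Lucene's `CharArraySet`. The class `StopSetDemo`, the helper `isModifiable`, and the sample words are hypothetical names chosen for illustration; the assumption is that an unmodifiable view signals rejection by throwing `UnsupportedOperationException`, as `Collections.unmodifiableSet` does.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class StopSetDemo {
  /**
   * Returns true if the set accepts an add() call without throwing.
   * If the probe was actually inserted, it is removed again so the set is left unchanged.
   */
  static boolean isModifiable(Set<String> set, String probe) {
    try {
      boolean added = set.add(probe);
      if (added) {
        set.remove(probe); // undo the probe insertion
      }
      return true;
    } catch (UnsupportedOperationException e) {
      return false; // the set is a read-only view
    }
  }

  public static void main(String[] args) {
    Set<String> raw = new HashSet<>(Set.of("a", "the"));
    Set<String> wrapped = Collections.unmodifiableSet(raw);
    System.out.println("raw modifiable: " + isModifiable(raw, "foo33"));
    System.out.println("wrapped modifiable: " + isModifiable(wrapped, "foo33"));
  }
}
```

A check like this only says whether the particular instance handed back rejects writes; it does not show which code path produced that instance.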

[GitHub] [lucene] jtibshirani opened a new pull request, #11867: Add monster test that indexes 1M vectors

2022-10-20 Thread GitBox
jtibshirani opened a new pull request, #11867: URL: https://github.com/apache/lucene/pull/11867 This is a rough draft of a large-scale test for kNN vectors. It tests a large dataset of kNN vectors to check for issues that only show up when segments are very large, like overflow. Th
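As a rough illustration of the kind of overflow that only appears at large sizes, the sketch below computes a byte offset into a flat block of float vectors with `int` and with `long` arithmetic. It is not taken from the PR: the class `VectorOffsets`, both method names, and the dimension of 768 are assumptions chosen so that the product exceeds `Integer.MAX_VALUE` before the ordinal reaches one million.

```java
final class VectorOffsets {
  /** Byte offset computed entirely in int arithmetic; wraps around silently once it exceeds 2^31 - 1. */
  static int offsetInt(int ordinal, int dim) {
    return ordinal * dim * Float.BYTES;
  }

  /** Same computation widened to long before multiplying, so it stays exact at these sizes. */
  static long offsetLong(int ordinal, int dim) {
    return (long) ordinal * dim * Float.BYTES;
  }

  public static void main(String[] args) {
    int dim = 768;         // hypothetical vector dimension
    int ordinal = 700_000; // a vector index below one million
    System.out.println("int offset:  " + offsetInt(ordinal, dim));
    System.out.println("long offset: " + offsetLong(ordinal, dim));
  }
}
```

With small ordinals both methods agree, which is why a test of this kind has to index a very large number of vectors before the difference shows up.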

[GitHub] [lucene] rmuir commented on a diff in pull request #11867: Add monster test that indexes 1M vectors

2022-10-20 Thread GitBox
rmuir commented on code in PR #11867: URL: https://github.com/apache/lucene/pull/11867#discussion_r1001074649 ## lucene/core/src/test/org/apache/lucene/document/TestManyKnnVectors.java: ## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or mo

[GitHub] [lucene] rmuir commented on a diff in pull request #11867: Add monster test that indexes 1M vectors

2022-10-20 Thread GitBox
rmuir commented on code in PR #11867: URL: https://github.com/apache/lucene/pull/11867#discussion_r1001077394 ## lucene/core/src/test/org/apache/lucene/document/TestManyKnnVectors.java: ## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or mo

[GitHub] [lucene] rmuir commented on a diff in pull request #11867: Add monster test that indexes 1M vectors

2022-10-20 Thread GitBox
rmuir commented on code in PR #11867: URL: https://github.com/apache/lucene/pull/11867#discussion_r1001089104 ## lucene/core/src/test/org/apache/lucene/document/TestManyKnnVectors.java: ## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or mo

[GitHub] [lucene] rmuir commented on a diff in pull request #11867: Add monster test that indexes 1M vectors

2022-10-20 Thread GitBox
rmuir commented on code in PR #11867: URL: https://github.com/apache/lucene/pull/11867#discussion_r1001090648 ## lucene/core/src/test/org/apache/lucene/document/TestManyKnnVectors.java: ## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or mo

[GitHub] [lucene] rmuir commented on a diff in pull request #11867: Add monster test that indexes 1M vectors

2022-10-20 Thread GitBox
rmuir commented on code in PR #11867: URL: https://github.com/apache/lucene/pull/11867#discussion_r1001183142 ## lucene/core/src/test/org/apache/lucene/document/TestManyKnnVectors.java: ## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or mo

[GitHub] [lucene] jtibshirani commented on a diff in pull request #11867: Add monster test that indexes 1M vectors

2022-10-20 Thread GitBox
jtibshirani commented on code in PR #11867: URL: https://github.com/apache/lucene/pull/11867#discussion_r1001200789 ## lucene/core/src/test/org/apache/lucene/document/TestManyKnnVectors.java: ## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [lucene] rmuir commented on a diff in pull request #11867: Add monster test that indexes 1M vectors

2022-10-20 Thread GitBox
rmuir commented on code in PR #11867: URL: https://github.com/apache/lucene/pull/11867#discussion_r1001206144 ## lucene/core/src/test/org/apache/lucene/document/TestManyKnnVectors.java: ## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or mo

[GitHub] [lucene] rmuir commented on a diff in pull request #11867: Add monster test that indexes 1M vectors

2022-10-20 Thread GitBox
rmuir commented on code in PR #11867: URL: https://github.com/apache/lucene/pull/11867#discussion_r1001212579 ## lucene/core/src/test/org/apache/lucene/document/TestManyKnnVectors.java: ## @@ -61,11 +61,13 @@ @Monster("takes ~2 hours and needs 2GB heap") public class TestManyK

[GitHub] [lucene] jtibshirani commented on a diff in pull request #11867: Add monster test that indexes 1M vectors

2022-10-20 Thread GitBox
jtibshirani commented on code in PR #11867: URL: https://github.com/apache/lucene/pull/11867#discussion_r1001214017 ## lucene/core/src/test/org/apache/lucene/document/TestManyKnnVectors.java: ## @@ -61,11 +61,13 @@ @Monster("takes ~2 hours and needs 2GB heap") public class Tes

[GitHub] [lucene] rmuir commented on a diff in pull request #11867: Add monster test that indexes 1M vectors

2022-10-20 Thread GitBox
rmuir commented on code in PR #11867: URL: https://github.com/apache/lucene/pull/11867#discussion_r1001217406 ## lucene/core/src/test/org/apache/lucene/document/TestManyKnnVectors.java: ## @@ -0,0 +1,131 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or mo

[GitHub] [lucene] rmuir commented on a diff in pull request #11867: Add monster test that indexes 1M vectors

2022-10-20 Thread GitBox
rmuir commented on code in PR #11867: URL: https://github.com/apache/lucene/pull/11867#discussion_r1001224833 ## lucene/core/src/test/org/apache/lucene/document/TestManyKnnVectors.java: ## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or mo

[GitHub] [lucene] rmuir commented on a diff in pull request #11867: Add monster test that indexes 1M vectors

2022-10-20 Thread GitBox
rmuir commented on code in PR #11867: URL: https://github.com/apache/lucene/pull/11867#discussion_r1001226287 ## lucene/core/src/test/org/apache/lucene/document/TestManyKnnVectors.java: ## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or mo

[GitHub] [lucene] rmuir commented on pull request #11867: Add monster test that indexes 1M vectors

2022-10-20 Thread GitBox
rmuir commented on PR #11867: URL: https://github.com/apache/lucene/pull/11867#issuecomment-1286335299 With current test i hit the exception on the 9.4 tag: BUILD FAILED in 2h 24m 45s: 2GB heap. Never saw any significant time (e.g. 0.1%) in GC or other jvm threads when inspecting the run

[GitHub] [lucene] mdmarshmallow commented on pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-10-20 Thread GitBox
mdmarshmallow commented on PR #11796: URL: https://github.com/apache/lucene/pull/11796#issuecomment-1286344459 Thanks Mike, I added an issue to `luceneutil`: https://github.com/mikemccand/luceneutil/issues/208

[GitHub] [lucene] mdmarshmallow opened a new issue, #11868: Add a FilterIndexOutput

2022-10-20 Thread GitBox
mdmarshmallow opened a new issue, #11868: URL: https://github.com/apache/lucene/issues/11868 ### Description We have several subclasses of `IndexOutput` that have delegates, most recently one was added in this PR: https://github.com/apache/lucene/pull/11796. Adding a `FilterIndexOutp
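The general idea of a filter base class can be sketched as follows: one class forwards every call to a delegate, so each wrapper overrides only the methods it cares about. This is an analogy built on a hypothetical `SimpleOutput` abstraction, not Lucene's `IndexOutput` API; every class and method name below is an assumption made for illustration, and the byte-counting wrapper is loosely modeled on the byte-tracking use case mentioned in PR #11796.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical minimal output abstraction (not Lucene's IndexOutput). */
abstract class SimpleOutput {
  abstract void writeByte(byte b);

  void writeBytes(byte[] b, int offset, int length) {
    for (int i = 0; i < length; i++) {
      writeByte(b[offset + i]);
    }
  }
}

/** Filter base class: forwards every call to the delegate unchanged. */
class FilterSimpleOutput extends SimpleOutput {
  protected final SimpleOutput delegate;

  FilterSimpleOutput(SimpleOutput delegate) {
    this.delegate = delegate;
  }

  @Override
  void writeByte(byte b) {
    delegate.writeByte(b);
  }

  @Override
  void writeBytes(byte[] b, int offset, int length) {
    delegate.writeBytes(b, offset, length);
  }
}

/** A wrapper that only adds byte counting on top of the forwarding behavior. */
final class CountingOutput extends FilterSimpleOutput {
  private long bytesWritten;

  CountingOutput(SimpleOutput delegate) {
    super(delegate);
  }

  @Override
  void writeByte(byte b) {
    super.writeByte(b);
    bytesWritten++;
  }

  @Override
  void writeBytes(byte[] b, int offset, int length) {
    super.writeBytes(b, offset, length);
    bytesWritten += length;
  }

  long bytesWritten() {
    return bytesWritten;
  }
}

/** In-memory sink used as the innermost delegate in the usage example. */
final class InMemoryOutput extends SimpleOutput {
  private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

  @Override
  void writeByte(byte b) {
    buffer.write(b);
  }

  int size() {
    return buffer.size();
  }
}
```

Wrapping `new InMemoryOutput()` in a `CountingOutput` and writing through the wrapper leaves the sink's contents unchanged while the wrapper records how many bytes passed through it.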

[GitHub] [lucene] mdmarshmallow commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-10-20 Thread GitBox
mdmarshmallow commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r1001276262 ## lucene/misc/src/java/org/apache/lucene/misc/store/ByteTrackingIndexOutput.java: ## @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation (ASF) u

[GitHub] [lucene] MarcusSorealheis commented on pull request #874: LUCENE-10471 Increse max dims for vectors to 2048

2022-10-20 Thread GitBox
MarcusSorealheis commented on PR #874: URL: https://github.com/apache/lucene/pull/874#issuecomment-1286509849 Should we punish and exclude customers who cannot complete the requisite steps of dimensionality reduction, or allow them to explore with very expensive compute? Many popular large language

[GitHub] [lucene] JavaCoderCff closed pull request #271: LUCENE-9969:TaxoArrays, a member variable of the DirectoryTaxonomyReader class, i…

2022-10-20 Thread GitBox
JavaCoderCff closed pull request #271: LUCENE-9969:TaxoArrays, a member variable of the DirectoryTaxonomyReader class, i… URL: https://github.com/apache/lucene/pull/271