[GitHub] [lucene] rmuir commented on issue #11676: Can TimeLimitingBulkScorer exponentially grow the window size? [LUCENE-10640]

2022-10-31 Thread GitBox
rmuir commented on issue #11676: URL: https://github.com/apache/lucene/issues/11676#issuecomment-1296934205 Why so quick to jump to wall time? Please, no wall time, for any reason whatsoever. Surely, nanoTime can be used. -- This is an automated message from the Apache Git Service.

[GitHub] [lucene] xingdong015 commented on issue #11887: TestDocumentsWriterStallControl takes over a minute with -Ptests.seed=B83F4990EF501F47

2022-10-31 Thread GitBox
xingdong015 commented on issue #11887: URL: https://github.com/apache/lucene/issues/11887#issuecomment-1296990074 It looks like gradle initialization takes a lot of time ![image](https://user-images.githubusercontent.com/11306681/199004045-2d675f82-ffdd-43c8-9ce6-0197681a7b6b.png)

[GitHub] [lucene] donnerpeter commented on a diff in pull request #11893: hunspell: allow for faster dictionary iteration during 'suggest' by using more memory (opt-in)

2022-10-31 Thread GitBox
donnerpeter commented on code in PR #11893: URL: https://github.com/apache/lucene/pull/11893#discussion_r1009428049 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/GeneratingSuggester.java: ## @@ -60,7 +64,11 @@ private List>> findSimilarDictionaryEntries

[GitHub] [lucene] donnerpeter commented on a diff in pull request #11893: hunspell: allow for faster dictionary iteration during 'suggest' by using more memory (opt-in)

2022-10-31 Thread GitBox
donnerpeter commented on code in PR #11893: URL: https://github.com/apache/lucene/pull/11893#discussion_r1009429083 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/GeneratingSuggester.java: ## @@ -70,10 +78,10 @@ char transformChar(char c) { }

[GitHub] [lucene] donnerpeter commented on a diff in pull request #11893: hunspell: allow for faster dictionary iteration during 'suggest' by using more memory (opt-in)

2022-10-31 Thread GitBox
donnerpeter commented on code in PR #11893: URL: https://github.com/apache/lucene/pull/11893#discussion_r1009429597 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/GeneratingSuggester.java: ## @@ -87,45 +95,31 @@ char transformChar(char c) { s

[GitHub] [lucene] donnerpeter commented on a diff in pull request #11893: hunspell: allow for faster dictionary iteration during 'suggest' by using more memory (opt-in)

2022-10-31 Thread GitBox
donnerpeter commented on code in PR #11893: URL: https://github.com/apache/lucene/pull/11893#discussion_r1009431207 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/WordStorage.java: ## @@ -179,11 +186,7 @@ void processSuggestibleWords( }

[GitHub] [lucene] donnerpeter commented on a diff in pull request #11893: hunspell: allow for faster dictionary iteration during 'suggest' by using more memory (opt-in)

2022-10-31 Thread GitBox
donnerpeter commented on code in PR #11893: URL: https://github.com/apache/lucene/pull/11893#discussion_r1009432252 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/WordStorage.java: ## @@ -54,7 +55,8 @@ class WordStorage { private static final int COLLI

[GitHub] [lucene] donnerpeter commented on a diff in pull request #11893: hunspell: allow for faster dictionary iteration during 'suggest' by using more memory (opt-in)

2022-10-31 Thread GitBox
donnerpeter commented on code in PR #11893: URL: https://github.com/apache/lucene/pull/11893#discussion_r1009433258 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/WordStorage.java: ## @@ -54,7 +55,8 @@ class WordStorage { private static final int COLLI

[GitHub] [lucene] donnerpeter commented on pull request #11893: hunspell: allow for faster dictionary iteration during 'suggest' by using more memory (opt-in)

2022-10-31 Thread GitBox
donnerpeter commented on PR #11893: URL: https://github.com/apache/lucene/pull/11893#issuecomment-1297113922 With the cache, about 2x memory is used (~850MB for ~190 dictionaries). The caching gives me about 1.5x speedup for en/ru/de.

[GitHub] [lucene] donnerpeter commented on a diff in pull request #11893: hunspell: allow for faster dictionary iteration during 'suggest' by using more memory (opt-in)

2022-10-31 Thread GitBox
donnerpeter commented on code in PR #11893: URL: https://github.com/apache/lucene/pull/11893#discussion_r1009435738 ## lucene/analysis/common/src/test/org/apache/lucene/analysis/hunspell/TestPerformance.java: ## @@ -86,7 +86,7 @@ public void de() throws Exception { @Test

[GitHub] [lucene] benwtrent commented on issue #10665: Benchmark KNN search with ann-benchmarks [LUCENE-9625]

2022-10-31 Thread GitBox
benwtrent commented on issue #10665: URL: https://github.com/apache/lucene/issues/10665#issuecomment-1297146958 I opened a PR for ann-benchmarks: https://github.com/erikbern/ann-benchmarks/pull/315 I tested PyLucene locally, comparing it to @msokolov's "batch" methodology (writing to

[GitHub] [lucene] jtibshirani commented on issue #10665: Benchmark KNN search with ann-benchmarks [LUCENE-9625]

2022-10-31 Thread GitBox
jtibshirani commented on issue #10665: URL: https://github.com/apache/lucene/issues/10665#issuecomment-1297299587 Thanks @benwtrent, it's great to see that PyLucene works well and has low overhead! It feels more solid than what we were doing before. +1 to preparing a new version. As I

[GitHub] [lucene] vigyasharma commented on issue #11676: Can TimeLimitingBulkScorer exponentially grow the window size? [LUCENE-10640]

2022-10-31 Thread GitBox
vigyasharma commented on issue #11676: URL: https://github.com/apache/lucene/issues/11676#issuecomment-1297303476 Sorry, I meant it'll add dependence on `nanoTime()`. I thought we use wallTime to refer to both `currentTimeInMillis` and `nanoTime`. If nanotime is acceptable, I can use

[GitHub] [lucene] reta commented on pull request #11875: Usability improvements for timeout support in IndexSearcher

2022-10-31 Thread GitBox
reta commented on PR #11875: URL: https://github.com/apache/lucene/pull/11875#issuecomment-1297316017 > Looks good to me. I'll wait for a few days before merging, in case people have comments/concerns with the public visibility for `TimeLimitingBulkScorer` Thanks a lot @vigyasharma !

[GitHub] [lucene] rmuir commented on issue #11676: Can TimeLimitingBulkScorer exponentially grow the window size? [LUCENE-10640]

2022-10-31 Thread GitBox
rmuir commented on issue #11676: URL: https://github.com/apache/lucene/issues/11676#issuecomment-1297328617 > Sorry, I meant it'll add dependence on `nanoTime()`. I thought we use wallTime to refer to both `currentTimeInMillis` and `nanoTime`. nanoTime (at least on linux) uses the mon
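
The distinction raised in this thread can be sketched in Java. `System.nanoTime()` is documented as usable only for measuring elapsed time and as unrelated to wall-clock time, so a timeout should compare differences between readings rather than absolute values. The class below is a hypothetical illustration, not Lucene's `TimeLimitingBulkScorer`: the name `NanoDeadlineSketch`, the method `scoreWithGrowingWindows`, and the doubling policy with a `maxWindow` cap are all assumptions, made only to show one way a window could grow exponentially while the deadline is checked between windows.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch; names and policy are assumptions, not Lucene code. */
final class NanoDeadlineSketch {
  private final LongSupplier nanoClock; // e.g. System::nanoTime (monotonic, not wall time)
  private final long startNanos;
  private final long budgetNanos;

  NanoDeadlineSketch(LongSupplier nanoClock, long budgetNanos) {
    this.nanoClock = nanoClock;
    this.startNanos = nanoClock.getAsLong();
    this.budgetNanos = budgetNanos;
  }

  /** True once the elapsed time since construction reaches the budget. */
  boolean expired() {
    // Subtract first: nanoTime values have an arbitrary origin and may wrap,
    // so only the difference between two readings is meaningful.
    return nanoClock.getAsLong() - startNanos >= budgetNanos;
  }

  /**
   * Walks doc ids [0, maxDoc) in windows that double in size up to maxWindow,
   * checking the deadline once before each window. Returns how many docs
   * were covered before finishing or timing out.
   */
  static int scoreWithGrowingWindows(
      int maxDoc, int initialWindow, int maxWindow, NanoDeadlineSketch deadline) {
    int doc = 0;
    int window = initialWindow;
    while (doc < maxDoc) {
      if (deadline.expired()) {
        break; // stop between windows; a real scorer might signal a timeout instead
      }
      int end = (int) Math.min((long) doc + window, maxDoc);
      doc = end; // stand-in for scoring the docs in [doc, end)
      window = (int) Math.min(2L * window, maxWindow); // exponential growth, capped
    }
    return doc;
  }

  public static void main(String[] args) {
    NanoDeadlineSketch deadline = new NanoDeadlineSketch(System::nanoTime, 50_000_000L);
    int processed = scoreWithGrowingWindows(1_000_000, 128, 1 << 16, deadline);
    System.out.println("covered " + processed + " docs before finishing or timing out");
  }
}
```

Passing the clock as a `LongSupplier` is a design choice for this sketch: production code can supply `System::nanoTime`, while a test can supply a fake clock and avoid sleeping.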

[GitHub] [lucene] rmuir commented on issue #11887: TestDocumentsWriterStallControl takes over a minute with -Ptests.seed=B83F4990EF501F47

2022-10-31 Thread GitBox
rmuir commented on issue #11887: URL: https://github.com/apache/lucene/issues/11887#issuecomment-1297341214 I don't think a profiler is helpful because the test is not doing anything except sleeping on `Object.wait`. I used `jstack` while the test was hung: ``` "TEST-TestDocumentsWriter

[GitHub] [lucene] rmuir commented on issue #11887: TestDocumentsWriterStallControl takes over a minute with -Ptests.seed=B83F4990EF501F47

2022-10-31 Thread GitBox
rmuir commented on issue #11887: URL: https://github.com/apache/lucene/issues/11887#issuecomment-1297345527 It runs much faster with this patch: ``` --- a/lucene/core/src/test/org/apache/lucene/index/TestDocumentsWriterStallControl.java +++ b/lucene/core/src/test/org/apache/lucene/

[GitHub] [lucene] dweiss commented on a diff in pull request #11893: hunspell: allow for faster dictionary iteration during 'suggest' by using more memory (opt-in)

2022-10-31 Thread GitBox
dweiss commented on code in PR #11893: URL: https://github.com/apache/lucene/pull/11893#discussion_r1009622594 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Hunspell.java: ## @@ -647,8 +671,23 @@ Root findStem( if (!hasGoodSuggestions && dictionar

[GitHub] [lucene] ovalhub commented on issue #10665: Benchmark KNN search with ann-benchmarks [LUCENE-9625]

2022-10-31 Thread GitBox
ovalhub commented on issue #10665: URL: https://github.com/apache/lucene/issues/10665#issuecomment-1297370722 On Mon, 31 Oct 2022, Benjamin Trent wrote: > I tested PyLucene locally, comparing it to @msokolov's "batch" methodology > (writing to disk and spinning up a Java proces

[GitHub] [lucene] rmuir commented on issue #11887: TestDocumentsWriterStallControl takes over a minute with -Ptests.seed=B83F4990EF501F47

2022-10-31 Thread GitBox
rmuir commented on issue #11887: URL: https://github.com/apache/lucene/issues/11887#issuecomment-1297392725 The condition where this test takes minutes isn't that rare. I ran the test 10 times and hit the slow condition in 3 executions: * 151s * 158s * 32s With the pat

[GitHub] [lucene] rmuir opened a new pull request, #11894: Tone down TestDocumentsWriterStallControl.testRandom, so it does not take minutes

2022-10-31 Thread GitBox
rmuir opened a new pull request, #11894: URL: https://github.com/apache/lucene/pull/11894 The current test has ~ minute runtimes approximately 30% of the time. Closes #11887

[GitHub] [lucene] benwtrent commented on issue #10665: Benchmark KNN search with ann-benchmarks [LUCENE-9625]

2022-10-31 Thread GitBox
benwtrent commented on issue #10665: URL: https://github.com/apache/lucene/issues/10665#issuecomment-1297430223 @ovalhub `numpy` collections are already native. To use them, I have to pull them into python collections and then cast them to be native again. Example: ``` X = X.to

[GitHub] [lucene] benwtrent commented on issue #10665: Benchmark KNN search with ann-benchmarks [LUCENE-9625]

2022-10-31 Thread GitBox
benwtrent commented on issue #10665: URL: https://github.com/apache/lucene/issues/10665#issuecomment-1297440415 > It'd also be great to compare the results against hnswlib as part of the submission. We can double-check that recall is the same for a given set of parameters. This would give c

[GitHub] [lucene] ovalhub commented on issue #10665: Benchmark KNN search with ann-benchmarks [LUCENE-9625]

2022-10-31 Thread GitBox
ovalhub commented on issue #10665: URL: https://github.com/apache/lucene/issues/10665#issuecomment-1297463120 On Mon, 31 Oct 2022, Benjamin Trent wrote: > @ovalhub `numpy` collections are already native. To use them, I have to > pull them into python collections and then cast t

[GitHub] [lucene] donnerpeter commented on a diff in pull request #11893: hunspell: allow for faster dictionary iteration during 'suggest' by using more memory (opt-in)

2022-10-31 Thread GitBox
donnerpeter commented on code in PR #11893: URL: https://github.com/apache/lucene/pull/11893#discussion_r1009734992 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Hunspell.java: ## @@ -72,10 +77,29 @@ public Hunspell(Dictionary dictionary) { * or

[GitHub] [lucene] benwtrent commented on a diff in pull request #11860: GITHUB-11830 Better optimize storage for vector connections

2022-10-31 Thread GitBox
benwtrent commented on code in PR #11860: URL: https://github.com/apache/lucene/pull/11860#discussion_r1009762545 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsReader.java: ## @@ -0,0 +1,505 @@ +/* + * Licensed to the Apache Software Foundation (AS

[GitHub] [lucene] jtibshirani commented on a diff in pull request #11860: GITHUB-11830 Better optimize storage for vector connections

2022-10-31 Thread GitBox
jtibshirani commented on code in PR #11860: URL: https://github.com/apache/lucene/pull/11860#discussion_r1009765612 ## lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene94/package-info.java: ## @@ -0,0 +1,422 @@ +/* + * Licensed to the Apache Software Found

[GitHub] [lucene] jtibshirani commented on pull request #239: LUCENE-10040: Handle deletions in nearest vector search

2022-10-31 Thread GitBox
jtibshirani commented on PR #239: URL: https://github.com/apache/lucene/pull/239#issuecomment-1297711772 @harishankar-gopalan sorry for the slow response! Your overall understanding is right. In Lucene, deletions are handled by marking a document as deleted using a 'tombstone'. The index st
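
The tombstone idea mentioned above can be illustrated with a small Java sketch. It is not Lucene's API and not the approach taken in the PR: the class name `TombstoneSketch`, the use of `java.util.BitSet`, and the brute-force scan (in place of a graph-based search) are assumptions chosen to show only the principle that a deleted document stays in storage and is skipped at search time.

```java
import java.util.BitSet;

/** Hypothetical sketch of tombstone-style deletes; not Lucene code. */
final class TombstoneSketch {
  private final float[][] vectors; // stored vectors, never physically removed here
  private final BitSet deleted;    // one tombstone bit per doc id

  TombstoneSketch(float[][] vectors) {
    this.vectors = vectors;
    this.deleted = new BitSet(vectors.length);
  }

  /** Marks a document as deleted without touching the stored vector. */
  void delete(int docId) {
    deleted.set(docId);
  }

  /**
   * Returns the id of the closest non-deleted vector by squared Euclidean
   * distance, or -1 if every document is deleted.
   */
  int nearestLive(float[] query) {
    int best = -1;
    float bestDist = Float.POSITIVE_INFINITY;
    for (int doc = 0; doc < vectors.length; doc++) {
      if (deleted.get(doc)) {
        continue; // tombstoned: still stored, but filtered out of results
      }
      float dist = 0f;
      for (int i = 0; i < query.length; i++) {
        float diff = vectors[doc][i] - query[i];
        dist += diff * diff;
      }
      if (dist < bestDist) {
        bestDist = dist;
        best = doc;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    TombstoneSketch index =
        new TombstoneSketch(new float[][] {{0f, 0f}, {1f, 1f}, {5f, 5f}});
    float[] query = {0.9f, 0.9f};
    System.out.println("nearest before delete: " + index.nearestLive(query));
    index.delete(1);
    System.out.println("nearest after delete:  " + index.nearestLive(query));
  }
}
```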

[GitHub] [lucene] jpountz commented on pull request #11875: Usability improvements for timeout support in IndexSearcher

2022-10-31 Thread GitBox
jpountz commented on PR #11875: URL: https://github.com/apache/lucene/pull/11875#issuecomment-1298076313 Sorry for the lag, I'm on vacation. The problem with "this class may be useful outside of Lucene" to me is that it could apply to any class in Lucene. We did indeed make some classe