[GitHub] [lucene] shubhamvishu commented on pull request #12183: Make some heavy query rewrites concurrent

2023-09-12 Thread via GitHub
shubhamvishu commented on PR #12183: URL: https://github.com/apache/lucene/pull/12183#issuecomment-1716957965 @jpountz I have made some changes to `TermStates#build` to unblock this PR and avoid the deadlock caused by the executor forking into itself, by checking if it's a `Thre

[GitHub] [lucene] Tony-X closed issue #12536: Remove `lastPosBlockOffset` from term metadata for Lucene90PostingsFormat

2023-09-12 Thread via GitHub
Tony-X closed issue #12536: Remove `lastPosBlockOffset` from term metadata for Lucene90PostingsFormat URL: https://github.com/apache/lucene/issues/12536 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [lucene] Tony-X commented on issue #12536: Remove `lastPosBlockOffset` from term metadata for Lucene90PostingsFormat

2023-09-12 Thread via GitHub
Tony-X commented on issue #12536: URL: https://github.com/apache/lucene/issues/12536#issuecomment-1716406470 https://github.com/apache/lucene/pull/12541 is merged and I'll close this one

[GitHub] [lucene] Tony-X commented on a diff in pull request #12552: Make FSTPostingsFormat load FSTs off-heap

2023-09-12 Thread via GitHub
Tony-X commented on code in PR #12552: URL: https://github.com/apache/lucene/pull/12552#discussion_r1323531587 ## lucene/codecs/src/java/org/apache/lucene/codecs/memory/FSTTermsReader.java: ## @@ -191,7 +193,9 @@ final class TermsReader extends Terms { this.sumTotalTermFr

[GitHub] [lucene] msokolov commented on a diff in pull request #12552: Make FSTPostingsFormat load FSTs off-heap

2023-09-12 Thread via GitHub
msokolov commented on code in PR #12552: URL: https://github.com/apache/lucene/pull/12552#discussion_r1323494538 ## lucene/codecs/src/java/org/apache/lucene/codecs/memory/FSTTermsReader.java: ## @@ -191,7 +193,9 @@ final class TermsReader extends Terms { this.sumTotalTerm

[GitHub] [lucene] Tony-X opened a new pull request, #12552: Make FSTPostingsFormat load FSTs off-heap

2023-09-12 Thread via GitHub
Tony-X opened a new pull request, #12552: URL: https://github.com/apache/lucene/pull/12552 ### Description FSTs have supported off-heap loading for a while. As we were trying to use `FSTPostingsFormat` for some fields, we noticed that heap usage went up. Upon further investigation we reali

[GitHub] [lucene] jimczi opened a new pull request, #12551: Introduce dynamic segment efSearch to Knn{Byte|Float}VectorQuery

2023-09-12 Thread via GitHub
jimczi opened a new pull request, #12551: URL: https://github.com/apache/lucene/pull/12551 This PR introduces a new parameter known as 'efSearch' to the knn vector query. 'efSearch' governs the maximum size of the priority queue employed for nearest neighbor searches. As each segment may co

[GitHub] [lucene] jpountz merged pull request #12490: Reduce the overhead of ImpactsDISI.

2023-09-12 Thread via GitHub
jpountz merged PR #12490: URL: https://github.com/apache/lucene/pull/12490

[GitHub] [lucene] jimczi merged pull request #12529: Introduce a random vector scorer in HNSW builder/searcher

2023-09-12 Thread via GitHub
jimczi merged PR #12529: URL: https://github.com/apache/lucene/pull/12529

[GitHub] [lucene] mikemccand commented on pull request #12541: Document why we need `lastPosBlockOffset`

2023-09-12 Thread via GitHub
mikemccand commented on PR #12541: URL: https://github.com/apache/lucene/pull/12541#issuecomment-1715559983 I backported to 9.x as well ... annoying that GitHub doesn't state in summary that the above push was to 9.x (it's only reflected here because it referenced this PR). It does reflect

[GitHub] [lucene] mikemccand merged pull request #12541: Document why we need `lastPosBlockOffset`

2023-09-12 Thread via GitHub
mikemccand merged PR #12541: URL: https://github.com/apache/lucene/pull/12541

[GitHub] [lucene] uschindler commented on pull request #12460: Allow reading binary doc values as a DataInput

2023-09-12 Thread via GitHub
uschindler commented on PR #12460: URL: https://github.com/apache/lucene/pull/12460#issuecomment-1715550666 To save more memory copies, the codec may use a slice from the underlying IndexInput directly to support both access APIs. All file pointer checks would then be performed by the low l

[GitHub] [lucene] jpountz commented on a diff in pull request #12529: Introduce a random vector scorer in HNSW builder/searcher

2023-09-12 Thread via GitHub
jpountz commented on code in PR #12529: URL: https://github.com/apache/lucene/pull/12529#discussion_r1322897603 ## lucene/core/src/java/org/apache/lucene/util/hnsw/RandomVectorScorerProvider.java: ## @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [lucene] uschindler commented on pull request #12460: Allow reading binary doc values as a DataInput

2023-09-12 Thread via GitHub
uschindler commented on PR #12460: URL: https://github.com/apache/lucene/pull/12460#issuecomment-1715514900 > This has been a challenge so many times in the past, maybe it's time to add `seek()` support to `DataInput`? We have full random access (positional reads), if you extend the i

[GitHub] [lucene] stefanvodita commented on pull request #12337: Index arbitrary fields in taxonomy docs

2023-09-12 Thread via GitHub
stefanvodita commented on PR #12337: URL: https://github.com/apache/lucene/pull/12337#issuecomment-1715512722 Thank you for the review @mikemccand! I’ve integrated your feedback. Updatable doc values are definitely something to consider. For comparison, I coded up an [association facet fi

[GitHub] [lucene] stefanvodita commented on a diff in pull request #12337: Index arbitrary fields in taxonomy docs

2023-09-12 Thread via GitHub
stefanvodita commented on code in PR #12337: URL: https://github.com/apache/lucene/pull/12337#discussion_r1322872602 ## lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyIndexReader.java: ## @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software

[GitHub] [lucene] jimczi commented on pull request #12529: Introduce a random vector scorer in HNSW builder/searcher

2023-09-12 Thread via GitHub
jimczi commented on PR #12529: URL: https://github.com/apache/lucene/pull/12529#issuecomment-1715484871 Given that no further concerns have been raised, I am intending to merge this change soon.

[GitHub] [lucene] jpountz commented on pull request #12490: Reduce the overhead of ImpactsDISI.

2023-09-12 Thread via GitHub
jpountz commented on PR #12490: URL: https://github.com/apache/lucene/pull/12490#issuecomment-1715453502 Another benchmark run on the last commit to make sure it still works as expected, and wikibigall this time instead of wikimedium10m: ``` TaskQPS base

[GitHub] [lucene] stefanvodita closed pull request #12550: [Demo] Per label association facet fields

2023-09-12 Thread via GitHub
stefanvodita closed pull request #12550: [Demo] Per label association facet fields URL: https://github.com/apache/lucene/pull/12550

[GitHub] [lucene] stefanvodita commented on pull request #12550: [Demo] Per label association facet fields

2023-09-12 Thread via GitHub
stefanvodita commented on PR #12550: URL: https://github.com/apache/lucene/pull/12550#issuecomment-1715245714 Cancelling right away, this is not meant to be merged.

[GitHub] [lucene] stefanvodita opened a new pull request, #12550: [Demo] Per label association facet fields

2023-09-12 Thread via GitHub
stefanvodita opened a new pull request, #12550: URL: https://github.com/apache/lucene/pull/12550 ### Description A user could have data about facet labels. In the demo here, we record an author's popularity score, with authors being facet labels in an index of books. Today, use

[GitHub] [lucene] jpountz commented on pull request #12460: Allow reading binary doc values as a DataInput

2023-09-12 Thread via GitHub
jpountz commented on PR #12460: URL: https://github.com/apache/lucene/pull/12460#issuecomment-1715238722 > I think this approach defeats one of the main purposes for this change, that is to avoid allocating a byte array when reading doc values. I don't think we want BinaryDocValues to do tha

[GitHub] [lucene] iverase commented on pull request #12460: Allow reading binary doc values as a DataInput

2023-09-12 Thread via GitHub
iverase commented on PR #12460: URL: https://github.com/apache/lucene/pull/12460#issuecomment-1715224914 > I'm contemplating not introducing a new DataInputDocValues class, and instead have a dataInput() method on BinaryDocValues I think this approach defeats one of the main purposes f

[GitHub] [lucene] jpountz commented on a diff in pull request #12549: Run merge-on-full-flush even though no changes got flushed.

2023-09-12 Thread via GitHub
jpountz commented on code in PR #12549: URL: https://github.com/apache/lucene/pull/12549#discussion_r1322599113 ## lucene/core/src/test/org/apache/lucene/index/TestIndexWriterDelete.java: ## @@ -1315,7 +1315,8 @@ public void testTryDeleteDocument() throws Exception { w.addD

[GitHub] [lucene] jpountz commented on a diff in pull request #12549: Run merge-on-full-flush even though no changes got flushed.

2023-09-12 Thread via GitHub
jpountz commented on code in PR #12549: URL: https://github.com/apache/lucene/pull/12549#discussion_r1322592471 ## lucene/core/src/test/org/apache/lucene/index/TestIndexWriter.java: ## @@ -518,11 +518,10 @@ public void testFlushWithNoMerging() throws IOException { doc.add(n

[GitHub] [lucene] jpountz commented on pull request #12460: Allow reading binary doc values as a DataInput

2023-09-12 Thread via GitHub
jpountz commented on PR #12460: URL: https://github.com/apache/lucene/pull/12460#issuecomment-1715126194 The more I think of this change, the more I like it: most of the time, you would need to read data out of binary doc values, e.g. (variable-length) integers, strings, etc. and exposing b