[GitHub] [lucene] rmuir commented on issue #11869: Add RangeOnRangeFacetCounts

2023-01-17 Thread GitBox
rmuir commented on issue #11869: URL: https://github.com/apache/lucene/issues/11869#issuecomment-1385822481 Closing as the PR has been merged and is in the 9.5.0 section of CHANGES.txt -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [lucene] rmuir commented on issue #11795: Add FilterDirectory to track write amplification factor

2023-01-17 Thread GitBox
rmuir commented on issue #11795: URL: https://github.com/apache/lucene/issues/11795#issuecomment-1385823162 Closing as the PR has been merged and is in the 9.5.0 section of CHANGES.txt

[GitHub] [lucene] rmuir closed issue #11795: Add FilterDirectory to track write amplification factor

2023-01-17 Thread GitBox
rmuir closed issue #11795: Add FilterDirectory to track write amplification factor URL: https://github.com/apache/lucene/issues/11795

[GitHub] [lucene] rmuir closed issue #11869: Add RangeOnRangeFacetCounts

2023-01-17 Thread GitBox
rmuir closed issue #11869: Add RangeOnRangeFacetCounts URL: https://github.com/apache/lucene/issues/11869

[GitHub] [lucene] gsmiller opened a new pull request, #12089: [DRAFT] Explore TermInSet Query that "self optimizes"

2023-01-17 Thread GitBox
gsmiller opened a new pull request, #12089: URL: https://github.com/apache/lucene/pull/12089 ### Description This is a DRAFT PR to sketch out the idea of a "self optimizing" TermInSetQuery. The idea is to build on the new `KeywordField` being proposed in #12054, which indexes both po

[GitHub] [lucene] gsmiller commented on pull request #12054: Introduce a new `KeywordField`.

2023-01-17 Thread GitBox
gsmiller commented on PR #12054: URL: https://github.com/apache/lucene/pull/12054#issuecomment-1385952712 Somewhat related to this PR, I've been experimenting with the idea of a "self optimizing" `TermInSetQuery` implementation that toggles between using postings and doc values based on ind

[GitHub] [lucene] rmuir commented on pull request #12089: [DRAFT] Explore TermInSet Query that "self optimizes"

2023-01-17 Thread GitBox
rmuir commented on PR #12089: URL: https://github.com/apache/lucene/pull/12089#issuecomment-1385954675 Thanks for looking at this. I can alter benchmark from #12087 to test this case, honestly we could even just take the benchmark and index the numeric field as a string instead as a start :

[GitHub] [lucene] rmuir commented on pull request #12089: [DRAFT] Explore TermInSet Query that "self optimizes"

2023-01-17 Thread GitBox
rmuir commented on PR #12089: URL: https://github.com/apache/lucene/pull/12089#issuecomment-1385982331 I modified the benchmark from #12087 to just use StringField instead of IntField. The queries are supposed to be "hard" in that I'm not trying to benchmark what is necessarily typical, ins

[GitHub] [lucene] rmuir commented on a diff in pull request #12089: [DRAFT] Explore TermInSet Query that "self optimizes"

2023-01-17 Thread GitBox
rmuir commented on code in PR #12089: URL: https://github.com/apache/lucene/pull/12089#discussion_r1072830614 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/queries/TermInSetQuery.java: ## @@ -0,0 +1,527 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [lucene] gsmiller commented on a diff in pull request #12089: [DRAFT] Explore TermInSet Query that "self optimizes"

2023-01-17 Thread GitBox
gsmiller commented on code in PR #12089: URL: https://github.com/apache/lucene/pull/12089#discussion_r1072835306 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/queries/TermInSetQuery.java: ## @@ -0,0 +1,527 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [lucene] gsmiller commented on pull request #12089: [DRAFT] Explore TermInSet Query that "self optimizes"

2023-01-17 Thread GitBox
gsmiller commented on PR #12089: URL: https://github.com/apache/lucene/pull/12089#issuecomment-1386068784 @rmuir > I was naively thinking to try the same approach with the DocValuesTermsQuery that is in sandbox... I think that's probably a good place to start honestly. I was th

[GitHub] [lucene] rmuir commented on a diff in pull request #12089: [DRAFT] Explore TermInSet Query that "self optimizes"

2023-01-17 Thread GitBox
rmuir commented on code in PR #12089: URL: https://github.com/apache/lucene/pull/12089#discussion_r1072841550 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/queries/TermInSetQuery.java: ## @@ -0,0 +1,527 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [lucene] rmuir commented on a diff in pull request #12089: [DRAFT] Explore TermInSet Query that "self optimizes"

2023-01-17 Thread GitBox
rmuir commented on code in PR #12089: URL: https://github.com/apache/lucene/pull/12089#discussion_r1072855208 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/queries/TermInSetQuery.java: ## @@ -0,0 +1,527 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [lucene] gsmiller commented on a diff in pull request #12089: [DRAFT] Explore TermInSet Query that "self optimizes"

2023-01-17 Thread GitBox
gsmiller commented on code in PR #12089: URL: https://github.com/apache/lucene/pull/12089#discussion_r1072871141 ## lucene/core/src/java/org/apache/lucene/search/DisiWrapper.java: ## @@ -57,4 +57,14 @@ public DisiWrapper(Scorer scorer) { matchCost = 0f; } } + + p

[GitHub] [lucene] gsmiller commented on a diff in pull request #12089: [DRAFT] Explore TermInSet Query that "self optimizes"

2023-01-17 Thread GitBox
gsmiller commented on code in PR #12089: URL: https://github.com/apache/lucene/pull/12089#discussion_r1072872477 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/queries/TermInSetQuery.java: ## @@ -0,0 +1,527 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [lucene] gsmiller commented on a diff in pull request #12089: [DRAFT] Explore TermInSet Query that "self optimizes"

2023-01-17 Thread GitBox
gsmiller commented on code in PR #12089: URL: https://github.com/apache/lucene/pull/12089#discussion_r1072874867 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/queries/TermInSetQuery.java: ## @@ -0,0 +1,527 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [lucene] mulugetam opened a new issue, #12090: Building a Lucene posting format that leverages the Java Vector API

2023-01-17 Thread GitBox
mulugetam opened a new issue, #12090: URL: https://github.com/apache/lucene/issues/12090 ### Description This issue is to start a conversation on implementing a vectorized encoding and decoding scheme for postings. A few months ago, we implemented vectorized integer compressio

[GitHub] [lucene] mulugetam opened a new issue, #12091: Speeding up Lucene Vector Similarity through the Java Vector API

2023-01-17 Thread GitBox
mulugetam opened a new issue, #12091: URL: https://github.com/apache/lucene/issues/12091 ### Description Lucene's implementation of ANN relies on a scalar implementation of the vector similarity functions [dot-product,](https://github.com/apache/lucene/blob/4fe8424925ca404d335fa41d26

[GitHub] [lucene] jebnix commented on issue #11870: Create a Markdown based documentation

2023-01-17 Thread GitBox
jebnix commented on issue #11870: URL: https://github.com/apache/lucene/issues/11870#issuecomment-1386297416 @uschindler That's nice, but I personally miss two things about the Lucene repo: 1. The ability to find the documentation in a central place (that makes the contribution much easi

[GitHub] [lucene] vigyasharma opened a new pull request, #12092: Remove UTF8TaxonomyWriterCache

2023-01-17 Thread GitBox
vigyasharma opened a new pull request, #12092: URL: https://github.com/apache/lucene/pull/12092 As per the discussion in PR #12013, this change removes the never-evicting `UTF8TaxonomyWriterCache` and uses `LruTaxonomyWriterCache` as the default taxonomy writer cache implementation.

[GitHub] [lucene] vigyasharma commented on pull request #12013: Clear thread local values on UTF8TaxonomyWriterCache.close()

2023-01-17 Thread GitBox
vigyasharma commented on PR #12013: URL: https://github.com/apache/lucene/pull/12013#issuecomment-1386545577 Created a separate PR - #12092 to remove support for `UTF8TaxonomyWriterCache` from main. Will close this PR.

[GitHub] [lucene] vigyasharma closed pull request #12013: Clear thread local values on UTF8TaxonomyWriterCache.close()

2023-01-17 Thread GitBox
vigyasharma closed pull request #12013: Clear thread local values on UTF8TaxonomyWriterCache.close() URL: https://github.com/apache/lucene/pull/12013

[GitHub] [lucene] vigyasharma merged pull request #12045: fix typo in KoreanNumberFilter

2023-01-17 Thread GitBox
vigyasharma merged PR #12045: URL: https://github.com/apache/lucene/pull/12045

[GitHub] [lucene] vigyasharma opened a new pull request, #12093: Deprecate support for UTF8TaxonomyWriterCache

2023-01-17 Thread GitBox
vigyasharma opened a new pull request, #12093: URL: https://github.com/apache/lucene/pull/12093 As discussed in PR #12013, deprecating support for `UTF8TaxonomyWriterCache` in branch_9x. Addresses #12000

[GitHub] [lucene] vigyasharma commented on pull request #12013: Clear thread local values on UTF8TaxonomyWriterCache.close()

2023-01-17 Thread GitBox
vigyasharma commented on PR #12013: URL: https://github.com/apache/lucene/pull/12013#issuecomment-1386565076 PR - https://github.com/apache/lucene/pull/12093 to deprecate `UTF8TaxonomyWriterCache` in 9.x