[GitHub] [lucene] jpountz opened a new pull request, #12114: Use radix sort to sort postings when index sorting is enabled.

2023-01-27 Thread via GitHub
jpountz opened a new pull request, #12114: URL: https://github.com/apache/lucene/pull/12114 This switches to LSBRadixSorter instead of TimSorter to sort postings whose index options are `DOCS`. On a synthetic benchmark this yielded barely any difference in the case when the index order is t

[GitHub] [lucene] jpountz commented on pull request #12114: Use radix sort to sort postings when index sorting is enabled.

2023-01-27 Thread via GitHub
jpountz commented on PR #12114: URL: https://github.com/apache/lucene/pull/12114#issuecomment-1406248901 Here is the synthetic benchmark that I used if someone is interested in reproducing: ```java enum Order { RANDOM, ASC, DESC; } public static

[GitHub] [lucene] cpoerschke opened a new issue, #12115: org.apache.lucene.search.uhighlight.TestUnifiedHighlighterTermVec.testFetchTermVecsOncePerDoc fails reproducibly

2023-01-27 Thread via GitHub
cpoerschke opened a new issue, #12115: URL: https://github.com/apache/lucene/issues/12115 ### Description currently 5 matches in the last few weeks: https://lists.apache.org/list?bui...@lucene.apache.org:dfr=2022-1-1|dto=2024-1-1:org.apache.lucene.search.uhighlight.TestUnifiedHighligh

[GitHub] [lucene] jpountz opened a new pull request, #12116: Improve document API for stored fields.

2023-01-27 Thread via GitHub
jpountz opened a new pull request, #12116: URL: https://github.com/apache/lucene/pull/12116 Currently stored fields have to look at binaryValue(), stringValue() and numericValue() to guess the type of the value and then store it. This has a few issues: - If there is a problem, e.g. all

[GitHub] [lucene] jpountz commented on a diff in pull request #12054: Introduce a new `KeywordField`.

2023-01-27 Thread via GitHub
jpountz commented on code in PR #12054: URL: https://github.com/apache/lucene/pull/12054#discussion_r1089154108 ## lucene/demo/src/java/org/apache/lucene/demo/IndexFiles.java: ## @@ -234,8 +234,8 @@ void indexDoc(IndexWriter writer, Path file, long lastModified) throws IOExcept

[GitHub] [lucene] gsmiller commented on pull request #12089: [DRAFT] Explore TermInSet Query that "self optimizes"

2023-01-27 Thread via GitHub
gsmiller commented on PR #12089: URL: https://github.com/apache/lucene/pull/12089#issuecomment-1406839169 I found some time to come back to this and did some more benchmarking. I added a markdown file with some benchmark results in this PR for now just as a place to put it. It's [here](htt

[GitHub] [lucene] rmuir commented on pull request #12116: Improve document API for stored fields.

2023-01-27 Thread via GitHub
rmuir commented on PR #12116: URL: https://github.com/apache/lucene/pull/12116#issuecomment-1406967754 This is great, thanks for looking into it. It moves "type-guessing" into the one place that should be doing it, which is the generic Field.java. I didn't really think too deeply about

[GitHub] [lucene] rmuir commented on pull request #12089: [DRAFT] Explore TermInSet Query that "self optimizes"

2023-01-27 Thread via GitHub
rmuir commented on PR #12089: URL: https://github.com/apache/lucene/pull/12089#issuecomment-1407020062 thanks for the work benchmarking! you can rename to a .txt file and just attach it to a github comment, as one solution. yeah, this one looked to be trickier at a glance. simply inde

[GitHub] [lucene] gsmiller commented on pull request #12089: [DRAFT] Explore TermInSet Query that "self optimizes"

2023-01-27 Thread via GitHub
gsmiller commented on PR #12089: URL: https://github.com/apache/lucene/pull/12089#issuecomment-1407307849 > attach it to a github comment That works! Here's how I benchmarked. One note if you're interested in running this is to make sure to shuffle the genomes data prior to running or