[PR] Parse escaped brackets and spaces in range queries [lucene]

2024-10-10 Thread via GitHub
benchaplin opened a new pull request, #13887: URL: https://github.com/apache/lucene/pull/13887 ### Description [This issue](https://github.com/apache/lucene/issues/13234) raises a question about the QueryParser's ability to handle escaped brackets in a range query's terms.

Re: [PR] Remove broken .toArray from Long/CharObjectHashMap entirely [lucene]

2024-10-10 Thread via GitHub
dweiss commented on PR #13884: URL: https://github.com/apache/lucene/pull/13884#issuecomment-2406582594 Thank you! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubs

Re: [PR] Remove broken .toArray from Long/CharObjectHashMap entirely [lucene]

2024-10-10 Thread via GitHub
dweiss merged PR #13884: URL: https://github.com/apache/lucene/pull/13884

Re: [PR] Remove broken .toArray from Long/CharObjectHashMap entirely [lucene]

2024-10-10 Thread via GitHub
bugmakerr commented on PR #13884: URL: https://github.com/apache/lucene/pull/13884#issuecomment-2406512304 @dweiss Thanks. I've fixed the lint error, PTAL.

Re: [PR] DocValuesSkipper implementation in IndexSortSorted [lucene]

2024-10-10 Thread via GitHub
BrianWoolfolk commented on PR #13886: URL: https://github.com/apache/lucene/pull/13886#issuecomment-2406383720 There are some failing tests because of the unused methods and imports from the older logic, and I wanted to see if my implementation of the doc skipper is good before removing the

[PR] DocValuesSkipper implementation in IndexSortSorted [lucene]

2024-10-10 Thread via GitHub
BrianWoolfolk opened a new pull request, #13886: URL: https://github.com/apache/lucene/pull/13886 Fixes #13840 `IndexSortSortedNumericDocValuesRangeQuery` now implements a similar logic (using `DocValuesSkipper`) as `SortedNumericDocValuesRangeQuery`'s `getDocIdSetIteratorOrNullForP

Re: [PR] Lazy initialize ForDeltaUtil and ForUtil in Lucene912PostingsReader [lucene]

2024-10-10 Thread via GitHub
jpountz commented on PR #13885: URL: https://github.com/apache/lucene/pull/13885#issuecomment-2405769428 If you're looking at this sort of allocation, you may also want to specialize BlockDocsEnum into one class that decodes only doc IDs and another one that decodes docs and freqs. The form

Re: [PR] Lazy initialize ForDeltaUtil and ForUtil in Lucene912PostingsReader [lucene]

2024-10-10 Thread via GitHub
jpountz commented on PR #13885: URL: https://github.com/apache/lucene/pull/13885#issuecomment-2405765367 Even if it doesn't show up in benchmarks it's disappointing to have these conditions in hot code paths. Could we instead initialize these objects in reset() if `docFreq >= BLOCK_SIZE`?
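
The suggestion above (allocate in `reset()` only when `docFreq >= BLOCK_SIZE`) can be illustrated with a minimal sketch. Everything here is hypothetical and not taken from the PR: `BlockDecoder` stands in for the expensive helpers, `DocsEnumSketch` for the postings enum, and the `BLOCK_SIZE` value of 128 is an assumption.

```java
final class LazyDecoderSketch {
    static final int BLOCK_SIZE = 128; // assumed value, for illustration only

    // Hypothetical stand-in for an expensive-to-allocate decoding helper.
    static final class BlockDecoder {}

    // Hypothetical stand-in for a reusable postings enum.
    static final class DocsEnumSketch {
        private BlockDecoder decoder; // stays null until a term needs full blocks
        int allocations;              // counts allocations, only for the demo

        DocsEnumSketch reset(int docFreq) {
            // The check runs once per term here, not on every decoded block,
            // so the per-document hot path carries no null check.
            if (docFreq >= BLOCK_SIZE && decoder == null) {
                decoder = new BlockDecoder();
                allocations++;
            }
            return this;
        }

        boolean hasDecoder() {
            return decoder != null;
        }
    }
}
```

Terms with fewer than `BLOCK_SIZE` postings never trigger the allocation, and a reused enum allocates at most once.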

Re: [PR] Remove broken .toArray from Long/CharObjectHashMap entirely [lucene]

2024-10-10 Thread via GitHub
dweiss commented on PR #13884: URL: https://github.com/apache/lucene/pull/13884#issuecomment-2405653923 Sure, thank you. The linter has failed - seems like an unused method. Please correct and perhaps add a CHANGES.txt entry with an attribution to you?

[PR] Lazy initialize ForDeltaUtil and ForUtil in Lucene912PostingsReader [lucene]

2024-10-10 Thread via GitHub
original-brownbear opened a new pull request, #13885: URL: https://github.com/apache/lucene/pull/13885 Lazy initialize these fields. They consume/cause a lot of memory/GC because they are allocated frequently (~7% of all allocations in luceneutil's wikimedia medium run for me). This does no

Re: [PR] Remove broken .toArray from Long/CharObjectHashMap entirely [lucene]

2024-10-10 Thread via GitHub
bugmakerr commented on PR #13884: URL: https://github.com/apache/lucene/pull/13884#issuecomment-2405554212 @dweiss would you mind taking a look at this if you get a chance

[PR] Remove broken .toArray from Long/CharObjectHashMap entirely [lucene]

2024-10-10 Thread via GitHub
bugmakerr opened a new pull request, #13884: URL: https://github.com/apache/lucene/pull/13884 ### Description Fixes #13761

Re: [I] IntObjectHashMap.values().toArray() method throws ClassCastException [lucene]

2024-10-10 Thread via GitHub
bugmakerr commented on issue #13761: URL: https://github.com/apache/lucene/issues/13761#issuecomment-2405515035 @dweiss I think that `CharObjectHashMap` and `LongObjectHashMap` need to be fixed as well, I can help to create a PR.
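
For readers unfamiliar with this class of failure, here is a minimal sketch of one common way a generic `toArray()` ends up throwing `ClassCastException`. It assumes the cause is an unchecked cast of an `Object[]` to `V[]`; the snippets above do not confirm that this is what the Lucene maps did. The `Values` class and both method names are hypothetical.

```java
import java.util.Arrays;

final class GenericArrayCastSketch {
    // Hypothetical values view backed by an Object[], as hash maps often are.
    static final class Values<V> {
        private final Object[] slots;

        Values(Object... slots) {
            this.slots = slots;
        }

        // Compiles with an unchecked warning, but the runtime type of the
        // returned array is still Object[], so a caller that assigns the
        // result to String[] fails with ClassCastException.
        @SuppressWarnings("unchecked")
        V[] toArrayBroken() {
            return (V[]) slots.clone();
        }

        // Safe variant: the caller supplies an array whose runtime type is used.
        @SuppressWarnings("unchecked")
        <T> T[] toArray(T[] prototype) {
            return (T[]) Arrays.copyOf(slots, slots.length, prototype.getClass());
        }
    }

    public static void main(String[] args) {
        Values<String> values = new Values<>("a", "b");
        String[] ok = values.toArray(new String[0]);
        System.out.println(Arrays.toString(ok));
        try {
            String[] broken = values.toArrayBroken();
            System.out.println(broken.length);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException from toArrayBroken()");
        }
    }
}
```

Dropping the no-argument method, as the PR title describes, removes the trap instead of papering over it.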

[I] A multi-tenant ConcurrentMergeScheduler [lucene]

2024-10-10 Thread via GitHub
jpountz opened a new issue, #13883: URL: https://github.com/apache/lucene/issues/13883 ### Description `ConcurrentMergeScheduler` computes max thread counts assuming a single `IndexWriter` in the JVM. But it's common with Solr or Elasticsearch to have tens of active `IndexWriter`s ru
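
A rough back-of-the-envelope sketch of the concern: if every scheduler sizes its thread pool as though it owned the machine, the worst-case number of merge threads grows linearly with the number of writers. The per-scheduler formula below (half the cores, at least one) is a hypothetical placeholder, not necessarily the default `ConcurrentMergeScheduler` uses.

```java
final class MergeThreadBudgetSketch {
    // Hypothetical per-scheduler default: half the cores, at least one thread.
    static int perSchedulerThreads(int cores) {
        return Math.max(1, cores / 2);
    }

    // Worst case when each IndexWriter's scheduler sizes itself independently.
    static int independentTotal(int cores, int writers) {
        return perSchedulerThreads(cores) * writers;
    }

    public static void main(String[] args) {
        // 16 cores and 20 writers: up to 160 merge threads competing for 16 cores.
        System.out.println(independentTotal(16, 20));
    }
}
```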

Re: [PR] Tombstone branch_9x [lucene]

2024-10-10 Thread via GitHub
ChrisHegarty merged PR #13882: URL: https://github.com/apache/lucene/pull/13882

Re: [PR] Add tooling back on 9.10.x branch to generate int7_hnsw.9.10.zip bwc index [lucene]

2024-10-10 Thread via GitHub
mikemccand commented on code in PR #13879: URL: https://github.com/apache/lucene/pull/13879#discussion_r1795372954 ## lucene/backward-codecs/src/test/org/apache/lucene/backward_index/TestGenerateBwcIndices.java: ## @@ -82,6 +82,16 @@ public void testCreateSortedIndex() throws IO

Re: [PR] Tombstone branch_9x [lucene]

2024-10-10 Thread via GitHub
jpountz commented on PR #13882: URL: https://github.com/apache/lucene/pull/13882#issuecomment-2404950119 LGTM

Re: [PR] Speedup OrderedIntervalsSource [lucene]

2024-10-10 Thread via GitHub
jpountz commented on PR #13871: URL: https://github.com/apache/lucene/pull/13871#issuecomment-2404883858 Cool, I pushed an annotation.

Re: [PR] Speedup OrderedIntervalsSource [lucene]

2024-10-10 Thread via GitHub
original-brownbear commented on PR #13871: URL: https://github.com/apache/lucene/pull/13871#issuecomment-2404878244 Now it's there: ``` IntervalsOrdered 19.1(2.6%) 22.0(2.4%) 1.2 X 0.000 ``` :)

Re: [PR] Tombstone branch_9x [lucene]

2024-10-10 Thread via GitHub
ChrisHegarty commented on PR #13882: URL: https://github.com/apache/lucene/pull/13882#issuecomment-2404854635 ![Screenshot 2024-10-10 at 12 36 56](https://github.com/user-attachments/assets/703f9dfe-5bbc-459b-ba8e-0569d3a33139)

[PR] Tombstone branch_9x [lucene]

2024-10-10 Thread via GitHub
ChrisHegarty opened a new pull request, #13882: URL: https://github.com/apache/lucene/pull/13882 This commit tombstones branch_9x - wipes all files and updates the readme to point to more modern branches. Not much point looking at the files changes. Maybe just update to the branch an

Re: [PR] Make generated archive files reproducible [lucene]

2024-10-10 Thread via GitHub
dweiss merged PR #13835: URL: https://github.com/apache/lucene/pull/13835

Re: [PR] Make generated archive files reproducible [lucene]

2024-10-10 Thread via GitHub
dweiss commented on PR #13835: URL: https://github.com/apache/lucene/pull/13835#issuecomment-2404620384 Thank you. I'll merge this into branch_9x and branch_10x.

Re: [PR] Sometimes intersect the essential clause and the best non-essential clause. [lucene]

2024-10-10 Thread via GitHub
jpountz commented on PR #12589: URL: https://github.com/apache/lucene/pull/12589#issuecomment-2404503497 No, the sentence is correct. Non essential clauses are clauses that cannot produce a match on their own because the score wouldn't be high enough. So the more non-essential clauses, the
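
The definition above can be sketched as the textbook MAXSCORE partitioning step: sort clauses by their maximum possible score and treat the longest low-scoring prefix whose maximum scores sum to less than the minimum competitive score as non-essential. This is an illustration under that assumption, not the code from the PR; the class and method names are hypothetical.

```java
import java.util.Arrays;

final class MaxScorePartitionSketch {
    // Returns how many clauses are non-essential: the longest prefix of
    // clauses (ordered by ascending max score) whose max scores sum to less
    // than minCompetitiveScore. A document matching only those clauses
    // cannot score high enough to be competitive.
    static int countNonEssential(float[] maxScores, float minCompetitiveScore) {
        float[] sorted = maxScores.clone();
        Arrays.sort(sorted);
        double sum = 0;
        int nonEssential = 0;
        for (float maxScore : sorted) {
            if (sum + maxScore >= minCompetitiveScore) {
                break; // this clause and all higher-scoring ones are essential
            }
            sum += maxScore;
            nonEssential++;
        }
        return nonEssential;
    }

    public static void main(String[] args) {
        float[] maxScores = {3.0f, 0.5f, 1.0f, 2.0f};
        // 0.5 + 1.0 < 2.0, but adding 2.0 reaches the threshold.
        System.out.println(countNonEssential(maxScores, 2.0f));
    }
}
```

As the minimum competitive score rises during collection, more clauses fall into the non-essential set, and only the essential clauses need to drive iteration.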

Re: [PR] Make MaxScoreBulkScorer repartition scorers when the min competitive increases. [lucene]

2024-10-10 Thread via GitHub
jpountz commented on PR #13800: URL: https://github.com/apache/lucene/pull/13800#issuecomment-2404466253 @zhaih Indeed conjunctions need a similar fix, I'll look into it.

Re: [PR] Try using Murmurhash 3 for bloom filters [lucene]

2024-10-10 Thread via GitHub
jpountz commented on code in PR #12868: URL: https://github.com/apache/lucene/pull/12868#discussion_r1794974762 ## lucene/codecs/src/java/org/apache/lucene/codecs/bloom/FuzzySet.java: ## @@ -150,9 +149,10 @@ private FuzzySet(FixedBitSet filter, int bloomSize, int hashCount) {

Re: [PR] Add reopen method in PerThreadPKLookup [lucene]

2024-10-10 Thread via GitHub
jpountz merged PR #13596: URL: https://github.com/apache/lucene/pull/13596

Re: [PR] Make generated archive files reproducible [lucene]

2024-10-10 Thread via GitHub
breskeby commented on PR #13835: URL: https://github.com/apache/lucene/pull/13835#issuecomment-2404281566 > Part of this is actually already in gradle-archives.gradle - I think adding this patch there would be better? @dweiss updated this accordingly. boiled down to a one line change

Re: [PR] Fix 9.12.0 backcompat break (Lucene 9.12.0 cannot read 9.11.x indices written with quantized HNSW, `Lucene99HnswScalarQuantizedVectorsFormat`) [lucene]

2024-10-10 Thread via GitHub
ChrisHegarty commented on PR #13874: URL: https://github.com/apache/lucene/pull/13874#issuecomment-2404222904 > if I separately run `./gradlew clean` and then `./gradlew check` then it's fine ... weird. We’ve seen similar recently, reported in #13567. Likely a Gradle bug, which w