[PR] Clean up unused code & variables [lucene]

2024-01-04 Thread via GitHub
dungba88 opened a new pull request, #12994: URL: https://github.com/apache/lucene/pull/12994 ### Description Clean up unused code & variables in FSTCompiler -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the UR

Re: [I] Taxonomy facets: can we change massive `int[]` for parent/child/sibling tree to paged/block `int[]` to reduce RAM pressure? [lucene]

2024-01-04 Thread via GitHub
stefanvodita commented on issue #12989: URL: https://github.com/apache/lucene/issues/12989#issuecomment-1878171579 @msfroh - I was looking into this as well and had some thoughts about how to do it. We could replace [`ParallelTaxonomyArrays`](https://github.com/apache/lucene/blob/7b8
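
The issue title proposes swapping one massive `int[]` for a paged/block `int[]`. The idea can be illustrated with a minimal sketch. This is not Lucene's API or the proposed patch: the class name `PagedIntArray`, the 1024-entry page size, and the append-only `add` method are all assumptions chosen for illustration.

```java
import java.util.Arrays;

/** Hypothetical paged int array; illustrative only, not a Lucene class. */
final class PagedIntArray {
  private static final int PAGE_BITS = 10;             // assumed page size: 1024 ints
  private static final int PAGE_SIZE = 1 << PAGE_BITS;
  private static final int PAGE_MASK = PAGE_SIZE - 1;

  private int[][] pages = new int[0][];                // small outer array of fixed-size pages
  private int size;

  int size() {
    return size;
  }

  int get(int index) {
    if (index < 0 || index >= size) {
      throw new IndexOutOfBoundsException("index " + index);
    }
    // High bits select the page, low bits select the slot inside it.
    return pages[index >>> PAGE_BITS][index & PAGE_MASK];
  }

  void add(int value) {
    int page = size >>> PAGE_BITS;
    if (page == pages.length) {
      // Growing copies only the outer array of page references,
      // never the int data already stored in existing pages.
      pages = Arrays.copyOf(pages, page + 1);
      pages[page] = new int[PAGE_SIZE];
    }
    pages[page][size & PAGE_MASK] = value;
    size++;
  }
}
```

With this layout, growth allocates one new page at a time instead of reallocating and copying a single contiguous array, which is the RAM-pressure concern named in the issue title.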

Re: [PR] Optimize FST on-heap BytesReader [lucene]

2024-01-04 Thread via GitHub
dungba88 commented on code in PR #12879: URL: https://github.com/apache/lucene/pull/12879#discussion_r1442419722 ## lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java: ## @@ -56,14 +66,59 @@ public long ramBytesUsed() { public void freeze() { froz

Re: [I] Taxonomy facets: can we change massive `int[]` for parent/child/sibling tree to paged/block `int[]` to reduce RAM pressure? [lucene]

2024-01-04 Thread via GitHub
msfroh commented on issue #12989: URL: https://github.com/apache/lucene/issues/12989#issuecomment-1877979164 If nobody else is working on this, I think I'd like to take it!

Re: [PR] Make Lucene90 postings format to write FST off heap [lucene]

2024-01-04 Thread via GitHub
dungba88 commented on PR #12985: URL: https://github.com/apache/lucene/pull/12985#issuecomment-1877952318 @mikemccand I'm wondering if there is already some benchmark that can show the RAM saved by this change

Re: [PR] Use Collections.addAll() instead of manual array copy and misc. code cleanups [lucene]

2024-01-04 Thread via GitHub
dweiss merged PR #12977: URL: https://github.com/apache/lucene/pull/12977
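
The change named in this thread's subject, using `Collections.addAll()` instead of a manual array copy, can be sketched generically as follows. This is not the diff from the PR: the class `AddAllExample` and both method names are hypothetical, and only `java.util.Collections.addAll(Collection, T...)` is a real JDK call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical before/after comparison; not code taken from the PR. */
final class AddAllExample {
  /** Manual copy: the kind of loop the PR title says is being replaced. */
  static List<String> manualCopy(String[] args) {
    List<String> out = new ArrayList<>();
    for (int i = 0; i < args.length; i++) {
      out.add(args[i]);
    }
    return out;
  }

  /** Same result with the JDK helper; the array is passed as the varargs parameter. */
  static List<String> withAddAll(String[] args) {
    List<String> out = new ArrayList<>();
    Collections.addAll(out, args);
    return out;
  }
}
```

Both methods return a new mutable list with the array's elements in order; the second simply states the intent in one call.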

Re: [PR] Use Collections.addAll() instead of manual array copy [lucene]

2024-01-04 Thread via GitHub
dweiss commented on code in PR #12977: URL: https://github.com/apache/lucene/pull/12977#discussion_r1442253173 ## lucene/misc/src/java/org/apache/lucene/misc/index/IndexSplitter.java: ## @@ -67,18 +66,10 @@ public static void main(String[] args) throws Exception { if (args[

Re: [PR] Remove unnecessary comment [lucene]

2024-01-04 Thread via GitHub
andrross commented on PR #12993: URL: https://github.com/apache/lucene/pull/12993#issuecomment-1877693422 FYI @javanna

[PR] Remove unnecessary comment [lucene]

2024-01-04 Thread via GitHub
andrross opened a new pull request, #12993: URL: https://github.com/apache/lucene/pull/12993 ### Description A [previous iteration][1] of this code used an AtomicInteger and required this comment. The committed version uses a self-documenting boolean and the comment is not needed.

Re: [PR] Add support for index sorting with document blocks [lucene]

2024-01-04 Thread via GitHub
s1monw commented on code in PR #12829: URL: https://github.com/apache/lucene/pull/12829#discussion_r1442110320 ## lucene/core/src/java/org/apache/lucene/index/FieldInfos.java: ## @@ -437,6 +488,33 @@ private void verifySoftDeletedFieldName(String fieldName, boolean isSoftDelete

Re: [PR] Introduce workflow for stale PRs [lucene]

2024-01-04 Thread via GitHub
stefanvodita commented on PR #12813: URL: https://github.com/apache/lucene/pull/12813#issuecomment-1877528163 I did some testing with the parameters I mentioned and pushed a new revision. In the tests on my fork, a stale PR took 5 operations to process and a non-stale PR took 1.

Re: [PR] Introduce workflow for stale PRs [lucene]

2024-01-04 Thread via GitHub
stefanvodita commented on PR #12813: URL: https://github.com/apache/lucene/pull/12813#issuecomment-1877494977 I ended up checking more of the configurable parameters after looking for the one to exclude draft PRs. Three things to note: 1. I have to configure a few more parameters to handl

Re: [PR] Introduce workflow for stale PRs [lucene]

2024-01-04 Thread via GitHub
uschindler commented on code in PR #12813: URL: https://github.com/apache/lucene/pull/12813#discussion_r1442004702 ## .github/workflows/stale.yml: ## @@ -22,6 +22,7 @@ jobs: with: repo-token: ${{ secrets.GITHUB_TOKEN }} days-before-pr-stale: 14 +

Re: [I] Port PR management bot from Apache Beam [lucene]

2024-01-04 Thread via GitHub
uschindler commented on issue #12796: URL: https://github.com/apache/lucene/issues/12796#issuecomment-1877414041 As said in the mailing list thread: We should maybe exclude "draft" PRs from this. I regularly start them for stuff that I won't merge soon. One example is #12706

Re: [PR] [WIP] LUCENE-10002: Deprecate FacetsCollector#search helper methods as they internally use IndexSearcher#search(Query, Collector) API [lucene]

2024-01-04 Thread via GitHub
mikemccand commented on PR #12890: URL: https://github.com/apache/lucene/pull/12890#issuecomment-1877386385 Yeah I agree it's OK to deprecate without replacement, but maybe in the deprecated javadocs (and in `MIGRATE.txt` for 10.0) add a short explanation about using `MultiCollectorManager`

Re: [PR] Optimize FST on-heap BytesReader [lucene]

2024-01-04 Thread via GitHub
mikemccand commented on code in PR #12879: URL: https://github.com/apache/lucene/pull/12879#discussion_r1441950879 ## lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java: ## @@ -56,14 +66,59 @@ public long ramBytesUsed() { public void freeze() { fr

Re: [PR] Optimize FST on-heap BytesReader [lucene]

2024-01-04 Thread via GitHub
dungba88 commented on code in PR #12879: URL: https://github.com/apache/lucene/pull/12879#discussion_r1441927006 ## lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java: ## @@ -56,14 +66,59 @@ public long ramBytesUsed() { public void freeze() { froz

Re: [PR] Optimize FST on-heap BytesReader [lucene]

2024-01-04 Thread via GitHub
dungba88 commented on code in PR #12879: URL: https://github.com/apache/lucene/pull/12879#discussion_r1441917076 ## lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java: ## @@ -56,14 +66,59 @@ public long ramBytesUsed() { public void freeze() { froz

Re: [PR] Initial impl of MMapDirectory for Java 22 [lucene]

2024-01-04 Thread via GitHub
uschindler commented on PR #12706: URL: https://github.com/apache/lucene/pull/12706#issuecomment-1877096268 No no no, I am not stale!

Re: [PR] Optimize FST on-heap BytesReader [lucene]

2024-01-04 Thread via GitHub
mikemccand commented on code in PR #12879: URL: https://github.com/apache/lucene/pull/12879#discussion_r1441713259 ## lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java: ## @@ -56,14 +66,59 @@ public long ramBytesUsed() { public void freeze() { fr

Re: [PR] Optimize FST on-heap BytesReader [lucene]

2024-01-04 Thread via GitHub
mikemccand commented on code in PR #12879: URL: https://github.com/apache/lucene/pull/12879#discussion_r1441712407 ## lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java: ## @@ -56,14 +66,59 @@ public long ramBytesUsed() { public void freeze() { fr

[I] Test an FST bytes store that re-reverses (reads bytes forward) on-the-fly [lucene]

2024-01-04 Thread via GitHub
mikemccand opened a new issue, #12992: URL: https://github.com/apache/lucene/issues/12992 ### Description At read-time the FST apis must read bytes in reverse, which is perverse and unnatural for all stacks in modern CPUs / IO devices that do read-ahead optimizations for forward read

Re: [I] Nightly benchmark regression for term dict queries [lucene]

2024-01-04 Thread via GitHub
gf2121 closed issue #12659: Nightly benchmark regression for term dict queries URL: https://github.com/apache/lucene/issues/12659

Re: [PR] Remove unnecessary fields loop from extractWeightedSpanTerms() [lucene]

2024-01-04 Thread via GitHub
dweiss commented on PR #12965: URL: https://github.com/apache/lucene/pull/12965#issuecomment-1876672741 Yep, I think this early termination condition makes sense. Could you add it, please? Thank you.

Re: [PR] Use Collections.addAll() instead of manual array copy [lucene]

2024-01-04 Thread via GitHub
dweiss commented on code in PR #12977: URL: https://github.com/apache/lucene/pull/12977#discussion_r1441428909 ## lucene/misc/src/java/org/apache/lucene/misc/index/IndexSplitter.java: ## @@ -67,18 +66,10 @@ public static void main(String[] args) throws Exception { if (args[