[GitHub] [lucene] jpountz commented on pull request #12199: Reduce contention in DocumentsWriterPerThreadPool.

2023-03-17 Thread via GitHub
jpountz commented on PR #12199: URL: https://github.com/apache/lucene/pull/12199#issuecomment-1473272224 I suspect this change to be the source of the speedup when indexing vectors on https://home.apache.org/~mikemccand/lucenebench/indexing.html, but maybe more because of the introduced aff

[GitHub] [lucene] MarcusSorealheis opened a new pull request, #12208: Explain term automaton queries

2023-03-17 Thread via GitHub
MarcusSorealheis opened a new pull request, #12208: URL: https://github.com/apache/lucene/pull/12208 ### Description This is a draft PR to address #12178 that I wrote in my evenings, as I'm out of office and recovering (possibly slow to respond). Using the open source search communi

[GitHub] [lucene] s1monw commented on a diff in pull request #12205: Remove remaining sources of contention on indexing.

2023-03-17 Thread via GitHub
s1monw commented on code in PR #12205: URL: https://github.com/apache/lucene/pull/12205#discussion_r1140021701 ## lucene/core/src/java/org/apache/lucene/index/DocumentsWriterFlushControl.java: ## @@ -634,7 +652,9 @@ private void pruneBlockedQueue(final DocumentsWriterDeleteQueu

[GitHub] [lucene] Hanyakubin opened a new issue, #12209: CPU usage continuously increasing when IMAP SEARCH command coming from user agent

2023-03-17 Thread via GitHub
Hanyakubin opened a new issue, #12209: URL: https://github.com/apache/lucene/issues/12209 ### Description We use Apache James to provide voice visual mail for iPhone users and the Lucene module is used in James to store necessary information for index search. But the CPU usage of J

[GitHub] [lucene] jpountz commented on a diff in pull request #12205: Remove remaining sources of contention on indexing.

2023-03-17 Thread via GitHub
jpountz commented on code in PR #12205: URL: https://github.com/apache/lucene/pull/12205#discussion_r1140418586 ## lucene/core/src/java/org/apache/lucene/index/DocumentsWriterFlushControl.java: ## @@ -634,7 +652,9 @@ private void pruneBlockedQueue(final DocumentsWriterDeleteQue

[GitHub] [lucene] jpountz commented on a diff in pull request #12205: Remove remaining sources of contention on indexing.

2023-03-17 Thread via GitHub
jpountz commented on code in PR #12205: URL: https://github.com/apache/lucene/pull/12205#discussion_r1139165859 ## lucene/core/src/java/org/apache/lucene/index/DocumentsWriterFlushControl.java: ## @@ -133,7 +156,7 @@ private boolean assertMemory() { // peak document

[GitHub] [lucene] MarcusSorealheis commented on pull request #12208: Explain term automaton queries

2023-03-17 Thread via GitHub
MarcusSorealheis commented on PR #12208: URL: https://github.com/apache/lucene/pull/12208#issuecomment-1474501209 @rmuir or @mikemccand Do you have suggestions on who could be a good fit to advise me here on what types of tests or review? There are a few that are obvious that I need to

[GitHub] [lucene] zacharymorn commented on a diff in pull request #12194: [GITHUB-11915] [Discussion Only] Make Lucene smarter about long runs of matches via new API on DISI

2023-03-17 Thread via GitHub
zacharymorn commented on code in PR #12194: URL: https://github.com/apache/lucene/pull/12194#discussion_r1140939880 ## lucene/core/src/java/org/apache/lucene/search/DocIdSetIterator.java: ## @@ -211,4 +216,22 @@ protected final int slowAdvance(int target) throws IOException {

[GitHub] [lucene] zacharymorn commented on pull request #12194: [GITHUB-11915] [Discussion Only] Make Lucene smarter about long runs of matches via new API on DISI

2023-03-17 Thread via GitHub
zacharymorn commented on PR #12194: URL: https://github.com/apache/lucene/pull/12194#issuecomment-1474732221 > Sorry for the delayed response, so I saw it in my optimized version of `BitSet#or` that does use the `peekNextNonMatchingDocID` API. I also found the bug, it turns out we just were

[GitHub] [lucene] zacharymorn commented on pull request #12194: [GITHUB-11915] [Discussion Only] Make Lucene smarter about long runs of matches via new API on DISI

2023-03-17 Thread via GitHub
zacharymorn commented on PR #12194: URL: https://github.com/apache/lucene/pull/12194#issuecomment-1474743257 > Maybe we could try to leverage the geonames dataset (there's a few benchmarks for it in lucene-util), which has a few low-cardinality fields like the time zone or country. Then ena