[GitHub] [lucene] jpountz commented on a diff in pull request #12199: Reduce contention in DocumentsWriterPerThreadPool.
jpountz commented on code in PR #12199: URL: https://github.com/apache/lucene/pull/12199#discussion_r1133614094 ## lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThreadPool.java: ## @@ -130,13 +134,15 @@ private void ensureOpen() { } } + private synchronized boolean contains(DocumentsWriterPerThread state) { +return dwpts.contains(state); + } + void marksAsFreeAndUnlock(DocumentsWriterPerThread state) { final long ramBytesUsed = state.ramBytesUsed(); -synchronized (this) { - assert dwpts.contains(state) - : "we tried to add a DWPT back to the pool but the pool doesn't know aobut this DWPT"; - freeList.add(state, ramBytesUsed); -} +assert contains(state) +: "we tried to add a DWPT back to the pool but the pool doesn't know aobut this DWPT"; Review Comment: Thanks for catching Uwe, I fixed it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on a diff in pull request #12199: Reduce contention in DocumentsWriterPerThreadPool.
jpountz commented on code in PR #12199: URL: https://github.com/apache/lucene/pull/12199#discussion_r1133629633 ## lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThreadPool.java: ## @@ -113,15 +113,19 @@ private synchronized DocumentsWriterPerThread newWriter() { * operation (add/updateDocument). */ DocumentsWriterPerThread getAndLock() { -synchronized (this) { - ensureOpen(); - DocumentsWriterPerThread dwpt = freeList.poll(DocumentsWriterPerThread::tryLock); - if (dwpt == null) { -dwpt = newWriter(); +ensureOpen(); +DocumentsWriterPerThread dwpt = freeList.poll(DocumentsWriterPerThread::tryLock); +if (dwpt == null) { Review Comment: I still have the feeling that you are making incorrect assumptions about what this piece of code is doing (this method itself doesn't publish the object to other threads), but I took this thread as a call to keep things simple, so I removed the retry and added a few more comments about the logic of this pool. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss commented on a diff in pull request #12199: Reduce contention in DocumentsWriterPerThreadPool.
dweiss commented on code in PR #12199: URL: https://github.com/apache/lucene/pull/12199#discussion_r1133644248 ## lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThreadPool.java: ## @@ -113,15 +113,19 @@ private synchronized DocumentsWriterPerThread newWriter() { * operation (add/updateDocument). */ DocumentsWriterPerThread getAndLock() { -synchronized (this) { - ensureOpen(); - DocumentsWriterPerThread dwpt = freeList.poll(DocumentsWriterPerThread::tryLock); - if (dwpt == null) { -dwpt = newWriter(); +ensureOpen(); +DocumentsWriterPerThread dwpt = freeList.poll(DocumentsWriterPerThread::tryLock); +if (dwpt == null) { Review Comment: I certainly am! But anyone looking at this code without jumping deep will have the same doubts, I think. Unless there is a convincing (performance) argument that it's worth it, I like the updated version a lot better - it's clear and simple. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a diff in pull request #12199: Reduce contention in DocumentsWriterPerThreadPool.
uschindler commented on code in PR #12199: URL: https://github.com/apache/lucene/pull/12199#discussion_r1133699509 ## lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThreadPool.java: ## @@ -113,15 +113,19 @@ private synchronized DocumentsWriterPerThread newWriter() { * operation (add/updateDocument). */ DocumentsWriterPerThread getAndLock() { -synchronized (this) { - ensureOpen(); - DocumentsWriterPerThread dwpt = freeList.poll(DocumentsWriterPerThread::tryLock); - if (dwpt == null) { -dwpt = newWriter(); +ensureOpen(); +DocumentsWriterPerThread dwpt = freeList.poll(DocumentsWriterPerThread::tryLock); +if (dwpt == null) { Review Comment: Thanks code looks much better. I was also irritated by the long chain of "tries" with and without synchronized. To me actualy code is easier to read. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a diff in pull request #12199: Reduce contention in DocumentsWriterPerThreadPool.
uschindler commented on code in PR #12199: URL: https://github.com/apache/lucene/pull/12199#discussion_r1133704928 ## lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThreadPool.java: ## @@ -113,15 +113,17 @@ private synchronized DocumentsWriterPerThread newWriter() { * operation (add/updateDocument). */ DocumentsWriterPerThread getAndLock() { -synchronized (this) { - ensureOpen(); - DocumentsWriterPerThread dwpt = freeList.poll(DocumentsWriterPerThread::tryLock); - if (dwpt == null) { -dwpt = newWriter(); - } - // DWPT is already locked before return by this method: - return dwpt; +ensureOpen(); +DocumentsWriterPerThread dwpt = freeList.poll(DocumentsWriterPerThread::tryLock); +if (dwpt == null) { + // newWriter() adds the DWPT to the `dwpts` set as a side-effect. However it is not added to + // `freeList` at this point, it will be added later on once DocumentsWriter has indexed a + // document into this DWPT and then gives it back to the pool by calling + // #marksAsFreeAndUnlock. + dwpt = newWriter(); Review Comment: I would change this to `return newWriter();` For better readability also invert the if statement like that: ``` DocumentsWriterPerThread dwpt = freeList.poll(DocumentsWriterPerThread::tryLock); if (dwpt != null) { return dwpt; } return newWriter(); ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a diff in pull request #12199: Reduce contention in DocumentsWriterPerThreadPool.
uschindler commented on code in PR #12199: URL: https://github.com/apache/lucene/pull/12199#discussion_r1133708757 ## lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThreadPool.java: ## @@ -113,15 +113,17 @@ private synchronized DocumentsWriterPerThread newWriter() { * operation (add/updateDocument). */ DocumentsWriterPerThread getAndLock() { -synchronized (this) { - ensureOpen(); - DocumentsWriterPerThread dwpt = freeList.poll(DocumentsWriterPerThread::tryLock); - if (dwpt == null) { -dwpt = newWriter(); - } - // DWPT is already locked before return by this method: - return dwpt; +ensureOpen(); +DocumentsWriterPerThread dwpt = freeList.poll(DocumentsWriterPerThread::tryLock); +if (dwpt == null) { + // newWriter() adds the DWPT to the `dwpts` set as a side-effect. However it is not added to + // `freeList` at this point, it will be added later on once DocumentsWriter has indexed a + // document into this DWPT and then gives it back to the pool by calling + // #marksAsFreeAndUnlock. + dwpt = newWriter(); Review Comment: Don't need to do this, but looks much more "Java 17 like" - just for demonstation purposes: ```java return Optional.ofNullable(freeList.poll(DocumentsWriterPerThread::tryLock)).orElse(this::newWriter); ``` ## lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThreadPool.java: ## @@ -113,15 +113,17 @@ private synchronized DocumentsWriterPerThread newWriter() { * operation (add/updateDocument). */ DocumentsWriterPerThread getAndLock() { -synchronized (this) { - ensureOpen(); - DocumentsWriterPerThread dwpt = freeList.poll(DocumentsWriterPerThread::tryLock); - if (dwpt == null) { -dwpt = newWriter(); - } - // DWPT is already locked before return by this method: - return dwpt; +ensureOpen(); +DocumentsWriterPerThread dwpt = freeList.poll(DocumentsWriterPerThread::tryLock); +if (dwpt == null) { + // newWriter() adds the DWPT to the `dwpts` set as a side-effect. However it is not added to + // `freeList` at this point, it will be added later on once DocumentsWriter has indexed a + // document into this DWPT and then gives it back to the pool by calling + // #marksAsFreeAndUnlock. + dwpt = newWriter(); Review Comment: I would change this to `return newWriter();` For better readability also invert the if statement like that: ```java DocumentsWriterPerThread dwpt = freeList.poll(DocumentsWriterPerThread::tryLock); if (dwpt != null) { return dwpt; } return newWriter(); ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a diff in pull request #12199: Reduce contention in DocumentsWriterPerThreadPool.
uschindler commented on code in PR #12199: URL: https://github.com/apache/lucene/pull/12199#discussion_r1133704928 ## lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThreadPool.java: ## @@ -113,15 +113,17 @@ private synchronized DocumentsWriterPerThread newWriter() { * operation (add/updateDocument). */ DocumentsWriterPerThread getAndLock() { -synchronized (this) { - ensureOpen(); - DocumentsWriterPerThread dwpt = freeList.poll(DocumentsWriterPerThread::tryLock); - if (dwpt == null) { -dwpt = newWriter(); - } - // DWPT is already locked before return by this method: - return dwpt; +ensureOpen(); +DocumentsWriterPerThread dwpt = freeList.poll(DocumentsWriterPerThread::tryLock); +if (dwpt == null) { + // newWriter() adds the DWPT to the `dwpts` set as a side-effect. However it is not added to + // `freeList` at this point, it will be added later on once DocumentsWriter has indexed a + // document into this DWPT and then gives it back to the pool by calling + // #marksAsFreeAndUnlock. + dwpt = newWriter(); Review Comment: I would change this to `return newWriter();` For better readability also invert the if statement like that: ```java DocumentsWriterPerThread dwpt = freeList.poll(DocumentsWriterPerThread::tryLock); if (dwpt != null) { return dwpt; } return newWriter(); ``` (of course add your comments!) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a diff in pull request #12199: Reduce contention in DocumentsWriterPerThreadPool.
uschindler commented on code in PR #12199: URL: https://github.com/apache/lucene/pull/12199#discussion_r1133708757 ## lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThreadPool.java: ## @@ -113,15 +113,17 @@ private synchronized DocumentsWriterPerThread newWriter() { * operation (add/updateDocument). */ DocumentsWriterPerThread getAndLock() { -synchronized (this) { - ensureOpen(); - DocumentsWriterPerThread dwpt = freeList.poll(DocumentsWriterPerThread::tryLock); - if (dwpt == null) { -dwpt = newWriter(); - } - // DWPT is already locked before return by this method: - return dwpt; +ensureOpen(); +DocumentsWriterPerThread dwpt = freeList.poll(DocumentsWriterPerThread::tryLock); +if (dwpt == null) { + // newWriter() adds the DWPT to the `dwpts` set as a side-effect. However it is not added to + // `freeList` at this point, it will be added later on once DocumentsWriter has indexed a + // document into this DWPT and then gives it back to the pool by calling + // #marksAsFreeAndUnlock. + dwpt = newWriter(); Review Comment: Don't need to do this, but looks much more "Java 17 like" - just for demonstation purposes: ```java return Optional.ofNullable(freeList.poll(DocumentsWriterPerThread::tryLock)).orElseGet(this::newWriter); ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a diff in pull request #12199: Reduce contention in DocumentsWriterPerThreadPool.
uschindler commented on code in PR #12199: URL: https://github.com/apache/lucene/pull/12199#discussion_r1133708757 ## lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThreadPool.java: ## @@ -113,15 +113,17 @@ private synchronized DocumentsWriterPerThread newWriter() { * operation (add/updateDocument). */ DocumentsWriterPerThread getAndLock() { -synchronized (this) { - ensureOpen(); - DocumentsWriterPerThread dwpt = freeList.poll(DocumentsWriterPerThread::tryLock); - if (dwpt == null) { -dwpt = newWriter(); - } - // DWPT is already locked before return by this method: - return dwpt; +ensureOpen(); +DocumentsWriterPerThread dwpt = freeList.poll(DocumentsWriterPerThread::tryLock); +if (dwpt == null) { + // newWriter() adds the DWPT to the `dwpts` set as a side-effect. However it is not added to + // `freeList` at this point, it will be added later on once DocumentsWriter has indexed a + // document into this DWPT and then gives it back to the pool by calling + // #marksAsFreeAndUnlock. + dwpt = newWriter(); Review Comment: Don't need to do this, but looks much more "Java 17 like" - just for demonstation purposes: ```java return Optional.ofNullable(freeList.poll(DocumentsWriterPerThread::tryLock)).orElseGet(this::newWriter); ``` Of course it brings no comments anymore. Don't do it! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss commented on a diff in pull request #12199: Reduce contention in DocumentsWriterPerThreadPool.
dweiss commented on code in PR #12199: URL: https://github.com/apache/lucene/pull/12199#discussion_r1133753108 ## lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThreadPool.java: ## @@ -113,15 +113,17 @@ private synchronized DocumentsWriterPerThread newWriter() { * operation (add/updateDocument). */ DocumentsWriterPerThread getAndLock() { -synchronized (this) { - ensureOpen(); - DocumentsWriterPerThread dwpt = freeList.poll(DocumentsWriterPerThread::tryLock); - if (dwpt == null) { -dwpt = newWriter(); - } - // DWPT is already locked before return by this method: - return dwpt; +ensureOpen(); +DocumentsWriterPerThread dwpt = freeList.poll(DocumentsWriterPerThread::tryLock); +if (dwpt == null) { + // newWriter() adds the DWPT to the `dwpts` set as a side-effect. However it is not added to + // `freeList` at this point, it will be added later on once DocumentsWriter has indexed a + // document into this DWPT and then gives it back to the pool by calling + // #marksAsFreeAndUnlock. + dwpt = newWriter(); Review Comment: Nice, Uwe. Although nobody can ever convince me those optional expressions are easier to read and understand than a plain if... :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jasirkt commented on issue #8966: Wildcard query parser of MultiFieldQueryParser should support boosts [LUCENE-7917]
jasirkt commented on issue #8966: URL: https://github.com/apache/lucene/issues/8966#issuecomment-1465981107 Apart from wildcard query, this issue also applies to fuzzy, prefix, range and regexp queries -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a diff in pull request #12199: Reduce contention in DocumentsWriterPerThreadPool.
uschindler commented on code in PR #12199: URL: https://github.com/apache/lucene/pull/12199#discussion_r1133798472 ## lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThreadPool.java: ## @@ -113,15 +113,17 @@ private synchronized DocumentsWriterPerThread newWriter() { * operation (add/updateDocument). */ DocumentsWriterPerThread getAndLock() { -synchronized (this) { - ensureOpen(); - DocumentsWriterPerThread dwpt = freeList.poll(DocumentsWriterPerThread::tryLock); - if (dwpt == null) { -dwpt = newWriter(); - } - // DWPT is already locked before return by this method: - return dwpt; +ensureOpen(); +DocumentsWriterPerThread dwpt = freeList.poll(DocumentsWriterPerThread::tryLock); +if (dwpt == null) { + // newWriter() adds the DWPT to the `dwpts` set as a side-effect. However it is not added to + // `freeList` at this point, it will be added later on once DocumentsWriter has indexed a + // document into this DWPT and then gives it back to the pool by calling + // #marksAsFreeAndUnlock. + dwpt = newWriter(); Review Comment: It was a joke! My intention was the first comment, to invert the if logic so use "return on success" -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jasirkt opened a new pull request, #12202: Fix boost missing in MultiFieldQueryParser
jasirkt opened a new pull request, #12202: URL: https://github.com/apache/lucene/pull/12202 ### Description This fixes #8966. When using boost along with any of fuzzy, wildcard, regexp, range or prefix queries, the boost is not applied. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir merged pull request #12202: Fix boost missing in MultiFieldQueryParser
rmuir merged PR #12202: URL: https://github.com/apache/lucene/pull/12202 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir closed issue #8966: Wildcard query parser of MultiFieldQueryParser should support boosts [LUCENE-7917]
rmuir closed issue #8966: Wildcard query parser of MultiFieldQueryParser should support boosts [LUCENE-7917] URL: https://github.com/apache/lucene/issues/8966 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #12202: Fix boost missing in MultiFieldQueryParser
rmuir commented on PR #12202: URL: https://github.com/apache/lucene/pull/12202#issuecomment-1466056963 @jasirkt thanks for fixing another boost bug in this parser! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] sherman opened a new issue, #12203: Scalable merge/compaction of big doc values segments.
sherman opened a new issue, #12203: URL: https://github.com/apache/lucene/issues/12203 ### Description The question is regarding the scalable merge/compaction of doc values, given the following context: I have a large sharded index. Each shard can contain segments of millions of documents. There are several hundred fields in the index, and half of them are doc values. I sometimes face issues with merge times when I need to merge or compact a large segment. The problem is that it's a single-threaded operation where a single segment is merged in a single merger thread. From the codec doc values format of version 9.x, it appears possible to use map-reduce techniques when writing new large doc value segments. This is because all metadata is read before any field data can be read, and all doc value types have offset and length fields in the metadata. My basic idea is to write each field in parallel to a single file and then perform a low-level merge of the binary data. After that, I can rewrite only the metadata to update the offsets. As I am still new to Lucene development, could someone please provide some critique of this idea? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] kashkambath opened a new issue, #12204: ToParentBlockJoinQuery's explain should depend on its ScoreMode
kashkambath opened a new issue, #12204: URL: https://github.com/apache/lucene/issues/12204 ### Description Currently, `ToParentBlockJoinQuery`'s [explain()](https://github.com/apache/lucene/blob/4851bd74f402602f53719374b777c702a7d06fe5/lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java#L401-L422) returns the explanation for the highest-scoring child document, regardless of the [ScoreMode](https://github.com/apache/lucene/blob/main/lucene/join/src/java/org/apache/lucene/search/join/ScoreMode.java). If the `ScoreMode` is `ScoreMode.Max`, this makes sense. However, for other `ScoreMode`s (`Min`, `Avg`, `Total`), this output is not as helpful for the user. We should make the explain output for `ToParentBlockJoinQuery` depend on the `ScoreMode` used in the query. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12199: Reduce contention in DocumentsWriterPerThreadPool.
uschindler commented on PR #12199: URL: https://github.com/apache/lucene/pull/12199#issuecomment-1466877603 Thanks for the updates. Sorry for nitpicking, I just prefer code that goes sequentially and uses the pattern exit method as soon as condition mets. This makes it easier to understand. Nice that there's no double locking anymore. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] cknoxrun commented on issue #10608: FuzzyTermEnums sets negative boost for fuzzy search & highlight [LUCENE-9568]
cknoxrun commented on issue #10608: URL: https://github.com/apache/lucene/issues/10608#issuecomment-1466882597 You might already be thinking of this beyond apostrophes, but just in case I've recently come across this (with Elasticsearch) for characters other than apostrophes, specifically Thai language characters (E.g. "การที่ได้ต้องแงงว่"). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mdmarshmallow commented on pull request #12194: [GITHUB-11915] [Discussion Only] Make Lucene smarter about long runs of matches via new API on DISI
mdmarshmallow commented on PR #12194: URL: https://github.com/apache/lucene/pull/12194#issuecomment-1466954686 Ok so I did some more investigation, I think there might be a bug with `Lucene90PostingsReader.BlockDocsEnum#peekNextNonMatchingDocID`. I haven't looked super deeply into it yet, but I can post the patch/seed/test if you want to look into it as well @zacharymorn. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gsmiller commented on a diff in pull request #12184: [nocommit] Introduce ExpressionFacets along with a demo.
gsmiller commented on code in PR #12184: URL: https://github.com/apache/lucene/pull/12184#discussion_r1134695409 ## lucene/facet/src/java/org/apache/lucene/facet/taxonomy/ExpressionFacets.java: ## @@ -0,0 +1,253 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.facet.taxonomy; + +import java.io.IOException; +import java.text.ParseException; +import java.util.Arrays; +import java.util.HashMap; +import java.util.HashSet; +import java.util.List; +import java.util.Locale; +import java.util.Map; +import java.util.Objects; +import java.util.Set; +import java.util.regex.Matcher; +import java.util.regex.Pattern; +import org.apache.lucene.expressions.Bindings; +import org.apache.lucene.expressions.SimpleBindings; +import org.apache.lucene.expressions.js.JavascriptCompiler; +import org.apache.lucene.facet.FacetsCollector; +import org.apache.lucene.facet.FacetsConfig; +import org.apache.lucene.index.DocValues; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.index.SortedNumericDocValues; +import org.apache.lucene.search.ConjunctionUtils; +import org.apache.lucene.search.DocIdSetIterator; +import org.apache.lucene.search.DoubleValues; +import org.apache.lucene.search.DoubleValuesSource; +import org.apache.lucene.search.IndexSearcher; +import org.apache.lucene.util.FixedBitSet; + +/** + * Facet by a provided expression string. Variables in the expression string are expected to be of + * the form: {@code val__[sum|max]}, where {@code } must be a valid + * reference in the provided {@link Bindings}. + * + * nocommit: This is a simple demo and is incomplete and not tested. + */ +public class ExpressionFacets extends FloatTaxonomyFacets { + private static final Pattern RE = Pattern.compile("(?=(val_.+_(max|sum)))"); + private static final Map F_MAP = new HashMap<>(); + + static { +F_MAP.put("sum", AssociationAggregationFunction.SUM); +F_MAP.put("max", AssociationAggregationFunction.MAX); + } + + public ExpressionFacets( + String indexFieldName, + TaxonomyReader taxoReader, + FacetsConfig config, + FacetsCollector facetsCollector, + String expression, + Bindings bindings) + throws IOException, ParseException { +super(indexFieldName, taxoReader, AssociationAggregationFunction.SUM, config); +Set aggregations = parseAggregations(expression); +SimpleBindings aggregationBindings = new SimpleBindings(); +aggregate(expression, bindings, aggregationBindings, aggregations, facetsCollector); + } + + /** + * Identify the component aggregations needed by the expression. String parsing is not my strong + * suit, so I'm sure this is clunky at best and buggy at worst. + */ + private static Set parseAggregations(String expression) { +Set result = new HashSet<>(); +Matcher m = RE.matcher(expression); +while (m.find()) { + int start = m.start(); + int sumPos = expression.indexOf("sum", start); + if (sumPos == -1) { +sumPos = Integer.MAX_VALUE; + } + int maxPos = expression.indexOf("max", start); + if (maxPos == -1) { +maxPos = Integer.MAX_VALUE; + } + int end = Math.min(sumPos, maxPos); + if (end == Integer.MAX_VALUE) { +throw new IllegalArgumentException("Invalid syntax"); + } + end += 3; + String component = expression.substring(start, end); + String[] tokens = component.split("_"); + if (tokens.length < 3 || "val".equals(tokens[0]) == false) { +throw new IllegalArgumentException("Invalid syntax"); + } + AssociationAggregationFunction func = + F_MAP.get(tokens[tokens.length - 1].toLowerCase(Locale.ROOT)); + String ref; + if (tokens.length > 3) { +StringBuilder sb = new StringBuilder(); +for (int i = 1; i < tokens.length - 1; i++) { + sb.append(tokens[i]); + if (i < tokens.length - 2) { +sb.append("_"); + } +} +ref = sb.toString(); + } else { +ref = tokens[1]; + } + result.add(new FieldAggregation(component, ref, func)); +} + +return result; + } + + private void aggregate( + String expr
[GitHub] [lucene] zacharymorn commented on pull request #12194: [GITHUB-11915] [Discussion Only] Make Lucene smarter about long runs of matches via new API on DISI
zacharymorn commented on PR #12194: URL: https://github.com/apache/lucene/pull/12194#issuecomment-1467191329 > Ok so I did some more investigation, I think there might be a bug with Lucene90PostingsReader.BlockDocsEnum#peekNextNonMatchingDocID. I haven't looked super deeply into it yet, but I can post the patch/seed/test if you want to look into it as well @zacharymorn. Hmm that seems strange, as I would think this API won't have direct impact to `FixedBitSet#or`. Where do you see the error? But yeah feel free to upload a patch/test, or post a comment in the diff for changes that you suspect might be wrong. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org