[GitHub] [lucene] jpountz commented on a diff in pull request #12199: Reduce contention in DocumentsWriterPerThreadPool.

2023-03-13 Thread via GitHub


jpountz commented on code in PR #12199:
URL: https://github.com/apache/lucene/pull/12199#discussion_r1133614094


##
lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThreadPool.java:
##
@@ -130,13 +134,15 @@ private void ensureOpen() {
 }
   }
 
+  private synchronized boolean contains(DocumentsWriterPerThread state) {
+return dwpts.contains(state);
+  }
+
   void marksAsFreeAndUnlock(DocumentsWriterPerThread state) {
 final long ramBytesUsed = state.ramBytesUsed();
-synchronized (this) {
-  assert dwpts.contains(state)
-  : "we tried to add a DWPT back to the pool but the pool doesn't know 
aobut this DWPT";
-  freeList.add(state, ramBytesUsed);
-}
+assert contains(state)
+: "we tried to add a DWPT back to the pool but the pool doesn't know 
aobut this DWPT";

Review Comment:
   Thanks for catching Uwe, I fixed it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jpountz commented on a diff in pull request #12199: Reduce contention in DocumentsWriterPerThreadPool.

2023-03-13 Thread via GitHub


jpountz commented on code in PR #12199:
URL: https://github.com/apache/lucene/pull/12199#discussion_r1133629633


##
lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThreadPool.java:
##
@@ -113,15 +113,19 @@ private synchronized DocumentsWriterPerThread newWriter() 
{
* operation (add/updateDocument).
*/
   DocumentsWriterPerThread getAndLock() {
-synchronized (this) {
-  ensureOpen();
-  DocumentsWriterPerThread dwpt = 
freeList.poll(DocumentsWriterPerThread::tryLock);
-  if (dwpt == null) {
-dwpt = newWriter();
+ensureOpen();
+DocumentsWriterPerThread dwpt = 
freeList.poll(DocumentsWriterPerThread::tryLock);
+if (dwpt == null) {

Review Comment:
   I still have the feeling that you are making incorrect assumptions about 
what this piece of code is doing (this method itself doesn't publish the object 
to other threads), but I took this thread as a call to keep things simple, so I 
removed the retry and added a few more comments about the logic of this pool.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] dweiss commented on a diff in pull request #12199: Reduce contention in DocumentsWriterPerThreadPool.

2023-03-13 Thread via GitHub


dweiss commented on code in PR #12199:
URL: https://github.com/apache/lucene/pull/12199#discussion_r1133644248


##
lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThreadPool.java:
##
@@ -113,15 +113,19 @@ private synchronized DocumentsWriterPerThread newWriter() 
{
* operation (add/updateDocument).
*/
   DocumentsWriterPerThread getAndLock() {
-synchronized (this) {
-  ensureOpen();
-  DocumentsWriterPerThread dwpt = 
freeList.poll(DocumentsWriterPerThread::tryLock);
-  if (dwpt == null) {
-dwpt = newWriter();
+ensureOpen();
+DocumentsWriterPerThread dwpt = 
freeList.poll(DocumentsWriterPerThread::tryLock);
+if (dwpt == null) {

Review Comment:
   I certainly am! But anyone looking at this code without jumping deep will 
have the same doubts, I think. Unless there is a convincing (performance) 
argument that it's worth it, I like the updated version a lot better - it's 
clear and simple.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a diff in pull request #12199: Reduce contention in DocumentsWriterPerThreadPool.

2023-03-13 Thread via GitHub


uschindler commented on code in PR #12199:
URL: https://github.com/apache/lucene/pull/12199#discussion_r1133699509


##
lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThreadPool.java:
##
@@ -113,15 +113,19 @@ private synchronized DocumentsWriterPerThread newWriter() 
{
* operation (add/updateDocument).
*/
   DocumentsWriterPerThread getAndLock() {
-synchronized (this) {
-  ensureOpen();
-  DocumentsWriterPerThread dwpt = 
freeList.poll(DocumentsWriterPerThread::tryLock);
-  if (dwpt == null) {
-dwpt = newWriter();
+ensureOpen();
+DocumentsWriterPerThread dwpt = 
freeList.poll(DocumentsWriterPerThread::tryLock);
+if (dwpt == null) {

Review Comment:
   Thanks code looks much better. I was also irritated by the long chain of 
"tries" with and without synchronized. To me actualy code is easier to read.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a diff in pull request #12199: Reduce contention in DocumentsWriterPerThreadPool.

2023-03-13 Thread via GitHub


uschindler commented on code in PR #12199:
URL: https://github.com/apache/lucene/pull/12199#discussion_r1133704928


##
lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThreadPool.java:
##
@@ -113,15 +113,17 @@ private synchronized DocumentsWriterPerThread newWriter() 
{
* operation (add/updateDocument).
*/
   DocumentsWriterPerThread getAndLock() {
-synchronized (this) {
-  ensureOpen();
-  DocumentsWriterPerThread dwpt = 
freeList.poll(DocumentsWriterPerThread::tryLock);
-  if (dwpt == null) {
-dwpt = newWriter();
-  }
-  // DWPT is already locked before return by this method:
-  return dwpt;
+ensureOpen();
+DocumentsWriterPerThread dwpt = 
freeList.poll(DocumentsWriterPerThread::tryLock);
+if (dwpt == null) {
+  // newWriter() adds the DWPT to the `dwpts` set as a side-effect. 
However it is not added to
+  // `freeList` at this point, it will be added later on once 
DocumentsWriter has indexed a
+  // document into this DWPT and then gives it back to the pool by calling
+  // #marksAsFreeAndUnlock.
+  dwpt = newWriter();

Review Comment:
   I would change this to `return newWriter();`
   
   For better readability also invert the if statement like that:
   ```
   DocumentsWriterPerThread dwpt = 
freeList.poll(DocumentsWriterPerThread::tryLock);
   if (dwpt != null) {
 return dwpt;
   }
   return newWriter();
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a diff in pull request #12199: Reduce contention in DocumentsWriterPerThreadPool.

2023-03-13 Thread via GitHub


uschindler commented on code in PR #12199:
URL: https://github.com/apache/lucene/pull/12199#discussion_r1133708757


##
lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThreadPool.java:
##
@@ -113,15 +113,17 @@ private synchronized DocumentsWriterPerThread newWriter() 
{
* operation (add/updateDocument).
*/
   DocumentsWriterPerThread getAndLock() {
-synchronized (this) {
-  ensureOpen();
-  DocumentsWriterPerThread dwpt = 
freeList.poll(DocumentsWriterPerThread::tryLock);
-  if (dwpt == null) {
-dwpt = newWriter();
-  }
-  // DWPT is already locked before return by this method:
-  return dwpt;
+ensureOpen();
+DocumentsWriterPerThread dwpt = 
freeList.poll(DocumentsWriterPerThread::tryLock);
+if (dwpt == null) {
+  // newWriter() adds the DWPT to the `dwpts` set as a side-effect. 
However it is not added to
+  // `freeList` at this point, it will be added later on once 
DocumentsWriter has indexed a
+  // document into this DWPT and then gives it back to the pool by calling
+  // #marksAsFreeAndUnlock.
+  dwpt = newWriter();

Review Comment:
   Don't need to do this, but looks much more "Java 17 like" - just for 
demonstation purposes:
   ```java
   return 
Optional.ofNullable(freeList.poll(DocumentsWriterPerThread::tryLock)).orElse(this::newWriter);
   ```



##
lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThreadPool.java:
##
@@ -113,15 +113,17 @@ private synchronized DocumentsWriterPerThread newWriter() 
{
* operation (add/updateDocument).
*/
   DocumentsWriterPerThread getAndLock() {
-synchronized (this) {
-  ensureOpen();
-  DocumentsWriterPerThread dwpt = 
freeList.poll(DocumentsWriterPerThread::tryLock);
-  if (dwpt == null) {
-dwpt = newWriter();
-  }
-  // DWPT is already locked before return by this method:
-  return dwpt;
+ensureOpen();
+DocumentsWriterPerThread dwpt = 
freeList.poll(DocumentsWriterPerThread::tryLock);
+if (dwpt == null) {
+  // newWriter() adds the DWPT to the `dwpts` set as a side-effect. 
However it is not added to
+  // `freeList` at this point, it will be added later on once 
DocumentsWriter has indexed a
+  // document into this DWPT and then gives it back to the pool by calling
+  // #marksAsFreeAndUnlock.
+  dwpt = newWriter();

Review Comment:
   I would change this to `return newWriter();`
   
   For better readability also invert the if statement like that:
   ```java
   DocumentsWriterPerThread dwpt = 
freeList.poll(DocumentsWriterPerThread::tryLock);
   if (dwpt != null) {
 return dwpt;
   }
   return newWriter();
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a diff in pull request #12199: Reduce contention in DocumentsWriterPerThreadPool.

2023-03-13 Thread via GitHub


uschindler commented on code in PR #12199:
URL: https://github.com/apache/lucene/pull/12199#discussion_r1133704928


##
lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThreadPool.java:
##
@@ -113,15 +113,17 @@ private synchronized DocumentsWriterPerThread newWriter() 
{
* operation (add/updateDocument).
*/
   DocumentsWriterPerThread getAndLock() {
-synchronized (this) {
-  ensureOpen();
-  DocumentsWriterPerThread dwpt = 
freeList.poll(DocumentsWriterPerThread::tryLock);
-  if (dwpt == null) {
-dwpt = newWriter();
-  }
-  // DWPT is already locked before return by this method:
-  return dwpt;
+ensureOpen();
+DocumentsWriterPerThread dwpt = 
freeList.poll(DocumentsWriterPerThread::tryLock);
+if (dwpt == null) {
+  // newWriter() adds the DWPT to the `dwpts` set as a side-effect. 
However it is not added to
+  // `freeList` at this point, it will be added later on once 
DocumentsWriter has indexed a
+  // document into this DWPT and then gives it back to the pool by calling
+  // #marksAsFreeAndUnlock.
+  dwpt = newWriter();

Review Comment:
   I would change this to `return newWriter();`
   
   For better readability also invert the if statement like that:
   ```java
   DocumentsWriterPerThread dwpt = 
freeList.poll(DocumentsWriterPerThread::tryLock);
   if (dwpt != null) {
 return dwpt;
   }
   return newWriter();
   ```
   
   (of course add your comments!)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a diff in pull request #12199: Reduce contention in DocumentsWriterPerThreadPool.

2023-03-13 Thread via GitHub


uschindler commented on code in PR #12199:
URL: https://github.com/apache/lucene/pull/12199#discussion_r1133708757


##
lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThreadPool.java:
##
@@ -113,15 +113,17 @@ private synchronized DocumentsWriterPerThread newWriter() 
{
* operation (add/updateDocument).
*/
   DocumentsWriterPerThread getAndLock() {
-synchronized (this) {
-  ensureOpen();
-  DocumentsWriterPerThread dwpt = 
freeList.poll(DocumentsWriterPerThread::tryLock);
-  if (dwpt == null) {
-dwpt = newWriter();
-  }
-  // DWPT is already locked before return by this method:
-  return dwpt;
+ensureOpen();
+DocumentsWriterPerThread dwpt = 
freeList.poll(DocumentsWriterPerThread::tryLock);
+if (dwpt == null) {
+  // newWriter() adds the DWPT to the `dwpts` set as a side-effect. 
However it is not added to
+  // `freeList` at this point, it will be added later on once 
DocumentsWriter has indexed a
+  // document into this DWPT and then gives it back to the pool by calling
+  // #marksAsFreeAndUnlock.
+  dwpt = newWriter();

Review Comment:
   Don't need to do this, but looks much more "Java 17 like" - just for 
demonstation purposes:
   ```java
   return 
Optional.ofNullable(freeList.poll(DocumentsWriterPerThread::tryLock)).orElseGet(this::newWriter);
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a diff in pull request #12199: Reduce contention in DocumentsWriterPerThreadPool.

2023-03-13 Thread via GitHub


uschindler commented on code in PR #12199:
URL: https://github.com/apache/lucene/pull/12199#discussion_r1133708757


##
lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThreadPool.java:
##
@@ -113,15 +113,17 @@ private synchronized DocumentsWriterPerThread newWriter() 
{
* operation (add/updateDocument).
*/
   DocumentsWriterPerThread getAndLock() {
-synchronized (this) {
-  ensureOpen();
-  DocumentsWriterPerThread dwpt = 
freeList.poll(DocumentsWriterPerThread::tryLock);
-  if (dwpt == null) {
-dwpt = newWriter();
-  }
-  // DWPT is already locked before return by this method:
-  return dwpt;
+ensureOpen();
+DocumentsWriterPerThread dwpt = 
freeList.poll(DocumentsWriterPerThread::tryLock);
+if (dwpt == null) {
+  // newWriter() adds the DWPT to the `dwpts` set as a side-effect. 
However it is not added to
+  // `freeList` at this point, it will be added later on once 
DocumentsWriter has indexed a
+  // document into this DWPT and then gives it back to the pool by calling
+  // #marksAsFreeAndUnlock.
+  dwpt = newWriter();

Review Comment:
   Don't need to do this, but looks much more "Java 17 like" - just for 
demonstation purposes:
   ```java
   return 
Optional.ofNullable(freeList.poll(DocumentsWriterPerThread::tryLock)).orElseGet(this::newWriter);
   ```
   
   Of course it brings no comments anymore. Don't do it!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] dweiss commented on a diff in pull request #12199: Reduce contention in DocumentsWriterPerThreadPool.

2023-03-13 Thread via GitHub


dweiss commented on code in PR #12199:
URL: https://github.com/apache/lucene/pull/12199#discussion_r1133753108


##
lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThreadPool.java:
##
@@ -113,15 +113,17 @@ private synchronized DocumentsWriterPerThread newWriter() 
{
* operation (add/updateDocument).
*/
   DocumentsWriterPerThread getAndLock() {
-synchronized (this) {
-  ensureOpen();
-  DocumentsWriterPerThread dwpt = 
freeList.poll(DocumentsWriterPerThread::tryLock);
-  if (dwpt == null) {
-dwpt = newWriter();
-  }
-  // DWPT is already locked before return by this method:
-  return dwpt;
+ensureOpen();
+DocumentsWriterPerThread dwpt = 
freeList.poll(DocumentsWriterPerThread::tryLock);
+if (dwpt == null) {
+  // newWriter() adds the DWPT to the `dwpts` set as a side-effect. 
However it is not added to
+  // `freeList` at this point, it will be added later on once 
DocumentsWriter has indexed a
+  // document into this DWPT and then gives it back to the pool by calling
+  // #marksAsFreeAndUnlock.
+  dwpt = newWriter();

Review Comment:
   Nice, Uwe. Although nobody can ever convince me those optional expressions 
are easier to read and understand than a plain if... :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jasirkt commented on issue #8966: Wildcard query parser of MultiFieldQueryParser should support boosts [LUCENE-7917]

2023-03-13 Thread via GitHub


jasirkt commented on issue #8966:
URL: https://github.com/apache/lucene/issues/8966#issuecomment-1465981107

   Apart from wildcard query, this issue also applies to fuzzy, prefix, range 
and regexp queries


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a diff in pull request #12199: Reduce contention in DocumentsWriterPerThreadPool.

2023-03-13 Thread via GitHub


uschindler commented on code in PR #12199:
URL: https://github.com/apache/lucene/pull/12199#discussion_r1133798472


##
lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThreadPool.java:
##
@@ -113,15 +113,17 @@ private synchronized DocumentsWriterPerThread newWriter() 
{
* operation (add/updateDocument).
*/
   DocumentsWriterPerThread getAndLock() {
-synchronized (this) {
-  ensureOpen();
-  DocumentsWriterPerThread dwpt = 
freeList.poll(DocumentsWriterPerThread::tryLock);
-  if (dwpt == null) {
-dwpt = newWriter();
-  }
-  // DWPT is already locked before return by this method:
-  return dwpt;
+ensureOpen();
+DocumentsWriterPerThread dwpt = 
freeList.poll(DocumentsWriterPerThread::tryLock);
+if (dwpt == null) {
+  // newWriter() adds the DWPT to the `dwpts` set as a side-effect. 
However it is not added to
+  // `freeList` at this point, it will be added later on once 
DocumentsWriter has indexed a
+  // document into this DWPT and then gives it back to the pool by calling
+  // #marksAsFreeAndUnlock.
+  dwpt = newWriter();

Review Comment:
   It was a joke!
   My intention was the first comment, to invert the if logic so use "return on 
success"



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jasirkt opened a new pull request, #12202: Fix boost missing in MultiFieldQueryParser

2023-03-13 Thread via GitHub


jasirkt opened a new pull request, #12202:
URL: https://github.com/apache/lucene/pull/12202

   ### Description
   
   This fixes #8966.
   
   When using boost along with any of fuzzy, wildcard, regexp, range or prefix 
queries, the boost is not applied.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir merged pull request #12202: Fix boost missing in MultiFieldQueryParser

2023-03-13 Thread via GitHub


rmuir merged PR #12202:
URL: https://github.com/apache/lucene/pull/12202


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir closed issue #8966: Wildcard query parser of MultiFieldQueryParser should support boosts [LUCENE-7917]

2023-03-13 Thread via GitHub


rmuir closed issue #8966: Wildcard query parser of MultiFieldQueryParser should 
support boosts [LUCENE-7917]
URL: https://github.com/apache/lucene/issues/8966


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #12202: Fix boost missing in MultiFieldQueryParser

2023-03-13 Thread via GitHub


rmuir commented on PR #12202:
URL: https://github.com/apache/lucene/pull/12202#issuecomment-1466056963

   @jasirkt thanks for fixing another boost bug in this parser!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] sherman opened a new issue, #12203: Scalable merge/compaction of big doc values segments.

2023-03-13 Thread via GitHub


sherman opened a new issue, #12203:
URL: https://github.com/apache/lucene/issues/12203

   ### Description
   
   The question is regarding the scalable merge/compaction of doc values, given 
the following context:
   
   I have a large sharded index.
   Each shard can contain segments of millions of documents.
   There are several hundred fields in the index, and half of them are doc 
values.
   
   I sometimes face issues with merge times when I need to merge or compact a 
large segment. The problem is that it's a single-threaded operation where a 
single segment is merged in a single merger thread.
   
   From the codec doc values format of version 9.x, it appears possible to use 
map-reduce techniques when writing new large doc value segments. This is 
because all metadata is read before any field data can be read, and all doc 
value types have offset and length fields in the metadata.
   
   My basic idea is to write each field in parallel to a single file and then 
perform a low-level merge of the binary data. After that, I can rewrite only 
the metadata to update the offsets.
   
   As I am still new to Lucene development, could someone please provide some 
critique of this idea?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] kashkambath opened a new issue, #12204: ToParentBlockJoinQuery's explain should depend on its ScoreMode

2023-03-13 Thread via GitHub


kashkambath opened a new issue, #12204:
URL: https://github.com/apache/lucene/issues/12204

   ### Description
   
   Currently, `ToParentBlockJoinQuery`'s 
[explain()](https://github.com/apache/lucene/blob/4851bd74f402602f53719374b777c702a7d06fe5/lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java#L401-L422)
 returns the explanation for the highest-scoring child document, regardless of 
the 
[ScoreMode](https://github.com/apache/lucene/blob/main/lucene/join/src/java/org/apache/lucene/search/join/ScoreMode.java).
 If the `ScoreMode` is `ScoreMode.Max`, this makes sense. However, for other 
`ScoreMode`s (`Min`, `Avg`, `Total`), this output is not as helpful for the 
user. We should make the explain output for `ToParentBlockJoinQuery` depend on 
the `ScoreMode` used in the query.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #12199: Reduce contention in DocumentsWriterPerThreadPool.

2023-03-13 Thread via GitHub


uschindler commented on PR #12199:
URL: https://github.com/apache/lucene/pull/12199#issuecomment-1466877603

   Thanks for the updates. Sorry for nitpicking, I just prefer code that goes 
sequentially and uses the pattern exit method as soon as condition mets. This 
makes it easier to understand.
   
   Nice that there's no double locking anymore.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] cknoxrun commented on issue #10608: FuzzyTermEnums sets negative boost for fuzzy search & highlight [LUCENE-9568]

2023-03-13 Thread via GitHub


cknoxrun commented on issue #10608:
URL: https://github.com/apache/lucene/issues/10608#issuecomment-1466882597

   You might already be thinking of this beyond apostrophes, but just in case 
I've recently come across this (with Elasticsearch) for characters other than 
apostrophes, specifically Thai language characters (E.g. "การที่ได้ต้องแงงว่").


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] mdmarshmallow commented on pull request #12194: [GITHUB-11915] [Discussion Only] Make Lucene smarter about long runs of matches via new API on DISI

2023-03-13 Thread via GitHub


mdmarshmallow commented on PR #12194:
URL: https://github.com/apache/lucene/pull/12194#issuecomment-1466954686

   Ok so I did some more investigation, I think there might be a bug with 
`Lucene90PostingsReader.BlockDocsEnum#peekNextNonMatchingDocID`. I haven't 
looked super deeply into it yet, but I can post the patch/seed/test if you want 
to look into it as well @zacharymorn.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gsmiller commented on a diff in pull request #12184: [nocommit] Introduce ExpressionFacets along with a demo.

2023-03-13 Thread via GitHub


gsmiller commented on code in PR #12184:
URL: https://github.com/apache/lucene/pull/12184#discussion_r1134695409


##
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/ExpressionFacets.java:
##
@@ -0,0 +1,253 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.taxonomy;
+
+import java.io.IOException;
+import java.text.ParseException;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Locale;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+import org.apache.lucene.expressions.Bindings;
+import org.apache.lucene.expressions.SimpleBindings;
+import org.apache.lucene.expressions.js.JavascriptCompiler;
+import org.apache.lucene.facet.FacetsCollector;
+import org.apache.lucene.facet.FacetsConfig;
+import org.apache.lucene.index.DocValues;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.index.SortedNumericDocValues;
+import org.apache.lucene.search.ConjunctionUtils;
+import org.apache.lucene.search.DocIdSetIterator;
+import org.apache.lucene.search.DoubleValues;
+import org.apache.lucene.search.DoubleValuesSource;
+import org.apache.lucene.search.IndexSearcher;
+import org.apache.lucene.util.FixedBitSet;
+
+/**
+ * Facet by a provided expression string. Variables in the expression string 
are expected to be of
+ * the form: {@code val__[sum|max]}, where {@code 
} must be a valid
+ * reference in the provided {@link Bindings}.
+ *
+ * nocommit: This is a simple demo and is incomplete and not tested.
+ */
+public class ExpressionFacets extends FloatTaxonomyFacets {
+  private static final Pattern RE = Pattern.compile("(?=(val_.+_(max|sum)))");
+  private static final Map F_MAP = new 
HashMap<>();
+
+  static {
+F_MAP.put("sum", AssociationAggregationFunction.SUM);
+F_MAP.put("max", AssociationAggregationFunction.MAX);
+  }
+
+  public ExpressionFacets(
+  String indexFieldName,
+  TaxonomyReader taxoReader,
+  FacetsConfig config,
+  FacetsCollector facetsCollector,
+  String expression,
+  Bindings bindings)
+  throws IOException, ParseException {
+super(indexFieldName, taxoReader, AssociationAggregationFunction.SUM, 
config);
+Set aggregations = parseAggregations(expression);
+SimpleBindings aggregationBindings = new SimpleBindings();
+aggregate(expression, bindings, aggregationBindings, aggregations, 
facetsCollector);
+  }
+
+  /**
+   * Identify the component aggregations needed by the expression. String 
parsing is not my strong
+   * suit, so I'm sure this is clunky at best and buggy at worst.
+   */
+  private static Set parseAggregations(String expression) {
+Set result = new HashSet<>();
+Matcher m = RE.matcher(expression);
+while (m.find()) {
+  int start = m.start();
+  int sumPos = expression.indexOf("sum", start);
+  if (sumPos == -1) {
+sumPos = Integer.MAX_VALUE;
+  }
+  int maxPos = expression.indexOf("max", start);
+  if (maxPos == -1) {
+maxPos = Integer.MAX_VALUE;
+  }
+  int end = Math.min(sumPos, maxPos);
+  if (end == Integer.MAX_VALUE) {
+throw new IllegalArgumentException("Invalid syntax");
+  }
+  end += 3;
+  String component = expression.substring(start, end);
+  String[] tokens = component.split("_");
+  if (tokens.length < 3 || "val".equals(tokens[0]) == false) {
+throw new IllegalArgumentException("Invalid syntax");
+  }
+  AssociationAggregationFunction func =
+  F_MAP.get(tokens[tokens.length - 1].toLowerCase(Locale.ROOT));
+  String ref;
+  if (tokens.length > 3) {
+StringBuilder sb = new StringBuilder();
+for (int i = 1; i < tokens.length - 1; i++) {
+  sb.append(tokens[i]);
+  if (i < tokens.length - 2) {
+sb.append("_");
+  }
+}
+ref = sb.toString();
+  } else {
+ref = tokens[1];
+  }
+  result.add(new FieldAggregation(component, ref, func));
+}
+
+return result;
+  }
+
+  private void aggregate(
+  String expr

[GitHub] [lucene] zacharymorn commented on pull request #12194: [GITHUB-11915] [Discussion Only] Make Lucene smarter about long runs of matches via new API on DISI

2023-03-13 Thread via GitHub


zacharymorn commented on PR #12194:
URL: https://github.com/apache/lucene/pull/12194#issuecomment-1467191329

   > Ok so I did some more investigation, I think there might be a bug with 
Lucene90PostingsReader.BlockDocsEnum#peekNextNonMatchingDocID. I haven't looked 
super deeply into it yet, but I can post the patch/seed/test if you want to 
look into it as well @zacharymorn.
   
   Hmm that seems strange, as I would think this API won't have direct impact 
to `FixedBitSet#or`. Where do you see the error? But yeah feel free to upload a 
patch/test, or post a comment in the diff for changes that you suspect might be 
wrong.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org