date:20201112

[GitHub] [lucene-solr] iverase commented on pull request #2059: LUCENE-9595: Make Component2D#withinPoint implementations consistent with ShapeQuery logic

2020-11-12 Thread GitBox



iverase commented on pull request #2059:
URL: https://github.com/apache/lucene-solr/pull/2059#issuecomment-725913962


   @nknize Do you have an opinion about this change?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Commented] (LUCENE-9569) Temporarily disable sort optimization on _doc for 8.7 release.

2020-11-12 Thread Adrien Grand (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230413#comment-17230413
 ] 

Adrien Grand commented on LUCENE-9569:
--

[~mayyas] Should we add it back now?

> Temporarily disable sort optimization on _doc for 8.7 release.
> --
>
> Key: LUCENE-9569
> URL: https://issues.apache.org/jira/browse/LUCENE-9569
> Project: Lucene - Core
>  Issue Type: Task
>Affects Versions: 8.7
>Reporter: Mayya Sharipova
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Sort optimization on _doc was introduced in LUCENE-9449, but it looks 
> unstable and has led to some recent test failures. 
> As the 8.7 release is very soon, we need to temporarily disable this sort 
> optimization on _doc for this release, with a plan to stabilize it for later 
> releases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [lucene-solr] dweiss commented on a change in pull request #2068: LUCENE-8982: Separate out native code to another module to allow cpp build with gradle

2020-11-12 Thread GitBox



dweiss commented on a change in pull request #2068:
URL: https://github.com/apache/lucene-solr/pull/2068#discussion_r521911417



##
File path: .gitignore
##
@@ -8,6 +8,10 @@ build/
 /.idea/
 #IntelliJ creates this folder, ignore.
 /dev-tools/missing-doclet/out/
+*.iml

Review comment:
   Right... gradlew idea is a plugin - it does generate those files. 
IntelliJ has native gradle support nowadays. See help/IDEs.txt; perhaps it 
should be clarified there.






[jira] [Resolved] (LUCENE-9511) Include StoredFieldsWriter in DWPT accounting

2020-11-12 Thread Adrien Grand (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-9511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-9511.
--
Fix Version/s: 8.7
   Resolution: Fixed

> Include StoredFieldsWriter in DWPT accounting
> -
>
> Key: LUCENE-9511
> URL: https://issues.apache.org/jira/browse/LUCENE-9511
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Simon Willnauer
>Priority: Major
> Fix For: 8.7
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> StoredFieldsWriter might consume some heap memory that can have a 
> significant impact on the decisions made in the IW about whether writers 
> should be stalled or DWPTs should be flushed when memory settings are small 
> in IWC and flushes are frequent. We should add some accounting to the 
> StoredFieldsWriter since it's part of the DWPT lifecycle and not just 
> present during flush.
> Our nightly builds ran into some OOMs due to the large chunk size used in the 
> CompressingStoredFieldsFormat. The reason is very frequent flushes due to a 
> small maxBufferedDocs, which cause 300+ DWPTs to be blocked for flush, 
> ultimately causing an OOM exception.
> {noformat}
> NOTE: reproduce with: ant test -Dtestcase=TestIndexingSequenceNumbers -Dtests.method=testStressConcurrentCommit -Dtests.seed=A04943A98C8E2954 -Dtests.nightly=true -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=vo-001 -Dtests.timezone=Africa/Ouagadougou -Dtests.asserts=true -Dtests.file.encoding=UTF8
> 06:06:15 [junit4] ERROR    107s J3 | TestIndexingSequenceNumbers.testStressConcurrentCommit <<<
> 06:06:15 [junit4]    > Throwable #1: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
> 06:06:15 [junit4]    >   at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:876)
> 06:06:15 [junit4]    >   at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:890)
> 06:06:15 [junit4]    >   at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3727)
> 06:06:15 [junit4]    >   at org.apache.lucene.index.TestIndexingSequenceNumbers.testStressConcurrentCommit(TestIndexingSequenceNumbers.java:228)
> 06:06:15 [junit4]    >   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 06:06:15 [junit4]    >   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 06:06:15 [junit4]    >   at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 06:06:15 [junit4]    >   at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> 06:06:15 [junit4]    >   at java.base/java.lang.Thread.run(Thread.java:834)
> 06:06:15 [junit4]    > Caused by: java.lang.OutOfMemoryError: Java heap space
> 06:06:15 [junit4]    >   at __randomizedtesting.SeedInfo.seed([A04943A98C8E2954]:0)
> 06:06:15 [junit4]    >   at org.apache.lucene.store.GrowableByteArrayDataOutput.<init>(GrowableByteArrayDataOutput.java:46)
> 06:06:15 [junit4]    >   at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.<init>(CompressingStoredFieldsWriter.java:111)
> 06:06:15 [junit4]    >   at org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:130)
> 06:06:15 [junit4]    >   at org.apache.lucene.codecs.lucene87.Lucene87StoredFieldsFormat.fieldsWriter(Lucene87StoredFieldsFormat.java:141)
> 06:06:15 [junit4]    >   at org.apache.lucene.codecs.asserting.AssertingStoredFieldsFormat.fieldsWriter(AssertingStoredFieldsFormat.java:48)
> 06:06:15 [junit4]    >   at org.apache.lucene.index.StoredFieldsConsumer.initStoredFieldsWriter(StoredFieldsConsumer.java:39)
> 06:06:15 [junit4]    >   at org.apache.lucene.index.StoredFieldsConsumer.startDocument(StoredFieldsConsumer.java:46)
> 06:06:15 [junit4]    >   at org.apache.lucene.index.DefaultIndexingChain.startStoredFields(DefaultIndexingChain.java:426)
> 06:06:15 [junit4]    >   at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:462)
> 06:06:15 [junit4]    >   at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:233)
> 06:06:15 [junit4]    >   at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:419)
> 06:06:15 [junit4]    >   at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1470)
> 06:06:15 [junit4]    >   at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1463)
> 06:06:15 [junit4]    >   at org.apache.lucene.index.TestIndexin
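
One way to picture the accounting proposed in the description above: each per-thread writer reports its stored-fields buffer together with its other indexing buffers, so the flush decision also sees memory held by hundreds of mostly idle writers. The following is a simplified model, not Lucene's implementation; `RamAccountable`, `StoredFieldsBuffer`, `PerThreadWriter`, and `FlushAccounting` are hypothetical names, and the sizes are made-up numbers.

```java
import java.util.List;

// Hypothetical interface: anything that can report its heap usage in bytes.
interface RamAccountable {
  long ramBytesUsed();
}

// Models a stored-fields writer that allocates its chunk buffer eagerly.
final class StoredFieldsBuffer implements RamAccountable {
  private final byte[] chunk;

  StoredFieldsBuffer(int chunkSizeBytes) {
    this.chunk = new byte[chunkSizeBytes];
  }

  @Override
  public long ramBytesUsed() {
    return chunk.length;
  }
}

// Models a per-thread writer whose reported total includes the stored-fields buffer.
final class PerThreadWriter implements RamAccountable {
  private final long postingsBytes;
  private final RamAccountable storedFields;

  PerThreadWriter(long postingsBytes, RamAccountable storedFields) {
    this.postingsBytes = postingsBytes;
    this.storedFields = storedFields;
  }

  @Override
  public long ramBytesUsed() {
    return postingsBytes + storedFields.ramBytesUsed();
  }
}

final class FlushAccounting {
  private FlushAccounting() {}

  // Sum the reported usage of all writers.
  static long totalBytes(List<? extends RamAccountable> writers) {
    long total = 0;
    for (RamAccountable w : writers) {
      total += w.ramBytesUsed();
    }
    return total;
  }

  // True when the summed usage is above the configured limit.
  static boolean exceeds(List<? extends RamAccountable> writers, long limitBytes) {
    return totalBytes(writers) > limitBytes;
  }
}
```

With 300 such writers that have indexed almost nothing, the total is still 300 times the chunk size, which is the kind of hidden usage the issue describes.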

[jira] [Closed] (LUCENE-9511) Include StoredFieldsWriter in DWPT accounting

2020-11-12 Thread Adrien Grand (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-9511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand closed LUCENE-9511.


> Include StoredFieldsWriter in DWPT accounting
> -
>
> Key: LUCENE-9511
> URL: https://issues.apache.org/jira/browse/LUCENE-9511
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Simon Willnauer
>Priority: Major
> Fix For: 8.7
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>

[GitHub] [lucene-solr] dweiss commented on a change in pull request #2068: LUCENE-8982: Separate out native code to another module to allow cpp build with gradle

2020-11-12 Thread GitBox



dweiss commented on a change in pull request #2068:
URL: https://github.com/apache/lucene-solr/pull/2068#discussion_r521912140



##
File path: lucene/misc/native/src/main/posix/NativePosixUtil.cpp
##
@@ -38,12 +38,12 @@
 
 #ifdef LINUX
 /*
- * Class: org_apache_lucene_store_NativePosixUtil
+ * Class: org_apache_lucene_misc_store_NativePosixUtil
  * Method:posix_fadvise
  * Signature: (Ljava/io/FileDescriptor;JJI)V
  */
 extern "C"
-JNIEXPORT jint JNICALL Java_org_apache_lucene_store_NativePosixUtil_posix_1fadvise(JNIEnv *env, jclass _ignore, jobject fileDescriptor, jlong offset, jlong len, jint advice)
+JNIEXPORT jint JNICALL Java_org_apache_lucene_misc_store_NativePosixUtil_posix_1fadvise(JNIEnv *env, jclass _ignore, jobject fileDescriptor, jlong offset, jlong len, jint advice)

Review comment:
   Yeah. We really should try to add a test that tries to run with these 
libs. I don't know how to handle this yet - can be a follow-up issue.
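
The rename in the hunk above follows from how JNI derives a native symbol name from the Java class and method name. A minimal sketch of that mangling, assuming ASCII identifiers and non-overloaded methods only (`_` becomes `_1`, `.` or `/` becomes `_`, `;` becomes `_2`, `[` becomes `_3`); `JniSymbolNames` is a hypothetical helper for illustration and is not part of the PR:

```java
// Hypothetical helper: computes the short JNI symbol name for a native method.
final class JniSymbolNames {
  private JniSymbolNames() {}

  // Escape one name component using the basic JNI escapes (ASCII input assumed).
  static String mangle(String name) {
    StringBuilder sb = new StringBuilder();
    for (char c : name.toCharArray()) {
      switch (c) {
        case '.':
        case '/':
          sb.append('_');   // package separator
          break;
        case '_':
          sb.append("_1");  // literal underscore in the Java name
          break;
        case ';':
          sb.append("_2");
          break;
        case '[':
          sb.append("_3");
          break;
        default:
          sb.append(c);
      }
    }
    return sb.toString();
  }

  // "Java_" + mangled fully qualified class name + "_" + mangled method name.
  static String symbolFor(String fullyQualifiedClass, String method) {
    return "Java_" + mangle(fullyQualifiedClass) + "_" + mangle(method);
  }
}
```

Calling `JniSymbolNames.symbolFor("org.apache.lucene.misc.store.NativePosixUtil", "posix_fadvise")` yields the `+` symbol in the hunk, which is why moving the class to a new package requires renaming the C++ function. A test that loads the library and calls the method would catch a mismatch as an `UnsatisfiedLinkError`.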






[GitHub] [lucene-solr] dweiss commented on pull request #2068: LUCENE-8982: Separate out native code to another module to allow cpp build with gradle

2020-11-12 Thread GitBox



dweiss commented on pull request #2068:
URL: https://github.com/apache/lucene-solr/pull/2068#issuecomment-725917917


   > On the other hand, I think these tests will break if run from IDEs. Do we 
need to support that in this PR?
   
   Oh, thanks! I'll take a look after I come back from work. I think the PR 
should be clean in that it doesn't break other people's workflow, so yes - if 
you added tests they should run or be quietly ignored if they're not supported. 
I'll take a look.




[GitHub] [lucene-solr] dweiss commented on pull request #2077: LUCENE-9605: update snowball to d8cf01ddf37a, adds Yiddish

2020-11-12 Thread GitBox



dweiss commented on pull request #2077:
URL: https://github.com/apache/lucene-solr/pull/2077#issuecomment-725922377


   LGTM. Just scratching my head about one thing - the generator has that extra 
field now yet the patch doesn't contain diffs for other languages - only 
Serbian and Yiddish?




[jira] [Commented] (SOLR-14975) Optimize CoreContainer.getAllCoreNames and getLoadedCoreNames

2020-11-12 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-14975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230421#comment-17230421
 ] 

ASF subversion and git services commented on SOLR-14975:


Commit c53f0630169c535f24534a1b1333cbebdfc7ea2f in lucene-solr's branch 
refs/heads/branch_8x from Bruno Roustant
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=c53f063 ]

SOLR-14975: Optimize CoreContainer.getAllCoreNames and getLoadedCoreNames.
Also optimize getCoreDescriptors.

Closes #2066


> Optimize CoreContainer.getAllCoreNames and getLoadedCoreNames 
> --
>
> Key: SOLR-14975
> URL: https://issues.apache.org/jira/browse/SOLR-14975
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> The methods CoreContainer.getAllCoreNames and getLoadedCoreNames hold a lock 
> while they grab core names to put into a TreeSet.  When there are *many* 
> cores, this delay is noticeable.  Holding this lock effectively blocks 
> queries since queries look up a core; so it's critically important that these 
> methods are *fast*.  The tragedy here is that some callers merely want to 
> know if a particular name is in the set, or what the aggregated size is.  
> Some callers want to iterate the names but don't really care what the 
> iteration order is.
> I propose that some callers of these two methods find suitable alternatives, 
> like getCoreDescriptor to check for null.  And I propose that these methods 
> return a HashSet -- no order.  If the caller wants it sorted, it can do so 
> itself.
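
The proposal above can be sketched as follows: hold the lock only long enough to copy names into an unordered set, offer a direct membership check for callers that need nothing else, and leave sorting to the callers that want it. This is a minimal model under stated assumptions, not the committed change; `CoreNameRegistry` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the structure that tracks loaded cores.
final class CoreNameRegistry {
  private final Object lock = new Object();
  private final Map<String, Object> cores = new HashMap<>();

  void register(String name, Object core) {
    synchronized (lock) {
      cores.put(name, core);
    }
  }

  // Membership check: no intermediate set is built at all.
  boolean isLoaded(String name) {
    synchronized (lock) {
      return cores.containsKey(name);
    }
  }

  // Unordered snapshot: the lock is held only for the copy, with no sorting.
  Set<String> getLoadedCoreNames() {
    synchronized (lock) {
      return new HashSet<>(cores.keySet());
    }
  }

  // Callers that need an order sort their own copy, outside the lock.
  static List<String> sorted(Set<String> names) {
    List<String> out = new ArrayList<>(names);
    Collections.sort(out);
    return out;
  }
}
```

The design choice is that the cost of ordering is paid by the few callers that need it and never while queries are waiting on the lock.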




[GitHub] [lucene-solr] bruno-roustant closed pull request #2066: SOLR-14975: Optimize CoreContainer.getAllCoreNames and getLoadedCoreNames.

2020-11-12 Thread GitBox



bruno-roustant closed pull request #2066:
URL: https://github.com/apache/lucene-solr/pull/2066


   




[jira] [Resolved] (SOLR-14975) Optimize CoreContainer.getAllCoreNames and getLoadedCoreNames

2020-11-12 Thread Bruno Roustant (Jira)



 [ 
https://issues.apache.org/jira/browse/SOLR-14975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Roustant resolved SOLR-14975.
---
Resolution: Fixed

Thanks Erick and David for the review!

> Optimize CoreContainer.getAllCoreNames and getLoadedCoreNames 
> --
>
> Key: SOLR-14975
> URL: https://issues.apache.org/jira/browse/SOLR-14975
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Priority: Major
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> The methods CoreContainer.getAllCoreNames and getLoadedCoreNames hold a lock 
> while they grab core names to put into a TreeSet.  When there are *many* 
> cores, this delay is noticeable.  Holding this lock effectively blocks 
> queries since queries look up a core; so it's critically important that these 
> methods are *fast*.  The tragedy here is that some callers merely want to 
> know if a particular name is in the set, or what the aggregated size is.  
> Some callers want to iterate the names but don't really care what the 
> iteration order is.
> I propose that some callers of these two methods find suitable alternatives, 
> like getCoreDescriptor to check for null.  And I propose that these methods 
> return a HashSet -- no order.  If the caller wants it sorted, it can do so 
> itself.




[GitHub] [lucene-solr] uschindler commented on pull request #2077: LUCENE-9605: update snowball to d8cf01ddf37a, adds Yiddish

2020-11-12 Thread GitBox



uschindler commented on pull request #2077:
URL: https://github.com/apache/lucene-solr/pull/2077#issuecomment-725941942


   +1




[jira] [Created] (LUCENE-9606) Wrap boolean queries generated by shape fields with a Constant score query

2020-11-12 Thread Ignacio Vera (Jira)

Ignacio Vera created LUCENE-9606:


 Summary: Wrap boolean queries generated by shape fields with a 
Constant score query
 Key: LUCENE-9606
 URL: https://issues.apache.org/jira/browse/LUCENE-9606
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Ignacio Vera


When querying a shape field with a Geometry collection and a CONTAINS spatial 
relationship, the query is rewritten as a boolean query. We should wrap the 
resulting query with a ConstantScoreQuery.
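
To illustrate why the wrapping matters, here is a toy model of the scoring effect. It does not use Lucene classes: `ToyQuery`, `ToyLeaf`, `ToyBooleanMust`, and `ToyConstantScore` are hypothetical, and the conjunctive (all clauses must match) combination is an assumption made only for this illustration.

```java
import java.util.List;

// Hypothetical minimal query abstraction: does a doc match, and with what score.
interface ToyQuery {
  boolean matches(int doc);
  float score(int doc);
}

// Matches a single document with a fixed score.
final class ToyLeaf implements ToyQuery {
  private final int doc;
  private final float score;

  ToyLeaf(int doc, float score) {
    this.doc = doc;
    this.score = score;
  }

  @Override public boolean matches(int d) { return d == doc; }
  @Override public float score(int d) { return matches(d) ? score : 0f; }
}

// Conjunction: matches when every clause matches; score is the sum of clause scores.
final class ToyBooleanMust implements ToyQuery {
  private final List<ToyQuery> clauses;

  ToyBooleanMust(List<ToyQuery> clauses) { this.clauses = clauses; }

  @Override public boolean matches(int d) {
    for (ToyQuery q : clauses) {
      if (!q.matches(d)) return false;
    }
    return true;
  }

  @Override public float score(int d) {
    if (!matches(d)) return 0f;
    float sum = 0f;
    for (ToyQuery q : clauses) sum += q.score(d);
    return sum;
  }
}

// Wrapper: same matches as the inner query, but every hit scores exactly 1.
final class ToyConstantScore implements ToyQuery {
  private final ToyQuery inner;

  ToyConstantScore(ToyQuery inner) { this.inner = inner; }

  @Override public boolean matches(int d) { return inner.matches(d); }
  @Override public float score(int d) { return inner.matches(d) ? 1f : 0f; }
}
```

Without the wrapper, the score depends on how many sub-geometries the collection was split into; with it, a shape query returns the same score for every hit. In Lucene itself the wrapping would be done with `new ConstantScoreQuery(query)`.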

 




[jira] [Commented] (LUCENE-9606) Wrap boolean queries generated by shape fields with a Constant score query

2020-11-12 Thread Adrien Grand (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230490#comment-17230490
 ] 

Adrien Grand commented on LUCENE-9606:
--

+1

> Wrap boolean queries generated by shape fields with a Constant score query
> --
>
> Key: LUCENE-9606
> URL: https://issues.apache.org/jira/browse/LUCENE-9606
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Ignacio Vera
>Priority: Major
>
> When querying a shape field with a Geometry collection and a CONTAINS spatial 
> relationship, the query is rewritten as a boolean query. We should wrap the 
> resulting query with a ConstantScoreQuery.
>  




[jira] [Commented] (LUCENE-9508) DocumentsWriter doesn't check for BlockedFlushes in stall mode``

2020-11-12 Thread Simon Willnauer (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230497#comment-17230497
 ] 

Simon Willnauer commented on LUCENE-9508:
-

Hey Zach, thanks for opening this. Let me ask some questions and clarify what is 
going on here first:
{quote} 2) Should the fullFlush thread wait indefinitely for the lock on 
ThreadStates ? Since single blocking writing thread can block the full flush 
here.
{quote}

Yes, we have to block on the ThreadStates here, since this is the contract of 
full flush in order to atomically commit changes and establish a happens-before 
relationship.

{quote}
1) Should *preUpdate* look into the blocked flushes information as well instead 
of just flush queue ?
{quote}

I am not sure what it would do with the information in blocked flushes. Can you 
elaborate on this? We can't let blocked flushes go until the full flush is 
over; otherwise we will have inconsistent commits. 

Can you share your IndexWriter config and how you configured the 10% heap?
Can you also share which thread holds the ThreadState that the full flush is 
waiting for? I wonder what causes this situation. 



> DocumentsWriter doesn't check for BlockedFlushes in stall mode``
> 
>
> Key: LUCENE-9508
> URL: https://issues.apache.org/jira/browse/LUCENE-9508
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 8.5.1
>Reporter: Sorabh Hamirwasia
>Priority: Major
>  Labels: IndexWriter
>
> Hi,
> I was investigating an issue where the memory usage of a single Lucene 
> IndexWriter went up to ~23GB. Lucene has a concept of stalling in case the 
> memory used by each index breaches the 2 X ramBuffer limit (10% of JVM heap, 
> in this case ~3GB). So ideally memory usage should not go above that limit. I 
> looked into the heap dump and found that when the fullFlush thread enters the 
> *markForFullFlush* method, it tries to take the lock on the ThreadStates of 
> all the DWPT threads sequentially. If the lock on one of the ThreadStates is 
> blocked, it will block indefinitely. This is what happened in my case, where 
> one of the DWPT threads was stuck in the indexing process. Due to this, the 
> fullFlush thread was unable to populate the flush queue even though the stall 
> mode was detected. This caused a new indexing request that came in on an 
> indexing thread to continue with indexing after sleeping for a second. The 
> *preUpdate()* method checks for the stalled case and whether there are any 
> pending flushes (based on the flush queue); if not, it sleeps and continues. 
> Questions: 
> 1) Should *preUpdate* look into the blocked flushes information as well 
> instead of just the flush queue?
> 2) Should the fullFlush thread wait indefinitely for the lock on ThreadStates? 
> A single blocked writing thread can block the full flush here.
>  
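
The behaviour discussed in this thread can be modelled in a few lines: indexing stalls above twice the RAM buffer, indexing threads only ever help with the flush queue, and writers that become flush-pending during a full flush are parked as blocked flushes until the full flush is over. This is a simplified sketch built from the descriptions above, not the actual DocumentsWriter code; `FlushControlSketch` and its methods are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified model of the stall and blocked-flush bookkeeping.
final class FlushControlSketch {
  private final long ramBufferBytes;
  private final Deque<Long> flushQueue = new ArrayDeque<>();     // ready to flush
  private final Deque<Long> blockedFlushes = new ArrayDeque<>(); // parked during a full flush
  private boolean fullFlush;

  FlushControlSketch(long ramBufferBytes) {
    this.ramBufferBytes = ramBufferBytes;
  }

  // Stall once memory in use breaches the 2 x ramBuffer limit.
  boolean isStalled(long bytesInUse) {
    return bytesInUse > 2 * ramBufferBytes;
  }

  synchronized void startFullFlush() {
    fullFlush = true;
  }

  // A writer of the given size became flush-pending.
  synchronized void markPending(long writerBytes) {
    if (fullFlush) {
      blockedFlushes.add(writerBytes); // must not be flushed before the full flush ends
    } else {
      flushQueue.add(writerBytes);
    }
  }

  // Only now are the parked writers released to the flush queue.
  synchronized void finishFullFlush() {
    fullFlush = false;
    flushQueue.addAll(blockedFlushes);
    blockedFlushes.clear();
  }

  // What an indexing thread sees when it looks for a pending flush; null if none.
  synchronized Long nextPendingFlush() {
    return flushQueue.poll();
  }
}
```

In this model an indexing thread that finds the stall condition true and `nextPendingFlush()` empty has nothing it can help with, which matches the reported symptom while a full flush is stuck waiting on a lock.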




[GitHub] [lucene-solr] s1monw commented on a change in pull request #2022: LUCENE-9004: KNN vector search using NSW graphs

2020-11-12 Thread GitBox



s1monw commented on a change in pull request #2022:
URL: https://github.com/apache/lucene-solr/pull/2022#discussion_r521992111



##
File path: lucene/core/src/java/org/apache/lucene/index/VectorValues.java
##
@@ -74,6 +74,18 @@ public BytesRef binaryValue() throws IOException {
 throw new UnsupportedOperationException();
   }
 
+  /**
+   * Return the k nearest neighbor documents as determined by comparison of their vector values
+   * for this field, to the given vector, by the field's search strategy. If the search strategy is
+   * reversed, lower values indicate nearer vectors, otherwise higher scores indicate nearer
+   * vectors. Unlike relevance scores, vector scores may be negative.
+   * @param target the vector-valued query
+   * @param k  the number of docs to return
+   * @param fanout control the accuracy/speed tradeoff - larger values give better recall at higher cost

Review comment:
   @mikemccand it was pushed but removed again in 
https://issues.apache.org/jira/browse/LUCENE-9257 a little while ago. I get why 
it was removed, but it seems useful; maybe we should add it back?
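
The javadoc in the hunk above describes a contract in which a "reversed" strategy treats lower scores as nearer and a non-reversed one treats higher scores as nearer. A brute-force sketch of that contract, assuming squared Euclidean distance for the reversed case and dot product otherwise; `BruteForceKnn` is a hypothetical class, it is exact rather than graph-based, and it therefore has no `fanout` parameter:

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical exact k-nearest-neighbor search over an in-memory array of vectors.
final class BruteForceKnn {
  private BruteForceKnn() {}

  static float dot(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) sum += a[i] * b[i];
    return sum;
  }

  static float squaredDistance(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      float d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  // Returns up to k doc ids, nearest first. vectors[doc] is the vector of that doc.
  static int[] search(float[][] vectors, float[] target, int k, boolean reversed) {
    final float[] scores = new float[vectors.length];
    Integer[] docs = new Integer[vectors.length];
    for (int d = 0; d < vectors.length; d++) {
      docs[d] = d;
      scores[d] = reversed ? squaredDistance(vectors[d], target) : dot(vectors[d], target);
    }
    Comparator<Integer> byScore = Comparator.comparingDouble(d -> scores[d]);
    // reversed strategy: ascending scores; otherwise descending scores.
    Arrays.sort(docs, reversed ? byScore : byScore.reversed());
    int n = Math.min(k, docs.length);
    int[] top = new int[n];
    for (int i = 0; i < n; i++) top[i] = docs[i];
    return top;
  }
}
```

Dot-product scores can be negative, which is the case the javadoc calls out when it says vector scores, unlike relevance scores, may be negative.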






[jira] [Commented] (LUCENE-9590) Add javadoc for Lucene86PointsFormat class

2020-11-12 Thread Lu Xugang (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230509#comment-17230509
 ] 

Lu Xugang commented on LUCENE-9590:
---

Or, as [~dsmiley] said in mail, we could link to it from the javadocs and host it 
on the Confluence-based wiki here: 
[https://cwiki.apache.org/confluence/display/LUCENE/Home] ?

> Add javadoc for  Lucene86PointsFormat class
> ---
>
> Key: LUCENE-9590
> URL: https://issues.apache.org/jira/browse/LUCENE-9590
> Project: Lucene - Core
>  Issue Type: Wish
>  Components: core/codecs
>Reporter: Lu Xugang
>Priority: Minor
> Attachments: 1.png
>
>
> I would like to add javadoc for the Lucene86PointsFormat class; it would be 
> really helpful for readers of the source to understand the data structure for 
> point values. Is anyone doing this or planning to?
> The attachment lists part of the data structure (a color-filled box means it 
> has a sub data structure).
>  
>  




[GitHub] [lucene-solr] sigram commented on a change in pull request #2065: SOLR-14977 : ContainerPlugins should be configurable

2020-11-12 Thread GitBox



sigram commented on a change in pull request #2065:
URL: https://github.com/apache/lucene-solr/pull/2065#discussion_r522034988



##
File path: solr/core/src/test/org/apache/solr/handler/TestContainerPlugin.java
##
@@ -366,7 +381,7 @@ public void m2(SolrQueryRequest req, SolrQueryResponse rsp) {
 
   }
 
-  public static class CConfig extends PluginMeta {
+  public static class CConfig implements ReflectMapWriter {

Review comment:
   Then we have to explicitly say so in `ConfigurablePlugin`.






[GitHub] [lucene-solr] sigram commented on a change in pull request #2065: SOLR-14977 : ContainerPlugins should be configurable

2020-11-12 Thread GitBox



sigram commented on a change in pull request #2065:
URL: https://github.com/apache/lucene-solr/pull/2065#discussion_r522039722



##
File path: solr/core/src/java/org/apache/solr/api/ContainerPluginsRegistry.java
##
@@ -114,6 +118,16 @@ public synchronized ApiInfo getPlugin(String name) {
 return currentPlugins.get(name);
   }
 
+  static class PluginMetaHolder {
+private final Map original;
+private final PluginMeta meta;

Review comment:
   I know that I can ignore it - my point was that this property is a relic 
of the time when we allowed only Api handlers as plugins. Now that non-Api 
plugins are first-class citizens, half of the time this property doesn't make 
sense because it's specific to Api plugins - so it should not be exposed 
as a standard property for all plugins.






[jira] [Commented] (SOLR-14991) tag and remove obsolete branches

2020-11-12 Thread Erick Erickson (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-14991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230611#comment-17230611
 ] 

Erick Erickson commented on SOLR-14991:
---

I'll tag these and remove the branch this weekend absent feedback.

[~noble.paul] 
 remotes/origin/jira-14151-revert
 remotes/origin/jira/V2Request

[~caomanhdat] 
 remotes/origin/jira/http2

[~danmuzi] or maybe [~rmuir]

remotes/origin/revert-776-remove_icu_dependency Maybe this is LUCENE-8912 pull 
776? In which case the JIRA's closed and I'll tag/remove.

> tag and remove obsolete branches
> 
>
> Key: SOLR-14991
> URL: https://issues.apache.org/jira/browse/SOLR-14991
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
>
> I'm going to gradually work through the branches, tagging and removing
> 1> anything with a Jira name that's fixed
> 2> anything that I'm certain will never be fixed (e.g. the various gradle 
> build branches)
> So the changes will still be available, they just won't pollute the branch list.
> I'll list the branches here, all the tags will be
> history/branches/lucene-solr/
>  
> This specifically will _not_ include
> 1> any release, e.g. branch_8_4
> 2> anything I'm unsure about. People who've created branches should expect 
> some pings about this.




[jira] [Commented] (SOLR-14983) Score returned in search request is original score and not reranked score

2020-11-12 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-14983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230620#comment-17230620
 ] 

ASF subversion and git services commented on SOLR-14983:


Commit 2f02040a4c45e4dfdb1f569ae05637c86f0f001b in lucene-solr's branch 
refs/heads/master from Christine Poerschke
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2f02040 ]

SOLR-14983: Fix response returning original score instead of reranked score due 
to query and filter combining.
(Krishan Goyal, Jason Baik, Christine Poerschke)


> Score returned in search request is original score and not reranked score
> -
>
> Key: SOLR-14983
> URL: https://issues.apache.org/jira/browse/SOLR-14983
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 8.0
>Reporter: Krishan Goyal
>Assignee: Christine Poerschke
>Priority: Major
> Attachments: 0001-LUCENE-9542-Unit-test-to-reproduce-bug.patch, 
> SOLR-14983.patch, SOLR-14983.patch
>
>
> Score returned in search request is original score and not reranked score 
> post the changes in https://issues.apache.org/jira/browse/LUCENE-8412.
> Commit - 
> [https://github.com/apache/lucene-solr/commit/55bfadbce115a825a75686fe0bfe71406bc3ee44#diff-4e354f104ed52bd7f620b0c05ae8467d]
> Specifically - 
> if (cmd.getSort() != null && query instanceof RankQuery == false && 
> (cmd.getFlags() & GET_SCORES) != 0) {
>     TopFieldCollector.populateScores(topDocs.scoreDocs, this, query);
> }
> in SolrIndexSearcher.java recomputes the score but outputs only the original 
> score and not the reranked score.
>  
> The issue is cmd.getQuery() is a type of RankQuery but the "query" variable 
> is a boolean query and probably replacing query with cmd.getQuery() should be 
> the right fix for this so that the score is not overridden for rerank queries
>  
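
The condition quoted in the issue can be pictured with a small stand-alone sketch. Everything below is a hypothetical stand-in (the classes `Query`, `RankQuery`, `BooleanQuery` and both helper methods are defined locally and are not the Solr/Lucene classes); it only illustrates why an `instanceof RankQuery` test on the combined query comes out differently from the same test on `cmd.getQuery()`, assuming the combined query wraps the rank query as the reporter describes. The fix that was actually committed may differ from the reporter's suggestion.

```java
// Hypothetical stand-ins for illustration only; these are NOT the Solr/Lucene classes.
public class RerankScoreSketch {
    static class Query {}
    static class RankQuery extends Query {}
    static class BooleanQuery extends Query {
        final Query main;
        BooleanQuery(Query main) { this.main = main; }
    }

    // Condition as quoted in the issue: the instanceof test sees the combined query.
    static boolean repopulatesAsQuoted(Query combined, boolean hasSort, boolean wantsScores) {
        return hasSort && !(combined instanceof RankQuery) && wantsScores;
    }

    // Condition with the replacement the reporter suggests: test the command's own query.
    static boolean repopulatesAsSuggested(Query cmdQuery, boolean hasSort, boolean wantsScores) {
        return hasSort && !(cmdQuery instanceof RankQuery) && wantsScores;
    }

    public static void main(String[] args) {
        Query cmdQuery = new RankQuery();             // stands in for cmd.getQuery()
        Query combined = new BooleanQuery(cmdQuery);  // stands in for query and filter combined

        // The wrapper is not a RankQuery, so scores would be repopulated (overwritten).
        System.out.println(repopulatesAsQuoted(combined, true, true));    // prints true
        // The command's own query is a RankQuery, so reranked scores would be left alone.
        System.out.println(repopulatesAsSuggested(cmdQuery, true, true)); // prints false
    }
}
```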




[jira] [Commented] (SOLR-14983) Score returned in search request is original score and not reranked score

2020-11-12 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-14983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230622#comment-17230622
 ] 

ASF subversion and git services commented on SOLR-14983:


Commit ad27dd7c56e4f9e1d1828206e7595b557ed070cb in lucene-solr's branch 
refs/heads/branch_8x from Christine Poerschke
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ad27dd7 ]

SOLR-14983: Fix response returning original score instead of reranked score due 
to query and filter combining.
(Krishan Goyal, Jason Baik, Christine Poerschke)


> Score returned in search request is original score and not reranked score
> -
>
> Key: SOLR-14983
> URL: https://issues.apache.org/jira/browse/SOLR-14983
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 8.0
>Reporter: Krishan Goyal
>Assignee: Christine Poerschke
>Priority: Major
> Attachments: 0001-LUCENE-9542-Unit-test-to-reproduce-bug.patch, 
> SOLR-14983.patch, SOLR-14983.patch
>
>
> Score returned in search request is original score and not reranked score 
> post the changes in https://issues.apache.org/jira/browse/LUCENE-8412.
> Commit - 
> [https://github.com/apache/lucene-solr/commit/55bfadbce115a825a75686fe0bfe71406bc3ee44#diff-4e354f104ed52bd7f620b0c05ae8467d]
> Specifically - 
> if (cmd.getSort() != null && query instanceof RankQuery == false && 
> (cmd.getFlags() & GET_SCORES) != 0) {
>     TopFieldCollector.populateScores(topDocs.scoreDocs, this, query);
> }
> in SolrIndexSearcher.java recomputes the score but outputs only the original 
> score and not the reranked score.
>  
> The issue is cmd.getQuery() is a type of RankQuery but the "query" variable 
> is a boolean query and probably replacing query with cmd.getQuery() should be 
> the right fix for this so that the score is not overridden for rerank queries
>  




[jira] [Updated] (SOLR-14983) Score returned in search request is original score and not reranked score

2020-11-12 Thread Christine Poerschke (Jira)



 [ 
https://issues.apache.org/jira/browse/SOLR-14983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christine Poerschke updated SOLR-14983:
---
Fix Version/s: 8.8
               master (9.0)
   Resolution: Fixed
       Status: Resolved  (was: Patch Available)

Thanks everyone!

> Score returned in search request is original score and not reranked score
> -
>
> Key: SOLR-14983
> URL: https://issues.apache.org/jira/browse/SOLR-14983
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 8.0
>Reporter: Krishan Goyal
>Assignee: Christine Poerschke
>Priority: Major
> Fix For: master (9.0), 8.8
>
> Attachments: 0001-LUCENE-9542-Unit-test-to-reproduce-bug.patch, 
> SOLR-14983.patch, SOLR-14983.patch
>
>
> Score returned in search request is original score and not reranked score 
> post the changes in https://issues.apache.org/jira/browse/LUCENE-8412.
> Commit - 
> [https://github.com/apache/lucene-solr/commit/55bfadbce115a825a75686fe0bfe71406bc3ee44#diff-4e354f104ed52bd7f620b0c05ae8467d]
> Specifically - 
> if (cmd.getSort() != null && query instanceof RankQuery == false && 
> (cmd.getFlags() & GET_SCORES) != 0) {
>     TopFieldCollector.populateScores(topDocs.scoreDocs, this, query);
> }
> in SolrIndexSearcher.java recomputes the score but outputs only the original 
> score and not the reranked score.
>  
> The issue is cmd.getQuery() is a type of RankQuery but the "query" variable 
> is a boolean query and probably replacing query with cmd.getQuery() should be 
> the right fix for this so that the score is not overridden for rerank queries
>  




[jira] [Updated] (SOLR-14986) Add warning to ref guide that using "properties.name" is an expert option

2020-11-12 Thread Erick Erickson (Jira)



 [ 
https://issues.apache.org/jira/browse/SOLR-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-14986:
--
Summary: Add warning to ref guide that using "properties.name" is an expert 
option  (was: Restrict the properties possible to define with 
"property.name=value" when creating a collection)

> Add warning to ref guide that using "properties.name" is an expert option
> -
>
> Key: SOLR-14986
> URL: https://issues.apache.org/jira/browse/SOLR-14986
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
>
> This came to light when I was looking at two user-list questions where people 
> try to manually define core.properties to define _replicas_ in SolrCloud. 
> There are two related issues:
> 1> You can do things like "action=CREATE&name=eoe&property.collection=blivet" 
> which results in an opaque error about "could not create replica." I 
> propose we return a better error here like "property.collection should not be 
> specified when creating a collection". What do people think about the rest of 
> the auto-created properties on collection creation? 
> coreNodeName
> collection.configName
> name
> numShards
> shard
> collection
> replicaType
> "name" seems to be OK to change, although I don't see any place anyone can 
> actually see it afterwards.
> 2> Change the ref guide to steer people away from attempting to manually 
> create a core.properties file to define cores/replicas in SolrCloud. There's 
> no warning on the "defining-core-properties.adoc" for instance. Additionally 
> there should be some kind of message on the collections API documentation 
> about not trying to set the properties in <1> on the CREATE command.
> <2> used to actually work (apparently) with legacyCloud...
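
The clearer error proposed in point 1 of the description could look roughly like the check below. This is only a sketch of the proposal, not Solr code: the class, the method name, and the decision to treat every listed auto-created property except "name" as reserved are assumptions made for illustration, and a later message in this thread says the change ended up as a documentation warning instead.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not part of Solr.
public class ReservedPropertySketch {
    private static final String PREFIX = "property.";

    // Assumption: the auto-created properties listed in the issue, minus "name".
    static final Set<String> RESERVED = new HashSet<>(Arrays.asList(
        "coreNodeName", "collection.configName", "numShards",
        "shard", "collection", "replicaType"));

    /** Returns an error message for the first reserved property.* parameter, or null if there is none. */
    static String check(Map<String, String> params) {
        for (String key : params.keySet()) {
            if (key.startsWith(PREFIX) && RESERVED.contains(key.substring(PREFIX.length()))) {
                return key + " should not be specified when creating a collection";
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Mirrors the request from the issue: action=CREATE&name=eoe&property.collection=blivet
        Map<String, String> params = new LinkedHashMap<>();
        params.put("action", "CREATE");
        params.put("name", "eoe");
        params.put("property.collection", "blivet");
        System.out.println(check(params));
    }
}
```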




[jira] [Commented] (SOLR-14991) tag and remove obsolete branches

2020-11-12 Thread Cao Manh Dat (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-14991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230628#comment-17230628
 ] 

Cao Manh Dat commented on SOLR-14991:
-

Thank you Erick, I am OK with removing that!

> tag and remove obsolete branches
> 
>
> Key: SOLR-14991
> URL: https://issues.apache.org/jira/browse/SOLR-14991
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
>
> I'm going to gradually work through the branches, tagging and removing
> 1> anything with a Jira name that's fixed
> 2> anything that I'm certain will never be fixed (e.g. the various gradle 
> build branches)
> So the changes will still be available, they just won't pollute the branch list.
> I'll list the branches here, all the tags will be
> history/branches/lucene-solr/
>  
> This specifically will _not_ include
> 1> any release, e.g. branch_8_4
> 2> anything I'm unsure about. People who've created branches should expect 
> some pings about this.




[GitHub] [lucene-solr] rmuir commented on pull request #2077: LUCENE-9605: update snowball to d8cf01ddf37a, adds Yiddish

2020-11-12 Thread GitBox



rmuir commented on pull request #2077:
URL: https://github.com/apache/lucene-solr/pull/2077#issuecomment-726089629


   > Just scratching my head about one thing - the generator has that extra 
field now yet the patch doesn't contain diffs for other languages - only 
Serbian and Yiddish?
   
   @dweiss Which extra field?




[GitHub] [lucene-solr] mikemccand merged pull request #2076: LUCENE-9603: Remove redundant fieldType.stored() check

2020-11-12 Thread GitBox



mikemccand merged pull request #2076:
URL: https://github.com/apache/lucene-solr/pull/2076


   




[GitHub] [lucene-solr] dweiss commented on pull request #2077: LUCENE-9605: update snowball to d8cf01ddf37a, adds Yiddish

2020-11-12 Thread GitBox



dweiss commented on pull request #2077:
URL: https://github.com/apache/lucene-solr/pull/2077#issuecomment-726091990


   This one?
   
   
https://github.com/apache/lucene-solr/pull/2077/files#diff-455f29a3b76e17c21dded0a5f1b853145bfa7e6e5f1f36c52012a5f124e14ac2R589




[GitHub] [lucene-solr] rmuir commented on pull request #2077: LUCENE-9605: update snowball to d8cf01ddf37a, adds Yiddish

2020-11-12 Thread GitBox



rmuir commented on pull request #2077:
URL: https://github.com/apache/lucene-solr/pull/2077#issuecomment-726092971


   There aren't any changes here. It's confusing because it's showing a diff of a 
patch file, and GitHub is highlighting something in green that isn't a change. 
The only things that actually changed in the patch file are the line numbers of 
patch chunks.




[GitHub] [lucene-solr] rmuir commented on pull request #2077: LUCENE-9605: update snowball to d8cf01ddf37a, adds Yiddish

2020-11-12 Thread GitBox



rmuir commented on pull request #2077:
URL: https://github.com/apache/lucene-solr/pull/2077#issuecomment-726094165


   It's not your fault; this is why it's good to try to work down this patch 
file. It is confusing.




[GitHub] [lucene-solr] dweiss commented on pull request #2077: LUCENE-9605: update snowball to d8cf01ddf37a, adds Yiddish

2020-11-12 Thread GitBox



dweiss commented on pull request #2077:
URL: https://github.com/apache/lucene-solr/pull/2077#issuecomment-726093808


   Duh. I see it now... darn, sorry for the noise.




[GitHub] [lucene-solr] ErickErickson opened a new pull request #2078: SOLR-14986: Add warning to ref guide that using properties.name is an…

2020-11-12 Thread GitBox



ErickErickson opened a new pull request #2078:
URL: https://github.com/apache/lucene-solr/pull/2078


   Changed both CREATE and ADDREPLICA to just add the warning to the docs. The 
JIRA has a long explanation about why fixing it in the code is too 
risky/expensive.
   
   gw buildsite succeeds.
   
   I'll commit this over the weekend absent objections.




[GitHub] [lucene-solr] mikemccand commented on a change in pull request #2022: LUCENE-9004: KNN vector search using NSW graphs

2020-11-12 Thread GitBox



mikemccand commented on a change in pull request #2022:
URL: https://github.com/apache/lucene-solr/pull/2022#discussion_r522164296



##
File path: lucene/core/src/java/org/apache/lucene/index/VectorValues.java
##
@@ -74,6 +74,18 @@ public BytesRef binaryValue() throws IOException {
 throw new UnsupportedOperationException();
   }
 
+  /**
+   * Return the k nearest neighbor documents as determined by comparison of 
their vector values
+   * for this field, to the given vector, by the field's search strategy. If 
the search strategy is
+   * reversed, lower values indicate nearer vectors, otherwise higher scores 
indicate nearer
+   * vectors. Unlike relevance scores, vector scores may be negative.
+   * @param target the vector-valued query
+   * @param k  the number of docs to return
+   * @param fanout control the accuracy/speed tradeoff - larger values give 
better recall at higher cost

Review comment:
   Ahh OK thanks for the context @s1monw.
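
   For readers skimming the Javadoc in the hunk above, the "k nearest neighbor" contract can be pictured with an exhaustive search. The sketch below is not the NSW graph search from the pull request and uses no Lucene classes; the class and method names are made up, dot product is assumed as a non-reversed strategy (higher score means nearer), and `fanout` is not modelled because an exhaustive scan has no accuracy/speed trade-off.

```java
import java.util.Arrays;

// Hypothetical brute-force illustration; not the Lucene implementation.
public class BruteForceKnnSketch {
    /** Dot product used as the score: higher means nearer, and it may be negative. */
    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Returns the indices of the k highest-scoring vectors, best first. */
    static int[] search(float[][] vectors, float[] target, int k) {
        Integer[] ids = new Integer[vectors.length];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i;
        }
        // Sort by descending score against the target vector.
        Arrays.sort(ids, (x, y) -> Float.compare(dot(vectors[y], target), dot(vectors[x], target)));
        int n = Math.min(k, ids.length);
        int[] top = new int[n];
        for (int i = 0; i < n; i++) {
            top[i] = ids[i];
        }
        return top;
    }

    public static void main(String[] args) {
        float[][] vectors = { {1f, 0f}, {0f, 1f}, {-1f, 0f} };
        float[] target = {1f, 0f};
        System.out.println(Arrays.toString(search(vectors, target, 2))); // prints [0, 1]
        System.out.println(dot(vectors[2], target));                     // prints -1.0
    }
}
```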






[jira] [Commented] (SOLR-14986) Add warning to ref guide that using "properties.name" is an expert option

2020-11-12 Thread Erick Erickson (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230681#comment-17230681
 ] 

Erick Erickson commented on SOLR-14986:
---

[~ctargett] [~mdrob]  Any comments? It's just a couple of warnings in the docs 
now.

> Add warning to ref guide that using "properties.name" is an expert option
> -
>
> Key: SOLR-14986
> URL: https://issues.apache.org/jira/browse/SOLR-14986
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This came to light when I was looking at two user-list questions where people 
> try to manually define core.properties to define _replicas_ in SolrCloud. 
> There are two related issues:
> 1> You can do things like "action=CREATE&name=eoe&property.collection=blivet" 
> which results in an opaque error about "could not create replica." I 
> propose we return a better error here like "property.collection should not be 
> specified when creating a collection". What do people think about the rest of 
> the auto-created properties on collection creation? 
> coreNodeName
> collection.configName
> name
> numShards
> shard
> collection
> replicaType
> "name" seems to be OK to change, although I don't see any place anyone can 
> actually see it afterwards.
> 2> Change the ref guide to steer people away from attempting to manually 
> create a core.properties file to define cores/replicas in SolrCloud. There's 
> no warning on the "defining-core-properties.adoc" for instance. Additionally 
> there should be some kind of message on the collections API documentation 
> about not trying to set the properties in <1> on the CREATE command.
> <2> used to actually work (apparently) with legacyCloud...




[GitHub] [lucene-solr] jpountz merged pull request #2069: LUCENE-9378: Make it possible to configure how to trade speed for compression on doc values.

2020-11-12 Thread GitBox



jpountz merged pull request #2069:
URL: https://github.com/apache/lucene-solr/pull/2069


   




[jira] [Commented] (LUCENE-9378) Configurable compression for BinaryDocValues

2020-11-12 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230700#comment-17230700
 ] 

ASF subversion and git services commented on LUCENE-9378:
-

Commit 06877b2c6e47bc481a79d7bedd8ea4fb099f1b4c in lucene-solr's branch 
refs/heads/master from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=06877b2 ]

LUCENE-9378: Make it possible to configure how to trade speed for compression 
on doc values. (#2069)

This adds a switch to `Lucene80DocValuesFormat` which makes it possible to
configure whether to prioritize retrieval speed over compression ratio
or the other way around. When prioritizing retrieval speed, binary doc
values are written using the exact same format as before more aggressive
compression got introduced.
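
The trade-off the commit message describes can be pictured with a toy encoder. This is a sketch under stated assumptions, not the Lucene code: the class, the `encode` method and the local `Mode` enum are hypothetical, and `java.util.zip.Deflater` merely stands in for whatever compression the real format applies. In speed mode the value is stored as-is, so reads need no decompression; in compression mode the bytes are smaller but must be inflated on every read.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.Deflater;

// Hypothetical illustration of a speed-vs-compression switch; not the Lucene API.
public class DocValuesModeSketch {
    enum Mode { BEST_SPEED, BEST_COMPRESSION }

    /** Encodes one binary value according to the chosen mode. */
    static byte[] encode(byte[] value, Mode mode) {
        if (mode == Mode.BEST_SPEED) {
            return value.clone(); // stored raw: nothing to decompress at read time
        }
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(value);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] value = new byte[1000];
        Arrays.fill(value, (byte) 'a'); // highly repetitive, so it compresses well

        System.out.println(encode(value, Mode.BEST_SPEED).length);                      // prints 1000
        System.out.println(encode(value, Mode.BEST_COMPRESSION).length < value.length); // prints true
    }
}
```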

> Configurable compression for BinaryDocValues
> 
>
> Key: LUCENE-9378
> URL: https://issues.apache.org/jira/browse/LUCENE-9378
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Viral Gandhi
>Priority: Major
> Attachments: hotspots-v76x.png, hotspots-v76x.png, hotspots-v76x.png, 
> hotspots-v76x.png, hotspots-v76x.png, hotspots-v77x.png, hotspots-v77x.png, 
> hotspots-v77x.png, hotspots-v77x.png, image-2020-06-12-22-17-30-339.png, 
> image-2020-06-12-22-17-53-961.png, image-2020-06-12-22-18-24-527.png, 
> image-2020-06-12-22-18-48-919.png, snapshot-v77x.nps, snapshot-v77x.nps, 
> snapshot-v77x.nps, snapshots-v76x.nps, snapshots-v76x.nps, snapshots-v76x.nps
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Lucene 8.5.1 includes a change to always [compress 
> BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This 
> caused a ~30% reduction in our red-line QPS (throughput). 
> We think users should be given some way to opt in to this compression 
> feature instead of it always being enabled, which can have a substantial 
> query-time cost as we saw during our upgrade. [~mikemccand] suggested one 
> possible approach: introducing a *mode* in Lucene80DocValuesFormat (COMPRESSED 
> and UNCOMPRESSED) and allowing users to create a custom Codec that subclasses 
> the default Codec and picks the format they want.
> The idea is similar to Lucene50StoredFieldsFormat, which has two modes, 
> Mode.BEST_SPEED and Mode.BEST_COMPRESSION.
> Here's a related issue for adding a benchmark covering BINARY doc values 
> query-time performance - [https://github.com/mikemccand/luceneutil/issues/61]





[GitHub] [lucene-solr] HoustonPutman commented on pull request #2078: SOLR-14986: Add warning to ref guide that using properties.name is an…

2020-11-12 Thread GitBox



HoustonPutman commented on pull request #2078:
URL: https://github.com/apache/lucene-solr/pull/2078#issuecomment-726206070


   I believe you can actually make a warning box in asciidoc via:
   
   ```asciidoc
   [WARNING]
   ====
   Warning text goes here.
   ====
   ```
   
   I also prefer "overwriting" or "overriding" to "conflicting"; I feel that 
more accurately describes what the user would be doing.




[jira] [Commented] (LUCENE-9378) Configurable compression for BinaryDocValues

2020-11-12 Thread Nico Tonozzi (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230801#comment-17230801
 ] 

Nico Tonozzi commented on LUCENE-9378:
--

Thank you for the help here folks, and especially [~jpountz]!

> Configurable compression for BinaryDocValues
> 
>
> Key: LUCENE-9378
> URL: https://issues.apache.org/jira/browse/LUCENE-9378
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Viral Gandhi
>Priority: Major
> Attachments: hotspots-v76x.png, hotspots-v76x.png, hotspots-v76x.png, 
> hotspots-v76x.png, hotspots-v76x.png, hotspots-v77x.png, hotspots-v77x.png, 
> hotspots-v77x.png, hotspots-v77x.png, image-2020-06-12-22-17-30-339.png, 
> image-2020-06-12-22-17-53-961.png, image-2020-06-12-22-18-24-527.png, 
> image-2020-06-12-22-18-48-919.png, snapshot-v77x.nps, snapshot-v77x.nps, 
> snapshot-v77x.nps, snapshots-v76x.nps, snapshots-v76x.nps, snapshots-v76x.nps
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Lucene 8.5.1 includes a change to always [compress 
> BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This 
> caused a ~30% reduction in our red-line QPS (throughput). 
> We think users should be given some way to opt in to this compression 
> feature instead of it always being enabled, which can have a substantial 
> query-time cost as we saw during our upgrade. [~mikemccand] suggested one 
> possible approach: introducing a *mode* in Lucene80DocValuesFormat (COMPRESSED 
> and UNCOMPRESSED) and allowing users to create a custom Codec that subclasses 
> the default Codec and picks the format they want.
> The idea is similar to Lucene50StoredFieldsFormat, which has two modes, 
> Mode.BEST_SPEED and Mode.BEST_COMPRESSION.
> Here's a related issue for adding a benchmark covering BINARY doc values 
> query-time performance - [https://github.com/mikemccand/luceneutil/issues/61]




[jira] [Reopened] (LUCENE-9499) Clean up package name conflicts between modules (split packages)

2020-11-12 Thread Uwe Schindler (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reopened LUCENE-9499:
---

Hi,
since this commit, the Windows builds fail reproducibly:

FAILURE: Build failed with an exception.

* Where:
Script 
'C:\Users\jenkins\workspace\Lucene-Solr-master-Windows\gradle\validation\check-broken-links.gradle'
 line: 63

* What went wrong:
Execution failed for task ':lucene:documentation:checkBrokenLinks'.
> Broken links check failed. Command output at: 
> C:\Users\jenkins\workspace\Lucene-Solr-master-Windows\lucene\documentation\build\tmp\checkBrokenLinks\check-broken-links-output.txt

Currently, a build is running on Jenkins, so the mentioned temp file is not yet 
there: 
https://jenkins.thetaphi.de/job/Lucene-Solr-master-Windows/ws/lucene/documentation/build/

Will post once visible.

> Clean up package name conflicts between modules (split packages)
> 
>
> Key: LUCENE-9499
> URL: https://issues.apache.org/jira/browse/LUCENE-9499
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: master (9.0)
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We have lots of package name conflicts (shared package names) between modules 
> in the source tree. It is not only annoying for devs/users but also indeed 
> bad practice since Java 9 (according to my understanding), and we already 
> have some problems with Javadocs due to these split packages, as some of us 
> would know. Also split packages make migrating to the Java 9 module system 
> impossible.
> This is the placeholder to fix all package name conflicts in Lucene.
> See the dev list thread for more background. 
>  
> [https://lists.apache.org/thread.html/r6496963e89a5e0615e53206429b6843cc5d3e923a2045cc7b7a1eb03%40%3Cdev.lucene.apache.org%3E]
> Modules that need to be fixed / cleaned up:
>  - analyzers-common (LUCENE-9317)
>  - analyzers-icu (LUCENE-9558)
>  - backward-codecs (LUCENE-9318)
>  - sandbox (LUCENE-9319)
>  - misc (LUCENE-9600)
>  - (test-framework: this can be excluded for the moment)
> Also lucene-core will be heavily affected (some classes have to be moved into 
> {{core}}, or some classes' and methods' in {{core}} visibility have to be 
> relaxed).
> Probably most work would be done in a parallel manner, but conflicts can 
> happen. If someone wants to help out, please open an issue before working and 
> share your thoughts with me and others.
> I set "Fix version" to 9.0 - means once we make a commit on here, this will 
> be a blocker for release 9.0.0. (I don't think the changes should be 
> delivered across two major releases; all changes have to be out at once in a 
> major release.) If there are any objections or concerns, please leave 
> comments. For now I have no idea about the total volume of changes or 
> technical obstacles that have to be handled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Comment Edited] (LUCENE-9499) Clean up package name conflicts between modules (split packages)

2020-11-12 Thread Uwe Schindler (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230851#comment-17230851
 ] 

Uwe Schindler edited comment on LUCENE-9499 at 11/12/20, 6:39 PM:
--

Hi,
since this commit (#2072), the Windows builds fail reproducibly:

FAILURE: Build failed with an exception.

* Where:
Script 
'C:\Users\jenkins\workspace\Lucene-Solr-master-Windows\gradle\validation\check-broken-links.gradle'
 line: 63

* What went wrong:
Execution failed for task ':lucene:documentation:checkBrokenLinks'.
> Broken links check failed. Command output at: 
> C:\Users\jenkins\workspace\Lucene-Solr-master-Windows\lucene\documentation\build\tmp\checkBrokenLinks\check-broken-links-output.txt

Currently, a build is running on Jenkins, so the mentioned temp file is not yet 
there: 
https://jenkins.thetaphi.de/job/Lucene-Solr-master-Windows/ws/lucene/documentation/build/

Will post once visible.




[jira] [Commented] (LUCENE-9499) Clean up package name conflicts between modules (split packages)

2020-11-12 Thread Uwe Schindler (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230855#comment-17230855
 ] 

Uwe Schindler commented on LUCENE-9499:
---

And this may be removed, as we have no split packages anymore: 
https://gitbox.apache.org/repos/asf?p=lucene-solr.git;a=blob;f=gradle/documentation/render-javadoc.gradle;h=bbd1b5e603a0c9f513c836452a18b9ce9caa83e7;hb=426a9c2#l277


[jira] [Commented] (LUCENE-9499) Clean up package name conflicts between modules (split packages)

2020-11-12 Thread Uwe Schindler (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230907#comment-17230907
 ] 

Uwe Schindler commented on LUCENE-9499:
---

The problem is the following:

{noformat}
Crawl/parse...

Verify...

file:///C%3A/Users/jenkins/workspace/Lucene-Solr-master-Windows/lucene/documentation/build/site/core/org/apache/lucene/index/PointValues.html
  BROKEN LINK: 
file:///C%3A/Users/jenkins/workspace/Lucene-Solr-master-Windows/lucene/documentation/build/site/misc/org/apache/lucene/document/InetAddressPoint.html

Broken javadocs links were found! Common root causes:
* A typo of some sort for manually created links.
* Public methods referencing non-public classes in their signature.
{noformat}

It looks like a relative link is missing, because all those links should 
be relative. The link checker on Unix does not catch this, as it also 
accepts absolute links, but because the link is escaped incorrectly 
here, the error was caught.

Not sure if it really came from this commit, but I think it's the correct place 
to fix it.
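
The point that these links should be relative can be sketched in plain Java. 
This is only an illustration using the JDK's java.nio.file API, not code from 
the Lucene build. The class RelativeLinkSketch and the helper relativeLink are 
hypothetical names, and the target path under core/ is an assumption made for 
the example.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the Lucene build scripts.
class RelativeLinkSketch {

  // Computes the href that a page at fromPage needs in order to reach toPage.
  // Both arguments are paths relative to the same site root.
  static String relativeLink(String fromPage, String toPage) {
    Path fromDir = Paths.get(fromPage).getParent();
    Path relative = fromDir.relativize(Paths.get(toPage));
    // Path prints the platform separator; links always use forward slashes.
    return relative.toString().replace('\\', '/');
  }

  public static void main(String[] args) {
    // Assumed layout: both pages live under the same "core" site directory.
    System.out.println(relativeLink(
        "core/org/apache/lucene/index/PointValues.html",
        "core/org/apache/lucene/document/InetAddressPoint.html"));
  }
}
```

A link built this way contains no drive letter and no absolute directory, so 
it does not depend on where the site was generated.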


[jira] [Commented] (LUCENE-9499) Clean up package name conflicts between modules (split packages)

2020-11-12 Thread Uwe Schindler (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230908#comment-17230908
 ] 

Uwe Schindler commented on LUCENE-9499:
---

I think that's caused by LUCENE-9600, will reopen that.


[jira] [Reopened] (LUCENE-9600) Clean up package name conflicts for misc module

2020-11-12 Thread Uwe Schindler (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-9600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reopened LUCENE-9600:
---

I think this caused the bug reported in LUCENE-9499.

> Clean up package name conflicts for misc module
> ---
>
> Key: LUCENE-9600
> URL: https://issues.apache.org/jira/browse/LUCENE-9600
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/misc
>Affects Versions: master (9.0)
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Minor
> Fix For: master (9.0)
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> misc module shares the package names o.a.l.document, o.a.l.index, 
> o.a.l.search, o.a.l.store, and o.a.l.util with lucene-core. Those should be 
> moved under o.a.l.misc (or some classes should be moved to core?).




[jira] [Commented] (LUCENE-9378) Configurable compression for BinaryDocValues

2020-11-12 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230912#comment-17230912
 ] 

ASF subversion and git services commented on LUCENE-9378:
-

Commit a48dc123b3e65928d5e59a76735cf6c88099915a in lucene-solr's branch 
refs/heads/branch_8x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a48dc12 ]

LUCENE-9378: Make it possible to configure how to trade speed for compression 
on doc values. (#2069)

This adds a switch to `Lucene80DocValuesFormat` which makes it possible to
configure whether to prioritize retrieval speed over compression ratio
or the other way around. When prioritizing retrieval speed, binary doc
values are written using the exact same format as before the more aggressive
compression was introduced.
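
The speed-versus-compression switch described above can be illustrated with 
the JDK alone. The sketch below does not use Lucene and is not how 
Lucene80DocValuesFormat is implemented. It only shows the general shape of 
such a switch, using java.util.zip.Deflater as a stand-in. The class 
CompressionModeSketch, its Mode enum and both methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration of a two-mode switch; not Lucene code.
class CompressionModeSketch {

  enum Mode { BEST_SPEED, BEST_COMPRESSION }

  // Compresses input with a Deflater level chosen from the mode.
  static byte[] compress(byte[] input, Mode mode) {
    int level = (mode == Mode.BEST_SPEED) ? Deflater.BEST_SPEED : Deflater.BEST_COMPRESSION;
    Deflater deflater = new Deflater(level);
    deflater.setInput(input);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[4096];
    while (!deflater.finished()) {
      int n = deflater.deflate(buffer);
      out.write(buffer, 0, n);
    }
    deflater.end();
    return out.toByteArray();
  }

  // Restores the original bytes; the reader does not need to know the mode.
  static byte[] decompress(byte[] compressed, int originalLength) {
    Inflater inflater = new Inflater();
    inflater.setInput(compressed);
    byte[] result = new byte[originalLength];
    int offset = 0;
    try {
      while (offset < originalLength) {
        int n = inflater.inflate(result, offset, originalLength - offset);
        if (n == 0) {
          throw new IllegalStateException("compressed data ended early");
        }
        offset += n;
      }
    } catch (DataFormatException e) {
      throw new IllegalStateException(e);
    } finally {
      inflater.end();
    }
    return result;
  }

  public static void main(String[] args) {
    byte[] data = new byte[10_000]; // all zeros, so highly compressible
    for (Mode mode : Mode.values()) {
      byte[] packed = compress(data, mode);
      System.out.println(mode + ": " + data.length + " -> " + packed.length + " bytes");
    }
  }
}
```

In this sketch the mode only affects the write path, so data written in 
either mode is read back the same way.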


> Configurable compression for BinaryDocValues
> 
>
> Key: LUCENE-9378
> URL: https://issues.apache.org/jira/browse/LUCENE-9378
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Viral Gandhi
>Priority: Major
> Attachments: hotspots-v76x.png, hotspots-v76x.png, hotspots-v76x.png, 
> hotspots-v76x.png, hotspots-v76x.png, hotspots-v77x.png, hotspots-v77x.png, 
> hotspots-v77x.png, hotspots-v77x.png, image-2020-06-12-22-17-30-339.png, 
> image-2020-06-12-22-17-53-961.png, image-2020-06-12-22-18-24-527.png, 
> image-2020-06-12-22-18-48-919.png, snapshot-v77x.nps, snapshot-v77x.nps, 
> snapshot-v77x.nps, snapshots-v76x.nps, snapshots-v76x.nps, snapshots-v76x.nps
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Lucene 8.5.1 includes a change to always [compress 
> BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This 
> caused a ~30% reduction in our red-line QPS (throughput). 
> We think users should be given some way to opt in to this compression 
> feature instead of it always being enabled, which can have a substantial 
> query-time cost, as we saw during our upgrade. [~mikemccand] suggested one 
> possible approach: introducing a *mode* in Lucene80DocValuesFormat (COMPRESSED 
> and UNCOMPRESSED) and allowing users to create a custom Codec that subclasses 
> the default Codec and picks the format they want.
> The idea is similar to Lucene50StoredFieldsFormat, which has two modes, 
> Mode.BEST_SPEED and Mode.BEST_COMPRESSION.
> Here is a related issue for adding a benchmark covering BINARY doc values 
> query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]




[jira] [Commented] (LUCENE-9600) Clean up package name conflicts for misc module

2020-11-12 Thread Uwe Schindler (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230918#comment-17230918
 ] 

Uwe Schindler commented on LUCENE-9600:
---

The problem is caused by this link: 
https://github.com/apache/lucene-solr/blob/32bf7bad4bb59a630942e72a7fe5d1d2cc47cb56/lucene/core/src/java/org/apache/lucene/index/PointValues.java#L51

It also affects Linux, as I just noticed.


[jira] [Comment Edited] (LUCENE-9600) Clean up package name conflicts for misc module

2020-11-12 Thread Uwe Schindler (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230918#comment-17230918
 ] 

Uwe Schindler edited comment on LUCENE-9600 at 11/12/20, 8:33 PM:
--

The problem is caused by this link: 
https://github.com/apache/lucene-solr/blob/32bf7bad4bb59a630942e72a7fe5d1d2cc47cb56/lucene/core/src/java/org/apache/lucene/index/PointValues.java#L51

As InetAddressPoint was moved to core, we can make a standard {{@link}} out of 
it and import the class. We should also check the link in the line following 
this one.

It also affects Linux, as I just noticed.




[jira] [Resolved] (LUCENE-9378) Configurable compression for BinaryDocValues

2020-11-12 Thread Adrien Grand (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-9378.
--
Fix Version/s: 8.8
   Resolution: Fixed


[jira] [Resolved] (LUCENE-9569) Temporarily disable sort optimization on _doc for 8.7 release.

2020-11-12 Thread Mayya Sharipova (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-9569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayya Sharipova resolved LUCENE-9569.
-
Fix Version/s: 8.7
   Resolution: Fixed

> Temporarily disable sort optimization on _doc for 8.7 release.
> --
>
> Key: LUCENE-9569
> URL: https://issues.apache.org/jira/browse/LUCENE-9569
> Project: Lucene - Core
>  Issue Type: Task
>Affects Versions: 8.7
>Reporter: Mayya Sharipova
>Priority: Minor
> Fix For: 8.7
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Sort optimization on _doc was introduced in LUCENE-9449, but it looks 
> unstable and has led to some recent test failures. 
> As the 8.7 release is very soon, we need to temporarily disable this sort 
> optimization on _doc for this release, with a plan to stabilize it for later 
> releases.




[jira] [Commented] (LUCENE-9569) Temporarily disable sort optimization on _doc for 8.7 release.

2020-11-12 Thread Mayya Sharipova (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230994#comment-17230994
 ] 

Mayya Sharipova commented on LUCENE-9569:
-

[~jpountz] Thank you for the reminder. I have reverted this commit, so sort 
optimization on _doc should again be enabled on branch_8x.


[jira] [Commented] (LUCENE-9569) Temporarily disable sort optimization on _doc for 8.7 release.

2020-11-12 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230997#comment-17230997
 ] 

ASF subversion and git services commented on LUCENE-9569:
-

Commit e4038142cc33cfcdf58fa2fdde9d8e66251ae4bf in lucene-solr's branch 
refs/heads/branch_8x from Mayya Sharipova
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=e403814 ]

Revert "LUCENE-9569 Disalbe sort opt on _doc (#1959)"

Re-enabled sort optimization on _doc

This reverts commit 1c0f07ac03f0235adaf5c150f1c6656336e4282f.



[jira] [Commented] (LUCENE-9450) Taxonomy index should use DocValues not StoredFields

2020-11-12 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231006#comment-17231006
 ] 

ASF subversion and git services commented on LUCENE-9450:
-

Commit 3f8f84f9b063277e9017221bfc5e80fb901fc1ce in lucene-solr's branch 
refs/heads/master from Gautam Worah
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=3f8f84f ]

LUCENE-9450 Switch to BinaryDocValues instead of stored fields in Lucene's 
facet implementation, yielding ~4-5% red-line QPS gain in pure faceting 
benchmarks (#1733)



> Taxonomy index should use DocValues not StoredFields
> 
>
> Key: LUCENE-9450
> URL: https://issues.apache.org/jira/browse/LUCENE-9450
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: 8.5.2
>Reporter: Gautam Worah
>Priority: Minor
>  Labels: performance
> Attachments: LUCENE-9450-localrun.py-v1, wip_taxonomy_patch
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> The taxonomy index that maps binning labels to ordinals was created before 
> Lucene added BinaryDocValues.
> I've attached a WIP patch (does not pass tests currently)
> Issue suggested by [~mikemccand]




[GitHub] [lucene-solr] mikemccand merged pull request #1733: LUCENE-9450 Use BinaryDocValues in the taxonomy writer

2020-11-12 Thread GitBox



mikemccand merged pull request #1733:
URL: https://github.com/apache/lucene-solr/pull/1733


   




[jira] [Commented] (LUCENE-9600) Clean up package name conflicts for misc module

2020-11-12 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231029#comment-17231029
 ] 

ASF subversion and git services commented on LUCENE-9600:
-

Commit af47cb7bcdd4eb10263a0586474c6e255307 in lucene-solr's branch 
refs/heads/master from Uwe Schindler
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=af47cb7 ]

LUCENE-9600: Fix wrong link


> Clean up package name conflicts for misc module
> ---
>
> Key: LUCENE-9600
> URL: https://issues.apache.org/jira/browse/LUCENE-9600
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/misc
>Affects Versions: master (9.0)
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Minor
> Fix For: master (9.0)
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> misc module shares the package names o.a.l.document, o.a.l.index, 
> o.a.l.search, o.a.l.store, and o.a.l.util with lucene-core. Those should be 
> moved under o.a.l.misc (or some classes should be moved to core?).




[jira] [Resolved] (LUCENE-9499) Clean up package name conflicts between modules (split packages)

2020-11-12 Thread Uwe Schindler (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-9499.
---
Resolution: Fixed

Fixed by LUCENE-9600.

> Clean up package name conflicts between modules (split packages)
> 
>
> Key: LUCENE-9499
> URL: https://issues.apache.org/jira/browse/LUCENE-9499
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: master (9.0)
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We have lots of package name conflicts (shared package names) between modules 
> in the source tree. It is not only annoying for devs/users but also indeed 
> bad practice since Java 9 (according to my understanding), and we already 
> have some problems with Javadocs due to these split packages as some of us 
> would know. Also split packages make migrating to the Java 9 module system 
> impossible.
> This is the placeholder to fix all package name conflicts in Lucene.
> See the dev list thread for more background. 
>  
> [https://lists.apache.org/thread.html/r6496963e89a5e0615e53206429b6843cc5d3e923a2045cc7b7a1eb03%40%3Cdev.lucene.apache.org%3E]
> Modules that need to be fixed / cleaned up:
>  - analyzers-common (LUCENE-9317)
>  - analyzers-icu (LUCENE-9558)
>  - backward-codecs (LUCENE-9318)
>  - sandbox (LUCENE-9319)
>  - misc (LUCENE-9600)
>  - (test-framework: this can be excluded for the moment)
> Also lucene-core will be heavily affected (some classes have to be moved into 
> {{core}}, or the visibility of some classes and methods in {{core}} has to be 
> relaxed).
> Probably most work would be done in a parallel manner, but conflicts can 
> happen. If someone wants to help out, please open an issue before working and 
> share your thoughts with me and others.
> I set "Fix version" to 9.0, which means that once we make a commit here, this will 
> be a blocker for release 9.0.0. (I don't think the changes should be 
> delivered across two major releases; all changes have to be out at once in a 
> major release.) If there are any objections or concerns, please leave 
> comments. For now I have no idea about the total volume of changes or 
> technical obstacles that have to be handled.




[jira] [Comment Edited] (LUCENE-9499) Clean up package name conflicts between modules (split packages)

2020-11-12 Thread Uwe Schindler (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231031#comment-17231031
 ] 

Uwe Schindler edited comment on LUCENE-9499 at 11/12/20, 11:29 PM:
---

Fixed by recent commit described in LUCENE-9600.


was (Author: thetaphi):
Fixed by LUCENE-9600.

> Clean up package name conflicts between modules (split packages)
> 
>
> Key: LUCENE-9499
> URL: https://issues.apache.org/jira/browse/LUCENE-9499
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: master (9.0)
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We have lots of package name conflicts (shared package names) between modules 
> in the source tree. It is not only annoying for devs/users but also indeed 
> bad practice since Java 9 (according to my understanding), and we already 
> have some problems with Javadocs due to these split packages as some of us 
> would know. Also split packages make migrating to the Java 9 module system 
> impossible.
> This is the placeholder to fix all package name conflicts in Lucene.
> See the dev list thread for more background. 
>  
> [https://lists.apache.org/thread.html/r6496963e89a5e0615e53206429b6843cc5d3e923a2045cc7b7a1eb03%40%3Cdev.lucene.apache.org%3E]
> Modules that need to be fixed / cleaned up:
>  - analyzers-common (LUCENE-9317)
>  - analyzers-icu (LUCENE-9558)
>  - backward-codecs (LUCENE-9318)
>  - sandbox (LUCENE-9319)
>  - misc (LUCENE-9600)
>  - (test-framework: this can be excluded for the moment)
> Also lucene-core will be heavily affected (some classes have to be moved into 
> {{core}}, or the visibility of some classes and methods in {{core}} has to be 
> relaxed).
> Probably most work would be done in a parallel manner, but conflicts can 
> happen. If someone wants to help out, please open an issue before working and 
> share your thoughts with me and others.
> I set "Fix version" to 9.0, which means that once we make a commit here, this will 
> be a blocker for release 9.0.0. (I don't think the changes should be 
> delivered across two major releases; all changes have to be out at once in a 
> major release.) If there are any objections or concerns, please leave 
> comments. For now I have no idea about the total volume of changes or 
> technical obstacles that have to be handled.




[jira] [Resolved] (LUCENE-9600) Clean up package name conflicts for misc module

2020-11-12 Thread Uwe Schindler (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-9600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-9600.
---
Resolution: Fixed

This should be fixed now.

> Clean up package name conflicts for misc module
> ---
>
> Key: LUCENE-9600
> URL: https://issues.apache.org/jira/browse/LUCENE-9600
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/misc
>Affects Versions: master (9.0)
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Minor
> Fix For: master (9.0)
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> misc module shares the package names o.a.l.document, o.a.l.index, 
> o.a.l.search, o.a.l.store, and o.a.l.util with lucene-core. Those should be 
> moved under o.a.l.misc (or some classes should be moved to core?).




[GitHub] [lucene-solr] cpoerschke commented on pull request #1571: SOLR-14560: Interleaving for Learning To Rank

2020-11-12 Thread GitBox



cpoerschke commented on pull request #1571:
URL: https://github.com/apache/lucene-solr/pull/1571#issuecomment-726414214


   Hi @alessandrobenedetti, I returned to this pull request today, both the 
changes above and the tests (which I hadn't looked at before). Very 
comprehensive test coverage, thank you. Have pushed all my remaining insights 
to the 
https://github.com/cpoerschke/lucene-solr/commits/feature/SOLR-14560-cpoerschke-2
 branch -- for the 
https://github.com/cpoerschke/lucene-solr/commit/4912daccd596435f5c61ac1a3cf86eaebb039118
 and 
https://github.com/cpoerschke/lucene-solr/commit/3a61287a0e4fb5a77f080a92e7129582b234cbd7
 commits which are perhaps a bit subtle I've added annotations on the pull 
request here -- the other commits are hopefully relatively self-explanatory. 
Let me know what you think, I agree the commit phase is fast approaching here :)




[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank

2020-11-12 Thread GitBox



cpoerschke commented on a change in pull request #1571:
URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r522486838



##
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java
##
@@ -208,55 +216,116 @@ public void setContext(ResultContext context) {
   if (threadManager != null) {
 
threadManager.setExecutor(context.getRequest().getCore().getCoreContainer().getUpdateShardHandler().getUpdateExecutor());
   }
-  
-  // Setup LTRScoringQuery
-  scoringQuery = SolrQueryRequestContextUtils.getScoringQuery(req);
-  docsWereNotReranked = (scoringQuery == null);
-  String featureStoreName = 
SolrQueryRequestContextUtils.getFvStoreName(req);
-  if (docsWereNotReranked || (featureStoreName != null && 
(!featureStoreName.equals(scoringQuery.getScoringModel().getFeatureStoreName()
 {
-// if store is set in the transformer we should overwrite the logger
 
-final ManagedFeatureStore fr = 
ManagedFeatureStore.getManagedFeatureStore(req.getCore());
+  LTRScoringQuery[] rerankingQueriesFromContext = 
SolrQueryRequestContextUtils.getScoringQueries(req);
+  docsWereNotReranked = (rerankingQueriesFromContext == null || 
rerankingQueriesFromContext.length == 0);
+  String transformerFeatureStore = 
SolrQueryRequestContextUtils.getFvStoreName(req);
+  Map transformerExternalFeatureInfo = 
LTRQParserPlugin.extractEFIParams(localparams);
 
-final FeatureStore store = fr.getFeatureStore(featureStoreName);
-featureStoreName = store.getName(); // if featureStoreName was null 
before this gets actual name
-
-try {
-  final LoggingModel lm = new LoggingModel(loggingModelName,
-  featureStoreName, store.getFeatures());
+  initLoggingModel(transformerFeatureStore);

Review comment:
   LTRFeatureLoggerTransformerFactory.2 - `loggingModel` being a member of 
the transformer factory gives it `SolrCore` lifetime/scope but here it's 
initialised based on per-request parameters. If multiple threads use the same 
transformer factory object concurrently then they might trample on each 
other. 
https://github.com/cpoerschke/lucene-solr/commit/4912daccd596435f5c61ac1a3cf86eaebb039118
 proposes to not have the logging model as a member of the transformer factory.
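
   To illustrate the scoping concern, here is a minimal sketch. All class 
names (`SharedStateFactory`, `PerRequestFactory`, `Transformer`) are 
hypothetical stand-ins and not the actual Solr classes; the only assumption 
is that one factory object serves many requests.

```java
import java.util.Objects;

/** Hypothetical stand-in for a factory that stores per-request state in a field. */
final class SharedStateFactory {
  // One field shared by every request that goes through this factory instance.
  private String loggingModelStore;

  void init(String featureStoreFromRequest) {
    this.loggingModelStore = featureStoreFromRequest;
  }

  String storeUsedForLogging() {
    return loggingModelStore;
  }
}

/** Hypothetical alternative: the factory hands out one object per request. */
final class PerRequestFactory {
  static final class Transformer {
    private final String loggingModelStore;

    Transformer(String loggingModelStore) {
      this.loggingModelStore = Objects.requireNonNull(loggingModelStore);
    }

    String storeUsedForLogging() {
      return loggingModelStore;
    }
  }

  Transformer create(String featureStoreFromRequest) {
    return new Transformer(featureStoreFromRequest);
  }
}
```

   If request A calls `init("storeA")` and request B calls `init("storeB")` 
on the same `SharedStateFactory` before A logs anything, A ends up logging 
against `storeB`. With `PerRequestFactory` each request keeps its own value.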

##
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java
##
@@ -79,6 +81,7 @@
   private char csvFeatureSeparator = 
CSVFeatureLogger.DEFAULT_FEATURE_SEPARATOR;
 
   private LTRThreadModule threadManager = null;
+  private LoggingModel loggingModel = null;

Review comment:
   LTRFeatureLoggerTransformerFactory.1 - `loggingModel` is a member of 
the transformer factory here.

##
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java
##
@@ -208,55 +216,116 @@ public void setContext(ResultContext context) {
   if (threadManager != null) {
 
threadManager.setExecutor(context.getRequest().getCore().getCoreContainer().getUpdateShardHandler().getUpdateExecutor());
   }

Review comment:
   LTRFeatureLoggerTransformerFactory.3 - I noted that `threadManager` here 
is an existing member of the transformer factory and it is initialised as part 
of request processing. Since there's no locking or anything there could be a 
chance that multiple threads concurrently call `threadManager.setExecutor()` 
but the argument to the set call is not specific to the request i.e. all 
requests would set the same thing (whereas for the logging model different 
requests could supply a different feature store name via the `fl=[feature 
store=...]` parameter).

##
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java
##
@@ -208,55 +216,116 @@ public void setContext(ResultContext context) {
   if (threadManager != null) {
 
threadManager.setExecutor(context.getRequest().getCore().getCoreContainer().getUpdateShardHandler().getUpdateExecutor());
   }
-  
-  // Setup LTRScoringQuery
-  scoringQuery = SolrQueryRequestContextUtils.getScoringQuery(req);
-  docsWereNotReranked = (scoringQuery == null);
-  String featureStoreName = 
SolrQueryRequestContextUtils.getFvStoreName(req);
-  if (docsWereNotReranked || (featureStoreName != null && 
(!featureStoreName.equals(scoringQuery.getScoringModel().getFeatureStoreName()
 {
-// if store is set in the transformer we should overwrite the logger
 
-final ManagedFeatureStore fr = 
ManagedFeatureStore.getManagedFeatureStore(req.getCore());
+  LTRScoringQuery[] rerankingQueriesFromContext = 
SolrQueryRequestContextUtils.getScoringQueries(req);
+  docsWereNotReranked = (rerankingQueriesFromContext == null || 
rerankingQueriesFromContext.le

[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank

2020-11-12 Thread GitBox



cpoerschke commented on a change in pull request #1571:
URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r522515743



##
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/interleaving/TeamDraftInterleaving.java
##
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.ltr.interleaving;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.Random;
+import java.util.Set;
+
+import org.apache.lucene.search.ScoreDoc;
+
+/**
+ * Interleaving was first introduced by Joachims in [1, 2].
+ * Team Draft Interleaving is among the most successful and widely used interleaving 
approaches [3].
+ * Here the authors implement a method similar to the way in which captains 
select their players in team matches.
+ * Team Draft Interleaving produces a fair distribution of ranking models’ 
elements in the final interleaved list.
+ * It has also been shown to overcome an issue of the previously implemented 
approach, Balanced Interleaving, in determining the winning model [4].
+ * 
+ * [1] T. Joachims. Optimizing search engines using clickthrough data. KDD 
(2002)
+ * [2] T. Joachims. Evaluating retrieval performance using clickthrough data. 
In J. Franke, G. Nakhaeizadeh, and I. Renz, editors,
+ * Text Mining, pages 79–96. Physica/Springer (2003)
+ * [3] F. Radlinski, M. Kurup, and T. Joachims. How does clickthrough data 
reflect retrieval quality?
+ * In CIKM, pages 43–52. ACM Press (2008)
+ * [4] O. Chapelle, T. Joachims, F. Radlinski, and Y. Yue.
+ * Large-scale validation and analysis of interleaved search evaluation. ACM 
TOIS, 30(1):1–41, Feb. (2012)
+ */
+public class TeamDraftInterleaving implements Interleaving{
+  public static Random RANDOM;
+
+  static {
+// We try to make things reproducible in the context of our tests by 
initializing the random instance
+// based on the current seed
+String seed = System.getProperty("tests.seed");
+if (seed == null) {
+  RANDOM = new Random();
+} else {
+  RANDOM = new Random(seed.hashCode());
+}
+  }
+
+  /**
+   * Team Draft Interleaving considers two ranking models: modelA and modelB.
+   * For a given query, each model returns its ranked list of documents La = 
(a1,a2,...) and Lb = (b1, b2, ...).
+   * The algorithm creates a unique ranked list I = (i1, i2, ...).
+   * This list is created by interleaving elements from the two lists la and 
lb as described by Chapelle et al.[1].
+   * Each element Ij is labelled TeamA if it is selected from La and TeamB if 
it is selected from Lb.
+   * 
+   * [1] O. Chapelle, T. Joachims, F. Radlinski, and Y. Yue.
+   * Large-scale validation and analysis of interleaved search evaluation. ACM 
TOIS, 30(1):1–41, Feb. (2012)
+   * 
+   * Assumptions:
+   * - rerankedA and rerankedB have the same length.
+   * They contain the same search results, ranked differently by two ranking 
models
+   * - each reranked list cannot contain the same search result more than 
once.
+   *
+   * @param rerankedA a ranked list of search results produced by a ranking 
model A
+   * @param rerankedB a ranked list of search results produced by a ranking 
model B
+   * @return the interleaved ranking list
+   */
+  public InterleavingResult interleave(ScoreDoc[] rerankedA, ScoreDoc[] 
rerankedB) {
+LinkedHashSet interleavedResults = new LinkedHashSet<>();
+ScoreDoc[] interleavedResultArray = new ScoreDoc[rerankedA.length];
+ArrayList> interleavingPicks = new ArrayList<>(2);
+Set teamA = new HashSet<>();
+Set teamB = new HashSet<>();
+int topN = rerankedA.length;
+int indexA = 0, indexB = 0;
+
+while (interleavedResults.size() < topN && indexA < rerankedA.length && 
indexB < rerankedB.length) {
+  if(teamA.size() interleaved, int index, 
ScoreDoc[] reranked) {
+boolean foundElementToAdd = false;
+while (index < reranked.length && !foundElementToAdd) {
+  ScoreDoc elementToCheck = reranked[index];
+  if (interleaved.contains(elementToCheck)) {

Review comment:
   > ... currently Interleaving doesn't support sharding ...
   
   Let's include that in the documentation somehow, e.g. 
https://github.com/cp
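
The Team Draft Interleaving procedure described in the Javadoc quoted above 
can be sketched as follows. This is a simplified illustration and not the code 
from the pull request: it works on plain `int` document ids instead of 
`ScoreDoc`, takes the `Random` as a parameter, and the names `TeamDraftSketch` 
and `Result` are made up for the example. It relies on the same assumptions as 
the Javadoc (both lists hold the same results, with no duplicates).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

final class TeamDraftSketch {
  static final class Result {
    final List<Integer> interleaved = new ArrayList<>();
    final Set<Integer> teamA = new HashSet<>();
    final Set<Integer> teamB = new HashSet<>();
  }

  static Result interleave(int[] rankedA, int[] rankedB, Random random) {
    Result result = new Result();
    Set<Integer> picked = new HashSet<>();
    int indexA = 0;
    int indexB = 0;
    while (picked.size() < rankedA.length) {
      // Skip documents that have already been drafted by either team.
      while (indexA < rankedA.length && picked.contains(rankedA[indexA])) {
        indexA++;
      }
      while (indexB < rankedB.length && picked.contains(rankedB[indexB])) {
        indexB++;
      }
      boolean aHasNext = indexA < rankedA.length;
      boolean bHasNext = indexB < rankedB.length;
      if (!aHasNext && !bHasNext) {
        break;
      }
      boolean pickA;
      if (!aHasNext || !bHasNext) {
        pickA = aHasNext; // only one list still has candidates
      } else if (result.teamA.size() != result.teamB.size()) {
        pickA = result.teamA.size() < result.teamB.size(); // smaller team drafts next
      } else {
        pickA = random.nextBoolean(); // teams are level: coin toss
      }
      int doc = pickA ? rankedA[indexA] : rankedB[indexB];
      picked.add(doc);
      result.interleaved.add(doc);
      (pickA ? result.teamA : result.teamB).add(doc);
    }
    return result;
  }
}
```

Passing the `Random` in, rather than reading a static field, is only a choice 
made here to keep the sketch easy to test with a fixed seed.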

[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank

2020-11-12 Thread GitBox



cpoerschke commented on a change in pull request #1571:
URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r522516056



##
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java
##
@@ -210,50 +216,59 @@ public void setContext(ResultContext context) {
   }
   
   // Setup LTRScoringQuery
-  scoringQuery = SolrQueryRequestContextUtils.getScoringQuery(req);
-  docsWereNotReranked = (scoringQuery == null);
-  String featureStoreName = 
SolrQueryRequestContextUtils.getFvStoreName(req);
-  if (docsWereNotReranked || (featureStoreName != null && 
(!featureStoreName.equals(scoringQuery.getScoringModel().getFeatureStoreName()
 {
-// if store is set in the transformer we should overwrite the logger
-
-final ManagedFeatureStore fr = 
ManagedFeatureStore.getManagedFeatureStore(req.getCore());
-
-final FeatureStore store = fr.getFeatureStore(featureStoreName);
-featureStoreName = store.getName(); // if featureStoreName was null 
before this gets actual name
-
-try {
-  final LoggingModel lm = new LoggingModel(loggingModelName,
-  featureStoreName, store.getFeatures());
-
-  scoringQuery = new LTRScoringQuery(lm,
-  LTRQParserPlugin.extractEFIParams(localparams),
-  true,
-  threadManager); // request feature weights to be created for all 
features
-
-}catch (final Exception e) {
-  throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
-  "retrieving the feature store "+featureStoreName, e);
-}
-  }
+  rerankingQueries = SolrQueryRequestContextUtils.getScoringQueries(req);
 
-  if (scoringQuery.getOriginalQuery() == null) {
-scoringQuery.setOriginalQuery(context.getQuery());
+  docsWereNotReranked = (rerankingQueries == null || 
rerankingQueries.length == 0);
+  if (docsWereNotReranked) {
+rerankingQueries = new LTRScoringQuery[]{null};
   }
-  if (scoringQuery.getFeatureLogger() == null){
-scoringQuery.setFeatureLogger( 
SolrQueryRequestContextUtils.getFeatureLogger(req) );
-  }
-  scoringQuery.setRequest(req);
-
-  featureLogger = scoringQuery.getFeatureLogger();
+  modelWeights = new LTRScoringQuery.ModelWeight[rerankingQueries.length];
+  String featureStoreName = 
SolrQueryRequestContextUtils.getFvStoreName(req);
+  for (int i = 0; i < rerankingQueries.length; i++) {
+LTRScoringQuery scoringQuery = rerankingQueries[i];
+if ((scoringQuery == null || !(scoringQuery instanceof 
OriginalRankingLTRScoringQuery)) && (docsWereNotReranked || (featureStoreName 
!= null && 
!featureStoreName.equals(scoringQuery.getScoringModel().getFeatureStoreName()
 {

Review comment:
   > ... I believe the code is much more readable now ... now that part is 
extremely clear.
   
   Yes, I agree, very nice.
   
   > ... another consideration sparkled: ... a separate Jira for that ...
   
   Interesting points, will need to think about them a bit (next week). I agree 
it's unrelated i.e. not a blocker for this pull request here.
   






[jira] [Commented] (SOLR-14928) Remove Overseer ClusterStateUpdater

2020-11-12 Thread Ilan Ginzburg (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-14928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231069#comment-17231069
 ] 

Ilan Ginzburg commented on SOLR-14928:
--

Just force-pushed a [new 
commit|https://github.com/murblanc/lucene-solr/commit/e77e0ec784f024438788c6b7eb5ca12785cac5d2]
 to the same branch. Nothing is tested yet, but the latest drop is somewhat more 
complete and hopefully I'll be able to debug and then run it for collection 
creation (the code supports the {{CreateCollectionCmd}} cluster state changes only, 
not the nodes advertising that the replicas are up) and get a feel for how it behaves.

> Remove Overseer ClusterStateUpdater
> ---
>
> Key: SOLR-14928
> URL: https://issues.apache.org/jira/browse/SOLR-14928
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Ilan Ginzburg
>Assignee: Ilan Ginzburg
>Priority: Major
>  Labels: cluster, collection-api, overseer
>
> Remove the Overseer {{ClusterStateUpdater}} thread and associated Zookeeper 
> queue at {{<_chroot_>/overseer/queue}}.
> Change cluster state updates so that each (Collection API) command execution 
> does the update directly in Zookeeper using optimistic locking (Compare and 
> Swap on the {{state.json}} Zookeeper files).
> Following this change cluster state updates would still be happening only 
> from the Overseer node (that's where Collection API commands are executing), 
> but the code will be ready for distribution once such commands can be 
> executed by any node (other work done in the context of parent task 
> SOLR-14927).
> See the [Cluster State 
> Updater|https://docs.google.com/document/d/1u4QHsIHuIxlglIW6hekYlXGNOP0HjLGVX5N6inkj6Ok/edit#heading=h.ymtfm3p518c]
>  section in the Removing Overseer doc.
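
The optimistic locking (compare and swap) update mentioned in the description 
can be illustrated with a small in-memory sketch. It does not use ZooKeeper: 
`VersionedNode` and `OptimisticUpdater` are hypothetical stand-ins, and 
`compareAndSet` plays the role that a versioned write to {{state.json}} would 
play in a real setup.

```java
import java.util.function.UnaryOperator;

/** In-memory stand-in for a versioned node; hypothetical, not a ZooKeeper client. */
final class VersionedNode {
  static final class Snapshot {
    final String data;
    final int version;

    Snapshot(String data, int version) {
      this.data = data;
      this.version = version;
    }
  }

  private String data;
  private int version;

  VersionedNode(String initialData) {
    this.data = initialData;
  }

  synchronized Snapshot read() {
    return new Snapshot(data, version);
  }

  /** Writes only if nobody else has written since expectedVersion was read. */
  synchronized boolean compareAndSet(int expectedVersion, String newData) {
    if (version != expectedVersion) {
      return false;
    }
    data = newData;
    version++;
    return true;
  }
}

final class OptimisticUpdater {
  /** Re-reads and retries until the write lands; returns the number of attempts. */
  static int update(VersionedNode node, UnaryOperator<String> change) {
    int attempts = 0;
    while (true) {
      attempts++;
      VersionedNode.Snapshot current = node.read();
      if (node.compareAndSet(current.version, change.apply(current.data))) {
        return attempts;
      }
    }
  }
}
```

A writer that lost the race simply re-reads the latest state and reapplies its 
change, so no central queue or dedicated updater thread is needed in this model.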




[GitHub] [lucene-solr] zacharymorn commented on pull request #2068: LUCENE-8982: Separate out native code to another module to allow cpp build with gradle

2020-11-12 Thread GitBox



zacharymorn commented on pull request #2068:
URL: https://github.com/apache/lucene-solr/pull/2068#issuecomment-726467539


   > > On the other hand, I think these tests will break if run from IDEs. Do 
we need to support that in this PR?
   > 
   > Oh, thanks! I'll take a look after I come back from work. I think the PR 
should be clean in that it doesn't break other people's workflow, so yes - if 
you added tests they should run or be quietly ignored if they're not supported. 
I'll take a look.
   
   Agreed. I just looked around as well, and it seems the @Nightly annotation might 
be a possible workaround here to disable running these tests from the IDE. Granted, 
it's a bit of a hack, as it also disables running those tests from a developer's 
local Gradle build by default, but at least these tests will get run in the nightly 
build and bugs can be caught (albeit a bit late).
   
   Another solution is perhaps to create a new annotation to disable tests 
running from the IDE, if such a thing can be detected?




[jira] [Commented] (LUCENE-9508) DocumentsWriter doesn't check for BlockedFlushes in stall mode``

2020-11-12 Thread Zach Chen (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231109#comment-17231109
 ] 

Zach Chen commented on LUCENE-9508:
---

Hi [~simonw], just to clarify, I wasn't the one who originally reported the 
issue. I just poked around in the code and ran some tests to see if I can help here.

> DocumentsWriter doesn't check for BlockedFlushes in stall mode``
> 
>
> Key: LUCENE-9508
> URL: https://issues.apache.org/jira/browse/LUCENE-9508
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 8.5.1
>Reporter: Sorabh Hamirwasia
>Priority: Major
>  Labels: IndexWriter
>
> Hi,
> I was investigating an issue where the memory usage by a single Lucene 
> IndexWriter went up to ~23GB. Lucene has a concept of stalling in case the 
> memory used by each index breaches the 2 X ramBuffer limit (10% of JVM heap, 
> this case ~3GB). So ideally memory usage should not go above that limit. I 
> looked into the heap dump and found that the fullFlush thread when enters 
> *markForFullFlush* method, it tries to take lock on the ThreadStates of all 
> the DWPT thread sequentially. If lock on one of the ThreadState is blocked 
> then it will block indefinitely. This is what happened in my case, where one 
> of the DWPT thread was stuck in indexing process. Due to this fullFlush 
> thread was unable to populate the flush queue even though the stall mode was 
> detected. This caused the new indexing request which came on indexing thread 
> to continue after sleeping for a second, and continue with indexing. In 
> **preUpdate()** method it looks for the stalled case and see if there is any 
> pending flushes (based on flush queue), if not then sleep and continue. 
> Question: 
> 1) Should **preUpdate** look into the blocked flushes information as well 
> instead of just flush queue ?
> 2) Should the fullFlush thread wait indefinitely for the lock on ThreadStates 
> ? Since single blocking writing thread can block the full flush here.
>  
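
As an illustration of the second question, here is a sketch of a sweep that 
waits only a bounded time for each lock instead of blocking indefinitely. It is 
not Lucene code: `BoundedLockSweep` is a hypothetical helper built on 
`java.util.concurrent.locks.ReentrantLock`, and the locks stand in for the 
per-thread states described in the report.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

final class BoundedLockSweep {
  /**
   * Visits each lock in order, waiting at most timeoutMillis for each one.
   * Returns the indexes of the locks that could not be acquired in time.
   */
  static List<Integer> sweep(List<ReentrantLock> locks, long timeoutMillis) {
    List<Integer> blocked = new ArrayList<>();
    for (int i = 0; i < locks.size(); i++) {
      ReentrantLock lock = locks.get(i);
      boolean acquired = false;
      try {
        acquired = lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
      } catch (InterruptedException e) {
        // Preserve the interrupt for the caller; this lock counts as blocked.
        Thread.currentThread().interrupt();
      }
      if (acquired) {
        try {
          // A real flusher would mark the guarded state for flushing here.
        } finally {
          lock.unlock();
        }
      } else {
        blocked.add(i);
      }
    }
    return blocked;
  }
}
```

The returned indexes could then be retried later or reported, so one stuck 
writer does not prevent the remaining states from being processed. Whether that 
trade-off is acceptable for a full flush is exactly the open question above.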




[jira] [Commented] (SOLR-14991) tag and remove obsolete branches

2020-11-12 Thread Noble Paul (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-14991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231113#comment-17231113
 ] 

Noble Paul commented on SOLR-14991:
---

done [~erickerickson]

> tag and remove obsolete branches
> 
>
> Key: SOLR-14991
> URL: https://issues.apache.org/jira/browse/SOLR-14991
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
>
> I'm going to gradually work through the branches, tagging and removing
> 1> anything with a Jira name that's fixed
> 2> anything that I'm certain will never be fixed (e.g. the various gradle 
> build branches)
> So the changes will still be available; they just won't pollute the branch list.
> I'll list the branches here, all the tags will be
> history/branches/lucene-solr/
>  
> This specifically will _not_ include
> 1> any release, e.g. branch_8_4
> 2> anything I'm unsure about. People who've created branches should expect 
> some pings about this.



