[jira] [Updated] (SOLR-14901) TestPackages uses binary precompiled classes to refer to analysis factory FQCNs

2020-09-28 Thread Tomoko Uchida (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomoko Uchida updated SOLR-14901:
-
Description: 
Base analysis factories' package names were renamed in [LUCENE-9317]. 
{{o.a.s.pkg.TestPackages}} is failing because it hard-codes their old FQCNs; 
that needs to be fixed.

See https://github.com/apache/lucene-solr/pull/1836 for details.

  was:
Base analysis factories' package names were renamed in [LUCENE-9317]. 
{{o.a.s.pkg.TestPackages}} is failing because it hard-codes their old FQDNs; 
that needs to be fixed.

See https://github.com/apache/lucene-solr/pull/1836 for details.


> TestPackages uses binary precompiled classes to refer to analysis factory 
> FQCNs
> ---
>
> Key: SOLR-14901
> URL: https://issues.apache.org/jira/browse/SOLR-14901
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (9.0)
>Reporter: Tomoko Uchida
>Assignee: Dawid Weiss
>Priority: Minor
>
> Base analysis factories' package names were renamed in [LUCENE-9317]. 
> {{o.a.s.pkg.TestPackages}} is failing because it hard-codes their old 
> FQCNs; that needs to be fixed.
> See https://github.com/apache/lucene-solr/pull/1836 for details.






[jira] [Updated] (SOLR-14901) TestPackages uses binary precompiled classes to refer to analysis factory FQCNs

2020-09-28 Thread Tomoko Uchida (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomoko Uchida updated SOLR-14901:
-
Summary: TestPackages uses binary precompiled classes to refer to analysis 
factory FQCNs  (was: TestPackages uses binary precompiled classes to refer to 
analysis factory FQDNs)

> TestPackages uses binary precompiled classes to refer to analysis factory 
> FQCNs
> ---
>
> Key: SOLR-14901
> URL: https://issues.apache.org/jira/browse/SOLR-14901
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (9.0)
>Reporter: Tomoko Uchida
>Assignee: Dawid Weiss
>Priority: Minor
>
> Base analysis factories' package names were renamed in [LUCENE-9317]. 
> {{o.a.s.pkg.TestPackages}} is failing because it hard-codes their old 
> FQDNs; that needs to be fixed.
> See https://github.com/apache/lucene-solr/pull/1836 for details.






[GitHub] [lucene-solr] danmuzi commented on pull request #1924: LUCENE-9544: Port Nori dictionary compilation

2020-09-28 Thread GitBox


danmuzi commented on pull request #1924:
URL: https://github.com/apache/lucene-solr/pull/1924#issuecomment-699838629


   Thanks for checking, @dweiss! 👍 
   Trivial, but there is no issue ID (LUCENE-9544) in the merged commit.
   Is that okay?






[jira] [Commented] (LUCENE-9544) Port Nori dictionary compilation

2020-09-28 Thread Namgyu Kim (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203070#comment-17203070
 ] 

Namgyu Kim commented on LUCENE-9544:


Of course it's okay!

Thanks for merging.

But I have a minor concern; could you check the PR comment?

> Port Nori dictionary compilation
> 
>
> Key: LUCENE-9544
> URL: https://issues.apache.org/jira/browse/LUCENE-9544
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Namgyu Kim
>Assignee: Namgyu Kim
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> There is no script for the Nori dictionary after the Ant build was deleted in 
> LUCENE-9433.
>  I made a patch by referring to LUCENE-9155. (Thanks [~dweiss] :D)






[jira] [Commented] (LUCENE-9317) Resolve package name conflicts for StandardAnalyzer to allow Java module system support

2020-09-28 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203073#comment-17203073
 ] 

ASF subversion and git services commented on LUCENE-9317:
-

Commit 5e617ccc33d91998a992a87ae258de43ef75242e in lucene-solr's branch 
refs/heads/master from Tomoko Uchida
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5e617cc ]

LUCENE-9317: Clean up split package in analyzers-common (#1836)



> Resolve package name conflicts for StandardAnalyzer to allow Java module 
> system support
> ---
>
> Key: LUCENE-9317
> URL: https://issues.apache.org/jira/browse/LUCENE-9317
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/other
>Affects Versions: master (9.0)
>Reporter: David Ryan
>Priority: Major
>  Labels: build, features
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
>  
> To allow Lucene to be modularised there are a few preparatory tasks to be 
> completed prior to this being possible.  The Java module system requires that 
> jars do not use the same package name in different jars.  The lucene-core and 
> lucene-analyzers-common both share the package 
> org.apache.lucene.analysis.standard.
> Possible resolutions to this issue are discussed by Uwe on the mailing list 
> here:
>  
> [http://mail-archives.apache.org/mod_mbox/lucene-dev/202004.mbox/%3CCAM21Rt8FHOq_JeUSELhsQJH0uN0eKBgduBQX4fQKxbs49TLqzA%40mail.gmail.com%3E]
> {quote}About StandardAnalyzer: Unfortunately I aggressively complained a 
> while back when Mike McCandless wanted to move standard analyzer out of the 
> analysis package into core (“for convenience”). This was a bad step, and IMHO 
> we should revert that or completely rename the packages and everything. The 
> problem here is: As the analysis services are only part of lucene-analyzers, 
> we had to leave the factory classes there, but move the implementation 
> classes in core. The package has to be the same. The only way around that is 
> to move the analysis factory framework also to core (I would not be against 
> that). This would include all factory base classes and the service loading 
> stuff. Then we can move standard analyzer and some of the filters/tokenizers 
> including their factories to core and that problem would be solved.
> {quote}
> There are two options here: either move the factory framework into core or revert 
> StandardAnalyzer back to lucene-analyzers.  In the email, the solution lands 
> on reverting back as per the task list:
> {quote}Add some preparatory issues to clean up the class hierarchy: Move Analysis 
> SPI to core / remove StandardAnalyzer and related classes out of core back to 
> analysis
> {quote}






[GitHub] [lucene-solr] mocobeta merged pull request #1836: LUCENE-9317: Clean up split package in analyzers-common

2020-09-28 Thread GitBox


mocobeta merged pull request #1836:
URL: https://github.com/apache/lucene-solr/pull/1836


   






[jira] [Assigned] (LUCENE-9317) Resolve package name conflicts for StandardAnalyzer to allow Java module system support

2020-09-28 Thread Tomoko Uchida (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomoko Uchida reassigned LUCENE-9317:
-

Assignee: Tomoko Uchida

> Resolve package name conflicts for StandardAnalyzer to allow Java module 
> system support
> ---
>
> Key: LUCENE-9317
> URL: https://issues.apache.org/jira/browse/LUCENE-9317
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/other
>Affects Versions: master (9.0)
>Reporter: David Ryan
>Assignee: Tomoko Uchida
>Priority: Major
>  Labels: build, features
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
>  
> To allow Lucene to be modularised there are a few preparatory tasks to be 
> completed prior to this being possible.  The Java module system requires that 
> jars do not use the same package name in different jars.  The lucene-core and 
> lucene-analyzers-common both share the package 
> org.apache.lucene.analysis.standard.
> Possible resolutions to this issue are discussed by Uwe on the mailing list 
> here:
>  
> [http://mail-archives.apache.org/mod_mbox/lucene-dev/202004.mbox/%3CCAM21Rt8FHOq_JeUSELhsQJH0uN0eKBgduBQX4fQKxbs49TLqzA%40mail.gmail.com%3E]
> {quote}About StandardAnalyzer: Unfortunately I aggressively complained a 
> while back when Mike McCandless wanted to move standard analyzer out of the 
> analysis package into core (“for convenience”). This was a bad step, and IMHO 
> we should revert that or completely rename the packages and everything. The 
> problem here is: As the analysis services are only part of lucene-analyzers, 
> we had to leave the factory classes there, but move the implementation 
> classes in core. The package has to be the same. The only way around that is 
> to move the analysis factory framework also to core (I would not be against 
> that). This would include all factory base classes and the service loading 
> stuff. Then we can move standard analyzer and some of the filters/tokenizers 
> including their factories to core and that problem would be solved.
> {quote}
> There are two options here: either move the factory framework into core or revert 
> StandardAnalyzer back to lucene-analyzers.  In the email, the solution lands 
> on reverting back as per the task list:
> {quote}Add some preparatory issues to clean up the class hierarchy: Move Analysis 
> SPI to core / remove StandardAnalyzer and related classes out of core back to 
> analysis
> {quote}






[GitHub] [lucene-solr] dweiss commented on pull request #1924: LUCENE-9544: Port Nori dictionary compilation

2020-09-28 Thread GitBox


dweiss commented on pull request #1924:
URL: https://github.com/apache/lucene-solr/pull/1924#issuecomment-699843163


   Let me revert that commit and then you can commit it yourself, ok? I rushed 
clicking the GitHub merge button...






[GitHub] [lucene-solr] mocobeta commented on pull request #1836: LUCENE-9317: Clean up split package in analyzers-common

2020-09-28 Thread GitBox


mocobeta commented on pull request #1836:
URL: https://github.com/apache/lucene-solr/pull/1836#issuecomment-699843865


   I will keep https://issues.apache.org/jira/browse/LUCENE-9317 open. 
   @uschindler, if you notice that any follow-ups are needed, please comment 
there.






[GitHub] [lucene-solr] dweiss commented on pull request #1924: LUCENE-9544: Port Nori dictionary compilation

2020-09-28 Thread GitBox


dweiss commented on pull request #1924:
URL: https://github.com/apache/lucene-solr/pull/1924#issuecomment-699843889


   Reverted. You can also add a CHANGES.txt entry while you're at it. Thank you, 
and sorry for the noise!






[GitHub] [lucene-solr] mocobeta edited a comment on pull request #1836: LUCENE-9317: Clean up split package in analyzers-common

2020-09-28 Thread GitBox


mocobeta edited a comment on pull request #1836:
URL: https://github.com/apache/lucene-solr/pull/1836#issuecomment-699843865


   I will keep https://issues.apache.org/jira/browse/LUCENE-9317 open. 
   @uschindler, if you notice that any follow-ups are needed, please comment 
there.






[GitHub] [lucene-solr] dweiss commented on pull request #1836: LUCENE-9317: Clean up split package in analyzers-common

2020-09-28 Thread GitBox


dweiss commented on pull request #1836:
URL: https://github.com/apache/lucene-solr/pull/1836#issuecomment-699844397


   I ran a full check, passes.






[GitHub] [lucene-solr] mocobeta commented on pull request #1836: LUCENE-9317: Clean up split package in analyzers-common

2020-09-28 Thread GitBox


mocobeta commented on pull request #1836:
URL: https://github.com/apache/lucene-solr/pull/1836#issuecomment-699844716


   > I ran a full check, passes.
   
   Thank you!






[GitHub] [lucene-solr] s1monw commented on pull request #1925: Cleanup DWPT state handling

2020-09-28 Thread GitBox


s1monw commented on pull request #1925:
URL: https://github.com/apache/lucene-solr/pull/1925#issuecomment-699863956


   @mikemccand can you take a look at this?






[GitHub] [lucene-solr] s1monw opened a new pull request #1925: Cleanup DWPT state handling

2020-09-28 Thread GitBox


s1monw opened a new pull request #1925:
URL: https://github.com/apache/lucene-solr/pull/1925


   DWPT currently has no real notion of a state, but its lifecycle really
   requires one. We move DWPTs from active to flush-pending to flushing
   and execute certain actions, like RAM accounting, based on these states.
   To simplify the transitions and the concurrency involved, it makes sense
   to formalize the transitions and whether each one can happen under lock or not.
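
   A minimal, self-contained sketch of the state machine this describes, for
   readers skimming the thread: the state names and the single-predecessor rule
   follow the enum in the diff quoted later in this thread, while the
   canTransitionFrom() body and the standalone class are illustrative
   assumptions rather than the actual patch.

{code:java}
// Sketch only: a DWPT-style lifecycle where each state is reachable
// solely from its declared predecessor.
public class DwptStateSketch {

  enum State {
    ACTIVE(null),            // initialized, ready to index documents
    FLUSH_PENDING(ACTIVE),   // may still index; should move to FLUSHING soon
    FLUSHING(FLUSH_PENDING), // no new documents; flushing or queued to flush
    FLUSHED(FLUSHING);       // flushed, ready to be garbage collected

    private final State previousState;

    State(State previousState) {
      this.previousState = previousState;
    }

    // Assumed semantics: a state can only be entered from its predecessor.
    boolean canTransitionFrom(State from) {
      return previousState == from;
    }
  }

  private State state = State.ACTIVE;

  synchronized void transitionTo(State next) {
    if (next.canTransitionFrom(this.state) == false) {
      throw new IllegalStateException("Can't transition from " + this.state + " to " + next);
    }
    this.state = next;
  }

  public static void main(String[] args) {
    DwptStateSketch dwpt = new DwptStateSketch();
    dwpt.transitionTo(State.FLUSH_PENDING); // ok
    dwpt.transitionTo(State.FLUSHING);      // ok
    dwpt.transitionTo(State.FLUSHED);       // ok
    dwpt.transitionTo(State.ACTIVE);        // throws IllegalStateException
  }
}
{code}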






[GitHub] [lucene-solr] s1monw commented on pull request #1925: Cleanup DWPT state handling

2020-09-28 Thread GitBox


s1monw commented on pull request #1925:
URL: https://github.com/apache/lucene-solr/pull/1925#issuecomment-699864240


   this is a follow-up from https://github.com/apache/lucene-solr/pull/1918






[GitHub] [lucene-solr] dweiss commented on a change in pull request #1925: Cleanup DWPT state handling

2020-09-28 Thread GitBox


dweiss commented on a change in pull request #1925:
URL: https://github.com/apache/lucene-solr/pull/1925#discussion_r495776845



##
File path: lucene/core/src/test/org/apache/lucene/index/TestIndexWriter.java
##
@@ -2933,7 +2933,7 @@ public void testFlushLargestWriter() throws IOException, InterruptedException {
     int numRamDocs = w.numRamDocs();
     int numDocsInDWPT = largestNonPendingWriter.getNumDocsInRAM();
     assertTrue(w.flushNextBuffer());
-    assertTrue(largestNonPendingWriter.hasFlushed());
+    assertEquals(DocumentsWriterPerThread.State.FLUSHED, largestNonPendingWriter.getState());

Review comment:
   Wouldn't this be more pleasant to read?
   assertTrue(largestNonPendingWriter.inState(DocumentsWriterPerThread.State.FLUSHED))?

##
File path: lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThread.java
##
@@ -593,9 +568,61 @@ void unlock() {
   }
 
   /**
-   * Returns true iff this DWPT has been flushed
+   * Returns the DWPT's current state.
    */
-  boolean hasFlushed() {
-    return hasFlushed.get() == Boolean.TRUE;
+  State getState() {
+    return state;
   }
+
+  /**
+   * Transitions the DWPT to the given state or fails if the transition is invalid.
+   * @throws IllegalStateException if the given state can not be transitioned to.
+   */
+  synchronized void transitionTo(State state) {
+    if (state.canTransitionFrom(this.state) == false) {
+      throw new IllegalStateException("Can't transition from " + this.state + " to " + state);
+    }
+    assert state.mustHoldLock == false || isHeldByCurrentThread() : "illegal state: " + state + " lock is held: " + isHeldByCurrentThread();
+    this.state = state;
+  }
+
+  /**
+   * Internal DWPT state.
+   */
+  enum State {
+    /**
+     * Default state when a DWPT is initialized and ready to index documents.
+     */
+    ACTIVE(null, true),
+    /**
+     * The DWPT can still index documents but should be moved to the FLUSHING state as soon as possible.
+     * Transitions to this state can happen concurrently while another thread is actively indexing into this DWPT.
+     */
+    FLUSH_PENDING(ACTIVE, false),
+    /**
+     * The DWPT should not receive any further documents and is currently flushing or queued up for flushing.
+     */
+    FLUSHING(FLUSH_PENDING, true),
+    /**
+     * The DWPT has been flushed and is ready to be garbage collected.
+     */
+    FLUSHED(FLUSHING, false);
+
+    private final State previousState;
+    final boolean mustHoldLock; // only for asserts
+
+    State(State previousState, boolean mustHoldLock) {

Review comment:
   Now you only need a formal correctness solver for this Petri net and 
you'll be good. 👍 








[GitHub] [lucene-solr] s1monw commented on a change in pull request #1925: Cleanup DWPT state handling

2020-09-28 Thread GitBox


s1monw commented on a change in pull request #1925:
URL: https://github.com/apache/lucene-solr/pull/1925#discussion_r495778992



##
File path: lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThread.java
##
@@ -593,9 +568,61 @@ void unlock() {
   }
 
   /**
-   * Returns true iff this DWPT has been flushed
+   * Returns the DWPT's current state.
    */
-  boolean hasFlushed() {
-    return hasFlushed.get() == Boolean.TRUE;
+  State getState() {
+    return state;
   }
+
+  /**
+   * Transitions the DWPT to the given state or fails if the transition is invalid.
+   * @throws IllegalStateException if the given state can not be transitioned to.
+   */
+  synchronized void transitionTo(State state) {
+    if (state.canTransitionFrom(this.state) == false) {
+      throw new IllegalStateException("Can't transition from " + this.state + " to " + state);
+    }
+    assert state.mustHoldLock == false || isHeldByCurrentThread() : "illegal state: " + state + " lock is held: " + isHeldByCurrentThread();
+    this.state = state;
+  }
+
+  /**
+   * Internal DWPT state.
+   */
+  enum State {
+    /**
+     * Default state when a DWPT is initialized and ready to index documents.
+     */
+    ACTIVE(null, true),
+    /**
+     * The DWPT can still index documents but should be moved to the FLUSHING state as soon as possible.
+     * Transitions to this state can happen concurrently while another thread is actively indexing into this DWPT.
+     */
+    FLUSH_PENDING(ACTIVE, false),
+    /**
+     * The DWPT should not receive any further documents and is currently flushing or queued up for flushing.
+     */
+    FLUSHING(FLUSH_PENDING, true),
+    /**
+     * The DWPT has been flushed and is ready to be garbage collected.
+     */
+    FLUSHED(FLUSHING, false);
+
+    private final State previousState;
+    final boolean mustHoldLock; // only for asserts
+
+    State(State previousState, boolean mustHoldLock) {

Review comment:
   I gotta solve the halting problem first, man. I am on it.








[GitHub] [lucene-solr] s1monw commented on a change in pull request #1925: Cleanup DWPT state handling

2020-09-28 Thread GitBox


s1monw commented on a change in pull request #1925:
URL: https://github.com/apache/lucene-solr/pull/1925#discussion_r495779250



##
File path: lucene/core/src/test/org/apache/lucene/index/TestIndexWriter.java
##
@@ -2933,7 +2933,7 @@ public void testFlushLargestWriter() throws IOException, InterruptedException {
     int numRamDocs = w.numRamDocs();
     int numDocsInDWPT = largestNonPendingWriter.getNumDocsInRAM();
     assertTrue(w.flushNextBuffer());
-    assertTrue(largestNonPendingWriter.hasFlushed());
+    assertEquals(DocumentsWriterPerThread.State.FLUSHED, largestNonPendingWriter.getState());

Review comment:
   I guess it would. I am not sure if we should really add more methods 
just for tests?








[GitHub] [lucene-solr] dweiss commented on a change in pull request #1925: Cleanup DWPT state handling

2020-09-28 Thread GitBox


dweiss commented on a change in pull request #1925:
URL: https://github.com/apache/lucene-solr/pull/1925#discussion_r495780529



##
File path: lucene/core/src/test/org/apache/lucene/index/TestIndexWriter.java
##
@@ -2933,7 +2933,7 @@ public void testFlushLargestWriter() throws IOException, InterruptedException {
     int numRamDocs = w.numRamDocs();
     int numDocsInDWPT = largestNonPendingWriter.getNumDocsInRAM();
     assertTrue(w.flushNextBuffer());
-    assertTrue(largestNonPendingWriter.hasFlushed());
+    assertEquals(DocumentsWriterPerThread.State.FLUSHED, largestNonPendingWriter.getState());

Review comment:
   ok, fine it is then.
   








[GitHub] [lucene-solr] dweiss commented on a change in pull request #1925: Cleanup DWPT state handling

2020-09-28 Thread GitBox


dweiss commented on a change in pull request #1925:
URL: https://github.com/apache/lucene-solr/pull/1925#discussion_r495782001



##
File path: lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThread.java
##
@@ -593,9 +568,61 @@ void unlock() {
   }
 
  /**
-   * Returns true iff this DWPT has been flushed
+   * Returns the DWPT's current state.
    */
-  boolean hasFlushed() {
-    return hasFlushed.get() == Boolean.TRUE;
+  State getState() {
+    return state;
   }
+
+  /**
+   * Transitions the DWPT to the given state or fails if the transition is invalid.
+   * @throws IllegalStateException if the given state can not be transitioned to.
+   */
+  synchronized void transitionTo(State state) {
+    if (state.canTransitionFrom(this.state) == false) {
+      throw new IllegalStateException("Can't transition from " + this.state + " to " + state);
+    }
+    assert state.mustHoldLock == false || isHeldByCurrentThread() : "illegal state: " + state + " lock is held: " + isHeldByCurrentThread();
+    this.state = state;
+  }
+
+  /**
+   * Internal DWPT state.
+   */
+  enum State {
+    /**
+     * Default state when a DWPT is initialized and ready to index documents.
+     */
+    ACTIVE(null, true),
+    /**
+     * The DWPT can still index documents but should be moved to the FLUSHING state as soon as possible.
+     * Transitions to this state can happen concurrently while another thread is actively indexing into this DWPT.
+     */
+    FLUSH_PENDING(ACTIVE, false),
+    /**
+     * The DWPT should not receive any further documents and is currently flushing or queued up for flushing.
+     */
+    FLUSHING(FLUSH_PENDING, true),
+    /**
+     * The DWPT has been flushed and is ready to be garbage collected.
+     */
+    FLUSHED(FLUSHING, false);
+
+    private final State previousState;
+    final boolean mustHoldLock; // only for asserts
+
+    State(State previousState, boolean mustHoldLock) {

Review comment:
   You have time until the end of this week [George Dantzig]








[jira] [Created] (LUCENE-9545) Remove Analyzer.get/setVersion()

2020-09-28 Thread Alan Woodward (Jira)
Alan Woodward created LUCENE-9545:
-

 Summary: Remove Analyzer.get/setVersion()
 Key: LUCENE-9545
 URL: https://issues.apache.org/jira/browse/LUCENE-9545
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Alan Woodward
Assignee: Alan Woodward


In days of yore, some Lucene Analyzers would change their behaviour depending 
on a version constant, so you could say 'use this analyzer in the way that it 
would have worked in Lucene 2.1'.  However, we have no Analyzers that make use 
of this in the 9x or 8x lines, and I think it's pretty confusing behaviour 
anyway.  We have factories to configure analyzers, and version-specific 
behaviour can reside there if we really need it.  We should just remove this 
functionality from Analyzer altogether.
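
As an illustration of that last point, here is a hypothetical factory that 
branches on the configured luceneMatchVersion instead of a mutable 
Analyzer-level version (package names as of 8.x; LegacyAwareFilterFactory 
itself is invented for this sketch and is not a real Lucene class):

{code:java}
import java.util.Map;

import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.util.TokenFilterFactory;
import org.apache.lucene.util.Version;

// Hypothetical example: version-specific behaviour configured through a
// factory rather than through Analyzer.setVersion().
public class LegacyAwareFilterFactory extends TokenFilterFactory {

  public LegacyAwareFilterFactory(Map<String, String> args) {
    super(args); // the base class parses luceneMatchVersion from the args
  }

  @Override
  public TokenStream create(TokenStream input) {
    // Branch on the immutable, configured match version instead of a
    // mutable per-Analyzer version field.
    if (getLuceneMatchVersion().onOrAfter(Version.LUCENE_8_0_0)) {
      return new LowerCaseFilter(input);
    }
    return input; // pretend the pre-8.0 behaviour left tokens untouched
  }
}
{code}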






[GitHub] [lucene-solr] atris merged pull request #1906: SOLR-13528: Implement API Based Config For Rate Limiters

2020-09-28 Thread GitBox


atris merged pull request #1906:
URL: https://github.com/apache/lucene-solr/pull/1906


   






[jira] [Commented] (SOLR-13528) Rate limiting in Solr

2020-09-28 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203118#comment-17203118
 ] 

ASF subversion and git services commented on SOLR-13528:


Commit 4105414c90c94a3f426ce28893a744d3a800dbf4 in lucene-solr's branch 
refs/heads/master from Atri Sharma
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4105414 ]

SOLR-13528: Implement API Based Config For Rate Limiters (#1906)

This commit moves Rate Limiter configurations from web.xml to a new command-based 
approach using clusterprops.json

> Rate limiting in Solr
> -
>
> Key: SOLR-13528
> URL: https://issues.apache.org/jira/browse/SOLR-13528
> Project: Solr
>  Issue Type: New Feature
>Reporter: Anshum Gupta
>Assignee: Atri Sharma
>Priority: Major
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>
> In relation to SOLR-13527, Solr also needs a way to throttle update and 
> search requests based on usage metrics. This is the umbrella JIRA for both 
> update and search rate limiting.






[GitHub] [lucene-solr] danmuzi opened a new pull request #1926: LUCENE-9544: Port Nori dictionary compilation

2020-09-28 Thread GitBox


danmuzi opened a new pull request #1926:
URL: https://github.com/apache/lucene-solr/pull/1926


   It's almost the same as #1924 (except for CHANGES.txt).






[GitHub] [lucene-solr] danmuzi commented on pull request #1924: LUCENE-9544: Port Nori dictionary compilation

2020-09-28 Thread GitBox


danmuzi commented on pull request #1924:
URL: https://github.com/apache/lucene-solr/pull/1924#issuecomment-699900793


   Thanks for reverting, @dweiss!
   I reopened the pull request (#1926) with the issue ID and a CHANGES.txt entry.






[GitHub] [lucene-solr] danmuzi merged pull request #1926: LUCENE-9544: Port Nori dictionary compilation

2020-09-28 Thread GitBox


danmuzi merged pull request #1926:
URL: https://github.com/apache/lucene-solr/pull/1926


   






[jira] [Commented] (LUCENE-9544) Port Nori dictionary compilation

2020-09-28 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203165#comment-17203165
 ] 

ASF subversion and git services commented on LUCENE-9544:
-

Commit 00d7f5ea68d8eaec618e4019714fda02060539a6 in lucene-solr's branch 
refs/heads/master from Namgyu Kim
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=00d7f5e ]

LUCENE-9544: Port Nori dictionary compilation (#1926)



> Port Nori dictionary compilation
> 
>
> Key: LUCENE-9544
> URL: https://issues.apache.org/jira/browse/LUCENE-9544
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Namgyu Kim
>Assignee: Namgyu Kim
>Priority: Minor
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> There is no script for the Nori dictionary after the Ant build was deleted in 
> LUCENE-9433.
>  I made a patch by referring to LUCENE-9155. (Thanks [~dweiss] :D)






[GitHub] [lucene-solr] danmuzi commented on pull request #1926: LUCENE-9544: Port Nori dictionary compilation

2020-09-28 Thread GitBox


danmuzi commented on pull request #1926:
URL: https://github.com/apache/lucene-solr/pull/1926#issuecomment-699949157


   Thanks, @dweiss ! 👍 






[jira] [Commented] (SOLR-14397) Vector Search in Solr

2020-09-28 Thread RUEI-SING CHOU (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203172#comment-17203172
 ] 

RUEI-SING CHOU commented on SOLR-14397:
---

I would like to propose a new approach:

1. Leverage [SuperBit|https://github.com/tdebatty/java-LSH] hash to 
de-dimension the vector from a double array to a boolean array
 * It will keep the similarity. [According to 
this.|https://github.com/tdebatty/java-LSH/blob/master/src/main/java/info/debatty/java/lsh/examples/SuperBitExample.java#L58]
 * We should leverage the DocValues

2. According to the [new similarity scoring 
function|https://github.com/tdebatty/java-LSH/blob/master/src/main/java/info/debatty/java/lsh/SuperBit.java#L197],
 the original implementation is for comparison with cosine similarity. So if we 
want to leverage this for new scoring and keep the documents in order, we can 
simplify the algorithm:
 # De-dimension the vectors to a boolean array by SuperBit. (Stored as a binary 
array in DocValues)
 # Count the same bit in each digit as the score.

Simplify:
 # Use [BitSet|https://docs.oracle.com/javase/8/docs/api/java/util/BitSet.html] 
in Java; convert both binary array vectors into BitSets.
 ## Use bitwise "BitSet.and()" operation, then the digits with the same bit 
will be true.
 ## Use 
"BitSet.[cardinality|https://docs.oracle.com/javase/8/docs/api/java/util/BitSet.html#cardinality--]()"
 to get the count of true
 ## Done.
 # The BitSet is memory efficient, [see this 
discussion|https://stackoverflow.com/questions/605226/boolean-vs-bitset-which-is-more-efficient].

3. Additional:
 # Filter similarity score > .8 (in this case, cardinality > k) to keep 
recall and precision at a good level. If SuperBit preserves the similarity 
well, recall and precision will be excellent.

Note: *this approach will compute the score with all documents.* Since the 
calculation cost is low, and memory consumption is low, we can leverage the 
field attribute _docValuesFormat="Memory"_. Furthermore, the DocValues should 
support the BinaryDocValues.

ref: A Revisit of Hashing Algorithms for Approximate
Nearest Neighbor Search https://arxiv.org/pdf/1612.07545.pdf
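
A minimal sketch of the AND + cardinality scoring recipe above, assuming the 
SuperBit signatures have already been computed as boolean arrays (for example 
by the java-LSH library linked above); the class and method names are 
illustrative only:

{code:java}
import java.util.BitSet;

public class SuperBitScoreSketch {

  // Pack a SuperBit signature (boolean[]) into a BitSet.
  static BitSet toBitSet(boolean[] signature) {
    BitSet bits = new BitSet(signature.length);
    for (int i = 0; i < signature.length; i++) {
      if (signature[i]) {
        bits.set(i);
      }
    }
    return bits;
  }

  // AND + cardinality as proposed: counts positions where both signatures
  // have a set bit. Note that positions where both bits are 0 do not
  // contribute to this score.
  static int score(BitSet query, BitSet doc) {
    BitSet and = (BitSet) query.clone(); // BitSet.and() mutates, so copy first
    and.and(doc);
    return and.cardinality();
  }

  public static void main(String[] args) {
    boolean[] q = {true, false, true, true, false};
    boolean[] d = {true, false, true, false, true};
    System.out.println(score(toBitSet(q), toBitSet(d))); // prints 2
  }
}
{code}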



> Vector Search in Solr
> -
>
> Key: SOLR-14397
> URL: https://issues.apache.org/jira/browse/SOLR-14397
> Project: Solr
>  Issue Type: Improvement
>Reporter: Trey Grainger
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Search engines have traditionally relied upon token-based matching (typically 
> keywords) on an inverted index, plus relevance ranking based upon keyword 
> occurrence statistics. This can be viewed as a "sparse vector” match (where 
> each term is a one-hot encoded dimension in the vector), since only a few 
> keywords out of all possible keywords are considered in each query. With the 
> introduction of deep-learning-based transformers over the last few years, 
> however, the state of the art in relevance has moved to ranking models based 
> upon dense vectors that encode a latent, semantic understanding of both 
> language constructs and the underlying domain upon which the model was 
> trained. These dense vectors are also referred to as “embeddings”. An example 
> of this kind of embedding would be taking the phrase “chief executive officer 
> of the tech company” and converting it to [0.03, 1.7, 9.12, 0, 0.3]
>  . Other similar phrases should encode to vectors with very similar numbers, 
> so we may expect a query like “CEO of a technology org” to generate a vector 
> like [0.1, 1.9, 8.9, 0.1, 0.4]. When performing a cosine similarity 
> calculation between these vectors, we would expect a number closer to 1.0, 
> whereas a very unrelated text blurb would generate a much smaller cosine 
> similarity.
> This is a proposal for how we should implement these vector search 
> capabilities in Solr.
> h1. Search Process Overview:
> In order to implement dense vector search, the following process is typically 
> followed:
> h2. Offline:
> An encoder is built. An encoder can take in text (a query, a sentence, a 
> paragraph, a document, etc.) and return a dense vector representing that 
> document in a rich semantic space. The semantic space is learned from 
> training on textual data (usually, though other sources work, too), typically 
> from the domain of the search engine.
> h2. Document Ingestion:
> When documents are processed, they are passed to the encoder, and the dense 
> vector(s) returned are stored as fields on the document. There could be one 
> or more vectors per-document, as the granularity of the vectors could be 
> per-document, per field, per paragraph, per-sentence, or even per phrase or 
> per term.
> h2. Query Time:
> *Encoding:* The query is translated to a dense vector by passing it to the 
> encoder
>  Quantization: The q
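
A quick check of the cosine-similarity claim in the description, using the two 
example embeddings given above (illustrative only):

{code:java}
public class CosineExample {

  static double cosine(double[] a, double[] b) {
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
  }

  public static void main(String[] args) {
    double[] ceoOfTechCompany   = {0.03, 1.7, 9.12, 0.0, 0.3};
    double[] ceoOfTechnologyOrg = {0.1, 1.9, 8.9, 0.1, 0.4};
    // Similar phrases -> cosine close to 1.0 (about 0.999 here).
    System.out.println(cosine(ceoOfTechCompany, ceoOfTechnologyOrg));
  }
}
{code}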

[jira] [Comment Edited] (SOLR-14397) Vector Search in Solr

2020-09-28 Thread Steven Chou (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203172#comment-17203172
 ] 

Steven Chou edited comment on SOLR-14397 at 9/28/20, 11:50 AM:
---

I would like to propose a new approach:

1. Leverage [SuperBit|https://github.com/tdebatty/java-LSH] hash to 
de-dimension the vector from a double array to a boolean array
 * It will keep the similarity. [According to 
this.|https://github.com/tdebatty/java-LSH/blob/master/src/main/java/info/debatty/java/lsh/examples/SuperBitExample.java#L58]
 * We should leverage the DocValues

2. According to the [new similarity scoring 
function|https://github.com/tdebatty/java-LSH/blob/master/src/main/java/info/debatty/java/lsh/SuperBit.java#L197],
 the original implementation is for comparison with cosine similarity. So if we 
want to leverage this for new scoring and keep the documents in order, we can 
simplify the algorithm:
 # De-dimension the vectors to a boolean array by SuperBit. (Stored as a binary 
array in DocValues)
 # Count the identical bit in each digit as the score.

Simplify:
 # Use [BitSet|https://docs.oracle.com/javase/8/docs/api/java/util/BitSet.html] 
in Java; convert both binary array vectors into BitSets.
 ## Use bitwise "BitSet.and()" operation, then the digits with the same bit 
will be true.
 ## Use 
"BitSet.[cardinality|https://docs.oracle.com/javase/8/docs/api/java/util/BitSet.html#cardinality--]()"
 to get the count of true
 ## Done.
 # The BitSet is memory efficient, [see this 
discussion|https://stackoverflow.com/questions/605226/boolean-vs-bitset-which-is-more-efficient].

3. Additional:
 # Filter similarity score > .8 (in this case, cardinality > k) to keep 
recall and precision at a good level. If SuperBit preserves the similarity 
well, recall and precision will be excellent.

Note: *this approach will compute the score with all documents.* Since the 
calculation cost is low, and memory consumption is low, we can leverage the 
field attribute _docValuesFormat="Memory"_. Furthermore, the DocValues should 
support the BinaryDocValues.

ref: A Revisit of Hashing Algorithms for Approximate
Nearest Neighbor Search https://arxiv.org/pdf/1612.07545.pdf




was (Author: sing10407):
I would like to propose a new approach:

1. Leverage [SuperBit|https://github.com/tdebatty/java-LSH] hash to 
de-dimension the vector from a double array to a boolean array
 * It will keep the similarity. [According to 
this.|https://github.com/tdebatty/java-LSH/blob/master/src/main/java/info/debatty/java/lsh/examples/SuperBitExample.java#L58]
 * We should leverage the DocValues

2. According to the [new similarity scoring 
function|https://github.com/tdebatty/java-LSH/blob/master/src/main/java/info/debatty/java/lsh/SuperBit.java#L197],
 the original implementation is for comparison with cosine similarity. So if we 
want to leverage this for new scoring and keep the documents in order, we can 
simplify the algorithm:
 # De-dimension the vectors to a boolean array by SuperBit. (Stored as a binary 
array in DocValues)
 # Count the same bit in each digit as the score.

Simplify:
 # Use [BitSet|https://docs.oracle.com/javase/8/docs/api/java/util/BitSet.html] 
in Java; convert both binary array vectors into BitSets.
 ## Use bitwise "BitSet.and()" operation, then the digits with the same bit 
will be true.
 ## Use 
"BitSet.[cardinality|https://docs.oracle.com/javase/8/docs/api/java/util/BitSet.html#cardinality--]()"
 to get the count of true
 ## Done.
 # The BitSet is memory efficient, [see this 
discussion|https://stackoverflow.com/questions/605226/boolean-vs-bitset-which-is-more-efficient].

3. Additional:
 # Filter similarity score > .8 (in this case, cardinality > k) to keep 
recall and precision at a good level. If SuperBit preserves the similarity 
well, recall and precision will be excellent.

Note: *this approach will compute the score with all documents.* Since the 
calculation cost is low, and memory consumption is low, we can leverage the 
field attribute _docValuesFormat="Memory"_. Furthermore, the DocValues should 
support the BinaryDocValues.

ref: A Revisit of Hashing Algorithms for Approximate
Nearest Neighbor Search https://arxiv.org/pdf/1612.07545.pdf



> Vector Search in Solr
> -
>
> Key: SOLR-14397
> URL: https://issues.apache.org/jira/browse/SOLR-14397
> Project: Solr
>  Issue Type: Improvement
>Reporter: Trey Grainger
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Search engines have traditionally relied upon token-based matching (typically 
> keywords) on an inverted index, plus relevance ranking based upon keyword 
> occurrence statistics. This can be viewed as a "sparse vector” match (where 
> each term is a one-hot encoded dimension in the vector), sin

[jira] [Comment Edited] (SOLR-14397) Vector Search in Solr

2020-09-28 Thread Steven Chou (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203172#comment-17203172
 ] 

Steven Chou edited comment on SOLR-14397 at 9/28/20, 11:53 AM:
---

I would like to propose a new approach:

1. Leverage [SuperBit|https://github.com/tdebatty/java-LSH] hash to 
de-dimension the vector from a double array to a boolean array
 * It will keep the similarity. [According to 
this.|https://github.com/tdebatty/java-LSH/blob/master/src/main/java/info/debatty/java/lsh/examples/SuperBitExample.java#L58]
 * We should leverage the DocValues

2. According to the [new similarity scoring 
function|https://github.com/tdebatty/java-LSH/blob/master/src/main/java/info/debatty/java/lsh/SuperBit.java#L197],
 the original implementation is for comparison with cosine similarity. So if we 
want to leverage this for new scoring and keep the documents in order, we can 
simplify the algorithm:
 # De-dimension the vectors to a boolean array by SuperBit. (Stored as a binary 
array in DocValues)
 # Count the identical bit in each digit as the score.

Simplify:
 # Use [BitSet|https://docs.oracle.com/javase/8/docs/api/java/util/BitSet.html] 
in Java; convert both binary array vectors into BitSets.
 ## Use bitwise "BitSet.and()" operation, then the digits with the same bit 
will be true.
 ## Use 
"BitSet.[cardinality|https://docs.oracle.com/javase/8/docs/api/java/util/BitSet.html#cardinality--]()"
 to get the count of true
 ## Done.
 # The BitSet is memory efficient, [see this 
discussion|https://stackoverflow.com/questions/605226/boolean-vs-bitset-which-is-more-efficient].

3. Additional:
 # Filter similarity score > .8 (in this case, cardinality > k) to keep 
recall and precision at a good level. If SuperBit preserves the similarity 
well, recall and precision will be excellent.

Note: *this approach will compute the score with all documents.* Since the 
calculation cost is low, and memory consumption is low, we can leverage the 
field attribute _docValuesFormat="Memory"_. Furthermore, the DocValues should 
support the BinaryDocValues and all vectors should be hashed by SuperBit and 
converted into binary format in order to store as BinaryDocValues.

ref: A Revisit of Hashing Algorithms for Approximate
Nearest Neighbor Search https://arxiv.org/pdf/1612.07545.pdf




was (Author: sing10407):
I would like to propose a new approach:

1. Leverage [SuperBit|https://github.com/tdebatty/java-LSH] hash to 
de-dimension the vector from a double array to a boolean array
 * It will keep the similarity. [According to 
this.|https://github.com/tdebatty/java-LSH/blob/master/src/main/java/info/debatty/java/lsh/examples/SuperBitExample.java#L58]
 * We should leverage the DocValues

2. According to the [new similarity scoring 
function|https://github.com/tdebatty/java-LSH/blob/master/src/main/java/info/debatty/java/lsh/SuperBit.java#L197],
 the original implementation is for comparison with cosine similarity. So if we 
want to leverage this for new scoring and keep the documents in order, we can 
simplify the algorithm:
 # De-dimension the vectors to a boolean array by SuperBit. (Stored as a binary 
array in DocValues)
 # Count the identical bit in each digit as the score.

Simplify:
 # Use [BitSet|https://docs.oracle.com/javase/8/docs/api/java/util/BitSet.html] 
in Java; convert both binary array vectors into BitSets.
 ## Use bitwise "BitSet.and()" operation, then the digits with the same bit 
will be true.
 ## Use 
"BitSet.[cardinality|https://docs.oracle.com/javase/8/docs/api/java/util/BitSet.html#cardinality--]()"
 to get the count of true
 ## Done.
 # The BitSet is memory efficient, [see this 
discussion|https://stackoverflow.com/questions/605226/boolean-vs-bitset-which-is-more-efficient].

3. Additional:
 # Filter similarity score > .8 (in this case, cardinality > k) to keep 
recall and precision at a good level. If SuperBit preserves the similarity 
well, recall and precision will be excellent.

Note: *this approach will compute the score with all documents.* Since the 
calculation cost is low, and memory consumption is low, we can leverage the 
field attribute _docValuesFormat="Memory"_. Furthermore, the DocValues should 
support the BinaryDocValues.

ref: A Revisit of Hashing Algorithms for Approximate
Nearest Neighbor Search https://arxiv.org/pdf/1612.07545.pdf



> Vector Search in Solr
> -
>
> Key: SOLR-14397
> URL: https://issues.apache.org/jira/browse/SOLR-14397
> Project: Solr
>  Issue Type: Improvement
>Reporter: Trey Grainger
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Search engines have traditionally relied upon token-based matching (typically 
> keywords) on an inverted index, plus relevance ranking based upon keyword 
> occurrence statist

[GitHub] [lucene-solr] ctargett commented on a change in pull request #1923: SOLR-14900: Reference Guide build cleanup/module upgrade

2020-09-28 Thread GitBox


ctargett commented on a change in pull request #1923:
URL: https://github.com/apache/lucene-solr/pull/1923#discussion_r495883260



##
File path: solr/solr-ref-guide/build.gradle
##
@@ -90,14 +90,14 @@ dependencies {
 depVer('org.apache.zookeeper:zookeeper')
 
 // jekyll dependencies
-gems 'rubygems:jekyll:3.5.2'
+gems 'rubygems:jekyll:4.1.1'

Review comment:
   There are significant changes between Jekyll 3 and 4, and we haven't 
upgraded to it yet because it needs some careful analysis and fixing. In the 
past, attempts to build with Jekyll 4 have failed. I don't see evidence in this 
PR that you have done sufficient testing for me to feel confident that this is 
OK.








[GitHub] [lucene-solr] ctargett commented on a change in pull request #1923: SOLR-14900: Reference Guide build cleanup/module upgrade

2020-09-28 Thread GitBox


ctargett commented on a change in pull request #1923:
URL: https://github.com/apache/lucene-solr/pull/1923#discussion_r495883947



##
File path: solr/solr-ref-guide/src/_config.yml.template
##
@@ -91,6 +91,11 @@ asciidoctor:
 <<: *solr-attributes-ref
 attribute-missing: "warn"
 icons: "font"
-source-highlighter: "rouge"
-rouge-theme: "thankful-eyes"
 stem:
+source-highlighter: "rouge"
+# Options from: https://github.com/jirutka/asciidoctor-rouge
+rouge-css: "class"
+# the class option requires a css file generated by build/.gems/gems/rouge-vx.y.z/bin/rougify style style_name
+# other option: style for inline styles, which is larger in size
+# rouge-css: "style"
+# rouge-style: "github"

Review comment:
   I'm not sure I understand the point of the commented sections here?








[GitHub] [lucene-solr] ctargett commented on a change in pull request #1923: SOLR-14900: Reference Guide build cleanup/module upgrade

2020-09-28 Thread GitBox


ctargett commented on a change in pull request #1923:
URL: https://github.com/apache/lucene-solr/pull/1923#discussion_r495884475



##
File path: solr/solr-ref-guide/src/_includes/head.html
##
@@ -11,6 +11,7 @@
 
 
 
+

[GitHub] [lucene-solr] ctargett commented on pull request #1923: SOLR-14900: Reference Guide build cleanup/module upgrade

2020-09-28 Thread GitBox


ctargett commented on pull request #1923:
URL: https://github.com/apache/lucene-solr/pull/1923#issuecomment-699964182


   The Jekyll 3 to 4 upgrade is a major one and has not been done to date 
because it has caused failures in the past when attempted. I don't understand 
from this so far what testing has been done to ensure it is 100% working 
properly: was every single page compared visually and with 'diff'? And were 
they **exactly** the same?






[jira] [Commented] (SOLR-14850) ExactStatsCache NullPointerException when shards.tolerant=true

2020-09-28 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203177#comment-17203177
 ] 

ASF subversion and git services commented on SOLR-14850:


Commit 32041c8d9b98e2839ba9d29a0940feccb4f75dd4 in lucene-solr's branch 
refs/heads/master from Andrzej Bialecki
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=32041c8 ]

SOLR-14850: Fix ExactStatsCache NullPointerException when shards.tolerant=true.


> ExactStatsCache NullPointerException when shards.tolerant=true
> --
>
> Key: SOLR-14850
> URL: https://issues.apache.org/jira/browse/SOLR-14850
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: search
>Affects Versions: 8.6.2
>Reporter: Eugene Tenkaev
>Assignee: Andrzej Bialecki
>Priority: Critical
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> All classes derived from *ExactStatsCache* fail if *shards.tolerant* is set to 
> *true* and some shard is down.
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.solr.client.solrj.SolrResponse.getException(SolrResponse.java:59)
>   at 
> org.apache.solr.search.stats.ExactStatsCache.doMergeToGlobalStats(ExactStatsCache.java:104)
>   at 
> org.apache.solr.search.stats.StatsCache.mergeToGlobalStats(StatsCache.java:173)
>   at 
> org.apache.solr.handler.component.QueryComponent.updateStats(QueryComponent.java:713)
>   at 
> org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:630)
>   at 
> org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:605)
>   at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:457)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:214)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2606)
>   at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:812)
>   at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:588)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:415)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>   at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1610)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1300)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1580)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1215)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>   at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)
>   at 
> org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)
>   at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
>   at 
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
>   at org.eclipse.jetty.server.Server.handle(Server.java:500)
>   at 
> org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)
>   at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)
>   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)
>   at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:273)
>   at 
> org.eclipse.jetty.io.AbstractConnection$Rea

[GitHub] [lucene-solr] arafalov commented on a change in pull request #1923: SOLR-14900: Reference Guide build cleanup/module upgrade

2020-09-28 Thread GitBox


arafalov commented on a change in pull request #1923:
URL: https://github.com/apache/lucene-solr/pull/1923#discussion_r495890083



##
File path: solr/solr-ref-guide/build.gradle
##
@@ -90,14 +90,14 @@ dependencies {
 depVer('org.apache.zookeeper:zookeeper')
 
 // jekyll dependencies
-gems 'rubygems:jekyll:3.5.2'
+gems 'rubygems:jekyll:4.1.1'

Review comment:
   Do you remember any specific examples I could double-check? I saw 
some past discussions, but they were about ant and PDF issues. I can double-check. 
However, from the analysis I did, only source blocks were actually affected. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] arafalov commented on a change in pull request #1923: SOLR-14900: Reference Guide build cleanup/module upgrade

2020-09-28 Thread GitBox


arafalov commented on a change in pull request #1923:
URL: https://github.com/apache/lucene-solr/pull/1923#discussion_r495891765



##
File path: solr/solr-ref-guide/src/_config.yml.template
##
@@ -91,6 +91,11 @@ asciidoctor:
 <<: *solr-attributes-ref
 attribute-missing: "warn"
 icons: "font"
-source-highlighter: "rouge"
-rouge-theme: "thankful-eyes"
 stem:
+source-highlighter: "rouge"
+# Options from: https://github.com/jirutka/asciidoctor-rouge
+rouge-css: "class"
+# class option requires css file generated by 
build/.gems/gems/rouge-vx.y.z/bin/rougify style style_name
+# other option: style for inline styles, which is larger in size
+# rouge-css: "style"
+# rouge-style: "github"

Review comment:
   The styles generated for the source blocks (highlights) can be inlined 
or kept in a separate CSS file, which needs to be generated. Due to the support 
for (I guess) dark mode, the inline styles make the documents somewhat larger. So, 
CSS looked like the better option.
   
   Also, the previous style name was spelled incorrectly and was therefore falling 
back to the default style. I've just made this explicit after spending time trying 
to figure out the actual situation.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14901) TestPackages uses binary precompiled classes to refer to analysis factory FQCNs

2020-09-28 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203181#comment-17203181
 ] 

Dawid Weiss commented on SOLR-14901:


I looked at fixing this test but I don't know how. These binary classes in 
schema-plugins.jar.bin don't have any equivalent in the source code:
{code}
MyPatternReplaceCharFilterFactory.class
MyTextField.class
MyWhitespaceAnalyzer.class
MyWhitespaceTokenizerFactory.class
{code}

The entire JAR is also signed... [~noble.paul] can you regenerate these files 
after core classes have been repackaged? I don't think binary artifacts are 
great in this test - they should be generated dynamically - but it's way beyond 
my knowledge of this API to fix this test.

> TestPackages uses binary precompiled classes to refer to analysis factory 
> FQCNs
> ---
>
> Key: SOLR-14901
> URL: https://issues.apache.org/jira/browse/SOLR-14901
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (9.0)
>Reporter: Tomoko Uchida
>Assignee: Dawid Weiss
>Priority: Minor
>
> Base analysis factories' package name were renamed in [LUCENE-9317]. 
> {{o.a.s.pkg.TestPackages}} is failing since it has hard coded their old 
> FQCNs, that needs to be fixed.
> See https://github.com/apache/lucene-solr/pull/1836 for details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9458) WordDelimiterGraphFilter (and non-graph) should tie-break order using end offset

2020-09-28 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203188#comment-17203188
 ] 

David Smiley commented on LUCENE-9458:
--

In the updated PR, I only changed WDGF, reverting my attempt to fix WDF.
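
The description below gives the concrete example; as a sketch, the proposed 
tie-break amounts to an ordering like this ({{Token}} here stands in for the 
filter's internal sub-token representation, so this is illustrative, not the patch):

{code:java}
import java.util.Comparator;

// Hypothetical sketch: order sub-tokens by start offset, and on ties emit the
// token with the larger end offset (the longer token) first.
Comparator<Token> subTokenOrder = Comparator
    .comparingInt((Token t) -> t.startOffset())
    .thenComparing(Comparator.comparingInt((Token t) -> t.endOffset()).reversed());
{code}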

> WordDelimiterGraphFilter (and non-graph) should tie-break order using end 
> offset
> 
>
> Key: LUCENE-9458
> URL: https://issues.apache.org/jira/browse/LUCENE-9458
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> WordDelimiterGraphFilter and WordDelimiterFilter do not consult the end 
> offset in their sub-token _ordering_.  In the event of a tie-break, I propose 
> the longer token come first.  This usually happens already, but not always, 
> and so this also feels like an inconsistency when you see it.  This issue can 
> be thought of as a bug fix to LUCENE-9006 or an improvement; I have no strong 
> feelings on the issue classification.  Before reading further, definitely 
> read that issue.
> I see this is a problem when using CATENATE_ALL with either 
> GENERATE_WORD_PARTS xor GENERATE_NUMBER_PARTS when the input ends with that 
> part not being generated.  Consider the input: "other-9" and let's assume we 
> want to catenate all, generate word parts, but nothing else (not numbers).  
> This should be tokenized in this order: "other9", "other", but today it is 
> emitted in reverse order.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (SOLR-14902) Merge Jetty and Solr classpath

2020-09-28 Thread David Smiley (Jira)
David Smiley created SOLR-14902:
---

 Summary: Merge Jetty and Solr classpath
 Key: SOLR-14902
 URL: https://issues.apache.org/jira/browse/SOLR-14902
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: David Smiley


The standard webapp/server classpath separation is pointless for Solr.  The 
separation is abused/broken when SSL is configured (see the bin/solr hack that 
adds WEB-INF/lib/* to Jetty's classpath).  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] arafalov commented on a change in pull request #1923: SOLR-14900: Reference Guide build cleanup/module upgrade

2020-09-28 Thread GitBox


arafalov commented on a change in pull request #1923:
URL: https://github.com/apache/lucene-solr/pull/1923#discussion_r495913848



##
File path: solr/solr-ref-guide/src/_includes/head.html
##
@@ -11,6 +11,7 @@
 
 
 
+

[jira] [Commented] (SOLR-10783) Using Hadoop Credential Provider as SSL/TLS store password source

2020-09-28 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-10783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203204#comment-17203204
 ] 

David Smiley commented on SOLR-10783:
-

I noticed this issue modified bin/solr to add all of WEB-INF/lib JARs to the 
Jetty classpath when SSL options are used.  It's quite a hack to behold for 
those who understand webapps/servlets.  I think this deserved more discussion. 
 It appears this is needed because etc/jetty-ssl.xml has a call to 
{{org.apache.solr.util.configuration.SSLConfigurationsFactory}}, which thus must 
be on Jetty's classpath (server/lib), but Solr doesn't live there.  I see two 
better options:
(A) add a new Solr code module for code at the Jetty level that would include 
this SSL thing and anything else (logging stuff?) that ought to exist at that 
level, plus configure Jetty to expose these classes to the Solr webapp in case 
Solr needs access as well.
(B) stop pretending Solr is some typical webapp with the classpath separation 
between server and webapp.  Merge Solr's classpath down into Jetty.  If Jetty 
insists on a webapp classloader existing, it'd be defunct -- nothing to load 
directly.

I think the project should choose "B". I filed SOLR-14902.

> Using Hadoop Credential Provider as SSL/TLS store password source
> -
>
> Key: SOLR-10783
> URL: https://issues.apache.org/jira/browse/SOLR-10783
> Project: Solr
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 7.0
>Reporter: Mano Kovacs
>Assignee: Mark Miller
>Priority: Major
> Fix For: 7.4, 8.0
>
> Attachments: SOLR-10783-fix.patch, SOLR-10783.patch, 
> SOLR-10783.patch, SOLR-10783.patch, SOLR-10783.patch, SOLR-10783.patch, 
> SOLR-10783.patch, SOLR-10783.patch, SOLR-10783.patch
>
>
> As a second iteration of SOLR-10307, I propose support of hadoop credential 
> providers as source of SSL store passwords. 
> Motivation: When SOLR is used in hadoop environment, support of  HCP gives 
> better integration and unified method to pass sensitive credentials to SOLR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] arafalov commented on pull request #1923: SOLR-14900: Reference Guide build cleanup/module upgrade

2020-09-28 Thread GitBox


arafalov commented on pull request #1923:
URL: https://github.com/apache/lucene-solr/pull/1923#issuecomment-699988193


   Thank you for the feedback. Was there a particular methodology you were 
going to use for the upgrade yourself? I did visual and diff spot checks on 
pages that seemed to have significant changes and only found the highlighter 
differences (styles and additional markup around whitespace). But if there 
are expectations of significant discrepancies, I am happy to follow a different 
suggested approach.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14902) Merge Jetty and Solr classpath

2020-09-28 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203216#comment-17203216
 ] 

David Smiley commented on SOLR-14902:
-

The SSL hack was added in SOLR-10783; I added a comment on that issue pointing 
here.

Proposal:
* What goes in WEB-INF/lib would instead go to a new lib dir /server/lib/solr.  
Need to tell Jetty about that dir in bin/solr.
* in solr-jetty-context.xml, set {{parentLoaderPriority}} to {{true}}.  Not sure 
if this is needed.
* in solr-jetty-context.xml, expose SolrDispatchFilter from the server's 
classpath to the web context.  Libs at the Jetty level are invisible to Solr, 
with some exceptions that can be controlled.

https://www.eclipse.org/jetty/documentation/current/jetty-classloading.html#configuring-webapp-classloading

CC [~uschindler] I suspect you have thoughts on this since you've proposed 
replacing much of the Jetty/Solr bootstrapping with a single source file.  That 
would be the epitome of hard-coding -- we'd lose the rather nice text file 
configuration that's possible today.  One example is adding CORS, which is 
rather common and can be done easily without code changes.

> Merge Jetty and Solr classpath
> --
>
> Key: SOLR-14902
> URL: https://issues.apache.org/jira/browse/SOLR-14902
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Priority: Major
>
> The standard webapp/server classpath separation is pointless for Solr.  The 
> separation is abused/broken when SSL is configured (see the bin/solr hack 
> that adds WEB-INF/lib/* to Jetty's classpath).  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz merged pull request #1919: Compute RAM usage ByteBuffersDataOutput on the fly.

2020-09-28 Thread GitBox


jpountz merged pull request #1919:
URL: https://github.com/apache/lucene-solr/pull/1919


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz commented on pull request #1912: LUCENE-9535: Try to do larger flushes.

2020-09-28 Thread GitBox


jpountz commented on pull request #1912:
URL: https://github.com/apache/lucene-solr/pull/1912#issuecomment-67077


   > We are talking about assigning DWPT to incoming indexing thread, right?
   
   Right.
   
   The thing that makes me hesitate is that it would probably make fetching a 
DWPT slightly more costly, which could slow indexing down with very fast 
indexing chains, like the 1kb wikipedia documents dataset we use for nightly 
benchmarks, since this method is called under a lock.
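
To make the concern concrete, a sketch of the shape of the code in question 
(names like {{freeList}} are hypothetical; the real pool logic lives in 
DocumentsWriterPerThreadPool):

{code:java}
// Hypothetical sketch: whatever selection policy is added here runs under the
// pool's lock, so per-document acquire cost directly gates indexing throughput
// for cheap documents.
synchronized DocumentsWriterPerThread acquire() {
  // today: effectively "take any free DWPT" - O(1)
  // proposed: prefer e.g. the largest DWPT - an O(n) scan under this lock
  return freeList.poll();
}
{code}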



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Assigned] (SOLR-14901) TestPackages uses binary precompiled classes to refer to analysis factory FQCNs

2020-09-28 Thread Noble Paul (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul reassigned SOLR-14901:
-

Assignee: Noble Paul  (was: Dawid Weiss)

> TestPackages uses binary precompiled classes to refer to analysis factory 
> FQCNs
> ---
>
> Key: SOLR-14901
> URL: https://issues.apache.org/jira/browse/SOLR-14901
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (9.0)
>Reporter: Tomoko Uchida
>Assignee: Noble Paul
>Priority: Minor
>
> Base analysis factories' package name were renamed in [LUCENE-9317]. 
> {{o.a.s.pkg.TestPackages}} is failing since it has hard coded their old 
> FQCNs, that needs to be fixed.
> See https://github.com/apache/lucene-solr/pull/1836 for details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14901) TestPackages uses binary precompiled classes to refer to analysis factory FQCNs

2020-09-28 Thread Noble Paul (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203231#comment-17203231
 ] 

Noble Paul commented on SOLR-14901:
---

[~dweiss] [~tomoko]

Can you just {{@Ignore}} this test as a part of LUCENE-9317?

I've assigned this to myself. Whichever release this is a part of, please mark 
it as a blocker.

It's not possible for me to fix this unless the classes are moved.

> TestPackages uses binary precompiled classes to refer to analysis factory 
> FQCNs
> ---
>
> Key: SOLR-14901
> URL: https://issues.apache.org/jira/browse/SOLR-14901
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (9.0)
>Reporter: Tomoko Uchida
>Assignee: Noble Paul
>Priority: Minor
>
> Base analysis factories' package name were renamed in [LUCENE-9317]. 
> {{o.a.s.pkg.TestPackages}} is failing since it has hard coded their old 
> FQCNs, that needs to be fixed.
> See https://github.com/apache/lucene-solr/pull/1836 for details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-14901) TestPackages uses binary precompiled classes to refer to analysis factory FQCNs

2020-09-28 Thread Noble Paul (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203231#comment-17203231
 ] 

Noble Paul edited comment on SOLR-14901 at 9/28/20, 1:27 PM:
-

[~dweiss] [~tomoko]

Can you just {{@Ignore}} this test as a part of LUCENE-9317?

I've assigned this to myself. Whichever release this is a part of, please mark 
it as a blocker.

It's not possible for me to fix this unless the classes are moved.

[~dweiss] it's not an elegant solution to keep the compiled binaries in the 
codebase. I will add the source files too when I fix this.


was (Author: noble.paul):
[~dweiss] [~tomoko]

Can you just {{@Ignore}} this test as a part of LUCENE-9317.

I've assigned this to myself. Whichever release this is a part of, please make 
it as a blocker.

it's not possible for me to fix this unless the classes are moved

> TestPackages uses binary precompiled classes to refer to analysis factory 
> FQCNs
> ---
>
> Key: SOLR-14901
> URL: https://issues.apache.org/jira/browse/SOLR-14901
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (9.0)
>Reporter: Tomoko Uchida
>Assignee: Noble Paul
>Priority: Minor
>
> Base analysis factories' package name were renamed in [LUCENE-9317]. 
> {{o.a.s.pkg.TestPackages}} is failing since it has hard coded their old 
> FQCNs, that needs to be fixed.
> See https://github.com/apache/lucene-solr/pull/1836 for details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14901) TestPackages uses binary precompiled classes to refer to analysis factory FQCNs

2020-09-28 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203233#comment-17203233
 ] 

Dawid Weiss commented on SOLR-14901:


These classes have been moved already (on master) - that's why the code 
currently fails. We marked it with AwaitsFix so that it's ignored in builds (on 
master). Thank you.
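
For reference, this is roughly what that marker looks like (a sketch; the exact 
bugUrl value on master may differ):

{code:java}
// Lucene's test framework convention for known-broken tests: @AwaitsFix keeps
// the test out of CI runs until the referenced issue is resolved.
@LuceneTestCase.AwaitsFix(bugUrl = "https://issues.apache.org/jira/browse/SOLR-14901")
public class TestPackages extends SolrCloudTestCase {
  // ...
}
{code}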

> TestPackages uses binary precompiled classes to refer to analysis factory 
> FQCNs
> ---
>
> Key: SOLR-14901
> URL: https://issues.apache.org/jira/browse/SOLR-14901
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (9.0)
>Reporter: Tomoko Uchida
>Assignee: Noble Paul
>Priority: Minor
>
> Base analysis factories' package name were renamed in [LUCENE-9317]. 
> {{o.a.s.pkg.TestPackages}} is failing since it has hard coded their old 
> FQCNs, that needs to be fixed.
> See https://github.com/apache/lucene-solr/pull/1836 for details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-14897) HttpSolrCall will forward a virtually unlimited number of times until ClusterState ZkWatcher is updated after collection delete

2020-09-28 Thread Munendra S N (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Munendra S N updated SOLR-14897:

Attachment: SOLR-14897.patch

> HttpSolrCall will forward a virtually unlimited number of times until 
> ClusterState ZkWatcher is updated after collection delete
> ---
>
> Key: SOLR-14897
> URL: https://issues.apache.org/jira/browse/SOLR-14897
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-14897.patch
>
>
> While investigating the root cause of some SOLR-14896 related failures, I 
> have seen evidence that if a collection is deleted, but a client makes a 
> subsequent request for that collection _before_ the local ClusterState has 
> been updated to remove that DocCollection, HttpSolrCall will forward/proxy 
> that request a (virtually) unbounded number of times in a very short time 
> period - stopping only once the "cached" local DocCollection is updated 
> to indicate there are no active replicas.
> While HttpSolrCall does track & increment a {{_forwardedCount}} param on 
> every request it forwards, it doesn't consult that count unless/until it 
> finds a situation where the (local) DocCollection says there are no active 
> replicas.
> So if you have a collection XX with 4 total replicas on 4 diff nodes 
> (A,B,C,D), and you delete XX (triggering sequential core deletions on 
> A,B,C,D that fire successive ZkWatchers on various nodes to update the 
> collection state), a request for XX can bounce back and forth between nodes C 
> & D 20+ times until the ClusterState watcher fires on both of those nodes so 
> they finally realize that {{_forwardedCount=20}} is more than the 0 active 
> replicas...
> In the below code snippet from HttpSolrCall, the first call to 
> {{getCoreUrl(...)}} is expected to return null if there are no active 
> replicas - but it uses the local cached DocCollection, which may _think_ 
> there is an active replica on another node, so it forwards the request to 
> that node - where the replica may have been deleted, so that node runs the 
> same code and may forward the request right back to the original node.
> {code:java}
> String coreUrl = getCoreUrl(collectionName, origCorename, clusterState,
> activeSlices, byCoreName, true);
> // Avoid getting into a recursive loop of requests being forwarded by
> // stopping forwarding and erroring out after (totalReplicas) forwards
> if (coreUrl == null) {
>   if (queryParams.getInt(INTERNAL_REQUEST_COUNT, 0) > totalReplicas){
> throw new SolrException(SolrException.ErrorCode.INVALID_STATE,
> "No active replicas found for collection: " + collectionName);
>   }
>   coreUrl = getCoreUrl(collectionName, origCorename, clusterState,
>   activeSlices, byCoreName, false);
> }
> {code}
> ...the check that is supposed to prevent a "recursive loop" is only consulted 
> once a situation arises where the local ClusterState indicates there are no 
> active replicas - which seems to defeat the point of the forward check? (at 
> which point, if the total number of replicas hasn't been exceeded, the code is 
> happy to forward the request to a coreUrl which the local ClusterState 
> indicates is _not_ active, which also seems to defeat the point?)
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14897) HttpSolrCall will forward a virtually unlimited number of times until ClusterState ZkWatcher is updated after collection delete

2020-09-28 Thread Munendra S N (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203241#comment-17203241
 ] 

Munendra S N commented on SOLR-14897:
-

 [^SOLR-14897.patch] 
This does the forwardedCount check even in the case of an active replica. I'm 
not able to figure out a way to add a test for this change; any help would be 
appreciated.
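
For reference, a minimal sketch of that reordering (names taken from the 
snippet quoted in the description; the attached patch may differ in details):

{code:java}
// Hypothetical sketch: check the forward counter before choosing a core URL,
// so the limit applies even while the local ClusterState still lists an
// "active" replica that has in fact been deleted.
if (queryParams.getInt(INTERNAL_REQUEST_COUNT, 0) > totalReplicas) {
  throw new SolrException(SolrException.ErrorCode.INVALID_STATE,
      "No active replicas found for collection: " + collectionName);
}
String coreUrl = getCoreUrl(collectionName, origCorename, clusterState,
    activeSlices, byCoreName, true);
if (coreUrl == null) {
  coreUrl = getCoreUrl(collectionName, origCorename, clusterState,
      activeSlices, byCoreName, false);
}
{code}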

> HttpSolrCall will forward a virtually unlimited number of times until 
> ClusterState ZkWatcher is updated after collection delete
> ---
>
> Key: SOLR-14897
> URL: https://issues.apache.org/jira/browse/SOLR-14897
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-14897.patch
>
>
> While investigating the root cause of some SOLR-14896 related failures, I 
> have seen evidence that if a collection is deleted, but a client makes a 
> subsequent request for that collection _before_ the local ClusterState has 
> been updated to remove that DocCollection, HttpSolrCall will forward/proxy 
> that request a (virtually) unbounded number of times in a very short time 
> period - stopping only once the "cached" local DocCollection is updated 
> to indicate there are no active replicas.
> While HttpSolrCall does track & increment a {{_forwardedCount}} param on 
> every request it forwards, it doesn't consult that count unless/until it 
> finds a situation where the (local) DocCollection says there are no active 
> replicas.
> So if you have a collection XX with 4 total replicas on 4 diff nodes 
> (A,B,C,D), and you delete XX (triggering sequential core deletions on 
> A,B,C,D that fire successive ZkWatchers on various nodes to update the 
> collection state), a request for XX can bounce back and forth between nodes C 
> & D 20+ times until the ClusterState watcher fires on both of those nodes so 
> they finally realize that {{_forwardedCount=20}} is more than the 0 active 
> replicas...
> In the below code snippet from HttpSolrCall, the first call to 
> {{getCoreUrl(...)}} is expected to return null if there are no active 
> replicas - but it uses the local cached DocCollection, which may _think_ 
> there is an active replica on another node, so it forwards the request to 
> that node - where the replica may have been deleted, so that node runs the 
> same code and may forward the request right back to the original node.
> {code:java}
> String coreUrl = getCoreUrl(collectionName, origCorename, clusterState,
> activeSlices, byCoreName, true);
> // Avoid getting into a recursive loop of requests being forwarded by
> // stopping forwarding and erroring out after (totalReplicas) forwards
> if (coreUrl == null) {
>   if (queryParams.getInt(INTERNAL_REQUEST_COUNT, 0) > totalReplicas){
> throw new SolrException(SolrException.ErrorCode.INVALID_STATE,
> "No active replicas found for collection: " + collectionName);
>   }
>   coreUrl = getCoreUrl(collectionName, origCorename, clusterState,
>   activeSlices, byCoreName, false);
> }
> {code}
> ...the check that is supposed to prevent a "recursive loop" is only consulted 
> once a situation arises where the local ClusterState indicates there are no 
> active replicas - which seems to defeat the point of the forward check? (at 
> which point, if the total number of replicas hasn't been exceeded, the code is 
> happy to forward the request to a coreUrl which the local ClusterState 
> indicates is _not_ active, which also seems to defeat the point?)
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14898) Proxied/Forwarded requests to other nodes wind up getting duplicate response headers

2020-09-28 Thread Munendra S N (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203249#comment-17203249
 ] 

Munendra S N commented on SOLR-14898:
-

When a request is received by any node, security headers are set in the 
response by Jetty's RewriteHandler. When there is no local core, the request is 
forwarded/proxied to a node that has the core, and the returned response is 
sent back to the user. Here, all the response headers from the proxied request are 
[*added*|https://github.com/apache/lucene-solr/blob/c3f97fbdc11cf29e17a4e715981108dda7ba3aea/solr/core/src/java/org/apache/solr/servlet/HttpSolrCall.java#L731]
 to the original response.

addHeader could be replaced by setHeader, so that the defaults are overwritten by 
the headers in the proxied response (which is the actual response we return to 
the user).
Another approach is to add a new filter that sets the security headers instead of 
the RewriteHandler, but we would still have the problem (again, we might need a 
contains check or to replace it with setHeader).
 
Let me know which approach looks better; I will attach a patch after that.
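
For illustration, a minimal sketch of the first approach (the variable names 
here are assumptions; HttpSolrCall copies headers from an Apache HttpClient 
response onto the servlet response):

{code:java}
// Hypothetical sketch: use setHeader instead of addHeader when copying the
// proxied response's headers, so the security defaults already set by Jetty's
// RewriteHandler are overwritten rather than duplicated.
for (Header header : remoteResponse.getAllHeaders()) {             // org.apache.http.Header
  servletResponse.setHeader(header.getName(), header.getValue()); // was addHeader(...)
}
{code}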

> Proxied/Forwarded requests to other nodes wind up getting duplicate response 
> headers
> 
>
> Key: SOLR-14898
> URL: https://issues.apache.org/jira/browse/SOLR-14898
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Priority: Major
>
> When Solr receives a request for a collection not hosted on the current node, 
> HttpSolrCall forwards/proxies that request - but the final response for the 
> client can include duplicate response headers - one header from the remote 
> node that ultimately handled the request, and a second copy of the header 
> added by the current node...
> {noformat}
> # create a simple 2 node cluster...
> $ ./bin/solr -e cloud -noprompt
> # ...
> $ curl 
> 'http://localhost:8983/solr/admin/collections?action=CREATE&name=solo&numShards=1&nrtReplicas=1'
> # ...
> # node 8983 is the node currently hosting the only replica of the 'solo' 
> collection, and responds to requests directly...
> #
> $ curl -S -s -D - -o /dev/null http://localhost:8983/solr/solo/select
> HTTP/1.1 200 OK
> Content-Security-Policy: default-src 'none'; base-uri 'none'; connect-src 
> 'self'; form-action 'self'; font-src 'self'; frame-ancestors 'none'; img-src 
> 'self'; media-src 'self'; style-src 'self' 'unsafe-inline'; script-src 
> 'self'; worker-src 'self';
> X-Content-Type-Options: nosniff
> X-Frame-Options: SAMEORIGIN
> X-XSS-Protection: 1; mode=block
> Content-Type: application/json;charset=utf-8
> Content-Length: 169
> # node 7574 does not host a replica, and forwards requests for it to 8983
> # the response the client gets from 7574 has several security related headers 
> duplicated...
> #
> $ curl -S -s -D - -o /dev/null http://localhost:7574/solr/solo/select
> HTTP/1.1 200 OK
> Content-Security-Policy: default-src 'none'; base-uri 'none'; connect-src 
> 'self'; form-action 'self'; font-src 'self'; frame-ancestors 'none'; img-src 
> 'self'; media-src 'self'; style-src 'self' 'unsafe-inline'; script-src 
> 'self'; worker-src 'self';
> X-Content-Type-Options: nosniff
> X-Frame-Options: SAMEORIGIN
> X-XSS-Protection: 1; mode=block
> Content-Security-Policy: default-src 'none'; base-uri 'none'; connect-src 
> 'self'; form-action 'self'; font-src 'self'; frame-ancestors 'none'; img-src 
> 'self'; media-src 'self'; style-src 'self' 'unsafe-inline'; script-src 
> 'self'; worker-src 'self';
> X-Content-Type-Options: nosniff
> X-Frame-Options: SAMEORIGIN
> X-XSS-Protection: 1; mode=block
> Content-Type: application/json;charset=utf-8
> Content-Length: 197
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-9401) ComplexPhraseQuery's toString method always omits field name

2020-09-28 Thread Munendra S N (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Munendra S N updated LUCENE-9401:
-
Attachment: LUCENE-9401.patch

> ComplexPhraseQuery's toString method always omits field name
> 
>
> Key: LUCENE-9401
> URL: https://issues.apache.org/jira/browse/LUCENE-9401
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/queryparser
>Affects Versions: 8.5.2
>Reporter: Thomas Hecker
>Priority: Trivial
> Attachments: LUCENE-9401.patch, LUCENE-9401.patch
>
>
> The toString(String field) method in 
> org.apache.lucene.queryparser.complexPhrase.ComplexPhraseQueryParser$ComplexPhraseQuery
>  should omit the field name only if the query's field name is equal to the 
> field name that is passed as an argument.
> Instead, the query's field name is never included in the returned String.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9401) ComplexPhraseQuery's toString method always omits field name

2020-09-28 Thread Munendra S N (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203251#comment-17203251
 ] 

Munendra S N commented on LUCENE-9401:
--

 [^LUCENE-9401.patch] 
Minor formatting and a fix for the failing test. I will commit this shortly.
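
For reference, a minimal sketch of the expected contract (member names like 
{{field}} and the clause rendering are assumptions, not the actual patch):

{code:java}
// Hypothetical sketch: prepend the query's field only when it differs from
// the field passed in, matching the usual Query.toString(String) convention.
@Override
public String toString(String field) {
  StringBuilder sb = new StringBuilder();
  if (!this.field.equals(field)) {
    sb.append(this.field).append(':');
  }
  sb.append('"').append(phrase).append('"');
  return sb.toString();
}
{code}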

> ComplexPhraseQuery's toString method always omits field name
> 
>
> Key: LUCENE-9401
> URL: https://issues.apache.org/jira/browse/LUCENE-9401
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/queryparser
>Affects Versions: 8.5.2
>Reporter: Thomas Hecker
>Priority: Trivial
> Attachments: LUCENE-9401.patch, LUCENE-9401.patch
>
>
> The toString(String field) method in 
> org.apache.lucene.queryparser.complexPhrase.ComplexPhraseQueryParser$ComplexPhraseQuery
>  should omit the field name only if the query's field name is equal to the 
> field name that is passed as an argument.
> Instead, the query's field name is never included in the returned String.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-9401) ComplexPhraseQuery's toString method always omits field name

2020-09-28 Thread Munendra S N (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Munendra S N reassigned LUCENE-9401:


Assignee: Munendra S N

> ComplexPhraseQuery's toString method always omits field name
> 
>
> Key: LUCENE-9401
> URL: https://issues.apache.org/jira/browse/LUCENE-9401
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/queryparser
>Affects Versions: 8.5.2
>Reporter: Thomas Hecker
>Assignee: Munendra S N
>Priority: Trivial
> Attachments: LUCENE-9401.patch, LUCENE-9401.patch
>
>
> The toString(String field) method in 
> org.apache.lucene.queryparser.complexPhrase.ComplexPhraseQueryParser$ComplexPhraseQuery
>  should omit the field name only if the query's field name is equal to the 
> field name that is passed as an argument.
> Instead, the query's field name is never included in the returned String.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9004) Approximate nearest vector search

2020-09-28 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203261#comment-17203261
 ] 

Michael McCandless commented on LUCENE-9004:


This is an exciting feature!  It is not every day that a new {{Codec}} 
component is born!  Nearest-neighbor vector search seems here to stay, so 
supporting this efficiently in Lucene makes sense.

+1 to [~sokolov]'s plan above to break this complex and exciting feature into 
bite-sized steps!   Future improvements (chunking the graph for a large 
segment, more efficient filtering with other query clauses, maybe adding 
hierarchy/layers to the graph) can come as subsequent iterations.

Handling the sorted index is hopefully not so difficult, since you already 
dereference the dense ordinals to docids.
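
The proposal below encodes vectors with a light wrapper around 
{{BinaryDocValues}}; as a concrete illustration of the bytes-to-floats 
conversion it mentions, a decode might look like this (the 4-byte big-endian 
layout is an assumption, not the committed format):

{code:java}
import java.nio.ByteBuffer;
import org.apache.lucene.util.BytesRef;

// Hypothetical sketch: decode a fixed-dimension float vector from a
// BinaryDocValues payload.
static float[] decodeVector(BytesRef bytes, int dimension) {
  float[] vector = new float[dimension];
  ByteBuffer buffer = ByteBuffer.wrap(bytes.bytes, bytes.offset, bytes.length);
  for (int i = 0; i < dimension; i++) {
    vector[i] = buffer.getFloat();
  }
  return vector;
}
{code}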

> Approximate nearest vector search
> -
>
> Key: LUCENE-9004
> URL: https://issues.apache.org/jira/browse/LUCENE-9004
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Michael Sokolov
>Priority: Major
> Attachments: hnsw_layered_graph.png
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> "Semantic" search based on machine-learned vector "embeddings" representing 
> terms, queries and documents is becoming a must-have feature for a modern 
> search engine. SOLR-12890 is exploring various approaches to this, including 
> providing vector-based scoring functions. This is a spinoff issue from that.
> The idea here is to explore approximate nearest-neighbor search. Researchers 
> have found an approach based on navigating a graph that partially encodes the 
> nearest neighbor relation at multiple scales can provide accuracy > 95% (as 
> compared to exact nearest neighbor calculations) at a reasonable cost. This 
> issue will explore implementing HNSW (hierarchical navigable small-world) 
> graphs for the purpose of approximate nearest vector search (often referred 
> to as KNN or k-nearest-neighbor search).
> At a high level the way this algorithm works is this. First assume you have a 
> graph that has a partial encoding of the nearest neighbor relation, with some 
> short and some long-distance links. If this graph is built in the right way 
> (has the hierarchical navigable small world property), then you can 
> efficiently traverse it to find nearest neighbors (approximately) in log N 
> time where N is the number of nodes in the graph. I believe this idea was 
> pioneered in  [1]. The great insight in that paper is that if you use the 
> graph search algorithm to find the K nearest neighbors of a new document 
> while indexing, and then link those neighbors (undirectedly, ie both ways) to 
> the new document, then the graph that emerges will have the desired 
> properties.
> The implementation I propose for Lucene is as follows. We need two new data 
> structures to encode the vectors and the graph. We can encode vectors using a 
> light wrapper around {{BinaryDocValues}} (we also want to encode the vector 
> dimension and have efficient conversion from bytes to floats). For the graph 
> we can use {{SortedNumericDocValues}} where the values we encode are the 
> docids of the related documents. Encoding the interdocument relations using 
> docids directly will make it relatively fast to traverse the graph since we 
> won't need to lookup through an id-field indirection. This choice limits us 
> to building a graph-per-segment since it would be impractical to maintain a 
> global graph for the whole index in the face of segment merges. However, 
> graph-per-segment is very natural at search time - we can traverse each 
> segment's graph independently and merge results as we do today for term-based 
> search.
> At index time, however, merging graphs is somewhat challenging. While 
> indexing we build a graph incrementally, performing searches to construct 
> links among neighbors. When merging segments we must construct a new graph 
> containing elements of all the merged segments. Ideally we would somehow 
> preserve the work done when building the initial graphs, but at least as a 
> start I'd propose we construct a new graph from scratch when merging. The 
> process is going to be  limited, at least initially, to graphs that can fit 
> in RAM since we require random access to the entire graph while constructing 
> it: In order to add links bidirectionally we must continually update existing 
> documents.
> I think we want to express this API to users as a single joint 
> {{KnnGraphField}} abstraction that joins together the vectors and the graph 
> as a single joint field type. Mostly it just looks like a vector-valued 
> field, but has this graph attached to it.
> I'll push a branch with my POC and would love to hear comments. It has many 
> nocommits, basic design is not really set, there is no Query i

[jira] [Commented] (LUCENE-9317) Resolve package name conflicts for StandardAnalyzer to allow Java module system support

2020-09-28 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203271#comment-17203271
 ] 

ASF subversion and git services commented on LUCENE-9317:
-

Commit fc6d0a40dc3aaffc87213b62c8e6bddbe727d6a9 in lucene-solr's branch 
refs/heads/master from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=fc6d0a4 ]

LUCENE-9317: Remove unused imports.
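
The quoted description below is about JPMS "split packages"; for illustration, 
this is the conflict the module system rejects (the module names here are 
hypothetical - neither jar declares a module-info today):

{code:java}
// Two separate module-info.java files. JPMS refuses to resolve a module graph
// in which the same package is contained in two different modules.
module org.apache.lucene.core {
  exports org.apache.lucene.analysis.standard; // StandardAnalyzer et al.
}
module org.apache.lucene.analysis.common {
  exports org.apache.lucene.analysis.standard; // factories: split package!
}
{code}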


> Resolve package name conflicts for StandardAnalyzer to allow Java module 
> system support
> ---
>
> Key: LUCENE-9317
> URL: https://issues.apache.org/jira/browse/LUCENE-9317
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/other
>Affects Versions: master (9.0)
>Reporter: David Ryan
>Assignee: Tomoko Uchida
>Priority: Major
>  Labels: build, features
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
>  
> To allow Lucene to be modularised, a few preparatory tasks must be completed 
> first.  The Java module system requires that the same package name not be 
> used in different jars (no "split packages").  The lucene-core and 
> lucene-analyzers-common jars both contain the package 
> org.apache.lucene.analysis.standard.
> Possible resolutions to this issue are discussed by Uwe on the mailing list 
> here:
>  
> [http://mail-archives.apache.org/mod_mbox/lucene-dev/202004.mbox/%3CCAM21Rt8FHOq_JeUSELhsQJH0uN0eKBgduBQX4fQKxbs49TLqzA%40mail.gmail.com%3E]
> {quote}About StandardAnalyzer: Unfortunately I aggressively complained a 
> while back when Mike McCandless wanted to move standard analyzer out of the 
> analysis package into core (“for convenience”). This was a bad step, and IMHO 
> we should revert that or completely rename the packages and everything. The 
> problem here is: As the analysis services are only part of lucene-analyzers, 
> we had to leave the factory classes there, but move the implementation 
> classes in core. The package has to be the same. The only way around that is 
> to move the analysis factory framework also to core (I would not be against 
> that). This would include all factory base classes and the service loading 
> stuff. Then we can move standard analyzer and some of the filters/tokenizers 
> including their factories to core an that problem would be solved.
> {quote}
> There are two options here: either move the factory framework into core, or 
> revert StandardAnalyzer back to lucene-analyzers.  In the email, the solution 
> lands on reverting, as per the task list:
> {quote}Add some preparatory issues to cleanup class hierarchy: Move Analysis 
> SPI to core / remove StandardAnalyzer and related classes out of core back to 
> anaysis
> {quote}
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] mikemccand merged pull request #1893: LUCENE-9444 Utility class to get facet labels from taxonomy for a fac…

2020-09-28 Thread GitBox


mikemccand merged pull request #1893:
URL: https://github.com/apache/lucene-solr/pull/1893


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document

2020-09-28 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203281#comment-17203281
 ] 

ASF subversion and git services commented on LUCENE-9444:
-

Commit 24aadc220ba9578f581637b9fd0e7e973d46426c in lucene-solr's branch 
refs/heads/master from goankur
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=24aadc2 ]

LUCENE-9444: add utility class to retrieve facet labels from the taxonomy index 
for a facet field (#1893)

LUCENE-9444: add utility class to retrieve facet labels from the taxonomy index 
for a facet field so such fields do not also have to be redundantly stored in 
the index.

Co-authored-by: Ankur Goel 
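
For context, the three manual steps from the quoted description roughly 
correspond to this (a sketch of the pre-change workflow, not the new utility 
class; the "$facets" field name is an assumption):

{code:java}
import java.io.IOException;
import org.apache.lucene.facet.taxonomy.DocValuesOrdinalsReader;
import org.apache.lucene.facet.taxonomy.FacetLabel;
import org.apache.lucene.facet.taxonomy.OrdinalsReader;
import org.apache.lucene.facet.taxonomy.TaxonomyReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.util.IntsRef;

static FacetLabel[] labelsForDoc(LeafReaderContext context,
                                 TaxonomyReader taxoReader, int docId) throws IOException {
  OrdinalsReader.OrdinalsSegmentReader ords =
      new DocValuesOrdinalsReader("$facets").getReader(context); // step 1
  IntsRef ordinals = new IntsRef();
  ords.get(docId, ordinals);                                     // step 2
  FacetLabel[] labels = new FacetLabel[ordinals.length];
  for (int i = 0; i < ordinals.length; i++) {
    labels[i] = taxoReader.getPath(ordinals.ints[ordinals.offset + i]); // step 3
  }
  return labels;
}
{code}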

> Need an API to easily fetch facet labels for a field in a document
> --
>
> Key: LUCENE-9444
> URL: https://issues.apache.org/jira/browse/LUCENE-9444
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: 8.6
>Reporter: Ankur
>Priority: Major
>  Labels: facet
> Attachments: LUCENE-9444.patch, LUCENE-9444.patch, 
> LUCENE-9444.v2.patch
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> A facet field may be included in the list of fields whose values are to be 
> returned for each hit.
> In order to get the facet labels for each hit we need to
>  # Create an instance of _DocValuesOrdinalsReader_ and invoke 
> _getReader(LeafReaderContext context)_ method to obtain an instance of 
> _OrdinalsSegmentReader()_
>  # _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then 
> used to fetch and decode the binary payload in the document's BinaryDocValues 
> field. This provides the ordinals that refer to facet labels in the 
> taxonomy.
>  # Lastly TaxonomyReader.getPath(ord) is used to fetch the labels to be 
> returned.
>  
> Ideally there should be a simple API - *String[] getLabels(docId)* that hides 
> all the above details and gives us the string labels. This can be part of 
> *TaxonomyFacets* but that's just one idea.
> I am opening this issue to get community feedback and suggestions.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] murblanc commented on a change in pull request #1758: SOLR-14749: Provide a clean API for cluster-level event processing, Initial draft.

2020-09-28 Thread GitBox


murblanc commented on a change in pull request #1758:
URL: https://github.com/apache/lucene-solr/pull/1758#discussion_r496005660



##
File path: 
solr/core/src/java/org/apache/solr/cluster/events/impl/CollectionsRepairEventListener.java
##
@@ -0,0 +1,186 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.cluster.events.impl;
+
+import java.io.IOException;
+import java.lang.invoke.MethodHandles;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.atomic.AtomicInteger;
+
+import org.apache.solr.client.solrj.SolrClient;
+import org.apache.solr.client.solrj.cloud.SolrCloudManager;
+import org.apache.solr.client.solrj.request.CollectionAdminRequest;
+import org.apache.solr.cloud.api.collections.Assign;
+import org.apache.solr.cluster.events.ClusterEvent;
+import org.apache.solr.cluster.events.ClusterEventListener;
+import org.apache.solr.cluster.events.NodesDownEvent;
+import org.apache.solr.cluster.events.ReplicasDownEvent;
+import org.apache.solr.common.cloud.ClusterState;
+import org.apache.solr.common.cloud.Replica;
+import org.apache.solr.common.cloud.ReplicaPosition;
+import org.apache.solr.core.CoreContainer;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * This is an illustration of how to re-implement the combination of the 8x
+ * NodeLostTrigger and AutoAddReplicasPlanAction to maintain the collection's 
replication factor.
+ * NOTE: there's no support for 'waitFor' yet.
+ * NOTE 2: this functionality would probably be more reliable when also executed 
as a
+ * periodically scheduled check - both as a reactive (listener) and proactive 
(scheduled) measure.
+ */
+public class CollectionsRepairEventListener implements ClusterEventListener {
+  private static final Logger log = 
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+  public static final String PLUGIN_NAME = "collectionsRepairListener";
+  private static final String ASYNC_ID_PREFIX = "_async_" + PLUGIN_NAME;
+  private static final AtomicInteger counter = new AtomicInteger();
+
+  private final SolrClient solrClient;
+  private final SolrCloudManager solrCloudManager;
+
+  private boolean running = false;

Review comment:
   Likely needs to be volatile (or get proper synchronization).
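
   For completeness, the minimal versions of that suggestion (a sketch, not 
from the PR):

{code:java}
// Simplest: volatile guarantees cross-thread visibility of start/stop state.
private volatile boolean running = false;

// Alternative, if atomic check-and-set semantics are ever needed on start/stop:
private final java.util.concurrent.atomic.AtomicBoolean runningFlag =
    new java.util.concurrent.atomic.AtomicBoolean(false);
{code}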





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document

2020-09-28 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203282#comment-17203282
 ] 

ASF subversion and git services commented on LUCENE-9444:
-

Commit 24aadc220ba9578f581637b9fd0e7e973d46426c in lucene-solr's branch 
refs/heads/master from goankur
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=24aadc2 ]

LUCENE-9444: add utility class to retrieve facet labels from the taxonomy index 
for a facet field (#1893)

LUCENE-9444: add utility class to retrieve facet labels from the taxonomy index 
for a facet field so such fields do not also have to be redundantly stored in 
the index.

Co-authored-by: Ankur Goel 

> Need an API to easily fetch facet labels for a field in a document
> --
>
> Key: LUCENE-9444
> URL: https://issues.apache.org/jira/browse/LUCENE-9444
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: 8.6
>Reporter: Ankur
>Priority: Major
>  Labels: facet
> Attachments: LUCENE-9444.patch, LUCENE-9444.patch, 
> LUCENE-9444.v2.patch
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> A facet field may be included in the list of fields whose values are to be 
> returned for each hit.
> In order to get the facet labels for each hit we need to
>  # Create an instance of _DocValuesOrdinalsReader_ and invoke 
> _getReader(LeafReaderContext context)_ method to obtain an instance of 
> _OrdinalsSegmentReader()_
>  # _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then 
> used to fetch and decode the binary payload in the document's BinaryDocValues 
> field. This provides the ordinals that refer to facet labels in the 
> taxonomy.
>  # Lastly TaxonomyReader.getPath(ord) is used to fetch the labels to be 
> returned.
>  
> Ideally there should be a simple API - *String[] getLabels(docId)* that hides 
> all the above details and gives us the string labels. This can be part of 
> *TaxonomyFacets* but that's just one idea.
> I am opening this issue to get community feedback and suggestions.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] munendrasn commented on pull request #1775: SOLR-14767 : fix long field parsing from string

2020-09-28 Thread GitBox


munendrasn commented on pull request #1775:
URL: https://github.com/apache/lucene-solr/pull/1775#issuecomment-700073981


   I have included changes for TrieFields and modified Int fields to parse as 
Float. Probably we could call toNativeType(val) from createField to avoid 
duplication, but I will defer that change to the future (unless someone else 
prefers that approach too).
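
For reference, a sketch of that possible refactoring (a hypothetical override, 
not part of this PR; {{toNativeType}} is the existing FieldType hook that 
normalizes external values):

{code:java}
// Hypothetical sketch: funnel the external value through toNativeType once,
// so createField and the update path share one parsing implementation.
@Override
public IndexableField createField(SchemaField field, Object value) {
  return super.createField(field, toNativeType(value)); // e.g. "42.0" -> 42 for an int field
}
{code}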



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14850) ExactStatsCache NullPointerException when shards.tolerant=true

2020-09-28 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203348#comment-17203348
 ] 

ASF subversion and git services commented on SOLR-14850:


Commit 0cb8b1b15d0b1937afab3b6885348e712e0d5f84 in lucene-solr's branch 
refs/heads/branch_8x from Andrzej Bialecki
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=0cb8b1b ]

SOLR-14850: Fix ExactStatsCache NullPointerException when shards.tolerant=true.


> ExactStatsCache NullPointerException when shards.tolerant=true
> --
>
> Key: SOLR-14850
> URL: https://issues.apache.org/jira/browse/SOLR-14850
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: search
>Affects Versions: 8.6.2
>Reporter: Eugene Tenkaev
>Assignee: Andrzej Bialecki
>Priority: Critical
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> All derived classes from *ExactStatsCache* fails if *shards.tolerant* set to 
> *true* and some shard is down.
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.solr.client.solrj.SolrResponse.getException(SolrResponse.java:59)
>   at 
> org.apache.solr.search.stats.ExactStatsCache.doMergeToGlobalStats(ExactStatsCache.java:104)
>   at 
> org.apache.solr.search.stats.StatsCache.mergeToGlobalStats(StatsCache.java:173)
>   at 
> org.apache.solr.handler.component.QueryComponent.updateStats(QueryComponent.java:713)
>   at 
> org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:630)
>   at 
> org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:605)
>   at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:457)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:214)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2606)
>   at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:812)
>   at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:588)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:415)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>   at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1610)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1300)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1580)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1215)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>   at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)
>   at 
> org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)
>   at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
>   at 
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
>   at org.eclipse.jetty.server.Server.handle(Server.java:500)
>   at 
> org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)
>   at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)
>   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)
>   at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:273)
>   at 
> org.eclipse.jetty.io.AbstractConnection$

[jira] [Commented] (SOLR-14788) Solr: The Next Big Thing

2020-09-28 Thread Mark Robert Miller (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203352#comment-17203352
 ] 

Mark Robert Miller commented on SOLR-14788:
---

Shortly, I'll start publishing regular benchmarks (for a limited time) to 
[https://people.apache.org/~markrmiller/bench/]

There is some early data there that will be reset, and I'll add some more 
charts this week. This will help keep an eye on production while we move 
towards a first milestone.

Initially the focus is just to track changes in this branch's performance 
using micro benchmarks. Eventually, I'll look at publishing master comparisons 
as well.

The 'core' benchmarks will be the standard benchmarks, run more frequently, 
and the 'study' benchmarks are more esoteric, deep-dive, or experimental 
benchmarks that are run less frequently.

> Solr: The Next Big Thing
> 
>
> Key: SOLR-14788
> URL: https://issues.apache.org/jira/browse/SOLR-14788
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Mark Robert Miller
>Priority: Critical
>
> h3. 
> [!https://www.unicode.org/consortium/aacimg/1F46E.png!|https://www.unicode.org/consortium/adopted-characters.html#b1F46E]{color:#00875a}*The
>  Policeman is on duty!*{color}
> {quote}_{color:#de350b}*When The Policeman is on duty, sit back, relax, and 
> have some fun. Try to make some progress. Don't stress too much about the 
> impact of your changes or maintaining stability and performance and 
> correctness so much. Until the end of phase 1, I've got your back. I have a 
> variety of tools and contraptions I have been building over the years and I 
> will continue training them on this branch. I will review your changes and 
> peer out across the land and course correct where needed. As Mike D will be 
> thinking, "Sounds like a bottleneck Mark." And indeed it will be to some 
> extent. Which is why once stage one is completed, I will flip The Policeman 
> to off duty. When off duty, I'm always* {color:#de350b}*occasionally*{color} 
> *down for some vigilante justice, but I won't be walking the beat, all that 
> stuff about sit back and relax goes out the window.*{color}_
> {quote}
>  
> I have stolen this title from Ishan or Noble and Ishan.
> This issue is meant to capture the work of a small team that is forming to 
> push Solr and SolrCloud to the next phase.
> I have kicked off the work with an effort to create a very fast and solid 
> base. That work is not 100% done, but it's ready to join the fight.
> Tim Potter has started giving me a tremendous hand in finishing up. Ishan and 
> Noble have already contributed support and testing and have plans for 
> additional work to shore up some of our current shortcomings.
> Others have expressed an interest in helping and hopefully they will pop up 
> here as well.
> Let's organize and discuss our efforts here and in various sub issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-14897) HttpSolrCall will forward a virtually unlimited number of times until ClusterState ZkWatcher is updated after collection delete

2020-09-28 Thread Chris M. Hostetter (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris M. Hostetter updated SOLR-14897:
--
Fix Version/s: 8.6.3
 Priority: Blocker  (was: Major)

[~munendrasn] - +1 to committing your patch.

bq. ... I'm not able to figure out a way to add a test for this change, any 
help would be appreciated

I'm not sure we have any good template/plumbing/helpers for testing this kind 
of situation ... I have some thoughts on how we might go about it (from an 
offline idea proposed by AB) that I'll put into a new jira, but I don't think 
building new test scaffolding for situations like this should slow us down in 
trying to fix this really heinous bug ASAP.
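
To make the loop concrete, a hedged sketch of the guard the patch points at: 
consult the {{_forwardedCount}} param on every proxied request, not only when 
the local DocCollection reports zero active replicas. Names mirror the snippet 
quoted in the issue description below; the logic is an illustration, not the 
committed patch:

{code:java}
// Sketch: bail out early based on the forward counter, before trusting the
// possibly-stale local ClusterState to pick a coreUrl.
int forwarded = queryParams.getInt(INTERNAL_REQUEST_COUNT, 0);
if (forwarded > totalReplicas) {
  throw new SolrException(SolrException.ErrorCode.INVALID_STATE,
      "No active replicas found for collection: " + collectionName);
}
String coreUrl = getCoreUrl(collectionName, origCorename, clusterState,
    activeSlices, byCoreName, true);
{code}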



> HttpSolrCall will forward a virtually unlimited number of times until 
> ClusterState ZkWatcher is updated after collection delete
> ---
>
> Key: SOLR-14897
> URL: https://issues.apache.org/jira/browse/SOLR-14897
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Priority: Blocker
> Fix For: 8.6.3
>
> Attachments: SOLR-14897.patch
>
>
> While investigating the root cause of some SOLR-14896 related failures, I 
> have seen evidence that if a collection is deleted, but a client makes a 
> subsequent request for that collection _before_ the local ClusterState has 
> been updated to remove that DocCollection, HttpSolrCall will forward/proxy 
> that request a (virtually) unbounded number of times in a very short time 
> period - stopping only once the "cached" local DocCollection is updated 
> to indicate there are no active replicas.
> While HttpSolrCall does track & increment a {{_forwardedCount}} param on 
> every request it forwards, it doesn't consult that param unless/until it 
> finds a situation where the (local) DocCollection says there are no active 
> replicas.
> So if you have a collection XX with 4 total replicas on 4 diff nodes 
> (A,B,C,D), and you delete XX (triggering sequential core deletions on 
> A,B,C,D that fire successive ZkWatchers on various nodes to update the 
> collection state) a request for XX can bounce back and forth between nodes C 
> & D 20+ times until the ClusterState watcher fires on both of those nodes so 
> they finally realize that the {{_forwardedCount=20}} is more than the 0 
> active replicas...
> In the below code snippet from HttpSolrCall, the first call to 
> {{getCoreUrl(...)}} is expected to return null if there are no active 
> replicas - but it uses the local cached DocCollection, which may _think_ 
> there is an active replica on another node, so it forwards the request to 
> that node - where the replica may have been deleted, so that node runs the 
> same code and may forward the request right back to the original node
> {code:java}
> String coreUrl = getCoreUrl(collectionName, origCorename, clusterState,
> activeSlices, byCoreName, true);
> // Avoid getting into a recursive loop of requests being forwarded by
> // stopping forwarding and erroring out after (totalReplicas) forwards
> if (coreUrl == null) {
>   if (queryParams.getInt(INTERNAL_REQUEST_COUNT, 0) > totalReplicas){
> throw new SolrException(SolrException.ErrorCode.INVALID_STATE,
> "No active replicas found for collection: " + collectionName);
>   }
>   coreUrl = getCoreUrl(collectionName, origCorename, clusterState,
>   activeSlices, byCoreName, false);
> }
> {code}
> ...the check that is supposed to prevent a "recursive loop" is only consulted 
> once a situation arises where the local ClusterState indicates there are no 
> active replicas - which seems to defeat the point of the forward check?  (at 
> which point, if the total number of replicas hasn't been exceeded, the code 
> is happy to forward the request to a coreUrl which the local ClusterState 
> indicates is _not_ active, which also seems to defeat the point?)
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document

2020-09-28 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203366#comment-17203366
 ] 

ASF subversion and git services commented on LUCENE-9444:
-

Commit 98a49ed18d9f258823d3395b3c84c893c35d818b in lucene-solr's branch 
refs/heads/master from Michael McCandless
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=98a49ed ]

LUCENE-9444: add CHANGES.txt entry


> Need an API to easily fetch facet labels for a field in a document
> --
>
> Key: LUCENE-9444
> URL: https://issues.apache.org/jira/browse/LUCENE-9444
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: 8.6
>Reporter: Ankur
>Priority: Major
>  Labels: facet
> Attachments: LUCENE-9444.patch, LUCENE-9444.patch, 
> LUCENE-9444.v2.patch
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> A facet field may be included in the list of fields whose values are to be 
> returned for each hit.
> In order to get the facet labels for each hit we need to
>  # Create an instance of _DocValuesOrdinalsReader_ and invoke 
> _getReader(LeafReaderContext context)_ method to obtain an instance of 
> _OrdinalsSegmentReader()_
>  # _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then 
> used to fetch and decode the binary payload in the document's BinaryDocValues 
> field. This provides the ordinals that refer to facet labels in the 
> taxonomy.
>  # Lastly TaxonomyReader.getPath(ord) is used to fetch the labels to be 
> returned.
>  
> Ideally there should be a simple API - *String[] getLabels(docId)* that hides 
> all the above details and gives us the string labels. This can be part of 
> *TaxonomyFacets* but that's just one idea.
> I am opening this issue to get community feedback and suggestions.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document

2020-09-28 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203369#comment-17203369
 ] 

ASF subversion and git services commented on LUCENE-9444:
-

Commit cc31f167519920f85a1ebcc6fc494f2823ab3a06 in lucene-solr's branch 
refs/heads/branch_8x from Michael McCandless
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=cc31f16 ]

LUCENE-9444: add CHANGES.txt entry


> Need an API to easily fetch facet labels for a field in a document
> --
>
> Key: LUCENE-9444
> URL: https://issues.apache.org/jira/browse/LUCENE-9444
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: 8.6
>Reporter: Ankur
>Priority: Major
>  Labels: facet
> Attachments: LUCENE-9444.patch, LUCENE-9444.patch, 
> LUCENE-9444.v2.patch
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> A facet field may be included in the list of fields whose values are to be 
> returned for each hit.
> In order to get the facet labels for each hit we need to
>  # Create an instance of _DocValuesOrdinalsReader_ and invoke 
> _getReader(LeafReaderContext context)_ method to obtain an instance of 
> _OrdinalsSegmentReader()_
>  # _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then 
> used to fetch and decode the binary payload in the document's BinaryDocValues 
> field. This provides the ordinals that refer to facet labels in the 
> taxonomy.
>  # Lastly TaxonomyReader.getPath(ord) is used to fetch the labels to be 
> returned.
>  
> Ideally there should be a simple API - *String[] getLabels(docId)* that hides 
> all the above details and gives us the string labels. This can be part of 
> *TaxonomyFacets* but that's just one idea.
> I am opening this issue to get community feedback and suggestions.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document

2020-09-28 Thread Michael McCandless (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-9444:
---
Fix Version/s: 8.7
   master (9.0)
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Thanks [~goankur]!  Lucene's facet module is a bit easier to use now :)

> Need an API to easily fetch facet labels for a field in a document
> --
>
> Key: LUCENE-9444
> URL: https://issues.apache.org/jira/browse/LUCENE-9444
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: 8.6
>Reporter: Ankur
>Priority: Major
>  Labels: facet
> Fix For: master (9.0), 8.7
>
> Attachments: LUCENE-9444.patch, LUCENE-9444.patch, 
> LUCENE-9444.v2.patch
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> A facet field may be included in the list of fields whose values are to be 
> returned for each hit.
> In order to get the facet labels for each hit we need to
>  # Create an instance of _DocValuesOrdinalsReader_ and invoke 
> _getReader(LeafReaderContext context)_ method to obtain an instance of 
> _OrdinalsSegmentReader()_
>  # _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then 
> used to fetch and decode the binary payload in the document's BinaryDocValues 
> field. This provides the ordinals that refer to facet labels in the 
> taxonomy.
>  # Lastly TaxonomyReader.getPath(ord) is used to fetch the labels to be 
> returned.
>  
> Ideally there should be a simple API - *String[] getLabels(docId)* that hides 
> all the above details and gives us the string labels. This can be part of 
> *TaxonomyFacets* but that's just one idea.
> I am opening this issue to get community feedback and suggestions.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document

2020-09-28 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203368#comment-17203368
 ] 

ASF subversion and git services commented on LUCENE-9444:
-

Commit d71923065dba0b6fe275cb60bd5cffb017509827 in lucene-solr's branch 
refs/heads/branch_8x from goankur
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d719230 ]

LUCENE-9444: add utility class to retrieve facet labels from the taxonomy index 
for a facet field (#1893)

LUCENE-9444: add utility class to retrieve facet labels from the taxonomy index 
for a facet field so such fields do not also have to be redundantly stored in 
the index.

Co-authored-by: Ankur Goel 

> Need an API to easily fetch facet labels for a field in a document
> --
>
> Key: LUCENE-9444
> URL: https://issues.apache.org/jira/browse/LUCENE-9444
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: 8.6
>Reporter: Ankur
>Priority: Major
>  Labels: facet
> Attachments: LUCENE-9444.patch, LUCENE-9444.patch, 
> LUCENE-9444.v2.patch
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> A facet field may be included in the list of fields whose values are to be 
> returned for each hit.
> In order to get the facet labels for each hit we need to
>  # Create an instance of _DocValuesOrdinalsReader_ and invoke 
> _getReader(LeafReaderContext context)_ method to obtain an instance of 
> _OrdinalsSegmentReader()_
>  # _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then 
> used to fetch and decode the binary payload in the document's BinaryDocValues 
> field. This provides the ordinals that refer to facet labels in the 
> taxonomy.
>  # Lastly TaxonomyReader.getPath(ord) is used to fetch the labels to be 
> returned.
>  
> Ideally there should be a simple API - *String[] getLabels(docId)* that hides 
> all the above details and gives us the string labels. This can be part of 
> *TaxonomyFacets* but that's just one idea.
> I am opening this issue to get community feedback and suggestions.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document

2020-09-28 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203367#comment-17203367
 ] 

ASF subversion and git services commented on LUCENE-9444:
-

Commit d71923065dba0b6fe275cb60bd5cffb017509827 in lucene-solr's branch 
refs/heads/branch_8x from goankur
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d719230 ]

LUCENE-9444: add utility class to retrieve facet labels from the taxonomy index 
for a facet field (#1893)

LUCENE-9444: add utility class to retrieve facet labels from the taxonomy index 
for a facet field so such fields do not also have to be redundantly stored in 
the index.

Co-authored-by: Ankur Goel 

> Need an API to easily fetch facet labels for a field in a document
> --
>
> Key: LUCENE-9444
> URL: https://issues.apache.org/jira/browse/LUCENE-9444
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: 8.6
>Reporter: Ankur
>Priority: Major
>  Labels: facet
> Attachments: LUCENE-9444.patch, LUCENE-9444.patch, 
> LUCENE-9444.v2.patch
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> A facet field may be included in the list of fields whose values are to be 
> returned for each hit.
> In order to get the facet labels for each hit we need to
>  # Create an instance of _DocValuesOrdinalsReader_ and invoke 
> _getReader(LeafReaderContext context)_ method to obtain an instance of 
> _OrdinalsSegmentReader()_
>  # _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then 
> used to fetch and decode the binary payload in the document's BinaryDocValues 
> field. This provides the ordinals that refer to facet labels in the 
> taxonomy.
>  # Lastly TaxonomyReader.getPath(ord) is used to fetch the labels to be 
> returned.
>  
> Ideally there should be a simple API - *String[] getLabels(docId)* that hides 
> all the above details and gives us the string labels. This can be part of 
> *TaxonomyFacets* but that's just one idea.
> I am opening this issue to get community feedback and suggestions.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Reopened] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document

2020-09-28 Thread Michael McCandless (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reopened LUCENE-9444:


Woops, {{Set.of}} is only available in JDK 9+ ... I'll fix on our 8.x branch.
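
For the 8.x backport, a minimal JDK 8-compatible replacement sketch for 
{{Set.of(...)}} (the actual backport commit may differ):

{code:java}
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// JDK 8 equivalent of Set.of("a", "b"): an unmodifiable set built eagerly.
Set<String> fields = Collections.unmodifiableSet(new HashSet<>(Arrays.asList("a", "b")));
{code}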

> Need an API to easily fetch facet labels for a field in a document
> --
>
> Key: LUCENE-9444
> URL: https://issues.apache.org/jira/browse/LUCENE-9444
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: 8.6
>Reporter: Ankur
>Priority: Major
>  Labels: facet
> Fix For: master (9.0), 8.7
>
> Attachments: LUCENE-9444.patch, LUCENE-9444.patch, 
> LUCENE-9444.v2.patch
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> A facet field may be included in the list of fields whose values are to be 
> returned for each hit.
> In order to get the facet labels for each hit we need to
>  # Create an instance of _DocValuesOrdinalsReader_ and invoke 
> _getReader(LeafReaderContext context)_ method to obtain an instance of 
> _OrdinalsSegmentReader()_
>  # _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then 
> used to fetch and decode the binary payload in the document's BinaryDocValues 
> field. This provides the ordinals that refer to facet labels in the 
> taxonomy.
>  # Lastly TaxonomyReader.getPath(ord) is used to fetch the labels to be 
> returned.
>  
> Ideally there should be a simple API - *String[] getLabels(docId)* that hides 
> all the above details and gives us the string labels. This can be part of 
> *TaxonomyFacets* but that's just one idea.
> I am opening this issue to get community feedback and suggestions.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Assigned] (SOLR-14898) Proxied/Forwarded requests to other nodes wind up getting duplicate response headers

2020-09-28 Thread Chris M. Hostetter (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris M. Hostetter reassigned SOLR-14898:
-

Affects Version/s: 8.6.3
 Assignee: Chris M. Hostetter
 Priority: Blocker  (was: Major)

[~munendrasn] - I think you're right, in the case of HttpSolrCall.remoteQuery 
we should definitely "set" the response headers to match the remote node.  

Ideally, as a "proxy", we shouldn't be "adding" anything to the response 
that didn't come from the proxied node, so we could/should in theory not let 
RewriteHandler (on nodeA) add those headers when nodeA proxies to nodeB 
(leave it to nodeB to set them), but in practice, since we aren't a 
generalized proxy but a Solr node handing off to another Solr node, it should 
be fine to leave that as is.

I'm going to work up a patch with a fix and (hopefully) a simple cloud test - 
I'd like to get a fix into 8.6.3 for this given how it contributes to causing 
SOLR-14896.
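
For illustration, a hedged sketch of the "set, not add" idea: when 
HttpSolrCall proxies a request, copy each header from the remote node's 
response with HttpServletResponse#setHeader, which replaces any value the 
local filter chain already added (addHeader would append a duplicate). The 
surrounding variable names here are assumptions, not the actual patch:

{code:java}
// Sketch: mirror the remote node's headers onto the client-facing response.
// Assumed in scope: Map<String, String> remoteHeaders (from the proxied
// response) and javax.servlet.http.HttpServletResponse servletResponse.
for (Map.Entry<String, String> header : remoteHeaders.entrySet()) {
  servletResponse.setHeader(header.getKey(), header.getValue()); // not addHeader
}
{code}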

> Proxied/Forwarded requests to other nodes wind up getting duplicate response 
> headers
> 
>
> Key: SOLR-14898
> URL: https://issues.apache.org/jira/browse/SOLR-14898
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.6.3
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Blocker
>
> When Solr receives a request for a collection not hosted on the current node, 
> HttpSolrCall forwards/proxies that request - but the final response for the 
> client can include duplicate response headers - one header from the remote 
> node that ultimately handled the request, and a second copy of the header 
> added by the current node...
> {noformat}
> # create a simple 2 node cluster...
> $ ./bin/solr -e cloud -noprompt
> # ...
> $ curl 
> 'http://localhost:8983/solr/admin/collections?action=CREATE&name=solo&numShards=1&nrtReplicas=1'
> # ...
> # node 8983 is the node currently hosting the only replica of the 'solo' 
> collection, and responds to requests directly...
> #
> $ curl -S -s -D - -o /dev/null http://localhost:8983/solr/solo/select
> HTTP/1.1 200 OK
> Content-Security-Policy: default-src 'none'; base-uri 'none'; connect-src 
> 'self'; form-action 'self'; font-src 'self'; frame-ancestors 'none'; img-src 
> 'self'; media-src 'self'; style-src 'self' 'unsafe-inline'; script-src 
> 'self'; worker-src 'self';
> X-Content-Type-Options: nosniff
> X-Frame-Options: SAMEORIGIN
> X-XSS-Protection: 1; mode=block
> Content-Type: application/json;charset=utf-8
> Content-Length: 169
> # node 7574 does not host a replica, and forwards requests for it to 8983
> # the response the client gets from 7574 has several security related headers 
> duplicated...
> #
> $ curl -S -s -D - -o /dev/null http://localhost:7574/solr/solo/select
> HTTP/1.1 200 OK
> Content-Security-Policy: default-src 'none'; base-uri 'none'; connect-src 
> 'self'; form-action 'self'; font-src 'self'; frame-ancestors 'none'; img-src 
> 'self'; media-src 'self'; style-src 'self' 'unsafe-inline'; script-src 
> 'self'; worker-src 'self';
> X-Content-Type-Options: nosniff
> X-Frame-Options: SAMEORIGIN
> X-XSS-Protection: 1; mode=block
> Content-Security-Policy: default-src 'none'; base-uri 'none'; connect-src 
> 'self'; form-action 'self'; font-src 'self'; frame-ancestors 'none'; img-src 
> 'self'; media-src 'self'; style-src 'self' 'unsafe-inline'; script-src 
> 'self'; worker-src 'self';
> X-Content-Type-Options: nosniff
> X-Frame-Options: SAMEORIGIN
> X-XSS-Protection: 1; mode=block
> Content-Type: application/json;charset=utf-8
> Content-Length: 197
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document

2020-09-28 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203405#comment-17203405
 ] 

ASF subversion and git services commented on LUCENE-9444:
-

Commit acce3c15b4a1dcdb69301d06e4ab9bda3d461d58 in lucene-solr's branch 
refs/heads/branch_8x from Michael McCandless
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=acce3c1 ]

LUCENE-9444: woops, cannot use Set.of until JDK 9


> Need an API to easily fetch facet labels for a field in a document
> --
>
> Key: LUCENE-9444
> URL: https://issues.apache.org/jira/browse/LUCENE-9444
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: 8.6
>Reporter: Ankur
>Priority: Major
>  Labels: facet
> Fix For: master (9.0), 8.7
>
> Attachments: LUCENE-9444.patch, LUCENE-9444.patch, 
> LUCENE-9444.v2.patch
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> A facet field may be included in the list of fields whose values are to be 
> returned for each hit.
> In order to get the facet labels for each hit we need to
>  # Create an instance of _DocValuesOrdinalsReader_ and invoke 
> _getReader(LeafReaderContext context)_ method to obtain an instance of 
> _OrdinalsSegmentReader()_
>  # _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then 
> used to fetch and decode the binary payload in the document's BinaryDocValues 
> field. This provides the ordinals that refer to facet labels in the 
> taxonomy.
>  # Lastly TaxonomyReader.getPath(ord) is used to fetch the labels to be 
> returned.
>  
> Ideally there should be a simple API - *String[] getLabels(docId)* that hides 
> all the above details and gives us the string labels. This can be part of 
> *TaxonomyFacets* but that's just one idea.
> I am opening this issue to get community feedback and suggestions.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document

2020-09-28 Thread Michael McCandless (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-9444.

Resolution: Fixed

OK, fixed!

> Need an API to easily fetch facet labels for a field in a document
> --
>
> Key: LUCENE-9444
> URL: https://issues.apache.org/jira/browse/LUCENE-9444
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: 8.6
>Reporter: Ankur
>Priority: Major
>  Labels: facet
> Fix For: master (9.0), 8.7
>
> Attachments: LUCENE-9444.patch, LUCENE-9444.patch, 
> LUCENE-9444.v2.patch
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> A facet field may be included in the list of fields whose values are to be 
> returned for each hit.
> In order to get the facet labels for each hit we need to
>  # Create an instance of _DocValuesOrdinalsReader_ and invoke 
> _getReader(LeafReaderContext context)_ method to obtain an instance of 
> _OrdinalsSegmentReader()_
>  # _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then 
> used to fetch and decode the binary payload in the document's BinaryDocValues 
> field. This provides the ordinals that refer to facet labels in the 
> taxonomy.
>  # Lastly TaxonomyReader.getPath(ord) is used to fetch the labels to be 
> returned.
>  
> Ideally there should be a simple API - *String[] getLabels(docId)* that hides 
> all the above details and gives us the string labels. This can be part of 
> *TaxonomyFacets* but that's just one idea.
> I am opening this issue to get community feedback and suggestions.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (SOLR-14903) SolrCloud tests should use the jetty.xml that we ship with

2020-09-28 Thread Chris M. Hostetter (Jira)
Chris M. Hostetter created SOLR-14903:
-

 Summary: SolrCloud tests should use the jetty.xml that we ship with
 Key: SOLR-14903
 URL: https://issues.apache.org/jira/browse/SOLR-14903
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Chris M. Hostetter


writing a good test for SOLR-14898 turned out to be a challenge because 
SolrCloudTestCase -- and any test using jetty -- doesn't _actually_ use the 
jetty.xml that real Solr nodes use, because JettySolrRunner manually configures 
its jetty "Server" instance.

We should fix this so what you get in a test matches what a real Solr node does.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-14898) Proxied/Forwarded requests to other nodes wind up getting duplicate response headers

2020-09-28 Thread Chris M. Hostetter (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris M. Hostetter updated SOLR-14898:
--
Attachment: SOLR-14898.patch
Status: Open  (was: Open)

writing a junit test proved (virtually) impossible due to SOLR-14903.

Attached patch includes the manually verified fix, as well as the starting 
point of my test, currently marked AwaitsFix waiting on SOLR-14903 (which is a 
big enough ball of wax that I don't think this issue should be held up waiting 
for it).

I'm still running full checks/tests to make sure this doesn't break anything 
in some weird way ... Would appreciate review/eyeballs before committing & 
backporting in the meantime.

> Proxied/Forwarded requests to other nodes wind up getting duplicate response 
> headers
> 
>
> Key: SOLR-14898
> URL: https://issues.apache.org/jira/browse/SOLR-14898
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.6.3
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Blocker
> Attachments: SOLR-14898.patch
>
>
> When Solr receives a request for a collection not hosted on the current node, 
> HttpSolrCall forwards/proxies that request - but the final response for the 
> client can include duplicate response headers - one header from the remote 
> node that ultimately handled the request, and a second copy of the header 
> added by the current node...
> {noformat}
> # create a simple 2 node cluster...
> $ ./bin/solr -e cloud -noprompt
> # ...
> $ curl 
> 'http://localhost:8983/solr/admin/collections?action=CREATE&name=solo&numShards=1&nrtReplicas=1'
> # ...
> # node 8983 is the node currently hosting the only replica of the 'solo' 
> collection, and responds to requests directly...
> #
> $ curl -S -s -D - -o /dev/null http://localhost:8983/solr/solo/select
> HTTP/1.1 200 OK
> Content-Security-Policy: default-src 'none'; base-uri 'none'; connect-src 
> 'self'; form-action 'self'; font-src 'self'; frame-ancestors 'none'; img-src 
> 'self'; media-src 'self'; style-src 'self' 'unsafe-inline'; script-src 
> 'self'; worker-src 'self';
> X-Content-Type-Options: nosniff
> X-Frame-Options: SAMEORIGIN
> X-XSS-Protection: 1; mode=block
> Content-Type: application/json;charset=utf-8
> Content-Length: 169
> # node 7574 does not host a replica, and forwards requests for it to 8983
> # the response the client gets from 7574 has several security related headers 
> duplicated...
> #
> $ curl -S -s -D - -o /dev/null http://localhost:7574/solr/solo/select
> HTTP/1.1 200 OK
> Content-Security-Policy: default-src 'none'; base-uri 'none'; connect-src 
> 'self'; form-action 'self'; font-src 'self'; frame-ancestors 'none'; img-src 
> 'self'; media-src 'self'; style-src 'self' 'unsafe-inline'; script-src 
> 'self'; worker-src 'self';
> X-Content-Type-Options: nosniff
> X-Frame-Options: SAMEORIGIN
> X-XSS-Protection: 1; mode=block
> Content-Security-Policy: default-src 'none'; base-uri 'none'; connect-src 
> 'self'; form-action 'self'; font-src 'self'; frame-ancestors 'none'; img-src 
> 'self'; media-src 'self'; style-src 'self' 'unsafe-inline'; script-src 
> 'self'; worker-src 'self';
> X-Content-Type-Options: nosniff
> X-Frame-Options: SAMEORIGIN
> X-XSS-Protection: 1; mode=block
> Content-Type: application/json;charset=utf-8
> Content-Length: 197
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14151) Make schema components load from packages

2020-09-28 Thread Tomas Eduardo Fernandez Lobbe (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203454#comment-17203454
 ] 

Tomas Eduardo Fernandez Lobbe commented on SOLR-14151:
--

Thanks for fixing the test, Noble.
bq. Regarding the async reload of SolrCore, I think you should just remove that 
method. It's just dead and untested code now.
bq. (y)
You didn’t remove this yet.
{quote}
Commit cc31e23341ba9e4e409c0bc7d0beb434743744e4 in lucene-solr's branch 
refs/heads/master from noblepaul
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=cc31e23 ] 

SOLR-14151: Fixing TestBulkSchemaConcurrent failures
{quote}
This change is not thread safe, but it looks like it should be.


> Make schema components load from packages
> -
>
> Key: SOLR-14151
> URL: https://issues.apache.org/jira/browse/SOLR-14151
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Blocker
>  Labels: packagemanager
> Fix For: 8.7
>
>  Time Spent: 13.5h
>  Remaining Estimate: 0h
>
> Example:
> {code:xml}
> <fieldType name="..." class="...">
>   <analyzer>
>     <tokenizer class="..."/>
>     <filter class="..." generateNumberParts="0" catenateWords="0"
>       catenateNumbers="0" catenateAll="0"/>
>   </analyzer>
> </fieldType>
> {code}
> * When a package is updated, the entire {{IndexSchema}} object is refreshed, 
> but the SolrCore object is not reloaded
> * Any component can be prefixed with the package name
> * The semantics of loading plugins remain the same as that of the components 
> in {{solrconfig.xml}}
> * Plugins can be registered using schema API



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14408) Refactor MoreLikeThisHandler Implementation

2020-09-28 Thread Alessandro Benedetti (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203486#comment-17203486
 ] 

Alessandro Benedetti commented on SOLR-14408:
-

Hi [~Seidan], sorry for the abysmal delay in responding, I just got the chance 
to review your pull request.
It seems ok to me; my only question is about the interesting term class.
Are we sure there isn't anything similar already in the Apache Lucene/Solr 
codebase?
I took a quick look and I wasn't able to find it.

Let me know and we can progress with the merge.

Cheers


> Refactor MoreLikeThisHandler Implementation
> ---
>
> Key: SOLR-14408
> URL: https://issues.apache.org/jira/browse/SOLR-14408
> Project: Solr
>  Issue Type: Improvement
>  Components: MoreLikeThis
>Reporter: Nazerke Seidan
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The main goal of this refactoring is for readability and accessibility of 
> MoreLikeThisHandler class. Current MoreLikeThisHandler class consists of two 
> static subclasses and accessing them later in MoreLikeThisComponent.  I 
> propose to have them as separate public classes. 
> cc: [~abenedetti], as you have had the recent commit for MLT, what do you 
> think about this?  Anyway, the code is ready for review. 
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14850) ExactStatsCache NullPointerException when shards.tolerant=true

2020-09-28 Thread Yevhen Tienkaiev (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203495#comment-17203495
 ] 

Yevhen Tienkaiev commented on SOLR-14850:
-

[~ab] please adjust my full name to `Yevhen Tienkaiev`, thanks

> ExactStatsCache NullPointerException when shards.tolerant=true
> --
>
> Key: SOLR-14850
> URL: https://issues.apache.org/jira/browse/SOLR-14850
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: search
>Affects Versions: 8.6.2
>Reporter: Yevhen Tienkaiev
>Assignee: Andrzej Bialecki
>Priority: Critical
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> All derived classes from *ExactStatsCache* fails if *shards.tolerant* set to 
> *true* and some shard is down.
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.solr.client.solrj.SolrResponse.getException(SolrResponse.java:59)
>   at 
> org.apache.solr.search.stats.ExactStatsCache.doMergeToGlobalStats(ExactStatsCache.java:104)
>   at 
> org.apache.solr.search.stats.StatsCache.mergeToGlobalStats(StatsCache.java:173)
>   at 
> org.apache.solr.handler.component.QueryComponent.updateStats(QueryComponent.java:713)
>   at 
> org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:630)
>   at 
> org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:605)
>   at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:457)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:214)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2606)
>   at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:812)
>   at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:588)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:415)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>   at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1610)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1300)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1580)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1215)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>   at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)
>   at 
> org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)
>   at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
>   at 
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
>   at org.eclipse.jetty.server.Server.handle(Server.java:500)
>   at 
> org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)
>   at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)
>   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)
>   at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:273)
>   at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
>   at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
>   at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
>   at 
> org.ecli

[jira] [Comment Edited] (SOLR-14850) ExactStatsCache NullPointerException when shards.tolerant=true

2020-09-28 Thread Yevhen Tienkaiev (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203495#comment-17203495
 ] 

Yevhen Tienkaiev edited comment on SOLR-14850 at 9/28/20, 8:12 PM:
---

[~ab] please adjust my full name to `Yevhen Tienkaiev`, previously it was a 
little incorrect here, thanks


was (Author: hronom):
[~ab] please adjust my full name to `Yevhen Tienkaiev`, thanks

> ExactStatsCache NullPointerException when shards.tolerant=true
> --
>
> Key: SOLR-14850
> URL: https://issues.apache.org/jira/browse/SOLR-14850
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: search
>Affects Versions: 8.6.2
>Reporter: Yevhen Tienkaiev
>Assignee: Andrzej Bialecki
>Priority: Critical
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> All derived classes from *ExactStatsCache* fails if *shards.tolerant* set to 
> *true* and some shard is down.
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.solr.client.solrj.SolrResponse.getException(SolrResponse.java:59)
>   at 
> org.apache.solr.search.stats.ExactStatsCache.doMergeToGlobalStats(ExactStatsCache.java:104)
>   at 
> org.apache.solr.search.stats.StatsCache.mergeToGlobalStats(StatsCache.java:173)
>   at 
> org.apache.solr.handler.component.QueryComponent.updateStats(QueryComponent.java:713)
>   at 
> org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:630)
>   at 
> org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:605)
>   at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:457)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:214)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2606)
>   at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:812)
>   at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:588)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:415)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>   at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1610)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1300)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1580)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1215)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>   at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)
>   at 
> org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)
>   at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
>   at 
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
>   at org.eclipse.jetty.server.Server.handle(Server.java:500)
>   at 
> org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)
>   at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)
>   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)
>   at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:273)
>   at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:31

[GitHub] [lucene-solr] alessandrobenedetti commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank

2020-09-28 Thread GitBox


alessandrobenedetti commented on a change in pull request #1571:
URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r496225246



##
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/search/LTRQParserPlugin.java
##
@@ -146,93 +149,114 @@ public LTRQParser(String qstr, SolrParams localParams, 
SolrParams params,
 @Override
 public Query parse() throws SyntaxError {
   // ReRanking Model
-  final String modelName = localParams.get(LTRQParserPlugin.MODEL);
-  if ((modelName == null) || modelName.isEmpty()) {
+  final String[] modelNames = 
localParams.getParams(LTRQParserPlugin.MODEL);
+  if ((modelNames == null) || modelNames.length==0 || 
modelNames[0].isEmpty()) {
 throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
 "Must provide model in the request");
   }
-
-  final LTRScoringModel ltrScoringModel = mr.getModel(modelName);
-  if (ltrScoringModel == null) {
-throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
-"cannot find " + LTRQParserPlugin.MODEL + " " + modelName);
-  }
-
-  final String modelFeatureStoreName = 
ltrScoringModel.getFeatureStoreName();
-  final boolean extractFeatures = 
SolrQueryRequestContextUtils.isExtractingFeatures(req);
-  final String fvStoreName = 
SolrQueryRequestContextUtils.getFvStoreName(req);
-  // Check if features are requested and if the model feature store and 
feature-transform feature store are the same
-  final boolean featuresRequestedFromSameStore = 
(modelFeatureStoreName.equals(fvStoreName) || fvStoreName == null) ? 
extractFeatures:false;
-  if (threadManager != null) {
-
threadManager.setExecutor(req.getCore().getCoreContainer().getUpdateShardHandler().getUpdateExecutor());
-  }
-  final LTRScoringQuery scoringQuery = new LTRScoringQuery(ltrScoringModel,
-  extractEFIParams(localParams),
-  featuresRequestedFromSameStore, threadManager);
-
-  // Enable the feature vector caching if we are extracting features, and 
the features
-  // we requested are the same ones we are reranking with
-  if (featuresRequestedFromSameStore) {
-scoringQuery.setFeatureLogger( 
SolrQueryRequestContextUtils.getFeatureLogger(req) );
+ 
+  LTRScoringQuery[] rerankingQueries = new 
LTRScoringQuery[modelNames.length];
+  for (int i = 0; i < modelNames.length; i++) {
+final LTRScoringQuery rerankingQuery;
+if (!ORIGINAL_RANKING.equals(modelNames[i])) {

Review comment:
   Hi @cpoerschke, I think you are right, it is probably better to have a 
separate param to activate the interleaving with the original ranking.
   I have just made the change and updated the tests.
   They are green locally; I'll just wait for them to be green remotely as well.
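
   For illustration only, a hypothetical request shape for the two directions 
discussed in this thread; the dedicated interleaving parameter name below is 
an assumption from this conversation, not the final API:

{noformat}
# Interleave two LTR models (repeated 'model' local param, per this patch):
rq={!ltr model=myModelA model=myModelB reRankDocs=100}
# Interleave one model with the original ranking via a dedicated switch
# (parameter name hypothetical, as discussed above):
rq={!ltr model=myModelA interleaveWithOriginalRanking=true reRankDocs=100}
{noformat}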





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-12987) Log deprecation warnings to separate log file

2020-09-28 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-12987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203524#comment-17203524
 ] 

David Smiley commented on SOLR-12987:
-

First, I want to say that I don't think we should bother logging these 
deprecations to a separate file.  If a user wants to configure log4j2.xml for 
this, they could do so (e.g. via all warnings).  Moreover, there is _already_ 
an easy way to see your warnings (to include deprecations) thanks to the 
logging tab in the Solr admin UI powered by the LoggingHandler and LogWatcher 
(defaulting to WARN level).  Great!

I got started on a trivial deprecation logger and quickly realized what I was 
doing was partially redundant with a cool feature in Solr that already logs 
Deprecated annotations on classes loaded via 
[SolrResourceLoader|https://github.com/apache/lucene-solr/blob/5e617ccc33d91998a992a87ae258de43ef75242e/solr/core/src/java/org/apache/solr/core/SolrResourceLoader.java#L532].
  While cool, it's a problem that +there is no way to control the message that 
is reported+, or to prevent logging this.  Here is what's reported for the DIH:
{noformat}
2020-09-28 20:50:27.899 WARN  (coreLoadExecutor-13-thread-2) [   x:mail] 
o.a.s.c.SolrResourceLoader Solr loaded a deprecated plugin/analysis class 
[solr.DataImportHandler]. Please consult documentation how to replace it 
accordingly.
{noformat}

This is a great example of the problem because I'd rather we not log a warning 
about the DIH being deprecated -- it is being _moved_.  The deprecation notion 
is _sometimes_ only useful for those maintaining Solr itself (us committers).  
Maybe we could just remove the annotation and only use the javadoc 
{{\@deprecated}} tag for plugins we know are _moving_?  CC [~ichattopadhyaya]

Also, once we have a DeprecationLog feature (PR forthcoming), SRL could be 
modified to use it so that it logs once.
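
A minimal log-once sketch along those lines, assuming SLF4J and the log 
category named in the forthcoming PR (org.apache.solr.DEPRECATED); the class 
and method names here are hypothetical, not the actual patch:

{code:java}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public final class DeprecationLog {
  private static final Logger log = LoggerFactory.getLogger("org.apache.solr.DEPRECATED");
  private static final Set<String> logged = ConcurrentHashMap.newKeySet();

  /** Log the message at WARN, but only once per key for the life of the JVM. */
  public static void warnOnce(String key, String message) {
    if (logged.add(key)) {
      log.warn(message);
    }
  }
}
{code}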

> Log deprecation warnings to separate log file
> -
>
> Key: SOLR-12987
> URL: https://issues.apache.org/jira/browse/SOLR-12987
> Project: Solr
>  Issue Type: New Feature
>  Components: logging
>Reporter: Jan Høydahl
>Assignee: David Smiley
>Priority: Major
>
> As discussed in solr-user list:
> {quote}When instructing people in what to do before upgrading to a new 
> version, we often tell them to check for deprecation log messages and fix 
> those before upgrading. Normally you'll see the most important logs as WARN 
> level in the Admin UI log tab just after startup and first use. But I'm 
> wondering if it also makes sense to introduce a separate 
> DeprecationLogger.log(foo) that is configured in log4j2.xml to log to a 
> separate logs/deprecation.log to make it easier to check this from the 
> command line. If the file is non-empty you have work to do :)
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dsmiley opened a new pull request #1927: SOLR-12987: Deprecated plugins are logged once and with log category …

2020-09-28 Thread GitBox


dsmiley opened a new pull request #1927:
URL: https://github.com/apache/lucene-solr/pull/1927


   …org.apache.solr.DEPRECATED
   
   https://issues.apache.org/jira/browse/SOLR-12987



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] alessandrobenedetti commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank

2020-09-28 Thread GitBox


alessandrobenedetti commented on a change in pull request #1571:
URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r496244191



##
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/search/LTRQParserPlugin.java
##
@@ -146,93 +149,114 @@ public LTRQParser(String qstr, SolrParams localParams, 
SolrParams params,
 @Override
 public Query parse() throws SyntaxError {
   // ReRanking Model
-  final String modelName = localParams.get(LTRQParserPlugin.MODEL);
-  if ((modelName == null) || modelName.isEmpty()) {
+  final String[] modelNames = 
localParams.getParams(LTRQParserPlugin.MODEL);
+  if ((modelNames == null) || modelNames.length==0 || 
modelNames[0].isEmpty()) {
 throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
 "Must provide model in the request");
   }
-
-  final LTRScoringModel ltrScoringModel = mr.getModel(modelName);
-  if (ltrScoringModel == null) {
-throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
-"cannot find " + LTRQParserPlugin.MODEL + " " + modelName);
-  }
-
-  final String modelFeatureStoreName = 
ltrScoringModel.getFeatureStoreName();
-  final boolean extractFeatures = 
SolrQueryRequestContextUtils.isExtractingFeatures(req);
-  final String fvStoreName = 
SolrQueryRequestContextUtils.getFvStoreName(req);
-  // Check if features are requested and if the model feature store and 
feature-transform feature store are the same
-  final boolean featuresRequestedFromSameStore = 
(modelFeatureStoreName.equals(fvStoreName) || fvStoreName == null) ? 
extractFeatures:false;
-  if (threadManager != null) {
-
threadManager.setExecutor(req.getCore().getCoreContainer().getUpdateShardHandler().getUpdateExecutor());
-  }
-  final LTRScoringQuery scoringQuery = new LTRScoringQuery(ltrScoringModel,
-  extractEFIParams(localParams),
-  featuresRequestedFromSameStore, threadManager);
-
-  // Enable the feature vector caching if we are extracting features, and 
the features
-  // we requested are the same ones we are reranking with
-  if (featuresRequestedFromSameStore) {
-scoringQuery.setFeatureLogger( 
SolrQueryRequestContextUtils.getFeatureLogger(req) );
+ 
+  LTRScoringQuery[] rerankingQueries = new 
LTRScoringQuery[modelNames.length];
+  for (int i = 0; i < modelNames.length; i++) {
+final LTRScoringQuery rerankingQuery;
+if (!ORIGINAL_RANKING.equals(modelNames[i])) {

Review comment:
   Thinking a little bit more about it: even if we solve the 
"originalRanking" request through a special param, when returning the results 
the [interleaving] transformer will still need to present a value for the 
search results coming from the original ranking:
   [{
       "id":"GB18030TEST",
       "score":1.0005897,
       "[interleaving]":"OriginalRanking"},
     {
       "id":"UTF8TEST",
       "score":0.79656565,
       "[interleaving]":"myModel"}]
   
   This must follow the same format (I wouldn't add a special field name 
different from [interleaving]; that would complicate evaluators' behaviour).
   
   This means that in the end I guess we need to keep a special name to 
indicate the original ranking, and this name can't be used by the admin for 
any other model.
   Given that, do you think it is still beneficial to have a separate 
parameter?
   Or would just using the original approach, with the documented exclusivity 
of a special name, look cleaner?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] muse-dev[bot] commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank

2020-09-28 Thread GitBox


muse-dev[bot] commented on a change in pull request #1571:
URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r496264303



##
File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/LTRRescorer.java
##
@@ -166,64 +186,77 @@ public void scoreFeatures(IndexSearcher indexSearcher, 
TopDocs firstPassTopDocs,
 docBase = readerContext.docBase;
 scorer = modelWeight.scorer(readerContext);
   }
-  // Scorer for a LTRScoringQuery.ModelWeight should never be null since 
we always have to
-  // call score
-  // even if no feature scorers match, since a model might use that info to
-  // return a
-  // non-zero score. Same applies for the case of advancing a 
LTRScoringQuery.ModelWeight.ModelScorer
-  // past the target
-  // doc since the model algorithm still needs to compute a potentially
-  // non-zero score from blank features.
-  assert (scorer != null);
-  final int targetDoc = docID - docBase;
-  scorer.docID();
-  scorer.iterator().advance(targetDoc);
-
-  scorer.getDocInfo().setOriginalDocScore(hit.score);
-  hit.score = scorer.score();
-  if (hitUpto < topN) {
-reranked[hitUpto] = hit;
-// if the heap is not full, maybe I want to log the features for this
-// document
+  scoreSingleHit(indexSearcher, topN, modelWeight, docBase, hitUpto, hit, 
docID, scoringQuery, scorer, reranked);

Review comment:
   *NULL_DEREFERENCE:*  object `scorer` last assigned on line 172 could be 
null and is dereferenced by call to `scoreSingleHit(...)` at line 189.
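
   One cheap way to make the invariant explicit again (mirroring the assert 
this diff removes) would be, for instance:
   ```java
   // Sketch: per the comment removed above, the model scorer is expected to
   // be non-null for every hit; assert it before delegating.
   assert scorer != null : "ModelWeight scorer must not be null";
   ```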

##
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/interleaving/TeamDraftInterleaving.java
##
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.ltr.interleaving;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.Random;
+import java.util.Set;
+
+import org.apache.lucene.search.ScoreDoc;
+
+public class TeamDraftInterleaving implements Interleaving{
+  public static Random RANDOM;
+
+  static {
+// We try to make things reproducible in the context of our tests by 
initializing the random instance
+// based on the current seed
+String seed = System.getProperty("tests.seed");
+if (seed == null) {
+  RANDOM = new Random();

Review comment:
   *PREDICTABLE_RANDOM:*  This random generator (java.util.Random) is 
predictable 
[(details)](https://find-sec-bugs.github.io/bugs.htm#PREDICTABLE_RANDOM)
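
   For context, a common remediation that still keeps tests reproducible is 
falling back to a `SecureRandom` only when no test seed is present; a rough 
sketch, not this PR's code:
   ```java
   import java.security.SecureRandom;
   import java.util.Random;

   final class Randoms {
     // Deterministic under the test runner's "tests.seed" system property;
     // unpredictable otherwise (SecureRandom extends java.util.Random).
     static Random newRandom() {
       String seed = System.getProperty("tests.seed");
       return (seed != null) ? new Random(seed.hashCode()) : new SecureRandom();
     }
   }
   ```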





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] goankur opened a new pull request #1928: LUCENE-9444: Improve test coverage for TaxonomyFacetLabels

2020-09-28 Thread GitBox


goankur opened a new pull request #1928:
URL: https://github.com/apache/lucene-solr/pull/1928


   
   
   
   # Description
   
   This PR improves test coverage for `TaxonomyFacetLabels` (added in [PR 
1893](https://github.com/apache/lucene-solr/pull/1893/files)) by exercising 
the API - `TaxonomyFacetLabels.nextFacetLabel(docId, facetDimension)` - to 
fetch facet labels for a specific dimension.
   
   # Solution
   
   The solution enhances the test method 
`TestTaxonomyFacetCounts.testRandom()` to do the following (see the sketch 
right after this list):
 - Pick a dimension at random.
 - Filter expected facet labels to retain entries for the chosen dimension only.
 - Invoke the API - `TaxonomyFacetLabels.nextFacetLabel(docId, 
facetDimension)` - to get facet labels for the chosen dimension.
 - Ensure that `expected` and `actual` facet labels match.
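
   For reference, here is a minimal, untested sketch of how the 
dimension-specific lookup can be driven; the class, method, and reader names 
and the `"Author"` dimension are illustrative assumptions, not code from this 
PR:
   ```java
   import java.io.IOException;

   import org.apache.lucene.facet.FacetsConfig;
   import org.apache.lucene.facet.taxonomy.FacetLabel;
   import org.apache.lucene.facet.taxonomy.TaxonomyFacetLabels;
   import org.apache.lucene.facet.taxonomy.TaxonomyFacetLabels.FacetLabelReader;
   import org.apache.lucene.facet.taxonomy.TaxonomyReader;
   import org.apache.lucene.index.IndexReader;
   import org.apache.lucene.index.LeafReaderContext;

   class FacetLabelSketch {
     static void printAuthorLabels(IndexReader indexReader,
                                   TaxonomyReader taxoReader) throws IOException {
       TaxonomyFacetLabels facetLabels = new TaxonomyFacetLabels(
           taxoReader, FacetsConfig.DEFAULT_INDEX_FIELD_NAME);
       for (LeafReaderContext ctx : indexReader.leaves()) {
         FacetLabelReader labelReader = facetLabels.getFacetLabelReader(ctx);
         for (int doc = 0; doc < ctx.reader().maxDoc(); doc++) { // ascending ids
           FacetLabel label;
           // Restrict to one dimension; null means no more labels for this doc.
           while ((label = labelReader.nextFacetLabel(doc, "Author")) != null) {
             System.out.println(String.join("/", label.components));
           }
         }
       }
     }
   }
   ```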
   
   # Tests
   
 - `TestTaxonomyFacetCounts.testRandom()` is enhanced to improve test 
coverage for `TaxonomyFacetLabels`. 
 - `FacetTestCase` is refactored to add a `dimension` parameter to utility 
method `getAllTaxonomyFacetLabels(...)`
   
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [x] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `master` branch.
   - [x] I have run `./gradlew check`.
   - [x] I have added tests for my changes.
   - [x] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Reopened] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document

2020-09-28 Thread Ankur (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur reopened LUCENE-9444:
---

Thanks [~mikemccand] for merging the 
[PR-1893.|https://github.com/apache/lucene-solr/pull/1893/files]

I just realized that the changes in {{TestTaxonomyFacetCounts.testRandom()}} 
did not exercise the API to get facet labels for a specific dimension - 
{{TaxonomyFacetLabels.nextFacetLabel(docId, facetDimension)}} - so I made 
changes in [PR-1928.|https://github.com/apache/lucene-solr/pull/1928/files] 
Can you please take a look?

> Need an API to easily fetch facet labels for a field in a document
> --
>
> Key: LUCENE-9444
> URL: https://issues.apache.org/jira/browse/LUCENE-9444
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: 8.6
>Reporter: Ankur
>Priority: Major
>  Labels: facet
> Fix For: master (9.0), 8.7
>
> Attachments: LUCENE-9444.patch, LUCENE-9444.patch, 
> LUCENE-9444.v2.patch
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> A facet field may be included in the list of fields whose values are to be 
> returned for each hit.
> In order to get the facet labels for each hit we need to
>  # Create an instance of _DocValuesOrdinalsReader_ and invoke 
> _getReader(LeafReaderContext context)_ method to obtain an instance of 
> _OrdinalsSegmentReader()_
>  # _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then 
> used to fetch and decode the binary payload in the document's BinaryDocValues 
> field. This provides the ordinals that refer to facet labels in the 
> taxonomy.
>  # Lastly TaxonomyReader.getPath(ord) is used to fetch the labels to be 
> returned.
>  
> Ideally there should be a simple API - *String[] getLabels(docId)* that hides 
> all the above details and gives us the string labels. This can be part of 
> *TaxonomyFacets* but that's just one idea.
> I am opening this issue to get community feedback and suggestions.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document

2020-09-28 Thread Ankur (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203559#comment-17203559
 ] 

Ankur edited comment on LUCENE-9444 at 9/29/20, 12:05 AM:
--

Thanks [~mikemccand] for merging the 
[PR-1893.|https://github.com/apache/lucene-solr/pull/1893/files]

I just realized that the changes in {{TestTaxonomyFacetCounts.testRandom()}} 
did not exercise the API to get facet labels for a specific dimension - 
{{TaxonomyFacetLabels.nextFacetLabel(docId, facetDimension)}} - so I added the 
required changes in 
[PR-1928.|https://github.com/apache/lucene-solr/pull/1928/files]

 

Re-opening the issue so that you can take a look.


was (Author: goankur):
Thanks [~mikemccand] for merging the 
[PR-1893.|https://github.com/apache/lucene-solr/pull/1893/files]

I just realized that the changes in {{TestTaxonomyFacetCounts.testRandom()}} 
did not exercise the API to get facet labels for a specific dimension - 
{{TaxonomyFacetLabels.nextFacetLabel(docId, facetDimension)}} - so I made 
changes in [PR-1928.|https://github.com/apache/lucene-solr/pull/1928/files] 
Can you please take a look?

> Need an API to easily fetch facet labels for a field in a document
> --
>
> Key: LUCENE-9444
> URL: https://issues.apache.org/jira/browse/LUCENE-9444
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: 8.6
>Reporter: Ankur
>Priority: Major
>  Labels: facet
> Fix For: master (9.0), 8.7
>
> Attachments: LUCENE-9444.patch, LUCENE-9444.patch, 
> LUCENE-9444.v2.patch
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> A facet field may be included in the list of fields whose values are to be 
> returned for each hit.
> In order to get the facet labels for each hit we need to
>  # Create an instance of _DocValuesOrdinalsReader_ and invoke 
> _getReader(LeafReaderContext context)_ method to obtain an instance of 
> _OrdinalsSegmentReader()_
>  # _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then 
> used to fetch and decode the binary payload in the document's BinaryDocValues 
> field. This provides the ordinals that refer to facet labels in the 
> taxonomy.
>  # Lastly TaxonomyReader.getPath(ord) is used to fetch the labels to be 
> returned.
>  
> Ideally there should be a simple API - *String[] getLabels(docId)* that hides 
> all the above details and gives us the string labels. This can be part of 
> *TaxonomyFacets* but that's just one idea.
> I am opening this issue to get community feedback and suggestions.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document

2020-09-28 Thread Ankur (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203559#comment-17203559
 ] 

Ankur edited comment on LUCENE-9444 at 9/29/20, 12:06 AM:
--

Thanks [~mikemccand] for merging the 
[PR-1893.|https://github.com/apache/lucene-solr/pull/1893/files]

I just realized that the changes in {{TestTaxonomyFacetCounts.testRandom()}} 
did not exercise the API to get facet labels for a specific dimension - 
{{TaxonomyFacetLabels.nextFacetLabel(docId, facetDimension)}} - so I added the 
required changes in 
[PR-1928.|https://github.com/apache/lucene-solr/pull/1928/files]

 Re-opening the issue so that you can take a look.


was (Author: goankur):
Thanks [~mikemccand] for merging the 
[PR-1893.|https://github.com/apache/lucene-solr/pull/1893/files]

I just realized that the changes in {{TestTaxonomyFacetCounts.testRandom()}} 
did not exercise the API to get facet labels for a specific dimension - 
{{TaxonomyFacetLabels.nextFacetLabel(docId, facetDimension)}} - so I added the 
required changes in 
[PR-1928.|https://github.com/apache/lucene-solr/pull/1928/files]

 

Re-opening the issue so that you can take a look.

> Need an API to easily fetch facet labels for a field in a document
> --
>
> Key: LUCENE-9444
> URL: https://issues.apache.org/jira/browse/LUCENE-9444
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: 8.6
>Reporter: Ankur
>Priority: Major
>  Labels: facet
> Fix For: master (9.0), 8.7
>
> Attachments: LUCENE-9444.patch, LUCENE-9444.patch, 
> LUCENE-9444.v2.patch
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> A facet field may be included in the list of fields whose values are to be 
> returned for each hit.
> In order to get the facet labels for each hit we need to
>  # Create an instance of _DocValuesOrdinalsReader_ and invoke 
> _getReader(LeafReaderContext context)_ method to obtain an instance of 
> _OrdinalsSegmentReader()_
>  # _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then 
> used to fetch and decode the binary payload in the document's BinaryDocValues 
> field. This provides the ordinals that refer to facet labels in the 
> taxonomy.
>  # Lastly TaxonomyReader.getPath(ord) is used to fetch the labels to be 
> returned.
>  
> Ideally there should be a simple API - *String[] getLabels(docId)* that hides 
> all the above details and gives us the string labels. This can be part of 
> *TaxonomyFacets* but that's just one idea.
> I am opening this issue to get community feedback and suggestions.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14151) Make schema components load from packages

2020-09-28 Thread Noble Paul (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203575#comment-17203575
 ] 

Noble Paul commented on SOLR-14151:
---

{quote}This change is not thread safe, looks like it should.{quote}

How is it not threadsafe?

> Make schema components load from packages
> -
>
> Key: SOLR-14151
> URL: https://issues.apache.org/jira/browse/SOLR-14151
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Blocker
>  Labels: packagemanager
> Fix For: 8.7
>
>  Time Spent: 13.5h
>  Remaining Estimate: 0h
>
> Example:
> {code:xml}
>  
> 
>   
>generateNumberParts="0" catenateWords="0"
>   catenateNumbers="0" catenateAll="0"/>
>   
>   
> 
>   
> {code}
> * When a package is updated, the entire {{IndexSchema}} object is refreshed, 
> but the SolrCore object is not reloaded
> * Any component can be prefixed with the package name
> * The semantics of loading plugins remain the same as that of the components 
> in {{solrconfig.xml}}
> * Plugins can be registered using schema API



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dnhatn commented on a change in pull request #1925: Cleanup DWPT state handling

2020-09-28 Thread GitBox


dnhatn commented on a change in pull request #1925:
URL: https://github.com/apache/lucene-solr/pull/1925#discussion_r496348288



##
File path: 
lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThread.java
##
@@ -284,7 +283,7 @@ FrozenBufferedUpdates prepareFlush() {
 
   /** Flush all pending docs to a new segment */
   FlushedSegment flush(DocumentsWriter.FlushNotifications flushNotifications) 
throws IOException {
-assert flushPending.get() == Boolean.TRUE;
+assert state == State.FLUSHING;

Review comment:
   Maybe add the current state to the assertion.
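
   For example, something along these lines (a sketch matching the assertion 
style used elsewhere in this class):
   ```java
   assert state == State.FLUSHING : "expected FLUSHING state but was: " + state;
   ```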

##
File path: 
lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThread.java
##
@@ -169,6 +167,7 @@ long updateDocuments(Iterable> docs
 try {
   testPoint("DocumentsWriterPerThread addDocuments start");
   assert abortingException == null: "DWPT has hit aborting exception but 
is still indexing";
+  assert state == State.ACTIVE || state == State.FLUSH_PENDING : "Illegal 
state: " + state + " must be ACTIVE of FLUSH_PENDING";

Review comment:
   nit: of -> or

##
File path: 
lucene/core/src/java/org/apache/lucene/index/DocumentsWriterFlushControl.java
##
@@ -157,25 +157,18 @@ private boolean updatePeaks(long delta) {
   }
 
   DocumentsWriterPerThread doAfterDocument(DocumentsWriterPerThread perThread, 
boolean isUpdate) {
-final long delta = perThread.getCommitLastBytesUsedDelta();
+final long delta = perThread.commitLastBytesUsed();
 synchronized (this) {
-  // we need to commit this under lock but calculate it outside of the 
lock to minimize the time this lock is held
-  // per document. The reason we update this under lock is that we mark 
DWPTs as pending without acquiring it's
-  // lock in #setFlushPending and this also reads the committed bytes and 
modifies the flush/activeBytes.
-  // In the future we can clean this up to be more intuitive.
-  perThread.commitLastBytesUsed(delta);
   try {
 /*
  * We need to differentiate here if we are pending since 
setFlushPending
  * moves the perThread memory to the flushBytes and we could be set to
  * pending during a delete
  */
-if (perThread.isFlushPending()) {
-  flushBytes += delta;
-  assert updatePeaks(delta);
-} else {
-  activeBytes += delta;
-  assert updatePeaks(delta);
+activeBytes += delta;
+assert updatePeaks(delta);
+if (perThread.isFlushPending() == false) {
+  assert perThread.getState() == DocumentsWriterPerThread.State.ACTIVE 
: "expected ACTIVE state but was: " + perThread.getState();

Review comment:
   Do we still need the `isFlushPending` method? Should we compare the state of 
`perThread` to ACTIVE or FLUSH_PENDING instead?

##
File path: 
lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThread.java
##
@@ -593,9 +568,61 @@ void unlock() {
   }
 
   /**
-   * Returns true iff this DWPT has been flushed
+   * Returns the DWPTs current state.
*/
-  boolean hasFlushed() {
-return hasFlushed.get() == Boolean.TRUE;
+  State getState() {
+return state;
   }
+
+  /**
+   * Transitions the DWPT to the given state of fails if the transition is 
invalid.
+   * @throws IllegalStateException if the given state can not be transitioned 
to.
+   */
+  synchronized void transitionTo(State state) {
+if (state.canTransitionFrom(this.state) == false) {
+  throw new IllegalStateException("Can't transition from " + this.state + 
" to " + state);
+}
+assert state.mustHoldLock == false || isHeldByCurrentThread() : "illegal 
state: " + state + " lock is held: " + isHeldByCurrentThread();
+this.state = state;
+  }
+
+  /**
+   * Internal DWPT State.
+   */
+  enum State {
+/**
+ * Default states when a DWPT is initialized and ready to index documents.
+ */
+ACTIVE(null, true),
+/**
+ * The DWPT can still index documents but should be moved to FLUSHING 
state as soon as possible.
+ * Transitions to this state can be done concurrently while another thread 
is actively indexing into this DWPT.
+ */
+FLUSH_PENDING(ACTIVE, false),
+/**
+ * The DWPT should not receive any further documents and is current 
flushing or queued up for flushing.
+ */
+FLUSHING(FLUSH_PENDING, true),
+/**
+ * The DWPT has been flushed and is ready to be garbage collected.
+ */
+FLUSHED(FLUSHING, false);
+
+private final State previousState;
+final boolean mustHoldLock; // only for asserts

Review comment:
   Can this be private?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1921: SOLR-14829: Improve documentation for Request Handlers in RefGuide and solrconfig.xml

2020-09-28 Thread GitBox


dsmiley commented on a change in pull request #1921:
URL: https://github.com/apache/lucene-solr/pull/1921#discussion_r496366133



##
File path: solr/solr-ref-guide/src/common-query-parameters.adoc
##
@@ -307,11 +307,13 @@ The `echoParams` parameter controls what information 
about request parameters is
 
 The `echoParams` parameter accepts the following values:
 
-* `explicit`: This is the default value. Only parameters included in the 
actual request, plus the `_` parameter (which is a 64-bit numeric timestamp) 
will be added to the `params` section of the response header.
+* `explicit`: Only parameters included in the actual request, plus the `_` 
parameter (which is a 64-bit numeric timestamp) will be added to the `params` 
section of the response header.

Review comment:
   @gerlowskija this reminds me... (off topic for this PR), do you know 
where and why the "_" parameter is added?  If it's for query identification 
purposes then it's duplicative of the "rid" you added.  I'm having trouble 
searching for it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14151) Make schema components load from packages

2020-09-28 Thread Tomas Eduardo Fernandez Lobbe (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203625#comment-17203625
 ] 

Tomas Eduardo Fernandez Lobbe commented on SOLR-14151:
--

bq. How is it not threadsafe?
That {{schema}} field that you added is shared memory accessed by multiple 
threads. You should either make it volatile, use an {{AtomicReference}}, or 
always access it in synchronized blocks.
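
For illustration, the first two options might look like this; a minimal 
sketch, where the holder class and method are assumptions, not the actual 
patch:
{code:java}
import java.util.concurrent.atomic.AtomicReference;

import org.apache.solr.schema.IndexSchema;

class SchemaHolder {
  // Option 1: a volatile field guarantees a reader thread sees the latest
  // reference written by the refreshing thread.
  private volatile IndexSchema schema;

  // Option 2: an AtomicReference, handy if compare-and-set is ever needed.
  private final AtomicReference<IndexSchema> schemaRef = new AtomicReference<>();

  void onPackageUpdate(IndexSchema refreshed) {
    schema = refreshed;       // safely published volatile write
    schemaRef.set(refreshed); // equivalent atomic write
  }
}
{code}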

> Make schema components load from packages
> -
>
> Key: SOLR-14151
> URL: https://issues.apache.org/jira/browse/SOLR-14151
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Blocker
>  Labels: packagemanager
> Fix For: 8.7
>
>  Time Spent: 13.5h
>  Remaining Estimate: 0h
>
> Example:
> {code:xml}
>  
> 
>   
>generateNumberParts="0" catenateWords="0"
>   catenateNumbers="0" catenateAll="0"/>
>   
>   
> 
>   
> {code}
> * When a package is updated, the entire {{IndexSchema}} object is refreshed, 
> but the SolrCore object is not reloaded
> * Any component can be prefixed with the package name
> * The semantics of loading plugins remain the same as that of the components 
> in {{solrconfig.xml}}
> * Plugins can be registered using schema API



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14901) TestPackages uses binary precompiled classes to refer to analysis factory FQCNs

2020-09-28 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203649#comment-17203649
 ] 

ASF subversion and git services commented on SOLR-14901:


Commit 01da67c728df1e46ed82fee1a178546c0665b421 in lucene-solr's branch 
refs/heads/master from Noble Paul
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=01da67c ]

SOLR-14901: TestPackages uses binary precompiled classes to refer to analysis 
factory FQCNs


> TestPackages uses binary precompiled classes to refer to analysis factory 
> FQCNs
> ---
>
> Key: SOLR-14901
> URL: https://issues.apache.org/jira/browse/SOLR-14901
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (9.0)
>Reporter: Tomoko Uchida
>Assignee: Noble Paul
>Priority: Minor
>
> Base analysis factories' package name were renamed in [LUCENE-9317]. 
> {{o.a.s.pkg.TestPackages}} is failing since it has hard coded their old 
> FQCNs, that needs to be fixed.
> See https://github.com/apache/lucene-solr/pull/1836 for details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14901) TestPackages uses binary precompiled classes to refer to analysis factory FQCNs

2020-09-28 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203650#comment-17203650
 ] 

ASF subversion and git services commented on SOLR-14901:


Commit f5219061252912c000f5d119d49bb315e5e6f1ae in lucene-solr's branch 
refs/heads/branch_8x from Noble Paul
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f521906 ]

SOLR-14901: TestPackages uses binary precompiled classes to refer to analysis 
factory FQCNs


> TestPackages uses binary precompiled classes to refer to analysis factory 
> FQCNs
> ---
>
> Key: SOLR-14901
> URL: https://issues.apache.org/jira/browse/SOLR-14901
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (9.0)
>Reporter: Tomoko Uchida
>Assignee: Noble Paul
>Priority: Minor
>
> Base analysis factories' package name were renamed in [LUCENE-9317]. 
> {{o.a.s.pkg.TestPackages}} is failing since it has hard coded their old 
> FQCNs, that needs to be fixed.
> See https://github.com/apache/lucene-solr/pull/1836 for details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org




