[jira] [Commented] (SOLR-14944) solr metrics should remove "spins" references
[ https://issues.apache.org/jira/browse/SOLR-14944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17219616#comment-17219616 ] ASF subversion and git services commented on SOLR-14944: Commit 97551dd644b94390f696c907d94ed602657844db in lucene-solr's branch refs/heads/branch_8x from Andrzej Bialecki [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=97551dd ] SOLR-14944: Fix the description to reflect the fact that this is not removed in 8.7. > solr metrics should remove "spins" references > - > > Key: SOLR-14944 > URL: https://issues.apache.org/jira/browse/SOLR-14944 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: master (9.0) >Reporter: Robert Muir >Assignee: Andrzej Bialecki >Priority: Major > Fix For: master (9.0), 8.7 > > > The lucene internal IOUtils.spins stuff was exposed in various ways here, in > order to not break stuff in LUCENE-9576 I simply wired these apis to > {{false}}, but they should probably be removed.
[jira] [Commented] (SOLR-14944) solr metrics should remove "spins" references
[ https://issues.apache.org/jira/browse/SOLR-14944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17219615#comment-17219615 ] ASF subversion and git services commented on SOLR-14944: Commit b8a3d11c47d22f5b61ceacb6b289ab19ee69dfdc in lucene-solr's branch refs/heads/branch_8_7 from Andrzej Bialecki [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b8a3d11 ] SOLR-14944: Fix the description to reflect the fact that this is not removed in 8.7. > solr metrics should remove "spins" references > - > > Key: SOLR-14944 > URL: https://issues.apache.org/jira/browse/SOLR-14944 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: master (9.0) >Reporter: Robert Muir >Assignee: Andrzej Bialecki >Priority: Major > Fix For: master (9.0), 8.7 > > > The lucene internal IOUtils.spins stuff was exposed in various ways here, in > order to not break stuff in LUCENE-9576 I simply wired these apis to > {{false}}, but they should probably be removed.
[jira] [Commented] (SOLR-13973) Deprecate Tika
[ https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17219642#comment-17219642 ] Jan Høydahl commented on SOLR-13973: I'm aware of that design doc, and my comment was mostly referring to the "slim" distro, where 1st party packages need to be released outside of the solr tarball, but still as an official release from the project. > Deprecate Tika > -- > > Key: SOLR-13973 > URL: https://issues.apache.org/jira/browse/SOLR-13973 > Project: Solr > Issue Type: Improvement >Reporter: Ishan Chattopadhyaya >Priority: Major > Fix For: 8.7 > > Time Spent: 10m > Remaining Estimate: 0h > > Solr's primary responsibility should be to focus on search and scalability. > Having to deal with the problems (CVEs) of Velocity, Tika etc. can slow us > down. I propose that we deprecate them going forward. > Tika can be run outside Solr. Going forward, if someone wants to use these, > it should be possible to bring them into third party packages and install > them via the package manager. > The plan is just to throw warnings in logs and add deprecation notes in the > reference guide for now. Removal can be done in 9.0.
[GitHub] [lucene-solr] dsmiley commented on pull request #1993: .gitignore clean up
dsmiley commented on pull request #1993: URL: https://github.com/apache/lucene-solr/pull/1993#issuecomment-715309491 @msokolov you added `.#*` -- what comment should I use in this file to explain what this is?
[jira] [Commented] (SOLR-14354) HttpShardHandler send requests in async
[ https://issues.apache.org/jira/browse/SOLR-14354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17219649#comment-17219649 ] David Smiley commented on SOLR-14354: - FYI I did last night and +1'ed it. BUILD SUCCESSFUL Total time: 42 minutes 55 seconds

> HttpShardHandler send requests in async
> ---------------------------------------
>
>                 Key: SOLR-14354
>                 URL: https://issues.apache.org/jira/browse/SOLR-14354
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Cao Manh Dat
>            Assignee: Cao Manh Dat
>            Priority: Blocker
>             Fix For: master (9.0), 8.7
>
>      Attachments: image-2020-03-23-10-04-08-399.png, image-2020-03-23-10-09-10-221.png, image-2020-03-23-10-12-00-661.png
>
>       Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> h2. 1. Current approach (problem) of Solr
> Below is a diagram describing how a request is currently handled.
> !image-2020-03-23-10-04-08-399.png!
> The main thread that handles the search request will submit n requests (n equals the number of shards) to an executor, so each request corresponds to a thread. After sending a request, that thread basically does nothing but wait for the response from the other side. That thread will be swapped out and the CPU will try to handle another thread (this is called a context switch: the CPU saves the context of the current thread and switches to another one). When some data (not all) comes back, that thread is woken up to parse it, then it waits until more data comes back. So there will be lots of context switching in the CPU, which is quite an inefficient use of threads. Basically we want fewer threads, and most of them should be busy all the time, because neither threads nor context switches are free. That is the main idea behind things like executors.
> h2. 2. Async call of Jetty HttpClient
> Jetty HttpClient offers an async API like this.
> {code:java}
> httpClient.newRequest("http://domain.com/path")
>     // Add request hooks
>     .onRequestQueued(request -> { ... })
>     .onRequestBegin(request -> { ... })
>     // Add response hooks
>     .onResponseBegin(response -> { ... })
>     .onResponseHeaders(response -> { ... })
>     .onResponseContent((response, buffer) -> { ... })
>     .send(result -> { ... }); {code}
> Therefore after calling {{send()}} the thread returns immediately without blocking. When the client receives the headers from the other side, it calls the {{onResponseHeaders()}} listeners. When the client receives some {{byte[]}} (not the whole response), it calls the {{onResponseContent(buffer)}} listeners. When everything is finished it calls the {{onComplete}} listeners. One main thing to notice here is that all listeners should finish quickly; if a listener blocks, no further data for that request will be handled until the listener finishes.
> h2. 3. Solution 1: Sending requests async but spinning one thread per response
> Jetty HttpClient already provides several listeners, one of which is InputStreamResponseListener. This is how it is used
> {code:java}
> InputStreamResponseListener listener = new InputStreamResponseListener();
> client.newRequest(...).send(listener);
> // Wait for the response headers to arrive
> Response response = listener.get(5, TimeUnit.SECONDS);
> if (response.getStatus() == 200) {
>     // Obtain the input stream on the response content
>     try (InputStream input = listener.getInputStream()) {
>         // Read the response content
>     }
> } {code}
> In this case, there will be two threads:
> * one thread trying to read the response content from the InputStream
> * one thread (a short-lived task) feeding content to the above InputStream whenever some byte[] is available. Note that if this thread is unable to feed data into the InputStream, it will wait.
> By using this, the model of HttpShardHandler can be rewritten into something like this
> {code:java}
> handler.sendReq(req, (is) -> {
>     executor.submit(() ->
>         try (is) {
>             // Read the content from InputStream
>         }
>     )
> }) {code}
> The first diagram then changes into this
> !image-2020-03-23-10-09-10-221.png!
> Notice that although “sending req to shard1” is wide, it won’t take a long time since sending a request is a very quick operation. With this approach, handling threads won’t be spun up until the first bytes are sent back. Notice that in this approach we still have active threads waiting for more data from the InputStream.
> h2. 4. Solution 2: Buffering data and handling it inside Jetty’s thread.
> Jetty has another listener called BufferingResponseListener. This is how it is used
> {code:java}
> client.newRequest(...).send(new BufferingResponseListener() {
>     public void onComplete(Result result) {
>         try {
>             byte[
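The quoted message is cut off above in the middle of the BufferingResponseListener example. For orientation, here is a minimal, self-contained sketch of the pattern it describes, written against Jetty 9's public client API; the class name, the {{sendBuffered}} method, and the parsing placeholder are illustrative, not the actual HttpShardHandler code:

{code:java}
import java.io.ByteArrayInputStream;
import java.io.InputStream;

import org.eclipse.jetty.client.HttpClient;
import org.eclipse.jetty.client.api.Result;
import org.eclipse.jetty.client.util.BufferingResponseListener;

public class BufferedAsyncSketch {
  // Send a request asynchronously; Jetty buffers the whole response body and
  // invokes onComplete() on one of its own worker threads.
  static void sendBuffered(HttpClient client, String url) {
    client.newRequest(url).send(new BufferingResponseListener() {
      @Override
      public void onComplete(Result result) {
        if (result.isSucceeded()) {
          byte[] content = getContent(); // the fully buffered response body
          try (InputStream in = new ByteArrayInputStream(content)) {
            // Parse the shard response here. Keep this quick: it runs on a
            // Jetty worker thread, and blocking it stalls other responses.
          } catch (Exception e) {
            // handle parse failures
          }
        }
      }
    });
  }
}
{code}

The trade-off relative to Solution 1 is memory: the whole response is buffered before parsing begins, but no extra thread sits idle waiting on an InputStream.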
[GitHub] [lucene-solr] dsmiley closed pull request #1109: More pervasive use of PackageLoader / PluginInfo
dsmiley closed pull request #1109: URL: https://github.com/apache/lucene-solr/pull/1109
[GitHub] [lucene-solr] gus-asf commented on a change in pull request #1995: LUCENE-9575 Add PatternTypingFilter to annotate tokens with flags and types
gus-asf commented on a change in pull request #1995: URL: https://github.com/apache/lucene-solr/pull/1995#discussion_r510861146

## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/pattern/PatternTypingFilter.java
## @@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.analysis.pattern;
+
+import java.io.IOException;
+import java.util.LinkedHashMap;
+import java.util.Map;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+import org.apache.lucene.analysis.TokenFilter;
+import org.apache.lucene.analysis.TokenStream;
+import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
+import org.apache.lucene.analysis.tokenattributes.FlagsAttribute;
+import org.apache.lucene.analysis.tokenattributes.TypeAttribute;
+
+/**
+ * Set a type attribute to a parameterized value when tokens are matched by any of several regex patterns. The
+ * value set in the type attribute is parameterized with the match groups of the regex used for matching.
+ * In combination with TypeAsSynonymFilter and DropIfFlagged filter this can supply complex synonym patterns
+ * that are protected from subsequent analysis, and optionally drop the original term based on the flag
+ * set in this filter. See {@link PatternTypingFilterFactory} for full documentation.
+ *
+ * @since 8.8.0
+ * @see PatternTypingFilterFactory
+ */
+public class PatternTypingFilter extends TokenFilter {
+
+  private final Map<Pattern, String> patterns;
+  private final Map<Pattern, Integer> flags;
+  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
+  private final FlagsAttribute flagAtt = addAttribute(FlagsAttribute.class);
+  private final TypeAttribute typeAtt = addAttribute(TypeAttribute.class);
+
+  public PatternTypingFilter(TokenStream input, LinkedHashMap<Pattern, String> patterns, Map<Pattern, Integer> flags) {
+    super(input);
+    this.patterns = patterns;
+    this.flags = flags;
+  }
+
+  @Override
+  public final boolean incrementToken() throws IOException {
+    if (input.incrementToken()) {
+      if (hasAttribute(CharTermAttribute.class)) {
+        String termText = termAtt.toString();
+        for (Map.Entry<Pattern, String> patRep : patterns.entrySet()) {
+          Pattern pattern = patRep.getKey();
+          Matcher matcher = pattern.matcher(termText);
+          String replaced = matcher.replaceFirst(patRep.getValue());
+          // N.B. Does not support producing a synonym identical to the original term.
+          // Avoids having to match() then replace() which performs a second find().
+          if (!replaced.equals(termText)) {

Review comment:
@uschindler any further thoughts? If you agree with the above, I think all would be resolved.
[GitHub] [lucene-solr] shalinmangar commented on pull request #2004: SOLR-14942: Reduce leader election time on node shutdown
shalinmangar commented on pull request #2004: URL: https://github.com/apache/lucene-solr/pull/2004#issuecomment-715322784 @madrob would you like to make a final pass at this PR? I wish to merge to master and backport to 8x today.
[GitHub] [lucene-solr] uschindler commented on a change in pull request #1995: LUCENE-9575 Add PatternTypingFilter to annotate tokens with flags and types
uschindler commented on a change in pull request #1995: URL: https://github.com/apache/lucene-solr/pull/1995#discussion_r510864013

## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/pattern/PatternTypingFilter.java
## @@ -0,0 +1,78 @@
+          if (!replaced.equals(termText)) {

Review comment:
I think that's fine.
[GitHub] [lucene-solr] uschindler commented on a change in pull request #1995: LUCENE-9575 Add PatternTypingFilter to annotate tokens with flags and types
uschindler commented on a change in pull request #1995: URL: https://github.com/apache/lucene-solr/pull/1995#discussion_r510869235

## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/pattern/PatternTypingFilter.java
## @@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.analysis.pattern;
+
+import org.apache.lucene.analysis.TokenFilter;
+import org.apache.lucene.analysis.TokenStream;
+import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
+import org.apache.lucene.analysis.tokenattributes.FlagsAttribute;
+import org.apache.lucene.analysis.tokenattributes.TypeAttribute;
+
+import java.io.IOException;
+import java.util.LinkedHashMap;
+import java.util.Map;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+/**
+ * Set a type attribute to a parameterized value when tokens are matched by any of several regex patterns. The
+ * value set in the type attribute is parameterized with the match groups of the regex used for matching.
+ * In combination with TypeAsSynonymFilter and DropIfFlagged filter this can supply complex synonym patterns
+ * that are protected from subsequent analysis, and optionally drop the original term based on the flag
+ * set in this filter. See {@link PatternTypingFilterFactory} for full documentation.
+ *
+ * @see PatternTypingFilterFactory
+ * @since 8.8.0
+ */
+public class PatternTypingFilter extends TokenFilter {
+
+  private final Map<Pattern, Map.Entry<String, Integer>> replacementAndFlagByPattern;
+  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
+  private final FlagsAttribute flagAtt = addAttribute(FlagsAttribute.class);
+  private final TypeAttribute typeAtt = addAttribute(TypeAttribute.class);
+
+  public PatternTypingFilter(TokenStream input, LinkedHashMap<Pattern, Map.Entry<String, Integer>> replacementAndFlagByPattern) {

Review comment:
FYI, I was not aware that the Map of Patterns and Elements is exposed as public API.
[GitHub] [lucene-solr] uschindler commented on a change in pull request #1995: LUCENE-9575 Add PatternTypingFilter to annotate tokens with flags and types
uschindler commented on a change in pull request #1995: URL: https://github.com/apache/lucene-solr/pull/1995#discussion_r510870382

## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/pattern/PatternTypingFilter.java
## @@ -0,0 +1,71 @@
+  public PatternTypingFilter(TokenStream input, LinkedHashMap<Pattern, Map.Entry<String, Integer>> replacementAndFlagByPattern) {

Review comment:
If we have a class `PatternTypingRule` we can make constructors to create those: `new PatternTypingRule(Pattern pattern, String replacement, int flag)` but also `new PatternTypingRule(String pattern, String replacement, int flag)`. I would strongly prefer to not misuse maps. We should also add a varargs constructor `PatternTypingFilter(TokenStream input, PatternTypingRule... rules)`
[GitHub] [lucene-solr] uschindler commented on a change in pull request #1995: LUCENE-9575 Add PatternTypingFilter to annotate tokens with flags and types
uschindler commented on a change in pull request #1995: URL: https://github.com/apache/lucene-solr/pull/1995#discussion_r510868759

## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/pattern/PatternTypingFilter.java
## @@ -0,0 +1,71 @@
+  public PatternTypingFilter(TokenStream input, LinkedHashMap<Pattern, Map.Entry<String, Integer>> replacementAndFlagByPattern) {

Review comment:
Explicitly saying LinkedHashMap sounds strange. I know the entries must be ordered, so it's more a list. I would like it to use a record. This would be a classical example of the new Java 16 record types! This is a public API, any client code may call this - also non-Solr users. So maybe the constructor argument should be a `List<PatternTypingRule>` (a new class, which may actually be a Record in Java 16/17).
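To make the suggestion concrete, here is a minimal sketch of such a `PatternTypingRule` holder, using only the constructor shapes named in the review above; the accessor names are illustrative, and the class that eventually lands in Lucene may look different (e.g. a record on newer Java versions):

{code:java}
import java.util.regex.Pattern;

/** Sketch: one rule for PatternTypingFilter - a pattern, a replacement template, and a flags value. */
public class PatternTypingRule {
  private final Pattern pattern;
  private final String replacement;
  private final int flag;

  public PatternTypingRule(Pattern pattern, String replacement, int flag) {
    this.pattern = pattern;
    this.replacement = replacement;
    this.flag = flag;
  }

  public PatternTypingRule(String pattern, String replacement, int flag) {
    this(Pattern.compile(pattern), replacement, flag);
  }

  public Pattern getPattern() { return pattern; }
  public String getReplacement() { return replacement; }
  public int getFlag() { return flag; }
}
{code}

The filter constructor could then become the varargs form suggested above, `PatternTypingFilter(TokenStream input, PatternTypingRule... rules)`, instead of taking a LinkedHashMap.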
[GitHub] [lucene-solr] msokolov commented on a change in pull request #2018: LUCENE-9582: rename VectorValues.ScoreFunction to SearchStrategy
msokolov commented on a change in pull request #2018: URL: https://github.com/apache/lucene-solr/pull/2018#discussion_r510895391

## File path: lucene/core/src/java/org/apache/lucene/util/VectorUtil.java
## @@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util;
+
+/**
+ * Utilities for computations with numeric arrays
+ */
+public final class VectorUtil {

Review comment:
oh, good call. I will add
[jira] [Commented] (SOLR-14942) Reduce leader election time on node shutdown
[ https://issues.apache.org/jira/browse/SOLR-14942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17219733#comment-17219733 ] Shalin Shekhar Mangar commented on SOLR-14942: -- Thanks Hoss. I have updated the PR with code comments. Mike Drob also gave some feedback on the PR which has been incorporated as well. I intend to merge to master over the weekend.

> Reduce leader election time on node shutdown
> --------------------------------------------
>
>                 Key: SOLR-14942
>                 URL: https://issues.apache.org/jira/browse/SOLR-14942
>             Project: Solr
>          Issue Type: Improvement
>   Security Level: Public(Default Security Level. Issues are Public)
>       Components: SolrCloud
> Affects Versions: 7.7.3, 8.6.3
>         Reporter: Shalin Shekhar Mangar
>         Assignee: Shalin Shekhar Mangar
>         Priority: Major
>       Time Spent: 2h
> Remaining Estimate: 0h
>
> The credit for this issue and investigation belongs to [~caomanhdat]. I am merely reporting the issue and creating PRs based on his work.
> The shutdown process waits for all replicas/cores to be closed before removing the election node of the leader. This can take some time due to index flush or merge activities on the leader cores and delays new leaders from being elected.
> This process happens in CoreContainer.shutdown():
> # zkController.preClose(): remove the current node from live_nodes and change the states of all cores on this node to DOWN. Assuming the current node hosts the leader of a shard, the shard becomes leaderless after calling this method, since the state of the leader is now DOWN. The leader election process is not triggered for the shard because the election node is still held by the current node.
> # Wait for all cores to be loaded (if there are any).
> # SolrCores.close(): close all cores.
> # zkController.close(): this is where all ephemeral nodes are removed from ZK, including the election nodes created by this node. From this point other replicas in the shard can take part in the leader election.
> Note that CoreContainer.shutdown() is invoked when Jetty/Solr nodes receive the SIGTERM signal. On receiving SIGTERM, Jetty will also stop accepting new connections and new requests. This is a very important factor: even if the leader replica is ACTIVE and its node is in live_nodes, the shard is considered leaderless if no one can index to that shard. Therefore shards become leaderless as soon as the node (which contains the shard's leader) receives SIGTERM.
> Therefore the longer steps 1, 2 and 3 take to finish, the longer shards remain leaderless. The time needed for step 3 scales with the number of cores, so the more cores a node has, the worse. This time is spent in IndexWriter.close(), where the system will
> # Flush all pending updates to disk
> # Wait for all merges to finish (this is most likely the meaty part)
> The shutdown process is proposed to be changed to:
> # Wait for all in-flight indexing requests and replication requests to complete
> # Remove election nodes
> # Close all replicas/cores
> This ensures that index flushes or merges no longer block new leader elections.
[GitHub] [lucene-solr] epugh commented on pull request #2016: SOLR-14067 v2 Move Stateless Scripting Update Process to /contrib
epugh commented on pull request #2016: URL: https://github.com/apache/lucene-solr/pull/2016#issuecomment-715416443 I'm getting there, and I wanted to specifically mention that @chatman's other PR was critical for me in following the chain of changes required to move this to /contrib. Thanks @chatman for chasing down all the touchpoints.
[jira] [Created] (LUCENE-9583) How should we expose VectorValues.RandomAccess?
Michael Sokolov created LUCENE-9583: --- Summary: How should we expose VectorValues.RandomAccess? Key: LUCENE-9583 URL: https://issues.apache.org/jira/browse/LUCENE-9583 Project: Lucene - Core Issue Type: Improvement Reporter: Michael Sokolov In the newly-added VectorValues API, we have a RandomAccess sub-interface. [[~jtibshirani] pointed out this is not needed by some vector-indexing strategies which can operate solely using a forward-iterator (it is needed by HNSW), and so in the interest of simplifying the public API we should not expose this internal detail (which by the way surfaces internal ordinals that are somewhat uninteresting outside the random access API). I looked into how to move this inside the HNSW-specific code and remembered that we do also currently make use of the RA API when merging vector fields over sorted indexes. Without it, we would need to load all vectors into RAM while flushing/merging, as we currently do in BinaryDocValuesWriter.BinaryDVs. I wonder if it's worth paying this cost for the simpler API. Another thing I noticed while reviewing this is that I moved the KNN `search(float[] target, int topK, int fanout)` method from `VectorValues` to `VectorValues.RandomAccess`. This I think we could move back, and handle the HNSW requirements for search elsewhere. I wonder if that would alleviate the major concern here?
[jira] [Updated] (LUCENE-9583) How should we expose VectorValues.RandomAccess?
[ https://issues.apache.org/jira/browse/LUCENE-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Sokolov updated LUCENE-9583: Description: In the newly-added {{VectorValues}} API, we have a {{RandomAccess}} sub-interface. [~jtibshirani] pointed out this is not needed by some vector-indexing strategies which can operate solely using a forward-iterator (it is needed by HNSW), and so in the interest of simplifying the public API we should not expose this internal detail (which by the way surfaces internal ordinals that are somewhat uninteresting outside the random access API). I looked into how to move this inside the HNSW-specific code and remembered that we do also currently make use of the RA API when merging vector fields over sorted indexes. Without it, we would need to load all vectors into RAM while flushing/merging, as we currently do in {{BinaryDocValuesWriter.BinaryDVs}}. I wonder if it's worth paying this cost for the simpler API. Another thing I noticed while reviewing this is that I moved the KNN {{search(float[] target, int topK, int fanout)}} method from {{VectorValues}} to {{VectorValues.RandomAccess}}. This I think we could move back, and handle the HNSW requirements for search elsewhere. I wonder if that would alleviate the major concern here? was: In the newly-added VectorValues API, we have a RandomAccess sub-interface. [[~jtibshirani] pointed out this is not needed by some vector-indexing strategies which can operate solely using a forward-iterator (it is needed by HNSW), and so in the interest of simplifying the public API we should not expose this internal detail (which by the way surfaces internal ordinals that are somewhat uninteresting outside the random access API). I looked into how to move this inside the HNSW-specific code and remembered that we do also currently make use of the RA API when merging vector fields over sorted indexes. Without it, we would need to load all vectors into RAM while flushing/merging, as we currently do in BinaryDocValuesWriter.BinaryDVs. I wonder if it's worth paying this cost for the simpler API. Another thing I noticed while reviewing this is that I moved the KNN `search(float[] target, int topK, int fanout)` method from `VectorValues` to `VectorValues.RandomAccess`. This I think we could move back, and handle the HNSW requirements for search elsewhere. I wonder if that would alleviate the major concern here?

> How should we expose VectorValues.RandomAccess?
> -----------------------------------------------
>
>                 Key: LUCENE-9583
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9583
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael Sokolov
>            Priority: Major
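For readers without the code at hand, here is a rough sketch of the API split under discussion, assembled only from the description above; the `vectorValue(int ord)` accessor name and the TopDocs return type are assumptions for illustration, not the actual Lucene source:

{code:java}
import java.io.IOException;

import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.TopDocs;

// Forward iteration is enough for some vector-indexing strategies...
public abstract class VectorValues extends DocIdSetIterator {
  /** Vector for the current document of the iterator. */
  public abstract float[] vectorValue() throws IOException;

  // ...while HNSW (and merging over sorted indexes) wants random access
  // by internal ordinal, which is what the description proposes hiding.
  public interface RandomAccess {
    /** Look up a vector by its internal ordinal. */
    float[] vectorValue(int ord) throws IOException;

    /** KNN search; the description suggests moving this back up to VectorValues. */
    TopDocs search(float[] target, int topK, int fanout) throws IOException;
  }
}
{code}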
[GitHub] [lucene-solr] madrob commented on a change in pull request #2020: SOLR-14949: Ability to customize Solr Docker build
madrob commented on a change in pull request #2020: URL: https://github.com/apache/lucene-solr/pull/2020#discussion_r510986901 ## File path: .github/workflows/docker-test.yml ## @@ -17,6 +17,10 @@ jobs: runs-on: ubuntu-latest +env: + DOCKER_SOLR_IMAGE_REPO: github-pr/solr Review comment: Does this mean we will publish all of our PR docker images somewhere? ## File path: help/docker.txt ## @@ -0,0 +1,53 @@ +Docker Images for Solr +== + +Solr docker images are built using Palantir's Docker Gradle plugin, https://github.com/palantir/gradle-docker. + +Common Inputs +- + +The docker image and it's tag can be customized via the following options, all accepted via both Environment Variables and Gradle Properties. + +Docker Image Repository: + Default: "apache/solr" + EnvVar: DOCKER_SOLR_IMAGE_REPO + Gradle Property: -Pdocker.solr.imageRepo Review comment: Should this be solr.docker instead of docker.solr? ## File path: solr/docker/tests/cases/version/test.sh ## @@ -1,45 +0,0 @@ -#!/bin/bash Review comment: Deleted because we can set the tag and version to be anything now? ## File path: help/docker.txt ## @@ -0,0 +1,53 @@ +Docker Images for Solr +== + +Solr docker images are built using Palantir's Docker Gradle plugin, https://github.com/palantir/gradle-docker. + +Common Inputs +- + +The docker image and it's tag can be customized via the following options, all accepted via both Environment Variables and Gradle Properties. Review comment: s/it's/its ## File path: help/docker.txt ## @@ -0,0 +1,53 @@ +Docker Images for Solr +== + +Solr docker images are built using Palantir's Docker Gradle plugin, https://github.com/palantir/gradle-docker. + +Common Inputs +- + +The docker image and it's tag can be customized via the following options, all accepted via both Environment Variables and Gradle Properties. + +Docker Image Repository: + Default: "apache/solr" + EnvVar: DOCKER_SOLR_IMAGE_REPO + Gradle Property: -Pdocker.solr.imageRepo + +Docker Image Tag: + Default: the Solr version, e.g. "9.0.0-SNAPSHOT" + EnvVar: DOCKER_SOLR_IMAGE_TAG + Gradle Property: -Pdocker.solr.imageTag + +Docker Image Name: (Use this to explicitly set a whole image name. If given, the image repo and image version options above are ignored.) + Default: {image_repo}/{image_tag} (both options provided above, with defaults) + EnvVar: DOCKER_SOLR_IMAGE_NAME + Gradle Property: -Pdocker.solr.imageName + +Building + + +In order to build the Solr Docker image, run: + +gradlew docker + +The docker build task (`gradlew docker`) accepts the following inputs, in addition to the common inputs listed above: Review comment: This is displayed as plain text, not markdown.
[GitHub] [lucene-solr] HoustonPutman commented on a change in pull request #2020: SOLR-14949: Ability to customize Solr Docker build
HoustonPutman commented on a change in pull request #2020: URL: https://github.com/apache/lucene-solr/pull/2020#discussion_r510994999 ## File path: solr/docker/tests/cases/version/test.sh ## @@ -1,45 +0,0 @@ -#!/bin/bash Review comment: Yeah, this is something that might be useful when we start creating release artifacts, but the test we want will probably look quite different. This is pretty shallow and doesn't actually provide a whole lot.
[GitHub] [lucene-solr] HoustonPutman commented on a change in pull request #2020: SOLR-14949: Ability to customize Solr Docker build
HoustonPutman commented on a change in pull request #2020: URL: https://github.com/apache/lucene-solr/pull/2020#discussion_r510995608 ## File path: .github/workflows/docker-test.yml ## @@ -17,6 +17,10 @@ jobs: runs-on: ubuntu-latest +env: + DOCKER_SOLR_IMAGE_REPO: github-pr/solr Review comment: nah this doesn't actually push. It's just testing that the custom repo/tag works. I'm pretty sure it would fail if it tried, because github-pr isn't a docker hub repo.
[GitHub] [lucene-solr] HoustonPutman commented on a change in pull request #2020: SOLR-14949: Ability to customize Solr Docker build
HoustonPutman commented on a change in pull request #2020: URL: https://github.com/apache/lucene-solr/pull/2020#discussion_r510997092 ## File path: help/docker.txt ## @@ -0,0 +1,53 @@ +Docker Images for Solr +== + +Solr docker images are built using Palantir's Docker Gradle plugin, https://github.com/palantir/gradle-docker. + +Common Inputs +- + +The docker image and it's tag can be customized via the following options, all accepted via both Environment Variables and Gradle Properties. + +Docker Image Repository: + Default: "apache/solr" + EnvVar: DOCKER_SOLR_IMAGE_REPO + Gradle Property: -Pdocker.solr.imageRepo Review comment: I could go either way. Say we add a prometheus exporter. Which would we prefer? - `docker.solr.imageRepo` and `docker.prometheusExporter.imageRepo` - `solr.docker.imageRepo` and `prometheusExporter.docker.imageRepo`
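For reference, with the inputs quoted from help/docker.txt above, a customized build could be invoked in either of these ways (the repo and tag values are made-up examples, not project defaults):

    gradlew docker -Pdocker.solr.imageRepo=myorg/solr -Pdocker.solr.imageTag=local-test

    DOCKER_SOLR_IMAGE_REPO=myorg/solr DOCKER_SOLR_IMAGE_TAG=local-test gradlew docker

Both forms come straight from the option names in the quoted help text; whichever prefix convention the naming discussion above settles on would change the Gradle property names accordingly.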
[jira] [Commented] (LUCENE-9581) Clarify discardCompoundToken behavior in the JapaneseTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-9581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17219808#comment-17219808 ] Kazuaki Hiraga commented on LUCENE-9581: Thank you for your input, [~jimczi]. My patch was just showing a super easy approach to fix the issue as a short-term solution (and I have tried to remember the discussion of the somewhat confusing options, which is a different story from this issue, though). The reason I modified the minimal length for the penalty is that it is similar to the way we can specify the behavior of MeCab's unknown-word processing for known words (in MeCab, not only Kanji characters but also others can be targeted, and this can be configured by a configuration file), and I think `>=` is better in some cases (but that can be another discussion). Anyway, this is a different story and I think your approach is appropriate for resolving the issue. So, I agree with your approach. {quote}I am also unsure that we should make discardCompoundToken true by default in Lucene 9 {quote} As we have discussed in LUCENE-9123, we want to change the default behavior of the current search mode so that the tokenization results will be the same as with `discardCompoundToken=true`. If I understand correctly, the result of the discussion is that 1) search mode will not return the compound tokens along with the decomposed tokens in Lucene 9 (the Tokenizer won't return the compound tokens unless `discardCompoundToken=false` is explicitly specified), and 2) we merge the normal mode and search mode to only return the decomposed tokens, and remove the mode and related parameters in Lucene 10(?). Any opinions / suggestions? > Clarify discardCompoundToken behavior in the JapaneseTokenizer > -- > > Key: LUCENE-9581 > URL: https://issues.apache.org/jira/browse/LUCENE-9581 > Project: Lucene - Core > Issue Type: Bug >Reporter: Jim Ferenczi >Priority: Minor > Attachments: LUCENE-9581.patch, LUCENE-9581.patch > > > At first sight, the discardCompoundToken option added in LUCENE-9123 seems > redundant with the NORMAL mode of the Japanese tokenizer. When set to true, > the current behavior is to disable the decomposition for compounds, that's > exactly what the NORMAL mode does. > So I wonder if the right semantic of the option would be to keep only the > decomposition of the compound or if it's really needed. If the goal is to > make the output compatible with a graph token filter, the current workaround > to set the mode to NORMAL should be enough. > That's consistent with the mode that should be used to preserve positions in > the index since we don't handle position length on the indexing side. > Am I missing something regarding the new option ? Is there a compelling case > where it differs from the NORMAL mode ?
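For readers following the thread, a small sketch of the configurations being compared; the four-argument constructor carrying discardCompoundToken is the one added in LUCENE-9123, and its exact signature is assumed here from that issue, so treat this as illustrative rather than authoritative:

{code:java}
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.ja.JapaneseTokenizer;
import org.apache.lucene.analysis.ja.JapaneseTokenizer.Mode;

// NORMAL mode: compounds are never decomposed.
Tokenizer normal = new JapaneseTokenizer(null, true, Mode.NORMAL);

// SEARCH mode: decomposes compounds and also emits the original
// compound token alongside the decomposed tokens.
Tokenizer search = new JapaneseTokenizer(null, true, Mode.SEARCH);

// SEARCH mode with discardCompoundToken=true (assumed LUCENE-9123 signature):
// emits only the decomposed tokens, the behavior proposed as the
// Lucene 9 default in the comment above.
Tokenizer searchNoCompound = new JapaneseTokenizer(null, true, true, Mode.SEARCH);
{code}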
[GitHub] [lucene-solr] msokolov commented on pull request #1993: .gitignore clean up
msokolov commented on pull request #1993: URL: https://github.com/apache/lucene-solr/pull/1993#issuecomment-715478148 > @msokolov you added .#* -- what comment should I use in this file to explain what this is? It's more cruft emacs sometimes leaves behind. I think in this case it's an autosave backup file left behind if you exited while editing. On second thought, we probably don't need to list this here - it shouldn't arise as a normal thing. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] samuelgmartinez opened a new pull request #2021: SOLR-14844: upgrade jetty to 9.4.32.v20200930
samuelgmartinez opened a new pull request #2021: URL: https://github.com/apache/lucene-solr/pull/2021 # Description Upgrades Jetty to 9.4.32.v20200930 as described in the JIRA ticket. # Solution After the upgrade, the compression-related tests started to fail, so some of the broken unit tests were modified as well. The reasons behind the broken unit tests are described in the original ticket. Also, I created SOLR-14945 to track the need to improve HttpSolrClient compression handling to avoid problems like this in the future. # Tests BasicHttpSolrClientTest had to be modified in order to match Jetty's new behaviour for empty responses. # Checklist Please review the following and check all that apply: - [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability. - [x] I have created a Jira issue and added the issue ID to my pull request title. - [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [ ] I have developed this patch against the `master` branch. - [x] I have run `./gradlew check`. - [x] I have added tests for my changes. - [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
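For background on the compression handling mentioned above: SolrJ's Apache HttpClient based client opts into response compression through its builder, which is the code path the modified tests exercise. A minimal sketch (the URL and collection are placeholders):

{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class CompressionClientSketch {
  public static void main(String[] args) throws Exception {
    // allowCompression(true) makes the client send Accept-Encoding and
    // transparently decompress responses -- the behavior whose edge cases
    // (e.g. empty responses) changed with the Jetty upgrade.
    try (HttpSolrClient client =
             new HttpSolrClient.Builder("http://localhost:8983/solr/techproducts")
                 .allowCompression(true)
                 .build()) {
      client.query(new SolrQuery("*:*"));
    }
  }
}
{code}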
[GitHub] [lucene-solr] epugh commented on pull request #2016: SOLR-14067 v2 Move Stateless Scripting Update Process to /contrib
epugh commented on pull request #2016: URL: https://github.com/apache/lucene-solr/pull/2016#issuecomment-715500720 Okay, this PR is kind of "ready". I've migrated the content out of the Cwiki page and into the ref guide. One thing I want to highlight: I'd like to be able to easily demonstrate the power of the ScriptingUpdateProcessor with the techproducts example; however, to do that, I had to set `enableStreamBody` to true to make it work. I hope the fact that it is enabled when you do `bin/solr start -e techproducts` is okay. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14939) JSON facets: range faceting to support cache=false parameter
[ https://issues.apache.org/jira/browse/SOLR-14939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17219868#comment-17219868 ] Joel Bernstein commented on SOLR-14939: --- Interesting, I would not have suspected that range queries would have this effect. > JSON facets: range faceting to support cache=false parameter > > > Key: SOLR-14939 > URL: https://issues.apache.org/jira/browse/SOLR-14939 > Project: Solr > Issue Type: Bug >Reporter: Christine Poerschke >Assignee: Christine Poerschke >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > The {{cache}} parameter, if set to {{false}}, is intended to support > non-caching of the search results: > [https://lucene.apache.org/solr/guide/8_6/common-query-parameters.html#cache-parameter] > Based on inspection of > {code:java} > curl > "http://localhost:8983/solr/admin/metrics?prefix=CACHE.searcher.filterCache"; > {code} > metrics before and after a search we can see that range JSON facet queries > currently do not support a {{cache=false}} parameter. > Using the {{techproducts}} example collection as an illustration, if a 1 > MONTH {{gap}} value is used then 12 {{filterCache}} entries are added for a > one year {{start/end}} time range. > {code:java} > curl "http://localhost:8983/solr/techproducts/query?q=*:*&rows=0&cache=false"; > -d 'json.facet={ > manufacturedate_dt_ranges : { > type : range, > field : manufacturedate_dt, > mincount : 1, > gap : "%2B1MONTH", > start : "2005-01-01T00:00:00.000Z", > end : "2005-12-31T23:59:59.999Z", > } > }' > {code} > Similarly, if a 1 DAY {{gap}} value is used then 365 {{filterCache}} entries > are added for a one year {{start/end}} time range and if a 1 HOUR {{gap}} > value were to be used that would equate to 365 x 24 = 8,760 entries. This > means that a single search potentially displaces many or all existing > {{filterCache}} entries. > This ticket proposes to support the {{cache}} parameter for JSON range facet > queries: > * the current and default behaviour would remain {{cache=true}} and > * via {{cache=false}} users would be able run an 'uncommon' search with many > range buckets without impact on the 'common' searches with fewer range > buckets. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-14939) JSON facets: range faceting to support cache=false parameter
[ https://issues.apache.org/jira/browse/SOLR-14939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17219868#comment-17219868 ] Joel Bernstein edited comment on SOLR-14939 at 10/23/20, 6:44 PM: -- Interesting, I would not have suspected that range facets would have this effect. was (Author: joel.bernstein): Interesting, I would not have suspected that range queries would have this effect. > JSON facets: range faceting to support cache=false parameter > > > Key: SOLR-14939 > URL: https://issues.apache.org/jira/browse/SOLR-14939 > Project: Solr > Issue Type: Bug >Reporter: Christine Poerschke >Assignee: Christine Poerschke >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > The {{cache}} parameter, if set to {{false}}, is intended to support > non-caching of the search results: > [https://lucene.apache.org/solr/guide/8_6/common-query-parameters.html#cache-parameter] > Based on inspection of > {code:java} > curl > "http://localhost:8983/solr/admin/metrics?prefix=CACHE.searcher.filterCache"; > {code} > metrics before and after a search we can see that range JSON facet queries > currently do not support a {{cache=false}} parameter. > Using the {{techproducts}} example collection as an illustration, if a 1 > MONTH {{gap}} value is used then 12 {{filterCache}} entries are added for a > one year {{start/end}} time range. > {code:java} > curl "http://localhost:8983/solr/techproducts/query?q=*:*&rows=0&cache=false"; > -d 'json.facet={ > manufacturedate_dt_ranges : { > type : range, > field : manufacturedate_dt, > mincount : 1, > gap : "%2B1MONTH", > start : "2005-01-01T00:00:00.000Z", > end : "2005-12-31T23:59:59.999Z", > } > }' > {code} > Similarly, if a 1 DAY {{gap}} value is used then 365 {{filterCache}} entries > are added for a one year {{start/end}} time range and if a 1 HOUR {{gap}} > value were to be used that would equate to 365 x 24 = 8,760 entries. This > means that a single search potentially displaces many or all existing > {{filterCache}} entries. > This ticket proposes to support the {{cache}} parameter for JSON range facet > queries: > * the current and default behaviour would remain {{cache=true}} and > * via {{cache=false}} users would be able run an 'uncommon' search with many > range buckets without impact on the 'common' searches with fewer range > buckets. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
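For illustration, a SolrJ equivalent of the curl example in the issue description, passing the proposed cache=false parameter alongside the JSON range facet (a sketch of the behavior this ticket proposes, not of released functionality):

{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class RangeFacetCacheSketch {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client =
             new HttpSolrClient.Builder("http://localhost:8983/solr/techproducts").build()) {
      SolrQuery q = new SolrQuery("*:*");
      q.setRows(0);
      q.set("cache", "false"); // proposed: skip the filterCache for the range buckets
      q.set("json.facet",
          "{manufacturedate_dt_ranges:{type:range,field:manufacturedate_dt,"
              + "mincount:1,gap:'+1MONTH',"
              + "start:'2005-01-01T00:00:00.000Z',end:'2005-12-31T23:59:59.999Z'}}");
      System.out.println(client.query(q));
    }
  }
}
{code}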
[jira] [Commented] (SOLR-14067) Move StatelessScriptUpdateProcessor to a contrib
[ https://issues.apache.org/jira/browse/SOLR-14067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17219869#comment-17219869 ] David Eric Pugh commented on SOLR-14067: [~tflobbe] do you have an example of where we have done the rename? I like that idea; that way I can go through and use the cleaned-up "ScriptingUpdateRequestProcessor" everywhere, and have it still be backwards compatible. [~dsmiley] I like the idea that this is only in 9, with no back-porting. Should we update the wiki page to say it is moved in 9, and not touched in 8.x? https://cwiki.apache.org/confluence/display/SOLR/Deprecations > Move StatelessScriptUpdateProcessor to a contrib > > > Key: SOLR-14067 > URL: https://issues.apache.org/jira/browse/SOLR-14067 > Project: Solr > Issue Type: Improvement >Reporter: Ishan Chattopadhyaya >Assignee: David Eric Pugh >Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h > > Move server-side scripting out of core and into a new contrib. This is > better for security. > Former description: > > We should eliminate all scripting capabilities within Solr. Let us start with > the StatelessScriptUpdateProcessor deprecation/removal. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jtibshirani commented on a change in pull request #2018: LUCENE-9582: rename VectorValues.ScoreFunction to SearchStrategy
jtibshirani commented on a change in pull request #2018: URL: https://github.com/apache/lucene-solr/pull/2018#discussion_r511087483 ## File path: lucene/queries/src/java/org/apache/lucene/queries/intervals/IntervalQuery.java ## @@ -99,7 +99,7 @@ public IntervalQuery(String field, IntervalsSource intervalsSource, float pivot, private IntervalQuery(String field, IntervalsSource intervalsSource, IntervalScoreFunction scoreFunction) { Objects.requireNonNull(field, "null field aren't accepted"); Objects.requireNonNull(intervalsSource, "null intervalsSource aren't accepted"); -Objects.requireNonNull(scoreFunction, "null scoreFunction aren't accepted"); +Objects.requireNonNull(scoreFunction, "null searchStrategy aren't accepted"); Review comment: Just noticed this rename seems accidental. ## File path: lucene/core/src/test/org/apache/lucene/util/TestVectorUtil.java ## @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.util; + +public class TestVectorUtil extends LuceneTestCase { + + public void testBasicDotProduct() { +assertEquals(5, VectorUtil.dotProduct(new float[]{1, 2, 3}, new float[]{-10, 0, 5}), 0); + } + + public void testSelfDotProduct() { +// the dot product of a vector with itself is equal to the sum of the squares of its components +float[] v = randomVector(); +assertEquals(l2(v), VectorUtil.dotProduct(v, v), 1e-5); + } + + public void testOrthogonalDotProduct() { +// the dot product of two perpendicular vectors is 0 +float[] v = new float[2]; +v[0] = random().nextInt(100); +v[1] = random().nextInt(100); +float[] u = new float[2]; +u[0] = v[1]; +u[1] = -v[0]; +assertEquals(0, VectorUtil.dotProduct(u, v), 1e-5); + } + + public void testSelfSquareSum() { +// the l2 distance of a vector with itself is zero +float[] v = randomVector(); +assertEquals(0, VectorUtil.squareSum(v, v), 1e-5); + } + + public void testBasicSquareSum() { +assertEquals(12, VectorUtil.squareSum(new float[]{1, 2, 3}, new float[]{-1, 0, 5}), 0); + } + + public void testRandomSquareSum() { +// the MSE of a vector with its inverse is equal to four times the sum of squares of its components Review comment: Small comment: I don't think you mean MSE here since it's not a mean, it could be 'squared distance' ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14844) Upgrade Jetty to 9.4.32.v20200930
[ https://issues.apache.org/jira/browse/SOLR-14844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17219892#comment-17219892 ] Samuel García Martínez commented on SOLR-14844: --- branch_8x pull request: https://github.com/apache/lucene-solr/pull/2003 master pull request: https://github.com/apache/lucene-solr/pull/2021 The master pull request fails because, after upgrading Jetty, junit changes to 4.12 for some reason, so the checksum fails on precommit; master is currently using 4.13.1. I need some help creating the checksums for that dependency (and for others that may have changed as well). Also, I've opened SOLR-14945 to address the problems with the interceptors and to refactor the SolrJ client to avoid this kind of issue in the future (relying on HttpClient directly, instead of writing custom classes to handle compression and whatnot). > Upgrade Jetty to 9.4.32.v20200930 > - > > Key: SOLR-14844 > URL: https://issues.apache.org/jira/browse/SOLR-14844 > Project: Solr > Issue Type: Improvement >Affects Versions: 8.6 >Reporter: Cassandra Targett >Assignee: Erick Erickson >Priority: Major > Attachments: SOLR-14844-master.patch, SOLR-14884-8x.patch > > Time Spent: 20m > Remaining Estimate: 0h > > A CVE was found in Jetty 9.4.27-9.4.29 that has some security scanning tools > raising red flags > ([https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-17638]). > Here's the Jetty issue: > [https://bugs.eclipse.org/bugs/show_bug.cgi?id=564984]. It's fixed in > 9.4.30+, so we should upgrade to that for 8.7 > -It has a simple mitigation (raise Jetty's responseHeaderSize to higher than > requestHeaderSize), but I don't know how Solr uses Jetty well enough to a) > know if this problem is even exploitable in Solr, or b) if the workaround > suggested is even possible in Solr.- > In normal Solr installs, w/o jetty optimizations, this issue is largely > mitigated in 8.6.3: see SOLR-14896 (and linked bug fixes) for details. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14844) Upgrade Jetty to 9.4.32.v20200930
[ https://issues.apache.org/jira/browse/SOLR-14844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17219908#comment-17219908 ] Erick Erickson commented on SOLR-14844: --- I'll take a look sometime today; it can be a rat-hole to straighten out. Thanks! > Upgrade Jetty to 9.4.32.v20200930 > - > > Key: SOLR-14844 > URL: https://issues.apache.org/jira/browse/SOLR-14844 > Project: Solr > Issue Type: Improvement >Affects Versions: 8.6 >Reporter: Cassandra Targett >Assignee: Erick Erickson >Priority: Major > Attachments: SOLR-14844-master.patch, SOLR-14884-8x.patch > > Time Spent: 20m > Remaining Estimate: 0h > > A CVE was found in Jetty 9.4.27-9.4.29 that has some security scanning tools > raising red flags > ([https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-17638]). > Here's the Jetty issue: > [https://bugs.eclipse.org/bugs/show_bug.cgi?id=564984]. It's fixed in > 9.4.30+, so we should upgrade to that for 8.7 > -It has a simple mitigation (raise Jetty's responseHeaderSize to higher than > requestHeaderSize), but I don't know how Solr uses Jetty well enough to a) > know if this problem is even exploitable in Solr, or b) if the workaround > suggested is even possible in Solr.- > In normal Solr installs, w/o jetty optimizations, this issue is largely > mitigated in 8.6.3: see SOLR-14896 (and linked bug fixes) for details. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14067) Move StatelessScriptUpdateProcessor to a contrib
[ https://issues.apache.org/jira/browse/SOLR-14067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17219926#comment-17219926 ] David Smiley commented on SOLR-14067: - BTW, RE XSLT, it'd clearly be another issue. If the change is in 9.0 but not 8.x, no need to handle back-compat. Users already need to know to use the package. I don't see why this component needs to be mentioned at all on a page named "Deprecations". Moving within the project is not a deprecation. It does need to be mentioned on {{major-changes-in-solr-9.adoc}}. > Move StatelessScriptUpdateProcessor to a contrib > > > Key: SOLR-14067 > URL: https://issues.apache.org/jira/browse/SOLR-14067 > Project: Solr > Issue Type: Improvement >Reporter: Ishan Chattopadhyaya >Assignee: David Eric Pugh >Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h > > Move server-side scripting out of core and into a new contrib. This is > better for security. > Former description: > > We should eliminate all scripting capabilities within Solr. Let us start with > the StatelessScriptUpdateProcessor deprecation/removal. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14413) allow timeAllowed and cursorMark parameters
[ https://issues.apache.org/jira/browse/SOLR-14413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17219927#comment-17219927 ] Yevhen Tienkaiev commented on SOLR-14413: - Hello, is there any status update on this? This is pretty critical; could someone please push this forward? > allow timeAllowed and cursorMark parameters > --- > > Key: SOLR-14413 > URL: https://issues.apache.org/jira/browse/SOLR-14413 > Project: Solr > Issue Type: Improvement > Components: search >Reporter: John Gallagher >Priority: Minor > Attachments: SOLR-14413-bram.patch, SOLR-14413-jg-update1.patch, > SOLR-14413-jg-update2.patch, SOLR-14413.patch, > image-2020-08-18-16-56-41-736.png, image-2020-08-18-16-56-59-178.png, > image-2020-08-21-14-18-36-229.png, timeallowed_cursormarks_results.txt > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Ever since cursorMarks were introduced in SOLR-5463 in 2014, cursorMark and > timeAllowed parameters were not allowed in combination ("Can not search using > both cursorMark and timeAllowed") > , from [QueryComponent.java|#L359]]: > > {code:java} > > if (null != rb.getCursorMark() && 0 < timeAllowed) { > // fundamentally incompatible > throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "Can not > search using both " + CursorMarkParams.CURSOR_MARK_PARAM + " and " + > CommonParams.TIME_ALLOWED); > } {code} > While theoretically impure to use them in combination, it is often desirable > to support cursormarks-style deep paging and attempt to protect Solr nodes > from runaway queries using timeAllowed, in the hopes that most of the time, > the query completes in the allotted time, and there is no conflict. > > However if the query takes too long, it may be preferable to end the query > and protect the Solr node and provide the user with a somewhat inaccurate > sorted list. As noted in SOLR-6930, SOLR-5986 and others, timeAllowed is > frequently used to prevent runaway load. In fact, cursorMark and > shards.tolerant are allowed in combination, so any argument in favor of > purity would be a bit muddied in my opinion. > > This was discussed once in the mailing list that I can find: > [https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201506.mbox/%3c5591740b.4080...@elyograg.org%3E] > It did not look like there was strong support for preventing the combination. > > I have tested cursorMark and timeAllowed combination together, and even when > partial results are returned because the timeAllowed is exceeded, the > cursorMark response value is still valid and reasonable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
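For reference, a sketch of how a client would combine the two parameters once the guard is removed (as the attached patches propose). These are standard SolrJ APIs; the URL and collection are placeholders:

{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.CursorMarkParams;

public class CursorTimeAllowedSketch {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client =
             new HttpSolrClient.Builder("http://localhost:8983/solr/techproducts").build()) {
      SolrQuery q = new SolrQuery("*:*");
      q.setSort(SolrQuery.SortClause.asc("id")); // cursors require a total sort order
      q.set(CommonParams.TIME_ALLOWED, 1000);    // cap per-page search time (ms)
      String cursor = CursorMarkParams.CURSOR_MARK_START;
      while (true) {
        q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
        QueryResponse rsp = client.query(q);
        // With the patch, a page may be partial if timeAllowed was exceeded,
        // but the returned cursor remains valid for the next request.
        String next = rsp.getNextCursorMark();
        if (cursor.equals(next)) break; // no more results
        cursor = next;
      }
    }
  }
}
{code}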
[GitHub] [lucene-solr] dsmiley commented on pull request #1993: .gitignore clean up
dsmiley commented on pull request #1993: URL: https://github.com/apache/lucene-solr/pull/1993#issuecomment-715572444 Finally, I think it's ready: a much simpler file than before, and more organized. Outdated items from the 8x branch are gone. I'll merge Monday unless I get an approving review sooner. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] msokolov commented on pull request #2018: LUCENE-9582: rename VectorValues.ScoreFunction to SearchStrategy
msokolov commented on pull request #2018: URL: https://github.com/apache/lucene-solr/pull/2018#issuecomment-715591773 Thanks for the review! I'll fix these when merging. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] msokolov merged pull request #2018: LUCENE-9582: rename VectorValues.ScoreFunction to SearchStrategy
msokolov merged pull request #2018: URL: https://github.com/apache/lucene-solr/pull/2018 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9582) Rename VectorValues.ScoreFunction to SearchStrategy
[ https://issues.apache.org/jira/browse/LUCENE-9582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17219948#comment-17219948 ] ASF subversion and git services commented on LUCENE-9582: - Commit 840a353bc7062c1ab8fc0ab7ebeaa68ccf97fac1 in lucene-solr's branch refs/heads/master from Michael Sokolov [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=840a353 ] LUCENE-9582: rename VectorValues.ScoreFunction to SearchStrategy (#2018) Co-authored-by: Julie Tibshirani > Rename VectorValues.ScoreFunction to SearchStrategy > > > Key: LUCENE-9582 > URL: https://issues.apache.org/jira/browse/LUCENE-9582 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael Sokolov >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > This is an issue to apply some of the feedback from LUCENE-9322 that came > after it was pushed; we want to: > 1. rename VectorValues.ScoreFunction -> SearchStrategy (and all of the > references to that terminology), and make it a simple enum with no > implementation > 2. rename the strategies to indicate the ANN implementation that backs them, > so we can represent more than one such implementation/algorithm. > 3. Move scoring implementation to a utility class > I'll open a separate issue for exploring how to hide the > VectorValues.RandomAccess API, which is probably specific to HNSW > FYI [~jtibshirani] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] msokolov opened a new pull request #2022: LUCENE-9004: KNN vector search using NSW graphs
msokolov opened a new pull request #2022: URL: https://github.com/apache/lucene-solr/pull/2022 Phew, this has been a long time coming, but I think it is in good shape now. We started with a scratchy prototype about a year ago, then @mocobeta got it on a better footing by adding a new codec and also implemented the full hierarchical algorithm, making the graph search faithful to the published literature. Then we took a step back to add the underlying vector format as a separate patch, now landed. This patch builds on the new vector format, providing KNN search with NSW graphs. It's the simplest implementation I could tease out (single-layer graph, simple neighbor selection, no max fanout control), but I think it will be a good foundation. I've done some pretty extensive performance testing and hyperparameter exploration using the (included) KnnGraphTester with some proprietary data, and I get good results. I will follow up later with specifics, but single-threaded latencies of a few ms on my i7 laptop over a 1M x 256-dim dataset seem pretty good. Follow-ups will include repeatable benchmarks on public datasets. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
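For readers new to the approach: NSW search is a greedy best-first walk over a proximity graph, keeping a bounded set of best candidates found so far. A minimal, self-contained sketch of that traversal follows; it illustrates the general technique only, and all names here are invented rather than Lucene's actual API:

{code:java}
import java.util.Comparator;
import java.util.HashSet;
import java.util.PriorityQueue;
import java.util.Set;

/** Minimal sketch of greedy best-first search over a single-layer NSW graph. */
public class NswSearchSketch {

  static float dotProduct(float[] a, float[] b) {
    float sum = 0;
    for (int i = 0; i < a.length; i++) sum += a[i] * b[i];
    return sum;
  }

  /** A graph node paired with its similarity score to the query. */
  static final class Candidate {
    final int node;
    final float score;
    Candidate(int node, float score) { this.node = node; this.score = score; }
  }

  /**
   * @param vectors   vectors[i] is the vector stored at graph node i
   * @param neighbors neighbors[i] lists the graph neighbors of node i
   * @param beamWidth how many best-so-far results to keep while searching
   * @return the nodes of the top beamWidth candidates (unordered)
   */
  static int[] search(float[] query, float[][] vectors, int[][] neighbors,
                      int entryPoint, int beamWidth) {
    // Candidates still to expand, best (highest similarity) first.
    PriorityQueue<Candidate> frontier =
        new PriorityQueue<>(Comparator.comparingDouble((Candidate c) -> -c.score));
    // Current best results, worst first, so eviction is cheap.
    PriorityQueue<Candidate> results =
        new PriorityQueue<>(Comparator.comparingDouble((Candidate c) -> c.score));
    Set<Integer> visited = new HashSet<>();

    Candidate entry = new Candidate(entryPoint, dotProduct(query, vectors[entryPoint]));
    frontier.add(entry);
    results.add(entry);
    visited.add(entryPoint);

    while (!frontier.isEmpty()) {
      Candidate c = frontier.poll();
      // Stop when the best unexpanded candidate cannot improve the results.
      if (results.size() >= beamWidth && c.score < results.peek().score) {
        break;
      }
      for (int n : neighbors[c.node]) {
        if (visited.add(n)) {
          Candidate cand = new Candidate(n, dotProduct(query, vectors[n]));
          if (results.size() < beamWidth || cand.score > results.peek().score) {
            frontier.add(cand);
            results.add(cand);
            if (results.size() > beamWidth) {
              results.poll(); // evict the current worst result
            }
          }
        }
      }
    }
    return results.stream().mapToInt(r -> r.node).toArray();
  }
}
{code}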
[GitHub] [lucene-solr] madrob commented on pull request #2004: SOLR-14942: Reduce leader election time on node shutdown
madrob commented on pull request #2004: URL: https://github.com/apache/lucene-solr/pull/2004#issuecomment-715622888 Please add a CHANGES entry, and credit @CaoManhDat as well, if this work was based on initial work done by him. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14067) Move StatelessScriptUpdateProcessor to a contrib
[ https://issues.apache.org/jira/browse/SOLR-14067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17219962#comment-17219962 ] Tomas Eduardo Fernandez Lobbe commented on SOLR-14067: -- bq. Tomas Eduardo Fernandez Lobbe do you have an example of where we have done the rename? I like that idea, that way I can go through and use the cleaned up "ScriptingUpdateRequestProcessor" everywhere, and have it still be backwards compatible. I'm sure this has been done in other places, but see for example the SolrServer -> SolrClient rename: https://issues.apache.org/jira/browse/SOLR-6895 > Move StatelessScriptUpdateProcessor to a contrib > > > Key: SOLR-14067 > URL: https://issues.apache.org/jira/browse/SOLR-14067 > Project: Solr > Issue Type: Improvement >Reporter: Ishan Chattopadhyaya >Assignee: David Eric Pugh >Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h > > Move server-side scripting out of core and into a new contrib. This is > better for security. > Former description: > > We should eliminate all scripting capabilities within Solr. Let us start with > the StatelessScriptUpdateProcessor deprecation/removal. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14067) Move StatelessScriptUpdateProcessor to a contrib
[ https://issues.apache.org/jira/browse/SOLR-14067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17219964#comment-17219964 ] Tomas Eduardo Fernandez Lobbe commented on SOLR-14067: -- Here is another, more recent, example: https://github.com/apache/lucene-solr/commit/0836ea5/ > Move StatelessScriptUpdateProcessor to a contrib > > > Key: SOLR-14067 > URL: https://issues.apache.org/jira/browse/SOLR-14067 > Project: Solr > Issue Type: Improvement >Reporter: Ishan Chattopadhyaya >Assignee: David Eric Pugh >Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h > > Move server-side scripting out of core and into a new contrib. This is > better for security. > Former description: > > We should eliminate all scripting capabilities within Solr. Let us start with > the StatelessScriptUpdateProcessor deprecation/removal. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
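Both of those examples follow the same pattern: move the implementation to the new name and keep the old name as a deprecated alias, so existing references keep working. A minimal sketch, using hypothetical class names based on the rename proposed above (two files shown in one block):

{code:java}
// File: ScriptingUpdateRequestProcessorFactory.java -- the new, preferred name.
public class ScriptingUpdateRequestProcessorFactory {
  // ... the real implementation lives here ...
}

// File: StatelessScriptUpdateProcessorFactory.java -- deprecated alias kept for
// back-compat; existing solrconfig.xml references continue to resolve and
// inherit all behavior from the new class.
@Deprecated
public class StatelessScriptUpdateProcessorFactory
    extends ScriptingUpdateRequestProcessorFactory {
}
{code}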
[jira] [Updated] (SOLR-14354) HttpShardHandler send requests in async
[ https://issues.apache.org/jira/browse/SOLR-14354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Thacker updated SOLR-14354: - Attachment: image-2020-10-23-16-45-21-789.png > HttpShardHandler send requests in async > --- > > Key: SOLR-14354 > URL: https://issues.apache.org/jira/browse/SOLR-14354 > Project: Solr > Issue Type: Improvement >Reporter: Cao Manh Dat >Assignee: Cao Manh Dat >Priority: Blocker > Fix For: master (9.0), 8.7 > > Attachments: image-2020-03-23-10-04-08-399.png, > image-2020-03-23-10-09-10-221.png, image-2020-03-23-10-12-00-661.png, > image-2020-10-23-16-45-20-034.png, image-2020-10-23-16-45-21-789.png, > image-2020-10-23-16-45-37-628.png > > Time Spent: 4h 10m > Remaining Estimate: 0h > > h2. 1. Current approach (problem) of Solr > Below is the diagram describe the model on how currently handling a request. > !image-2020-03-23-10-04-08-399.png! > The main-thread that handles the search requests, will submit n requests (n > equals to number of shards) to an executor. So each request will correspond > to a thread, after sending a request that thread basically do nothing just > waiting for response from other side. That thread will be swapped out and CPU > will try to handle another thread (this is called context switch, CPU will > save the context of the current thread and switch to another one). When some > data (not all) come back, that thread will be called to parsing these data, > then it will wait until more data come back. So there will be lots of context > switching in CPU. That is quite inefficient on using threads.Basically we > want less threads and most of them must busy all the time, because threads > are not free as well as context switching. That is the main idea behind > everything, like executor > h2. 2. Async call of Jetty HttpClient > Jetty HttpClient offers async API like this. > {code:java} > httpClient.newRequest("http://domain.com/path";) > // Add request hooks > .onRequestQueued(request -> { ... }) > .onRequestBegin(request -> { ... }) > // Add response hooks > .onResponseBegin(response -> { ... }) > .onResponseHeaders(response -> { ... }) > .onResponseContent((response, buffer) -> { ... }) > .send(result -> { ... }); {code} > Therefore after calling {{send()}} the thread will return immediately without > any block. Then when the client received the header from other side, it will > call {{onHeaders()}} listeners. When the client received some {{byte[]}} (not > all response) from the data it will call {{onContent(buffer)}} listeners. > When everything finished it will call {{onComplete}} listeners. One main > thing that will must notice here is all listeners should finish quick, if the > listener block, all further data of that request won’t be handled until the > listener finish. > h2. 3. Solution 1: Sending requests async but spin one thread per response > Jetty HttpClient already provides several listeners, one of them is > InputStreamResponseListener. 
This is how it is get used > {code:java} > InputStreamResponseListener listener = new InputStreamResponseListener(); > client.newRequest(...).send(listener); > // Wait for the response headers to arrive > Response response = listener.get(5, TimeUnit.SECONDS); > if (response.getStatus() == 200) { > // Obtain the input stream on the response content > try (InputStream input = listener.getInputStream()) { > // Read the response content > } > } {code} > In this case, there will be 2 thread > * one thread trying to read the response content from InputStream > * one thread (this is a short-live task) feeding content to above > InputStream whenever some byte[] is available. Note that if this thread > unable to feed data into InputStream, this thread will wait. > By using this one, the model of HttpShardHandler can be written into > something like this > {code:java} > handler.sendReq(req, (is) -> { > executor.submit(() -> > try (is) { > // Read the content from InputStream > } > ) > }) {code} > The first diagram will be changed into this > !image-2020-03-23-10-09-10-221.png! > Notice that although “sending req to shard1” is wide, it won’t take long time > since sending req is a very quick operation. With this operation, handling > threads won’t be spin up until first bytes are sent back. Notice that in this > approach we still have active threads waiting for more data from InputStream > h2. 4. Solution 2: Buffering data and handle it inside jetty’s thread. > Jetty have another listener called BufferingResponseListener. This is how it > is get used > {code:java} > client.newRequest(...).send(new BufferingResponseListener() { > public void onComplete(Result result) { >
[jira] [Updated] (SOLR-14354) HttpShardHandler send requests in async
[ https://issues.apache.org/jira/browse/SOLR-14354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Thacker updated SOLR-14354: - Attachment: image-2020-10-23-16-45-37-628.png > HttpShardHandler send requests in async > --- > > Key: SOLR-14354 > URL: https://issues.apache.org/jira/browse/SOLR-14354 > Project: Solr > Issue Type: Improvement >Reporter: Cao Manh Dat >Assignee: Cao Manh Dat >Priority: Blocker > Fix For: master (9.0), 8.7 > > Attachments: image-2020-03-23-10-04-08-399.png, > image-2020-03-23-10-09-10-221.png, image-2020-03-23-10-12-00-661.png, > image-2020-10-23-16-45-20-034.png, image-2020-10-23-16-45-21-789.png, > image-2020-10-23-16-45-37-628.png > > Time Spent: 4h 10m > Remaining Estimate: 0h > > h2. 1. Current approach (problem) of Solr > Below is the diagram describe the model on how currently handling a request. > !image-2020-03-23-10-04-08-399.png! > The main-thread that handles the search requests, will submit n requests (n > equals to number of shards) to an executor. So each request will correspond > to a thread, after sending a request that thread basically do nothing just > waiting for response from other side. That thread will be swapped out and CPU > will try to handle another thread (this is called context switch, CPU will > save the context of the current thread and switch to another one). When some > data (not all) come back, that thread will be called to parsing these data, > then it will wait until more data come back. So there will be lots of context > switching in CPU. That is quite inefficient on using threads.Basically we > want less threads and most of them must busy all the time, because threads > are not free as well as context switching. That is the main idea behind > everything, like executor > h2. 2. Async call of Jetty HttpClient > Jetty HttpClient offers async API like this. > {code:java} > httpClient.newRequest("http://domain.com/path";) > // Add request hooks > .onRequestQueued(request -> { ... }) > .onRequestBegin(request -> { ... }) > // Add response hooks > .onResponseBegin(response -> { ... }) > .onResponseHeaders(response -> { ... }) > .onResponseContent((response, buffer) -> { ... }) > .send(result -> { ... }); {code} > Therefore after calling {{send()}} the thread will return immediately without > any block. Then when the client received the header from other side, it will > call {{onHeaders()}} listeners. When the client received some {{byte[]}} (not > all response) from the data it will call {{onContent(buffer)}} listeners. > When everything finished it will call {{onComplete}} listeners. One main > thing that will must notice here is all listeners should finish quick, if the > listener block, all further data of that request won’t be handled until the > listener finish. > h2. 3. Solution 1: Sending requests async but spin one thread per response > Jetty HttpClient already provides several listeners, one of them is > InputStreamResponseListener. 
This is how it is get used > {code:java} > InputStreamResponseListener listener = new InputStreamResponseListener(); > client.newRequest(...).send(listener); > // Wait for the response headers to arrive > Response response = listener.get(5, TimeUnit.SECONDS); > if (response.getStatus() == 200) { > // Obtain the input stream on the response content > try (InputStream input = listener.getInputStream()) { > // Read the response content > } > } {code} > In this case, there will be 2 thread > * one thread trying to read the response content from InputStream > * one thread (this is a short-live task) feeding content to above > InputStream whenever some byte[] is available. Note that if this thread > unable to feed data into InputStream, this thread will wait. > By using this one, the model of HttpShardHandler can be written into > something like this > {code:java} > handler.sendReq(req, (is) -> { > executor.submit(() -> > try (is) { > // Read the content from InputStream > } > ) > }) {code} > The first diagram will be changed into this > !image-2020-03-23-10-09-10-221.png! > Notice that although “sending req to shard1” is wide, it won’t take long time > since sending req is a very quick operation. With this operation, handling > threads won’t be spin up until first bytes are sent back. Notice that in this > approach we still have active threads waiting for more data from InputStream > h2. 4. Solution 2: Buffering data and handle it inside jetty’s thread. > Jetty have another listener called BufferingResponseListener. This is how it > is get used > {code:java} > client.newRequest(...).send(new BufferingResponseListener() { > public void onComplete(Result result) { >
[jira] [Updated] (SOLR-14354) HttpShardHandler send requests in async
[ https://issues.apache.org/jira/browse/SOLR-14354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Thacker updated SOLR-14354: - Attachment: image-2020-10-23-16-45-20-034.png > HttpShardHandler send requests in async > --- > > Key: SOLR-14354 > URL: https://issues.apache.org/jira/browse/SOLR-14354 > Project: Solr > Issue Type: Improvement >Reporter: Cao Manh Dat >Assignee: Cao Manh Dat >Priority: Blocker > Fix For: master (9.0), 8.7 > > Attachments: image-2020-03-23-10-04-08-399.png, > image-2020-03-23-10-09-10-221.png, image-2020-03-23-10-12-00-661.png, > image-2020-10-23-16-45-20-034.png, image-2020-10-23-16-45-21-789.png, > image-2020-10-23-16-45-37-628.png > > Time Spent: 4h 10m > Remaining Estimate: 0h > > h2. 1. Current approach (problem) of Solr > Below is the diagram describe the model on how currently handling a request. > !image-2020-03-23-10-04-08-399.png! > The main-thread that handles the search requests, will submit n requests (n > equals to number of shards) to an executor. So each request will correspond > to a thread, after sending a request that thread basically do nothing just > waiting for response from other side. That thread will be swapped out and CPU > will try to handle another thread (this is called context switch, CPU will > save the context of the current thread and switch to another one). When some > data (not all) come back, that thread will be called to parsing these data, > then it will wait until more data come back. So there will be lots of context > switching in CPU. That is quite inefficient on using threads.Basically we > want less threads and most of them must busy all the time, because threads > are not free as well as context switching. That is the main idea behind > everything, like executor > h2. 2. Async call of Jetty HttpClient > Jetty HttpClient offers async API like this. > {code:java} > httpClient.newRequest("http://domain.com/path";) > // Add request hooks > .onRequestQueued(request -> { ... }) > .onRequestBegin(request -> { ... }) > // Add response hooks > .onResponseBegin(response -> { ... }) > .onResponseHeaders(response -> { ... }) > .onResponseContent((response, buffer) -> { ... }) > .send(result -> { ... }); {code} > Therefore after calling {{send()}} the thread will return immediately without > any block. Then when the client received the header from other side, it will > call {{onHeaders()}} listeners. When the client received some {{byte[]}} (not > all response) from the data it will call {{onContent(buffer)}} listeners. > When everything finished it will call {{onComplete}} listeners. One main > thing that will must notice here is all listeners should finish quick, if the > listener block, all further data of that request won’t be handled until the > listener finish. > h2. 3. Solution 1: Sending requests async but spin one thread per response > Jetty HttpClient already provides several listeners, one of them is > InputStreamResponseListener. 
This is how it is get used > {code:java} > InputStreamResponseListener listener = new InputStreamResponseListener(); > client.newRequest(...).send(listener); > // Wait for the response headers to arrive > Response response = listener.get(5, TimeUnit.SECONDS); > if (response.getStatus() == 200) { > // Obtain the input stream on the response content > try (InputStream input = listener.getInputStream()) { > // Read the response content > } > } {code} > In this case, there will be 2 thread > * one thread trying to read the response content from InputStream > * one thread (this is a short-live task) feeding content to above > InputStream whenever some byte[] is available. Note that if this thread > unable to feed data into InputStream, this thread will wait. > By using this one, the model of HttpShardHandler can be written into > something like this > {code:java} > handler.sendReq(req, (is) -> { > executor.submit(() -> > try (is) { > // Read the content from InputStream > } > ) > }) {code} > The first diagram will be changed into this > !image-2020-03-23-10-09-10-221.png! > Notice that although “sending req to shard1” is wide, it won’t take long time > since sending req is a very quick operation. With this operation, handling > threads won’t be spin up until first bytes are sent back. Notice that in this > approach we still have active threads waiting for more data from InputStream > h2. 4. Solution 2: Buffering data and handle it inside jetty’s thread. > Jetty have another listener called BufferingResponseListener. This is how it > is get used > {code:java} > client.newRequest(...).send(new BufferingResponseListener() { > public void onComplete(Result result) { >
[jira] [Commented] (SOLR-14354) HttpShardHandler send requests in async
[ https://issues.apache.org/jira/browse/SOLR-14354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17219975#comment-17219975 ] Varun Thacker commented on SOLR-14354: -- Whenever we've taken flamegraphs ( [https://github.com/jvm-profiling-tools/async-profiler] with -e wall ) of production Solr clusters, HttpShardHandler has taken a significant amount of wall-clock time !image-2020-10-23-16-45-37-628.png! I would be really curious to find out the performance implications of this change on the cluster. Perhaps in a month's timeframe I can try to apply the patch on top of 8.7 ( we'll first have to upgrade to 8.7 ) and then report back with some real numbers. > HttpShardHandler send requests in async > --- > > Key: SOLR-14354 > URL: https://issues.apache.org/jira/browse/SOLR-14354 > Project: Solr > Issue Type: Improvement >Reporter: Cao Manh Dat >Assignee: Cao Manh Dat >Priority: Blocker > Fix For: master (9.0), 8.7 > > Attachments: image-2020-03-23-10-04-08-399.png, > image-2020-03-23-10-09-10-221.png, image-2020-03-23-10-12-00-661.png, > image-2020-10-23-16-45-20-034.png, image-2020-10-23-16-45-21-789.png, > image-2020-10-23-16-45-37-628.png > > Time Spent: 4h 10m > Remaining Estimate: 0h > > h2. 1. Current approach (problem) of Solr > Below is the diagram describe the model on how currently handling a request. > !image-2020-03-23-10-04-08-399.png! > The main-thread that handles the search requests, will submit n requests (n > equals to number of shards) to an executor. So each request will correspond > to a thread, after sending a request that thread basically do nothing just > waiting for response from other side. That thread will be swapped out and CPU > will try to handle another thread (this is called context switch, CPU will > save the context of the current thread and switch to another one). When some > data (not all) come back, that thread will be called to parsing these data, > then it will wait until more data come back. So there will be lots of context > switching in CPU. That is quite inefficient on using threads.Basically we > want less threads and most of them must busy all the time, because threads > are not free as well as context switching. That is the main idea behind > everything, like executor > h2. 2. Async call of Jetty HttpClient > Jetty HttpClient offers async API like this. > {code:java} > httpClient.newRequest("http://domain.com/path";) > // Add request hooks > .onRequestQueued(request -> { ... }) > .onRequestBegin(request -> { ... }) > // Add response hooks > .onResponseBegin(response -> { ... }) > .onResponseHeaders(response -> { ... }) > .onResponseContent((response, buffer) -> { ... }) > .send(result -> { ... }); {code} > Therefore after calling {{send()}} the thread will return immediately without > any block. Then when the client received the header from other side, it will > call {{onHeaders()}} listeners. When the client received some {{byte[]}} (not > all response) from the data it will call {{onContent(buffer)}} listeners. > When everything finished it will call {{onComplete}} listeners. One main > thing that will must notice here is all listeners should finish quick, if the > listener block, all further data of that request won’t be handled until the > listener finish. > h2. 3. Solution 1: Sending requests async but spin one thread per response > Jetty HttpClient already provides several listeners, one of them is > InputStreamResponseListener.
This is how it is get used > {code:java} > InputStreamResponseListener listener = new InputStreamResponseListener(); > client.newRequest(...).send(listener); > // Wait for the response headers to arrive > Response response = listener.get(5, TimeUnit.SECONDS); > if (response.getStatus() == 200) { > // Obtain the input stream on the response content > try (InputStream input = listener.getInputStream()) { > // Read the response content > } > } {code} > In this case, there will be 2 thread > * one thread trying to read the response content from InputStream > * one thread (this is a short-live task) feeding content to above > InputStream whenever some byte[] is available. Note that if this thread > unable to feed data into InputStream, this thread will wait. > By using this one, the model of HttpShardHandler can be written into > something like this > {code:java} > handler.sendReq(req, (is) -> { > executor.submit(() -> > try (is) { > // Read the content from InputStream > } > ) > }) {code} > The first diagram will be changed into this > !image-2020-03-23-10-09-10-221.png! > Notice that although “sending req to shard1” is wide, it won’t take long time > since sending
[GitHub] [lucene-solr] muse-dev[bot] commented on a change in pull request #2022: LUCENE-9004: KNN vector search using NSW graphs
muse-dev[bot] commented on a change in pull request #2022: URL: https://github.com/apache/lucene-solr/pull/2022#discussion_r511202225 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90VectorReader.java ## @@ -165,42 +191,88 @@ public VectorValues getVectorValues(String field) throws IOException { return new OffHeapVectorValues(fieldEntry, bytesSlice); } + // exposed for testing + public KnnGraphValues getGraphValues(String field) throws IOException { +FieldInfo info = fieldInfos.fieldInfo(field); +if (info == null) { + throw new IllegalArgumentException("No such field '" + field + "'"); +} +FieldEntry entry = fields.get(field); +if (entry != null && entry.indexDataLength > 0) { + return getGraphValues(entry); +} else { + return KnnGraphValues.EMPTY; +} + } + + private KnnGraphValues getGraphValues(FieldEntry entry) throws IOException { +if (isHnswStrategy(entry.searchStrategy)) { + HnswGraphFieldEntry graphEntry = (HnswGraphFieldEntry) entry; + IndexInput bytesSlice = vectorIndex.slice("graph-data", entry.indexDataOffset, entry.indexDataLength); + return new IndexedKnnGraphReader(graphEntry, bytesSlice); +} else { + return KnnGraphValues.EMPTY; +} + } + @Override public void close() throws IOException { -vectorData.close(); +IOUtils.close(vectorData, vectorIndex); } private static class FieldEntry { final int dimension; final VectorValues.SearchStrategy searchStrategy; -final int maxDoc; final long vectorDataOffset; final long vectorDataLength; +final long indexDataOffset; +final long indexDataLength; final int[] ordToDoc; -FieldEntry(int dimension, VectorValues.SearchStrategy searchStrategy, int maxDoc, - long vectorDataOffset, long vectorDataLength, int[] ordToDoc) { - this.dimension = dimension; +FieldEntry(DataInput input, VectorValues.SearchStrategy searchStrategy) throws IOException { this.searchStrategy = searchStrategy; - this.maxDoc = maxDoc; - this.vectorDataOffset = vectorDataOffset; - this.vectorDataLength = vectorDataLength; - this.ordToDoc = ordToDoc; + vectorDataOffset = input.readVLong(); + vectorDataLength = input.readVLong(); + indexDataOffset = input.readVLong(); + indexDataLength = input.readVLong(); + dimension = input.readInt(); + int size = input.readInt(); + ordToDoc = new int[size]; + for (int i = 0; i < size; i++) { +int doc = input.readVInt(); +ordToDoc[i] = doc; + } } int size() { return ordToDoc.length; } } + private static class HnswGraphFieldEntry extends FieldEntry { + +final long[] ordOffsets; + +HnswGraphFieldEntry(DataInput input, VectorValues.SearchStrategy searchStrategy) throws IOException { + super(input, searchStrategy); + ordOffsets = new long[size()]; + long offset = 0; + for (int i = 0; i < ordOffsets.length; i++) { +offset += input.readVLong(); +ordOffsets[i] = offset; + } +} + } + /** Read the vector values from the index input. This supports both iterated and random access. */ - private final static class OffHeapVectorValues extends VectorValues { + private final class OffHeapVectorValues extends VectorValues { final FieldEntry fieldEntry; final IndexInput dataIn; +final Random random = new Random(); Review comment: *PREDICTABLE_RANDOM:* This random generator (java.util.Random) is predictable [(details)](https://find-sec-bugs.github.io/bugs.htm#PREDICTABLE_RANDOM) ## File path: lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java ## @@ -0,0 +1,186 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. 
See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.lucene.util.hnsw; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Comparator; +import java.util.List; +import java.util.Random; + +import org.apache.lucene.index.KnnGraphValues; +import org.apache.lucene.index.VectorValues; +import org.apache.lucene.util.BytesRef; + +/** + * Builder for HNSW
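Context for the PREDICTABLE_RANDOM finding above: java.util.Random produces a fully predictable sequence from its seed, which only matters when the values must be unguessable, and the remedy such tools suggest is java.security.SecureRandom. A minimal sketch of the distinction (illustrative only; whether the flagged Lucene code is security-sensitive at all is a separate question):

{code:java}
import java.security.SecureRandom;
import java.util.Random;

public class RandomSourceSketch {
  public static void main(String[] args) {
    // Predictable: the same seed always yields the same sequence. Fine for
    // reproducible tests or randomized algorithms, wrong for secrets.
    Random predictable = new Random(42);
    System.out.println(predictable.nextInt());

    // Unpredictable: seeded from an OS entropy source. Use for anything an
    // attacker must not be able to guess (tokens, keys, salts).
    Random secure = new SecureRandom();
    System.out.println(secure.nextInt());
  }
}
{code}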
[jira] [Commented] (SOLR-14354) HttpShardHandler send requests in async
[ https://issues.apache.org/jira/browse/SOLR-14354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17219994#comment-17219994 ] Mark Robert Miller commented on SOLR-14354: --- It's likely a decent improvement; the issue is really just throwing in the change without reasonable verification of what it means in practice (e.g. the http2 client is not very good at connection reuse right now; a large number of shards at a high request rate should be a big beneficiary, but what are the implications for a few shards at a reasonable request rate?). I'm pretty sold on the idea that it's a good move with lots of benefits, but a change of this kind requires some pretty rigorous testing. I have the benchmarks to check things in a pretty comprehensive way; eventually I'll port some of them to master and can check this - I was planning on that when I first commented here - but then I realized it needed further work on the http2 implementation and configuration at a minimum, and I was not going to do that work on master. The results could still turn out even better, but it would also push a path I wouldn't agree is ready, so not so valuable. > HttpShardHandler send requests in async > --- > > Key: SOLR-14354 > URL: https://issues.apache.org/jira/browse/SOLR-14354 > Project: Solr > Issue Type: Improvement >Reporter: Cao Manh Dat >Assignee: Cao Manh Dat >Priority: Blocker > Fix For: master (9.0), 8.7 > > Attachments: image-2020-03-23-10-04-08-399.png, > image-2020-03-23-10-09-10-221.png, image-2020-03-23-10-12-00-661.png, > image-2020-10-23-16-45-20-034.png, image-2020-10-23-16-45-21-789.png, > image-2020-10-23-16-45-37-628.png > > Time Spent: 4h 10m > Remaining Estimate: 0h > > h2. 1. Current approach (problem) of Solr > Below is the diagram describe the model on how currently handling a request. > !image-2020-03-23-10-04-08-399.png! > The main-thread that handles the search requests, will submit n requests (n > equals to number of shards) to an executor. So each request will correspond > to a thread, after sending a request that thread basically do nothing just > waiting for response from other side. That thread will be swapped out and CPU > will try to handle another thread (this is called context switch, CPU will > save the context of the current thread and switch to another one). When some > data (not all) come back, that thread will be called to parsing these data, > then it will wait until more data come back. So there will be lots of context > switching in CPU. That is quite inefficient on using threads.Basically we > want less threads and most of them must busy all the time, because threads > are not free as well as context switching. That is the main idea behind > everything, like executor > h2. 2. Async call of Jetty HttpClient > Jetty HttpClient offers async API like this. > {code:java} > httpClient.newRequest("http://domain.com/path";) > // Add request hooks > .onRequestQueued(request -> { ... }) > .onRequestBegin(request -> { ... }) > // Add response hooks > .onResponseBegin(response -> { ... }) > .onResponseHeaders(response -> { ... }) > .onResponseContent((response, buffer) -> { ... }) > .send(result -> { ... }); {code} > Therefore after calling {{send()}} the thread will return immediately without > any block. Then when the client received the header from other side, it will > call {{onHeaders()}} listeners. When the client received some {{byte[]}} (not > all response) from the data it will call {{onContent(buffer)}} listeners. > When everything finished it will call {{onComplete}} listeners.
One main thing to notice here is that all listeners must finish quickly: if a listener blocks, no further data for that request will be handled until the listener finishes. > h2. 3. Solution 1: Sending requests async but spinning one thread per response > Jetty HttpClient already provides several listeners; one of them is > InputStreamResponseListener. This is how it is used: > {code:java} > InputStreamResponseListener listener = new InputStreamResponseListener(); > client.newRequest(...).send(listener); > // Wait for the response headers to arrive > Response response = listener.get(5, TimeUnit.SECONDS); > if (response.getStatus() == 200) { > // Obtain the input stream on the response content > try (InputStream input = listener.getInputStream()) { > // Read the response content > } > } {code} > In this case there will be two threads: > * one thread reading the response content from the InputStream > * one thread (a short-lived task) feeding content into the above > InputStream whenever some byte[] is available. Note that if this thread is > unable to feed data into the InputStream, it will wait.
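For illustration only, here is a minimal sketch of the fully async fan-out that section 2 argues for, written against Jetty 9's listener API. The {{AsyncShardFanout}} class and the {{parseAndMerge}} hook are hypothetical placeholders, not code from the SOLR-14354 patch:

{code:java}
import java.util.List;
import java.util.concurrent.CountDownLatch;

import org.eclipse.jetty.client.HttpClient;
import org.eclipse.jetty.client.api.Result;
import org.eclipse.jetty.client.util.BufferingResponseListener;

public class AsyncShardFanout {
  // Fans out one async request per shard; no thread is parked per request.
  public static void fanout(HttpClient httpClient, List<String> shardUrls)
      throws InterruptedException {
    CountDownLatch done = new CountDownLatch(shardUrls.size());
    for (String shardUrl : shardUrls) {
      httpClient.newRequest(shardUrl).send(new BufferingResponseListener() {
        @Override
        public void onComplete(Result result) {
          try {
            if (result.isSucceeded()) {
              byte[] body = getContent(); // whole shard response, buffered
              // parseAndMerge(body);     // hypothetical hook: must be quick,
              //                          // or hand off to a small executor
            }
            // else: record the shard failure (e.g. for shards.tolerant)
          } finally {
            done.countDown();
          }
        }
      });
    }
    done.await(); // the caller blocks once for the whole fan-out
  }
}
{code}

The point matches the description above: the calling thread blocks once on the latch for the entire fan-out instead of parking one thread per in-flight shard request, and each {{onComplete}} must stay quick (or hand off to an executor) so it does not stall the client's I/O threads.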
[jira] [Updated] (SOLR-14413) allow timeAllowed and cursorMark parameters
[ https://issues.apache.org/jira/browse/SOLR-14413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Gallagher updated SOLR-14413: -- Attachment: Screen Shot 2020-10-23 at 10.08.26 PM.png > allow timeAllowed and cursorMark parameters > --- > > Key: SOLR-14413 > URL: https://issues.apache.org/jira/browse/SOLR-14413 > Project: Solr > Issue Type: Improvement > Components: search >Reporter: John Gallagher >Priority: Minor > Attachments: SOLR-14413-bram.patch, SOLR-14413-jg-update1.patch, > SOLR-14413-jg-update2.patch, SOLR-14413.patch, Screen Shot 2020-10-23 at > 10.08.26 PM.png, image-2020-08-18-16-56-41-736.png, > image-2020-08-18-16-56-59-178.png, image-2020-08-21-14-18-36-229.png, > timeallowed_cursormarks_results.txt > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Ever since cursorMarks were introduced in SOLR-5463 in 2014, the cursorMark and > timeAllowed parameters have not been allowed in combination ("Can not search using > both cursorMark and timeAllowed"), from [QueryComponent.java|#L359]: > > {code:java} > > if (null != rb.getCursorMark() && 0 < timeAllowed) { > // fundamentally incompatible > throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "Can not > search using both " + CursorMarkParams.CURSOR_MARK_PARAM + " and " + > CommonParams.TIME_ALLOWED); > } {code} > While theoretically impure to use them in combination, it is often desirable > to support cursorMark-style deep paging while attempting to protect Solr nodes > from runaway queries using timeAllowed, in the hope that most of the time > the query completes in the allotted time and there is no conflict. > > However, if the query takes too long, it may be preferable to end the query, > protect the Solr node, and provide the user with a somewhat inaccurate > sorted list. As noted in SOLR-6930, SOLR-5986 and others, timeAllowed is > frequently used to prevent runaway load. In fact, cursorMark and > shards.tolerant are allowed in combination, so any argument in favor of > purity would be a bit muddied, in my opinion. > > This was discussed on the mailing list once, as far as I can find: > [https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201506.mbox/%3c5591740b.4080...@elyograg.org%3E] > It did not look like there was strong support for preventing the combination. > > I have tested the cursorMark and timeAllowed combination, and even when > partial results are returned because timeAllowed is exceeded, the > cursorMark response value is still valid and reasonable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14413) allow timeAllowed and cursorMark parameters
[ https://issues.apache.org/jira/browse/SOLR-14413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Gallagher updated SOLR-14413: -- Attachment: Screen Shot 2020-10-23 at 10.09.11 PM.png > allow timeAllowed and cursorMark parameters > --- > > Key: SOLR-14413 > URL: https://issues.apache.org/jira/browse/SOLR-14413 > Project: Solr > Issue Type: Improvement > Components: search >Reporter: John Gallagher >Priority: Minor > Attachments: SOLR-14413-bram.patch, SOLR-14413-jg-update1.patch, > SOLR-14413-jg-update2.patch, SOLR-14413.patch, Screen Shot 2020-10-23 at > 10.08.26 PM.png, Screen Shot 2020-10-23 at 10.09.11 PM.png, > image-2020-08-18-16-56-41-736.png, image-2020-08-18-16-56-59-178.png, > image-2020-08-21-14-18-36-229.png, timeallowed_cursormarks_results.txt > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Ever since cursorMarks were introduced in SOLR-5463 in 2014, the cursorMark and > timeAllowed parameters have not been allowed in combination ("Can not search using > both cursorMark and timeAllowed"), from [QueryComponent.java|#L359]: > > {code:java} > > if (null != rb.getCursorMark() && 0 < timeAllowed) { > // fundamentally incompatible > throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "Can not > search using both " + CursorMarkParams.CURSOR_MARK_PARAM + " and " + > CommonParams.TIME_ALLOWED); > } {code} > While theoretically impure to use them in combination, it is often desirable > to support cursorMark-style deep paging while attempting to protect Solr nodes > from runaway queries using timeAllowed, in the hope that most of the time > the query completes in the allotted time and there is no conflict. > > However, if the query takes too long, it may be preferable to end the query, > protect the Solr node, and provide the user with a somewhat inaccurate > sorted list. As noted in SOLR-6930, SOLR-5986 and others, timeAllowed is > frequently used to prevent runaway load. In fact, cursorMark and > shards.tolerant are allowed in combination, so any argument in favor of > purity would be a bit muddied, in my opinion. > > This was discussed on the mailing list once, as far as I can find: > [https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201506.mbox/%3c5591740b.4080...@elyograg.org%3E] > It did not look like there was strong support for preventing the combination. > > I have tested the cursorMark and timeAllowed combination, and even when > partial results are returned because timeAllowed is exceeded, the > cursorMark response value is still valid and reasonable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14413) allow timeAllowed and cursorMark parameters
[ https://issues.apache.org/jira/browse/SOLR-14413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Gallagher updated SOLR-14413: -- Attachment: SOLR-14413-jg-update3.patch > allow timeAllowed and cursorMark parameters > --- > > Key: SOLR-14413 > URL: https://issues.apache.org/jira/browse/SOLR-14413 > Project: Solr > Issue Type: Improvement > Components: search >Reporter: John Gallagher >Priority: Minor > Attachments: SOLR-14413-bram.patch, SOLR-14413-jg-update1.patch, > SOLR-14413-jg-update2.patch, SOLR-14413-jg-update3.patch, SOLR-14413.patch, > Screen Shot 2020-10-23 at 10.08.26 PM.png, Screen Shot 2020-10-23 at 10.09.11 > PM.png, image-2020-08-18-16-56-41-736.png, image-2020-08-18-16-56-59-178.png, > image-2020-08-21-14-18-36-229.png, timeallowed_cursormarks_results.txt > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Ever since cursorMarks were introduced in SOLR-5463 in 2014, the cursorMark and > timeAllowed parameters have not been allowed in combination ("Can not search using > both cursorMark and timeAllowed"), from [QueryComponent.java|#L359]: > > {code:java} > > if (null != rb.getCursorMark() && 0 < timeAllowed) { > // fundamentally incompatible > throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "Can not > search using both " + CursorMarkParams.CURSOR_MARK_PARAM + " and " + > CommonParams.TIME_ALLOWED); > } {code} > While theoretically impure to use them in combination, it is often desirable > to support cursorMark-style deep paging while attempting to protect Solr nodes > from runaway queries using timeAllowed, in the hope that most of the time > the query completes in the allotted time and there is no conflict. > > However, if the query takes too long, it may be preferable to end the query, > protect the Solr node, and provide the user with a somewhat inaccurate > sorted list. As noted in SOLR-6930, SOLR-5986 and others, timeAllowed is > frequently used to prevent runaway load. In fact, cursorMark and > shards.tolerant are allowed in combination, so any argument in favor of > purity would be a bit muddied, in my opinion. > > This was discussed on the mailing list once, as far as I can find: > [https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201506.mbox/%3c5591740b.4080...@elyograg.org%3E] > It did not look like there was strong support for preventing the combination. > > I have tested the cursorMark and timeAllowed combination, and even when > partial results are returned because timeAllowed is exceeded, the > cursorMark response value is still valid and reasonable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14413) allow timeAllowed and cursorMark parameters
[ https://issues.apache.org/jira/browse/SOLR-14413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17220001#comment-17220001 ] John Gallagher commented on SOLR-14413: --- [~bvd] the issue was with an assumption the test was making: it is not always true that every document will be found when using timeAllowed and cursorMark in combination. There may be holes in the result sets, but at least the ordering between and within result sets will be correct with respect to the sort. This is what I had suspected was the case when I proposed allowing the combination, but I didn't have an example at the time. I still think it is a good idea to allow these parameters in combination (it's something you could already encounter when using shards.tolerant and cursorMark in combination, and that combination is allowed). When using timeAllowed and cursorMark in combination, and there are multiple segments in the index, it is possible that a query may terminate before visiting the matching documents in every segment. The hint for this is in the warning message's stack trace associated with the failing seed you found in the previous revision: [https://gist.github.com/slackhappy/1a48d56e10679404cea3441f87a0fecc#file-gistfile1-txt-L6] . "The request took too long to iterate over terms." occurs within a specific segment, which prevents iterating on to the next segment. I have updated my pull request: [https://github.com/apache/lucene-solr/pull/1436] I have also updated my proposed documentation changes to mention that results may be missing if partialResults is true: !Screen Shot 2020-10-23 at 10.08.26 PM.png|width=545,height=114! !Screen Shot 2020-10-23 at 10.09.11 PM.png|width=577,height=161! > allow timeAllowed and cursorMark parameters > --- > > Key: SOLR-14413 > URL: https://issues.apache.org/jira/browse/SOLR-14413 > Project: Solr > Issue Type: Improvement > Components: search >Reporter: John Gallagher >Priority: Minor > Attachments: SOLR-14413-bram.patch, SOLR-14413-jg-update1.patch, > SOLR-14413-jg-update2.patch, SOLR-14413.patch, Screen Shot 2020-10-23 at > 10.08.26 PM.png, Screen Shot 2020-10-23 at 10.09.11 PM.png, > image-2020-08-18-16-56-41-736.png, image-2020-08-18-16-56-59-178.png, > image-2020-08-21-14-18-36-229.png, timeallowed_cursormarks_results.txt > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Ever since cursorMarks were introduced in SOLR-5463 in 2014, the cursorMark and > timeAllowed parameters have not been allowed in combination ("Can not search using > both cursorMark and timeAllowed"), from [QueryComponent.java|#L359]: > > {code:java} > > if (null != rb.getCursorMark() && 0 < timeAllowed) { > // fundamentally incompatible > throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "Can not > search using both " + CursorMarkParams.CURSOR_MARK_PARAM + " and " + > CommonParams.TIME_ALLOWED); > } {code} > While theoretically impure to use them in combination, it is often desirable > to support cursorMark-style deep paging while attempting to protect Solr nodes > from runaway queries using timeAllowed, in the hope that most of the time > the query completes in the allotted time and there is no conflict. > > However, if the query takes too long, it may be preferable to end the query, > protect the Solr node, and provide the user with a somewhat inaccurate > sorted list. As noted in SOLR-6930, SOLR-5986 and others, timeAllowed is > frequently used to prevent runaway load. In fact, cursorMark and > shards.tolerant are allowed in combination, so any argument in favor of > purity would be a bit muddied, in my opinion. > > This was discussed on the mailing list once, as far as I can find: > [https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201506.mbox/%3c5591740b.4080...@elyograg.org%3E] > It did not look like there was strong support for preventing the combination. > > I have tested the cursorMark and timeAllowed combination, and even when > partial results are returned because timeAllowed is exceeded, the > cursorMark response value is still valid and reasonable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
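For illustration only, here is a minimal SolrJ sketch of the combination this issue enables, assuming the behavior described above (cursors stay valid even with partial results); the collection name, sort field, and 1000 ms budget are arbitrary assumptions:

{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.CursorMarkParams;

public class CursorWithTimeAllowed {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/techproducts").build()) {
      String cursor = CursorMarkParams.CURSOR_MARK_START; // "*"
      while (true) {
        SolrQuery q = new SolrQuery("*:*");
        q.setRows(100);
        // cursorMark requires a total sort ending on the uniqueKey field
        q.setSort("id", SolrQuery.ORDER.asc);
        q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
        q.set(CommonParams.TIME_ALLOWED, 1000); // rejected before this change

        QueryResponse rsp = client.query(q);
        if (Boolean.TRUE.equals(rsp.getResponseHeader().get("partialResults"))) {
          // Ordering is still correct, but some matching docs may be skipped.
        }
        String next = rsp.getNextCursorMark();
        if (next.equals(cursor)) {
          break; // cursor did not advance: no more results
        }
        cursor = next;
      }
    }
  }
}
{code}

If {{partialResults}} is true for a page, the ordering is still correct but some matching documents may have been skipped, which is exactly the trade-off described in the comment above.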