[GitHub] [lucene-solr] markharwood commented on issue #1234: Add compression for Binary doc value fields

2020-02-07 Thread GitBox
markharwood commented on issue #1234: Add compression for Binary doc value 
fields
URL: https://github.com/apache/lucene-solr/pull/1234#issuecomment-583313015
 
 
   I've reclaimed my Jira log-in and opened 
https://issues.apache.org/jira/browse/LUCENE-9211


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] romseygeek opened a new pull request #1243: LUCENE-9212: Intervals.multiterm() should take CompiledAutomaton

2020-02-07 Thread GitBox
romseygeek opened a new pull request #1243: LUCENE-9212: Intervals.multiterm() 
should take CompiledAutomaton
URL: https://github.com/apache/lucene-solr/pull/1243
 
 
   Currently it takes `Automaton` and compiles it internally, but we need to
   do things like check for binary-vs-unicode status. It should just take
   `CompiledAutomaton` instead, and put responsibility for determinization,
   binaryness, etc. on the caller.





[GitHub] [lucene-solr] markharwood commented on issue #1234: Add compression for Binary doc value fields

2020-02-07 Thread GitBox
markharwood commented on issue #1234: Add compression for Binary doc value 
fields
URL: https://github.com/apache/lucene-solr/pull/1234#issuecomment-583449275
 
 
   There was a suggestion from @jimczi that we fall back to writing raw data if
   the content doesn't compress well. I'm not sure this logic is worth
   developing, for the reasons outlined below.
   
   I wrote a [compression
   buffer](https://gist.github.com/markharwood/91cc8d96d6611ad97df11f244b1b1d0f)
   to see what the compression algorithm outputs before deciding whether to
   write the compressed or raw data to disk.
   I tested with the most incompressible content I could imagine:
   
       public static void fillRandom(byte[] buffer, int length) {
         for (int i = 0; i < length; i++) {
           buffer[i] = (byte) (Math.random() * Byte.MAX_VALUE);
         }
       }
   
   The LZ4-compressed versions of this content were only marginally bigger than
   their raw counterparts (about 0.4% overhead, e.g. 96,921 compressed vs
   96,541 raw bytes).
   On that basis I'm not sure it's worth doubling the memory cost of the
   indexing logic (we would need a temporary output buffer at least as large as
   the raw data being compressed) and adding extra byte shuffling.
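
For readers following the thread, here is a minimal sketch of the "compress into a temporary buffer, then pick compressed or raw" fallback being weighed above. It is not the code from the gist or the PR: it uses `java.util.zip.Deflater` as a stand-in for Lucene's LZ4, and the `CompressOrRaw` and `Encoded` names are hypothetical. It mainly illustrates where the extra temporary buffer comes from.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

// Hypothetical illustration only; Deflater stands in for LZ4.
class CompressOrRaw {
    /** The bytes chosen for writing, plus a flag a reader would need to pick the decode path. */
    static final class Encoded {
        final boolean compressed;
        final byte[] bytes;

        Encoded(boolean compressed, byte[] bytes) {
            this.compressed = compressed;
            this.bytes = bytes;
        }
    }

    static Encoded encode(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        // This temporary buffer is the extra memory cost discussed above:
        // for incompressible input it grows to at least the raw size.
        ByteArrayOutputStream temp = new ByteArrayOutputStream(raw.length);
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            temp.write(chunk, 0, n);
        }
        deflater.end();
        byte[] compressed = temp.toByteArray();
        // Keep the compressed form only if it is strictly smaller than the raw form.
        return compressed.length < raw.length
            ? new Encoded(true, compressed)
            : new Encoded(false, raw);
    }
}
```

The trade-off is visible in the sketch: the decision can only be made after the whole block has been compressed somewhere other than the final output.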





[GitHub] [lucene-solr] dsmiley commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost by payload

2020-02-07 Thread GitBox
dsmiley commented on a change in pull request #357: [SOLR-12238] Synonym 
Queries boost by payload 
URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376459427
 
 

 ##
 File path: solr/core/src/java/org/apache/solr/schema/TextField.java
 ##
 @@ -43,6 +43,7 @@
 public class TextField extends FieldType {
   protected boolean autoGeneratePhraseQueries;
   protected boolean enableGraphQueries;
+  protected boolean synonymBoostByPayload;
 
 Review comment:
   I thought we switched the approach from a payload to a boost attribute?
   Besides, it's not clear we need this toggle at all, since the user could
   arrange for this behavior simply by having the new DelimitedBoost filter
   in the chain.





[GitHub] [lucene-solr] dsmiley commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost by payload

2020-02-07 Thread GitBox
dsmiley commented on a change in pull request #357: [SOLR-12238] Synonym 
Queries boost by payload 
URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376455962
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/util/QueryBuilder.java
 ##
 @@ -509,33 +549,40 @@ protected Query analyzeGraphBoolean(String field, 
TokenStream source, BooleanCla
 end = articulationPoints[i];
   }
   lastState = end;
-  final Query queryPos;
+  final Query positionalQuery;
   if (graph.hasSidePath(start)) {
-final Iterator<TokenStream> it = graph.getFiniteStrings(start, end);
+final Iterator<TokenStream> sidePathsIterator = graph.getFiniteStrings(start, end);
 Iterator<Query> queries = new Iterator<Query>() {
   @Override
   public boolean hasNext() {
-return it.hasNext();
+return sidePathsIterator.hasNext();
   }
 
   @Override
   public Query next() {
-TokenStream ts = it.next();
-return createFieldQuery(ts, BooleanClause.Occur.MUST, field, 
getAutoGenerateMultiTermSynonymsPhraseQuery(), 0);
+TokenStream sidePath = sidePathsIterator.next();
+return createFieldQuery(sidePath, BooleanClause.Occur.MUST, field, 
getAutoGenerateMultiTermSynonymsPhraseQuery(), 0);
   }
 };
-queryPos = newGraphSynonymQuery(queries);
+positionalQuery = newGraphSynonymQuery(queries);
   } else {
-Term[] terms = graph.getTerms(field, start);
+List<AttributeSource> attributes = graph.getTerms(start);
 
 Review comment:
   I think I mentioned that a List of AttributeSource is weird (I've never seen
   this) and heavyweight. Why not a TokenStream or TermAndBoost[]?





[GitHub] [lucene-solr] dsmiley commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost by payload

2020-02-07 Thread GitBox
dsmiley commented on a change in pull request #357: [SOLR-12238] Synonym 
Queries boost by payload 
URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376450137
 
 

 ##
 File path: 
lucene/analysis/common/src/java/org/apache/lucene/analysis/boost/DelimitedBoostTokenFilter.java
 ##
 @@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.analysis.boost;
+
+import org.apache.lucene.analysis.TokenFilter;
+import org.apache.lucene.analysis.TokenStream;
+import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
+import org.apache.lucene.search.BoostAttribute;
+
+import java.io.IOException;
+
+
+/**
+ * Characters before the delimiter are the "token", those after are the boost.
+ * 
+ * For example, if the delimiter is '|', then for the string "foo|0.7", foo is the token
+ * and 0.7 is the boost.
+ * 
+ * Note make sure your Tokenizer doesn't split on the delimiter, or this won't work
+ */
+public final class DelimitedBoostTokenFilter extends TokenFilter {
+  private final char delimiter;
+  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
+  private final BoostAttribute boostAtt = addAttribute(BoostAttribute.class);
+
+  public DelimitedBoostTokenFilter(TokenStream input, char delimiter) {
+    super(input);
+    this.delimiter = delimiter;
+  }
+
+  @Override
+  public boolean incrementToken() throws IOException {
+    if (input.incrementToken()) {
+      final char[] buffer = termAtt.buffer();
+      final int length = termAtt.length();
+      for (int i = 0; i < length; i++) {
+        if (buffer[i] == delimiter) {
+          float boost = Float.parseFloat(new String(buffer, i + 1, (length - (i + 1))));
+          boostAtt.setBoost(boost);
+          termAtt.setLength(i);
+          return true;
+        }
+      }
+      // we have not seen the delimiter
+      boostAtt.setBoost(1.0f);
 
 Review comment:
   Shouldn't be needed; leave the boost be -- it defaults to 1.0 anyway.
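
As a plain-Java illustration of the behaviour under review (split a token at the delimiter, parse the suffix as the boost, otherwise leave the default of 1.0), here is a minimal sketch with no Lucene dependency. The `DelimitedBoostSketch` class and its `TermAndBoost` holder are hypothetical names used only for this example; they are not the PR's classes.

```java
// Hypothetical, Lucene-free illustration of the delimiter/boost split.
class DelimitedBoostSketch {
    /** Simple holder for the example; not the TermAndBoost type discussed in the PR. */
    static final class TermAndBoost {
        final String term;
        final float boost;

        TermAndBoost(String term, float boost) {
            this.term = term;
            this.boost = boost;
        }
    }

    static TermAndBoost split(String token, char delimiter) {
        int i = token.indexOf(delimiter);
        if (i < 0) {
            // No delimiter seen: keep the token as-is with the default boost.
            return new TermAndBoost(token, 1.0f);
        }
        // Characters after the delimiter are parsed as the boost.
        float boost = Float.parseFloat(token.substring(i + 1));
        return new TermAndBoost(token.substring(0, i), boost);
    }
}
```

For the javadoc's own example, `split("foo|0.7", '|')` yields the term `foo` with boost `0.7f`, while `split("foo", '|')` keeps the default boost.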





[GitHub] [lucene-solr] dsmiley commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost by payload

2020-02-07 Thread GitBox
dsmiley commented on a change in pull request #357: [SOLR-12238] Synonym 
Queries boost by payload 
URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376460611
 
 

 ##
 File path: solr/core/src/test-files/solr/collection1/conf/schema12.xml
 ##
 @@ -238,6 +227,18 @@
 
   
 
+  
 
 Review comment:
   You can remove "payload" everywhere from this PR now; no?





[GitHub] [lucene-solr] alessandrobenedetti commented on issue #357: [SOLR-12238] Synonym Queries boost by payload

2020-02-07 Thread GitBox
alessandrobenedetti commented on issue #357: [SOLR-12238] Synonym Queries boost 
by payload 
URL: https://github.com/apache/lucene-solr/pull/357#issuecomment-583474019
 
 
   Hi @romseygeek, @dsmiley,
   first of all, thank you again for your patience and very useful insights.
   I have incorporated Alan's changes and cleaned everything up.
   
   My unresolved questions:
   - BoostAttribute stores a float directly rather than a BytesRef; is that a
   concern? We expect to use it at query time, so we could actually see a
   minimal query-time benefit from not encoding/decoding.
   - Alan expressed concerns over SpanBoostQuery, mentioning that it is sort of
   broken. What should we do in that regard? Right now span query creation
   seems to work as expected with boosted synonyms (see the related test). I
   suspect that if SpanBoostQuery is broken, it should be resolved in another
   ticket?
   - An original comment in the test code
   org.apache.solr.search.TestSolrQueryParser#testSynonymQueryStyle says:
   "confirm autoGeneratePhraseQueries always builds OR queries"
   I changed that; was there any reason for that behaviour?





[GitHub] [lucene-solr] romseygeek commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost by payload

2020-02-07 Thread GitBox
romseygeek commented on a change in pull request #357: [SOLR-12238] Synonym 
Queries boost by payload 
URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376474333
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/util/QueryBuilder.java
 ##
 @@ -509,33 +549,40 @@ protected Query analyzeGraphBoolean(String field, 
TokenStream source, BooleanCla
 end = articulationPoints[i];
   }
   lastState = end;
-  final Query queryPos;
+  final Query positionalQuery;
   if (graph.hasSidePath(start)) {
-final Iterator<TokenStream> it = graph.getFiniteStrings(start, end);
+final Iterator<TokenStream> sidePathsIterator = graph.getFiniteStrings(start, end);
 Iterator<Query> queries = new Iterator<Query>() {
   @Override
   public boolean hasNext() {
-return it.hasNext();
+return sidePathsIterator.hasNext();
   }
 
   @Override
   public Query next() {
-TokenStream ts = it.next();
-return createFieldQuery(ts, BooleanClause.Occur.MUST, field, 
getAutoGenerateMultiTermSynonymsPhraseQuery(), 0);
+TokenStream sidePath = sidePathsIterator.next();
+return createFieldQuery(sidePath, BooleanClause.Occur.MUST, field, 
getAutoGenerateMultiTermSynonymsPhraseQuery(), 0);
   }
 };
-queryPos = newGraphSynonymQuery(queries);
+positionalQuery = newGraphSynonymQuery(queries);
   } else {
-Term[] terms = graph.getTerms(field, start);
+List<AttributeSource> attributes = graph.getTerms(start);
 
 Review comment:
   This is what GraphTokenStreamFiniteStrings returns currently, for multiple 
tokens at the same position.  Maybe `TermAndBoost[]` would make more sense 
though.





[GitHub] [lucene-solr] romseygeek commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost by payload

2020-02-07 Thread GitBox
romseygeek commented on a change in pull request #357: [SOLR-12238] Synonym 
Queries boost by payload 
URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376473778
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/util/QueryBuilder.java
 ##
 @@ -450,9 +485,13 @@ protected Query analyzePhrase(String field, TokenStream 
stream, int slop) throws
 position += 1;
   }
   builder.add(new Term(field, termAtt.getBytesRef()), position);
+  phraseBoost = boostAtt.getBoost();
 
 Review comment:
   I think this isn't quite right, because we need to combine boosts together 
somehow; currently your phrase boost is just the boost of the last term in the 
phrase.





[GitHub] [lucene-solr] alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost by payload

2020-02-07 Thread GitBox
alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] 
Synonym Queries boost by payload 
URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376476198
 
 

 ##
 File path: solr/core/src/java/org/apache/solr/schema/TextField.java
 ##
 @@ -43,6 +43,7 @@
 public class TextField extends FieldType {
   protected boolean autoGeneratePhraseQueries;
   protected boolean enableGraphQueries;
+  protected boolean synonymBoostByPayload;
 
 Review comment:
   agreed and fixed!





[GitHub] [lucene-solr] alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost by payload

2020-02-07 Thread GitBox
alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] 
Synonym Queries boost by payload 
URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376476976
 
 

 ##
 File path: 
lucene/analysis/common/src/java/org/apache/lucene/analysis/boost/DelimitedBoostTokenFilter.java
 ##
 @@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.analysis.boost;
+
+import org.apache.lucene.analysis.TokenFilter;
+import org.apache.lucene.analysis.TokenStream;
+import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
+import org.apache.lucene.search.BoostAttribute;
+
+import java.io.IOException;
+
+
+/**
+ * Characters before the delimiter are the "token", those after are the boost.
+ * 
+ * For example, if the delimiter is '|', then for the string "foo|0.7", foo is the token
+ * and 0.7 is the boost.
+ * 
+ * Note make sure your Tokenizer doesn't split on the delimiter, or this won't work
+ */
+public final class DelimitedBoostTokenFilter extends TokenFilter {
+  private final char delimiter;
+  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
+  private final BoostAttribute boostAtt = addAttribute(BoostAttribute.class);
+
+  public DelimitedBoostTokenFilter(TokenStream input, char delimiter) {
+    super(input);
+    this.delimiter = delimiter;
+  }
+
+  @Override
+  public boolean incrementToken() throws IOException {
+    if (input.incrementToken()) {
+      final char[] buffer = termAtt.buffer();
+      final int length = termAtt.length();
+      for (int i = 0; i < length; i++) {
+        if (buffer[i] == delimiter) {
+          float boost = Float.parseFloat(new String(buffer, i + 1, (length - (i + 1))));
+          boostAtt.setBoost(boost);
+          termAtt.setLength(i);
+          return true;
+        }
+      }
+      // we have not seen the delimiter
+      boostAtt.setBoost(1.0f);
 
 Review comment:
   Fixed in the upcoming commit.





[GitHub] [lucene-solr] alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost by payload

2020-02-07 Thread GitBox
alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] 
Synonym Queries boost by payload 
URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376478661
 
 

 ##
 File path: solr/core/src/test-files/solr/collection1/conf/schema12.xml
 ##
 @@ -238,6 +227,18 @@
 
   
 
+  
 
 Review comment:
   Fixed in the upcoming commit!





[GitHub] [lucene-solr] alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost by payload

2020-02-07 Thread GitBox
alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] 
Synonym Queries boost by payload 
URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376503587
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/util/QueryBuilder.java
 ##
 @@ -509,33 +549,40 @@ protected Query analyzeGraphBoolean(String field, 
TokenStream source, BooleanCla
 end = articulationPoints[i];
   }
   lastState = end;
-  final Query queryPos;
+  final Query positionalQuery;
   if (graph.hasSidePath(start)) {
-final Iterator<TokenStream> it = graph.getFiniteStrings(start, end);
+final Iterator<TokenStream> sidePathsIterator = graph.getFiniteStrings(start, end);
 Iterator<Query> queries = new Iterator<Query>() {
   @Override
   public boolean hasNext() {
-return it.hasNext();
+return sidePathsIterator.hasNext();
   }
 
   @Override
   public Query next() {
-TokenStream ts = it.next();
-return createFieldQuery(ts, BooleanClause.Occur.MUST, field, 
getAutoGenerateMultiTermSynonymsPhraseQuery(), 0);
+TokenStream sidePath = sidePathsIterator.next();
+return createFieldQuery(sidePath, BooleanClause.Occur.MUST, field, 
getAutoGenerateMultiTermSynonymsPhraseQuery(), 0);
   }
 };
-queryPos = newGraphSynonymQuery(queries);
+positionalQuery = newGraphSynonymQuery(queries);
   } else {
-Term[] terms = graph.getTerms(field, start);
+List<AttributeSource> attributes = graph.getTerms(start);
 
 Review comment:
   A tentative change is coming in the next commit. I also added a few tests to
   cover that else branch.





[GitHub] [lucene-solr] alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost by payload

2020-02-07 Thread GitBox
alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] 
Synonym Queries boost by payload 
URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376513280
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/util/QueryBuilder.java
 ##
 @@ -450,9 +485,13 @@ protected Query analyzePhrase(String field, TokenStream 
stream, int slop) throws
 position += 1;
   }
   builder.add(new Term(field, termAtt.getBytesRef()), position);
+  phraseBoost = boostAtt.getBoost();
 
 Review comment:
   I implemented a simple multiplicative boost.
   It's backward compatible with the designed use case (multi-term synonym ->
   single concept -> single boost, e.g. panthera onca => jaguar|0.95, big
   cat|0.85, black panther|0.65).
   
   But it also works in non-synonym cases, if the user needs a boost per
   token in phrase and span queries.
   It's in the upcoming commit; let me know if you believe something different
   is necessary.
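
To make the multiplicative combination concrete, here is a small sketch. It assumes that each token in a phrase carries its own boost and that tokens without an explicit boost default to 1.0; the `PhraseBoostSketch` class and `combine` method are hypothetical names, not the code from the commit.

```java
import java.util.List;

// Hypothetical illustration of combining per-token boosts by multiplication.
class PhraseBoostSketch {
    /** Multiplies the per-token boosts; an empty phrase keeps the neutral boost of 1.0. */
    static float combine(List<Float> tokenBoosts) {
        float phraseBoost = 1.0f;
        for (float boost : tokenBoosts) {
            phraseBoost *= boost;
        }
        return phraseBoost;
    }
}
```

Under that assumption, the synonym `big cat|0.85` analyzes to `big` (default 1.0) and `cat` (0.85), so the phrase boost stays 0.85, which is why the single-boost synonym case is unchanged. If every token carries a boost, the product of all of them is used.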





[GitHub] [lucene-solr] alessandrobenedetti commented on issue #357: [SOLR-12238] Synonym Queries boost by payload

2020-02-07 Thread GitBox
alessandrobenedetti commented on issue #357: [SOLR-12238] Synonym Queries boost 
by payload 
URL: https://github.com/apache/lucene-solr/pull/357#issuecomment-583518344
 
 
   I have applied the changes to address the feedback points and consequently
   added additional tests to cover some missing scenarios.
   We should be almost ready to go :)





[GitHub] [lucene-solr] msokolov commented on issue #1234: Add compression for Binary doc value fields

2020-02-07 Thread GitBox
msokolov commented on issue #1234: Add compression for Binary doc value fields
URL: https://github.com/apache/lucene-solr/pull/1234#issuecomment-583519622
 
 
   > The LZ4 compressed versions of this content were only marginally bigger 
than their raw counterparts 
   Did you also test read performance in this incompressible case?





[GitHub] [lucene-solr] madrob opened a new pull request #1244: SOLR-14247 Remove unneeded sleeps

2020-02-07 Thread GitBox
madrob opened a new pull request #1244: SOLR-14247 Remove unneeded sleeps
URL: https://github.com/apache/lucene-solr/pull/1244
 
 
   This test is slow because it sleeps a lot. Removing the sleeps, it still 
passes consistently on my machine, but I would like other folks to confirm this 
on their different hardware as well.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [x] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `master` branch.
   - [x] I have run `ant precommit` and the appropriate test suite.
   - [ ] ~I have added tests for my changes.~
   - [ ] ~I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).~
   





[GitHub] [lucene-solr] msokolov edited a comment on issue #1234: Add compression for Binary doc value fields

2020-02-07 Thread GitBox
msokolov edited a comment on issue #1234: Add compression for Binary doc value 
fields
URL: https://github.com/apache/lucene-solr/pull/1234#issuecomment-583519622
 
 
   > The LZ4 compressed versions of this content were only marginally bigger 
than their raw counterparts 
   
   Did you also test read performance in this incompressible case?





[GitHub] [lucene-solr] jpountz commented on issue #1234: Add compression for Binary doc value fields

2020-02-07 Thread GitBox
jpountz commented on issue #1234: Add compression for Binary doc value fields
URL: https://github.com/apache/lucene-solr/pull/1234#issuecomment-583529199
 
 
   In the case of content that can't be compressed, the compressed data will 
consist of the number of bytes, followed by the bytes. So decompressing 
consists of decoding the length and then reading the bytes. The only overhead 
compared to reading bytes directly is the decoding of the number of bytes, so I 
would believe that the overhead is rather small.
   
   I don't have a strong preference regarding whether this case should be 
handled explicitly or not. It's true that not having a special "not-compressed" 
case helps keep the logic simpler.
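
The "number of bytes, followed by the bytes" layout can be sketched as follows. This assumes the standard LZ4 block format, in which incompressible input becomes a single literal-only sequence: one token byte, length-extension bytes, then the literals. The `Lz4LiteralOnlySketch` class is a hypothetical illustration, not Lucene's LZ4 code.

```java
// Hypothetical sketch of a literal-only LZ4 sequence (standard block format assumed).
class Lz4LiteralOnlySketch {
    /** Size of n literals encoded as one literal-only sequence. */
    static int encodedSize(int n) {
        int size = 1; // token byte: literal length in the high nibble
        if (n >= 15) {
            size += (n - 15) / 255 + 1; // runs of 255, then a final byte below 255
        }
        return size + n;
    }

    static byte[] encode(byte[] literals) {
        int n = literals.length;
        byte[] out = new byte[encodedSize(n)];
        int pos = 0;
        out[pos++] = (byte) (Math.min(n, 15) << 4); // low nibble 0: no match follows
        if (n >= 15) {
            int rest = n - 15;
            while (rest >= 255) {
                out[pos++] = (byte) 255;
                rest -= 255;
            }
            out[pos++] = (byte) rest;
        }
        System.arraycopy(literals, 0, out, pos, n);
        return out;
    }

    /** Decoding is just: decode the length, then copy the bytes. */
    static byte[] decode(byte[] block) {
        int pos = 0;
        int n = (block[pos++] & 0xFF) >>> 4;
        if (n == 15) {
            int b;
            do {
                b = block[pos++] & 0xFF;
                n += b;
            } while (b == 255);
        }
        byte[] literals = new byte[n];
        System.arraycopy(block, pos, literals, 0, n);
        return literals;
    }
}
```

Under this assumption, `encodedSize(96541)` is 96,921, matching the figures quoted earlier in the thread, so the roughly 0.4% overhead is about one length byte per 255 bytes of content.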





[GitHub] [lucene-solr] markharwood commented on issue #1234: Add compression for Binary doc value fields

2020-02-07 Thread GitBox
markharwood commented on issue #1234: Add compression for Binary doc value 
fields
URL: https://github.com/apache/lucene-solr/pull/1234#issuecomment-583529462
 
 
   >Did you also test read performance in this incompressible case?
   
   Just tried it, and reading raw random bytes does look 4x faster than
   reading compressed random bytes.
   





[GitHub] [lucene-solr] rmuir commented on issue #1236: Add back assertions removed by LUCENE-9187.

2020-02-07 Thread GitBox
rmuir commented on issue #1236: Add back assertions removed by LUCENE-9187.
URL: https://github.com/apache/lucene-solr/pull/1236#issuecomment-583534489
 
 
   +1, thanks





[GitHub] [lucene-solr] jpountz commented on a change in pull request #1234: Add compression for Binary doc value fields

2020-02-07 Thread GitBox
jpountz commented on a change in pull request #1234: Add compression for Binary 
doc value fields
URL: https://github.com/apache/lucene-solr/pull/1234#discussion_r376528169
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java
 ##
 @@ -742,6 +755,131 @@ public BytesRef binaryValue() throws IOException {
 };
   }
 }
+  }  
+  
+  // Decompresses blocks of binary values to retrieve content
+  class BinaryDecoder {
+
+private final LongValues addresses;
+private final IndexInput compressedData;
+// Cache of last uncompressed block 
+private long lastBlockId = -1;
+private int []uncompressedDocEnds = new 
int[Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK];
+private int uncompressedBlockLength = 0;
+private int numDocsInBlock = 0;
+private final byte[] uncompressedBlock;
+private final BytesRef uncompressedBytesRef;
+
+public BinaryDecoder(LongValues addresses, IndexInput compressedData, int 
biggestUncompressedBlockSize) {
+  super();
+  this.addresses = addresses;
+  this.compressedData = compressedData;
+  // pre-allocate a byte array large enough for the biggest uncompressed 
block needed.
+  this.uncompressedBlock = new byte[biggestUncompressedBlockSize];
+  uncompressedBytesRef = new BytesRef(uncompressedBlock);
+  
+}
+
+BytesRef decode(int docNumber) throws IOException {
+  int blockId = docNumber >> Lucene80DocValuesFormat.BINARY_BLOCK_SHIFT; 
+  int docInBlockId = docNumber % 
Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK;
+  assert docInBlockId < 
Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK;
+  
+  
+  // already read and uncompressed?
+  if (blockId != lastBlockId) {
+lastBlockId = blockId;
+long blockStartOffset = addresses.get(blockId);
+compressedData.seek(blockStartOffset);
+
+numDocsInBlock = compressedData.readVInt();
 
 Review comment:
   do we really need to record the number of documents in the block? It should 
be 32 for all blocks except for the last one? Maybe at index-time we could 
append dummy values to the last block to make sure it has 32 values too, and we 
wouldn't need this vInt anymore?
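
   A minimal sketch of the padding idea above, assuming 32 documents per block (a shift of 5, as the comment implies). The class and method names are hypothetical and not part of the PR:

```java
public class BlockPadding {
  // Assumed values: the review comment implies 32 documents per block.
  static final int BLOCK_SHIFT = 5;
  static final int DOCS_PER_BLOCK = 1 << BLOCK_SHIFT;

  /** Number of dummy values to append so that the last block is full. */
  static int paddingFor(int numValues) {
    int remainder = numValues & (DOCS_PER_BLOCK - 1);
    return remainder == 0 ? 0 : DOCS_PER_BLOCK - remainder;
  }

  public static void main(String[] args) {
    int numValues = 70; // two full blocks plus 6 values
    int padding = paddingFor(numValues);
    // After padding, every block holds exactly DOCS_PER_BLOCK values,
    // so no per-block count needs to be stored.
    System.out.println("padding=" + padding
        + " blocks=" + ((numValues + padding) >> BLOCK_SHIFT));
  }
}
```

   The trade-off would be a few wasted length entries in the final block in exchange for one less vInt per block.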





[GitHub] [lucene-solr] jpountz commented on a change in pull request #1234: Add compression for Binary doc value fields

2020-02-07 Thread GitBox
jpountz commented on a change in pull request #1234: Add compression for Binary 
doc value fields
URL: https://github.com/apache/lucene-solr/pull/1234#discussion_r376531952
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java
 ##
 @@ -742,6 +755,131 @@ public BytesRef binaryValue() throws IOException {
 };
   }
 }
+  }  
+  
+  // Decompresses blocks of binary values to retrieve content
+  class BinaryDecoder {
+
+private final LongValues addresses;
+private final IndexInput compressedData;
+// Cache of last uncompressed block 
+private long lastBlockId = -1;
+private int []uncompressedDocEnds = new 
int[Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK];
 
 Review comment:
   in the past we've put these constants in the meta file and BinaryEntry so 
that it's easier to change values over time





[GitHub] [lucene-solr] jpountz commented on a change in pull request #1234: Add compression for Binary doc value fields

2020-02-07 Thread GitBox
jpountz commented on a change in pull request #1234: Add compression for Binary 
doc value fields
URL: https://github.com/apache/lucene-solr/pull/1234#discussion_r376527753
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java
 ##
 @@ -742,6 +755,131 @@ public BytesRef binaryValue() throws IOException {
 };
   }
 }
+  }  
+  
+  // Decompresses blocks of binary values to retrieve content
+  class BinaryDecoder {
+
+private final LongValues addresses;
+private final IndexInput compressedData;
+// Cache of last uncompressed block 
+private long lastBlockId = -1;
+private int []uncompressedDocEnds = new 
int[Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK];
+private int uncompressedBlockLength = 0;
+private int numDocsInBlock = 0;
+private final byte[] uncompressedBlock;
+private final BytesRef uncompressedBytesRef;
+
+public BinaryDecoder(LongValues addresses, IndexInput compressedData, int 
biggestUncompressedBlockSize) {
+  super();
+  this.addresses = addresses;
+  this.compressedData = compressedData;
+  // pre-allocate a byte array large enough for the biggest uncompressed 
block needed.
+  this.uncompressedBlock = new byte[biggestUncompressedBlockSize];
+  uncompressedBytesRef = new BytesRef(uncompressedBlock);
+  
+}
+
+BytesRef decode(int docNumber) throws IOException {
+  int blockId = docNumber >> Lucene80DocValuesFormat.BINARY_BLOCK_SHIFT; 
+  int docInBlockId = docNumber % 
Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK;
+  assert docInBlockId < 
Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK;
+  
+  
+  // already read and uncompressed?
+  if (blockId != lastBlockId) {
+lastBlockId = blockId;
+long blockStartOffset = addresses.get(blockId);
+compressedData.seek(blockStartOffset);
+
+numDocsInBlock = compressedData.readVInt();
+assert numDocsInBlock <= 
Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK;
+uncompressedDocEnds = new int[numDocsInBlock];
+uncompressedBlockLength = 0;
+
+int onlyLength = -1;
+for (int i = 0; i < numDocsInBlock; i++) {
+  if (i == 0) {
+// The first length value is special. It is shifted and has a bit 
to denote if
+// all other values are the same length
+int lengthPlusSameInd = compressedData.readVInt();
+int sameIndicator = lengthPlusSameInd & 1;
+int firstValLength = lengthPlusSameInd >>1;
+if (sameIndicator == 1) {
+  onlyLength = firstValLength;
+}
+uncompressedBlockLength += firstValLength;
+  } else {
+if (onlyLength == -1) {
+  // Various lengths are stored - read each from disk
+  uncompressedBlockLength += compressedData.readVInt();
+} else {
+  // Only one length 
+  uncompressedBlockLength += onlyLength;
+}
+  }
+  uncompressedDocEnds[i] = uncompressedBlockLength;
 
 Review comment:
   maybe we could call it `uncompressedDocStarts` and set the index at `i+1`, 
which would then help to remove the else block of the `docInBlockId > 0` 
condition below?
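
   A sketch of what the suggested start-offset layout could look like, written as standalone Java rather than against the PR's classes. The names `DocStarts` and `docStarts` are hypothetical:

```java
public class DocStarts {
  /**
   * Builds start offsets from per-document lengths: starts[i] is where
   * value i begins, and starts[lengths.length] is the total block length.
   */
  static int[] docStarts(int[] lengths) {
    int[] starts = new int[lengths.length + 1];
    for (int i = 0; i < lengths.length; i++) {
      starts[i + 1] = starts[i] + lengths[i];
    }
    return starts;
  }

  public static void main(String[] args) {
    int[] starts = docStarts(new int[] {3, 0, 5});
    int doc = 2;
    // No special case for doc 0: starts[0] is always 0.
    int offset = starts[doc];
    int length = starts[doc + 1] - starts[doc];
    System.out.println("offset=" + offset + " length=" + length);
  }
}
```

   Because the array has one extra slot, the offset and length of any document come from the same two lookups, which is what makes the `docInBlockId > 0` branch unnecessary.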





[GitHub] [lucene-solr] jpountz commented on a change in pull request #1234: Add compression for Binary doc value fields

2020-02-07 Thread GitBox
jpountz commented on a change in pull request #1234: Add compression for Binary 
doc value fields
URL: https://github.com/apache/lucene-solr/pull/1234#discussion_r376529195
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java
 ##
 @@ -742,6 +755,131 @@ public BytesRef binaryValue() throws IOException {
 };
   }
 }
+  }  
+  
+  // Decompresses blocks of binary values to retrieve content
+  class BinaryDecoder {
+
+private final LongValues addresses;
+private final IndexInput compressedData;
+// Cache of last uncompressed block 
+private long lastBlockId = -1;
+private int []uncompressedDocEnds = new 
int[Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK];
+private int uncompressedBlockLength = 0;
+private int numDocsInBlock = 0;
+private final byte[] uncompressedBlock;
+private final BytesRef uncompressedBytesRef;
+
+public BinaryDecoder(LongValues addresses, IndexInput compressedData, int 
biggestUncompressedBlockSize) {
+  super();
+  this.addresses = addresses;
+  this.compressedData = compressedData;
+  // pre-allocate a byte array large enough for the biggest uncompressed 
block needed.
+  this.uncompressedBlock = new byte[biggestUncompressedBlockSize];
+  uncompressedBytesRef = new BytesRef(uncompressedBlock);
+  
+}
+
+BytesRef decode(int docNumber) throws IOException {
+  int blockId = docNumber >> Lucene80DocValuesFormat.BINARY_BLOCK_SHIFT; 
+  int docInBlockId = docNumber % 
Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK;
+  assert docInBlockId < 
Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK;
+  
+  
+  // already read and uncompressed?
+  if (blockId != lastBlockId) {
+lastBlockId = blockId;
+long blockStartOffset = addresses.get(blockId);
+compressedData.seek(blockStartOffset);
+
+numDocsInBlock = compressedData.readVInt();
+assert numDocsInBlock <= 
Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK;
+uncompressedDocEnds = new int[numDocsInBlock];
+uncompressedBlockLength = 0;
+
+int onlyLength = -1;
+for (int i = 0; i < numDocsInBlock; i++) {
+  if (i == 0) {
+// The first length value is special. It is shifted and has a bit 
to denote if
+// all other values are the same length
+int lengthPlusSameInd = compressedData.readVInt();
+int sameIndicator = lengthPlusSameInd & 1;
+int firstValLength = lengthPlusSameInd >>1;
 
 Review comment:
   Since you are stealing a bit, we should do an unsigned shift (`>>>`) instead.
   
   This would never be a problem in practice, but imagine that the length was a 
31-bit integer. Shifting left by one bit at index time would make this 
number negative. So here we need an unsigned shift rather than a signed shift 
that preserves the sign.
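
   To illustrate the difference, here is a small standalone sketch. The `encode`/`decode` helpers are hypothetical and only mimic the bit-stealing described above:

```java
public class StolenBit {
  /** Packs a length and a one-bit flag into a single int. */
  static int encode(int length, boolean allSameLength) {
    return (length << 1) | (allSameLength ? 1 : 0);
  }

  /** Signed shift: copies the sign bit, so a negative input stays negative. */
  static int decodeSigned(int encoded) {
    return encoded >> 1;
  }

  /** Unsigned shift: shifts in a zero, recovering the original length. */
  static int decodeUnsigned(int encoded) {
    return encoded >>> 1;
  }

  public static void main(String[] args) {
    int length = 1 << 30; // a length that needs 31 bits
    int encoded = encode(length, true); // the top bit is now set
    System.out.println("encoded  = " + encoded);
    System.out.println("signed   = " + decodeSigned(encoded));
    System.out.println("unsigned = " + decodeUnsigned(encoded));
  }
}
```

   With `>>` the decoded length comes back negative, while `>>>` returns the original value.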





[GitHub] [lucene-solr] jpountz commented on a change in pull request #1234: Add compression for Binary doc value fields

2020-02-07 Thread GitBox
jpountz commented on a change in pull request #1234: Add compression for Binary 
doc value fields
URL: https://github.com/apache/lucene-solr/pull/1234#discussion_r376532189
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java
 ##
 @@ -742,6 +755,131 @@ public BytesRef binaryValue() throws IOException {
 };
   }
 }
+  }  
+  
+  // Decompresses blocks of binary values to retrieve content
+  class BinaryDecoder {
+
+private final LongValues addresses;
+private final IndexInput compressedData;
+// Cache of last uncompressed block 
+private long lastBlockId = -1;
+private int []uncompressedDocEnds = new 
int[Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK];
+private int uncompressedBlockLength = 0;
+private int numDocsInBlock = 0;
+private final byte[] uncompressedBlock;
+private final BytesRef uncompressedBytesRef;
+
+public BinaryDecoder(LongValues addresses, IndexInput compressedData, int 
biggestUncompressedBlockSize) {
+  super();
+  this.addresses = addresses;
+  this.compressedData = compressedData;
+  // pre-allocate a byte array large enough for the biggest uncompressed 
block needed.
+  this.uncompressedBlock = new byte[biggestUncompressedBlockSize];
+  uncompressedBytesRef = new BytesRef(uncompressedBlock);
+  
+}
+
+BytesRef decode(int docNumber) throws IOException {
+  int blockId = docNumber >> Lucene80DocValuesFormat.BINARY_BLOCK_SHIFT; 
+  int docInBlockId = docNumber % 
Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK;
+  assert docInBlockId < 
Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK;
+  
+  
+  // already read and uncompressed?
+  if (blockId != lastBlockId) {
+lastBlockId = blockId;
+long blockStartOffset = addresses.get(blockId);
+compressedData.seek(blockStartOffset);
+
+numDocsInBlock = compressedData.readVInt();
+assert numDocsInBlock <= 
Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK;
+uncompressedDocEnds = new int[numDocsInBlock];
 
 Review comment:
   can we reuse the same array across blocks?





[GitHub] [lucene-solr] jpountz commented on issue #1234: Add compression for Binary doc value fields

2020-02-07 Thread GitBox
jpountz commented on issue #1234: Add compression for Binary doc value fields
URL: https://github.com/apache/lucene-solr/pull/1234#issuecomment-583536606
 
 
   @msokolov FWIW LZ4 only removes duplicate strings from a stream: when it 
finds one it inserts a reference to a previous sequence of bytes. In the 
special case that the content is incompressible, the LZ4 compressed data just 
consists of the number of bytes followed by the bytes, so the only overhead 
compared to reading the bytes directly is the decoding of the number of bytes, 
which should be rather low.
   
   I don't have a preference regarding whether we should have an explicit 
"not-compressed" case, but I understand how not having one helps keep things 
simpler.





[GitHub] [lucene-solr] msokolov commented on issue #1234: Add compression for Binary doc value fields

2020-02-07 Thread GitBox
msokolov commented on issue #1234: Add compression for Binary doc value fields
URL: https://github.com/apache/lucene-solr/pull/1234#issuecomment-583538389
 
 
   Strange that Mark would measure 4x slowdown from decoding the lengths... 
Perhaps the random bytes are not totally incompressible, just barely 
compressible? 





[GitHub] [lucene-solr] markharwood commented on issue #1234: Add compression for Binary doc value fields

2020-02-07 Thread GitBox
markharwood commented on issue #1234: Add compression for Binary doc value 
fields
URL: https://github.com/apache/lucene-solr/pull/1234#issuecomment-583539216
 
 
   >Strange that Mark would measure 4x slowdown from decoding the lengths... 
Perhaps the random bytes are not totally incompressible, just barely 
compressible?
   
   I may have been too hasty in that reply - I've not been able to reproduce 
that and the timings are very similar in the additional tests I've done so echo 
what @jpountz expects





[GitHub] [lucene-solr] markharwood edited a comment on issue #1234: Add compression for Binary doc value fields

2020-02-07 Thread GitBox
markharwood edited a comment on issue #1234: Add compression for Binary doc 
value fields
URL: https://github.com/apache/lucene-solr/pull/1234#issuecomment-583539216
 
 
   >Strange that Mark would measure 4x slowdown from decoding the lengths... 
Perhaps the random bytes are not totally incompressible, just barely 
compressible?
   
   I may have been too hasty in that reply - I've not been able to reproduce 
that and the timings are very similar in the additional tests I've done so echo 
what @jpountz expects. My first (faster) run had random bytes selected in the 
range 0-20 and not the 0-127 range where I'm seeing parity





[GitHub] [lucene-solr] iverase merged pull request #1224: LUCENE-9194: Simplify XYShapeQuery API

2020-02-07 Thread GitBox
iverase merged pull request #1224: LUCENE-9194: Simplify XYShapeQuery API
URL: https://github.com/apache/lucene-solr/pull/1224
 
 
   





[GitHub] [lucene-solr] markharwood edited a comment on issue #1234: Add compression for Binary doc value fields

2020-02-07 Thread GitBox
markharwood edited a comment on issue #1234: Add compression for Binary doc 
value fields
URL: https://github.com/apache/lucene-solr/pull/1234#issuecomment-583539216
 
 
   >Strange that Mark would measure 4x slowdown from decoding the lengths... 
Perhaps the random bytes are not totally incompressible, just barely 
compressible?
   
   I may have been too hasty in that reply - I've not been able to reproduce 
that and the raw vs compressed timings are very similar in the additional tests 
I've done so echo what @jpountz expects. My first (faster) run had random bytes 
selected in the range 0-20 and not the 0-127 range where I'm seeing parity
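
   A rough way to see that random bytes drawn from a small range are not incompressible. This sketch uses `java.util.zip.Deflater` as a stand-in, which is an assumption made only to keep the example self-contained: Deflate is not LZ4, and its Huffman stage exploits a small alphabet in a way LZ4 does not, so the sizes say nothing about LZ4 ratios or read timings.

```java
import java.util.Random;
import java.util.zip.Deflater;

public class RandomCompressibility {
  /** Random bytes with values in [0, bound). */
  static byte[] randomBytes(int count, int bound, long seed) {
    Random random = new Random(seed);
    byte[] bytes = new byte[count];
    for (int i = 0; i < count; i++) {
      bytes[i] = (byte) random.nextInt(bound);
    }
    return bytes;
  }

  /** Size in bytes of the Deflate-compressed form of the input. */
  static int deflatedSize(byte[] input) {
    Deflater deflater = new Deflater();
    deflater.setInput(input);
    deflater.finish();
    byte[] buffer = new byte[8192];
    int total = 0;
    while (!deflater.finished()) {
      total += deflater.deflate(buffer);
    }
    deflater.end();
    return total;
  }

  public static void main(String[] args) {
    byte[] narrow = randomBytes(1 << 16, 21, 42L);  // values 0-20
    byte[] wide = randomBytes(1 << 16, 128, 42L);   // values 0-127
    System.out.println("narrow range: " + deflatedSize(narrow) + " bytes");
    System.out.println("wide range:   " + deflatedSize(wide) + " bytes");
  }
}
```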





[GitHub] [lucene-solr] anshumg opened a new pull request #1245: Create gradle precommit action

2020-02-07 Thread GitBox
anshumg opened a new pull request #1245: Create gradle precommit action
URL: https://github.com/apache/lucene-solr/pull/1245
 
 
   This adds a gradle precommit action w/ Java11 for all branches.





[GitHub] [lucene-solr] asfgit merged pull request #1182: LUCENE-9149: Increase data dimension limit in BKD

2020-02-07 Thread GitBox
asfgit merged pull request #1182: LUCENE-9149: Increase data dimension limit in 
BKD
URL: https://github.com/apache/lucene-solr/pull/1182
 
 
   





[GitHub] [lucene-solr] anshumg merged pull request #1245: LUCENE-9146: Create gradle precommit action

2020-02-07 Thread GitBox
anshumg merged pull request #1245: LUCENE-9146: Create gradle precommit action
URL: https://github.com/apache/lucene-solr/pull/1245
 
 
   





[GitHub] [lucene-solr] uschindler commented on issue #1242: LUCENE-9201: Port documentation-lint task to Gradle build

2020-02-08 Thread GitBox
uschindler commented on issue #1242: LUCENE-9201: Port documentation-lint task 
to Gradle build
URL: https://github.com/apache/lucene-solr/pull/1242#issuecomment-583751044
 
 
   The task should just be defined for each sourceSet. Then tests and compile 
work automatically. Gradle will automatically add 2 tasks (one for each 
sourceSet): ecjLintMain and ecjLintTest (if you use ecjLint as the base name). To 
set this up, ask Gradle for the current sourceSets and generate a task with an 
automatic name based on the SourceSet name. The classpath is provided gratis.
   
   See e.g. Gradle's internal tasks or the forbiddenapis source code for how those 
tasks should be declared. The way seen here is not in line with the model behind 
Gradle (you define tasks per sourceSet, so it's extensible).
   
   sourceSet by the way also has source target and/or release version.
   Thi





[GitHub] [lucene-solr] uschindler edited a comment on issue #1242: LUCENE-9201: Port documentation-lint task to Gradle build

2020-02-08 Thread GitBox
uschindler edited a comment on issue #1242: LUCENE-9201: Port 
documentation-lint task to Gradle build
URL: https://github.com/apache/lucene-solr/pull/1242#issuecomment-583751044
 
 
   The task should just be defined for each sourceSet. Then tests and compile 
work automatically. Gradle will automatically add 2 tasks (one for each 
sourceSet): ecjLintMain and ecjLintTest (if you use ecjLint as the base name). To 
set this up, ask Gradle for the current sourceSets and generate a task with an 
automatic name based on the SourceSet name. The classpath is provided gratis.
   
   See e.g. Gradle's internal tasks or the forbiddenapis source code for how those 
tasks should be declared. The way seen here is not in line with the model behind 
Gradle (you define tasks per sourceSet, so it's extensible, e.g. if we add new 
sourceSets when building multi-release jars for some modules).
   
   sourceSet by the way also has source target and/or release version.
   Thi





[GitHub] [lucene-solr] uschindler commented on issue #1242: LUCENE-9201: Port documentation-lint task to Gradle build

2020-02-08 Thread GitBox
uschindler commented on issue #1242: LUCENE-9201: Port documentation-lint task 
to Gradle build
URL: https://github.com/apache/lucene-solr/pull/1242#issuecomment-583753281
 
 
   Here is the forbiddenapis example of how to set up a task per sourceSet: 
https://github.com/policeman-tools/forbidden-apis/blob/master/src/main/resources/de/thetaphi/forbiddenapis/gradle/plugin-init.groovy#L42





[GitHub] [lucene-solr] risdenk closed pull request #1209: SOLR-14209: Upgrade JQuery to 3.4.1

2020-02-08 Thread GitBox
risdenk closed pull request #1209: SOLR-14209: Upgrade JQuery to 3.4.1
URL: https://github.com/apache/lucene-solr/pull/1209
 
 
   





[GitHub] [lucene-solr] risdenk commented on issue #591: SOLR-9840: Add a unit test for LDAP integration (Hrishikesh Gadre, Kevin Risden)

2020-02-08 Thread GitBox
risdenk commented on issue #591: SOLR-9840: Add a unit test for LDAP 
integration (Hrishikesh Gadre, Kevin Risden)
URL: https://github.com/apache/lucene-solr/pull/591#issuecomment-583764671
 
 
   working on rebasing to latest master to make sure still valid.





[GitHub] [lucene-solr] dsmiley closed pull request #1202: SOLR-14149: CHANGES.txt Remove off-topic stuff

2020-02-08 Thread GitBox
dsmiley closed pull request #1202: SOLR-14149: CHANGES.txt Remove off-topic 
stuff
URL: https://github.com/apache/lucene-solr/pull/1202
 
 
   





[GitHub] [lucene-solr] dsmiley commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost

2020-02-08 Thread GitBox
dsmiley commented on a change in pull request #357: [SOLR-12238] Synonym 
Queries boost
URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376753387
 
 

 ##
 File path: 
lucene/analysis/common/src/java/org/apache/lucene/analysis/boost/DelimitedBoostTokenFilter.java
 ##
 @@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.analysis.boost;
+
+import org.apache.lucene.analysis.TokenFilter;
+import org.apache.lucene.analysis.TokenStream;
+import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
+import org.apache.lucene.search.BoostAttribute;
+
+import java.io.IOException;
+
+
+/**
+ * Characters before the delimiter are the "token", those after are the boost.
+ * 
+ * For example, if the delimiter is '|', then for the string "foo|0.7", foo is 
the token
+ * and 0.7 is the boost.
+ * 
+ * Note make sure your Tokenizer doesn't split on the delimiter, or this won't 
work
+ */
+public final class DelimitedBoostTokenFilter extends TokenFilter {
+  private final char delimiter;
+  private final CharTermAttribute termAtt = 
addAttribute(CharTermAttribute.class);
+  private final BoostAttribute boostAtt = addAttribute(BoostAttribute.class);
+
+  public DelimitedBoostTokenFilter(TokenStream input, char delimiter) {
+super(input);
+this.delimiter = delimiter;
+  }
+
+  @Override
+  public boolean incrementToken() throws IOException {
+if (input.incrementToken()) {
+  final char[] buffer = termAtt.buffer();
+  final int length = termAtt.length();
+  for (int i = 0; i < length; i++) {
+if (buffer[i] == delimiter) {
+  float boost = Float.parseFloat(new String(buffer, i + 1, (length - (i + 1))));
+  boostAtt.setBoost(boost);
+  termAtt.setLength(i);
+  return true;
+}
+  }
+  return true;
+} else return false;
 
 Review comment:
   I know this is a minor matter of taste but please put brackets on the false 
side of the else with the code on its own line.  This is for consistency with 
our de facto code style in the project.
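
   For readers skimming the diff above, the delimiter-splitting step can be sketched in isolation, without any Lucene classes. The names `DelimitedBoostSketch`, `TokenAndBoost` and `split` are hypothetical, and the default boost of 1.0 for terms without a delimiter is an assumption of this sketch:

```java
public class DelimitedBoostSketch {
  /** Hypothetical holder for the two halves of a delimited term. */
  static final class TokenAndBoost {
    final String token;
    final float boost;

    TokenAndBoost(String token, float boost) {
      this.token = token;
      this.boost = boost;
    }
  }

  /** Splits on the first delimiter: text before it is the token, text after it the boost. */
  static TokenAndBoost split(String term, char delimiter) {
    int i = term.indexOf(delimiter);
    if (i < 0) {
      // Assumed default when no delimiter is present.
      return new TokenAndBoost(term, 1.0f);
    }
    float boost = Float.parseFloat(term.substring(i + 1));
    return new TokenAndBoost(term.substring(0, i), boost);
  }

  public static void main(String[] args) {
    TokenAndBoost result = split("foo|0.7", '|');
    System.out.println(result.token + " boost=" + result.boost);
  }
}
```

   As the Javadoc in the diff notes, this only works if the tokenizer upstream does not itself split on the delimiter.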





[GitHub] [lucene-solr] dsmiley commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost

2020-02-08 Thread GitBox
dsmiley commented on a change in pull request #357: [SOLR-12238] Synonym 
Queries boost
URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376754130
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/util/graph/GraphTokenStreamFiniteStrings.java
 ##
 @@ -124,6 +126,15 @@ public boolean hasSidePath(int state) {
 .toArray(Term[]::new);
   }
 
+  /**
+   * Returns the list of terms that start at the provided state
+   */
+  public QueryBuilder.TermAndBoost[] getTermsAndBoosts(String field, int state) {
 
 Review comment:
   Given that this class, GraphTokenStreamFiniteStrings deals with 
List (something I did not know when I made a previous 
comment), and also that TermAndBoost is an inner class to QueryBuilder, I think 
it's better to put this back into QueryBuilder.  I still think 
`List` is weird and heavyweight but you didn't add it.





[GitHub] [lucene-solr] dsmiley commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost

2020-02-08 Thread GitBox
dsmiley commented on a change in pull request #357: [SOLR-12238] Synonym 
Queries boost
URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376753645
 
 

 ##
 File path: 
lucene/analysis/common/src/java/org/apache/lucene/analysis/boost/package-info.java
 ##
 @@ -0,0 +1,21 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * Provides various convenience classes for creating boosts on Tokens.
+ */
+package org.apache.lucene.analysis.boost;
 
 Review comment:
   While I can see why you chose a new "boost" sub-package because the payload 
based filter from which you drew inspiration was in a "payload" sub-package, I 
lean towards the "miscellaneous" package.  Note that 
DelimitedTermFrequencyTokenFilter is in "miscellaneous" too.  WDYT @romseygeek 
?  Or maybe we need a new "delimited" sub-package for all these to go; I dunno.





[GitHub] [lucene-solr] ErickErickson commented on issue #1169: LUCENE-9004: A minor feature and patch -- support deleting vector values and fix segments merging

2020-02-09 Thread GitBox
ErickErickson commented on issue #1169: LUCENE-9004: A minor feature and patch 
-- support deleting vector values and fix segments merging
URL: https://github.com/apache/lucene-solr/pull/1169#issuecomment-583848170
 
 
   Julie:
   
   Moving the conversation about forceMerge over from the JIRA as per Julie.
   
   I can imagine ways to shorten the merge process, but it'll still take quite 
a long time. My main concern was that I didn't know if the problem Julie was 
talking about was functional or not. So it sounds like the issue is "just" 
performance.
   
   Ways to shorten it: First, I'm assuming you're using TieredMergePolicy, 
which is the default. The forceMerge(1) option _may_ rewrite any given segment 
multiple times. There's a limit of 30 segments merged at any given time, see 
maxMergeAtOnceExplicit. So say you have 300 segments, first you'd have 10 
merges of 30 segments in the first pass, then another merge of the resulting 
segments. Each pass is a complete rewrite of the entire index. Depending on the 
number of segments, there could be more passes. That limit is mainly there so 
forceMerge doesn't consume too many resources if, say, indexing or searching 
are going on, but in your case I'd guess you don't care about that. So you 
could set it to a very large number and get it done in a single pass.
   
   I think that's about the most savings you'd get. I don't know (and haven't 
measured) whether merging 150 small segments totaling 300G in a single pass is 
any slower or faster than merging 10 segments totaling 300G. If you wanted to 
try that, you could set maxMergedSegmentMB, which would simply do more merging in 
the background during indexing to produce fewer, larger segments. Like I said, 
though, I don't think this will make any difference.
   
   So my guess is that if you bump maxMergeAtOnceExplicit to a very large 
number, you'll cut your merge time in half (or a third or quarter, or... 
depending on the number of passes). It'll still take considerable time, but may 
be acceptable.
   
   Best,
   Erick
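
The pass arithmetic in the message above can be checked with a few lines of plain Java. This is only a simplified model, assuming every pass merges groups of at most `maxMergeAtOnce` segments until one segment remains; the real TieredMergePolicy also weighs segment sizes and deletes, which this ignores. The class and method names are hypothetical.

```java
/** Simplified model of forceMerge(1) passes; not Lucene code. */
final class ForceMergePassEstimate {

  /**
   * Counts how many full passes are needed to reduce {@code segmentCount}
   * segments to one when each merge combines at most {@code maxMergeAtOnce}.
   */
  static int passes(int segmentCount, int maxMergeAtOnce) {
    if (segmentCount < 1 || maxMergeAtOnce < 2) {
      throw new IllegalArgumentException("need segmentCount >= 1 and maxMergeAtOnce >= 2");
    }
    int passes = 0;
    long remaining = segmentCount;
    while (remaining > 1) {
      // Ceiling division: each group of up to maxMergeAtOnce becomes one segment.
      remaining = (remaining + maxMergeAtOnce - 1) / maxMergeAtOnce;
      passes++;
    }
    return passes;
  }

  public static void main(String[] args) {
    // 300 segments with the limit of 30 mentioned above: 300 -> 10 -> 1.
    System.out.println("limit 30:   " + passes(300, 30) + " passes");
    // Same index with a very large limit: 300 -> 1.
    System.out.println("limit 1000: " + passes(300, 1000) + " passes");
  }
}
```

Since each pass rewrites the whole index in this model, going from two passes to one is where the "cut your merge time in half" estimate comes from.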
   





[GitHub] [lucene-solr] dweiss commented on a change in pull request #1245: LUCENE-9146: Create gradle precommit action

2020-02-09 Thread GitBox
dweiss commented on a change in pull request #1245: LUCENE-9146: Create gradle 
precommit action
URL: https://github.com/apache/lucene-solr/pull/1245#discussion_r376800656
 
 

 ##
 File path: .github/workflows/gradle-precommit.yml
 ##
 @@ -0,0 +1,23 @@
+name: Gradle Precommit
+
+on: 
+  pull_request:
+    branches:
+    - '*'
+
+jobs:
+  test:
+    name: gradle precommit w/ Java 11
+
+    runs-on: ubuntu-latest
+
+    steps:
+    - uses: actions/checkout@v2
+    - name: Set up JDK 11
+      uses: actions/setup-java@v1
+      with:
+        java-version: 11
+    - name: Grant execute permission for gradlew
+      run: chmod +x gradlew
 
 Review comment:
   gradlew should have this permission already when you do a git clone? Why is 
it explicit?





[GitHub] [lucene-solr] dsmiley commented on issue #1191: SOLR-14197 Reduce API of SolrResourceLoader

2020-02-09 Thread GitBox
dsmiley commented on issue #1191: SOLR-14197 Reduce API of SolrResourceLoader
URL: https://github.com/apache/lucene-solr/pull/1191#issuecomment-583950052
 
 
   Perhaps the remaining larger changes relating to new classes (e.g. 
StandaloneSolrResourceLoader) should wait for a follow-on commit; there's 
plenty here already.  Maybe a few static methods could/should move elsewhere 
but this is ready for a review I think.





[GitHub] [lucene-solr] iverase opened a new pull request #1246: LUCENE-9216: Make sure we index LEAST_DOUBLE_VALUE

2020-02-10 Thread GitBox
iverase opened a new pull request #1246: LUCENE-9216: Make sure we index 
LEAST_DOUBLE_VALUE
URL: https://github.com/apache/lucene-solr/pull/1246
 
 
   Trivial test fix





[GitHub] [lucene-solr] romseygeek commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost

2020-02-10 Thread GitBox
romseygeek commented on a change in pull request #357: [SOLR-12238] Synonym 
Queries boost
URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376935901
 
 

 ##
 File path: 
lucene/analysis/common/src/java/org/apache/lucene/analysis/boost/package-info.java
 ##
 @@ -0,0 +1,21 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * Provides various convenience classes for creating boosts on Tokens.
+ */
+package org.apache.lucene.analysis.boost;
 
 Review comment:
   I like the `boost` package - I'm already thinking about a 
`TypeToBoostTokenFilter` that would automatically boost tokens marked with a 
`SYNONYM` type for example, and there are probably other boosting filters we 
can come up with, so a package to collect them all makes sense to me.  I prefer 
to group packages by functionality rather than implementation.





[GitHub] [lucene-solr] romseygeek commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost

2020-02-10 Thread GitBox
romseygeek commented on a change in pull request #357: [SOLR-12238] Synonym 
Queries boost
URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376936277
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/util/graph/GraphTokenStreamFiniteStrings.java
 ##
 @@ -124,6 +126,15 @@ public boolean hasSidePath(int state) {
 .toArray(Term[]::new);
   }
 
+  /**
+   * Returns the list of terms that start at the provided state
+   */
+  public QueryBuilder.TermAndBoost[] getTermsAndBoosts(String field, int state) {
 
 Review comment:
   Yes, let's go back to `AttributeSource` - sorry for the back and forth on 
this @alessandrobenedetti 





[GitHub] [lucene-solr] romseygeek commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost

2020-02-10 Thread GitBox
romseygeek commented on a change in pull request #357: [SOLR-12238] Synonym 
Queries boost
URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376937485
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/util/QueryBuilder.java
 ##
 @@ -63,6 +66,25 @@
   protected boolean enableGraphQueries = true;
   protected boolean autoGenerateMultiTermSynonymsPhraseQuery = false;
 
+  /**
+   * Wraps a term and boost
+   */
+  public static class TermAndBoost {
+    private static final float DEFAULT_BOOST = 1.0f;
 
 Review comment:
   I think this should probably be on `BoostAttribute` rather than here?





[GitHub] [lucene-solr] s1monw commented on issue #1215: LUCENE-9164: Ignore ACE on tragic event if IW is closed

2020-02-10 Thread GitBox
s1monw commented on issue #1215: LUCENE-9164: Ignore ACE on tragic event if IW 
is closed
URL: https://github.com/apache/lucene-solr/pull/1215#issuecomment-584038675
 
 
   I will start working on some refactorings to streamline this.





[GitHub] [lucene-solr] alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost

2020-02-10 Thread GitBox
alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] 
Synonym Queries boost
URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376974784
 
 

 ##
 File path: 
lucene/analysis/common/src/java/org/apache/lucene/analysis/boost/package-info.java
 ##
 @@ -0,0 +1,21 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * Provides various convenience classes for creating boosts on Tokens.
+ */
+package org.apache.lucene.analysis.boost;
 
 Review comment:
   So let's keep the boost package then? No strong opinion on my side.





[GitHub] [lucene-solr] alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost

2020-02-10 Thread GitBox
alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] 
Synonym Queries boost
URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376978767
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/util/graph/GraphTokenStreamFiniteStrings.java
 ##
 @@ -124,6 +126,15 @@ public boolean hasSidePath(int state) {
 .toArray(Term[]::new);
   }
 
+  /**
+   * Returns the list of terms that start at the provided state
+   */
+  public QueryBuilder.TermAndBoost[] getTermsAndBoosts(String field, int state) {
 
 Review comment:
   no worries at all, done in the upcoming commit!





[GitHub] [lucene-solr] alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost

2020-02-10 Thread GitBox
alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] 
Synonym Queries boost
URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376980226
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/util/QueryBuilder.java
 ##
 @@ -63,6 +66,25 @@
   protected boolean enableGraphQueries = true;
   protected boolean autoGenerateMultiTermSynonymsPhraseQuery = false;
 
+  /**
+   * Wraps a term and boost
+   */
+  public static class TermAndBoost {
+    private static final float DEFAULT_BOOST = 1.0f;
 
 Review comment:
   I agree, coming in the upcoming commit.
   Furthermore, in a lot of places in Lucene and Solr a literal 1.0f is used where 
it actually means DEFAULT_BOOST. I won't change that here because it's outside the 
scope of this issue, but it would be nice to open a ticket for it.
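
The point about a shared constant versus scattered `1.0f` literals can be illustrated with a small standalone sketch. Everything here is hypothetical: `BoostDefaults` stands in for wherever the constant ends up living, and `TermAndBoostSketch` uses a plain `String` where the real class would hold a Lucene `Term`.

```java
/** Hypothetical single home for the default boost value. */
final class BoostDefaults {
  static final float DEFAULT_BOOST = 1.0f;

  private BoostDefaults() {}
}

/** Simplified stand-in for a term/boost pair; the term is a String here. */
final class TermAndBoostSketch {
  final String term;
  final float boost;

  /** Creates a pair with the shared default boost instead of a literal 1.0f. */
  TermAndBoostSketch(String term) {
    this(term, BoostDefaults.DEFAULT_BOOST);
  }

  TermAndBoostSketch(String term, float boost) {
    this.term = term;
    this.boost = boost;
  }

  /** True when no explicit boost was applied, so callers can skip wrapping the query. */
  boolean hasDefaultBoost() {
    return boost == BoostDefaults.DEFAULT_BOOST;
  }

  public static void main(String[] args) {
    TermAndBoostSketch plain = new TermAndBoostSketch("lion");
    TermAndBoostSketch boosted = new TermAndBoostSketch("panthera leo", 0.7f);
    System.out.println(plain.term + " default? " + plain.hasDefaultBoost());
    System.out.println(boosted.term + " default? " + boosted.hasDefaultBoost());
  }
}
```

Comparing against the named constant keeps the intent visible at each call site, which is the readability gain the comment is asking for.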





[GitHub] [lucene-solr] alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost

2020-02-10 Thread GitBox
alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] 
Synonym Queries boost
URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376982361
 
 

 ##
 File path: 
lucene/analysis/common/src/java/org/apache/lucene/analysis/boost/DelimitedBoostTokenFilter.java
 ##
 @@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.analysis.boost;
+
+import org.apache.lucene.analysis.TokenFilter;
+import org.apache.lucene.analysis.TokenStream;
+import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
+import org.apache.lucene.search.BoostAttribute;
+
+import java.io.IOException;
+
+
+/**
+ * Characters before the delimiter are the "token", those after are the boost.
+ * 
+ * For example, if the delimiter is '|', then for the string "foo|0.7", foo is the token
+ * and 0.7 is the boost.
+ * 
+ * Note make sure your Tokenizer doesn't split on the delimiter, or this won't work
+ */
+public final class DelimitedBoostTokenFilter extends TokenFilter {
+  private final char delimiter;
+  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
+  private final BoostAttribute boostAtt = addAttribute(BoostAttribute.class);
+
+  public DelimitedBoostTokenFilter(TokenStream input, char delimiter) {
+    super(input);
+    this.delimiter = delimiter;
+  }
+
+  @Override
+  public boolean incrementToken() throws IOException {
+    if (input.incrementToken()) {
+      final char[] buffer = termAtt.buffer();
+      final int length = termAtt.length();
+      for (int i = 0; i < length; i++) {
+        if (buffer[i] == delimiter) {
+          float boost = Float.parseFloat(new String(buffer, i + 1, (length - (i + 1))));
+          boostAtt.setBoost(boost);
+          termAtt.setLength(i);
+          return true;
+        }
+      }
+      return true;
+    } else return false;
 
 Review comment:
   Coming in the next commit; can you check it? I took it from the 
delimitedPayload, so I guess code style is somewhat inconsistent across the 
project (I've noticed that multiple times in the past).





[GitHub] [lucene-solr] alessandrobenedetti commented on issue #357: [SOLR-12238] Synonym Queries boost

2020-02-10 Thread GitBox
alessandrobenedetti commented on issue #357: [SOLR-12238] Synonym Queries boost
URL: https://github.com/apache/lucene-solr/pull/357#issuecomment-584059142
 
 
   Latest comments have been addressed, let me know if there's anything else 
needed here :)





[GitHub] [lucene-solr] iverase merged pull request #1246: LUCENE-9216: Make sure we index LEAST_DOUBLE_VALUE

2020-02-10 Thread GitBox
iverase merged pull request #1246: LUCENE-9216: Make sure we index 
LEAST_DOUBLE_VALUE
URL: https://github.com/apache/lucene-solr/pull/1246
 
 
   





[GitHub] [lucene-solr] andywebb1975 opened a new pull request #1247: SOLR-14252 use double rather than Double to avoid NPE

2020-02-10 Thread GitBox
andywebb1975 opened a new pull request #1247: SOLR-14252 use double rather than 
Double to avoid NPE
URL: https://github.com/apache/lucene-solr/pull/1247
 
 
   # Description
   
   The getMax and getMin methods in AggregateMetric can throw an NPE if 
non-Number values are present in values, when it tries to cast a null Double to 
a double.
   
   # Solution
   
   This PR switches to using primitive doubles, defaulting to zero, and warns 
when non-Number values are provided.
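
The failure mode described above is ordinary Java auto-unboxing of a `null` `Double`. The sketch below reproduces it outside Solr and shows a primitive-based alternative. It is not the code in `AggregateMetric`: the class `UnboxingMaxSketch`, its method names and the `seen` flag (added so that all-negative inputs still return their true maximum) are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Standalone reproduction of the null-unboxing problem; not Solr code. */
final class UnboxingMaxSketch {

  /** Boxed accumulator: stays null when no value is a Number. */
  static double maxBoxed(List<Object> values) {
    Double res = null;
    for (Object v : values) {
      if (!(v instanceof Number)) {
        continue;
      }
      double d = ((Number) v).doubleValue();
      if (res == null || d > res) {
        res = d;
      }
    }
    // Unboxing happens here; throws NullPointerException if res is still null.
    return res;
  }

  /** Primitive accumulator: defaults to zero and reports non-Number values. */
  static double maxPrimitive(List<Object> values) {
    double res = 0;
    boolean seen = false;
    for (Object v : values) {
      if (!(v instanceof Number)) {
        System.err.println("not a Number: " + v);
        continue;
      }
      double d = ((Number) v).doubleValue();
      if (!seen || d > res) {
        res = d;
        seen = true;
      }
    }
    return res;
  }

  public static void main(String[] args) {
    List<Object> mixed = Arrays.<Object>asList("LocalStatsCache", false);
    System.out.println("primitive max: " + maxPrimitive(mixed));
    try {
      maxBoxed(mixed);
    } catch (NullPointerException e) {
      System.out.println("boxed max threw NullPointerException");
    }
  }
}
```

Returning zero for an input with no numeric values hides the bad data rather than explaining it, so the warning is the only trace that something upstream supplied a non-Number.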
   
   # Tests
   
   TBC
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [ ] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [ ] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [ ] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [ ] I have developed this patch against the `master` branch.
   - [ ] I have run `ant precommit` and the appropriate test suite.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   





[GitHub] [lucene-solr] andywebb1975 commented on issue #1247: SOLR-14252 use double rather than Double to avoid NPE

2020-02-10 Thread GitBox
andywebb1975 commented on issue #1247: SOLR-14252 use double rather than Double 
to avoid NPE
URL: https://github.com/apache/lucene-solr/pull/1247#issuecomment-584087529
 
 
   The PR really just changes an exception to a warning - it may be papering 
over another issue. I'm going to try changing `public Object value;` to `public 
Number value;` at line 41 in order to trigger earlier exceptions.





[GitHub] [lucene-solr] ErickErickson closed pull request #1241: Gradle util

2020-02-10 Thread GitBox
ErickErickson closed pull request #1241: Gradle util
URL: https://github.com/apache/lucene-solr/pull/1241
 
 
   





[GitHub] [lucene-solr] ErickErickson commented on issue #1241: Gradle util

2020-02-10 Thread GitBox
ErickErickson commented on issue #1241: Gradle util
URL: https://github.com/apache/lucene-solr/pull/1241#issuecomment-584119269
 
 
   Didn't link appropriately,  I wondered why nobody replied.





[GitHub] [lucene-solr] ErickErickson opened a new pull request #1248: LUCENE-9134: Port ant-regenerate tasks to Gradle build

2020-02-10 Thread GitBox
ErickErickson opened a new pull request #1248: LUCENE-9134: Port ant-regenerate 
tasks to Gradle build
URL: https://github.com/apache/lucene-solr/pull/1248
 
 
   This adds the generation targets for util/packed and util/automaton.
   
   For whatever reason my local Python doesn't do anything weird like it did 
when regenerating the html entities, the generated code is identical.
   
   One thing I'd like to draw attention to is that I had to change 
createLevAutomata.py to point to the new location that moman is downloaded to.
   
   I'll merge upstream in the next day or two barring objections.
   
   I think this finishes off the regeneration work, so I'll close LUCENE-9134 
after merging.





[GitHub] [lucene-solr] shalinmangar merged pull request #1220: SOLR-13996: Refactor HttpShardHandler.prepDistributed method

2020-02-10 Thread GitBox
shalinmangar merged pull request #1220: SOLR-13996: Refactor 
HttpShardHandler.prepDistributed method
URL: https://github.com/apache/lucene-solr/pull/1220
 
 
   





[GitHub] [lucene-solr] andywebb1975 commented on a change in pull request #1247: SOLR-14252 use double rather than Double to avoid NPE

2020-02-10 Thread GitBox
andywebb1975 commented on a change in pull request #1247: SOLR-14252 use double 
rather than Double to avoid NPE
URL: https://github.com/apache/lucene-solr/pull/1247#discussion_r377223058
 
 

 ##
 File path: solr/core/src/java/org/apache/solr/metrics/AggregateMetric.java
 ##
 @@ -93,16 +99,13 @@ public double getMax() {
 if (values.isEmpty()) {
   return 0;
 }
-Double res = null;
+double res = 0;
 for (Update u : values.values()) {
   if (!(u.value instanceof Number)) {
+log.warn("not a Number: " + u.value);
 
 Review comment:
   Note I'm not completely clear whether `u.value` is ever _expected_ to not be 
a `Number` - I have seen this line report `false` and `LocalStatsCache`, and I'm 
tracing back through to find out why these occur.
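
For readers unfamiliar with the failure mode in the PR title, here is a minimal sketch of why a boxed `Double` accumulator can throw while a primitive `double` cannot. The class and method names are hypothetical and this is not the code from `AggregateMetric`; in particular, the `seen` flag is an assumption added so that negative maxima are still handled.

```java
import java.util.List;

class MaxSketch {
  // Boxed accumulator: it stays null when no element is a Number,
  // so the implicit unboxing on return throws NullPointerException.
  static double maxBoxed(List<Object> values) {
    Double res = null;
    for (Object v : values) {
      if (!(v instanceof Number)) {
        continue; // skipped values leave res untouched
      }
      double d = ((Number) v).doubleValue();
      if (res == null || d > res) {
        res = d;
      }
    }
    return res; // unboxes null when nothing numeric was seen
  }

  // Primitive accumulator: there is no null to unbox, so the worst case is 0.
  static double maxPrimitive(List<Object> values) {
    double res = 0;
    boolean seen = false; // assumption: track whether any Number was found
    for (Object v : values) {
      if (!(v instanceof Number)) {
        continue;
      }
      double d = ((Number) v).doubleValue();
      if (!seen || d > res) {
        res = d;
        seen = true;
      }
    }
    return res;
  }
}
```

With a list holding only non-numeric values such as `false` and `"LocalStatsCache"`, the boxed variant fails on return while the primitive variant returns 0.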





[GitHub] [lucene-solr] dsmiley commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost

2020-02-10 Thread GitBox
dsmiley commented on a change in pull request #357: [SOLR-12238] Synonym 
Queries boost
URL: https://github.com/apache/lucene-solr/pull/357#discussion_r377238017
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/util/graph/GraphTokenStreamFiniteStrings.java
 ##
 @@ -124,6 +126,15 @@ public boolean hasSidePath(int state) {
 .toArray(Term[]::new);
   }
 
+  /**
+   * Returns the list of terms that start at the provided state
+   */
+  public QueryBuilder.TermAndBoost[] getTermsAndBoosts(String field, int 
state) {
 
 Review comment:
   Can't you remove this now?





[GitHub] [lucene-solr] HoustonPutman commented on a change in pull request #1238: SOLR-14240: Clean up znodes after shard deletion is invoked

2020-02-10 Thread GitBox
HoustonPutman commented on a change in pull request #1238: SOLR-14240: Clean up 
znodes after shard deletion is invoked
URL: https://github.com/apache/lucene-solr/pull/1238#discussion_r377325559
 
 

 ##
 File path: 
solr/core/src/java/org/apache/solr/cloud/api/collections/DeleteShardCmd.java
 ##
 @@ -151,6 +154,21 @@ public void call(ClusterState clusterState, ZkNodeProps 
message, NamedList resul
   "Error executing delete operation for collection: " + collectionName 
+ " shard: " + sliceId, e);
 }
   }
+  
+  private void cleanupZooKeeperShardMetadata(SolrZkClient client, String 
collection, String sliceId) throws InterruptedException {
+String leaderElectPath = ZkStateReader.COLLECTIONS_ZKNODE + "/" + 
collection + "/leader_elect/" + sliceId;
+String shardLeaderPath = ZkStateReader.COLLECTIONS_ZKNODE + "/" + 
collection + "/leaders/" + sliceId;
+String shardTermsPath = ZkStateReader.COLLECTIONS_ZKNODE + "/" + 
collection + "/terms/" + sliceId;
+
+try {
+  client.clean(leaderElectPath);
+  client.clean(shardLeaderPath);
+  client.clean(shardTermsPath);
+} catch (KeeperException ex) {
+  log.warn("Non-fatal error occured attempting to delete shard metadata on 
zooker for collection " + 
 
 Review comment:
   If you are just logging a warning on failure, you might want to loop through 
each one, with the try-catch inside the loop. That way, if one path fails, the 
others still have a chance of succeeding. You can also log the path that failed, 
which will help in debugging.
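
The suggestion above could be sketched roughly as follows. This is only an illustration: `PathCleaner` and `CleanupException` are hypothetical stand-ins for the ZooKeeper client call and its checked exception, and `System.err` stands in for the logger.

```java
import java.util.ArrayList;
import java.util.List;

class PerPathCleanupSketch {
  /** Hypothetical checked exception standing in for a client-side failure. */
  static class CleanupException extends Exception {
    CleanupException(String msg) {
      super(msg);
    }
  }

  /** Hypothetical stand-in for a client call that deletes one path. */
  interface PathCleaner {
    void clean(String path) throws CleanupException;
  }

  /** Attempts every path and returns the ones that failed. */
  static List<String> cleanAll(PathCleaner cleaner, List<String> paths) {
    List<String> failed = new ArrayList<>();
    for (String path : paths) {
      try {
        cleaner.clean(path); // a failure here no longer skips the remaining paths
      } catch (CleanupException e) {
        // Report the specific path so the failure is easy to trace.
        System.err.println("Non-fatal error cleaning " + path + ": " + e.getMessage());
        failed.add(path);
      }
    }
    return failed;
  }
}
```

Because the try-catch sits inside the loop, a failure on the second path still lets the third one be attempted, and the warning names the path that failed.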





[GitHub] [lucene-solr] madrob commented on a change in pull request #1248: LUCENE-9134: Port ant-regenerate tasks to Gradle build

2020-02-10 Thread GitBox
madrob commented on a change in pull request #1248: LUCENE-9134: Port 
ant-regenerate tasks to Gradle build
URL: https://github.com/apache/lucene-solr/pull/1248#discussion_r377332813
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/util/automaton/createLevAutomata.py
 ##
 @@ -22,7 +22,7 @@
 import os
 import sys
 # sys.path.insert(0, 'moman/finenight/python')
-sys.path.insert(0, '../../../../../../../../build/core/moman/finenight/python')
+sys.path.insert(0, '../../../../../../../../../build/moman/finenight/python')
 
 Review comment:
   I think this has been answered before, but please remind me, does this break 
`ant regenerate` meaning that both cannot co-exist?





[GitHub] [lucene-solr] madrob commented on a change in pull request #1248: LUCENE-9134: Port ant-regenerate tasks to Gradle build

2020-02-10 Thread GitBox
madrob commented on a change in pull request #1248: LUCENE-9134: Port 
ant-regenerate tasks to Gradle build
URL: https://github.com/apache/lucene-solr/pull/1248#discussion_r377327334
 
 

 ##
 File path: gradle/generation/util.gradle
 ##
 @@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+apply plugin: "de.undercouch.download"
+
+configure(rootProject) {
+  configurations {
+utilgen
+  }
+
+  dependencies {
+  }
+
+  task utilgen {
+description "Regenerate sources for ...lucene/util/automaton and 
...lucene/util/packed."
+group "generation"
+
+dependsOn ":lucene:core:utilGenPacked"
+dependsOn ":lucene:core:utilGenLev"
+  }
+}
+
+
+task installMoman(type: Download) {
+  def momanDir = new File(buildDir, "moman").getAbsolutePath()
+  def momanZip = new File(momanDir, "moman.zip").getAbsolutePath()
+
+  src "https://bitbucket.org/jpbarrette/moman/get/5c5c2a1e4dea.zip";
+  dest momanZip
+  onlyIfModified true
+
+  doLast {
+logger.lifecycle("Downloading moman to: ${buildDir}")
+ant.unzip(src: momanZip, dest: momanDir, overwrite: "true") {
+  ant.cutdirsmapper(dirs: "1")
+}
+  }
+}
+
+configure(project(":lucene:core")) {
+  task utilGenPacked(dependsOn: installMoman) {
+description "Regenerate util/PackedBulkOperationsPacked*.java and 
Packed64SingleBlock.java"
+group "generation"
+
+def workDir = "src/java/org/apache/lucene/util/packed"
+
+doLast {
+  ['gen_BulkOperation.py', 'gen_Packed64SingleBlock.py'].each { prog ->
+logger.lifecycle("Executing: ${prog} in ${workDir}")
+project.exec {
+  workingDir workDir
+  executable "python"
+  args = ['-B', "${prog}"]
+}
+  }
+  // Correct line endings for Windows.
+  ['Packed64SingleBlock.java', 'BulkOperation*.java'].each { files ->
 
 Review comment:
   Does this need to be an `each` block, or can we specify multiple includes 
for the ant execution?





[GitHub] [lucene-solr] madrob commented on a change in pull request #1248: LUCENE-9134: Port ant-regenerate tasks to Gradle build

2020-02-10 Thread GitBox
madrob commented on a change in pull request #1248: LUCENE-9134: Port 
ant-regenerate tasks to Gradle build
URL: https://github.com/apache/lucene-solr/pull/1248#discussion_r377327836
 
 

 ##
 File path: gradle/generation/util.gradle
 ##
 @@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+apply plugin: "de.undercouch.download"
+
+configure(rootProject) {
+  configurations {
+utilgen
+  }
+
+  dependencies {
+  }
+
+  task utilgen {
+description "Regenerate sources for ...lucene/util/automaton and 
...lucene/util/packed."
+group "generation"
+
+dependsOn ":lucene:core:utilGenPacked"
+dependsOn ":lucene:core:utilGenLev"
+  }
+}
+
+
+task installMoman(type: Download) {
+  def momanDir = new File(buildDir, "moman").getAbsolutePath()
+  def momanZip = new File(momanDir, "moman.zip").getAbsolutePath()
+
+  src "https://bitbucket.org/jpbarrette/moman/get/5c5c2a1e4dea.zip";
+  dest momanZip
+  onlyIfModified true
+
+  doLast {
+logger.lifecycle("Downloading moman to: ${buildDir}")
+ant.unzip(src: momanZip, dest: momanDir, overwrite: "true") {
+  ant.cutdirsmapper(dirs: "1")
+}
+  }
+}
+
+configure(project(":lucene:core")) {
+  task utilGenPacked(dependsOn: installMoman) {
+description "Regenerate util/PackedBulkOperationsPacked*.java and 
Packed64SingleBlock.java"
+group "generation"
+
+def workDir = "src/java/org/apache/lucene/util/packed"
+
+doLast {
+  ['gen_BulkOperation.py', 'gen_Packed64SingleBlock.py'].each { prog ->
+logger.lifecycle("Executing: ${prog} in ${workDir}")
+project.exec {
+  workingDir workDir
+  executable "python"
+  args = ['-B', "${prog}"]
+}
+  }
+  // Correct line endings for Windows.
+  ['Packed64SingleBlock.java', 'BulkOperation*.java'].each { files ->
+project.ant.fixcrlf(
+srcDir: workDir,
+includes: files,
+encoding: 'UTF-8',
+eol: 'lf'
+)
+  }
+}
+  }
+}
+
+configure(project(":lucene:core")) {
 
 Review comment:
   I think you can combine this with the previous configure block; I don't 
think separating them adds readability. Let me know if you did this 
intentionally, though.





[GitHub] [lucene-solr] ErickErickson commented on a change in pull request #1248: LUCENE-9134: Port ant-regenerate tasks to Gradle build

2020-02-10 Thread GitBox
ErickErickson commented on a change in pull request #1248: LUCENE-9134: Port 
ant-regenerate tasks to Gradle build
URL: https://github.com/apache/lucene-solr/pull/1248#discussion_r377369604
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/util/automaton/createLevAutomata.py
 ##
 @@ -22,7 +22,7 @@
 import os
 import sys
 # sys.path.insert(0, 'moman/finenight/python')
-sys.path.insert(0, '../../../../../../../../build/core/moman/finenight/python')
+sys.path.insert(0, '../../../../../../../../../build/moman/finenight/python')
 
 Review comment:
   Yeah. I'm pretty sure it does break the ant regenerate. Since the whole 
regenerate process is apparently run extremely rarely (like every couple of 
years or so from what I can tell), I think we'll be on gradle exclusively the 
next time this is run.





[GitHub] [lucene-solr] ErickErickson commented on a change in pull request #1248: LUCENE-9134: Port ant-regenerate tasks to Gradle build

2020-02-10 Thread GitBox
ErickErickson commented on a change in pull request #1248: LUCENE-9134: Port 
ant-regenerate tasks to Gradle build
URL: https://github.com/apache/lucene-solr/pull/1248#discussion_r377374590
 
 

 ##
 File path: gradle/generation/util.gradle
 ##
 @@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+apply plugin: "de.undercouch.download"
+
+configure(rootProject) {
+  configurations {
+utilgen
+  }
+
+  dependencies {
+  }
+
+  task utilgen {
+description "Regenerate sources for ...lucene/util/automaton and 
...lucene/util/packed."
+group "generation"
+
+dependsOn ":lucene:core:utilGenPacked"
+dependsOn ":lucene:core:utilGenLev"
+  }
+}
+
+
+task installMoman(type: Download) {
+  def momanDir = new File(buildDir, "moman").getAbsolutePath()
+  def momanZip = new File(momanDir, "moman.zip").getAbsolutePath()
+
+  src "https://bitbucket.org/jpbarrette/moman/get/5c5c2a1e4dea.zip";
+  dest momanZip
+  onlyIfModified true
+
+  doLast {
+logger.lifecycle("Downloading moman to: ${buildDir}")
+ant.unzip(src: momanZip, dest: momanDir, overwrite: "true") {
+  ant.cutdirsmapper(dirs: "1")
+}
+  }
+}
+
+configure(project(":lucene:core")) {
+  task utilGenPacked(dependsOn: installMoman) {
+description "Regenerate util/PackedBulkOperationsPacked*.java and 
Packed64SingleBlock.java"
+group "generation"
+
+def workDir = "src/java/org/apache/lucene/util/packed"
+
+doLast {
+  ['gen_BulkOperation.py', 'gen_Packed64SingleBlock.py'].each { prog ->
+logger.lifecycle("Executing: ${prog} in ${workDir}")
+project.exec {
+  workingDir workDir
+  executable "python"
+  args = ['-B', "${prog}"]
+}
+  }
+  // Correct line endings for Windows.
+  ['Packed64SingleBlock.java', 'BulkOperation*.java'].each { files ->
+project.ant.fixcrlf(
+srcDir: workDir,
+includes: files,
+encoding: 'UTF-8',
+eol: 'lf'
+)
+  }
+}
+  }
+}
+
+configure(project(":lucene:core")) {
 
 Review comment:
   Good point, done.





[GitHub] [lucene-solr] ErickErickson commented on a change in pull request #1248: LUCENE-9134: Port ant-regenerate tasks to Gradle build

2020-02-10 Thread GitBox
ErickErickson commented on a change in pull request #1248: LUCENE-9134: Port 
ant-regenerate tasks to Gradle build
URL: https://github.com/apache/lucene-solr/pull/1248#discussion_r377376693
 
 

 ##
 File path: gradle/generation/util.gradle
 ##
 @@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+apply plugin: "de.undercouch.download"
+
+configure(rootProject) {
+  configurations {
+utilgen
+  }
+
+  dependencies {
+  }
+
+  task utilgen {
+description "Regenerate sources for ...lucene/util/automaton and 
...lucene/util/packed."
+group "generation"
+
+dependsOn ":lucene:core:utilGenPacked"
+dependsOn ":lucene:core:utilGenLev"
+  }
+}
+
+
+task installMoman(type: Download) {
+  def momanDir = new File(buildDir, "moman").getAbsolutePath()
+  def momanZip = new File(momanDir, "moman.zip").getAbsolutePath()
+
+  src "https://bitbucket.org/jpbarrette/moman/get/5c5c2a1e4dea.zip";
+  dest momanZip
+  onlyIfModified true
+
+  doLast {
+logger.lifecycle("Downloading moman to: ${buildDir}")
+ant.unzip(src: momanZip, dest: momanDir, overwrite: "true") {
+  ant.cutdirsmapper(dirs: "1")
+}
+  }
+}
+
+configure(project(":lucene:core")) {
+  task utilGenPacked(dependsOn: installMoman) {
+description "Regenerate util/PackedBulkOperationsPacked*.java and 
Packed64SingleBlock.java"
+group "generation"
+
+def workDir = "src/java/org/apache/lucene/util/packed"
+
+doLast {
+  ['gen_BulkOperation.py', 'gen_Packed64SingleBlock.py'].each { prog ->
+logger.lifecycle("Executing: ${prog} in ${workDir}")
+project.exec {
+  workingDir workDir
+  executable "python"
+  args = ['-B', "${prog}"]
+}
+  }
+  // Correct line endings for Windows.
+  ['Packed64SingleBlock.java', 'BulkOperation*.java'].each { files ->
 
 Review comment:
   True, I'll change it.





[GitHub] [lucene-solr] ErickErickson commented on issue #1248: LUCENE-9134: Port ant-regenerate tasks to Gradle build

2020-02-10 Thread GitBox
ErickErickson commented on issue #1248: LUCENE-9134: Port ant-regenerate tasks 
to Gradle build
URL: https://github.com/apache/lucene-solr/pull/1248#issuecomment-584409136
 
 
   I made the changes Mike mentioned, but I won't create another PR for a bit, 
to give others a chance to look.





[GitHub] [lucene-solr] madrob commented on a change in pull request #1248: LUCENE-9134: Port ant-regenerate tasks to Gradle build

2020-02-10 Thread GitBox
madrob commented on a change in pull request #1248: LUCENE-9134: Port 
ant-regenerate tasks to Gradle build
URL: https://github.com/apache/lucene-solr/pull/1248#discussion_r377429993
 
 

 ##
 File path: gradle/generation/util.gradle
 ##
 @@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+apply plugin: "de.undercouch.download"
+
+configure(rootProject) {
+  configurations {
+utilgen
+  }
+
+  dependencies {
 
 Review comment:
   nit: drop this empty block?





[GitHub] [lucene-solr] madrob merged pull request #1244: SOLR-14247 Remove unneeded sleeps

2020-02-10 Thread GitBox
madrob merged pull request #1244: SOLR-14247 Remove unneeded sleeps
URL: https://github.com/apache/lucene-solr/pull/1244
 
 
   





[GitHub] [lucene-solr] dsmiley commented on issue #1191: SOLR-14197 Reduce API of SolrResourceLoader

2020-02-10 Thread GitBox
dsmiley commented on issue #1191: SOLR-14197 Reduce API of SolrResourceLoader
URL: https://github.com/apache/lucene-solr/pull/1191#issuecomment-584496996
 
 
   I think this PR is ready for review @madrob since there are a lot of changes 
even without introducing SRL subclasses.





[GitHub] [lucene-solr] iverase opened a new pull request #1249: LUCENE-9217: Add validation to XYGeometries

2020-02-10 Thread GitBox
iverase opened a new pull request #1249: LUCENE-9217: Add validation to 
XYGeometries
URL: https://github.com/apache/lucene-solr/pull/1249
 
 
This PR adds validation for XYGeometries, in particular checking for 
non-valid values like NaN, INF and -INF.
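
As a rough illustration of the kind of check described above, the sketch below rejects NaN and both infinities, and also rejects inverted bounds. The class and method names are hypothetical and are not taken from the PR.

```java
class XYValidationSketch {
  /** Hypothetical helper: returns the value unchanged if it is finite, else throws. */
  static float checkFinite(float v) {
    // Float.isFinite is false for NaN, +Infinity and -Infinity.
    if (!Float.isFinite(v)) {
      throw new IllegalArgumentException("invalid value " + v + "; must be finite");
    }
    return v;
  }

  /** Hypothetical helper: validates a min/max pair for one axis. */
  static void checkBounds(float min, float max, String axis) {
    checkFinite(min);
    checkFinite(max);
    if (min > max) {
      throw new IllegalArgumentException(axis + ": min " + min + " > max " + max);
    }
  }
}
```

Throwing `IllegalArgumentException` means the check is enforced even when assertions are disabled, which is the usual reason to prefer it over `assert` for constructor arguments.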





[GitHub] [lucene-solr] iverase commented on a change in pull request #1249: LUCENE-9217: Add validation to XYGeometries

2020-02-10 Thread GitBox
iverase commented on a change in pull request #1249: LUCENE-9217: Add 
validation to XYGeometries
URL: https://github.com/apache/lucene-solr/pull/1249#discussion_r377470379
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/geo/XYRectangle.java
 ##
 @@ -29,12 +31,16 @@
 
   /** Constructs a bounding box by first validating the provided x and y 
coordinates */
   public XYRectangle(double minX, double maxX, double minY, double maxY) {
-this.minX = minX;
-this.maxX = maxX;
-this.minY = minY;
-this.maxY = maxY;
-assert minX <= maxX;
-assert minY <= maxY;
+if (minX > maxX) {
 
 Review comment:
   I wonder if an XYRectangle should be initialised with floats instead of 
doubles, like the other XYGeometries?





[GitHub] [lucene-solr] dweiss commented on a change in pull request #1248: LUCENE-9134: Port ant-regenerate tasks to Gradle build

2020-02-10 Thread GitBox
dweiss commented on a change in pull request #1248: LUCENE-9134: Port 
ant-regenerate tasks to Gradle build
URL: https://github.com/apache/lucene-solr/pull/1248#discussion_r377479333
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/util/automaton/createLevAutomata.py
 ##
 @@ -22,7 +22,7 @@
 import os
 import sys
 # sys.path.insert(0, 'moman/finenight/python')
-sys.path.insert(0, '../../../../../../../../build/core/moman/finenight/python')
+sys.path.insert(0, '../../../../../../../../../build/moman/finenight/python')
 
 Review comment:
   If so, should we add a fail to ant that just says "use gradle"? These 
python scripts could take an argument: the path to moman, passed from the gradle 
script. Then it'd be elegant and clear without those ugly relative paths.





[GitHub] [lucene-solr] atris commented on issue #1214: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches

2020-02-10 Thread GitBox
atris commented on issue #1214: LUCENE-9074: Slice Allocation Control Plane For 
Concurrent Searches
URL: https://github.com/apache/lucene-solr/pull/1214#issuecomment-584514132
 
 
   @jpountz Updated, please see and let me know your thoughts





[GitHub] [lucene-solr] atris commented on a change in pull request #1214: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches

2020-02-10 Thread GitBox
atris commented on a change in pull request #1214: LUCENE-9074: Slice 
Allocation Control Plane For Concurrent Searches
URL: https://github.com/apache/lucene-solr/pull/1214#discussion_r377481946
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/search/QueueSizeBasedExecutionControlPlane.java
 ##
 @@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.search;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+import java.util.concurrent.Executor;
+import java.util.concurrent.Future;
+import java.util.concurrent.FutureTask;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.ThreadPoolExecutor;
+
+/**
+ * Implementation of SliceExecutionControlPlane with queue backpressure based 
thread allocation
+ */
+public class QueueSizeBasedExecutionControlPlane implements 
SliceExecutionControlPlane {
+  private static final double LIMITING_FACTOR = 1.5;
+  private static final int NUMBER_OF_PROCESSORS = 
Runtime.getRuntime().availableProcessors();
+
+  private Executor executor;
+
+  public QueueSizeBasedExecutionControlPlane(Executor executor) {
+this.executor = executor;
+  }
+
+  @Override
+  public List> invokeAll(Collection tasks) {
+boolean isThresholdCheckEnabled = true;
+
+if (tasks == null) {
+  throw new IllegalArgumentException("Tasks is null");
+}
+
+if (executor == null) {
+  throw new IllegalArgumentException("Executor is null");
+}
+
+ThreadPoolExecutor threadPoolExecutor = null;
+if ((executor instanceof ThreadPoolExecutor) == false) {
 
 Review comment:
   Agreed. Reverted the Executor changes and added the abstraction while 
updating the docs for IndexSearcher





[GitHub] [lucene-solr] janhoy opened a new pull request #1250: SOLR-14250 Fix error logging on Expect: 100-continue

2020-02-11 Thread GitBox
janhoy opened a new pull request #1250: SOLR-14250 Fix error logging on Expect: 
100-continue
URL: https://github.com/apache/lucene-solr/pull/1250
 
 
   See https://issues.apache.org/jira/browse/SOLR-14250
   
   With this PR, we'll still always try to consume the stream, 
   but if the input stream is not available due to Expect: header,
   and an error has already been sent by Solr, the resulting IOException 
   from Jetty will not be logged at INFO level but instead be a simple
   line on DEBUG level.
   
   The PR does not try to add a test to validate behaviour
   since this is a logging change only, so there is no risk
   that the stream is not consumed anymore.
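
   The behaviour described above could be sketched roughly as follows. This is not Solr's actual code: the class `ConsumeStreamSketch`, the method `consumeQuietly`, and the `errorAlreadySent` flag are hypothetical names, and `java.util.logging` (with `FINE` standing in for DEBUG) is used only to keep the example self-contained.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper; Solr's real code and logger are not shown in this thread.
final class ConsumeStreamSketch {
  private static final Logger LOG = Logger.getLogger(ConsumeStreamSketch.class.getName());

  // Always tries to drain the stream. If reading fails after an error response
  // was already sent, the failure is only noted at FINE (the DEBUG analogue);
  // otherwise it is logged at INFO. Returns true when the stream was drained.
  static boolean consumeQuietly(InputStream in, boolean errorAlreadySent) {
    byte[] buffer = new byte[8192];
    try {
      while (in.read(buffer) != -1) {
        // discard the bytes; only draining matters here
      }
      return true;
    } catch (IOException e) {
      Level level = errorAlreadySent ? Level.FINE : Level.INFO;
      LOG.log(level, "Could not consume request stream: " + e.getMessage());
      return false;
    }
  }
}
```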





[GitHub] [lucene-solr] janhoy commented on issue #1250: SOLR-14250 Fix error logging on Expect: 100-continue

2020-02-11 Thread GitBox
janhoy commented on issue #1250: SOLR-14250 Fix error logging on Expect: 
100-continue
URL: https://github.com/apache/lucene-solr/pull/1250#issuecomment-584537676
 
 
   I tested manually that the new DEBUG log line is printed when hitting a 
non-existent URL, e.g.
   
   curl -H "Content-Type: application/json" -H "Expect: 100-continue" http://localhost:8983/solr/foo/update2
   
   So I think this is good to go. Will leave it sitting here to collect 
feedback for a few days.





[GitHub] [lucene-solr] alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost

2020-02-11 Thread GitBox
alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] 
Synonym Queries boost
URL: https://github.com/apache/lucene-solr/pull/357#discussion_r377532448
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/util/graph/GraphTokenStreamFiniteStrings.java
 ##
 @@ -124,6 +126,15 @@ public boolean hasSidePath(int state) {
 .toArray(Term[]::new);
   }
 
+  /**
+   * Returns the list of terms that start at the provided state
+   */
+  public QueryBuilder.TermAndBoost[] getTermsAndBoosts(String field, int state) {
 
 Review comment:
   just forgot, it's done now





[GitHub] [lucene-solr] alessandrobenedetti commented on issue #357: [SOLR-12238] Synonym Queries boost

2020-02-11 Thread GitBox
alessandrobenedetti commented on issue #357: [SOLR-12238] Synonym Queries boost
URL: https://github.com/apache/lucene-solr/pull/357#issuecomment-584554973
 
 
   So the code should be OK now. Should we think about where and how to properly 
document it?
   I will definitely write a blog post on that (which we can link later), but I 
guess we should think about the official documentation part now.





[GitHub] [lucene-solr] juanka588 commented on a change in pull request #1234: Add compression for Binary doc value fields

2020-02-11 Thread GitBox
juanka588 commented on a change in pull request #1234: Add compression for 
Binary doc value fields
URL: https://github.com/apache/lucene-solr/pull/1234#discussion_r377544478
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesConsumer.java
 ##
 @@ -353,67 +360,193 @@ private void writeBlock(long[] values, int length, long 
gcd, ByteBuffersDataOutp
 }
   }
 
-  @Override
-  public void addBinaryField(FieldInfo field, DocValuesProducer 
valuesProducer) throws IOException {
-meta.writeInt(field.number);
-meta.writeByte(Lucene80DocValuesFormat.BINARY);
-
-BinaryDocValues values = valuesProducer.getBinary(field);
-long start = data.getFilePointer();
-meta.writeLong(start); // dataOffset
-int numDocsWithField = 0;
-int minLength = Integer.MAX_VALUE;
-int maxLength = 0;
-for (int doc = values.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc 
= values.nextDoc()) {
-  numDocsWithField++;
-  BytesRef v = values.binaryValue();
-  int length = v.length;
-  data.writeBytes(v.bytes, v.offset, v.length);
-  minLength = Math.min(length, minLength);
-  maxLength = Math.max(length, maxLength);
+  class CompressedBinaryBlockWriter implements Closeable {
+FastCompressionHashTable ht = new LZ4.FastCompressionHashTable();
+int uncompressedBlockLength = 0;
+int maxUncompressedBlockLength = 0;
+int numDocsInCurrentBlock = 0;
+int[] docLengths = new 
int[Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK]; 
+byte[] block = new byte [1024 * 16];
+int totalChunks = 0;
+long maxPointer = 0;
+long blockAddressesStart = -1; 
+
+private IndexOutput tempBinaryOffsets;
+
+
+public CompressedBinaryBlockWriter() throws IOException {
+  tempBinaryOffsets = 
state.directory.createTempOutput(state.segmentInfo.name, "binary_pointers", 
state.context);
+  boolean success = false;
+  try {
+CodecUtil.writeHeader(tempBinaryOffsets, 
Lucene80DocValuesFormat.META_CODEC + "FilePointers", 
Lucene80DocValuesFormat.VERSION_CURRENT);
+success = true;
+  } finally {
+if (success == false) {
+  IOUtils.closeWhileHandlingException(this); //self-close because 
constructor caller can't 
+}
+  }
 }
-assert numDocsWithField <= maxDoc;
-meta.writeLong(data.getFilePointer() - start); // dataLength
 
-if (numDocsWithField == 0) {
-  meta.writeLong(-2); // docsWithFieldOffset
-  meta.writeLong(0L); // docsWithFieldLength
-  meta.writeShort((short) -1); // jumpTableEntryCount
-  meta.writeByte((byte) -1);   // denseRankPower
-} else if (numDocsWithField == maxDoc) {
-  meta.writeLong(-1); // docsWithFieldOffset
-  meta.writeLong(0L); // docsWithFieldLength
-  meta.writeShort((short) -1); // jumpTableEntryCount
-  meta.writeByte((byte) -1);   // denseRankPower
-} else {
-  long offset = data.getFilePointer();
-  meta.writeLong(offset); // docsWithFieldOffset
-  values = valuesProducer.getBinary(field);
-  final short jumpTableEntryCount = IndexedDISI.writeBitSet(values, data, 
IndexedDISI.DEFAULT_DENSE_RANK_POWER);
-  meta.writeLong(data.getFilePointer() - offset); // docsWithFieldLength
-  meta.writeShort(jumpTableEntryCount);
-  meta.writeByte(IndexedDISI.DEFAULT_DENSE_RANK_POWER);
+void addDoc(int doc, BytesRef v) throws IOException {
+  if (blockAddressesStart < 0) {
+blockAddressesStart = data.getFilePointer();
+  }
+  docLengths[numDocsInCurrentBlock] = v.length;
+  block = ArrayUtil.grow(block, uncompressedBlockLength + v.length);
+  System.arraycopy(v.bytes, v.offset, block, uncompressedBlockLength, 
v.length);
+  uncompressedBlockLength += v.length;
+  numDocsInCurrentBlock++;
+  if (numDocsInCurrentBlock == 
Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK) {
+flushData();
+  }  
 }
 
-meta.writeInt(numDocsWithField);
-meta.writeInt(minLength);
-meta.writeInt(maxLength);
-if (maxLength > minLength) {
-  start = data.getFilePointer();
-  meta.writeLong(start);
+private void flushData() throws IOException {
+  if (numDocsInCurrentBlock > 0) {
+// Write offset to this block to temporary offsets file
+totalChunks++;
+long thisBlockStartPointer = data.getFilePointer();
+
+// Optimisation - check if all lengths are same
+boolean allLengthsSame = true && numDocsInCurrentBlock >0  ;
+for (int i = 0; i < 
Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK && allLengthsSame; 
i++) {
+  if (i > 0 && docLengths[i] != docLengths[i-1]) {
+allLengthsSame = false;
+  }
+}
+if (allLengthsSame) {
+// Only write one value shifted. Steal a bit to indicate all other 
lengths are the same
+int onlyOneLength = (docLengths[0

[GitHub] [lucene-solr] juanka588 commented on a change in pull request #1234: Add compression for Binary doc value fields

2020-02-11 Thread GitBox
juanka588 commented on a change in pull request #1234: Add compression for 
Binary doc value fields
URL: https://github.com/apache/lucene-solr/pull/1234#discussion_r377545909
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesConsumer.java
 ##
 @@ -353,67 +360,193 @@ private void writeBlock(long[] values, int length, long 
gcd, ByteBuffersDataOutp
 }
   }
 
-  @Override
-  public void addBinaryField(FieldInfo field, DocValuesProducer 
valuesProducer) throws IOException {
-meta.writeInt(field.number);
-meta.writeByte(Lucene80DocValuesFormat.BINARY);
-
-BinaryDocValues values = valuesProducer.getBinary(field);
-long start = data.getFilePointer();
-meta.writeLong(start); // dataOffset
-int numDocsWithField = 0;
-int minLength = Integer.MAX_VALUE;
-int maxLength = 0;
-for (int doc = values.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc 
= values.nextDoc()) {
-  numDocsWithField++;
-  BytesRef v = values.binaryValue();
-  int length = v.length;
-  data.writeBytes(v.bytes, v.offset, v.length);
-  minLength = Math.min(length, minLength);
-  maxLength = Math.max(length, maxLength);
+  class CompressedBinaryBlockWriter implements Closeable {
+FastCompressionHashTable ht = new LZ4.FastCompressionHashTable();
+int uncompressedBlockLength = 0;
+int maxUncompressedBlockLength = 0;
+int numDocsInCurrentBlock = 0;
+int[] docLengths = new 
int[Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK]; 
+byte[] block = new byte [1024 * 16];
+int totalChunks = 0;
+long maxPointer = 0;
+long blockAddressesStart = -1; 
+
+private IndexOutput tempBinaryOffsets;
+
+
+public CompressedBinaryBlockWriter() throws IOException {
+  tempBinaryOffsets = 
state.directory.createTempOutput(state.segmentInfo.name, "binary_pointers", 
state.context);
+  boolean success = false;
+  try {
+CodecUtil.writeHeader(tempBinaryOffsets, 
Lucene80DocValuesFormat.META_CODEC + "FilePointers", 
Lucene80DocValuesFormat.VERSION_CURRENT);
+success = true;
+  } finally {
+if (success == false) {
+  IOUtils.closeWhileHandlingException(this); //self-close because 
constructor caller can't 
+}
+  }
 }
-assert numDocsWithField <= maxDoc;
-meta.writeLong(data.getFilePointer() - start); // dataLength
 
-if (numDocsWithField == 0) {
-  meta.writeLong(-2); // docsWithFieldOffset
-  meta.writeLong(0L); // docsWithFieldLength
-  meta.writeShort((short) -1); // jumpTableEntryCount
-  meta.writeByte((byte) -1);   // denseRankPower
-} else if (numDocsWithField == maxDoc) {
-  meta.writeLong(-1); // docsWithFieldOffset
-  meta.writeLong(0L); // docsWithFieldLength
-  meta.writeShort((short) -1); // jumpTableEntryCount
-  meta.writeByte((byte) -1);   // denseRankPower
-} else {
-  long offset = data.getFilePointer();
-  meta.writeLong(offset); // docsWithFieldOffset
-  values = valuesProducer.getBinary(field);
-  final short jumpTableEntryCount = IndexedDISI.writeBitSet(values, data, 
IndexedDISI.DEFAULT_DENSE_RANK_POWER);
-  meta.writeLong(data.getFilePointer() - offset); // docsWithFieldLength
-  meta.writeShort(jumpTableEntryCount);
-  meta.writeByte(IndexedDISI.DEFAULT_DENSE_RANK_POWER);
+void addDoc(int doc, BytesRef v) throws IOException {
+  if (blockAddressesStart < 0) {
+blockAddressesStart = data.getFilePointer();
+  }
+  docLengths[numDocsInCurrentBlock] = v.length;
+  block = ArrayUtil.grow(block, uncompressedBlockLength + v.length);
+  System.arraycopy(v.bytes, v.offset, block, uncompressedBlockLength, 
v.length);
+  uncompressedBlockLength += v.length;
+  numDocsInCurrentBlock++;
+  if (numDocsInCurrentBlock == 
Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK) {
+flushData();
+  }  
 }
 
-meta.writeInt(numDocsWithField);
-meta.writeInt(minLength);
-meta.writeInt(maxLength);
-if (maxLength > minLength) {
-  start = data.getFilePointer();
-  meta.writeLong(start);
+private void flushData() throws IOException {
+  if (numDocsInCurrentBlock > 0) {
+// Write offset to this block to temporary offsets file
+totalChunks++;
+long thisBlockStartPointer = data.getFilePointer();
+
+// Optimisation - check if all lengths are same
+boolean allLengthsSame = true && numDocsInCurrentBlock >0  ;
+for (int i = 0; i < 
Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK && allLengthsSame; 
i++) {
+  if (i > 0 && docLengths[i] != docLengths[i-1]) {
+allLengthsSame = false;
+  }
+}
+if (allLengthsSame) {
+// Only write one value shifted. Steal a bit to indicate all other 
lengths are the same
+int onlyOneLength = (docLengths[0

[GitHub] [lucene-solr] mocobeta commented on issue #1242: LUCENE-9201: Port documentation-lint task to Gradle build

2020-02-11 Thread GitBox
mocobeta commented on issue #1242: LUCENE-9201: Port documentation-lint task to 
Gradle build
URL: https://github.com/apache/lucene-solr/pull/1242#issuecomment-584574521
 
 
   Thanks @uschindler for your comments. I rewrote the task to use sourceSets 
instead of relying on the assumption that the target projects (for linting) have 
"src/java".
   However, there is another problem: the Java source directory isn't available 
from SourceSet, as far as I know. The actual source directory path is required 
when executing ECJ.
   ```groovy
   // excerpt from the custom ECJ lint task
   project.plugins.withId('java', {
     project.sourceSets.each { sourceSet ->
       project.javaexec {
         classpath {
           project.rootProject.configurations.ecj.asPath
         }
         main = "org.eclipse.jdt.internal.compiler.batch.Main"
         args += [
           // Unfortunately, 'testCompileClasspath' is not available from sourceSet, so without the second term test classes cannot be compiled.
           "-classpath", sourceSet.compileClasspath.toList().join(':') + project.configurations.testCompileClasspath.findAll().join(":"),
           "-d", dstDir,
           "-encoding", "UTF-8",
           "-source", "11", // How this can be obtained from sourceSet or project?
           "-target", "11",
           "-nowarn",
           "-enableJavadoc",
           "-properties", "${project.rootProject.rootDir}/lucene/tools/javadoc/ecj.javadocs.prefs",
           "src/java" // ... or "src/test". How this can be obtained from sourceSet or project?
         ]
       }
     }
   })
   ```
   
   SourceSet has a property `allJava` that contains all Java source files, but 
this is no help here.
   
https://docs.gradle.org/current/dsl/org.gradle.api.tasks.SourceSet.html#org.gradle.api.tasks.SourceSet
   I might be missing something, or is another hack required to identify the 
actual source directory path?





[GitHub] [lucene-solr] juanka588 commented on a change in pull request #1234: Add compression for Binary doc value fields

2020-02-11 Thread GitBox
juanka588 commented on a change in pull request #1234: Add compression for 
Binary doc value fields
URL: https://github.com/apache/lucene-solr/pull/1234#discussion_r377566543
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesFormat.java
 ##
 @@ -151,7 +151,8 @@ public DocValuesProducer fieldsProducer(SegmentReadState 
state) throws IOExcepti
   static final String META_CODEC = "Lucene80DocValuesMetadata";
   static final String META_EXTENSION = "dvm";
   static final int VERSION_START = 0;
-  static final int VERSION_CURRENT = VERSION_START;
+  static final int VERSION_BIN_COMPRESSED = 1;  
 
 Review comment:
   This could be potentially in the BinaryDocValuesFormat class





[GitHub] [lucene-solr] juanka588 commented on a change in pull request #1234: Add compression for Binary doc value fields

2020-02-11 Thread GitBox
juanka588 commented on a change in pull request #1234: Add compression for 
Binary doc value fields
URL: https://github.com/apache/lucene-solr/pull/1234#discussion_r377579943
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java
 ##
 @@ -742,6 +755,131 @@ public BytesRef binaryValue() throws IOException {
 };
   }
 }
+  }  
+  
+  // Decompresses blocks of binary values to retrieve content
+  class BinaryDecoder {
+
+private final LongValues addresses;
+private final IndexInput compressedData;
+// Cache of last uncompressed block 
+private long lastBlockId = -1;
+private int []uncompressedDocEnds = new int[Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK];
 
 Review comment:
   @jpountz we should use the same structure when writing the data; in that 
case you would see all the properties of the class instead of adding comments 
in the code.





[GitHub] [lucene-solr] jpountz commented on a change in pull request #1234: Add compression for Binary doc value fields

2020-02-11 Thread GitBox
jpountz commented on a change in pull request #1234: Add compression for Binary 
doc value fields
URL: https://github.com/apache/lucene-solr/pull/1234#discussion_r377621003
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesConsumer.java
 ##
 @@ -353,67 +360,193 @@ private void writeBlock(long[] values, int length, long 
gcd, ByteBuffersDataOutp
 }
   }
 
-  @Override
-  public void addBinaryField(FieldInfo field, DocValuesProducer 
valuesProducer) throws IOException {
-meta.writeInt(field.number);
-meta.writeByte(Lucene80DocValuesFormat.BINARY);
-
-BinaryDocValues values = valuesProducer.getBinary(field);
-long start = data.getFilePointer();
-meta.writeLong(start); // dataOffset
-int numDocsWithField = 0;
-int minLength = Integer.MAX_VALUE;
-int maxLength = 0;
-for (int doc = values.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc 
= values.nextDoc()) {
-  numDocsWithField++;
-  BytesRef v = values.binaryValue();
-  int length = v.length;
-  data.writeBytes(v.bytes, v.offset, v.length);
-  minLength = Math.min(length, minLength);
-  maxLength = Math.max(length, maxLength);
+  class CompressedBinaryBlockWriter implements Closeable {
+FastCompressionHashTable ht = new LZ4.FastCompressionHashTable();
+int uncompressedBlockLength = 0;
+int maxUncompressedBlockLength = 0;
+int numDocsInCurrentBlock = 0;
+int[] docLengths = new 
int[Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK]; 
+byte[] block = new byte [1024 * 16];
+int totalChunks = 0;
+long maxPointer = 0;
+long blockAddressesStart = -1; 
+
+private IndexOutput tempBinaryOffsets;
+
+
+public CompressedBinaryBlockWriter() throws IOException {
+  tempBinaryOffsets = 
state.directory.createTempOutput(state.segmentInfo.name, "binary_pointers", 
state.context);
+  boolean success = false;
+  try {
+CodecUtil.writeHeader(tempBinaryOffsets, 
Lucene80DocValuesFormat.META_CODEC + "FilePointers", 
Lucene80DocValuesFormat.VERSION_CURRENT);
+success = true;
+  } finally {
+if (success == false) {
+  IOUtils.closeWhileHandlingException(this); //self-close because 
constructor caller can't 
+}
+  }
 }
-assert numDocsWithField <= maxDoc;
-meta.writeLong(data.getFilePointer() - start); // dataLength
 
-if (numDocsWithField == 0) {
-  meta.writeLong(-2); // docsWithFieldOffset
-  meta.writeLong(0L); // docsWithFieldLength
-  meta.writeShort((short) -1); // jumpTableEntryCount
-  meta.writeByte((byte) -1);   // denseRankPower
-} else if (numDocsWithField == maxDoc) {
-  meta.writeLong(-1); // docsWithFieldOffset
-  meta.writeLong(0L); // docsWithFieldLength
-  meta.writeShort((short) -1); // jumpTableEntryCount
-  meta.writeByte((byte) -1);   // denseRankPower
-} else {
-  long offset = data.getFilePointer();
-  meta.writeLong(offset); // docsWithFieldOffset
-  values = valuesProducer.getBinary(field);
-  final short jumpTableEntryCount = IndexedDISI.writeBitSet(values, data, 
IndexedDISI.DEFAULT_DENSE_RANK_POWER);
-  meta.writeLong(data.getFilePointer() - offset); // docsWithFieldLength
-  meta.writeShort(jumpTableEntryCount);
-  meta.writeByte(IndexedDISI.DEFAULT_DENSE_RANK_POWER);
+void addDoc(int doc, BytesRef v) throws IOException {
+  if (blockAddressesStart < 0) {
+blockAddressesStart = data.getFilePointer();
+  }
+  docLengths[numDocsInCurrentBlock] = v.length;
+  block = ArrayUtil.grow(block, uncompressedBlockLength + v.length);
+  System.arraycopy(v.bytes, v.offset, block, uncompressedBlockLength, 
v.length);
+  uncompressedBlockLength += v.length;
+  numDocsInCurrentBlock++;
+  if (numDocsInCurrentBlock == 
Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK) {
+flushData();
+  }  
 }
 
-meta.writeInt(numDocsWithField);
-meta.writeInt(minLength);
-meta.writeInt(maxLength);
-if (maxLength > minLength) {
-  start = data.getFilePointer();
-  meta.writeLong(start);
+private void flushData() throws IOException {
+  if (numDocsInCurrentBlock > 0) {
+// Write offset to this block to temporary offsets file
+totalChunks++;
+long thisBlockStartPointer = data.getFilePointer();
+
+// Optimisation - check if all lengths are same
+boolean allLengthsSame = true && numDocsInCurrentBlock >0  ;
+for (int i = 0; i < 
Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK && allLengthsSame; 
i++) {
+  if (i > 0 && docLengths[i] != docLengths[i-1]) {
+allLengthsSame = false;
+  }
+}
+if (allLengthsSame) {
+// Only write one value shifted. Steal a bit to indicate all other 
lengths are the same
+int onlyOneLength = (docLengths[0] 
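
   The quoted diff is truncated at this point and is left as-is. Purely as an illustration of the "steal a bit" idea named in the comment above, and not a reconstruction of the missing line, a length can be shifted left by one so that its lowest bit is free to act as a flag. The class `StolenBitSketch` and its method names are hypothetical, and the sketch assumes lengths are non-negative and below 2^30 so the shift cannot overflow an `int`.

```java
// Hypothetical sketch of flagging a value by stealing its lowest bit.
final class StolenBitSketch {
  // Packs a length and an "all lengths are the same" flag into one int:
  // the length occupies the upper bits, the flag the lowest bit.
  static int encode(int length, boolean allLengthsSame) {
    return (length << 1) | (allLengthsSame ? 1 : 0);
  }

  // Reads the flag back from the lowest bit.
  static boolean allLengthsSame(int encoded) {
    return (encoded & 1) == 1;
  }

  // Recovers the original length by dropping the flag bit.
  static int length(int encoded) {
    return encoded >>> 1;
  }
}
```

   A reader of such a value would check the flag first: when it is set, one stored length applies to every document in the block; when it is clear, the remaining lengths have to be read individually.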

[GitHub] [lucene-solr] ErickErickson commented on a change in pull request #1248: LUCENE-9134: Port ant-regenerate tasks to Gradle build

2020-02-11 Thread GitBox
ErickErickson commented on a change in pull request #1248: LUCENE-9134: Port 
ant-regenerate tasks to Gradle build
URL: https://github.com/apache/lucene-solr/pull/1248#discussion_r377624894
 
 

 ##
 File path: gradle/generation/util.gradle
 ##
 @@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+apply plugin: "de.undercouch.download"
+
+configure(rootProject) {
+  configurations {
+utilgen
+  }
+
+  dependencies {
 
 Review comment:
   Yeah, I'll nuke that.





[GitHub] [lucene-solr] ErickErickson commented on a change in pull request #1248: LUCENE-9134: Port ant-regenerate tasks to Gradle build

2020-02-11 Thread GitBox
ErickErickson commented on a change in pull request #1248: LUCENE-9134: Port 
ant-regenerate tasks to Gradle build
URL: https://github.com/apache/lucene-solr/pull/1248#discussion_r377627813
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/util/automaton/createLevAutomata.py
 ##
 @@ -22,7 +22,7 @@
 import os
 import sys
 # sys.path.insert(0, 'moman/finenight/python')
-sys.path.insert(0, '../../../../../../../../build/core/moman/finenight/python')
+sys.path.insert(0, '../../../../../../../../../build/moman/finenight/python')
 
 Review comment:
   Dawid:
   
   Yeah, I wondered about that. But please don't make me run the whole 
regenerate task in ant ;).
   
   I'll change this to take an argument and get it working in Gradle, and leave 
a comment in the python code about having to change things a bit if running 
from Ant.





[GitHub] [lucene-solr] uschindler commented on issue #1242: LUCENE-9201: Port documentation-lint task to Gradle build

2020-02-11 Thread GitBox
uschindler commented on issue #1242: LUCENE-9201: Port documentation-lint task 
to Gradle build
URL: https://github.com/apache/lucene-solr/pull/1242#issuecomment-584631168
 
 
   Hi @mocobeta, 
   what's the problem with the source folder? Here it is: 
https://docs.gradle.org/current/dsl/org.gradle.api.tasks.SourceSet.html#org.gradle.api.tasks.SourceSet:java
   
   The second problem is the "joining" of the classpath: this won't work on 
Windows (":" is only valid on Linux). With sourceSets, use the following method: 
`getAsPath()` 
https://docs.gradle.org/current/javadoc/org/gradle/api/file/FileCollection.html#getAsPath--
 (also, the compileClasspath for the test sourceset really also contains the 
classes from the main sourceset). The problem may only be incorrect 
dependencies. The bug is that you have to define a separate task per sourceset. 
So don't add a global lintJavadocs path and instead just register a new task 
for each project named "sourceLint" (remove Javadocs from it; the javadocs in 
the ECJ call is obsolete, it's just from former times. We now primarily use it 
to find obsolete imports) that depends on 
   
   If you then execute sourceLint from top-level, it will execute the task for 
every project separately. You should also be able to call it separately for a 
single unit. Ideally, each project's "check" should then depend on that task.
   
   I'd rewrite the whole thing, should I work on it. The current setup is very 
gradle-unlike. You'd never do it like that, feels like Ant. :-)




[GitHub] [lucene-solr] uschindler edited a comment on issue #1242: LUCENE-9201: Port documentation-lint task to Gradle build

2020-02-11 Thread GitBox
uschindler edited a comment on issue #1242: LUCENE-9201: Port 
documentation-lint task to Gradle build
URL: https://github.com/apache/lucene-solr/pull/1242#issuecomment-584631168
 
 
   Hi @mocobeta, 
   what's the problem with the source folder? Here it is: 
https://docs.gradle.org/current/dsl/org.gradle.api.tasks.SourceSet.html#org.gradle.api.tasks.SourceSet:java
   
   The second problem is the "joining" of the classpath: this won't work on 
Windows (":" is only valid on Linux). With sourceSets, use the following method: 
`getAsPath()` 
https://docs.gradle.org/current/javadoc/org/gradle/api/file/FileCollection.html#getAsPath--
 (also, the compileClasspath for the test sourceset really also contains the 
classes from the main sourceset). The problem may only be incorrect 
dependencies. The bug is that you have to define a separate task per sourceset. 
So don't add a global lintJavadocs path and instead just register a new task 
for each project named "sourceLint" (remove Javadocs from it; the javadocs in 
the ECJ call is obsolete, it's just from former times. We now primarily use it 
to find obsolete imports) that depends on 
   
   If you then execute sourceLint from top-level, it will execute the task for 
every project separately. You should also be able to call it separately for a 
single unit. Ideally, each project's "check" should then depend on that task.
   
   I'd rewrite the whole thing - should I work on it? (I don't have much time, 
but spending too much time here explaining what to do costs more time.) 
IMHO, the current setup is very gradle-unlike. You'd never do it like that, 
feels like Ant. :-)





[GitHub] [lucene-solr] uschindler commented on issue #1242: LUCENE-9201: Port documentation-lint task to Gradle build

2020-02-11 Thread GitBox
uschindler commented on issue #1242: LUCENE-9201: Port documentation-lint task 
to Gradle build
URL: https://github.com/apache/lucene-solr/pull/1242#issuecomment-584633315
 
 
   The `copyAllJavadocs` should be placed outside of linter.
   
   We will need this anyways, as the whole Javadocs are not structured by 
modules at the moment, we also copy them together in Ant (because it's 
published as one huge folder layout) on the Lucene and Solr web pages.
   
   We should add another task to collect all Javadocs for the lucene and also 
for the solr root projects, add the XSL-based index.html and so allow it to be 
published on website or Jenkins.




