[GitHub] [lucene-solr] markharwood commented on issue #1234: Add compression for Binary doc value fields
markharwood commented on issue #1234: Add compression for Binary doc value fields URL: https://github.com/apache/lucene-solr/pull/1234#issuecomment-583313015 I've reclaimed my Jira log-in and opened https://issues.apache.org/jira/browse/LUCENE-9211 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] romseygeek opened a new pull request #1243: LUCENE-9212: Intervals.multiterm() should take CompiledAutomaton
romseygeek opened a new pull request #1243: LUCENE-9212: Intervals.multiterm() should take CompiledAutomaton URL: https://github.com/apache/lucene-solr/pull/1243 Currently it takes `Automaton` and then compiles it internally, but we need to do things like check for binary-vs-unicode status; it should just take `CompiledAutomaton` instead, and put responsibility for determinization, binaryness, etc. on the caller.
[GitHub] [lucene-solr] markharwood commented on issue #1234: Add compression for Binary doc value fields
markharwood commented on issue #1234: Add compression for Binary doc value fields URL: https://github.com/apache/lucene-solr/pull/1234#issuecomment-583449275 There was a suggestion from @jimczi that we fall back to writing raw data if content doesn't compress well. I'm not sure this logic is worth developing, for the reasons outlined below. I wrote a [compression buffer](https://gist.github.com/markharwood/91cc8d96d6611ad97df11f244b1b1d0f) to see what the compression algo outputs before deciding whether to write the compressed or raw data to disk. I tested with the most uncompressible content I could imagine:

    public static void fillRandom(byte[] buffer, int length) {
      for (int i = 0; i < length; i++) {
        buffer[i] = (byte) (Math.random() * Byte.MAX_VALUE);
      }
    }

The LZ4 compressed versions of this content were only marginally bigger than their raw counterparts (adding 0.4% overhead to the original content, e.g. 96,921 compressed vs 96,541 raw bytes). On that basis I'm not sure it's worth doubling the memory costs of the indexing logic (we would require a temporary output buffer at least the same size as the raw data being compressed) and the additional byte shuffling.
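The observation above can be reproduced with a hedged sketch. The JDK has no LZ4, so this uses java.util.zip.Deflater as a stand-in; absolute numbers differ from LZ4, but incompressible random bytes likewise grow only by a small framing overhead:

```java
import java.util.Random;
import java.util.zip.Deflater;

public class IncompressibleOverhead {

    // Compress and return only the compressed size; the output bytes themselves
    // are discarded since we only care about measuring the overhead.
    static int compressedSize(byte[] raw) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(raw);
        deflater.finish();
        byte[] out = new byte[raw.length + 1024]; // room for worst-case growth
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(out, 0, out.length);
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        byte[] raw = new byte[96_541]; // same raw size as the figures above
        new Random(42).nextBytes(raw); // effectively incompressible content
        int compressed = compressedSize(raw);
        double overheadPct = 100.0 * (compressed - raw.length) / raw.length;
        System.out.printf("raw=%d compressed=%d overhead=%.2f%%%n",
                raw.length, compressed, overheadPct);
    }
}
```

Deflate's framing differs from LZ4's, but the qualitative result matches the 0.4% figure quoted above: the codec falls back to stored blocks, so random data grows by only a few header bytes.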
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost by payload
dsmiley commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost by payload URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376459427

## File path: solr/core/src/java/org/apache/solr/schema/TextField.java

    @@ -43,6 +43,7 @@ public class TextField extends FieldType {
       protected boolean autoGeneratePhraseQueries;
       protected boolean enableGraphQueries;
    +  protected boolean synonymBoostByPayload;

Review comment: I thought we switched the approach from a payload to a boost attribute? Besides, it's not clear we need this toggle at all, since the user could arrange for this behavior simply by having the new DelimitedBoost filter in the chain.
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost by payload
dsmiley commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost by payload URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376455962

## File path: lucene/core/src/java/org/apache/lucene/util/QueryBuilder.java

    @@ -509,33 +549,40 @@ protected Query analyzeGraphBoolean(String field, TokenStream source, BooleanCla
           end = articulationPoints[i];
         }
         lastState = end;
    -    final Query queryPos;
    +    final Query positionalQuery;
         if (graph.hasSidePath(start)) {
    -      final Iterator it = graph.getFiniteStrings(start, end);
    +      final Iterator sidePathsIterator = graph.getFiniteStrings(start, end);
           Iterator queries = new Iterator() {
             @Override
             public boolean hasNext() {
    -          return it.hasNext();
    +          return sidePathsIterator.hasNext();
             }
             @Override
             public Query next() {
    -          TokenStream ts = it.next();
    -          return createFieldQuery(ts, BooleanClause.Occur.MUST, field, getAutoGenerateMultiTermSynonymsPhraseQuery(), 0);
    +          TokenStream sidePath = sidePathsIterator.next();
    +          return createFieldQuery(sidePath, BooleanClause.Occur.MUST, field, getAutoGenerateMultiTermSynonymsPhraseQuery(), 0);
             }
           };
    -      queryPos = newGraphSynonymQuery(queries);
    +      positionalQuery = newGraphSynonymQuery(queries);
         } else {
    -      Term[] terms = graph.getTerms(field, start);
    +      List attributes = graph.getTerms(start);

Review comment: I think I mentioned a List of AttributeSource is weird (I've never seen this) and it's heavyweight. Why not a TokenStream or TermAndBoost[]?
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost by payload
dsmiley commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost by payload URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376450137

## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/boost/DelimitedBoostTokenFilter.java

    @@ -0,0 +1,63 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements. See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.lucene.analysis.boost;
    +
    +import org.apache.lucene.analysis.TokenFilter;
    +import org.apache.lucene.analysis.TokenStream;
    +import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    +import org.apache.lucene.search.BoostAttribute;
    +
    +import java.io.IOException;
    +
    +/**
    + * Characters before the delimiter are the "token", those after are the boost.
    + *
    + * For example, if the delimiter is '|', then for the string "foo|0.7", foo is the token
    + * and 0.7 is the boost.
    + *
    + * Note: make sure your Tokenizer doesn't split on the delimiter, or this won't work.
    + */
    +public final class DelimitedBoostTokenFilter extends TokenFilter {
    +  private final char delimiter;
    +  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    +  private final BoostAttribute boostAtt = addAttribute(BoostAttribute.class);
    +
    +  public DelimitedBoostTokenFilter(TokenStream input, char delimiter) {
    +    super(input);
    +    this.delimiter = delimiter;
    +  }
    +
    +  @Override
    +  public boolean incrementToken() throws IOException {
    +    if (input.incrementToken()) {
    +      final char[] buffer = termAtt.buffer();
    +      final int length = termAtt.length();
    +      for (int i = 0; i < length; i++) {
    +        if (buffer[i] == delimiter) {
    +          float boost = Float.parseFloat(new String(buffer, i + 1, length - (i + 1)));
    +          boostAtt.setBoost(boost);
    +          termAtt.setLength(i);
    +          return true;
    +        }
    +      }
    +      // we have not seen the delimiter
    +      boostAtt.setBoost(1.0f);

Review comment: Shouldn't be needed; leave the boost be -- it defaults to 1.0 anyway.
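The delimiter-scanning step of the filter quoted above can be sketched as a standalone method. The class and method names below are hypothetical illustrations, not the Lucene API:

```java
public class DelimitedBoostParse {

    // Simple holder for the result: the token text and its boost.
    static final class TokenAndBoost {
        final String token;
        final float boost;
        TokenAndBoost(String token, float boost) {
            this.token = token;
            this.boost = boost;
        }
    }

    // Scan for the delimiter; everything after it is parsed as a float boost,
    // and the token is truncated to the characters before it. Without a
    // delimiter the token is unchanged and the boost defaults to 1.0.
    static TokenAndBoost parse(String term, char delimiter) {
        char[] buffer = term.toCharArray();
        int length = buffer.length;
        for (int i = 0; i < length; i++) {
            if (buffer[i] == delimiter) {
                float boost = Float.parseFloat(new String(buffer, i + 1, length - (i + 1)));
                return new TokenAndBoost(new String(buffer, 0, i), boost);
            }
        }
        return new TokenAndBoost(term, 1.0f);
    }

    public static void main(String[] args) {
        TokenAndBoost t = parse("foo|0.7", '|');
        System.out.println(t.token + " " + t.boost); // foo 0.7
    }
}
```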
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost by payload
dsmiley commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost by payload URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376460611 ## File path: solr/core/src/test-files/solr/collection1/conf/schema12.xml ## @@ -238,6 +227,18 @@ + Review comment: You can remove "payload" everywhere from this PR now; no?
[GitHub] [lucene-solr] alessandrobenedetti commented on issue #357: [SOLR-12238] Synonym Queries boost by payload
alessandrobenedetti commented on issue #357: [SOLR-12238] Synonym Queries boost by payload URL: https://github.com/apache/lucene-solr/pull/357#issuecomment-583474019 hi @romseygeek, @dsmiley, first of all, thank you again for your patience and very useful insights. I have incorporated Alan's changes and cleaned everything up. My unresolved questions:
- BoostAttribute doesn't use BytesRef but a float directly; is that a concern? We are expected to use it at query time, so we could actually see a minimal query-time benefit in not encoding/decoding.
- Alan expressed concerns over SpanBoostQuery, mentioning they are sort of broken; what should we do in that regard? Right now the created span query seems to work as expected with boosted synonyms (see the related test). If SpanBoostQuery is broken, I suspect it should get resolved in another ticket?
- From an original comment in the test code org.apache.solr.search.TestSolrQueryParser#testSynonymQueryStyle: "confirm autoGeneratePhraseQueries always builds OR queries". I changed that; was there any reason for that behaviour?
[GitHub] [lucene-solr] romseygeek commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost by payload
romseygeek commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost by payload URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376474333

## File path: lucene/core/src/java/org/apache/lucene/util/QueryBuilder.java

    @@ -509,33 +549,40 @@ protected Query analyzeGraphBoolean(String field, TokenStream source, BooleanCla
           end = articulationPoints[i];
         }
         lastState = end;
    -    final Query queryPos;
    +    final Query positionalQuery;
         if (graph.hasSidePath(start)) {
    -      final Iterator it = graph.getFiniteStrings(start, end);
    +      final Iterator sidePathsIterator = graph.getFiniteStrings(start, end);
           Iterator queries = new Iterator() {
             @Override
             public boolean hasNext() {
    -          return it.hasNext();
    +          return sidePathsIterator.hasNext();
             }
             @Override
             public Query next() {
    -          TokenStream ts = it.next();
    -          return createFieldQuery(ts, BooleanClause.Occur.MUST, field, getAutoGenerateMultiTermSynonymsPhraseQuery(), 0);
    +          TokenStream sidePath = sidePathsIterator.next();
    +          return createFieldQuery(sidePath, BooleanClause.Occur.MUST, field, getAutoGenerateMultiTermSynonymsPhraseQuery(), 0);
             }
           };
    -      queryPos = newGraphSynonymQuery(queries);
    +      positionalQuery = newGraphSynonymQuery(queries);
         } else {
    -      Term[] terms = graph.getTerms(field, start);
    +      List attributes = graph.getTerms(start);

Review comment: This is what GraphTokenStreamFiniteStrings returns currently, for multiple tokens at the same position. Maybe `TermAndBoost[]` would make more sense though.
[GitHub] [lucene-solr] romseygeek commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost by payload
romseygeek commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost by payload URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376473778

## File path: lucene/core/src/java/org/apache/lucene/util/QueryBuilder.java

    @@ -450,9 +485,13 @@ protected Query analyzePhrase(String field, TokenStream stream, int slop) throws
           position += 1;
         }
         builder.add(new Term(field, termAtt.getBytesRef()), position);
    +    phraseBoost = boostAtt.getBoost();

Review comment: I think this isn't quite right, because we need to combine boosts together somehow; currently your phrase boost is just the boost of the last term in the phrase.
[GitHub] [lucene-solr] alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost by payload
alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost by payload URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376476198

## File path: solr/core/src/java/org/apache/solr/schema/TextField.java

    @@ -43,6 +43,7 @@ public class TextField extends FieldType {
       protected boolean autoGeneratePhraseQueries;
       protected boolean enableGraphQueries;
    +  protected boolean synonymBoostByPayload;

Review comment: agreed and fixed!
[GitHub] [lucene-solr] alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost by payload
alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost by payload URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376476976

## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/boost/DelimitedBoostTokenFilter.java

    @@ -0,0 +1,63 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements. See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.lucene.analysis.boost;
    +
    +import org.apache.lucene.analysis.TokenFilter;
    +import org.apache.lucene.analysis.TokenStream;
    +import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    +import org.apache.lucene.search.BoostAttribute;
    +
    +import java.io.IOException;
    +
    +/**
    + * Characters before the delimiter are the "token", those after are the boost.
    + *
    + * For example, if the delimiter is '|', then for the string "foo|0.7", foo is the token
    + * and 0.7 is the boost.
    + *
    + * Note: make sure your Tokenizer doesn't split on the delimiter, or this won't work.
    + */
    +public final class DelimitedBoostTokenFilter extends TokenFilter {
    +  private final char delimiter;
    +  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    +  private final BoostAttribute boostAtt = addAttribute(BoostAttribute.class);
    +
    +  public DelimitedBoostTokenFilter(TokenStream input, char delimiter) {
    +    super(input);
    +    this.delimiter = delimiter;
    +  }
    +
    +  @Override
    +  public boolean incrementToken() throws IOException {
    +    if (input.incrementToken()) {
    +      final char[] buffer = termAtt.buffer();
    +      final int length = termAtt.length();
    +      for (int i = 0; i < length; i++) {
    +        if (buffer[i] == delimiter) {
    +          float boost = Float.parseFloat(new String(buffer, i + 1, length - (i + 1)));
    +          boostAtt.setBoost(boost);
    +          termAtt.setLength(i);
    +          return true;
    +        }
    +      }
    +      // we have not seen the delimiter
    +      boostAtt.setBoost(1.0f);

Review comment: Fixed in the upcoming commit.
[GitHub] [lucene-solr] alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost by payload
alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost by payload URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376478661 ## File path: solr/core/src/test-files/solr/collection1/conf/schema12.xml ## @@ -238,6 +227,18 @@ + Review comment: Fixed in the upcoming commit!
[GitHub] [lucene-solr] alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost by payload
alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost by payload URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376503587

## File path: lucene/core/src/java/org/apache/lucene/util/QueryBuilder.java

    @@ -509,33 +549,40 @@ protected Query analyzeGraphBoolean(String field, TokenStream source, BooleanCla
           end = articulationPoints[i];
         }
         lastState = end;
    -    final Query queryPos;
    +    final Query positionalQuery;
         if (graph.hasSidePath(start)) {
    -      final Iterator it = graph.getFiniteStrings(start, end);
    +      final Iterator sidePathsIterator = graph.getFiniteStrings(start, end);
           Iterator queries = new Iterator() {
             @Override
             public boolean hasNext() {
    -          return it.hasNext();
    +          return sidePathsIterator.hasNext();
             }
             @Override
             public Query next() {
    -          TokenStream ts = it.next();
    -          return createFieldQuery(ts, BooleanClause.Occur.MUST, field, getAutoGenerateMultiTermSynonymsPhraseQuery(), 0);
    +          TokenStream sidePath = sidePathsIterator.next();
    +          return createFieldQuery(sidePath, BooleanClause.Occur.MUST, field, getAutoGenerateMultiTermSynonymsPhraseQuery(), 0);
             }
           };
    -      queryPos = newGraphSynonymQuery(queries);
    +      positionalQuery = newGraphSynonymQuery(queries);
         } else {
    -      Term[] terms = graph.getTerms(field, start);
    +      List attributes = graph.getTerms(start);

Review comment: a tentative change is coming in the next commit; I also added a few tests to cover that else branch.
[GitHub] [lucene-solr] alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost by payload
alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost by payload URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376513280

## File path: lucene/core/src/java/org/apache/lucene/util/QueryBuilder.java

    @@ -450,9 +485,13 @@ protected Query analyzePhrase(String field, TokenStream stream, int slop) throws
           position += 1;
         }
         builder.add(new Term(field, termAtt.getBytesRef()), position);
    +    phraseBoost = boostAtt.getBoost();

Review comment: I implemented a simple multiplicative boost. It's backward compatible with the designed use case (multi-term synonym -> single concept -> single boost, e.g. panthera onca => jaguar|0.95, big cat|0.85, black panther|0.65). But it's also compatible with non-synonym cases, if the user needs a boost per token in phrase and span queries. It's in the upcoming commit; let me know if you believe something different is necessary.
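The multiplicative combination described above can be sketched in a few lines. This is an illustrative helper, not the actual QueryBuilder patch:

```java
public class PhraseBoost {

    // Combine per-token boosts into a single phrase boost by multiplication.
    // Backward compatible: if every token carries the default boost of 1.0,
    // the phrase boost is also 1.0.
    static float combine(float[] tokenBoosts) {
        float phraseBoost = 1.0f;
        for (float b : tokenBoosts) {
            phraseBoost *= b;
        }
        return phraseBoost;
    }

    public static void main(String[] args) {
        // e.g. "big cat|0.85" analyzed as two tokens, one carrying a boost
        System.out.println(combine(new float[] {1.0f, 0.85f})); // 0.85
    }
}
```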
[GitHub] [lucene-solr] alessandrobenedetti commented on issue #357: [SOLR-12238] Synonym Queries boost by payload
alessandrobenedetti commented on issue #357: [SOLR-12238] Synonym Queries boost by payload URL: https://github.com/apache/lucene-solr/pull/357#issuecomment-583518344 I have applied the changes to address the feedback points and consequently added additional tests to cover some missing scenarios. We should be almost ready to go :)
[GitHub] [lucene-solr] msokolov commented on issue #1234: Add compression for Binary doc value fields
msokolov commented on issue #1234: Add compression for Binary doc value fields URL: https://github.com/apache/lucene-solr/pull/1234#issuecomment-583519622 > The LZ4 compressed versions of this content were only marginally bigger than their raw counterparts

Did you also test read performance in this incompressible case?
[GitHub] [lucene-solr] madrob opened a new pull request #1244: SOLR-14247 Remove unneeded sleeps
madrob opened a new pull request #1244: SOLR-14247 Remove unneeded sleeps URL: https://github.com/apache/lucene-solr/pull/1244 This test is slow because it sleeps a lot. Removing the sleeps, it still passes consistently on my machine, but I would like other folks to confirm this on their different hardware as well.

# Checklist

Please review the following and check all that apply:

- [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [x] I have created a Jira issue and added the issue ID to my pull request title.
- [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [x] I have developed this patch against the `master` branch.
- [x] I have run `ant precommit` and the appropriate test suite.
- [ ] ~I have added tests for my changes.~
- [ ] ~I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).~
[GitHub] [lucene-solr] jpountz commented on issue #1234: Add compression for Binary doc value fields
jpountz commented on issue #1234: Add compression for Binary doc value fields URL: https://github.com/apache/lucene-solr/pull/1234#issuecomment-583529199 In the case of content that can't be compressed, the compressed data will consist of the number of bytes, followed by the bytes. So decompressing consists of decoding the length and then reading the bytes. The only overhead compared to reading bytes directly is the decoding of the number of bytes, so I would expect the overhead to be rather small. I don't have a strong preference regarding whether this case should be handled explicitly or not. It's true that not having a special "not-compressed" case helps keep the logic simpler.
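For context on the framing jpountz describes: Lucene's vInt stores 7 data bits per byte, with the high bit as a continuation flag, so decoding a length header costs at most a few byte reads. The sketch below imitates that encoding in a self-contained way; it is not Lucene's actual DataOutput/DataInput API:

```java
import java.io.ByteArrayOutputStream;

public class VIntFraming {

    // Encode an int as a vInt: 7 bits per byte, high bit set on all but the last.
    static void writeVInt(ByteArrayOutputStream out, int i) {
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7;
        }
        out.write(i);
    }

    // Decode a vInt starting at pos[0]; pos[0] is advanced past the header.
    static int readVInt(byte[] buf, int[] pos) {
        int b = buf[pos[0]++] & 0xFF;
        int i = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buf[pos[0]++] & 0xFF;
            i |= (b & 0x7F) << shift;
        }
        return i;
    }

    public static void main(String[] args) {
        // "Compressing" incompressible input: a vInt length, then the raw bytes.
        byte[] raw = new byte[300];
        ByteArrayOutputStream framed = new ByteArrayOutputStream();
        writeVInt(framed, raw.length);       // 2-byte header for 300
        framed.write(raw, 0, raw.length);

        // "Decompressing" only pays for decoding the length header.
        int[] pos = {0};
        int length = readVInt(framed.toByteArray(), pos);
        System.out.println(length + " bytes, header=" + pos[0]); // 300 bytes, header=2
    }
}
```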
[GitHub] [lucene-solr] markharwood commented on issue #1234: Add compression for Binary doc value fields
markharwood commented on issue #1234: Add compression for Binary doc value fields URL: https://github.com/apache/lucene-solr/pull/1234#issuecomment-583529462 > Did you also test read performance in this incompressible case?

Just tried it, and it does look 4x faster reading raw random bytes vs compressed random bytes.
[GitHub] [lucene-solr] rmuir commented on issue #1236: Add back assertions removed by LUCENE-9187.
rmuir commented on issue #1236: Add back assertions removed by LUCENE-9187. URL: https://github.com/apache/lucene-solr/pull/1236#issuecomment-583534489 +1, thanks
[GitHub] [lucene-solr] jpountz commented on a change in pull request #1234: Add compression for Binary doc value fields
jpountz commented on a change in pull request #1234: Add compression for Binary doc value fields
URL: https://github.com/apache/lucene-solr/pull/1234#discussion_r376528169
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java (@@ -742,6 +755,131 @@)

```java
// Decompresses blocks of binary values to retrieve content
class BinaryDecoder {

  private final LongValues addresses;
  private final IndexInput compressedData;
  // Cache of last uncompressed block
  private long lastBlockId = -1;
  private int[] uncompressedDocEnds = new int[Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK];
  private int uncompressedBlockLength = 0;
  private int numDocsInBlock = 0;
  private final byte[] uncompressedBlock;
  private final BytesRef uncompressedBytesRef;

  public BinaryDecoder(LongValues addresses, IndexInput compressedData, int biggestUncompressedBlockSize) {
    super();
    this.addresses = addresses;
    this.compressedData = compressedData;
    // pre-allocate a byte array large enough for the biggest uncompressed block needed.
    this.uncompressedBlock = new byte[biggestUncompressedBlockSize];
    uncompressedBytesRef = new BytesRef(uncompressedBlock);
  }

  BytesRef decode(int docNumber) throws IOException {
    int blockId = docNumber >> Lucene80DocValuesFormat.BINARY_BLOCK_SHIFT;
    int docInBlockId = docNumber % Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK;
    assert docInBlockId < Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK;

    // already read and uncompressed?
    if (blockId != lastBlockId) {
      lastBlockId = blockId;
      long blockStartOffset = addresses.get(blockId);
      compressedData.seek(blockStartOffset);

      numDocsInBlock = compressedData.readVInt();
```

Review comment: do we really need to record the number of documents in the block? It should be 32 for all blocks except for the last one? Maybe at index time we could append dummy values to the last block to make sure it has 32 values too, and we wouldn't need this vInt anymore?
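For context, the block addressing in `decode` relies on the docs-per-block count being a power of two. A minimal standalone sketch of the arithmetic (the constant values here are illustrative assumptions - a shift of 5, i.e. 32 docs per block, as the review comment suggests - not copied from `Lucene80DocValuesFormat`):

```java
public class BlockAddressing {
    // Assumed values for illustration, mirroring the shape of Lucene80DocValuesFormat's constants.
    static final int BINARY_BLOCK_SHIFT = 5;
    static final int BINARY_DOCS_PER_COMPRESSED_BLOCK = 1 << BINARY_BLOCK_SHIFT; // 32

    // Which compressed block holds this doc.
    static int blockId(int docNumber) {
        return docNumber >> BINARY_BLOCK_SHIFT;
    }

    // Position of the doc inside its block; since the block size is a power of
    // two, the modulo in the PR could equally be a bit mask.
    static int docInBlockId(int docNumber) {
        return docNumber & (BINARY_DOCS_PER_COMPRESSED_BLOCK - 1);
    }

    public static void main(String[] args) {
        System.out.println(blockId(70));      // 70 / 32 = 2
        System.out.println(docInBlockId(70)); // 70 % 32 = 6
    }
}
```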
[GitHub] [lucene-solr] jpountz commented on a change in pull request #1234: Add compression for Binary doc value fields
jpountz commented on a change in pull request #1234: Add compression for Binary doc value fields
URL: https://github.com/apache/lucene-solr/pull/1234#discussion_r376531952
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java (@@ -742,6 +755,131 @@)

```java
private int[] uncompressedDocEnds = new int[Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK];
```

Review comment: in the past we've put these constants in the meta file and BinaryEntry so that it's easier to change values over time.
[GitHub] [lucene-solr] jpountz commented on a change in pull request #1234: Add compression for Binary doc value fields
jpountz commented on a change in pull request #1234: Add compression for Binary doc value fields
URL: https://github.com/apache/lucene-solr/pull/1234#discussion_r376527753
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java (@@ -742,6 +755,131 @@)

```java
numDocsInBlock = compressedData.readVInt();
assert numDocsInBlock <= Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK;
uncompressedDocEnds = new int[numDocsInBlock];
uncompressedBlockLength = 0;

int onlyLength = -1;
for (int i = 0; i < numDocsInBlock; i++) {
  if (i == 0) {
    // The first length value is special. It is shifted and has a bit to denote if
    // all other values are the same length
    int lengthPlusSameInd = compressedData.readVInt();
    int sameIndicator = lengthPlusSameInd & 1;
    int firstValLength = lengthPlusSameInd >> 1;
    if (sameIndicator == 1) {
      onlyLength = firstValLength;
    }
    uncompressedBlockLength += firstValLength;
  } else {
    if (onlyLength == -1) {
      // Various lengths are stored - read each from disk
      uncompressedBlockLength += compressedData.readVInt();
    } else {
      // Only one length
      uncompressedBlockLength += onlyLength;
    }
  }
  uncompressedDocEnds[i] = uncompressedBlockLength;
}
```

Review comment: maybe we could call it `uncompressedDocStarts` and set the index at `i+1`, which would then help below to remove the else block of the `docInBlockId > 0` condition below?
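On the write side (not shown in this diff), the length header decoded above implies an encoder along these lines - a hedged sketch only, with the vInts collected into a plain list so it stays self-contained rather than using Lucene's `DataOutput.writeVInt`:

```java
import java.util.ArrayList;
import java.util.List;

public class LengthHeaderSketch {
    /**
     * Encodes per-document lengths for one block. The first value packs the
     * first length (shifted left by one) with a stolen low bit meaning "all
     * lengths in this block are equal"; when that bit is set, the remaining
     * lengths are omitted entirely.
     */
    static List<Integer> encodeLengths(int[] lengths) {
        boolean allSame = true;
        for (int len : lengths) {
            if (len != lengths[0]) { allSame = false; break; }
        }
        List<Integer> vints = new ArrayList<>();
        vints.add((lengths[0] << 1) | (allSame ? 1 : 0)); // stolen low bit
        if (!allSame) {
            for (int i = 1; i < lengths.length; i++) {
                vints.add(lengths[i]); // one vInt per remaining doc
            }
        }
        return vints;
    }
}
```

Blocks of fixed-width values thus collapse to a single vInt of length metadata.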
[GitHub] [lucene-solr] jpountz commented on a change in pull request #1234: Add compression for Binary doc value fields
jpountz commented on a change in pull request #1234: Add compression for Binary doc value fields
URL: https://github.com/apache/lucene-solr/pull/1234#discussion_r376529195
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java (@@ -742,6 +755,131 @@)

```java
int lengthPlusSameInd = compressedData.readVInt();
int sameIndicator = lengthPlusSameInd & 1;
int firstValLength = lengthPlusSameInd >> 1;
```

Review comment: Since you are stealing a bit, we should do an unsigned shift (`>>>`) instead. This would never be a problem in practice, but imagine that the length was a 31-bit integer: shifting it one bit to the left at index time would make the number negative. So here we need an unsigned shift rather than a signed shift that preserves the sign.
[GitHub] [lucene-solr] jpountz commented on a change in pull request #1234: Add compression for Binary doc value fields
jpountz commented on a change in pull request #1234: Add compression for Binary doc value fields
URL: https://github.com/apache/lucene-solr/pull/1234#discussion_r376532189
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java (@@ -742,6 +755,131 @@)

```java
numDocsInBlock = compressedData.readVInt();
assert numDocsInBlock <= Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK;
uncompressedDocEnds = new int[numDocsInBlock];
```

Review comment: can we reuse the same array across blocks?
[GitHub] [lucene-solr] jpountz commented on issue #1234: Add compression for Binary doc value fields
jpountz commented on issue #1234: Add compression for Binary doc value fields URL: https://github.com/apache/lucene-solr/pull/1234#issuecomment-583536606 @msokolov FWIW LZ4 only removes duplicate strings from a stream: when it finds one, it inserts a reference to a previous sequence of bytes. In the special case that the content is incompressible, the LZ4 compressed data just consists of the number of bytes followed by the bytes, so the only overhead compared to reading the bytes directly is the decoding of the number of bytes, which should be rather low. I don't have a preference regarding whether we should have an explicit "not-compressed" case, but I understand how not having one helps keep things simpler.
[GitHub] [lucene-solr] msokolov commented on issue #1234: Add compression for Binary doc value fields
msokolov commented on issue #1234: Add compression for Binary doc value fields URL: https://github.com/apache/lucene-solr/pull/1234#issuecomment-583538389 Strange that Mark would measure 4x slowdown from decoding the lengths... Perhaps the random bytes are not totally incompressible, just barely compressible?
[GitHub] [lucene-solr] markharwood commented on issue #1234: Add compression for Binary doc value fields
markharwood commented on issue #1234: Add compression for Binary doc value fields URL: https://github.com/apache/lucene-solr/pull/1234#issuecomment-583539216 >Strange that Mark would measure 4x slowdown from decoding the lengths... Perhaps the random bytes are not totally incompressible, just barely compressible? I may have been too hasty in that reply - I've not been able to reproduce it, and the timings are very similar in the additional tests I've done, so that echoes what @jpountz expects.
[GitHub] [lucene-solr] iverase merged pull request #1224: LUCENE-9194: Simplify XYShapeQuery API
iverase merged pull request #1224: LUCENE-9194: Simplify XYShapeQuery API URL: https://github.com/apache/lucene-solr/pull/1224
[GitHub] [lucene-solr] markharwood edited a comment on issue #1234: Add compression for Binary doc value fields
markharwood edited a comment on issue #1234: Add compression for Binary doc value fields URL: https://github.com/apache/lucene-solr/pull/1234#issuecomment-583539216 >Strange that Mark would measure 4x slowdown from decoding the lengths... Perhaps the random bytes are not totally incompressible, just barely compressible? I may have been too hasty in that reply - I've not been able to reproduce it, and the raw vs compressed timings are very similar in the additional tests I've done, so that echoes what @jpountz expects. My first (faster) run had random bytes selected in the range 0-20 and not the 0-127 range where I'm seeing parity.
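The byte range matters because it changes the entropy per byte: uniformly random bytes in 0-20 carry about log2(21) ≈ 4.4 bits each, so a duplicate-string compressor like LZ4 finds repeated sequences often, while bytes in 0-127 carry about 7 bits and are close to incompressible. A quick back-of-the-envelope check (a standalone sketch, not code from the PR):

```java
public class EntropySketch {
    // Bits of entropy per symbol for a uniform distribution over n distinct values.
    static double bitsPerSymbol(int n) {
        return Math.log(n) / Math.log(2);
    }

    public static void main(String[] args) {
        System.out.printf("0-20  : %.2f bits/byte%n", bitsPerSymbol(21));  // ~4.39
        System.out.printf("0-127 : %.2f bits/byte%n", bitsPerSymbol(128)); // 7.00
    }
}
```

LZ4 does no entropy coding - per jpountz's comment it only replaces repeated byte sequences - but a lower-entropy stream produces matches far more often, which is consistent with the faster 0-20 run.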
[GitHub] [lucene-solr] anshumg opened a new pull request #1245: Create gradle precommit action
anshumg opened a new pull request #1245: Create gradle precommit action URL: https://github.com/apache/lucene-solr/pull/1245 This adds a gradle precommit action w/ Java11 for all branches.
[GitHub] [lucene-solr] asfgit merged pull request #1182: LUCENE-9149: Increase data dimension limit in BKD
asfgit merged pull request #1182: LUCENE-9149: Increase data dimension limit in BKD URL: https://github.com/apache/lucene-solr/pull/1182
[GitHub] [lucene-solr] anshumg merged pull request #1245: LUCENE-9146: Create gradle precommit action
anshumg merged pull request #1245: LUCENE-9146: Create gradle precommit action URL: https://github.com/apache/lucene-solr/pull/1245
[GitHub] [lucene-solr] uschindler commented on issue #1242: LUCENE-9201: Port documentation-lint task to Gradle build
uschindler commented on issue #1242: LUCENE-9201: Port documentation-lint task to Gradle build URL: https://github.com/apache/lucene-solr/pull/1242#issuecomment-583751044 The task should just be defined for each sourceSet; then tests and compile work automatically. Gradle will automatically add 2 tasks (one for each sourceSet): ecjLintMain and ecjLintTest (if you call the base name ecjLint). To set this up, ask Gradle for the current sourceSets and generate a task with an automatic name based on the sourceSet name. The classpath is provided gratis. See e.g. Gradle's internal tasks or the forbiddenapis source code for how those tasks should be declared. The approach seen here is not in line with the model behind Gradle (you define tasks per sourceSet, so it's extensible). A sourceSet, by the way, also has a source/target and/or release version.
[GitHub] [lucene-solr] uschindler edited a comment on issue #1242: LUCENE-9201: Port documentation-lint task to Gradle build
uschindler edited a comment on issue #1242: LUCENE-9201: Port documentation-lint task to Gradle build URL: https://github.com/apache/lucene-solr/pull/1242#issuecomment-583751044 The task should just be defined for each sourceSet; then tests and compile work automatically. Gradle will automatically add 2 tasks (one for each sourceSet): ecjLintMain and ecjLintTest (if you call the base name ecjLint). To set this up, ask Gradle for the current sourceSets and generate a task with an automatic name based on the sourceSet name. The classpath is provided gratis. See e.g. Gradle's internal tasks or the forbiddenapis source code for how those tasks should be declared. The approach seen here is not in line with the model behind Gradle (you define tasks per sourceSet, so it's extensible, e.g. if we add new sourceSets when building multi-release jars for some modules). A sourceSet, by the way, also has a source/target and/or release version.
[GitHub] [lucene-solr] uschindler commented on issue #1242: LUCENE-9201: Port documentation-lint task to Gradle build
uschindler commented on issue #1242: LUCENE-9201: Port documentation-lint task to Gradle build URL: https://github.com/apache/lucene-solr/pull/1242#issuecomment-583753281 Here is the forbiddenapis example of how to set up a task per sourceSet: https://github.com/policeman-tools/forbidden-apis/blob/master/src/main/resources/de/thetaphi/forbiddenapis/gradle/plugin-init.groovy#L42
[GitHub] [lucene-solr] risdenk closed pull request #1209: SOLR-14209: Upgrade JQuery to 3.4.1
risdenk closed pull request #1209: SOLR-14209: Upgrade JQuery to 3.4.1 URL: https://github.com/apache/lucene-solr/pull/1209
[GitHub] [lucene-solr] risdenk commented on issue #591: SOLR-9840: Add a unit test for LDAP integration (Hrishikesh Gadre, Kevin Risden)
risdenk commented on issue #591: SOLR-9840: Add a unit test for LDAP integration (Hrishikesh Gadre, Kevin Risden) URL: https://github.com/apache/lucene-solr/pull/591#issuecomment-583764671 working on rebasing to latest master to make sure still valid.
[GitHub] [lucene-solr] dsmiley closed pull request #1202: SOLR-14149: CHANGES.txt Remove off-topic stuff
dsmiley closed pull request #1202: SOLR-14149: CHANGES.txt Remove off-topic stuff URL: https://github.com/apache/lucene-solr/pull/1202
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost
dsmiley commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost
URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376753387
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/boost/DelimitedBoostTokenFilter.java

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.lucene.analysis.boost;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.search.BoostAttribute;

import java.io.IOException;

/**
 * Characters before the delimiter are the "token", those after are the boost.
 *
 * For example, if the delimiter is '|', then for the string "foo|0.7", foo is the token
 * and 0.7 is the boost.
 *
 * Note: make sure your Tokenizer doesn't split on the delimiter, or this won't work.
 */
public final class DelimitedBoostTokenFilter extends TokenFilter {
  private final char delimiter;
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final BoostAttribute boostAtt = addAttribute(BoostAttribute.class);

  public DelimitedBoostTokenFilter(TokenStream input, char delimiter) {
    super(input);
    this.delimiter = delimiter;
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (input.incrementToken()) {
      final char[] buffer = termAtt.buffer();
      final int length = termAtt.length();
      for (int i = 0; i < length; i++) {
        if (buffer[i] == delimiter) {
          float boost = Float.parseFloat(new String(buffer, i + 1, length - (i + 1)));
          boostAtt.setBoost(boost);
          termAtt.setLength(i);
          return true;
        }
      }
      return true;
    } else return false;
  }
}
```

Review comment: I know this is a minor matter of taste, but please put brackets on the false side of the else, with the code on its own line. This is for consistency with our de facto code style in the project.
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost
dsmiley commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost
URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376753645
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/boost/package-info.java

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/**
 * Provides various convenience classes for creating boosts on Tokens.
 */
package org.apache.lucene.analysis.boost;
```

Review comment: While I can see why you chose a new "boost" sub-package (the payload-based filter from which you drew inspiration was in a "payload" sub-package), I lean towards the "miscellaneous" package. Note that DelimitedTermFrequencyTokenFilter is in "miscellaneous" too. WDYT @romseygeek? Or maybe we need a new "delimited" sub-package for all these to go; I dunno.
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost
dsmiley commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost
URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376754130
File path: lucene/core/src/java/org/apache/lucene/util/graph/GraphTokenStreamFiniteStrings.java (@@ -124,6 +126,15 @@)

```java
/**
 * Returns the list of terms that start at the provided state
 */
public QueryBuilder.TermAndBoost[] getTermsAndBoosts(String field, int state) {
```

Review comment: Given that this class, GraphTokenStreamFiniteStrings, deals with `List` (something I did not know when I made a previous comment), and also that TermAndBoost is an inner class to QueryBuilder, I think it's better to put this back into QueryBuilder. I still think `List` is weird and heavyweight, but you didn't add it.
[GitHub] [lucene-solr] ErickErickson commented on issue #1169: LUCENE-9004: A minor feature and patch -- support deleting vector values and fix segments merging
ErickErickson commented on issue #1169: LUCENE-9004: A minor feature and patch -- support deleting vector values and fix segments merging URL: https://github.com/apache/lucene-solr/pull/1169#issuecomment-583848170 Julie: Moving the conversation about forceMerge over from the JIRA. I can imagine ways to shorten the merge process, but it'll still take quite a long time. My main concern was that I didn't know whether the problem you described was functional or not, so it sounds like the issue is "just" performance. Ways to shorten it: First, I'm assuming you're using TieredMergePolicy, which is the default. The forceMerge(1) option _may_ rewrite any given segment multiple times. There's a limit of 30 segments merged at any given time; see maxMergeAtOnceExplicit. So say you have 300 segments: you'd have 10 merges of 30 segments in the first pass, then another merge of the resulting segments. Each pass is a complete rewrite of the entire index, and depending on the number of segments there could be more passes. That limit is mainly there so forceMerge doesn't consume too many resources if, say, indexing or searching is going on, but in your case I'd guess you don't care about that. So you could set it to a very large number and get it done in a single pass. I think that's about the most savings you'd get. I don't know (I haven't measured) whether merging 150 small segments totaling 300G in a single pass is any slower or faster than merging 10 segments totaling 300G. If you wanted to try that, you could set maxMergedSegmentMB; that would simply do more merging in the background during indexing to produce fewer, larger segments. Like I said, though, I don't think this will make any difference. So my guess is that if you bump maxMergeAtOnceExplicit to a very large number, you'll cut your merge time in half (or to a third or a quarter, or... depending on the number of passes). It'll still take considerable time, but may be acceptable. 
Best, Erick This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
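The pass arithmetic Erick describes can be sketched in plain Java. This is illustrative only: `passesToSingleSegment` is a hypothetical helper, not a Lucene API, and real TieredMergePolicy merge scheduling is more nuanced than a simple ceiling division.

```java
// Illustrative arithmetic only -- not Lucene API. Counts how many full
// rewrite passes forceMerge(1) needs when at most maxMergeAtOnce segments
// can be merged in a single merge operation.
public class MergePassMath {
    static int passesToSingleSegment(int segments, int maxMergeAtOnce) {
        int passes = 0;
        while (segments > 1) {
            // Each pass merges groups of up to maxMergeAtOnce segments.
            segments = (segments + maxMergeAtOnce - 1) / maxMergeAtOnce; // ceiling division
            passes++;
        }
        return passes;
    }

    public static void main(String[] args) {
        // 300 segments with a limit of 30: pass 1 leaves 10 segments, pass 2 leaves 1.
        System.out.println(passesToSingleSegment(300, 30));   // 2
        // Raise the limit above the segment count and one pass suffices.
        System.out.println(passesToSingleSegment(300, 1000)); // 1
    }
}
```

Each pass rewrites the whole index, which is why collapsing two passes into one roughly halves the merge time.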
[GitHub] [lucene-solr] dweiss commented on a change in pull request #1245: LUCENE-9146: Create gradle precommit action
dweiss commented on a change in pull request #1245: LUCENE-9146: Create gradle precommit action URL: https://github.com/apache/lucene-solr/pull/1245#discussion_r376800656 ## File path: .github/workflows/gradle-precommit.yml ## @@ -0,0 +1,23 @@
+name: Gradle Precommit
+
+on:
+  pull_request:
+    branches:
+      - '*'
+
+jobs:
+  test:
+    name: gradle precommit w/ Java 11
+
+    runs-on: ubuntu-latest
+
+    steps:
+    - uses: actions/checkout@v2
+    - name: Set up JDK 11
+      uses: actions/setup-java@v1
+      with:
+        java-version: 11
+    - name: Grant execute permission for gradlew
+      run: chmod +x gradlew
Review comment: gradlew should have this permission already when you do a git clone? Why is it explicit? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dsmiley commented on issue #1191: SOLR-14197 Reduce API of SolrResourceLoader
dsmiley commented on issue #1191: SOLR-14197 Reduce API of SolrResourceLoader URL: https://github.com/apache/lucene-solr/pull/1191#issuecomment-583950052 Perhaps the remaining larger changes relating to new classes (e.g. StandaloneSolrResourceLoader) should wait for a follow-on commit; there's plenty here already. Maybe a few static methods could/should move elsewhere but this is ready for a review I think. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] iverase opened a new pull request #1246: LUCENE-9216: Make sure we index LEAST_DOUBLE_VALUE
iverase opened a new pull request #1246: LUCENE-9216: Make sure we index LEAST_DOUBLE_VALUE URL: https://github.com/apache/lucene-solr/pull/1246 Trivial test fix This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] romseygeek commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost
romseygeek commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376935901 ## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/boost/package-info.java ## @@ -0,0 +1,21 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * Provides various convenience classes for creating boosts on Tokens. + */ +package org.apache.lucene.analysis.boost; Review comment: I like the `boost` package - I'm already thinking about a `TypeToBoostTokenFilter` that would automatically boost tokens marked with a `SYNONYM` type for example, and there are probably other boosting filters we can come up with, so a package to collect them all makes sense to me. I prefer to group packages by functionality rather than implementation. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] romseygeek commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost
romseygeek commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376936277 ## File path: lucene/core/src/java/org/apache/lucene/util/graph/GraphTokenStreamFiniteStrings.java ## @@ -124,6 +126,15 @@ public boolean hasSidePath(int state) { .toArray(Term[]::new); } + /** + * Returns the list of terms that start at the provided state + */ + public QueryBuilder.TermAndBoost[] getTermsAndBoosts(String field, int state) { Review comment: Yes, let's go back to `AttributeSource` - sorry for the back and forth on this @alessandrobenedetti This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] romseygeek commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost
romseygeek commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376937485 ## File path: lucene/core/src/java/org/apache/lucene/util/QueryBuilder.java ## @@ -63,6 +66,25 @@ protected boolean enableGraphQueries = true; protected boolean autoGenerateMultiTermSynonymsPhraseQuery = false; + /** + * Wraps a term and boost + */ + public static class TermAndBoost { +private static final float DEFAULT_BOOST = 1.0f; Review comment: I think this should probably be on `BoostAttribute` rather than here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] s1monw commented on issue #1215: LUCENE-9164: Ignore ACE on tragic event if IW is closed
s1monw commented on issue #1215: LUCENE-9164: Ignore ACE on tragic event if IW is closed URL: https://github.com/apache/lucene-solr/pull/1215#issuecomment-584038675 I will start working on some refactorings to streamline this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost
alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376974784 ## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/boost/package-info.java ## @@ -0,0 +1,21 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * Provides various convenience classes for creating boosts on Tokens. + */ +package org.apache.lucene.analysis.boost; Review comment: So let's keep boost package then? no strong opinion here my side This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost
alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376978767 ## File path: lucene/core/src/java/org/apache/lucene/util/graph/GraphTokenStreamFiniteStrings.java ## @@ -124,6 +126,15 @@ public boolean hasSidePath(int state) { .toArray(Term[]::new); } + /** + * Returns the list of terms that start at the provided state + */ + public QueryBuilder.TermAndBoost[] getTermsAndBoosts(String field, int state) { Review comment: no worries at all, done in the upcoming commit! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost
alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376980226 ## File path: lucene/core/src/java/org/apache/lucene/util/QueryBuilder.java ## @@ -63,6 +66,25 @@ protected boolean enableGraphQueries = true; protected boolean autoGenerateMultiTermSynonymsPhraseQuery = false; + /** + * Wraps a term and boost + */ + public static class TermAndBoost { +private static final float DEFAULT_BOOST = 1.0f; Review comment: I agree, coming in the upcoming commit. Furthermore, in a lot of places in Lucene and Solr 1.0f is used when it is actually the DEFAULT_BOOST. I won't change that here; it's not in the scope of this issue, but it would be nice to add a ticket to do that. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost
alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376982361 ## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/boost/DelimitedBoostTokenFilter.java ## @@ -0,0 +1,61 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.analysis.boost; + +import org.apache.lucene.analysis.TokenFilter; +import org.apache.lucene.analysis.TokenStream; +import org.apache.lucene.analysis.tokenattributes.CharTermAttribute; +import org.apache.lucene.search.BoostAttribute; + +import java.io.IOException; + + +/** + * Characters before the delimiter are the "token", those after are the boost. + * + * For example, if the delimiter is '|', then for the string "foo|0.7", foo is the token + * and 0.7 is the boost. 
+ * + * Note make sure your Tokenizer doesn't split on the delimiter, or this won't work + */ +public final class DelimitedBoostTokenFilter extends TokenFilter { + private final char delimiter; + private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class); + private final BoostAttribute boostAtt = addAttribute(BoostAttribute.class); + + public DelimitedBoostTokenFilter(TokenStream input, char delimiter) { +super(input); +this.delimiter = delimiter; + } + + @Override + public boolean incrementToken() throws IOException { +if (input.incrementToken()) { + final char[] buffer = termAtt.buffer(); + final int length = termAtt.length(); + for (int i = 0; i < length; i++) { +if (buffer[i] == delimiter) { + float boost = Float.parseFloat(new String(buffer, i + 1, (length - (i + 1)))); + boostAtt.setBoost(boost); + termAtt.setLength(i); + return true; +} + } + return true; +} else return false; Review comment: coming in the next commit; can you check it? I took it from the delimitedPayload; I guess code style is somewhat inconsistent across the project (I've verified that multiple times in the past) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
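The delimiter split that DelimitedBoostTokenFilter performs can be sketched standalone. This is a hypothetical reduction using a plain String instead of the CharTermAttribute buffer; the names `BoostSplitDemo` and `split` are illustrative, not part of the PR.

```java
// Standalone sketch of the split in DelimitedBoostTokenFilter.incrementToken():
// everything before the delimiter is the term, everything after is parsed
// as the boost; a token with no delimiter keeps the default boost of 1.0f.
public class BoostSplitDemo {
    static String term;
    static float boost;

    static void split(String token, char delimiter) {
        term = token;
        boost = 1.0f; // default when no delimiter is present
        char[] buffer = token.toCharArray();
        for (int i = 0; i < buffer.length; i++) {
            if (buffer[i] == delimiter) {
                boost = Float.parseFloat(new String(buffer, i + 1, buffer.length - (i + 1)));
                term = new String(buffer, 0, i); // mirrors termAtt.setLength(i)
                return;
            }
        }
    }

    public static void main(String[] args) {
        split("foo|0.7", '|');
        System.out.println(term + " " + boost); // foo 0.7
        split("bar", '|');
        System.out.println(term + " " + boost); // bar 1.0
    }
}
```

As the Javadoc warns, this only works if the tokenizer leaves "foo|0.7" as one token rather than splitting on the delimiter.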
[GitHub] [lucene-solr] alessandrobenedetti commented on issue #357: [SOLR-12238] Synonym Queries boost
alessandrobenedetti commented on issue #357: [SOLR-12238] Synonym Queries boost URL: https://github.com/apache/lucene-solr/pull/357#issuecomment-584059142 Latest comments have been addressed, let me know if there's anything else needed here :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] iverase merged pull request #1246: LUCENE-9216: Make sure we index LEAST_DOUBLE_VALUE
iverase merged pull request #1246: LUCENE-9216: Make sure we index LEAST_DOUBLE_VALUE URL: https://github.com/apache/lucene-solr/pull/1246 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] andywebb1975 opened a new pull request #1247: SOLR-14252 use double rather than Double to avoid NPE
andywebb1975 opened a new pull request #1247: SOLR-14252 use double rather than Double to avoid NPE URL: https://github.com/apache/lucene-solr/pull/1247 # Description The getMax and getMin methods in AggregateMetric can throw an NPE if non-Number values are present in values, when it tries to cast a null Double to a double. # Solution This PR switches to using primitive doubles, defaulting to zero, and warns when non-Number values are provided. # Tests TBC # Checklist Please review the following and check all that apply: - [ ] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability. - [ ] I have created a Jira issue and added the issue ID to my pull request title. - [ ] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [ ] I have developed this patch against the `master` branch. - [ ] I have run `ant precommit` and the appropriate test suite. - [ ] I have added tests for my changes. - [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
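The failure mode the PR describes reduces to auto-unboxing a null Double. A minimal standalone reproduction (this is not the AggregateMetric code itself, just the language-level mechanism):

```java
// Minimal reduction of the bug: implicitly unboxing a null Double into a
// primitive double calls doubleValue() on null and throws NullPointerException.
public class UnboxNpeDemo {
    public static void main(String[] args) {
        Double res = null;           // stays null if no Number values are ever seen
        try {
            double max = res;        // implicit res.doubleValue() -> NPE
            System.out.println(max);
        } catch (NullPointerException e) {
            System.out.println("NPE on unboxing");
        }
        double safe = 0;             // the PR's approach: primitive with a 0 default
        System.out.println(safe);    // 0.0
    }
}
```

Using a primitive `double res = 0` from the start removes the null case entirely, at the cost of silently reporting 0 when no Number values exist, hence the added warning log.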
[GitHub] [lucene-solr] andywebb1975 commented on issue #1247: SOLR-14252 use double rather than Double to avoid NPE
andywebb1975 commented on issue #1247: SOLR-14252 use double rather than Double to avoid NPE URL: https://github.com/apache/lucene-solr/pull/1247#issuecomment-584087529 The PR really just changes an exception to a warning - it may be papering over another issue. I'm going to try changing `public Object value;` to `public Number value;` at line 41 in order to trigger earlier exceptions. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] ErickErickson closed pull request #1241: Gradle util
ErickErickson closed pull request #1241: Gradle util URL: https://github.com/apache/lucene-solr/pull/1241 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] ErickErickson commented on issue #1241: Gradle util
ErickErickson commented on issue #1241: Gradle util URL: https://github.com/apache/lucene-solr/pull/1241#issuecomment-584119269 Didn't link appropriately; I wondered why nobody replied. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] ErickErickson opened a new pull request #1248: LUCENE-9134: Port ant-regenerate tasks to Gradle build
ErickErickson opened a new pull request #1248: LUCENE-9134: Port ant-regenerate tasks to Gradle build URL: https://github.com/apache/lucene-solr/pull/1248 This adds the generation targets for util/packed and util/automaton. For whatever reason my local Python doesn't do anything weird like it did when regenerating the html entities; the generated code is identical. One thing I'd like to draw attention to is that I had to change createLevAutomata.py to point to the new place moman is downloaded to. I'll merge upstream in the next day or two barring objections. I think this finishes off the regeneration work, so I'll close LUCENE-9134 after merging. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] shalinmangar merged pull request #1220: SOLR-13996: Refactor HttpShardHandler.prepDistributed method
shalinmangar merged pull request #1220: SOLR-13996: Refactor HttpShardHandler.prepDistributed method URL: https://github.com/apache/lucene-solr/pull/1220 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] andywebb1975 commented on a change in pull request #1247: SOLR-14252 use double rather than Double to avoid NPE
andywebb1975 commented on a change in pull request #1247: SOLR-14252 use double rather than Double to avoid NPE URL: https://github.com/apache/lucene-solr/pull/1247#discussion_r377223058 ## File path: solr/core/src/java/org/apache/solr/metrics/AggregateMetric.java ## @@ -93,16 +99,13 @@ public double getMax() { if (values.isEmpty()) { return 0; } -Double res = null; +double res = 0; for (Update u : values.values()) { if (!(u.value instanceof Number)) { +log.warn("not a Number: " + u.value); Review comment: Note I'm not completely clear whether `u.value` is ever _expected_ to not be a `Number` - have seen this line report `false` and `LocalStatsCache` and I'm tracing back through to find out why these occur. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost
dsmiley commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost URL: https://github.com/apache/lucene-solr/pull/357#discussion_r377238017 ## File path: lucene/core/src/java/org/apache/lucene/util/graph/GraphTokenStreamFiniteStrings.java ## @@ -124,6 +126,15 @@ public boolean hasSidePath(int state) { .toArray(Term[]::new); } + /** + * Returns the list of terms that start at the provided state + */ + public QueryBuilder.TermAndBoost[] getTermsAndBoosts(String field, int state) { Review comment: Can't you remove this now? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] HoustonPutman commented on a change in pull request #1238: SOLR-14240: Clean up znodes after shard deletion is invoked
HoustonPutman commented on a change in pull request #1238: SOLR-14240: Clean up znodes after shard deletion is invoked URL: https://github.com/apache/lucene-solr/pull/1238#discussion_r377325559 ## File path: solr/core/src/java/org/apache/solr/cloud/api/collections/DeleteShardCmd.java ## @@ -151,6 +154,21 @@ public void call(ClusterState clusterState, ZkNodeProps message, NamedList resul "Error executing delete operation for collection: " + collectionName + " shard: " + sliceId, e); } } + + private void cleanupZooKeeperShardMetadata(SolrZkClient client, String collection, String sliceId) throws InterruptedException { +String leaderElectPath = ZkStateReader.COLLECTIONS_ZKNODE + "/" + collection + "/leader_elect/" + sliceId; +String shardLeaderPath = ZkStateReader.COLLECTIONS_ZKNODE + "/" + collection + "/leaders/" + sliceId; +String shardTermsPath = ZkStateReader.COLLECTIONS_ZKNODE + "/" + collection + "/terms/" + sliceId; + +try { + client.clean(leaderElectPath); + client.clean(shardLeaderPath); + client.clean(shardTermsPath); +} catch (KeeperException ex) { + log.warn("Non-fatal error occured attempting to delete shard metadata on zooker for collection " + Review comment: If you are just logging a warning on failure, you might want to loop through each one, with the try-catch inside the loop. Therefore if one path fails, the others have a chance of succeeding. You can also log the path that failed which will help in debugging. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
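The reviewer's suggestion can be sketched generically. The `Cleaner` interface below is a hypothetical stand-in for `SolrZkClient.clean`; the point is only the structure: try-catch inside the loop so one failing path doesn't stop the others, with the failing path named in the warning.

```java
import java.util.Arrays;
import java.util.List;

// Generic sketch of per-path cleanup: each path gets its own try-catch,
// so a failure on one path still lets the remaining paths be cleaned,
// and the warning says which path failed.
public class PerPathCleanup {
    interface Cleaner { void clean(String path) throws Exception; } // stand-in for SolrZkClient.clean

    static int cleanAll(Cleaner client, List<String> paths) {
        int cleaned = 0;
        for (String path : paths) {
            try {
                client.clean(path);
                cleaned++;
            } catch (Exception ex) {
                System.out.println("Non-fatal error deleting " + path + ": " + ex.getMessage());
            }
        }
        return cleaned;
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList("/collections/c1/leader_elect/shard1",
                                           "/collections/c1/leaders/shard1",
                                           "/collections/c1/terms/shard1");
        // Simulate a failure on the first path; the other two still succeed.
        Cleaner flaky = p -> {
            if (p.contains("leader_elect")) throw new Exception("connection loss");
        };
        System.out.println(cleanAll(flaky, paths)); // 2
    }
}
```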
[GitHub] [lucene-solr] madrob commented on a change in pull request #1248: LUCENE-9134: Port ant-regenerate tasks to Gradle build
madrob commented on a change in pull request #1248: LUCENE-9134: Port ant-regenerate tasks to Gradle build URL: https://github.com/apache/lucene-solr/pull/1248#discussion_r377332813 ## File path: lucene/core/src/java/org/apache/lucene/util/automaton/createLevAutomata.py ## @@ -22,7 +22,7 @@ import os import sys # sys.path.insert(0, 'moman/finenight/python') -sys.path.insert(0, '../../../../../../../../build/core/moman/finenight/python') +sys.path.insert(0, '../../../../../../../../../build/moman/finenight/python') Review comment: I think this has been answered before, but please remind me, does this break `ant regenerate` meaning that both cannot co-exist? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] madrob commented on a change in pull request #1248: LUCENE-9134: Port ant-regenerate tasks to Gradle build
madrob commented on a change in pull request #1248: LUCENE-9134: Port ant-regenerate tasks to Gradle build URL: https://github.com/apache/lucene-solr/pull/1248#discussion_r377327334 ## File path: gradle/generation/util.gradle ## @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +apply plugin: "de.undercouch.download" + +configure(rootProject) { + configurations { +utilgen + } + + dependencies { + } + + task utilgen { +description "Regenerate sources for ...lucene/util/automaton and ...lucene/util/packed." 
+group "generation" + +dependsOn ":lucene:core:utilGenPacked" +dependsOn ":lucene:core:utilGenLev" + } +} + + +task installMoman(type: Download) { + def momanDir = new File(buildDir, "moman").getAbsolutePath() + def momanZip = new File(momanDir, "moman.zip").getAbsolutePath() + + src "https://bitbucket.org/jpbarrette/moman/get/5c5c2a1e4dea.zip"; + dest momanZip + onlyIfModified true + + doLast { +logger.lifecycle("Downloading moman to: ${buildDir}") +ant.unzip(src: momanZip, dest: momanDir, overwrite: "true") { + ant.cutdirsmapper(dirs: "1") +} + } +} + +configure(project(":lucene:core")) { + task utilGenPacked(dependsOn: installMoman) { +description "Regenerate util/PackedBulkOperationsPacked*.java and Packed64SingleBlock.java" +group "generation" + +def workDir = "src/java/org/apache/lucene/util/packed" + +doLast { + ['gen_BulkOperation.py', 'gen_Packed64SingleBlock.py'].each { prog -> +logger.lifecycle("Executing: ${prog} in ${workDir}") +project.exec { + workingDir workDir + executable "python" + args = ['-B', "${prog}"] +} + } + // Correct line endings for Windows. + ['Packed64SingleBlock.java', 'BulkOperation*.java'].each { files -> Review comment: Does this need to be an `each` block, or can we specify multiple includes for the ant execution? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] madrob commented on a change in pull request #1248: LUCENE-9134: Port ant-regenerate tasks to Gradle build
madrob commented on a change in pull request #1248: LUCENE-9134: Port ant-regenerate tasks to Gradle build URL: https://github.com/apache/lucene-solr/pull/1248#discussion_r377327836 ## File path: gradle/generation/util.gradle ## @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +apply plugin: "de.undercouch.download" + +configure(rootProject) { + configurations { +utilgen + } + + dependencies { + } + + task utilgen { +description "Regenerate sources for ...lucene/util/automaton and ...lucene/util/packed." 
+group "generation" + +dependsOn ":lucene:core:utilGenPacked" +dependsOn ":lucene:core:utilGenLev" + } +} + + +task installMoman(type: Download) { + def momanDir = new File(buildDir, "moman").getAbsolutePath() + def momanZip = new File(momanDir, "moman.zip").getAbsolutePath() + + src "https://bitbucket.org/jpbarrette/moman/get/5c5c2a1e4dea.zip"; + dest momanZip + onlyIfModified true + + doLast { +logger.lifecycle("Downloading moman to: ${buildDir}") +ant.unzip(src: momanZip, dest: momanDir, overwrite: "true") { + ant.cutdirsmapper(dirs: "1") +} + } +} + +configure(project(":lucene:core")) { + task utilGenPacked(dependsOn: installMoman) { +description "Regenerate util/PackedBulkOperationsPacked*.java and Packed64SingleBlock.java" +group "generation" + +def workDir = "src/java/org/apache/lucene/util/packed" + +doLast { + ['gen_BulkOperation.py', 'gen_Packed64SingleBlock.py'].each { prog -> +logger.lifecycle("Executing: ${prog} in ${workDir}") +project.exec { + workingDir workDir + executable "python" + args = ['-B', "${prog}"] +} + } + // Correct line endings for Windows. + ['Packed64SingleBlock.java', 'BulkOperation*.java'].each { files -> +project.ant.fixcrlf( +srcDir: workDir, +includes: files, +encoding: 'UTF-8', +eol: 'lf' +) + } +} + } +} + +configure(project(":lucene:core")) { Review comment: I think you can combine this with the previous configure block, I don't think separating them adds readability. Let me know if you did this intentionally though This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] ErickErickson commented on a change in pull request #1248: LUCENE-9134: Port ant-regenerate tasks to Gradle build
ErickErickson commented on a change in pull request #1248: LUCENE-9134: Port ant-regenerate tasks to Gradle build URL: https://github.com/apache/lucene-solr/pull/1248#discussion_r377369604 ## File path: lucene/core/src/java/org/apache/lucene/util/automaton/createLevAutomata.py ## @@ -22,7 +22,7 @@ import os import sys # sys.path.insert(0, 'moman/finenight/python') -sys.path.insert(0, '../../../../../../../../build/core/moman/finenight/python') +sys.path.insert(0, '../../../../../../../../../build/moman/finenight/python') Review comment: Yeah. I'm pretty sure it does break the ant regenerate. Since the whole regenerate process is apparently run extremely rarely (like every couple of years or so, from what I can tell), I think we'll be on gradle exclusively the next time this is run.
[GitHub] [lucene-solr] ErickErickson commented on a change in pull request #1248: LUCENE-9134: Port ant-regenerate tasks to Gradle build
ErickErickson commented on a change in pull request #1248: LUCENE-9134: Port ant-regenerate tasks to Gradle build URL: https://github.com/apache/lucene-solr/pull/1248#discussion_r377374590 ## File path: gradle/generation/util.gradle ## @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +apply plugin: "de.undercouch.download" + +configure(rootProject) { + configurations { +utilgen + } + + dependencies { + } + + task utilgen { +description "Regenerate sources for ...lucene/util/automaton and ...lucene/util/packed." 
+group "generation" + +dependsOn ":lucene:core:utilGenPacked" +dependsOn ":lucene:core:utilGenLev" + } +} + + +task installMoman(type: Download) { + def momanDir = new File(buildDir, "moman").getAbsolutePath() + def momanZip = new File(momanDir, "moman.zip").getAbsolutePath() + + src "https://bitbucket.org/jpbarrette/moman/get/5c5c2a1e4dea.zip"; + dest momanZip + onlyIfModified true + + doLast { +logger.lifecycle("Downloading moman to: ${buildDir}") +ant.unzip(src: momanZip, dest: momanDir, overwrite: "true") { + ant.cutdirsmapper(dirs: "1") +} + } +} + +configure(project(":lucene:core")) { + task utilGenPacked(dependsOn: installMoman) { +description "Regenerate util/PackedBulkOperationsPacked*.java and Packed64SingleBlock.java" +group "generation" + +def workDir = "src/java/org/apache/lucene/util/packed" + +doLast { + ['gen_BulkOperation.py', 'gen_Packed64SingleBlock.py'].each { prog -> +logger.lifecycle("Executing: ${prog} in ${workDir}") +project.exec { + workingDir workDir + executable "python" + args = ['-B', "${prog}"] +} + } + // Correct line endings for Windows. + ['Packed64SingleBlock.java', 'BulkOperation*.java'].each { files -> +project.ant.fixcrlf( +srcDir: workDir, +includes: files, +encoding: 'UTF-8', +eol: 'lf' +) + } +} + } +} + +configure(project(":lucene:core")) { Review comment: Good point, done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] ErickErickson commented on a change in pull request #1248: LUCENE-9134: Port ant-regenerate tasks to Gradle build
ErickErickson commented on a change in pull request #1248: LUCENE-9134: Port ant-regenerate tasks to Gradle build URL: https://github.com/apache/lucene-solr/pull/1248#discussion_r377376693 ## File path: gradle/generation/util.gradle ## @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +apply plugin: "de.undercouch.download" + +configure(rootProject) { + configurations { +utilgen + } + + dependencies { + } + + task utilgen { +description "Regenerate sources for ...lucene/util/automaton and ...lucene/util/packed." 
+group "generation" + +dependsOn ":lucene:core:utilGenPacked" +dependsOn ":lucene:core:utilGenLev" + } +} + + +task installMoman(type: Download) { + def momanDir = new File(buildDir, "moman").getAbsolutePath() + def momanZip = new File(momanDir, "moman.zip").getAbsolutePath() + + src "https://bitbucket.org/jpbarrette/moman/get/5c5c2a1e4dea.zip"; + dest momanZip + onlyIfModified true + + doLast { +logger.lifecycle("Downloading moman to: ${buildDir}") +ant.unzip(src: momanZip, dest: momanDir, overwrite: "true") { + ant.cutdirsmapper(dirs: "1") +} + } +} + +configure(project(":lucene:core")) { + task utilGenPacked(dependsOn: installMoman) { +description "Regenerate util/PackedBulkOperationsPacked*.java and Packed64SingleBlock.java" +group "generation" + +def workDir = "src/java/org/apache/lucene/util/packed" + +doLast { + ['gen_BulkOperation.py', 'gen_Packed64SingleBlock.py'].each { prog -> +logger.lifecycle("Executing: ${prog} in ${workDir}") +project.exec { + workingDir workDir + executable "python" + args = ['-B', "${prog}"] +} + } + // Correct line endings for Windows. + ['Packed64SingleBlock.java', 'BulkOperation*.java'].each { files -> Review comment: True, I'll change it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] ErickErickson commented on issue #1248: LUCENE-9134: Port ant-regenerate tasks to Gradle build
ErickErickson commented on issue #1248: LUCENE-9134: Port ant-regenerate tasks to Gradle build URL: https://github.com/apache/lucene-solr/pull/1248#issuecomment-584409136 I made the changes Mike mentioned, but I won't create another PR for a bit, to give others a chance to look.
[GitHub] [lucene-solr] madrob commented on a change in pull request #1248: LUCENE-9134: Port ant-regenerate tasks to Gradle build
madrob commented on a change in pull request #1248: LUCENE-9134: Port ant-regenerate tasks to Gradle build URL: https://github.com/apache/lucene-solr/pull/1248#discussion_r377429993 ## File path: gradle/generation/util.gradle ## @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +apply plugin: "de.undercouch.download" + +configure(rootProject) { + configurations { +utilgen + } + + dependencies { Review comment: nit: drop this empty block? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] madrob merged pull request #1244: SOLR-14247 Remove unneeded sleeps
madrob merged pull request #1244: SOLR-14247 Remove unneeded sleeps URL: https://github.com/apache/lucene-solr/pull/1244
[GitHub] [lucene-solr] dsmiley commented on issue #1191: SOLR-14197 Reduce API of SolrResourceLoader
dsmiley commented on issue #1191: SOLR-14197 Reduce API of SolrResourceLoader URL: https://github.com/apache/lucene-solr/pull/1191#issuecomment-584496996 I think this PR is ready for review @madrob since there are a lot of changes even without introducing SRL subclasses.
[GitHub] [lucene-solr] iverase opened a new pull request #1249: LUCENE-9217: Add validation to XYGeometries
iverase opened a new pull request #1249: LUCENE-9217: Add validation to XYGeometries URL: https://github.com/apache/lucene-solr/pull/1249 This PR adds validation for XYGeometries, in particular checking for invalid values like NaN, INF and -INF.
[GitHub] [lucene-solr] iverase commented on a change in pull request #1249: LUCENE-9217: Add validation to XYGeometries
iverase commented on a change in pull request #1249: LUCENE-9217: Add validation to XYGeometries URL: https://github.com/apache/lucene-solr/pull/1249#discussion_r377470379 ## File path: lucene/core/src/java/org/apache/lucene/geo/XYRectangle.java ## @@ -29,12 +31,16 @@ /** Constructs a bounding box by first validating the provided x and y coordinates */ public XYRectangle(double minX, double maxX, double minY, double maxY) { -this.minX = minX; -this.maxX = maxX; -this.minY = minY; -this.maxY = maxY; -assert minX <= maxX; -assert minY <= maxY; +if (minX > maxX) { Review comment: I wonder if an XYRectangle should be initialised with floats instead of doubles, like the other XYGeometries?
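The validation being added here (finite coordinates, min not greater than max) can be sketched in isolation. This is not the actual Lucene implementation, just an illustration using `float`, in line with the question above about matching the other XYGeometries; the class and method names are mine:

```java
public class XYRectangleCheck {
    /** Validates one coordinate: it must be finite (rejects NaN, INF and -INF). */
    static float checkVal(float v, String name) {
        if (Float.isNaN(v) || Float.isInfinite(v)) {
            throw new IllegalArgumentException(name + " must be finite, got: " + v);
        }
        return v;
    }

    /** Validates a bounding box: finite coordinates and min <= max on both axes. */
    static void checkBox(float minX, float maxX, float minY, float maxY) {
        checkVal(minX, "minX");
        checkVal(maxX, "maxX");
        checkVal(minY, "minY");
        checkVal(maxY, "maxY");
        if (minX > maxX) {
            throw new IllegalArgumentException("minX must be <= maxX, got: " + minX + " > " + maxX);
        }
        if (minY > maxY) {
            throw new IllegalArgumentException("minY must be <= maxY, got: " + minY + " > " + maxY);
        }
    }
}
```

Throwing `IllegalArgumentException` instead of relying on `assert` (as in the replaced code) means the checks also run in production, where assertions are normally disabled.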
[GitHub] [lucene-solr] dweiss commented on a change in pull request #1248: LUCENE-9134: Port ant-regenerate tasks to Gradle build
dweiss commented on a change in pull request #1248: LUCENE-9134: Port ant-regenerate tasks to Gradle build URL: https://github.com/apache/lucene-solr/pull/1248#discussion_r377479333 ## File path: lucene/core/src/java/org/apache/lucene/util/automaton/createLevAutomata.py ## @@ -22,7 +22,7 @@ import os import sys # sys.path.insert(0, 'moman/finenight/python') -sys.path.insert(0, '../../../../../../../../build/core/moman/finenight/python') +sys.path.insert(0, '../../../../../../../../../build/moman/finenight/python') Review comment: If so, then we should add a failure to the ant build that just says "use gradle". These python scripts could take an argument - the path to moman, passed from the gradle script. Then it'd be elegant and clear, without those ugly relative paths.
[GitHub] [lucene-solr] atris commented on issue #1214: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches
atris commented on issue #1214: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches URL: https://github.com/apache/lucene-solr/pull/1214#issuecomment-584514132 @jpountz Updated, please take a look and let me know your thoughts.
[GitHub] [lucene-solr] atris commented on a change in pull request #1214: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches
atris commented on a change in pull request #1214: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches URL: https://github.com/apache/lucene-solr/pull/1214#discussion_r377481946 ## File path: lucene/core/src/java/org/apache/lucene/search/QueueSizeBasedExecutionControlPlane.java ## @@ -0,0 +1,88 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.lucene.search; + +import java.util.ArrayList; +import java.util.Collection; +import java.util.List; +import java.util.concurrent.Executor; +import java.util.concurrent.Future; +import java.util.concurrent.FutureTask; +import java.util.concurrent.RejectedExecutionException; +import java.util.concurrent.ThreadPoolExecutor; + +/** + * Implementation of SliceExecutionControlPlane with queue backpressure based thread allocation + */ +public class QueueSizeBasedExecutionControlPlane implements SliceExecutionControlPlane { + private static final double LIMITING_FACTOR = 1.5; + private static final int NUMBER_OF_PROCESSORS = Runtime.getRuntime().availableProcessors(); + + private Executor executor; + + public QueueSizeBasedExecutionControlPlane(Executor executor) { +this.executor = executor; + } + + @Override + public List> invokeAll(Collection tasks) { +boolean isThresholdCheckEnabled = true; + +if (tasks == null) { + throw new IllegalArgumentException("Tasks is null"); +} + +if (executor == null) { + throw new IllegalArgumentException("Executor is null"); +} + +ThreadPoolExecutor threadPoolExecutor = null; +if ((executor instanceof ThreadPoolExecutor) == false) { Review comment: Agreed. Reverted the Executor changes and added the abstraction while updating the docs for IndexSearcher This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
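The queue-backpressure idea behind `QueueSizeBasedExecutionControlPlane` — fall back to running work on the caller thread once the executor's work queue is saturated — can be sketched as follows. The threshold handling here is illustrative, not Lucene's actual logic, and the names are mine:

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ThreadPoolExecutor;

public class QueueBackpressure {
    /**
     * Runs the task on the given executor unless its work queue is already at
     * or over the threshold, in which case the caller thread runs it directly.
     */
    static void executeWithBackpressure(Executor executor, Runnable task, int queueThreshold) {
        if (executor instanceof ThreadPoolExecutor) {
            ThreadPoolExecutor tpe = (ThreadPoolExecutor) executor;
            if (tpe.getQueue().size() >= queueThreshold) {
                // Backpressure: don't pile more work onto a saturated queue.
                task.run();
                return;
            }
        }
        // Not a ThreadPoolExecutor, or queue below threshold: submit normally.
        executor.execute(task);
    }
}
```

Note the queue-size check is only possible for `ThreadPoolExecutor`, which is why the quoted code guards on `instanceof` before inspecting the queue; a plain `Executor` exposes no queue at all.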
[GitHub] [lucene-solr] janhoy opened a new pull request #1250: SOLR-14250 Fix error logging on Expect: 100-continue
janhoy opened a new pull request #1250: SOLR-14250 Fix error logging on Expect: 100-continue URL: https://github.com/apache/lucene-solr/pull/1250 See https://issues.apache.org/jira/browse/SOLR-14250 With this PR, we'll still always try to consume the stream, but if the input stream is not available due to the Expect: header, and an error has already been sent by Solr, the resulting IOException from Jetty will not be logged at INFO level but will instead be a single line at DEBUG level. The PR does not add a test to validate the behaviour: since this is a logging-only change, there is no risk that the stream is no longer consumed.
[GitHub] [lucene-solr] janhoy commented on issue #1250: SOLR-14250 Fix error logging on Expect: 100-continue
janhoy commented on issue #1250: SOLR-14250 Fix error logging on Expect: 100-continue URL: https://github.com/apache/lucene-solr/pull/1250#issuecomment-584537676 I tested manually that the new DEBUG log line is printed when hitting a non-existent URL, e.g. curl -H "Content-Type: application/json" -H "Expect: 100-continue" http://localhost:8983/solr/foo/update2 So I think this is good to go. Will leave it sitting here to collect feedback for a few days.
[GitHub] [lucene-solr] alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost
alessandrobenedetti commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost URL: https://github.com/apache/lucene-solr/pull/357#discussion_r377532448 ## File path: lucene/core/src/java/org/apache/lucene/util/graph/GraphTokenStreamFiniteStrings.java ## @@ -124,6 +126,15 @@ public boolean hasSidePath(int state) { .toArray(Term[]::new); } + /** + * Returns the list of terms that start at the provided state + */ + public QueryBuilder.TermAndBoost[] getTermsAndBoosts(String field, int state) { Review comment: just forgot, it's done now
[GitHub] [lucene-solr] alessandrobenedetti commented on issue #357: [SOLR-12238] Synonym Queries boost
alessandrobenedetti commented on issue #357: [SOLR-12238] Synonym Queries boost URL: https://github.com/apache/lucene-solr/pull/357#issuecomment-584554973 So the code should be ok now; should we think about where and how to properly document it? I will definitely write a blog post on that (which we can later link), but I guess we should think about the official documentation part now.
[GitHub] [lucene-solr] juanka588 commented on a change in pull request #1234: Add compression for Binary doc value fields
juanka588 commented on a change in pull request #1234: Add compression for Binary doc value fields URL: https://github.com/apache/lucene-solr/pull/1234#discussion_r377544478
## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesConsumer.java
@@ -353,67 +360,193 @@ private void writeBlock(long[] values, int length, long gcd, ByteBuffersDataOutp
     }
   }

-  @Override
-  public void addBinaryField(FieldInfo field, DocValuesProducer valuesProducer) throws IOException {
-    meta.writeInt(field.number);
-    meta.writeByte(Lucene80DocValuesFormat.BINARY);
-
-    BinaryDocValues values = valuesProducer.getBinary(field);
-    long start = data.getFilePointer();
-    meta.writeLong(start); // dataOffset
-    int numDocsWithField = 0;
-    int minLength = Integer.MAX_VALUE;
-    int maxLength = 0;
-    for (int doc = values.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = values.nextDoc()) {
-      numDocsWithField++;
-      BytesRef v = values.binaryValue();
-      int length = v.length;
-      data.writeBytes(v.bytes, v.offset, v.length);
-      minLength = Math.min(length, minLength);
-      maxLength = Math.max(length, maxLength);
+  class CompressedBinaryBlockWriter implements Closeable {
+    FastCompressionHashTable ht = new LZ4.FastCompressionHashTable();
+    int uncompressedBlockLength = 0;
+    int maxUncompressedBlockLength = 0;
+    int numDocsInCurrentBlock = 0;
+    int[] docLengths = new int[Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK];
+    byte[] block = new byte[1024 * 16];
+    int totalChunks = 0;
+    long maxPointer = 0;
+    long blockAddressesStart = -1;
+
+    private IndexOutput tempBinaryOffsets;
+
+    public CompressedBinaryBlockWriter() throws IOException {
+      tempBinaryOffsets = state.directory.createTempOutput(state.segmentInfo.name, "binary_pointers", state.context);
+      boolean success = false;
+      try {
+        CodecUtil.writeHeader(tempBinaryOffsets, Lucene80DocValuesFormat.META_CODEC + "FilePointers", Lucene80DocValuesFormat.VERSION_CURRENT);
+        success = true;
+      } finally {
+        if (success == false) {
+          IOUtils.closeWhileHandlingException(this); // self-close because constructor caller can't
+        }
+      }
+    }
-    assert numDocsWithField <= maxDoc;
-    meta.writeLong(data.getFilePointer() - start); // dataLength
-    if (numDocsWithField == 0) {
-      meta.writeLong(-2); // docsWithFieldOffset
-      meta.writeLong(0L); // docsWithFieldLength
-      meta.writeShort((short) -1); // jumpTableEntryCount
-      meta.writeByte((byte) -1); // denseRankPower
-    } else if (numDocsWithField == maxDoc) {
-      meta.writeLong(-1); // docsWithFieldOffset
-      meta.writeLong(0L); // docsWithFieldLength
-      meta.writeShort((short) -1); // jumpTableEntryCount
-      meta.writeByte((byte) -1); // denseRankPower
-    } else {
-      long offset = data.getFilePointer();
-      meta.writeLong(offset); // docsWithFieldOffset
-      values = valuesProducer.getBinary(field);
-      final short jumpTableEntryCount = IndexedDISI.writeBitSet(values, data, IndexedDISI.DEFAULT_DENSE_RANK_POWER);
-      meta.writeLong(data.getFilePointer() - offset); // docsWithFieldLength
-      meta.writeShort(jumpTableEntryCount);
-      meta.writeByte(IndexedDISI.DEFAULT_DENSE_RANK_POWER);
+    void addDoc(int doc, BytesRef v) throws IOException {
+      if (blockAddressesStart < 0) {
+        blockAddressesStart = data.getFilePointer();
+      }
+      docLengths[numDocsInCurrentBlock] = v.length;
+      block = ArrayUtil.grow(block, uncompressedBlockLength + v.length);
+      System.arraycopy(v.bytes, v.offset, block, uncompressedBlockLength, v.length);
+      uncompressedBlockLength += v.length;
+      numDocsInCurrentBlock++;
+      if (numDocsInCurrentBlock == Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK) {
+        flushData();
+      }
+    }
-    meta.writeInt(numDocsWithField);
-    meta.writeInt(minLength);
-    meta.writeInt(maxLength);
-    if (maxLength > minLength) {
-      start = data.getFilePointer();
-      meta.writeLong(start);
+    private void flushData() throws IOException {
+      if (numDocsInCurrentBlock > 0) {
+        // Write offset to this block to temporary offsets file
+        totalChunks++;
+        long thisBlockStartPointer = data.getFilePointer();
+
+        // Optimisation - check if all lengths are same
+        boolean allLengthsSame = true && numDocsInCurrentBlock > 0;
+        for (int i = 0; i < Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK && allLengthsSame; i++) {
+          if (i > 0 && docLengths[i] != docLengths[i - 1]) {
+            allLengthsSame = false;
+          }
+        }
+        if (allLengthsSame) {
+          // Only write one value shifted. Steal a bit to indicate all other lengths are the same
+          int onlyOneLength = (docLengths[0
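The "steal a bit" trick in the quoted diff can be illustrated with a small standalone sketch. This is not the patch's actual code, and the exact on-disk encoding (how the shared length is shifted and flagged) is an assumption; the class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LengthBlockEncoder {
    // Returns the values that would be written for one block of doc lengths.
    // The low bit of the first value flags "all lengths identical": if set,
    // only one shared length is stored; otherwise every length follows.
    static List<Integer> encodeLengths(int[] docLengths) {
        boolean allSame = true;
        for (int i = 1; i < docLengths.length; i++) {
            if (docLengths[i] != docLengths[0]) {
                allSame = false;
                break;
            }
        }
        List<Integer> out = new ArrayList<>();
        if (allSame) {
            out.add((docLengths[0] << 1) | 1); // stolen bit set: one shared length
        } else {
            out.add(docLengths[0] << 1);       // stolen bit clear: remaining lengths follow
            for (int i = 1; i < docLengths.length; i++) {
                out.add(docLengths[i]);
            }
        }
        return out;
    }

    static int[] decodeLengths(List<Integer> encoded, int numDocs) {
        int first = encoded.get(0);
        int[] lengths = new int[numDocs];
        if ((first & 1) != 0) {
            Arrays.fill(lengths, first >>> 1); // every doc shares one length
        } else {
            lengths[0] = first >>> 1;
            for (int i = 1; i < numDocs; i++) {
                lengths[i] = encoded.get(i);
            }
        }
        return lengths;
    }

    public static void main(String[] args) {
        System.out.println(encodeLengths(new int[]{8, 8, 8, 8})); // one entry for the block
        System.out.println(encodeLengths(new int[]{3, 8, 8, 8})); // one entry per document
    }
}
```

For blocks of fixed-length values this collapses the per-document length table to a single integer, which is why the diff checks `allLengthsSame` before writing.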
[GitHub] [lucene-solr] juanka588 commented on a change in pull request #1234: Add compression for Binary doc value fields
juanka588 commented on a change in pull request #1234: Add compression for Binary doc value fields URL: https://github.com/apache/lucene-solr/pull/1234#discussion_r377545909
## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesConsumer.java
(quotes the same @@ -353,67 +360,193 @@ addBinaryField/CompressedBinaryBlockWriter diff as the previous comment)
[GitHub] [lucene-solr] mocobeta commented on issue #1242: LUCENE-9201: Port documentation-lint task to Gradle build
mocobeta commented on issue #1242: LUCENE-9201: Port documentation-lint task to Gradle build URL: https://github.com/apache/lucene-solr/pull/1242#issuecomment-584574521 Thanks @uschindler for your comments. I rewrote the task to use sourceSets instead of relying on the assumption that the target projects (for linting) have "src/java". However, there is another problem: as far as I know, the Java source directory isn't available from SourceSet, and the actual source directory path is required when executing ECJ. ```groovy
// excerpt from the custom ECJ lint task
project.plugins.withId('java', {
  project.sourceSets.each { sourceSet ->
    project.javaexec {
      classpath { project.rootProject.configurations.ecj.asPath }
      main = "org.eclipse.jdt.internal.compiler.batch.Main"
      args += [
        // Unfortunately, 'testCompileClasspath' is not available from sourceSet,
        // so without the second term test classes cannot be compiled.
        "-classpath", sourceSet.compileClasspath.toList().join(':') + project.configurations.testCompileClasspath.findAll().join(":"),
        "-d", dstDir,
        "-encoding", "UTF-8",
        "-source", "11", // How can this be obtained from sourceSet or project?
        "-target", "11",
        "-nowarn",
        "-enableJavadoc",
        "-properties", "${project.rootProject.rootDir}/lucene/tools/javadoc/ecj.javadocs.prefs",
        "src/java" // ... or "src/test". How can this be obtained from sourceSet or project?
      ]
    }
  }
})
``` SourceSet has a property `allJava` that contains all Java source files, but that is no help here. https://docs.gradle.org/current/dsl/org.gradle.api.tasks.SourceSet.html#org.gradle.api.tasks.SourceSet Am I missing something, or is another hack required to identify the actual source directory path? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] juanka588 commented on a change in pull request #1234: Add compression for Binary doc value fields
juanka588 commented on a change in pull request #1234: Add compression for Binary doc value fields URL: https://github.com/apache/lucene-solr/pull/1234#discussion_r377566543
## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesFormat.java
@@ -151,7 +151,8 @@ public DocValuesProducer fieldsProducer(SegmentReadState state) throws IOExcepti
   static final String META_CODEC = "Lucene80DocValuesMetadata";
   static final String META_EXTENSION = "dvm";
   static final int VERSION_START = 0;
-  static final int VERSION_CURRENT = VERSION_START;
+  static final int VERSION_BIN_COMPRESSED = 1;
Review comment: This could potentially live in the BinaryDocValuesFormat class. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
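The version bump discussed above exists so readers can gate the decode path on the version found in the file header. A minimal sketch of that pattern follows; the constants mirror the diff, but the `readsCompressedBinary` helper is hypothetical and not part of the patch.

```java
// Hypothetical sketch: gating the binary doc values decode path on the
// codec version, so segments written before compression still read correctly.
public class DocValuesVersions {
    static final int VERSION_START = 0;
    static final int VERSION_BIN_COMPRESSED = 1; // introduced by the patch
    static final int VERSION_CURRENT = VERSION_BIN_COMPRESSED;

    // A reader would choose compressed vs. raw decoding from the header version.
    static boolean readsCompressedBinary(int headerVersion) {
        return headerVersion >= VERSION_BIN_COMPRESSED;
    }

    public static void main(String[] args) {
        System.out.println(readsCompressedBinary(VERSION_START));   // old segment: raw path
        System.out.println(readsCompressedBinary(VERSION_CURRENT)); // new segment: compressed path
    }
}
```

Keeping `VERSION_START` around while bumping `VERSION_CURRENT` is what lets old and new segments coexist in one index.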
[GitHub] [lucene-solr] juanka588 commented on a change in pull request #1234: Add compression for Binary doc value fields
juanka588 commented on a change in pull request #1234: Add compression for Binary doc value fields URL: https://github.com/apache/lucene-solr/pull/1234#discussion_r377579943
## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java
@@ -742,6 +755,131 @@ public BytesRef binaryValue() throws IOException { }; } } }
+
+  // Decompresses blocks of binary values to retrieve content
+  class BinaryDecoder {
+
+    private final LongValues addresses;
+    private final IndexInput compressedData;
+    // Cache of last uncompressed block
+    private long lastBlockId = -1;
+    private int[] uncompressedDocEnds = new int[Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK];
Review comment: @jpountz we should use the same structure while writing the data; that way all the properties of the class are visible instead of adding comments in the code. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
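The `lastBlockId` field quoted above supports a last-block cache: sequential reads within one compressed block pay the decompression cost only once. A self-contained illustration of that pattern follows; every name here is hypothetical, and the fake per-block "decompression" stands in for the real LZ4 decode.

```java
import java.util.function.IntFunction;

// Hypothetical sketch of the caching pattern in BinaryDecoder: decompress a
// block only when the requested doc falls outside the block decoded last time.
public class CachedBlockReader {
    static final int DOCS_PER_BLOCK = 32;

    private final IntFunction<byte[][]> decompressBlock; // blockId -> per-doc values
    private long lastBlockId = -1;
    private byte[][] currentBlock;
    int decompressCalls = 0; // exposed only to demonstrate the caching

    CachedBlockReader(IntFunction<byte[][]> decompressBlock) {
        this.decompressBlock = decompressBlock;
    }

    byte[] value(int docId) {
        long blockId = docId / DOCS_PER_BLOCK;
        if (blockId != lastBlockId) { // cache miss: decode the whole block once
            currentBlock = decompressBlock.apply((int) blockId);
            decompressCalls++;
            lastBlockId = blockId;
        }
        return currentBlock[docId % DOCS_PER_BLOCK];
    }

    public static void main(String[] args) {
        CachedBlockReader reader = new CachedBlockReader(blockId -> {
            byte[][] docs = new byte[DOCS_PER_BLOCK][];
            for (int i = 0; i < DOCS_PER_BLOCK; i++) {
                docs[i] = new byte[]{(byte) blockId, (byte) i};
            }
            return docs;
        });
        reader.value(0);
        reader.value(5);  // same block: served from the cache
        reader.value(40); // next block: one more decompression
        System.out.println(reader.decompressCalls); // 2
    }
}
```

The trade-off is the one juanka588 alludes to: the cache layout only works well if the writer and reader agree on the same block structure.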
[GitHub] [lucene-solr] jpountz commented on a change in pull request #1234: Add compression for Binary doc value fields
jpountz commented on a change in pull request #1234: Add compression for Binary doc value fields URL: https://github.com/apache/lucene-solr/pull/1234#discussion_r377621003
## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesConsumer.java
(quotes the same @@ -353,67 +360,193 @@ addBinaryField/CompressedBinaryBlockWriter diff as the first juanka588 comment above)
[GitHub] [lucene-solr] ErickErickson commented on a change in pull request #1248: LUCENE-9134: Port ant-regenerate tasks to Gradle build
ErickErickson commented on a change in pull request #1248: LUCENE-9134: Port ant-regenerate tasks to Gradle build URL: https://github.com/apache/lucene-solr/pull/1248#discussion_r377624894 ## File path: gradle/generation/util.gradle ## @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +apply plugin: "de.undercouch.download" + +configure(rootProject) { + configurations { +utilgen + } + + dependencies { Review comment: Yeah, I'll nuke that. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] ErickErickson commented on a change in pull request #1248: LUCENE-9134: Port ant-regenerate tasks to Gradle build
ErickErickson commented on a change in pull request #1248: LUCENE-9134: Port ant-regenerate tasks to Gradle build URL: https://github.com/apache/lucene-solr/pull/1248#discussion_r377627813 ## File path: lucene/core/src/java/org/apache/lucene/util/automaton/createLevAutomata.py ## @@ -22,7 +22,7 @@ import os import sys # sys.path.insert(0, 'moman/finenight/python') -sys.path.insert(0, '../../../../../../../../build/core/moman/finenight/python') +sys.path.insert(0, '../../../../../../../../../build/moman/finenight/python') Review comment: Dawid: Yeah, I wondered about that. But please don't make me run the whole regenerate task in ant ;). I'll change this to take an argument and get it working in Gradle, and leave a comment in the python code about having to change things a bit if running from Ant. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] uschindler commented on issue #1242: LUCENE-9201: Port documentation-lint task to Gradle build
uschindler commented on issue #1242: LUCENE-9201: Port documentation-lint task to Gradle build URL: https://github.com/apache/lucene-solr/pull/1242#issuecomment-584631168 Hi @mocobeta, what's the problem with the source folder? Here it is: https://docs.gradle.org/current/dsl/org.gradle.api.tasks.SourceSet.html#org.gradle.api.tasks.SourceSet:java The second problem is the "joining" of the classpath: this won't work on Windows (":" is only valid on Linux). With sourceSets, use the following method: `getAsPath()` https://docs.gradle.org/current/javadoc/org/gradle/api/file/FileCollection.html#getAsPath-- (also, the compileClasspath for the test sourceSet really also contains the classes from the main sourceSet; the problem may only be incorrect dependencies). The bug is that you have to define a separate task per sourceSet. So don't add a global lintJavadocs path; instead just register a new task for each project named "sourceLint" (remove Javadocs from it; the javadocs in the ECJ call are obsolete, just from former times. We now primarily use it to find obsolete imports) that depends on [...]. If you then execute sourceLint from the top level, it will execute the task for every project separately. You should also be able to call it separately for a single unit. Ideally that task should then be depended on by each project's "check". I'd rewrite the whole thing, should I work on it. The current setup is very gradle-unlike. You'd never do it like that; it feels like Ant. :-) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] uschindler edited a comment on issue #1242: LUCENE-9201: Port documentation-lint task to Gradle build
uschindler edited a comment on issue #1242: LUCENE-9201: Port documentation-lint task to Gradle build URL: https://github.com/apache/lucene-solr/pull/1242#issuecomment-584631168 Hi @mocobeta, what's the problem with the source folder? Here it is: https://docs.gradle.org/current/dsl/org.gradle.api.tasks.SourceSet.html#org.gradle.api.tasks.SourceSet:java The second problem is the "joining" of the classpath: this won't work on Windows (":" is only valid on Linux). With sourceSets, use the following method: `getAsPath()` https://docs.gradle.org/current/javadoc/org/gradle/api/file/FileCollection.html#getAsPath-- (also, the compileClasspath for the test sourceSet really also contains the classes from the main sourceSet; the problem may only be incorrect dependencies). The bug is that you have to define a separate task per sourceSet. So don't add a global lintJavadocs path; instead just register a new task for each project named "sourceLint" (remove Javadocs from it; the javadocs in the ECJ call are obsolete, just from former times. We now primarily use it to find obsolete imports) that depends on [...]. If you then execute sourceLint from the top level, it will execute the task for every project separately. You should also be able to call it separately for a single unit. Ideally that task should then be depended on by each project's "check". I'd rewrite the whole thing; should I work on it? (I don't have much time, but spending too much time here explaining what to do costs more time.) IMHO, the current setup is very gradle-unlike. You'd never do it like that; it feels like Ant. :-) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] uschindler commented on issue #1242: LUCENE-9201: Port documentation-lint task to Gradle build
uschindler commented on issue #1242: LUCENE-9201: Port documentation-lint task to Gradle build URL: https://github.com/apache/lucene-solr/pull/1242#issuecomment-584633315 The `copyAllJavadocs` task should be placed outside of the linter. We will need this anyway, as the whole Javadocs are not structured by modules at the moment; we also copy them together in Ant (because they're published as one huge folder layout) on the Lucene and Solr web pages. We should add another task to collect all Javadocs for the lucene and also for the solr root projects, add the XSL-based index.html, and so allow them to be published on the website or Jenkins. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org