[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2267: LUCENE-9707: Hunspell: check Lucene's implementation against Hunspel's test data
donnerpeter commented on a change in pull request #2267: URL: https://github.com/apache/lucene-solr/pull/2267#discussion_r568394955 ## File path: lucene/analysis/common/src/test/org/apache/lucene/analysis/hunspell/SpellCheckerTest.java ## @@ -61,59 +61,74 @@ public void needAffixOnAffixes() throws Exception { doTest("needaffix5"); } + @Test public void testBreak() throws Exception { doTest("break"); } - public void testBreakDefault() throws Exception { + @Test + public void breakDefault() throws Exception { doTest("breakdefault"); } - public void testBreakOff() throws Exception { + @Test + public void breakOff() throws Exception { doTest("breakoff"); } - public void testCompoundrule() throws Exception { + @Test + public void compoundrule() throws Exception { doTest("compoundrule"); } - public void testCompoundrule2() throws Exception { + @Test + public void compoundrule2() throws Exception { doTest("compoundrule2"); } - public void testCompoundrule3() throws Exception { + @Test + public void compoundrule3() throws Exception { doTest("compoundrule3"); } - public void testCompoundrule4() throws Exception { + @Test + public void compoundrule4() throws Exception { doTest("compoundrule4"); } - public void testCompoundrule5() throws Exception { + @Test + public void compoundrule5() throws Exception { doTest("compoundrule5"); } - public void testCompoundrule6() throws Exception { + @Test + public void compoundrule6() throws Exception { doTest("compoundrule6"); } - public void testCompoundrule7() throws Exception { + @Test + public void compoundrule7() throws Exception { doTest("compoundrule7"); } - public void testCompoundrule8() throws Exception { + @Test + public void compoundrule8() throws Exception { doTest("compoundrule8"); } - public void testGermanCompounding() throws Exception { + @Test + public void germanCompounding() throws Exception { doTest("germancompounding"); } protected void doTest(String name) throws Exception { -InputStream affixStream = 
-Objects.requireNonNull(getClass().getResourceAsStream(name + ".aff"), name); -InputStream dictStream = -Objects.requireNonNull(getClass().getResourceAsStream(name + ".dic"), name); +checkSpellCheckerExpectations( Review comment: Thanks for looking into this and for your patch! I've no idea why you can't push; I've got the checkbox enabled on this PR. That's what https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/allowing-changes-to-a-pull-request-branch-created-from-a-fork seems to recommend. I also checked the repository settings and couldn't find an option about committers. I can invite you personally though :) By renaming do you mean `TestSpellChecker`? I'll do it, thanks for the suggestion, but preferably a bit later, when there won't be so much merging around this class :) BTW what do you think about renaming `SpellChecker` to `Hunspell`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dweiss commented on a change in pull request #2258: LUCENE-9686: Fix read past EOF handling in DirectIODirectory
dweiss commented on a change in pull request #2258:
URL: https://github.com/apache/lucene-solr/pull/2258#discussion_r568395337

## File path: lucene/misc/src/java/org/apache/lucene/misc/store/DirectIODirectory.java ##
@@ -381,17 +377,18 @@ public long length() {
     @Override
     public byte readByte() throws IOException {
       if (!buffer.hasRemaining()) {
-        refill();
+        refill(1);
       }
+
       return buffer.get();
     }

-    private void refill() throws IOException {
+    private void refill(int byteToRead) throws IOException {
       filePos += buffer.capacity();

       // BaseDirectoryTestCase#testSeekPastEOF test for consecutive read past EOF,
       // hence throwing EOFException early to maintain buffer state (position in particular)
-      if (filePos > channel.size()) {
+      if (filePos > channel.size() || (channel.size() - filePos < byteToRead)) {

Review comment:
Ok. Thanks for explaining!
[GitHub] [lucene-solr] dweiss merged pull request #2258: LUCENE-9686: Fix read past EOF handling in DirectIODirectory
dweiss merged pull request #2258: URL: https://github.com/apache/lucene-solr/pull/2258
[jira] [Resolved] (LUCENE-9686) TestDirectIODirectory#testFloatsUnderflow can fail assertion
[ https://issues.apache.org/jira/browse/LUCENE-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss resolved LUCENE-9686.
    Fix Version/s: master (9.0)
    Resolution: Fixed

> TestDirectIODirectory#testFloatsUnderflow can fail assertion
>
> Key: LUCENE-9686
> URL: https://issues.apache.org/jira/browse/LUCENE-9686
> Project: Lucene - Core
> Issue Type: Test
> Reporter: Julie Tibshirani
> Priority: Major
> Fix For: master (9.0)
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> Reproduction line:
> {code}
> ./gradlew test --tests TestDirectIODirectory.testFloatsUnderflow -Dtests.seed=73B56EAB13269C91 -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=haw-US -Dtests.timezone=America/Inuvik -Dtests.asserts=true -Dtests.file.encoding=UTF-8
> {code}
> I didn't have the chance to look deeply, but it seems like the wrong exception type is being thrown:
> {code:java}
> junit.framework.AssertionFailedError: Unexpected exception type, expected EOFException but got java.nio.BufferUnderflowException
>   at __randomizedtesting.SeedInfo.seed([73B56EAB13269C91:1FD75ACA1CD83E9C]:0)
>   at org.apache.lucene.util.LuceneTestCase.expectThrows(LuceneTestCase.java:2895)
>   at org.apache.lucene.util.LuceneTestCase.expectThrows(LuceneTestCase.java:2876)
>   at org.apache.lucene.store.BaseDirectoryTestCase.testFloatsUnderflow(BaseDirectoryTestCase.java:291)
> {code}

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (LUCENE-9686) TestDirectIODirectory#testFloatsUnderflow can fail assertion
[ https://issues.apache.org/jira/browse/LUCENE-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276925#comment-17276925 ] ASF subversion and git services commented on LUCENE-9686: Commit 3835cb4e95ce6ba93ab5e3d5caa35001c90db30a in lucene-solr's branch refs/heads/master from zacharymorn [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=3835cb4 ] LUCENE-9686: Fix read past EOF handling in DirectIODirectory (#2258)
[jira] [Commented] (LUCENE-9686) TestDirectIODirectory#testFloatsUnderflow can fail assertion
[ https://issues.apache.org/jira/browse/LUCENE-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276927#comment-17276927 ] ASF subversion and git services commented on LUCENE-9686: Commit 2da7a4a86d3620add49f3372a12d90c8b9aee0fd in lucene-solr's branch refs/heads/master from Dawid Weiss [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2da7a4a ] LUCENE-9686: Add changes entry.
[GitHub] [lucene-solr] dweiss commented on a change in pull request #2267: LUCENE-9707: Hunspell: check Lucene's implementation against Hunspel's test data
dweiss commented on a change in pull request #2267: URL: https://github.com/apache/lucene-solr/pull/2267#discussion_r568397883 ## File path: lucene/analysis/common/src/test/org/apache/lucene/analysis/hunspell/SpellCheckerTest.java ## @@ -61,59 +61,74 @@ public void needAffixOnAffixes() throws Exception { doTest("needaffix5"); } + @Test public void testBreak() throws Exception { doTest("break"); } - public void testBreakDefault() throws Exception { + @Test + public void breakDefault() throws Exception { doTest("breakdefault"); } - public void testBreakOff() throws Exception { + @Test + public void breakOff() throws Exception { doTest("breakoff"); } - public void testCompoundrule() throws Exception { + @Test + public void compoundrule() throws Exception { doTest("compoundrule"); } - public void testCompoundrule2() throws Exception { + @Test + public void compoundrule2() throws Exception { doTest("compoundrule2"); } - public void testCompoundrule3() throws Exception { + @Test + public void compoundrule3() throws Exception { doTest("compoundrule3"); } - public void testCompoundrule4() throws Exception { + @Test + public void compoundrule4() throws Exception { doTest("compoundrule4"); } - public void testCompoundrule5() throws Exception { + @Test + public void compoundrule5() throws Exception { doTest("compoundrule5"); } - public void testCompoundrule6() throws Exception { + @Test + public void compoundrule6() throws Exception { doTest("compoundrule6"); } - public void testCompoundrule7() throws Exception { + @Test + public void compoundrule7() throws Exception { doTest("compoundrule7"); } - public void testCompoundrule8() throws Exception { + @Test + public void compoundrule8() throws Exception { doTest("compoundrule8"); } - public void testGermanCompounding() throws Exception { + @Test + public void germanCompounding() throws Exception { doTest("germancompounding"); } protected void doTest(String name) throws Exception { -InputStream affixStream = 
-Objects.requireNonNull(getClass().getResourceAsStream(name + ".aff"), name); -InputStream dictStream = -Objects.requireNonNull(getClass().getResourceAsStream(name + ".dic"), name); +checkSpellCheckerExpectations( Review comment: Mhmm... let me try again then. It's weird - tried last night and got permission denied. Could be that I pulled your changes via https and not ssh... Sorry, it was late.
[GitHub] [lucene-solr] dweiss commented on a change in pull request #2267: LUCENE-9707: Hunspell: check Lucene's implementation against Hunspel's test data
dweiss commented on a change in pull request #2267: URL: https://github.com/apache/lucene-solr/pull/2267#discussion_r568398245 ## File path: lucene/analysis/common/src/test/org/apache/lucene/analysis/hunspell/SpellCheckerTest.java ## @@ -61,59 +61,74 @@ public void needAffixOnAffixes() throws Exception { doTest("needaffix5"); } + @Test public void testBreak() throws Exception { doTest("break"); } - public void testBreakDefault() throws Exception { + @Test + public void breakDefault() throws Exception { doTest("breakdefault"); } - public void testBreakOff() throws Exception { + @Test + public void breakOff() throws Exception { doTest("breakoff"); } - public void testCompoundrule() throws Exception { + @Test + public void compoundrule() throws Exception { doTest("compoundrule"); } - public void testCompoundrule2() throws Exception { + @Test + public void compoundrule2() throws Exception { doTest("compoundrule2"); } - public void testCompoundrule3() throws Exception { + @Test + public void compoundrule3() throws Exception { doTest("compoundrule3"); } - public void testCompoundrule4() throws Exception { + @Test + public void compoundrule4() throws Exception { doTest("compoundrule4"); } - public void testCompoundrule5() throws Exception { + @Test + public void compoundrule5() throws Exception { doTest("compoundrule5"); } - public void testCompoundrule6() throws Exception { + @Test + public void compoundrule6() throws Exception { doTest("compoundrule6"); } - public void testCompoundrule7() throws Exception { + @Test + public void compoundrule7() throws Exception { doTest("compoundrule7"); } - public void testCompoundrule8() throws Exception { + @Test + public void compoundrule8() throws Exception { doTest("compoundrule8"); } - public void testGermanCompounding() throws Exception { + @Test + public void germanCompounding() throws Exception { doTest("germancompounding"); } protected void doTest(String name) throws Exception { -InputStream affixStream = 
-Objects.requireNonNull(getClass().getResourceAsStream(name + ".aff"), name); -InputStream dictStream = -Objects.requireNonNull(getClass().getResourceAsStream(name + ".dic"), name); +checkSpellCheckerExpectations( Review comment: Renaming SpellChecker to Hunspell - yes, I think it's a good idea. Renaming tests later - absolutely.
[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2267: LUCENE-9707: Hunspell: check Lucene's implementation against Hunspel's test data
donnerpeter commented on a change in pull request #2267: URL: https://github.com/apache/lucene-solr/pull/2267#discussion_r568402121 ## File path: lucene/analysis/common/src/test/org/apache/lucene/analysis/hunspell/TestsFromOriginalHunspellRepository.java ## @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.analysis.hunspell; + +import java.io.IOException; +import java.nio.file.DirectoryStream; +import java.nio.file.Files; +import java.nio.file.Path; +import java.text.ParseException; +import java.util.Collection; +import java.util.Collections; +import java.util.Set; +import java.util.TreeSet; +import java.util.stream.Collectors; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.Parameterized; + +/** + * Same as {@link SpellCheckerTest}, but checks all Hunspell's test data. The path to the checked + * out Hunspell repository should be in {@code -Dhunspell.repo.path=...} system property. + */ +@RunWith(Parameterized.class) Review comment: Thanks! Now I'm starting to doubt whether this approach makes sense at all. 
I could avoid parameterization by generating test methods explicitly by files, with some risk that new files appear (which could be checked by additional code). And is it OK to modify the test policy for such local tests? I planned to add more not-easy-to-have-in-CI tests, which would measure performance and check correctness. They'd need external files with dictionaries, corpora for various languages (external or is there anything internal already?), and a test-only Hunspell JNI library for comparison (which needs a native binary and a couple of other jars, all of them need sha and license files, and it all gets quite verbose). Do you think the benefits of having this in the repo outweigh the costs? I could also leave this all locally, since I seem to be the only one needing these tests in the near future.
[jira] [Commented] (SOLR-15104) Restart solr will override the gc log
[ https://issues.apache.org/jira/browse/SOLR-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276948#comment-17276948 ]

Hongxu Ma commented on SOLR-15104:

I opened a PR to improve it: https://github.com/apache/lucene-solr/pull/2289

> Restart solr will override the gc log
>
> Key: SOLR-15104
> URL: https://issues.apache.org/jira/browse/SOLR-15104
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Hongxu Ma
> Priority: Minor
>
> When Solr is restarted, it overwrites the previous Java GC log.
> This makes debugging OOMs harder. It looks like it is caused by the hard-coded GC parameter in bin/solr:
> [https://github.com/apache/lucene-solr/blob/3e2fb59272f5b4d8106b3d8edf847f50bacd7a61/solr/bin/solr#L2031]
>
> Judging by how other systems handle this, I think adding a timestamp to the default GC log filename would be better:
> https://issues.apache.org/jira/browse/HBASE-18274
> https://issues.apache.org/jira/browse/CASSANDRA-2418
>
> Hope it can be improved, thanks.
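The timestamp idea from the SOLR-15104 description can be sketched in a few lines of shell. This is only an illustration under assumptions, not what bin/solr or PR #2289 actually does: the helper name `gc_log_path`, the `solr_gc-<timestamp>.log` naming scheme, and the `/tmp/solr-logs` fallback are all hypothetical.

```shell
# Hypothetical helper (not part of bin/solr): build a GC log path that
# embeds the start time, so a restart does not reuse the previous file name.
gc_log_path() {
  logs_dir="$1"
  # UTC timestamp such as 20210202-101530; the format is an assumption.
  stamp="$(date -u '+%Y%m%d-%H%M%S')"
  printf '%s/solr_gc-%s.log\n' "$logs_dir" "$stamp"
}

# Demo only: fall back to an assumed directory when SOLR_LOGS_DIR is not set.
SOLR_LOGS_DIR="${SOLR_LOGS_DIR:-/tmp/solr-logs}"
echo "GC log would be written to: $(gc_log_path "$SOLR_LOGS_DIR")"
```

A per-start file name trades the overwrite problem for a growing number of files, so some form of cleanup or rotation would probably still be wanted.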
[GitHub] [lucene-solr] dweiss commented on a change in pull request #2267: LUCENE-9707: Hunspell: check Lucene's implementation against Hunspel's test data
dweiss commented on a change in pull request #2267: URL: https://github.com/apache/lucene-solr/pull/2267#discussion_r568415196 ## File path: lucene/analysis/common/src/test/org/apache/lucene/analysis/hunspell/TestsFromOriginalHunspellRepository.java ## @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.analysis.hunspell; + +import java.io.IOException; +import java.nio.file.DirectoryStream; +import java.nio.file.Files; +import java.nio.file.Path; +import java.text.ParseException; +import java.util.Collection; +import java.util.Collections; +import java.util.Set; +import java.util.TreeSet; +import java.util.stream.Collectors; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.Parameterized; + +/** + * Same as {@link SpellCheckerTest}, but checks all Hunspell's test data. The path to the checked + * out Hunspell repository should be in {@code -Dhunspell.repo.path=...} system property. + */ +@RunWith(Parameterized.class) Review comment: I think they should reside in the repo if they are useful (even for local launches). What I'm afraid of is that if these tools are not in use, they'll eventually degrade and stop working without anyone noticing. 
I think the way to integrate such tests properly would be to add a specific gradle test task which would configure an appropriate policy, require pointers to the required resources, etc. This way these tests can be run as a CI run (somewhere... maybe a github action, even?). I think this can be ironed out later on, once you've written (notice the 'you' here... ;) more of such tests - the patterns of making them work with the CI will naturally emerge from that. For now, feel free to use that original parameterized test runner - I'll look into making IntelliJ work with randomizedtesting again (because I use it here and in other projects). It's a moving target and thus a bit discouraging (I did the same thing a few times in the past already for various IDEs that interpreted test descriptions differently).
[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #2284: SOLR-11233: Add optional JAVA8_GC_LOG_FILE_OPTS for bin/solr.
cpoerschke commented on a change in pull request #2284:
URL: https://github.com/apache/lucene-solr/pull/2284#discussion_r568428808

## File path: solr/bin/solr ##
@@ -2026,7 +2026,11 @@ if [ "$GC_LOG_OPTS" != "" ]; then
   if [ "$JAVA_VENDOR" == "IBM J9" ]; then
     gc_log_flag="-Xverbosegclog"
   fi
-  GC_LOG_OPTS+=("$gc_log_flag:$SOLR_LOGS_DIR/solr_gc.log" '-XX:+UseGCLogFileRotation' '-XX:NumberOfGCLogFiles=9' '-XX:GCLogFileSize=20M')
+  if [ -z ${JAVA8_GC_LOG_FILE_OPTS+x} ]; then

Review comment:
This variant is consistent with the existing variant at line 2010, but yes, it confused me too when I first read it. It looks like there's a subtle difference between "unset" and "set to empty string" behaviour. Illustration:

```
$ unset GC_LOG_OPTS
$ if [ -z ${GC_LOG_OPTS+x} ]; then echo unset; else echo not unset; fi
unset
$ if [ -z ${GC_LOG_OPTS:+x} ]; then echo empty; else echo not empty; fi
empty
$ if [ -z ${GC_LOG_OPTS} ]; then echo empty; else echo not empty; fi
empty
$
$ GC_LOG_OPTS=
$ if [ -z ${GC_LOG_OPTS+x} ]; then echo unset; else echo not unset; fi
not unset
$ if [ -z ${GC_LOG_OPTS:+x} ]; then echo empty; else echo not empty; fi
empty
$ if [ -z ${GC_LOG_OPTS} ]; then echo empty; else echo not empty; fi
empty
$
$ GC_LOG_OPTS=foobar
$ if [ -z ${GC_LOG_OPTS+x} ]; then echo unset; else echo not unset; fi
not unset
$ if [ -z ${GC_LOG_OPTS:+x} ]; then echo empty; else echo not empty; fi
not empty
$ if [ -z ${GC_LOG_OPTS} ]; then echo empty; else echo not empty; fi
not empty
```
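The three cases from the illustration can also be folded into one small function, which may be easier to read than the separate one-liners. This is a minimal sketch, not code from bin/solr: the variable `DEMO_GC_OPTS` and the function `describe_opts` are made-up names, and the expansions are quoted here, which the PR's code does not do.

```shell
# Hypothetical demo: report whether DEMO_GC_OPTS is unset, set but empty,
# or set to a non-empty value.
describe_opts() {
  if [ -z "${DEMO_GC_OPTS+x}" ]; then
    # ${VAR+x} expands to nothing only when VAR is unset.
    echo "unset"
  elif [ -z "$DEMO_GC_OPTS" ]; then
    # VAR is set, but its value is the empty string.
    echo "empty"
  else
    echo "set"
  fi
}

unset DEMO_GC_OPTS
describe_opts          # prints "unset"
DEMO_GC_OPTS=
describe_opts          # prints "empty"
DEMO_GC_OPTS=foobar
describe_opts          # prints "set"
```

Testing `${VAR+x}` first matters: once a variable is known to be set, a plain `-z` check is enough to separate the empty case from the non-empty one.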
[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #2263: SOLR-14978 OOM Killer in Foreground (#2055)
cpoerschke commented on a change in pull request #2263:
URL: https://github.com/apache/lucene-solr/pull/2263#discussion_r568436589

## File path: solr/bin/solr ##
@@ -2115,6 +2128,15 @@ function start_solr() {
     SOLR_OPTS+=($AUTHC_OPTS)
   fi

+  # If a heap dump directory is specified, enable it in SOLR_OPTS
+  if [[ -z "$SOLR_HEAP_DUMP_DIR" ]] && [[ "$SOLR_HEAP_DUMP" == "true" ]]; then
+    SOLR_HEAP_DUMP_DIR="${SOLR_LOGS_DIR}/dumps"
+  fi
+  if [[ -n "$SOLR_HEAP_DUMP_DIR" ]]; then
+    SOLR_OPTS+=("-XX:+HeapDumpOnOutOfMemoryError")
+    SOLR_OPTS+=("-XX:HeapDumpPath=$SOLR_HEAP_DUMP_DIR/solr-$(date +%s)-pid$$.hprof")

Review comment:
How about also optionally supporting customisation of the file name, e.g. via a `SOLR_HEAP_DUMP_FILE` variable? Reasons users might wish to customise:
* inclusion of `SOLR_PORT` in the file name to more easily differentiate dumps for different Solr instances on the same machine
* preference of (say) `date -u '+%Y%m%d-%H%M%S'` over `date +%s` for the timestamp
* always use the same dump file as a way to limit the amount of disk space successive OOMs can use up (a colleague of mine had this insight)
* omission of the pid and restriction of the timestamp e.g. to `date -u '+%Y%m%d'` so that at most one OOM file per day would exist
* omission of the pid to avoid confusion when running in the background (because the pid would be that of the shell script and not that of the Solr JVM, I think)
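One way the suggested override could look is sketched below. It is an assumption-laden illustration rather than the PR's code: `SOLR_HEAP_DUMP_FILE` is only the name proposed in the review comment, and the helper `heap_dump_path` is hypothetical. The default file name mirrors the `solr-$(date +%s)-pid$$.hprof` pattern from the diff.

```shell
# Hypothetical helper: pick the heap dump path, letting an optional
# SOLR_HEAP_DUMP_FILE (proposed in the review, not an existing setting)
# replace the default timestamp-and-pid file name.
heap_dump_path() {
  dump_dir="$1"
  # Fall back to the naming pattern from the diff when no override is given.
  dump_file="${SOLR_HEAP_DUMP_FILE:-solr-$(date +%s)-pid$$.hprof}"
  printf '%s/%s\n' "$dump_dir" "$dump_file"
}

# Example: at most one dump per day and per port, as two of the bullets suggest.
SOLR_PORT="${SOLR_PORT:-8983}"   # assumed value for the demo
SOLR_HEAP_DUMP_FILE="solr-${SOLR_PORT}-$(date -u '+%Y%m%d').hprof"
heap_dump_path "/var/solr/logs/dumps"
```

Because the override is read with `:-`, both an unset and an empty `SOLR_HEAP_DUMP_FILE` fall back to the default name, which seems the least surprising behaviour for a file name.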
[GitHub] [lucene-solr] donnerpeter commented on pull request #2267: LUCENE-9707: Hunspell: check Lucene's implementation against Hunspel's test data
donnerpeter commented on pull request #2267: URL: https://github.com/apache/lucene-solr/pull/2267#issuecomment-771492242 I've done some rebasing, included your patch, renamed the test and tweaked the code a bit. Hopefully it's better now.
[GitHub] [lucene-solr] dweiss commented on pull request #2267: LUCENE-9707: Hunspell: check Lucene's implementation against Hunspel's test data
dweiss commented on pull request #2267: URL: https://github.com/apache/lucene-solr/pull/2267#issuecomment-771498850 Eh. This is why Parameterized works for you and randomizedtesting doesn't: https://github.com/JetBrains/intellij-community/blob/master/plugins/junit_rt/src/com/intellij/junit4/JUnit4TestRunnerUtil.java#L96-L105
[GitHub] [lucene-solr] dweiss commented on pull request #2267: LUCENE-9707: Hunspell: check Lucene's implementation against Hunspel's test data
dweiss commented on pull request #2267: URL: https://github.com/apache/lucene-solr/pull/2267#issuecomment-771499577 If you take a look at that class you'll understand why it's such a mess to try to navigate those test descriptions...
[GitHub] [lucene-solr] donnerpeter commented on pull request #2267: LUCENE-9707: Hunspell: check Lucene's implementation against Hunspel's test data
donnerpeter commented on pull request #2267:
URL: https://github.com/apache/lucene-solr/pull/2267#issuecomment-771503328

> Eh. This is why Parameterized works for you and randomizedtesting doesn't:
> https://github.com/JetBrains/intellij-community/blob/master/plugins/junit_rt/src/com/intellij/junit4/JUnit4TestRunnerUtil.java#L96-L105

That's what I feared: relying on JUnit internals :(
[GitHub] [lucene-solr] dweiss commented on pull request #2267: LUCENE-9707: Hunspell: check Lucene's implementation against Hunspel's test data
dweiss commented on pull request #2267: URL: https://github.com/apache/lucene-solr/pull/2267#issuecomment-771504446 Yes, sadly. I haven't looked at junit5, shame on me. Perhaps it's improved there.
[GitHub] [lucene-solr] dweiss merged pull request #2267: LUCENE-9707: Hunspell: check Lucene's implementation against Hunspel's test data
dweiss merged pull request #2267: URL: https://github.com/apache/lucene-solr/pull/2267
[jira] [Resolved] (LUCENE-9707) Hunspell: check Lucene's implementation against Hunspell's test data
[ https://issues.apache.org/jira/browse/LUCENE-9707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss resolved LUCENE-9707. - Fix Version/s: master (9.0) Resolution: Fixed > Hunspell: check Lucene's implementation against Hunspell's test data > > > Key: LUCENE-9707 > URL: https://issues.apache.org/jira/browse/LUCENE-9707 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Peter Gromov >Priority: Major > Fix For: master (9.0) > > Time Spent: 5h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9707) Hunspell: check Lucene's implementation against Hunspell's test data
[ https://issues.apache.org/jira/browse/LUCENE-9707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276991#comment-17276991 ] ASF subversion and git services commented on LUCENE-9707: - Commit b48d5beb34957e83e99ced60d57d4839b474f018 in lucene-solr's branch refs/heads/master from Peter Gromov [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b48d5be ] LUCENE-9707: Hunspell: check Lucene's implementation against Hunspel's test data (#2267) > Hunspell: check Lucene's implementation against Hunspell's test data > > > Key: LUCENE-9707 > URL: https://issues.apache.org/jira/browse/LUCENE-9707 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Peter Gromov >Priority: Major > Fix For: master (9.0) > > Time Spent: 5h 50m > Remaining Estimate: 0h
[jira] [Updated] (SOLR-15122) ClusterEventProducerTest.testEvents is unstable
[ https://issues.apache.org/jira/browse/SOLR-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated SOLR-15122: Fix Version/s: master (9.0) > ClusterEventProducerTest.testEvents is unstable > --- > > Key: SOLR-15122 > URL: https://issues.apache.org/jira/browse/SOLR-15122 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Tests >Reporter: Mike Drob >Assignee: Andrzej Bialecki >Priority: Major > Fix For: master (9.0) > > > This test looks to be unstable according to Jenkins since about Nov 5. I just > started seeing occasional failures locally when running the whole suite but > cannot reproduce when running in isolation. > https://lists.apache.org/thread.html/rf0c16b257bc3236ea414be51451806352b55f15d4949f4fd54a3b71a%40%3Cbuilds.lucene.apache.org%3E
[jira] [Assigned] (SOLR-15122) ClusterEventProducerTest.testEvents is unstable
[ https://issues.apache.org/jira/browse/SOLR-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki reassigned SOLR-15122: --- Assignee: Andrzej Bialecki > ClusterEventProducerTest.testEvents is unstable > --- > > Key: SOLR-15122 > URL: https://issues.apache.org/jira/browse/SOLR-15122 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Tests >Reporter: Mike Drob >Assignee: Andrzej Bialecki >Priority: Major > > This test looks to be unstable according to Jenkins since about Nov 5. I just > started seeing occasional failures locally when running the whole suite but > cannot reproduce when running in isolation. > https://lists.apache.org/thread.html/rf0c16b257bc3236ea414be51451806352b55f15d4949f4fd54a3b71a%40%3Cbuilds.lucene.apache.org%3E
[jira] [Commented] (SOLR-15122) ClusterEventProducerTest.testEvents is unstable
[ https://issues.apache.org/jira/browse/SOLR-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277041#comment-17277041 ] ASF subversion and git services commented on SOLR-15122: Commit 4cb1000ea0a1f6c0d7be2486a709fc82dc94616b in lucene-solr's branch refs/heads/master from Andrzej Bialecki [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4cb1000 ] SOLR-15122: Tentative fix for the test failure - the node in the test could go down before the new plugin was active on the Overseer. > ClusterEventProducerTest.testEvents is unstable > --- > > Key: SOLR-15122 > URL: https://issues.apache.org/jira/browse/SOLR-15122 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Tests >Reporter: Mike Drob >Assignee: Andrzej Bialecki >Priority: Major > Fix For: master (9.0) > > > This test looks to be unstable according to Jenkins since about Nov 5. I just > started seeing occasional failures locally when running the whole suite but > cannot reproduce when running in isolation. > https://lists.apache.org/thread.html/rf0c16b257bc3236ea414be51451806352b55f15d4949f4fd54a3b71a%40%3Cbuilds.lucene.apache.org%3E
[jira] [Resolved] (SOLR-15068) RefGuide documentation for replica placement plugins
[ https://issues.apache.org/jira/browse/SOLR-15068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki resolved SOLR-15068. - Resolution: Fixed > RefGuide documentation for replica placement plugins > > > Key: SOLR-15068 > URL: https://issues.apache.org/jira/browse/SOLR-15068 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki >Priority: Major > Fix For: master (9.0)
[jira] [Commented] (SOLR-15122) ClusterEventProducerTest.testEvents is unstable
[ https://issues.apache.org/jira/browse/SOLR-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277044#comment-17277044 ] Andrzej Bialecki commented on SOLR-15122: - I'll leave this open to see if the fix works. > ClusterEventProducerTest.testEvents is unstable > --- > > Key: SOLR-15122 > URL: https://issues.apache.org/jira/browse/SOLR-15122 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Tests >Reporter: Mike Drob >Assignee: Andrzej Bialecki >Priority: Major > Fix For: master (9.0) > > > This test looks to be unstable according to Jenkins since about Nov 5. I just > started seeing occasional failures locally when running the whole suite but > cannot reproduce when running in isolation. > https://lists.apache.org/thread.html/rf0c16b257bc3236ea414be51451806352b55f15d4949f4fd54a3b71a%40%3Cbuilds.lucene.apache.org%3E
[GitHub] [lucene-solr] mayya-sharipova commented on a change in pull request #2256: LUCENE-9507 Custom order for leaves in IndexReader and IndexWriter
mayya-sharipova commented on a change in pull request #2256: URL: https://github.com/apache/lucene-solr/pull/2256#discussion_r568520864 ## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java ## @@ -941,6 +969,11 @@ public IndexWriter(Directory d, IndexWriterConfig conf) throws IOException { // obtain the write.lock. If the user configured a timeout, // we wrap with a sleeper and this might take some time. writeLock = d.obtainLock(WRITE_LOCK_NAME); +if (config.getIndexSort() != null && leafSorter != null) { + throw new IllegalArgumentException( + "[IndexWriter] can't use index sort and leaf sorter at the same time!"); Review comment: @msokolov Thank you for the clarification. Indeed, it is much clearer with the example you provided. It looks like we need to think about and discuss the merging scenario further, maybe in the next PR or Jira ticket.
[jira] [Commented] (LUCENE-9705) Move all codec formats to the o.a.l.codecs.Lucene90 package
[ https://issues.apache.org/jira/browse/LUCENE-9705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277050#comment-17277050 ] Ignacio Vera commented on LUCENE-9705: -- Thanks Julie, I think you captured the spirit of this change. In addition, once we have new formats, we can try to simplify things, for example by getting rid of PackedInts (legacy) in all current codecs in favour of DirectReader and DirectWriter. > Move all codec formats to the o.a.l.codecs.Lucene90 package > --- > > Key: LUCENE-9705 > URL: https://issues.apache.org/jira/browse/LUCENE-9705 > Project: Lucene - Core > Issue Type: Wish >Reporter: Ignacio Vera >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Current formats are distributed in different packages, prefixed with the > Lucene version they were created in. With the upcoming release of Lucene 9.0, it > would be nice to move all those formats to just the o.a.l.codecs.Lucene90 > package (and of course move the current ones to backwards-codecs). > This issue would actually facilitate moving the directory API to little > endian (LUCENE-9047), as the only codecs that would need to handle backwards > compatibility would be the codecs in backwards-codecs. > In addition, it can help formalise the use of internal versions vs format > versioning (LUCENE-9616)
[GitHub] [lucene-solr] dweiss commented on a change in pull request #2277: LUCENE-9716: Hunspell: support flag usage before its format is even specified
dweiss commented on a change in pull request #2277: URL: https://github.com/apache/lucene-solr/pull/2277#discussion_r568569461 ## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java ## @@ -696,45 +690,25 @@ char affixData(int affixIndex, int offset) { return fstCompiler.compile(); } - /** pattern accepts optional BOM + SET + any whitespace */ - static final Pattern ENCODING_PATTERN = Pattern.compile("^(\u00EF\u00BB\u00BF)?SET\\s+"); + /** Parses the encoding and flag format specified in the provided InputStream */ + private void readConfig(InputStream affix) throws IOException, ParseException { +LineNumberReader reader = new LineNumberReader(new InputStreamReader(affix, DEFAULT_CHARSET)); +while (true) { + String line = reader.readLine(); + if (line == null) break; - /** - * Parses the encoding specified in the affix file readable through the provided InputStream - * - * @param affix InputStream for reading the affix file - * @return Encoding specified in the affix file - * @throws IOException Can be thrown while reading from the InputStream - */ - static String getDictionaryEncoding(InputStream affix) throws IOException { -final StringBuilder encoding = new StringBuilder(); -for (; ; ) { - encoding.setLength(0); - int ch; - while ((ch = affix.read()) >= 0) { -if (ch == '\n') { - break; -} -if (ch != '\r') { - encoding.append((char) ch); -} - } - if (encoding.length() == 0 - || encoding.charAt(0) == '#' - || - // this test only at the end as ineffective but would allow lines only containing spaces: - encoding.toString().trim().length() == 0) { -if (ch < 0) { - return DEFAULT_CHARSET.name(); -} -continue; + line = line.trim(); + + while (line.startsWith("\u00EF") || line.startsWith("\u00BB") || line.startsWith("\u00BF")) { Review comment: Can the bom really be present on any line? 
Wouldn't a more elegant solution be to use a buffered input stream (or a pushback input stream) and just consume the BOM if it leads the file? It is a bit awkward that those files are parsed as ASCII (well, ISO-8859-1) and at the same time have a UTF BOM (not to mention that bit where you convert to UTF-8 from ISO-8859-1)... Is this encoding situation really so messed up in Hunspell?
[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2277: LUCENE-9716: Hunspell: support flag usage before its format is even specified
donnerpeter commented on a change in pull request #2277: URL: https://github.com/apache/lucene-solr/pull/2277#discussion_r568575767 ## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java ## @@ -696,45 +690,25 @@ char affixData(int affixIndex, int offset) { return fstCompiler.compile(); } - /** pattern accepts optional BOM + SET + any whitespace */ - static final Pattern ENCODING_PATTERN = Pattern.compile("^(\u00EF\u00BB\u00BF)?SET\\s+"); + /** Parses the encoding and flag format specified in the provided InputStream */ + private void readConfig(InputStream affix) throws IOException, ParseException { +LineNumberReader reader = new LineNumberReader(new InputStreamReader(affix, DEFAULT_CHARSET)); +while (true) { + String line = reader.readLine(); + if (line == null) break; - /** - * Parses the encoding specified in the affix file readable through the provided InputStream - * - * @param affix InputStream for reading the affix file - * @return Encoding specified in the affix file - * @throws IOException Can be thrown while reading from the InputStream - */ - static String getDictionaryEncoding(InputStream affix) throws IOException { -final StringBuilder encoding = new StringBuilder(); -for (; ; ) { - encoding.setLength(0); - int ch; - while ((ch = affix.read()) >= 0) { -if (ch == '\n') { - break; -} -if (ch != '\r') { - encoding.append((char) ch); -} - } - if (encoding.length() == 0 - || encoding.charAt(0) == '#' - || - // this test only at the end as ineffective but would allow lines only containing spaces: - encoding.toString().trim().length() == 0) { -if (ch < 0) { - return DEFAULT_CHARSET.name(); -} -continue; + line = line.trim(); + + while (line.startsWith("\u00EF") || line.startsWith("\u00BB") || line.startsWith("\u00BF")) { Review comment: Most likely it's just on the first line, handling it this way was just easier. Pushback might indeed be more elegant, I'll try that, thanks! 
The situation with encoding is complicated indeed. AFAIU the encodings are either ASCII-based 8-bit or UTF-8, so the first time we read the file we can safely check for Latin letters. At least that's what Hunspell appears to do as well.
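As a rough illustration of the first pass described above (not the code in the pull request): because the candidate encodings are ASCII-compatible, the file can be read once as ISO-8859-1 just to locate the `SET` directive. The class name `EncodingPreflight`, the method `findDeclaredEncoding`, and the caller-supplied fallback are hypothetical; a BOM is only looked for on the first line here, which is an assumption of this sketch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Lucene: finds the encoding named by "SET".
final class EncodingPreflight {
  // The three UTF-8 BOM bytes as they appear when decoded as ISO-8859-1.
  private static final String BOM_AS_LATIN1 = "\u00EF\u00BB\u00BF";

  /** Returns the name after the first "SET" directive, or the fallback if there is none. */
  static String findDeclaredEncoding(InputStream affix, String fallback) throws IOException {
    BufferedReader reader =
        new BufferedReader(new InputStreamReader(affix, StandardCharsets.ISO_8859_1));
    boolean firstLine = true;
    String line;
    while ((line = reader.readLine()) != null) {
      // Assumption of this sketch: a BOM can only lead the very first line.
      if (firstLine && line.startsWith(BOM_AS_LATIN1)) {
        line = line.substring(BOM_AS_LATIN1.length());
      }
      firstLine = false;
      line = line.trim();
      if (line.length() > 3 && line.startsWith("SET") && Character.isWhitespace(line.charAt(3))) {
        return line.substring(3).trim();
      }
    }
    return fallback;
  }
}
```

The returned name could then be passed to `Charset.forName` to build the Reader used for the real parse; this sketch does not validate the name.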
[GitHub] [lucene-solr] dweiss commented on a change in pull request #2277: LUCENE-9716: Hunspell: support flag usage before its format is even specified
dweiss commented on a change in pull request #2277: URL: https://github.com/apache/lucene-solr/pull/2277#discussion_r568580925 ## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java ## @@ -696,45 +690,25 @@ char affixData(int affixIndex, int offset) { return fstCompiler.compile(); } - /** pattern accepts optional BOM + SET + any whitespace */ - static final Pattern ENCODING_PATTERN = Pattern.compile("^(\u00EF\u00BB\u00BF)?SET\\s+"); + /** Parses the encoding and flag format specified in the provided InputStream */ + private void readConfig(InputStream affix) throws IOException, ParseException { +LineNumberReader reader = new LineNumberReader(new InputStreamReader(affix, DEFAULT_CHARSET)); +while (true) { + String line = reader.readLine(); + if (line == null) break; - /** - * Parses the encoding specified in the affix file readable through the provided InputStream - * - * @param affix InputStream for reading the affix file - * @return Encoding specified in the affix file - * @throws IOException Can be thrown while reading from the InputStream - */ - static String getDictionaryEncoding(InputStream affix) throws IOException { -final StringBuilder encoding = new StringBuilder(); -for (; ; ) { - encoding.setLength(0); - int ch; - while ((ch = affix.read()) >= 0) { -if (ch == '\n') { - break; -} -if (ch != '\r') { - encoding.append((char) ch); -} - } - if (encoding.length() == 0 - || encoding.charAt(0) == '#' - || - // this test only at the end as ineffective but would allow lines only containing spaces: - encoding.toString().trim().length() == 0) { -if (ch < 0) { - return DEFAULT_CHARSET.name(); -} -continue; + line = line.trim(); + + while (line.startsWith("\u00EF") || line.startsWith("\u00BB") || line.startsWith("\u00BF")) { Review comment: Ok, so it's essentially an unknown byte stream with dynamic charset detection. Not fun. 
If it's restricted to a reasonable subset (like you said) then a preflight of the content could determine the actual encoding (at least until an explicit encoding declaration is found). Then things would be less messy down the road as you'd just have a Reader to read from... Pushback is fine too. Either that, or use a BufferedInputStream with mark/reset to adjust the stream position after you detect the BOM (or not). As much as I like PushbackInputStream, it predates dinosaurs. :)
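The mark/reset variant mentioned above could look roughly like the following sketch. It is not the commit that was later pushed to the pull request; `BomSkipper` and `skipUtf8Bom` are hypothetical names, and the caller is assumed to wrap the affix stream in a `BufferedInputStream` (or another stream that supports mark/reset).

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Lucene.
final class BomSkipper {
  private static final int[] UTF8_BOM = {0xEF, 0xBB, 0xBF};

  /**
   * Consumes a leading UTF-8 BOM if one is present and returns true.
   * Otherwise resets the stream to where it was and returns false.
   */
  static boolean skipUtf8Bom(InputStream in) throws IOException {
    if (!in.markSupported()) {
      throw new IllegalArgumentException("stream must support mark/reset");
    }
    in.mark(UTF8_BOM.length); // we never read more than three bytes before reset()
    for (int expected : UTF8_BOM) {
      if (in.read() != expected) { // also covers end of stream (-1)
        in.reset();
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) throws IOException {
    byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'S', 'E', 'T'};
    InputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
    boolean hadBom = skipUtf8Bom(in);
    String rest = new String(in.readAllBytes(), StandardCharsets.ISO_8859_1);
    System.out.println(hadBom + " " + rest);
  }
}
```

Either the whole BOM is consumed or nothing is, so a partial match such as a lone 0xEF byte is left in the stream for the regular parser.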
[jira] [Commented] (SOLR-15119) Make LINK splitMethod the default for SplitShardCmd
[ https://issues.apache.org/jira/browse/SOLR-15119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277109#comment-17277109 ] Gézapeti commented on SOLR-15119: - I've tried it out and the LINK method works with HDFS as well; it's smart enough to fall back to copying the whole index over on HDFS. > Make LINK splitMethod the default for SplitShardCmd > --- > > Key: SOLR-15119 > URL: https://issues.apache.org/jira/browse/SOLR-15119 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: master (9.0) >Reporter: Megan Carey >Priority: Major > Labels: easy-fix > Time Spent: 1h 10m > Remaining Estimate: 0h > > REWRITE splitMethod is still the default in SplitShardCmd [1], despite LINK > being much faster. IndexSizeTrigger in branch_8x already uses LINK by default > [2], and we have found LINK to be reliable and performant at scale. This work > will just update the default in SplitShardCmd to make LINK the default > overall. > > > [1][https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java#L88] > > [2][https://github.com/apache/lucene-solr/blob/branch_8x/solr/core/src/java/org/apache/solr/cloud/autoscaling/IndexSizeTrigger.java#L186]
[jira] [Commented] (LUCENE-9636) Exact and operation to get a SIMD optimize
[ https://issues.apache.org/jira/browse/LUCENE-9636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277119#comment-17277119 ] Markus Jelsma commented on LUCENE-9636: --- * [LUCENE-9636|http://issues.apache.org/jira/browse/LUCENE-9636]: Faster decoding of postings for some numbers of bits per value. (Guo Feng via Adrien Grand) According to CHANGES, this ticket should be marked as resolved, should it not? > Exact and operation to get a SIMD optimize > -- > > Key: LUCENE-9636 > URL: https://issues.apache.org/jira/browse/LUCENE-9636 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Trivial > Time Spent: 40m > Remaining Estimate: 0h > > In decode6(), decode7(), decode14(), decode15() and decode24(), the longs are always `&`-ed with > the same mask and then shifted. By printing the generated assembly, I found that the JIT > did not optimize them with SIMD instructions. But when we extract all `&` > operations and do them first, the JIT uses SIMD instructions for them.
[jira] [Resolved] (LUCENE-9636) Exact and operation to get a SIMD optimize
[ https://issues.apache.org/jira/browse/LUCENE-9636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo resolved LUCENE-9636. -- Resolution: Fixed > Exact and operation to get a SIMD optimize > -- > > Key: LUCENE-9636 > URL: https://issues.apache.org/jira/browse/LUCENE-9636 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Trivial > Time Spent: 40m > Remaining Estimate: 0h > > In decode6(), decode7(), decode14(), decode15() and decode24(), the longs are always `&`-ed with > the same mask and then shifted. By printing the generated assembly, I found that the JIT > did not optimize them with SIMD instructions. But when we extract all `&` > operations and do them first, the JIT uses SIMD instructions for them.
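To make the refactoring in the issue description concrete, here is a minimal sketch of the "do all the `&` operations first" idea. It is not Lucene's decode code: the class `MaskFirstDecode`, the mask value, and the way pairs of longs are combined are all invented for illustration. Whether a given JVM actually vectorizes the masking loop is not something this example can show; it only demonstrates that both shapes compute the same result.

```java
// Hypothetical illustration, not taken from Lucene's postings decoder.
final class MaskFirstDecode {
  static final long MASK = 0x00FF00FF00FF00FFL; // arbitrary illustrative mask

  /** Interleaved shape: masking is mixed with shifts inside one loop. */
  static void interleaved(long[] tmp, long[] out) {
    for (int i = 0; i + 1 < tmp.length; i += 2) {
      out[i / 2] = ((tmp[i] & MASK) << 8) | (tmp[i + 1] & MASK);
    }
  }

  /** Mask-first shape: one plain loop applies the mask, a second loop shifts and combines. */
  static void maskFirst(long[] tmp, long[] out) {
    for (int i = 0; i < tmp.length; i++) {
      tmp[i] &= MASK; // simple element-wise loop, modifies tmp in place
    }
    for (int i = 0; i + 1 < tmp.length; i += 2) {
      out[i / 2] = (tmp[i] << 8) | tmp[i + 1];
    }
  }
}
```

Note that `maskFirst` overwrites its input array, so callers that still need the unmasked values would have to pass a copy.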
[jira] [Commented] (SOLR-15122) ClusterEventProducerTest.testEvents is unstable
[ https://issues.apache.org/jira/browse/SOLR-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277129#comment-17277129 ] Mike Drob commented on SOLR-15122: -- I recently went through and replaced a bunch of busy-wait loops with real conditional tools; it would be good to do the same here. Perhaps the test code can set a monitor, and if the monitor is not null, the event producer can notify on it whenever the version changes. > ClusterEventProducerTest.testEvents is unstable > --- > > Key: SOLR-15122 > URL: https://issues.apache.org/jira/browse/SOLR-15122 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Tests >Reporter: Mike Drob >Assignee: Andrzej Bialecki >Priority: Major > Fix For: master (9.0) > > > This test looks to be unstable according to Jenkins since about Nov 5. I just > started seeing occasional failures locally when running the whole suite but > cannot reproduce when running in isolation. > https://lists.apache.org/thread.html/rf0c16b257bc3236ea414be51451806352b55f15d4949f4fd54a3b71a%40%3Cbuilds.lucene.apache.org%3E
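The monitor idea in the comment above might be sketched as follows. This is only an illustration of replacing a busy-wait loop with wait/notify: `VersionTracker`, `setVersion`, and `awaitVersion` are hypothetical names and do not correspond to the actual Solr event producer or test code.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a component whose version a test wants to wait on.
final class VersionTracker {
  private final Object monitor = new Object();
  private int version;

  /** Called by the producer whenever the version changes; wakes up any waiters. */
  void setVersion(int newVersion) {
    synchronized (monitor) {
      version = newVersion;
      monitor.notifyAll();
    }
  }

  /** Blocks until the version reaches the expected value or the timeout expires. */
  boolean awaitVersion(int expected, long timeout, TimeUnit unit) throws InterruptedException {
    long deadline = System.nanoTime() + unit.toNanos(timeout);
    synchronized (monitor) {
      while (version < expected) { // loop guards against spurious wakeups
        long remaining = deadline - System.nanoTime();
        if (remaining <= 0) {
          return false;
        }
        TimeUnit.NANOSECONDS.timedWait(monitor, remaining);
      }
      return true;
    }
  }
}
```

A test would call `awaitVersion` instead of polling in a sleep loop, so it proceeds as soon as the change is published and still fails deterministically on timeout.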
[GitHub] [lucene-solr] jpountz commented on a change in pull request #2268: LUCENE-9705: Move Lucene50CompoundFormat to Lucene90CompoundFormat
jpountz commented on a change in pull request #2268: URL: https://github.com/apache/lucene-solr/pull/2268#discussion_r568617447 ## File path: lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene50/Lucene50CompoundFormat.java ## @@ -0,0 +1,88 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.backward_codecs.lucene50; + +import java.io.IOException; +import org.apache.lucene.codecs.CodecUtil; +import org.apache.lucene.codecs.CompoundDirectory; +import org.apache.lucene.codecs.CompoundFormat; +import org.apache.lucene.index.SegmentInfo; +import org.apache.lucene.store.DataOutput; +import org.apache.lucene.store.Directory; +import org.apache.lucene.store.IOContext; + +/** + * Lucene 5.0 compound file format + * + * Files: + * + * + * .cfs: An optional "virtual" file consisting of all the other index files for + * systems that frequently run out of file handles. + * .cfe: The "virtual" compound file's entry table holding all entries in the + * corresponding .cfs file. 
+ * + * + * Description: + * + * + * Compound (.cfs) --> Header, FileData FileCount, Footer + * Compound Entry Table (.cfe) --> Header, FileCount,FileCount + * Header --> {@link CodecUtil#writeIndexHeader IndexHeader} + * FileCount --> {@link DataOutput#writeVInt VInt} + * DataOffset,DataLength,Checksum --> {@link DataOutput#writeLong UInt64} + * FileName --> {@link DataOutput#writeString String} + * FileData --> raw file data + * Footer --> {@link CodecUtil#writeFooter CodecFooter} + * + * + * Notes: + * + * + * FileCount indicates how many files are contained in this compound file. The entry table + * that follows has that many entries. + * Each directory entry contains a long pointer to the start of this file's data section, the + * files length, and a String with that file's name. + * + */ +public final class Lucene50CompoundFormat extends CompoundFormat { + + /** Extension of compound file */ + static final String DATA_EXTENSION = "cfs"; + /** Extension of compound file entries */ + static final String ENTRIES_EXTENSION = "cfe"; + + static final String DATA_CODEC = "Lucene50CompoundData"; + static final String ENTRY_CODEC = "Lucene50CompoundEntries"; + static final int VERSION_START = 0; Review comment: I'd like to keep it for now, even if the version is always 0. My gut feeling is that we should fork file formats more aggressively than we do today but I still don't have full confidence that we will never use the internal versioning again.
[jira] [Created] (LUCENE-9723) Hunspell: update sanity tests that load all dictionaries
Peter Gromov created LUCENE-9723: Summary: Hunspell: update sanity tests that load all dictionaries Key: LUCENE-9723 URL: https://issues.apache.org/jira/browse/LUCENE-9723 Project: Lucene - Core Issue Type: Sub-task Reporter: Peter Gromov
[GitHub] [lucene-solr] donnerpeter opened a new pull request #2290: LUCENE-9723: Hunspell: update sanity tests that load all dictionaries
donnerpeter opened a new pull request #2290: URL: https://github.com/apache/lucene-solr/pull/2290 # Description `TestAllDictionaries`(2) are hard to run and their javadoc is outdated, as is the package's javadoc. # Solution Make it a single test that understands the dictionary directory format of at least two repositories, and point to them in the package javadoc. # Tests `TestAllDictionaries` is updated (but failing for now) # Checklist Please review the following and check all that apply: - [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability. - [x] I have created a Jira issue and added the issue ID to my pull request title. - [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [x] I have developed this patch against the `master` branch. - [x] I have run `./gradlew check`. - [x] I have added tests for my changes. - [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).
[GitHub] [lucene-solr] dweiss commented on pull request #2277: LUCENE-9716: Hunspell: support flag usage before its format is even specified
dweiss commented on pull request #2277: URL: https://github.com/apache/lucene-solr/pull/2277#issuecomment-771677497 Hi Peter. I pushed a commit which replaces the BOM consumption with a small function, so that either the BOM is read atomically or nothing is consumed at all. Looking at the code also made me wonder whether a sufficiently large leading buffer could be used to just parse the needed stuff from the input stream (bypassing the need to create a temp file)... That can be left for a later improvement though.
[GitHub] [lucene-solr] donnerpeter commented on pull request #2277: LUCENE-9716: Hunspell: support flag usage before its format is even specified
donnerpeter commented on pull request #2277: URL: https://github.com/apache/lucene-solr/pull/2277#issuecomment-771711060 Thank you, LGTM!
[jira] [Comment Edited] (SOLR-15119) Make LINK splitMethod the default for SplitShardCmd
[ https://issues.apache.org/jira/browse/SOLR-15119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277109#comment-17277109 ] Gézapeti edited comment on SOLR-15119 at 2/2/21, 4:01 PM: -- I've tried it out and the Link method works with HDFS as well, it's smart enough to fall back to copy the whole index over on HDFS. In any case, I'm fine with changing the default. was (Author: gezapeti): I've tried it out and the Link method works with HDFS as well, it's smart enough to fall back to copy the whole index over on HDFS. > Make LINK splitMethod the default for SplitShardCmd > --- > > Key: SOLR-15119 > URL: https://issues.apache.org/jira/browse/SOLR-15119 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: master (9.0) >Reporter: Megan Carey >Priority: Major > Labels: easy-fix > Time Spent: 1h 10m > Remaining Estimate: 0h > > REWRITE splitMethod is still the default in SplitShardCmd [1], despite LINK > being much faster. IndexSizeTrigger in branch_8x already uses LINK by default > [2], and we have found LINK to be reliable and performant at scale. This work > will just update the default in SplitShardCmd to make LINK the default > overall. > > > [1][https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java#L88] > > [2][https://github.com/apache/lucene-solr/blob/branch_8x/solr/core/src/java/org/apache/solr/cloud/autoscaling/IndexSizeTrigger.java#L186] > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dnhatn commented on pull request #2288: LUCENE-9722: Close merged readers on abort
dnhatn commented on pull request #2288: URL: https://github.com/apache/lucene-solr/pull/2288#issuecomment-771758863 Thanks Simon.
[GitHub] [lucene-solr] dnhatn merged pull request #2288: LUCENE-9722: Close merged readers on abort
dnhatn merged pull request #2288: URL: https://github.com/apache/lucene-solr/pull/2288
[jira] [Commented] (LUCENE-9722) Aborted merge can leak readers if the output is empty
[ https://issues.apache.org/jira/browse/LUCENE-9722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277256#comment-17277256 ] ASF subversion and git services commented on LUCENE-9722: - Commit 47e3d06ce00642624634e5d45ebc16fa33d48099 in lucene-solr's branch refs/heads/master from Nhat Nguyen [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=47e3d06 ] LUCENE-9722: Close merged readers on abort (#2288) We fail to close the merged readers of an aborted merge if its output segment contains no document. This bug was discovered by a test in Elasticsearch (elastic/elasticsearch#67884). > Aborted merge can leak readers if the output is empty > - > > Key: LUCENE-9722 > URL: https://issues.apache.org/jira/browse/LUCENE-9722 > Project: Lucene - Core > Issue Type: Bug > Components: core/index >Affects Versions: master (9.0), 8.7 >Reporter: Nhat Nguyen >Assignee: Nhat Nguyen >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > We fail to close the merged readers of an aborted merge if its output segment > contains no document. > This bug was discovered by a test in Elasticsearch > ([elastic/elasticsearch#67884|https://github.com/elastic/elasticsearch/issues/67884]). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-9724) Hunspell: load dictionaries with extra content on REP lines
Peter Gromov created LUCENE-9724: Summary: Hunspell: load dictionaries with extra content on REP lines Key: LUCENE-9724 URL: https://issues.apache.org/jira/browse/LUCENE-9724 Project: Lucene - Core Issue Type: Sub-task Reporter: Peter Gromov
[jira] [Updated] (LUCENE-9724) Hunspell: tolerate extra content on REP lines
[ https://issues.apache.org/jira/browse/LUCENE-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Gromov updated LUCENE-9724: - Summary: Hunspell: tolerate extra content on REP lines (was: Hunspell: load dictionaries with extra content on REP lines) > Hunspell: tolerate extra content on REP lines > - > > Key: LUCENE-9724 > URL: https://issues.apache.org/jira/browse/LUCENE-9724 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Peter Gromov >Priority: Major
[jira] [Updated] (SOLR-14886) Suppress stack trace in Query response.
[ https://issues.apache.org/jira/browse/SOLR-14886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabelle Giguere updated SOLR-14886: Attachment: SOLR-14886.patch > Suppress stack trace in Query response. > --- > > Key: SOLR-14886 > URL: https://issues.apache.org/jira/browse/SOLR-14886 > Project: Solr > Issue Type: Improvement >Affects Versions: 8.6.2 >Reporter: Vrinda Davda >Priority: Minor > Attachments: SOLR-14886.patch, SOLR-14886.patch > > > Currently there is no way to suppress the stack trace in solr response when > it throws an exception, like when a client sends a badly formed query string, > or exception with status 500 It sends full stack trace in the response. > I would propose a configuration for error messages so that the stack trace is > not visible to avoid any sensitive information in the stack trace. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14886) Suppress stack trace in Query response.
[ https://issues.apache.org/jira/browse/SOLR-14886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277272#comment-17277272 ] Isabelle Giguere commented on SOLR-14886: - Patch off current Solr master branch (9.x) - Add a property "hideStackTrace" to solr.xml - In NodeConfig, the default value is "false", for back-compatibility. - Use the new property in ResponseUtils, to print out, or not, the stack trace. - Adapt code that calls ResponseUtils - Add documentation in Ref Guide There's no direct path between solr.xml and ResponseUtils, or any class that uses ResponseUtils, so the "hideStackTrace" property is duplicated in CoreContainer, just so it lives in a place where it can be read. May not be the best approach. Note that the patch cannot fix the cases where the error message ()contains the full stack trace. > Suppress stack trace in Query response. > --- > > Key: SOLR-14886 > URL: https://issues.apache.org/jira/browse/SOLR-14886 > Project: Solr > Issue Type: Improvement >Affects Versions: 8.6.2 >Reporter: Vrinda Davda >Priority: Minor > Attachments: SOLR-14886.patch, SOLR-14886.patch > > > Currently there is no way to suppress the stack trace in solr response when > it throws an exception, like when a client sends a badly formed query string, > or exception with status 500 It sends full stack trace in the response. > I would propose a configuration for error messages so that the stack trace is > not visible to avoid any sensitive information in the stack trace. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] bruno-roustant commented on pull request #2213: LUCENE-9663: Adding compression to terms dict from SortedSet/Sorted DocValues
bruno-roustant commented on pull request #2213: URL: https://github.com/apache/lucene-solr/pull/2213#issuecomment-771782427 I expect you don't need to change Lucene code but just write a new custom codec (with a specific name) which provides a custom DocValuesFormat. It extends PerFieldDocValuesFormat and implements the method DocValuesFormat getDocValuesFormatForField(String field). This method provides either a standard Lucene80DocValuesFormat (no compression) or another new custom DocValuesFormat (with a specific name to write in the index) extending Lucene80DocValuesFormat with BEST_COMPRESSION mode. The choice can be made either based on a config (e.g. file) which lists all compressed DocValue based fields, or based on a naming convention. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9722) Aborted merge can leak readers if the output is empty
[ https://issues.apache.org/jira/browse/LUCENE-9722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277288#comment-17277288 ] ASF subversion and git services commented on LUCENE-9722: - Commit 4ade962679cf07bd4e706f1851bb740a4ad2916a in lucene-solr's branch refs/heads/branch_8x from Nhat Nguyen [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4ade962 ] LUCENE-9722: Close merged readers on abort (#2288) We fail to close the merged readers of an aborted merge if its output segment contains no document. This bug was discovered by a test in Elasticsearch (elastic/elasticsearch#67884). > Aborted merge can leak readers if the output is empty > - > > Key: LUCENE-9722 > URL: https://issues.apache.org/jira/browse/LUCENE-9722 > Project: Lucene - Core > Issue Type: Bug > Components: core/index >Affects Versions: master (9.0), 8.7 >Reporter: Nhat Nguyen >Assignee: Nhat Nguyen >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > We fail to close the merged readers of an aborted merge if its output segment > contains no document. > This bug was discovered by a test in Elasticsearch > ([elastic/elasticsearch#67884|https://github.com/elastic/elasticsearch/issues/67884]). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9722) Aborted merge can leak readers if the output is empty
[ https://issues.apache.org/jira/browse/LUCENE-9722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277339#comment-17277339 ] ASF subversion and git services commented on LUCENE-9722: - Commit 2e7cfbd8e60cf8ccb23619db8b20f193546fd1c8 in lucene-solr's branch refs/heads/branch_8_8 from Nhat Nguyen [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2e7cfbd ] LUCENE-9722: Close merged readers on abort (#2288) We fail to close the merged readers of an aborted merge if its output segment contains no document. This bug was discovered by a test in Elasticsearch (elastic/elasticsearch#67884). > Aborted merge can leak readers if the output is empty > - > > Key: LUCENE-9722 > URL: https://issues.apache.org/jira/browse/LUCENE-9722 > Project: Lucene - Core > Issue Type: Bug > Components: core/index >Affects Versions: master (9.0), 8.7 >Reporter: Nhat Nguyen >Assignee: Nhat Nguyen >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > We fail to close the merged readers of an aborted merge if its output segment > contains no document. > This bug was discovered by a test in Elasticsearch > ([elastic/elasticsearch#67884|https://github.com/elastic/elasticsearch/issues/67884]). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9724) Hunspell: tolerate existing aff/dic file typos
[ https://issues.apache.org/jira/browse/LUCENE-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Gromov updated LUCENE-9724: - Summary: Hunspell: tolerate existing aff/dic file typos (was: Hunspell: tolerate extra content on REP lines) > Hunspell: tolerate existing aff/dic file typos > -- > > Key: LUCENE-9724 > URL: https://issues.apache.org/jira/browse/LUCENE-9724 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Peter Gromov >Priority: Major
[GitHub] [lucene-solr] jtibshirani commented on pull request #2276: Improve backwards compatibility tests for sorted indexes.
jtibshirani commented on pull request #2276: URL: https://github.com/apache/lucene-solr/pull/2276#issuecomment-771834969 @mikemccand I tagged you for a (hopefully quick) review, as I think you added the TODOs?
[jira] [Updated] (LUCENE-9722) Aborted merge can leak readers if the output is empty
[ https://issues.apache.org/jira/browse/LUCENE-9722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nhat Nguyen updated LUCENE-9722: Fix Version/s: 8.9 master (9.0) 8.0.1 Resolution: Fixed Status: Resolved (was: Patch Available) > Aborted merge can leak readers if the output is empty > - > > Key: LUCENE-9722 > URL: https://issues.apache.org/jira/browse/LUCENE-9722 > Project: Lucene - Core > Issue Type: Bug > Components: core/index >Affects Versions: master (9.0), 8.7 >Reporter: Nhat Nguyen >Assignee: Nhat Nguyen >Priority: Major > Fix For: 8.0.1, master (9.0), 8.9 > > Time Spent: 0.5h > Remaining Estimate: 0h > > We fail to close the merged readers of an aborted merge if its output segment > contains no document. > This bug was discovered by a test in Elasticsearch > ([elastic/elasticsearch#67884|https://github.com/elastic/elasticsearch/issues/67884]). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] madrob opened a new pull request #2291: SOLR-15122 Replace volatile+sleep with wait/notify
madrob opened a new pull request #2291: URL: https://github.com/apache/lucene-solr/pull/2291
[jira] [Commented] (SOLR-15124) Remove node/container level admin handlers from ImplicitPlugins.json (core level).
[ https://issues.apache.org/jira/browse/SOLR-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277347#comment-17277347 ] Cassandra Targett commented on SOLR-15124: -- I'm not able to understand entirely from the description & attached PR if this is removing some requestHandlers from being implicit, or if it's an internal change that won't impact what users see/use? Either way I just wanted to mention that there is a page in the Ref Guide {{implicit-requesthandlers.adoc}} which may need to be updated depending on the scope. > Remove node/container level admin handlers from ImplicitPlugins.json (core > level). > -- > > Key: SOLR-15124 > URL: https://issues.apache.org/jira/browse/SOLR-15124 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) >Reporter: David Smiley >Priority: Blocker > Labels: newdev > Fix For: master (9.0) > > Time Spent: 50m > Remaining Estimate: 0h > > There are many very old administrative RequestHandlers registered in a > SolrCore that are actually JVM / node / CoreContainer level in nature. These > pre-dated CoreContainer level handlers. We should (1) remove them from > ImplicitPlugins.json, and (2) make simplifying tweaks to them so that they no > longer work at the core level. For example LoggingHandler has two constructors > and a non-final Watcher because it works in these two modalities. It need > only have the one that takes a CoreContainer, and Watcher will then be final. > /admin/threads > /admin/properties > /admin/logging > Should stay because has core-level stuff: > /admin/plugins > /admin/mbeans > This one: > /admin/system -- SystemInfoHandler > returns "core" level information, and also node level stuff. I propose > splitting this one to a CoreInfoHandler to split the logic. Maybe a separate > issue. 
[jira] [Commented] (SOLR-15122) ClusterEventProducerTest.testEvents is unstable
[ https://issues.apache.org/jira/browse/SOLR-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277353#comment-17277353 ] Mike Drob commented on SOLR-15122: -- I put up a PR to demonstrate what I meant, I think the same pattern should be applied to the DelegatingPlacementPlugin code, if this was based on that. I trust that the design here is correct for accomplishing what we need, but the implementation needed a few touch ups from what you did. * We shouldn't use postfix increment with a volatile variable, as that is not an atomic operation. * Using wait/notify is going to be more efficient use of resources than a busy wait. * You weren't saving the new value of version on subsequent calls, so I updated that too. Please take a look and confirm that this still maintains the intent of what you were trying to do. > ClusterEventProducerTest.testEvents is unstable > --- > > Key: SOLR-15122 > URL: https://issues.apache.org/jira/browse/SOLR-15122 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Tests >Reporter: Mike Drob >Assignee: Andrzej Bialecki >Priority: Major > Fix For: master (9.0) > > Time Spent: 10m > Remaining Estimate: 0h > > This test looks to be unstable according to Jenkins since about Nov 5. I just > started seeing occasional failures locally when running the whole suite but > cannot reproduce when running in isolation. > https://lists.apache.org/thread.html/rf0c16b257bc3236ea414be51451806352b55f15d4949f4fd54a3b71a%40%3Cbuilds.lucene.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
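The first bullet in the comment above (postfix increment on a volatile field is a read-modify-write sequence, not one atomic step) can be illustrated with a small sketch. The class below is hypothetical and is not taken from the PR; it simply runs several writer threads against a `volatile int` and an `AtomicInteger` side by side.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration; not code from the pull request. */
final class VersionCounters {
  // volatile guarantees visibility only: version++ is still "read, add one, write back".
  volatile int volatileVersion;
  // AtomicInteger performs the increment as a single indivisible operation.
  final AtomicInteger atomicVersion = new AtomicInteger();

  /** Starts {@code threads} writers, each incrementing both counters {@code perThread} times. */
  void hammer(int threads, int perThread) throws InterruptedException {
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(() -> {
        for (int j = 0; j < perThread; j++) {
          volatileVersion++;               // concurrent writers may overwrite each other here
          atomicVersion.incrementAndGet(); // never loses an update
        }
      });
      workers[i].start();
    }
    for (Thread worker : workers) {
      worker.join();
    }
  }
}
```

After `hammer(4, 10_000)` the atomic counter is always 40000, while the volatile counter can end up lower when increments interleave. A lost update is not guaranteed on any particular run, which is part of what makes this kind of bug hard to reproduce in isolation.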
[jira] [Commented] (SOLR-15122) ClusterEventProducerTest.testEvents is unstable
[ https://issues.apache.org/jira/browse/SOLR-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277360#comment-17277360 ] Ilan Ginzburg commented on SOLR-15122: -- ??We shouldn't use postfix increment with a volatile variable, as that is not an atomic operation?? I wouldn't make this a blanket statement. One reason to use a volatile if to conform to the Java Memory Model and not require synchronization for access from different threads. In some cases atomicity is not needed. Also, using an AtomicInteger and using it only from within a synchronized section seems a bit overkill. Any Integer would do (or even an integer if the mutex block is using another object). > ClusterEventProducerTest.testEvents is unstable > --- > > Key: SOLR-15122 > URL: https://issues.apache.org/jira/browse/SOLR-15122 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Tests >Reporter: Mike Drob >Assignee: Andrzej Bialecki >Priority: Major > Fix For: master (9.0) > > Time Spent: 10m > Remaining Estimate: 0h > > This test looks to be unstable according to Jenkins since about Nov 5. I just > started seeing occasional failures locally when running the whole suite but > cannot reproduce when running in isolation. > https://lists.apache.org/thread.html/rf0c16b257bc3236ea414be51451806352b55f15d4949f4fd54a3b71a%40%3Cbuilds.lucene.apache.org%3E
[jira] [Comment Edited] (SOLR-15122) ClusterEventProducerTest.testEvents is unstable
[ https://issues.apache.org/jira/browse/SOLR-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277360#comment-17277360 ] Ilan Ginzburg edited comment on SOLR-15122 at 2/2/21, 6:16 PM: --- ??We shouldn't use postfix increment with a volatile variable, as that is not an atomic operation?? I wouldn't make this a blanket statement. One reason to use a volatile is to conform to the Java Memory Model and not require synchronization for access from different threads. In some cases atomicity is not needed. Also, using an AtomicInteger and using it only from within a synchronized section seems a bit overkill. Any Integer would do (or even an integer if the mutex block is using another object). was (Author: murblanc): ??We shouldn't use postfix increment with a volatile variable, as that is not an atomic operation?? I wouldn't make this a blanket statement. One reason to use a volatile if to conform to the Java Memory Model and not require synchronization for access from different threads. In some cases atomicity is not needed. Also, using an AtomicInteger and using it only from within a synchronized section seems a bit overkill. Any Integer would do (or even an integer if the mutex block is using another object). > ClusterEventProducerTest.testEvents is unstable > --- > > Key: SOLR-15122 > URL: https://issues.apache.org/jira/browse/SOLR-15122 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Tests >Reporter: Mike Drob >Assignee: Andrzej Bialecki >Priority: Major > Fix For: master (9.0) > > Time Spent: 10m > Remaining Estimate: 0h > > This test looks to be unstable according to Jenkins since about Nov 5. I just > started seeing occasional failures locally when running the whole suite but > cannot reproduce when running in isolation. 
> https://lists.apache.org/thread.html/rf0c16b257bc3236ea414be51451806352b55f15d4949f4fd54a3b71a%40%3Cbuilds.lucene.apache.org%3E
[GitHub] [lucene-solr] sigram commented on a change in pull request #2291: SOLR-15122 Replace volatile+sleep with wait/notify
sigram commented on a change in pull request #2291: URL: https://github.com/apache/lucene-solr/pull/2291#discussion_r568832617 ## File path: solr/core/src/java/org/apache/solr/cluster/events/impl/DelegatingClusterEventProducer.java ## @@ -90,7 +95,10 @@ public void setDelegate(ClusterEventProducer newDelegate) { log.debug("--- delegate {} already in state {}", delegate, delegate.getState()); } } -this.version++; +synchronized (version) { Review comment: I don't think we need AtomicInteger if all sections that access `version` are synchronized?
[GitHub] [lucene-solr] sigram commented on a change in pull request #2291: SOLR-15122 Replace volatile+sleep with wait/notify
sigram commented on a change in pull request #2291: URL: https://github.com/apache/lucene-solr/pull/2291#discussion_r568833400 ## File path: solr/core/src/java/org/apache/solr/cluster/events/impl/DelegatingClusterEventProducer.java ## @@ -144,7 +152,25 @@ public synchronized void stop() { } @VisibleForTesting - public int getVersion() { -return version; + public int waitForVersionChange(int currentVersion, int timeoutSec) throws InterruptedException, TimeoutException { Review comment: I debated whether to add this to the wrappers... it's only needed in tests. OTOH putting it here makes the test code much simpler.
[GitHub] [lucene-solr] sigram commented on a change in pull request #2291: SOLR-15122 Replace volatile+sleep with wait/notify
sigram commented on a change in pull request #2291: URL: https://github.com/apache/lucene-solr/pull/2291#discussion_r568833741 ## File path: solr/core/src/test/org/apache/solr/cluster/events/ClusterEventProducerTest.java ## @@ -292,7 +287,7 @@ public void testListenerPlugins() throws Exception { .build(); V2Response rsp = req.process(cluster.getSolrClient()); assertEquals(0, rsp.getStatus()); -version = waitForVersionChange(-1, 10); +version = waitForVersionChange(version, 10); Review comment: Gah.. copy/paste error.
[jira] [Updated] (SOLR-15128) nodeName does not contain expected ':' separator: localhost
[ https://issues.apache.org/jira/browse/SOLR-15128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Potter updated SOLR-15128: -- Fix Version/s: master (9.0) > nodeName does not contain expected ':' separator: localhost > --- > > Key: SOLR-15128 > URL: https://issues.apache.org/jira/browse/SOLR-15128 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Environment: Only seems to affect master, 8.8 is not affected. >Reporter: Timothy Potter >Assignee: Timothy Potter >Priority: Major > Fix For: master (9.0) > > > {code} > "error":{"msg":"nodeName does not contain expected ':' separator: > localhost","trace":"java.lang.IllegalArgumentException: nodeName does not > contain expected ':' separator: localhost\n\tat > org.apache.solr.common.util.Utils.getBaseUrlForNodeName(Utils.java:764)\n\tat > org.apache.solr.common.util.Utils.getBaseUrlForNodeName(Utils.java:759)\n\tat > org.apache.solr.common.cloud.UrlScheme.getBaseUrlForNodeName(UrlScheme.java:54)\n\ta{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-15128) nodeName does not contain expected ':' separator: localhost
Timothy Potter created SOLR-15128: - Summary: nodeName does not contain expected ':' separator: localhost Key: SOLR-15128 URL: https://issues.apache.org/jira/browse/SOLR-15128 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Environment: Only seems to affect master, 8.8 is not affected. Reporter: Timothy Potter Assignee: Timothy Potter {code} "error":{"msg":"nodeName does not contain expected ':' separator: localhost","trace":"java.lang.IllegalArgumentException: nodeName does not contain expected ':' separator: localhost\n\tat org.apache.solr.common.util.Utils.getBaseUrlForNodeName(Utils.java:764)\n\tat org.apache.solr.common.util.Utils.getBaseUrlForNodeName(Utils.java:759)\n\tat org.apache.solr.common.cloud.UrlScheme.getBaseUrlForNodeName(UrlScheme.java:54)\n\ta{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-15128) nodeName does not contain expected ':' separator: localhost
[ https://issues.apache.org/jira/browse/SOLR-15128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277376#comment-17277376 ] Timothy Potter commented on SOLR-15128: --- Not sure why master is seeing {{localhost}} as the nodeName w/o port on it, but this is breaking the parsing in {{Utils.getBaseUrlForNodeName}} > nodeName does not contain expected ':' separator: localhost > --- > > Key: SOLR-15128 > URL: https://issues.apache.org/jira/browse/SOLR-15128 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Environment: Only seems to affect master, 8.8 is not affected. >Reporter: Timothy Potter >Assignee: Timothy Potter >Priority: Major > Fix For: master (9.0) > > > {code} > "error":{"msg":"nodeName does not contain expected ':' separator: > localhost","trace":"java.lang.IllegalArgumentException: nodeName does not > contain expected ':' separator: localhost\n\tat > org.apache.solr.common.util.Utils.getBaseUrlForNodeName(Utils.java:764)\n\tat > org.apache.solr.common.util.Utils.getBaseUrlForNodeName(Utils.java:759)\n\tat > org.apache.solr.common.cloud.UrlScheme.getBaseUrlForNodeName(UrlScheme.java:54)\n\ta{code}
[jira] [Resolved] (SOLR-15128) nodeName does not contain expected ':' separator: localhost
[ https://issues.apache.org/jira/browse/SOLR-15128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Potter resolved SOLR-15128. --- Resolution: Won't Fix Ignore this ;-) Was using the wrong method to get the nodeName in some code that only exists on master. {{zkcontroller.getNodeName}} is the correct way to get the nodeName for converting to a URL.
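For readers following the stack trace, here is a minimal sketch of the kind of check that produces this error. It is not the actual {{Utils.getBaseUrlForNodeName}} source: the class name, the method signature, the {{host:port_context}} layout used in the example, and the second error message are assumptions made for illustration only.

```java
// Hypothetical sketch, NOT Solr's implementation: turns a node name of the
// assumed form "host:port_context" into a base URL and rejects names that
// lack the ':' separator, as in the error reported above.
final class NodeNameSketch {

    static String baseUrlForNodeName(String nodeName, String urlScheme) {
        int colonAt = nodeName.indexOf(':');
        if (colonAt == -1) {
            // Same message text as in the reported error.
            throw new IllegalArgumentException(
                "nodeName does not contain expected ':' separator: " + nodeName);
        }
        int underscoreAt = nodeName.indexOf('_', colonAt);
        if (underscoreAt == -1) {
            // Assumed message, chosen for symmetry with the one above.
            throw new IllegalArgumentException(
                "nodeName does not contain expected '_' separator: " + nodeName);
        }
        String hostAndPort = nodeName.substring(0, underscoreAt);
        String context = nodeName.substring(underscoreAt + 1);
        return urlScheme + "://" + hostAndPort + (context.isEmpty() ? "" : "/" + context);
    }
}
```

With this sketch, a bare {{localhost}} fails in the same way as the report, while a name such as {{localhost:8983_solr}} converts cleanly.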
[jira] [Commented] (SOLR-15122) ClusterEventProducerTest.testEvents is unstable
[ https://issues.apache.org/jira/browse/SOLR-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277399#comment-17277399 ] Mike Drob commented on SOLR-15122: -- I'm pretty comfortable with that blanket statement. If you're using volatile, that means you expect multiple threads accessing, and if you have multiple threads writing then you shouldn't be using non-atomic postfix increment. If you can guarantee that you only have one writer, and the volatile is for the readers, then... maybe it's ok? It's still trappy and can lead to issues down the line. I used AtomicInteger because we don't have a mutable Integer and needed an object anyway for the sync block. I can rewrite that with an Object and an int if you prefer. > ClusterEventProducerTest.testEvents is unstable > --- > > Key: SOLR-15122 > URL: https://issues.apache.org/jira/browse/SOLR-15122 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Tests >Reporter: Mike Drob >Assignee: Andrzej Bialecki >Priority: Major > Fix For: master (9.0) > > Time Spent: 40m > Remaining Estimate: 0h > > This test looks to be unstable according to Jenkins since about Nov 5. I just > started seeing occasional failures locally when running the whole suite but > cannot reproduce when running in isolation. > https://lists.apache.org/thread.html/rf0c16b257bc3236ea414be51451806352b55f15d4949f4fd54a3b71a%40%3Cbuilds.lucene.apache.org%3E
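The concern about postfix increment on a volatile field can be shown with a small standalone sketch. The class and field names below are hypothetical and are not taken from the Solr code; the point is only that {{volatileCounter++}} is a read-modify-write sequence, so concurrent writers may lose updates, while {{AtomicInteger.incrementAndGet()}} does not.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo, not Solr code: several writer threads bump two counters.
final class IncrementDemo {
    static volatile int volatileCounter = 0;
    static final AtomicInteger atomicCounter = new AtomicInteger();

    // Returns {volatile count, atomic count} after all writers have finished.
    static int[] run(int threads, int perThread) throws InterruptedException {
        volatileCounter = 0;
        atomicCounter.set(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    volatileCounter++;               // read, add, write: not atomic
                    atomicCounter.incrementAndGet(); // single atomic update
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return new int[] {volatileCounter, atomicCounter.get()};
    }
}
```

The atomic count always equals {{threads * perThread}}. The volatile count can never exceed that and, with more than one writer, may come out lower; how often that happens depends on the machine, which is what makes this kind of bug hard to reproduce in isolation.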
[GitHub] [lucene-solr] madrob commented on a change in pull request #2291: SOLR-15122 Replace volatile+sleep with wait/notify
madrob commented on a change in pull request #2291: URL: https://github.com/apache/lucene-solr/pull/2291#discussion_r568852727

## File path: solr/core/src/java/org/apache/solr/cluster/events/impl/DelegatingClusterEventProducer.java ##

@@ -144,7 +152,25 @@ public synchronized void stop() {
   }

   @VisibleForTesting
-  public int getVersion() {
-    return version;
+  public int waitForVersionChange(int currentVersion, int timeoutSec) throws InterruptedException, TimeoutException {

Review comment: It should really go in a third place I think, because then we can reuse the versioning logic between this and DelegatingPlacementPluginFactory/IntegrationTest

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (SOLR-15122) ClusterEventProducerTest.testEvents is unstable
[ https://issues.apache.org/jira/browse/SOLR-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277406#comment-17277406 ] Ilan Ginzburg commented on SOLR-15122: -- You're right about no mutable Integer. Would there be a way for the test code to pass a synchronization object (a latch? that would be null for non test code) that prod code would use so that we don't end up with large methods in production classes that are only used for tests?
[jira] [Commented] (SOLR-15122) ClusterEventProducerTest.testEvents is unstable
[ https://issues.apache.org/jira/browse/SOLR-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277423#comment-17277423 ] Mike Drob commented on SOLR-15122: -- Refactored the code in anticipation of working over DelegatingPlacementPluginFactory as well, and moved the wait logic to test sources with an int and an Object instead of AtomicInteger. Please take a look before I consolidate the other implementation.
[GitHub] [lucene-solr] orenovadia commented on a change in pull request #2231: LUCENE-9680 - Re-add IndexWriter::getFieldNames
orenovadia commented on a change in pull request #2231: URL: https://github.com/apache/lucene-solr/pull/2231#discussion_r568890937

## File path: lucene/core/src/test/org/apache/lucene/index/TestIndexWriter.java ##

@@ -4600,4 +4600,49 @@ public void testIndexWriterBlocksOnStall() throws IOException, InterruptedExcept
       }
     }
   }
+
+  public void testGetFieldNames() throws IOException {
+    Directory dir = newDirectory();
+
+    IndexWriter writer = new IndexWriter(dir, newIndexWriterConfig(new MockAnalyzer(random())));
+
+    assertEquals(Set.of(), writer.getFieldNames());
+
+    addDocWithField(writer, "f1");
+    assertEquals(Set.of("f1"), writer.getFieldNames());
+
+    // should be unmodifiable:
+    final Set fieldSet = writer.getFieldNames();
+    assertThrows(UnsupportedOperationException.class, () -> fieldSet.add("cannot modify"));
+    assertThrows(UnsupportedOperationException.class, () -> fieldSet.remove("f1"));
+
+    addDocWithField(writer, "f2");
+    assertEquals(Set.of("f1", "f2"), writer.getFieldNames());

Review comment: Sounds good! Added in: 1bc95ae7f7e
[GitHub] [lucene-solr] dweiss merged pull request #2277: LUCENE-9716: Hunspell: support flag usage before its format is even specified
dweiss merged pull request #2277: URL: https://github.com/apache/lucene-solr/pull/2277
[jira] [Resolved] (LUCENE-9716) Hunspell: support flag usage before its format is even specified
[ https://issues.apache.org/jira/browse/LUCENE-9716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss resolved LUCENE-9716. - Fix Version/s: master (9.0) Resolution: Fixed > Hunspell: support flag usage before its format is even specified > > > Key: LUCENE-9716 > URL: https://issues.apache.org/jira/browse/LUCENE-9716 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Peter Gromov >Priority: Major > Fix For: master (9.0) > > Time Spent: 1h 10m > Remaining Estimate: 0h > > e.g. `nl` dictionaries first have `KEEPCASE Kc` and only then `FLAG long`
[jira] [Commented] (LUCENE-9716) Hunspell: support flag usage before its format is even specified
[ https://issues.apache.org/jira/browse/LUCENE-9716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277437#comment-17277437 ] ASF subversion and git services commented on LUCENE-9716: - Commit 8f75933f3dae9f334e7d302bbfdc05d2b2e3c979 in lucene-solr's branch refs/heads/master from Peter Gromov [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8f75933 ] LUCENE-9716: Hunspell: support flag usage before its format is even specified (#2277)
[GitHub] [lucene-solr] murblanc commented on a change in pull request #2291: SOLR-15122 Replace volatile+sleep with wait/notify
murblanc commented on a change in pull request #2291: URL: https://github.com/apache/lucene-solr/pull/2291#discussion_r568912229 ## File path: solr/core/src/java/org/apache/solr/cluster/events/impl/DelegatingClusterEventProducer.java ## @@ -90,7 +96,9 @@ public void setDelegate(ClusterEventProducer newDelegate) { log.debug("--- delegate {} already in state {}", delegate, delegate.getState()); } } -this.version++; +if (versionTracker != null) { Review comment: We have a synchronization issue (memory barrier type, not concurrent access type). The thread calling `setDelegate()` is accessing `versionTracker` set by another thread without synchronization. Can be fixed by making `versionTracker` volatile. ## File path: solr/core/src/test/org/apache/solr/cluster/VersionTrackerImpl.java ## @@ -0,0 +1,57 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.solr.cluster; + +import org.apache.solr.common.util.TimeSource; +import org.apache.solr.util.TimeOut; + +import java.util.concurrent.TimeUnit; +import java.util.concurrent.TimeoutException; + +public class VersionTrackerImpl implements VersionTracker { +private int version = 0; + +@Override +public synchronized void increment() { +version++; +this.notifyAll(); +} + +@Override +public int waitForVersionChange(int currentVersion, int timeoutSec) throws InterruptedException, TimeoutException { +TimeOut timeout = new TimeOut(timeoutSec, TimeUnit.SECONDS, TimeSource.NANO_TIME); +int newVersion = currentVersion; +while (! timeout.hasTimedOut()) { +synchronized (this) { +if ((newVersion = version) != currentVersion) { +break; +} +this.wait(timeout.timeLeft(TimeUnit.MILLISECONDS)); +} +} +if (newVersion < currentVersion) { +// ArithmeticException? This means we overflowed +throw new RuntimeException("Invalid version - went back! currentVersion=" + currentVersion + +" newVersion=" + newVersion); +} else if (newVersion == currentVersion) { +throw new TimeoutException("Timed out waiting for version change."); Review comment: Add the version value to the exception, might help debug tests. ## File path: solr/core/src/java/org/apache/solr/cluster/VersionTracker.java ## @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.solr.cluster; + +import java.util.concurrent.TimeoutException; + +/** + * Allows for tracking state change from test classes. Typical use will be to set a version tracker on a stateful + * object, which will call {@link #increment()} every time state changes. Test clients observing the state will call + * {@link #waitForVersionChange(int, int)} to be notified of the next increment call. + */ +public interface VersionTracker { Review comment: Tracking versions by incrementing is one possible implementation of this interface, but maybe the interface doesn't have to hint that this is the implementation? Renaming `increment` into `notifyEvent` or something similar and `VersionTracker` into `NotificationCallback` would make it more generic (not suggesting these actual names, but you get the idea). `waitForVersionChange` doesn't have to b
[jira] [Commented] (SOLR-15122) ClusterEventProducerTest.testEvents is unstable
[ https://issues.apache.org/jira/browse/SOLR-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277448#comment-17277448 ] Ilan Ginzburg commented on SOLR-15122: -- Added a few comments. I like this approach much better. Thanks.
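One of the review comments on PR #2291 points out that {{setDelegate()}} reads {{versionTracker}}, which is set by another thread, without synchronization, and suggests making the field volatile. A minimal sketch of that suggestion follows. The class below is hypothetical and heavily simplified; only the names {{versionTracker}}, {{setDelegate}} and {{increment}} come from the thread, and the setter is an assumed shape.

```java
// Hypothetical, simplified sketch of the "make versionTracker volatile" suggestion.
final class DelegatingProducerSketch {

    // Mirrors the increment() method named in the review; nested here only
    // to keep the example self-contained.
    interface VersionTracker {
        void increment();
    }

    // volatile gives the thread calling setDelegate() visibility of a tracker
    // installed earlier by a different (test) thread.
    private volatile VersionTracker versionTracker;

    void setVersionTracker(VersionTracker tracker) {
        this.versionTracker = tracker;
    }

    void setDelegate(Object newDelegate) {
        // ... swapping the delegate is omitted in this sketch ...
        VersionTracker tracker = versionTracker; // read the volatile field once
        if (tracker != null) {                   // null in non-test code
            tracker.increment();
        }
    }
}
```

Copying the field into a local before the null check is a small design choice: it avoids a second volatile read and rules out the field being cleared between the check and the call.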
[jira] [Updated] (SOLR-15092) Loosen Ref Guide link checking to allow empty anchors in links
[ https://issues.apache.org/jira/browse/SOLR-15092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris M. Hostetter updated SOLR-15092: -- Attachment: SOLR-15092.patch Assignee: Chris M. Hostetter Status: Open (was: Open)

The attached patch takes care of relaxing this rule, while still ensuring that if an anchor is used, it must exist in the linked file. Once this patch is applied, the following perl command can be run to "clean up" any no-longer-needed anchors that point at the id on the body of each page...

{noformat}
perl -i -ple 's/<<(.*?)\.adoc#\1,/<<$1.adoc#,/g' src/*.adoc
{noformat}

...although in at least one place some manual cleanup needs to be done, because otherwise asciidoctor gets confused by this line in {{language-analysis.adoc}} ...

{noformat}
... Blank lines and lines that begin with "#" are ignored. See <> for more information.
{noformat}

...and thinks the {{# ... #}} bit is supposed to be "highlighted" using html5 {{<mark>}} tags...

{noformat}
... Blank lines and lines that begin with "" are ignored. See Resource Loading for more information.
{noformat}

...so we'll have to either escape or backtick-quote the first {{#}} character in the line.

(I didn't include the modifications made by the perl command in the patch, because we'll want to run that command on each branch given the other content changes between master & branch_8x)

> Loosen Ref Guide link checking to allow empty anchors in links > -- > > Key: SOLR-15092 > URL: https://issues.apache.org/jira/browse/SOLR-15092 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cassandra Targett >Assignee: Chris M. Hostetter >Priority: Major > Attachments: SOLR-15092.patch > > > Back when we were publishing the PDF, we needed to be sure to include an > explicit section title as an anchor for all inter-document links (such as > {{<>}}) because when the entire guide > was assembled into a single file the explicit anchor ensured links went to > the right spot in the overall Guide. > Without the PDF, if we want to just link to another page in its entirety and > not a sub-section of a page, we can use a shorter syntax with an empty > anchor: {{<>}}. I can't find this explicitly > documented, but it does construct a correct link (i.e., {{<a href="page-title.html#">Page Title</a>}}). > However, our link checking will fail this structure because it still assumes > we must have a section name in the anchor and won't allow blank anchors. This > issue is to loosen that check a bit and update the Ref Guide how-to docs to > show it as a possible option.
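To make the perl one-liner above easier to follow, here is the same substitution as a small Java sketch. The class and method names are hypothetical; the regular expression is the one from the comment. The backreference {{\1}} only matches when the anchor repeats the page name, so links that point at a real sub-section are left alone.

```java
import java.util.regex.Pattern;

// Hypothetical helper that mirrors the perl substitution
//   s/<<(.*?)\.adoc#\1,/<<$1.adoc#,/g
final class AnchorCleanup {

    // Group 1 is the page name; \1 requires the anchor to be identical to it.
    private static final Pattern SELF_ANCHOR = Pattern.compile("<<(.*?)\\.adoc#\\1,");

    static String clean(String line) {
        // $1 re-inserts the page name; the redundant anchor is dropped.
        return SELF_ANCHOR.matcher(line).replaceAll("<<$1.adoc#,");
    }
}
```

For example, a line containing {{<<page-title.adoc#page-title,Page Title>>}} would be rewritten to use {{<<page-title.adoc#,Page Title>>}}, whereas a link whose anchor names some other section is returned unchanged. The sample page name is taken from the issue description; any real file would need the manual check described above for lines containing a literal {{#}}.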
[GitHub] [lucene-solr] madrob commented on a change in pull request #2291: SOLR-15122 Replace volatile+sleep with wait/notify
madrob commented on a change in pull request #2291: URL: https://github.com/apache/lucene-solr/pull/2291#discussion_r568943853

## File path: solr/core/src/test/org/apache/solr/cluster/VersionTrackerImpl.java ##

@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.cluster;
+
+import org.apache.solr.common.util.TimeSource;
+import org.apache.solr.util.TimeOut;
+
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+public class VersionTrackerImpl implements VersionTracker {
+  private int version = 0;
+
+  @Override
+  public synchronized void increment() {
+    version++;
+    this.notifyAll();
+  }
+
+  @Override
+  public int waitForVersionChange(int currentVersion, int timeoutSec) throws InterruptedException, TimeoutException {
+    TimeOut timeout = new TimeOut(timeoutSec, TimeUnit.SECONDS, TimeSource.NANO_TIME);
+    int newVersion = currentVersion;
+    while (! timeout.hasTimedOut()) {

Review comment: It's hard to do profiling on this since it generally only loops once on my machine, but I'll switch it to the loop inside of the block. It's part of the condition that we are implicitly checking with the wait, so it makes sense this way too.
[GitHub] [lucene-solr] madrob commented on a change in pull request #2291: SOLR-15122 Replace volatile+sleep with wait/notify
madrob commented on a change in pull request #2291: URL: https://github.com/apache/lucene-solr/pull/2291#discussion_r568944250

## File path: solr/core/src/test/org/apache/solr/cluster/events/ClusterEventProducerTest.java ##

@@ -102,7 +105,7 @@ public void teardown() throws Exception {

   @Test
   public void testEvents() throws Exception {
-    int version = waitForVersionChange(-1, 10);
+    int version = versionTracker.waitForVersionChange(-1, 10);

Review comment: -1 is the last value we've "seen", I'll add some docs around this. Effectively this is a getVersion in this implementation.
[GitHub] [lucene-solr] murblanc commented on a change in pull request #2291: SOLR-15122 Replace volatile+sleep with wait/notify
murblanc commented on a change in pull request #2291: URL: https://github.com/apache/lucene-solr/pull/2291#discussion_r568968626 ## File path: solr/core/src/test/org/apache/solr/cluster/CountingStateChangeListener.java ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.solr.cluster; + +import org.apache.solr.common.util.TimeSource; +import org.apache.solr.util.TimeOut; + +import java.util.concurrent.TimeUnit; +import java.util.concurrent.TimeoutException; + +/** + * A counting StateChangeListener that will internally track how many times {@link #stateChanged()} has been called. + * Consumers can compare the number of state change calls before and after an event to determine if they should proceed, + * made simple with {@link #waitForVersionChange(int, int)} method. + */ +public class CountingStateChangeListener implements StateChangeListener { +private int version = 0; + +@Override +public synchronized void stateChanged() { +version++; +this.notifyAll(); +} + +/** + * Given a last known number of state changes, wait for additional changes to come in. If no state changes have + * occurred beyond the known value, this method will wait for additional changes to come in. 
+ * If the current number of change events is unknown to the caller, then this method can be called with -1 + * to return immediately with the number of events up to this point. + * @param lastVersion the previous number of changes seen + * @param timeoutSec how long to wait for additional changes to occur + * @return the number of changes seen since initialization + */ +public int waitForVersionChange(int lastVersion, int timeoutSec) throws InterruptedException, TimeoutException { +TimeOut timeout = new TimeOut(timeoutSec, TimeUnit.SECONDS, TimeSource.NANO_TIME); +int newVersion = lastVersion; +synchronized (this) { +while (!timeout.hasTimedOut() && (newVersion = version) != lastVersion) { Review comment: I don't get the condition here. Shouldn't we loop while `version == lastVersion` (so we exit the loop when it changes or when we time out) rather than loop while they're different? I suspect that the test without this improved lockstep synchronization was passing always on your machine and it continues to pass for the same reason. ## File path: solr/core/src/test/org/apache/solr/cluster/CountingStateChangeListener.java ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.solr.cluster; + +import org.apache.solr.common.util.TimeSource; +import org.apache.solr.util.TimeOut; + +import java.util.concurrent.TimeUnit; +import java.util.concurrent.TimeoutException; + +/** + * A counting StateChangeListener that will internally track how many times {@link #stateChanged()} has been called. + * Consumers can compare the number of state change calls before and after an event to determine if they should proceed, + * made simple with {@link #waitForVersionChange(int, int)} method. + */ +public class CountingStateChangeListener implements StateChangeListener { +private int version = 0; + +@Override +public synchronized void stateChanged() { +version++; +this.notifyAll(); +} + +/** + * Given a last known number of state changes, wait for additional changes to
[GitHub] [lucene-solr] msokolov merged pull request #2282: LUCENE-9615: Expose HnswGraphBuilder index-time hyperparameters as FieldType attributes
msokolov merged pull request #2282: URL: https://github.com/apache/lucene-solr/pull/2282
[jira] [Commented] (LUCENE-9615) Expose HnswGraphBuilder index-time hyperparameters
[ https://issues.apache.org/jira/browse/LUCENE-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277502#comment-17277502 ] ASF subversion and git services commented on LUCENE-9615: - Commit a53e8e722884e5655206292590da67bb71efc34d in lucene-solr's branch refs/heads/master from sbeniwal12 [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a53e8e7 ] LUCENE-9615: Expose HnswGraphBuilder index-time hyperparameters as FieldType attributes (from Shubham Beniwal)) > Expose HnswGraphBuilder index-time hyperparameters > -- > > Key: LUCENE-9615 > URL: https://issues.apache.org/jira/browse/LUCENE-9615 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Michael Sokolov >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > HnswGraphBuilder has a few tunables: maxConnections, beamWidth, and we may > add a few more, such as whether to use a diversity heuristic when choosing > neighbors to link in the graph. Currently these are locked to defaults set by > global variables. Instead we should provide some interface for configuring > them. The best candidate so far seems to be to add them either as attributes > on a FieldType, or as Codec level configurations.
[GitHub] [lucene-solr] madrob commented on a change in pull request #2291: SOLR-15122 Replace volatile+sleep with wait/notify
madrob commented on a change in pull request #2291:
URL: https://github.com/apache/lucene-solr/pull/2291#discussion_r568977953

## File path: solr/core/src/test/org/apache/solr/cluster/CountingStateChangeListener.java ##
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.cluster;
+
+import org.apache.solr.common.util.TimeSource;
+import org.apache.solr.util.TimeOut;
+
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+/**
+ * A counting StateChangeListener that will internally track how many times {@link #stateChanged()} has been called.
+ * Consumers can compare the number of state change calls before and after an event to determine if they should proceed,
+ * made simple with {@link #waitForVersionChange(int, int)} method.
+ */
+public class CountingStateChangeListener implements StateChangeListener {
+  private int version = 0;
+
+  @Override
+  public synchronized void stateChanged() {
+    version++;
+    this.notifyAll();
+  }
+
+  /**
+   * Given a last known number of state changes, wait for additional changes to come in. If no state changes have
+   * occurred beyond the known value, this method will wait for additional changes to come in.
+   * If the current number of change events is unknown to the caller, then this method can be called with -1
+   * to return immediately with the number of events up to this point.
+   * @param lastVersion the previous number of changes seen
+   * @param timeoutSec how long to wait for additional changes to occur
+   * @return the number of changes seen since initialization
+   */
+  public int waitForVersionChange(int lastVersion, int timeoutSec) throws InterruptedException, TimeoutException {
+    TimeOut timeout = new TimeOut(timeoutSec, TimeUnit.SECONDS, TimeSource.NANO_TIME);
+    int newVersion = lastVersion;
+    synchronized (this) {
+      while (!timeout.hasTimedOut() && (newVersion = version) != lastVersion) {

Review comment:
It's a copy-paste error

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
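For readers following the wait/notify discussion in this review: the loop condition under review appears to be inverted, since the javadoc describes waiting while the counter is still equal to the last known value. Below is a minimal, self-contained sketch of that pattern. It is not the PR's code; the class name CountingListenerSketch is hypothetical, and a System.nanoTime() deadline stands in for Solr's TimeOut helper so the example compiles without Solr on the classpath.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for the listener in the diff; an illustration only.
class CountingListenerSketch {
  private int version = 0;

  // Called by the event producer whenever state changes.
  public synchronized void stateChanged() {
    version++;
    notifyAll(); // wake every thread blocked in waitForVersionChange
  }

  // Blocks until the counter differs from lastVersion, or throws once the timeout elapses.
  public synchronized int waitForVersionChange(int lastVersion, long timeoutSec)
      throws InterruptedException, TimeoutException {
    long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(timeoutSec);
    while (version == lastVersion) { // keep waiting while nothing has changed
      long remainingNanos = deadline - System.nanoTime();
      if (remainingNanos <= 0) {
        throw new TimeoutException("no state change within " + timeoutSec + "s");
      }
      // Releases the monitor while waiting; the loop re-checks the condition
      // afterwards, which also guards against spurious wakeups.
      TimeUnit.NANOSECONDS.timedWait(this, remainingNanos);
    }
    return version;
  }
}
```

Passing -1 as lastVersion returns immediately with the current count, because the counter starts at 0 and is never negative, which matches the behaviour the javadoc describes.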
[GitHub] [lucene-solr] msokolov commented on a change in pull request #2231: LUCENE-9680 - Re-add IndexWriter::getFieldNames
msokolov commented on a change in pull request #2231:
URL: https://github.com/apache/lucene-solr/pull/2231#discussion_r568979069

## File path: lucene/core/src/test/org/apache/lucene/index/TestIndexWriter.java ##
@@ -4600,4 +4600,49 @@ public void testIndexWriterBlocksOnStall() throws IOException, InterruptedExcept
       }
     }
   }
+
+  public void testGetFieldNames() throws IOException {
+    Directory dir = newDirectory();
+
+    IndexWriter writer = new IndexWriter(dir, newIndexWriterConfig(new MockAnalyzer(random())));
+
+    assertEquals(Set.of(), writer.getFieldNames());
+
+    addDocWithField(writer, "f1");
+    assertEquals(Set.of("f1"), writer.getFieldNames());
+
+    // should be unmodifiable:
+    final Set fieldSet = writer.getFieldNames();
+    assertThrows(UnsupportedOperationException.class, () -> fieldSet.add("cannot modify"));
+    assertThrows(UnsupportedOperationException.class, () -> fieldSet.remove("f1"));
+
+    addDocWithField(writer, "f2");
+    assertEquals(Set.of("f1", "f2"), writer.getFieldNames());

Review comment:
thanks, @orenovadia !
[GitHub] [lucene-solr] msokolov merged pull request #2231: LUCENE-9680 - Re-add IndexWriter::getFieldNames
msokolov merged pull request #2231: URL: https://github.com/apache/lucene-solr/pull/2231
[jira] [Commented] (LUCENE-9680) Re-add IndexWriter.getFieldNames
[ https://issues.apache.org/jira/browse/LUCENE-9680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277508#comment-17277508 ] ASF subversion and git services commented on LUCENE-9680: - Commit 8d0cbcbb53139413a3fdbb364764e811145b2ccf in lucene-solr's branch refs/heads/master from orenovadia [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8d0cbcb ] LUCENE-9680 - Re-add IndexWriter::getFieldNames > Re-add IndexWriter.getFieldNames > > > Key: LUCENE-9680 > URL: https://issues.apache.org/jira/browse/LUCENE-9680 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Oren Ovadia >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > IndexWriter.getFieldNames was deprecated in LUCENE-8909. > It is useful to have this information exposed by IW to cap (or report) when > too many fields have been created. > getFieldNames was introduced in LUCENE-7659. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9680) Re-add IndexWriter.getFieldNames
[ https://issues.apache.org/jira/browse/LUCENE-9680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277510#comment-17277510 ] Michael Sokolov commented on LUCENE-9680: - [~oren.ovadia] do you also want to backport to branch_8x? It doesn't seem urgent, but if it would be useful to you to have this in the next 8.x release, you might want to do so. > Re-add IndexWriter.getFieldNames > > > Key: LUCENE-9680 > URL: https://issues.apache.org/jira/browse/LUCENE-9680 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Oren Ovadia >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > IndexWriter.getFieldNames was deprecated in LUCENE-8909. > It is useful to have this information exposed by IW to cap (or report) when > too many fields have been created. > getFieldNames was introduced in LUCENE-7659. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-15122) ClusterEventProducerTest.testEvents is unstable
[ https://issues.apache.org/jira/browse/SOLR-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277519#comment-17277519 ] Mike Drob commented on SOLR-15122: -- I was thinking about this some more, and think we should use a Phaser instead of rolling our own concurrency implementation. Thoughts? > ClusterEventProducerTest.testEvents is unstable > --- > > Key: SOLR-15122 > URL: https://issues.apache.org/jira/browse/SOLR-15122 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Tests >Reporter: Mike Drob >Assignee: Andrzej Bialecki >Priority: Major > Fix For: master (9.0) > > Time Spent: 1h 40m > Remaining Estimate: 0h > > This test looks to be unstable according to Jenkins since about Nov 5. I just > started seeing occasional failures locally when running the whole suite but > cannot reproduce when running in isolation. > https://lists.apache.org/thread.html/rf0c16b257bc3236ea414be51451806352b55f15d4949f4fd54a3b71a%40%3Cbuilds.lucene.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
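As a rough illustration of the Phaser idea raised in the comment above: a java.util.concurrent.Phaser with a single registered party advances its phase on every arrive(), so the phase number can serve as the change counter. This is only a sketch under that assumption. The class and method names are hypothetical, it does not implement Solr's StateChangeListener, and it is not taken from the PR.

```java
import java.util.concurrent.Phaser;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: the Phaser's phase number doubles as the change counter.
class PhaserListenerSketch {
  // One registered party, so each arrive() completes the phase and advances it.
  private final Phaser phaser = new Phaser(1);

  // Called by the event producer whenever state changes.
  public void stateChanged() {
    phaser.arrive();
  }

  // Current number of changes seen (the phase starts at 0).
  public int currentVersion() {
    return phaser.getPhase();
  }

  // Returns at once if the phase already differs from lastVersion; otherwise
  // waits for the next advance and throws TimeoutException if none arrives in time.
  public int waitForVersionChange(int lastVersion, long timeoutSec)
      throws InterruptedException, TimeoutException {
    return phaser.awaitAdvanceInterruptibly(lastVersion, timeoutSec, TimeUnit.SECONDS);
  }
}
```

One difference from the wait/notify version: awaitAdvanceInterruptibly returns a negative argument unchanged, so a caller that does not know the current count would read currentVersion() rather than pass -1.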
[jira] [Commented] (SOLR-5480) Make MoreLikeThisHandler distributable
[ https://issues.apache.org/jira/browse/SOLR-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277522#comment-17277522 ] Isabelle Giguere commented on SOLR-5480: [~erickerickson], [~noble.paul], [~anshum], [~hossman] Before we deprecate the MLT Handler, can we please have some sort of valid solution for passing in text to the MLT QParser ? To support uses cases where the id of the initial document is not known. https://issues.apache.org/jira/browse/SOLR-7913?focusedCommentId=17267477&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17267477 The main purpose of SOLR-7913 is to pass plain text to MLT QParser. It concentrates on stream.body, because, at one time, it looked like the best way to do so. But if text could be passed to MLT QParser in any other way, there would be no reason to insist on using stream.body. > Make MoreLikeThisHandler distributable > -- > > Key: SOLR-5480 > URL: https://issues.apache.org/jira/browse/SOLR-5480 > Project: Solr > Issue Type: Improvement > Components: MoreLikeThis >Reporter: Steve Molloy >Assignee: Noble Paul >Priority: Major > Attachments: MoreLikeThisHandlerTestST.txt, SOLR-5480.patch, > SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, > SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, > SOLR-5480.patch, SOLR-5480.patch > > > The MoreLikeThis component, when used in the standard search handler supports > distributed searches. But the MoreLikeThisHandler itself doesn't, which > prevents from say, passing in text to perform the query. I'll start looking > into adapting the SearchHandler logic to the MoreLikeThisHandler. If anyone > has some work done already and want to share, or want to contribute, any help > will be welcomed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-15122) ClusterEventProducerTest.testEvents is unstable
[ https://issues.apache.org/jira/browse/SOLR-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277529#comment-17277529 ] Ilan Ginzburg commented on SOLR-15122: -- I've never used this specific class but if you make it so that it's hidden behind the {{StateChangeListener}} interface (i.e. it is a test writer implementation choice) then I'm perfectly fine with it. I'd be much more hesitant to expose a specific concurrency class in the interface though. > ClusterEventProducerTest.testEvents is unstable > --- > > Key: SOLR-15122 > URL: https://issues.apache.org/jira/browse/SOLR-15122 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Tests >Reporter: Mike Drob >Assignee: Andrzej Bialecki >Priority: Major > Fix For: master (9.0) > > Time Spent: 1h 40m > Remaining Estimate: 0h > > This test looks to be unstable according to Jenkins since about Nov 5. I just > started seeing occasional failures locally when running the whole suite but > cannot reproduce when running in isolation. > https://lists.apache.org/thread.html/rf0c16b257bc3236ea414be51451806352b55f15d4949f4fd54a3b71a%40%3Cbuilds.lucene.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-15127) All-In-One Dockerfile for building local images as well as reproducible release builds directly from (remote) git tags
[ https://issues.apache.org/jira/browse/SOLR-15127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277530#comment-17277530 ] Houston Putman commented on SOLR-15127: --- I think there are two possible ways of going forward with something that the official Docker image people might be ok with. # Using the git tags, as with your example. Doing the gradle build in the multi-stage build. # [~dsmiley]'s suggestion of using the Solr TGZ release as the docker context itself. ** In order to have the Solr TGZ become the docker context, we would merely need to add the Dockerfile and solr/docker/scripts to the release. I'll put up a PR that would use the Solr TGZ as the docker context, allowing us to use docker build directly with the released artifacts. That way we can compare pros/cons of each approach. Besides this bigger question. There are some things I really like in your patch: * Trying to remove the SOLR_VERSION argument (Big improvement, as there would be no required ARGs) ** I think we can actually add the version as a file inside the release, and then read it into an env var as a part of RUN. Then we can sym-link from /opt/solr to /opt/solr-$version, to keep backwards compatibility. * Consolidating the last two RUN layers I am split on the jattach thing. It will be great when it can be moved to the {{apt-get install}} section. Until then, I don't mind if it's fetched in the actual image or the builder image. Did you move it to the builder so that the final image wouldn't need the GITHUB_URL arg? > All-In-One Dockerfile for building local images as well as reproducible > release builds directly from (remote) git tags > -- > > Key: SOLR-15127 > URL: https://issues.apache.org/jira/browse/SOLR-15127 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chris M. 
Hostetter >Priority: Major > Attachments: SOLR-15127.patch > > > There was a recent dev@lucene discussion about the future of the > github/docker-solr repo and (Apache) "official" solr docker images and using > the "apache/solr" naming vs (docker-library official) "_/solr" names... > http://mail-archives.apache.org/mod_mbox/lucene-dev/202101.mbox/%3CCAD4GwrNCPEnAJAjy4tY%3DpMeX5vWvnFyLe9ZDaXmF4J8XchA98Q%40mail.gmail.com%3E > In that discussion, mak pointed out that docker-library evidently allows for > some more flexibility in the way "official" docker-library packages can be > built (compared to the rules that were evidently in place when mak set up > the current docker-solr image building process/tooling), pointing out how the > "docker official" elasticsearch images are currently built from the "elastic > official" elasticsearch images... > http://mail-archives.apache.org/mod_mbox/lucene-dev/202101.mbox/%3C3CED9683-1DD2-4F08-97F9-4FC549EDE47D%40greenhills.co.uk%3E > Based on this, I proposed that we could probably restructure the Solr > Dockerfile so that it could be useful for both "local development" -- using > the current repo checkout -- as well as for "apache official" apache/solr > images that could be reproducibly built directly from pristine git tags using > the remote git URL syntax supported by "docker build" (and then -- evidently > -- extended by trivial one line Dockerfiles for the "docker-library official" > _/solr images)... > http://mail-archives.apache.org/mod_mbox/lucene-dev/202101.mbox/%3Calpine.DEB.2.21.2101221423340.16298%40slate%3E > This jira tracks this idea. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-15129) Use the Solr TGZ artifact as Docker context
Houston Putman created SOLR-15129: - Summary: Use the Solr TGZ artifact as Docker context Key: SOLR-15129 URL: https://issues.apache.org/jira/browse/SOLR-15129 Project: Solr Issue Type: Sub-task Security Level: Public (Default Security Level. Issues are Public) Affects Versions: master (9.0) Reporter: Houston Putman As discussed in SOLR-15127, there is a need for a unified Dockerfile that allows for release and local builds. This ticket is an attempt to achieve this by using the Solr distribution TGZ as the docker context to build from. Therefore release images would be completely reproducible by running: {{docker build -f solr-9.0.0/Dockerfile https://www.apache.org/dyn/closer.lua/lucene/solr/9.0.0/solr-9.0.0.tgz}} The changes to the Solr distribution would include adding a Dockerfile at {{solr-/Dockerfile}}, adding the docker scripts under {{solr-/docker}}, and adding a version file at {{solr-/VERSION.txt}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-9725) Allow BM25FQuery to use other similarities
Julie Tibshirani created LUCENE-9725:

Summary: Allow BM25FQuery to use other similarities
Key: LUCENE-9725
URL: https://issues.apache.org/jira/browse/LUCENE-9725
Project: Lucene - Core
Issue Type: Improvement
Reporter: Julie Tibshirani

From a high level, BM25FQuery works as follows:

1. Given a list of fields and weights, it pretends there's a synthetic combined field where all terms have been indexed. It computes new term and collection statistics for this combined field.
2. It uses a disjunction iterator and BM25Similarity to score the documents.

The steps are (1) compute statistics that represent the combined field content, and (2) pass these to a similarity function. There is nothing really specific to BM25Similarity in this approach. In step 2, we could use another similarity, for example BooleanSimilarity or those based on language models like LMDirichletSimilarity. The main restriction is that norms have to be additive (the norm of the combined field must be the sum of the field norms).

Maybe we could unhardcode BM25Similarity in BM25FQuery and instead use the one configured on `IndexSearcher`. We could think of this as providing a sensible default approach to cross-field scoring for many similarities. It's an incremental step towards LUCENE-8711, which would give similarities more fine-grained control over how stats/scores are combined across fields.
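The two steps described in this issue can be sketched in plain Java, without any Lucene classes. Everything below is an assumption made for illustration: the SimpleSimilarity interface, the BM25-like and boolean-like scorers, and the weighted-sum combination are hypothetical stand-ins and not Lucene's API. The sketch only shows why step 2 does not need to be tied to BM25 once step 1 has produced combined, additive statistics.

```java
import java.util.Map;

// Hypothetical illustration of "combine statistics, then score with any similarity".
class CombinedFieldSketch {

  // Assumed scoring hook: maps combined term frequency and combined length to a score.
  interface SimpleSimilarity {
    double score(double freq, double length, double avgLength);
  }

  // BM25-style term-frequency saturation (IDF omitted to keep the sketch short).
  static final SimpleSimilarity BM25_LIKE = (freq, length, avgLength) -> {
    double k1 = 1.2;
    double b = 0.75;
    return freq * (k1 + 1) / (freq + k1 * (1 - b + b * length / avgLength));
  };

  // Boolean-style scoring: a match scores 1 regardless of frequency or length.
  static final SimpleSimilarity BOOLEAN_LIKE =
      (freq, length, avgLength) -> freq > 0 ? 1.0 : 0.0;

  // Step 1: fold per-field values into one value for the synthetic combined field.
  // The weighted sum keeps the combined length additive across fields.
  static double weightedSum(Map<String, Double> weights, Map<String, Integer> perField) {
    double sum = 0;
    for (Map.Entry<String, Double> e : weights.entrySet()) {
      sum += e.getValue() * perField.getOrDefault(e.getKey(), 0);
    }
    return sum;
  }

  // Step 2: hand the combined statistics to whichever similarity is configured.
  static double score(Map<String, Double> weights,
                      Map<String, Integer> termFreqs,
                      Map<String, Integer> fieldLengths,
                      double avgCombinedLength,
                      SimpleSimilarity similarity) {
    double freq = weightedSum(weights, termFreqs);
    double length = weightedSum(weights, fieldLengths);
    return similarity.score(freq, length, avgCombinedLength);
  }
}
```

Swapping BM25_LIKE for BOOLEAN_LIKE changes only the last argument of score, which mirrors the proposal to take the similarity from the searcher configuration instead of hardcoding it.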
[GitHub] [lucene-solr] HoustonPutman opened a new pull request #2292: SOLR-15102: Use Solr distribution TGZ as docker context
HoustonPutman opened a new pull request #2292:
URL: https://github.com/apache/lucene-solr/pull/2292

https://issues.apache.org/jira/browse/SOLR-15102

This should work, but there is still cleanup needed with the gradle changes. Also we might want to infer the Solr version another way.

Backwards incompatibility that needs to be added back in: `/opt/docker-solr`

Changes in the image:
- `/opt/docker-solr` -> `/opt/solr/docker`
- `/opt/solr` is no longer a sym link, `/opt/solr-` is.
[jira] [Comment Edited] (SOLR-15127) All-In-One Dockerfile for building local images as well as reproducible release builds directly from (remote) git tags
[ https://issues.apache.org/jira/browse/SOLR-15127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277530#comment-17277530 ] Houston Putman edited comment on SOLR-15127 at 2/2/21, 11:49 PM: - I think there are two possible ways of going forward with something that the official Docker image people might be ok with. # Using the git tags, as with your example. Doing the gradle build in the multi-stage build. # [~dsmiley]'s suggestion of using the Solr TGZ release as the docker context itself. ** In order to have the Solr TGZ become the docker context, we would merely need to add the Dockerfile and solr/docker/scripts to the release. I'll put up a PR that would use the Solr TGZ as the docker context, allowing us to use docker build directly with the released artifacts. That way we can compare pros/cons of each approach. (Can be found at SOLR-15129) Besides this bigger question. There are some things I really like in your patch: * Trying to remove the SOLR_VERSION argument (Big improvement, as there would be no required ARGs) ** I think we can actually add the version as a file inside the release, and then read it into an env var as a part of RUN. Then we can sym-link from /opt/solr to /opt/solr-$version, to keep backwards compatibility. * Consolidating the last two RUN layers I am split on the jattach thing. It will be great when it can be moved to the {{apt-get install}} section. Until then, I don't mind if it's fetched in the actual image or the builder image. Did you move it to the builder so that the final image wouldn't need the GITHUB_URL arg? was (Author: houston): I think there are two possible ways of going forward with something that the official Docker image people might be ok with. # Using the git tags, as with your example. Doing the gradle build in the multi-stage build. # [~dsmiley]'s suggestion of using the Solr TGZ release as the docker context itself. 
** In order to have the Solr TGZ become the docker context, we would merely need to add the Dockerfile and solr/docker/scripts to the release. I'll put up a PR that would use the Solr TGZ as the docker context, allowing us to use docker build directly with the released artifacts. That way we can compare pros/cons of each approach. Besides this bigger question. There are some things I really like in your patch: * Trying to remove the SOLR_VERSION argument (Big improvement, as there would be no required ARGs) ** I think we can actually add the version as a file inside the release, and then read it into an env var as a part of RUN. Then we can sym-link from /opt/solr to /opt/solr-$version, to keep backwards compatibility. * Consolidating the last two RUN layers I am split on the jattach thing. It will be great when it can be moved to the {{apt-get install}} section. Until then, I don't mind if it's fetched in the actual image or the builder image. Did you move it to the builder so that the final image wouldn't need the GITHUB_URL arg? > All-In-One Dockerfile for building local images as well as reproducible > release builds directly from (remote) git tags > -- > > Key: SOLR-15127 > URL: https://issues.apache.org/jira/browse/SOLR-15127 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Priority: Major > Attachments: SOLR-15127.patch > > > There was a recent dev@lucene discussion about the future of the > github/docker-solr repo and (Apache) "official" solr docker images and using > the "apache/solr" nameing vs (docker-library official) "_/solr" names... 
> http://mail-archives.apache.org/mod_mbox/lucene-dev/202101.mbox/%3CCAD4GwrNCPEnAJAjy4tY%3DpMeX5vWvnFyLe9ZDaXmF4J8XchA98Q%40mail.gmail.com%3E > In that discussion, mak pointed out that docker-library evidently allows for > some more flexibility in the way "official" docker-library packages can be > built (compared to the rules that were evidently in place when mak set up > the current docker-solr image building process/tooling), pointing out how the > "docker official" elasticsearch images are currently built from the "elastic > official" elasticsearch images... > http://mail-archives.apache.org/mod_mbox/lucene-dev/202101.mbox/%3C3CED9683-1DD2-4F08-97F9-4FC549EDE47D%40greenhills.co.uk%3E > Based on this, I proposed that we could probably restructure the Solr > Dockerfile so that it could be useful for both "local development" -- using > the current repo checkout -- as well as for "apache official" apache/solr > images that could be reproducibly built directly from pristine git tags using > the remote git URL syntax supported by "docker build"
[jira] [Updated] (LUCENE-9725) Allow BM25FQuery to use other similarities
[ https://issues.apache.org/jira/browse/LUCENE-9725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julie Tibshirani updated LUCENE-9725: - Description: >From a high level, BM25FQuery works as follows: # Given a list of fields and weights, it pretends there's a synthetic combined field where all terms have been indexed. It computes new term and collection statistics for this combined field. # It uses a disjunction iterator and BM25Similarity to score the documents. The steps are (1) compute statistics that represent the combined field content, and (2) pass these to a similarity function. There is nothing really specific to BM25Similarity in this approach. In step 2, we could use another similarity, for example BooleanSimilarity or those based on language models like LMDirichletSimilarity. The main restriction is that norms have to be additive (the norm of the combined field must be the sum of the field norms). Maybe we could unhardcode BM25Similarity in BM25FQuery and instead use the one configured on IndexSearcher. We could think of this as providing a sensible default approach to cross-field scoring for many similarities. It's an incremental step towards LUCENE-8711, which would give similarities more fine-grained control over how stats/ scores are combined across fields. was: >From a high level, BM25FQuery works as follows: 1. Given a list of fields and weights, it pretends there's a synthetic combined field where all terms have been indexed. It computes new term and collection statistics for this combined field. 2. It uses a disjunction iterator and BM25Similarity to score the documents. The steps are (1) compute statistics that represent the combined field content, and (2) pass these to a similarity function. There is nothing really specific to BM25Similarity in this approach. In step 2, we could use another similarity, for example BooleanSimilarity or those based on language models like LMDirichletSimilarity. 
The main restriction is that norms have to be additive (the norm of the combined field must be the sum of the field norms). Maybe we could unhardcode BM25Similarity in BM25FQuery and instead use the one configured on `IndexSearcher`. We could think of this as providing a sensible default approach to cross-field scoring for many similarities. It's an incremental step towards LUCENE-8711, which would give similarities more fine-grained control over how stats/ scores are combined across fields. > Allow BM25FQuery to use other similarities > -- > > Key: LUCENE-9725 > URL: https://issues.apache.org/jira/browse/LUCENE-9725 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Julie Tibshirani >Priority: Major > > From a high level, BM25FQuery works as follows: > # Given a list of fields and weights, it pretends there's a synthetic > combined field where all terms have been indexed. It computes new term and > collection statistics for this combined field. > # It uses a disjunction iterator and BM25Similarity to score the documents. > The steps are (1) compute statistics that represent the combined field > content, and (2) pass these to a similarity function. There is nothing really > specific to BM25Similarity in this approach. In step 2, we could use another > similarity, for example BooleanSimilarity or those based on language models > like LMDirichletSimilarity. The main restriction is that norms have to be > additive (the norm of the combined field must be the sum of the field norms). > Maybe we could unhardcode BM25Similarity in BM25FQuery and instead use the > one configured on IndexSearcher. We could think of this as providing a > sensible default approach to cross-field scoring for many similarities. It's > an incremental step towards LUCENE-8711, which would give similarities more > fine-grained control over how stats/ scores are combined across fields. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-15127) All-In-One Dockerfile for building local images as well as reproducible release builds directly from (remote) git tags
[ https://issues.apache.org/jira/browse/SOLR-15127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277542#comment-17277542 ] Chris M. Hostetter commented on SOLR-15127: --- bq. David Smiley's suggestion of using the Solr TGZ release as the docker context itself. That's pretty close to what i do in one code path of this patch -- except that (as you mentioned) the Dockerfile and solr/docker/scripts aren't in the solr.TGZ so i left in the existing build.gradle logic to provide those. I do however think there is a lot of value in supporting the "build from a remote git url" approach as well, since it let's people build from arbitrary branches w/o a local java env. I also think that from a _transparency_ standpoint with the official builds, it would be better to build them from _source_ ... either the official git tag, or perhaps using the solr-src.tgz release instead of the (compiled) solr.tgz? The broader question I have though at this point is how people feel about this style of "all in one" Dockerfile that uses 'sh' conditional logic in the RUN to support 2 diff ways to building: * "docker stage runs gradle to create solr.tgz; then creates & lays out image" ** makes it easy to use git repo or solr-src.tgz as build context for transparency and portable building of docker images w/o java dev env * "gradle builds solr.tgz; then invokes docker to create & layout image" ** makes it eas(ier) for people to iteratively develop/patch solr in their java env & then build docker images from that It really feels like the best of both worlds to me. bq. I think we can actually add the version as a file inside the release, and then read it into an env var as a part of RUN. Then we can sym-link from /opt/solr to /opt/solr-$version, to keep backwards compatibility. I wasn't sure if there was a *reason* to keep the symlink approach, but yeah, it would be easy to add back if needed. 
I don't really have any strong feelings on where it happens -- just trying to take advantage of the fact that we can be multi-stage. bq. Did you move it to the builder so that the final image wouldn't need the GITHUB_URL arg? My goal was just to move everything that *could* be in the builder stage into the builder stage, to "fail fast" and try to keep the final image as small as possible. > All-In-One Dockerfile for building local images as well as reproducible > release builds directly from (remote) git tags > -- > > Key: SOLR-15127 > URL: https://issues.apache.org/jira/browse/SOLR-15127 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Priority: Major > Attachments: SOLR-15127.patch > > > There was a recent dev@lucene discussion about the future of the > github/docker-solr repo and (Apache) "official" solr docker images and using > the "apache/solr" nameing vs (docker-library official) "_/solr" names... > http://mail-archives.apache.org/mod_mbox/lucene-dev/202101.mbox/%3CCAD4GwrNCPEnAJAjy4tY%3DpMeX5vWvnFyLe9ZDaXmF4J8XchA98Q%40mail.gmail.com%3E > In that disussion, mak pointed out that docker-library evidently allows for > some more flexibility in the way "official" docker-library packages can be > built (compared to the rules that were evidnlty in place when the mak setup > the current docker-solr image building process/tooling), pointing out how the > "docker official" elasticsearch images are current built from the "elastic > official" elasticsearch images... 
> http://mail-archives.apache.org/mod_mbox/lucene-dev/202101.mbox/%3C3CED9683-1DD2-4F08-97F9-4FC549EDE47D%40greenhills.co.uk%3E > Based on this, I proposed that we could probably restructure the Solr > Dockerfile so that it could be useful for both "local development" -- using > the current repo checkout -- as well as for "apache official" apache/solr > images that could be reproducibly built directly from pristine git tags using > the remote git URL syntax supported by "docker build" (and then -- evidently > -- extended by trivial one line Dockerfiles for the "docker-library official" > _/solr images)... > http://mail-archives.apache.org/mod_mbox/lucene-dev/202101.mbox/%3Calpine.DEB.2.21.2101221423340.16298%40slate%3E > This jira tracks this idea. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jtibshirani opened a new pull request #2293: LUCENE-9725: Allow BM25FQuery to use other similarities.
jtibshirani opened a new pull request #2293: URL: https://github.com/apache/lucene-solr/pull/2293 From a high level, BM25FQuery (1) computes statistics that represent the combined field content and (2) passes these to a score function. This model makes sense for many similarities besides BM25. This PR unhardcodes BM25Similarity in BM25FQuery and instead uses the one configured on IndexSearcher. It also renames BM25FQuery since it's no longer specific to BM25.
[jira] [Commented] (LUCENE-9725) Allow BM25FQuery to use other similarities
[ https://issues.apache.org/jira/browse/LUCENE-9725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277545#comment-17277545 ]

Julie Tibshirani commented on LUCENE-9725:
------------------------------------------

I opened https://github.com/apache/lucene-solr/pull/2293 to show the idea.

> Allow BM25FQuery to use other similarities
> ------------------------------------------
>
> Key: LUCENE-9725
> URL: https://issues.apache.org/jira/browse/LUCENE-9725
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Julie Tibshirani
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> From a high level, BM25FQuery works as follows:
> # Given a list of fields and weights, it pretends there's a synthetic combined field where all terms have been indexed. It computes new term and collection statistics for this combined field.
> # It uses a disjunction iterator and BM25Similarity to score the documents.
> The steps are (1) compute statistics that represent the combined field content, and (2) pass these to a similarity function. There is nothing really specific to BM25Similarity in this approach. In step 2, we could use another similarity, for example BooleanSimilarity or those based on language models like LMDirichletSimilarity. The main restriction is that norms have to be additive (the norm of the combined field must be the sum of the field norms).
> Maybe we could unhardcode BM25Similarity in BM25FQuery and instead use the one configured on IndexSearcher. We could think of this as providing a sensible default approach to cross-field scoring for many similarities. It's an incremental step towards LUCENE-8711, which would give similarities more fine-grained control over how stats/scores are combined across fields.
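The two steps described in the issue can be illustrated with a minimal, Lucene-free sketch. Everything here is hypothetical and is not the BM25FQuery implementation or any Lucene API: `CombinedFieldSketch`, `FieldStats`, `CombinedStats` and `ScoreFunction` are made-up names, field length stands in for the norm, and weighting each field's length by its boost is an assumption of the sketch. It only shows that once the combined statistics exist, the scoring function in step 2 is interchangeable.

```java
import java.util.List;

/** Hypothetical sketch of the two scoring steps; none of these names exist in Lucene. */
final class CombinedFieldSketch {

    /** One field's contribution for a single term in a single document. */
    static final class FieldStats {
        final float weight;     // per-field boost
        final long termFreq;    // occurrences of the term in this field
        final long fieldLength; // stand-in for the field's norm

        FieldStats(float weight, long termFreq, long fieldLength) {
            this.weight = weight;
            this.termFreq = termFreq;
            this.fieldLength = fieldLength;
        }
    }

    /** Statistics of the synthetic combined field. */
    static final class CombinedStats {
        final double termFreq;
        final double length;

        CombinedStats(double termFreq, double length) {
            this.termFreq = termFreq;
            this.length = length;
        }
    }

    /** Step 2 is pluggable: any function of the combined statistics can produce a score. */
    interface ScoreFunction {
        double score(CombinedStats stats, double avgLength, double idf);
    }

    /** Step 1: weighted sums. Assumes lengths ("norms") are additive across fields. */
    static CombinedStats combine(List<FieldStats> fields) {
        double termFreq = 0;
        double length = 0;
        for (FieldStats f : fields) {
            termFreq += f.weight * f.termFreq;
            length += f.weight * f.fieldLength;
        }
        return new CombinedStats(termFreq, length);
    }

    /** BM25-style saturation, using the commonly cited defaults k1 = 1.2 and b = 0.75. */
    static final ScoreFunction BM25_LIKE = (s, avgLength, idf) -> {
        double k1 = 1.2;
        double b = 0.75;
        double norm = k1 * (1 - b + b * s.length / avgLength);
        return idf * s.termFreq / (s.termFreq + norm);
    };

    /** Boolean-style scoring: 1 when the term occurs in any field, otherwise 0. */
    static final ScoreFunction BOOLEAN_LIKE = (s, avgLength, idf) -> s.termFreq > 0 ? 1.0 : 0.0;

    public static void main(String[] args) {
        CombinedStats stats = combine(List.of(
            new FieldStats(2f, 1, 4),     // e.g. a short, boosted title field
            new FieldStats(1f, 3, 100))); // e.g. a longer body field
        System.out.println("BM25-like:    " + BM25_LIKE.score(stats, 108.0, 1.0));
        System.out.println("boolean-like: " + BOOLEAN_LIKE.score(stats, 108.0, 1.0));
    }
}
```

Swapping `BM25_LIKE` for `BOOLEAN_LIKE` leaves `combine` untouched, which is the point the issue makes about step 2. A scoring function whose length normalization is not additive across fields would not fit this shape.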
[GitHub] [lucene-solr] zhaih commented on pull request #2213: LUCENE-9663: Adding compression to terms dict from SortedSet/Sorted DocValues
zhaih commented on pull request #2213:
URL: https://github.com/apache/lucene-solr/pull/2213#issuecomment-771265310

I see, so I think for now I could test it via a customized PerFieldDocValuesFormat. I'll give the PerFieldDocValuesFormat route a try then.

Though IMO I would prefer a simpler configuration (as proposed by @jaisonbi) over customizing via PerFieldDocValuesFormat in the future, if these two compressions show different performance characteristics. If my understanding is correct, to enable only TermDictCompression using PerFieldDocValuesFormat we would need to enumerate all SSDV field names in that class? That does not sound very maintainable if fields are regularly added or deleted.

Please correct me if I'm wrong, as I'm not very familiar with the codec part...