[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2267: LUCENE-9707: Hunspell: check Lucene's implementation against Hunspell's test data

2021-02-02 Thread GitBox


donnerpeter commented on a change in pull request #2267:
URL: https://github.com/apache/lucene-solr/pull/2267#discussion_r568394955



##
File path: 
lucene/analysis/common/src/test/org/apache/lucene/analysis/hunspell/SpellCheckerTest.java
##
@@ -61,59 +61,74 @@ public void needAffixOnAffixes() throws Exception {
     doTest("needaffix5");
   }
 
+  @Test
   public void testBreak() throws Exception {
     doTest("break");
   }
 
-  public void testBreakDefault() throws Exception {
+  @Test
+  public void breakDefault() throws Exception {
     doTest("breakdefault");
   }
 
-  public void testBreakOff() throws Exception {
+  @Test
+  public void breakOff() throws Exception {
     doTest("breakoff");
   }
 
-  public void testCompoundrule() throws Exception {
+  @Test
+  public void compoundrule() throws Exception {
     doTest("compoundrule");
   }
 
-  public void testCompoundrule2() throws Exception {
+  @Test
+  public void compoundrule2() throws Exception {
     doTest("compoundrule2");
   }
 
-  public void testCompoundrule3() throws Exception {
+  @Test
+  public void compoundrule3() throws Exception {
     doTest("compoundrule3");
   }
 
-  public void testCompoundrule4() throws Exception {
+  @Test
+  public void compoundrule4() throws Exception {
     doTest("compoundrule4");
   }
 
-  public void testCompoundrule5() throws Exception {
+  @Test
+  public void compoundrule5() throws Exception {
     doTest("compoundrule5");
   }
 
-  public void testCompoundrule6() throws Exception {
+  @Test
+  public void compoundrule6() throws Exception {
     doTest("compoundrule6");
   }
 
-  public void testCompoundrule7() throws Exception {
+  @Test
+  public void compoundrule7() throws Exception {
     doTest("compoundrule7");
   }
 
-  public void testCompoundrule8() throws Exception {
+  @Test
+  public void compoundrule8() throws Exception {
     doTest("compoundrule8");
   }
 
-  public void testGermanCompounding() throws Exception {
+  @Test
+  public void germanCompounding() throws Exception {
     doTest("germancompounding");
   }
 
   protected void doTest(String name) throws Exception {
-    InputStream affixStream =
-        Objects.requireNonNull(getClass().getResourceAsStream(name + ".aff"), name);
-    InputStream dictStream =
-        Objects.requireNonNull(getClass().getResourceAsStream(name + ".dic"), name);
+    checkSpellCheckerExpectations(

Review comment:
   Thanks for looking into this and for your patch! I've no idea why you 
can't push; I've got the checkbox enabled on this PR:
   
   
![image](https://user-images.githubusercontent.com/122009/106570058-afadb100-6535-11eb-83d4-bd16102f31b5.png)
   
   That's what 
https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/allowing-changes-to-a-pull-request-branch-created-from-a-fork
 seems to recommend. I also checked the repository settings and couldn't find 
an option about committers. I can invite you personally though :)
   
   By renaming do you mean `TestSpellChecker`? I'll do it, thanks for the 
suggestion, but preferably a bit later, when there won't be so much merging 
around this class :)
   
   BTW, what do you think about renaming `SpellChecker` to `Hunspell`?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dweiss commented on a change in pull request #2258: LUCENE-9686: Fix read past EOF handling in DirectIODirectory

2021-02-02 Thread GitBox


dweiss commented on a change in pull request #2258:
URL: https://github.com/apache/lucene-solr/pull/2258#discussion_r568395337



##
File path: 
lucene/misc/src/java/org/apache/lucene/misc/store/DirectIODirectory.java
##
@@ -381,17 +377,18 @@ public long length() {
     @Override
     public byte readByte() throws IOException {
       if (!buffer.hasRemaining()) {
-        refill();
+        refill(1);
       }
+
       return buffer.get();
     }
 
-    private void refill() throws IOException {
+    private void refill(int byteToRead) throws IOException {
       filePos += buffer.capacity();
 
       // BaseDirectoryTestCase#testSeekPastEOF test for consecutive read past EOF,
       // hence throwing EOFException early to maintain buffer state (position in particular)
-      if (filePos > channel.size()) {
+      if (filePos > channel.size() || (channel.size() - filePos < byteToRead)) {

Review comment:
   Ok. Thanks for explaining!








[GitHub] [lucene-solr] dweiss merged pull request #2258: LUCENE-9686: Fix read past EOF handling in DirectIODirectory

2021-02-02 Thread GitBox


dweiss merged pull request #2258:
URL: https://github.com/apache/lucene-solr/pull/2258


   






[jira] [Resolved] (LUCENE-9686) TestDirectIODirectory#testFloatsUnderflow can fail assertion

2021-02-02 Thread Dawid Weiss (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss resolved LUCENE-9686.
-
Fix Version/s: master (9.0)
   Resolution: Fixed

> TestDirectIODirectory#testFloatsUnderflow can fail assertion
> 
>
> Key: LUCENE-9686
> URL: https://issues.apache.org/jira/browse/LUCENE-9686
> Project: Lucene - Core
>  Issue Type: Test
>Reporter: Julie Tibshirani
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Reproduction line:
> {code}
> ./gradlew test --tests TestDirectIODirectory.testFloatsUnderflow 
> -Dtests.seed=73B56EAB13269C91 -Dtests.slow=true -Dtests.badapples=true 
> -Dtests.locale=haw-US -Dtests.timezone=America/Inuvik -Dtests.asserts=true 
> -Dtests.file.encoding=UTF-8
> {code}
> I didn't have the chance to look deeply, but it seems like the wrong 
> exception type is being thrown:
> {code:java}
>  junit.framework.AssertionFailedError: Unexpected exception type, expected 
> EOFException but got java.nio.BufferUnderflowException
> at 
> __randomizedtesting.SeedInfo.seed([73B56EAB13269C91:1FD75ACA1CD83E9C]:0)
> at 
> org.apache.lucene.util.LuceneTestCase.expectThrows(LuceneTestCase.java:2895)
> at 
> org.apache.lucene.util.LuceneTestCase.expectThrows(LuceneTestCase.java:2876)
> at 
> org.apache.lucene.store.BaseDirectoryTestCase.testFloatsUnderflow(BaseDirectoryTestCase.java:291)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Commented] (LUCENE-9686) TestDirectIODirectory#testFloatsUnderflow can fail assertion

2021-02-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276925#comment-17276925
 ] 

ASF subversion and git services commented on LUCENE-9686:
-

Commit 3835cb4e95ce6ba93ab5e3d5caa35001c90db30a in lucene-solr's branch 
refs/heads/master from zacharymorn
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=3835cb4 ]

LUCENE-9686: Fix read past EOF handling in DirectIODirectory (#2258)






[jira] [Commented] (LUCENE-9686) TestDirectIODirectory#testFloatsUnderflow can fail assertion

2021-02-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276927#comment-17276927
 ] 

ASF subversion and git services commented on LUCENE-9686:
-

Commit 2da7a4a86d3620add49f3372a12d90c8b9aee0fd in lucene-solr's branch 
refs/heads/master from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2da7a4a ]

LUCENE-9686: Add changes entry.





[GitHub] [lucene-solr] dweiss commented on a change in pull request #2267: LUCENE-9707: Hunspell: check Lucene's implementation against Hunspell's test data

2021-02-02 Thread GitBox


dweiss commented on a change in pull request #2267:
URL: https://github.com/apache/lucene-solr/pull/2267#discussion_r568397883



##
File path: 
lucene/analysis/common/src/test/org/apache/lucene/analysis/hunspell/SpellCheckerTest.java
##

Review comment:
   Mhmm... let me try again then. It's weird - tried last night and got 
permission denied. Could be that I pulled your changes via https and not ssh... 
Sorry, it was late.








[GitHub] [lucene-solr] dweiss commented on a change in pull request #2267: LUCENE-9707: Hunspell: check Lucene's implementation against Hunspell's test data

2021-02-02 Thread GitBox


dweiss commented on a change in pull request #2267:
URL: https://github.com/apache/lucene-solr/pull/2267#discussion_r568398245



##
File path: 
lucene/analysis/common/src/test/org/apache/lucene/analysis/hunspell/SpellCheckerTest.java
##

Review comment:
   Renaming SpellChecker to Hunspell - yes, I think it's a good idea. 
Renaming tests later - absolutely.








[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2267: LUCENE-9707: Hunspell: check Lucene's implementation against Hunspell's test data

2021-02-02 Thread GitBox


donnerpeter commented on a change in pull request #2267:
URL: https://github.com/apache/lucene-solr/pull/2267#discussion_r568402121



##
File path: 
lucene/analysis/common/src/test/org/apache/lucene/analysis/hunspell/TestsFromOriginalHunspellRepository.java
##
@@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.analysis.hunspell;
+
+import java.io.IOException;
+import java.nio.file.DirectoryStream;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.text.ParseException;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.Set;
+import java.util.TreeSet;
+import java.util.stream.Collectors;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+/**
+ * Same as {@link SpellCheckerTest}, but checks all Hunspell's test data. The path to the checked
+ * out Hunspell repository should be in {@code -Dhunspell.repo.path=...} system property.
+ */
+@RunWith(Parameterized.class)

Review comment:
   Thanks! Now I'm starting to doubt whether this approach makes sense at 
all. I could avoid parameterization by generating one explicit test method per 
file, at the risk of missing newly added files (which additional code could 
check for).
   
   And is it OK to modify the test policy for such local tests? I planned to 
add more not-easy-to-have-in-CI tests, which would measure performance and 
check correctness. They'd need external files with dictionaries, corpora for 
various languages (external or is there anything internal already?), and a 
test-only Hunspell JNI library for comparison (which needs a native binary and 
a couple of other jars, all of them need sha and license files, and it all gets 
quite verbose). Do you think the benefits of having this in the repo outweigh 
the costs? I could also leave this all locally, since I seem to be the only one 
needing these tests in the near future.








[jira] [Commented] (SOLR-15104) Restart solr will override the gc log

2021-02-02 Thread Hongxu Ma (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276948#comment-17276948
 ] 

Hongxu Ma commented on SOLR-15104:
--

I opened a PR to improve it: https://github.com/apache/lucene-solr/pull/2289

> Restart solr will override the gc log
> -
>
> Key: SOLR-15104
> URL: https://issues.apache.org/jira/browse/SOLR-15104
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Hongxu Ma
>Priority: Minor
>
> When Solr restarts, it overwrites the previous Java GC log.
> This makes debugging OOMs harder. It looks like it is caused by the 
> hard-coded GC parameter in bin/solr:
> [https://github.com/apache/lucene-solr/blob/3e2fb59272f5b4d8106b3d8edf847f50bacd7a61/solr/bin/solr#L2031]
>  
> Following other systems, I think adding a timestamp to the default GC log file name would be better:
> https://issues.apache.org/jira/browse/HBASE-18274
> https://issues.apache.org/jira/browse/CASSANDRA-2418
>  
> Hope it can be improved, thanks.
>  






[GitHub] [lucene-solr] dweiss commented on a change in pull request #2267: LUCENE-9707: Hunspell: check Lucene's implementation against Hunspell's test data

2021-02-02 Thread GitBox


dweiss commented on a change in pull request #2267:
URL: https://github.com/apache/lucene-solr/pull/2267#discussion_r568415196



##
File path: 
lucene/analysis/common/src/test/org/apache/lucene/analysis/hunspell/TestsFromOriginalHunspellRepository.java
##

Review comment:
   I think they should reside in the repo if they are useful (even for 
local launches). What I'm afraid of is that if these tools are not in use, 
they'll eventually degrade and stop working without anyone noticing. 
   
   I think the way to integrate such tests properly would be to add a specific 
gradle test task which would configure an appropriate policy, require pointers 
to the required resources, etc. This way these tests can be run as a CI run 
(somewhere... maybe a github action, even?).
   
   I think this can be ironed out later on, once you've written (notice the 
'you' here... ;) more of such tests - the patterns of making them work with the 
CI will naturally emerge from that.
   
   For now, feel free to use that original parameterized test runner - I'll 
look into making IntelliJ work with randomizedtesting again (because I use it 
here and in other projects). It's a moving target and thus a bit discouraging 
(I did the same thing a few times in the past already for various IDEs that 
interpreted test descriptions differently).








[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #2284: SOLR-11233: Add optional JAVA8_GC_LOG_FILE_OPTS for bin/solr.

2021-02-02 Thread GitBox


cpoerschke commented on a change in pull request #2284:
URL: https://github.com/apache/lucene-solr/pull/2284#discussion_r568428808



##
File path: solr/bin/solr
##
@@ -2026,7 +2026,11 @@ if [ "$GC_LOG_OPTS" != "" ]; then
 if [ "$JAVA_VENDOR" == "IBM J9" ]; then
   gc_log_flag="-Xverbosegclog"
 fi
-GC_LOG_OPTS+=("$gc_log_flag:$SOLR_LOGS_DIR/solr_gc.log" '-XX:+UseGCLogFileRotation' '-XX:NumberOfGCLogFiles=9' '-XX:GCLogFileSize=20M')
+if [ -z ${JAVA8_GC_LOG_FILE_OPTS+x} ]; then

Review comment:
   This variant is consistent with the existing variant at line 2010, but 
yes, it confused me too when I first read it. It looks like there's a subtle 
difference between the "unset" and "set to empty string" behaviour. Illustration:
   
   ```
   $ unset GC_LOG_OPTS
   $ if [ -z ${GC_LOG_OPTS+x} ]; then echo unset; else echo not unset; fi
   unset
   $ if [ -z ${GC_LOG_OPTS:+x} ]; then echo empty; else echo not empty; fi
   empty
   $ if [ -z ${GC_LOG_OPTS} ]; then echo empty; else echo not empty; fi
   empty
   $ 
   $ GC_LOG_OPTS=
   $ if [ -z ${GC_LOG_OPTS+x} ]; then echo unset; else echo not unset; fi
   not unset
   $ if [ -z ${GC_LOG_OPTS:+x} ]; then echo empty; else echo not empty; fi
   empty
   $ if [ -z ${GC_LOG_OPTS} ]; then echo empty; else echo not empty; fi
   empty
   $
   $ GC_LOG_OPTS=foobar
   $ if [ -z ${GC_LOG_OPTS+x} ]; then echo unset; else echo not unset; fi
   not unset
   $ if [ -z ${GC_LOG_OPTS:+x} ]; then echo empty; else echo not empty; fi
   not empty
   $ if [ -z ${GC_LOG_OPTS} ]; then echo empty; else echo not empty; fi
   not empty
   ```








[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #2263: SOLR-14978 OOM Killer in Foreground (#2055)

2021-02-02 Thread GitBox


cpoerschke commented on a change in pull request #2263:
URL: https://github.com/apache/lucene-solr/pull/2263#discussion_r568436589



##
File path: solr/bin/solr
##
@@ -2115,6 +2128,15 @@ function start_solr() {
 SOLR_OPTS+=($AUTHC_OPTS)
   fi
 
+  # If a heap dump directory is specified, enable it in SOLR_OPTS
+  if [[ -z "$SOLR_HEAP_DUMP_DIR" ]] && [[ "$SOLR_HEAP_DUMP" == "true" ]]; then
+SOLR_HEAP_DUMP_DIR="${SOLR_LOGS_DIR}/dumps"
+  fi
+  if [[ -n "$SOLR_HEAP_DUMP_DIR" ]]; then
+SOLR_OPTS+=("-XX:+HeapDumpOnOutOfMemoryError")
+SOLR_OPTS+=("-XX:HeapDumpPath=$SOLR_HEAP_DUMP_DIR/solr-$(date +%s)-pid$$.hprof")

Review comment:
   How about also optionally supporting customisation of the file name e.g. 
via a `SOLR_HEAP_DUMP_FILE` variable? Reasons users might wish to customise:
   * inclusion of `SOLR_PORT` in the file name to more easily differentiate 
dumps for different Solr instances on the same machine
   * preference of (say) `date -u '+%Y%m%d-%H%M%S'` over `date +%s` for the 
timestamp
   * always use the same dump file as a way to limit the amount of disk space 
successive OOMs can use up (a colleague of mine had this insight)
   * omission of the pid and restriction of the timestamp e.g. to `date -u 
'+%Y%m%d'` so that at most one OOM file per day would exist
   * omission of the pid to avoid confusion when running in the background 
(because the pid would be that of the shell script and not that of the Solr 
JVM, I think)
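
   A rough sketch of how such an override could look. `SOLR_HEAP_DUMP_FILE` is 
the hypothetical variable suggested above and `heap_dump_path` is an 
illustrative helper name; neither exists in the PR as quoted, and the example 
directory is made up. The default keeps the timestamp-and-pid name from the 
diff:

```bash
#!/usr/bin/env bash
# Sketch only. SOLR_HEAP_DUMP_FILE is the hypothetical variable proposed
# above, and heap_dump_path is an illustrative helper, not part of bin/solr.

# Print the full heap dump path for directory $1. A user-supplied file name
# wins; otherwise fall back to the timestamp-and-pid name used in the PR.
heap_dump_path() {
  local dir="$1"
  local file="${SOLR_HEAP_DUMP_FILE:-solr-$(date +%s)-pid$$.hprof}"
  printf '%s/%s\n' "$dir" "$file"
}

# Example: a fixed name per Solr port instead of a new name per dump.
SOLR_PORT="${SOLR_PORT:-8983}"
SOLR_HEAP_DUMP_FILE="solr-${SOLR_PORT}.hprof"
echo "-XX:HeapDumpPath=$(heap_dump_path "/var/solr/logs/dumps")"
```

   Using `${VAR:-default}` here means an empty `SOLR_HEAP_DUMP_FILE` is treated 
the same as an unset one, which seems like the safer choice for a file name.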








[GitHub] [lucene-solr] donnerpeter commented on pull request #2267: LUCENE-9707: Hunspell: check Lucene's implementation against Hunspell's test data

2021-02-02 Thread GitBox


donnerpeter commented on pull request #2267:
URL: https://github.com/apache/lucene-solr/pull/2267#issuecomment-771492242


   I've done some rebasing, included your patch, renamed the test and tweaked 
the code a bit. Hopefully it's better now.






[GitHub] [lucene-solr] dweiss commented on pull request #2267: LUCENE-9707: Hunspell: check Lucene's implementation against Hunspell's test data

2021-02-02 Thread GitBox


dweiss commented on pull request #2267:
URL: https://github.com/apache/lucene-solr/pull/2267#issuecomment-771498850


   Eh. This is why Parameterized works for you and randomizedtesting doesn't:
   
https://github.com/JetBrains/intellij-community/blob/master/plugins/junit_rt/src/com/intellij/junit4/JUnit4TestRunnerUtil.java#L96-L105
   






[GitHub] [lucene-solr] dweiss commented on pull request #2267: LUCENE-9707: Hunspell: check Lucene's implementation against Hunspell's test data

2021-02-02 Thread GitBox


dweiss commented on pull request #2267:
URL: https://github.com/apache/lucene-solr/pull/2267#issuecomment-771499577


   If you take a look at that class you'll understand why it's such a mess to 
try to navigate those test descriptions...






[GitHub] [lucene-solr] donnerpeter commented on pull request #2267: LUCENE-9707: Hunspell: check Lucene's implementation against Hunspel's test data

2021-02-02 Thread GitBox


donnerpeter commented on pull request #2267:
URL: https://github.com/apache/lucene-solr/pull/2267#issuecomment-771503328


   > Eh. This is why Parameterized works for you and randomizedtesting doesn't:
   > 
https://github.com/JetBrains/intellij-community/blob/master/plugins/junit_rt/src/com/intellij/junit4/JUnit4TestRunnerUtil.java#L96-L105
   
   That's what I feared: relying on JUnit internals :(






[GitHub] [lucene-solr] dweiss commented on pull request #2267: LUCENE-9707: Hunspell: check Lucene's implementation against Hunspel's test data

2021-02-02 Thread GitBox


dweiss commented on pull request #2267:
URL: https://github.com/apache/lucene-solr/pull/2267#issuecomment-771504446


   Yes, sadly. I haven't looked at junit5, shame on me. Perhaps it's improved 
there.






[GitHub] [lucene-solr] dweiss merged pull request #2267: LUCENE-9707: Hunspell: check Lucene's implementation against Hunspel's test data

2021-02-02 Thread GitBox


dweiss merged pull request #2267:
URL: https://github.com/apache/lucene-solr/pull/2267


   






[jira] [Resolved] (LUCENE-9707) Hunspell: check Lucene's implementation against Hunspell's test data

2021-02-02 Thread Dawid Weiss (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss resolved LUCENE-9707.
-
Fix Version/s: master (9.0)
   Resolution: Fixed

> Hunspell: check Lucene's implementation against Hunspell's test data
> 
>
> Key: LUCENE-9707
> URL: https://issues.apache.org/jira/browse/LUCENE-9707
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Peter Gromov
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9707) Hunspell: check Lucene's implementation against Hunspell's test data

2021-02-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276991#comment-17276991
 ] 

ASF subversion and git services commented on LUCENE-9707:
-

Commit b48d5beb34957e83e99ced60d57d4839b474f018 in lucene-solr's branch 
refs/heads/master from Peter Gromov
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b48d5be ]

LUCENE-9707: Hunspell: check Lucene's implementation against Hunspel's test 
data (#2267)










[jira] [Updated] (SOLR-15122) ClusterEventProducerTest.testEvents is unstable

2021-02-02 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-15122:

Fix Version/s: master (9.0)

> ClusterEventProducerTest.testEvents is unstable
> ---
>
> Key: SOLR-15122
> URL: https://issues.apache.org/jira/browse/SOLR-15122
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Tests
>Reporter: Mike Drob
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0)
>
>
> This test looks to be unstable according to Jenkins since about Nov 5. I just 
> started seeing occasional failures locally when running the whole suite but 
> cannot reproduce when running in isolation.
> https://lists.apache.org/thread.html/rf0c16b257bc3236ea414be51451806352b55f15d4949f4fd54a3b71a%40%3Cbuilds.lucene.apache.org%3E






[jira] [Assigned] (SOLR-15122) ClusterEventProducerTest.testEvents is unstable

2021-02-02 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki reassigned SOLR-15122:
---

Assignee: Andrzej Bialecki

> ClusterEventProducerTest.testEvents is unstable
> ---
>
> Key: SOLR-15122
> URL: https://issues.apache.org/jira/browse/SOLR-15122
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Tests
>Reporter: Mike Drob
>Assignee: Andrzej Bialecki
>Priority: Major
>
> This test looks to be unstable according to Jenkins since about Nov 5. I just 
> started seeing occasional failures locally when running the whole suite but 
> cannot reproduce when running in isolation.
> https://lists.apache.org/thread.html/rf0c16b257bc3236ea414be51451806352b55f15d4949f4fd54a3b71a%40%3Cbuilds.lucene.apache.org%3E






[jira] [Commented] (SOLR-15122) ClusterEventProducerTest.testEvents is unstable

2021-02-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277041#comment-17277041
 ] 

ASF subversion and git services commented on SOLR-15122:


Commit 4cb1000ea0a1f6c0d7be2486a709fc82dc94616b in lucene-solr's branch 
refs/heads/master from Andrzej Bialecki
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4cb1000 ]

SOLR-15122: Tentative fix for the test failure - the node in the test could go down
before the new plugin was active on the Overseer.








[jira] [Resolved] (SOLR-15068) RefGuide documentation for replica placement plugins

2021-02-02 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-15068.
-
Resolution: Fixed

> RefGuide documentation for replica placement plugins
> 
>
> Key: SOLR-15068
> URL: https://issues.apache.org/jira/browse/SOLR-15068
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0)
>
>







[jira] [Commented] (SOLR-15122) ClusterEventProducerTest.testEvents is unstable

2021-02-02 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277044#comment-17277044
 ] 

Andrzej Bialecki commented on SOLR-15122:
-

I'll leave this open to see if the fix works.







[GitHub] [lucene-solr] mayya-sharipova commented on a change in pull request #2256: LUCENE-9507 Custom order for leaves in IndexReader and IndexWriter

2021-02-02 Thread GitBox


mayya-sharipova commented on a change in pull request #2256:
URL: https://github.com/apache/lucene-solr/pull/2256#discussion_r568520864



##
File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
##
@@ -941,6 +969,11 @@ public IndexWriter(Directory d, IndexWriterConfig conf) throws IOException {
       // obtain the write.lock. If the user configured a timeout,
       // we wrap with a sleeper and this might take some time.
       writeLock = d.obtainLock(WRITE_LOCK_NAME);
+      if (config.getIndexSort() != null && leafSorter != null) {
+        throw new IllegalArgumentException(
+            "[IndexWriter] can't use index sort and leaf sorter at the same time!");

Review comment:
   @msokolov Thank you for the clarification. Indeed, it is much clearer 
with the example you provided. It looks like we need to think about and discuss 
the merging scenario more, maybe in the next PR or Jira ticket.








[jira] [Commented] (LUCENE-9705) Move all codec formats to the o.a.l.codecs.Lucene90 package

2021-02-02 Thread Ignacio Vera (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277050#comment-17277050
 ] 

Ignacio Vera commented on LUCENE-9705:
--

Thanks Julie, I think you captured the spirit of this change.

In addition, once we have new formats, we can try to simplify things, for 
example by getting rid of PackedInts (legacy) in all current codecs in favour of 
DirectReader and DirectWriter.

 

> Move all codec formats to the o.a.l.codecs.Lucene90 package
> ---
>
> Key: LUCENE-9705
> URL: https://issues.apache.org/jira/browse/LUCENE-9705
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Ignacio Vera
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Current formats are distributed across different packages, prefixed with the 
> Lucene version in which they were created. With the upcoming release of Lucene 9.0, it 
> would be nice to move all those formats to just the o.a.l.codecs.Lucene90 
> package (and of course move the current ones to backward-codecs).
> This issue would actually facilitate moving the directory API to little 
> endian (LUCENE-9047), as the only codecs that would need to handle backwards 
> compatibility would be the codecs in backward-codecs.
> In addition, it can help formalise the use of internal versions vs. format 
> versioning (LUCENE-9616).
>  






[GitHub] [lucene-solr] dweiss commented on a change in pull request #2277: LUCENE-9716: Hunspell: support flag usage before its format is even specified

2021-02-02 Thread GitBox


dweiss commented on a change in pull request #2277:
URL: https://github.com/apache/lucene-solr/pull/2277#discussion_r568569461



##
File path: 
lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##
@@ -696,45 +690,25 @@ char affixData(int affixIndex, int offset) {
     return fstCompiler.compile();
   }
 
-  /** pattern accepts optional BOM + SET + any whitespace */
-  static final Pattern ENCODING_PATTERN = Pattern.compile("^(\u00EF\u00BB\u00BF)?SET\\s+");
+  /** Parses the encoding and flag format specified in the provided InputStream */
+  private void readConfig(InputStream affix) throws IOException, ParseException {
+    LineNumberReader reader = new LineNumberReader(new InputStreamReader(affix, DEFAULT_CHARSET));
+    while (true) {
+      String line = reader.readLine();
+      if (line == null) break;
 
-  /**
-   * Parses the encoding specified in the affix file readable through the provided InputStream
-   *
-   * @param affix InputStream for reading the affix file
-   * @return Encoding specified in the affix file
-   * @throws IOException Can be thrown while reading from the InputStream
-   */
-  static String getDictionaryEncoding(InputStream affix) throws IOException {
-    final StringBuilder encoding = new StringBuilder();
-    for (; ; ) {
-      encoding.setLength(0);
-      int ch;
-      while ((ch = affix.read()) >= 0) {
-        if (ch == '\n') {
-          break;
-        }
-        if (ch != '\r') {
-          encoding.append((char) ch);
-        }
-      }
-      if (encoding.length() == 0
-          || encoding.charAt(0) == '#'
-          ||
-          // this test only at the end as ineffective but would allow lines only containing spaces:
-          encoding.toString().trim().length() == 0) {
-        if (ch < 0) {
-          return DEFAULT_CHARSET.name();
-        }
-        continue;
+      line = line.trim();
+
+      while (line.startsWith("\u00EF") || line.startsWith("\u00BB") || line.startsWith("\u00BF")) {

Review comment:
   Can the BOM really be present on any line? Wouldn't a more elegant 
solution be to use a buffered input stream (or a pushback input stream) and 
just consume the BOM if it's leading the file?
   
   It is a bit awkward that those files are parsed as ASCII (well, ISO-8859-1) 
and at the same time have a UTF BOM (not to mention that bit where you convert to 
UTF-8 from ISO-8859-1)... Is this encoding situation really so messed up in 
Hunspell?
   








[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2277: LUCENE-9716: Hunspell: support flag usage before its format is even specified

2021-02-02 Thread GitBox


donnerpeter commented on a change in pull request #2277:
URL: https://github.com/apache/lucene-solr/pull/2277#discussion_r568575767



##
File path: 
lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##

Review comment:
   Most likely it's just on the first line; handling it this way was just 
easier. Pushback might indeed be more elegant, I'll try that, thanks!
   
   The situation with encoding is complicated indeed. AFAIU the encodings are 
either ASCII-based 8-bit, or UTF-8, so the first time we read the file we 
can safely check Latin letters. At least that's what Hunspell appears to do as 
well.
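
   Editorial sketch of the point above: because the candidate encodings are all ASCII-compatible, a first pass can decode the affix stream as ISO-8859-1 and still recognize the `SET` directive. The class name `AffixEncodingSniffer`, the method `sniffEncoding`, and the ISO-8859-1 fallback are assumptions made for illustration; this is not the code from the pull request.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class AffixEncodingSniffer {

  /**
   * Hypothetical helper: scans an affix stream decoded as ISO-8859-1 and
   * returns the charset name given by the first "SET <name>" line.
   * Falls back to "ISO-8859-1" (an assumed default) if no SET line exists.
   */
  public static String sniffEncoding(InputStream affix) throws IOException {
    BufferedReader reader =
        new BufferedReader(new InputStreamReader(affix, StandardCharsets.ISO_8859_1));
    String line;
    while ((line = reader.readLine()) != null) {
      line = line.trim();
      // A UTF-8 BOM (EF BB BF) decoded as ISO-8859-1 shows up as these three chars.
      if (line.startsWith("\u00EF\u00BB\u00BF")) {
        line = line.substring(3);
      }
      String[] parts = line.split("\\s+");
      if (parts.length >= 2 && parts[0].equals("SET")) {
        return parts[1];
      }
    }
    return "ISO-8859-1";
  }
}
```

   Once the name is known, the stream would have to be re-read from the start with the detected charset, which is where the mark/reset or pushback discussion comes in.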








[GitHub] [lucene-solr] dweiss commented on a change in pull request #2277: LUCENE-9716: Hunspell: support flag usage before its format is even specified

2021-02-02 Thread GitBox


dweiss commented on a change in pull request #2277:
URL: https://github.com/apache/lucene-solr/pull/2277#discussion_r568580925



##
File path: 
lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##

Review comment:
   Ok, so it's essentially an unknown byte stream with dynamic charset 
detection. Not fun. If it's restricted to a reasonable subset (like you said) 
then a preflight of the content could determine the actual encoding (at least 
until an explicit encoding declaration is found). Then things would be less 
messy down the road as you'd just have a Reader to read from... 
   
   Pushback is fine too. Either this or a BufferedInputStream and use 
mark/reset to adjust stream position after you detect the BOM (or not). As much 
as I like PushbackInputStream, it predates dinosaurs. :)
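
   Editorial sketch of the mark/reset suggestion: wrap the stream in a `BufferedInputStream`, peek at the first three bytes, and rewind only if they are not a UTF-8 BOM. The class and method names (`BomSkipper`, `skipUtf8Bom`) are hypothetical, and `readNBytes` assumes Java 9 or later; this is not the code that was eventually committed.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BomSkipper {

  /**
   * Returns a stream positioned just after a leading UTF-8 BOM (EF BB BF),
   * or at the very start of the data if no BOM is present.
   */
  public static InputStream skipUtf8Bom(InputStream in) throws IOException {
    BufferedInputStream buffered = new BufferedInputStream(in);
    buffered.mark(3); // remember the start; we read at most 3 bytes before reset
    byte[] head = new byte[3];
    int read = buffered.readNBytes(head, 0, 3);
    boolean hasBom =
        read == 3
            && (head[0] & 0xFF) == 0xEF
            && (head[1] & 0xFF) == 0xBB
            && (head[2] & 0xFF) == 0xBF;
    if (!hasBom) {
      buffered.reset(); // not a BOM: rewind so the caller sees every byte
    }
    return buffered;
  }
}
```

   The caller can then wrap the returned stream in a `Reader` without any per-line BOM checks.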








[jira] [Commented] (SOLR-15119) Make LINK splitMethod the default for SplitShardCmd

2021-02-02 Thread Jira


[ 
https://issues.apache.org/jira/browse/SOLR-15119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277109#comment-17277109
 ] 

Gézapeti commented on SOLR-15119:
-

I've tried it out and the LINK method works with HDFS as well; it's smart 
enough to fall back to copying the whole index over on HDFS.

> Make LINK splitMethod the default for SplitShardCmd
> ---
>
> Key: SOLR-15119
> URL: https://issues.apache.org/jira/browse/SOLR-15119
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (9.0)
>Reporter: Megan Carey
>Priority: Major
>  Labels: easy-fix
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> REWRITE splitMethod is still the default in SplitShardCmd [1], despite LINK 
> being much faster. IndexSizeTrigger in branch_8x already uses LINK by default 
> [2], and we have found LINK to be reliable and performant at scale. This work 
> will just update the default in SplitShardCmd to make LINK the default 
> overall.
>  
>  
> [1][https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java#L88]
>  
> [2][https://github.com/apache/lucene-solr/blob/branch_8x/solr/core/src/java/org/apache/solr/cloud/autoscaling/IndexSizeTrigger.java#L186]
>  






[jira] [Commented] (LUCENE-9636) Exact and operation to get a SIMD optimize

2021-02-02 Thread Markus Jelsma (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277119#comment-17277119
 ] 

Markus Jelsma commented on LUCENE-9636:
---

* [LUCENE-9636|http://issues.apache.org/jira/browse/LUCENE-9636]: Faster 
decoding of postings for some numbers of bits per value. 
(Guo Feng via Adrien Grand)

According to CHANGES, this ticket should be marked as resolved, should it not?

> Exact and operation to get a SIMD optimize
> --
>
> Key: LUCENE-9636
> URL: https://issues.apache.org/jira/browse/LUCENE-9636
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Reporter: Feng Guo
>Priority: Trivial
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In decode6(), decode7(), decode14(), decode15() and decode24(), the longs are always 
> `&`-ed with the same mask and then shifted. By printing the assembly, I found that the JIT 
> did not optimize them with SIMD instructions. But when we extract all `&` 
> operations and do them first, the JIT applies SIMD optimization to them.
>  
>  
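
Editorial sketch of the transformation described in the issue: the same combine step written once with the mask applied inside the main loop, and once with all `&` operations hoisted into a simple loop of their own. The class name, the mask value, and the three-inputs-per-output layout are assumptions chosen for illustration; they are not copied from Lucene's decode methods, and whether a given JIT actually vectorizes the hoisted loop depends on the JVM and hardware.

```java
public class MaskFirstSketch {

  // Hypothetical mask keeping the low 2 bits of every byte.
  static final long MASK = 0x0303030303030303L;

  /** Mask and shift interleaved in one loop; needs tmp.length >= 3 * out.length. */
  public static void combineInterleaved(long[] tmp, long[] out) {
    for (int i = 0, t = 0; i < out.length; i++, t += 3) {
      long l = (tmp[t] & MASK) << 4;
      l |= (tmp[t + 1] & MASK) << 2;
      l |= tmp[t + 2] & MASK;
      out[i] = l;
    }
  }

  /** All masks applied first in a trivial loop, then the shifts and ORs. Mutates tmp. */
  public static void combineMaskFirst(long[] tmp, long[] out) {
    for (int i = 0; i < tmp.length; i++) {
      tmp[i] &= MASK; // uniform element-wise loop: a candidate for auto-vectorization
    }
    for (int i = 0, t = 0; i < out.length; i++, t += 3) {
      out[i] = (tmp[t] << 4) | (tmp[t + 1] << 2) | tmp[t + 2];
    }
  }
}
```

Both methods compute the same result; only the loop structure differs.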






[jira] [Resolved] (LUCENE-9636) Exact and operation to get a SIMD optimize

2021-02-02 Thread Feng Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feng Guo resolved LUCENE-9636.
--
Resolution: Fixed







[jira] [Commented] (SOLR-15122) ClusterEventProducerTest.testEvents is unstable

2021-02-02 Thread Mike Drob (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277129#comment-17277129
 ] 

Mike Drob commented on SOLR-15122:
--

I have just gone through and replaced a bunch of busy-wait loops with real 
condition-based tools; it would be good to do the same here. Perhaps the test code 
can set a monitor, and then, if the monitor is not null, the event producer can 
notify on it whenever the version changes.
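
Editorial sketch of the monitor idea, assuming a plain integer version: the producer side calls `setVersion`, which notifies waiters, and the test blocks in `awaitVersion` with a timeout instead of polling in a sleep loop. `VersionMonitor` and its methods are hypothetical names, not Solr classes.

```java
import java.util.concurrent.TimeUnit;

public class VersionMonitor {
  private final Object monitor = new Object();
  private int version;

  /** Called by the producer whenever the version changes; wakes up all waiters. */
  public void setVersion(int newVersion) {
    synchronized (monitor) {
      version = newVersion;
      monitor.notifyAll();
    }
  }

  /**
   * Blocks until the version reaches at least {@code expected} or the timeout elapses.
   * Returns true if the version was reached, false on timeout.
   */
  public boolean awaitVersion(int expected, long timeoutMs) throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    synchronized (monitor) {
      while (version < expected) { // loop guards against spurious wakeups
        long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
        if (remainingMs <= 0) {
          return false;
        }
        monitor.wait(remainingMs);
      }
      return true;
    }
  }
}
```

A `CountDownLatch` or `java.util.concurrent.locks.Condition` would serve equally well; the point is that the test wakes up on the state change rather than on a timer.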







[GitHub] [lucene-solr] jpountz commented on a change in pull request #2268: LUCENE-9705: Move Lucene50CompoundFormat to Lucene90CompoundFormat

2021-02-02 Thread GitBox


jpountz commented on a change in pull request #2268:
URL: https://github.com/apache/lucene-solr/pull/2268#discussion_r568617447



##
File path: 
lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene50/Lucene50CompoundFormat.java
##
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.backward_codecs.lucene50;
+
+import java.io.IOException;
+import org.apache.lucene.codecs.CodecUtil;
+import org.apache.lucene.codecs.CompoundDirectory;
+import org.apache.lucene.codecs.CompoundFormat;
+import org.apache.lucene.index.SegmentInfo;
+import org.apache.lucene.store.DataOutput;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.IOContext;
+
+/**
+ * Lucene 5.0 compound file format
+ *
+ * Files:
+ *
+ * 
+ *   .cfs: An optional "virtual" file consisting of all the other index files for
+ *   systems that frequently run out of file handles.
+ *   .cfe: The "virtual" compound file's entry table holding all entries in the
+ *   corresponding .cfs file.
+ * 
+ *
+ * Description:
+ *
+ * 
+ *   Compound (.cfs) --> Header, FileData FileCount, Footer
+ *   Compound Entry Table (.cfe) --> Header, FileCount, <FileName, DataOffset, DataLength> FileCount
+ *   Header --> {@link CodecUtil#writeIndexHeader IndexHeader}
+ *   FileCount --> {@link DataOutput#writeVInt VInt}
+ *   DataOffset,DataLength,Checksum --> {@link DataOutput#writeLong UInt64}
+ *   FileName --> {@link DataOutput#writeString String}
+ *   FileData --> raw file data
+ *   Footer --> {@link CodecUtil#writeFooter CodecFooter}
+ * 
+ *
+ * Notes:
+ *
+ * 
+ *   FileCount indicates how many files are contained in this compound file. The entry table
+ *   that follows has that many entries.
+ *   Each directory entry contains a long pointer to the start of this file's data section, the
+ *   files length, and a String with that file's name.
+ * 
+ */
+public final class Lucene50CompoundFormat extends CompoundFormat {
+
+  /** Extension of compound file */
+  static final String DATA_EXTENSION = "cfs";
+  /** Extension of compound file entries */
+  static final String ENTRIES_EXTENSION = "cfe";
+
+  static final String DATA_CODEC = "Lucene50CompoundData";
+  static final String ENTRY_CODEC = "Lucene50CompoundEntries";
+  static final int VERSION_START = 0;

Review comment:
   I'd like to keep it for now, even if the version is always 0. My gut 
feeling is that we should fork file formats more aggressively than we do today 
but I still don't have full confidence that we will never use the internal 
versioning again.








[jira] [Created] (LUCENE-9723) Hunspell: update sanity tests that load all dictionaries

2021-02-02 Thread Peter Gromov (Jira)
Peter Gromov created LUCENE-9723:


 Summary: Hunspell: update sanity tests that load all dictionaries
 Key: LUCENE-9723
 URL: https://issues.apache.org/jira/browse/LUCENE-9723
 Project: Lucene - Core
  Issue Type: Sub-task
Reporter: Peter Gromov






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] donnerpeter opened a new pull request #2290: LUCENE-9723: Hunspell: update sanity tests that load all dictionaries

2021-02-02 Thread GitBox


donnerpeter opened a new pull request #2290:
URL: https://github.com/apache/lucene-solr/pull/2290


   
   
   
   # Description
   
   `TestAllDictionaries`(2) are hard to run, and their javadoc is outdated, as 
is the package's javadoc
   
   # Solution
   
   Make it a single test that understands the dictionary directory layout of at 
least two repositories, and point to them in the package javadoc.
   
   # Tests
   
   `TestAllDictionaries` is updated (but failing for now)
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [x] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `master` branch.
   - [x] I have run `./gradlew check`.
   - [x] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   






[GitHub] [lucene-solr] dweiss commented on pull request #2277: LUCENE-9716: Hunspell: support flag usage before its format is even specified

2021-02-02 Thread GitBox


dweiss commented on pull request #2277:
URL: https://github.com/apache/lucene-solr/pull/2277#issuecomment-771677497


   Hi Peter. I pushed a commit which replaces BOM consumption with a small 
function, so that either the BOM is read atomically or nothing is consumed at all. 
Looking at the code also made me wonder whether a sufficiently large leading 
buffer could be used to parse just the needed stuff from the input stream 
(bypassing the need to create a temp file)... That can be left for a later 
improvement, though.
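
The "read atomically or consume nothing" idea can be sketched with a `PushbackInputStream`. This is not the code from the pull request; `BomSkipper`, `skipBom`, and `firstByteAfterBom` are hypothetical names, and only the UTF-8 BOM is handled, as an assumption for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;

/** Hypothetical helper; a sketch, not the function from the pull request. */
final class BomSkipper {
  private static final int[] UTF8_BOM = {0xEF, 0xBB, 0xBF};

  private BomSkipper() {}

  /**
   * Consumes a UTF-8 byte order mark if the stream starts with a complete one.
   * Otherwise every byte read so far is pushed back, so nothing is consumed.
   * The stream must have a pushback buffer of at least three bytes.
   */
  static boolean skipBom(PushbackInputStream in) throws IOException {
    byte[] buf = new byte[UTF8_BOM.length];
    int n = 0;
    while (n < buf.length) {
      int b = in.read();
      if (b < 0) {
        break; // end of stream before a full BOM could be read
      }
      buf[n++] = (byte) b;
    }
    boolean isBom = n == buf.length;
    for (int i = 0; isBom && i < buf.length; i++) {
      isBom = (buf[i] & 0xFF) == UTF8_BOM[i];
    }
    if (!isBom && n > 0) {
      in.unread(buf, 0, n); // restore everything that was peeked at
    }
    return isBom;
  }

  /** Convenience for demos: returns the first byte after an optional BOM, or -1 at end of input. */
  static int firstByteAfterBom(byte[] data) {
    try (PushbackInputStream in =
        new PushbackInputStream(new ByteArrayInputStream(data), UTF8_BOM.length)) {
      skipBom(in);
      return in.read();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A partial match such as `EF BB 58` leaves the stream untouched, so the next read still returns `0xEF`.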






[GitHub] [lucene-solr] donnerpeter commented on pull request #2277: LUCENE-9716: Hunspell: support flag usage before its format is even specified

2021-02-02 Thread GitBox


donnerpeter commented on pull request #2277:
URL: https://github.com/apache/lucene-solr/pull/2277#issuecomment-771711060


   Thank you, LGTM!






[jira] [Comment Edited] (SOLR-15119) Make LINK splitMethod the default for SplitShardCmd

2021-02-02 Thread Jira


[ 
https://issues.apache.org/jira/browse/SOLR-15119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277109#comment-17277109
 ] 

Gézapeti edited comment on SOLR-15119 at 2/2/21, 4:01 PM:
--

I've tried it out and the LINK method works with HDFS as well; it's smart 
enough to fall back to copying the whole index over on HDFS. 
In any case, I'm fine with changing the default.


was (Author: gezapeti):
I've tried it out and the Link method works with HDFS as well, it's smart 
enough to fall back to copy the whole index over on HDFS. 

> Make LINK splitMethod the default for SplitShardCmd
> ---
>
> Key: SOLR-15119
> URL: https://issues.apache.org/jira/browse/SOLR-15119
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (9.0)
>Reporter: Megan Carey
>Priority: Major
>  Labels: easy-fix
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> REWRITE splitMethod is still the default in SplitShardCmd [1], despite LINK 
> being much faster. IndexSizeTrigger in branch_8x already uses LINK by default 
> [2], and we have found LINK to be reliable and performant at scale. This work 
> will just update the default in SplitShardCmd to make LINK the default 
> overall.
>  
>  
> [1][https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java#L88]
>  
> [2][https://github.com/apache/lucene-solr/blob/branch_8x/solr/core/src/java/org/apache/solr/cloud/autoscaling/IndexSizeTrigger.java#L186]
>  






[GitHub] [lucene-solr] dnhatn commented on pull request #2288: LUCENE-9722: Close merged readers on abort

2021-02-02 Thread GitBox


dnhatn commented on pull request #2288:
URL: https://github.com/apache/lucene-solr/pull/2288#issuecomment-771758863


   Thanks Simon.






[GitHub] [lucene-solr] dnhatn merged pull request #2288: LUCENE-9722: Close merged readers on abort

2021-02-02 Thread GitBox


dnhatn merged pull request #2288:
URL: https://github.com/apache/lucene-solr/pull/2288


   






[jira] [Commented] (LUCENE-9722) Aborted merge can leak readers if the output is empty

2021-02-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277256#comment-17277256
 ] 

ASF subversion and git services commented on LUCENE-9722:
-

Commit 47e3d06ce00642624634e5d45ebc16fa33d48099 in lucene-solr's branch 
refs/heads/master from Nhat Nguyen
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=47e3d06 ]

LUCENE-9722: Close merged readers on abort (#2288)

We fail to close the merged readers of an aborted merge if its 
output segment contains no document.

This bug was discovered by a test in Elasticsearch 
(elastic/elasticsearch#67884).

> Aborted merge can leak readers if the output is empty
> -
>
> Key: LUCENE-9722
> URL: https://issues.apache.org/jira/browse/LUCENE-9722
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: master (9.0), 8.7
>Reporter: Nhat Nguyen
>Assignee: Nhat Nguyen
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We fail to close the merged readers of an aborted merge if its output segment 
> contains no document.
> This bug was discovered by a test in Elasticsearch 
> ([elastic/elasticsearch#67884|https://github.com/elastic/elasticsearch/issues/67884]).






[jira] [Created] (LUCENE-9724) Hunspell: load dictionaries with extra content on REP lines

2021-02-02 Thread Peter Gromov (Jira)
Peter Gromov created LUCENE-9724:


 Summary: Hunspell: load dictionaries with extra content on REP 
lines
 Key: LUCENE-9724
 URL: https://issues.apache.org/jira/browse/LUCENE-9724
 Project: Lucene - Core
  Issue Type: Sub-task
Reporter: Peter Gromov









[jira] [Updated] (LUCENE-9724) Hunspell: tolerate extra content on REP lines

2021-02-02 Thread Peter Gromov (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Gromov updated LUCENE-9724:
-
Summary: Hunspell: tolerate extra content on REP lines  (was: Hunspell: 
load dictionaries with extra content on REP lines)

> Hunspell: tolerate extra content on REP lines
> -
>
> Key: LUCENE-9724
> URL: https://issues.apache.org/jira/browse/LUCENE-9724
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Peter Gromov
>Priority: Major
>







[jira] [Updated] (SOLR-14886) Suppress stack trace in Query response.

2021-02-02 Thread Isabelle Giguere (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Isabelle Giguere updated SOLR-14886:

Attachment: SOLR-14886.patch

> Suppress stack trace in Query response.
> ---
>
> Key: SOLR-14886
> URL: https://issues.apache.org/jira/browse/SOLR-14886
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 8.6.2
>Reporter: Vrinda Davda
>Priority: Minor
> Attachments: SOLR-14886.patch, SOLR-14886.patch
>
>
> Currently there is no way to suppress the stack trace in the Solr response when 
> it throws an exception. For example, when a client sends a badly formed query 
> string, or an exception occurs with status 500, it sends the full stack trace in the response. 
> I would propose a configuration for error messages so that the stack trace is 
> not visible, to avoid exposing any sensitive information in the stack trace.






[jira] [Commented] (SOLR-14886) Suppress stack trace in Query response.

2021-02-02 Thread Isabelle Giguere (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277272#comment-17277272
 ] 

Isabelle Giguere commented on SOLR-14886:
-

Patch off current Solr master branch (9.x)

- Add a property "hideStackTrace" to solr.xml
- In NodeConfig, the default value is "false", for back-compatibility.
- Use the new property in ResponseUtils, to print out, or not, the stack trace.
- Adapt code that calls ResponseUtils
- Add documentation in Ref Guide

There's no direct path between solr.xml and ResponseUtils, or any class that 
uses ResponseUtils, so the "hideStackTrace" property is duplicated in 
CoreContainer, just so it lives in a place where it can be read. May not be the 
best approach.

Note that the patch cannot fix the cases where the error message ()contains the full stack trace.
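
As a rough illustration of the third bullet, the sketch below builds an error payload that includes the trace only when the flag allows it. It is not the patch's code: `ErrorInfoSketch`, `errorInfo`, and the `msg`/`trace` keys are assumptions made for this example; only the property name `hideStackTrace` and its default of `false` come from the description above.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch; not the ResponseUtils code from the patch. */
final class ErrorInfoSketch {
  private ErrorInfoSketch() {}

  /**
   * Builds the error section of a response.
   * The "trace" entry is added only when hideStackTrace is false (the assumed default).
   */
  static Map<String, Object> errorInfo(Throwable error, boolean hideStackTrace) {
    Map<String, Object> info = new LinkedHashMap<>();
    info.put("msg", String.valueOf(error.getMessage()));
    if (!hideStackTrace) {
      StringWriter trace = new StringWriter();
      error.printStackTrace(new PrintWriter(trace, true));
      info.put("trace", trace.toString());
    }
    return info;
  }
}
```

As the note above says, a flag like this cannot help when the message text itself already embeds a stack trace, because `msg` is passed through unchanged.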

> Suppress stack trace in Query response.
> ---
>
> Key: SOLR-14886
> URL: https://issues.apache.org/jira/browse/SOLR-14886
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 8.6.2
>Reporter: Vrinda Davda
>Priority: Minor
> Attachments: SOLR-14886.patch, SOLR-14886.patch
>
>
> Currently there is no way to suppress the stack trace in the Solr response when 
> it throws an exception. For example, when a client sends a badly formed query 
> string, or an exception occurs with status 500, it sends the full stack trace in the response. 
> I would propose a configuration for error messages so that the stack trace is 
> not visible, to avoid exposing any sensitive information in the stack trace.






[GitHub] [lucene-solr] bruno-roustant commented on pull request #2213: LUCENE-9663: Adding compression to terms dict from SortedSet/Sorted DocValues

2021-02-02 Thread GitBox


bruno-roustant commented on pull request #2213:
URL: https://github.com/apache/lucene-solr/pull/2213#issuecomment-771782427


   I expect you don't need to change Lucene code but just write a new custom 
codec (with a specific name) which provides a custom DocValuesFormat. It 
extends PerFieldDocValuesFormat and implements the method
   DocValuesFormat getDocValuesFormatForField(String field).
   This method provides either a standard Lucene80DocValuesFormat (no 
compression) or another new custom DocValuesFormat (with a specific name to 
write in the index) extending Lucene80DocValuesFormat with BEST_COMPRESSION 
mode.
   The choice can be made either based on a config (e.g. file) which lists all 
compressed DocValue based fields, or based on a naming convention.
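
The per-field choice described above can be sketched without any Lucene dependency. The class below only models the decision that an override of `getDocValuesFormatForField(String field)` would make; `DocValuesFormatChooser`, the two format names, and the `_cdv` suffix used in the usage note are placeholders assumed for illustration, not real Lucene identifiers.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of the per-field decision; returns format names, not Lucene objects. */
final class DocValuesFormatChooser {
  static final String DEFAULT_FORMAT = "DefaultDV";       // placeholder for the uncompressed format
  static final String COMPRESSED_FORMAT = "CompressedDV"; // placeholder for the custom compressed format

  private final Set<String> compressedFields; // e.g. loaded from a config file
  private final String compressedSuffix;      // naming convention for compressed fields

  DocValuesFormatChooser(Set<String> compressedFields, String compressedSuffix) {
    this.compressedFields = new HashSet<>(compressedFields);
    this.compressedSuffix = compressedSuffix;
  }

  /** Picks the compressed format if the field is listed in the config or matches the suffix. */
  String formatNameForField(String field) {
    boolean compressed =
        compressedFields.contains(field) || field.endsWith(compressedSuffix);
    return compressed ? COMPRESSED_FORMAT : DEFAULT_FORMAT;
  }
}
```

For example, `new DocValuesFormatChooser(new HashSet<>(Arrays.asList("category")), "_cdv")` would route `category` and `title_cdv` to the compressed format and everything else to the default one.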






[jira] [Commented] (LUCENE-9722) Aborted merge can leak readers if the output is empty

2021-02-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277288#comment-17277288
 ] 

ASF subversion and git services commented on LUCENE-9722:
-

Commit 4ade962679cf07bd4e706f1851bb740a4ad2916a in lucene-solr's branch 
refs/heads/branch_8x from Nhat Nguyen
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4ade962 ]

LUCENE-9722: Close merged readers on abort (#2288)

We fail to close the merged readers of an aborted merge if its 
output segment contains no document.

This bug was discovered by a test in Elasticsearch 
(elastic/elasticsearch#67884).

> Aborted merge can leak readers if the output is empty
> -
>
> Key: LUCENE-9722
> URL: https://issues.apache.org/jira/browse/LUCENE-9722
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: master (9.0), 8.7
>Reporter: Nhat Nguyen
>Assignee: Nhat Nguyen
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We fail to close the merged readers of an aborted merge if its output segment 
> contains no document.
> This bug was discovered by a test in Elasticsearch 
> ([elastic/elasticsearch#67884|https://github.com/elastic/elasticsearch/issues/67884]).






[jira] [Commented] (LUCENE-9722) Aborted merge can leak readers if the output is empty

2021-02-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277339#comment-17277339
 ] 

ASF subversion and git services commented on LUCENE-9722:
-

Commit 2e7cfbd8e60cf8ccb23619db8b20f193546fd1c8 in lucene-solr's branch 
refs/heads/branch_8_8 from Nhat Nguyen
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2e7cfbd ]

LUCENE-9722: Close merged readers on abort (#2288)

We fail to close the merged readers of an aborted merge if its 
output segment contains no document.

This bug was discovered by a test in Elasticsearch 
(elastic/elasticsearch#67884).

> Aborted merge can leak readers if the output is empty
> -
>
> Key: LUCENE-9722
> URL: https://issues.apache.org/jira/browse/LUCENE-9722
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: master (9.0), 8.7
>Reporter: Nhat Nguyen
>Assignee: Nhat Nguyen
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We fail to close the merged readers of an aborted merge if its output segment 
> contains no document.
> This bug was discovered by a test in Elasticsearch 
> ([elastic/elasticsearch#67884|https://github.com/elastic/elasticsearch/issues/67884]).






[jira] [Updated] (LUCENE-9724) Hunspell: tolerate existing aff/dic file typos

2021-02-02 Thread Peter Gromov (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Gromov updated LUCENE-9724:
-
Summary: Hunspell: tolerate existing aff/dic file typos  (was: Hunspell: 
tolerate extra content on REP lines)

> Hunspell: tolerate existing aff/dic file typos
> --
>
> Key: LUCENE-9724
> URL: https://issues.apache.org/jira/browse/LUCENE-9724
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Peter Gromov
>Priority: Major
>







[GitHub] [lucene-solr] jtibshirani commented on pull request #2276: Improve backwards compatibility tests for sorted indexes.

2021-02-02 Thread GitBox


jtibshirani commented on pull request #2276:
URL: https://github.com/apache/lucene-solr/pull/2276#issuecomment-771834969


   @mikemccand I tagged you for a (hopefully quick) review, as I think you 
added the TODOs?






[jira] [Updated] (LUCENE-9722) Aborted merge can leak readers if the output is empty

2021-02-02 Thread Nhat Nguyen (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nhat Nguyen updated LUCENE-9722:

Fix Version/s: 8.9
   master (9.0)
   8.0.1
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Aborted merge can leak readers if the output is empty
> -
>
> Key: LUCENE-9722
> URL: https://issues.apache.org/jira/browse/LUCENE-9722
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: master (9.0), 8.7
>Reporter: Nhat Nguyen
>Assignee: Nhat Nguyen
>Priority: Major
> Fix For: 8.0.1, master (9.0), 8.9
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We fail to close the merged readers of an aborted merge if its output segment 
> contains no document.
> This bug was discovered by a test in Elasticsearch 
> ([elastic/elasticsearch#67884|https://github.com/elastic/elasticsearch/issues/67884]).






[GitHub] [lucene-solr] madrob opened a new pull request #2291: SOLR-15122 Replace volatile+sleep with wait/notify

2021-02-02 Thread GitBox


madrob opened a new pull request #2291:
URL: https://github.com/apache/lucene-solr/pull/2291


   






[jira] [Commented] (SOLR-15124) Remove node/container level admin handlers from ImplicitPlugins.json (core level).

2021-02-02 Thread Cassandra Targett (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277347#comment-17277347
 ] 

Cassandra Targett commented on SOLR-15124:
--

I'm not able to understand entirely from the description & attached PR if this 
is removing some requestHandlers from being implicit, or if it's an internal 
change that won't impact what users see/use? Either way I just wanted to 
mention that there is a page in the Ref Guide {{implicit-requesthandlers.adoc}} 
which may need to be updated depending on the scope.

> Remove node/container level admin handlers from ImplicitPlugins.json (core 
> level).
> --
>
> Key: SOLR-15124
> URL: https://issues.apache.org/jira/browse/SOLR-15124
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Priority: Blocker
>  Labels: newdev
> Fix For: master (9.0)
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> There are many very old administrative RequestHandlers registered in a 
> SolrCore that are actually JVM / node / CoreContainer level in nature.  These 
> pre-dated CoreContainer level handlers.  We should (1) remove them from 
> ImplicitPlugins.json, and (2) make simplifying tweaks to them so that they no 
> longer work at the core level.  For example LoggingHandler has two constructors 
> and a non-final Watcher because it works in these two modalities.  It need 
> only have the one that takes a CoreContainer, and Watcher will then be final.
> /admin/threads
> /admin/properties
> /admin/logging
> These should stay because they have core-level stuff:
> /admin/plugins
> /admin/mbeans
> This one:
> /admin/system -- SystemInfoHandler
> returns "core" level information, and also node level stuff.  I propose 
> splitting this one to a CoreInfoHandler to split the logic.  Maybe a separate 
> issue.






[jira] [Commented] (SOLR-15122) ClusterEventProducerTest.testEvents is unstable

2021-02-02 Thread Mike Drob (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277353#comment-17277353
 ] 

Mike Drob commented on SOLR-15122:
--

I put up a PR to demonstrate what I meant. I think the same pattern should be 
applied to the DelegatingPlacementPlugin code, if this was based on that.

I trust that the design here is correct for accomplishing what we need, but the 
implementation needed a few touch-ups from what you did.

* We shouldn't use postfix increment with a volatile variable, as that is not 
an atomic operation.
* Using wait/notify is going to be a more efficient use of resources than a busy 
wait.
* You weren't saving the new value of version on subsequent calls, so I updated 
that too.

Please take a look and confirm that this still maintains the intent of what you 
were trying to do.
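
The wait/notify pattern from the second bullet can be sketched as follows. This is not the code in the PR: `VersionTracker`, `bumpVersion`, and the two `demo...` helpers are hypothetical, and the counter is a plain `int` guarded by a private lock, which is one assumed way to avoid both the volatile increment and the busy wait.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch of a version counter that waiters can block on. */
final class VersionTracker {
  private final Object lock = new Object();
  private int version; // guarded by lock

  /** Increments the version and wakes up every waiting thread. */
  void bumpVersion() {
    synchronized (lock) {
      version++;
      lock.notifyAll();
    }
  }

  /** Blocks until the version differs from currentVersion, then returns the new value. */
  int waitForVersionChange(int currentVersion, int timeoutSec)
      throws InterruptedException, TimeoutException {
    long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(timeoutSec);
    synchronized (lock) {
      while (version == currentVersion) { // loop guards against spurious wakeups
        long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
        if (remainingMs <= 0) {
          throw new TimeoutException("version still " + currentVersion);
        }
        lock.wait(remainingMs);
      }
      return version;
    }
  }

  /** Demo helper: true if a bump from another thread is observed by the waiter. */
  static boolean demoWakesUp() {
    VersionTracker tracker = new VersionTracker();
    new Thread(tracker::bumpVersion).start();
    try {
      return tracker.waitForVersionChange(0, 5) == 1;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } catch (TimeoutException e) {
      return false;
    }
  }

  /** Demo helper: true if waiting with a zero timeout and no bump times out. */
  static boolean demoTimesOut() {
    try {
      new VersionTracker().waitForVersionChange(0, 0);
      return false;
    } catch (TimeoutException e) {
      return true;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

A caller in a test would keep the returned value and pass it to the next call, for instance `version = tracker.waitForVersionChange(version, 10)`, which is the third bullet's point about saving the new version.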

> ClusterEventProducerTest.testEvents is unstable
> ---
>
> Key: SOLR-15122
> URL: https://issues.apache.org/jira/browse/SOLR-15122
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Tests
>Reporter: Mike Drob
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This test looks to be unstable according to Jenkins since about Nov 5. I just 
> started seeing occasional failures locally when running the whole suite but 
> cannot reproduce when running in isolation.
> https://lists.apache.org/thread.html/rf0c16b257bc3236ea414be51451806352b55f15d4949f4fd54a3b71a%40%3Cbuilds.lucene.apache.org%3E






[jira] [Commented] (SOLR-15122) ClusterEventProducerTest.testEvents is unstable

2021-02-02 Thread Ilan Ginzburg (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277360#comment-17277360
 ] 

Ilan Ginzburg commented on SOLR-15122:
--

??We shouldn't use postfix increment with a volatile variable, as that is not 
an atomic operation??

I wouldn't make this a blanket statement. One reason to use a volatile is to 
conform to the Java Memory Model and not require synchronization for access 
from different threads. In some cases atomicity is not needed.

Also, using an AtomicInteger and using it only from within a synchronized 
section seems a bit overkill. Any Integer would do (or even a primitive int if 
the mutex block is using another object).
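
Both points can be illustrated with a small sketch, under the assumption that the volatile counter has exactly one writer thread. The class and method names (`VersionCounters`, `SingleWriter`, `Locked`, `singleWriterTotal`, `lockedTotal`) are hypothetical and not taken from the Solr code.

```java
/** Hypothetical sketch contrasting a single-writer volatile with a lock-guarded int. */
final class VersionCounters {
  private VersionCounters() {}

  /** Volatile gives visibility to readers; ++ is only safe here because one thread writes. */
  static final class SingleWriter {
    private volatile int version;

    void increment() {
      version++; // read-modify-write, not atomic: acceptable only with a single writer
    }

    int get() {
      return version;
    }
  }

  /** Plain int guarded by a dedicated lock object: safe with any number of writers. */
  static final class Locked {
    private final Object lock = new Object();
    private int version; // guarded by lock

    void increment() {
      synchronized (lock) {
        version++;
      }
    }

    int get() {
      synchronized (lock) {
        return version;
      }
    }
  }

  /** Runs the given number of threads, each calling the action a fixed number of times. */
  private static void run(int threads, int callsPerThread, Runnable action) {
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(() -> {
        for (int j = 0; j < callsPerThread; j++) {
          action.run();
        }
      });
      workers[i].start();
    }
    for (Thread worker : workers) {
      try {
        worker.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
  }

  /** One writer thread, 1000 increments of the volatile counter. */
  static int singleWriterTotal() {
    SingleWriter counter = new SingleWriter();
    run(1, 1000, counter::increment);
    return counter.get();
  }

  /** Four writer threads, 1000 increments each, on the lock-guarded counter. */
  static int lockedTotal() {
    Locked counter = new Locked();
    run(4, 1000, counter::increment);
    return counter.get();
  }
}
```

With several writers on the volatile counter, increments could be lost, which is the case the original remark about postfix increment warns against.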

> ClusterEventProducerTest.testEvents is unstable
> ---
>
> Key: SOLR-15122
> URL: https://issues.apache.org/jira/browse/SOLR-15122
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Tests
>Reporter: Mike Drob
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This test looks to be unstable according to Jenkins since about Nov 5. I just 
> started seeing occasional failures locally when running the whole suite but 
> cannot reproduce when running in isolation.
> https://lists.apache.org/thread.html/rf0c16b257bc3236ea414be51451806352b55f15d4949f4fd54a3b71a%40%3Cbuilds.lucene.apache.org%3E






[jira] [Comment Edited] (SOLR-15122) ClusterEventProducerTest.testEvents is unstable

2021-02-02 Thread Ilan Ginzburg (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277360#comment-17277360
 ] 

Ilan Ginzburg edited comment on SOLR-15122 at 2/2/21, 6:16 PM:
---

??We shouldn't use postfix increment with a volatile variable, as that is not 
an atomic operation??

I wouldn't make this a blanket statement. One reason to use a volatile is to 
conform to the Java Memory Model and not require synchronization for access 
from different threads. In some cases atomicity is not needed.

Also, using an AtomicInteger and using it only from within a synchronized 
section seems a bit overkill. Any Integer would do (or even a primitive int if 
the mutex block is using another object).


was (Author: murblanc):
??We shouldn't use postfix increment with a volatile variable, as that is not 
an atomic operation??

I wouldn't make this a blanket statement. One reason to use a volatile if to 
conform to the Java Memory Model and not require synchronization for access 
from different threads. In some cases atomocity is not needed.

Also, using an AtomicInteger and using it only from within a synchronized 
section seems a bit overkill. Any Integer would do (or even an integer if the 
mutex block is using another object).

> ClusterEventProducerTest.testEvents is unstable
> ---
>
> Key: SOLR-15122
> URL: https://issues.apache.org/jira/browse/SOLR-15122
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Tests
>Reporter: Mike Drob
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This test looks to be unstable according to Jenkins since about Nov 5. I just 
> started seeing occasional failures locally when running the whole suite but 
> cannot reproduce when running in isolation.
> https://lists.apache.org/thread.html/rf0c16b257bc3236ea414be51451806352b55f15d4949f4fd54a3b71a%40%3Cbuilds.lucene.apache.org%3E






[GitHub] [lucene-solr] sigram commented on a change in pull request #2291: SOLR-15122 Replace volatile+sleep with wait/notify

2021-02-02 Thread GitBox


sigram commented on a change in pull request #2291:
URL: https://github.com/apache/lucene-solr/pull/2291#discussion_r568832617



##
File path: 
solr/core/src/java/org/apache/solr/cluster/events/impl/DelegatingClusterEventProducer.java
##
@@ -90,7 +95,10 @@ public void setDelegate(ClusterEventProducer newDelegate) {
 log.debug("--- delegate {} already in state {}", delegate, 
delegate.getState());
   }
 }
-this.version++;
+synchronized (version) {

Review comment:
   I don't think we need AtomicInteger if all sections that access 
`version` are synchronized?








[GitHub] [lucene-solr] sigram commented on a change in pull request #2291: SOLR-15122 Replace volatile+sleep with wait/notify

2021-02-02 Thread GitBox


sigram commented on a change in pull request #2291:
URL: https://github.com/apache/lucene-solr/pull/2291#discussion_r568833400



##
File path: 
solr/core/src/java/org/apache/solr/cluster/events/impl/DelegatingClusterEventProducer.java
##
@@ -144,7 +152,25 @@ public synchronized void stop() {
   }
 
   @VisibleForTesting
-  public int getVersion() {
-return version;
+  public int waitForVersionChange(int currentVersion, int timeoutSec) throws 
InterruptedException, TimeoutException {

Review comment:
   I debated whether to add this to the wrappers... it's only needed in 
tests. OTOH putting it here makes the test code much simpler.








[GitHub] [lucene-solr] sigram commented on a change in pull request #2291: SOLR-15122 Replace volatile+sleep with wait/notify

2021-02-02 Thread GitBox


sigram commented on a change in pull request #2291:
URL: https://github.com/apache/lucene-solr/pull/2291#discussion_r568833741



##
File path: 
solr/core/src/test/org/apache/solr/cluster/events/ClusterEventProducerTest.java
##
@@ -292,7 +287,7 @@ public void testListenerPlugins() throws Exception {
 .build();
 V2Response rsp = req.process(cluster.getSolrClient());
 assertEquals(0, rsp.getStatus());
-version = waitForVersionChange(-1, 10);
+version = waitForVersionChange(version, 10);

Review comment:
   Gah.. copy/paste error.








[jira] [Updated] (SOLR-15128) nodeName does not contain expected ':' separator: localhost

2021-02-02 Thread Timothy Potter (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Potter updated SOLR-15128:
--
Fix Version/s: master (9.0)

> nodeName does not contain expected ':' separator: localhost
> ---
>
> Key: SOLR-15128
> URL: https://issues.apache.org/jira/browse/SOLR-15128
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
> Environment: Only seems to affect master, 8.8 is not affected.
>Reporter: Timothy Potter
>Assignee: Timothy Potter
>Priority: Major
> Fix For: master (9.0)
>
>
> {code}
> "error":{"msg":"nodeName does not contain expected ':' separator: 
> localhost","trace":"java.lang.IllegalArgumentException: nodeName does not 
> contain expected ':' separator: localhost\n\tat 
> org.apache.solr.common.util.Utils.getBaseUrlForNodeName(Utils.java:764)\n\tat 
> org.apache.solr.common.util.Utils.getBaseUrlForNodeName(Utils.java:759)\n\tat 
> org.apache.solr.common.cloud.UrlScheme.getBaseUrlForNodeName(UrlScheme.java:54)\n\ta{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (SOLR-15128) nodeName does not contain expected ':' separator: localhost

2021-02-02 Thread Timothy Potter (Jira)
Timothy Potter created SOLR-15128:
-

 Summary: nodeName does not contain expected ':' separator: 
localhost
 Key: SOLR-15128
 URL: https://issues.apache.org/jira/browse/SOLR-15128
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
 Environment: Only seems to affect master, 8.8 is not affected.
Reporter: Timothy Potter
Assignee: Timothy Potter


{code}
"error":{"msg":"nodeName does not contain expected ':' separator: 
localhost","trace":"java.lang.IllegalArgumentException: nodeName does not 
contain expected ':' separator: localhost\n\tat 
org.apache.solr.common.util.Utils.getBaseUrlForNodeName(Utils.java:764)\n\tat 
org.apache.solr.common.util.Utils.getBaseUrlForNodeName(Utils.java:759)\n\tat 
org.apache.solr.common.cloud.UrlScheme.getBaseUrlForNodeName(UrlScheme.java:54)\n\ta{code}






[jira] [Commented] (SOLR-15128) nodeName does not contain expected ':' separator: localhost

2021-02-02 Thread Timothy Potter (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277376#comment-17277376
 ] 

Timothy Potter commented on SOLR-15128:
---

Not sure why master is seeing {{localhost}} as the nodeName without a port on it, but 
this is breaking the parsing in {{Utils.getBaseUrlForNodeName}}.







[jira] [Resolved] (SOLR-15128) nodeName does not contain expected ':' separator: localhost

2021-02-02 Thread Timothy Potter (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Potter resolved SOLR-15128.
---
Resolution: Won't Fix

Ignore this ;-) Was using the wrong method to get the nodeName in some code 
that only exists on master. {{zkcontroller.getNodeName}} is the correct way to 
get the nodeName for converting to a URL.







[jira] [Commented] (SOLR-15122) ClusterEventProducerTest.testEvents is unstable

2021-02-02 Thread Mike Drob (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277399#comment-17277399
 ] 

Mike Drob commented on SOLR-15122:
--

I'm pretty comfortable with that blanket statement. If you're using volatile, 
that means you expect multiple threads accessing, and if you have multiple 
threads writing then you shouldn't be using non-atomic postfix increment. If 
you can guarantee that you only have one writer, and the volatile is for the 
readers, then... maybe it's ok? It's still trappy and can lead to issues down 
the line.

I used AtomicInteger because we don't have a mutable Integer and needed an 
object anyway for the sync block. I can rewrite that with an Object and an int 
if you prefer.

> ClusterEventProducerTest.testEvents is unstable
> ---
>
> Key: SOLR-15122
> URL: https://issues.apache.org/jira/browse/SOLR-15122
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Tests
>Reporter: Mike Drob
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This test looks to be unstable according to Jenkins since about Nov 5. I just 
> started seeing occasional failures locally when running the whole suite but 
> cannot reproduce when running in isolation.
> https://lists.apache.org/thread.html/rf0c16b257bc3236ea414be51451806352b55f15d4949f4fd54a3b71a%40%3Cbuilds.lucene.apache.org%3E






[GitHub] [lucene-solr] madrob commented on a change in pull request #2291: SOLR-15122 Replace volatile+sleep with wait/notify

2021-02-02 Thread GitBox


madrob commented on a change in pull request #2291:
URL: https://github.com/apache/lucene-solr/pull/2291#discussion_r568852727



##
File path: 
solr/core/src/java/org/apache/solr/cluster/events/impl/DelegatingClusterEventProducer.java
##
@@ -144,7 +152,25 @@ public synchronized void stop() {
   }
 
   @VisibleForTesting
-  public int getVersion() {
-return version;
+  public int waitForVersionChange(int currentVersion, int timeoutSec) throws 
InterruptedException, TimeoutException {

Review comment:
   It should really go in a third place I think, because then we can reuse 
the versioning logic between this and 
DelegatingPlacementPluginFactory/IntegrationTest








[jira] [Commented] (SOLR-15122) ClusterEventProducerTest.testEvents is unstable

2021-02-02 Thread Ilan Ginzburg (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277406#comment-17277406
 ] 

Ilan Ginzburg commented on SOLR-15122:
--

You're right about no mutable Integer.

Would there be a way for the test code to pass a synchronization object (a 
latch? that would be null for non test code) that prod code would use so that 
we don't end up with large methods in production classes that are only used for 
tests?

> ClusterEventProducerTest.testEvents is unstable
> ---
>
> Key: SOLR-15122
> URL: https://issues.apache.org/jira/browse/SOLR-15122
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Tests
>Reporter: Mike Drob
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This test looks to be unstable according to Jenkins since about Nov 5. I just 
> started seeing occasional failures locally when running the whole suite but 
> cannot reproduce when running in isolation.
> https://lists.apache.org/thread.html/rf0c16b257bc3236ea414be51451806352b55f15d4949f4fd54a3b71a%40%3Cbuilds.lucene.apache.org%3E






[jira] [Commented] (SOLR-15122) ClusterEventProducerTest.testEvents is unstable

2021-02-02 Thread Mike Drob (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277423#comment-17277423
 ] 

Mike Drob commented on SOLR-15122:
--

Refactored the code in anticipation of working over 
DelegatingPlacementPluginFactory as well, and moved the wait logic to test sources 
with an int and an Object instead of AtomicInteger. Please take a look before 
I consolidate the other implementation.

> ClusterEventProducerTest.testEvents is unstable
> ---
>
> Key: SOLR-15122
> URL: https://issues.apache.org/jira/browse/SOLR-15122
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Tests
>Reporter: Mike Drob
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This test looks to be unstable according to Jenkins since about Nov 5. I just 
> started seeing occasional failures locally when running the whole suite but 
> cannot reproduce when running in isolation.
> https://lists.apache.org/thread.html/rf0c16b257bc3236ea414be51451806352b55f15d4949f4fd54a3b71a%40%3Cbuilds.lucene.apache.org%3E






[GitHub] [lucene-solr] orenovadia commented on a change in pull request #2231: LUCENE-9680 - Re-add IndexWriter::getFieldNames

2021-02-02 Thread GitBox


orenovadia commented on a change in pull request #2231:
URL: https://github.com/apache/lucene-solr/pull/2231#discussion_r568890937



##
File path: lucene/core/src/test/org/apache/lucene/index/TestIndexWriter.java
##
@@ -4600,4 +4600,49 @@ public void testIndexWriterBlocksOnStall() throws 
IOException, InterruptedExcept
   }
 }
   }
+
+  public void testGetFieldNames() throws IOException {
+    Directory dir = newDirectory();
+
+    IndexWriter writer = new IndexWriter(dir, newIndexWriterConfig(new MockAnalyzer(random())));
+
+    assertEquals(Set.of(), writer.getFieldNames());
+
+    addDocWithField(writer, "f1");
+    assertEquals(Set.of("f1"), writer.getFieldNames());
+
+    // should be unmodifiable:
+    final Set<String> fieldSet = writer.getFieldNames();
+    assertThrows(UnsupportedOperationException.class, () -> fieldSet.add("cannot modify"));
+    assertThrows(UnsupportedOperationException.class, () -> fieldSet.remove("f1"));
+
+    addDocWithField(writer, "f2");
+    assertEquals(Set.of("f1", "f2"), writer.getFieldNames());

Review comment:
   Sounds good!
   Added in: 1bc95ae7f7e








[GitHub] [lucene-solr] dweiss merged pull request #2277: LUCENE-9716: Hunspell: support flag usage before its format is even specified

2021-02-02 Thread GitBox


dweiss merged pull request #2277:
URL: https://github.com/apache/lucene-solr/pull/2277


   






[jira] [Resolved] (LUCENE-9716) Hunspell: support flag usage before its format is even specified

2021-02-02 Thread Dawid Weiss (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss resolved LUCENE-9716.
-
Fix Version/s: master (9.0)
   Resolution: Fixed

> Hunspell: support flag usage before its format is even specified
> 
>
> Key: LUCENE-9716
> URL: https://issues.apache.org/jira/browse/LUCENE-9716
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Peter Gromov
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> e.g. `nl` dictionaries first have `KEEPCASE Kc` and only then `FLAG long`






[jira] [Commented] (LUCENE-9716) Hunspell: support flag usage before its format is even specified

2021-02-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277437#comment-17277437
 ] 

ASF subversion and git services commented on LUCENE-9716:
-

Commit 8f75933f3dae9f334e7d302bbfdc05d2b2e3c979 in lucene-solr's branch 
refs/heads/master from Peter Gromov
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8f75933 ]

LUCENE-9716: Hunspell: support flag usage before its format is even specified 
(#2277)



> Hunspell: support flag usage before its format is even specified
> 
>
> Key: LUCENE-9716
> URL: https://issues.apache.org/jira/browse/LUCENE-9716
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Peter Gromov
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> e.g. `nl` dictionaries first have `KEEPCASE Kc` and only then `FLAG long`






[GitHub] [lucene-solr] murblanc commented on a change in pull request #2291: SOLR-15122 Replace volatile+sleep with wait/notify

2021-02-02 Thread GitBox


murblanc commented on a change in pull request #2291:
URL: https://github.com/apache/lucene-solr/pull/2291#discussion_r568912229



##
File path: 
solr/core/src/java/org/apache/solr/cluster/events/impl/DelegatingClusterEventProducer.java
##
@@ -90,7 +96,9 @@ public void setDelegate(ClusterEventProducer newDelegate) {
 log.debug("--- delegate {} already in state {}", delegate, 
delegate.getState());
   }
 }
-this.version++;
+if (versionTracker != null) {

Review comment:
   We have a synchronization issue (memory barrier type, not concurrent 
access type). The thread calling `setDelegate()` is accessing `versionTracker` 
set by another thread without synchronization.
   Can be fixed by making `versionTracker` volatile.
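
   Roughly what I mean, as a sketch only. The class name is a placeholder and `Runnable` stands in for the tracker type, so this is an assumption about shape rather than the actual class:

```java
// Hypothetical sketch of safe publication through a volatile field.
final class ProducerSketch {
  // volatile: a write by the test thread happens-before later reads by other threads
  private volatile Runnable versionTracker;

  void setVersionTracker(Runnable tracker) {
    this.versionTracker = tracker;
  }

  void setDelegate() {
    // ... swap the delegate here ...
    Runnable tracker = versionTracker; // read the field once into a local
    if (tracker != null) {             // so the null check and the call see the same value
      tracker.run();
    }
  }
}
```

   Reading the field into a local first also avoids a second, possibly different, read between the null check and the call.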

##
File path: solr/core/src/test/org/apache/solr/cluster/VersionTrackerImpl.java
##
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.cluster;
+
+import org.apache.solr.common.util.TimeSource;
+import org.apache.solr.util.TimeOut;
+
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+public class VersionTrackerImpl implements VersionTracker {
+private int version = 0;
+
+@Override
+public synchronized void increment() {
+version++;
+this.notifyAll();
+}
+
+@Override
+public int waitForVersionChange(int currentVersion, int timeoutSec) throws 
InterruptedException, TimeoutException {
+TimeOut timeout = new TimeOut(timeoutSec, TimeUnit.SECONDS, 
TimeSource.NANO_TIME);
+int newVersion = currentVersion;
+while (! timeout.hasTimedOut()) {
+synchronized (this) {
+if ((newVersion = version) != currentVersion) {
+break;
+}
+this.wait(timeout.timeLeft(TimeUnit.MILLISECONDS));
+}
+}
+if (newVersion < currentVersion) {
+// ArithmeticException? This means we overflowed
+throw new RuntimeException("Invalid version - went back! 
currentVersion=" + currentVersion +
+" newVersion=" + newVersion);
+} else if (newVersion == currentVersion) {
+throw new TimeoutException("Timed out waiting for version 
change.");

Review comment:
   Add the version value to the exception, might help debug tests.

##
File path: solr/core/src/java/org/apache/solr/cluster/VersionTracker.java
##
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.cluster;
+
+import java.util.concurrent.TimeoutException;
+
+/**
+ * Allows for tracking state change from test classes. Typical use will be to 
set a version tracker on a stateful
+ * object, which will call {@link #increment()} every time state changes. Test 
clients observing the state will call
+ * {@link #waitForVersionChange(int, int)} to be notified of the next 
increment call.
+ */
+public interface VersionTracker {

Review comment:
   Tracking versions by incrementing is one possible implementation of this 
interface, but maybe the interface doesn't have to hint that this is the 
implementation?
   Renaming `increment` into `notifyEvent` or something similar and 
`VersionTracker` into `NotificationCallback` would make it more generic (not 
suggesting these actual names, but you get the idea).
   
   `waitForVersionChange` doesn't have to b

[jira] [Commented] (SOLR-15122) ClusterEventProducerTest.testEvents is unstable

2021-02-02 Thread Ilan Ginzburg (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277448#comment-17277448
 ] 

Ilan Ginzburg commented on SOLR-15122:
--

Added a few comments. I like this approach much better. Thanks.

> ClusterEventProducerTest.testEvents is unstable
> ---
>
> Key: SOLR-15122
> URL: https://issues.apache.org/jira/browse/SOLR-15122
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Tests
>Reporter: Mike Drob
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This test looks to be unstable according to Jenkins since about Nov 5. I just 
> started seeing occasional failures locally when running the whole suite but 
> cannot reproduce when running in isolation.
> https://lists.apache.org/thread.html/rf0c16b257bc3236ea414be51451806352b55f15d4949f4fd54a3b71a%40%3Cbuilds.lucene.apache.org%3E






[jira] [Updated] (SOLR-15092) Loosen Ref Guide link checking to allow empty anchors in links

2021-02-02 Thread Chris M. Hostetter (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris M. Hostetter updated SOLR-15092:
--
Attachment: SOLR-15092.patch
  Assignee: Chris M. Hostetter
Status: Open  (was: Open)

the attached patch takes care of relaxing this rule, while still ensuring that 
if an anchor is used, it must exist in the linked file.

Once this patch is applied, the following perl command can be run to "clean up" 
any no longer needed anchors that point at the id on the body of each page...
{noformat}
perl -i -ple 's/<<(.*?)\.adoc#\1,/<<$1.adoc#,/g' src/*.adoc
{noformat}
...although in at least one place some manual cleanup needs to be done, because 
otherwise asciidoctor gets confused by this line in {{language-analysis.adoc}} 
...
{noformat}
... Blank lines and lines that begin with "#" are ignored.  See 
<> for more information.
{noformat}
...and thinks the {{# ... #}} bit is supposed to be "highlighted" using html5 
{{}} tags...
{noformat}
... Blank lines and lines that begin with "" are ignored.  See Resource Loading for more information.
{noformat}
...so we'll have to either escape or backtick-quote the first {{#}} character 
in the line.

(I didn't include the modifications made by the perl command in the patch, 
because we'll want to run that command on each branch given the other content 
changes between master & branch_8x)
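
For anyone who wants to check what that substitution does before running it, here is the same regex as a small Java sketch. The class name and the sample links are made up for illustration and are not part of the patch:

```java
import java.util.regex.Pattern;

// Hypothetical illustration of the substitution performed by the perl one-liner.
final class AnchorCleanupSketch {
  // <<page.adoc#page,  becomes  <<page.adoc#,  but only when the anchor equals the file name
  private static final Pattern REDUNDANT_ANCHOR = Pattern.compile("<<(.*?)\\.adoc#\\1,");

  static String clean(String line) {
    return REDUNDANT_ANCHOR.matcher(line).replaceAll("<<$1.adoc#,");
  }
}
```

The backreference is what restricts the rewrite to anchors that repeat the file name, so a link to a real sub-section such as {{page-title.adoc#some-section}} is left alone.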

> Loosen Ref Guide link checking to allow empty anchors in links
> --
>
> Key: SOLR-15092
> URL: https://issues.apache.org/jira/browse/SOLR-15092
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cassandra Targett
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-15092.patch
>
>
> Back when we were publishing the PDF, we needed to be sure to include an 
> explicit section title as an anchor for all inter-document links (such as 
> {{<>}}) because when the entire guide 
> was assembled into a single file the explicit anchor ensured links went to 
> the right spot in the overall Guide.
> Without the PDF, if we want to just link to another page in its entirety and 
> not a sub-section of a page, we can use a shorter syntax with an empty 
> anchor: {{<>}}. I can't find this explicitly 
> documented, but it does construct a correct link (i.e., {{ href="page-title.html#">Page Title}}).
> However, our link checking will fail this structure because it still assumes 
> we must have a section name in the anchor and won't allow blank anchors. This 
> issue is to loosen that check a bit and update the Ref Guide how-to docs to 
> show it as a possible option.






[GitHub] [lucene-solr] madrob commented on a change in pull request #2291: SOLR-15122 Replace volatile+sleep with wait/notify

2021-02-02 Thread GitBox


madrob commented on a change in pull request #2291:
URL: https://github.com/apache/lucene-solr/pull/2291#discussion_r568943853



##
File path: solr/core/src/test/org/apache/solr/cluster/VersionTrackerImpl.java
##
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.cluster;
+
+import org.apache.solr.common.util.TimeSource;
+import org.apache.solr.util.TimeOut;
+
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+public class VersionTrackerImpl implements VersionTracker {
+private int version = 0;
+
+@Override
+public synchronized void increment() {
+version++;
+this.notifyAll();
+}
+
+@Override
+public int waitForVersionChange(int currentVersion, int timeoutSec) throws 
InterruptedException, TimeoutException {
+TimeOut timeout = new TimeOut(timeoutSec, TimeUnit.SECONDS, 
TimeSource.NANO_TIME);
+int newVersion = currentVersion;
+while (! timeout.hasTimedOut()) {

Review comment:
   It's hard to do profiling on this since it generally only loops once on 
my machine, but I'll switch it to the loop inside of the block. It's part of 
the condition that we are implicitly checking with the wait, so it makes sense 
this way too.








[GitHub] [lucene-solr] madrob commented on a change in pull request #2291: SOLR-15122 Replace volatile+sleep with wait/notify

2021-02-02 Thread GitBox


madrob commented on a change in pull request #2291:
URL: https://github.com/apache/lucene-solr/pull/2291#discussion_r568944250



##
File path: 
solr/core/src/test/org/apache/solr/cluster/events/ClusterEventProducerTest.java
##
@@ -102,7 +105,7 @@ public void teardown() throws Exception {
 
   @Test
   public void testEvents() throws Exception {
-int version = waitForVersionChange(-1, 10);
+int version = versionTracker.waitForVersionChange(-1, 10);

Review comment:
   -1 is the last value we've "seen", I'll add some docs around this. 
Effectively this is a getVersion in this implementation.








[GitHub] [lucene-solr] murblanc commented on a change in pull request #2291: SOLR-15122 Replace volatile+sleep with wait/notify

2021-02-02 Thread GitBox


murblanc commented on a change in pull request #2291:
URL: https://github.com/apache/lucene-solr/pull/2291#discussion_r568968626



##
File path: 
solr/core/src/test/org/apache/solr/cluster/CountingStateChangeListener.java
##
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.cluster;
+
+import org.apache.solr.common.util.TimeSource;
+import org.apache.solr.util.TimeOut;
+
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+/**
+ * A counting StateChangeListener that will internally track how many times 
{@link #stateChanged()} has been called.
+ * Consumers can compare the number of state change calls before and after an 
event to determine if they should proceed,
+ * made simple with {@link #waitForVersionChange(int, int)} method.
+ */
+public class CountingStateChangeListener implements StateChangeListener {
+private int version = 0;
+
+@Override
+public synchronized void stateChanged() {
+version++;
+this.notifyAll();
+}
+
+/**
+ * Given a last known number of state changes, wait for additional changes 
to come in. If no state changes have
+ * occurred beyond the known value, this method will wait for additional 
changes to come in.
+ * If the current number of change events is unknown to the caller, then 
this method can be called with -1
+ * to return immediately with the number of events up to this point.
+ * @param lastVersion the previous number of changes seen
+ * @param timeoutSec how long to wait for additional changes to occur
+ * @return the number of changes seen since initialization
+ */
+public int waitForVersionChange(int lastVersion, int timeoutSec) throws 
InterruptedException, TimeoutException {
+TimeOut timeout = new TimeOut(timeoutSec, TimeUnit.SECONDS, 
TimeSource.NANO_TIME);
+int newVersion = lastVersion;
+synchronized (this) {
+while (!timeout.hasTimedOut() && (newVersion = version) != 
lastVersion) {

Review comment:
   I don't get the condition here. Shouldn't we loop while `version == 
lastVersion` (so we exit the loop when it changes or when we time out) rather 
than loop while they're different?
   
   I suspect that the test was always passing on your machine even without 
this improved lockstep synchronization, and that it continues to pass for the 
same reason.
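
A minimal sketch of the loop shape the comment above asks for: keep waiting while the version is unchanged, and leave the loop on a change or on timeout. This is not the code from the PR. `System.nanoTime()` stands in for Solr's `TimeOut` helper, and the class name is hypothetical.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for the test listener; the class under review implements
// Solr's StateChangeListener and uses org.apache.solr.util.TimeOut instead.
class WaitNotifyListenerSketch {
  private int version = 0;

  public synchronized void stateChanged() {
    version++;
    notifyAll(); // wake every thread blocked in waitForVersionChange
  }

  public synchronized int waitForVersionChange(int lastVersion, int timeoutSec)
      throws InterruptedException, TimeoutException {
    long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(timeoutSec);
    // Loop while nothing has changed; the loop also absorbs spurious wakeups.
    while (version == lastVersion) {
      long remainingNanos = deadline - System.nanoTime();
      if (remainingNanos <= 0) {
        throw new TimeoutException("no state change within " + timeoutSec + "s");
      }
      // Timed Object.wait on this monitor, which the synchronized method holds.
      TimeUnit.NANOSECONDS.timedWait(this, remainingNanos);
    }
    return version;
  }
}
```

With this shape, passing -1 returns at once with the current count, because the version is never negative.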


[GitHub] [lucene-solr] msokolov merged pull request #2282: LUCENE-9615: Expose HnswGraphBuilder index-time hyperparameters as FieldType attributes

2021-02-02 Thread GitBox


msokolov merged pull request #2282:
URL: https://github.com/apache/lucene-solr/pull/2282


   






[jira] [Commented] (LUCENE-9615) Expose HnswGraphBuilder index-time hyperparameters

2021-02-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277502#comment-17277502
 ] 

ASF subversion and git services commented on LUCENE-9615:
-

Commit a53e8e722884e5655206292590da67bb71efc34d in lucene-solr's branch 
refs/heads/master from sbeniwal12
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a53e8e7 ]

LUCENE-9615: Expose HnswGraphBuilder index-time hyperparameters as FieldType 
attributes (from Shubham Beniwal)



> Expose HnswGraphBuilder index-time hyperparameters
> --
>
> Key: LUCENE-9615
> URL: https://issues.apache.org/jira/browse/LUCENE-9615
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Michael Sokolov
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> HnswGraphBuilder has a few tunables: maxConnections, beamWidth, and we may 
> add a few more, such as whether to use a diversity heuristic when choosing 
> neighbors to link in the graph. Currently these are locked to defaults set by 
> global variables. Instead we should provide some interface for configuring 
> them. The best candidate so far seems to be to add them either as attributes 
> on a FieldType, or as Codec level configurations.






[GitHub] [lucene-solr] madrob commented on a change in pull request #2291: SOLR-15122 Replace volatile+sleep with wait/notify

2021-02-02 Thread GitBox


madrob commented on a change in pull request #2291:
URL: https://github.com/apache/lucene-solr/pull/2291#discussion_r568977953



##
File path: 
solr/core/src/test/org/apache/solr/cluster/CountingStateChangeListener.java
##
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.cluster;
+
+import org.apache.solr.common.util.TimeSource;
+import org.apache.solr.util.TimeOut;
+
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+/**
+ * A counting StateChangeListener that will internally track how many times 
{@link #stateChanged()} has been called.
+ * Consumers can compare the number of state change calls before and after an 
event to determine if they should proceed,
+ * made simple with {@link #waitForVersionChange(int, int)} method.
+ */
+public class CountingStateChangeListener implements StateChangeListener {
+private int version = 0;
+
+@Override
+public synchronized void stateChanged() {
+version++;
+this.notifyAll();
+}
+
+/**
+ * Given a last known number of state changes, wait for additional changes 
to come in. If no state changes have
+ * occurred beyond the known value, this method will wait for additional 
changes to come in.
+ * If the current number of change events is unknown to the caller, then 
this method can be called with -1
+ * to return immediately with the number of events up to this point.
+ * @param lastVersion the previous number of changes seen
+ * @param timeoutSec how long to wait for additional changes to occur
+ * @return the number of changes seen since initialization
+ */
+public int waitForVersionChange(int lastVersion, int timeoutSec) throws 
InterruptedException, TimeoutException {
+TimeOut timeout = new TimeOut(timeoutSec, TimeUnit.SECONDS, 
TimeSource.NANO_TIME);
+int newVersion = lastVersion;
+synchronized (this) {
+while (!timeout.hasTimedOut() && (newVersion = version) != 
lastVersion) {

Review comment:
   It's a copy-paste error








[GitHub] [lucene-solr] msokolov commented on a change in pull request #2231: LUCENE-9680 - Re-add IndexWriter::getFieldNames

2021-02-02 Thread GitBox


msokolov commented on a change in pull request #2231:
URL: https://github.com/apache/lucene-solr/pull/2231#discussion_r568979069



##
File path: lucene/core/src/test/org/apache/lucene/index/TestIndexWriter.java
##
@@ -4600,4 +4600,49 @@ public void testIndexWriterBlocksOnStall() throws 
IOException, InterruptedExcept
   }
 }
   }
+
+  public void testGetFieldNames() throws IOException {
+Directory dir = newDirectory();
+
+IndexWriter writer = new IndexWriter(dir, newIndexWriterConfig(new 
MockAnalyzer(random())));
+
+assertEquals(Set.of(), writer.getFieldNames());
+
+addDocWithField(writer, "f1");
+assertEquals(Set.of("f1"), writer.getFieldNames());
+
+// should be unmodifiable:
+final Set<String> fieldSet = writer.getFieldNames();
+assertThrows(UnsupportedOperationException.class, () -> 
fieldSet.add("cannot modify"));
+assertThrows(UnsupportedOperationException.class, () -> 
fieldSet.remove("f1"));
+
+addDocWithField(writer, "f2");
+assertEquals(Set.of("f1", "f2"), writer.getFieldNames());

Review comment:
   thanks, @orenovadia !








[GitHub] [lucene-solr] msokolov merged pull request #2231: LUCENE-9680 - Re-add IndexWriter::getFieldNames

2021-02-02 Thread GitBox


msokolov merged pull request #2231:
URL: https://github.com/apache/lucene-solr/pull/2231


   






[jira] [Commented] (LUCENE-9680) Re-add IndexWriter.getFieldNames

2021-02-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277508#comment-17277508
 ] 

ASF subversion and git services commented on LUCENE-9680:
-

Commit 8d0cbcbb53139413a3fdbb364764e811145b2ccf in lucene-solr's branch 
refs/heads/master from orenovadia
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8d0cbcb ]

LUCENE-9680 - Re-add IndexWriter::getFieldNames



> Re-add IndexWriter.getFieldNames
> 
>
> Key: LUCENE-9680
> URL: https://issues.apache.org/jira/browse/LUCENE-9680
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Oren Ovadia
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> IndexWriter.getFieldNames was deprecated in LUCENE-8909.
> It is useful to have this information exposed by IW to cap (or report) when 
> too many fields have been created.
> getFieldNames was introduced in LUCENE-7659.






[jira] [Commented] (LUCENE-9680) Re-add IndexWriter.getFieldNames

2021-02-02 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277510#comment-17277510
 ] 

Michael Sokolov commented on LUCENE-9680:
-

[~oren.ovadia] do you also want to backport to branch_8x? It doesn't seem 
urgent, but if it would be useful to you to have this in the next 8.x release, 
you might want to do so.







[jira] [Commented] (SOLR-15122) ClusterEventProducerTest.testEvents is unstable

2021-02-02 Thread Mike Drob (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277519#comment-17277519
 ] 

Mike Drob commented on SOLR-15122:
--

I was thinking about this some more, and think we should use a Phaser instead 
of rolling our own concurrency implementation. Thoughts?
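
A rough sketch of what the Phaser idea could look like, under the assumption that one registered party represents the event source, so every `arrive()` advances the phase by one and the phase number plays the role of the version counter. The class name is hypothetical and nothing here is taken from the PR.

```java
import java.util.concurrent.Phaser;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; in the real test this would sit behind the
// StateChangeListener interface rather than expose the Phaser.
class PhaserListenerSketch {
  // A single registered party: each arrive() completes the current phase.
  private final Phaser phaser = new Phaser(1);

  public void stateChanged() {
    phaser.arrive();
  }

  public int waitForVersionChange(int lastVersion, int timeoutSec)
      throws InterruptedException, TimeoutException {
    if (lastVersion < 0) {
      // awaitAdvanceInterruptibly returns a negative argument unchanged,
      // so report the current phase explicitly for the "unknown" case.
      return phaser.getPhase();
    }
    // Returns immediately if the phase already differs from lastVersion,
    // otherwise blocks until it advances or the timeout elapses.
    return phaser.awaitAdvanceInterruptibly(lastVersion, timeoutSec, TimeUnit.SECONDS);
  }
}
```

The waiting and timeout handling move into `java.util.concurrent`, which is the main appeal over a hand-written wait/notify loop.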

> ClusterEventProducerTest.testEvents is unstable
> ---
>
> Key: SOLR-15122
> URL: https://issues.apache.org/jira/browse/SOLR-15122
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Tests
>Reporter: Mike Drob
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> This test looks to be unstable according to Jenkins since about Nov 5. I just 
> started seeing occasional failures locally when running the whole suite but 
> cannot reproduce when running in isolation.
> https://lists.apache.org/thread.html/rf0c16b257bc3236ea414be51451806352b55f15d4949f4fd54a3b71a%40%3Cbuilds.lucene.apache.org%3E






[jira] [Commented] (SOLR-5480) Make MoreLikeThisHandler distributable

2021-02-02 Thread Isabelle Giguere (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277522#comment-17277522
 ] 

Isabelle Giguere commented on SOLR-5480:


[~erickerickson], [~noble.paul], [~anshum], [~hossman]
Before we deprecate the MLT Handler, can we please have some sort of valid 
solution for passing in text to the MLT QParser? This is needed to support use 
cases where the id of the initial document is not known.

https://issues.apache.org/jira/browse/SOLR-7913?focusedCommentId=17267477&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17267477

The main purpose of SOLR-7913 is to pass plain text to MLT QParser.  It 
concentrates on stream.body, because, at one time, it looked like the best way 
to do so.
But if text could be passed to MLT QParser in any other way, there would be no 
reason to insist on using stream.body.


> Make MoreLikeThisHandler distributable
> --
>
> Key: SOLR-5480
> URL: https://issues.apache.org/jira/browse/SOLR-5480
> Project: Solr
>  Issue Type: Improvement
>  Components: MoreLikeThis
>Reporter: Steve Molloy
>Assignee: Noble Paul
>Priority: Major
> Attachments: MoreLikeThisHandlerTestST.txt, SOLR-5480.patch, 
> SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, 
> SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, 
> SOLR-5480.patch, SOLR-5480.patch
>
>
> The MoreLikeThis component, when used in the standard search handler, supports 
> distributed searches. But the MoreLikeThisHandler itself doesn't, which 
> prevents, say, passing in text to perform the query. I'll start looking 
> into adapting the SearchHandler logic to the MoreLikeThisHandler. If anyone 
> has some work done already and wants to share, or wants to contribute, any help 
> will be welcome. 






[jira] [Commented] (SOLR-15122) ClusterEventProducerTest.testEvents is unstable

2021-02-02 Thread Ilan Ginzburg (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277529#comment-17277529
 ] 

Ilan Ginzburg commented on SOLR-15122:
--

I've never used this specific class but if you make it so that it's hidden 
behind the {{StateChangeListener}} interface (i.e. it is a test writer 
implementation choice) then I'm perfectly fine with it.

I'd be much more hesitant to expose a specific concurrency class in the 
interface though.







[jira] [Commented] (SOLR-15127) All-In-One Dockerfile for building local images as well as reproducible release builds directly from (remote) git tags

2021-02-02 Thread Houston Putman (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277530#comment-17277530
 ] 

Houston Putman commented on SOLR-15127:
---

I think there are two possible ways of going forward with something that the 
official Docker image people might be ok with.
 # Using the git tags, as with your example. Doing the gradle build in the 
multi-stage build.
 # [~dsmiley]'s suggestion of using the Solr TGZ release as the docker context 
itself.
 ** In order to have the Solr TGZ become the docker context, we would merely 
need to add the Dockerfile and solr/docker/scripts to the release.

I'll put up a PR that would use the Solr TGZ as the docker context, allowing us 
to use docker build directly with the released artifacts. That way we can 
compare pros/cons of each approach.

Besides this bigger question, there are some things I really like in your patch:
 * Trying to remove the SOLR_VERSION argument (Big improvement, as there would 
be no required ARGs)
 ** I think we can actually add the version as a file inside the release, and 
then read it into an env var as a part of RUN.
Then we can sym-link from /opt/solr to /opt/solr-$version, to keep backwards 
compatibility.
 * Consolidating the last two RUN layers

I am split on the jattach thing. It will be great when it can be moved to the 
{{apt-get install}} section. Until then, I don't mind if it's fetched in the 
actual image or the builder image. Did you move it to the builder so that the 
final image wouldn't need the GITHUB_URL arg?

> All-In-One Dockerfile for building local images as well as reproducible 
> release builds directly from (remote) git tags
> --
>
> Key: SOLR-15127
> URL: https://issues.apache.org/jira/browse/SOLR-15127
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-15127.patch
>
>
> There was a recent dev@lucene discussion about the future of the 
> github/docker-solr repo and (Apache) "official" solr docker images and using 
> the "apache/solr" naming vs (docker-library official) "_/solr" names...
> http://mail-archives.apache.org/mod_mbox/lucene-dev/202101.mbox/%3CCAD4GwrNCPEnAJAjy4tY%3DpMeX5vWvnFyLe9ZDaXmF4J8XchA98Q%40mail.gmail.com%3E
> In that discussion, mak pointed out that docker-library evidently allows for 
> some more flexibility in the way "official" docker-library packages can be 
> built (compared to the rules that were evidently in place when mak set up 
> the current docker-solr image building process/tooling), pointing out how the 
> "docker official" elasticsearch images are currently built from the "elastic 
> official" elasticsearch images...
> http://mail-archives.apache.org/mod_mbox/lucene-dev/202101.mbox/%3C3CED9683-1DD2-4F08-97F9-4FC549EDE47D%40greenhills.co.uk%3E
> Based on this, I proposed that we could probably restructure the Solr 
> Dockerfile so that it could be useful for both "local development" -- using 
> the current repo checkout -- as well as for "apache official" apache/solr 
> images that could be reproducibly built directly from pristine git tags using 
> the remote git URL syntax supported by "docker build" (and then -- evidently 
> -- extended by trivial one line Dockerfiles for the "docker-library official" 
> _/solr images)...
> http://mail-archives.apache.org/mod_mbox/lucene-dev/202101.mbox/%3Calpine.DEB.2.21.2101221423340.16298%40slate%3E
> This jira tracks this idea.






[jira] [Created] (SOLR-15129) Use the Solr TGZ artifact as Docker context

2021-02-02 Thread Houston Putman (Jira)
Houston Putman created SOLR-15129:
-

 Summary: Use the Solr TGZ artifact as Docker context
 Key: SOLR-15129
 URL: https://issues.apache.org/jira/browse/SOLR-15129
 Project: Solr
  Issue Type: Sub-task
  Security Level: Public (Default Security Level. Issues are Public)
Affects Versions: master (9.0)
Reporter: Houston Putman


As discussed in SOLR-15127, there is a need for a unified Dockerfile that 
allows for release and local builds.

This ticket is an attempt to achieve this by using the Solr distribution TGZ as 
the docker context to build from.

Therefore release images would be completely reproducible by running:

{{docker build -f solr-9.0.0/Dockerfile 
https://www.apache.org/dyn/closer.lua/lucene/solr/9.0.0/solr-9.0.0.tgz}}

The changes to the Solr distribution would include adding a Dockerfile at 
{{solr-/Dockerfile}}, adding the docker scripts under 
{{solr-/docker}}, and adding a version file at 
{{solr-/VERSION.txt}}.






[jira] [Created] (LUCENE-9725) Allow BM25FQuery to use other similarities

2021-02-02 Thread Julie Tibshirani (Jira)
Julie Tibshirani created LUCENE-9725:


 Summary: Allow BM25FQuery to use other similarities
 Key: LUCENE-9725
 URL: https://issues.apache.org/jira/browse/LUCENE-9725
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Julie Tibshirani


From a high level, BM25FQuery works as follows:
 1. Given a list of fields and weights, it pretends there's a synthetic 
combined field where all terms have been indexed. It computes new term and 
collection statistics for this combined field.
 2. It uses a disjunction iterator and BM25Similarity to score the documents.

The steps are (1) compute statistics that represent the combined field content, 
and (2) pass these to a similarity function. There is nothing really specific 
to BM25Similarity in this approach. In step 2, we could use another similarity, 
for example BooleanSimilarity or those based on language models like 
LMDirichletSimilarity. The main restriction is that norms have to be additive 
(the norm of the combined field must be the sum of the field norms).

Maybe we could unhardcode BM25Similarity in BM25FQuery and instead use the one 
configured on `IndexSearcher`. We could think of this as providing a sensible 
default approach to cross-field scoring for many similarities. It's an 
incremental step towards LUCENE-8711, which would give similarities more 
fine-grained control over how stats/scores are combined across fields.
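
The two steps above can be sketched in plain Java, without any Lucene classes. Everything here is hypothetical and for illustration only: the class and method names, the weighted sums used to combine frequencies and lengths, and the simplified BM25 term (term-frequency saturation only, no IDF). The point is that step 2 only needs the combined statistics, so the similarity can be swapped.

```java
import java.util.List;

// Hypothetical illustration; none of these types exist in Lucene.
class CombinedFieldSketch {
  /** Per-field inputs for one term in one document. */
  static final class FieldStats {
    final double weight;
    final long termFreq;
    final long length;

    FieldStats(double weight, long termFreq, long length) {
      this.weight = weight;
      this.termFreq = termFreq;
      this.length = length;
    }
  }

  /** Any scoring function over the combined frequency and combined length. */
  interface SimilaritySketch {
    double score(double freq, double length);
  }

  // Step 1: statistics of the synthetic combined field (assumed weighted sums).
  static double combinedFreq(List<FieldStats> fields) {
    double sum = 0;
    for (FieldStats f : fields) sum += f.weight * f.termFreq;
    return sum;
  }

  static double combinedLength(List<FieldStats> fields) {
    double sum = 0; // additive norms: combined length is a sum over fields
    for (FieldStats f : fields) sum += f.weight * f.length;
    return sum;
  }

  // Step 2: hand the combined statistics to whichever similarity is configured.
  static double score(List<FieldStats> fields, SimilaritySketch sim) {
    return sim.score(combinedFreq(fields), combinedLength(fields));
  }

  /** Term-frequency saturation part of BM25 only; IDF is left out. */
  static SimilaritySketch bm25Like(double k1, double b, double avgLength) {
    return (freq, length) -> freq / (freq + k1 * (1 - b + b * length / avgLength));
  }

  /** Matches score 1, everything else 0, ignoring length. */
  static SimilaritySketch booleanLike() {
    return (freq, length) -> freq > 0 ? 1.0 : 0.0;
  }
}
```

A similarity whose length normalization is not additive would not fit this shape, which is the restriction noted above.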






[GitHub] [lucene-solr] HoustonPutman opened a new pull request #2292: SOLR-15102: Use Solr distribution TGZ as docker context

2021-02-02 Thread GitBox


HoustonPutman opened a new pull request #2292:
URL: https://github.com/apache/lucene-solr/pull/2292


   https://issues.apache.org/jira/browse/SOLR-15102
   
   This should work, but there is still cleanup needed with the gradle changes.
   
   Also we might want to infer the Solr version another way.
   
   Backwards incompatibility that needs to be added back in: `/opt/docker-solr`
   
   Changes in the image:
   - `/opt/docker-solr` -> `/opt/solr/docker`
   - `/opt/solr` is no longer a sym link, `/opt/solr-` is.






[jira] [Comment Edited] (SOLR-15127) All-In-One Dockerfile for building local images as well as reproducible release builds directly from (remote) git tags

2021-02-02 Thread Houston Putman (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277530#comment-17277530
 ] 

Houston Putman edited comment on SOLR-15127 at 2/2/21, 11:49 PM:
-

I think there are two possible ways of going forward with something that the 
official Docker image people might be ok with.
 # Using the git tags, as with your example. Doing the gradle build in the 
multi-stage build.
 # [~dsmiley]'s suggestion of using the Solr TGZ release as the docker context 
itself.
 ** In order to have the Solr TGZ become the docker context, we would merely 
need to add the Dockerfile and solr/docker/scripts to the release.

I'll put up a PR that would use the Solr TGZ as the docker context, allowing us 
to use docker build directly with the released artifacts. That way we can 
compare pros/cons of each approach. (Can be found at SOLR-15129)

Besides this bigger question, there are some things I really like in your patch:
 * Trying to remove the SOLR_VERSION argument (Big improvement, as there would 
be no required ARGs)
 ** I think we can actually add the version as a file inside the release, and 
then read it into an env var as a part of RUN.
Then we can sym-link from /opt/solr to /opt/solr-$version, to keep backwards 
compatibility.
 * Consolidating the last two RUN layers

I am split on the jattach thing. It will be great when it can be moved to the 
{{apt-get install}} section. Until then, I don't mind if it's fetched in the 
actual image or the builder image. Did you move it to the builder so that the 
final image wouldn't need the GITHUB_URL arg?


was (Author: houston):
I think there are two possible ways of going forward with something that the 
official Docker image people might be ok with.
 # Using the git tags, as with your example. Doing the gradle build in the 
multi-stage build.
 # [~dsmiley]'s suggestion of using the Solr TGZ release as the docker context 
itself.
 ** In order to have the Solr TGZ become the docker context, we would merely 
need to add the Dockerfile and solr/docker/scripts to the release.

I'll put up a PR that would use the Solr TGZ as the docker context, allowing us 
to use docker build directly with the released artifacts. That way we can 
compare pros/cons of each approach.

Besides this bigger question, there are some things I really like in your patch:
 * Trying to remove the SOLR_VERSION argument (Big improvement, as there would 
be no required ARGs)
 ** I think we can actually add the version as a file inside the release, and 
then read it into an env var as a part of RUN.
Then we can sym-link from /opt/solr to /opt/solr-$version, to keep backwards 
compatibility.
 * Consolidating the last two RUN layers

I am split on the jattach thing. It will be great when it can be moved to the 
{{apt-get install}} section. Until then, I don't mind if it's fetched in the 
actual image or the builder image. Did you move it to the builder so that the 
final image wouldn't need the GITHUB_URL arg?


[jira] [Updated] (LUCENE-9725) Allow BM25FQuery to use other similarities

2021-02-02 Thread Julie Tibshirani (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julie Tibshirani updated LUCENE-9725:
-
Description: 
From a high level, BM25FQuery works as follows:
# Given a list of fields and weights, it pretends there's a synthetic combined 
field where all terms have been indexed. It computes new term and collection 
statistics for this combined field.
# It uses a disjunction iterator and BM25Similarity to score the documents.

The steps are (1) compute statistics that represent the combined field content, 
and (2) pass these to a similarity function. There is nothing really specific 
to BM25Similarity in this approach. In step 2, we could use another similarity, 
for example BooleanSimilarity or those based on language models like 
LMDirichletSimilarity. The main restriction is that norms have to be additive 
(the norm of the combined field must be the sum of the field norms).

Maybe we could unhardcode BM25Similarity in BM25FQuery and instead use the one 
configured on IndexSearcher. We could think of this as providing a sensible 
default approach to cross-field scoring for many similarities. It's an 
incremental step towards LUCENE-8711, which would give similarities more 
fine-grained control over how stats/scores are combined across fields.

  was:
From a high level, BM25FQuery works as follows:
 1. Given a list of fields and weights, it pretends there's a synthetic 
combined field where all terms have been indexed. It computes new term and 
collection statistics for this combined field.
 2. It uses a disjunction iterator and BM25Similarity to score the documents.

The steps are (1) compute statistics that represent the combined field content, 
and (2) pass these to a similarity function. There is nothing really specific 
to BM25Similarity in this approach. In step 2, we could use another similarity, 
for example BooleanSimilarity or those based on language models like 
LMDirichletSimilarity. The main restriction is that norms have to be additive 
(the norm of the combined field must be the sum of the field norms).

Maybe we could unhardcode BM25Similarity in BM25FQuery and instead use the one 
configured on `IndexSearcher`. We could think of this as providing a sensible 
default approach to cross-field scoring for many similarities. It's an 
incremental step towards LUCENE-8711, which would give similarities more 
fine-grained control over how stats/ scores are combined across fields.


> Allow BM25FQuery to use other similarities
> --
>
> Key: LUCENE-9725
> URL: https://issues.apache.org/jira/browse/LUCENE-9725
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Julie Tibshirani
>Priority: Major
>
> From a high level, BM25FQuery works as follows:
> # Given a list of fields and weights, it pretends there's a synthetic 
> combined field where all terms have been indexed. It computes new term and 
> collection statistics for this combined field.
> # It uses a disjunction iterator and BM25Similarity to score the documents.
> The steps are (1) compute statistics that represent the combined field 
> content, and (2) pass these to a similarity function. There is nothing really 
> specific to BM25Similarity in this approach. In step 2, we could use another 
> similarity, for example BooleanSimilarity or those based on language models 
> like LMDirichletSimilarity. The main restriction is that norms have to be 
> additive (the norm of the combined field must be the sum of the field norms).
> Maybe we could unhardcode BM25Similarity in BM25FQuery and instead use the 
> one configured on IndexSearcher. We could think of this as providing a 
> sensible default approach to cross-field scoring for many similarities. It's 
> an incremental step towards LUCENE-8711, which would give similarities more 
> fine-grained control over how stats/scores are combined across fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15127) All-In-One Dockerfile for building local images as well as reproducible release builds directly from (remote) git tags

2021-02-02 Thread Chris M. Hostetter (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277542#comment-17277542
 ] 

Chris M. Hostetter commented on SOLR-15127:
---

bq. David Smiley's suggestion of using the Solr TGZ release as the docker 
context itself. 

That's pretty close to what I do in one code path of this patch -- except that 
(as you mentioned) the Dockerfile and solr/docker/scripts aren't in the 
solr.TGZ so I left in the existing build.gradle logic to provide those.

I do however think there is a lot of value in supporting the "build from a 
remote git url" approach as well, since it lets people build from arbitrary 
branches w/o a local java env.  I also think that from a _transparency_ 
standpoint with the official builds, it would be better to build them from 
_source_ ... either the official git tag, or perhaps using the solr-src.tgz 
release instead of the (compiled) solr.tgz?

The broader question I have though at this point is how people feel about this 
style of "all in one" Dockerfile that uses 'sh' conditional logic in the RUN to 
support 2 different ways of building: 
* "docker stage runs gradle to create solr.tgz; then creates & lays out image"
** makes it easy to use git repo or solr-src.tgz as build context for 
transparency and portable building of docker images w/o java dev env
* "gradle builds solr.tgz; then invokes docker to create & layout image"
** makes it eas(ier) for people to iteratively develop/patch solr in their java 
env & then build docker images from that

It really feels like the best of both worlds to me.


bq. I think we can actually add the version as a file inside the release, and 
then read it into an env var as a part of RUN.  Then we can sym-link from 
/opt/solr to /opt/solr-$version, to keep backwards compatibility.

I wasn't sure if there was a *reason* to keep the symlink approach, but yeah, 
it would be easy to add back if needed.  I don't really have any strong 
feelings on where it happens -- just trying to take advantage of the fact that 
we can be multi-stage.

bq. Did you move it to the builder so that the final image wouldn't need the 
GITHUB_URL arg?

My goal was just to move everything that *could* be in the builder stage into 
the builder stage, to "fail fast" and try to keep the final image as small as 
possible.




> All-In-One Dockerfile for building local images as well as reproducible 
> release builds directly from (remote) git tags
> --
>
> Key: SOLR-15127
> URL: https://issues.apache.org/jira/browse/SOLR-15127
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-15127.patch
>
>
> There was a recent dev@lucene discussion about the future of the 
> github/docker-solr repo and (Apache) "official" solr docker images and using 
> the "apache/solr" naming vs (docker-library official) "_/solr" names...
> http://mail-archives.apache.org/mod_mbox/lucene-dev/202101.mbox/%3CCAD4GwrNCPEnAJAjy4tY%3DpMeX5vWvnFyLe9ZDaXmF4J8XchA98Q%40mail.gmail.com%3E
> In that discussion, mak pointed out that docker-library evidently allows for 
> some more flexibility in the way "official" docker-library packages can be 
> built (compared to the rules that were evidently in place when mak set up 
> the current docker-solr image building process/tooling), pointing out how the 
> "docker official" elasticsearch images are currently built from the "elastic 
> official" elasticsearch images...
> http://mail-archives.apache.org/mod_mbox/lucene-dev/202101.mbox/%3C3CED9683-1DD2-4F08-97F9-4FC549EDE47D%40greenhills.co.uk%3E
> Based on this, I proposed that we could probably restructure the Solr 
> Dockerfile so that it could be useful for both "local development" -- using 
> the current repo checkout -- as well as for "apache official" apache/solr 
> images that could be reproducibly built directly from pristine git tags using 
> the remote git URL syntax supported by "docker build" (and then -- evidently 
> -- extended by trivial one line Dockerfiles for the "docker-library official" 
> _/solr images)...
> http://mail-archives.apache.org/mod_mbox/lucene-dev/202101.mbox/%3Calpine.DEB.2.21.2101221423340.16298%40slate%3E
> This jira tracks this idea.






[GitHub] [lucene-solr] jtibshirani opened a new pull request #2293: LUCENE-9725: Allow BM25FQuery to use other similarities.

2021-02-02 Thread GitBox


jtibshirani opened a new pull request #2293:
URL: https://github.com/apache/lucene-solr/pull/2293


   From a high level, BM25FQuery (1) computes statistics that represent the combined
   field content and (2) passes these to a score function. This model makes sense
   for many similarities besides BM25.
   
   This PR unhardcodes BM25Similarity in BM25FQuery and instead uses the one
   configured on IndexSearcher. It also renames BM25FQuery since it's no longer
   specific to BM25.
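
   The two steps can be illustrated outside Lucene with a small sketch. This is
   not the code in this PR: the function names, the toy data, the use of
   weighted sums for the combined term frequency and length, and the way
   collection statistics are recomputed from the combined values are all
   assumptions made for illustration. The BM25-style formula is a simplified
   textbook form, not necessarily what BM25Similarity computes.

```python
import math

def combine_field_stats(doc_fields, weights):
    """Step 1: treat the weighted fields as one synthetic field.

    doc_fields maps field name -> (term frequency, field length).
    The combined length is a sum of per-field lengths, which is why
    the approach needs norms to be additive.
    """
    tf = sum(w * doc_fields[f][0] for f, w in weights.items() if f in doc_fields)
    length = sum(w * doc_fields[f][1] for f, w in weights.items() if f in doc_fields)
    return tf, length

def bm25_score(tf, length, avg_length, doc_freq, doc_count, k1=1.2, b=0.75):
    """A simplified BM25-style score for a single term."""
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    norm = k1 * (1 - b + b * length / avg_length)
    return idf * tf / (tf + norm)

def boolean_score(tf, length, avg_length, doc_freq, doc_count):
    """Scores 1.0 for any match and ignores all statistics."""
    return 1.0 if tf > 0 else 0.0

def score_docs(docs, weights, similarity):
    """Step 2: hand the combined statistics to any similarity function."""
    combined = {d: combine_field_stats(fields, weights) for d, fields in docs.items()}
    doc_count = len(combined)
    doc_freq = sum(1 for tf, _ in combined.values() if tf > 0)
    avg_length = sum(length for _, length in combined.values()) / doc_count
    return {
        d: similarity(tf, length, avg_length, doc_freq, doc_count)
        for d, (tf, length) in combined.items()
    }

# Toy data for one query term: field -> (term frequency, field length).
docs = {
    "d1": {"title": (1, 5), "body": (2, 50)},
    "d2": {"body": (1, 40)},
    "d3": {"title": (0, 4), "body": (0, 60)},
}
weights = {"title": 2.0, "body": 1.0}

bm25_scores = score_docs(docs, weights, bm25_score)
boolean_scores = score_docs(docs, weights, boolean_score)
```

   Only the last argument of `score_docs` changes between the two calls, which
   is the point of the proposal: step 1 does not depend on the similarity.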



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org






[jira] [Commented] (LUCENE-9725) Allow BM25FQuery to use other similarities

2021-02-02 Thread Julie Tibshirani (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277545#comment-17277545
 ] 

Julie Tibshirani commented on LUCENE-9725:
--

I opened https://github.com/apache/lucene-solr/pull/2293 to show the idea.

> Allow BM25FQuery to use other similarities
> --
>
> Key: LUCENE-9725
> URL: https://issues.apache.org/jira/browse/LUCENE-9725
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Julie Tibshirani
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> From a high level, BM25FQuery works as follows:
> # Given a list of fields and weights, it pretends there's a synthetic 
> combined field where all terms have been indexed. It computes new term and 
> collection statistics for this combined field.
> # It uses a disjunction iterator and BM25Similarity to score the documents.
> The steps are (1) compute statistics that represent the combined field 
> content, and (2) pass these to a similarity function. There is nothing really 
> specific to BM25Similarity in this approach. In step 2, we could use another 
> similarity, for example BooleanSimilarity or those based on language models 
> like LMDirichletSimilarity. The main restriction is that norms have to be 
> additive (the norm of the combined field must be the sum of the field norms).
> Maybe we could unhardcode BM25Similarity in BM25FQuery and instead use the 
> one configured on IndexSearcher. We could think of this as providing a 
> sensible default approach to cross-field scoring for many similarities. It's 
> an incremental step towards LUCENE-8711, which would give similarities more 
> fine-grained control over how stats/scores are combined across fields.






[GitHub] [lucene-solr] zhaih commented on pull request #2213: LUCENE-9663: Adding compression to terms dict from SortedSet/Sorted DocValues

2021-02-02 Thread GitBox


zhaih commented on pull request #2213:
URL: https://github.com/apache/lucene-solr/pull/2213#issuecomment-771265310


   I see, so I think for now I could test it via a customized 
PerFieldDocValuesFormat. I'll give the PerFieldDocValuesFormat route a try then.
   
   Tho IMO I would prefer a simpler configuration (as proposed by @jaisonbi) 
rather than customizing via PerFieldDocValuesFormat in the future, if these 2 
compressions show different performance characteristics. If my 
understanding is correct, to enable only TermDictCompression using 
PerFieldDocValuesFormat we need to enumerate all SSDV field names in that 
class? That doesn't sound very maintainable if fields are regularly 
added or deleted. Please correct me if I'm wrong, as I'm not very familiar 
with the codec part...
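
   To make the maintainability concern concrete, here is a rough model of the
   per-field dispatch. It is a plain Python sketch, not Lucene code: the format
   labels, function names, and the `_ssdv` naming convention are hypothetical.
   It only mirrors the idea that the per-field hook
   (`getDocValuesFormatForField(String field)` on PerFieldDocValuesFormat)
   sees just the field name, so the choice has to come either from an explicit
   list or from some rule on the name.

```python
from typing import Callable, Set

DEFAULT_FORMAT = "default"                          # hypothetical label
COMPRESSED_TERMS_FORMAT = "compressed-terms-dict"   # hypothetical label

def by_enumeration(compressed_fields: Set[str]) -> Callable[[str], str]:
    """Choose the format from an explicit set of field names.

    Every new SSDV field has to be added to the set by hand.
    """
    def format_for_field(field: str) -> str:
        return COMPRESSED_TERMS_FORMAT if field in compressed_fields else DEFAULT_FORMAT
    return format_for_field

def by_suffix(suffix: str) -> Callable[[str], str]:
    """Choose the format from a naming convention on the field name.

    New fields are picked up automatically if they follow the convention.
    """
    def format_for_field(field: str) -> str:
        return COMPRESSED_TERMS_FORMAT if field.endswith(suffix) else DEFAULT_FORMAT
    return format_for_field

enumerated = by_enumeration({"tags"})
convention = by_suffix("_ssdv")
```

   With the enumerated variant, a newly added field such as `labels_ssdv` falls
   back to the default format until someone updates the set, whereas the
   convention-based variant covers it without a code change.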





