[jira] [Commented] (LUCENE-9747) Missing package-info.java causes NPE in MissingDoclet.java
[ https://issues.apache.org/jira/browse/LUCENE-9747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281602#comment-17281602 ] Dawid Weiss commented on LUCENE-9747:
--------------------------------------

I get this with openjdk 14.0.1+7:

{code}
javadoc: error - org.apache.lucene.util (package): javadocs are missing
C:\Work\apache\lucene-solr.master\lucene\core\src\java\org\apache\lucene\analysis\standard\StandardAnalyzer.java:84: javadoc empty but @Override declared, skipping.
... [lots more warnings]
{code}

But no NPE. Which Java version are you using?

> Missing package-info.java causes NPE in MissingDoclet.java
> ----------------------------------------------------------
>
>                 Key: LUCENE-9747
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9747
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: general/javadocs
>    Affects Versions: master (9.0)
>            Reporter: David Eric Pugh
>            Priority: Minor
>         Attachments: LUCENE-9747.patch
>
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> When running {{./gradlew :solr:core:javadoc}} I discovered that if a package
> directory is missing the {{package-info.java}} file, you get a VERY cryptic error:
>
> {{javadoc: error - fatal error encountered: java.lang.NullPointerException}}
> {{javadoc: error - Please file a bug against the javadoc tool via the Java bug reporting page}}
>
> I poked around and found that the {{MissingDoclet.java}} call to
> {{reporter.print(Diagnostic.Kind.ERROR, element, fullMessage.toString());}}
> was failing because the element contained some sort of null. I am attaching a patch and a PR.
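[Editor's sketch] The attached patch is not inlined in this thread. A minimal sketch of the kind of guard that would avoid the crash, assuming the failure is the quoted reporter.print(...) call receiving a null element for packages without package-info.java (the error(...) wrapper below is a hypothetical name, not necessarily what the patch does):

{code}
import javax.lang.model.element.Element;
import javax.tools.Diagnostic;
import jdk.javadoc.doclet.Reporter;

// Hypothetical guard: when javadoc hands us no Element to attach the
// diagnostic to (e.g. a package lacking package-info.java), fall back to
// Reporter's position-less overload instead of passing null through.
class SafeReporting {
  private final Reporter reporter;

  SafeReporting(Reporter reporter) {
    this.reporter = reporter;
  }

  void error(Element element, String fullMessage) {
    if (element == null) {
      reporter.print(Diagnostic.Kind.ERROR, fullMessage);
    } else {
      reporter.print(Diagnostic.Kind.ERROR, element, fullMessage);
    }
  }
}
{code}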
[jira] [Created] (LUCENE-9748) Hunspell: suggest inflected dictionary entries similar to the misspelled word
Peter Gromov created LUCENE-9748:
------------------------------------

             Summary: Hunspell: suggest inflected dictionary entries similar to the misspelled word
                 Key: LUCENE-9748
                 URL: https://issues.apache.org/jira/browse/LUCENE-9748
             Project: Lucene - Core
          Issue Type: Sub-task
            Reporter: Peter Gromov
[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2327: LUCENE-9740: scan affix stream once.
donnerpeter commented on a change in pull request #2327:
URL: https://github.com/apache/lucene-solr/pull/2327#discussion_r572666204

## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
@@ -70,6 +70,9 @@
 /** In-memory structure for the dictionary (.dic) and affix (.aff) data of a hunspell dictionary. */
 public class Dictionary {
+  // Derived from woorm/ openoffice dictionaries.

Review comment:
LibreOffice?
[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2327: LUCENE-9740: scan affix stream once.
donnerpeter commented on a change in pull request #2327:
URL: https://github.com/apache/lucene-solr/pull/2327#discussion_r572666977

## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
@@ -224,26 +227,37 @@ public Dictionary(
     this.needsInputCleaning = ignoreCase;
     this.needsOutputCleaning = false; // set if we have an OCONV
 
-    Path tempPath = getDefaultTempDir(); // TODO: make this configurable?
-    Path aff = Files.createTempFile(tempPath, "affix", "aff");
-
-    BufferedInputStream aff1 = null;
-    InputStream aff2 = null;
-    boolean success = false;
-    try {
-      // Copy contents of the affix stream to a temp file.
-      try (OutputStream os = Files.newOutputStream(aff)) {
-        affix.transferTo(os);
+    try (BufferedInputStream affixStream =
+        new BufferedInputStream(affix, MAX_PROLOGUE_SCAN_WINDOW) {
+          @Override
+          public void close() throws IOException {
+            // TODO: maybe we should consume and close it? Why does it need to stay open?

Review comment:
Probably so that the callers who opened the streams can close them safely using try-with-resources.
[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2327: LUCENE-9740: scan affix stream once.
donnerpeter commented on a change in pull request #2327:
URL: https://github.com/apache/lucene-solr/pull/2327#discussion_r572667748

## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
@@ -346,6 +352,7 @@ private void readAffixFile(InputStream affixStream, CharsetDecoder decoder, Flag
       if (line.isEmpty()) continue;
 
       String firstWord = line.split("\\s")[0];
+      // TODO: convert to a switch?

Review comment:
I thought about that. Maybe a switch expression, when the language level allows it. A switch with break statements would be too verbose for my taste.
[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2327: LUCENE-9740: scan affix stream once.
donnerpeter commented on a change in pull request #2327:
URL: https://github.com/apache/lucene-solr/pull/2327#discussion_r572668532

## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
@@ -778,31 +791,36 @@ char affixData(int affixIndex, int offset) {
   private static final byte[] BOM_UTF8 = {(byte) 0xef, (byte) 0xbb, (byte) 0xbf};
 
   /** Parses the encoding and flag format specified in the provided InputStream */
-  private void readConfig(BufferedInputStream stream) throws IOException, ParseException {
-    // I assume we don't support other BOMs (utf16, etc.)? We trivially could,
-    // by adding maybeConsume() with a proper bom... but I don't see hunspell repo to have
-    // any such exotic examples.
-    Charset streamCharset;
-    if (maybeConsume(stream, BOM_UTF8)) {
-      streamCharset = StandardCharsets.UTF_8;
-    } else {
-      streamCharset = DEFAULT_CHARSET;
-    }
-
-    // TODO: can these flags change throughout the file? If not then we can abort sooner. And
-    // then we wouldn't even need to create a temp file for the affix stream - a large enough
-    // leading buffer (BufferedInputStream) would be sufficient?
+  private void readConfig(InputStream stream, Charset streamCharset)
+      throws IOException, ParseException {
     LineNumberReader reader = new LineNumberReader(new InputStreamReader(stream, streamCharset));
     String line;
+    String flagLine = null;
+    boolean charsetFound = false;
+    boolean flagFound = false;
     while ((line = reader.readLine()) != null) {
       if (line.isBlank()) continue;
 
       String firstWord = line.split("\\s")[0];
       if ("SET".equals(firstWord)) {
         decoder = getDecoder(singleArgument(reader, line));
+        charsetFound = true;
       } else if ("FLAG".equals(firstWord)) {
-        flagParsingStrategy = getFlagParsingStrategy(line, decoder.charset());
+        // Preserve the flag line for parsing later since we need the decoder's charset
+        // and just in case they come out of order.
+        flagLine = line;
+        flagFound = true;
+      } else {
+        continue;
       }
+
+      if (charsetFound && flagFound) {
+        break;
+      }
+    }
+
+    if (flagFound) {

Review comment:
flagLine != null?
[jira] [Commented] (LUCENE-9740) Avoid buffering and double-scan of flags in *.aff file
[ https://issues.apache.org/jira/browse/LUCENE-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281609#comment-17281609 ] Peter Gromov commented on LUCENE-9740:
--------------------------------------

Very nice, thanks! I think this can be merged, and additional checks can be added later.

> Avoid buffering and double-scan of flags in *.aff file
> ------------------------------------------------------
>
>                 Key: LUCENE-9740
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9740
>             Project: Lucene - Core
>          Issue Type: Sub-task
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> I wrote a small utility test to scan through all the *.aff files from
> openoffice and woorm - no file has duplicate flags (SET or FLAG), and the maximum
> leading offsets until these flags appear are roughly:
> {code}
> Flag SET at maximum offset 10753
> Flag FLAG at maximum offset 4559
> {code}
> I think we could just assume that, say, affix files are read with a 20kB
> buffered reader, and that this provides the maximum leading window for
> scanning for those flags. The dictionary parsing could also fail if any of
> these flags occurs more than once in the input file?
> This would avoid having to read the file twice and perhaps simplify the API
> (no need for a temporary spill).
> I'll piggyback this test as part of LUCENE-9727 if you'd like to re-run it locally.
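[Editor's sketch] For illustration, the scheme the description proposes maps onto the standard mark/reset contract of BufferedInputStream. A sketch under those assumptions (MAX_PROLOGUE_SCAN_WINDOW is the constant name the committed patch uses, visible in the PR diff earlier in this digest; the actual SET/FLAG scanning is elided):

{code}
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

class PrologueScan {
  // 20kB upper bound derived from the measured maximum offsets (10753 / 4559).
  static final int MAX_PROLOGUE_SCAN_WINDOW = 20 * 1024;

  // Sketch: scan the leading window for SET/FLAG, then rewind and parse the
  // whole stream once - no temp-file spill needed.
  static BufferedInputStream scanPrologue(InputStream affix) throws IOException {
    BufferedInputStream in = new BufferedInputStream(affix, MAX_PROLOGUE_SCAN_WINDOW);
    in.mark(MAX_PROLOGUE_SCAN_WINDOW); // remember the stream start
    byte[] window = in.readNBytes(MAX_PROLOGUE_SCAN_WINDOW);
    // ... locate the SET and FLAG lines inside `window` here ...
    in.reset(); // rewind: the regular single-pass parse starts from byte 0
    return in;
  }
}
{code}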
[jira] [Created] (LUCENE-9749) Hunspell: apply output conversion (OCONV) to the suggestions
Peter Gromov created LUCENE-9749:
------------------------------------

             Summary: Hunspell: apply output conversion (OCONV) to the suggestions
                 Key: LUCENE-9749
                 URL: https://issues.apache.org/jira/browse/LUCENE-9749
             Project: Lucene - Core
          Issue Type: Sub-task
            Reporter: Peter Gromov
[GitHub] [lucene-solr] donnerpeter opened a new pull request #2329: LUCENE-9749: Hunspell: apply output conversion (OCONV) to the suggestions
donnerpeter opened a new pull request #2329:
URL: https://github.com/apache/lucene-solr/pull/2329

# Description

OCONV should be applied not only to stems, but also to suggestions.

# Solution

Call the method that applies it :)

# Tests

`oconv` from the Hunspell repo

# Checklist

Please review the following and check all that apply:

- [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [x] I have created a Jira issue and added the issue ID to my pull request title.
- [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [x] I have developed this patch against the `master` branch.
- [x] I have run `./gradlew check`.
- [x] I have added tests for my changes.
- [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).
[GitHub] [lucene-solr] donnerpeter opened a new pull request #2330: LUCENE-9748: Hunspell: suggest inflected dictionary entries similar t…
donnerpeter opened a new pull request #2330:
URL: https://github.com/apache/lucene-solr/pull/2330

…o the misspelled word

# Description

A follow-up of the "ngram" suggestion support that adds single prefixes and suffixes to dictionary entries to get better suggestions.

# Solution

Copy Hunspell's logic, extract some common code for FST traversal.

# Tests

`allcaps.sug` from the Hunspell repo

# Checklist

Please review the following and check all that apply:

- [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [x] I have created a Jira issue and added the issue ID to my pull request title.
- [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [x] I have developed this patch against the `master` branch.
- [x] I have run `./gradlew check`.
- [x] I have added tests for my changes.
- [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).
[GitHub] [lucene-solr] iverase merged pull request #2268: LUCENE-9705: Move Lucene50CompoundFormat to Lucene90CompoundFormat
iverase merged pull request #2268:
URL: https://github.com/apache/lucene-solr/pull/2268
[jira] [Commented] (LUCENE-9705) Move all codec formats to the o.a.l.codecs.Lucene90 package
[ https://issues.apache.org/jira/browse/LUCENE-9705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281617#comment-17281617 ] ASF subversion and git services commented on LUCENE-9705:
----------------------------------------------------------

Commit eafeb6643408e7e978f2fcb8d456b5eb3ca9c187 in lucene-solr's branch refs/heads/master from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=eafeb66 ]

LUCENE-9705: Move Lucene50CompoundFormat to Lucene90CompoundFormat (#2268)

> Move all codec formats to the o.a.l.codecs.Lucene90 package
> ------------------------------------------------------------
>
>                 Key: LUCENE-9705
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9705
>             Project: Lucene - Core
>          Issue Type: Wish
>            Reporter: Ignacio Vera
>            Priority: Major
>          Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Current formats are distributed across different packages, prefixed with the
> Lucene version in which they were created. With the upcoming release of Lucene 9.0,
> it would be nice to move all those formats to just the o.a.l.codecs.Lucene90
> package (and of course move the current ones to the backwards-codecs).
> This issue would actually facilitate moving the directory API to little
> endian (LUCENE-9047), as the only codecs that would need to handle backwards
> compatibility would be those in backwards-codecs.
> In addition, it can help formalize the use of internal versions vs. format
> versioning (LUCENE-9616).
[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2330: LUCENE-9748: Hunspell: suggest inflected dictionary entries similar t…
donnerpeter commented on a change in pull request #2330:
URL: https://github.com/apache/lucene-solr/pull/2330#discussion_r572676276

## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/GeneratingSuggester.java
@@ -33,44 +40,59 @@
  */
 class GeneratingSuggester {
   private static final int MAX_ROOTS = 100;
-  private static final int MAX_GUESSES = 100;
+  private static final int MAX_WORDS = 100;
+  private static final int MAX_GUESSES = 200;
   private final Dictionary dictionary;
+  private final SpellChecker speller;
 
-  GeneratingSuggester(Dictionary dictionary) {
-    this.dictionary = dictionary;
+  GeneratingSuggester(SpellChecker speller) {
+    this.dictionary = speller.dictionary;
+    this.speller = speller;
   }
 
   List<String> suggest(String word, WordCase originalCase, Set<String> prevSuggestions) {
-    List<String> roots = findSimilarDictionaryEntries(word, originalCase);
-    List<String> expanded = expandRoots(word, roots);
-    TreeSet<WeightedWord> bySimilarity = rankBySimilarity(word, expanded);
+    List<Weighted<DictEntry>> roots = findSimilarDictionaryEntries(word, originalCase);

Review comment:
Just renamed a parameterized `WeightedWord`
[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2330: LUCENE-9748: Hunspell: suggest inflected dictionary entries similar t…
donnerpeter commented on a change in pull request #2330:
URL: https://github.com/apache/lucene-solr/pull/2330#discussion_r572676594

## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/GeneratingSuggester.java
@@ -33,44 +40,59 @@
[...]
   List<String> suggest(String word, WordCase originalCase, Set<String> prevSuggestions) {
-    List<String> roots = findSimilarDictionaryEntries(word, originalCase);
-    List<String> expanded = expandRoots(word, roots);
-    TreeSet<WeightedWord> bySimilarity = rankBySimilarity(word, expanded);
+    List<Weighted<DictEntry>> roots = findSimilarDictionaryEntries(word, originalCase);
+    List<Weighted<DictEntry>> expanded = expandRoots(word, roots);
+    TreeSet<Weighted<DictEntry>> bySimilarity = rankBySimilarity(word, expanded);
     return getMostRelevantSuggestions(bySimilarity, prevSuggestions);
   }
 
-  private List<String> findSimilarDictionaryEntries(String word, WordCase originalCase) {
-    try {
-      IntsRefFSTEnum<IntsRef> fstEnum = new IntsRefFSTEnum<>(dictionary.words);
-      TreeSet<WeightedWord> roots = new TreeSet<>();
+  private List<Weighted<DictEntry>> findSimilarDictionaryEntries(
+      String word, WordCase originalCase) {
+    TreeSet<Weighted<DictEntry>> roots = new TreeSet<>();
+    processFST(

Review comment:
extracted FST traversal into a separate method
[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2330: LUCENE-9748: Hunspell: suggest inflected dictionary entries similar t…
donnerpeter commented on a change in pull request #2330:
URL: https://github.com/apache/lucene-solr/pull/2330#discussion_r572676998

## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/GeneratingSuggester.java
@@ -84,33 +106,34 @@ private static String toString(IntsRef key) {
     return new String(chars);
   }
 
-  private boolean isSuitableRoot(IntsRef forms) {
+  private List<DictEntry> filterSuitableEntries(String word, IntsRef forms) {
+    List<DictEntry> result = new ArrayList<>();
     for (int i = 0; i < forms.length; i += dictionary.formStep()) {
       int entryId = forms.ints[forms.offset + i];
-      if (dictionary.hasFlag(entryId, dictionary.needaffix)

Review comment:
needaffix check is moved into `expandRoot`
[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2330: LUCENE-9748: Hunspell: suggest inflected dictionary entries similar t…
donnerpeter commented on a change in pull request #2330:
URL: https://github.com/apache/lucene-solr/pull/2330#discussion_r572677318

## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/GeneratingSuggester.java
@@ -132,14 +155,105 @@ private static int calcThreshold(String word) {
     return thresh / 3 - 1;
   }
 
-  private TreeSet<WeightedWord> rankBySimilarity(String word, List<String> expanded) {
+  private List<Weighted<DictEntry>> expandRoot(DictEntry root, String misspelled) {

Review comment:
Main change here
[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2330: LUCENE-9748: Hunspell: suggest inflected dictionary entries similar t…
donnerpeter commented on a change in pull request #2330:
URL: https://github.com/apache/lucene-solr/pull/2330#discussion_r572677980

## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Stemmer.java
@@ -798,8 +798,4 @@ private boolean isFlagAppendedByAffix(int affixId, char flag) {
     int appendId = dictionary.affixData(affixId, Dictionary.AFFIX_APPEND);
     return dictionary.hasFlag(appendId, flag);
   }
-
-  private boolean isCrossProduct(int affix) {

Review comment:
moved to Dictionary
[GitHub] [lucene-solr] iverase merged pull request #2269: LUCENE-9322: Add TestLucene90FieldInfosFormat
iverase merged pull request #2269:
URL: https://github.com/apache/lucene-solr/pull/2269
[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2330: LUCENE-9748: Hunspell: suggest inflected dictionary entries similar t…
donnerpeter commented on a change in pull request #2330:
URL: https://github.com/apache/lucene-solr/pull/2330#discussion_r572696542

## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/GeneratingSuggester.java
@@ -33,44 +40,59 @@
[...]
+    TreeSet<Weighted<DictEntry>> roots = new TreeSet<>();
+    processFST(
+        dictionary.words,
+        (key, forms) -> {
+          if (Math.abs(key.length - word.length()) > 4) return;
+
+          String root = toString(key);
+          List<DictEntry> entries = filterSuitableEntries(root, forms);
+          if (entries.isEmpty()) return;
+
+          if (originalCase == WordCase.LOWER
+              && WordCase.caseOf(root) == WordCase.TITLE
+              && !dictionary.hasLanguage("de")) {
+            return;
+          }
 
-      IntsRefFSTEnum.InputOutput<IntsRef> mapping;
-      while ((mapping = fstEnum.next()) != null) {
-        IntsRef key = mapping.input;
-        if (Math.abs(key.length - word.length()) > 4 || !isSuitableRoot(mapping.output)) continue;
-
-        String root = toString(key);
-        if (originalCase == WordCase.LOWER
-            && WordCase.caseOf(root) == WordCase.TITLE
-            && !dictionary.hasLanguage("de")) {
-          continue;
-        }
+          String lower = dictionary.toLowerCase(root);
+          int sc =
+              ngram(3, word, lower, EnumSet.of(NGramOptions.LONGER_WORSE))
+                  + commonPrefix(word, root);
 
-        String lower = dictionary.toLowerCase(root);
-        int sc =
-            ngram(3, word, lower, EnumSet.of(NGramOptions.LONGER_WORSE)) + commonPrefix(word, root);
+          entries.forEach(e -> roots.add(new Weighted<>(e, sc)));
+        });
+    return roots.stream().limit(MAX_ROOTS).collect(Collectors.toList());
+  }
 
-        roots.add(new WeightedWord(root, sc));
+  private void processFST(FST<IntsRef> fst, BiConsumer<IntsRef, IntsRef> keyValueConsumer) {

Review comment:
This might be worth moving to some util, e.g. `IntsRefFSTEnum`
[GitHub] [lucene-solr] iverase opened a new pull request #2331: LUCENE-9322: Lucene90VectorWriter can leak open files
iverase opened a new pull request #2331:
URL: https://github.com/apache/lucene-solr/pull/2331

While trying to add a base test class for vectors based on `BaseIndexFileFormatTestCase`, a bug surfaced in the Lucene90VectorWriter constructor: if an exception is thrown in the middle of it, files might not be closed properly and can therefore leak.

Here is the proposal: move the current `TestVectorValues` to a `BaseVectorFormatTestCase` which extends `BaseIndexFileFormatTestCase`, and fix the constructor so it handles closing files on error properly.
[GitHub] [lucene-solr] dweiss commented on a change in pull request #2327: LUCENE-9740: scan affix stream once.
dweiss commented on a change in pull request #2327:
URL: https://github.com/apache/lucene-solr/pull/2327#discussion_r572701976

## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
@@ -778,31 +791,36 @@ char affixData(int affixIndex, int offset) {
[...]
+    if (flagFound) {

Review comment:
If flagFound is true then flagLine had to be != null, otherwise you'd get an NPE earlier on line.split?
[GitHub] [lucene-solr] dweiss merged pull request #2327: LUCENE-9740: scan affix stream once.
dweiss merged pull request #2327:
URL: https://github.com/apache/lucene-solr/pull/2327
[jira] [Resolved] (LUCENE-9740) Avoid buffering and double-scan of flags in *.aff file
[ https://issues.apache.org/jira/browse/LUCENE-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss resolved LUCENE-9740.
---------------------------------
    Fix Version/s: master (9.0)
       Resolution: Fixed

> Avoid buffering and double-scan of flags in *.aff file
> ------------------------------------------------------
>
>                 Key: LUCENE-9740
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9740
>             Project: Lucene - Core
>          Issue Type: Sub-task
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>             Fix For: master (9.0)
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> I wrote a small utility test to scan through all the *.aff files from
> openoffice and woorm - no file has duplicate flags (SET or FLAG), and the maximum
> leading offsets until these flags appear are roughly:
> {code}
> Flag SET at maximum offset 10753
> Flag FLAG at maximum offset 4559
> {code}
> I think we could just assume that, say, affix files are read with a 20kB
> buffered reader, and that this provides the maximum leading window for
> scanning for those flags. The dictionary parsing could also fail if any of
> these flags occurs more than once in the input file?
> This would avoid having to read the file twice and perhaps simplify the API
> (no need for a temporary spill).
> I'll piggyback this test as part of LUCENE-9727 if you'd like to re-run it locally.
[jira] [Commented] (LUCENE-9740) Avoid buffering and double-scan of flags in *.aff file
[ https://issues.apache.org/jira/browse/LUCENE-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281640#comment-17281640 ] ASF subversion and git services commented on LUCENE-9740:
----------------------------------------------------------

Commit 061b3f29c99cf4070677eeaf4525ff6f9fff0a56 in lucene-solr's branch refs/heads/master from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=061b3f2 ]

LUCENE-9740: scan affix stream once. (#2327)

> Avoid buffering and double-scan of flags in *.aff file
> ------------------------------------------------------
>
>                 Key: LUCENE-9740
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9740
>             Project: Lucene - Core
>          Issue Type: Sub-task
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>             Fix For: master (9.0)
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> I wrote a small utility test to scan through all the *.aff files from
> openoffice and woorm - no file has duplicate flags (SET or FLAG), and the maximum
> leading offsets until these flags appear are roughly:
> {code}
> Flag SET at maximum offset 10753
> Flag FLAG at maximum offset 4559
> {code}
> I think we could just assume that, say, affix files are read with a 20kB
> buffered reader, and that this provides the maximum leading window for
> scanning for those flags. The dictionary parsing could also fail if any of
> these flags occurs more than once in the input file?
> This would avoid having to read the file twice and perhaps simplify the API
> (no need for a temporary spill).
> I'll piggyback this test as part of LUCENE-9727 if you'd like to re-run it locally.
[GitHub] [lucene-solr] dweiss commented on a change in pull request #2327: LUCENE-9740: scan affix stream once.
dweiss commented on a change in pull request #2327:
URL: https://github.com/apache/lucene-solr/pull/2327#discussion_r572702955

## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
@@ -70,6 +70,9 @@
 /** In-memory structure for the dictionary (.dic) and affix (.aff) data of a hunspell dictionary. */
 public class Dictionary {
+  // Derived from woorm/ openoffice dictionaries.

Review comment:
Ouch. Can you correct it for me and piggyback it on any subsequent patch? I overlooked this one.
[jira] [Created] (LUCENE-9750) Hunspell: improve suggestions for mixed-case misspelled words
Peter Gromov created LUCENE-9750:
------------------------------------

             Summary: Hunspell: improve suggestions for mixed-case misspelled words
                 Key: LUCENE-9750
                 URL: https://issues.apache.org/jira/browse/LUCENE-9750
             Project: Lucene - Core
          Issue Type: Sub-task
            Reporter: Peter Gromov
[GitHub] [lucene-solr] dweiss commented on a change in pull request #2327: LUCENE-9740: scan affix stream once.
dweiss commented on a change in pull request #2327:
URL: https://github.com/apache/lucene-solr/pull/2327#discussion_r572703555

## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
@@ -224,26 +227,37 @@ public Dictionary(
[...]
+            // TODO: maybe we should consume and close it? Why does it need to stay open?

Review comment:
Closeable.close() can be invoked any number of times without side-effects - this is a contract stated in the javadoc.
[GitHub] [lucene-solr] dweiss commented on a change in pull request #2327: LUCENE-9740: scan affix stream once.
dweiss commented on a change in pull request #2327:
URL: https://github.com/apache/lucene-solr/pull/2327#discussion_r572704827

## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
@@ -346,6 +352,7 @@ private void readAffixFile(InputStream affixStream, CharsetDecoder decoder, Flag
[...]
+      // TODO: convert to a switch?

Review comment:
Ok. My taste tells me it'd be cleaner than that multi-level if, especially since not all statements are identical there (one compares against the entire line, I believe). It may be worth considering the switch for performance reasons too, but I don't know if you'd see the difference.
[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2327: LUCENE-9740: scan affix stream once.
donnerpeter commented on a change in pull request #2327:
URL: https://github.com/apache/lucene-solr/pull/2327#discussion_r572706742

## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
@@ -224,26 +227,37 @@ public Dictionary(
[...]
+            // TODO: maybe we should consume and close it? Why does it need to stay open?

Review comment:
True. It feels a bit more right to me when the ones creating the streams close them. But in fact I don't like the whole idea of passing streams to the constructor. I believe most clients would be happier passing paths (except in the rare(?) cases when the content is created in memory or loaded from the classpath).
[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2327: LUCENE-9740: scan affix stream once.
donnerpeter commented on a change in pull request #2327:
URL: https://github.com/apache/lucene-solr/pull/2327#discussion_r572707863

## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
@@ -778,31 +791,36 @@ char affixData(int affixIndex, int offset) {
[...]
+    if (flagFound) {

Review comment:
Yes. I mean that `flagFound` is redundant since we already have `flagLine`.
[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2327: LUCENE-9740: scan affix stream once.
donnerpeter commented on a change in pull request #2327:
URL: https://github.com/apache/lucene-solr/pull/2327#discussion_r572710743

## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
@@ -346,6 +352,7 @@ private void readAffixFile(InputStream affixStream, CharsetDecoder decoder, Flag
[...]
+      // TODO: convert to a switch?

Review comment:
A bit cleaner, yes, but verbosity and the risk of forgetting a `break` outweigh that for me. I also considered creating a map from first word to parsing lambdas, but decided it'd be quite verbose as well. The performance difference should be negligible here: last time I checked, parsing was dominated by writing/sorting/reading dic entries and building FSTs.
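[Editor's sketch] For reference, the two shapes being weighed, the existing if/else chain versus a statement switch with breaks, look roughly like this (handler bodies elided; an illustration, not the actual Dictionary code):

```java
// Existing shape: chained comparisons, one of which (per dweiss's remark)
// matches against more than just the first word.
if ("SET".equals(firstWord)) {
  // handle SET
} else if ("FLAG".equals(firstWord)) {
  // handle FLAG
}

// Statement-switch shape: each case needs its own break, which is the
// verbosity (and forgotten-break risk) donnerpeter objects to.
switch (firstWord) {
  case "SET":
    // handle SET
    break;
  case "FLAG":
    // handle FLAG
    break;
  default:
    break;
}
```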
[GitHub] [lucene-solr] donnerpeter opened a new pull request #2332: LUCENE-9750: Hunspell: improve suggestions for mixed-case misspelled words
donnerpeter opened a new pull request #2332:
URL: https://github.com/apache/lucene-solr/pull/2332

# Description

Fix a failing Hunspell repo test.

# Solution

Replicate Hunspell's logic around suggestion casing, especially mixed-case ones.

# Tests

`i58202` from the Hunspell repo, whatever that means

# Checklist

Please review the following and check all that apply:

- [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [x] I have created a Jira issue and added the issue ID to my pull request title.
- [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [x] I have developed this patch against the `master` branch.
- [x] I have run `./gradlew check`.
- [x] I have added tests for my changes.
- [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).
[jira] [Created] (LUCENE-9751) Assertion error (int overflow) in ByteSliceReader
Dawid Weiss created LUCENE-9751:
-----------------------------------

             Summary: Assertion error (int overflow) in ByteSliceReader
                 Key: LUCENE-9751
                 URL: https://issues.apache.org/jira/browse/LUCENE-9751
             Project: Lucene - Core
          Issue Type: Bug
    Affects Versions: 8.7
            Reporter: Dawid Weiss

New computers come with insane amounts of RAM, and heaps can get pretty big. If you adjust per-thread buffers to larger values, strange things start happening. This happened to us today:

{code}
Caused by: java.lang.AssertionError
	at org.apache.lucene.index.ByteSliceReader.init(ByteSliceReader.java:44) ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - atrisharma - 2020-10-29 19:35:28]
	at org.apache.lucene.index.TermsHashPerField.initReader(TermsHashPerField.java:88) ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - atrisharma - 2020-10-29 19:35:28]
	at org.apache.lucene.index.FreqProxFields$FreqProxPostingsEnum.reset(FreqProxFields.java:430) ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - atrisharma - 2020-10-29 19:35:28]
	at org.apache.lucene.index.FreqProxFields$FreqProxTermsEnum.postings(FreqProxFields.java:247) ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - atrisharma - 2020-10-29 19:35:28]
	at org.apache.lucene.codecs.PushPostingsWriterBase.writeTerm(PushPostingsWriterBase.java:127) ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - atrisharma - 2020-10-29 19:35:28]
	at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.write(BlockTreeTermsWriter.java:907) ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - atrisharma - 2020-10-29 19:35:28]
	at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:318) ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - atrisharma - 2020-10-29 19:35:28]
	at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.write(PerFieldPostingsFormat.java:170) ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - atrisharma - 2020-10-29 19:35:28]
	at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:120) ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - atrisharma - 2020-10-29 19:35:28]
	at org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:264) ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - atrisharma - 2020-10-29 19:35:28]
	at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:350) ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - atrisharma - 2020-10-29 19:35:28]
	at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:480) ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - atrisharma - 2020-10-29 19:35:28]
	at org.apache.lucene.index.DocumentsWriter.postUpdate(DocumentsWriter.java:394) ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - atrisharma - 2020-10-29 19:35:28]
	at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:440) ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - atrisharma - 2020-10-29 19:35:28]
	at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1471) ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - atrisharma - 2020-10-29 19:35:28]
	... 7 more
{code}

Likely an int overflow in TermsHashPerField:

{code}
reader.init(
    bytePool,
    postingsArray.byteStarts[termID] + stream * ByteBlockPool.FIRST_LEVEL_SIZE,
    streamAddressBuffer[offsetInAddressBuffer + stream]);
{code}

Don't know if this can be prevented somehow.
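[Editor's sketch] The arithmetic in that line is plain int math, so the overflow is easy to reproduce in isolation. A self-contained demonstration with hypothetical values (ByteBlockPool.FIRST_LEVEL_SIZE is 5):

{code}
public class ByteStartOverflow {
  public static void main(String[] args) {
    // Hypothetical values: with multi-GB per-thread buffers, a term's start
    // offset in the shared byte pool can approach Integer.MAX_VALUE.
    int byteStart = Integer.MAX_VALUE - 2; // postingsArray.byteStarts[termID]
    int stream = 1;
    int firstLevelSize = 5;                // ByteBlockPool.FIRST_LEVEL_SIZE

    int start = byteStart + stream * firstLevelSize; // wraps around silently
    System.out.println(start); // negative -> trips the assert in ByteSliceReader.init
  }
}
{code}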
[GitHub] [lucene-solr] dweiss commented on a change in pull request #2327: LUCENE-9740: scan affix stream once.
dweiss commented on a change in pull request #2327:
URL: https://github.com/apache/lucene-solr/pull/2327#discussion_r572717985

## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
@@ -224,26 +227,37 @@ public Dictionary(
[...]
+            // TODO: maybe we should consume and close it? Why does it need to stay open?

Review comment:
Leave it a stream, it's hard to beat its flexibility.
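[Editor's sketch] From the caller's side, the ownership convention being defended looks like this. A sketch only: the Dictionary constructor arguments are approximated from the PR, and the point is that the code that opens the streams also closes them:

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import org.apache.lucene.analysis.hunspell.Dictionary;
import org.apache.lucene.store.Directory;

class DictionaryLoading {
  // Constructor shape approximated; the point is the ownership convention.
  static void load(Directory tempDir) throws Exception {
    try (InputStream affix = Files.newInputStream(Path.of("en_US.aff"));
         InputStream dic = Files.newInputStream(Path.of("en_US.dic"))) {
      // Dictionary reads from the streams but never takes ownership, so
      // try-with-resources above stays the single point of close(); an extra
      // close() from an internal wrapper is harmless because close() is
      // idempotent by contract.
      Dictionary dictionary = new Dictionary(tempDir, "hunspell", affix, dic);
    }
  }
}
```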
[GitHub] [lucene-solr] dweiss commented on a change in pull request #2327: LUCENE-9740: scan affix stream once.
dweiss commented on a change in pull request #2327:
URL: https://github.com/apache/lucene-solr/pull/2327#discussion_r572718170

## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
@@ -346,6 +352,7 @@ private void readAffixFile(InputStream affixStream, CharsetDecoder decoder, Flag
[...]
+      // TODO: convert to a switch?

Review comment:
ok.
[GitHub] [lucene-solr] dweiss commented on a change in pull request #2327: LUCENE-9740: scan affix stream once.
dweiss commented on a change in pull request #2327:
URL: https://github.com/apache/lucene-solr/pull/2327#discussion_r572718487

## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
@@ -778,31 +791,36 @@ char affixData(int affixIndex, int offset) {
[...]
+    if (flagFound) {

Review comment:
Yeah, I left that intentionally, for parity with charsetFound...
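[Editor's sketch] The simplification donnerpeter is pointing at, letting the reference double as the boolean, would look like this (a sketch of the merged code, not the committed version):

```java
// flagLine is non-null exactly when a FLAG line was seen, so the separate
// flagFound boolean could be dropped without changing behavior.
while ((line = reader.readLine()) != null) {
  // ... SET / FLAG handling as in the hunk above, minus flagFound ...
  if (charsetFound && flagLine != null) {
    break;
  }
}
if (flagLine != null) {
  flagParsingStrategy = getFlagParsingStrategy(flagLine, decoder.charset());
}
```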
[jira] [Created] (SOLR-15147) Hide jdbc credentials in data-config.xml
Ajay G created SOLR-15147:
-----------------------------

             Summary: Hide jdbc credentials in data-config.xml
                 Key: SOLR-15147
                 URL: https://issues.apache.org/jira/browse/SOLR-15147
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: Admin UI, SolrCloud
    Affects Versions: 7.1.1
            Reporter: Ajay G

Team, is there any way to hide the data-config files in the Solr 7.x version?
[GitHub] [lucene-solr] dweiss commented on a change in pull request #2330: LUCENE-9748: Hunspell: suggest inflected dictionary entries similar t…
dweiss commented on a change in pull request #2330:
URL: https://github.com/apache/lucene-solr/pull/2330#discussion_r572744570

## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/GeneratingSuggester.java
@@ -33,44 +40,59 @@
[...]
+  private void processFST(FST<IntsRef> fst, BiConsumer<IntsRef, IntsRef> keyValueConsumer) {

Review comment:
Add a "forEach" method to fstenum, maybe? It'd correspond to Java collections then.
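[Editor's sketch] Dawid's "forEach" idea, sketched as a hypothetical addition to `IntsRefFSTEnum` (mirroring `Map#forEach`; this method does not exist yet):

```java
// Hypothetical convenience method inside IntsRefFSTEnum<T>: hides the
// next()/InputOutput plumbing that processFST currently re-implements.
public void forEach(BiConsumer<IntsRef, T> action) throws IOException {
  InputOutput<T> mapping;
  while ((mapping = next()) != null) {
    action.accept(mapping.input, mapping.output);
  }
}
```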
[jira] [Commented] (SOLR-9854) Collect metrics for index merges and index store IO
[ https://issues.apache.org/jira/browse/SOLR-9854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281682#comment-17281682 ] Andrzej Bialecki commented on SOLR-9854: Metrics Counter can only go forward but these integers must be able to go both ways because they represent the number of *currently* running merges (and the current number of docs / segments involved in the running merges), which naturally may vary from 0 to N. > Collect metrics for index merges and index store IO > --- > > Key: SOLR-9854 > URL: https://issues.apache.org/jira/browse/SOLR-9854 > Project: Solr > Issue Type: Improvement > Components: metrics >Affects Versions: 6.4, 7.0 >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki >Priority: Minor > Fix For: 6.4, 7.0 > > Attachments: SOLR-9854.patch, SOLR-9854.patch > > > Using API for metrics management developed in SOLR-4735 we should also start > collecting metrics for major aspects of {{IndexWriter}} operation, such as > read / write IO rates, number of minor and major merges and IO during these > operations, etc. > This will provide a better insight into resource consumption and load at the > IO level. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
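To illustrate the distinction: a minimal sketch of tracking the number of currently running merges, assuming Dropwizard-style metrics (which Solr's metrics API builds on); the registry name and class below are illustrative, not the actual SOLR-9854 code:

```java
import com.codahale.metrics.Gauge;
import com.codahale.metrics.MetricRegistry;
import java.util.concurrent.atomic.AtomicInteger;

public class RunningMergesMetric {
  private final AtomicInteger runningMerges = new AtomicInteger();

  public RunningMergesMetric(MetricRegistry registry) {
    // A Gauge reports the *current* value each time it is read, so the number
    // it exposes can move both up and down as merges start and finish.
    registry.register("INDEX.merge.running", (Gauge<Integer>) runningMerges::get);
  }

  public void onMergeStart() {
    runningMerges.incrementAndGet();
  }

  public void onMergeFinish() {
    runningMerges.decrementAndGet();
  }
}
```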
[jira] [Commented] (LUCENE-9741) Add optimization for sequential access of stored fields
[ https://issues.apache.org/jira/browse/LUCENE-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281684#comment-17281684 ] Adrien Grand commented on LUCENE-9741: -- I've fallen into the trap of not optimizing merging for stored fields a couple times, typically by forgetting to override {{getMergeInstance()}} when passing a FilterCodecReader to {{IndexWriter#addIndexes}}, so I'd be supportive of making sequential access more a first-class citizen of stored fields. However the proposed API feels a bit too complex to me. I wonder if we could achieve the same benefits by changing the StoredFieldsReader API to return an iterator over stored fields that would keep state in order to avoid decompressing the same data over and over again? > Add optimization for sequential access of stored fields > --- > > Key: LUCENE-9741 > URL: https://issues.apache.org/jira/browse/LUCENE-9741 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Nhat Nguyen >Assignee: Nhat Nguyen >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > If we are reading the stored-fields of document ids (25, 27, 28, 26, 99), and > doc-25 triggers the stored-fields reader to decompress a block containing > document ids [10-50], then we can tell the reader to read not only 25, but > 26, 27, and 28 to avoid decompressing that block multiple times. > This issue proposes adding a new optimized instance of stored-fields reader > that allows users to select the preferred fetching range. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9741) Add optimization for sequential access of stored fields
[ https://issues.apache.org/jira/browse/LUCENE-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281687#comment-17281687 ] Adrien Grand commented on LUCENE-9741: -- To be clear, I'm thinking of only updating the {{StoredFieldsReader}} API, the {{LeafReaderAPI}} could remain the same and {{CodecReader#document}} could be implemented by creating an iterator, advancing it to the desired doc, doing what it has to do, and then throwing away the iterator immediately to allow the JVM to garbage-collect memory that is needed for the internal state of the iterator. > Add optimization for sequential access of stored fields > --- > > Key: LUCENE-9741 > URL: https://issues.apache.org/jira/browse/LUCENE-9741 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Nhat Nguyen >Assignee: Nhat Nguyen >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > If we are reading the stored-fields of document ids (25, 27, 28, 26, 99), and > doc-25 triggers the stored-fields reader to decompress a block containing > document ids [10-50], then we can tell the reader to read not only 25, but > 26, 27, and 28 to avoid decompressing that block multiple times. > This issue proposes adding a new optimized instance of stored-fields reader > that allows users to select the preferred fetching range. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
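To make the iterator idea concrete, here is one hypothetical shape for it; the class and method names below are invented for illustration and are not an actual Lucene API:

```java
import java.io.IOException;

// Hypothetical sketch: a stateful stored-fields cursor that caches the last
// decompressed block, so reads of nearby doc IDs (25, 26, 27, 28) reuse it.
abstract class StoredFieldsCursor {
  private int cachedBlockStart = -1; // first docID covered by the cached block
  private int cachedBlockEnd = -1;   // last docID covered by the cached block

  /** Positions the cursor on {@code docID}, decompressing a block only on a cache miss. */
  final void advanceTo(int docID) throws IOException {
    if (docID < cachedBlockStart || docID > cachedBlockEnd) {
      int[] range = decompressBlockContaining(docID); // e.g. returns {10, 50}
      cachedBlockStart = range[0];
      cachedBlockEnd = range[1];
    }
    positionWithinBlock(docID);
  }

  /** Decompresses the block holding {@code docID} and returns its [firstDoc, lastDoc] range. */
  protected abstract int[] decompressBlockContaining(int docID) throws IOException;

  /** Seeks to {@code docID} inside the currently cached block. */
  protected abstract void positionWithinBlock(int docID) throws IOException;
}
```

A `CodecReader#document`-style caller would then create a cursor, `advanceTo` the requested doc, read it, and discard the cursor, exactly as described in the comment above.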
[jira] [Commented] (LUCENE-9741) Add optimization for sequential access of stored fields
[ https://issues.apache.org/jira/browse/LUCENE-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281690#comment-17281690 ] Robert Muir commented on LUCENE-9741: - The getMergeInstance() is already optimized for this case though, why do we need additional apis? > Add optimization for sequential access of stored fields > --- > > Key: LUCENE-9741 > URL: https://issues.apache.org/jira/browse/LUCENE-9741 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Nhat Nguyen >Assignee: Nhat Nguyen >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > If we are reading the stored-fields of document ids (25, 27, 28, 26, 99), and > doc-25 triggers the stored-fields reader to decompress a block containing > document ids [10-50], then we can tell the reader to read not only 25, but > 26, 27, and 28 to avoid decompressing that block multiple times. > This issue proposes adding a new optimized instance of stored-fields reader > that allows users to select the preferred fetching range. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2330: LUCENE-9748: Hunspell: suggest inflected dictionary entries similar t…
donnerpeter commented on a change in pull request #2330: URL: https://github.com/apache/lucene-solr/pull/2330#discussion_r572760859 ## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/GeneratingSuggester.java ## @@ -33,44 +40,59 @@ */ class GeneratingSuggester { private static final int MAX_ROOTS = 100; - private static final int MAX_GUESSES = 100; + private static final int MAX_WORDS = 100; + private static final int MAX_GUESSES = 200; private final Dictionary dictionary; + private final SpellChecker speller; - GeneratingSuggester(Dictionary dictionary) { -this.dictionary = dictionary; + GeneratingSuggester(SpellChecker speller) { +this.dictionary = speller.dictionary; +this.speller = speller; } List suggest(String word, WordCase originalCase, Set prevSuggestions) { -List roots = findSimilarDictionaryEntries(word, originalCase); -List expanded = expandRoots(word, roots); -TreeSet bySimilarity = rankBySimilarity(word, expanded); +List> roots = findSimilarDictionaryEntries(word, originalCase); +List> expanded = expandRoots(word, roots); +TreeSet> bySimilarity = rankBySimilarity(word, expanded); return getMostRelevantSuggestions(bySimilarity, prevSuggestions); } - private List findSimilarDictionaryEntries(String word, WordCase originalCase) { -try { - IntsRefFSTEnum fstEnum = new IntsRefFSTEnum<>(dictionary.words); - TreeSet roots = new TreeSet<>(); + private List> findSimilarDictionaryEntries( + String word, WordCase originalCase) { +TreeSet> roots = new TreeSet<>(); +processFST( +dictionary.words, +(key, forms) -> { + if (Math.abs(key.length - word.length()) > 4) return; + + String root = toString(key); + List entries = filterSuitableEntries(root, forms); + if (entries.isEmpty()) return; + + if (originalCase == WordCase.LOWER + && WordCase.caseOf(root) == WordCase.TITLE + && !dictionary.hasLanguage("de")) { +return; + } - IntsRefFSTEnum.InputOutput mapping; - while ((mapping = fstEnum.next()) != null) { -IntsRef key = mapping.input; -if (Math.abs(key.length - word.length()) > 4 || !isSuitableRoot(mapping.output)) continue; - -String root = toString(key); -if (originalCase == WordCase.LOWER -&& WordCase.caseOf(root) == WordCase.TITLE -&& !dictionary.hasLanguage("de")) { - continue; -} + String lower = dictionary.toLowerCase(root); + int sc = + ngram(3, word, lower, EnumSet.of(NGramOptions.LONGER_WORSE)) + + commonPrefix(word, root); -String lower = dictionary.toLowerCase(root); -int sc = -ngram(3, word, lower, EnumSet.of(NGramOptions.LONGER_WORSE)) + commonPrefix(word, root); + entries.forEach(e -> roots.add(new Weighted<>(e, sc))); +}); +return roots.stream().limit(MAX_ROOTS).collect(Collectors.toList()); + } -roots.add(new WeightedWord(root, sc)); + private void processFST(FST fst, BiConsumer keyValueConsumer) { Review comment: I wonder if it makes sense to add something breakable in the middle, e.g. accepting some processor (unfortunately neither BiFunction nor BiPredicate convey that semantics for me :( ). OTOH I don't need it right now, and breakability can be added later. Or, it could be made a `Stream` or `Iterable`. One complication though: here I wrap all `IOException`s, but that's probably not a good idea in a general FST case. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dweiss commented on a change in pull request #2330: LUCENE-9748: Hunspell: suggest inflected dictionary entries similar t…
dweiss commented on a change in pull request #2330: URL: https://github.com/apache/lucene-solr/pull/2330#discussion_r572764206 ## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/GeneratingSuggester.java ## @@ -33,44 +40,59 @@ */ class GeneratingSuggester { private static final int MAX_ROOTS = 100; - private static final int MAX_GUESSES = 100; + private static final int MAX_WORDS = 100; + private static final int MAX_GUESSES = 200; private final Dictionary dictionary; + private final SpellChecker speller; - GeneratingSuggester(Dictionary dictionary) { -this.dictionary = dictionary; + GeneratingSuggester(SpellChecker speller) { +this.dictionary = speller.dictionary; +this.speller = speller; } List suggest(String word, WordCase originalCase, Set prevSuggestions) { -List roots = findSimilarDictionaryEntries(word, originalCase); -List expanded = expandRoots(word, roots); -TreeSet bySimilarity = rankBySimilarity(word, expanded); +List> roots = findSimilarDictionaryEntries(word, originalCase); +List> expanded = expandRoots(word, roots); +TreeSet> bySimilarity = rankBySimilarity(word, expanded); return getMostRelevantSuggestions(bySimilarity, prevSuggestions); } - private List findSimilarDictionaryEntries(String word, WordCase originalCase) { -try { - IntsRefFSTEnum fstEnum = new IntsRefFSTEnum<>(dictionary.words); - TreeSet roots = new TreeSet<>(); + private List> findSimilarDictionaryEntries( + String word, WordCase originalCase) { +TreeSet> roots = new TreeSet<>(); +processFST( +dictionary.words, +(key, forms) -> { + if (Math.abs(key.length - word.length()) > 4) return; + + String root = toString(key); + List entries = filterSuitableEntries(root, forms); + if (entries.isEmpty()) return; + + if (originalCase == WordCase.LOWER + && WordCase.caseOf(root) == WordCase.TITLE + && !dictionary.hasLanguage("de")) { +return; + } - IntsRefFSTEnum.InputOutput mapping; - while ((mapping = fstEnum.next()) != null) { -IntsRef key = mapping.input; -if (Math.abs(key.length - word.length()) > 4 || !isSuitableRoot(mapping.output)) continue; - -String root = toString(key); -if (originalCase == WordCase.LOWER -&& WordCase.caseOf(root) == WordCase.TITLE -&& !dictionary.hasLanguage("de")) { - continue; -} + String lower = dictionary.toLowerCase(root); + int sc = + ngram(3, word, lower, EnumSet.of(NGramOptions.LONGER_WORSE)) + + commonPrefix(word, root); -String lower = dictionary.toLowerCase(root); -int sc = -ngram(3, word, lower, EnumSet.of(NGramOptions.LONGER_WORSE)) + commonPrefix(word, root); + entries.forEach(e -> roots.add(new Weighted<>(e, sc))); +}); +return roots.stream().limit(MAX_ROOTS).collect(Collectors.toList()); + } -roots.add(new WeightedWord(root, sc)); + private void processFST(FST fst, BiConsumer keyValueConsumer) { Review comment: A BiPredicate sounds good to me, actually... But if IOExceptions are to be allowed then you'd need a custom visitor interface anyway. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-9752) Hunspell Stemmer: reduce parameter count
Peter Gromov created LUCENE-9752: Summary: Hunspell Stemmer: reduce parameter count Key: LUCENE-9752 URL: https://issues.apache.org/jira/browse/LUCENE-9752 Project: Lucene - Core Issue Type: Sub-task Reporter: Peter Gromov -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2330: LUCENE-9748: Hunspell: suggest inflected dictionary entries similar t…
donnerpeter commented on a change in pull request #2330: URL: https://github.com/apache/lucene-solr/pull/2330#discussion_r572766983 ## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/GeneratingSuggester.java ## @@ -33,44 +40,59 @@ */ class GeneratingSuggester { private static final int MAX_ROOTS = 100; - private static final int MAX_GUESSES = 100; + private static final int MAX_WORDS = 100; + private static final int MAX_GUESSES = 200; private final Dictionary dictionary; + private final SpellChecker speller; - GeneratingSuggester(Dictionary dictionary) { -this.dictionary = dictionary; + GeneratingSuggester(SpellChecker speller) { +this.dictionary = speller.dictionary; +this.speller = speller; } List suggest(String word, WordCase originalCase, Set prevSuggestions) { -List roots = findSimilarDictionaryEntries(word, originalCase); -List expanded = expandRoots(word, roots); -TreeSet bySimilarity = rankBySimilarity(word, expanded); +List> roots = findSimilarDictionaryEntries(word, originalCase); +List> expanded = expandRoots(word, roots); +TreeSet> bySimilarity = rankBySimilarity(word, expanded); return getMostRelevantSuggestions(bySimilarity, prevSuggestions); } - private List findSimilarDictionaryEntries(String word, WordCase originalCase) { -try { - IntsRefFSTEnum fstEnum = new IntsRefFSTEnum<>(dictionary.words); - TreeSet roots = new TreeSet<>(); + private List> findSimilarDictionaryEntries( + String word, WordCase originalCase) { +TreeSet> roots = new TreeSet<>(); +processFST( +dictionary.words, +(key, forms) -> { + if (Math.abs(key.length - word.length()) > 4) return; + + String root = toString(key); + List entries = filterSuitableEntries(root, forms); + if (entries.isEmpty()) return; + + if (originalCase == WordCase.LOWER + && WordCase.caseOf(root) == WordCase.TITLE + && !dictionary.hasLanguage("de")) { +return; + } - IntsRefFSTEnum.InputOutput mapping; - while ((mapping = fstEnum.next()) != null) { -IntsRef key = mapping.input; -if (Math.abs(key.length - word.length()) > 4 || !isSuitableRoot(mapping.output)) continue; - -String root = toString(key); -if (originalCase == WordCase.LOWER -&& WordCase.caseOf(root) == WordCase.TITLE -&& !dictionary.hasLanguage("de")) { - continue; -} + String lower = dictionary.toLowerCase(root); + int sc = + ngram(3, word, lower, EnumSet.of(NGramOptions.LONGER_WORSE)) + + commonPrefix(word, root); -String lower = dictionary.toLowerCase(root); -int sc = -ngram(3, word, lower, EnumSet.of(NGramOptions.LONGER_WORSE)) + commonPrefix(word, root); + entries.forEach(e -> roots.add(new Weighted<>(e, sc))); +}); +return roots.stream().limit(MAX_ROOTS).collect(Collectors.toList()); + } -roots.add(new WeightedWord(root, sc)); + private void processFST(FST fst, BiConsumer keyValueConsumer) { Review comment: BiPredicate sounds pure to me, while this processing can have side effects. It's not in the javadoc, just in the name: predicates are something stateless. `IOException`s would be in the FST walking, the processing code itself doesn't necessarily need them (but can also have them). Maybe given all that it's just easier to leave the walking here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
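A sketch of the custom visitor idea from this thread; `FSTVisitor` and the `forEach` helper are names of my own choosing, not existing Lucene API:

```java
import java.io.IOException;
import org.apache.lucene.util.IntsRef;
import org.apache.lucene.util.fst.FST;
import org.apache.lucene.util.fst.IntsRefFSTEnum;

// A visitor that, unlike BiConsumer/BiPredicate, may throw IOException
// and may stop the traversal early by returning false.
interface FSTVisitor<T> {
  boolean visit(IntsRef key, T output) throws IOException;
}

final class FSTTraversal {
  static <T> void forEach(FST<T> fst, FSTVisitor<T> visitor) throws IOException {
    IntsRefFSTEnum<T> fstEnum = new IntsRefFSTEnum<>(fst);
    IntsRefFSTEnum.InputOutput<T> mapping;
    while ((mapping = fstEnum.next()) != null) {
      if (visitor.visit(mapping.input, mapping.output) == false) {
        return; // visitor requested early termination
      }
    }
  }
}
```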
[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2327: LUCENE-9740: scan affix stream once.
donnerpeter commented on a change in pull request #2327: URL: https://github.com/apache/lucene-solr/pull/2327#discussion_r572769064 ## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java ## @@ -778,31 +791,36 @@ char affixData(int affixIndex, int offset) { private static final byte[] BOM_UTF8 = {(byte) 0xef, (byte) 0xbb, (byte) 0xbf}; /** Parses the encoding and flag format specified in the provided InputStream */ - private void readConfig(BufferedInputStream stream) throws IOException, ParseException { -// I assume we don't support other BOMs (utf16, etc.)? We trivially could, -// by adding maybeConsume() with a proper bom... but I don't see hunspell repo to have -// any such exotic examples. -Charset streamCharset; -if (maybeConsume(stream, BOM_UTF8)) { - streamCharset = StandardCharsets.UTF_8; -} else { - streamCharset = DEFAULT_CHARSET; -} - -// TODO: can these flags change throughout the file? If not then we can abort sooner. And -// then we wouldn't even need to create a temp file for the affix stream - a large enough -// leading buffer (BufferedInputStream) would be sufficient? + private void readConfig(InputStream stream, Charset streamCharset) + throws IOException, ParseException { LineNumberReader reader = new LineNumberReader(new InputStreamReader(stream, streamCharset)); String line; +String flagLine = null; +boolean charsetFound = false; +boolean flagFound = false; while ((line = reader.readLine()) != null) { if (line.isBlank()) continue; String firstWord = line.split("\\s")[0]; if ("SET".equals(firstWord)) { decoder = getDecoder(singleArgument(reader, line)); +charsetFound = true; } else if ("FLAG".equals(firstWord)) { -flagParsingStrategy = getFlagParsingStrategy(line, decoder.charset()); +// Preserve the flag line for parsing later since we need the decoder's charset +// and just in case they come out of order. +flagLine = line; +flagFound = true; + } else { +continue; } + + if (charsetFound && flagFound) { +break; + } +} + +if (flagFound) { Review comment: It could be paired by using something nullable encoding-related :) Anyway it's very minor. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
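For readers following the "scan the affix stream once" change: a simplified sketch of the bounded prologue scan, using mark/reset on a `BufferedInputStream` (the window size and helper names here are illustrative, not the exact `Dictionary` code):

```java
import java.io.BufferedInputStream;
import java.io.IOException;

final class AffixPrologue {
  // Assumed scan window; the real constant is Dictionary.MAX_PROLOGUE_SCAN_WINDOW.
  static final int SCAN_WINDOW = 64 * 1024;

  /**
   * Reads just far enough to find the SET and FLAG directives, then rewinds so the
   * caller can re-parse the same stream from the start with the right charset.
   * Returns {charsetName, flagFormat}; either element may be null if not found.
   */
  static String[] readSetAndFlag(BufferedInputStream in) throws IOException {
    in.mark(SCAN_WINDOW); // reset() below is only valid within this window
    String set = null, flag = null;
    StringBuilder line = new StringBuilder();
    int b, scanned = 0;
    while ((set == null || flag == null) && scanned < SCAN_WINDOW && (b = in.read()) != -1) {
      scanned++;
      if (b == '\n') {
        String s = line.toString().trim();
        if (s.startsWith("SET ")) set = s.substring(4).trim();
        else if (s.startsWith("FLAG ")) flag = s.substring(5).trim();
        line.setLength(0);
      } else {
        line.append((char) b); // the directives themselves are plain ASCII
      }
    }
    in.reset(); // rewind to the marked start
    return new String[] {set, flag};
  }
}
```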
[GitHub] [lucene-solr] donnerpeter opened a new pull request #2333: LUCENE-9752: Hunspell Stemmer: reduce parameter count
donnerpeter opened a new pull request #2333: URL: https://github.com/apache/lucene-solr/pull/2333

# Description

There are too many parameters, some of them avoidable.

# Solution

`doSuffix` is always true; `circumfix` can be calculated at the usage site (and once, not for every homonym).

# Tests

Unaffected

# Checklist

Please review the following and check all that apply:

- [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [x] I have created a Jira issue and added the issue ID to my pull request title.
- [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [x] I have developed this patch against the `master` branch.
- [x] I have run `./gradlew check`.
- [ ] I have added tests for my changes.
- [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2333: LUCENE-9752: Hunspell Stemmer: reduce parameter count
donnerpeter commented on a change in pull request #2333: URL: https://github.com/apache/lucene-solr/pull/2333#discussion_r572772417 ## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Stemmer.java ## @@ -698,15 +688,6 @@ private boolean applyAffix( } } - // if circumfix was previously set by a prefix, we must check this suffix, - // to ensure it has it, and vice versa - if (dictionary.circumfix != Dictionary.FLAG_UNSET) { Review comment: moved into `skipLookup`, as this check is independent of the loop variable This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] murblanc commented on pull request #2318: SOLR-15138: PerReplicaStates does not scale to large collections as well as state.json
murblanc commented on pull request #2318: URL: https://github.com/apache/lucene-solr/pull/2318#issuecomment-775852524

> Error is a timeout from `CollectionsHandler` having waited 45 seconds

I take that back, @noblepaul. I did the test wrong and just did it again, and it passes. Timing for the collection creation (11x11=121 replicas on 3 nodes) is similar with or without PRS, at about 45 seconds. I can do more testing later (more concurrent threads, more and smaller collections). Note I did put out a few numbers on PRS (not with the patch in this PR though); see [this comment](https://issues.apache.org/jira/browse/SOLR-15146?focusedCommentId=17281460&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17281460).

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] iverase opened a new pull request #2334: LUCENE-9705: Create Lucene90TermVectorsFormat
iverase opened a new pull request #2334: URL: https://github.com/apache/lucene-solr/pull/2334 For now this is just a copy of Lucene50TermVectorsFormat. The existing Lucene50TermVectorsFormat was moved to backwards-codecs, along with its utility classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9751) Assertion error (int overflow) in ByteSliceReader
[ https://issues.apache.org/jira/browse/LUCENE-9751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281751#comment-17281751 ] Michael McCandless commented on LUCENE-9751: Hmm I thought we long ago added a best effort to detect/prevent too large a DWPT RAM buffer? Were you maybe indexing rather large individual documents? > Assertion error (int overflow) in ByteSliceReader > - > > Key: LUCENE-9751 > URL: https://issues.apache.org/jira/browse/LUCENE-9751 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 8.7 >Reporter: Dawid Weiss >Priority: Major > > New computers come with insane amounts of ram and heaps can get pretty big. > If you adjust per-thread buffers to larger values strange things start > happening. This happened to us today: > {code} > Caused by: java.lang.AssertionError > at > org.apache.lucene.index.ByteSliceReader.init(ByteSliceReader.java:44) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.index.TermsHashPerField.initReader(TermsHashPerField.java:88) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.index.FreqProxFields$FreqProxPostingsEnum.reset(FreqProxFields.java:430) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.index.FreqProxFields$FreqProxTermsEnum.postings(FreqProxFields.java:247) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.codecs.PushPostingsWriterBase.writeTerm(PushPostingsWriterBase.java:127) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.write(BlockTreeTermsWriter.java:907) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:318) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.write(PerFieldPostingsFormat.java:170) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:120) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:264) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:350) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:480) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.index.DocumentsWriter.postUpdate(DocumentsWriter.java:394) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > 
org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:440) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1471) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > ... 7 more > {code} > Likely an int overflow in TermsHashPerField: > {code} > reader.init(bytePool, > > postingsArray.byteStarts[termID]+stream*ByteBlockPool.FIRST_LEVEL_SIZE, > streamAddressBuffer[offsetInAddressBuffer+stream]); > {code} > Don't know if this can be prevented somehow. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9751) Assertion error (int overflow) in ByteSliceReader
[ https://issues.apache.org/jira/browse/LUCENE-9751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281753#comment-17281753 ] Dawid Weiss commented on LUCENE-9751: - Everything passes with flying colors on lower heap settings (which result in smaller per-thread buffers). Lower means ~20GB. This failure occurred with max heap of 32GB. It's a highly concurrent and job-stealing setup so I doubt I can easily reproduce... > Assertion error (int overflow) in ByteSliceReader > - > > Key: LUCENE-9751 > URL: https://issues.apache.org/jira/browse/LUCENE-9751 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 8.7 >Reporter: Dawid Weiss >Priority: Major > > New computers come with insane amounts of ram and heaps can get pretty big. > If you adjust per-thread buffers to larger values strange things start > happening. This happened to us today: > {code} > Caused by: java.lang.AssertionError > at > org.apache.lucene.index.ByteSliceReader.init(ByteSliceReader.java:44) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.index.TermsHashPerField.initReader(TermsHashPerField.java:88) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.index.FreqProxFields$FreqProxPostingsEnum.reset(FreqProxFields.java:430) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.index.FreqProxFields$FreqProxTermsEnum.postings(FreqProxFields.java:247) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.codecs.PushPostingsWriterBase.writeTerm(PushPostingsWriterBase.java:127) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.write(BlockTreeTermsWriter.java:907) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:318) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.write(PerFieldPostingsFormat.java:170) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:120) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:264) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:350) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:480) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.index.DocumentsWriter.postUpdate(DocumentsWriter.java:394) > ~[lucene-core-8.7.0.jar:8.7.0 
2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:440) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1471) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > ... 7 more > {code} > Likely an int overflow in TermsHashPerField: > {code} > reader.init(bytePool, > > postingsArray.byteStarts[termID]+stream*ByteBlockPool.FIRST_LEVEL_SIZE, > streamAddressBuffer[offsetInAddressBuffer+stream]); > {code} > Don't know if this can be prevented somehow. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
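The wraparound itself is easy to reproduce in isolation; a self-contained illustration in which only the arithmetic mirrors the quoted expression (all constants made up):

```java
public class IntOverflowDemo {
  public static void main(String[] args) {
    // With ~32GB heaps and very large DWPT buffers, byte-pool offsets can
    // approach Integer.MAX_VALUE (2_147_483_647).
    int byteStart = 2_147_483_640;  // offset of a term's postings in the byte pool
    int stream = 2;                 // stream index within the term
    int firstLevelSize = 5;         // stands in for ByteBlockPool.FIRST_LEVEL_SIZE

    int offset = byteStart + stream * firstLevelSize; // silently wraps negative
    System.out.println(offset);                       // prints -2147483646

    // Math.addExact would fail fast instead of wrapping:
    // int safe = Math.addExact(byteStart, stream * firstLevelSize); // ArithmeticException
  }
}
```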
[GitHub] [lucene-solr-operator] krishnachalla-hv opened a new issue #212: Solr cloud never gets deleted when reclaimPolicy is set to Delete
krishnachalla-hv opened a new issue #212: URL: https://github.com/apache/lucene-solr-operator/issues/212

I am facing this issue after upgrading to **v0.2.8**. Earlier I was using **v0.2.6** of the operator and things worked as expected, except for the deletion of the **pvc**s created by the SolrCloud, since that feature was not available in **v0.2.6**. Now I am testing my application with the latest version of the solr-operator (v0.2.8) and facing a weird issue: when I set **reclaimPolicy: Delete** for Solr as well as the provided Zookeeper instance in the **solrcloud** yaml file and try to uninstall, only the solr-operator and zk-operator get uninstalled, but the SolrCloud and Zookeeper pods never get terminated. I created one sample test chart like the one below and tested these things, but the issue is still the same.

```
apiVersion: v2
name: test
description: A Helm chart for intializing multi node solr cloud.
type: application
version: 1.0.0
appVersion: 1.0.0
dependencies:
  - name: solr-operator
    version: 0.2.8
    repository: "https://apache.github.io/lucene-solr-operator/charts"
    condition: solr-operator.enabled
  - name: zookeeper-operator
    version: 0.3.0
    repository: "https://kubernetes-charts.banzaicloud.com"
    condition: zookeeper-operator.enabled
```

And my solr cloud yaml configuration is:

```
apiVersion: solr.bloomberg.com/v1beta1
kind: SolrCloud
metadata:
  name: {{ .Release.Name }}
spec:
  dataStorage:
    persistent:
      reclaimPolicy: Delete
      pvcTemplate:
        spec:
          resources:
            requests:
              storage: "5Gi"
  replicas: 2
  solrImage:
    tag: 8.7.0
  solrJavaMem: "-Xms1g -Xmx3g"
  customSolrKubeOptions:
    podOptions:
      resources:
        limits:
          memory: "1G"
        requests:
          cpu: "65m"
          memory: "156Mi"
  zookeeperRef:
    provided:
      chroot: "/solr"
      persistence:
        reclaimPolicy: Delete
        spec:
          resources:
            requests:
              storage: "5Gi"
      replicas: 3
      zookeeperPodPolicy:
        resources:
          limits:
            memory: "1G"
          requests:
            cpu: "65m"
            memory: "156Mi"
  solrOpts: "-Dsolr.autoSoftCommit.maxTime=1"
  solrGCTune: "-XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8"
```

If I set the **reclaimPolicy** to **Retain** and uninstall the chart, the SolrCloud also uninstalls properly.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9663) Adding compression to terms dict from SortedSet/Sorted DocValues
[ https://issues.apache.org/jira/browse/LUCENE-9663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281779#comment-17281779 ] Bruno Roustant commented on LUCENE-9663: I'm ready to merge. I think it could go to 8.9 branch but I'd like to have confirmation. This change adds compression to Lucene80DocValuesFormat if the Mode.BEST_COMPRESSION is used and is backward compatible. [~jpountz] any suggestion? Thanks > Adding compression to terms dict from SortedSet/Sorted DocValues > > > Key: LUCENE-9663 > URL: https://issues.apache.org/jira/browse/LUCENE-9663 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Jaison.Bi >Priority: Trivial > Fix For: master (9.0) > > Time Spent: 11h > Remaining Estimate: 0h > > Elasticsearch keyword field uses SortedSet DocValues. In our applications, > “keyword” is the most frequently used field type. > LUCENE-7081 has done prefix-compression for docvalues terms dict. We can do > better by replacing prefix-compression with LZ4. In one of our application, > the dvd files were ~41% smaller with this change(from 1.95 GB to 1.15 GB). > I've done simple tests based on the real application data, comparing the > write/merge time cost, and the on-disk *.dvd file size(after merge into 1 > segment). > || ||Before||After|| > |Write time cost(ms)|591972|618200| > |Merge time cost(ms)|270661|294663| > |*.dvd file size(GB)|1.95|1.15| > This feature is only for the high-cardinality fields. > I'm doing the benchmark test based on luceneutil. Will attach the report and > patch after the test. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
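For anyone wanting to opt in once this merges: based on the description above, selecting the compressing variant should look roughly like the following subclassed codec. Treat the exact constructor and override points as assumptions until the change lands:

```java
import org.apache.lucene.codecs.DocValuesFormat;
import org.apache.lucene.codecs.lucene80.Lucene80DocValuesFormat;
import org.apache.lucene.codecs.lucene87.Lucene87Codec;

// Sketch: route every doc-values field to the BEST_COMPRESSION variant
// described in this issue.
public class CompressedDocValuesCodec extends Lucene87Codec {
  private final DocValuesFormat dvFormat =
      new Lucene80DocValuesFormat(Lucene80DocValuesFormat.Mode.BEST_COMPRESSION);

  @Override
  public DocValuesFormat getDocValuesFormatForField(String field) {
    return dvFormat;
  }
}
```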
[GitHub] [lucene-solr] muse-dev[bot] commented on a change in pull request #2334: LUCENE-9705: Create Lucene90TermVectorsFormat
muse-dev[bot] commented on a change in pull request #2334: URL: https://github.com/apache/lucene-solr/pull/2334#discussion_r572873185 ## File path: lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/compressing/Lucene50CompressingTermVectorsReader.java ## @@ -0,0 +1,1367 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.backward_codecs.compressing; + +import java.io.Closeable; +import java.io.IOException; +import java.util.Collection; +import java.util.Collections; +import java.util.Iterator; +import java.util.NoSuchElementException; +import org.apache.lucene.codecs.CodecUtil; +import org.apache.lucene.codecs.TermVectorsReader; +import org.apache.lucene.codecs.compressing.CompressionMode; +import org.apache.lucene.codecs.compressing.Decompressor; +import org.apache.lucene.index.BaseTermsEnum; +import org.apache.lucene.index.CorruptIndexException; +import org.apache.lucene.index.FieldInfo; +import org.apache.lucene.index.FieldInfos; +import org.apache.lucene.index.Fields; +import org.apache.lucene.index.ImpactsEnum; +import org.apache.lucene.index.IndexFileNames; +import org.apache.lucene.index.PostingsEnum; +import org.apache.lucene.index.SegmentInfo; +import org.apache.lucene.index.SlowImpactsEnum; +import org.apache.lucene.index.Terms; +import org.apache.lucene.index.TermsEnum; +import org.apache.lucene.store.AlreadyClosedException; +import org.apache.lucene.store.ByteArrayDataInput; +import org.apache.lucene.store.ChecksumIndexInput; +import org.apache.lucene.store.Directory; +import org.apache.lucene.store.IOContext; +import org.apache.lucene.store.IndexInput; +import org.apache.lucene.util.Accountable; +import org.apache.lucene.util.Accountables; +import org.apache.lucene.util.ArrayUtil; +import org.apache.lucene.util.BytesRef; +import org.apache.lucene.util.IOUtils; +import org.apache.lucene.util.LongsRef; +import org.apache.lucene.util.packed.BlockPackedReaderIterator; +import org.apache.lucene.util.packed.PackedInts; + +/** + * {@link TermVectorsReader} for {@link Lucene50CompressingTermVectorsFormat}. + * + * @lucene.experimental + */ +public final class Lucene50CompressingTermVectorsReader extends TermVectorsReader +implements Closeable { + + // hard limit on the maximum number of documents per chunk + static final int MAX_DOCUMENTS_PER_CHUNK = 128; + + static final String VECTORS_EXTENSION = "tvd"; + static final String VECTORS_INDEX_EXTENSION = "tvx"; + static final String VECTORS_META_EXTENSION = "tvm"; + static final String VECTORS_INDEX_CODEC_NAME = "Lucene85TermVectorsIndex"; + + static final int VERSION_START = 1; + static final int VERSION_OFFHEAP_INDEX = 2; + /** Version where all metadata were moved to the meta file. 
*/ + static final int VERSION_META = 3; + + static final int VERSION_CURRENT = VERSION_META; + static final int META_VERSION_START = 0; + + static final int PACKED_BLOCK_SIZE = 64; + + static final int POSITIONS = 0x01; + static final int OFFSETS = 0x02; + static final int PAYLOADS = 0x04; + static final int FLAGS_BITS = PackedInts.bitsRequired(POSITIONS | OFFSETS | PAYLOADS); + + private final FieldInfos fieldInfos; + final FieldsIndex indexReader; + final IndexInput vectorsStream; + private final int version; + private final int packedIntsVersion; + private final CompressionMode compressionMode; + private final Decompressor decompressor; + private final int chunkSize; + private final int numDocs; + private boolean closed; + private final BlockPackedReaderIterator reader; + private final long numDirtyChunks; // number of incomplete compressed blocks written + private final long numDirtyDocs; // cumulative number of missing docs in incomplete chunks + private final long maxPointer; // end of the data section + + // used by clone + private Lucene50CompressingTermVectorsReader(Lucene50CompressingTermVectorsReader reader) { +this.fieldInfos = reader.fieldInfos; +this.vectorsStream = reader.vectorsStream.clone(); +this.indexReader = reader.indexReader.clone(); +this.packedIntsVersion = reader.packedIntsVersion; +this.compressionMode = reader.compressionMode; +this.decompressor = reader.decompr
[GitHub] [lucene-solr] mikemccand commented on a change in pull request #2247: LUCENE-9476 Add getBulkPath API for the Taxonomy index
mikemccand commented on a change in pull request #2247: URL: https://github.com/apache/lucene-solr/pull/2247#discussion_r572845064 ## File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyReader.java ## @@ -31,7 +33,7 @@ import org.apache.lucene.facet.taxonomy.ParallelTaxonomyArrays; import org.apache.lucene.facet.taxonomy.TaxonomyReader; import org.apache.lucene.index.BinaryDocValues; -import org.apache.lucene.index.CorruptIndexException; // javadocs +import org.apache.lucene.index.CorruptIndexException; Review comment: Hmm, did we remove the `// javadocs` comment on purpose? ## File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyReader.java ## @@ -353,12 +349,137 @@ public FacetLabel getPath(int ordinal) throws IOException { } synchronized (categoryCache) { - categoryCache.put(catIDInteger, ret); + categoryCache.put(ordinal, ret); } return ret; } + private FacetLabel getPathFromCache(int ordinal) { +// TODO: can we use an int-based hash impl, such as IntToObjectMap, +// wrapped as LRU? +synchronized (categoryCache) { + return categoryCache.get(ordinal); +} + } + + private void checkOrdinalBounds(int ordinal, int indexReaderMaxDoc) + throws IllegalArgumentException { +if (ordinal < 0 || ordinal >= indexReaderMaxDoc) { + throw new IllegalArgumentException( + "ordinal " + + ordinal + + " is out of the range of the indexReader " + + indexReader.toString()); +} + } + + /** + * Returns an array of FacetLabels for a given array of ordinals. + * + * This API is generally faster than iteratively calling {@link #getPath(int)} over an array of + * ordinals. It uses the {@link #getPath(int)} method iteratively when it detects that the index + * was created using StoredFields (with no performance gains) and uses DocValues based iteration + * when the index is based on DocValues. + * + * @param ordinals Array of ordinals that are assigned to categories inserted into the taxonomy + * index + */ + public FacetLabel[] getBulkPath(int... 
ordinals) throws IOException { +ensureOpen(); + +int ordinalsLength = ordinals.length; +FacetLabel[] bulkPath = new FacetLabel[ordinalsLength]; +// remember the original positions of ordinals before they are sorted +int originalPosition[] = new int[ordinalsLength]; +Arrays.setAll(originalPosition, IntUnaryOperator.identity()); +int indexReaderMaxDoc = indexReader.maxDoc(); + +for (int i = 0; i < ordinalsLength; i++) { + // check whether the ordinal is valid before accessing the cache + checkOrdinalBounds(ordinals[i], indexReaderMaxDoc); + // check the cache before trying to find it in the index + FacetLabel ordinalPath = getPathFromCache(ordinals[i]); + if (ordinalPath != null) { +bulkPath[i] = ordinalPath; + } +} + +// parallel sort the ordinals and originalPosition array based on the values in the ordinals +// array +new InPlaceMergeSorter() { + @Override + protected void swap(int i, int j) { +int x = ordinals[i]; +ordinals[i] = ordinals[j]; +ordinals[j] = x; + +x = originalPosition[i]; +originalPosition[i] = originalPosition[j]; +originalPosition[j] = x; + } + ; + + @Override + public int compare(int i, int j) { +return Integer.compare(ordinals[i], ordinals[j]); + } +}.sort(0, ordinalsLength); + +int readerIndex; +int leafReaderMaxDoc = 0; +int leafReaderDocBase = 0; +LeafReader leafReader; +LeafReaderContext leafReaderContext; +BinaryDocValues values = null; + +for (int i = 0; i < ordinalsLength; i++) { + if (bulkPath[originalPosition[i]] == null) { +if (values == null || ordinals[i] >= leafReaderMaxDoc) { + + readerIndex = ReaderUtil.subIndex(ordinals[i], indexReader.leaves()); + leafReaderContext = indexReader.leaves().get(readerIndex); + leafReader = leafReaderContext.reader(); + leafReaderMaxDoc = leafReader.maxDoc(); + leafReaderDocBase = leafReaderContext.docBase; + values = leafReader.getBinaryDocValues(Consts.FULL); + + // this check is only needed once to confirm that the index uses BinaryDocValues + boolean success = values.advanceExact(ordinals[i] - leafReaderDocBase); + if (success == false) { +return getBulkPathForOlderIndexes(ordinals); Review comment: Hmm, I'm confused -- wouldn't an older index have no `BinaryDocValues` field? So, `values` would be null, and we should fallback then? This code should hit `NullPointerException` on an old index I think? How come our backwards compatibility test didn't expose this? ## File path: lucene/facet/src/
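A usage sketch of the API under review in this thread, assuming the signature shown in the diff (`FacetLabel[] getBulkPath(int... ordinals)`):

```java
import org.apache.lucene.facet.taxonomy.FacetLabel;
import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyReader;
import org.apache.lucene.store.Directory;

public class BulkPathExample {
  // Resolve many ordinals in one call instead of looping over getPath(int);
  // per the diff, the implementation sorts the ordinals so each segment's
  // BinaryDocValues is only advanced forward.
  static FacetLabel[] resolve(Directory taxoDir, int[] ordinals) throws Exception {
    try (DirectoryTaxonomyReader taxoReader = new DirectoryTaxonomyReader(taxoDir)) {
      return taxoReader.getBulkPath(ordinals);
    }
  }
}
```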
[jira] [Created] (LUCENE-9753) Hunspell: disallow compounds with parts present in dictionary space-separated
Peter Gromov created LUCENE-9753: Summary: Hunspell: disallow compounds with parts present in dictionary space-separated Key: LUCENE-9753 URL: https://issues.apache.org/jira/browse/LUCENE-9753 Project: Lucene - Core Issue Type: Sub-task Reporter: Peter Gromov -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9406) Make it simpler to track IndexWriter's events
[ https://issues.apache.org/jira/browse/LUCENE-9406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281787#comment-17281787 ] Michael McCandless commented on LUCENE-9406: +1 for [~zacharymorn]'s proposed plan! > Make it simpler to track IndexWriter's events > - > > Key: LUCENE-9406 > URL: https://issues.apache.org/jira/browse/LUCENE-9406 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > > This is the second spinoff from a [controversial PR to add a new index-time > feature to Lucene to merge small segments during > commit|https://github.com/apache/lucene-solr/pull/1552]. That change can > substantially reduce the number of small index segments to search. > In that PR, there was a new proposed interface, {{IndexWriterEvents}}, giving > the application a chance to track when {{IndexWriter}} kicked off merges > during commit, how many, how long it waited, how often it gave up waiting, > etc. > Such telemetry from production usage is really helpful when tuning settings > like which merges (e.g. a size threshold) to attempt on commit, and how long > to wait during commit, etc. > I am splitting out this issue to explore possible approaches to do this. > E.g. [~simonw] proposed using a statistics class instead, but if I understood > that correctly, I think that would put the role of aggregation inside > {{IndexWriter}}, which is not ideal. > Many interesting events, e.g. how many merges are being requested, how large > are they, how long did they take to complete or fail, etc., can be gleaned by > wrapping expert Lucene classes like {{MergePolicy}} and {{MergeScheduler}}. > But for those events that cannot (e.g. {{IndexWriter}} stopped waiting for > merges during commit), it would be very helpful to have some simple way to > track so applications can better tune. > It is also possible to subclass {{IndexWriter}} and override key methods, but > I think that is inherently risky as {{IndexWriter}}'s protected methods are > not considered to be a stable API, and the synchronization used by > {{IndexWriter}} is confusing. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
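As one concrete form of the "wrap expert classes" approach mentioned in this issue, a sketch of a delegating merge policy that counts requested merges (`FilterMergePolicy` is real Lucene API; the telemetry shape around it is invented):

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.lucene.index.FilterMergePolicy;
import org.apache.lucene.index.MergePolicy;
import org.apache.lucene.index.MergeTrigger;
import org.apache.lucene.index.SegmentInfos;

public class CountingMergePolicy extends FilterMergePolicy {
  private final AtomicLong requestedMerges = new AtomicLong();

  public CountingMergePolicy(MergePolicy in) {
    super(in);
  }

  @Override
  public MergeSpecification findMerges(
      MergeTrigger mergeTrigger, SegmentInfos segmentInfos, MergeContext mergeContext)
      throws IOException {
    MergeSpecification spec = super.findMerges(mergeTrigger, segmentInfos, mergeContext);
    if (spec != null) {
      // Each OneMerge in the specification is a merge the wrapped policy asked for.
      requestedMerges.addAndGet(spec.merges.size());
    }
    return spec;
  }

  public long getRequestedMerges() {
    return requestedMerges.get();
  }
}
```

Note this only observes merge *requests*; as the issue says, events like "IndexWriter stopped waiting for merges during commit" are not visible this way.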
[GitHub] [lucene-solr] donnerpeter opened a new pull request #2335: LUCENE-9753: Hunspell: disallow compounds with parts present in dicti…
donnerpeter opened a new pull request #2335: URL: https://github.com/apache/lucene-solr/pull/2335 …onary, space-separated

# Description

Don't accept `compoundword` when there's `compound word` in the dictionary.

# Solution

Like Hunspell, handle this near the CHECKCOMPOUNDREP pattern check.

# Tests

`wordpair` from the Hunspell repo.

# Checklist

Please review the following and check all that apply:

- [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [x] I have created a Jira issue and added the issue ID to my pull request title.
- [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [x] I have developed this patch against the `master` branch.
- [x] I have run `./gradlew check`.
- [x] I have added tests for my changes.
- [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
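A rough sketch of the wordpair check this PR adds, assuming a plain set-based lookup (the real code consults the FST-backed `Dictionary` near the CHECKCOMPOUNDREP handling):

```java
import java.util.Set;

final class WordPairCheck {
  private final Set<String> dictionaryEntries; // stand-in for the real dictionary lookup

  WordPairCheck(Set<String> dictionaryEntries) {
    this.dictionaryEntries = dictionaryEntries;
  }

  /** True if the compound's two parts appear in the dictionary as "part1 part2". */
  boolean isForbiddenCompound(String word, int breakPos) {
    String spaceSeparated = word.substring(0, breakPos) + " " + word.substring(breakPos);
    return dictionaryEntries.contains(spaceSeparated);
  }
}
```

With `compound word` present as a dictionary entry, `isForbiddenCompound("compoundword", 8)` returns true, so the compound candidate is rejected.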
[GitHub] [lucene-solr-operator] krishnachalla-hv opened a new issue #213: Solr cloud provided zookeeper cluster in unhealthy state
krishnachalla-hv opened a new issue #213: URL: https://github.com/apache/lucene-solr-operator/issues/213

I have installed **v0.2.8** of the operator. Sometimes one of the provided Zookeeper instances goes into an unhealthy state. I have used the default values provided in the documentation to initialize the Zookeeper instance. Here is my configuration:

```
apiVersion: v2
name: test
description: A Helm chart for intializing multi node solr cloud.
appVersion: 1.0.0
dependencies:
  - name: solr-operator
    version: 0.2.8
    repository: "https://apache.github.io/lucene-solr-operator/charts"
    condition: solr-operator.enabled
  - name: zookeeper-operator
    version: 0.3.0
    repository: "https://kubernetes-charts.banzaicloud.com"
    condition: zookeeper-operator.enabled
```

And the solr cloud yaml file:

```
apiVersion: solr.bloomberg.com/v1beta1
kind: SolrCloud
metadata:
  name: {{ .Release.Name }}
spec:
  dataStorage:
    persistent:
      reclaimPolicy: Retain
      pvcTemplate:
        spec:
          resources:
            requests:
              storage: "5Gi"
  replicas: 2
  solrImage:
    tag: 8.7.0
  solrJavaMem: "-Xms1g -Xmx3g"
  customSolrKubeOptions:
    podOptions:
      resources:
        limits:
          memory: "1G"
        requests:
          cpu: "65m"
          memory: "156Mi"
  zookeeperRef:
    provided:
      chroot: "/solr"
      persistence:
        reclaimPolicy: Retain
        spec:
          resources:
            requests:
              storage: "5Gi"
      replicas: 3
      zookeeperPodPolicy:
        resources:
          limits:
            memory: "1G"
          requests:
            cpu: "65m"
            memory: "156Mi"
  solrOpts: "-Dsolr.autoSoftCommit.maxTime=1"
  solrGCTune: "-XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8"
```

After applying the above configuration, the 2nd ZK pod always ends up in an unhealthy state ([ZK pod error.log](https://github.com/apache/lucene-solr-operator/files/5951776/ZK.pod.error.log)), while the remaining 2 ZK pods function properly.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] gerlowskija opened a new pull request #2336: SOLR-15101: Add list/delete APIs for incremental backups
gerlowskija opened a new pull request #2336: URL: https://github.com/apache/lucene-solr/pull/2336 # Description SOLR-13608 introduced support in Solr for an "incremental" backup file structure, which allows storing multiple backup points for the same collection at a given location. With the ability to store multiple backups at the same place, users will need to be able to list and clean up these backups. # Solution This PR introduces two new APIs: one for listing the backups at a given location (along with associated metadata), and one for deleting or cleaning up these backups. The APIs are offered in both v1 and v2 flavors. # Tests Manual testing, along with new automated tests in `PurgeGraphTest` (reference checking for detecting index files to delete), `V2CollectionBackupsAPIMappingTest` (v1<->v2 mapping), and `AbstractIncrementalBackupTest` (integration test for the list and delete functionality). # Checklist Please review the following and check all that apply: - [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability. - [x] I have created a Jira issue and added the issue ID to my pull request title. - [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [x] I have developed this patch against the `master` branch. - [ ] I have run `./gradlew check`. - [x] I have added tests for my changes. - [x] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
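As a rough illustration of the v1 flavor, the request shapes might look like the following (the action names are taken from the PR description; the exact parameter names here are assumptions, not the final API):

```
# hypothetical v1 calls; parameter names are illustrative
http://localhost:8983/solr/admin/collections?action=LISTBACKUP&name=myBackup&location=file:///var/backups
http://localhost:8983/solr/admin/collections?action=DELETEBACKUP&name=myBackup&backupId=3&location=file:///var/backups
```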
[jira] [Commented] (SOLR-15101) Add list-backups and delete-backups APIs
[ https://issues.apache.org/jira/browse/SOLR-15101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281802#comment-17281802 ] Jason Gerlowski commented on SOLR-15101: I've pushed up a PR for this. I'm hoping to add some additional tests, but otherwise the code and docs should be ready to go. I'll plan on merging this in 4-5 days or so, and backporting to branch_8x afterwards. > Add list-backups and delete-backups APIs > > > Key: SOLR-15101 > URL: https://issues.apache.org/jira/browse/SOLR-15101 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: master (9.0) >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > The accepted SIP-12 outlines a plan for changing Solr's backup file structure > in a way that supports storing multiple backups within a single "location" > URI. With this comes a need for APIs that can list out and delete backups > within that single location. > SIP-12 has v1 and v2 API specs for these APIs. This ticket covers > implementing them. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9747) Missing package-info.java causes NPE in MissingDoclet.java
[ https://issues.apache.org/jira/browse/LUCENE-9747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281817#comment-17281817 ] David Eric Pugh commented on LUCENE-9747: - I have Java 11.0.3+7: ➜ lucene-solr-epugh git:(SOLR-15121) ✗ java --version openjdk 11.0.3 2019-04-16 And here is the stack trace (take 2): {noformat} javadoc: error - fatal error encountered: java.lang.NullPointerException javadoc: error - Please file a bug against the javadoc tool via the Java bug reporting page (http://bugreport.java.com) after checking the Bug Database (http://bugs.java.com) for duplicates. Include error messages and the following diagnostic in your report. Thank you. java.lang.NullPointerException at jdk.javadoc/jdk.javadoc.internal.tool.Messager.getDiagSource(Messager.java:206) at jdk.javadoc/jdk.javadoc.internal.tool.Messager.printError(Messager.java:234) at jdk.javadoc/jdk.javadoc.internal.tool.Messager.print(Messager.java:121) at org.apache.lucene.missingdoclet.MissingDoclet.error(MissingDoclet.java:434) at org.apache.lucene.missingdoclet.MissingDoclet.checkComment(MissingDoclet.java:309) at org.apache.lucene.missingdoclet.MissingDoclet.check(MissingDoclet.java:237) at org.apache.lucene.missingdoclet.MissingDoclet.run(MissingDoclet.java:205) at jdk.javadoc/jdk.javadoc.internal.tool.Start.parseAndExecute(Start.java:582) at jdk.javadoc/jdk.javadoc.internal.tool.Start.begin(Start.java:431) at jdk.javadoc/jdk.javadoc.internal.tool.Start.begin(Start.java:344) at jdk.javadoc/jdk.javadoc.internal.tool.Main.execute(Main.java:63) at jdk.javadoc/jdk.javadoc.internal.tool.Main.main(Main.java:52) {noformat} > Missing package-info.java causes NPE in MissingDoclet.java > -- > > Key: LUCENE-9747 > URL: https://issues.apache.org/jira/browse/LUCENE-9747 > Project: Lucene - Core > Issue Type: Improvement > Components: general/javadocs >Affects Versions: master (9.0) >Reporter: David Eric Pugh >Priority: Minor > Attachments: LUCENE-9747.patch > > Time Spent: 1h 50m > Remaining Estimate: 0h > > When running {{./gradlew :solr:core:javadoc}} discovered that if a package > directory is missing the {{package-info.java}} file you get a VERY cryptic > error: > > {{javadoc: error - fatal error encountered: java.lang.NullPointerException}} > {{javadoc: error - Please file a bug against the javadoc tool via the Java > bug reporting page}} > > I poked around and that the {{MissingDoclet.java}} method call to > \{{reporter.print(Diagnostic.Kind.ERROR, element, fullMessage.toString());}} > was failing, due to the element having some sort of null in it. I am > attaching a patch and a PR. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jimczi commented on a change in pull request #2256: LUCENE-9507 Custom order for leaves in IndexReader and IndexWriter
jimczi commented on a change in pull request #2256: URL: https://github.com/apache/lucene-solr/pull/2256#discussion_r572984156 ## File path: lucene/core/src/java/org/apache/lucene/index/StandardDirectoryReader.java ## @@ -39,33 +40,47 @@ final IndexWriter writer; final SegmentInfos segmentInfos; + private final Comparator leafSorter; private final boolean applyAllDeletes; private final boolean writeAllDeletes; - /** called only from static open() methods */ + /** package private constructor, called only from static open() methods. */ StandardDirectoryReader( Directory directory, LeafReader[] readers, IndexWriter writer, SegmentInfos sis, + Comparator leafSorter, boolean applyAllDeletes, boolean writeAllDeletes) throws IOException { -super(directory, readers); +super(directory, sortLeaves(readers, leafSorter)); Review comment: I wonder if the leafSorter should be declared and executed in the base class `BaseCompositeReader` ? That would expose the feature explicitly to `MultiReader` and friends. ## File path: lucene/core/src/java/org/apache/lucene/index/DirectoryReader.java ## @@ -56,7 +57,24 @@ * @throws IOException if there is a low-level IO error */ public static DirectoryReader open(final Directory directory) throws IOException { -return StandardDirectoryReader.open(directory, null); +return StandardDirectoryReader.open(directory, null, null); + } + + /** + * Returns a IndexReader for the the index in the given Directory + * + * @param directory the index directory + * @param leafSorter a comparator for sorting leaf readers. Providing leafSorter is useful for + * indices on which it is expected to run many queries with particular sort criteria (e.g. for + * time-based indices this is usually a descending sort on timestamp). In this case {@code + * leafSorter} should sort leaves according to this sort criteria. Providing leafSorter allows + * to speed up this particular type of sort queries by early terminating while iterating + * though segments and segments' documents. Review comment: nit: s/though/through/ ## File path: lucene/core/src/test/org/apache/lucene/index/TestIndexWriterReader.java ## @@ -169,7 +176,7 @@ public void testUpdateDocument() throws Exception { // writer.close wrote a new commit assertFalse(r2.isCurrent()); -DirectoryReader r3 = DirectoryReader.open(dir1); +DirectoryReader r3 = open(dir1); Review comment: Can you keep the explicit version ? ## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriterConfig.java ## @@ -478,6 +479,18 @@ public IndexWriterConfig setIndexSort(Sort sort) { return this; } + /** + * Set the comparator for sorting leaf readers. A DirectoryReader opened from a IndexWriter with + * this configuration will have its leaf readers sorted with the provided leaf sorter. + * + * @param leafSorter – a comparator for sorting leaf readers + * @return IndexWriterConfig with leafSorter set. + */ + public IndexWriterConfig setLeafSorter(Comparator leafSorter) { Review comment: You added a specific unit test for this feature but we could also set a random value in `LuceneTestCase#newIndexWriterConfig` to improve the coverage. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
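For readers following along, here is a minimal sketch of how the API under review would be used once merged; the comparator (descending `maxDoc`) is just a stand-in for a real per-segment sort key such as a max timestamp, and the `DirectoryReader.open(Directory, Comparator)` overload is the one added in this PR:

```java
import java.io.IOException;
import java.util.Comparator;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.store.Directory;

class LeafSorterSketch {
  // Open a reader whose leaves are pre-sorted, so sorted queries that match
  // the leaf order can terminate early while iterating segments.
  static DirectoryReader openSorted(Directory dir) throws IOException {
    Comparator<LeafReader> leafSorter =
        Comparator.comparingInt(LeafReader::maxDoc).reversed();
    return DirectoryReader.open(dir, leafSorter);
  }
}
```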
[jira] [Created] (LUCENE-9754) ICU Tokenizer: letter-space-number-letter tokenized inconsistently
Trey Jones created LUCENE-9754: -- Summary: ICU Tokenizer: letter-space-number-letter tokenized inconsistently Key: LUCENE-9754 URL: https://issues.apache.org/jira/browse/LUCENE-9754 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 7.5 Environment: Tested most recently on Elasticsearch 6.5.4. Reporter: Trey Jones The tokenization of strings like _14th_ with the ICU tokenizer is affected by the character that comes before the preceding whitespace. For example, _x 14th_ is tokenized as x | 14th; _ァ 14th_ is tokenized as ァ | 14 | th. In general, in a letter-space-number-letter sequence, if the writing system before the space is the same as the writing system after the number, then you get two tokens. If the writing systems differ, you get three tokens. If the conditions are just right, the chunking that the ICU tokenizer does (trying to split on spaces to create <4k chunks) can create an artificial boundary between the tokens (e.g., between _ァ_ and _14th_) and prevent the unexpected split of the second token (_14th_). Because chunking changes can ripple through a long document, editing text or the effects of a character filter can cause changes in tokenization thousands of lines later in a document. My guess is that some "previous character set" flag is not reset at the space, and numbers are not in a character set, so _t_ is compared to _ァ_ and they are not the same—causing a token split at the character set change—but I'm not sure. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
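A minimal sketch of the reported behavior, assuming the lucene-analyzers-icu module is on the classpath (the expected-vs-observed splits in the comments are taken from the report above):

{code:java}
import java.io.StringReader;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.icu.segmentation.ICUTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class IcuTokenizeDemo {
  static void show(String text) throws Exception {
    try (Tokenizer tok = new ICUTokenizer()) {
      tok.setReader(new StringReader(text));
      CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
      tok.reset();
      StringBuilder out = new StringBuilder(text + " ->");
      while (tok.incrementToken()) {
        out.append(" [").append(term).append(']');
      }
      tok.end();
      System.out.println(out);
    }
  }

  public static void main(String[] args) throws Exception {
    show("x 14th");  // two tokens: [x] [14th]
    show("ァ 14th"); // three tokens per the report: [ァ] [14] [th]
  }
}
{code}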
[jira] [Commented] (LUCENE-9673) The level of IntBlockPool slice is always 1
[ https://issues.apache.org/jira/browse/LUCENE-9673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281862#comment-17281862 ] Michael McCandless commented on LUCENE-9673: I was curious if this impacted indexing throughput and ran `luceneutil` three times with this change (confusingly, named `trunkN.txt` below) and three times without this change (`baseN.log`): ``` [mike@beast3 trunk]$ grep "indexing done" /l/logs/trunk?.txt /l/logs/trunk1.txt:Indexer: indexing done (89114 msec); total 27624170 docs /l/logs/trunk2.txt:Indexer: indexing done (89974 msec); total 27624192 docs /l/logs/trunk3.txt:Indexer: indexing done (90614 msec); total 27624409 docs [mike@beast3 trunk]$ grep "indexing done" /l/logs/base?.log /l/logs/base1.log:Indexer: indexing done (89271 msec); total 27623915 docs /l/logs/base2.log:Indexer: indexing done (91676 msec); total 27624107 docs /l/logs/base3.log:Indexer: indexing done (93120 msec); total 27624268 docs ``` Possibly a small speedup, but within the noise/variance of the test. Plus, the precise doc count indexed changes each time, which is not right! I opened [https://github.com/mikemccand/luceneutil/issues/106] to get to the bottom of that ... > The level of IntBlockPool slice is always 1 > > > Key: LUCENE-9673 > URL: https://issues.apache.org/jira/browse/LUCENE-9673 > Project: Lucene - Core > Issue Type: Bug > Components: core/other >Reporter: mashudong >Priority: Minor > Attachments: LUCENE-9673.patch > > > First slice is allocated by IntBlockPoo.newSlice(), and its level is 1, > > {code:java} > private int newSlice(final int size) { > if (intUpto > INT_BLOCK_SIZE-size) { > nextBuffer(); > assert assertSliceBuffer(buffer); > } > > final int upto = intUpto; > intUpto += size; > buffer[intUpto-1] = 1; > return upto; > }{code} > > > If one slice is not enough, IntBlockPoo.allocSlice() is called to allocate > more slices, > as the following code shows, level is 1, newLevel is NEXT_LEVEL_ARRAY[0] > which is also 1. > > The result is the level of IntBlockPool slice is always 1, the first slice is > 2 bytes long, and all subsequent slices are 4 bytes long. > > {code:java} > private static final int[] NEXT_LEVEL_ARRAY = {1, 2, 3, 4, 5, 6, 7, 8, 9, 9}; > private int allocSlice(final int[] slice, final int sliceOffset) { > final int level = slice[sliceOffset]; > final int newLevel = NEXT_LEVEL_ARRAY[level - 1]; > final int newSize = LEVEL_SIZE_ARRAY[newLevel]; > // Maybe allocate another block > if (intUpto > INT_BLOCK_SIZE - newSize) { > nextBuffer(); > assert assertSliceBuffer(buffer); > } > final int newUpto = intUpto; > final int offset = newUpto + intOffset; > intUpto += newSize; > // Write forwarding address at end of last slice: > slice[sliceOffset] = offset; > // Write new level: > buffer[intUpto - 1] = newLevel; > return newUpto; > } > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-15133) Document how to eliminate Failed to reserve shared memory warning
[ https://issues.apache.org/jira/browse/SOLR-15133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281872#comment-17281872 ] Mike Drob commented on SOLR-15133: -- I think [https://shipilev.net/jvm/anatomy-quarks/2-transparent-huge-pages/] is a better explainer of what is going on, including the difference between {{UseLargePages}} and {{UseTransparentHugePages}}. Keeping this enabled for heaps beyond 1G (which is most Solr heaps IME) appears to be beneficial when the system supports it. Now for the wrinkle... I believe both hugetlbfs and THP are reliant on kernel settings/parameters, and docker images don't have kernels themselves. MacOS doesn't support Large Pages ([https://bugs.openjdk.java.net/browse/JDK-8233062]) which suggests that processes running in Docker for Mac wouldn't either. I don't know if this holds true for Windows/Linux as well, or if the docker engines there are able to delegate that memory management request. > Document how to eliminate Failed to reserve shared memory warning > - > > Key: SOLR-15133 > URL: https://issues.apache.org/jira/browse/SOLR-15133 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: Docker, documentation >Affects Versions: 8.7 >Reporter: David Eric Pugh >Assignee: David Eric Pugh >Priority: Minor > Fix For: master (9.0) > > Time Spent: 1h 10m > Remaining Estimate: 0h > > inspired by a conversation on > [https://github.com/docker-solr/docker-solr/issues/273,] it would be good to > document how to get rid of shared memory warning in Docker setups. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
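For concreteness, a sketch of the two options being contrasted (the flags are standard HotSpot options, and the sysfs/procfs paths are the usual Linux locations; whether a container can see them is exactly the open question above, and the SOLR_OPTS hook assumes the standard solr.in.sh mechanism):

{noformat}
# hugetlbfs-style large pages: pages must be reserved up front on the host,
# e.g. echo 1024 > /proc/sys/vm/nr_hugepages
SOLR_OPTS="$SOLR_OPTS -XX:+UseLargePages"

# transparent huge pages: no reservation needed, but THP must be enabled
# ("always" or "madvise") in /sys/kernel/mm/transparent_hugepage/enabled
SOLR_OPTS="$SOLR_OPTS -XX:+UseTransparentHugePages"
{noformat}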
[jira] [Commented] (LUCENE-9751) Assertion error (int overflow) in ByteSliceReader
[ https://issues.apache.org/jira/browse/LUCENE-9751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281874#comment-17281874 ] Michael McCandless commented on LUCENE-9751: Thanks [~dweiss], clearly we have a bug here! We need better testing of "large" indexing RAM buffers. This is the assert that tripped, I think: {noformat} public void init(ByteBlockPool pool, int startIndex, int endIndex) { assert endIndex-startIndex >= 0; {noformat} So I think most likely the {{endIndex}} overflowed int and became negative elsewhere. We do know that our "best effort" will fail to catch indexing a gigantic document that pushes the indexing buffer over 2.1 GB. We default to a 1945 MB cutoff ({{IWC.setRAMPerThreadHardLimitMB}} can be used to change that), but if that one gigantic document takes the RAM usage from e.g. 1944 MB up beyond 2048 MB then it can lead to exceptions like this. > Assertion error (int overflow) in ByteSliceReader > - > > Key: LUCENE-9751 > URL: https://issues.apache.org/jira/browse/LUCENE-9751 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 8.7 >Reporter: Dawid Weiss >Priority: Major > > New computers come with insane amounts of ram and heaps can get pretty big. > If you adjust per-thread buffers to larger values strange things start > happening. This happened to us today: > {code} > Caused by: java.lang.AssertionError > at > org.apache.lucene.index.ByteSliceReader.init(ByteSliceReader.java:44) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.index.TermsHashPerField.initReader(TermsHashPerField.java:88) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.index.FreqProxFields$FreqProxPostingsEnum.reset(FreqProxFields.java:430) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.index.FreqProxFields$FreqProxTermsEnum.postings(FreqProxFields.java:247) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.codecs.PushPostingsWriterBase.writeTerm(PushPostingsWriterBase.java:127) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.write(BlockTreeTermsWriter.java:907) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:318) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.write(PerFieldPostingsFormat.java:170) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:120) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:264) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > 
org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:350) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:480) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.index.DocumentsWriter.postUpdate(DocumentsWriter.java:394) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:440) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1471) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > ... 7 more > {code} > Likely an int overflow in TermsHashPerField: > {code} > reader.init(bytePool, > > postingsArray.byteStarts[termID]+stream*ByteBlockPool.FIRST_LEVEL_SI
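A minimal sketch of the knobs in question, assuming the Lucene 8.x API (the values shown are the defaults under discussion, not recommendations):

{code:java}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;

public class RamLimitSketch {
  public static void main(String[] args) {
    IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
    // Global flush trigger shared across indexing threads (16 MB by default).
    iwc.setRAMBufferSizeMB(IndexWriterConfig.DEFAULT_RAM_BUFFER_SIZE_MB);
    // Per-thread (DWPT) hard cutoff; 1945 MB is the default mentioned above,
    // chosen to stay safely below the 2048 MB int-overflow boundary.
    iwc.setRAMPerThreadHardLimitMB(1945);
    System.out.println("per-thread hard limit: " + iwc.getRAMPerThreadHardLimitMB() + " MB");
  }
}
{code}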
[jira] [Created] (LUCENE-9755) Index Segment without DocValues May Cause Search to Fail
Thomas Hecker created LUCENE-9755: - Summary: Index Segment without DocValues May Cause Search to Fail Key: LUCENE-9755 URL: https://issues.apache.org/jira/browse/LUCENE-9755 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 8.3.1, 8.x, 8.8 Reporter: Thomas Hecker Attachments: DocValuesTest.java Not sure if this can be considered a bug, but it is certainly a caveat that may slip through testing due to its nature. Consider the following scenario: * all documents in the index have a field "numfield" indexed as IntPoint * in addition, SOME of those documents are also indexed with a SortedNumericDocValuesField using the same "numfield" name The documents without the DocValues cannot be matched by any queries that involve sorting, so we save some space by omitting the DocValues for those documents. This works perfectly fine, unless * the index contains a segment that only contains documents without the DocValues In this case, running a query that sorts by "numfield" will throw the following exception: {noformat} java.lang.IllegalStateException: unexpected docvalues type NONE for field 'numfield' (expected one of [SORTED_NUMERIC, NUMERIC]). Re-index with correct docvalues type. at org.apache.lucene.index.DocValues.checkField(DocValues.java:317) at org.apache.lucene.index.DocValues.getSortedNumeric(DocValues.java:389) at org.apache.lucene.search.SortedNumericSortField$3.getNumericDocValues(SortedNumericSortField.java:159) at org.apache.lucene.search.FieldComparator$NumericComparator.doSetNextReader(FieldComparator.java:155){noformat} I have included a minimal example program that demonstrates the issue. This will * create an index with two documents, each having "numfield" indexed * add a DocValuesField "numfield" only for the first document * force the two documents into separate index segments * run a query that matches only the first document and sorts by "numfield" This results in the aforementioned exception. When removing the following lines from the code: {code:java} if (i==docCount/2) { iw.commit(); } {code} both documents get added to the same segment. When re-running the code, which now creates a single index segment, the query works fine. Tested with Lucene 8.3.1 and 8.8.0. Like I said, this may not be considered a bug. But it has slipped through our testing because the existence of such a DocValues-free segment is such a rare and short-lived event. We can avoid this issue in the future by using a different field name for the DocValuesField. But for our production systems we have to patch DocValues.checkField() to suppress the IllegalStateException as reindexing is not an option right now. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
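For readers who don't want to download the attachment, here is a condensed sketch of the scenario (assuming Lucene 8.x; this is an illustrative reconstruction, not the attached DocValuesTest.java):

{code:java}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.IntPoint;
import org.apache.lucene.document.SortedNumericDocValuesField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.SortedNumericSortField;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class MixedDocValuesRepro {
  public static void main(String[] args) throws Exception {
    Directory dir = new ByteBuffersDirectory();
    try (IndexWriter iw = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
      Document withDv = new Document();
      withDv.add(new IntPoint("numfield", 1));
      withDv.add(new SortedNumericDocValuesField("numfield", 1));
      iw.addDocument(withDv);
      iw.commit(); // forces the next document into its own, DocValues-free segment

      Document withoutDv = new Document();
      withoutDv.add(new IntPoint("numfield", 2));
      iw.addDocument(withoutDv);
    }
    try (IndexReader reader = DirectoryReader.open(dir)) {
      IndexSearcher searcher = new IndexSearcher(reader);
      Sort sort = new Sort(new SortedNumericSortField("numfield", SortField.Type.INT));
      // Matches only the doc that has DocValues, but the sort comparator still
      // visits the DocValues-free segment and trips the IllegalStateException.
      searcher.search(IntPoint.newExactQuery("numfield", 1), 10, sort);
    }
  }
}
{code}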
[GitHub] [lucene-solr] anshumg commented on a change in pull request #2328: SOLR-15145: System property to control whether base_url is stored in state.json to enable back-compat with older SolrJ versi
anshumg commented on a change in pull request #2328: URL: https://github.com/apache/lucene-solr/pull/2328#discussion_r573145918 ## File path: solr/solrj/src/java/org/apache/solr/common/cloud/ZkNodeProps.java ## @@ -118,14 +120,9 @@ public static ZkNodeProps load(byte[] bytes) { @Override public void write(JSONWriter jsonWriter) { // don't write out the base_url if we have a node_name Review comment: Perhaps also extend the comment to mention the `STORE_BASE_URL` flag? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr-operator] madrob opened a new issue #214: extensions/v1beta1 Ingress is deprecated
madrob opened a new issue #214: URL: https://github.com/apache/lucene-solr-operator/issues/214 After deploying a cluster using this operator, I wanted to get the Ingress, but the one we currently use is deprecated. ```$ k get ing Warning: extensions/v1beta1 Ingress is deprecated in v1.14+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
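For reference, the equivalent resource under the non-deprecated API looks roughly like this (all names here are illustrative, not what the operator currently templates):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-solrcloud-ingress
spec:
  rules:
    - host: solr.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example-solrcloud-common
                port:
                  number: 80
```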
[GitHub] [lucene-solr] madrob commented on a change in pull request #2318: SOLR-15138: PerReplicaStates does not scale to large collections as well as state.json
madrob commented on a change in pull request #2318: URL: https://github.com/apache/lucene-solr/pull/2318#discussion_r573116634 ## File path: solr/solrj/src/java/org/apache/solr/common/cloud/PerReplicaStates.java ## @@ -92,6 +94,17 @@ public PerReplicaStates(String path, int cversion, List states) { } + /** Check and return if all replicas are ACTIVE + */ + public boolean allActive() { +if (this.allActive != null) return allActive; +boolean[] result = new boolean[]{true}; Review comment: Agree. ## File path: solr/core/src/java/org/apache/solr/cloud/api/collections/CreateCollectionCmd.java ## @@ -264,8 +264,11 @@ public void call(ClusterState clusterState, ZkNodeProps message, @SuppressWarnin log.info("Cleaned up artifacts for failed create collection for [{}]", collectionName); throw new SolrException(ErrorCode.BAD_REQUEST, "Underlying core creation failed while creating collection: " + collectionName); } else { +//we want to wait till all the replicas are ACTIVE for PRS collections because + ocmh.zkStateReader.waitForState(collectionName, 30, TimeUnit.SECONDS, (liveNodes, c) -> + c.getPerReplicaStates() == null || // this is not a PRS collection Review comment: I agree with Ilan here, let's skip the extra watcher and call to ZK. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-15138) PerReplicaStates does not scale to large collections as well as state.json
[ https://issues.apache.org/jira/browse/SOLR-15138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281996#comment-17281996 ] Mike Drob commented on SOLR-15138: -- Added a couple of comments where I agreed with Ilan's review. More generally, I don't know if that is the right place to be blocking. Why not in AddReplicaCmd, where we already have a check for the user-provided {{waitForFinalState}} parameter? Similarly, do we need to consider PRS state in MoveReplicaCmd (or maybe other places)? > PerReplicaStates does not scale to large collections as well as state.json > -- > > Key: SOLR-15138 > URL: https://issues.apache.org/jira/browse/SOLR-15138 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 8.8 >Reporter: Mike Drob >Assignee: Noble Paul >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > I was testing PRS collection creation with larger collections today > (previously I had tested with many small collections) and it seemed to be > having trouble keeping up. > > I was running a 4 node instance, each JVM with 4G Heap in k8s, and a single > zookeeper. > > With this cluster configuration, I am able to create several (at least 10) > collections with 11 shards and 11 replicas using the "old way" of keeping > state. These collections are created serially, waiting for all replicas to be > active before proceeding. > However, when attempting to do the same with PRS, the creation stalls on > collection 2 or 3, with several replicas stuck in a "down" state. Further, > when attempting to delete these collections using the regular API it > sometimes takes several attempts after getting stuck a few times as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9747) Missing package-info.java causes NPE in MissingDoclet.java
[ https://issues.apache.org/jira/browse/LUCENE-9747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282011#comment-17282011 ] Dawid Weiss commented on LUCENE-9747: - Indeed, I can reproduce this with JDK11 too. > Missing package-info.java causes NPE in MissingDoclet.java > -- > > Key: LUCENE-9747 > URL: https://issues.apache.org/jira/browse/LUCENE-9747 > Project: Lucene - Core > Issue Type: Improvement > Components: general/javadocs >Affects Versions: master (9.0) >Reporter: David Eric Pugh >Priority: Minor > Attachments: LUCENE-9747.patch > > Time Spent: 1h 50m > Remaining Estimate: 0h > > When running {{./gradlew :solr:core:javadoc}} discovered that if a package > directory is missing the {{package-info.java}} file you get a VERY cryptic > error: > > {{javadoc: error - fatal error encountered: java.lang.NullPointerException}} > {{javadoc: error - Please file a bug against the javadoc tool via the Java > bug reporting page}} > > I poked around and that the {{MissingDoclet.java}} method call to > \{{reporter.print(Diagnostic.Kind.ERROR, element, fullMessage.toString());}} > was failing, due to the element having some sort of null in it. I am > attaching a patch and a PR. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9747) Missing package-info.java causes NPE in MissingDoclet.java
[ https://issues.apache.org/jira/browse/LUCENE-9747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282012#comment-17282012 ] Dawid Weiss commented on LUCENE-9747: - https://bugs.openjdk.java.net/browse/JDK-8224082 > Missing package-info.java causes NPE in MissingDoclet.java > -- > > Key: LUCENE-9747 > URL: https://issues.apache.org/jira/browse/LUCENE-9747 > Project: Lucene - Core > Issue Type: Improvement > Components: general/javadocs >Affects Versions: master (9.0) >Reporter: David Eric Pugh >Priority: Minor > Attachments: LUCENE-9747.patch > > Time Spent: 1h 50m > Remaining Estimate: 0h > > When running {{./gradlew :solr:core:javadoc}} discovered that if a package > directory is missing the {{package-info.java}} file you get a VERY cryptic > error: > > {{javadoc: error - fatal error encountered: java.lang.NullPointerException}} > {{javadoc: error - Please file a bug against the javadoc tool via the Java > bug reporting page}} > > I poked around and that the {{MissingDoclet.java}} method call to > \{{reporter.print(Diagnostic.Kind.ERROR, element, fullMessage.toString());}} > was failing, due to the element having some sort of null in it. I am > attaching a patch and a PR. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-15129) Use the Solr TGZ artifact as Docker context
[ https://issues.apache.org/jira/browse/SOLR-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282025#comment-17282025 ] David Smiley commented on SOLR-15129: - See my comment [on PR 1769|https://github.com/apache/lucene-solr/pull/1769#issuecomment-729210262]; I will copy: bq. Wouldn't it be simpler for the release manager to build the docker image, examine the sha256 hash of the image, and publish that to the download location, making it official? Someone who wants to use the official Solr docker image who is ultra-paranoid can reference the image by hash like so: bq. bq. docker run --rm solr@sha256:02fe5f1ac04c28291fba23a18cd8765dd62c7a98538f07f2f7d8504ba217284d bq. That runs Solr 8.7, the official one. It's compact and can even be broadcasted easily in the release announcement for future Solr releases for people to get and run the latest release immediately, and be assured it's the correct one. bq. bq. I wonder what other major Apache projects do. CC [~janhoy] RE asking official images folks -- thanks for the reminder > Use the Solr TGZ artifact as Docker context > --- > > Key: SOLR-15129 > URL: https://issues.apache.org/jira/browse/SOLR-15129 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: master (9.0) >Reporter: Houston Putman >Priority: Major > > As discussed in SOLR-15127, there is a need for a unified Dockerfile that > allows for release and local builds. > This ticket is an attempt to achieve this by using the Solr distribution TGZ > as the docker context to build from. > Therefore release images would be completely reproducible by running: > {{docker build -f solr-9.0.0/Dockerfile > https://www.apache.org/dyn/closer.lua/lucene/solr/9.0.0/solr-9.0.0.tgz}} > The changes to the Solr distribution would include adding a Dockerfile at > {{solr-/Dockerfile}}, adding the docker scripts under > {{solr-/docker}}, and adding a version file at > {{solr-/VERSION.txt}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9747) Missing package-info.java causes NPE in MissingDoclet.java
[ https://issues.apache.org/jira/browse/LUCENE-9747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282028#comment-17282028 ] David Eric Pugh commented on LUCENE-9747: - :P > Missing package-info.java causes NPE in MissingDoclet.java > -- > > Key: LUCENE-9747 > URL: https://issues.apache.org/jira/browse/LUCENE-9747 > Project: Lucene - Core > Issue Type: Improvement > Components: general/javadocs >Affects Versions: master (9.0) >Reporter: David Eric Pugh >Priority: Minor > Attachments: LUCENE-9747.patch > > Time Spent: 1h 50m > Remaining Estimate: 0h > > When running {{./gradlew :solr:core:javadoc}} discovered that if a package > directory is missing the {{package-info.java}} file you get a VERY cryptic > error: > > {{javadoc: error - fatal error encountered: java.lang.NullPointerException}} > {{javadoc: error - Please file a bug against the javadoc tool via the Java > bug reporting page}} > > I poked around and that the {{MissingDoclet.java}} method call to > \{{reporter.print(Diagnostic.Kind.ERROR, element, fullMessage.toString());}} > was failing, due to the element having some sort of null in it. I am > attaching a patch and a PR. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dweiss opened a new pull request #2337: LUCENE-9747: dodge javadoc reporter NPE bug on Java 11.
dweiss opened a new pull request #2337: URL: https://github.com/apache/lucene-solr/pull/2337 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9747) Missing package-info.java causes NPE in MissingDoclet.java
[ https://issues.apache.org/jira/browse/LUCENE-9747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282029#comment-17282029 ] Dawid Weiss commented on LUCENE-9747: - Filed a slightly smaller PR. Passes for me. > Missing package-info.java causes NPE in MissingDoclet.java > -- > > Key: LUCENE-9747 > URL: https://issues.apache.org/jira/browse/LUCENE-9747 > Project: Lucene - Core > Issue Type: Improvement > Components: general/javadocs >Affects Versions: master (9.0) >Reporter: David Eric Pugh >Priority: Minor > Attachments: LUCENE-9747.patch > > Time Spent: 2h > Remaining Estimate: 0h > > When running {{./gradlew :solr:core:javadoc}} discovered that if a package > directory is missing the {{package-info.java}} file you get a VERY cryptic > error: > > {{javadoc: error - fatal error encountered: java.lang.NullPointerException}} > {{javadoc: error - Please file a bug against the javadoc tool via the Java > bug reporting page}} > > I poked around and that the {{MissingDoclet.java}} method call to > \{{reporter.print(Diagnostic.Kind.ERROR, element, fullMessage.toString());}} > was failing, due to the element having some sort of null in it. I am > attaching a patch and a PR. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9747) Missing package-info.java causes NPE in MissingDoclet.java
[ https://issues.apache.org/jira/browse/LUCENE-9747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282031#comment-17282031 ] David Eric Pugh commented on LUCENE-9747: - LGTM. > Missing package-info.java causes NPE in MissingDoclet.java > -- > > Key: LUCENE-9747 > URL: https://issues.apache.org/jira/browse/LUCENE-9747 > Project: Lucene - Core > Issue Type: Improvement > Components: general/javadocs >Affects Versions: master (9.0) >Reporter: David Eric Pugh >Priority: Minor > Attachments: LUCENE-9747.patch > > Time Spent: 2h > Remaining Estimate: 0h > > When running {{./gradlew :solr:core:javadoc}} discovered that if a package > directory is missing the {{package-info.java}} file you get a VERY cryptic > error: > > {{javadoc: error - fatal error encountered: java.lang.NullPointerException}} > {{javadoc: error - Please file a bug against the javadoc tool via the Java > bug reporting page}} > > I poked around and that the {{MissingDoclet.java}} method call to > \{{reporter.print(Diagnostic.Kind.ERROR, element, fullMessage.toString());}} > was failing, due to the element having some sort of null in it. I am > attaching a patch and a PR. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dweiss merged pull request #2337: LUCENE-9747: dodge javadoc reporter NPE bug on Java 11.
dweiss merged pull request #2337: URL: https://github.com/apache/lucene-solr/pull/2337 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9747) Missing package-info.java causes NPE in MissingDoclet.java
[ https://issues.apache.org/jira/browse/LUCENE-9747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282032#comment-17282032 ] Dawid Weiss commented on LUCENE-9747: - Thanks for reporting! > Missing package-info.java causes NPE in MissingDoclet.java > -- > > Key: LUCENE-9747 > URL: https://issues.apache.org/jira/browse/LUCENE-9747 > Project: Lucene - Core > Issue Type: Improvement > Components: general/javadocs >Affects Versions: master (9.0) >Reporter: David Eric Pugh >Priority: Minor > Attachments: LUCENE-9747.patch > > Time Spent: 2h > Remaining Estimate: 0h > > When running {{./gradlew :solr:core:javadoc}} discovered that if a package > directory is missing the {{package-info.java}} file you get a VERY cryptic > error: > > {{javadoc: error - fatal error encountered: java.lang.NullPointerException}} > {{javadoc: error - Please file a bug against the javadoc tool via the Java > bug reporting page}} > > I poked around and that the {{MissingDoclet.java}} method call to > \{{reporter.print(Diagnostic.Kind.ERROR, element, fullMessage.toString());}} > was failing, due to the element having some sort of null in it. I am > attaching a patch and a PR. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-9747) Missing package-info.java causes NPE in MissingDoclet.java
[ https://issues.apache.org/jira/browse/LUCENE-9747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss resolved LUCENE-9747. - Fix Version/s: master (9.0) Resolution: Fixed > Missing package-info.java causes NPE in MissingDoclet.java > -- > > Key: LUCENE-9747 > URL: https://issues.apache.org/jira/browse/LUCENE-9747 > Project: Lucene - Core > Issue Type: Improvement > Components: general/javadocs >Affects Versions: master (9.0) >Reporter: David Eric Pugh >Assignee: Dawid Weiss >Priority: Minor > Fix For: master (9.0) > > Attachments: LUCENE-9747.patch > > Time Spent: 2h > Remaining Estimate: 0h > > When running {{./gradlew :solr:core:javadoc}} discovered that if a package > directory is missing the {{package-info.java}} file you get a VERY cryptic > error: > > {{javadoc: error - fatal error encountered: java.lang.NullPointerException}} > {{javadoc: error - Please file a bug against the javadoc tool via the Java > bug reporting page}} > > I poked around and that the {{MissingDoclet.java}} method call to > \{{reporter.print(Diagnostic.Kind.ERROR, element, fullMessage.toString());}} > was failing, due to the element having some sort of null in it. I am > attaching a patch and a PR. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Assigned] (LUCENE-9747) Missing package-info.java causes NPE in MissingDoclet.java
[ https://issues.apache.org/jira/browse/LUCENE-9747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss reassigned LUCENE-9747: --- Assignee: Dawid Weiss > Missing package-info.java causes NPE in MissingDoclet.java > -- > > Key: LUCENE-9747 > URL: https://issues.apache.org/jira/browse/LUCENE-9747 > Project: Lucene - Core > Issue Type: Improvement > Components: general/javadocs >Affects Versions: master (9.0) >Reporter: David Eric Pugh >Assignee: Dawid Weiss >Priority: Minor > Attachments: LUCENE-9747.patch > > Time Spent: 2h > Remaining Estimate: 0h > > When running {{./gradlew :solr:core:javadoc}} discovered that if a package > directory is missing the {{package-info.java}} file you get a VERY cryptic > error: > > {{javadoc: error - fatal error encountered: java.lang.NullPointerException}} > {{javadoc: error - Please file a bug against the javadoc tool via the Java > bug reporting page}} > > I poked around and that the {{MissingDoclet.java}} method call to > \{{reporter.print(Diagnostic.Kind.ERROR, element, fullMessage.toString());}} > was failing, due to the element having some sort of null in it. I am > attaching a patch and a PR. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9747) Missing package-info.java causes NPE in MissingDoclet.java
[ https://issues.apache.org/jira/browse/LUCENE-9747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282033#comment-17282033 ] ASF subversion and git services commented on LUCENE-9747: - Commit 1f5b37f299206b0d82d2105a0472b417898fc29f in lucene-solr's branch refs/heads/master from Dawid Weiss [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=1f5b37f ] LUCENE-9747: dodge javadoc reporter NPE bug on Java 11. (#2337) > Missing package-info.java causes NPE in MissingDoclet.java > -- > > Key: LUCENE-9747 > URL: https://issues.apache.org/jira/browse/LUCENE-9747 > Project: Lucene - Core > Issue Type: Improvement > Components: general/javadocs >Affects Versions: master (9.0) >Reporter: David Eric Pugh >Assignee: Dawid Weiss >Priority: Minor > Fix For: master (9.0) > > Attachments: LUCENE-9747.patch > > Time Spent: 2h 10m > Remaining Estimate: 0h > > When running {{./gradlew :solr:core:javadoc}} discovered that if a package > directory is missing the {{package-info.java}} file you get a VERY cryptic > error: > > {{javadoc: error - fatal error encountered: java.lang.NullPointerException}} > {{javadoc: error - Please file a bug against the javadoc tool via the Java > bug reporting page}} > > I poked around and that the {{MissingDoclet.java}} method call to > \{{reporter.print(Diagnostic.Kind.ERROR, element, fullMessage.toString());}} > was failing, due to the element having some sort of null in it. I am > attaching a patch and a PR. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-15138) PerReplicaStates does not scale to large collections as well as state.json
[ https://issues.apache.org/jira/browse/SOLR-15138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282040#comment-17282040 ] Mike Drob commented on SOLR-15138: -- This patch is an improvement over what we had previously, but I don't think it takes care of the situation completely. I was able to create 5 collections in my cluster, but #6 timed out. Interestingly, though, #7 was created just fine. Maybe there's a race condition somewhere, then, if it isn't simply related to the number of existing or outstanding watches as subsequent collections are created. > PerReplicaStates does not scale to large collections as well as state.json > -- > > Key: SOLR-15138 > URL: https://issues.apache.org/jira/browse/SOLR-15138 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 8.8 >Reporter: Mike Drob >Assignee: Noble Paul >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > I was testing PRS collection creation with larger collections today > (previously I had tested with many small collections) and it seemed to be > having trouble keeping up. > > I was running a 4 node instance, each JVM with 4G Heap in k8s, and a single > zookeeper. > > With this cluster configuration, I am able to create several (at least 10) > collections with 11 shards and 11 replicas using the "old way" of keeping > state. These collections are created serially, waiting for all replicas to be > active before proceeding. > However, when attempting to do the same with PRS, the creation stalls on > collection 2 or 3, with several replicas stuck in a "down" state. Further, > when attempting to delete these collections using the regular API it > sometimes takes several attempts after getting stuck a few times as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-15138) PerReplicaStates does not scale to large collections as well as state.json
[ https://issues.apache.org/jira/browse/SOLR-15138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282042#comment-17282042 ] Mike Drob commented on SOLR-15138: -- For the shards that did not come up, I noticed that not all replicas had registered for leader election, and that there was no leader present for those shards. Maybe our creation timeouts need to take leaderVoteWait into account? > PerReplicaStates does not scale to large collections as well as state.json > -- > > Key: SOLR-15138 > URL: https://issues.apache.org/jira/browse/SOLR-15138 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 8.8 >Reporter: Mike Drob >Assignee: Noble Paul >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > I was testing PRS collection creation with larger collections today > (previously I had tested with many small collections) and it seemed to be > having trouble keeping up. > > I was running a 4 node instance, each JVM with 4G Heap in k8s, and a single > zookeeper. > > With this cluster configuration, I am able to create several (at least 10) > collections with 11 shards and 11 replicas using the "old way" of keeping > state. These collections are created serially, waiting for all replicas to be > active before proceeding. > However, when attempting to do the same with PRS, the creation stalls on > collection 2 or 3, with several replicas stuck in a "down" state. Further, > when attempting to delete these collections using the regular API it > sometimes takes several attempts after getting stuck a few times as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] epugh commented on pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib
epugh commented on pull request #2306: URL: https://github.com/apache/lucene-solr/pull/2306#issuecomment-776276696 Okay, I've done a bunch of tweaking (banging my head?) against the ref guide docs, and they are working, and I think all the tests are passing. I don't like that the SolrJ tests depend on the sample_techproducts_configs directory, but I think adding some `startup=lazy` settings for the XSLT classes means that the links in the ref guide will work, and the solrj tests don't blow up on the missing xslt classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14561) Validate parameters to CoreAdminAPI
[ https://issues.apache.org/jira/browse/SOLR-14561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282082#comment-17282082 ] Jan Høydahl commented on SOLR-14561: Did you try allowPaths? https://lucene.apache.org/solr/guide/8_6/solr-upgrade-notes.html > Validate parameters to CoreAdminAPI > --- > > Key: SOLR-14561 > URL: https://issues.apache.org/jira/browse/SOLR-14561 > Project: Solr > Issue Type: Improvement >Reporter: Jan Høydahl >Assignee: Jan Høydahl >Priority: Major > Fix For: 8.6 > > Time Spent: 4h 40m > Remaining Estimate: 0h > > CoreAdminAPI does not validate parameter input. We should limit what users > can specify for at least {{instanceDir and dataDir}} params, perhaps restrict > them to be relative to SOLR_HOME or SOLR_DATA_HOME. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] thelabdude commented on pull request #2328: SOLR-15145: System property to control whether base_url is stored in state.json to enable back-compat with older SolrJ versions
thelabdude commented on pull request #2328: URL: https://github.com/apache/lucene-solr/pull/2328#issuecomment-776287188 I ran a back-compat test with a client app built with SolrJ 8.7.0 and a server from this branch and it works as expected. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-15148) Include a backcompat utility class with a main in solrj that we can use to test older SolrJ against a release candidate
Timothy Potter created SOLR-15148: - Summary: Include a backcompat utility class with a main in solrj that we can use to test older SolrJ against a release candidate Key: SOLR-15148 URL: https://issues.apache.org/jira/browse/SOLR-15148 Project: Solr Issue Type: New Feature Security Level: Public (Default Security Level. Issues are Public) Components: SolrJ Reporter: Timothy Potter Changes to SolrJ in 8.8.0 (SOLR-12182) broke backcompat (the fix is SOLR-15145); this should have been caught during RC smoke testing. A simple utility class that we can run during RC smoke testing to catch back-compat breaks like this would be useful. To keep things simple, the smoke tester can download the previous version of SolrJ from Maven central and invoke this Backcompat app (embedded in the SolrJ JAR) against the new Solr server in the RC. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
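A hypothetical shape for such a utility, sketched with the existing SolrJ API (every name here is illustrative; nothing like this class exists in SolrJ yet, and the target collection is assumed to already exist):

{code:java}
import java.util.Collections;
import java.util.Optional;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BackcompatCheck {
  public static void main(String[] args) throws Exception {
    String zkHost = args.length > 0 ? args[0] : "localhost:9983";
    String collection = args.length > 1 ? args[1] : "backcompat"; // assumed to exist
    try (CloudSolrClient client =
        new CloudSolrClient.Builder(Collections.singletonList(zkHost), Optional.empty()).build()) {
      // Index a document, commit, and query it back through the old client.
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "backcompat-1");
      client.add(collection, doc);
      client.commit(collection);
      long found = client.query(collection, new SolrQuery("id:backcompat-1"))
          .getResults().getNumFound();
      System.out.println(found == 1 ? "backcompat OK" : "backcompat FAIL");
    }
  }
}
{code}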
[jira] [Commented] (SOLR-15011) /admin/logging handler should be able to configure logs on all nodes
[ https://issues.apache.org/jira/browse/SOLR-15011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282092#comment-17282092 ] Chris M. Hostetter commented on SOLR-15011: --- BadApple'ing the test just prevents it from being a noisy failure on jenkins ... BadApple tests are still run on developer boxes by default, so this test is still causing lots of failures. {quote}I tried something else that occurred to me... I merely commented out the substance of the issue (LoggingHandler calling into AdminHandlersProxy) and... the test still passed. I'm not surprised; this is an embedded test and thus all nodes share the same logging state. Hmm. I wonder if we can't realistically test this until we have Docker based test infrastructure with fully isolated Solr nodes. {quote} If that's the case then I would suggest the test just be deleted – or explicitly marked @AwaitsFix – but if it's going to stick around in a disabled state it should probably point at a new Jira to track if/how/when we might be able to adequately test it, so that the current Jira can be re-resolved and correctly track when this functionality was added. > /admin/logging handler should be able to configure logs on all nodes > > > Key: SOLR-15011 > URL: https://issues.apache.org/jira/browse/SOLR-15011 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: logging >Reporter: David Smiley >Assignee: David Smiley >Priority: Major > Fix For: master (9.0) > > Time Spent: 3.5h > Remaining Estimate: 0h > > The LoggingHandler registered at /admin/logging can configure log levels for > the current node. This is nice but in SolrCloud, what's needed is an ability > to change the level for _all_ nodes in the cluster. I propose that this be a > parameter name "distrib" defaulting to SolrCloud mode's status. An admin UI > could have a checkbox for it. I don't propose that the read operations be > changed -- they can continue to just look at the node you are hitting. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-9756) Extend FieldInfosFormat tests to cover points and vectors
Julie Tibshirani created LUCENE-9756: Summary: Extend FieldInfosFormat tests to cover points and vectors Key: LUCENE-9756 URL: https://issues.apache.org/jira/browse/LUCENE-9756 Project: Lucene - Core Issue Type: Test Reporter: Julie Tibshirani Currently {{BaseFieldInfoFormatTestCase}} doesn't exercise points, vectors, or the soft deletes field. We should make sure the test covers these options. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
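As a rough illustration of the gap (standalone code, not the test case itself), a point field can be round-tripped through the default codec like this; the field name and dimensions are arbitrary, and some accessor names vary slightly across Lucene versions:

{code}
// Illustrative only: index a point field and read it back, the kind of
// codec round trip the extended test should assert on.
import org.apache.lucene.document.Document;
import org.apache.lucene.document.IntPoint;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.PointValues;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class PointRoundTrip {
  public static void main(String[] args) throws Exception {
    try (Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter w = new IndexWriter(dir, new IndexWriterConfig())) {
        Document doc = new Document();
        doc.add(new IntPoint("pt", 1, 2)); // a two-dimensional int point
        w.addDocument(doc);
      }
      try (DirectoryReader r = DirectoryReader.open(dir)) {
        PointValues pv = r.leaves().get(0).reader().getPointValues("pt");
        System.out.println("points=" + pv.size()
            + " bytesPerDim=" + pv.getBytesPerDimension());
      }
    }
  }
}
{code}

The extended test case would do the analogous round trip for points, vectors, and the soft deletes field through each FieldInfosFormat under test, asserting that the reopened FieldInfo metadata matches what was written.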
[GitHub] [lucene-solr] jtibshirani commented on pull request #2269: LUCENE-9322: Add TestLucene90FieldInfosFormat
jtibshirani commented on pull request #2269: URL: https://github.com/apache/lucene-solr/pull/2269#issuecomment-776361691 I opened https://issues.apache.org/jira/browse/LUCENE-9756 to add tests for points and vectors. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jtibshirani opened a new pull request #2338: LUCENE-9756: Extend FieldInfosFormat tests to cover points and vectors
jtibshirani opened a new pull request #2338: URL: https://github.com/apache/lucene-solr/pull/2338 This commit adds coverage to `BaseFieldInfoFormatTestCase` for points, vectors, and the soft deletes field. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jtibshirani opened a new pull request #2339: LUCENE-9705: Reset internal version in Lucene90FieldInfosFormat.
jtibshirani opened a new pull request #2339: URL: https://github.com/apache/lucene-solr/pull/2339 Since this is a fresh format, we can remove older version logic and reset the internal version to 0. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
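Sketched in isolation, the version-reset pattern looks like this; the constant names are assumed for illustration, and the real `Lucene90FieldInfosFormat` fields may differ:

```java
// Hypothetical sketch (names assumed) of resetting a codec format's internal
// versioning: a brand-new format has no older on-disk versions to decode, so
// its numbering restarts at zero and version-dependent reader branches can
// simply be deleted.
final class FreshFormatVersions {
  static final int VERSION_START = 0;               // first version of the new format
  static final int VERSION_CURRENT = VERSION_START; // nothing older to support

  private FreshFormatVersions() {}
}
```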
[jira] [Commented] (LUCENE-9747) Missing package-info.java causes NPE in MissingDoclet.java
[ https://issues.apache.org/jira/browse/LUCENE-9747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282163#comment-17282163 ] Robert Muir commented on LUCENE-9747: - whoah, nice work [~epugh] [~dweiss]. thanks for debugging through it! > Missing package-info.java causes NPE in MissingDoclet.java > -- > > Key: LUCENE-9747 > URL: https://issues.apache.org/jira/browse/LUCENE-9747 > Project: Lucene - Core > Issue Type: Improvement > Components: general/javadocs >Affects Versions: master (9.0) >Reporter: David Eric Pugh >Assignee: Dawid Weiss >Priority: Minor > Fix For: master (9.0) > > Attachments: LUCENE-9747.patch > > Time Spent: 2h 10m > Remaining Estimate: 0h > > When running {{./gradlew :solr:core:javadoc}} discovered that if a package > directory is missing the {{package-info.java}} file you get a VERY cryptic > error: > > {{javadoc: error - fatal error encountered: java.lang.NullPointerException}} > {{javadoc: error - Please file a bug against the javadoc tool via the Java > bug reporting page}} > > I poked around and found that the {{MissingDoclet.java}} method call to > \{{reporter.print(Diagnostic.Kind.ERROR, element, fullMessage.toString());}} > was failing, due to the element having some sort of null in it. I am > attaching a patch and a PR. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
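For readers following along, a hedged sketch of the defensive-reporting idea described above (not necessarily the committed patch): when the doclet has no usable {{Element}} to attach the diagnostic to, it can fall back to the {{Reporter}} overload that takes only a message, avoiding the NPE inside javadoc:

{code}
// Hypothetical defensive variant; the real MissingDoclet fix may differ.
import javax.lang.model.element.Element;
import javax.tools.Diagnostic;
import jdk.javadoc.doclet.Reporter;

final class SafeErrors {
  private final Reporter reporter;

  SafeErrors(Reporter reporter) {
    this.reporter = reporter;
  }

  // Report an error, tolerating a missing Element instead of letting the
  // javadoc tool crash with an opaque NullPointerException.
  void error(Element element, String message) {
    if (element == null) {
      reporter.print(Diagnostic.Kind.ERROR, message);
    } else {
      reporter.print(Diagnostic.Kind.ERROR, element, message);
    }
  }
}
{code}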