[GitHub] [lucene] zacharymorn commented on pull request #240: LUCENE-10002: Deprecate IndexSearcher#search(Query, Collector) in favor of IndexSearcher#search(Query, CollectorManager)

2021-09-10 Thread GitBox


zacharymorn commented on pull request #240:
URL: https://github.com/apache/lucene/pull/240#issuecomment-916729951


   Hi @jpountz @gsmiller, just wanted to check back on this PR: do you have 
any further feedback?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-10094) QueryCache wrapper Weight doesn't delegate its count() method

2021-09-10 Thread Alan Woodward (Jira)
Alan Woodward created LUCENE-10094:
--

 Summary: QueryCache wrapper Weight doesn't delegate its count() 
method
 Key: LUCENE-10094
 URL: https://issues.apache.org/jira/browse/LUCENE-10094
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: main (9.0)
Reporter: Alan Woodward
Assignee: Alan Woodward


This means that cached queries will always use the slow path to calculate 
search counts.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] romseygeek opened a new pull request #289: LUCENE-10094: Delegate count() from CachingWrapperWeight

2021-09-10 Thread GitBox


romseygeek opened a new pull request #289:
URL: https://github.com/apache/lucene/pull/289


   CachingWrapperWeight always returns -1 from its `count()` method, which
   disables the fast path for TermQuery, MatchAllDocsQuery, etc., when running
   `IndexSearcher.count(Query)`. This commit makes it delegate the method
   to its wrapped Weight.
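To make the fast-path/slow-path interplay concrete, here is a minimal stdlib-only sketch. `CountingWeight`, `CachingWrapper`, and `countHits` are hypothetical stand-ins, not Lucene's `Weight`/`CachingWrapperWeight` API, and the real `count()` is computed per segment rather than once; the only convention carried over is that a count of -1 means "cannot be computed cheaply", which forces the caller to iterate over matches.

```java
import java.util.function.IntSupplier;

// Hypothetical stand-in for a Weight that may know its hit count cheaply.
interface CountingWeight {
  /** Returns the exact hit count, or -1 if it cannot be computed cheaply. */
  int count();
}

final class CountDelegationSketch {
  /** Wrapper that adds behaviour (e.g. caching) but forwards count() to the wrapped weight. */
  static final class CachingWrapper implements CountingWeight {
    private final CountingWeight in;

    CachingWrapper(CountingWeight in) {
      this.in = in;
    }

    @Override
    public int count() {
      // Returning -1 here unconditionally (the pre-fix behaviour) would hide the
      // wrapped weight's cheap count and force the slow path below.
      return in.count();
    }
  }

  /** Tries the fast path first and only falls back to iterating hits when it returns -1. */
  static int countHits(CountingWeight weight, IntSupplier slowPath) {
    int fast = weight.count();
    return fast != -1 ? fast : slowPath.getAsInt();
  }
}
```

With delegation in place, a wrapped weight whose inner `count()` is known never invokes the slow path.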





[GitHub] [lucene] romseygeek merged pull request #289: LUCENE-10094: Delegate count() from CachingWrapperWeight

2021-09-10 Thread GitBox


romseygeek merged pull request #289:
URL: https://github.com/apache/lucene/pull/289


   





[jira] [Commented] (LUCENE-10094) QueryCache wrapper Weight doesn't delegate its count() method

2021-09-10 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17413075#comment-17413075
 ] 

ASF subversion and git services commented on LUCENE-10094:
--

Commit 1bb52859c88193e1e1674d6c360d772c20d1c2ea in lucene's branch 
refs/heads/main from Alan Woodward
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=1bb5285 ]

LUCENE-10094: Delegate count() from CachingWrapperWeight (#289)

CachingWrapperWeight always returns -1 from its count() method, which
disables the fast path for TermQuery, MatchAllDocsQuery, etc., when running
IndexSearcher.count(Query). This commit makes it delegate the method
to its wrapped Weight.

> QueryCache wrapper Weight doesn't delegate its count() method
> -
>
> Key: LUCENE-10094
> URL: https://issues.apache.org/jira/browse/LUCENE-10094
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: main (9.0)
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This means that cached queries will always use the slow path to calculate 
> search counts.






[jira] [Resolved] (LUCENE-10094) QueryCache wrapper Weight doesn't delegate its count() method

2021-09-10 Thread Alan Woodward (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward resolved LUCENE-10094.

Resolution: Fixed

> QueryCache wrapper Weight doesn't delegate its count() method
> -
>
> Key: LUCENE-10094
> URL: https://issues.apache.org/jira/browse/LUCENE-10094
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: main (9.0)
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This means that cached queries will always use the slow path to calculate 
> search counts.






[jira] [Commented] (LUCENE-10088) Too many open files in TestIndexWriterMergePolicy.testStressUpdateSameDocumentWithMergeOnGetReader

2021-09-10 Thread Simon Willnauer (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17413103#comment-17413103
 ] 

Simon Willnauer commented on LUCENE-10088:
--

Oh, don't get me wrong, Adrien: we do that; we have that protection in place. 
If flushing uses 2x the configured RAM buffer, we stall indexing. The RAM 
buffer is not an upper bound on what the IndexWriter will use but rather a 
marker that triggers a flush. Still, we use a function of it to make sure we 
don't go OOM when flushes are slow.
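The policy described above (the RAM buffer triggers a flush, while a multiple of it triggers a stall) can be modelled with a few counters. This is a simplified, hypothetical sketch: `FlushStallSketch` and its methods are illustrative names, not Lucene's `DocumentsWriterFlushControl`, and the real stall condition takes more state into account.

```java
// Simplified model of "flush at 1x the RAM buffer, stall when pending flushes exceed 2x".
// Names are hypothetical; this is not Lucene's flush-control code.
final class FlushStallSketch {
  private final long ramBufferBytes;
  private long activeBytes;   // bytes buffered by in-progress indexing
  private long flushingBytes; // bytes held by segments that are still being flushed

  FlushStallSketch(long ramBufferBytes) {
    this.ramBufferBytes = ramBufferBytes;
  }

  /** Records newly buffered bytes; returns true when a flush should be triggered. */
  boolean addBytes(long bytes) {
    activeBytes += bytes;
    // The buffer size is a flush trigger, not a hard cap on memory use.
    return activeBytes >= ramBufferBytes;
  }

  /** A flush starts: the buffered bytes move into the flushing pool. */
  void startFlush() {
    flushingBytes += activeBytes;
    activeBytes = 0;
  }

  /** A flush finished and released its memory. */
  void finishFlush(long bytes) {
    flushingBytes -= bytes;
  }

  /** Indexing threads stall while pending flushes hold more than 2x the RAM buffer. */
  boolean shouldStall() {
    return flushingBytes > 2 * ramBufferBytes;
  }
}
```

Slow flushes let `flushingBytes` pile up, and the stall check is what bounds memory in that case.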



> Too many open files in 
> TestIndexWriterMergePolicy.testStressUpdateSameDocumentWithMergeOnGetReader
> --
>
> Key: LUCENE-10088
> URL: https://issues.apache.org/jira/browse/LUCENE-10088
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Priority: Major
>
> [This build 
> failure|https://ci-builds.apache.org/job/Lucene/job/Lucene-NightlyTests-main/386/]
>  reproduces for me.  I'll try to dig.






[GitHub] [lucene-solr] thelabdude merged pull request #2570: SOLR-15621: index.html for Admin UI should send Solr version in the request for JavaScript files

2021-09-10 Thread GitBox


thelabdude merged pull request #2570:
URL: https://github.com/apache/lucene-solr/pull/2570


   





[GitHub] [lucene-solr] thelabdude opened a new pull request #2571: SOLR-15621: index.html for Admin UI should send Solr version in the request for JavaScript files

2021-09-10 Thread GitBox


thelabdude opened a new pull request #2571:
URL: https://github.com/apache/lucene-solr/pull/2571


   Backport of #2570.





[GitHub] [lucene-solr] thelabdude merged pull request #2571: SOLR-15621: index.html for Admin UI should send Solr version in the request for JavaScript files

2021-09-10 Thread GitBox


thelabdude merged pull request #2571:
URL: https://github.com/apache/lucene-solr/pull/2571


   





[jira] [Created] (LUCENE-10095) Nepali Analyzer

2021-09-10 Thread Robert Muir (Jira)
Robert Muir created LUCENE-10095:


 Summary: Nepali Analyzer
 Key: LUCENE-10095
 URL: https://issues.apache.org/jira/browse/LUCENE-10095
 Project: Lucene - Core
  Issue Type: Task
Reporter: Robert Muir


The snowball2 upgrade in our main branch added a Nepali Stemmer.

Let's "shrink-wrap" this into an Analyzer: add stopwords, normalization, tests, 
etc.
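"Shrink-wrapping" a stemmer means putting it at the end of a fixed analysis chain. In Lucene this is a chain of token filters (e.g. `StopFilter`, then `SnowballFilter`); the sketch below does not use Lucene's API. It is a stdlib-only illustration of the same order of steps (normalize, tokenize, lowercase, drop stopwords, stem), with a placeholder stemmer passed in as a function and a naive whitespace tokenizer, both assumptions for illustration.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

// Illustrative analysis chain; not Lucene's Analyzer/TokenFilter API.
final class AnalysisChainSketch {
  private final Set<String> stopwords;
  private final UnaryOperator<String> stemmer; // placeholder for a real (e.g. Snowball) stemmer

  AnalysisChainSketch(Set<String> stopwords, UnaryOperator<String> stemmer) {
    this.stopwords = stopwords;
    this.stemmer = stemmer;
  }

  List<String> analyze(String text) {
    List<String> out = new ArrayList<>();
    // Normalization: compose Unicode sequences so equivalent spellings compare equal.
    String normalized = Normalizer.normalize(text, Normalizer.Form.NFC);
    // Tokenization: naive whitespace split (a real tokenizer is script-aware).
    for (String token : normalized.split("\\s+")) {
      if (token.isEmpty()) {
        continue; // leading whitespace yields an empty first element
      }
      String lower = token.toLowerCase(Locale.ROOT);
      if (stopwords.contains(lower)) {
        continue; // stop filter
      }
      out.add(stemmer.apply(lower)); // stemming runs last, on surviving tokens
    }
    return out;
  }
}
```

Stopwords are removed before stemming, so the stopword list only needs surface forms.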






[jira] [Updated] (LUCENE-10095) Nepali Analyzer

2021-09-10 Thread Robert Muir (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-10095:
-
Fix Version/s: main (9.0)

> Nepali Analyzer
> ---
>
> Key: LUCENE-10095
> URL: https://issues.apache.org/jira/browse/LUCENE-10095
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Major
> Fix For: main (9.0)
>
>
> The snowball2 upgrade in our main branch added a Nepali Stemmer.
> Let's "shrink-wrap" this into an Analyzer: add stopwords, normalization, 
> tests, etc.






[GitHub] [lucene] rmuir opened a new pull request #290: LUCENE-10095: Nepali Analyzer

2021-09-10 Thread GitBox


rmuir opened a new pull request #290:
URL: https://github.com/apache/lucene/pull/290


   Add Nepali analyzer based on snowball stemmer and NLTK stopwords.
   





[jira] [Commented] (LUCENE-10094) QueryCache wrapper Weight doesn't delegate its count() method

2021-09-10 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17413254#comment-17413254
 ] 

ASF subversion and git services commented on LUCENE-10094:
--

Commit cc8c4283dd0b299d5b6a64ced0fb1b9acbc3bb30 in lucene's branch 
refs/heads/main from Alan Woodward
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=cc8c428 ]

LUCENE-10094: Fix test bug


> QueryCache wrapper Weight doesn't delegate its count() method
> -
>
> Key: LUCENE-10094
> URL: https://issues.apache.org/jira/browse/LUCENE-10094
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: main (9.0)
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This means that cached queries will always use the slow path to calculate 
> search counts.






[GitHub] [lucene] jimczi opened a new pull request #291: LUCENE-10089: Disable numeric sort optimization early

2021-09-10 Thread GitBox


jimczi opened a new pull request #291:
URL: https://github.com/apache/lucene/pull/291


   This commit moves the responsibility for disabling
   the numeric sort optimization on comparators to the SortField.
   This way we don't need to apply the logic in every top-field collector.
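The design change is about where a decision lives. In this hypothetical sketch (none of these classes are Lucene's), the sort-field object owns the "may this comparator skip non-competitive documents?" flag and bakes it into the comparator it creates, so any collector simply asks for a comparator and carries no optimization logic of its own.

```java
// Hypothetical illustration of "the sort field decides, collectors just consume".
final class SortOptimizationSketch {
  static final class NumericComparatorSketch {
    final boolean canSkipNonCompetitiveDocs;

    NumericComparatorSketch(boolean canSkipNonCompetitiveDocs) {
      this.canSkipNonCompetitiveDocs = canSkipNonCompetitiveDocs;
    }
  }

  static final class SortFieldSketch {
    private boolean optimizationEnabled = true;

    /** Called once where the sort is defined, instead of in each collector. */
    void disableOptimization() {
      optimizationEnabled = false;
    }

    NumericComparatorSketch getComparator() {
      return new NumericComparatorSketch(optimizationEnabled);
    }
  }

  /** Any collector just asks the sort field for a comparator. */
  static NumericComparatorSketch collectorSetup(SortFieldSketch field) {
    return field.getComparator();
  }
}
```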





[jira] [Created] (LUCENE-10096) Tamil Analyzer

2021-09-10 Thread Robert Muir (Jira)
Robert Muir created LUCENE-10096:


 Summary: Tamil Analyzer
 Key: LUCENE-10096
 URL: https://issues.apache.org/jira/browse/LUCENE-10096
 Project: Lucene - Core
  Issue Type: Task
Reporter: Robert Muir


Similar to LUCENE-10095, let's "shrink-wrap" the new snowball stemmer into a 
proper Analyzer.






[jira] [Created] (LUCENE-10097) Replace TreeMap use by HashMap when unnecessary

2021-09-10 Thread Bruno Roustant (Jira)
Bruno Roustant created LUCENE-10097:
---

 Summary: Replace TreeMap use by HashMap when unnecessary
 Key: LUCENE-10097
 URL: https://issues.apache.org/jira/browse/LUCENE-10097
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Bruno Roustant
Assignee: Bruno Roustant


There are a couple of places where TreeMap is used although it could easily be 
replaced by a HashMap, potentially followed by a single sort. Sometimes this 
would bring a performance improvement (e.g. when TreeMap.entrySet() is called); 
other times it is more for consistency, using a simpler HashMap when there is 
no strong need for a TreeMap.

I saw other places where we have TODOs about whether we can replace the 
TreeMap; where that change is more complex, I'd prefer to open separate Jira 
issues.
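The "HashMap plus a single sort" pattern can be sketched as follows. This is a generic, hypothetical example (not code from Lucene): a TreeMap keeps its keys ordered on every insert (O(log n) each), whereas a HashMap gives expected O(1) updates and the ordering cost is paid once, only when sorted output is actually needed. Both methods produce the same ordered result.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SortOnceSketch {
  /** TreeMap: ordering is maintained on every insert. */
  static List<String> withTreeMap(List<String> fields) {
    Map<String, Integer> counts = new TreeMap<>();
    for (String f : fields) {
      counts.merge(f, 1, Integer::sum);
    }
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
      out.add(e.getKey() + "=" + e.getValue());
    }
    return out;
  }

  /** HashMap for cheap updates, then one sort of the keys when ordered output is needed. */
  static List<String> withHashMapAndSingleSort(List<String> fields) {
    Map<String, Integer> counts = new HashMap<>();
    for (String f : fields) {
      counts.merge(f, 1, Integer::sum);
    }
    List<String> keys = new ArrayList<>(counts.keySet());
    Collections.sort(keys);
    List<String> out = new ArrayList<>(keys.size());
    for (String k : keys) {
      out.add(k + "=" + counts.get(k));
    }
    return out;
  }
}
```

Where a map is iterated in key order many times, or is kept small on purpose, a TreeMap may still be the better choice.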






[GitHub] [lucene] rmuir opened a new pull request #292: LUCENE-10096: Tamil Analyzer

2021-09-10 Thread GitBox


rmuir opened a new pull request #292:
URL: https://github.com/apache/lucene/pull/292


   Add Tamil analyzer based on snowball stemmer and TamilNLP stopwords
   





[GitHub] [lucene] rmuir commented on pull request #292: LUCENE-10096: Tamil Analyzer

2021-09-10 Thread GitBox


rmuir commented on pull request #292:
URL: https://github.com/apache/lucene/pull/292#issuecomment-917034582


   > Thanks @rmuir! The change looks great to me, but I neither speak nor 
read Tamil :)
   > 
   > Were the TamilNLP stop words reasonably licensed?
   
   Yes, Apache 2.0. See https://github.com/AshokR/TamilNLP





[jira] [Commented] (LUCENE-10097) Replace TreeMap use by HashMap when unnecessary

2021-09-10 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17413275#comment-17413275
 ] 

Robert Muir commented on LUCENE-10097:
--

Note: apart from ordering, in some cases TreeMap is used intentionally to save 
memory (e.g. per-segment maps that might have lots of fields come to mind). 
But IMO, in such cases we should add a comment explaining why TreeMap is being 
used.

> Replace TreeMap use by HashMap when unnecessary
> ---
>
> Key: LUCENE-10097
> URL: https://issues.apache.org/jira/browse/LUCENE-10097
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Bruno Roustant
>Assignee: Bruno Roustant
>Priority: Major
>
> There are a couple of places where TreeMap is used although it could easily 
> be replaced by a HashMap, potentially followed by a single sort. Sometimes this 
> would bring a performance improvement (e.g. when TreeMap.entrySet() is 
> called); other times it is more for consistency, using a simpler HashMap when 
> there is no strong need for a TreeMap.
> I saw other places where we have TODOs about whether we can replace the 
> TreeMap; where that change is more complex, I'd prefer to open separate Jira 
> issues.






[GitHub] [lucene-solr] mikemccand commented on pull request #2567: LUCENE-9662: CheckIndex should be concurrent (backporting)

2021-09-10 Thread GitBox


mikemccand commented on pull request #2567:
URL: https://github.com/apache/lucene-solr/pull/2567#issuecomment-917040356


   > Hi @mikemccand, I've added the usage instruction commit to this PR and 
merged. Thanks again for your review and approval!
   
   Woot!  Thank you @zacharymorn!  What an exciting improvement :)  Gives us a 
reason to make an eventual 8.11!





[GitHub] [lucene] mikemccand commented on a change in pull request #288: LUCENE 10080 Added FixedBitSet for one counts when counting taxonomy facet labels

2021-09-10 Thread GitBox


mikemccand commented on a change in pull request #288:
URL: https://github.com/apache/lucene/pull/288#discussion_r706319030



##
File path: 
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/IntTaxonomyFacets.java
##
@@ -253,4 +257,78 @@ public FacetResult getTopChildren(int topN, String dim, 
String... path) throws I
 
 return new FacetResult(dim, path, totValue, labelValues, childCount);
   }
+
+  /**
+   * Class that uses FixedBitSet to store counts for all ordinals with 1 count 
and IntIntHashMap for
+   * all other counts
+   */
+  private static class IntIntHashMapWithFixedBitSet implements 
Iterable {
+// if the key exists, fixedBitSet[key] will be true, if fixedBitSet[key] 
is true but the key in
+// intIntHashMap
+// does not exist, then the value is 1
+private final FixedBitSet fixedBitSet;
+private final IntIntHashMap intIntHashMap;
+
+IntIntHashMapWithFixedBitSet(int numCategories) {
+  fixedBitSet = new FixedBitSet(numCategories);
+  intIntHashMap = new IntIntHashMap();
+}
+
+public int addTo(int key, int incrementValue) {
+  if (!fixedBitSet.getAndSet(key) && incrementValue == 1) {
+return 1;
+  }
+  int currentValue = intIntHashMap.addTo(key, incrementValue);
+  if (currentValue == 1) {
+intIntHashMap.remove(key);

Review comment:
   I'm pretty sure it must always be `> 0` -- maybe add an `assert`?
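The idea in the diff (store counts of exactly 1 as a single bit and keep only larger counts in a hash map) can be sketched with stdlib types. This is an analog, not the PR's code: `java.util.BitSet` stands in for Lucene's `FixedBitSet`, `HashMap<Integer, Integer>` stands in for HPPC's `IntIntHashMap`, and `OneCountBitSetMap` is a hypothetical name. The `assert` is placed on the increment, which is our reading of the review comment; the excerpt does not show which value the reviewer meant.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Stdlib analog of a "bit set for count == 1, map for count > 1" counter.
final class OneCountBitSetMap {
  private final BitSet seen; // bit set => count >= 1
  private final Map<Integer, Integer> counts = new HashMap<>(); // only counts > 1

  OneCountBitSetMap(int numCategories) {
    seen = new BitSet(numCategories);
  }

  int get(int key) {
    Integer c = counts.get(key);
    if (c != null) {
      return c;
    }
    return seen.get(key) ? 1 : 0; // a set bit without a map entry means exactly 1
  }

  /** Adds incrementValue to key's count and returns the new count. */
  int addTo(int key, int incrementValue) {
    assert incrementValue > 0 : "facet counts only grow";
    int newValue = get(key) + incrementValue;
    seen.set(key);
    if (newValue > 1) {
      counts.put(key, newValue); // counts only grow, so a map entry never drops back to 1
    }
    return newValue;
  }
}
```

Because most ordinals in sparse facet counting are hit once, most keys never touch the map.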







[GitHub] [lucene] mdmarshmallow commented on a change in pull request #288: LUCENE 10080 Added FixedBitSet for one counts when counting taxonomy facet labels

2021-09-10 Thread GitBox


mdmarshmallow commented on a change in pull request #288:
URL: https://github.com/apache/lucene/pull/288#discussion_r706366414



##
File path: 
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/IntTaxonomyFacets.java
##
@@ -253,4 +257,78 @@ public FacetResult getTopChildren(int topN, String dim, 
String... path) throws I
 
 return new FacetResult(dim, path, totValue, labelValues, childCount);
   }
+
+  /**
+   * Class that uses FixedBitSet to store counts for all ordinals with 1 count 
and IntIntHashMap for
+   * all other counts
+   */
+  private static class IntIntHashMapWithFixedBitSet implements 
Iterable {
+// if the key exists, fixedBitSet[key] will be true, if fixedBitSet[key] 
is true but the key in
+// intIntHashMap
+// does not exist, then the value is 1
+private final FixedBitSet fixedBitSet;
+private final IntIntHashMap intIntHashMap;
+
+IntIntHashMapWithFixedBitSet(int numCategories) {
+  fixedBitSet = new FixedBitSet(numCategories);
+  intIntHashMap = new IntIntHashMap();
+}
+
+public int addTo(int key, int incrementValue) {
+  if (!fixedBitSet.getAndSet(key) && incrementValue == 1) {
+return 1;
+  }
+  int currentValue = intIntHashMap.addTo(key, incrementValue);
+  if (currentValue == 1) {
+intIntHashMap.remove(key);

Review comment:
   I pushed a new commit that includes that.







[GitHub] [lucene] goankur opened a new pull request #293: Lucene-10070: Skip deleted documents during facet counting for all do…

2021-09-10 Thread GitBox


goankur opened a new pull request #293:
URL: https://github.com/apache/lucene/pull/293


   
   
   
   # Description
   
   The following `Facets` implementations are changed to ignore deleted 
documents for `count all` queries that don't use a `FacetsCollector` instance:
     - SortedSetDocValuesFacetCounts
     - ConcurrentSortedSetDocValuesFacetCounts
     - LongValueFacetCounts
     - StringValueFacetCounts 
   
   # Solution
   
   Ignore deleted documents during docValues iteration to calculate facet label 
counts
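   A minimal sketch of the solution, assuming a single-valued field for simplicity (the real implementations iterate doc values per segment and may see several ordinals per document). `CountAllSketch` is hypothetical and uses `java.util.BitSet` in place of Lucene's `Bits`; it follows the convention of `LeafReader.getLiveDocs()`, which returns `null` when a segment has no deletions.

```java
import java.util.BitSet;

// Hypothetical "count all" loop that skips deleted documents.
final class CountAllSketch {
  /**
   * @param docOrds  docOrds[doc] is the ordinal of the doc's facet value, or -1 if it has none
   * @param liveDocs live-document bits, or null when no document is deleted
   * @param numOrds  number of distinct ordinals
   */
  static int[] countAll(int[] docOrds, BitSet liveDocs, int numOrds) {
    int[] counts = new int[numOrds];
    for (int doc = 0; doc < docOrds.length; doc++) {
      if (liveDocs != null && !liveDocs.get(doc)) {
        continue; // deleted document: must not contribute to facet counts
      }
      int ord = docOrds[doc];
      if (ord >= 0) {
        counts[ord]++;
      }
    }
    return counts;
  }
}
```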
   
   # Tests
   
   The following test methods were copied from pull request
https://github.com/apache/lucene/pull/263/files (thanks, Greg Miller):
     - TestLongValueFacetCounts.testCountAll() 
     - TestStringValueFacetCount.testCountAll()
     - TestSortedSetDocValuesFacets.testCountAll()
 
   # Checklist
   
   Please review the following and check all that apply:
   
   - [X] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/lucene/HowToContribute) and my code 
conforms to the standards described there to the best of my ability.
   - [X] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [X] I have given Lucene maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [X] I have developed this patch against the `main` branch.
   - [X] I have run `./gradlew check`.
   - [X] I have added tests for my changes.
   





[GitHub] [lucene] rmuir merged pull request #290: LUCENE-10095: Nepali Analyzer

2021-09-10 Thread GitBox


rmuir merged pull request #290:
URL: https://github.com/apache/lucene/pull/290


   





[jira] [Commented] (LUCENE-10095) Nepali Analyzer

2021-09-10 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17413447#comment-17413447
 ] 

ASF subversion and git services commented on LUCENE-10095:
--

Commit 8bce7652188a2ab6c167bafc7bfb4ff59bce93ec in lucene's branch 
refs/heads/main from Robert Muir
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=8bce765 ]

LUCENE-10095: Nepali Analyzer (#290)

Add Nepali analyzer based on snowball stemmer and NLTK stopwords

> Nepali Analyzer
> ---
>
> Key: LUCENE-10095
> URL: https://issues.apache.org/jira/browse/LUCENE-10095
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Major
> Fix For: main (9.0)
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The snowball2 upgrade in our main branch added a Nepali Stemmer.
> Let's "shrink-wrap" this into an Analyzer: add stopwords, normalization, 
> tests, etc.






[jira] [Resolved] (LUCENE-10095) Nepali Analyzer

2021-09-10 Thread Robert Muir (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-10095.
--
Resolution: Fixed

> Nepali Analyzer
> ---
>
> Key: LUCENE-10095
> URL: https://issues.apache.org/jira/browse/LUCENE-10095
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Major
> Fix For: main (9.0)
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The snowball2 upgrade in our main branch added a Nepali Stemmer.
> Let's "shrink-wrap" this into an Analyzer: add stopwords, normalization, 
> tests, etc.






[GitHub] [lucene] goankur commented on a change in pull request #282: Lucene-10070

2021-09-10 Thread GitBox


goankur commented on a change in pull request #282:
URL: https://github.com/apache/lucene/pull/282#discussion_r706532667



##
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/ConcurrentSortedSetDocValuesFacetCounts.java
##
@@ -159,13 +160,15 @@ private FacetResult getDim(String dim, OrdRange ordRange, 
int topN) throws IOExc
 final MatchingDocs hits;
 final OrdinalMap ordinalMap;
 final int segOrd;
+final Bits liveDocs;
 
 public CountOneSegment(
 LeafReader leafReader, MatchingDocs hits, OrdinalMap ordinalMap, int 
segOrd) {
   this.leafReader = leafReader;
   this.hits = hits;
   this.ordinalMap = ordinalMap;
   this.segOrd = segOrd;
+  this.liveDocs = (leafReader != null) ? leafReader.getLiveDocs() : null;

Review comment:
   Yes, that makes sense. I made the change in a new revision, captured in a 
different PR:
   https://github.com/apache/lucene/pull/293
   







[GitHub] [lucene] goankur commented on a change in pull request #282: Lucene-10070

2021-09-10 Thread GitBox


goankur commented on a change in pull request #282:
URL: https://github.com/apache/lucene/pull/282#discussion_r706532667



##
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/ConcurrentSortedSetDocValuesFacetCounts.java
##
@@ -159,13 +160,15 @@ private FacetResult getDim(String dim, OrdRange ordRange, 
int topN) throws IOExc
 final MatchingDocs hits;
 final OrdinalMap ordinalMap;
 final int segOrd;
+final Bits liveDocs;
 
 public CountOneSegment(
 LeafReader leafReader, MatchingDocs hits, OrdinalMap ordinalMap, int 
segOrd) {
   this.leafReader = leafReader;
   this.hits = hits;
   this.ordinalMap = ordinalMap;
   this.segOrd = segOrd;
+  this.liveDocs = (leafReader != null) ? leafReader.getLiveDocs() : null;

Review comment:
   Yes, that makes sense. I made the change in a new revision, captured in a 
different PR:
   https://github.com/apache/lucene/pull/293
   
   I had to create the new PR off a different branch because the changes in my 
local branch got overwritten after a rebase from the remote, likely due to some 
Git-related mistakes I made. 
   
   Could we move our conversation over to the new PR? Apologies for the 
inconvenience.
   







[GitHub] [lucene] rmuir merged pull request #292: LUCENE-10096: Tamil Analyzer

2021-09-10 Thread GitBox


rmuir merged pull request #292:
URL: https://github.com/apache/lucene/pull/292


   





[jira] [Commented] (LUCENE-10096) Tamil Analyzer

2021-09-10 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17413449#comment-17413449
 ] 

ASF subversion and git services commented on LUCENE-10096:
--

Commit 24aa45dc3e7b65ad27c44a48bb0ff1808bf8ea4c in lucene's branch 
refs/heads/main from Robert Muir
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=24aa45d ]

LUCENE-10096: Tamil Analyzer (#292)

Add Tamil analyzer based on snowball stemmer and TamilNLP stopwords

> Tamil Analyzer
> --
>
> Key: LUCENE-10096
> URL: https://issues.apache.org/jira/browse/LUCENE-10096
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Similar to LUCENE-10095, let's "shrink-wrap" the new snowball stemmer into a 
> proper Analyzer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Resolved] (LUCENE-10096) Tamil Analyzer

2021-09-10 Thread Robert Muir (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-10096.
--
Fix Version/s: main (9.0)
   Resolution: Fixed

> Tamil Analyzer
> --
>
> Key: LUCENE-10096
> URL: https://issues.apache.org/jira/browse/LUCENE-10096
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Major
> Fix For: main (9.0)
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Similar to LUCENE-10095, let's "shrink-wrap" the new snowball stemmer into a 
> proper Analyzer.






[GitHub] [lucene] goankur commented on a change in pull request #282: Lucene-10070

2021-09-10 Thread GitBox


goankur commented on a change in pull request #282:
URL: https://github.com/apache/lucene/pull/282#discussion_r706535145



##
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/ConcurrentSortedSetDocValuesFacetCounts.java
##
@@ -207,11 +210,17 @@ public Void call() throws IOException {
   // Remap every ord to global ord as we iterate:
   if (singleValues != null) {
 for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
+  if (liveDocs != null && liveDocs.get(doc) == false) {

Review comment:
   Checking `liveDocs` only when `hits` is `null` makes sense to me. 
However, duplicating a significant amount of code doesn't appeal to me. 
To get around this, I created a new DocIdSetIterator that wraps liveDocs 
and the main DocIdSetIterator only when we know we are counting all docs (i.e., 
`hits` is null). Please see the new PR:
   https://github.com/apache/lucene/pull/293/files
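A minimal, self-contained sketch of the wrapping approach described above, under stated assumptions: `DocIterator`, `ArrayDocIterator`, `AllDocsIterator`, `LiveDocsIterator`, and `WrappedCountSketch` are hypothetical simplified stand-ins, not Lucene classes, and `java.util.BitSet` (set bit = live) stands in for `Bits`. Only the "count all" path (no hits) is wrapped with live docs, so the counting loop itself needs no deletion check. Lucene's real `DocIdSetIterator` also has `docID()`, `advance(int)`, and `cost()`, which this sketch omits.

```java
import java.util.BitSet;

// Simplified stand-in for Lucene's DocIdSetIterator (hypothetical, not the real API).
abstract class DocIterator {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  abstract int nextDoc();
}

// Visits an ascending array of doc ids, e.g. docs already collected by a query.
final class ArrayDocIterator extends DocIterator {
  private final int[] docs;
  private int i = -1;

  ArrayDocIterator(int[] docs) {
    this.docs = docs;
  }

  @Override
  int nextDoc() {
    if (i < docs.length) {
      i++;
    }
    return i < docs.length ? docs[i] : NO_MORE_DOCS;
  }
}

// Visits every doc id in [0, maxDoc), deleted or not.
final class AllDocsIterator extends DocIterator {
  private final int maxDoc;
  private int doc = -1;

  AllDocsIterator(int maxDoc) {
    this.maxDoc = maxDoc;
  }

  @Override
  int nextDoc() {
    if (doc != NO_MORE_DOCS) {
      doc = (doc + 1 < maxDoc) ? doc + 1 : NO_MORE_DOCS;
    }
    return doc;
  }
}

// Wraps another iterator and skips docs whose live bit is not set.
final class LiveDocsIterator extends DocIterator {
  private final DocIterator in;
  private final BitSet liveDocs;

  LiveDocsIterator(DocIterator in, BitSet liveDocs) {
    this.in = in;
    this.liveDocs = liveDocs;
  }

  @Override
  int nextDoc() {
    int doc = in.nextDoc();
    while (doc != NO_MORE_DOCS && !liveDocs.get(doc)) {
      doc = in.nextDoc();
    }
    return doc;
  }
}

final class WrappedCountSketch {
  // Hits (non-null) already exclude deleted docs, so only "count all" gets wrapped.
  static DocIterator docsToCount(int[] hits, int maxDoc, BitSet liveDocs) {
    if (hits != null) {
      return new ArrayDocIterator(hits);
    }
    DocIterator all = new AllDocsIterator(maxDoc);
    return liveDocs == null ? all : new LiveDocsIterator(all, liveDocs);
  }

  // The counting loop itself never consults live docs.
  static int[] count(DocIterator it, int[] ordPerDoc, int numOrds) {
    int[] counts = new int[numOrds];
    for (int doc = it.nextDoc(); doc != DocIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
      counts[ordPerDoc[doc]]++;
    }
    return counts;
  }
}
```

A likely trade-off of this design: the per-doc deletion check moves out of the counting loop into the wrapper, so one loop body can serve both the "hits" and "count all" paths, at the cost of an extra iterator indirection on the "count all" path only.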







[GitHub] [lucene] goankur commented on a change in pull request #282: Lucene-10070

2021-09-10 Thread GitBox


goankur commented on a change in pull request #282:
URL: https://github.com/apache/lucene/pull/282#discussion_r706535312



##
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java
##
@@ -152,7 +153,8 @@ private FacetResult getDim(String dim, OrdRange ordRange, int topN) throws IOExc
   }
 
   private void countOneSegment(
-  OrdinalMap ordinalMap, LeafReader reader, int segOrd, MatchingDocs hits) throws IOException {
+  OrdinalMap ordinalMap, LeafReader reader, int segOrd, MatchingDocs hits, Bits liveDocs)

Review comment:
   Please see https://github.com/apache/lucene/pull/293/files for the 
approach that wraps `liveDocs` and the main `DocIdSetIterator`, which avoids 
checking `liveDocs` when `hits` is not null.







[GitHub] [lucene] goankur commented on a change in pull request #282: Lucene-10070

2021-09-10 Thread GitBox


goankur commented on a change in pull request #282:
URL: https://github.com/apache/lucene/pull/282#discussion_r706535368



##
File path: 
lucene/facet/src/java/org/apache/lucene/facet/StringValueFacetCounts.java
##
@@ -272,7 +273,8 @@ private void count(FacetsCollector facetsCollector) throws IOException {
   // Assuming the state is valid, ordinalMap should be null since we have one segment:
   assert ordinalMap == null;
 
-  countOneSegment(docValues, hits.context.ord, hits);
+  // hits contain live documents only, no need to pass live docs bitset explicitly
+  countOneSegment(docValues, hits.context.ord, hits, null);

Review comment:
   Please see my earlier comments. Thanks for taking a look.







[jira] [Commented] (LUCENE-10070) "count all" faceting functionality counts deleted docs for multiple implementations

2021-09-10 Thread Ankur (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17413450#comment-17413450
 ] 

Ankur commented on LUCENE-10070:


[~gsmiller] Thanks for taking a look at the above PR. 

I incorporated your feedback into the changes captured in a different PR due to 
Git-related issues on my end.

Please continue the conversation there.

https://github.com/apache/lucene/pull/293/files

> "count all" faceting functionality counts deleted docs for multiple 
> implementations
> ---
>
> Key: LUCENE-10070
> URL: https://issues.apache.org/jira/browse/LUCENE-10070
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/facet
>Reporter: Greg Miller
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> A few different {{Facets}} implementations support a "count all" style 
> constructor that lets the user omit the {{FacetsCollector}} 
> instance. These constructors advertise that they are equivalent to using a {{FacetsCollector}} 
> populated with a {{MatchAllDocsQuery}}, but more efficient. It looks like, 
> with the exception of {{FastTaxonomyFacetCounts}}, none of the 
> implementations correctly account for deleted documents (have a look at 
> {{FastTaxonomyFacetCounts}} for a correct example that consults "live docs").
> From what I can tell, the affected implementations are:
>  * SortedSetDocValuesFacetCounts
>  * ConcurrentSortedSetDocValuesFacetCounts
>  * LongValueFacetCounts
>  * StringValueFacetCounts
> I'll attach a PR shortly with unit tests I wrote that confirm the bug.
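The reported bug can be reproduced in miniature. This is a hedged model, not the unit test from the PR: plain arrays stand in for doc values, `matchAllHits` stands in for the docs a {{FacetsCollector}} would gather under a {{MatchAllDocsQuery}}, and `CountAllBugSketch` with its methods is hypothetical. With one deleted doc, a "count all" pass that walks every doc id disagrees with counting the match-all hits.

```java
import java.util.Arrays;

final class CountAllBugSketch {
  // Buggy "count all": visits every doc id in [0, maxDoc), deleted or not.
  static int[] naiveCountAll(int[] ordPerDoc, int numOrds) {
    int[] counts = new int[numOrds];
    for (int ord : ordPerDoc) {
      counts[ord]++;
    }
    return counts;
  }

  // Reference: counts only the docs a match-all search collected (live docs only).
  static int[] countHits(int[] hits, int[] ordPerDoc, int numOrds) {
    int[] counts = new int[numOrds];
    for (int doc : hits) {
      counts[ordPerDoc[doc]]++;
    }
    return counts;
  }

  public static void main(String[] args) {
    int[] ords = {0, 1, 0, 1};
    int[] matchAllHits = {0, 1, 3}; // doc 2 was deleted, so the search never collects it
    int[] expected = countHits(matchAllHits, ords, 2);
    int[] actual = naiveCountAll(ords, 2);
    System.out.println(Arrays.equals(expected, actual));
  }
}
```

Running `main` prints `false`, because the naive pass counts ordinal 0 twice (including deleted doc 2) while the match-all hits count it once.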






[jira] [Comment Edited] (LUCENE-10070) "count all" faceting functionality counts deleted docs for multiple implementations

2021-09-10 Thread Ankur (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17413450#comment-17413450
 ] 

Ankur edited comment on LUCENE-10070 at 9/11/21, 1:11 AM:
--

[~gsmiller] Thanks for taking a look at the above PR. 

I incorporated your feedback into the changes captured in a different PR due to 
Git-related issues on my end.

Please continue the conversation there.

[https://github.com/apache/lucene/pull/293/files]









[jira] [Comment Edited] (LUCENE-10070) "count all" faceting functionality counts deleted docs for multiple implementations

2021-09-10 Thread Ankur (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17413450#comment-17413450
 ] 

Ankur edited comment on LUCENE-10070 at 9/11/21, 1:12 AM:
--

[~gsmiller] Thanks for taking a look at the above PR. 

I incorporated the feedback in a different PR due to GIT related issues at my 
end.

Request folks to continue the conversation there.

[https://github.com/apache/lucene/pull/293/files]





