[GitHub] [lucene] mogui commented on pull request #679: Monitor Improvements LUCENE-10422

2022-03-14 Thread GitBox


mogui commented on pull request #679:
URL: https://github.com/apache/lucene/pull/679#issuecomment-1066549257


   @romseygeek fixed!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] romseygeek commented on a change in pull request #679: Monitor Improvements LUCENE-10422

2022-03-14 Thread GitBox


romseygeek commented on a change in pull request #679:
URL: https://github.com/apache/lucene/pull/679#discussion_r825818251



##
File path: lucene/monitor/src/java/org/apache/lucene/monitor/Monitor.java
##
@@ -125,14 +105,16 @@ public Monitor(Analyzer analyzer, Presearcher presearcher, MonitorConfiguration
    * Monitor's queryindex
    *
    * @param listener listener to register
+   * @throws IllegalStateException when Monitor is readonly

Review comment:
   I think this is a UOE (UnsupportedOperationException) now? It probably 
doesn't need to be in the javadoc, to be honest.
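
   For illustration only, here is a minimal sketch of the behaviour being 
discussed, assuming a read-only instance rejects every write call with 
UnsupportedOperationException. The class and method names 
(ReadonlyGuardSketch, register, clear, size) are hypothetical and are not the 
Monitor API from this PR.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a query index that can be opened read-only. */
final class ReadonlyGuardSketch {
  private final boolean readOnly;
  private final List<String> queryIds = new ArrayList<>();

  ReadonlyGuardSketch(boolean readOnly) {
    this.readOnly = readOnly;
  }

  /** Registers a query id, or throws if this instance is read-only. */
  void register(String queryId) {
    ensureWritable();
    queryIds.add(queryId);
  }

  /** Removes all query ids, or throws if this instance is read-only. */
  void clear() {
    ensureWritable();
    queryIds.clear();
  }

  /** Number of registered query ids; reads are always allowed. */
  int size() {
    return queryIds.size();
  }

  // Single guard shared by every write path, so all of them fail the same way.
  private void ensureWritable() {
    if (readOnly) {
      throw new UnsupportedOperationException("index is read-only");
    }
  }
}
```

   With one guard shared by all write paths, a test would expect the same 
exception type from each write method.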

##
File path: lucene/monitor/src/test/org/apache/lucene/monitor/TestMonitorReadonly.java
##
@@ -0,0 +1,217 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.monitor;
+
+import java.io.IOException;
+import java.nio.file.Path;
+import java.util.Collections;
+import java.util.concurrent.TimeUnit;
+import org.apache.lucene.analysis.Analyzer;
+import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
+import org.apache.lucene.document.Document;
+import org.apache.lucene.document.Field;
+import org.apache.lucene.index.IndexNotFoundException;
+import org.apache.lucene.index.Term;
+import org.apache.lucene.search.TermQuery;
+import org.apache.lucene.store.FSDirectory;
+import org.junit.Test;
+
+public class TestMonitorReadonly extends MonitorTestBase {
+  private static final Analyzer ANALYZER = new WhitespaceAnalyzer();
+
+  @Test
+  public void testReadonlyMonitorThrowsOnInexistentIndex() {
+    Path indexDirectory = createTempDir();
+    MonitorConfiguration config =
+        new MonitorConfiguration()
+            .setDirectoryProvider(
+                () -> FSDirectory.open(indexDirectory),
+                MonitorQuerySerializer.fromParser(MonitorTestBase::parse),
+                true);
+    assertThrows(
+        IndexNotFoundException.class,
+        () -> {
+          new Monitor(ANALYZER, config);
+        });
+  }
+
+  @Test
+  public void testReadonlyMonitorThrowsWhenCallingWriteRequests() throws IOException {
+    Path indexDirectory = createTempDir();
+    MonitorConfiguration writeConfig =
+        new MonitorConfiguration()
+            .setIndexPath(
+                indexDirectory, MonitorQuerySerializer.fromParser(MonitorTestBase::parse));
+
+    // this will create the index
+    Monitor writeMonitor = new Monitor(ANALYZER, writeConfig);
+    writeMonitor.close();
+
+    MonitorConfiguration config =
+        new MonitorConfiguration()
+            .setDirectoryProvider(
+                () -> FSDirectory.open(indexDirectory),
+                MonitorQuerySerializer.fromParser(MonitorTestBase::parse),
+                true);
+    try (Monitor monitor = new Monitor(ANALYZER, config)) {
+      assertThrows(
+          IllegalStateException.class,
+          () -> {
+            TermQuery query = new TermQuery(new Term(FIELD, "test"));
+            monitor.register(
+                new MonitorQuery("query1", query, query.toString(), Collections.emptyMap()));
+          });
+
+      assertThrows(
+          UnsupportedOperationException.class,
+          () -> {
+            monitor.deleteById("query1");
+          });
+
+      assertThrows(
+          UnsupportedOperationException.class,
+          () -> {
+            monitor.clear();
+          });
+    }
+  }
+
+  @Test
+  public void testSettingCustomDirectory() throws IOException {
+    Path indexDirectory = createTempDir();
+    Document doc = new Document();
+    doc.add(newTextField(FIELD, "This is a Foobar test document", Field.Store.NO));
+
+    MonitorConfiguration writeConfig =
+        new MonitorConfiguration()
+            .setDirectoryProvider(
+                () -> FSDirectory.open(indexDirectory),
+                MonitorQuerySerializer.fromParser(MonitorTestBase::parse));
+
+    try (Monitor writeMonitor = new Monitor(ANALYZER, writeConfig)) {
+      TermQuery query = new TermQuery(new Term(FIELD, "test"));
+      writeMonitor.register(
+          new MonitorQuery("query1", query, query.toString(), Collections.emptyMap()));
+      TermQuery query2 = new TermQuery(new Term(FIELD, "Foobar"));
+      writeMonitor.register(
+          new MonitorQuery("que

[jira] [Commented] (LUCENE-10448) MergeRateLimiter doesn't always limit instant rate.

2022-03-14 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506184#comment-17506184
 ] 

Adrien Grand commented on LUCENE-10448:
---

Like [~vigyas], my understanding is that there would only be a problem in 
practice if Lucene did very large and infrequent writes, but Lucene does 
exactly the opposite. So I'm not sure there's anything to fix?
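
To make the pause decision concrete, here is a simplified model of it. This is 
a sketch under assumptions, not Lucene's MergeRateLimiter: the class name 
PauseSketch is hypothetical, timestamps are assumed to be in nanoseconds (hence 
the 1e9 factor), MIN_PAUSE_NS is assumed to be 2 msec as the quoted comment 
suggests, and the caller is assumed to sleep for the returned duration.

```java
/** Simplified, hypothetical model of the pause decision; not Lucene's MergeRateLimiter. */
final class PauseSketch {
  /** Pauses shorter than this are skipped (assumed to be 2 msec). */
  static final long MIN_PAUSE_NS = 2_000_000L;

  private final double mbPerSec;
  private long lastNS;

  PauseSketch(double mbPerSec, long startNS) {
    this.mbPerSec = mbPerSec;
    this.lastNS = startNS;
  }

  /** Returns the pause in nanoseconds, or -1 when no pause is applied. */
  long maybePause(long bytes, long curNS) {
    // Time that writing `bytes` is allowed to take at the configured rate.
    double secondsToPause = (bytes / 1024. / 1024.) / mbPerSec;
    long targetNS = lastNS + (long) (1_000_000_000L * secondsToPause);
    long curPauseNS = targetNS - curNS;
    if (curPauseNS <= MIN_PAUSE_NS) {
      // The elapsed time already covers this write: reset the clock, skip the pause.
      lastNS = curNS;
      return -1;
    }
    // Assumption of this sketch: the caller sleeps until targetNS.
    lastNS = targetNS;
    return curPauseNS;
  }
}
```

With a 10 MB/sec limit, a 1 MB write arriving 1 msec after the previous call 
gets a pause of about 99 msec, while the same write arriving 5 seconds after 
the previous call gets -1. In this model a write is only let through unpaused 
when its time budget is shorter than the gap since the last call.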

> MergeRateLimiter doesn't always limit instant rate.
> ---
>
> Key: LUCENE-10448
> URL: https://issues.apache.org/jira/browse/LUCENE-10448
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/other
>Affects Versions: 8.11.1
>Reporter: kkewwei
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> We can see the code in *MergeRateLimiter*:
> {code:java}
> private long maybePause(long bytes, long curNS) throws MergePolicy.MergeAbortedException {
>
>   double rate = mbPerSec;
>   double secondsToPause = (bytes / 1024. / 1024.) / rate;
>   long targetNS = lastNS + (long) (10 * secondsToPause);
>   long curPauseNS = targetNS - curNS;
>   // We don't bother with thread pausing if the pause is smaller than 2 msec.
>   if (curPauseNS <= MIN_PAUSE_NS) {
>     // Set to curNS, not targetNS, to enforce the instant rate, not
>     // the "averaged over all history" rate:
>     lastNS = curNS;
>     return -1;
>   }
>   ..
> }
> {code}
> While a segment is being merged, suppose *maybePause* is called at 7:00, so 
> lastNS=7:00, and then *maybePause* is called again at 7:05. The value of 
> *targetNS=lastNS + (long) (10 * secondsToPause)* must then be smaller than 
> *curNS*, so no matter how large bytes is, we return -1 and skip the 
> pause. 
> I counted the total number of *maybePause* calls (callTimes), the number of 
> skipped pauses (ignorePauseTimes), and the bytes of each skipped pause (detailBytes):
> {code:java}
> [2022-03-02T15:16:51,972][DEBUG][o.e.i.e.I.EngineMergeScheduler] [node1] 
> [index1][21] merge segment [_4h] done: took [26.8s], [123.6 MB], [61,219 
> docs], [0s stopped], [24.4s throttled], [242.5 MB written], [11.2 MB/sec 
> throttle], [callTimes=857], [ignorePauseTimes=25],  [detailBytes(mb) = 
> [0.28899956, 0.28140354, 0.28015518, 0.27990818, 0.2801447, 0.27991104, 
> 0.27990723, 0.27990913, 0.2799101, 0.28010082, 0.2799921, 0.2799673, 
> 0.28144264, 0.27991295, 0.27990818, 0.27993107, 0.2799387, 0.27998447, 
> 0.28002167, 0.27992058, 0.27998066, 0.28098202, 0.28125, 0.28125, 0.28125]]
> {code}
> There were 857 calls to *maybePause*, 25 of which skipped the pause, and we 
> can see that the byte counts of the skipped pauses (such as 0.28125mb) are 
> not small.
> As long as the interval between two *maybePause* calls is relatively long, 
> the pause that should be executed will not be executed.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)




[GitHub] [lucene] mocobeta commented on pull request #740: LUCENE-10393: Unify binary dictionary and dictionary writer in kuromoji and nori

2022-03-14 Thread GitBox


mocobeta commented on pull request #740:
URL: https://github.com/apache/lucene/pull/740#issuecomment-1066672493


   We could have a common `DictionaryBuilder` class in analyzers-common, but to 
me that makes the class hierarchy too complex. I'd postpone refactoring 
XXXDictionaryBuilder until we come up with good interfaces or a framework for 
that; it may need public interface changes and is out of the scope of this PR.





[GitHub] [lucene] jpountz commented on pull request #737: Reduce for-loop in WeightedSpanTermExtractor.extractWeightedSpanTerms method.

2022-03-14 Thread GitBox


jpountz commented on pull request #737:
URL: https://github.com/apache/lucene/pull/737#issuecomment-1066789801


   This change makes sense to me.





[GitHub] [lucene] jpountz commented on a change in pull request #736: LUCENE-10458: BoundedDocSetIdIterator may supply error count in Weigth#count(LeafReaderContext) when missingValue enables

2022-03-14 Thread GitBox


jpountz commented on a change in pull request #736:
URL: https://github.com/apache/lucene/pull/736#discussion_r825981973



##
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java
##
@@ -198,16 +198,22 @@ public boolean isCacheable(LeafReaderContext ctx) {
 
       @Override
       public int count(LeafReaderContext context) throws IOException {
-        BoundedDocSetIdIterator disi = getDocIdSetIteratorOrNull(context);
-        if (disi != null) {
-          return disi.lastDoc - disi.firstDoc;
+        Sort indexSort = context.reader().getMetaData().getSort();
+        if (indexSort != null
+            && indexSort.getSort().length > 0
+            && indexSort.getSort()[0].getField().equals(field)
+            && indexSort.getSort()[0].getMissingValue() == null) {

Review comment:
   I don't think that this is the right thing to check, since the missing 
value is assumed to be zero if not set.
   
   The best thing I can think of is to check whether the field is dense via 
points (i.e. no missing values), or whether the missing value falls outside of 
the range, so that the bounded iterator is accurate?
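
   As a rough sketch of the second check, assuming an unset missing value is 
treated as zero as described above: the helper below is hypothetical 
(MissingValueCheckSketch and missingValueOutsideRange are illustrative names, 
not code from this PR) and only covers the range test, not the density check 
via points.

```java
/** Hypothetical helper; the names are illustrative and not part of the PR. */
final class MissingValueCheckSketch {
  /**
   * Returns true when docs without a value cannot sort into [lower, upper],
   * treating an unset (null) missing value as zero.
   */
  static boolean missingValueOutsideRange(Long missingValue, long lower, long upper) {
    long effective = missingValue == null ? 0L : missingValue;
    return effective < lower || effective > upper;
  }
}
```

   For example, an unset missing value is safe for the range [1, 10] but not 
for [-5, 5], because zero falls inside the latter.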







[GitHub] [lucene] gsmiller commented on a change in pull request #718: LUCENE-10444: Support alternate aggregation functions in association facets

2022-03-14 Thread GitBox


gsmiller commented on a change in pull request #718:
URL: https://github.com/apache/lucene/pull/718#discussion_r826082690



##
File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FloatTaxonomyFacets.java
##
@@ -130,16 +140,16 @@ public FacetResult getTopChildren(int topN, String dim, String... path) throws I
       ord = siblings[ord];
     }
 
-    if (sumValues == 0) {
+    if (aggregatedValue == 0) {
       return null;
     }
 
     if (dimConfig.multiValued) {
       if (dimConfig.requireDimCount) {
-        sumValues = values[dimOrd];
+        aggregatedValue = values[dimOrd];
       } else {
         // Our sum'd count is not correct, in general:

Review comment:
   It's not necessarily a "count" here though, right? It's an aggregated 
weight associated with the value.







[GitHub] [lucene] gsmiller commented on a change in pull request #718: LUCENE-10444: Support alternate aggregation functions in association facets

2022-03-14 Thread GitBox


gsmiller commented on a change in pull request #718:
URL: https://github.com/apache/lucene/pull/718#discussion_r826085655



##
File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/IntTaxonomyFacets.java
##
@@ -173,17 +185,17 @@ public FacetResult getTopChildren(int topN, String dim, String... path) throws I
 
     if (sparseValues != null) {
       for (IntIntCursor c : sparseValues) {
-        int count = c.value;
+        int value = c.value;
         int ord = c.key;
-        if (parents[ord] == dimOrd && count > 0) {
-          totValue += count;
+        if (parents[ord] == dimOrd && value > 0) {
+          aggregatedValue = aggregationFunction.aggregate(aggregatedValue, value);
           childCount++;
-          if (count > bottomValue) {
+          if (value > bottomValue) {

Review comment:
   That's right. There are a number of things actually preventing us from 
cleanly adding something like `min`. I had it originally, but as I started 
looking at all the changes it would require, I backed off for the time being 
(especially since I don't have a concrete use-case in mind). One interesting 
challenge is that these facet implementations all assume the weights are 
positive values. There are a lot of `> 0` checks floating around the various 
implementations to check whether or not a value had any "weight" associated 
with it. This makes sense when using counts, but it's weird when associating 
general weights with the values. So `min` started to feel a little weird and 
I just left it out for now.
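
   A small sketch of why the `> 0` checks get in the way of `min`. It is not 
the facets code: AggregationSketch and aggregatePositive are hypothetical 
names, and the aggregation function is modelled with the JDK's 
IntBinaryOperator rather than the interface used in this PR.

```java
import java.util.function.IntBinaryOperator;

/** Hypothetical model of aggregating child values behind a "> 0" weight check. */
final class AggregationSketch {
  /** Folds the values with fn, skipping every value that fails the "> 0" check. */
  static int aggregatePositive(int[] values, IntBinaryOperator fn, int initial) {
    int aggregated = initial;
    for (int value : values) {
      if (value > 0) { // mirrors the "does this value have any weight" checks
        aggregated = fn.applyAsInt(aggregated, value);
      }
    }
    return aggregated;
  }
}
```

   With the values {4, -3, 0, 7}, sum and max starting from 0 give 11 and 7, 
which is what one would expect for counts. A `min` starting from 
Integer.MAX_VALUE gives 4, although the true minimum weight is -3, because the 
negative weight never reaches the aggregation function.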







[GitHub] [lucene] javanna opened a new pull request #745: Revert "LUCENE-10385: Implement Weight#count on IndexSortSortedNumeri…

2022-03-14 Thread GitBox


javanna opened a new pull request #745:
URL: https://github.com/apache/lucene/pull/745


   Based on the discussion happening in 
https://issues.apache.org/jira/browse/LUCENE-10458 , I am reverting 
LUCENE-10385 (#635) in the 9.1 branch.
   
   I left some test improvements that are still valid but removed the specific 
tests that verified the count optimization, which no longer exists in this branch.





[GitHub] [lucene] gsmiller commented on pull request #718: LUCENE-10444: Support alternate aggregation functions in association facets

2022-03-14 Thread GitBox


gsmiller commented on pull request #718:
URL: https://github.com/apache/lucene/pull/718#issuecomment-1066998156


   Even though I ran benchmarks on the backport version of this change (#719), 
I figured it would be good to run benchmarks here as well. Below compares this 
patch against `main` using `wikimedium10m`:
   ```
                          Task  QPS baseline   StdDev  QPS candidate   StdDev  Pct diff                 p-value
     BrowseDayOfYearTaxoFacets         21.65  (22.8%)          20.61  (22.9%)     -4.8%  ( -41% -  52%)    0.507
          BrowseDateTaxoFacets         21.61  (22.6%)          20.61  (22.8%)     -4.6%  ( -40% -  52%)    0.519
                       Prefix3        362.81  (10.5%)         349.00  (11.4%)     -3.8%  ( -23% -  20%)    0.273
   BrowseRandomLabelTaxoFacets         17.76  (18.5%)          17.15  (20.0%)     -3.4%  ( -35% -  43%)    0.577
         BrowseMonthTaxoFacets         27.32  (27.1%)          26.45  (29.2%)     -3.2%  ( -46% -  72%)    0.721
                      Wildcard         64.61   (5.6%)          63.58   (6.0%)     -1.6%  ( -12% -  10%)    0.381
                  OrNotHighMed        887.00   (3.3%)         874.66   (3.4%)     -1.4%  (  -7% -   5%)    0.191
                       LowTerm       2661.07   (2.9%)        2630.67   (3.2%)     -1.1%  (  -7% -   5%)    0.240
                 OrNotHighHigh       1523.01   (3.8%)        1506.01   (4.2%)     -1.1%  (  -8% -   7%)    0.379
       AndHighMedDayTaxoFacets        124.82   (1.4%)         123.59   (1.5%)     -1.0%  (  -3% -   1%)    0.032
                  HighSpanNear         21.47   (4.9%)          21.27   (4.9%)     -0.9%  ( -10% -   9%)    0.558
                     MedPhrase        342.22   (2.8%)         339.35   (3.2%)     -0.8%  (  -6% -   5%)    0.373
                    HighPhrase        453.21   (2.2%)         449.61   (2.6%)     -0.8%  (  -5% -   4%)    0.291
                   MedSpanNear         74.03   (4.3%)          73.45   (4.2%)     -0.8%  (  -8% -   8%)    0.559
         BrowseMonthSSDVFacets         13.79  (19.1%)          13.69  (19.7%)     -0.7%  ( -33% -  47%)    0.910
                     LowPhrase         85.89   (1.9%)          85.33   (2.0%)     -0.7%  (  -4% -   3%)    0.294
                      PKLookup        169.25   (3.2%)         168.19   (2.3%)     -0.6%  (  -5% -   5%)    0.481
                  OrNotHighLow        962.38   (3.0%)         956.47   (2.9%)     -0.6%  (  -6% -   5%)    0.511
     BrowseDayOfYearSSDVFacets         12.23  (14.4%)          12.15  (14.5%)     -0.6%  ( -25% -  33%)    0.897
          BrowseDateSSDVFacets          2.34   (6.1%)           2.32   (7.8%)     -0.6%  ( -13% -  14%)    0.802
                     OrHighMed        134.10   (5.0%)         133.49   (4.6%)     -0.5%  (  -9% -   9%)    0.766
                        Fuzzy1         91.09   (1.2%)          90.71   (1.8%)     -0.4%  (  -3% -   2%)    0.373
                      HighTerm       1690.39   (4.6%)        1684.21   (5.3%)     -0.4%  (  -9% -  10%)    0.816
                 OrHighNotHigh       1592.87   (2.4%)        1587.86   (3.7%)     -0.3%  (  -6% -   5%)    0.751
      AndHighHighDayTaxoFacets         12.34   (2.4%)          12.31   (2.4%)     -0.3%  (  -4% -   4%)    0.696
                    AndHighLow        927.42   (3.2%)         924.95   (3.1%)     -0.3%  (  -6% -   6%)    0.790
               MedSloppyPhrase        107.35   (2.5%)         107.06   (2.5%)     -0.3%  (  -5% -   4%)    0.736
                        IntNRQ         83.16   (1.2%)          82.95   (1.0%)     -0.2%  (  -2% -   1%)    0.469
                       MedTerm       1935.87   (4.3%)        1934.77   (4.8%)     -0.1%  (  -8% -   9%)    0.969
                  OrHighNotLow       1109.87   (4.1%)        1109.25   (4.7%)     -0.1%  (  -8% -   9%)    0.968
              HighSloppyPhrase         28.62   (1.8%)          28.61   (2.6%)     -0.0%  (  -4% -   4%)    0.981
                       Respell         57.41   (1.1%)          57.40   (1.6%)     -0.0%  (  -2% -   2%)    0.968
                   LowSpanNear        193.79   (3.4%)         193.81   (3.8%)      0.0%  (  -6% -   7%)    0.992
               LowSloppyPhrase         30.88   (1.5%)          30.90   (1.8%)      0.1%  (  -3% -   3%)    0.885
                    OrHighHigh         38.71   (4.3%)          38.74   (4.2%)      0.1%  (  -8% -   8%)    0.954
                     OrHighLow        606.33   (2.8%)         606.93   (2.5%)      0.1%  (  -5% -   5%)    0.907
                  OrHighNotMed        985.50   (4.3%)         986.95   (4.6%)      0.1%  (  -8% -   9%)    0.917
                        Fuzzy2         38.38   (1.5%)          38.45   (1.8%)      0.2%  (  -3% -   3%)    0.698
        OrHighMedDayTaxoFacets          5.37   (4.8%)           5.39   (4.9%)      0.4%  (  -8% -  10%)    0.801
          MedTermDayTaxoFacets         27.22   (3.9%)          27.

[jira] [Commented] (LUCENE-10458) BoundedDocSetIdIterator may supply error count in Weigth#count(LeafReaderContext) when missingValue enables

2022-03-14 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506341#comment-17506341
 ] 

Adrien Grand commented on LUCENE-10458:
---

Thanks for catching this. [~lucacavanna] is reverting the change on branch_9_1 
so as not to delay the release: https://github.com/apache/lucene/pull/745

> BoundedDocSetIdIterator may supply error count in 
> Weigth#count(LeafReaderContext) when missingValue enables
> ---
>
> Key: LUCENE-10458
> URL: https://issues.apache.org/jira/browse/LUCENE-10458
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Lu Xugang
>Priority: Major
> Fix For: 9.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When IndexSortSortedNumericDocValuesRangeQuery can take advantage of the index 
> sort, Weight#count uses BoundedDocSetIdIterator's lastDoc and firstDoc to 
> calculate the count, but if a missingValue is set, documents that do not 
> contain DocValues may be included in the count.
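
The over-count described above can be reproduced with a toy model. This is a 
sketch under assumptions, not Lucene code: BoundedCountSketch and its methods 
are hypothetical, a segment is modelled as an array of values in index-sort 
order with null for docs that have no value, and a linear scan stands in for 
the binary search that finds firstDoc and lastDoc.

```java
/** Toy model of a segment sorted by a numeric field; not Lucene code. */
final class BoundedCountSketch {
  /** Sort key of a doc: its value, or missingValue when it has none (null). */
  static long sortKey(Long value, long missingValue) {
    return value == null ? missingValue : value;
  }

  /** Mimics lastDoc - firstDoc: the contiguous run of docs whose sort key is in range. */
  static int boundedCount(Long[] docsInSortOrder, long missingValue, long lower, long upper) {
    int firstDoc = 0;
    while (firstDoc < docsInSortOrder.length
        && sortKey(docsInSortOrder[firstDoc], missingValue) < lower) {
      firstDoc++;
    }
    int lastDoc = firstDoc;
    while (lastDoc < docsInSortOrder.length
        && sortKey(docsInSortOrder[lastDoc], missingValue) <= upper) {
      lastDoc++;
    }
    return lastDoc - firstDoc;
  }

  /** Counts only docs that really have a value inside [lower, upper]. */
  static int exactCount(Long[] docs, long lower, long upper) {
    int count = 0;
    for (Long value : docs) {
      if (value != null && value >= lower && value <= upper) {
        count++;
      }
    }
    return count;
  }
}
```

For docs {-3, null, null, 2, 9} with a missing value of 0 and the range 
[-1, 5], the bounded count is 3 while only one doc actually matches. For the 
range [1, 5], which excludes the missing value, both counts are 1.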






[jira] [Commented] (LUCENE-10441) ArrayIndexOutOfBoundsException during indexing

2022-03-14 Thread Peixin Li (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506376#comment-17506376
 ] 

Peixin Li commented on LUCENE-10441:


How many tokens would cause the issue, and is there a way to improve it?

Currently I'm using the standard analyzer for the IndexWriter. It could produce 
too many tokens if terms contain a lot of "-" or ".", right?

> ArrayIndexOutOfBoundsException during indexing
> --
>
> Key: LUCENE-10441
> URL: https://issues.apache.org/jira/browse/LUCENE-10441
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 8.10
>Reporter: Peixin Li
>Priority: Major
>
> Hi experts! I am facing an ArrayIndexOutOfBoundsException while indexing and 
> committing documents. The exception gives me no clue about what happened, so 
> I have little information for debugging. Can I have some suggestions about 
> what the cause could be and how to fix this error? I'm using Lucene 8.10.0.
> {code:java}
> java.lang.ArrayIndexOutOfBoundsException: -1
>     at org.apache.lucene.util.BytesRefHash$1.get(BytesRefHash.java:179)
>     at org.apache.lucene.util.StringMSBRadixSorter$1.get(StringMSBRadixSorter.java:42)
>     at org.apache.lucene.util.StringMSBRadixSorter$1.setPivot(StringMSBRadixSorter.java:63)
>     at org.apache.lucene.util.Sorter.binarySort(Sorter.java:192)
>     at org.apache.lucene.util.Sorter.binarySort(Sorter.java:187)
>     at org.apache.lucene.util.IntroSorter.quicksort(IntroSorter.java:41)
>     at org.apache.lucene.util.IntroSorter.quicksort(IntroSorter.java:83)
>     at org.apache.lucene.util.IntroSorter.sort(IntroSorter.java:36)
>     at org.apache.lucene.util.MSBRadixSorter.introSort(MSBRadixSorter.java:133)
>     at org.apache.lucene.util.MSBRadixSorter.sort(MSBRadixSorter.java:126)
>     at org.apache.lucene.util.MSBRadixSorter.sort(MSBRadixSorter.java:121)
>     at org.apache.lucene.util.BytesRefHash.sort(BytesRefHash.java:183)
>     at org.apache.lucene.index.SortedSetDocValuesWriter.flush(SortedSetDocValuesWriter.java:171)
>     at org.apache.lucene.index.DefaultIndexingChain.writeDocValues(DefaultIndexingChain.java:348)
>     at org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:228)
>     at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:350)
>     at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:476)
>     at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:656)
>     at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3364)
>     at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3770)
>     at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3728)
> {code}






[GitHub] [lucene] mocobeta opened a new pull request #746: Sanity check on start javaw

2022-03-14 Thread GitBox


mocobeta opened a new pull request #746:
URL: https://github.com/apache/lucene/pull/746


   This tests whether the Luke process starts successfully with "java" on 
Mac/Linux or with "start javaw" on Windows.
   TestScripts now checks whether the expected message appears in a log file 
instead of in the forked process's stdout.
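
   The general idea of checking a log file rather than stdout can be sketched 
as below. This is not the TestScripts code from the PR: LogCheckSketch and 
waitForMessage are hypothetical names, and the polling interval and timeout 
handling are assumptions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper that waits for a message to show up in a log file. */
final class LogCheckSketch {
  /** Polls logFile until it contains expected, or returns false once the timeout elapses. */
  static boolean waitForMessage(Path logFile, String expected, long timeoutMillis)
      throws IOException, InterruptedException {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    do {
      // The file may not exist yet if the spawned process is still starting up.
      if (Files.exists(logFile)) {
        String content = new String(Files.readAllBytes(logFile), StandardCharsets.UTF_8);
        if (content.contains(expected)) {
          return true;
        }
      }
      Thread.sleep(50); // assumed polling interval
    } while (System.nanoTime() < deadline);
    return false;
  }
}
```

   Reading a file works even when the launcher detaches the child process, as 
"start javaw" does, so that its stdout is not available to the test.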





[jira] [Commented] (LUCENE-10464) unnecessary for-loop in WeightedSpanTermExtractor.extractWeightedSpanTerms

2022-03-14 Thread Christine Poerschke (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506392#comment-17506392
 ] 

Christine Poerschke commented on LUCENE-10464:
--

https://github.com/apache/lucene/pull/737

> unnecessary for-loop in WeightedSpanTermExtractor.extractWeightedSpanTerms 
> ---
>
> Key: LUCENE-10464
> URL: https://issues.apache.org/jira/browse/LUCENE-10464
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
>
> The 
> https://github.com/apache/lucene/commit/81c7ba4601a9aaf16e2255fe493ee582abe72a90
>  change in LUCENE-4728 included
> {code}
> - final SpanQuery rewrittenQuery = (SpanQuery) spanQuery.rewrite(getLeafContextForField(field).reader());
> + final SpanQuery rewrittenQuery = (SpanQuery) spanQuery.rewrite(getLeafContext().reader());
> {code}
> i.e. previously more needed to happen in the loop but now the query rewrite 
> and term collecting need not happen in the loop.






[jira] [Created] (LUCENE-10464) unnecessary for-loop in WeightedSpanTermExtractor.extractWeightedSpanTerms

2022-03-14 Thread Christine Poerschke (Jira)
Christine Poerschke created LUCENE-10464:


 Summary: unnecessary for-loop in 
WeightedSpanTermExtractor.extractWeightedSpanTerms 
 Key: LUCENE-10464
 URL: https://issues.apache.org/jira/browse/LUCENE-10464
 Project: Lucene - Core
  Issue Type: Task
Reporter: Christine Poerschke
Assignee: Christine Poerschke


The 
https://github.com/apache/lucene/commit/81c7ba4601a9aaf16e2255fe493ee582abe72a90
 change in LUCENE-4728 included
{code}
- final SpanQuery rewrittenQuery = (SpanQuery) spanQuery.rewrite(getLeafContextForField(field).reader());
+ final SpanQuery rewrittenQuery = (SpanQuery) spanQuery.rewrite(getLeafContext().reader());
{code}

i.e. previously more needed to happen in the loop but now the query rewrite and 
term collecting need not happen in the loop.
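
The pattern can be illustrated with a generic sketch of hoisting 
loop-invariant work. It is not the WeightedSpanTermExtractor code: 
LoopHoistSketch and its methods are hypothetical, and the "rewrite" is a 
stand-in string operation that, like the rewrite above, no longer depends on 
the field.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration of moving field-independent work out of a per-field loop. */
final class LoopHoistSketch {
  int rewriteCalls = 0;

  /** Stand-in for a query rewrite whose result does not depend on the field. */
  String rewrite(String query) {
    rewriteCalls++;
    return query.trim().toLowerCase(Locale.ROOT);
  }

  /** Before: rewrite and term collection repeat for every field with identical results. */
  Set<String> collectInLoop(List<String> fields, String query) {
    Set<String> terms = new TreeSet<>();
    for (String field : fields) { // field is intentionally unused by the loop body
      terms.addAll(Arrays.asList(rewrite(query).split("\\s+")));
    }
    return terms;
  }

  /** After: rewrite and collect once, outside any per-field loop. */
  Set<String> collectOnce(String query) {
    return new TreeSet<>(Arrays.asList(rewrite(query).split("\\s+")));
  }
}
```

Both methods return the same terms; the looped version only performs the 
rewrite once per field instead of once overall.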






[jira] [Created] (LUCENE-10465) Unable to find antlr4.runtime

2022-03-14 Thread Muler (Jira)
Muler created LUCENE-10465:
--

 Summary: Unable to find antlr4.runtime
 Key: LUCENE-10465
 URL: https://issues.apache.org/jira/browse/LUCENE-10465
 Project: Lucene - Core
  Issue Type: Bug
  Components: general/build
Reporter: Muler


While running the trunk version of Lucene in IntelliJ, I get the error below 
and am unable to fix it.


Error occurred during initialization of boot layer
java.lang.module.FindException: Module antlr4.runtime not found, required by org.apache.lucene.expressions






[jira] [Commented] (LUCENE-10427) OLAP likewise rollup during segment merge process

2022-03-14 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506402#comment-17506402
 ] 

Adrien Grand commented on LUCENE-10427:
---

Thanks, I understand better now. With the sidecar approach, could you compute 
rollups at index time by performing updates instead of hooking into the merging 
process? For instance, if a user is adding a new sample, you could retrieve the 
data for the current bucket for the given dimensions and update the 
min/max/sum values?
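
A minimal sketch of that index-time alternative, with an in-memory map standing 
in for the sidecar data. Everything here is an assumption for illustration: 
IndexTimeRollupSketch, Stats, addSample and get are hypothetical names, and 
the bucket key (event hour, device id, city id) is borrowed from the scenario 
in the issue description below.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical index-time rollup: each new sample updates its bucket's running stats. */
final class IndexTimeRollupSketch {
  /** Running min/max/sum for one bucket. */
  static final class Stats {
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;
    double sum = 0;

    void add(double value) {
      min = Math.min(min, value);
      max = Math.max(max, value);
      sum += value;
    }
  }

  // Bucket key is the list of dimension values; a real sidecar would be persisted.
  private final Map<List<Object>, Stats> buckets = new HashMap<>();

  /** Looks up (or creates) the bucket for the given dimensions and folds the sample in. */
  void addSample(long eventHour, String deviceId, String cityId, double temperature) {
    buckets
        .computeIfAbsent(List.<Object>of(eventHour, deviceId, cityId), k -> new Stats())
        .add(temperature);
  }

  /** Returns the stats for a bucket, or null if no sample was added for it. */
  Stats get(long eventHour, String deviceId, String cityId) {
    return buckets.get(List.<Object>of(eventHour, deviceId, cityId));
  }
}
```

The trade-off compared with merge-time rollup is that every sample pays for a 
lookup and an update, but no change to the merge process is needed.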


> OLAP likewise rollup during segment merge process
> -
>
> Key: LUCENE-10427
> URL: https://issues.apache.org/jira/browse/LUCENE-10427
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Suhan Mao
>Priority: Major
>
> Currently, many OLAP engines support a rollup feature, for example 
> ClickHouse (AggregatingMergeTree) and Druid. 
> Rollup definition: [https://athena.ecs.csus.edu/~mei/olap/OLAPoperations.php]
> One way to do rollup is to merge buckets that share the same dimension values 
> into one and apply sum()/min()/max() to the metric fields during the segment 
> compact/merge process. This can significantly reduce the size of the data and 
> speed up queries a lot.
>  
> *Outline of the approach*
>  # Define the rollup logic: which fields are dimensions and which are metrics.
>  # Define the rollup function for each metric field: max/min/sum ...
>  # The index sort fields should be the same as the dimension fields.
>  # Do the rollup calculation during segment merge, just as other OLAP 
> engines do.
>  
> *Assumed scenario*
> We use ES to ingest real-time raw temperature data every minute from each 
> sensor device, along with many dimension fields. A user may want to query 
> the data like "what is the max temperature of some device within some/latest 
> hour" or "what is the max temperature of some city within some/latest hour".
> For that, we can define the following fields and rollup definition:
>  # event_hour (rounded to hour granularity)
>  # device_id (dimension)
>  # city_id (dimension)
>  # temperature (metric, max/min rollup logic)
> The raw data will periodically be rolled up to hour granularity during the 
> segment merge process, which should ideally save 60x storage in the end.
>  
> *How we do rollup in segment merge*
> bucket: docs belong to the same bucket if their dimension values are all 
> the same.
>  # For the docvalues merge, we emit the normal mappedDocId when we encounter 
> a new bucket in DocIDMerger.
>  # Since the index sort fields are the same as the dimension fields, when we 
> encounter more docs in the same bucket, we emit a special mappedDocId from 
> DocIDMerger.
>  # In DocValuesConsumer.mergeNumericField, when we meet a special mappedDocId, 
> we do a rollup calculation on the metric fields and fold the result into the 
> first doc in the bucket. The calculation works just like a streaming 
> merge-sort rollup.
>  # We discard all docs with the special mappedDocId because their metrics are 
> already folded into the first doc of the bucket.
>  # In the BKD/postings structures, we discard all docs with the special 
> mappedDocId and keep only the first doc ID of each bucket in the BKD/postings 
> data. This should be simple.
>  
> *How to define the logic*
>  
> {code:java}
> public class RollupMergeConfig {
>   private List<String> dimensionNames;
>   private List<RollupMergeAggregateField> aggregateFields;
> } 
> public class RollupMergeAggregateField {
>   private String name;
>   private RollupMergeAggregateType aggregateType;
> }
> public enum RollupMergeAggregateType {
>   COUNT,
>   SUM,
>   MIN,
>   MAX,
>   CARDINALITY // if a data sketch is stored in binary doc values, we can do a union
> }{code}
>  
>  
> I have written initial code at a basic level. I can submit a complete 
> PR if you think this feature is worth trying.
>  
>  
>  
>  
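
The merge-time rollup described in the quoted proposal can be pictured, outside Lucene, as a single streaming pass over rows that are already sorted by their dimensions. The sketch below is only an illustration under stated assumptions: `Row`, `rollupMax`, and the field names are hypothetical, they are not part of the Lucene API or of the proposed patch, and MAX is the only aggregate shown.

```java
import java.util.ArrayList;
import java.util.List;

public class RollupSketch {
  /** Hypothetical stand-in for one document: three dimensions plus one metric. */
  public static final class Row {
    public final long eventHour;
    public final String deviceId;
    public final String cityId;
    public double temperature;

    public Row(long eventHour, String deviceId, String cityId, double temperature) {
      this.eventHour = eventHour;
      this.deviceId = deviceId;
      this.cityId = cityId;
      this.temperature = temperature;
    }

    /** Two rows share a bucket when all of their dimension values are equal. */
    boolean sameBucket(Row other) {
      return eventHour == other.eventHour
          && deviceId.equals(other.deviceId)
          && cityId.equals(other.cityId);
    }
  }

  /**
   * Folds every row of a bucket into the first row of that bucket, keeping the
   * maximum temperature. The input must already be sorted by the dimensions,
   * which is the role index sorting plays in the proposal.
   */
  public static List<Row> rollupMax(List<Row> sortedRows) {
    List<Row> out = new ArrayList<>();
    Row first = null;
    for (Row row : sortedRows) {
      if (first != null && first.sameBucket(row)) {
        // Same bucket: fold the metric into the first row and drop this row.
        first.temperature = Math.max(first.temperature, row.temperature);
      } else {
        // New bucket: copy the row so the caller's input is left untouched.
        first = new Row(row.eventHour, row.deviceId, row.cityId, row.temperature);
        out.add(first);
      }
    }
    return out;
  }
}
```

Because the input is sorted, only the first row of the current bucket needs to be held in memory, which is what makes the approach fit a merge that streams documents.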



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10461) Luke: Windows launch script passes integration tests but fails to run

2022-03-14 Thread Tomoko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506403#comment-17506403
 ] 

Tomoko Uchida commented on LUCENE-10461:


I am fine with keeping javaw support.

In that case, we could slightly change TestScripts so that, when it runs on 
Windows, it can check the health of the process spawned by "start javaw". I'm 
sorry for being persistent - I just wanted to test the actual 
command, rather than one adjusted for testing.

https://github.com/apache/lucene/pull/746

> Luke: Windows launch script passes integration tests but fails to run
> -
>
> Key: LUCENE-10461
> URL: https://issues.apache.org/jira/browse/LUCENE-10461
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Major
> Fix For: 9.1, 10.0 (main)
>
> Attachments: image-2022-03-13-11-18-34-704.png
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> PR at https://github.com/apache/lucene/pull/743






[GitHub] [lucene] mocobeta commented on a change in pull request #746: Sanity check on start javaw

2022-03-14 Thread GitBox


mocobeta commented on a change in pull request #746:
URL: https://github.com/apache/lucene/pull/746#discussion_r826259906



##
File path: 
lucene/distribution.tests/src/test/org/apache/lucene/distribution/TestScripts.java
##
@@ -112,6 +120,7 @@ protected void execute(
   Launcher launcher,
   int expectedExitCode,
   long timeoutInSeconds,
+  Path logFile,

Review comment:
   I added this parameter as a proof of concept, but it can be (should 
be) `Optional`. If no log file is passed, `processOutputConsumer` would take 
the stdout of the forked process as before.
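
The optional-parameter idea in this review comment could look roughly like the following. This is a minimal sketch, not the code in `TestScripts`: the class name, `readOutput`, and the `Supplier` used to stand in for the forked process's stdout are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.function.Supplier;

public class OptionalLogFileSketch {
  /**
   * Returns the text the test should inspect: the log file content when a log
   * file was given, otherwise the stdout of the forked process.
   */
  public static String readOutput(Optional<Path> logFile, Supplier<String> processStdout) {
    if (logFile.isEmpty()) {
      // No log file requested: behave as before and use the process stdout.
      return processStdout.get();
    }
    try {
      return Files.readString(logFile.get(), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```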







[GitHub] [lucene] mocobeta commented on a change in pull request #746: Sanity check on start javaw

2022-03-14 Thread GitBox


mocobeta commented on a change in pull request #746:
URL: https://github.com/apache/lucene/pull/746#discussion_r826266136



##
File path: 
lucene/distribution.tests/src/test/org/apache/lucene/distribution/TestScripts.java
##
@@ -125,13 +134,29 @@ protected void execute(
   throw new AssertionError("Forked process did not terminate in the 
expected time");
 }
 
+// Wait until the log file is created by Luke.

Review comment:
   ```suggestion
   // Wait until the log file is created by the descendant process.
   ```







[jira] [Commented] (LUCENE-10461) Luke: Windows launch script passes integration tests but fails to run

2022-03-14 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506458#comment-17506458
 ] 

Dawid Weiss commented on LUCENE-10461:
--

Hey, Tomoko. I don't think we should go in the direction you have in your 
patch. java and javaw are literally the same, functionally. The difference is 
one has a window-application api entry and the other requires console support - 
it really doesn't matter from Luke's point of view. I also think it is more 
than fine to run the tests with 'java' and skip the 'start' to make the script 
blocking. It is a Windows-specific script and it is a Windows-specific 
workaround to make the test behave better. With your patch, the test relies on 
wall clock to detect the log file and is just more complex than it has to be. 
Just my few cents.

> Luke: Windows launch script passes integration tests but fails to run
> -
>
> Key: LUCENE-10461
> URL: https://issues.apache.org/jira/browse/LUCENE-10461
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Major
> Fix For: 9.1, 10.0 (main)
>
> Attachments: image-2022-03-13-11-18-34-704.png
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> PR at https://github.com/apache/lucene/pull/743






[jira] [Created] (LUCENE-10466) IndexSortSortedNumericDocValuesRangeQuery unconditionally assumes the usage of the LONG-encoded SortField

2022-03-14 Thread Andriy Redko (Jira)
Andriy Redko created LUCENE-10466:
-

 Summary: IndexSortSortedNumericDocValuesRangeQuery unconditionally 
assumes the usage of the LONG-encoded SortField
 Key: LUCENE-10466
 URL: https://issues.apache.org/jira/browse/LUCENE-10466
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/query/scoring
Affects Versions: 9.0
Reporter: Andriy Redko


We ran into this issue while migrating to OpenSearch and making changes to 
accommodate https://issues.apache.org/jira/browse/LUCENE-10040. It turned out 
that *IndexSortSortedNumericDocValuesRangeQuery* unconditionally assumes the 
usage of the LONG-encoded {*}SortField{*}, as can be seen in the *static 
ValueComparator loadComparator* method:
{noformat}
    @SuppressWarnings("unchecked")
    FieldComparator<Long> fieldComparator = (FieldComparator<Long>) sortField.getComparator(1, 0);
    fieldComparator.setTopValue(topValue);
 {noformat}
 

Using the numeric range query (on a sorted index) with any type other than LONG 
results in a ClassCastException:
{noformat}
   >     java.lang.ClassCastException: class java.lang.Long cannot be cast to class java.lang.Integer (java.lang.Long and java.lang.Integer are in module java.base of loader 'bootstrap')
   >         at org.apache.lucene.search.comparators.IntComparator.setTopValue(IntComparator.java:29)
   >         at org.apache.lucene.sandbox.search.IndexSortSortedNumericDocValuesRangeQuery.loadComparator(IndexSortSortedNumericDocValuesRangeQuery.java:251)
   >         at org.apache.lucene.sandbox.search.IndexSortSortedNumericDocValuesRangeQuery.getDocIdSetIterator(IndexSortSortedNumericDocValuesRangeQuery.java:206)
   >         at org.apache.lucene.sandbox.search.IndexSortSortedNumericDocValuesRangeQuery$1.scorer(IndexSortSortedNumericDocValuesRangeQuery.java:170)
 {noformat}
Simple test case to reproduce (for TestIndexSortSortedNumericDocValuesRangeQuery):
{noformat}
  public void testIndexSortDocValuesWithIntRange() throws Exception {
    Directory dir = newDirectory();

    IndexWriterConfig iwc = new IndexWriterConfig(new MockAnalyzer(random()));
    Sort indexSort = new Sort(new SortedNumericSortField("field", SortField.Type.INT, false));
    iwc.setIndexSort(indexSort);
    RandomIndexWriter writer = new RandomIndexWriter(random(), dir, iwc);

    writer.addDocument(createDocument("field", -80));

    DirectoryReader reader = writer.getReader();
    IndexSearcher searcher = newSearcher(reader);

    // Test ranges consisting of one value.
    assertEquals(1, searcher.count(createQuery("field", -80, -80)));

    writer.close();
    reader.close();
    dir.close();
  }
{noformat}
 

The expectation is that *IndexSortSortedNumericDocValuesRangeQuery* should not 
fail with a ClassCastException but should correctly convert the numeric values.
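
One way to picture the expected behaviour is to convert the `long` bound to the boxed type that matches the sort field's numeric type before handing it to the comparator. The sketch below is self-contained and deliberately avoids Lucene classes: `NumericType` and `toTopValue` are hypothetical names, only INT and LONG are handled, and nothing here claims to be how the issue is actually fixed.

```java
public class TopValueConversionSketch {
  /** Hypothetical mirror of the numeric sort field types relevant to this report. */
  public enum NumericType {
    INT,
    LONG,
    FLOAT,
    DOUBLE
  }

  /**
   * Boxes a long bound as the type the matching comparator expects, so that an
   * unchecked cast to a Long-typed comparator is no longer needed.
   */
  public static Number toTopValue(NumericType type, long value) {
    switch (type) {
      case INT:
        // Throws ArithmeticException instead of silently truncating.
        return Math.toIntExact(value);
      case LONG:
        return value;
      default:
        // Assumption: other encodings are out of scope for this sketch.
        throw new IllegalArgumentException("Unsupported sort field type: " + type);
    }
  }
}
```

With the value from the test case, `toTopValue(NumericType.INT, -80L)` yields an `Integer`, which is what an int-typed comparator can accept without a cast failure.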






[jira] [Updated] (LUCENE-10466) IndexSortSortedNumericDocValuesRangeQuery unconditionally assumes the usage of the LONG-encoded SortField

2022-03-14 Thread Andriy Redko (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andriy Redko updated LUCENE-10466:
--
Description: 
We ran into this issue while migrating to OpenSearch and making changes to 
accommodate https://issues.apache.org/jira/browse/LUCENE-10040. It turned out 
that *IndexSortSortedNumericDocValuesRangeQuery* unconditionally assumes the 
usage of the LONG-encoded {*}SortField{*}, as can be seen in the *static 
ValueComparator loadComparator* method:
{noformat}
    @SuppressWarnings("unchecked")
    FieldComparator<Long> fieldComparator = (FieldComparator<Long>) sortField.getComparator(1, 0);
    fieldComparator.setTopValue(topValue);
 {noformat}
 

Using the numeric range query (on a sorted index) with any type other than LONG 
results in a ClassCastException:
{noformat}
   >     java.lang.ClassCastException: class java.lang.Long cannot be cast to class java.lang.Integer (java.lang.Long and java.lang.Integer are in module java.base of loader 'bootstrap')
   >         at org.apache.lucene.search.comparators.IntComparator.setTopValue(IntComparator.java:29)
   >         at org.apache.lucene.sandbox.search.IndexSortSortedNumericDocValuesRangeQuery.loadComparator(IndexSortSortedNumericDocValuesRangeQuery.java:251)
   >         at org.apache.lucene.sandbox.search.IndexSortSortedNumericDocValuesRangeQuery.getDocIdSetIterator(IndexSortSortedNumericDocValuesRangeQuery.java:206)
   >         at org.apache.lucene.sandbox.search.IndexSortSortedNumericDocValuesRangeQuery$1.scorer(IndexSortSortedNumericDocValuesRangeQuery.java:170)
 {noformat}
Simple test case to reproduce (for TestIndexSortSortedNumericDocValuesRangeQuery):
{noformat}
  public void testIndexSortDocValuesWithIntRange() throws Exception {
    Directory dir = newDirectory();

    IndexWriterConfig iwc = new IndexWriterConfig(new MockAnalyzer(random()));
    Sort indexSort = new Sort(new SortedNumericSortField("field", SortField.Type.INT, false));
    iwc.setIndexSort(indexSort);

    RandomIndexWriter writer = new RandomIndexWriter(random(), dir, iwc);
    writer.addDocument(createDocument("field", -80));

    DirectoryReader reader = writer.getReader();
    IndexSearcher searcher = newSearcher(reader);

    // Test ranges consisting of one value.
    assertEquals(1, searcher.count(createQuery("field", -80, -80)));

    writer.close();
    reader.close();
    dir.close();
  }
{noformat}
 

The expectation is that *IndexSortSortedNumericDocValuesRangeQuery* should not 
fail with a ClassCastException but should correctly convert the numeric values.


[jira] [Commented] (LUCENE-10466) IndexSortSortedNumericDocValuesRangeQuery unconditionally assumes the usage of the LONG-encoded SortField

2022-03-14 Thread Andriy Redko (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506464#comment-17506464
 ] 

Andriy Redko commented on LUCENE-10466:
---

[~jpountz] does the issue make sense to you? I would be happy to work on a pull 
request to fix it, thank you.

> IndexSortSortedNumericDocValuesRangeQuery unconditionally assumes the usage 
> of the LONG-encoded SortField
> -
>
> Key: LUCENE-10466
> URL: https://issues.apache.org/jira/browse/LUCENE-10466
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/query/scoring
>Affects Versions: 9.0
>Reporter: Andriy Redko
>Priority: Major
>






[GitHub] [lucene] mocobeta commented on a change in pull request #746: Sanity check on start javaw

2022-03-14 Thread GitBox


mocobeta commented on a change in pull request #746:
URL: https://github.com/apache/lucene/pull/746#discussion_r826294904



##
File path: lucene/distribution/src/binary-release/bin/luke.cmd
##
@@ -18,21 +18,17 @@
 SETLOCAL
 SET MODULES=%~dp0..
 
-IF DEFINED LAUNCH_CMD GOTO testing
 REM Windows 'start' command takes the first quoted argument to be the title of 
the started window. Since we
 REM quote the LAUNCH_CMD (because it can contain spaces), it misinterprets it 
as the title and fails to run.
 REM force the window title here.
 SET LAUNCH_START=start "Lucene Luke"
+
+IF DEFINED LAUNCH_CMD GOTO testing
 SET LAUNCH_CMD=javaw
 SET LAUNCH_OPTS=
 goto launch
 
 :testing
-REM For distribution testing we don't use start and pass an explicit java 
command path,
-REM This is required because otherwise we can't block on luke invocation and 
can't intercept
-REM the return status. We also force UTF-8 encoding so that we don't have to 
interpret the output in
-REM an unknown local platform encoding.

Review comment:
   This comment was wrongly removed; should be reverted.







[jira] [Updated] (LUCENE-10466) IndexSortSortedNumericDocValuesRangeQuery unconditionally assumes the usage of the LONG-encoded SortField

2022-03-14 Thread Andriy Redko (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andriy Redko updated LUCENE-10466:
--
Priority: Minor  (was: Major)

> IndexSortSortedNumericDocValuesRangeQuery unconditionally assumes the usage 
> of the LONG-encoded SortField
> -
>
> Key: LUCENE-10466
> URL: https://issues.apache.org/jira/browse/LUCENE-10466
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/query/scoring
>Affects Versions: 9.0
>Reporter: Andriy Redko
>Priority: Minor
>






[jira] [Commented] (LUCENE-10461) Luke: Windows launch script passes integration tests but fails to run

2022-03-14 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506488#comment-17506488
 ] 

Dawid Weiss commented on LUCENE-10461:
--

Just to be clear - I did take a look at the patch... I know what you're trying 
to do but something in me says it's not going to work reliably (the timeouts 
awaiting the appearance of the log file, the wait for the buffer flush from a 
subprocess). I think it's far less trappy to just use the blocking java call in 
the script for integration tests...

If you're convinced this is the way to go then I'm not going to stand in the 
way... I'd perhaps suggest to at least make the start command blocking (add the 
/wait option in the script just for testing) - this will eliminate the need to 
wait for the log file to appear (as start will be synchronous then).
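
The trade-off discussed here can be illustrated from the test side in Java: when the launched command blocks until the child exits, the harness can wait on the process handle with a timeout and read the exit code, with no polling for a log file. This is a minimal sketch assuming a hypothetical helper `runBlocking`; it is not the code in `TestScripts`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class BlockingLaunchSketch {
  /**
   * Starts the command, blocks until it exits or the timeout elapses, and
   * returns the exit code. Output is discarded so the child cannot stall on a
   * full pipe.
   */
  public static int runBlocking(List<String> command, long timeoutSeconds) {
    try {
      Process process =
          new ProcessBuilder(command)
              .redirectErrorStream(true)
              .redirectOutput(ProcessBuilder.Redirect.DISCARD)
              .start();
      if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
        process.destroyForcibly();
        throw new AssertionError("Forked process did not terminate in the expected time");
      }
      return process.exitValue();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }
}
```

The timeout still depends on the wall clock, but only as an upper bound on a blocking call; the test no longer has to guess when a log file will appear or when a subprocess has flushed its buffers.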

> Luke: Windows launch script passes integration tests but fails to run
> -
>
> Key: LUCENE-10461
> URL: https://issues.apache.org/jira/browse/LUCENE-10461
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Major
> Fix For: 9.1, 10.0 (main)
>
> Attachments: image-2022-03-13-11-18-34-704.png
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> PR at https://github.com/apache/lucene/pull/743






[GitHub] [lucene] Yuti-G opened a new pull request #747: LUCENE-10325: Add getTopDims functionality to Facets

2022-03-14 Thread GitBox


Yuti-G opened a new pull request #747:
URL: https://github.com/apache/lucene/pull/747


   # Description
   This change 
   * adds a new API, getTopDims, to Facets that lets users specify the number 
of dims and children they want and returns only those dims and children.
   * overrides getTopDims in SortedSetDocValuesFacetCounts to optimize how 
dimCount is obtained, and to return FacetResult and resolve child paths 
for only the requested dims.
   
   # Solution
   
   * Implement a default getTopDims function in the Facets class.
   * Override getTopDims and refactor the getPathResult function in 
SortedSetDocValuesFacetCounts to get dimCount (aggregated dim values) more 
efficiently by checking whether dimCount was populated at indexing time 
(setRequireDimCount == true) before accumulating dimCount in a while loop. 
   * Use a priority queue to store the requested top n dims, then populate labels 
and return FacetResult for those dims.
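
   The priority-queue step can be sketched in isolation as a bounded min-heap over per-dimension counts. This is an illustration only: `TopDimsSketch`, `topDims`, and the plain `Map` of counts are hypothetical stand-ins and do not reflect the Facets API or the code in this pull request.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class TopDimsSketch {
  /**
   * Returns the names of the topN dimensions with the highest counts, best
   * first. Ties are broken by dimension name so the result is deterministic.
   */
  public static List<String> topDims(Map<String, Integer> dimCounts, int topN) {
    // The head of the queue is always the weakest entry kept so far.
    Comparator<Map.Entry<String, Integer>> weakestFirst =
        (a, b) -> {
          int byCount = Integer.compare(a.getValue(), b.getValue());
          return byCount != 0 ? byCount : b.getKey().compareTo(a.getKey());
        };
    PriorityQueue<Map.Entry<String, Integer>> queue = new PriorityQueue<>(weakestFirst);
    for (Map.Entry<String, Integer> entry : dimCounts.entrySet()) {
      queue.offer(entry);
      if (queue.size() > topN) {
        queue.poll(); // evict the weakest entry to keep at most topN
      }
    }
    List<String> result = new ArrayList<>();
    while (!queue.isEmpty()) {
      result.add(queue.poll().getKey());
    }
    Collections.reverse(result); // the queue drains weakest first
    return result;
  }
}
```

   Keeping the queue bounded at n entries means only the requested dimensions need their labels resolved afterwards, which is the saving the description points at.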
   
   # Tests
   
   Added new tests for both the default and overridden implementations of 
getTopDims.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [X] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/lucene/HowToContribute) and my code 
conforms to the standards described there to the best of my ability.
   - [X] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [X] I have given Lucene maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [X] I have developed this patch against the `main` branch.
   - [X] I have run `./gradlew check`.
   - [X] I have added tests for my changes.
   





[jira] [Commented] (LUCENE-10460) Delegating DocIdSetIterator could be replaced to DocIdSetIterator#range(int minDoc, int maxDoc) in IndexSortSortedNumericDocValuesRangeQuery

2022-03-14 Thread Julie Tibshirani (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506528#comment-17506528
 ] 

Julie Tibshirani commented on LUCENE-10460:
---

It indeed seems okay to use a simple DocIdSetIterator#range in this case. I'm 
wondering about the motivation for specializing this case though, especially 
since the logic is already pretty complex. Have you seen it make a latency 
difference when there are missing values?  In the case with no missing values I 
don't think it will help much, since iterating dense doc values is already 
optimized (see DenseNumericDocValues).
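
For readers unfamiliar with the iterator being discussed, the behaviour of a range iterator can be sketched without Lucene as follows. The class below is a hypothetical stand-in that only imitates the contract of iterating every doc ID in `[minDoc, maxDoc)`; it is not Lucene's implementation.

```java
public class RangeIteratorSketch {
  /** Sentinel returned once the iterator is exhausted. */
  public static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private final int minDoc;
  private final int maxDoc;
  private int doc = -1;

  public RangeIteratorSketch(int minDoc, int maxDoc) {
    if (minDoc < 0 || minDoc >= maxDoc) {
      throw new IllegalArgumentException("need 0 <= minDoc < maxDoc");
    }
    this.minDoc = minDoc;
    this.maxDoc = maxDoc;
  }

  public int docID() {
    return doc;
  }

  public int nextDoc() {
    return advance(doc + 1);
  }

  /** Moves to the first doc ID at or after target that lies inside the range. */
  public int advance(int target) {
    if (doc == NO_MORE_DOCS || target >= maxDoc) {
      doc = NO_MORE_DOCS;
    } else {
      doc = Math.max(target, minDoc);
    }
    return doc;
  }

  /** Upper bound on the number of documents this iterator can return. */
  public long cost() {
    return (long) maxDoc - minDoc;
  }
}
```

Every call is constant-time arithmetic with no delegate to consult, which is why it is only a safe substitute when every document in the range is known to have a value.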

> Delegating DocIdSetIterator could be replaced to DocIdSetIterator#range(int 
> minDoc, int maxDoc) in IndexSortSortedNumericDocValuesRangeQuery
> 
>
> Key: LUCENE-10460
> URL: https://issues.apache.org/jira/browse/LUCENE-10460
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Lu Xugang
>Priority: Trivial
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While taking advantage of index sort in 
> IndexSortSortedNumericDocValuesRangeQuery, if MissingValue is disabled, all 
> documents between firstDoc and lastDoc must contain DocValues. So 
> in BoundedDocSetIdIterator#advance(int), could the delegating DocIdSetIterator 
> be replaced with DocIdSetIterator#range(int minDoc, int maxDoc)?






[jira] [Commented] (LUCENE-10466) IndexSortSortedNumericDocValuesRangeQuery unconditionally assumes the usage of the LONG-encoded SortField

2022-03-14 Thread Julie Tibshirani (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506530#comment-17506530
 ] 

Julie Tibshirani commented on LUCENE-10466:
---

Thank you [~reta] for reporting this. I had noticed the same thing when 
integrating IndexSortSortedNumericDocValuesRangeQuery into Elasticsearch but 
forgot to follow up!! I would be happy to help review a PR if you're up for it.

For context, how did you run into this? How does it relate to deletions in 
nearest vector search (https://issues.apache.org/jira/browse/LUCENE-10040)? 

> IndexSortSortedNumericDocValuesRangeQuery unconditionally assumes the usage 
> of the LONG-encoded SortField
> -
>
> Key: LUCENE-10466
> URL: https://issues.apache.org/jira/browse/LUCENE-10466
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/query/scoring
>Affects Versions: 9.0
>Reporter: Andriy Redko
>Priority: Minor
>
> We have run into this issue while migrating to OpenSearch and making changes 
> to accommodate https://issues.apache.org/jira/browse/LUCENE-10040. It turned 
> out that *IndexSortSortedNumericDocValuesRangeQuery* unconditionally assumes 
> the usage of the LONG-encoded {*}SortField{*}, as could be seen inside 
> *static ValueComparator loadComparator* method
> {noformat}
>     @SuppressWarnings("unchecked")
>     FieldComparator<Long> fieldComparator = (FieldComparator<Long>) sortField.getComparator(1, 0);
>     fieldComparator.setTopValue(topValue);
> {noformat}
>
> Using the numeric range query (on a sorted index) with anything but LONG
> ends up with a ClassCastException:
> {noformat}
> java.lang.ClassCastException: class java.lang.Long cannot be cast to class java.lang.Integer (java.lang.Long and java.lang.Integer are in module java.base of loader 'bootstrap')
>     at org.apache.lucene.search.comparators.IntComparator.setTopValue(IntComparator.java:29)
>     at org.apache.lucene.sandbox.search.IndexSortSortedNumericDocValuesRangeQuery.loadComparator(IndexSortSortedNumericDocValuesRangeQuery.java:251)
>     at org.apache.lucene.sandbox.search.IndexSortSortedNumericDocValuesRangeQuery.getDocIdSetIterator(IndexSortSortedNumericDocValuesRangeQuery.java:206)
>     at org.apache.lucene.sandbox.search.IndexSortSortedNumericDocValuesRangeQuery$1.scorer(IndexSortSortedNumericDocValuesRangeQuery.java:170)
> {noformat}
> Simple test case to reproduce (for TestIndexSortSortedNumericDocValuesRangeQuery):
> {noformat}
>   public void testIndexSortDocValuesWithIntRange() throws Exception {
>     Directory dir = newDirectory();
>     IndexWriterConfig iwc = new IndexWriterConfig(new MockAnalyzer(random()));
>     Sort indexSort = new Sort(new SortedNumericSortField("field", SortField.Type.INT, false));
>     iwc.setIndexSort(indexSort);
>
>     RandomIndexWriter writer = new RandomIndexWriter(random(), dir, iwc);
>     writer.addDocument(createDocument("field", -80));
>
>     DirectoryReader reader = writer.getReader();
>     IndexSearcher searcher = newSearcher(reader);
>
>     // Test ranges consisting of one value.
>     assertEquals(1, searcher.count(createQuery("field", -80, -80)));
>
>     writer.close();
>     reader.close();
>     dir.close();
>   }
> {noformat}
>  
> The expectation is that *IndexSortSortedNumericDocValuesRangeQuery* should 
> not fail with class cast but correctly convert the numeric values.
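
The failure mode described in the quoted report can be reproduced without Lucene. The following is only a minimal sketch: `TopValueHolder`, `IntHolder`, `CastSketch` and `Kind` are hypothetical names invented for illustration, not Lucene classes, and the conversion shown is one possible approach rather than the fix that was eventually committed. It shows why pushing a boxed `Long` through an unchecked cast fails on an `Integer`-typed holder, and how boxing the bound according to the sort field type avoids it.

```java
// Hypothetical stand-in for a typed comparator; this is not Lucene's FieldComparator.
interface TopValueHolder<T> {
  void setTopValue(T value);

  T topValue();
}

// Plays the role of an int-typed comparator such as the one in the stack trace.
final class IntHolder implements TopValueHolder<Integer> {
  private int top;

  @Override
  public void setTopValue(Integer value) {
    top = value;
  }

  @Override
  public Integer topValue() {
    return top;
  }
}

final class CastSketch {
  // Hypothetical mirror of the numeric sort field types involved.
  enum Kind { INT, LONG }

  // Reproduces the failure: a boxed Long is pushed through an unchecked cast
  // into a holder whose type parameter is Integer.
  @SuppressWarnings("unchecked")
  static boolean uncheckedCallFails(TopValueHolder<?> holder, long value) {
    try {
      ((TopValueHolder<Object>) holder).setTopValue(Long.valueOf(value));
      return false;
    } catch (ClassCastException e) {
      return true; // the compiler-generated bridge method rejects the Long
    }
  }

  // Boxes the long bound as the type the holder expects.
  // Math.toIntExact throws ArithmeticException if the value does not fit in an int.
  static Object box(Kind kind, long value) {
    switch (kind) {
      case INT:
        return Math.toIntExact(value);
      case LONG:
        return value;
      default:
        throw new IllegalArgumentException("unsupported kind: " + kind);
    }
  }

  @SuppressWarnings("unchecked")
  static void setConverted(TopValueHolder<?> holder, Kind kind, long value) {
    ((TopValueHolder<Object>) holder).setTopValue(box(kind, value));
  }
}
```

A real range query would probably need to clamp bounds that fall outside the int range instead of throwing; that detail is left out of this sketch.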



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-10466) IndexSortSortedNumericDocValuesRangeQuery unconditionally assumes the usage of the LONG-encoded SortField

2022-03-14 Thread Julie Tibshirani (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506530#comment-17506530
 ] 

Julie Tibshirani edited comment on LUCENE-10466 at 3/14/22, 8:40 PM:
-

Thank you [~reta] for reporting this. I had noticed the same thing when 
integrating IndexSortSortedNumericDocValuesRangeQuery into Elasticsearch but 
forgot to follow up!! I would be happy to help review a PR if you're up for it.

For context, how did you run into this? How does it relate to deletions in 
nearest vector search (https://issues.apache.org/jira/browse/LUCENE-10040) ?


was (Author: julietibs):
Thank you [~reta] for reporting this. I had noticed the same thing when 
integrating IndexSortSortedNumericDocValuesRangeQuery into Elasticsearch but 
forgot to follow up!! I would be happy to help review a PR if you're up for it.

For context, how did you run into this? How does it relate to deletions in 
nearest vector search (https://issues.apache.org/jira/browse/LUCENE-10040)?

> IndexSortSortedNumericDocValuesRangeQuery unconditionally assumes the usage 
> of the LONG-encoded SortField
> -
>
> Key: LUCENE-10466
> URL: https://issues.apache.org/jira/browse/LUCENE-10466
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/query/scoring
>Affects Versions: 9.0
>Reporter: Andriy Redko
>Priority: Minor
>
> We have run into this issue while migrating to OpenSearch and making changes 
> to accommodate https://issues.apache.org/jira/browse/LUCENE-10040. It turned 
> out that *IndexSortSortedNumericDocValuesRangeQuery* unconditionally assumes 
> the usage of the LONG-encoded {*}SortField{*}, as could be seen inside 
> *static ValueComparator loadComparator* method
> {noformat}
>     @SuppressWarnings("unchecked")
>     FieldComparator<Long> fieldComparator = (FieldComparator<Long>) sortField.getComparator(1, 0);
>     fieldComparator.setTopValue(topValue);
> {noformat}
>
> Using the numeric range query (on a sorted index) with anything but LONG
> ends up with a ClassCastException:
> {noformat}
> java.lang.ClassCastException: class java.lang.Long cannot be cast to class java.lang.Integer (java.lang.Long and java.lang.Integer are in module java.base of loader 'bootstrap')
>     at org.apache.lucene.search.comparators.IntComparator.setTopValue(IntComparator.java:29)
>     at org.apache.lucene.sandbox.search.IndexSortSortedNumericDocValuesRangeQuery.loadComparator(IndexSortSortedNumericDocValuesRangeQuery.java:251)
>     at org.apache.lucene.sandbox.search.IndexSortSortedNumericDocValuesRangeQuery.getDocIdSetIterator(IndexSortSortedNumericDocValuesRangeQuery.java:206)
>     at org.apache.lucene.sandbox.search.IndexSortSortedNumericDocValuesRangeQuery$1.scorer(IndexSortSortedNumericDocValuesRangeQuery.java:170)
> {noformat}
> Simple test case to reproduce (for TestIndexSortSortedNumericDocValuesRangeQuery):
> {noformat}
>   public void testIndexSortDocValuesWithIntRange() throws Exception {
>     Directory dir = newDirectory();
>     IndexWriterConfig iwc = new IndexWriterConfig(new MockAnalyzer(random()));
>     Sort indexSort = new Sort(new SortedNumericSortField("field", SortField.Type.INT, false));
>     iwc.setIndexSort(indexSort);
>
>     RandomIndexWriter writer = new RandomIndexWriter(random(), dir, iwc);
>     writer.addDocument(createDocument("field", -80));
>
>     DirectoryReader reader = writer.getReader();
>     IndexSearcher searcher = newSearcher(reader);
>
>     // Test ranges consisting of one value.
>     assertEquals(1, searcher.count(createQuery("field", -80, -80)));
>
>     writer.close();
>     reader.close();
>     dir.close();
>   }
> {noformat}
>  
> The expectation is that *IndexSortSortedNumericDocValuesRangeQuery* should 
> not fail with class cast but correctly convert the numeric values.






[GitHub] [lucene] jtibshirani merged pull request #745: Revert "LUCENE-10385: Implement Weight#count on IndexSortSortedNumeri…

2022-03-14 Thread GitBox


jtibshirani merged pull request #745:
URL: https://github.com/apache/lucene/pull/745


   





[jira] [Commented] (LUCENE-10385) Implement Weight#count on IndexSortSortedNumericDocValuesRangeQuery.

2022-03-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506531#comment-17506531
 ] 

ASF subversion and git services commented on LUCENE-10385:
--

Commit a6114b532a273e370528675d551d3ddfa02f4679 in lucene's branch 
refs/heads/branch_9_1 from Luca Cavanna
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=a6114b5 ]

Revert "LUCENE-10385: Implement Weight#count on IndexSortSortedNumeri… (#745)

In LUCENE-10458 we identified a bug in the logic. We're reverting on the 9.1
branch to avoid holding up the release.


> Implement Weight#count on IndexSortSortedNumericDocValuesRangeQuery.
> 
>
> Key: LUCENE-10385
> URL: https://issues.apache.org/jira/browse/LUCENE-10385
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 9.1
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> This query can count matches by computing the first and last matching doc IDs 
> using binary search.
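
As a rough illustration of the idea in the quoted description, counting over values that are already sorted reduces to two binary searches. This is a sketch over a plain `long[]`, assuming every document has exactly one value; `SortedRangeCount` is a hypothetical name and none of this is the Lucene implementation.

```java
final class SortedRangeCount {
  // Index of the first element >= target (sorted.length if there is none).
  static int lowerBound(long[] sorted, long target) {
    int lo = 0;
    int hi = sorted.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (sorted[mid] < target) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  // Index of the first element > target (sorted.length if there is none).
  static int upperBound(long[] sorted, long target) {
    int lo = 0;
    int hi = sorted.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (sorted[mid] <= target) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  // Number of elements in the inclusive range [lower, upper], in O(log n).
  static int count(long[] sorted, long lower, long upper) {
    if (lower > upper) {
      return 0;
    }
    return upperBound(sorted, upper) - lowerBound(sorted, lower);
  }
}
```

With an index sorted on the field, array positions correspond to doc IDs within a segment, which is why the first and last matching doc IDs are enough to derive a count.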






[jira] [Commented] (LUCENE-10458) BoundedDocSetIdIterator may supply error count in Weigth#count(LeafReaderContext) when missingValue enables

2022-03-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506532#comment-17506532
 ] 

ASF subversion and git services commented on LUCENE-10458:
--

Commit a6114b532a273e370528675d551d3ddfa02f4679 in lucene's branch 
refs/heads/branch_9_1 from Luca Cavanna
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=a6114b5 ]

Revert "LUCENE-10385: Implement Weight#count on IndexSortSortedNumeri… (#745)

In LUCENE-10458 we identified a bug in the logic. We're reverting on the 9.1
branch to avoid holding up the release.


> BoundedDocSetIdIterator may supply error count in 
> Weigth#count(LeafReaderContext) when missingValue enables
> ---
>
> Key: LUCENE-10458
> URL: https://issues.apache.org/jira/browse/LUCENE-10458
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Lu Xugang
>Priority: Major
> Fix For: 9.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When IndexSortSortedNumericDocValuesRangeQuery can take advantage of the index 
> sort, Weight#count uses BoundedDocSetIdIterator's lastDoc and firstDoc to 
> calculate the count. But if missingValue is enabled, documents that do not 
> contain doc values may be included in the count.
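
The problem described above can be modeled in a few lines. This is a simplified sketch, not the Lucene code: `MissingValueCount` and its methods are hypothetical, documents are represented as a `Long[]` where `null` means "no value", and the index sort is assumed to substitute `missingValue` for such documents. Counting by position in the sorted order then includes documents that have no value at all whenever `missingValue` falls inside the queried range.

```java
import java.util.Arrays;
import java.util.Comparator;

final class MissingValueCount {
  // Sort key under the assumed index sort: docs without a value use missingValue.
  static long key(Long value, long missingValue) {
    return value == null ? missingValue : value;
  }

  // Counts by position (last - first + 1) in the sorted order, the way a
  // firstDoc/lastDoc based count would.
  static int positionalCount(Long[] docs, long missingValue, long lower, long upper) {
    Long[] sorted = docs.clone();
    Arrays.sort(sorted, Comparator.comparingLong(v -> key(v, missingValue)));
    int first = 0;
    while (first < sorted.length && key(sorted[first], missingValue) < lower) {
      first++;
    }
    int last = sorted.length - 1;
    while (last >= first && key(sorted[last], missingValue) > upper) {
      last--;
    }
    return last - first + 1;
  }

  // Counts only documents that actually carry a value in [lower, upper].
  static int exactCount(Long[] docs, long lower, long upper) {
    int n = 0;
    for (Long v : docs) {
      if (v != null && v >= lower && v <= upper) {
        n++;
      }
    }
    return n;
  }
}
```

For documents `{1, null, 5, null, 9}` with `missingValue = 5` and the range `[4, 6]`, the positional count covers three slots while only one document really holds a matching value.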






[jira] [Updated] (LUCENE-10466) IndexSortSortedNumericDocValuesRangeQuery unconditionally assumes the usage of the LONG-encoded SortField

2022-03-14 Thread Andriy Redko (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andriy Redko updated LUCENE-10466:
--
Description: 
We have run into this issue while migrating to OpenSearch and making changes to 
accommodate https://issues.apache.org/jira/browse/LUCENE-10087. It turned out 
that *IndexSortSortedNumericDocValuesRangeQuery* unconditionally assumes the 
usage of the LONG-encoded {*}SortField{*}, as could be seen inside *static 
ValueComparator loadComparator* method
{noformat}
    @SuppressWarnings("unchecked")
    FieldComparator<Long> fieldComparator = (FieldComparator<Long>) sortField.getComparator(1, 0);
    fieldComparator.setTopValue(topValue);
{noformat}

Using the numeric range query (on a sorted index) with anything but LONG 
ends up with a ClassCastException:
{noformat}
java.lang.ClassCastException: class java.lang.Long cannot be cast to class java.lang.Integer (java.lang.Long and java.lang.Integer are in module java.base of loader 'bootstrap')
    at org.apache.lucene.search.comparators.IntComparator.setTopValue(IntComparator.java:29)
    at org.apache.lucene.sandbox.search.IndexSortSortedNumericDocValuesRangeQuery.loadComparator(IndexSortSortedNumericDocValuesRangeQuery.java:251)
    at org.apache.lucene.sandbox.search.IndexSortSortedNumericDocValuesRangeQuery.getDocIdSetIterator(IndexSortSortedNumericDocValuesRangeQuery.java:206)
    at org.apache.lucene.sandbox.search.IndexSortSortedNumericDocValuesRangeQuery$1.scorer(IndexSortSortedNumericDocValuesRangeQuery.java:170)
{noformat}
Simple test case to reproduce (for TestIndexSortSortedNumericDocValuesRangeQuery):
{noformat}
  public void testIndexSortDocValuesWithIntRange() throws Exception {
    Directory dir = newDirectory();
    IndexWriterConfig iwc = new IndexWriterConfig(new MockAnalyzer(random()));
    Sort indexSort = new Sort(new SortedNumericSortField("field", SortField.Type.INT, false));
    iwc.setIndexSort(indexSort);

    RandomIndexWriter writer = new RandomIndexWriter(random(), dir, iwc);
    writer.addDocument(createDocument("field", -80));

    DirectoryReader reader = writer.getReader();
    IndexSearcher searcher = newSearcher(reader);

    // Test ranges consisting of one value.
    assertEquals(1, searcher.count(createQuery("field", -80, -80)));

    writer.close();
    reader.close();
    dir.close();
  }
{noformat}
 

The expectation is that *IndexSortSortedNumericDocValuesRangeQuery* should not 
fail with class cast but correctly convert the numeric values.

  was:
We have run into this issue while migrating to OpenSearch and making changes to 
accommodate https://issues.apache.org/jira/browse/LUCENE-10040. It turned out 
that *IndexSortSortedNumericDocValuesRangeQuery* unconditionally assumes the 
usage of the LONG-encoded {*}SortField{*}, as could be seen inside *static 
ValueComparator loadComparator* method
{noformat}
    @SuppressWarnings("unchecked")
    FieldComparator<Long> fieldComparator = (FieldComparator<Long>) sortField.getComparator(1, 0);
    fieldComparator.setTopValue(topValue);
{noformat}

Using the numeric range query (on a sorted index) with anything but LONG 
ends up with a ClassCastException:
{noformat}
java.lang.ClassCastException: class java.lang.Long cannot be cast to class java.lang.Integer (java.lang.Long and java.lang.Integer are in module java.base of loader 'bootstrap')
    at org.apache.lucene.search.comparators.IntComparator.setTopValue(IntComparator.java:29)
    at org.apache.lucene.sandbox.search.IndexSortSortedNumericDocValuesRangeQuery.loadComparator(IndexSortSortedNumericDocValuesRangeQuery.java:251)
    at org.apache.lucene.sandbox.search.IndexSortSortedNumericDocValuesRangeQuery.getDocIdSetIterator(IndexSortSortedNumericDocValuesRangeQuery.java:206)
    at org.apache.lucene.sandbox.search.IndexSortSortedNumericDocValuesRangeQuery$1.scorer(IndexSortSortedNumericDocValuesRangeQuery.java:170)
{noformat}
Simple test case to reproduce (for TestIndexSortSortedNumericDocValuesRangeQuery):
{noformat}
  public void testIndexSortDocValuesWithIntRange() throws Exception {
    Directory dir = newDirectory();
    IndexWriterConfig iwc = new IndexWriterConfig(new MockAnalyzer(random()));
    Sort indexSort = new Sort(new SortedNumericSortField("field", SortField.Type.INT, false));
    iwc.setIndexSort(indexSort);

    RandomIndexWriter writer = new RandomIndexWriter(random(), dir, iwc);
    writer.addDocument(createDocument("field", -80));

    DirectoryReader reader = writer.getReader();
    IndexSearcher searcher = newSearcher(reader);

    // Test ranges consisting of one value.
    assertEquals(1, searcher.count(createQuery("field", -80, -80)));

    writer.close();
    reader.close();
    dir.close();
  }
{noformat}
 

The expectation is that *

[jira] [Commented] (LUCENE-10466) IndexSortSortedNumericDocValuesRangeQuery unconditionally assumes the usage of the LONG-encoded SortField

2022-03-14 Thread Andriy Redko (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506548#comment-17506548
 ] 

Andriy Redko commented on LUCENE-10466:
---

Thanks a lot, [~julietibs], working on the pull request :) I would like to 
apologize for pasting the wrong issue; 
https://issues.apache.org/jira/browse/LUCENE-10087 is the one.

> IndexSortSortedNumericDocValuesRangeQuery unconditionally assumes the usage 
> of the LONG-encoded SortField
> -
>
> Key: LUCENE-10466
> URL: https://issues.apache.org/jira/browse/LUCENE-10466
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/query/scoring
>Affects Versions: 9.0
>Reporter: Andriy Redko
>Priority: Minor
>
> We have run into this issue while migrating to OpenSearch and making changes 
> to accommodate https://issues.apache.org/jira/browse/LUCENE-10087. It turned 
> out that *IndexSortSortedNumericDocValuesRangeQuery* unconditionally assumes 
> the usage of the LONG-encoded {*}SortField{*}, as could be seen inside 
> *static ValueComparator loadComparator* method
> {noformat}
>     @SuppressWarnings("unchecked")
>     FieldComparator<Long> fieldComparator = (FieldComparator<Long>) sortField.getComparator(1, 0);
>     fieldComparator.setTopValue(topValue);
> {noformat}
>
> Using the numeric range query (on a sorted index) with anything but LONG
> ends up with a ClassCastException:
> {noformat}
> java.lang.ClassCastException: class java.lang.Long cannot be cast to class java.lang.Integer (java.lang.Long and java.lang.Integer are in module java.base of loader 'bootstrap')
>     at org.apache.lucene.search.comparators.IntComparator.setTopValue(IntComparator.java:29)
>     at org.apache.lucene.sandbox.search.IndexSortSortedNumericDocValuesRangeQuery.loadComparator(IndexSortSortedNumericDocValuesRangeQuery.java:251)
>     at org.apache.lucene.sandbox.search.IndexSortSortedNumericDocValuesRangeQuery.getDocIdSetIterator(IndexSortSortedNumericDocValuesRangeQuery.java:206)
>     at org.apache.lucene.sandbox.search.IndexSortSortedNumericDocValuesRangeQuery$1.scorer(IndexSortSortedNumericDocValuesRangeQuery.java:170)
> {noformat}
> Simple test case to reproduce (for TestIndexSortSortedNumericDocValuesRangeQuery):
> {noformat}
>   public void testIndexSortDocValuesWithIntRange() throws Exception {
>     Directory dir = newDirectory();
>     IndexWriterConfig iwc = new IndexWriterConfig(new MockAnalyzer(random()));
>     Sort indexSort = new Sort(new SortedNumericSortField("field", SortField.Type.INT, false));
>     iwc.setIndexSort(indexSort);
>
>     RandomIndexWriter writer = new RandomIndexWriter(random(), dir, iwc);
>     writer.addDocument(createDocument("field", -80));
>
>     DirectoryReader reader = writer.getReader();
>     IndexSearcher searcher = newSearcher(reader);
>
>     // Test ranges consisting of one value.
>     assertEquals(1, searcher.count(createQuery("field", -80, -80)));
>
>     writer.close();
>     reader.close();
>     dir.close();
>   }
> {noformat}
>  
> The expectation is that *IndexSortSortedNumericDocValuesRangeQuery* should 
> not fail with class cast but correctly convert the numeric values.






[GitHub] [lucene] gsmiller commented on a change in pull request #747: LUCENE-10325: Add getTopDims functionality to Facets

2022-03-14 Thread GitBox


gsmiller commented on a change in pull request #747:
URL: https://github.com/apache/lucene/pull/747#discussion_r826440044



##
File path: lucene/facet/src/java/org/apache/lucene/facet/Facets.java
##
@@ -48,4 +48,13 @@ public abstract FacetResult getTopChildren(int topN, String dim, String... path)
    * indexed, for example depending on the type of document.
    */
   public abstract List<FacetResult> getAllDims(int topN) throws IOException;
+
+  /**
+   * Returns labels for topN dimensions and their topNChildren sorted by the number of hits that
+   * dimension matched
+   */
+  public List<FacetResult> getTopDims(int topNDims, int topNChildren) throws IOException {

Review comment:
   I like the approach of providing a default implementation here so 
existing sub-classes will be fully backwards-compatible (and they don't need to 
worry about providing an implementation if this suits their needs). It might be 
nice to mention explicitly in the javadoc that sub-classes may _want_ to 
override this implementation though with a more efficient one if they're able. 
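
The pattern being praised here can be sketched as follows. The body of the PR's default `getTopDims` is not shown in this thread, so this is an assumption for illustration only: the sketch supposes that the default delegates to `getAllDims` and truncates the result. `DimSource` and `FixedDims` are hypothetical names, and dimensions are reduced to plain strings.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for an abstract facets class.
abstract class DimSource {
  // Existing abstract method that every subclass already implements.
  abstract List<String> getAllDims(int topNChildren);

  // New non-abstract method: existing subclasses keep compiling and inherit this
  // behavior, while subclasses with a cheaper way to find the top dimensions
  // can override it.
  List<String> getTopDims(int topNDims, int topNChildren) {
    List<String> all = getAllDims(topNChildren);
    return all.subList(0, Math.min(topNDims, all.size()));
  }
}

// A subclass written before getTopDims existed; it needs no changes.
final class FixedDims extends DimSource {
  @Override
  List<String> getAllDims(int topNChildren) {
    return Arrays.asList("b", "a");
  }
}
```

The trade-off is that the inherited default still computes every dimension before truncating, which is the reason a javadoc note inviting more efficient overrides is useful.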

##
File path: 
lucene/facet/src/test/org/apache/lucene/facet/sortedset/TestSortedSetDocValuesFacets.java
##
@@ -104,6 +104,65 @@ public void testBasic() throws Exception {
   "dim=b path=[] value=2 childCount=2\n  buzz (2)\n  baz (1)\n",
   facets.getTopChildren(10, "b").toString());
 
+  // test getAllDims

Review comment:
   Thank you for adding so much testing, including coverage for existing 
`getAllDims` functionality!

##
File path: 
lucene/facet/src/test/org/apache/lucene/facet/sortedset/TestSortedSetDocValuesFacets.java
##
@@ -104,6 +104,65 @@ public void testBasic() throws Exception {
   "dim=b path=[] value=2 childCount=2\n  buzz (2)\n  baz (1)\n",
   facets.getTopChildren(10, "b").toString());
 
+  // test getAllDims
+  List results = facets.getAllDims(10);
+  assertEquals(2, results.size());
+  assertEquals(
+  "dim=b path=[] value=2 childCount=2\n  buzz (2)\n  baz (1)\n",
+  results.get(0).toString());
+  assertEquals(
+  "dim=a path=[] value=-1 childCount=3\n  foo (2)\n  bar (1)\n  
zoo (1)\n",
+  results.get(1).toString());
+
+  // test getAllDims(1, 0) with topN = 0
+  expectThrows(

Review comment:
   Ouch! This seems like a poor (existing) experience for users. Would you 
mind creating a Jira to track this? We should probably change this behavior to 
throw an `IllegalArgumentException` at least instead of just an NPE. Thanks for 
uncovering this!
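
One way to get the suggested behavior is an explicit argument check at the top of the method. This is a minimal sketch with a hypothetical helper (`TopNCheck.requirePositive` does not exist in Lucene), and the message wording is an assumption.

```java
final class TopNCheck {
  // Fails fast with a descriptive message instead of letting a later
  // NullPointerException surface from deeper in the call.
  static int requirePositive(int topN) {
    if (topN <= 0) {
      throw new IllegalArgumentException("topN must be > 0 (got: " + topN + ")");
    }
    return topN;
  }
}
```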

##
File path: 
lucene/facet/src/test/org/apache/lucene/facet/range/TestRangeFacetCounts.java
##
@@ -243,6 +243,24 @@ public void testLongGetAllDims() throws Exception {
 "dim=field path=[] value=22 childCount=5\n  less than 10 (10)\n  less 
than or equal to 10 (11)\n  over 90 (9)\n  90 or above (10)\n  over 1000 (1)\n",
 result.get(0).toString());
 
+// test getAllDims(1)
+List test1Child = facets.getAllDims(1);
+assertEquals(1, test1Child.size());
+assertEquals(
+"dim=field path=[] value=22 childCount=5\n  less than 10 (10)\n  less 
than or equal to 10 (11)\n  over 90 (9)\n  90 or above (10)\n  over 1000 (1)\n",
+test1Child.get(0).toString());
+
+// test default implementation of getTopDims
+List topNDimsResult = facets.getTopDims(1, 1);
+assertEquals(1, topNDimsResult.size());
+assertEquals(

Review comment:
   minor: Since `FacetResult` properly implements `equals`, you could just 
do `assertEquals(test1Child, topNDimsResult)`. This makes it slightly more 
obvious to the reader that you expect the exact same behavior as `getAllDims`

##
File path: 
lucene/facet/src/test/org/apache/lucene/facet/sortedset/TestSortedSetDocValuesFacets.java
##
@@ -104,6 +104,65 @@ public void testBasic() throws Exception {
   "dim=b path=[] value=2 childCount=2\n  buzz (2)\n  baz (1)\n",
   facets.getTopChildren(10, "b").toString());
 
+  // test getAllDims
+  List results = facets.getAllDims(10);
+  assertEquals(2, results.size());
+  assertEquals(
+  "dim=b path=[] value=2 childCount=2\n  buzz (2)\n  baz (1)\n",
+  results.get(0).toString());
+  assertEquals(
+  "dim=a path=[] value=-1 childCount=3\n  foo (2)\n  bar (1)\n  
zoo (1)\n",
+  results.get(1).toString());
+
+  // test getAllDims(1, 0) with topN = 0
+  expectThrows(
+  NullPointerException.class,
+  () -> {
+facets.getAllDims(0);
+  });
+
+  // test getTopDims(10, 10) and expect same results from 
getAllDims(10)
+  List allDimsResults = facets.getTopDims(10, 10);
+  assertEquals(2, results.size());
+  assertEquals(
+  "dim=b pat

[GitHub] [lucene] Yuti-G commented on pull request #747: LUCENE-10325: Add getTopDims functionality to Facets

2022-03-14 Thread GitBox


Yuti-G commented on pull request #747:
URL: https://github.com/apache/lucene/pull/747#issuecomment-1067392546


   > Thanks for picking this up! I've looked at everything except for your 
overridden implementation in SSDV faceting, but since I may run out of time to 
look at that today, I'll go ahead and publish my feedback on your default 
implementation and testing. I'll follow up with more feedback soon. Thanks 
again!
   
   Thanks @gsmiller for reviewing my PR and leaving the detailed feedback! I 
will address it in my next PR. Much appreciated :)





[GitHub] [lucene] Yuti-G edited a comment on pull request #747: LUCENE-10325: Add getTopDims functionality to Facets

2022-03-14 Thread GitBox


Yuti-G edited a comment on pull request #747:
URL: https://github.com/apache/lucene/pull/747#issuecomment-1067392546


   > Thanks for picking this up! I've looked at everything except for your 
overridden implementation in SSDV faceting, but since I may run out of time to 
look at that today, I'll go ahead and publish my feedback on your default 
implementation and testing. I'll follow up with more feedback soon. Thanks 
again!
   
   Thanks @gsmiller for reviewing my PR and leaving the detailed feedback! I 
will address it in my next PR, much appreciated :)





[GitHub] [lucene] Yuti-G commented on a change in pull request #747: LUCENE-10325: Add getTopDims functionality to Facets

2022-03-14 Thread GitBox


Yuti-G commented on a change in pull request #747:
URL: https://github.com/apache/lucene/pull/747#discussion_r826455347



##
File path: lucene/facet/src/java/org/apache/lucene/facet/Facets.java
##
@@ -48,4 +48,13 @@ public abstract FacetResult getTopChildren(int topN, String dim, String... path)
    * indexed, for example depending on the type of document.
    */
   public abstract List<FacetResult> getAllDims(int topN) throws IOException;
+
+  /**
+   * Returns labels for topN dimensions and their topNChildren sorted by the number of hits that
+   * dimension matched
+   */
+  public List<FacetResult> getTopDims(int topNDims, int topNChildren) throws IOException {

Review comment:
   My current javadoc does not describe this new functionality well; I 
will expand it. Thanks!







[GitHub] [lucene] Yuti-G commented on a change in pull request #747: LUCENE-10325: Add getTopDims functionality to Facets

2022-03-14 Thread GitBox


Yuti-G commented on a change in pull request #747:
URL: https://github.com/apache/lucene/pull/747#discussion_r826456302



##
File path: 
lucene/facet/src/test/org/apache/lucene/facet/range/TestRangeFacetCounts.java
##
@@ -243,6 +243,24 @@ public void testLongGetAllDims() throws Exception {
 "dim=field path=[] value=22 childCount=5\n  less than 10 (10)\n  less 
than or equal to 10 (11)\n  over 90 (9)\n  90 or above (10)\n  over 1000 (1)\n",
 result.get(0).toString());
 
+// test getAllDims(1)
+List test1Child = facets.getAllDims(1);
+assertEquals(1, test1Child.size());
+assertEquals(
+"dim=field path=[] value=22 childCount=5\n  less than 10 (10)\n  less 
than or equal to 10 (11)\n  over 90 (9)\n  90 or above (10)\n  over 1000 (1)\n",
+test1Child.get(0).toString());
+
+// test default implementation of getTopDims
+List topNDimsResult = facets.getTopDims(1, 1);
+assertEquals(1, topNDimsResult.size());
+assertEquals(

Review comment:
   Thanks for catching this! 







[GitHub] [lucene] Yuti-G commented on a change in pull request #747: LUCENE-10325: Add getTopDims functionality to Facets

2022-03-14 Thread GitBox


Yuti-G commented on a change in pull request #747:
URL: https://github.com/apache/lucene/pull/747#discussion_r826456758



##
File path: 
lucene/facet/src/test/org/apache/lucene/facet/sortedset/TestSortedSetDocValuesFacets.java
##
@@ -104,6 +104,65 @@ public void testBasic() throws Exception {
   "dim=b path=[] value=2 childCount=2\n  buzz (2)\n  baz (1)\n",
   facets.getTopChildren(10, "b").toString());
 
+  // test getAllDims

Review comment:
   Thank you so much! I added more tests for getAllDims so that its behavior 
can be compared with that of getTopDims :)







[GitHub] [lucene] Yuti-G commented on a change in pull request #747: LUCENE-10325: Add getTopDims functionality to Facets

2022-03-14 Thread GitBox


Yuti-G commented on a change in pull request #747:
URL: https://github.com/apache/lucene/pull/747#discussion_r826461610



##
File path: 
lucene/facet/src/test/org/apache/lucene/facet/sortedset/TestSortedSetDocValuesFacets.java
##
@@ -104,6 +104,65 @@ public void testBasic() throws Exception {
   "dim=b path=[] value=2 childCount=2\n  buzz (2)\n  baz (1)\n",
   facets.getTopChildren(10, "b").toString());
 
+  // test getAllDims
+  List results = facets.getAllDims(10);
+  assertEquals(2, results.size());
+  assertEquals(
+  "dim=b path=[] value=2 childCount=2\n  buzz (2)\n  baz (1)\n",
+  results.get(0).toString());
+  assertEquals(
+  "dim=a path=[] value=-1 childCount=3\n  foo (2)\n  bar (1)\n  
zoo (1)\n",
+  results.get(1).toString());
+
+  // test getAllDims(1, 0) with topN = 0
+  expectThrows(

Review comment:
   Sure! I will create a Jira issue and resolve it. getAllDims(0) does throw 
`IllegalArgumentException` in TaxonomyFacetCounts because 
`getTopChildren(0, dim)` throws it, but the overridden implementation in SSDV 
does not (it currently throws a NullPointerException instead). I was not sure 
whether the two implementations should behave the same way in this case. Thank 
you so much for confirming this!
   
   The comment on this test has a typo; it should be 
`// test getAllDims(0) with topN = 0`. I will fix this as well.







[GitHub] [lucene] Yuti-G commented on a change in pull request #747: LUCENE-10325: Add getTopDims functionality to Facets

2022-03-14 Thread GitBox


Yuti-G commented on a change in pull request #747:
URL: https://github.com/apache/lucene/pull/747#discussion_r82646



##
File path: 
lucene/facet/src/test/org/apache/lucene/facet/sortedset/TestSortedSetDocValuesFacets.java
##
@@ -104,6 +104,65 @@ public void testBasic() throws Exception {
   "dim=b path=[] value=2 childCount=2\n  buzz (2)\n  baz (1)\n",
   facets.getTopChildren(10, "b").toString());
 
+  // test getAllDims
+  List results = facets.getAllDims(10);
+  assertEquals(2, results.size());
+  assertEquals(
+  "dim=b path=[] value=2 childCount=2\n  buzz (2)\n  baz (1)\n",
+  results.get(0).toString());
+  assertEquals(
+  "dim=a path=[] value=-1 childCount=3\n  foo (2)\n  bar (1)\n  
zoo (1)\n",
+  results.get(1).toString());
+
+  // test getAllDims(1, 0) with topN = 0
+  expectThrows(
+  NullPointerException.class,
+  () -> {
+facets.getAllDims(0);
+  });
+
+  // test getTopDims(10, 10) and expect same results from 
getAllDims(10)
+  List allDimsResults = facets.getTopDims(10, 10);
+  assertEquals(2, results.size());
+  assertEquals(
+  "dim=b path=[] value=2 childCount=2\n  buzz (2)\n  baz (1)\n",
+  allDimsResults.get(0).toString());
+  assertEquals(
+  "dim=a path=[] value=-1 childCount=3\n  foo (2)\n  bar (1)\n  
zoo (1)\n",
+  allDimsResults.get(1).toString());
+
+  // test getTopDims(2, 1)
+  List topDimsResults = facets.getTopDims(2, 1);

Review comment:
   Thank you so much! I will check and fix other tests that have this issue.







[GitHub] [lucene] Yuti-G commented on a change in pull request #747: LUCENE-10325: Add getTopDims functionality to Facets

2022-03-14 Thread GitBox


Yuti-G commented on a change in pull request #747:
URL: https://github.com/apache/lucene/pull/747#discussion_r826461610



##
File path: 
lucene/facet/src/test/org/apache/lucene/facet/sortedset/TestSortedSetDocValuesFacets.java
##
@@ -104,6 +104,65 @@ public void testBasic() throws Exception {
   "dim=b path=[] value=2 childCount=2\n  buzz (2)\n  baz (1)\n",
   facets.getTopChildren(10, "b").toString());
 
+  // test getAllDims
+  List results = facets.getAllDims(10);
+  assertEquals(2, results.size());
+  assertEquals(
+  "dim=b path=[] value=2 childCount=2\n  buzz (2)\n  baz (1)\n",
+  results.get(0).toString());
+  assertEquals(
+  "dim=a path=[] value=-1 childCount=3\n  foo (2)\n  bar (1)\n  
zoo (1)\n",
+  results.get(1).toString());
+
+  // test getAllDims(1, 0) with topN = 0
+  expectThrows(

Review comment:
   Sure! I will create a Jira issue and resolve it. getAllDims(0) does throw 
`IllegalArgumentException` in TaxonomyFacetCounts because it calls 
`getTopChildren(0, dim)`, which throws it, but the overridden implementation in 
SSDV does not (it currently throws a NullPointerException instead). I was not 
sure whether the two implementations should behave the same way in this case. 
Thank you so much for confirming this!
   
   The comment on this test has a typo; it should be 
`// test getAllDims(0) with topN = 0`. I will fix this as well.







[jira] [Commented] (LUCENE-10325) Add getTopDims functionality to Facets

2022-03-14 Thread Yuting Gan (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506640#comment-17506640
 ] 

Yuting Gan commented on LUCENE-10325:
-

Thanks [~gsmiller] for creating this issue!

I provided a default implementation of 
`getTopDims(int topNDims, int topNChildren)` in the Facets class that calls the 
existing `getAllDims(topNChildren)` function and returns the `FacetResult`s of 
the requested `topNDims` with their `topNChildren`.

So far, I have only experimented with one overriding implementation of 
`getTopDims`, in `SortedSetDocValuesFacetCounts`, which aims to provide a more 
efficient way of populating dimCount. It avoids resolving all child paths and 
creating a FacetResult for every dim when calling `getTopDims`. 

I created #747 for this change and would appreciate any feedback. Since this 
change involves a lot of refactoring in SSDVFacetCounts, if it is worthwhile 
and the PR is approved, I can also extend it to `ConcurrentSSDVFacetCounts` and 
explore other possible optimized implementations in faceting.

> Add getTopDims functionality to Facets
> --
>
> Key: LUCENE-10325
> URL: https://issues.apache.org/jira/browse/LUCENE-10325
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Greg Miller
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The current {{getAllDims}} functionality is really the only way for users to 
> determine the "top" dimensions in a faceting field (i.e., get the top dims by 
> count along with their top-n children), but it has the unfortunate 
> side-effect of resolving all child paths for every dim, even if the user 
> doesn't intend to use those dims. For example, if a match set contains docs 
> relating to 100 different dims (and various values under each), but the user 
> only wants the top 10 dims with their top 5 children, they can call 
> getAllDims(5) then just grab the first 10 results, but a lot of wasted work 
> has been done for the other 90 dims.
> It would be nice to implement something like {{getTopDims(int numDims, int 
> numChildren)}} that would only do the work necessary to resolve {{numDims}} 
> dims instead of all dims.
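
The default behavior described in the comment above (call getAllDims, then keep only the requested number of dims) can be sketched roughly as follows. This is a minimal illustration, not the code from PR #747: `DimResultSketch`, `FacetsSketch`, and `FixedFacetsSketch` are hypothetical stand-ins for Lucene's `FacetResult` and `Facets`, the data is made up, and the sketch assumes that getAllDims already returns dims ordered from highest to lowest count.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for Lucene's FacetResult: only the dim name and its count. */
final class DimResultSketch {
  final String dim;
  final int value;

  DimResultSketch(String dim, int value) {
    this.dim = dim;
    this.value = value;
  }
}

/** Hypothetical stand-in for the Facets class; not the real Lucene API. */
abstract class FacetsSketch {
  /** Assumed to return every dim, already ordered from highest to lowest count. */
  abstract List<DimResultSketch> getAllDims(int topNChildren);

  /** Default behavior: resolve all dims, then keep only the first topNDims. */
  List<DimResultSketch> getTopDims(int topNDims, int topNChildren) {
    List<DimResultSketch> all = getAllDims(topNChildren);
    int keep = Math.min(topNDims, all.size());
    return new ArrayList<>(all.subList(0, keep));
  }
}

/** Tiny fixed data set (illustrative values) so the sketch can be exercised. */
final class FixedFacetsSketch extends FacetsSketch {
  @Override
  List<DimResultSketch> getAllDims(int topNChildren) {
    List<DimResultSketch> dims = new ArrayList<>();
    dims.add(new DimResultSketch("b", 2));
    dims.add(new DimResultSketch("a", 1));
    return dims;
  }
}
```

Note that this default still pays the full cost of getAllDims, which is the wasted work the issue describes; an overriding implementation would have to rank dims by count first and resolve children only for the dims it keeps.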



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-10325) Add getTopDims functionality to Facets

2022-03-14 Thread Yuting Gan (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506640#comment-17506640
 ] 

Yuting Gan edited comment on LUCENE-10325 at 3/15/22, 12:22 AM:


Thanks [~gsmiller] for creating this issue!

I provided a default implementation of _getTopDims(int topNDims, int 
topNChildren)_ in the Facets class that calls the existing 
_getAllDims(topNChildren)_ function and returns the _FacetResult_ entries of 
the requested _topNDims_ with their _topNChildren_.

So far, I have only experimented with one overriding implementation of 
_getTopDims_, in _SortedSetDocValuesFacetCounts_, which aims to provide a more 
efficient way of populating _dimCount_. It avoids resolving all child paths 
and creating a _FacetResult_ for every dim when calling _getTopDims_. 

I created #747 for this change and would appreciate any feedback. Since this 
change involves a lot of refactoring in SSDVFacetCounts, if it is worthwhile 
and the PR is approved, I can also extend it to ConcurrentSSDVFacetCounts and 
explore other possible optimized implementations in faceting. Thanks!


was (Author: yutinggan):
Thanks [~gsmiller] for creating issue!

I provided a default implementation of `getTopDims({color:#cc7832}int 
{color}topNDims{color:#cc7832}, int {color}topNChildren)` in the Facets class 
that calls the existing `getAllDims(topNChildren)` function and returns 
`FacetResult` of the requested `topNDims` and their `topNChildren`.

Currently, I only experimented with one overridden implementation of 
`getTopDims` in `SortedSetDocValuesFacetCounts` that aims to provide a more 
optimal way of populating dimCount. It avoids resolving all child paths and 
creating all FacetResult for every dim when calling `getTopDims`. 

I created #747 for this change and will appreciate any feedback. Since this 
change has a lot of code refactoring in SSDVFacetCounts, if it is worth and the 
PR is approved, I can also expand it to `ConcurrentSSDVFacetCounts`and explore 
other possible optimized implementations in faceting.




[jira] [Comment Edited] (LUCENE-10325) Add getTopDims functionality to Facets

2022-03-14 Thread Yuting Gan (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506640#comment-17506640
 ] 

Yuting Gan edited comment on LUCENE-10325 at 3/15/22, 12:25 AM:


Thanks [~gsmiller] for creating this issue.

I provided a default implementation of _getTopDims(int topNDims, int 
topNChildren)_ in the Facets class that calls the existing 
_getAllDims(topNChildren)_ function and returns the _FacetResult_ entries of 
the requested _topNDims_ with their _topNChildren_.

So far, I have only experimented with one overriding implementation of 
_getTopDims_, in _SortedSetDocValuesFacetCounts_, which aims to provide a more 
efficient way of populating _dimCount_. It avoids resolving all child paths 
and creating a _FacetResult_ for every dim when calling _getTopDims_. 

I created #747 for this change and would appreciate any feedback. Since this 
change involves a lot of refactoring in SSDVFacetCounts, if it is worthwhile 
and the PR is approved, I can also extend it to ConcurrentSSDVFacetCounts and 
explore other possible optimized implementations in faceting. Thanks!


was (Author: yutinggan):
Thanks [~gsmiller] for creating issue!

I provided a default implementation of _getTopDims(int topNDims, int 
topNChildren)_ in the Facets class that calls the existing 
_getAllDims(topNChildren)_ function and returns _FacetResult_ of the requested 
_topNDims_ and their {_}topNChildren{_}.

Currently, I only experimented with one overridden implementation of 
_getTopDims_ in _SortedSetDocValuesFacetCounts_ that aims to provide a more 
optimal way of populating {_}dimCount{_}. It avoids resolving all child paths 
and creating all _FacetResult_ for every dim when calling _getTopDims._ 

I created #747 for this change and will appreciate any feedback. Since this 
change has a lot of code refactoring in SSDVFacetCounts, if it is worth and the 
PR is approved, I can also expand it to ConcurrentSSDVFacetCounts and explore 
other possible optimized implementations in faceting. Thanks!




[jira] [Updated] (LUCENE-10467) Throws IllegalArgumentException for getAllDims(0)

2022-03-14 Thread Yuting Gan (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuting Gan updated LUCENE-10467:

Description: 
Currently, calling getAllDims(0) in SortedSetDocValuesFacetCounts throws a 
NullPointerException. Other classes that implement getAllDims could have the 
same issue, except for TaxonomyFacetCounts, which has been tested. 

Throwing an IllegalArgumentException when topN = 0 is requested from getAllDims 
would provide a better user experience.

  was:
Currently, when calling getAllDims(0) in SortedSetDocValuesFacetCounts, it 
would throw a  NullPointerException. Other class that implements the getAllDims 
functionality could have the same issue, except for TaxonomyFacetCounts, which 
has been tested. 

It would provide better user experience by throwing an IllegalArgumentException 
when requesting topN = 0 for getAllDims.{{{}{}}}{{{}{}}}


> Throws IllegalArgumentException for getAllDims(0)
> -
>
> Key: LUCENE-10467
> URL: https://issues.apache.org/jira/browse/LUCENE-10467
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Yuting Gan
>Priority: Minor
>
> Currently, calling getAllDims(0) in SortedSetDocValuesFacetCounts throws a 
> NullPointerException. Other classes that implement getAllDims could have the 
> same issue, except for TaxonomyFacetCounts, which has been tested. 
> Throwing an IllegalArgumentException when topN = 0 is requested from 
> getAllDims would provide a better user experience.
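
The behavior requested in this issue amounts to validating the argument up front instead of letting a later dereference fail. A minimal sketch of such a guard follows; the class name, method name, and message text are hypothetical and not taken from Lucene, and rejecting negative values as well as zero is an assumption made here.

```java
/** Hypothetical helper; the names and message text are illustrative, not Lucene's. */
final class TopNCheckSketch {
  /** Returns topN unchanged when it is positive, otherwise fails fast. */
  static int requirePositiveTopN(int topN) {
    if (topN <= 0) {
      // Fail with a descriptive exception rather than a NullPointerException later on.
      throw new IllegalArgumentException("topN must be > 0 (got: " + topN + ")");
    }
    return topN;
  }
}
```

A getAllDims implementation could call a check like this as its first statement, so that every Facets implementation reports the same error for the same bad input.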






[jira] [Created] (LUCENE-10467) Throws IllegalArgumentException for getAllDims(0)

2022-03-14 Thread Yuting Gan (Jira)
Yuting Gan created LUCENE-10467:
---

 Summary: Throws IllegalArgumentException for getAllDims(0)
 Key: LUCENE-10467
 URL: https://issues.apache.org/jira/browse/LUCENE-10467
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Yuting Gan


Currently, calling getAllDims(0) in SortedSetDocValuesFacetCounts throws a 
NullPointerException. Other classes that implement getAllDims could have the 
same issue, except for TaxonomyFacetCounts, which has been tested. 

Throwing an IllegalArgumentException when topN = 0 is requested from getAllDims 
would provide a better user experience.






[jira] [Updated] (LUCENE-10467) Throws IllegalArgumentException for getAllDims if topN <= 0

2022-03-14 Thread Yuting Gan (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuting Gan updated LUCENE-10467:

Description: 
Currently, calling getAllDims(0) in SortedSetDocValuesFacetCounts throws a 
NullPointerException. Other classes that implement getAllDims could have the 
same issue, except for TaxonomyFacetCounts, which has been tested. 

Throwing an IllegalArgumentException when topN <= 0 is requested from 
getAllDims would provide a better user experience.

  was:
Currently, when calling getAllDims(0) in SortedSetDocValuesFacetCounts, it 
would throw a  NullPointerException. Other class that implements the getAllDims 
functionality could have the same issue, except for TaxonomyFacetCounts, which 
has been tested. 

It would provide better user experience by throwing an IllegalArgumentException 
when requesting topN = 0 for getAllDims.


> Throws IllegalArgumentException for getAllDims if topN <= 0
> ---
>
> Key: LUCENE-10467
> URL: https://issues.apache.org/jira/browse/LUCENE-10467
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Yuting Gan
>Priority: Minor
>
> Currently, calling getAllDims(0) in SortedSetDocValuesFacetCounts throws a 
> NullPointerException. Other classes that implement getAllDims could have the 
> same issue, except for TaxonomyFacetCounts, which has been tested. 
> Throwing an IllegalArgumentException when topN <= 0 is requested from 
> getAllDims would provide a better user experience.






[jira] [Updated] (LUCENE-10467) Throws IllegalArgumentException for getAllDims if topN <= 0

2022-03-14 Thread Yuting Gan (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuting Gan updated LUCENE-10467:

Summary: Throws IllegalArgumentException for getAllDims if topN <= 0  (was: 
Throws IllegalArgumentException for getAllDims(0))

> Throws IllegalArgumentException for getAllDims if topN <= 0
> ---
>
> Key: LUCENE-10467
> URL: https://issues.apache.org/jira/browse/LUCENE-10467
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Yuting Gan
>Priority: Minor
>
> Currently, calling getAllDims(0) in SortedSetDocValuesFacetCounts throws a 
> NullPointerException. Other classes that implement getAllDims could have the 
> same issue, except for TaxonomyFacetCounts, which has been tested. 
> Throwing an IllegalArgumentException when topN = 0 is requested from 
> getAllDims would provide a better user experience.






[GitHub] [lucene] LuXugang commented on a change in pull request #736: LUCENE-10458: BoundedDocSetIdIterator may supply error count in Weigth#count(LeafReaderContext) when missingValue enables

2022-03-14 Thread GitBox


LuXugang commented on a change in pull request #736:
URL: https://github.com/apache/lucene/pull/736#discussion_r826515036



##
File path: 
lucene/sandbox/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java
##
@@ -198,16 +198,22 @@ public boolean isCacheable(LeafReaderContext ctx) {
 
   @Override
   public int count(LeafReaderContext context) throws IOException {
-BoundedDocSetIdIterator disi = getDocIdSetIteratorOrNull(context);
-if (disi != null) {
-  return disi.lastDoc - disi.firstDoc;
+Sort indexSort = context.reader().getMetaData().getSort();
+if (indexSort != null
+&& indexSort.getSort().length > 0
+&& indexSort.getSort()[0].getField().equals(field)
+&& indexSort.getSort()[0].getMissingValue() == null) {

Review comment:
   `indexSort.getSort()[0].getMissingValue() == null`
   It does indeed seem too aggressive. Thanks.
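
   For context, here is a self-contained model of the miscount named in the PR title, as I understand it: when a missing value is configured for the index sort, docs without a value are ordered as if they held that value, so the size of the bounded doc range (`lastDoc - firstDoc`) can include docs that should not match. This is not Lucene code; `RangeCountSketch`, its methods, and the sample data are all hypothetical, and the linear scans stand in for the binary search a real implementation would use.

```java
/** Hypothetical model of counting matches on an index-sorted field; not Lucene code. */
final class RangeCountSketch {
  /** Value used for ordering: docs without a value sort as if they held missingValue. */
  static long effective(Long value, long missingValue) {
    return value == null ? missingValue : value;
  }

  /** Size of the contiguous doc range whose ordering value lies in [lower, upper]. */
  static int boundedSpan(Long[] values, long missingValue, long lower, long upper) {
    int firstDoc = 0;
    while (firstDoc < values.length && effective(values[firstDoc], missingValue) < lower) {
      firstDoc++;
    }
    int lastDoc = firstDoc; // exclusive end of the range
    while (lastDoc < values.length && effective(values[lastDoc], missingValue) <= upper) {
      lastDoc++;
    }
    return lastDoc - firstDoc;
  }

  /** Number of docs that actually have a value inside [lower, upper]. */
  static int exactCount(Long[] values, long lower, long upper) {
    int count = 0;
    for (Long v : values) {
      if (v != null && v >= lower && v <= upper) {
        count++;
      }
    }
    return count;
  }
}
```

   With values `{1, 3, null, null, 7, 9}`, a missing value of 5, and the range [2, 8], the span covers four docs while only two docs really hold a value in range. The shortcut is therefore only safe when no doc in the bounded range lacks a value, which is a weaker condition than requiring `getMissingValue() == null`.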



