[GitHub] [lucene] mogui commented on pull request #679: Monitor Improvements LUCENE-10422
mogui commented on pull request #679: URL: https://github.com/apache/lucene/pull/679#issuecomment-1066549257 @romseygeek fixed ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] romseygeek commented on a change in pull request #679: Monitor Improvements LUCENE-10422
romseygeek commented on a change in pull request #679: URL: https://github.com/apache/lucene/pull/679#discussion_r825818251 ## File path: lucene/monitor/src/java/org/apache/lucene/monitor/Monitor.java ## @@ -125,14 +105,16 @@ public Monitor(Analyzer analyzer, Presearcher presearcher, MonitorConfiguration * Monitor's queryindex * * @param listener listener to register + * @throws IllegalStateException when Monitor is readonly Review comment: I think this is an UOE now? Probably doesn't need to be in the javadoc, to be honest. ## File path: lucene/monitor/src/test/org/apache/lucene/monitor/TestMonitorReadonly.java ## @@ -0,0 +1,217 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.lucene.monitor; + +import java.io.IOException; +import java.nio.file.Path; +import java.util.Collections; +import java.util.concurrent.TimeUnit; +import org.apache.lucene.analysis.Analyzer; +import org.apache.lucene.analysis.core.WhitespaceAnalyzer; +import org.apache.lucene.document.Document; +import org.apache.lucene.document.Field; +import org.apache.lucene.index.IndexNotFoundException; +import org.apache.lucene.index.Term; +import org.apache.lucene.search.TermQuery; +import org.apache.lucene.store.FSDirectory; +import org.junit.Test; + +public class TestMonitorReadonly extends MonitorTestBase { + private static final Analyzer ANALYZER = new WhitespaceAnalyzer(); + + @Test + public void testReadonlyMonitorThrowsOnInexistentIndex() { +Path indexDirectory = createTempDir(); +MonitorConfiguration config = +new MonitorConfiguration() +.setDirectoryProvider( +() -> FSDirectory.open(indexDirectory), +MonitorQuerySerializer.fromParser(MonitorTestBase::parse), +true); +assertThrows( +IndexNotFoundException.class, +() -> { + new Monitor(ANALYZER, config); +}); + } + + @Test + public void testReadonlyMonitorThrowsWhenCallingWriteRequests() throws IOException { +Path indexDirectory = createTempDir(); +MonitorConfiguration writeConfig = +new MonitorConfiguration() +.setIndexPath( +indexDirectory, MonitorQuerySerializer.fromParser(MonitorTestBase::parse)); + +// this will create the index +Monitor writeMonitor = new Monitor(ANALYZER, writeConfig); +writeMonitor.close(); + +MonitorConfiguration config = +new MonitorConfiguration() +.setDirectoryProvider( +() -> FSDirectory.open(indexDirectory), +MonitorQuerySerializer.fromParser(MonitorTestBase::parse), +true); +try (Monitor monitor = new Monitor(ANALYZER, config)) { + assertThrows( + UnsupportedOperationException.class, + () -> { +TermQuery query = new TermQuery(new Term(FIELD, "test")); +monitor.register( +new MonitorQuery("query1", query, query.toString(), Collections.emptyMap())); + }); + +
assertThrows( + UnsupportedOperationException.class, + () -> { +monitor.deleteById("query1"); + }); + + assertThrows( + UnsupportedOperationException.class, + () -> { +monitor.clear(); + }); +} + } + + @Test + public void testSettingCustomDirectory() throws IOException { +Path indexDirectory = createTempDir(); +Document doc = new Document(); +doc.add(newTextField(FIELD, "This is a Foobar test document", Field.Store.NO)); + +MonitorConfiguration writeConfig = +new MonitorConfiguration() +.setDirectoryProvider( +() -> FSDirectory.open(indexDirectory), +MonitorQuerySerializer.fromParser(MonitorTestBase::parse)); + +try (Monitor writeMonitor = new Monitor(ANALYZER, writeConfig)) { + TermQuery query = new TermQuery(new Term(FIELD, "test")); + writeMonitor.register( + new MonitorQuery("query1", query, query.toString(), Collections.emptyMap())); + TermQuery query2 = new TermQuery(new Term(FIELD, "Foobar")); + writeMonitor.register( + new MonitorQuery("que
[jira] [Commented] (LUCENE-10448) MergeRateLimiter doesn't always limit instant rate.
[ https://issues.apache.org/jira/browse/LUCENE-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506184#comment-17506184 ] Adrien Grand commented on LUCENE-10448: --- Like [~vigyas] my understanding is that there would only be a problem in practice if Lucene would do very large and infrequent writes, but in practice Lucene does exactly the opposite. So I'm not sure there's anything to fix? > MergeRateLimiter doesn't always limit instant rate. > --- > > Key: LUCENE-10448 > URL: https://issues.apache.org/jira/browse/LUCENE-10448 > Project: Lucene - Core > Issue Type: Bug > Components: core/other >Affects Versions: 8.11.1 >Reporter: kkewwei >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > We can see the code in *MergeRateLimiter*: > {code:java} > private long maybePause(long bytes, long curNS) throws > MergePolicy.MergeAbortedException { > > double rate = mbPerSec; > double secondsToPause = (bytes / 1024. / 1024.) / rate; > long targetNS = lastNS + (long) (1000000000 * secondsToPause); > long curPauseNS = targetNS - curNS; > // We don't bother with thread pausing if the pause is smaller than 2 > msec. > if (curPauseNS <= MIN_PAUSE_NS) { > // Set to curNS, not targetNS, to enforce the instant rate, not > // the "averaged over all history" rate: > lastNS = curNS; > return -1; > } >.. > } > {code} > If a segment is being merged and *maybePause* is called at 7:00, lastNS=7:00; > if *maybePause* is then called again at 7:05, the value of > *targetNS=lastNS + (long) (1000000000 * secondsToPause)* will be smaller than > *curNS* for any realistic number of bytes, so we return -1 and skip the > pause. 
> I count the total times(callTimes) calling *maybePause* and ignored pause > times(ignorePauseTimes) and detail ignored bytes(detailBytes): > {code:java} > [2022-03-02T15:16:51,972][DEBUG][o.e.i.e.I.EngineMergeScheduler] [node1] > [index1][21] merge segment [_4h] done: took [26.8s], [123.6 MB], [61,219 > docs], [0s stopped], [24.4s throttled], [242.5 MB written], [11.2 MB/sec > throttle], [callTimes=857], [ignorePauseTimes=25], [detailBytes(mb) = > [0.28899956, 0.28140354, 0.28015518, 0.27990818, 0.2801447, 0.27991104, > 0.27990723, 0.27990913, 0.2799101, 0.28010082, 0.2799921, 0.2799673, > 0.28144264, 0.27991295, 0.27990818, 0.27993107, 0.2799387, 0.27998447, > 0.28002167, 0.27992058, 0.27998066, 0.28098202, 0.28125, 0.28125, 0.28125]] > {code} > There are 857 times calling *maybePause*, including 25 times which is ignored > to pause, we can see that the ignored detail bytes (such as 0.28125mb) are > not small. > As long as the interval between two *maybePause* calls is relatively long, > the pause action that should be executed will not be executed. > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
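To make the report above concrete, here is a minimal sketch of the quoted arithmetic. It is not Lucene's `MergeRateLimiter`: the class name `PauseSketch` and the method `pauseFor` are hypothetical, the seconds-to-nanoseconds factor of 1,000,000,000 and the 2 ms threshold are assumptions taken from the surrounding discussion, and the real method sleeps where this one merely returns the computed pause.

```java
/** Hypothetical stand-in for the pause arithmetic quoted in the issue; not Lucene code. */
final class PauseSketch {
  static final long MIN_PAUSE_NS = 2_000_000L; // 2 msec, per the quoted comment
  static final double NS_PER_SEC = 1_000_000_000d; // assumed conversion factor

  private final double mbPerSec;
  private long lastNS;

  PauseSketch(double mbPerSec, long startNS) {
    this.mbPerSec = mbPerSec;
    this.lastNS = startNS;
  }

  /** Returns the pause in nanoseconds, or -1 when the pause is skipped. */
  long pauseFor(long bytes, long curNS) {
    double secondsToPause = (bytes / 1024. / 1024.) / mbPerSec;
    long targetNS = lastNS + (long) (NS_PER_SEC * secondsToPause);
    long curPauseNS = targetNS - curNS;
    if (curPauseNS <= MIN_PAUSE_NS) {
      // Same branch as in the report: reset to "now" and skip the pause.
      lastNS = curNS;
      return -1;
    }
    // Simplification: the real code would sleep here.
    return curPauseNS;
  }
}
```

With a 10 MB/sec limit, writing 1 MB one millisecond after the previous call yields a pause of roughly 99 ms, while the same write five seconds after the previous call returns -1. That is the case the reporter describes, and it also matches the reply that it only matters for large, infrequent writes.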
[GitHub] [lucene] mocobeta commented on pull request #740: LUCENE-10393: Unify binary dictionary and dictionary writer in kuromoji and nori
mocobeta commented on pull request #740: URL: https://github.com/apache/lucene/pull/740#issuecomment-1066672493 We could have a common `DictionaryBuilder` class in analyzers-common, but to me that would make the class hierarchy too complex. I'd postpone refactoring XXXDictionaryBuilder until we come up with good interfaces or a framework for that; it may need public interface changes and is out of the scope of this PR.
[GitHub] [lucene] jpountz commented on pull request #737: Reduce for-loop in WeightedSpanTermExtractor.extractWeightedSpanTerms method.
jpountz commented on pull request #737: URL: https://github.com/apache/lucene/pull/737#issuecomment-1066789801 This change makes sense to me.
[GitHub] [lucene] jpountz commented on a change in pull request #736: LUCENE-10458: BoundedDocSetIdIterator may supply error count in Weigth#count(LeafReaderContext) when missingValue enables
jpountz commented on a change in pull request #736: URL: https://github.com/apache/lucene/pull/736#discussion_r825981973 ## File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java ## @@ -198,16 +198,22 @@ public boolean isCacheable(LeafReaderContext ctx) { @Override public int count(LeafReaderContext context) throws IOException { -BoundedDocSetIdIterator disi = getDocIdSetIteratorOrNull(context); -if (disi != null) { - return disi.lastDoc - disi.firstDoc; +Sort indexSort = context.reader().getMetaData().getSort(); +if (indexSort != null +&& indexSort.getSort().length > 0 +&& indexSort.getSort()[0].getField().equals(field) +&& indexSort.getSort()[0].getMissingValue() == null) { Review comment: I don't think that this is the right thing to check since the missing value is assumed to be zero if not set. The best thing we can do that I can think of is to check if the field is dense via points (ie. no missing values) or if the missing value falls outside of the range so that the bounded iterator is accurate?
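The concern in this review comment can be illustrated with plain arrays. This is only a sketch under stated assumptions: it uses no Lucene APIs, the class `BoundedCountSketch` and its methods are hypothetical, a `null` entry stands for a document without a value, and documents are assumed to be index-sorted with missing values treated as 0.

```java
/** Hypothetical illustration of counting by sorted bounds; not Lucene code. */
final class BoundedCountSketch {

  /** Counts docs between the first and last position whose sort key is in [lower, upper]. */
  static int countByBounds(Long[] sortedValues, long lower, long upper) {
    int firstDoc = 0;
    while (firstDoc < sortedValues.length && sortKey(sortedValues[firstDoc]) < lower) {
      firstDoc++;
    }
    int lastDoc = firstDoc;
    while (lastDoc < sortedValues.length && sortKey(sortedValues[lastDoc]) <= upper) {
      lastDoc++;
    }
    return lastDoc - firstDoc; // also counts docs that have no value at all
  }

  /** Counts only docs that actually carry a value in [lower, upper]. */
  static int countExact(Long[] sortedValues, long lower, long upper) {
    int count = 0;
    for (Long v : sortedValues) {
      if (v != null && v >= lower && v <= upper) {
        count++;
      }
    }
    return count;
  }

  /** Assumption: a missing value sorts as 0. */
  private static long sortKey(Long v) {
    return v == null ? 0L : v;
  }
}
```

For the sorted values `{-3, missing, missing, 2, 9}` and the range [-5, 5], the bounds-based count is 4 while only 2 documents match. For the range [1, 10], which excludes the assumed missing value 0, both counts agree, which is the second condition suggested in the comment.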
[GitHub] [lucene] gsmiller commented on a change in pull request #718: LUCENE-10444: Support alternate aggregation functions in association facets
gsmiller commented on a change in pull request #718: URL: https://github.com/apache/lucene/pull/718#discussion_r826082690 ## File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FloatTaxonomyFacets.java ## @@ -130,16 +140,16 @@ public FacetResult getTopChildren(int topN, String dim, String... path) throws I ord = siblings[ord]; } -if (sumValues == 0) { +if (aggregatedValue == 0) { return null; } if (dimConfig.multiValued) { if (dimConfig.requireDimCount) { -sumValues = values[dimOrd]; +aggregatedValue = values[dimOrd]; } else { // Our sum'd count is not correct, in general: Review comment: It's not necessarily a "count" though here right? It's an aggregated weight associated with the value.
[GitHub] [lucene] gsmiller commented on a change in pull request #718: LUCENE-10444: Support alternate aggregation functions in association facets
gsmiller commented on a change in pull request #718: URL: https://github.com/apache/lucene/pull/718#discussion_r826085655 ## File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/IntTaxonomyFacets.java ## @@ -173,17 +185,17 @@ public FacetResult getTopChildren(int topN, String dim, String... path) throws I if (sparseValues != null) { for (IntIntCursor c : sparseValues) { -int count = c.value; +int value = c.value; int ord = c.key; -if (parents[ord] == dimOrd && count > 0) { - totValue += count; +if (parents[ord] == dimOrd && value > 0) { + aggregatedValue = aggregationFunction.aggregate(aggregatedValue, value); childCount++; - if (count > bottomValue) { + if (value > bottomValue) { Review comment: That's right. There are a number of things actually preventing us from cleanly adding something like `min`. I had it originally but as I started looking at all the changes it would require, I backed off for the time being (especially since I don't have a concrete use-case in mind). One interesting challenge is that these facets implementations all assume the weights are positive values. There are a lot of `> 0` checks floating around the various implementations to check whether or not a value had any "weight" associated with it. This makes sense when using counts, but it's weird when associating arbitrary weights with the values. So `min` started to feel a little weird and I just left it out for now.
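A small sketch may help show why `min` is awkward here. It is not the PR's code: `AggregationSketch`, its constants, and the `aggregate` helper are hypothetical, and the fold simply mimics the "start at 0, skip values that are not `> 0`" pattern described in the comment.

```java
import java.util.function.IntBinaryOperator;

/** Hypothetical pluggable aggregation; only illustrates the pattern discussed above. */
final class AggregationSketch {
  static final IntBinaryOperator SUM = Integer::sum;
  static final IntBinaryOperator MAX = Math::max;
  static final IntBinaryOperator NAIVE_MIN = Math::min; // shown only to expose the problem

  /** Folds child values from an initial 0, skipping non-positive values. */
  static int aggregate(IntBinaryOperator fn, int[] childValues) {
    int aggregated = 0;
    for (int value : childValues) {
      if (value > 0) { // "has any weight" check, as in the facet implementations
        aggregated = fn.applyAsInt(aggregated, value);
      }
    }
    return aggregated;
  }
}
```

For the values `{3, 0, 5}`, `SUM` gives 8 and `MAX` gives 5, but `NAIVE_MIN` gives 0 because the initial value 0 is already smaller than every positive weight. Supporting `min` would therefore need a different initial value and a different way to signal "no weight", which is the kind of wider change the comment defers.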
[GitHub] [lucene] javanna opened a new pull request #745: Revert "LUCENE-10385: Implement Weight#count on IndexSortSortedNumeri…
javanna opened a new pull request #745: URL: https://github.com/apache/lucene/pull/745 Based on the discussion happening in https://issues.apache.org/jira/browse/LUCENE-10458 , I am reverting LUCENE-10385 (#635) in the 9.1 branch. I left some test improvements that are still valid but removed the specific tests that verified the count optimization that no longer exists in this branch.
[GitHub] [lucene] gsmiller commented on pull request #718: LUCENE-10444: Support alternate aggregation functions in association facets
gsmiller commented on pull request #718: URL: https://github.com/apache/lucene/pull/718#issuecomment-1066998156 Even though I ran benchmarks on the backport version of this change (#719), I figured it would be good to run benchmarks here as well. Below compares this patch against `main` using `wikimedium10m`:
```
Task                        QPS baseline  StdDev  QPS candidate  StdDev  Pct diff                p-value
BrowseDayOfYearTaxoFacets      21.65  (22.8%)    20.61  (22.9%)   -4.8%  (-41% - 52%)  0.507
BrowseDateTaxoFacets           21.61  (22.6%)    20.61  (22.8%)   -4.6%  (-40% - 52%)  0.519
Prefix3                       362.81  (10.5%)   349.00  (11.4%)   -3.8%  (-23% - 20%)  0.273
BrowseRandomLabelTaxoFacets    17.76  (18.5%)    17.15  (20.0%)   -3.4%  (-35% - 43%)  0.577
BrowseMonthTaxoFacets          27.32  (27.1%)    26.45  (29.2%)   -3.2%  (-46% - 72%)  0.721
Wildcard                       64.61   (5.6%)    63.58   (6.0%)   -1.6%  (-12% - 10%)  0.381
OrNotHighMed                  887.00   (3.3%)   874.66   (3.4%)   -1.4%  (-7% - 5%)    0.191
LowTerm                      2661.07   (2.9%)  2630.67   (3.2%)   -1.1%  (-7% - 5%)    0.240
OrNotHighHigh                1523.01   (3.8%)  1506.01   (4.2%)   -1.1%  (-8% - 7%)    0.379
AndHighMedDayTaxoFacets       124.82   (1.4%)   123.59   (1.5%)   -1.0%  (-3% - 1%)    0.032
HighSpanNear                   21.47   (4.9%)    21.27   (4.9%)   -0.9%  (-10% - 9%)   0.558
MedPhrase                     342.22   (2.8%)   339.35   (3.2%)   -0.8%  (-6% - 5%)    0.373
HighPhrase                    453.21   (2.2%)   449.61   (2.6%)   -0.8%  (-5% - 4%)    0.291
MedSpanNear                    74.03   (4.3%)    73.45   (4.2%)   -0.8%  (-8% - 8%)    0.559
BrowseMonthSSDVFacets          13.79  (19.1%)    13.69  (19.7%)   -0.7%  (-33% - 47%)  0.910
LowPhrase                      85.89   (1.9%)    85.33   (2.0%)   -0.7%  (-4% - 3%)    0.294
PKLookup                      169.25   (3.2%)   168.19   (2.3%)   -0.6%  (-5% - 5%)    0.481
OrNotHighLow                  962.38   (3.0%)   956.47   (2.9%)   -0.6%  (-6% - 5%)    0.511
BrowseDayOfYearSSDVFacets      12.23  (14.4%)    12.15  (14.5%)   -0.6%  (-25% - 33%)  0.897
BrowseDateSSDVFacets            2.34   (6.1%)     2.32   (7.8%)   -0.6%  (-13% - 14%)  0.802
OrHighMed                     134.10   (5.0%)   133.49   (4.6%)   -0.5%  (-9% - 9%)    0.766
Fuzzy1                         91.09   (1.2%)    90.71   (1.8%)   -0.4%  (-3% - 2%)    0.373
HighTerm                     1690.39   (4.6%)  1684.21   (5.3%)   -0.4%  (-9% - 10%)   0.816
OrHighNotHigh                1592.87   (2.4%)  1587.86   (3.7%)   -0.3%  (-6% - 5%)    0.751
AndHighHighDayTaxoFacets       12.34   (2.4%)    12.31   (2.4%)   -0.3%  (-4% - 4%)    0.696
AndHighLow                    927.42   (3.2%)   924.95   (3.1%)   -0.3%  (-6% - 6%)    0.790
MedSloppyPhrase               107.35   (2.5%)   107.06   (2.5%)   -0.3%  (-5% - 4%)    0.736
IntNRQ                         83.16   (1.2%)    82.95   (1.0%)   -0.2%  (-2% - 1%)    0.469
MedTerm                      1935.87   (4.3%)  1934.77   (4.8%)   -0.1%  (-8% - 9%)    0.969
OrHighNotLow                 1109.87   (4.1%)  1109.25   (4.7%)   -0.1%  (-8% - 9%)    0.968
HighSloppyPhrase               28.62   (1.8%)    28.61   (2.6%)   -0.0%  (-4% - 4%)    0.981
Respell                        57.41   (1.1%)    57.40   (1.6%)   -0.0%  (-2% - 2%)    0.968
LowSpanNear                   193.79   (3.4%)   193.81   (3.8%)    0.0%  (-6% - 7%)    0.992
LowSloppyPhrase                30.88   (1.5%)    30.90   (1.8%)    0.1%  (-3% - 3%)    0.885
OrHighHigh                     38.71   (4.3%)    38.74   (4.2%)    0.1%  (-8% - 8%)    0.954
OrHighLow                     606.33   (2.8%)   606.93   (2.5%)    0.1%  (-5% - 5%)    0.907
OrHighNotMed                  985.50   (4.3%)   986.95   (4.6%)    0.1%  (-8% - 9%)    0.917
Fuzzy2                         38.38   (1.5%)    38.45   (1.8%)    0.2%  (-3% - 3%)    0.698
OrHighMedDayTaxoFacets          5.37   (4.8%)     5.39   (4.9%)    0.4%  (-8% - 10%)   0.801
MedTermDayTaxoFacets           27.22   (3.9%)    27.
[jira] [Commented] (LUCENE-10458) BoundedDocSetIdIterator may supply error count in Weigth#count(LeafReaderContext) when missingValue enables
[ https://issues.apache.org/jira/browse/LUCENE-10458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506341#comment-17506341 ] Adrien Grand commented on LUCENE-10458: --- Thanks for catching this, [~lucacavanna] is reverting the change on branch_9_1 to not delay the release. https://github.com/apache/lucene/pull/745 > BoundedDocSetIdIterator may supply error count in > Weigth#count(LeafReaderContext) when missingValue enables > --- > > Key: LUCENE-10458 > URL: https://issues.apache.org/jira/browse/LUCENE-10458 > Project: Lucene - Core > Issue Type: Bug >Reporter: Lu Xugang >Priority: Major > Fix For: 9.1 > > Time Spent: 20m > Remaining Estimate: 0h > > When IndexSortSortedNumericDocValuesRangeQuery can take advantage of index > sort, Weight#count will use BoundedDocSetIdIterator's lastDoc and firstDoc to > calculate count, but if missingValue enables, those Documents which not > contain DocValues may be involved in calculating count.
[jira] [Commented] (LUCENE-10441) ArrayIndexOutOfBoundsException during indexing
[ https://issues.apache.org/jira/browse/LUCENE-10441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506376#comment-17506376 ] Peixin Li commented on LUCENE-10441: How many tokens would cause this issue, and is there a way to improve it? Currently I'm using the standard analyzer for the IndexWriter; it could produce too many tokens if terms contain a lot of "-" or ".", right? > ArrayIndexOutOfBoundsException during indexing > -- > > Key: LUCENE-10441 > URL: https://issues.apache.org/jira/browse/LUCENE-10441 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 8.10 >Reporter: Peixin Li >Priority: Major > > Hi experts! I am facing an ArrayIndexOutOfBoundsException while indexing and > committing documents. The exception gives me no clue about what happened, so > I have little information for debugging. Can I have some suggestions about what > the cause could be and how to fix this error? I'm using Lucene 8.10.0 > {code:java} > java.lang.ArrayIndexOutOfBoundsException: -1 > at org.apache.lucene.util.BytesRefHash$1.get(BytesRefHash.java:179) > at > org.apache.lucene.util.StringMSBRadixSorter$1.get(StringMSBRadixSorter.java:42) > at > org.apache.lucene.util.StringMSBRadixSorter$1.setPivot(StringMSBRadixSorter.java:63) > at org.apache.lucene.util.Sorter.binarySort(Sorter.java:192) > at org.apache.lucene.util.Sorter.binarySort(Sorter.java:187) > at org.apache.lucene.util.IntroSorter.quicksort(IntroSorter.java:41) > at org.apache.lucene.util.IntroSorter.quicksort(IntroSorter.java:83) > at org.apache.lucene.util.IntroSorter.sort(IntroSorter.java:36) > at > org.apache.lucene.util.MSBRadixSorter.introSort(MSBRadixSorter.java:133) > at org.apache.lucene.util.MSBRadixSorter.sort(MSBRadixSorter.java:126) > at org.apache.lucene.util.MSBRadixSorter.sort(MSBRadixSorter.java:121) > at org.apache.lucene.util.BytesRefHash.sort(BytesRefHash.java:183) > at > org.apache.lucene.index.SortedSetDocValuesWriter.flush(SortedSetDocValuesWriter.java:171) > at > 
org.apache.lucene.index.DefaultIndexingChain.writeDocValues(DefaultIndexingChain.java:348) > at > org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:228) > at > org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:350) > at > org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:476) > at > org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:656) > at > org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3364) > at > org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3770) > at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3728) > {code}
[GitHub] [lucene] mocobeta opened a new pull request #746: Sanity check on start javaw
mocobeta opened a new pull request #746: URL: https://github.com/apache/lucene/pull/746 This tests whether the Luke process starts successfully with "java" on Mac/Linux or "start javaw" on Windows. TestScripts now checks whether the expected message is contained in a log file instead of in the forked process's stdout.
[jira] [Commented] (LUCENE-10464) unnecessary for-loop in WeightedSpanTermExtractor.extractWeightedSpanTerms
[ https://issues.apache.org/jira/browse/LUCENE-10464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506392#comment-17506392 ] Christine Poerschke commented on LUCENE-10464: -- https://github.com/apache/lucene/pull/737 > unnecessary for-loop in WeightedSpanTermExtractor.extractWeightedSpanTerms > --- > > Key: LUCENE-10464 > URL: https://issues.apache.org/jira/browse/LUCENE-10464 > Project: Lucene - Core > Issue Type: Task >Reporter: Christine Poerschke >Assignee: Christine Poerschke >Priority: Minor > > The > https://github.com/apache/lucene/commit/81c7ba4601a9aaf16e2255fe493ee582abe72a90 > change in LUCENE-4728 included > {code} > - final SpanQuery rewrittenQuery = (SpanQuery) > spanQuery.rewrite(getLeafContextForField(field).reader()); > + final SpanQuery rewrittenQuery = (SpanQuery) > spanQuery.rewrite(getLeafContext().reader()); > {code} > i.e. previously more needed to happen in the loop but now the query rewrite > and term collecting need not happen in the loop.
[jira] [Created] (LUCENE-10464) unnecessary for-loop in WeightedSpanTermExtractor.extractWeightedSpanTerms
Christine Poerschke created LUCENE-10464: Summary: unnecessary for-loop in WeightedSpanTermExtractor.extractWeightedSpanTerms Key: LUCENE-10464 URL: https://issues.apache.org/jira/browse/LUCENE-10464 Project: Lucene - Core Issue Type: Task Reporter: Christine Poerschke Assignee: Christine Poerschke The https://github.com/apache/lucene/commit/81c7ba4601a9aaf16e2255fe493ee582abe72a90 change in LUCENE-4728 included {code} - final SpanQuery rewrittenQuery = (SpanQuery) spanQuery.rewrite(getLeafContextForField(field).reader()); + final SpanQuery rewrittenQuery = (SpanQuery) spanQuery.rewrite(getLeafContext().reader()); {code} i.e. previously more needed to happen in the loop but now the query rewrite and term collecting need not happen in the loop.
[jira] [Created] (LUCENE-10465) Unable to find antlr4.runtime
Muler created LUCENE-10465: -- Summary: Unable to find antlr4.runtime Key: LUCENE-10465 URL: https://issues.apache.org/jira/browse/LUCENE-10465 Project: Lucene - Core Issue Type: Bug Components: general/build Reporter: Muler While running the trunk version of Lucene on Intellij, I'm getting the below error and unable to fix it. Error occurred during initialization of boot layer java.lang.module.FindException: Module antlr4.runtime not found, required by org.apache.lucene.expressions
[jira] [Commented] (LUCENE-10427) OLAP likewise rollup during segment merge process
[ https://issues.apache.org/jira/browse/LUCENE-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506402#comment-17506402 ] Adrien Grand commented on LUCENE-10427: --- Thanks I understand better now. With the sidecar approach, could you compute rollups at index time by performing updates instead of hooking into the merging process? For instance if a user is adding a new sample, you could retrieve data for the current bucket for the given dimensions and update the min/max/sum values? > OLAP likewise rollup during segment merge process > - > > Key: LUCENE-10427 > URL: https://issues.apache.org/jira/browse/LUCENE-10427 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Suhan Mao >Priority: Major > > Currently, many OLAP engines support rollup feature like > clickhouse(AggregateMergeTree)/druid. > Rollup definition: [https://athena.ecs.csus.edu/~mei/olap/OLAPoperations.php] > One of the way to do rollup is to merge the same dimension buckets into one > and do sum()/min()/max() operation on metric fields during segment > compact/merge process. This can significantly reduce the size of the data and > speed up the query a lot. > > *Abstraction of how to do* > # Define rollup logic: which is dimensions and metrics. > # Rollup definition for each metric field: max/min/sum ... > # index sorting should the the same as dimension fields. > # We will do rollup calculation during segment merge just like other OLAP > engine do. > > *Assume the scenario* > We use ES to ingest realtime raw temperature data every minutes of each > sensor device along with many dimension information. 
User may want to query > the data like "what is the max temperature of some device within some/latest > hour" or "what is the max temperature of some city within some/latest hour" > In that way, we can define such fields and rollup definition: > # event_hour(round to hour granularity) > # device_id(dimension) > # city_id(dimension) > # temperature(metrics, max/min rollup logic) > The raw data will periodically be rolled up to the hour granularity during > segment merge process, which should save 60x storage ideally in the end. > > *How we do rollup in segment merge* > bucket: docs should belong to the same bucket if the dimension values are all > the same. > # For docvalues merge, we send the normal mappedDocId if we encounter a new > bucket in DocIDMerger. > # Since the index sorting fields are the same with dimension fields. if we > encounter more docs in the same bucket, We emit special mappedDocId from > DocIDMerger . > # In DocValuesConsumer.mergeNumericField, if we meet special mappedDocId, we > do a rollup calculation on metric fields and fold the result value to the > first doc in the bucket. The calculation just like a streaming merge sort > rollup. > # We discard all the special mappedDocId docs because the metrics is already > folded to the first doc of in the bucket. > # In BKD/posting structure, we discard all the special mappedDocId docs and > only place the first doc id within a bucket in the BKD/posting data. It > should be simple. > > *How to define the logic* > > {code:java} > public class RollupMergeConfig { > private List dimensionNames; > private List aggregateFields; > } > public class RollupMergeAggregateField { > private String name; > private RollupMergeAggregateType aggregateType; > } > public enum RollupMergeAggregateType { > COUNT, > SUM, > MIN, > MAX, > CARDINALITY // if data sketch is stored in binary doc values, we can do a > union logic > }{code} > > > I have written the initial code in a basic level. 
I can submit the complete > PR if you think this feature is good to try.
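The "streaming merge sort rollup" described in the proposal can be sketched outside of Lucene as a single pass over rows that are already sorted by their dimension values. Everything below is hypothetical and for illustration only: `RollupSketch`, `Row`, and `rollupMax` are invented names, the bucket key is a plain string standing in for the combined dimensions (for example event_hour, device_id, city_id), and only a `max` metric is folded.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical one-pass rollup over dimension-sorted rows; not the proposed Lucene code. */
final class RollupSketch {

  static final class Row {
    final String bucket; // stands in for the combined dimension values
    long max; // metric with max rollup logic, e.g. temperature

    Row(String bucket, long max) {
      this.bucket = bucket;
      this.max = max;
    }
  }

  /** Folds consecutive rows of the same bucket into the first row of that bucket. */
  static List<Row> rollupMax(List<Row> sortedByBucket) {
    List<Row> out = new ArrayList<>();
    Row current = null;
    for (Row row : sortedByBucket) {
      if (current != null && current.bucket.equals(row.bucket)) {
        // Same bucket: fold the metric and drop the row.
        current.max = Math.max(current.max, row.max);
      } else {
        // New bucket: this row survives and accumulates later ones.
        current = new Row(row.bucket, row.max);
        out.add(current);
      }
    }
    return out;
  }
}
```

Because the input is sorted by the dimensions, equal buckets are always adjacent, so one pass with constant extra state is enough. That adjacency is why the proposal requires index sorting on the dimension fields.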
[jira] [Commented] (LUCENE-10461) Luke: Windows launch script passes integration tests but fails to run
[ https://issues.apache.org/jira/browse/LUCENE-10461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506403#comment-17506403 ] Tomoko Uchida commented on LUCENE-10461: I am fine with keeping javaw support. In that case, we could slightly change TestScripts so that it can check the health of the process spawned by "start javaw" when it runs on Windows. I'm sorry for being persistent; I just wanted to test the actual command rather than one adjusted for testing. https://github.com/apache/lucene/pull/746 > Luke: Windows launch script passes integration tests but fails to run > - > > Key: LUCENE-10461 > URL: https://issues.apache.org/jira/browse/LUCENE-10461 > Project: Lucene - Core > Issue Type: Bug >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Major > Fix For: 9.1, 10.0 (main) > > Attachments: image-2022-03-13-11-18-34-704.png > > Time Spent: 1h 20m > Remaining Estimate: 0h > > PR at https://github.com/apache/lucene/pull/743
[GitHub] [lucene] mocobeta commented on a change in pull request #746: Sanity check on start javaw
mocobeta commented on a change in pull request #746:
URL: https://github.com/apache/lucene/pull/746#discussion_r826259906

## File path: lucene/distribution.tests/src/test/org/apache/lucene/distribution/TestScripts.java ##

@@ -112,6 +120,7 @@ protected void execute(
       Launcher launcher,
       int expectedExitCode,
       long timeoutInSeconds,
+      Path logFile,

Review comment:
I added this parameter as a proof of concept, but it can (and should) be `Optional`. If an empty `Optional` is passed, `processOutputConsumer` would take the stdout of the forked process as before.
[GitHub] [lucene] mocobeta commented on a change in pull request #746: Sanity check on start javaw
mocobeta commented on a change in pull request #746:
URL: https://github.com/apache/lucene/pull/746#discussion_r826266136

## File path: lucene/distribution.tests/src/test/org/apache/lucene/distribution/TestScripts.java ##

@@ -125,13 +134,29 @@ protected void execute(
         throw new AssertionError("Forked process did not terminate in the expected time");
       }

+      // Wait until the log file is created by Luke.

Review comment:
```suggestion
      // Wait until the log file is created by the descendant process.
```
[jira] [Commented] (LUCENE-10461) Luke: Windows launch script passes integration tests but fails to run
[ https://issues.apache.org/jira/browse/LUCENE-10461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506458#comment-17506458 ]

Dawid Weiss commented on LUCENE-10461:

Hey, Tomoko. I don't think we should go in the direction you have in your patch. java and javaw are literally the same, functionally. The difference is one has a window-application api entry and the other requires console support - it really doesn't matter from Luke's point of view. I also think it is more than fine to run the tests with 'java' and skip the 'start' to make the script blocking. It is a Windows-specific script and it is a Windows-specific workaround to make the test behave better. With your patch, the test relies on wall clock to detect the log file and is just more complex than it has to be. Just my few cents.
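The concern about relying on elapsed time can be made concrete with a minimal sketch of what "wait for the log file" has to look like. This is not the code from the pull request; `LogFilePolling` and `waitForFile` are hypothetical names, and the timeout and poll interval are arbitrary values the caller must guess.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;

/** Hypothetical helper, not part of TestScripts: polls until a file exists or time runs out. */
final class LogFilePolling {

  /**
   * Returns true if the file appeared within the timeout, false otherwise. A false result
   * cannot distinguish "the child failed" from "the child was merely slow".
   */
  static boolean waitForFile(Path file, Duration timeout, Duration pollInterval)
      throws InterruptedException {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (!Files.exists(file)) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // time budget exhausted
      }
      Thread.sleep(pollInterval.toMillis());
    }
    return true;
  }
}
```

By contrast, a blocking launch needs no guessed interval: the parent can call `Process.waitFor(long, TimeUnit)` once and read the exit status directly.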
[jira] [Created] (LUCENE-10466) IndexSortSortedNumericDocValuesRangeQuery unconditionally assumes the usage of the LONG-encoded SortField
Andriy Redko created LUCENE-10466:

Summary: IndexSortSortedNumericDocValuesRangeQuery unconditionally assumes the usage of the LONG-encoded SortField
Key: LUCENE-10466
URL: https://issues.apache.org/jira/browse/LUCENE-10466
Project: Lucene - Core
Issue Type: Bug
Components: core/query/scoring
Affects Versions: 9.0
Reporter: Andriy Redko

We have run into this issue while migrating to OpenSearch and making changes to accommodate https://issues.apache.org/jira/browse/LUCENE-10040. It turned out that *IndexSortSortedNumericDocValuesRangeQuery* unconditionally assumes the usage of the LONG-encoded {*}SortField{*}, as can be seen inside the *static ValueComparator loadComparator* method:
{noformat}
@SuppressWarnings("unchecked")
FieldComparator<Long> fieldComparator = (FieldComparator<Long>) sortField.getComparator(1, 0);
fieldComparator.setTopValue(topValue);
{noformat}

Using the numeric range query (in the case of a sorted index) with anything but LONG ends up with a class cast exception:
{noformat}
> java.lang.ClassCastException: class java.lang.Long cannot be cast to class java.lang.Integer (java.lang.Long and java.lang.Integer are in module java.base of loader 'bootstrap')
>     at org.apache.lucene.search.comparators.IntComparator.setTopValue(IntComparator.java:29)
>     at org.apache.lucene.sandbox.search.IndexSortSortedNumericDocValuesRangeQuery.loadComparator(IndexSortSortedNumericDocValuesRangeQuery.java:251)
>     at org.apache.lucene.sandbox.search.IndexSortSortedNumericDocValuesRangeQuery.getDocIdSetIterator(IndexSortSortedNumericDocValuesRangeQuery.java:206)
>     at org.apache.lucene.sandbox.search.IndexSortSortedNumericDocValuesRangeQuery$1.scorer(IndexSortSortedNumericDocValuesRangeQuery.java:170)
{noformat}

Simple test case to reproduce (for TestIndexSortSortedNumericDocValuesRangeQuery):
{noformat}
public void testIndexSortDocValuesWithIntRange() throws Exception {
  Directory dir = newDirectory();
  IndexWriterConfig iwc = new IndexWriterConfig(new MockAnalyzer(random()));
  Sort indexSort = new Sort(new SortedNumericSortField("field", SortField.Type.INT, false));
  iwc.setIndexSort(indexSort);
  RandomIndexWriter writer = new RandomIndexWriter(random(), dir, iwc);
  writer.addDocument(createDocument("field", -80));
  DirectoryReader reader = writer.getReader();
  IndexSearcher searcher = newSearcher(reader);
  // Test ranges consisting of one value.
  assertEquals(1, searcher.count(createQuery("field", -80, -80)));
  writer.close();
  reader.close();
  dir.close();
}
{noformat}

The expectation is that *IndexSortSortedNumericDocValuesRangeQuery* should not fail with a class cast exception but should correctly convert the numeric values.
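The failure mode reported here can be reproduced without Lucene. The sketch below uses simplified stand-ins (`TopValueComparator`, `IntTopValueComparator`, `LongTopValueComparator` are hypothetical names, not the real Lucene comparator classes) to show why an unchecked cast to a Long-typed comparator compiles but fails at runtime, and one possible way to convert the value first. The conversion shown is an assumption for illustration, not the fix that was eventually applied.

```java
/** Simplified stand-ins for the comparators involved; NOT the real Lucene classes. */
final class TopValueCastSketch {

  interface TopValueComparator<T> {
    void setTopValue(T value);
  }

  static final class IntTopValueComparator implements TopValueComparator<Integer> {
    int topValue;

    @Override
    public void setTopValue(Integer value) {
      topValue = value;
    }
  }

  static final class LongTopValueComparator implements TopValueComparator<Long> {
    long topValue;

    @Override
    public void setTopValue(Long value) {
      topValue = value;
    }
  }

  /**
   * Mirrors the problematic pattern: the comparator is always treated as Long-typed. The
   * cast itself is unchecked, so the ClassCastException only surfaces inside setTopValue.
   */
  @SuppressWarnings("unchecked")
  static void setTopValueAssumingLong(TopValueComparator<?> comparator, long topValue) {
    ((TopValueComparator<Long>) comparator).setTopValue(topValue);
  }

  /** One possible approach: convert the long to the comparator's own type before the call. */
  static void setTopValueConverted(TopValueComparator<?> comparator, long topValue) {
    if (comparator instanceof IntTopValueComparator) {
      // toIntExact fails loudly instead of silently truncating out-of-range values.
      ((IntTopValueComparator) comparator).setTopValue(Math.toIntExact(topValue));
    } else if (comparator instanceof LongTopValueComparator) {
      ((LongTopValueComparator) comparator).setTopValue(topValue);
    } else {
      throw new IllegalArgumentException("Unsupported comparator: " + comparator.getClass());
    }
  }
}
```

Calling `setTopValueAssumingLong` with an `IntTopValueComparator` throws `ClassCastException` from the compiler-generated bridge method, which matches the shape of the stack trace in the report.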
[jira] [Updated] (LUCENE-10466) IndexSortSortedNumericDocValuesRangeQuery unconditionally assumes the usage of the LONG-encoded SortField
[ https://issues.apache.org/jira/browse/LUCENE-10466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andriy Redko updated LUCENE-10466: -- Description: We have run into this issue while migrating to OpenSearch and making changes to accommodate https://issues.apache.org/jira/browse/LUCENE-10040. It turned out that *IndexSortSortedNumericDocValuesRangeQuery* unconditionally assumes the usage of the LONG-encoded {*}SortField{*}, as could be seen inside *static ValueComparator loadComparator* method {noformat} @SuppressWarnings("unchecked") FieldComparator fieldComparator = (FieldComparator) sortField.getComparator(1, 0); fieldComparator.setTopValue(topValue); {noformat} Using the numeric range query (in case of sorted index) with anything but LONG ends up with class cast exception: {noformat} > java.lang.ClassCastException: class java.lang.Long cannot be cast to class java.lang.Integer (java.lang.Long and java.lang.Integer are in module java.base of loader 'bootstrap') > at org.apache.lucene.search.comparators.IntComparator.setTopValue(IntComparator.java:29) > at org.apache.lucene.sandbox.search.IndexSortSortedNumericDocValuesRangeQuery.loadComparator(IndexSortSortedNumericDocValuesRangeQuery.java:251) > at org.apache.lucene.sandbox.search.IndexSortSortedNumericDocValuesRangeQuery.getDocIdSetIterator(IndexSortSortedNumericDocValuesRangeQuery.java:206) > at org.apache.lucene.sandbox.search.IndexSortSortedNumericDocValuesRangeQuery$1.scorer(IndexSortSortedNumericDocValuesRangeQuery.java:170) {noformat} Simple test case to reproduce (for TestIndexSortSortedNumericDocValuesRangeQuery): {noformat} public void testIndexSortDocValuesWithIntRange() throws Exception { Directory dir = newDirectory(); IndexWriterConfig iwc = new IndexWriterConfig(new MockAnalyzer(random())); Sort indexSort = new Sort(new SortedNumericSortField("field", SortField.Type.INT, false)); iwc.setIndexSort(indexSort); RandomIndexWriter writer = new RandomIndexWriter(random(), dir, 
iwc); writer.addDocument(createDocument("field", -80)); DirectoryReader reader = writer.getReader(); IndexSearcher searcher = newSearcher(reader); // Test ranges consisting of one value. assertEquals(1, searcher.count(createQuery("field", -80, -80))); writer.close(); reader.close(); dir.close(); } {noformat} The expectation is that *IndexSortSortedNumericDocValuesRangeQuery* should not fail with class cast but correctly convert the numeric values. was: We have run into this issue while migrating to OpenSearch and making changes to accommodate https://issues.apache.org/jira/browse/LUCENE-10040. It turned out that *IndexSortSortedNumericDocValuesRangeQuery* unconditionally assumes the usage of the LONG-encoded {*}SortField{*}, as could be seen inside *static ValueComparator loadComparator* method {noformat} @SuppressWarnings("unchecked") FieldComparator fieldComparator = (FieldComparator) sortField.getComparator(1, 0); fieldComparator.setTopValue(topValue); {noformat} Using the numeric range query (in case of sorted index) with anything but LONG ends up with class cast exception: {noformat} > java.lang.ClassCastException: class java.lang.Long cannot be cast to class java.lang.Integer (java.lang.Long and java.lang.Integer are in module java.base of loader 'bootstrap') > at org.apache.lucene.search.comparators.IntComparator.setTopValue(IntComparator.java:29) > at org.apache.lucene.sandbox.search.IndexSortSortedNumericDocValuesRangeQuery.loadComparator(IndexSortSortedNumericDocValuesRangeQuery.java:251) > at org.apache.lucene.sandbox.search.IndexSortSortedNumericDocValuesRangeQuery.getDocIdSetIterator(IndexSortSortedNumericDocValuesRangeQuery.java:206) > at org.apache.lucene.sandbox.search.IndexSortSortedNumericDocValuesRangeQuery$1.scorer(IndexSortSortedNumericDocValuesRangeQuery.java:170) {noformat} Simple test case to reproduce (for TestIndexSortSortedNumericDocValuesRangeQuery): {noformat} public void testIndexSortDocValuesWithIntRange() throws Exception { Directory 
dir = newDirectory(); IndexWriterConfig iwc = new IndexWriterConfig(new MockAnalyzer(random())); Sort indexSort = new Sort(new SortedNumericSortField("field", SortField.Type.INT, false)); iwc.setIndexSort(indexSort); RandomIndexWriter writer = new RandomIndexWriter(random(), dir, iwc); writer.addDocument(createDocument("field", -80)); DirectoryReader reader = writer.getReader(); IndexSearcher searcher = newSearcher(reader); // Test ranges consisting of one value. assertEquals(1, searcher.count(createQuery("field", -80, -80))); writer.close(); reader.close(); dir.close(); } {noformat} The expectation is that *IndexSortSortedNumericDocVa
[jira] [Commented] (LUCENE-10466) IndexSortSortedNumericDocValuesRangeQuery unconditionally assumes the usage of the LONG-encoded SortField
[ https://issues.apache.org/jira/browse/LUCENE-10466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506464#comment-17506464 ]

Andriy Redko commented on LUCENE-10466:

[~jpountz] does the issue make sense to you? I would be happy to work on pull request to fix that, thank you.
[GitHub] [lucene] mocobeta commented on a change in pull request #746: Sanity check on start javaw
mocobeta commented on a change in pull request #746:
URL: https://github.com/apache/lucene/pull/746#discussion_r826294904

## File path: lucene/distribution/src/binary-release/bin/luke.cmd ##

@@ -18,21 +18,17 @@ SETLOCAL
 SET MODULES=%~dp0..
-IF DEFINED LAUNCH_CMD GOTO testing
 REM Windows 'start' command takes the first quoted argument to be the title of the started window. Since we
 REM quote the LAUNCH_CMD (because it can contain spaces), it misinterprets it as the title and fails to run.
 REM force the window title here.
 SET LAUNCH_START=start "Lucene Luke"
+
+IF DEFINED LAUNCH_CMD GOTO testing
 SET LAUNCH_CMD=javaw
 SET LAUNCH_OPTS=
 goto launch
 :testing
-REM For distribution testing we don't use start and pass an explicit java command path,
-REM This is required because otherwise we can't block on luke invocation and can't intercept
-REM the return status. We also force UTF-8 encoding so that we don't have to interpret the output in
-REM an unknown local platform encoding.

Review comment:
This comment was wrongly removed; should be reverted.
[jira] [Updated] (LUCENE-10466) IndexSortSortedNumericDocValuesRangeQuery unconditionally assumes the usage of the LONG-encoded SortField
[ https://issues.apache.org/jira/browse/LUCENE-10466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andriy Redko updated LUCENE-10466:

Priority: Minor (was: Major)
[jira] [Commented] (LUCENE-10461) Luke: Windows launch script passes integration tests but fails to run
[ https://issues.apache.org/jira/browse/LUCENE-10461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506488#comment-17506488 ]

Dawid Weiss commented on LUCENE-10461:

Just to be clear - I did take a look at the patch... I know what you're trying to do but something in me says it's not going to work reliably (the timeouts awaiting the appearance of the log file, the wait for the buffer flush from a subprocess). I think it's far less trappy to just use the blocking java call in the script for integration tests... If you're convinced this is the way to go then I'm not going to stand in the way... I'd perhaps suggest to at least make the start command blocking (add the /wait option in the script just for testing) - this will eliminate the need to wait for the log file to appear (as start will be synchronous then).
[GitHub] [lucene] Yuti-G opened a new pull request #747: LUCENE-10325: Add getTopDims functionality to Facets
Yuti-G opened a new pull request #747:
URL: https://github.com/apache/lucene/pull/747

# Description

This change
* adds a new API, getTopDims, to Facets that lets users specify the number of dims and children they want, and returns only those dims and children.
* overrides getTopDims in SortedSetDocValuesFacetCounts to optimize the current way of getting dimCount, returning FacetResult and resolving child paths only for the requested dims.

# Solution

* Implement a default getTopDims function in the Facets class.
* Override getTopDims and refactor the getPathResult function in SortedSetDocValuesFacetCounts to get dimCount (aggregated dim values) more efficiently by checking whether dimCount was populated at indexing time (setRequireDimCount == true) before accumulating dimCount using a while loop.
* Use a priority queue to store the requested top n dims, then populate labels and return FacetResult for those dims.

# Tests

Added new tests for both the default and overridden implementations of getTopDims.

# Checklist

Please review the following and check all that apply:

- [X] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/lucene/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [X] I have created a Jira issue and added the issue ID to my pull request title.
- [X] I have given Lucene maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [X] I have developed this patch against the `main` branch.
- [X] I have run `./gradlew check`.
- [X] I have added tests for my changes.
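The priority-queue step mentioned in the pull request description can be sketched on its own. This is not the code from the PR and does not use the Lucene facets API; `TopDimsSketch` and `topDims` are hypothetical names, and dims are modelled as a plain map from dim name to its aggregated count. The tie-break by name is an assumption made here to keep the result deterministic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical illustration of bounded top-n selection; not the Lucene facets API. */
final class TopDimsSketch {

  /** Returns the names of the topN dims, highest count first; ties are ordered by name. */
  static List<String> topDims(Map<String, Integer> dimCounts, int topN) {
    if (topN <= 0) {
      throw new IllegalArgumentException("topN must be > 0 (got: " + topN + ")");
    }
    // Orders entries weakest-first so the head of the queue is always the eviction candidate.
    Comparator<Map.Entry<String, Integer>> weakestFirst =
        (a, b) -> {
          int byCount = Integer.compare(a.getValue(), b.getValue());
          return byCount != 0 ? byCount : b.getKey().compareTo(a.getKey());
        };
    PriorityQueue<Map.Entry<String, Integer>> queue = new PriorityQueue<>(weakestFirst);
    for (Map.Entry<String, Integer> entry : dimCounts.entrySet()) {
      queue.offer(entry);
      if (queue.size() > topN) {
        queue.poll(); // keep at most topN entries in memory
      }
    }
    List<String> result = new ArrayList<>();
    while (!queue.isEmpty()) {
      result.add(queue.poll().getKey());
    }
    Collections.reverse(result); // the queue drains weakest-first
    return result;
  }
}
```

Keeping the queue bounded at topN means labels and children only need to be resolved for the dims that survive, which is the saving the description is after.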
[jira] [Commented] (LUCENE-10460) Delegating DocIdSetIterator could be replaced to DocIdSetIterator#range(int minDoc, int maxDoc) in IndexSortSortedNumericDocValuesRangeQuery
[ https://issues.apache.org/jira/browse/LUCENE-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506528#comment-17506528 ]

Julie Tibshirani commented on LUCENE-10460:

It indeed seems okay to use a simple DocIdSetIterator#range in this case. I'm wondering about the motivation for specializing this case though, especially since the logic is already pretty complex. Have you seen it make a latency difference when there are missing values? In the case with no missing values I don't think it will help much, since iterating dense doc values is already optimized (see DenseNumericDocValues).

> Delegating DocIdSetIterator could be replaced to DocIdSetIterator#range(int
> minDoc, int maxDoc) in IndexSortSortedNumericDocValuesRangeQuery
>
> Key: LUCENE-10460
> URL: https://issues.apache.org/jira/browse/LUCENE-10460
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Lu Xugang
> Priority: Trivial
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When taking advantage of index sort in
> IndexSortSortedNumericDocValuesRangeQuery, if MissingValue is disabled, all
> documents in the range between firstDoc and lastDoc must contain DocValues. So
> in BoundedDocIdSetIterator#advance(int), could the delegating DocIdSetIterator
> be replaced with DocIdSetIterator#range(int minDoc, int maxDoc)?
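What a range iterator buys here is that it never consults a delegate: every doc id in the interval matches by construction. The sketch below is a self-contained stand-in (`RangeDocIdIteratorSketch` is a hypothetical name; it is not Lucene's `DocIdSetIterator` and only mimics its general contract) that iterates the half-open interval from minDoc up to but excluding maxDoc.

```java
/** Hypothetical stand-in for a doc id range iterator; NOT Lucene's DocIdSetIterator. */
final class RangeDocIdIteratorSketch {
  /** Sentinel returned once the iterator is exhausted. */
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private final int minDoc;
  private final int maxDoc;
  private int doc = -1; // positioned before the first doc

  RangeDocIdIteratorSketch(int minDoc, int maxDoc) {
    if (minDoc < 0 || minDoc >= maxDoc) {
      throw new IllegalArgumentException("need 0 <= minDoc < maxDoc");
    }
    this.minDoc = minDoc;
    this.maxDoc = maxDoc;
  }

  int docID() {
    return doc;
  }

  int nextDoc() {
    if (doc == NO_MORE_DOCS) {
      return doc; // stay exhausted; also avoids int overflow on doc + 1
    }
    return advance(doc + 1);
  }

  /** Moves to the first doc at or after target, clamped to the range. */
  int advance(int target) {
    if (target < minDoc) {
      doc = minDoc;
    } else if (target >= maxDoc) {
      doc = NO_MORE_DOCS;
    } else {
      doc = target;
    }
    return doc;
  }

  /** Upper bound on the number of docs this iterator can return. */
  long cost() {
    return (long) maxDoc - minDoc;
  }
}
```

Both `nextDoc` and `advance` are constant-time arithmetic, so any gain over a delegating iterator depends on how expensive the delegate's own advance is, which is the question raised in the comment.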
[jira] [Commented] (LUCENE-10466) IndexSortSortedNumericDocValuesRangeQuery unconditionally assumes the usage of the LONG-encoded SortField
[ https://issues.apache.org/jira/browse/LUCENE-10466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506530#comment-17506530 ]

Julie Tibshirani commented on LUCENE-10466:

Thank you [~reta] for reporting this. I had noticed the same thing when integrating IndexSortSortedNumericDocValuesRangeQuery into Elasticsearch but forgot to follow up!! I would be happy to help review a PR if you're up for it. For context, how did you run into this? How does it relate to deletions in nearest vector search (https://issues.apache.org/jira/browse/LUCENE-10040)?
[jira] [Comment Edited] (LUCENE-10466) IndexSortSortedNumericDocValuesRangeQuery unconditionally assumes the usage of the LONG-encoded SortField
[ https://issues.apache.org/jira/browse/LUCENE-10466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506530#comment-17506530 ]

Julie Tibshirani edited comment on LUCENE-10466 at 3/14/22, 8:40 PM:

Thank you [~reta] for reporting this. I had noticed the same thing when integrating IndexSortSortedNumericDocValuesRangeQuery into Elasticsearch but forgot to follow up!! I would be happy to help review a PR if you're up for it. For context, how did you run into this? How does it relate to deletions in nearest vector search (https://issues.apache.org/jira/browse/LUCENE-10040) ?

was (Author: julietibs):
Thank you [~reta] for reporting this. I had noticed the same thing when integrating IndexSortSortedNumericDocValuesRangeQuery into Elasticsearch but forgot to follow up!! I would be happy to help review a PR if you're up for it. For context, how did you run into this? How does it relate to deletions in nearest vector search (https://issues.apache.org/jira/browse/LUCENE-10040)?
[GitHub] [lucene] jtibshirani merged pull request #745: Revert "LUCENE-10385: Implement Weight#count on IndexSortSortedNumeri…
jtibshirani merged pull request #745: URL: https://github.com/apache/lucene/pull/745
[jira] [Commented] (LUCENE-10385) Implement Weight#count on IndexSortSortedNumericDocValuesRangeQuery.
[ https://issues.apache.org/jira/browse/LUCENE-10385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506531#comment-17506531 ]

ASF subversion and git services commented on LUCENE-10385:

Commit a6114b532a273e370528675d551d3ddfa02f4679 in lucene's branch refs/heads/branch_9_1 from Luca Cavanna
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=a6114b5 ]

Revert "LUCENE-10385: Implement Weight#count on IndexSortSortedNumeri… (#745)

In LUCENE-10458 we identified a bug in the logic. We're reverting on the 9.1 branch to avoid holding up the release.

> Implement Weight#count on IndexSortSortedNumericDocValuesRangeQuery.
>
> Key: LUCENE-10385
> URL: https://issues.apache.org/jira/browse/LUCENE-10385
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Adrien Grand
> Priority: Minor
> Fix For: 9.1
>
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> This query can count matches by computing the first and last matching doc IDs using binary search.
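The idea in the quoted description, counting matches from the first and last matching doc IDs, can be illustrated with a minimal, self-contained sketch. It assumes a single segment whose documents are sorted ascending by exactly one numeric value each, and it models that segment as a plain `long[]` whose positions stand in for doc IDs. The names `SortedRangeCountSketch`, `lowerBound`, `upperBound` and `count` are hypothetical; this is not the Lucene implementation.

```java
// Sketch only: a sorted long[] stands in for an index-sorted segment,
// and the array position plays the role of the doc ID.
class SortedRangeCountSketch {

  // Index of the first element >= target (array length if there is none).
  static int lowerBound(long[] sorted, long target) {
    int lo = 0;
    int hi = sorted.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (sorted[mid] < target) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  // Index of the first element > target (array length if there is none).
  static int upperBound(long[] sorted, long target) {
    int lo = 0;
    int hi = sorted.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (sorted[mid] <= target) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  // Number of values in [lower, upper], computed from two binary searches
  // instead of visiting every matching document.
  static int count(long[] sorted, long lower, long upper) {
    if (lower > upper) {
      return 0;
    }
    int firstDoc = lowerBound(sorted, lower);
    int lastDocExclusive = upperBound(sorted, upper);
    return lastDocExclusive - firstDoc;
  }

  public static void main(String[] args) {
    long[] values = {-80, 3, 3, 7, 10};
    System.out.println(count(values, 3, 7)); // prints 3
  }
}
```

The count costs two binary searches, so it is logarithmic in the segment size. It is only valid when every position in the computed range really is a match, which is the assumption the revert above is about.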
[jira] [Commented] (LUCENE-10458) BoundedDocSetIdIterator may supply error count in Weigth#count(LeafReaderContext) when missingValue enables
[ https://issues.apache.org/jira/browse/LUCENE-10458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506532#comment-17506532 ]

ASF subversion and git services commented on LUCENE-10458:

Commit a6114b532a273e370528675d551d3ddfa02f4679 in lucene's branch refs/heads/branch_9_1 from Luca Cavanna
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=a6114b5 ]

Revert "LUCENE-10385: Implement Weight#count on IndexSortSortedNumeri… (#745)

In LUCENE-10458 we identified a bug in the logic. We're reverting on the 9.1 branch to avoid holding up the release.

> BoundedDocSetIdIterator may supply error count in Weigth#count(LeafReaderContext) when missingValue enables
>
> Key: LUCENE-10458
> URL: https://issues.apache.org/jira/browse/LUCENE-10458
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Lu Xugang
> Priority: Major
> Fix For: 9.1
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> When IndexSortSortedNumericDocValuesRangeQuery can take advantage of the index sort, Weight#count uses BoundedDocSetIdIterator's firstDoc and lastDoc to calculate the count. However, if a missingValue is set, documents that do not contain doc values may be included in that count.
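A small sketch may make the reported miscount concrete. It assumes that documents without a value are ordered as if they carried the configured missing value, models the segment as a `Long[]` in which `null` means "no doc value", and uses hypothetical names throughout (`MissingValueCountSketch`, `countByDocIdRange`, `countActualMatches`). It illustrates the failure mode described in the issue, not the actual Lucene code path.

```java
// Sketch only: null entries model documents that have no doc value.
class MissingValueCountSketch {

  // Sort key used to order documents: docs without a value sort as missingValue.
  static long sortKey(Long value, long missingValue) {
    return value == null ? missingValue : value;
  }

  // Counts by doc ID range over the sort keys, the way a firstDoc/lastDoc
  // computation would (linear scan for brevity instead of binary search).
  static int countByDocIdRange(Long[] docsInIndexOrder, long missingValue, long lower, long upper) {
    int firstDoc = 0;
    while (firstDoc < docsInIndexOrder.length
        && sortKey(docsInIndexOrder[firstDoc], missingValue) < lower) {
      firstDoc++;
    }
    int lastDoc = firstDoc;
    while (lastDoc < docsInIndexOrder.length
        && sortKey(docsInIndexOrder[lastDoc], missingValue) <= upper) {
      lastDoc++;
    }
    return lastDoc - firstDoc;
  }

  // Counts only documents that actually carry a value inside [lower, upper].
  static int countActualMatches(Long[] docs, long lower, long upper) {
    int count = 0;
    for (Long value : docs) {
      if (value != null && value >= lower && value <= upper) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    // Index order with missingValue = 3: the null doc sorts between 1 and 5.
    Long[] docs = {1L, null, 5L, 9L};
    System.out.println(countByDocIdRange(docs, 3L, 2L, 6L)); // prints 2
    System.out.println(countActualMatches(docs, 2L, 6L));    // prints 1
  }
}
```

The two numbers differ only when the missing value falls inside the queried range; outside it, the doc ID range contains no value-less documents and both counts agree.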
[jira] [Updated] (LUCENE-10466) IndexSortSortedNumericDocValuesRangeQuery unconditionally assumes the usage of the LONG-encoded SortField
[ https://issues.apache.org/jira/browse/LUCENE-10466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andriy Redko updated LUCENE-10466: -- Description: We have run into this issue while migrating to OpenSearch and making changes to accommodate https://issues.apache.org/jira/browse/LUCENE-10087. It turned out that *IndexSortSortedNumericDocValuesRangeQuery* unconditionally assumes the usage of the LONG-encoded {*}SortField{*}, as could be seen inside *static ValueComparator loadComparator* method {noformat} @SuppressWarnings("unchecked") FieldComparator fieldComparator = (FieldComparator) sortField.getComparator(1, 0); fieldComparator.setTopValue(topValue); {noformat} Using the numeric range query (in case of sorted index) with anything but LONG ends up with class cast exception: {noformat} > java.lang.ClassCastException: class java.lang.Long cannot be cast to class java.lang.Integer (java.lang.Long and java.lang.Integer are in module java.base of loader 'bootstrap') > at org.apache.lucene.search.comparators.IntComparator.setTopValue(IntComparator.java:29) > at org.apache.lucene.sandbox.search.IndexSortSortedNumericDocValuesRangeQuery.loadComparator(IndexSortSortedNumericDocValuesRangeQuery.java:251) > at org.apache.lucene.sandbox.search.IndexSortSortedNumericDocValuesRangeQuery.getDocIdSetIterator(IndexSortSortedNumericDocValuesRangeQuery.java:206) > at org.apache.lucene.sandbox.search.IndexSortSortedNumericDocValuesRangeQuery$1.scorer(IndexSortSortedNumericDocValuesRangeQuery.java:170) {noformat} Simple test case to reproduce (for TestIndexSortSortedNumericDocValuesRangeQuery): {noformat} public void testIndexSortDocValuesWithIntRange() throws Exception { Directory dir = newDirectory(); IndexWriterConfig iwc = new IndexWriterConfig(new MockAnalyzer(random())); Sort indexSort = new Sort(new SortedNumericSortField("field", SortField.Type.INT, false)); iwc.setIndexSort(indexSort); RandomIndexWriter writer = new RandomIndexWriter(random(), dir, 
iwc); writer.addDocument(createDocument("field", -80)); DirectoryReader reader = writer.getReader(); IndexSearcher searcher = newSearcher(reader); // Test ranges consisting of one value. assertEquals(1, searcher.count(createQuery("field", -80, -80))); writer.close(); reader.close(); dir.close(); } {noformat} The expectation is that *IndexSortSortedNumericDocValuesRangeQuery* should not fail with class cast but correctly convert the numeric values. was: We have run into this issue while migrating to OpenSearch and making changes to accommodate https://issues.apache.org/jira/browse/LUCENE-10040. It turned out that *IndexSortSortedNumericDocValuesRangeQuery* unconditionally assumes the usage of the LONG-encoded {*}SortField{*}, as could be seen inside *static ValueComparator loadComparator* method {noformat} @SuppressWarnings("unchecked") FieldComparator fieldComparator = (FieldComparator) sortField.getComparator(1, 0); fieldComparator.setTopValue(topValue); {noformat} Using the numeric range query (in case of sorted index) with anything but LONG ends up with class cast exception: {noformat} > java.lang.ClassCastException: class java.lang.Long cannot be cast to class java.lang.Integer (java.lang.Long and java.lang.Integer are in module java.base of loader 'bootstrap') > at org.apache.lucene.search.comparators.IntComparator.setTopValue(IntComparator.java:29) > at org.apache.lucene.sandbox.search.IndexSortSortedNumericDocValuesRangeQuery.loadComparator(IndexSortSortedNumericDocValuesRangeQuery.java:251) > at org.apache.lucene.sandbox.search.IndexSortSortedNumericDocValuesRangeQuery.getDocIdSetIterator(IndexSortSortedNumericDocValuesRangeQuery.java:206) > at org.apache.lucene.sandbox.search.IndexSortSortedNumericDocValuesRangeQuery$1.scorer(IndexSortSortedNumericDocValuesRangeQuery.java:170) {noformat} Simple test case to reproduce (for TestIndexSortSortedNumericDocValuesRangeQuery): {noformat} public void testIndexSortDocValuesWithIntRange() throws Exception { Directory 
dir = newDirectory(); IndexWriterConfig iwc = new IndexWriterConfig(new MockAnalyzer(random())); Sort indexSort = new Sort(new SortedNumericSortField("field", SortField.Type.INT, false)); iwc.setIndexSort(indexSort); RandomIndexWriter writer = new RandomIndexWriter(random(), dir, iwc); writer.addDocument(createDocument("field", -80)); DirectoryReader reader = writer.getReader(); IndexSearcher searcher = newSearcher(reader); // Test ranges consisting of one value. assertEquals(1, searcher.count(createQuery("field", -80, -80))); writer.close(); reader.close(); dir.close(); } {noformat} The expectation is that *
[jira] [Commented] (LUCENE-10466) IndexSortSortedNumericDocValuesRangeQuery unconditionally assumes the usage of the LONG-encoded SortField
[ https://issues.apache.org/jira/browse/LUCENE-10466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506548#comment-17506548 ]

Andriy Redko commented on LUCENE-10466:

Thanks a lot, [~julietibs], working on the pull request :) I would like to apologize for pasting the wrong issue, https://issues.apache.org/jira/browse/LUCENE-10087 is the one.

> IndexSortSortedNumericDocValuesRangeQuery unconditionally assumes the usage of the LONG-encoded SortField
>
> Key: LUCENE-10466
> URL: https://issues.apache.org/jira/browse/LUCENE-10466
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/query/scoring
> Affects Versions: 9.0
> Reporter: Andriy Redko
> Priority: Minor
>
> We have run into this issue while migrating to OpenSearch and making changes to accommodate https://issues.apache.org/jira/browse/LUCENE-10087. It turned out that *IndexSortSortedNumericDocValuesRangeQuery* unconditionally assumes the usage of the LONG-encoded {*}SortField{*}, as could be seen inside *static ValueComparator loadComparator* method
> {noformat}
> @SuppressWarnings("unchecked")
> FieldComparator fieldComparator = (FieldComparator) sortField.getComparator(1, 0);
> fieldComparator.setTopValue(topValue);
> {noformat}
>
> Using the numeric range query (in case of sorted index) with anything but LONG ends up with class cast exception:
> {noformat}
> > java.lang.ClassCastException: class java.lang.Long cannot be cast to class java.lang.Integer (java.lang.Long and java.lang.Integer are in module java.base of loader 'bootstrap')
> >   at org.apache.lucene.search.comparators.IntComparator.setTopValue(IntComparator.java:29)
> >   at org.apache.lucene.sandbox.search.IndexSortSortedNumericDocValuesRangeQuery.loadComparator(IndexSortSortedNumericDocValuesRangeQuery.java:251)
> >   at org.apache.lucene.sandbox.search.IndexSortSortedNumericDocValuesRangeQuery.getDocIdSetIterator(IndexSortSortedNumericDocValuesRangeQuery.java:206)
> >   at org.apache.lucene.sandbox.search.IndexSortSortedNumericDocValuesRangeQuery$1.scorer(IndexSortSortedNumericDocValuesRangeQuery.java:170)
> {noformat}
> Simple test case to reproduce (for TestIndexSortSortedNumericDocValuesRangeQuery):
> {noformat}
> public void testIndexSortDocValuesWithIntRange() throws Exception {
>   Directory dir = newDirectory();
>   IndexWriterConfig iwc = new IndexWriterConfig(new MockAnalyzer(random()));
>   Sort indexSort = new Sort(new SortedNumericSortField("field", SortField.Type.INT, false));
>   iwc.setIndexSort(indexSort);
>
>   RandomIndexWriter writer = new RandomIndexWriter(random(), dir, iwc);
>   writer.addDocument(createDocument("field", -80));
>
>   DirectoryReader reader = writer.getReader();
>   IndexSearcher searcher = newSearcher(reader);
>
>   // Test ranges consisting of one value.
>   assertEquals(1, searcher.count(createQuery("field", -80, -80)));
>
>   writer.close();
>   reader.close();
>   dir.close();
> }
> {noformat}
>
> The expectation is that *IndexSortSortedNumericDocValuesRangeQuery* should not fail with class cast but correctly convert the numeric values.
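The failure mode in the quoted description can be reproduced without Lucene. The sketch below is a hedged illustration of why an unchecked generic cast compiles, succeeds at the cast itself, and only fails later inside `setTopValue`. All type and method names (`ComparatorCastSketch`, `TopValueSetter`, `IntSetter`, `setTopAssumingLong`, `setTopConverted`) are hypothetical stand-ins, and the conversion at the end is only one possible way to "correctly convert the numeric values", not the fix that was eventually merged.

```java
// Sketch only: mimics a comparator whose top value type depends on the sort type.
class ComparatorCastSketch {

  interface TopValueSetter<T> {
    void setTopValue(T value);
  }

  // Stand-in for an INT-typed comparator.
  static final class IntSetter implements TopValueSetter<Integer> {
    int top;

    @Override
    public void setTopValue(Integer value) {
      top = value;
    }
  }

  // Mirrors the pattern from the issue: the caller assumes a Long-typed target.
  @SuppressWarnings("unchecked")
  static void setTopAssumingLong(TopValueSetter<?> setter, long topValue) {
    // The cast is unchecked, so it succeeds at runtime because of type erasure.
    TopValueSetter<Long> asLong = (TopValueSetter<Long>) setter;
    // The failure happens here: the compiler-generated bridge method of
    // IntSetter casts its argument to Integer and receives a Long.
    asLong.setTopValue(topValue);
  }

  // One possible conversion for the INT case; throws ArithmeticException on overflow.
  static void setTopConverted(IntSetter setter, long topValue) {
    setter.setTopValue(Math.toIntExact(topValue));
  }

  public static void main(String[] args) {
    try {
      setTopAssumingLong(new IntSetter(), -80L);
    } catch (ClassCastException e) {
      System.out.println("ClassCastException, as in the reported stack trace");
    }
    IntSetter setter = new IntSetter();
    setTopConverted(setter, -80L);
    System.out.println(setter.top); // prints -80
  }
}
```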
[GitHub] [lucene] gsmiller commented on a change in pull request #747: LUCENE-10325: Add getTopDims functionality to Facets
gsmiller commented on a change in pull request #747: URL: https://github.com/apache/lucene/pull/747#discussion_r826440044 ## File path: lucene/facet/src/java/org/apache/lucene/facet/Facets.java ## @@ -48,4 +48,13 @@ public abstract FacetResult getTopChildren(int topN, String dim, String... path) * indexed, for example depending on the type of document. */ public abstract List getAllDims(int topN) throws IOException; + + /** + * Returns labels for topN dimensions and their topNChildren sorted by the number of hits that + * dimension matched + */ + public List getTopDims(int topNDims, int topNChildren) throws IOException { Review comment: I like the approach of providing a default implementation here so existing sub-classes will be fully backwards-compatible (and they don't need to worry about providing an implementation if this suits their needs). It might be nice to mention explicitly in the javadoc that sub-classes may _want_ to override this implementation though with a more efficient one if they're able. ## File path: lucene/facet/src/test/org/apache/lucene/facet/sortedset/TestSortedSetDocValuesFacets.java ## @@ -104,6 +104,65 @@ public void testBasic() throws Exception { "dim=b path=[] value=2 childCount=2\n buzz (2)\n baz (1)\n", facets.getTopChildren(10, "b").toString()); + // test getAllDims Review comment: Thank you for adding so much testing, including coverage for existing `getAllDims` functionality! 
## File path: lucene/facet/src/test/org/apache/lucene/facet/sortedset/TestSortedSetDocValuesFacets.java ## @@ -104,6 +104,65 @@ public void testBasic() throws Exception { "dim=b path=[] value=2 childCount=2\n buzz (2)\n baz (1)\n", facets.getTopChildren(10, "b").toString()); + // test getAllDims + List results = facets.getAllDims(10); + assertEquals(2, results.size()); + assertEquals( + "dim=b path=[] value=2 childCount=2\n buzz (2)\n baz (1)\n", + results.get(0).toString()); + assertEquals( + "dim=a path=[] value=-1 childCount=3\n foo (2)\n bar (1)\n zoo (1)\n", + results.get(1).toString()); + + // test getAllDims(1, 0) with topN = 0 + expectThrows( Review comment: Ouch! This seems like a poor (existing) experience for users. Would you mind creating a Jira to track this? We should probably change this behavior to throw an `IllegalArgumentException` at least instead of just an NPE. Thanks for uncovering this! ## File path: lucene/facet/src/test/org/apache/lucene/facet/range/TestRangeFacetCounts.java ## @@ -243,6 +243,24 @@ public void testLongGetAllDims() throws Exception { "dim=field path=[] value=22 childCount=5\n less than 10 (10)\n less than or equal to 10 (11)\n over 90 (9)\n 90 or above (10)\n over 1000 (1)\n", result.get(0).toString()); +// test getAllDims(1) +List test1Child = facets.getAllDims(1); +assertEquals(1, test1Child.size()); +assertEquals( +"dim=field path=[] value=22 childCount=5\n less than 10 (10)\n less than or equal to 10 (11)\n over 90 (9)\n 90 or above (10)\n over 1000 (1)\n", +test1Child.get(0).toString()); + +// test default implementation of getTopDims +List topNDimsResult = facets.getTopDims(1, 1); +assertEquals(1, topNDimsResult.size()); +assertEquals( Review comment: minor: Since `FacetResult` properly implements `equals`, you could just do `assertEquals(test1Child, topNDimsResult)`. 
This makes it slightly more obvious to the reader that you expect the exact same behavior as `getAllDims` ## File path: lucene/facet/src/test/org/apache/lucene/facet/sortedset/TestSortedSetDocValuesFacets.java ## @@ -104,6 +104,65 @@ public void testBasic() throws Exception { "dim=b path=[] value=2 childCount=2\n buzz (2)\n baz (1)\n", facets.getTopChildren(10, "b").toString()); + // test getAllDims + List results = facets.getAllDims(10); + assertEquals(2, results.size()); + assertEquals( + "dim=b path=[] value=2 childCount=2\n buzz (2)\n baz (1)\n", + results.get(0).toString()); + assertEquals( + "dim=a path=[] value=-1 childCount=3\n foo (2)\n bar (1)\n zoo (1)\n", + results.get(1).toString()); + + // test getAllDims(1, 0) with topN = 0 + expectThrows( + NullPointerException.class, + () -> { +facets.getAllDims(0); + }); + + // test getTopDims(10, 10) and expect same results from getAllDims(10) + List allDimsResults = facets.getTopDims(10, 10); + assertEquals(2, results.size()); + assertEquals( + "dim=b pat
[GitHub] [lucene] Yuti-G commented on pull request #747: LUCENE-10325: Add getTopDims functionality to Facets
Yuti-G commented on pull request #747: URL: https://github.com/apache/lucene/pull/747#issuecomment-1067392546

> Thanks for picking this up! I've looked at everything except for your overridden implementation in SSDV faceting, but since I may run out of time to look at that today, I'll go ahead and publish my feedback on your default implementation and testing. I'll follow up with more feedback soon. Thanks again!

Thanks @gsmiller for reviewing my PR and leaving the detailed feedback! I will address it in my next PR. Appreciate it :)
[GitHub] [lucene] Yuti-G edited a comment on pull request #747: LUCENE-10325: Add getTopDims functionality to Facets
Yuti-G edited a comment on pull request #747: URL: https://github.com/apache/lucene/pull/747#issuecomment-1067392546

> Thanks for picking this up! I've looked at everything except for your overridden implementation in SSDV faceting, but since I may run out of time to look at that today, I'll go ahead and publish my feedback on your default implementation and testing. I'll follow up with more feedback soon. Thanks again!

Thanks @gsmiller for reviewing my PR and leaving the detailed feedback! I will address it in my next PR, appreciated :)
[GitHub] [lucene] Yuti-G commented on a change in pull request #747: LUCENE-10325: Add getTopDims functionality to Facets
Yuti-G commented on a change in pull request #747: URL: https://github.com/apache/lucene/pull/747#discussion_r826455347

## File path: lucene/facet/src/java/org/apache/lucene/facet/Facets.java ##

@@ -48,4 +48,13 @@ public abstract FacetResult getTopChildren(int topN, String dim, String... path)
    * indexed, for example depending on the type of document.
    */
   public abstract List getAllDims(int topN) throws IOException;
+
+  /**
+   * Returns labels for topN dimensions and their topNChildren sorted by the number of hits that
+   * dimension matched
+   */
+  public List getTopDims(int topNDims, int topNChildren) throws IOException {

Review comment: My current javadoc does not describe this new functionality well; I will add more to it. Thanks!
[GitHub] [lucene] Yuti-G commented on a change in pull request #747: LUCENE-10325: Add getTopDims functionality to Facets
Yuti-G commented on a change in pull request #747: URL: https://github.com/apache/lucene/pull/747#discussion_r826456302

## File path: lucene/facet/src/test/org/apache/lucene/facet/range/TestRangeFacetCounts.java ##

@@ -243,6 +243,24 @@ public void testLongGetAllDims() throws Exception {
         "dim=field path=[] value=22 childCount=5\n less than 10 (10)\n less than or equal to 10 (11)\n over 90 (9)\n 90 or above (10)\n over 1000 (1)\n",
         result.get(0).toString());

+    // test getAllDims(1)
+    List test1Child = facets.getAllDims(1);
+    assertEquals(1, test1Child.size());
+    assertEquals(
+        "dim=field path=[] value=22 childCount=5\n less than 10 (10)\n less than or equal to 10 (11)\n over 90 (9)\n 90 or above (10)\n over 1000 (1)\n",
+        test1Child.get(0).toString());
+
+    // test default implementation of getTopDims
+    List topNDimsResult = facets.getTopDims(1, 1);
+    assertEquals(1, topNDimsResult.size());
+    assertEquals(

Review comment: Thanks for catching this!
[GitHub] [lucene] Yuti-G commented on a change in pull request #747: LUCENE-10325: Add getTopDims functionality to Facets
Yuti-G commented on a change in pull request #747: URL: https://github.com/apache/lucene/pull/747#discussion_r826456758

## File path: lucene/facet/src/test/org/apache/lucene/facet/sortedset/TestSortedSetDocValuesFacets.java ##

@@ -104,6 +104,65 @@ public void testBasic() throws Exception {
         "dim=b path=[] value=2 childCount=2\n buzz (2)\n baz (1)\n",
         facets.getTopChildren(10, "b").toString());

+      // test getAllDims

Review comment: Thank you so much! I added more testing for getAllDims in order to compare the behavior of getTopDims :)
[GitHub] [lucene] Yuti-G commented on a change in pull request #747: LUCENE-10325: Add getTopDims functionality to Facets
Yuti-G commented on a change in pull request #747: URL: https://github.com/apache/lucene/pull/747#discussion_r826461610

## File path: lucene/facet/src/test/org/apache/lucene/facet/sortedset/TestSortedSetDocValuesFacets.java ##

@@ -104,6 +104,65 @@ public void testBasic() throws Exception {
         "dim=b path=[] value=2 childCount=2\n buzz (2)\n baz (1)\n",
         facets.getTopChildren(10, "b").toString());

+      // test getAllDims
+      List results = facets.getAllDims(10);
+      assertEquals(2, results.size());
+      assertEquals(
+          "dim=b path=[] value=2 childCount=2\n buzz (2)\n baz (1)\n",
+          results.get(0).toString());
+      assertEquals(
+          "dim=a path=[] value=-1 childCount=3\n foo (2)\n bar (1)\n zoo (1)\n",
+          results.get(1).toString());
+
+      // test getAllDims(1, 0) with topN = 0
+      expectThrows(

Review comment: Sure! I will create a Jira and resolve it. getAllDims(0) does throw `IllegalArgumentException` in TaxonomyFacetCounts because `getTopChildren(0, dim)` throws it, but the overridden implementation in SSDV does not specify it. I was not sure if the two implementations should behave the same on this exception. Thank you so much for confirming this! The comment on this test has a typo; it should be `// test getAllDims(0) with topN = 0`. I will fix this as well.
[GitHub] [lucene] Yuti-G commented on a change in pull request #747: LUCENE-10325: Add getTopDims functionality to Facets
Yuti-G commented on a change in pull request #747: URL: https://github.com/apache/lucene/pull/747#discussion_r82646

## File path: lucene/facet/src/test/org/apache/lucene/facet/sortedset/TestSortedSetDocValuesFacets.java ##

@@ -104,6 +104,65 @@ public void testBasic() throws Exception {
         "dim=b path=[] value=2 childCount=2\n buzz (2)\n baz (1)\n",
         facets.getTopChildren(10, "b").toString());

+      // test getAllDims
+      List results = facets.getAllDims(10);
+      assertEquals(2, results.size());
+      assertEquals(
+          "dim=b path=[] value=2 childCount=2\n buzz (2)\n baz (1)\n",
+          results.get(0).toString());
+      assertEquals(
+          "dim=a path=[] value=-1 childCount=3\n foo (2)\n bar (1)\n zoo (1)\n",
+          results.get(1).toString());
+
+      // test getAllDims(1, 0) with topN = 0
+      expectThrows(
+          NullPointerException.class,
+          () -> {
+            facets.getAllDims(0);
+          });
+
+      // test getTopDims(10, 10) and expect same results from getAllDims(10)
+      List allDimsResults = facets.getTopDims(10, 10);
+      assertEquals(2, results.size());
+      assertEquals(
+          "dim=b path=[] value=2 childCount=2\n buzz (2)\n baz (1)\n",
+          allDimsResults.get(0).toString());
+      assertEquals(
+          "dim=a path=[] value=-1 childCount=3\n foo (2)\n bar (1)\n zoo (1)\n",
+          allDimsResults.get(1).toString());
+
+      // test getTopDims(2, 1)
+      List topDimsResults = facets.getTopDims(2, 1);

Review comment: Thank you so much! I will check and fix other tests that have this issue.
[GitHub] [lucene] Yuti-G commented on a change in pull request #747: LUCENE-10325: Add getTopDims functionality to Facets
Yuti-G commented on a change in pull request #747: URL: https://github.com/apache/lucene/pull/747#discussion_r826461610

## File path: lucene/facet/src/test/org/apache/lucene/facet/sortedset/TestSortedSetDocValuesFacets.java ##

@@ -104,6 +104,65 @@ public void testBasic() throws Exception {
         "dim=b path=[] value=2 childCount=2\n buzz (2)\n baz (1)\n",
         facets.getTopChildren(10, "b").toString());

+      // test getAllDims
+      List results = facets.getAllDims(10);
+      assertEquals(2, results.size());
+      assertEquals(
+          "dim=b path=[] value=2 childCount=2\n buzz (2)\n baz (1)\n",
+          results.get(0).toString());
+      assertEquals(
+          "dim=a path=[] value=-1 childCount=3\n foo (2)\n bar (1)\n zoo (1)\n",
+          results.get(1).toString());
+
+      // test getAllDims(1, 0) with topN = 0
+      expectThrows(

Review comment: Sure! I will create a Jira and resolve it. getAllDims(0) does throw `IllegalArgumentException` in TaxonomyFacetCounts because it calls `getTopChildren(0, dim)` and getTopChildren throws it, but the overridden implementation in SSDV does not specify it. I was not sure if the two implementations should behave the same on this exception. Thank you so much for confirming this! The comment on this test has a typo; it should be `// test getAllDims(0) with topN = 0`. I will fix this as well.
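The behavior discussed in this thread, rejecting `topN = 0` with an `IllegalArgumentException` rather than failing later with an NPE, could look roughly like the sketch below. It is a standalone illustration under stated assumptions: `TopNValidationSketch`, `validateTopN` and `topChildren` are hypothetical names, the message text is an assumption, and the list of strings merely stands in for a dimension's children. It does not reproduce the Lucene facet classes.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: fail fast on an invalid topN instead of hitting an NPE later.
class TopNValidationSketch {

  // Hypothetical shared check that every implementation could call first.
  static void validateTopN(int topN) {
    if (topN <= 0) {
      throw new IllegalArgumentException("topN must be > 0 (got: " + topN + ")");
    }
  }

  // Hypothetical stand-in for a method that returns the top children of a dimension.
  static List<String> topChildren(List<String> childrenSortedByCount, int topN) {
    validateTopN(topN);
    int end = Math.min(topN, childrenSortedByCount.size());
    return new ArrayList<>(childrenSortedByCount.subList(0, end));
  }

  public static void main(String[] args) {
    List<String> children = List.of("foo", "bar", "zoo");
    System.out.println(topChildren(children, 2)); // prints [foo, bar]
    try {
      topChildren(children, 0);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage()); // prints topN must be > 0 (got: 0)
    }
  }
}
```

Putting the check in one helper is a design choice that would keep different implementations consistent on this edge case, which is the question raised in the comment above.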
[jira] [Commented] (LUCENE-10325) Add getTopDims functionality to Facets
[ https://issues.apache.org/jira/browse/LUCENE-10325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506640#comment-17506640 ]

Yuting Gan commented on LUCENE-10325:
-

Thanks [~gsmiller] for creating issue!

I provided a default implementation of `getTopDims(int topNDims, int topNChildren)` in the Facets class that calls the existing `getAllDims(topNChildren)` function and returns `FacetResult` of the requested `topNDims` and their `topNChildren`.

Currently, I only experimented with one overridden implementation of `getTopDims` in `SortedSetDocValuesFacetCounts` that aims to provide a more optimal way of populating dimCount. It avoids resolving all child paths and creating all FacetResult for every dim when calling `getTopDims`. I created #747 for this change and will appreciate any feedback.

Since this change has a lot of code refactoring in SSDVFacetCounts, if it is worth and the PR is approved, I can also expand it to `ConcurrentSSDVFacetCounts` and explore other possible optimized implementations in faceting.

> Add getTopDims functionality to Facets
> --
>
> Key: LUCENE-10325
> URL: https://issues.apache.org/jira/browse/LUCENE-10325
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Reporter: Greg Miller
> Priority: Major
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> The current {{getAllDims}} functionality is really the only way for users to
> determine the "top" dimensions in a faceting field (i.e., get the top dims by
> count along with their top-n children), but it has the unfortunate
> side-effect of resolving all child paths for every dim, even if the user
> doesn't intend to use those dims. For example, if a match set contains docs
> relating to 100 different dims (and various values under each), but the user
> only wants the top 10 dims with their top 5 children, they can call
> getAllDims(5) then just grab the first 10 results, but a lot of wasted work
> has been done for the other 90 dims.
> It would be nice to implement something like {{getTopDims(int numDims, int
> numChildren)}} that would only do the work necessary to resolve {{numDims}}
> dims instead of all dims.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
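The two strategies discussed in this issue (resolve every dim and truncate, versus rank dims first and resolve children only for the top ones) can be illustrated with a small self-contained sketch. This is not the code from #747 and does not use Lucene's API: `TopDimsSketch`, `DimResult`, `getTopDimsDefault`, and `getTopDimsLazy` are hypothetical names, the input is a plain map of dim to child counts, and a dim's value is assumed to be the sum of its child counts, which is a simplification. Both `topNDims` and `topNChildren` are assumed to be positive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class TopDimsSketch {
  /** Hypothetical stand-in for a per-dimension result; NOT Lucene's FacetResult. */
  static final class DimResult {
    final String dim;
    final int value;
    final List<String> topChildren;

    DimResult(String dim, int value, List<String> topChildren) {
      this.dim = dim;
      this.value = value;
      this.topChildren = topChildren;
    }
  }

  /** Assumption for this sketch: a dim's value is the sum of its child counts. */
  static int dimValue(Map<String, Integer> children) {
    int sum = 0;
    for (int count : children.values()) {
      sum += count;
    }
    return sum;
  }

  /** The per-dim work worth avoiding for unused dims: rank children, keep the top N. */
  static DimResult resolve(String dim, Map<String, Integer> children, int topNChildren) {
    List<Map.Entry<String, Integer>> entries = new ArrayList<>(children.entrySet());
    entries.sort((x, y) -> {
      int byCount = Integer.compare(y.getValue(), x.getValue());
      return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey());
    });
    List<String> top = new ArrayList<>();
    for (Map.Entry<String, Integer> e : entries.subList(0, Math.min(topNChildren, entries.size()))) {
      top.add(e.getKey());
    }
    return new DimResult(dim, dimValue(children), top);
  }

  /** Resolves every dim, then sorts dims by value (descending), ties broken by name. */
  static List<DimResult> getAllDims(Map<String, Map<String, Integer>> counts, int topNChildren) {
    List<DimResult> results = new ArrayList<>();
    for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
      results.add(resolve(e.getKey(), e.getValue(), topNChildren));
    }
    results.sort((x, y) -> {
      int byValue = Integer.compare(y.value, x.value);
      return byValue != 0 ? byValue : x.dim.compareTo(y.dim);
    });
    return results;
  }

  /** Default strategy: resolve all dims, then keep only the first topNDims. */
  static List<DimResult> getTopDimsDefault(
      Map<String, Map<String, Integer>> counts, int topNDims, int topNChildren) {
    List<DimResult> all = getAllDims(counts, topNChildren);
    return new ArrayList<>(all.subList(0, Math.min(topNDims, all.size())));
  }

  /** Lazy strategy: rank dims by value first, then resolve children only for the top dims. */
  static List<DimResult> getTopDimsLazy(
      Map<String, Map<String, Integer>> counts, int topNDims, int topNChildren) {
    List<String> dims = new ArrayList<>(counts.keySet());
    dims.sort((x, y) -> {
      int byValue = Integer.compare(dimValue(counts.get(y)), dimValue(counts.get(x)));
      return byValue != 0 ? byValue : x.compareTo(y);
    });
    List<DimResult> results = new ArrayList<>();
    for (String dim : dims.subList(0, Math.min(topNDims, dims.size()))) {
      results.add(resolve(dim, counts.get(dim), topNChildren));
    }
    return results;
  }
}
```

Both methods return the same dims in the same order; the difference is that the lazy variant calls `resolve` at most `topNDims` times instead of once per dim, which is the saving the issue description asks for in the 100-dims example.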
[jira] [Comment Edited] (LUCENE-10325) Add getTopDims functionality to Facets
[ https://issues.apache.org/jira/browse/LUCENE-10325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506640#comment-17506640 ] Yuting Gan edited comment on LUCENE-10325 at 3/15/22, 12:22 AM: Thanks [~gsmiller] for creating issue! I provided a default implementation of _getTopDims(int topNDims, int topNChildren)_ in the Facets class that calls the existing _getAllDims(topNChildren)_ function and returns _FacetResult_ of the requested _topNDims_ and their {_}topNChildren{_}. Currently, I only experimented with one overridden implementation of _getTopDims_ in _SortedSetDocValuesFacetCounts_ that aims to provide a more optimal way of populating {_}dimCount{_}. It avoids resolving all child paths and creating all _FacetResult_ for every dim when calling _getTopDims._ I created #747 for this change and will appreciate any feedback. Since this change has a lot of code refactoring in SSDVFacetCounts, if it is worth and the PR is approved, I can also expand it to ConcurrentSSDVFacetCounts and explore other possible optimized implementations in faceting. Thanks! was (Author: yutinggan): Thanks [~gsmiller] for creating issue! I provided a default implementation of `getTopDims({color:#cc7832}int {color}topNDims{color:#cc7832}, int {color}topNChildren)` in the Facets class that calls the existing `getAllDims(topNChildren)` function and returns `FacetResult` of the requested `topNDims` and their `topNChildren`. Currently, I only experimented with one overridden implementation of `getTopDims` in `SortedSetDocValuesFacetCounts` that aims to provide a more optimal way of populating dimCount. It avoids resolving all child paths and creating all FacetResult for every dim when calling `getTopDims`. I created #747 for this change and will appreciate any feedback. 
Since this change has a lot of code refactoring in SSDVFacetCounts, if it is worth and the PR is approved, I can also expand it to `ConcurrentSSDVFacetCounts` and explore other possible optimized implementations in faceting.
[jira] [Comment Edited] (LUCENE-10325) Add getTopDims functionality to Facets
[ https://issues.apache.org/jira/browse/LUCENE-10325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506640#comment-17506640 ] Yuting Gan edited comment on LUCENE-10325 at 3/15/22, 12:25 AM: Thanks [~gsmiller] for creating this issue. I provided a default implementation of _getTopDims(int topNDims, int topNChildren)_ in the Facets class that calls the existing _getAllDims(topNChildren)_ function and returns _FacetResult_ of the requested _topNDims_ and their {_}topNChildren{_}. Currently, I only experimented with one overridden implementation of _getTopDims_ in _SortedSetDocValuesFacetCounts_ that aims to provide a more optimal way of populating {_}dimCount{_}. It avoids resolving all child paths and creating all _FacetResult_ for every dim when calling _getTopDims._ I created #747 for this change and will appreciate any feedback. Since this change has a lot of code refactoring in SSDVFacetCounts, if it is worth and the PR is approved, I can also expand it to ConcurrentSSDVFacetCounts and explore other possible optimized implementations in faceting. Thanks! was (Author: yutinggan): Thanks [~gsmiller] for creating issue! I provided a default implementation of _getTopDims(int topNDims, int topNChildren)_ in the Facets class that calls the existing _getAllDims(topNChildren)_ function and returns _FacetResult_ of the requested _topNDims_ and their {_}topNChildren{_}. Currently, I only experimented with one overridden implementation of _getTopDims_ in _SortedSetDocValuesFacetCounts_ that aims to provide a more optimal way of populating {_}dimCount{_}. It avoids resolving all child paths and creating all _FacetResult_ for every dim when calling _getTopDims._ I created #747 for this change and will appreciate any feedback. 
Since this change has a lot of code refactoring in SSDVFacetCounts, if it is worth and the PR is approved, I can also expand it to ConcurrentSSDVFacetCounts and explore other possible optimized implementations in faceting. Thanks!
[jira] [Updated] (LUCENE-10467) Throws IllegalArgumentException for getAllDims(0)
[ https://issues.apache.org/jira/browse/LUCENE-10467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuting Gan updated LUCENE-10467:

Description:
Currently, when calling getAllDims(0) in SortedSetDocValuesFacetCounts, it would throw a NullPointerException. Other class that implements the getAllDims functionality could have the same issue, except for TaxonomyFacetCounts, which has been tested.
It would provide better user experience by throwing an IllegalArgumentException when requesting topN = 0 for getAllDims.

was:
Currently, when calling getAllDims(0) in SortedSetDocValuesFacetCounts, it would throw a NullPointerException. Other class that implements the getAllDims functionality could have the same issue, except for TaxonomyFacetCounts, which has been tested.
It would provide better user experience by throwing an IllegalArgumentException when requesting topN = 0 for getAllDims.{{{}{}}}{{{}{}}}

> Throws IllegalArgumentException for getAllDims(0)
> -
>
> Key: LUCENE-10467
> URL: https://issues.apache.org/jira/browse/LUCENE-10467
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Reporter: Yuting Gan
> Priority: Minor
>
> Currently, when calling getAllDims(0) in SortedSetDocValuesFacetCounts, it
> would throw a NullPointerException. Other class that implements the
> getAllDims functionality could have the same issue, except for
> TaxonomyFacetCounts, which has been tested.
> It would provide better user experience by throwing an
> IllegalArgumentException when requesting topN = 0 for getAllDims.
[jira] [Created] (LUCENE-10467) Throws IllegalArgumentException for getAllDims(0)
Yuting Gan created LUCENE-10467:
---

Summary: Throws IllegalArgumentException for getAllDims(0)
Key: LUCENE-10467
URL: https://issues.apache.org/jira/browse/LUCENE-10467
Project: Lucene - Core
Issue Type: Improvement
Components: modules/facet
Reporter: Yuting Gan

Currently, when calling getAllDims(0) in SortedSetDocValuesFacetCounts, it would throw a NullPointerException. Other class that implements the getAllDims functionality could have the same issue, except for TaxonomyFacetCounts, which has been tested.
It would provide better user experience by throwing an IllegalArgumentException when requesting topN = 0 for getAllDims.
[jira] [Updated] (LUCENE-10467) Throws IllegalArgumentException for getAllDims if topN <= 0
[ https://issues.apache.org/jira/browse/LUCENE-10467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuting Gan updated LUCENE-10467:

Description:
Currently, when calling getAllDims(0) in SortedSetDocValuesFacetCounts, it would throw a NullPointerException. Other class that implements the getAllDims functionality could have the same issue, except for TaxonomyFacetCounts, which has been tested.
It would provide better user experience by throwing an IllegalArgumentException when requesting topN <= 0 for getAllDims.

was:
Currently, when calling getAllDims(0) in SortedSetDocValuesFacetCounts, it would throw a NullPointerException. Other class that implements the getAllDims functionality could have the same issue, except for TaxonomyFacetCounts, which has been tested.
It would provide better user experience by throwing an IllegalArgumentException when requesting topN = 0 for getAllDims.

> Throws IllegalArgumentException for getAllDims if topN <= 0
> ---
>
> Key: LUCENE-10467
> URL: https://issues.apache.org/jira/browse/LUCENE-10467
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Reporter: Yuting Gan
> Priority: Minor
>
> Currently, when calling getAllDims(0) in SortedSetDocValuesFacetCounts, it
> would throw a NullPointerException. Other class that implements the
> getAllDims functionality could have the same issue, except for
> TaxonomyFacetCounts, which has been tested.
> It would provide better user experience by throwing an
> IllegalArgumentException when requesting topN <= 0 for getAllDims.
[jira] [Updated] (LUCENE-10467) Throws IllegalArgumentException for getAllDims if topN <= 0
[ https://issues.apache.org/jira/browse/LUCENE-10467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuting Gan updated LUCENE-10467:

Summary: Throws IllegalArgumentException for getAllDims if topN <= 0 (was: Throws IllegalArgumentException for getAllDims(0))
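The behavior requested in LUCENE-10467 amounts to a fail-fast argument check before any faceting work starts. A minimal sketch follows, assuming nothing about the eventual Lucene patch: `TopNValidationSketch`, `validateTopN`, the stand-in `getTopChildren`, and the exception message are all hypothetical and only illustrate the pattern.

```java
import java.util.ArrayList;
import java.util.List;

final class TopNValidationSketch {
  /** Hypothetical helper; the name and message are illustrative, not Lucene's. */
  static void validateTopN(int topN) {
    if (topN <= 0) {
      throw new IllegalArgumentException("topN must be > 0 (got: " + topN + ")");
    }
  }

  /**
   * Stand-in for a top-N style method: checks the argument first, so a bad topN
   * surfaces as IllegalArgumentException instead of a failure deeper in the call.
   * The children are assumed to be sorted by count already.
   */
  static List<String> getTopChildren(List<String> childrenSortedByCount, int topN) {
    validateTopN(topN);
    int end = Math.min(topN, childrenSortedByCount.size());
    return new ArrayList<>(childrenSortedByCount.subList(0, end));
  }
}
```

Checking `topN <= 0` rather than `topN == 0` also covers negative values, which matches the change in the issue summary.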
[GitHub] [lucene] LuXugang commented on a change in pull request #736: LUCENE-10458: BoundedDocSetIdIterator may supply error count in Weigth#count(LeafReaderContext) when missingValue enables
LuXugang commented on a change in pull request #736:
URL: https://github.com/apache/lucene/pull/736#discussion_r826515036

## File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java ##
@@ -198,16 +198,22 @@ public boolean isCacheable(LeafReaderContext ctx) {
       @Override
       public int count(LeafReaderContext context) throws IOException {
-        BoundedDocSetIdIterator disi = getDocIdSetIteratorOrNull(context);
-        if (disi != null) {
-          return disi.lastDoc - disi.firstDoc;
+        Sort indexSort = context.reader().getMetaData().getSort();
+        if (indexSort != null
+            && indexSort.getSort().length > 0
+            && indexSort.getSort()[0].getField().equals(field)
+            && indexSort.getSort()[0].getMissingValue() == null) {

Review comment:
`indexSort.getSort()[0].getMissingValue() == null`
It does indeed seem too aggressive. Thanks.