[GitHub] [lucene] gautamworah96 commented on a change in pull request #191: LUCENE-9964: Duplicate long values in a field should only be counted once when using SortedNumericDocValuesFields
gautamworah96 commented on a change in pull request #191: URL: https://github.com/apache/lucene/pull/191#discussion_r659420106 ## File path: lucene/facet/src/java/org/apache/lucene/facet/LongValueFacetCounts.java ## @@ -162,8 +164,14 @@ private void count(String field, List matchingDocs) throws IOExcep if (limit > 0) { totCount++; } + Set uniqueLongValues = + new HashSet<>(); // count each repeated long value in a field as the same Review comment: Ahh. I did not think of this. Will fix -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gautamworah96 commented on a change in pull request #191: LUCENE-9964: Duplicate long values in a field should only be counted once when using SortedNumericDocValuesFields
gautamworah96 commented on a change in pull request #191: URL: https://github.com/apache/lucene/pull/191#discussion_r659451970 ## File path: lucene/facet/src/test/org/apache/lucene/facet/TestLongValueFacetCounts.java ## @@ -382,19 +383,19 @@ public void testRandomMultiValued() throws Exception { + allSingleValued); } -long[][] values = new long[valueCount][]; -for (int i = 0; i < valueCount; i++) { +Long[][] values = new Long[docCount][]; Review comment: Oooh. This is a tricky one. When converting primitive arrays into lists, Java converts them into a single object. So when we convert an array of `long`s to a list, Java converts them to a list with a single object and that object has all the `long` children. So `long[]` is converted to a list `{C@15db9742}` where that C@15db9742 is an array of longs. When you send this list to a HashSet it will not remove out duplicate values because the values the HashSet sees are objects. In contrast, on converting an array of `Long`s (of type `Object`) to a list, Java converts all of the children directly to a list like `{Long val1, Long val2 ...}`. When you send these values to a HashSet it will do the right thing and remove out the duplicate values. In L435 `Long value : new HashSet<>(Arrays.asList(values[i]))` we are trying to do exactly that i.e, remove out duplicate values so that we get the correct expected count (with no double counting for duplicate values). So using `long` will give us the wrong result because it will double count values as it has not removed duplicate ones. Because we've inadvertently sent the wrong values to the `HashSet`! Related reference from SO: https://stackoverflow.com/questions/47153183/primitive-type-and-wrapper-type-for-arrays-aslist -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gautamworah96 commented on a change in pull request #191: LUCENE-9964: Duplicate long values in a field should only be counted once when using SortedNumericDocValuesFields
gautamworah96 commented on a change in pull request #191: URL: https://github.com/apache/lucene/pull/191#discussion_r659420106 ## File path: lucene/facet/src/java/org/apache/lucene/facet/LongValueFacetCounts.java ## @@ -162,8 +164,14 @@ private void count(String field, List matchingDocs) throws IOExcep if (limit > 0) { totCount++; } + Set uniqueLongValues = + new HashSet<>(); // count each repeated long value in a field as the same Review comment: Ahh. I did not think of this. Will fix. Edit: The new commit e84005f implements this -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org