[GitHub] [lucene] gautamworah96 commented on a change in pull request #191: LUCENE-9964: Duplicate long values in a field should only be counted once when using SortedNumericDocValuesFields

2021-06-27 Thread GitBox


gautamworah96 commented on a change in pull request #191:
URL: https://github.com/apache/lucene/pull/191#discussion_r659420106



##
File path: 
lucene/facet/src/java/org/apache/lucene/facet/LongValueFacetCounts.java
##
@@ -162,8 +164,14 @@ private void count(String field, List 
matchingDocs) throws IOExcep
   if (limit > 0) {
 totCount++;
   }
+  Set uniqueLongValues =
+  new HashSet<>(); // count each repeated long value in a field as 
the same

Review comment:
   Ahh. I did not think of this. Will fix




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gautamworah96 commented on a change in pull request #191: LUCENE-9964: Duplicate long values in a field should only be counted once when using SortedNumericDocValuesFields

2021-06-27 Thread GitBox


gautamworah96 commented on a change in pull request #191:
URL: https://github.com/apache/lucene/pull/191#discussion_r659451970



##
File path: 
lucene/facet/src/test/org/apache/lucene/facet/TestLongValueFacetCounts.java
##
@@ -382,19 +383,19 @@ public void testRandomMultiValued() throws Exception {
   + allSingleValued);
 }
 
-long[][] values = new long[valueCount][];
-for (int i = 0; i < valueCount; i++) {
+Long[][] values = new Long[docCount][];

Review comment:
   Oooh. This is a tricky one.
   
   
   When converting primitive arrays into lists, Java converts them into a 
single object.
   So when we convert an array of `long`s to a list, Java converts them to a 
list with a single object and that object has all the `long` children.
   So `long[]` is converted to a list `{C@15db9742}` where that  C@15db9742 is 
an array of longs. When you send this list to a HashSet it will not remove out 
duplicate values because the values the HashSet sees are objects.
   
   In contrast, on converting  an array of `Long`s (of type `Object`) to a 
list, Java converts all of the children directly to a list like `{Long val1, 
Long val2 ...}`.  When you send these values to a HashSet it will do the right 
thing and remove out the duplicate values.
   
   In L435 `Long value : new HashSet<>(Arrays.asList(values[i]))` we are trying 
to do exactly that i.e, remove out duplicate values so that we get the correct 
expected count (with no double counting for duplicate values).
   
   So using `long` will give us the wrong result because it will double count 
values as it has not removed duplicate ones. Because we've inadvertently sent 
the wrong values to the `HashSet`!
   
   Related reference from SO: 
https://stackoverflow.com/questions/47153183/primitive-type-and-wrapper-type-for-arrays-aslist




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gautamworah96 commented on a change in pull request #191: LUCENE-9964: Duplicate long values in a field should only be counted once when using SortedNumericDocValuesFields

2021-06-27 Thread GitBox


gautamworah96 commented on a change in pull request #191:
URL: https://github.com/apache/lucene/pull/191#discussion_r659420106



##
File path: 
lucene/facet/src/java/org/apache/lucene/facet/LongValueFacetCounts.java
##
@@ -162,8 +164,14 @@ private void count(String field, List 
matchingDocs) throws IOExcep
   if (limit > 0) {
 totCount++;
   }
+  Set uniqueLongValues =
+  new HashSet<>(); // count each repeated long value in a field as 
the same

Review comment:
   Ahh. I did not think of this. Will fix.
   Edit: The new commit e84005f  implements this




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org