dsmiley commented on code in PR #13327: URL: https://github.com/apache/lucene/pull/13327#discussion_r1594462331
########## lucene/core/src/java/org/apache/lucene/index/FieldInfos.java: ########## @@ -156,15 +148,38 @@ public FieldInfos(FieldInfo[] infos) { this.softDeletesField = softDeletesField; this.parentField = parentField; - List<FieldInfo> valuesTemp = new ArrayList<>(infos.length); - byNumber = new FieldInfo[size]; - for (int i = 0; i < size; i++) { - byNumber[i] = byNumberTemp[i]; - if (byNumberTemp[i] != null) { - valuesTemp.add(byNumberTemp[i]); + if (fieldNumberStrictlyAscending && maxFieldNumber == infos.length - 1) { + // The input FieldInfo[] contains all fields numbered from 0 to infos.length - 1 and they are + // sorted, use it directly. This is an optimization when reading a segment with all fields + // since the FieldInfo[] is sorted. + byNumber = infos; // We could copy the input array, but do we need to? Review Comment: This is fairly internal stuff labelled lucene.experimental; I say yes but document it in the javadoc. Obviously look at existing callers to double-check this is fine. ########## lucene/core/src/java/org/apache/lucene/index/FieldInfos.java: ########## @@ -139,17 +148,38 @@ public FieldInfos(FieldInfo[] infos) { this.softDeletesField = softDeletesField; this.parentField = parentField; - FieldInfo[] sortedFieldInfos = ArrayUtil.copyOfSubArray(infos, 0, infos.length); - Arrays.sort(sortedFieldInfos, (fi1, fi2) -> Integer.compare(fi1.number, fi2.number)); - int maxFieldNumber = infos.length == 0 ? -1 : sortedFieldInfos[infos.length - 1].number; - // If there are many fields and the max field number is greater than twice the number - // of fields, then a map structure is more compact to store the by-number mapping. - byNumber = - maxFieldNumber >= 2 * infos.length && maxFieldNumber >= 32 - ? new MapFieldInfoByNumber(infos) - : new ArrayFieldInfoByNumber(infos, maxFieldNumber); - // The iteration of FieldInfo is ordered by ascending field number. - values = Collections.unmodifiableCollection(Arrays.asList(sortedFieldInfos)); + if (fieldNumberStrictlyAscending && maxFieldNumber == infos.length - 1) { + // The input FieldInfo[] contains all fields numbered from 0 to infos.length - 1 and they are + // sorted, use it directly. This is an optimization when reading a segment with all fields + // since the FieldInfo[] is sorted. + byNumber = infos; // We could copy the input array, but do we need to? + values = Arrays.asList(byNumber); + } else { + byNumber = new FieldInfo[maxFieldNumber + 1]; + for (FieldInfo fieldInfo : infos) { + FieldInfo previous = byNumber[fieldInfo.number]; + if (previous != null) { + throw new IllegalArgumentException( + "duplicate field numbers: " + + previous.name + + " and " + + fieldInfo.name + + " have: " + + fieldInfo.number); + } + byNumber[fieldInfo.number] = fieldInfo; + } + if (maxFieldNumber == infos.length - 1) { + // No fields are missing, use byNumber. + values = Arrays.asList(byNumber); + } else { + // The below code is faster than Arrays.stream(byNumber).filter(Objects::nonNull).toList(), + // mainly when the input FieldInfo[] is sorted, when reading a segment. + FieldInfo[] sortedFieldInfos = ArrayUtil.copyOfSubArray(infos, 0, infos.length); Review Comment: It looks a bit curious to be calling a method with "sub array" for what is actually the whole array. Maybe a convenience method copy(infos) should be provided. Could be deferred of course. ########## lucene/core/src/java/org/apache/lucene/index/FieldInfos.java: ########## @@ -139,17 +148,38 @@ public FieldInfos(FieldInfo[] infos) { this.softDeletesField = softDeletesField; this.parentField = parentField; - FieldInfo[] sortedFieldInfos = ArrayUtil.copyOfSubArray(infos, 0, infos.length); - Arrays.sort(sortedFieldInfos, (fi1, fi2) -> Integer.compare(fi1.number, fi2.number)); - int maxFieldNumber = infos.length == 0 ? -1 : sortedFieldInfos[infos.length - 1].number; - // If there are many fields and the max field number is greater than twice the number - // of fields, then a map structure is more compact to store the by-number mapping. - byNumber = - maxFieldNumber >= 2 * infos.length && maxFieldNumber >= 32 - ? new MapFieldInfoByNumber(infos) - : new ArrayFieldInfoByNumber(infos, maxFieldNumber); - // The iteration of FieldInfo is ordered by ascending field number. - values = Collections.unmodifiableCollection(Arrays.asList(sortedFieldInfos)); + if (fieldNumberStrictlyAscending && maxFieldNumber == infos.length - 1) { + // The input FieldInfo[] contains all fields numbered from 0 to infos.length - 1 and they are + // sorted, use it directly. This is an optimization when reading a segment with all fields + // since the FieldInfo[] is sorted. + byNumber = infos; // We could copy the input array, but do we need to? + values = Arrays.asList(byNumber); + } else { + byNumber = new FieldInfo[maxFieldNumber + 1]; + for (FieldInfo fieldInfo : infos) { + FieldInfo previous = byNumber[fieldInfo.number]; Review Comment: this is non-obvious. At a glance, we are retrieving the very same fieldInfo, yet supposedly it's the previous. Oh, maybe you mean "existing" fieldInfo as opposed to a fieldInfo ordered prior to this one? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org