Tim-Brooks commented on code in PR #15990:
URL: https://github.com/apache/lucene/pull/15990#discussion_r3183298317
##########
lucene/core/src/java/org/apache/lucene/index/IndexingChain.java:
##########
@@ -764,10 +1205,30 @@ private void initializeFieldInfo(PerField pf) throws IOException {
/** Index each field Returns {@code true}, if we are indexing a unique field with postings */
private boolean processField(int docID, IndexableField field, PerField pf) throws IOException {
+ boolean indexedField = invertAndStore(docID, field, pf);
+ IndexableFieldType fieldType = field.fieldType();
+ DocValuesType dvType = fieldType.docValuesType();
+ if (dvType != DocValuesType.NONE) {
+ indexDocValue(docID, pf, dvType, field);
+ }
+ if (fieldType.pointDimensionCount() != 0) {
+ pf.pointValuesWriter.addPackedValue(docID, field.binaryValue());
+ }
+ if (fieldType.vectorDimension() != 0) {
+ indexVectorValue(docID, pf, fieldType.vectorEncoding(), field);
+ }
+ return indexedField;
+ }
+
+ /**
+ * Inverts indexed fields and writes stored fields. Shared by the single-doc row path ({@link
+ * #processField}) and the column-batch row pass ({@link #processRowColumns}). Returns {@code
+ * true} if this is a unique indexed field with postings.
+ */
+ private boolean invertAndStore(int docID, IndexableField field, PerField pf) throws IOException {
Review Comment:
As I wrote on the mailing list, fields indexed with IndexOptions.DOCS and no
norms can be processed in a columnar fashion.
In terms of optimizations, my plan was to wait until the API had landed and
then propose a long (or int) column with an associated dictionary array. Lucene
would then only need to index the dictionary once per column batch.
This would target inverted index and sorted set doc values optimizations for
low-cardinality use cases, without exposing any Lucene hashing or equality
internals.
I have not actually gone through the steps of implementing something like
this yet, though.
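
To make the idea a bit more concrete, here is a rough sketch of what a dictionary-encoded column could look like. Everything in it is hypothetical: `DictionaryColumnSketch`, `invert`, and the toy `Map` standing in for postings are illustrations only, not Lucene API and not part of this PR. It also assumes the dictionary entries within one batch are unique.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch only: none of these names exist in Lucene or in this PR. */
public class DictionaryColumnSketch {

  /**
   * Toy inversion of one column batch. ordinals[row] is the dictionary index of the value for
   * document (baseDocID + row). Returns term -> doc IDs, a stand-in for real postings.
   * Assumes dictionary entries are unique within the batch.
   */
  public static Map<String, List<Integer>> invert(
      int baseDocID, String[] dictionary, int[] ordinals) {
    // Per-document pass: plain array indexing, no hashing or equality checks on values.
    List<List<Integer>> docsByOrdinal = new ArrayList<>();
    for (int ord = 0; ord < dictionary.length; ord++) {
      docsByOrdinal.add(new ArrayList<>());
    }
    for (int row = 0; row < ordinals.length; row++) {
      docsByOrdinal.get(ordinals[row]).add(baseDocID + row);
    }

    // Per-term pass: each dictionary entry is hashed once per batch, not once per document.
    Map<String, List<Integer>> postings = new LinkedHashMap<>();
    for (int ord = 0; ord < dictionary.length; ord++) {
      if (!docsByOrdinal.get(ord).isEmpty()) {
        postings.put(dictionary[ord], docsByOrdinal.get(ord));
      }
    }
    return postings;
  }

  public static void main(String[] args) {
    String[] dictionary = {"red", "green", "blue"};
    int[] ordinals = {0, 1, 0, 0}; // four documents, low cardinality
    System.out.println(invert(10, dictionary, ordinals));
  }
}
```

The caller only hands over ordinals plus a small dictionary, so the term-level work (hashing, term lookup) scales with the number of distinct values in the batch rather than with the number of documents, and no hashing or equality internals need to be exposed.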
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]