Re: [PR] Add experimental columnar indexing api [lucene]

via GitHub Mon, 04 May 2026 10:32:13 -0700


Tim-Brooks commented on code in PR #15990:
URL: https://github.com/apache/lucene/pull/15990#discussion_r3183298317



##########
lucene/core/src/java/org/apache/lucene/index/IndexingChain.java:
##########
@@ -764,10 +1205,30 @@ private void initializeFieldInfo(PerField pf) throws IOException {
 
   /** Index each field Returns {@code true}, if we are indexing a unique field with postings */
   private boolean processField(int docID, IndexableField field, PerField pf) throws IOException {
+    boolean indexedField = invertAndStore(docID, field, pf);
+    IndexableFieldType fieldType = field.fieldType();
+    DocValuesType dvType = fieldType.docValuesType();
+    if (dvType != DocValuesType.NONE) {
+      indexDocValue(docID, pf, dvType, field);
+    }
+    if (fieldType.pointDimensionCount() != 0) {
+      pf.pointValuesWriter.addPackedValue(docID, field.binaryValue());
+    }
+    if (fieldType.vectorDimension() != 0) {
+      indexVectorValue(docID, pf, fieldType.vectorEncoding(), field);
+    }
+    return indexedField;
+  }
+
+  /**
+   * Inverts indexed fields and writes stored fields. Shared by the single-doc row path ({@link
+   * #processField}) and the column-batch row pass ({@link #processRowColumns}). Returns {@code
+   * true} if this is a unique indexed field with postings.
+   */
+  private boolean invertAndStore(int docID, IndexableField field, PerField pf) throws IOException {

Review Comment:
   I wrote on the mailing list that fields indexed with docs only and no norms 
can be processed in a columnar fashion. 
   
   In terms of optimizations, once the API had landed, I planned to propose a 
long (or int) column with an associated array dictionary. Lucene would then 
only need to index the dictionary once per column batch. 
   
   This would target inverted index and sorted set DV optimizations for 
low-cardinality use cases, without exposing any Lucene hashing or equality 
internals. 
   
   I have not actually gone through the steps of implementing something 
like this yet, though. 
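
   To make the dictionary idea concrete, here is a minimal sketch in plain Java. It is not Lucene code and not the API proposed in this PR: the class name, the method name, and the toy postings structure are all hypothetical, and it assumes the caller supplies one small array of distinct values plus one ordinal per document in the batch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; none of these names exist in Lucene. */
public class DictionaryColumnSketch {

  /**
   * Builds toy postings for one column batch.
   *
   * @param firstDocID doc ID of the first row in the batch (assumed contiguous)
   * @param dictionary distinct values supplied by the caller
   * @param ordinals one entry per row, each an index into {@code dictionary}
   * @return for each dictionary entry, the doc IDs whose row refers to it
   */
  public static List<List<Integer>> invertBatch(
      int firstDocID, String[] dictionary, int[] ordinals) {
    List<List<Integer>> postings = new ArrayList<>();
    // Each distinct value is visited once per batch, regardless of row count.
    for (int ord = 0; ord < dictionary.length; ord++) {
      postings.add(new ArrayList<>());
    }
    // Per-row work is an integer lookup: no hashing or comparing of values.
    for (int row = 0; row < ordinals.length; row++) {
      postings.get(ordinals[row]).add(firstDocID + row);
    }
    return postings;
  }

  public static void main(String[] args) {
    String[] dictionary = {"error", "info", "warn"};
    int[] ordinals = {1, 1, 0, 2, 1};
    List<List<Integer>> postings = invertBatch(100, dictionary, ordinals);
    for (int ord = 0; ord < postings.size(); ord++) {
      System.out.println(dictionary[ord] + " -> " + postings.get(ord));
    }
  }
}
```

   The point of the sketch is that the caller has already done the deduplication, so the indexing side only ever sees integer ordinals per row. How a real implementation would feed the dictionary into the inverted index or sorted set doc values is left open here.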



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


