Re: [I] Support for building materialized views using Lucene formats [lucene]

via GitHub Wed, 24 Apr 2024 11:15:34 -0700


bharath-techie commented on issue #13188:
URL: https://github.com/apache/lucene/issues/13188#issuecomment-2075553304


   Thanks for the comments, @msfroh.
   
   Supplying `Dims` and `metric` values to `DataCubesWriter` as part of the 
`addDocument` flow, and consuming them the way other formats do, is a good 
idea.
   
   But there are some cons:
   
   1. For adding an attribute to the field (let's take `IntField` as an 
example):
   
   The same `IntField` can serve as both a dimension and a metric (in fact, 
multiple metrics) within one `DataCubeField`, and the same `IntField` can 
belong to multiple `DataCubeField`s.
   
   2. Even if we solve the above and supply values via `DataCubesWriter` for 
each `DataCubeField`, the same values will be written more than once, 
depending on the configuration.
   
   So, to avoid duplicating values, how about we derive the values of each 
`DataCubeField` from the original values in `DocValuesWriter` during `flush`?
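
To make the duplication concern concrete, here is a minimal sketch that counts how many values per document each approach would handle. `CubeField`, the field names, and the two cube configurations are hypothetical stand-ins (not the proposed `DataCubeField` API), and every source field is assumed to be single-valued.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DataCubeDuplicationSketch {

  /** Hypothetical stand-in for a DataCubeField: a name plus its dimension and metric source fields. */
  static final class CubeField {
    final String name;
    final List<String> dims;
    final List<String> metrics;

    CubeField(String name, List<String> dims, List<String> metrics) {
      this.name = name;
      this.dims = dims;
      this.metrics = metrics;
    }
  }

  /** Values supplied per document if every cube receives its own copy of each dim and metric value. */
  static int valuesSuppliedPerDoc(List<CubeField> cubes) {
    int count = 0;
    for (CubeField cube : cubes) {
      count += cube.dims.size() + cube.metrics.size();
    }
    return count;
  }

  /** Distinct source fields, i.e. values per document when cubes are derived from existing doc values. */
  static int distinctSourceFields(List<CubeField> cubes) {
    Set<String> fields = new HashSet<>();
    for (CubeField cube : cubes) {
      fields.addAll(cube.dims);
      fields.addAll(cube.metrics);
    }
    return fields.size();
  }

  public static void main(String[] args) {
    // "status" is a dimension in both cubes and also a metric in the first one.
    List<CubeField> cubes = List.of(
        new CubeField("cube1", List.of("status", "port"), List.of("status", "latency")),
        new CubeField("cube2", List.of("status"), List.of("latency")));

    System.out.println(valuesSuppliedPerDoc(cubes)); // prints 6
    System.out.println(distinctSourceFields(cubes)); // prints 3
  }
}
```

With this configuration, supplying values per cube handles six values per document, while the doc values already hold the three underlying fields once each.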
   
   ### Flush
   
   `IntField` values will already be part of `DocValuesWriter`, so we can 
supply a `DataCubesConsumer` to it and keep track of the resulting values.
   
   
   1. During flush, in a new method `writeDataCubes`, we supply 
`dataCubeDocValuesConsumer` to `docValuesWriter.flush`:
   ```
     // For all doc values fields
     if (perField.docValuesWriter != null) {
       if (dataCubeDocValuesConsumer == null) {
         // lazy init
         DataCubesFormat fmt = state.segmentInfo.getCodec().dataCubesFormat();
         dataCubeDocValuesConsumer = fmt.fieldsConsumer(state, dataCubesConfig);
       }
       // The writer replays its buffered values into the data cube consumer
       perField.docValuesWriter.flush(state, sortMap, dataCubeDocValuesConsumer);
     }
   
     // This creates the dataCubes indices (skipped if no doc values field was seen)
     if (dataCubeDocValuesConsumer != null) {
       dataCubeDocValuesConsumer.flush(dataCubesConfig);
     }
   
   ```
   `DocValuesWriter.flush` calls the corresponding method (`addNumericField`, 
`addSortedSetField`, and so on) on the supplied consumer.
   
   2. Then, in `DataCubeDocValuesConsumer`, we keep track of the fields and 
their associated doc values. In `flush`, we can use those `DocValues` for each 
`DataCubeField`:
   
   ```
   public class DataCubeDocValuesConsumer extends DocValuesConsumer {
   
     // Doc values captured during flush, keyed by field name
     Map<String, NumericDocValues> numericDocValuesMap = new ConcurrentHashMap<>();
     Map<String, SortedSetDocValues> sortedSetDocValuesMap = new ConcurrentHashMap<>();
   
     @Override
     public void addSortedSetField(FieldInfo field, DocValuesProducer valuesProducer)
         throws IOException {
       sortedSetDocValuesMap.put(field.name, valuesProducer.getSortedSet(field));
     }
   
     @Override
     public void addNumericField(FieldInfo field, DocValuesProducer valuesProducer)
         throws IOException {
       numericDocValuesMap.put(field.name, valuesProducer.getNumeric(field));
     }
   
     // Other DocValuesConsumer methods omitted for brevity
   
     public void flush(DataCubesConfig dataCubesConfig) throws IOException {
       for (DataCubeField field : dataCubesConfig.getFields()) {
         for (String dim : field.getDims()) {
           // Get docValues from the map (we can get a clone / singleton)
           // Custom implementation over docValuesIterator
         }
         for (String metric : field.getMetrics()) {
           // Get docValues from the map
           // Custom implementation over docValuesIterator
         }
       }
     }
   }
   
   ```
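
As a rough illustration of what the "custom implementation over docValuesIterator" step could compute, the sketch below aggregates one metric by one dimension. It deliberately avoids Lucene classes: plain `long[]` arrays indexed by doc ID stand in for the captured doc values (an assumption that ignores sparse and multi-valued fields), and the sum/count pair is just one possible choice of aggregation.

```java
import java.util.Map;
import java.util.TreeMap;

final class CubeFlushSketch {

  /**
   * Groups documents by dimension value and accumulates the metric.
   * Returns dimension value -> {sum, count}, ordered by dimension value.
   */
  static Map<Long, long[]> aggregate(long[] dimValues, long[] metricValues) {
    if (dimValues.length != metricValues.length) {
      throw new IllegalArgumentException("dimension and metric must cover the same documents");
    }
    Map<Long, long[]> cube = new TreeMap<>();
    for (int doc = 0; doc < dimValues.length; doc++) {
      long[] acc = cube.computeIfAbsent(dimValues[doc], k -> new long[2]);
      acc[0] += metricValues[doc]; // running sum of the metric
      acc[1]++;                    // number of documents in this bucket
    }
    return cube;
  }

  public static void main(String[] args) {
    long[] status = {200, 404, 200, 500, 200};  // hypothetical dimension field
    long[] latency = {10, 30, 20, 50, 30};      // hypothetical metric field
    for (Map.Entry<Long, long[]> e : aggregate(status, latency).entrySet()) {
      System.out.println(e.getKey() + " -> sum=" + e.getValue()[0] + " count=" + e.getValue()[1]);
    }
    // prints:
    // 200 -> sum=60 count=3
    // 404 -> sum=30 count=1
    // 500 -> sum=50 count=1
  }
}
```

A real implementation would walk the forward-only doc values iterators rather than arrays, but the values read are the ones already held once per source field.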
   
   ### Merge
   During merge, we will most likely not need `DocValues`; instead, the merge 
will operate on `DataCubeIndices` and their associated structures.
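
One way to picture merging without going back to `DocValues` is to combine the per-segment aggregates directly. This is only a sketch under assumptions: each segment's cube is modeled as a map from a raw dimension value to a `{sum, count}` pair, which is not the proposed `DataCubeIndices` layout, and dimension keys are assumed to be comparable across segments (segment-local ordinals would need remapping first).

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CubeMergeSketch {

  /** Merges per-segment cubes (dimension value -> {sum, count}) by adding matching buckets. */
  static Map<Long, long[]> merge(List<Map<Long, long[]>> segmentCubes) {
    Map<Long, long[]> merged = new TreeMap<>();
    for (Map<Long, long[]> cube : segmentCubes) {
      for (Map.Entry<Long, long[]> e : cube.entrySet()) {
        long[] acc = merged.computeIfAbsent(e.getKey(), k -> new long[2]);
        acc[0] += e.getValue()[0]; // sums add up across segments
        acc[1] += e.getValue()[1]; // so do document counts
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    Map<Long, long[]> seg1 = Map.of(200L, new long[] {60, 3}, 404L, new long[] {30, 1});
    Map<Long, long[]> seg2 = Map.of(200L, new long[] {15, 1}, 500L, new long[] {50, 1});
    for (Map.Entry<Long, long[]> e : merge(List.of(seg1, seg2)).entrySet()) {
      System.out.println(e.getKey() + " -> sum=" + e.getValue()[0] + " count=" + e.getValue()[1]);
    }
    // prints:
    // 200 -> sum=75 count=4
    // 404 -> sum=30 count=1
    // 500 -> sum=50 count=1
  }
}
```

This only works for aggregations that can be combined from partial results, such as sum, count, min, and max.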
   
   POC 
[code](https://github.com/bharath-techie/lucene/commit/d4455221becca86c039236f4a730066626207870#diff-24dc83bf177eafec2219cdc119f0eaae8c52da9a949973d8045fdef49a0de16e)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


