bharath-techie commented on issue #13188: URL: https://github.com/apache/lucene/issues/13188#issuecomment-2075553304
Thanks for the comments, @msfroh. It's a good idea to supply the `Dims` and `metric` values to `DataCubesWriter` as part of the `addDocument` flow and consume them the same way other formats do. But there are some cons:

1. Adding an attribute to the field: taking `IntField` as an example, the same `IntField` can be both a dimension and a metric (in fact, multiple metrics) within a `DataCubeField`, and the same `IntField` can be part of multiple `DataCubeField`s.
2. Even if we solve the above and supply values via `DataCubesWriter` for each `DataCubeField`, values will be duplicated, to a degree that depends on the configuration.

To avoid duplicating values, how about we derive the values of each `DataCubeField` from the original values held by `DocValuesWriter` during `flush`?

### Flush

`IntField` values are already part of `DocValuesWriter`, so we can supply a `DataCubesConsumer` and keep track of the resulting values.

1. During flush, in a new method `writeDataCubes`, we pass `dataCubeDocValuesConsumer` to `docValuesWriter.flush`:

   ```java
   // Inside the loop over all fields that have doc values:
   if (perField.docValuesWriter != null) {
     if (dataCubeDocValuesConsumer == null) {
       // Lazily create the consumer from the codec's DataCubesFormat
       DataCubesFormat fmt = state.segmentInfo.getCodec().dataCubesFormat();
       dataCubeDocValuesConsumer = fmt.fieldsConsumer(state, dataCubesConfig);
     }
     perField.docValuesWriter.flush(state, sortMap, dataCubeDocValuesConsumer);
   }

   // After the loop: build the data cube indices
   dataCubeDocValuesConsumer.flush(dataCubesConfig);
   ```

   `DocValuesWriter.flush` calls the matching `addNumericField` / `addSortedSetField` method on the supplied consumer.
2. Then, in `DataCubeDocValuesConsumer`, we keep track of the fields and their associated doc values.
   In `flush`, we can then use the stored `DocValues` for each `DataCubeField`:

   ```java
   import java.io.IOException;
   import java.util.Map;
   import java.util.concurrent.ConcurrentHashMap;
   import org.apache.lucene.codecs.DocValuesConsumer;
   import org.apache.lucene.codecs.DocValuesProducer;
   import org.apache.lucene.index.FieldInfo;
   import org.apache.lucene.index.NumericDocValues;
   import org.apache.lucene.index.SortedSetDocValues;

   public class DataCubeDocValuesConsumer extends DocValuesConsumer {
     // Doc values captured per field name while DocValuesWriter.flush runs
     Map<String, NumericDocValues> numericDocValuesMap = new ConcurrentHashMap<>();
     Map<String, SortedSetDocValues> sortedSetDocValuesMap = new ConcurrentHashMap<>();

     @Override
     public void addSortedSetField(FieldInfo field, DocValuesProducer valuesProducer) throws IOException {
       sortedSetDocValuesMap.put(field.name, valuesProducer.getSortedSet(field));
     }

     @Override
     public void addNumericField(FieldInfo field, DocValuesProducer valuesProducer) throws IOException {
       numericDocValuesMap.put(field.name, valuesProducer.getNumeric(field));
     }

     // Remaining abstract methods; not needed for data cubes in this design
     @Override public void addBinaryField(FieldInfo field, DocValuesProducer valuesProducer) {}
     @Override public void addSortedField(FieldInfo field, DocValuesProducer valuesProducer) {}
     @Override public void addSortedNumericField(FieldInfo field, DocValuesProducer valuesProducer) {}
     @Override public void close() {}

     // Called once all fields are flushed; builds the data cube indices
     public void flush(DataCubesConfig dataCubesConfig) throws IOException {
       for (DataCubeField field : dataCubesConfig.getFields()) {
         for (String dim : field.getDims()) {
           // Get the doc values from the map (we can get a clone / singleton)
           // Custom implementation over the doc values iterator
         }
         for (String metric : field.getMetrics()) {
           // Get the doc values from the map
           // Custom implementation over the doc values iterator
         }
       }
     }
   }
   ```

### Merge

During merge, we will most likely not need the `DocValues`; instead, the merge will operate on the `DataCubeIndices` and their associated structures.

POC [code](https://github.com/bharath-techie/lucene/commit/d4455221becca86c039236f4a730066626207870#diff-24dc83bf177eafec2219cdc119f0eaae8c52da9a949973d8045fdef49a0de16e)
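To make the flush-time derivation proposed above more concrete, here is a minimal, self-contained sketch in plain Java (no Lucene dependency), not the POC's implementation. Every name in it is hypothetical: `Column` stands in for a field's buffered doc values (what the consumer keeps in `numericDocValuesMap`), `Iter.advanceExact` / `longValue` only loosely mimic Lucene's `NumericDocValues`, and `CubeField` stands in for `DataCubeField`. It groups documents by their dimension values and sums each metric, reading every field from a single shared column, so a field used as a metric in one cube and as a dimension in another is never copied.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DataCubeFlushSketch {

  /** Hypothetical buffered column for one field: ascending doc IDs and their values. */
  record Column(int[] docs, long[] values) {
    Iter iterator() {
      return new Iter(this);
    }
  }

  /** Forward-only cursor; advanceExact/longValue loosely mimic Lucene's NumericDocValues. */
  static final class Iter {
    private final Column col;
    private int idx = 0;

    Iter(Column col) {
      this.col = col;
    }

    /** Moves forward to target (targets must not decrease); true if target has a value. */
    boolean advanceExact(int target) {
      while (idx < col.docs().length && col.docs()[idx] < target) {
        idx++;
      }
      return idx < col.docs().length && col.docs()[idx] == target;
    }

    /** Only valid right after advanceExact returned true. */
    long longValue() {
      return col.values()[idx];
    }
  }

  /** Hypothetical stand-in for DataCubeField: dimension and metric field names. */
  record CubeField(String name, List<String> dims, List<String> metrics) {}

  /** Groups docs by dimension values and sums each metric; docs missing a dimension are skipped. */
  static Map<List<Long>, long[]> aggregate(CubeField cube, Map<String, Column> columns, int maxDoc) {
    // Fresh cursors over the shared columns: nothing is copied per cube field.
    List<Iter> dimIters = cube.dims().stream().map(d -> columns.get(d).iterator()).toList();
    List<Iter> metricIters = cube.metrics().stream().map(m -> columns.get(m).iterator()).toList();

    Map<List<Long>, long[]> cells = new HashMap<>();
    for (int doc = 0; doc < maxDoc; doc++) {
      List<Long> key = new ArrayList<>();
      boolean hasAllDims = true;
      for (Iter it : dimIters) {
        if (!it.advanceExact(doc)) {
          hasAllDims = false;
          break;
        }
        key.add(it.longValue());
      }
      if (!hasAllDims) {
        continue;
      }
      long[] sums = cells.computeIfAbsent(key, k -> new long[metricIters.size()]);
      for (int i = 0; i < metricIters.size(); i++) {
        Iter it = metricIters.get(i);
        if (it.advanceExact(doc)) {
          sums[i] += it.longValue();
        }
      }
    }
    return cells;
  }

  /** Sample data: each field is buffered once, like the consumer's numericDocValuesMap. */
  static Map<String, Column> sampleColumns() {
    return Map.of(
        "status", new Column(new int[] {0, 1, 2, 3}, new long[] {200, 200, 404, 200}),
        "size", new Column(new int[] {0, 1, 2, 3}, new long[] {10, 20, 5, 7}),
        "latency", new Column(new int[] {0, 2, 3}, new long[] {3, 9, 4}));
  }

  public static void main(String[] args) {
    Map<String, Column> columns = sampleColumns();
    // "size" is a metric in one cube and a dimension in the other, yet stored once.
    List<CubeField> cubes = List.of(
        new CubeField("by_status", List.of("status"), List.of("size", "latency")),
        new CubeField("by_size", List.of("size"), List.of("latency")));
    for (CubeField cube : cubes) {
      aggregate(cube, columns, 4)
          .forEach((key, sums) -> System.out.println(cube.name() + " " + key + " -> " + Arrays.toString(sums)));
    }
  }
}
```

Because such iterators are forward-only, the sketch opens a fresh cursor per cube field over the shared column instead of reusing one iterator; that is one possible answer to the "clone / singleton" note in the consumer's `flush`. Summation is just an example metric; counts, min, or max would slot into the same loop.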