[ 
https://issues.apache.org/jira/browse/LUCENE-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041729#comment-17041729
 ] 

juan camilo rodriguez duran commented on LUCENE-9236:
-----------------------------------------------------

Yeah, that's why I plan to do this in 3 steps: to get feedback at each stage 
and, depending on the results, either continue or stop at a point that is 
useful for everybody. [~rcmuir] just to say that the PerFieldDocValues is not 
enough: when you write a new doc values format, its contract requires you to 
implement all 5 methods, even if a field only supports one of them 
(a field cannot be numeric and sorted set at the same time, for example).
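
To make the point concrete, here is a minimal sketch of what that contract 
looks like from the implementer's side. The method names follow the five add 
methods of DocValuesConsumer, but the signatures are simplified and the types 
(AllTypesConsumer, NumericOnlyConsumer) are hypothetical stand-ins, not the 
real Lucene classes:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for illustration; not Lucene's real API.
final class FiveMethodContractSketch {

    /** Mirrors the shape of the contract: one write method per doc values type. */
    interface AllTypesConsumer {
        void addNumericField(String field);
        void addBinaryField(String field);
        void addSortedField(String field);
        void addSortedNumericField(String field);
        void addSortedSetField(String field);
    }

    /** A format that only cares about numeric fields must still fill in all five. */
    static final class NumericOnlyConsumer implements AllTypesConsumer {
        final List<String> written = new ArrayList<>();

        // The only method this format actually needs.
        @Override public void addNumericField(String field) { written.add(field); }

        // Boilerplate forced by the contract.
        @Override public void addBinaryField(String field) { throw unsupported("binary"); }
        @Override public void addSortedField(String field) { throw unsupported("sorted"); }
        @Override public void addSortedNumericField(String field) { throw unsupported("sorted numeric"); }
        @Override public void addSortedSetField(String field) { throw unsupported("sorted set"); }

        private static UnsupportedOperationException unsupported(String type) {
            return new UnsupportedOperationException(type + " doc values are not supported");
        }
    }

    public static void main(String[] args) {
        NumericOnlyConsumer consumer = new NumericOnlyConsumer();
        consumer.addNumericField("price");
        System.out.println(consumer.written);
        try {
            consumer.addSortedSetField("tags");
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Four of the five methods here exist only to satisfy the interface, and per-field 
routing does not remove them; it only decides which format a field goes to.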

> Having a modular Doc Values format
> ----------------------------------
>
>                 Key: LUCENE-9236
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9236
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: juan camilo rodriguez duran
>            Priority: Minor
>              Labels: docValues
>
>  Today the DocValues Consumer/Producer require overriding 5 different 
> methods, even if you only want to use one, and even though a given field can 
> only support one doc values type at a time.
>  
> In the attached PR I’ve implemented a new modular version of those classes 
> (consumer/producer), each one having a single responsibility and all writing 
> to the same single file.
> This is mainly a refactor of the existing format, opening the possibility to 
> override or implement the sub-format you need.
>  
> I’ll do this in 3 steps:
>  # Create a CompositeDocValuesFormat and move the code of 
> Lucene80DocValuesFormat into separate classes, without modifying the inner 
> code. At the same time, I created a Lucene85CompositeDocValuesFormat based on 
> these changes.
>  # I’ll introduce some basic components for writing doc values in general, 
> such as:
>  ## DocumentIdSetIterator Serializer: used by each type of field, based on an 
> IndexedDISI.
>  ## Document Ordinals Serializer: used in Sorted and SortedSet to 
> deduplicate values using a dictionary.
>  ## Document Boundaries Serializer (optional; used only for multivalued 
> fields: SortedNumeric and SortedSet).
>  ## TermsEnum Serializer: used to write and read the terms dictionary for 
> Sorted and SortedSet doc values.
>  # I’ll create the new Sub-DocValues format using the previous components.
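
The composite idea in the quoted description could be sketched roughly as 
follows: one sub-consumer per doc values type, with a composite that routes 
each field to the sub-consumer registered for its type. All names here 
(SubConsumer, CompositeConsumer, register, addField) and the local 
DocValuesType enum are assumptions made for illustration; they are not taken 
from the PR or from Lucene:

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch; type and method names are illustrative, not taken from the PR.
final class CompositeDispatchSketch {

    // Local stand-in for the five doc values types named in the issue.
    enum DocValuesType { NUMERIC, BINARY, SORTED, SORTED_NUMERIC, SORTED_SET }

    /** One sub-consumer handles exactly one doc values type. */
    interface SubConsumer {
        String write(String field);
    }

    /** Routes each field to the sub-consumer registered for its type. */
    static final class CompositeConsumer {
        private final Map<DocValuesType, SubConsumer> subConsumers =
            new EnumMap<>(DocValuesType.class);

        CompositeConsumer register(DocValuesType type, SubConsumer consumer) {
            subConsumers.put(type, consumer);
            return this;
        }

        String addField(DocValuesType type, String field) {
            SubConsumer consumer = subConsumers.get(type);
            if (consumer == null) {
                // Only the types that were registered are supported.
                throw new IllegalArgumentException("no sub-format registered for " + type);
            }
            return consumer.write(field);
        }
    }

    public static void main(String[] args) {
        CompositeConsumer composite = new CompositeConsumer()
            .register(DocValuesType.NUMERIC, field -> "numeric:" + field)
            .register(DocValuesType.SORTED_SET, field -> "sortedset:" + field);
        System.out.println(composite.addField(DocValuesType.NUMERIC, "price"));
    }
}
```

With this shape, supporting or overriding a single type means supplying one 
SubConsumer rather than a class that implements all five methods.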



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
