brancz opened a new issue, #47:
URL: https://github.com/apache/arrow-go/issues/47

   ### Describe the enhancement requested
   
   The dictionary builders already have methods to insert whole arrays, but 
unfortunately they can spend a lot of CPU time on work that may be unnecessary.
   
   Take the following scenario: I have two sources of data, one of them is 
already dictionary encoded, the other is not, so I would like to initialize the 
dictionary builder with the existing dictionary, and only insert new items for 
the non-dictionary-encoded items. Now comes the important part: I'm OK with 
inserts potentially creating duplicates in the dictionary.
   
   I would like to propose a new API, `PrependInitialDict`, that takes an array 
and *must* be called before anything is inserted into the indices array; 
otherwise it returns an error. After the call, the i-th new dictionary item 
inserted gets index `len(initialDict)+i`.
   
   Theoretically it could even be designed to insert dicts multiple times, but 
I would suggest to start the API like this and only extend when we have the use 
cases.
   
   ---
   
   Alternative I have considered: prepending the dictionary after building the 
"new" dictionary and having the new indices start at the length of the existing 
dictionary. I've found this to not really be workable, for three reasons:
   
   1) There would still have to be an API to set the initial index.
   2) It would rely on the user actually prepending the dictionary afterward 
(easy to misuse).
   3) It would be quite awkward to use in scenarios where there are deeply 
nested lists and structs, where building the final record is primarily done 
using a record builder, but only this array would be the exception.
   
   cc @zeroshade 
   
   ### Component(s)
   
   Go


