+1. Compression is quite important for scan performance. I think all the points you listed are valid. Please feel free to contribute.
Regards,
Jacky

> On 12 Sep 2018, at 5:09 PM, Kumar Vishal <[email protected]> wrote:
>
> Hi All,
> I am working on the below carbondata store size optimizations to reduce
> the size of the carbondata file, which will improve IO performance during
> query.
>
> *1. String/Varchar store size optimization*
> *Problem:*
> Currently String/Varchar data type values are stored in LV (length-value)
> format in the carbondata file. During query, the offset (position of each
> cell value) in a page must first be calculated, which impacts query
> performance. The storage size is also high, because no encoding can be
> applied to the length part while it is stored inline with the data.
> *Solution:*
> Store the length part separately from the data part and apply adaptive
> encoding to the lengths. This will optimize the store size, and during
> query the offset calculation will be much faster since we only need to
> look at the length part. It will improve query performance.
>
> *2. Adaptive encoding for Global/Direct/Local dictionary columns*
> *Problem:*
> Global/Direct/Local dictionary values are stored in binary format and only
> snappy compression is applied. As Global/Direct/Local dictionary values
> are of Integer data type, they can be adaptively stored with a data type
> smaller than Integer.
> *Solution:*
> Add adaptive encoding for global/direct dictionary columns to reduce the
> store size.
>
> *3. Local dictionary for Primitive data type columns*
> Currently in carbondata, local dictionary is not supported for primitive
> columns (it is supported only for String data type columns). For low
> cardinality columns, local dictionary encoding will be effective, and
> adaptive encoding can be applied on top of it. This will reduce the store
> size.
>
> Any suggestion from the community is most welcome.
>
> -Regards
> Kumar Vishal
>
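To illustrate points 1 and 2 above, here is a minimal hypothetical sketch (not actual CarbonData code; class and method names are my own): it splits a string page into a separate length array and data part, then picks the smallest integer width that can hold every length, which is the essence of the adaptive idea.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the proposed length/value split: instead of
// storing inline L+V pairs, keep lengths in their own array so an
// adaptive (smallest-fitting-width) encoding can be applied to them.
public class LvSplitSketch {

    // Collect the byte length of each value in the page.
    static int[] lengths(String[] values) {
        int[] lens = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            lens[i] = values[i].getBytes(StandardCharsets.UTF_8).length;
        }
        return lens;
    }

    // Adaptive choice: store each length as 1, 2, or 4 bytes
    // depending on the maximum length seen in the page.
    static int bytesPerLength(int[] lens) {
        int max = 0;
        for (int l : lens) {
            max = Math.max(max, l);
        }
        if (max <= Byte.MAX_VALUE) {
            return 1;
        }
        if (max <= Short.MAX_VALUE) {
            return 2;
        }
        return 4;
    }

    public static void main(String[] args) {
        String[] page = {"carbon", "data", "store"};
        int width = bytesPerLength(lengths(page));
        // All lengths fit in one byte, so the length part shrinks 4x
        // versus plain int offsets, and offsets are simple prefix sums.
        System.out.println(width); // prints 1
    }
}
```

The same width-selection step is what point 2 proposes for dictionary surrogate values, which are integers but rarely need all four bytes.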
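Point 3 (local dictionary for primitive columns) can likewise be sketched in a few lines, again as hypothetical illustration rather than CarbonData internals: each distinct value in a page gets a small surrogate code, and for a low-cardinality column those codes fit in far fewer bytes than the original values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of page-local dictionary encoding for a
// primitive (int) column: replace each value with a small surrogate
// code; adaptive encoding can then shrink the code width further.
public class LocalDictSketch {

    // Assign surrogate codes in order of first appearance and
    // rewrite the column as codes into the dictionary.
    static int[] encode(int[] column, Map<Integer, Integer> dict) {
        int[] codes = new int[column.length];
        for (int i = 0; i < column.length; i++) {
            codes[i] = dict.computeIfAbsent(column[i], k -> dict.size());
        }
        return codes;
    }

    public static void main(String[] args) {
        int[] column = {1000, 2000, 1000, 3000, 2000};
        Map<Integer, Integer> dict = new LinkedHashMap<>();
        int[] codes = encode(column, dict);
        // Only 3 distinct values, so every code fits in a single byte.
        System.out.println(java.util.Arrays.toString(codes)); // [0, 1, 0, 2, 1]
    }
}
```

For a page with only a handful of distinct values, the dictionary plus one-byte codes is much smaller than the raw column, and the codes themselves are a good target for the adaptive encoding described in point 2.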
