I see code derived from Mondrian in the org.carbondata.core.carbon package[1] (I’m familiar with Mondrian’s code structure because I wrote it). Mondrian was originally EPL and as such cannot be re-licensed under ASL. Everything is probably fine, but as part of incubation we will need to make sure that this and other code have a clear provenance.
Julian

[1] https://github.com/HuaweiBigData/carbondata/tree/master/core/src/main/java/org/carbondata/core/carbon

> On May 19, 2016, at 10:04 AM, Liang Chen <chenliang...@huawei.com> wrote:
>
> Hi Lars,
>
> Thanks for participating in the discussion.
>
> Based on the requirements below, we investigated the existing file formats
> in the Hadoop ecosystem, but we could not find a solution that satisfies
> all of the requirements at the same time, so we started designing
> CarbonData.
>
> R1. Support big scans that fetch only a few columns.
> R2. Support primary-key lookups with sub-second response.
> R3. Support interactive OLAP-style queries over big data involving many
>     filters in a query; this type of workload should respond in seconds.
> R4. Support fast individual record extraction that fetches all columns of
>     the record.
> R5. Support HDFS so that customers can leverage existing Hadoop clusters.
>
> When we investigated Parquet/ORC, they seem to work very well for R1 and
> R5, but they do not meet R2, R3, and R4. So we designed CarbonData mainly
> to add the following differentiating features:
>
> 1. Stores data along with an index: this can significantly accelerate
>    query performance and reduce I/O and CPU usage when there are filters
>    in the query. The CarbonData index consists of multiple levels; a
>    processing framework can leverage this index to reduce the number of
>    tasks it needs to schedule and process, and it can also skip-scan at a
>    finer-grained unit (called a blocklet) on the task side instead of
>    scanning the whole file.
>
> 2. Operable encoded data: by supporting efficient compression and global
>    encoding schemes, queries can run directly on compressed/encoded data;
>    the data is converted only just before the results are returned to the
>    user, which is "late materialization".
>
> 3. Column group: allows multiple columns to form a column group that is
>    stored in row format, thus reducing the cost of column reconstruction.
>
> 4. Supports various use cases with one single data format: interactive
>    OLAP-style queries, sequential access (big scans), and random access
>    (narrow scans).
>
> Please kindly let me know if the above answers your questions.
>
> Regards,
> Liang
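The blocklet skip scan described in feature 1 above can be sketched roughly as follows: each blocklet carries min/max statistics for a column, and a filter is tested against those statistics before any data is read. This is a minimal illustration of the general technique, not CarbonData's actual index classes; all names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of blocklet-level skip scanning (illustrative names, not CarbonData's):
// a blocklet keeps min/max statistics, and a point lookup skips any blocklet
// whose [min, max] range cannot contain the key.
public class BlockletSkipScan {

    static class Blocklet {
        final long min, max;   // column statistics kept in the index
        final long[] values;   // actual column data, read only if not skipped

        Blocklet(long[] values) {
            this.values = values;
            long lo = Long.MAX_VALUE, hi = Long.MIN_VALUE;
            for (long v : values) { lo = Math.min(lo, v); hi = Math.max(hi, v); }
            this.min = lo; this.max = hi;
        }
    }

    /** Returns values equal to 'key', scanning only blocklets whose
     *  statistics say the key could be present. */
    static List<Long> pointLookup(List<Blocklet> blocklets, long key) {
        List<Long> hits = new ArrayList<>();
        for (Blocklet b : blocklets) {
            if (key < b.min || key > b.max) continue; // skip: index rules it out
            for (long v : b.values) {
                if (v == key) hits.add(v);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Blocklet> file = new ArrayList<>();
        file.add(new Blocklet(new long[]{1, 5, 9}));
        file.add(new Blocklet(new long[]{20, 25, 30})); // skipped for key = 9
        System.out.println(pointLookup(file, 9));       // [9]
    }
}
```

The same check applied at coarser levels (file, block) is what lets a scheduler avoid launching tasks for data that a filter makes irrelevant.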
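The "operable encoded data" / late-materialization idea in feature 2 can be sketched with a global dictionary: the filter compares cheap integer codes, and only surviving rows are decoded back to their original values just before the result is returned. The dictionary contents and method names below are invented for illustration and do not reflect CarbonData's actual encoding layout.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of late materialization over dictionary-encoded data (hypothetical
// names): the predicate is encoded once, the scan compares integer codes,
// and decoding happens only for rows that reach the result set.
public class LateMaterialization {

    static final String[] DICT = {"CN", "IN", "US"}; // global dictionary: code -> value

    static int encode(String value) {
        for (int i = 0; i < DICT.length; i++) {
            if (DICT[i].equals(value)) return i;
        }
        throw new IllegalArgumentException("unknown value: " + value);
    }

    /** Filters on encoded codes; decodes only matching rows at the end. */
    static List<String> selectEquals(int[] encodedColumn, String predicate) {
        int target = encode(predicate);       // encode the predicate once
        List<String> result = new ArrayList<>();
        for (int code : encodedColumn) {
            if (code == target) {             // compare cheap integer codes
                result.add(DICT[code]);       // decode only at result time
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] country = {encode("CN"), encode("US"), encode("CN")};
        System.out.println(selectEquals(country, "CN")); // [CN, CN]
    }
}
```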