Repository: kylin
Updated Branches:
  refs/heads/1.x-staging e16ffb7ae -> 75b6479fe
KYLIN-1295 Documents on some common technical concepts

Project: http://git-wip-us.apache.org/repos/asf/kylin/repo
Commit: http://git-wip-us.apache.org/repos/asf/kylin/commit/75b6479f
Tree: http://git-wip-us.apache.org/repos/asf/kylin/tree/75b6479f
Diff: http://git-wip-us.apache.org/repos/asf/kylin/diff/75b6479f

Branch: refs/heads/1.x-staging
Commit: 75b6479fecbbf8872351cbd2dcfa5dc60f5a0e3e
Parents: e16ffb7
Author: lidongsjtu <don...@ebay.com>
Authored: Tue Jan 12 10:34:02 2016 +0800
Committer: lidongsjtu <don...@ebay.com>
Committed: Tue Jan 12 10:34:16 2016 +0800

----------------------------------------------------------------------
 website/_docs/gettingstarted/concepts.md        |  65 +++++++++++++++++++
 website/_docs/gettingstarted/terminology.md     |   2 +-
 .../images/docs/concepts/AggregationGroup.png   | Bin 0 -> 105363 bytes
 website/images/docs/concepts/CubeAction.png     | Bin 0 -> 110592 bytes
 website/images/docs/concepts/CubeDesc.png       | Bin 0 -> 190025 bytes
 website/images/docs/concepts/CubeInstance.png   | Bin 0 -> 285222 bytes
 website/images/docs/concepts/CubeSegment.png    | Bin 0 -> 96393 bytes
 website/images/docs/concepts/DataModel.png      | Bin 0 -> 193661 bytes
 website/images/docs/concepts/DataSource.png     | Bin 0 -> 180295 bytes
 website/images/docs/concepts/Dimension.png      | Bin 0 -> 190495 bytes
 website/images/docs/concepts/Job.png            | Bin 0 -> 299095 bytes
 website/images/docs/concepts/JobAction.png      | Bin 0 -> 53369 bytes
 website/images/docs/concepts/Measure.png        | Bin 0 -> 160824 bytes
 website/images/docs/concepts/Partition.png      | Bin 0 -> 148494 bytes
 14 files changed, 66 insertions(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/kylin/blob/75b6479f/website/_docs/gettingstarted/concepts.md
----------------------------------------------------------------------
diff --git a/website/_docs/gettingstarted/concepts.md b/website/_docs/gettingstarted/concepts.md
new file mode 100644
index 0000000..081ec54
--- /dev/null
+++ b/website/_docs/gettingstarted/concepts.md
@@ -0,0 +1,65 @@
+---
+layout: docs
+title: "Technical Concepts"
+categories: gettingstarted
+permalink: /docs/gettingstarted/concepts.html
+version: v1.2
+since: v1.2
+---
+
+Here are some basic technical concepts used in Apache Kylin; please check them for your reference.
+For domain terminology, please refer to: [Terminology](terminology.md)
+
+## CUBE
+* __Table__ - This is the definition of a Hive table used as a source of cubes; it must be synced before building cubes.
+
+
+* __Data Model__ - This describes a [STAR SCHEMA](https://en.wikipedia.org/wiki/Star_schema) data model, which defines the fact/lookup tables and the filter condition.
+
+
+* __Cube Descriptor__ - This describes the definition and settings of a cube instance: which data model to use, which dimensions and measures to include, how to partition it into segments, how to handle auto-merge, etc.
+
+
+* __Cube Instance__ - This is an instance of a cube, built from one cube descriptor and consisting of one or more cube segments according to the partition settings.
+
+
+* __Partition__ - Users can define a DATE/STRING column as the partition column in the cube descriptor, to separate one cube into several segments covering different date periods.
+
+
+* __Cube Segment__ - This is the actual carrier of cube data, and maps to an HTable in HBase. One build job creates one new segment for the cube instance. When data changes within a specific date period, the related segments can be refreshed to avoid rebuilding the whole cube.
+
+
+* __Aggregation Group__ - Each aggregation group is a subset of dimensions, and cuboids are built from the combinations inside it. It aims at pruning cuboids for optimization.
+
+
+## DIMENSION & MEASURE
+* __Mandatory__ - This dimension type is used for cuboid pruning: if a dimension is specified as “mandatory”, the combinations without that dimension are pruned.
+* __Hierarchy__ - This dimension type is used for cuboid pruning: if dimensions A, B and C form a “hierarchy” relation, only combinations with A, AB or ABC are kept.
+* __Derived__ - On lookup tables, some dimensions can be derived from the table's PK, so there is a specific mapping between them and the FK on the fact table. Such dimensions are DERIVED and don't participate in cuboid generation.
+
+
+* __Count Distinct(HyperLogLog)__ - Immediate COUNT DISTINCT is hard to calculate, so an approximate algorithm, [HyperLogLog](https://en.wikipedia.org/wiki/HyperLogLog), is introduced to keep the error rate at a low level.
+* __Count Distinct(Precise)__ - Precise COUNT DISTINCT is pre-calculated based on RoaringBitmap; currently only int and bigint are supported.
+* __Top N__ - (To be released in 2.x) For example, with this measure type, users can easily get a specified number of top sellers/buyers, etc.
+
+
+## CUBE ACTIONS
+* __BUILD__ - Given an interval of the partition column, this action builds a new cube segment.
+* __REFRESH__ - This action rebuilds the cube segment for some partition period; it is used when the source table data increases.
+* __MERGE__ - This action merges multiple contiguous cube segments into a single one. This can be automated with the auto-merge settings in the cube descriptor.
+* __PURGE__ - Clears the segments under a cube instance. This only updates metadata and won't delete cube data from HBase.
+
+
+## JOB STATUS
+* __NEW__ - This denotes that a job has just been created.
+* __PENDING__ - This denotes that a job is paused by the job scheduler and is waiting for resources.
+* __RUNNING__ - This denotes that a job is in progress.
+* __FINISHED__ - This denotes that a job has finished successfully.
+* __ERROR__ - This denotes that a job was aborted with errors.
+* __DISCARDED__ - This denotes that a job was cancelled by the end user.
+
+
+## JOB ACTION
+* __RESUME__ - Once a job is in ERROR status, this action tries to restore it from the latest successful point.
+* __DISCARD__ - Whatever the status of a job is, the user can end it and release its resources with the DISCARD action.
+
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/kylin/blob/75b6479f/website/_docs/gettingstarted/terminology.md
----------------------------------------------------------------------
diff --git a/website/_docs/gettingstarted/terminology.md b/website/_docs/gettingstarted/terminology.md
index ac4a19d..0f9e669 100644
--- a/website/_docs/gettingstarted/terminology.md
+++ b/website/_docs/gettingstarted/terminology.md
@@ -8,7 +8,7 @@ since: v0.5.x
 ---
 
-Here are some terms we are using in Apache Kylin, please check them for your reference.
+Here are some domain terms we are using in Apache Kylin, please check them for your reference.
 They are basic knowledge of Apache Kylin which also will help to well understand such concept, term, knowledge, theory and others about Data Warehouse, Business Intelligence for analytics.
 
 * __Data Warehouse__: a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis, [wikipedia](https://en.wikipedia.org/wiki/Data_warehouse)

http://git-wip-us.apache.org/repos/asf/kylin/blob/75b6479f/website/images/docs/concepts/AggregationGroup.png
----------------------------------------------------------------------
diff --git a/website/images/docs/concepts/AggregationGroup.png b/website/images/docs/concepts/AggregationGroup.png
new file mode 100644
index 0000000..0e563fc
Binary files /dev/null and b/website/images/docs/concepts/AggregationGroup.png differ

http://git-wip-us.apache.org/repos/asf/kylin/blob/75b6479f/website/images/docs/concepts/CubeAction.png
----------------------------------------------------------------------
diff --git a/website/images/docs/concepts/CubeAction.png b/website/images/docs/concepts/CubeAction.png
new file mode 100644
index 0000000..bbc6ef5
Binary files /dev/null and b/website/images/docs/concepts/CubeAction.png differ

http://git-wip-us.apache.org/repos/asf/kylin/blob/75b6479f/website/images/docs/concepts/CubeDesc.png
----------------------------------------------------------------------
diff --git a/website/images/docs/concepts/CubeDesc.png b/website/images/docs/concepts/CubeDesc.png
new file mode 100644
index 0000000..a0736a4
Binary files /dev/null and b/website/images/docs/concepts/CubeDesc.png differ

http://git-wip-us.apache.org/repos/asf/kylin/blob/75b6479f/website/images/docs/concepts/CubeInstance.png
----------------------------------------------------------------------
diff --git a/website/images/docs/concepts/CubeInstance.png b/website/images/docs/concepts/CubeInstance.png
new file mode 100644
index 0000000..7748afa
Binary files /dev/null and b/website/images/docs/concepts/CubeInstance.png differ

http://git-wip-us.apache.org/repos/asf/kylin/blob/75b6479f/website/images/docs/concepts/CubeSegment.png
----------------------------------------------------------------------
diff --git a/website/images/docs/concepts/CubeSegment.png b/website/images/docs/concepts/CubeSegment.png
new file mode 100644
index 0000000..6f57720
Binary files /dev/null and b/website/images/docs/concepts/CubeSegment.png differ

http://git-wip-us.apache.org/repos/asf/kylin/blob/75b6479f/website/images/docs/concepts/DataModel.png
----------------------------------------------------------------------
diff --git a/website/images/docs/concepts/DataModel.png b/website/images/docs/concepts/DataModel.png
new file mode 100644
index 0000000..dd959f5
Binary files /dev/null and b/website/images/docs/concepts/DataModel.png differ

http://git-wip-us.apache.org/repos/asf/kylin/blob/75b6479f/website/images/docs/concepts/DataSource.png
----------------------------------------------------------------------
diff --git a/website/images/docs/concepts/DataSource.png b/website/images/docs/concepts/DataSource.png
new file mode 100644
index 0000000..1933fa3
Binary files /dev/null and b/website/images/docs/concepts/DataSource.png differ

http://git-wip-us.apache.org/repos/asf/kylin/blob/75b6479f/website/images/docs/concepts/Dimension.png
----------------------------------------------------------------------
diff --git a/website/images/docs/concepts/Dimension.png b/website/images/docs/concepts/Dimension.png
new file mode 100644
index 0000000..65e5810
Binary files /dev/null and b/website/images/docs/concepts/Dimension.png differ

http://git-wip-us.apache.org/repos/asf/kylin/blob/75b6479f/website/images/docs/concepts/Job.png
----------------------------------------------------------------------
diff --git a/website/images/docs/concepts/Job.png b/website/images/docs/concepts/Job.png
new file mode 100644
index 0000000..a790239
Binary files /dev/null and b/website/images/docs/concepts/Job.png differ

http://git-wip-us.apache.org/repos/asf/kylin/blob/75b6479f/website/images/docs/concepts/JobAction.png
----------------------------------------------------------------------
diff --git a/website/images/docs/concepts/JobAction.png b/website/images/docs/concepts/JobAction.png
new file mode 100644
index 0000000..1ec370b
Binary files /dev/null and b/website/images/docs/concepts/JobAction.png differ

http://git-wip-us.apache.org/repos/asf/kylin/blob/75b6479f/website/images/docs/concepts/Measure.png
----------------------------------------------------------------------
diff --git a/website/images/docs/concepts/Measure.png b/website/images/docs/concepts/Measure.png
new file mode 100644
index 0000000..34542f6
Binary files /dev/null and b/website/images/docs/concepts/Measure.png differ

http://git-wip-us.apache.org/repos/asf/kylin/blob/75b6479f/website/images/docs/concepts/Partition.png
----------------------------------------------------------------------
diff --git a/website/images/docs/concepts/Partition.png b/website/images/docs/concepts/Partition.png
new file mode 100644
index 0000000..636eaed
Binary files /dev/null and b/website/images/docs/concepts/Partition.png differ
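
Illustration (not part of the commit): the Mandatory and Hierarchy entries in concepts.md above describe cuboid pruning in words only. A minimal Python sketch of that rule follows. It is not Kylin's implementation; the function name `enumerate_cuboids` and its parameters are hypothetical, and it assumes that a combination containing none of the hierarchy dimensions is still allowed.

```python
from itertools import combinations


def enumerate_cuboids(dimensions, mandatory=(), hierarchy=()):
    """Hypothetical helper: list the dimension combinations (cuboids)
    that survive mandatory and hierarchy pruning."""
    mandatory = set(mandatory)
    kept = []
    for size in range(1, len(dimensions) + 1):
        for combo in combinations(dimensions, size):
            chosen = set(combo)
            # Mandatory rule: every mandatory dimension must be present.
            if not mandatory <= chosen:
                continue
            # Hierarchy rule: the chosen hierarchy levels must form a
            # prefix (A, AB or ABC); a deeper level may not appear
            # without all of its parent levels.
            # Assumption: choosing no hierarchy level at all is allowed.
            present = [level in chosen for level in hierarchy]
            if any(child and not parent
                   for parent, child in zip(present, present[1:])):
                continue
            kept.append(combo)
    return kept


if __name__ == "__main__":
    dims = ["A", "B", "C", "D"]
    print(len(enumerate_cuboids(dims)))  # all non-empty combinations
    for cuboid in enumerate_cuboids(dims, mandatory=["D"],
                                    hierarchy=["A", "B", "C"]):
        print(cuboid)
```

With four dimensions, marking D as mandatory and A, B, C as a hierarchy leaves only the combinations D, AD, ABD and ABCD out of the 15 non-empty ones, which is the kind of reduction the two dimension types aim for.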