Author: nju_yaho
Date: Sun Jan 13 05:04:15 2019
New Revision: 1851193
URL: http://svn.apache.org/viewvc?rev=1851193&view=rev
Log:
update download documents
Modified:
kylin/site/cn/download/index.html
kylin/site/download/index.html
kylin/site/feed.xml
Modified: kylin/site/cn/download/index.html
URL:
http://svn.apache.org/viewvc/kylin/site/cn/download/index.html?rev=1851193&r1=1851192&r2=1851193&view=diff
==============================================================================
--- kylin/site/cn/download/index.html (original)
+++ kylin/site/cn/download/index.html Sun Jan 13 05:04:15 2019
@@ -174,6 +174,19 @@ var _hmt = _hmt || [];
</header>
<p>You can verify the downloaded files by following these <a
href="https://www.apache.org/info/verification.html">steps</a> and using these <a
href="https://www.apache.org/dist/kylin/KEYS">KEYS</a>.</p>
+<h4 id="v260">v2.6.0</h4>
+<ul>
+  <li>This is a major release after 2.5, with 94 bug fixes and various improvements. See the release notes for details.</li>
+  <li><a href="/docs/release_notes.html">Release notes</a> and <a href="/docs/howto/howto_upgrade.html">upgrade guide</a></li>
+  <li>Source download: <a href="https://www.apache.org/dyn/closer.cgi/kylin/apache-kylin-2.6.0/apache-kylin-2.6.0-source-release.zip">apache-kylin-2.6.0-source-release.zip</a> [<a href="https://www.apache.org/dist/kylin/apache-kylin-2.6.0/apache-kylin-2.6.0-source-release.zip.asc">asc</a>] [<a href="https://www.apache.org/dist/kylin/apache-kylin-2.6.0/apache-kylin-2.6.0-source-release.zip.sha256">sha256</a>]</li>
+  <li>Binary package download:
+    <ul>
+      <li>for HBase 1.x (includes HDP 2.3+, AWS EMR 5.0+, Azure HDInsight 3.4 - 3.6) - <a href="https://www.apache.org/dyn/closer.cgi/kylin/apache-kylin-2.6.0/apache-kylin-2.6.0-bin-hbase1x.tar.gz">apache-kylin-2.6.0-bin-hbase1x.tar.gz</a> [<a href="https://www.apache.org/dist/kylin/apache-kylin-2.6.0/apache-kylin-2.6.0-bin-hbase1x.tar.gz.asc">asc</a>] [<a href="https://www.apache.org/dist/kylin/apache-kylin-2.6.0/apache-kylin-2.6.0-bin-hbase1x.tar.gz.sha256">sha256</a>]</li>
+      <li>for CDH 5.7+ - <a href="https://www.apache.org/dyn/closer.cgi/kylin/apache-kylin-2.6.0/apache-kylin-2.6.0-bin-cdh57.tar.gz">apache-kylin-2.6.0-bin-cdh57.tar.gz</a> [<a href="https://www.apache.org/dist/kylin/apache-kylin-2.6.0/apache-kylin-2.6.0-bin-cdh57.tar.gz.asc">asc</a>] [<a href="https://www.apache.org/dist/kylin/apache-kylin-2.6.0/apache-kylin-2.6.0-bin-cdh57.tar.gz.sha256">sha256</a>]</li>
+    </ul>
+  </li>
+</ul>
+
<h4 id="v252">v2.5.2</h4>
<ul>
  <li>This is a bugfix release after 2.5.1, with 12 fixes and various improvements. See the release notes for details.</li>
@@ -193,18 +206,18 @@ var _hmt = _hmt || [];
</li>
</ul>
-<h4 id="jdbc-">JDBC Driver</h4>
+<h4 id="jdbc-驱动程序">JDBC Driver</h4>
<p>The Kylin JDBC driver, <code
class="highlighter-rouge">kylin-jdbc-&lt;version&gt;.jar</code>, is in the
“lib” directory of the Kylin binary package.</p>
-<h4 id="odbc-">ODBC Driver</h4>
+<h4 id="odbc-驱动程序">ODBC Driver</h4>
<ul>
  <li><a
href="http://kylin.apache.org/download/KylinODBCDriver-2.1.0.zip">Kylin ODBC
Driver v2.1.0</a> (compatible with all Kylin versions)</li>
</ul>
<p>Note: the Kylin ODBC driver depends on the <a
href="http://www.microsoft.com/en-us/download/details.aspx?id=30679">Microsoft
Visual C++ 2012 Redistributable</a>.</p>
-<h4 id="section">Previous versions</h4>
+<h4 id="以前的版本">Previous versions</h4>
<p>Older versions of Apache Kylin can be downloaded from <a
href="https://archive.apache.org/dist/kylin/">https://archive.apache.org/dist/kylin/</a>.</p>
</div>
Modified: kylin/site/download/index.html
URL:
http://svn.apache.org/viewvc/kylin/site/download/index.html?rev=1851193&r1=1851192&r2=1851193&view=diff
==============================================================================
--- kylin/site/download/index.html (original)
+++ kylin/site/download/index.html Sun Jan 13 05:04:15 2019
@@ -6033,6 +6033,19 @@ var _hmt = _hmt || [];
</header>
<p>You can verify the download by following these <a
href="https://www.apache.org/info/verification.html">procedures</a> and using
these <a href="https://www.apache.org/dist/kylin/KEYS">KEYS</a>.</p>
+<h4 id="v260">v2.6.0</h4>
+<ul>
  <li>This is a major release after 2.5, with 94 bug fixes and enhancements.
Check the release notes.</li>
+ <li><a href="/docs/release_notes.html">Release notes</a> and <a
href="/docs/howto/howto_upgrade.html">upgrade guide</a></li>
+ <li>Source download: <a
href="https://www.apache.org/dyn/closer.cgi/kylin/apache-kylin-2.6.0/apache-kylin-2.6.0-source-release.zip">apache-kylin-2.6.0-source-release.zip</a>
[<a
href="https://www.apache.org/dist/kylin/apache-kylin-2.6.0/apache-kylin-2.6.0-source-release.zip.asc">asc</a>]
[<a
href="https://www.apache.org/dist/kylin/apache-kylin-2.6.0/apache-kylin-2.6.0-source-release.zip.sha256">sha256</a>]</li>
+ <li>Binary download:
+ <ul>
+ <li>for HBase 1.x (includes HDP 2.3+, AWS EMR 5.0+, Azure HDInsight 3.4
- 3.6) - <a
href="https://www.apache.org/dyn/closer.cgi/kylin/apache-kylin-2.6.0/apache-kylin-2.6.0-bin-hbase1x.tar.gz">apache-kylin-2.6.0-bin-hbase1x.tar.gz</a>
[<a
href="https://www.apache.org/dist/kylin/apache-kylin-2.6.0/apache-kylin-2.6.0-bin-hbase1x.tar.gz.asc">asc</a>]
[<a
href="https://www.apache.org/dist/kylin/apache-kylin-2.6.0/apache-kylin-2.6.0-bin-hbase1x.tar.gz.sha256">sha256</a>]</li>
+ <li>for CDH 5.7+ - <a
href="https://www.apache.org/dyn/closer.cgi/kylin/apache-kylin-2.6.0/apache-kylin-2.6.0-bin-cdh57.tar.gz">apache-kylin-2.6.0-bin-cdh57.tar.gz</a>
[<a
href="https://www.apache.org/dist/kylin/apache-kylin-2.6.0/apache-kylin-2.6.0-bin-cdh57.tar.gz.asc">asc</a>]
[<a
href="https://www.apache.org/dist/kylin/apache-kylin-2.6.0/apache-kylin-2.6.0-bin-cdh57.tar.gz.sha256">sha256</a>]</li>
+ </ul>
+ </li>
+</ul>
+
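The asc/sha256 links listed above are meant for local verification. As an illustrative sketch (the file name and digest below are stand-ins, not real release values), a downloaded artifact can be checked against its published .sha256 value like this:

```python
import hashlib
import os
import tempfile

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large tarballs need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path, expected_hex):
    """Compare the computed digest against the published .sha256 value."""
    return sha256_of(path) == expected_hex.strip().lower()

# Demo on a throwaway file standing in for a downloaded release artifact.
fd, path = tempfile.mkstemp()
os.write(fd, b"pretend this is apache-kylin-2.6.0-source-release.zip")
os.close(fd)
expected = hashlib.sha256(b"pretend this is apache-kylin-2.6.0-source-release.zip").hexdigest()
ok = verify(path, expected)
bad = verify(path, "0" * 64)
os.remove(path)
```

A mismatch means the artifact was corrupted or tampered with; the .asc signature should additionally be checked with gpg against the KEYS file, per the Apache verification procedures linked above.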
<h4 id="v252">v2.5.2</h4>
<ul>
  <li>This is a bugfix release after 2.5.1, with 12 bug fixes and enhancements.
Check the release notes.</li>
Modified: kylin/site/feed.xml
URL:
http://svn.apache.org/viewvc/kylin/site/feed.xml?rev=1851193&r1=1851192&r2=1851193&view=diff
==============================================================================
--- kylin/site/feed.xml (original)
+++ kylin/site/feed.xml Sun Jan 13 05:04:15 2019
@@ -19,8 +19,8 @@
<description>Apache Kylin Home</description>
<link>http://kylin.apache.org/</link>
<atom:link href="http://kylin.apache.org/feed.xml" rel="self"
type="application/rss+xml"/>
- <pubDate>Thu, 10 Jan 2019 05:59:22 -0800</pubDate>
- <lastBuildDate>Thu, 10 Jan 2019 05:59:22 -0800</lastBuildDate>
+ <pubDate>Sat, 12 Jan 2019 20:42:57 -0800</pubDate>
+ <lastBuildDate>Sat, 12 Jan 2019 20:42:57 -0800</lastBuildDate>
<generator>Jekyll v2.5.3</generator>
<item>
@@ -31,12 +31,12 @@
<p>During the Apache Kylin Meetup in August 2018, the Meituan team
shared their Kylin on Druid (KoD) solution. Why did they develop this hybrid
system? What's the rationale behind it? This article will answer these
questions and help you to understand the differences and the pros and cons of
each OLAP engine.</p>
-<h2 id="introduction-to-apache-kylin">01 Introduction to
Apache Kylin</h2>
+<h2 id="01-introduction-to-apache-kylin">01 Introduction to
Apache Kylin</h2>
<p>Apache Kylin is an open source distributed big data analytics engine.
It constructs data models on top of huge datasets, builds pre-calculated Cubes
to support multi-dimensional analysis, and provides a SQL query interface and
multi-dimensional analysis on top of Hadoop, with general ODBC, JDBC, and
RESTful API interfaces. Apache Kylin's unique pre-calculation ability enables
it to handle extremely large datasets with sub-second query response
times.<br />
<img src="/images/blog/Kylin-On-Durid/1 kylin_architecture.png"
alt="" /><br />
Graphic 1 Kylin Architecture</p>
-<h2 id="apache-kylins-advantage">02 Apache Kylin's
Advantage</h2>
+<h2 id="02-apache-kylins-advantage">02 Apache Kylin's
Advantage</h2>
<ol>
  <li>Mature, Hadoop-based computing engines (MapReduce and Spark)
provide strong pre-calculation capability on super large datasets,
and can be deployed out-of-the-box on any mainstream Hadoop
platform.</li>
<li>Support of ANSI SQL that allows users to do data analysis with SQL
directly.</li>
@@ -48,34 +48,34 @@ Graphic 1 Kylin Architecture</p>
<li>Support of both batch loading of super large historical datasets
and micro-batches of data streams.</li>
</ol>
-<h2 id="introduction-to-apache-druid-incubating">03
Introduction to Apache Druid (incubating)</h2>
+<h2 id="03-introduction-to-apache-druid-incubating">03
Introduction to Apache Druid (incubating)</h2>
<p>Druid was created in 2012. It's an open source distributed data
store. Its core design combines the concept of analytical databases,
time-series databases, and search systems, and it can support data collection
and analytics on fairly large datasets. Druid uses an Apache V2 license and is
an Apache incubator project.</p>
<p>Druid Architecture<br />
From the perspective of deployment architectures, Druid's processes mostly
fall into 3 categories based on their roles.</p>
-<h3
id="data-node-slave-node-for-data-ingestion-and-calculation">•
Data Node (Slave node for data ingestion and calculation)</h3>
+<h3
id="-data-node-slave-node-for-data-ingestion-and-calculation">•
Data Node (Slave node for data ingestion and calculation)</h3>
<p>The Historical node is in charge of loading segments (committed
immutable data) and receiving queries on historical data.<br />
Middle Manager is in charge of data ingestion and committing segments. Each
task is done by a separate JVM. <br />
Peon is in charge of completing a single task, which is managed and monitored
by the Middle Manager.</p>
-<h3 id="query-node">• Query Node</h3>
+<h3 id="-query-node">• Query Node</h3>
<p>Broker receives query requests, determines on which segment the data
resides, and distributes sub-queries and merges query results.</p>
-<h3 id="master-node-task-coordinator-and-cluster-manager">•
Master Node (Task Coordinator and Cluster Manager)</h3>
+<h3 id="-master-node-task-coordinator-and-cluster-manager">•
Master Node (Task Coordinator and Cluster Manager)</h3>
<p>Coordinator monitors Historical nodes, dispatches segments, and
monitors workload.<br />
Overlord monitors Middle Manager, dispatches tasks to Middle Manager, and
assists in the release of segments.</p>
<h3 id="external-dependency">External Dependency</h3>
<p>At the same time, Druid has 3 replaceable external
dependencies.</p>
-<h3 id="deep-storage-distributed-storage">• Deep Storage
(distributed storage)</h3>
+<h3 id="-deep-storage-distributed-storage">• Deep Storage
(distributed storage)</h3>
<p>Druid uses Deep storage to transfer data files between
nodes.</p>
-<h3 id="metadata-storage">• Metadata Storage</h3>
+<h3 id="-metadata-storage">• Metadata Storage</h3>
<p>Metadata Storage stores the metadata about segment positions and task
output.</p>
-<h3
id="zookeeper-cluster-management-and-task-coordination">•
Zookeeper (cluster management and task coordination)</h3>
+<h3
id="-zookeeper-cluster-management-and-task-coordination">•
Zookeeper (cluster management and task coordination)</h3>
<p>Druid uses Zookeeper (ZK) to ensure consistency of the cluster
status.<br />
<img src="/images/blog/Kylin-On-Durid/2 druid_architecture.png"
alt="" /><br />
Graphic 2 Druid Architecture</p>
@@ -98,7 +98,7 @@ Graphic 4 Druid Schema</p>
<li>Separation of cold/hot data.</li>
</ol>
-<h2 id="why-did-meituan-develop-kylin-on-druid">04 Why did
Meituan develop Kylin on Druid?</h2>
+<h2 id="04-why-did-meituan-develop-kylin-on-druid">04 Why did
Meituan develop Kylin on Druid?</h2>
<p>Meituan deployed into production an offline OLAP platform with Apache
Kylin as its core component in 2015. Since then the platform has served almost
all business lines with fast growing data volume and query executions, and the
stress on the cluster has increased accordingly. Throughout the time, the tech
team in Meituan keeps exploring better solutions for some of Kylinâs
challenges. The major one is Apache HBase, the storage that Kylin relies
on.</p>
<p>Kylin stores its data in HBase by converting the Dimensions and
Measures into HBase Keys and Values, respectively. As HBase doesn't support
secondary index and only has one RowKey index, Kylin's Dimension values will
be combined into a fixed sequence to store as RowKey. In this way, filtering on
a Dimension in the front of the sequence will perform better than those at the
back. Here's an example:</p>
@@ -112,14 +112,13 @@ Graphic 6 Cube2 RowKey Sequence</p>
<p><strong>Now let's query each Cube with the same SQL and
compare the response time.</strong></p>
-<div class="highlighter-rouge"><pre
class="highlight"><code>select S_SUPPKEY, C_CUSTKEY,
sum(LO_EXTENDEDPRICE) as m1
+<div class="highlighter-rouge"><div
class="highlight"><pre
class="highlight"><code>select S_SUPPKEY, C_CUSTKEY,
sum(LO_EXTENDEDPRICE) as m1
from P_LINEORDER
left join SUPPLIER on P_LINEORDER.LO_SUPPKEY = SUPPLIER.S_SUPPKEY
left join CUSTOMER on P_LINEORDER.LO_CUSTKEY = CUSTOMER.C_CUSTKEY
WHERE (LO_ORDERKEY &gt; 1799905 and LO_ORDERKEY &lt; 1799915) or
(LO_ORDERKEY &gt; 1999905 and LO_ORDERKEY &lt; 1999935)
GROUP BY S_SUPPKEY, C_CUSTKEY;
-</code></pre>
-</div>
+</code></pre></div></div>
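The RowKey-ordering effect behind this comparison can be sketched abstractly. This is a toy model of a sorted composite key, not Kylin's actual storage code: a filter on the leading dimension can seek to a contiguous key range, while a filter on a trailing dimension must scan every row.

```python
from bisect import bisect_left, bisect_right

# Toy "HBase table": 10,000 rows sorted by the composite RowKey (dim_a, dim_b).
rows = sorted((a, b) for a in range(100) for b in range(100))

def filter_leading(value):
    """Leading dimension: binary-search a contiguous key range (few rows touched)."""
    lo = bisect_left(rows, (value, -1))
    hi = bisect_right(rows, (value, float("inf")))
    return rows[lo:hi], hi - lo          # (matching rows, rows scanned)

def filter_trailing(value):
    """Trailing dimension: no usable key prefix, so every row must be scanned."""
    return [r for r in rows if r[1] == value], len(rows)

lead_rows, lead_scanned = filter_leading(5)
trail_rows, trail_scanned = filter_trailing(5)
```

Both filters return 100 rows, but the leading-dimension filter touches only those 100 keys while the trailing-dimension filter scans all 10,000 — the same asymmetry measured between the two Cubes here.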
<p><strong>Below shows the time consumed and data
scanned:</strong><br />
<img src="/images/blog/Kylin-On-Durid/7 cube1_query_log.png"
alt="" /><br />
@@ -136,11 +135,12 @@ Graphic 8 Cube1 Query Log</p>
<p>Kylin's query performance and user experience can be greatly
improved with pure columnar storage and multiple indexes on Dimensions. As
analyzed above, Druid happens to meet the requirements of columnar +
multi-index. So the Meituan Kylin team decided to try replacing HBase with
Druid.</p>
-<p>Why not just use Druid then? Meituan's engineers shared their
thoughts:<br />
-1. Druid's native query language is in its own specific JSON format,
which is not as easy to pick up as SQL. Although the Druid community added SQL
support later on, the support is not complete and does not meet the data
analysts' requirement of complex SQL queries. On the contrary, Kylin natively
supports ANSI SQL, uses Apache Calcite for semantic parsing, and supports SQL
features such as join, sub query, window functions, etc. In addition, it
provides standard interfaces including ODBC/JDBC, and can directly connect with
BI tools such as Tableau, Power BI, Superset, and Redash.</p>
-
+<p>Why not just use Druid then? Meituan's engineers shared their
thoughts:</p>
<ol>
<li>
+  <p>Druid's native query language is in its own specific JSON
format, which is not as easy to pick up as SQL. Although the Druid community
added SQL support later on, the support is not complete and does not meet the
data analysts' requirement of complex SQL queries. On the contrary, Kylin
natively supports ANSI SQL, uses Apache Calcite for semantic parsing, and
supports SQL features such as join, sub query, window functions, etc. In
addition, it provides standard interfaces including ODBC/JDBC, and can directly
connect with BI tools such as Tableau, Power BI, Superset, and Redash.</p>
+ </li>
+ <li>
<p>Druid can support only single-table query. Multi-table joins are
very common in practice, but they cannot be supported by Druid. Kylin, however,
supports Star Schema and Snowflake Schema, satisfying multi-table join
requirements.</p>
</li>
<li>
@@ -167,9 +167,9 @@ Graphic 8 Cube1 Query Log</p>
<p>Therefore, it appears to be a promising OLAP solution to combine
Druid's excellent columnar storage with Kylin's usability, compatibility,
and completeness. Druid has columnar storage, inverted index, better filtering
performance than HBase, native OLAP features, and good secondary aggregation
capabilities. Meituan's tech team decided to try replacing HBase with Druid
as the storage for Kylin.</p>
-<h2 id="section">05</h2>
-<p>### Kylin on Druid Design<br />
-At v1.5, Apache Kylin introduced a pluggable architecture and decoupled
computing and storage components, which makes the replacement of HBase
possible. Here is a brief introduction to the main design concept of Kylin on
Druid based on Meituan engineer Kaisen Kang's design doc. (Graphics 9 and 10
are from reference [1], and text is from references [1] and [3])</p>
+<h2 id="05">05</h2>
+<h3 id="kylin-on-druid-design">Kylin on Druid Design</h3>
+<p>At v1.5, Apache Kylin introduced a pluggable architecture and decoupled
computing and storage components, which makes the replacement of HBase
possible. Here is a brief introduction to the main design concept of Kylin on
Druid based on Meituan engineer Kaisen Kang's design doc. (Graphics 9 and 10
are from reference [1], and text is from references [1] and [3])</p>
<h3 id="process-of-building-cube">Process of Building
Cube</h3>
<ol>
@@ -202,12 +202,12 @@ Graphic 10 Process of Querying Cube</
<li>Kylin measure columns map to Druid measure columns.</li>
</ol>
-<h2 id="summary">06 Summary</h2>
+<h2 id="06-summary">06 Summary</h2>
<p>In this article, we first analyzed features and pros/cons of both
Kylin and Druid, and the reasons for the poor performance of HBase in Kylin in some
cases. Then we searched solutions and found the feasible option of using Druid
as the Kylin storage engine. At last, we illustrated the Kylin-on-Druid
architecture and the processes developed by Meituan.</p>
<p>Stay tuned for our next article about how to use Kylin on Druid, how
it performs, and how it can be improved.</p>
-<h2 id="reference">07 Reference</h2>
+<h2 id="07-reference">07 Reference</h2>
<ol>
<li>
@@ -225,7 +225,7 @@ Graphic 10 Process of Querying Cube</
</ol>
</description>
- <pubDate>Wed, 12 Dec 2018 09:30:00 -0800</pubDate>
+ <pubDate>Wed, 12 Dec 2018 17:30:00 +0000</pubDate>
<link>http://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/</link>
<guid
isPermaLink="true">http://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/</guid>
@@ -242,37 +242,37 @@ Graphic 10 Process of Querying Cube</
<p>This is a new feature release after 2.4.0. It introduces many
valuable improvements; for the complete change list please refer to the <a
href="https://kylin.apache.org/docs/release_notes.html">release
notes</a>. Here are some of the major improvements:</p>
-<h3 id="all-in-spark--cubing-">All-in-Spark Cubing
Engine</h3>
+<h3 id="all-in-spark-的-cubing-引擎">All-in-Spark
Cubing Engine</h3>
<p>Kylin's Spark engine will use Spark to run all distributed jobs in cube
computation, including fetching the distinct values of each dimension,
converting cuboid files to HBase HFiles, merging segments, merging
dictionaries, and so on. The default Spark configuration has also been tuned
so that users get an out-of-the-box experience. The related development tasks
are KYLIN-3427, KYLIN-3441, KYLIN-3442.</p>
<p>Spark job management has also been improved: once a Spark job starts
running, you can get the job link in the web console; if you discard the job,
Kylin will terminate the Spark job immediately to release resources in time;
if Kylin is restarted, it can resume from the previous job instead of
submitting a new one.</p>
-<h3 id="mysql--kylin-">MySQL as Kylin Metadata
Storage</h3>
+<h3 id="mysql-做-kylin-元数据的存储">MySQL as Kylin
Metadata Storage</h3>
<p>In the past, HBase was the only choice for Kylin metadata storage. In
some cases HBase is not suitable, for example when using multiple HBase
clusters to provide cross-region high availability for Kylin; the replicated
HBase cluster is read-only, so it cannot serve as the metadata store. Now we
have introduced a MySQL Metastore to meet this need. This feature is currently
in the testing phase. See KYLIN-3488 for more details.</p>
-<h3 id="hybrid-model-">Hybrid model graphical interface</h3>
+<h3 id="hybrid-model-图形界面">Hybrid model
graphical interface</h3>
<p>Hybrid is an advanced model for assembling multiple cubes. It can be used
in situations where a cube's schema has to change. This feature had no
graphical interface in the past, so only a small number of users knew about
it. Now we have enabled it in the web interface so that more users can try
it.</p>
-<h3 id="cube-planner">Cube planner enabled by default</h3>
+<h3 id="默认开启-cube-planner">Cube planner enabled
by default</h3>
<p>Cube planner can greatly optimize the cube structure and reduce the
number of cuboids built, thus saving computing/storage resources and improving
query performance. It was introduced in v2.3, but was not enabled by default.
To let more users see and try it, we enable it by default in v2.5. When the
first segment is built, the algorithm will automatically optimize the cuboid
set according to the data statistics.</p>
-<h3 id="segment-">Improved segment pruning</h3>
+<h3 id="改进的-segment-剪枝">Improved segment
pruning</h3>
<p>Segment (partition) pruning can effectively reduce disk and network I/O
and therefore greatly improves query performance. In the past, Kylin pruned
segments only by the value of the partition date column; if a query did not
use the partition column as a filter condition, pruning did not take effect
and all segments were scanned.<br />
Now, starting from v2.5, Kylin records the min/max values of every dimension
at the segment level. Before scanning a segment, the query's conditions are
compared against the min/max index; if they do not match, the segment is
skipped. Check KYLIN-3370 for more information.</p>
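The min/max pruning idea can be sketched as a toy model (segment names and columns are invented for illustration, not Kylin internals): each segment records a per-dimension range, and segments whose range cannot contain the filter value are skipped before scanning.

```python
# Toy model of v2.5-style segment pruning: each segment keeps min/max per dimension.
segments = [
    {"name": "seg1", "minmax": {"price": (1, 50),  "qty": (1, 10)}},
    {"name": "seg2", "minmax": {"price": (51, 90), "qty": (5, 20)}},
    {"name": "seg3", "minmax": {"price": (91, 99), "qty": (1, 3)}},
]

def prune(segments, column, value):
    """Keep only segments whose recorded [min, max] for `column` can contain `value`."""
    kept = []
    for seg in segments:
        lo, hi = seg["minmax"][column]
        if lo <= value <= hi:
            kept.append(seg["name"])
    return kept

by_price = prune(segments, "price", 60)   # only seg2's price range covers 60
by_qty = prune(segments, "qty", 7)        # seg1 and seg2 qualify; seg3 is skipped
```

Unlike the old behavior, this works for any dimension, not just the partition date column.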
-<h3 id="yarn-">Merge dictionaries on YARN</h3>
+<h3 id="在-yarn-上合并字典">Merge dictionaries on
YARN</h3>
<p>When segments are merged, their dictionaries need to be merged too. In
the past, dictionary merging happened inside Kylin's JVM, which required a lot
of local memory and CPU resources. In extreme cases (with several concurrent
jobs), it could cause the Kylin process to crash, so some users had to
allocate more memory to the Kylin job node, or run multiple job nodes to
balance the workload.<br />
Now, starting from v2.5, Kylin submits this task to Hadoop MapReduce or Spark,
which resolves this bottleneck. Check KYLIN-3471 for more information.</p>
-<h3 id="cube-">Improved cube build performance with global
dictionaries</h3>
+<h3 id="改进使用全局字典的-cube-构建性能">Improved
cube build performance with global dictionaries</h3>
<p>The Global Dictionary (GD) is a prerequisite for precise distinct
counting with bitmaps. If the column being counted has very high cardinality,
the GD can be very large. During the cube build, Kylin needs to convert
non-integer values into integers through the GD. Although the GD is split into
multiple slices that can be loaded into memory separately, because the
column's values are unordered, Kylin has to repeatedly swap slices in and out,
which makes the build job very slow.<br />
This enhancement introduces a new step that builds a shrunken dictionary from
the global dictionary for each data block. Each task then only needs to load
the shrunken dictionary, avoiding frequent swap-in and swap-out. Performance
can be up to 3 times faster than before. Check KYLIN-3491 for more
information.</p>
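The shrunken-dictionary step can be illustrated with a minimal sketch (toy values, not Kylin's dictionary format): the global dictionary maps every distinct value to an integer id, and each data block extracts only the entries it actually uses, so a build task loads a small projection instead of swapping full slices.

```python
# Toy global dictionary: every distinct value gets a stable integer id.
global_dict = {v: i for i, v in enumerate(sorted({"a", "b", "c", "d", "e", "f"}))}

def shrink(global_dict, block_values):
    """Project the global dictionary onto the values appearing in one data block."""
    return {v: global_dict[v] for v in set(block_values)}

block = ["b", "e", "b", "b"]          # one data block's column values
small = shrink(global_dict, block)    # only the entries this block needs
encoded = [small[v] for v in block]   # encoding now touches the small dict only
```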
-<h3 id="topn-count-distinct--cube-">Improved size estimation for
cubes with TOPN and COUNT DISTINCT</h3>
+<h3
id="改进含-topn-count-distinct-的-cube-大小的估计">Improved
size estimation for cubes with TOPN and COUNT DISTINCT</h3>
<p>The cube size is estimated in advance at build time and is used by
several subsequent steps, such as deciding the number of partitions for the
MR/Spark job and calculating the HBase region splits, so its accuracy has a
big impact on build performance. When there are COUNT DISTINCT or TOPN
measures, because their sizes are flexible, the estimate may deviate greatly
from the real value. In the past, users needed to tune several parameters to
bring the size estimate closer to the actual size, which was somewhat
difficult for ordinary users.<br />
Now, Kylin automatically adjusts the size estimate based on the collected
statistics, which brings the estimate closer to the actual size. Check
KYLIN-3453 for more information.</p>
-<h3 id="hadoop-30hbase-20">Support for Hadoop 3.0/HBase
2.0</h3>
+<h3 id="支持hadoop-30hbase-20">Support for Hadoop 3.0/HBase
2.0</h3>
<p>Hadoop 3 and HBase 2 are being adopted by many users. Kylin now provides
new binary packages compiled with the new Hadoop and HBase APIs. We have
tested them on Hortonworks HDP 3.0 and Cloudera CDH 6.0.</p>
<p><strong>Download</strong></p>
@@ -289,7 +289,7 @@ Graphic 10 Process of Querying Cube</
<p><em>Great thanks to everyone who contributed to Apache
Kylin!</em></p>
</description>
- <pubDate>Thu, 20 Sep 2018 13:00:00 -0700</pubDate>
+ <pubDate>Thu, 20 Sep 2018 20:00:00 +0000</pubDate>
<link>http://kylin.apache.org/cn/blog/2018/09/20/release-v2.5.0/</link>
<guid
isPermaLink="true">http://kylin.apache.org/cn/blog/2018/09/20/release-v2.5.0/</guid>
@@ -361,7 +361,7 @@ Graphic 10 Process of Querying Cube</
<p><em>Great thanks to everyone who
contributed!</em></p>
</description>
- <pubDate>Thu, 20 Sep 2018 13:00:00 -0700</pubDate>
+ <pubDate>Thu, 20 Sep 2018 20:00:00 +0000</pubDate>
<link>http://kylin.apache.org/blog/2018/09/20/release-v2.5.0/</link>
<guid
isPermaLink="true">http://kylin.apache.org/blog/2018/09/20/release-v2.5.0/</guid>
@@ -407,22 +407,20 @@ GRANT ROLE ssb_write_role TO GROUP ssb_w
# Then add kylin_manager_user to kylin_manager_group in OpenLDAP, so
kylin_manager_user has access to the ssb database.
</pre>
<p>2 Grant the kylin_manager_user user read and write permissions on the
HDFS directory /user/kylin_manager_user.<br />
-3 Configure the HADOOP_STREAMING_JAR environment variable under the
kylin_manager_user user home directory.<br />
-<code class="highlighter-rouge">
-Export
HADOOP_STREAMING_JAR=/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-streaming.jar
-</code></p>
+3 Configure the HADOOP_STREAMING_JAR environment variable under the
kylin_manager_user user home directory.</p>
+<div class="highlighter-rouge"><div
class="highlight"><pre
class="highlight"><code>export
HADOOP_STREAMING_JAR=/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-streaming.jar
+</code></pre></div></div>
<h2 id="download-the-ssb-tool-and-compile">Download the SSB
tool and compile</h2>
<p>You can quickly download and compile the ssb test tool by entering
the following commands in the Linux terminal.</p>
-<div class="highlighter-rouge"><pre
class="highlight"><code>git clone
https://github.com/jiangshouzhuang/ssb-kylin.git
+<div class="highlighter-rouge"><div
class="highlight"><pre
class="highlight"><code>git clone
https://github.com/jiangshouzhuang/ssb-kylin.git
cd ssb-kylin
cd ssb-benchmark
make clean
make
-</code></pre>
-</div>
+</code></pre></div></div>
<h2 id="adjust-the-ssb-parameters">Adjust the SSB
parameters</h2>
@@ -430,7 +428,7 @@ make
<p>Part of the ssb.conf file is:</p>
-<div class="highlighter-rouge"><pre
class="highlight"><code> # customer base, default value is
30,000
+<div class="highlighter-rouge"><div
class="highlight"><pre
class="highlight"><code> # customer base, default value is
30,000
customer_base = 30000
# part base, default value is 200,000
part_base = 200000
@@ -440,30 +438,27 @@ make
date_base = 2556
# lineorder base (purchase record), default value is 6,000,000
lineorder_base = 6000000
-</code></pre>
-</div>
+</code></pre></div></div>
<p>Of course, the above base parameters can be adjusted according to
actual needs; I use the default values.<br />
In the ssb.conf file, there are some parameters as follows.</p>
-<div class="highlighter-rouge"><pre
class="highlight"><code># manufacturer max. The value range
is (1 .. manu_max)
+<div class="highlighter-rouge"><div
class="highlight"><pre
class="highlight"><code># manufacturer max. The value range
is (1 .. manu_max)
manu_max = 5
# category max. The value range is (1 .. cat_max)
cat_max = 5
# brand max. The value range is (1 .. brand_max)
brand_max = 40
-</code></pre>
-</div>
+</code></pre></div></div>
<p><strong>The explanation is as follows:</strong> <br
/>
manu_max, cat_max and brand_max are used to define the hierarchy scale. For
example, manu_max=10, cat_max=10, and brand_max=10 refer to a total of 10
manufacturers, each manufacturer has a maximum of 10 part categories, and
each category has up to 10 brands. Therefore, the cardinality of manufacturer
is 10, the cardinality of category is 100, and the cardinality of brand is
1000.</p>
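The cardinality arithmetic above can be restated as a tiny sketch, using the values from the example:

```python
# Each hierarchy level multiplies the cardinality of the level above it.
manu_max, cat_max, brand_max = 10, 10, 10

manufacturer_cardinality = manu_max
category_cardinality = manu_max * cat_max
brand_cardinality = manu_max * cat_max * brand_max
```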
-<div class="highlighter-rouge"><pre
class="highlight"><code># customer: num of cities per
country, default value is 100
+<div class="highlighter-rouge"><div
class="highlight"><pre
class="highlight"><code># customer: num of cities per
country, default value is 100
cust_city_max = 9
# supplier: num of cities per country, default value is 100
supp_city_max = 9
-</code></pre>
-</div>
+</code></pre></div></div>
<p><strong>The explanation is as follows:</strong> <br
/>
cust_city_max and supp_city_max are used to define the number of cities per
country in the customer and supplier tables. If the total number of countries
is 30, and cust_city_max=100, supp_city_max=10, then the customer table will
have 3000 different cities, and the supplier table will have 300 different
cities.</p>
@@ -494,19 +489,17 @@ ${KYLIN_INSTALL_USER_PASSWD} -d org.apac
<p>If your CDH or other big data platform uses hive cli instead of beeline,
please adapt the script yourself.<br />
Once everything is ready, we run the program to generate the test
data:</p>
-<div class="highlighter-rouge"><pre
class="highlight"><code>cd ssb-kylin
+<div class="highlighter-rouge"><div
class="highlight"><pre
class="highlight"><code>cd ssb-kylin
bin/run.sh --scale 20
-</code></pre>
-</div>
+</code></pre></div></div>
<p>We set the scale to 20; the program will run for a while, and the largest
table, lineorder, will have more than 100 million rows. After the program has
finished, we look at the tables in the hive database and their row
counts:</p>
-<div class="highlighter-rouge"><pre
class="highlight"><code>use ssb;
+<div class="highlighter-rouge"><div
class="highlight"><pre
class="highlight"><code>use ssb;
show tables;
select count(1) from lineorder;
select count(1) from p_lineorder;
-</code></pre>
-</div>
+</code></pre></div></div>
<p><img src="/images/blog/2.1 generated tables.png"
alt="" /></p>
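The "more than 100 million" figure is consistent with the ssb.conf defaults shown earlier, assuming (our assumption about the tool, not stated here) that run.sh scales each base row count linearly with --scale:

```python
# ssb.conf default (from the listing above) and the scale used in this run.
lineorder_base = 6_000_000
scale = 20

# Assuming linear scaling of row counts by --scale:
lineorder_rows = lineorder_base * scale
```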
@@ -519,10 +512,9 @@ select count(1) from p_lineorder;
<p>The ssb-kylin project has prepared the project, model, and cube for us in
advance; just import them into Kylin, like the learn_kylin example. The cube
metadata's directory is cubemeta; because our Kylin integrates OpenLDAP and
there is no ADMIN user, the owner parameter in cubemeta/cube/ssb.json is set
to null.<br />
Execute the following command to import cubemeta:</p>
-<div class="highlighter-rouge"><pre
class="highlight"><code>cd ssb-kylin
+<div class="highlighter-rouge"><div
class="highlight"><pre
class="highlight"><code>cd ssb-kylin
$KYLIN_HOME/bin/metastore.sh restore cubemeta
-</code></pre>
-</div>
+</code></pre></div></div>
<p>Then log in to Kylin and execute the Reload Metadata operation. This
creates the new project, model, and cube in Kylin. Before building the cube,
first Disable it, then Purge it to delete the old temporary files.</p>
@@ -532,19 +524,17 @@ $KYLIN_HOME/bin/metastore.sh restore cub
<p>Here I test the performance of building the cube with Spark again:
disable the previously created cube, and then Purge it. Since the cube has
been purged, the now-useless HBase tables and HDFS files need to be deleted,
so we manually clean up the junk files. First execute the following
command:</p>
-<div class="highlighter-rouge"><pre
class="highlight"><code>${KYLIN_HOME}/bin/kylin.sh
org.apache.kylin.tool.StorageCleanupJob --delete false
-</code></pre>
-</div>
+<div class="highlighter-rouge"><div
class="highlight"><pre
class="highlight"><code>${KYLIN_HOME}/bin/kylin.sh
org.apache.kylin.tool.StorageCleanupJob --delete false
+</code></pre></div></div>
<p>Then check whether the listed HBase tables and HDFS files are unused.
After confirming they are, perform the delete operation:</p>
-<div class="highlighter-rouge"><pre
class="highlight"><code>${KYLIN_HOME}/bin/kylin.sh
org.apache.kylin.tool.StorageCleanupJob --delete true
-</code></pre>
-</div>
+<div class="highlighter-rouge"><div
class="highlight"><pre
class="highlight"><code>${KYLIN_HOME}/bin/kylin.sh
org.apache.kylin.tool.StorageCleanupJob --delete true
+</code></pre></div></div>
<p>Building a cube with Spark consumes a lot of memory; after all, using
memory improves the speed of cube building. Here I list some of the Spark
parameters in the kylin.properties configuration
file:</p>
-<div class="highlighter-rouge"><pre
class="highlight"><code>kylin.engine.spark-conf.spark.master=yarn
+<div class="highlighter-rouge"><div
class="highlight"><pre
class="highlight"><code>kylin.engine.spark-conf.spark.master=yarn
kylin.engine.spark-conf.spark.submit.deployMode=cluster
kylin.engine.spark-conf.spark.yarn.queue=root.kylin_manager_group
# config Dynamic resource allocation
@@ -560,8 +550,7 @@ kylin.engine.spark-conf.spark.driver.mem
kylin.engine.spark-conf.spark.executor.memory=4G
kylin.engine.spark-conf.spark.executor.cores=1
kylin.engine.spark-conf.spark.network.timeout=600
-</code></pre>
-</div>
+</code></pre></div></div>
<p>The above parameters can meet most requirements, so users basically do
not need to configure anything when designing the Cube. Of course, for special
situations you can still set Spark-related tuning parameters at the
Cube level.</p>
@@ -599,7 +588,7 @@ The query result of Scale=10 is as follo
</ol>
</description>
- <pubDate>Mon, 16 Jul 2018 05:28:00 -0700</pubDate>
+ <pubDate>Mon, 16 Jul 2018 12:28:00 +0000</pubDate>
<link>http://kylin.apache.org/blog/2018/07/16/Star-Schema-Benchmark-on-Apache-Kylin/</link>
<guid
isPermaLink="true">http://kylin.apache.org/blog/2018/07/16/Star-Schema-Benchmark-on-Apache-Kylin/</guid>
@@ -649,7 +638,7 @@ The query result of Scale=10 is as follo
<p>We wish you a good time with Redash-Kylin!</p>
</description>
- <pubDate>Tue, 08 May 2018 13:00:00 -0700</pubDate>
+ <pubDate>Tue, 08 May 2018 20:00:00 +0000</pubDate>
<link>http://kylin.apache.org/blog/2018/05/08/redash-kylin-plugin-strikingly/</link>
<guid
isPermaLink="true">http://kylin.apache.org/blog/2018/05/08/redash-kylin-plugin-strikingly/</guid>
@@ -691,11 +680,11 @@ The query result of Scale=10 is as follo
<p>For any issue or question,<br />
open a JIRA in the Apache Kylin project: <a
href="https://issues.apache.org/jira/browse/KYLIN/">https://issues.apache.org/jira/browse/KYLIN/</a><br
/>
or<br />
-send mail to Apache Kylin dev mailing list: <a
href="&#109;&#097;&#105;&#108;&#116;&#111;:&#100;&#101;&#118;&#064;&#107;&#121;&#108;&#105;&#110;&#046;&#097;&#112;&#097;&#099;&#104;&#101;&#046;&#111;&#114;&#103;">&#100;&#101;&#118;&#064;&#107;&#121;&#108;&#105;&#110;&#046;&#097;&#112;&#097;&#099;&#104;&#101;&#046;&#111;&#114;&#103;</a></p>
+send mail to Apache Kylin dev mailing list: <a
href="mailto:[email protected]">[email protected]</a></p>
<p><em>Great thanks to everyone who
contributed!</em></p>
</description>
- <pubDate>Sun, 04 Mar 2018 12:00:00 -0800</pubDate>
+ <pubDate>Sun, 04 Mar 2018 20:00:00 +0000</pubDate>
<link>http://kylin.apache.org/blog/2018/03/04/release-v2.3.0/</link>
<guid
isPermaLink="true">http://kylin.apache.org/blog/2018/03/04/release-v2.3.0/</guid>
@@ -766,15 +755,14 @@ Figure 4: Build Cube in Apache Kylin<
<li>Execute SQL in the "Insight" tab, for example:</li>
</ol>
-<div class="highlighter-rouge"><pre
class="highlight"><code> select part_dt,
+<div class="highlighter-rouge"><div
class="highlight"><pre
class="highlight"><code> select part_dt,
sum(price) as total_selled,
count(distinct seller_id) as sellers
from kylin_sales
group by part_dt
order by part_dt
-- #This query will hit on the newly built Cube 'Kylin_sales_cube'.
-</code></pre>
-</div>
+</code></pre></div></div>
<ol>
<li>Next, we will install Apache Superset and initialize it.<br
/>
@@ -782,15 +770,14 @@ Figure 4: Build Cube in Apache Kylin<
<li>Install kylinpy</li>
</ol>
-<div class="highlighter-rouge"><pre
class="highlight"><code> $ pip install kylinpy
-</code></pre>
-</div>
+<div class="highlighter-rouge"><div
class="highlight"><pre
class="highlight"><code> $ pip install kylinpy
+</code></pre></div></div>
<ol>
<li>Verify your installation; if everything goes well, the Apache Superset
daemon should be up and running.</li>
</ol>
-<div class="highlighter-rouge"><pre
class="highlight"><code>$ superset runserver -d
+<div class="highlighter-rouge"><div
class="highlight"><pre
class="highlight"><code>$ superset runserver -d
Starting server with command:
gunicorn -w 2 --timeout 60 -b 0.0.0.0:8088 --limit-request-line 0
--limit-request-field_size 0 superset:app
@@ -799,18 +786,19 @@ gunicorn -w 2 --timeout 60 -b 0.0.0.0:8
[2018-01-03 15:54:03 +0800] [73673] [INFO] Using worker: sync
[2018-01-03 15:54:03 +0800] [73676] [INFO] Booting worker with pid: 73676
[2018-01-03 15:54:03 +0800] [73679] [INFO] Booting worker with pid: 73679
-</code></pre>
-</div>
+</code></pre></div></div>
<h2 id="connect-apache-kylin-from-apachesuperset">Connect
Apache Kylin from Apache Superset</h2>
-<p>Now everything you need is installed and ready to go. Let's try to
create an Apache Kylin data source in Apache Superset.<br />
-1. Open up http://localhost:8088 in your web browser with the credential you
set during Apache Superset installation.<br />
+<p>Now everything you need is installed and ready to go. Let's try to
create an Apache Kylin data source in Apache Superset.</p>
+<ol>
+ <li>
+ <p>Open up http://localhost:8088 in your web browser with the
credential you set during Apache Superset installation.<br />
<img src="/images/Kylin-and-Superset/png/5. superset_1.png"
alt="" /><br />
Figure 5: Apache Superset Login Page</p>
-
-<ol>
- <li>Go to Source -&gt; Datasource to configure a new data source.
+ </li>
+ <li>
+ <p>Go to Source -&gt; Datasource to configure a new data
source.</p>
<ul>
<li>SQLAlchemy URI pattern is :
kylin://<username>:<password>@<hostname>:<port>/<project
name=""></project></port></hostname></password></username></li>
<li>Check "Expose in SQL Lab" if you want to expose this data
source in SQL Lab.</li>
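<p>The SQLAlchemy URI pattern above can be assembled in a few lines of Python. This is only a sketch: kylin_uri is a hypothetical helper (not part of kylinpy), and the point it illustrates is that credentials containing characters such as @ or : must be URL-escaped before being placed in the URI:</p>

```python
from urllib.parse import quote_plus

def kylin_uri(username, password, hostname, port, project):
    """Build a kylin:// SQLAlchemy URI (hypothetical helper).

    The credentials are percent-encoded so that characters like
    '@' or ':' in a password cannot break the URI structure.
    """
    return "kylin://{}:{}@{}:{}/{}".format(
        quote_plus(username), quote_plus(password), hostname, port, project)

# Default sandbox credentials are used here purely as an example.
print(kylin_uri("ADMIN", "KYLIN", "localhost", 7070, "learn_kylin"))
# -> kylin://ADMIN:KYLIN@localhost:7070/learn_kylin
```

<p>The resulting string is what you paste into Superset's "SQLAlchemy URI" field.</p>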
@@ -856,9 +844,8 @@ Figure 11 Query multiple tables from Apa
<img src="/images/Kylin-and-Superset/png/12. SQL_Lab_2.png"
alt="" /><br />
Figure 12 Define your query and visualize it immediately</p>
-<p>You may copy the entire SQL below to experience how you can query
Kylin Cube in SQL Lab. <br />
-<code class="highlighter-rouge">
-select
+<p>You may copy the entire SQL below to experience how you can query
Kylin Cube in SQL Lab.</p>
+<div class="highlighter-rouge"><div
class="highlight"><pre
class="highlight"><code>select
YEAR_BEG_DT,
MONTH_BEG_DT,
WEEK_BEG_DT,
@@ -876,8 +863,8 @@ join KYLIN_CATEGORY_GROUPINGS on SITE_ID
join KYLIN_ACCOUNT on ACCOUNT_ID=BUYER_ID
join KYLIN_COUNTRY on ACCOUNT_COUNTRY=COUNTRY
group by YEAR_BEG_DT,
MONTH_BEG_DT, WEEK_BEG_DT, META_CATEG_NAME, CATEG_LVL2_NAME,
CATEG_LVL3_NAME, OPS_REGION, NAME
-</code><br />
-## Experience All Features in Apache Superset with Apache Kylin</p>
+</code></pre></div></div>
+<h2
id="experience-all-features-in-apache-superset-with-apache-kylin">Experience
All Features in Apache Superset with Apache Kylin</h2>
<p>Most of the common reporting features are available in Apache
Superset. Now let's see how we can use those features to analyze data from
Apache Kylin.</p>
@@ -890,13 +877,14 @@ group by YEAR_BEG_DT, MONTH_BEG_DTï¼
Figure 13 Sort by</p>
<h3 id="filtering">Filtering</h3>
-<p>There are multiple ways you may filter data from Apache Kylin.<br
/>
-1. Date Filter<br />
+<p>There are multiple ways you may filter data from Apache
Kylin.</p>
+<ol>
+ <li>
  <p>Date Filter<br />
You may filter date and time dimensions with the calendar filter. <br />
<img src="/images/Kylin-and-Superset/png/14. time_filter.png"
alt="" /><br />
Figure 14 Filtering time</p>
-
-<ol>
+ </li>
<li>
  <p>Dimension Filter<br />
For other dimensions, you may filter them with SQL conditions such as "in, not
in, equal to, not equal to, greater than or equal to, less than or equal
to, greater than, less than, like".<br />
@@ -964,7 +952,7 @@ Figure 13 Sort by</p>
</ol>
</description>
- <pubDate>Mon, 01 Jan 2018 04:28:00 -0800</pubDate>
+ <pubDate>Mon, 01 Jan 2018 12:28:00 +0000</pubDate>
<link>http://kylin.apache.org/blog/2018/01/01/kylin-and-superset/</link>
<guid
isPermaLink="true">http://kylin.apache.org/blog/2018/01/01/kylin-and-superset/</guid>
@@ -993,12 +981,11 @@ Figure 13 Sort by</p>
<h3 id="make-spark-connect-hbase-with-kerberos-enabled">Make
Spark connect to HBase with Kerberos enabled</h3>
<p>If we just want to run Spark Cubing in YARN client mode, we only need to
add three lines of code before new SparkConf() in SparkCubingByLayer:</p>
-<div class="highlighter-rouge"><pre
class="highlight"><code> Configuration configuration
= HBaseConnection.getCurrentHBaseConfiguration();
+<div class="highlighter-rouge"><div
class="highlight"><pre
class="highlight"><code> Configuration configuration
= HBaseConnection.getCurrentHBaseConfiguration();
HConnection connection =
HConnectionManager.createConnection(configuration);
//Obtain an authentication token for the given user and add it to the
user's credentials.
TokenUtil.obtainAndCacheToken(connection,
UserProvider.instantiate(configuration).create(UserGroupInformation.getCurrentUser()));
-</code></pre>
-</div>
+</code></pre></div></div>
<p>As for how to make Spark connect to HBase using Kerberos in YARN cluster
mode, please refer to SPARK-6918, SPARK-12279, and HBASE-17040. The solution
may work, but it is not elegant, so I tried the second solution.</p>
@@ -1039,7 +1026,7 @@ This following picture shows the content
<p>Following is the Spark configuration I used in our environment. It
enables Spark dynamic resource allocation; the goal is to let our users set
fewer Spark configurations.</p>
-<div class="highlighter-rouge"><pre
class="highlight"><code>//running in yarn-cluster mode
+<div class="highlighter-rouge"><div
class="highlight"><pre
class="highlight"><code>//running in yarn-cluster mode
kylin.engine.spark-conf.spark.master=yarn
kylin.engine.spark-conf.spark.submit.deployMode=cluster
@@ -1064,8 +1051,7 @@ kylin.engine.spark-conf.spark.network.ti
kylin.engine.spark-conf.spark.yarn.queue=root.hadoop.test
kylin.engine.spark.rdd-partition-cut-mb=100
-</code></pre>
-</div>
+</code></pre></div></div>
<h3 id="performance-test-of-spark-cubing">Performance test of
Spark Cubing</h3>
@@ -1121,7 +1107,7 @@ kylin.engine.spark.rdd-partition-cut-mb=
<p>Spark Cubing is a great feature of Kylin 2.0; thanks to the Kylin
community. We will apply Spark Cubing to real scenarios in our company. I
believe Spark Cubing will become more robust and efficient in future
releases.</p>
</description>
- <pubDate>Fri, 21 Jul 2017 15:22:22 -0700</pubDate>
+ <pubDate>Fri, 21 Jul 2017 22:22:22 +0000</pubDate>
<link>http://kylin.apache.org/blog/2017/07/21/Improving-Spark-Cubing/</link>
<guid
isPermaLink="true">http://kylin.apache.org/blog/2017/07/21/Improving-Spark-Cubing/</guid>
@@ -1141,11 +1127,10 @@ kylin.engine.spark.rdd-partition-cut-mb=
<p>In Apache Kylin, we support SQL syntax similar to Apache Hive's,
with an aggregation function called <strong>percentile(&lt;Number
Column&gt;, &lt;Double&gt;)</strong>:</p>
-<div class="highlighter-rouge"><pre
class="highlight"><code><span
class="k">SELECT</span> <span
class="n">seller_id</span><span
class="p">,</span> <span
class="n">percentile</span><span
class="p">(</span><span
class="n">price</span><span
class="p">,</span> <span
class="mi">0</span><span
class="p">.</span><span
class="mi">5</span><span
class="p">)</span>
+<div class="language-sql highlighter-rouge"><div
class="highlight"><pre
class="highlight"><code><span
class="k">SELECT</span> <span
class="n">seller_id</span><span
class="p">,</span> <span
class="n">percentile</span><span
class="p">(</span><span
class="n">price</span><span
class="p">,</span> <span
class="mi">0</span><span
class="p">.</span><span
class="mi">5</span><span
class="p">)</span>
<span class="k">FROM</span> <span
class="n">test_kylin_fact</span>
<span class="k">GROUP</span> <span
class="k">BY</span> <span
class="n">seller_id</span>
-</code></pre>
-</div>
+</code></pre></div></div>
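<p>To illustrate what the query computes, here is a plain-Python sketch of an exact percentile over one group of values. This is only one common definition (linear interpolation between closest ranks), the data is made up, and Kylin's percentile measure is an approximate aggregation, so its results on large data may differ slightly from this exact computation:</p>

```python
def percentile(values, p):
    """Exact percentile via linear interpolation between closest ranks
    (one common definition; Kylin's measure is approximate)."""
    xs = sorted(values)
    k = (len(xs) - 1) * p              # fractional rank of the requested percentile
    lo, hi = int(k), min(int(k) + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)

# Median price for one seller_id group (made-up data):
prices = [12.0, 3.5, 7.0, 9.5, 20.0]
print(percentile(prices, 0.5))   # -> 9.5
```

<p>percentile(price, 0.5) in the SQL above plays the same role per seller_id group, just computed approximately at scale.</p>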
<h3 id="how-to-use">How to use</h3>
<p>If you know little about <em>Cubes</em>, please go to the
<a
href="http://kylin.apache.org/docs20/tutorial/kylin_sample.html">QuickStart</a>
first to learn the basics.</p>
@@ -1162,7 +1147,7 @@ kylin.engine.spark.rdd-partition-cut-mb=
<p><img src="/images/blog/percentile_3.png"
alt="" /></p>
</description>
- <pubDate>Sat, 01 Apr 2017 15:22:22 -0700</pubDate>
+ <pubDate>Sat, 01 Apr 2017 22:22:22 +0000</pubDate>
<link>http://kylin.apache.org/blog/2017/04/01/percentile-measure/</link>
<guid
isPermaLink="true">http://kylin.apache.org/blog/2017/04/01/percentile-measure/</guid>
@@ -1191,23 +1176,23 @@ kylin.engine.spark.rdd-partition-cut-mb=
<li>Tested and passed on the Cloud (<a
href="https://issues.apache.org/jira/browse/KYLIN-2351">KYLIN-2351</a>)</li>
</ul>
-<p>Everyone is welcome to download and test v2.0.0
beta. Your feedback is very important to us; please send mail to <a
href="&#109;&#097;&#105;&#108;&#116;&#111;:&#100;&#101;&#118;&#064;&#107;&#121;&#108;&#105;&#110;&#046;&#097;&#112;&#097;&#099;&#104;&#101;&#046;&#111;&#114;&#103;">&#100;&#101;&#118;&#064;&#107;&#121;&#108;&#105;&#110;&#046;&#097;&#112;&#097;&#099;&#104;&#101;&#046;&#111;&#114;&#103;</a>.</p>
+<p>Everyone is welcome to download and test v2.0.0
beta. Your feedback is very important to us; please send mail to <a
href="mailto:[email protected]">[email protected]</a>.</p>
<hr />
-<h2 id="section">Installation</h2>
+<h2 id="安装">Installation</h2>
<p>For now, v2.0.0 beta cannot be upgraded directly from v1.6.0; a fresh
installation is required. This is because the new version's metadata is not
backward compatible. Fortunately, Cube data is backward compatible, so only a
metadata conversion tool needs to be developed to enable a smooth upgrade in
the near future. We are working on this.</p>
<hr />
-<h2 id="tpc-h-">Run the TPC-H Benchmark</h2>
+<h2 id="运行-tpc-h-基准测试">Run the TPC-H Benchmark</h2>
<p>Detailed steps for running TPC-H on Apache Kylin: <a
href="https://github.com/Kyligence/kylin-tpch">https://github.com/Kyligence/kylin-tpch</a></p>
<hr />
-<h2 id="spark-">Spark Build Engine</h2>
+<h2 id="spark-构建引擎">Spark Build Engine</h2>
<p>Apache Kylin v2.0.0 introduces an all-new build engine based on Apache
Spark, which can replace the original MapReduce build engine. Preliminary
tests show that cube build time can generally be cut to about 50% of what it
was before.</p>
@@ -1217,7 +1202,7 @@ kylin.engine.spark.rdd-partition-cut-mb=
<p><em>Great thanks to every friend who participated and contributed!</em></p>
</description>
- <pubDate>Sat, 25 Feb 2017 12:00:00 -0800</pubDate>
+ <pubDate>Sat, 25 Feb 2017 20:00:00 +0000</pubDate>
<link>http://kylin.apache.org/cn/blog/2017/02/25/v2.0.0-beta-ready/</link>
<guid
isPermaLink="true">http://kylin.apache.org/cn/blog/2017/02/25/v2.0.0-beta-ready/</guid>