Author: nju_yaho Date: Sun Jan 13 05:04:15 2019 New Revision: 1851193 URL: http://svn.apache.org/viewvc?rev=1851193&view=rev Log: update download documents
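The download pages updated by this commit link an `.asc` signature and a `.sha256` checksum next to each artifact. As a minimal sketch of the verification the pages point to (the Kylin file names are taken from the page; the dummy payload below stands in for a real download so the checksum step is runnable offline, and the exact layout of a release's `.sha256` file may vary):

```shell
# Signature check (needs the real artifact plus network access):
#   curl -O https://www.apache.org/dist/kylin/KEYS
#   gpg --import KEYS
#   gpg --verify apache-kylin-2.6.0-bin-hbase1x.tar.gz.asc \
#               apache-kylin-2.6.0-bin-hbase1x.tar.gz

# Checksum check: a "<digest>  <filename>" line is what
# `sha256sum -c` consumes directly. A dummy file stands in here:
printf 'dummy release payload' > artifact.tar.gz
sha256sum artifact.tar.gz > artifact.tar.gz.sha256
sha256sum -c artifact.tar.gz.sha256   # prints "artifact.tar.gz: OK"
```

If a release's `.sha256` file carries only the bare digest, compare it by hand against the output of `sha256sum <file>` instead of using `-c`.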
Modified: kylin/site/cn/download/index.html kylin/site/download/index.html kylin/site/feed.xml Modified: kylin/site/cn/download/index.html URL: http://svn.apache.org/viewvc/kylin/site/cn/download/index.html?rev=1851193&r1=1851192&r2=1851193&view=diff ============================================================================== --- kylin/site/cn/download/index.html (original) +++ kylin/site/cn/download/index.html Sun Jan 13 05:04:15 2019 @@ -174,6 +174,19 @@ var _hmt = _hmt || []; </header> <p>You can verify the validity of the downloaded files by following these <a href="https://www.apache.org/info/verification.html">procedures</a> and using these <a href="https://www.apache.org/dist/kylin/KEYS">KEYS</a>.</p> +<h4 id="v260">v2.6.0</h4> +<ul> + <li>This is a major release after 2.5, with 94 bug fixes and various improvements. For details please check the release notes.</li> + <li><a href="/docs/release_notes.html">Release notes</a> and <a href="/docs/howto/howto_upgrade.html">upgrade guide</a></li> + <li>Source download: <a href="https://www.apache.org/dyn/closer.cgi/kylin/apache-kylin-2.6.0/apache-kylin-2.6.0-source-release.zip">apache-kylin-2.6.0-source-release.zip</a> [<a href="https://www.apache.org/dist/kylin/apache-kylin-2.6.0/apache-kylin-2.6.0-source-release.zip.asc">asc</a>] [<a href="https://www.apache.org/dist/kylin/apache-kylin-2.6.0/apache-kylin-2.6.0-source-release.zip.sha256">sha256</a>]</li> + <li>Binary package download: + <ul> + <li>for HBase 1.x (includes HDP 2.3+, AWS EMR 5.0+, Azure HDInsight 3.4 - 3.6) - <a href="https://www.apache.org/dyn/closer.cgi/kylin/apache-kylin-2.6.0/apache-kylin-2.6.0-bin-hbase1x.tar.gz">apache-kylin-2.6.0-bin-hbase1x.tar.gz</a> [<a href="https://www.apache.org/dist/kylin/apache-kylin-2.6.0/apache-kylin-2.6.0-bin-hbase1x.tar.gz.asc">asc</a>] [<a href="https://www.apache.org/dist/kylin/apache-kylin-2.6.0/apache-kylin-2.6.0-bin-hbase1x.tar.gz.sha256">sha256</a>]</li> + <li>for CDH 5.7+ - <a href="https://www.apache.org/dyn/closer.cgi/kylin/apache-kylin-2.6.0/apache-kylin-2.6.0-bin-cdh57.tar.gz">apache-kylin-2.6.0-bin-cdh57.tar.gz</a> [<a href="https://www.apache.org/dist/kylin/apache-kylin-2.6.0/apache-kylin-2.6.0-bin-cdh57.tar.gz.asc">asc</a>] [<a href="https://www.apache.org/dist/kylin/apache-kylin-2.6.0/apache-kylin-2.6.0-bin-cdh57.tar.gz.sha256">sha256</a>]</li> + </ul> + </li> +</ul> + <h4 id="v252">v2.5.2</h4> <ul> <li>This is a bugfix release after 2.5.1, with 12 fixes and various improvements. For details please check the release notes.</li> @@ -193,18 +206,18 @@ var _hmt = _hmt || []; </li> </ul> -<h4 id="jdbc-">JDBC Driver</h4> +<h4 id="jdbc-驱动程序">JDBC Driver</h4> <p>The Kylin JDBC driver, <code class="highlighter-rouge">kylin-jdbc-<version>.jar</code>, is in the "lib" directory of the Kylin binary package.</p> -<h4 id="odbc-">ODBC Driver</h4> +<h4 id="odbc-驱动程序">ODBC Driver</h4> <ul> <li><a href="http://kylin.apache.org/download/KylinODBCDriver-2.1.0.zip">Kylin ODBC Driver v2.1.0</a> (compatible with all Kylin versions)</li> </ul> <p>Note: the Kylin ODBC driver depends on the <a href="http://www.microsoft.com/en-us/download/details.aspx?id=30679">Microsoft Visual C++ 2012 Redistributable</a>.</p> -<h4 id="section">Previous Versions</h4> +<h4 id="以前的版本">Previous Versions</h4> <p>Older versions of Apache Kylin can be downloaded from <a href="https://archive.apache.org/dist/kylin/">https://archive.apache.org/dist/kylin/</a>.</p> </div> Modified: kylin/site/download/index.html URL: http://svn.apache.org/viewvc/kylin/site/download/index.html?rev=1851193&r1=1851192&r2=1851193&view=diff ============================================================================== --- kylin/site/download/index.html (original) +++ kylin/site/download/index.html Sun Jan 13 05:04:15 2019 @@ -6033,6 +6033,19 @@ var _hmt = _hmt || []; </header> <p>You can verify the download by following these <a href="https://www.apache.org/info/verification.html">procedures</a> and using these <a href="https://www.apache.org/dist/kylin/KEYS">KEYS</a>.</p> +<h4 id="v260">v2.6.0</h4> +<ul> + <li>This is a major 
release after 2.5, with 94 bug fixes and enhancements. Check the release notes.</li> + <li><a href="/docs/release_notes.html">Release notes</a> and <a href="/docs/howto/howto_upgrade.html">upgrade guide</a></li> + <li>Source download: <a href="https://www.apache.org/dyn/closer.cgi/kylin/apache-kylin-2.6.0/apache-kylin-2.6.0-source-release.zip">apache-kylin-2.6.0-source-release.zip</a> [<a href="https://www.apache.org/dist/kylin/apache-kylin-2.6.0/apache-kylin-2.6.0-source-release.zip.asc">asc</a>] [<a href="https://www.apache.org/dist/kylin/apache-kylin-2.6.0/apache-kylin-2.6.0-source-release.zip.sha256">sha256</a>]</li> + <li>Binary download: + <ul> + <li>for HBase 1.x (includes HDP 2.3+, AWS EMR 5.0+, Azure HDInsight 3.4 - 3.6) - <a href="https://www.apache.org/dyn/closer.cgi/kylin/apache-kylin-2.6.0/apache-kylin-2.6.0-bin-hbase1x.tar.gz">apache-kylin-2.6.0-bin-hbase1x.tar.gz</a> [<a href="https://www.apache.org/dist/kylin/apache-kylin-2.6.0/apache-kylin-2.6.0-bin-hbase1x.tar.gz.asc">asc</a>] [<a href="https://www.apache.org/dist/kylin/apache-kylin-2.6.0/apache-kylin-2.6.0-bin-hbase1x.tar.gz.sha256">sha256</a>]</li> + <li>for CDH 5.7+ - <a href="https://www.apache.org/dyn/closer.cgi/kylin/apache-kylin-2.6.0/apache-kylin-2.6.0-bin-cdh57.tar.gz">apache-kylin-2.6.0-bin-cdh57.tar.gz</a> [<a href="https://www.apache.org/dist/kylin/apache-kylin-2.6.0/apache-kylin-2.6.0-bin-cdh57.tar.gz.asc">asc</a>] [<a href="https://www.apache.org/dist/kylin/apache-kylin-2.6.0/apache-kylin-2.6.0-bin-cdh57.tar.gz.sha256">sha256</a>]</li> + </ul> + </li> +</ul> + <h4 id="v252">v2.5.2</h4> <ul> <li>This is a bugfix release after 2.5.1, with 12 bug fixes and enhancements. 
Check the release notes.</li> Modified: kylin/site/feed.xml URL: http://svn.apache.org/viewvc/kylin/site/feed.xml?rev=1851193&r1=1851192&r2=1851193&view=diff ============================================================================== --- kylin/site/feed.xml (original) +++ kylin/site/feed.xml Sun Jan 13 05:04:15 2019 @@ -19,8 +19,8 @@ <description>Apache Kylin Home</description> <link>http://kylin.apache.org/</link> <atom:link href="http://kylin.apache.org/feed.xml" rel="self" type="application/rss+xml"/> - <pubDate>Thu, 10 Jan 2019 05:59:22 -0800</pubDate> - <lastBuildDate>Thu, 10 Jan 2019 05:59:22 -0800</lastBuildDate> + <pubDate>Sat, 12 Jan 2019 20:42:57 -0800</pubDate> + <lastBuildDate>Sat, 12 Jan 2019 20:42:57 -0800</lastBuildDate> <generator>Jekyll v2.5.3</generator> <item> @@ -31,12 +31,12 @@ <p>During the Apache Kylin Meetup in August 2018, the Meituan team shared their Kylin on Druid (KoD) solution. Why did they develop this hybrid system? What's the rationale behind it? This article will answer these questions and help you to understand the differences and the pros and cons of each OLAP engine.</p> -<h2 id="introduction-to-apache-kylin">01 Introduction to Apache Kylin</h2> +<h2 id="01-introduction-to-apache-kylin">01 Introduction to Apache Kylin</h2> <p>Apache Kylin is an open source distributed big data analytics engine. It constructs data models on top of huge datasets, builds pre-calculated Cubes to support multi-dimensional analysis, and provides a SQL query interface and multi-dimensional analysis on top of Hadoop, with general ODBC, JDBC, and RESTful API interfaces. 
Apache Kylin's unique pre-calculation ability enables it to handle extremely large datasets with sub-second query response times.<br /> <img src="/images/blog/Kylin-On-Durid/1 kylin_architecture.png" alt="" /><br /> Graphic 1 Kylin Architecture</p> -<h2 id="apache-kylins-advantage">02 Apache Kylin's Advantage</h2> +<h2 id="02-apache-kylins-advantage">02 Apache Kylin's Advantage</h2> <ol> <li>Mature, Hadoop-based computing engines (MapReduce and Spark) that provide strong pre-calculation capability on super large datasets and can be deployed out of the box on any mainstream Hadoop platform.</li> <li>Support of ANSI SQL that allows users to do data analysis with SQL directly.</li> @@ -48,34 +48,34 @@ Graphic 1 Kylin Architecture</p> <li>Support of both batch loading of super large historical datasets and micro-batches of data streams.</li> </ol> -<h2 id="introduction-to-apache-druid-incubating">03 Introduction to Apache Druid (incubating)</h2> +<h2 id="03-introduction-to-apache-druid-incubating">03 Introduction to Apache Druid (incubating)</h2> <p>Druid was created in 2012. It's an open source distributed data store. Its core design combines the concepts of analytical databases, time-series databases, and search systems, and it can support data collection and analytics on fairly large datasets. 
Druid uses an Apache V2 license and is an Apache incubator project.</p> <p>Druid Architecture<br /> From the perspective of deployment architectures, Druid's processes mostly fall into 3 categories based on their roles.</p> -<h3 id="data-node-slave-node-for-data-ingestion-and-calculation">• Data Node (Slave node for data ingestion and calculation)</h3> +<h3 id="-data-node-slave-node-for-data-ingestion-and-calculation">• Data Node (Slave node for data ingestion and calculation)</h3> <p>The Historical node is in charge of loading segments (committed immutable data) and receiving queries on historical data.<br /> Middle Manager is in charge of data ingestion and committing segments. Each task is done by a separate JVM. <br /> Peon is in charge of completing a single task, which is managed and monitored by the Middle Manager.</p> -<h3 id="query-node">• Query Node</h3> +<h3 id="-query-node">• Query Node</h3> <p>Broker receives query requests, determines on which segment the data resides, and distributes sub-queries and merges query results.</p> -<h3 id="master-node-task-coordinator-and-cluster-manager">• Master Node (Task Coordinator and Cluster Manager)</h3> +<h3 id="-master-node-task-coordinator-and-cluster-manager">• Master Node (Task Coordinator and Cluster Manager)</h3> <p>Coordinator monitors Historical nodes, dispatches segments, and monitors workload.<br /> Overlord monitors Middle Manager, dispatches tasks to Middle Manager, and assists in releasing segments.</p> <h3 id="external-dependency">External Dependency</h3> <p>At the same time, Druid has 3 replaceable external dependencies.</p> -<h3 id="deep-storage-distributed-storage">• Deep Storage (distributed storage)</h3> +<h3 id="-deep-storage-distributed-storage">• Deep Storage (distributed storage)</h3> <p>Druid uses Deep storage to transfer data files between nodes.</p> -<h3 id="metadata-storage">• Metadata Storage</h3> +<h3 id="-metadata-storage">• Metadata Storage</h3> <p>Metadata Storage stores the metadata about 
segment positions and task output.</p> -<h3 id="zookeeper-cluster-management-and-task-coordination">• Zookeeper (cluster management and task coordination)</h3> +<h3 id="-zookeeper-cluster-management-and-task-coordination">• Zookeeper (cluster management and task coordination)</h3> <p>Druid uses Zookeeper (ZK) to ensure consistency of the cluster status.<br /> <img src="/images/blog/Kylin-On-Durid/2 druid_architecture.png" alt="" /><br /> Graphic 2 Druid Architecture</p> @@ -98,7 +98,7 @@ Graphic 4 Druid Schema</p> <li>Separation of cold/hot data.</li> </ol> -<h2 id="why-did-meituan-develop-kylin-on-druid">04 Why did Meituan develop Kylin on Druid?</h2> +<h2 id="04-why-did-meituan-develop-kylin-on-druid">04 Why did Meituan develop Kylin on Druid?</h2> <p>Meituan deployed into production an offline OLAP platform with Apache Kylin as its core component in 2015. Since then the platform has served almost all business lines with fast-growing data volume and query executions, and the stress on the cluster has increased accordingly. Over time, the tech team at Meituan has kept exploring better solutions for some of Kylin's challenges. The major one is Apache HBase, the storage that Kylin relies on.</p> <p>Kylin stores its data in HBase by converting the Dimensions and Measures into HBase Keys and Values, respectively. As HBase doesn't support secondary indexes and only has one RowKey index, Kylin's Dimension values will be combined into a fixed sequence to store as the RowKey. In this way, filtering on a Dimension at the front of the sequence will perform better than on those at the back. 
Here's an example:</p> @@ -112,14 +112,13 @@ Graphic 6 Cube2 RowKey Sequence</p> <p><strong>Now let's query each Cube with the same SQL and compare the response time.</strong></p> -<div class="highlighter-rouge"><pre class="highlight"><code>select S_SUPPKEY, C_CUSTKEY, sum(LO_EXTENDEDPRICE) as m1 +<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>select S_SUPPKEY, C_CUSTKEY, sum(LO_EXTENDEDPRICE) as m1 from P_LINEORDER left join SUPPLIER on P_LINEORDER.LO_SUPPKEY = SUPPLIER.S_SUPPKEY left join CUSTOMER on P_LINEORDER.LO_CUSTKEY = CUSTOMER.C_CUSTKEY WHERE (LO_ORDERKEY &gt; 1799905 and LO_ORDERKEY &lt; 1799915) or (LO_ORDERKEY &gt; 1999905 and LO_ORDERKEY &lt; 1999935) GROUP BY S_SUPPKEY, C_CUSTKEY; -</code></pre> -</div> +</code></pre></div></div> <p><strong>Below shows the time consumed and data scanned:</strong><br /> <img src="/images/blog/Kylin-On-Durid/7 cube1_query_log.png" alt="" /><br /> @@ -136,11 +135,12 @@ Graphic 8 Cube1 Query Log</p> <p>Kylin's query performance and user experience can be greatly improved with pure columnar storage and multiple indexes on Dimensions. As analyzed above, Druid happens to meet the requirements of columnar + multi-index. So the Meituan Kylin team decided to try replacing HBase with Druid.</p> -<p>Why not just use Druid then? Meituan's engineers shared their thoughts:<br /> -1. Druid's native query language is in its own specific JSON format, which is not as easy to pick up as SQL. Although the Druid community added SQL support later on, the support is not complete and does not meet the data analysts' requirement of complex SQL queries. On the contrary, Kylin natively supports ANSI SQL, uses Apache Calcite for semantic parsing, and supports SQL features such as join, subquery, window functions, etc. In addition, it provides standard interfaces including ODBC/JDBC, and can directly connect with BI tools such as Tableau, Power BI, Superset, and Redash.</p> - +<p>Why not just use Druid then? 
Meituan's engineers shared their thoughts:</p> <ol> <li> + <p>Druid's native query language is in its own specific JSON format, which is not as easy to pick up as SQL. Although the Druid community added SQL support later on, the support is not complete and does not meet the data analysts' requirement of complex SQL queries. On the contrary, Kylin natively supports ANSI SQL, uses Apache Calcite for semantic parsing, and supports SQL features such as join, subquery, window functions, etc. In addition, it provides standard interfaces including ODBC/JDBC, and can directly connect with BI tools such as Tableau, Power BI, Superset, and Redash.</p> + </li> + <li> <p>Druid can support only single-table query. Multi-table joins are very common in practice, but they cannot be supported by Druid. Kylin, however, supports Star Schema and Snowflake Schema, satisfying multi-table join requirements.</p> </li> <li> @@ -167,9 +167,9 @@ Graphic 8 Cube1 Query Log</p> <p>Therefore, it appears to be a promising OLAP solution to combine Druid's excellent columnar storage with Kylin's usability, compatibility, and completeness. Druid has columnar storage, inverted index, better filtering performance than HBase, native OLAP features, and good secondary aggregation capabilities. Meituan's tech team decided to try replacing HBase with Druid as the storage for Kylin.</p> -<h2 id="section">05</h2> -<p>### Kylin on Druid Design<br /> -At v1.5, Apache Kylin introduced a pluggable architecture and decoupled computing and storage components, which makes the replacement of HBase possible. Here is a brief introduction to the main design concept of Kylin on Druid based on Meituan engineer Kaisen Kang's design doc. 
(Graphics 9 and 10 are from reference [1], and the text is from references [1] and [3])</p> +<h2 id="05">05</h2> +<h3 id="kylin-on-druid-design">Kylin on Druid Design</h3> +<p>At v1.5, Apache Kylin introduced a pluggable architecture and decoupled computing and storage components, which makes the replacement of HBase possible. Here is a brief introduction to the main design concept of Kylin on Druid based on Meituan engineer Kaisen Kang's design doc. (Graphics 9 and 10 are from reference [1], and the text is from references [1] and [3])</p> <h3 id="process-of-building-cube">Process of Building Cube</h3> <ol> @@ -202,12 +202,12 @@ Graphic 10 Process of Querying Cube</ <li>Kylin measure columns map to Druid measure columns.</li> </ol> -<h2 id="summary">06 Summary</h2> +<h2 id="06-summary">06 Summary</h2> <p>In this article, we first analyzed the features and pros/cons of both Kylin and Druid, and the reasons for the poor performance of HBase in Kylin in some cases. Then we explored solutions and found the feasible option of using Druid as the Kylin storage engine. 
Finally, we illustrated the Kylin-on-Druid architecture and the processes developed by Meituan.</p> <p>Stay tuned for our next article about how to use Kylin on Druid, how it performs, and how it can be improved.</p> -<h2 id="reference">07 Reference</h2> +<h2 id="07-reference">07 Reference</h2> <ol> <li> @@ -225,7 +225,7 @@ Graphic 10 Process of Querying Cube</ </ol> </description> - <pubDate>Wed, 12 Dec 2018 09:30:00 -0800</pubDate> + <pubDate>Wed, 12 Dec 2018 17:30:00 +0000</pubDate> <link>http://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/</link> <guid isPermaLink="true">http://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/</guid> @@ -242,37 +242,37 @@ Graphic 10 Process of Querying Cube</ <p>This is a new feature release after 2.4.0. It introduces many valuable improvements; for the complete change list please refer to the <a href="https://kylin.apache.org/docs/release_notes.html">release notes</a>. Here are some of the major improvements:</p> -<h3 id="all-in-spark--cubing-">All-in-Spark Cubing Engine</h3> +<h3 id="all-in-spark-的-cubing-引擎">All-in-Spark Cubing Engine</h3> <p>Kylin's Spark engine now runs all distributed jobs of the cube computation with Spark, including fetching the distinct values of each dimension, converting cuboid files to HBase HFiles, merging segments, merging dictionaries, and so on. The default Spark configuration has also been tuned so that users get an out-of-the-box experience. The related development tasks are KYLIN-3427, KYLIN-3441, KYLIN-3442.</p> <p>Spark job management is also improved: once a Spark job starts running, you can get the job link on the web console; if you discard the job, Kylin will immediately terminate the Spark job to release resources in time; if Kylin is restarted, it can resume from the previous job instead of submitting a new one.</p> -<h3 id="mysql--kylin-">MySQL as Kylin Metadata Storage</h3> +<h3 id="mysql-做-kylin-元数据的存储">MySQL as Kylin Metadata Storage</h3> <p>In the past, HBase was the only choice for Kylin metadata storage. In some cases HBase is not suitable, for example when multiple HBase clusters are used to provide cross-region high availability for Kylin: the replicated HBase cluster is read-only, so it cannot serve as the metadata store. We now introduce a MySQL metastore to meet this need. This feature is currently in the testing phase. See KYLIN-3488 for more details.</p> -<h3 id="hybrid-model-">Hybrid Model Graphical Interface</h3> +<h3 id="hybrid-model-图形界面">Hybrid Model Graphical Interface</h3> <p>Hybrid is an advanced model for assembling multiple cubes. It can be used when a cube's schema has to change. This feature used to have no graphical interface, so only a small number of users knew about it. Now we have enabled it in the web interface so that more users can try it.</p> -<h3 id="cube-planner">Cube Planner Enabled by Default</h3> +<h3 id="默认开启-cube-planner">Cube Planner Enabled by Default</h3> <p>Cube Planner can greatly optimize the cube structure and reduce the number of cuboids built, thereby saving computation/storage resources and improving query performance. It was introduced in v2.3, but was not enabled by default. To let more users see and try it, we enable it by default in v2.5. The algorithm automatically optimizes the cuboid set based on data statistics when the first segment is built.</p> -<h3 id="segment-">Improved Segment Pruning</h3> +<h3 id="改进的-segment-剪枝">Improved Segment Pruning</h3> <p>Segment (partition) pruning can effectively reduce disk and network I/O and therefore greatly improves query performance. In the past, Kylin pruned segments only by the value of the partition date column. If a query did not use the partition column as a filter condition, pruning did not take effect and all segments were scanned.<br /> Starting from v2.5, Kylin records the min/max value of each dimension at the segment level. Before scanning a segment, the query conditions are compared with this min/max index; if they do not match, the segment is skipped. Check KYLIN-3370 for more information.</p> -<h3 id="yarn-">Merging Dictionaries on YARN</h3> +<h3 id="在-yarn-上合并字典">Merging Dictionaries on YARN</h3> <p>When segments are merged, their dictionaries need to be merged too. In the past, dictionary merging happened in Kylin's JVM, which required a lot of local memory and CPU resources. In extreme cases (with several concurrent jobs), it could cause the Kylin process to crash. As a result, some users had to allocate more memory to the Kylin job node, or run multiple job nodes to balance the workload.<br /> Starting from v2.5, Kylin submits this task to Hadoop MapReduce or Spark, which solves this bottleneck. Check KYLIN-3471 for more information.</p> -<h3 id="cube-">Improved Cube Build Performance with the Global Dictionary</h3> +<h3 id="改进使用全局字典的-cube-构建性能">Improved Cube Build Performance with the Global Dictionary</h3> <p>The global dictionary (GD) is a prerequisite for precise distinct counting with bitmaps. If the column being counted has very high cardinality, the GD can be very large. In the cube build phase, Kylin needs to convert non-integer values to integers through the GD. Although the GD is split into multiple slices that can be loaded into memory separately, because the column values are unordered, Kylin had to repeatedly swap slices in and out, which made the build job very slow.<br /> This enhancement introduces a new step that builds a shrunken dictionary from the global dictionary for each data block. Each task then only needs to load the shrunken dictionary, avoiding frequent swapping. Performance can be up to 3 times faster than before. Check KYLIN-3491 for more information.</p> -<h3 id="topn-count-distinct--cube-">Improved Size Estimation for Cubes with TOPN and COUNT DISTINCT</h3> +<h3 id="改进含-topn-count-distinct-的-cube-大小的估计">Improved Size Estimation for Cubes with TOPN and COUNT DISTINCT</h3> <p>The cube size is estimated up front at build time and is used by several subsequent steps, such as deciding the number of partitions of the MR/Spark job and calculating HBase region splits, so its accuracy has a big impact on build performance. When there are COUNT DISTINCT or TOPN measures, whose sizes are flexible, the estimate could deviate greatly from the real value. In the past, users had to tune several parameters to bring the size estimate closer to the actual size, which was a bit difficult for ordinary users.<br /> Now Kylin automatically adjusts the size estimate based on the collected statistics, which brings the estimate closer to the actual size. Check KYLIN-3453 for more information.</p> -<h3 id="hadoop-30hbase-20">Support for Hadoop 3.0/HBase 2.0</h3> +<h3 id="支持hadoop-30hbase-20">Support for Hadoop 3.0/HBase 2.0</h3> <p>Hadoop 3 and HBase 2 are being adopted by more and more users. Kylin now provides new binary packages compiled with the new Hadoop and HBase APIs. We have tested them on Hortonworks HDP 3.0 and Cloudera CDH 6.0.</p> <p><strong>Download</strong></p> @@ -289,7 +289,7 @@ Graphic 10 Process of Querying Cube</ <p><em>Great thanks to everyone who contributed to Apache Kylin!</em></p> </description> - <pubDate>Thu, 20 Sep 2018 13:00:00 -0700</pubDate> + <pubDate>Thu, 20 Sep 2018 20:00:00 +0000</pubDate> <link>http://kylin.apache.org/cn/blog/2018/09/20/release-v2.5.0/</link> <guid isPermaLink="true">http://kylin.apache.org/cn/blog/2018/09/20/release-v2.5.0/</guid> @@ -361,7 +361,7 @@ Graphic 10 Process of Querying Cube</ <p><em>Great thanks to everyone who contributed!</em></p> </description> - <pubDate>Thu, 20 Sep 2018 13:00:00 -0700</pubDate> + <pubDate>Thu, 20 Sep 2018 20:00:00 +0000</pubDate> <link>http://kylin.apache.org/blog/2018/09/20/release-v2.5.0/</link> <guid isPermaLink="true">http://kylin.apache.org/blog/2018/09/20/release-v2.5.0/</guid> @@ -407,22 +407,20 @@ 
GRANT ROLE ssb_write_role TO GROUP ssb_w # Then add kylin_manager_user to kylin_manager_group in OpenLDAP, so kylin_manager_user has access to the ssb database. </pre> <p>2 Assign the HDFS directory /user/kylin_manager_user read and write permissions for the kylin_manager_user user.<br /> -3 Configure the HADOOP_STREAMING_JAR environment variable under the kylin_manager_user user home directory.<br /> -<code class="highlighter-rouge"> -export HADOOP_STREAMING_JAR=/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-streaming.jar -</code></p> +3 Configure the HADOOP_STREAMING_JAR environment variable under the kylin_manager_user user home directory.</p> +<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>export HADOOP_STREAMING_JAR=/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-streaming.jar +</code></pre></div></div> <h2 id="download-the-ssb-tool-and-compile">Download the SSB tool and compile</h2> <p>You can quickly download and compile the ssb test tool by entering the following commands in the Linux terminal.</p> -<div class="highlighter-rouge"><pre class="highlight"><code>git clone https://github.com/jiangshouzhuang/ssb-kylin.git +<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone https://github.com/jiangshouzhuang/ssb-kylin.git cd ssb-kylin cd ssb-benchmark make clean make -</code></pre> -</div> +</code></pre></div></div> <h2 id="adjust-the-ssb-parameters">Adjust the SSB parameters</h2> @@ -430,7 +428,7 @@ make <p>Part of the ssb.conf file is:</p> -<div class="highlighter-rouge"><pre class="highlight"><code> # customer base, default value is 30,000 +<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code> # customer base, default value is 30,000 customer_base = 30000 # part base, default value is 200,000 part_base = 200000 @@ -440,30 +438,27 @@ make date_base = 2556 # lineorder base (purchase record), default value is 6,000,000 lineorder_base = 6000000 
-</code></pre> -</div> +</code></pre></div></div> <p>Of course, the above base parameters can be adjusted according to your actual needs; I use the default parameters.<br /> In the ssb.conf file, there are some parameters as follows.</p> -<div class="highlighter-rouge"><pre class="highlight"><code># manufacturer max. The value range is (1 .. manu_max) +<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># manufacturer max. The value range is (1 .. manu_max) manu_max = 5 # category max. The value range is (1 .. cat_max) cat_max = 5 # brand max. The value range is (1 .. brand_max) brand_max = 40 -</code></pre> -</div> +</code></pre></div></div> <p><strong>The explanation is as follows:</strong> <br /> manu_max, cat_max and brand_max are used to define the hierarchical scale. For example, manu_max=10, cat_max=10, and brand_max=10 refer to a total of 10 manufacturers, where each manufacturer has a maximum of 10 part categories, and each category has up to 10 brands. Therefore, the cardinality of manufacturer is 10, the cardinality of category is 100, and the cardinality of brand is 1000.</p> -<div class="highlighter-rouge"><pre class="highlight"><code># customer: num of cities per country, default value is 100 +<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># customer: num of cities per country, default value is 100 cust_city_max = 9 # supplier: num of cities per country, default value is 100 supp_city_max = 9 -</code></pre> -</div> +</code></pre></div></div> <p><strong>The explanation is as follows:</strong> <br /> cust_city_max and supp_city_max are used to define the number of cities per country in the customer and supplier tables. 
If the total number of countries is 30, and cust_city_max=100, supp_city_max=10, then the customer table will have 3000 different cities, and the supplier table will have 300 different cities.</p> @@ -494,19 +489,17 @@ ${KYLIN_INSTALL_USER_PASSWD} -d org.apac <p>If your CDH or other big data platform is not using beeline but hive cli, please modify it yourself.<br /> Once everything is ready, we start running the program and generate test data:</p> -<div class="highlighter-rouge"><pre class="highlight"><code>cd ssb-kylin +<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd ssb-kylin bin/run.sh --scale 20 -</code></pre> -</div> +</code></pre></div></div> <p>We set the scale to 20; the program will run for a while, and the lineorder table ends up with more than 100 million rows. After the program is executed, we look at the tables in the hive database and their row counts:</p> -<div class="highlighter-rouge"><pre class="highlight"><code>use ssb; +<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use ssb; show tables; select count(1) from lineorder; select count(1) from p_lineorder; -</code></pre> -</div> +</code></pre></div></div> <p><img src="/images/blog/2.1 generated tables.png" alt="" /></p> @@ -519,10 +512,9 @@ select count(1) from p_lineorder; <p>The ssb-kylin project has helped us build the project, model, and cube in advance. Just import it into Kylin directly, like the learn_kylin example. 
Cube Metadata's directory is cubemeta; because our Kylin integrates OpenLDAP, there is no ADMIN user, so the owner parameter in cubemeta/cube/ssb.json is set to null.<br /> Execute the following command to import cubemeta:</p> -<div class="highlighter-rouge"><pre class="highlight"><code>cd ssb-kylin +<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd ssb-kylin $KYLIN_HOME/bin/metastore.sh restore cubemeta -</code></pre> -</div> +</code></pre></div></div> <p>Then log in to Kylin and execute the Reload Metadata operation. This creates the new project, model and cube in Kylin. Before building the cube, first Disable, then Purge, to delete old temporary files.</p> @@ -532,19 +524,17 @@ $KYLIN_HOME/bin/metastore.sh restore cub <p>Here I test the performance of building the Cube with Spark again: disable the previously created Cube, and then Purge it. Since the Cube has been purged, the now-useless HBase tables and HDFS files need to be deleted. Here, we manually clean up the junk files. First execute the following command:</p> -<div class="highlighter-rouge"><pre class="highlight"><code>${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.StorageCleanupJob --delete false -</code></pre> -</div> +<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.StorageCleanupJob --delete false +</code></pre></div></div> <p>Then check whether the listed HBase tables and HDFS files are indeed useless. After confirming this, perform the delete operation:</p> -<div class="highlighter-rouge"><pre class="highlight"><code>${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.StorageCleanupJob --delete true -</code></pre> -</div> +<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.StorageCleanupJob --delete true +</code></pre></div></div> <p>When using Spark to build a cube, it consumes a lot of memory. 
After all, using memory resources improves the speed of cube building. Here I will list some of the Spark parameters in the kylin.properties configuration file:</p> -<div class="highlighter-rouge"><pre class="highlight"><code>kylin.engine.spark-conf.spark.master=yarn +<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kylin.engine.spark-conf.spark.master=yarn kylin.engine.spark-conf.spark.submit.deployMode=cluster kylin.engine.spark-conf.spark.yarn.queue=root.kylin_manager_group # config Dynamic resource allocation @@ -560,8 +550,7 @@ kylin.engine.spark-conf.spark.driver.mem kylin.engine.spark-conf.spark.executor.memory=4G kylin.engine.spark-conf.spark.executor.cores=1 kylin.engine.spark-conf.spark.network.timeout=600 -</code></pre> -</div> +</code></pre></div></div> <p>The above parameters can meet most of the requirements, so users basically do not need extra configuration when designing the Cube. Of course, if the situation is special, you can still set Spark-related tuning parameters at the Cube level.</p> @@ -599,7 +588,7 @@ The query result of Scale=10 is as follo </ol> </description> - <pubDate>Mon, 16 Jul 2018 05:28:00 -0700</pubDate> + <pubDate>Mon, 16 Jul 2018 12:28:00 +0000</pubDate> <link>http://kylin.apache.org/blog/2018/07/16/Star-Schema-Benchmark-on-Apache-Kylin/</link> <guid isPermaLink="true">http://kylin.apache.org/blog/2018/07/16/Star-Schema-Benchmark-on-Apache-Kylin/</guid> @@ -649,7 +638,7 @@ The query result of Scale=10 is as follo <p>We wish you a good time with Redash-Kylin!</p> </description> - <pubDate>Tue, 08 May 2018 13:00:00 -0700</pubDate> + <pubDate>Tue, 08 May 2018 20:00:00 +0000</pubDate> <link>http://kylin.apache.org/blog/2018/05/08/redash-kylin-plugin-strikingly/</link> <guid isPermaLink="true">http://kylin.apache.org/blog/2018/05/08/redash-kylin-plugin-strikingly/</guid> @@ -691,11 +680,11 @@ The query result of Scale=10 is as follo <p>For any issue or question,<br /> open a JIRA for the Apache Kylin project: 
<a href="https://issues.apache.org/jira/browse/KYLIN/">https://issues.apache.org/jira/browse/KYLIN/</a><br /> or<br /> -send mail to Apache Kylin dev mailing list: <a href="&#109;&#097;&#105;&#108;&#116;&#111;:&#100;&#101;&#118;&#064;&#107;&#121;&#108;&#105;&#110;&#046;&#097;&#112;&#097;&#099;&#104;&#101;&#046;&#111;&#114;&#103;">&#100;&#101;&#118;&#064;&#107;&#121;&#108;&#105;&#110;&#046;&#097;&#112;&#097;&#099;&#104;&#101;&#046;&#111;&#114;&#103;</a></p> +send mail to Apache Kylin dev mailing list: <a href="mailto:d...@kylin.apache.org">d...@kylin.apache.org</a></p> <p><em>Great thanks to everyone who contributed!</em></p> </description> - <pubDate>Sun, 04 Mar 2018 12:00:00 -0800</pubDate> + <pubDate>Sun, 04 Mar 2018 20:00:00 +0000</pubDate> <link>http://kylin.apache.org/blog/2018/03/04/release-v2.3.0/</link> <guid isPermaLink="true">http://kylin.apache.org/blog/2018/03/04/release-v2.3.0/</guid> @@ -766,15 +755,14 @@ Figure 4: Build Cube in Apache Kylin< <li>Execute SQL in the âInsightâ tab, for example:</li> </ol> -<div class="highlighter-rouge"><pre class="highlight"><code> select part_dtï¼ +<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code> select part_dtï¼ sum(price) as total_selledï¼ count(distinct seller_id) as sellers from kylin_sales group by part_dt order by part_dt -- #This query will hit on the newly built Cube âKylin_sales_cubeâ. 
-</code></pre> -</div> +</code></pre></div></div> <ol> <li>Next, we will install Apache Superset and initialize it.<br /> @@ -782,15 +770,14 @@ Figure 4: Build Cube in Apache Kylin< <li>Install kylinpy</li> </ol> -<div class="highlighter-rouge"><pre class="highlight"><code> $ pip install kylinpy -</code></pre> -</div> +<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code> $ pip install kylinpy +</code></pre></div></div> <ol> <li>Verify your installation; if everything goes well, the Apache Superset daemon should be up and running.</li> </ol> -<div class="highlighter-rouge"><pre class="highlight"><code>$ superset runserver -d +<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ superset runserver -d Starting server with command: gunicorn -w 2 --timeout 60 -b 0.0.0.0:8088 --limit-request-line 0 --limit-request-field_size 0 superset:app @@ -799,18 +786,19 @@ gunicorn -w 2 --timeout 60 -b 0.0.0.0:8 [2018-01-03 15:54:03 +0800] [73673] [INFO] Using worker: sync [2018-01-03 15:54:03 +0800] [73676] [INFO] Booting worker with pid: 73676 [2018-01-03 15:54:03 +0800] [73679] [INFO] Booting worker with pid: 73679 -</code></pre> -</div> +</code></pre></div></div> <h2 id="connect-apache-kylin-from-apachesuperset">Connect Apache Kylin from Apache Superset</h2> -<p>Now everything you need is installed and ready to go. Let’s try to create an Apache Kylin data source in Apache Superset.<br /> -1. Open up http://localhost:8088 in your web browser with the credentials you set during Apache Superset installation.<br /> +<p>Now everything you need is installed and ready to go. Let’s try to create an Apache Kylin data source in Apache Superset.</p> +<ol> + <li> + <p>Open up http://localhost:8088 in your web browser with the credentials you set during Apache Superset installation.<br /> <img src="/images/Kylin-and-Superset/png/5. 
superset_1.png" alt="" /><br /> Figure 5: Apache Superset Login Page</p> - -<ol> - <li>Go to Source -&gt; Datasource to configure a new data source. + </li> + <li> + <p>Go to Source -&gt; Datasource to configure a new data source.</p> <ul> <li>SQLAlchemy URI pattern is : kylin://<username>:<password>@<hostname>:<port>/<project name=""></project></port></hostname></password></username></li> <li>Check âExpose in SQL Labâ if you want to expose this data source in SQL Lab.</li> @@ -856,9 +844,8 @@ Figure 11 Query multiple tables from Apa <img src="/images/Kylin-and-Superset/png/12. SQL_Lab_2.png" alt="" /><br /> Figure 12 Define your query and visualize it immediately</p> -<p>You may copy the entire SQL below to experience how you can query Kylin Cube in SQL Lab. <br /> -<code class="highlighter-rouge"> -select +<p>You may copy the entire SQL below to experience how you can query Kylin Cube in SQL Lab.</p> +<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>select YEAR_BEG_DT, MONTH_BEG_DTï¼ WEEK_BEG_DTï¼ @@ -876,8 +863,8 @@ join KYLIN_CATEGORY_GROUPINGS on SITE_ID join KYLIN_ACCOUNT on ACCOUNT_ID=BUYER_ID join KYLIN_COUNTRY on ACCOUNT_COUNTRY=COUNTRY group by YEAR_BEG_DT, MONTH_BEG_DTï¼WEEK_BEG_DTï¼META_CATEG_NAMEï¼CATEG_LVL2_NAME, CATEG_LVL3_NAME, OPS_REGION, NAME -</code><br /> -## Experience All Features in Apache Superset with Apache Kylin</p> +</code></pre></div></div> +<h2 id="experience-all-features-in-apache-superset-with-apache-kylin">Experience All Features in Apache Superset with Apache Kylin</h2> <p>Most of the common reporting features are available in Apache Superset. Now letâs see how we can use those features to analyze data from Apache Kylin.</p> @@ -890,13 +877,14 @@ group by YEAR_BEG_DT, MONTH_BEG_DTï¼ Figure 13 Sort by</p> <h3 id="filtering">Filtering</h3> -<p>There are multiple ways you may filter data from Apache Kylin.<br /> -1. 
Date Filter<br /> +<p>There are multiple ways you may filter data from Apache Kylin.</p> +<ol> + <li> + <p>Date Filter<br /> You may filter date and time dimensions with the calendar filter. <br /> <img src="/images/Kylin-and-Superset/png/14. time_filter.png" alt="" /><br /> Figure 14 Filtering time</p> - -<ol> + </li> <li> <p>Dimension Filter<br /> For other dimensions, you may filter with SQL conditions such as “in, not in, equal to, not equal to, greater than or equal to, smaller than or equal to, greater than, smaller than, like”.<br /> @@ -964,7 +952,7 @@ Figure 13 Sort by</p> </ol> </description> - <pubDate>Mon, 01 Jan 2018 04:28:00 -0800</pubDate> + <pubDate>Mon, 01 Jan 2018 12:28:00 +0000</pubDate> <link>http://kylin.apache.org/blog/2018/01/01/kylin-and-superset/</link> <guid isPermaLink="true">http://kylin.apache.org/blog/2018/01/01/kylin-and-superset/</guid> @@ -993,12 +981,11 @@ Figure 13 Sort by</p> <h3 id="make-spark-connect-hbase-with-kerberos-enabled">Make Spark connect HBase with Kerberos enabled</h3> <p>If we just want to run Spark Cubing in YARN client mode, we only need to add three lines of code before new SparkConf() in SparkCubingByLayer:</p> -<div class="highlighter-rouge"><pre class="highlight"><code> Configuration configuration = HBaseConnection.getCurrentHBaseConfiguration(); +<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code> Configuration configuration = HBaseConnection.getCurrentHBaseConfiguration(); HConnection connection = HConnectionManager.createConnection(configuration); //Obtain an authentication token for the given user and add it to the user's credentials. TokenUtil.obtainAndCacheToken(connection, UserProvider.instantiate(configuration).create(UserGroupInformation.getCurrentUser())); -</code></pre> -</div> +</code></pre></div></div> <p>As for how to make Spark connect to HBase using Kerberos in YARN cluster mode, please refer to SPARK-6918, SPARK-12279, and HBASE-17040. 
The solution may work, but it is not elegant, so I tried the second solution.</p> @@ -1039,7 +1026,7 @@ This following picture shows the content <p>Following is the Spark configuration I used in our environment. It enables Spark dynamic resource allocation; the goal is to let our users set fewer Spark configurations.</p> -<div class="highlighter-rouge"><pre class="highlight"><code>//running in yarn-cluster mode +<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>//running in yarn-cluster mode kylin.engine.spark-conf.spark.master=yarn kylin.engine.spark-conf.spark.submit.deployMode=cluster @@ -1064,8 +1051,7 @@ kylin.engine.spark-conf.spark.network.ti kylin.engine.spark-conf.spark.yarn.queue=root.hadoop.test kylin.engine.spark.rdd-partition-cut-mb=100 -</code></pre> -</div> +</code></pre></div></div> <h3 id="performance-test-of-spark-cubing">Performance test of Spark Cubing</h3> @@ -1121,7 +1107,7 @@ kylin.engine.spark.rdd-partition-cut-mb= <p>Spark Cubing is a great feature of Kylin 2.0; thanks to the Kylin community. We will apply Spark Cubing in real scenarios in our company. 
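One easy-to-miss prerequisite of the dynamic-allocation setup above: Spark's dynamic resource allocation requires the external shuffle service to be enabled as well. The stand-alone checker below is my own sketch (`parse_props` and `check_dynamic_allocation` are hypothetical helpers, not Kylin code) for catching that mismatch in a kylin.properties-style block:

```python
# Sketch: parse "key=value" lines like the kylin.engine.spark-conf.* block
# and flag dynamic allocation enabled without the external shuffle service.
def parse_props(text: str) -> dict:
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("//"):
            continue  # skip blanks and comment lines
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def check_dynamic_allocation(props: dict) -> list:
    prefix = "kylin.engine.spark-conf."
    warnings = []
    if (props.get(prefix + "spark.dynamicAllocation.enabled") == "true"
            and props.get(prefix + "spark.shuffle.service.enabled") != "true"):
        warnings.append("dynamicAllocation requires spark.shuffle.service.enabled=true")
    return warnings

sample = """
kylin.engine.spark-conf.spark.master=yarn
kylin.engine.spark-conf.spark.dynamicAllocation.enabled=true
"""
print(check_dynamic_allocation(parse_props(sample)))
```

If the check reports a warning, executors will fail to register their shuffle output and builds stall once allocation kicks in.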
I believe Spark Cubing will be more robust and efficient in future releases.</p> </description> - <pubDate>Fri, 21 Jul 2017 15:22:22 -0700</pubDate> + <pubDate>Fri, 21 Jul 2017 22:22:22 +0000</pubDate> <link>http://kylin.apache.org/blog/2017/07/21/Improving-Spark-Cubing/</link> <guid isPermaLink="true">http://kylin.apache.org/blog/2017/07/21/Improving-Spark-Cubing/</guid> @@ -1141,11 +1127,10 @@ kylin.engine.spark.rdd-partition-cut-mb= <p>In Apache Kylin, we support SQL syntax similar to Apache Hive's, with an aggregation function called <strong>percentile(&lt;Number Column&gt;, &lt;Double&gt;)</strong>:</p> -<div class="highlighter-rouge"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">seller_id</span><span class="p">,</span> <span class="n">percentile</span><span class="p">(</span><span class="n">price</span><span class="p">,</span> <span class="mi">0</span><span class="p">.</span><span class="mi">5</span><span class="p">)</span> +<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">seller_id</span><span class="p">,</span> <span class="n">percentile</span><span class="p">(</span><span class="n">price</span><span class="p">,</span> <span class="mi">0</span><span class="p">.</span><span class="mi">5</span><span class="p">)</span> <span class="k">FROM</span> <span class="n">test_kylin_fact</span> <span class="k">GROUP</span> <span class="k">BY</span> <span class="n">seller_id</span> -</code></pre> -</div> +</code></pre></div></div> <h3 id="how-to-use">How to use</h3> <p>If you know little about <em>Cubes</em>, please go to <a href="http://kylin.apache.org/docs20/tutorial/kylin_sample.html">QuickStart</a> first to learn the basics.</p> @@ -1162,7 +1147,7 @@ kylin.engine.spark.rdd-partition-cut-mb= <p><img src="/images/blog/percentile_3.png" alt="" /></p> </description> - <pubDate>Sat, 01 Apr 2017 15:22:22 -0700</pubDate> + <pubDate>Sat, 01 
Apr 2017 22:22:22 +0000</pubDate> <link>http://kylin.apache.org/blog/2017/04/01/percentile-measure/</link> <guid isPermaLink="true">http://kylin.apache.org/blog/2017/04/01/percentile-measure/</guid> @@ -1191,23 +1176,23 @@ kylin.engine.spark.rdd-partition-cut-mb= <li>Passed testing on Cloud (<a href="https://issues.apache.org/jira/browse/KYLIN-2351">KYLIN-2351</a>)</li> </ul> -<p>Everyone is very welcome to download and test v2.0.0 beta. Your feedback is very important to us; please send mail to <a href="&#109;&#097;&#105;&#108;&#116;&#111;:&#100;&#101;&#118;&#064;&#107;&#121;&#108;&#105;&#110;&#046;&#097;&#112;&#097;&#099;&#104;&#101;&#046;&#111;&#114;&#103;">&#100;&#101;&#118;&#064;&#107;&#121;&#108;&#105;&#110;&#046;&#097;&#112;&#097;&#099;&#104;&#101;&#046;&#111;&#114;&#103;</a>.</p> +<p>Everyone is very welcome to download and test v2.0.0 beta. Your feedback is very important to us; please send mail to <a href="mailto:d...@kylin.apache.org">d...@kylin.apache.org</a>.</p> <hr /> -<h2 id="section">Installation</h2> +<h2 id="安装">Installation</h2> <p>For now, v2.0.0 beta cannot be upgraded to directly from v1.6.0; a fresh installation is required. This is because the new version's metadata is not backward compatible. Fortunately, the Cube data is backward compatible, so only a metadata conversion tool needs to be developed to enable a smooth upgrade in the near future. We are working on it.</p> <hr /> -<h2 id="tpc-h-">Run the TPC-H Benchmark</h2> +<h2 id="运行-tpc-h-基准测试">Run the TPC-H Benchmark</h2> <p>Concrete steps for running TPC-H on Apache Kylin: <a href="https://github.com/Kyligence/kylin-tpch">https://github.com/Kyligence/kylin-tpch</a></p> <hr /> -<h2 id="spark-">Spark Build Engine</h2> +<h2 id="spark-构建引擎">Spark Build Engine</h2> <p>Apache Kylin v2.0.0 introduces an all-new build engine based on Apache Spark, which can replace the original MapReduce build engine. Preliminary tests show that Cube build time can generally be cut to about 50% of what it was.</p> @@ -1217,7 +1202,7 @@ kylin.engine.spark.rdd-partition-cut-mb= <p><em>Thanks to every friend for their participation and contribution!</em></p> </description> - <pubDate>Sat, 25 Feb 2017 12:00:00 -0800</pubDate> + <pubDate>Sat, 25 Feb 2017 20:00:00 +0000</pubDate> <link>http://kylin.apache.org/cn/blog/2017/02/25/v2.0.0-beta-ready/</link> <guid 
isPermaLink="true">http://kylin.apache.org/cn/blog/2017/02/25/v2.0.0-beta-ready/</guid>