Modified: kylin/site/feed.xml
URL: http://svn.apache.org/viewvc/kylin/site/feed.xml?rev=1841530&r1=1841529&r2=1841530&view=diff
==============================================================================
--- kylin/site/feed.xml (original)
+++ kylin/site/feed.xml Fri Sep 21 03:31:15 2018
@@ -19,11 +19,146 @@
  <description>Apache Kylin Home</description>
  <link>http://kylin.apache.org/</link>
  <atom:link href="http://kylin.apache.org/feed.xml" rel="self" type="application/rss+xml"/>
- <pubDate>Wed, 19 Sep 2018 06:59:19 -0700</pubDate>
- <lastBuildDate>Wed, 19 Sep 2018 06:59:19 -0700</lastBuildDate>
+ <pubDate>Thu, 20 Sep 2018 20:21:48 -0700</pubDate>
+ <lastBuildDate>Thu, 20 Sep 2018 20:21:48 -0700</lastBuildDate>
  <generator>Jekyll v2.5.3</generator>
  <item>
+ <title>Apache Kylin v2.5.0 Officially Released</title>
+ <description><p>The Apache Kylin community is pleased to announce the official release of Apache Kylin 2.5.0.</p>
+
+<p>Apache Kylin is an open source distributed analytics engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on extremely large datasets.</p>
+
+<p>This is a new feature release following 2.4.0. It introduces many valuable improvements; for the complete list of changes, see the <a href="https://kylin.apache.org/docs/release_notes.html">release notes</a>. Some of the major improvements are described here:</p>
+
+<h3 id="all-in-spark--cubing-">The all-in-Spark cubing engine</h3>
+<p>Kylin's Spark engine now uses Spark to run all distributed jobs in cube computation, including fetching the distinct values of each dimension, converting cuboid files to HBase HFile, merging segments, merging dictionaries, etc. The default Spark configuration has also been tuned so that users get an out-of-the-box experience. The related development tasks are KYLIN-3427, KYLIN-3441, KYLIN-3442.</p>
+
+<p>Spark job management has also improved: once a Spark job starts to run, you can get the job link on the web console; if you discard the job, Kylin immediately kills the Spark job to release its resources promptly; if Kylin is restarted, it can resume from the previous job instead of submitting a new one.</p>
+
+<h3 id="mysql--kylin-">MySQL as Kylin metadata storage</h3>
+<p>In the past, HBase was the only option for storing Kylin metadata. In some cases HBase is not suitable, for example when multiple HBase clusters are used to give Kylin cross-region high availability: the replicated HBase cluster is read-only, so it cannot serve as the metadata store. We have now introduced the MySQL Metastore to meet this need. This feature is currently in beta. See KYLIN-3488 for more.</p>
+
+<h3 id="hybrid-model-">Hybrid model GUI</h3>
+<p>Hybrid is an advanced model for assembling multiple cubes. It can be used when the schema of a cube has to change. This feature had no GUI in the past, so only a small portion of users knew about it. We have now enabled it in the web interface so that more users can try it.</p>
+
+<h3 id="cube-planner">Cube planner enabled by default</h3>
+<p>The Cube planner can greatly optimize the cube structure and reduce the number of cuboids built, saving computing/storage resources and improving query performance. It was introduced in v2.3 but was not enabled by default. To let more users see and try it, we enable it by default in v2.5. The algorithm automatically optimizes the cuboid set based on data statistics when the first segment is built.</p>
+
+<h3 id="segment-">Improved segment pruning</h3>
+<p>Segment (partition) pruning effectively reduces disk and network I/O and therefore greatly improves query performance. In the past, Kylin pruned segments only by the value of the partition column (partition date column). If a query did not use the partition column as a filter condition, pruning had no effect and all segments were scanned.<br />
+Starting from v2.5, Kylin records the minimum/maximum value of every dimension at the segment level. Before scanning a segment, it compares the query's conditions against the min/max index; if they do not match, the segment is skipped. See KYLIN-3370 for more information.</p>
+
+<h3 id="yarn-">Merge dictionaries on YARN</h3>
+<p>When segments are merged, their dictionaries also need to be merged. In the past, dictionary merging happened in Kylin's JVM, which required a lot of local memory and CPU resources. In extreme cases (if there are several concurrent jobs), it could crash the Kylin process. As a result, some users had to allocate more memory to the Kylin job node or run multiple job nodes to balance the workload.<br />
+Starting from v2.5, Kylin submits this task to Hadoop MapReduce or Spark, which removes this bottleneck. See KYLIN-3471 for more information.</p>
+
+<h3 id="cube-">Improved cube build performance when using the Global Dictionary</h3>
+<p>The Global Dictionary (GD) is required for bitmap exact count distinct. If the count-distinct column has a very high cardinality, the GD can be very large. In the cube building phase, Kylin needs to translate non-integer values to integers through the GD. Although the GD is split into multiple slices that can be loaded into memory separately, the values of the count-distinct column are unordered, so Kylin has to swap slices in and out repeatedly, which makes the build job very slow.<br />
+This enhancement introduces a new step that builds a shrunken dictionary from the Global Dictionary for each data block. Each task then only needs to load the shrunken dictionary, which avoids the frequent swapping in and out. Performance can be 3x faster than before. See KYLIN-3491 for more information.</p>
+
+<h3 id="topn-count-distinct--cube-">Improved cube size estimation for cubes with TOPN, COUNT DISTINCT</h3>
+<p>The cube size is estimated in advance during the build and is used by several subsequent steps, such as deciding the partition number of the MR/Spark job and calculating the HBase region splits. Its accuracy has a large impact on build performance. When there are COUNT DISTINCT or TOPN measures, the estimate may deviate considerably from the real value because their size is flexible. In the past, users needed to tune several parameters to bring the size estimate closer to the actual size, which was somewhat difficult for ordinary users.<br />
+Kylin now adjusts the size estimate automatically based on the collected statistics, which brings the estimate closer to the actual size. See KYLIN-3453 for more information.</p>
+
+<h3 id="hadoop-30hbase-20">Hadoop 3.0/HBase 2.0 support</h3>
+<p>Hadoop 3 and HBase 2 are starting to be adopted by many users. Kylin now provides new binary packages compiled with the new Hadoop and HBase APIs. We have tested them on Hortonworks HDP 3.0 and Cloudera CDH 6.0.</p>
+
+<p><strong>Download</strong></p>
+
+<p>To download the Apache Kylin v2.5.0 source code or binary package, visit the <a href="http://kylin.apache.org/download">download page</a>.</p>
+
+<p><strong>Upgrade</strong></p>
+
+<p>See the <a href="/docs/howto/howto_upgrade.html">upgrade guide</a>.</p>
+
+<p><strong>Feedback</strong></p>
+
+<p>If you run into an issue or have a question, please send mail to the Apache Kylin dev or user mailing list: d...@kylin.apache.org, u...@kylin.apache.org. Before sending, please make sure you have subscribed to the mailing list by sending an email to dev-subscr...@kylin.apache.org or user-subscr...@kylin.apache.org.</p>
+
+<p><em>Many thanks to everyone who contributed to Apache Kylin!</em></p>
+</description>
+ <pubDate>Thu, 20 Sep 2018 13:00:00 -0700</pubDate>
+ <link>http://kylin.apache.org/cn/blog/2018/09/20/release-v2.5.0/</link>
+ <guid isPermaLink="true">http://kylin.apache.org/cn/blog/2018/09/20/release-v2.5.0/</guid>
+
+
+ <category>blog</category>
+
+ </item>
+
+ <item>
+ <title>Apache Kylin v2.5.0 Release Announcement</title>
+ <description><p>The Apache Kylin community is pleased to announce the release of Apache Kylin v2.5.0.</p>
+
+<p>Apache Kylin is an open source Distributed Analytics Engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on
Big Data supporting extremely large datasets.</p>
+
+<p>This is a major release after 2.4.0, including many enhancements. All of the changes can be found in the <a href="https://kylin.apache.org/docs/release_notes.html">release notes</a>. Here we highlight only the major ones:</p>
+
+<h3 id="the-all-in-spark-cubing-engine">The all-in-Spark cubing engine</h3>
+<p>Kylin's Spark engine now runs all distributed jobs in Spark, including fetching distinct dimension values, converting cuboid files to HBase HFile, merging segments, merging dictionaries, etc. The default configurations are tuned so that users get an out-of-the-box experience. The overall performance is close to that of the previous version, but we expect Spark has more room to improve. The related tasks are KYLIN-3427, KYLIN-3441, KYLIN-3442.</p>
+
+<p>There are also improvements in job management. You can now get the job link on the web console once the Spark job starts to run. If you discard the job, Kylin kills the Spark job to release its resources promptly. If Kylin is restarted, it can resume from the previous job instead of submitting a new one.</p>
+<h3 id="mysql-as-kylin-metastore">MySQL as Kylin metastore</h3>
+<p>In the past, HBase was the only option for Kylin metadata. In some cases this is not applicable, for example when using a replicated HBase cluster for Kylin's HA (the replicated HBase is read-only). We now introduce the MySQL metastore to fulfill this need. This function is in beta now. Check KYLIN-3488 for more.</p>
+
+<h3 id="hybrid-model-web-gui">Hybrid model web GUI</h3>
+<p>Hybrid is an advanced model for composing multiple Cubes. It can be used to handle Cube schema changes. This function had no GUI in the past, so only a small portion of Kylin users knew about it.
We have now added a web GUI for it so everyone can try it.</p>
+
+<h3 id="enable-cube-planner-by-default">Enable Cube planner by default</h3>
+<p>The Cube planner can greatly optimize the cube structure, saving computing/storage resources and improving query performance. It was introduced in v2.3 but was disabled by default. To let more users see and try it, we enable it by default in v2.5. The algorithm automatically optimizes the cube based on your data statistics on the first build.</p>
+
+<h3 id="advanced-segment-pruning">Advanced segment pruning</h3>
+<p>Segment (partition) pruning can efficiently reduce disk and network I/O and so greatly improve query performance. In the past, Kylin pruned segments only by the partition column's value. If a query did not have the partition column as a filtering condition, pruning did not work and all segments were scanned.</p>
+
+<p>From v2.5, Kylin records the min/max value for EVERY dimension at the segment level. Before scanning a segment, it compares the query's conditions with the min/max index. If they do not match, the segment is skipped. Check KYLIN-3370 for more.</p>
+
+<h3 id="merge-dictionary-on-yarn">Merge dictionary on YARN</h3>
+
+<p>When segments are merged, their dictionaries also need to be merged. In the past, the merging happened in Kylin's JVM, which took a lot of memory and CPU resources. In extreme cases (if you have a couple of concurrent jobs) it could crash the Kylin process. Because of this, some users had to allocate much more memory to the Kylin job node or run multiple job nodes to balance the workload.</p>
+
+<p>From v2.5, Kylin submits this task to Hadoop MR or Spark, which removes this bottleneck. Check KYLIN-3471 for more.</p>
+
+<h3 id="improve-building-performance-for-reading-global-dictionary">Improve building performance for reading Global Dictionary</h3>
+
+<p>A Global Dictionary (GD) is a must for bitmap count distinct.
The GD can be very large if the column has a very high cardinality. In the cube building phase, Kylin needs to translate non-integer values to integers through the GD. Although the GD has been split into several slices, the values are often scrambled, so Kylin needs to swap the slices in and out of memory repeatedly, which makes the build slow.</p>
+
+<p>The enhancement introduces a new step that builds a shrunken dictionary for each data block. Each task then loads only the shrunken dictionary, which is quite small, so there is no more swapping in and out during the cubing step. As a result, the build can be 3x faster than before. Check KYLIN-3491 for more.</p>
+
+<h3 id="improved-cube-size-estimation-for-topn-count-distinct">Improved cube size estimation for TOPN, COUNT DISTINCT</h3>
+
+<p>Cube size estimation is used in several steps, such as deciding the MR/Spark job partition number and calculating the HBase region number, so it greatly affects build performance. The estimation can be wildly off when there are COUNT DISTINCT or TOPN measures, because their size is flexible. An incorrect estimation may cause too many data partitions and then too many tasks. In the past, users needed to tune several parameters to bring the size estimation closer to the real size, which was hard to do.</p>
+
+<p>Now Kylin corrects the size estimation automatically based on the collected data statistics. This makes the estimation much closer to the real size than before. Check KYLIN-3453 for more.</p>
+
+<h3 id="hadoop-30hbase-20-support">Hadoop 3.0/HBase 2.0 support</h3>
+
+<p>Hadoop 3 and HBase 2 are starting to be adopted by many users. We now provide new binary packages compiled with the new Hadoop and HBase APIs.
We tested them on Hortonworks HDP 3.0 and Cloudera CDH 6.0.</p>
+
+<p><strong>Download</strong></p>
+
+<p>To download the Apache Kylin v2.5.0 source code or binary package, visit the <a href="http://kylin.apache.org/download">download</a> page.</p>
+
+<p><strong>Upgrade</strong></p>
+
+<p>Follow the <a href="/docs/howto/howto_upgrade.html">upgrade guide</a>.</p>
+
+<p><strong>Feedback</strong></p>
+
+<p>If you run into an issue or have a question, please send mail to the Apache Kylin dev or user mailing list: d...@kylin.apache.org, u...@kylin.apache.org. Before sending, please make sure you have subscribed to the mailing list by sending an email to dev-subscr...@kylin.apache.org or user-subscr...@kylin.apache.org.</p>
+
+<p><em>Great thanks to everyone who contributed!</em></p>
+</description>
+ <pubDate>Thu, 20 Sep 2018 13:00:00 -0700</pubDate>
+ <link>http://kylin.apache.org/blog/2018/09/20/release-v2.5.0/</link>
+ <guid isPermaLink="true">http://kylin.apache.org/blog/2018/09/20/release-v2.5.0/</guid>
+
+
+ <category>blog</category>
+
+ </item>
+
+ <item>
  <title>Use Star Schema Benchmark for Apache Kylin</title>
  <description><h2 id="background">Background</h2>
@@ -825,61 +960,6 @@ kylin.engine.spark.rdd-partition-cut-mb=
  </item>
  <item>
- <title>Apache Kylin v2.0.0 Beta Released</title>
- <description><p>The Apache Kylin community is very pleased to announce that the <a href="http://kylin.apache.org/cn/download/">v2.0.0 beta package</a> is ready for download and testing.</p>
-
-<ul>
- <li>Download link: <a href="http://kylin.apache.org/cn/download/">http://kylin.apache.org/cn/download/</a></li>
- <li>Source code: https://github.com/apache/kylin/tree/kylin-2.0.0-beta</li>
-</ul>
-
-<p>It has been more than two months since the v1.6.0 release. During this time, the whole community has worked together to complete a series of major features, hoping to take Apache Kylin to a new level.</p>
-
-<ul>
- <li>Snowflake model support (<a href="https://issues.apache.org/jira/browse/KYLIN-1875">KYLIN-1875</a>)</li>
- <li>TPC-H query support (<a href="https://issues.apache.org/jira/browse/KYLIN-2467">KYLIN-2467</a>)</li>
- <li>Spark cubing engine (<a href="https://issues.apache.org/jira/browse/KYLIN-2331">KYLIN-2331</a>)</li>
- <li>Job Engine high availability (<a href="https://issues.apache.org/jira/browse/KYLIN-2006">KYLIN-2006</a>)</li>
- <li>Percentile measure (<a href="https://issues.apache.org/jira/browse/KYLIN-2396">KYLIN-2396</a>)</li>
- <li>Tested in the cloud (<a href="https://issues.apache.org/jira/browse/KYLIN-2351">KYLIN-2351</a>)</li>
-</ul>
-
-<p>Everyone is very welcome to download and test v2.0.0 beta. Your feedback is very important to us; please send mail to <a href="&#109;&#097;&#105;&#108;&#116;&#111;:&#100;&#101;&#118;&#064;&#107;&#121;&#108;&#105;&#110;&#046;&#097;&#112;&#097;&#099;&#104;&#101;&#046;&#111;&#114;&#103;">&#100;&#101;&#118;&#064;&#107;&#121;&#108;&#105;&#110;&#046;&#097;&#112;&#097;&#099;&#104;&#101;&#046;&#111;&#114;&#103;</a>.</p>
-
-<hr />
-
-<h2 id="section">Installation</h2>
-
-<p>For now, v2.0.0 beta cannot be upgraded directly from v1.6.0; a fresh installation is required. This is because the new version's metadata is not backward compatible. Fortunately, the Cube data is backward compatible, so only a metadata conversion tool needs to be developed to enable a smooth upgrade in the near future. We are working on this.</p>
-
-<hr />
-
-<h2 id="tpc-h-">Run the TPC-H benchmark</h2>
-
-<p>Detailed steps for running TPC-H on Apache Kylin: <a href="https://github.com/Kyligence/kylin-tpch">https://github.com/Kyligence/kylin-tpch</a></p>
-
-<hr />
-
-<h2 id="spark-">Spark cubing engine</h2>
-
-<p>Apache Kylin v2.0.0 introduces a brand-new cubing engine based on Apache Spark. It can be used to replace the original MapReduce cubing engine. Initial tests show that the Cube build time can generally be cut to about 50% of the original.</p>
-
-<p>To enable the Spark cubing engine, refer to <a href="/docs16/tutorial/cube_spark.html">this document</a>.</p>
-
-<hr />
-
-<p><em>Thanks to everyone for your participation and contributions!</em></p>
-</description>
- <pubDate>Sat, 25 Feb 2017 12:00:00 -0800</pubDate>
- <link>http://kylin.apache.org/cn/blog/2017/02/25/v2.0.0-beta-ready/</link>
- <guid isPermaLink="true">http://kylin.apache.org/cn/blog/2017/02/25/v2.0.0-beta-ready/</guid>
-
-
- <category>blog</category>
-
- </item>
-
- <item>
  <title>Apache Kylin v2.0.0 Beta Announcement</title>
  <description><p>The Apache Kylin community is pleased to announce the <a
href="http://kylin.apache.org/download/">v2.0.0 beta package</a> is ready for download and test.</p>
@@ -935,111 +1015,54 @@ kylin.engine.spark.rdd-partition-cut-mb=
  </item>
  <item>
- <title>By-layer Spark Cubing</title>
- <description><p>Before v2.0, Apache Kylin used Hadoop MapReduce as the framework to build Cubes over huge datasets. The MapReduce framework is simple and stable and fulfills Kylin's needs very well, except for performance. To get better performance, we introduced the "fast cubing" algorithm in Kylin v1.5, which tries to do as much aggregation as possible on the map side in memory, so as to avoid disk and network I/O; but not all data models benefit from it, and it still runs on MR, which means on-disk sorting and shuffling.</p>
-
-<p>Now Spark comes in. Apache Spark is an open-source cluster-computing framework that provides programmers with an application programming interface centered on a data structure called the RDD; it runs in memory on the cluster, which makes repeated access to the same data much faster. Spark provides flexible and fancy APIs, and you are not tied to Hadoop's MapReduce two-stage paradigm.</p>
-
-<p>Before introducing how to calculate a Cube with Spark, let's see how Kylin does it with MR. Figure 1 illustrates how a 4-dimension Cube gets calculated with the classic "by-layer" algorithm: the first round of MR aggregates the base (4-D) cuboid from the source data; the second round aggregates on the base cuboid to get the 3-D cuboids; with N+1 rounds of MR, all layers' cuboids get calculated.</p>
-
-<p><img src="/images/blog/spark-mr-layer.png" alt="MapReduce Cubing by Layer" /></p>
-
-<p>The "by-layer" Cubing divides a big task into a couple of steps, and each step is based on the previous step's output, so it can reuse the previous calculation and also avoid calculating from the very beginning when there is a failure in between. This makes it a reliable algorithm.
When moving to Spark, we decided to keep this algorithm, which is why we call this feature "By-layer Spark Cubing".</p>
-
-<p>As we know, the RDD (Resilient Distributed Dataset) is a basic concept in Spark. A collection of N-dimension cuboids can be well described as an RDD, so an N-dimension Cube will have N+1 RDDs. These RDDs have a parent/child relationship, as the parent can be used to generate the children. With the parent RDD cached in memory, generating the child RDD can be much more efficient than reading from disk. Figure 2 describes this process.</p>
-
-<p><img src="/images/blog/spark-cubing-layer.png" alt="Spark Cubing by Layer" /></p>
-
-<p>Figure 3 is the DAG of Cubing in Spark; it illustrates the process in detail. In "Stage 5", Kylin uses a HiveContext to read the intermediate Hive table and then does a "map" operation, which is a one-to-one map, to encode the original values into K-V bytes. On completion, Kylin gets an intermediate encoded RDD. In "Stage 6", the intermediate RDD is aggregated with a "reduceByKey" operation to get RDD-1, which is the base cuboid. Next, Kylin does a "flatMap" (one-to-many map) on RDD-1, because the base cuboid has N child cuboids. And so on, until all levels' RDDs get calculated. These RDDs are persisted to the distributed file system on completion, but are also cached in memory for the next level's calculation.
When a child has been generated, the parent is removed from the cache.</p>
-
-<p><img src="/images/blog/spark-dag.png" alt="DAG of Spark Cubing" /></p>
-
-<p>We did a test to see how much performance improvement can be gained from Spark:</p>
-
-<p>Environment</p>
-
-<ul>
- <li>4-node Hadoop cluster; each node has 28 GB RAM and 12 cores;</li>
- <li>YARN has 48 GB RAM and 30 cores in total;</li>
- <li>CDH 5.8, Apache Kylin 2.0 beta.</li>
-</ul>
-
-<p>Spark</p>
-
-<ul>
- <li>Spark 1.6.3 on YARN</li>
- <li>6 executors, each with 4 cores and 4 GB + 1 GB (overhead) memory</li>
-</ul>
-
-<p>Test Data</p>
+ <title>Apache Kylin v2.0.0 Beta Released</title>
+ <description><p>The Apache Kylin community is very pleased to announce that the <a href="http://kylin.apache.org/cn/download/">v2.0.0 beta package</a> is ready for download and testing.</p>
 <ul>
- <li>Airline data, 160 million rows in total</li>
- <li>Cube: 10 dimensions, 5 measures (SUM)</li>
+ <li>Download link: <a href="http://kylin.apache.org/cn/download/">http://kylin.apache.org/cn/download/</a></li>
+ <li>Source code: https://github.com/apache/kylin/tree/kylin-2.0.0-beta</li>
 </ul>
-<p>Test Scenarios</p>
+<p>It has been more than two months since the v1.6.0 release. During this time, the whole community has worked together to complete a series of major features, hoping to take Apache Kylin to a new level.</p>
 <ul>
- <li>Build the cube at different source data levels: 3 million, 50 million and 160 million source rows; compare the build time with MapReduce (by layer) and Spark.
No compression is enabled.<br />
-The time covers only the cube-building step, not data preparation or subsequent steps.</li>
+ <li>Snowflake model support (<a href="https://issues.apache.org/jira/browse/KYLIN-1875">KYLIN-1875</a>)</li>
+ <li>TPC-H query support (<a href="https://issues.apache.org/jira/browse/KYLIN-2467">KYLIN-2467</a>)</li>
+ <li>Spark cubing engine (<a href="https://issues.apache.org/jira/browse/KYLIN-2331">KYLIN-2331</a>)</li>
+ <li>Job Engine high availability (<a href="https://issues.apache.org/jira/browse/KYLIN-2006">KYLIN-2006</a>)</li>
+ <li>Percentile measure (<a href="https://issues.apache.org/jira/browse/KYLIN-2396">KYLIN-2396</a>)</li>
+ <li>Tested in the cloud (<a href="https://issues.apache.org/jira/browse/KYLIN-2351">KYLIN-2351</a>)</li>
 </ul>
-<p><img src="/images/blog/spark-mr-performance.png" alt="Spark vs MR performance" /></p>
-
-<p>Spark is faster than MR in all three scenarios, and overall it cuts the cubing time roughly in half.</p>
+<p>Everyone is very welcome to download and test v2.0.0 beta. Your feedback is very important to us; please send mail to <a href="&#109;&#097;&#105;&#108;&#116;&#111;:&#100;&#101;&#118;&#064;&#107;&#121;&#108;&#105;&#110;&#046;&#097;&#112;&#097;&#099;&#104;&#101;&#046;&#111;&#114;&#103;">&#100;&#101;&#118;&#064;&#107;&#121;&#108;&#105;&#110;&#046;&#097;&#112;&#097;&#099;&#104;&#101;&#046;&#111;&#114;&#103;</a>.</p>
-<p>Now you can download a 2.0.0 beta build from Kylin's download page, and then follow this <a href="https://kylin.apache.org/blog/2017/02/25/v2.0.0-beta-ready/">post</a> to build a cube with the Spark engine.
If you have any comments or input, please discuss them in the community.</p>
+<hr />
-</description>
- <pubDate>Thu, 23 Feb 2017 09:30:00 -0800</pubDate>
- <link>http://kylin.apache.org/blog/2017/02/23/by-layer-spark-cubing/</link>
- <guid isPermaLink="true">http://kylin.apache.org/blog/2017/02/23/by-layer-spark-cubing/</guid>
-
-
- <category>blog</category>
-
- </item>
-
- <item>
- <title>Apache Kylin v1.6.0 Officially Released</title>
- <description><p>The Apache Kylin community is very pleased to announce the official release of Apache Kylin v1.6.0.</p>
+<h2 id="section">Installation</h2>
-<p>Apache Kylin is an open source distributed analytics engine that provides a SQL query interface and multi-dimensional analysis (OLAP) on Hadoop, supporting queries within seconds on extremely large datasets.</p>
+<p>For now, v2.0.0 beta cannot be upgraded directly from v1.6.0; a fresh installation is required. This is because the new version's metadata is not backward compatible. Fortunately, the Cube data is backward compatible, so only a metadata conversion tool needs to be developed to enable a smooth upgrade in the near future. We are working on this.</p>
-<p>Apache Kylin v1.6.0 brings a more reliable and easier-to-manage way to build Cubes directly from Apache Kafka streams, letting users analyze data more naturally in more scenarios and reducing the latency between data being produced and becoming searchable from a day or several hours to a few minutes. Apache Kylin 1.6.0 resolved 102 issues, including bugs, improvements, and new features; see the <a href="https://kylin.apache.org/docs16/release_notes.html">release notes</a> for details.</p>
+<hr />
-<h2 id="section">Major changes</h2>
+<h2 id="tpc-h-">Run the TPC-H benchmark</h2>
-<ul>
- <li>Scalable streaming Cube build <a href="https://issues.apache.org/jira/browse/KYLIN-1726">KYLIN-1726</a></li>
- <li>TopN performance enhancement <a href="https://issues.apache.org/jira/browse/KYLIN-1917">KYLIN-1917</a></li>
- <li>Support for embedded-format JSON messages from Kafka <a href="https://issues.apache.org/jira/browse/KYLIN-1919">KYLIN-1919</a></li>
- <li>Reliable synchronization of Hive table schema changes <a href="https://issues.apache.org/jira/browse/KYLIN-2012">KYLIN-2012</a></li>
- <li>Support for more timestamp formats in Kafka messages <a href="https://issues.apache.org/jira/browse/KYLIN-2054">KYLIN-2054</a></li>
- <li>Added Boolean encoding <a href="https://issues.apache.org/jira/browse/KYLIN-2055">KYLIN-2055</a></li>
- <li>Support for building, merging, and refreshing multiple segments in parallel <a href="https://issues.apache.org/jira/browse/KYLIN-2070">KYLIN-2070</a></li>
- <li>Support for updating streaming table schema and configuration changes <a href="https://issues.apache.org/jira/browse/KYLIN-2082">KYLIN-2082</a></li>
-</ul>
+<p>Detailed steps for running TPC-H on Apache Kylin: <a href="https://github.com/Kyligence/kylin-tpch">https://github.com/Kyligence/kylin-tpch</a></p>
-<p>To download the Apache Kylin v1.6.0 source code and binary packages, visit the <a href="http://kylin.apache.org/cn/download/">download</a> page.</p>
+<hr />
-<p><strong>Upgrade</strong></p>
+<h2 id="spark-">Spark cubing engine</h2>
-<p>See the <a href="/docs16/howto/howto_upgrade.html">upgrade guide</a>.</p>
+<p>Apache Kylin v2.0.0 introduces a brand-new cubing engine based on Apache Spark. It can be used to replace the original MapReduce cubing engine. Initial tests show that the Cube build time can generally be cut to about 50% of the original.</p>
-<p><strong>Support</strong></p>
+<p>To enable the Spark cubing engine, refer to <a href="/docs16/tutorial/cube_spark.html">this document</a>.</p>
-<p>If you have any problems during upgrade or use, please:<br />
-submit them to Kylin's JIRA: <a href="https://issues.apache.org/jira/browse/KYLIN/">https://issues.apache.org/jira/browse/KYLIN/</a><br />
-or<br />
-send mail to the Apache Kylin mailing list: <a href="&#109;&#097;&#105;&#108;&#116;&#111;:&#100;&#101;&#118;&#064;&#107;&#121;&#108;&#105;&#110;&#046;&#097;&#112;&#097;&#099;&#104;&#101;&#046;&#111;&#114;&#103;">&#100;&#101;&#118;&#064;&#107;&#121;&#108;&#105;&#110;&#046;&#097;&#112;&#097;&#099;&#104;&#101;&#046;&#111;&#114;&#103;</a></p>
+<hr />
 <p><em>Thanks to everyone for your participation and contributions!</em></p>
 </description>
- <pubDate>Sun, 04 Dec 2016 13:00:00 -0800</pubDate>
- <link>http://kylin.apache.org/cn/blog/2016/12/04/release-v1.6.0/</link>
- <guid isPermaLink="true">http://kylin.apache.org/cn/blog/2016/12/04/release-v1.6.0/</guid>
+ <pubDate>Sat, 25 Feb 2017 12:00:00 -0800</pubDate>
+ <link>http://kylin.apache.org/cn/blog/2017/02/25/v2.0.0-beta-ready/</link>
+ <guid isPermaLink="true">http://kylin.apache.org/cn/blog/2017/02/25/v2.0.0-beta-ready/</guid>
  <category>blog</category>