Modified: kylin/site/feed.xml
URL: 
http://svn.apache.org/viewvc/kylin/site/feed.xml?rev=1841530&r1=1841529&r2=1841530&view=diff
==============================================================================
--- kylin/site/feed.xml (original)
+++ kylin/site/feed.xml Fri Sep 21 03:31:15 2018
@@ -19,11 +19,146 @@
     <description>Apache Kylin Home</description>
     <link>http://kylin.apache.org/</link>
    <atom:link href="http://kylin.apache.org/feed.xml" rel="self" type="application/rss+xml"/>
-    <pubDate>Wed, 19 Sep 2018 06:59:19 -0700</pubDate>
-    <lastBuildDate>Wed, 19 Sep 2018 06:59:19 -0700</lastBuildDate>
+    <pubDate>Thu, 20 Sep 2018 20:21:48 -0700</pubDate>
+    <lastBuildDate>Thu, 20 Sep 2018 20:21:48 -0700</lastBuildDate>
     <generator>Jekyll v2.5.3</generator>
     
       <item>
+        <title>Apache Kylin v2.5.0 Officially Released</title>
+        <description>&lt;p&gt;The Apache Kylin community is pleased to announce the official release of Apache Kylin 2.5.0.&lt;/p&gt;
+
+&lt;p&gt;Apache Kylin is an open source distributed analytics engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) capability on extremely large datasets.&lt;/p&gt;
+
+&lt;p&gt;This is a feature release following 2.4.0 and introduces many valuable improvements. The complete change list can be found in the &lt;a href=&quot;https://kylin.apache.org/docs/release_notes.html&quot;&gt;release notes&lt;/a&gt;; here we highlight the major ones:&lt;/p&gt;
+
+&lt;h3 id=&quot;all-in-spark--cubing-&quot;&gt;The all-in-Spark cubing engine&lt;/h3&gt;
+&lt;p&gt;Kylin’s Spark engine now runs all distributed jobs of the cube build in Spark, including fetching the distinct values of each dimension, converting cuboid files to HBase HFiles, merging segments, merging dictionaries, etc. The default Spark configurations have also been tuned so that users get an out-of-the-box experience. The related development tasks are KYLIN-3427, KYLIN-3441, KYLIN-3442.&lt;/p&gt;
+
+&lt;p&gt;Spark job management has also been improved: once a Spark job starts running, you can get the job link on the web console; if you discard the job, Kylin will terminate the Spark job immediately to release resources in time; if Kylin is restarted, it can resume from the previous job instead of resubmitting a new one.&lt;/p&gt;
+
+&lt;h3 id=&quot;mysql--kylin-&quot;&gt;MySQL as the Kylin metadata store&lt;/h3&gt;
+&lt;p&gt;In the past, HBase was the only option for Kylin’s metadata store. In some cases HBase is not applicable, for example when using multiple HBase clusters to provide cross-region high availability for Kylin: the replicated HBase cluster is read-only, so it cannot serve as the metadata store. Now we introduce the MySQL metastore to fulfill this need. This feature is currently in beta. See KYLIN-3488 for more.&lt;/p&gt;
+
+&lt;h3 id=&quot;hybrid-model-&quot;&gt;Hybrid model graphical interface&lt;/h3&gt;
+&lt;p&gt;Hybrid is an advanced model for assembling multiple cubes. It can be used when a cube’s schema needs to change. This feature had no graphical interface in the past, so only a small portion of users knew about it. Now we have enabled it in the web interface so that more users can try it.&lt;/p&gt;
+
+&lt;h3 id=&quot;cube-planner&quot;&gt;Cube planner enabled by default&lt;/h3&gt;
+&lt;p&gt;The cube planner can greatly optimize the cube structure and reduce the number of cuboids built, saving computing/storage resources and improving query performance. It was introduced in v2.3 but was disabled by default. To let more users see and try it, we enable it by default in v2.5. The algorithm automatically optimizes the cuboid set based on data statistics when the first segment is built.&lt;/p&gt;
+
+&lt;h3 id=&quot;segment-&quot;&gt;Improved segment pruning&lt;/h3&gt;
+&lt;p&gt;Segment (partition) pruning effectively reduces disk and network I/O and therefore greatly improves query performance. In the past, Kylin pruned segments only by the value of the partition date column. If a query did not have the partition column as a filter condition, pruning would not work and all segments would be scanned.&lt;br /&gt;
+Starting from v2.5, Kylin records the min/max value of every dimension at the segment level. Before scanning a segment, it compares the query’s conditions with the min/max index; if they do not match, the segment is skipped. See KYLIN-3370 for more.&lt;/p&gt;
+
+&lt;h3 id=&quot;yarn-&quot;&gt;Merge dictionaries on YARN&lt;/h3&gt;
+&lt;p&gt;When segments are merged, their dictionaries also need to be merged. In the past, dictionary merging happened in Kylin’s JVM, which required a lot of local memory and CPU resources. In extreme cases (with several concurrent jobs), it could crash the Kylin process. As a result, some users had to allocate more memory to the Kylin job node, or run multiple job nodes to balance the workload.&lt;br /&gt;
+Starting from v2.5, Kylin submits this task to Hadoop MapReduce or Spark, which solves this bottleneck. See KYLIN-3471 for more.&lt;/p&gt;
+
+&lt;h3 id=&quot;cube-&quot;&gt;Improved cube build performance with the global dictionary&lt;/h3&gt;
+&lt;p&gt;The Global Dictionary (GD) is required for bitmap-based precise count distinct. If the distinct-count column has very high cardinality, the GD can be very large. In the cube build phase, Kylin needs to translate non-integer values into integers through the GD. Although the GD is split into multiple slices that can be loaded into memory separately, the values of the column are unordered, so Kylin had to repeatedly swap slices in and out, which made the build very slow.&lt;br /&gt;
+This enhancement introduces a new step that builds a shrunken dictionary from the global dictionary for each data block. Each task then only needs to load the shrunken dictionary, avoiding frequent swapping. Performance can be 3x faster than before. See KYLIN-3491 for more.&lt;/p&gt;
+
+&lt;h3 id=&quot;topn-count-distinct--cube-&quot;&gt;Improved cube size estimation for TOPN, COUNT DISTINCT&lt;/h3&gt;
+&lt;p&gt;The cube size is estimated up front during the build and is used by several subsequent steps, such as deciding the number of partitions of the MR/Spark job and calculating the HBase region splits, so its accuracy has a big impact on build performance. When there are COUNT DISTINCT or TOPN measures, whose sizes are flexible, the estimate may deviate a lot from the real value. In the past, users had to tune several parameters to bring the size estimate closer to the actual size, which was difficult for ordinary users.&lt;br /&gt;
+Now Kylin automatically adjusts the size estimate based on the collected statistics, which makes the estimate much closer to the actual size. See KYLIN-3453 for more.&lt;/p&gt;
+
+&lt;h3 id=&quot;hadoop-30hbase-20&quot;&gt;Hadoop 3.0/HBase 2.0 support&lt;/h3&gt;
+&lt;p&gt;Hadoop 3 and HBase 2 are being adopted by many users. Kylin now provides new binary packages compiled with the new Hadoop and HBase APIs. We have tested them on Hortonworks HDP 3.0 and Cloudera CDH 6.0.&lt;/p&gt;
+
+&lt;p&gt;&lt;strong&gt;Download&lt;/strong&gt;&lt;/p&gt;
+
+&lt;p&gt;To download the Apache Kylin v2.5.0 source code or binary package, visit the &lt;a href=&quot;http://kylin.apache.org/download&quot;&gt;download page&lt;/a&gt;.&lt;/p&gt;
+
+&lt;p&gt;&lt;strong&gt;Upgrade&lt;/strong&gt;&lt;/p&gt;
+
+&lt;p&gt;Refer to the &lt;a href=&quot;/docs/howto/howto_upgrade.html&quot;&gt;upgrade guide&lt;/a&gt;.&lt;/p&gt;
+
+&lt;p&gt;&lt;strong&gt;Feedback&lt;/strong&gt;&lt;/p&gt;
+
+&lt;p&gt;If you run into any issues or questions, please send email to the Apache Kylin dev or user mailing list: d...@kylin.apache.org, u...@kylin.apache.org; before sending, please make sure you have subscribed to the mailing list by sending an email to dev-subscr...@kylin.apache.org or user-subscr...@kylin.apache.org.&lt;/p&gt;
+
+&lt;p&gt;&lt;em&gt;Many thanks to everyone who has contributed to Apache Kylin!&lt;/em&gt;&lt;/p&gt;
+</description>
+        <pubDate>Thu, 20 Sep 2018 13:00:00 -0700</pubDate>
+        <link>http://kylin.apache.org/cn/blog/2018/09/20/release-v2.5.0/</link>
+        <guid 
isPermaLink="true">http://kylin.apache.org/cn/blog/2018/09/20/release-v2.5.0/</guid>
+        
+        
+        <category>blog</category>
+        
+      </item>
+    
+      <item>
+        <title>Apache Kylin v2.5.0 Release Announcement</title>
+        <description>&lt;p&gt;The Apache Kylin community is pleased to 
announce the release of Apache Kylin v2.5.0.&lt;/p&gt;
+
+&lt;p&gt;Apache Kylin is an open source Distributed Analytics Engine designed 
to provide SQL interface and multi-dimensional analysis (OLAP) on Big Data 
supporting extremely large datasets.&lt;/p&gt;
+
+&lt;p&gt;This is a major release after 2.4.0, including many enhancements. All 
of the changes can be found in the &lt;a 
href=&quot;https://kylin.apache.org/docs/release_notes.html&quot;&gt;release 
notes&lt;/a&gt;. Here just highlight the major ones:&lt;/p&gt;
+
+&lt;h3 id=&quot;the-all-in-spark-cubing-engine&quot;&gt;The all-in-Spark 
cubing engine&lt;/h3&gt;
+&lt;p&gt;Now Kylin’s Spark engine will run all distributed jobs in Spark, including fetching distinct dimension values, converting cuboid files to HBase HFiles, merging segments, merging dictionaries, etc. The default configurations are tuned so the user gets an out-of-the-box experience. Overall performance is close to the previous version’s, but we believe Spark has more room to improve. The related tasks are KYLIN-3427, KYLIN-3441, KYLIN-3442.&lt;/p&gt;
+
+&lt;p&gt;There are also improvements in job management. Now you can get the job link on the web console once the Spark job starts to run. If you discard the job, Kylin will kill the Spark job to release the resources in time. If Kylin is restarted, it can resume from the previous job instead of resubmitting a new one.&lt;/p&gt;
+
+&lt;h3 id=&quot;mysql-as-kylin-metastore&quot;&gt;MySQL as Kylin metastore&lt;/h3&gt;
+&lt;p&gt;In the past, HBase was the only option for Kylin metadata. In some cases this is not applicable, for example when using a replicated HBase cluster for Kylin’s HA (the replicated HBase is read-only). Now we introduce the MySQL metastore to fulfill such needs. This function is in beta now. Check KYLIN-3488 for more.&lt;/p&gt;
+
+&lt;h3 id=&quot;hybrid-model-web-gui&quot;&gt;Hybrid model web GUI&lt;/h3&gt;
+&lt;p&gt;Hybrid is an advanced model for compositing multiple Cubes. It can be used to handle Cube schema changes. This function had no GUI in the past, so only a small portion of Kylin users knew about it. Now we have added a web GUI for it so everyone can try it.&lt;/p&gt;
+
+&lt;h3 id=&quot;enable-cube-planner-by-default&quot;&gt;Enable Cube planner by 
default&lt;/h3&gt;
+&lt;p&gt;The Cube planner can greatly optimize the cube structure, save computing/storage resources and improve query performance. It was introduced in v2.3 but is disabled by default. In order to let more users see and try it, we enable it by default in v2.5. The algorithm will automatically optimize the cube based on your data statistics on the first build.&lt;/p&gt;
+
+&lt;h3 id=&quot;advanced-segment-pruning&quot;&gt;Advanced segment 
pruning&lt;/h3&gt;
+&lt;p&gt;Segment (partition) pruning can efficiently reduce disk and network I/O, and thus greatly improve query performance. In the past, Kylin only pruned segments by the partition column’s value. If the query didn’t have the partition column as a filtering condition, the pruning wouldn’t work and all segments would be scanned.&lt;/p&gt;
+
+&lt;p&gt;Now from v2.5, Kylin will record the min/max value for EVERY 
dimension at the segment level. Before scanning a segment, it will compare the 
query’s conditions with the min/max index. If not matched, the segment will 
be skipped. Check KYLIN-3370 for more.&lt;/p&gt;
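The min/max pruning described above can be sketched in a few lines (a simplified illustration only, not Kylin's actual implementation; the `Segment` class and its field names are hypothetical — KYLIN-3370 works on encoded dimension values inside the storage engine):

```python
# Sketch: skip a segment when a filter range cannot overlap the
# per-dimension min/max values recorded at build time.
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Segment:
    name: str
    # per-dimension (min, max) values recorded when the segment was built
    minmax: Dict[str, Tuple[int, int]]

def may_contain(seg: Segment, filters: Dict[str, Tuple[int, int]]) -> bool:
    """Return False only when some filter range is disjoint from the segment."""
    for dim, (lo, hi) in filters.items():
        if dim in seg.minmax:
            smin, smax = seg.minmax[dim]
            if hi < smin or lo > smax:
                return False  # disjoint ranges: safe to skip this segment
    return True

def prune(segments: List[Segment], filters: Dict[str, Tuple[int, int]]) -> List[Segment]:
    return [s for s in segments if may_contain(s, filters)]

segs = [
    Segment("2018Q1", {"price": (10, 99), "qty": (1, 5)}),
    Segment("2018Q2", {"price": (100, 500), "qty": (1, 9)}),
]
# query filter: price BETWEEN 120 AND 200 -> only 2018Q2 needs scanning
print([s.name for s in prune(segs, {"price": (120, 200)})])
```

Note the check is conservative: a segment is skipped only when the ranges provably cannot overlap, so pruning never changes query results.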
+
+&lt;h3 id=&quot;merge-dictionary-on-yarn&quot;&gt;Merge dictionary on 
YARN&lt;/h3&gt;
+
+&lt;p&gt;When segments get merged, their dictionaries also need to be merged. In the past, the merging happened in Kylin’s JVM, which takes a lot of memory and CPU resources. In extreme cases (if you have a couple of concurrent jobs) it may crash the Kylin process. Because of this, some users had to allocate much more memory to the Kylin job node, or run multiple job nodes to balance the workload.&lt;/p&gt;
+
+&lt;p&gt;Now from v2.5, Kylin will submit this task to Hadoop MR or Spark, so 
this bottleneck can be solved. Check KYLIN-3471 for more.&lt;/p&gt;
+
+&lt;h3 
id=&quot;improve-building-performance-for-reading-global-dictionary&quot;&gt;Improve
 building performance for reading Global Dictionary&lt;/h3&gt;
+
+&lt;p&gt;A Global Dictionary is a must for bitmap count distinct. The GD can be very large if the column has a very high cardinality. In the cube building phase, Kylin needs to translate the non-integer values into integers via the GD. Although the GD has been split into several slices, the values are often scrambled, so Kylin needs to swap the slices in and out of memory repeatedly, which makes the build slow.&lt;/p&gt;
+
+&lt;p&gt;The enhancement introduces a new step that builds a shrunken dictionary for each data block. Each task then only loads the shrunken dictionary, which is quite small, so there is no more swapping in the cubing step. Performance can be 3x faster than before. Check KYLIN-3491 for more.&lt;/p&gt;
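The shrunken-dictionary idea can be illustrated with a toy in-memory sketch (hypothetical names and data; the real KYLIN-3491 step operates on dictionary slices persisted on disk):

```python
# Sketch: instead of letting every cubing task page through the huge
# global dictionary, pre-build a small per-block dictionary containing
# only the values that block actually uses. Illustration only.

global_dict = {f"user{i:07d}": i for i in range(1_000_000)}  # value -> int id

def shrunken_dict(block_values):
    """One extra pass over the block collects just the entries it needs."""
    return {v: global_dict[v] for v in set(block_values)}

block = ["user0000042", "user0999999", "user0000042"]
small = shrunken_dict(block)          # 2 entries instead of 1,000,000
encoded = [small[v] for v in block]   # the cubing task only loads `small`
print(len(small), encoded)
```

The extra pass trades one cheap scan per block for the elimination of repeated swap-in/swap-out of large dictionary slices during encoding.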
+
+&lt;h3 
id=&quot;improved-cube-size-estimation-for-topn-count-distinct&quot;&gt;Improved
 cube size estimation for TOPN, COUNT DISTINCT&lt;/h3&gt;
+
+&lt;p&gt;Cube size estimation is used in several steps, such as deciding the number of MR/Spark job partitions and calculating the number of HBase regions, so it affects build performance significantly. The estimation can be far off when there are COUNT DISTINCT or TOPN measures, because their size is flexible. An incorrect estimation may cause too many data partitions and then too many tasks. In the past, users needed to tune several parameters to bring the size estimation closer to the real size, which is hard to do.&lt;/p&gt;
+
+&lt;p&gt;Now Kylin will correct the size estimation automatically based on the collected data statistics. This makes the estimation much closer to the real size than before. Check KYLIN-3453 for more.&lt;/p&gt;
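The statistics-based correction can be illustrated with a toy calculation (purely illustrative numbers; KYLIN-3453's actual formula differs):

```python
# Toy illustration: correct a naive size estimate using a measured
# bytes-per-row ratio instead of fixed per-measure guesses.
naive_bytes_per_row = 8 + 16          # fixed guesses: SUM + COUNT DISTINCT
rows = 10_000_000

# after producing a sample of output, the real ratio is known
sampled_rows = 100_000
sampled_bytes = 4_200_000             # measured output size of the sample
corrected_bytes_per_row = sampled_bytes / sampled_rows  # 42 bytes/row

naive_estimate = naive_bytes_per_row * rows
corrected_estimate = corrected_bytes_per_row * rows
print(naive_estimate, int(corrected_estimate))
```

With flexible-size measures the measured ratio can differ from the guess by a large factor, which is exactly what throws off the partition and region counts downstream.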
+
+&lt;h3 id=&quot;hadoop-30hbase-20-support&quot;&gt;Hadoop 3.0/HBase 2.0 
support&lt;/h3&gt;
+
+&lt;p&gt;Hadoop 3 and HBase 2 are starting to be adopted by many users. Now we provide new binary packages compiled with the new Hadoop and HBase APIs. We tested them on Hortonworks HDP 3.0 and Cloudera CDH 6.0.&lt;/p&gt;
+
+&lt;p&gt;&lt;strong&gt;Download&lt;/strong&gt;&lt;/p&gt;
+
+&lt;p&gt;To download Apache Kylin v2.5.0 source code or binary package, visit 
the &lt;a 
href=&quot;http://kylin.apache.org/download&quot;&gt;download&lt;/a&gt; 
page.&lt;/p&gt;
+
+&lt;p&gt;&lt;strong&gt;Upgrade&lt;/strong&gt;&lt;/p&gt;
+
+&lt;p&gt;Follow the &lt;a 
href=&quot;/docs/howto/howto_upgrade.html&quot;&gt;upgrade 
guide&lt;/a&gt;.&lt;/p&gt;
+
+&lt;p&gt;&lt;strong&gt;Feedback&lt;/strong&gt;&lt;/p&gt;
+
+&lt;p&gt;If you face any issue or question, please send mail to the Apache Kylin dev or user mailing list: d...@kylin.apache.org, u...@kylin.apache.org; before sending, please make sure you have subscribed to the mailing list by dropping an email to dev-subscr...@kylin.apache.org or user-subscr...@kylin.apache.org.&lt;/p&gt;
+
+&lt;p&gt;&lt;em&gt;Great thanks to everyone who 
contributed!&lt;/em&gt;&lt;/p&gt;
+</description>
+        <pubDate>Thu, 20 Sep 2018 13:00:00 -0700</pubDate>
+        <link>http://kylin.apache.org/blog/2018/09/20/release-v2.5.0/</link>
+        <guid 
isPermaLink="true">http://kylin.apache.org/blog/2018/09/20/release-v2.5.0/</guid>
+        
+        
+        <category>blog</category>
+        
+      </item>
+    
+      <item>
         <title>Use Star Schema Benchmark for Apache Kylin</title>
         <description>&lt;h2 id=&quot;background&quot;&gt;Background&lt;/h2&gt;
 
@@ -825,61 +960,6 @@ kylin.engine.spark.rdd-partition-cut-mb=
       </item>
     
       <item>
-        <title>Apache Kylin v2.0.0 Beta Released</title>
-        <description>&lt;p&gt;The Apache Kylin community is very pleased to announce that the &lt;a href=&quot;http://kylin.apache.org/cn/download/&quot;&gt;v2.0.0 beta package&lt;/a&gt; is ready for download and testing.&lt;/p&gt;
-
-&lt;ul&gt;
-  &lt;li&gt;Download link: &lt;a href=&quot;http://kylin.apache.org/cn/download/&quot;&gt;http://kylin.apache.org/cn/download/&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Source code: https://github.com/apache/kylin/tree/kylin-2.0.0-beta&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;p&gt;It has been more than two months since the v1.6.0 release. During this time, the whole community has worked together to deliver a series of major features, hoping to take Apache Kylin to a new level.&lt;/p&gt;
-
-&lt;ul&gt;
-  &lt;li&gt;Snowflake schema support (&lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1875&quot;&gt;KYLIN-1875&lt;/a&gt;)&lt;/li&gt;
-  &lt;li&gt;TPC-H query support (&lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-2467&quot;&gt;KYLIN-2467&lt;/a&gt;)&lt;/li&gt;
-  &lt;li&gt;Spark cubing engine (&lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-2331&quot;&gt;KYLIN-2331&lt;/a&gt;)&lt;/li&gt;
-  &lt;li&gt;Job Engine high availability (&lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-2006&quot;&gt;KYLIN-2006&lt;/a&gt;)&lt;/li&gt;
-  &lt;li&gt;Percentile measure (&lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-2396&quot;&gt;KYLIN-2396&lt;/a&gt;)&lt;/li&gt;
-  &lt;li&gt;Tested on the Cloud (&lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-2351&quot;&gt;KYLIN-2351&lt;/a&gt;)&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;p&gt;Everyone is very welcome to download and test the v2.0.0 beta. Your feedback is very important to us; please send email to &lt;a href=&quot;&amp;#109;&amp;#097;&amp;#105;&amp;#108;&amp;#116;&amp;#111;:&amp;#100;&amp;#101;&amp;#118;&amp;#064;&amp;#107;&amp;#121;&amp;#108;&amp;#105;&amp;#110;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&quot;&gt;&amp;#100;&amp;#101;&amp;#118;&amp;#064;&amp;#107;&amp;#121;&amp;#108;&amp;#105;&amp;#110;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&lt;/a&gt;.&lt;/p&gt;
-
-&lt;hr /&gt;
-
-&lt;h2 id=&quot;section&quot;&gt;Installation&lt;/h2&gt;
-
-&lt;p&gt;For now the v2.0.0 beta cannot be upgraded directly from v1.6.0; a fresh installation is required, because the new version’s metadata is not backward compatible. Fortunately the Cube data is compatible, so only a metadata conversion tool needs to be developed to enable a smooth upgrade in the near future. We are working on it.&lt;/p&gt;
-
-&lt;hr /&gt;
-
-&lt;h2 id=&quot;tpc-h-&quot;&gt;Run the TPC-H benchmark&lt;/h2&gt;
-
-&lt;p&gt;Detailed steps for running TPC-H on Apache Kylin: &lt;a href=&quot;https://github.com/Kyligence/kylin-tpch&quot;&gt;https://github.com/Kyligence/kylin-tpch&lt;/a&gt;&lt;/p&gt;
-
-&lt;hr /&gt;
-
-&lt;h2 id=&quot;spark-&quot;&gt;The Spark cubing engine&lt;/h2&gt;
-
-&lt;p&gt;Apache Kylin v2.0.0 introduces a brand-new cubing engine based on Apache Spark, which can be used to replace the original MapReduce engine. Preliminary tests show that the Cube build time can usually be cut to about 50% of before.&lt;/p&gt;
-
-&lt;p&gt;To enable the Spark cubing engine, please refer to &lt;a href=&quot;/docs16/tutorial/cube_spark.html&quot;&gt;this document&lt;/a&gt;.&lt;/p&gt;
-
-&lt;hr /&gt;
-
-&lt;p&gt;&lt;em&gt;Thanks to everyone for your participation and contributions!&lt;/em&gt;&lt;/p&gt;
-</description>
-        <pubDate>Sat, 25 Feb 2017 12:00:00 -0800</pubDate>
-        
<link>http://kylin.apache.org/cn/blog/2017/02/25/v2.0.0-beta-ready/</link>
-        <guid 
isPermaLink="true">http://kylin.apache.org/cn/blog/2017/02/25/v2.0.0-beta-ready/</guid>
-        
-        
-        <category>blog</category>
-        
-      </item>
-    
-      <item>
         <title>Apache Kylin v2.0.0 Beta Announcement</title>
         <description>&lt;p&gt;The Apache Kylin community is pleased to 
announce the &lt;a href=&quot;http://kylin.apache.org/download/&quot;&gt;v2.0.0 
beta package&lt;/a&gt; is ready for download and test.&lt;/p&gt;
 
@@ -935,111 +1015,54 @@ kylin.engine.spark.rdd-partition-cut-mb=
       </item>
     
       <item>
-        <title>By-layer Spark Cubing</title>
-        <description>&lt;p&gt;Before v2.0, Apache Kylin uses Hadoop MapReduce as the framework to build Cubes over huge datasets. The MapReduce framework is simple, stable and can fulfill Kylin’s needs very well, except for performance. To get better performance, we introduced the “fast cubing” algorithm in Kylin v1.5, which tries to do as many aggregations as possible at the map side within memory, so as to avoid disk and network I/O; but not all data models can benefit from it, and it still runs on MR, which means on-disk sorting and shuffling.&lt;/p&gt;
-
-&lt;p&gt;Now Spark comes; Apache Spark is an open-source cluster-computing 
framework, which provides programmers with an application programming interface 
centered on a data structure called RDD; it runs in-memory on the cluster, this 
makes repeated access to the same data much faster. Spark provides flexible and 
fancy APIs. You are not tied to Hadoop’s MapReduce two-stage 
paradigm.&lt;/p&gt;
-
-&lt;p&gt;Before introducing how to calculate a Cube with Spark, let’s see how Kylin does it with MR; Figure 1 illustrates how a 4-dimension Cube gets calculated with the classic “by-layer” algorithm: the first round of MR aggregates the base (4-D) cuboid from source data; the second MR aggregates on the base cuboid to get the 3-D cuboids; with N+1 rounds of MR, all layers’ cuboids get calculated.&lt;/p&gt;
-
-&lt;p&gt;&lt;img src=&quot;/images/blog/spark-mr-layer.png&quot; 
alt=&quot;MapReduce Cubing by Layer&quot; /&gt;&lt;/p&gt;
-
-&lt;p&gt;The “by-layer” cubing divides a big task into a couple of steps, and each step is based on the previous step’s output, so it can reuse the previous calculation and avoid recomputing from the very beginning when there is a failure in between. This makes it a reliable algorithm. When moving to Spark, we decided to keep this algorithm; that’s why we call this feature “By-layer Spark Cubing”.&lt;/p&gt;
-
-&lt;p&gt;As we know, RDD (Resilient Distributed Dataset) is a basic concept in Spark. A collection of N-dimension cuboids can be well described as an RDD, and an N-dimension Cube will have N+1 RDDs. These RDDs have a parent/child relationship, as the parent can be used to generate the children. With the parent RDD cached in memory, generating the child RDD can be much more efficient than reading from disk. Figure 2 describes this process.&lt;/p&gt;
-
-&lt;p&gt;&lt;img src=&quot;/images/blog/spark-cubing-layer.png&quot; 
alt=&quot;Spark Cubing by Layer&quot; /&gt;&lt;/p&gt;
-
-&lt;p&gt;Figure 3 is the DAG of cubing in Spark; it illustrates the process in detail: in “Stage 5”, Kylin uses a HiveContext to read the intermediate Hive table, and then does a “map” operation, which is a one-to-one map, to encode the original values into K-V bytes. On completion Kylin gets an intermediate encoded RDD. In “Stage 6”, the intermediate RDD is aggregated with a “reduceByKey” operation to get RDD-1, which is the base cuboid. Next, a “flatMap” (one-to-many map) is done on RDD-1, because the base cuboid has N children cuboids. And so on, all levels’ RDDs get calculated. These RDDs are persisted to the distributed file system on completion, but cached in memory for the next level’s calculation. When the children are generated, the parent is removed from the cache.&lt;/p&gt;
-
-&lt;p&gt;&lt;img src=&quot;/images/blog/spark-dag.png&quot; alt=&quot;DAG of 
Spark Cubing&quot; /&gt;&lt;/p&gt;
-
-&lt;p&gt;We did a test to see how much performance improvement can be gained from Spark:&lt;/p&gt;
-
-&lt;p&gt;Environment&lt;/p&gt;
-
-&lt;ul&gt;
-  &lt;li&gt;4 nodes Hadoop cluster; each node has 28 GB RAM and 12 
cores;&lt;/li&gt;
-  &lt;li&gt;YARN has 48GB RAM and 30 cores in total;&lt;/li&gt;
-  &lt;li&gt;CDH 5.8, Apache Kylin 2.0 beta.&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;p&gt;Spark&lt;/p&gt;
-
-&lt;ul&gt;
-  &lt;li&gt;Spark 1.6.3 on YARN&lt;/li&gt;
-  &lt;li&gt;6 executors, each has 4 cores, 4GB +1GB (overhead) 
memory&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;p&gt;Test Data&lt;/p&gt;
+        <title>Apache Kylin v2.0.0 Beta Released</title>
+        <description>&lt;p&gt;The Apache Kylin community is very pleased to announce that the &lt;a href=&quot;http://kylin.apache.org/cn/download/&quot;&gt;v2.0.0 beta package&lt;/a&gt; is ready for download and testing.&lt;/p&gt;
 
 &lt;ul&gt;
-  &lt;li&gt;Airline data, total 160 million rows&lt;/li&gt;
-  &lt;li&gt;Cube: 10 dimensions, 5 measures (SUM)&lt;/li&gt;
+  &lt;li&gt;Download link: &lt;a href=&quot;http://kylin.apache.org/cn/download/&quot;&gt;http://kylin.apache.org/cn/download/&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Source code: https://github.com/apache/kylin/tree/kylin-2.0.0-beta&lt;/li&gt;
 &lt;/ul&gt;
 
-&lt;p&gt;Test Scenarios&lt;/p&gt;
+&lt;p&gt;It has been more than two months since the v1.6.0 release. During this time, the whole community has worked together to deliver a series of major features, hoping to take Apache Kylin to a new level.&lt;/p&gt;
 
 &lt;ul&gt;
-  &lt;li&gt;Build the cube at different source data levels: 3 million, 50 million and 160 million source rows; compare the build time between MapReduce (by layer) and Spark. No compression enabled.&lt;br /&gt;
-The time only covers the cube building step, not data preparation and subsequent steps.&lt;/li&gt;
+  &lt;li&gt;Snowflake schema support (&lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1875&quot;&gt;KYLIN-1875&lt;/a&gt;)&lt;/li&gt;
+  &lt;li&gt;TPC-H query support (&lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-2467&quot;&gt;KYLIN-2467&lt;/a&gt;)&lt;/li&gt;
+  &lt;li&gt;Spark cubing engine (&lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-2331&quot;&gt;KYLIN-2331&lt;/a&gt;)&lt;/li&gt;
+  &lt;li&gt;Job Engine high availability (&lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-2006&quot;&gt;KYLIN-2006&lt;/a&gt;)&lt;/li&gt;
+  &lt;li&gt;Percentile measure (&lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-2396&quot;&gt;KYLIN-2396&lt;/a&gt;)&lt;/li&gt;
+  &lt;li&gt;Tested on the Cloud (&lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-2351&quot;&gt;KYLIN-2351&lt;/a&gt;)&lt;/li&gt;
 &lt;/ul&gt;
 
-&lt;p&gt;&lt;img src=&quot;/images/blog/spark-mr-performance.png&quot; 
alt=&quot;Spark vs MR performance&quot; /&gt;&lt;/p&gt;
-
-&lt;p&gt;Spark is faster than MR in all 3 scenarios, and overall it can cut the cubing time roughly in half.&lt;/p&gt;
+&lt;p&gt;Everyone is very welcome to download and test the v2.0.0 beta. Your feedback is very important to us; please send email to &lt;a href=&quot;&amp;#109;&amp;#097;&amp;#105;&amp;#108;&amp;#116;&amp;#111;:&amp;#100;&amp;#101;&amp;#118;&amp;#064;&amp;#107;&amp;#121;&amp;#108;&amp;#105;&amp;#110;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&quot;&gt;&amp;#100;&amp;#101;&amp;#118;&amp;#064;&amp;#107;&amp;#121;&amp;#108;&amp;#105;&amp;#110;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&lt;/a&gt;.&lt;/p&gt;
 
-&lt;p&gt;Now you can download a 2.0.0 beta build from Kylin’s download page, and then follow this &lt;a href=&quot;https://kylin.apache.org/blog/2017/02/25/v2.0.0-beta-ready/&quot;&gt;post&lt;/a&gt; to build a cube with the Spark engine. If you have any comments or input, please discuss it in the community.&lt;/p&gt;
+&lt;hr /&gt;
 
-</description>
-        <pubDate>Thu, 23 Feb 2017 09:30:00 -0800</pubDate>
-        
<link>http://kylin.apache.org/blog/2017/02/23/by-layer-spark-cubing/</link>
-        <guid 
isPermaLink="true">http://kylin.apache.org/blog/2017/02/23/by-layer-spark-cubing/</guid>
-        
-        
-        <category>blog</category>
-        
-      </item>
-    
-      <item>
-        <title>Apache Kylin v1.6.0 Officially Released</title>
-        <description>&lt;p&gt;The Apache Kylin community is very pleased to announce the official release of Apache Kylin v1.6.0.&lt;/p&gt;
+&lt;h2 id=&quot;section&quot;&gt;Installation&lt;/h2&gt;
 
-&lt;p&gt;Apache Kylin is an open source distributed analytics engine that provides a SQL query interface on Hadoop and multi-dimensional analysis (OLAP) capabilities, supporting second-level queries over extremely large datasets.&lt;/p&gt;
+&lt;p&gt;For now the v2.0.0 beta cannot be upgraded directly from v1.6.0; a fresh installation is required, because the new version’s metadata is not backward compatible. Fortunately the Cube data is compatible, so only a metadata conversion tool needs to be developed to enable a smooth upgrade in the near future. We are working on it.&lt;/p&gt;
 
-&lt;p&gt;Apache Kylin v1.6.0 brings a more reliable and manageable ability to build Cubes directly from Apache Kafka streams, letting users analyze data more naturally in more scenarios and reducing the latency from data generation to retrieval from a day or several hours down to minutes. Apache Kylin 1.6.0 fixes 102 issues, including bugs, improvements and new features; see the &lt;a href=&quot;https://kylin.apache.org/docs16/release_notes.html&quot;&gt;release notes&lt;/a&gt;.&lt;/p&gt;
+&lt;hr /&gt;
 
-&lt;h2 id=&quot;section&quot;&gt;Main changes&lt;/h2&gt;
+&lt;h2 id=&quot;tpc-h-&quot;&gt;Run the TPC-H benchmark&lt;/h2&gt;
 
-&lt;ul&gt;
-  &lt;li&gt;Scalable streaming cube build &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1726&quot;&gt;KYLIN-1726&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;TopN performance enhancement &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1917&quot;&gt;KYLIN-1917&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Support embedded JSON format in Kafka messages &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1919&quot;&gt;KYLIN-1919&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Reliable synchronization of Hive table schema changes &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-2012&quot;&gt;KYLIN-2012&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Support more timestamp formats in Kafka messages &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-2054&quot;&gt;KYLIN-2054&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Add Boolean encoding &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-2055&quot;&gt;KYLIN-2055&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Support building/merging/refreshing multiple segments in parallel &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-2070&quot;&gt;KYLIN-2070&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Support updating streaming table schema and configuration &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-2082&quot;&gt;KYLIN-2082&lt;/a&gt;&lt;/li&gt;
-&lt;/ul&gt;
+&lt;p&gt;Detailed steps for running TPC-H on Apache Kylin: &lt;a href=&quot;https://github.com/Kyligence/kylin-tpch&quot;&gt;https://github.com/Kyligence/kylin-tpch&lt;/a&gt;&lt;/p&gt;
 
-&lt;p&gt;To download the Apache Kylin v1.6.0 source code and binary packages, visit the &lt;a href=&quot;http://kylin.apache.org/cn/download/&quot;&gt;download&lt;/a&gt; page.&lt;/p&gt;
+&lt;hr /&gt;
 
-&lt;p&gt;&lt;strong&gt;Upgrade&lt;/strong&gt;&lt;/p&gt;
+&lt;h2 id=&quot;spark-&quot;&gt;The Spark cubing engine&lt;/h2&gt;
 
-&lt;p&gt;See the &lt;a href=&quot;/docs16/howto/howto_upgrade.html&quot;&gt;upgrade guide&lt;/a&gt;.&lt;/p&gt;
+&lt;p&gt;Apache Kylin v2.0.0 introduces a brand-new cubing engine based on Apache Spark, which can be used to replace the original MapReduce engine. Preliminary tests show that the Cube build time can usually be cut to about 50% of before.&lt;/p&gt;
 
-&lt;p&gt;&lt;strong&gt;Support&lt;/strong&gt;&lt;/p&gt;
+&lt;p&gt;To enable the Spark cubing engine, please refer to &lt;a href=&quot;/docs16/tutorial/cube_spark.html&quot;&gt;this document&lt;/a&gt;.&lt;/p&gt;
 
-&lt;p&gt;If you have any issues during the upgrade or daily use, please:&lt;br /&gt;
-file them in Kylin’s JIRA: &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN/&quot;&gt;https://issues.apache.org/jira/browse/KYLIN/&lt;/a&gt;&lt;br /&gt;
-or&lt;br /&gt;
-send email to the Apache Kylin mailing list: &lt;a href=&quot;&amp;#109;&amp;#097;&amp;#105;&amp;#108;&amp;#116;&amp;#111;:&amp;#100;&amp;#101;&amp;#118;&amp;#064;&amp;#107;&amp;#121;&amp;#108;&amp;#105;&amp;#110;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&quot;&gt;&amp;#100;&amp;#101;&amp;#118;&amp;#064;&amp;#107;&amp;#121;&amp;#108;&amp;#105;&amp;#110;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&lt;/a&gt;&lt;/p&gt;
+&lt;hr /&gt;
 
 
&lt;p&gt;&lt;em&gt;Thanks to everyone for your participation and contributions!&lt;/em&gt;&lt;/p&gt;
 </description>
-        <pubDate>Sun, 04 Dec 2016 13:00:00 -0800</pubDate>
-        <link>http://kylin.apache.org/cn/blog/2016/12/04/release-v1.6.0/</link>
-        <guid 
isPermaLink="true">http://kylin.apache.org/cn/blog/2016/12/04/release-v1.6.0/</guid>
+        <pubDate>Sat, 25 Feb 2017 12:00:00 -0800</pubDate>
+        
<link>http://kylin.apache.org/cn/blog/2017/02/25/v2.0.0-beta-ready/</link>
+        <guid 
isPermaLink="true">http://kylin.apache.org/cn/blog/2017/02/25/v2.0.0-beta-ready/</guid>
         
         
         <category>blog</category>

