Author: lidong
Date: Wed Oct 19 06:02:59 2016
New Revision: 1765533

URL: http://svn.apache.org/viewvc?rev=1765533&view=rev
Log:
minor update on the blog

Modified:
    kylin/site/blog/2016/10/18/new-nrt-streaming/index.html
    kylin/site/feed.xml

Modified: kylin/site/blog/2016/10/18/new-nrt-streaming/index.html
URL: 
http://svn.apache.org/viewvc/kylin/site/blog/2016/10/18/new-nrt-streaming/index.html?rev=1765533&r1=1765532&r2=1765533&view=diff
==============================================================================
--- kylin/site/blog/2016/10/18/new-nrt-streaming/index.html (original)
+++ kylin/site/blog/2016/10/18/new-nrt-streaming/index.html Wed Oct 19 06:02:59 
2016
@@ -205,15 +205,15 @@
   </li>
 </ul>
 
-<p>To overcome these limitations, the Apache Kylin team developed the new 
streaming (<a 
href="https://issues.apache.org/jira/browse/KYLIN-1726";>KYLIN-1726</a>) with 
Kafka 0.10 API, it has been tested internally for some time, will release to 
public soon.</p>
+<p>To overcome these limitations, the Apache Kylin team developed a new streaming solution (<a href="https://issues.apache.org/jira/browse/KYLIN-1726">KYLIN-1726</a>) with Kafka 0.10; it has been tested internally for some time and will be released to the public soon.</p>
 
-<p>The new design is a perfect implementation under Kylin 1.5’s 
“Plug-in” architecture: treat Kafka topic as a “Data Source” like Hive 
table, using an adapter to extract the data to HDFS; the next steps are almost 
the same as from Hive. Figure 1 is a high level architecture of the new 
design.</p>
+<p>The new design is a natural fit for Kylin 1.5’s “plug-in” architecture: it treats a Kafka topic as a “Data Source”, just like a Hive table, and uses an adapter to extract the data to HDFS; the next steps are almost the same as for other cubes. Figure 1 shows a high-level architecture of the new design.</p>
 
 <p><img src="/images/blog/new-streaming.png" alt="Kylin New Streaming 
Framework Architecture" /></p>
 
-<p>The adapter to read Kafka messages is modified from <a 
href="https://github.com/amient/kafka-hadoop-loader";>kafka-hadoop-loader</a>, 
which is open sourced under Apache License V2.0; it starts a mapper for each 
Kafka partition, reading and then saving the messages to HDFS; in next steps 
Kylin will be able to leverage existing framework like MR to do the processing, 
this makes the solution scalable and fault-tolerant.</p>
+<p>The adapter that reads Kafka messages is modified from <a href="https://github.com/amient/kafka-hadoop-loader">kafka-hadoop-loader</a>, which its author Michal Harish open sourced under Apache License V2.0. It starts one mapper for each Kafka partition, reading and then saving the messages to HDFS; Kylin can then leverage existing frameworks like MapReduce to do the processing, which makes the solution scalable and fault-tolerant.</p>
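The one-mapper-per-partition idea above can be sketched as follows. This is only an illustrative simulation (the functions and names here are hypothetical, not Kylin's or kafka-hadoop-loader's actual code): each "mapper" consumes a bounded offset range from one partition and produces the lines that would be written to an HDFS file.

```python
# Illustrative sketch only: one "mapper" task per Kafka partition reads a
# bounded offset range and emits the records it would write to HDFS.
# Partitions, offsets and messages are simulated with in-memory lists.

def run_mapper(partition, messages, start, end):
    """Consume messages[start:end] from one partition; return the lines
    that would be saved to an HDFS file for that partition."""
    return [f"p{partition}:o{offset}:{msg}"
            for offset, msg in enumerate(messages[start:end], start=start)]

def load_topic(topic, offset_ranges):
    """Launch one mapper per partition (serially here; MapReduce would
    run them in parallel) and collect the per-partition outputs."""
    return {p: run_mapper(p, msgs, *offset_ranges[p])
            for p, msgs in topic.items()}

topic = {0: ["a", "b", "c"], 1: ["d", "e"]}      # partition -> messages
files = load_topic(topic, {0: (0, 3), 1: (0, 2)})
print(files[0][0])   # -> "p0:o0:a"
```

Because every partition is handled by an independent task over an explicit offset range, a failed task can simply be re-run, which is where the scalability and fault tolerance come from.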
 
-<p>To overcome the “data loss” problem, Kylin adds the start/end offset 
information on each Cube segment, and then use the offsets as the partition 
value (no overlap is allowed); this ensures no data be lost and 1 message be 
consumed at most once. To let the late/early message can be queried, Cube 
segments allow overlap for the partition time dimension: Kylin will scan all 
segments which include the queried time. Figure 2 illurates this.</p>
+<p>To overcome the “data loss” limitation, Kylin adds start/end offset information to each Cube segment and uses the offsets as the partition value (no overlap is allowed); this ensures that no data is lost and that each message is consumed at most once. To let late/early messages be queried, Cube segments are allowed to overlap on the partition time dimension: each segment has a “min” date/time and a “max” date/time, and Kylin will scan all segments that match the queried time scope. Figure 2 illustrates this.</p>
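The offset/time split described above can be sketched like this (an illustrative sketch only; the class and function names are hypothetical, not Kylin's actual API): segments are cut by non-overlapping offset ranges, while each segment also records the min/max event time it contains, and a time query scans every segment whose time span intersects the queried scope.

```python
# Illustrative sketch: segments partitioned by non-overlapping Kafka
# offsets, with overlapping event-time spans used for query pruning.

from dataclasses import dataclass

@dataclass
class Segment:
    start_offset: int   # inclusive
    end_offset: int     # exclusive; the next segment starts here
    min_time: int       # earliest event time seen in the segment
    max_time: int       # latest event time seen in the segment

def segments_for_query(segments, t_begin, t_end):
    """Return segments whose event-time span intersects [t_begin, t_end]."""
    return [s for s in segments
            if s.min_time <= t_end and s.max_time >= t_begin]

segments = [
    Segment(0,   100, min_time=10, max_time=25),   # contains a late event
    Segment(100, 220, min_time=20, max_time=30),   # time spans overlap
]
hits = segments_for_query(segments, 22, 24)
print(len(hits))   # both segments hold events in [22, 24] -> 2
```

The offsets guarantee at-most-once consumption (each message belongs to exactly one segment), while the overlapping time spans guarantee that late or early messages are still found at query time.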
 
 <p><img src="/images/blog/offset-as-partition-value.png" alt="Use Offset to 
Cut Segments" /></p>
 
@@ -227,23 +227,25 @@
   <li>Add REST API to check and fill the segment holes</li>
 </ul>
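The “segment holes” that the REST API checks for can be pictured with a small sketch (illustrative only; `find_holes` is a hypothetical helper, not the actual API): a hole is a gap between one segment's end offset and the next segment's start offset.

```python
# Illustrative sketch: detect gaps ("holes") between consecutive
# offset-based segments, which would otherwise mean lost data.

def find_holes(segments):
    """segments: list of (start_offset, end_offset) tuples.
    Return the (start, end) offset gaps between consecutive segments."""
    ordered = sorted(segments)
    return [(prev_end, start)
            for (_, prev_end), (start, _) in zip(ordered, ordered[1:])
            if start > prev_end]

print(find_holes([(0, 100), (100, 200), (250, 300)]))   # -> [(200, 250)]
```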
 
-<p>The integration test result shows big improvements than the previous 
version:</p>
+<p>The integration test results are promising:</p>
 
 <ul>
   <li>Scalability: it can easily process up to hundreds of million records in 
one build;</li>
-  <li>Flexibility: trigger the build at any time with the frequency you want, 
e.g: every 5 minutes in day and every hour in night; Kylin manages the offsets 
so it can resume from the last position;</li>
-  <li>Stability: pretty stable, no OutOfMemory error;</li>
+  <li>Flexibility: you can trigger the build at any time, with whatever frequency you want; for example, every 5 minutes in the daytime but every hour at night, and you can even pause builds when you need to do maintenance; Kylin manages the offsets so it can automatically continue from the last position;</li>
+  <li>Stability: pretty stable, no OutOfMemoryError;</li>
   <li>Management: user can check all jobs’ status through Kylin’s 
“Monitor” page or REST API;</li>
  <li>Build Performance: in a testing cluster (8 AWS instances consuming Twitter streams, with about 10 thousand messages arriving per second) and a 9-dimension cube with 3 measures, when the build interval is 2 minutes the job finishes in around 3 minutes; when the interval is changed to 5 minutes, the build finishes in around 4 minutes;</li>
 </ul>
 
-<p>Here are a couple of screenshots in this test:<br />
+<p>Here are a couple of screenshots from this test; we may compose them into a step-by-step tutorial in the future:<br />
 <img src="/images/blog/streaming-monitor.png" alt="Streaming Job Monitoring" 
/></p>
 
 <p><img src="/images/blog/streaming-adapter.png" alt="Streaming Adapter" /></p>
 
 <p><img src="/images/blog/streaming-twitter.png" alt="Streaming Twitter 
Sample" /></p>
 
+<p>In short, this is a more robust Near Real Time streaming OLAP solution than the previous version. Next, the Apache Kylin team will move toward a real-time engine.</p>
+
   </article>
 
 </div>

Modified: kylin/site/feed.xml
URL: 
http://svn.apache.org/viewvc/kylin/site/feed.xml?rev=1765533&r1=1765532&r2=1765533&view=diff
==============================================================================
--- kylin/site/feed.xml (original)
+++ kylin/site/feed.xml Wed Oct 19 06:02:59 2016
@@ -19,8 +19,8 @@
     <description>Apache Kylin Home</description>
     <link>http://kylin.apache.org/</link>
    <atom:link href="http://kylin.apache.org/feed.xml" rel="self" type="application/rss+xml"/>
-    <pubDate>Tue, 18 Oct 2016 07:59:25 -0700</pubDate>
-    <lastBuildDate>Tue, 18 Oct 2016 07:59:25 -0700</lastBuildDate>
+    <pubDate>Wed, 19 Oct 2016 06:59:18 -0700</pubDate>
+    <lastBuildDate>Wed, 19 Oct 2016 06:59:18 -0700</lastBuildDate>
     <generator>Jekyll v2.5.3</generator>
     
       <item>
@@ -44,15 +44,15 @@
   &lt;/li&gt;
 &lt;/ul&gt;
 
-&lt;p&gt;To overcome these limitations, the Apache Kylin team developed the 
new streaming (&lt;a 
href=&quot;https://issues.apache.org/jira/browse/KYLIN-1726&quot;&gt;KYLIN-1726&lt;/a&gt;)
 with Kafka 0.10 API, it has been tested internally for some time, will release 
to public soon.&lt;/p&gt;
+&lt;p&gt;To overcome these limitations, the Apache Kylin team developed a new streaming solution (&lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1726&quot;&gt;KYLIN-1726&lt;/a&gt;) with Kafka 0.10; it has been tested internally for some time and will be released to the public soon.&lt;/p&gt;
 
-&lt;p&gt;The new design is a perfect implementation under Kylin 1.5’s 
“Plug-in” architecture: treat Kafka topic as a “Data Source” like Hive 
table, using an adapter to extract the data to HDFS; the next steps are almost 
the same as from Hive. Figure 1 is a high level architecture of the new 
design.&lt;/p&gt;
+&lt;p&gt;The new design is a natural fit for Kylin 1.5’s “plug-in” architecture: it treats a Kafka topic as a “Data Source”, just like a Hive table, and uses an adapter to extract the data to HDFS; the next steps are almost the same as for other cubes. Figure 1 shows a high-level architecture of the new design.&lt;/p&gt;
 
 &lt;p&gt;&lt;img src=&quot;/images/blog/new-streaming.png&quot; 
alt=&quot;Kylin New Streaming Framework Architecture&quot; /&gt;&lt;/p&gt;
 
-&lt;p&gt;The adapter to read Kafka messages is modified from &lt;a 
href=&quot;https://github.com/amient/kafka-hadoop-loader&quot;&gt;kafka-hadoop-loader&lt;/a&gt;,
 which is open sourced under Apache License V2.0; it starts a mapper for each 
Kafka partition, reading and then saving the messages to HDFS; in next steps 
Kylin will be able to leverage existing framework like MR to do the processing, 
this makes the solution scalable and fault-tolerant.&lt;/p&gt;
+&lt;p&gt;The adapter that reads Kafka messages is modified from &lt;a href=&quot;https://github.com/amient/kafka-hadoop-loader&quot;&gt;kafka-hadoop-loader&lt;/a&gt;, which its author Michal Harish open sourced under Apache License V2.0. It starts one mapper for each Kafka partition, reading and then saving the messages to HDFS; Kylin can then leverage existing frameworks like MapReduce to do the processing, which makes the solution scalable and fault-tolerant.&lt;/p&gt;
 
-&lt;p&gt;To overcome the “data loss” problem, Kylin adds the start/end 
offset information on each Cube segment, and then use the offsets as the 
partition value (no overlap is allowed); this ensures no data be lost and 1 
message be consumed at most once. To let the late/early message can be queried, 
Cube segments allow overlap for the partition time dimension: Kylin will scan 
all segments which include the queried time. Figure 2 illurates this.&lt;/p&gt;
+&lt;p&gt;To overcome the “data loss” limitation, Kylin adds start/end offset information to each Cube segment and uses the offsets as the partition value (no overlap is allowed); this ensures that no data is lost and that each message is consumed at most once. To let late/early messages be queried, Cube segments are allowed to overlap on the partition time dimension: each segment has a “min” date/time and a “max” date/time, and Kylin will scan all segments that match the queried time scope. Figure 2 illustrates this.&lt;/p&gt;
 
 &lt;p&gt;&lt;img src=&quot;/images/blog/offset-as-partition-value.png&quot; 
alt=&quot;Use Offset to Cut Segments&quot; /&gt;&lt;/p&gt;
 
@@ -66,22 +66,24 @@
   &lt;li&gt;Add REST API to check and fill the segment holes&lt;/li&gt;
 &lt;/ul&gt;
 
-&lt;p&gt;The integration test result shows big improvements than the previous 
version:&lt;/p&gt;
+&lt;p&gt;The integration test results are promising:&lt;/p&gt;
 
 &lt;ul&gt;
   &lt;li&gt;Scalability: it can easily process up to hundreds of million 
records in one build;&lt;/li&gt;
-  &lt;li&gt;Flexibility: trigger the build at any time with the frequency you 
want, e.g: every 5 minutes in day and every hour in night; Kylin manages the 
offsets so it can resume from the last position;&lt;/li&gt;
-  &lt;li&gt;Stability: pretty stable, no OutOfMemory error;&lt;/li&gt;
+  &lt;li&gt;Flexibility: you can trigger the build at any time, with whatever frequency you want; for example, every 5 minutes in the daytime but every hour at night, and you can even pause builds when you need to do maintenance; Kylin manages the offsets so it can automatically continue from the last position;&lt;/li&gt;
+  &lt;li&gt;Stability: pretty stable, no OutOfMemoryError;&lt;/li&gt;
   &lt;li&gt;Management: user can check all jobs’ status through Kylin’s 
“Monitor” page or REST API;&lt;/li&gt;
  &lt;li&gt;Build Performance: in a testing cluster (8 AWS instances consuming Twitter streams, with about 10 thousand messages arriving per second) and a 9-dimension cube with 3 measures, when the build interval is 2 minutes the job finishes in around 3 minutes; when the interval is changed to 5 minutes, the build finishes in around 4 minutes;&lt;/li&gt;
 &lt;/ul&gt;
 
-&lt;p&gt;Here are a couple of screenshots in this test:&lt;br /&gt;
+&lt;p&gt;Here are a couple of screenshots from this test; we may compose them into a step-by-step tutorial in the future:&lt;br /&gt;
 &lt;img src=&quot;/images/blog/streaming-monitor.png&quot; alt=&quot;Streaming 
Job Monitoring&quot; /&gt;&lt;/p&gt;
 
 &lt;p&gt;&lt;img src=&quot;/images/blog/streaming-adapter.png&quot; 
alt=&quot;Streaming Adapter&quot; /&gt;&lt;/p&gt;
 
 &lt;p&gt;&lt;img src=&quot;/images/blog/streaming-twitter.png&quot; 
alt=&quot;Streaming Twitter Sample&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;In short, this is a more robust Near Real Time streaming OLAP solution than the previous version. Next, the Apache Kylin team will move toward a real-time engine.&lt;/p&gt;
 </description>
         <pubDate>Tue, 18 Oct 2016 10:30:00 -0700</pubDate>
         <link>http://kylin.apache.org/blog/2016/10/18/new-nrt-streaming/</link>

