Modified: kylin/site/feed.xml
URL: 
http://svn.apache.org/viewvc/kylin/site/feed.xml?rev=1891980&r1=1891979&r2=1891980&view=diff
==============================================================================
--- kylin/site/feed.xml (original)
+++ kylin/site/feed.xml Tue Aug  3 10:55:02 2021
@@ -19,11 +19,371 @@
     <description>Apache Kylin Home</description>
     <link>http://kylin.apache.org/</link>
     <atom:link href="http://kylin.apache.org/feed.xml"; rel="self" 
type="application/rss+xml"/>
-    <pubDate>Tue, 03 Aug 2021 00:39:20 -0700</pubDate>
-    <lastBuildDate>Tue, 03 Aug 2021 00:39:20 -0700</lastBuildDate>
+    <pubDate>Tue, 03 Aug 2021 03:26:40 -0700</pubDate>
+    <lastBuildDate>Tue, 03 Aug 2021 03:26:40 -0700</lastBuildDate>
     <generator>Jekyll v2.5.3</generator>
     
       <item>
+        <title>Practice and Optimization of Kylin at Meituan's In-Store Dining Business</title>
+        <description>&lt;p&gt;Since 2016, Meituan's in-store dining technical team has used Apache Kylin as its OLAP engine. As the business grew rapidly, however, efficiency problems emerged in both cube building and querying. The technical team started from the underlying principles, broke the build process down step by step, and laid out an implementation path from point to surface. This article summarizes that experience in the hope of helping more technical teams improve their data output efficiency.&lt;/p&gt;
+
+&lt;h2 id=&quot;section&quot;&gt;Background&lt;/h2&gt;
+
+&lt;p&gt;The sales business is characterized by large scale, many domains, and dense requirements. As the main carrier of sales data support, Meituan's in-store dining Qingtian sales system (&lt;strong&gt;"Qingtian" for short below&lt;/strong&gt;) not only covers a wide scope but also faces very complex technical scenarios (&lt;strong&gt;multi-level organizational data display and authorization; more than 1/3 of the metrics require precise deduplication; peak queries have reached the tens of thousands&lt;/strong&gt;). Against this business background, building a stable and efficient OLAP engine to help analysts make fast decisions has become Qingtian's core goal.&lt;/p&gt;
+
+&lt;p&gt;Apache Kylin is an open-source OLAP engine built on the Hadoop big data platform. It adopts multidimensional cube precomputation, trading space for time to bring query latency down to the sub-second level, which greatly improves the efficiency of data analysis and provides convenient, flexible query capabilities. Based on the fit between the technology and the business, Qingtian adopted Kylin as its OLAP engine in 2016, and over the following years this system efficiently supported our data analysis work.&lt;/p&gt;
+
+&lt;p&gt;In 2020, Meituan's in-store dining business grew quickly and data metrics multiplied. The Kylin-based system ran into serious efficiency problems in both building and querying, which affected data analysis and decision-making and became a major obstacle to improving the user experience. Over roughly half a year, the technical team carried out a series of optimization iterations on Kylin, including dimension pruning, model design, and resource adaptation, raising the SLA of sales performance data from 90% to 99.99%. From this practice we distilled a technical approach covering "principle interpretation", "process decomposition", and "implementation path". We hope this experience and these lessons can help more technical teams improve the efficiency of data output and business decisions.&lt;/p&gt;
+
+&lt;h2 id=&quot;section-1&quot;&gt;Problems and Goals&lt;/h2&gt;
+
+&lt;p&gt;As the bridge between the platform and merchants, sales covers two business models, in-store sales and phone visits, managed level by level along territory and HR organizational structures; every analysis must be viewable along both organizational hierarchies. Under requirements such as consistent metric definitions and timely data output, we designed the data architecture around Kylin's precomputation idea. As shown below:&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan_cn/chart-01.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;Kylin computes the number of dimension combinations as 2^N (&lt;strong&gt;N is the number of dimensions&lt;/strong&gt;), and dimension pruning is officially provided to reduce the number of combinations. Due to the particularities of the dining business, however, the number of un-prunable combinations in a single task was still over 1,000. With requirement iterations and HR/territory reorganizations, all historical data has to be backfilled, which consumes huge resources and extremely long build times. The business-partitioned architecture, while decoupling data output well and keeping metric definitions consistent, put great pressure on Kylin builds, leading to heavy resource usage and long elapsed times. Based on this situation, we summarized the problems of Kylin's MOLAP mode as follows:&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;&lt;strong&gt;Efficiency problems are hard to pinpoint (principles)&lt;/strong&gt;: the build process has many strongly correlated steps; the root cause is hard to find from surface symptoms alone, so problems could not be solved effectively.&lt;/li&gt;
+  &lt;li&gt;&lt;strong&gt;Build engine not upgraded (build process)&lt;/strong&gt;: historical tasks still used MapReduce as the build engine instead of switching to the more efficient Spark.&lt;/li&gt;
+  &lt;li&gt;&lt;strong&gt;Unreasonable resource usage (build process)&lt;/strong&gt;: resource waste and resource waiting; the platform's default dynamic resource adaptation let small tasks request large amounts of resources, and unreasonable data splitting produced many small files, causing resource waste and long task queues.&lt;/li&gt;
+  &lt;li&gt;&lt;strong&gt;Core tasks take too long (implementation path)&lt;/strong&gt;: the source tables for Qingtian's sales transaction performance metrics are large, with many dimension combinations and a high expansion rate, so the daily build took more than two hours.&lt;/li&gt;
+  &lt;li&gt;&lt;strong&gt;SLA quality below target (implementation path)&lt;/strong&gt;: the overall SLA attainment rate did not reach the expected target.&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;After analyzing the problems carefully and setting the overall efficiency goal, we classified Kylin's build process and extracted the core stages where efficiency could be improved. Through "principle interpretation", "layer-by-layer decomposition", and "from point to surface" we achieved the goal of reducing both resources and time. The quantified targets are shown below:&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan_cn/chart-02.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;h2 id=&quot;section-2&quot;&gt;Prerequisite for Optimization: Interpreting the Principles&lt;/h2&gt;
+
+&lt;p&gt;To address the difficulty of locating and attributing efficiency problems, we studied Kylin's build principles, covering the precomputation idea and the by-layer cubing algorithm.&lt;/p&gt;
+
+&lt;h3 id=&quot;section-3&quot;&gt;Precomputation&lt;/h3&gt;
+
+&lt;p&gt;All possible dimension combinations are derived from the dimensions, the metrics that multidimensional analysis may use are precomputed, and the results are saved as a Cube. Suppose we have 4 dimensions: each node of this Cube (&lt;strong&gt;called a Cuboid&lt;/strong&gt;) is a different combination of the 4 dimensions, and each combination defines a set of analysis dimensions (&lt;strong&gt;like group by&lt;/strong&gt;). The aggregated metric results are stored on each Cuboid. At query time, we locate the matching Cuboid from the SQL, read the metric values, and return them. As shown below:&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan_cn/chart-03.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
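As a minimal illustration of the 2^N precomputation space (the dimension names here are hypothetical, not from the article), enumerating every subset of a 4-dimension cube yields 16 cuboids:

```python
from itertools import combinations

# Hypothetical dimensions of a sales cube.
dims = ["city", "channel", "category", "date"]

# Every subset of the dimensions is one cuboid (one "group by" combination);
# the empty subset is the 0-dimensional cuboid (the grand total).
cuboids = [combo for r in range(len(dims) + 1)
           for combo in combinations(dims, r)]

print(len(cuboids))  # 2^4 = 16 cuboids
print(cuboids[0])    # () -> the 0-dimensional cuboid
```

Each tuple corresponds to one precomputed aggregation a query can be routed to.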
+
+&lt;h3 id=&quot;by-layer&quot;&gt;The By-Layer Cubing Algorithm&lt;/h3&gt;
+
+&lt;p&gt;An N-dimensional Cube is composed of 1 N-dimensional sub-cube, N (N-1)-dimensional sub-cubes, N*(N-1)/2 (N-2)-dimensional sub-cubes, ..., N 1-dimensional sub-cubes, and 1 0-dimensional sub-cube, for a total of 2^N sub-cubes. In the by-layer algorithm, computation proceeds layer by layer with the number of dimensions decreasing; each layer's computation (except the first layer, which aggregates from the raw data) is based on the results of the layer above it.&lt;/p&gt;
+
+&lt;p&gt;For example, the result of group by [A,B] can be derived from the result of group by [A,B,C] by dropping C and re-aggregating, which avoids repeated computation. When the 0-dimensional Cuboid has been computed, the whole Cube is complete. As shown below:&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan_cn/chart-04.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
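The reuse step above can be sketched as follows (toy data with a hypothetical additive SUM measure; the trick works because the child layer's sums can be rebuilt from the parent layer's sums instead of the raw rows):

```python
from collections import defaultdict

# Result of "group by A, B, C" (the parent layer): (a, b, c) -> SUM of the measure.
layer_abc = {
    ("a1", "b1", "c1"): 10,
    ("a1", "b1", "c2"): 5,
    ("a1", "b2", "c1"): 7,
}

# Child layer "group by A, B": drop C and re-aggregate the parent's partial sums.
layer_ab = defaultdict(int)
for (a, b, _c), value in layer_abc.items():
    layer_ab[(a, b)] += value

print(dict(layer_ab))  # {('a1', 'b1'): 15, ('a1', 'b2'): 7}
```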
+
+&lt;h2 id=&quot;section-4&quot;&gt;Process Analysis: Breaking It Down Layer by Layer&lt;/h2&gt;
+
+&lt;p&gt;After understanding Kylin's underlying principles, we focused optimization on the five stages of "engine selection", "data reading", "dictionary building", "layered building", and "file conversion". After refining each stage's problems, ideas, and targets, we finally managed to reduce elapsed time while also reducing computing resources. Details are shown in the table below:&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan_cn/chart-05.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;h3 id=&quot;section-5&quot;&gt;Build Engine Selection&lt;/h3&gt;
+
+&lt;p&gt;At present, we have gradually switched the build engine to Spark. Qingtian adopted Kylin as its OLAP engine as early as 2016; its historical tasks were never migrated, and only MapReduce parameters had been tuned. In fact, the Kylin community enabled Spark as a build engine back in 2017 (announced on the official site), with build efficiency 1 to 3 times that of MapReduce, and the engine can be switched in the Cube design, as shown below:&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan_cn/chart-06.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;h3 id=&quot;section-6&quot;&gt;Reading the Source Data&lt;/h3&gt;
+
+&lt;p&gt;Kylin reads the source data in Hive as external tables, and the table's data files (&lt;strong&gt;stored on HDFS&lt;/strong&gt;) serve as the input of the next subtask; small-file problems can occur in this step. At present, the file counts of Kylin's upstream wide tables are distributed reasonably, so there is no need to merge files upstream; forcing a merge would instead increase the processing time of the upstream source tables.&lt;/p&gt;
+
+&lt;p&gt;When a project needs to backfill historical data or add dimension combinations, all the data has to be rebuilt, usually month by month. Loading too many partitions triggers the small-file problem and makes this step slow. Overriding the configuration at the Kylin level to merge small files reduces the number of Maps and effectively improves read efficiency.&lt;/p&gt;
+
+&lt;p&gt;&lt;strong&gt;Merging small files in the source table&lt;/strong&gt;: merge the small files in the Hive source table to control the number of Tasks running in parallel per Job. The adjusted parameters are shown in the table below:&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan_cn/chart-07.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;&lt;strong&gt;Kylin-level parameter override&lt;/strong&gt;: set the file size read by each Map. The adjusted parameters are shown in the table below:&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan_cn/chart-08.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;h3 id=&quot;section-7&quot;&gt;Building Dictionaries&lt;/h3&gt;
+
+&lt;p&gt;Kylin builds dimension dictionaries from the dimension values that appear in the Hive table, mapping each value to a code and saving the statistics, which saves HBase storage. Each dimension combination is called a Cuboid; in theory, an N-dimensional Cube has 2^N dimension combinations.&lt;/p&gt;
+
+&lt;h4 id=&quot;section-8&quot;&gt;Checking the Number of Combinations&lt;/h4&gt;
+
+&lt;p&gt;After pruning, the number of dimension combinations actually computed is hard to calculate directly. It can be checked in the execution log (&lt;strong&gt;the screenshot shows the log of the last Reduce in the "extract fact table distinct columns" step&lt;/strong&gt;). As shown below:&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan_cn/chart-09.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;h4 id=&quot;section-9&quot;&gt;Global Dictionary Dependencies&lt;/h4&gt;
+
+&lt;p&gt;Qingtian has many business scenarios that need precise deduplication. When there are multiple global dictionary columns, column dependencies can be configured; for example, when the metrics "number of stores" and "number of online stores" coexist, a column dependency can be set to reduce the computation over ultra-high-cardinality dimensions. As shown below:&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan_cn/chart-10.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;h4 id=&quot;section-10&quot;&gt;Computing Resource Configuration&lt;/h4&gt;
+
+&lt;p&gt;When a cube contains multiple precise count-distinct metrics, computing resources can be increased appropriately to improve build efficiency for high-cardinality dimensions. The parameter settings are shown in the table below:&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan_cn/chart-11.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;h3 id=&quot;section-11&quot;&gt;Layered Building&lt;/h3&gt;
+
+&lt;p&gt;This step is the core of a Kylin build. After switching to the Spark engine, only the by-layer algorithm is used by default; the engine no longer chooses automatically between the by-layer and fast algorithms. When implementing the by-layer algorithm, Spark computes from the bottom Cuboid layer upward, layer by layer, until the topmost Cuboid is computed (equivalent to executing a query without group by). Each layer's result data is cached in memory, so re-reading the data is skipped and the upper layer's cached data is used directly, which greatly improves execution efficiency. The details of the Spark execution process are as follows.&lt;/p&gt;
+
+&lt;h4 id=&quot;job&quot;&gt;Jobs&lt;/h4&gt;
+
+&lt;p&gt;The number of Jobs equals the number of layers in the by-layer algorithm tree: Spark treats the output of each layer's result data as one Job. As shown below:&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan_cn/chart-12.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;h4 id=&quot;stage&quot;&gt;Stages&lt;/h4&gt;
+
+&lt;p&gt;Each Job has two Stages: reading the upper layer's cached data, and caching the results computed at the current layer. As shown below:&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan_cn/chart-13.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;h4 id=&quot;task&quot;&gt;Task Parallelism Settings&lt;/h4&gt;
+
+&lt;p&gt;Kylin computes task parallelism from the estimated size of the Cuboid data built at each layer (&lt;strong&gt;dimension pruning can reduce the number of combinations, shrink the Cuboid data, and speed up the build; this is not covered in detail here&lt;/strong&gt;) and the value of the data-split parameter. The formula is as follows:&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;
+    &lt;p&gt;&lt;strong&gt;Task count formula&lt;/strong&gt;: Min(MapSize / cut-mb, MaxPartition); Max(MapSize / cut-mb, MinPartition)&lt;/p&gt;
+  &lt;/li&gt;
+  &lt;li&gt;
+    &lt;ul&gt;
+      &lt;li&gt;&lt;strong&gt;MapSize&lt;/strong&gt;: the size of the Cuboid combinations built at each layer, i.e. Kylin's estimate of each layer's dimension-combination size.&lt;/li&gt;
+      &lt;li&gt;&lt;strong&gt;cut-mb&lt;/strong&gt;: the data split size, which controls the number of parallel Tasks; set via the kylin.engine.spark.rdd-partition-cut-mb parameter.&lt;/li&gt;
+      &lt;li&gt;&lt;strong&gt;MaxPartition&lt;/strong&gt;: the maximum number of partitions; set via the kylin.engine.spark.max-partition parameter.&lt;/li&gt;
+      &lt;li&gt;&lt;strong&gt;MinPartition&lt;/strong&gt;: the minimum number of partitions; set via the kylin.engine.spark.min-partition parameter.&lt;/li&gt;
+    &lt;/ul&gt;
+  &lt;/li&gt;
+  &lt;li&gt;&lt;strong&gt;Output file count&lt;/strong&gt;: each Task compresses its result data and writes it to HDFS as input to the file-conversion step; the output file count is the total across all Tasks.&lt;/li&gt;
+&lt;/ul&gt;
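The Task-count formula above amounts to clamping MapSize / cut-mb between the min- and max-partition settings. A small sketch (the parameter values are illustrative examples, not recommendations):

```python
def spark_partitions(map_size_mb: float, cut_mb: float,
                     min_partition: int, max_partition: int) -> int:
    # Min(MapSize/cut-mb, MaxPartition) combined with
    # Max(MapSize/cut-mb, MinPartition) is a clamp to [min, max].
    raw = int(map_size_mb / cut_mb)
    return max(min_partition, min(raw, max_partition))

# A layer estimated at 2048 MB with a 10 MB split
# (kylin.engine.spark.rdd-partition-cut-mb=10) gives 204 tasks:
print(spark_partitions(2048, 10, min_partition=1, max_partition=5000))  # 204
```

Raising cut-mb directly lowers the parallelism (and hence the number of output files) for a layer of a given estimated size.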
+
+&lt;h4 id=&quot;section-12&quot;&gt;Resource Request Calculation&lt;/h4&gt;
+
+&lt;p&gt;The platform requests computing resources dynamically by default. A single Executor's capacity includes 1 logical CPU ("CPU" below), 6 GB of on-heap memory, and 1 GB of off-heap memory. The formulas are as follows:&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;&lt;strong&gt;CPU&lt;/strong&gt; = kylin.engine.spark-conf.spark.executor.cores * the number of Executors actually requested.&lt;/li&gt;
+  &lt;li&gt;&lt;strong&gt;Memory&lt;/strong&gt; = (kylin.engine.spark-conf.spark.executor.memory + spark.yarn.executor.memoryOverhead) * the number of Executors actually requested.&lt;/li&gt;
+  &lt;li&gt;&lt;strong&gt;Capacity of a single Executor&lt;/strong&gt; = kylin.engine.spark-conf.spark.executor.memory / kylin.engine.spark-conf.spark.executor.cores, i.e. the memory requested per CPU during execution.&lt;/li&gt;
+  &lt;li&gt;&lt;strong&gt;Maximum number of Executors&lt;/strong&gt; = kylin.engine.spark-conf.spark.dynamicAllocation.maxExecutors; the platform requests resources dynamically by default, and this parameter caps the request.&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;With sufficient resources, if a single Stage requests 1,000 parallel tasks, the total request reaches 7,000 GB of memory and 1,000 CPUs, i.e.: &lt;code class=&quot;highlighter-rouge&quot;&gt;CPU: 1*1000=1000; Memory: (6+1)*1000=7000GB&lt;/code&gt;.&lt;/p&gt;
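The worked example above (1,000 parallel tasks, 6 GB heap plus 1 GB off-heap per single-core Executor) can be reproduced from the formulas:

```python
def build_resources(executors: int, cores_per_executor: int = 1,
                    executor_memory_gb: int = 6, overhead_gb: int = 1):
    # CPU    = spark.executor.cores * executors
    # Memory = (spark.executor.memory + spark.yarn.executor.memoryOverhead) * executors
    cpus = cores_per_executor * executors
    memory_gb = (executor_memory_gb + overhead_gb) * executors
    return cpus, memory_gb

cpus, mem = build_resources(1000)
print(cpus, mem)  # 1000 CPUs, 7000 GB -> matches the example in the text
```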
+
+&lt;h4 id=&quot;section-13&quot;&gt;Rationalizing Resource Allocation&lt;/h4&gt;
+
+&lt;p&gt;Because of the nature of the by-layer algorithm and Spark's compression mechanism during actual execution, the partition data actually loaded by each Task is far smaller than the configured value. This causes extremely high task parallelism, occupies a large amount of resources, and produces many small files that affect the downstream file-conversion step. Splitting the data reasonably is therefore the key point of this optimization. The Kylin build log shows the estimated size of each layer's Cuboid data and the number of split partitions (equal to the number of Tasks actually created in the Stage). As shown below:&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan_cn/chart-14.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;Combined with the Spark UI, the actual execution can be inspected and the memory request adjusted to just satisfy what execution needs, reducing resource waste.&lt;/p&gt;
+
+&lt;ol&gt;
+  &lt;li&gt;The minimum total resource request should exceed the sum of the cached data of the two largest (Top1 and Top2) layers in the Stage, so that all cached data stays in memory. As shown below:&lt;/li&gt;
+&lt;/ol&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan_cn/chart-15.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;&lt;strong&gt;Formula&lt;/strong&gt;: sum of the cached data of the Top1 and Top2 layers in the Stage &amp;lt; kylin.engine.spark-conf.spark.executor.memory * kylin.engine.spark-conf.spark.memory.fraction * spark.memory.storageFraction * maximum number of Executors&lt;/p&gt;
+
+&lt;ol&gt;
+  &lt;li&gt;The memory and CPU actually needed by a single Task (&lt;strong&gt;one Task uses one CPU during execution&lt;/strong&gt;) should be smaller than a single Executor's capacity. As shown below:&lt;/li&gt;
+&lt;/ol&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan_cn/chart-16.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;&lt;strong&gt;Formula&lt;/strong&gt;: memory actually needed by a single Task &amp;lt; kylin.engine.spark-conf.spark.executor.memory * kylin.engine.spark-conf.spark.memory.fraction * spark.memory.storageFraction / kylin.engine.spark-conf.spark.executor.cores. The parameters are described in the table below:&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan_cn/chart-17.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
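Both caching constraints can be checked mechanically. The sketch below assumes Spark's default memory.fraction=0.6 and storageFraction=0.5 as example values (your cluster's settings may differ), and the layer sizes are made up for illustration:

```python
def storage_memory_gb(executor_memory_gb, memory_fraction, storage_fraction):
    # Memory usable for cached (storage) data in one executor.
    return executor_memory_gb * memory_fraction * storage_fraction

def check_cluster(top1_gb, top2_gb, executor_memory_gb, cores,
                  max_executors, memory_fraction=0.6, storage_fraction=0.5):
    per_executor = storage_memory_gb(executor_memory_gb,
                                     memory_fraction, storage_fraction)
    # Constraint 1: the Top1 + Top2 cached layers fit across all executors.
    layers_fit = (top1_gb + top2_gb) < per_executor * max_executors
    # Constraint 2: the storage-memory budget available to one task (one core).
    task_budget_gb = per_executor / cores
    return layers_fit, task_budget_gb

layers_fit, per_task = check_cluster(top1_gb=800, top2_gb=700,
                                     executor_memory_gb=6, cores=1,
                                     max_executors=1000)
print(layers_fit, per_task)  # True, 1.8 GB of storage memory per task
```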
+
+&lt;h3 id=&quot;section-14&quot;&gt;File Conversion&lt;/h3&gt;
+
+&lt;p&gt;Kylin converts the built Cuboid files into HFiles in HTable format and associates the files with the HTable via BulkLoad, which greatly reduces the load on HBase. This step is completed by a single MapReduce job whose number of Maps equals the number of files output by the layered build stage. The log is as follows:&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan_cn/chart-18.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;In this stage, resources can be requested reasonably based on the actual input file sizes (&lt;strong&gt;visible in the MapReduce logs&lt;/strong&gt;), avoiding resource waste.&lt;/p&gt;
+
+&lt;p&gt;Formula: Map-stage resource request = kylin.job.mr.config.override.mapreduce.map.memory.mb * number of files output by the layered build stage. The parameters are shown in the table below:&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan_cn/chart-19.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;h2 id=&quot;section-15&quot;&gt;Implementation Path: From Point to Surface&lt;/h2&gt;
+
+&lt;h3 id=&quot;section-16&quot;&gt;Transaction Pilot&lt;/h3&gt;
+
+&lt;p&gt;Through our interpretation of Kylin's principles and the layer-by-layer decomposition of the build process, we selected the core sales-transaction tasks for a pilot. As shown below:&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan_cn/chart-20.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;h3 id=&quot;section-17&quot;&gt;Comparison of Pilot Results&lt;/h3&gt;
+
+&lt;p&gt;We optimized the core sales-transaction tasks and compared actual resource usage and execution time before and after the adjustments, finally achieving a reduction on both fronts. As shown below:&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan_cn/chart-21.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;h2 id=&quot;section-18&quot;&gt;Results&lt;/h2&gt;
+
+&lt;h3 id=&quot;section-19&quot;&gt;Overall Resource Usage&lt;/h3&gt;
+
+&lt;p&gt;Qingtian currently runs 20+ Kylin tasks. After half a year of continuous optimization, comparing the monthly average CU usage of the Kylin resource queue and the CU usage of pending tasks shows that resource consumption for the same tasks has dropped significantly. As shown below:&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan_cn/chart-23.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;h3 id=&quot;sla&quot;&gt;Overall SLA Attainment&lt;/h3&gt;
+
+&lt;p&gt;After this point-to-surface overall optimization, Qingtian's SLA attainment reached 100% in June 2020. As shown below:&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan_cn/chart-24.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;h2 id=&quot;section-20&quot;&gt;Outlook&lt;/h2&gt;
+
+&lt;p&gt;Apache Kylin officially became a top-level project of the Apache Software Foundation in November 2015. It took only 13 months from open-sourcing to top-level status, and it was also the first top-level project contributed to Apache entirely by a Chinese team.&lt;/p&gt;
+
+&lt;p&gt;At present, Meituan uses the fairly stable V2.0 release. After nearly four years of use, the in-store dining technical team has accumulated a great deal of experience in optimizing both query performance and build efficiency; this article mainly described the resource adaptation method for the Spark build process. Notably, the Kylin community released V3.1 in July 2020, introducing Flink as a build engine and using Flink uniformly for the core build stages: data reading, dictionary building, layered building, and file conversion. These four parts account for more than 95% of total build time, so this upgrade also greatly improved Kylin's build efficiency. For details, see: Flink Cube Build Engine.&lt;/p&gt;
+
+&lt;p&gt;Looking back at the evolution of Kylin's build engine, from MapReduce to Spark and now to Flink, the build tooling has kept moving toward better mainstream engines. The Kylin community also has many active and outstanding code contributors who are helping expand the Kylin ecosystem and add new features, which is well worth learning from. Finally, Meituan's in-store dining technical team would like to thank the Apache Kylin project team once again.&lt;/p&gt;
+</description>
+        <pubDate>Tue, 03 Aug 2021 08:00:00 -0700</pubDate>
+        
<link>http://kylin.apache.org/cn_blog/2021/08/03/How-Meituan-Dominates-Online-Shopping-with-Apache-Kylin/</link>
+        <guid 
isPermaLink="true">http://kylin.apache.org/cn_blog/2021/08/03/How-Meituan-Dominates-Online-Shopping-with-Apache-Kylin/</guid>
+        
+        
+        <category>cn_blog</category>
+        
+      </item>
+    
+      <item>
+        <title>How Meituan Dominates Online Shopping with Apache Kylin</title>
+        <description>&lt;p&gt;Let’s face it, online shopping now affects 
nearly every part of our shopping lives. From ordering groceries to &lt;a 
href=&quot;https://www.carvana.com/&quot;&gt;purchasing a car&lt;/a&gt;, 
we’re living in an age of limitless choices when it comes to online commerce. 
Nowhere is this more the case than with the world’s 2nd largest consumer 
market: China.&lt;/p&gt;
+
+&lt;p&gt;Leading the online shopping revolution in China is Meituan, who since 
2016 has grown to support nearly 460 million consumers from over 2,000 
industries, regularly processing hundreds of $billions in transactions. To 
support these staggering operations, Meituan has invested heavily in its data 
analytics system and employs more than 10,000 engineers to ensure a stable and 
reliable experience for their customers.&lt;/p&gt;
+
+&lt;p&gt;But the driving force behind Meituan’s success is not simply a 
robust analytics system. While the organization’s executives might think so, 
its engineers understand that it is the OLAP engine that system is built upon 
that has empowered the company to move quickly and win in the market.&lt;/p&gt;
+
+&lt;h2 
id=&quot;meituans-secret-weapon-apache-kylin&quot;&gt;&lt;strong&gt;Meituan’s 
Secret Weapon: Apache Kylin&lt;/strong&gt;&lt;/h2&gt;
+
+&lt;p&gt;Since 2016, Meituan’s technical team has relied on&lt;a 
href=&quot;https://kyligence.io/apache-kylin-overview/&quot;&gt; Apache 
Kylin&lt;/a&gt; to power their&lt;a 
href=&quot;https://kyligence.io/resources/extreme-olap-with-apache-kylin/&quot;&gt;
 OLAP engine&lt;/a&gt;. Apache Kylin, an open source OLAP engine built on the 
Hadoop platform, resolves complex queries at sub-second speeds through 
multidimensional precomputation, allowing for blazing-fast analysis on even the 
largest datasets.&lt;/p&gt;
+
+&lt;p&gt;However, the limitations of this open source solution became apparent 
as the company’s business grew, becoming less and less efficient as cubes and 
queries became larger and more complex. To solve this problem, the engineering 
team leveraged Kylin’s open source foundations to dig into the engine, 
understand its underlying principles, and develop an implementation strategy 
that other organizations using Kylin can adopt to greatly improve their data 
output efficiency.&lt;/p&gt;
+
+&lt;p&gt;Meituan’s technical team has graciously shared their story of this 
process below so that you can apply it toward solving your own big data 
challenges.&lt;/p&gt;
+
+&lt;h2 
id=&quot;a-global-pandemic-and-a-new-normal-for-business&quot;&gt;&lt;strong&gt;A
 Global Pandemic and a New Normal for Business&lt;/strong&gt;&lt;/h2&gt;
+
+&lt;p&gt;For the last four years, Meituan’s Qingtian sales system has served 
as the company’s data processing workhorse, handling massive amounts of daily 
sales data involving a wide range of highly complex technical scenarios. The 
stability and efficiency of this system is paramount, and it’s why 
Meituan’s engineers have made significant investments in optimizing the OLAP 
engine Qingtian is built upon.&lt;/p&gt;
+
+&lt;p&gt;After a thorough investigation, the team identified Apache Kylin as 
the only OLAP engine that could meet their needs and scale with anticipated 
growth. The engine was rolled out in 2016 and, over the next few years, Kylin 
played an important role in the company’s evolving data analytics 
system.&lt;/p&gt;
+
+&lt;p&gt;Growth expectations, however, turned out to be severely 
underestimated, as a global pandemic quickly drove major changes in how 
consumers shopped and how businesses sold their goods. Such a massive shift in 
online shopping led to even faster growth for Meituan as well as a nearly 
untenable amount of new business data.&lt;/p&gt;
+
+&lt;p&gt;This caused efficiency bottlenecks that even their Kylin-based system 
started to struggle with. Cube building and query performance was unable to 
keep up with these changes in consumer behaviors, slowing down data analysis 
and decision-making and creating a major obstacle towards addressing user 
experiences.&lt;/p&gt;
+
+&lt;p&gt;Meituan’s technical team would spend the next six months carrying 
out optimizations and iterations for Kylin, including dimension pruning, model 
design, resource adaptation, and improving SLA compliance.&lt;/p&gt;
+
+&lt;h2 
id=&quot;responding-to-new-consumer-behaviors-with-apache-kylin&quot;&gt;&lt;strong&gt;Responding
 to New Consumer Behaviors with Apache Kylin&lt;/strong&gt;&lt;/h2&gt;
+
+&lt;p&gt;In order to understand the approach taken when optimizing Meituan’s 
data architecture, it’s important to understand how the business is managed. 
The company’s sales force operates with two business models – in-store 
sales and phone sales – and is then further broken down by various 
territories and corporate departments. All analytics data must be communicated 
across both business models.&lt;/p&gt;
+
+&lt;p&gt;With this in mind, Meituan engineers incorporated Kylin into their 
design of the data architecture as follows:&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan/chart-01.jpeg&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;Figure 3. Apache Kylin’s layer-by-layer building data flow&lt;/p&gt;
+
+&lt;p&gt;While this design addressed many of Meituan’s initial concerns 
around scalability and efficiency, continued shifts in consumer behaviors and 
the organization’s response to dramatic changes in the market put enormous 
pressure on Kylin when it came to building cubes. This led to an unsustainable 
level of consumption of both resources and time.&lt;/p&gt;
+
+&lt;p&gt;It became clear that Kylin’s MOLAP model was presenting the 
following challenges:&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;The build process involved many steps that were highly correlated, 
making it difficult to root cause problems.&lt;/li&gt;
+  &lt;li&gt;MapReduce - instead of the more efficient Spark - was still being 
used as the build engine for historical tasks.&lt;/li&gt;
+  &lt;li&gt;The platform’s default dynamic resource adaption method demanded 
considerable resources for small tasks. Data was sharded unnecessarily and a 
large number of small files were generated, resulting in a waste of 
resources.&lt;/li&gt;
+  &lt;li&gt;Data volumes Meituan was now having to work with were well beyond 
the original architectural plan, resulting in two hours of cube building every 
day.&lt;/li&gt;
+  &lt;li&gt;The overall SLA fulfillment rate remained lower than 
expected.&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;Recognizing these problems, the team set a goal of improving the 
platform’s efficiency (you can see the quantitative targets below). Finding a 
solution would involve classifying Kylin’s build process, digging into how 
Kylin worked under the hood, breaking down that process, and finally 
implementing a solution.&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan/chart-02.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;Figure 4. Implementation path diagram&lt;/p&gt;
+
+&lt;h2 
id=&quot;optimization-understanding-how-apache-kylin-builds-cubes&quot;&gt;&lt;strong&gt;Optimization:
 Understanding How Apache Kylin Builds Cubes&lt;/strong&gt;&lt;/h2&gt;
+
+&lt;p&gt;Understanding the cube building process is critical for pinpointing 
efficiency and performance issues. In the case of Kylin, a solid grasp of its 
precomputation approach and its “by layer” cubing algorithm are necessary 
when formulating a solution.&lt;/p&gt;
+
+&lt;p&gt;&lt;strong&gt;Precomputation with Apache 
Kylin&lt;/strong&gt;&lt;/p&gt;
+
+&lt;p&gt;Apache Kylin generates all possible dimensional combinations and 
pre-calculates the metrics that may be used in future multidimensional 
analysis, saving the results as a cube. Metric aggregation results are saved on 
&lt;em&gt;cuboids&lt;/em&gt; (a logical branch of the cube), and during queries 
relevant cuboids are found through SQL statements, and then read and quickly 
returned as metric values.&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan/chart-03.jpeg&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;Figure 5. Precomputation across four dimensions example&lt;/p&gt;
+
+&lt;p&gt;&lt;strong&gt;Apache Kylin’s By-Layer Cubing 
Algorithm&lt;/strong&gt;&lt;/p&gt;
+
+&lt;p&gt;An N-dimensional cube is composed of 1 N-dimensional sub-cube, N 
(N-1)-dimensional sub-cubes, N*(N-1)/2 (N-2)-dimensional sub-cubes, …, N 
1-dimensional sub-cubes, and one 0-dimensional sub-cube, consisting of a total 
of 2^N sub-cubes. In Kylin’s by-layer cubing algorithm, the number of 
dimensions decreases with the calculation of each layer, and each layer’s 
calculation is based on the calculation result of its parent layer (except the 
first layer, which bases it on the source data).&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan/chart-04.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;Figure 6. Cuboid example&lt;/p&gt;
+
+&lt;h2 id=&quot;the-proof-is-in-the-process&quot;&gt;&lt;strong&gt;The Proof 
Is in the Process&lt;/strong&gt;&lt;/h2&gt;
+
+&lt;p&gt;Understanding the principles outlined above, the Meituan team 
identified five key areas to focus on for optimization: engine selection, data 
reading, dictionary building, layer-by-layer build, and file conversion. 
Addressing these areas would lead to the greatest gains in reducing the 
required resources for calculation and shortening processing time.&lt;/p&gt;
+
+&lt;p&gt;The team outlined the challenges, their solutions, and key objectives 
in the following table:&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan/chart-05.jpeg&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;Figure 7. Breakdown of Apache Kylin’s process&lt;/p&gt;
+
+&lt;h2 
id=&quot;putting-apache-kylin-to-the-test&quot;&gt;&lt;strong&gt;Putting Apache 
Kylin to the Test&lt;/strong&gt;&lt;/h2&gt;
+
+&lt;p&gt;With their solutions in place, the next step was to test if Kylin’s 
build process had actually improved. To do this, the team selected a set of 
critical sales tasks and ran a pilot (outlined below):&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan/chart-06.jpeg&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;Figure 8. Meituan’s pilot program for their Apache Kylin 
optimizations&lt;/p&gt;
+
+&lt;p&gt;The results of the pilot were astonishing. Ultimately, the team was 
able to realize a significant reduction in resource consumption as seen in the 
following chart:&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan/chart-07.jpeg&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;Figure 9. Resource usage and performance of Apache Kylin before and 
after pilot&lt;/p&gt;
+
+&lt;h2 id=&quot;analytics-optimized&quot;&gt;&lt;strong&gt;Analytics Optimized&lt;/strong&gt;&lt;/h2&gt;
+
+&lt;p&gt;Today, Meituan’s Qingtian system is processing over 20 different 
Kylin tasks, and after six months of constant optimization, the monthly CU 
usage for Kylin’s resource queue and the CU usage for pending tasks have seen 
significant reductions.&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan/chart-08.jpeg&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;Figure 10. Current performance of Apache Kylin after solution 
implementation&lt;/p&gt;
+
+&lt;p&gt;Resource usage isn’t the only area of impressive improvement. The 
Qingtian system’s SLA compliance also was able to reach 100% as of June 
2020.&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan/chart-09.jpeg&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;Figure 11. Meituan SLA compliance after Apache Kylin 
optimization&lt;/p&gt;
+
+&lt;h2 
id=&quot;taking-on-the-future-with-apache-kylin&quot;&gt;&lt;strong&gt;Taking 
on the Future with Apache Kylin&lt;/strong&gt;&lt;/h2&gt;
+
+&lt;p&gt;Over the past four years, Meituan’s technical team has accumulated 
a great deal of experience in optimizing query performance and build efficiency 
with Apache Kylin. But Meituan’s success is also the story of open source’s 
success.&lt;/p&gt;
+
+&lt;p&gt;The&lt;a href=&quot;http://kylin.apache.org/community/&quot;&gt; 
Apache Kylin community&lt;/a&gt; has many active and outstanding code 
contributors (&lt;a 
href=&quot;https://kyligence.io/comparing-kylin-vs-kyligence/&quot;&gt;including
 Kyligence&lt;/a&gt;), who are relentlessly working to expand the Kylin 
ecosystem and add more new features. It’s in sharing success stories like 
this that Apache Kylin is able to remain the leading open source solution for 
analytics on massive datasets.&lt;/p&gt;
+
+&lt;p&gt;Together, with the entire Apache Kylin community, Meituan is making 
sure critical analytics work can remain unburdened by growing datasets, and 
that when the next major shift in business takes place, industry leaders like 
Meituan will be able to analyze what’s happening and quickly take 
action.&lt;/p&gt;
+</description>
+        <pubDate>Tue, 03 Aug 2021 08:00:00 -0700</pubDate>
+        
<link>http://kylin.apache.org/blog/2021/08/03/How-Meituan-Dominates-Online-Shopping-with-Apache-Kylin/</link>
+        <guid 
isPermaLink="true">http://kylin.apache.org/blog/2021/08/03/How-Meituan-Dominates-Online-Shopping-with-Apache-Kylin/</guid>
+        
+        
+        <category>blog</category>
+        
+      </item>
+    
+      <item>
        <title>Apache Kylin 4 New Architecture Sharing</title>
        <description>&lt;p&gt;This article is mainly divided into the following parts:&lt;br /&gt;
 - Apache Kylin usage scenarios&lt;br /&gt;
@@ -217,6 +577,155 @@ For example, a query joins two subquerie
       </item>
     
       <item>
+        <title>Why did Youzan choose Kylin4</title>
+        <description>&lt;p&gt;At the QCon Global Software Developers 
Conference held on May 29, 2021, Zheng Shengjun, head of Youzan’s data 
infrastructure platform, shared Youzan’s internal use experience and 
optimization practice of Kylin 4.0 on the meeting room of open source big data 
frameworks and applications. &lt;br /&gt;
+For many users of Kylin 2/3 (Kylin on HBase), this is also a chance to learn how and why to upgrade to Kylin 4.&lt;/p&gt;
+
+&lt;p&gt;This sharing is mainly divided into the following parts:&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;The reason for choosing Kylin 4&lt;/li&gt;
+  &lt;li&gt;Introduction to Kylin 4&lt;/li&gt;
+  &lt;li&gt;How to optimize performance of Kylin 4&lt;/li&gt;
+  &lt;li&gt;Practice of Kylin 4 in Youzan&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;h2 id=&quot;the-reason-for-choosing-kylin-4&quot;&gt;01 The reason for 
choosing Kylin 4&lt;/h2&gt;
+
+&lt;h3 id=&quot;introduction-to-youzan&quot;&gt;Introduction to 
Youzan&lt;/h3&gt;
+&lt;p&gt;China Youzan Co., Ltd. (stock code 08083.HK) is an enterprise mainly engaged in retail technology services.&lt;br /&gt;
+At present, it owns several tools and solutions to provide SaaS software 
products and talent services to help merchants operate mobile social e-commerce 
and new retail channels in an all-round way. &lt;br /&gt;
+Currently Youzan has hundreds of millions of consumers and 6 million existing 
merchants.&lt;/p&gt;
+
+&lt;h3 id=&quot;history-of-kylin-in-youzan&quot;&gt;History of Kylin in 
Youzan&lt;/h3&gt;
+&lt;p&gt;&lt;img src=&quot;/images/blog/youzan/1 
history_of_youzan_OLAP.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;First of all, I would like to share why Youzan chose to upgrade to Kylin 4. Here, let me briefly review the history of Youzan's OLAP infrastructure.&lt;/p&gt;
+
+&lt;p&gt;In the early days of Youzan, in order to iterate quickly on development, we chose the approach of pre-computation + MySQL. In 2018, Druid was introduced for its query flexibility and development efficiency, but it had problems such as a low degree of pre-aggregation and no support for precise count-distinct measures. In this situation, Youzan introduced Apache Kylin and ClickHouse: Kylin supports a high degree of aggregation, precise count-distinct measures, and the lowest RT, while ClickHouse is quite flexible in usage (ad hoc queries).&lt;/p&gt;
+
+&lt;p&gt;From the introduction of Kylin in 2018 to now, Youzan has used Kylin 
for more than three years. With the continuous enrichment of business scenarios 
and the continuous accumulation of data volume, Youzan currently has 6 million 
existing merchants, GMV in 2020 is 107.3 billion, and the daily build data 
volume is 10 billion +. At present, Kylin has basically covered all the 
business scenarios of Youzan.&lt;/p&gt;
+
+&lt;h3 id=&quot;the-challenges-of-kylin-3&quot;&gt;The challenges of Kylin 
3&lt;/h3&gt;
+&lt;p&gt;With Youzan’s rapid development and in-depth use of Kylin, we also 
encountered some challenges:&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;First of all, the build performance of Kylin on HBase cannot meet 
the favorable expectations, and the build performance will affect the user’s 
failure recovery time and stability experience;&lt;/li&gt;
+  &lt;li&gt;Secondly, the onboarding of more large merchants (tens of millions of members in a single store, hundreds of thousands of goods per store) brings great challenges to our OLAP system. Kylin on HBase is limited by the single-point query of the Query Server and cannot support these complex scenarios well;&lt;/li&gt;
+  &lt;li&gt;Finally, because HBase is not a cloud-native system, it is difficult to scale up and down flexibly. As data volume keeps growing and business load has peaks and valleys, average resource utilization remains low.&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;Faced with these challenges, Youzan chose to upgrade to the more cloud-native Apache Kylin 4.&lt;/p&gt;
+
+&lt;h2 id=&quot;introduction-to-kylin-4&quot;&gt;02 Introduction to Kylin 
4&lt;/h2&gt;
+&lt;p&gt;First of all, let’s introduce the main advantages of Kylin 4. Apache Kylin 4 depends entirely on Spark for cubing jobs and queries. It can make full use of Spark’s parallelization, vectorization, and whole-stage code generation technologies to improve the efficiency of large queries.&lt;br /&gt;
+Here is a brief introduction to the principles of Kylin 4, namely its storage engine, build engine and query engine.&lt;/p&gt;
+
+&lt;h3 id=&quot;storage-engine&quot;&gt;Storage engine&lt;/h3&gt;
+&lt;p&gt;&lt;img src=&quot;/images/blog/youzan/2 kylin4_storage.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;First of all, let’s take a look at the new storage engine, comparing Kylin on HBase with Kylin on Parquet. In Kylin on HBase, cuboid data is stored in HBase tables; a single segment corresponds to one HBase table, and aggregation is pushed down to the HBase coprocessor.&lt;/p&gt;
+
+&lt;p&gt;But as we know, HBase is not true columnar storage and its throughput is not enough for an OLAP system. Kylin 4 replaces HBase with Parquet: all the data is stored in files, and each segment has a corresponding HDFS directory. All queries and cubing jobs read and write files without HBase. Although there is a certain performance loss for simple queries, the improvement for complex queries is considerable and worthwhile.&lt;/p&gt;
+
+&lt;h3 id=&quot;build-engine&quot;&gt;Build engine&lt;/h3&gt;
+&lt;p&gt;&lt;img src=&quot;/images/blog/youzan/3 kylin4_build_engine.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;The second is the new build engine. Based on our test, the build 
speed of Kylin on Parquet has been optimized from 82 minutes to 15 minutes. 
There are several reasons:&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;Kylin 4 removes dimension encoding, eliminating one build step;&lt;/li&gt;
+  &lt;li&gt;Removed the HBase File generation step;&lt;/li&gt;
+  &lt;li&gt;Kylin on Parquet changes the granularity of cubing to cuboid 
level, which is conducive to further improving parallelism of cubing 
job.&lt;/li&gt;
+  &lt;li&gt;Enhanced implementation of the global dictionary. In the new algorithm, dictionary and source data are hashed into the same buckets, making it possible to load only one dictionary bucket to encode a piece of source data.&lt;/li&gt;
+&lt;/ul&gt;
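The bucketed global dictionary in the last bullet can be sketched as follows. This is a hypothetical simplification for illustration only (the bucket count, per-bucket capacity and hash function are assumptions, not Kylin’s actual code): each value hashes to one bucket, each bucket assigns ids within its own range, so an executor encoding a value only needs to load that one bucket’s dictionary.

```python
# Simplified sketch of a bucketed global dictionary (illustrative only;
# Kylin's real implementation differs in detail). Values are hashed into
# a fixed number of buckets; each bucket assigns ids within its own id
# range, so encoding a value touches exactly one bucket.

NUM_BUCKETS = 4
IDS_PER_BUCKET = 1_000_000  # assumed capacity per bucket

def bucket_of(value):
    return hash(value) % NUM_BUCKETS

class BucketDictionary:
    def __init__(self, bucket_id):
        self.base = bucket_id * IDS_PER_BUCKET  # start of this bucket's id range
        self.mapping = {}

    def encode(self, value):
        # Assign the next id in this bucket's range on first sight.
        if value not in self.mapping:
            self.mapping[value] = self.base + len(self.mapping)
        return self.mapping[value]

# Build phase: each bucket's dictionary is built independently (and could
# therefore be built in parallel).
buckets = {b: BucketDictionary(b) for b in range(NUM_BUCKETS)}

def encode_value(value):
    # Only the one bucket the value hashes to must be loaded.
    return buckets[bucket_of(value)].encode(value)

ids = [encode_value(v) for v in ["a", "b", "a", "c"]]
```

Because id ranges never overlap across buckets, the assigned ids stay globally unique without any cross-bucket coordination.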
+
+&lt;p&gt;As you can see on the right, after upgrading to Kylin 4, the cubing job goes from ten steps down to two, so the build performance improvement is very noticeable.&lt;/p&gt;
+
+&lt;h3 id=&quot;query-engine&quot;&gt;Query engine&lt;/h3&gt;
+&lt;p&gt;&lt;img src=&quot;/images/blog/youzan/4 kylin4_query.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;Next is the new query engine of Kylin 4. As you can see, the calculation of Kylin on HBase depends entirely on the HBase coprocessor and the query server process. When data is read from HBase into the query server to do aggregation, sorting, etc., the single query server becomes the bottleneck. Kylin 4 switches to a fully distributed query mechanism based on Spark; what’s more, it is able to tune Spark configuration automatically in the query step!&lt;/p&gt;
+
+&lt;h2 id=&quot;how-to-optimize-performance-of-kylin-4&quot;&gt;03 How to 
optimize performance of Kylin 4&lt;/h2&gt;
+&lt;p&gt;Next, I’d like to share some performance optimizations made by 
Youzan in Kylin 4.&lt;/p&gt;
+
+&lt;h3 id=&quot;optimization-of-query-engine&quot;&gt;Optimization of query 
engine&lt;/h3&gt;
+&lt;h4 id=&quot;cache-calcite-physical-plan&quot;&gt;1.Cache Calcite physical plan&lt;/h4&gt;
+&lt;p&gt;&lt;img src=&quot;/images/blog/youzan/5 cache_calcite_plan.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;In Kylin 4, SQL is analyzed, optimized and code-generated in Calcite. This step takes about 150ms for some queries. We have added PreparedStatementCache support in Kylin 4 to cache the Calcite plan, so that structurally identical SQL doesn’t have to repeat this step. This optimization saves about 150ms per query.&lt;/p&gt;
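Generically, the idea is a small plan cache keyed by the statement text, as in this hypothetical sketch (the `plan` function is a stand-in for the analyze/optimize/codegen step, not Kylin’s or Calcite’s actual API):

```python
from functools import lru_cache

# Hypothetical sketch: cache the optimized plan by SQL text so that a
# repeated, structurally identical statement skips the ~150 ms
# analyze/optimize/codegen step. Illustrative only.

PLANNING_CALLS = 0

@lru_cache(maxsize=1024)
def plan(sql):
    global PLANNING_CALLS
    PLANNING_CALLS += 1
    # Stand-in for Calcite's analysis, optimization and code generation.
    return ("physical-plan-for", sql.strip().lower())

p1 = plan("SELECT COUNT(*) FROM kylin_sales")
p2 = plan("SELECT COUNT(*) FROM kylin_sales")  # served from the cache
```

The second call returns the cached plan object, so only the first invocation pays the planning cost.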
+
+&lt;h4 id=&quot;tunning-spark-configuration&quot;&gt;2.Tuning spark configuration&lt;/h4&gt;
+&lt;p&gt;&lt;img src=&quot;/images/blog/youzan/6 
tuning_spark_configuration.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;Kylin 4 uses Spark as its query engine. As Spark is a distributed engine designed for massive data processing, it inevitably loses some performance on small queries. We have done some tuning to catch up with the latency of Kylin on HBase for small queries.&lt;/p&gt;
+
+&lt;p&gt;Our first optimization is to make more calculations finish in memory. 
The key is to avoid data spill during aggregation, shuffle and sort. Tuning the 
following configuration is helpful.&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;Set &lt;code class=&quot;highlighter-rouge&quot;&gt;spark.sql.objectHashAggregate.sortBased.fallbackThreshold&lt;/code&gt; to a larger value to keep HashAggregate from falling back to sort-based aggregation, which really kills performance when it happens.&lt;/li&gt;
+  &lt;li&gt;Set &lt;code class=&quot;highlighter-rouge&quot;&gt;spark.shuffle.spill.initialMemoryThreshold&lt;/code&gt; to a large value to avoid too many spills during shuffle.&lt;/li&gt;
+&lt;/ul&gt;
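As a sketch, such Spark settings can be overridden for the query engine via Kylin 4’s `kylin.query.spark-conf.` prefix in kylin.properties; the values below are purely illustrative, not recommendations.

```shell
# Illustrative values only -- tune to your own workload.
cat >> $KYLIN_HOME/conf/kylin.properties <<'EOF'
# Keep hash aggregation from falling back to sort-based aggregation
kylin.query.spark-conf.spark.sql.objectHashAggregate.sortBased.fallbackThreshold=4096
# Raise the initial in-memory threshold before shuffle data spills
kylin.query.spark-conf.spark.shuffle.spill.initialMemoryThreshold=268435456
EOF
```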
+
+&lt;p&gt;Secondly, we route small queries to a Query Server that runs Spark in local mode, because the overhead of task scheduling, shuffle reads and variable broadcast is amplified for small queries in YARN/Standalone mode.&lt;/p&gt;
+
+&lt;p&gt;Thirdly, we use a RAM disk to enhance shuffle performance: mount the RAM disk as tmpfs and point spark.local.dir to a directory on it.&lt;/p&gt;
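A minimal sketch of that setup (the mount point and size are assumptions, and mounting requires root):

```shell
# Mount a RAM-backed tmpfs and point Spark's scratch space at it.
# Mount point and size are illustrative.
mkdir -p /mnt/ramdisk
mount -t tmpfs -o size=32g tmpfs /mnt/ramdisk
mkdir -p /mnt/ramdisk/spark

# Then in the Spark configuration:
#   spark.local.dir=/mnt/ramdisk/spark
```

Shuffle scratch files then live in memory rather than on disk, at the cost of reserving that RAM.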
+
+&lt;p&gt;Lastly, we disabled Spark’s whole-stage code generation for small queries, since it costs about 100ms~200ms and brings no benefit to small queries that are simple projections.&lt;/p&gt;
+
+&lt;h4 id=&quot;parquet-optimization&quot;&gt;3.Parquet optimization&lt;/h4&gt;
+&lt;p&gt;&lt;img src=&quot;/images/blog/youzan/7 
parquet_optimization.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;Optimizing parquet is also important for queries.&lt;/p&gt;
+
+&lt;p&gt;The first principle is that we’d better always include the shard-by column in our filter condition: Parquet files are sharded by the shard-by column, so filtering on it reduces the number of data files to read.&lt;/p&gt;
+
+&lt;p&gt;Then, looking inside the Parquet files, data within a file is sorted by the rowkey columns; that is to say, prefix matching in a query is as important as in Kylin on HBase. When a query condition satisfies a prefix match, row groups can be filtered with the column’s max/min index. Furthermore, we can reduce the row group size for finer index granularity, but be aware that the compression rate will be lower with smaller row groups.&lt;/p&gt;
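The min/max pruning and the granularity trade-off can be sketched as follows (a toy model of what a Parquet reader does with column statistics, not Kylin’s code):

```python
# Toy sketch of min/max row-group pruning on a sorted key, mimicking
# what a Parquet reader does with column statistics. Smaller row groups
# give a finer min/max index, so fewer rows are scanned.

def split_row_groups(sorted_keys, rows_per_group):
    groups = []
    for i in range(0, len(sorted_keys), rows_per_group):
        chunk = sorted_keys[i:i + rows_per_group]
        groups.append({"min": chunk[0], "max": chunk[-1], "rows": chunk})
    return groups

def groups_to_scan(groups, key):
    # A point filter on the leading (prefix) key skips any row group
    # whose [min, max] range cannot contain the key.
    return [g for g in groups if g["min"] <= key <= g["max"]]

keys = list(range(100))              # data already sorted by rowkey
coarse = split_row_groups(keys, 50)  # 2 large row groups
fine = split_row_groups(keys, 10)    # 10 small row groups

scanned_coarse = groups_to_scan(coarse, 42)  # must scan 50 rows
scanned_fine = groups_to_scan(fine, 42)      # must scan only 10 rows
```

Both layouts prune down to a single row group, but the finer layout scans 10 rows instead of 50, illustrating why smaller row groups help point lookups (while hurting compression).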
+
+&lt;h4 
id=&quot;dynamic-elimination-of-partitioning-dimensions&quot;&gt;4.Dynamic 
elimination of partitioning dimensions&lt;/h4&gt;
+&lt;p&gt;Kylin 4 has a new capability that older versions lack, which can reduce data reading and computing dozens of times over for some big queries. It’s often the case that the partition column is used to filter data but not as a group-by dimension. For those cases Kylin would always choose a cuboid containing the partition column, but now it is able to use a different cuboid in that query to reduce IO and computation.&lt;/p&gt;
+
+&lt;p&gt;The key to this optimization is to split a query into two parts: one part covers the segments whose data is used in full, so the partition column doesn’t need to be in the cuboid; the other part covers the partially used segments and chooses a cuboid with the partition dimension to filter the data.&lt;/p&gt;
+
+&lt;p&gt;In our tests, the response time of some queries dropped from 20s to 6s, or from 10s to 3s.&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/youzan/8 
Dynamic_elimination_of_partitioning_dimensions.png&quot; alt=&quot;&quot; 
/&gt;&lt;/p&gt;
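The segment split described above can be sketched like this (hypothetical weekly segments and date ranges; not Kylin’s actual planner code):

```python
from datetime import date

# Hypothetical sketch of splitting a time-range filter across segments:
# segments fully covered by the filter can be answered from a cuboid
# WITHOUT the partition column; only partially covered segments need a
# cuboid WITH it, to filter rows inside the segment.

def split_by_segments(segments, start, end):
    full, partial = [], []
    for seg_start, seg_end in segments:  # segment ranges, end-exclusive
        if start <= seg_start and seg_end <= end:
            full.append((seg_start, seg_end))          # fully covered
        elif seg_end > start and seg_start < end:
            partial.append((seg_start, seg_end))       # partially covered
    return full, partial

# Four weekly segments and a query range that crosses segment boundaries.
segments = [
    (date(2021, 6, 1), date(2021, 6, 8)),
    (date(2021, 6, 8), date(2021, 6, 15)),
    (date(2021, 6, 15), date(2021, 6, 22)),
    (date(2021, 6, 22), date(2021, 6, 29)),
]
full, partial = split_by_segments(segments, date(2021, 6, 5), date(2021, 6, 20))
```

Only the two boundary segments need the partition column for row-level filtering; the fully covered middle segment can use a smaller cuboid, which is where the IO and compute savings come from.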
+
+&lt;h3 id=&quot;optimization-of-build-engine&quot;&gt;Optimization of build 
engine&lt;/h3&gt;
+&lt;h4 id=&quot;cache-parent-dataset&quot;&gt;1.Cache parent dataset&lt;/h4&gt;
+&lt;p&gt;&lt;img src=&quot;/images/blog/youzan/9 cache_parent_dataset.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;Kylin builds the cube layer by layer. For a parent layer with multiple cuboids to build, we can cache the parent dataset by setting &lt;code class=&quot;highlighter-rouge&quot;&gt;kylin.engine.spark.parent-dataset.max.persist.count&lt;/code&gt; to a number greater than 0. But notice that if you set this value too small, it will limit the parallelism of the build job, as the build granularity is at the cuboid level.&lt;/p&gt;
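As a sketch, the setting is a single line in kylin.properties (the value 1 below is illustrative, not a recommendation):

```shell
# Persist up to N parent datasets during cubing (illustrative value).
# 0 disables caching; too small a value can limit build parallelism,
# because build granularity is per cuboid.
echo 'kylin.engine.spark.parent-dataset.max.persist.count=1' \
  >> $KYLIN_HOME/conf/kylin.properties
```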
+
+&lt;h2 id=&quot;practice-of-kylin-4-in-youzan&quot;&gt;04 Practice of Kylin 4 
in Youzan&lt;/h2&gt;
+&lt;p&gt;After introducing Youzan’s performance optimizations, let’s share their effect: Kylin 4’s practice in Youzan, covering the upgrade process and the performance of the online system.&lt;/p&gt;
+
+&lt;h3 id=&quot;upgrade-metadata-to-adapt-to-kylin-4&quot;&gt;Upgrade metadata 
to adapt to Kylin 4&lt;/h3&gt;
+&lt;p&gt;First of all, for Kylin 3 metadata, which is stored in HBase, we have developed a tool for seamless upgrading. We export the metadata from HBase into local files, then use the tool to transform it and write the new metadata into MySQL. We also updated the operation documents and general principles in the official Apache Kylin wiki. For more details, you can refer to: &lt;a href=&quot;https://wiki.apache.org/confluence/display/KYLIN/How+to+migrate+metadata+to+Kylin+4&quot;&gt;How to migrate metadata to Kylin 4&lt;/a&gt;.&lt;/p&gt;
+
+&lt;p&gt;Let’s give a general introduction to compatibility in the whole process. Project metadata, table metadata, permission-related metadata, and model metadata do not need to be modified. What needs to be modified is the cube metadata, namely the storage and query engine types used by the Cube. After updating these two fields, you need to recalculate the Cube signature; Kylin uses this signature internally to guard against inconsistencies after a Cube’s definition is finalized.&lt;/p&gt;
+
+&lt;h3 
id=&quot;performance-of-kylin-4-on-youzan-online-system&quot;&gt;Performance of 
Kylin 4 on Youzan online system&lt;/h3&gt;
+&lt;p&gt;&lt;img src=&quot;/images/blog/youzan/10 commodity_insight.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;After migrating the metadata to Kylin 4, let’s share the qualitative changes and substantial performance improvements in some promising scenarios. First of all, in a scenario like Commodity Insight, there is a large store with several hundred thousand commodities whose transactions, traffic, etc. we have to analyze. There are more than a dozen precise count-distinct measures in a single cube. A precise count-distinct measure is actually very inefficient if it is not optimized through pre-calculation and Bitmap; Kylin currently uses Bitmap to support it. In a scenario that requires complex queries to sort hundreds of thousands of commodities by various UVs (precise count-distinct measures), RT was reduced from 27 seconds on Kylin 2 to less than 2 seconds on Kylin 4.&lt;/p&gt;
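The bitmap idea behind the precise count-distinct measure can be sketched with plain integers as bitmaps (production systems such as Kylin use RoaringBitmap rather than raw integers): each user id sets one bit, pre-aggregated cells merge with a bitwise OR, and the distinct count is the population count, so raw rows are never rescanned at query time.

```python
# Sketch of precise count-distinct via bitmaps -- the idea behind
# Kylin's Bitmap measure, using a plain Python int as the bitmap.
# (Real systems use compressed structures like RoaringBitmap.)

def bitmap(ids):
    bm = 0
    for i in ids:
        bm |= 1 << i  # one bit per user id
    return bm

# Pre-computed per-day bitmaps for one commodity:
day1 = bitmap([1, 5, 9])
day2 = bitmap([5, 9, 42])

merged = day1 | day2         # merging pre-aggregates is a bitwise OR
uv = bin(merged).count("1")  # population count = exact distinct users
```

Unlike approximate sketches (e.g. HyperLogLog), the OR-then-popcount result is exact, which is why this supports precise count distinct.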
+
+&lt;p&gt;What I find most appealing about Kylin 4 is that it’s like a manual-transmission car: you can control its query concurrency at will, whereas in Kylin on HBase you can’t change query concurrency freely, because concurrency is completely tied to the number of regions.&lt;/p&gt;
+
+&lt;h3 id=&quot;plan-for-kylin-4-in-youzan&quot;&gt;Plan for Kylin 4 in 
Youzan&lt;/h3&gt;
+&lt;p&gt;We have tested thoroughly, fixed several bugs and improved Apache Kylin 4 over several months. Now we are migrating cubes from the older version to the newer one. For the cubes already migrated to Kylin 4, small-query performance meets our expectations, and complex-query and build performance have been a pleasant surprise. We plan to migrate all cubes from the older version to Kylin 4.&lt;/p&gt;
+</description>
+        <pubDate>Thu, 17 Jun 2021 08:00:00 -0700</pubDate>
+        
<link>http://kylin.apache.org/blog/2021/06/17/Why-did-Youzan-choose-Kylin4/</link>
+        <guid 
isPermaLink="true">http://kylin.apache.org/blog/2021/06/17/Why-did-Youzan-choose-Kylin4/</guid>
+        
+        
+        <category>blog</category>
+        
+      </item>
+    
+      <item>
         <title>有赞为什么选择 Kylin4</title>
         <description>&lt;p&gt;在 2021å¹´5月29日举办的 QCon å…
¨çƒè½¯ä»¶å¼€å‘者大会上,来自有赞的数据基础平台负责人 
郑生俊 在大数据开源框架与应用专题上分享了有赞内部对 
Kylin 4.0 的使用经历和优化实践,对于众多 Kylin 
老用户来说,这也是升级 Kylin 4 的实用攻略。&lt;/p&gt;
 
@@ -376,155 +885,6 @@ For example, a query joins two subquerie
       </item>
     
       <item>
-        <title>Why did Youzan choose Kylin4</title>
-        <description>&lt;p&gt;At the QCon Global Software Developers 
Conference held on May 29, 2021, Zheng Shengjun, head of Youzan’s data 
infrastructure platform, shared Youzan’s internal use experience and 
optimization practice of Kylin 4.0 on the meeting room of open source big data 
frameworks and applications. &lt;br /&gt;
-For many users of Kylin2/3(Kylin on HBase), this is also a chance to learn how 
and why to upgrade to Kylin 4.&lt;/p&gt;
-
-&lt;p&gt;This sharing is mainly divided into the following parts:&lt;/p&gt;
-
-&lt;ul&gt;
-  &lt;li&gt;The reason for choosing Kylin 4&lt;/li&gt;
-  &lt;li&gt;Introduction to Kylin 4&lt;/li&gt;
-  &lt;li&gt;How to optimize performance of Kylin 4&lt;/li&gt;
-  &lt;li&gt;Practice of Kylin 4 in Youzan&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;h2 id=&quot;the-reason-for-choosing-kylin-4&quot;&gt;01 The reason for 
choosing Kylin 4&lt;/h2&gt;
-
-&lt;h3 id=&quot;introduction-to-youzan&quot;&gt;Introduction to 
Youzan&lt;/h3&gt;
-&lt;p&gt;China Youzan Co., Ltd (stock code 08083.HK). is an enterprise mainly 
engaged in retail technology services.&lt;br /&gt;
-At present, it owns several tools and solutions to provide SaaS software 
products and talent services to help merchants operate mobile social e-commerce 
and new retail channels in an all-round way. &lt;br /&gt;
-Currently Youzan has hundreds of millions of consumers and 6 million existing 
merchants.&lt;/p&gt;
-
-&lt;h3 id=&quot;history-of-kylin-in-youzan&quot;&gt;History of Kylin in 
Youzan&lt;/h3&gt;
-&lt;p&gt;&lt;img src=&quot;/images/blog/youzan/1 
history_of_youzan_OLAP.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
-
-&lt;p&gt;First of all, I would like to share why Youzan chose to upgrade to 
Kylin 4. Here, let me briefly reviewed the history of Youzan OLAP 
infra.&lt;/p&gt;
-
-&lt;p&gt;In the early days of Youzan, in order to iterate develop process 
quickly, we chose the method of pre-computation + MySQL; in 2018, Druid was 
introduced because of query flexibility and development efficiency, but there 
were problems such as low pre-aggregation, not supporting precisely count 
distinct measure. In this situation, Youzan introduced Apache Kylin and 
ClickHouse. Kylin supports high aggregation, precisely count distinct measure 
and the lowest RT, while ClickHouse is quite flexible in usage(ad hoc 
query).&lt;/p&gt;
-
-&lt;p&gt;From the introduction of Kylin in 2018 to now, Youzan has used Kylin 
for more than three years. With the continuous enrichment of business scenarios 
and the continuous accumulation of data volume, Youzan currently has 6 million 
existing merchants, GMV in 2020 is 107.3 billion, and the daily build data 
volume is 10 billion +. At present, Kylin has basically covered all the 
business scenarios of Youzan.&lt;/p&gt;
-
-&lt;h3 id=&quot;the-challenges-of-kylin-3&quot;&gt;The challenges of Kylin 
3&lt;/h3&gt;
-&lt;p&gt;With Youzan’s rapid development and in-depth use of Kylin, we also 
encountered some challenges:&lt;/p&gt;
-
-&lt;ul&gt;
-  &lt;li&gt;First of all, the build performance of Kylin on HBase cannot meet 
the favorable expectations, and the build performance will affect the user’s 
failure recovery time and stability experience;&lt;/li&gt;
-  &lt;li&gt;Secondly, with the access of more large merchants (tens of 
millions of members in a single store, with hundreds of thousands of goods for 
each store), it also brings great challenges to our OLAP system. Kylin on HBase 
is limited by the single-point query of Query Server, and cannot support these 
complex scenarios well;&lt;/li&gt;
-  &lt;li&gt;Finally, because HBase is not a cloud-native system, it is 
difficult to achieve flexible scale up and scale down. With the continuous 
growth of data volume, this system has peaks and valleys for businesses, which 
results in the average resource utilization rate is not high enough.&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;p&gt;Faced with these challenges, Youzan chose to move closer and upgrade 
to the more cloud-native Apache Kylin 4.&lt;/p&gt;
-
-&lt;h2 id=&quot;introduction-to-kylin-4&quot;&gt;02 Introduction to Kylin 
4&lt;/h2&gt;
-&lt;p&gt;First of all, let’s introduce the main advantages of Kylin 4. 
Apache Kylin 4 completely depends on Spark for cubing job and query. It can 
make full use of Spark’s parallelization, quantization(向量化), and global 
dynamic code generation technologies to improve the efficiency of large 
queries.&lt;br /&gt;
-Here is a brief introduction to the principle of Kylin 4, that is storage 
engine, build engine and query engine.&lt;/p&gt;
-
-&lt;h3 id=&quot;storage-engine&quot;&gt;Storage engine&lt;/h3&gt;
-&lt;p&gt;&lt;img src=&quot;/images/blog/youzan/2 kylin4_storage.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
-
-&lt;p&gt;First of all, let’s take a look at the new storage engine, 
comparison between Kylin on HBase and Kylin on Parquet. The cuboid data of 
Kylin on HBase is stored in the table of HBase. Single Segment corresponds to 
one HBase table. Aggregation is pushed down to HBase coprocessor.&lt;/p&gt;
-
-&lt;p&gt;But as we know,  HBase is not a real Columnar Storage and its 
throughput is not enough for OLAP System. Kylin 4 replaces HBase with Parquet, 
all the data is stored in files. Each segment will have a corresponding HDFS 
directory. All queries and cubing jobs read and write files without HBase . 
Although there will be a certain loss of performance for simple queries, the 
improvement brought about by complex queries is more considerable and 
worthwhile.&lt;/p&gt;
-
-&lt;h3 id=&quot;build-engine&quot;&gt;Build engine&lt;/h3&gt;
-&lt;p&gt;&lt;img src=&quot;/images/blog/youzan/3 kylin4_build_engine.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
-
-&lt;p&gt;The second is the new build engine. Based on our test, the build 
speed of Kylin on Parquet has been optimized from 82 minutes to 15 minutes. 
There are several reasons:&lt;/p&gt;
-
-&lt;ul&gt;
-  &lt;li&gt;Kylin 4 removes the encoding of the dimension, eliminating a 
building step of encoding;&lt;/li&gt;
-  &lt;li&gt;Removed the HBase File generation step;&lt;/li&gt;
-  &lt;li&gt;Kylin on Parquet changes the granularity of cubing to cuboid 
level, which is conducive to further improving parallelism of cubing 
job.&lt;/li&gt;
-  &lt;li&gt;Enhanced implementation for global dictionary. In the new 
algorithm, dictionary and source data are hashed into the same buckets, making 
it possible for loading only piece of dictionary bucket to encode source 
data.&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;p&gt;As you can see on the right, after upgradation to Kylin 4, cubing job 
changes from ten steps to two steps, the performance improvement of the 
construction is very obvious.&lt;/p&gt;
-
-&lt;h3 id=&quot;query-engine&quot;&gt;Query engine&lt;/h3&gt;
-&lt;p&gt;&lt;img src=&quot;/images/blog/youzan/4 kylin4_query.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
-
-&lt;p&gt;Next is the new query engine of Kylin 4. As you can see, the 
calculation of Kylin on HBase is completely dependent on the coprocessor of 
HBase and query server process. When the data is read from HBase into query 
server to do aggregation, sorting, etc, the bottleneck will be restricted by 
the single point of query server. But Kylin 4 is converted to a fully 
distributed query mechanism based on Spark, what’s more, it ‘s able to do 
configuration tuning automatically in spark query step !&lt;/p&gt;
-
-&lt;h2 id=&quot;how-to-optimize-performance-of-kylin-4&quot;&gt;03 How to 
optimize performance of Kylin 4&lt;/h2&gt;
-&lt;p&gt;Next, I’d like to share some performance optimizations made by 
Youzan in Kylin 4.&lt;/p&gt;
-
-&lt;h3 id=&quot;optimization-of-query-engine&quot;&gt;Optimization of query 
engine&lt;/h3&gt;
-&lt;p&gt;#### 1.Cache Calcite physical plan&lt;br /&gt;
-&lt;img src=&quot;/images/blog/youzan/5 cache_calcite_plan.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
-
-&lt;p&gt;In Kylin4, SQL will be analyzed, optimized and do code generation in 
calcite. This step takes up about 150ms for some queries. We have supported 
PreparedStatementCache in Kylin4 to cache calcite plan, so that the structured 
SQL don’t have to do the same step again. With this optimization it saved 
about 150ms of time cost.&lt;/p&gt;
-
-&lt;h4 id=&quot;tunning-spark-configuration&quot;&gt;2.Tunning spark 
configuration&lt;/h4&gt;
-&lt;p&gt;&lt;img src=&quot;/images/blog/youzan/6 
tuning_spark_configuration.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
-
-&lt;p&gt;Kylin4 uses spark as query engine. As spark is a distributed engine 
designed for massive data processing, it’s inevitable to loose some 
performance for small queries. We have tried to do some tuning to catch up with 
the latency in Kylin on HBase for small queries.&lt;/p&gt;
-
-&lt;p&gt;Our first optimization is to make more calculations finish in memory. 
The key is to avoid data spill during aggregation, shuffle and sort. Tuning the 
following configuration is helpful.&lt;/p&gt;
-
-&lt;ul&gt;
-  &lt;li&gt;1.set &lt;code 
class=&quot;highlighter-rouge&quot;&gt;spark.sql.objectHashAggregate.sortBased.fallbackThreshold&lt;/code&gt;
 to larger value to avoid HashAggregate fall back to Sort Based Aggregate, 
which really kills performance when happens.&lt;/li&gt;
-  &lt;li&gt;2.set &lt;code 
class=&quot;highlighter-rouge&quot;&gt;spark.shuffle.spill.initialMemoryThreshold&lt;/code&gt;
 to a large value to avoid to many spills during shuffle.&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;p&gt;Secondly, we route small queries to Query Server which run spark in 
local mode. Because the overhead of task schedule, shuffle read and variable 
broadcast is enlarged for small queries on YARN/Standalone mode.&lt;/p&gt;
-
-&lt;p&gt;Thirdly, we use RAM disk to enhance shuffle performance. Mount RAM 
disk as TMPFS and set spark.local.dir to directory using RAM disk.&lt;/p&gt;
-
-&lt;p&gt;Lastly, we disabled spark’s whole stage code generation for small 
queries, for spark’s whole stage code generation will cost about 100ms~200ms, 
whereas it’s not beneficial to small queries which is a simple 
project.&lt;/p&gt;
-
-&lt;h4 id=&quot;parquet-optimization&quot;&gt;3.Parquet optimization&lt;/h4&gt;
-&lt;p&gt;&lt;img src=&quot;/images/blog/youzan/7 
parquet_optimization.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
-
-&lt;p&gt;Optimizing parquet is also important for queries.&lt;/p&gt;
-
-&lt;p&gt;The first principal is that we’d better always include shard by 
column in our filter condition, for parquet files are shard by shard-by-column, 
filter using shard by column reduces the data files to read.&lt;/p&gt;
-
-&lt;p&gt;Then look into parquet files, data within files are sorted by rowkey 
columns, that is to say, prefix match in query is as important as Kylin on 
HBase. When a query condition satisfies prefix match, it can filter row groups 
with column’s max/min index. Furthermore, we can reduce row group size to 
make finer index granularity, but be aware that the compression rate will be 
lower if we set row group size smaller.&lt;/p&gt;
-
-&lt;h4 
id=&quot;dynamic-elimination-of-partitioning-dimensions&quot;&gt;4.Dynamic 
elimination of partitioning dimensions&lt;/h4&gt;
-&lt;p&gt;Kylin4 have a new ability that the older version is not capable of, 
which is able to reduce dozens of times of data reading and computing for some 
big queries. It’s offen the case that partition column is used to filter data 
but not used as group dimension. For those cases Kylin would always choose 
cuboid with partition column, but now it is able to use different cuboid in 
that query to reduce IO read and computing.&lt;/p&gt;
-
-&lt;p&gt;The key of this optimization is to split a query into two parts, one 
of the part uses all segment’s data so that partition column doesn’t have 
to be included in cuboid, the other part that uses part of segments data will 
choose cuboid with partition dimension to do the data filter.&lt;/p&gt;
-
-&lt;p&gt;We have tested that in some situations the response time reduced from 
20s to 6s, 10s to 3s.&lt;/p&gt;
-
-&lt;p&gt;&lt;img src=&quot;/images/blog/youzan/8 
Dynamic_elimination_of_partitioning_dimensions.png&quot; alt=&quot;&quot; 
/&gt;&lt;/p&gt;
-
-&lt;h3 id=&quot;optimization-of-build-engine&quot;&gt;Optimization of build 
engine&lt;/h3&gt;
-&lt;p&gt;#### 1.cache parent dataset&lt;br /&gt;
-&lt;img src=&quot;/images/blog/youzan/9 cache_parent_dataset.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
-
-&lt;p&gt;Kylin build cube layer by layer. For a parent layer with multi 
cuboids to build, we can choose to cache parent dataset by setting 
kylin.engine.spark.parent-dataset.max.persist.count to a number greater than 0. 
But notice that if you set this value too small, it will affect the parallelism 
of build job, as the build granularity is at cuboid level.&lt;/p&gt;
-
-&lt;h2 id=&quot;practice-of-kylin-4-in-youzan&quot;&gt;04 Practice of Kylin 4 
in Youzan&lt;/h2&gt;
-&lt;p&gt;After introducing Youzan’s experience of performance optimization, 
let’s share the optimization effect. That is, Kylin 4’s practice in Youzan 
includes the upgrade process and the performance of online system.&lt;/p&gt;
-
-&lt;h3 id=&quot;upgrade-metadata-to-adapt-to-kylin-4&quot;&gt;Upgrade metadata 
to adapt to Kylin 4&lt;/h3&gt;
-&lt;p&gt;First of all, for metadata for Kylin 3 which stored on HBase, we have 
developed a tool for seamless upgrading of metadata. First of all, our metadata 
in Kylin on HBase is stored in HBase. We export the metadata in HBase into 
local files, and then use tools to transform and write back the new metadata 
into MySQL. We also updated the operation documents and general principles in 
the official wiki of Apache Kylin. For more details, you can refer to: &lt;a 
href=&quot;https://wiki.apache.org/confluence/display/KYLIN/How+to+migrate+metadata+to+Kylin+4&quot;&gt;How
 to migrate metadata to Kylin 4&lt;/a&gt;.&lt;/p&gt;
-
-&lt;p&gt;Let’s give a general introduction to some compatibility in the 
whole process. The project metadata, tables metadata, permission-related 
metadata, and model metadata do not need be modified. What needs to be modified 
is the cube metadata, including the type of storage and query used by Cube. 
After updating these two fields, you need to recalculate the Cube signature. 
The function of this signature is designed internally by Kylin to avoid some 
problems caused by Cube after Cube is determined.&lt;/p&gt;
-
-&lt;h3 
id=&quot;performance-of-kylin-4-on-youzan-online-system&quot;&gt;Performance of 
Kylin 4 on Youzan online system&lt;/h3&gt;
-&lt;p&gt;&lt;img src=&quot;/images/blog/youzan/10 commodity_insight.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
-
-&lt;p&gt;After the migration of metadata to Kylin4, let’s share the 
qualitative changes and substantial performance improvements brought about by 
some of the promising scenarios. First of all, in a scenario like Commodity 
Insight, there is a large store with several hundred thousand of commodities. 
We have to analyze its transactions and traffic, etc. There are more than a 
dozen precise precisely count distinct measures in single cube. Precisely count 
distinct measure is actually very inefficient if it is not optimized through 
pre-calculation and Bitmap. Kylin currently uses Bitmap to support precisely 
count distinct measure. In a scene that requires complex queries to sort 
hundreds of thousands of commodities in various UV(precisely count distinct 
measure), the RT of Kylin 2 is 27 seconds, while the RT of Kylin 4 is reduced 
from 27 seconds to less than 2 seconds.&lt;/p&gt;
-
-&lt;p&gt;What I find most appealing to me about Kylin 4 is that it’s like a 
manual transmission car, you can control its query concurrency at your will, 
whereas you can’t change query concurrency in Kylin on HBase freely, because 
its concurrency is completely tied to the number of regions.&lt;/p&gt;
-
-&lt;h3 id=&quot;plan-for-kylin-4-in-youzan&quot;&gt;Plan for Kylin 4 in 
Youzan&lt;/h3&gt;
-&lt;p&gt;We have made full test, fixed several bugs and improved apache KYLIN4 
for several months. Now we are migrating cubes from older version to newer 
version. For the cubes already migrated to KYLIN4, its small queries’ 
performance meet our expectations, its complex query and build performance did 
bring us a big surprise. We are planning to migrate all cubes from older 
version to Kylin4.&lt;/p&gt;
-</description>
-        <pubDate>Thu, 17 Jun 2021 08:00:00 -0700</pubDate>
-        
<link>http://kylin.apache.org/blog/2021/06/17/Why-did-Youzan-choose-Kylin4/</link>
-        <guid 
isPermaLink="true">http://kylin.apache.org/blog/2021/06/17/Why-did-Youzan-choose-Kylin4/</guid>
-        
-        
-        <category>blog</category>
-        
-      </item>
-    
-      <item>
         <title>你离可视化酷炫大屏只差一套 Kylin + Davinci</title>
         <description>&lt;p&gt;Kylin 提供与 BI 工具的整合能力,如 
Tableau,PowerBI/Excel,MSTR,QlikSense,Hue 和 
SuperSet。但就可视化工具而言,Davinci 
良好的交互性和个性化的可视化大屏展现效果,使其与 Kylin 
的结合能让大部分用户有更好的可视化分析体验。&lt;/p&gt;
 
@@ -1394,304 +1754,6 @@ if (assignments.getPartitionsByReplicaSe
         
         
         <category>blog</category>
-        
-      </item>
-    
-      <item>
-        <title>Use Python for Data Science with Apache Kylin</title>
-        <description>&lt;p&gt;Original from &lt;a 
href=&quot;https://kyligence.io/blog/use-python-for-data-science-with-apache-kylin/&quot;&gt;Kyligence
 tech blog&lt;/a&gt;&lt;/p&gt;
-
-&lt;p&gt;In today’s world, Big Data, data science, and machine learning 
analytics and are not only hot topics, they’re also an essential part of our 
society. Data is everywhere, and the amount of digital data that exists is 
growing at a rapid rate. According to &lt;a 
href=&quot;https://www.forbes.com/sites/tomcoughlin/2018/11/27/175-zettabytes-by-2025/#622d803d5459&quot;&gt;Forbes&lt;/a&gt;,
 around 175 Zettabytes of data will be generated annually by 2025.&lt;/p&gt;
-
-&lt;p&gt;The economy, healthcare, agriculture, energy, media, education and 
all other critical human activities rely more and more on the advanced 
processing and analysis of large quantities of collected data. However, these 
massive datasets pose a real challenge to data analytics, data mining, machine 
learning and data science.&lt;/p&gt;
-
-&lt;p&gt;Data scientists and analysts have often expressed frustration while 
trying to work with Big Data. The good news is that there is a solution: Apache 
Kylin. Kylin solves this Big Data dilemma by integrating with Python to help 
analysts &amp;amp; data scientists finally gain unfettered access to their 
large-scale (terabyte and petabyte) datasets.&lt;/p&gt;
-
-&lt;h2 id=&quot;machine-learning-challenges&quot;&gt;Machine Learning 
Challenges&lt;/h2&gt;
-
-&lt;p&gt;One of the main challenges machine learning (ML) engineers and data 
scientists encounter when running computations with Big Data comes from the 
principle that higher volume or scale equates to greater computational 
complexity.&lt;/p&gt;
-
-&lt;p&gt;Consequently, as datasets scale up, even trivial operations can 
become costly. Moreover, as data volume rises, algorithm performance becomes 
increasingly dependent on the architecture used to store and move data. 
Parallel data structures, data partitioning and placement, and data reuse 
become more important as the amount of data one is working with grows.&lt;/p&gt;
-
-&lt;h2 id=&quot;what-apache-kylin-is-and-how-it-helps&quot;&gt;What Apache 
Kylin Is and How It Helps&lt;/h2&gt;
-
-&lt;p&gt;Apache Kylin is an open source distributed Big Data analytics engine 
designed to provide a SQL interface for multi-dimensional analysis (MOLAP) on 
Hadoop. It allows enterprises to rapidly analyze their massive datasets in a 
fraction of the time it would take using other approaches or Big Data analytics 
tools.&lt;/p&gt;
-
-&lt;p&gt;With Apache Kylin, data teams are able to dramatically cut down on 
analytics processing time and associated IT and ops costs. It’s able to do 
this by pre-computing large datasets into one (or a small number of) 
OLAP cubes and storing them in a columnar database. This allows ML engineers, 
data scientists, and analysts to quickly access the data and perform data 
mining activities to uncover hidden trends easily.&lt;/p&gt;
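To make the pre-computation idea concrete, here is a toy sketch in plain Python. It is not Kylin's implementation, and the table, dimension, and measure names are invented for illustration; it only shows why a pre-aggregated "cuboid" turns a group-by query into a constant-time lookup.

```python
from collections import defaultdict

# Hypothetical raw fact rows (made-up data, not a real Kylin table).
rows = [
    {"region": "NA", "year": 2020, "sales": 10},
    {"region": "NA", "year": 2021, "sales": 15},
    {"region": "EU", "year": 2020, "sales": 7},
]

# Pre-compute the (region, year) cuboid once, up front.
cuboid = defaultdict(int)
for r in rows:
    cuboid[(r["region"], r["year"])] += r["sales"]

# A later "query" is now a dictionary lookup, not a scan of raw rows.
print(cuboid[("NA", 2020)])  # -> 10
```

Kylin does this at cluster scale for many dimension combinations at once, but the trade-off is the same: extra storage and build time in exchange for sub-second query latency.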
-
-&lt;p&gt;The following diagram illustrates how machine learning and data 
science activities on big data become much easier when Apache Kylin is 
introduced.&lt;/p&gt;
-
-&lt;p&gt;&lt;img src=&quot;/images/blog/python-data-science/diagram1.png&quot; 
alt=&quot;diagram1&quot; /&gt;&lt;/p&gt;
-
-&lt;h2 id=&quot;how-to-integrate-python-with-apache-kylin&quot;&gt;How to 
Integrate Python with Apache Kylin&lt;/h2&gt;
-
-&lt;p&gt;Python has quickly risen in prominence to take its spot as one of the 
leading programming languages in the data analytics field (as well as outside 
the field). With its ease of use and extensive collection of libraries, Python 
has become well-positioned to take on Big Data.&lt;/p&gt;
-
-&lt;p&gt;Python also provides plenty of data mining tools to assist in the 
handling of data, offering up a variety of applications already adopted by the 
machine learning and data science communities. Simply put, if you’re working 
with Big Data, there’s probably a way Python can make your job 
easier.&lt;/p&gt;
-
-&lt;p&gt;Apache Kylin can be easily integrated with Python with support from 
&lt;a 
href=&quot;https://github.com/Kyligence/kylinpy&quot;&gt;Kylinpy&lt;/a&gt;. 
Kylinpy is a python library that provides a SQLAlchemy Dialect implementation. 
Thus, any application that uses SQLAlchemy can now query Kylin OLAP cubes. 
Additionally, it also allows users to access data via Pandas data 
frames.&lt;/p&gt;
-
-&lt;p&gt;&lt;strong&gt;Sample code to access data via 
Pandas:&lt;/strong&gt;&lt;/p&gt;
-
-&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre 
class=&quot;highlight&quot;&gt;&lt;code&gt;$ python
-
- &amp;gt;&amp;gt;&amp;gt; import sqlalchemy as sa
- &amp;gt;&amp;gt;&amp;gt; import pandas as pd
- &amp;gt;&amp;gt;&amp;gt; kylin_engine = 
sa.create_engine(&#39;kylin://&amp;lt;username&amp;gt;:&amp;lt;password&amp;gt;@&amp;lt;IP&amp;gt;:&amp;lt;PORT&amp;gt;/&amp;lt;project_name&amp;gt;&#39;,
- ...     connect_args={&#39;is_ssl&#39;: True, &#39;timeout&#39;: 60})
- &amp;gt;&amp;gt;&amp;gt; sql = &#39;select * from kylin_sales limit 10&#39;
- &amp;gt;&amp;gt;&amp;gt; dataframe = pd.read_sql(sql, kylin_engine)
- &amp;gt;&amp;gt;&amp;gt; print(dataframe)
-&lt;/code&gt;&lt;/pre&gt;
-&lt;/div&gt;
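The `kylin://` connection string in the sample above follows the standard SQLAlchemy URL shape. The sketch below only assembles and parses such a URL with the standard library; the credentials, host, and project name are placeholders, and nothing here contacts a server.

```python
from urllib.parse import urlsplit

# Placeholder values for illustration; substitute your own deployment's
# username, password, host, port, and Kylin project name.
url = "kylin://{user}:{pwd}@{host}:{port}/{project}".format(
    user="ADMIN", pwd="KYLIN", host="localhost", port=7070,
    project="learn_kylin")

parts = urlsplit(url)
print(parts.scheme)    # -> kylin
print(parts.hostname)  # -> localhost
print(parts.path)      # -> /learn_kylin
```

Because Kylinpy registers `kylin` as a SQLAlchemy dialect, this one URL is all that distinguishes a Kylin engine from any other SQLAlchemy engine passed to `pd.read_sql`.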
-
-&lt;p&gt;&lt;strong&gt;Benefits of using Apache Kylin as Data 
Source:&lt;/strong&gt;&lt;/p&gt;
-

[... 249 lines stripped ...]
