Modified: kylin/site/feed.xml
URL: http://svn.apache.org/viewvc/kylin/site/feed.xml?rev=1890886&r1=1890885&r2=1890886&view=diff
==============================================================================
--- kylin/site/feed.xml (original)
+++ kylin/site/feed.xml Fri Jun 18 02:57:25 2021
@@ -19,11 +19,319 @@
 <description>Apache Kylin Home</description>
 <link>http://kylin.apache.org/</link>
 <atom:link href="http://kylin.apache.org/feed.xml" rel="self" type="application/rss+xml"/>
- <pubDate>Wed, 02 Jun 2021 20:18:35 -0700</pubDate>
- <lastBuildDate>Wed, 02 Jun 2021 20:18:35 -0700</lastBuildDate>
+ <pubDate>Thu, 17 Jun 2021 19:32:15 -0700</pubDate>
+ <lastBuildDate>Thu, 17 Jun 2021 19:32:15 -0700</lastBuildDate>
 <generator>Jekyll v2.5.3</generator>
 <item>
+ <title>Why did Youzan choose Kylin4</title>
+ <description><p>At the QCon Global Software Developers Conference held on May 29, 2021, Zheng Shengjun, head of Youzan's data infrastructure platform, shared Youzan's experience of using and optimizing Kylin 4.0 in the track on open source big data frameworks and applications. <br />
+For many users of Kylin 2/3 (Kylin on HBase), this is also a chance to learn how and why to upgrade to Kylin 4.</p>
+
+<p>The talk is divided into the following parts:</p>
+
+<ul>
+ <li>The reason for choosing Kylin 4</li>
+ <li>Introduction to Kylin 4</li>
+ <li>How to optimize performance of Kylin 4</li>
+ <li>Practice of Kylin 4 in Youzan</li>
+</ul>
+
+<h2 id="the-reason-for-choosing-kylin-4">01 The reason for choosing Kylin 4</h2>
+
+<h3 id="introduction-to-youzan">Introduction to Youzan</h3>
+<p>China Youzan Co., Ltd (stock code 08083.HK) is an enterprise mainly engaged in retail technology services.<br />
+It currently offers several tools and solutions that provide SaaS software products and talent services, helping merchants operate mobile social e-commerce and new retail channels in an all-round way. 
<br />
+Youzan currently has hundreds of millions of consumers and 6 million existing merchants.</p>
+
+<h3 id="history-of-kylin-in-youzan">History of Kylin in Youzan</h3>
+<p><img src="/images/blog/youzan/1 history_of_youzan_OLAP.png" alt="" /></p>
+
+<p>First of all, I would like to share why Youzan chose to upgrade to Kylin 4. Let me briefly review the history of Youzan's OLAP infrastructure.</p>
+
+<p>In the early days of Youzan, in order to iterate quickly, we chose pre-computation + MySQL. In 2018, Druid was introduced for its query flexibility and development efficiency, but it had problems such as a low degree of pre-aggregation and no support for precise count distinct measures. In this situation, Youzan introduced Apache Kylin and ClickHouse. Kylin supports a high degree of aggregation, precise count distinct measures and the lowest RT, while ClickHouse is quite flexible in usage (ad hoc queries).</p>
+
+<p>Youzan has used Kylin for more than three years since introducing it in 2018. With the continuous enrichment of business scenarios and accumulation of data, Youzan currently has 6 million existing merchants, GMV in 2020 was 107.3 billion, and the daily build data volume is more than 10 billion. Kylin now covers essentially all of Youzan's business scenarios.</p>
+
+<h3 id="the-challenges-of-kylin-3">The challenges of Kylin 3</h3>
+<p>With Youzan's rapid development and in-depth use of Kylin, we also encountered some challenges:</p>
+
+<ul>
+ <li>First of all, the build performance of Kylin on HBase cannot meet our expectations, and build performance affects users' failure recovery time and their experience of stability;</li>
+ <li>Secondly, the onboarding of more large merchants (tens of millions of members in a single store, with hundreds of thousands of goods per store) brings great challenges to our OLAP system. 
Kylin on HBase is limited by the single-point query of the Query Server, and cannot support these complex scenarios well;</li>
+ <li>Finally, because HBase is not a cloud-native system, it is difficult to scale up and down flexibly. As data volume keeps growing, merchants' usage has peaks and valleys, so the average resource utilization rate is not high enough.</li>
+</ul>
+
+<p>Faced with these challenges, Youzan chose to move towards and upgrade to the more cloud-native Apache Kylin 4.</p>
+
+<h2 id="introduction-to-kylin-4">02 Introduction to Kylin 4</h2>
+<p>First of all, let's introduce the main advantages of Kylin 4. Apache Kylin 4 depends entirely on Spark for cubing jobs and queries. It can make full use of Spark's parallelization, vectorization, and whole-stage code generation to improve the efficiency of large queries.<br />
+Here is a brief introduction to the principles of Kylin 4, covering the storage engine, build engine and query engine.</p>
+
+<h3 id="storage-engine">Storage engine</h3>
+<p><img src="/images/blog/youzan/2 kylin4_storage.png" alt="" /></p>
+
+<p>First, let's look at the new storage engine by comparing Kylin on HBase with Kylin on Parquet. In Kylin on HBase, cuboid data is stored in HBase tables. Each segment corresponds to one HBase table. Aggregation is pushed down to the HBase coprocessor.</p>
+
+<p>But as we know, HBase is not a true columnar store and its throughput is not enough for an OLAP system. Kylin 4 replaces HBase with Parquet, so all data is stored in files. Each segment has a corresponding HDFS directory. All queries and cubing jobs read and write files directly, without going through HBase. 
Although simple queries lose some performance, the improvement for complex queries is more considerable and worthwhile.</p>
+
+<h3 id="build-engine">Build engine</h3>
+<p><img src="/images/blog/youzan/3 kylin4_build_engine.png" alt="" /></p>
+
+<p>The second is the new build engine. In our test, Kylin on Parquet reduced the build time from 82 minutes to 15 minutes. There are several reasons:</p>
+
+<ul>
+ <li>Kylin 4 removes dimension encoding, eliminating one encoding step from the build;</li>
+ <li>It removes the HBase File generation step;</li>
+ <li>Kylin on Parquet changes the granularity of cubing to the cuboid level, which helps further improve the parallelism of cubing jobs;</li>
+ <li>The global dictionary implementation is enhanced. In the new algorithm, the dictionary and the source data are hashed into the same buckets, so only one dictionary bucket needs to be loaded to encode the corresponding source data.</li>
+</ul>
+
+<p>As you can see on the right, after upgrading to Kylin 4 the cubing job shrinks from ten steps to two, and the improvement in build performance is very obvious.</p>
+
+<h3 id="query-engine">Query engine</h3>
+<p><img src="/images/blog/youzan/4 kylin4_query.png" alt="" /></p>
+
+<p>Next is the new query engine of Kylin 4. As you can see, computation in Kylin on HBase depends entirely on the HBase coprocessor and the query server process. When data is read from HBase into the query server for aggregation, sorting and so on, the single query server becomes the bottleneck. 
Kylin 4, by contrast, uses a fully distributed query mechanism based on Spark; what's more, it can tune Spark configuration automatically at query time!</p>
+
+<h2 id="how-to-optimize-performance-of-kylin-4">03 How to optimize performance of Kylin 4</h2>
+<p>Next, I'd like to share some performance optimizations made by Youzan in Kylin 4.</p>
+
+<h3 id="optimization-of-query-engine">Optimization of query engine</h3>
+<p>#### 1. Cache Calcite physical plan<br />
+<img src="/images/blog/youzan/5 cache_calcite_plan.png" alt="" /></p>
+
+<p>In Kylin 4, SQL is parsed, optimized and turned into generated code by Calcite. This step takes about 150ms for some queries. We added PreparedStatementCache support to Kylin 4 to cache the Calcite plan, so that structured SQL does not have to go through the same step again. This optimization saves about 150ms.</p>
+
+<h4 id="tunning-spark-configuration">2. Tuning Spark configuration</h4>
+<p><img src="/images/blog/youzan/6 tuning_spark_configuration.png" alt="" /></p>
+
+<p>Kylin 4 uses Spark as its query engine. As Spark is a distributed engine designed for massive data processing, it inevitably loses some performance on small queries. We did some tuning to bring small-query latency closer to that of Kylin on HBase.</p>
+
+<p>Our first optimization is to make more calculations finish in memory. The key is to avoid data spill during aggregation, shuffle and sort. 
Tuning the following configuration is helpful.</p>
+
+<ul>
+ <li>1. Set <code class="highlighter-rouge">spark.sql.objectHashAggregate.sortBased.fallbackThreshold</code> to a larger value to prevent HashAggregate from falling back to sort-based aggregation, which severely hurts performance when it happens.</li>
+ <li>2. Set <code class="highlighter-rouge">spark.shuffle.spill.initialMemoryThreshold</code> to a large value to avoid too many spills during shuffle.</li>
+</ul>
+
+<p>Secondly, we route small queries to a Query Server that runs Spark in local mode, because the overhead of task scheduling, shuffle read and variable broadcast is amplified for small queries in YARN/Standalone mode.</p>
+
+<p>Thirdly, we use a RAM disk to improve shuffle performance: mount the RAM disk as tmpfs and set spark.local.dir to a directory on it.</p>
+
+<p>Lastly, we disabled Spark's whole-stage code generation for small queries, because it costs about 100ms~200ms and brings no benefit to small queries that are simple projections.</p>
+
+<h4 id="parquet-optimization">3. Parquet optimization</h4>
+<p><img src="/images/blog/youzan/7 parquet_optimization.png" alt="" /></p>
+
+<p>Optimizing Parquet is also important for queries.</p>
+
+<p>The first principle is to always include the shard-by column in the filter condition where possible: Parquet files are sharded by the shard-by column, so filtering on it reduces the number of data files to read.</p>
+
+<p>Looking inside the Parquet files, data is sorted by the rowkey columns, which means that prefix matching in queries is as important as it is in Kylin on HBase. When a query condition satisfies prefix matching, row groups can be filtered using the column's max/min index. 
Furthermore, we can reduce the row group size to make the index granularity finer, but be aware that a smaller row group size lowers the compression rate.</p>
+
+<h4 id="dynamic-elimination-of-partitioning-dimensions">4. Dynamic elimination of partitioning dimensions</h4>
+<p>Kylin 4 has a new ability that older versions lack, which can reduce data reading and computation by dozens of times for some big queries. It is often the case that the partition column is used to filter data but not as a group-by dimension. In those cases Kylin used to always choose a cuboid that includes the partition column, but now it can use different cuboids within the same query to reduce IO and computation.</p>
+
+<p>The key to this optimization is to split a query into two parts: one part covers the segments whose data is used in full, so the partition column does not have to be included in the cuboid; the other part covers the segments that are only partially used, and chooses a cuboid with the partition dimension to filter the data.</p>
+
+<p>In our tests, the response time in some situations dropped from 20s to 6s, and from 10s to 3s.</p>
+
+<p><img src="/images/blog/youzan/8 Dynamic_elimination_of_partitioning_dimensions.png" alt="" /></p>
+
+<h3 id="optimization-of-build-engine">Optimization of build engine</h3>
+<p>#### 1. Cache parent dataset<br />
+<img src="/images/blog/youzan/9 cache_parent_dataset.png" alt="" /></p>
+
+<p>Kylin builds a cube layer by layer. For a parent layer from which multiple cuboids are built, we can cache the parent dataset by setting kylin.engine.spark.parent-dataset.max.persist.count to a number greater than 0. But note that setting this value too small will affect the parallelism of the build job, as the build granularity is at the cuboid level.</p>
+
+<h2 id="practice-of-kylin-4-in-youzan">04 Practice of Kylin 4 in Youzan</h2>
+<p>Having introduced Youzan's experience of performance optimization, let's share the results. 
That is, Kylin 4's practice in Youzan, including the upgrade process and the performance of the online system.</p>
+
+<h3 id="upgrade-metadata-to-adapt-to-kylin-4">Upgrade metadata to adapt to Kylin 4</h3>
+<p>First of all, we developed a tool for seamlessly upgrading Kylin 3 metadata. In Kylin on HBase, metadata is stored in HBase. We export the metadata from HBase into local files, and then use the tool to transform it and write the new metadata into MySQL. We also updated the operation documents and general principles in the official Apache Kylin wiki. For more details, you can refer to: <a href="https://wiki.apache.org/confluence/display/KYLIN/How+to+migrate+metadata+to+Kylin+4">How to migrate metadata to Kylin 4</a>.</p>
+
+<p>Here is a general introduction to compatibility across the whole process. The project metadata, table metadata, permission-related metadata, and model metadata do not need to be modified. What needs to be modified is the cube metadata, namely the storage type and query type used by the cube. After updating these two fields, you need to recalculate the cube signature. Kylin uses this signature internally to guard against problems caused by modifying a cube after its definition has been fixed.</p>
+
+<h3 id="performance-of-kylin-4-on-youzan-online-system">Performance of Kylin 4 on Youzan online system</h3>
+<p><img src="/images/blog/youzan/10 commodity_insight.png" alt="" /></p>
+
+<p>After the migration of metadata to Kylin 4, let's share the qualitative changes and substantial performance improvements seen in some of our scenarios. First of all, in a scenario like Commodity Insight, there is a large store with several hundred thousand commodities. We have to analyze its transactions, traffic and so on. There are more than a dozen precise count distinct measures in a single cube. 
A precise count distinct measure is actually very inefficient if it is not optimized through pre-calculation and Bitmap. Kylin currently uses Bitmap to support precise count distinct measures. In a scenario that requires complex queries to sort hundreds of thousands of commodities by various UV metrics (precise count distinct measures), the RT of Kylin 2 is 27 seconds, while Kylin 4 reduces it to less than 2 seconds.</p>
+
+<p>What I find most appealing about Kylin 4 is that it's like a manual transmission car: you can control its query concurrency at will, whereas you can't freely change query concurrency in Kylin on HBase, because its concurrency is completely tied to the number of regions.</p>
+
+<h3 id="plan-for-kylin-4-in-youzan">Plan for Kylin 4 in Youzan</h3>
+<p>We have spent several months fully testing Apache Kylin 4, fixing several bugs and improving it. Now we are migrating cubes from the older version to the newer version. For the cubes already migrated to Kylin 4, small-query performance meets our expectations, and complex-query and build performance came as a big surprise. 
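We are planning to migrate all cubes from the older version to Kylin 4.</p>
+</description>
+ <pubDate>Thu, 17 Jun 2021 08:00:00 -0700</pubDate>
+ <link>http://kylin.apache.org/blog/2021/06/17/Why-did-Youzan-choose-Kylin4/</link>
+ <guid isPermaLink="true">http://kylin.apache.org/blog/2021/06/17/Why-did-Youzan-choose-Kylin4/</guid>
+
+
+ <category>blog</category>
+
+ </item>
+
+ <item>
+ <title>Why did Youzan choose Kylin4</title>
+ <description><p>At the QCon Global Software Developers Conference held on May 29, 2021, Zheng Shengjun, head of Youzan's data infrastructure platform, shared Youzan's experience of using and optimizing Kylin 4.0 in the track on open source big data frameworks and applications. For the many long-time Kylin users, this is also a practical guide to upgrading to Kylin 4.</p>
+
+<p>The talk is divided into the following four parts:</p>
+
+<ul>
+ <li>Why Youzan chose Kylin 4</li>
+ <li>Kylin 4 principles</li>
+ <li>Kylin 4 performance optimization</li>
+ <li>Kylin 4 in practice at Youzan</li>
+</ul>
+
+<h2 id="kylin-4-">01 Why Youzan chose Kylin 4</h2>
+<p>First, why Youzan chose to upgrade to Kylin 4. Here is a brief review of the history of OLAP at Youzan: in the early days, in order to iterate quickly, Youzan chose pre-computation + MySQL; in 2018, Druid was introduced for its query flexibility and development efficiency, but it had problems such as a low degree of pre-aggregation and no support for precise count distinct or detail-level OLAP; against this background, Youzan introduced Apache Kylin, which offers a high degree of aggregation, precise count distinct and the lowest RT, and ClickHouse, a ROLAP engine with very flexible queries.</p>
+
+<p>Youzan has used Kylin for more than three years since introducing it in 2018. With the continuous enrichment of business scenarios and accumulation of data, Youzan currently has 6 million existing merchants, GMV in 2020 was 107.3 billion, and the daily build volume exceeds 10 billion; Kylin now covers essentially all of Youzan's business.</p>
+
+<p>With Youzan's rapid development and ever deeper use of Kylin, we also encountered some challenges:<br />
+- First, the build performance of Kylin on HBase cannot meet Youzan's expectations, and build performance affects users' failure recovery time and their experience of stability;<br />
+- Second, the onboarding of more large merchants (tens of millions of members and hundreds of thousands of goods in a single store) brings great challenges to our queries. Kylin on HBase is limited by the single-point query of the QueryServer and cannot support these complex scenarios well;<br />
+- Finally, because HBase is not a cloud-native system, it is hard to scale resources elastically; as data volume keeps growing, merchants' usage has peaks and valleys over time, so the average resource utilization is not high enough.</p>
+
+<p>Faced with these challenges, Youzan chose to move towards and upgrade to the more cloud-native Apache Kylin 4.</p>
+
+<h2 id="kylin-4--1">02 Kylin 4 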
principles</h2>
+<p>First, the main advantages of Kylin 4. Apache Kylin 4 relies entirely on Spark for builds and queries, and can make full use of Spark's parallelization, vectorization and whole-stage code generation to improve the efficiency of large queries.<br />
+Here is a brief introduction to the principles of Kylin 4 in three parts: storage, build and query.</p>
+
+<h3 id="section">Storage</h3>
+<p><img src="/images/blog/youzan_cn/1 kylin4_storage.png" alt="" /><br />
+First, let's compare Kylin on HBase with Kylin on Parquet. In Kylin on HBase, cuboid data is stored in HBase tables, one segment corresponds to one HBase table, and query pushdown is handled by the HBase coprocessor; however, HBase is not a true columnar store and its throughput is not high for OLAP. Kylin 4 replaces HBase with Parquet, that is, all data is stored as files, each segment has a corresponding HDFS directory, and all queries and builds read and write files directly without going through HBase. Although small queries lose some performance, the improvement for complex queries is more considerable and worthwhile.</p>
+
+<h3 id="section-1">Build engine</h3>
+<p><img src="/images/blog/youzan_cn/2 kylin4_build_engine.png" alt="" /><br />
+Next is the Kylin build engine. In Youzan's test, the build time of Kylin on Parquet was reduced from 82 minutes to 15 minutes, for the following reasons:</p>
+
+<ul>
+ <li>Kylin 4 removes dimension dictionary encoding, eliminating one encoding step from the build;</li>
+ <li>It removes the HBase File generation step;</li>
+ <li>In the new Kylin 4, all build steps are run on Spark;</li>
+ <li>Kylin on Parquet sets the build granularity at the cuboid level, which helps further improve parallelism.</li>
+</ul>
+
+<p>As you can see on the right, ten steps are simplified to two, and the improvement in build performance is very obvious.</p>
+
+<h3 id="section-2">Query engine</h3>
+<p><img src="/images/blog/youzan_cn/3 kylin4_query.png" alt="" /></p>
+
+<p>Next is the Kylin 4 query engine. As you can see on the left, computation in Kylin on HBase relies entirely on Calcite and the HBase coprocessor, so once data has been read from HBase, any aggregation, sorting and so on is limited by the single-point bottleneck of the QueryServer, whereas Kylin 4 uses a fully distributed query mechanism based on Spark DataFrame.</p>
+
+<h2 id="kylin-4--2">03 Kylin 4 performance optimization</h2>
+<p>Next, some performance optimizations that Youzan made in Kylin 4.</p>
+
+<h3 id="section-3">Query performance optimization</h3>
+<p>#### 1. Dynamic elimination of partition dimensions<br />
+<img src="/images/blog/youzan_cn/4 dynamic_elimination_dimension_partition.png" alt="" /></p>
+
+<p>First, 
let's look at a scenario: we implemented dynamic elimination of partition dimensions, mixing cuboids to answer complex queries and reducing computation by dozens of times.</p>
+
+<p>Here is an example. Suppose a cube has three segments and its partition column is P; the three segments cover January 1 to February 1, February 1 to March 1, and March 1 to March 7. Suppose there is a SQL statement: Select count(a) from test where p &gt;= 20200101 and p &lt;= 20200313 group by a.</p>
+
+<p>In this case, because partition filtering is required, Kylin chooses the pre-computed dimension combination of a and p. The execution plan is an Aggregate at the top, then a Filter, and finally a TableScan that selects the cuboid whose dimensions are a and p. In fact this query plan can be optimized into the form shown on the right: for segments whose data is used in full, we can query the cuboid with dimension a only; for partitions or segments that are only partially used, we choose the cuboid with dimensions a and p. In this way, when a has only one possible value, a query that previously might scan 65 rows scans only 8 rows after the optimization. If the time span is longer, say several months, half a year or even a year, computation and IO are reduced by dozens or even hundreds of times.</p>
+
+<p>In some Youzan scenarios, RT drops from 10 seconds to 3 seconds, or from 20s to 6s; for more complex scenarios (such as compute-intensive HLL), the effect is even more significant. Youzan intends to contribute this optimization back to the community. Because it involves grouping segments under multi-level nesting and complex conditions, and because Calcite and Spark Catalyst currently coexist, the implementation is fairly complex. You may see this optimization in the Kylin 4.0-GA release.</p>
+
+<h4 id="section-4">2. Partition pruning under complex filter conditions</h4>
+<p>Another query performance optimization made by Youzan is support for partition pruning under complex filter conditions. The current Kylin 4.0 Beta does not support partition pruning for complex filters, such as multiple filter columns or multi-level nested filters, which leads to full table scans. Our optimization converts the syntax tree of a complex nested filter into an equivalent expression on the partition column p, and then applies this expression to each segment for filtering; in this way even very complex filters can be used for partition pruning.</p>
+
+<p><img src="/images/blog/youzan_cn/5 Partition clipping under complex filter.png" alt="" /></p>
+
+<h4 id="spark-">3. Spark parameter tuning</h4>
+<p><img src="/images/blog/youzan_cn/6 tuning_spark_configuration.png" alt="" /></p>
+
+<p>Next is a rather important part: Spark tuning. Spark is a distributed computing framework and, compared with Calcite, is at some disadvantage for small queries.</p>
+
+<p>First, 
we made an adjustment so that as many Spark computations as possible complete in memory. Spill occurs in the following two cases:<br />
+- 01 During aggregation, when memory is insufficient, Spark converts HashAggregate to Sort Based Aggregate, which is very costly. We raise the threshold parameter so that as many aggregations as possible complete in memory.<br />
+- 02 During shuffle, Spark inevitably spills to disk; what we can do is reduce spills during the shuffle as much as possible, spilling only once the shuffle has finished.</p>
+
+<p>The second tuning is about deployment mode: compared with YARN/Standalone mode, most communication in local mode happens within one process, no cross-network shuffle is needed, and broadcast variables do not cross the network either, so we route small queries to a Spark application running in local mode, which matters a lot for small queries.</p>
+
+<p>The third optimization is to use a RAM disk for shuffle. Since a RAM disk is the fastest option, we mount it as a tmpfs file system and point spark.local.dir at the mounted RAM disk to improve shuffle speed and throughput.</p>
+
+<p>The fourth optimization is to turn off Spark's whole-stage code generation. Whole-stage code generation assembles code dynamically at run time and then compiles it dynamically, which is actually quite time-consuming. It pays off for large offline data volumes, but for smaller data scenarios turning it off saves about 100 to 200 milliseconds.</p>
+
+<p>After this series of optimizations, we can keep the RT of small queries stable at around 300 milliseconds. Although HBase may deliver an RT of a few tens of milliseconds, we think this is already fairly close, and we consider it a worthwhile trade-off for the improvement on large queries.</p>
+
+<h4 id="section-5">4. Small query optimization</h4>
+<p><img src="/images/blog/youzan_cn/8 small_query_optimization.png" alt="" /><br />
+Next, small query optimization. Kylin on HBase achieves an RT of a few tens of milliseconds thanks to HBase's bucket cache. Kylin on Parquet is based entirely on reading and computing over files, and its caching depends on the file system's page cache, so its small-query RT is somewhat higher than with HBase; what we can do is narrow the RT gap between Kylin on Parquet and Kylin on HBase as much as possible.</p>
+
+<p>According to our analysis, SQL is parsed by Calcite into a Calcite syntax tree, the syntax tree is converted into a Spark DataFrame, and the whole query is finally handed to Spark for execution. In this process, converting SQL to the Calcite plan requires parsing and optimization, which takes about 150 milliseconds. Youzan uses structured SQL wherever possible, that is, PreparedStatement; we added PreparedStatementCache support to Kylin, so that for SQL with a fixed 
format, the execution plan is cached and reused, reducing the time spent in this step; this optimization saves about 100 milliseconds.</p>
+
+<h4 id="parquet-">5. Parquet optimization</h4>
+
+<p>For query performance, Youzan also makes full use of Parquet indexes. Our recommendations include:</p>
+
+<ul>
+ <li>
+ <p>Parquet files are first grouped by the Shard By Column, so filter conditions should include the Shard By Column where possible;</p>
+ </li>
+ <li>
+ <p>Data in Parquet is still sorted by dimension; combined with the Max and Min indexes in the Column MetaData, a query that hits the prefix index can filter out a large amount of data;</p>
+ </li>
+ <li>
+ <p>Reduce the RowGroup Size to make the index granularity finer, and so on.</p>
+ </li>
+</ul>
+
+<h3 id="section-6">Build performance optimization</h3>
+<p>#### 1. Cache the parent dataset<br />
+<img src="/images/blog/youzan_cn/9 cache_parent_dataset.png" alt="" /></p>
+
+<h4 id="section-7">2. Handle data skew caused by null values</h4>
+<p><img src="/images/blog/youzan_cn/10 Processing data skew.png" alt="" /></p>
+
+<p>For more details on build optimization, see <a href="https://mp.weixin.qq.com/s/T_mK7pTAgk2PXnSJ0lbZ_w">Kylin 4 latest feature preview + a first look at optimization practice</a></p>
+
+<h2 id="kylin-4--3">04 Kylin 4 in practice at Youzan</h2>
+<p>Having introduced Youzan's optimizations, let's share their results, that is, Kylin 4 in practice at Youzan, including the upgrade process and the results after going online.</p>
+
+<h3 id="section-8">Metadata upgrade</h3>
+<p>First, how to upgrade. We developed a tool for seamless metadata upgrade. In Kylin on HBase, metadata is stored in HBase; we export the metadata from HBase as files, and then write the file-format metadata into MySQL. We also updated the operation documents and the general principles in the official Apache Kylin wiki; for more details, see: <a href="https://wiki.apache.org/confluence/display/KYLIN/How+to+migrate+metadata+to+Kylin+4">How to migrate metadata to Kylin 4</a>.<br />
+<img src="/images/blog/youzan_cn/11 metadata_upgrade.png" alt="" /><br />
+Here is a general introduction to compatibility across the whole process. Several kinds of data need to be migrated. The first three are project metadata, table metadata (including some Hive tables), and model definition metadata; these do not need to be modified. What needs to be modified is the Cube metadata. What exactly needs to change? First, the storage type and query type used by the Cube; after updating these two fields, the Cube signature must be recalculated. Kylin uses this signature internally to avoid problems caused by modifying a 
Cube after it has been finalized; the last item is permission-related metadata, which is also compatible and needs no changes.</p>
+
+<h3 id="kylin-4--4">Performance of Kylin 4 after going online at Youzan</h3>
+<p><img src="/images/blog/youzan_cn/12 commodity_insight.png" alt="" /></p>
+
+<p>After migrating metadata to Kylin 4, let's share the qualitative changes and substantial performance improvements seen in some Youzan scenarios. First, in a scenario like Commodity Insight, there is a large store with hundreds of thousands of commodities; we need to analyze its transactions, traffic and so on, with more than a dozen precise count distinct calculations. Precise count distinct is actually very inefficient unless it is optimized through pre-computation and Bitmap; Kylin currently uses Bitmap to support precise count distinct. In a complex query scenario that sorts hundreds of thousands of commodities by various UV metrics, the RT of Kylin 2 is 27 seconds, while with Kylin 4 the RT drops from 27 seconds to under 2 seconds.</p>
+
+<p>What attracts me most about Kylin 4 is that it has become a fully manual transmission, whereas Kylin on HBase is effectively an automatic, because its concurrency is completely tied to the number of regions.</p>
+
+<p><img src="/images/blog/youzan_cn/13 cube_query.png" alt="" /></p>
+
+<h3 id="kylin-4--5">Future plans for Kylin 4 at Youzan</h3>
+<p>The Kylin 4 upgrade at Youzan roughly consists of the following steps:<br />
+<img src="/images/blog/youzan_cn/14 youzan_plan.png" alt="" /></p>
+
+<p>The first stage is research and usability testing; because Kylin on Parquet is based on Spark, there is a certain learning cost, and this took us some time;</p>
+
+<p>The second stage is syntax compatibility testing; we extended some syntax that early Kylin 4 did not support, such as pagination queries;</p>
+
+<p>The third stage is traffic replay and gradually bringing cubes online;</p>
+
+<p>We are now in the fourth stage: we have already migrated some data, and going forward we will gradually take the old cluster offline and migrate all business to the new cluster.</p>
+
+<p>Youzan will also keep the community informed about the Kylin 4 features we plan to develop and the requirements we plan to meet. We will not go into detail here; please follow the latest community updates. That concludes our talk.</p>
+</description>
+ <pubDate>Thu, 17 Jun 2021 08:00:00 -0700</pubDate>
+ <link>http://kylin.apache.org/cn_blog/2021/06/17/Why-did-Youzan-choose-Kylin4/</link>
+ <guid isPermaLink="true">http://kylin.apache.org/cn_blog/2021/06/17/Why-did-Youzan-choose-Kylin4/</guid>
+
+
+ <category>cn_blog</category>
+
+ </item>
+
+ <item>
 <title>You are only one Kylin + Davinci setup away from a cool visualization dashboard</title>
 <description><p>Kylin provides integration with BI tools such as Tableau, PowerBI/Excel, MSTR, QlikSense, Hue and 
SuperSet. As a visualization tool, however, Davinci offers good interactivity and customizable large-screen dashboards, so combining it with Kylin gives most users a better visual analysis experience.</p>
@@ -1392,214 +1700,6 @@ Security: (depend on your security setti
 <category>blog</category>
-
- </item>
-
- <item>
- <title>Apache Kylin v3.0.0-alpha released</title>
- <description><p>The Apache Kylin community is pleased to announce the official release of Apache Kylin v3.0.0-alpha.</p>
-
-<p>Apache Kylin is an open source distributed analytics engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on extremely large datasets.</p>
-
-<p>This is the first release of the next-generation Kylin v3.x, intended as an early preview; its main feature is Real-time OLAP. For the full list of changes, see the <a href="/docs/release_notes.html">release notes</a>; some of the major improvements are described here.</p>
-
-<h1 id="section">Important new features</h1>
-
-<h3 id="kylin-3654----olap">KYLIN-3654 - Real-time OLAP</h3>
-<p>With the new real-time receiver and coordinator components, Kylin can achieve millisecond-level data preparation latency for streaming data sources such as Apache Kafka. This means that from v3.0 on, Kylin supports OLAP on historical batch data as well as near real-time and fully real-time analysis of streaming data. Users can serve different use cases with a single OLAP platform. This solution has been deployed and verified by early adopters such as eBay. For how to use this feature, see <a href="/docs30/tutorial/realtime_olap.html">this tutorial</a>.</p>
-
-<h3 id="kylin-3795----apache-livy--spark-">KYLIN-3795 - Submit Spark jobs via Apache Livy</h3>
-<p>This feature lets administrators configure Kylin to submit jobs through Apache Livy (incubating). Spark jobs are submitted through Livy's REST API, without starting a Spark Driver process locally, which makes it easier to manage and monitor Spark resources and also reduces the load on the node that hosts the Kylin job process.</p>
-
-<h3 id="kylin-3820----curator-">KYLIN-3820 - Curator-based job node assignment and service discovery</h3>
-<p>A new job scheduler based on Apache Zookeeper and Curator automatically discovers Kylin nodes and automatically assigns one node to manage jobs and handle failure recovery. With this feature, administrators can deploy and scale Kylin nodes more easily, and no longer need to configure the address of every Kylin node in <code class="highlighter-rouge">kylin.properties</code> and restart Kylin for the change to take effect.</p>
-
-<h1 id="section-1">Other improvements</h1>
-
-<h3 id="kylin-3716---fastthreadlocal--threadlocal">KYLIN-3716 - FastThreadLocal 
replaces ThreadLocal</h3>
-<p>Using Netty's FastThreadLocal instead of the JDK's native ThreadLocal improves Kylin's performance under high concurrency to some extent.</p>
-
-<h3 id="kylin-3867---enable-jdbc-to-use-key-store--trust-store-for-https-connection">KYLIN-3867 - Enable JDBC to use key store &amp; trust store for https connection</h3>
-<p>Using HTTPS protects the authentication information used by JDBC, making Kylin more secure.</p>
-
-<h3 id="kylin-3905---enable-shrunken-dictionary-default">KYLIN-3905 - Enable shrunken dictionary default</h3>
-<p>The shrunken dictionary is enabled by default; for precise count distinct on high-cardinality dimensions, it can significantly reduce build time.</p>
-
-<h3 id="kylin-3839---storage-clean-up-after-the-refreshing-and-deleting-a-segment">KYLIN-3839 - Storage clean up after the refreshing and deleting a segment</h3>
-<p>Unnecessary data files are cleaned up more promptly.</p>
-
-<p><strong>Download</strong></p>
-
-<p>To download the Apache Kylin source code or binary package, visit the <a href="/download">download</a> page.</p>
-
-<p><strong>Upgrade</strong></p>
-
-<p>See the <a href="/docs/howto/howto_upgrade.html">upgrade guide</a>.</p>
-
-<p><strong>Feedback</strong></p>
-
-<p>If you run into problems or have questions, please send mail to the Apache Kylin dev or user mailing list: d...@kylin.apache.org, u...@kylin.apache.org; before sending, please make sure you have subscribed to the mailing list by sending an email to dev-subscr...@kylin.apache.org or user-subscr...@kylin.apache.org.</p>
-
-<p><em>Many thanks to everyone who contributed to Apache Kylin!</em></p>
-</description>
- <pubDate>Fri, 19 Apr 2019 13:00:00 -0700</pubDate>
- <link>http://kylin.apache.org/cn_blog/2019/04/19/release-v3.0.0-alpha/</link>
- <guid isPermaLink="true">http://kylin.apache.org/cn_blog/2019/04/19/release-v3.0.0-alpha/</guid>
-
-
- <category>cn_blog</category>
-
- </item>
-
- <item>
- <title>Real-time Streaming Design in Apache Kylin</title>
- <description><h2 id="why-build-real-time-streaming-in-kylin">Why Build Real-time Streaming in Kylin</h2>
-<p>The real-time streaming feature was contributed to Kylin 3.0 by the eBay big data team. We built real-time streaming for the following purposes:</p>
-
-<ul>
- <li>
- <p>Milliseconds Data Preparation Delay <br />
-Kylin 
provides sub-second query latency for extremely large datasets; the underlying magic is the precalculated cube. But cube building often takes a long time (usually hours for large data sets), and in some cases analysts need real-time data for their analysis, so we want to provide real-time OLAP, which means data can be queried as soon as it is produced into the system.</p>
- </li>
- <li>
- <p>Support Lambda Architecture <br />
-Real-time data is often not reliable, for many possible reasons: for example, the upstream processing system has a bug, or the data needs to be changed after some time. So we need to support the lambda architecture, which means the cube can be built from a streaming source (like Kafka), and the historical cube data can be refreshed from a batch source (like Hive).</p>
- </li>
- <li>
- <p>Less MR jobs and HBase Tables <br />
-Since Kylin 1.6, the community has provided a streaming solution that uses MR to consume Kafka data and then does batch cube building. It can provide minute-level data preparation latency, but to keep the latency low you need to schedule the MR jobs very frequently (every 5 minutes or even less), which produces too many Hadoop jobs and small HBase tables in the system and dramatically increases the load on the Hadoop system.</p>
- </li>
-</ul>
-
-<h2 id="architecture">Architecture</h2>
-
-<p><img src="/images/blog/rt_stream_architecture.png" alt="Kylin RT Streaming Architecture" /></p>
-
-<p>The streaming components, shown in blue, are added to Kylin's current architecture; they are responsible for ingesting data from the streaming source and serving queries on real-time data.</p>
-
-<p>We divide the unbounded incoming streaming data into 3 stages; data in every stage is queryable immediately.</p>
-
-<p><img src="/images/blog/rt_stream_stages.png" alt="Kylin RT Streaming stages" /></p>
-
-<h3 id="components">Components</h3>
-
-<p><img src="/images/blog/rt_stream_components.png" alt="Kylin RT Streaming Components" /></p>
-
-<p>Streaming Receiver: 
Responsible to ingest data from stream data source, and provide real-time data query.</p> - -<p>Streaming Coordinator: Responsible to do coordination works, for example, when new streaming cube is onboard, the coordinator need to decide which streaming receivers can be assigned.</p> - -<p>Metadata Store: Used to store streaming related metadata, for example, the cube assignments information, cube build state information.</p> - -<p>Query Engine: Extend the existing query engine, support to query real-time data from streaming receiver</p> - -<p>Build Engine: Extend the existing build engine, support to build full cube from the real-time data</p> - -<h3 id="how-streaming-cube-engine-works">How Streaming Cube Engine Works</h3> - -<p><img src="/images/blog/rt_stream_how_build_work.png" alt="Kylin RT Streaming How Build Works" /></p> - -<ol> - <li>Coordinator ask streaming source for all partitions of the cube</li> - <li>Coordinator decide which streaming receivers to assign to consume streaming data, and ask streaming receivers to start consuming data.</li> - <li>Streaming receiver start to consume and index streaming events</li> - <li>After sometime, streaming receiver copy the immutable segments from local files to remote HDFS files</li> - <li>Streaming receiver notify the coordinator that a segment has been persisted to HDFS</li> - <li>Coordinator submit a cube build job to Build Engine to triger cube full building after all receivers have submitted their segments</li> - <li>Build Engine build all cuboids from the streaming HDFS files</li> - <li>Build Engine store cuboid data to Hbase, and then the coordinator will ask the streaming receivers to remove the related local real-time data.</li> -</ol> - -<h3 id="how-streaming-query-engine-works">How Streaming Query Engine Works</h3> - -<p><img src="/images/blog/rt_stream_how_query_work.png" alt="Kylin RT Streaming How Query Works" /></p> - -<ol> - <li>If Query hits a streaming cube, Query Engine ask Streaming Coordinator 
what streaming receivers are assigned for the cube</li> - <li>Query Engine send query request to related streaming receivers to query realtime segments</li> - <li>Query Engine send query request to Hbase to query historical segments</li> - <li>Query Engine aggregate the query results, and send response back to client</li> -</ol> - -<h2 id="detail-design">Detail Design</h2> - -<h3 id="real-time-segment-store">Real-time Segment Store</h3> -<p>Real-time segments are divided by event time, when new event comes, it will be calculated which segment it will be located, if the segment doesn't exist, create a new one.</p> - -<p>The new created segment is in "Active" state first, if no further events coming into the segment after some preconfigured period, the segment state will be changed to "Immutable", and then write to remote HDFS.</p> - -<p><img src="/images/blog/rt_stream_rt_segment_state.png" alt="Kylin RT Streaming Segment State" /></p> - -<p>Each real-time segment has a memory store, new event will first goes into the memory store to do aggregation, when the memory store size reaches the configured threshold, it will be then be flushed to local disk as a fragment file.</p> - -<p>Not all cuboids are built in the receiver side, only basic cuboid and some specified cuboids are built.</p> - -<p>The data is stored as columnar format on disk, and when there are too many fragments on disk, the fragment files will be merged by a background thread automatically.</p> - -<p>The directory structure in receiver side is like:</p> - -<p><img src="/images/blog/rt_stream_dir_structure.png" alt="Kylin RT Streaming Segment Directory" /></p> - -<p>To improve the query performance, the data is stored in columnar format, the data format is like:</p> - -<p><img src="/images/blog/rt_stream_columnar_format.png" alt="Kylin RT Streaming Columnar Format" /></p> - -<p>Each cuboid data is stored together, and in each cuboid the data is stored column by column, the metadata is stored in json 
format.</p> - -<p>The dimension data is divided into 3 parts:</p> - -<p>The first part is Dictionary part, this part exists when the dimension encoding is set to "Dict" in cube design, by default we use <a href="https://kylin.apache.org/blog/2015/08/13/kylin-dictionary/">tri-tree dictionary</a> to minimize the memory footprints and preserve the original order.</p> - -<p>The second part is dictionary encoded values, additional compression mechanism can be applied to these values, since the values for the same column are usually similar, so the compression rate will be very good.</p> - -<p>The third part is invert-index data, use Roaring Bitmap to store the invert-index info, the following picture shows how invert-index data is stored, there are two types of format, the first one is dictionary encoding dimension's index data format, the second is other fix-len encoding dimension's index data format.</p> - -<p><img src="/images/blog/rt_stream_invertindex_format.png" alt="Kylin RT Streaming InvertIndex Format" /></p> - -<p>Real-time data is stored in compressed format, currently support two type compression: Run Length Encoding and LZ4.</p> - -<ul> - <li>Use RLE compression for time-related dim and first dim</li> - <li>Use LZ4 for other dimensions by default</li> - <li>Use LZ4 Compression for simple-type measure(long, double)</li> - <li>No compression for complex measure(count distinct, topn, etc.)</li> -</ul> - -<h3 id="high-availability">High Availability</h3> - -<p>Streaming receivers are group into replica-sets, all receivers in the same replica-set share the same assignments, so that when one receiver is down, the query and event consuming will not be impacted.</p> - -<p>In each replica-set, there is a lead responsible to upload real-time segments to HDFS, and zookeeper is used to do leader election</p> - -<h3 id="failure-recovery">Failure Recovery</h3> - -<p>We do checkpoint periodically in receiver side, so that when the receiver is restarted, the data can be 
restored correctly.</p> - -<p>There are two parts in the checkpoint: the first part is the streaming source consume info, for Kafka it is {partition:offset} pairs, the second part is disk states {segment:framentID} pairs, which means when do the checkpoint what's the max fragmentID for each segment.</p> - -<p>When receiver is restarted, it will check the latest checkpoint, set the Kafka consumer to start to consume data from specified partition offsets, and remove the fragment files that the fragmentID is larger than the checkpointed fragmentID on the disk.</p> - -<p>Besides the local checkpoint, we also have remote checkpoint, to restore the state when the disk is crashed, the remote checkpoint is saved to Cube Segment metadata after HBase segment build, like:<br /> -<code class="highlighter-rouge"> - "segments":[{…, - "stream_source_checkpoint": {"0":8946898241, "1": 8193859535, ...} - }, - ] -</code><br /> -The checkpoint info is the smallest partition offsets on the streaming receiver when real-time segment is sent to full build.</p> - -<h2 id="future">Future</h2> -<ul> - <li>Star Schema Support</li> - <li>Streaming Receiver On Kubernetes/Yarn</li> -</ul> -</description> - <pubDate>Fri, 12 Apr 2019 09:30:00 -0700</pubDate> - <link>http://kylin.apache.org/blog/2019/04/12/rt-streaming-design/</link> - <guid isPermaLink="true">http://kylin.apache.org/blog/2019/04/12/rt-streaming-design/</guid> - - - <category>blog</category> </item>
Added: kylin/site/images/blog/youzan/1 history_of_youzan_OLAP.png URL: http://svn.apache.org/viewvc/kylin/site/images/blog/youzan/1%20history_of_youzan_OLAP.png?rev=1890886&view=auto ============================================================================== Binary file - no diff available. Propchange: kylin/site/images/blog/youzan/1 history_of_youzan_OLAP.png ------------------------------------------------------------------------------ svn:mime-type = application/octet-stream Added: kylin/site/images/blog/youzan/10 commodity_insight.png URL: http://svn.apache.org/viewvc/kylin/site/images/blog/youzan/10%20commodity_insight.png?rev=1890886&view=auto ============================================================================== Binary file - no diff available. Propchange: kylin/site/images/blog/youzan/10 commodity_insight.png ------------------------------------------------------------------------------ svn:mime-type = application/octet-stream Added: kylin/site/images/blog/youzan/2 kylin4_storage.png URL: http://svn.apache.org/viewvc/kylin/site/images/blog/youzan/2%20kylin4_storage.png?rev=1890886&view=auto ============================================================================== Binary file - no diff available. Propchange: kylin/site/images/blog/youzan/2 kylin4_storage.png ------------------------------------------------------------------------------ svn:mime-type = application/octet-stream Added: kylin/site/images/blog/youzan/3 kylin4_build_engine.png URL: http://svn.apache.org/viewvc/kylin/site/images/blog/youzan/3%20kylin4_build_engine.png?rev=1890886&view=auto ============================================================================== Binary file - no diff available. 
Propchange: kylin/site/images/blog/youzan/3 kylin4_build_engine.png ------------------------------------------------------------------------------ svn:mime-type = application/octet-stream Added: kylin/site/images/blog/youzan/4 kylin4_query.png URL: http://svn.apache.org/viewvc/kylin/site/images/blog/youzan/4%20kylin4_query.png?rev=1890886&view=auto ============================================================================== Binary file - no diff available. Propchange: kylin/site/images/blog/youzan/4 kylin4_query.png ------------------------------------------------------------------------------ svn:mime-type = application/octet-stream Added: kylin/site/images/blog/youzan/5 cache_calcite_plan.png URL: http://svn.apache.org/viewvc/kylin/site/images/blog/youzan/5%20cache_calcite_plan.png?rev=1890886&view=auto ============================================================================== Binary file - no diff available. Propchange: kylin/site/images/blog/youzan/5 cache_calcite_plan.png ------------------------------------------------------------------------------ svn:mime-type = application/octet-stream Added: kylin/site/images/blog/youzan/6 tuning_spark_configuration.png URL: http://svn.apache.org/viewvc/kylin/site/images/blog/youzan/6%20tuning_spark_configuration.png?rev=1890886&view=auto ============================================================================== Binary file - no diff available. Propchange: kylin/site/images/blog/youzan/6 tuning_spark_configuration.png ------------------------------------------------------------------------------ svn:mime-type = application/octet-stream Added: kylin/site/images/blog/youzan/7 parquet_optimization.png URL: http://svn.apache.org/viewvc/kylin/site/images/blog/youzan/7%20parquet_optimization.png?rev=1890886&view=auto ============================================================================== Binary file - no diff available. 
Propchange: kylin/site/images/blog/youzan/7 parquet_optimization.png ------------------------------------------------------------------------------ svn:mime-type = application/octet-stream Added: kylin/site/images/blog/youzan/8 Dynamic_elimination_of_partitioning_dimensions.png URL: http://svn.apache.org/viewvc/kylin/site/images/blog/youzan/8%20Dynamic_elimination_of_partitioning_dimensions.png?rev=1890886&view=auto ============================================================================== Binary file - no diff available. Propchange: kylin/site/images/blog/youzan/8 Dynamic_elimination_of_partitioning_dimensions.png ------------------------------------------------------------------------------ svn:mime-type = application/octet-stream Added: kylin/site/images/blog/youzan/9 cache_parent_dataset.png URL: http://svn.apache.org/viewvc/kylin/site/images/blog/youzan/9%20cache_parent_dataset.png?rev=1890886&view=auto ============================================================================== Binary file - no diff available. Propchange: kylin/site/images/blog/youzan/9 cache_parent_dataset.png ------------------------------------------------------------------------------ svn:mime-type = application/octet-stream Added: kylin/site/images/blog/youzan_cn/1 kylin4_storage.png URL: http://svn.apache.org/viewvc/kylin/site/images/blog/youzan_cn/1%20kylin4_storage.png?rev=1890886&view=auto ============================================================================== Binary file - no diff available. 
Propchange: kylin/site/images/blog/youzan_cn/1 kylin4_storage.png ------------------------------------------------------------------------------ svn:mime-type = application/octet-stream Added: kylin/site/images/blog/youzan_cn/10 Processing data skew.png URL: http://svn.apache.org/viewvc/kylin/site/images/blog/youzan_cn/10%20Processing%20data%20skew.png?rev=1890886&view=auto ============================================================================== Binary file - no diff available. Propchange: kylin/site/images/blog/youzan_cn/10 Processing data skew.png ------------------------------------------------------------------------------ svn:mime-type = application/octet-stream Added: kylin/site/images/blog/youzan_cn/11 metadata_upgrade.png URL: http://svn.apache.org/viewvc/kylin/site/images/blog/youzan_cn/11%20metadata_upgrade.png?rev=1890886&view=auto ============================================================================== Binary file - no diff available. Propchange: kylin/site/images/blog/youzan_cn/11 metadata_upgrade.png ------------------------------------------------------------------------------ svn:mime-type = application/octet-stream Added: kylin/site/images/blog/youzan_cn/12 commodity_insight.png URL: http://svn.apache.org/viewvc/kylin/site/images/blog/youzan_cn/12%20commodity_insight.png?rev=1890886&view=auto ============================================================================== Binary file - no diff available. Propchange: kylin/site/images/blog/youzan_cn/12 commodity_insight.png ------------------------------------------------------------------------------ svn:mime-type = application/octet-stream Added: kylin/site/images/blog/youzan_cn/13 cube_query.png URL: http://svn.apache.org/viewvc/kylin/site/images/blog/youzan_cn/13%20cube_query.png?rev=1890886&view=auto ============================================================================== Binary file - no diff available. 
Propchange: kylin/site/images/blog/youzan_cn/13 cube_query.png ------------------------------------------------------------------------------ svn:mime-type = application/octet-stream Added: kylin/site/images/blog/youzan_cn/14 youzan_plan.png URL: http://svn.apache.org/viewvc/kylin/site/images/blog/youzan_cn/14%20youzan_plan.png?rev=1890886&view=auto ============================================================================== Binary file - no diff available. Propchange: kylin/site/images/blog/youzan_cn/14 youzan_plan.png ------------------------------------------------------------------------------ svn:mime-type = application/octet-stream Added: kylin/site/images/blog/youzan_cn/2 kylin4_build_engine.png URL: http://svn.apache.org/viewvc/kylin/site/images/blog/youzan_cn/2%20kylin4_build_engine.png?rev=1890886&view=auto ============================================================================== Binary file - no diff available. Propchange: kylin/site/images/blog/youzan_cn/2 kylin4_build_engine.png ------------------------------------------------------------------------------ svn:mime-type = application/octet-stream Added: kylin/site/images/blog/youzan_cn/3 kylin4_query.png URL: http://svn.apache.org/viewvc/kylin/site/images/blog/youzan_cn/3%20kylin4_query.png?rev=1890886&view=auto ============================================================================== Binary file - no diff available. Propchange: kylin/site/images/blog/youzan_cn/3 kylin4_query.png ------------------------------------------------------------------------------ svn:mime-type = application/octet-stream Added: kylin/site/images/blog/youzan_cn/4 dynamic_elimination_dimension_partition.png URL: http://svn.apache.org/viewvc/kylin/site/images/blog/youzan_cn/4%20dynamic_elimination_dimension_partition.png?rev=1890886&view=auto ============================================================================== Binary file - no diff available. 
Propchange: kylin/site/images/blog/youzan_cn/4 dynamic_elimination_dimension_partition.png ------------------------------------------------------------------------------ svn:mime-type = application/octet-stream Added: kylin/site/images/blog/youzan_cn/5 Partition clipping under complex filter.png URL: http://svn.apache.org/viewvc/kylin/site/images/blog/youzan_cn/5%20Partition%20clipping%20under%20complex%20filter.png?rev=1890886&view=auto ============================================================================== Binary file - no diff available. Propchange: kylin/site/images/blog/youzan_cn/5 Partition clipping under complex filter.png ------------------------------------------------------------------------------ svn:mime-type = application/octet-stream Added: kylin/site/images/blog/youzan_cn/6 tuning_spark_configuration.png URL: http://svn.apache.org/viewvc/kylin/site/images/blog/youzan_cn/6%20tuning_spark_configuration.png?rev=1890886&view=auto ============================================================================== Binary file - no diff available. Propchange: kylin/site/images/blog/youzan_cn/6 tuning_spark_configuration.png ------------------------------------------------------------------------------ svn:mime-type = application/octet-stream Added: kylin/site/images/blog/youzan_cn/8 small_query_optimization.png URL: http://svn.apache.org/viewvc/kylin/site/images/blog/youzan_cn/8%20small_query_optimization.png?rev=1890886&view=auto ============================================================================== Binary file - no diff available. 
Propchange: kylin/site/images/blog/youzan_cn/8 small_query_optimization.png ------------------------------------------------------------------------------ svn:mime-type = application/octet-stream Added: kylin/site/images/blog/youzan_cn/9 cache_parent_dataset.png URL: http://svn.apache.org/viewvc/kylin/site/images/blog/youzan_cn/9%20cache_parent_dataset.png?rev=1890886&view=auto ============================================================================== Binary file - no diff available. Propchange: kylin/site/images/blog/youzan_cn/9 cache_parent_dataset.png ------------------------------------------------------------------------------ svn:mime-type = application/octet-stream Modified: kylin/site/images/docs/quickstart/advance_setting.png URL: http://svn.apache.org/viewvc/kylin/site/images/docs/quickstart/advance_setting.png?rev=1890886&r1=1890885&r2=1890886&view=diff ============================================================================== Binary files - no diff available.