This is an automated email from the ASF dual-hosted git repository. nic pushed a commit to branch document in repository https://gitbox.apache.org/repos/asf/kylin.git
commit 8f2ce8c725df150d72bdc666d4a508521f3a7bc7 Author: XiaoxiangYu <hit_la...@126.com> AuthorDate: Tue Jul 30 11:17:25 2019 +0800 Add article of introducing KYLIN-2895 --- ...7-29-detailed-analysis-of-refine-query-cache.md | 158 +++++++++++++++++++++ .../blog/refine-query-cache/cache-signature.png | Bin 0 -> 86165 bytes .../blog/refine-query-cache/consistent-hashing.png | Bin 0 -> 85380 bytes .../images/blog/refine-query-cache/l1-l2-cache.png | Bin 0 -> 64587 bytes 4 files changed, 158 insertions(+) diff --git a/website/_posts/blog/2019-07-29-detailed-analysis-of-refine-query-cache.md b/website/_posts/blog/2019-07-29-detailed-analysis-of-refine-query-cache.md new file mode 100644 index 0000000..543b423 --- /dev/null +++ b/website/_posts/blog/2019-07-29-detailed-analysis-of-refine-query-cache.md @@ -0,0 +1,158 @@ +--- +layout: post-blog +title: Detailed Analysis of refine query cache +date: 2019-07-30 10:30:00 +author: Xiaoxiang Yu +categories: blog +--- + +---- + +## Part-I Basic Introduction + +### Backgroud +In the past, query cache are not efficiently used in Kylin due to two aspects: **coarse-grained cache expiration strategy** and **lack of external cache**. Because of the aggressive cache expiration strategy, useful caches are often cleaned up unnecessarily. Because query caches are stored in local servers, they cannot be shared between servers. And because of the size limitation of local cache, not all useful query results can be cached. + +To deal with these shortcomings, we change the query cache expiration strategy by signature checking and introduce the memcached as Kylin's distributed cache so that Kylin servers are able to share cache between servers. And it's easy to add memcached servers to scale out distributed cache. + +These features is proposed and developed by eBay Kylin team. Thanks so much for their contribution. + +### Related JIRA + +- [KYLIN-2895 Refine Query Cache](https://issues.apache.org/jira/browse/KYLIN-2895) + - [KYLIN-2899 Introduce segment level query cache](https://issues.apache.org/jira/browse/KYLIN-2899) + - [KYLIN-2898 Introduce memcached as a distributed cache for queries](https://issues.apache.org/jira/browse/KYLIN-2898) + - [KYLIN-2894 Change the query cache expiration strategy by signature checking](https://issues.apache.org/jira/browse/KYLIN-2894) + - [KYLIN-2897 Improve the query execution for a set of duplicate queries in a short period](https://issues.apache.org/jira/browse/KYLIN-2897) + - [KYLIN-2896 Refine query exception cache](https://issues.apache.org/jira/browse/KYLIN-2896) + +---- + +## Part-II Deep Dive + +- Introduce memcached as a Distributed Query Cache +- Segment Level Cache +- Query Cache Expiration Strategy by Signature Checking +- Other Enhancement + +### Introduce memcached as a Distributed Query Cache + +**Memcached** is a Free and open source, high-performance, distributed memory object caching system. It is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering. It is simple yet powerful. Its simple design promotes quick deployment, ease of development, and solves many problems facing large data caches. Its API is available for most popular languages. + +By KYLIN-2898, Kylin use **Memcached** as distributed cache service, and use **EhCache** as local cache service. When `RemoteLocalFailOverCacheManager` is configured in `applicationContext.xml`, for each cache put/get action, Kylin will first check if remote cache service is available, only if remote cache service is unavailable, local cache service will be used. + +Firstly, multi query server can share query cache. For each kylin server, less jvm memory will be occupied which help to reduce GC pressure. Secondly, since memcached is centralized so duplicated cache entry will avoid in serval Kylin process. Thirdly, memcached has larger size and easy to scale out, this will help to reduce the chance which useful cache entry have to be dropped due to limited memory capacity. + +To handle node failure and to scale out memcached cluster, author has introduced a consistent hash strategy to smoothly solve such problem. Ketama is an implementation of a consistent hashing algorithm, meaning you can add or remove servers from the memcached pool without causing a complete remap of all keys. Detail could be checked at [Ketama consistent hash strategy](https://www.last.fm/user/RJ/journal/2007/04/10/rz_libketama_-_a_consistent_hashing_algo_for_memcache_clients). + + + +### Segment level Cache + +Currently Kylin use sql as the cache key, when sql comes, if result exists in the cache, it will directly returned the cached result and don't need to query hbase. When there is new segment build or existing segment refresh, all related cache result need to be evicted. For some frequently build cube such as streaming cube(NRT Streaming or Real-time OLAP), the cache miss will increase dramatically, that may decrease the query performance. + +Since for Kylin cube, most historical segments are immutable, the same query against historical segments should be always same, don't need to be evicted for new segment building. So we decide to implement the segment level cache, it is a complement of the existing front-end cache, the idea is similar as the level1/level2 cache in operating system. + + + +### Query Cache Expiration Strategy by Signature Checking + +Currently, to invalid query cache, `CacheService` will either invoke `cleanDataCache` or `cleanAllDataCache`. Both methods will clear all of the query cache , which is very inefficient and unnecessary. In production environment, there's around hundreds of cubing jobs per day, which means the query cache will be cleared very several minutes. Then we introduced a signature to upgrade cache invalidation strategy. + +The basic idea is as follows: +When put SQLResponse into cache, we add signature for each SQLResponse. To calculate signature for SQLResponse, we choose the cube last build time and its segments to as input of `SignatureCalculator`. +When fetch `SQLResponse` for cache, first check whether the signature is consistent. If not, this cached value is overdue and will be invalidate. + +As for the calculation of signature is show as follows: +1. `toString` of `ComponentSignature` will concatenate member varible into a large String; if a `ComponentSignature` has other `ComponentSignature` as member, toString will be calculated recursively +2. return value of `toString` will be input of `SignatureCalculator`, +`SignatureCalculator` encode string using MD5 as identifer of signature of query cache + + + +### Other Enhancement + +#### Improve the query execution for a set of duplicate queries in a short period + +If same query enter Kylin at the same time by different client, for each query they can not find query cache so they must be calculated respectively. And even wrose, if these query are complex, they usually cost a long duration so Kylin have less chance to utilize cache query; and them cost large computation resources that will make query server has poor performance has harm to HBase cluster. + +To reduce the impact of duplicated and complex query, it may be a good idea to block query which came later, wait to first one return result as far as possible. This lazy strategy is especially useful if you have duplicated complex query came in same time. To enbale it, you should set `kylin.query.lazy-query-enabled` to `true`. Optionlly, you may set `kylin.query.lazy-query-waiting-timeout-milliseconds` to what you think later duplicated query wait duration to meet your situation. + + +#### Remove exception cache +Formerly, query cache has been divided into two part, one part for storing success query result, another for failed query result, and they are invalidated respectively. It looks like not a good classification criteria because it is not fine-grained enough. After query cache signature was introduced, we have no reason to take them apart, so exception cache was removed. + +---- + +## Part-III How to Use + +To get prepared, you need to install memcached, you may refer to https://github.com/memcached/memcached/wiki/Install. Then you should modify `kylin.properties` and `applicationContext.xml`. + +- kylin.properties +{% highlight Groff markup %} +kylin.cache.memcached.hosts=10.1.2.42:11211 +kylin.query.cache-signature-enabled=true +kylin.query.lazy-query-enabled=true +kylin.metrics.memcached.enabled=true +kylin.query.segment-cache-enabled=true +{% endhighlight %} + +- applicationContext.xml +{% highlight Groff markup %} + +<cache:annotation-driven/> + +<bean id="ehcache" class="org.springframework.cache.ehcache.EhCacheManagerFactoryBean" + p:configLocation="classpath:ehcache-test.xml" p:shared="true"/> + +<bean id="remoteCacheManager" class="org.apache.kylin.cache.cachemanager.MemcachedCacheManager"/> +<bean id="localCacheManager" class="org.apache.kylin.cache.cachemanager.InstrumentedEhCacheCacheManager" + p:cacheManager-ref="ehcache"/> +<bean id="cacheManager" class="org.apache.kylin.cache.cachemanager.RemoteLocalFailOverCacheManager"/> + +<bean id="memcachedCacheConfig" class="org.apache.kylin.cache.memcached.MemcachedCacheConfig"> + <property name="timeout" value="500"/> + <property name="hosts" value="${kylin.cache.memcached.hosts}"/> +</bean> +{% endhighlight %} + +### Configuration for query cache + +#### General part + +|Conf Key | Conf value| Explanation| +|:------------ | :-------------| :---| +|kylin.query.cache-enabled|boolean, default true|whether to enable query cache| +|kylin.query.cache-threshold-duration| long, in milliseconds, default is 2000|query duration threshold| +|kylin.query.cache-threshold-scan-count| long, default is 10240|query scan row count threshold| +|kylin.query.cache-threshold-scan-bytes| long, default is 1024 * 1024 (1MB)|query scan byte threshold| + +#### Memcached part + +|Conf Key | Conf value| Explanation| +|:------------ | :------------- | :--- | +|kylin.cache.memcached.hosts | host1:port1,host2:port2|host list of memcached host| +|kylin.query.segment-cache-enabled|default false|wether to enable| +|kylin.query.segment-cache-timeout|default 2000| timeout of memcached| +|kylin.query.segment-cache-max-size|200 (MB) |max size put into memcached| + +#### Cache signature part + +|Conf Key | Conf value| Explanation| +| :------------ | :------------- | :--- | +|kylin.query.cache-signature-enabled| default false| whether to use signature for query cache| +|kylin.query.signature-class|default is org.apache.kylin.rest.signature.FactTableRealizationSetCalculator| use which class to calculate signature of query cache| + +#### Other optimize part + +| Conf Key | Conf value| Explanation| +| :------------ | :------------- | :--- | +|kylin.query.lazy-query-enabled|default false| whether to block duplicated sql query| +|kylin.query.lazy-query-waiting-timeout-milliseconds|long , in milliseconds, default is 60000| max druation for blocking duplicated sql query| + +#### Metrics part + +| Conf Key | Conf value | Explanation | +| :--------------------------- | :--------------------| :-------------------- | +|kylin.metrics.memcached.enabled |true |Enable memcached metrics in memcached.| +|kylin.metrics.memcached.metricstype|off/performance/debug|refer to net.spy.memcached.metrics.MetricType| diff --git a/website/images/blog/refine-query-cache/cache-signature.png b/website/images/blog/refine-query-cache/cache-signature.png new file mode 100644 index 0000000..52bc3d3 Binary files /dev/null and b/website/images/blog/refine-query-cache/cache-signature.png differ diff --git a/website/images/blog/refine-query-cache/consistent-hashing.png b/website/images/blog/refine-query-cache/consistent-hashing.png new file mode 100644 index 0000000..58d4d17 Binary files /dev/null and b/website/images/blog/refine-query-cache/consistent-hashing.png differ diff --git a/website/images/blog/refine-query-cache/l1-l2-cache.png b/website/images/blog/refine-query-cache/l1-l2-cache.png new file mode 100644 index 0000000..ad3dae9 Binary files /dev/null and b/website/images/blog/refine-query-cache/l1-l2-cache.png differ