Some ideas for a Doris Dataset Cache:

a. How to accumulate batches
1. Location: batches can be held on the Frontend (FE) or on the Backend (BE).
2. Size: each batch holds up to 1 minute or 10,000 rows of data. Up to 3 batches are kept, and temporary deduplication and column compression are performed on them.
3. Data safety: the in-memory batches can be written to a disk cache in real time, similar to a checkpoint. The disk cache is used only to recover from crashes and does not participate in computation.
4. Participation in AP: the in-memory batches take part in analytical (AP) queries and statistics.

b. Advantages of accumulating batches
1. Real-time data: new data is held in memory, so it can take part in computation as soon as it arrives. The delay from data generation to data visibility can be kept at the millisecond level. This is the ultimate goal of the real-time data warehouses and data lakes on the market, where competition is fierce.
2. Deduplication and merging: data can be deduplicated within a short time window. This reduces the later deduplication workload and makes merging cheaper under Copy-on-Write.
3. Data compression: columns are compressed once a batch has been accumulated, which should give a higher compression ratio.
4. Historical data: historical data is computed separately, and its results are merged with the results computed over the real-time data.
5. JDBC: Doris currently has low performance for JDBC operations. With a proper design, batching could improve it.

c. Disadvantages of accumulating batches
1. Dirty reads: newly arrived real-time data cannot be deduplicated against historical data, so dirty reads can occur. However, CDC data that contains no updates or deletes is not affected, and some scenarios can tolerate a certain amount of data deviation.
2. Storage-compute separation: this design conflicts with separating storage from compute. A possible solution is to compute over real-time data on the storage nodes and over historical data on the compute nodes, then merge the final results.
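The batch accumulation and short-window deduplication described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Doris code. The class `MicroBatchBuffer`, its parameters, and the injectable clock are all hypothetical. The sketch deduplicates with last-write-wins by a primary key, and it counts the 10,000-row limit as distinct keys; both choices are assumptions. The active batch is sealed at 10,000 keys or 60 seconds. At most 3 sealed batches stay in memory, and the oldest batch's rows are returned when they should leave memory. Disk checkpointing and column compression are left out.

```python
import time
from typing import Any, Callable, Dict, List, Optional


class MicroBatchBuffer:
    """Hypothetical in-memory micro-batch buffer (illustrative, not Doris code).

    The active batch is sealed once it holds max_rows distinct keys or has
    been open for max_seconds; at most max_sealed sealed batches stay in memory.
    """

    def __init__(self, key: str, max_rows: int = 10_000, max_seconds: float = 60.0,
                 max_sealed: int = 3, clock: Callable[[], float] = time.monotonic):
        self.key = key
        self.max_rows = max_rows
        self.max_seconds = max_seconds
        self.max_sealed = max_sealed
        self.clock = clock
        self.sealed: List[Dict[Any, dict]] = []  # oldest batch first
        self._open_batch()

    def _open_batch(self) -> None:
        # Rows are keyed by primary key, so a later write to the same key
        # overwrites the earlier one (last write wins within the batch).
        self.active: Dict[Any, dict] = {}
        self.opened_at = self.clock()

    def append(self, row: dict) -> Optional[List[dict]]:
        """Add a row. Returns the rows of the oldest batch if it was evicted
        from memory (rows to hand over to regular storage), else None.
        In this sketch the age check only runs when a row is appended."""
        self.active[row[self.key]] = row
        full = len(self.active) >= self.max_rows
        expired = self.clock() - self.opened_at >= self.max_seconds
        if full or expired:
            return self._seal()
        return None

    def _seal(self) -> Optional[List[dict]]:
        self.sealed.append(self.active)
        self._open_batch()
        if len(self.sealed) > self.max_sealed:
            return list(self.sealed.pop(0).values())
        return None

    def snapshot(self) -> List[dict]:
        """Rows visible to queries: every in-memory batch, deduplicated by key,
        with newer batches overriding older ones."""
        merged: Dict[Any, dict] = {}
        for batch in self.sealed + [self.active]:
            merged.update(batch)
        return list(merged.values())


if __name__ == "__main__":
    now = [0.0]  # fake clock so the example is deterministic
    buf = MicroBatchBuffer(key="id", max_rows=2, max_sealed=1, clock=lambda: now[0])
    buf.append({"id": 1, "v": "a"})
    buf.append({"id": 1, "v": "b"})  # same key: overwrites, batch still has 1 key
    buf.append({"id": 2, "v": "c"})  # 2 distinct keys: batch is sealed
    print(sorted(r["v"] for r in buf.snapshot()))  # ['b', 'c']
    now[0] = 61.0
    print(buf.append({"id": 3, "v": "d"}))  # [{'id': 1, 'v': 'b'}, {'id': 2, 'v': 'c'}]
```

The `snapshot()` method is what lets freshly arrived rows take part in queries immediately. The evicted rows returned by `append()` are the point where data would move from the in-memory cache into ordinary storage.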
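The idea of writing in-memory batches to a disk cache purely for crash recovery resembles an append-only log. The sketch below is one possible shape for it. Its assumptions are a JSON-lines file, one `os.fsync` per row, and the hypothetical class name `BatchCheckpoint`. Queries never read this file; it is only replayed after a restart and deleted once its rows have reached regular storage.

```python
import json
import os
from typing import List


class BatchCheckpoint:
    """Hypothetical append-only checkpoint for in-memory batch rows.

    Used only to rebuild the in-memory batches after a crash; it never
    participates in query computation.
    """

    def __init__(self, path: str):
        self.path = path

    def append(self, row: dict) -> None:
        # One JSON document per line; flush and fsync so the row survives a crash.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(row) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def recover(self) -> List[dict]:
        """Replay the checkpoint to rebuild the in-memory rows."""
        if not os.path.exists(self.path):
            return []
        rows: List[dict] = []
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                try:
                    rows.append(json.loads(line))
                except json.JSONDecodeError:
                    # A torn final line left by a crash mid-write; stop here.
                    break
        return rows

    def truncate(self) -> None:
        """Drop the checkpoint once its rows have been flushed to regular storage."""
        if os.path.exists(self.path):
            os.remove(self.path)
```

Calling `fsync` for every row keeps the sketch simple but is slow. Grouping several rows per `fsync` (group commit) would trade a small window of possible loss for much higher write throughput.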
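Merging the results computed over historical data with those computed over real-time data works cleanly when each side returns partial aggregate states rather than final values. This applies whether the two sides run on separate storage and compute nodes or not. The sketch below uses SUM and COUNT per group as the partial state. The functions `partial_agg`, `merge_partials`, and `finalize_avg` are hypothetical names, and the choice of aggregate is only an example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Partial aggregate state per group: (sum, count).
Partial = Dict[str, Tuple[float, int]]


def partial_agg(rows: Iterable[dict], group: str, value: str) -> Partial:
    """Compute SUM and COUNT per group over one part of the data."""
    acc: Dict[str, list] = defaultdict(lambda: [0.0, 0])
    for row in rows:
        state = acc[row[group]]
        state[0] += row[value]
        state[1] += 1
    return {g: (s, c) for g, (s, c) in acc.items()}


def merge_partials(*parts: Partial) -> Partial:
    """Combine partial states, e.g. one from historical and one from real-time data."""
    merged: Dict[str, list] = defaultdict(lambda: [0.0, 0])
    for part in parts:
        for g, (s, c) in part.items():
            merged[g][0] += s
            merged[g][1] += c
    return {g: (s, c) for g, (s, c) in merged.items()}


def finalize_avg(state: Partial) -> Dict[str, float]:
    """Turn merged (sum, count) states into AVG values."""
    return {g: s / c for g, (s, c) in state.items()}


if __name__ == "__main__":
    historical = [{"region": "east", "amount": 10.0},
                  {"region": "east", "amount": 30.0},
                  {"region": "west", "amount": 5.0}]
    realtime = [{"region": "east", "amount": 50.0}]
    merged = merge_partials(partial_agg(historical, "region", "amount"),
                            partial_agg(realtime, "region", "amount"))
    print(finalize_avg(merged))  # {'east': 30.0, 'west': 5.0}
```

Averaging the two final averages for `east` would give (20 + 50) / 2 = 35 instead of the correct 30, which is why the partial states are merged instead. This merge also shows the dirty-read caveat. If a row updated in the real-time part still exists in the historical part, it is counted twice, because the two parts are never deduplicated against each other.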