[jira] [Commented] (COMDEV-503) OPC-UA browser for Apache StreamPipes

2023-03-12 Thread anurag (Jira)


[ 
https://issues.apache.org/jira/browse/COMDEV-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17699338#comment-17699338
 ] 

anurag commented on COMDEV-503:
---

Hello @Dominik Riemer,

Could you give me some help with the task "create the frontend 
views to asynchronously browse data and to create a new adapter"? I did some 
research on OPC-UA over the last week, so I am now comfortable with it and 
understand its workflow as well.

Could you elaborate on creating a new adapter?

> OPC-UA browser for Apache StreamPipes
> -
>
> Key: COMDEV-503
> URL: https://issues.apache.org/jira/browse/COMDEV-503
> Project: Community Development
>  Issue Type: Improvement
>  Components: GSoC/Mentoring ideas
>Reporter: Dominik Riemer
>Priority: Major
>  Labels: StreamPipes, full-time, gsoc, gsoc2023, mentor
>
> h3. *Apache StreamPipes*
> Apache StreamPipes (incubating) is a self-service (Industrial) IoT toolbox to 
> enable non-technical users to connect, analyze and explore IoT data streams. 
> StreamPipes offers several modules including StreamPipes Connect to easily 
> connect data from industrial IoT sources, the Pipeline Editor to quickly 
> create processing pipelines and several visualization modules for live and 
> historic data exploration. Under the hood, StreamPipes utilizes an 
> event-driven microservice paradigm of standalone, so-called analytics 
> microservices making the system easy to extend for individual needs.
> h3. *Background*
> StreamPipes has grown significantly in recent years. We were able to 
> introduce a lot of new features and attracted both users and contributors. 
> Putting the cherry on the cake, we graduated as an Apache top-level 
> project in December 2022. We will of course continue developing new features 
> and never rest to make StreamPipes even more amazing. 
> StreamPipes really shines when connecting Industrial IoT data. Such data 
> sources typically originate from machine controllers, called PLCs (e.g., 
> Siemens S7). But there are also new protocols such as OPC-UA which allow 
> browsing available data within the controller. Our goal is to make connectivity 
> of industrial data sources a matter of minutes.
> Currently, data sources can be connected using the built-in module 
> `StreamPipes Connect` from the UI. We provide a set of adapters for popular 
> protocols that can be customized, e.g., connection details can be added. 
> To make it even easier to connect industrial data sources with StreamPipes, 
> we plan to add an OPC-UA browser. This will be part of the entry page of 
> StreamPipes Connect and should allow users to enter connection details of an 
> existing OPC-UA server. Afterwards, a new view in the UI shows available data 
> nodes from the server, their status and current value. Users should be able 
> to select values that should be part of a new adapter. Afterwards, a new 
> adapter can be created by reusing the current workflow to create an OPC-UA 
> data source.
> This is a really cool project for participants interested in full-stack 
> development who would like to get a deeper understanding of industrial IoT 
> protocols. Have fun! 
> h3. *Tasks*
>  - [ ] get familiar with the OPC-UA protocol
>  - [ ] develop mockups which demonstrate the user workflow
>  - [ ] develop a data model for discovering data from OPC-UA
>  - [ ] create the backend business logic for the OPC-UA browser 
>  - [ ] create the frontend views to asynchronously browse data and to create 
> a new adapter
>  - [ ] write JUnit, component and E2E tests
>  - [ ] whatever comes to your mind 💡 further ideas are always welcome
> h3.  *Relevant Skills*
>  * interest in Industrial IoT and protocols such as OPC-UA
>  * Java development skills
>  * Angular/TypeScript development skills
> Anyway, the most important relevant skill is motivation and readiness to 
> learn during the project!
> h3. *Learning Material*
>  - StreamPipes documentation 
> ([https://streampipes.apache.org/docs/docs/user-guide-introduction.html])
>  - Our current OPC-UA adapter 
> ([https://github.com/apache/streampipes/tree/dev/streampipes-extensions/streampipes-connect-adapters-iiot/src/main/java/org/apache/streampipes/connect/iiot/adapters/opcua])
>  - Eclipse Milo, which we currently use for OPC-UA connectivity 
> ([https://github.com/eclipse/milo])
>  - Apache PLC4X, which has an API for browsing 
> ([https://plc4x.apache.org/])
> h3. *Reference*
> The GitHub issue can be found here: 
> [https://github.com/apache/streampipes/issues/1390]
> h3. *Name and contact information*
>  * Mentor: Dominik Riemer (riemer[at]apache.org).
>  * Mailing list: (dev[at]streampipes.apache.org)
>  * Website: streampipes.apache.org
>  




[jira] [Created] (COMDEV-510) [GSoC][Doris]Page Cache Improvement

2023-03-12 Thread Zhijing Lu (Jira)
Zhijing Lu created COMDEV-510:
-

 Summary: [GSoC][Doris]Page Cache Improvement
 Key: COMDEV-510
 URL: https://issues.apache.org/jira/browse/COMDEV-510
 Project: Community Development
  Issue Type: Task
  Components: GSoC/Mentoring ideas
Reporter: Zhijing Lu


*Apache Doris*
Apache Doris is a real-time analytical database based on MPP architecture. As a 
unified platform that supports multiple data processing scenarios, it ensures 
high performance for low-latency and high-throughput queries, allows for easy 
federated queries on data lakes, and supports various data ingestion methods.
{*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]

{*}Github{*}: [https://github.com/apache/doris]
h3. *Background*
Apache Doris accelerates high-concurrency queries utilizing page cache, where 
the decompressed data is stored.
Currently, the page cache in Apache Doris uses a simple LRU algorithm, which 
reveals a few problems: 
 # Hot data will be phased out in large queries
 # The page cache configuration is immutable and does not support GC.
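
To illustrate the first problem, here is a minimal Python sketch of a plain LRU cache. It is not Doris code: the class `LRUPageCache`, the page ids, and the capacity of 4 are assumptions made for the example. It shows how a single large scan can push frequently used pages out of the cache.

```python
from collections import OrderedDict


class LRUPageCache:
    """Toy LRU cache keyed by page id (hypothetical, not Doris code)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # least recently used entry comes first

    def get(self, page_id):
        """Return the cached page, or None on a miss."""
        if page_id not in self.pages:
            return None
        self.pages.move_to_end(page_id)  # mark as most recently used
        return self.pages[page_id]

    def put(self, page_id, data):
        """Insert a page and evict the least recently used one if full."""
        self.pages[page_id] = data
        self.pages.move_to_end(page_id)
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)


def simulate_large_scan():
    """Cache two hot pages, then scan many cold pages exactly once."""
    cache = LRUPageCache(capacity=4)
    for hot in ("hot-1", "hot-2"):
        cache.put(hot, b"decompressed page")
        cache.get(hot)  # reuse is what makes these pages "hot"
    for i in range(10):  # one large query touching 10 cold pages
        cache.put(f"scan-{i}", b"decompressed page")
    return cache


cache = simulate_large_scan()
print(cache.get("hot-1"))  # None: the scan pushed the hot page out
```

A scan-resistant policy would need to tell pages touched once from pages that are reused, which plain LRU cannot do.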

h3. Task
 # {*}Phase One{*}: Identify the impacts on queries when the decompressed data is 
stored in memory and SSD, respectively, and then determine whether full page 
cache is required.
 # {*}Phase Two{*}: Improve the cache strategy for Apache Doris based on the 
results from Phase One.

h3. Learning Material

 
{*}Page{*}: https://doris.apache.org
{*}Github{*}: [https://github.com/apache/doris]
h3. Mentor
 * Mentor: Yongqiang Yang, Apache Doris PMC member & Committer, 
[yangyongqi...@apache.org |mailto:yangyongqi...@apache.org]
 * Mentor: Haopeng Li, Apache Doris PMC member & Committer, 
[lihaop...@apache.org|mailto:lihaop...@apache.org]  
 * Mailing List: d...@doris.apache.org



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: dev-unsubscr...@community.apache.org
For additional commands, e-mail: dev-h...@community.apache.org



[jira] [Updated] (COMDEV-510) [GSoC][Doris]Page Cache Improvement

2023-03-12 Thread Zhijing Lu (Jira)


 [ 
https://issues.apache.org/jira/browse/COMDEV-510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijing Lu updated COMDEV-510:
--
Description: 
*Apache Doris*
Apache Doris is a real-time analytical database based on MPP architecture. As a 
unified platform that supports multiple data processing scenarios, it ensures 
high performance for low-latency and high-throughput queries, allows for easy 
federated queries on data lakes, and supports various data ingestion methods.
{*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]

{*}Github{*}: [https://github.com/apache/doris]
h3. *Background*

Apache Doris accelerates high-concurrency queries utilizing page cache, where 
the decompressed data is stored.
Currently, the page cache in Apache Doris uses a simple LRU algorithm, which 
reveals a few problems: 
 * Hot data will be phased out in large queries
 * The page cache configuration is immutable and does not support GC.

h3. Task
 * {*}Phase One{*}: Identify the impacts on queries when the decompressed data 
is stored in memory and SSD, respectively, and then determine whether full page 
cache is required.

 * {*}Phase Two{*}: Improve the cache strategy for Apache Doris based on the 
results from Phase One.

h3. Learning Material

{*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]
{*}Github{*}: [https://github.com/apache/doris]
h3. Mentor
 * Mentor: Yongqiang Yang, Apache Doris PMC member & Committer, 
[yangyongqi...@apache.org |mailto:yangyongqi...@apache.org]
 * Mentor: Haopeng Li, Apache Doris PMC member & Committer, 
[lihaop...@apache.org|mailto:lihaop...@apache.org]  
 * Mailing List: d...@doris.apache.org



> [GSoC][Doris]Page Cache Improvement
> ---
>
> Key: COMDEV-510
> URL: https://issues.apache.org/jira/browse/COMDEV-510
> Project: Community Development
>  Issue Type: Task
>  Components: GSoC/Mentoring ideas
>Reporter: Zhijing Lu
>Priority: Major
>  Labels: ApacheDoris, Mentor, full-time, gsoc2023
>
> *Apache Doris*
> Apache Doris is a real-time analytical database based on MPP architecture. As 
> a unified platform that supports multiple data processing scenarios, it 
> ensures high performance for low-latency and high-throughput queries, allows 
> for easy federated queries on data lakes, and supports various data ingestion 
> methods.
> {*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]
> {*}Github{*}: [https://github.com/apache/doris]
> h3. *Background*
> Apache Doris accelerates high-concurrency queries utilizing page cache, where 
> the decompressed data is stored.
> Currently, the page cache in Apache Doris uses a simple LRU algorithm, which 
> reveals a few problems: 
>  * Hot data will be phased out in large queries
>  * The page cache configuration is immutable and does not support GC.
> h3. Task
>  * {*}Phase One{*}: Identify the impacts on queries when the decompressed 
> data is stored in memory and SSD, respectively, and then determine whether 
> full page cache is required.
>  * {*}Phase Two{*}: Improve the cache strategy for Apache Doris based on the 
> results from Phase One.
> h3. Learning Material
> {*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]
> {*}Github{*}: [https://github.com/apache/doris]
> h3. Mentor
>  * Mentor: Yongqiang Yang, Apache Doris PMC member & Commit

[jira] [Created] (COMDEV-511) [GSoC][Doris]Dictionary Encoding Acceleration

2023-03-12 Thread Zhijing Lu (Jira)
Zhijing Lu created COMDEV-511:
-

 Summary: [GSoC][Doris]Dictionary Encoding Acceleration
 Key: COMDEV-511
 URL: https://issues.apache.org/jira/browse/COMDEV-511
 Project: Community Development
  Issue Type: Task
  Components: GSoC/Mentoring ideas
Reporter: Zhijing Lu


*Apache Doris*
Apache Doris is a real-time analytical database based on MPP architecture. As a 
unified platform that supports multiple data processing scenarios, it ensures 
high performance for low-latency and high-throughput queries, allows for easy 
federated queries on data lakes, and supports various data ingestion methods.
{*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]

{*}Github{*}: [https://github.com/apache/doris]
h3. *Background*
In Apache Doris, dictionary encoding is performed during data writing and 
compaction. Dictionary encoding is applied to string data types by 
default. The dictionary size of a column for one segment is 1M at most. 
Dictionary encoding accelerates queries on strings, for example by converting 
them into INT values.
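
The idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the Doris implementation: the helper names `dictionary_encode` and `filter_equals` and the sample column are made up. Each distinct string gets an integer code, so an equality filter compares integers instead of strings.

```python
def dictionary_encode(values):
    """Map each distinct string to a small integer code."""
    dictionary = {}  # string -> code, assigned in first-seen order
    codes = []
    for value in values:
        code = dictionary.setdefault(value, len(dictionary))
        codes.append(code)
    return dictionary, codes


def filter_equals(dictionary, codes, needle):
    """Return the row indexes where the column equals `needle`."""
    code = dictionary.get(needle)
    if code is None:
        return []  # value absent from the dictionary: no row can match
    return [row for row, c in enumerate(codes) if c == code]


column = ["beijing", "berlin", "beijing", "paris", "berlin", "beijing"]
dictionary, codes = dictionary_encode(column)
print(codes)                                        # [0, 1, 0, 2, 1, 0]
print(filter_equals(dictionary, codes, "berlin"))   # [1, 4]
```

The string comparison happens once, against the dictionary; the per-row work is an integer comparison.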
 
h3. *Task*
 * Phase One: Get familiar with the implementation of Apache Doris dictionary 
encoding; learn how Apache Doris dictionary encoding accelerates queries.
 * Phase Two: Evaluate the effectiveness of full dictionary encoding and 
figure out how to optimize memory in such a case.

h3. *Learning Material*

{*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]
{*}Github{*}: [https://github.com/apache/doris]
h3. Mentor
 * Mentor: Chen Zhang, Apache Doris Committer, [zhangec...@apache.org 
|mailto:yangyongqi...@apache.org]
 * Mentor: Zhijing Lu, Apache Doris Committer, 
[luzhij...@apache.org|mailto:lihaop...@apache.org]  
 * Mailing List: d...@doris.apache.org






[jira] [Updated] (COMDEV-510) [GSoC][Doris]Page Cache Improvement

2023-03-12 Thread Maxim Solodovnik (Jira)


 [ 
https://issues.apache.org/jira/browse/COMDEV-510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Solodovnik updated COMDEV-510:

Labels: Doris Mentor full-time gsoc2023  (was: ApacheDoris Mentor full-time 
gsoc2023)

> [GSoC][Doris]Page Cache Improvement
> ---
>
> Key: COMDEV-510
> URL: https://issues.apache.org/jira/browse/COMDEV-510
> Project: Community Development
>  Issue Type: Task
>  Components: GSoC/Mentoring ideas
>Reporter: Zhijing Lu
>Priority: Major
>  Labels: Doris, Mentor, full-time, gsoc2023
>
> *Apache Doris*
> Apache Doris is a real-time analytical database based on MPP architecture. As 
> a unified platform that supports multiple data processing scenarios, it 
> ensures high performance for low-latency and high-throughput queries, allows 
> for easy federated queries on data lakes, and supports various data ingestion 
> methods.
> {*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]
> {*}Github{*}: [https://github.com/apache/doris]
> h3. *Background*
> Apache Doris accelerates high-concurrency queries utilizing page cache, where 
> the decompressed data is stored.
> Currently, the page cache in Apache Doris uses a simple LRU algorithm, which 
> reveals a few problems: 
>  * Hot data will be phased out in large queries
>  * The page cache configuration is immutable and does not support GC.
> h3. Task
>  * {*}Phase One{*}: Identify the impacts on queries when the decompressed 
> data is stored in memory and SSD, respectively, and then determine whether 
> full page cache is required.
>  * {*}Phase Two{*}: Improve the cache strategy for Apache Doris based on the 
> results from Phase One.
> h3. Learning Material
> {*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]
> {*}Github{*}: [https://github.com/apache/doris]
> h3. Mentor
>  * Mentor: Yongqiang Yang, Apache Doris PMC member & Committer, 
> [yangyongqi...@apache.org |mailto:yangyongqi...@apache.org]
>  * Mentor: Haopeng Li, Apache Doris PMC member & Committer, 
> [lihaop...@apache.org|mailto:lihaop...@apache.org]  
>  * Mailing List: d...@doris.apache.org






[jira] [Updated] (COMDEV-511) [GSoC][Doris]Dictionary Encoding Acceleration

2023-03-12 Thread Maxim Solodovnik (Jira)


 [ 
https://issues.apache.org/jira/browse/COMDEV-511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Solodovnik updated COMDEV-511:

Labels: Doris Mentor full-time gsoc2023  (was: ApacheDoris Mentor full-time 
gsoc2023)

> [GSoC][Doris]Dictionary Encoding Acceleration
> -
>
> Key: COMDEV-511
> URL: https://issues.apache.org/jira/browse/COMDEV-511
> Project: Community Development
>  Issue Type: Task
>  Components: GSoC/Mentoring ideas
>Reporter: Zhijing Lu
>Priority: Major
>  Labels: Doris, Mentor, full-time, gsoc2023
>
> *Apache Doris*
> Apache Doris is a real-time analytical database based on MPP architecture. As 
> a unified platform that supports multiple data processing scenarios, it 
> ensures high performance for low-latency and high-throughput queries, allows 
> for easy federated queries on data lakes, and supports various data ingestion 
> methods.
> {*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]
> {*}Github{*}: [https://github.com/apache/doris]
> h3. *Background*
> In Apache Doris, dictionary encoding is performed during data writing and 
> compaction. Dictionary encoding is applied to string data types by 
> default. The dictionary size of a column for one segment is 1M at most. 
> Dictionary encoding accelerates queries on strings, for example by converting 
> them into INT values.
>  
> h3. *Task*
>  * Phase One: Get familiar with the implementation of Apache Doris dictionary 
> encoding; learn how Apache Doris dictionary encoding accelerates queries.
>  * Phase Two: Evaluate the effectiveness of full dictionary encoding and 
> figure out how to optimize memory in such a case.
> h3. *Learning Material*
> {*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]
> {*}Github{*}: [https://github.com/apache/doris]
> h3. Mentor
>  * Mentor: Chen Zhang, Apache Doris Committer, [zhangec...@apache.org 
> |mailto:yangyongqi...@apache.org]
>  * Mentor: Zhijing Lu, Apache Doris Committer, 
> [luzhij...@apache.org|mailto:lihaop...@apache.org]  
>  * Mailing List: d...@doris.apache.org






[jira] [Created] (COMDEV-512) [GSoC][Doris] Supports BigQuery/Apache Kudu/Apache Cassandra/Apache Druid in Federated Queries

2023-03-12 Thread Zhijing Lu (Jira)
Zhijing Lu created COMDEV-512:
-

 Summary: [GSoC][Doris] Supports BigQuery/Apache Kudu/Apache 
Cassandra/Apache Druid in Federated Queries 
 Key: COMDEV-512
 URL: https://issues.apache.org/jira/browse/COMDEV-512
 Project: Community Development
  Issue Type: Task
  Components: GSoC/Mentoring ideas
Reporter: Zhijing Lu


*Apache Doris*
Apache Doris is a real-time analytical database based on MPP architecture. As a 
unified platform that supports multiple data processing scenarios, it ensures 
high performance for low-latency and high-throughput queries, allows for easy 
federated queries on data lakes, and supports various data ingestion methods.
Page: https://doris.apache.org
Github: [https://github.com/apache/doris]
h3. *Background*
Apache Doris supports acceleration of queries on external data sources to meet 
users' needs for federated queries and analysis.
Currently, Apache Doris supports multiple external catalogs including those 
from Hive, Iceberg, Hudi, and JDBC. Developers can connect more data sources to 
Apache Doris based on a unified framework.
h4. *Objective*
 * Enable Apache Doris to access one or more of these data sources via the 
Multi-Catalog feature: BigQuery/Kudu/Cassandra/Druid;
 * Compile relevant documentation. See an example here: 
[https://doris.apache.org/docs/dev/lakehouse/multi-catalog/hive]

*Task*
{*}Phase One{*}:
 * Get familiar with the Multi-Catalog structure of Apache Doris, including the 
metadata synchronization mechanism in FE and the data reading mechanism of BE.
 * Investigate how metadata should be acquired and how data access works regarding 
the picked data source(s); produce the corresponding design documentation.

{*}Phase Two{*}:
 * Develop connections to the picked data source(s) and implement access to 
metadata and data.
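
One way to picture the two halves of this work (metadata acquisition and data access) is the Python sketch below. It does not reflect the actual Multi-Catalog interfaces in Doris; `ExternalCatalog`, `InMemoryCatalog`, and `CatalogRegistry` are hypothetical names, and an in-memory dict stands in for a real source.

```python
from abc import ABC, abstractmethod


class ExternalCatalog(ABC):
    """Hypothetical connector contract; not the Doris Multi-Catalog API."""

    @abstractmethod
    def list_tables(self):
        """Metadata side: return the table names the source exposes."""

    @abstractmethod
    def scan(self, table):
        """Data side: yield the rows of one table as dicts."""


class InMemoryCatalog(ExternalCatalog):
    """Stand-in for a real source such as Kudu or Cassandra."""

    def __init__(self, tables):
        self._tables = tables

    def list_tables(self):
        return sorted(self._tables)

    def scan(self, table):
        yield from self._tables[table]


class CatalogRegistry:
    """Resolves 'catalog.table' names to the connector that owns them."""

    def __init__(self):
        self._catalogs = {}

    def register(self, name, catalog):
        self._catalogs[name] = catalog

    def query(self, qualified_name):
        catalog_name, table = qualified_name.split(".", 1)
        return list(self._catalogs[catalog_name].scan(table))


registry = CatalogRegistry()
registry.register("demo", InMemoryCatalog({"users": [{"id": 1}, {"id": 2}]}))
print(registry.query("demo.users"))  # [{'id': 1}, {'id': 2}]
```

Keeping metadata listing and row scanning behind one small contract is what lets a new source be added without touching the query side.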
h3. *Learning Material*

{*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]
{*}Github{*}: [https://github.com/apache/doris]
h3. Mentor
 * Mentor: Mingyu Chen, Apache Doris PMC Member & Committer, 
[chenmin...@apache.org |mailto:yangyongqi...@apache.org]
 * Mentor: Calvin Kirs, Apache DolphinScheduler PMC & Committer, 
[calvink...@apache.org|mailto:calvink...@apache.org]
 * Mailing List: d...@doris.apache.org






[jira] [Updated] (COMDEV-512) [GSoC][Doris] Supports BigQuery/Apache Kudu/Apache Cassandra/Apache Druid in Federated Queries

2023-03-12 Thread Maxim Solodovnik (Jira)


 [ 
https://issues.apache.org/jira/browse/COMDEV-512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Solodovnik updated COMDEV-512:

Labels: Doris full-time gsoc2023 mentor  (was: ApacheDoris full-time 
gsoc2023 mentor)

> [GSoC][Doris] Supports BigQuery/Apache Kudu/Apache Cassandra/Apache Druid in 
> Federated Queries 
> ---
>
> Key: COMDEV-512
> URL: https://issues.apache.org/jira/browse/COMDEV-512
> Project: Community Development
>  Issue Type: Task
>  Components: GSoC/Mentoring ideas
>Reporter: Zhijing Lu
>Priority: Major
>  Labels: Doris, full-time, gsoc2023, mentor
>
> *Apache Doris*
> Apache Doris is a real-time analytical database based on MPP architecture. As 
> a unified platform that supports multiple data processing scenarios, it 
> ensures high performance for low-latency and high-throughput queries, allows 
> for easy federated queries on data lakes, and supports various data ingestion 
> methods.
> Page: [https://doris.apache.org|https://doris.apache.org/]
> Github: [https://github.com/apache/doris]
> h3. *Background*
> Apache Doris supports acceleration of queries on external data sources to 
> meet users' needs for federated queries and analysis.
> Currently, Apache Doris supports multiple external catalogs including those 
> from Hive, Iceberg, Hudi, and JDBC. Developers can connect more data sources 
> to Apache Doris based on a unified framework.
> h4. *Objective*
>  * Enable Apache Doris to access one or more of these data sources via the 
> Multi-Catalog feature: BigQuery/Kudu/Cassandra/Druid;
>  * Compile relevant documentation. See an example here: 
> [https://doris.apache.org/docs/dev/lakehouse/multi-catalog/hive]
> *Task*
> {*}Phase One{*}:
>  * Get familiar with the Multi-Catalog structure of Apache Doris, including 
> the metadata synchronization mechanism in FE and the data reading mechanism 
> of BE.
>  * Investigate how metadata should be acquired and how data access works 
> regarding the picked data source(s); produce the corresponding design 
> documentation.
> {*}Phase Two{*}:
>  * Develop connections to the picked data source(s) and implement access to 
> metadata and data.
> h3. *Learning Material*
> {*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]
> {*}Github{*}: [https://github.com/apache/doris]
> h3. Mentor
>  * Mentor: Mingyu Chen, Apache Doris PMC Member & Committer, 
> [chenmin...@apache.org |mailto:yangyongqi...@apache.org]
>  * Mentor: Calvin Kirs, Apache DolphinScheduler PMC & Committer, 
> [calvink...@apache.org|mailto:calvink...@apache.org]
>  * Mailing List: d...@doris.apache.org






[jira] [Updated] (COMDEV-512) [GSoC][Doris] Supports BigQuery/Apache Kudu/Apache Cassandra/Apache Druid in Federated Queries

2023-03-12 Thread Zhijing Lu (Jira)


 [ 
https://issues.apache.org/jira/browse/COMDEV-512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijing Lu updated COMDEV-512:
--
Description: 
*Apache Doris*
Apache Doris is a real-time analytical database based on MPP architecture. As a 
unified platform that supports multiple data processing scenarios, it ensures 
high performance for low-latency and high-throughput queries, allows for easy 
federated queries on data lakes, and supports various data ingestion methods.
Page: [https://doris.apache.org|https://doris.apache.org/]
Github: [https://github.com/apache/doris]
h3. *Background*

Apache Doris supports acceleration of queries on external data sources to meet 
users' needs for federated queries and analysis.
Currently, Apache Doris supports multiple external catalogs including those 
from Hive, Iceberg, Hudi, and JDBC. Developers can connect more data sources to 
Apache Doris based on a unified framework.
h4. *Objective*
 * Enable Apache Doris to access one or more of these data sources via the 
Multi-Catalog feature: BigQuery/Kudu/Cassandra/Druid;
 * Compile relevant documentation. See an example here: 
[https://doris.apache.org/docs/dev/lakehouse/multi-catalog/hive]

*Task*
{*}Phase One{*}:
 * Get familiar with the Multi-Catalog structure of Apache Doris, including the 
metadata synchronization mechanism in FE and the data reading mechanism of BE.
 * Investigate how metadata should be acquired and how data access works 
regarding the picked data source(s); produce the corresponding design 
documentation.

{*}Phase Two{*}:
 * Develop connections to the picked data source(s) and implement access to 
metadata and data.
h3. *Learning Material*

{*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]
{*}Github{*}: [https://github.com/apache/doris]
h3. Mentor
 * Mentor: Mingyu Chen, Apache Doris PMC Member & Committer, 
[chenmin...@apache.org |mailto:yangyongqi...@apache.org]
 * Mentor: Calvin Kirs, Apache DolphinScheduler PMC & Committer, 
[calvink...@apache.org|mailto:calvink...@apache.org]
 * Mailing List: d...@doris.apache.org



> [GSoC][Doris] Supports BigQuery/Apache Kudu/Apache Cassandra/Apache Druid in 
> Federated Queries 
> ---
>
> Key: COMDEV-512
> URL: https://issues.apache.org/jira/browse/COMDEV-512
> Project: Community Development
>  Issue Type: Task
>  Components: GSoC/Mentoring ideas
>Reporter: Zhijing Lu
>Priority: Major
>  Labels: ApacheDoris, full-time, gsoc2023, mentor
>
> *Apache Doris*
> Apache Doris is a real-time analytical database based on MPP architecture. As 
> a unified platform that supports multiple data processing scenarios, it 
> ensures high performance for low-latency and high-throughput queries, allows 
> for easy federated queries on data lakes, and supports various data ingestion 
> methods.
> Page: [https://doris.apache.org|https

[jira] [Updated] (COMDEV-512) [GSoC][Doris] Supports BigQuery/Apache Kudu/Apache Cassandra/Apache Druid in Federated Queries

2023-03-12 Thread Zhijing Lu (Jira)


 [ 
https://issues.apache.org/jira/browse/COMDEV-512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijing Lu updated COMDEV-512:
--
Description: 
*Apache Doris*
Apache Doris is a real-time analytical database based on MPP architecture. As a 
unified platform that supports multiple data processing scenarios, it ensures 
high performance for low-latency and high-throughput queries, allows for easy 
federated queries on data lakes, and supports various data ingestion methods.
Page: [https://doris.apache.org|https://doris.apache.org/]
Github: [https://github.com/apache/doris]
h3. *Background*

Apache Doris supports acceleration of queries on external data sources to meet 
users' needs for federated queries and analysis.
Currently, Apache Doris supports multiple external catalogs including those 
from Hive, Iceberg, Hudi, and JDBC. Developers can connect more data sources to 
Apache Doris based on a unified framework.
h4. *Objective*
 * Enable Apache Doris to access one or more of these data sources via the 
Multi-Catalog feature: BigQuery/Kudu/Cassandra/Druid;
 * Compile relevant documentation. See an example here: 
[https://doris.apache.org/docs/dev/lakehouse/multi-catalog/hive]

*Task*
{*}Phase One{*}:
 * Get familiar with the Multi-Catalog structure of Apache Doris, including the 
metadata synchronization mechanism in FE and the data reading mechanism of BE.
 * Investigate how metadata should be acquired and how data access works 
regarding the picked data source(s); produce the corresponding design 
documentation.

{*}Phase Two{*}:
 * Develop connections to the picked data source(s) and implement access to 
metadata and data.

h3. *Learning Material*

{*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]
{*}Github{*}: [https://github.com/apache/doris]
h3. Mentor
 * Mentor: Mingyu Chen, Apache Doris PMC Member & Committer, 
[morning...@apache.org |mailto:yangyongqi...@apache.org]
 * Mentor: Calvin Kirs, Apache DolphinScheduler PMC & Committer, 
[calvink...@apache.org|mailto:calvink...@apache.org]
 * Mailing List: d...@doris.apache.org

  was:
*Apache Doris*
Apache Doris is a real-time analytical database based on MPP architecture. As a 
unified platform that supports multiple data processing scenarios, it ensures 
high performance for low-latency and high-throughput queries, allows for easy 
federated queries on data lakes, and supports various data ingestion methods.
Page: [https://doris.apache.org|https://doris.apache.org/]
Github: [https://github.com/apache/doris]
h3. *Background*

Apache Doris supports acceleration of queries on external data sources to meet 
users' needs for federated queries and analysis.
Currently, Apache Doris supports multiple external catalogs including those 
from Hive, Iceberg, Hudi, and JDBC. Developers can connect more data sources to 
Apache Doris based on a unified framework.
h4. *Objective*
 * Enable Apache Doris to access one or more of these data sources via the 
Multi-Catalog feature: BigQuery/Kudu/Cassandra/Druid;
 * Compile relevant documentation. See an example here: 
[https://doris.apache.org/docs/dev/lakehouse/multi-catalog/hive]

*Task*
{*}Phase One{*}:
 * Get familiar with the Multi-Catalog structure of Apache Doris, including the 
metadata synchronization mechanism in FE and the data reading mechanism of BE.
 * Investigate how metadata should be acquired and how data access works 
regarding the picked data source(s); produce the corresponding design 
documentation.

{*}Phase Two{*}:
 * Develop connections to the picked data source(s) and implement access to 
metadata and data.
h3. *Learning Material*

{*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]
{*}Github{*}: [https://github.com/apache/doris]
h3. Mentor
 * Mentor: Mingyu Chen, Apache Doris PMC Member & Committer, 
[morning...@apache.org |mailto:yangyongqi...@apache.org]
 * Mentor: Calvin Kirs, Apache Dolphinscheduler PMC & Committer, 
[calvink...@apache.org|mailto:calvink...@apache.org]
 * Mailing List: d...@doris.apache.org


> [GSoC][Doris] Supports BigQuery/Apache Kudu/Apache Cassandra/Apache Druid in 
> Federated Queries 
> ---
>
> Key: COMDEV-512
> URL: https://issues.apache.org/jira/browse/COMDEV-512
> Project: Community Development
>  Issue Type: Task
>  Components: GSoC/Mentoring ideas
>Reporter: Zhijing Lu
>Priority: Major
>  Labels: Doris, full-time, gsoc2023, mentor
>
> *Apache Doris*
> Apache Doris is a real-time analytical database based on MPP architecture. As 
> a unified platform that supports multiple data processing scenarios, it 
> ensures high performance for low-latency and high-throughput queries, allows 
> for easy federated queries on data lakes, and supports various data ingestion 
> methods.
> Page: [https://

[jira] [Updated] (COMDEV-512) [GSoC][Doris] Supports BigQuery/Apache Kudu/Apache Cassandra/Apache Druid in Federated Queries

2023-03-12 Thread Zhijing Lu (Jira)


 [ 
https://issues.apache.org/jira/browse/COMDEV-512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijing Lu updated COMDEV-512:
--
Description: 
*Apache Doris*
Apache Doris is a real-time analytical database based on MPP architecture. As a 
unified platform that supports multiple data processing scenarios, it ensures 
high performance for low-latency and high-throughput queries, allows for easy 
federated queries on data lakes, and supports various data ingestion methods.
Page: [https://doris.apache.org|https://doris.apache.org/]
Github: [https://github.com/apache/doris]
h3. *Background*

Apache Doris supports acceleration of queries on external data sources to meet 
users' needs for federated queries and analysis.
Currently, Apache Doris supports multiple external catalogs including those 
from Hive, Iceberg, Hudi, and JDBC. Developers can connect more data sources to 
Apache Doris based on a unified framework.
h4. *Objective*
 * Enable Apache Doris to access one or more of these data sources via the 
Multi-Catalog feature: BigQuery/Kudu/Cassandra/Druid;
 * Compile relevant documentation. See an example here: 
[https://doris.apache.org/docs/dev/lakehouse/multi-catalog/hive]

*Task*
{*}Phase One{*}:
 * Get familiar with the Multi-Catalog structure of Apache Doris, including the 
metadata synchronization mechanism in FE and the data reading mechanism of BE.
 * Investigate how metadata should be acquired and how data access works 
regarding the picked data source(s); produce the corresponding design 
documentation.

{*}Phase Two{*}:
 * Develop connections to the picked data source(s) and implement access to 
metadata and data.
h3. *Learning Material*

{*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]
{*}Github{*}: [https://github.com/apache/doris]
h3. Mentor
 * Mentor: Mingyu Chen, Apache Doris PMC Member & Committer, 
[morning...@apache.org |mailto:yangyongqi...@apache.org]
 * Mentor: Calvin Kirs, Apache Dolphinscheduler PMC & Committer, 
[calvink...@apache.org|mailto:calvink...@apache.org]
 * Mailing List: d...@doris.apache.org

  was:
*Apache Doris*
Apache Doris is a real-time analytical database based on MPP architecture. As a 
unified platform that supports multiple data processing scenarios, it ensures 
high performance for low-latency and high-throughput queries, allows for easy 
federated queries on data lakes, and supports various data ingestion methods.
Page: [https://doris.apache.org|https://doris.apache.org/]
Github: [https://github.com/apache/doris]
h3. *Background*

Apache Doris supports acceleration of queries on external data sources to meet 
users' needs for federated queries and analysis.
Currently, Apache Doris supports multiple external catalogs including those 
from Hive, Iceberg, Hudi, and JDBC. Developers can connect more data sources to 
Apache Doris based on a unified framework.
h4. *Objective*
 * Enable Apache Doris to access one or more of these data sources via the 
Multi-Catalog feature: BigQuery/Kudu/Cassandra/Druid;
 * Compile relevant documentation. See an example here: 
[https://doris.apache.org/docs/dev/lakehouse/multi-catalog/hive]

*Task*
{*}Phase One{*}:
 * Get familiar with the Multi-Catalog structure of Apache Doris, including the 
metadata synchronization mechanism in FE and the data reading mechanism of BE.
 * Investigate how metadata should be acquired and how data access works 
regarding the picked data source(s); produce the corresponding design 
documentation.

{*}Phase Two{*}:
 * Develop connections to the picked data source(s) and implement access to 
metadata and data.
h3. *Learning Material*

{*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]
{*}Github{*}: [https://github.com/apache/doris]
h3. Mentor
 * Mentor: Mingyu Chen, Apache Doris PMC Member & Committer, 
[chenmin...@apache.org |mailto:yangyongqi...@apache.org]
 * Mentor: Calvin Kirs, Apache Dolphinscheduler PMC & Committer, 
[calvink...@apache.org|mailto:calvink...@apache.org]
 * Mailing List: d...@doris.apache.org


> [GSoC][Doris] Supports BigQuery/Apache Kudu/Apache Cassandra/Apache Druid in 
> Federated Queries 
> ---
>
> Key: COMDEV-512
> URL: https://issues.apache.org/jira/browse/COMDEV-512
> Project: Community Development
>  Issue Type: Task
>  Components: GSoC/Mentoring ideas
>Reporter: Zhijing Lu
>Priority: Major
>  Labels: Doris, full-time, gsoc2023, mentor
>
> *Apache Doris*
> Apache Doris is a real-time analytical database based on MPP architecture. As 
> a unified platform that supports multiple data processing scenarios, it 
> ensures high performance for low-latency and high-throughput queries, allows 
> for easy federated queries on data lakes, and supports various data ingestion 
> methods.
> Page: [https://d

[jira] [Updated] (COMDEV-511) [GSoC][Doris]Dictionary Encoding Acceleration

2023-03-12 Thread Zhijing Lu (Jira)


 [ 
https://issues.apache.org/jira/browse/COMDEV-511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijing Lu updated COMDEV-511:
--
Description: 
*Apache Doris*
Apache Doris is a real-time analytical database based on MPP architecture. As a 
unified platform that supports multiple data processing scenarios, it ensures 
high performance for low-latency and high-throughput queries, allows for easy 
federated queries on data lakes, and supports various data ingestion methods.
{*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]

{*}Github{*}: [https://github.com/apache/doris]
h3. *Background*

In Apache Doris, dictionary encoding is performed during data writing and 
compaction. Dictionary encoding is applied to string data types by 
default. The dictionary size of a column for one segment is 1M at most. 
Dictionary encoding accelerates queries on string columns, for example by 
converting the strings into INT values.
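
As a rough illustration of the idea in the paragraph above, here is a minimal Python sketch of dictionary encoding for a single string column. It is not Apache Doris code and makes no claim about how Doris implements the feature; the function names `dictionary_encode` and `filter_equals` and the sample column are hypothetical and exist only for this example. Each distinct string receives a small integer code, so an equality filter can compare integers instead of strings.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Return (dictionary, codes); each distinct string gets one integer code."""
    dictionary: List[str] = []   # code -> original string
    index: Dict[str, int] = {}   # original string -> code
    codes: List[int] = []        # the encoded column
    for value in values:
        code = index.get(value)
        if code is None:
            # First time this string is seen: assign the next free code.
            code = len(dictionary)
            index[value] = code
            dictionary.append(value)
        codes.append(code)
    return dictionary, codes


def filter_equals(dictionary: List[str], codes: List[int], needle: str) -> List[int]:
    """Return row positions equal to needle, comparing integer codes only."""
    try:
        target = dictionary.index(needle)  # one string lookup per query
    except ValueError:
        return []  # the value never occurs in this column
    return [row for row, code in enumerate(codes) if code == target]


# Hypothetical sample column, used only for illustration.
city = ["Berlin", "Paris", "Berlin", "Rome", "Paris", "Berlin"]
dictionary, codes = dictionary_encode(city)
print(dictionary)  # ['Berlin', 'Paris', 'Rome']
print(codes)       # [0, 1, 0, 2, 1, 0]
print(filter_equals(dictionary, codes, "Paris"))  # [1, 4]
```

The sketch keeps the whole dictionary in memory, which is the cost that the memory-optimization question in Phase Two is concerned with.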
 
h3. *Task*
 * Phase One: Get familiar with the implementation of Apache Doris dictionary 
encoding; learn how Apache Doris dictionary encoding accelerates queries.
 * Phase Two: Evaluate the effectiveness of full dictionary encoding and 
figure out how to optimize memory in such a case.

h3. *Learning Material*

{*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]
{*}Github{*}: [https://github.com/apache/doris]
h3. Mentor
 * Mentor: Chen Zhang, Apache Doris Committer, [zhangc...@apache.org 
|mailto:yangyongqi...@apache.org]
 * Mentor: Zhijing Lu, Apache Doris Committer, 
[luzhij...@apache.org|mailto:lihaop...@apache.org]  
 * Mailing List: d...@doris.apache.org

  was:
*Apache Doris*
Apache Doris is a real-time analytical database based on MPP architecture. As a 
unified platform that supports multiple data processing scenarios, it ensures 
high performance for low-latency and high-throughput queries, allows for easy 
federated queries on data lakes, and supports various data ingestion methods.
{*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]

{*}Github{*}: [https://github.com/apache/doris]
h3. *Background*
In Apache Doris, dictionary encoding is performed during data writing and 
compaction. Dictionary encoding is applied to string data types by 
default. The dictionary size of a column for one segment is 1M at most. 
Dictionary encoding accelerates queries on string columns, for example by 
converting the strings into INT values.
 
h3. *Task*
 * Phase One: Get familiar with the implementation of Apache Doris dictionary 
encoding; learn how Apache Doris dictionary encoding accelerates queries.
 * Phase Two: Evaluate the effectiveness of full dictionary encoding and 
figure out how to optimize memory in such a case.

h3. *Learning Material*

{*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]
{*}Github{*}: [https://github.com/apache/doris]
h3. Mentor
 * Mentor: Chen Zhang, Apache Doris Committer, [zhangec...@apache.org 
|mailto:yangyongqi...@apache.org]
 * Mentor: Zhijing Lu, Apache Doris Committer, 
[luzhij...@apache.org|mailto:lihaop...@apache.org]  
 * Mailing List: d...@doris.apache.org


> [GSoC][Doris]Dictionary Encoding Acceleration
> -
>
> Key: COMDEV-511
> URL: https://issues.apache.org/jira/browse/COMDEV-511
> Project: Community Development
>  Issue Type: Task
>  Components: GSoC/Mentoring ideas
>Reporter: Zhijing Lu
>Priority: Major
>  Labels: Doris, Mentor, full-time, gsoc2023
>
> *Apache Doris*
> Apache Doris is a real-time analytical database based on MPP architecture. As 
> a unified platform that supports multiple data processing scenarios, it 
> ensures high performance for low-latency and high-throughput queries, allows 
> for easy federated queries on data lakes, and supports various data ingestion 
> methods.
> {*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]
> {*}Github{*}: [https://github.com/apache/doris]
> h3. *Background*
> In Apache Doris, dictionary encoding is performed during data writing and 
> compaction. Dictionary encoding is applied to string data types by 
> default. The dictionary size of a column for one segment is 1M at most. 
> Dictionary encoding accelerates queries on string columns, for example by 
> converting the strings into INT values.
>  
> h3. *Task*
>  * Phase One: Get familiar with the implementation of Apache Doris dictionary 
> encoding; learn how Apache Doris dictionary encoding accelerates queries.
>  * Phase Two: Evaluate the effectiveness of full dictionary encoding and 
> figure out how to optimize memory in such a case.
> h3. *Learning Material*
> {*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]
> {*}Github{*}: [https://github.com/apache/doris]
> h3. Mentor
>  * Mentor: Chen Zhang, Apache Doris Committer, [zhangc...@apache.org 

[jira] [Updated] (COMDEV-512) [GSoC][Doris] Supports BigQuery/Apache Kudu/Apache Cassandra/Apache Druid in Federated Queries

2023-03-12 Thread Zhijing Lu (Jira)


 [ 
https://issues.apache.org/jira/browse/COMDEV-512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijing Lu updated COMDEV-512:
--
Description: 
*Apache Doris*
Apache Doris is a real-time analytical database based on MPP architecture. As a 
unified platform that supports multiple data processing scenarios, it ensures 
high performance for low-latency and high-throughput queries, allows for easy 
federated queries on data lakes, and supports various data ingestion methods.
Page: [https://doris.apache.org|https://doris.apache.org/]
Github: [https://github.com/apache/doris]
h3. *Background*

Apache Doris supports acceleration of queries on external data sources to meet 
users' needs for federated queries and analysis.
Currently, Apache Doris supports multiple external catalogs including those 
from Hive, Iceberg, Hudi, and JDBC. Developers can connect more data sources to 
Apache Doris based on a unified framework.
h4. *Objective*
 * Enable Apache Doris to access one or more of these data sources via the 
Multi-Catalog feature: BigQuery/Kudu/Cassandra/Druid;
 * Compile relevant documentation. See an example here: 
[https://doris.apache.org/docs/dev/lakehouse/multi-catalog/hive]

*Task*
{*}Phase One{*}:
 * Get familiar with the Multi-Catalog structure of Apache Doris, including the 
metadata synchronization mechanism in FE and the data reading mechanism of BE.
 * Investigate how metadata should be acquired and how data access works 
regarding the picked data source(s); produce the corresponding design 
documentation.

{*}Phase Two{*}:
 * Develop connections to the picked data source(s) and implement access to 
metadata and data.

h3. *Learning Material*

{*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]
{*}Github{*}: [https://github.com/apache/doris]
h3. Mentor
 * Mentor: Mingyu Chen, Apache Doris PMC Member & Committer, 
[morning...@apache.org |mailto:yangyongqi...@apache.org]
 * Mentor: Calvin Kirs, Apache Geode PMC & Committer, 
[calvink...@apache.org|mailto:calvink...@apache.org]
 * Mailing List: d...@doris.apache.org

  was:
*Apache Doris*
Apache Doris is a real-time analytical database based on MPP architecture. As a 
unified platform that supports multiple data processing scenarios, it ensures 
high performance for low-latency and high-throughput queries, allows for easy 
federated queries on data lakes, and supports various data ingestion methods.
Page: [https://doris.apache.org|https://doris.apache.org/]
Github: [https://github.com/apache/doris]
h3. *Background*

Apache Doris supports acceleration of queries on external data sources to meet 
users' needs for federated queries and analysis.
Currently, Apache Doris supports multiple external catalogs including those 
from Hive, Iceberg, Hudi, and JDBC. Developers can connect more data sources to 
Apache Doris based on a unified framework.
h4. *Objective*
 * Enable Apache Doris to access one or more of these data sources via the 
Multi-Catalog feature: BigQuery/Kudu/Cassandra/Druid;
 * Compile relevant documentation. See an example here: 
[https://doris.apache.org/docs/dev/lakehouse/multi-catalog/hive]

*Task*
{*}Phase One{*}:
 * Get familiar with the Multi-Catalog structure of Apache Doris, including the 
metadata synchronization mechanism in FE and the data reading mechanism of BE.
 * Investigate how metadata should be acquired and how data access works 
regarding the picked data source(s); produce the corresponding design 
documentation.

{*}Phase Two{*}:
 * Develop connections to the picked data source(s) and implement access to 
metadata and data.

h3. *Learning Material*

{*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]
{*}Github{*}: [https://github.com/apache/doris]
h3. Mentor
 * Mentor: Mingyu Chen, Apache Doris PMC Member & Committer, 
[morning...@apache.org |mailto:yangyongqi...@apache.org]
 * Mentor: Calvin Kirs, Apache Dolphinscheduler PMC & Committer, 
[calvink...@apache.org|mailto:calvink...@apache.org]
 * Mailing List: d...@doris.apache.org


> [GSoC][Doris] Supports BigQuery/Apache Kudu/Apache Cassandra/Apache Druid in 
> Federated Queries 
> ---
>
> Key: COMDEV-512
> URL: https://issues.apache.org/jira/browse/COMDEV-512
> Project: Community Development
>  Issue Type: Task
>  Components: GSoC/Mentoring ideas
>Reporter: Zhijing Lu
>Priority: Major
>  Labels: Doris, full-time, gsoc2023, mentor
>
> *Apache Doris*
> Apache Doris is a real-time analytical database based on MPP architecture. As 
> a unified platform that supports multiple data processing scenarios, it 
> ensures high performance for low-latency and high-throughput queries, allows 
> for easy federated queries on data lakes, and supports various data ingestion 
> methods.
> Page: [https://doris.apac

[jira] [Updated] (COMDEV-512) [GSoC][Doris] Supports BigQuery/Apache Kudu/Apache Cassandra/Apache Druid in Federated Queries

2023-03-12 Thread Zhijing Lu (Jira)


 [ 
https://issues.apache.org/jira/browse/COMDEV-512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijing Lu updated COMDEV-512:
--
Description: 
*Apache Doris*
Apache Doris is a real-time analytical database based on MPP architecture. As a 
unified platform that supports multiple data processing scenarios, it ensures 
high performance for low-latency and high-throughput queries, allows for easy 
federated queries on data lakes, and supports various data ingestion methods.
Page: [https://doris.apache.org|https://doris.apache.org/]
Github: [https://github.com/apache/doris]
h3. *Background*

Apache Doris supports acceleration of queries on external data sources to meet 
users' needs for federated queries and analysis.
Currently, Apache Doris supports multiple external catalogs including those 
from Hive, Iceberg, Hudi, and JDBC. Developers can connect more data sources to 
Apache Doris based on a unified framework.
h4. *Objective*
 * Enable Apache Doris to access one or more of these data sources via the 
Multi-Catalog feature: BigQuery/Kudu/Cassandra/Druid;
 * Compile relevant documentation. See an example here: 
[https://doris.apache.org/docs/dev/lakehouse/multi-catalog/hive]

*Task*
{*}Phase One{*}:
 * Get familiar with the Multi-Catalog structure of Apache Doris, including the 
metadata synchronization mechanism in FE and the data reading mechanism of BE.
 * Investigate how metadata should be acquired and how data access works 
regarding the picked data source(s); produce the corresponding design 
documentation.

{*}Phase Two{*}:
 * Develop connections to the picked data source(s) and implement access to 
metadata and data.

h3. *Learning Material*

{*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]
{*}Github{*}: [https://github.com/apache/doris]
h3. Mentor
 * Mentor: Mingyu Chen, Apache Doris PMC Member & Committer, 
[morning...@apache.org |mailto:yangyongqi...@apache.org]
 * Mentor: Calvin Kirs, Apache Geode PMC & Committer, 
[k...@apache.org|mailto:calvink...@apache.org]
 * Mailing List: d...@doris.apache.org

  was:
*Apache Doris*
Apache Doris is a real-time analytical database based on MPP architecture. As a 
unified platform that supports multiple data processing scenarios, it ensures 
high performance for low-latency and high-throughput queries, allows for easy 
federated queries on data lakes, and supports various data ingestion methods.
Page: [https://doris.apache.org|https://doris.apache.org/]
Github: [https://github.com/apache/doris]
h3. *Background*

Apache Doris supports acceleration of queries on external data sources to meet 
users' needs for federated queries and analysis.
Currently, Apache Doris supports multiple external catalogs including those 
from Hive, Iceberg, Hudi, and JDBC. Developers can connect more data sources to 
Apache Doris based on a unified framework.
h4. *Objective*
 * Enable Apache Doris to access one or more of these data sources via the 
Multi-Catalog feature: BigQuery/Kudu/Cassandra/Druid;
 * Compile relevant documentation. See an example here: 
[https://doris.apache.org/docs/dev/lakehouse/multi-catalog/hive]

*Task*
{*}Phase One{*}:
 * Get familiar with the Multi-Catalog structure of Apache Doris, including the 
metadata synchronization mechanism in FE and the data reading mechanism of BE.
 * Investigate how metadata should be acquired and how data access works 
regarding the picked data source(s); produce the corresponding design 
documentation.

{*}Phase Two{*}:
 * Develop connections to the picked data source(s) and implement access to 
metadata and data.

h3. *Learning Material*

{*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]
{*}Github{*}: [https://github.com/apache/doris]
h3. Mentor
 * Mentor: Mingyu Chen, Apache Doris PMC Member & Committer, 
[morning...@apache.org |mailto:yangyongqi...@apache.org]
 * Mentor: Calvin Kirs, Apache Geode PMC & Committer, 
[calvink...@apache.org|mailto:calvink...@apache.org]
 * Mailing List: d...@doris.apache.org


> [GSoC][Doris] Supports BigQuery/Apache Kudu/Apache Cassandra/Apache Druid in 
> Federated Queries 
> ---
>
> Key: COMDEV-512
> URL: https://issues.apache.org/jira/browse/COMDEV-512
> Project: Community Development
>  Issue Type: Task
>  Components: GSoC/Mentoring ideas
>Reporter: Zhijing Lu
>Priority: Major
>  Labels: Doris, full-time, gsoc2023, mentor
>
> *Apache Doris*
> Apache Doris is a real-time analytical database based on MPP architecture. As 
> a unified platform that supports multiple data processing scenarios, it 
> ensures high performance for low-latency and high-throughput queries, allows 
> for easy federated queries on data lakes, and supports various data ingestion 
> methods.
> Page: [https://doris.apache.org|https://do

[jira] [Updated] (COMDEV-512) [GSoC][Doris] Supports BigQuery/Apache Kudu/Apache Cassandra/Apache Druid in Federated Queries

2023-03-12 Thread Zhijing Lu (Jira)


 [ 
https://issues.apache.org/jira/browse/COMDEV-512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijing Lu updated COMDEV-512:
--
Labels: ApacheDoris full-time gsoc2023 mentor  (was: Doris full-time 
gsoc2023 mentor)

> [GSoC][Doris] Supports BigQuery/Apache Kudu/Apache Cassandra/Apache Druid in 
> Federated Queries 
> ---
>
> Key: COMDEV-512
> URL: https://issues.apache.org/jira/browse/COMDEV-512
> Project: Community Development
>  Issue Type: Task
>  Components: GSoC/Mentoring ideas
>Reporter: Zhijing Lu
>Priority: Major
>  Labels: ApacheDoris, full-time, gsoc2023, mentor
>
> *Apache Doris*
> Apache Doris is a real-time analytical database based on MPP architecture. As 
> a unified platform that supports multiple data processing scenarios, it 
> ensures high performance for low-latency and high-throughput queries, allows 
> for easy federated queries on data lakes, and supports various data ingestion 
> methods.
> Page: [https://doris.apache.org|https://doris.apache.org/]
> Github: [https://github.com/apache/doris]
> h3. *Background*
> Apache Doris supports acceleration of queries on external data sources to 
> meet users' needs for federated queries and analysis.
> Currently, Apache Doris supports multiple external catalogs including those 
> from Hive, Iceberg, Hudi, and JDBC. Developers can connect more data sources 
> to Apache Doris based on a unified framework.
> h4. *Objective*
>  * Enable Apache Doris to access one or more of these data sources via the 
> Multi-Catalog feature: BigQuery/Kudu/Cassandra/Druid;
>  * Compile relevant documentation. See an example here: 
> [https://doris.apache.org/docs/dev/lakehouse/multi-catalog/hive]
> *Task*
> {*}Phase One{*}:
>  * Get familiar with the Multi-Catalog structure of Apache Doris, including 
> the metadata synchronization mechanism in FE and the data reading mechanism 
> of BE.
>  * Investigate how metadata should be acquired and how data access works 
> regarding the picked data source(s); produce the corresponding design 
> documentation.
> {*}Phase Two{*}:
>  * Develop connections to the picked data source(s) and implement access to 
> metadata and data.
> h3. *Learning Material*
> {*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]
> {*}Github{*}: [https://github.com/apache/doris]
> h3. Mentor
>  * Mentor: Mingyu Chen, Apache Doris PMC Member & Committer, 
> [morning...@apache.org |mailto:yangyongqi...@apache.org]
>  * Mentor: Calvin Kirs, Apache Geode PMC & Committer, 
> [k...@apache.org|mailto:calvink...@apache.org]
>  * Mailing List: d...@doris.apache.org



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: dev-unsubscr...@community.apache.org
For additional commands, e-mail: dev-h...@community.apache.org



[jira] [Updated] (COMDEV-512) [GSoC][Doris] Supports BigQuery/Apache Kudu/Apache Cassandra/Apache Druid in Federated Queries

2023-03-12 Thread Maxim Solodovnik (Jira)


 [ 
https://issues.apache.org/jira/browse/COMDEV-512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Solodovnik updated COMDEV-512:

Labels: Doris full-time gsoc2023 mentor  (was: ApacheDoris full-time 
gsoc2023 mentor)

> [GSoC][Doris] Supports BigQuery/Apache Kudu/Apache Cassandra/Apache Druid in 
> Federated Queries 
> ---
>
> Key: COMDEV-512
> URL: https://issues.apache.org/jira/browse/COMDEV-512
> Project: Community Development
>  Issue Type: Task
>  Components: GSoC/Mentoring ideas
>Reporter: Zhijing Lu
>Priority: Major
>  Labels: Doris, full-time, gsoc2023, mentor
>
> *Apache Doris*
> Apache Doris is a real-time analytical database based on MPP architecture. As 
> a unified platform that supports multiple data processing scenarios, it 
> ensures high performance for low-latency and high-throughput queries, allows 
> for easy federated queries on data lakes, and supports various data ingestion 
> methods.
> Page: [https://doris.apache.org|https://doris.apache.org/]
> Github: [https://github.com/apache/doris]
> h3. *Background*
> Apache Doris supports acceleration of queries on external data sources to 
> meet users' needs for federated queries and analysis.
> Currently, Apache Doris supports multiple external catalogs including those 
> from Hive, Iceberg, Hudi, and JDBC. Developers can connect more data sources 
> to Apache Doris based on a unified framework.
> h4. *Objective*
>  * Enable Apache Doris to access one or more of these data sources via the 
> Multi-Catalog feature: BigQuery/Kudu/Cassandra/Druid;
>  * Compile relevant documentation. See an example here: 
> [https://doris.apache.org/docs/dev/lakehouse/multi-catalog/hive]
> *Task*
> {*}Phase One{*}:
>  * Get familiar with the Multi-Catalog structure of Apache Doris, including 
> the metadata synchronization mechanism in FE and the data reading mechanism 
> of BE.
>  * Investigate how metadata should be acquired and how data access works 
> regarding the picked data source(s); produce the corresponding design 
> documentation.
> {*}Phase Two{*}:
>  * Develop connections to the picked data source(s) and implement access to 
> metadata and data.
> h3. *Learning Material*
> {*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]
> {*}Github{*}: [https://github.com/apache/doris]
> h3. Mentor
>  * Mentor: Mingyu Chen, Apache Doris PMC Member & Committer, 
> [morning...@apache.org |mailto:yangyongqi...@apache.org]
>  * Mentor: Calvin Kirs, Apache Geode PMC & Committer, 
> [k...@apache.org|mailto:calvink...@apache.org]
>  * Mailing List: d...@doris.apache.org






[jira] [Commented] (COMDEV-512) [GSoC][Doris] Supports BigQuery/Apache Kudu/Apache Cassandra/Apache Druid in Federated Queries

2023-03-12 Thread Maxim Solodovnik (Jira)


[ 
https://issues.apache.org/jira/browse/COMDEV-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17699472#comment-17699472
 ] 

Maxim Solodovnik commented on COMDEV-512:
-

[~luzhijing] please keep the label `Doris`; otherwise the project will be 
listed incorrectly on the Ideas page :)))

> [GSoC][Doris] Supports BigQuery/Apache Kudu/Apache Cassandra/Apache Druid in 
> Federated Queries 
> ---
>
> Key: COMDEV-512
> URL: https://issues.apache.org/jira/browse/COMDEV-512
> Project: Community Development
>  Issue Type: Task
>  Components: GSoC/Mentoring ideas
>Reporter: Zhijing Lu
>Priority: Major
>  Labels: ApacheDoris, full-time, gsoc2023, mentor
>
> *Apache Doris*
> Apache Doris is a real-time analytical database based on MPP architecture. As 
> a unified platform that supports multiple data processing scenarios, it 
> ensures high performance for low-latency and high-throughput queries, allows 
> for easy federated queries on data lakes, and supports various data ingestion 
> methods.
> Page: [https://doris.apache.org|https://doris.apache.org/]
> Github: [https://github.com/apache/doris]
> h3. *Background*
> Apache Doris supports acceleration of queries on external data sources to 
> meet users' needs for federated queries and analysis.
> Currently, Apache Doris supports multiple external catalogs including those 
> from Hive, Iceberg, Hudi, and JDBC. Developers can connect more data sources 
> to Apache Doris based on a unified framework.
> h4. *Objective*
>  * Enable Apache Doris to access one or more of these data sources via the 
> Multi-Catalog feature: BigQuery/Kudu/Cassandra/Druid;
>  * Compile relevant documentation. See an example here: 
> [https://doris.apache.org/docs/dev/lakehouse/multi-catalog/hive]
> *Task*
> {*}Phase One{*}:
>  * Get familiar with the Multi-Catalog structure of Apache Doris, including 
> the metadata synchronization mechanism in FE and the data reading mechanism 
> of BE.
>  * Investigate how metadata should be acquired and how data access works 
> regarding the picked data source(s); produce the corresponding design 
> documentation.
> {*}Phase Two{*}:
>  * Develop connections to the picked data source(s) and implement access to 
> metadata and data.
> h3. *Learning Material*
> {*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]
> {*}Github{*}: [https://github.com/apache/doris]
> h3. Mentor
>  * Mentor: Mingyu Chen, Apache Doris PMC Member & Committer, 
> [morning...@apache.org |mailto:yangyongqi...@apache.org]
>  * Mentor: Calvin Kirs, Apache Geode PMC & Committer, 
> [k...@apache.org|mailto:calvink...@apache.org]
>  * Mailing List: d...@doris.apache.org






[jira] [Updated] (COMDEV-512) [GSoC][Doris] Supports BigQuery/Apache Kudu/Apache Cassandra/Apache Druid in Federated Queries

2023-03-12 Thread Zhijing Lu (Jira)


 [ 
https://issues.apache.org/jira/browse/COMDEV-512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijing Lu updated COMDEV-512:
--
Description: 
*Apache Doris*
Apache Doris is a real-time analytical database based on MPP architecture. As a 
unified platform that supports multiple data processing scenarios, it ensures 
high performance for low-latency and high-throughput queries, allows for easy 
federated queries on data lakes, and supports various data ingestion methods.
Page: [https://doris.apache.org|https://doris.apache.org/]
Github: [https://github.com/apache/doris]
h3. *Background*

Apache Doris supports acceleration of queries on external data sources to meet 
users' needs for federated queries and analysis.
Currently, Apache Doris supports multiple external catalogs including those 
from Hive, Iceberg, Hudi, and JDBC. Developers can connect more data sources to 
Apache Doris based on a unified framework.
h4. *Objective*
 * Enable Apache Doris to access one or more of these data sources via the 
Multi-Catalog feature: BigQuery/Kudu/Cassandra/Druid;
 * Compile relevant documentation. See an example here: 
[https://doris.apache.org/docs/dev/lakehouse/multi-catalog/hive]

*Task*
{*}Phase One{*}:
 * Get familiar with the Multi-Catalog structure of Apache Doris, including the 
metadata synchronization mechanism in FE and the data reading mechanism of BE.
 * Investigate how metadata should be acquired and how data access works 
regarding the picked data source(s); produce the corresponding design 
documentation.

{*}Phase Two{*}:
 * Develop connections to the picked data source(s) and implement access to 
metadata and data.

h3. *Learning Material*

{*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]
{*}Github{*}: [https://github.com/apache/doris]
h3. Mentor
 * Mentor: Mingyu Chen, Apache Doris PMC Member & Committer, 
[morning...@apache.org |mailto:yangyongqi...@apache.org]
 * Mentor: Calvin Kirs, Apache Geode PMC & Committer, 
[k...@apache.org|mailto:calvink...@apache.org]
 * Mailing List: d...@doris.apache.org


> [GSoC][Doris] Supports BigQuery/Apache Kudu/Apache Cassandra/Apache Druid in 
> Federated Queries 
> ---
>
> Key: COMDEV-512
> URL: https://issues.apache.org/jira/browse/COMDEV-512
> Project: Community Development
>  Issue Type: Task
>  Components: GSoC/Mentoring ideas
>Reporter: Zhijing Lu
>Priority: Major
>  Labels: Doris, full-time, gsoc2023, mentor
>
> *Apache Doris*
> Apache Doris is a real-time analytical database based on MPP architecture. As 
> a unified platform that supports multiple data processing scenarios, it 
> ensures high performance for low-latency and high-throughput queries, allows 
> for easy federated queries on data lakes, and supports various data ingestion 
> methods.
> Page: [https://doris.apache.org|https://doris.apa

Re: Google Summer of Code 2023 Mentor Registration

2023-03-12 Thread Maxim Solodovnik
Hello Sally,

Thanks for the fast reply, and sorry for the delay in answering :(

@Brian, can you please help us announce GSoC ASF-wide? :)

On Fri, 3 Mar 2023 at 00:49, Sally Khudairi  wrote:
>
> Thank you, Maxim -- congratulations on securing the ASF's role as a mentoring
> organization for another year!
>
> Unfortunately, I am unable to help with ASF-wide announcements, as I am no 
> longer VP Marketing & Publicity as of 2021.
>
> As such, I'm copying VP M&P Brian Proffitt and the team here for their
> follow-up.
>
> Warm regards,
> Sally
>
> - - -
> Vice President Sponsor Relations
> The Apache Software Foundation
>
> Tel +1 617 921 8656 | s...@apache.org
>
> On Thu, Mar 2, 2023, at 11:50, Maxim Solodovnik wrote:
> > Hello Sally,
> >
> > Can we send the message below ASF-wide?
> > Maybe something needs to be changed (wording, etc.)? :)
> >
> > On Thu, 2 Mar 2023 at 13:55, Sanyam Goel  wrote:
> >>
> >> Dear PMCs,
> >>
> >> I'm happy to announce that the ASF has made it onto the list of accepted
> >> organizations for Google Summer of Code 2023! [1,2]
> >>
> >> It is now time for mentors to sign up, so please pass this email on to
> >> your community and podlings. If you aren’t already subscribed to
> >> ment...@community.apache.org, you should do so now, or else you might miss
> >> important information.
> >>
> >> Mentor signup requires two steps: mentor signup in Google's system [3] and
> >> PMC acknowledgement.
> >>
> >> If you want to mentor a project in this year's SoC, you will have to:
> >>
> >> 1. Be an Apache committer.
> >> 2. Request an acknowledgement from the PMC for which you want to mentor
> >> projects. Use the template below and *do not forget to copy
> >> ment...@community.apache.org*. We will use the email address you indicate
> >> to send the invite to be a mentor for Apache.
> >>
> >> PMCs, read carefully, please.
> >>
> >> We request that each mentor be acknowledged by a PMC member. This is to
> >> ensure the mentor is in good standing with the community. When you receive
> >> a request for acknowledgement, please ACK it and cc
> >> ment...@community.apache.org.
> >>
> >> Lastly, it is not yet too late to record your ideas in Jira (see the
> >> previous emails for details). GSoC participants will now begin to explore
> >> ideas, so if you haven’t already done so, record your ideas immediately!
> >>
> >> Cheers,
> >> Sanyam
> >>
> >>
> >> Mentor request email template:
> >> 
> >> to: private@.apache.org
> >> cc: ment...@community.apache.org
> >> subject: GSoC 2023 mentor request for 
> >>
> >>  PMC,
> >>
> >> please acknowledge my request to become a mentor for Google Summer of Code
> >> 2023 projects for Apache Software Foundation
> >> .
> >>
> >> I would like to receive the mentor invite to 
> >>
> >> 
> >>
> >> 
> >>
> >>
> >> [1] https://summerofcode.withgoogle.com/programs/2023/organizations
> >> [2] https://summerofcode.withgoogle.com/programs/2023/organizations/apache-software-foundation
> >> [3] https://summerofcode.withgoogle.com/
> >
> >
> >
> > --
> > Best regards,
> > Maxim



-- 
Best regards,
Maxim
