Hi, Ben.

Thank you for driving it.
Sounds good to me~

Best,
Yuepeng Pan

On 2024/03/30 10:07:42 Huajie Wang wrote:
> hi devs:
> 
> 
> Currently, the StreamPark platform provides multiple configuration
> files for users to configure, such as application.yml,
> application-pgsql.yml, application-mysql.yml, kerberos.yml, etc. We can
> improve these configuration files. Many of them contain internal
> system configurations. For example, in application.yml, a large number
> of entries are internal platform configurations, such as the
> jackson config for integrating with Spring Boot, the swagger-ui config,
> and the 'allow-circular-references' parameter for Spring. These do not
> need user configuration and should not be exposed to users.
> 
> application.yml:
> ```yaml
> 
> server:
>   port: 10000
>   undertow:
>     buffer-size: 1024
>     direct-buffers: true
>     threads:
>       io: 4
>       worker: 20
> 
> logging:
>   level:
>     root: info
> 
> knife4j:
>   enable: true
>   basic:
>     # basic authentication, used to access swagger-ui and doc
>     enable: false
>     username: admin
>     password: streampark
> 
> springdoc:
>   api-docs:
>     enabled: true
>   swagger-ui:
>     path: /swagger-ui.html
>   packages-to-scan: org.apache.streampark.console
> 
> spring:
>   profiles.active: h2 #[h2,pgsql,mysql]
>   application.name: StreamPark
>   devtools.restart.enabled: false
>   mvc.pathmatch.matching-strategy: ant_path_matcher
>   servlet:
>     multipart:
>       enabled: true
>       max-file-size: 500MB
>       max-request-size: 500MB
>   aop.proxy-target-class: true
>   messages.encoding: utf-8
>   jackson:
>     date-format: yyyy-MM-dd HH:mm:ss
>     time-zone: GMT+8
>     deserialization:
>       fail-on-unknown-properties: false
>   main:
>     allow-circular-references: true
>     banner-mode: off
>   mvc:
>     converters:
>       preferred-json-mapper: jackson
> 
> management:
>   endpoints:
>     web:
>       exposure:
>         include: [ 'health', 'httptrace', 'metrics' ]
>   endpoint:
>     health:
>       enabled: true
>       show-details: always
>       probes:
>         enabled: true
>   health:
>     ldap:
>       enabled: false
> 
> streampark:
>   proxy:
>     # knox process address, e.g: https://cdpsit02.example.cn:8443/gateway/cdp-proxy/yarn
>     yarn-url:
>     # lark alert proxy,default https://open.feishu.cn
>     lark-url:
>   yarn:
>     # default simple, or kerberos
>     http-auth: simple
> 
>   # HADOOP_USER_NAME
>   hadoop-user-name: hdfs
>   # local workspace, used to store source code and build dir etc.
>   workspace:
>     local: /opt/streampark_workspace
>     remote: hdfs:///streampark   # support hdfs:///streampark/, /streampark, hdfs://host:ip/streampark/
> 
>   # remote docker register namespace for streampark
>   docker:
>     # instantiating DockerHttpClient
>     http-client:
>       max-connections: 10000
>       connection-timeout-sec: 10000
>       response-timeout-sec: 12000
>       docker-host: ""
> 
>   # flink-k8s tracking configuration
>   flink-k8s:
>     tracking:
>       silent-state-keep-sec: 10
>       polling-task-timeout-sec:
>         job-status: 120
>         cluster-metric: 120
>       polling-interval-sec:
>         job-status: 2
>         cluster-metric: 3
>     # If you need to specify an ingress controller, you can use this.
>     ingress:
>       class: nginx
> 
>   # packer garbage resources collection configuration
>   packer-gc:
>     # maximum retention time for temporary build resources
>     max-resource-expired-hours: 120
>     # gc task running interval hours
>     exec-cron: 0 0 0/6 * * ?
> 
>   shiro:
>     # token timeout, unit second
>     jwtTimeOut: 86400
>     # backend authentication-free resources url
>     anonUrl: >
> 
> ldap:
>   # Is ldap enabled? If so, please modify the urls
>   enable: false
>   ## AD server IP, default port 389
>   urls: ldap://99.99.99.99:389
>   ## Login Account
>   base-dn: dc=streampark,dc=com
>   username: cn=Manager,dc=streampark,dc=com
>   password: streampark
>   user:
>     identity-attribute: uid
>     email-attribute: mail
> 
> ```
> 
> 
> So, I propose that we improve these configurations by providing users
> with only one configuration file (only one). The configurations in this
> file should be completely user-focused, clear, and limited to core
> settings.
> 
> e.g:
> ```yaml
> 
> # logging level
> logging.level.root: info
> # server port
> server.port: 10000
> # The user's login session has a validity period. If it exceeds this
> # time, the user will be automatically logged out
> # unit: s|m|h|d, s: second, m: minute, h: hour, d: day
> server.session.ttl: 2h # unit[s|m|h|d], e.g: 24h, 2d....
> 
> # see: https://github.com/undertow-io/undertow/blob/master/core/src/main/java/io/undertow/Undertow.java
> server.undertow.direct-buffers: true
> server.undertow.buffer-size: 1024
> server.undertow.threads.io: 16
> server.undertow.threads.worker: 256
> 
> # system database, default h2, supported: h2|mysql|pgsql
> datasource.dialect: h2 # h2, mysql, pgsql
> #-------if datasource.dialect is mysql or pgsql, it is necessary to set-------
> datasource.username:
> datasource.password:
> # mysql jdbc url example:
> # datasource.url: jdbc:mysql://localhost:3306/streampark?useUnicode=true&characterEncoding=UTF-8&useJDBCCompliantTimezoneShift=true&useLegacyDatetimeCode=false&serverTimezone=GMT%2B8
> # postgresql jdbc url example:
> # datasource.url: jdbc:postgresql://localhost:5432/streampark?stringtype=unspecified
> datasource.url:
> #---------------------------------------------------------------------------------
> 
> # Directory for storing locally built project
> streampark.workspace.local: /tmp/streampark
> # The root hdfs path of the jars, same as yarn.provided.lib.dirs for
> # flink on yarn-application
> # and same as --jars for spark on yarn
> streampark.workspace.remote: hdfs:///streampark/
> # hadoop yarn proxy path, e.g: knox process address
> # https://streampark.com:8443/proxy/yarn
> streampark.proxy.yarn-url:
> # lark proxy address, default https://open.feishu.cn
> streampark.proxy.lark-url:
> # flink on yarn or spark on yarn, monitoring job status from yarn, it
> # is necessary to set hadoop.http.authentication.type
> streampark.yarn.http-auth: simple  # default simple, or kerberos
> # flink on yarn or spark on yarn, it is necessary to set
> streampark.hadoop-user-name: hdfs
> # flink on k8s ingress setting. If an ingress controller is specified
> # in the configuration, the ingress class
> # kubernetes.io/ingress.class must be specified when creating the
> # ingress, since there are often
> # multiple ingress controllers in a production environment.
> streampark.flink-k8s.ingress.class: nginx
> 
> # sign in to streampark with ldap.
> ldap.enable: false  # ldap enabled
> ldap.urls: ldap://99.99.99.99:389 #AD server IP, default port 389
> ldap.base-dn: dc=streampark,dc=com  # Login Account
> ldap.username: cn=Manager,dc=streampark,dc=com
> ldap.password: streampark
> ldap.user.identity-attribute: uid
> ldap.user.email-attribute: mail
> 
> # flink on yarn or spark on yarn, when the hadoop cluster enables
> # kerberos authentication,
> # it is necessary to set up Kerberos-related authentication parameters.
> security.kerberos.login.enable: false
> security.kerberos.login.debug: false
> # kerberos principal path
> security.kerberos.login.principal:
> security.kerberos.login.krb5:
> security.kerberos.login.keytab:
> security.kerberos.ttl: 2h # unit [s|m|h|d]
> 
> ```
> 
> this is issue: https://github.com/apache/incubator-streampark/issues/3641
> 
> What's your opinion on this? You are welcome to discuss.
> 
> 
> 
> Best,
> Huajie Wang
> 

Reply via email to