Hello -
We have implemented Flink on Kubernetes with Google Cloud Storage in a
high-availability configuration, as per the ConfigMap below. Everything appears
to be working normally, and state is being saved to GCS.
However, every now and then (perhaps weekly or every other week), all of the
submitted jobs are lost and the cluster appears to be completely reset. Perhaps
GKE is doing maintenance or something of that nature, but the point is that the
cluster does not come back from this in an operational state, with all jobs
restored to running status.
Is there something we are missing? Thanks!
-jc
apiVersion: v1
kind: ConfigMap
metadata:
  name: flink-config
  labels:
    app: flink
data:
  flink-conf.yaml: |+
    jobmanager.rpc.address: flink-jobmanager
    taskmanager.numberOfTaskSlots: 1
    blob.server.port: 6124
    jobmanager.rpc.port: 6123
    taskmanager.rpc.port: 6122
    jobmanager.heap.size: 1024m
    taskmanager.memory.process.size: 1024m
    kubernetes.cluster-id: cluster1
    high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
    high-availability.storageDir: gs://storage-uswest.yyy.com/kubernetes-flink
    state.backend: filesystem
    state.checkpoints.dir: gs://storage-uswest.yyy.com/kubernetes-checkpoint
    state.savepoints.dir: gs://storage-uswest.yyy.com/kubernetes-savepoint
    execution.checkpointing.interval: 3min
    execution.checkpointing.externalized-checkpoint-retention: DELETE_ON_CANCELLATION
    execution.checkpointing.max-concurrent-checkpoints: 1
    execution.checkpointing.min-pause: 0
    execution.checkpointing.mode: EXACTLY_ONCE
    execution.checkpointing.timeout: 10min
    execution.checkpointing.tolerable-failed-checkpoints: 0
    execution.checkpointing.unaligned: false
    restart-strategy: fixed-delay
    restart-strategy.fixed-delay.attempts: 10
    restart-strategy.fixed-delay.delay: 10s
  log4j.properties: |+
    log4j.rootLogger=INFO, file
    log4j.logger.akka=INFO
    log4j.logger.org.apache.kafka=INFO
    log4j.logger.org.apache.hadoop=INFO
    log4j.logger.org.apache.zookeeper=INFO
    log4j.appender.file=org.apache.log4j.FileAppender
    log4j.appender.file.file=${log.file}
    log4j.appender.file.layout=org.apache.log4j.PatternLayout
    log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
    log4j.logger.org.apache.flink.shaded.akka.org.jboss.netty.channel.DefaultChannelPipeline=ERROR, file
___
Julian Cardarelli
CEO