Hi prometheus folks,

I have a question about alertmanager.

There is a year-old issue about several HA alertmanager clusters merging 
into one big cluster over time: 
https://github.com/prometheus/alertmanager/issues/2250 

I managed to reproduce it on my local k8s kind cluster. It seems there is a 
small discrepancy between the list of peers reported by the gossip library 
and the list of peers from the alertmanager config file.

We can work around it with a k8s network policy. However, a more proper fix 
would be on the alertmanager side: keep an eye on the number of peers and 
compare it with the desired number. If the state is unexpected, clear the 
peer table, do DNS resolution once more, and form a new peer table. Maybe 
there is a better solution. What do you think? 
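
To make the idea a bit more concrete, here is a rough Go sketch of the 
comparison step only. The function name reconcilePeers and the host names 
are made up for illustration; they are not part of alertmanager or the 
gossip library. The desired list is assumed to come from re-resolving the 
configured peers via DNS, and the members list from whatever the gossip 
layer currently reports.

```go
package main

import (
	"fmt"
	"sort"
)

// reconcilePeers is a hypothetical helper (not an alertmanager API).
// members: peers the gossip layer currently reports.
// desired: peers expected from the config file after a fresh DNS lookup.
// It returns the unexpected members to drop and the missing peers to join.
func reconcilePeers(members, desired []string) (drop, join []string) {
	want := make(map[string]bool, len(desired))
	for _, p := range desired {
		want[p] = true
	}

	have := make(map[string]bool, len(members))
	for _, p := range members {
		if have[p] {
			continue // ignore duplicate entries
		}
		have[p] = true
		if !want[p] {
			drop = append(drop, p)
		}
	}

	for _, p := range desired {
		if !have[p] {
			join = append(join, p)
			have[p] = true // avoid listing the same missing peer twice
		}
	}

	sort.Strings(drop)
	sort.Strings(join)
	return drop, join
}

func main() {
	// Made-up addresses: cluster "a" has picked up a peer from cluster "b".
	desired := []string{"am-a-0:9094", "am-a-1:9094"}
	members := []string{"am-a-0:9094", "am-a-1:9094", "am-b-0:9094"}

	drop, join := reconcilePeers(members, desired)
	fmt.Println("drop:", drop, "join:", join)
}
```

Comparing the two sets instead of only the counts would also catch the case 
where the number of peers looks right but one of them belongs to another 
cluster. How to actually evict a peer from the gossip state is the part I am 
not sure about.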

I could probably even open a PR if we can agree on a way to fix it and 
someone can support me with review :)

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.