Hi, Bifroest sounds like a very interesting project and falls within my field
of experience. I worked for about three quarters of a year on implementing
Circonus at the ASF (it was then decided, for good reasons, not to use it for
alerting), and before that I designed SCADA systems to monitor and control
electrical grids.
Today I live in southern Spain (I am Danish), so the TZ fits nicely. I
volunteer to champion for you, if the project wants it, but I suggest we
exchange some mails off-list to check out your wishes and my possibilities.

rgds
jan I.

On 7 October 2014 10:59, Harald Kraemer <hkrae...@goodgamestudios.com> wrote:

> Hi,
>
> we have been allowed to open-source one of our company-internal projects,
> currently called Bifroest. Bifroest is a storage backend for graphite-web,
> based on Apache Cassandra. I'm quite happy about this, and I'm now in the
> process of finding the best options and means to do so. This mail isn't a
> full proposal yet, but I will try to stick at least to the major points of
> a proposal.
>
> What does Bifroest do, and where does it come from?
>
> At Goodgame Studios, we used Munin for most of our monitoring, with a lot
> of custom plugins for our servers and around 500 - 700 hosts to push
> around. That's ambitious with Munin, and by now the munin-master can no
> longer take the stress.
> So we started to evaluate Graphite, since Graphite is the state of the art
> in larger-scale monitoring. To evaluate it, we deployed Graphite with a
> carbon backend on a virtual machine. Our senior monitoring admin (whom we
> didn't have back then) would probably just have giggled a bit without
> quite knowing why - things didn't perform that well on a virtual machine.
> It could handle the important data, but the system didn't seem to scale
> well.
> An admin would have tossed hardware at this - SSD RAIDs and all that,
> naturally. But we are software engineers, not admins, so we tossed
> software at it (until we required hardware) :)
>
> Our intention was to have Graphite with its data stored in a distributed
> database. A distributed database would scale both in storage space and in
> the load the system can handle, and it all sits behind a well-defined
> interface. That seemed like a nifty basis for a scalable monitoring
> system.
> Hence we tried Cyanide, since Cyanide was just that. We tossed a lot of
> data into Apache Cassandra, clicked on the metric tree and... well,
> nothing happened, since Cyanide figured that a "select *" across several
> 100k rows is a grand idea. After that we looked at InfluxDB, but at the
> time we started developing this, InfluxDB didn't support data aggregation
> and seemed to be at a very, very early stage of development.
>
> Thus the first thought of Bifroest was born: why don't we take the good
> parts of Cyanide - a solid distributed database such as Apache Cassandra -
> and the good parts of carbon, and toss them into a big stew?
>
> That's what we did, and that's what we are currently deploying as our
> production monitoring system: Graphite on Bifroest as a frontend for
> Apache Cassandra.
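The "select *" problem Harald describes above comes down to bounded versus
unbounded reads: a render request only ever needs a time slice of one metric,
while reading a whole partition pulls every datapoint ever stored for it. A
minimal sketch of the difference, against a purely hypothetical table layout
(not Bifroest's actual schema) and using the DataStax Python driver:

    from datetime import datetime, timedelta
    from cassandra.cluster import Cluster

    # Hypothetical schema, for illustration only:
    #   CREATE TABLE metrics.points (
    #       path text, time timestamp, value double,
    #       PRIMARY KEY (path, time));
    session = Cluster(['cassandra-host']).connect('metrics')
    path = 'servers.web01.cpu.load'

    # Unbounded read: every stored point for the metric, easily several
    # hundred thousand rows on a long-lived series.
    everything = session.execute(
        "SELECT time, value FROM points WHERE path = %s", (path,))

    # Bounded read: only the slice a dashboard actually asked for.
    end = datetime.utcnow()
    start = end - timedelta(hours=1)
    recent = session.execute(
        "SELECT time, value FROM points WHERE path = %s"
        " AND time >= %s AND time < %s",
        (path, start, end))

A backend that wants to serve graphite-web interactively has to issue the
bounded variant; otherwise every dashboard refresh turns into a full
partition scan.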
> Fun features of this system include:
>
> - Existing Graphite and most carbon APIs:
> -- Full support of the Graphite REST API, since we are just a backend
>    (sketched briefly below the quoted mail).
> -- Support for the plaintext protocol of carbon (likewise sketched below).
> -- Planned: an AMQP interface to handle globally distributed networks.
> - Neat things which Graphite could do as well:
> -- A fast key cache.
> -- A fast value cache, fed by the data collection so that we hit the
>    database as little as possible.
> - New things which Graphite + carbon + whisper cannot do:
> -- On-the-fly adjustable retention levels. You don't have the space to
>    keep 6 weeks of 1m data? Just reduce it. Or increase it. Our system can
>    do that on the fly.
> -- Currently in progress: on-the-fly addition of new retention levels.
>    Have an emergency and need data at greater resolution? Just add a
>    retention level with 1 datapoint / 5 s, keep the full data history,
>    tell your data collection to collect more data, and delete the level
>    again later on without losing data.
> -- High fault tolerance. We rely on Cassandra for persistent storage, and
>    a properly deployed Cassandra cluster with redundancy just doesn't
>    care. Add a new machine, tell everything to rebuild the cluster, and
>    the frontend doesn't even notice the outage.
>
> So, after this wall of text, there are two questions from me:
>
> a) Is this project interesting enough for everyone? :)
> b) Are there people who would volunteer to coach me and my team through
>    the proposal and the incubator?
>
> Regards,
> Harald.
>
> --
>
> Harald Krämer
> Server Developer (Profiling first)
> hkrae...@goodgamestudios.com
>
> Goodgame Studios
> Theodorstr. 42-90, House 9
> 22761 Hamburg, Germany
> Phone: +49 (0)40 219 880-0
> www.goodgamestudios.com
>
> Goodgame Studios is a branch of Altigi GmbH
> Altigi GmbH, District Court Hamburg, HRB 99869
> Board of directors: Dr. Kai Wawrzinek, Dr. Christian Wawrzinek, Fabian Ritter
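For anyone who wants to poke at the two interfaces named in the feature list,
both are tiny and completely standard; nothing below is Bifroest-specific, and
the host name is a placeholder. A minimal sketch in Python, using the default
carbon line-receiver port 2003:

    import socket
    import time
    from urllib.request import urlopen

    # Carbon plaintext protocol: one "<metric path> <value> <unix timestamp>\n"
    # line per datapoint, written to a plain TCP socket.
    sock = socket.create_connection(('graphite.example.com', 2003))
    sock.sendall(('servers.web01.cpu.load 0.42 %d\n' % int(time.time())).encode())
    sock.close()

    # graphite-web render API: read the same series back as JSON.
    url = ('http://graphite.example.com/render'
           '?target=servers.web01.cpu.load&from=-1h&format=json')
    print(urlopen(url).read())

Since Bifroest presents itself purely as a backend behind graphite-web and the
carbon plaintext protocol, both calls should work against it unchanged.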