On Thu, 7 Jan 2016, Javen Wu wrote:
> Hi Sage,
>
> Sorry to bother you. I am not sure if it is appropriate to email you
> directly, but I cannot find any useful information on the Internet to
> clear up my confusion. I hope you can help me.
>
> I happened to hear that you are going to start BlueFS to eliminate the
> redundancy between the XFS journal and the RocksDB WAL. I am a little confused.
> Is BlueFS only meant to host RocksDB for BlueStore, or is it an
> alternative to BlueStore?
>
> I am a newcomer to Ceph, and I am not sure my understanding of BlueStore
> is correct. My mental picture of BlueStore is as below.
>
>              BlueStore
>              =========
>    RocksDB
> +-----------+          +-----------+
> |   onode   |          |           |
> |    WAL    |          |           |
> |   omap    |          |           |
> +-----------+          |   bdev    |
> |           |          |           |
> |    XFS    |          |           |
> |           |          |           |
> +-----------+          +-----------+

This is the layout before BlueFS enters the picture.

> I am curious: if BlueFS is able to host RocksDB, then it is already a
> "filesystem" that has to maintain blockmap-style metadata on its own,
> WITHOUT the help of RocksDB.
Right. BlueFS is a really simple "file system" that is *just* complicated
enough to implement the rocksdb::Env interface, which is what rocksdb
needs to store its log and sst files. The after picture looks like
 +--------------------+
 |     bluestore      |
 +----------+         |
 | rocksdb  |         |
 +----------+         |
 |  bluefs  |         |
 +----------+---------+
 |    block device    |
 +--------------------+
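
To give a rough feel for how small "just complicated enough" can be, here is a toy in-memory sketch of the handful of file operations a log-structured kv store leans on: append, read at an offset, rename, remove, list. This is NOT the real rocksdb::Env API and not how BlueFS is written; MiniEnv and all of its method names are made up for illustration, and the "device" is just a std::map in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, much-reduced stand-in for the storage interface a kv store
// needs. Illustrative only: this is not rocksdb::Env and not BlueFS.
class MiniEnv {
public:
  // Append bytes to a file, creating it if needed. Log and sst files are
  // written sequentially, so append is the only write operation here.
  void append(const std::string& name, const std::string& data) {
    files_[name] += data;
  }

  // Read up to len bytes starting at offset; returns fewer bytes at EOF.
  std::string read(const std::string& name, std::size_t offset,
                   std::size_t len) const {
    auto it = files_.find(name);
    assert(it != files_.end() && "file must exist");
    if (offset >= it->second.size()) return std::string();
    return it->second.substr(offset, len);
  }

  bool exists(const std::string& name) const {
    return files_.count(name) != 0;
  }

  // Rename a file, replacing any existing target.
  void rename(const std::string& from, const std::string& to) {
    auto it = files_.find(from);
    assert(it != files_.end() && "source file must exist");
    std::string data = std::move(it->second);
    files_.erase(it);
    files_[to] = std::move(data);
  }

  void remove(const std::string& name) { files_.erase(name); }

  // List file names in sorted order.
  std::vector<std::string> list() const {
    std::vector<std::string> names;
    for (const auto& kv : files_) names.push_back(kv.first);
    return names;
  }

private:
  std::map<std::string, std::string> files_;  // name -> contents, in memory
};
```

A caller would append records to a log file, read them back by offset, and use rename to swap a freshly written file into place. A real implementation also has to allocate space on the block device and persist its own metadata, which this sketch skips entirely.
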
> The reason we care about the intention and design target of BlueFS is that
> I had a discussion with my partner Peng.Hse about introducing a new
> ObjectStore built on the ZFS library. I know Ceph already supports ZFS as a
> FileStore backend, but we had a different, immature idea: use libzpool to
> implement a new ObjectStore for Ceph entirely in userspace, without the SPL
> and ZOL kernel modules, so that we can align Ceph transactions with ZFS
> transactions and avoid the double write for the Ceph journal.
> The ZFS core, libzpool (DMU, metaslab, etc.), offers a dnode object store
> and is platform independent, in kernel or user space. Another benefit of
> the idea is that we can extend our metadata without involving any DBStore.
>
> Frankly, we are not sure yet whether our idea is realistic, but when I heard
> of BlueFS, I thought we needed to understand the BlueFS design goal.
I think it makes a lot of sense, but there are a few challenges. One
reason we use rocksdb (or a similar kv store) is that we need in-order
enumeration of objects in order to do collection listing (needed for
backfill, scrub, and omap). You'll need something similar on top of zfs.
I suspect the simplest path would be to also implement the rocksdb::Env
interface on top of the zfs libraries. See BlueRocksEnv.{cc,h} to see the
interface that has to be implemented...
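
To make the in-order enumeration requirement concrete, here is a minimal sketch of a paged, sorted listing over an ordered index. It assumes a std::map standing in for the kv store's ordered iterator; ObjectIndex and list_objects are hypothetical names for this example, not Ceph or rocksdb APIs, and object names are assumed to be non-empty.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical ordered index: object name -> opaque metadata.
// std::map keeps keys sorted, standing in for a kv store's ordered iterator.
using ObjectIndex = std::map<std::string, std::string>;

// Return up to max_entries object names strictly greater than `after`,
// in sorted order. Pass an empty `after` to start from the beginning.
std::vector<std::string> list_objects(const ObjectIndex& index,
                                      const std::string& after,
                                      std::size_t max_entries) {
  std::vector<std::string> out;
  // upper_bound positions the iterator at the first key greater than `after`,
  // so a caller can resume a listing from the last name it saw.
  for (auto it = index.upper_bound(after);
       it != index.end() && out.size() < max_entries; ++it) {
    out.push_back(it->first);
  }
  return out;
}
```

A caller walks a collection by passing the last name from the previous page as `after` until an empty page comes back. Whatever sits on top of zfs would need to offer an equivalent "seek to key, then iterate in order" primitive.
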
sage