On Fri, Jan 1, 2016 at 5:42 AM, lee <l...@yagibdah.de> wrote:
> "Stefan G. Weichinger" <li...@xunil.at> writes:
>
>> btrfs offers RAID-like redundancy as well, no mdadm involved here.
>>
>> The general recommendation now is to stay at level-1 for now. That fits
>> your 2-disk-situation.
>
> Well, what shows better performance? No btrfs-raid on hardware raid or
> btrfs raid on JBOD?
I would run btrfs on bare partitions and use btrfs's raid1 capabilities. You're almost certainly going to get better performance, and you get more data integrity features. If you get silent corruption with mdadm doing the raid1, btrfs will happily warn you of your problem, but you're going to have a really hard time fixing it, because btrfs only sees the one copy of the data that mdadm hands it, and all mdadm can tell you is that the two mirrors are inconsistent, with no idea which one is right. You'd end up having to manipulate the underlying data to figure out which copy is correct and fix it (the data is all there, but you'd probably end up hex-editing your disks). If you were using btrfs raid1 you'd just run a scrub and it would detect and fix the problem, since btrfs sees both copies and knows which one is right. And if you ever move to raid5 once that matures, btrfs eliminates the write hole.

>> I would avoid converting and stuff.
>>
>> Why not try a fresh install on the new disks with btrfs?
>
> Why would I want to spend another year to get back to where I'm now?

I wouldn't do a fresh install. I'd just set up btrfs on the new disks and copy your data over (preserving attributes/etc). Before doing that I'd create any subvolumes you want to have on the new disks and copy the data into them. The only way to convert a directory into a subvolume after the fact is to create a subvolume with the new name, copy the directory into it, rename the directory and the subvolume to swap their names, and then delete the old directory. That is time-consuming, and depending on which directory you're talking about you might want to be in single-user mode or booted from a rescue disk to do it.

I wouldn't do an in-place ext4->btrfs conversion. I know there were some regressions in that feature recently and I'm not sure where it stands right now.

>> I never had /boot on btrfs so far, maybe others can guide you with this.
>>
>> My /boot is plain extX on maybe RAID1 (differs on
>> laptops/desktop/servers), I size it 500 MB to have space for multiple
>> kernels (especially on dualboot-systems).
>>
>> Then some swap-partitions, and the rest for btrfs.
>
> There you go, you end up with an odd setup. I don't like /boot
> partitions. As well as swap partitions, they need to be on raid. So
> unless you use hardware raid, you end up with mdadm /and/ btrfs /and/
> perhaps ext4, /and/ multiple partitions.

With grub2 you can boot from btrfs. I used to use a separate boot partition on ext4 with btrfs for the rest, but now my /boot is on my root partition. I'd still partition space for a boot partition in case you move to EFI in the future, but I wouldn't bother formatting it or setting it up right now. As long as you're using grub2 you don't need to do anything special. You DO need to partition your disks, though, even if you only have one big partition for the whole thing; the reason is that this leaves grub space to stick its loaders/etc on the disk.

I don't use swap. If I did, I'd probably set up an mdadm array for it; according to the FAQ, btrfs still doesn't support swapping to a file. There isn't really anything painful about that setup, though. Swap isn't needed to boot, so openrc/systemd will start up mdadm and activate your swap. I'm not sure whether dracut will do that during early boot, but it doesn't really matter if it does.
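If you do go that route, setting up the swap array is only a couple of commands - the device names here are just placeholders for whatever partitions you actually end up with:

# mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
# mkswap /dev/md0
# swapon /dev/md0

plus an fstab line along the lines of:

/dev/md0   none   swap   sw   0 0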
If you have two drives I'd just set them up as:

sd[ab]1 - 1GB boot partition, unformatted, for future EFI
sd[ab]2 - mdadm raid1 for swap
sd[ab]3 - btrfs

> When you use hardware raid, it
> can be disadvantageous compared to btrfs-raid --- and when you use it
> anyway, things are suddenly much more straightforward because everything
> is on raid to begin with.

I'd stick with mdadm. You're never going to run mixed btrfs/hardware-raid on a single drive, and the only time I'd even consider hardware raid is with a high-quality raid card - and even if I had one of those lying around, you'd still have to convince me not to use mdadm.

>> Create your btrfs-"pool" with:
>>
>> # mkfs.btrfs -m raid1 -d raid1 /dev/sda3 /dev/sdb3
>>
>> Then check for your btrfs-fs with:
>>
>> # btrfs fi show
>>
>> Oh: I realize that I start writing a howto here ;-)
>
> That doesn't work without an extra /boot partition?

It works fine without a boot partition if you're using grub2. If you want to use grub legacy you'll need a boot partition.

> How's btrfs's performance when you use swap files instead of swap
> partitions to avoid the need for mdadm?

btrfs does not support swap files at present. When it does, you'll need to disable COW on them (using chattr), otherwise they'll fragment until your system grinds to a halt. A swap file is about the worst-case workload for any COW filesystem - I'm not sure how ZFS handles them.

> Now I understand that it's apparently not possible to simply make a
> btrfs-raid1 from the two raw disks, copy the system over, install grub
> and boot from that. (I could live with swap files instead of swap
> partitions.)

Even if you used no swap and no boot partition, like I have right now, you'd still want to create a single large partition rather than using the raw disks, for better grub2 support. Without space between the partition table and the first partition (which you'll want to start at sector 2048, or whatever the default is these days), grub has to resort to blocklists. That means that if the files in /boot/grub ever move on disk, your system won't boot. That isn't a btrfs thing - it holds just as true on ext4 and is generally frowned upon.

>> As mentioned here several times I am using btrfs on >6 of my systems for
>> years now. And I don't look back so far.
>
> And has it always been reliable?

I've never had an episode that resulted in actual data loss. I HAVE had an episode or two that resulted in downtime.

When I've had btrfs issues I could generally still mount the filesystem read-only just fine. The problem was that cleanup threads were causing kernel BUGs which made the filesystem stop syncing (not a full panic, but when all your filesystems are effectively read-only there isn't much difference in many cases). If I rebooted, the system would BUG again within a few minutes. In one case I was able to boot a more recent kernel from a rescue disk and fix things by just mounting the drive and letting it sit for 20min to finish cleaning things up while the disk was otherwise idle (some kind of locking issue, most likely) - maybe I also had to run btrfsck on it. In the other case it was being really fussy and I ended up just restoring from a backup since that was the path of least resistance. I could probably have fixed the problem eventually, and the drive was mountable read-only the entire time, so given sufficient space I could have copied all the data over to a new filesystem with no loss at all.
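If you ever do land in that spot, pulling everything off a read-only mount onto a fresh filesystem is simple enough - the devices and mount points below are just examples, and the rsync flags are the ones I'd reach for to preserve hard links, ACLs, and xattrs:

# mount -o ro /dev/sda3 /mnt/broken
# mount /dev/sdc1 /mnt/new
# rsync -aHAX /mnt/broken/ /mnt/new/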
Things have been pretty quiet for the last six months though, and I think that is largely due to a change in strategy around kernel versions. Right now I'm running 3.18. I'm starting to consider a move to 4.1, but there is a backlog of btrfs fixes for stable that I'm waiting for Greg to catch up on, and maybe I'll wait for a version after that to see if things settle down. Around the 3.14->3.18 timeframe btrfs maturity seemed to settle in a bit, and at this point I think newer kernels are more likely to introduce regressions than to fix problems. The pace of btrfs patching has also increased in the last year (which is good in the long term - most of the patches are bugfixes - but in the short term even bugfixes can introduce bugs). Unless I have a reason not to, I plan to run only longterm kernels from here on, and to move to them when they're about six months mature. If I had done that in the past I think I would have completely avoided the issue that forced me to restore from backups. That happened in the 3.15/3.16 timeframe and I'd never have run those kernels at all. They were stable kernels at the time, and a few versions in when I switched to them (I was probably just following gentoo-sources stable keywords back then), but they still had regressions (the fixes were eventually backported).

I think btrfs is certainly usable today, though I'd be hesitant to run it on production servers depending on the use case (I'd be looking for a use case that actually gets a significant benefit from btrfs, and which somehow mitigates the risks). Right now I keep a daily rsnapshot (rsync on steroids - it's in the Gentoo repo) backup of my btrfs filesystems on ext4. I occasionally debate whether I still need it, but I sleep better knowing I have it. This is in addition to my daily duplicity cloud backups of my most important data (so /etc and /home are in the cloud, and mythtv's /var/video is just on a local rsync backup).

Oh, and don't go anywhere near btrfs raid5/6 (btrfs on top of mdadm raid5/6 is fine, but you lose the data integrity features). I wouldn't touch it for at least a year, and probably longer.

Overall I'm very happy with btrfs though. Snapshots and reflinks are very handy - I can update containers and nfs roots after snapshotting them, which gives me a trivial rollback solution, and while I don't use snapper I do manually rotate through snapshots weekly. If you do run snapper I'd avoid generating large numbers of snapshots - one of my BUG problems happened as a result of snapper deleting a few hundred snapshots at once. Btrfs's deferred processing of the log/btrees can cause the kinds of performance issues associated with garbage collection (or BUGs due to thundering-herd problems).

I use ionice to try to prioritize my IO so that stuff like mythtv recordings gets to block less-realtime activity, rather than the other way around, and in the past that hasn't always worked with btrfs. The problem is that btrfs would accept too much data into its log and then block all writes while it tried to catch up. I haven't seen that as much recently, so maybe they're getting better about it. As with any other scheduling problem it only works if you correctly block writes at the start of the pipeline (I've heard of similar problems with TCP QoS if you don't ensure that the bottleneck is the first router along the route - you can let in too much low-priority traffic, and at that point you're stuck dealing with it).
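To be concrete about the ionice bit: I just wrap the bulk jobs so they run in the idle IO class (-c 3) and yield to everything else - the rsync command and destination here are only an illustration, not my actual setup:

# run the nightly video backup in the idle IO class so recordings win
# ionice -c 3 rsync -a /var/video/ /mnt/backup/video/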
I'd suggest looking at the btrfs mailing list to get a feel for what people are dealing with. Just ignore all the threads marked as patches and look at the discussion threads. If you're getting the impression that btrfs isn't quite fire-and-forget, you're getting the right impression. Neither is Gentoo, so I wouldn't let that alone scare you off. But I see no reason not to give you fair warning.

--
Rich