On Fri, Jan 1, 2016 at 5:42 AM, lee <l...@yagibdah.de> wrote:
> "Stefan G. Weichinger" <li...@xunil.at> writes:
>
>> btrfs offers RAID-like redundancy as well, no mdadm involved here.
>>
>> The general recommendation now is to stay at level-1 for now. That fits
>> your 2-disk-situation.
>
> Well, what shows better performance?  No btrfs-raid on hardware raid or
> btrfs raid on JBOD?

I would run btrfs on bare partitions and use btrfs's raid1
capabilities.  You're almost certainly going to get better
performance, and you get more data integrity features.  If you have
silent corruption with mdadm doing the raid1, btrfs will happily warn
you of the problem, but you're going to have a really hard time fixing
it: btrfs only sees the single copy of the data that mdadm hands it
(which is bad), and all mdadm can tell you is that its two copies are
inconsistent, with no idea which one is right.  You'd end up having to
dig into the underlying data to figure out which copy is correct and
fix it by hand (the data is all there, but you'd probably end up
hex-editing your disks).  If you were using btrfs raid1 you'd just run
a scrub and it would detect and fix the problem, since btrfs sees both
copies and knows from the checksums which one is right.  Then if you
ever move to raid5 when that matures, you eliminate the write hole
with btrfs.
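
For what it's worth, checking and repairing that on btrfs raid1 is a
single command pair (assuming the filesystem is mounted at /):

# btrfs scrub start /     # rewrites bad-checksum blocks from the good copy
# btrfs scrub status /    # progress and per-device error counters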

>>
>> I would avoid converting and stuff.
>>
>> Why not try a fresh install on the new disks with btrfs?
>
> Why would I want to spend another year to get back to where I'm now?

I wouldn't do a fresh install.  I'd just set up btrfs on the new disks
and copy your data over (preserving attributes/etc).  Before I did
that, I'd create any subvolumes you want to have on the new disks and
copy the data directly into them.  The only way to convert a directory
into a subvolume after the fact is to create a subvolume under a
temporary name, copy the directory's contents into it, swap the names
of the directory and the subvolume, and then delete the old directory.
That is time-consuming, and depending on which directory you're
talking about you might want to be in single-user mode or booted from
a rescue disk to do it.
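
As a sketch of the copy step (mount point and subvolume names are
made up for the example):

# mount /dev/sda3 /mnt/new
# btrfs subvolume create /mnt/new/rootfs
# btrfs subvolume create /mnt/new/home
# rsync -aHAXx --numeric-ids / /mnt/new/rootfs/    # -H/-A/-X keep hardlinks, ACLs, xattrs
# rsync -aHAX --numeric-ids /home/ /mnt/new/home/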

I wouldn't do an in-place ext4->btrfs conversion.  I know that there
were some regressions in that feature recently and I'm not sure where
it stands right now.

>> I never had /boot on btrfs so far, maybe others can guide you with this.
>>
>> My /boot is plain extX on maybe RAID1 (differs on
>> laptops/desktop/servers), I size it 500 MB to have space for multiple
>> kernels (especially on dualboot-systems).
>>
>> Then some swap-partitions, and the rest for btrfs.
>
> There you go, you end up with an odd setup.  I don't like /boot
> partitions.  As well as swap partitions, they need to be on raid.  So
> unless you use hardware raid, you end up with mdadm /and/ btrfs /and/
> perhaps ext4, /and/ multiple partitions.

With grub2 you can boot from btrfs.  I used to use a separate boot
partition on ext4 with btrfs for the rest, but now my /boot is on my
root partition.  I'd still set aside space for a boot partition in
case you move to EFI in the future, but I wouldn't bother formatting
it or setting it up right now.  As long as you're using grub2 you really
don't need to do anything special.

You DO need to partition your disks though, even if you only have one
big partition for the whole thing.  The reason is that the gap before
the first partition gives grub somewhere to embed its core image and
other loader bits on the disk.

I don't use swap.  If I did I'd probably set up an mdadm array for it.
According to the FAQ btrfs still doesn't support swap from a file.

There isn't really anything painful about that setup though.  Swap
isn't needed to boot, so openrc/systemd will start up mdadm and
activate your swap.  I'm not sure if dracut will do that during early
boot or not, but it doesn't really matter if it does.

If you have two drives I'd just set them up as:
sd[ab]1 - 1GB boot partition unformatted for future EFI
sd[ab]2 - mdadm raid1 for swap
sd[ab]3 - btrfs
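
Once those partitions exist on both disks, the rest is roughly this
(mdadm's metadata defaults are fine for a swap array):

# mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
# mkswap /dev/md0 && swapon /dev/md0
# mkfs.btrfs -m raid1 -d raid1 /dev/sda3 /dev/sdb3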


> When you use hardware raid, it
> can be disadvantageous compared to btrfs-raid --- and when you use it
> anyway, things are suddenly much more straightforward because everything
> is on raid to begin with.

I'd stick with mdadm over hardware raid.  You're never going to run mixed
btrfs/hardware-raid on a single drive, and the only time I'd consider
hardware raid is with a high quality raid card.  You'd still have to
convince me not to use mdadm even if I had one of those lying around.

>> Create your btrfs-"pool" with:
>>
>> # mkfs.btrfs -m raid1 -d raid1 /dev/sda3 /dev/sdb3
>>
>> Then check for your btrfs-fs with:
>>
>> # btrfs fi show
>>
>> Oh: I realize that I start writing a howto here ;-)
>
> That doesn't work without an extra /boot partition?

It works fine without a boot partition if you're using grub2.  If you
want to use grub legacy you'll need a boot partition.
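
With /boot living on the btrfs root, the install is just the usual
pair of commands (on Gentoo the binaries may be named grub2-install
and grub2-mkconfig; installing to both disks lets either one boot the
system):

# grub-install /dev/sda
# grub-install /dev/sdb
# grub-mkconfig -o /boot/grub/grub.cfg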

>
> How's btrfs's performance when you use swap files instead of swap
> partitions to avoid the need for mdadm?

btrfs does not support swap files at present.  When it does, you'll
need to disable COW for them (using chattr), otherwise they'll
fragment until your system grinds to a halt.  A swap file is about
the worst case scenario for any COW filesystem - I'm not sure how ZFS
handles them.
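
The nodatacow trick is the same one you'd use for VM images or
databases on btrfs today.  chattr +C only takes effect on empty files,
so the easy way is to set it on a directory and let new files inherit
it (the path is just an example):

# mkdir -p /var/lib/libvirt/images
# chattr +C /var/lib/libvirt/images    # new files created here are nodatacow
# lsattr -d /var/lib/libvirt/images    # should now show the 'C' attribute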

>
> Now I understand that it's apparently not possible to simply make a
> btrfs-raid1 from the two raw disks, copy the system over, install grub
> and boot from that.  (I could live with swap files instead of swap
> partitions.)

Even if you used no swap and no separate /boot like I have right now,
you'd still want to create a single large partition rather than
putting btrfs on the raw disk, for better grub2 support.  Without
space between the partition table and the first partition (which
you'll want to start at sector 2048, or whatever the default is these
days), grub has nowhere to embed its core image and has to resort to
blocklists.  That means that if for any reason the files in /boot/grub
move on disk your system won't boot.  That isn't a btrfs thing - it
holds just as true with ext4 - and blocklists are generally frowned
upon.
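
A minimal sketch of that single-big-partition layout (device name is
an example, and mklabel wipes the partition table, so only do this on
the new, empty disks):

# parted -s /dev/sda mklabel msdos
# parted -s /dev/sda mkpart primary 2048s 100%
# parted /dev/sda unit s print    # first partition should start at sector 2048

That leaves the 1MiB gap grub-install embeds itself into.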

>
>> As mentioned here several times I am using btrfs on >6 of my systems for
>> years now. And I don't look back so far.
>
> And has it always been reliable?
>

I've never had an episode that resulted in actual data loss.  I HAVE
had an episode or two which resulted in downtime.

When I've had btrfs issues I could generally mount the filesystem
read-only just fine.  The problem was that cleanup threads were
causing kernel BUGs which caused the filesystem to stop syncing (not a
full panic, but when all your filesystems are effectively read-only
there isn't much difference in many cases).  If I rebooted, the system
would BUG again within a few minutes.  In one case I was able to boot from a
more recent kernel on a rescue disk and fix things by just mounting
the drive and letting it sit for 20min to finish cleaning things up
while the disk was otherwise idle (some kind of locking issue most
likely) - maybe I had to run btrfsck on it.  In the other case it was
being really fussy and I ended up just restoring from a backup since
that was the path of least resistance.  I could have probably
eventually fixed the problem, and the drive was mountable read-only
the entire time so given sufficient space I could have copied all the
data over to a new filesystem with no loss at all.
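
For reference, the read-only rescue mount is nothing exotic (device
and mountpoint are examples; -o recovery is the extra knob on kernels
of this era for falling back to an older tree root):

# mount -o ro /dev/sda3 /mnt
# mount -o ro,recovery /dev/sda3 /mnt    # if the plain read-only mount fails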

Things have been pretty quiet for the last six months though, and I
think it is largely due to a change in strategy around kernel
versions.  Right now I'm running 3.18.  I'm starting to consider a
move to 4.1, but there is a backlog of btrfs fixes for stable that I'm
waiting for Greg to catch up on and maybe I'll wait for a version
after that to see if things settle down.  Around the time of
3.14->3.18 btrfs maturity seemed to settle in a bit, and at this point
I think newer kernels are more likely to introduce regressions than
fix problems.  The pace of btrfs patching seems to have increased as
well in the last year (which is good in the long-term - most are
bugfixes - but in the short term even bugfixes can introduce bugs).
Unless I have a reason not to, at this point I plan to run only
longterm kernels, and to move to them when they're about six months
mature.

If I had done that in the past I think I would have completely avoided
the issue that required me to restore from backups.  That happened in
the 3.15/3.16 timeframe and I'd never even have run those kernels.
They were stable kernels at the time, and a few versions in when I
switched to them (I was probably just following gentoo-sources stable
keywords back then), but they still had regressions (fixes were
eventually backported).

I think btrfs is certainly usable today, though I'd be hesitant to run
it on production servers depending on the use case (I'd be looking for
a use case that actually has a significant benefit from using btrfs,
and which somehow mitigates the risks).

Right now I keep a daily rsnapshot (rsync on steroids - it's in the
Gentoo repo) backup of my btrfs filesystems on ext4.  I occasionally
debate whether I still need it, but I sleep better knowing I have it.
This is in addition to my daily duplicity cloud backups of my most
important data (so, /etc and /home are in the cloud, and mythtv's
/var/video is just on a local rsync backup).
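
If it helps, the rsnapshot side of that is only a handful of lines in
/etc/rsnapshot.conf on top of the stock defaults, plus a cron entry
(paths and retention counts are examples, newer versions spell
"interval" as "retain", and the config file wants tabs between
fields):

snapshot_root   /mnt/backup/rsnapshot/
interval        daily   7
interval        weekly  4
backup  /etc/           localhost/
backup  /home/          localhost/
backup  /var/video/     localhost/

and in root's crontab:

0 3 * * * /usr/bin/rsnapshot daily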

Oh, and don't go anywhere near btrfs raid5/6 (btrfs on top of mdadm
raid5/6 is fine, but you lose the data integrity features).  I
wouldn't go anywhere near that for at least a year, and probably
longer.

Overall I'm very happy with btrfs though.  Snapshots and reflinks are
very handy - I can update containers and nfs roots after snapshotting
them and it gives me a trivial rollback solution, and while I don't
use snapper I do manually rotate through snapshots weekly.  If you do
run snapper I'd probably avoid generating large numbers of snapshots -
one of my BUG problems happened as a result of snapper deleting a few
hundred snapshots at once.
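
The rollback dance is roughly this (subvolume paths are made up for
the example):

# btrfs subvolume snapshot /srv/web /srv/web.pre-update
(update the container, and if it goes badly, roll back:)
# btrfs subvolume delete /srv/web
# mv /srv/web.pre-update /srv/web

Renaming a subvolume with mv is all it takes, since snapshots are just
writable subvolumes.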

Btrfs's deferred processing of the log/btrees can cause the kinds of
performance issues associated with garbage collection (or BUGs due to
thundering herd problems).  I use ionice to try to prioritize my IO so
that stuff like mythtv recordings will block less realtime activities,
and in the past that hasn't always worked with btrfs.  The problem is
that btrfs would accept too much data into its log, and then it would
block all writes while it tried to catch up.  I haven't seen that as
much recently, so maybe they're getting better about that.  As with
any other scheduling problem it only works if you correctly block
writes into the start of the pipeline (I've heard of similar problems
with TCP QoS and such if you don't ensure that the bottleneck is the
first router along the route - you can let in too much low-priority
traffic and then at that point you're stuck dealing with it).
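
The ionice part is just something like this (process names are
examples; the classes only take effect with the CFQ I/O scheduler):

# ionice -c 2 -n 0 -p $(pidof mythbackend)    # top best-effort priority for a latency-sensitive writer
# ionice -c 3 some-bulk-job                   # or run a bulk job in the idle class so it yields the disk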

I'd suggest looking at the btrfs mailing list to get a survey for what
people are dealing with.  Just ignore all the threads marked as
patches and look at the discussion threads.

If you're getting the impression that btrfs isn't quite
fire-and-forget, you're getting the right impression.  Neither is
Gentoo, so I wouldn't let that alone scare you off.  But I see no
reason not to give you fair warning.

-- 
Rich
