Re: [OpenIndiana-discuss] ZFS; what the manuals don't say ...

Robin Axelsson Tue, 23 Oct 2012 08:44:42 -0700

On 2012-10-23 16:22, George Wilson wrote:

Comments inline...
On 10/23/12 8:29 AM, Robin Axelsson wrote:
Hi,
I've been using zfs for a while but still there are some questions that have remained unanswered even after reading the documentation, so I thought I would ask them here.
I have learned that zfs datasets can be expanded by adding vdevs. Say that you have created a raidz3 pool named "mypool" with the command
# zpool create mypool raidz3 disk1 disk2 disk3 ... disk8

you can expand the capacity by adding vdevs to it through the command

# zpool add mypool raidz3 disk9 disk10 ... disk16
The vdev that is added doesn't need to have the same raid/mirror configuration or disk geometry, if I understand correctly. It will merely be dynamically concatenated with the old storage pool. The documentation says that it will be "striped", but it is not so clear what that means if data is already stored in the old vdevs of the pool.
Unanswered questions:
* What determines _where_ the data will be stored on such a pool? Will it fill up the old vdev(s) before moving on to the new one or will the data be distributed evenly?
The data is written in a round-robin fashion across all the top-level vdevs (i.e. the raidz vdevs). So it will get distributed across them as you fill up the pool. It does not fill up one vdev before proceeding.
* If the old pool is almost full, an even distribution will be impossible, unless zpool rearranges/relocates data upon adding the vdev. Is that what will happen upon adding a vdev?
As you write new data it will try to even out the vdevs. In many cases this is not possible and you may end up with the majority of the writes going to the empty vdevs. There is logic in zfs to avoid certain vdevs if we're unable to allocate from them during a given transaction group commit. So when vdevs are very full you may find that very little data is being written to them.
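To make that skew concrete, here is a toy model in shell. It is only an illustration under loud assumptions: two equally sized top-level vdevs, the old one 90% full, and every block sent to whichever vdev currently has more free space. It is not the real ZFS allocator, and the function name `simulate` and all numbers are made up for the example.

```shell
#!/bin/sh
# Toy model only: NOT the real ZFS allocator.
# Each new block goes to the vdev with the most free space, a crude
# stand-in for "it will try to even out the vdevs".

# simulate SIZE USED_OLD WRITES
#   SIZE      capacity of each vdev, in blocks (hypothetical units)
#   USED_OLD  blocks already used on the old vdev
#   WRITES    number of new blocks to write
simulate() {
    size=$1
    used_old=$2
    writes=$3
    used_new=0
    to_old=0
    to_new=0
    i=0
    while [ "$i" -lt "$writes" ]; do
        free_old=$((size - used_old))
        free_new=$((size - used_new))
        if [ "$free_old" -le 0 ] && [ "$free_new" -le 0 ]; then
            break   # both vdevs are full
        fi
        if [ "$free_new" -ge "$free_old" ]; then
            used_new=$((used_new + 1))
            to_new=$((to_new + 1))
        else
            used_old=$((used_old + 1))
            to_old=$((to_old + 1))
        fi
        i=$((i + 1))
    done
    echo "old=$to_old new=$to_new"
}

simulate 1000 900 500   # prints: old=0 new=500
```

In this model nothing lands on the old vdev until the new one has caught up with it, which matches the behaviour described above: existing data stays where it is and only new writes shift the balance.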
* Can the individual vdevs be read independently/separately? If say the newly added vdev faults, will the entire pool be unreadable or will I still be able to access the old data? What if I took a snapshot before adding the new vdev?
If you lose a top-level vdev then you probably won't be able to access your old data. If you're lucky you might be able to retrieve some data that was not contained on that top-level vdev, but given that ZFS stripes across all vdevs it means that most of your data could be lost. Losing a leaf vdev (i.e. a single disk) within a top-level vdev is a different story. If you lose a leaf vdev then raidz will allow you to continue to use the disk and pool in a degraded state. You can then spare out the failed leaf vdev or replace the disk.
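Replacing a failed leaf disk could be sketched roughly as follows. The device names `disk3` and `disk17` and the helper names `run` and `replace_failed_disk` are hypothetical. The script only prints the commands unless `DRY_RUN=0` is set, since running them needs root and a real pool.

```shell
#!/bin/sh
# Sketch only. By default the commands are printed, not executed;
# set DRY_RUN=0 to run them for real (needs root and an actual pool).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# replace_failed_disk POOL OLD_DISK NEW_DISK
replace_failed_disk() {
    pool=$1
    old=$2
    new=$3
    run zpool status -x "$pool"               # show status only if the pool has problems
    run zpool replace "$pool" "$old" "$new"   # swap in the new disk; resilvering starts
    run zpool status "$pool"                  # check resilver progress
}

# disk3 (failed) and disk17 (replacement) are made-up names.
replace_failed_disk mypool disk3 disk17
```

This only helps while each raidz vdev has lost no more disks than its parity level tolerates; it does not bring back a top-level vdev that is gone entirely.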
* Can several datasets be mounted to the same mount point, i.e. can multiple "file system"-datasets be mounted so that they (the root of them) are all accessed from exactly the same (POSIX) path and subdirectories with coinciding names will be merged? The purpose of this would be to seamlessly expand storage capacity this way just like when adding vdevs to a pool.
I think you might be confused about datasets and how they are expanded. Datasets see all the space within a pool. There is not a one-to-one mapping of dataset to pool. So if you want to create 10 datasets and you find that you're running out of space then you simply add another top-level vdev to your pool and all the datasets see the additional space. I'm pretty certain that doesn't answer your question but maybe it helps in other ways. Feel free to ask again.
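A small sketch of that point, assuming a pool named "mypool" already exists and that no quotas or reservations are set. The helper names `run` and `show_shared_space` are hypothetical, and the commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch only. By default the commands are printed, not executed;
# set DRY_RUN=0 to run them for real (needs root and an actual pool).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# show_shared_space POOL
# Create two datasets and list the space each one sees.
show_shared_space() {
    pool=$1
    run zfs create "$pool/dataset1"
    run zfs create "$pool/dataset2"
    # Without quotas or reservations, AVAIL should be the same for both,
    # because both draw from the pool's common free space.
    run zfs list -r -o name,used,avail "$pool"
}

show_shared_space mypool
```

Running the `zfs list` line again after a `zpool add` should show the larger AVAIL figure on every dataset at once.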

But if I have two raidz3 vdevs, is there any way to create an isolation/separation between them so that if one of them fails, only the data that is stored within that vdev will be lost and all data that happens to be stored in the other can be recovered? And yet let them both be accessible from the same path?

The only thing that needs to be sorted out is where the files should go when you write to that path, so as to avoid splitting a file such that one half of it goes to one vdev and the other half goes to the other vdev. Maybe there is some disk or i/o scheduler that can handle such operations?

I can't see how a dataset can span over several zpools as you usually create it with mypool/datasetname (in the case of a file system dataset). But I can see several datasets in one pool though (e.g. mypool/dataset1, mypool/dataset2 ...). So the relationship I see is pool *onto* dataset.

But if I have two separate pools with separate names, say mypool1 and mypool2, I could create a zfs file system dataset with the same name in each of these pools and then give these two datasets the same "mountpoint" property, couldn't I? Then they would be forced to be mounted to the same path.
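As far as I understand, ZFS does not merge two file systems mounted at the same path; the second mount would either fail or hide the first. A sketch of the alternative that keeps the isolation is to give each pool's dataset its own mount point. The dataset name "data", the /export/... paths, and the helper names `run` and `mount_side_by_side` are hypothetical, and the commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch only. By default the commands are printed, not executed;
# set DRY_RUN=0 to run them for real (needs root and actual pools).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# mount_side_by_side: one dataset per pool, each on its own path.
# "data" and the /export/... paths are made-up names.
mount_side_by_side() {
    run zfs create -o mountpoint=/export/data1 mypool1/data
    run zfs create -o mountpoint=/export/data2 mypool2/data
    # Confirm where each dataset ended up.
    run zfs get -H -o name,value mountpoint mypool1/data mypool2/data
}

mount_side_by_side
```

Every file then lives entirely in one pool, so losing one pool leaves the other readable, but deciding which path a file goes to is left to the user or the application.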


I feel now that the other questions are straightened out.

* If that's the case, how will the data be distributed/allocated over the datasets if I copy a data file to that path?
Data from all datasets are striped across the top-level vdevs. The notion of a given dataset only writing to a single raidz device in the pool does not exist.
Thanks,
George
Kind regards
Robin.



_______________________________________________
OpenIndiana-discuss mailing list
[email protected]
http://openindiana.org/mailman/listinfo/openindiana-discuss