Re: [OpenZFS Developer] [zfs] Normalization in ZFS

Yuri Pankov Sat, 14 Nov 2015 19:38:45 -0800

Thanks, Matthew.

On Sat, 14 Nov 2015 17:36:54 -0800, Matthew Ahrens wrote:



On Sat, Nov 14, 2015 at 4:14 AM, Yuri Pankov <[email protected]> wrote:

    I'm trying to understand the idea behind the "normalization"
    property in ZFS.

    What's the original idea behind normalization when the
    "normalization" property is set to "none"? Is it "Or we could choose to be
    normalization-insensitive on LOOKUP and normalization-preserving on
    CREATE." as described in [1]?


According to the zfs.1m manpage: "File names are always stored
unmodified, names are normalized as part of any comparison process."  In
general, normalization and casesensitivity work similarly: we always
store the specified bytes, but depending on the settings, some byte
sequences may be considered "identical" from the point of view of lookup
and create operations (in terms of determining if an entry exists).

Therefore:
  - when you list the entries, you will always see the byte sequence
you used to create a file.
  - when you look up a byte sequence, it may match a file whose name is a
different byte sequence but is considered equivalent according to the
normalization and casesensitivity properties (e.g. with
casesensitivity=insensitive, if there is a file named "foo" and you look
up "Foo", it will match the existing file).
  - when you create a file, it may fail with EEXIST if there is a file
with a name that is equivalent according to the normalization and
casesensitivity properties.
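The three rules above can be modelled with a toy directory. This is a minimal sketch, assuming Python's unicodedata module as a stand-in for ZFS's own normalization code and str.casefold() as a stand-in for its case-insensitive matching; the ToyDir class and its methods are hypothetical, not ZFS APIs. Names are stored as the exact bytes given, and only the comparison key is normalized.

```python
import errno
import unicodedata


class ToyDir:
    """Hypothetical directory model: stores raw names, compares normalized keys."""

    def __init__(self, normalization="none", casesensitivity="sensitive"):
        self.normalization = normalization      # "none", "formC" or "formD"
        self.casesensitivity = casesensitivity  # "sensitive" or "insensitive"
        self.entries = []                       # raw byte names, exactly as created

    def _key(self, name: bytes) -> str:
        # Comparison key only; the stored name is never rewritten.
        text = name.decode("utf-8")  # assumes utf8only=on
        if self.normalization == "formC":
            text = unicodedata.normalize("NFC", text)
        elif self.normalization == "formD":
            text = unicodedata.normalize("NFD", text)
        if self.casesensitivity == "insensitive":
            text = text.casefold()  # stand-in for ZFS's case-insensitive matching
        return text

    def lookup(self, name: bytes):
        key = self._key(name)
        for stored in self.entries:
            if self._key(stored) == key:
                return stored  # may differ byte-for-byte from `name`
        return None

    def create(self, name: bytes):
        if self.lookup(name) is not None:
            raise FileExistsError(errno.EEXIST, "File exists", name)
        self.entries.append(name)

    def listdir(self):
        return list(self.entries)  # the original byte sequences


d = ToyDir(casesensitivity="insensitive")
d.create(b"foo")
print(d.lookup(b"Foo"))  # b'foo'
print(d.listdir())       # [b'foo']
try:
    d.create(b"FOO")
except FileExistsError as e:
    print(e.errno == errno.EEXIST)  # True
```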

normalization=none means that we do not do normalization, so even if two
characters look the same, if they use different byte sequences, they
will be considered to be distinct.  (Analogous to
casesensitivity=sensitive.)
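To make the normalization=none case concrete: the two spellings of "ü" used later in this thread are different byte strings, so a plain byte comparison treats them as distinct, while comparing their NFC-normalized forms treats them as equal. A minimal sketch, assuming Python's unicodedata as a stand-in for ZFS's internal code; nfc_key is a hypothetical helper.

```python
import unicodedata

nfc = b"\xc3\xbc"   # U+00FC LATIN SMALL LETTER U WITH DIAERESIS
nfd = b"u\xcc\x88"  # U+0075 followed by U+0308 COMBINING DIAERESIS

# normalization=none: names are compared as raw bytes.
print(nfc == nfd)  # False


# normalization=formC: both names are compared after NFC normalization.
def nfc_key(name: bytes) -> str:
    return unicodedata.normalize("NFC", name.decode("utf-8"))


print(nfc_key(nfc) == nfc_key(nfd))  # True
```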

So we are NOT normalization-insensitive by default, and treat filenames as just byte sequences with normalization=none?

Hopefully the answers to your specific questions below are obvious given
the above principles:


    When comparing filenames for other "normalization" values, which
    part of the comparison do we normalize - the stored filename, or the
    one in lookup request?


normalize on lookup.

That was my question: WHAT do we normalize? E.g., we have a filename stored in NFC, and there's a request to delete the same filename, but in NFD. Do we normalize the stored filename? Do we normalize the one in the request? Do we normalize both (then it would make no sense to have different normalization values)?
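One way to see which side has to be normalized is to test two hypothetical strategies against the C1/C2 scenarios below: normalizing only the name in the request, or normalizing both the request and each stored name before comparing. The helper names are illustrative only, and Python's unicodedata stands in for ZFS's own code. Only the second strategy reports a collision in both directions, which matches the answers given for C1 and C2 under normalization=formC.

```python
import unicodedata

FILE_C = b"\xc3\xbc"   # "ü" in NFC
FILE_D = b"u\xcc\x88"  # "ü" in NFD


def nfc(name: bytes) -> bytes:
    return unicodedata.normalize("NFC", name.decode("utf-8")).encode("utf-8")


def collides_request_only(stored: bytes, requested: bytes) -> bool:
    # Hypothetical strategy: normalize only the incoming name.
    return stored == nfc(requested)


def collides_both(stored: bytes, requested: bytes) -> bool:
    # Hypothetical strategy: normalize both names before comparing.
    return nfc(stored) == nfc(requested)


for stored, requested in [(FILE_C, FILE_D), (FILE_D, FILE_C)]:
    print(collides_request_only(stored, requested), collides_both(stored, requested))
# True True    (fileC exists, create fileD)
# False True   (fileD exists, create fileC)
```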

    Currently I'm seeing that "normalization-preserving on CREATE" part
    is there, but "normalization-insensitive on LOOKUP" is not:


    # zfs create -o mountpoint=/norm/n -o utf8only=on -o normalization=none rpool/formN


That's because you requested that it not be, by setting normalization=none.

    # cd /norm/n
    # touch $( echo "\xc3\xbc" )
    # touch $( echo "\x75\xcc\x88" )
    # ls
    ü  ü
    # LC_ALL=C ls -b
    u\314\210 \303\274
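The octal escapes printed by `ls -b` can be checked against the bytes passed to touch: \303\274 is 0xC3 0xBC, the NFC form of U+00FC, and u\314\210 is 0x75 0xCC 0x88, the NFD form of the same character. A small check using only Python's standard library:

```python
import unicodedata

# Octal escapes as shown by `LC_ALL=C ls -b` above.
nfc_name = bytes([0o303, 0o274])         # \303\274
nfd_name = b"u" + bytes([0o314, 0o210])  # u\314\210

print(nfc_name.hex(), nfd_name.hex())  # c3bc 75cc88

text_c = nfc_name.decode("utf-8")
text_d = nfd_name.decode("utf-8")
print([hex(ord(ch)) for ch in text_c])  # ['0xfc']
print([hex(ord(ch)) for ch in text_d])  # ['0x75', '0x308']

# Same character in two normal forms: each normalizes to the other.
print(unicodedata.normalize("NFD", text_c) == text_d)  # True
print(unicodedata.normalize("NFC", text_d) == text_c)  # True
```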

    Which of the following is correct by design, not as currently
    implemented (given we have the "same" filename with the "ü" character
    in NFC and NFD forms as "fileC" and "fileD"):

    A. for all normalization settings the filename itself is NOT modified.


Correct.


    B. normalization=none
    - creating either of fileC OR fileD is OK, creating another form
    when one exists is NOT.


Incorrect; the names are not equivalent according to normalization=none,
so you can create both names.


    C.  normalization=formC
    - creating either of fileC OR fileD is OK


Correct.

    - C1. fileC exists, creating fileD is OK;


Incorrect, these names are equivalent according to normalization=formC,
so you will get EEXIST when creating fileD.

    fileD exists, creating fileC isn't OK - normalizing stored filename.


Correct, creating fileC will get EEXIST.

    - OR
    - C2. fileC exists, creating fileD isn't OK;


Correct, creating fileD will get EEXIST.

    fileD exists, creating fileC is OK - normalizing the looked up filename.


Incorrect, creating fileC will get EEXIST.


    D. normalization=formD, same as C, swapping fileC and fileD.


Same as with formC, because you said the names are equivalent under both
formC and formD.
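The answers to B, C and D can be tabulated with a short sketch. It assumes, as the answers above indicate, that both the requested and the stored name are normalized to the dataset's form before comparing, and it uses Python's unicodedata as a stand-in for ZFS's own normalization code; the create_second helper is hypothetical. Because two strings have equal NFC forms exactly when they have equal NFD forms (both express Unicode canonical equivalence), formC and formD give the same outcomes here.

```python
import unicodedata

FILE_C = b"\xc3\xbc"   # "ü", NFC
FILE_D = b"u\xcc\x88"  # "ü", NFD
FORMS = {"none": None, "formC": "NFC", "formD": "NFD"}


def key(name: bytes, form):
    text = name.decode("utf-8")
    return text if form is None else unicodedata.normalize(form, text)


def create_second(setting: str, existing: bytes, new: bytes) -> str:
    """Hypothetical: result of creating `new` when `existing` is present."""
    form = FORMS[setting]
    return "EEXIST" if key(existing, form) == key(new, form) else "OK"


for setting in FORMS:
    print(setting,
          create_second(setting, FILE_C, FILE_D),
          create_second(setting, FILE_D, FILE_C))
# none OK OK
# formC EEXIST EEXIST
# formD EEXIST EEXIST
```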



    1. https://blogs.oracle.com/nico/entry/filesystem_i18n


That post seems accurate to me.

Given the above answers, what is the difference between the formC and formD settings, then?

