On 5/5/15 8:42 AM, Rick Macklem wrote:
Julian Elischer wrote:
On 5/3/15 10:33 PM, Jilles Tjoelker wrote:
On Fri, May 01, 2015 at 07:17:42PM +0300, Konstantin Belousov
wrote:
On Fri, May 01, 2015 at 03:04:51PM +0800, Julian Elischer wrote:
if you are interested in readdir(3), seekdir(3) and telldir(3)
then
you should look at
     https://reviews.freebsd.org/D2410
this patches around a problem in seekdir() that breaks Samba.
Seekdir(3) will not work as expected when files prior to the
point of
interest in directory have been deleted since the directory was
opened.
Windows clients using Samba cause both these things to happen,
causing
the next readdir(3) after the bad seekdir(3) to skip some entries
and
return the wrong file.
Samba only needs to step back a single directory entry in the
case
where it reads an entry and then discovers it can't fit it into
the
buffer it is sending to the windows client. It turns out we can
reliably cater to Samba's requirement because the "last returned
element" is always still in memory, so with a little care, we can
set our filepointer back to it safely. (once)
seekdir and readdir (and telldir()) need a complete rewrite along
with
getdirentries() but that is more than a small edit like this.
Can you explain your expectations from the whole readdir() vs.
parallel
directory modifications interaction ?  From what I understood so
far,
there is unlocked modification of the container and parallel
iterator
over the same container.  IMO, in such situation, whatever tweaks
you
apply to the iterator, it is still cannot be made reliable.
Before making single-purpose changes to the libc readdir and
seekdir
code, or to the kernel code, it would be useful to state exact
behaviour
of the dirent machinery we want to see. No, 'make samba works in
my
situation' does not sound good enough.
Consider the subsequence of entries that existed at opendir() time
and
were not removed until now. This subsequence is clearly defined and
does
not have concurrency problems. The order of this subsequence must
remain
unchanged and seekdir() must be correct with respect to this
subsequence.

Additionally, two other kinds of entries may be returned. New
entries
may be inserted anywhere in between the entries of the subsequence,
and
removed entries may be returned as if they were still part of the
subsequence (so that not every readdir() needs a system call).

A simple implementation for UFS-style directories is to store the
offset
in the directory (all bits of it, not masking off the lower 9
bits).
This needs d_off or similar in struct dirent. The kernel
getdirentries()
then needs a similar loop as the old libc seekdir() to go from the
start
of the 512-byte directory block to the desired entry (since an
entry may
not exist at the stored offset within the directory block).

This means that a UFS-style directory cannot be compacted (existing
entries moved from higher to lower offsets to fill holes) while it
is
open for reading. An NFS exported directory is always open for
reading.

This also means that duplicate entries can only be returned if that
particular filename was deleted and created again.

Without kernel support, it is hard to get telldir/seekdir
completely
reliable. The current libc implementation is wrong since the
"holes"
within the block just disappear and change the offsets of the
following
entries; the kernel cannot fix this using entries with d_fileno = 0
since it cannot know, in the general case, how long the deleted
entry
was in the filesystem-independent dirent format. My previous idea
of
storing one d_fileno during telldir() is wrong since it will fail
if
that entry is deleted.

If you do not care about memory usage (which probably is already
excessive with the current libc implementation), you could store at
telldir() time the offset of the current block returned by
getdirentries() and the d_fileno of all entries already returned in
the
current block.

The D2410 patch can conceptually work for what Samba needs,
stepping
back one directory entry. I will comment on it.

how long do I have to wait until I can commit  this and was kib's
comment a
"do not commit"?

What about the bug Jilles reports in D2410?
- He said you might fix the problem by having telldir move the entry
   to the head of the list when it has a hit. However, this means that
   an "old" entry gets modified. Is it possible for this "modified"
   entry to be a match and get modified again, and so on...?

I will comment on the patch once you decide how to deal with Jilles
bug.
I don't think is is a "bug" as such..
it wasn't a case I was looking to fix and it is just as it was before.
I'd rephrase it as: "Jilles asks that we also fix the case where the previous telldir response is
a recycling of an old  telldir response."

In actual fact this scenario will almost never happen.
because the previous time the telldir call returned that location,
the 'fixtelldir' function would have later been called on the result, which
would have been modified to point to the start of the NEXT buffer,
and as such would not get matched on the next telldir to the same
place,  as that would be looking for the first location after the end
of the previous buffer. OR it does match because we seeked back to
that location, o which case we will get the same buffer alredy fixed.
The two addresses are logically equivalent,
but you can only know that after you have loaded the next buffer.

i.e. the first time telldir is called on location X it returns
    seek=123, location= 4096 (one past end of buffer)
 then a read happens and it gets converted to:
    seek= 124 location=0  (beginning of next buffer..
then some more reads happen or so and we
   seek back to 124,0
and do a new telldir, in which case we get a 'pre-fixed' value of
     124,0 not 123,4096.
 so while fixtelldir is not called, it doesn't matter.

If we seek back FURTHER than 124,0 then we are into the "unreliable
unfixed  zone".
 Assuming that by some luck we don't get confused (there were no
deletes) and we then work forwards back to 123,4096 and do a telldir again,
we will NOT match the old one, which was changed to be 124,0
so we will allocate a new one at 123,4096,
which in turn will get 'fixed' to be 124,0 So it worked.  the only 'less
than optimal' thing here is that we now have two entries saying 124,0.
But the case is so unusual and in a totally unsupported mode of
operation, that as far as I know no-one uses, that adding code
to merge the two entries is just not worth it. It still returns the
correct values, just wastes some memory. Anyone who wants to
do a DOS wasting these 16 bytes of memory, has to do it via writing
code to do it. so.. why would one DOS oneself?

we COULD mode the item up front but it'd never get matched unless
there had not been a matching read following the first telldir which
is really unlikely.

rick

_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to
"freebsd-current-unsubscr...@freebsd.org"



_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Reply via email to