On Wed, Sep 3, 2014 at 10:51 AM, Stefan Fritsch <s...@sfritsch.de> wrote:
> On Tuesday 02 September 2014 15:22:16, Philip Guenther wrote:
>> > From physio(9):
>> >      minphys
>> >              A device specific routine called to determine the
>> >              maximum transfer size that the device's strategy
>> >              routine can handle.
>> >
>> > Since we have seen that the driver must be able to handle 64k
>> > blocks in any case, the fact that minphys is device specific is
>> > useless, isn't it?
>>
>> physio() is used by character device access.  Looks to me like
>> sdminphys() will change the chunking behavior of this:
>>     dd if=/dev/zero of=/dev/rsd0a bs=100M count=1
>>
>> depending on whether sd0 is a SCSI-I device, no?
>
> Yes, but that does not make it any less useless. File systems will
> call the very same strategy routine and expect it to deal with 64K
> blocks. The statement in the man page gives the misleading impression
> that the [minphys] routine could be used to avoid that the strategy
> routine has to handle 64K blocks, which is not true.
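
To make the chunking point concrete, here is a minimal userland sketch
of how a physio()-style loop lets a minphys routine trim each chunk of
a raw transfer. It is not the kernel code: struct xfer,
generic_minphys(), small_minphys(), count_chunks() and the 32k device
limit are all made up for illustration, and MAXPHYS is assumed to be
64k.

```c
#include <stddef.h>

#define MAXPHYS (64 * 1024)	/* assumed MI upper bound on one transfer */

/* Hypothetical stand-in for struct buf: only the byte count matters. */
struct xfer {
	size_t bcount;
};

/* Generic clamp: never more than MAXPHYS per transfer. */
static void
generic_minphys(struct xfer *x)
{
	if (x->bcount > MAXPHYS)
		x->bcount = MAXPHYS;
}

/* Hypothetical device-specific clamp for a device limited to 32k. */
static void
small_minphys(struct xfer *x)
{
	if (x->bcount > 32 * 1024)
		x->bcount = 32 * 1024;
	generic_minphys(x);
}

/*
 * Split a raw request of 'total' bytes the way a physio-like loop
 * would: let minphys trim the chunk, hand it to the strategy routine,
 * repeat.  Returns the number of strategy calls that would be issued.
 */
static size_t
count_chunks(size_t total, void (*minphys)(struct xfer *))
{
	size_t nchunks = 0;

	while (total > 0) {
		struct xfer x = { total };

		minphys(&x);		/* device trims the chunk size */
		total -= x.bcount;	/* strategy routine would run here */
		nchunks++;
	}
	return nchunks;
}
```

With these assumed limits, the 100M dd above (104857600 bytes) turns
into 1600 strategy calls under generic_minphys() and 3200 under
small_minphys(). A filesystem that calls the strategy routine directly
never passes through a loop like this.
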
We don't seem to have a manpage describing the routines that a block
device must implement and their requirements, nor the same for
character devices.  Those would be nice additions to our section 9
manpages.

This requirement on a block device's strategy routine comes from the
filesystem layer.  Filesystems *do not use physio()*, so how does it
make sense to document that requirement in physio(9)?  That manpage
doesn't even contain the word "block"!

> So, maybe the minphys mechanism is useful for block devices that are
> not used for file systems. Are there any such devices?

The current minphys mechanism *should* be useful for devices which do
not store file systems with a block size greater than the maximum I/O
size of the device.  IMO, if a filesystem is configured with a block
size less than the device's minphys, then it should not be doing
bread() calls of more than that block size.  If that's the case, then
manually setting the file system block size to less than the device's
minphys should let you create a filesystem on that device.

(Creating/using a filesystem with a block size greater than the
device's minphys seems like a Bad Idea for performance and possibly
correctness reasons.  It's like the disk I/O version of IP
fragmentation...)

To go back to a previous email, you wrote:
> But what is really strange is, why does openbsd then have
> an infrastructure to set different max transfer sizes for physio on a
> per-adapter basis? This makes no sense. Either the drivers have to support
> 64k transfers, in which case most of the minphys infrastructure is
> useless, or they don't have to. In the latter case the minphys
> infrastructure would need to be used in all code paths.

The answer to this question can be found by reading cvs logs and
diffs: MAXBSIZE used to be strictly less than MAXPHYS and less than or
equal to the smallest minphys of a device supported on the given
architecture.
MAXBSIZE was MD then, and also apparently reflected limitations in the
pmap; i386 pmap bugs delayed its move to a 64k MAXBSIZE until after
all the other architectures had made it.

Why bother with permitting character device access in chunks greater
than MAXBSIZE?  Probably because some tools (newfs, dump, fsck) had
significant performance gains by doing I/O in the largest chunk
supported by the device, irrespective of the limitations of the pmap,
buffer cache, or filesystems.  Oh, and tapes really wanted larger I/Os
so that they could stream, IIRC.

IMO, the problem that you're hitting with your vioblk device isn't a
problem with MAXPHYS, physio(), or minphys, but rather with MAXBSIZE.
Back in the 1990s, the fact that you want to be able to configure a
device with a minphys of 20kB on your arch would have meant that the
MD MAXBSIZE define for your arch should be 20kB.  If MAXBSIZE could be
set/limited on a per-device basis (i.e., the code using it queried the
device) then your problem would be resolved, no?


Philip Guenther
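
P.S. A rough sketch of what "the code using it queried the device"
might look like, purely as illustration.  Nothing here exists in the
tree: struct dev_limits, its max_xfer field and dev_maxbsize() are
hypothetical names, and the 64k MAXBSIZE and 512-byte minimum block
size are assumptions.  The result is rounded down to a power of two
because filesystem block sizes are powers of two.

```c
#include <stddef.h>

#define MAXBSIZE (64 * 1024)	/* assumed global upper bound */

/* Hypothetical per-device description; not an existing structure. */
struct dev_limits {
	size_t max_xfer;	/* largest I/O the strategy routine accepts */
};

/*
 * Largest filesystem block size usable on this device: the global
 * MAXBSIZE, lowered to the device limit, rounded down to a power of
 * two.  Never returns less than the assumed 512-byte minimum.
 */
static size_t
dev_maxbsize(const struct dev_limits *d)
{
	size_t limit = d->max_xfer < MAXBSIZE ? d->max_xfer : MAXBSIZE;
	size_t bsize = 512;	/* assumed minimum block size */

	while (bsize * 2 <= limit)
		bsize *= 2;
	return bsize;
}
```

For a device whose max_xfer is 20kB this yields 16kB; for one that can
take 128kB it yields the global 64kB.
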