On Thu, Apr 20, 2006 at 05:27:42PM -0700, David S. Miller wrote:
> From: Olof Johansson <[EMAIL PROTECTED]>
> Date: Thu, 20 Apr 2006 16:33:05 -0500
>
> > From the wiki:
> >
> > > 3. Data copied by I/OAT is not cached
> >
> > This is a I/OAT device limitation and not a global statement of the
> > DMA infrastructure. Other platforms might be able to prime caches
> > with the DMA traffic. Hint flags should be added on either the channel
> > allocation calls, or per-operation calls, depending on where it makes
> > sense driver/client wise.
>
> This sidesteps the whole question of _which_ cache to warm. And if
> you choose wrongly, then what?
>
> Besides the control overhead of the DMA engines, the biggest thing
> lost in my opinion is the perfect cache warming that a cpu based copy
> does from the kernel socket buffer into userspace.

It's definitely the easiest way to always make sure the right caches
are warm for the app; that I agree with. But when warming those caches
by copying, the data is pulled in through a potentially cold cache in
the first place. So with DMA offload, the cache misses are just moved
from the copy loop to userspace. Or am I missing something?

> The first thing an application is going to do is touch that data. So
> I think it's very important to prewarm the caches and the only
> straightforward way I know of to always warm up the correct cpu's
> caches is copy_to_user().

The other way (assuming the hardware supports cache warming) would be
to pass down affinities (or look them up during receive processing; I'm
not sure that's practical the way things work now) and dispatch on a
DMA channel with the right cache affinity. I've got a feeling that
"straightforward" is not a term to use for describing that solution,
though.

> Unfortunately, many benchmarks just do raw bandwidth tests sending to
> a receiver that just doesn't even look at the data. They just return
> from recvmsg() and loop back into it. This is not what applications
> using networking actually do, so it's important to make sure we look
> intelligently at any benchmarks done and do not fall into the trap of
> saying "even without cache warming it made things faster" when in fact
> the tested receiver did not touch the data at all so was a false test.

Yes, some real-life-like benchmarking is definitely needed.
Unfortunately, I'm not in a position where I can do much (and share
numbers) myself at the moment.

-Olof