On Wed, Mar 08, 2006 at 10:40:38AM +0100, Andi Kleen wrote:
> On Wednesday 08 March 2006 03:38, Benjamin LaHaise wrote:
> 
> > It's hardly that uncommon for pages to cross cachelines or for pages to
> > move around CPUs with networking. 
> 
> Data?

I posted a workload that shows this.  Anything that transmits over TCP on 
a CPU other than the one the interrupt occurs on is going to hit that 
behaviour for 7/8 of pages.

> > Please name some sort of benchmarks that show your concerns for decreased
> > performance.  
> 
> Anything that manipulates lots of data.

LMBench certainly doesn't look any worse.  In fact, things look slightly 
better.  (2.6.16n is with WANT_PAGE_VIRTUAL; 2.6.16O is without.)

cd results && make summary percent 2>/dev/null | more
make[1]: Entering directory `/md0/root/LMbench2/results'

                 L M B E N C H  2 . 0   S U M M A R Y
                 ------------------------------------


Basic system parameters
----------------------------------------------------
Host                 OS Description              Mhz
                                                    
--------- ------------- ----------------------- ----
cobra.kva Linux 2.6.16n        x86_64-linux-gnu 2997
cobra.kva Linux 2.6.16n        x86_64-linux-gnu 2997
cobra.kva Linux 2.6.16n        x86_64-linux-gnu 2997
cobra.kva Linux 2.6.16n        x86_64-linux-gnu 2997
cobra.kva Linux 2.6.16n        x86_64-linux-gnu 2997
cobra.kva Linux 2.6.16O        x86_64-linux-gnu 2997
cobra.kva Linux 2.6.16O        x86_64-linux-gnu 2997
cobra.kva Linux 2.6.16O        x86_64-linux-gnu 2997
cobra.kva Linux 2.6.16O        x86_64-linux-gnu 2997
cobra.kva Linux 2.6.16O        x86_64-linux-gnu 2997

Processor, Processes - times in microseconds - smaller is better
----------------------------------------------------------------
Host                 OS  Mhz null null      open selct sig  sig  fork exec sh  
                             call  I/O stat clos TCP   inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ----- ---- ---- ---- ---- ----
cobra.kva Linux 2.6.16n 2997 0.23 0.30 1.45 2.70 8.974 0.35 3.19 111. 443. 1674
cobra.kva Linux 2.6.16n 2997 0.23 0.30 1.46 2.81 8.984 0.35 3.19 116. 443. 1679
cobra.kva Linux 2.6.16n 2997 0.23 0.30 1.45 2.74 9.761 0.35 3.17 112. 446. 1683
cobra.kva Linux 2.6.16n 2997 0.23 0.30 1.43 2.82 8.992 0.34 3.20 113. 444. 1683
cobra.kva Linux 2.6.16n 2997 0.23 0.30 1.45 2.83 8.975 0.35 3.18 112. 445. 1673
cobra.kva Linux 2.6.16O 2997 0.23 0.30 1.44 2.74 9.008 0.36 3.17 110. 443. 1674
cobra.kva Linux 2.6.16O 2997 0.23 0.30 1.45 2.75 9.003 0.37 3.19 112. 445. 1681
cobra.kva Linux 2.6.16O 2997 0.23 0.30 1.45 2.80 9.010 0.37 3.17 112. 452. 1689
cobra.kva Linux 2.6.16O 2997 0.23 0.30 1.45 2.75 9.007 0.36 3.15 112. 453. 1689
cobra.kva Linux 2.6.16O 2997 0.23 0.30 1.44 2.77 9.021 0.36 3.15 113. 448. 1686

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------
Host                 OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                        ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ----- ------ ------ ------ ------ ------- -------
cobra.kva Linux 2.6.16n 2.360 3.7500 6.6100 4.6600   10.1 5.20000    15.3
cobra.kva Linux 2.6.16n 2.420 3.7600 6.6500 4.8500   10.5 5.41000    15.0
cobra.kva Linux 2.6.16n 2.400 3.7600 6.5900 4.7000 9.6600 5.37000    15.0
cobra.kva Linux 2.6.16n 2.400 3.7600 6.5600 4.6300 9.6100 5.81000    15.0
cobra.kva Linux 2.6.16n 2.420 3.8200 6.5800 4.8400   10.7 5.47000    14.7
cobra.kva Linux 2.6.16O 2.430 4.4800 7.2100 4.8500   10.4 5.91000    15.8
cobra.kva Linux 2.6.16O 2.460 4.3100 7.2400 4.9000   10.7 5.42000    15.9
cobra.kva Linux 2.6.16O 2.450 4.4800 7.2800 4.7000   10.1 5.20000    15.9
cobra.kva Linux 2.6.16O 2.460 4.3900 6.9900 4.7200 9.7400 5.65000    15.2
cobra.kva Linux 2.6.16O 2.450 4.4700 6.9400 4.8100 8.7100 5.52000    15.5

*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
Host                 OS 2p/0K  Pipe AF     UDP  RPC/   TCP  RPC/ TCP
                        ctxsw       UNIX         UDP         TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
cobra.kva Linux 2.6.16n 2.360 8.228 12.8  12.7  21.7  17.6  29.0 43.6
cobra.kva Linux 2.6.16n 2.420 8.192 13.1  12.7  22.1  17.6  29.2 44.0
cobra.kva Linux 2.6.16n 2.400 8.163 13.0  12.7  22.4  17.6  29.3 43.9
cobra.kva Linux 2.6.16n 2.400 8.163 11.7  12.7  21.9  17.5  29.0 44.4
cobra.kva Linux 2.6.16n 2.420 8.211 13.1  12.7  22.3  17.6  29.5 44.4
cobra.kva Linux 2.6.16O 2.430 8.273 13.1  14.4  23.8  17.7  29.7 43.9
cobra.kva Linux 2.6.16O 2.460 8.284 10.6  14.2  24.1  17.7  29.6 43.8
cobra.kva Linux 2.6.16O 2.450 8.454 13.5  14.3  24.1  17.7  29.8 43.9
cobra.kva Linux 2.6.16O 2.460 8.245 10.7  14.2  24.3  17.7  29.9 44.1
cobra.kva Linux 2.6.16O 2.450 8.395 13.6  14.3  23.8  17.8  29.8 44.2

File & VM system latencies in microseconds - smaller is better
--------------------------------------------------------------
Host                 OS   0K File      10K File      Mmap    Prot    Page       
                        Create Delete Create Delete  Latency Fault   Fault 
--------- ------------- ------ ------ ------ ------  ------- -----   ----- 
cobra.kva Linux 2.6.16n   15.4   12.5   49.1   26.2   2500.0       1.00000
cobra.kva Linux 2.6.16n   15.4   12.5   50.9   26.2   2512.0 0.126 1.00000
cobra.kva Linux 2.6.16n   15.5   12.6   50.3   25.6   2507.0       1.00000
cobra.kva Linux 2.6.16n   15.5   12.5   50.8   26.2   2514.0       1.00000
cobra.kva Linux 2.6.16n   15.4   12.5   51.0   26.2   2507.0       1.00000
cobra.kva Linux 2.6.16O   15.3   12.5   51.4   26.7   2533.0       1.00000
cobra.kva Linux 2.6.16O   15.3   12.5   51.7   26.7   2509.0       1.00000
cobra.kva Linux 2.6.16O   15.3   12.5   51.6   26.7   2522.0       1.00000
cobra.kva Linux 2.6.16O   15.3   12.5   51.6   26.8   2542.0       1.00000
cobra.kva Linux 2.6.16O   15.8   12.5   49.5   26.8   2521.0       1.00000

*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------
Host                OS  Pipe AF    TCP  File   Mmap  Bcopy  Bcopy  Mem   Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
cobra.kva Linux 2.6.16n 1652 614K 1399 2365.9 4345.9 1369.6 1396.1 4347 1717.
cobra.kva Linux 2.6.16n 1654 615K 1383 2360.7 4347.5 1268.7 1303.2 4344 1900.
cobra.kva Linux 2.6.16n 1651 597K 1392 2361.1 4346.8 1257.1 1287.2 4346 1918.
cobra.kva Linux 2.6.16n 1608 593K 1372 2358.5 4348.1 1290.7 1328.7 4347 1960.
cobra.kva Linux 2.6.16n 1629 583K 1388 2354.8 4344.5 1291.5 1318.1 4346 1986.
cobra.kva Linux 2.6.16O 1625 657K 1394 2360.6 4346.1 1358.4 1352.0 4347 1740.
cobra.kva Linux 2.6.16O 1625 658K 1373 2349.9 4344.0 1361.3 1273.6 4347 1856.
cobra.kva Linux 2.6.16O 1657 585K 1370 2346.5 4344.5 1277.0 1299.7 4341 1934.
cobra.kva Linux 2.6.16O 1535 631K 1385 2343.8 4348.1 1278.9 1285.0 4347 1959.
cobra.kva Linux 2.6.16O 1621 605K 1368 2337.4 4342.7 1289.2 1295.0 4340 1982.

Memory latencies in nanoseconds - smaller is better
    (WARNING - may not be correct, check graphs)
---------------------------------------------------
Host                 OS   Mhz  L1 $   L2 $    Main mem    Guesses
--------- -------------  ---- ----- ------    --------    -------
cobra.kva Linux 2.6.16n  2997 1.336 9.3600   44.2
cobra.kva Linux 2.6.16n  2997 1.336 9.4000   44.2
cobra.kva Linux 2.6.16n  2997 1.336 9.3540   44.1
cobra.kva Linux 2.6.16n  2997 1.336 9.3820   44.2
cobra.kva Linux 2.6.16n  2997 1.336 9.3500   44.2
cobra.kva Linux 2.6.16O  2997 1.336 9.3680   44.2
cobra.kva Linux 2.6.16O  2997 1.336 9.3780   44.2
cobra.kva Linux 2.6.16O  2997 1.336 9.3710   44.2
cobra.kva Linux 2.6.16O  2997 1.336 9.3650   44.2
cobra.kva Linux 2.6.16O  2997 1.336 9.3580   44.1
make[1]: Leaving directory `/md0/root/LMbench2/results'

> > I've shown you one that gets improved, and I think the pages 
> > not overlapping cachelines is only a good thing.
> 
> I think increasing the working set and wasting lots of money for this is only
> a bad thing.

> > I know these things look like piddly little worthless optimizations
> 
> In this case they look more like "make the big picture worse for some 
> microbenchmark" to me.

You haven't come up with any data to support your position, let alone even 
a specific benchmark that shows your concerns.  That sort of position is 
unreasonable to have to argue against.  Please come up with a specific 
benchmark addressing your concerns instead of this vague handwaving.

Besides, we're reducing the cache footprint for many users by using only 1 
cacheline per struct page, instead of the 2 that 7 out of every 8 struct 
pages need currently.  How many places in the kernel really do a linear walk 
of the struct page array?  Aside from early boot initialization, I think the 
answer is 0.  Random access is far more likely, and this patch helps that 
usage model.
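
The cacheline arithmetic can be checked with a rough sketch.  It assumes a 
56-byte struct page that grows to 64 bytes with the extra field, 64-byte 
cache lines, and an array that starts on a line boundary.  None of those 
sizes are stated in this message, and the names below are made up for 
illustration.

```python
STRUCT_SIZE_UNPADDED = 56  # assumed size of struct page without the extra field
STRUCT_SIZE_PADDED = 64    # assumed size with the extra field
CACHELINE = 64             # assumed cache line size in bytes


def layout(struct_size, cacheline=CACHELINE, count=8, base=0):
    """Return (misaligned, straddling) for `count` consecutive array entries.

    misaligned: entries that do not start on a cache line boundary.
    straddling: entries whose first and last byte sit in different lines.
    """
    misaligned = 0
    straddling = 0
    for i in range(count):
        start = base + i * struct_size
        end = start + struct_size - 1
        if start % cacheline != 0:
            misaligned += 1
        if start // cacheline != end // cacheline:
            straddling += 1
    return misaligned, straddling


if __name__ == "__main__":
    print("unpadded:", layout(STRUCT_SIZE_UNPADDED))
    print("padded:  ", layout(STRUCT_SIZE_PADDED))
```

Under those assumptions, 7 of every 8 unpadded entries start off a line 
boundary and 6 of them actually span two lines, while padded entries always 
occupy exactly one line.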

                -ben