On 28 Aug 2014, at 11:05 pm, Mike Belopuhov <m...@belopuhov.com> wrote:

> On 28 August 2014 12:32, David Gwynne <da...@gwynne.id.au> wrote:
>> 
>> On 28 Aug 2014, at 3:02 am, Mike Belopuhov <m...@belopuhov.com> wrote:
>> 
>>> On 27 August 2014 08:25, Brad Smith <b...@comstyle.com> wrote:
>>>> Looking for some testing of the following diff to add Jumbo support for the
>>>> BCM5714 / BCM5780 and BCM5717 / BCM5719 / BCM5720 / BCM57765 / BCM57766
>>>> chipsets.
>>>> 
>>>> 
>>> 
>>> i have tested this on "Broadcom BCM5719" rev 0x01, unknown BCM5719 
>>> (0x5719001),
>>> APE firmware NCSI 1.1.15.0  and "Broadcom BCM5714" rev 0xa3, BCM5715
>>> A3 (0x9003).
>>> 
>>> it works, however i'm not strictly a fan of switching the cluster pool to a
>>> larger one for 5714.  wasting another 8k page (on sparc for example) for
>>> every rx cluster in 90% of cases sounds kinda wrong to me.  but ymmv.
>> 
>> this is what MCLGETI was invented to solve though. comparing pre mclgeti to 
>> what this does:
>> 
> 
> that doesn't make my point invalid though.
> 
>> a 5714 right now without jumbos would have 512 ring entries with 2048 bytes 
>> on each. 2048 * 512 is 1024k of ram. if we bumped the std ring up to 
>> jumbos by default, 9216 * 512 would eat 4608k of ram.
>> 
> 
> your calculation is a bit off.  it's not 9216 * 512; in the case of sparc64
> it's 8k * 2 * 512, which is 8M.

on archs with 4k pages art's large pool code puts 9k clusters on 12k pages, so 3k 
of waste per cluster. on archs with 8k pages 9k clusters land on 64k pages, so the 
waste per 9k cluster is about 146 bytes. on archs with 16k pages you get 7k of 
waste per 9k cluster.
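
the arithmetic, in case anyone wants to check it (quick userland toy, nothing
to do with the actual pool code):

#include <stdio.h>

#define CL	9216UL				/* 9k cluster */

static void
waste(const char *arch, unsigned long pgsz)
{
	unsigned long n = pgsz / CL;		/* clusters per pool page */

	printf("%s: %lu bytes of waste per cluster\n",
	    arch, (pgsz - n * CL) / n);
}

int
main(void)
{
	waste("4k arch, 9k clusters on 12k pages", 12288);
	waste("8k arch, 9k clusters on 64k pages", 65536);
	waste("16k arch, 9k clusters on 16k pages", 16384);
	return (0);
}

that prints 3072, 146, and 7168 respectively, which is where the numbers above
come from.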

im proposing that the large page code in pools change so it puts at least 8 
items on a "page", and "pages" are always powers of two. that would mean 9k 
clusters would always end up on a 128k page, which works out so every arch only 
gets the 146 bytes of waste per 9k cluster. less if i put the pool page headers 
on the same page.
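
roughly what i mean, as a sketch rather than the actual pool diff (the function
name is made up):

/*
 * a large pool "page" becomes the smallest power of two that fits
 * at least 8 items.
 */
unsigned long
pool_large_pgsz(unsigned long itemsz)
{
	unsigned long pgsz = 1;

	while (pgsz < itemsz * 8)
		pgsz <<= 1;

	return (pgsz);
}

for 9k clusters thats pool_large_pgsz(9216): 8 * 9216 = 73728, which rounds up 
to 131072, ie 128k. 128k fits 14 clusters with 2048 bytes left over, so about 
146 bytes of waste per cluster no matter what the hardware page size is.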

> but my concern is different:  you ask uvm to do more work for every cluster
> since now you need 2 consequent pages of memory for one cluster instead of
> just one that fits 8k/2k = 4 clusters.

see above.

it is also worth noting that the current mbuf cluster allocator sets a low 
watermark, which for a lot of workloads means cluster pages never get returned 
to uvm at all.

my proposed code changes would get rid of that low watermark, but would age 
fully free pages so theyre only returned to uvm if theyve been idle for a 
second. if you have a fairly consistent workload you dont move pages in and out 
of uvm a lot.
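
something like this is what im thinking of for the aging, as a sketch only (the
struct and function names are made up, its not the actual diff):

#include <sys/queue.h>

extern int ticks;				/* kernel tick counter */
extern int hz;					/* ticks per second */

struct idle_page {
	TAILQ_ENTRY(idle_page)	ip_entry;
	int			ip_tick;	/* when the page went idle */
};

TAILQ_HEAD(idle_pages, idle_page);

/* called when the last item on a pool page is freed */
void
page_went_idle(struct idle_pages *list, struct idle_page *ip)
{
	ip->ip_tick = ticks;
	TAILQ_INSERT_TAIL(list, ip, ip_entry);
}

/* called from a timeout once a second */
void
page_gc(struct idle_pages *list, void (*give_back)(struct idle_page *))
{
	struct idle_page *ip, *nip;

	TAILQ_FOREACH_SAFE(ip, list, ip_entry, nip) {
		if (ticks - ip->ip_tick < hz)
			continue;		/* not idle long enough yet */

		TAILQ_REMOVE(list, ip, ip_entry);
		give_back(ip);			/* unmap and hand back to uvm */
	}
}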

on my production firewalls, the result of the above (128k pages for 9k clusters 
and free page idling) is that i have allocated mbuf clusters 74009152525 times, 
but only allocated pages 8172372 times and returned pages to uvm 8172233 times. 
that works out to roughly 9000 cluster allocations per page taken from uvm. that 
particular box has been up for a fortnight so some of those counters may have 
wrapped, so take the numbers with a grain of salt.

> 
>> my boxes with bge with mclgeti generally sit around 40 clusters, but 
>> sometimes end up around 80. 80 * 9216 is 720k. we can have jumbos and still 
>> be ahead.
>> 
>> if you compare the nics with split rings: 512 * 2048 + 256 * 9216 is ~3.3M. 
>> the same chip with mclgeti and only doing a 1500 byte workload would be 80 * 
>> 2048 + 17 * 9216, or 300k.
>> 
>>> 
>>> apart from that there's a deficiency in the diff itself.  you probably want
>>> to change MCLBYTES in bge_rxrinfo to bge_rx_std_len, otherwise the statistics
>>> look wrong.
>> 
>> yeah.
>> 
>> i have tested both 1500 and 9000 mtus on a 5714 and it is working well. as 
>> you say, 5719 seems to be fine too, but ive only tested it with mtu 1500. 
>> ill test 9k tomorrow.
>> 
>> it needs tests on older chips too though.

