On 21/11/2018 16:57, Will Deacon wrote:
On Wed, Nov 21, 2018 at 04:47:48PM +0000, John Garry wrote:
On 21/11/2018 16:07, Will Deacon wrote:
On Wed, Nov 21, 2018 at 10:54:10PM +0800, John Garry wrote:
From: Ganapatrao Kulkarni <[email protected]>

Change function __iommu_dma_alloc_pages() to allocate pages for DMA from
respective device NUMA node. The ternary operator which would be for
alloc_pages_node() is tidied along with this.

We also include a change to use kvzalloc() for kzalloc()/vzalloc()
combination.

Signed-off-by: Ganapatrao Kulkarni <[email protected]>
[JPG: Added kvzalloc(), drop pages ** being device local, tidied ternary 
operator]
Signed-off-by: John Garry <[email protected]>

Weird, you're missing a diffstat here.

Anyway, the patch looks fine to me, but it would be nice if you could
justify the change with some numbers. Do you actually see an improvement
>from this change?


Hi Will,

Ah, I missed adding my comments explaining the motivation. It would be
better in the commit log. Anyway, here's the snippet:

" ... as mentioned in [3], dma_alloc_coherent() uses the locality
information from the device - as in direct DMA - so this patch is just
applying this same policy.

[3]
https://www.mail-archive.com/[email protected]/msg1692998.html";

Yes, please add to this to the commit log.


Sure,

I did have some numbers to show improvement in some scenarios when I tested
this a while back which I'll dig out.

However I would say that some scenarios will improve and the opposite for
others with this change, considering different conditions in which DMA
memory may be used.

Well, if you can show that it's useful in some cases and not catastrophic in
others, then I think shooting for parity with direct DMA is a reasonable
justification for the change.


So I have done some more testing with our SoC crypto engine, using tcrypt module. The reason I used this device was because we can utilise a local device per socket in the system, unlike other DMAing devices, which generally only exist on a single socket (for us, anyway).

Overall the results aren't brilliant - as expected - but show a general marginal improvement. Here's some figures:

Summary:
Average diff            +0.9%
Max diff                +1.5%
Min diff                +0.2%

Test                         Ops/second before  after   diff %  

async ecb(aes) encrypt
test 0 (128 bit key, 16 byte blocks)    68717   69057   0.5
test 1 (128 bit key, 64 byte blocks):   72633   73163   0.7
test 2 (128 bit key, 256 byte blocks):  71475   72076   0.8
test 3 (128 bit key, 1024 byte blocks): 66819   67467   1.0
test 4 (128 bit key, 8192 byte blocks): 38237   38495   0.7
test 5 (192 bit key, 16 byte blocks):   70273   71079   1.2
test 6 (192 bit key, 64 byte blocks):   72455   73292   1.2
test 7 (192 bit key, 256 byte blocks):  71085   71876   1.1
test 8 (192 bit key, 1024 byte blocks): 65891   66576   1.0
test 9 (192 bit key, 8192 byte blocks): 34846   35061   0.6
test 10 (256 bit key, 16 byte blocks):  72927   73762   1.2
test 11 (256 bit key, 64 byte blocks):  72361   73207   1.2
test 12 (256 bit key, 256 byte blocks): 70907   71602   1.0
test 13 (256 bit key, 1024 byte blocks):65035   65653   1.0
test 14 (256 bit key, 8192 byte blocks):32835   32998   0.5
async ecb(aes) decrypt
test 0 (128 bit key, 16 byte blocks)    68384   69130   1.1
test 1 (128 bit key, 64 byte blocks):   72645   73133   0.7
test 2 (128 bit key, 256 byte blocks):  71523   71912   0.5
test 3 (128 bit key, 1024 byte blocks): 66902   67258   0.5
test 4 (128 bit key, 8192 byte blocks): 38260   38434   0.5
test 5 (192 bit key, 16 byte blocks):   70301   70816   0.7
test 6 (192 bit key, 64 byte blocks):   72473   73064   0.8
test 7 (192 bit key, 256 byte blocks):  71106   71663   0.8
test 8 (192 bit key, 1024 byte blocks): 65915   66363   0.7
test 9 (192 bit key, 8192 byte blocks): 34876   35006   0.4
test 10 (256 bit key, 16 byte blocks):  72969   73519   0.8
test 11 (256 bit key, 64 byte blocks):  72404   72925   0.7
test 12 (256 bit key, 256 byte blocks): 70861   71350   0.7
test 13 (256 bit key, 1024 byte blocks):65074   65485   0.6
test 14 (256 bit key, 8192 byte blocks):32861   32974   0.3
async cbc(aes) encrypt
test 0 (128 bit key, 16 byte blocks)    58306   59131   1.4
test 1 (128 bit key, 64 byte blocks):   61647   62565   1.5
test 2 (128 bit key, 256 byte blocks):  60841   61666   1.4
test 3 (128 bit key, 1024 byte blocks): 57503   58204   1.2
test 4 (128 bit key, 8192 byte blocks): 34760   35055   0.9
test 5 (192 bit key, 16 byte blocks):   59684   60452   1.3
test 6 (192 bit key, 64 byte blocks):   61705   62516   1.3
test 7 (192 bit key, 256 byte blocks):  60678   61426   1.2
test 8 (192 bit key, 1024 byte blocks): 56805   57487   1.2
test 9 (192 bit key, 8192 byte blocks): 31836   32093   0.8
test 10 (256 bit key, 16 byte blocks):  61961   62785   1.3
test 11 (256 bit key, 64 byte blocks):  61584   62427   1.4
test 12 (256 bit key, 256 byte blocks): 60407   61246   1.4
test 13 (256 bit key, 1024 byte blocks):56135   56868   1.3
test 14 (256 bit key, 8192 byte blocks):30128   30380   0.8
async cbc(aes) decrypt
test 0 (128 bit key, 16 byte blocks)    58555   59044   0.8
test 1 (128 bit key, 64 byte blocks):   61853   62589   1.2
test 2 (128 bit key, 256 byte blocks):  60992   61728   1.2
test 3 (128 bit key, 1024 byte blocks): 57591   58250   1.1
test 4 (128 bit key, 8192 byte blocks): 34796   35064   0.8
test 5 (192 bit key, 16 byte blocks):   59843   60506   1.1
test 6 (192 bit key, 64 byte blocks):   61808   62521   1.2
test 7 (192 bit key, 256 byte blocks):  60800   61445   1.1
test 8 (192 bit key, 1024 byte blocks): 56949   57513   1.0
test 9 (192 bit key, 8192 byte blocks): 31890   32107   0.7
test 10 (256 bit key, 16 byte blocks):  62109   62778   1.1
test 11 (256 bit key, 64 byte blocks):  61748   62418   1.1
test 12 (256 bit key, 256 byte blocks): 60604   61226   1.0
test 13 (256 bit key, 1024 byte blocks):56277   56845   1.0
test 14 (256 bit key, 8192 byte blocks):30165   30340   0.6
sync xts(aes) encrypt
test 0 (256 bit key, 16 byte blocks):   59501   59742   0.4
test 1 (256 bit key, 64 byte blocks):   62894   63249   0.6
test 2 (256 bit key, 256 byte blocks):  62074   62476   0.7
test 3 (256 bit key, 1024 byte blocks): 58569   58894   0.6
test 4 (256 bit key, 8192 byte blocks): 35271   35462   0.5
test 5 (512 bit key, 16 byte blocks):   60749   61174   0.7
test 6 (512 bit key, 64 byte blocks):   62760   63148   0.6
test 7 (512 bit key, 256 byte blocks):  61625   61928   0.5
test 8 (512 bit key, 1024 byte blocks): 57162   57452   0.5
test 9 (512 bit key, 8192 byte blocks): 30582   30718   0.4
async xts(aes) decrypt
test 0 (256 bit key, 16 byte blocks):   59423   59587   0.3
test 1 (256 bit key, 64 byte blocks):   62849   62974   0.2
test 2 (256 bit key, 256 byte blocks):  61884   62231   0.6
test 3 (256 bit key, 1024 byte blocks): 58402   58699   0.5
test 4 (256 bit key, 8192 byte blocks): 35244   35353   0.3
test 5 (512 bit key, 16 byte blocks):   60693   60882   0.3
test 6 (512 bit key, 64 byte blocks):   62671   62896   0.4
test 7 (512 bit key, 256 byte blocks):  61488   61722   0.4
test 8 (512 bit key, 1024 byte blocks): 57092   57261   0.3
test 9 (512 bit key, 8192 byte blocks): 30572   30655   0.3
sync ctr(aes) encrypt   
test 0 (128 bit key, 16 byte blocks):   59673   9926    0.4
test 1 (128 bit key, 64 byte blocks):   62738   63024   0.5
test 2 (128 bit key, 256 byte blocks):  61841   62171   0.5
test 3 (128 bit key, 1024 byte blocks): 58374   58633   0.4
test 4 (128 bit key, 8192 byte blocks): 35137   35273   0.4
test 5 (192 bit key, 16 byte blocks):   60736   61049   0.5
test 6 (192 bit key, 64 byte blocks):   62718   63081   0.6
test 7 (192 bit key, 256 byte blocks):  61652   61973   0.5
test 8 (192 bit key, 1024 byte blocks): 57709   58002   0.5
test 9 (192 bit key, 8192 byte blocks): 32171   32306   0.4
test 10 (256 bit key, 16 byte blocks):  63000   63428   0.7
test 11 (256 bit key, 64 byte blocks):  62594   63027   0.7
test 12 (256 bit key, 256 byte blocks): 61452   61845   0.6
test 13 (256 bit key, 1024 byte blocks):57019   57346   0.6
test 14 (256 bit key, 8192 byte blocks):30422   30544   0.4
async ctr(aes) decrypt
test 0 (128 bit key, 16 byte blocks):   59488   59994   0.9
test 1 (128 bit key, 64 byte blocks):   62651   63173   0.8
test 2 (128 bit key, 256 byte blocks):  61778   62464   1.1
test 3 (128 bit key, 1024 byte blocks): 58290   58909   1.1
test 4 (128 bit key, 8192 byte blocks): 35115   35358   0.7
test 5 (192 bit key, 16 byte blocks):   60590   61244   1.1
test 6 (192 bit key, 64 byte blocks):   62604   63296   1.1
test 7 (192 bit key, 256 byte blocks):  61540   62199   1.1
test 8 (192 bit key, 1024 byte blocks): 57577   58149   1.0
test 9 (192 bit key, 8192 byte blocks): 32146   32368   0.7
test 10 (256 bit key, 16 byte blocks):  62852   63610   1.2
test 11 (256 bit key, 64 byte blocks):  62523   63276   1.2
test 12 (256 bit key, 256 byte blocks): 61356   62049   1.1
test 13 (256 bit key, 1024 byte blocks):56943   57546   1.1
test 14 (256 bit key, 8192 byte blocks):30399   30593   0.6

Let me know if any more questions or comments.

Thanks,
John

Will

.



_______________________________________________
iommu mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Reply via email to