Re: [PATCH 4.9 27/33] futex: Remove duplicated code and fix undefined behaviour

2018-05-18 Thread Jiri Slaby
On 05/18/2018, 10:16 AM, Greg Kroah-Hartman wrote:
> 4.9-stable review patch.  If anyone has any objections, please let me know.
> 
> --
> 
> From: Jiri Slaby 
> 
> commit 30d6e0a4190d37740e9447e4e4815f06992dd8c3 upstream.
...
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -1458,6 +1458,45 @@ out:
>   return ret;
>  }
>  
> +static int futex_atomic_op_inuser(unsigned int encoded_op, u32 __user *uaddr)
> +{
> +	unsigned int op =	  (encoded_op & 0x70000000) >> 28;
> +	unsigned int cmp =	  (encoded_op & 0x0f000000) >> 24;
> +	int oparg = sign_extend32((encoded_op & 0x00fff000) >> 12, 12);
> +	int cmparg = sign_extend32(encoded_op &  0x00000fff, 12);

12 is wrong here – wherever you apply this, you also need a follow-up fix:
commit d70ef22892ed6c066e51e118b225923c9b74af34
Author: Jiri Slaby 
Date:   Thu Nov 30 15:35:44 2017 +0100

futex: futex_wake_op, fix sign_extend32 sign bits
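
For reference, the follow-up changes the sign-bit index from 12 to 11:
sign_extend32() takes the 0-based index of the sign bit, and the sign bit
of a 12-bit field is bit 11, so the corrected decode reads:

	int oparg = sign_extend32((encoded_op & 0x00fff000) >> 12, 11);
	int cmparg = sign_extend32(encoded_op & 0x00000fff, 11);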

thanks,
-- 
js
suse labs


[PATCH 4.9 27/33] futex: Remove duplicated code and fix undefined behaviour

2018-05-18 Thread Greg Kroah-Hartman
4.9-stable review patch.  If anyone has any objections, please let me know.

--

From: Jiri Slaby 

commit 30d6e0a4190d37740e9447e4e4815f06992dd8c3 upstream.

There is code duplicated over all architectures' headers for
futex_atomic_op_inuser: namely op decoding, the access_ok check for uaddr,
and comparison of the result.

Remove this duplication and leave to the arches only the needed
assembly, which now lives in arch_futex_atomic_op_inuser.

This effectively distributes Will Deacon's arm64 fix for the undefined
behaviour reported by UBSAN to all architectures. The fix was done in
commit 5f16a046f8e1 (arm64: futex: Fix undefined behaviour with
FUTEX_OP_OPARG_SHIFT usage). Look there for an example dump.

And as suggested by Thomas, check for negative oparg too, because it
was also reported to cause an undefined behaviour report.
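
The resulting oparg handling in kernel/futex.c looks roughly like this
(rejecting shift amounts that would be undefined behaviour instead of
performing them):

	if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) {
		if (oparg < 0 || oparg > 31)
			return -EINVAL;
		oparg = 1 << oparg;
	}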

Note that s390 removed the access_ok check in d12a29703 ("s390/uaccess:
remove pointless access_ok() checks") as access_ok there returns true.
We introduce it back to the helper for the sake of simplicity (it gets
optimized away anyway).

Signed-off-by: Jiri Slaby 
Signed-off-by: Thomas Gleixner 
Acked-by: Russell King 
Acked-by: Michael Ellerman  (powerpc)
Acked-by: Heiko Carstens  [s390]
Acked-by: Chris Metcalf  [for tile]
Reviewed-by: Darren Hart (VMware) 
Reviewed-by: Will Deacon  [core/arm64]
Cc: linux-m...@linux-mips.org
Cc: Rich Felker 
Cc: linux-i...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: pet...@infradead.org
Cc: Benjamin Herrenschmidt 
Cc: Max Filippov 
Cc: Paul Mackerras 
Cc: sparcli...@vger.kernel.org
Cc: Jonas Bonn 
Cc: linux-s...@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: Yoshinori Sato 
Cc: linux-hexa...@vger.kernel.org
Cc: Helge Deller 
Cc: "James E.J. Bottomley" 
Cc: Catalin Marinas 
Cc: Matt Turner 
Cc: linux-snps-arc@lists.infradead.org
Cc: Fenghua Yu 
Cc: Arnd Bergmann 
Cc: linux-xte...@linux-xtensa.org
Cc: Stefan Kristiansson 
Cc: openr...@lists.librecores.org
Cc: Ivan Kokshaysky 
Cc: Stafford Horne 
Cc: linux-arm-ker...@lists.infradead.org
Cc: Richard Henderson 
Cc: Chris Zankel 
Cc: Michal Simek 
Cc: Tony Luck 
Cc: linux-par...@vger.kernel.org
Cc: Vineet Gupta 
Cc: Ralf Baechle 
Cc: Richard Kuo 
Cc: linux-al...@vger.kernel.org
Cc: Martin Schwidefsky 
Cc: linuxppc-...@lists.ozlabs.org
Cc: "David S. Miller" 
Link: http://lkml.kernel.org/r/20170824073105.3901-1-jsl...@suse.cz
Cc: Ben Hutchings 
Signed-off-by: Greg Kroah-Hartman 

---
 arch/alpha/include/asm/futex.h  |   26 +++---
 arch/arc/include/asm/futex.h|   40 +++-
 arch/arm/include/asm/futex.h|   26 ++
 arch/arm64/include/asm/futex.h  |   27 ++-
 arch/frv/include/asm/futex.h|3 +-
 arch/frv/kernel/futex.c |   27 ++-
 arch/hexagon/include/asm/futex.h|   38 ++-
 arch/ia64/include/asm/futex.h   |   25 ++
 arch/microblaze/include/asm/futex.h |   38 ++-
 arch/mips/include/asm/futex.h   |   25 ++
 arch/parisc/include/asm/futex.h |   26 ++
 arch/powerpc/include/asm/futex.h|   26 +++---
 arch/s390/include/asm/futex.h   |   23 +++-
 arch/sh/include/asm/futex.h |   26 ++
 arch/sparc/include/asm/futex_64.h   |   26 +++---
 arch/tile/include/asm/futex.h   |   40 +++-
 arch/x86/include/asm/futex.h|   40 +++-
 arch/xtensa/include/asm/futex.h |   27 +++
 include/asm-generic/futex.h |   50 ++--
 kernel/futex.c  |   39 
 20 files changed, 126 insertions(+), 472 deletions(-)

--- a/arch/alpha/include/asm/futex.h
+++ b/arch/alpha/include/asm/futex.h
@@ -29,18 +29,10 @@
:   "r" (uaddr), "r"(oparg) \
:   "memory")
 
-static inline int futex_atomic_op_inuser (int encoded_op, u32 __user *uaddr)
+static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval,
+   u32 __user *uaddr)
 {
-   int op = (encoded_op >> 28) & 7;
-   int cmp = (encoded_op >> 24) & 15;
-   int oparg = (encoded_op << 8) >> 20;
-   int cmparg = (encoded_op << 20) >> 20;
int oldval = 0, ret;
-   if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28))
-   oparg = 1 << oparg;
-
-   if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
-   return -EFAULT;
 
pagefault_disable();
 
@@ -66,17 +58,9 @@ static inline int futex_atomic_op_inuser
 
pagefault_enable();
 
-   if (!ret) {
-   switch (cmp) {
-   case FUTEX_OP_CMP_EQ: ret = (oldval == cmparg); break;
-   case FUTEX_OP_CMP_NE: ret = (oldval != cmparg); break;
-   case FUTEX_OP_CMP_LT: ret = (oldval < cmparg); break;
-   case

Re: [PATCH 4.9 27/33] futex: Remove duplicated code and fix undefined behaviour

2018-05-18 Thread Greg Kroah-Hartman
On Fri, May 18, 2018 at 10:30:24AM +0200, Jiri Slaby wrote:
> On 05/18/2018, 10:16 AM, Greg Kroah-Hartman wrote:
> > 4.9-stable review patch.  If anyone has any objections, please let me know.
> > 
> > --
> > 
> > From: Jiri Slaby 
> > 
> > commit 30d6e0a4190d37740e9447e4e4815f06992dd8c3 upstream.
> ...
> > --- a/kernel/futex.c
> > +++ b/kernel/futex.c
> > @@ -1458,6 +1458,45 @@ out:
> > return ret;
> >  }
> >  
> > +static int futex_atomic_op_inuser(unsigned int encoded_op, u32 __user *uaddr)
> > +{
> > +	unsigned int op =	  (encoded_op & 0x70000000) >> 28;
> > +	unsigned int cmp =	  (encoded_op & 0x0f000000) >> 24;
> > +	int oparg = sign_extend32((encoded_op & 0x00fff000) >> 12, 12);
> > +	int cmparg = sign_extend32(encoded_op &  0x00000fff, 12);
> 
> 12 is wrong here – wherever you apply this, you also need a follow-up fix:
> commit d70ef22892ed6c066e51e118b225923c9b74af34
> Author: Jiri Slaby 
> Date:   Thu Nov 30 15:35:44 2017 +0100
> 
> futex: futex_wake_op, fix sign_extend32 sign bits

Thanks for letting me know, I've now queued it up to the needed trees.

greg k-h


Re: [PATCH 02/20] dma-mapping: provide a generic dma-noncoherent implementation

2018-05-18 Thread Alexey Brodkin
Hi Christoph,

On Fri, 2018-05-11 at 09:59 +0200, Christoph Hellwig wrote:

[snip]

There seems to be one subtle issue with the map/unmap code.
While investigating problems on ARC I added instrumentation as below:
>8
--- a/arch/arc/mm/dma.c
+++ b/arch/arc/mm/dma.c
@@ -152,14 +152,37 @@ static void _dma_cache_sync(struct device *dev, phys_addr_t paddr, size_t size,
}
 }
 
+static const char *dir_to_str(enum dma_data_direction dir)
+{
+   switch (dir) {
+   case DMA_BIDIRECTIONAL: return "DMA_BIDIRECTIONAL";
+   case DMA_TO_DEVICE: return "DMA_TO_DEVICE";
+   case DMA_FROM_DEVICE: return "DMA_FROM_DEVICE";
+   case DMA_NONE: return "DMA_NONE";
+   default: return "WRONG_VALUE!";
+   }
+}
+
 void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
size_t size, enum dma_data_direction dir)
 {
+   if (dir != DMA_TO_DEVICE){
+   dump_stack();
+		printk(" *** %s@%d: DMA direction is %s instead of %s\n",
+		       __func__, __LINE__, dir_to_str(dir), dir_to_str(DMA_TO_DEVICE));
+   }
+
return _dma_cache_sync(dev, paddr, size, dir);
 }
 
 void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
size_t size, enum dma_data_direction dir)
 {
+   if (dir != DMA_FROM_DEVICE) {
+   dump_stack();
+		printk(" *** %s@%d: DMA direction is %s instead of %s\n",
+		       __func__, __LINE__, dir_to_str(dir), dir_to_str(DMA_FROM_DEVICE));
+   }
+
return _dma_cache_sync(dev, paddr, size, dir);
 }
>8

And with that I noticed some unexpected output, see below:
>8
Stack Trace:
  arc_unwind_core.constprop.1+0xd4/0xf8
  dump_stack+0x68/0x80
  arch_sync_dma_for_device+0x34/0xc4
  dma_noncoherent_map_sg+0x80/0x94
  __dw_mci_start_request+0x1ee/0x868
  dw_mci_request+0x17e/0x1c8
  mmc_wait_for_req+0x106/0x1ac
  mmc_app_sd_status+0x108/0x130
  mmc_sd_setup_card+0xc6/0x2e8
  mmc_attach_sd+0x1b6/0x394
  mmc_rescan+0x2f4/0x3bc
  process_one_work+0x194/0x348
  worker_thread+0xf2/0x478
  kthread+0x120/0x13c
  ret_from_fork+0x18/0x1c
 *** arch_sync_dma_for_device@172: DMA direction is DMA_FROM_DEVICE instead of DMA_TO_DEVICE
...
Stack Trace:
  arc_unwind_core.constprop.1+0xd4/0xf8
  dump_stack+0x68/0x80
  arch_sync_dma_for_device+0x34/0xc4
  dma_noncoherent_map_page+0x86/0x8c
  usb_hcd_map_urb_for_dma+0x49e/0x53c
  usb_hcd_submit_urb+0x43c/0x8c4
  usb_control_msg+0xbe/0x16c
  hub_port_init+0x5e0/0xb0c
  hub_event+0x4e6/0x1164
  process_one_work+0x194/0x348
  worker_thread+0xf2/0x478
  kthread+0x120/0x13c
  ret_from_fork+0x18/0x1c
 mmcblk0: p1 p2
 *** arch_sync_dma_for_device@172: DMA direction is DMA_FROM_DEVICE instead of DMA_TO_DEVICE

...
and quite a few more similar ones
...
>8

In case of MMC/DW_MCI (AKA the DesignWare Mobile Storage controller) the
execution flow is:
1) __dw_mci_start_request()
2) dw_mci_pre_dma_transfer()
3) dma_map_sg(..., mmc_get_dma_dir(data))

Note mmc_get_dma_dir() is just "data->flags & MMC_DATA_WRITE ?
DMA_TO_DEVICE : DMA_FROM_DEVICE". I.e. if we're preparing to send data,
dma_noncoherent_map_sg() will get DMA_TO_DEVICE, which is quite OK to
pass to dma_noncoherent_sync_sg_for_device(), but in case of reading
we'll have DMA_FROM_DEVICE, which dma_noncoherent_map_sg() will still
pass to dma_noncoherent_sync_sg_for_device().

I'd say this is not entirely correct, because IMHO
arch_sync_dma_for_cpu() is supposed to be used only in case of
DMA_FROM_DEVICE, and arch_sync_dma_for_device() only in case of
DMA_TO_DEVICE.


> +static dma_addr_t dma_noncoherent_map_page(struct device *dev, struct page *page,
> + unsigned long offset, size_t size, enum dma_data_direction dir,
> + unsigned long attrs)
> +{
> + dma_addr_t addr;
> +
> + addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
> + if (!dma_mapping_error(dev, addr) && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
> + arch_sync_dma_for_device(dev, page_to_phys(page), size, dir);
> + return addr;
> +}
> +
> +static int dma_noncoherent_map_sg(struct device *dev, struct scatterlist *sgl,
> + int nents, enum dma_data_direction dir, unsigned long attrs)
> +{
> + nents = dma_direct_map_sg(dev, sgl, nents, dir, attrs);
> + if (nents > 0 && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
> + dma_noncoherent_sync_sg_for_device(dev, sgl, nents, dir);
> + return nents;
> +}

The same goes for the unmap functions.
My guess is we need to respect the direction in the map/unmap functions
and use either dma_noncoherent_sync_single_for_cpu(..., DMA_FROM_DEVICE)
or dma_noncoherent_sync_single_for_device(..., DMA_TO_DEVICE).
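
A minimal sketch of that suggestion (a hypothetical helper, not the code
that was merged):

	/* Pick the ownership-transfer hook from the transfer direction. */
	static void sync_for_device_or_cpu(struct device *dev, phys_addr_t paddr,
			size_t size, enum dma_data_direction dir)
	{
		if (dir == DMA_FROM_DEVICE)
			arch_sync_dma_for_cpu(dev, paddr, size, DMA_FROM_DEVICE);
		else
			arch_sync_dma_for_device(dev, paddr, size, DMA_TO_DEVICE);
	}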

Re: [PATCH 02/20] dma-mapping: provide a generic dma-noncoherent implementation

2018-05-18 Thread h...@lst.de
On Fri, May 18, 2018 at 01:03:46PM +, Alexey Brodkin wrote:
> Note mmc_get_dma_dir() is just "data->flags & MMC_DATA_WRITE ?
> DMA_TO_DEVICE : DMA_FROM_DEVICE". I.e. if we're preparing to send data,
> dma_noncoherent_map_sg() will get DMA_TO_DEVICE, which is quite OK to
> pass to dma_noncoherent_sync_sg_for_device(), but in case of reading
> we'll have DMA_FROM_DEVICE, which dma_noncoherent_map_sg() will still
> pass to dma_noncoherent_sync_sg_for_device().
> 
> I'd say this is not entirely correct, because IMHO
> arch_sync_dma_for_cpu() is supposed to be used only in case of
> DMA_FROM_DEVICE, and arch_sync_dma_for_device() only in case of
> DMA_TO_DEVICE.

arc overrides the dir parameter of the dma_sync_single_for_device/
dma_sync_single_for_cpu calls.  My patches dropped that, and I have
restored it; an audit of the other architectures is pending.

That being said the existing arc code still looks rather odd as it
didn't do the same thing for the scatterlist versions of the calls.
I've thrown in a few patches into my new tree to make the sg versions
make the normal calls, and to clean up the area a bit.

> You seem to have lost an offset in the page, so if we happen to have a
> buffer not aligned to a page boundary then we were obviously corrupting
> data outside our data :)

Oops!  Thank you for all the debugging!



arc dma mapping cleanups/fixes and generic noncoherent dma code

2018-05-18 Thread Christoph Hellwig
Hi all,

this series continues consolidating the dma-mapping code, with a focus
on architectures that do not (always) provide cache coherence for DMA.
Three architectures (arm, mips and powerpc) are still left to be
converted later due to complexity of their dma ops selection.

The dma-noncoherent ops call the dma-direct ops for the actual
translation of streaming mappings and allow the architecture to provide
any cache flushing required for cpu to device and/or device to cpu
ownership transfers.  The dma coherent allocator is for now still left
entirely to architecture-supplied implementations due to the amount of
variations.  Hopefully we can do some consolidation for them later on
as well.

Because this series sits on top of two previously submitted series
a git tree might be useful to actually test it.  It is provided here:

git://git.infradead.org/users/hch/misc.git arc-dma

Gitweb:

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/arc-dma

Changes since previous bigger series:
 - take the offset into account in noncoherent_dma_map_page (Alexey Brodkin)
 - fix dma directions in arc (Alexey Brodkin)
 - split arc changes into smaller patches



[PATCH 3/6] arc: simplify arc_dma_sync_single_for_{cpu,device}

2018-05-18 Thread Christoph Hellwig
Remove the indirection through _dma_cache_sync.  Also move the functions
up a bit in the source file as we'll need them in more places soon.

Signed-off-by: Christoph Hellwig 
---
 arch/arc/mm/dma.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c
index 1dcc404b5aec..98e21ce526be 100644
--- a/arch/arc/mm/dma.c
+++ b/arch/arc/mm/dma.c
@@ -153,6 +153,18 @@ static void _dma_cache_sync(phys_addr_t paddr, size_t size,
}
 }
 
+static void arc_dma_sync_single_for_device(struct device *dev,
+   dma_addr_t dma_handle, size_t size, enum dma_data_direction dir)
+{
+   dma_cache_wback(dma_handle, size);
+}
+
+static void arc_dma_sync_single_for_cpu(struct device *dev,
+   dma_addr_t dma_handle, size_t size, enum dma_data_direction dir)
+{
+   dma_cache_inv(dma_handle, size);
+}
+
 /*
  * arc_dma_map_page - map a portion of a page for streaming DMA
  *
@@ -221,18 +233,6 @@ static void arc_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
   attrs);
 }
 
-static void arc_dma_sync_single_for_cpu(struct device *dev,
-   dma_addr_t dma_handle, size_t size, enum dma_data_direction dir)
-{
-   _dma_cache_sync(dma_handle, size, DMA_FROM_DEVICE);
-}
-
-static void arc_dma_sync_single_for_device(struct device *dev,
-   dma_addr_t dma_handle, size_t size, enum dma_data_direction dir)
-{
-   _dma_cache_sync(dma_handle, size, DMA_TO_DEVICE);
-}
-
 static void arc_dma_sync_sg_for_cpu(struct device *dev,
struct scatterlist *sglist, int nelems,
enum dma_data_direction dir)
-- 
2.17.0




[PATCH 5/6] arc: fix arc_dma_{map,unmap}_page

2018-05-18 Thread Christoph Hellwig
These functions should perform the same cache synchronization as calling
arc_dma_sync_single_for_{cpu,device} in addition to doing any required
address translation or mapping [1].  Ensure they actually do that by calling
arc_dma_sync_single_for_{cpu,device} instead of passing the dir argument
along to _dma_cache_sync.

The now unused _dma_cache_sync function is removed as well.

[1] in fact various drivers rely on that by passing DMA_ATTR_SKIP_CPU_SYNC
to the map/unmap routines and doing the cache synchronization manually.
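
For illustration, such a driver does something like the following
(made-up buffer and length names; not taken from a real driver):

	dma_addr_t addr = dma_map_single_attrs(dev, buf, len, DMA_TO_DEVICE,
					       DMA_ATTR_SKIP_CPU_SYNC);
	/* ... fill buf ... */
	dma_sync_single_for_device(dev, addr, len, DMA_TO_DEVICE);
	/* ... kick off the DMA ... */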

Signed-off-by: Christoph Hellwig 
---
 arch/arc/mm/dma.c | 27 ++-
 1 file changed, 2 insertions(+), 25 deletions(-)

diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c
index eafdbd2ad20a..08d91c13ac52 100644
--- a/arch/arc/mm/dma.c
+++ b/arch/arc/mm/dma.c
@@ -130,29 +130,6 @@ static int arc_dma_mmap(struct device *dev, struct vm_area_struct *vma,
return ret;
 }
 
-/*
- * streaming DMA Mapping API...
- * CPU accesses page via normal paddr, thus needs to explicitly made
- * consistent before each use
- */
-static void _dma_cache_sync(phys_addr_t paddr, size_t size,
-   enum dma_data_direction dir)
-{
-   switch (dir) {
-   case DMA_FROM_DEVICE:
-   dma_cache_inv(paddr, size);
-   break;
-   case DMA_TO_DEVICE:
-   dma_cache_wback(paddr, size);
-   break;
-   case DMA_BIDIRECTIONAL:
-   dma_cache_wback_inv(paddr, size);
-   break;
-   default:
-   pr_err("Invalid DMA dir [%d] for OP @ %pa[p]\n", dir, &paddr);
-   }
-}
-
 static void arc_dma_sync_single_for_device(struct device *dev,
dma_addr_t dma_handle, size_t size, enum dma_data_direction dir)
 {
@@ -185,7 +162,7 @@ static dma_addr_t arc_dma_map_page(struct device *dev, struct page *page,
phys_addr_t paddr = page_to_phys(page) + offset;
 
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
-   _dma_cache_sync(paddr, size, dir);
+   arc_dma_sync_single_for_device(dev, paddr, size, dir);
 
return paddr;
 }
@@ -205,7 +182,7 @@ static void arc_dma_unmap_page(struct device *dev, dma_addr_t handle,
phys_addr_t paddr = handle;
 
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
-   _dma_cache_sync(paddr, size, dir);
+   arc_dma_sync_single_for_cpu(dev, paddr, size, dir);
 }
 
 static int arc_dma_map_sg(struct device *dev, struct scatterlist *sg,
-- 
2.17.0




[PATCH 2/6] dma-mapping: provide a generic dma-noncoherent implementation

2018-05-18 Thread Christoph Hellwig
Add a new dma_map_ops implementation that uses dma-direct for the
address mapping of streaming mappings, and which requires arch-specific
implementations of coherent allocate/free.

Architectures have to provide flushing helpers for ownership transfers
to the device and/or CPU, and can provide optional implementations of
the coherent mmap functionality, and the cache_flush routines for
non-coherent long term allocations.

Signed-off-by: Christoph Hellwig 
---
 MAINTAINERS   |   2 +
 include/asm-generic/dma-mapping.h |   9 +++
 include/linux/dma-direct.h|   7 +-
 include/linux/dma-mapping.h   |   1 +
 include/linux/dma-noncoherent.h   |  47 ++
 lib/Kconfig   |  20 ++
 lib/Makefile  |   1 +
 lib/dma-direct.c  |   8 +--
 lib/dma-noncoherent.c | 102 ++
 9 files changed, 192 insertions(+), 5 deletions(-)
 create mode 100644 include/linux/dma-noncoherent.h
 create mode 100644 lib/dma-noncoherent.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 79bb02ff812f..08d0d15d4958 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4334,12 +4334,14 @@ W:	http://git.infradead.org/users/hch/dma-mapping.git
 S: Supported
 F: lib/dma-debug.c
 F: lib/dma-direct.c
+F: lib/dma-noncoherent.c
 F: lib/dma-virt.c
 F: drivers/base/dma-mapping.c
 F: drivers/base/dma-coherent.c
 F: include/asm-generic/dma-mapping.h
 F: include/linux/dma-direct.h
 F: include/linux/dma-mapping.h
+F: include/linux/dma-noncoherent.h
 
 DME1737 HARDWARE MONITOR DRIVER
 M: Juerg Haefliger 
diff --git a/include/asm-generic/dma-mapping.h b/include/asm-generic/dma-mapping.h
index 880a292d792f..ad2868263867 100644
--- a/include/asm-generic/dma-mapping.h
+++ b/include/asm-generic/dma-mapping.h
@@ -4,7 +4,16 @@
 
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
 {
+   /*
+* Use the non-coherent ops if available.  If an architecture wants a
+* more fine-grained selection of operations it will have to implement
+* get_arch_dma_ops itself or use the per-device dma_ops.
+*/
+#ifdef CONFIG_DMA_NONCOHERENT_OPS
+   return &dma_noncoherent_ops;
+#else
return &dma_direct_ops;
+#endif
 }
 
 #endif /* _ASM_GENERIC_DMA_MAPPING_H */
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 53ad6a47f513..8d9f33febde5 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -59,6 +59,11 @@ void *dma_direct_alloc(struct device *dev, size_t size, 
dma_addr_t *dma_handle,
gfp_t gfp, unsigned long attrs);
 void dma_direct_free(struct device *dev, size_t size, void *cpu_addr,
dma_addr_t dma_addr, unsigned long attrs);
+dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
+   unsigned long offset, size_t size, enum dma_data_direction dir,
+   unsigned long attrs);
+int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
+   enum dma_data_direction dir, unsigned long attrs);
 int dma_direct_supported(struct device *dev, u64 mask);
-
+int dma_direct_mapping_error(struct device *dev, dma_addr_t dma_addr);
 #endif /* _LINUX_DMA_DIRECT_H */
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 25a9a2b04f78..4be070df5fc5 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -136,6 +136,7 @@ struct dma_map_ops {
 };
 
 extern const struct dma_map_ops dma_direct_ops;
+extern const struct dma_map_ops dma_noncoherent_ops;
 extern const struct dma_map_ops dma_virt_ops;
 
 #define DMA_BIT_MASK(n)(((n) == 64) ? ~0ULL : ((1ULL<<(n))-1))
diff --git a/include/linux/dma-noncoherent.h b/include/linux/dma-noncoherent.h
new file mode 100644
index ..10b2654d549b
--- /dev/null
+++ b/include/linux/dma-noncoherent.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_DMA_NONCOHERENT_H
+#define _LINUX_DMA_NONCOHERENT_H 1
+
+#include 
+
+void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
+   gfp_t gfp, unsigned long attrs);
+void arch_dma_free(struct device *dev, size_t size, void *cpu_addr,
+   dma_addr_t dma_addr, unsigned long attrs);
+
+#ifdef CONFIG_DMA_NONCOHERENT_MMAP
+int arch_dma_mmap(struct device *dev, struct vm_area_struct *vma,
+   void *cpu_addr, dma_addr_t dma_addr, size_t size,
+   unsigned long attrs);
+#else
+#define arch_dma_mmap NULL
+#endif /* CONFIG_DMA_NONCOHERENT_MMAP */
+
+#ifdef CONFIG_DMA_NONCOHERENT_CACHE_SYNC
+void arch_dma_cache_sync(struct device *dev, void *vaddr, size_t size,
+   enum dma_data_direction direction);
+#else
+#define arch_dma_cache_sync NULL
+#endif /* CONFIG_DMA_NONCOHERENT_CACHE_SYNC */
+
+#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE
+void arch_sync_dma_for_device(struct device *dev, phys

[PATCH 4/6] arc: fix arc_dma_sync_sg_for_{cpu,device}

2018-05-18 Thread Christoph Hellwig
These functions should provide the same functionality as calling
arc_dma_sync_single_for_{cpu,device} on each S/G list element.  Ensure
they actually do that by calling arc_dma_sync_single_for_{cpu,device}.
Otherwise we could be passing a different dir argument.

Signed-off-by: Christoph Hellwig 
---
 arch/arc/mm/dma.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c
index 98e21ce526be..eafdbd2ad20a 100644
--- a/arch/arc/mm/dma.c
+++ b/arch/arc/mm/dma.c
@@ -241,7 +241,7 @@ static void arc_dma_sync_sg_for_cpu(struct device *dev,
struct scatterlist *sg;
 
for_each_sg(sglist, sg, nelems, i)
-   _dma_cache_sync(sg_phys(sg), sg->length, dir);
+   arc_dma_sync_single_for_cpu(dev, sg_phys(sg), sg->length, dir);
 }
 
 static void arc_dma_sync_sg_for_device(struct device *dev,
@@ -252,7 +252,8 @@ static void arc_dma_sync_sg_for_device(struct device *dev,
struct scatterlist *sg;
 
for_each_sg(sglist, sg, nelems, i)
-   _dma_cache_sync(sg_phys(sg), sg->length, dir);
+   arc_dma_sync_single_for_device(dev, sg_phys(sg), sg->length,
+   dir);
 }
 
 static int arc_dma_supported(struct device *dev, u64 dma_mask)
-- 
2.17.0




[PATCH 6/6] arc: use generic dma_noncoherent_ops

2018-05-18 Thread Christoph Hellwig
Switch to the generic noncoherent direct mapping implementation.

Signed-off-by: Christoph Hellwig 
---
 arch/arc/Kconfig   |   4 +
 arch/arc/include/asm/Kbuild|   1 +
 arch/arc/include/asm/dma-mapping.h |  21 -
 arch/arc/mm/dma.c  | 140 +++--
 4 files changed, 18 insertions(+), 148 deletions(-)
 delete mode 100644 arch/arc/include/asm/dma-mapping.h

diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index 7498aca4b887..89d47eac18b2 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -9,11 +9,15 @@
 config ARC
def_bool y
select ARC_TIMERS
+   select ARCH_HAS_SYNC_DMA_FOR_CPU
+   select ARCH_HAS_SYNC_DMA_FOR_DEVICE
select ARCH_HAS_SG_CHAIN
select ARCH_SUPPORTS_ATOMIC_RMW if ARC_HAS_LLSC
select BUILDTIME_EXTABLE_SORT
select CLONE_BACKWARDS
select COMMON_CLK
+   select DMA_NONCOHERENT_OPS
+   select DMA_NONCOHERENT_MMAP
select GENERIC_ATOMIC64 if !ISA_ARCV2 || !(ARC_HAS_LL64 && ARC_HAS_LLSC)
select GENERIC_CLOCKEVENTS
select GENERIC_FIND_FIRST_BIT
diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild
index 4bd5d4369e05..bbdcb955e18f 100644
--- a/arch/arc/include/asm/Kbuild
+++ b/arch/arc/include/asm/Kbuild
@@ -2,6 +2,7 @@
 generic-y += bugs.h
 generic-y += device.h
 generic-y += div64.h
+generic-y += dma-mapping.h
 generic-y += emergency-restart.h
 generic-y += extable.h
 generic-y += fb.h
diff --git a/arch/arc/include/asm/dma-mapping.h b/arch/arc/include/asm/dma-mapping.h
deleted file mode 100644
index 7a16824bfe98..
--- a/arch/arc/include/asm/dma-mapping.h
+++ /dev/null
@@ -1,21 +0,0 @@
-/*
- * DMA Mapping glue for ARC
- *
- * Copyright (C) 2004, 2007-2010, 2011-2012 Synopsys, Inc. (www.synopsys.com)
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- */
-
-#ifndef ASM_ARC_DMA_MAPPING_H
-#define ASM_ARC_DMA_MAPPING_H
-
-extern const struct dma_map_ops arc_dma_ops;
-
-static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
-{
-   return &arc_dma_ops;
-}
-
-#endif
diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c
index 08d91c13ac52..8c1071840979 100644
--- a/arch/arc/mm/dma.c
+++ b/arch/arc/mm/dma.c
@@ -16,13 +16,12 @@
  * The default DMA address == Phy address which is 0x8000_ based.
  */
 
-#include 
+#include 
 #include 
 #include 
 
-
-static void *arc_dma_alloc(struct device *dev, size_t size,
-   dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
+void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
+   gfp_t gfp, unsigned long attrs)
 {
unsigned long order = get_order(size);
struct page *page;
@@ -89,7 +88,7 @@ static void *arc_dma_alloc(struct device *dev, size_t size,
return kvaddr;
 }
 
-static void arc_dma_free(struct device *dev, size_t size, void *vaddr,
+void arch_dma_free(struct device *dev, size_t size, void *vaddr,
dma_addr_t dma_handle, unsigned long attrs)
 {
phys_addr_t paddr = dma_handle;
@@ -105,9 +104,9 @@ static void arc_dma_free(struct device *dev, size_t size, void *vaddr,
__free_pages(page, get_order(size));
 }
 
-static int arc_dma_mmap(struct device *dev, struct vm_area_struct *vma,
-   void *cpu_addr, dma_addr_t dma_addr, size_t size,
-   unsigned long attrs)
+int arch_dma_mmap(struct device *dev, struct vm_area_struct *vma,
+   void *cpu_addr, dma_addr_t dma_addr, size_t size,
+   unsigned long attrs)
 {
unsigned long user_count = vma_pages(vma);
unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
@@ -130,127 +129,14 @@ static int arc_dma_mmap(struct device *dev, struct vm_area_struct *vma,
return ret;
 }
 
-static void arc_dma_sync_single_for_device(struct device *dev,
-   dma_addr_t dma_handle, size_t size, enum dma_data_direction dir)
-{
-   dma_cache_wback(dma_handle, size);
-}
-
-static void arc_dma_sync_single_for_cpu(struct device *dev,
-   dma_addr_t dma_handle, size_t size, enum dma_data_direction dir)
-{
-   dma_cache_inv(dma_handle, size);
-}
-
-/*
- * arc_dma_map_page - map a portion of a page for streaming DMA
- *
- * Ensure that any data held in the cache is appropriately discarded
- * or written back.
- *
- * The device owns this memory once this call has completed.  The CPU
- * can regain ownership by calling dma_unmap_page().
- *
- * Note: while it takes struct page as arg, caller can "abuse" it to pass
- * a region larger than PAGE_SIZE, provided it is physically contiguous
- * and this still works correctly
- */
-static dma_addr_t arc_dma_map_page(struct device *dev, struct page *page,
-   unsigned long offset, size_t size, enum dma_data_direction dir,
-

[PATCH 1/6] dma-mapping: simplify Kconfig dependencies

2018-05-18 Thread Christoph Hellwig
ARCH_DMA_ADDR_T_64BIT is always true for 64-bit architectures now, so we
can skip the clause requiring it.  'n' is the default default, so no need
to explicitly state it.

Signed-off-by: Christoph Hellwig 
---
 lib/Kconfig | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/lib/Kconfig b/lib/Kconfig
index 1d84e61cccfe..6c4e9d0ce5d1 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -443,13 +443,11 @@ config IOMMU_HELPER
 
 config DMA_DIRECT_OPS
bool
-   depends on HAS_DMA && (!64BIT || ARCH_DMA_ADDR_T_64BIT)
-   default n
+   depends on HAS_DMA
 
 config DMA_VIRT_OPS
bool
-   depends on HAS_DMA && (!64BIT || ARCH_DMA_ADDR_T_64BIT)
-   default n
+   depends on HAS_DMA
 
 config SWIOTLB
bool
-- 
2.17.0




Re: [PATCH 02/20] dma-mapping: provide a generic dma-noncoherent implementation

2018-05-18 Thread Alexey Brodkin
Hi Christoph,

On Fri, 2018-05-18 at 15:27 +0200, h...@lst.de wrote:
> On Fri, May 18, 2018 at 01:03:46PM +, Alexey Brodkin wrote:
> > Note mmc_get_dma_dir() is just "data->flags & MMC_DATA_WRITE ?
> > DMA_TO_DEVICE : DMA_FROM_DEVICE". I.e. if we're preparing to send data,
> > dma_noncoherent_map_sg() will get DMA_TO_DEVICE, which is quite OK to
> > pass to dma_noncoherent_sync_sg_for_device(), but in case of reading
> > we'll have DMA_FROM_DEVICE, which dma_noncoherent_map_sg() will still
> > pass to dma_noncoherent_sync_sg_for_device().
> > 
> > I'd say this is not entirely correct, because IMHO
> > arch_sync_dma_for_cpu() is supposed to be used only in case of
> > DMA_FROM_DEVICE, and arch_sync_dma_for_device() only in case of
> > DMA_TO_DEVICE.
> 
> arc overrides the dir paramter of the dma_sync_single_for_device/
> dma_sync_single_for_cpu calls.  My patches dropped that, and I have
> restored that, and audit for the other architectures is pending.

Well, at least for me it's confusing why we pass a direction to a
function whose purpose is already known.

I'd say that XXX_sync_for_device() doesn't need a _variable_ direction as
an argument - otherwise what does it even mean to pass DMA_FROM_DEVICE to
that function?

> That being said the existing arc code still looks rather odd as it
> didn't do the same thing for the scatterlist versions of the calls.

That might easily be the case - good that we caught it now; it will be fixed :)

> I've thrown in a few patches into my new tree to make the sg versions
> make the normal calls, and to clean up the area a bit.

I'll try your newer series now, thanks!

-Alexey


Re: arc dma mapping cleanups/fixes and generic noncoherent dma code

2018-05-18 Thread Alexey Brodkin
Hi Christoph,

On Fri, 2018-05-18 at 15:45 +0200, Christoph Hellwig wrote:
> Hi all,
> 
> this series continues consolidating the dma-mapping code, with a focus
> on architectures that do not (always) provide cache coherence for DMA.
> Three architectures (arm, mips and powerpc) are still left to be
> converted later due to complexity of their dma ops selection.
> 
> The dma-noncoherent ops call the dma-direct ops for the actual
> translation of streaming mappings and allow the architecture to provide
> any cache flushing required for cpu to device and/or device to cpu
> ownership transfers.  The dma coherent allocator is for now still left
> entirely to architecture-supplied implementations due to the amount of
> variations.  Hopefully we can do some consolidation for them later on
> as well.
> 
> Because this series sits on top of two previously submitted series
> a git tree might be useful to actually test it.  It is provided here:
> 
> git://git.infradead.org/users/hch/misc.git arc-dma
> 
> Gitweb:
> 
> 
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/arc-dma
> 
> Changes since previous bigger series:
>  - take the offset into account in noncoherent_dma_map_page (Alexey Brodkin)
>  - fix dma directions in arc (Alexey Brodkin)
>  - split arc changes into smaller patches

I really like this series, especially how it trimmed down arch/arc/mm/dma.c:
>8
git diff --stat HEAD~4
 arch/arc/Kconfig   |   4 +++
 arch/arc/include/asm/Kbuild|   1 +
 arch/arc/include/asm/dma-mapping.h |  21 ---
 arch/arc/mm/cache.c|   2 +-
 arch/arc/mm/dma.c                  | 162 ++-
>8

So if we set aside my complaints about direction in arch_sync_dma_for_{device|cpu}()...

Tested-by: Alexey Brodkin 

Let's still wait for Vineet's ack as he's the chief maintainer for ARC :)

-Alexey 


Re: arc dma mapping cleanups/fixes and generic noncoherent dma code

2018-05-18 Thread h...@lst.de
On Fri, May 18, 2018 at 02:49:36PM +, Alexey Brodkin wrote:
> So if we set aside my complaints about direction in arch_sync_dma_for_{device|cpu}()...

Many other architectures use the argument.  Various of those look bogus,
but for now I want to be able to do straightforward conversions.  If
we end up not needing the argument in the end we can still remove it
eventually.



dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] dma-mapping: provide a generic dma-noncoherent implementation)

2018-05-18 Thread Vineet Gupta

On 05/18/2018 06:11 AM, Alexey Brodkin wrote:

  void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
 size_t size, enum dma_data_direction dir)
  {
+   if (dir != DMA_TO_DEVICE){
+   dump_stack();
+		printk(" *** %s@%d: DMA direction is %s instead of %s\n",
+		       __func__, __LINE__, dir_to_str(dir), dir_to_str(DMA_TO_DEVICE));
+   }
+
 return _dma_cache_sync(dev, paddr, size, dir);
  }
  
  void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
		size_t size, enum dma_data_direction dir)
  {
+   if (dir != DMA_FROM_DEVICE) {
+   dump_stack();
+		printk(" *** %s@%d: DMA direction is %s instead of %s\n",
+		       __func__, __LINE__, dir_to_str(dir), dir_to_str(DMA_FROM_DEVICE));
+   }
+
 return _dma_cache_sync(dev, paddr, size, dir);
  }


...

In case of MMC/DW_MCI (AKA the DesignWare Mobile Storage controller) the
execution flow is:
1) __dw_mci_start_request()
2) dw_mci_pre_dma_transfer()
3) dma_map_sg(..., mmc_get_dma_dir(data))

Note mmc_get_dma_dir() is just "data->flags & MMC_DATA_WRITE ?
DMA_TO_DEVICE : DMA_FROM_DEVICE". I.e. if we're preparing to send data,
dma_noncoherent_map_sg() will get DMA_TO_DEVICE, which is quite OK to
pass to dma_noncoherent_sync_sg_for_device(), but in case of reading
we'll have DMA_FROM_DEVICE, which dma_noncoherent_map_sg() will still
pass to dma_noncoherent_sync_sg_for_device().

I'd say this is not entirely correct, because IMHO
arch_sync_dma_for_cpu() is supposed to be used only in case of
DMA_FROM_DEVICE, and arch_sync_dma_for_device() only in case of
DMA_TO_DEVICE.


So roughly 10 years ago, some kernel rookie named Vineet Gupta asked the
exact same question :-)


http://kernelnewbies.kernelnewbies.narkive.com/aGW1QcDv/query-about-dma-sync-for-cpu-and-direction-to-device

I never understood the need for this direction. And if memory serves me
right, at that time I was seeing twice the amount of cache flushing!



But the real fix of my problem is:
>8
--- a/lib/dma-noncoherent.c
+++ b/lib/dma-noncoherent.c
@@ -35,7 +35,7 @@ static dma_addr_t dma_noncoherent_map_page(struct device *dev, struct page *page
 
 	addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
 	if (!dma_mapping_error(dev, addr) && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
-		arch_sync_dma_for_device(dev, page_to_phys(page), size, dir);
+		arch_sync_dma_for_device(dev, page_to_phys(page) + offset, size, dir);
 	return addr;
 }
>8

You seem to have lost an offset in the page, so if we happen to have a
buffer not aligned to a page boundary then we were obviously corrupting
data outside our data :)


Neat!




Re: [PATCH 02/20] dma-mapping: provide a generic dma-noncoherent implementation

2018-05-18 Thread Vineet Gupta

On 05/18/2018 06:23 AM, h...@lst.de wrote:

On Fri, May 18, 2018 at 01:03:46PM +, Alexey Brodkin wrote:

Note mmc_get_dma_dir() is just "data->flags & MMC_DATA_WRITE ?
DMA_TO_DEVICE : DMA_FROM_DEVICE". I.e. if we're preparing to send data,
dma_noncoherent_map_sg() will get DMA_TO_DEVICE, which is quite OK to
pass to dma_noncoherent_sync_sg_for_device(), but in case of reading
we'll have DMA_FROM_DEVICE, which dma_noncoherent_map_sg() will still
pass to dma_noncoherent_sync_sg_for_device().

I'd say this is not entirely correct, because IMHO
arch_sync_dma_for_cpu() is supposed to be used only in case of
DMA_FROM_DEVICE, and arch_sync_dma_for_device() only in case of
DMA_TO_DEVICE.

arc overrides the dir parameter of the dma_sync_single_for_device/
dma_sync_single_for_cpu calls.  My patches dropped that, and I have
restored it; an audit of the other architectures is pending.


Right, for now let's retain that and do a sweeping audit of @direction -
to me it seems extraneous (as it did 10 years ago), but I'm not an expert
in this area, so perhaps it is needed for some devices / arches, and it
would be good to finally understand that.



That being said the existing arc code still looks rather odd as it
didn't do the same thing for the scatterlist versions of the calls.
I've thrown in a few patches into my new tree to make the sg versions
make the normal calls, and to clean up the area a bit.


Not calling names or anything here, but it doesn't exist for the sg
variants because I didn't write that code :-)

It was introduced by your commit:

2016-01-20 052c96dbe33b arc: convert to dma_map_ops




Re: dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] dma-mapping: provide a generic dma-noncoherent implementation)

2018-05-18 Thread Russell King - ARM Linux
On Fri, May 18, 2018 at 10:20:02AM -0700, Vineet Gupta wrote:
> I never understood the need for this direction. And if memory serves me
> right, at that time I was seeing twice the amount of cache flushing!

It's necessary.  Take a moment to think carefully about this:

dma_map_single(, dir)

dma_sync_single_for_cpu(, dir)

dma_sync_single_for_device(, dir)

dma_unmap_single(, dir)

In the case of a DMA-incoherent architecture, the operations done at each
stage depend on the direction argument:

        map         for_cpu      for_device   unmap
TO_DEV  writeback   none         writeback    none
TO_CPU  invalidate  invalidate*  invalidate   invalidate*
BIDIR   writeback   invalidate   writeback    invalidate

* - only necessary if the CPU speculatively prefetches.

The multiple invalidations for the TO_CPU case handles different
conditions that can result in data corruption, and for some CPUs, all
four are necessary.

This is what is implemented for 32-bit ARM, depending on the CPU
capabilities, as we have DMA incoherent devices and we have CPUs that
speculatively prefetch data, and so may load data into the caches while
DMA is in operation.
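
The "map" column, transcribed as code (hypothetical helper and cache
primitive names; this is not the actual ARM implementation):

	static void dma_map_maintenance(phys_addr_t paddr, size_t size,
					enum dma_data_direction dir)
	{
		switch (dir) {
		case DMA_TO_DEVICE:	/* push dirty lines out to RAM */
			cache_wback(paddr, size);
			break;
		case DMA_FROM_DEVICE:	/* drop lines the DMA will overwrite */
			cache_inv(paddr, size);
			break;
		case DMA_BIDIRECTIONAL:	/* device may both read and write */
			cache_wback(paddr, size);
			break;
		default:
			break;
		}
	}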


Things get more interesting if the implementation behind the DMA API has
to copy data between the buffer supplied to the mapping and some DMA
accessible buffer:

        map          for_cpu      for_device   unmap
TO_DEV  copy to dma  none         copy to dma  none
TO_CPU  none         copy to cpu  none         copy to cpu
BIDIR   copy to dma  copy to cpu  copy to dma  copy to cpu

So, in both cases, the value of the direction argument defines what you
need to do in each call.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up



Re: dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] dma-mapping: provide a generic dma-noncoherent implementation)

2018-05-18 Thread Alexey Brodkin
Hi Russel,

On Fri, 2018-05-18 at 18:50 +0100, Russell King - ARM Linux wrote:
> It's necessary.  Take a moment to think carefully about this:
> 
> dma_map_single(, dir)
> 
> dma_sync_single_for_cpu(, dir)
> 
> dma_sync_single_for_device(, dir)
> 
> dma_unmap_single(, dir)
> 
> In the case of a DMA-incoherent architecture, the operations done at each
> stage depend on the direction argument:
> 
>         map         for_cpu      for_device   unmap
> TO_DEV  writeback   none         writeback    none
> TO_CPU  invalidate  invalidate*  invalidate   invalidate*
> BIDIR   writeback   invalidate   writeback    invalidate
> 
> * - only necessary if the CPU speculatively prefetches.

I think invalidation of the DMA buffer is required on for_cpu(TO_CPU) even
if the CPU doesn't prefetch - what if we reuse the same buffer for multiple
reads from the DMA device?
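
The reuse pattern in question, for concreteness (illustrative only):

	int i;
	dma_addr_t d = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);

	for (i = 0; i < n; i++) {
		/* hand the buffer back to the device for the next read */
		dma_sync_single_for_device(dev, d, len, DMA_FROM_DEVICE);
		/* ... DMA read completes ... */
		dma_sync_single_for_cpu(dev, d, len, DMA_FROM_DEVICE);
		/* ... CPU consumes buf ... */
	}
	dma_unmap_single(dev, d, len, DMA_FROM_DEVICE);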

> The multiple invalidations for the TO_CPU case handles different
> conditions that can result in data corruption, and for some CPUs, all
> four are necessary.

I would agree that map()/unmap() are quite special cases, and so depending
on direction we need to execute in them either the for_cpu() or the
for_device() callbacks.

As for invalidation in case of for_device(TO_CPU) I still don't see
a rationale behind it. Would be interesting to see a real example where
we benefit from this.

> This is what is implemented for 32-bit ARM, depending on the CPU
> capabilities, as we have DMA incoherent devices and we have CPUs that
> speculatively prefetch data, and so may load data into the caches while
> DMA is in operation.
> 
> 
> Things get more interesting if the implementation behind the DMA API has
> to copy data between the buffer supplied to the mapping and some DMA
> accessible buffer:
> 
> map for_cpu for_device  unmap
> TO_DEV  copy to dma nonecopy to dma none
> TO_CPU  nonecopy to cpu nonecopy to cpu
> BIDIR   copy to dma copy to cpu copy to dma copy to cpu
> 
> So, in both cases, the value of the direction argument defines what you
> need to do in each call.

Interestingly enough, in your second table (which indeed describes the more
complicated case) you set "none" for for_device(TO_CPU), which looks
logical to me.

So IMHO that's what make sense:
>8-
        map         for_cpu      for_device   unmap
TO_DEV  writeback   none         writeback    none
TO_CPU  none        invalidate   none         invalidate*
BIDIR   writeback   invalidate   writeback    invalidate*
>8-

* is the case for prefetching CPU.

-Alexey


Re: [PATCH 02/20] dma-mapping: provide a generic dma-noncoherent implementation

2018-05-18 Thread Helge Deller
On 18.05.2018 15:03, Alexey Brodkin wrote:
> But the real fix of my problem is:
> >8
> --- a/lib/dma-noncoherent.c
> +++ b/lib/dma-noncoherent.c
> @@ -35,7 +35,7 @@ static dma_addr_t dma_noncoherent_map_page(struct device *dev, struct page *page
>  
>  	addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
>  	if (!dma_mapping_error(dev, addr) && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
> -		arch_sync_dma_for_device(dev, page_to_phys(page), size, dir);
> +		arch_sync_dma_for_device(dev, page_to_phys(page) + offset, size, dir);
>  	return addr;
>  }
> >8
> 
> You seem to have lost an offset in the page, so if we happen to have a
> buffer not aligned to a page boundary then we were obviously corrupting
> data outside our data :)

Good.
This patch seems to fix the dma issues I faced on my 32bit B160L parisc box.

So it leaves only one open issue on parisc:
Now every 32 bit parisc system is unnecessarily non-coherent.

Helge



Re: dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] dma-mapping: provide a generic dma-noncoherent implementation)

2018-05-18 Thread Vineet Gupta

On 05/18/2018 10:50 AM, Russell King - ARM Linux wrote:

On Fri, May 18, 2018 at 10:20:02AM -0700, Vineet Gupta wrote:

I never understood the need for this direction. And if memory serves me
right, at that time I was seeing twice the amount of cache flushing!

It's necessary.  Take a moment to think carefully about this:

dma_map_single(, dir)

dma_sync_single_for_cpu(, dir)

dma_sync_single_for_device(, dir)

dma_unmap_single(, dir)


As an aside, do these imply a state machine of sorts - does a driver need
to always call map_single first?


My original point of contention/confusion is the specific combinations of API and 
direction, specifically for_cpu(TO_DEV) and for_device(TO_CPU)


Semantically, what does dma_sync_single_for_cpu(TO_DEV) even imply for a
non-DMA-coherent arch?
Your tables below have "none" for both, implying it is unlikely to be a
real combination (for ARM and ARC at least).


The other case - actually @dir TO_CPU, independent of for_{cpu, device} -
implies the driver intends to touch the buffer after the call, so it would
invalidate any stray lines unconditionally (and not just for the
speculative prefetch case).




In the case of a DMA-incoherent architecture, the operations done at each
stage depend on the direction argument:

        map         for_cpu      for_device   unmap
TO_DEV  writeback   none         writeback    none
TO_CPU  invalidate  invalidate*  invalidate   invalidate*
BIDIR   writeback   invalidate   writeback    invalidate

* - only necessary if the CPU speculatively prefetches.

The multiple invalidations for the TO_CPU case handles different
conditions that can result in data corruption, and for some CPUs, all
four are necessary.


Can you please explain in some more detail, for the TO_CPU row, why the
invalidate is sometimes conditional?



Re: dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] dma-mapping: provide a generic dma-noncoherent implementation)

2018-05-18 Thread Russell King - ARM Linux
On Fri, May 18, 2018 at 07:57:34PM +, Alexey Brodkin wrote:
> Hi Russel,

That's Russell.

> On Fri, 2018-05-18 at 18:50 +0100, Russell King - ARM Linux wrote:
> > It's necessary.  Take a moment to think carefully about this:
> > 
> > dma_map_single(, dir)
> > 
> > dma_sync_single_for_cpu(, dir)
> > 
> > dma_sync_single_for_device(, dir)
> > 
> > dma_unmap_single(, dir)
> > 
> > In the case of a DMA-incoherent architecture, the operations done at each
> > stage depend on the direction argument:
> > 
> >         map         for_cpu      for_device   unmap
> > TO_DEV  writeback   none         writeback    none
> > TO_CPU  invalidate  invalidate*  invalidate   invalidate*
> > BIDIR   writeback   invalidate   writeback    invalidate
> > 
> > * - only necessary if the CPU speculatively prefetches.
> 
> I think invalidation of the DMA buffer is required on for_cpu(TO_CPU) even
> if the CPU doesn't prefetch - what if we reuse the same buffer for multiple
> reads from the DMA device?

That's fine - for non-coherent DMA, the CPU caches will only end up
containing data for that memory if:
- the CPU speculatively fetches data from that memory, or
- the CPU explicitly touches that memory

> > The multiple invalidations for the TO_CPU case handles different
> > conditions that can result in data corruption, and for some CPUs, all
> > four are necessary.
> 
> I would agree that map()/unmap() are quite special cases, and so depending
> on direction we need to execute in them either the for_cpu() or the
> for_device() callbacks.
> 
> As for invalidation in case of for_device(TO_CPU) I still don't see
> a rationale behind it. Would be interesting to see a real example where
> we benefit from this.

Yes, you could avoid that, but depending on how you structure the
architecture implementation, it can turn out to be a corner case.
The above table is precisely how 32-bit ARM is implemented, because
the way we implement them is based on who owns the memory - the "map"
and "for_device" operations translate internally to a cpu-to-device
ownership transition of the buffer.  Similar for "unmap" and "to_cpu".
It basically avoids having to add additional functions at the lower
implementation levels.
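
A rough sketch of that structuring (hypothetical helper and cache
primitive names; not the actual 32-bit ARM code):

	/* map and for_device funnel here: CPU hands the buffer to the device */
	static void __dma_cpu_to_dev(phys_addr_t paddr, size_t size,
				     enum dma_data_direction dir)
	{
		if (dir == DMA_FROM_DEVICE)
			cache_inv(paddr, size);   /* drop lines the DMA will overwrite */
		else
			cache_wback(paddr, size); /* TO_DEVICE and BIDIRECTIONAL */
	}

	/* unmap and for_cpu funnel here: device hands the buffer back */
	static void __dma_dev_to_cpu(phys_addr_t paddr, size_t size,
				     enum dma_data_direction dir)
	{
		if (dir != DMA_TO_DEVICE)
			cache_inv(paddr, size);   /* discard (speculatively) fetched lines */
	}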

> > Things get more interesting if the implementation behind the DMA API has
> > to copy data between the buffer supplied to the mapping and some DMA
> > accessible buffer:
> > 
> >         map          for_cpu      for_device   unmap
> > TO_DEV  copy to dma  none         copy to dma  none
> > TO_CPU  none         copy to cpu  none         copy to cpu
> > BIDIR   copy to dma  copy to cpu  copy to dma  copy to cpu
> > 
> > So, in both cases, the value of the direction argument defines what you
> > need to do in each call.
> 
> Interestingly enough, in your second table (which indeed describes the more
> complicated case) you set "none" for for_device(TO_CPU), which looks
> logical to me.
> 
> So IMHO that's what make sense:
> >8-
>         map         for_cpu      for_device   unmap
> TO_DEV  writeback   none         writeback    none
> TO_CPU  none        invalidate   none         invalidate*
> BIDIR   writeback   invalidate   writeback    invalidate*
> >8-

That doesn't make sense for the TO_CPU case.  If the caches contain
dirty cache lines, and you're DMAing from the device to the system
RAM, other system activity can cause the dirty cache lines to be
evicted (written back) to memory which the DMA has already overwritten.
The result is data corruption.  So, you really can't have "none" in
the "map" case there.

Given that, the "for_cpu" case becomes dependent on whether the CPU
speculatively prefetches.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up



Re: dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] dma-mapping: provide a generic dma-noncoherent implementation)

2018-05-18 Thread Russell King - ARM Linux
On Fri, May 18, 2018 at 01:35:08PM -0700, Vineet Gupta wrote:
> On 05/18/2018 10:50 AM, Russell King - ARM Linux wrote:
> >On Fri, May 18, 2018 at 10:20:02AM -0700, Vineet Gupta wrote:
> >>I never understood the need for this direction. And if memory serves me
> >>right, at that time I was seeing twice the amount of cache flushing!
> >It's necessary.  Take a moment to think carefully about this:
> >
> > dma_map_single(, dir)
> >
> > dma_sync_single_for_cpu(, dir)
> >
> > dma_sync_single_for_device(, dir)
> >
> > dma_unmap_single(, dir)
> 
> As an aside, do these imply a state machine of sorts - does a driver need
> to always call map_single first?

Kind-of, but some drivers do omit some of the dma_sync_*() calls.
For example, if a buffer is written to, then mapped with TO_DEVICE,
and then the CPU wishes to write to it, it's fairly common that a
driver omits the dma_sync_single_for_cpu() call.  If you think about
the cases I gave and what cache operations happen, such a scenario
practically turns out to be safe.
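
I.e. something like this (illustrative):

	dma_addr_t d = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
	/* ... device consumes buf ... */
	/* refill without an intervening dma_sync_single_for_cpu(): */
	memcpy(buf, next, len);
	dma_sync_single_for_device(dev, d, len, DMA_TO_DEVICE);
	/* ... device consumes buf again ... */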

> My original point of contention/confusion is the specific combinations of
> API and direction, specifically for_cpu(TO_DEV) and for_device(TO_CPU)

Remember that it is expected that all calls for a mapping use the
same direction argument while that mapping exists.  In other words,
if you call dma_map_single(TO_DEVICE) and then use any of the other
functions, the other functions will also use TO_DEVICE.  The DMA
direction argument describes the direction of the DMA operation
being performed on the buffer, not on the individual dma_* operation.

What isn't expected at arch level is for drivers to do:

dma_map_single(TO_DEVICE)
dma_sync_single_for_cpu(FROM_DEVICE)

or vice versa.
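
For example (illustrative):

	/* one direction for the whole lifetime of the mapping */
	dma_addr_t d = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);
	/* ... DMA writes buf ... */
	dma_sync_single_for_cpu(dev, d, len, DMA_FROM_DEVICE);    /* same dir */
	/* ... CPU reads buf ... */
	dma_sync_single_for_device(dev, d, len, DMA_FROM_DEVICE); /* same dir */
	/* ... more DMA ... */
	dma_unmap_single(dev, d, len, DMA_FROM_DEVICE);           /* same dir */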

> Semantically, what does dma_sync_single_for_cpu(TO_DEV) even imply for a
> non-DMA-coherent arch?
> 
> Your tables below have "none" for both, implying it is unlikely to be a
> real combination (for ARM and ARC at least).

Very little for the cases that I've stated (and as I mentioned
above, some drivers do omit the call in that case.)

> The other case - actually @dir TO_CPU, independent of for_{cpu, device} -
> implies the driver intends to touch the buffer after the call, so it would
> invalidate any stray lines unconditionally (and not just for the
> speculative prefetch case).

If you don't have a CPU that speculatively prefetches, and you've
already had to invalidate the cache lines (to avoid write-backs
corrupting DMA'd data) then there's no need for the architecture
to do any work at the for_cpu(TO_CPU) case - the CPU shouldn't
be touching cache lines that are part of the buffer while it is
mapped, which means a non-speculating CPU won't pull in any
cache lines without an explicit access.

Speculating CPUs are different.  The action of the speculation is
to try and guess what data the program wants to access ahead of
the program flow.  That causes the CPU to prefetch data into the
cache.  The point in the program flow that this happens is not
really determinant to the programmer.  This means that if you try
to read from the DMA buffer after the DMA operation has complete
without invalidating the cache between the DMA completing and the
CPU reading, you have no guarantee that you're reading the data
that the DMA operation has been written.  The cache may have
loaded itself with data before the DMA operation completed, and
the CPU may see that stale data.

The difference between non-speculating CPUs and speculating CPUs
is that for non-speculating CPUs, caches work according to explicit
accesses by the program, and the program is stalled while the data
is fetched from external memory.  Speculating CPUs try to predict
ahead of time what data the program will require in the future,
and attempt to load that data into the caches _before_ the program
requires it - which means that the program suffers fewer stalls.

> >In the case of a DMA-incoherent architecture, the operations done at each
> >stage depend on the direction argument:
> >
> >         map         for_cpu      for_device   unmap
> > TO_DEV  writeback   none         writeback    none
> > TO_CPU  invalidate  invalidate*  invalidate   invalidate*
> > BIDIR   writeback   invalidate   writeback    invalidate
> >
> >* - only necessary if the CPU speculatively prefetches.
> >
> >The multiple invalidations for the TO_CPU case handles different
> >conditions that can result in data corruption, and for some CPUs, all
> >four are necessary.
> 
> Can you please explain in some more detail, for the TO_CPU row, why the
> invalidate is sometimes conditional?

See above - I hope my explanation above is sufficient.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up


Re: arc dma mapping cleanups/fixes and generic noncoherent dma code

2018-05-18 Thread Vineet Gupta

On 05/18/2018 06:45 AM, Christoph Hellwig wrote:

Hi all,

this series continues consolidating the dma-mapping code, with a focus
on architectures that do not (always) provide cache coherence for DMA.
Three architectures (arm, mips and powerpc) are still left to be
converted later due to complexity of their dma ops selection.

The dma-noncoherent ops call the dma-direct ops for the actual
translation of streaming mappings and allow the architecture to provide
any cache flushing required for cpu to device and/or device to cpu
ownership transfers.  The dma coherent allocator is for now still left
entirely to architecture-supplied implementations due to the amount of
variations.  Hopefully we can do some consolidation for them later on
as well.

Because this series sits on top of two previously submitted series
a git tree might be useful to actually test it.  It is provided here:

 git://git.infradead.org/users/hch/misc.git arc-dma

Gitweb:

 
http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/arc-dma

Changes since previous bigger series:
  - take the offset into account in noncoherent_dma_map_page (Alexey Brodkin)
  - fix dma directions in arc (Alexey Brodkin)
  - split arc changes into smaller patches



LGTM!

Acked-by: Vineet Gupta 

Thx,
-Vineet



Re: [PATCH 02/20] dma-mapping: provide a generic dma-noncoherent implementation

2018-05-18 Thread h...@lst.de
On Fri, May 18, 2018 at 10:05:51PM +0200, Helge Deller wrote:
> This patch seems to fix the dma issues I faced on my 32bit B160L parisc box.
> 
> So it leaves only one open issue on parisc:
> Now every 32 bit parisc system is unnecessarily non-coherent.

I disagree with those comments; let me resend the refactored patch
to make it clearer.
