Re: [Qemu-devel] [PATCH v0 0/7] Background snapshots

2018-07-25 Thread Andrea Arcangeli
On Wed, Jul 25, 2018 at 08:17:37PM +0100, Dr. David Alan Gilbert wrote: > * Peter Xu (pet...@redhat.com) wrote: > > On Fri, Jun 29, 2018 at 12:53:59PM +0100, Dr. David Alan Gilbert wrote: > > > * Denis Plotnikov (dplotni...@virtuozzo.com) wrote: > > > > The patch set adds the ability to make extern

Re: [Qemu-devel] [PATCH] migration: add capability to bypass the shared memory

2018-07-02 Thread Andrea Arcangeli
Hello, On Mon, Jul 02, 2018 at 09:52:08PM +0800, Peng Tao wrote: > I think we can write some host generated random seeds to guest's > urandom device, when cloning VMs from the same template before handing > it to users. Is it enough or do you think there are more to do w/ > re-randomizing? That m

Re: [Qemu-devel] [PATCH] migration: add capability to bypass the shared memory

2018-07-02 Thread Andrea Arcangeli
Hello everyone, On Mon, Jul 02, 2018 at 02:10:54PM +0100, Stefan Hajnoczi wrote: > Marcelo, Andrea, Paolo: There was a more complex local migration > approach in 2013 with fd passing and vmsplice. They specifically > avoided the approach proposed in this patch, but I don't remember why. > > The

[Qemu-devel] [PATCH 0/1] FOLL_NOWAIT and get_user_pages_unlocked

2018-03-02 Thread Andrea Arcangeli
pointer is not NULL) or we need to revert part of commit ce53053ce378c21e7ffc45241fd67d6ee79daa2b and keep using FOLL_NOWAIT only as parameter to get_user_pages (which won't ever set nonblocking pointer to non-NULL). I suppose the former approach is preferred to be more robust. Thanks, Andrea A

[Qemu-devel] [PATCH 1/1] mm: gup: teach get_user_pages_unlocked to handle FOLL_NOWAIT

2018-03-02 Thread Andrea Arcangeli
cause of FOLL_NOWAIT. Reported-by: Dr. David Alan Gilbert Tested-by: Dr. David Alan Gilbert Signed-off-by: Andrea Arcangeli --- mm/gup.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index 1b46e6e74881..6afae32571ca 100644 --- a/mm/gup.c +++

Re: [Qemu-devel] [RFC 22/29] vhost+postcopy: Call wakeups

2017-07-12 Thread Andrea Arcangeli
On Tue, Jul 11, 2017 at 12:22:32PM +0800, Peter Xu wrote: > On Wed, Jun 28, 2017 at 08:00:40PM +0100, Dr. David Alan Gilbert (git) wrote: > > From: "Dr. David Alan Gilbert" > > > > Cause the vhost-user client to be woken up whenever: > > a) We place a page in postcopy mode > > Just to make sur

Re: [Qemu-devel] [PATCH 1/2] Postcopy: Force allocation of all-zero precopy pages

2017-04-27 Thread Andrea Arcangeli
On Thu, Apr 27, 2017 at 08:44:03AM +0200, Christian Borntraeger wrote: > I have started instrumenting the kernel. I can see a set_pte_at for this > address > and I see an (to be understood) invalidation shortly after that which explains > why I get a fault. Sounds great that you can see an invali

Re: [Qemu-devel] [PATCH 1/2] Postcopy: Force allocation of all-zero precopy pages

2017-04-26 Thread Andrea Arcangeli
Hello, On Wed, Apr 26, 2017 at 08:04:43PM +0100, Dr. David Alan Gilbert wrote: > * Christian Borntraeger (borntrae...@de.ibm.com) wrote: > > On 04/26/2017 08:37 PM, Dr. David Alan Gilbert (git) wrote: > > > From: "Dr. David Alan Gilbert" > > > > > > When an all-zero page is received during the p

Re: [Qemu-devel] [LSF/MM TOPIC][LSF/MM, ATTEND] shared TLB, hugetlb reservations

2017-03-14 Thread Andrea Arcangeli
Hello, On Wed, Mar 08, 2017 at 05:30:55PM -0800, Mike Kravetz wrote: > On 01/10/2017 03:02 PM, Mike Kravetz wrote: > > Another more concrete topic is hugetlb reservations. Michal Hocko > > proposed the topic "mm patches review bandwidth", and brought up the > > related subject of areas in need of

Re: [Qemu-devel] Dual userfaultfd behavior

2017-03-13 Thread Andrea Arcangeli
Hello, On Mon, Mar 13, 2017 at 10:53:39AM +, Dr. David Alan Gilbert wrote: > * Alexey Perevalov (a.pereva...@samsung.com) wrote: > > Hi, David, Andrea and Mike > > Hi Alexey, > > > The problem I want to discuss it's 1G hugepage based VM and post copy live > > migration. > > > > I would like

Re: [Qemu-devel] [PATCH v3] os: don't corrupt pre-existing memory-backend data with prealloc

2017-03-03 Thread Andrea Arcangeli
it to memset for a > single byte write. > > Signed-off-by: Daniel P. Berrange > --- > > Changed in v3: Reviewed-by: Andrea Arcangeli Thanks, Andrea

Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd

2017-02-28 Thread Andrea Arcangeli
Hello, On Tue, Feb 28, 2017 at 09:48:26AM +0800, Hailiang Zhang wrote: > Yes, for current implementing of live snapshot, it supports tcg, > but does not support kvm mode, the reason i have mentioned above, > if you try to implement it, i think you need to start from userfaultfd > supporting KVM. T

Re: [Qemu-devel] [PATCH] os: don't corrupt pre-existing memory-backend data with prealloc

2017-02-27 Thread Andrea Arcangeli
Hello, On Fri, Feb 24, 2017 at 05:27:14PM +, Daniel P. Berrange wrote: > diff --git a/util/oslib-posix.c b/util/oslib-posix.c > index 35012b9..2a5bb93 100644 > --- a/util/oslib-posix.c > +++ b/util/oslib-posix.c > @@ -355,7 +355,20 @@ void os_mem_prealloc(int fd, char *area, size_t memory, >
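
The patch under review changes how os_mem_prealloc() faults in pages, so that data already present in a shared or file-backed memory backend is not overwritten by preallocation. Below is a minimal userspace sketch of one way to do that, assuming a hypothetical helper `touch_pages_preserving` and a caller-supplied page size. It is not QEMU's code. It reads one byte per page and writes the same value back, instead of zeroing it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fault in every page of [area, area + size) by
 * reading one byte per page and writing the same value back. Unlike
 * memset(addr, 0, 1), this forces a write fault (so the page gets
 * allocated) without clobbering data already present in the backend. */
static void touch_pages_preserving(char *area, size_t size, size_t pagesize)
{
    for (size_t off = 0; off < size; off += pagesize) {
        volatile char *p = area + off;
        *p = *p; /* volatile: the compiler may not elide the read or the write */
    }
}
```

The write is what matters for preallocation. A read alone could map the shared zero page instead of allocating a private or backing page.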

Re: [Qemu-devel] [PATCH v2 00/16] Postcopy: Hugepage support

2017-02-27 Thread Andrea Arcangeli
Hello, On Mon, Feb 27, 2017 at 11:26:58AM +, Dr. David Alan Gilbert wrote: > * Alexey Perevalov (a.pereva...@samsung.com) wrote: > > Also if I'm not wrong, commands and pages are transferred over the same > > socket. Why not to use OOB TCP in this case for commands? > > My understanding was t

Re: [Qemu-devel] [PATCH v2 00/16] Postcopy: Hugepage support

2017-02-17 Thread Andrea Arcangeli
Hello Alexey, On Tue, Feb 14, 2017 at 05:48:25PM +0300, Alexey Perevalov wrote: > On Mon, Feb 13, 2017 at 06:57:22PM +0100, Andrea Arcangeli wrote: > > Hello, > > > > On Mon, Feb 13, 2017 at 08:11:06PM +0300, Alexey Perevalov wrote: > > > Another one request. >

Re: [Qemu-devel] [PATCH v2 00/16] Postcopy: Hugepage support

2017-02-13 Thread Andrea Arcangeli
On Mon, Feb 13, 2017 at 06:57:22PM +0100, Andrea Arcangeli wrote: > Hello, > > On Mon, Feb 13, 2017 at 08:11:06PM +0300, Alexey Perevalov wrote: > > Another one request. > > QEMU could use mem_path in hugefs with share key simultaneously > > (-object > >

Re: [Qemu-devel] [PATCH v2 00/16] Postcopy: Hugepage support

2017-02-13 Thread Andrea Arcangeli
Hello, On Mon, Feb 13, 2017 at 08:11:06PM +0300, Alexey Perevalov wrote: > Another one request. > QEMU could use mem_path in hugefs with share key simultaneously > (-object > memory-backend-file,id=mem,size=${mem_size},mem-path=${mem_path},share=on) > and vm > in this case will start and will pr

Re: [Qemu-devel] [PATCH v7 08/11] x86, kvm/x86.c: support vcpu preempted check

2016-12-19 Thread Andrea Arcangeli
not fundamental for correct functionality of the guest pv spinlock code. This bug was introduced in commit 0b9f6c4615c993d2b552e0d2bd1ade49b56e5beb in v4.9-rc7. From 458897fd44aa9b91459a006caa4051a7d1628a23 Mon Sep 17 00:00:00 2001 From: Andrea Arcangeli Date: Sat, 17 Dec 2016 18:43:52 +0100 Subject: [PATCH 1/2

Re: [Qemu-devel] [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration

2016-12-16 Thread Andrea Arcangeli
On Thu, Dec 15, 2016 at 05:40:45PM -0800, Dave Hansen wrote: > On 12/15/2016 05:38 PM, Li, Liang Z wrote: > > > > Use 52 bits for 'pfn', 12 bits for 'length', when the 12 bits is not long > > enough for the 'length' > > Set the 'length' to a special value to indicate the "actual length in next >
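
The layout proposed in this message packs a page frame number and a run length into one 64-bit word (52 bits of pfn, 12 bits of length), with an escape when the length does not fit. The quoted text is cut off before it defines the escape, so the following is only a sketch under assumptions: the pfn sits in the high 52 bits, the length in the low 12 bits, and a length field of 0 is a hypothetical marker meaning "the full length follows in the next word".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PFN_SHIFT  12                               /* low 12 bits: length */
#define LEN_MASK   ((UINT64_C(1) << PFN_SHIFT) - 1) /* 0xfff */
#define LEN_ESCAPE 0                                /* hypothetical marker */

/* Encode one (pfn, len) range into out[]; returns words used (1 or 2).
 * Assumes len >= 1 and pfn < 2^52. */
static size_t encode_range(uint64_t pfn, uint64_t len, uint64_t out[2])
{
    if (len <= LEN_MASK) {
        out[0] = (pfn << PFN_SHIFT) | len;
        return 1;
    }
    out[0] = (pfn << PFN_SHIFT) | LEN_ESCAPE;
    out[1] = len; /* full-width length in the following word */
    return 2;
}

/* Decode one range starting at in[0]; returns words consumed. */
static size_t decode_range(const uint64_t *in, uint64_t *pfn, uint64_t *len)
{
    *pfn = in[0] >> PFN_SHIFT;
    if ((in[0] & LEN_MASK) != LEN_ESCAPE) {
        *len = in[0] & LEN_MASK;
        return 1;
    }
    *len = in[1];
    return 2;
}
```

Short runs cost one word and long runs cost two, so the common case stays compact without capping the length.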

Re: [Qemu-devel] [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration

2016-12-16 Thread Andrea Arcangeli
On Fri, Dec 16, 2016 at 01:12:21AM +, Li, Liang Z wrote: > There still exist the case if the MAX_ORDER is configured to a large value, > e.g. 36 for a system > with huge amount of memory, then there is only 28 bits left for the pfn, > which is not enough. Not related to the balloon but how w

Re: [Qemu-devel] [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration

2016-12-09 Thread Andrea Arcangeli
Hello, On Fri, Dec 09, 2016 at 05:35:45AM +, Li, Liang Z wrote: > > On 12/08/2016 08:45 PM, Li, Liang Z wrote: > > > What's the conclusion of your discussion? It seems you want some > > > statistic before deciding whether to ripping the bitmap from the ABI, > > > am I right? > > > > I think

Re: [Qemu-devel] [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration

2016-12-07 Thread Andrea Arcangeli
On Wed, Dec 07, 2016 at 11:54:34AM -0800, Dave Hansen wrote: > We're talking about a bunch of different stuff which is all being > conflated. There are 3 issues here that I can see. I'll attempt to > summarize what I think is going on: > > 1. Current patches do a hypercall for each order in the

Re: [Qemu-devel] [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration

2016-12-07 Thread Andrea Arcangeli
On Wed, Dec 07, 2016 at 10:44:31AM -0800, Dave Hansen wrote: > On 12/07/2016 10:38 AM, Andrea Arcangeli wrote: > >> > and leaves room for the bitmap size to be encoded as well, if we decide > >> > we need a bitmap in the future. > > How would a bitmap ever be

Re: [Qemu-devel] [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration

2016-12-07 Thread Andrea Arcangeli
Hello, On Wed, Dec 07, 2016 at 08:57:01AM -0800, Dave Hansen wrote: > It is more space-efficient. We're fitting the order into 6 bits, which > would allows the full 2^64 address space to be represented in one entry, Very large order is the same as very large len, 6 bits of order or 8 bytes of le
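
The trade-off debated here is an order field versus an explicit length. A 6-bit order can describe any power-of-two run up to 2^63 pages, but a range that is not a naturally aligned power of two has to be split into several entries. The sketch below assumes the order sits in the low 6 bits and the pfn in the remaining 58 bits. That layout and the helper names are hypothetical. It splits a range the way a buddy allocator would.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ORDER_BITS 6
#define ORDER_MASK ((UINT64_C(1) << ORDER_BITS) - 1)

static uint64_t encode_order(uint64_t pfn, unsigned order)
{
    return (pfn << ORDER_BITS) | (order & ORDER_MASK);
}

static uint64_t entry_pfn(uint64_t e)    { return e >> ORDER_BITS; }
static uint64_t entry_npages(uint64_t e) { return UINT64_C(1) << (e & ORDER_MASK); }

/* Split [pfn, pfn + len) into naturally aligned power-of-two chunks;
 * returns the number of entries written (at most max). A length-based
 * encoding would need a single entry for the same range. */
static size_t split_range(uint64_t pfn, uint64_t len, uint64_t *out, size_t max)
{
    size_t n = 0;
    while (len && n < max) {
        unsigned order = 0;
        /* grow the chunk while it stays aligned and fits in what is left */
        while (order < 63 &&
               (pfn & ((UINT64_C(1) << (order + 1)) - 1)) == 0 &&
               (UINT64_C(1) << (order + 1)) <= len)
            order++;
        out[n++] = encode_order(pfn, order);
        pfn += UINT64_C(1) << order;
        len -= UINT64_C(1) << order;
    }
    return n;
}
```

For example, 7 pages starting at pfn 0 become three entries (4 + 2 + 1 pages), while 512 pages at pfn 512 fit in one.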

Re: [Qemu-devel] Async savevm using userfaultfd(2)

2016-10-13 Thread Andrea Arcangeli
Hello, On Thu, Oct 13, 2016 at 09:30:49AM +0100, Dr. David Alan Gilbert wrote: > I think it should, or at least I think all other kernel things end up being > caught by userfaultfd during postcopy. Yes indeed, it will work. vhost blocks in its own task context inside the kernel and the vmsave/pos

Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd

2016-08-18 Thread Andrea Arcangeli
Hello everyone, I've an aa.git tree uptodate on the master & userfault branch (master includes other pending VM stuff, userfault branch only contains userfault enhancements): https://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/log/?h=userfault I didn't have time to test KVM live memory sn

Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd

2016-07-05 Thread Andrea Arcangeli
Hello, On Tue, Jul 05, 2016 at 11:57:31AM +0200, Baptiste Reynal wrote: > Ok, if it is not on Andrea schedule I am willing to take the action, > at least for ARM/ARM64 support. A few days ago I released this update: https://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/ git clone -b master

Re: [Qemu-devel] Default for phys-addr-bits? (was Re: [PATCH 4/5] x86: Allow physical address bits to be set)

2016-06-22 Thread Andrea Arcangeli
On Thu, Jun 23, 2016 at 01:44:06AM +0300, Michael S. Tsirkin wrote: > On Wed, Jun 22, 2016 at 04:24:14PM +0200, Andrea Arcangeli wrote: > > > cause malfunctioning, only crashes (and as Gerd said, if you cross your > > > fingers and hope the guest doesn't put

Re: [Qemu-devel] Default for phys-addr-bits? (was Re: [PATCH 4/5] x86: Allow physical address bits to be set)

2016-06-22 Thread Andrea Arcangeli
On Thu, Jun 23, 2016 at 01:40:42AM +0300, Michael S. Tsirkin wrote: > Where's a problem then? If EPT/NPT is enabled, the guest pagetables are parsed by the hardware and not by the KVM shadow MMU in software. The hardware speaks host phys bits and AFIK the hardware will behave different depending o

Re: [Qemu-devel] Default for phys-addr-bits? (was Re: [PATCH 4/5] x86: Allow physical address bits to be set)

2016-06-22 Thread Andrea Arcangeli
On Wed, Jun 22, 2016 at 04:33:18PM +0200, Paolo Bonzini wrote: > > > On 22/06/2016 16:24, Andrea Arcangeli wrote: > > Linux could not possibly crash instead if host phys bits > guest phys > > bits because it will never depend on GPF triggering if the must be > > zer

Re: [Qemu-devel] Default for phys-addr-bits? (was Re: [PATCH 4/5] x86: Allow physical address bits to be set)

2016-06-22 Thread Andrea Arcangeli
On Wed, Jun 22, 2016 at 04:48:50PM +0200, Paolo Bonzini wrote: > KVM encodes other information in the sPTE when it sets the reserved bit > (a generation count). Instead of using all bits up to 51, KVM could > well use bit MAXPHYADDR+1 as a marker and add bits MAXPHYADDR+2...51 to > the generation

Re: [Qemu-devel] Default for phys-addr-bits? (was Re: [PATCH 4/5] x86: Allow physical address bits to be set)

2016-06-22 Thread Andrea Arcangeli
Hello, On Wed, Jun 22, 2016 at 02:41:22PM +0200, Paolo Bonzini wrote: > From a semantics point of view, using a smaller phys-addr-bits than the > host is the worst, because you tell the guest that some bits are > must-be-zero, when they're not. Using a larger phys-addr-bits cannot Ok, so EPT/KVM
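
Paolo's point rests on which page-table address bits the guest treats as must-be-zero: bits from the advertised MAXPHYADDR up to bit 51, the x86-64 architectural ceiling. The sketch below computes that reserved-bit mask for a given phys-addr-bits value. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mask of PTE address bits [maxphyaddr, 51] that the guest considers
 * must-be-zero. Assumes 32 <= maxphyaddr <= 52 (x86-64 limit). */
static uint64_t pte_rsvd_addr_mask(unsigned maxphyaddr)
{
    uint64_t below52 = (UINT64_C(1) << 52) - 1;
    uint64_t valid   = (UINT64_C(1) << maxphyaddr) - 1;
    return below52 & ~valid;
}
```

With phys-addr-bits=40 on a host whose CPU reports 46, the guest believes bits 40..45 are must-be-zero. A hardware page walk using the host MAXPHYADDR would not fault on them, which is the mismatch described above.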

Re: [Qemu-devel] [PATCH 1/1] mm: thp: kvm: fix memory corruption in KVM with THP enabled

2016-04-27 Thread Andrea Arcangeli
On Wed, Apr 27, 2016 at 05:57:30PM +0200, Andrea Arcangeli wrote: > couldn't do a fix as cleaner as this one for 4.6. ehm "cleaner then" If you've suggestions for a better name than PageTransCompoundMap I can respin a new patch though, I considered "CanMap"

Re: [Qemu-devel] [PATCH 1/1] mm: thp: kvm: fix memory corruption in KVM with THP enabled

2016-04-27 Thread Andrea Arcangeli
On Wed, Apr 27, 2016 at 06:18:34PM +0300, Kirill A. Shutemov wrote: > Okay, I see. > > But do we really want to make PageTransCompoundMap() visiable beyond KVM > code? It looks like too KVM-specific. Any other secondary MMU notifier manager (KVM is just one of the many MMU notifier users) will ne

Re: [Qemu-devel] [PATCH 1/1] mm: thp: kvm: fix memory corruption in KVM with THP enabled

2016-04-27 Thread Andrea Arcangeli
On Wed, Apr 27, 2016 at 04:50:30PM +0300, Kirill A. Shutemov wrote: > I know nothing about kvm. How do you protect against pmd splitting between > get_user_pages() and the check? get_user_pages_fast() runs fully lockless and unpins the page right away (we need a get_user_pages_fast without the FOL

Re: [Qemu-devel] post-copy is broken?

2016-04-27 Thread Andrea Arcangeli
Hello Liang, On Mon, Apr 18, 2016 at 10:33:14AM +, Li, Liang Z wrote: > If the THP is disabled, no fails. > And your test was always passed, even when real post-copy was failed. > > In my env, the output of > 'cat /sys/kernel/mm/transparent_hugepage/enabled' is: > > [always] ... > Can
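
The sysfs file /sys/kernel/mm/transparent_hugepage/enabled lists every THP mode and brackets the active one, as in the "[always] ..." output quoted above. A small sketch of extracting the active mode, with a hypothetical helper name. It parses a string rather than reading sysfs, so it runs anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the bracketed word from a THP sysfs line into out.
 * Returns 0 on success, -1 if there is no complete [..] token
 * or it does not fit in out. */
static int thp_active_mode(const char *line, char *out, size_t outlen)
{
    const char *open = strchr(line, '[');
    const char *close = open ? strchr(open, ']') : NULL;
    if (!close)
        return -1;
    size_t n = (size_t)(close - open - 1);
    if (n + 1 > outlen)
        return -1;
    memcpy(out, open + 1, n);
    out[n] = '\0';
    return 0;
}
```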

[Qemu-devel] [PATCH 1/1] mm: thp: kvm: fix memory corruption in KVM with THP enabled

2016-04-27 Thread Andrea Arcangeli
ical page split), KVM would map the whole compound page into the shadow pagetables, despite regular faults or userfaults (like UFFDIO_COPY) may map regular pages into the primary MMU as result of the pte faults, leading to the guest mode and userland mode going out of sync and not working on the same m

Re: [Qemu-devel] post-copy is broken?

2016-04-15 Thread Andrea Arcangeli
On Fri, Apr 15, 2016 at 06:23:30PM +0300, Kirill A. Shutemov wrote: > The same here. Freshly booted machine with 64GiB ram. I've checked > /proc/vmstat: huge pages were allocated I tried the test in a loop and I can't reproduce it here. Tested with gcc 4.9.3 and glibc 2.21 and glibc 2.22 so far,

Re: [Qemu-devel] post-copy is broken?

2016-04-14 Thread Andrea Arcangeli
Adding linux-mm too, On Thu, Apr 14, 2016 at 01:34:41PM +0100, Dr. David Alan Gilbert wrote: > * Andrea Arcangeli (aarca...@redhat.com) wrote: > > > The next suspect is the massive THP refcounting change that went > > upstream recently: > > > As further debug hint,

Re: [Qemu-devel] post-copy is broken?

2016-04-13 Thread Andrea Arcangeli
On Wed, Apr 13, 2016 at 01:50:53PM +0100, Dr. David Alan Gilbert wrote: > * Dr. David Alan Gilbert (dgilb...@redhat.com) wrote: > > > +if ( ((b + 1) % 255) == last_byte && !hit_edge) { > > Ahem, that should be 256. > > I'm going to bisect the kernel and see where we get to. > Andrea'
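
The one-character correction quoted here matters because a byte counter wraps at 256, not 255. A pattern in which each byte is the previous one plus one must therefore be checked modulo 256. The checker below is a hypothetical sketch, not the test program from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the index of the first byte that is not (previous + 1) mod 256,
 * or len if the whole buffer follows the pattern. Using % 255 instead
 * would wrongly flag the natural 0xfe -> 0xff and 0xff -> 0x00 steps. */
static size_t first_pattern_break(const uint8_t *buf, size_t len)
{
    for (size_t i = 1; i < len; i++)
        if (buf[i] != (uint8_t)((buf[i - 1] + 1) % 256))
            return i;
    return len;
}
```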

Re: [Qemu-devel] [PATCH 01/23] userfaultfd: linux/Documentation/vm/userfaultfd.txt

2015-12-04 Thread Andrea Arcangeli
Hello Michael, On Fri, Dec 04, 2015 at 04:50:03PM +0100, Michael Kerrisk (man-pages) wrote: > Hi Andrea, > > On 09/11/2015 10:47 AM, Michael Kerrisk (man-pages) wrote: > > On 05/14/2015 07:30 PM, Andrea Arcangeli wrote: > >> Add documentation. > > > > Hi And

Re: [Qemu-devel] [PATCH 14/23] userfaultfd: wake pending userfaults

2015-10-22 Thread Andrea Arcangeli
On Thu, Oct 22, 2015 at 05:15:09PM +0200, Peter Zijlstra wrote: > Indefinitely is such a long time, we should try and finish > computation before the computer dies etc. :-) Indefinitely as read_seqcount_retry, eventually it makes progress. Even returning 0 from the page fault can trigger it again

Re: [Qemu-devel] [PATCH 14/23] userfaultfd: wake pending userfaults

2015-10-22 Thread Andrea Arcangeli
On Thu, Oct 22, 2015 at 03:38:24PM +0200, Peter Zijlstra wrote: > On Thu, Oct 22, 2015 at 03:20:15PM +0200, Andrea Arcangeli wrote: > > > If schedule spontaneously wakes up a task in TASK_KILLABLE state that > > would be a bug in the scheduler in my view. Luckily there doesn

Re: [Qemu-devel] [PATCH 14/23] userfaultfd: wake pending userfaults

2015-10-22 Thread Andrea Arcangeli
On Thu, Oct 22, 2015 at 02:10:56PM +0200, Peter Zijlstra wrote: > On Thu, May 14, 2015 at 07:31:11PM +0200, Andrea Arcangeli wrote: > > @@ -255,21 +259,23 @@ int handle_userfault(struct vm_area_struct *vma, > > unsigned long address, > >

Re: [Qemu-devel] [PATCH 0/7] userfault21 update

2015-10-19 Thread Andrea Arcangeli
Hello Patrick, On Mon, Oct 12, 2015 at 11:04:11AM -0400, Patrick Donnelly wrote: > Hello Andrea, > > On Mon, Jun 15, 2015 at 1:22 PM, Andrea Arcangeli wrote: > > This is an incremental update to the userfaultfd code in -mm. > > Sorry I'm late to this party. I'

Re: [Qemu-devel] [PATCH 19/23] userfaultfd: activate syscall

2015-08-11 Thread Andrea Arcangeli
nclude > > > -#define __NR_syscalls364 > +#define __NR_syscalls365 > > #define __NR__exit __NR_exit > #define NR_syscalls __NR_syscalls Reviewed-by: Andrea Arcangeli

Re: [Qemu-devel] [PATCH 10/23] userfaultfd: add new syscall to provide memory externalization

2015-06-23 Thread Andrea Arcangeli
Hi Dave, On Tue, Jun 23, 2015 at 12:00:19PM -0700, Dave Hansen wrote: > Down in userfaultfd_wake_function(), it looks like you intended for a > len=0 to mean "wake all". But the validate_range() that we do from > userspace has a !len check in it, which keeps us from passing a len=0 in > from user
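
Dave's observation is that range validation rejects len == 0 before a request can reach the wake path, so a len == 0 "wake all" meaning inside userfaultfd_wake_function() is unreachable from userspace. The sketch below shows that style of check. The 4 KiB page size and 47-bit task size are assumptions chosen for illustration, and the kernel's validate_range() has its own constants and conditions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE UINT64_C(4096)      /* assumption */
#define SKETCH_TASK_SIZE (UINT64_C(1) << 47) /* assumption */

static int validate_range_sketch(uint64_t start, uint64_t len)
{
    if (start & (SKETCH_PAGE_SIZE - 1)) return -EINVAL; /* unaligned start */
    if (len & (SKETCH_PAGE_SIZE - 1))   return -EINVAL; /* unaligned length */
    if (!len)                           return -EINVAL; /* so len == 0 never reaches the wake code */
    if (start >= SKETCH_TASK_SIZE)      return -EINVAL;
    if (len > SKETCH_TASK_SIZE - start) return -EINVAL; /* overflow-safe end check */
    return 0;
}
```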

Re: [Qemu-devel] [PATCH 5/7] userfaultfd: switch to exclusive wakeup for blocking reads

2015-06-16 Thread Andrea Arcangeli
On Mon, Jun 15, 2015 at 08:41:24PM -1000, Linus Torvalds wrote: > On Mon, Jun 15, 2015 at 12:19 PM, Andrea Arcangeli > wrote: > > > > Yes, it would leave the other blocked, how is it different from having > > just 1 reader and it gets killed? > > Either is complet

Re: [Qemu-devel] [PATCH 5/7] userfaultfd: switch to exclusive wakeup for blocking reads

2015-06-15 Thread Andrea Arcangeli
On Mon, Jun 15, 2015 at 08:19:07AM -1000, Linus Torvalds wrote: > What if the process doing the polling never does anything with the end > result? Maybe it meant to, but it got killed before it could? Are you going > to leave everybody else blocked, even though there are pending events? Yes, it w

Re: [Qemu-devel] [PATCH 1/7] userfaultfd: require UFFDIO_API before other ioctls

2015-06-15 Thread Andrea Arcangeli
On Mon, Jun 15, 2015 at 08:11:50AM -1000, Linus Torvalds wrote: > On Jun 15, 2015 7:22 AM, "Andrea Arcangeli" wrote: > > > > + if (cmd != UFFDIO_API) { > > + if (ctx->state == UFFD_STATE_WAIT_API) > > + return
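
The hunk under discussion makes every ioctl other than UFFDIO_API fail until the API handshake has completed. Below is a userspace model of that gating. The enum and command names are hypothetical stand-ins, and the real code in fs/userfaultfd.c uses its own constants and error paths. Rejecting a second UFFDIO_API is a choice made for this model only.

```c
#include <assert.h>
#include <errno.h>

enum uffd_state_model { STATE_WAIT_API, STATE_RUNNING };
enum uffd_cmd_model { CMD_API, CMD_REGISTER, CMD_COPY };

struct uffd_ctx_model { enum uffd_state_model state; };

static int uffd_ioctl_model(struct uffd_ctx_model *ctx, enum uffd_cmd_model cmd)
{
    if (cmd != CMD_API) {
        if (ctx->state == STATE_WAIT_API)
            return -EINVAL; /* handshake not done yet */
        return 0;           /* would dispatch REGISTER/COPY here */
    }
    if (ctx->state != STATE_WAIT_API)
        return -EINVAL;     /* model choice: API negotiated only once */
    ctx->state = STATE_RUNNING;
    return 0;
}
```

Forcing the handshake first means a later API bump can change the other ioctls without old binaries silently misusing them.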

[Qemu-devel] [PATCH 7/7] userfaultfd: selftest

2015-06-15 Thread Andrea Arcangeli
by userfaultfd. The fix for those two bugs was also straightforward and required no design change of any sort. Signed-off-by: Andrea Arcangeli --- tools/testing/selftests/vm/Makefile | 4 +- tools/testing/selftests/vm/userfaultfd.c | 669 +++ 2 files changed

[Qemu-devel] [PATCH 6/7] userfaultfd: Revert "userfaultfd: waitqueue: add nr wake parameter to __wake_up_locked_key"

2015-06-15 Thread Andrea Arcangeli
ce as wakeall, has wait->flags WQ_FLAG_EXCLUSIVE set. Signed-off-by: Andrea Arcangeli --- fs/userfaultfd.c | 8 include/linux/wait.h | 5 ++--- kernel/sched/wait.c | 7 +++ net/sunrpc/sched.c | 2 +- 4 files changed, 10 insertions(+), 12 deletions(-) diff --git a/fs/userf

[Qemu-devel] [PATCH 0/7] userfault21 update

2015-06-15 Thread Andrea Arcangeli
r CPUs. CPU bugs in SIMD cannot be ruled out either yet. Andrea Arcangeli (7): userfaultfd: require UFFDIO_API before other ioctls userfaultfd: propagate the full address in THP faults userfaultfd: allow signals to interrupt a userfault userfaultfd: avoid missing wakeups during refile

[Qemu-devel] [PATCH 4/7] userfaultfd: avoid missing wakeups during refile in userfaultfd_read

2015-06-15 Thread Andrea Arcangeli
During the refile in userfaultfd_read both waitqueues could look empty to the lockless wake_userfault(). Use a seqcount to prevent this false negative that could leave an userfault blocked. Signed-off-by: Andrea Arcangeli --- fs/userfaultfd.c | 26 -- 1 file changed, 24
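
The fix described here makes the lockless waker retry whenever a refile was in progress, or completed, while it was inspecting the two waitqueues. The sketch below shows that sequence-counter pattern with C11 atomics, using hypothetical names. It models only the retry logic, with the two queues reduced to counters. It is not the kernel's seqcount_t or waitqueue code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct refile_model {
    atomic_uint seq;     /* odd while a refile is in progress */
    atomic_int  pending; /* userfaults not yet read */
    atomic_int  waiting; /* userfaults already read, awaiting wakeup */
};

/* Writer: move one entry from pending to waiting. */
static void refile_one(struct refile_model *m)
{
    atomic_fetch_add(&m->seq, 1);     /* begin: seq becomes odd */
    atomic_fetch_sub(&m->pending, 1); /* both queues may look empty here */
    atomic_fetch_add(&m->waiting, 1);
    atomic_fetch_add(&m->seq, 1);     /* end: seq becomes even */
}

/* Lockless reader: is there anything to wake? Retries instead of
 * trusting a snapshot taken in the middle of a refile. */
static bool has_userfaults(struct refile_model *m)
{
    unsigned s;
    bool any;
    do {
        while ((s = atomic_load(&m->seq)) & 1)
            ; /* refile in progress: spin */
        any = atomic_load(&m->pending) > 0 || atomic_load(&m->waiting) > 0;
    } while (atomic_load(&m->seq) != s); /* changed underneath: retry */
    return any;
}
```

Without the counter, a reader sampling both queues between the decrement and the increment would see them empty and skip the wakeup, which is the false negative the patch removes.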

[Qemu-devel] [PATCH 1/7] userfaultfd: require UFFDIO_API before other ioctls

2015-06-15 Thread Andrea Arcangeli
(all but UFFDIO_API/struct uffdio_api) with a bump of uffdio_api.api. There's no actual plan or need to change the API or the ioctl, the current API already should cover fine even the non cooperative usage, but this is just for the longer term future just in case. Signed-off-by: Andrea Arca

[Qemu-devel] [PATCH 5/7] userfaultfd: switch to exclusive wakeup for blocking reads

2015-06-15 Thread Andrea Arcangeli
635,219,658 branches # 256.660 M/sec ( +- 0.71% ) [83.69%] 59,203,898 branch-misses #0.51% of all branches ( +- 2.03% ) [83.54%] 2.600912438 seconds time elapsed ( +- 0.02% ) Signed

[Qemu-devel] [PATCH 3/7] userfaultfd: allow signals to interrupt a userfault

2015-06-15 Thread Andrea Arcangeli
need to get signal processed, coredumps always worked perfectly with userfaults, no matter if the userfault is triggered by GUP a kernel copy_user or directly from userland. Signed-off-by: Andrea Arcangeli --- fs/userfaultfd.c | 35 --- 1 file changed, 32 inse

[Qemu-devel] [PATCH 2/7] userfaultfd: propagate the full address in THP faults

2015-06-15 Thread Andrea Arcangeli
IGBUS failure because the wrong page was being copied. For various reasons this wasn't easily reproducible in the qemu workload, but the stress test exposed the problem immediately. Signed-off-by: Andrea Arcangeli --- mm/huge_memory.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletion

Re: [Qemu-devel] [PATCH 22/23] userfaultfd: avoid mmap_sem read recursion in mcopy_atomic

2015-05-22 Thread Andrea Arcangeli
he buildbot was shutdown recently? That buildbot was very useful to detect for problems like this. === From 2f0a48670dc515932dec8b983871ec35caeba553 Mon Sep 17 00:00:00 2001 From: Andrea Arcangeli Date: Sat, 23 May 2015 02:26:32 +0200 Subject: [PATCH] userfaultfd: update the uffd_msg structure to be the same on 32
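
The patch referenced here makes the message read from the userfaultfd identical on 32-bit and 64-bit builds. A sketch of the usual technique: fixed-width fields, explicit reserved padding, `__attribute__((packed))`, and compile-time size checks. The struct is a simplified, hypothetical mirror loosely modeled on uffd_msg, not a copy of the uapi header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct uffd_msg_sketch {
    uint8_t  event;
    uint8_t  reserved1;
    uint16_t reserved2;
    uint32_t reserved3;
    union {
        struct {
            uint64_t flags;
            uint64_t address;
        } pagefault;
        struct {
            uint64_t reserved1, reserved2, reserved3;
        } reserved; /* sizes the union to 24 bytes for future events */
    } arg;
} __attribute__((packed));

/* Same size and offsets whatever alignment the ABI gives uint64_t. */
_Static_assert(sizeof(struct uffd_msg_sketch) == 32, "msg must be 32 bytes");
_Static_assert(offsetof(struct uffd_msg_sketch, arg) == 8, "arg at offset 8");
```

The explicit reserved fields place arg at offset 8 on every ABI, including i386 where uint64_t is only 4-byte aligned inside structs.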

Re: [Qemu-devel] [PATCH 22/23] userfaultfd: avoid mmap_sem read recursion in mcopy_atomic

2015-05-22 Thread Andrea Arcangeli
On Fri, May 22, 2015 at 01:18:22PM -0700, Andrew Morton wrote: > On Thu, 14 May 2015 19:31:19 +0200 Andrea Arcangeli > wrote: > > > If the rwsem starves writers it wasn't strictly a bug but lockdep > > doesn't like it and this avoids depending on lowlevel impleme

Re: [Qemu-devel] [PATCH 00/23] userfaultfd v4

2015-05-21 Thread Andrea Arcangeli
Hi Kirill, On Thu, May 21, 2015 at 04:11:11PM +0300, Kirill Smelkov wrote: > Sorry for maybe speaking up too late, but here is additional real Not too late, in fact I don't think there's any change required for this at this stage, but it'd be great if you could help me to review. > Since arrays

Re: [Qemu-devel] [PATCH 00/23] userfaultfd v4

2015-05-20 Thread Andrea Arcangeli
Hello Richard, On Tue, May 19, 2015 at 11:59:42PM +0200, Richard Weinberger wrote: > On Tue, May 19, 2015 at 11:38 PM, Andrew Morton > wrote: > > On Thu, 14 May 2015 19:30:57 +0200 Andrea Arcangeli > > wrote: > > > >> This is the latest userfaultfd patchset ag

Re: [Qemu-devel] [PATCH 00/23] userfaultfd v4

2015-05-20 Thread Andrea Arcangeli
Hi Andrew, On Tue, May 19, 2015 at 02:38:01PM -0700, Andrew Morton wrote: > On Thu, 14 May 2015 19:30:57 +0200 Andrea Arcangeli > wrote: > > > This is the latest userfaultfd patchset against mm-v4.1-rc3 > > 2015-05-14-10:04. > > It would be useful to have some userf

Re: [Qemu-devel] [PATCH 10/23] userfaultfd: add new syscall to provide memory externalization

2015-05-15 Thread Andrea Arcangeli
On Thu, May 14, 2015 at 10:49:06AM -0700, Linus Torvalds wrote: > On Thu, May 14, 2015 at 10:31 AM, Andrea Arcangeli > wrote: > > +static __always_inline void wake_userfault(struct userfaultfd_ctx *ctx, > > + struct userfaultfd_wake

[Qemu-devel] [PATCH 23/23] userfaultfd: UFFDIO_COPY and UFFDIO_ZEROPAGE

2015-05-14 Thread Andrea Arcangeli
These two ioctls allow either atomically copying a page or mapping zeropages into the virtual address space. This is used by the thread that opened the userfaultfd to resolve the userfaults. Signed-off-by: Andrea Arcangeli --- fs/userfaultfd.c | 96

[Qemu-devel] [PATCH 14/23] userfaultfd: wake pending userfaults

2015-05-14 Thread Andrea Arcangeli
E. This is even more significant in case of repeated faults on the same address from multiple threads. This optimization is justified by the measurement that the number of spurious UFFDIO_WAKE accounts for 5% and 10% of the total userfaults for heavy workloads, so it's worth optimizing

[Qemu-devel] [PATCH 10/23] userfaultfd: add new syscall to provide memory externalization

2015-05-14 Thread Andrea Arcangeli
to know when there are new pending userfaults to be read (POLLIN). Signed-off-by: Andrea Arcangeli --- fs/userfaultfd.c | 1008 ++ 1 file changed, 1008 insertions(+) create mode 100644 fs/userfaultfd.c diff --git a/fs/userfaultfd.c b/fs

[Qemu-devel] [PATCH 04/23] userfaultfd: linux/userfaultfd_k.h

2015-05-14 Thread Andrea Arcangeli
Kernel header defining the methods needed by the VM common code to interact with the userfaultfd. Signed-off-by: Andrea Arcangeli --- include/linux/userfaultfd_k.h | 79 +++ 1 file changed, 79 insertions(+) create mode 100644 include/linux

[Qemu-devel] [PATCH 06/23] userfaultfd: add VM_UFFD_MISSING and VM_UFFD_WP

2015-05-14 Thread Andrea Arcangeli
These two flags get set in vma->vm_flags to tell the VM common code if the userfaultfd is armed and in which mode (only tracking missing faults, only tracking wrprotect faults or both). If neither flag is set, the userfaultfd is not armed on the vma. Signed-off-by: Andrea Arcang
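
Two independent vm_flags bits are enough to express the three armed modes plus "not armed". A sketch with placeholder bit values and hypothetical helper names, not the kernel's definitions:

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_VM_UFFD_MISSING (1UL << 0) /* placeholder bit values */
#define SKETCH_VM_UFFD_WP      (1UL << 1)

static bool uffd_missing(unsigned long vm_flags) { return vm_flags & SKETCH_VM_UFFD_MISSING; }
static bool uffd_wp(unsigned long vm_flags)      { return vm_flags & SKETCH_VM_UFFD_WP; }

/* Armed if either mode is set; neither set means not armed. */
static bool uffd_armed(unsigned long vm_flags)
{
    return vm_flags & (SKETCH_VM_UFFD_MISSING | SKETCH_VM_UFFD_WP);
}
```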

[Qemu-devel] [PATCH 13/23] userfaultfd: change the read API to return a uffd_msg

2015-05-14 Thread Andrea Arcangeli
able overhead. The total number of new events that can be extended or of new future bits for already shipped events, is limited to 64 by the features field of the uffdio_api structure. If more will be needed a bump of UFFD_API will be required. Signed-off-by: Andrea Arcangeli --- Documentation

[Qemu-devel] [PATCH 03/23] userfaultfd: uAPI

2015-05-14 Thread Andrea Arcangeli
Defines the uAPI of the userfaultfd, notably the ioctl numbers and protocol. Signed-off-by: Andrea Arcangeli --- Documentation/ioctl/ioctl-number.txt | 1 + include/uapi/linux/Kbuild| 1 + include/uapi/linux/userfaultfd.h | 81 3 files

[Qemu-devel] [PATCH 07/23] userfaultfd: call handle_userfault() for userfaultfd_missing() faults

2015-05-14 Thread Andrea Arcangeli
as parameter so the "read|write" kind of fault can be passed to userland. Signed-off-by: Andrea Arcangeli --- mm/huge_memory.c | 69 ++-- mm/memory.c | 16 + 2 files changed, 63 insertions(+), 22 deletions(-) di

[Qemu-devel] [PATCH 16/23] userfaultfd: allocate the userfaultfd_ctx cacheline aligned

2015-05-14 Thread Andrea Arcangeli
Use proper slab to guarantee alignment. Signed-off-by: Andrea Arcangeli --- fs/userfaultfd.c | 39 +++ 1 file changed, 31 insertions(+), 8 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 3d26f41..5542fe7 100644 --- a/fs/userfaultfd.c

[Qemu-devel] [PATCH 00/23] userfaultfd v4

2015-05-14 Thread Andrea Arcangeli
89fc3b8e338daa58e18 o Extended the Documentation userfaultfd.txt file to explain how QEMU/KVM uses userfaultfd to implement postcopy live migration. http://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/commit/?h=userfault&id=016f9523b7b2238851533736e84452cb00b2ddcd And

[Qemu-devel] [PATCH 21/23] userfaultfd: mcopy_atomic|mfill_zeropage: UFFDIO_COPY|UFFDIO_ZEROPAGE preparation

2015-05-14 Thread Andrea Arcangeli
This implements mcopy_atomic and mfill_zeropage that are the lowlevel VM methods that are invoked respectively by the UFFDIO_COPY and UFFDIO_ZEROPAGE userfaultfd commands. Signed-off-by: Andrea Arcangeli --- include/linux/userfaultfd_k.h | 6 + mm/Makefile | 1 + mm

[Qemu-devel] [PATCH 08/23] userfaultfd: teach vma_merge to merge across vma->vm_userfaultfd_ctx

2015-05-14 Thread Andrea Arcangeli
vma->vm_userfaultfd_ctx is yet another vma parameter that vma_merge must be aware about so that we can merge vmas back like they were originally before arming the userfaultfd on some memory range. Signed-off-by: Andrea Arcangeli --- include/linux/mm.h | 2 +- mm/madvise.c | 3 ++-

[Qemu-devel] [PATCH 01/23] userfaultfd: linux/Documentation/vm/userfaultfd.txt

2015-05-14 Thread Andrea Arcangeli
Add documentation. Signed-off-by: Andrea Arcangeli --- Documentation/vm/userfaultfd.txt | 140 +++ 1 file changed, 140 insertions(+) create mode 100644 Documentation/vm/userfaultfd.txt diff --git a/Documentation/vm/userfaultfd.txt b/Documentation/vm

[Qemu-devel] [PATCH 02/23] userfaultfd: waitqueue: add nr wake parameter to __wake_up_locked_key

2015-05-14 Thread Andrea Arcangeli
userfaultfd needs to wake all waitqueues (pass 0 as nr parameter), instead of the current hardcoded 1 (that would wake just the first waitqueue in the head list). Signed-off-by: Andrea Arcangeli --- include/linux/wait.h | 5 +++-- kernel/sched/wait.c | 7 --- net/sunrpc/sched.c | 2 +- 3
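
The nr parameter follows the usual wake-up loop: non-exclusive waiters are always woken, and the walk stops after nr exclusive ones. Passing 0 never reaches the stop condition, so everything is woken. The userspace model below uses a hypothetical waiter array in place of the kernel's linked list of wait_queue_t entries, and it assumes every wake function succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct waiter_model { bool exclusive; bool woken; };

/* Wake waiters in order; stop after nr_exclusive exclusive ones
 * (nr_exclusive == 0: wake all). Returns how many were woken. */
static size_t wake_model(struct waiter_model *w, size_t n, int nr_exclusive)
{
    size_t woken = 0;
    for (size_t i = 0; i < n; i++) {
        w[i].woken = true;
        woken++;
        /* --0 is -1, never 0, so nr_exclusive == 0 never breaks early */
        if (w[i].exclusive && !--nr_exclusive)
            break;
    }
    return woken;
}
```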

[Qemu-devel] [PATCH 09/23] userfaultfd: prevent khugepaged to merge if userfaultfd is armed

2015-05-14 Thread Andrea Arcangeli
tiny corner case. Signed-off-by: Andrea Arcangeli --- mm/huge_memory.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index c221be3..9671f51 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2198,7 +2198,8 @@ static int __col

[Qemu-devel] [PATCH 20/23] userfaultfd: UFFDIO_COPY|UFFDIO_ZEROPAGE uAPI

2015-05-14 Thread Andrea Arcangeli
This implements the uABI of UFFDIO_COPY and UFFDIO_ZEROPAGE. Signed-off-by: Andrea Arcangeli --- include/uapi/linux/userfaultfd.h | 42 +++- 1 file changed, 41 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux

[Qemu-devel] [PATCH 12/23] userfaultfd: Rename uffd_api.bits into .features fixup

2015-05-14 Thread Andrea Arcangeli
Update comment. Signed-off-by: Andrea Arcangeli --- include/uapi/linux/userfaultfd.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h index 5e1c2f7..03f21cb 100644 --- a/include/uapi/linux/userfaultfd.h +++ b

[Qemu-devel] [PATCH 22/23] userfaultfd: avoid mmap_sem read recursion in mcopy_atomic

2015-05-14 Thread Andrea Arcangeli
If the rwsem starves writers it wasn't strictly a bug but lockdep doesn't like it and this avoids depending on lowlevel implementation details of the lock. Signed-off-by: Andrea Arcangeli --- mm/userfaultfd.c | 92 1 file c

[Qemu-devel] [PATCH 11/23] userfaultfd: Rename uffd_api.bits into .features

2015-05-14 Thread Andrea Arcangeli
Signed-off-by: Andrea Arcangeli --- fs/userfaultfd.c | 4 ++-- include/uapi/linux/userfaultfd.h | 10 -- 2 files changed, 10 insertions(+), 4 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 1c9be61..9085365 100644 --- a/fs/userfaultfd.c +++ b/fs

[Qemu-devel] [PATCH 05/23] userfaultfd: add vm_userfaultfd_ctx to the vm_area_struct

2015-05-14 Thread Andrea Arcangeli
This adds the vm_userfaultfd_ctx to the vm_area_struct. Signed-off-by: Andrea Arcangeli --- include/linux/mm_types.h | 11 +++ kernel/fork.c| 1 + 2 files changed, 12 insertions(+) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 0038ac7..2836da7

[Qemu-devel] [PATCH 18/23] userfaultfd: buildsystem activation

2015-05-14 Thread Andrea Arcangeli
This allows to select the userfaultfd during configuration to build it. Signed-off-by: Andrea Arcangeli --- fs/Makefile | 1 + init/Kconfig | 11 +++ 2 files changed, 12 insertions(+) diff --git a/fs/Makefile b/fs/Makefile index cb92fd4..53e59b2 100644 --- a/fs/Makefile +++ b/fs

[Qemu-devel] [PATCH 17/23] userfaultfd: solve the race between UFFDIO_COPY|ZEROPAGE and read

2015-05-14 Thread Andrea Arcangeli
from userfault thread This patch removes the need of both UFFDIO_WAKE and of the associated per-page tristate as well. Signed-off-by: Andrea Arcangeli --- fs/userfaultfd.c | 81 +--- 1 file changed, 66 insertions(+), 15 deletions(-) diff --gi

[Qemu-devel] [PATCH 15/23] userfaultfd: optimize read() and poll() to be O(1)

2015-05-14 Thread Andrea Arcangeli
This makes read O(1) and poll that was already O(1) becomes lockless. Signed-off-by: Andrea Arcangeli --- fs/userfaultfd.c | 172 +++ 1 file changed, 98 insertions(+), 74 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index
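One way to get an O(1) read() is to keep userfaults that have not yet been reported to userland on a separate queue from those already reported. read() then only pops the head of the pending queue instead of scanning for an unreported entry, and POLLIN reduces to "pending queue is non-empty". The sketch below models that split with two FIFO lists. It is an assumption about the general approach rather than the patch's code, and `struct toy_fault`, `toy_read()` and the other names are hypothetical. Locking, and hence the lockless poll() mentioned in the snippet, is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a pending userfault, not a kernel structure. */
struct toy_fault {
    unsigned long address;
    struct toy_fault *next;
};

/* Singly linked FIFO with head and tail pointers: O(1) push and pop. */
struct toy_fifo {
    struct toy_fault *head, *tail;
};

static void toy_fifo_push(struct toy_fifo *q, struct toy_fault *f)
{
    f->next = NULL;
    if (q->tail)
        q->tail->next = f;
    else
        q->head = f;
    q->tail = f;
}

static struct toy_fault *toy_fifo_pop(struct toy_fifo *q)
{
    struct toy_fault *f = q->head;
    if (!f)
        return NULL;
    q->head = f->next;
    if (!q->head)
        q->tail = NULL;
    f->next = NULL;
    return f;
}

struct toy_uffd_ctx {
    struct toy_fifo pending;  /* faults not yet returned by read() */
    struct toy_fifo reported; /* faults already read, awaiting resolution */
};

/* Fault path: queue a new userfault for userland. */
static void toy_queue_fault(struct toy_uffd_ctx *ctx, struct toy_fault *f)
{
    toy_fifo_push(&ctx->pending, f);
}

/*
 * read(): O(1), it never scans; it moves the head of the pending queue
 * to the reported queue.  Returns 0 and fills *address, or -1 when
 * nothing is pending (the model's stand-in for EAGAIN).
 */
static int toy_read(struct toy_uffd_ctx *ctx, unsigned long *address)
{
    assert(address != NULL);
    struct toy_fault *f = toy_fifo_pop(&ctx->pending);
    if (!f)
        return -1;
    toy_fifo_push(&ctx->reported, f);
    *address = f->address;
    return 0;
}

/* poll(): readable iff the pending queue is non-empty. */
static int toy_pollin(const struct toy_uffd_ctx *ctx)
{
    return ctx->pending.head != NULL;
}
```

Faults are returned in arrival order, and each one stays on `reported` until something resolves it, which is what lets read() avoid revisiting them.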

[Qemu-devel] [PATCH 19/23] userfaultfd: activate syscall

2015-05-14 Thread Andrea Arcangeli
This activates the userfaultfd syscall. Signed-off-by: Andrea Arcangeli --- arch/powerpc/include/asm/systbl.h | 1 + arch/powerpc/include/uapi/asm/unistd.h | 1 + arch/x86/syscalls/syscall_32.tbl | 1 + arch/x86/syscalls/syscall_64.tbl | 1 + include/linux/syscalls.h

[Qemu-devel] [PATCH 09/21] userfaultfd: prevent khugepaged to merge if userfaultfd is armed

2015-03-05 Thread Andrea Arcangeli
tiny corner case. Signed-off-by: Andrea Arcangeli --- mm/huge_memory.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 5374132..8f1b6a5 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2145,7 +2145,8 @@ static int __col

[Qemu-devel] [PATCH 18/21] userfaultfd: UFFDIO_REMAP uABI

2015-03-05 Thread Andrea Arcangeli
This implements the uABI of UFFDIO_REMAP. Notably one mode bitflag is also forwarded (and in turn known) by the lowlevel remap_pages method. Signed-off-by: Andrea Arcangeli --- include/uapi/linux/userfaultfd.h | 27 ++- 1 file changed, 26 insertions(+), 1 deletion

Re: [Qemu-devel] [PATCH 19/21] userfaultfd: remap_pages: UFFDIO_REMAP preparation

2015-03-05 Thread Andrea Arcangeli
On Thu, Mar 05, 2015 at 09:39:48AM -0800, Linus Torvalds wrote: > Is this really worth it? On real loads? That people are expected to use? I fully agree that it's not worth merging upstream UFFDIO_REMAP until (and if) a real world usage for it will show up. To further clarify: would this not have b

[Qemu-devel] [PATCH 05/21] userfaultfd: add vm_userfaultfd_ctx to the vm_area_struct

2015-03-05 Thread Andrea Arcangeli
This adds the vm_userfaultfd_ctx to the vm_area_struct. Signed-off-by: Andrea Arcangeli --- include/linux/mm_types.h | 11 +++ kernel/fork.c| 1 + 2 files changed, 12 insertions(+) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 199a03a..fbf21f5

[Qemu-devel] [PATCH 03/21] userfaultfd: uAPI

2015-03-05 Thread Andrea Arcangeli
Defines the uAPI of the userfaultfd, notably the ioctl numbers and protocol. Signed-off-by: Andrea Arcangeli --- Documentation/ioctl/ioctl-number.txt | 1 + include/uapi/linux/userfaultfd.h | 81 2 files changed, 82 insertions(+) create mode 100644

[Qemu-devel] [PATCH 14/21] userfaultfd: mcopy_atomic|mfill_zeropage: UFFDIO_COPY|UFFDIO_ZEROPAGE preparation

2015-03-05 Thread Andrea Arcangeli
This implements mcopy_atomic and mfill_zeropage that are the lowlevel VM methods that are invoked respectively by the UFFDIO_COPY and UFFDIO_ZEROPAGE userfaultfd commands. Signed-off-by: Andrea Arcangeli --- include/linux/userfaultfd_k.h | 6 + mm/Makefile | 1 + mm
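As a rough illustration of what the two low-level methods do from the caller's point of view, the single-threaded model below fills a page that is still missing, either from a source buffer (the UFFDIO_COPY side) or with zeroes (the UFFDIO_ZEROPAGE side), and refuses to touch a page that is already mapped. Returning `-EEXIST` in that case matches how UFFDIO_COPY behaves in mainline kernels, but the rest is assumed: `struct toy_mm`, the page size, and every `toy_*` function are hypothetical. The atomicity of the real implementation, which installs the page table entry under the page table lock, is only hinted at by marking the page present last.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy address space: a few fixed-size "pages", each either
 * missing (not mapped) or present.  Sizes are illustrative only. */
#define TOY_PAGE_SIZE 64
#define TOY_NR_PAGES 8

struct toy_mm {
    bool present[TOY_NR_PAGES];
    unsigned char data[TOY_NR_PAGES][TOY_PAGE_SIZE];
};

/* Shared helper: fill one missing page from src, or with zeroes if src is NULL. */
static int toy_mfill(struct toy_mm *mm, size_t page, const unsigned char *src)
{
    if (page >= TOY_NR_PAGES)
        return -EINVAL;
    /* Never overwrite a page that is already mapped. */
    if (mm->present[page])
        return -EEXIST;
    if (src)
        memcpy(mm->data[page], src, TOY_PAGE_SIZE);
    else
        memset(mm->data[page], 0, TOY_PAGE_SIZE);
    /* Mark the page present only after its contents are complete, so
     * the model never exposes a half-filled page. */
    mm->present[page] = true;
    return 0;
}

/* Models the UFFDIO_COPY side. */
static int toy_mcopy_atomic(struct toy_mm *mm, size_t page, const unsigned char *src)
{
    assert(src != NULL);
    return toy_mfill(mm, page, src);
}

/* Models the UFFDIO_ZEROPAGE side. */
static int toy_mfill_zeropage(struct toy_mm *mm, size_t page)
{
    return toy_mfill(mm, page, NULL);
}
```

A second copy into the same page fails with `-EEXIST`, which is how a resolver can detect that another thread already resolved the fault.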

[Qemu-devel] [PATCH 21/21] userfaultfd: add userfaultfd_wp mm helpers

2015-03-05 Thread Andrea Arcangeli
These helpers will be used to know if to call handle_userfault() during wrprotect faults in order to deliver the wrprotect faults to userland. Signed-off-by: Andrea Arcangeli --- include/linux/userfaultfd_k.h | 10 ++ 1 file changed, 10 insertions(+) diff --git a/include/linux

[Qemu-devel] [PATCH 10/21] userfaultfd: add new syscall to provide memory externalization

2015-03-05 Thread Andrea Arcangeli
to know when there are new pending userfaults to be read (POLLIN). Signed-off-by: Andrea Arcangeli --- fs/userfaultfd.c | 977 +++ 1 file changed, 977 insertions(+) create mode 100644 fs/userfaultfd.c diff --git a/fs/userfaultfd.c b/fs

[Qemu-devel] [PATCH 04/21] userfaultfd: linux/userfaultfd_k.h

2015-03-05 Thread Andrea Arcangeli
Kernel header defining the methods needed by the VM common code to interact with the userfaultfd. Signed-off-by: Andrea Arcangeli --- include/linux/userfaultfd_k.h | 79 +++ 1 file changed, 79 insertions(+) create mode 100644 include/linux

[Qemu-devel] [PATCH 00/21] RFC: userfaultfd v3

2015-03-05 Thread Andrea Arcangeli
a fully backwards compatible change and it's only strictly required by the wrprotect tracking mode, so it's no problem to solve this later. Because of its inherent racy nature, nobody could possibly depend on a racy SIGBUS being raised now, when it won't be raised anymore later. Andre

[Qemu-devel] [PATCH 07/21] userfaultfd: call handle_userfault() for userfaultfd_missing() faults

2015-03-05 Thread Andrea Arcangeli
as parameter so the "read|write" kind of fault can be passed to userland. Signed-off-by: Andrea Arcangeli --- mm/huge_memory.c | 68 ++-- mm/memory.c | 16 + 2 files changed, 62 insertions(+), 22 deletions(-) di

[Qemu-devel] [PATCH 19/21] userfaultfd: remap_pages: UFFDIO_REMAP preparation

2015-03-05 Thread Andrea Arcangeli
remap_pages is the lowlevel mm helper needed to implement UFFDIO_REMAP. Signed-off-by: Andrea Arcangeli --- include/linux/userfaultfd_k.h | 17 ++ mm/huge_memory.c | 120 ++ mm/userfaultfd.c | 526 ++ 3 files changed
