) and so could generate both
notify_invalidate() and notify_populate() events.
Hence "fallocate" as an internal mm namespace or operation does not
belong anywhere in core MM infrastructure - it should never get used
anywhere other than the VFS/filesystem layers that implement the
fallocate() syscall or use it directly.
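To illustrate the point, here is a userspace sketch using only the
documented fallocate(2) modes; the notify_*() names above come from
this discussion rather than any existing kernel API, so the mapping
to events lives only in the comments:

/*
 * Illustrative only: one logical "replace this range with zeroes"
 * operation spans both event types discussed above - tearing down
 * what is there (invalidate) and instantiating new zeroed backing
 * store (populate). Only the filesystem knows which is which.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/falloc.h>

static int punch_and_reallocate(int fd, off_t off, off_t len)
{
	/* Removes backing store: cached state over the range must be
	 * invalidated (a notify_invalidate()-style event). */
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		      off, len) < 0)
		return -1;

	/* Re-instantiates backing store: the range may need to be
	 * repopulated (a notify_populate()-style event). */
	return fallocate(fd, 0, off, len);
}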
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
clusters to the end of the file (i.e. the file
itself is not sparse), while the extent size hint will just add 64kB
extents into the file around the write offset. That demonstrates
another behavioural advantage of extent size hints: they avoid
needing to extend the file, which is yet another way to serialise
concurrent IO and create IO pipeline stalls...
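For reference, this is how a 64kB extent size hint can be set from
userspace; a minimal sketch with an example path, error handling
kept short:

#include <fcntl.h>
#include <linux/fs.h>
#include <stdio.h>
#include <sys/ioctl.h>

/*
 * Set a 64kB extent size hint on a (newly created, still empty)
 * image file - the path and size here are just examples. On XFS
 * the hint has to be set before the file has extents allocated.
 */
int main(void)
{
	struct fsxattr fa;
	int fd = open("/images/disk.img", O_RDWR | O_CREAT, 0644);

	if (fd < 0 || ioctl(fd, FS_IOC_FSGETXATTR, &fa) < 0) {
		perror("open/FS_IOC_FSGETXATTR");
		return 1;
	}
	fa.fsx_xflags |= FS_XFLAG_EXTSIZE;
	fa.fsx_extsize = 64 * 1024;	/* allocate in 64kB chunks */
	if (ioctl(fd, FS_IOC_FSSETXATTR, &fa) < 0) {
		perror("FS_IOC_FSSETXATTR");
		return 1;
	}
	return 0;
}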
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
you what the
device and filesystem are doing in real time (e.g. I use PCP for this
and visualise the behaviour in real time via pmchart) gives a lot
of insight into exactly what is changing during transient workload
changes like starting a benchmark...
> I was running fio with --ramp_time=5 which ignores the first 5 seconds
> of data in order to let performance settle, but if I remove that I can
> see the effect more clearly. I can observe it with raw files (in 'off'
> and 'prealloc' modes) and qcow2 files in 'prealloc' mode. With qcow2 and
> preallocation=off the performance is stable during the whole test.
What does "preallocation=off" mean again? Is that using
fallocate(ZERO_RANGE) prior to the data write rather than
preallocating the metadata/entire file? If so, I would expect the
limiting factor is the rate at which IO can be issued because of the
fallocate() triggered pipeline bubbles. That leaves idle device time
so you're not pushing the limits of the hardware and hence none of
the behaviours above will be evident...
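To make the pattern concrete, the "zero the range, then write it"
sequence being described looks roughly like this (a sketch, assuming
an already-open image fd; it's the fallocate() call that serialises
against other extent map changes and creates the bubbles):

#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/falloc.h>
#include <unistd.h>

/*
 * Sketch of the zero-then-write pattern under discussion. The
 * fallocate() call modifies the extent map and so serialises
 * against all other extent map changes on the file, stalling
 * concurrent writers before the data write is even issued.
 */
static ssize_t zero_then_write(int fd, const void *buf, size_t len,
			       off_t off)
{
	if (fallocate(fd, FALLOC_FL_ZERO_RANGE, off, len) < 0)
		return -1;
	return pwrite(fd, buf, len, off);
}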
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
Thing is, once your writes into sparse image files regularly start
hitting written extents, the performance of (1), (2) and (4) will
trend towards (5) as writes hit already allocated ranges of the file
and the serialisation of extent mapping changes goes away. This
occurs with guest filesystems th
onous.
>*/
/*
* This is the correct multi-line comment format. Please
* update the patch to maintain the existing comment format.
*/
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
Factor out all the "MAP_SYNC supported" checks into a
helper so that the filesystem code just doesn't have to care about
the details of checking for DAX+MAP_SYNC support.
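Something like the sketch below is the shape of the helper being
asked for (names illustrative; it assumes a dax_synchronous()-style
query on the backing device):

/*
 * Illustrative sketch of a "does this mapping support MAP_SYNC?"
 * helper, so filesystems don't open-code the DAX checks.
 */
static bool daxdev_mapping_supported(struct vm_area_struct *vma,
				     struct dax_device *dax_dev)
{
	if (!(vma->vm_flags & VM_SYNC))
		return true;		/* not a MAP_SYNC mapping */
	if (!IS_DAX(file_inode(vma->vm_file)))
		return false;		/* MAP_SYNC requires DAX */
	return dax_synchronous(dax_dev); /* and synchronous media */
}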
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
or daring to ask hard questions about
this topic.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
wn, which makes it very difficult for admins to manage.
> We are also planning to support qcow2 sparse image format at
> host side with virtio-pmem.
So you're going to be remapping a huge number of disjoint regions
into a linear pmem mapping? ISTR discussions about similar things
for virtio+fuse+dax that came up against "large numbers of mapped
regions don't scale" and so it wasn't a practical solution compared
to just using raw sparse files.
> - There is no existing solution for Qemu persistent memory
> emulation with write support currently. This solution provides
> us the paravirtualized way of emulating persistent memory.
Sure, but the question is why do you need to create an emulation
that doesn't actually perform like pmem? The whole point of pmem is
performance, and emulating pmem by mmap() of a file on spinning
disks is going to be horrible for performance. Even on SSDs it's
going to be orders of magnitude slower than real pmem.
So exactly what problem are you trying to solve with this driver?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Mon, Jan 14, 2019 at 01:35:57PM -0800, Dan Williams wrote:
> On Mon, Jan 14, 2019 at 1:25 PM Dave Chinner wrote:
> >
> > On Mon, Jan 14, 2019 at 02:15:40AM -0500, Pankaj Gupta wrote:
> > >
> > > > > Until you have images (and hence host page cache) s
ache exceptional
> entries.
> It's solely the host's decision to take action on the host page cache pages.
>
> In case of virtio-pmem, the guest does not modify the host file directly, i.e. it doesn't
> perform hole punch & truncation operations directly on the host file.
... this will no longer be true, and the nuclear landmine in this
driver interface will have been armed.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Sun, Jan 13, 2019 at 03:38:21PM -0800, Matthew Wilcox wrote:
> On Mon, Jan 14, 2019 at 10:29:02AM +1100, Dave Chinner wrote:
> > Until you have images (and hence host page cache) shared between
> > multiple guests. People will want to do this, because it means they
> > only
y of the same set of
pages. If the guests can then, in any way, control eviction of the
pages from the host cache, then we have a guest-to-guest information
leak channel.
i.e. it's something we need to be aware of and really careful about
enabling infrastructure that /will/ be abused if guests can find a
way to influence the host side cache residency.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
I might be wrong, but if I'm not we're going to have to be very
careful about how guest VMs can access and manipulate host side
resources like the page cache.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
M operating on top of the filesystem
before layout can be determined?
All of the above are *valid* and *correct*, because the filesystem
defines what FIEMAP returns for a given file offset. Just because
ext4 and XFS have mostly the same behaviour, it doesn't mean that
every other filesystem behaves the same way.
The assumptions being made about FIEMAP behaviour will only lead to
user data corruption, as they already have several times in the past.
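To be clear about what a FIEMAP query does and doesn't tell you,
here's a minimal sketch of a single-extent query; everything in the
reply is defined by the filesystem and can be stale the moment the
ioctl returns:

#include <fcntl.h>
#include <linux/fiemap.h>
#include <linux/fs.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Minimal FIEMAP query for the first extent of a file. What comes
 * back is whatever the filesystem chooses to report at this
 * instant; it is not a stable map of where the data lives on disk.
 */
int main(int argc, char **argv)
{
	union {
		struct fiemap fm;
		char pad[sizeof(struct fiemap) +
			 sizeof(struct fiemap_extent)];
	} buf;
	struct fiemap *fm = &buf.fm;
	int fd;

	if (argc < 2 || (fd = open(argv[1], O_RDONLY)) < 0)
		return 1;
	memset(&buf, 0, sizeof(buf));
	fm->fm_length = ~0ULL;			/* map the whole file */
	fm->fm_flags = FIEMAP_FLAG_SYNC;	/* flush dirty data first */
	fm->fm_extent_count = 1;
	if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
		perror("FS_IOC_FIEMAP");
		return 1;
	}
	if (fm->fm_mapped_extents)
		printf("extent: logical %llu len %llu flags 0x%x\n",
		       (unsigned long long)fm->fm_extents[0].fe_logical,
		       (unsigned long long)fm->fm_extents[0].fe_length,
		       fm->fm_extents[0].fe_flags);
	return 0;
}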
Cheers,
Dave.
--
Dave Chinner
dchin...@redhat.com
On Thu, Jul 21, 2016 at 01:31:21PM +0100, Pádraig Brady wrote:
> On 21/07/16 12:43, Dave Chinner wrote:
> > On Wed, Jul 20, 2016 at 03:35:17PM +0200, Niels de Vos wrote:
> >> Oh... And I was surprised to learn that "cp" does use FIEMAP and not
> >> SEEK_HOLE/SEEK_DATA.
exactly what you
need. Using FIEMAP, fallocate and moving data through userspace
won't ever be reliable without special filesystem help (that only
exists for XFS right now), nor will it enable the application to
transparently use smart storage protocols and hardware when it is
present on user systems.
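By contrast, a copy loop built on SEEK_HOLE/SEEK_DATA asks the
filesystem a question it can answer authoritatively: "where is there
data right now?". A minimal sketch, assuming in_fd/out_fd are open
and the destination has already been ftruncate()d to the source size:

#define _GNU_SOURCE
#include <unistd.h>

/*
 * Sketch of a SEEK_HOLE/SEEK_DATA based sparse copy. Trailing holes
 * are preserved by the caller's ftruncate() of the destination.
 */
static int copy_sparse(int in_fd, int out_fd)
{
	char buf[65536];
	off_t end = lseek(in_fd, 0, SEEK_END);
	off_t data = 0, hole;

	/* lseek(SEEK_DATA) returns -1/ENXIO once no data remains. */
	while ((data = lseek(in_fd, data, SEEK_DATA)) >= 0 && data < end) {
		hole = lseek(in_fd, data, SEEK_HOLE);
		while (data < hole) {
			size_t want = (size_t)(hole - data) < sizeof(buf) ?
				      (size_t)(hole - data) : sizeof(buf);
			ssize_t n = pread(in_fd, buf, want, data);

			if (n <= 0)
				return -1;
			if (pwrite(out_fd, buf, (size_t)n, data) != n)
				return -1;
			data += n;	/* holes are simply skipped */
		}
	}
	return 0;
}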
Cheers,
Dave.
--
Dave Chinner
dchin...@redhat.com
On Wed, Jul 20, 2016 at 03:35:17PM +0200, Niels de Vos wrote:
> On Wed, Jul 20, 2016 at 10:30:25PM +1000, Dave Chinner wrote:
> > On Wed, Jul 20, 2016 at 05:19:37AM -0400, Paolo Bonzini wrote:
> > > Adding ext4 and XFS guys (Lukas and Dave respectively). As a quick
> >
s will report clean unwritten extents as
data.
3. Maybe - if there is written data in memory over the unwritten
extent on disk (i.e. it hasn't been flushed to disk), it will be
considered a data region with non-zero data. (FIEMAP will still
report it as unwritten.)
> If not, would
> it be acceptable to introduce Linux-specific SEEK_ZERO/SEEK_NONZERO, which
> would be similar to what SEEK_HOLE/SEEK_DATA do now?
To solve what problem? You haven't explained what problem you are
trying to solve yet.
> 2) for FIEMAP do we really need FIEMAP_FLAG_SYNC? And if not, for what
> filesystems and kernel releases is it really not needed?
I can't answer this question, either, because I don't know what
you want the fiemap information for.
Cheers,
Dave.
--
Dave Chinner
dchin...@redhat.com
On Fri, Jul 15, 2016 at 03:55:20PM +0800, Zhangfei Gao wrote:
> Dear Dave
>
> On Wed, Jul 13, 2016 at 7:03 AM, Dave Chinner wrote:
> > On Tue, Jul 12, 2016 at 12:43:24PM -0400, Theodore Ts'o wrote:
> >> On Tue, Jul 12, 2016 at 03:14:38PM +0800, Zhangfei
> Given that I'm regularly testing ext4 using kvm, and I haven't seen
> anything like this in a very long time, I suspect the problem is with
> your SCSI code, and not with ext4.
It's the same error I reported yesterday for ext3 on 4.7-rc6 when
rebooting a VM after it hung.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com