Yes, I can sort of confirm that the bug is in uswsusp. I removed that package
and pm-utils, and hibernated with both "systemctl hibernate" and
"echo disk >> /sys/power/state". Hibernation seems to succeed and the machine
shuts down; I am just not able to resume from it. That looks like the classic
problem that is normally solved by setting the resume swap file/partition in
GRUB, which I tried, but it didn't work, even with nvidia disabled.

Anyway, uswsusp is still necessary for me, because as far as I know (and
have tested), the default kernel hibernation doesn't work with the
proprietary nvidia drivers.

By the way, is there any workaround for this bug that I could use on my
current OS?

On Wed, Apr 3, 2019 at 7:04 AM Rainer Fiebig <j...@mailbox.org> wrote:
>
> Am 03.04.19 um 11:34 schrieb Jan Kara:
> > On Tue 02-04-19 16:25:00, Andrew Morton wrote:
> >>
> >> I cc'ed a bunch of people from bugzilla.
> >>
> >> Folks, please please please remember to reply via emailed
> >> reply-to-all.  Don't use the bugzilla interface!
> >>
> >> On Mon, 16 Jun 2014 18:29:26 +0200 "Rafael J. Wysocki" <rafael.j.wyso...@intel.com> wrote:
> >>
> >>> On 6/13/2014 6:55 AM, Johannes Weiner wrote:
> >>>> On Fri, Jun 13, 2014 at 01:50:47AM +0200, Rafael J. Wysocki wrote:
> >>>>> On 6/13/2014 12:02 AM, Johannes Weiner wrote:
> >>>>>> On Tue, May 06, 2014 at 01:45:01AM +0200, Rafael J. Wysocki wrote:
> >>>>>>> On 5/6/2014 1:33 AM, Johannes Weiner wrote:
> >>>>>>>> Hi Oliver,
> >>>>>>>>
> >>>>>>>> On Mon, May 05, 2014 at 11:00:13PM +0200, Oliver Winker wrote:
> >>>>>>>>> Hello,
> >>>>>>>>>
> >>>>>>>>> 1) Attached a full function-trace log + other SysRq outputs, see [1]
> >>>>>>>>> attached.
> >>>>>>>>>
> >>>>>>>>> I saw bdi_...() calls in the s2disk paths, but didn't check in detail.
> >>>>>>>>> It's probably more efficient if one of you guys looks directly.
> >>>>>>>> Thanks, this looks interesting.  balance_dirty_pages() wakes up the
> >>>>>>>> bdi_wq workqueue as it should:
> >>>>>>>>
> >>>>>>>> [  249.148009]   s2disk-3327    2.... 48550413us : global_dirty_limits <-balance_dirty_pages_ratelimited
> >>>>>>>> [  249.148009]   s2disk-3327    2.... 48550414us : global_dirtyable_memory <-global_dirty_limits
> >>>>>>>> [  249.148009]   s2disk-3327    2.... 48550414us : writeback_in_progress <-balance_dirty_pages_ratelimited
> >>>>>>>> [  249.148009]   s2disk-3327    2.... 48550414us : bdi_start_background_writeback <-balance_dirty_pages_ratelimited
> >>>>>>>> [  249.148009]   s2disk-3327    2.... 48550414us : mod_delayed_work_on <-balance_dirty_pages_ratelimited
> >>>>>>>> but the worker wakeup doesn't actually do anything:
> >>>>>>>> [  249.148009] kworker/-3466    2d... 48550431us : finish_task_switch <-__schedule
> >>>>>>>> [  249.148009] kworker/-3466    2.... 48550431us : _raw_spin_lock_irq <-worker_thread
> >>>>>>>> [  249.148009] kworker/-3466    2d... 48550431us : need_to_create_worker <-worker_thread
> >>>>>>>> [  249.148009] kworker/-3466    2d... 48550432us : worker_enter_idle <-worker_thread
> >>>>>>>> [  249.148009] kworker/-3466    2d... 48550432us : too_many_workers <-worker_enter_idle
> >>>>>>>> [  249.148009] kworker/-3466    2.... 48550432us : schedule <-worker_thread
> >>>>>>>> [  249.148009] kworker/-3466    2.... 48550432us : __schedule <-worker_thread
> >>>>>>>>
> >>>>>>>> My suspicion is that this fails because the bdi_wq is frozen at this
> >>>>>>>> point and so the flush work never runs until resume, whereas before my
> >>>>>>>> patch the effective dirty limit was high enough that the image could
> >>>>>>>> be written in one go without being throttled, followed by an fsync()
> >>>>>>>> that then writes the pages in the context of the unfrozen s2disk.
> >>>>>>>>
> >>>>>>>> Does this make sense?  Rafael?  Tejun?
> >>>>>>> Well, it does seem to make sense to me.
> >>>>>> From what I see, this is a deadlock in the userspace suspend model and
> >>>>>> just happened to work by chance in the past.
> >>>>> Well, it had been working for quite a while, so it was a rather large
> >>>>> opportunity window, it seems. :-)
> >>>> No doubt about that, and I feel bad that it broke.  But it's still a
> >>>> deadlock that can't reasonably be accommodated from dirty throttling.
> >>>>
> >>>> It can't just put the flushers to sleep and then issue a large amount
> >>>> of buffered IO, hoping it doesn't hit the dirty limits.  Don't shoot
> >>>> the messenger, this bug needs to be addressed, not get papered over.
> >>>>
> >>>>>> Can we patch suspend-utils as follows?
> >>>>> Perhaps we can.  Let's ask the new maintainer.
> >>>>>
> >>>>> Rodolfo, do you think you can apply the patch below to suspend-utils?
> >>>>>
> >>>>>> Alternatively, suspend-utils
> >>>>>> could clear the dirty limits before it starts writing and restore them
> >>>>>> post-resume.
> >>>>> That (and the patch too) doesn't seem to address the problem with
> >>>>> existing suspend-utils binaries, however.
> >>>> It's userspace that freezes the system before issuing buffered IO, so
> >>>> my conclusion was that the bug is in there.  This is arguable.  I also
> >>>> wouldn't be opposed to a patch that sets the dirty limits to infinity
> >>>> from the ioctl that freezes the system or creates the image.
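A minimal C sketch of the "clear the dirty limits before writing, restore them post-resume" idea. It assumes the limits are plain sysctl files such as /proc/sys/vm/dirty_ratio. The helper names sysctl_read/sysctl_write are hypothetical and not part of suspend-utils. The thread does not say which knob or value actually lifts the throttle, so raising vm.dirty_ratio to 100 is only an example.

Per the kernel's vm sysctl documentation, writing one of vm.dirty_ratio/vm.dirty_bytes makes the other read back as 0, so a real tool would need to save both. Where exactly the restore must run relative to the snapshot ioctl ("post-resume", per the thread) is not modeled here.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read the first line of a sysctl-style file
 * (e.g. /proc/sys/vm/dirty_ratio) into buf, newline stripped.
 * Returns 0 on success, -1 on error. */
int sysctl_read(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *line = fgets(buf, (int)len, f);
    fclose(f);
    if (line == NULL)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Hypothetical helper: overwrite a sysctl-style file with val.
 * Returns 0 on success, -1 on error. */
int sysctl_write(const char *path, const char *val)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%s\n", val) >= 0;
    /* stdio buffers the data, so a value the kernel rejects only
     * shows up as an error when the stream is flushed by fclose(). */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

To use it, run as root. First save the current value with sysctl_read("/proc/sys/vm/dirty_ratio", saved, sizeof saved). Then write the raised value with sysctl_write() before the image is written. Afterwards, write `saved` back.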
> >>>
> >>> OK, that sounds like a workable plan.
> >>>
> >>> How do I set those limits to infinity?
> >>
> >> Five years have passed and people are still hitting this.
> >>
> >> Killian described the workaround in comment 14 at
> >> https://bugzilla.kernel.org/show_bug.cgi?id=75101.
> >>
> >> People can apply this workaround by hand or in scripts.  But we
> >> really should find a proper solution.  Maybe special-case the freezing
> >> of the flusher threads until all the writeout has completed.  Or
> >> something else.
> >
> > I've refreshed my memory wrt this bug and I believe the bug is really on
> > the side of suspend-utils (uswsusp or however it is called). They are low
> > level system tools, they ask the kernel to freeze all processes
> > (SNAPSHOT_FREEZE ioctl), and then they rely on buffered writeback (which is
> > relatively heavyweight infrastructure) to work. That is wrong in my
> > opinion.
> >
> > I can see Johannes was suggesting in comment 11 to use O_SYNC in
> > suspend-utils, which worked but was too slow. Indeed O_SYNC is a rather big
> > hammer, but O_DIRECT should be what they need, and it should give better
> > performance: no additional buffering in the kernel, no dirty throttling,
> > etc. They only need their buffer & device offsets sector aligned; they
> > seem to be even page aligned in suspend-utils, so they should be fine. And
> > if the performance still sucks (currently they appear to do mostly random
> > 4k writes, so it probably would for rotating disks), they could use AIO DIO
> > to get multiple pages in flight (as many as they dare to allocate buffers
> > for), and then the IO scheduler will reorder things as well as it can and
> > they should get reasonable performance.
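A minimal C sketch of the O_DIRECT approach described above. It uses a posix_memalign() buffer and aligned offsets, passed to pwrite() on a descriptor opened with O_DIRECT. The helper names (dio_aligned, dio_alloc, dio_open, dio_write_at) are hypothetical and not taken from suspend-utils. The 4096-byte alignment is an assumption: the actual requirement is the device's logical block size. The AIO part (several writes in flight) is not shown.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* glibc exposes O_DIRECT only with this */
#endif
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: one 4 KiB page.  O_DIRECT really needs the device's logical
 * block size; sector alignment is enough on most devices. */
#define DIO_ALIGN 4096

/* Nonzero if buffer address, file offset and length are all aligned. */
int dio_aligned(const void *buf, off_t off, size_t len)
{
    return (uintptr_t)buf % DIO_ALIGN == 0 &&
           off % DIO_ALIGN == 0 &&
           len % DIO_ALIGN == 0;
}

/* Allocate a zeroed, DIO_ALIGN-aligned buffer; release it with free(). */
void *dio_alloc(size_t len)
{
    void *p = NULL;
    if (posix_memalign(&p, DIO_ALIGN, len) != 0)
        return NULL;
    memset(p, 0, len);
    return p;
}

/* Open the target (swap device or file) for direct I/O.  Data written this
 * way bypasses the page cache, so it never becomes dirty page-cache pages
 * and cannot be throttled waiting for a frozen flusher. */
int dio_open(const char *path)
{
    return open(path, O_WRONLY | O_DIRECT);
}

/* Write len bytes at off.  Returns 0 on success, -1 on misalignment,
 * I/O error or short write. */
int dio_write_at(int fd, const void *buf, size_t len, off_t off)
{
    if (!dio_aligned(buf, off, len))
        return -1;
    ssize_t n = pwrite(fd, buf, len, off);
    return n == (ssize_t)len ? 0 : -1;
}
```

Each run of image pages would then go out through dio_write_at(fd, buf, npages * DIO_ALIGN, offset), with close(fd) at the end. Keeping several such writes in flight (the AIO DIO part) would need io_submit or io_uring and is left out.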
> >
> > Is there someone who works on suspend-utils these days? Because the repo
> > I've found on kernel.org seems to be long dead (last commit in 2012).
> >
> >                                                               Honza
> >
>
> Whether it's suspend-utils (or uswsusp) or not could be answered quickly
> by uninstalling this package and using the kernel methods instead.
>
>

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1328727

Title:
  [Asus 1000HE] s2disk/hibernate hangs during saving of image data

Status in Linux:
  Confirmed
Status in linux package in Ubuntu:
  Triaged

Bug description:
  Hibernation with the pm-hibernate command works fine on a freshly
  booted system, but when a significant portion of the RAM is in use,
  for instance by starting gimp, google-chrome and firefox at the same
  time, it stalls at the start of the saving of the image data:

  s2disk: Snapshotting system
  s2disk: System snapshot ready. Preparing to write
  s2disk: Image size: 441652 kilobytes
  s2disk: Free swap: 1798624 kilobytes
  s2disk: Saving 110413 image data pages (press backspace to abort) ...   0%

  The system is not usable at this point, although alt-sysrq still
  works. It looks like some sort of deadlock.

  The system is an Asus eeePC 1000HE with 2GB RAM and a 2GB swap
  partition and a Samsung SSD 840 EVO. The Ubuntu release is 14.04
  LTS. The kernel package is linux-image-3.13.0-27-generic version
  3.13.0-27.50 (32 bit).

  I found a kernel bug report that looks exactly the same:

  https://bugzilla.kernel.org/show_bug.cgi?id=75101

  In that bug report, the problem was bisected to commit a1c3bfb2.
  This commit is part of the 3.14 kernel, but was apparently backported
  to the 3.13 Ubuntu kernel. I've rebuilt the kernel package with this
  commit reverted, and it seems to fix this issue for me.
  --- 
  ApportVersion: 2.14.1-0ubuntu3.2
  Architecture: i386
  AudioDevicesInUse:
   USER        PID ACCESS COMMAND
   /dev/snd/controlC0:  dick       1719 F.... pulseaudio
  CurrentDesktop: Unity
  DistroRelease: Ubuntu 14.04
  HibernationDevice: RESUME=UUID=c6b78f80-b4a1-4b1a-91b4-8ae5c41b36f4
  InstallationDate: Installed on 2013-04-28 (408 days ago)
  InstallationMedia: Ubuntu 13.04 "Raring Ringtail" - Release i386 (20130424)
  MachineType: ASUSTeK Computer INC. 1000HE
  Package: linux (not installed)
  ProcFB: 0 inteldrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.13.0-27-generic root=UUID=54684f57-2a33-403f-ae4d-ad6b3d0168ea ro acpi_osi=Linux acpi_backlight=vendor quiet splash vt.handoff=7
  ProcVersionSignature: Ubuntu 3.13.0-27.50hib-generic 3.13.11
  RelatedPackageVersions:
   linux-restricted-modules-3.13.0-27-generic N/A
   linux-backports-modules-3.13.0-27-generic  N/A
   linux-firmware                             1.127.2
  Tags:  trusty
  Uname: Linux 3.13.0-27-generic i686
  UpgradeStatus: Upgraded to trusty on 2014-04-26 (45 days ago)
  UserGroups: adm cdrom dialout dip fuse lpadmin plugdev sambashare sudo
  _MarkForUpload: True
  dmi.bios.date: 06/24/2009
  dmi.bios.vendor: American Megatrends Inc.
  dmi.bios.version: 0902
  dmi.board.asset.tag: To Be Filled By O.E.M.
  dmi.board.name: 1000HE
  dmi.board.vendor: ASUSTeK Computer INC.
  dmi.board.version: x.xx
  dmi.chassis.asset.tag: 0x00000000
  dmi.chassis.type: 10
  dmi.chassis.vendor: ASUSTek Computer INC.
  dmi.chassis.version: x.x
  dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr0902:bd06/24/2009:svnASUSTeKComputerINC.:pn1000HE:pvrx.x:rvnASUSTeKComputerINC.:rn1000HE:rvrx.xx:cvnASUSTekComputerINC.:ct10:cvrx.x:
  dmi.product.name: 1000HE
  dmi.product.version: x.x
  dmi.sys.vendor: ASUSTeK Computer INC.

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/1328727/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
