** Changed in: numad (Debian)
Status: New => Fix Released
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1832915
Title:
numad crashes while running kvm guest
To manage notifications about thi
Setting package status based on what was released.
---
numad (0.5+20150602-5ubuntu1) eoan; urgency=medium
* d/p/lp-1832915-fix-sparse-node-ids.patch: fix a crash on ppc64el
(LP: #1832915)
-- Christian Ehrhardt Wed, 19 Jun 2019 13:05:33 +0200
** Changed in: numad (Ubuntu Focal)
S
** Changed in: numad (Ubuntu Bionic)
Importance: Undecided => Low
** Changed in: numad (Ubuntu Cosmic)
Importance: Undecided => Low
** Changed in: numad (Ubuntu Disco)
Importance: Undecided => Low
** Changed in: numad (Ubuntu Eoan)
Importance: Undecided => Low
** Tags removed: block
** Changed in: numad (Ubuntu Eoan)
Status: Invalid => Won't Fix
--
Eoan is now EOL. Marking as "won't fix".
** Changed in: numad (Ubuntu Eoan)
Status: Incomplete => Invalid
--
This bug was fixed in the package numad - 0.5+20150602-6
---
numad (0.5+20150602-6) unstable; urgency=medium
[ Christian Ehrhardt ]
* d/p/lp-1832915-fix-sparse-node-ids.patch: fix a crash on ppc64el
(LP: #1832915)(Closes: #930725)
[ gustavo panizzo ]
* [0b4115] add patch
** Tags added: hwe-long-running
--
--- Comment From mbri...@us.ibm.com 2020-04-06 18:06 EDT---
Reclassifying as P3/low to match 'numad' classification.
** Tags removed: severity-high
** Tags added: severity-low
--
Hi, this bug has a 'sister' bug: LP 1836913
The outcome of the numad discussion in the interlock calls with IBM (based on
these two bugs) was that proper upstream support and fixing from IBM is needed,
especially for Power.
Some structural issues were identified that can't be easily fixed, there
--- Comment From lagar...@br.ibm.com 2020-03-20 16:35 EDT---
Hello Canonical,
So, this is still an issue in Ubuntu 20.04, as the latest test results
show. Is this something you would be willing to fix?
** Tags removed: targetmilestone-inin---
** Tags added: targetmilestone-inin2004
--
** Changed in: numad (Ubuntu Disco)
Status: Incomplete => Won't Fix
--
** Changed in: numad (Ubuntu Eoan)
Status: New => Incomplete
--
** Changed in: ubuntu-power-systems
Importance: High => Low
--
TBH, I'd not mark this as prio high from our POV.
It is high to "know if something will come back on this" but not the actual
issue.
For the wider Ubuntu community this is just a rarely used universe
package with a somewhat dead upstream - nothing to stress out for IMHO.
It is somewhat important
** Also affects: numad (Ubuntu Focal)
Importance: High
Assignee: bugproxy (bugproxy)
Status: Incomplete
** Also affects: numad (Ubuntu Eoan)
Importance: Undecided
Status: New
--
Marking as incomplete while waiting for numad upstream Power porting
work.
--
** Changed in: ubuntu-power-systems
Status: Triaged => Incomplete
--
Yeah this is still broken on both machines, sometimes faster sometimes slower
to reproduce.
So to summarize we have bug 1832915 reported and a fix created.
But we also have bug 1836913 and potentially a whole set of bugs due to the
same conceptual mismatches (assumption in code: numa zones would
@Frank - could you make sure in the next calls that the status on these
two issues is clear?
--
** Tags removed: block-proposed
** Tags added: block-proposed-bionic block-proposed-disco
--
Wichita was updated with the latest Power8 firmware from IBM and is
ready for your testing needs.
Current firmware version:
P side : FW860.70 (SV860_205)
T side : FW860.70 (SV860_205)
Boot side : FW860.70 (SV860_205)
** Changed in: ubuntu-power-systems
Status: Incomplete => Triaged
--
Marking as incomplete while awaiting resolution to bug 1839065 or bug
1836913.
** Changed in: ubuntu-power-systems
Status: In Progress => Incomplete
--
** Tags added: block-proposed
--
@Lukasz:
Thanks Lukasz for your thoughts - you confirm my concerns.
Trying to answer your question:
- reproducible
- it failed on our P9 machine at 100%
- I don't have another P9 to check if it is specific to "that" machine or P9
in general
- I was deploying a P8 system to have some compari
The reason I'm worried is that the original bug was only causing issues
for numad under certain conditions, but the package upgrade will trigger
a restart for *all* the instances of using numad. So if numad restart
will cause trouble on all ppc64el cases, I'm worried we might cause more
harm with
I had to sit down and think about this for a moment. The bug with the
service restart seems to only happen on ppc64el, which means the issue
the package upgrade might trigger might have limited impact. On the
other hand, the main target of this bugfix are ppc64el platforms, as
those were the most l
** Changed in: ubuntu-power-systems
Status: Fix Committed => In Progress
--
The service worked fine even through a full release upgrade from Bionic
to Disco; I saw it moving processes just fine.
When on Disco I pushed some load in the guest to get more movements but
things worked fine still.
Setting verified for Disco.
P.S. I also think I have found the "other" crash t
Summarizing the state:
- numad is universe only and IMHO in a rather bad state
- upstream seems dead for quite some time and does not respond to my patches
- the bug reported here is fixed and verified
- numad seems to have issues on service restart (unrelated to this update)
-> the upgrades to
It seems on the former deployment I hit some memory bug which broke and
stalled quite some allocations. While I haven't found what was causing
that (would be an interesting bug report), the renewed system seems
good.
And in that environment I was able to verify the fix just as expected.
Sorry for
Hit another crash:
static id_list_p cpu_bind_list_p;
CLEAR_CPU_LIST(cpu_bind_list_p);
But this is a malloc.c(16) it seems this system currently is broken in general.
tcache_get really shouldn't fail here.
Also I have seen hang_checks in dmesg.
I'll redeploy and give all of this a new try.
Hmm, no this must be different.
This is doing:
for (int ix = 0; (ix <= num_nodes); ix++) {
which essentially is 0,1,2
The 2 is odd here, but it seems to break already at
1796 if (ID_IS_IN_LIST(ix, p->node_list_p)) {
and the latter array access would be fine as ix is currently zero
Since this is constructed like:
ADD_ID_TO_LIST(node[0].node_id, target_node_list_p);
I guess this delivers 0 and then 8 in my system
== the node_id instead of the index.
1796 if (ID_IS_IN_LIST(ix, p->node_list_p)) {
1797 proc_avg_node_CPU
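The ID-versus-index mix-up described above can be illustrated with a minimal sketch. It is not numad's code and not the actual patch: `struct node_info`, `node_index_for_id` and `free_mbs_for_ids` are hypothetical names, and the only assumption taken from the thread is that the node array is densely packed (two entries) while the kernel node IDs are sparse (0 and 8).

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a per-node record.
 * The array holding these is dense (index 0..num_nodes-1),
 * but node_id is the kernel's ID and may be sparse (e.g. 0 and 8). */
struct node_info {
    int  node_id;
    long mbs_free;
};

/* Translate a kernel node ID into an index of the dense array.
 * Returns -1 if no entry carries that ID. */
int node_index_for_id(const struct node_info *nodes, size_t num_nodes,
                      int node_id)
{
    for (size_t ix = 0; ix < num_nodes; ix++) {
        if (nodes[ix].node_id == node_id)
            return (int)ix;
    }
    return -1;
}

/* Sum the free memory of all nodes whose IDs are listed in wanted_ids.
 * Indexing nodes[] directly with an ID such as 8 would read past the
 * end of a two-entry array; going through the lookup avoids that. */
long free_mbs_for_ids(const struct node_info *nodes, size_t num_nodes,
                      const int *wanted_ids, size_t num_wanted)
{
    long total = 0;
    for (size_t i = 0; i < num_wanted; i++) {
        int ix = node_index_for_id(nodes, num_nodes, wanted_ids[i]);
        if (ix >= 0)
            total += nodes[ix].mbs_free;
    }
    return total;
}
```

The point of the sketch is only that loops over `ix` must consistently use either the index or `nodes[ix].node_id`, depending on what the callee expects.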
The new crash that was found is:
#0 0x02375f1bd2c4 in pick_numa_nodes (pid=, cpus=, mbs=, assume_enough_cpus=) at numad.c:1796
1791: numad_log(LOG_DEBUG, "Interleaved MBs: %ld\n", ix,
p->process_MBs[ix]);
1792: } else {
1793: numad_log(
Took a P9 system which has sparse nodes:
$ ll /sys/bus/node/devices/node*
lrwxrwxrwx 1 root root 0 Jul 17 06:42 /sys/bus/node/devices/node0 -> ../../../devices/system/node/node0/
lrwxrwxrwx 1 root root 0 Jul 17 06:42 /sys/bus/node/devices/node8 -> ../../../devices/system/node/node8/
Install and
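To check programmatically whether a machine has sparse node IDs like the node0/node8 listing above, the directory entries can be parsed in C. This is a rough sketch, not taken from numad: `parse_node_id` and `list_node_ids` are hypothetical helpers, and the directory to scan is passed in by the caller (for example `/sys/devices/system/node`).

```c
#include <ctype.h>
#include <dirent.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Parse a directory entry name of the form "node<N>".
 * Returns N, or -1 if the name does not match exactly. */
int parse_node_id(const char *name)
{
    if (strncmp(name, "node", 4) != 0 || !isdigit((unsigned char)name[4]))
        return -1;

    char *end;
    long id = strtol(name + 4, &end, 10);
    if (*end != '\0' || id > INT_MAX)
        return -1;
    return (int)id;
}

/* Collect up to max_ids node IDs found in dir_path.
 * Returns the number of IDs stored, or -1 if the directory
 * cannot be opened. IDs come back in readdir() order, unsorted. */
int list_node_ids(const char *dir_path, int *ids, int max_ids)
{
    DIR *dir = opendir(dir_path);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while (count < max_ids && (entry = readdir(dir)) != NULL) {
        int id = parse_node_id(entry->d_name);
        if (id >= 0)
            ids[count++] = id;
    }
    closedir(dir);
    return count;
}
```

If the largest ID returned is greater than or equal to the number of IDs, the numbering is sparse and any code that uses an ID as an array index is suspect.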
** Changed in: ubuntu-power-systems
Status: In Progress => Fix Committed
--
Hello bugproxy, or anyone else affected,
Accepted numad into disco-proposed. The package will build now and be
available at
https://launchpad.net/ubuntu/+source/numad/0.5+20150602-5ubuntu0.19.04.1
in a few hours, and then in the -proposed repository.
Please help us by testing this new package. S
MP reviews complete, uploaded to Bionic/Disco unapproved
--
** Description changed:
+ [Impact]
+
+ * The numad code never considered that node IDs might not be sequential,
+ which leads to an out-of-bounds array access.
+
+ * Fix the array index usage to not hit that
+
+ [Test Case]
+
+ 0. The most important and least available ingredient to this issue are
** Changed in: ubuntu-power-systems
Status: Incomplete => In Progress
--
This bug was fixed in the package numad - 0.5+20150602-5ubuntu1
---
numad (0.5+20150602-5ubuntu1) eoan; urgency=medium
* d/p/lp-1832915-fix-sparse-node-ids.patch: fix a crash on ppc64el
(LP: #1832915)
-- Christian Ehrhardt Wed, 19 Jun 2019 13:05:33 +0200
** Changed in: numa
Two new MPs for Bionic/Disco uploads:
-
https://code.launchpad.net/~paelzer/ubuntu/+source/numad/+git/numad/+merge/370043
-
https://code.launchpad.net/~paelzer/ubuntu/+source/numad/+git/numad/+merge/370044
--
** Merge proposal linked:
https://code.launchpad.net/~paelzer/ubuntu/+source/numad/+git/numad/+merge/370043
** Merge proposal linked:
https://code.launchpad.net/~paelzer/ubuntu/+source/numad/+git/numad/+merge/370044
--
Uploaded to Eoan ...
** Changed in: numad (Ubuntu Cosmic)
Status: New => Won't Fix
--
Ok, thanks bssrikanth!
That means we can go on with the SRU.
I'm still sort of frightened by upstream numad seeming dead, but the fix
seems clear and now is confirmed to work for you which allows us to go
on.
--
@bssrikanth many thanks for testing and feedback!
--
Hi,
any updates on this one?
All I could reproduce would be fixed with the suggested change, but
since according to you that isn't sufficient I now need you to debug
your case and/or suggest whatever change you need on top.
After fixing the bug that I could identify I'd hate if this goes in
** Changed in: numad (Debian)
Status: Unknown => New
--
Interesting, for me the issue was no more reproducible with the fix
applied.
Maybe there is another bug in the same code that you hit now.
Could you tell me all the details of the setup that still triggers this
crash?
Further, this should have created a crash dump in /var/crash/.
Probably be
Reported to Debian (linked above) and prepared an MP for Eoan for team
review.
But still waiting for your ok @IBM that this solves your case.
--
** Merge proposal linked:
https://code.launchpad.net/~paelzer/ubuntu/+source/numad/+git/numad/+merge/369036
--
** Bug watch added: Debian Bug tracker #930725
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=930725
** Also affects: numad (Debian) via
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=930725
Importance: Unknown
Status: Unknown
--
** Also affects: numad (Ubuntu Eoan)
Importance: High
Assignee: Canonical Server Team (canonical-server)
Status: Incomplete
** Also affects: numad (Ubuntu Bionic)
Importance: Undecided
Status: New
** Also affects: numad (Ubuntu Cosmic)
Importance: Undecided
Stat
** Changed in: numad (Ubuntu)
Assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) =>
Canonical Server Team (canonical-server)
** Changed in: ubuntu-power-systems
Assignee: Manoj Iyer (manjo) => Canonical Server Team (canonical-server)
** Changed in: ubuntu-power-syst
I have made a test build with the fix available at PPA [1]. It resolves
the issue for me, but before going further please give that a try with
your setups as well.
Further I opened a PR for upstream at [2] to discuss it there as well.
Feel free to chime in and give it a +1 there if it works well f
@JFH/Manjo - the bug assignment is odd can you please set it up the way
you need it to reflect that we are waiting on Upstream (ack on PR) and
IBM (test PPA) ?
--
While this numbering is pretty common on Power (all non-SMT systems) and s390x
(scaling #cpus on load), it is uncommon on x86. Nevertheless, in theory the
issue should exist there as well.
But I tried this for an hour and it didn't trigger (plenty of assigns happened).
Repro (x86)
1. Get a KVM gue
Rebuild via:
rm numad
cc -g -O0 -fstack-protector-strong -std=gnu99 -I. -D__thread="" -Wdate-time -D_FORTIFY_SOURCE=2 -c -o numad.o numad.c
cc -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now numad.o -lpthread -lrt -lm -o numad
ls -laF numad
sudo mv numad /usr/bin/numad
My current config trig
While I built a proper PPA in [1] this seems so trivial that we can
rebuild locally with just
$ cc -std=gnu99 -I. -D__thread="" -c -o numad.o numad.c
$ cc numad.o -lpthread -lrt -lm -o numad
$ mv numad /usr/bin/numad
That should allow quick iterations.
With debug enabled I found that the s
[1]: https://launchpad.net/~paelzer/+archive/ubuntu/bug-1832915-numad-debugging
[2]: https://linux.die.net/man/3/cpu_count_s
--
The problem is that node[node_id].cpu_list_p is wrong.
When you look at the array again it has two real entries and nothing more:
(gdb) p node[0]
$20 = {node_id = 0, MBs_total = 65266, MBs_free = 1510, CPUs_total = 2000,
CPUs_free = 1144, magnitude = 1727440, distance = 0x304a3a41850
"\n(\032\n\
Chances are that without the odd SMT=off numbering on ppc things would work.
That might explain why this didn't fail more often or on other architectures so
far.
But disabling a subset of CPUs is allowed, so this needs to be fixed for
all - no matter how "often" an issue occurs on one of the archit
** Tags added: universe
--
Note:
- numad is "only" in universe in all releases
- nothing depends on it
- it is on 0.5+20150602-5 which seems rather old
- But upstream commits [1] since 2015 are minimal
TL;DR: upstream is somewhat dead and has no fix for this available to cherry-pick
Hmm, the above might have been a red herring.
I ge
One fail was at:
CLEAR_CPU_LIST(cpu_bind_list_p);
The next two at:
OR_LISTS(cpu_bind_list_p, cpu_bind_list_p, node[node_id].cpu_list_p);
The common denominator here is cpu_bind_list_p but that is a static local:
static id_list_p cpu_bind_list_p;
The function is defined as:
#define OR_LISTS( or
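For readers unfamiliar with the underlying primitives: assuming numad's list macros are thin wrappers around glibc's dynamically sized CPU set macros (the thread links the CPU_COUNT_S man page, but the macro definition is cut off here), the following sketch shows how those glibc macros are meant to be combined. `count_union` is a hypothetical helper, not a numad function; the key constraint is that the size passed to each `_S` macro must match the allocation size of every set involved.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stddef.h>

/* Count the CPUs present in the union of sets a and b.
 * Both inputs must have been allocated for at least num_cpus CPUs.
 * Returns -1 if the temporary set cannot be allocated. */
int count_union(const cpu_set_t *a, const cpu_set_t *b, int num_cpus)
{
    size_t bytes = CPU_ALLOC_SIZE(num_cpus);  /* bytes needed for num_cpus bits */
    cpu_set_t *out = CPU_ALLOC(num_cpus);
    if (out == NULL)
        return -1;

    CPU_ZERO_S(bytes, out);
    CPU_OR_S(bytes, out, a, b);   /* reads 'bytes' bytes from a and b */
    int n = CPU_COUNT_S(bytes, out);

    CPU_FREE(out);
    return n;
}
```

If one of the sets is smaller than `bytes`, or a set pointer was derived from the wrong array slot, `CPU_OR_S` reads or writes memory it does not own, which matches the kind of heap corruption suspected in this thread.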
(gdb) p cpu_bind_list_p->bytes
$5 = 24
(gdb) p *(cpu_bind_list_p->set_p)
$7 = {__bits = {1229782938247303441, 4369, 0, 49, 1955697441360, 274,
303148778372988952, 139284342967816, 48, 337, 1955697440432, 1955697400112,
1955697400080, 1955697400048,
1955697400016, 1955697400512}}
(gdb) p size
# get debug symbols and gdb
$ sudo apt install numad-dbgsym gdb dpkg-dev
# get source as used in the package
$ apt source numad
# I found that we will also need glibc source, so:
$ apt source glibc
It helps to add paths to gdb
(gdb) directory
/home/ubuntu/numad-0.5+20150602:/home/ubuntu/glibc-2.2
** Changed in: ubuntu-power-systems
Status: New => Confirmed
--
With verbose my numad log file is:
Mon Jun 17 06:22:53 2019: Nodes: 2
Min CPUs free: 1416, Max CPUs: 1423, Avg CPUs: 1419, StdDev: 3.53553
Min MBs free: 12869, Max MBs: 13756, Avg MBs: 13312, StdDev: 443.5
Node 0: MBs_total 65266, MBs_free 12869, CPUs_total 2000, CPUs_free 1416,
Distance: 10 40
On a fresh Bionic running with the latest 4.15.0-51-generic I did the following
trying to reproduce this issue.
Note: My Host has 128G mem and 40 cores (SMT off)
1. installed numad
2. started the numad service and verified it runs fine
3. I spawned two Guests with 20 cores and 50G each (since th
** Also affects: ubuntu-power-systems
Importance: Undecided
Status: New
** Changed in: ubuntu-power-systems
Assignee: (unassigned) => Manoj Iyer (manjo)
--