Hello Gavin

Yes, I tested the fix suggested by Dimitri, and it didn't work.
The system hung during boot.

After that test, I did more testing in other directions.

- First, I searched for the build that introduced the regression.
I tested builds 55 and 87, which worked fine.
All builds after snv_87 fail to start.
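
The search for the first failing build can be narrowed by bisecting between a known good build and a known bad one. Below is a minimal sketch, assuming the regression persists in every build once it is introduced. `boots_ok` is a hypothetical callback standing in for "bfu this build and try to boot", the simulated outcome only mirrors the result reported here (87 boots, later builds do not), and the upper bound 103 is simply the newest build mentioned in this message.

```python
from typing import Callable


def first_bad_build(good: int, bad: int, boots_ok: Callable[[int], bool]) -> int:
    """Return the first build that fails to boot.

    `good` is a build known to boot, `bad` a later build known to fail.
    Assumes every build after the first failing one also fails.
    """
    if good >= bad:
        raise ValueError("the good build must be older than the bad build")
    while bad - good > 1:
        mid = (good + bad) // 2
        if boots_ok(mid):
            good = mid  # the regression was introduced after mid
        else:
            bad = mid   # the regression was introduced at mid or earlier
    return bad


def simulated_boot(build: int) -> bool:
    """Stand-in for a real boot test: builds up to 87 boot, later ones hang."""
    return build <= 87


print(first_bad_build(55, 103, simulated_boot))
```

Each real boot test is slow, so halving the range at every step keeps the number of bfu cycles small.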

I tested a precompiled kernel (bfu archive) in debug mode.
It failed with this assertion message:
panic[cpu0]/thread=fffffffffbc3ae20: assertion failed:
PFN_2_MEM_NODE(pp->p_pagenum) == mnode, file: ../../common/vm/vm_pagelist.c,
line: 2922
Looking at vm_pagelist.c near line 2922 made me suspect some kind of kernel 
memory corruption.
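
To make the assertion concrete: it fires when a page found while working on memory node `mnode` has a page frame number that maps to a different node. Here is a toy model of that consistency check. The node ranges and helper names are invented for illustration and are not the kernel's actual data structures; the real check is the one in vm_pagelist.c.

```python
from typing import Dict, List, Tuple

# Toy model only: hypothetical PFN ranges per memory node, not real kernel data.
NODE_RANGES = {
    0: range(0x00000, 0x80000),   # node 0 owns PFNs [0x00000, 0x80000)
    1: range(0x80000, 0x100000),  # node 1 owns PFNs [0x80000, 0x100000)
}


def pfn_to_mem_node(pfn: int) -> int:
    """Map a page frame number to its owning node (stand-in for PFN_2_MEM_NODE)."""
    for mnode, pfns in NODE_RANGES.items():
        if pfn in pfns:
            return mnode
    raise LookupError(f"pfn {pfn:#x} belongs to no memory node")


def misplaced_pages(page_lists: Dict[int, List[int]]) -> List[Tuple[int, int]]:
    """Return (mnode, pfn) pairs for pages filed under a node that does not own them."""
    return [
        (mnode, pfn)
        for mnode, pfns in page_lists.items()
        for pfn in pfns
        if pfn_to_mem_node(pfn) != mnode
    ]


# A consistent layout yields no mismatches; a page filed under the wrong node is reported.
consistent = {0: [0x100, 0x200], 1: [0x80010]}
inconsistent = {0: [0x100, 0x80010], 1: [0x90000]}
print(misplaced_pages(consistent))
print(misplaced_pages(inconsistent))
```

In such a model, a mismatch means that either the page list or the PFN-to-node mapping is wrong, which is why the assertion pointed me toward corruption.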

- I then tried to isolate the changeset that broke the boot in b88.
I took the full HTML changelog of build 88 and selected a short list of 
changes that could break the boot.
I selected 6 changesets.
I built 6 kernels based on snv_87, each with one candidate changeset added 
(separately), and tested each.
The 3rd changeset (the one I listed earlier in the thread) was the one that 
broke the boot.

- I then did more tests to confirm that finding:
* When I bfu a precompiled snv_87 archive, the system boots and works without 
problems.
* When I bfu a precompiled snv_88 archive, the system fails to boot.
* When I compile snv_88 and bfu the result, the server fails to boot.
* When I compile snv_87 and bfu the result, the server works fine.
* When I compile snv_87 plus the above changeset, the bfu'ed server fails to 
boot.
* When I compile snv_88 minus the above changeset, the bfu'ed server works 
fine.
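
The six results above can be checked mechanically: the changeset is implicated if every kernel that contains it fails and every kernel without it boots. A small sketch of that check follows. The labels are only shorthand for the tests listed above, and the stock snv_88 kernels are assumed to contain the changeset because it is part of that build's changelog.

```python
# Each entry: (description, contains_changeset, boots)
RESULTS = [
    ("snv_87 precompiled archive", False, True),
    ("snv_88 precompiled archive", True, False),
    ("snv_88 compiled locally", True, False),
    ("snv_87 compiled locally", False, True),
    ("snv_87 plus changeset", True, False),
    ("snv_88 minus changeset", False, True),
]


def changeset_explains_failures(results) -> bool:
    """True if a kernel fails to boot exactly when it contains the changeset."""
    return all(boots != has_changeset for _, has_changeset, boots in results)


print(changeset_explains_failures(RESULTS))
```

The last two rows carry the most weight: adding the changeset to a good build breaks it, and removing it from a bad build repairs it.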

- Then I started the kernel in debug mode and tried to trace the execution.
I noticed that when the function lgrp_plat_process_srat is called, the kernel 
crashes (much later in the boot process).
I then modified lgrpplat.c to bypass this call.
This modification was tested on builds 91, 99 and 103 and worked fine.

I can send you a full debug log if you want.

I know this is not a fix.
But at least, the system boots.
I have to look further into the lgrp.c code to see the impact of not running 
lgrp_plat_process_srat, and why running it breaks the vm subsystem later in 
the boot process.

Thanks

Guy
-- 
This message posted from opensolaris.org
_______________________________________________
opensolaris-code mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/opensolaris-code
