On 07/03/2018 03:57 AM, Eugeniy Paltsev wrote:
> On Mon, 2018-07-02 at 10:57 -0700, Vineet Gupta wrote:
>> +CC Al
>>
>> On 06/29/2018 12:39 PM, Eugeniy Paltsev wrote:
>>> We process signals in the end of syscall/exception handler.
>>> It the signal is fatal we print register's content using
>>> show_regs function. show_regs() also prints information about
>>> last exception happened.
>>>
>>> In case of multicore system we can catch the situation when we
>>> will print wrong information about exception. See the example:
>>> ______________________________
>>> CPU-0: started to handle page fault
>>> CPU-1: sent signal to process, which is executed on CPU-0
>>> CPU-0: ended page fault handle. Started to process signal before
>>>        returnig to userspace. Process signal, which is send from
>>>        CPU-0. As th signal is fatal we call show_regs().
>>>        show_regs() will show information about last exception
>>>        which is *page fault* (instead of "trap" which is used for
>>>        signals and happened on CPU-0)
>>>
>>> So we will get message like this:
>>>     /home/waitpid02
>>>   potentially unexpected fatal signal 8.
>>>   Path: /home/waitpid02
>>>   CPU: 0 PID: 100 Comm: waitpid02 Not tainted 4.10.0-rc4 #2
>>>   task: 9f11c200 task.stack: 9f3ae000
>>>
>>>   [ECR   ]: 0x00050200 => Invalid Write @ 0x00000000 by insn @ 0x000123ec
>>>   [EFA   ]: 0x00000000
>>>   [BLINK ]: 0x123ea
>>>   [ERET  ]: 0x123ec
>>>     @off 0x123ec in [/home/waitpid02]
>>>     VMA: 0x00010000 to 0x00016000
>>>   [STAT32]: 0x80080882 : IE U
>>>   BTA: 0x000123ea  SP: 0x5ffd3db0  FP: 0x00000000
>>>   LPS: 0x20031684 LPE: 0x2003169a LPC: 0x00000006
>>>   [-----other-info-----]
>>>
>>> This message is confusing because it show information about page fault
>>> ( [ECR   ]: 0x00050200 => Invalid Write ) which is absolutely irrelevant
>>> to signal.
>> Agreed this is misleading. @Al, is there a way to identify process 
>> termination
>> from signals because it did something wrong vs. say unhandled signal. For 
>> former,
>> we want to dump additional info in show_regs() such as PC / Fault addres etc 
>> and
>> not in other scenario.
>>
>>> This situation was reproduced with waitpid02 LTP test.
>>> _____________________________
>>>
>>> So remove printing information about exceptions from show_regs()
>>> to avoid confusing messages. Print information about exceptions
>>> only in required places instead of show_regs()
>>>
>>> Now we don't print information about exceptions if signal is simply
>>> send by another userspace app. So in case of waitpid02 we will print
>>> next message:
>>> _____________________________
>>>     ./waitpid02
>>>   potentially unexpected fatal signal 8.
>>>   [STAT32]: 0x80080082 : IE U
>>>   BTA: 0x20000fc4    SP: 0x5ff8bd64  FP: 0x00000000
>>>   LPS: 0x200524a0   LPE: 0x200524b6 LPC: 0x00000006
>>>   [-----other-info-----]
>>> _____________________________
>> The prints I'm seeing now, for a segv from NULL pointer access is even more
>> confusing !
>> There's a mixup of prints....
>>
>> -------------------->8--------------------
>> Path: /segv
>> CPU: 0 PID: 70 Comm: segv Not tainted 4.17.0+ #412
>>
>> [ECR   ]: 0x00050200 => Invalid Write @ 0x00000000 by insn @ 0x000103ac
>> [EFA   ]: 0x00000000
>> [BLINK ]: 0x20047bb0
>> [ERET  ]: 0x103ac
>>     @off 0x103ac in [/segv]
>>     VMA: 0x00010000 to 0x00012000
>>
>> potentially unexpected fatal signal 11.
>> [STAT32]: 0x80080882 : IE U    
>> BTA: 0x00010398     SP: 0x5fc95e1c     FP: 0x5fc95e20
>> LPS: 0x20039ffc    LPE: 0x2003a000    LPC: 0x00000000
>> r00: 0x00000001    r01: 0x5fc95e94    r02: 0x00000000   
>> r03: 0x00000064    r04: 0x80808080    r05: 0x2f2f2f2f   
>> ...
>> -------------------->8--------------------
>>
>> and for the process killed by signal 8, we get below.
>>
>> -------------------->8--------------------
>> [ARCLinux]# kill -8 71
>> [ARCLinux]# potentially unexpected fatal signal 8.
>> [STAT32]: 0x80080882 : IE U    
>> BTA: 0x20020660     SP: 0x5fbcddec     FP: 0x5fbcde1c
>> LPS: 0x20039ffc    LPE: 0x2003a000    LPC: 0x00000000
>> r00: 0xfffffdfc    r01: 0x5fbcddf0    r02: 0x00000000   
>> r03: 0x00000008    r04: 0x80808080    r05: 0x2f2f2f2f   
>> r06: 0x7a2f5f4a    r07: 0x00000000    r08: 0x00000065   
>> ...
>>
>>
>> [1]+  Floating point exception   ./sleep
>> -------------------->8--------------------
>> I'm not sure whats the improvement here vs. the status quo.
> Why do you think this is confusing?
> The main change is that we don't print exception registers for signal based 
> kill.

For the pure signal based termination, what is the point of printing the rest of
registers. If you say "it is to give a feel of what the userspace was doing at 
the
time...." then we are lacking the most crucial piece which is the PC at the time
(i.e. ERET placeholder).

> Moreover, new behavior is more like x86-64 behavior. See the example:

For a null pointer based termination, we now have a ugly looking "potentially
unexpected..." print in the middle of reg file dump, that is not what x86 does.
Anyways, that print It is undesirable, but not a deal breaker.  The issue is 
point
above, can we remedy it.

BTW in your original patch, for a null pointer access, the printing code now
allocates 2 pages, in each of show_xxx routines and the one in show_regs() is 
now
pointless, as it is not used there at all there - so please fix that as well.

-Vineet

_______________________________________________
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc

Reply via email to