Re: perf event grouping for dummies (was Re: [PATCH] arc: perf: Enable generic "cache-references" and "cache-misses" events)

2016-09-22 Thread Peter Zijlstra
On Wed, Sep 21, 2016 at 07:43:28PM -0500, Paul Clarke wrote:
> On 09/20/2016 03:56 PM, Vineet Gupta wrote:
> >On 09/01/2016 01:33 AM, Peter Zijlstra wrote:
> >>>- is that what perf event grouping is ?
> >>
> >>Again, nope. Perf event groups are single counter (so no implicit
> >>addition) that are co-scheduled on the PMU.
> >
> >I'm not sure I understand - does this require specific PMU/arch support -
> >as in multiple conditions feeding to same counter.
> 
> My read is that what Peter meant was that each event in the
> perf event group is a single counter, so all the events in the group
> are counted simultaneously.  (No multiplexing.)

Right, sorry for the poor wording.

> >Again when you say co-scheduled what do you mean - why would anyone use the
> >event grouping - is it when they only have 1 counter and they want to count 2
> >conditions/events at the same time - isn't this same as event multiplexing ?
> 
> I'd say it's the converse of multiplexing.  Instead of mapping
> multiple events to a single counter, perf event groups map a set of
> events each to their own counter, and they are active simultaneously.
> I suppose it's possible for the _groups_ to be multiplexed with other
> events or groups, but the group as a whole will be scheduled together,
> as a group.

Correct.

Each event gets its own hardware counter. Grouped events are
co-scheduled on the hardware.

You can multiplex groups. But if one event in a group is scheduled, they
all must be.
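
The all-or-nothing rule can be illustrated with a toy model in C. This is
only a sketch and not the kernel's scheduling algorithm: it assumes every
event needs exactly one counter and that any counter can count any event.
The names `struct group` and `schedule_pass` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: a group is described only by how many events it holds. */
struct group {
    int nr_events;   /* events in the group (1 for an ungrouped event) */
    int scheduled;   /* 1 if the whole group got counters in this pass */
};

/*
 * One scheduling pass over 'n' groups, starting at index 'start'.
 * Calling it repeatedly with a rotating 'start' multiplexes the groups.
 * A group is placed only if ALL of its events fit into the counters
 * that are still free; a group is never split.
 * Returns the number of counters in use after the pass.
 */
static int schedule_pass(struct group *groups, size_t n, size_t start,
                         int nr_counters)
{
    int used = 0;

    assert(nr_counters >= 0);
    for (size_t i = 0; i < n; i++) {
        struct group *g = &groups[(start + i) % n];

        g->scheduled = 0;
        if (g->nr_events <= nr_counters - used) {
            used += g->nr_events;
            g->scheduled = 1;
        }
    }
    return used;
}
```

In this model, a 4-event group and a 2-event group on 4 counters take turns
as `start` rotates, while a 5-event group on 4 counters is never scheduled.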



___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


[PATCH 4.4 099/118] ARC: uaccess: get_user to zero out dest in cause of fault

2016-09-22 Thread Greg Kroah-Hartman
4.4-stable review patch.  If anyone has any objections, please let me know.

--

From: Vineet Gupta 

commit 05d9d0b96e53c52a113fd783c0c97c830c8dc7af upstream.

Al reported potential issue with ARC get_user() as it wasn't clearing
out destination pointer in case of fault due to bad address etc.

Verified using following

| {
|   u32 bogus1 = 0xdeadbeef;
|   u64 bogus2 = 0xdead;
|   int rc1, rc2;
|
|   pr_info("Orig values %x %llx\n", bogus1, bogus2);
|   rc1 = get_user(bogus1, (u32 __user *)0x4000);
|   rc2 = get_user(bogus2, (u64 __user *)0x5000);
|   pr_info("access %d %d, new values %x %llx\n",
|   rc1, rc2, bogus1, bogus2);
| }

| [ARCLinux]# insmod /mnt/kernel-module/qtn.ko
| Orig values deadbeef dead
| access -14 -14, new values 0 0

Reported-by: Al Viro 
Cc: Linus Torvalds 
Cc: linux-snps-arc@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Vineet Gupta 
Signed-off-by: Al Viro 
Signed-off-by: Greg Kroah-Hartman 

---
 arch/arc/include/asm/uaccess.h |   11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

--- a/arch/arc/include/asm/uaccess.h
+++ b/arch/arc/include/asm/uaccess.h
@@ -83,7 +83,10 @@
     "2: ;nop\n"                         \
     "   .section .fixup, \"ax\"\n"      \
     "   .align 4\n"                     \
-    "3: mov %0, %3\n"                   \
+    "3: # return -EFAULT\n"             \
+    "   mov %0, %3\n"                   \
+    "   # zero out dst ptr\n"           \
+    "   mov %1,  0\n"                   \
     "   j   2b\n"                       \
     "   .previous\n"                    \
     "   .section __ex_table, \"a\"\n"   \
@@ -101,7 +104,11 @@
     "2: ;nop\n"                         \
     "   .section .fixup, \"ax\"\n"      \
     "   .align 4\n"                     \
-    "3: mov %0, %3\n"                   \
+    "3: # return -EFAULT\n"             \
+    "   mov %0, %3\n"                   \
+    "   # zero out dst ptr\n"           \
+    "   mov %1,  0\n"                   \
+    "   mov %R1, 0\n"                   \
     "   j   2b\n"                       \
     "   .previous\n"                    \
     "   .section __ex_table, \"a\"\n"   \

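The behaviour the fixup code provides can be modelled in plain user-space C.
This is a sketch for illustration only, not kernel code: a NULL source
pointer stands in for a faulting user address (an assumption made here; the
real macro relies on the exception table), and the `model_get_user_*` names
are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Model of the patched 32-bit path: on a "fault" the destination is
 * cleared and -EFAULT is returned, so a caller that ignores the return
 * value never sees stale data in 'dst'.
 */
static int model_get_user_u32(uint32_t *dst, const uint32_t *src)
{
    assert(dst != NULL);
    if (src == NULL) {
        *dst = 0;          /* mirrors "mov %1, 0" in the fixup */
        return -EFAULT;    /* mirrors "mov %0, %3" */
    }
    *dst = *src;
    return 0;
}

/* 64-bit variant: both halves are cleared ("mov %1, 0" and "mov %R1, 0"). */
static int model_get_user_u64(uint64_t *dst, const uint64_t *src)
{
    assert(dst != NULL);
    if (src == NULL) {
        *dst = 0;
        return -EFAULT;
    }
    *dst = *src;
    return 0;
}
```

On Linux, -EFAULT is -14, which matches the "access -14 -14" line in the
verification log, and the zeroed destinations match "new values 0 0".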




[PATCH 4.7 162/184] ARC: uaccess: get_user to zero out dest in cause of fault

2016-09-22 Thread Greg Kroah-Hartman
4.7-stable review patch.  If anyone has any objections, please let me know.

--

From: Vineet Gupta 

commit 05d9d0b96e53c52a113fd783c0c97c830c8dc7af upstream.

Al reported potential issue with ARC get_user() as it wasn't clearing
out destination pointer in case of fault due to bad address etc.

Verified using following

| {
|   u32 bogus1 = 0xdeadbeef;
|   u64 bogus2 = 0xdead;
|   int rc1, rc2;
|
|   pr_info("Orig values %x %llx\n", bogus1, bogus2);
|   rc1 = get_user(bogus1, (u32 __user *)0x4000);
|   rc2 = get_user(bogus2, (u64 __user *)0x5000);
|   pr_info("access %d %d, new values %x %llx\n",
|   rc1, rc2, bogus1, bogus2);
| }

| [ARCLinux]# insmod /mnt/kernel-module/qtn.ko
| Orig values deadbeef dead
| access -14 -14, new values 0 0

Reported-by: Al Viro 
Cc: Linus Torvalds 
Cc: linux-snps-arc@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Vineet Gupta 
Signed-off-by: Al Viro 
Signed-off-by: Greg Kroah-Hartman 

---
 arch/arc/include/asm/uaccess.h |   11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

--- a/arch/arc/include/asm/uaccess.h
+++ b/arch/arc/include/asm/uaccess.h
@@ -83,7 +83,10 @@
     "2: ;nop\n"                         \
     "   .section .fixup, \"ax\"\n"      \
     "   .align 4\n"                     \
-    "3: mov %0, %3\n"                   \
+    "3: # return -EFAULT\n"             \
+    "   mov %0, %3\n"                   \
+    "   # zero out dst ptr\n"           \
+    "   mov %1,  0\n"                   \
     "   j   2b\n"                       \
     "   .previous\n"                    \
     "   .section __ex_table, \"a\"\n"   \
@@ -101,7 +104,11 @@
     "2: ;nop\n"                         \
     "   .section .fixup, \"ax\"\n"      \
     "   .align 4\n"                     \
-    "3: mov %0, %3\n"                   \
+    "3: # return -EFAULT\n"             \
+    "   mov %0, %3\n"                   \
+    "   # zero out dst ptr\n"           \
+    "   mov %1,  0\n"                   \
+    "   mov %R1, 0\n"                   \
     "   j   2b\n"                       \
     "   .previous\n"                    \
     "   .section __ex_table, \"a\"\n"   \





Re: perf event grouping for dummies (was Re: [PATCH] arc: perf: Enable generic "cache-references" and "cache-misses" events)

2016-09-22 Thread Vineet Gupta
On 09/22/2016 12:56 AM, Peter Zijlstra wrote:
> On Wed, Sep 21, 2016 at 07:43:28PM -0500, Paul Clarke wrote:
>> On 09/20/2016 03:56 PM, Vineet Gupta wrote:
>>> On 09/01/2016 01:33 AM, Peter Zijlstra wrote:
>>>>> - is that what perf event grouping is ?
>>>>
>>>> Again, nope. Perf event groups are single counter (so no implicit
>>>> addition) that are co-scheduled on the PMU.
>>>
>>> I'm not sure I understand - does this require specific PMU/arch support -
>>> as in multiple conditions feeding to same counter.
>>
>> My read is that what Peter meant was that each event in the
>> perf event group is a single counter, so all the events in the group
>> are counted simultaneously.  (No multiplexing.)
> 
> Right, sorry for the poor wording.
> 
>>> Again when you say co-scheduled what do you mean - why would anyone use the
>>> event grouping - is it when they only have 1 counter and they want to count 2
>>> conditions/events at the same time - isn't this same as event multiplexing ?
>>
>> I'd say it's the converse of multiplexing.  Instead of mapping
>> multiple events to a single counter, perf event groups map a set of
>> events each to their own counter, and they are active simultaneously.
>> I suppose it's possible for the _groups_ to be multiplexed with other
>> events or groups, but the group as a whole will be scheduled together,
>> as a group.
> 
> Correct.
> 
> Each event gets its own hardware counter. Grouped events are
> co-scheduled on the hardware.

And if we don't group them, then they _may_ not be co-scheduled (active/counting
at the same time)? But how can this be possible?
Say we have 2 counters; both the commands below

 perf stat -e cycles,instructions hackbench
 perf stat -e '{cycles,instructions}' hackbench

would assign 2 counters to the 2 conditions, which keep counting until perf asks
them to stop (because the profiled application ended).

I don't understand the "scheduling" of counters - once we set them to count,
there is no real intervention/scheduling from software in terms of
disabling/enabling (assuming no multiplexing etc.).

> You can multiplex groups. But if one event in a group is scheduled, they
> all must be.




Re: perf event grouping for dummies (was Re: [PATCH] arc: perf: Enable generic "cache-references" and "cache-misses" events)

2016-09-22 Thread Paul Clarke

On 09/22/2016 12:50 PM, Vineet Gupta wrote:
> On 09/22/2016 12:56 AM, Peter Zijlstra wrote:
>> On Wed, Sep 21, 2016 at 07:43:28PM -0500, Paul Clarke wrote:
>>> On 09/20/2016 03:56 PM, Vineet Gupta wrote:
>>>> On 09/01/2016 01:33 AM, Peter Zijlstra wrote:
>>>>>> - is that what perf event grouping is ?
>>>>>
>>>>> Again, nope. Perf event groups are single counter (so no implicit
>>>>> addition) that are co-scheduled on the PMU.
>>>>
>>>> I'm not sure I understand - does this require specific PMU/arch support -
>>>> as in multiple conditions feeding to same counter.
>>>
>>> My read is that what Peter meant was that each event in the
>>> perf event group is a single counter, so all the events in the group
>>> are counted simultaneously.  (No multiplexing.)
>>
>> Right, sorry for the poor wording.
>>
>>>> Again when you say co-scheduled what do you mean - why would anyone use the
>>>> event grouping - is it when they only have 1 counter and they want to count 2
>>>> conditions/events at the same time - isn't this same as event multiplexing ?
>>>
>>> I'd say it's the converse of multiplexing.  Instead of mapping
>>> multiple events to a single counter, perf event groups map a set of
>>> events each to their own counter, and they are active simultaneously.
>>> I suppose it's possible for the _groups_ to be multiplexed with other
>>> events or groups, but the group as a whole will be scheduled together,
>>> as a group.
>>
>> Correct.
>>
>> Each event gets its own hardware counter. Grouped events are
>> co-scheduled on the hardware.
>
> And if we don't group them, then they _may_ not be co-scheduled
> (active/counting at the same time)? But how can this be possible?
> Say we have 2 counters; both the commands below
>
>  perf stat -e cycles,instructions hackbench
>  perf stat -e '{cycles,instructions}' hackbench
>
> would assign 2 counters to the 2 conditions, which keep counting until perf
> asks them to stop (because the profiled application ended).
>
> I don't understand the "scheduling" of counters - once we set them to count,
> there is no real intervention/scheduling from software in terms of
> disabling/enabling (assuming no multiplexing etc.).


If you assume no multiplexing, then this discussion on grouping is moot.

It depends on how many events you specify, how many counters there are, and 
which counters can count which events.  If you specify a set of events for 
which every event can be counted simultaneously, they will be scheduled 
simultaneously and continuously.  If you specify more events than counters, 
there's multiplexing.  AND, if you specify a set of events, some of which 
cannot be counted simultaneously due to hardware limitations, they'll be 
multiplexed.

PC
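
To make the conditions above concrete, here is a minimal sketch in C. The
function names are hypothetical. `needs_multiplexing` covers only the
simplest case, where any counter can count any event, and does not model
per-counter hardware constraints. `scale_count` shows the usual estimate
tools apply to a multiplexed event, based on the times reported when
PERF_FORMAT_TOTAL_TIME_ENABLED and PERF_FORMAT_TOTAL_TIME_RUNNING are
requested in read_format.

```c
#include <assert.h>
#include <stdint.h>

/* Simplest case only: more events than counters forces multiplexing. */
static int needs_multiplexing(int nr_events, int nr_counters)
{
    assert(nr_events >= 0 && nr_counters >= 0);
    return nr_events > nr_counters;
}

/*
 * Estimate what a multiplexed event would have counted had it been on a
 * counter the whole time: raw * time_enabled / time_running.
 * Returns 0 when the event never ran, since nothing can be estimated.
 */
static uint64_t scale_count(uint64_t raw, uint64_t time_enabled,
                            uint64_t time_running)
{
    if (time_running == 0)
        return 0;
    /* long double keeps the intermediate product from overflowing */
    return (uint64_t)((long double)raw * time_enabled / time_running);
}
```

An event that was on a counter for half of the enabled time has its raw
count doubled; an event that ran the whole time is reported unchanged.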




Re: perf event grouping for dummies (was Re: [PATCH] arc: perf: Enable generic "cache-references" and "cache-misses" events)

2016-09-22 Thread Arnaldo Carvalho de Melo
Em Thu, Sep 22, 2016 at 01:23:04PM -0500, Paul Clarke escreveu:
> On 09/22/2016 12:50 PM, Vineet Gupta wrote:
> >On 09/22/2016 12:56 AM, Peter Zijlstra wrote:
> >>On Wed, Sep 21, 2016 at 07:43:28PM -0500, Paul Clarke wrote:
> >>>On 09/20/2016 03:56 PM, Vineet Gupta wrote:
> >>>>On 09/01/2016 01:33 AM, Peter Zijlstra wrote:
> >>>>>>- is that what perf event grouping is ?
> >>>>>
> >>>>>Again, nope. Perf event groups are single counter (so no implicit
> >>>>>addition) that are co-scheduled on the PMU.
> >>>>
> >>>>I'm not sure I understand - does this require specific PMU/arch support -
> >>>>as in multiple conditions feeding to same counter.
> >>>
> >>>My read is that what Peter meant was that each event in the
> >>>perf event group is a single counter, so all the events in the group
> >>>are counted simultaneously.  (No multiplexing.)
> >>
> >>Right, sorry for the poor wording.
> >>
> >>>>Again when you say co-scheduled what do you mean - why would anyone use
> >>>>the event grouping - is it when they only have 1 counter and they want to
> >>>>count 2 conditions/events at the same time - isn't this same as event
> >>>>multiplexing ?
> >>>
> >>>I'd say it's the converse of multiplexing.  Instead of mapping
> >>>multiple events to a single counter, perf event groups map a set of
> >>>events each to their own counter, and they are active simultaneously.
> >>>I suppose it's possible for the _groups_ to be multiplexed with other
> >>>events or groups, but the group as a whole will be scheduled together,
> >>>as a group.
> >>
> >>Correct.
> >>
> >>Each event gets its own hardware counter. Grouped events are
> >>co-scheduled on the hardware.
> >
> >And if we don't group them, then they _may_ not be co-scheduled
> >(active/counting at the same time)? But how can this be possible?
> >Say we have 2 counters; both the commands below
> >
> > perf stat -e cycles,instructions hackbench
> > perf stat -e '{cycles,instructions}' hackbench
> >
> >would assign 2 counters to the 2 conditions, which keep counting until perf
> >asks them to stop (because the profiled application ended).
> >
> >I don't understand the "scheduling" of counters - once we set them to count,
> >there is no real intervention/scheduling from software in terms of
> >disabling/enabling (assuming no multiplexing etc.).

So, getting this machine as an example:

[0.067739] smpboot: CPU0: Intel(R) Core(TM) i7-3667U CPU @ 2.00GHz (family: 0x6, model: 0x3a, stepping: 0x9)
[0.067744] Performance Events: PEBS fmt1+, 16-deep LBR, IvyBridge events, full-width counters, Intel PMU driver.
[0.067774] ... version:3
[0.067776] ... bit width:  48
[0.06] ... generic registers:  4
[0.067778] ... value mask: 
[0.067779] ... max period: 
[0.067780] ... fixed-purpose events:   3
[0.067781] ... event mask: 0007000f
[0.068694] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.

[root@zoo ~]# perf stat -e '{branch-instructions,branch-misses,bus-cycles,cache-misses}' ls a
ls: cannot access 'a': No such file or directory

 Performance counter stats for 'ls a':

           356,090      branch-instructions
            17,170      branch-misses             #    4.82% of all branches
           232,365      bus-cycles
            12,107      cache-misses

       0.003624967 seconds time elapsed

[root@zoo ~]# perf stat -e '{branch-instructions,branch-misses,bus-cycles,cache-misses,cpu-cycles}' ls a
ls: cannot access 'a': No such file or directory

 Performance counter stats for 'ls a':

     <not counted>      branch-instructions       (0.00%)
     <not counted>      branch-misses             (0.00%)
     <not counted>      bus-cycles                (0.00%)
     <not counted>      cache-misses              (0.00%)
     <not counted>      cpu-cycles                (0.00%)

       0.003659678 seconds time elapsed

[root@zoo ~]#

That was as a group, i.e. with those {} enclosing the event list. If you run it
with -vv you'll see, among other things, the "group_fd" parameter passed to the
sys_perf_event_open syscall:

[root@zoo ~]# perf stat -vv -e '{branch-instructions,branch-misses,bus-cycles,cache-misses,cpu-cycles}' ls a
sys_perf_event_open: pid 28581  cpu -1  group_fd -1  flags 0x8
sys_perf_event_open: pid 28581  cpu -1  group_fd 3  flags 0x8
sys_perf_event_open: pid 28581  cpu -1  group_fd 3  flags 0x8
sys_perf_event_open: pid 28581  cpu -1  group_fd 3  flags 0x8
sys_perf_event_open: pid 28581  cpu -1  group_fd 3  flags 0x8
ls: cannot access 'a': No such file or directory

 Performance counter stats for 'ls a':

   branch-ins
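
The group_fd pattern in the -vv output (first event opened with -1, the
rest with the leader's fd) can be reproduced directly with the
perf_event_open syscall. The following is only a sketch under stated
assumptions: it counts for the calling thread on any CPU, passes flags 0
instead of the 0x8 shown above, excludes kernel-mode counts, and the helper
names `make_hw_attr` and `open_group` are hypothetical.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Fill in an attr for a generic hardware event, e.g. PERF_COUNT_HW_CPU_CYCLES. */
static struct perf_event_attr make_hw_attr(uint64_t config)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = config;
    attr.exclude_kernel = 1;   /* assumption: user-space counts only */
    return attr;
}

/*
 * Open 'n' hardware events on the calling thread as ONE group.
 * The first event is the leader (group_fd = -1); every other event passes
 * the leader's fd as group_fd. Returns n on success, or -1 after closing
 * whatever was opened (for example when perf_event_open is not permitted).
 */
static int open_group(const uint64_t *configs, int n, int *fds)
{
    assert(configs != NULL && fds != NULL && n > 0);
    for (int i = 0; i < n; i++) {
        struct perf_event_attr attr = make_hw_attr(configs[i]);
        int group_fd = (i == 0) ? -1 : fds[0];

        /* glibc provides no wrapper for perf_event_open, so use syscall(2) */
        fds[i] = (int)syscall(SYS_perf_event_open, &attr,
                              (pid_t)0, -1, group_fd, 0UL);
        if (fds[i] < 0) {
            while (i-- > 0)
                close(fds[i]);
            return -1;
        }
    }
    return n;
}
```

Opening each event with group_fd = -1 instead would create independent
events, which is what the ungrouped form of the -e option corresponds to.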

Re: [PATCH 0/3] ARC unwinder switch to .eh_frame

2016-09-22 Thread Vineet Gupta
Hi Daniel,

On 09/19/2016 11:13 PM, Vineet Gupta wrote:
> On 09/19/2016 06:21 PM, Daniel Mentz wrote:
>> > Hi Vineet,
>> > 
>> > Thank you for your patches. I applied them, and verified that
>> > unwinding works with code that is compiled into the kernel image as
>> > well as kernel modules.
>> > I confirmed that the .eh_frame section is present and that the
>> > .debug_frame section is absent. I also verified that the file size of
>> > the .ko files are small enough for our embedded platform and that
>> > unnecessary sections like .debug_info, .debug_line, .debug_str etc.
>> > are also absent.
>> > Is there anything else you want me to test?
>> > Thanks
>> > Daniel
> Nope - that is it. Just wanted to make sure it is "field" tested :-)
> 
> Thx,
> -Vineet

FYI - I've pushed some more changes to unwinding department so that we get
call-traces for memset/memcpy et al.

git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc.git  #for-next

-Vineet





modules still have .debug_* (was Re: [PATCH 0/3] ARC unwinder switch to .eh_frame)

2016-09-22 Thread Vineet Gupta
Hi Daniel,

On 09/19/2016 06:21 PM, Daniel Mentz wrote:
> I confirmed that the .eh_frame section is present and that the
> .debug_frame section is absent. I also verified that the file size of
> the .ko files are small enough for our embedded platform and that
> unnecessary sections like .debug_info, .debug_line, .debug_str etc.
> are also absent.

BTW it seems with my latest set of patches, modules still have .debug_*.
Can you double check if your tree still has the interim patch which added a
linker script for modules to strip out .debug_*

http://lists.infradead.org/pipermail/linux-snps-arc/2016-September/001483.html

I'm not planning to carry it and would prefer addressing the root cause by
removing the -gdwarf-2 toggle. I've added that and pushed the rebased series.
Care to take it for a respin please.

Thx,
-Vineet



Re: modules still have .debug_* (was Re: [PATCH 0/3] ARC unwinder switch to .eh_frame)

2016-09-22 Thread Daniel Mentz
On Thu, Sep 22, 2016 at 1:59 PM, Vineet Gupta wrote:
> Hi Daniel,
>
> On 09/19/2016 06:21 PM, Daniel Mentz wrote:
>> I confirmed that the .eh_frame section is present and that the
>> .debug_frame section is absent. I also verified that the file size of
>> the .ko files are small enough for our embedded platform and that
>> unnecessary sections like .debug_info, .debug_line, .debug_str etc.
>> are also absent.
>
> BTW it seems with my latest set of patches, modules still have .debug_*.
> Can you double check if your tree still has the interim patch which added a
> linker script for modules to strip out .debug_*
>
> http://lists.infradead.org/pipermail/linux-snps-arc/2016-September/001483.html

Hi Vineet,

Sorry, that was a misunderstanding. Buildroot routinely runs the strip
command on .ko files before installing them on the target. I was only
looking at the .ko files *after* running the strip command. No, the
interim patch was not in my tree.

I confirmed that your commit "ARC: dw2 unwind: don't force dwarf 2" is
indeed necessary to suppress the .debug_* sections when
CONFIG_DEBUG_INFO is off. But again, we're stripping .ko files anyways
before installing.

> I'm not planning to carry it and would prefer addressing the root cause by
> removing the -gdwarf-2 toggle. I've added that and pushed the rebased series.
> Care to take it for a respin please.

I downloaded your latest commit
e47305af57d7eedc10b4720e604d669b10c69e3b and verified that stack
traces are properly displayed for code inside kernel modules as well
as vmlinux. I also called memcpy() on some bad address and got a
proper stack trace that involved memcpy().

I conclude that unwinding works for us.

Thank You
 Daniel
