Re: [GSoC] Application RFC + Question - GENERIC dump

2024-04-02 Thread Richard Biener via Gcc
On Mon, Apr 1, 2024 at 6:23 PM Thor Preimesberger via Gcc
 wrote:
>
> Hey all,
>
> I'm joining the group of people submitting their GSoC applications
> over the holiday. I'm interested in the "Implement structured dumping
> of GENERIC" project idea, and the application I've written is below.

Thank you for the interest in this project.

> A quick question before though:
>
> - What would the expected use cases of the proposed
> -fdump-generic-nodes option be, in addition to, presumably, writing
> front ends into gcc?

I think the main use case is better "visual" debugging and understanding
of GENERIC.  Structured dumping would also allow custom processing,
such as gathering memory or other statistics.

> I'm also curious about targeting .gz/Graphviz; at first
> blush, it doesn't seem like too much additional work, and it may be
> useful for the above applications. But I imagine there may be other
> ways to process the data that would ultimately be more useful.

Reading your message top-down I think that dumping in a structured format like
JSON would allow targeting graphviz as postprocessing.
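
As a rough illustration of that postprocessing step, the sketch below turns a JSON tree dump into Graphviz DOT text. The JSON shape used here (objects with "code", optional "value", and "operands" keys) and the function name json_to_dot are assumptions made for this example only; no such schema has been settled in this thread.

```python
import json


def json_to_dot(text):
    """Convert a JSON tree dump (hypothetical schema) into Graphviz DOT text."""
    root = json.loads(text)
    lines = ["digraph generic {"]
    counter = 0

    def visit(node):
        # Give every node a unique DOT identifier: n0, n1, n2, ...
        nonlocal counter
        node_id = f"n{counter}"
        counter += 1
        label = node["code"]
        if "value" in node:
            # "\\n" is a literal backslash-n, which DOT renders as a line break.
            label += f"\\n{node['value']}"
        lines.append(f'  {node_id} [label="{label}"];')
        # Recurse into operands and draw an edge from parent to each child.
        for child in node.get("operands", []):
            child_id = visit(child)
            lines.append(f"  {node_id} -> {child_id};")
        return node_id

    visit(root)
    lines.append("}")
    return "\n".join(lines)
```

The resulting text could then be fed to the Graphviz "dot" tool to render an image; labels are not escaped here, so values containing double quotes would need extra handling.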

> Best,
> Thor Preimesberger
>
> 
>
>
> Background:
>
> I'm an undergraduate student in pure mathematics who tinkers with
> technology in his free time. I've taken an interest in compilers as of
> last summer. I've written a couple of parsers, by hand, and a couple
> of toy passes in LLVM. I'm currently working through the code
> generation parts of the Dragon Book, in between my current course
> work. I'm familiar with C and C++, and I'm currently taking courses
> (on quantum information science, digital design, and computer
> architecture) that focus on topics adjacent or tertiary to compiler
> engineering.
> In the mathematical part of my life, I mostly concentrate on
> geometric analysis, and I've taken a few post graduate courses, on
> Ricci flow and on variational/geometric partial differential
> equations. These topics don't really capture all the mathematics I'm
> interested in, and I don't think any of this academic work is directly
> relevant to this project. But I hope that it conveys that I enjoy
> deep, technical work that interpolates between multiple levels of
> abstraction.
> I believe compiler engineering shares this same aesthetic appeal.
> This - and the pragmatic, altruistic nature of compiler engineering -
> draws me to the field and to GCC in particular.
>
>
> Expected Outcome:
> - A patch in the main GCC repository that adds an additional dump
> option (-fdump-generic-nodes) for the GENERIC intermediate
> representation that preserves its tree structure before it is lowered
> to GIMPLE. We want to initially target JSON, and then provide a
> human-readable translation into HTML.
>
> Additional features/visualizations are possible, but I would need
> to discuss them with the mentor, Richard Biener.
>
> Timeline:
>
> Pre-GSoC
>
> I've already built gcc, with and without offloading, and have
> successfully passed the tests under make-gcc. (Well, most of the tests
> for the version of GCC with offloading - I understand that that is to
> be expected.) I'm currently compiling some nontrivial programs to
> various stages of completion, and toying with them in GDB and with
> debug options.
>
> After this, I want to better understand the architecture of GCC
> and its intermediate representations. I would achieve this by reading
> the documentation and dumping lots of code.
>
> Contemporaneously, I would want to at least understand, if not
> possibly fix, a few items/bugs in the GCC Bugzilla. Here are two I
> want to look at, picked wholly by individual interest:
>
> - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38833
> - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97444
>
> (I'm happy to take a look at any issues anyone recommends -
> especially if they are more germane to the project than the above!)

I don't remember any particular bugs around dumping of GENERIC but
there are bugs tagged with the easyhack keyword.

Personally I find memory-hog and compile-time hog issues rewarding
to fix, and at times they are an interesting way to understand (tiny)
bits of GCC very well.

> GSoC (Starting the week of May 27th, to August 26th)
>
> Community Bonding
>
> Understand the previously submitted patch in (
> https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646295.html )
> Understand all the intended uses of the project
> Scope out possible augmentations and begin researching them to the
> project after discussing with mentor.
> Continue patching effort, if it's not finished.
>
>
> Weeks 1-4
> Begin working on minimal viable product (GENERIC to JSON, and JSON to HTML)
> Finish scoping possible augmentations by week 4,
> Begin development on any augmentations once approval is obtained
>
> Weeks 4 - 8
> Continue working on minimal viable product
>
> Weeks 8 - 13
> Complete minimal viable product if it is not finis

Fwd: [GSoC] Application RFC + Question - GENERIC dump

2024-04-02 Thread Thor Preimesberger via Gcc
Forgot to CC the mailing list - mea culpa.

-- Forwarded message -
From: Thor Preimesberger 
Date: Tue, Apr 2, 2024 at 5:57 PM
Subject: Re: [GSoC] Application RFC + Question - GENERIC dump
To: Richard Biener 


Thanks for the quick feedback, especially on such short notice - I'll
get the actual GSoC application in, within a couple of hours.

> This looks like a reasonable timeline and overall project structure.  I
> probably pointed to it in my responses to the initial patch but to repeat
> here it would be very nice to integrate with the existing GENERIC dumping
> in tree-pretty-print.cc.  That's what you get when calling 'debug_tree ()'
> from the inferior inside gdb.  Implementation-wise the JSON target would
> then be a new dump flag (see the various TDF_* in dumpfiles.h).

Understood - To check my understanding, is it correct to say that we
essentially want to parse the output of tree-pretty-print.cc into
JSON? This parser would then be used in a new dump pass, from
either inside gdb or from gcc itself, to dump the JSON wherever it
needs to go.

So ultimately the idea is to create both the parser and a new dump pass from it.

I just read the notes you gave on the original patch. I'll edit the
plan a bit to emphasize interworking with tree-pretty-print, and
checking the bugs that mention easyhack.

Best,
Thor Preimesberger



Re: [GSoC] Application RFC + Question - GENERIC dump

2024-04-02 Thread Richard Biener via Gcc
On Tue, Apr 2, 2024 at 11:14 AM Thor Preimesberger via Gcc
 wrote:
>
> Forgot to CC the mailing list - mea culpa.
>
> -- Forwarded message -
> From: Thor Preimesberger 
> Date: Tue, Apr 2, 2024 at 5:57 PM
> Subject: Re: [GSoC] Application RFC + Question - GENERIC dump
> To: Richard Biener 
>
>
> Thanks for the quick feedback, especially on such short notice - I'll
> get the actual GSoC application in, within a couple of hours.
>
> > This looks like a reasonable timeline and overall project structure.  I
> > probably pointed to it in my responses to the initial patch but to repeat
> > here it would be very nice to integrate with the existing GENERIC dumping
> > in tree-pretty-print.cc.  That's what you get when calling 'debug_tree ()'
> > from the inferior inside gdb.  Implementation-wise the JSON target would
> > then be a new dump flag (see the various TDF_* in dumpfiles.h).
>
> Understood - To check my understanding, is it correct to say that we
> essentially want to parse the output of tree-pretty-print.cc into
> JSON?

No, we want to emit JSON directly from tree-pretty-print.cc, conditional
on the new dump flag.

> This parser would then be used in a new dump pass, from
> either inside gdb or from gcc itself, to dump the JSON wherever it
> needs to go.

For the actual -fdump-generic-nodes we would call the dumper with the
new flag and would likely have set up the output to go to a file.

> So ultimately the idea is to create both the parser and a new dump pass
> from it.

I don't think there's a parser involved.  In the end we'd have to "parse"
the JSON to produce HTML or graphviz output, but the JSON emission would be
from inside dump_generic_node (and siblings), conditional on the dump flag.
Note that a lot of things will be dumped the same, such as identifiers or
constants, but all structured bits would be different.
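
To make the shape of that idea concrete, here is a toy model, written in Python rather than GCC's C++, of a single dumper whose output depends on a flag. Everything in it is hypothetical: TDF_JSON is only a stand-in for whatever new TDF_* flag would be added, and dump_node and dump_leaf are invented names that do not mirror the real dump_generic_node interface.

```python
import json

TDF_JSON = 1 << 0  # hypothetical stand-in for a new TDF_* dump flag


def dump_leaf(node):
    """Shared printer for identifiers and constants, used by both modes."""
    return str(node["value"])


def dump_node(node, flags=0):
    """Dump a toy expression tree as text, or as JSON if TDF_JSON is set."""
    if "operands" not in node:
        # Leaves are printed the same way; JSON mode only adds string quoting.
        text = dump_leaf(node)
        return json.dumps(text) if flags & TDF_JSON else text
    ops = [dump_node(op, flags) for op in node["operands"]]
    if flags & TDF_JSON:
        # Structured bits differ: emit an object instead of infix text.
        return '{"code": %s, "operands": [%s]}' % (
            json.dumps(node["code"]), ", ".join(ops))
    return "(" + (" %s " % node["symbol"]).join(ops) + ")"
```

With a tree for x + 1, the default call yields "(x + 1)", while passing TDF_JSON yields a JSON object with the same leaves as quoted strings.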

Richard.

> I just read the notes you gave on the original patch. I'll edit the
> plan a bit to emphasize interworking with tree-pretty-print, and
> checking the bugs that mention easyhack.
>
> Best,
> Thor Preimesberger
>
>

Re: GSoC Timeline Review

2024-04-02 Thread David Malcolm via Gcc
On Sat, 2024-03-30 at 13:54 +0200, Nada Elsayed wrote:
> I think that I didn't fully understand the project, so I read more
> and
> updated the Timeline Suggestion.

Hi Nada

I'm very sorry for not responding sooner; I've been dealing with a
difficult issue that's arisen outside of my computer work, but that's a
poor excuse.

The deadline for applications is in a few hours time, so please go
ahead and get an application in if you haven't done so already.  Google
are very strict about the deadlines.

Amongst other things, a good application should:

* describe/give evidence of your ability/skills in C++ (e.g. do you
have a github account?)
* describe/give evidence of your knowledge of the CPython extension
API.  I think you mentioned that you had experimented with this
to get a feel for it, and wrote some toy examples; can you send me the
code you wrote please?

If you've already posted an application, feel free to send this info
separately to me (and I'm sorry again for leaving my reply so late).

Have you tried building GCC from source yet?  That would be a good
thing to do (and to mention in an application).

Various notes inline below, throughout...

> 
> Suggested Timeline:
> 
>    - May 1-26:
>       - Explore Cython modules and try more realistic codes to see how it
>         translates Python to c/c++.


Note that "Cython" and "CPython" are different things.

"CPython" is the C implementation of Python (i.e. the one that most
people use, as opposed to, say, PyPy, which is a different
implementation of the language used by advanced users with performance
requirements).

"Cython" is a tool for generating .c source files for CPython extension
modules from a .pyx language that's a mixture of C and Python-like
syntax.

The project should primarily be about CPython extension modules.  The
code generated by Cython tends to be correct, so I'm much less
interested in analyzing it.

So in your proposal above it should talk about CPython, rather than
Cython.

>       - Know more about entry-points that Cython uses.

Similarly here.



>       - Understand common bugs that happen when converting Python to
>         c/c++.
>    - Explore static analysis tool for CPython Extension code -which is
>      written in Python- and try this analyzer to understand the bugs in
>      translated Python code fully.

Sadly this old project of mine has been bit-rotting for years, and
doesn't work well anymore.  But hopefully it's still useful for getting
ideas.

>    - Know how we can emit warnings and errors.
>    - Debug the codebase to grasp its structure and potential areas for
>      improvement.

I'd like us also to create a page on the gcc wiki to capture some of
the ideas.

> 
> 
>    - Weeks 1-2:
>       - Understand more about reference counting verification.
>       - Develop verifying reference counts for PyObjects passed as
>         parameters.
>    - Weeks 3-4:
>       - Begin to investigate Error Handling Checking.
>       - Understand how the Static Analysis tool does Error Handling
>         checking.
>       - Implement these checks in the plugin.
>    - Weeks 5-7:
>       - Begin to investigate Exception Handling Checking.
>       - Understand how the Static Analysis tool does Exception Handling
>         checking.
>       - Implement these checks in the plugin.
>    - Weeks 8-11
>       - Begin to investigate Format String Checking.
>       - Understand how the Static Analysis tool does Format String
>         Checking.
>       - Implement these checks in the plugin.
>    - Week 12
>       - Writing the GSoC wrapping-up document.

This timeline is very ambitious; last year Eric spent a lot of time
simply understanding the way the analyzer represents the state of the
program.

> 
> 
> On Wednesday, 27 March 2024 at 2:31 AM, Nada Elsayed
> <nadaelsayed...@gmail.com> wrote:
> 
> > Greetings All,
> > Hope this email finds you well.
> > I am interested in "Extend the plugin to add checking for usage of
> > the
> > CPython API" project. First of all, I built the library, and now I
> > am
> > trying to debug it. Then, I also used Cpython in 3 demos to
> > understand how
> > it works. Finally, I read the uploaded patch comments to understand
> > the
> > codebase and file structure.

As I mentioned above, please send me the demo code you wrote.

What timezone are you in?  (I'm in EDT, UTC+4)

Sorry again for not responding sooner, and if you haven't applied yet,
you should do that ASAP as it looks like the deadline is in 4 hours'
time; prioritize getting that in over responding to my other questions.

Let me know if you need help with anything.

Thanks
Dave


> > 
> > I was wondering if you could review my suggested timeline?
> > suggested Timeline:
> > 
> >    -
> > 
> >    May 

Re: GSoC Timeline Review

2024-04-02 Thread David Malcolm via Gcc
On Tue, 2024-04-02 at 10:06 -0400, David Malcolm wrote:
> What timezone are you in?  (I'm in EDT, UTC+4)

Sorry, that should be UTC-4 (on the east coast of the US)

Dave




Re: GSoC Timeline Review

2024-04-02 Thread Martin Jambor
Hello,

On Sat, Mar 30 2024, Nada Elsayed via Gcc wrote:
> I think that I didn't fully understand the project, so I read more and
> updated the Timeline Suggestion.

Sorry that we were not able to respond sooner; unfortunately, Easter got
in the way.  I do not know much about Cython or static
analysis (so I would not be able to give much advice about improvements
to the time-line), but can confirm we have received your application.

Thanks a lot.

Martin

>
> Suggested Timeline:
>
>    - May 1-26:
>       - Explore Cython modules and try more realistic codes to see how it
>         translates Python to c/c++.
>       - Know more about entry-points that Cython uses.
>       - Understand common bugs that happen when converting Python to c/c++.
>    - Explore static analysis tool for CPython Extension code -which is
>      written in Python- and try this analyzer to understand the bugs in
>      translated Python code fully.
>    - Know how we can emit warnings and errors.
>    - Debug the codebase to grasp its structure and potential areas for
>      improvement.
>    - Weeks 1-2:
>       - Understand more about reference counting verification.
>       - Develop verifying reference counts for PyObjects passed as
>         parameters.
>    - Weeks 3-4:
>       - Begin to investigate Error Handling Checking.
>       - Understand how the Static Analysis tool does Error Handling
>         checking.
>       - Implement these checks in the plugin.
>    - Weeks 5-7:
>       - Begin to investigate Exception Handling Checking.
>       - Understand how the Static Analysis tool does Exception Handling
>         checking.
>       - Implement these checks in the plugin.
>    - Weeks 8-11
>       - Begin to investigate Format String Checking.
>       - Understand how the Static Analysis tool does Format String
>         Checking.
>       - Implement these checks in the plugin.
>    - Week 12
>       - Writing the GSoC wrapping-up document.
>
>
> On Wednesday, 27 March 2024 at 2:31 AM, Nada Elsayed
> <nadaelsayed...@gmail.com> wrote:
>
>> Greetings All,
>> Hope this email finds you well.
>> I am interested in "Extend the plugin to add checking for usage of the
>> CPython API" project. First of all, I built the library, and now I am
>> trying to debug it. Then, I also used Cpython in 3 demos to understand how
>> it works. Finally, I read the uploaded patch comments to understand the
>> codebase and file structure.
>>
>> I was wondering if you could review my suggested timeline?
>> suggested Timeline:
>>
>>    - May 1-26:
>>       - Explore Cython modules, emphasizing entry-points and bug
>>         identification.
>>       - Study analyzers, particularly cpy-analyzer, to enhance
>>         understanding.
>>       - Debug the codebase to grasp its structure and potential areas for
>>         improvement.
>>       - Focus investigation on "errors in GIL handling" and "tp_traverse
>>         errors".
>>    - Weeks 1-6:
>>       - Investigate GIL (Global Interpreter Lock) errors extensively.
>>       - Engage in discussions and develop viable solutions to address
>>         identified issues.
>>    - Weeks 7-12:
>>       - Gain insight into the functioning of the Garbage Collector.
>>       - Implement checks to mitigate traverse errors effectively.
>>       - Ensure robust error handling mechanisms are in place through
>>         thorough study and practical implementation.
>>
>>


Re: [GSoC] Interest in applying

2024-04-02 Thread Martin Jambor
Hello,

On Sun, Mar 31 2024, tmpod via Gcc wrote:
> Hello,
>
> I am a Computer Science student, currently taking a Master's degree in
> Portugal's top university. I have a strong passion for algorithms, data
> structures and high performance computing, having participated in many
> programming contests (nationals, SWERC, Google's now defunct
> competitions, etc) too. I have also some experience in contributing to
> open-source and working in software development teams.
> C and C++ are the languages I've used the most, always compiled with
> GCC. My fascination with this area of computer science, and especially
> with GCC's details, has been present since an early age, so this Google
> program felt like the perfect opportunity to act and learn more!
>
> After carefully reading your approved ideas page, the ones that stand
> out more to me are:
> * Improving OpenACC support
> * Extending the static analysis pass (for format strings, in particular)
> * Improving nothrow detection
>
> I have read the "Before you apply" and completed some of the steps
> outlined, and will do so as well for the remaining, tomorrow.
>
> I am aware of the very tight deadline and that it may be hard to write a
> community-backed proposal now, but unfortunately I was only made aware
> of Google's program a couple of days ago. Still, I'm going to try my
> best to write a quality proposal until Tuesday.
>
> If you have any suggestions, please let me know!
> I'd love to further discuss these with you.
>

We are delighted you found contributing to GCC interesting.

As you correctly wrote, the deadline is tight (and Easter has made the
situation worse), so I cannot really write more than that we are looking
forward to seeing your application.  Most of the projects you have listed have
been discussed on the mailing list recently, so hopefully you have found
some of the notes in the archive.  

Good luck!

Martin


Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-02 Thread Sandra Loosemore

On 4/1/24 09:06, Mark Wielaard wrote:

> A big thanks to everybody working this long Easter weekend who helped
> analyze the xz-backdoor and making sure the impact on Sourceware and
> the hosted projects was minimal.
>
> This email isn't about the xz-backdoor itself. Do see Sam James FAQ
> https://gist.github.com/thesamesam/223949d5a074ebc3dce9ee78baad9e27
> (Sorry for the github link, but this one does seem viewable without
> proprietary javascript)
>
> We should discuss what we have been doing and should do more to
> mitigate and prevent the next xz-backdoor. There are a couple of
> Sourceware services that can help with that.
>
> TLDR;
> - Replicatable isolated container/VMs are nice, we want more.
> - autoregen buildbots, it should be transparent (and automated) how to
>    regenerate build/source files.
> - Automate (snapshot) releases tarballs.
> - Reproducible releases (from git).

[snip]


While I appreciate the effort to harden the Sourceware infrastructure 
against malicious attacks and want to join in on thanking everyone who 
helped analyze this issue, to me it seems like the much bigger problem 
is that XZ had a maintainer who appears to have acted in bad faith.  Are 
the development processes used by the GNU toolchain components robust 
enough to cope with deliberate sabotage of the code base?  Do we have 
enough eyes available to ensure that every commit, even those by 
designated maintainers, is vetted by someone else?  Do we need to harden
our process, too, to require all patches to be signed off by someone else
before committing?


-Sandra




Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-02 Thread Paul Eggert

On 4/2/24 12:54, Sandra Loosemore wrote:
> Do we need to harden our process, too, to require all patches to be signed
> off by someone else before committing?


It's easy for an attacker to arrange to have "someone else" in cahoots.

Although signoffs can indeed help catch inadvertent mistakes, they're 
relatively useless against determined attacks of this form, and we must 
assume that nation-state attackers will be determined.


Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-02 Thread Paul Koning via Gcc



> On Apr 2, 2024, at 4:03 PM, Paul Eggert  wrote:
> 
> On 4/2/24 12:54, Sandra Loosemore wrote:
>> Do we need to harden our process, too, to require all patches to be signed
>> off by someone else before committing?
> 
> It's easy for an attacker to arrange to have "someone else" in cahoots.
> 
> Although signoffs can indeed help catch inadvertent mistakes, they're 
> relatively useless against determined attacks of this form, and we must 
> assume that nation-state attackers will be determined.

Another consideration is the size of the project.  "Many eyeballs" helps if 
there are plenty of people watching.  For smaller tools that have only a small 
body of contributors, it's easier for one or two malicious ones to subvert 
things.

Would it help to require (rather than just recommend) "don't use root except 
for the actual 'install' step" ?

paul



Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-02 Thread Ian Lance Taylor via Gcc
On Tue, Apr 2, 2024 at 1:21 PM Paul Koning via Gcc  wrote:
>
> Would it help to require (rather than just recommend) "don't use root except 
> for the actual 'install' step" ?

Seems reasonable, but note that it wouldn't make any difference to
this attack.  The liblzma library was modified to corrupt the sshd
binary, when sshd was linked against liblzma.  The actual attack
occurred via a connection to a corrupt sshd.  If sshd was running as
root, as is normal, the attacker had root access to the machine.  None
of the attacking steps had anything to do with having root access
while building or installing the program.

Ian


Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-02 Thread Guinevere Larsen via Gcc

On 4/2/24 16:54, Sandra Loosemore wrote:

> On 4/1/24 09:06, Mark Wielaard wrote:
>
>> A big thanks to everybody working this long Easter weekend who helped
>> analyze the xz-backdoor and making sure the impact on Sourceware and
>> the hosted projects was minimal.
>>
>> This email isn't about the xz-backdoor itself. Do see Sam James FAQ
>> https://gist.github.com/thesamesam/223949d5a074ebc3dce9ee78baad9e27
>> (Sorry for the github link, but this one does seem viewable without
>> proprietary javascript)
>>
>> We should discuss what we have been doing and should do more to
>> mitigate and prevent the next xz-backdoor. There are a couple of
>> Sourceware services that can help with that.
>>
>> TLDR;
>> - Replicatable isolated container/VMs are nice, we want more.
>> - autoregen buildbots, it should be transparent (and automated) how to
>>    regenerate build/source files.
>> - Automate (snapshot) releases tarballs.
>> - Reproducible releases (from git).
>
> [snip]
>
> While I appreciate the effort to harden the Sourceware infrastructure
> against malicious attacks and want to join in on thanking everyone who
> helped analyze this issue, to me it seems like the much bigger problem
> is that XZ had a maintainer who appears to have acted in bad faith.
> Are the development processes used by the GNU toolchain components
> robust enough to cope with deliberate sabotage of the code base?  Do
> we have enough eyes available to ensure that every commit, even those
> by designated maintainers, is vetted by someone else?  Do we need to
> harden our process, too, to require all patches to be signed off by
> someone else before committing?
>
> -Sandra


What likely happened for the maintainer who acted in bad faith was that 
they entered the project with bad faith intent from the start - seeing 
as they were only involved with the project for 2 years, and there was 
much social pressure from fake email accounts for the single maintainer 
of XZ to accept help.


While we would obviously like to have more area maintainers and possibly 
global maintainers to help spread the load, I don't think any of the 
projects listed here are all that susceptible to the same type of social 
engineering. For one, getting the same type of blanket approval would be 
a much more involved process because we already have a reasonable number 
of people with those privileges, and no one is dealing with burnout and 
sassy customers saying we aren't doing enough.


Beyond that, we (GDB) are already experimenting with approved-by, and I 
think glibc was doing the same. That guarantees at least a second set of 
eyes that analyzed and agreed with the patch. I don't think signed-off 
would add more than that tag (even if security was not the reason why we 
implemented them).


--
Cheers,
Guinevere Larsen
She/Her/Hers



Re: AutoFDO tools for GCC

2024-04-02 Thread Snehasish Kumar via Gcc
Thanks for initiating this discussion Eugene. For a little bit more context
on the motivation -- Meta has developed a new type of AutoFDO which is
committed upstream in LLVM and we want to unify our tooling with this
approach.

> I do wonder how much common code there is
> between the LLVM and the GCC tooling though and whether it makes sense
> to keep it common (and working with both frontends)?

The key components are the perf.data reader, profile construction data
structures and the profile writer. The reader parses perf.data as protobuf
[1] and suffers from a few drawbacks (as Andi pointed out). The
intermediate data structure which represents the profile is shared
(~1500LOC in [2]). Finally, LLVM and GCC have their own bespoke profile
writers [3]. So given the drawbacks of the reader and limited sharing I think
it would be best to fork these tools into the GCC repo. Having perf
generate the profiles is an interesting idea and in addition to addressing
the issues Andi raised, would also simplify replicated symbolization logic.
In fact, the new implementation in LLVM parses perf script output [4] to
generate AutoFDO profiles. Finally, if AutoFDO is adopted by the kernel, a
dependence on another repository is undesirable.

> I think what makes sense to have from the code base are
> profile_diff/merge etc. which are needed for scalable collection.
> Or perhaps it would be best if gcov just gained those functionalities.

Yes, this should be straight-forward.

>> In tree would need convincing Google to assign the copyright.
>
> Would it?  Looks like it's under a free license (apache 2), not
> everything in the tree is copyright FSF or GPL3.

I can ask around more on my end to get clarification on this.

Thanks,
Snehasish


[1] https://github.com/google/perf_data_converter
[2] https://github.com/google/autofdo/blob/master/symbol_map.cc
[3] https://github.com/google/autofdo/blob/master/profile_writer.cc
[4]
https://github.com/llvm/llvm-project/blob/main/llvm/tools/llvm-profgen/PerfReader.cpp


On Wed, Mar 27, 2024 at 1:49 PM Jason Merrill  wrote:

> On Tue, Mar 26, 2024 at 6:41 PM Andi Kleen via Gcc 
> wrote:
> >
> > On Tue, Mar 26, 2024 at 08:45:22AM +0100, Richard Biener wrote:
> > > On Mon, Mar 25, 2024 at 9:54 PM Eugene Rozenfeld via Gcc
> > >  wrote:
> > > >
> > > > Hello,
> > > >
> > > > I've been the AutoFDO maintainer for the last 1.5 years. I've
> > > > resurrected autoprofiledbootstrap build and made a number of other
> > > > fixes/improvements (e.g., discriminator support).
> > > >
> > > > The tools for AutoFDO (create_gcov, etc.) currently live in
> > > > https://github.com/google/AutoFDO repo and GCC AutoFDO documentation
> > > > points users to that repo. That repo also has tools for LLVM AutoFDO.
> > > > https://github.com/google/AutoFDO has several submodules:
> > > > https://github.com/google/autofdo/blob/master/.gitmodules
> > > >
> > > > I got a message from Snehasish (cc'd) that google intends to migrate
> > > > the tools for LLVM to the LLVM repo and wants to archive
> > > > https://github.com/google/AutoFDO. That will be a problem for AutoFDO in
> > > > GCC. The idea to find a different home for GCC AutoFDO tools was discussed
> > > > before on this alias but this becomes more urgent now. One idea was to
> > > > build these tools from GCC repo and another was to produce gcov from perf
> > > > tool directly. Andi (cc'd) had some early unfinished prototype for latter.
> > > >
> > > > Please let me know if you have thoughts on how we should proceed.
> > >
> > > I think it makes sense for GCC specific parts to live in the GCC
> > > repository alongside gcov tools.  I do wonder how much common code
> > > there is
> > > between the LLVM and the GCC tooling though and whether it makes sense
> > > to keep it common (and working with both frontends)?  The
> > > pragmatic solution would have been to fork the repo on github to a
> > > place not within the google group ...
> >
> > In tree would need convincing Google to assign the copyright.
>
> Would it?  Looks like it's under a free license (apache 2), not
> everything in the tree is copyright FSF or GPL3.
>
> Jason
>
>


Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-02 Thread Jeffrey Walton via Gcc
On Tue, Apr 2, 2024 at 6:09 PM Guinevere Larsen via Gdb
 wrote:
> [...]
> What likely happened for the maintainer who acted in bad faith was that
> they entered the project with bad faith intent from the start - seeing
> as they were only involved with the project for 2 years, and there was
> much social pressure from fake email accounts for the single maintainer
> of XZ to accept help.

The infiltration appears to have started offline, earlier than June
2022. See .

> While we would obviously like to have more area maintainers and possibly
> global maintainers to help spread the load, I don't think any of the
> projects listed here are all that susceptible to the same type of social
> engineering. For one, getting the same type of blanket approval would be
> a much more involved process because we already have a reasonable amount
> of people with those privileges, no one is dealing with burnout and
> sassy customers saying we aren't doing enough.
>
> Beyond that, we (GDB) are already experimenting with approved-by, and I
> think glibc was doing the same. That guarantees at least a second set of
> eyes that analyzed and agreed with the patch, I don't think signed-off
> would add more than that tag (even if security was not the reason why we
> implemented them).

Jeff


Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-02 Thread Paul Koning via Gcc



> On Apr 2, 2024, at 6:08 PM, Guinevere Larsen  wrote:
> 
> On 4/2/24 16:54, Sandra Loosemore wrote:
>> On 4/1/24 09:06, Mark Wielaard wrote:
>>> A big thanks to everybody working this long Easter weekend who helped
>>> analyze the xz-backdoor and making sure the impact on Sourceware and
>>> the hosted projects was minimal.
>>> 
>>> This email isn't about the xz-backdoor itself. Do see Sam James FAQ
>>> https://gist.github.com/thesamesam/223949d5a074ebc3dce9ee78baad9e27
>>> (Sorry for the github link, but this one does seem viewable without
>>> proprietary javascript)
>>> 
>>> We should discuss what we have been doing and should do more to
>>> mitigate and prevent the next xz-backdoor. There are a couple of
>>> Sourceware services that can help with that.
>>> 
>>> TLDR;
>>> - Replicatable isolated container/VMs are nice, we want more.
>>> - autoregen buildbots, it should be transparent (and automated) how to
>>>    regenerate build/source files.
>>> - Automate (snapshot) releases tarballs.
>>> - Reproducible releases (from git).
>>> 
>>> [snip]
>> 
>> While I appreciate the effort to harden the Sourceware infrastructure 
>> against malicious attacks and want to join in on thanking everyone who 
>> helped analyze this issue, to me it seems like the much bigger problem is 
>> that XZ had a maintainer who appears to have acted in bad faith.  Are the 
>> development processes used by the GNU toolchain components robust enough to 
>> cope with deliberate sabotage of the code base?  Do we have enough eyes 
>> available to ensure that every commit, even those by designated maintainers, 
>> is vetted by someone else?  Do we need to harden our process, too, to require all 
>> patches to be signed off by someone else before committing?
>> 
>> -Sandra
>> 
>> 
> What likely happened for the maintainer who acted in bad faith was that they 
> entered the project with bad faith intent from the start - seeing as they 
> were only involved with the project for 2 years, and there was much social 
> pressure from fake email accounts for the single maintainer of XZ to accept 
> help.
> 
> While we would obviously like to have more area maintainers and possibly 
> global maintainers to help spread the load, I don't think any of the projects 
> listed here are all that susceptible to the same type of social engineering. 
> For one, getting the same type of blanket approval would be a much more 
> involved process because we already have a reasonable amount of people with 
> those privileges, no one is dealing with burnout and sassy customers saying 
> we aren't doing enough.
> 
> Beyond that, we (GDB) are already experimenting with approved-by, and I think 
> glibc was doing the same. That guarantees at least a second set of eyes that 
> analyzed and agreed with the patch, I don't think signed-off would add more 
> than that tag (even if security was not the reason why we implemented them).
> 
> -- 
> Cheers,
> Guinevere Larsen
> She/Her/Hers

I agree that GDB, and for that matter other projects with significant numbers 
of contributors, are not nearly as likely to be vulnerable to this sort of 
attack.  But I worry that xz may not be the only project that's small enough to 
be vulnerable, and be security-relevant in not so obvious ways.

One question that comes to mind is whether there has been an effort across the 
open source community to identify possible other targets of such attacks.  
Contributions elsewhere by the suspect in this case are an obvious concern, but 
similar scenarios with different names could also be.  That probably should be 
an ongoing activity: whenever some external component is used, it would be 
worth knowing how it is maintained, and how many eyeballs are involved.  Even 
if this isn't done by everyone, it seems like a proper precaution for security 
sensitive projects.

Another question that comes to mind: I would guess that relevant law 
enforcement agencies are already looking into this, but it would seem 
appropriate for those closest to the attacked software to reach out explicitly 
and assist in any criminal investigations.

paul



Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-02 Thread Jeffrey Walton via Gcc
On Tue, Apr 2, 2024 at 7:35 PM Paul Koning via Gdb  wrote:
> [...]
>
> I agree that GDB, and for that matter other projects with significant numbers 
> of contributors, are not nearly as likely to be vulnerable to this sort of 
> attack.  But I worry that xz may not be the only project that's small enough 
> to be vulnerable, and be security-relevant in not so obvious ways.

This cuts a lot deeper than folks think. Here are two other examples
off the top of my head...

Other vulnerable projects include ncurses and libnettle. Ncurses is
run by Thomas Dickey (https://invisible-island.net/). libnettle is run
by Niels Möller (https://www.lysator.liu.se/~nisse/nettle/). Both are
one-man shows with no continuity plans. Dickey does not even run a
public version control system. You have to download his release
tarballs, and there's no history to review or make pull requests
against. If Dickey or Möller got hit by a bus crossing the street,
there would be problems for years.

Jeff



> One question that comes to mind is whether there has been an effort across 
> the open source community to identify possible other targets of such attacks. 
>  Contributions elsewhere by the suspect in this case are an obvious concern, 
> but similar scenarios with different names could also be.  That probably 
> should be an ongoing activity: whenever some external component is used, it 
> would be worth knowing how it is maintained, and how many eyeballs are 
> involved.  Even if this isn't done by everyone, it seems like a proper 
> precaution for security sensitive projects.
>
> Another question that comes to mind: I would guess that relevant law 
> enforcement agencies are already looking into this, but it would seem 
> appropriate for those closest to the attacked software to reach out 
> explicitly and assist in any criminal investigations.
>
> paul
>


Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-02 Thread Martin Uecker via Gcc
Am Dienstag, dem 02.04.2024 um 13:28 -0700 schrieb Ian Lance Taylor via Gcc:
> > On Tue, Apr 2, 2024 at 1:21 PM Paul Koning via Gcc  wrote:
> > > > 
> > > > Would it help to require (rather than just recommend) "don't use root 
> > > > except for the actual 'install' step" ?
> > 
> > Seems reasonable, but note that it wouldn't make any difference to
> > this attack.  The liblzma library was modified to corrupt the sshd
> > binary, when sshd was linked against liblzma.  The actual attack
> > occurred via a connection to a corrupt sshd.  If sshd was running as
> > root, as is normal, the attacker had root access to the machine.  None
> > of the attacking steps had anything to do with having root access
> > while building or installing the program.

There does not seem to be a single good solution against something like this.

My takeaway is that software needs to become less complex. Do 
we really still need complex build systems such as autoconf?
Are there still so many different configurations with subtle differences 
that every single feature needs to be tested individually by
running code at build time?

Martin