Re: New contributor tasks

2021-07-13 Philip Herron
On 12/07/2021 23:44, Mark Wielaard wrote:
> On Mon, Jul 12, 2021 at 11:06:01AM +0100, Philip Herron wrote:
>> Great work once again. I am aiming to spend some time towards the end of
>> the week to add more tickets and info for new contributors to get
>> involved, and I will post the interesting ones to the mailing list
>> as well. I think it should be interesting to contributors of all levels.
>> The main one that sticks out in my mind is the AST, HIR dumps which are
>> a bit of a mess at the moment.
> The AST dump (--rust-dump-parse) was actually useful for checking the
> comment doc strings, but it could certainly be improved. Ideally it
> would be structured in a way that can easily be used in tests.
I think a really good project would be to update our HIR dump. It should
really use an S-expression format so we can emit the
Analysis::NodeMapping information in a readable way; at the moment
it's a mess.
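For illustration, here is a minimal Rust sketch of what an S-expression dump with mapping information might look like. NodeMapping is a hypothetical stand-in for Analysis::NodeMapping: its field names are assumptions, not the gccrs definitions. The `(mapping (crate 0) ...)` layout is just one possible shape:

```rust
use std::fmt;

/// Hypothetical stand-in for the ids carried by Analysis::NodeMapping
/// (field names are assumptions, not the gccrs definition).
#[derive(Clone, Copy, Debug)]
struct NodeMapping {
    crate_num: u32,
    node_id: u32,
    hir_id: u32,
    local_def_id: u32,
}

/// A tiny S-expression tree: an atom or a parenthesised list.
#[derive(Debug)]
enum SExpr {
    Atom(String),
    List(Vec<SExpr>),
}

impl fmt::Display for SExpr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SExpr::Atom(a) => write!(f, "{}", a),
            SExpr::List(items) => {
                write!(f, "(")?;
                for (i, item) in items.iter().enumerate() {
                    if i > 0 {
                        write!(f, " ")?;
                    }
                    write!(f, "{}", item)?;
                }
                write!(f, ")")
            }
        }
    }
}

fn atom(s: impl ToString) -> SExpr {
    SExpr::Atom(s.to_string())
}

/// A `(key value)` pair, e.g. `(hir 7)`.
fn field(key: &str, value: impl ToString) -> SExpr {
    SExpr::List(vec![atom(key), atom(value)])
}

impl NodeMapping {
    fn to_sexpr(&self) -> SExpr {
        SExpr::List(vec![
            atom("mapping"),
            field("crate", self.crate_num),
            field("node", self.node_id),
            field("hir", self.hir_id),
            field("local-def", self.local_def_id),
        ])
    }
}

fn main() {
    let m = NodeMapping { crate_num: 0, node_id: 12, hir_id: 7, local_def_id: 0 };
    // A hypothetical HIR identifier expression `x` with its mapping attached.
    let node = SExpr::List(vec![atom("IdentifierExpr"), atom("x"), m.to_sexpr()]);
    println!("{}", node);
}
```

This prints `(IdentifierExpr x (mapping (crate 0) (node 12) (hir 7) (local-def 0)))`. Every node carries its ids in the same nested form, so the output stays regular no matter how deep the tree goes.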
> Some (random) notes I made on issues that might be nice to explain
> and/or work on.
>
> - Full Unicode/UTF-8 support in the lexer. Currently the lexer only
>   explicitly interprets the input as UTF-8 for string parsing. It
>   should really treat all input as UTF-8. gnulib has some handy
>   modules we could use to read/convert from/to UTF-8 (unistr/u8-to-u32,
>   unistr/u32-to-u8) and test various Unicode properties
>   (unictype/property-white-space, unictype/property-xid-continue,
>   unictype/property-xid-start). I don't know if we can import those or
>   if gcc already has these kinds of UTF-8/Unicode support functions for
>   other languages?
GCCGO supports UTF-8 identifiers, but I think it has its own
implementation for this. Pulling in gnulib sounds like a good idea. I
assume we should ask about it on the GCC mailing list, but I would
prefer to reuse a library for UTF-8 support. The code that creates the
strings in GENERIC will need updating as part of that work.
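Here is a minimal Rust sketch of the decode/encode round trip and identifier scanning a lexer would need, using only std. `to_code_points` and `to_utf8` play the roles of gnulib's u8-to-u32 and u32-to-u8. `is_ident_start` and `is_ident_continue` are hypothetical helpers that only *approximate* XID_Start/XID_Continue, because std has no XID tables. `char::is_whitespace` does follow the Unicode White_Space property:

```rust
/// Decode raw source bytes into code points (the u8-to-u32 direction).
/// On invalid input, returns the byte offset where decoding failed.
fn to_code_points(src: &[u8]) -> Result<Vec<u32>, usize> {
    std::str::from_utf8(src)
        .map(|s| s.chars().map(|c| c as u32).collect())
        .map_err(|e| e.valid_up_to())
}

/// Encode code points back to UTF-8 (the u32-to-u8 direction).
/// Returns None for surrogates or values above U+10FFFF.
fn to_utf8(cps: &[u32]) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    for &cp in cps {
        let c = char::from_u32(cp)?;
        let mut buf = [0u8; 4];
        out.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
    }
    Some(out)
}

/// APPROXIMATION of XID_Start: alphabetic or '_'. A real lexer should use
/// proper XID property tables (e.g. the gnulib unictype modules).
fn is_ident_start(c: char) -> bool {
    c == '_' || c.is_alphabetic()
}

/// APPROXIMATION of XID_Continue: alphanumeric or '_'.
fn is_ident_continue(c: char) -> bool {
    c == '_' || c.is_alphanumeric()
}

/// Collect identifier-like runs, treating the whole input as UTF-8.
fn identifiers(src: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(c) = chars.next() {
        if is_ident_start(c) {
            let mut ident = String::from(c);
            while let Some(&next) = chars.peek() {
                if !is_ident_continue(next) {
                    break;
                }
                ident.push(next);
                chars.next();
            }
            out.push(ident);
        }
    }
    out
}

fn main() {
    let src = "let größe = 1;";
    let cps = to_code_points(src.as_bytes()).expect("valid UTF-8");
    // Round trip: decoding then re-encoding gives back the original bytes.
    assert_eq!(to_utf8(&cps).as_deref(), Some(src.as_bytes()));
    println!("{:?}", identifiers(src));
}
```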
> - Error handling using rich locations in the lexer and parser.  It
>   seems some support is already there, but it isn't totally clear to
>   me what is already in place and what could/should be added. e.g. how
>   to add notes to an Error.
I've made a wrapper over RichLocation, but I had some crashes when I
added methods for annotations. My understanding is that the Location
we have at the moment is a single-character location in the source
code, whereas rustc uses Spans. Spans might be an abstraction worth
implementing instead of the Location wrapper we are reusing from
GCCGO.
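To make the Location vs Span distinction concrete, here is a hypothetical Rust sketch. It is not gccrs or rustc code. Span is a half-open byte range that can be merged to cover a whole expression, and Diagnostic shows one way notes could be attached to an error. All names are illustrative, and the renderer assumes the span stays on a single line:

```rust
/// Half-open byte range [lo, hi) into the source, in the spirit of rustc's Span.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Span {
    lo: usize,
    hi: usize,
}

impl Span {
    /// Smallest span covering both, e.g. from `a` to the end of `usize` in `a as usize`.
    fn to(self, other: Span) -> Span {
        Span { lo: self.lo.min(other.lo), hi: self.hi.max(other.hi) }
    }
}

/// Hypothetical error with a primary span and any number of attached notes.
struct Diagnostic {
    message: String,
    span: Span,
    notes: Vec<String>,
}

impl Diagnostic {
    fn error(span: Span, message: &str) -> Self {
        Diagnostic { message: message.to_string(), span, notes: Vec::new() }
    }

    fn note(mut self, text: &str) -> Self {
        self.notes.push(text.to_string());
        self
    }

    /// GCC-like rendering; assumes offsets on char boundaries and a single-line span.
    fn render(&self, file: &str, src: &str) -> String {
        let lo = self.span.lo;
        let line_start = src[..lo].rfind('\n').map_or(0, |i| i + 1);
        let line_end = src[lo..].find('\n').map_or(src.len(), |i| lo + i);
        let line_no = src[..lo].matches('\n').count() + 1;
        let col = src[line_start..lo].chars().count() + 1;
        // Underline the whole span, not just its first character.
        let width = src[lo..self.span.hi.min(line_end)].chars().count().max(1);
        let gutter = " ".repeat(line_no.to_string().len());

        let mut out = format!("{}:{}:{}: error: {}\n", file, line_no, col, self.message);
        out.push_str(&format!("{} | {}\n", line_no, &src[line_start..line_end]));
        out.push_str(&format!("{} | {}{}\n", gutter, " ".repeat(col - 1), "^".repeat(width)));
        for note in &self.notes {
            out.push_str(&format!("note: {}\n", note));
        }
        out
    }
}

fn main() {
    let src = "let x = a as usize;\n";
    let cast = Span { lo: 8, hi: 9 }.to(Span { lo: 13, hi: 18 }); // `a` .. `usize`
    let diag = Diagnostic::error(cast, "example error").note("an attached note");
    print!("{}", diag.render("example.rs", src));
}
```

A single-point Location can only place a caret on the first character. A Span can underline the whole `a as usize` expression.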
> - I noticed some expressions didn't parse because of what look to me like
>   operator precedence issues, e.g. the following:
>
>   const S: usize = 64;
>
>   pub fn main ()
>   {
>     let a:u8 = 1;
>     let b:u8 = 2;
>     let _c = S * a as usize + b as usize;
>   }
>
>   $ gcc/gccrs -Bgcc as.rs
>
>   as.rs:7:27: error: type param bounds (in TraitObjectType) are not allowed as TypeNoBounds
> 7 |   let _c = S * a as usize + b as usize;
>   |   ^
>
>   How does one fix such operator precedence issues in the parser?

Off the top of my head, it looks as though parse_type_cast_expr has a
FIXME for this precedence issue. The Pratt parser uses the notion of
binding powers to handle precedence, and I think the cast needs to be
handled in a similar style to the ::parse_expr piece.
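Here is a minimal Pratt-parser sketch of the binding-power idea in Rust, assuming whitespace-separated tokens and single-identifier types. The binding-power values are arbitrary; only their ordering matters (`as` above `*` above `+`). The key point for the error above is that the target of `as` is parsed as a TypeNoBounds. A following `+` is therefore left for the enclosing binary-expression loop instead of being taken as trait-object bounds:

```rust
use std::fmt;

/// Expression tree; the target of a cast is kept as a bare type name here.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Path(String),
    Binary(char, Box<Expr>, Box<Expr>),
    Cast(Box<Expr>, String),
}

impl fmt::Display for Expr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Expr::Path(p) => write!(f, "{}", p),
            Expr::Binary(op, l, r) => write!(f, "({} {} {})", op, l, r),
            Expr::Cast(e, ty) => write!(f, "(as {} {})", e, ty),
        }
    }
}

/// Left binding power of an infix token (illustrative values).
fn infix_bp(tok: &str) -> Option<u8> {
    match tok {
        "+" | "-" => Some(10),
        "*" | "/" | "%" => Some(20),
        "as" => Some(30),
        _ => None,
    }
}

struct Parser<'a> {
    toks: Vec<&'a str>,
    pos: usize,
}

impl<'a> Parser<'a> {
    fn new(src: &'a str) -> Self {
        Parser { toks: src.split_whitespace().collect(), pos: 0 }
    }

    fn peek(&self) -> Option<&'a str> {
        self.toks.get(self.pos).copied()
    }

    fn bump(&mut self) -> Option<&'a str> {
        let tok = self.peek();
        self.pos += 1;
        tok
    }

    /// Parse an expression, consuming only operators that bind tighter than `min_bp`.
    fn parse_expr(&mut self, min_bp: u8) -> Expr {
        let mut lhs = Expr::Path(self.bump().expect("expected an operand").to_string());
        while let Some(tok) = self.peek() {
            let bp = match infix_bp(tok) {
                Some(bp) if bp > min_bp => bp,
                _ => break,
            };
            self.bump();
            lhs = if tok == "as" {
                // `as` takes a TypeNoBounds: exactly one type, so a following `+`
                // is left for this loop rather than parsed as trait-object bounds.
                let ty = self.bump().expect("expected a type after `as`");
                Expr::Cast(Box::new(lhs), ty.to_string())
            } else {
                // Recursing with `bp` makes equal-precedence operators left-associative.
                let op = tok.chars().next().unwrap();
                let rhs = self.parse_expr(bp);
                Expr::Binary(op, Box::new(lhs), Box::new(rhs))
            };
        }
        lhs
    }
}

fn main() {
    let expr = Parser::new("S * a as usize + b as usize").parse_expr(0);
    println!("{}", expr);
}
```

With this, `S * a as usize + b as usize` parses as `(+ (* S (as a usize)) (as b usize))`.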

> - Relatedly, TypeCastExprs like the above aren't lowered from AST to HIR.
>   I believe I know how to do it, but a small description of the visitor
>   pattern used and in which files one does such lowering would be helpful.
The AST->HIR lowering does need some documentation. It must go through
name resolution first, and there is no documentation on how any of
this works yet. I will put this on my todo list, since it has come up
a few times. The naming of some of the classes, like
ResolveItemToplevel vs ResolveItem, makes things confusing. Some of
this will get cleaned up as part of the traits work, such as the
forward-declared items within a block bug:

Basically, the idea is that we always perform a toplevel scan for all
items and create long canonical names in the topmost scope, so that we
can resolve their names at any point without requiring prototypes or
lookahead. This means we have one pass to collect the names, then a
second pass to resolve each struct's fields, each function's parameters
and return type, and blocks of code. So if a block calls a function
declared later in the file, we can still resolve the call to that
function's NodeId. It is during ResolveItem that we push new contexts
onto the stack to get lexical scoping for names. It's worth noting that
Rust also supports shadowing of variables within a block, so shadowing
does not cause a duplicate-name error. It simply adds a new declaration
to the current context (what rustc calls a Rib), so further resolution
references the new declaration and the previous one is shadowed
correctly.
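Here is a heavily simplified Rust sketch of the two passes and rib-based shadowing described above. Resolver, Item and Stmt are hypothetical types, not the gccrs classes. Bodies are flat (no nested blocks), and real canonical paths are much richer than `crate::name`:

```rust
use std::collections::HashMap;

type NodeId = u32;

/// Hypothetical, simplified statements: a `let` binding or a use of a name.
enum Stmt {
    Let(&'static str, NodeId),
    Use(&'static str),
}

/// Hypothetical item: a function with a flat body.
struct Item {
    name: &'static str,
    id: NodeId,
    body: Vec<Stmt>,
}

#[derive(Default)]
struct Resolver {
    toplevel: HashMap<String, NodeId>,  // canonical path -> NodeId, filled by pass 1
    ribs: Vec<HashMap<String, NodeId>>, // lexical scopes, innermost last
    resolved: Vec<(String, NodeId)>,    // each use and the NodeId it resolved to
}

impl Resolver {
    /// Pass 1 (in the spirit of ResolveItemToplevel): record every item name up
    /// front under a canonical path, so later uses need no prototypes or lookahead.
    fn scan_toplevel(&mut self, krate: &str, items: &[Item]) {
        for item in items {
            self.toplevel.insert(format!("{}::{}", krate, item.name), item.id);
        }
    }

    /// The innermost rib wins; fall back to the toplevel canonical names.
    fn lookup(&self, krate: &str, name: &str) -> Option<NodeId> {
        self.ribs
            .iter()
            .rev()
            .find_map(|rib| rib.get(name).copied())
            .or_else(|| self.toplevel.get(&format!("{}::{}", krate, name)).copied())
    }

    /// Pass 2 (in the spirit of ResolveItem): resolve each body inside a fresh rib.
    fn resolve_items(&mut self, krate: &str, items: &[Item]) -> Result<(), String> {
        for item in items {
            self.ribs.push(HashMap::new());
            let result = self.resolve_body(krate, &item.body);
            self.ribs.pop(); // leave the scope even if resolution failed
            result?;
        }
        Ok(())
    }

    fn resolve_body(&mut self, krate: &str, body: &[Stmt]) -> Result<(), String> {
        for stmt in body {
            match stmt {
                // Shadowing: inserting an existing name replaces it in this rib,
                // so later uses see the new declaration and no error is raised.
                Stmt::Let(name, id) => {
                    self.ribs.last_mut().expect("inside a rib").insert(name.to_string(), *id);
                }
                Stmt::Use(name) => {
                    let id = self
                        .lookup(krate, name)
                        .ok_or_else(|| format!("cannot find `{}` in this scope", name))?;
                    self.resolved.push((name.to_string(), id));
                }
            }
        }
        Ok(())
    }
}

fn main() {
    // `main` calls `helper`, which is declared after it, then shadows `x`.
    let items = vec![
        Item {
            name: "main",
            id: 1,
            body: vec![Stmt::Use("helper"), Stmt::Let("x", 10), Stmt::Let("x", 11), Stmt::Use("x")],
        },
        Item { name: "helper", id: 2, body: vec![] },
    ];
    let mut r = Resolver::default();
    r.scan_toplevel("crate", &items);
    r.resolve_items("crate", &items).expect("all names resolve");
    println!("{:?}", r.resolved);
}
```

Here `main` resolves `helper` even though it is declared later, and the use of `x` resolves to the second `let` (NodeId 11).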

> - And of course, how to lower HIR to GENERIC?  For TypeCastExpr you
>   said on irc we need traits first, but the semantics 

Re: New contributor tasks

2021-07-13 Thomas Schwinge
Hi!

On 2021-07-13T00:44:13+0200, Mark Wielaard  wrote:
> On Mon, Jul 12, 2021 at 11:06:01AM +0100, Philip Herron wrote:
>> The main one that sticks out in my mind is the AST, HIR dumps which are
>> a bit of a mess at the moment.
>
> The AST dump (--rust-dump-parse) was actually useful for checking the
> comment doc strings, but it could certainly be improved. Ideally it
> would be structured in a way that can easily be used in tests.

Right.  A while ago I ran into the same problem (for a lexer-level
issue), and I have early-stage WIP changes to implement dumps for the
various GCC/Rust front end stages using the (more or less) standard
'-fdump-lang-[...]' flag.  These dump files may then be scanned using the
usual GCC/DejaGnu testsuite idioms.  I plan to complete that work at some
later point in time, hopefully not too far out.  (Mark, I had then
actually planned to add some testcases for your recent lexer changes.)

(My work there is independent of/orthogonal to the S-expression dump
format discussed elsewhere.)


Regards
 Thomas
-- 
Gcc-rust mailing list
Gcc-rust@gcc.gnu.org
https://gcc.gnu.org/mailman/listinfo/gcc-rust