On Mon, Dec 06, 2021 at 11:12:00AM -0700, Martin Sebor wrote:
> On 11/13/21 1:37 PM, David Malcolm via Gcc-patches wrote:
> >Approach 1: Custom Address Spaces
> >=================================
> >
> >GCC's C frontend supports target-specific address spaces; see:
> >   https://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html
> >Quoting the N1275 draft of ISO/IEC DTR 18037:
> >   "Address space names are ordinary identifiers, sharing the same name
> >   space as variables and typedef names.  Any such names follow the same
> >   rules for scope as other ordinary identifiers (such as typedef names).
> >   An implementation may provide an implementation-defined set of
> >   intrinsic address spaces that are, in effect, predefined at the start
> >   of every translation unit.  The names of intrinsic address spaces must
> >   be reserved identifiers (beginning with an underscore and an uppercase
> >   letter or with two underscores).  An implementation may also
> >   optionally support a means for new address space names to be defined
> >   within a translation unit."
> >
> >Patch 1a in the following patch kit for GCC implements such a means to
> >define new address space names in a translation unit, via a pragma:
> >   #pragma GCC custom_address_space(NAME_OF_ADDRESS_SPACE)
> >
> >For example, the Linux kernel could perhaps write:
> >
> >   #define __kernel
> >   #pragma GCC custom_address_space(__user)
> >   #pragma GCC custom_address_space(__iomem)
> >   #pragma GCC custom_address_space(__percpu)
> >   #pragma GCC custom_address_space(__rcu)
> >
> >and thus the C frontend can complain about code that mismatches __user
> >and kernel pointers, e.g.:
> >
> >custom-address-space-1.c: In function ‘test_argpass_to_p’:
> >custom-address-space-1.c:29:14: error: passing argument 1 of ‘accepts_p’
> >from pointer to non-enclosed address space
> >    29 |   accepts_p (p_user);
> >       |              ^~~~~~
> >custom-address-space-1.c:21:24: note: expected ‘void *’ but argument is
> >of type ‘__user void *’
> >    21 | extern void accepts_p (void *);
> >       |                        ^~~~~~
> >custom-address-space-1.c: In function ‘test_cast_k_to_u’:
> >custom-address-space-1.c:135:12: warning: cast to ‘__user’ address space
> >pointer from disjoint generic address space pointer
> >   135 |   p_user = (void __user *)p_kernel;
> >       |            ^
> 
> This seems like an excellent use of named address spaces :)

It has some big problems though.

Named address spaces are completely target-specific.  Defining them with
a pragma like this does not allow you to set the pointer mode or
anything related to a custom LEGITIMATE_ADDRESS_P.  It does not allow
you to say that zero pointers are invalid in some address spaces and not
in others.  You cannot provide any of the DWARF address space stuff this
way.  But most importantly, there are only four bits for the address
space field internally, and a backend is free to use them however it
wants.

None of this is unsolvable, but all of it will have to be solved.

IMO it will be best to not mix this with address spaces in the user
interface (it is of course fine to *implement* it like that, or with
big overlap at least).

> >The patch doesn't yet maintain a good distinction between implicit
> >target-specific address spaces and user-defined address spaces,

And that will have to be fixed in the user code syntax at least.

> >has at
> >least one known major bug, and has only been lightly tested.  I can
> >fix these issues, but was hoping for feedback that this approach is the
> >right direction from both the GCC and Linux development communities.

Allowing the user to define new address spaces does not jibe well with
how targets do (validly!) use them.

> >Approach 2: An "untrusted" attribute
> >====================================
> >
> >Alternatively, patch 1b in the kit implements:
> >
> >   __attribute__((untrusted))
> >
> >which can be applied to types as a qualifier (similarly to const,
> >volatile, etc) to mark a trust boundary, hence the kernel could have:
> >
> >   #define __user __attribute__((untrusted))
> >
> >where my patched GCC treats
> >   T *
> >vs
> >   T __attribute__((untrusted)) *
> >as being different types and thus the C frontend can complain (even without
> >-fanalyzer) about e.g.:
> >
> >extern void accepts_p(void *);
> >
> >void test_argpass_to_p(void __user *p_user)
> >{
> >   accepts_p(p_user);
> >}
> >
> >untrusted-pointer-1.c: In function ‘test_argpass_to_p’:
> >untrusted-pointer-1.c:22:13: error: passing argument 1 of ‘accepts_p’
> >from pointer with different trust level
> >    22 |   accepts_p(p_user);
> >       |              ^~~~~~
> >untrusted-pointer-1.c:14:23: note: expected ‘void *’ but argument is of
> >type ‘__attribute__((untrusted)) void *’
> >    14 | extern void accepts_p(void *);
> >       |                        ^~~~~~
> >
> >So you'd get enforcement of __user vs non-__user pointers as part of
> >GCC's regular type-checking.  (You need an explicit cast to convert
> >between the untrusted vs trusted types).
> 
> As with the named address space idea, this approach also looks
> reasonable to me.  If you anticipate using the attribute only
> in the analyzer I would suggest to consider introducing it in
> the analyzer's namespace (e.g., analyzer::untrusted, or even
> gnu::analyzer::untrusted).

I don't see any fundamental problems with this approach.  It is also
very much in line with how Perl (and some copycat languages) handles
this: the "tainted" flag on data.
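
As a rough sketch of how code could be written against this (assuming
the semantics described for patch 1b, which I have not tested): the
attribute is probed with __has_attribute, so compilers without the patch
see an empty macro and check nothing.  The helper copy_in and its
parameters are hypothetical, not anything from the kernel or the patch.

```c
#include <stddef.h>
#include <string.h>

#ifndef __has_attribute
# define __has_attribute(x) 0   /* compilers without __has_attribute */
#endif

/* "untrusted" exists only in the patched GCC from this thread; every
   other compiler takes the empty fallback.  */
#if __has_attribute(untrusted)
# define __user __attribute__((untrusted))
#else
# define __user
#endif

/* Hypothetical boundary helper: the one place where an untrusted
   pointer is checked and then explicitly cast to a trusted one.  */
int copy_in(void *dst, const void __user *src, size_t len, size_t max)
{
    if (len > max)
        return -1;                        /* reject before trusting */
    memcpy(dst, (const void *)src, len);  /* explicit cast crosses the boundary */
    return 0;
}
```

The idea is the same as with taint checking: the cast is the single
visible spot where untrusted data becomes trusted, so that is the spot
to review.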

> >This approach is much less expressive than the custom address space
> >approach; it would only cover the trust boundary aspect; it wouldn't
> >cover any differences between generic pointers and __user, vs __iomem,
> >__percpu, and __rcu which I admit I only dimly understand.

Yes, it does not have any of the big problems that come with those
address spaces either!  :-)

> >Other attributes
> >================
> >
> >Patch 2 in the kit adds:
> >   __attribute__((returns_zero_on_success))
> >and
> >   __attribute__((returns_nonzero_on_success))
> >as hints to the analyzer that it's worth bifurcating the analysis of
> >such functions (to explore failure vs success, and thus to better
> >explore error-handling paths).  It's also a hint to the human reader of
> >the source code.
> 
> I think being able to express something along these lines would
> be useful even outside the analyzer, both for warnings and, when
> done right, perhaps also for optimization.  So I'm in favor of
> something like this.  I'll just reiterate here the comment on
> this attribute I sent you privately some time ago.

What is "success" though?  You probably want it so some checker can make
sure you do handle failure some way, but how do you see what is handling
failure and what is handling the successful case?
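
A small sketch of the question, assuming the attribute spelling from
patch 2 and guarding it with __has_attribute so unpatched compilers just
ignore it.  The macro ZERO_ON_SUCCESS and both functions are
hypothetical.

```c
#include <stdlib.h>

#ifndef __has_attribute
# define __has_attribute(x) 0   /* compilers without __has_attribute */
#endif

/* The attribute exists only in the patched GCC from this thread.  */
#if __has_attribute(returns_zero_on_success)
# define ZERO_ON_SUCCESS __attribute__((returns_zero_on_success))
#else
# define ZERO_ON_SUCCESS
#endif

/* Hypothetical parser: returns 0 on success, -1 on bad input.  */
ZERO_ON_SUCCESS int parse_port(const char *s, int *out)
{
    char *end;
    long v = strtol(s, &end, 10);

    if (end == s || *end != '\0' || v < 1 || v > 65535)
        return -1;
    *out = (int)v;
    return 0;
}

int port_or_default(const char *s)
{
    int port;

    if (parse_port(s, &port))   /* nonzero result: presumably the failure path */
        return 8080;
    return port;                /* zero result: 'port' has been written */
}
```

The attribute says which return values mean success, but nothing in the
caller is marked as "the failure handling"; a checker would have to
infer that from the branch on the return value, and it gets murkier once
the result is stored, combined with other conditions, or passed up.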


Segher
