yaxunl marked an inline comment as done.
yaxunl added inline comments.

================
Comment at: clang/lib/Driver/ToolChains/HIP.cpp:121-123
+  auto Pos = SubArchName.find_first_of("+-");
+  if (Pos != SubArchName.npos)
+    SubArchName = SubArchName.substr(0, Pos);
----------------
tra wrote:
> yaxunl wrote:
> > tra wrote:
> > > Parsing should probably be extracted into a separate function to avoid 
> > > replicating it all over the place.
> > > 
> > > I'd also propose use a different syntax for the properties.
> > > * use explicit character to separate individual elements. This way 
> > > splitting the properties becomes independent of what those properties 
> > > are. If you decide to make properties with values or change their meaning 
> > > some other way, it would not affect how you compose them.
> > > * use `name=value` or `name[+-]` for individual properties. This makes it 
> > > easy to parse individual properties and normalize their names. This makes 
> > > property map creation independent of the property values.
> > > 
> > > Right now `[+-]` serves as both a separator and as the value, which would 
> > > present problems if you ever need more flexible parametrization of 
> > > properties. What if a property must be a number or a string. Granted, you 
> > > can always encode them as a set of bools, but that's rather unpractical. 
> > > 
> > > E.g. something like this would work a bit better: 
> > > `gfx111:foo+:bar=33:buz=string`.
> > > 
> > I discussed this with our team.
> > 
> > The target id features are not raw GPU properties. They are distilled to 
> > become a few target features to decide what the compiler should do.
> > 
> > Each target feature has 3 values: on, off, and default, which are encoded 
> > as +feature, -feature, and not present.
> > 
> > For runtime, it is simple and clear how to choose device binaries based on 
> > the target features: it will try exact match, otherwise choose the default.
> > 
> > For compiler, it is also simple and clear what to do for each target 
> > feature value, since they corresponding to backend target features.
> > 
> > Basically we expect the target id feature to be like flags, not key/value 
> > pairs. In case we do need key/value pairs, they can still use + as 
> > delimiter.
> > 
> > Another reason we use +/- format is that it is more in line with the format 
> > of existing clang-offload-bundler id and target triple, which uses - as 
> > delimiter.
> > 
> > Since the target id is an extension of offload arch and users will put it 
> > into command line, we want to make it short, concise and aesthetically 
> > appealing, we would avoid using non-alpha-numeric characters in the target 
> > id features. Target triple components have similar requirements. Using : as 
> > delimiter seems unnecessary, longer, and more difficult to read.
> > 
> > Consider the following example
> > 
> > 
> > ```
> > clang -offload-id gfx908+xnack-sramecc a.hip
> > 
> > clang -offload-id gfx908:xnack+:sramecc- a.hip
> > ```
> > 
> > We are more inclined to keep the original format. 
> You're thinking in terms what's needed by AMDGPU *now*. The scheme you're 
> proposing is sufficient for your use case and I'm fine with that. I'm 
> suggesting that you should consider what happens once this change lands.
> 
> The functionality you're implementing is exposed to end-users via top-level 
> clang driver argument. This is visible to users and will be relied on.
> This will make it hard to change in the future without breaking someone. It's 
> worth making sure we're not painting ourselves in the corner here.
> 
> Also, the functionality may be useful/applicable beyond the scope of amdgpu 
> and the binary flags will not be sufficient for everyone. The scheme you're 
> proposing would be somewhat restrictive if I need to pass an integer value or 
> string. We could use something like `gfx123+foo=456-bar=789` but it would 
> look rather odd, IMO. 
> 
> Granted, none of the above is a showstopper. I guess we could support 
> multiple formats if it comes to that, but I'd rather not multiply things 
> later because we didn't think of them earlier.
> 
> > Another reason we use +/- format is that it is more in line with the format 
> > of existing clang-offload-bundler id and target triple, which uses - as 
> > delimiter.
> 
> The point was that commingling field separator and the field value is not the 
> cleanest approach, IMO. I'd be fine fine with some other character.
> 
> > Since the target id is an extension of offload arch and users will put it 
> > into command line, we want to make it short, concise and aesthetically 
> > appealing, we would avoid using non-alpha-numeric characters in the target 
> > id features. Target triple components have similar requirements. Using : as 
> > delimiter seems unnecessary, longer, and more difficult to read.
> 
> The current use of `gfxXXX` seems to fit the 'short, concise & aesthetically 
> pleasing' part of your argument much better than the proposed scheme.
> 
> Is the end user allowed to specify an arbitrary set of the features? Or is 
> the offload-id set restricted to a smaller number of combinations (i.e. tied 
> to particular hardware variants). I vaguely recall that in the past the 
> problem was that AMD needed to create multiple device compilations for one 
> GPU architecture and that didn't fit in the model used by CUDA compilation. 
> 
> Would it make sense to keep user-visible GPU arch argument as is and map each 
> known one internally into a set of `offload-id` parameters used to create 
> driver device-side compilations? For CUDA it will be a pass-through, for HIP 
> it will translate single user-specified arch into multiple offload-ids. This 
> would leave AMDGPU free to choose the way internally-used  offload-id is 
> structured and can change it if/when it's necessary without worrying about 
> existing users. It also keeps user-visible parameters short. The translation 
> from gpu-arch to offload-id should be simple enough to maintain.
> 
> 
After discussion, we decided to adopt the format you proposed. The rationale is 
that we want target id to be treated as an extended `--offload-arch` option, 
which means it needs to be able to accept all existing and future CUDA arch 
names. Using `:` as delimiter should be tolerant enough whereas `+/-` is not.

Also I will try introducing -offload-target-id for this option.

The features that can be used in target id are restricted to a few predefined 
features for each GPU arch, because both compiler and runtime needs to know how 
to handle them.

I am not sure if I understand your last question. With the new format we should 
be able to use any CUDA arch names as target id, therefore we no longer need a 
map. Also we need to pass each target id as a whole option since we need to use 
it as an id for the device binary for each device compilation.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60620/new/

https://reviews.llvm.org/D60620



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to