[Lldb-commits] [PATCH] D26883: Demonstrate proposed new Args API
labath added a comment. I don't know how deep do you want this refactor to be, but there is one issue I would like us to consider, if only to decide it is out of scope of this change. I am talking about the `quote_char` thingy. The main problem for me is that I don't think it's possible to sanely define the meaning of that field. According to POSIX quoting rules (which our command line more-or-less follows) a single argument can be quoted in a great many ways, using various combinations of quote characters. For example, these are all valid ways to represent the argument `asdf` in a POSIX shell: asdf "asdf" 'asdf' a"sd"f "as"df "as""df" "as"'df' "a"s'd'"f" ... (you get my point) I don't think there is a self-consistent way to define what the `quote_char` field will be for each of these options. Moreover, I don't see why one would ever need to use that field. It can only encourage someone to try to "quote" the argument by doing `quote_char+value+quote_char`, which is absolutely wrong if you ever want that result to be machine parsable.(*) For proper quoting I think we should just have a free-standing `std::string quote_for_posix_shell(llvm::StringRef)` function (and maybe `quote_for_windows_cmd`, and whatever else quoting scheme we need), and then the user can decide which one to use based on who is going to be consuming it. Then we can just kill the `quote` field. The only thing is... I have no idea how much work that will be (but I am ready to chip in to make it happen). So, yea, if we decide not to do that, then I think the interface looks great. Otherwise, I think we can design a slightly simpler (and more consistent) one. (*) Bonus question: Try to start an executable under lldb, so that in enters `main()` with `argc=2` and `argv[1]="'"` I.e., as if it had been started this way via bash: $ /bin/cat \' https://reviews.llvm.org/D26883 ___ lldb-commits mailing list lldb-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits
Re: [Lldb-commits] [PATCH] D26883: Demonstrate proposed new Args API
The quote char is only exposed as a means to not break existing code which depends on it (most of which, not surprisingly, is in the Args class itself. We could try to come up with a way to kill it, but that seems like a separate refactor (and perhaps quite difficult since different platforms have different rules) On Sat, Nov 19, 2016 at 5:23 AM Pavel Labath wrote: > labath added a comment. > > I don't know how deep do you want this refactor to be, but there is one > issue I would like us to consider, if only to decide it is out of scope of > this change. I am talking about the `quote_char` thingy. The main problem > for me is that I don't think it's possible to sanely define the meaning of > that field. According to POSIX quoting rules (which our command line > more-or-less follows) a single argument can be quoted in a great many ways, > using various combinations of quote characters. For example, these are all > valid ways to represent the argument `asdf` in a POSIX shell: > > asdf > "asdf" > 'asdf' > a"sd"f > "as"df > "as""df" > "as"'df' > "a"s'd'"f" > ... (you get my point) > > I don't think there is a self-consistent way to define what the > `quote_char` field will be for each of these options. Moreover, I don't see > why one would ever need to use that field. It can only encourage someone to > try to "quote" the argument by doing `quote_char+value+quote_char`, which > is absolutely wrong if you ever want that result to be machine parsable.(*) > For proper quoting I think we should just have a free-standing `std::string > quote_for_posix_shell(llvm::StringRef)` function (and maybe > `quote_for_windows_cmd`, and whatever else quoting scheme we need), and > then the user can decide which one to use based on who is going to be > consuming it. Then we can just kill the `quote` field. The only thing is... > I have no idea how much work that will be (but I am ready to chip in to > make it happen). > > So, yea, if we decide not to do that, then I think the interface looks > great. Otherwise, I think we can design a slightly simpler (and more > consistent) one. > > (*) Bonus question: Try to start an executable under lldb, so that in > enters `main()` with `argc=2` and `argv[1]="'"` I.e., as if it had been > started this way via bash: > > $ /bin/cat \' > > > https://reviews.llvm.org/D26883 > > > > ___ lldb-commits mailing list lldb-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits
Re: [Lldb-commits] [PATCH] D26883: Demonstrate proposed new Args API
Assuming we do that, what interface do you think would be simpler? We still need easy access to both a StringRef and a c_str(), since StringRef::data is not guaranteed to be null terminated, so the entry thing is still nice. On Sat, Nov 19, 2016 at 5:44 AM Zachary Turner wrote: > The quote char is only exposed as a means to not break existing code which > depends on it (most of which, not surprisingly, is in the Args class itself. > > We could try to come up with a way to kill it, but that seems like a > separate refactor (and perhaps quite difficult since different platforms > have different rules) > On Sat, Nov 19, 2016 at 5:23 AM Pavel Labath wrote: > > labath added a comment. > > I don't know how deep do you want this refactor to be, but there is one > issue I would like us to consider, if only to decide it is out of scope of > this change. I am talking about the `quote_char` thingy. The main problem > for me is that I don't think it's possible to sanely define the meaning of > that field. According to POSIX quoting rules (which our command line > more-or-less follows) a single argument can be quoted in a great many ways, > using various combinations of quote characters. For example, these are all > valid ways to represent the argument `asdf` in a POSIX shell: > > asdf > "asdf" > 'asdf' > a"sd"f > "as"df > "as""df" > "as"'df' > "a"s'd'"f" > ... (you get my point) > > I don't think there is a self-consistent way to define what the > `quote_char` field will be for each of these options. Moreover, I don't see > why one would ever need to use that field. It can only encourage someone to > try to "quote" the argument by doing `quote_char+value+quote_char`, which > is absolutely wrong if you ever want that result to be machine parsable.(*) > For proper quoting I think we should just have a free-standing `std::string > quote_for_posix_shell(llvm::StringRef)` function (and maybe > `quote_for_windows_cmd`, and whatever else quoting scheme we need), and > then the user can decide which one to use based on who is going to be > consuming it. Then we can just kill the `quote` field. The only thing is... > I have no idea how much work that will be (but I am ready to chip in to > make it happen). > > So, yea, if we decide not to do that, then I think the interface looks > great. Otherwise, I think we can design a slightly simpler (and more > consistent) one. > > (*) Bonus question: Try to start an executable under lldb, so that in > enters `main()` with `argc=2` and `argv[1]="'"` I.e., as if it had been > started this way via bash: > > $ /bin/cat \' > > > https://reviews.llvm.org/D26883 > > > > ___ lldb-commits mailing list lldb-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits
Re: [Lldb-commits] [PATCH] D26883: Demonstrate proposed new Args API
One idea might be to have the entry contain 2 StringRefs. `str` and `quoted_str`. This way you never get access to the underlying quote char, just the full arg, either quoted or unquoted (although doing this would still be better done orthogonally to this patch) On Sat, Nov 19, 2016 at 5:48 AM Zachary Turner wrote: > Assuming we do that, what interface do you think would be simpler? We > still need easy access to both a StringRef and a c_str(), since > StringRef::data is not guaranteed to be null terminated, so the entry thing > is still nice. > On Sat, Nov 19, 2016 at 5:44 AM Zachary Turner wrote: > > The quote char is only exposed as a means to not break existing code which > depends on it (most of which, not surprisingly, is in the Args class itself. > > We could try to come up with a way to kill it, but that seems like a > separate refactor (and perhaps quite difficult since different platforms > have different rules) > On Sat, Nov 19, 2016 at 5:23 AM Pavel Labath wrote: > > labath added a comment. > > I don't know how deep do you want this refactor to be, but there is one > issue I would like us to consider, if only to decide it is out of scope of > this change. I am talking about the `quote_char` thingy. The main problem > for me is that I don't think it's possible to sanely define the meaning of > that field. According to POSIX quoting rules (which our command line > more-or-less follows) a single argument can be quoted in a great many ways, > using various combinations of quote characters. For example, these are all > valid ways to represent the argument `asdf` in a POSIX shell: > > asdf > "asdf" > 'asdf' > a"sd"f > "as"df > "as""df" > "as"'df' > "a"s'd'"f" > ... (you get my point) > > I don't think there is a self-consistent way to define what the > `quote_char` field will be for each of these options. Moreover, I don't see > why one would ever need to use that field. It can only encourage someone to > try to "quote" the argument by doing `quote_char+value+quote_char`, which > is absolutely wrong if you ever want that result to be machine parsable.(*) > For proper quoting I think we should just have a free-standing `std::string > quote_for_posix_shell(llvm::StringRef)` function (and maybe > `quote_for_windows_cmd`, and whatever else quoting scheme we need), and > then the user can decide which one to use based on who is going to be > consuming it. Then we can just kill the `quote` field. The only thing is... > I have no idea how much work that will be (but I am ready to chip in to > make it happen). > > So, yea, if we decide not to do that, then I think the interface looks > great. Otherwise, I think we can design a slightly simpler (and more > consistent) one. > > (*) Bonus question: Try to start an executable under lldb, so that in > enters `main()` with `argc=2` and `argv[1]="'"` I.e., as if it had been > started this way via bash: > > $ /bin/cat \' > > > https://reviews.llvm.org/D26883 > > > > ___ lldb-commits mailing list lldb-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits
[Lldb-commits] [PATCH] D26883: Demonstrate proposed new Args API
labath added a comment. > Assuming we do that, what interface do you think would be simpler? We still > need easy access to both a StringRef and a c_str(), since StringRef::data > is not guaranteed to be null terminated, so the entry thing is still nice. I was assuming (possibly incorrectly, I did not look at that code much) that the main user of the null-terminated string version was execve(2), which needs an entire list of strings, not just a single item, in which case we could have the iteration simply be over StringRefs. That said, it probably does not matter now, as judging by your comments, you're looking for incremental changes, not one grand final design (btw, I didn't mean to suggest that this should all be done in one patch). In that case, this looks fine as far as I am concerned. We can always revise this later, and it doesn't look like it will require any major rewrite, just a bit of syntax-twiddling. You can then read my comments as "the direction I would like to move us in". > One idea might be to have the entry contain 2 StringRefs. `str` and > `quoted_str`. This way you never get access to the underlying quote char, > just the full arg, either quoted or unquoted (although doing this would > still be better done orthogonally to this patch) I don't think this is a good idea, as I don't see a need to be able to access the original quoted string this way. Also, when you construct the args vector programmatically, you would have to invent a quoted representation without knowing if it will ever be used. I'd prefer to have a standalone quote function instead, as then it can be used in other contexts as well (separate algorithms from data). https://reviews.llvm.org/D26883 ___ lldb-commits mailing list lldb-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits
Re: [Lldb-commits] [PATCH] D26883: Demonstrate proposed new Args API
The main user of the c_str() is printf calls. It's in the commit message :) On Sat, Nov 19, 2016 at 7:14 AM Pavel Labath wrote: > labath added a comment. > > > Assuming we do that, what interface do you think would be simpler? We > still > > need easy access to both a StringRef and a c_str(), since > StringRef::data > > is not guaranteed to be null terminated, so the entry thing is still > nice. > > I was assuming (possibly incorrectly, I did not look at that code much) > that the main user of the null-terminated string version was execve(2), > which needs an entire list of strings, not just a single item, in which > case we could have the iteration simply be over StringRefs. That said, it > probably does not matter now, as judging by your comments, you're looking > for incremental changes, not one grand final design (btw, I didn't mean to > suggest that this should all be done in one patch). In that case, this > looks fine as far as I am concerned. We can always revise this later, and > it doesn't look like it will require any major rewrite, just a bit of > syntax-twiddling. You can then read my comments as "the direction I would > like to move us in". > > > One idea might be to have the entry contain 2 StringRefs. `str` and > > `quoted_str`. This way you never get access to the underlying quote char, > > just the full arg, either quoted or unquoted (although doing this would > > still be better done orthogonally to this patch) > > I don't think this is a good idea, as I don't see a need to be able to > access the original quoted string this way. Also, when you construct the > args vector programmatically, you would have to invent a quoted > representation without knowing if it will ever be used. I'd prefer to have > a standalone quote function instead, as then it can be used in other > contexts as well (separate algorithms from data). > > > https://reviews.llvm.org/D26883 > > > > ___ lldb-commits mailing list lldb-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits
Re: [Lldb-commits] [PATCH] D26883: Demonstrate proposed new Args API
The reason we store the quote char, btw, is so that we can use the same exact quoting as the user typed at command line. A general algorithm wouldn't be able to reproduce exactly what the user typed On Sat, Nov 19, 2016 at 9:43 AM Zachary Turner wrote: > The main user of the c_str() is printf calls. It's in the commit message :) > On Sat, Nov 19, 2016 at 7:14 AM Pavel Labath wrote: > > labath added a comment. > > > Assuming we do that, what interface do you think would be simpler? We > still > > need easy access to both a StringRef and a c_str(), since > StringRef::data > > is not guaranteed to be null terminated, so the entry thing is still > nice. > > I was assuming (possibly incorrectly, I did not look at that code much) > that the main user of the null-terminated string version was execve(2), > which needs an entire list of strings, not just a single item, in which > case we could have the iteration simply be over StringRefs. That said, it > probably does not matter now, as judging by your comments, you're looking > for incremental changes, not one grand final design (btw, I didn't mean to > suggest that this should all be done in one patch). In that case, this > looks fine as far as I am concerned. We can always revise this later, and > it doesn't look like it will require any major rewrite, just a bit of > syntax-twiddling. You can then read my comments as "the direction I would > like to move us in". > > > One idea might be to have the entry contain 2 StringRefs. `str` and > > `quoted_str`. This way you never get access to the underlying quote char, > > just the full arg, either quoted or unquoted (although doing this would > > still be better done orthogonally to this patch) > > I don't think this is a good idea, as I don't see a need to be able to > access the original quoted string this way. Also, when you construct the > args vector programmatically, you would have to invent a quoted > representation without knowing if it will ever be used. I'd prefer to have > a standalone quote function instead, as then it can be used in other > contexts as well (separate algorithms from data). > > > https://reviews.llvm.org/D26883 > > > > ___ lldb-commits mailing list lldb-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits