[Lldb-commits] [PATCH] D26883: Demonstrate proposed new Args API

2016-11-19 Thread Pavel Labath via lldb-commits
labath added a comment.

I don't know how deep do you want this refactor to be, but there is one issue I 
would like us to consider, if only to decide it is out of scope of this change. 
I am talking about the `quote_char` thingy. The main problem for me is that I 
don't think it's possible to sanely define the meaning of that field. According 
to POSIX quoting rules (which our command line more-or-less follows) a single 
argument can be quoted in a great many ways, using various combinations of 
quote characters. For example, these are all valid ways to represent the 
argument `asdf` in a POSIX shell:

  asdf
  "asdf"
  'asdf'
  a"sd"f
  "as"df
  "as""df"
  "as"'df'
  "a"s'd'"f"
  ... (you get my point)

I don't think there is a self-consistent way to define what the `quote_char` 
field will be for each of these options. Moreover, I don't see why one would 
ever need to use that field. It can only encourage someone to try to "quote" 
the argument by doing `quote_char+value+quote_char`, which is absolutely wrong 
if you ever want that result to be machine parsable.(*) For proper quoting I 
think we should just have a free-standing `std::string 
quote_for_posix_shell(llvm::StringRef)` function (and maybe 
`quote_for_windows_cmd`, and whatever else quoting scheme we need), and then 
the user can decide which one to use based on who is going to be consuming it. 
Then we can just kill the `quote` field. The only thing is... I have no idea 
how much work that will be (but I am ready to chip in to make it happen).

So, yea, if we decide not to do that, then I think the interface looks great. 
Otherwise, I think we can design a slightly simpler (and more consistent) one.

(*) Bonus question: Try to start an executable under lldb, so that in enters 
`main()` with `argc=2` and `argv[1]="'"` I.e.,  as if it had been started this 
way via bash:

  $ /bin/cat \'


https://reviews.llvm.org/D26883



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


Re: [Lldb-commits] [PATCH] D26883: Demonstrate proposed new Args API

2016-11-19 Thread Zachary Turner via lldb-commits
The quote char is only exposed as a means to not break existing code which
depends on it (most of which, not surprisingly, is in the Args class itself.

We could try to come up with a way to kill it, but that seems like a
separate refactor (and perhaps quite difficult since different platforms
have different rules)
On Sat, Nov 19, 2016 at 5:23 AM Pavel Labath  wrote:

> labath added a comment.
>
> I don't know how deep do you want this refactor to be, but there is one
> issue I would like us to consider, if only to decide it is out of scope of
> this change. I am talking about the `quote_char` thingy. The main problem
> for me is that I don't think it's possible to sanely define the meaning of
> that field. According to POSIX quoting rules (which our command line
> more-or-less follows) a single argument can be quoted in a great many ways,
> using various combinations of quote characters. For example, these are all
> valid ways to represent the argument `asdf` in a POSIX shell:
>
>   asdf
>   "asdf"
>   'asdf'
>   a"sd"f
>   "as"df
>   "as""df"
>   "as"'df'
>   "a"s'd'"f"
>   ... (you get my point)
>
> I don't think there is a self-consistent way to define what the
> `quote_char` field will be for each of these options. Moreover, I don't see
> why one would ever need to use that field. It can only encourage someone to
> try to "quote" the argument by doing `quote_char+value+quote_char`, which
> is absolutely wrong if you ever want that result to be machine parsable.(*)
> For proper quoting I think we should just have a free-standing `std::string
> quote_for_posix_shell(llvm::StringRef)` function (and maybe
> `quote_for_windows_cmd`, and whatever else quoting scheme we need), and
> then the user can decide which one to use based on who is going to be
> consuming it. Then we can just kill the `quote` field. The only thing is...
> I have no idea how much work that will be (but I am ready to chip in to
> make it happen).
>
> So, yea, if we decide not to do that, then I think the interface looks
> great. Otherwise, I think we can design a slightly simpler (and more
> consistent) one.
>
> (*) Bonus question: Try to start an executable under lldb, so that in
> enters `main()` with `argc=2` and `argv[1]="'"` I.e.,  as if it had been
> started this way via bash:
>
>   $ /bin/cat \'
>
>
> https://reviews.llvm.org/D26883
>
>
>
>
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


Re: [Lldb-commits] [PATCH] D26883: Demonstrate proposed new Args API

2016-11-19 Thread Zachary Turner via lldb-commits
Assuming we do that, what interface do you think would be simpler? We still
need easy access to both a StringRef and a c_str(), since StringRef::data
is not guaranteed to be null terminated, so the entry thing is still nice.
On Sat, Nov 19, 2016 at 5:44 AM Zachary Turner  wrote:

> The quote char is only exposed as a means to not break existing code which
> depends on it (most of which, not surprisingly, is in the Args class itself.
>
> We could try to come up with a way to kill it, but that seems like a
> separate refactor (and perhaps quite difficult since different platforms
> have different rules)
> On Sat, Nov 19, 2016 at 5:23 AM Pavel Labath  wrote:
>
> labath added a comment.
>
> I don't know how deep do you want this refactor to be, but there is one
> issue I would like us to consider, if only to decide it is out of scope of
> this change. I am talking about the `quote_char` thingy. The main problem
> for me is that I don't think it's possible to sanely define the meaning of
> that field. According to POSIX quoting rules (which our command line
> more-or-less follows) a single argument can be quoted in a great many ways,
> using various combinations of quote characters. For example, these are all
> valid ways to represent the argument `asdf` in a POSIX shell:
>
>   asdf
>   "asdf"
>   'asdf'
>   a"sd"f
>   "as"df
>   "as""df"
>   "as"'df'
>   "a"s'd'"f"
>   ... (you get my point)
>
> I don't think there is a self-consistent way to define what the
> `quote_char` field will be for each of these options. Moreover, I don't see
> why one would ever need to use that field. It can only encourage someone to
> try to "quote" the argument by doing `quote_char+value+quote_char`, which
> is absolutely wrong if you ever want that result to be machine parsable.(*)
> For proper quoting I think we should just have a free-standing `std::string
> quote_for_posix_shell(llvm::StringRef)` function (and maybe
> `quote_for_windows_cmd`, and whatever else quoting scheme we need), and
> then the user can decide which one to use based on who is going to be
> consuming it. Then we can just kill the `quote` field. The only thing is...
> I have no idea how much work that will be (but I am ready to chip in to
> make it happen).
>
> So, yea, if we decide not to do that, then I think the interface looks
> great. Otherwise, I think we can design a slightly simpler (and more
> consistent) one.
>
> (*) Bonus question: Try to start an executable under lldb, so that in
> enters `main()` with `argc=2` and `argv[1]="'"` I.e.,  as if it had been
> started this way via bash:
>
>   $ /bin/cat \'
>
>
> https://reviews.llvm.org/D26883
>
>
>
>
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


Re: [Lldb-commits] [PATCH] D26883: Demonstrate proposed new Args API

2016-11-19 Thread Zachary Turner via lldb-commits
One idea might be to have the entry contain 2 StringRefs. `str` and
`quoted_str`. This way you never get access to the underlying quote char,
just the full arg, either quoted or unquoted (although doing this would
still be better done orthogonally to this patch)
On Sat, Nov 19, 2016 at 5:48 AM Zachary Turner  wrote:

> Assuming we do that, what interface do you think would be simpler? We
> still need easy access to both a StringRef and a c_str(), since
> StringRef::data is not guaranteed to be null terminated, so the entry thing
> is still nice.
> On Sat, Nov 19, 2016 at 5:44 AM Zachary Turner  wrote:
>
> The quote char is only exposed as a means to not break existing code which
> depends on it (most of which, not surprisingly, is in the Args class itself.
>
> We could try to come up with a way to kill it, but that seems like a
> separate refactor (and perhaps quite difficult since different platforms
> have different rules)
> On Sat, Nov 19, 2016 at 5:23 AM Pavel Labath  wrote:
>
> labath added a comment.
>
> I don't know how deep do you want this refactor to be, but there is one
> issue I would like us to consider, if only to decide it is out of scope of
> this change. I am talking about the `quote_char` thingy. The main problem
> for me is that I don't think it's possible to sanely define the meaning of
> that field. According to POSIX quoting rules (which our command line
> more-or-less follows) a single argument can be quoted in a great many ways,
> using various combinations of quote characters. For example, these are all
> valid ways to represent the argument `asdf` in a POSIX shell:
>
>   asdf
>   "asdf"
>   'asdf'
>   a"sd"f
>   "as"df
>   "as""df"
>   "as"'df'
>   "a"s'd'"f"
>   ... (you get my point)
>
> I don't think there is a self-consistent way to define what the
> `quote_char` field will be for each of these options. Moreover, I don't see
> why one would ever need to use that field. It can only encourage someone to
> try to "quote" the argument by doing `quote_char+value+quote_char`, which
> is absolutely wrong if you ever want that result to be machine parsable.(*)
> For proper quoting I think we should just have a free-standing `std::string
> quote_for_posix_shell(llvm::StringRef)` function (and maybe
> `quote_for_windows_cmd`, and whatever else quoting scheme we need), and
> then the user can decide which one to use based on who is going to be
> consuming it. Then we can just kill the `quote` field. The only thing is...
> I have no idea how much work that will be (but I am ready to chip in to
> make it happen).
>
> So, yea, if we decide not to do that, then I think the interface looks
> great. Otherwise, I think we can design a slightly simpler (and more
> consistent) one.
>
> (*) Bonus question: Try to start an executable under lldb, so that in
> enters `main()` with `argc=2` and `argv[1]="'"` I.e.,  as if it had been
> started this way via bash:
>
>   $ /bin/cat \'
>
>
> https://reviews.llvm.org/D26883
>
>
>
>
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D26883: Demonstrate proposed new Args API

2016-11-19 Thread Pavel Labath via lldb-commits
labath added a comment.

> Assuming we do that, what interface do you think would be simpler? We still
>  need easy access to both a StringRef and a c_str(), since StringRef::data
>  is not guaranteed to be null terminated, so the entry thing is still nice.

I was assuming (possibly incorrectly, I did not look at that code much) that 
the main user of the null-terminated string version was execve(2), which needs 
an entire list of strings, not just a single item, in which case we could have 
the iteration simply be over StringRefs. That said, it probably does not matter 
now, as judging by your comments, you're looking for incremental changes, not 
one grand final design (btw, I didn't mean to suggest that this should all be 
done in one patch). In that case, this looks fine as far as I am concerned. We 
can always revise this later, and it doesn't look like it will require any 
major rewrite, just a bit of syntax-twiddling. You can then read my comments as 
"the direction I would like to move us in".

> One idea might be to have the entry contain 2 StringRefs. `str` and
> `quoted_str`. This way you never get access to the underlying quote char,
>  just the full arg, either quoted or unquoted (although doing this would
>  still be better done orthogonally to this patch)

I don't think this is a good idea, as I don't see a need to be able to access 
the original quoted string this way. Also, when you construct the args vector 
programmatically, you would have to invent a quoted representation without 
knowing if it will ever be used. I'd prefer to have a standalone quote function 
instead, as then it can be used in other contexts as well (separate algorithms 
from data).


https://reviews.llvm.org/D26883



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


Re: [Lldb-commits] [PATCH] D26883: Demonstrate proposed new Args API

2016-11-19 Thread Zachary Turner via lldb-commits
The main user of the c_str() is printf calls. It's in the commit message :)
On Sat, Nov 19, 2016 at 7:14 AM Pavel Labath  wrote:

> labath added a comment.
>
> > Assuming we do that, what interface do you think would be simpler? We
> still
> >  need easy access to both a StringRef and a c_str(), since
> StringRef::data
> >  is not guaranteed to be null terminated, so the entry thing is still
> nice.
>
> I was assuming (possibly incorrectly, I did not look at that code much)
> that the main user of the null-terminated string version was execve(2),
> which needs an entire list of strings, not just a single item, in which
> case we could have the iteration simply be over StringRefs. That said, it
> probably does not matter now, as judging by your comments, you're looking
> for incremental changes, not one grand final design (btw, I didn't mean to
> suggest that this should all be done in one patch). In that case, this
> looks fine as far as I am concerned. We can always revise this later, and
> it doesn't look like it will require any major rewrite, just a bit of
> syntax-twiddling. You can then read my comments as "the direction I would
> like to move us in".
>
> > One idea might be to have the entry contain 2 StringRefs. `str` and
> > `quoted_str`. This way you never get access to the underlying quote char,
> >  just the full arg, either quoted or unquoted (although doing this would
> >  still be better done orthogonally to this patch)
>
> I don't think this is a good idea, as I don't see a need to be able to
> access the original quoted string this way. Also, when you construct the
> args vector programmatically, you would have to invent a quoted
> representation without knowing if it will ever be used. I'd prefer to have
> a standalone quote function instead, as then it can be used in other
> contexts as well (separate algorithms from data).
>
>
> https://reviews.llvm.org/D26883
>
>
>
>
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


Re: [Lldb-commits] [PATCH] D26883: Demonstrate proposed new Args API

2016-11-19 Thread Zachary Turner via lldb-commits
The reason we store the quote char, btw, is so that we can use the same
exact quoting as the user typed at command line. A general algorithm
wouldn't be able to reproduce exactly what the user typed
On Sat, Nov 19, 2016 at 9:43 AM Zachary Turner  wrote:

> The main user of the c_str() is printf calls. It's in the commit message :)
> On Sat, Nov 19, 2016 at 7:14 AM Pavel Labath  wrote:
>
> labath added a comment.
>
> > Assuming we do that, what interface do you think would be simpler? We
> still
> >  need easy access to both a StringRef and a c_str(), since
> StringRef::data
> >  is not guaranteed to be null terminated, so the entry thing is still
> nice.
>
> I was assuming (possibly incorrectly, I did not look at that code much)
> that the main user of the null-terminated string version was execve(2),
> which needs an entire list of strings, not just a single item, in which
> case we could have the iteration simply be over StringRefs. That said, it
> probably does not matter now, as judging by your comments, you're looking
> for incremental changes, not one grand final design (btw, I didn't mean to
> suggest that this should all be done in one patch). In that case, this
> looks fine as far as I am concerned. We can always revise this later, and
> it doesn't look like it will require any major rewrite, just a bit of
> syntax-twiddling. You can then read my comments as "the direction I would
> like to move us in".
>
> > One idea might be to have the entry contain 2 StringRefs. `str` and
> > `quoted_str`. This way you never get access to the underlying quote char,
> >  just the full arg, either quoted or unquoted (although doing this would
> >  still be better done orthogonally to this patch)
>
> I don't think this is a good idea, as I don't see a need to be able to
> access the original quoted string this way. Also, when you construct the
> args vector programmatically, you would have to invent a quoted
> representation without knowing if it will ever be used. I'd prefer to have
> a standalone quote function instead, as then it can be used in other
> contexts as well (separate algorithms from data).
>
>
> https://reviews.llvm.org/D26883
>
>
>
>
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits