parameters in URL path segments

2006-08-21 Thread James Berry

Guys,

Sorry to open up this subject again. I've just read the mails in this  
thread:


http://marc.theaimsgroup.com/?l=tomcat-dev&m=115344110306194&w=2
http://marc.theaimsgroup.com/?l=tomcat-dev&m=115346837428224&w=2

Though I can't say I paid particular attention to the jkmount  
situation (and so I can't testify as to how treatment of such  
parameters might affect that), I can say that I'd like to be able to  
use ;parameter=value in my path segments in a tomcat environment:


http://example.com/this;biz=bar;foo/that;v=1.1/whatever

To me, this looks completely valid per rfc 2396, 2616, and 3986, and  
it's a surprise to me that tomcat strips any path following the first  
such parameter.


I would like to see tomcat essentially ignore the fact that ';'  
exists in a path segment, and pass it on into the servlet unmodified  
to do with as it pleases.


I'm inspired by the following paragraph in G.4 of rfc 2396:

   Extensive testing of current client applications demonstrated that
   the majority of deployed systems do not use the ";" character to
   indicate trailing parameter information, and that the presence of a
   semicolon in a path segment does not affect the relative parsing of
   that segment.  Therefore, parameters have been removed as a separate
   component and may now appear in any path segment.  Their influence
   has been removed from the algorithm for resolving a relative URI
   reference.  The resolution examples in Appendix C have been modified
   to reflect this change.

And also by the following from rfc 3986:

   Aside from dot-segments in hierarchical paths, a path segment is
   considered opaque by the generic syntax.  URI producing applications
   often use the reserved characters allowed in a segment to delimit
   scheme-specific or dereference-handler-specific subcomponents.  For
   example, the semicolon (";") and equals ("=") reserved characters are
   often used to delimit parameters and parameter values applicable to
   that segment.  The comma (",") reserved character is often used for
   similar purposes.  For example, one URI producer might use a segment
   such as "name;v=1.1" to indicate a reference to version 1.1 of
   "name", whereas another might use a segment such as "name,1.1" to
   indicate the same.  Parameter types may be defined by scheme-specific
   semantics, but in most cases the syntax of a parameter is specific to
   the implementation of the URI's dereferencing algorithm.

Note that the segment syntax in rfc 3986 explicitly allows sub-delims
(through pchar), of which ';' is but one.
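To make the pchar point concrete, here is a minimal sketch that checks a single path segment against the rfc 3986 rule "segment = *pchar". The class and method names (SegmentSyntax, isValidSegment) are made up for illustration and are not part of Tomcat or any servlet API. It only looks at syntax and says nothing about how a container should map the segment.

```java
import java.util.regex.Pattern;

// Illustrative sketch only: SegmentSyntax is a hypothetical helper, not a Tomcat class.
final class SegmentSyntax {
    // rfc 3986: segment = *pchar
    //           pchar   = unreserved / pct-encoded / sub-delims / ":" / "@"
    private static final Pattern SEGMENT = Pattern.compile(
        "(?:[A-Za-z0-9._~-]"      // unreserved
        + "|%[0-9A-Fa-f]{2}"      // pct-encoded
        + "|[!$&'()*+,;=]"        // sub-delims, which include ';' ',' and '+'
        + "|[:@])*");

    // True when the whole string matches the segment rule.
    static boolean isValidSegment(String segment) {
        return SEGMENT.matcher(segment).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"this;biz=bar;foo", "that;v=1.1", "name,1.1", "one+two", "a b"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValidSegment(s));
        }
    }
}
```

Under this grammar the semicolon forms are accepted on exactly the same footing as the comma and plus forms; only the sample containing a space is rejected.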


James

Re: parameters in URL path segments

2006-08-21 Thread James Berry


On Aug 21, 2006, at 6:26 PM, James Berry wrote:



I would like to see tomcat essentially ignore the fact that ';'  
exists in a path segment, and pass it on into the servlet  
unmodified to do with as it pleases.


In fact, I don't see any motivation for special handling of semicolon
vs. any of the other sub-delims characters, none of which tomcat does
anything special with. Comma and plus are allowed in path segments,
for instance; why is semicolon treated differently? The more recent
RFCs say that there is nothing special that the server/container
should do with such a character when it appears in a path segment.


Of course, there would be places where it would be plain wrong to
place such a parameter/character (if it didn't map to a servlet, or to
a file, or...) but there are other places (in extra path segments in
pathInfo, for instance) where the interpretation of such characters
should be made NOT by the server, but by the ultimate consumer of
those bits. Therefore, the server should simply not place any special
meaning on such characters in any path segment.


James






-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: parameters in URL path segments

2006-08-23 Thread James Berry

Jean-Frederic, Bill, Remy,

I didn't get any response to this, but would really like to hear your
thoughts on this issue of parameters within URL path segments, which
tomcat currently does not allow. Can you give me any feedback?


Thanks!

James







Re: parameters in URL path segments

2006-08-23 Thread James Berry

Hi Jean-Frederic,

On Aug 23, 2006, at 1:24 PM, Jean-frederic Clere wrote:


James Berry wrote:

Jean-Frederic, Bill, Remy,

I didn't get any response to this, but would really like to hear
your thoughts on this issue of parameters within URL path
segments, which tomcat currently does not allow. Can you give me
any feedback?


http://example.com/this;biz=bar;foo/that;v=1.1/whatever
The question is to which "context" such a request has to be mapped?
http://example.com/this/that and servlet whatever?


My response is that Tomcat should be completely blind to
"parameters". Basically, from Tomcat's perspective, they don't exist.
There is nothing any more special about "this;biz=bar" than
"this,biz=bar" or "this-biz-bar".


So if there is no context with the name "this;biz=bar" then tomcat  
should do whatever it does when there is no context  
"undefinedcontext". Same with servlet mapping.


Tomcat should be blind to the very existence of parameters because it  
doesn't place any meaning on them. And frankly, since it doesn't, I'm  
probably not very likely to try to give you a url such as that one,  
because there's no reason to make such an ugly url, but I guess it  
makes a good argument, and perhaps a good example.


A more realistic url I might use is:

	http://example.com/context/servlet/sega;with=parameters/segb;v=1.0;b;c/segc


In such a case, my servlet would parse the pathinfo and do
something meaningful with the parameters.
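As a rough sketch of what such parsing might look like, the code below splits a path string into segments and pulls the ";name=value" parameters off each one. PathSegments, Segment and parse are hypothetical names invented for this example. The input is assumed to be the raw, still-unmangled path after the servlet mapping, and percent-decoding is deliberately left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: none of these names come from Tomcat or the servlet API.
final class PathSegments {
    // One path segment: the part before the first ';' plus its parameters.
    static final class Segment {
        final String name;
        final Map<String, String> params;

        Segment(String name, Map<String, String> params) {
            this.name = name;
            this.params = params;
        }
    }

    // Splits "/a;x=1/b;flag" into segments. A parameter without '=' maps to "".
    static List<Segment> parse(String pathInfo) {
        List<Segment> out = new ArrayList<>();
        if (pathInfo == null) {
            return out;
        }
        for (String raw : pathInfo.split("/")) {
            if (raw.isEmpty()) {
                continue; // skip the empty piece before a leading '/'
            }
            // limit -1 keeps trailing empty pieces, so parts[0] always exists
            String[] parts = raw.split(";", -1);
            Map<String, String> params = new LinkedHashMap<>();
            for (int i = 1; i < parts.length; i++) {
                if (parts[i].isEmpty()) {
                    continue; // tolerate ";;" and a trailing ';'
                }
                int eq = parts[i].indexOf('=');
                if (eq < 0) {
                    params.put(parts[i], "");
                } else {
                    params.put(parts[i].substring(0, eq), parts[i].substring(eq + 1));
                }
            }
            out.add(new Segment(parts[0], params));
        }
        return out;
    }

    public static void main(String[] args) {
        for (Segment s : parse("/sega;with=parameters/segb;v=1.0;b;c/segc")) {
            System.out.println(s.name + " " + s.params);
        }
    }
}
```

A LinkedHashMap is used so the parameters keep the order in which they appeared in the segment, which matters if the application treats them positionally.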


My point, however, is that currently I cannot do the latter, because
tomcat is trying to be too smart. It doesn't support any meaning for
the parameter syntax, and so, rather than tossing any path that
contains parameters, it should just be blind to the fact that they
are there. If somebody creates a url scheme that uses them, Tomcat
shouldn't stand in their way.


James


What do you want to do with biz=bar and foo?
v=1.1... means that you want version 1.1 of that, no?

Cheers

Jean-Frederic




Re: parameters in URL path segments

2006-08-23 Thread James Berry

Hi William,

On Aug 23, 2006, at 2:05 PM, William A. Rowe, Jr. wrote:


James Berry wrote:


My response is that Tomcat should be completely blind to
"parameters". Basically, from Tomcat's perspective, they don't exist.
There is nothing any more special about "this;biz=bar" than
"this,biz=bar" or "this-biz-bar".


But, of course, if your access control does call out a segment this/,
then the segment this;biz=bar/ would escape that access control,
so in some ways it is *quite* special; parameters are extra metadata.


Perhaps I'm not understanding you. Yes, in this case the segment name  
should be "this;biz=bar" and not "this". If there were access control  
on segment "this" then "this;biz=bar" should not follow that access  
control.


In what way, and why, does "this;biz=bar" escape access control any  
more than "this-funny-name" would? If it was "this,biz=bar" would it?





Tomcat should be blind to the very existence of parameters because it
doesn't place any meaning on them.


I agree that an application could add meaning to a parameter, but do
consider the first rule of URI namespace which is that each and every
URI should be canonical and unique.  Returning the same 200 OK result
with the same document for everything under /abuseme means that a
crawler can end up with /abuseme/1 /abuseme/2 /abuseme/3 ... in all
sorts of nasty recursive situations.


Again, I'm not following you, perhaps. I can certainly do that today,  
by passing all sorts of information in pathinfo following any url.


Because Tomcat and Apache are blind to parameters, the connector
-should- reject them.  When Tomcat/Apache are able to treat your
"this;biz=bar" example the same as "this" for the purpose of access
control, then they can be enabled in an opaque manner that lets the
application determine their meaning and context.


So maybe this is the crux of it. Why/where is it that "this;biz=bar"  
cannot be treated the same for the purposes of access control as  
"this"? The URL spec says that these are equally valid, and that  
"this,biz=bar" is equally valid (and suggests too that it might also  
be used for passing parameters) but to my understanding, that should  
be no concern of tomcat's.


James




Re: parameters in URL path segments

2006-08-23 Thread James Berry


On Aug 23, 2006, at 2:22 PM, James Berry wrote:

Tomcat should be blind to the very existence of parameters because it
doesn't place any meaning on them.


Perhaps I should clarify: when I say tomcat should be "blind" to
parameters, I really should have said "Tomcat should not be
discerning of parameters". It should completely _not_recognize_
parameters. It's not that it shouldn't see them, but that it
shouldn't recognize them as anything but a string of characters. It
should treat the strings of characters "one", "two", "one+two",
"one;two", "one;two=three", "one,two=three", etc, no differently
(yes, they are distinct strings, but none is more special than any
other). rfc 3986 basically says there should be no distinction
between these segment names.


James




Re: parameters in URL path segments

2006-08-23 Thread James Berry


On Aug 23, 2006, at 2:40 PM, William A. Rowe, Jr. wrote:


James Berry wrote:


Because Tomcat and Apache are blind to parameters, the connector
-should- reject them.  When Tomcat/Apache are able to treat your
"this;biz=bar" example the same as "this" for the purpose of access
control, then they can be enabled in an opaque manner that lets the
application determine their meaning and context.


So maybe this is the crux of it. Why/where is it that "this;biz=bar"
cannot be treated the same for the purposes of access control as
"this"? The URL spec says that these are equally valid, and that
"this,biz=bar" is equally valid (and suggests too that it might also be
used for passing parameters) but to my understanding, that should be no
concern of tomcat's.


BUT today's parsers don't do that.  So any DENY rule on "this" would let
"this;biz=bar" slip through, while the handler might process "this" and
ignore parameters entirely.


So such a deny rule on "this" would currently let "this;biz=bar"  
through, and would also let "thisthatandtheotherthing" through too,  
right?


I see nothing wrong with that: it follows my assertion that semicolon
parameters simply should not be treated any differently.



Now understand I'm not a big fan of deny rules (deny all, then always
selectively grant access ;-)  But we can't ignore that they exist, and
if parameters must be treated independently of the resource that they


What I'm saying is that they should not be treated independently or  
differently. They should be treated not as metadata, but as part of  
the segment.



modify, then /myfolder;v=1.1/records.doc;f=rtf must parse against any
access control rules of /myfolder/records.doc, which means they need
a canonical form for access control independent of their parameters.
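For illustration, a canonical form of the kind described here could be computed by dropping everything from a ';' up to the next '/' in each segment. This is only a sketch under that assumption: AccessControlPath and stripParameters are hypothetical names, and the code does not reflect how httpd, mod_jk or Tomcat actually match access control rules. It also ignores percent-encoded semicolons (%3B) and dot-segments, which a real implementation would have to consider.

```java
// Illustrative sketch only: a hypothetical helper, not code from httpd, mod_jk or Tomcat.
final class AccessControlPath {
    // Removes ";..." from every segment, keeping the '/' separators.
    static String stripParameters(String path) {
        StringBuilder out = new StringBuilder(path.length());
        boolean inParams = false;
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (c == '/') {
                inParams = false;   // a new segment starts
                out.append(c);
            } else if (c == ';') {
                inParams = true;    // skip the rest of this segment
            } else if (!inParams) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripParameters("/myfolder;v=1.1/records.doc;f=rtf"));
    }
}
```

With this helper, a rule written against /myfolder/records.doc could be compared with the stripped form of the request path rather than with the raw path.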


The situation today is that if I form one url that looks like:

/myfolder;v=1.1/records.doc;f=rtf

and another that looks like:

/myfolder,v=1.1/records.doc,f=rtf

and another that looks like:

/myfolder+v=1.1/records.doc+f=rtf

The one that uses semicolons is mangled by Tomcat, while the other  
two are not. What I'm claiming is that this is a bug, because  
parameters should not be treated differently by tomcat; it can do  
nothing different with the first url than it can with the second or  
third url, and it should leave the distinction to code that can,  
rather than assuming it knows something that it doesn't.


The rfcs, to me, are saying that http servers shouldn't treat segment  
parameters in any special way. And until they should, or can, or  
really want to, Tomcat shouldn't reject a semicolon when it doesn't  
reject a comma.
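The asymmetry between the three URLs above can be shown with a toy model. The first method below models the behaviour reported at the start of this thread (everything after the first ';' is stripped); it is an assumption based on that description and not Tomcat's actual code. The second models the requested behaviour, where the path is handed on untouched. All names are hypothetical.

```java
// Illustrative sketch only: models behaviour described in this thread, not Tomcat internals.
final class SemicolonHandling {
    // Reported behaviour: the path is cut at the first ';'.
    static String truncateAtFirstSemicolon(String path) {
        int i = path.indexOf(';');
        return i < 0 ? path : path.substring(0, i);
    }

    // Requested behaviour: ';' is just another character, so nothing changes.
    static String passThrough(String path) {
        return path;
    }

    public static void main(String[] args) {
        String[] urls = {
            "/myfolder;v=1.1/records.doc;f=rtf",
            "/myfolder,v=1.1/records.doc,f=rtf",
            "/myfolder+v=1.1/records.doc+f=rtf"
        };
        for (String u : urls) {
            System.out.println(u + " | truncated: " + truncateAtFirstSemicolon(u)
                + " | passed through: " + passThrough(u));
        }
    }
}
```

Only the semicolon URL is changed by the first method; the comma and plus URLs come out identical under both, which is the inconsistency being argued against.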



I need to research, but it's probably doable.  It's not doable by just
tweaking the code in mod_jk, however.







Re: parameters in URL path segments

2006-08-23 Thread James Berry

Hi Bill,

On Aug 23, 2006, at 3:35 PM, William A. Rowe, Jr. wrote:


James Berry wrote:

What I'm saying is that they should not be treated independently or
differently. They should be treated not as metadata, but as part of the
segment.


To be 100% clear; this is what Apache httpd does today.  If you ask for
foo.html;v=1 it will open the -file- foo.html;v=1 or fail.  What jk or
tomcat does with the same is up to those components, but httpd has no
magic whatsoever which is what you want.  You would like the same of
Tomcat.  But...

...this would be valid if /servlet/MyApplication;v=1 invokes the class
MyApplication;v=1


Yes, I would expect it to invoke the class "MyApplication;v=1". I  
hadn't considered the other behavior.



and not MyApplication with a parameter of v=1.
If it invokes the class MyApplication then we can't follow your philosophy
since the permissions were likely to apply to the servlet class and not
to the precise syntax the user called MyApplication with.


That's not what I'm asking for. It's an interesting idea, but as
you've pointed out, might require significantly more work and thought
to get right.


I'm simply asking that Tomcat not mangle urls that contain  
parameters. The cases in which I'd be interested in using parameters  
are below the servlet level (in pathinfo), so invoking servlets with  
such parameters, while conceivably interesting, is by no means  
necessary.


I'd ask that, for now, special case handling of parameters be  
removed. If at some point in the future support was added to  
specially parse parameters and pass them into servlets, then I'd  
assume there would have to be special configuration to enable such  
behavior. So allowing the more wide open general case behavior that  
I'm asking for now wouldn't prevent such support in the future, as  
parameters to servlets would presumably only be parsed if such  
behavior was enabled.


Given the present behavior, however, any use of parameters is
disallowed, which seems overly restrictive, and certainly prevents a
broad range of interesting applications for segment parameters.


James







Re: parameters in URL path segments

2006-09-05 Thread James Berry

Bill and others,

I think we fell into agreement regarding how this should work.

What does it take to go about getting this fixed in tomcat? A patch?

	- Is there sufficient agreement from others that the behavior I  
described is desired?


	- Can anybody point me at where I might go about changing this  
behavior, and whether there are any pitfalls I might look out for?


	- Can somebody comment on how a change to this behavior might
interact with the recent change
(http://marc.theaimsgroup.com/?l=tomcat-dev&m=115344110306194&w=2)
or affect the motivation for that change?


Thanks!

James





