parameters in URL path segments
Guys,

Sorry to open up this subject again. I've just read the mails in this thread:

http://marc.theaimsgroup.com/?l=tomcat-dev&m=115344110306194&w=2
http://marc.theaimsgroup.com/?l=tomcat-dev&m=115346837428224&w=2

Though I can't say I paid particular attention to the jkmount situation (and so I can't testify as to how treatment of such parameters might affect that), I can say that I'd like to be able to use ;parameter=value in my path segments in a Tomcat environment:

http://example.com/this;biz=bar;foo/that;v=1.1/whatever

To me, this looks completely valid per RFC 2396, 2616, and 3986, and it's a surprise to me that Tomcat strips any path following the first such parameter. I would like to see Tomcat essentially ignore the fact that ';' exists in a path segment, and pass it on to the servlet unmodified to do with as it pleases.

I'm inspired by the following paragraph in G.4 of RFC 2396:

Extensive testing of current client applications demonstrated that the majority of deployed systems do not use the ";" character to indicate trailing parameter information, and that the presence of a semicolon in a path segment does not affect the relative parsing of that segment. Therefore, parameters have been removed as a separate component and may now appear in any path segment. Their influence has been removed from the algorithm for resolving a relative URI reference. The resolution examples in Appendix C have been modified to reflect this change.

And also by the following from RFC 3986:

Aside from dot-segments in hierarchical paths, a path segment is considered opaque by the generic syntax. URI producing applications often use the reserved characters allowed in a segment to delimit scheme-specific or dereference-handler-specific subcomponents. For example, the semicolon (";") and equals ("=") reserved characters are often used to delimit parameters and parameter values applicable to that segment. The comma (",") reserved character is often used for similar purposes.
For example, one URI producer might use a segment such as "name;v=1.1" to indicate a reference to version 1.1 of "name", whereas another might use a segment such as "name,1.1" to indicate the same. Parameter types may be defined by scheme-specific semantics, but in most cases the syntax of a parameter is specific to the implementation of the URI's dereferencing algorithm.

Note that the segment syntax in RFC 3986 explicitly allows sub-delims (through pchar), of which ';' is but one.

James
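To make the RFC 3986 point concrete, here is a minimal sketch (Python, purely for illustration) that checks a string against the `segment` production from section 3.3 of RFC 3986. The name `is_valid_segment` is hypothetical; the code only encodes the grammar and says nothing about how Tomcat or any other server actually parses paths.

```python
import re

# RFC 3986, section 3.3:  segment = *pchar
#   pchar      = unreserved / pct-encoded / sub-delims / ":" / "@"
#   unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
#   sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="
PCHAR = rf"(?:[{UNRESERVED}{SUB_DELIMS}:@]|%[0-9A-Fa-f]{{2}})"
SEGMENT_RE = re.compile(rf"{PCHAR}*")


def is_valid_segment(segment: str) -> bool:
    """Return True if `segment` matches the RFC 3986 `segment` production."""
    return SEGMENT_RE.fullmatch(segment) is not None


for seg in ["this;biz=bar", "this,biz=bar", "this+biz=bar", "a b", "%zz"]:
    print(f"{seg!r}: {is_valid_segment(seg)}")
```

Under this grammar "this;biz=bar", "this,biz=bar" and "this+biz=bar" are equally valid segments, while a space or a malformed escape such as "%zz" is not; ';' gets no special status.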
Re: parameters in URL path segments
On Aug 21, 2006, at 6:26 PM, James Berry wrote:
> I would like to see Tomcat essentially ignore the fact that ';' exists in a path segment, and pass it on to the servlet unmodified to do with as it pleases.

In fact, I don't see any motivation for special handling of semicolon vs. any of the other sub-delims characters, none of which Tomcat does anything special with. Comma and plus are allowed in path segments, for instance; why is semicolon treated differently? The more recent RFCs say that there is nothing special the server/container should do with such a character when it appears in a path segment.

Obviously, there would be places where it would be plain wrong to place such a parameter/character (if it didn't map to a servlet, or to a file, or...), but there are other places (in extra path segments in pathInfo, for instance) where the interpretation of such characters should NOT BE made by the server, but by the ultimate consumer of those bits. Therefore, the server should simply not place any special meaning on such characters in any path segment.
James
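The contrast between the behavior described above and an opaque treatment of segments can be sketched in a few lines. This is a hypothetical model written from the description in this thread ("strips any path following the first such parameter"), not Tomcat's actual code; `truncate_at_first_semicolon` and `opaque_segments` are illustrative names.

```python
from typing import List


def truncate_at_first_semicolon(path: str) -> str:
    """Hypothetical model of the reported behavior: drop everything from the first ';'."""
    return path.split(";", 1)[0]


def opaque_segments(path: str) -> List[str]:
    """Opaque treatment: split only on '/', leaving ';', ',' and '+' untouched."""
    return path.strip("/").split("/")


semicolon = "/this;biz=bar;foo/that;v=1.1/whatever"
comma = "/this,biz=bar,foo/that,v=1.1/whatever"

print(truncate_at_first_semicolon(semicolon))  # /this
print(truncate_at_first_semicolon(comma))      # /this,biz=bar,foo/that,v=1.1/whatever
print(opaque_segments(semicolon))              # ['this;biz=bar;foo', 'that;v=1.1', 'whatever']
```

Under the first model only the semicolon URL loses most of its path; under the opaque model every sub-delim, ';' included, is simply part of the segment string handed to the application.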
Re: parameters in URL path segments
Jean-Frederic, Bill, Remy,

I didn't get any response to this, but would really like to hear your thoughts on this issue of parameters within URL path segments, which Tomcat currently does not allow. Can you give me any feedback?

Thanks!
James
Re: parameters in URL path segments
Hi Jean-Frederic,

On Aug 23, 2006, at 1:24 PM, Jean-frederic Clere wrote:
> James Berry wrote:
>> Jean-Frederic, Bill, Remy,
>> I didn't get any response to this, but would really like to hear your thoughts on this issue of parameters within URL path segments, which Tomcat currently does not allow. Can you give me any feedback?
>>
>> http://example.com/this;biz=bar;foo/that;v=1.1/whatever
>
> The question is to which "context" such a request has to be mapped? http://example.com/this/that and servlet whatever?

My response is that Tomcat should be completely blind to "parameters". Basically, from Tomcat's perspective, they don't exist. There is nothing any more special about "this;biz=bar" than "this,biz=bar" or "this-biz-bar". So if there is no context with the name "this;biz=bar", then Tomcat should do whatever it does when there is no context "undefinedcontext". Same with servlet mapping. Tomcat should be blind to the very existence of parameters because it doesn't place any meaning on them.

And frankly, since it doesn't, I'm probably not very likely to give you a URL such as that one, because there's no reason to make such an ugly URL, but I guess it makes a good argument, and perhaps a good example. A more realistic URL I might use is:

http://example.com/context/servlet/sega;with=parameters/segb;v=1.0;b;c/segc

In such a case, my servlet would parse the pathInfo and do something meaningful with the parameters. My point, however, is that currently I cannot do the latter, because Tomcat is trying to be too smart. It doesn't support any meaning for the parameter syntax, and so, rather than tossing any path that contains parameters, it should just be blind to the fact that they are there. If somebody creates a URL scheme that uses them, Tomcat shouldn't stand in their way.

James

> What do you want to do with biz=bar and foo? v=1.1... means that you want version 1.1 of that, no?
>
> Cheers
> Jean-Frederic
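The kind of application-level pathInfo parsing James describes could look something like the sketch below. It assumes one particular convention (';' between parameters, '=' between key and value, bare keys allowed); RFC 3986 leaves that syntax to the application, and `parse_path_params` is a hypothetical helper, not part of the Servlet API. It works on the raw path string and does no percent-decoding, so an encoded %3B is not treated as a separator.

```python
from typing import Dict, List, Optional, Tuple

# (segment name, {parameter: value-or-None})
Segment = Tuple[str, Dict[str, Optional[str]]]


def parse_path_params(path_info: str) -> List[Segment]:
    """Split a raw path into (name, params) pairs.

    Hypothetical convention: ';' separates parameters within a segment,
    '=' separates a key from its value, and a bare key maps to None.
    """
    segments: List[Segment] = []
    for raw in path_info.strip("/").split("/"):
        name, *pairs = raw.split(";")
        params: Dict[str, Optional[str]] = {}
        for pair in pairs:
            key, sep, value = pair.partition("=")
            params[key] = value if sep else None
        segments.append((name, params))
    return segments


print(parse_path_params("/sega;with=parameters/segb;v=1.0;b;c/segc"))
# [('sega', {'with': 'parameters'}), ('segb', {'v': '1.0', 'b': None, 'c': None}), ('segc', {})]
```

The point of the sketch is only that this interpretation lives in the servlet; the container would just need to pass the segments through intact.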
Re: parameters in URL path segments
Hi William,

On Aug 23, 2006, at 2:05 PM, William A. Rowe, Jr. wrote:
> James Berry wrote:
>> My response is that Tomcat should be completely blind to "parameters". Basically, from Tomcat's perspective, they don't exist. There is nothing any more special about "this;biz=bar" than "this,biz=bar" or "this-biz-bar".
>
> But, of course, if your access control does call out a segment this/, then the segment this;biz=bar/ would escape that access control, so in some ways it is *quite* special; parameters are extra metadata.

Perhaps I'm not understanding you. Yes, in this case the segment name should be "this;biz=bar" and not "this". If there were access control on segment "this", then "this;biz=bar" should not follow that access control. In what way, and why, does "this;biz=bar" escape access control any more than "this-funny-name" would? If it were "this,biz=bar", would it?

>> Tomcat should be blind to the very existence of parameters because it doesn't place any meaning on them.
>
> I agree that an application could add meaning to a parameter, but do consider the first rule of URI namespace, which is that each and every URI should be canonical and unique. Returning the same 200 OK result with the same document for everything under /abuseme means that a crawler can end up with /abuseme/1 /abuseme/2 /abuseme/3 ... in all sorts of nasty recursive situations.

Again, perhaps I'm not following you. I can certainly do that today, by passing all sorts of information in pathInfo following any URL.

> Because Tomcat and Apache are blind to parameters, the connector -should- reject them. When Tomcat/Apache are able to treat your "this;biz=bar" example the same as "this" for the purpose of access control, then they can be enabled in an opaque manner that lets the application determine their meaning and context.

So maybe this is the crux of it. Why/where is it that "this;biz=bar" cannot be treated the same for the purposes of access control as "this"?
The URL spec says that these are equally valid, and that "this,biz=bar" is equally valid (and suggests too that it might also be used for passing parameters), but to my understanding, that should be no concern of Tomcat's.

James
Re: parameters in URL path segments
On Aug 23, 2006, at 2:22 PM, James Berry wrote:
> Tomcat should be blind to the very existence of parameters because it doesn't place any meaning on them.

Perhaps I should clarify: when I say Tomcat should be "blind" to parameters, I really should have said "Tomcat should not be discerning of parameters". It should completely _not_recognize_ parameters. It's not that it shouldn't see them, but that it shouldn't recognize them as anything but a string of characters. It should treat the strings of characters "one", "two", "one+two", "one;two", "one;two=three", "one,two=three", etc., no differently (yes, they are distinct strings, but none is more special than any other). RFC 3986 basically says there should be no such distinctions between these segment names.

James
Re: parameters in URL path segments
On Aug 23, 2006, at 2:40 PM, William A. Rowe, Jr. wrote:
> James Berry wrote:
>>> Because Tomcat and Apache are blind to parameters, the connector -should- reject them. When Tomcat/Apache are able to treat your "this;biz=bar" example the same as "this" for the purpose of access control, then they can be enabled in an opaque manner that lets the application determine their meaning and context.
>> So maybe this is the crux of it. Why/where is it that "this;biz=bar" cannot be treated the same for the purposes of access control as "this"? The URL spec says that these are equally valid, and that "this,biz=bar" is equally valid (and suggests too that it might also be used for passing parameters), but to my understanding, that should be no concern of Tomcat's.
>
> BUT today's parsers don't do that. So any DENY rule on "this" would let "this;biz=bar" slip through, while the handler might process "this" and ignore parameters entirely.

So such a deny rule on "this" would currently let "this;biz=bar" through, and would also let "thisthatandtheotherthing" through too, right? I see nothing wrong with that: it follows my assertion that semicolon parameters simply should not be treated any differently.

> Now understand I'm not a big fan of deny rules (deny all, then always selectively grant access ;-) But we can't ignore that they exist, and if parameters must be treated independently of the resource that they

What I'm saying is that they should not be treated independently or differently. They should be treated not as metadata, but as part of the segment.

> modify, then /myfolder;v=1.1/records.doc;f=rtf must parse against any access control rules of /myfolder/records.doc, which means they need a canonical form for access control independent of their parameters.
The situation today is that if I form one URL that looks like:

/myfolder;v=1.1/records.doc;f=rtf

and another that looks like:

/myfolder,v=1.1/records.doc,f=rtf

and another that looks like:

/myfolder+v=1.1/records.doc+f=rtf

the one that uses semicolons is mangled by Tomcat, while the other two are not. What I'm claiming is that this is a bug, because parameters should not be treated differently by Tomcat; it can do nothing different with the first URL than it can with the second or third, and it should leave the distinction to code that can, rather than assuming it knows something that it doesn't. The RFCs, to me, are saying that HTTP servers shouldn't treat segment parameters in any special way. And until they should, or can, or really want to, Tomcat shouldn't reject a semicolon when it doesn't reject a comma.

> I need to research, but it's probably doable. It's not doable by just tweaking the code in mod_jk, however.
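Bill's call for a canonical form for access control, and James's objection to it, can be compared with a small sketch. Everything here is hypothetical: `canonical_path`, `is_denied` and the exact-or-prefix rule matching are illustrative assumptions, not how httpd, mod_jk or Tomcat evaluate access rules.

```python
from typing import Iterable


def canonical_path(path: str) -> str:
    """Hypothetical canonical form for access checks: drop ';...' from every segment."""
    return "/".join(segment.split(";", 1)[0] for segment in path.split("/"))


def rule_matches(target: str, rule: str) -> bool:
    """Assumed rule semantics: a rule covers the path itself and everything below it."""
    return target == rule or target.startswith(rule + "/")


def is_denied(path: str, deny_rules: Iterable[str], canonicalize: bool) -> bool:
    """Check a request path against deny rules, optionally after canonicalizing it."""
    target = canonical_path(path) if canonicalize else path
    return any(rule_matches(target, rule) for rule in deny_rules)


rules = ["/myfolder/records.doc"]
request = "/myfolder;v=1.1/records.doc;f=rtf"
print(is_denied(request, rules, canonicalize=False))  # False: slips through
print(is_denied(request, rules, canonicalize=True))   # True: rule applies
```

The trade-off is the one argued above: canonicalizing closes the gap Bill describes, but it also means a segment literally named "this;biz=bar" can never be governed separately from "this", which is the distinction James wants left to the application.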
Re: parameters in URL path segments
Hi Bill,

On Aug 23, 2006, at 3:35 PM, William A. Rowe, Jr. wrote:
> James Berry wrote:
>> What I'm saying is that they should not be treated independently or differently. They should be treated not as metadata, but as part of the segment.
>
> To be 100% clear: this is what Apache httpd does today. If you ask for foo.html;v=1 it will open the -file- foo.html;v=1 or fail. What jk or Tomcat does with the same is up to those components, but httpd has no magic whatsoever, which is what you want. You would like the same of Tomcat. But...
>
> ...this would be valid if /servlet/MyApplication;v=1 invokes the class MyApplication;v=1

Yes, I would expect it to invoke the class "MyApplication;v=1". I hadn't considered the other behavior.

> and not MyApplication with a parameter of v=1. If it invokes the class MyApplication then we can't follow your philosophy, since the permissions were likely to apply to the servlet class and not to the precise syntax the user called MyApplication with.

That's not what I'm asking for. It's an interesting idea, but as you've pointed out, it might require significantly more work and thought to get right. I'm simply asking that Tomcat not mangle URLs that contain parameters. The cases in which I'd be interested in using parameters are below the servlet level (in pathInfo), so invoking servlets with such parameters, while conceivably interesting, is by no means necessary.

I'd ask that, for now, special-case handling of parameters be removed. If at some point in the future support were added to specially parse parameters and pass them into servlets, then I'd assume there would have to be special configuration to enable such behavior. So allowing the more wide-open, general-case behavior that I'm asking for now wouldn't prevent such support in the future, as parameters to servlets would presumably only be parsed if such behavior was enabled.
Given the present behavior, however, any use of parameters is disallowed, which seems overly restrictive, and certainly prevents a broad range of interesting applications for segment parameters.

James
Re: parameters in URL path segments
Bill and others,

I think we fell into agreement regarding how this should work. What does it take to go about getting this fixed in Tomcat? A patch?

- Is there sufficient agreement from others that the behavior I described is desired?
- Can anybody point me to where I might change this behavior, and tell me whether there are any pitfalls I should look out for?
- Can somebody comment on how a change to this behavior might interact with the recent change (http://marc.theaimsgroup.com/?l=tomcat-dev&m=115344110306194&w=2) or affect the motivation for that change?

Thanks!
James