Re: Tomcat 7 & regex
1) It it were configurable, in certain places it makes sense to use space as a separator (e.g. in IP addresses). -> any whitespace? -> \w+ and we end up with using a regex to split a list of regexes. 2) It might make sense to require regex expressions to be surrounded by '/'. E.g. "/192\.168\.1\.\d{1,3}/" is a regex, but "192.168.1.17" is a literal value. 3) I wonder if it makes sense to manipulate RequestFilterValve though JMX. E.g. to add/remove some filtering patterns at runtime. Mark, are there other places than RequestFilterValve and its subclasses (RemoteAddrValve, RemoteHostValve) where you are planning this change? There this feature can be configurable. E.g. if split='' then splitting is not performed at all. I do not see why we should force users to use a single regex only. Having a single regex by default is OK with me, but forcing a single regex saves too little in performance of RequestFilterValve.process(..) (removes iterating over an array but adds a null check). [OT] Merry X'mas Best regards, Konstantin Kolinko 2010/12/25 Tim Funk : > +0.5 - I wonder if in some cases - it may be preferable to use a property > called split which lets the user define the separator which we can pass to > String.split(). [Which OTOH may be more confusing (yet powerful) since the > user is using a regex to split get a list of regex] > > -Tim > > On 12/24/2010 1:34 PM, Mark Thomas wrote: >> >> There are a number of configuration properties defined as "comma >> separated regular expressions". As someone pointed out at at ApacheCon >> that is a little odd. It stops "," being used in an expression and is >> inefficient. >> >> Having just been bitten by this while setting up the new Jira instance, >> I intend change all properties that take regex in Tomcat 7 to use a >> single regex. This will simplify the code, simplify configuration and >> make the regex processing faster. >> >> I probably won't get around to actually doing this until the new year. > > - > To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org > For additional commands, e-mail: dev-h...@tomcat.apache.org > > - To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org For additional commands, e-mail: dev-h...@tomcat.apache.org
Re: Tomcat 7 & regex
On 25/12/2010 01:49, Tim Funk wrote: > +0.5 - I wonder if in some cases - it may be preferable to use a > property called split which lets the user define the separator which we > can pass to String.split(). [Which OTOH may be more confusing (yet > powerful) since the user is using a regex to split get a list of regex] I think that just makes it more complicated. There is no need to split anything up since the regex can just use |. Rather than mix regex plus our own proprietary splitting mechanism, I think we should just use the support already provided by regex to do the same thing. Mark > > -Tim > > On 12/24/2010 1:34 PM, Mark Thomas wrote: >> There are a number of configuration properties defined as "comma >> separated regular expressions". As someone pointed out at at ApacheCon >> that is a little odd. It stops "," being used in an expression and is >> inefficient. >> >> Having just been bitten by this while setting up the new Jira instance, >> I intend change all properties that take regex in Tomcat 7 to use a >> single regex. This will simplify the code, simplify configuration and >> make the regex processing faster. >> >> I probably won't get around to actually doing this until the new year. > > - > To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org > For additional commands, e-mail: dev-h...@tomcat.apache.org > - To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org For additional commands, e-mail: dev-h...@tomcat.apache.org
Re: Tomcat 7 & regex
On 25/12/2010 13:37, Konstantin Kolinko wrote: > 1) It it were configurable, in certain places it makes sense to use > space as a separator (e.g. in IP addresses). > -> any whitespace? -> \w+ and we end up with using a regex to split a > list of regexes. Yes, space could work but I'd rather stick to what folks expect of standard regex. | achieves the same result but is standard regex. > 2) It might make sense to require regex expressions to be surrounded by '/'. > E.g. "/192\.168\.1\.\d{1,3}/" is a regex, but "192.168.1.17" is a literal > value. I'd rather Tomcat 7 moved towards an existing standard rather than tried to create a new one. > 3) I wonder if it makes sense to manipulate RequestFilterValve though > JMX. E.g. to add/remove some filtering patterns at runtime. It is certainly something I can see would be useful - e.g. reacting to an attacker. Making that dynamic should be do-able with care. > Mark, are there other places than RequestFilterValve and its > subclasses (RemoteAddrValve, RemoteHostValve) where you are planning > this change? It was actually RemoteIpValve that got me started on this. You can't explicitly set the default since it uses ',' in the regex but we also split using ','. I wanted to fix that and moving to a single regex fixes that and removes any chance of any similar gotchas in the future. > There this feature can be configurable. E.g. if split='' then > splitting is not performed at all. I do not see why we should force > users to use a single regex only. All it really does is force users to use the standard regex of '|' where they currently use ','. > Having a single regex by default is OK with me, but forcing a single > regex saves too little in performance of > RequestFilterValve.process(..) (removes iterating over an array but > adds a null check). I don't have any hard numbers but I suspect matching a single regex using '|' is going to be faster than matching multiple. Probably not by much. The code simplification is pretty minor too. > [OT] Merry X'mas +1 to all. Mark - To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org For additional commands, e-mail: dev-h...@tomcat.apache.org
Re: Tomcat 7 & regex
I am thinking from an admin point of view. While you can combine OR conditionals in regex's - when you get something more complicated - you may encounter a nasty nesting of () to get all the nested OR's correct. So while one COULD to it in a single regex - most mere mortals might not be able to comprehend that regex. This is where combining different regexes by a user defined split may come in handy. But in reality ... the general case a user is doing is probably a singular regex. So since the capability exists for more exotic matches, it may be easier to simplify the code per your (mark's) suggestion and then provide enough coaching to demonstrate how more advanced regexes can be applied. (So I am cool with it. Time permitting - I'll try to provide an example or 2 for odd regexes - like RFC1918 matching for the RemoteAddrressFilter) -Tim On 12/25/2010 2:53 PM, Mark Thomas wrote: On 25/12/2010 01:49, Tim Funk wrote: +0.5 - I wonder if in some cases - it may be preferable to use a property called split which lets the user define the separator which we can pass to String.split(). [Which OTOH may be more confusing (yet powerful) since the user is using a regex to split get a list of regex] I think that just makes it more complicated. There is no need to split anything up since the regex can just use |. Rather than mix regex plus our own proprietary splitting mechanism, I think we should just use the support already provided by regex to do the same thing. Mark - To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org For additional commands, e-mail: dev-h...@tomcat.apache.org
Re: Tomcat 7 & regex
+1 i like the approach toward standard regex. ones that use it most likely will be sysadmins, so they should at least know how to use regex in their tools like shell, awk, perl, python or powershell. it will be nice if the standard regex is in place. --- daniel baktiar http://savinggaia.tritiumapps.com - saving the planet is everyone's business! On 26 December 2010 04:02, Mark Thomas wrote: > On 25/12/2010 13:37, Konstantin Kolinko wrote: > > 1) It it were configurable, in certain places it makes sense to use > > space as a separator (e.g. in IP addresses). > > -> any whitespace? -> \w+ and we end up with using a regex to split a > > list of regexes. > > Yes, space could work but I'd rather stick to what folks expect of > standard regex. | achieves the same result but is standard regex. > > > 2) It might make sense to require regex expressions to be surrounded by > '/'. > > E.g. "/192\.168\.1\.\d{1,3}/" is a regex, but "192.168.1.17" is a literal > value. > > I'd rather Tomcat 7 moved towards an existing standard rather than tried > to create a new one. > > > 3) I wonder if it makes sense to manipulate RequestFilterValve though > > JMX. E.g. to add/remove some filtering patterns at runtime. > > It is certainly something I can see would be useful - e.g. reacting to > an attacker. Making that dynamic should be do-able with care. > > > Mark, are there other places than RequestFilterValve and its > > subclasses (RemoteAddrValve, RemoteHostValve) where you are planning > > this change? > > It was actually RemoteIpValve that got me started on this. You can't > explicitly set the default since it uses ',' in the regex but we also > split using ','. I wanted to fix that and moving to a single regex fixes > that and removes any chance of any similar gotchas in the future. > > > There this feature can be configurable. E.g. if split='' then > > splitting is not performed at all. I do not see why we should force > > users to use a single regex only. > > All it really does is force users to use the standard regex of '|' where > they currently use ','. > > > Having a single regex by default is OK with me, but forcing a single > > regex saves too little in performance of > > RequestFilterValve.process(..) (removes iterating over an array but > > adds a null check). > > I don't have any hard numbers but I suspect matching a single regex > using '|' is going to be faster than matching multiple. Probably not by > much. The code simplification is pretty minor too. > > > [OT] Merry X'mas > +1 to all. > > Mark > > - > To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org > For additional commands, e-mail: dev-h...@tomcat.apache.org > >