Mark, On 7/7/16 11:53 AM, Mark Thomas wrote: > On 06/07/2016 22:55, Christopher Schultz wrote: >> Mark, >> >> On 7/3/16 4:24 PM, ma...@apache.org wrote: >>> Author: markt >>> Date: Sun Jul 3 20:24:18 2016 >>> New Revision: 1751173 >>> >>> URL: http://svn.apache.org/viewvc?rev=1751173&view=rev >>> Log: >>> The original request for regular expression support would be too expensive >>> to implement. >>> This commit adds support for wild card host names. >>> It adds overhead for requests where the Host header is not an exact match >>> since the code now has to convert the name in the header to the wild card >>> form and then search for that. >>> However, this overhead is offset by caching the default host so it is not >>> necessary to do a look up for the default host. >>> I've also expanded the performance tests. On my laptop the before and after >>> results are broadly similar with some small improvements and some small >>> increases. > > <snip/> > > >>> @@ -720,12 +747,24 @@ public final class Mapper { >>> MappedHost[] hosts = this.hosts; >>> MappedHost mappedHost = exactFindIgnoreCase(hosts, host); >>> if (mappedHost == null) { >>> - if (defaultHostName == null) { >>> - return; >>> + // Note: Internally, the Mapper does not use the leading * on a >>> + // wildcard host. This is to allow this shortcut. >>> + int firstDot = host.indexOf('.'); >>> + if (firstDot > -1) { >>> + int offset = host.getOffset(); >>> + try { >>> + host.setOffset(firstDot + offset); >>> + mappedHost = exactFindIgnoreCase(hosts, host); >>> + } finally { >>> + // Make absolutely sure this gets reset >>> + host.setOffset(offset); >>> + } >>> } >>> - mappedHost = exactFind(hosts, defaultHostName); >>> if (mappedHost == null) { >>> - return; >>> + mappedHost = defaultHost; >>> + if (mappedHost == null) { >>> + return; >>> + } >>> } >>> } >>> mappingData.host = mappedHost.object; >>> @@ -1497,6 +1536,22 @@ public final class Mapper { >>> } >>> >>> >>> + /* >>> + * To simplify the mapping process, wild card hosts take the form >>> + * ".apache.org" rather than "*.apache.org" internally. However, for >>> ease >>> + * of use the external form remains "*.apache.org". Any host name >>> passed >>> + * into this class needs to be passed through this method to rename and >>> + * wild card host names from the external to internal form. >>> + */ >>> + private static String renameWildcardHost(String hostName) { >>> + if (hostName.startsWith("*.")) { >>> + return hostName.substring(1); >>> + } else { >>> + return hostName; >>> + } >>> + } >>> + >>> + >>> // ------------------------------------------------- MapElement Inner >>> Class >>> >>> >>> > >> It's tough to tell from this diff... does the server take a performance >> hit of wildcard-matching if no wildcards are in use? > > I've removed the noise and left the key parts above. > > Marginally. > > The difference comes if no exact match to the requested host is found. > > The old code then searched for the default host > > The new code searches for a wildcard host and if not match is found > returns the cached default host. > > The new code is slightly slower in some cases and faster in others but > the performance difference was marginal. It was the order of 10 > nanoseconds per request. > >> My (long overdue) plan for doing regular-expression matching for things >> like this was going to be to encapsulate the searching algorithm into a >> few classes that would behave differently depending upon what needed to >> be searched. For example, if all hostnames were explicitly-named, then >> the existing binary search algorithm could be used, or if there was only >> the one default host, it would always return that default host. For >> regular expressions, the entire algorithm would change. >> >> I can't see from the diff where the change to the map() function is, so >> I suspect it's quite a small change indeed, which is why I was thinking >> that it might affect all requests even when wildcards weren't in use. > > A 'wild cards in use flag' could be added that would improve things > considerably since it would save an entire lookup. Any patch along those > lines would need to be careful to keep track of whether or not wild > cards were being used.
Yes, that was why I was thinking of having a separate class for each search-type... basically, once a wildcard host had been added, we'd just convert from the "simple" (and fast) search algorithm to the wildcard-aware one. Switching back when all wildcard mappings had been removed (e.g. host-manager) might be tricky but possible. -chris
signature.asc
Description: OpenPGP digital signature