Hi Nitzan,

Can't you do what you described with PathHierarchyTokenizerFactory?
http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/path/PathHierarchyTokenizerFactory.html
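Something like the following (an untested sketch with made-up names, going from memory of the 4.0 javadoc linked above; the reverse tokenizer is what the factory creates when reverse="true") would emit the suffixes of a domain name:

import java.io.StringReader;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.path.ReversePathHierarchyTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class DomainSuffixDemo {
  public static void main(String[] args) throws Exception {
    // Splits on '.' and emits suffix paths:
    // "www.google.com" -> "www.google.com", "google.com", "com"
    // (delimiter, replacement, skip) constructor arguments assumed from the 4.0 javadoc.
    Tokenizer tokenizer =
        new ReversePathHierarchyTokenizer(new StringReader("www.google.com"), '.', '.', 0);
    CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
    tokenizer.reset();
    while (tokenizer.incrementToken()) {
      System.out.println(term.toString());
    }
    tokenizer.end();
    tokenizer.close();
  }
}

At query time a keyword tokenizer on the query analyzer would then let a search for "google.com" match any of the emitted suffixes.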
Ahmet

On Friday, May 16, 2014 5:13 PM, Nitzan Shaked <nitzan.sha...@gmail.com> wrote:

Hi list,

I created a small token filter which I'd gladly "contribute", but I want to know whether there's any interest in it before I go and make it pretty, add documentation, etc. ;)

I originally created it to index domain names: I wanted to be able to search for "google.com" and find "www.google.com", "ads.google.com", "mail.google.com", etc.

What it does is split a token (in my case, on ".") and then output all contiguous sub-sequences. So a token that splits into "a", "b", "c", "d" will output "a", "b", "c", "d", "a.b", "b.c", "c.d", "a.b.c", "b.c.d", and "a.b.c.d". I use it only in the "index" analyzer, so searching for any of the generated tokens finds the original token.

It has the following arguments:

sepRegexp: regular expression that the original token is split on (I use "[.]" for domains)
glue: string used to join sub-sequences back together (I use "." for domains)
minLen: minimum generated sub-sequence length
maxLen: maximum generated sub-sequence length (0 for unlimited; negative values mean the token length minus the specified amount)
anchor: "start" to output only prefixes, "end" to output only suffixes, or "none" to output any sub-sequence

So... is this useful to anyone?
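For anyone following along, the sub-sequence expansion described above boils down to roughly the following (a standalone sketch with made-up names, not the actual filter; for brevity it ignores the anchor option and the negative-maxLen convention):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SubSequenceDemo {

  // Split the token on sepRegexp, then emit every contiguous run of parts,
  // joined back together with glue. maxLen <= 0 is treated as "no upper bound".
  static List<String> subSequences(String token, String sepRegexp, String glue,
                                   int minLen, int maxLen) {
    String[] parts = token.split(sepRegexp);
    int upper = (maxLen <= 0) ? parts.length : Math.min(maxLen, parts.length);
    List<String> out = new ArrayList<>();
    for (int len = Math.max(1, minLen); len <= upper; len++) {
      for (int start = 0; start + len <= parts.length; start++) {
        out.add(String.join(glue, Arrays.copyOfRange(parts, start, start + len)));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // Prints: [a, b, c, d, a.b, b.c, c.d, a.b.c, b.c.d, a.b.c.d]
    System.out.println(subSequences("a.b.c.d", "[.]", ".", 1, 0));
  }
}

An actual TokenFilter would presumably emit each of these as an additional token rather than returning a list, but the expansion logic is the same.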