I also couldn't get exactly the results I wanted for indexing URL components using WordDelimiterFilter or PatternTokenizer, so I resorted to adding a new field ('pathparts'), plus a few lines of code in our content preprocessor (which submits documents to Solr for indexing) to generate the tokens.
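For anyone wanting to try the same thing, here is a minimal sketch of what such preprocessor code might look like. It is not my actual code. It assumes the rule from the original question (any run of characters that is not a letter or digit acts as a separator), and the class and method names (`PathParts`, `pathParts`) are hypothetical. The resulting tokens would then be sent as the values of a multi-valued 'pathparts' field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class PathParts {
    // Hypothetical helper: splits a URL on any run of characters that are
    // not ASCII letters or digits, and lowercases each resulting token.
    static List<String> pathParts(String url) {
        List<String> tokens = new ArrayList<>();
        for (String part : url.split("[^A-Za-z0-9]+")) {
            // split() yields a leading empty string when the URL starts
            // with a separator, so skip empty pieces.
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [http, www, example, com, foo, bar, baz, html, id, 42]
        System.out.println(pathParts("http://www.example.com/foo/bar-baz.html?id=42"));
    }
}
```

Doing the split client-side keeps the Solr schema simple (the field only needs a tokenizer that leaves each value intact), at the cost of having the tokenization rule live outside Solr.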
-Simon

On Tue, Apr 1, 2008 at 7:24 PM, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>
> : Actually I want to use anything that is not alphabet or digit to be the
> : separator - anything between them will be a word (so that I can use the URL
> : fragment to see what is indexed about this site)...any suggestion?
>
> In addition to Mike's suggestion of trying out the WordDelimiterFilter,
> take a look at the PatternTokenizerFactory.
>
> -Hoss