I also couldn't get the exact results I wanted for indexing URL components
using WordDelimiterFilter or PatternTokenizer, so I resorted to adding a new
field ('pathparts'), plus a few lines of code in our content preprocessor
(which submits documents to Solr for indexing) to generate the tokens.
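
For illustration only, here is a minimal Python sketch of that kind of
preprocessing step. It is not the actual preprocessor code: the function name
`path_parts`, the sample document, and the lowercasing are all assumptions.
It splits the URL on anything that is not an ASCII letter or digit and stores
the resulting tokens in the 'pathparts' field before the document would be
submitted.

```python
import re


def path_parts(url):
    """Split a URL into lowercase alphanumeric tokens (hypothetical helper)."""
    # Every run of characters other than ASCII letters and digits acts as a
    # separator; empty strings from leading/trailing separators are dropped.
    return [tok.lower() for tok in re.split(r"[^A-Za-z0-9]+", url) if tok]


# Hypothetical document as it might look just before submission for indexing.
doc = {"id": "1", "url": "http://example.com/docs/solr-1.2/index.html"}
doc["pathparts"] = path_parts(doc["url"])
```

The field would presumably be declared as multi-valued in the schema so that
each token is indexed separately.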

-Simon

On Tue, Apr 1, 2008 at 7:24 PM, Chris Hostetter <[EMAIL PROTECTED]>
wrote:

>
> : Actually I want to use anything that is not alphabet or digit to be the
> : separator - anything between them will be a word (so that I can use the
> : URL fragment to see what is indexed about this site)...any suggestion?
>
> In addition to Mike's suggestion of trying out the WordDelimiterFilter,
> take a look at the PatternTokenizerFactory.
>
>
>
> -Hoss
>
>
