die.
>
> Read 'Hot, Flat, and Crowded'
> Laugh at http://www.yert.com/film.php
>
> --- On Thu, 9/23/10, Markus Jelsma wrote:
> From: Markus Jelsma
> Subject: RE: Search a URL
> To: solr-user@lucene.apache.org
> Date: Thursday, September 23, 2010, 2:11 PM
> Try setting generateWordParts=1 in
> your WordDelimiterFilter (WDF). Also, having a WhitespaceTokenizer makes little
> sense for URLs; there should be no whitespace
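The sketch below is rough Python, not Solr code. It shows why this suggestion helps. A whitespace tokenizer keeps the whole URL as one token. A word-delimiter step that emits word parts then splits that token at the punctuation. The function names are hypothetical. `word_parts` is also simplified: it only splits on non-alphanumeric characters, while the real WordDelimiterFilter has further rules, such as splitting on case changes and letter/digit transitions.

```python
import re


def whitespace_tokenize(text):
    """Rough stand-in for WhitespaceTokenizer: split on runs of whitespace."""
    return text.split()


def word_parts(token):
    """Simplified stand-in for WordDelimiterFilter with generateWordParts=1.

    Emits the alphanumeric runs between delimiter characters. This is an
    assumption-level approximation: the real filter has more splitting rules.
    """
    return [part for part in re.split(r"[^A-Za-z0-9]+", token) if part]


def analyze(text):
    """Whitespace-tokenize, then break every token into its word parts."""
    return [part for tok in whitespace_tokenize(text) for part in word_parts(tok)]


def matches(query, document):
    """True if the analyzed query shares at least one term with the document."""
    return bool(set(analyze(query)) & set(analyze(document)))


if __name__ == "__main__":
    url = "http://mail.google.com/dlkjadf"
    print(whitespace_tokenize(url))  # ['http://mail.google.com/dlkjadf']
    print(analyze(url))              # ['http', 'mail', 'google', 'com', 'dlkjadf']
    print(matches("google", url))    # True
```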
To: solr-user@lucene.apache.org;
Subject: Search a URL
Is there a tokenizer that will allow me to search for parts of a URL? For
example, the search "google" would match on the data
"http://mail.google.com/dlkjadf".
This tokenizer factory doesn't seem to be sufficient:
LetterTokenizerFactory will treat each contiguous sequence of letters as a token and
discard the rest. Tokens such as http, https and com would need to be stopwords.
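Here is a minimal Python approximation of the LetterTokenizerFactory-plus-stopwords approach described above. The stopword set (`URL_STOPWORDS`) is a hypothetical example, not a Solr default. A regex stands in for Lucene's letter test, so read this as an illustration of the resulting token stream rather than Lucene's exact behavior.

```python
import re

# Hypothetical stopword list for URL "noise" terms; not a Solr default.
URL_STOPWORDS = {"http", "https", "www", "com"}


def letter_tokenize(text):
    """Approximates LetterTokenizer: each maximal run of letters is a token.

    [^\\W\\d_] means "word character that is not a digit or underscore",
    which approximates "letter"; digits and punctuation are discarded.
    """
    return re.findall(r"[^\W\d_]+", text)


def stop_filter(tokens, stopwords=URL_STOPWORDS):
    """Drops tokens whose lowercased form is in the stopword set."""
    return [t for t in tokens if t.lower() not in stopwords]


def analyze(text):
    return stop_filter(letter_tokenize(text))


if __name__ == "__main__":
    print(analyze("http://mail.google.com/dlkjadf"))  # ['mail', 'google', 'dlkjadf']
```

One consequence is worth checking against your data. Digits are discarded too, so `analyze("web2py.com")` yields only `web` and `py`.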
Alternatively, you can try PatternTokenizerFactory with a regular expression if
you are looking for a specific part of the URL.
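PatternTokenizerFactory takes a `pattern` attribute and an optional `group` attribute. With `group=-1` (the default), the pattern acts as a delimiter. With `group` >= 0, each match contributes that capture group as a token. The Python sketch below mimics those two modes. It shows how a regex could target either all URL segments or one specific part, such as the host. The function name and both example patterns are assumptions chosen for illustration.

```python
import re


def pattern_tokenize(text, pattern, group=-1):
    """Sketch of PatternTokenizer's two modes (an approximation, not Lucene code).

    group == -1: the pattern is a delimiter; tokens are the non-empty
                 stretches of text between matches.
    group >= 0:  every match yields that capture group as a token.
    """
    regex = re.compile(pattern)
    if group == -1:
        tokens, start = [], 0
        for m in regex.finditer(text):
            if m.start() > start:          # skip empty stretches
                tokens.append(text[start:m.start()])
            start = m.end()
        if start < len(text):              # trailing text after the last delimiter
            tokens.append(text[start:])
        return tokens
    return [m.group(group) for m in regex.finditer(text) if m.group(group)]


if __name__ == "__main__":
    url = "http://mail.google.com/dlkjadf"
    # Split mode: any run of non-alphanumerics is a delimiter.
    print(pattern_tokenize(url, r"[^A-Za-z0-9]+"))
    # Group mode: capture only the host (hypothetical pattern).
    print(pattern_tokenize(url, r"^[A-Za-z][A-Za-z0-9+.-]*://([^/:?#]+)", group=1))
```

With the host pattern, a search for "google" alone would not match the single token `mail.google.com`. That mode suits exact searches on one part of the URL, while split mode behaves much like the word-delimiter approach.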
On Sep 23, 2010, at 10:59
Is there a tokenizer that will allow me to search for parts of a URL? For
example, the search "google" would match on the data
"http://mail.google.com/dlkjadf".
This tokenizer factory doesn't seem to be sufficient: