Re: Search a URL

Markus Jelsma Fri, 24 Sep 2010 03:38:36 -0700

WordDelimiterFilter
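
In the schema it is configured as solr.WordDelimiterFilterFactory. A minimal sketch of the change suggested in the quoted message below, assuming every other attribute stays exactly as in the original post (only generateWordParts differs):

```xml
<!-- Sketch only: the filter from the original post with generateWordParts
     switched from 0 to 1. All other attribute values are copied unchanged. -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="1" catenateNumbers="1" catenateAll="0"
        splitOnCaseChange="1"/>
```

With word parts generated, a whitespace-delimited token such as http://mail.google.com/dlkjadf should be split at the punctuation into parts that include "google", so a query for google can match. The same change would presumably go in both the index and the query analyzer, and existing documents would need to be reindexed.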

On Friday 24 September 2010 02:42:52 Dennis Gearon wrote:
> WDF is not WTF (what I think when I see WDF), right? ;-)
> 
> What is WDF?
> 
> Dennis Gearon
> 
> Signature Warning
> ----------------
> EARTH has a Right To Life,
>   otherwise we all die.
> 
> Read 'Hot, Flat, and Crowded'
> Laugh at http://www.yert.com/film.php
> 
> --- On Thu, 9/23/10, Markus Jelsma <markus.jel...@buyways.nl> wrote:
> > From: Markus Jelsma <markus.jel...@buyways.nl>
> > Subject: RE: Search a URL
> > To: solr-user@lucene.apache.org
> > Date: Thursday, September 23, 2010, 2:11 PM
> > Try setting generateWordParts=1 in your WDF. Also, a WhitespaceTokenizer
> > makes little sense for URLs: a URL should contain no whitespace, and the
> > StandardTokenizer can tokenize a URL. Anyway, the problem is your WDF.
> >  
> > -----Original message-----
> > From: Max Lynch <ihas...@gmail.com>
> > Sent: Thu 23-09-2010 23:00
> > To: solr-user@lucene.apache.org
> > Subject: Search a URL
> >
> > Is there a tokenizer that will allow me to search for parts of a URL? For
> > example, the search "google" would match on the data
> > "http://mail.google.com/dlkjadf".
> >
> > This tokenizer factory doesn't seem to be sufficient:
> >
> >    <fieldType name="text_standard" class="solr.TextField" positionIncrementGap="100">
> >        <analyzer type="index">
> >            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >            <filter class="solr.WordDelimiterFilterFactory"
> >                    generateWordParts="0" generateNumberParts="1"
> >                    catenateWords="1" catenateNumbers="1" catenateAll="0"
> >                    splitOnCaseChange="1"/>
> >            <filter class="solr.LowerCaseFilterFactory"/>
> >            <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
> >        </analyzer>
> >        <analyzer type="query">
> >            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >            <filter class="solr.WordDelimiterFilterFactory"
> >                    generateWordParts="0" generateNumberParts="1"
> >                    catenateWords="1" catenateNumbers="1" catenateAll="0"
> >                    splitOnCaseChange="1"/>
> >            <filter class="solr.LowerCaseFilterFactory"/>
> >            <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
> >        </analyzer>
> >    </fieldType>
> >
> > Thanks.
>


Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
