Re: Searching "inside of words"

Otis Gospodnetic Thu, 17 Apr 2008 20:49:58 -0700

Hi Daniel,
Well, searching "inside of words" requires special treatment, because normally 
searches work on words/terms/tokens.


Take a look at the following n-gram analysis classes in the Solr source tree:
$ ff \*NGram\*java
./src/java/org/apache/solr/analysis/EdgeNGramTokenizerFactory.java
./src/java/org/apache/solr/analysis/NGramTokenizerFactory.java
./src/java/org/apache/solr/analysis/NGramFilterFactory.java
./src/java/org/apache/solr/analysis/EdgeNGramFilterFactory.java

Use these to create a new field type that makes Solr tokenize and index your 
terms as, say, uni-grams.  Instead of (or in addition to) indexing "Termobyxa", 
index "T e r m o b y x a".  Do the same with the query-time analyzer, and you'll 
be able to search within words.
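
As a rough sketch of what such a field type could look like in schema.xml 
(untested; the names "text_ngram" and "title_ngram", the copyField setup, and 
the gram sizes are illustrative assumptions, not something from your config):

```xml
<!-- Hypothetical n-gram field type; name and gram sizes are assumptions. -->
<fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- minGramSize=1 and maxGramSize=1 emit uni-grams: "T", "e", "r", ... -->
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="1" maxGramSize="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- Same analysis at query time, so "ermo" becomes "e r m o" -->
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="1" maxGramSize="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical extra field, so the existing "title" field keeps its "text" analysis. -->
<field name="title_ngram" type="text_ngram" indexed="true" stored="false"/>
<copyField source="title" dest="title_ngram"/>
```

With something like this you would search title_ngram (alone or alongside 
title) for the "inside a word" case.  Larger gram sizes trade index size for 
fewer loose matches, so it is worth experimenting with them.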
 
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Daniel Löfquist <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Thursday, April 17, 2008 5:46:15 AM
Subject: Searching "inside of words"

Hi,

I'm still pretty new to Solr. We're using it for searching on our site 
right now though.

The configuration is, however, pretty much based on the example files that 
come with Solr, and there's one type of search that I can't get to work.

Each item has fields called "title" and "description", both of which are 
of type "text".

The type "text" is defined like this in our schema.xml :

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

My problem is that if I have an item with "title"="Termobyxa", a search 
for "Termo" gives me a hit but if I search for "ermo" or "byxa" I get no 
hit. How do I make it so that this kind of search "inside a word" 
returns a hit?

Sincerely,

Daniel Löfquist
