Re: Getting an ngram fieldtype to work

Lance Norskog Fri, 08 Oct 2010 22:04:35 -0700

Is your browser caching the older search result? The example config
ships with HTTP caching enabled, and if you comment that section out
the engine still defaults to caching. So you have to configure Solr
explicitly, in the XML, to stop caching.
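
A minimal sketch of what that could look like, assuming the stock
solrconfig.xml layout in which <httpCaching> sits inside
<requestDispatcher>. The handleSelect attribute is copied from the
example config as I remember it and may differ in your file, so treat
this as illustrative rather than a drop-in replacement:

```xml
<!-- solrconfig.xml (sketch, not a complete file) -->
<requestDispatcher handleSelect="true">
  <!-- Keep any existing child elements such as <requestParsers> unchanged. -->

  <!-- never304="true" asks Solr not to emit HTTP cache validation headers
       and not to answer conditional requests with 304 Not Modified. -->
  <httpCaching never304="true" />
</requestDispatcher>
```

After a change like this you would typically restart Solr and then
force-reload in the browser (or query with a non-caching client) so
that a previously cached response is not reused.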


On Fri, Oct 8, 2010 at 6:52 AM, Markus Jelsma
<markus.jel...@openindex.io> wrote:
>
>
> On Friday, October 08, 2010 03:40:09 pm Allistair Crossley wrote:
>> Well, a lot of this is working but not all.
>>
>> Consider the company name Shooters Inc
>>
>> My ngram field is able to match queries to the name for shoot and hoot and
>> so on. This works.
>>
>> However consider the company name
>>
>> Location Scotland
>>
>> If I query scot I get one result back - but it's for a company called
>> Prescott Inc
>>
>> I looked at the analyzer and realised that the NGramTokenizer was
>> generating substrings from the start (left) of the *whole phrase*
>>
>> location scotland
>>
>> Because my max was set to 15 it was not generating a token for scot
>
> Huh? Your supplied config does generate scot as a token. The 15 is just the
> maximum size of a gram, it does not set a limit to how many new terms are
> generated.
>
> Are you querying the correct server? Did you reindex on the correct server? It
> should work.
>
>> So I figured I would change to a whitespace tokenizer first and then apply
>> the ngram as a filter.
>>
>> This now looks like it is generating scot in the tokens as shown below:
>> Index Analyzer
>>
>> org.apache.solr.analysis.WhitespaceTokenizerFactory {}
>>
>> term position 1       2
>> term text     location        scotland
>> term type     word    word
>> source start,end      0,8     9,17
>> payload
>> org.apache.solr.analysis.NGramFilterFactory {maxGramSize=15, minGramSize=4}
>>
>> position  term text  term type  source start,end  payload
>> 1         loca       word       0,4
>> 2         ocat       word       1,5
>> 3         cati       word       2,6
>> 4         atio       word       3,7
>> 5         tion       word       4,8
>> 6         locat      word       0,5
>> 7         ocati      word       1,6
>> 8         catio      word       2,7
>> 9         ation      word       3,8
>> 10        locati     word       0,6
>> 11        ocatio     word       1,7
>> 12        cation     word       2,8
>> 13        locatio    word       0,7
>> 14        ocation    word       1,8
>> 15        location   word       0,8
>> 16        scot       word       9,13
>> 17        cotl       word       10,14
>> 18        otla       word       11,15
>> 19        tlan       word       12,16
>> 20        land       word       13,17
>> 21        scotl      word       9,14
>> 22        cotla      word       10,15
>> 23        otlan      word       11,16
>> 24        tland      word       12,17
>> 25        scotla     word       9,15
>> 26        cotlan     word       10,16
>> 27        otland     word       11,17
>> 28        scotlan    word       9,16
>> 29        cotland    word       10,17
>> 30        scotland   word       9,17
>>
>> Query Analyzer
>>
>> scot
>> scot
>>
>> BUT it still returns no results for scot, and it does continue to return
>> the Prescott result.
>>
>> So ngramming is working but it is not working when the query is something
>> far to the right of the indexed value.
>>
>> Is this another user-error or have I missed something else here?
>>
>> Cheers
>>
>> On Oct 8, 2010, at 9:02 AM, Allistair Crossley wrote:
>> > Oh my. I am basically being a total monkey. Every time I was changing my
>> > schema.xml to try new things out I was then reindexing our staging
>> > server's index instead of my local dev index so no changes were
>> > occurring locally.
>> >
>> > Dear me.
>> >
>> > This is working now, surprise.
>> >
>> > On Oct 8, 2010, at 8:53 AM, Markus Jelsma wrote:
>> >> How come your query analyser spits out grams? It isn't configured to do
>> >> so or you posted an older field definition. Anyway,  do you actually
>> >> search on your new field?
>> >>
>> >> On Friday, October 08, 2010 02:46:08 pm Allistair Crossley wrote:
>> >>> Hi,
>> >>>
>> >>> Yep, I was just looking at the analyzer jsp. The ngrams *do* exist as
>> >>> expected, so it's not my configuration that is at fault (he says)
>> >>>
>> >>> Index Analyzer
>> >>> sh ho oo ot te er sho hoo oot ote ter shoo hoot oote oter shoot hoote ooter shoote hooter
>> >>> sh ho oo ot te er sho hoo oot ote ter shoo hoot oote oter shoot hoote ooter shoote hooter
>> >>> sh ho oo ot te er sho hoo oot ote ter shoo hoot oote oter shoot hoote ooter shoote hooter
>> >>> sh ho oo ot te er sho hoo oot ote ter shoo hoot oote oter shoot hoote ooter shoote hooter
>> >>>
>> >>> Query Analyzer
>> >>> sh ho oo ot te er sho hoo oot ote ter shoo hoot oote oter shoot hoote ooter shoote hooter
>> >>> sh ho oo ot te er sho hoo oot ote ter shoo hoot oote oter shoot hoote ooter shoote hooter
>> >>> sh ho oo ot te er sho hoo oot ote ter shoo hoot oote oter shoot hoote ooter shoote hooter
>> >>> sh ho oo ot te er sho hoo oot ote ter shoo hoot oote oter shoot hoote ooter shoote hooter
>> >>>
>> >>>
>> >>> Yet, searching either
>> >>>
>> >>> /solr/select?q=hoot
>> >>>
>> >>> or
>> >>>
>> >>> /solr/select?q=name:hoot
>> >>>
>> >>> does not yield results.
>> >>>
>> >>> When searching for shooter I see 2 results with names:
>> >>>
>> >>> 1. <str name="name">Shooters International Inc</str>
>> >>> 2. <str name="name">Hong Kong Shooter</str>
>> >>>
>> >>> Yours, puzzled :)
>> >>>
>> >>> On Oct 8, 2010, at 8:38 AM, Jan Høydahl / Cominvent wrote:
>> >>>> Hi,
>> >>>>
>> >>>> The first thing I would try is to go to the analysis page, enter your
>> >>>> test data, and report back what each analysis stage prints out:
>> >>>> http://localhost:8983/solr/admin/analysis.jsp
>> >>>>
>> >>>> --
>> >>>> Jan Høydahl, search solution architect
>> >>>> Cominvent AS - www.cominvent.com
>> >>>>
>> >>>> On 8. okt. 2010, at 14.19, Allistair Crossley wrote:
>> >>>>> Morning all,
>> >>>>>
>> >>>>> I would like to ngram a company name field in our index. I have read
>> >>>>> about the costs of doing so in the great David Smiley Solr 1.4 book and
>> >>>>> just to get started I have followed his example in setting up an ngram
>> >>>>> field type as follows:
>> >>>>>
>> >>>>> <fieldType name="text_substring" class="solr.TextField"
>> >>>>>            positionIncrementGap="100" stored="false" multiValued="true">
>> >>>>>   <analyzer type="index">
>> >>>>>     <tokenizer class="solr.StandardTokenizerFactory" />
>> >>>>>     <filter class="solr.LowerCaseFilterFactory" />
>> >>>>>     <filter class="solr.NGramFilterFactory" minGramSize="4" maxGramSize="15" />
>> >>>>>   </analyzer>
>> >>>>>   <analyzer type="query">
>> >>>>>     <tokenizer class="solr.StandardTokenizerFactory" />
>> >>>>>     <filter class="solr.LowerCaseFilterFactory" />
>> >>>>>   </analyzer>
>> >>>>> </fieldType>
>> >>>>>
>> >>>>> I have restarted/reindexed everything but I still cannot search
>> >>>>>
>> >>>>> hoot
>> >>>>>
>> >>>>> and get back the company named Shooter. Searching shooter is fine.
>> >>>>>
>> >>>>> I have followed other examples on the internet regarding an ngram field
>> >>>>> type. Some examples seem to use an index analyzer that has an ngram
>> >>>>> tokenizer rather than filter if this makes a difference. But in all
>> >>>>> cases I am not seeing the expected result, just 0 results.
>> >>>>>
>> >>>>> Is there anything else I should be considering here? I feel like I
>> >>>>> must be very close, it doesn't seem complicated but yet it's not
>> >>>>> working like everything else I have done with solr to date :)
>> >>>>>
>> >>>>> Any guidance appreciated,
>> >>>>>
>> >>>>> Allistair
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536600 / 06-50258350
>



-- 
Lance Norskog
goks...@gmail.com
