On Friday, October 08, 2010 03:40:09 pm Allistair Crossley wrote:
> Well, a lot of this is working but not all.
>
> Consider the company name Shooters Inc
>
> My ngram field is able to match queries to the name for shoot and hoot and
> so on. This works.
>
> However consider the company name
>
> Location Scotland
>
> If I query scot I get one result back - but it's for a company called
> Prescott Inc
>
> I looked at the analyzer and realised that the NGramTokenizer was
> generating substrings from the start (left) of the *whole phrase*
>
> location scotland
>
> Because my max was set to 15 it was not generating a token for scot

Huh? Your supplied config does generate scot as a token. The 15 is just the
maximum size of a gram, it does not set a limit to how many new terms are
generated. Are you querying the correct server? Did you reindex on the correct
server? It should work.

> So I figured I would change to a whitespace tokenizer first and then apply
> the ngram as a filter.
>
> This now looks like it is generating scot in the tokens as shown below:
>
> Index Analyzer
>
> org.apache.solr.analysis.WhitespaceTokenizerFactory {}
>
> term position  term text  term type  source start,end  payload
> 1              location   word       0,8
> 2              scotland   word       9,17
>
> org.apache.solr.analysis.NGramFilterFactory {maxGramSize=15, minGramSize=4}
>
> term position  term text  term type  source start,end  payload
> 1              loca       word       0,4
> 2              ocat       word       1,5
> 3              cati       word       2,6
> 4              atio       word       3,7
> 5              tion       word       4,8
> 6              locat      word       0,5
> 7              ocati      word       1,6
> 8              catio      word       2,7
> 9              ation      word       3,8
> 10             locati     word       0,6
> 11             ocatio     word       1,7
> 12             cation     word       2,8
> 13             locatio    word       0,7
> 14             ocation    word       1,8
> 15             location   word       0,8
> 16             scot       word       9,13
> 17             cotl       word       10,14
> 18             otla       word       11,15
> 19             tlan       word       12,16
> 20             land       word       13,17
> 21             scotl      word       9,14
> 22             cotla      word       10,15
> 23             otlan      word       11,16
> 24             tland      word       12,17
> 25             scotla     word       9,15
> 26             cotlan     word       10,16
> 27             otland     word       11,17
> 28             scotlan    word       9,16
> 29             cotland    word       10,17
> 30             scotland   word       9,17
>
> Query Analyzer
>
> scot
> scot
>
> BUT it still results no results for scot, but does continue to return the
> Prescott result.
>
> So ngramming is working but it is not working when the query is something
> far to the right of the indexed value.
>
> Is this another user-error or have I missed something else here?
>
> Cheers
>
> On Oct 8, 2010, at 9:02 AM, Allistair Crossley wrote:
> > Oh my. I am basically being a total monkey. Every time I was changing my
> > schema.xml to try new things out I was then reindexing our staging
> > server's index instead of my local dev index so no changes were
> > occurring locally.
> >
> > Dear me.
> >
> > This is working now, surprise.
> >
> > On Oct 8, 2010, at 8:53 AM, Markus Jelsma wrote:
> >> How come your query analyser spits out grams? It isn't configured to do
> >> so or you posted an older field definition. Anyway, do you actually
> >> search on your new field?
> >>
> >> On Friday, October 08, 2010 02:46:08 pm Allistair Crossley wrote:
> >>> Hi,
> >>>
> >>> Yep, I was just looking at the analyzer jsp. The ngrams *do* exist as
> >>> expected, so it's not my configuration that is at fault (he says)
> >>>
> >>> Index Analyzer
> >>> sh ho oo ot te er sho hoo oot ote ter shoo hoot oote oter shoot
> >>> hoote ooter shoote hooter
> >>>
> >>> sh ho oo ot te er sho hoo oot ote ter shoo hoot oote oter shoot
> >>> hoote ooter shoote hooter
> >>> sh ho oo ot te er sho hoo oot ote ter shoo hoot oote oter shoot
> >>> hoote ooter shoote hooter
> >>> sh ho oo ot te er sho hoo oot ote ter shoo hoot oote oter shoot
> >>> hoote ooter shoote hooter
> >>>
> >>> Query Analyzer
> >>>
> >>> sh ho oo ot te er sho hoo oot ote ter shoo hoot oote oter shoot
> >>> hoote ooter shoote hooter
> >>>
> >>> sh ho oo ot te er sho hoo oot ote ter shoo hoot oote oter shoot
> >>> hoote ooter shoote hooter
> >>> sh ho oo ot te er sho hoo oot ote ter shoo hoot oote oter shoot
> >>> hoote ooter shoote hooter
> >>> sh ho oo ot te er sho hoo oot ote ter shoo hoot oote oter shoot
> >>> hoote ooter shoote hooter
> >>>
> >>>
> >>> Yet, searching either
> >>>
> >>> /solr/select?q=hoot
> >>>
> >>> or
> >>>
> >>> /solr/select?q=name:hoot
> >>>
> >>> does not yield results.
> >>>
> >>> When searching for shooter I see 2 results with names:
> >>>
> >>> 1. <str name="name">Shooters International Inc</str>
> >>> 2. <str name="name">Hong Kong Shooter</str>
> >>>
> >>> Yours, puzzled :)
> >>>
> >>> On Oct 8, 2010, at 8:38 AM, Jan Høydahl / Cominvent wrote:
> >>>> Hi,
> >>>>
> >>>> The first thing I would try is to go to the analysis page, enter your
> >>>> test data, and report back what each analysis stage prints out:
> >>>> http://localhost:8983/solr/admin/analysis.jsp
> >>>>
> >>>> --
> >>>> Jan Høydahl, search solution architect
> >>>> Cominvent AS - www.cominvent.com
> >>>>
> >>>> On 8. okt. 2010, at 14.19, Allistair Crossley wrote:
> >>>>> Morning all,
> >>>>>
> >>>>> I would like to ngram a company name field in our index. I have read
> >>>>> about the costs of doing so in the great David Smiley Solr 1.4 book
> >>>>> and just to get started I have followed his example in setting up an
> >>>>> ngram field type as follows:
> >>>>>
> >>>>> <fieldType name="text_substring" class="solr.TextField"
> >>>>>            positionIncrementGap="100" stored="false"
> >>>>>            multiValued="true">
> >>>>>   <analyzer type="index">
> >>>>>     <tokenizer class="solr.StandardTokenizerFactory" />
> >>>>>     <filter class="solr.LowerCaseFilterFactory" />
> >>>>>     <filter class="solr.NGramFilterFactory" minGramSize="4"
> >>>>>             maxGramSize="15" />
> >>>>>   </analyzer>
> >>>>>   <analyzer type="query">
> >>>>>     <tokenizer class="solr.StandardTokenizerFactory" />
> >>>>>     <filter class="solr.LowerCaseFilterFactory" />
> >>>>>   </analyzer>
> >>>>> </fieldType>
> >>>>>
> >>>>> I have restarted/reindexed everything but I still cannot search
> >>>>>
> >>>>> hoot
> >>>>>
> >>>>> and get back the company named Shooter. searching shooter is fine.
> >>>>>
> >>>>> I have followed other examples on the internet regards an ngram field
> >>>>> type. Some examples seem to use an index analyzer that has an ngram
> >>>>> tokenizer rather than filter if this makes a difference. But in all
> >>>>> cases I am not seeing the expected result, just 0 results.
> >>>>>
> >>>>> Is there anything else I should be considering here? I feel like I
> >>>>> must be very close, it doesn't seem complicated but yet it's not
> >>>>> working like everything else I have done with solr to date :)
> >>>>>
> >>>>> Any guidance appreciated,
> >>>>>
> >>>>> Allistair

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350
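
The point made at the top of this reply, that maxGramSize only caps the length of each gram and does not limit how many terms are produced, can be checked with a rough sketch. This is plain Python that imitates a whitespace split, lowercasing and an n-gram filter; it is not Solr's own code, and the function names ngrams and index_terms are made up for illustration. The gram order (shortest grams first, then by start offset) is an assumption taken from the analysis output quoted in this thread.

```python
def ngrams(token, min_gram, max_gram):
    """Return every substring of token with length in [min_gram, max_gram].

    Grams are listed shortest first, then by start offset, which mirrors
    the order seen in the analysis output quoted in the thread.
    """
    grams = []
    longest = min(max_gram, len(token))  # a gram cannot exceed the token
    for size in range(min_gram, longest + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


def index_terms(text, min_gram=4, max_gram=15):
    """Imitate the index-time chain: whitespace split, lowercase, n-gram."""
    terms = []
    for token in text.lower().split():
        terms.extend(ngrams(token, min_gram, max_gram))
    return terms


if __name__ == "__main__":
    terms = index_terms("Location Scotland")
    print(len(terms))        # 15 grams per 8-letter word, 30 in total
    print("scot" in terms)   # True: maxGramSize=15 does not suppress it
    print("scot" in index_terms("Prescott Inc"))  # True as well
```

With these settings both "Location Scotland" and "Prescott Inc" yield the term scot, so a query for scot should match both documents once the right index has been rebuilt.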