...or also use the LowerCaseTokenizerFactory at query time for consistency, but not the edge ngram filter.
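
As a rough, untested sketch of that alternative, adapted from the fieldType in the quoted message below (the name and attribute values are simply carried over from it), the query analyzer would lowercase the input but skip the n-gram step:

```xml
<fieldType name="edgengram" class="solr.TextField"
           positionIncrementGap="1000">
  <!-- Index time: lowercase, then expand each token into front edge n-grams -->
  <analyzer type="index">
    <tokenizer class="solr.LowerCaseTokenizerFactory" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
            maxGramSize="25" side="front" />
  </analyzer>
  <!-- Query time: same tokenizer for consistent casing, but no n-gram filter,
       so "abcdefg" stays a single term instead of seven -->
  <analyzer type="query">
    <tokenizer class="solr.LowerCaseTokenizerFactory" />
  </analyzer>
</fieldType>
```

Keep in mind that LowerCaseTokenizerFactory also splits on non-letter characters, so this assumes the field values are purely alphabetic; if they can contain digits or punctuation, the KeywordTokenizerFactory version is probably the safer choice.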

2011/5/31 Tomás Fernández Löbbe <tomasflo...@gmail.com>

> Hi Brian, I don't know if I understand what you are trying to achieve. You
> want the term query "abcdefg" to have an idf of 1 instead of 7? I think using
> the KeywordTokenizerFactory at query time should work. It would be
> something like:
>
> <fieldType name="edgengram" class="solr.TextField"
>            positionIncrementGap="1000">
>   <analyzer type="index">
>     <tokenizer class="solr.LowerCaseTokenizerFactory" />
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
>             maxGramSize="25" side="front" />
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.KeywordTokenizerFactory" />
>   </analyzer>
> </fieldType>
>
> This way, at query time "abcdefg" won't be turned into "a ab abc abcd abcde
> abcdef abcdefg". At index time it will.
>
> Regards,
> Tomás
>
> On Tue, May 31, 2011 at 1:07 PM, Brian Lamb
> <brian.l...@journalexperts.com> wrote:
>
>> <fieldType name="edgengram" class="solr.TextField"
>>            positionIncrementGap="1000">
>>   <analyzer>
>>     <tokenizer class="solr.LowerCaseTokenizerFactory" />
>>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
>>             maxGramSize="25" side="front" />
>>   </analyzer>
>> </fieldType>
>>
>> I believe I used that link when I initially set up the field and it worked
>> great (and I'm still using it in other places). In this particular example,
>> however, it does not appear to be practical for me. I mentioned that I have
>> a similarity class that returns 1 for the idf, and in the case of an
>> edgengram, it returns 1 * length of the search string.
>>
>> Thanks,
>>
>> Brian Lamb
>>
>> On Tue, May 31, 2011 at 11:34 AM, bmdakshinamur...@gmail.com
>> <bmdakshinamur...@gmail.com> wrote:
>>
>> > Can you specify the analyzer you are using for your queries?
>> >
>> > Maybe you could use a KeywordAnalyzer for your queries so you don't end
>> > up matching parts of your query.
>> >
>> > http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
>> > This should help you.
>> >
>> > On Tue, May 31, 2011 at 8:24 PM, Brian Lamb
>> > <brian.l...@journalexperts.com> wrote:
>> >
>> > > In this particular case, I will be doing a solr search based on user
>> > > preferences. So I will not be depending on the user to type "abcdefg".
>> > > That will be automatically generated based on user selections.
>> > >
>> > > The contents of the field do not contain spaces, and since I am creating
>> > > the search parameters, case isn't important either.
>> > >
>> > > Thanks,
>> > >
>> > > Brian Lamb
>> > >
>> > > On Tue, May 31, 2011 at 9:44 AM, Erick Erickson
>> > > <erickerick...@gmail.com> wrote:
>> > >
>> > > > That'll work for your case, although be aware that string types
>> > > > aren't analyzed at all, so case matters, as do spaces etc.
>> > > >
>> > > > What is the use-case here? If you explain it a bit there might be
>> > > > better answers.
>> > > >
>> > > > Best
>> > > > Erick
>> > > >
>> > > > On Fri, May 27, 2011 at 9:17 AM, Brian Lamb
>> > > > <brian.l...@journalexperts.com> wrote:
>> > > > > For this, I ended up just changing it to string and using
>> > > > > "abcdefg*" to match. That seems to work so far.
>> > > > >
>> > > > > Thanks,
>> > > > >
>> > > > > Brian Lamb
>> > > > >
>> > > > > On Wed, May 25, 2011 at 4:53 PM, Brian Lamb
>> > > > > <brian.l...@journalexperts.com> wrote:
>> > > > >
>> > > > >> Hi all,
>> > > > >>
>> > > > >> I'm running into some confusion with the way edgengram works. I
>> > > > >> have the field set up as:
>> > > > >>
>> > > > >> <fieldType name="edgengram" class="solr.TextField"
>> > > > >>            positionIncrementGap="1000">
>> > > > >>   <analyzer>
>> > > > >>     <tokenizer class="solr.LowerCaseTokenizerFactory" />
>> > > > >>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
>> > > > >>             maxGramSize="100" side="front" />
>> > > > >>   </analyzer>
>> > > > >> </fieldType>
>> > > > >>
>> > > > >> I've also set up my own similarity class that returns 1 as the idf
>> > > > >> score. What I've found is that if I match a string "abcdefg"
>> > > > >> against a field containing "abcdefghijklmnop", then the idf will
>> > > > >> score that as a 7:
>> > > > >>
>> > > > >> 7.0 = idf(myfield: a=51 ab=23 abc=2 abcd=2 abcde=2 abcdef=2 abcdefg=2)
>> > > > >>
>> > > > >> I get why that's happening, but is there a way to avoid that? Do I
>> > > > >> need to do a new field type to achieve the desired effect?
>> > > > >>
>> > > > >> Thanks,
>> > > > >>
>> > > > >> Brian Lamb
>> >
>> > --
>> > Thanks and Regards,
>> > DakshinaMurthy BM