Be a little careful here. LowerCaseTokenizerFactory is different than KeywordTokenizerFactory.
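You can see the difference without a running Solr instance by mimicking the two analysis chains. The Python below is a rough approximation for illustration only, not Solr or Lucene code. `lowercase_tokenizer` imitates LowerCaseTokenizerFactory: it breaks the input at every non-letter and lowercases each run. `keyword_then_lowercase` imitates KeywordTokenizerFactory followed by a lowercase filter, so the whole input becomes one lowercased token. Those two function names and `prefix_match` are hypothetical helpers. The example string is "Intelligence can't be MeaSurEd".

```python
from itertools import groupby
from typing import List


def lowercase_tokenizer(text: str) -> List[str]:
    """Rough stand-in for solr.LowerCaseTokenizerFactory:
    break at every non-letter character and lowercase each run of letters."""
    return ["".join(run).lower() for is_letter, run in groupby(text, key=str.isalpha) if is_letter]


def keyword_then_lowercase(text: str) -> List[str]:
    """Rough stand-in for solr.KeywordTokenizerFactory plus a lowercase filter:
    the entire input becomes a single lowercased token."""
    return [text.lower()]


def prefix_match(tokens: List[str], prefix: str) -> bool:
    """Hypothetical model of a prefix query such as intellig*: true if any token starts with prefix."""
    return any(token.startswith(prefix) for token in tokens)


if __name__ == "__main__":
    text = "Intelligence can't be MeaSurEd"
    for name, tokens in [
        ("LowerCaseTokenizer", lowercase_tokenizer(text)),
        ("KeywordTokenizer + lowercase", keyword_then_lowercase(text)),
    ]:
        # A plain term query only hits when some token is exactly equal to the term.
        print(name, tokens, "measured" in tokens, prefix_match(tokens, "intellig"))
    # LowerCaseTokenizer ['intelligence', 'can', 't', 'be', 'measured'] True True
    # KeywordTokenizer + lowercase ["intelligence can't be measured"] False True
```

The real tokenizers handle Unicode and edge cases that this sketch ignores. The admin/analysis page is still the authoritative check.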
LowerCaseTokenizerFactory will give you more than one term. For example, the string "Intelligence can't be MeaSurEd" gives you five terms, any of which may match: "intelligence", "can", "t", "be", "measured". KeywordTokenizerFactory followed by, say, LowerCaseFilter would instead give you exactly one token: "intelligence can't be measured". So searching for "measured" would get a hit in the first case but not in the second, while searching for "intellig*" would hit both. Neither is better; just make sure they do what you want!

This page will help a lot:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LowerCaseTokenizerFactory
as will the admin/analysis page.

Best
Erick

On Wed, Jun 1, 2011 at 10:43 AM, Brian Lamb <brian.l...@journalexperts.com> wrote:
> Hi Tomás,
>
> Thank you very much for your suggestion. I took another crack at it using
> your recommendation and it worked ideally. The only thing I had to change
> was
>
> <analyzer type="query">
>   <tokenizer class="solr.KeywordTokenizerFactory" />
> </analyzer>
>
> to
>
> <analyzer type="query">
>   <tokenizer class="solr.LowerCaseTokenizerFactory" />
> </analyzer>
>
> The first did not produce any results but the second worked beautifully.
>
> Thanks!
>
> Brian Lamb
>
> 2011/5/31 Tomás Fernández Löbbe <tomasflo...@gmail.com>
>
>> ...or also use the LowerCaseTokenizerFactory at query time for consistency,
>> but not the edge ngram filter.
>>
>> 2011/5/31 Tomás Fernández Löbbe <tomasflo...@gmail.com>
>>
>> > Hi Brian, I don't know if I understand what you are trying to achieve.
>> > You want the term query "abcdefg" to have an idf of 1 instead of 7? I think
>> > using the KeywordTokenizerFactory at query time should work.
>> > It would be something like:
>> >
>> > <fieldType name="edgengram" class="solr.TextField"
>> >     positionIncrementGap="1000">
>> >   <analyzer type="index">
>> >     <tokenizer class="solr.LowerCaseTokenizerFactory" />
>> >     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
>> >         maxGramSize="25" side="front" />
>> >   </analyzer>
>> >   <analyzer type="query">
>> >     <tokenizer class="solr.KeywordTokenizerFactory" />
>> >   </analyzer>
>> > </fieldType>
>> >
>> > This way, at query time "abcdefg" won't be turned into "a ab abc abcd
>> > abcde abcdef abcdefg". At index time it will.
>> >
>> > Regards,
>> > Tomás
>> >
>> >
>> > On Tue, May 31, 2011 at 1:07 PM, Brian Lamb <brian.l...@journalexperts.com> wrote:
>> >
>> >> <fieldType name="edgengram" class="solr.TextField"
>> >>     positionIncrementGap="1000">
>> >>   <analyzer>
>> >>     <tokenizer class="solr.LowerCaseTokenizerFactory" />
>> >>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
>> >>         maxGramSize="25" side="front" />
>> >>   </analyzer>
>> >> </fieldType>
>> >>
>> >> I believe I used that link when I initially set up the field and it worked
>> >> great (and I'm still using it in other places). In this particular example,
>> >> however, it does not appear to be practical for me. I mentioned that I have
>> >> a similarity class that returns 1 for the idf, and in the case of an
>> >> edgengram field it returns 1 * the length of the search string.
>> >>
>> >> Thanks,
>> >>
>> >> Brian Lamb
>> >>
>> >> On Tue, May 31, 2011 at 11:34 AM, bmdakshinamur...@gmail.com <bmdakshinamur...@gmail.com> wrote:
>> >>
>> >> > Can you specify the analyzer you are using for your queries?
>> >> >
>> >> > Maybe you could use a KeywordAnalyzer for your queries so you don't end up
>> >> > matching parts of your query.
>> >> >
>> >> > http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
>> >> > This should help you.
>> >> >
>> >> > On Tue, May 31, 2011 at 8:24 PM, Brian Lamb <brian.l...@journalexperts.com> wrote:
>> >> >
>> >> > > In this particular case, I will be doing a Solr search based on user
>> >> > > preferences, so I will not be depending on the user to type "abcdefg".
>> >> > > That will be automatically generated based on user selections.
>> >> > >
>> >> > > The contents of the field do not contain spaces, and since I am creating
>> >> > > the search parameters, case isn't important either.
>> >> > >
>> >> > > Thanks,
>> >> > >
>> >> > > Brian Lamb
>> >> > >
>> >> > > On Tue, May 31, 2011 at 9:44 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>> >> > >
>> >> > > > That'll work for your case, although be aware that string types aren't
>> >> > > > analyzed at all, so case matters, as do spaces, etc.
>> >> > > >
>> >> > > > What is the use case here? If you explain it a bit, there might be
>> >> > > > better answers.
>> >> > > >
>> >> > > > Best
>> >> > > > Erick
>> >> > > >
>> >> > > > On Fri, May 27, 2011 at 9:17 AM, Brian Lamb <brian.l...@journalexperts.com> wrote:
>> >> > > > > For this, I ended up just changing it to string and using "abcdefg*" to
>> >> > > > > match. That seems to work so far.
>> >> > > > >
>> >> > > > > Thanks,
>> >> > > > >
>> >> > > > > Brian Lamb
>> >> > > > >
>> >> > > > > On Wed, May 25, 2011 at 4:53 PM, Brian Lamb <brian.l...@journalexperts.com> wrote:
>> >> > > > >
>> >> > > > >> Hi all,
>> >> > > > >>
>> >> > > > >> I'm running into some confusion with the way edgengram works.
>> >> > > > >> I have the field set up as:
>> >> > > > >>
>> >> > > > >> <fieldType name="edgengram" class="solr.TextField"
>> >> > > > >>     positionIncrementGap="1000">
>> >> > > > >>   <analyzer>
>> >> > > > >>     <tokenizer class="solr.LowerCaseTokenizerFactory" />
>> >> > > > >>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
>> >> > > > >>         maxGramSize="100" side="front" />
>> >> > > > >>   </analyzer>
>> >> > > > >> </fieldType>
>> >> > > > >>
>> >> > > > >> I've also set up my own similarity class that returns 1 as the idf
>> >> > > > >> score. What I've found is that if I match a string "abcdefg" against
>> >> > > > >> a field containing "abcdefghijklmnop", then the idf will score that
>> >> > > > >> as a 7:
>> >> > > > >>
>> >> > > > >> 7.0 = idf(myfield: a=51 ab=23 abc=2 abcd=2 abcde=2 abcdef=2 abcdefg=2)
>> >> > > > >>
>> >> > > > >> I get why that's happening, but is there a way to avoid that? Do I need
>> >> > > > >> to create a new field type to achieve the desired effect?
>> >> > > > >>
>> >> > > > >> Thanks,
>> >> > > > >>
>> >> > > > >> Brian Lamb
>> >> >
>> >> > --
>> >> > Thanks and Regards,
>> >> > DakshinaMurthy BM
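A small model helps explain the idf of 7 in Brian's explain line and why Tomás's split between index and query analyzers brings it back to 1. This is a sketch under stated assumptions, not Lucene's scoring code. `edge_ngrams` approximates EdgeNGramFilterFactory with side="front". `edgengram_chain` is the index-time chain from the thread (LowerCaseTokenizerFactory, then edge n-grams with maxGramSize 25). `summed_idf` adds one idf value per query term, which is what the explain line shows, with idf fixed at 1 to imitate the custom similarity class. Because of that fixed idf, the docFreq numbers in the explain line (a=51, ab=23, ...) play no part. All three helpers, plus the local `lowercase_tokenizer`, are hypothetical names.

```python
from itertools import groupby
from typing import Callable, List


def lowercase_tokenizer(text: str) -> List[str]:
    """Rough stand-in for solr.LowerCaseTokenizerFactory (split on non-letters, lowercase)."""
    return ["".join(run).lower() for is_letter, run in groupby(text, key=str.isalpha) if is_letter]


def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 25) -> List[str]:
    """Rough stand-in for solr.EdgeNGramFilterFactory with side="front":
    the prefixes of token whose lengths lie between min_gram and max_gram."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def edgengram_chain(text: str, max_gram: int = 25) -> List[str]:
    """Index-time chain from the thread: LowerCaseTokenizer followed by front edge n-grams."""
    return [gram for token in lowercase_tokenizer(text) for gram in edge_ngrams(token, 1, max_gram)]


def summed_idf(query_terms: List[str], idf: Callable[[str], float] = lambda term: 1.0) -> float:
    """Hypothetical model of the explain line: one idf value per query term, added up.
    The default idf of 1.0 imitates the custom similarity class from the thread."""
    return sum(idf(term) for term in query_terms)


if __name__ == "__main__":
    indexed = edgengram_chain("abcdefghijklmnop")  # 16 grams, "a" through "abcdefghijklmnop"
    for label, query_terms in [
        ("n-grams at query time", edgengram_chain("abcdefg")),
        ("no n-grams at query time", lowercase_tokenizer("abcdefg")),
    ]:
        matches = all(term in indexed for term in query_terms)
        print(label, query_terms, summed_idf(query_terms), matches)
    # n-grams at query time ['a', 'ab', 'abc', 'abcd', 'abcde', 'abcdef', 'abcdefg'] 7.0 True
    # no n-grams at query time ['abcdefg'] 1.0 True
```

Both query-time variants discussed in the thread (KeywordTokenizerFactory, or LowerCaseTokenizerFactory without the n-gram filter) turn "abcdefg" into a single term. That term still matches because the exact gram "abcdefg" was written at index time.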