That's actually easy to explain/understand. If the min n-gram size is 3, a query term with just 2 characters will never match any terms that originally had more than 2 characters, because longer terms are never tokenized into tokens shorter than 3 characters.
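To make the point above concrete, here is a minimal Python sketch of character n-gram tokenization. It is not Solr's or Lucene's actual code: `ngrams` is a hypothetical helper written for illustration, and the sketch ignores query-side analysis by comparing the raw query term directly against the indexed grams.

```python
def ngrams(text, min_size, max_size):
    """Return every character n-gram of text with min_size <= length <= max_size."""
    grams = []
    for size in range(min_size, max_size + 1):
        for start in range(len(text) - size + 1):
            grams.append(text[start:start + size])
    return grams


# Index side: trigrams only (min = max = 3).
indexed = set(ngrams("house", 3, 3))

# A 2-character query term is not among the indexed trigrams.
print("ho" in indexed)   # False
print("hou" in indexed)  # True

# Lowering the minimum gram size to 2 puts bigrams in the index as well,
# so a 2-character query term can now find a matching indexed term.
indexed_with_bigrams = set(ngrams("ABC-123456-SUN", 2, 3))
print("45" in indexed_with_bigrams)  # True
```

With a minimum of 2, a single-character query would still have nothing to match, which is consistent with the behaviour described in the quoted message.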
Take the term: house

house => hou ous use

If your search term is "ho", it will never match the above, as there is no term "ho" in there.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR


----- Original Message ----
> From: Charlie Jackson <charlie.jack...@cision.com>
> To: solr-user@lucene.apache.org
> Sent: Fri, October 23, 2009 4:32:33 PM
> Subject: RE: NGram query failing
>
> Well, I fixed my own problem in the end. For the record, this is the
> schema I ended up going with:
>
> minGramSize="2" />
>
> minGramSize="2"/>
>
> I could have left it a trigram but went with a bigram because with this
> setup, I can get queries to properly hit as long as the min/max gram
> size is met. In other words, for any queries two or more characters
> long, this works for me. Less than two characters and it fails.
>
> I don't know exactly why that is, but I'll take it anyway!
>
> - Charlie
>
> -----Original Message-----
> From: Charlie Jackson [mailto:charlie.jack...@cision.com]
> Sent: Friday, October 23, 2009 10:00 AM
> To: solr-user@lucene.apache.org
> Subject: NGram query failing
>
> I have a requirement to be able to find hits within words in a free-form
> id field. The field can have any type of alphanumeric data - it's as
> likely it will be something like "123456" as it is to be "SUN-123-ABC".
> I thought of using NGrams to accomplish the task, but I'm having a
> problem. I set up a field like this
>
> positionIncrementGap="100">
>
> minGramSize="1" maxGramSize="3"/>
>
> After indexing a field like this, the analysis page indicates my queries
> should work. If I give it a sample field value of "ABC-123456-SUN" and a
> query value of "45" it shows hits in several places, which is what I
> expected.
>
> However, when I actually query the field with something like "45" I get
> no hits back. Looking at the debugQuery output, it looks like it's
> taking my analyzed query text and putting it into a phrase query. So,
> for a query of "45" it turns into a phrase query of :"4 5 45"
> which then doesn't hit on anything in my index.
>
> What am I missing to make this work?
>
> - Charlie