Re: A little help with indexing joined words

2009-10-05 Avlesh Singh
Zambrano, I was too quick to respond to your IDF explanation. I definitely did not mean that "idf" and "length-norms" are the same thing. Andrew, this is how I would have done it. First, I would create a field called "prefix_text" as follows in my schema.xml
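The schema.xml snippet is cut off in this preview, so what follows is only a rough sketch of what a "prefix_text" field built on edge n-grams would produce at index time. It assumes front-edge grams, lowercased input, and hypothetical minGram/maxGram values of 3 and 15. EdgeNGramSketch and edgeNGrams are illustrative names, not Solr or Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of front-edge n-gram generation, similar in spirit to
// the edge n-grams Solr's EdgeNGram factories emit. minGram/maxGram values
// are assumptions, not the ones from the original (truncated) schema.xml.
class EdgeNGramSketch {

    // Returns every prefix of `token` whose length lies in [minGram, maxGram].
    static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGram, token.length());
        for (int len = minGram; len <= upper; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Index side (assumed): the lowercased title word is expanded into prefixes.
        List<String> indexed = edgeNGrams("borderlands", 3, 15);
        System.out.println(indexed);

        // Query side (assumed not n-grammed): "borderland" is one of the prefixes.
        System.out.println(indexed.contains("borderland"));
    }
}
```

Under these assumptions the stored word "borderlands" yields the prefixes "bor" through "borderlands", so a query for "borderland" matches one of them.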

Re: A little help with indexing joined words

2009-10-05 Robert Muir
FYI, if you don't want to turn off norms entirely, try this option in Lucene 2.9's DefaultSimilarity:
public void setDiscountOverlaps(boolean v)
Determines whether overlap tokens (Tokens with 0 position increment) are ignored when computing norm. By default this is false, meaning overlap tokens are
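To make the effect of setDiscountOverlaps concrete, here is a sketch of the length-norm arithmetic, assuming DefaultSimilarity's lengthNorm of 1/sqrt(numTerms). NormSketch and its lengthNorm method are hypothetical names; the real computeNorm also multiplies in the field boost, and the result is then encoded into a single byte. Both steps are omitted here.

```java
// Hypothetical sketch of the length-norm arithmetic, modeled on Lucene 2.9's
// DefaultSimilarity (lengthNorm = 1 / sqrt(numTerms)). Field boost and the
// one-byte norm encoding are deliberately left out.
class NormSketch {

    static float lengthNorm(int numTerms, int numOverlap, boolean discountOverlaps) {
        // With discounting on, tokens at position increment 0 (for example,
        // synonyms injected at the same position) do not count toward length.
        int counted = discountOverlaps ? numTerms - numOverlap : numTerms;
        return (float) (1.0 / Math.sqrt(counted));
    }

    public static void main(String[] args) {
        // Hypothetical field: 4 ordinary tokens plus 5 overlap tokens stacked on them.
        System.out.println(lengthNorm(9, 5, false)); // 1/sqrt(9)
        System.out.println(lengthNorm(9, 5, true));  // 1/sqrt(4)
    }
}
```

In this example the norm is 1/3 without discounting and 0.5 with it, so the stacked tokens no longer penalize the field as if it were longer.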

Re: A little help with indexing joined words

2009-10-05 Christian Zambrano
Would you mind explaining how omitNorms has any effect on the IDF problem I described earlier? I agree with your second sentence. I had to use the NGramTokenFilter to accommodate partial matches.
On 10/05/2009 12:11 PM, Avlesh Singh wrote:
Using synonyms might be a better solution because the
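The question above turns on idf and the length norm being separate factors computed from different statistics. A minimal sketch, assuming the Lucene 2.9 DefaultSimilarity formulas idf = 1 + ln(numDocs / (docFreq + 1)) and lengthNorm = 1 / sqrt(numTerms). The class name and all counts are hypothetical.

```java
// Hypothetical sketch of the two scoring factors, modeled on Lucene 2.9's
// DefaultSimilarity. The numbers in main are made up for illustration.
class IdfVsNormSketch {

    // idf depends only on how many documents contain the term (docFreq)
    // and how many documents the index holds (numDocs).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // The length norm depends only on how many tokens the field holds
    // in one particular document.
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    public static void main(String[] args) {
        int numDocs = 1000; // hypothetical index size
        int docFreq = 9;    // hypothetical: documents containing the term

        // Adding more tokens (e.g. n-grams) to documents that already contain
        // this term changes neither this term's docFreq nor numDocs...
        System.out.println("idf = " + idf(docFreq, numDocs));

        // ...but it does lower each such document's length norm for the field.
        System.out.println("norm, 4 tokens  = " + lengthNorm(4));
        System.out.println("norm, 40 tokens = " + lengthNorm(40));
    }
}
```

Setting omitNorms="true" on a field drops the norm factor (length norm and index-time boosts) for that field; idf is still computed from document frequencies either way.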

Re: A little help with indexing joined words

2009-10-05 Avlesh Singh
> Using synonyms might be a better solution because the use of
> EdgeNGramTokenizerFactory has the potential of creating a large number of
> tokens, which will artificially increase the number of tokens in the index,
> which in turn will affect the IDF score.
Well, I don't see a reason as to why s

Re: A little help with indexing joined words

2009-10-05 Christian Zambrano
Using synonyms might be a better solution because the use of EdgeNGramTokenizerFactory has the potential of creating a large number of tokens, which will artificially increase the number of tokens in the index, which in turn will affect the IDF score. A query for "borderland" should have returned
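A rough sketch of the synonym idea applied at query time: rewrite known split forms in the query to the joined form stored in the index. This is a hypothetical stand-alone illustration, not Solr's SynonymFilterFactory, and the two mappings are assumptions taken from the examples in this thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical query-side rewrite: two-word forms listed in the map are
// replaced by the joined word; everything else passes through lowercased.
class JoinedWordSynonyms {
    private final Map<String, String> pairs = new HashMap<>();

    JoinedWordSynonyms() {
        // Assumed mappings, based on the examples in this thread.
        pairs.put("dragon fly", "dragonfly");
        pairs.put("border land", "borderland");
    }

    List<String> rewrite(String query) {
        String[] tokens = query.toLowerCase(Locale.ROOT).trim().split("\\s+");
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            // Try to join the current token with the next one.
            if (i + 1 < tokens.length) {
                String joined = pairs.get(tokens[i] + " " + tokens[i + 1]);
                if (joined != null) {
                    out.add(joined);
                    i += 2;
                    continue;
                }
            }
            out.add(tokens[i]);
            i++;
        }
        return out;
    }
}
```

Note that "borderland" still differs from the indexed "borderlands", so the singular/plural mismatch would need its own mapping or stemming.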

Re: A little help with indexing joined words

2009-10-05 Avlesh Singh
> We have indexed a product database and have come across some search terms
> where zero results are returned. There are products in the index with
> 'Borderlands xxx xxx', 'Dragonfly xx xxx' in the title. Searches for
> 'Borderland' or 'Border Land' and 'Dragon Fly' return zero results
> resp

A little help with indexing joined words

2009-10-05 Andrew McCombe
Hi, I am hoping someone can point me in the right direction with regard to indexing words that are concatenated together to make other words or product names. We have indexed a product database and have come across some search terms where zero results are returned. There are products in the index