You have: ;arm;arms;elbow;elbows;man;men;male;males;indoors;one;person;Men's;moods;

Note these two:

men
Men's

You probably tokenize that field, you probably lowercase it, and you probably stem it, so you end up with 2 "men" tokens:

men ==> men
Men's ==> men

Hence your term freq of 2.

You could:
1) lowercase outside of Solr, before indexing
2) feed text with sorted words to Solr
3) use that token filter that removes duplicates after stemming

That could work.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


----- Original Message ----
> From: KLessou <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Thursday, October 2, 2008 4:41:12 AM
> Subject: Re: termFreq always = 1 ?
>
> Yes, each one is a document.
>
> A real example:
>
> k1_en:men
>
> [search results; the XML tags were stripped from the archived message]
>
> 0.81426066
> ...
> 846
> ...
> ;arm;arms;elbow;elbows;man;men;male;males;indoors;one;person;Men's;moods;
> ...
>
> 0.6232885
> ...
> 652
> ;portrait;portraits;young;adult;young;adults;*man*;*men*;male;males;male;males;young;*men*;young;*man*;identity;identities;self-confidence;assertiveness;male;beauty;masculine;beauty;*men's*;beauty;indoors;inside;day;daytime;one;person;one;individual;northern;european;caucasian
> ...
>
> 0.81426066 = (MATCH) weight(k1_en:men in 35050), product of:
>   0.99999994 = queryWeight(k1_en:men), product of:
>     2.3030772 = idf(docFreq=17576, numDocs=64694)
>     0.43420166 = queryNorm
>   0.8142607 = (MATCH) fieldWeight(k1_en:men in 35050), product of:
>     *1.4142135 = tf(termFreq(k1_en:men)=2)*
>     2.3030772 = idf(docFreq=17576, numDocs=64694)
>     0.25 = fieldNorm(field=k1_en, doc=35050)
>
> ...
>
> 0.62328845 = (MATCH) weight(k1_en:men in 13312), product of:
>   0.99999994 = queryWeight(k1_en:men), product of:
>     2.3030772 = idf(docFreq=17576, numDocs=64694)
>     0.43420166 = queryNorm
>   0.6232885 = (MATCH) fieldWeight(k1_en:men in 13312), product of:
>     *1.7320508 = tf(termFreq(k1_en:men)=3)*
>     2.3030772 = idf(docFreq=17576, numDocs=64694)
>     0.15625 = fieldNorm(field=k1_en, doc=13312)
>
> ...
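The explain output quoted above follows Lucene's DefaultSimilarity: tf is the square root of the raw term frequency, idf = 1 + ln(numDocs / (docFreq + 1)), fieldWeight = tf * idf * fieldNorm, and the final weight multiplies that by queryWeight = idf * queryNorm. The minimal Python sketch below reproduces both term counts and both scores. It is an illustration, not Solr code: the real k1_en analyzer chain did not survive in the archived schema, so `analyze()` is a hypothetical stand-in that splits on ';', lowercases, and strips a trailing possessive "'s", which is just enough to show "men" and "Men's" collapsing into one term. The names `FIRST_DOC` and `SECOND_DOC` are likewise illustrative.

```python
import math
import re

# Keyword values of the two documents from the explain output
# (the "*...*" e-mail emphasis markers are removed).
FIRST_DOC = ";arm;arms;elbow;elbows;man;men;male;males;indoors;one;person;Men's;moods;"
SECOND_DOC = (
    ";portrait;portraits;young;adult;young;adults;man;men;male;males;male;males;"
    "young;men;young;man;identity;identities;self-confidence;assertiveness;male;"
    "beauty;masculine;beauty;men's;beauty;indoors;inside;day;daytime;one;person;"
    "one;individual;northern;european;caucasian"
)


def analyze(field_value: str) -> list:
    """Hypothetical stand-in for the k1_en analyzer (the real chain was lost):
    split on ';', lowercase, and strip a trailing possessive "'s"."""
    terms = []
    for raw in field_value.split(";"):
        word = raw.strip().lower()
        if not word:
            continue  # leading/trailing ';' produce empty pieces
        terms.append(re.sub(r"'s$", "", word))  # "men's" -> "men"
    return terms


def term_freq(field_value: str, term: str) -> int:
    """Number of times `term` occurs after analysis."""
    return analyze(field_value).count(term)


def default_similarity_score(freq, doc_freq, num_docs, query_norm, field_norm):
    """Single-term query score under Lucene's DefaultSimilarity."""
    tf = math.sqrt(freq)                               # tf(termFreq)
    idf = 1.0 + math.log(num_docs / (doc_freq + 1))    # idf(docFreq, numDocs)
    query_weight = idf * query_norm                    # queryWeight
    field_weight = tf * idf * field_norm               # fieldWeight
    return query_weight * field_weight                 # weight


print(term_freq(FIRST_DOC, "men"))   # 2
print(term_freq(SECOND_DOC, "men"))  # 3
print(default_similarity_score(2, 17576, 64694, 0.43420166, 0.25))     # about 0.8143
print(default_similarity_score(3, 17576, 64694, 0.43420166, 0.15625))  # about 0.6233
```

Plugging freq = 1 into the same formula shows what was asked for: both documents would get tf = 1.0, and only fieldNorm (0.25 vs 0.15625) would separate their scores.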
>
> You can see here that termFreq = 2 for the first document and termFreq = 3
> for the second document...
>
> I would like to have termFreq = 1 in both cases for this field (k1_en).
>
> Thanks in advance for your help,
>
> On Wed, Oct 1, 2008 at 8:45 PM, Otis Gospodnetic wrote:
>
> > In each of your examples (is each one a document?) I see only 1 "men"
> > instance, so "men" term frequency should be 1 for that document.
> >
> > Otis
> >
> > ----- Original Message ----
> > > From: KLessou
> > > To: solr-user@lucene.apache.org
> > > Sent: Wednesday, October 1, 2008 11:43:59 AM
> > > Subject: Re: termFreq always = 1 ?
> > >
> > > Yes, this may be my problem.
> > >
> > > But is there any way to have only one "men" keyword indexed when I've
> > > got something like this:
> > >
> > > 1 - k1_en = men;business;Men
> > > or:
> > > 2 - k1_en = man,business,men
> > > or:
> > > 3 - k1_en = Man,men,business,Men,man
> > > ...
> > >
> > > Thanks in advance,
> > >
> > > On Wed, Oct 1, 2008 at 5:12 PM, Otis Gospodnetic wrote:
> > >
> > > > Hi,
> > > >
> > > > Note that RemoveDuplicatesTokenFilterFactory "filters out any tokens which
> > > > are at the same logical position in the tokenstream as a previous token with
> > > > the same text."
> > > >
> > > > So if you have "men in black are real men", then
> > > > RemoveDuplicatesTokenFilterFactory will not remove the duplicate "men".
> > > >
> > > > This may or may not be your problem.
> > > >
> > > > Otis
> > > >
> > > > ----- Original Message ----
> > > > > From: KLessou
> > > > > To: solr-user@lucene.apache.org
> > > > > Sent: Wednesday, October 1, 2008 9:48:28 AM
> > > > > Subject: termFreq always = 1 ?
> > > > >
> > > > > Hi,
> > > > >
> > > > > I want to index a list of keywords.
> > > > >
> > > > > When I search "k1_en:men", I find a lot of documents like these:
> > > > >
> > > > > DocA:
> > > > > (k1_en = man;men;Men;business... termFreq=2)
> > > > > DocB:
> > > > > (k1_en = man;Men;business... termFreq=1)
> > > > > DocC:
> > > > > ...
> > > > > DocD:
> > > > > ...
> > > > > DocE:
> > > > > ...
> > > > >
> > > > > But I don't want DocA and DocB to have different termFreq values.
> > > > >
> > > > > I tried RemoveDuplicatesTokenFilterFactory but it doesn't seem to help me :-/
> > > > >
> > > > > [schema.xml excerpt; the XML tags were stripped from the archived
> > > > > message, only these attribute fragments survive, in order:]
> > > > > ignoreCase="true"/>
> > > > > protected="protwords.txt" />
> > > > > generateWordParts="0" generateNumberParts="0" catenateWords="0"
> > > > > catenateNumbers="0" catenateAll="0" />
> > > > > ignoreCase="true"/>
> > > > > protected="protwords.txt" />
> > > > > generateWordParts="0" generateNumberParts="0" catenateWords="0"
> > > > > catenateNumbers="0" catenateAll="0" />
> > > > > ...
> > > > > required="false" />
> > > > >
> > > > > If you have any ideas, thanks in advance.
> > > > >
> > > > > --
> > > > > ~~~~~
> > > > > | klessou |
> > > > > ~~~~~
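Otis's point about RemoveDuplicatesTokenFilterFactory can be illustrated with a small simulation. The Python sketch below is not the Lucene implementation: it models tokens as hypothetical (text, position increment) pairs and drops a token only when an earlier token at the same position has the same text, which is the rule quoted in the thread. It also sketches one client-side alternative, an extension of Otis's first suggestion (normalize the keywords outside Solr) that additionally de-duplicates them before indexing; `dedupe_keywords()` and its possessive-stripping rule are assumptions for illustration, not part of Solr.

```python
from typing import List, Set, Tuple

# Hypothetical token model: (term text, position increment).
# An increment of 0 means "same position as the previous token",
# e.g. an injected synonym stacked on the original word.
Token = Tuple[str, int]


def remove_same_position_duplicates(tokens: List[Token]) -> List[Token]:
    """Simulates the documented RemoveDuplicatesTokenFilter rule: drop a token
    only if an earlier token at the same position has identical text."""
    kept: List[Token] = []
    texts_at_position: Set[str] = set()
    for text, pos_inc in tokens:
        if pos_inc > 0:
            texts_at_position = set()  # moved to a new position; forget the old one
        if text in texts_at_position:
            continue  # same text at the same position: the only case removed
        texts_at_position.add(text)
        kept.append((text, pos_inc))
    return kept


def dedupe_keywords(field_value: str) -> str:
    """Hypothetical client-side pre-processing: lowercase each ';'-separated
    keyword, strip a trailing possessive "'s", and keep first occurrences only."""
    seen: Set[str] = set()
    kept: List[str] = []
    for raw in field_value.split(";"):
        word = raw.strip().lower()
        if word.endswith("'s"):
            word = word[:-2]
        if word and word not in seen:
            seen.add(word)
            kept.append(word)
    return ";".join(kept)


# "men in black are real men": every token advances the position by 1,
# so the second "men" is NOT removed.
phrase = [(w, 1) for w in "men in black are real men".split()]
print(remove_same_position_duplicates(phrase) == phrase)  # True

# Two tokens stacked at the same position: the duplicate IS removed.
print(remove_same_position_duplicates([("men", 1), ("men", 0)]))  # [('men', 1)]

print(dedupe_keywords("men;business;Men"))  # men;business
```

A client-side step only removes duplicates it can recognize. If Solr's stemmer later conflates keywords the client left distinct (for example a plural and its singular, such as "arms" and "arm" under a Porter-style stemmer), those terms will still reach a term frequency above 1.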