Yes, each one is a document. Here is a real example:

<str name="q">k1_en:men</str>

<doc>
  <float name="score">0.81426066</float>
  ...
  <str name="id">846</str>
  ...
  <str name="k1_en">
    ;arm;arms;elbow;elbows;man;men;male;males;indoors;one;person;Men's;moods;
  </str>
  ...
</doc>
...
<doc>
  <float name="score">0.6232885</float>
  ...
  <str name="id">652</str>
  <str name="k1_en">
    ;portrait;portraits;young;adult;young;adults;*man*;*men*;male;males;male;males;young;*men*;young;*man*;identity;identities;self-confidence;assertiveness;male;beauty;masculine;beauty;*men's*;beauty;indoors;inside;day;daytime;one;person;one;individual;northern;european;caucasian
  </str>
  ...
</doc>
...
<lst name="explain">
  <str name="846">
0.81426066 = (MATCH) weight(k1_en:men in 35050), product of:
  0.99999994 = queryWeight(k1_en:men), product of:
    2.3030772 = idf(docFreq=17576, numDocs=64694)
    0.43420166 = queryNorm
  0.8142607 = (MATCH) fieldWeight(k1_en:men in 35050), product of:
    *1.4142135 = tf(termFreq(k1_en:men)=2)*
    2.3030772 = idf(docFreq=17576, numDocs=64694)
    0.25 = fieldNorm(field=k1_en, doc=35050)
  </str>
  ...
  <str name="652">
0.62328845 = (MATCH) weight(k1_en:men in 13312), product of:
  0.99999994 = queryWeight(k1_en:men), product of:
    2.3030772 = idf(docFreq=17576, numDocs=64694)
    0.43420166 = queryNorm
  0.6232885 = (MATCH) fieldWeight(k1_en:men in 13312), product of:
    *1.7320508 = tf(termFreq(k1_en:men)=3)*
    2.3030772 = idf(docFreq=17576, numDocs=64694)
    0.15625 = fieldNorm(field=k1_en, doc=13312)
  </str>
  ...

As you can see, termFreq = 2 for the first document and termFreq = 3 for the
second. I would like termFreq = 1 in each case for this field (k1_en).

Thanks in advance for your help,

On Wed, Oct 1, 2008 at 8:45 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

> In each of your examples (is each one a document?) I see only 1 "men"
> instance, so "men" term frequency should be 1 for that document.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> > From: KLessou <[EMAIL PROTECTED]>
> > To: solr-user@lucene.apache.org
> > Sent: Wednesday, October 1, 2008 11:43:59 AM
> > Subject: Re: termFreq always = 1 ?
> >
> > Yes this may be my problem,
> >
> > But is there any solution to have only one "men" keyword indexed when I've
> > got something like this :
> >
> > 1 - k1_en = men;business;Men
> > or :
> > 2 - k1_en = man,business,men
> > or :
> > 3 - k1_en = Man,men,business,Men,man
> > ...
> >
> > Thx in advance,
> >
> > On Wed, Oct 1, 2008 at 5:12 PM, Otis Gospodnetic wrote:
> >
> > > Hi,
> > >
> > > Note that RemoveDuplicatesTokenFilterFactory "filters out any tokens which
> > > are at the same logical position in the tokenstream as a previous token with
> > > the same text."
> > >
> > > So if you have "men in black are real men" then
> > > RemoveDuplicatesTokenFilterFactory will not remove duplicate "men".
> > >
> > > This may or may not be your problem.
> > >
> > > Otis
> > > --
> > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > >
> > > ----- Original Message ----
> > > > From: KLessou
> > > > To: solr-user@lucene.apache.org
> > > > Sent: Wednesday, October 1, 2008 9:48:28 AM
> > > > Subject: termFreq always = 1 ?
> > > >
> > > > Hi,
> > > >
> > > > I want to index a list of keywords.
> > > >
> > > > When I search "k1_en:men", I find a lot of documents like that :
> > > >
> > > > DocA :
> > > > (k1_en = man;men;Men;business... termFreq=2)
> > > > DocB :
> > > > (k1_en = man;Men;business... termFreq=1)
> > > > DocC :
> > > > ...
> > > > DocD :
> > > > ...
> > > > DocE :
> > > > ...
> > > >
> > > > But I don't want to have a different termFreq for DocA & DocB.
> > > >
> > > > I try RemoveDuplicatesTokenFilterFactory but it doesn't seem to help me
> > > > :-/
> > > >
> > > > ignoreCase="true"/>
> > > > protected="protwords.txt" />
> > > > generateWordParts="0"
> > > > generateNumberParts="0"
> > > > catenateWords="0"
> > > > catenateNumbers="0"
> > > > catenateAll="0"
> > > > />
> > > >
> > > > />
> > > >
> > > > ignoreCase="true"/>
> > > > protected="protwords.txt" />
> > > > generateWordParts="0"
> > > > generateNumberParts="0"
> > > > catenateWords="0"
> > > > catenateNumbers="0"
> > > > catenateAll="0"
> > > > />
> > > >
> > > > ...
> > > >
> > > > required="false" />
> > > >
> > > > If you have any idea, thx in advance.
> > > >
> > > > --
> > > > ~~~~~
> > > > | klessou |
> > > > ~~~~~
> >
> > --
> > ~~~~~
> > | klessou |
> > ~~~~~

--
~~~~~
| klessou |
~~~~~
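
For reference, the fieldWeight lines in the explain output above can be checked by hand. Below is a minimal sketch in Python, assuming (as the explain lines suggest) that tf is the square root of termFreq and that fieldWeight is the product tf * idf * fieldNorm. The function name field_weight and the constant IDF are hypothetical, chosen only for this illustration; the numbers are copied from the explain output. The final score also multiplies by queryWeight, which is 0.99999994 here and so barely changes the result.

```python
import math


def field_weight(term_freq, idf, field_norm):
    """Recompute a fieldWeight line: sqrt(termFreq) * idf * fieldNorm.

    Assumption: tf = sqrt(termFreq), which matches the explain output
    (1.4142135 for termFreq=2, 1.7320508 for termFreq=3).
    """
    tf = math.sqrt(term_freq)
    return tf * idf * field_norm


# idf(docFreq=17576, numDocs=64694), copied from the explain output
IDF = 2.3030772

# Values as currently indexed (termFreq = 2 and 3)
doc_846 = field_weight(2, IDF, 0.25)
doc_652 = field_weight(3, IDF, 0.15625)

# Values if termFreq were 1 for both documents (the desired behaviour)
doc_846_tf1 = field_weight(1, IDF, 0.25)
doc_652_tf1 = field_weight(1, IDF, 0.15625)

print("doc 846: current", doc_846, "with termFreq=1", doc_846_tf1)
print("doc 652: current", doc_652, "with termFreq=1", doc_652_tf1)
```

With termFreq = 1 the two fieldWeight values would come out at roughly 0.576 and 0.360, so the remaining difference between the two documents would come only from fieldNorm (0.25 versus 0.15625), not from how many times "men" appears in the keyword list.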