Part of the query is 'injected' by my application, which is unaware of the user query. If I knew that 'paste past' would end up as the query 'past past', I would not inject anything, since the duplicate distorts the score calculation. I could do the injection after analysis, but that is not easy.
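A minimal sketch of "knowing before injecting", assuming the application can run each token through the same analysis the field uses. The candidate terms are compared with the user query only after analysis, because 'paste' and 'past' collide only once stemmed. `toyAnalyze` is a hypothetical stand-in for Solr's real analysis chain: it lowercases and strips a trailing "e". The user query is split on whitespace only, so phrase quotes and the + operator are not interpreted. `InjectionFilter` and `termsToInject` are illustrative names, not a Solr API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

public class InjectionFilter {

    // HYPOTHETICAL stand-in for the field's Solr analysis chain: it only
    // lowercases and strips a trailing "e" from tokens longer than three
    // characters, which is just enough to map "paste" and "past" to "past".
    static String toyAnalyze(String token) {
        String t = token.toLowerCase(Locale.ROOT);
        return (t.length() > 3 && t.endsWith("e")) ? t.substring(0, t.length() - 1) : t;
    }

    // Returns the candidate terms whose analyzed form does not already occur
    // in the analyzed user query (or earlier in the candidate list).
    // Assumption: the user query is split on whitespace only, so phrase
    // quotes and the + operator are not interpreted here.
    static List<String> termsToInject(String userQuery, List<String> candidates,
                                      UnaryOperator<String> analyze) {
        Set<String> seen = new HashSet<>();
        for (String token : userQuery.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                seen.add(analyze.apply(token));
            }
        }
        List<String> toInject = new ArrayList<>();
        for (String candidate : candidates) {
            // Set.add returns false when the analyzed term is already present
            if (seen.add(analyze.apply(candidate))) {
                toInject.add(candidate);
            }
        }
        return toInject;
    }

    public static void main(String[] args) {
        // "paste" (user) and "past" (injected) both analyze to "past",
        // so only "essen" is injected
        List<String> inject = termsToInject("paste", List.of("past", "essen"),
                                            InjectionFilter::toyAnalyze);
        System.out.println(inject); // prints [essen]
    }
}
```

In a real setup the toy function would be replaced by whatever analysis the field is configured with. The approach only decides *whether* to inject, so the user's own query is never rewritten.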
So, trying to solve it directly in the RequestHandler, I have difficulties with queries that contain phrases ("...") or the 'must be present' + operator. For example, I would not want to touch the user query +"zusammen essen" +"allein essen", where 'essen' is the duplicated term. My 'good enough' solution is therefore to leave the duplicate in place in clauses prefixed by + or by a quote ("):

    // clauses: the set C of clauses in which the duplicated term t occurs
    for (Iterator<String> it = clauses.iterator(); it.hasNext(); ) {
        String c = it.next();
        // leave phrase ("...") and required (+...) clauses untouched,
        // and never remove the last remaining clause that contains t
        if (!c.startsWith("\"") && !c.startsWith("+") && clauses.size() > 1) {
            it.remove(); // remove via the iterator, not clauses.remove(c)
        }
    }

What do you think? Are there better solutions or algorithms to make sure the same term occurs only once in a query, or at least that it is weighted only once in the score calculation?

On Mon, Jun 20, 2011 at 11:15 AM, Markus Jelsma <markus.jel...@openindex.io> wrote:
> That only removes tokens on the same position, as the wiki explains.
>
> Gabriele, why would you expect that? You input two tokens, so you query for
> two tokens; why would it be a `set`?
>
> > this might help in your analysis chain
> >
> > http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.RemoveDuplicatesTokenFilterFactory
> >
> > On 20 June 2011 04:21, Gabriele Kahlout <gabri...@mysimpatico.com> wrote:
> > > <str name="rawquerystring">past past</str>
> > > <str name="querystring">past past</str>
> > > <str name="parsedquery">content:past content:past</str>
> > >
> > > I was expecting the query to get parsed into content:past only and not
> > > content:past content:past.
> > >
> > > On Mon, Jun 20, 2011 at 12:12 AM, lee carroll <lee.a.carr...@googlemail.com> wrote:
> > >> do you mean a phrase query? "past past"
> > >> can you give some more detail?
> > >>
> > >> On 18 June 2011 13:02, Gabriele Kahlout <gabri...@mysimpatico.com> wrote:
> > >> > q=past past
> > >> >
> > >> > 1.0 = (MATCH) sum of:
> > >> >   0.5 = (MATCH) fieldWeight(content:past in 0), product of:
> > >> >     1.0 = tf(termFreq(content:past)=1)
> > >> >     1.0 = idf(docFreq=1, maxDocs=2)
> > >> >     0.5 = fieldNorm(field=content, doc=0)
> > >> >   0.5 = (MATCH) fieldWeight(content:past in 0), product of:
> > >> >     1.0 = tf(termFreq(content:past)=1)
> > >> >     1.0 = idf(docFreq=1, maxDocs=2)
> > >> >     0.5 = fieldNorm(field=content, doc=0)
> > >> >
> > >> > Is there a way to treat the query keywords as a set?
> > >> >
> > >> > --
> > >> > Regards,
> > >> > K. Gabriele
> > >> >
> > >> > --- unchanged since 20/9/10 ---
> > >> > P.S. If the subject contains "[LON]" or the addressee acknowledges the
> > >> > receipt within 48 hours then I don't resend the email.
> > >> > subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) < Now + 48h) ⇒ ¬resend(I, this).
> > >> >
> > >> > If an email is sent by a sender that is not a trusted contact or the email
> > >> > does not contain a valid code then the email is not received. A valid code
> > >> > starts with a hyphen and ends with "X".
> > >> > ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ L(-[a-z]+[0-9]X)).

--
Regards,
K. Gabriele