On Mon, Sep 5, 2011 at 1:22 PM, Markus Jelsma <markus.jel...@openindex.io> wrote:

> Hi,
>
> URI paths are case-sensitive. If you really want to treat all URLs as
> case-insensitive, I would suggest modifying the basic URL normalizer to
> lowercase all URLs so that they also end up lowercased in the CrawlDB.
>
> What is your problem? I would strongly suggest another solution if you're
> doing wide web crawls.

I don't want duplicate results where the only real difference is the case of
some letters in the URL. What other solution?

> Cheers,
>
> > Hi,
> >
> > I've just noticed that two search results of indexed data have the same
> > url:
> >
> > http://www.atory.com/dupe_checker_pro/
> > http://www.atory.com/dupe_checker_PRO/
> >
> > I thought the url/id was case-insensitively unique. Is there a way I can
> > set it up to be so?
> >
> > For Solr it makes sense not to make it the default for disparate uses,
> > but for Nutch not.

-- 
Regards,
K. Gabriele
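
For illustration, here is a minimal sketch of the lowercasing idea from the quoted reply, in plain Java. It is not Nutch's normalizer plugin code: the class and method names (LowercaseUrlSketch, lowercaseUrl) are made up for this example, and it assumes ordinary hierarchical http(s) URLs. It leaves the query string and fragment alone. As noted above, paths are case-sensitive on many servers, so this can merge pages that are really different.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Nutch.
class LowercaseUrlSketch {

    // Lowercases scheme, authority and path; query and fragment are kept as-is.
    // Assumes a hierarchical URL such as http://host/path (not mailto: etc.).
    // Note: lowercasing the authority also lowercases any user-info in it.
    static String lowercaseUrl(String url) {
        URI u = URI.create(url); // throws IllegalArgumentException on bad input
        StringBuilder sb = new StringBuilder();
        if (u.getScheme() != null) {
            sb.append(u.getScheme().toLowerCase(Locale.ROOT)).append(':');
        }
        if (u.getRawAuthority() != null) {
            sb.append("//").append(u.getRawAuthority().toLowerCase(Locale.ROOT));
        }
        if (u.getRawPath() != null) {
            sb.append(u.getRawPath().toLowerCase(Locale.ROOT));
        }
        if (u.getRawQuery() != null) {
            sb.append('?').append(u.getRawQuery());
        }
        if (u.getRawFragment() != null) {
            sb.append('#').append(u.getRawFragment());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String a = lowercaseUrl("http://www.atory.com/dupe_checker_pro/");
        String b = lowercaseUrl("http://www.atory.com/dupe_checker_PRO/");
        // Both inputs from the thread map to the same normalized string.
        System.out.println(a.equals(b));
    }
}
```

If something like this ran during URL normalization, both URLs from the original report would be stored under one key instead of two.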