On Wed, Jul 13, 2011 at 3:54 PM, Michael Kuhlmann <s...@kuli.org> wrote:
> On 13.07.2011 at 15:37, Gabriele Kahlout wrote:
> > Well, I'm not sure how usual this scenario would be:
> > 1. In general, those using Solr with Nutch don't store the content field,
> > to avoid storing the whole web/intranet in their index twice (once in the
> > form of stored data, and once in the form of indexed data).
> >
>
> Not exactly. The indexed form is quite different from the stored form;
> only the tokens are stored, each token only once, plus some additional
> data like the document count and, maybe, shingle information, etc.
>
> Hence, indexed data usually needs much less space on disk than the
> original data.
>

I realized that. Maybe I should have said "1.X" (1 in the form of stored
data and 0.X in the form of indexed data).

> There's no practical alternative to storing the content in a stored
> field. What would you otherwise display as a search result? "The
> following web pages have your search term somewhere in their contents,
> don't know where, take a look on your own"?
>

Display the title and URL (and implicitly say "The following web pages have
your search term somewhere in their contents, don't REMEMBER where, take a
look on your own"). Solr is already configured by default not to store more
than <maxFieldLength> anyway. Usually one stores content only to display
snippets.

> Greetings,
> Kuli
>

--
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains "[LON]" or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
< Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with "X".
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).
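
To illustrate the point in this thread that indexed data keeps each token only once, here is a minimal Python sketch of a toy inverted index. It is only an illustration under simplifying assumptions: the function name `build_inverted_index` and the sample documents are hypothetical, tokenization is a plain whitespace split, and this is not how Lucene/Solr lays out its index on disk (a real index also keeps frequencies, positions, and other metadata).

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each distinct token to the sorted list of ids of documents containing it.

    Toy model only: lowercases and splits on whitespace, with no analysis chain.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


# Hypothetical sample "pages"; the stored form would keep all of this text verbatim.
docs = {
    1: "solr stores the content field",
    2: "solr indexes the content field",
    3: "the content the content the content",
}

index = build_inverted_index(docs)

# Stored form grows with every token; the term dictionary grows only with distinct tokens.
total_tokens = sum(len(text.split()) for text in docs.values())
distinct_tokens = len(index)

print("tokens in stored text:", total_tokens)
print("distinct tokens in index:", distinct_tokens)
print("postings for 'the':", index["the"])
```

Note that the index alone can tell you which documents match a term, but it cannot reproduce the original text around the match, which is why snippets need the stored field.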