On Wed, Jul 13, 2011 at 3:54 PM, Michael Kuhlmann <s...@kuli.org> wrote:

> On 13.07.2011 15:37, Gabriele Kahlout wrote:
> > Well, I'm not sure how usual this scenario would be:
> > 1. In general, those using Solr with Nutch don't store the content field,
> > to avoid storing the whole web/intranet in their index twice (once in the
> > form of stored data, and once in the form of indexed data).
> >
>
> Not exactly. The indexed form is quite different from the stored form:
> only the tokens are stored, each token only once, plus some additional
> data such as the document count and, possibly, shingle information.
>
> Hence, indexed data usually needs much less space on disk than the
> original data.
>

I realized that. Maybe I should have said "1.X times" (1 in the form of stored
data and 0.X in the form of indexed data).

>
> There's no practical alternative to storing the content in a stored
> field. What would you otherwise display as a search result? "The
> following web pages have your search term somewhere in their contents,
> don't know where, take a look on your own"?
>

Display the title and URL (and implicitly say "The following web pages have
your search term somewhere in their contents, don't REMEMBER where, take a
look on your own").

Solr is already configured by default not to index more than
<maxFieldLength> tokens per field anyway. Usually one stores the content
only to display snippets.



> Greetings,
> Kuli
>



-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains "[LON]" or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) < Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with "X".
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ L(-[a-z]+[0-9]X)).
