Hmm, Otis, very nice!
Koji
Otis Gospodnetic wrote:
Hi,
Wouldn't this be as easy as:
- split email into "paragraphs"
- for each paragraph compute signature (MD5 or something fuzzier, like in
SOLR-799)
- for each signature look for other emails with this signature
- when you find an email with
X
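The paragraph-signature steps listed above could be sketched roughly as follows. This is only an illustration, not Otis's implementation and not the SOLR-799 code: it uses an exact MD5 over whitespace-normalized, lowercased paragraphs rather than a fuzzy signature, and the function names (`paragraph_signatures`, `build_index`, `emails_sharing_paragraphs`) are made up for the example. What to do once a matching email is found is cut off in the message above, so the sketch simply counts shared paragraphs.

```python
import hashlib
import re
from collections import defaultdict


def paragraph_signatures(text):
    """Split text on blank lines and return one MD5 hex digest per paragraph."""
    signatures = []
    for paragraph in re.split(r"\n\s*\n", text):
        # Collapse whitespace and lowercase so that re-wrapped copies of a
        # paragraph hash to the same value. (Assumption: a fuzzier signature
        # would tolerate more variation than this.)
        normalized = " ".join(paragraph.split()).lower()
        if normalized:
            digest = hashlib.md5(normalized.encode("utf-8")).hexdigest()
            signatures.append(digest)
    return signatures


def build_index(emails):
    """Map each paragraph signature to the set of email ids containing it."""
    index = defaultdict(set)
    for email_id, text in emails.items():
        for sig in paragraph_signatures(text):
            index[sig].add(email_id)
    return index


def emails_sharing_paragraphs(email_id, emails, index):
    """Return {other_email_id: number of paragraphs shared with email_id}."""
    shared = defaultdict(int)
    for sig in set(paragraph_signatures(emails[email_id])):
        for other in index.get(sig, ()):
            if other != email_id:
                shared[other] += 1
    return dict(shared)


# Hypothetical sample data: "b" quotes one paragraph of "a", re-wrapped.
emails = {
    "a": "Hello Bob,\n\nThe meeting is at noon.\n\nRegards",
    "b": "Sure.\n\n  The meeting is\nat noon.\n\nThanks",
    "c": "Unrelated text.",
}
index = build_index(emails)
matches_for_b = emails_sharing_paragraphs("b", emails, index)
```

In a Solr setup the signature-to-email lookup would more likely be a query against an indexed signature field than an in-memory dictionary; the dictionary here just keeps the sketch self-contained.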
To: solr-user@lucene.apache.org
Sent: Monday, February 16, 2009 11:05:40 PM
Subject: Re: Word Locations & Search Components
Basically I'm working on the Enron dataset, and I've already de-duplicated
the collection and applied a spam filter. All the e-mails after this have
been parsed to
> >> content.
> >>
> >> I suppose if I'm doing this I don't want what's processed to be indexed as
> >> what's returned in a search, because then presumably it won't be the full
> >> e-mail, so do I need to store some kind of copy fie
m doing this I don't want what's processed to be indexed as
>> what's returned in a search, because then presumably it won't be the full
>> e-mail, so do I need to store some kind of copy field that keeps the full
>> e-mail and is fully indexed to
ne direct me to a guide?
>
>
> On another note, is there an easy way to destroy an index...any custom
> code?
>
>
> Thanks for any help!
>
>
>
> --
> View this message in context:
> http://www.nabble.com/Word-Locations---Search-Components-tp22031139p22031139.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
--
Alexander Ramos Jardim
> On another note, is there an easy way to destroy an index...any custom code?
Send in a delete by query command with the *:* query.
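For reference, a delete-by-query is sent to Solr's XML update handler as a `<delete><query>...</query></delete>` message, followed by a `<commit/>` so the deletion becomes visible. A minimal sketch, assuming a Solr instance whose update handler is at `http://localhost:8983/solr/update` (that URL and the helper names are assumptions for illustration, not something stated in this thread; adjust host, port, and core path for your setup):

```python
import urllib.request
from xml.sax.saxutils import escape

# Assumed location of the update handler; change to match your install.
DEFAULT_UPDATE_URL = "http://localhost:8983/solr/update"


def delete_by_query_xml(query="*:*"):
    """Build the XML update message that deletes all documents matching query."""
    return "<delete><query>%s</query></delete>" % escape(query)


def post_update(xml_body, update_url=DEFAULT_UPDATE_URL):
    """POST an XML update message to Solr and return the response body."""
    request = urllib.request.Request(
        update_url,
        data=xml_body.encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def delete_all_documents(update_url=DEFAULT_UPDATE_URL):
    """Delete every document in the index, then commit the change."""
    post_update(delete_by_query_xml("*:*"), update_url)
    post_update("<commit/>", update_url)
```

Calling `delete_all_documents()` requires a running Solr server; the index is emptied only after the commit succeeds.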
Thanks for any help!
full e-mail and is fully indexed to be returned instead?
Can what I'm suggesting be done and can anyone direct me to a guide?
On another note, is there an easy way to destroy an index...any custom code?
Thanks for any help!