Hi,

For message parsing you'll either have to write a custom parser or see if you 
can use JavaMail for that (or some other library if you are not working with 
Java).
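
If you are not tied to Java, here is a minimal sketch in Python, using only the standard library `email` and `xml.etree.ElementTree` modules. It parses one message and emits Solr's `<add><doc>` update XML. The field names (`id`, `date`, `from`, `body`, etc.) and the sample addresses are assumptions for illustration, not something from your setup; they would need to match the fields defined in your own schema.xml.

```python
from email import message_from_string
import xml.etree.ElementTree as ET

# Hypothetical mapping from mail headers to Solr field names; the field
# names must match whatever your own schema.xml defines.
HEADER_TO_FIELD = {
    "Message-ID": "id",
    "Date": "date",
    "From": "from",
    "To": "to",
    "Subject": "subject",
    "X-Origin": "origin",
}

def message_to_solr_xml(raw_text):
    """Turn one plain-text (non-multipart) message into Solr <add><doc> XML."""
    msg = message_from_string(raw_text)
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for header, field_name in HEADER_TO_FIELD.items():
        value = msg.get(header)
        if value is None or not value.strip():
            continue  # skip absent or empty headers such as a blank Subject
        field = ET.SubElement(doc, "field", name=field_name)
        field.text = value.strip()
    # For a non-multipart message get_payload() returns the body as a string.
    body = ET.SubElement(doc, "field", name="body")
    body.text = msg.get_payload().strip()
    # ElementTree escapes characters such as < and > in the values.
    return ET.tostring(add, encoding="unicode")

# Made-up sample in the same shape as the corpus files (placeholder addresses).
SAMPLE = """\
Message-ID: <12345.example@example.invalid>
Date: Mon, 14 May 2001 16:39:00 -0700 (PDT)
From: sender@example.invalid
To: recipient@example.invalid
Subject:
X-Origin: Allen-P

Here is our forecast
"""

if __name__ == "__main__":
    print(message_to_solr_xml(SAMPLE))
```

The Date header is copied verbatim here. If your schema declares that field as a Solr date type, it would first have to be converted to the ISO 8601 form Solr expects.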

As for the second part, that's not directly related to Solr.  Extracting 
meaning out of text is something your application needs to do.  Once it has 
done that, it can index the result with Solr so it can be searched later on.
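
To make that split concrete, here is a rough sketch of the application-side step. A plain keyword match stands in for whatever analysis you end up using, so it is not real semantic analysis. The term list and the names `label_message`, `confidential` and `matched_terms` are all hypothetical.

```python
# Hypothetical keyword list; a real application would use whatever
# analysis it trusts (hand-written rules, a trained classifier, etc.).
CONFIDENTIAL_TERMS = ("confidential", "do not forward", "internal only")

def label_message(body):
    """Return labels the application could store as extra fields when indexing."""
    text = body.lower()
    hits = [term for term in CONFIDENTIAL_TERMS if term in text]
    return {"confidential": bool(hits), "matched_terms": hits}

if __name__ == "__main__":
    print(label_message("This is CONFIDENTIAL, do not forward."))
    print(label_message("Here is our forecast"))
```

The labels computed this way would be added to each document as ordinary fields before it is sent to Solr, so that later searches can filter or group on them.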


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Johnny X <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Friday, November 7, 2008 7:53:40 PM
> Subject: Large Corpus XML Conversion?
> 
> 
> I've been asked to look at the Enron e-mail corpus
> (http://www.cs.cmu.edu/~enron/) and I've decided to use Solr as a means to
> analyse it. 
> 
> So I have a few questions...
> 
> First off, how can I convert the flat file text below:
> 
> 
> Message-ID: <[EMAIL PROTECTED]>
> Date: Mon, 14 May 2001 16:39:00 -0700 (PDT)
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: 
> Mime-Version: 1.0
> Content-Type: text/plain; charset=us-ascii
> Content-Transfer-Encoding: 7bit
> X-From: Phillip K Allen
> X-To: Tim Belden 
> X-cc: 
> X-bcc: 
> X-Folder: \Phillip_Allen_Jan2002_1\Allen, Phillip K.\'Sent Mail
> X-Origin: Allen-P
> X-FileName: pallen (Non-Privileged).pst
> 
> Here is our forecast
> 
> 
> 
> 
> to XML to input into Solr.
> 
> Secondly, I'm looking into searching for particular things in the e-mails
> and sorting them into groups as a result. Say, characteristics of the
> e-mails that suggest they concern confidential company information for
> instance.
> 
> How easy is it to make custom searches (based on semantics, word distances
> etc) and use the results as an output?
> 
> 
> I'm a complete newbie so any help is appreciated! I hope I've come to the
> right place.
> 
> Thanks. :-)
> -- 
> View this message in context: 
> http://www.nabble.com/Large-Corpus-XML-Conversion--tp20389947p20389947.html
> Sent from the Solr - User mailing list archive at Nabble.com.
