Hi there,

I was told before that I'd need to create a custom search component to do
what I want to do, but I'm thinking it might actually be a custom analyzer.

Basically, I'm indexing e-mail in XML in Solr and searching the 'content'
field which is parsed as 'text'.

I want to ignore certain elements of the e-mail (e.g. corporate banners),
but also identify e-mails whose actual content includes corporate
information.

To identify the banners I need something a little more developed than a stop
word list. I need to evaluate the frequency of certain words around words
like 'privileged' and 'corporate' within a window of roughly 100 words to
determine whether a passage is a banner, and then keep it from being
indexed.
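
To make the idea concrete, here is a rough sketch of the kind of check I
have in mind. It is plain Java, not a Solr analyzer or search component; the
class name, the anchor and cue word lists, and the thresholds are all made
up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class BannerWindowSketch {
    // Hypothetical anchor words: the words the window is centred on.
    static final Set<String> ANCHORS = new HashSet<>(Arrays.asList(
            "privileged", "corporate"));

    // Hypothetical cue words: a real list would come from inspecting banners.
    static final Set<String> CUES = new HashSet<>(Arrays.asList(
            "confidential", "intended", "recipient", "unauthorized", "disclaimer"));

    /** Lower-cases the text and splits it on anything that is not a letter, digit or apostrophe. */
    public static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^a-z0-9']+");
    }

    /**
     * Returns true if some anchor word has at least minHits cue words
     * within a window of about `window` tokens centred on it.
     */
    public static boolean looksLikeBanner(String text, int window, int minHits) {
        String[] tokens = tokenize(text);
        int half = window / 2;
        for (int i = 0; i < tokens.length; i++) {
            if (!ANCHORS.contains(tokens[i])) {
                continue;
            }
            // Clamp the window to the bounds of the token array.
            int from = Math.max(0, i - half);
            int to = Math.min(tokens.length, i + half + 1);
            int hits = 0;
            for (int j = from; j < to; j++) {
                if (CUES.contains(tokens[j])) {
                    hits++;
                }
            }
            if (hits >= minHits) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String banner = "This message is privileged and confidential. If you are not "
                + "the intended recipient, any unauthorized use is prohibited.";
        String body = "Please review the corporate budget before the meeting on Friday.";
        System.out.println(looksLikeBanner(banner, 100, 3));
        System.out.println(looksLikeBanner(body, 100, 3));
    }
}
```

The second case is the one I care about for the opposite check: 'corporate'
appears, but without the surrounding banner vocabulary, so the text should
be treated as real content.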

I need to do the opposite at the same time to identify, in a similar
manner, which e-mails include corporate information in their actual content.

I suppose that if I do this, I don't want the processed text to be what is
returned in a search, because presumably it won't be the full e-mail. Do I
need some kind of copy field that stores the full e-mail, fully indexed, to
be returned instead?

Can what I'm suggesting be done and can anyone direct me to a guide?


On another note, is there an easy way to destroy an index, or does that need
custom code?


Thanks for any help!



-- 
View this message in context: 
http://www.nabble.com/Word-Locations---Search-Components-tp22031139p22031139.html
Sent from the Solr - User mailing list archive at Nabble.com.
