I would go for a business-logic solution rather than a Solr customization in this case, as you need to filter information that you would actually like to see in different fields of your index.
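To give an idea of what that business logic might look like, here is a minimal sketch in Python that runs in your own indexing code before anything is sent to Solr. It assumes the body is already plain text split into blank-line-separated paragraphs. It scores each paragraph by counting banner-typical words within a word window around trigger words such as 'privileged' and 'corporate', then routes the paragraph either to the content or to a separate banner field. The vocabularies, `WINDOW`, and `THRESHOLD` are hypothetical values for illustration, not anything Solr provides.

```python
import re

# Hypothetical vocabularies: tune these against real mail before relying on them.
TRIGGERS = {"privileged", "corporate"}
BANNER_WORDS = {"confidential", "intended", "recipient", "disclaimer",
                "notify", "sender", "unauthorized", "prohibited", "legal"}
WINDOW = 100    # words on each side of a trigger word (hypothetical)
THRESHOLD = 3   # banner words needed near a trigger to call it a banner (hypothetical)


def tokenize(text):
    """Lower-case word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def banner_score(words):
    """Highest count of banner words within WINDOW words of any trigger word."""
    best = 0
    for i, w in enumerate(words):
        if w in TRIGGERS:
            nearby = words[max(0, i - WINDOW): i + WINDOW + 1]
            best = max(best, sum(1 for x in nearby if x in BANNER_WORDS))
    return best


def split_body(body):
    """Split an e-mail body into (content, banner) strings, paragraph by paragraph."""
    content, banner = [], []
    for para in re.split(r"\n\s*\n", body.strip()):
        target = banner if banner_score(tokenize(para)) >= THRESHOLD else content
        target.append(para)
    return "\n\n".join(content), "\n\n".join(banner)
```

The two returned strings can then go into separate index fields, while the untouched original e-mail is stored in another field for display. The inverse check (corporate information inside the real content) could plausibly reuse `banner_score` with a different vocabulary.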
Have you already tried splitting the e-mail into several fields, such as subject, from, to, content, signature, etc.?

2009/2/16 Johnny X <jonathanwel...@gmail.com>

> Hi there,
>
> I was told before that I'd need to create a custom search component to do
> what I want to do, but I'm thinking it might actually be a custom analyzer.
>
> Basically, I'm indexing e-mail in XML in Solr and searching the 'content'
> field, which is parsed as 'text'.
>
> I want to ignore certain elements of the e-mail (e.g. corporate banners),
> but also identify the actual content of those e-mails, including corporate
> information.
>
> To identify the banners I need something a little more developed than a
> stop word list. I need to evaluate the frequency of certain words around
> words like 'privileged' and 'corporate' within a word window of about 100
> words to determine whether they're banners, and then remove them from
> being indexed.
>
> At the same time, I need to do the opposite: identify, in a similar
> manner, which e-mails include corporate information in their actual
> content.
>
> I suppose if I'm doing this, I don't want what's processed to be indexed as
> what's returned in a search, because then presumably it won't be the full
> e-mail. So do I need to store some kind of copy field that keeps the full
> e-mail and is fully indexed, to be returned instead?
>
> Can what I'm suggesting be done, and can anyone direct me to a guide?
>
> On another note, is there an easy way to destroy an index... any custom
> code?
>
> Thanks for any help!
>
> --
> View this message in context:
> http://www.nabble.com/Word-Locations---Search-Components-tp22031139p22031139.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
Alexander Ramos Jardim
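A minimal sketch of the field splitting suggested above, using only Python's standard `email` and `xml.etree.ElementTree` modules. It assumes a plain-text (or multipart with a text/plain part) message and treats a bare `-- ` line as the signature separator. The field names `subject`, `from`, `to`, `content`, and `signature` are hypothetical and would need matching field definitions in your schema.xml. The output uses Solr's XML update format (`<add><doc><field name="...">`), which you would then post to your update handler.

```python
import xml.etree.ElementTree as ET
from email import message_from_string


def body_text(msg):
    """Return the first text/plain part of a message as a string."""
    parts = msg.walk() if msg.is_multipart() else [msg]
    for part in parts:
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return ""


def split_signature(text):
    """Split text at the first bare '--' line (the usual '-- ' separator)."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.rstrip() == "--":
            return "\n".join(lines[:i]).strip(), "\n".join(lines[i + 1:]).strip()
    return text.strip(), ""


def email_to_fields(raw):
    """Map a raw RFC 822 message to hypothetical Solr field names."""
    msg = message_from_string(raw)
    content, signature = split_signature(body_text(msg))
    return {
        "subject": msg.get("Subject", ""),
        "from": msg.get("From", ""),
        "to": msg.get("To", ""),
        "content": content,
        "signature": signature,
    }


def to_solr_add_xml(fields):
    """Build a Solr XML <add> message containing one document."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    return ET.tostring(add, encoding="unicode")
```

Keeping the signature in its own field lets you search or ignore it independently of the body, and the banner filtering can then run on the `content` value alone.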