How about this crazy idea - a custom TokenFilter that stores the safe flag in 
ThreadLocal?
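
Something like this, roughly. This is just a JDK-only sketch of the hand-off pattern, not the real Lucene TokenFilter API -- it assumes the analysis chain and whatever reads the flag run on the same thread, one document at a time. Class and word list are made up for illustration:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class ProfanityFlag {
    // One flag per indexing thread: the filter sets it while tokens
    // stream by, the caller reads and resets it per document.
    static final ThreadLocal<Boolean> SAFE =
        ThreadLocal.withInitial(() -> Boolean.TRUE);

    // Hypothetical profanity list; in practice load from a file.
    static final Set<String> PROFANITIES =
        new HashSet<>(Arrays.asList("badword", "worseword"));

    // Stand-in for the TokenFilter's incrementToken(): inspect each
    // token as it passes through and flip the flag on a match.
    static void filterToken(String token) {
        if (PROFANITIES.contains(token)) {
            SAFE.set(Boolean.FALSE);
        }
    }

    // Stand-in for the code that populates the "safe" field afterwards.
    static boolean readAndResetSafeFlag() {
        boolean safe = SAFE.get();
        SAFE.set(Boolean.TRUE); // reset for the next doc on this thread
        return safe;
    }

    public static void main(String[] args) {
        for (String t : "this doc contains badword".split(" ")) {
            filterToken(t);
        }
        System.out.println("safe=" + readAndResetSafeFlag());
    }
}
```

The fragile part is the assumption that nothing re-orders or batches analysis across threads, so treat it as a hack, not a design.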


Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



----- Original Message ----
> From: Mike Perham <mper...@onespot.com>
> To: solr-user@lucene.apache.org
> Sent: Thu, January 28, 2010 4:46:54 PM
> Subject: implementing profanity detector
> 
> We'd like to implement a profanity detector for documents during indexing.
> That is, given a file of profane words, we'd like to be able to mark a
> document as safe or not safe if it contains any of those words so that we
> can have something similar to google's safe search.
> 
> I'm trying to figure out how best to implement this with Solr 1.4:
> 
> - An UpdateRequestProcessor would allow me to dynamically populate a "safe"
> boolean field but requires me to pull out the content, tokenize it and run
> each token through my set of profanities, essentially running the analysis
> pipeline again.  That's a lot of overhead AFAIK.
> 
> - A TokenFilter would allow me to tap into the existing analysis pipeline so
> I get the tokens for free but I can't access the document.
> 
> Any suggestions on how to best implement this?
> 
> Thanks in advance,
> mike
