You could have a synonym file that, for each dirty word, rewrites the
word at index time into an "impossible" marker word, for example xyzzy.
Then a search restricted to clean content is:

(user search) AND NOT xyzzy
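
For example (untested; the field type name, file name, and words below
are just placeholders), an index-time SynonymFilterFactory in schema.xml:

  <fieldType name="text_safe" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="profanity.txt"
              ignoreCase="true"/>
    </analyzer>
    <!-- plus an <analyzer type="query"> without the synonym filter -->
  </fieldType>

with profanity.txt entries like:

  badword1 => xyzzy
  badword2 => xyzzy

Writing "badword1 => badword1, xyzzy" instead would keep the original
word in the index too, so unfiltered searches could still match it.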

A synonym filter that included payloads would be cool.

On Thu, Jan 28, 2010 at 2:31 PM, Otis Gospodnetic
<otis_gospodne...@yahoo.com> wrote:
> How about this crazy idea - a custom TokenFilter that stores the safe flag in 
> ThreadLocal?
>
>
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Hadoop ecosystem search :: http://search-hadoop.com/
>
>
>
> ----- Original Message ----
>> From: Mike Perham <mper...@onespot.com>
>> To: solr-user@lucene.apache.org
>> Sent: Thu, January 28, 2010 4:46:54 PM
>> Subject: implementing profanity detector
>>
>> We'd like to implement a profanity detector for documents during indexing.
>> That is, given a file of profane words, we'd like to be able to mark each
>> document as safe or unsafe depending on whether it contains any of those
>> words, so that we can offer something similar to Google's safe search.
>>
>> I'm trying to figure out how best to implement this with Solr 1.4:
>>
>> - An UpdateRequestProcessor would allow me to dynamically populate a "safe"
>> boolean field, but it requires me to pull out the content, tokenize it, and
>> run each token through my set of profanities, essentially running the
>> analysis pipeline a second time.  That's a lot of overhead AFAIK.
>>
>> - A TokenFilter would allow me to tap into the existing analysis pipeline,
>> so I get the tokens for free, but then I can't access the document.
>>
>> Any suggestions on how to best implement this?
>>
>> Thanks in advance,
>> mike
>
>
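
For what it's worth, here is a rough sketch of Otis's ThreadLocal idea
against the Lucene 2.9 / Solr 1.4 token API. It is untested; the class
name, the UNSAFE handoff, and the profanity set are made up for
illustration, and you'd also need a small TokenFilterFactory to declare
it in schema.xml:

import java.io.IOException;
import java.util.Set;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

public final class ProfanityFlagFilter extends TokenFilter {

  // Set during analysis when a profane term is seen; whatever reads it
  // must run on the same thread and clear it between documents.
  public static final ThreadLocal<Boolean> UNSAFE = new ThreadLocal<Boolean>() {
    @Override protected Boolean initialValue() { return Boolean.FALSE; }
  };

  private final Set<String> profanities; // lowercased profane words
  private final TermAttribute termAtt;

  public ProfanityFlagFilter(TokenStream input, Set<String> profanities) {
    super(input);
    this.profanities = profanities;
    this.termAtt = addAttribute(TermAttribute.class);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    if (profanities.contains(termAtt.term())) {
      UNSAFE.set(Boolean.TRUE); // flag it, but pass the token through untouched
    }
    return true;
  }
}

The catch is ordering: an UpdateRequestProcessor's processAdd runs
before the document is analyzed, so it can't simply read the flag on
the same pass; it would have to drive the field's analyzer itself,
which is exactly the duplicate work Mike wants to avoid.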



-- 
Lance Norskog
goks...@gmail.com
