: Otherwise, I'd do it via copy fields.  Your first field is your main 
: field and is analyzed as before.  Your second field does the profanity 
: detection and simply outputs a single token at the end, safe/unsafe.

you don't even need custom code for this ... copyField all your text into 
a 'has_profanity' field where you use a suitable Tokenizer followed by the 
KeepWordFilter that only keeps profane words and then a 
PatternReplaceFilter that matches .* and replaces it with "HELL_YEA" 
... now a search for "has_profanity:HELL_YEA" finds all profane docs, with 
the added bonus that the scores are based on how many profane words occur 
in the doc.
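a rough sketch of what that analysis chain might look like in schema.xml 
(the field/type names and the "profane_words.txt" file name are just 
placeholders, not anything standard):

```xml
<!-- Sketch of the chain described above; "profane_words.txt" is an
     assumed file listing one profane word per line. -->
<fieldType name="profanity_marker" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- lowercase so the keep-word list matches regardless of case -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- keep only tokens that appear in the profane-word list -->
    <filter class="solr.KeepWordFilterFactory" words="profane_words.txt"/>
    <!-- collapse every surviving token to a single marker value -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern=".*" replacement="HELL_YEA"/>
  </analyzer>
</fieldType>

<field name="has_profanity" type="profanity_marker"
       indexed="true" stored="false"/>
<copyField source="text" dest="has_profanity"/>
```

then q=has_profanity:HELL_YEA finds the profane docs, or 
fq=-has_profanity:HELL_YEA filters them out.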

that search could also be used as a filter query (probably negated) as needed.



-Hoss
