Re: implementing profanity detector

2010-02-12 Thread Mike Perham
On Thu, Feb 11, 2010 at 10:49 AM, Grant Ingersoll wrote:
> Otherwise, I'd do it via copy fields. Your first field is your main field and is analyzed as before. Your second field does the profanity detection and simply outputs a single token at the end, safe/unsafe.
>
> How long are your

term frequency vector access?

2010-02-11 Thread Mike Perham
In an UpdateRequestProcessor (processing an AddUpdateCommand), I have a SolrInputDocument with a field 'content' that has termVectors="true" in schema.xml. Is it possible to get access to that field's term vector in the URP?

implementing profanity detector

2010-02-10 Thread Mike Perham
on how to implement this efficiently with Lucene/Solr.

mike

On Thu, Jan 28, 2010 at 4:31 PM, Otis Gospodnetic wrote:
> How about this crazy idea - a custom TokenFilter that stores the safe flag in ThreadLocal?
>
> ----- Original Message
> From: M
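
For readers following the ThreadLocal suggestion, here is a minimal plain-Java sketch of the idea. It deliberately does not use the Lucene TokenFilter API: the class `SafeFlag` and its `inspect` method are hypothetical stand-ins for what a custom filter might do per token. It also assumes that the code reading the flag runs on the same thread that performed the analysis, which may not hold in every indexing setup.

```java
import java.util.Set;

// Hypothetical helper illustrating a per-thread "safe" flag.
// Not part of Lucene or Solr; a real TokenFilter would call
// something like inspect() from its token-advancing method.
final class SafeFlag {
    // Each thread sees its own copy, initially "safe".
    private static final ThreadLocal<Boolean> SAFE =
            ThreadLocal.withInitial(() -> Boolean.TRUE);

    private SafeFlag() {}

    // Call before analyzing each new document on this thread.
    static void reset() {
        SAFE.set(Boolean.TRUE);
    }

    // Stand-in for the per-token work of a filter: pass the token
    // through unchanged, flipping the flag if it is on the blocked list.
    static String inspect(String token, Set<String> blocked) {
        if (blocked.contains(token)) {
            SAFE.set(Boolean.FALSE);
        }
        return token;
    }

    // Call after analysis, on the same thread, to read the verdict.
    static boolean isSafe() {
        return SAFE.get();
    }
}
```

The flag has to be reset before every document, otherwise one unsafe document would taint every later document handled by the same thread.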

implementing profanity detector

2010-01-28 Thread Mike Perham
We'd like to implement a profanity detector for documents during indexing. That is, given a file of profane words, we'd like to be able to mark a document as safe or not safe if it contains any of those words so that we can have something similar to Google's safe search. I'm trying to figure out
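
As a rough illustration of the requirement described above (mark a document unsafe if it contains any word from a list), here is a minimal sketch in plain Java. It is not tied to Lucene or Solr: the class name `ProfanityDetector`, its constructor, and `isSafe` are all hypothetical, and the tokenization (lowercasing, then splitting on anything that is not a letter or digit) is an assumption that a real analyzer chain would replace.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Lucene or Solr.
final class ProfanityDetector {
    private final Set<String> blocked = new HashSet<>();

    // blockedWords would typically be loaded from the word file;
    // loading is left out of this sketch.
    ProfanityDetector(Set<String> blockedWords) {
        for (String word : blockedWords) {
            blocked.add(word.toLowerCase(Locale.ROOT));
        }
    }

    // Returns true when no whole token matches a blocked word.
    boolean isSafe(String content) {
        // Crude tokenization: lowercase, split on non-letter/non-digit runs.
        String[] tokens = content.toLowerCase(Locale.ROOT)
                                 .split("[^\\p{L}\\p{N}]+");
        for (String token : tokens) {
            if (blocked.contains(token)) {
                return false; // stop at the first hit
            }
        }
        return true;
    }
}
```

Because matching is on whole tokens, a blocked word embedded in a longer word is not flagged. The boolean result could then be stored in a separate field on the document and used as a filter at query time.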