from:"Mike Perham"

implementing profanity detector

2010-01-28 Thread Mike Perham

We'd like to implement a profanity detector for documents during indexing. That is, given a file of profane words, we'd like to be able to mark a document as safe or not safe if it contains any of those words, so that we can have something similar to Google's safe search. I'm trying to figure out

implementing profanity detector

2010-02-10 Thread Mike Perham

on how to implement this efficiently with Lucene/Solr.

mike

On Thu, Jan 28, 2010 at 4:31 PM, Otis Gospodnetic wrote:
>
> How about this crazy idea - a custom TokenFilter that stores the safe flag in
> ThreadLocal?
>
> ----- Original Message
> > From: M
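For readers skimming the archive, the ThreadLocal idea quoted above might look roughly like the sketch below. It is an illustration only, not code from the thread: it uses a plain java.util.Iterator as a stand-in for a Lucene TokenStream so that it runs without Lucene on the classpath, and the class name FlaggingTokenIterator, the SAFE flag, and the sample word list are all hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for a custom TokenFilter. It wraps a plain Iterator
// of tokens rather than Lucene's TokenStream (an assumption made to keep the
// sketch self-contained).
class FlaggingTokenIterator implements Iterator<String> {
    // Per-thread flag: TRUE until a profane token is seen on this thread.
    static final ThreadLocal<Boolean> SAFE = ThreadLocal.withInitial(() -> Boolean.TRUE);

    private final Iterator<String> input;
    private final Set<String> profaneWords; // expected to hold lower-case words

    FlaggingTokenIterator(Iterator<String> input, Set<String> profaneWords) {
        this.input = input;
        this.profaneWords = profaneWords;
        SAFE.set(Boolean.TRUE); // reset the flag for each new document
    }

    @Override
    public boolean hasNext() {
        return input.hasNext();
    }

    @Override
    public String next() {
        String token = input.next();
        if (profaneWords.contains(token.toLowerCase(Locale.ROOT))) {
            SAFE.set(Boolean.FALSE); // remember that this document is not safe
        }
        return token; // tokens pass through unchanged
    }

    public static void main(String[] args) {
        Set<String> profane = Set.of("darn", "heck"); // placeholder word list
        Iterator<String> tokens = new FlaggingTokenIterator(
                Arrays.asList("What", "the", "Heck").iterator(), profane);
        while (tokens.hasNext()) {
            tokens.next(); // the indexer would consume tokens here
        }
        System.out.println("safe = " + SAFE.get());
    }
}
```

The flag is only meaningful if the code that reads it runs on the same thread that consumed the tokens, and only after the whole stream has been consumed.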

term frequency vector access?

2010-02-11 Thread Mike Perham

In an UpdateRequestProcessor (processing an AddUpdateCommand), I have a SolrInputDocument with a field 'content' that has termVectors="true" in schema.xml. Is it possible to get access to that field's term vector in the URP?

Re: implementing profanity detector

2010-02-12 Thread Mike Perham

On Thu, Feb 11, 2010 at 10:49 AM, Grant Ingersoll wrote:
>
> Otherwise, I'd do it via copy fields. Your first field is your main field
> and is analyzed as before. Your second field does the profanity detection
> and simply outputs a single token at the end, safe/unsafe.
>
> How long are your
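The second-field behaviour described in the quoted reply (consume every token, then output a single safe/unsafe token) can be sketched as follows. This is a hedged illustration, not code from the thread: a plain java.util.Iterator again stands in for a Lucene TokenStream, and the class name VerdictTokenIterator and the sample word list are hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Locale;
import java.util.NoSuchElementException;
import java.util.Set;

// Hypothetical stand-in for the filter on the second (copy) field: it
// swallows all input tokens and yields exactly one token, "safe" or "unsafe".
class VerdictTokenIterator implements Iterator<String> {
    private final Iterator<String> input;
    private final Set<String> profaneWords; // expected to hold lower-case words
    private boolean emitted = false;

    VerdictTokenIterator(Iterator<String> input, Set<String> profaneWords) {
        this.input = input;
        this.profaneWords = profaneWords;
    }

    @Override
    public boolean hasNext() {
        return !emitted; // exactly one verdict token per document
    }

    @Override
    public String next() {
        if (emitted) {
            throw new NoSuchElementException();
        }
        boolean safe = true;
        // Consume the whole input; any profane token makes the document unsafe.
        while (input.hasNext()) {
            if (profaneWords.contains(input.next().toLowerCase(Locale.ROOT))) {
                safe = false;
            }
        }
        emitted = true;
        return safe ? "safe" : "unsafe";
    }

    public static void main(String[] args) {
        Set<String> profane = Set.of("darn", "heck"); // placeholder word list
        Iterator<String> verdict = new VerdictTokenIterator(
                Arrays.asList("a", "perfectly", "polite", "sentence").iterator(), profane);
        System.out.println(verdict.next());
    }
}
```

Because the verdict ends up as an ordinary indexed token in its own field, no state has to be shared between threads, which is the main difference from the ThreadLocal idea.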

