Re: Profanity

2018-01-08 Thread John Blythe
n’t. > > Thanks > > Sid > > Sent from my iPhone > > > On Jan 8, 2018, at 4:38 PM, John Blythe wrote: > > > > you could use the keepwords functionality. have a field that only keeps > > profanity and then you can query against that field having its default >

Re: Profanity

2018-01-08 Thread Sadiki Latty
> you could use the keepwords functionality. have a field that only keeps > profanity and then you can query against that field having its default > value vs. profane text > > -- > John Blythe > >> On Mon, Jan 8, 2018 at 3:12 PM, Sadiki Latty wrote: >> >>

Re: Profanity

2018-01-08 Thread Sadiki Latty
overkill hence why I was thinking the list. The data being inserted is from sources that we have “control” over. This requirement is simply for the worst case scenario that we miss something. We might also want to allow this profanity which is why we need to flag it rather than strip it all

RE: Profanity

2018-01-08 Thread Markus Jelsma
gards, Markus -Original message- > From:Davis, Daniel (NIH/NLM) [C] > Sent: Monday 8th January 2018 23:12 > To: solr-user@lucene.apache.org > Subject: RE: Profanity > > Fun topic. Same complicated issues as normal search: > > Multilingual support? Is "

RE: Profanity

2018-01-08 Thread Davis, Daniel (NIH/NLM) [C]
Fun topic. Same complicated issues as normal search: Multilingual support? Is "Merde" profanity too, or just in French? Multi-word synonyms? Does "God Damn" become "goddamn", or do you treat "Damn" and "

RE: Profanity

2018-01-08 Thread Markus Jelsma
text input field for 'profanity' and set another boolean field if it matches or doesn't. Whether you use a list of words - or an SVM or another machine learning algorithm - to detect profanity is up to you. Cheers, Markus -Original message- > From:Sadiki Latty > Sen

Re: Profanity

2018-01-08 Thread John Blythe
you could use the keepwords functionality. have a field that only keeps profanity and then you can query against that field having its default value vs. profane text -- John Blythe On Mon, Jan 8, 2018 at 3:12 PM, Sadiki Latty wrote: > Hey > > I would like to find a solution to flag
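The keep-words idea above can be illustrated outside Solr. The following is a minimal Python sketch, not Solr's own filter: it only mimics the behaviour of a field that retains nothing but listed words, so a clean document yields an empty result. The names `KEEP_WORDS` and `keep_words` and the placeholder word list are assumptions made for illustration.

```python
import re

# Hypothetical placeholder list; a real setup would read the words from a file.
KEEP_WORDS = {"darn", "heck"}


def keep_words(text, keep=KEEP_WORDS):
    """Return only the tokens of `text` that appear in `keep`.

    Matching is case-insensitive and works on simple word tokens,
    which is an assumption of this sketch, not a Solr analyzer.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [token for token in tokens if token in keep]


# A clean text leaves the "profanity-only" field empty,
# while a flagged text leaves the matching tokens behind.
clean_field = keep_words("A perfectly polite sentence.")
flagged_field = keep_words("Well, darn it all to heck. Darn!")
```

Querying for documents whose profanity-only field is non-empty then corresponds to checking whether the returned list has any entries.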

Profanity

2018-01-08 Thread Sadiki Latty
Hey I would like to find a solution to flag (at index-time) profanity. Optimally, it would be good if it functioned similarly to stopwords, in the sense that I can have a predefined list that is read and, if a token is on the list, that document is 'flagged' in a different field. Does anyo

Re: implementing profanity detector

2010-02-16 Thread Lance Norskog
A problem is that your profanity list will not stop growing, and with each new word you will want to rescrub the index. We had a thousand-word NOT clause in every query (a filter query would be true for 99% of the index) until we switched to another arrangement. Another small problem was that I

Re: implementing profanity detector

2010-02-12 Thread Chris Hostetter
: Otherwise, I'd do it via copy fields. Your first field is your main : field and is analyzed as before. Your second field does the profanity : detection and simply outputs a single token at the end, safe/unsafe. you don't even need custom code for this ... copyField all your te

Re: implementing profanity detector

2010-02-12 Thread Mike Perham
On Thu, Feb 11, 2010 at 10:49 AM, Grant Ingersoll wrote: > > Otherwise, I'd do it via copy fields. Your first field is your main field > and is analyzed as before. Your second field does the profanity detection > and simply outputs a single token at the end, safe/unsafe. >

Re: implementing profanity detector

2010-02-11 Thread Grant Ingersoll
On Jan 28, 2010, at 4:46 PM, Mike Perham wrote: > We'd like to implement a profanity detector for documents during indexing. > That is, given a file of profane words, we'd like to be able to mark a > document as safe or not safe if it contains any of those words so that we

Re: implementing profanity detector

2010-02-11 Thread Alexey Serba
> - A TokenFilter would allow me to tap into the existing analysis pipeline so > I get the tokens for free but I can't access the document. https://issues.apache.org/jira/browse/SOLR-1536 On Fri, Jan 29, 2010 at 12:46 AM, Mike Perham wrote: > We'd like to implement a pro

implementing profanity detector

2010-02-10 Thread Mike Perham
ike Perham > > To: solr-user@lucene.apache.org > > Sent: Thu, January 28, 2010 4:46:54 PM > > Subject: implementing profanity detector > > > > We'd like to implement a profanity detector for documents during indexing. > > That is, given a file of profane words, we

Re: implementing profanity detector

2010-01-28 Thread Lance Norskog
ginal Message >> From: Mike Perham >> To: solr-user@lucene.apache.org >> Sent: Thu, January 28, 2010 4:46:54 PM >> Subject: implementing profanity detector >> >> We'd like to implement a profanity detector for documents during indexing. >> That is,

Re: implementing profanity detector

2010-01-28 Thread Otis Gospodnetic
r-user@lucene.apache.org > Sent: Thu, January 28, 2010 4:46:54 PM > Subject: implementing profanity detector > > We'd like to implement a profanity detector for documents during indexing. > That is, given a file of profane words, we'd like to be able to mark a > document as safe or no

implementing profanity detector

2010-01-28 Thread Mike Perham
We'd like to implement a profanity detector for documents during indexing. That is, given a file of profane words, we'd like to be able to mark a document as safe or not safe if it contains any of those words so that we can have something similar to Google's safe search. I'm
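The requirement described here (a file of profane words, and a safe / not-safe mark per document) could be prototyped before documents are sent to the indexer. The Python sketch below is one possible approach under stated assumptions, not the solution adopted in the thread: the one-word-per-line file format, the `text` and `safe` field names, and the helper names `load_word_list` and `flag_document` are all hypothetical.

```python
import re
from pathlib import Path


def load_word_list(path):
    """Read one word per line, skipping blank lines and '#' comments.

    The file format is an assumption of this sketch.
    """
    words = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip().lower()
        if line and not line.startswith("#"):
            words.add(line)
    return words


def flag_document(doc, profane_words, text_field="text", flag_field="safe"):
    """Return a copy of `doc` with a boolean `flag_field` added.

    The flag is True when none of the document's tokens are on the list.
    Field names are hypothetical; tokenization is a simple regex split.
    """
    text = str(doc.get(text_field, "")).lower()
    tokens = set(re.findall(r"[a-z0-9']+", text))
    flagged = dict(doc)
    flagged[flag_field] = tokens.isdisjoint(profane_words)
    return flagged
```

A caller would load the list once, run each document through `flag_document`, and index the result, so that a search can filter on the boolean field. Exact token matching like this does not address the multilingual or multi-word cases raised elsewhere in these threads.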