Re: filtering number and repeated contents

2012-06-07 Thread Jack Krupansky
Subject: Re: filtering number and repeated contents
Thanks Jack, I will try an updateProcessor. By the way, does Solr store the tokenized "content" in a field if the field has the property stored="true"?
On Tue, Jun 5, 2012 at 8:23 PM, Jack Krupansky wrote: My (very limited) understandin

Re: filtering number and repeated contents

2012-06-07 Thread Mark , N
> that is programmed with some
> disclaimer signature text strings to be removed from field values.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Mark , N
> Sent: Tuesday, June 05, 2012 8:28 AM
> To: solr-user@lucene.apache.org
> Subject: filtering number and repeate

Re: filtering number and repeated contents

2012-06-05 Thread Jack Krupansky
itself. You may have to resort to a custom update processor that is programmed with some disclaimer signature text strings to be removed from field values.

-- Jack Krupansky

-----Original Message----- From: Mark , N
Sent: Tuesday, June 05, 2012 8:28 AM
To: solr-user@lucene.apache.org
Subject:
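
For illustration only, here is a minimal sketch of the kind of stripping such a processor might do. It is not a Solr update processor: it is plain Python that could run on the client before documents are sent to Solr, which is a different place to do the same cleanup. The disclaimer strings, the number pattern, and the function name strip_surplus are all assumptions made up for this example.

```python
import re

# Hypothetical disclaimer strings; a real list would be collected
# from the documents actually being indexed.
DISCLAIMERS = [
    "This e-mail and any attachments are confidential.",
    "Please consider the environment before printing this email.",
]

# Standalone runs of digits, optionally joined by ".", "," or "-"
# (e.g. "42", "1,000", "2012-06-05"). Digits glued to letters,
# such as "Solr4", are left alone because of the word boundaries.
NUMBER_RE = re.compile(r"\b\d+(?:[.,\-]\d+)*\b")


def strip_surplus(text, disclaimers=DISCLAIMERS):
    """Remove known disclaimer strings and standalone numbers from text."""
    # Exact-match removal of each configured disclaimer string.
    for disclaimer in disclaimers:
        text = text.replace(disclaimer, " ")
    # Drop standalone numbers.
    text = NUMBER_RE.sub(" ", text)
    # Collapse the whitespace left behind by the removals.
    return " ".join(text.split())
```

Exact string matching only catches disclaimers that appear verbatim; text that varies in wording or whitespace would need a looser match. Inside Solr, the equivalent logic would be applied to the field value in the update chain before the document is indexed.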

filtering number and repeated contents

2012-06-05 Thread Mark , N
Is it possible to filter out numbers and disclaimers (repeated content) while indexing into Solr? This is all surplus information that I do not want to index. I have also tried using the boilerpipe algorithm to remove surplus information from web pages, such as navigational elements, templates, and