Subject: Re: filtering number and repeated contents
Thanks Jack, I will try an update processor.
By the way, does Solr store the tokenized "content" in a field if the field
has the property stored="true"?
On Tue, Jun 5, 2012 at 8:23 PM, Jack Krupansky
wrote:
My (very limited) understandin
hat is programmed with some
> disclaimer signature text strings to be removed from field values.
>
> -- Jack Krupansky
>
> -Original Message- From: Mark , N
> Sent: Tuesday, June 05, 2012 8:28 AM
> To: solr-user@lucene.apache.org
> Subject: filtering number and repeate
itself. You may have to
resort to a custom update processor that is programmed with some disclaimer
signature text strings to be removed from field values.
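
A minimal sketch of the kind of cleanup such an update processor might apply to a field value before it is indexed. It is plain Python, only to illustrate the logic; a real Solr update processor would be written in Java, with the same steps running in its processAdd method. The disclaimer strings, the number pattern, and the function name clean_field_value are all assumptions made for illustration, not anything from Solr or from this thread.

```python
import re

# Hypothetical disclaimer snippets; replace with the real repeated text.
DISCLAIMERS = [
    "This e-mail is confidential and intended solely for the addressee.",
    "Please consider the environment before printing.",
]

# Standalone numbers, optionally with . or , separators (2012, 3.14, 1,000).
# Digits embedded in a word (e.g. "solr4") are left alone.
NUMBER_RE = re.compile(r"\b\d+(?:[.,]\d+)*\b")


def clean_field_value(text, disclaimers=DISCLAIMERS):
    """Remove known disclaimer strings and standalone numbers from text."""
    # Exact-match removal of each known disclaimer string.
    for disclaimer in disclaimers:
        text = text.replace(disclaimer, " ")
    # Drop standalone numbers.
    text = NUMBER_RE.sub(" ", text)
    # Collapse the whitespace left behind by the removals.
    return " ".join(text.split())


if __name__ == "__main__":
    sample = "Order 12345 shipped. Please consider the environment before printing."
    print(clean_field_value(sample))
```

Because an update processor runs before analysis, a cleanup like this would change the value that gets stored as well as the terms that get indexed.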
-- Jack Krupansky
-Original Message-
From: Mark , N
Sent: Tuesday, June 05, 2012 8:28 AM
To: solr-user@lucene.apache.org
Subject:
Is it possible to filter out numbers and disclaimers (repeated content)
while indexing to Solr?
This is all surplus information, and I do not want to index it.
I have also tried the boilerpipe algorithm to remove surplus
information from web pages such as navigational elements, templates, and