Is it possible to filter out numbers and disclaimers (repeated content) while indexing to Solr? All of this is surplus information that I do not want to index.
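One option for the numbers is to clean each field on the client side before the document is sent to Solr. Below is a minimal sketch under that assumption. `strip_numbers` and `NUMBER_RE` are hypothetical names, and the choice to remove only standalone numbers while keeping digits inside words such as "mp3" is also an assumption.

```python
import re

# Hypothetical pattern: matches standalone numbers such as "42", "3.14" or "1,000".
# Digits embedded in words (e.g. "mp3", "B2B") are left alone because of the
# (?<!\w) and (?!\w) word-boundary lookarounds.
NUMBER_RE = re.compile(r"(?<!\w)[-+]?\d+(?:[.,]\d+)*(?!\w)")


def strip_numbers(text: str) -> str:
    """Remove standalone numeric tokens and collapse leftover whitespace."""
    without_numbers = NUMBER_RE.sub(" ", text)
    return re.sub(r"\s+", " ", without_numbers).strip()


if __name__ == "__main__":
    sample = "Order 12345 shipped on 2024 with 3.5 kg of mp3 players"
    print(strip_numbers(sample))  # Order shipped on with kg of mp3 players
```

A similar regex could instead live in Solr's own analysis chain, for example through `solr.PatternReplaceCharFilterFactory`. That changes only the indexed terms, though: the stored field still keeps the original text, whereas client-side cleanup changes both.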
I have also tried the boilerpipe algorithm to remove surplus information from web pages, such as navigational elements, templates, and advertisements. I think it works well, but I would like to know whether I can also filter out "disclaimer" information, mainly in email texts.

--
Thanks,
Nipen Mark