Is it possible to filter out numbers and disclaimer ( repeated contents)
while indexing to SOLR?
These are all surplus information and do not want to index it

I have tried using boilerpipe algorithm as well to remove surplus
infromation from web pages such as navigational elements, templates, and
advertisements , I think it works well but looking forward to see If I
could filter out  "disclaimer" information too mainly in email texts.
-- 
Thanks,

*Nipen Mark *

Reply via email to