Hi all,

I am trying to index HTML documents using Solr, and I am having difficulty extracting certain parts of the main content of each document and storing them separately in other fields. I saw in the docs that this can be done using XPath, but in my case I need a regex match. To be more specific, I want to copy the content matched by a certain pattern into the title field. My first attempt was to define a custom field type with a PatternFilter and copy the content field to the title field, but this did not work. My next attempt was to add pattern and group attributes to the copyField tag, but this did not work either.
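To make the goal concrete, here is a minimal sketch in plain Java (not Solr configuration) of the kind of extraction I would like to happen at index time. The class name, the pattern, and the sample content are all hypothetical and only for illustration: the idea is that capture group 1 of a regex applied to the content field would end up in the title field.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TitleExtractor {
    // Hypothetical pattern: capture the text after "Title:" up to the end of that line.
    private static final Pattern TITLE_PATTERN = Pattern.compile("Title:\\s*(.+)");

    /** Returns capture group 1 of the first match, or null when nothing matches. */
    public static String extractTitle(String content) {
        Matcher m = TITLE_PATTERN.matcher(content);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        // Made-up stand-in for the text extracted from an HTML document.
        String content = "Some header text\nTitle: Filtering content upon indexing\nBody of the page";
        System.out.println(extractTitle(content));
    }
}
```

This is only what I would otherwise do outside Solr; what I am looking for is the equivalent inside the indexing chain.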
Is what I am trying to do possible? I would rather not resort to running grep outside Solr, as I am fairly sure Solr is capable of doing this.

Best regards,
Rafael Ribeiro

--
View this message in context: http://lucene.472066.n3.nabble.com/Filter-content-upon-indexing-tp3203946p3203946.html
Sent from the Solr - User mailing list archive at Nabble.com.