Hi all,

 I am trying to index HTML documents using Solr, and I am having difficulty
extracting certain parts of the main content of a document and storing them
separately in other fields. I saw in the docs that it is possible to
achieve this using XPath, but in my particular case I need to do a regex match.
 To be more specific, I want to copy the content matching a certain pattern
to the title field. My first attempt was to define a custom field type with a
PatternFilter and copy the content field to the title field, but this did not
work. My next attempt was to give the copyField tag pattern and
group attributes, but this did not work either.
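
 To make the goal concrete, this is roughly the extraction I have in mind,
sketched in plain Java outside Solr. The pattern, the class and method names,
and the sample HTML are made up for illustration only; the sketch shows the
regex match and capture group, not any Solr configuration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TitleExtractor {
    // Hypothetical pattern: capture the text inside the first <h1> element.
    private static final Pattern TITLE_PATTERN =
        Pattern.compile("<h1[^>]*>(.*?)</h1>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns capture group 1 of the first match, trimmed, or empty if nothing matches.
    public static Optional<String> extractTitle(String content) {
        Matcher m = TITLE_PATTERN.matcher(content);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Illustrative document content, not real data.
        String content = "<html><body><h1> Quarterly report </h1><p>Body text</p></body></html>";
        // The extracted value is what I would like to end up in the title field.
        System.out.println(extractTitle(content).orElse("(no match)"));
    }
}
```

 What I am looking for is a way to have Solr apply the equivalent of
extractTitle to the content field at index time and store the result in title.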

 Is it possible to do what I am trying to do? I would rather not resort to grep
outside Solr, as I am pretty sure Solr is capable of doing what I want...

best regards,
Rafael Ribeiro

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Filter-content-upon-indexing-tp3203946p3203946.html
Sent from the Solr - User mailing list archive at Nabble.com.
