Hi, 
I am using Solr to search my email data. My application is written in C++,
so I am using the cURL library (libcurl) to POST the data to Solr for
indexing. I post the data in XML format; some of the XML fields contain
plain text and others contain binary data. What should I do so that Solr
can index both types of data (plain text as well as binary) arriving in a
single XML file?

For reference, my XML file looks like this:

<add><doc>
  <field name="mailbox-id">1111</field>
  <field name="folder">INBOX</field>
  <field name="from">solr solr &lt;s...@abc.com&gt;</field>
  <field name="to">solr &lt;s...@abc.com&gt;</field>
  <field name="email-body">HI I AM EMAIL BODY\r\n\r\nTHANKS</field>
  <field name="email-attachment">Some binary data</field>
</doc></add>
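To illustrate how the plain-text part of such a document could be assembled on the C++ side, here is a minimal sketch. Values such as the from and to addresses contain characters like < and > that have to be escaped before they are placed inside an XML element. The helper names xmlEscape and buildAddDoc are hypothetical (they are not part of Solr or libcurl), and the sketch only covers text fields; it does not address how the binary attachment should be handled.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: escape the five characters that are special in XML.
std::string xmlEscape(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;        break;
        }
    }
    return out;
}

// Hypothetical helper: build an <add><doc>...</doc></add> update message
// from (field name, field value) pairs, escaping names and values.
std::string buildAddDoc(
        const std::vector<std::pair<std::string, std::string>>& fields) {
    std::string xml = "<add><doc>";
    for (const auto& f : fields) {
        xml += "<field name=\"" + xmlEscape(f.first) + "\">";
        xml += xmlEscape(f.second);
        xml += "</field>";
    }
    xml += "</doc></add>";
    return xml;
}
```

The resulting string is what would be handed to libcurl as the POST body, with a Content-Type header of text/xml.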

I tried to use ExtractingUpdateProcessorFactory, but it does not seem to be
available in Solr 4.5 (the version I am using), nor in any other released
version of Solr.

Also, I think I cannot use ExtractingRequestHandler for my problem, because
the document is in XML format and contains mixed data (text and binary).
Am I right? If yes, please suggest how I should proceed. If not, how can I
use ExtractingRequestHandler to extract text from only the binary fields?
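For context, one possible direction is sketched below, under assumptions: instead of a single XML file, each binary attachment would be posted separately to ExtractingRequestHandler, with the plain-text fields passed as literal.<fieldname> request parameters. The sketch only builds the request URL. The handler path /update/extract and the host and port used in the usage note are assumptions based on the example Solr configuration, and urlEncode and buildExtractUrl are hypothetical helper names. Whether this fits a setup with several attachments per email is not settled here.

```cpp
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: percent-encode everything except unreserved
// URL characters (letters, digits, '-', '_', '.', '~').
std::string urlEncode(const std::string& in) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : in) {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Hypothetical helper: append literal.<field>=<value> parameters to the
// URL of the extracting handler (handlerUrl is an assumption and must
// match the handler registered in solrconfig.xml).
std::string buildExtractUrl(
        const std::string& handlerUrl,
        const std::vector<std::pair<std::string, std::string>>& literals) {
    std::string url = handlerUrl;
    char sep = '?';
    for (const auto& kv : literals) {
        url += sep;
        url += "literal." + urlEncode(kv.first) + "=" + urlEncode(kv.second);
        sep = '&';
    }
    return url;
}
```

With an assumed handler URL of http://localhost:8983/solr/update/extract, the fields folder=INBOX and mailbox-id=1111 would be sent as ?literal.folder=INBOX&literal.mailbox-id=1111, and the raw attachment bytes would form the POST body.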

Any help is highly appreciated.


