: You can not do this in Solr, you cannot even send non-character code 
: points in the first place. For Apache Nutch we solved the problem by 

Strictly speaking: this is false.  You *can* send control characters to Solr 
as field values -- assuming your transport format allows it.

Example: javabin, as used to send SolrInputDocuments from a SolrJ client, 
doesn't care if the field value Strings have control characters in them.  
Likewise it should be possible to send many control characters when using 
JSON formatted updates -- let alone using something like DIH to pull blog 
data from a DB, or the Extracting Request Handler, which might find 
control characters in MS-Word or PDF docs.

In all of those cases, an UpdateProcessor to strip out the unwanted 
characters can/will work well.
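
To make the stripping step concrete, here is a minimal sketch in plain Java 
of the kind of cleanup such a processor could apply to each String field 
value.  The class and method names are hypothetical, and none of this is 
Solr's API: wiring it into an actual UpdateProcessor is left out, and the 
choice to keep tab, line feed and carriage return is an assumption.

```java
// Hypothetical helper, NOT part of Solr: the per-value cleanup that a custom
// UpdateProcessor could call for every String field value it sees.
final class ControlCharStripper {

    // Drops ISO control characters (U+0000..U+001F and U+007F..U+009F),
    // but keeps tab, line feed and carriage return (an assumption about
    // what counts as "unwanted").
    static String strip(String value) {
        StringBuilder out = new StringBuilder(value.length());
        value.codePoints()
             .filter(cp -> !Character.isISOControl(cp)
                           || cp == '\t' || cp == '\n' || cp == '\r')
             .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // U+0001 and U+000B (vertical tab) are removed, the newline stays.
        String dirty = "Hello\u0001 wor\u000Bld\n";
        System.out.print(strip(dirty));
    }
}
```

Working on code points rather than chars means supplementary characters 
pass through intact.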

In the specific case discussed in this thread (based on the eventual stack 
trace posted) an UpdateProcessor will *not* work, because the fundamental 
problem is that the control characters in question mean that the "XML-ish" 
looking bytes being sent to Solr by the client are not actually valid XML 
-- because by definition XML cannot contain those invalid 
control characters.  The request fails while it is being parsed, before 
any UpdateProcessor ever gets to see the document.
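
The XML point can be checked without Solr at all.  The sketch below uses 
only the JDK's own SAX parser; the class name, the helper method and the 
sample update message are made up for illustration, and the assumption is 
that the document is XML 1.0, where a character such as U+0001 is not 
allowed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: asks the JDK's XML parser whether a string is
// well-formed XML.  It does not talk to Solr.
final class XmlControlCharDemo {

    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // The parser rejected the input, e.g. an invalid XML character.
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O trouble, unrelated to the input.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String clean = "<add><doc><field name=\"body\">fine</field></doc></add>";
        // Same message, but with a raw U+0001 inside the field value.
        String dirty = "<add><doc><field name=\"body\">bad\u0001char</field></doc></add>";
        System.out.println(isWellFormed(clean));
        System.out.println(isWellFormed(dirty));
    }
}
```

So the cleanup has to happen on the client, before the XML is built, or 
the client has to use a transport format that tolerates those characters.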


-Hoss
http://www.lucidworks.com/
