How to retrieve field contents as UTF-8 from Solr-Index with SolrJ

Andreas Kahl Thu, 18 Oct 2012 07:54:35 -0700

Hello everyone, 

We are trying to implement a simple servlet that queries a Solr 3.5 index
with SolrJ. The query we send is an identifier, used to retrieve a
single record. From the result we extract one field to return. This
field contains an XML document with characters from several European and
Asian alphabets, so we need UTF-8.


Now we have the problem that the string returned by 
marcXml = results.get(0).getFirstValue("marcxml").toString();
is not valid UTF-8, so the resulting XML document is not well-formed.

Here is what we do in Java: 
<<
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("q", query.toString());
params.set("fl", "marcxml");
params.set("rows", "1");
try {
    QueryResponse result = server.query(params, SolrRequest.METHOD.POST);
    SolrDocumentList results = result.getResults();
    if (!results.isEmpty()) {
        marcXml = results.get(0).getFirstValue("marcxml").toString();
    }
} catch (Exception ex) {
    Logger.getLogger(MarcServer.class.getName()).log(Level.SEVERE, null, ex);
}
>>

Charset.defaultCharset() is "UTF-8" on both the querying machine and
the Solr server. We also tried both BinaryResponseParser and
XMLResponseParser when instantiating CommonsHttpSolrServer.
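
A check along the following lines might help narrow down where the encoding breaks. It is only a minimal sketch using JDK classes (assuming Java 7 or later for StandardCharsets); the class Utf8Check and its two methods are hypothetical names and are not part of SolrJ or of the servlet described here. The first method tests whether a byte sequence decodes strictly as UTF-8. The second is a heuristic for a string whose UTF-8 bytes were decoded as ISO-8859-1 somewhere along the way, which is one common cause of garbled non-ASCII characters, though not necessarily the cause in this case.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper; not part of SolrJ or of the servlet above.
final class Utf8Check {

    // Returns true if the bytes form a strictly valid UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Heuristic: returns true if the string looks like UTF-8 bytes that
    // were decoded as ISO-8859-1 at some point (every char fits in one
    // byte, and those bytes are themselves valid non-ASCII UTF-8).
    static boolean looksDoubleDecoded(String s) {
        boolean hasNonAscii = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0xFF) {
                return false; // cannot come from an ISO-8859-1 decode
            }
            if (c > 0x7F) {
                hasNonAscii = true;
            }
        }
        return hasNonAscii
                && isValidUtf8(s.getBytes(StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        String original = "Z\u00fcrich \u6771\u4eac"; // "Zürich 東京"
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        // Simulate a wrong decode step somewhere in the pipeline.
        String broken = new String(utf8, StandardCharsets.ISO_8859_1);

        System.out.println(isValidUtf8(utf8));            // true
        System.out.println(looksDoubleDecoded(original)); // false
        System.out.println(looksDoubleDecoded(broken));   // true
    }
}
```

Calling looksDoubleDecoded(marcXml) right after the SolrJ call, and isValidUtf8 on the bytes the servlet actually writes, would indicate whether the string is already damaged when it leaves SolrJ or only when it is written to the response.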

Does anyone have a solution to this? Is this related to
https://issues.apache.org/jira/browse/SOLR-2034 ? Is there
perhaps a workaround?

Regards
Andreas
