Greetings,

Is there a way to configure more graceful handling of field formatting
exceptions when indexing documents?

Currently, a field generated in some of the documents I am indexing is
supposed to be a float but sometimes slips through as an empty string.
(I know, fix the docs, but bad values do occasionally get through, and
it would be nice to handle them in a more forgiving manner.)

Here's an example of the exception - when this happens, the entire doc
is thrown out due to the one malformed field:
---snip---
ERROR org.apache.solr.core.SolrCore -
org.apache.solr.common.SolrException: ERROR: [doc=docidstr] Error
adding field 'f_floatfield'=''
...
Caused by: java.lang.NumberFormatException: empty String

00:56:46,288 [SI] WARN  com.company.IndexerThread - BAD DOC:
a82a2f6a6a42ad3c98a05ddb3f2c382c
01:02:12,713 [SI] ERROR org.apache.solr.core.SolrCore -
org.apache.solr.common.SolrException: ERROR:
[doc=6ff90020f9ec0f6dd623e9879c3e024d] Error adding field
'f_afloatfield'=''
        at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:333)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:157)
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
        at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:142)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:121)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:106)
        at com.company.IndexerThread.run(IndexerThread.java:55)
        at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.NumberFormatException: empty String
        at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1011)
        at java.lang.Float.parseFloat(Float.java:452)
        at org.apache.solr.schema.TrieField.createField(TrieField.java:410)
        at org.apache.solr.schema.SchemaField.createField(SchemaField.java:103)
        at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:203)
        at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:286)
        ... 12 more

01:02:12,713 [SI] WARN  com.company.IndexerThread - BAD DOC:
6ff90020f9ec0f6dd623e9879c3e024d
---snip---

For this situation, I think it would be much better to just ignore the
malformed field and keep the rest of the doc. Is there any way to
configure or enable this behavior?
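
To make the desired behavior concrete, here is a rough client-side
sketch of "drop the bad field, keep the doc". It is not a Solr setting
and uses no Solr API: FloatFieldSanitizer, dropMalformedFloats and the
idea of holding the fields in a plain Map are hypothetical names and
assumptions made up for illustration. It relies only on the fact that
Float.parseFloat throws NumberFormatException for an empty string, as
in the trace above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not part of Solr): filters out float fields whose
// values would fail Float.parseFloat, so the rest of the doc can be kept.
final class FloatFieldSanitizer {
    private FloatFieldSanitizer() {}

    // Returns a copy of 'fields' without the entries that are listed in
    // 'floatFields' but whose value cannot be parsed as a float.
    static Map<String, Object> dropMalformedFloats(Map<String, Object> fields,
                                                   Set<String> floatFields) {
        Map<String, Object> cleaned = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            if (floatFields.contains(e.getKey()) && !isParseableFloat(e.getValue())) {
                continue; // skip only the malformed field
            }
            cleaned.put(e.getKey(), e.getValue());
        }
        return cleaned;
    }

    // True for numeric objects and for strings that Float.parseFloat accepts.
    static boolean isParseableFloat(Object value) {
        if (value == null) {
            return false;
        }
        if (value instanceof Number) {
            return true;
        }
        try {
            Float.parseFloat(value.toString());
            return true;
        } catch (NumberFormatException ex) {
            return false; // e.g. "" raises "empty String"
        }
    }
}
```

The cleaned map would then be copied into the document that gets passed
to SolrServer.add. That works, but it means every indexing client has
to know which fields are floats, which is why a server-side option
would be preferable.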

Thanks,
     Aaron
