So I just ran into this bug:
    https://issues.apache.org/jira/browse/SOLR-470

and read about this related one:
    https://issues.apache.org/jira/browse/SOLR-544

Here is the relevant trace:

Apr 22, 2008 10:59:01 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.RuntimeException: java.text.ParseException: Unparseable date: "2008-04-03T22:42:13Z"
        at org.apache.solr.schema.DateField.toObject(DateField.java:173)
        at org.apache.solr.schema.DateField.toObject(DateField.java:83)
        at org.apache.solr.update.DocumentBuilder.loadStoredFields(DocumentBuilder.java:285)
...
Caused by: java.text.ParseException: Unparseable date: "2008-04-03T22:42:1
        at java.text.DateFormat.parse(Unknown Source)

The root cause (I believe; I am going to confirm tonight) is that I am loading 
multiple index files into this field in the schema:
   <field name="timestamp_created" type="date" indexed="true" stored="true" required="true" multiValued="false" default="NOW" />

Here is my typedef for 'date':
    <fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>


What I came to realize is that most of my index files specify this column value 
consistently, but one of my files does not contain the column at all. Because I 
declared a default value, I am relying on the Solr default for NOW being in the 
same format (no millis, .0, .00, .000, etc.) as the values I pass in my feed. 
As you can see from the exception, my feed does not contain any millis, which 
is a valid format according to SOLR-544 and the documentation I've read.
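
To illustrate the kind of mismatch I suspect, here is a minimal standalone 
sketch. It does not use any Solr code. The class name and the pattern string 
are hypothetical and chosen by me for illustration; I have not confirmed which 
pattern DateField actually uses. It only shows that a fixed java.text pattern 
that expects milliseconds rejects the value from my feed:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

public class MillisMismatchDemo {
    // Hypothetical pattern for illustration only; NOT taken from the Solr source.
    static final String WITH_MILLIS = "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'";

    /** Returns true if the value parses under the given fixed pattern. */
    static boolean parses(String pattern, String value) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        fmt.setLenient(false);
        try {
            fmt.parse(value);
            return true;
        } catch (ParseException e) {
            // Same exception type as in the trace above.
            return false;
        }
    }

    public static void main(String[] args) {
        // Value from my feed, no millis: rejected by the millis pattern.
        System.out.println(parses(WITH_MILLIS, "2008-04-03T22:42:13Z"));     // false
        // Same instant written with millis: accepted.
        System.out.println(parses(WITH_MILLIS, "2008-04-03T22:42:13.000Z")); // true
    }
}
```

If something along these lines is happening inside DateField, then mixing 
values with and without millis in one field would explain the trace.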

Now, finally, my problem. The format for NOW doesn't seem to be documented, so 
I have no idea what I need to 'match' (nothing outside these two bugs even says 
that matching is necessary) in order to use the default value feature together 
with data from my streams. I can tell that it isn't the 'no millis' form, since 
a discrepancy is triggering this bug.

Solutions?

A) Should I create a format normalizer and configure it into my typedef for 
'date', so that I am agnostic to these differences in the input and can ensure 
the indexed format is consistent? I believe this would be an <analyzer 
type="index"><filter .../></analyzer>. I'm not concerned about the presence or 
absence of millis in the output. Would this approach work? Putting a filter in 
the fieldType for this purpose feels like a hack.

B) Should I remove the default value and just ensure all my streams specify 
this value consistently, so the bug is never triggered? It seems to me that 
Solr should be robust in this respect, but reading SOLR-544 I can see that not 
everyone holds this opinion.

C) Should I apply one of the existing SOLR-470 patch files and move on?

D) Should I take a stab at https://issues.apache.org/jira/browse/SOLR-440 as an 
alternative 'class' for my 'date' type?
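
For reference, a variation on option A would be to normalize the values on the 
client side before they reach Solr. Below is a minimal sketch of that idea. 
The class name DateNormalizer and its normalize method are hypothetical (they 
are not part of Solr), and it assumes the three-digit millisecond form is the 
one to standardize on, which is exactly the part I have not confirmed:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateNormalizer {
    // Accepts yyyy-MM-ddTHH:mm:ss with an optional 1-3 digit fraction, ending in Z.
    // Only the shape is checked; field ranges (month 13, hour 25, ...) are not.
    private static final Pattern ISO_UTC = Pattern.compile(
        "(\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2})(?:\\.(\\d{1,3}))?Z");

    /** Rewrites a value so it always carries exactly three fractional digits. */
    static String normalize(String value) {
        Matcher m = ISO_UTC.matcher(value);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected date format: " + value);
        }
        String fraction = m.group(2) == null ? "" : m.group(2);
        StringBuilder millis = new StringBuilder(fraction);
        while (millis.length() < 3) {
            millis.append('0'); // right-pad: "" -> "000", "5" -> "500"
        }
        return m.group(1) + "." + millis + "Z";
    }

    public static void main(String[] args) {
        System.out.println(normalize("2008-04-03T22:42:13Z"));   // 2008-04-03T22:42:13.000Z
        System.out.println(normalize("2008-04-03T22:42:13.0Z")); // 2008-04-03T22:42:13.000Z
    }
}
```

This would only cover values I send myself; whatever Solr generates for the 
NOW default would still be in whatever format Solr chooses.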

Thanks,

Brian


