Ok, I thought the quickest thing to try out would be (B) so now all of my feeds 
have the same format and I have removed the default value "NOW" from my 
schema.xml file.

    <field name="timestamp_created" type="date" indexed="true" stored="true" required="true" multiValued="false" />
... and ...
    <fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>

I rebuilt my index with consistent date formats in all my files but my 
exception remains unchanged.

Apr 24, 2008 9:11:26 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.RuntimeException: java.text.ParseException: Unparseable date: "2008-04-24T09:03:33Z"
        at org.apache.solr.schema.DateField.toObject(DateField.java:173)
        at org.apache.solr.schema.DateField.toObject(DateField.java:83)
        at org.apache.solr.update.DocumentBuilder.loadStoredFields(DocumentBuilder.java:285)
...
Caused by: java.text.ParseException: Unparseable date: "2008-04-24T09:03:33Z"
        at java.text.DateFormat.parse(Unknown Source)
        at org.apache.solr.schema.DateField.toObject(DateField.java:170)
        ... 27 more

Here's the output from Luke for the only date in my schema... timestamp_created
Field: timestamp_created
Field Type: date
Properties:  Indexed, Stored, Omit Norms, Sort Missing Last
Schema:  Indexed, Stored, Omit Norms, Sort Missing Last
Index:  Indexed, Stored, Omit Norms
Index Analyzer: org.apache.solr.schema.FieldType$DefaultAnalyzer 
Query Analyzer: org.apache.solr.schema.FieldType$DefaultAnalyzer 
Docs:  146704
Distinct:  33


term                    frequency
2008-04-24T09:03:53Z    11076
2008-04-24T09:03:55Z    10036
2008-04-24T09:03:52Z    9400
2008-04-24T09:03:51Z    8855
2008-04-24T09:03:54Z    8763
2008-04-24T09:03:33Z    6577
2008-04-24T09:03:34Z    5783
2008-04-24T09:03:36Z    5665
2008-04-24T09:03:35Z    5507
2008-04-24T09:03:38Z    5498
2008-04-24T09:03:39Z    5496
2008-04-24T09:03:37Z    5407
2008-04-24T09:04:02Z    4509
2008-04-24T09:03:46Z    4366
2008-04-24T09:04:07Z    4317
2008-04-24T09:04:19Z    4131
2008-04-24T09:04:17Z    4021
2008-04-24T09:04:05Z    4020
2008-04-24T09:04:15Z    3898
2008-04-24T09:04:18Z    3894
2008-04-24T09:04:16Z    3497
2008-04-24T09:04:06Z    3482
2008-04-24T09:04:08Z    3442
2008-04-24T09:04:01Z    3196
2008-04-24T09:04:03Z    3182
2008-04-24T09:04:04Z    3179
2008-04-24T09:03:45Z    2936
2008-04-24T09:03:32Z    2412
2008-04-24T09:03:56Z    1870
2008-04-24T09:04:00Z    433
2008-04-24T09:04:14Z    430
2008-04-24T09:04:20Z    369
2008-04-24T09:03:40Z    353

... and for the field ...
Field Type: date
Fields: timestamp_created  <-- only date field in the schema
Tokenized:  false
Class Name:  org.apache.solr.schema.DateField
Index Analyzer: org.apache.solr.schema.FieldType$DefaultAnalyzer 
Query Analyzer: org.apache.solr.schema.FieldType$DefaultAnalyzer 

This now seems to be something different than SOLR-470 and SOLR-544 since the 
format seems to be accepted at indexing, and is consistent in the index, but is 
still not accepted at query time.
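
A minimal Java sketch of how a value can be written without complaint and still 
fail when read back. It assumes, and this is not confirmed from the DateField 
source, that the read-side parser uses a pattern that requires milliseconds. The 
class name and both patterns below are illustrative only, not Solr's actual code.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class; not part of Solr.
class DateParseSketch {

    // Build a strict UTC formatter for the given pattern.
    static SimpleDateFormat utc(String pattern) {
        SimpleDateFormat f = new SimpleDateFormat(pattern, Locale.ROOT);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        f.setLenient(false);
        return f;
    }

    // Returns true if the value parses with the pattern, false on ParseException.
    static boolean parses(String pattern, String value) {
        try {
            utc(pattern).parse(value);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String value = "2008-04-24T09:03:33Z";
        // Assumed pattern that requires ".SSS": the literal '.' is missing, so parsing fails.
        System.out.println(parses("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'", value)); // false
        // Pattern without milliseconds: the same string parses.
        System.out.println(parses("yyyy-MM-dd'T'HH:mm:ss'Z'", value));     // true
    }
}
```

If something like this is what happens in toObject, the stored string would be 
valid as indexed but rejected by the stricter parser on the way back out.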

Anyone have a suggestion?

Thanks,

Brian Johnson

----- Original Message ----
From: Brian Johnson <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Wednesday, April 23, 2008 11:23:54 AM
Subject: SOLR-470 & default value in schema with NOW

So I just ran into this bug:
    https://issues.apache.org/jira/browse/SOLR-470

and read about this related one:
    https://issues.apache.org/jira/browse/SOLR-544

Here is the relevant trace:

Apr 22, 2008 10:59:01 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.RuntimeException: java.text.ParseException: Unparseable date: "2008-04-03T22:42:13Z"
        at org.apache.solr.schema.DateField.toObject(DateField.java:173)
        at org.apache.solr.schema.DateField.toObject(DateField.java:83)
        at org.apache.solr.update.DocumentBuilder.loadStoredFields(DocumentBuilder.java:285)
...
Caused by: java.text.ParseException: Unparseable date: "2008-04-03T22:42:1
        at java.text.DateFormat.parse(Unknown Source)

The root cause (I believe, am going to confirm tonight) is that I have multiple 
index files I'm uploading into this column in the schema:
    <field name="timestamp_created" type="date" indexed="true" stored="true" required="true" multiValued="false" default="NOW" />

Here is my typedef for 'date':
    <fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>


What I came to realize is that my index files contain this column value 
consistently specified, but one of my files does not contain the column at all. 
Due to my indication of a default value, I am reliant on the SOLR default for 
NOW being in the same format (no millis, .0, .00, .000, etc) as I have passed 
in my feed. As you can see from the exception, my feed does not contain any 
millis which is a valid format according to 544 and the documentation I've 
read. 

Now finally, my problem. The format for NOW doesn't seem to be documented so I 
have no idea what I need to 'match' (or even that matching is necessary from 
the documentation outside these 2 bugs) in order to take advantage of the 
default value feature and mix that with data from my streams. I can see from 
here that it isn't the 'no millis' form since a discrepancy is triggering this 
bug. 

Solutions?

A) Should I create a format normalizer and configure that into my typedef for 
'date' so that I am agnostic of these differences in terms of input and ensure 
the indexed format is consistent? I believe this would be an 
<analyzer type="index"><filter .../></analyzer>. I'm not concerned about the 
presence or absence of millis on the output. Would this approach work? Based on 
the presence of the filter in the fieldType, it feels like a hack.

B) Should I remove the default value and just ensure all my streams have this 
value specified consistently and not trigger the bug? It seems to me that SOLR 
should be robust in this respect, but reading SOLR-544 I can see that this 
isn't an opinion that is held by all.

C) Should I apply one of the existing SOLR-470 patch files and move on?

D) Should I take a stab at https://issues.apache.org/jira/browse/SOLR-440 as an 
alternative 'class' for my 'date' type?
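
For options (A) and (B), the normalization could also be done on the feed side 
before anything is posted to Solr. The following is only a sketch under that 
assumption: FeedDateNormalizer is a hypothetical helper, not a Solr class, and 
the choice of three-digit millis as the target form is arbitrary (the point is 
only that every value ends up in one form).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical feed-side helper; not a Solr API.
class FeedDateNormalizer {

    // Matches UTC timestamps with zero to three fractional-second digits,
    // e.g. 2008-04-03T22:42:13Z, ...13.0Z, ...13.00Z, ...13.000Z
    private static final Pattern ISO_UTC = Pattern.compile(
        "(\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2})(?:\\.(\\d{1,3}))?Z");

    // Rewrites any accepted variant to the single form yyyy-MM-ddTHH:mm:ss.SSSZ.
    static String normalize(String value) {
        Matcher m = ISO_UTC.matcher(value);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected date format: " + value);
        }
        StringBuilder millis = new StringBuilder(m.group(2) == null ? "" : m.group(2));
        // Right-pad the fraction with zeros so ".5" becomes ".500".
        while (millis.length() < 3) {
            millis.append('0');
        }
        return m.group(1) + "." + millis + "Z";
    }

    public static void main(String[] args) {
        System.out.println(normalize("2008-04-03T22:42:13Z"));    // 2008-04-03T22:42:13.000Z
        System.out.println(normalize("2008-04-03T22:42:13.0Z"));  // 2008-04-03T22:42:13.000Z
    }
}
```

This does not address a default="NOW" value generated inside Solr, which never 
passes through the feed.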

Thanks,

Brian





