Looking into the recent Jenkins jobs, one of the failing tests reports:

java.time.format.DateTimeParseException: Text 'Thu Nov 13 04:35:51 AKST 2008' could not be parsed at index 20

The root cause of this failing test seems to be related to JDK 23 (and
Windows?) and has some history from JDK 8/9 according to our tests. I
believe this is caused by some changes in the newer JDK that no longer
parse time zones as expected.

The documentation of our ParseDateFieldUpdateProcessorFactory says:

<p>A default time zone name or offset may optionally be specified for those
dates that don't include an explicit zone/offset. NOTE: three-letter zone
designations like "EST" are not parseable (with the single exception of
"UTC"), because they are ambiguous. If no default time zone is specified,
UTC will be used. See <a
href="http://en.wikipedia.org/wiki/List_of_tz_database_time_zones"
>Wikipedia's list of TZ database time zone names</a>.


So officially, if I interpret it correctly, we do not support time zone
abbreviations, but we still have tests that check the JDK behavior for
unsupported abbreviations like AKST, and we also support the "z" in the
date format pattern for "UTC".
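
For illustration, here is a minimal sketch of the "z" behavior described above, using plain java.time rather than the processor itself. The class name ZonePatternSketch, the pattern, and the use of Locale.ENGLISH are my own assumptions for the example. Whether an abbreviation like AKST parses depends on the JDK and its locale data, so the sketch only reports that result instead of asserting it.

```java
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

// Hypothetical demo class, not part of Solr.
class ZonePatternSketch {

  // "z" is the zone-text pattern letter; Locale.ENGLISH is an assumption.
  static final DateTimeFormatter FORMAT =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss z", Locale.ENGLISH);

  // Parses the text and returns the corresponding instant.
  static Instant parseInstant(String text) {
    return ZonedDateTime.parse(text, FORMAT).toInstant();
  }

  // Returns true if the text can be parsed with FORMAT on the running JDK.
  static boolean isParseable(String text) {
    try {
      parseInstant(text);
      return true;
    } catch (DateTimeParseException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // "UTC" is the documented exception and is expected to parse.
    System.out.println(parseInstant("2008-11-13 04:35:51 UTC"));

    // Abbreviations are JDK-dependent, so only report what happens here.
    System.out.println("AKST parseable: " + isParseable("2008-11-13 04:35:51 AKST"));
  }
}
```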

To address this "limitation", instead of removing the tests and the support
for time zones, I would like to propose a new feature, allowing the users
to provide time zone mappings to the
solr.ParseDateFieldUpdateProcessorFactory processor.

A configuration could look something like this:

  <updateRequestProcessorChain name="parse-date-patterns-timeZoneCompat-config">
    <processor class="solr.ParseDateFieldUpdateProcessorFactory">
      <arr name="format">
        <str>yyyy-MM-dd['T'[HH:mm[:ss[.SSS]][z</str>
        <str>yyyy-MM-dd['T'[HH:mm[:ss[,SSS]][z</str>
        <str>yyyy-MM-dd HH:mm[:ss[.SSS]][z</str>
        <str>yyyy-MM-dd HH:mm[:ss[,SSS]][z</str>
        <str>[EEE, ]dd MMM yyyy HH:mm[:ss] z</str>
        <str>EEEE, dd-MMM-yy HH:mm:ss z</str>
        <str>EEE MMM ppd HH:mm:ss [z ]yyyy</str>
      </arr>
      <arr name="zoneMappings">
        <!-- Alaska Standard Time / Alaska Daylight Time -->
        <zoneMapping abbreviation="AKST" offset="-09:00"/>
        <zoneMapping abbreviation="AKDT" offset="-08:00"/>
        <!-- Irish Standard Time -->
        <zoneMapping abbreviation="IST" offset="+01:00"/>
        <!-- Japan Standard Time -->
        <zoneMapping abbreviation="JST" offset="+09:00"/>
      </arr>
    </processor>
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>

The solution would extend the functionality with an optional parameter
"zoneMappings" that accepts a list of key-value pairs, each mapping an
abbreviation to its equivalent UTC offset. These values can then be used to
replace occurrences in the datetime string before it is passed to
DateTimeFormatter, so that otherwise unsupported time zone abbreviations
can be parsed successfully.
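
To make the idea concrete, here is a rough sketch of the replacement step, assuming the mappings arrive as a simple map from abbreviation to offset. The names ZoneMappingSketch, applyZoneMappings and parse are hypothetical and not existing Solr code. The sketch uses an explicit offset pattern ("xxx") where the configured formats have "z", only to keep the example independent of how "z" treats numeric offsets; Locale.ENGLISH is likewise an assumption.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the proposed pre-processing step, not Solr code.
class ZoneMappingSketch {

  // Same shape as the last configured format, but with an explicit offset
  // ("xxx") in place of the zone text ("z"). This is an assumption.
  static final DateTimeFormatter FORMAT =
      DateTimeFormatter.ofPattern("EEE MMM ppd HH:mm:ss [xxx ]yyyy", Locale.ENGLISH);

  // Replaces each whole-word abbreviation with its configured offset.
  static String applyZoneMappings(String input, Map<String, String> zoneMappings) {
    String result = input;
    for (Map.Entry<String, String> mapping : zoneMappings.entrySet()) {
      Pattern wholeWord = Pattern.compile("\\b" + Pattern.quote(mapping.getKey()) + "\\b");
      result = wholeWord.matcher(result)
          .replaceAll(Matcher.quoteReplacement(mapping.getValue()));
    }
    return result;
  }

  // Parses an already normalized datetime string.
  static ZonedDateTime parse(String normalized) {
    return ZonedDateTime.parse(normalized, FORMAT);
  }

  public static void main(String[] args) {
    Map<String, String> zoneMappings = new LinkedHashMap<>();
    zoneMappings.put("AKST", "-09:00");
    zoneMappings.put("AKDT", "-08:00");

    String normalized = applyZoneMappings("Thu Nov 13 04:35:51 AKST 2008", zoneMappings);
    System.out.println(normalized);                    // Thu Nov 13 04:35:51 -09:00 2008
    System.out.println(parse(normalized).toInstant()); // 2008-11-13T13:35:51Z
  }
}
```

A real implementation would probably restrict the replacement to the position where a zone is expected, since a plain whole-word replace could also rewrite an abbreviation that happens to appear elsewhere in the value.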

This way users can be explicit about how each time zone abbreviation is
interpreted and provide their own mappings directly in the processor
configuration. As a current workaround, I believe users have to run
another processor that maps the time zones before the datetime is handled
by the default processor. If that is the recommended approach, support for
time zone abbreviations should instead be dropped completely (probably by
simply removing the failing tests related to them).

Since I have not actively worked on processor implementations, I am not
sure if this proposal makes sense. What are your thoughts?
