My scenario is something like this:

I have a students database. I want to query all the students who were either
`absent` or `present` during a particular `date-range`.

For example:

Student "X" was `absent` between dates:

     Jan 1, 2015 and Jan 15, 2015
     Feb 13, 2015 and Feb 16, 2015
     March 19, 2015 and March 25, 2015

Also "X" was `present` between dates:

     Jan 25, 2015 and Jan 30, 2015
     Feb 1, 2015 and Feb 12, 2015

(On the other days, either the school was on holiday or the teacher was
lazy or forgot to take attendance ;)
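
To make the intent concrete, here is a minimal Python sketch of the matching rule I have in mind. It assumes that "absent during a date range" means that at least one of the student's absence intervals overlaps the queried range; the function name `overlaps_any` is just something I made up for illustration.

```python
from datetime import date

# Absence intervals for student "X", taken from the example above.
absences = [
    (date(2015, 1, 1), date(2015, 1, 15)),
    (date(2015, 2, 13), date(2015, 2, 16)),
    (date(2015, 3, 19), date(2015, 3, 25)),
]


def overlaps_any(intervals, lower, upper):
    """Return True if any (start, end) interval intersects [lower, upper].

    Two closed intervals overlap when each one starts no later than the
    other one ends. The start and end of a pair must be tested together,
    which is why the pairing has to survive indexing.
    """
    return any(start <= upper and end >= lower for start, end in intervals)
```

With this rule, a query for Jan 10 to Jan 20 should match "X" (it overlaps the first absence), while a query for Jan 16 to Feb 12 should not.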

If the date range were a single-valued field, then this approach would
work:
http://stackoverflow.com/questions/25246204/solr-query-for-documents-whose-from-to-date-range-contains-the-user-input
However, I have multiple date ranges for each student, so it would not work
for my use-case.

Lucene 5.0 has support for `DateRangeField`
(http://lucene.apache.org/solr/5_0_0/solr-core/index.html?org/apache/solr/schema/DateRangeField.html),
which is perfect for my use-case, but I cannot upgrade to 5.0 yet; I am on
Lucene 4.1.0. David Smiley had mentioned that it would be ported to 4.x, but
I guess that never happened (https://issues.apache.org/jira/browse/SOLR-6103).
I could try porting this patch myself, but I would like to know what that
would take, and to hear opinions on it.

So basically, I need to maintain the relationship between the start and end
dates for each of the `state`s (absence or presence). I thought I would
need to index the fields as pairs, as mentioned here:
http://grokbase.com/t/lucene/solr-user/128r96vwz6/how-do-i-represent-a-group-of-customer-key-value-pairs

I guess my schema would look like:

    <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/>

    <field name="state" type="string" indexed="true" stored="true" multiValued="true"/>
    <dynamicField name="presenceStartTime_*" type="tdate" indexed="true" stored="true"/>
    <dynamicField name="presenceEndTime_*" type="tdate" indexed="true" stored="true"/>
    <dynamicField name="absenceStartTime_*" type="tdate" indexed="true" stored="true"/>
    <dynamicField name="absenceEndTime_*" type="tdate" indexed="true" stored="true"/>

**Question #1:** Does this look correct?

**Question #2:** What are the ramifications if I use `tlong` instead of
`tdate`? My `tlong` type looks like this:

    <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
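
One ramification I can already see: as far as I understand, a `tlong` field stores plain numbers, so I would have to convert every date to a number myself, both when indexing and when querying. The sketch below assumes I would store epoch milliseconds in UTC; the helper name `to_epoch_millis` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(iso_utc):
    """Convert a UTC timestamp such as '2015-01-01T00:00:00Z' to epoch milliseconds.

    Integer timedelta division avoids any floating-point rounding.
    """
    dt = datetime.strptime(iso_utc, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return (dt - EPOCH) // timedelta(milliseconds=1)
```

For example, `to_epoch_millis("2015-01-01T00:00:00Z")` gives `1420070400000`, which is the value I would have to put into both the documents and the range queries.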

**Question #3:** So in this case, for the query "get all the students who
were absent during a given date range", would the query look something
like this?

    (state: absent) AND
    (absenceStartTime_1: givenLowerBoundDate) AND
    (absenceStartTime_2: givenLowerBoundDate) AND
    (absenceStartTime_3: givenLowerBoundDate) AND
    (absenceEndTime_1: givenUpperBoundDate) AND
    (absenceEndTime_2: givenUpperBoundDate) AND
    (absenceEndTime_3: givenUpperBoundDate)


This would work only if I knew beforehand that the student had exactly 3
absence ranges, and there is no way to query all dynamic fields with
wildcards, according to
http://stackoverflow.com/questions/6213184/solr-search-query-for-dynamic-fields-indexed
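
If I did go down this road, I suppose the client would have to generate the query for some assumed maximum number of pairs. Here is a rough Python sketch of what I mean. It assumes the fields are suffixed `_1` to `_N`, that I know an upper bound `max_pairs`, and that a match means an absence interval overlaps the given range (start and end tested together per pair, pairs combined with OR). The function name is hypothetical.

```python
def build_absence_query(lower, upper, max_pairs):
    """Build a Solr query string matching students with any absence pair
    that overlaps [lower, upper].

    `lower` and `upper` are Solr date strings, e.g. '2015-01-01T00:00:00Z'.
    """
    clauses = []
    for i in range(1, max_pairs + 1):
        # Pair i overlaps the range if it starts on or before `upper`
        # and ends on or after `lower`.
        clauses.append(
            "(absenceStartTime_{0}:[* TO {1}] AND absenceEndTime_{0}:[{2} TO *])".format(
                i, upper, lower
            )
        )
    return "state:absent AND (" + " OR ".join(clauses) + ")"
```

The obvious drawback is that the query grows with `max_pairs`, and any student with more pairs than that would be silently missed.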

**Question #4:** The workaround mentioned in one of the answers to that
question did not look terrible, but it seemed a bit complicated. Is there a
better alternative for solving this problem in Solr?

Of course, I would be very interested in any other, better approaches.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-How-to-index-range-pair-fields-tp4224369.html
Sent from the Solr - User mailing list archive at Nabble.com.
