Hey SOLR folks -- There's too much info for me to digest, so please
remove me from the email threads.
However, if we can build you a forum, bulletin board or other web-based
tool, please let us know. For that matter, we would be happy to build
you a new website.
Bill O'Connor is our CTO and the Drupal.org SOLR Redesign Lead. So we
love SOLR! Let us know how we can support your efforts.
Susan Rust
VP of Client Services
If you wish to travel quickly, go alone
If you wish to travel far, go together
------------------------------------------------
Achieve Internet
1767 Grand Avenue, Suite 2
San Diego, CA 92109
800-618-8777 x106
858-453-5760 x106
Susan-Rust (skype)
@Susan_Rust (twitter)
@Achieveinternet (twitter)
@drupalsandiego (San Diego Drupal Users' Group Twitter)
On Jun 23, 2010, at 1:52 AM, Mark Allan wrote:
Cheers, Geert-Jan, that's very helpful.
We won't always be searching with dates and we wouldn't want
duplicates to show up in the results, so your second suggestion
looks like a good workaround if I can't solve the actual problem. I
didn't know about FieldCollapsing, so I'll definitely keep it in mind.
Thanks
Mark
On 22 Jun 2010, at 3:44 pm, Geert-Jan Brits wrote:
Perhaps my answer is useless, because I don't have an answer to your
direct question, but:
You *might* want to consider whether your concept of a Solr document is
at the right level of granularity, i.e.:
the problem you posted could be tackled (afaik) by defining a document
as a 'sub-event' with only one daterange.
So each event-doc you have now is replaced by several sub-event docs in
this proposed setup.
Additionally, each sub-event doc gets an extra field 'parent-eventid',
which maps to something like an event-id (which you're probably using).
So several sub-event docs can point to the same event-id.
Lastly, all sub-event docs belonging to a particular event carry all
the other fields that you may have stored in that particular event-doc.
Now you can query for events based on date ranges like you envisioned,
but instead of returning events you return sub-event docs. However,
since all data of the original event (except the multiple dateranges)
is available in the sub-event doc, this shouldn't really bother the
client. If you need to display all dates of an event (the only info
missing from the returned Solr doc), you could easily store them in an
RDB and fetch them using the defined parent-eventid.
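To make the sub-event idea concrete, here is a minimal Python sketch of
the splitting step, done client-side before indexing. The field names
other than 'daterange' and 'parent-eventid' (for example "id" and
"title"), the id scheme and the helper name split_into_sub_events are
all assumptions made for illustration, not something from the schema
discussed in this thread.

```python
from copy import deepcopy


def split_into_sub_events(event):
    """Return one sub-event doc per date range of `event`.

    Every sub-event copies the parent's other fields and records the
    parent's id in 'parent-eventid' (hypothetical field layout).
    """
    shared = {k: deepcopy(v) for k, v in event.items()
              if k not in ("id", "daterange")}
    sub_events = []
    for n, date_range in enumerate(event["daterange"]):
        doc = deepcopy(shared)
        doc["id"] = f"{event['id']}-{n}"       # assumed id scheme
        doc["parent-eventid"] = event["id"]
        doc["daterange"] = date_range          # exactly one range per doc
        sub_events.append(doc)
    return sub_events


event = {
    "id": "broadcast-42",
    "title": "Evening news",
    "daterange": ["19820402,19820614", "1990,2000"],
}
sub_events = split_into_sub_events(event)
```

Each resulting doc then has a single-valued daterange, so the start and
end sub-fields can no longer be mixed up across ranges.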
The only caveat I see is that multiple sub-events with the same
'parent-eventid' might get returned for a particular query.
This, however, depends on the type of queries you envision, i.e.:
1) If you always issue queries with date filters, and *assuming* that
sub-events of a particular event don't overlap in time, you will never
get multiple sub-events returned.
2) If 1) doesn't hold, and assuming you *do* mind multiple sub-events of
the same actual event, you could try to use Field Collapsing on
'parent-eventid' to return only the first sub-event per parent-eventid
that matches the rest of your query. (Note, however, that Field
Collapsing is a patch at the moment:
http://wiki.apache.org/solr/FieldCollapsing)
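As a rough illustration of what collapsing on 'parent-eventid' would
achieve, the Python sketch below keeps only the first hit per parent in
an already-ranked result list. This is a client-side emulation under
assumed document shapes, not the Field Collapsing patch itself; the
function name collapse_on_parent and the sample ids are hypothetical.

```python
def collapse_on_parent(hits, key="parent-eventid"):
    """Keep the first hit for each distinct parent id, preserving order."""
    seen = set()
    collapsed = []
    for doc in hits:
        parent = doc[key]
        if parent not in seen:
            seen.add(parent)
            collapsed.append(doc)
    return collapsed


# Hypothetical ranked hits: two sub-events of e1 and one of e2.
hits = [
    {"id": "e1-0", "parent-eventid": "e1"},
    {"id": "e2-0", "parent-eventid": "e2"},
    {"id": "e1-1", "parent-eventid": "e1"},
]
collapsed = collapse_on_parent(hits)
```

Doing this inside Solr is preferable where possible, since collapsing
after the fact throws off paging and hit counts.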
Not sure if this helped you at all, but at the very least it was a nice
conceptual exercise ;-)
Cheers,
Geert-Jan
2010/6/22 Mark Allan <mark.al...@ed.ac.uk>
Hi all,
Firstly, I apologise for the length of this email but I need to describe
properly what I'm doing before I get to the problem!
I'm working on a project just now which requires the ability to store
and search on temporal coverage data, i.e. a field which specifies a
date range during which a certain event took place.
I hunted around for a few days and couldn't find anything which seemed
to fit, so I had a go at writing my own field type based on
solr.PointType. It's used as follows:
schema.xml
<fieldType name="temporal" class="solr.TemporalCoverage"
           dimension="2" subFieldSuffix="_i"/>
<field name="daterange" type="temporal" indexed="true" stored="true"
       multiValued="true"/>
data.xml
<add>
<doc>
...
<field name="daterange">1940,1945</field>
</doc>
</add>
Internally, this gets stored as:
<arr name="daterange"><str>1940,1945</str></arr>
<int name="daterange_0_i">19400000</int>
<int name="daterange_1_i">19450000</int>
In due course, I'll declare the subfields as a proper date type, but in
the meantime, this works absolutely fine. I can search for an
individual date and Solr will check
(queryDate > daterange_0 AND queryDate < daterange_1)
and the correct documents are returned. My code also allows the user to
input a date range in the query but I won't complicate matters with
that just now!
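For readers following along, the stored values above suggest that each
bound is right-padded with zeros to eight digits (1940 becomes
19400000, while 19820402 stays as it is). The Python sketch below
mirrors that mapping. It is inferred purely from the examples in this
email; the actual solr.TemporalCoverage code is not shown here, and
the names encode_bound and encode_range are made up for illustration.

```python
def encode_bound(value: str) -> int:
    """Right-pad a YYYY, YYYYMM or YYYYMMDD string with zeros to 8 digits."""
    value = value.strip()
    if not value.isdigit() or not 1 <= len(value) <= 8:
        raise ValueError(f"unsupported date value: {value!r}")
    return int(value.ljust(8, "0"))


def encode_range(raw: str) -> tuple:
    """Split 'start,end' and encode both bounds as integers."""
    start, end = raw.split(",")
    return encode_bound(start), encode_bound(end)


single = encode_range("1940,1945")
precise = encode_range("19820402,19820614")
```

With both bounds as plain integers, the single-range check is just two
integer comparisons against the encoded query date.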
The problem arises when a document has more than one "daterange" field
(imagine a news broadcast which covers a variety of topics and hence
time periods).
A document with two daterange fields
<doc>
...
<field name="daterange">19820402,19820614</field>
<field name="daterange">1990,2000</field>
</doc>
gets stored internally as
<arr name="daterange"><str>19820402,19820614</str><str>1990,2000</str></arr>
<arr name="daterange_0_i"><int>19820402</int><int>19900000</int></arr>
<arr name="daterange_1_i"><int>19820614</int><int>20000000</int></arr>
In this situation, searching for 1985 should yield zero results, as it
is contained within neither daterange; however, the above document is
returned in the result set. What Solr is doing is checking that the
queryDate (1985) is greater than *any* of the values in daterange_0 AND
that queryDate is less than *any* of the values in daterange_1.
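The false positive can be reproduced outside Solr. The Python sketch
below models the two multi-valued sub-fields as plain lists and applies
the "any value" semantics described above. It is only a simulation of
that behaviour, and it assumes the query date 1985 is encoded as
19850000 in the same way as the stored bounds.

```python
# Values as stored for the two-range document above.
starts = [19820402, 19900000]   # daterange_0_i
ends = [19820614, 20000000]     # daterange_1_i


def matches_any(query_date, starts, ends):
    """Each clause is satisfied independently by any value in its list."""
    return (any(query_date > s for s in starts)
            and any(query_date < e for e in ends))


# 19850000 > 19820402 (start of range 1) and 19850000 < 20000000 (end of
# range 2), so the document matches even though neither range holds 1985.
false_positive = matches_any(19850000, starts, ends)
```

The start bound comes from the first range and the end bound from the
second, which is exactly the position mix-up described above.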
How can I get Solr to respect the positions of each item in the
daterange_0 and _1 arrays? Ideally I'd like the search to use the
following logic, thus preventing the above document from being returned
in a search for 1985:
(queryDate > daterange_0[0] AND queryDate < daterange_1[0]) OR
(queryDate > daterange_0[1] AND queryDate < daterange_1[1])
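The position-aware logic above, written out as a small Python sketch
for any number of ranges. It pairs the i-th start with the i-th end and
requires both comparisons to hold for the same pair. This shows the
intended semantics only (it could also serve as the application-layer
filter); it says nothing about how to get Solr itself to evaluate the
query this way. The function name matches_pairwise is hypothetical.

```python
def matches_pairwise(query_date, starts, ends):
    """True if query_date lies strictly inside at least one (start, end) pair."""
    return any(s < query_date < e for s, e in zip(starts, ends))


starts = [19820402, 19900000]   # daterange_0_i
ends = [19820614, 20000000]     # daterange_1_i

in_gap = matches_pairwise(19850000, starts, ends)       # 1985: between the ranges
in_second = matches_pairwise(19950000, starts, ends)    # 1995: inside 1990-2000
```

Note that the comparisons are strict, as in the logic above, so a query
date equal to a range bound does not match.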
Someone else had a very similar problem recently on the mailing list
with a multiValued PointType field but the thread went cold without a
final solution.
While I could filter the results when they get back to my application
layer, it seems like it's not really the right place to do it.
Any help getting Solr to respect the positions of items in arrays would
be very gratefully received.
Many thanks,
Mark
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.