I think there's something wrong with your database then, but okay.

You still haven't said what your Solr schema looks like -- that list of values doesn't say what the solr field names or types are. I think this is maybe because you don't actually have a Solr database and have no idea how Solr works, you're just asking in theory? On the other hand, you just said you have better performance with solr -- I'm not sure how you were able to tell the performance of solr in answering these queries if you don't even know how to make them!

But, again, assuming your data is set up like i'm guessing it is, it's quite similar to what you'd do with an rdbms.

What does 'most current' mean? Can jobs be overlapping? To find the project with the latest start date for a given person, just limit to documents with that current person in a 'q' or 'fq', and then sort by "start_date desc". Perhaps limit to 1 if you really only want one hit. Same principle as you would in an rdbms.

Again, this requires setting up your solr index in such a way to answer these sorts of questions. Each "document" in Solr will represent a person-project pair. It'll have fields for person (or multiple fields, personID, personFirst, personLast, etc), project name, project start date, project end date. This will make it easy/possible to answer questions like your examples with Solr, but will make it hard to answer many other sorts of questions -- unlike an rdbms, it is difficult to set up a Solr index that can flexibly answer just about any question you through at it, particularly when you have hieararchical or otherwise multi-entity data.

If you are interested, the standard Solr tutorial is pretty good: http://lucene.apache.org/solr/tutorial.html




On 9/15/2011 6:39 PM, gary tam wrote:
Thanks for the reply.  We had the search within the database initially, but
it proven to be too slow.  With solr we have much better performance.

One more question, how could I find the most current job for each employee

My data looks like


John Smith  department A       web site bug fix       2010-01-01
2010-01-03
                                              unit testing
  2010-01-04       2010-01-06
                                              QA support
2010-01-07       2010-01-12
                                              implementation       2010-01-13
        2010-01-22

Jane Doe  department A          QA support             2010-01-01
2010-05-01
                                              implementation       2010-05-02
        2010-09-28

Joe Doe  department A            PHP development  2011-01-01
2011-08-31
                                              Java Development  2011-09-01
     2011-09-15

I would like to return this as my search result

John Smith       department A            implementation          2010-01-13
       2010-01-22
Jane Doe          department A            implementation          2010-05-02
       2010-09-28
Joe Doe            department A            Java Development      2011-09-01
       2011-09-15


Thanks in advance
Gary



On Thu, Sep 15, 2011 at 3:33 PM, Jonathan Rochkind<rochk...@jhu.edu>  wrote:

You didn't tell us what your schema looks like, what fields with what types
are involved.

But similar to how you'd do it in your database, you need to find
'documents' that have a start date before your date in question, and an end
date after your date in question, to find the ones whose range includes your
date in question.

Something like this:

q=start_date:[* TO '2010-01-05'] AND end_date:['2010-01-05' TO *]

Of course, you need to add on your restriction to just documents about
'John Smith', through another AND clause or an 'fq'.

But in general, if you've got a db with this info already, and this is all
you need, why not just use the db?  Multi-hieararchy data like this is going
to give you trouble in Solr eventually, you've got to arrange the solr
indexes/schema to answer your questions, and eventually you're going to have
two questions which require mutually incompatible schema to answer.

An rdbms is a great general purpose question answering tool for structured
data.  lucene/Solr is a great indexing tool for text matching.


On 9/15/2011 2:55 PM, gary tam wrote:

Hi

I have a scenario that I am not sure how to write the query for.

Here is the scenario - have an employee record with multi value for
project,
started date, end date.

looks something like


John Smith             web site bug fix       2010-01-01       2010-01-03
                              unit testing              2010-01-04
2010-01-06
                              QA support             2010-01-07
2010-01-12
                              implementation       2010-01-13
  2010-01-22

I want to find what project John Smith was working on 2010-01-05

Is this possible or I have to back to my database ?


Thanks


Reply via email to