I developed a solution to this problem and I thought I should share it in case 
someone encounters a similar problem.

Recap: My problem was that for every document in my index I needed to know if 
it was the most recent one that contained a given ID in a multi-valued field. 
Doing this for one ID was simple (id:${myId}, sort=date desc, rows=1). It is 
much more difficult to do this for a set of IDs at the same time, in my case up 
to 100. If I try 'id:id1 OR id:id2 OR id:id3 ...' with sort=date desc and 
rows=100, I may not get a match for every ID in my query. I.e., with a query of 
100 unique IDs and 100 rows, I might only find 75 of those unique IDs in the 
response.
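
To make the shortfall concrete, here is a small in-memory sketch in Python. It
does not talk to Solr; the three documents, their dates, and the or_query
helper are made up for illustration. It only mimics an OR query sorted newest
first and cut off at a row limit.

```python
from datetime import date

# Hypothetical stand-in for the index: (doc name, date, ids in the field).
docs = [
    ("d1", date(2010, 5, 3), {"a"}),
    ("d2", date(2010, 5, 2), {"a"}),
    ("d3", date(2010, 5, 1), {"b"}),
]

def or_query(docs, wanted, rows):
    """Mimic 'id:a OR id:b', sorted by date newest first, limited to rows."""
    hits = [d for d in docs if d[2] & wanted]
    hits.sort(key=lambda d: d[1], reverse=True)
    return hits[:rows]

wanted = {"a", "b"}
top = or_query(docs, wanted, rows=len(wanted))

# Which of the requested ids actually show up in the returned rows?
covered = set().union(*(d[2] for d in top)) & wanted
missing = wanted - covered
```

Both returned rows are taken by documents containing "a", so "b" is never
seen even though a matching document exists further down the result list.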

My solution is to pre-calculate this information. 

I created a new multi-valued field, mostRecentForIds, and store in that field 
all of the IDs for which this document is the most recent. Each ID will only 
appear once in the index in this field, allowing me to get a match for all 100 
IDs when querying with 100 unique IDs. I also created a Boolean field, 
'isPostProcessed', which is set to false when a new doc is added.

Then, on a cron, I select all documents with isPostProcessed:false, perform 
the precalculation logic on all the IDs stored in the result set, and update 
those documents to isPostProcessed:true.
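
A minimal sketch of the pre-calculation step in Python, not the actual cron
job. It assumes the documents have already been fetched into memory as
(key, timestamp, ids) tuples; the function name and the tuple layout are
hypothetical, and reading from or writing back to Solr is left out.

```python
def most_recent_for_ids(docs):
    """Return {doc key: sorted ids for which that doc is the newest}.

    docs: list of (key, timestamp, ids) tuples (hypothetical layout).
    """
    newest = {}  # id -> (timestamp, key) of the newest doc seen so far
    for key, ts, ids in docs:
        for i in ids:
            if i not in newest or ts > newest[i][0]:
                newest[i] = (ts, key)

    # Invert the mapping so each doc lists the ids it "owns".
    result = {key: [] for key, _, _ in docs}
    for i, (_, key) in newest.items():
        result[key].append(i)
    return {key: sorted(owned) for key, owned in result.items()}


example = most_recent_for_ids([
    ("d1", 3, ["a", "b"]),
    ("d2", 5, ["b"]),
    ("d3", 1, ["c"]),
])
```

Since each ID should end up under exactly one document, a job like this
presumably also has to look at documents processed in earlier runs, so that
an ID can move from an older document to a newer one.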

The downside to this approach is that every document must be indexed twice. I 
could not perform the logic before the initial index since there could be other 
unindexed documents in a forthcoming commit that would conflict.

Hopefully someone finds this useful eventually!

-Kallin Nagelberg 





-----Original Message-----
From: Nagelberg, Kallin [mailto:knagelb...@globeandmail.com] 
Sent: Friday, May 21, 2010 4:44 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: seemingly impossible query

I just realized something that may make the fieldcollapsing strategy 
insufficient. My 'ids' field is multi-valued. From what I've read you cannot 
field collapse on a multi-valued field. Any other ideas?

Thanks,
-Kallin Nagelberg

-----Original Message-----
From: Geert-Jan Brits [mailto:gbr...@gmail.com] 
Sent: Thursday, May 20, 2010 1:03 PM
To: solr-user@lucene.apache.org
Subject: Re: seemingly impossible query

Hi Kallin,

please look again at
FieldCollapsing<http://wiki.apache.org/solr/FieldCollapsing>,
that should do the trick.
Basically: first you constrain the query on the field 'listOfIds' so that it
only matches docs that contain any of the (up to) 100 random ids, as you
already know how to do.

Next, in the same query, specify to collapse on the field 'listOfIds',
basically:
q=listOfIds:1 OR listOfIds:10 OR listOfIds:24&
collapse.threshold=1&collapse.field=listOfIds&collapse.type=normal

this would return the top-matching doc for each id left in listOfIds. Since
you constrained this field by the ids specified you are left with 1 matching
doc for each id.
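
For illustration only, the query string above could be assembled like this in
Python. The collapse.* parameter names are taken as written in this message
and are not checked against any particular Solr build; collapse_query is a
made-up helper name.

```python
from urllib.parse import parse_qs, urlencode

def collapse_query(ids, field="listOfIds"):
    """Build the URL-encoded parameters for the collapsed OR query."""
    q = " OR ".join(f"{field}:{i}" for i in ids)
    params = {
        "q": q,
        # Parameter names as quoted in this thread (assumption).
        "collapse.field": field,
        "collapse.threshold": 1,
        "collapse.type": "normal",
    }
    return urlencode(params)

query_string = collapse_query([1, 10, 24])
decoded = parse_qs(query_string)  # round-trip to check the encoding
```

urlencode takes care of escaping the spaces and colons in the q parameter, so
the string can be appended to the select URL as is.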

Again, it is not guaranteed that all docs returned are different. Since you
didn't specify this as a requirement, I think this will suffice.

Cheers,
Geert-Jan

2010/5/20 Nagelberg, Kallin <knagelb...@globeandmail.com>

> Yeah I need something like:
> (id:1 and maxhits:1) OR (id:2 and maxhits:1).. something crazy like that..
>
> I'm not sure how I can hit solr once. If I do try and do them all in one
> big OR query then I'm probably not going to get a hit for each ID. I would
> need to request probably 1000 documents to find all 100 and even then
> there's no guarantee and no way of knowing how deep to go.
>
> -Kallin Nagelberg
>
> -----Original Message-----
> From: dar...@ontrenet.com [mailto:dar...@ontrenet.com]
> Sent: Thursday, May 20, 2010 12:27 PM
> To: solr-user@lucene.apache.org
> Subject: RE: seemingly impossible query
>
> I see. Well, now you're asking Solr to ignore its prime directive of
> returning hits that match a query. Hehe.
>
> I'm not sure if Solr has a "unique" attribute.
>
> But this sounds, to me, like you will have to filter the results yourself.
> But at least you hit Solr only once before doing so.
>
> Good luck!
>
> > Thanks Darren,
> >
> > The problem with that is that it may not return one document per id,
> > which is what I need.  IE, I could give 100 ids in that OR query and
> > retrieve 100 documents, all containing just 1 of the IDs.
> >
> > -Kallin Nagelberg
> >
> > -----Original Message-----
> > From: dar...@ontrenet.com [mailto:dar...@ontrenet.com]
> > Sent: Thursday, May 20, 2010 12:21 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: seemingly impossible query
> >
> > Ok. I think I understand. What's impossible about this?
> >
> > If you have a single field name called <id> that is multivalued
> > then you can retrieve the documents with something like:
> >
> > id:1 OR id:2 OR id:56 ... id:100
> >
> > then limit the result to 100 rows (rows=100).
> >
> > There's probably a more succinct way to do this, but I'll leave that to
> > the experts.
> >
> > If you also only want the documents within a certain time, then you also
> > create a <time> field and use a conjunction such as
> > (id:0 ...) AND time:[NOW-1HOUR TO NOW]
> > or something similar to this. Check the query syntax wiki for specifics.
> >
> > Darren
> >
> >
> >> Hey everyone,
> >>
> >> I've recently been given a requirement that is giving me some trouble. I
> >> need to retrieve up to 100 documents, but I can't see a way to do it
> >> without making 100 different queries.
> >>
> >> My schema has a multi-valued field like 'listOfIds'. Each document has
> >> between 0 and N of these ids associated with it.
> >>
> >> My input is up to 100 of these ids at random, and I need to retrieve the
> >> most recent document for each id (N Ids as input, N docs returned). I'm
> >> currently planning on doing a single query for each id, requesting 1
> >> row, and caching the result. This could work OK since some of these ids
> >> should repeat quite often. Of course I would prefer to find a way to do
> >> this in Solr, but I'm not sure it's capable.
> >>
> >> Any ideas?
> >>
> >> Thanks,
> >> -Kallin Nagelberg
> >>
> >
> >
>
>
