Re: Possible to facet across two indices, or document types in single index?

Jeff Schmidt Sun, 11 Dec 2011 12:18:13 -0800

Hi again:

I figured it'd be bad form to hijack Kashif's thread, so I'll just leverage
some of its content here. :)

Chris, you replied:

> : But there is a workaround:
> : 1) Do a normal query without facets (you only need to request doc ids
> : at this point)
> : 2) Collect all the IDs of the documents returned
> : 3) Do a second query for all fields and facets, adding a filter to
> : restrict result to those IDs collected in step 2.
> 
> an easier solution, if you really just want the counts based on the data 
> from the page the user is looking at, is to count up the values in your UI 
> from the stored fields you get back.
> 
> (This is the type of thing that falls into the general category of "stuff 
> the client can do just as easily as Solr" so there isn't really any reason
> to consider implementing it in/with Solr.)
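
For a single page of results, that client-side counting could look roughly like the sketch below. It assumes each returned document has already been turned into a map of stored field name to values; the class name PageFacetCounter and the field name "category" are made up for illustration and are not part of Solr or SolrJ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PageFacetCounter {
    /** Counts, for one page of documents, how many documents carry each value of the given field. */
    static Map<String, Integer> countValues(List<Map<String, List<String>>> docs, String field) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, List<String>> doc : docs) {
            List<String> values = doc.getOrDefault(field, Collections.<String>emptyList());
            // De-duplicate within a document so each document counts once per value.
            for (String value : new LinkedHashSet<>(values)) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // "category" is a hypothetical stored field.
        List<Map<String, List<String>>> page = new ArrayList<>();
        Map<String, List<String>> doc1 = new HashMap<>();
        doc1.put("category", Arrays.asList("a", "b"));
        Map<String, List<String>> doc2 = new HashMap<>();
        doc2.put("category", Arrays.asList("a"));
        page.add(doc1);
        page.add(doc2);
        System.out.println(countValues(page, "category"));
    }
}
```

This only covers the documents on the current page, so it does not by itself give counts across the full result set.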

Moving on from join, my alternative solution is to do something like you 
described.  I am a server-side/API guy, and my application stands in between 
the UI/client and Solr.  I currently offer a number of facets, based on a 
single document type, and I want to add my new (cross-document) facets so as 
not to impose implementation details on the API client.  That is, they should 
receive values/counts and be able to specify values for narrowing down results, 
page facet values etc.  That is the ideal situation, anyway.

When the initial search query comes in, I can do 1-3 above as you describe.  I 
have fewer than 200K documents in the index. Given the generality of the 
search terms, let's say I get 7500 document IDs back per 1 and 2.  It sounds 
like I need to create a filter query which includes all 7500 IDs, and issue the 
2nd query (in my case to another core) and have it facet on the additional 
field(s) I'm interested in.  I don't need to return results from this, just get 
the facet values/counts.
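
Building that filter might look something like the sketch below. It is only an illustration: the class IdFilterBuilder and the field name y_doc_id (the field in the second core that points back at the first core's document ID) are assumptions of mine, not anything from the actual schema. It also assumes a plain boolean OR list is acceptable, which for 7500 IDs probably means raising maxBooleanClauses in solrconfig.xml (1024 in the stock config, as far as I know).

```java
import java.util.Arrays;
import java.util.Collection;

final class IdFilterBuilder {
    /** Builds a filter such as y_doc_id:("a" OR "b"); returns null when there are no IDs. */
    static String buildFilter(String field, Collection<String> ids) {
        if (ids.isEmpty()) {
            return null; // caller should skip the second query entirely
        }
        StringBuilder sb = new StringBuilder(field).append(":(");
        boolean first = true;
        for (String id : ids) {
            if (!first) {
                sb.append(" OR ");
            }
            // Quote each ID and escape embedded backslashes and double quotes.
            sb.append('"')
              .append(id.replace("\\", "\\\\").replace("\"", "\\\""))
              .append('"');
            first = false;
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        // y_doc_id is a hypothetical field name.
        System.out.println(buildFilter("y_doc_id", Arrays.asList("doc1", "doc2")));
        // prints: y_doc_id:("doc1" OR "doc2")
    }
}
```

The resulting string would be sent as the fq parameter of the second request, together with rows=0, facet=true and facet.field set to the field(s) of interest, since only the counts are needed.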

Step 4 for me is to search the first index again, to obtain the requested 
number of rows of results, return the appropriate fields, and calculate facets 
for that content.  I can then merge the facet results of both indexes, and the 
client is none the wiser.
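
The merge itself should be simple. A minimal sketch, assuming the facet results from each core have already been copied into plain maps of field name to (value to count); FacetMerger is a made-up name. It expects the two cores to facet on different field names. If a field name does appear in both, the counts are summed, which may or may not be what is wanted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class FacetMerger {
    /** Combines facet results (field -> value -> count) from two cores into one map. */
    static Map<String, Map<String, Long>> merge(Map<String, Map<String, Long>> main,
                                                Map<String, Map<String, Long>> secondary) {
        Map<String, Map<String, Long>> merged = new LinkedHashMap<>();
        addAll(merged, main);
        addAll(merged, secondary);
        return merged;
    }

    private static void addAll(Map<String, Map<String, Long>> target,
                               Map<String, Map<String, Long>> source) {
        for (Map.Entry<String, Map<String, Long>> field : source.entrySet()) {
            Map<String, Long> counts =
                target.computeIfAbsent(field.getKey(), k -> new LinkedHashMap<>());
            for (Map.Entry<String, Long> value : field.getValue().entrySet()) {
                // Sum counts if the same field/value shows up in both sources.
                counts.merge(value.getKey(), value.getValue(), Long::sum);
            }
        }
    }

    public static void main(String[] args) {
        // Field names below are hypothetical.
        Map<String, Map<String, Long>> main = new LinkedHashMap<>();
        main.put("type", new LinkedHashMap<>());
        main.get("type").put("article", 12L);
        Map<String, Map<String, Long>> secondary = new LinkedHashMap<>();
        secondary.put("x_tags", new LinkedHashMap<>());
        secondary.get("x_tags").put("kinase", 3L);
        System.out.println(merge(main, secondary));
    }
}
```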

A couple questions though (aren't there always? :))  Is this very efficient?  
Beyond building the string of 7500 IDs within my app, can Solr swallow that 
okay?  I'm using SolrJ, javabin format, so hopefully there is not a URL length 
issue (between my app and Solr)?  I'm guessing javabin uses HTTP POST.

What is a reasonable way for the facets derived from the 2nd index to be used 
for narrowing like those in the main content index? That is, pinning down facet 
values from the second index is not going to affect the results (document IDs) 
from searching the first index.  Perhaps that can be resolved by performing steps 
1-3 as before, but also retrieving the related ID values (a field in second 
index that refers to the document ID in the first index) from the second index, 
and do a set intersection with the document IDs of the first index (step 2). 
Then I modify my step 4 to filter on the document ID intersection. This would 
cause documents in the first index to drop out due to narrowing a facet in the 
second index.
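
In code, the intersection step would be roughly the following. This is a sketch only; IdIntersection is a hypothetical helper, and it assumes both ID lists have already been pulled out of the two responses as strings.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

final class IdIntersection {
    /**
     * Returns the first-index document IDs that are also referenced by the
     * matching second-index documents, keeping the first index's order.
     */
    static Set<String> intersect(Collection<String> firstIndexIds, Collection<String> relatedIds) {
        Set<String> result = new LinkedHashSet<>(firstIndexIds);
        // A HashSet keeps each membership check constant-time on average.
        result.retainAll(new HashSet<>(relatedIds));
        return result;
    }

    public static void main(String[] args) {
        Set<String> narrowed = intersect(
            Arrays.asList("y1", "y2", "y3"),   // IDs from step 2
            Arrays.asList("y3", "y1", "y9"));  // related IDs from the second index
        System.out.println(narrowed); // prints: [y1, y3]
    }
}
```

The surviving IDs would then become the filter for step 4, so a document with no matching entry in the second index drops out of the results.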

So, from a performance or strategy perspective, is it a bit crazy to make a go 
of this?

Thanks,

Jeff

On Dec 11, 2011, at 12:17 PM, Jeff Schmidt wrote:

> Thanks Chris.  I was just going to sit down and see if I could get join to do 
> what I want within a single index.  I'm glad I checked my email first. :)
> 
> However, I need to see how else to solve the problem, and it looks like the 
> most appropriate line of reasoning is in your response to Kashif Kahn's 
> 05-Dec-11 email thread with the subject "Facet on a field with rows=n".  I'll 
> reply to that and officially abandon my pursuit of join.
> 
> Cheers,
> 
> Jeff
> 
> On Dec 9, 2011, at 1:34 PM, Chris Hostetter wrote:
> 
>> 
>> : What you said about faceting is the key.  I want to use my existing 
>> : edismax configuration to create the scored document result set of type 
>> : Y.  I don't want to affect their scores, but for each document ID, I 
>> : want to join it with another type of document (X), which has a field which 
>> : contains a document ID of one of Y. There will be zero or more of these 
>> : per Y doc ID.  The X document then has a multi-valued field I would like 
>> : to facet. I don't need scores for the joined X documents.
>> 
>> If i'm following you correctly, then what you are asking about just isn't 
>> possible with "join".
>> 
>> the crux of the issue is that you have a particular type of document you 
>> want *returned* to the users as the results of a search, sorted by score.  
>> that set of documents is what you "join to".  All faceting, stats, 
>> highlighting, etc... are based on that final set of documents -- the 
>> documents you "join from" can only contribute in the query by identifying 
>> the set -- none of their field values or properties "survive the join" so 
>> to speak.
>> 
>> : This does not sound possible according to the end of your final 
>> : paragraph.  Is that because two cores are involved?  Despite the join 
>> 
>> no .. it has nothing to do with the multiple core part of the problem -- 
>> it's just how join works.  it identifies an (unordered) set of "joined to" 
>> documents based on the matching "joined from" documents.
>> 
>> 
>> -Hoss
> 
> 
> 
> --
> Jeff Schmidt
> 535 Consulting
> j...@535consulting.com
> http://www.535consulting.com
> (650) 423-1068

--
Jeff Schmidt
535 Consulting
j...@535consulting.com
http://www.535consulting.com
(650) 423-1068
