Hi again: I figured it'd be bad form to hijack Kashif's thread, so I'll just leverage some of its content here. :)

Chris, you replied:

> : But there is a workaround:
> : 1) Do a normal query without facets (you only need to request doc ids
> : at this point)
> : 2) Collect all the IDs of the documents returned
> : 3) Do a second query for all fields and facets, adding a filter to
> : restrict result to those IDs collected in step 2.
>
> an easier solution, if you really just want the counts based on the data
> from the page the user is looking at, is to count up the values in your UI
> from the stored fields you get back.
>
> (This is the type of thing that falls into the general category of "stuff
> the client can do just as easily as Solr" so there isn't really any reason
> to consider implementing it in/with Solr.)

Moving on from join, my alternative solution is to do something like you described. I am a server-side/API guy, and my application stands in between the UI/client and Solr. I currently offer a number of facets, based on a single document type, and I want to add my new (cross-document) facets without imposing implementation details on the API client. That is, clients should receive values/counts and be able to specify values for narrowing down results, page through facet values, etc. That is the ideal situation, anyway.

When the initial search query comes in, I can do steps 1-3 above as you describe. I have fewer than 200K documents in the index. Given how general the search terms are, let's say I get 7500 document IDs back from steps 1 and 2. It sounds like I need to create a filter query which includes all 7500 IDs, issue the 2nd query (in my case to another core), and have it facet on the additional field(s) I'm interested in. I don't need to return results from this query, just the facet values/counts.

Step 4 for me is to search the first index again, to obtain the requested number of rows of results, return the appropriate fields, and calculate facets for that content. I can then merge the facet results of both indexes, and the client is none the wiser.
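
To make the plumbing for steps 2-4 a bit more concrete, here is a minimal sketch using only the Java standard library (no SolrJ calls). Everything named here is an assumption for illustration: the class CrossCoreFacets, its methods, and the field name doc_id are made up, and facet counts are modeled as plain value-to-count maps. As far as I understand, a single boolean query with 7500 clauses would exceed Solr's default maxBooleanClauses limit of 1024 unless that is raised in solrconfig.xml, so the sketch also shows splitting the IDs into chunks and summing the per-chunk counts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper; none of these names come from Solr or SolrJ.
final class CrossCoreFacets {

    // Builds a filter query string such as: doc_id:("a1" OR "b2")
    // Each ID is quoted, with backslashes and double quotes escaped.
    // Callers should skip the second query entirely when ids is empty.
    static String buildIdFilter(String field, List<String> ids) {
        StringBuilder sb = new StringBuilder(field).append(":(");
        for (int i = 0; i < ids.size(); i++) {
            if (i > 0) {
                sb.append(" OR ");
            }
            String escaped = ids.get(i).replace("\\", "\\\\").replace("\"", "\\\"");
            sb.append('"').append(escaped).append('"');
        }
        return sb.append(')').toString();
    }

    // Splits the ID list into consecutive chunks of at most `size` entries,
    // so that each chunk can go into its own request.
    static List<List<String>> partition(List<String> ids, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<String>> chunks = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += size) {
            int end = Math.min(start + size, ids.size());
            chunks.add(new ArrayList<>(ids.subList(start, end)));
        }
        return chunks;
    }

    // Adds the facet counts of one response (value -> count) into a running total.
    static void addCounts(Map<String, Long> total, Map<String, Long> part) {
        for (Map.Entry<String, Long> e : part.entrySet()) {
            total.merge(e.getKey(), e.getValue(), Long::sum);
        }
    }
}
```

The idea would be to send one request per chunk to the second core with rows=0, the chunk's filter string as fq, and faceting enabled, then feed each response's counts into addCounts. Summing like this is only correct if every document in the second core refers to exactly one document ID in the first core, so that it matches exactly one chunk.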

A couple of questions though (aren't there always? :))

Is this very efficient? Beyond building the string of 7500 IDs within my app, can Solr swallow that okay? I'm using SolrJ with the javabin format, so hopefully there is not a URL length issue (between my app and Solr)? I'm guessing javabin uses HTTP POST.

What is a reasonable way for the facets derived from the 2nd index to be used for narrowing, like those in the main content index? As it stands, pinning down facet values from the second index is not going to affect the results (document IDs) from searching the first index. Perhaps that can be resolved by performing steps 1-3 as before, but also retrieving the related ID values (a field in the second index that refers to the document ID in the first index) from the second index, and doing a set intersection with the document IDs of the first index (step 2). Then I modify my step 4 to filter on the document ID intersection. This would cause documents in the first index to drop out when a facet in the second index is narrowed.

So, from a performance or strategy perspective, is it a bit crazy to make a go of this?

Thanks,

Jeff

On Dec 11, 2011, at 12:17 PM, Jeff Schmidt wrote:

> Thanks Chris. I was just going to sit down and see if I could get join to do
> what I want within a single index. I'm glad I checked my email first. :)
>
> However, I need to see how else to solve the problem, and it looks like the
> most appropriate line of reasoning is in your response to Kashif Kahn's
> 05-Dec-11 email thread with the subject "Facet on a field with rows=n". I'll
> reply to that and officially abandon my pursuit of join.
>
> Cheers,
>
> Jeff
>
> On Dec 9, 2011, at 1:34 PM, Chris Hostetter wrote:
>
>>
>> : What you said about faceting is the key. I want to use my existing
>> : edismax configuration to create the scored document result set of type
>> : Y. I don't want to affect their scores, but for each document ID, I
>> : want join it with another type of document (X), which has a field which
>> : contains a document ID of one of Y. There will be zero or more of these
>> : per Y doc ID. The X document then has a multi-valued field I would like
>> : to facet. I don't need scores for the joined X documents.
>>
>> If i'm following you correctly, then what you are asking about just isn't
>> possible with "join".
>>
>> the crux of the issue is that you have a particular type of document you
>> want *returned* to the users as the results of a search, sorted by score.
>> that set of documents is what you "join to". All faceting, stats,
>> highlighting, etc... are based on that final set of documents -- the
>> documents you "join from" can only contribute in the query by identifying
>> the set -- none of their field values or properties "survive the join" so
>> to speak.
>>
>> : This does not sound possible according to the end of your final
>> : paragraph. Is that because two cores are involved? Despite the join
>>
>> no .. it has nothing to do with the multiple core part of the problem --
>> it's just how join works. it identifies an (unordered) set of "joined to"
>> documents based on the matching "joined from" documents.
>>
>>
>> -Hoss
>
> --
> Jeff Schmidt
> 535 Consulting
> j...@535consulting.com
> http://www.535consulting.com
> (650) 423-1068

--
Jeff Schmidt
535 Consulting
j...@535consulting.com
http://www.535consulting.com
(650) 423-1068