Hello:

I'm trying to relate two different types of documents.  Currently I 
have 'node' documents that reside in one index (core), and 'product mapping' 
documents that are in another index.  The product mapping index is used to map 
tenant products to nodes. The nodes are canonical content that gets updated 
every quarter, whereas the product mappings can change at any time.

I put them in two indexes because (1) canonical content changes rarely, and I 
don't want product mapping changes to affect it (commit, re-open searchers 
etc.), and (2) I would like to support multiple tenants mapping products to the 
same canonical content to avoid duplication (a few GB).

This arrangement has worked well thus far, but only in the sense that for each node 
result returned, I can query the product mapping index to determine the 
products mapped to the node.  I combine this information within my application 
and return it to the client.  This works okay in that there are only 5-20 
results returned per page (start, rows).  But now I'm being asked to facet the 
product categories (multi-valued field within a product mapping document) along 
with other facets defined in the canonical content.

Can this be done with Solr 3.5.0?  I've been looking into sub-queries, function 
queries etc.  Also, I've seen various postings indicating that one needs to 
denormalize more.  I don't want to add product information as fields to the 
canonical content. Not only does that defeat my objective (1) above, but Solr 
does not support incremental updates of document fields.

So, one approach is to issue my query to the canonical index and get all of the 
document IDs (could be 1000s), and then issue a filter query to the product 
mapping index with all of these IDs and have Solr facet the product categories. 
 Is that efficient?  I suppose I could use HTTP POST (via SolrJ) to convey that 
payload of IDs?  I could then take the facet results of that query and combine 
them with the canonical index results and return them to the client.
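
To make that concrete, here is a rough sketch (in Python rather than SolrJ, 
just to keep it short) of how I imagine building the second request.  The field 
names node_id and product_category are placeholders, not my real schema, and 
the batch size is a guess on my part: as far as I know, the stock 
solrconfig.xml caps boolean queries at 1024 clauses (maxBooleanClauses), so 
thousands of IDs would have to be split up or that limit raised.  Summing the 
per-batch counts assumes each mapping document points at exactly one node, and 
the totals count mapping documents rather than distinct nodes.

```python
from collections import Counter
from urllib.parse import urlencode

# Placeholder field names; the real product mapping schema may differ.
NODE_ID_FIELD = "node_id"
CATEGORY_FIELD = "product_category"


def chunked(ids, size):
    """Yield successive batches of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def quote_term(value):
    """Wrap an ID in double quotes so special characters are taken literally."""
    return '"%s"' % value.replace("\\", "\\\\").replace('"', '\\"')


def build_facet_request_body(node_ids):
    """Build a form-encoded POST body that facets categories for these nodes."""
    fq = "%s:(%s)" % (NODE_ID_FIELD, " OR ".join(quote_term(i) for i in node_ids))
    params = [
        ("q", "*:*"),
        ("fq", fq),                       # restrict to mappings for these nodes
        ("rows", "0"),                    # only the facet counts are needed
        ("facet", "true"),
        ("facet.field", CATEGORY_FIELD),
        ("facet.mincount", "1"),
        ("wt", "json"),
    ]
    return urlencode(params)


def merge_facet_counts(per_batch_counts):
    """Sum the category counts returned for each batch of node IDs."""
    total = Counter()
    for counts in per_batch_counts:
        total.update(counts)
    return total
```

Each body would be POSTed to the product mapping core's select handler, one 
request per batch from chunked(all_ids, 500), and the parsed facet counts fed 
to merge_facet_counts before combining them with the canonical results.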

That may be do-able, but then let's say the user clicks on a product category 
facet value to narrow the node results to only those mapped to category XYZ. 
This will not affect the query issued against the canonical content index.  
Instead, I think I'd have to go through the canonical results and eliminate the 
nodes that are not associated with product category XYZ.  Then, if the current 
page of results is inadequate (rows=10, but 3 nodes were eliminated), I'd have 
to go back to the canonical index to get more rows, eliminate some again 
perhaps, get more etc.  That sounds unappealing and low performing.
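
In case the description is hard to follow, this is the kind of loop I mean, 
again as a Python sketch.  fetch_nodes is a hypothetical callback standing in 
for the query against the canonical index (start, rows), and nodes_in_category 
stands in for whatever the product mapping index says is mapped to the selected 
category.  Nothing here is a Solr API; it only shows the bookkeeping involved.

```python
def filtered_page(fetch_nodes, nodes_in_category, rows, batch_size=None):
    """Collect up to `rows` nodes mapped to the category, re-querying as needed.

    fetch_nodes(start, rows) returns node IDs in rank order (hypothetical).
    nodes_in_category is the set of node IDs mapped to the chosen category.
    Returns (page, next_start, round_trips), where next_start is the canonical
    offset at which the following page would have to resume.
    """
    batch_size = batch_size or rows
    page, start, round_trips = [], 0, 0
    while len(page) < rows:
        batch = fetch_nodes(start, batch_size)
        round_trips += 1
        if not batch:
            break  # canonical results exhausted before the page filled up
        for offset, node_id in enumerate(batch):
            if node_id in nodes_in_category:
                page.append(node_id)
            if len(page) == rows:
                # Stop mid-batch and remember where we were in the ranking.
                return page, start + offset + 1, round_trips
        start += len(batch)
    return page, start, round_trips
```

The number of round trips depends on how sparse the category is, and the 
application has to carry next_start around instead of a plain start offset, 
which is a large part of why I find this approach unappealing.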

Is there a Solr way to do this?  My Packt "Apache Solr 3 Enterprise Search 
Server" book (page 34) states regarding separate indices:

        "If you do develop separate schemas and if you need to search across 
your indices in one search then you must perform a distributed search, 
described in the last chapter. A distributed search is usually a feature 
employed for a large corpus but it applies here too."

But in the chapter it goes on to talk about dealing with sharding, replication 
etc. to support a large corpus, not necessarily tying together two different 
indexes.

Is it possible to accomplish my goal in a less ugly way than I outlined above?  
Since we only have a single tenant to worry about, I could use a combined index 
at least for a few months (separate fields per document type, IDs are unique 
among them all) if that makes a difference.

Thanks!

Jeff
--
Jeff Schmidt
535 Consulting
j...@535consulting.com
http://www.535consulting.com
(650) 423-1068








