This does appear to work well. It seems there are not many people interested in this particular problem now, but I figured I'd just complete the story in case it helps somebody in the future.
With the neighbor node ID prefixes, I'm getting the facet values and counts as I require. Since I have an application standing between Solr and the client, I can play these games with the prefix. When I build my content for indexing (Solr compatible XML), I add the prefixes. When I return facet results to the client, I remove the prefixes before they pass through. When the client specifies one or more values to pin down for the facet (drill down), I add the prefixes as I configure the Solr query via SolrJ.

In the future, I'm sure I'll be asked to facet more of the process fields that are specific between two nodes. I guess I'll just expand the use of the prefixes to more fields.

Take it easy,

Jeff

On Nov 28, 2011, at 9:06 PM, Schmidt Jeff wrote:

> Well, here's something that might just work. Using the Solr 3.4+ facet.prefix parameter, as well as prefixing the values of the particular field I want to facet based on the node neighbor ID, I get what I need.
>
> Adding the field:
>
> <field name="n_directionalityFacet" type="string" indexed="true" stored="false" multiValued="true" omitNorms="true" />
>
> Then, for each value, I prefix it with {nodeId}-. For example, using the focus node ID of ING:afa, I can get as a result document set, all of the neighbors of that node ID.
> Then, I also tell Solr to facet using that same focus node ID prefix:
>
> http://localhost:8091/solr/ing-content/select/?qt=partner-tmo&fq=type%3Anode&fq=n_neighborof_id%3AING\:afa&rows=0&facet=true&facet.mincount=1&facet.field=n_directionalityFacet&f.n_directionalityFacet.facet.prefix=ING%3Aafa
>
> And, for that particular facet, I get only the values and counts relevant to the focus node ID:
>
> <lst name="facet_fields">
>   <lst name="n_directionalityFacet">
>     <int name="ING:afa-D">82</int>
>     <int name="ING:afa-B">2</int>
>     <int name="ING:afa-A">1</int>
>     <int name="ING:afa-U">1</int>
>   </lst>
> </lst>
>
> My app can then take this response and remove the prefix before returning the values and counts to the client. It may inflate the size of the index some, but it sure beats my alternative proposals...
>
> Cheers,
>
> Jeff
>
> On Nov 26, 2011, at 1:22 PM, Jeff Schmidt wrote:
>
>> Hello:
>>
>> I'm still not finding much joy with this issue.
>>
>> For one, it looks like FacetComponent (via SimpleFacets.getFieldCacheCounts()) goes directly to the Lucene FieldCache (non-enum, multi-valued field, single string token) in order to get terms to count. So, even if it were possible for me to somehow modify the ResponseBuilder in between the QueryComponent and FacetComponent, that wouldn't do much good.
>>
>> I'd rather not modify Solr/Lucene code and have a custom build (though that's not impossible in the short term), but QueryComponent does not provide sufficient access. I suppose I could further investigate going the RequestHandler route.
>> But, let me know if this is crazy talk:
>>
>> From what I can tell in org.apache.solr.request.SimpleFacets, line 366 (sorry, no SCM info in the source file, but it is from the 3.4.0 source distribution):
>>
>> FieldCache.StringIndex si = FieldCache.DEFAULT.getStringIndex(searcher.getReader(), fieldName);
>> final String[] terms = si.lookup;
>> final int[] termNum = si.order;
>>
>> SimpleFacets.getFieldCacheCounts() uses the response from the Lucene FieldCache to do its work. My thought is to use AspectJ to place after advice on the Lucene method (org.apache.lucene.search.FieldCacheImpl), to modify the response. I don't want to muck with the field cache itself. After all, the field values I don't want to count for this focusNodeId, I may well want with another.
>>
>> Given the FieldCacheImpl method:
>>
>> // inherit javadocs
>> public StringIndex getStringIndex(IndexReader reader, String field)
>>     throws IOException {
>>   return (StringIndex) caches.get(StringIndex.class).get(reader, new Entry(field, (Parser)null));
>> }
>>
>> It seems I could take the returned StringIndex instance, and create a new filtered one, leaving the cached original intact. StringIndex (defined in FieldCache) is a public static class with a public constructor. Then, SimpleFacets will facet what I provided it.
>>
>> The other trick is to inform my aspect within Lucene just what the focusNodeId is, so it knows how to filter. This is request specific. I'm running Solr within Tomcat. I've not looked exhaustively into how Solr threading works. But, if the current app server request thread is used synchronously to satisfy any given SolrJ request, then I could provide a SearchComponent that looked for some special parameter that indicates the focusNodeId of interest, and then place it in a ThreadLocal which the interceptor could pick up.
>> If the ThreadLocal is not defined, then the interceptor does not filter (a definite scenario) and returns Lucene's StringIndex instance. If there is another thread involved in handling the request, then more investigation is needed.
>>
>> Any inside information would be appreciated. Or, a firm statement that I should not go there would also be appreciated. :)
>>
>> Cheers,
>>
>> Jeff
>>
>> On Nov 21, 2011, at 4:31 PM, Jeff Schmidt wrote:
>>
>>> Hello:
>>>
>>> Solr version: 3.4.0
>>>
>>> I'm trying to figure out if it's possible to both return (retrieval) as well as facet on certain values of a multivalued field. The scenario is a life science app comprised of a graph of nodes (genes, chemicals etc.) and each node has a "neighborhood" consisting of one or more nodes with which it has relationships defined as "processes" ("inhibition", "phosphorylation" etc.).
>>>
>>> What I've done is add a number of multi-valued fields to each node consisting of the neighbor node ID (neighbor's document ID), process, and a couple of other related items. For a given node, it'll have multiple neighbors, as well as multiple processes with a single neighbor.
>>> For example, in schema.xml:
>>>
>>> <field name="id" type="string" indexed="true" stored="true" required="true" />
>>>
>>> <!-- Network neighborhood fields -->
>>> <field name="n_neighborof_id" type="string" indexed="true" stored="true" multiValued="true" />
>>> <field name="n_neighborof_name" type="text_lc_np" indexed="true" stored="true" multiValued="true" termVectors="true" />
>>> <field name="n_neighborof_process" type="text_lc_np" indexed="true" stored="true" multiValued="true" termVectors="true" />
>>> <field name="n_neighborof_processExact" type="string" indexed="true" stored="true" multiValued="true" termVectors="true" />
>>> <field name="n_neighborof_edge_type" type="string" indexed="true" stored="true" multiValued="true" />
>>> <field name="n_neighborof_is_direct" type="boolean" indexed="true" stored="true" multiValued="true" />
>>> <field name="n_neighborof_count" type="sint" indexed="false" stored="true" multiValued="true" />
>>>
>>> Note that the type text_lc_np simply lowercases and ignores punctuation.
>>>
>>> So, when I want the neighbors of a given node, I define a filter query like fq=n_neighborof_id:someFocusNodeId and I get all of the neighbors. That's exactly what I want in terms of documents. There are a number of per-document fields that are returned with the search results. This includes the actual process information defined above. Not surprisingly, I get all of the values for each field. But I do not want them; I only want those that pertain to the specified focus node ID.
>>>
>>> For now, my workaround for the retrieval aspect of this is for my application to chuck the irrelevant values. That is, for a set of related field values, if n_neighborof_id != focusNodeId, then out they go. While this gets the job done, it is quite wasteful in terms of processing by both Solr and my app, as well as bandwidth.
>>>
>>> Now I need to facet on a couple of the neighbor fields. Solr returns counts relevant to all processes defined within the document result set. Again, that is expected, but not what I want. I'd like Solr to compute facet counts only for processes relevant to the specified focus node, much like my filter query to get the document results.
>>>
>>> Is this possible? I've looked at grouping queries, though those are document centric and do not work for multivalued fields. I've looked into implementing my own SearchComponent within the Solr server. It sounded ideal to drop something I have control over right between the standard query and facet components. I figured I could eliminate the undesired fields at that point, both solving my first problem of having to toss irrelevant processes in my app, and having Solr compute facet values using only the desired processes. But, there are comments in the Solr source code that stipulate a component must not modify the document set. For example, in org.apache.solr.search.DocSet:
>>>
>>> /**
>>>  * <code>DocSet</code> represents an unordered set of Lucene Document Ids.
>>>  *
>>>  * <p>
>>>  * WARNING: Any DocSet returned from SolrIndexSearcher should <b>not</b> be modified as it may have been retrieved from
>>>  * a cache and could be shared.
>>>  * </p>
>>>  *
>>>  * @version $Id: DocSet.java 1065312 2011-01-30 16:08:25Z rmuir $
>>>  * @since solr 0.9
>>>  */
>>>
>>> Perhaps I cannot use this avenue to accomplish my goals? But, I don't need to modify the document set itself (IDs etc.), just trim the field values per document. Does that make sense?
>>>
>>> I may well have to re-evaluate my data model, but I'd like to get what I need with what I have currently defined if possible.
>>>
>>> Thanks,
>>>
>>> Jeff
>>> --
>>> Jeff Schmidt
>>> 535 Consulting
>>> j...@535consulting.com
>>> http://www.535consulting.com
>>> (650) 423-1068
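
For anyone who lands on this thread later, here is a minimal sketch of the prefix handling described in the most recent message: adding the {nodeId}- prefix at index and drill-down time, and stripping it from facet results before they reach the client. It is only an illustration, not the code actually used. The class name FacetPrefixes and its method names are hypothetical, and it deliberately avoids SolrJ types so it stands alone. One assumption differs from the example URL in the thread: the facet prefix here includes the trailing "-", so that a focus node ID such as ING:afa does not also match values belonging to a longer ID that merely starts with the same characters.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for the {nodeId}- prefix scheme; not part of Solr or SolrJ. */
final class FacetPrefixes {
    // Separator between the focus node ID and the real facet value, as in "ING:afa-D".
    private static final String SEPARATOR = "-";

    private FacetPrefixes() {}

    /** Index time and drill-down: turn "D" into "ING:afa-D" for focus node "ING:afa". */
    static String addPrefix(String nodeId, String value) {
        return nodeId + SEPARATOR + value;
    }

    /** Value to send as f.<field>.facet.prefix (assumption: trailing separator included). */
    static String facetPrefix(String nodeId) {
        return nodeId + SEPARATOR;
    }

    /**
     * Response time: strip the prefix from each facet value, keeping counts and order.
     * Values that do not carry this node's prefix are dropped.
     */
    static Map<String, Long> stripPrefix(String nodeId, Map<String, Long> counts) {
        String prefix = facetPrefix(nodeId);
        Map<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> entry : counts.entrySet()) {
            String key = entry.getKey();
            if (key.startsWith(prefix)) {
                result.put(key.substring(prefix.length()), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Facet counts shaped like the Solr response quoted in the thread.
        Map<String, Long> raw = new LinkedHashMap<>();
        raw.put("ING:afa-D", 82L);
        raw.put("ING:afa-B", 2L);
        System.out.println(facetPrefix("ING:afa"));
        System.out.println(stripPrefix("ING:afa", raw));
    }
}
```

With SolrJ, the string from facetPrefix would be set as the per-field facet.prefix parameter on the query, and addPrefix would be applied to each drill-down value before building the corresponding filter query. The map passed to stripPrefix would be built from the facet field counts in the response.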