Re: Solr Join with Dismax

Jeff Schmidt Tue, 06 Dec 2011 11:57:59 -0800

Hi Pascal:

I have an issue similar to yours, but I also need to facet the joined documents...


I've been experimenting with various approaches; there isn't much documentation
that I can find.

Looking at http://wiki.apache.org/solr/Join, the fourth example shows the join
being used as a filter query (fq) rather than as the main query:

http://localhost:8983/solr/select?q=ipod&fl=*,score&sort=score+desc&fq={!join+from=id+to=manu_id_s}compName_s:Belkin
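
For reference, the parameters in that URL can be decoded with Python's standard
urllib.parse module. This is only a sketch for inspecting the request (the
helper name solr_params is made up for illustration); it shows that the join
lives entirely in the fq value, with "+" standing for spaces:

```python
from urllib.parse import parse_qsl, urlsplit


def solr_params(url: str) -> list[tuple[str, str]]:
    """Decode a Solr request URL into (name, value) pairs.

    parse_qsl keeps repeated keys (Solr accepts several fq parameters)
    and decodes '+' as a space.
    """
    return parse_qsl(urlsplit(url).query)


# The fourth example from the Solr wiki Join page.
WIKI_URL = ("http://localhost:8983/solr/select?q=ipod&fl=*,score"
            "&sort=score+desc&fq={!join+from=id+to=manu_id_s}compName_s:Belkin")

for name, value in solr_params(WIKI_URL):
    print(f"{name} = {value}")
```

The last line printed is "fq = {!join from=id to=manu_id_s}compName_s:Belkin":
the main query (q=ipod) is untouched, and the join only restricts which
documents are allowed to match.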

So I figured that if you can put the join in a filter query, you can also choose
the request handler (qt) for the main query. When I issue this query for my application:

http://localhost:8091/solr/ing-content/select/?qt=partner-tmo&fq=type:node&q=brca1&fq={!join+from=conceptId+to=id+fromIndex=partner-tmo}*:*&debugQuery=true&rows=5&fl=id,n_type,n_name
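
If the request is assembled in code rather than typed into a browser, the
local-params syntax has to be URL-encoded, and fq must be allowed to repeat.
A minimal Python sketch using only the standard library (the function name
build_select_url is hypothetical; the host, core, handler and field names are
the ones in the query above):

```python
from urllib.parse import urlencode


def build_select_url(base: str, params: list[tuple[str, str]]) -> str:
    """Build a Solr select URL; a list of pairs lets fq appear more than once."""
    return f"{base}?{urlencode(params)}"


params = [
    ("qt", "partner-tmo"),    # edismax-based request handler
    ("q", "brca1"),           # main query, parsed by that handler
    ("fq", "type:node"),      # ordinary filter query
    ("fq", "{!join from=conceptId to=id fromIndex=partner-tmo}*:*"),  # join as a filter
    ("debugQuery", "true"),
    ("rows", "5"),
    ("fl", "id,n_type,n_name"),
]
url = build_select_url("http://localhost:8091/solr/ing-content/select/", params)
print(url)
```

Passing a list of pairs instead of a dict is what lets both fq values through;
urlencode percent-encodes "{", "!", "}", "=" and ":" and turns spaces into "+".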

My configured edismax-based request handler is named "partner-tmo", and with
debugQuery=true I can see that the main query is parsed by that handler:

<str name="parsedquery_toString">+(n_pathway_namePartial:brca1^4.25 | 
n_pathway_name:brca1^8.5 | n_macromolecule_id:brca1^9.0 | n_m_s_macc:brca1^6.0 
| n_go_id:brca1^6.0 | n_go_term:brca1^4.0 | n_cellreg_regulates:brca1 | 
n_acc_id_sp:brca1^9.5 | n_m_s_mseq:brca1^6.0 | n_namePartial:brca1^5.0 | 
n_synonymPartial:brca1^4.85 | n_neighborof_process:brca1^2.0 | 
n_acc_id_rs_mrna:brca1^9.5 | n_tissue_typePartial:brca1^4.0 | n_c_iupac:brca1 | 
n_member_name:brca1^9.7 | n_c_cas_number:brca1^2.0 | n_c_pubchem_cid:brca1^4.0 
| n_protein_family:brca1^7.0 | n_cellreg_regulated_by:brca1 | 
n_go_termPartial:brca1 | n_function:brca1^7.0 | n_ref_author:brca1 | 
n_name:brca1^9.9 | n_m_s_mirbase_family_name:brca1^4.0 | 
n_protein_familyPartial:brca1^3.5 | n_type:brca1^2.0 | p_name:brca1^8.0 | 
n_m_s_mname:brca1^6.0 | n_c_systematic:brca1^2.0 | n_ref_source_id:brca1^4.0 | 
n_macromolecule_name:brca1^9.8 | n_c_formula:brca1^4.0 | 
n_memberof_name:brca1^9.7 | n_neighborof_name:brca1^6.0 | p_class:brca1^0.1 | 
n_cellreg_diseasePartial:brca1^4.5 | n_m_m_sacc:brca1^6.0 | 
n_ref_title:brca1^1.1 | n_m_acc:brca1^9.5 | n_acc_id_ug:brca1^9.5 | 
n_cellreg_binds:brca1 | n_synonym:brca1^9.7 | n_acc_id:brca1^9.5 | 
n_macromolecule_namePartial:brca1^4.9 | n_macromolecule_summary:brca^0.6 | 
n_m_s_mirbase_comments:brca1^0.6 | p_description:brca^7.0 | 
n_m_m_sname:brca1^6.0 | p_nameExact:brca1^10.0 | 
n_m_s_mirbase_family_acc:brca1^8.0 | n_tissue_type:brca1^7.0 | 
n_eg_id:brca1^9.5 | n_cellreg_disease:brca1^9.0 | n_typePartial:brca1^3.25 | 
p_classExact:brca1^1.5 | n_m_rna_target_name:brca1^3.0 | n_m_m_sseq:brca1^6.0 | 
n_acc_id_rs_prot:brca1^9.5 | n_m_seq:brca1^9.0 | 
n_cellreg_role_in_cellPartial:brca1^3.75 | n_memberof_namePartial:brca1^4.85 | 
n_cellreg_role_in_cell:brca1^7.5)~0.1 ()</str>
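
The "~0.1" at the end of the parsed query is the DisMax tie-breaker (the
handler's tie parameter, which appears to be set to 0.1 in this configuration).
Lucene's DisjunctionMaxQuery scores a document by its best-matching clause plus
the tie-breaker times the scores of its other matching clauses, with the "^"
boosts already applied inside each clause. A small Python sketch of just that
combination step; the clause scores in the usage line are invented:

```python
def dismax_score(clause_scores: list[float], tie: float = 0.1) -> float:
    """Combine per-field clause scores like a DisjunctionMaxQuery:
    the best score plus `tie` times the sum of the other matching scores."""
    if not clause_scores:
        return 0.0  # no field matched, so the document does not score
    best = max(clause_scores)
    return best + tie * (sum(clause_scores) - best)


# Hypothetical boosted scores for a document matching brca1 in three fields.
print(dismax_score([2.0, 1.0, 0.5]))
```

With tie=0 only the best field counts (a pure max); with tie=1 the scores simply
add up. A value like 0.1 mostly rewards the single best field while still giving
a small edge to documents that match in several fields.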


I know, that's a lot of fields to search. :)  Anyway, I'm still working on
figuring out the join results. According to the debug output, it is doing
something:

<lst name="join">
    <lst name="{!join from=conceptId to=id fromIndex=partner-tmo}*:*">
        <long name="time">737</long>
        <int name="fromSetSize">1593981</int>
        <int name="toSetSize">63021</int>
        <int name="fromTermCount">63021</int>
        <long name="fromTermTotalDf">63021</long>
        <int name="fromTermDirectCount">62351</int>
        <int name="fromTermHits">63021</int>
        <long name="fromTermHitsTotalDf">63021</long>
        <int name="toTermHits">63021</int>
        <long name="toTermHitsTotalDf">63021</long>
        <int name="toTermDirectCount">62871</int>
        <int name="smallSetsDeferred">150</int>
        <long name="toSetDocsAdded">63021</long>
    </lst>
</lst>
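
In that output, fromSetSize appears to be the number of documents matched by the
inner query (*:*) in the fromIndex core, and toSetSize the number of documents
in this core that the join lets through: about 1.59 million documents collapse
to 63,021. When comparing several runs it can help to pull those numbers out
programmatically. A minimal Python sketch with xml.etree.ElementTree, working on
an abbreviated copy of the block above (the helper join_stats is hypothetical):

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the join section of the debug output above.
DEBUG_XML = """
<lst name="join">
    <lst name="{!join from=conceptId to=id fromIndex=partner-tmo}*:*">
        <long name="time">737</long>
        <int name="fromSetSize">1593981</int>
        <int name="toSetSize">63021</int>
        <int name="fromTermCount">63021</int>
    </lst>
</lst>
"""


def join_stats(xml_text: str) -> dict[str, dict[str, int]]:
    """Map each join query string to its numeric debug statistics."""
    root = ET.fromstring(xml_text.strip())
    return {join.get("name"): {stat.get("name"): int(stat.text) for stat in join}
            for join in root.findall("lst")}


for query, stats in join_stats(DEBUG_XML).items():
    print(query)
    print(f"  fromSetSize={stats['fromSetSize']} "
          f"toSetSize={stats['toSetSize']} time={stats['time']}")
```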

I'm not sure how much this helps you, but it looks like you can combine a join
(at least as a filter query) with an [e]dismax main query.

Cheers,

Jeff

On Dec 6, 2011, at 11:20 AM, Pascal Dimassimo wrote:

> Hi,
> 
> I was trying Solr Join across 2 cores on the same Solr installation. For
> example:
> 
> /solr/index1/select?q={!join fromIndex=index2 from=tag to=tag}restaurant
> 
> My understanding is that the "restaurant" query will be executed on index2
> and the results of this query will be joined with the documents of index1
> by matching the "tag" field.
> 
> According to my tests, it looks like the "restaurant" query is always
> parsed using the Lucene QParser. I did not find a way to use another
> QParser, like Dismax. Am I right, or is there a way?
> 
> Thanks!
> 
> -- 
> Pascal Dimassimo
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
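
Pascal's reading of the cross-core join (run "restaurant" against index2, then
return the index1 documents whose tag matches a tag of one of those hits)
behaves like a semi-join, or an SQL IN subquery. The following in-memory Python
model illustrates only those semantics; it is not how Solr implements the join,
it ignores scoring, and the documents and the matches predicate standing in for
the inner query are made up:

```python
from typing import Callable

Doc = dict[str, list[str]]  # field name -> list of (possibly multiple) values


def join_docs(from_docs: list[Doc], to_docs: list[Doc], from_field: str,
              to_field: str, matches: Callable[[Doc], bool]) -> list[Doc]:
    """Return the to_docs sharing a join key with some matching from_doc."""
    # Step 1: evaluate the inner query on the "from" core and collect every
    # value of from_field in the matching documents.
    keys = {value
            for doc in from_docs if matches(doc)
            for value in doc.get(from_field, [])}
    # Step 2: keep the "to" documents whose to_field shares at least one key.
    return [doc for doc in to_docs if keys.intersection(doc.get(to_field, []))]


# Hypothetical documents; only the field name "tag" comes from the example.
index2 = [{"id": ["a"], "tag": ["restaurant", "pizza"]},
          {"id": ["b"], "tag": ["museum"]}]
index1 = [{"id": ["1"], "tag": ["pizza"]},
          {"id": ["2"], "tag": ["museum"]},
          {"id": ["3"], "tag": ["restaurant"]}]

# Stand-in for the inner query "restaurant".
hits = join_docs(index2, index1, "tag", "tag", lambda doc: "restaurant" in doc["tag"])
print([doc["id"][0] for doc in hits])
```

Document 1 is returned even though it is not tagged "restaurant": index2
document "a" carries both tags, so "pizza" becomes a join key too.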



--
Jeff Schmidt
535 Consulting
j...@535consulting.com
http://www.535consulting.com
(650) 423-1068
