RE: cache implementation?

cbuxbaum Thu, 23 Jul 2015 15:13:50 -0700

Hi Shawn,

Thanks for your help.

I settled on the following solution, which I am now testing:

<entity name="LEAP_PARTY" pk="LEAP_PARTY_ID"
        query="SELECT DISTINCT 'LEAP_PARTY' AS DOCUMENT_TYPE,
                      VPARTY.OWNER AS PARTY_OWNER,
                      VPARTY.PARTY_ID AS PARTY_PARTY_ID,
                      VPARTY.PARTY_ID AS LEAP_PARTY_ID,
                      VPARTY.OWNER AS LEAP_PARTY_OWNER,
                      VPARTY.PARTY_ID||'_'||VPARTY.OWNER AS LEAP_PARTY_KEY
               FROM VPARTY">
    <field name="LEAP_PARTY_ID" column="LEAP_PARTY_ID" />
    <field name="LEAP_PARTY_OWNER" column="LEAP_PARTY_OWNER" />
    <field name="PARTY.PARTY_ID" column="PARTY_PARTY_ID" />
    <field name="PARTY.OWNER" column="PARTY_OWNER" />

    <entity name="leap_party_offer_h" pk="LEAP_PARTY_ID"
            query="SELECT DISTINCT OFFER.REQUEST_NO AS OFFER_REQUEST_NO,
                          OFFER.OWNER AS OFFER_OWNER,
                          OFFER.OFFER_NO AS OFFER_OFFER_NO,
                          OFFER.SUPPLIER||'_'||OFFER.OWNER AS OFFER_KEY,
                          OFFER.MODIFY_TS
                   FROM OFFER"
            processor="CachedSqlEntityProcessor"
            where="OFFER_KEY=LEAP_PARTY.LEAP_PARTY_KEY">
        <field name="OFFER.REQUEST_NO" column="OFFER_REQUEST_NO" />
        <field name="OFFER.OWNER" column="OFFER_OWNER" />
        <field name="OFFER.OFFER_NO" column="OFFER_OFFER_NO" />
    </entity>
</entity>

The complication that prevented me from using the cache was that we rely on a
combination of fields for a key, whereas the caching in DIH assumes a single
key. So I am building a composite key from the fields that I need to join on,
and that key is used to qualify and look up the results in the cache of the
left-hand side.
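
To illustrate the idea outside of Solr, here is a minimal Python sketch of a single-key cache fed with a composite key. It is not DIH's actual code; the function names (composite_key, build_cache, offers_for) and the sample rows are made up for the example, and only the '_' concatenation mirrors the SQL above.

```python
from collections import defaultdict


def composite_key(*parts):
    """Join the key parts with '_', like ||'_'|| in the SQL above."""
    return "_".join(str(p) for p in parts)


def build_cache(child_rows, key_column):
    """Read the child rows once and index them by one key column."""
    cache = defaultdict(list)
    for row in child_rows:
        cache[row[key_column]].append(row)
    return cache


# Made-up sample rows standing in for the OFFER and VPARTY tables.
offers = [
    {"OFFER_NO": 1, "SUPPLIER": "P1", "OWNER": "A"},
    {"OFFER_NO": 2, "SUPPLIER": "P1", "OWNER": "A"},
    {"OFFER_NO": 3, "SUPPLIER": "P1", "OWNER": "B"},
]
for offer in offers:
    # The cache only understands one key, so fold both fields into it.
    offer["OFFER_KEY"] = composite_key(offer["SUPPLIER"], offer["OWNER"])

parties = [
    {"PARTY_ID": "P1", "OWNER": "A"},
    {"PARTY_ID": "P1", "OWNER": "B"},
    {"PARTY_ID": "P2", "OWNER": "A"},
]

cache = build_cache(offers, "OFFER_KEY")


def offers_for(party):
    """Look up cached child rows for one parent row, without a new query."""
    key = composite_key(party["PARTY_ID"], party["OWNER"])
    return cache.get(key, [])


for party in parties:
    print(party, [o["OFFER_NO"] for o in offers_for(party)])
```

One thing to watch with this approach: if either field can itself contain '_', two different field pairs can produce the same composite key, so a separator that cannot occur in the data is safer.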

Big improvement.

Thanks,

Carl Buxbaum
Software Architect
TradeStone Software
17 Rogers St. Suite 2; Gloucester, MA 01930
P: 978-515-5128 F: 978-281-0673
http://www.tradestonesoftware.com/

From: Shawn Heisey-2 [via Lucene] 
[mailto:ml-node+s472066n4218929...@n3.nabble.com]
Sent: Thursday, July 23, 2015 6:07 PM
To: Carl Buxbaum <cbuxb...@tradestonesoftware.com>
Subject: Re: cache implementation?

On 7/23/2015 10:55 AM, cbuxbaum wrote:
> Say we have 1000000 party records.  Then the child SQL will be run 1000000
> times (once for each party record).  Isn't there a way to just run the child
> SQL on all of the party records at once with a join, using a GROUP BY and
> ORDER BY on the PARTY_ID?  Then the results from that query could easily be
> placed in SOLR according to the primary key (party_id).  Is there some part
> of the Data Import Handler that operates that way?

Using a well-crafted SQL JOIN is almost always going to be better for
dataimport than nested entities.  The heavy lifting is done by the
database server, using code that's extremely well-optimized for that
kind of lifting.  Doing what you describe with a parent entity and one
nested entity (that is not cached) will result in 1000001 total SQL
queries.  A million SQL queries, no matter how fast each one is, will be
slow.

If you can do everything in a single SQL query with JOIN, then Solr will
make exactly one SQL query to the server for a full-import.
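
The difference in query counts can be sketched with a scaled-down example. This is only an illustration using Python's sqlite3 module with hypothetical party and offer tables (1000 parents instead of 1000000); it does not use Solr or DIH, and the function names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE party (party_id INTEGER PRIMARY KEY);
    CREATE TABLE offer (offer_no INTEGER PRIMARY KEY, party_id INTEGER);
""")
conn.executemany("INSERT INTO party VALUES (?)", [(i,) for i in range(1000)])
conn.executemany(
    "INSERT INTO offer VALUES (?, ?)", [(i, i % 1000) for i in range(3000)]
)


def nested_import(conn):
    """One parent query plus one child query per parent row (uncached)."""
    queries = 0
    docs = {}
    parents = conn.execute("SELECT party_id FROM party").fetchall()
    queries += 1
    for (party_id,) in parents:
        rows = conn.execute(
            "SELECT offer_no FROM offer WHERE party_id = ? ORDER BY offer_no",
            (party_id,),
        ).fetchall()
        queries += 1
        docs[party_id] = [r[0] for r in rows]
    return docs, queries


def joined_import(conn):
    """A single JOIN query; rows are grouped by parent on the client."""
    docs = {}
    rows = conn.execute(
        "SELECT p.party_id, o.offer_no FROM party p "
        "LEFT JOIN offer o ON o.party_id = p.party_id "
        "ORDER BY p.party_id, o.offer_no"
    )
    for party_id, offer_no in rows:
        docs.setdefault(party_id, [])
        if offer_no is not None:
            docs[party_id].append(offer_no)
    return docs, 1


nested_docs, nested_queries = nested_import(conn)
joined_docs, joined_queries = joined_import(conn)
print(nested_queries, joined_queries)
```

Both functions build the same documents; the nested version just issues one extra query per parent row to get there.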

For my own dataimport, I use a view that was defined on the MySQL server
by the dbadmin.  The view does all the JOINs we require.

Solr's dataimport handler doesn't have any intelligence to do the join
locally.  It would be cool if it did, but somebody would have to write
the code to teach it how.  Because the DB server itself can already do
JOINs, and it can do them VERY well, there's really no reason to teach
it to Solr.

Thanks,
Shawn

--
View this message in context: 
http://lucene.472066.n3.nabble.com/RE-cache-implemetation-tp4218930.html
Sent from the Solr - User mailing list archive at Nabble.com.
