How long do those query take to execute and return all it's rows
outside of DataImportHandler?

I'd bring those queries into SQL Developer and get an explain plan on
them to find out if any of them are much slower than the other.

You might have "only" 700k documents for your index, but you're
issuing a separate query for every entity for every document. Multiply
that number of queries times the average round trip latency to your
database and that's the amount of time your app server and database
server spend sitting around doing nothing, waiting for messages to
arrive. If you can remove any of those entities in favor of joins,
you'll be doing yourself a favor.

Michael Della Bitta

------------------------------------------------
Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Wed, Apr 24, 2013 at 3:25 PM, srinalluri <nallurisr...@yahoo.com> wrote:
> Hi,
>
> Environment is Solr 3.6.1. The database is having enough indexes. The box is
> having enough memory. The DB is performance is good. Auto commit is enabled
> for every 1 minute.
> Please see the following entity. The full-import of this entity is taking
> over 48 hours to complete on production environment. The number records in
> the main table is around 700,000 only.  I tried materialized view, but that
> view is having duplicate records. So I can't go with materialized view for
> all these queries.
> Can someone please suggest how to improve the performance for full-import?
>
> <entity name="oracle-article" dataSource="oracle" pk="VCMID"
> preImportDeleteQuery="content_type:article AND repository:oracleqa"
> query="select ID as VCMID from tab_story2 order by published_date desc"
> deltaImportQuery="select '${dataimporter.delta.VCMID}' as VCMID from dual"
> deltaQuery="select s2.ID as VCMID from tab_story2 s2, gnasmomap mms2,
> gnasmometadata mmd where s2.id = mms2.keystring1 and mms2.recordid =
> mmd.contentmgmtid and mmd.lastpublishdate > ((CAST(SYS_EXTRACT_UTC(TIMESTAMP
> '${dataimporter.oracle-article.last_index_time}') AS DATE) -
> TO_DATE('01/01/1970 00:00:00', 'MM-DD-YYYY HH24:MI:SS')) * 24 * 60 * 60 *
> 1000)-300000">
> <entity name="recordid" dataSource="oracle"
> transformer="TemplateTransformer" query="select RECORDID from gnasmomap
> where keystring1 = '${oracle-article.VCMID}'">
> <field column="content_type" template="article"/>
> <field column="RECORDID" name="native_id"/>
> <field column="repository" template="oracleqa"/>
> </entity>
> <entity name="article_details" dataSource="oracle"
> transformer="ClobTransformer,RegexTransformer,script:trimTicker,script:hasBody,script:hasDeck"
> query="select STORY_TITLE, STORY_HEADLINE, SOURCE, DECK,
> regexp_replace(body, '\<p\>\[(pullquote|summary)\]\</p\>|\[video
> [0-9]+?\]|\[youtube .+?\]', '') as BODY, PUBLISHED_DATE, MODIFIED_DATE,
> DATELINE, REPORTER_NAME, TICKER_CODES,ADVERTORIAL_CONTENT from tab_story2
> where id = '${oracle-article.VCMID}'">
> <field column="STORY_TITLE" name="title"/>
> <field column="DECK" name="description" clob="true"/>
> <field column="PUBLISHED_DATE" name="date"/>
> <field column="MODIFIED_DATE" name="last_modified_date"/>
> <field column="BODY" name="body" clob="true"/>
> <field column="SOURCE" name="source"/>
> <field column="DATELINE" name="dateline"/>
> <field column="STORY_HEADLINE" name="export_headline"/>
> <field column="ticker" splitBy="," sourceColName="TICKER_CODES"/>
> <field column="ADVERTORIAL_CONTENT" name="advertorial_content"/>
> <field column="has_body" sourceColName="body"/>
> <field column="has_description" sourceColName="description"/>
> </entity>
> <entity name="site" dataSource="oracle" query="select CASE WHEN
> site.name='fq2' THEN 'fqn' WHEN site.name='fq' THEN 'sbc' WHEN
> site.name='fq-lat' THEN 'latino' ELSE 'gc' END SITE, CASE WHEN
> site.name='fq2' THEN 'v8-qa.tabbusiness.com' WHEN site.name='fb' THEN
> 'v8-qa.smallbusiness.tabbusiness.com' WHEN site.name='qn-latino' THEN
> 'v8-qa.latino.tabdays.com' ELSE 'v8-qa.tabdays.com' END SERVER from
> gnasmomap mm, gnaschannelfileassociation cfa, gnaschannel ch, gnassite site
> where mm.keystring1 = '${oracle-article.VCMID}' and mm.recordid =
> cfa.vcmobjectid and cfa.channelid = ch.id and ch.siteid = site.id and rownum
> = 1">
> <field column="SITE" name="site"/>
> <entity name="url" dataSource="oracle" query="select 'http://' ||
> '${site.SERVER}' || furl as URL from tab_furl where parent_id =
> '${oracle-article.VCMID}'">
> <field column="URL" name="url"/>
> </entity>
> <entity name="image" dataSource="oracle" transformer="script:hasImageURL"
> query="select distinct('http://qa.global.fqstatic.com' || sourcepath) as
> IMAGE_URL from ( select mc.sourcepath from tab_rel_content rc, tab_story2
> st, gnasmomap mm, dsx_media_common mc where rc.parent_id =
> '${oracle-article.VCMID}' and rc.parent_id = st.id and (st.NO_FEATURED_MEDIA
> != 'yes' OR st.NO_FEATURED_MEDIA is null) and rc.ref_id = mm.recordid and
> mm.keystring1 = mc.mediaid and rc.rank = 1 union all select mc.sourcepath
> from tab_rel_content arm, tab_story2 st, gnasmomap cmm, tab_rel_media crm,
> gnasmomap mmm, dsx_media_common mc where arm.parent_id =
> '${oracle-article.VCMID}' and arm.parent_id = st.id and
> (st.NO_FEATURED_MEDIA !='yes' OR st.NO_FEATURED_MEDIA is null) and
> arm.ref_id = cmm.recordid and cmm.keystring1 = crm.parent_id and crm.rank =
> 1 and crm.ref_id = mmm.recordid and mmm.keystring1 = mc.mediaid and arm.rank
> = 1)">
> <field column="IMAGE_URL" name="image_url"/>
> <field column="has_image_url" sourceColName="IMAGE_URL"/>
> </entity>
> </entity>
> <entity name="taxonomy" dataSource="oracle" query="select tc.PATH from
> gnasmomap mm, gndaassociation ass, gndataxonomycategory tc where mm.recordid
> = ass.cmsobjectid and ass.categoryid = tc.id and mm.keystring1 =
> '${oracle-article.VCMID}'">
> <field column="PATH" name="taxonomy_path"/>
> </entity>
> <entity name="keyword" dataSource="oracle"
> transformer="RegexTransformer,script:trimKeyword" query="select KEYWORDS
> from tab_rel_metadata where parent_id = '${oracle-article.VCMID}'">
> <field column="keyword" splitBy="," sourceColName="KEYWORDS"/>
> </entity>
> <entity name="author" dataSource="oracle" query="select pmm.recordid as
> author_id, trim(trim(trailing ',' from trim(regexp_replace(p.first_name || '
> ' || p.last_name || ', ' || p.suffix,'( ){2,}', ' ')))) as author from
> tab_person p, gnasmomap pmm, tab_rel_reporter rr where p.id = pmm.keystring1
> and pmm.recordid = rr.ref_id and rr.parent_id =
> '${oracle-article.VCMID}'"></entity>
> <entity name="channels" dataSource="oracle" query="select CASE WHEN
> cp.site='fb2' THEN 'fbn' WHEN cp.site='fb' THEN 'sbc' WHEN
> cp.site='qn-latino' THEN 'latino' ELSE 'gc' END || cp.path as PATH,
> cp.isa_path ISA_PATH from gnasmomap mm, gnaschannelfileassociation cfa,
> tab_mview_channelpath cp where mm.recordid = cfa.vcmobjectid and
> cfa.channelid = cp.vcmid and mm.keystring1 = '${oracle-article.VCMID}'">
> <field column="PATH" name="section_path"/>
> <field column="ISA_PATH" name="section_path"/>
> </entity>
> </entity>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/full-import-takes-4-days-48-hours-to-complete-where-main-db-table-size-700k-only-tp4058690.html
> Sent from the Solr - User mailing list archive at Nabble.com.

Reply via email to