Perhaps there is potential to optimize with some PLSQL functions on Oracle side 
to do as much work within database as possible and have the text indexers only 
access a view referencing that function. Also, the obvious optimization is a 
record-updated timestamp so that every time indexer runs, only changed data is 
managed.

-----Original Message-----
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Wednesday, 16 August 2017 5:42 a.m.
To: solr-user@lucene.apache.org
Subject: Re: Optimizing Dataimport from Oracle; cursor sharing; changing oracle 
session parameters

On 8/15/2017 8:09 AM, Mannott, Birgit wrote:
> I'm using solr 6.6.0 and I have to do a complex data import from an oracle db 
> concerning 3.500.000 data rows.
> For each row I have 15 additional entities. That means that more than 52 
> Million selects are send to the database.
> For every select that is done I optimized the oracle execution path by 
> creating indizes.
> The execution plans are ok.
> But the import still lasts 12 hours.

I think the reason it takes 12 hours is because there are 52 million SELECT 
statements.  That many statements over 12 hours is an average of
1200 per second.  This sounds like pretty good database performance.

> Is there a way to execute a command like "ALTER SESSION SET cursor_sharing = 
> FORCE;" after getting the connection for processing an entity?

I think that most JDBC drivers (by default) don't allow multiple SQL statements 
to be sent in a single request, so commands like "SELECT FOO; SELECT BAR" won't 
work.  The idea behind denying this kind of command is protection against SQL 
injection attacks.  There is likely a JDBC URL parameter for the Oracle driver 
that would allow that ... and if there is, then you could add that to the 
connection URL in the DIH config to allow putting the ALTER SESSION statement 
before SELECT in your DIH entity.

The Oracle driver might also have a JDBC URL parameter to turn on the cursor 
sharing you're interested in.  That would be the best way to handle it, if that 
is an option.  You're going to need to consult Oracle documentation or an 
Oracle support resource to find out what URL parameter options there are for 
their driver.

I have near zero experience with Oracle databases, but I suspect that even with 
cursor sharing, you're still going to have the sheer number of SELECT 
statements as a bottleneck.  If there is a performance improvement, it probably 
won't be dramatic.

Thanks,
Shawn

Notice: This email and any attachments are confidential and may not be used, 
published or redistributed without the prior written consent of the Institute 
of Geological and Nuclear Sciences Limited (GNS Science). If received in error 
please destroy and immediately notify GNS Science. Do not copy or disclose the 
contents.

Reply via email to