Btw, I removed the batchSize but performance is better with
batchSize=10000. I haven't done further testing to see what the best
setting is, but the difference between setting it at 10000 and not
setting it is almost double the indexing time (~20 minutes vs ~37
minutes)

On Thu, Jun 14, 2012 at 4:49 PM, Jasper Floor <jasper.fl...@m4n.nl> wrote:
> Actually, the readOnly=true makes things worse.
> What it does (among other things) is:
>            c.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);
>
> which leads to:
> Caused by: org.postgresql.util.PSQLException: Cannot change
> transaction isolation level in the middle of a transaction.
>
> because the connection is idle in transaction.
>
> I found this issue:
> https://issues.apache.org/jira/browse/SOLR-2045
>
> Patching DIH with the code they suggest seems to work.
>
> mvg,
> Jasper
>
> On Thu, Jun 14, 2012 at 4:36 PM, Dyer, James <james.d...@ingrambook.com> 
> wrote:
>> Try readOnly="true" in the dataSource configuration.  This causes several 
>> defaults to get set in the JDBC connection, and often will solve problems 
>> like this. (see 
>> http://wiki.apache.org/solr/DataImportHandler#Configuring_JdbcDataSource)  
>> Also, try a batch size of 0 to let your jdbc driver pick what it thinks is 
>> optimal.  This might be better than 10000.
>>
>> There is also an issue in that it doesn't explicitly close the resultset but 
>> relies on closing the connection to implicily close the child objects.  I 
>> know when I tried using DIH with Derby a while back this had at the least 
>> caused some log warnings, and it wouldn't work at all without 
>> readOnly=false.  Not sure abour PostgreSql.
>>
>> James Dyer
>> E-Commerce Systems
>> Ingram Content Group
>> (615) 213-4311
>>
>>
>> -----Original Message-----
>> From: Jasper Floor [mailto:jasper.fl...@m4n.nl]
>> Sent: Thursday, June 14, 2012 8:21 AM
>> To: solr-user@lucene.apache.org
>> Subject: DIH idle in transaction forever
>>
>> Hi all,
>>
>> It seems that DIH always holds two connections open to the database.
>> One of them is almost always 'idle in transaction'. It may sometimes
>> seem to do a little work but then it goes idle again.
>>
>>
>> datasource definition:
>>        <dataSource name="df-stream-store-ds"
>> jndiName="java:ext_solr_datafeeds_dba" type="JdbcDataSource"
>> autoCommit="false" batchSize="10000" />
>>
>> We have a datasource defined in the jndi:
>>        <no-tx-datasource>
>>                <jndi-name>ext_solr_datafeeds_dba</jndi-name>
>>                
>> <security-domain>ext_solr_datafeeds_dba_realm</security-domain>
>>                
>> <connection-url>jdbc:postgresql://db1.live.mbuyu.nl/datafeeds</connection-url>
>>                <min-pool-size>0</min-pool-size>
>>                <max-pool-size>5</max-pool-size>
>>                
>> <transaction-isolation>TRANSACTION_READ_COMMITTED</transaction-isolation>
>>                <driver-class>org.postgresql.Driver</driver-class>
>>                <blocking-timeout-millis>30000</blocking-timeout-millis>
>>                <idle-timeout-minutes>5</idle-timeout-minutes>
>>                <new-connection-sql>SELECT 1</new-connection-sql>
>>                <check-valid-connection-sql>SELECT 
>> 1</check-valid-connection-sql>
>>        </no-tx-datasource>
>>
>>
>> If we set autocommit to true then we get an OOM on indexing so that is
>> not an option.
>>
>> Does anyone have any idea why this happens? I would guess that DIH
>> doesn't close the connection, but reading the code I can't be sure of
>> this. The ResultSet object should close itself once it reaches the
>> end.
>>
>> mvg,
>> JAsper

Reply via email to