Btw, I removed the batchSize but performance is better with batchSize=10000. I haven't done further testing to see what the best setting is, but the difference between setting it at 10000 and not setting it is almost double the indexing time (~20 minutes vs ~37 minutes)
On Thu, Jun 14, 2012 at 4:49 PM, Jasper Floor <jasper.fl...@m4n.nl> wrote: > Actually, the readOnly=true makes things worse. > What it does (among other things) is: > c.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED); > > which leads to: > Caused by: org.postgresql.util.PSQLException: Cannot change > transaction isolation level in the middle of a transaction. > > because the connection is idle in transaction. > > I found this issue: > https://issues.apache.org/jira/browse/SOLR-2045 > > Patching DIH with the code they suggest seems to work. > > mvg, > Jasper > > On Thu, Jun 14, 2012 at 4:36 PM, Dyer, James <james.d...@ingrambook.com> > wrote: >> Try readOnly="true" in the dataSource configuration. This causes several >> defaults to get set in the JDBC connection, and often will solve problems >> like this. (see >> http://wiki.apache.org/solr/DataImportHandler#Configuring_JdbcDataSource) >> Also, try a batch size of 0 to let your jdbc driver pick what it thinks is >> optimal. This might be better than 10000. >> >> There is also an issue in that it doesn't explicitly close the resultset but >> relies on closing the connection to implicily close the child objects. I >> know when I tried using DIH with Derby a while back this had at the least >> caused some log warnings, and it wouldn't work at all without >> readOnly=false. Not sure abour PostgreSql. >> >> James Dyer >> E-Commerce Systems >> Ingram Content Group >> (615) 213-4311 >> >> >> -----Original Message----- >> From: Jasper Floor [mailto:jasper.fl...@m4n.nl] >> Sent: Thursday, June 14, 2012 8:21 AM >> To: solr-user@lucene.apache.org >> Subject: DIH idle in transaction forever >> >> Hi all, >> >> It seems that DIH always holds two connections open to the database. >> One of them is almost always 'idle in transaction'. It may sometimes >> seem to do a little work but then it goes idle again. >> >> >> datasource definition: >> <dataSource name="df-stream-store-ds" >> jndiName="java:ext_solr_datafeeds_dba" type="JdbcDataSource" >> autoCommit="false" batchSize="10000" /> >> >> We have a datasource defined in the jndi: >> <no-tx-datasource> >> <jndi-name>ext_solr_datafeeds_dba</jndi-name> >> >> <security-domain>ext_solr_datafeeds_dba_realm</security-domain> >> >> <connection-url>jdbc:postgresql://db1.live.mbuyu.nl/datafeeds</connection-url> >> <min-pool-size>0</min-pool-size> >> <max-pool-size>5</max-pool-size> >> >> <transaction-isolation>TRANSACTION_READ_COMMITTED</transaction-isolation> >> <driver-class>org.postgresql.Driver</driver-class> >> <blocking-timeout-millis>30000</blocking-timeout-millis> >> <idle-timeout-minutes>5</idle-timeout-minutes> >> <new-connection-sql>SELECT 1</new-connection-sql> >> <check-valid-connection-sql>SELECT >> 1</check-valid-connection-sql> >> </no-tx-datasource> >> >> >> If we set autocommit to true then we get an OOM on indexing so that is >> not an option. >> >> Does anyone have any idea why this happens? I would guess that DIH >> doesn't close the connection, but reading the code I can't be sure of >> this. The ResultSet object should close itself once it reaches the >> end. >> >> mvg, >> JAsper