Hi,
We experienced a similar problem over the last few weeks. Last Tuesday, after
copious debugging, we found the cause of it, at least in our case.
We are using exclusive locking with no caching, so it is fairly easy for us to
reproduce.
We use the following style of object mapping, which results in multiple
objects being retrieved in a single access:
-- snip --
<class name="user.jdo.CastorUser" identity="id">
  <map-to table="us_user" />
  <field name="id" type="long">
    <sql name="user_uid" type="bigint" />
  </field>
  <field name="username" type="string">
    <sql name="usu_username" type="varchar" />
  </field>
  <field name="password" type="string">
    <sql name="usu_password" type="varchar" />
  </field>
  <field name="parent" type="user.jdo.CastorGroup">
    <sql name="parent_uid" />
  </field>
-- snip --
So in the snippet above, when a user is loaded, its parent group is loaded
as well.
Our problem arose when a deadlock exception occurred while retrieving
the parent of a user, which happens automatically thanks to the above
object mapping. A LockNotGrantedException was thrown and the transaction was
backed out, but the user object remained locked until we restarted the
server.
It turns out that because the exception was thrown while pulling the
parent in "automatically", the LockNotGrantedException propagated
back up to the attempt to load the user, due to the nesting of the function
calls. However, the user object had been locked successfully; it was the
underlying parent that failed. Unfortunately there is nothing in the code to
detect that it was a nested call that failed, so the user object is never
unlocked. The next time we attempt to do anything as this user, we get
LockNotGrantedExceptions all over the place!
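To make the failure mode concrete, here is a toy model of it in plain Java. None of this is Castor code: the lock table, the `Tx` class, and the `LockNotGranted` exception are hypothetical stand-ins, and the model simply assumes that a rollback only releases objects whose load completed. It is a sketch of the nesting problem as described above, not a description of Castor's internals.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class NestedLockLeakDemo {

    /** Hypothetical stand-in for Castor's LockNotGrantedException. */
    static class LockNotGranted extends Exception {
        LockNotGranted(String msg) { super(msg); }
    }

    /** Toy lock table: object id -> name of the transaction holding the lock. */
    static final Map<String, String> lockTable = new HashMap<>();

    static void acquire(String tx, String oid) throws LockNotGranted {
        String owner = lockTable.get(oid);
        if (owner != null && !owner.equals(tx)) {
            throw new LockNotGranted(oid + " is held by " + owner);
        }
        lockTable.put(oid, tx);
    }

    /** Toy transaction; assumes rollback only knows about completed loads. */
    static class Tx {
        final String name;
        final Set<String> loaded = new HashSet<>();

        Tx(String name) { this.name = name; }

        void load(String oid, String parentOid) throws LockNotGranted {
            acquire(name, oid);          // outer lock is granted
            if (parentOid != null) {
                load(parentOid, null);   // nested load may throw
            }
            loaded.add(oid);             // never reached if the nested load fails
        }

        void rollback() {
            for (String oid : loaded) {
                lockTable.remove(oid);
            }
            loaded.clear();
        }
    }

    /** Returns the owner of the user lock after a failed load and rollback. */
    static String run() {
        lockTable.clear();
        Tx other = new Tx("tx-other");
        Tx ours = new Tx("tx-ours");
        try {
            other.load("group:1", null);      // someone else holds the parent group
            ours.load("user:42", "group:1");  // user locks fine, parent load fails
        } catch (LockNotGranted e) {
            ours.rollback();                  // releases nothing: user was never registered
        }
        return lockTable.get("user:42");
    }

    public static void main(String[] args) {
        System.out.println("user:42 still locked by: " + run());
    }
}
```

In this model the user lock is still owned by the rolled-back transaction after the failure, which matches the symptom we saw.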
I've worked around this for the time being by modifying the load() method in
org.exolab.castor.persist.TransactionContext.java to attempt to release the
lock when a LockNotGrantedException is caught:
-- snip --
} catch ( LockNotGrantedException except ) {
    // | it is an important one......
    // ---- KAB NEW ---
    try
    {
        // KAB - release the lock, as we may be a nested field that was
        //     - loaded and locked correctly.
        ObjectEntry tmpEntry = getObjectEntry( object );
        tmpEntry.engine.releaseLock( this, tmpEntry.oid );
    }
    catch (Exception e)
    {
        // Looks like we didn't have the lock. No matter.
        //System.out.println("Didn't have lock anyway.");
    }
    // ---- END OF KAB NEW ---
-- snip --
Thus far it's done the trick. Caveat emptor and all that: if you are not
using exclusive locking, you probably need to handle this in other methods as
well; fetch() seems a likely candidate. I've also tested this with lock
timeouts set to 10 seconds, which causes most of our concurrent transactions
to fail, and it has recovered every time.
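For anyone who wants to see the release-on-failure idea outside Castor, below is a minimal sketch using the standard java.util.concurrent locks. The class, the method names, and the `LockTimeout` exception are made up for illustration, and a 50 ms timeout stands in for a real lock timeout. The point is only the shape: if the nested lock cannot be obtained, the outer lock is released before the exception propagates.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class ReleaseOnNestedFailure {

    /** Hypothetical exception for a lock that was not granted in time. */
    static class LockTimeout extends Exception {
        LockTimeout(String msg) { super(msg); }
    }

    static void lockOrFail(ReentrantLock lock, String name, long timeoutMs)
            throws LockTimeout, InterruptedException {
        if (!lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            throw new LockTimeout("could not lock " + name);
        }
    }

    /** Locks the user, then the parent group; undoes the user lock on failure. */
    static void loadUserWithParent(ReentrantLock user, ReentrantLock group, long timeoutMs)
            throws LockTimeout, InterruptedException {
        lockOrFail(user, "user", timeoutMs);
        try {
            lockOrFail(group, "group", timeoutMs);
        } catch (LockTimeout | InterruptedException e) {
            user.unlock();   // the outer lock was ours, so give it back
            throw e;
        }
    }

    /** Returns true if the user lock is still held after a failed nested lock. */
    static boolean userLockLeaked() {
        ReentrantLock user = new ReentrantLock();
        ReentrantLock group = new ReentrantLock();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(1);

        // A second thread holds the group lock, like a competing transaction.
        Thread holder = new Thread(() -> {
            group.lock();
            try {
                held.countDown();
                done.await();
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            } finally {
                group.unlock();
            }
        });
        holder.start();

        try {
            held.await();
            try {
                loadUserWithParent(user, group, 50);
            } catch (LockTimeout expected) {
                // nested lock failed, as intended in this demo
            } finally {
                done.countDown();
                holder.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("demo interrupted", e);
        }
        return user.isLocked();
    }

    public static void main(String[] args) {
        System.out.println("user lock leaked: " + userLockLeaked());
    }
}
```

Removing the `user.unlock()` call in the catch block makes the user lock stay held after the failure, which is the generic version of the problem described earlier.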
Hope this helps.
Ken Burcham wrote: