I have a program that scans a directory and all of its subdirectories,
generating a SHA-1 hash of the first 16K of each file.
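For reference, the prefix hash step could look roughly like this with the JDK's MessageDigest. This is a minimal sketch, not the actual program: the class PrefixHash, the method prefixSha1 and reading "16K" as 16 * 1024 bytes are all assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: SHA-1 over at most the first 16 KiB of a file.
final class PrefixHash {
    // Assumption: "16K" means 16 * 1024 bytes.
    static final int PREFIX_BYTES = 16 * 1024;

    static String prefixSha1(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        try (InputStream in = Files.newInputStream(file)) {
            // readNBytes stops at end of file, so short files are hashed whole.
            byte[] prefix = in.readNBytes(PREFIX_BYTES);
            sha1.update(prefix);
        }
        return toHex(sha1.digest());
    }

    // Lower-case hex encoding, two characters per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16));
            sb.append(Character.forDigit(b & 0xF, 16));
        }
        return sb.toString();
    }
}
```

Two files that differ only after the first 16K get the same prefix hash, which is why the size check and a later full hash are still needed.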
After that finishes, I get all duplicate candidates by querying Derby with:
select count(*), precheck_sha1_hash, file_size from derbydb
where (not precheck_sha1_hash is null and not precheck_sha1_hash = '')
group by precheck_sha1_hash, file_size having count(*) > 1
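In plain Java terms, that query keeps rows with a non-empty precheck hash, groups them by (hash, size) and keeps only groups with more than one file. A rough in-memory equivalent, where FileRow, CandidateGroups and the "hash|size" key format are hypothetical names used only for illustration:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of one row of derbydb.
final class FileRow {
    final String path;
    final String precheckSha1;
    final long fileSize;

    FileRow(String path, String precheckSha1, long fileSize) {
        this.path = path;
        this.precheckSha1 = precheckSha1;
        this.fileSize = fileSize;
    }
}

final class CandidateGroups {
    // Mirrors the SQL: skip null or empty hashes, group by (precheck hash, size),
    // then keep only groups containing more than one file.
    static Map<String, List<String>> duplicateCandidates(List<FileRow> rows) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (FileRow row : rows) {
            if (row.precheckSha1 == null || row.precheckSha1.isEmpty()) {
                continue; // where not ... is null and not ... = ''
            }
            String key = row.precheckSha1 + "|" + row.fileSize;
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(row.path);
        }
        groups.values().removeIf(paths -> paths.size() < 2); // having count(*) > 1
        return groups;
    }
}
```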
I then process each candidate with a ThreadPoolExecutor (10 threads),
although I am currently running it non-threaded and get the same result:
new ReportUtility(db, miniHash, fileSize);
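A minimal sketch of the threaded dispatch, assuming ReportUtility implements Runnable (the post does not say). CandidateDispatcher and runAll are hypothetical names; Executors.newFixedThreadPool(10) returns a ThreadPoolExecutor with 10 threads.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class CandidateDispatcher {
    // Runs one task per candidate group on a fixed pool of 10 threads, waits
    // for the pool to drain, and returns how many tasks finished without throwing.
    static int runAll(List<Runnable> groupTasks) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(10);
        AtomicInteger done = new AtomicInteger();
        try {
            for (Runnable task : groupTasks) {
                pool.execute(() -> {
                    task.run();
                    done.incrementAndGet();
                });
            }
        } finally {
            pool.shutdown(); // no new tasks; queued ones still run
        }
        pool.awaitTermination(1, TimeUnit.HOURS);
        return done.get();
    }
}
```

Each task would be something like `() -> new ReportUtility(db, miniHash, fileSize).run()` under the Runnable assumption.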
Files with the same 16K hash and file size are duplicate candidates, so I
generate full hashes for these files to compare them.
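The full-hash step calls generateHash(file), whose body is not shown. A plausible stand-in, assuming it computes SHA-1 over the whole file and returns lower-case hex; FullHash and sameContent are hypothetical names:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class FullHash {
    // Hypothetical generateHash: SHA-1 over the entire file, streamed in
    // 64 KiB chunks so large files are never loaded into memory at once.
    static String generateHash(File file) throws IOException, NoSuchAlgorithmException {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        byte[] buf = new byte[64 * 1024];
        try (InputStream in = new FileInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                sha1.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : sha1.digest()) {
            hex.append(String.format("%02x", b)); // %x on a Byte prints 00..ff
        }
        return hex.toString();
    }

    // Candidates count as duplicates only if both size and full hash match.
    static boolean sameContent(File a, File b) throws IOException, NoSuchAlgorithmException {
        return a.length() == b.length() && generateHash(a).equals(generateHash(b));
    }
}
```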
The problem is that when the result set query has a WHERE clause, I cannot
update. It works fine if I remove the WHERE clause (I have tried with both
WHERE conditions and with only one). _db is the connection passed to the
constructor above:
PreparedStatement stmt = _db.prepareStatement(
    "select sha1_hash, file_path from derbydb where precheck_sha1_hash = '"
    + _miniHash + "' and file_size = '" + _fileSize + "'",
    ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_UPDATABLE);
ResultSet rs = stmt.executeQuery();
while (rs.next()) {
    File file = new File(rs.getString("file_path"));
    String hash = generateHash(file);
    // Exception thrown here when the query contains a WHERE clause
    rs.updateString("sha1_hash", hash);
    rs.updateRow();
}
rs.close();
stmt.close();
I am using Derby 10.5.1.1_201105231903 at the moment. I know you cannot
have updatable result sets with GROUP BY etc., but I would think a plain
WHERE clause should be supported. I hope I am doing something dumb here.
(For the purist critics: yes, I will turn it into a parameterized
PreparedStatement; I am just simplifying into pseudocode rather than
retyping the whole thing.)
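Since the parameterized version is mentioned anyway, the same loop could look roughly like this with placeholders. This is a sketch under assumptions: file_size is taken to be a numeric column (use setString if it is stored as text), the hasher parameter is a hypothetical stand-in for generateHash, CandidateRehasher is a hypothetical name, and nothing here is claimed to avoid the exception described above.

```java
import java.io.File;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.function.Function;

final class CandidateRehasher {
    // Placeholders instead of concatenated literals.
    static final String SELECT_SQL =
        "select sha1_hash, file_path from derbydb"
        + " where precheck_sha1_hash = ? and file_size = ?";

    // Returns the number of rows updated through the result set.
    static int rehash(Connection db, String miniHash, long fileSize,
                      Function<File, String> hasher) throws SQLException {
        int updated = 0;
        // The three-argument overload takes result set type and concurrency;
        // the two-argument (String, int) overload treats the int as autoGeneratedKeys.
        try (PreparedStatement stmt = db.prepareStatement(
                SELECT_SQL, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_UPDATABLE)) {
            stmt.setString(1, miniHash);
            stmt.setLong(2, fileSize); // assumption: numeric file_size column
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    File file = new File(rs.getString("file_path"));
                    rs.updateString("sha1_hash", hasher.apply(file));
                    rs.updateRow();
                    updated++;
                }
            }
        }
        return updated;
    }
}
```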
Thanks in advance
Michael