Re: Using Hooks To OCR Documents

2010-12-06 Thread Ryan Schmidt

On Dec 3, 2010, at 09:44, Jim Jenkins wrote:

> I’m planning to use Hooks to add OCR scanning for select documents going into 
> a SVN repo.  I’m not really sure where to start so I’m hoping someone here 
> can tell me if it’s possible and even suggest how best to proceed.
>  
> Basically I’d like to have every commit to an SVN repo stop at the pre-commit 
> (or another more suitable) hook so the submitted files can be inspected and 
> if needed run through a command line OCR engine.  We are dealing with “image” 
> based PDF files so these would be sent off to the OCR engine and a 
> “test+image” PDF would be returned.  The new PDF would replace the original 
> before being sent on it’s way into the SVN repo.


Some of this is possible, assuming that you will automate everything, including 
the process of deciding whether or not to OCR the document. (Hook scripts run 
on the server and are not interactive.)

Here's an example pre-commit hook which checks the syntax of any committed Java 
files:

http://svn.haxx.se/users/archive-2006-06/0853.shtml

You could change the criteria from "extension .java" to whatever your criteria 
are ("extension .pdf", maybe, plus some other check to see whether the PDF is 
image-based), and change the action from running checkstyle to running your OCR 
program.

What's not possible is changing the content of the incoming transaction, as you 
propose. You must either accept the transaction as-is (by returning 0 from your 
pre-commit hook script), or reject it (by returning any other number). So you 
could do that, and if an incoming PDF is image-based, reject the commit and 
inform the user they must run the OCR program on it first.
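
To make the accept-or-reject idea concrete, here is a minimal sketch of such a 
pre-commit hook in Python. It is an illustration under assumptions, not a 
tested production hook: the function names are made up for this example, and 
looks_image_only() is only a crude placeholder heuristic (it treats a PDF with 
no "/Font" marker in its raw bytes as image-only, which will misjudge PDFs 
that use compressed object streams). Only the svnlook subcommands "changed" 
and "cat", and the hook arguments (repository path, transaction name), are 
standard Subversion.

```python
#!/usr/bin/env python3
"""Pre-commit hook sketch: reject commits that add or modify image-only PDFs."""
import subprocess
import sys


def changed_pdfs(svnlook_changed_output):
    """Return paths of added or content-modified PDFs from `svnlook changed`."""
    paths = []
    for line in svnlook_changed_output.splitlines():
        # Each line is a status field ("A", "U", "D", "_U", "UU") padded to
        # four characters, followed by the path inside the repository.
        status, path = line[:4].strip(), line[4:]
        if status.startswith(("A", "U")) and path.lower().endswith(".pdf"):
            paths.append(path)
    return paths


def looks_image_only(pdf_bytes):
    """Hypothetical, crude check: no font resources means no text layer."""
    return b"/Font" not in pdf_bytes


def main(repos, txn):
    changed = subprocess.run(
        ["svnlook", "changed", "-t", txn, repos],
        check=True, capture_output=True, text=True,
    ).stdout
    rejected = []
    for path in changed_pdfs(changed):
        content = subprocess.run(
            ["svnlook", "cat", "-t", txn, repos, path],
            check=True, capture_output=True,
        ).stdout
        if looks_image_only(content):
            rejected.append(path)
    if rejected:
        # Anything written to stderr is shown to the committing user.
        sys.stderr.write("Please run OCR on these PDFs and commit again:\n")
        for path in rejected:
            sys.stderr.write("  %s\n" % path)
        return 1  # non-zero exit status rejects the commit
    return 0  # zero accepts the transaction as-is


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

The hook never touches the transaction; it only reads it and decides between 
exit status 0 and non-zero.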

I have a pre-commit script on my repository doing something similar: I run 
pngcrush on committed PNGs, and if I find a PNG that would benefit from being 
crushed, I reject the commit and tell the user to pngcrush it and then try the 
commit again.

That would be the preferred way to do things. But if it will be too difficult 
for your users to run the OCR program themselves and you want to automate the 
process server-side, an alternative is to accept the commit -- not run any of 
these checks in the pre-commit -- and run your script at post-commit time 
instead. If you detect that a just-committed revision contains an image-based 
PDF that you can OCR, then OCR it, and replace it, in a second commit initiated 
by the post-commit script.

This is trickier because the hook script might then have to manage a working 
copy (check out the directory, change the PDF to the OCR'd one, commit, delete 
the working copy). This is fraught with problems such as: What happens if the 
post-commit script decides to act on the PDF that's being committed by the 
post-commit script? (Infinite loop?) What happens if someone manages to commit 
another revision to that PDF before the hook script is done committing its 
revision? Perhaps that's not likely. But commits can fail for many reasons, 
which the script would either have to anticipate and deal with, or log or 
email failures for someone to deal with manually.

There's also the problem that a user who committed an image-based PDF would 
then immediately have an out-of-date working copy, which is not expected in 
normal Subversion usage, though you could train your users to understand this 
and recommend they run "svn up" again shortly after committing. Or, if your 
script does replace a PDF, you could inform the user via out-of-band means 
(email, instant message, etc.) that they should run "svn up".
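
One way to sidestep the infinite-loop question is to have the post-commit 
script ignore revisions it created itself. The sketch below assumes the 
script's follow-up commits are made under a dedicated account; the name 
"ocr-bot" and the function names are hypothetical. "svnlook author -r" is the 
standard way to read a revision's author. The OCR step and the second commit 
are deliberately left out.

```python
#!/usr/bin/env python3
"""Post-commit hook sketch: decide which PDFs of a revision need OCR."""
import subprocess
import sys

OCR_BOT = "ocr-bot"  # hypothetical account used only for the hook's commits


def revision_author(repos, rev):
    """Return the author of a committed revision via `svnlook author`."""
    result = subprocess.run(
        ["svnlook", "author", "-r", rev, repos],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def pdfs_to_ocr(author, changed_paths):
    """Return the PDFs to process, or nothing if the hook made this commit."""
    if author == OCR_BOT:
        return []  # our own follow-up commit: stop here, no loop
    return [p for p in changed_paths if p.lower().endswith(".pdf")]


if __name__ == "__main__" and len(sys.argv) >= 3:
    repos, rev = sys.argv[1], sys.argv[2]
    print("author of r%s: %s" % (rev, revision_author(repos, rev)))
```

This guard does nothing about the other problems listed above (races with 
other committers, failed commits, out-of-date working copies).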




Re: Using Hooks To OCR Documents

2010-12-06 Thread Ulrich Eckhardt
On Friday 03 December 2010, Jim Jenkins wrote:
> I'm planning to use Hooks to add OCR scanning for select documents going
> into a SVN repo.

I assume that you know how to OCR the docs, so this is just about SVN 
integration.

> Basically I'd like to have every commit to an SVN repo stop at the
> pre-commit (or another more suitable) hook so the submitted files can be
> inspected and if needed run through a command line OCR engine.  We are
> dealing with "image" based PDF files so these would be sent off to the
> OCR engine and a "test+image" PDF would be returned.  The new PDF would
> replace the original before being sent on it's way into the SVN repo.

I guess you meant "text+image" there, right? Anyway, what you want to do is 
possible, and you might be able to use the pre-commit hook for that, but you 
shouldn't. The thing you shouldn't do is modify commits on the server, 
because the client has no way of knowing about this: it will never be 
notified that the content of the repository differs from what it sent.

Suggestions:
1. You trigger a process that OCRs the PDF in question and then replaces the 
one in the repository or adds a second one next to it, but in a second 
commit. You could also batch this process, i.e. run it once at night or 
things like that.
2. You could simply reject the commit from a pre-commit hook if the file is 
not OCRed already. This would make it the user's responsibility to run 
the OCR on the file before committing it.

You also mentioned that you only want to scan "select[ed] documents". You 
could achieve this using a custom property that you check in one of the 
processing steps.
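
A rough sketch of such a property check, in Python, might look like the 
following. The property name "ocr:needed", the accepted values, and the 
function names are assumptions made up for this example; "svnlook propget -t" 
is the standard way to read a property inside a transaction. A path is 
treated as not selected whenever svnlook fails, which, as far as I know, 
includes the case where the property is simply not set.

```python
import subprocess

OCR_PROP = "ocr:needed"  # hypothetical custom property name


def propget_command(repos, txn, path):
    """Build the svnlook call that reads the property in a transaction."""
    return ["svnlook", "propget", "-t", txn, repos, OCR_PROP, path]


def is_selected(returncode, value):
    """Selected only if svnlook succeeded and the value is affirmative."""
    return returncode == 0 and value.strip().lower() in ("yes", "true", "1")


def wants_ocr(repos, txn, path):
    """Ask the repository whether this path is marked for OCR."""
    result = subprocess.run(
        propget_command(repos, txn, path),
        capture_output=True, text=True,
    )
    return is_selected(result.returncode, result.stdout)
```

Users would then mark a document with something like 
"svn propset ocr:needed yes scan.pdf" before committing.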


Greetings from Hamburg!

Uli





extreme svn slowdown due to X-forwarding timeouts

2010-12-06 Thread pjaytycy
Hello,


we recently upgraded svn on our remote build server from 1.5 to 1.6.
This caused a drastic reduction in speed:

u...@server:~$ time svn log -l 25 -q https://
[… snip …]

real    0m22.745s
user    0m0.192s
sys     0m0.048s

After some debugging, we found the cause to be entirely on my windows
pc, not related to the linux server:

=> I log in with putty over SSH and had X-forwarding turned on
=> I did not have an X-server running on my windows pc

The combination of these 2 made command-line svn on the linux server
extremely slow. I think this is due to the gnome/kde keyring
integration which is new in 1.6

So, to solve this: either run xming before starting putty, or turn off
X-forwarding in putty.


Maybe the command-line svn client could give a warning that it tried to
communicate with an X server but got no response? That would have
helped us discover what caused the performance issue. However,
even when I actually ran an X server, I got no GUI popup, but at least
SVN was no longer slow.


Kind regards,

Pieter-Jan


permission issues with apache and subversion

2010-12-06 Thread Nalini Kumar

Hi,
I am new to SVN and wish to setup the repository through 
for one of my projects.



I am using Ubuntu and have installed apache2 and Subversion 
1.6.12. The system has an apache user set up, and all the 
repositories are owned by apache (group apache).


I have created a passwd file using the command, "sudo 
htpasswd -b /etc/subversion/svn-passwd-file bliu bepbbq".


I use svn co http://127.0.0.1/svn/bcm/trunk to check out 
the code. Checkout has no issues. However, when I try to 
check in a modified file from bcm/trunk, I get this error: 
"svn: Can't open file 
'/var/lib/svn/bcm/db/txn-current-lock': Permission 
denied". I was not even prompted for the password for the 
user bliu.


dav_svn.conf
=


  DAV svn
  SVNParentPath /var/lib/svn
  AuthType Basic
  AuthName "Subversion Repository"
  AuthUserFile /etc/subversion/sub-passwd-file
  DELETE MKACTIVITY OPTIONS REPORT>

Require valid-user
  


$  ls -al /var/lib/svn
total 16
drwxr-xr-x  4 apache apache 4096 2010-12-02 18:21 .
drwxrwxrwx 75 root   root   4096 2010-12-06 01:09 ..
drwxr-xr-x  7 apache apache 4096 2010-12-03 15:08 bcm
$ 


What is it that I am doing wrong?
FYI, apache is not in the sudoers list.

I appreciate your help.

regards
Nalini Kumar.


Re: subversion cross compile (arm)

2010-12-06 Thread Philip Martin
Takács András  writes:

>> Here you are printing 64-bits, so some part of your system thinks that
>> apr_off_t is 64-bits.  How are apr_off_t and APR_HAS_LARGE_FILES defined
>> in apr.h?
>
> #define APR_HAS_LARGE_FILES   0
> typedef  off_t   apr_off_t;
>
> I think this is OK, isn't it?

It shows that APR is just following the rest of the system.  When you
printed values you showed 64 bits so it looks like off_t is 64-bits,
which conflicts with your earlier statement that you were using 32-bit
file offsets.  However the 64-bit values you printed look as if the
lower 32-bits are valid and the higher 32-bits are junk.  Your
environment appears to be confused about the size of file offsets.

-- 
Philip


RE: permission issues with apache and subversion

2010-12-06 Thread Edward Ned Harvey
> From: Nalini Kumar [mailto:nku...@actiontec.com]
> 
> "svn: Can't open file
> '/var/lib/svn/bcm/db/txn-current-lock': Permission
> 
> $  ls -al /var/lib/svn
> total 16
> drwxr-xr-x  4 apache apache 4096 2010-12-02 18:21 .
> drwxrwxrwx 75 root   root   4096 2010-12-06 01:09 ..
> drwxr-xr-x  7 apache apache 4096 2010-12-03 15:08 bcm

Of course, the more relevant information would be:
ls -ld /var /var/lib /var/lib/svn /var/lib/svn/bcm /var/lib/svn/bcm/db \
  /var/lib/svn/bcm/db/txn-current-lock

Even so, I'm assuming you already did that, and the perms are right.

Does Ubuntu use SELinux?  Or something else instead?  What about ACLs?

I know it's certainly possible in other OSes for SELinux to block Apache
from accessing something, even when the permission bits are set correctly.
By default, Apache writing to /var/lib would be a suspicious activity...
Not sure if you installed these things from native packages (via apt or
the Synaptic package manager)... If you didn't install using the native
packages, I would suggest you try, for this and many other reasons.
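
The same walk down the path can be scripted, which is handy when it has to be 
repeated for several repositories. This is a small Python sketch of the 
"ls -ld" check above; the function names are made up for the example, and it 
reports numeric uid/gid rather than names. It only looks at the classic 
permission bits, so it says nothing about SELinux or ACLs.

```python
import os
import stat


def path_chain(path):
    """Return each ancestor directory (below "/") and finally the path."""
    chain = []
    current = os.path.abspath(path)
    while True:
        chain.append(current)
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return list(reversed(chain))[1:]  # top-down, without "/" itself


def describe(path):
    """Return an ls-like line with mode, owner and group for one path."""
    try:
        st = os.stat(path)
    except OSError as exc:
        return "%s: %s" % (path, exc.strerror)
    return "%s uid=%d gid=%d %s" % (
        stat.filemode(st.st_mode), st.st_uid, st.st_gid, path)


if __name__ == "__main__":
    for item in path_chain("/var/lib/svn/bcm/db/txn-current-lock"):
        print(describe(item))
```

Run it as the user Apache actually runs as; a "Permission denied" line for 
one of the intermediate directories points at the component that is blocking 
access.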



Re: Using Hooks To OCR Documents

2010-12-06 Thread David Weintraub
On Fri, Dec 3, 2010 at 10:44 AM, Jim Jenkins  wrote:

> I’m planning to use Hooks to add OCR scanning for select documents going
> into a SVN repo.  I’m not really sure where to start so I’m hoping someone
> here can tell me if it’s possible and even suggest how best to proceed.

I'm going to take a slightly different approach. Pre-commit hooks are not
what you want.

   1. A pre-commit hook should only be used if the developer has some way of
   fixing an issue.  A good pre-commit hook is to make sure all files that end
   in *.sh have the property svn:eol-style set to "LF". If a developer doesn't
   set this, and the pre-commit hook fails, the developer can easily fix the
   problem and recommit the file.
   2. The user is left twiddling their thumbs on hooks, even a post-commit
   hook. If you have a hook that takes a few minutes to run, users will get
   impatient. They may simply not bother committing changes they should until
   they have a big horking commit which they'll do at the end of the day and
   leave.
   3. Changing committed files on a commit is very difficult. You, after
   all, don't have access to the client's workspace, so you'll have to emulate
   their checkout, so you can make your changes and do a commit. Of course that
   means that your pre-commit hook will fire off once more, so you'll have to
   have some mechanism in place letting your pre-commit hook know not to do
   whatever it was supposed to do in the first place.
   4. Also, it's a bad idea to change a commit on a user. As Ulrich
   Eckhardt pointed out, your user's client doesn't know that the files they
   just committed were changed. Besides, what if your pre-commit hook created an
   error as a side effect of that hook? I once wrote a pre-commit hook in
   ClearCase to automatically expand RCS keywords. On occasion, the pre-commit
   hook expanded a sprintf statement or something like that, and the developer
   was furious because their program worked, and I botched it up.
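
The svn:eol-style check from point 1 is a good example of a hook the
developer can act on. A sketch of its decision logic in Python follows. The
function names are made up for this example, and the mapping from path to
property value is assumed to have been gathered beforehand (for instance with
"svnlook propget -t"), with None meaning the property is not set.

```python
def eol_violations(eol_style_by_path):
    """Return the *.sh paths whose svn:eol-style is not "LF"."""
    return sorted(
        path for path, value in eol_style_by_path.items()
        if path.endswith(".sh") and value != "LF"
    )


def rejection_message(paths):
    """Tell the developer exactly what to fix before committing again."""
    lines = ["Commit rejected: svn:eol-style must be LF on:"]
    lines += ["  " + path for path in paths]
    lines.append("Fix with: svn propset svn:eol-style LF <file>")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = {"bin/run.sh": None, "bin/ok.sh": "LF", "README": None}
    print(rejection_message(eol_violations(sample)))
```

The message names the files and the exact fix, which is what separates this
kind of hook from one that fails for reasons the committer cannot address.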

I would instead think of your committed files as "source" code, and of
your OCR scans as "compiled" code.

What you probably want, although you really don't compile, is a continuous
build server that takes the committed files, creates the needed OCR
scans of these files, and stores them where they can be referenced. The
storage area does not have to be Subversion (and in fact, I would argue that
Subversion is not your ideal storage area).

Take a look at Hudson. It's a powerful continuous build server and is very
flexible in its setup. With Hudson, you could automatically do the scans
after a commit, and then email the user if the scan failed for some reason.
It is possible to only have Hudson scan the files that were changed (since
Hudson knows which files were committed). And, it is possible to have Hudson
FTP or store the changed OCR files onto another server (or to simply keep
the scanned archive on Hudson itself).

It'll take a bit of tweaking, but so would trying this in Subversion. And,
you and your users would be much happier with this arrangement.

--
David Weintraub
qazw...@gmail.com


failed to add directory

2010-12-06 Thread Dąbrowski, Leszek
Hello,
I encountered behaviour that looks like a bug.
I work with quite a big repository (the local copy on my PC (x86, WinXP SP3) is 
about 8GB).
I did the following.
I started with the directory tree marked by the "normal" overlay icon at the 
trunk node, so I believe that everything was correct.
I started with an .xls file on the 8th level of the directory tree (with trunk 
on the 1st). The file needed a lock to be acquired before it could be changed.
1. I opened the file with MS Excel and tried to save it. Excel refused to do 
so, and I closed Excel.
2. I got the lock and opened the file again. I tried to save it, but Excel 
refused because it thought that the file was read-only. I saved the file under 
a modified name.
3. I closed Excel.
4. I renamed the file to its original name using the file manager. It asked 
about overwriting, and I confirmed.
Now I had a modified file which had not been committed.
5. I tried to do an update on the trunk. svn marked with the red "modified" icon 
the branch of the directory tree (all its nodes, except the file itself) in 
which the file was located.
6. I renamed the modified file and then I did the update on the trunk. The 
latest revision of the file was written to my local copy, but the tree was 
still marked with the "modified" icon.
7. I did the update of the trunk once again and I got the following error 
message:

Command: Update
Error: Failed to add directory 'E:\ld\pgnig\new_repo\trunk': a versioned 
directory of
Error: the same name already exists
Finished!:

I did the clean-up on trunk, svn reported it to be successful, but the next 
update on trunk ended with the same error message.

I used the following version:
TortoiseSVN 1.6.12, Build 20536 - 32 Bit , 2010/11/24 20:59:01
Subversion 1.6.15,
apr 1.3.8
apr-utils 1.3.9
neon 0.29.5
OpenSSL 0.9.8p 16 Nov 2010
zlib 1.2.3

I have encountered this behaviour several times. Usually I had to remove the 
reported directory and get a fresh copy with update, but the repository is too 
big for this to be a practical workaround.

Sincerely yours,
Leszek Dabrowski







How to remove svn:externals?

2010-12-06 Thread W. Martin Borgert
Hi,

I have some svn:externals in my project, which must be removed,
but I don't know how. Removing the entries from svn:externals
using svn propedit did not work, i.e. the files were still there.
svn delete did not help either. I'm using SVN 1.6.12 on Debian.

Thanks in advance and please Cc me, as I'm not yet subscribed!


Re: How to remove svn:externals?

2010-12-06 Thread Johan Corveleyn
On Mon, Dec 6, 2010 at 6:27 PM, W. Martin Borgert  wrote:
> Hi,
>
> I have some svn:externals in my project, which must be removed,
> but I don't know how. Removing the entries from svn:externals
> using svn propedit did not work, i.e. the files were still there.
> svn delete did not help either. I'm using SVN 1.6.12 on Debian.
>
> Thanks in advance and please Cc me, as I'm not yet subscribed!

I assume you're talking about file externals (as opposed to directory
externals). Then this is a known problem, see:

http://subversion.tigris.org/issues/show_bug.cgi?id=3351 - can't
remove file externals

I think the only workaround currently is to throw away (part of) your
working copy (after you've removed/edited the svn:externals property
and committed that) and check it out again.

Cheers,
-- 
Johan


Bug report -- space in env. var. VISUAL causes commits to fail (needs confirmation)

2010-12-06 Thread David Dyer-Bennet
Subversion 1.6.12 running on Centos 5.5

If the value of the environment variable VISUAL contains a space,
subversion fails when attempting to invoke the editor to get the
comment.

sh-3.2$ export VISUAL="/home/spaces in name/bin/emacs"
sh-3.2$ svn commit
sh: /home/spaces: No such file or directory
svn: Commit failed (details follow):
svn: system('/home/spaces in name/bin/emacs svn-commit.tmp') returned 32512

As you see in the error, it's constructing a command string without
consideration of the possibility of spaces in various places.
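
The effect can be reproduced outside Subversion. The short Python sketch
below is only an illustration of shell word splitting, not Subversion's code:
shlex.split() mimics how a POSIX shell breaks the unquoted command string
into words, and shlex.quote() shows what a quoted version would look like.
The variable names are made up for the example.

```python
import shlex

editor = "/home/spaces in name/bin/emacs"

# What gets handed to the shell today: editor and file name simply joined.
naive = editor + " svn-commit.tmp"

# The same command with the editor path quoted as a single shell word.
quoted = shlex.quote(editor) + " svn-commit.tmp"

naive_words = shlex.split(naive)    # the shell sees four separate words
quoted_words = shlex.split(quoted)  # the shell sees program plus one argument

# system() returns a wait status; the exit code is in the high byte.
# 127 is the shell's "command not found" code.
exit_code = 32512 >> 8

if __name__ == "__main__":
    print(naive_words)
    print(quoted_words)
    print(exit_code)
```

Note that quoting the whole value is not a complete answer either, since it
would stop a setting such as "emacs -nw" (a program plus an option) from
working.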

This also occurs when running Subversion under Cygwin on Windows XP.  I
discovered it there, because there the natural place to put my
personal editor script (which attaches to an existing emacs if there
is one, and otherwise starts emacs itself, with various other personal
parameters and condition checking) ends up having the path

/cygdrive/c/Documents and Settings/david.bennet/My Documents/bin/ew

which has three spaces in two path components, neither name chosen by
me.

The workaround, obviously, is to place your editor (or editor script)
in a place that doesn't have spaces in the path.

So, can somebody confirm this please?  And, ideally, submit the bug
(since I keep getting stuck trying to go through all the hoops they
want you to go through to become enabled to do that)?
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info




Re: Bug report -- space in env. var. VISUAL causes commits to fail (needs confirmation)

2010-12-06 Thread Daniel Näslund
On Mon, Dec 06, 2010 at 01:44:23PM -0600, David Dyer-Bennet wrote:
> Subversion 1.6.12 running on Centos 5.5
> 
> If the value of the environment variable VISUAL contains a space,
> subversion fails when attempting to invoke the editor to get the
> comment.
> 
> sh-3.2$ export VISUAL="/home/spaces in name/bin/emacs"
> sh-3.2$ svn commit
> sh: /home/spaces: No such file or directory
> svn: Commit failed (details follow):
> svn: system('/home/spaces in name/bin/emacs svn-commit.tmp') returned 32512
> 
> As you see in the error, it's constructing a command string without
> consideration of the possibility of spaces in various places.

[...]

> So, can somebody confirm this please?  And, ideally, submit the bug
> (since I keep getting stuck trying to go through all the hoops they
> want you to go through to become enabled to do that)?

Found this thread [1] that discusses spaces in the editor cmd string. Julian
Foad confirms in a follow-up that it's a bug and even has some
suggestions on how to tackle the problem if someone wants to write a
patch. I haven't found any issues related to the problem in the
tracker, but I'm really lousy when it comes to using issue trackers.

Daniel

[1] http://svn.haxx.se/dev/archive-2010-02/0051.shtml



Re: extreme svn slowdown due to X-forwarding timeouts

2010-12-06 Thread Nico Kadel-Garcia
On Mon, Dec 6, 2010 at 4:27 AM, pjaytycy  wrote:
> Hello,
>
>
> we recently upgraded svn on our remote build server from 1.5 to 1.6.
> This caused a drastic reduction in speed:
>
> u...@server:~$ time svn log -l 25 -q https://
> [… snip …]
>
> real    0m22.745s
> user    0m0.192s
> sys     0m0.048s
>
> After some debugging, we found the cause to be entirely on my windows
> pc, not related to the linux server:
>
> => I log in with putty over SSH and had X-forwarding turned on
> => I did not have an X-server running on my windows pc
>
> The combination of these 2 made command-line svn on the linux server
> extremely slow. I think this is due to the gnome/kde keyring
> integration which is new in 1.6
>
> So, to solve this: either run xming before starting putty, or turn off
> X-forwarding in putty.

Or switch to NX, from www.nomachine.com. I'm extremely happy with it
for well-managed X sessions, especially Xterm, optimized for low
bandwidth remote links.


SVN Version upgrade.

2010-12-06 Thread Gavin Beau Baumanis
Hi Everyone,

For our production repositories we're using 1.5.1.

My question is: is there an appropriate version to upgrade to?
We're tossing up between:

Update to the latest 1.5.x
Update to 1.6.x
Or simply wait it out for 1.7

Is there a "suggested" upgrade path?

As always - thanks.
Gavin "Beau" Baumanis