Hello Sean,
many thanks for this report!

On 11/01/2019 17:58, Sean Whitton wrote:
> Package: python3-pdfminer
> Version: 20181108+dfsg-2
>
> Dear maintainer,
>
> pdfminer.six's setup.py says that it requires pycryptodome, so I would
> expect to see
>
> Depends: python3-pycryptodome
>
> but instead there is
>
> Recommends: python3-crypto
>
> which seems to be wrong in two ways:
>
> 1) it is a recommends, not a hard depends, but in setup.py it is listed
> as 'required'
You are right, it's listed as a hard dependency, but it is only used in
the following module:

    >>> pdfminer.pdfdocument
    <module 'pdfminer.pdfdocument' from '/usr/lib/python3/dist-packages/pdfminer/pdfdocument.py'>

If we look at the code we will see:

    try:
        from Crypto.Cipher import ARC4
        from Crypto.Cipher import AES
        from Crypto.Hash import SHA256
    except ImportError:
        AES = SHA256 = None
        from . import arcfour as ARC4

So if an ImportError is raised, the error is caught and only AES and
SHA256 are disabled. For ARC4 there is an implementation inside pdfminer
itself that is used as a fallback.

This is why I chose Recommends instead of Depends: I think it is a valid
use case not to have pycrypto (or pycryptodome) as a hard dependency,
since it's not really required. I usually do this when I discover that
a library is not really required.

Is this causing some problems?

> 2) it is -crypto, rather than the -cryptodome fork.
>
> I note that python3-pycryptodome is broken (#886291) and is not likely
> to be fixed before the transitions freeze, but I am not really sure
> whether that bug blocks this one or not.

I also stumbled on #886291, and since only AES, ARC4 and SHA256 are used
(and I don't think that pycrypto's implementations of those algorithms
should worry us), I chose pycrypto (it was also used in the original
pdfminer) instead of pycryptodome.

I tested it using the following steps (with python3-crypto installed):

    ❯ cat test.tex
    \documentclass[a4paper]{article}
    \begin{document}
    \thispagestyle{empty}
    This is a test!
    \end{document}
    ❯ lualatex test.tex
    [CUT output]
    ❯ qpdf --encrypt 1234 1234 40 -- test.pdf test-encrypted_ARC4.pdf
    ❯ pdf2txt -P 1234 test-encrypted_ARC4.pdf
    This is a test!
    ❯ qpdf --encrypt 1234 1234 128 --use-aes=y -- test.pdf test-encrypted_AES.pdf
    ❯ pdf2txt -P 1234 test-encrypted_AES.pdf
    This is a test!
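To make the fallback behaviour described above concrete, here is a minimal sketch of the same optional-import pattern, assuming the `Crypto` package may or may not be installed. RC4 always works through a bundled pure-Python implementation, while AES-based handlers are only registered when the import succeeds. `ArcFour`, `build_registry`, `pick_handler` and `UnknownAlgorithmError` are hypothetical names for illustration only, not pdfminer's actual code, and the mapping from the PDF /V value to an algorithm is simplified.

```python
"""Sketch: optional crypto dependency with a pure-Python RC4 fallback.

All names below are hypothetical and only illustrate the pattern; they are
not pdfminer's real classes or functions.
"""

try:
    # Optional compiled implementations (pycrypto / pycryptodome).
    from Crypto.Cipher import AES
    from Crypto.Hash import SHA256
except ImportError:
    # Library missing: AES- and SHA256-based encryption is simply disabled.
    AES = SHA256 = None


class ArcFour:
    """Minimal pure-Python RC4, standing in for a bundled fallback module."""

    def __init__(self, key: bytes) -> None:
        # Key-scheduling algorithm (KSA): permute 0..255 using the key.
        s = list(range(256))
        j = 0
        for i in range(256):
            j = (j + s[i] + key[i % len(key)]) % 256
            s[i], s[j] = s[j], s[i]
        self._s, self._i, self._j = s, 0, 0

    def process(self, data: bytes) -> bytes:
        # PRGA: generate keystream bytes and XOR them with the input.
        # RC4 is symmetric, so the same call encrypts and decrypts.
        s, i, j = self._s, self._i, self._j
        out = bytearray()
        for byte in data:
            i = (i + 1) % 256
            j = (j + s[i]) % 256
            s[i], s[j] = s[j], s[i]
            out.append(byte ^ s[(s[i] + s[j]) % 256])
        self._i, self._j = i, j
        return bytes(out)


class UnknownAlgorithmError(Exception):
    """Raised when no handler is registered for a PDF encryption version."""


def build_registry(aes_available: bool, sha256_available: bool) -> dict:
    # Simplified: /V 1 and 2 use RC4, which the fallback always provides.
    registry = {1: "RC4", 2: "RC4"}
    if aes_available:
        registry[4] = "AES-128"  # e.g. crypt filter /CFM /AESV2
    if aes_available and sha256_available:
        registry[5] = "AES-256"
    return registry


def pick_handler(v: int, registry: dict) -> str:
    # Look up the handler; an unregistered version is an unknown algorithm.
    try:
        return registry[v]
    except KeyError:
        raise UnknownAlgorithmError("Unknown algorithm: V=%r" % v) from None


# Registry for the current environment, depending on the optional import.
REGISTRY = build_registry(AES is not None, SHA256 is not None)
```

For example, `ArcFour(b"Key").process(b"Plaintext").hex()` gives `bbf316e8d940af0ad3` (the usual RC4 test vector), and `pick_handler(4, build_registry(False, False))` raises `UnknownAlgorithmError`, the same kind of failure you get for AES-encrypted files when the library is missing.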
After uninstalling python{,3}-crypto, we are no longer able to decrypt
the file encrypted with AES:

    ❯ pdf2txt -P 1234 test-encrypted_AES.pdf
    Traceback (most recent call last):
      File "/usr/bin/pdf2txt", line 136, in <module>
        if __name__ == '__main__': sys.exit(main())
      File "/usr/bin/pdf2txt", line 131, in main
        outfp = extract_text(**vars(A))
      File "/usr/bin/pdf2txt", line 63, in extract_text
        pdfminer.high_level.extract_text_to_fp(fp, **locals())
      File "/usr/lib/python3/dist-packages/pdfminer/high_level.py", line 80, in extract_text_to_fp
        check_extractable=True):
      File "/usr/lib/python3/dist-packages/pdfminer/pdfpage.py", line 129, in get_pages
        doc = PDFDocument(parser, password=password, caching=caching)
      File "/usr/lib/python3/dist-packages/pdfminer/pdfdocument.py", line 577, in __init__
        self._initialize_password(password)
      File "/usr/lib/python3/dist-packages/pdfminer/pdfdocument.py", line 602, in _initialize_password
        raise PDFEncryptionError('Unknown algorithm: param=%r' % param)
    pdfminer.pdfdocument.PDFEncryptionError: Unknown algorithm: param={'CF': {'StdCF': {'AuthEvent': /'DocOpen', 'CFM': /'AESV2', 'Length': 16}}, 'Filter': /'Standard', 'Length': 128, 'O': b'\xc4\x8f\x00\x1f\xdcy\xa00\xd7\x18\xdf]\xbb\xda\xad\x81\xd1\xf6\xfe\xde\xc4\xa7\xb5\xcd\x98\rd\x13\x9e\xdf\xcb~', 'P': -4, 'R': 4, 'StmF': /'StdCF', 'StrF': /'StdCF', 'U': b'\xfe\xeb\xed\x0e\x0f{2}r\xc5g\xc7\xf2]\xf0\xf4\x01"Ej\x91\xba\xe5\x13Bs\xa6\xdb\x13L\x87\xc4', 'V': 4}

but ARC4 still works, thanks to the fallback implementation provided by
pdfminer:

    ❯ pdf2txt -P 1234 test-encrypted_ARC4.pdf
    This is a test!

Did you have problems with encryption? The only tool I know for
encrypting PDFs is qpdf, so I used it, but if you can suggest other
tools to test this feature I'll be happy to try them.

Since everything worked, I did not investigate pycryptodome further,
but please tell me if I missed something.

> Please excuse my limited knowledge of python library packaging.
No need to apologize: your points are perfectly valid even without an
in-depth analysis, and I'm sorry for not putting what I just wrote in a
README.Debian file inside the package to explain why I made these
choices. I did not think about it; I took it for granted... :(

My plan was to use python{,3}-crypto for Buster and then switch to
pycryptodome in Buster+1, but please tell me if something is not working
or if there is something I did not consider.

Regards,
-- 
Daniele Tricoli 'eriol'
https://mornie.org