-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Package: recoll
Version: 1.24.3-3
Severity: normal
I have a bunch of PDFs which fail to index with a Python exception. Many of them are confidential, but this one isn't.

Traceback (most recent call last):
  File "/usr/share/recoll/filters/rclpdf.py", line 523, in <module>
    rclexecm.main(proto, extract)
  File "/usr/share/recoll/filters/rclexecm.py", line 330, in main
    proto.mainloop(extract)
  File "/usr/share/recoll/filters/rclexecm.py", line 257, in mainloop
    self.processmessage(processor, params)
  File "/usr/share/recoll/filters/rclexecm.py", line 237, in processmessage
    self.answer(data, ipath, eof)
  File "/usr/share/recoll/filters/rclexecm.py", line 176, in answer
    self.senditem("Document", docdata)
  File "/usr/share/recoll/filters/rclexecm.py", line 168, in senditem
    l = len(data)
TypeError: object of type 'NoneType' has no len()

I'm attaching (a) the complete output of a run with loglevel=7; (b) the PDF; (c) recoll.conf; (d) the fields file.

- -- System Information:
Debian Release: 10.0
  APT prefers testing-debug
  APT policy: (500, 'testing-debug'), (500, 'testing'), (500, 'stable'), (130, 'unstable-debug'), (130, 'unstable'), (120, 'experimental-debug'), (120, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.19.0-5-amd64 (SMP w/8 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages recoll depends on:
ii  recollcmd  1.24.3-3
ii  recollgui  1.24.3-3

recoll recommends no packages.

recoll suggests no packages.

- -- no debconf information
-----BEGIN PGP SIGNATURE-----

iHMEARECADMWIQTlAc7j4DAtSNRJJ0z7P4jCVepZ/gUCXP1XtxUcYW50aG9ueUBk
ZXJvYmVydC5uZXQACgkQ+z+IwlXqWf78iACfYnW4XiezE2mKKpWWB1xVIv0t48gA
oITYXzhoAgrB3ej/h4sOGJlmn1F7
=be9G
-----END PGP SIGNATURE-----
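The last frames of the traceback show `senditem()` calling `len()` on `None`, i.e. the PDF handler handed back no document data at all rather than an empty string. The sketch below only illustrates the kind of guard that would avoid the crash; it is not recoll's actual code. The `senditem(out, name, data)` signature and the `Name: length` header framing are simplified assumptions modeled on the traceback, not the real `rclexecm` API.

```python
import io


def senditem(out, name, data):
    """Write one 'Name: length' header line followed by the payload bytes.

    Hypothetical, simplified stand-in for rclexecm's senditem(); the
    framing used here is an assumption, not the real protocol definition.
    """
    if data is None:
        # No text was extracted: send an empty payload instead of failing
        # with "object of type 'NoneType' has no len()".
        data = b""
    elif isinstance(data, str):
        data = data.encode("utf-8")
    out.write(b"%s: %d\n" % (name.encode("ascii"), len(data)))
    out.write(data)


buf = io.BytesIO()
senditem(buf, "Document", None)
print(buf.getvalue())
```

With the guard, a PDF whose extraction yields nothing would be sent as a zero-length document instead of killing the filter process.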
Script started on 2019-06-09 14:59:23-04:00 [TERM="xterm" TTY="/dev/pts/21" COLUMNS="162" LINES="59"]
Running scope as unit: run-r947e58c5876741de815297dad38b0b6e.scope
:3:common/rclinit.cpp:312::Configuration directory: /home/anthony/.recoll
:4:common/rclconfig.cpp:558::RclConfig::initThrConf: autoconf requested. 8 concurrent threads available.
:4:common/rclconfig.cpp:604::RclConfig::initThrConf: chosen config (ql,nt): (2, 5) (2, 3) (2, 1)
:5:common/rclinit.cpp:351::rclinit: will use vfork() for starting commands
:3:index/recollindex.cpp:667::recollindex: changing current directory to [/tmp]
:4:utils/execmd.cpp:469::ExecCmd::startExec: (0|0) /usr/share/recoll/filters/rclcheckneedretry.sh
:4:utils/execmd.cpp:993::ExecCmd::wait: got status 0x256
:3:index/recollindex.cpp:700::recollindex: starting up
:4:utils/execmd.cpp:469::ExecCmd::startExec: (0|0) /usr/bin/ionice {-c} {3} {-p} {29047}
:4:utils/execmd.cpp:993::ExecCmd::wait: got status 0x0
:4:rcldb/rcldb.cpp:917::Db::open: m_isopen 0 m_iswritable 0 mode 1
:5:rcldb/stoplist.cpp:35::StopList::StopList: file_to_string(/home/anthony/.recoll/stoplist.txt) failed: open/stat: errno: 2 :
:4:rcldb/rcldb.cpp:249::RclDb:: threads: haveWriteQ 1, wqlen 2 wqts 1
:4:rcldb/rcldb.cpp:943::Db::open: lastdocid: 5026
:4:index/fsindexer.cpp:135::FsIndexer: threads: haveIQ 1 iql 2 iqts 5 haveSQ 1 sql 2 sqts 3
:4:index/fsindexer.cpp:322::FsIndexer::indexFiles
:5:index/fsindexer.cpp:527::FsIndexerInternfileWorker: task fn /home/anthony/Filing/Schwab/2015/2015-06 — Important notice regarding delivery of security holder documents.pdf
:4:index/fsindexer.cpp:639::processone: needupdate 1 noretry 0 existing 4294967295 oldsig []
:5:index/fsindexer.cpp:672::processone: processing: [48 KB ] /home/anthony/Filing/Schwab/2015/2015-06 — Important notice regarding delivery of security holder documents.pdf
:5:internfile/internfile.cpp:123::FileInterner::FileInterner(fn=/home/anthony/Filing/Schwab/2015/2015-06 — Important notice regarding delivery of security holder documents.pdf)
:5:internfile/uncomp.cpp:41::Uncomp::Uncomp: m_docache: 0
:4:internfile/internfile.cpp:168::FileInterner::init fn [/home/anthony/Filing/Schwab/2015/2015-06 — Important notice regarding delivery of security holder documents.pdf] mime [(null)] preview 0
:4:utils/execmd.cpp:469::ExecCmd::startExec: (0|1) git-annex-meta-to-recoll {/home/anthony/Filing/Schwab/2015/2015-06 — Important notice regarding delivery of security holder documents.pdf}
:5:utils/netcon.cpp:369::Netcon::selectloop: fd 11 has 0x0 mask, erasing
:5:utils/execmd.cpp:827::ExecCmd::doexec: selectloop returned 0
:4:utils/execmd.cpp:993::ExecCmd::wait: got status 0x0
:4:internfile/mimehandler.cpp:268::getMimeHandler: mtype [application/pdf] filtertypes 1
:4:internfile/mimehandler.cpp:64::getMimeHandlerFromCache: 259dc001f1d38c8b4c425d19d34b1c4f cache size 0
:4:internfile/mimehandler.cpp:80::getMimeHandlerFromCache: 259dc001f1d38c8b4c425d19d34b1c4f not found
:4:internfile/internfile.cpp:255::FileInterner:: init ok application/pdf [/home/anthony/Filing/Schwab/2015/2015-06 — Important notice regarding delivery of security holder documents.pdf]
:4:internfile/internfile.cpp:767::FileInterner::internfile. ipath []
:4:internfile/mh_execm.cpp:157::MimeHandlerExecMultiple::next_document(): [/home/anthony/Filing/Schwab/2015/2015-06 — Important notice regarding delivery of security holder documents.pdf]
:4:internfile/mh_execm.cpp:39::MimeHandlerExecMultiple::startCmd
:4:utils/execmd.cpp:469::ExecCmd::startExec: (1|1) /usr/share/recoll/filters/rclpdf.py
WARNING: The creator of the input PDF:
   /home/anthony/Filing/Schwab/2015/2015-06 — Important notice regarding delivery of security holder documents.pdf
   has set an owner password (which is not required to handle this PDF).
   You did not supply this password. Please respect any copyright.
Traceback (most recent call last):
  File "/usr/share/recoll/filters/rclpdf.py", line 523, in <module>
    rclexecm.main(proto, extract)
  File "/usr/share/recoll/filters/rclexecm.py", line 330, in main
    proto.mainloop(extract)
  File "/usr/share/recoll/filters/rclexecm.py", line 257, in mainloop
    self.processmessage(processor, params)
  File "/usr/share/recoll/filters/rclexecm.py", line 237, in processmessage
    self.answer(data, ipath, eof)
  File "/usr/share/recoll/filters/rclexecm.py", line 176, in answer
    self.senditem("Document", docdata)
  File "/usr/share/recoll/filters/rclexecm.py", line 168, in senditem
    l = len(data)
TypeError: object of type 'NoneType' has no len()
:4:utils/execmd.cpp:936::ExecCmd::getline: got 0
:2:internfile/mh_execm.cpp:89::MHExecMultiple: getline error
:4:utils/execmd.cpp:280::ExecCmd: pid 29083 killpg(29083, SIGTERM)
:5:internfile/extrameta.cpp:37::Internfile:: setting [company] from cmd/xattr value [Charles Schwab]
:5:internfile/extrameta.cpp:37::Internfile:: setting [month] from cmd/xattr value [09]
:5:internfile/extrameta.cpp:37::Internfile:: setting [doctype] from cmd/xattr value [Notice Regarding Delivery of Security Holder Documents]
:5:internfile/extrameta.cpp:37::Internfile:: setting [year] from cmd/xattr value [2016]
:5:internfile/extrameta.cpp:37::Internfile:: setting [ga-file_extension] from cmd/xattr value [pdf]
:5:internfile/extrameta.cpp:37::Internfile:: setting [lastchanged] from cmd/xattr value [2017-04-09@21-54-42]
:5:internfile/extrameta.cpp:37::Internfile:: setting [ga-month] from cmd/xattr value [1]
:5:internfile/extrameta.cpp:37::Internfile:: setting [ga-year] from cmd/xattr value [2016]
:4:internfile/internfile.cpp:610::collectIpath..: fbytes->47902
:2:internfile/internfile.cpp:762::FileInterner::internfile: next_document error [/home/anthony/Filing/Schwab/2015/2015-06 — Important notice regarding delivery of security holder documents.pdf] application/pdf
:4:internfile/mimehandler.cpp:99::returnMimeHandler: returning filter for application/pdf cache size 0
:4:internfile/internfile.cpp:881::FileInterner::internfile: conversion ended with no doc
:5:index/fsindexer.cpp:504::FsIndexerDbUpdWorker: task ql 1
:4:rcldb/rcldb.cpp:1462::Db::add: udi [/home/anthony/Filing/Schwab/2015/2015-06 — Important notice regarding delivery of security holder documents.pdf|] parent []
:5:rcldb/rcldb.cpp:1581::Db::add: field [company] pfx [XYC] inc 1: [Charles Schwab]
:5:rcldb/rcldb.cpp:1581::Db::add: field [containerfilename] pfx [XCFN] inc 1: [2015-06 — Important notice regarding delivery of security holder documents.pdf]
:5:rcldb/rcldb.cpp:1581::Db::add: field [doctype] pfx [XYT] inc 1: [Notice Regarding Delivery of Security Holder Documents]
:5:rcldb/rcldb.cpp:1581::Db::add: field [filename] pfx [XSFN] inc 1: [2015-06 — Important notice regarding delivery of security holder documents.pdf]
:5:rcldb/rcldb.cpp:1589::Db::add: no prefix for field [ga-file_extension], no indexing
:5:rcldb/rcldb.cpp:1589::Db::add: no prefix for field [ga-month], no indexing
:5:rcldb/rcldb.cpp:1589::Db::add: no prefix for field [ga-year], no indexing
:5:rcldb/rcldb.cpp:1589::Db::add: no prefix for field [lastchanged], no indexing
:4:rcldb/rcldb.cpp:1569::Adding value: for field month slot 1002
:5:rcldb/rclvalues.cpp:56::Rcl::add_field_value: slot 1002 [09]
:5:rcldb/rcldb.cpp:1581::Db::add: field [month] pfx [XYM] inc 1: [09]
:4:rcldb/rcldb.cpp:1569::Adding value: for field year slot 1003
:5:rcldb/rclvalues.cpp:56::Rcl::add_field_value: slot 1003 [2016]
:5:rcldb/rcldb.cpp:1581::Db::add: field [year] pfx [XYY] inc 1: [2016]
:5:rcldb/rcldb.cpp:1824::Rcl::Db::add: new doc record: url=file:///home/anthony/Filing/Schwab/2015/2015-06 — Important notice regarding delivery of security holder documents.pdf mtype=application/pdf fmtime=01453055267 origcharset= fbytes=47902 pcbytes=47902 dbytes=0 sig=479021453130170+ company=Charles Schwab doctype=Notice Regarding Delivery of Security Holder Documents filename=2015-06 — Important notice regarding delivery of security holder documents.pdf lastchanged=2017-04-09@21-54-42
:4:rcldb/rcldb.cpp:205::DbUpdWorker: got add/update task, ql 1
:3:rcldb/rcldb.cpp:753::Db::add: docid 2373 updated [/home/anthony/Filing/Schwab/2015/2015-06 — Important notice regarding delivery of security holder documents.pdf|]
:5:internfile/uncomp.cpp:138::Uncomp::~Uncomp: m_docache: 0 m_dir (null)
:3:rcldb/rcldb.cpp:1938::Db::waitUpdIdle: total xapian work 396 mS
:4:index/fsindexer.cpp:386::Indexfiles: purging orphans
:3:rcldb/rcldb.cpp:1938::Db::waitUpdIdle: total xapian work 396 mS
:4:index/fsindexer.cpp:398::FsIndexer::indexFiles: done
:4:rcldb/rcldb.cpp:997::Db::i_close(0): m_isopen 1 m_iswritable 1
:3:rcldb/rcldb.cpp:1938::Db::waitUpdIdle: total xapian work 396 mS
:4:rcldb/rcldb.cpp:1011::Rcl::Db:close: xapian will close. May take some time
:4:./utils/workqueue.h:176::setTerminateAndWait:DbUpd
:4:./utils/workqueue.h:312::WorkQueue:ok:DbUpd: not ok m_ok 0 m_workers_exited 0 m_worker_threads size 1
:4:./utils/workqueue.h:293::workerExit:DbUpd
:3:./utils/workqueue.h:192::DbUpd: tasks 1 nowakes 1 wsleeps 2 csleeps 0
:4:./utils/workqueue.h:216::setTerminateAndWait:DbUpd done
:4:rcldb/rcldb.cpp:1015::Rcl::Db:close() xapian close done.
:4:internfile/mimehandler.cpp:129::clearMimeHandlerCache()
:4:./utils/workqueue.h:176::setTerminateAndWait:Internfile
:4:./utils/workqueue.h:312::WorkQueue:ok:Internfile: not ok m_ok 0 m_workers_exited 0 m_worker_threads size 5
:4:./utils/workqueue.h:293::workerExit:Internfile
:4:./utils/workqueue.h:312::WorkQueue:ok:Internfile: not ok m_ok 0 m_workers_exited 1 m_worker_threads size 5
:4:./utils/workqueue.h:293::workerExit:Internfile
:4:./utils/workqueue.h:312::WorkQueue:ok:Internfile: not ok m_ok 0 m_workers_exited 1 m_worker_threads size 5
:4:./utils/workqueue.h:293::workerExit:Internfile
:4:./utils/workqueue.h:312::WorkQueue:ok:Internfile: not ok m_ok 0 m_workers_exited 1 m_worker_threads size 5
:4:./utils/workqueue.h:293::workerExit:Internfile
:4:./utils/workqueue.h:312::WorkQueue:ok:Internfile: not ok m_ok 0 m_workers_exited 1 m_worker_threads size 5
:4:./utils/workqueue.h:293::workerExit:Internfile
:3:./utils/workqueue.h:192::Internfile: tasks 1 nowakes 0 wsleeps 6 csleeps 0
:4:./utils/workqueue.h:216::setTerminateAndWait:Internfile done
:5:index/fsindexer.cpp:147::FsIndexer: internfile wrkr status: 0x1 (1->ok)
:4:./utils/workqueue.h:176::setTerminateAndWait:Split
:4:./utils/workqueue.h:312::WorkQueue:ok:Split: not ok m_ok 0 m_workers_exited 0 m_worker_threads size 3
:4:./utils/workqueue.h:293::workerExit:Split
:4:./utils/workqueue.h:312::WorkQueue:ok:Split: not ok m_ok 0 m_workers_exited 0 m_worker_threads size 3
:4:./utils/workqueue.h:293::workerExit:Split
:4:./utils/workqueue.h:312::WorkQueue:ok:Split: not ok m_ok 0 m_workers_exited 0 m_worker_threads size 3
:4:./utils/workqueue.h:293::workerExit:Split
:3:./utils/workqueue.h:192::Split: tasks 1 nowakes 1 wsleeps 4 csleeps 0
:4:./utils/workqueue.h:216::setTerminateAndWait:Split done
:5:index/fsindexer.cpp:151::FsIndexer: dbupd worker status: 0x1 (1->ok)
:4:rcldb/rcldb.cpp:891::Db::~Db: isopen 0 m_iswritable 0
:4:rcldb/rcldb.cpp:997::Db::i_close(1): m_isopen 0 m_iswritable 0
Script done on 2019-06-09 14:59:24-04:00 [COMMAND_EXIT_CODE="0"]
pdfpMY0Ua13RS.pdf
Description: Adobe PDF document
# The system-wide configuration files for recoll are located in:
#   /usr/share/recoll/examples
# The default configuration files are commented, you should take a look
# at them for an explanation of what can be set (you could also take a look
# at the manual instead).
# Values set in this file will override the system-wide values for the file
# with the same name in the central directory. The syntax for setting
# values is identical.

followLinks = 1
topdirs = /home/anthony/Filing
skippedPaths = ~/Filing/.git

pdfocr = 1
pdfocrlang = eng
pdfattach = 0
pdfextrameta = pdf:Producer xmp:ModifyDate xmp:CreateDate rdf:title \
    xapMM:DocumentID dc:creator dc:relation dc:publisher \
    dc:title dc:type dc:identifier pages xap:CreateDate \
    xap:CreatorTool

# bloody OCR takes forever.
filtermaxseconds = 7200
# default causes pdftk to fail to init java VM; disable. Better to use
# cgroups anyway to limit whole group of processes (if it's ever needed).
filtermaxmbytes = 0

loglevel = 3
nomd5types =
compressedfilemaxkbs = 200000
indexallfilenames = 1
indexStoreDocText = 1
nonumbers = 0
idxflushmb = 20
metadatacmds = ; rclmulti01 = git-annex-meta-to-recoll %f
indexstemminglanguages = english
defaultcharset = UTF8//
aspellLanguage = en
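The `metadatacmds` line runs `git-annex-meta-to-recoll` under an `rclmulti01` field name. The assumption here (check the recoll manual for `metadatacmds`) is that recoll reads such a command's output as `name = value` lines, one field per line, which fits the `setting [company] from cmd/xattr value [Charles Schwab]` entries in the log. A minimal sketch of that kind of parsing, using a hypothetical `parse_multi_fields` helper that is not part of recoll:

```python
def parse_multi_fields(output):
    """Parse 'name = value' lines into a dict (hypothetical helper).

    Assumes the rclmulti command prints one field per line; blank lines,
    lines starting with '#', and lines without '=' are skipped.
    """
    fields = {}
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        name, value = line.split("=", 1)
        fields[name.strip()] = value.strip()
    return fields


# Sample values taken from the log above.
sample = "company = Charles Schwab\nmonth = 09\nyear = 2016\n"
print(parse_multi_fields(sample))
```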
[prefixes]
company = XYC
day = XYD; pfxonly = 1
month = XYM; pfxonly = 1
doctype = XYT
year = XYY; pfxonly = 1

[values]
day = 1001; type=int; len=2
month = 1002; type=int; len=2
year = 1003; type=int; len=4

[stored]
company =
doctype =
lastchanged =

[aliases]
company = ga-company
year = ga-document_year
month = ga-document_month
day = ga-document_day
doctype = ga-document_type
lastchanged = ga-lastchanged
keywords = keyword xesam:keyword tag tags dc:subject xesam:subject \
    dc:description ga-tag
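With this fields file, `ga-document_month` is listed as an alias of `month` and so would get the `XYM` prefix, while `ga-month` and `ga-year` appear in no alias list and `lastchanged` has no entry under `[prefixes]`. That is consistent with the `no prefix for field [...], no indexing` lines in the log. The sketch below is a simplified model of that lookup, not recoll's implementation: the `term_prefix` helper is hypothetical, it hard-codes only the entries above, and it omits the `keywords` line and recoll's built-in fields (such as `filename`), whose prefixes come from the system-wide configuration.

```python
# Entries copied from the [prefixes] and [aliases] sections above
# (the keywords alias line is left out; its prefix is a recoll built-in).
PREFIXES = {
    "company": "XYC",
    "day": "XYD",
    "month": "XYM",
    "doctype": "XYT",
    "year": "XYY",
}
ALIASES = {
    "company": ["ga-company"],
    "year": ["ga-document_year"],
    "month": ["ga-document_month"],
    "day": ["ga-document_day"],
    "doctype": ["ga-document_type"],
    "lastchanged": ["ga-lastchanged"],
}

# Invert the alias table: alias name -> canonical field name.
CANONICAL = {alias: field for field, names in ALIASES.items() for alias in names}


def term_prefix(name):
    """Return the term prefix for a metadata name, or None if not indexed."""
    field = CANONICAL.get(name, name)
    return PREFIXES.get(field)


for name in ["ga-document_month", "month", "ga-month", "lastchanged"]:
    print(name, "->", term_prefix(name))
```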