Package: recoll
Version: 1.24.3-3
Severity: normal

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I have a bunch of PDFs which fail to index with a Python exception. Many
of them are confidential, but this one isn't.

Traceback (most recent call last):
  File "/usr/share/recoll/filters/rclpdf.py", line 523, in <module>
    rclexecm.main(proto, extract)
  File "/usr/share/recoll/filters/rclexecm.py", line 330, in main
    proto.mainloop(extract)
  File "/usr/share/recoll/filters/rclexecm.py", line 257, in mainloop
    self.processmessage(processor, params)
  File "/usr/share/recoll/filters/rclexecm.py", line 237, in processmessage
    self.answer(data, ipath, eof)
  File "/usr/share/recoll/filters/rclexecm.py", line 176, in answer
    self.senditem("Document", docdata)
  File "/usr/share/recoll/filters/rclexecm.py", line 168, in senditem
    l = len(data)
TypeError: object of type 'NoneType' has no len()
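For illustration only: the traceback shows `senditem()` calling `len()` on a document body that is `None`. Below is a minimal sketch of a defensive guard. It assumes the filter protocol writes a `name: length` header line followed by the raw bytes. `send_item` is a hypothetical stand-in, not rclexecm's actual code. Treating `None` as an empty document is only one possible fix; reporting an extraction error may be more appropriate.

```python
import io


def send_item(out, name, data):
    """Write one item as a "name: length" header followed by the data bytes.

    Hypothetical stand-in for rclexecm's senditem(); the header format is
    an assumption. A None body is sent as an empty item instead of
    raising TypeError.
    """
    if data is None:
        data = b""
    if isinstance(data, str):
        data = data.encode("utf-8")
    # The length is counted in bytes, after encoding.
    out.write(("%s: %d\n" % (name, len(data))).encode("ascii"))
    out.write(data)


if __name__ == "__main__":
    buf = io.BytesIO()
    send_item(buf, "Document", None)
    print(buf.getvalue())  # b'Document: 0\n'
```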

I'm attaching (a) the complete output of a run with loglevel=7; (b) the
PDF; (c) recoll.conf; (d) the fields file.

- -- System Information:
Debian Release: 10.0
  APT prefers testing-debug
  APT policy: (500, 'testing-debug'), (500, 'testing'), (500, 'stable'), (130, 'unstable-debug'), (130, 'unstable'), (120, 'experimental-debug'), (120, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.19.0-5-amd64 (SMP w/8 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages recoll depends on:
ii  recollcmd  1.24.3-3
ii  recollgui  1.24.3-3

recoll recommends no packages.

recoll suggests no packages.

- -- no debconf information

-----BEGIN PGP SIGNATURE-----

iHMEARECADMWIQTlAc7j4DAtSNRJJ0z7P4jCVepZ/gUCXP1XtxUcYW50aG9ueUBk
ZXJvYmVydC5uZXQACgkQ+z+IwlXqWf78iACfYnW4XiezE2mKKpWWB1xVIv0t48gA
oITYXzhoAgrB3ej/h4sOGJlmn1F7
=be9G
-----END PGP SIGNATURE-----
Script started on 2019-06-09 14:59:23-04:00 [TERM="xterm" TTY="/dev/pts/21" COLUMNS="162" LINES="59"]
Running scope as unit: run-r947e58c5876741de815297dad38b0b6e.scope
:3:common/rclinit.cpp:312::Configuration directory: /home/anthony/.recoll
:4:common/rclconfig.cpp:558::RclConfig::initThrConf: autoconf requested. 8 concurrent threads available.
:4:common/rclconfig.cpp:604::RclConfig::initThrConf: chosen config (ql,nt): (2, 5) (2, 3) (2, 1)
:5:common/rclinit.cpp:351::rclinit: will use vfork() for starting commands
:3:index/recollindex.cpp:667::recollindex: changing current directory to [/tmp]
:4:utils/execmd.cpp:469::ExecCmd::startExec: (0|0) /usr/share/recoll/filters/rclcheckneedretry.sh
:4:utils/execmd.cpp:993::ExecCmd::wait: got status 0x256
:3:index/recollindex.cpp:700::recollindex: starting up
:4:utils/execmd.cpp:469::ExecCmd::startExec: (0|0) /usr/bin/ionice {-c} {3} {-p} {29047}
:4:utils/execmd.cpp:993::ExecCmd::wait: got status 0x0
:4:rcldb/rcldb.cpp:917::Db::open: m_isopen 0 m_iswritable 0 mode 1
:5:rcldb/stoplist.cpp:35::StopList::StopList: file_to_string(/home/anthony/.recoll/stoplist.txt) failed: open/stat: errno: 2 :
:4:rcldb/rcldb.cpp:249::RclDb:: threads: haveWriteQ 1, wqlen 2 wqts 1
:4:rcldb/rcldb.cpp:943::Db::open: lastdocid: 5026
:4:index/fsindexer.cpp:135::FsIndexer: threads: haveIQ 1 iql 2 iqts 5 haveSQ 1 sql 2 sqts 3
:4:index/fsindexer.cpp:322::FsIndexer::indexFiles
:5:index/fsindexer.cpp:527::FsIndexerInternfileWorker: task fn /home/anthony/Filing/Schwab/2015/2015-06 — Important notice regarding delivery of security holder documents.pdf
:4:index/fsindexer.cpp:639::processone: needupdate 1 noretry 0 existing 4294967295 oldsig []
:5:index/fsindexer.cpp:672::processone: processing: [48 KB ] /home/anthony/Filing/Schwab/2015/2015-06 — Important notice regarding delivery of security holder documents.pdf
:5:internfile/internfile.cpp:123::FileInterner::FileInterner(fn=/home/anthony/Filing/Schwab/2015/2015-06 — Important notice regarding delivery of security holder documents.pdf)
:5:internfile/uncomp.cpp:41::Uncomp::Uncomp: m_docache: 0
:4:internfile/internfile.cpp:168::FileInterner::init fn [/home/anthony/Filing/Schwab/2015/2015-06 — Important notice regarding delivery of security holder documents.pdf] mime [(null)] preview 0
:4:utils/execmd.cpp:469::ExecCmd::startExec: (0|1) git-annex-meta-to-recoll {/home/anthony/Filing/Schwab/2015/2015-06 — Important notice regarding delivery of security holder documents.pdf}
:5:utils/netcon.cpp:369::Netcon::selectloop: fd 11 has 0x0 mask, erasing
:5:utils/execmd.cpp:827::ExecCmd::doexec: selectloop returned 0
:4:utils/execmd.cpp:993::ExecCmd::wait: got status 0x0
:4:internfile/mimehandler.cpp:268::getMimeHandler: mtype [application/pdf] filtertypes 1
:4:internfile/mimehandler.cpp:64::getMimeHandlerFromCache: 259dc001f1d38c8b4c425d19d34b1c4f cache size 0
:4:internfile/mimehandler.cpp:80::getMimeHandlerFromCache: 259dc001f1d38c8b4c425d19d34b1c4f not found
:4:internfile/internfile.cpp:255::FileInterner:: init ok application/pdf [/home/anthony/Filing/Schwab/2015/2015-06 — Important notice regarding delivery of security holder documents.pdf]
:4:internfile/internfile.cpp:767::FileInterner::internfile. ipath []
:4:internfile/mh_execm.cpp:157::MimeHandlerExecMultiple::next_document(): [/home/anthony/Filing/Schwab/2015/2015-06 — Important notice regarding delivery of security holder documents.pdf]
:4:internfile/mh_execm.cpp:39::MimeHandlerExecMultiple::startCmd
:4:utils/execmd.cpp:469::ExecCmd::startExec: (1|1) /usr/share/recoll/filters/rclpdf.py
WARNING: The creator of the input PDF:
   /home/anthony/Filing/Schwab/2015/2015-06 — Important notice regarding delivery of security holder documents.pdf
   has set an owner password (which is not required to handle this PDF).
   You did not supply this password. Please respect any copyright.
Traceback (most recent call last):
  File "/usr/share/recoll/filters/rclpdf.py", line 523, in <module>
    rclexecm.main(proto, extract)
  File "/usr/share/recoll/filters/rclexecm.py", line 330, in main
    proto.mainloop(extract)
  File "/usr/share/recoll/filters/rclexecm.py", line 257, in mainloop
    self.processmessage(processor, params)
  File "/usr/share/recoll/filters/rclexecm.py", line 237, in processmessage
    self.answer(data, ipath, eof)
  File "/usr/share/recoll/filters/rclexecm.py", line 176, in answer
    self.senditem("Document", docdata)
  File "/usr/share/recoll/filters/rclexecm.py", line 168, in senditem
    l = len(data)
TypeError: object of type 'NoneType' has no len()
:4:utils/execmd.cpp:936::ExecCmd::getline: got 0
:2:internfile/mh_execm.cpp:89::MHExecMultiple: getline error
:4:utils/execmd.cpp:280::ExecCmd: pid 29083 killpg(29083, SIGTERM)
:5:internfile/extrameta.cpp:37::Internfile:: setting [company] from cmd/xattr value [Charles Schwab]
:5:internfile/extrameta.cpp:37::Internfile:: setting [month] from cmd/xattr value [09]
:5:internfile/extrameta.cpp:37::Internfile:: setting [doctype] from cmd/xattr value [Notice Regarding Delivery of Security Holder Documents]
:5:internfile/extrameta.cpp:37::Internfile:: setting [year] from cmd/xattr value [2016]
:5:internfile/extrameta.cpp:37::Internfile:: setting [ga-file_extension] from cmd/xattr value [pdf]
:5:internfile/extrameta.cpp:37::Internfile:: setting [lastchanged] from cmd/xattr value [2017-04-09@21-54-42]
:5:internfile/extrameta.cpp:37::Internfile:: setting [ga-month] from cmd/xattr value [1]
:5:internfile/extrameta.cpp:37::Internfile:: setting [ga-year] from cmd/xattr value [2016]
:4:internfile/internfile.cpp:610::collectIpath..: fbytes->47902
:2:internfile/internfile.cpp:762::FileInterner::internfile: next_document error [/home/anthony/Filing/Schwab/2015/2015-06 — Important notice regarding delivery of security holder documents.pdf] application/pdf
:4:internfile/mimehandler.cpp:99::returnMimeHandler: returning filter for application/pdf cache size 0
:4:internfile/internfile.cpp:881::FileInterner::internfile: conversion ended with no doc
:5:index/fsindexer.cpp:504::FsIndexerDbUpdWorker: task ql 1
:4:rcldb/rcldb.cpp:1462::Db::add: udi [/home/anthony/Filing/Schwab/2015/2015-06 — Important notice regarding delivery of security holder documents.pdf|] parent []
:5:rcldb/rcldb.cpp:1581::Db::add: field [company] pfx [XYC] inc 1: [Charles Schwab]
:5:rcldb/rcldb.cpp:1581::Db::add: field [containerfilename] pfx [XCFN] inc 1: [2015-06 — Important notice regarding delivery of security holder documents.pdf]
:5:rcldb/rcldb.cpp:1581::Db::add: field [doctype] pfx [XYT] inc 1: [Notice Regarding Delivery of Security Holder Documents]
:5:rcldb/rcldb.cpp:1581::Db::add: field [filename] pfx [XSFN] inc 1: [2015-06 — Important notice regarding delivery of security holder documents.pdf]
:5:rcldb/rcldb.cpp:1589::Db::add: no prefix for field [ga-file_extension], no indexing
:5:rcldb/rcldb.cpp:1589::Db::add: no prefix for field [ga-month], no indexing
:5:rcldb/rcldb.cpp:1589::Db::add: no prefix for field [ga-year], no indexing
:5:rcldb/rcldb.cpp:1589::Db::add: no prefix for field [lastchanged], no indexing
:4:rcldb/rcldb.cpp:1569::Adding value: for field month slot 1002
:5:rcldb/rclvalues.cpp:56::Rcl::add_field_value: slot 1002 [09]
:5:rcldb/rcldb.cpp:1581::Db::add: field [month] pfx [XYM] inc 1: [09]
:4:rcldb/rcldb.cpp:1569::Adding value: for field year slot 1003
:5:rcldb/rclvalues.cpp:56::Rcl::add_field_value: slot 1003 [2016]
:5:rcldb/rcldb.cpp:1581::Db::add: field [year] pfx [XYY] inc 1: [2016]
:5:rcldb/rcldb.cpp:1824::Rcl::Db::add: new doc record:
url=file:///home/anthony/Filing/Schwab/2015/2015-06 — Important notice regarding delivery of security holder documents.pdf
mtype=application/pdf
fmtime=01453055267
origcharset=
fbytes=47902
pcbytes=47902
dbytes=0
sig=479021453130170+
company=Charles Schwab
doctype=Notice Regarding Delivery of Security Holder Documents
filename=2015-06 — Important notice regarding delivery of security holder documents.pdf
lastchanged=2017-04-09@21-54-42

:4:rcldb/rcldb.cpp:205::DbUpdWorker: got add/update task, ql 1
:3:rcldb/rcldb.cpp:753::Db::add: docid 2373 updated [/home/anthony/Filing/Schwab/2015/2015-06 — Important notice regarding delivery of security holder documents.pdf|]
:5:internfile/uncomp.cpp:138::Uncomp::~Uncomp: m_docache: 0 m_dir (null)
:3:rcldb/rcldb.cpp:1938::Db::waitUpdIdle: total xapian work 396 mS
:4:index/fsindexer.cpp:386::Indexfiles: purging orphans
:3:rcldb/rcldb.cpp:1938::Db::waitUpdIdle: total xapian work 396 mS
:4:index/fsindexer.cpp:398::FsIndexer::indexFiles: done
:4:rcldb/rcldb.cpp:997::Db::i_close(0): m_isopen 1 m_iswritable 1
:3:rcldb/rcldb.cpp:1938::Db::waitUpdIdle: total xapian work 396 mS
:4:rcldb/rcldb.cpp:1011::Rcl::Db:close: xapian will close. May take some time
:4:./utils/workqueue.h:176::setTerminateAndWait:DbUpd
:4:./utils/workqueue.h:312::WorkQueue:ok:DbUpd: not ok m_ok 0 m_workers_exited 0 m_worker_threads size 1
:4:./utils/workqueue.h:293::workerExit:DbUpd
:3:./utils/workqueue.h:192::DbUpd: tasks 1 nowakes 1 wsleeps 2 csleeps 0
:4:./utils/workqueue.h:216::setTerminateAndWait:DbUpd done
:4:rcldb/rcldb.cpp:1015::Rcl::Db:close() xapian close done.
:4:internfile/mimehandler.cpp:129::clearMimeHandlerCache()
:4:./utils/workqueue.h:176::setTerminateAndWait:Internfile
:4:./utils/workqueue.h:312::WorkQueue:ok:Internfile: not ok m_ok 0 m_workers_exited 0 m_worker_threads size 5
:4:./utils/workqueue.h:293::workerExit:Internfile
:4:./utils/workqueue.h:312::WorkQueue:ok:Internfile: not ok m_ok 0 m_workers_exited 1 m_worker_threads size 5
:4:./utils/workqueue.h:293::workerExit:Internfile
:4:./utils/workqueue.h:312::WorkQueue:ok:Internfile: not ok m_ok 0 m_workers_exited 1 m_worker_threads size 5
:4:./utils/workqueue.h:293::workerExit:Internfile
:4:./utils/workqueue.h:312::WorkQueue:ok:Internfile: not ok m_ok 0 m_workers_exited 1 m_worker_threads size 5
:4:./utils/workqueue.h:293::workerExit:Internfile
:4:./utils/workqueue.h:312::WorkQueue:ok:Internfile: not ok m_ok 0 m_workers_exited 1 m_worker_threads size 5
:4:./utils/workqueue.h:293::workerExit:Internfile
:3:./utils/workqueue.h:192::Internfile: tasks 1 nowakes 0 wsleeps 6 csleeps 0
:4:./utils/workqueue.h:216::setTerminateAndWait:Internfile done
:5:index/fsindexer.cpp:147::FsIndexer: internfile wrkr status: 0x1 (1->ok)
:4:./utils/workqueue.h:176::setTerminateAndWait:Split
:4:./utils/workqueue.h:312::WorkQueue:ok:Split: not ok m_ok 0 m_workers_exited 0 m_worker_threads size 3
:4:./utils/workqueue.h:293::workerExit:Split
:4:./utils/workqueue.h:312::WorkQueue:ok:Split: not ok m_ok 0 m_workers_exited 0 m_worker_threads size 3
:4:./utils/workqueue.h:293::workerExit:Split
:4:./utils/workqueue.h:312::WorkQueue:ok:Split: not ok m_ok 0 m_workers_exited 0 m_worker_threads size 3
:4:./utils/workqueue.h:293::workerExit:Split
:3:./utils/workqueue.h:192::Split: tasks 1 nowakes 1 wsleeps 4 csleeps 0
:4:./utils/workqueue.h:216::setTerminateAndWait:Split done
:5:index/fsindexer.cpp:151::FsIndexer: dbupd worker status: 0x1 (1->ok)
:4:rcldb/rcldb.cpp:891::Db::~Db: isopen 0 m_iswritable 0
:4:rcldb/rcldb.cpp:997::Db::i_close(1): m_isopen 0 m_iswritable 0

Script done on 2019-06-09 14:59:24-04:00 [COMMAND_EXIT_CODE="0"]

Attachment: pdfpMY0Ua13RS.pdf
Description: Adobe PDF document

# The system-wide configuration files for recoll are located in:
#   /usr/share/recoll/examples
# The default configuration files are commented, you should take a look
# at them for an explanation of what can be set (you could also take a look
# at the manual instead).
# Values set in this file will override the system-wide values for the file
# with the same name in the central directory. The syntax for setting
# values is identical.

followLinks = 1
topdirs = /home/anthony/Filing
skippedPaths = ~/Filing/.git

pdfocr = 1
pdfocrlang = eng
pdfattach = 0
pdfextrameta = pdf:Producer xmp:ModifyDate xmp:CreateDate rdf:title \
xapMM:DocumentID dc:creator dc:relation dc:publisher \
dc:title dc:type dc:identifier pages xap:CreateDate \
xap:CreatorTool

# bloody OCR takes forever.
filtermaxseconds = 7200

# The default causes pdftk to fail to initialize the Java VM; disable it. Better to use
# cgroups anyway to limit the whole group of processes (if it's ever needed).
filtermaxmbytes = 0

loglevel = 3

nomd5types = 
compressedfilemaxkbs = 200000

indexallfilenames = 1
indexStoreDocText = 1
nonumbers = 0
idxflushmb = 20

metadatacmds = ; rclmulti01 = git-annex-meta-to-recoll %f

indexstemminglanguages = english
defaultcharset = UTF8//

aspellLanguage = en
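A sketch of what a metadata command such as `git-annex-meta-to-recoll` might print. It assumes that, because the command's name in `metadatacmds` starts with `rclmulti`, recoll parses the output as configuration-style `name = value` lines. The helper `format_fields()` and the sample dict are hypothetical; the field names and values are copied from the log above.

```python
def format_fields(fields):
    """Render a metadata mapping as configuration-style "name = value" lines.

    Assumes the rclmulti convention: one field per line, "name = value".
    """
    lines = []
    for name, value in fields.items():
        # Collapse runs of whitespace (including newlines) so that each
        # field stays on a single line.
        value = " ".join(str(value).split())
        lines.append("%s = %s" % (name, value))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sample built from values seen in the indexing log.
    print(format_fields({"company": "Charles Schwab", "year": 2016}), end="")
```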
[prefixes]
company = XYC
day     = XYD; pfxonly = 1
month   = XYM; pfxonly = 1
doctype = XYT
year    = XYY; pfxonly = 1

[values]
day   = 1001; type=int; len=2
month = 1002; type=int; len=2
year  = 1003; type=int; len=4

[stored]
company =
doctype = 
lastchanged =

[aliases]
company = ga-company
year = ga-document_year
month = ga-document_month
day = ga-document_day
doctype = ga-document_type
lastchanged = ga-lastchanged
keywords = keyword xesam:keyword tag tags dc:subject xesam:subject \
           dc:description ga-tag
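The following sketch shows one way the fields file above might be applied to the metadata in the log. Names listed under `[aliases]` map to their canonical field. Fields under `[values]` with `type=int` are zero-padded to `len` digits so that they sort correctly as strings. The padding behaviour is an assumption about recoll's value slots, and the helpers `build_alias_map()` and `normalize()` are hypothetical. `ga-month` and `ga-year` are not listed as aliases (only `ga-document_month` and `ga-document_year` are), which matches the log's "no prefix for field [ga-month]" messages.

```python
# [aliases] section: canonical field -> space-separated alias names.
ALIASES = {
    "company": "ga-company",
    "year": "ga-document_year",
    "month": "ga-document_month",
    "day": "ga-document_day",
    "doctype": "ga-document_type",
    "lastchanged": "ga-lastchanged",
    "keywords": "keyword xesam:keyword tag tags dc:subject xesam:subject "
                "dc:description ga-tag",
}

# [values] section: field -> (slot, type, len).
VALUES = {
    "day": (1001, "int", 2),
    "month": (1002, "int", 2),
    "year": (1003, "int", 4),
}


def build_alias_map(aliases):
    """Invert {canonical: "alias1 alias2"} into {alias: canonical}."""
    amap = {}
    for canonical, names in aliases.items():
        for alias in names.split():
            amap[alias] = canonical
    return amap


def normalize(name, value, amap=None):
    """Return (canonical_name, stored_value) for one metadata field."""
    if amap is None:
        amap = build_alias_map(ALIASES)
    # Names that are not aliases pass through unchanged.
    canonical = amap.get(name, name)
    spec = VALUES.get(canonical)
    if spec is not None and spec[1] == "int":
        # Assumed behaviour: left-pad with zeros to the declared length.
        value = str(int(value)).zfill(spec[2])
    return canonical, value
```

For example, `normalize("ga-document_month", "9")` returns `("month", "09")`, while `normalize("ga-month", "1")` returns `("ga-month", "1")` unchanged.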
