Have you tried running Tika directly to see what it actually outputs? Maybe the custom properties all come out prefixed somehow. Or try sending one sample file directly to the extract handler, with the ignored_* dynamicField temporarily set to stored, to see what actually happens.
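For the second option, a minimal Python sketch of posting one file to the extract handler might look like the following. It assumes the handler is mounted at /update/extract, that the core URL is http://localhost:8983/solr/collection1, and that the schema's uniqueKey is named id. Those values, along with the helper names build_extract_url and post_file, are illustrative assumptions and not taken from your setup. The uprefix parameter prepends a prefix to every field name that is not in the schema, so unmapped metadata lands in the ignored_* dynamicField, which needs stored="true" for the duration of the test.

```python
import urllib.parse
import urllib.request


def build_extract_url(solr_core_url, doc_id, id_field="id", uprefix="ignored_"):
    """Build an /update/extract URL that keeps unmapped Tika metadata.

    id_field is an assumption: use whatever the schema's uniqueKey is.
    uprefix is prepended to every field name the schema does not define,
    so those fields match the ignored_* dynamicField.
    """
    params = {
        "literal." + id_field: doc_id,
        "uprefix": uprefix,
        "commit": "true",
        "wt": "json",
    }
    query = urllib.parse.urlencode(params)
    return solr_core_url.rstrip("/") + "/update/extract?" + query


def post_file(url, path, content_type="application/octet-stream"):
    """POST the raw file bytes to Solr and return the response body as text."""
    with open(path, "rb") as handle:
        body = handle.read()
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling post_file(build_extract_url("http://localhost:8983/solr/collection1", "sample-1"), "/tmp/docs/ff-1923-12.docx") and then querying for that document should list every metadata name Tika produced, including any prefixed variant of Testmeta. Adding extractOnly=true to the query string instead returns the extracted content and metadata without indexing anything.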
Basically, check what is there before trying to figure out what is not there. Sometimes it is faster in a multi-step chain of actions.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Tue, Mar 18, 2014 at 3:59 PM, Anders Gustafsson <anders.gustafs...@pedago.fi> wrote:
> solr-spec 4.6.1
> lucene-spec 4.6.0
> lux-appserver 1.1.0
> tika 1.4
> poi 3.9
>
> Hi!
>
> I set it up, pretty much following the instructions at
> http://www.codewrecks.com/blog/index.php/2013/05/25/import-folder-of-documents-with-apache-solr-4-0-and-tika/
>
> Problem is that I cannot seem to import custom properties? Ie I created
> a word 2013 doc with a custom property called "Testmeta". It is visible
> in custom.xml if I open up the ooxml file in winzip. I then tried to map
> it for import in data-config.xml:
>
> <dataConfig>
>   <dataSource type="BinFileDataSource" />
>   <document>
>     <entity name="files" dataSource="null" rootEntity="false"
>             processor="FileListEntityProcessor"
>             baseDir="/tmp/docs" fileName=".*.(doc)|(pdf)|(docx)"
>             onError="skip"
>             recursive="true">
>       <field column="fileAbsolutePath" name="lux_uri" />
>       <field column="fileSize" name="size" />
>       <field column="fileLastModified" name="lastModified" />
>
>       <entity
>           name="documentImport"
>           processor="TikaEntityProcessor"
>           url="${files.fileAbsolutePath}"
>           format="text">
>         <field column="file" name="fileName"/>
>         <field column="Author" name="author" meta="true"/>
>         <field column="title" name="title" meta="true"/>
>         <field column="text" name="text"/>
>         <field column="Testmeta" name="Testmeta" meta="true"/>
>         <field column="LastModifiedBy" name="LastModifiedBy" meta="true"/>
>       </entity>
>     </entity>
>   </document>
> </dataConfig>
>
> and schema.xml:
>
> <field name="Testmeta" type="text" indexed="true" stored="true" />
>
> Still I see no mention of the field when I do an import (below).
> According to https://issues.apache.org/jira/browse/TIKA-695 it should
> work. But I see no mention of any special config that needs to be done.
>
>
> Any help appreciated!
>
> "mode": "debug",
> "documents": [
>   {
>     "size": [
>       14516
>     ],
>     "lastModified": [
>       "2014-03-18T06:53:14Z"
>     ],
>     "lux_uri": [
>       "/tmp/docs/ff-1923-12.docx"
>     ],
>     "text": [
>       "Förordning ........."
>     ],
>     "title": [
>       "Förordning ........"
>     ],
>     "author": [
>       "Lagberedningen"
>     ],
>     "_version_": [
>       1462902187294195700
>     ]
>   }
> ],
>
> --
> Anders Gustafsson
> Engineer, CNI, CNE6, ASE
> Pedago, The Aaland Islands (N60 E20)
> www.pedago.fi
> phone +358 18 12060
> mobile +358 40506 7099