The Tika integration in DIH (TikaEntityProcessor) does not support wildcard
field mapping. If you are not planning to do any inner-entity content parsing,
you might be better off using ExtractingRequestHandler with the uprefix
parameter, which prefixes every field not defined in the schema.
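
A minimal sketch of that alternative, in case it helps. The "attr_" prefix,
the "text_general" field type, and the mapping of the extracted body to a
"text" field are assumptions for illustration; adjust them to your own
schema. The first fragment would go in solrconfig.xml, the second in
schema.xml:

```xml
<!-- solrconfig.xml: metadata fields unknown to the schema get the "attr_" prefix -->
<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- normalize metadata names to lowercase with underscores -->
    <str name="lowernames">true</str>
    <!-- assumed prefix; must match the dynamicField pattern below -->
    <str name="uprefix">attr_</str>
    <!-- send the extracted body to the "text" field (assumed to exist) -->
    <str name="fmap.content">text</str>
  </lst>
</requestHandler>

<!-- schema.xml: catch-all for every prefixed metadata field -->
<dynamicField name="attr_*" type="text_general"
              indexed="true" stored="true" multiValued="true"/>
```

With something like this in place, any metadata Tika reports that has no
matching schema field should be indexed as attr_<name> instead of being
rejected, so you do not need to list each field in advance.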

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Sat, May 25, 2013 at 4:44 AM, Gian Maria Ricci
<alkamp...@nablasoft.com> wrote:

> Hi everyone,
>
> I’ve configured the import of a document folder with FileListEntityProcessor.
> Everything went smoothly on the first try, but I have a simple question. I’m
> able to map metadata without any problem, but I’d like to import all
> metadata into my index, not only the fields I’ve configured with field
> nodes. In this example I’ve imported Author and title, but I do not know in
> advance which metadata a document could have, and I wish to have all of it
> in my index.
>
> Here is my import config. It is my first try at importing with Tika, and
> I’m probably missing something simple.
>
> <dataConfig>
>   <dataSource type="BinFileDataSource" />
>   <document>
>     <!-- Outer entity: lists the files under baseDir; the alternation is
>          grouped so the extension match applies to all three types -->
>     <entity name="files" dataSource="null" rootEntity="false"
>             processor="FileListEntityProcessor"
>             baseDir="c:/temp/docs"
>             fileName=".*\.(doc|pdf|docx)"
>             onError="skip"
>             recursive="true">
>       <field column="file" name="id" />
>       <field column="fileAbsolutePath" name="path" />
>       <field column="fileSize" name="size" />
>       <field column="fileLastModified" name="lastModified" />
>
>       <!-- Inner entity: runs Tika on each file found above -->
>       <entity name="documentImport"
>               processor="TikaEntityProcessor"
>               url="${files.fileAbsolutePath}"
>               format="text">
>         <field column="file" name="fileName"/>
>         <field column="Author" name="author" meta="true"/>
>         <field column="title" name="title" meta="true"/>
>         <field column="text" name="text"/>
>       </entity>
>     </entity>
>   </document>
> </dataConfig>
>
> --
> Gian Maria Ricci
> Mobile: +39 320 0136949
> <http://mvp.microsoft.com/en-us/mvp/Gian%20Maria%20Ricci-4025635>
> <http://www.linkedin.com/in/gianmariaricci>
> <https://twitter.com/alkampfer>
> <http://feeds.feedburner.com/AlkampferEng>
