Thanks for the help. @Alexandre: thanks for the suggestion, I'll try using an ExtractingRequestHandler. I thought I was missing some DIH option :).
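Alexandre's suggestion, sending the files to Solr Cell (ExtractingRequestHandler) instead of going through DIH, could look roughly like the fragment below. This is a sketch for Solr 4.x. It assumes the extraction contrib jars are already loaded through <lib> directives, and that the schema defines a "text" field and a "text_general" field type as the example schema does. The uprefix parameter prefixes every metadata name that has no matching schema field, so the attr_* dynamic field catches all of them without listing them one by one.

```xml
<!-- solrconfig.xml: Solr Cell handler (assumes the extraction jars are on the core's lib path) -->
<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- lowercase metadata names and turn non-alphanumeric characters into "_" -->
    <str name="lowernames">true</str>
    <!-- store Tika's extracted body text in the "text" field -->
    <str name="fmap.content">text</str>
    <!-- any metadata name with no matching schema field gets the "attr_" prefix -->
    <str name="uprefix">attr_</str>
  </lst>
</requestHandler>

<!-- schema.xml: catch-all for the prefixed metadata fields -->
<dynamicField name="attr_*" type="text_general" indexed="true" stored="true" multiValued="true"/>
```

Each file is then posted to /update/extract with its id passed as literal.id=... . Adding extractOnly=true returns what Tika extracted without indexing it, which is a quick way to see which metadata keys a given document actually carries.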
@Erick: I'm interested in knowing them all so I can do various forms of analysis. I have documents coming from heterogeneous sources and I want to search inside the content, but also to extract all possible metadata. I'm working in .NET, so it is useful to let Tika do everything for me directly in Solr and then retrieve all the metadata for the matched documents.

Thanks again to everyone.

--
Gian Maria Ricci
Mobile: +39 320 0136949

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Sunday, May 26, 2013 5:30 PM
To: solr-user@lucene.apache.org; Gian Maria Ricci
Subject: Re: Tika: How can I import automatically all metadata without specifying them explicitly

In addition to Alexandre's comment:

bq: ...I'd like to import in my index all metadata

Be a little careful here; this isn't actually very useful in my experience. Sure, it's nice to have all that data in the index, but... how do you search it meaningfully?

Consider that some doc may have an "author" metadata field. Another may have a "last editor" field. Yet another may have a "main author" field. If you add all of these under their own field names, what do you do to search for "author"? Somehow you have to create a mapping between the various metadata names and something that's searchable, so why not do this at index time?

Not to mention that I've seen this done, and the result may be literally hundreds of different metadata fields which are not very useful.

All that said, it may be perfectly valid to index them all, but before going there it's worth considering whether the result is actually _useful_.

Best
Erick

On Sat, May 25, 2013 at 4:44 AM, Gian Maria Ricci <alkamp...@nablasoft.com> wrote:

> Hi everyone,
>
> I've configured the import of a document folder with
> FileListEntityProcessor and everything went smoothly on the first try, but
> I have a simple question.
> I'm able to map metadata without any problem, but I'd like to import
> all metadata into my index, not only those I've configured with field
> nodes. In this example I've imported Author and title, but I do not know
> in advance which metadata a document could have, and I wish to have all
> of them inside my index.
>
> Here is my import config. It is my first try at importing with Tika,
> and probably I'm missing something simple.
>
> <dataConfig>
>   <dataSource type="BinFileDataSource" />
>   <document>
>     <!-- accept only .doc, .docx and .pdf files -->
>     <entity name="files" dataSource="null" rootEntity="false"
>             processor="FileListEntityProcessor"
>             baseDir="c:/temp/docs" fileName=".*\.(doc|docx|pdf)"
>             onError="skip"
>             recursive="true">
>       <field column="file" name="id" />
>       <field column="fileAbsolutePath" name="path" />
>       <field column="fileSize" name="size" />
>       <field column="fileLastModified" name="lastModified" />
>
>       <entity name="documentImport"
>               processor="TikaEntityProcessor"
>               url="${files.fileAbsolutePath}"
>               format="text">
>         <field column="file" name="fileName"/>
>         <field column="Author" name="author" meta="true"/>
>         <field column="title" name="title" meta="true"/>
>         <field column="text" name="text"/>
>       </entity>
>     </entity>
>   </document>
> </dataConfig>
>
> --
> Gian Maria Ricci
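Erick's point about mapping metadata names to something searchable at index time can be expressed with Solr Cell's fmap.<source>=<target> parameters. This is a minimal sketch. It assumes the ExtractingRequestHandler runs with lowernames=true, which lowercases names and turns non-alphanumeric characters into underscores, so "Last-Author" arrives as "last_author". It also assumes the schema has a multiValued "author" field and that the handler already deals with any remaining unknown names (for example through uprefix). The source names last_author and main_author are hypothetical, mirroring Erick's "last editor" / "main author" example. The real keys vary by file format and Tika version and are worth checking with extractOnly=true first.

```xml
<!-- solrconfig.xml, inside the ExtractingRequestHandler's "defaults" list -->
<lst name="defaults">
  <!-- lowernames is applied before fmap, so fmap keys use the lowercased form -->
  <str name="lowernames">true</str>
  <str name="fmap.content">text</str>
  <!-- hypothetical source names: fold several author-like keys into one field -->
  <str name="fmap.last_author">author</str>
  <str name="fmap.main_author">author</str>
</lst>

<!-- schema.xml: multiValued because several metadata keys may land here -->
<field name="author" type="text_general" indexed="true" stored="true" multiValued="true"/>
```

A plain "Author" key already becomes "author" after lowercasing, so it needs no fmap entry of its own. A search on author:... then covers all three variants.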