Standalone Tika can also run in network server mode. That adds data round trips but gives you more options, even from .NET.
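
To make the server-mode idea concrete, here is a rough sketch of a client that sends one file to a running Tika server and gets back whatever metadata Tika found. It is only an illustration under assumptions: it assumes a Tika server is already listening on localhost:9998 (its usual default port) and that the server version in use accepts a PUT to /meta and can answer in JSON. The function names are made up for this example. A .NET client would issue the same HTTP request.

```python
import json
import urllib.request

# Assumption: a Tika server is already running at this address.
TIKA_URL = "http://localhost:9998"


def build_meta_request(data: bytes, base_url: str = TIKA_URL) -> urllib.request.Request:
    """Build (but do not send) a PUT request for the server's /meta endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/meta",
        data=data,
        method="PUT",
        # Ask for JSON; whether the server honours this depends on its version.
        headers={"Accept": "application/json"},
    )


def fetch_metadata(path: str, base_url: str = TIKA_URL) -> dict:
    """Send one file to the Tika server and return the metadata it reports."""
    with open(path, "rb") as f:
        request = build_meta_request(f.read(), base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling fetch_metadata("c:/temp/docs/sample.pdf") (a hypothetical file) would return a dictionary with every metadata name Tika reported for that document, so nothing has to be listed in advance. The extra round trip is the price: each file travels to the Tika server and its metadata still has to be sent on to Solr.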

Regards,
   Alex

On 27 May 2013 04:22, "Gian Maria Ricci" <alkamp...@nablasoft.com> wrote:

> Thanks for the help.
>
> @Alexandre: Thanks for the suggestion, I'll try to use an
> ExtractingRequestHandler, I thought that I was missing some DIH option :).
>
> @Erik: I'm interested in knowing them all to do various form of analysis. I
> have documents coming from heterogeneous sources and I'm interested in
> searching inside the content, but also being able to extract all possible
> metadata. I'm working in .Net so it is useful letting tika doing everything
> for me directly in solr and then retrieve all metadata for matched
> documents.
>
> Thanks again to everyone.
>
> --
> Gian Maria Ricci
> Mobile: +39 320 0136949
>
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Sunday, May 26, 2013 5:30 PM
> To: solr-user@lucene.apache.org; Gian Maria Ricci
> Subject: Re: Tika: How can I import automatically all metadata without
> specifiying them explicitly
>
> In addition to Alexandre's comment:
>
> bq: ...I'd like to import in my index all metadata
>
> Be a little careful here, this isn't actually very useful in my experience.
> Sure it's nice to have all that data in the index, but... how do you search
> it meaningfully?
>
> Consider that some doc may have an "author" metadata field. Another may
> have a "last editor" field. Yet another may have a "main author" field. If
> you add all these as their field name, what do you do to search for
> "author"? Somehow you have to create a mapping between the various metadata
> names and something that's searchable, why not do this at index time?
>
> Not to mention I've seen this done and the result may be literally hundreds
> of different metadata fields which are not very useful.
>
> All that said, it may be perfectly valid to index them all, but before going
> there it's worth considering whether the result is actually _useful_.
>
> Best
> Erick
>
>
> On Sat, May 25, 2013 at 4:44 AM, Gian Maria Ricci
> <alkamp...@nablasoft.com> wrote:
>
> > Hi to everyone,
> >
> > I've configured import of a document folder with
> > FileListEntityProcessor, everything went smooth on the first try, but
> > I have a simple question. I'm able to map metadata without any
> > problem, but I'd like to import in my index all metadata, not only
> > those I've configured with field nodes. In this example I've imported
> > Author and title, but I do not know in advance which metadata a
> > document could have and I wish to have all of them inside my
> > index.
> >
> > Here is my import config. It is the first try with importing with tika
> > and probably I'm missing something simple.
> >
> > <dataConfig>
> >   <dataSource type="BinFileDataSource" />
> >   <document>
> >     <entity name="files"
> >             dataSource="null" rootEntity="false"
> >             processor="FileListEntityProcessor"
> >             baseDir="c:/temp/docs"
> >             fileName=".*\.(doc)|(pdf)|(docx)"
> >             onError="skip"
> >             recursive="true">
> >       <field column="file" name="id" />
> >       <field column="fileAbsolutePath" name="path" />
> >       <field column="fileSize" name="size" />
> >       <field column="fileLastModified" name="lastModified" />
> >
> >       <entity name="documentImport"
> >               processor="TikaEntityProcessor"
> >               url="${files.fileAbsolutePath}"
> >               format="text">
> >         <field column="file" name="fileName"/>
> >         <field column="Author" name="author" meta="true"/>
> >         <field column="title" name="title" meta="true"/>
> >         <field column="text" name="text"/>
> >       </entity>
> >     </entity>
> >   </document>
> > </dataConfig>
> >
> > --
> > Gian Maria Ricci
> > Mobile: +39 320 0136949