I'm confused now, so here is my last question. I added this to my solrconfig.xml:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">c:\solr\conf\db-config.xml</str>
  </lst>
</requestHandler>

And I wrote my db-config.xml like this:

<dataConfig>
  <dataSource type="BinFileDataSource" name="bin" />
  <document>
    <entity name="sd" processor="FileListEntityProcessor" newerThan="'NOW-30DAYS'"
            fileName=".*pdf$" baseDir="D:\myfiles" recursive="true" rootEntity="false"
            transformer="DateFormatTransformer">
      <entity name="tika-test" processor="TikaEntityProcessor"
              url="${sd.fileAbsolutePath}" format="text" dataSource="bin">
        <field column="author" name="author" meta="true"/>
        <field column="title" name="title" meta="true"/>
        <field column="description" name="description" />
        <field column="comments" name="comments" />
        <field column="content_type" name="content_type" />
        <field column="last_modified" name="last_modified" />
      </entity>
      <!-- field column="fileLastModified" name="date" dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss" / -->
      <field column="fileSize" name="size"/>
      <field column="file" name="filename"/>
    </entity>
  </document>
</dataConfig>

Should this work, in your opinion, or do you see an error in this configuration?

Thanks,
Alessio

On 17 February 2012 21:29, Erick Erickson <erickerick...@gmail.com> wrote:

> Sorry, my error! In that case you *do* have to do some fiddling to get
> it all to work.
>
> Good Luck!
> Erick
>
> On Fri, Feb 17, 2012 at 3:27 PM, alessio crisantemi
> <alessio.crisant...@gmail.com> wrote:
> > I tried... but I am working with Solr 1.4.1.
> >
> > On 17 February 2012 15:59, Erick Erickson
> > <erickerick...@gmail.com> wrote:
> >
> >> You should not have to do anything with Maven; the instructions
> >> you followed were from the 1.4.1 days.
> >>
> >> Assuming you're working with a 3.x build, here's a data-config
> >> that worked for me, just a straight distro.
> >> But note a couple of things:
> >> 1> For simplicity, I changed the schema.xml to NOT require
> >>    the id field. You'll probably have to change this back and
> >>    select a good <uniqueKey>.
> >> 2> I had to add this line to solrconfig.xml to find the path:
> >>    <lib dir="../../dist/"
> >>         regex="apache-solr-dataimporthandler-extras-\d.*\.jar"/>
> >> 3> If this all works without errors in the Solr log and you still
> >>    can't find anything, be sure you issue a commit.
> >>
> >> Best
> >> Erick
> >>
> >> <dataConfig>
> >>   <dataSource name="bin" type="BinFileDataSource"/>
> >>   <document>
> >>     <entity baseDir="/Users/Erick/testdocs" fileName=".*pdf" name="sd"
> >>             processor="FileListEntityProcessor" recursive="true"
> >>             rootEntity="false">
> >>       <entity dataSource="bin" format="text" name="tika-test"
> >>               processor="TikaEntityProcessor" url="${sd.fileAbsolutePath}">
> >>         <field column="Author" meta="true" name="author"/>
> >>         <field column="Content-Type" meta="true" name="title"/>
> >>         <!-- field column="title" name="title" meta="true"/ -->
> >>         <field column="text" name="text"/>
> >>       </entity>
> >>       <!-- field column="fileLastModified" name="date"
> >>            dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss" / -->
> >>       <field column="fileSize" meta="true" name="size"/>
> >>     </entity>
> >>   </document>
> >> </dataConfig>
> >>
> >> On Fri, Feb 17, 2012 at 9:35 AM, alessio crisantemi
> >> <alessio.crisant...@gmail.com> wrote:
> >> > Thanks, Gora, for your help.
> >> > I installed Maven and downloaded Tika following the guide, but I got
> >> > an error about the 'tika compiler' during the Tika build, and the
> >> > Maven installation of Tika stopped.
> >> >
> >> > Is there another way?
> >> > Thank you,
> >> > a.
> >> >
> >> > 2012/2/16 Gora Mohanty <g...@mimirtech.com>
> >> >
> >> >> On 16 February 2012 21:37, alessio crisantemi
> >> >> <alessio.crisant...@gmail.com> wrote:
> >> >> > Here is the log:
> >> >> >
> >> >> > org.apache.solr.handler.dataimport.DataImporter doFullImport
> >> >> > Grave: Full Import failed
> >> >> > org.apache.solr.handler.dataimport.DataImportHandlerException: 'baseDir' is
> >> >> > a required attribute Processing Document # 1
> >> >> [...]
> >> >>
> >> >> The exception message above is pretty clear. You need to define a
> >> >> baseDir attribute for the second entity.
> >> >>
> >> >> However, even if you fix this, the setup will *not* work for indexing
> >> >> PDFs. Did you read the URLs that I sent earlier?
> >> >>
> >> >> Regards,
> >> >> Gora