I'm confused now.
So, one last question:
I added this to my solrconfig.xml:

<requestHandler name="/dataimport"
class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">c:\solr\conf\db-config.xml</str>
  </lst>
</requestHandler>
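
For comparison, here is a minimal sketch of how this handler registration might sit next to the <lib> directive from Erick's earlier reply (quoted below). It assumes a 3.x distribution layout; on 1.4.1 the jar names and locations differ. The bare file name db-config.xml is an assumption: a relative name is normally resolved against the core's conf directory, which avoids the hard-coded Windows path.

```xml
<!-- Sketch only: the lib path and regex are copied from the quoted 3.x reply.
     The Tika jars themselves must also be on the classpath (not shown here). -->
<lib dir="../../dist/"
     regex="apache-solr-dataimporthandler-extras-\d.*\.jar"/>

<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <!-- Assumption: db-config.xml lives in the same conf directory as solrconfig.xml -->
    <str name="config">db-config.xml</str>
  </lst>
</requestHandler>
```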


And I wrote my db-config.xml like this:
<dataConfig>
  <dataSource type="BinFileDataSource" name="bin" />
  <document>
    <entity name="sd"
      processor="FileListEntityProcessor"
      newerThan="'NOW-30DAYS'"
      fileName=".*pdf$"
      baseDir="D:\myfiles"
      recursive="true"
      rootEntity="false"
      transformer="DateFormatTransformer"
    >
      <entity name="tika-test" processor="TikaEntityProcessor"
        url="${sd.fileAbsolutePath}" format="text" dataSource="bin">
        <field column="author" name="author" meta="true"/>
        <field column="title" name="title" meta="true"/>
        <field column="description" name="description" />
        <field column="comments" name="comments" />
        <field column="content_type" name="content_type" />
        <field column="last_modified" name="last_modified" />
      </entity>

      <!-- field column="fileLastModified" name="date"
        dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss" / -->
      <field column="fileSize" name="size"/>
      <field column="file" name="filename"/>
    </entity>
  </document>
</dataConfig>
In your opinion, should this work, or do you see an error in this configuration?
thanks,
alessio
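
Comparing the configuration above with the 3.x example quoted below suggests a few things to check. The following is a sketch under those assumptions, not a verified fix: the quoted example uses a capitalised metadata column (Author), marks metadata fields with meta="true", and maps the extracted body from the text column, which the configuration above never does. The target field names (author, title, text, size, filename) are assumed to exist in schema.xml.

```xml
<dataConfig>
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <!-- Outer entity only lists files; rootEntity="false" means it creates no documents itself -->
    <entity name="sd"
            processor="FileListEntityProcessor"
            baseDir="D:\myfiles"
            fileName=".*\.pdf$"
            newerThan="'NOW-30DAYS'"
            recursive="true"
            rootEntity="false">
      <!-- Inner entity runs Tika on each file found by the outer entity -->
      <entity name="tika-test"
              processor="TikaEntityProcessor"
              url="${sd.fileAbsolutePath}"
              format="text"
              dataSource="bin">
        <!-- Assumption: metadata names follow the quoted example's capitalisation -->
        <field column="Author" name="author" meta="true"/>
        <field column="title" name="title" meta="true"/>
        <!-- The extracted body text arrives in the "text" column -->
        <field column="text" name="text"/>
      </entity>
      <!-- File-level columns supplied by FileListEntityProcessor -->
      <field column="fileSize" name="size"/>
      <field column="file" name="filename"/>
    </entity>
  </document>
</dataConfig>
```

The DateFormatTransformer is left out here because the only field with a dateTimeFormat is commented out in the original.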



On 17 February 2012 21:29, Erick Erickson
<erickerick...@gmail.com> wrote:

> Sorry, my error! In that case you *do* have to do some fiddling to get
> it all to work.
>
> Good Luck!
> Erick
>
> On Fri, Feb 17, 2012 at 3:27 PM, alessio crisantemi
> <alessio.crisant...@gmail.com> wrote:
> > I'll try... but I'm working with Solr 1.4.1....
> >
> > On 17 February 2012 15:59, Erick Erickson
> > <erickerick...@gmail.com> wrote:
> >
> >> You should not have to do anything with Maven, the instructions
> >> you followed were from 1.4.1 days......
> >>
> >> Assuming you're working with a 3.x build, here's a data-config
> >> that worked for me, just a straight distro. But note a couple of things:
> >> 1> for simplicity, I changed the schema.xml to NOT require
> >> the id field. You'll have to change this back probably and
> >> select a good <uniqueKey>
> >> 2> I had to add this line to solrconfig.xml to find the path:
> >> <lib dir="../../dist/"
> >> regex="apache-solr-dataimporthandler-extras-\d.*\.jar"/>
> >> 3> If this all works without errors in the Solr log and you still
> >>     can't find anything, be sure you issue a commit.
> >>
> >> Best
> >> Erick
> >>
> >> <dataConfig>
> >>  <dataSource name="bin" type="BinFileDataSource"/>
> >>  <document>
> >>    <entity baseDir="/Users/Erick/testdocs" fileName=".*pdf" name="sd"
> >> processor="FileListEntityProcessor" recursive="true"
> >> rootEntity="false">
> >>      <entity dataSource="bin" format="text" name="tika-test"
> >> processor="TikaEntityProcessor" url="${sd.fileAbsolutePath}">
> >>        <field column="Author" meta="true" name="author"/>
> >>        <field column="Content-Type" meta="true" name="title"/>
> >>        <!-- field column="title" name="title" meta="true"/ -->
> >>        <field column="text" name="text"/>
> >>      </entity>
> >>      <!-- field column="fileLastModified" name="date"
> >> dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss" / -->
> >>      <field column="fileSize" meta="true" name="size"/>
> >>    </entity>
> >>  </document>
> >> </dataConfig>
> >> On Fri, Feb 17, 2012 at 9:35 AM, alessio crisantemi
> >> <alessio.crisant...@gmail.com> wrote:
> >> > Thanks, Gora, for your help.
> >> > I installed Maven and downloaded Tika following the guide, but I got an
> >> > error about the 'tika compiler' during the Tika build, and the Maven
> >> > installation of Tika stopped.
> >> >
> >> > Is there another way?
> >> > thank you
> >> > a.
> >> >
> >> > 2012/2/16 Gora Mohanty <g...@mimirtech.com>
> >> >
> >> >> On 16 February 2012 21:37, alessio crisantemi
> >> >> <alessio.crisant...@gmail.com> wrote:
> >> >> > here the log:
> >> >> >
> >> >> >
> >> >> > org.apache.solr.handler.dataimport.DataImporter doFullImport
> >> >> > Grave: Full Import failed
> >> >> > org.apache.solr.handler.dataimport.DataImportHandlerException: 'baseDir' is
> >> >> > a required attribute Processing Document # 1
> >> >> [...]
> >> >>
> >> >> The exception message above is pretty clear. You need to define a
> >> >> baseDir attribute for the second entity.
> >> >>
> >> >> However, even if you fix this, the setup will *not* work for indexing
> >> >> PDFs. Did you read the URLs that I sent earlier?
> >> >>
> >> >> Regards,
> >> >> Gora
> >> >>
> >>
>
