You should not have to do anything with Maven. The instructions
you followed date from the 1.4.1 days.

Assuming you're working with a 3.x build, here's a data-config
that worked for me on a straight distro. A few things to note:
1> For simplicity, I changed schema.xml so it does NOT require
the id field. You'll probably have to change this back and
choose a good <uniqueKey>.
2> I had to add this line to solrconfig.xml so Solr could find the
DataImportHandler extras jar:
<lib dir="../../dist/" regex="apache-solr-dataimporthandler-extras-\d.*\.jar"/>
3> If this all runs without errors in the Solr log and you still
can't find anything, be sure you issue a commit.
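
As a rough sketch of what "changing it back" in point 1 might look like: the fragments below restore a required id in schema.xml and fill it from the file path in the data-config. The "string" field type and the choice of fileAbsolutePath as the key are assumptions on my part, not something from the setup above, so pick whatever is actually unique for your documents.

```xml
<!-- schema.xml (sketch): make id required again and declare it as the key.
     Assumes a "string" field type is defined, as in the example schema. -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<uniqueKey>id</uniqueKey>

<!-- data-config.xml (sketch): inside the FileListEntityProcessor entity,
     map the file's absolute path to id. Using the path as the key is an
     assumption for illustration only. -->
<field column="fileAbsolutePath" name="id"/>
```

For point 3, one way to issue an explicit commit is to post this XML update message to the update handler:

```xml
<commit/>
```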

Best
Erick

<dataConfig>
  <dataSource name="bin" type="BinFileDataSource"/>
  <document>
    <entity baseDir="/Users/Erick/testdocs" fileName=".*pdf" name="sd"
            processor="FileListEntityProcessor" recursive="true"
            rootEntity="false">
      <entity dataSource="bin" format="text" name="tika-test"
              processor="TikaEntityProcessor" url="${sd.fileAbsolutePath}">
        <field column="Author" meta="true" name="author"/>
        <field column="Content-Type" meta="true" name="title"/>
        <!-- field column="title" name="title" meta="true"/ -->
        <field column="text" name="text"/>
      </entity>
      <!-- field column="fileLastModified" name="date"
           dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss" / -->
      <field column="fileSize" meta="true" name="size"/>
    </entity>
  </document>
</dataConfig>
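
For the data-config above to be used at all, solrconfig.xml also needs a DataImportHandler request handler that points at it. A minimal sketch follows; the handler path "/dataimport" and the file name "data-config.xml" are assumptions (they are the conventional names), so adjust them to match your setup.

```xml
<!-- solrconfig.xml (sketch): register the DataImportHandler.
     "/dataimport" and "data-config.xml" are assumed names. -->
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <!-- Path to the data-config file, relative to the core's conf directory -->
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>
```

With a handler registered this way, a full import is normally started by requesting the handler with command=full-import, and command=status reports progress.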

On Fri, Feb 17, 2012 at 9:35 AM, alessio crisantemi
<alessio.crisant...@gmail.com> wrote:
> Thanks, Gora, for your help.
> I installed Maven and downloaded Tika following the guide, but I got an
> error during the Tika build concerning the 'tika compiler', and the Maven
> installation of Tika stopped.
>
> Is there another way?
> Thank you
> a.
>
> 2012/2/16 Gora Mohanty <g...@mimirtech.com>
>
>> On 16 February 2012 21:37, alessio crisantemi
>> <alessio.crisant...@gmail.com> wrote:
>> > Here is the log:
>> >
>> >
>> > org.apache.solr.handler.dataimport.DataImporter doFullImport
>> > Grave: Full Import failed
>> > org.apache.solr.handler.dataimport.DataImportHandlerException: 'baseDir' is
>> > a required attribute Processing Document # 1
>> [...]
>>
>> The exception message above is pretty clear. You need to define a
>> baseDir attribute for the second entity.
>>
>> However, even if you fix this, the setup will *not* work for indexing
>> PDFs. Did you read the URLs that I sent earlier?
>>
>> Regards,
>> Gora
>>