Re: How to index PDF file stored in SQL Server 2008

Roy Liu Thu, 07 Apr 2011 19:46:11 -0700

Thanks Lance,

I'm using Solr 1.4.
If I want to use TikaEP, do I need to upgrade to Solr 3.1, or can I just import the jar files?


Best Regards,
Roy Liu


On Fri, Apr 8, 2011 at 10:22 AM, Lance Norskog <goks...@gmail.com> wrote:

> You need the TikaEntityProcessor to unpack the PDF image. You are
> sticking binary blobs into the index. Tika unpacks the text out of the
> file.
>
> TikaEP is not in Solr 1.4, but it is in the new Solr 3.1 release.
>
> On Thu, Apr 7, 2011 at 7:14 PM, Roy Liu <liuchua...@gmail.com> wrote:
> > Hi,
> >
> > I have a table named *attachment* in MS SQL Server 2008.
> >
> > COLUMN      TYPE
> > ----------  ------------
> > id          int
> > title       varchar(200)
> > attachment  image
> >
> > I need to index the attachment column (which stores PDF files) from the
> > database via DIH.
> >
> > When I access the URL below, it returns "Indexing completed. Added/Updated: 5
> > documents. Deleted 0 documents."
> > http://localhost:8080/solr/dataimport?command=full-import
> >
> > However, my searches do not return anything.
> >
> > Can anyone help me?
> >
> > Thanks.
> >
> >
> > --------------------
> > *data-config-sql.xml*
> > <dataConfig>
> >  <dataSource type="JdbcDataSource"
> >              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
> >              url="jdbc:sqlserver://localhost:1433;databaseName=master"
> >              user="user"
> >              password="pw"/>
> >  <document>
> >    <entity name="doc"
> >            query="select id,title,attachment from attachment">
> >    </entity>
> >  </document>
> > </dataConfig>
> >
> > *schema.xml*
> > <field name="attachment" type="text" indexed="true" stored="true"/>
> >
> >
> >
> > Best Regards,
> > Roy Liu
> >
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>
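
A minimal sketch of how the data-config from the original message might look once TikaEntityProcessor is available (Solr 3.1 or later, per Lance's reply). It assumes the FieldStreamDataSource from that release is used to hand the blob column to Tika. The data source names "db" and "blob", the inner entity name "pdf", and the placeholder url value are illustrative, and the config has not been tested against SQL Server's image type.

```xml
<dataConfig>
  <!-- Same JDBC connection as in the original config, now given a name -->
  <dataSource name="db" type="JdbcDataSource"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost:1433;databaseName=master"
              user="user"
              password="pw"/>
  <!-- Assumption: streams the binary column of the parent row to Tika -->
  <dataSource name="blob" type="FieldStreamDataSource"/>
  <document>
    <entity name="doc" dataSource="db"
            query="select id,title,attachment from attachment">
      <!-- Assumption: dataField points at the blob column of the parent
           entity; the url value is only a placeholder and is not fetched -->
      <entity name="pdf" dataSource="blob"
              processor="TikaEntityProcessor"
              url="unused"
              dataField="doc.attachment"
              format="text">
        <!-- Map Tika's extracted plain text to the schema field -->
        <field column="text" name="attachment"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

In this sketch the text Tika extracts is exposed as a column named text and mapped to the existing attachment field in schema.xml, so the index would hold searchable text instead of the raw blob. The DIH extras jar and the Tika libraries would presumably also need to be on Solr's classpath for the processor to load.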
