Hi Jorg,

This is working now. If you look at SOLR-1583
(http://issues.apache.org/jira/browse/SOLR-1583) you can see that the file and
URL data sources needed to return an InputStream from the DataSource rather
than a Reader. The same is true for FieldReaderDataSource, which returns a
Reader. I created a class, BinFieldReaderDataSource, that returns an
InputStream over the BLOB instead.
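To show why a Reader is the wrong abstraction for a BLOB, here is a small self-contained Java sketch (plain JDK, not DIH code). It reads the same bytes once through an InputStream and once through a Reader, then compares each result with the original. The class name, the PDF-style header bytes and the choice of UTF-8 as the Reader's charset are illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class BlobReaderVsStream {
    // Copies the bytes exactly as they arrive from the stream.
    static byte[] viaStream(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decodes the bytes as UTF-8 text (what a Reader does), then encodes the text back to bytes.
    // Malformed sequences are replaced with U+FFFD, so the original bytes are lost.
    static byte[] viaReader(InputStream in) {
        try (Reader r = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString().getBytes(StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "%PDF" followed by four high bytes that are not valid UTF-8.
        byte[] blob = {0x25, 0x50, 0x44, 0x46,
                (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3};
        System.out.println(Arrays.equals(blob, viaStream(new ByteArrayInputStream(blob)))); // true
        System.out.println(Arrays.equals(blob, viaReader(new ByteArrayInputStream(blob)))); // false
    }
}
```

The stream copy is byte-for-byte identical; the Reader round trip is not, which is why Tika needs the raw InputStream.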

I am working off trunk code from a few days ago, which I checked out using
TortoiseSVN and compiled with the Ant bundled in my Eclipse plugins
directory. It was a fairly painless process.

I am somewhat new to open source development, so for now I have just pasted the
Java source and my XML config below.

##### BinFieldReaderDataSource.java
// Assumed to be compiled into the DIH package, which lets the config refer to
// the class by its short name (type="BinFieldReaderDataSource").
package org.apache.solr.handler.dataimport;

import static org.apache.solr.handler.dataimport.DataImportHandlerException.SEVERE;
import static org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow;

import java.io.InputStream;
import java.io.Reader;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.sql.Blob;
import java.sql.Clob;
import java.util.Properties;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * Like FieldReaderDataSource, but returns the BLOB named by the entity's
 * dataField attribute as a raw InputStream instead of a Reader.
 */
public class BinFieldReaderDataSource extends DataSource<InputStream> {
    private static final Logger LOG = LoggerFactory
            .getLogger(BinFieldReaderDataSource.class);
    protected VariableResolver vr;
    protected String dataField;
    private String encoding; // read from the config but not used for binary data
    private EntityProcessorWrapper entityProcessor;

    public void init(Context context, Properties initProps) {
        // dataField names the parent column holding the BLOB, e.g. attach.ATTACHMENT
        dataField = context.getEntityAttribute("dataField");
        encoding = context.getEntityAttribute("encoding");
        entityProcessor = (EntityProcessorWrapper) context.getEntityProcessor();
    }

    public InputStream getData(String query) {
        // The query argument is ignored; the data comes from the resolved field value.
        Object o = entityProcessor.getVariableResolver().resolve(dataField);
        if (o == null) {
            throw new DataImportHandlerException(SEVERE,
                    "No field available for name : " + dataField);
        }

        if (o instanceof String) {
            throw new DataImportHandlerException(SEVERE,
                    "Unsupported field type: String");
        } else if (o instanceof Clob) {
            throw new DataImportHandlerException(SEVERE,
                    "Unsupported field type: CLOB");
        } else if (o instanceof Blob) {
            Blob blob = (Blob) o;
            try {
                // Most JDBC drivers declare getBinaryStream as public;
                // if one does not, force access before invoking it.
                Method m = blob.getClass().getDeclaredMethod("getBinaryStream");
                if (!Modifier.isPublic(m.getModifiers())) {
                    m.setAccessible(true);
                }
                return getInputStream(m, blob);
            } catch (Exception e) {
                LOG.error("Unable to get data from BLOB", e);
                return null;
            }
        } else {
            return null;
        }
    }

    // Copied from FieldReaderDataSource; not used by this class.
    static Reader readCharStream(Clob clob) {
        try {
            Method m = clob.getClass().getDeclaredMethod("getCharacterStream");
            if (!Modifier.isPublic(m.getModifiers())) {
                m.setAccessible(true);
            }
            return (Reader) m.invoke(clob);
        } catch (Exception e) {
            wrapAndThrow(SEVERE, e, "Unable to get reader from clob");
            return null; // unreachable
        }
    }

    private InputStream getInputStream(Method m, Blob blob)
            throws IllegalAccessException, InvocationTargetException {
        return (InputStream) m.invoke(blob);
    }

    public void close() {
        // nothing to release here
    }
}
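The "force invoke" branch in getData is plain Java reflection. As a self-contained sketch of just that step, here is the same lookup run against a hypothetical FakeDriverBlob class (a stand-in, not a real driver class) whose getBinaryStream method is deliberately package-private. One thing worth knowing: getDeclaredMethod only sees methods declared on the concrete class, not inherited ones, so a driver class that inherits getBinaryStream would end up in the catch block and getData would return null.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

final class ReflectiveStreamDemo {
    // Hypothetical stand-in for a JDBC driver's BLOB class; getBinaryStream is deliberately not public.
    static final class FakeDriverBlob {
        private final byte[] data;

        FakeDriverBlob(byte[] data) {
            this.data = data;
        }

        InputStream getBinaryStream() {
            return new ByteArrayInputStream(data);
        }
    }

    // Same steps as getData: look the method up on the concrete class,
    // force access if it is not public, then invoke it.
    static InputStream openStream(Object blob) {
        try {
            Method m = blob.getClass().getDeclaredMethod("getBinaryStream");
            if (!Modifier.isPublic(m.getModifiers())) {
                m.setAccessible(true);
            }
            return (InputStream) m.invoke(blob);
        } catch (ReflectiveOperationException e) {
            // e.g. NoSuchMethodException when the method is inherited rather than declared
            throw new IllegalStateException("Unable to get data from BLOB", e);
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = openStream(new FakeDriverBlob(new byte[] {7, 8}));
        System.out.println(in.read()); // 7
    }
}
```

Unlike getData, which logs and returns null, this sketch throws, so a missing method shows up as an error instead of an empty document.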

## Tika-data-config.xml
<dataConfig>
  <dataSource name="f1" type="BinFieldReaderDataSource" />
  <dataSource name="orcle" driver="oracle.jdbc.driver.OracleDriver" url="jdbc:oracle:thin:user/p...@host:1521:sid" />
  <document>
        <entity dataSource="orcle" name="attach" query="select attachment from testtable2">
                <entity dataSource="f1" processor="TikaEntityProcessor" url="attachment" dataField="attach.ATTACHMENT" format="text">
                        <field column="text" name="text" />
                </entity>
        </entity>
  </document>
</dataConfig>
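The inner entity finds the BLOB through dataField="attach.ATTACHMENT": the parent entity's name, a dot, then the column name. The column is upper case because that is how Oracle reported it in the debug output quoted below (ATTACHMENT). The sketch below is a hypothetical, simplified model of that lookup, not DIH's actual VariableResolver; it assumes exact-case matching, under which "attach.attachment" would resolve to null and BinFieldReaderDataSource.getData would throw "No field available for name".

```java
import java.util.HashMap;
import java.util.Map;

final class DataFieldLookupSketch {
    // Hypothetical model: each parent entity's current row is stored under the
    // entity name, and "entity.COLUMN" is looked up with exact-case keys.
    private final Map<String, Map<String, Object>> rowsByEntity = new HashMap<>();

    void putRow(String entityName, Map<String, Object> row) {
        rowsByEntity.put(entityName, row);
    }

    Object resolve(String dataField) {
        int dot = dataField.indexOf('.');
        if (dot < 0) {
            return null;
        }
        Map<String, Object> row = rowsByEntity.get(dataField.substring(0, dot));
        return row == null ? null : row.get(dataField.substring(dot + 1));
    }

    public static void main(String[] args) {
        DataFieldLookupSketch resolver = new DataFieldLookupSketch();
        Map<String, Object> row = new HashMap<>();
        // Column name as it appeared in the debug output
        row.put("ATTACHMENT", new byte[] {0x25, 0x50, 0x44, 0x46});
        resolver.putRow("attach", row);
        System.out.println(resolver.resolve("attach.ATTACHMENT") != null); // true
        System.out.println(resolver.resolve("attach.attachment") != null); // false
    }
}
```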


Nirmal Shah


-----Original Message-----
From: Jorg Heymans [mailto:jorg.heym...@gmail.com] 
Sent: Tuesday, January 26, 2010 3:43 AM
To: solr-user@lucene.apache.org
Subject: Re: DataImportHandler TikaEntityProcessor FieldReaderDataSource

Hi Shah,

I am assuming you are talking about the integration of SOLR-1358. I am very
interested in this feature as well. Did you get it to work? Is there a
snapshot build available for this somewhere, or do I have to build Solr from
source myself?

Thanks,
Jorg

On Mon, Jan 25, 2010 at 6:27 PM, Shah, Nirmal <ns...@columnit.com> wrote:

> Hi,
>
> I am fairly new to Solr and would like to use the DIH to pull rich text
> files (PDFs, etc.) from BLOB fields in my database.
>
> There was a suggestion made to use the FieldReaderDataSource with the
> recently committed TikaEntityProcessor. Has anyone accomplished this?
>
> This is my configuration, and the resulting error. I'm not sure if I'm
> using the FieldReaderDataSource correctly. If anyone could shed light
> on whether I am going in the right direction or not, it would be
> appreciated.
>
> ---------------Data-config.xml:
>
> <dataConfig>
>   <datasource name="f1" type="FieldReaderDataSource" />
>   <dataSource name="orcle" driver="oracle.jdbc.driver.OracleDriver" url="jdbc:oracle:thin:un/p...@host:1521:sid" />
>      <document>
>      <entity dataSource="orcle" name="attach" query="select id as name, attachment from testtable2">
>         <entity dataSource="f1" processor="TikaEntityProcessor" dataField="attach.attachment" format="text">
>            <field column="text" name="NAME" />
>         </entity>
>      </entity>
>   </document>
> </dataConfig>
>
> -------------Debug error:
>
> <response>
> <lst name="responseHeader">
> <int name="status">0</int>
> <int name="QTime">203</int>
> </lst>
> <lst name="initArgs">
> <lst name="defaults">
> <str name="config">testdb-data-config.xml</str>
> </lst>
> </lst>
> <str name="command">full-import</str>
> <str name="mode">debug</str>
> <null name="documents"/>
> <lst name="verbose-output">
> <lst name="entity:attach">
> <lst name="document#1">
> <str name="query">select id as name, attachment from testtable2</str>
> <str name="time-taken">0:0:0.32</str>
> <str>----------- row #1-------------</str>
> <str name="NAME">java.math.BigDecimal:2</str>
> <str name="ATTACHMENT">oracle.sql.BLOB:oracle.sql.b...@1c8e807</str>
> <str>---------------------------------------------</str>
> <lst name="entity:253433571801723">
> <str name="EXCEPTION">
> org.apache.solr.handler.dataimport.DataImportHandlerException: No dataSource :f1 available for entity :253433571801723 Processing Document # 1
>         at org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:279)
>         at org.apache.solr.handler.dataimport.ContextImpl.getDataSource(ContextImpl.java:93)
>         at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:97)
>         at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
>         at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357)
>         at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383)
>         at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
>         at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
>         at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
>         at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
>         at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:203)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
>         at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
>         at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>         at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>         at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
>         at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
>         at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
>         at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>         at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
>         at org.mortbay.jetty.Server.handle(Server.java:285)
>         at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
>         at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
>         at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
>         at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
>         at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
>         at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
>         at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
>
> Thanks,
> Nirmal
>
