Hi Andreas, When using XPathEntityProcessor<http://wiki.apache.org/solr/DataImportHandler#XPathEntityProcessor>your DataSource must be of type DataSource<Reader>. You shouldn't be using BinURLDataSource, it's giving you the cast exception. Use URLDataSource<https://builds.apache.org/job/Solr-Artifacts-4.x/javadoc/solr-dataimporthandler/org/apache/solr/handler/dataimport/URLDataSource.html> or FileDataSource<https://builds.apache.org/job/Solr-Artifacts-4.x/javadoc/solr-dataimporthandler/org/apache/solr/handler/dataimport/FileDataSource.html>instead.
I don't think you need to specify namespaces, at least you didn't used to. The other thing that I've noticed is that the anywhere xpath expression // doesn't always work in DIH. You might have to be more specific. Cheers, Tricia On Sun, Sep 29, 2013 at 9:47 AM, Andreas Owen <a...@conx.ch> wrote: > how dum can you get. obviously quite dum... i would have to analyze the > html-pages with a nested instance like this: > > <entity name="rec" processor="XPathEntityProcessor" > url="file:///C:\ColdFusion10\cfusion\solr\solr\tkbintranet\docImportUrl.xml" > forEach="/docs/doc" dataSource="main"> > > <entity name="htm" processor="XPathEntityProcessor" > url="${rec.urlParse}" forEach="/xhtml:html" dataSource="dataUrl"> > <field column="text" xpath="//content" /> > <field column="h_2" xpath="//body" /> > <field column="text_nohtml" xpath="//text" /> > <field column="h_1" xpath="//h:h1" /> > </entity> > </entity> > > but i'm pretty sure the foreach is wrong and the xpath expressions. in the > moment i getting the following error: > > Caused by: java.lang.RuntimeException: > org.apache.solr.handler.dataimport.DataImportHandlerException: > java.lang.ClassCastException: > sun.net.www.protocol.http.HttpURLConnection$HttpInputStream cannot be cast > to java.io.Reader > > > > > > On 28. Sep 2013, at 1:39 AM, Andreas Owen wrote: > > > ok i see what your getting at but why doesn't the following work: > > > > <field xpath="//h:h1" column="h_1" /> > > <field column="text" xpath="/xhtml:html/xhtml:body" /> > > > > i removed the tiki-processor. what am i missing, i haven't found > anything in the wiki? > > > > > > On 28. Sep 2013, at 12:28 AM, P Williams wrote: > > > >> I spent some more time thinking about this. Do you really need to use > the > >> TikaEntityProcessor? It doesn't offer anything new to the document you > are > >> building that couldn't be accomplished by the XPathEntityProcessor alone > >> from what I can tell. > >> > >> I also tried to get the Advanced > >> Parsing<http://wiki.apache.org/solr/TikaEntityProcessor>example to > >> work without success. There are some obvious typos (<document> > >> instead of </document>) and an odd order to the pieces (<dataSources> is > >> enclosed by <document>). It also looks like > >> FieldStreamDataSource< > http://lucene.apache.org/solr/4_3_1/solr-dataimporthandler/org/apache/solr/handler/dataimport/FieldStreamDataSource.html > >is > >> the one that is meant to work in this context. If Koji is still around > >> maybe he could offer some help? Otherwise this bit of erroneous > >> instruction should probably be removed from the wiki. > >> > >> Cheers, > >> Tricia > >> > >> $ svn diff > >> Index: > >> > solr/contrib/dataimporthandler-extras/src/test/org/apache/solr/handler/dataimport/TestTikaEntityProcessor.java > >> =================================================================== > >> --- > >> > solr/contrib/dataimporthandler-extras/src/test/org/apache/solr/handler/dataimport/TestTikaEntityProcessor.java > >> (revision 1526990) > >> +++ > >> > solr/contrib/dataimporthandler-extras/src/test/org/apache/solr/handler/dataimport/TestTikaEntityProcessor.java > >> (working copy) > >> @@ -99,13 +99,13 @@ > >> runFullImport(getConfigHTML("identity")); > >> assertQ(req("*:*"), testsHTMLIdentity); > >> } > >> - > >> + > >> private String getConfigHTML(String htmlMapper) { > >> return > >> "<dataConfig>" + > >> " <dataSource type='BinFileDataSource'/>" + > >> " <document>" + > >> - " <entity name='Tika' format='xml' > >> processor='TikaEntityProcessor' " + > >> + " <entity name='Tika' format='html' > >> processor='TikaEntityProcessor' " + > >> " url='" + > >> getFile("dihextras/structured.html").getAbsolutePath() + "' " + > >> ((htmlMapper == null) ? "" : (" htmlMapper='" + htmlMapper + > >> "'")) + ">" + > >> " <field column='text'/>" + > >> @@ -114,4 +114,36 @@ > >> "</dataConfig>"; > >> > >> } > >> + private String[] testsHTMLH1 = { > >> + "//*[@numFound='1']" > >> + , "//str[@name='h1'][contains(.,'H1 Header')]" > >> + }; > >> + > >> + @Test > >> + public void testTikaHTMLMapperSubEntity() throws Exception { > >> + runFullImport(getConfigSubEntity("identity")); > >> + assertQ(req("*:*"), testsHTMLH1); > >> + } > >> + > >> + private String getConfigSubEntity(String htmlMapper) { > >> + return > >> + "<dataConfig>" + > >> + "<dataSource type='BinFileDataSource' name='bin'/>" + > >> + "<dataSource type='FieldStreamDataSource' name='fld'/>" + > >> + "<document>" + > >> + "<entity name='tika' processor='TikaEntityProcessor' url='" + > >> getFile("dihextras/structured.html").getAbsolutePath() + "' > >> dataSource='bin' format='html' rootEntity='false'>" + > >> + "<!--Do appropriate mapping here meta=\"true\" means it is a > >> metadata field -->" + > >> + "<field column='Author' meta='true' name='author'/>" + > >> + "<field column='title' meta='true' name='title'/>" + > >> + "<!--'text' is an implicit field emited by TikaEntityProcessor > . > >> Map it appropriately-->" + > >> + "<field name='text' column='text'/>" + > >> + "<entity name='detail' type='XPathEntityProcessor' > forEach='/html' > >> dataSource='fld' dataField='tika.text' rootEntity='true' >" + > >> + "<field xpath='//div' column='foo'/>" + > >> + "<field xpath='//h1' column='h1' />" + > >> + "</entity>" + > >> + "</entity>" + > >> + "</document>" + > >> + "</dataConfig>"; > >> + } > >> + > >> } > >> Index: > >> > solr/contrib/dataimporthandler-extras/src/test-files/dihextras/solr/collection1/conf/dataimport-schema-no-unique-key.xml > >> =================================================================== > >> --- > >> > solr/contrib/dataimporthandler-extras/src/test-files/dihextras/solr/collection1/conf/dataimport-schema-no-unique-key.xml > >> (revision 1526990) > >> +++ > >> > solr/contrib/dataimporthandler-extras/src/test-files/dihextras/solr/collection1/conf/dataimport-schema-no-unique-key.xml > >> (working copy) > >> @@ -194,6 +194,8 @@ > >> <field name="title" type="string" indexed="true" stored="true"/> > >> <field name="author" type="string" indexed="true" stored="true" /> > >> <field name="text" type="text" indexed="true" stored="true" /> > >> + <field name="h1" type="text" indexed="true" stored="true" /> > >> + <field name="foo" type="text" indexed="true" stored="true" /> > >> > >> </fields> > >> <!-- field for the QueryParser to use when an explicit fieldname is > >> absent --> > >> > >> > >> I find the SqlEntityProcessor part particularly odd. That's the default > >> right?: > >> 2405 T12 C1 oashd.SqlEntityProcessor.initQuery ERROR The query failed > >> 'null' java.lang.RuntimeException: unsupported type : class > java.lang.String > >> at > >> > org.apache.solr.handler.dataimport.FieldStreamDataSource.getData(FieldStreamDataSource.java:89) > >> at > >> > org.apache.solr.handler.dataimport.FieldStreamDataSource.getData(FieldStreamDataSource.java:1) > >> at > >> > org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59) > >> at > >> > org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73) > >> at > >> > org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243) > >> at > >> > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:469) > >> at > >> > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:495) > >> at > >> > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:408) > >> at > >> > org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:323) > >> at > >> > org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:231) > >> at > >> > org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:411) > >> at > >> > org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:476) > >> at > >> > org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:179) > >> at > >> > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) > >> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859) > >> at org.apache.solr.util.TestHarness.query(TestHarness.java:291) > >> at > >> > org.apache.solr.handler.dataimport.AbstractDataImportHandlerTestCase.runFullImport(AbstractDataImportHandlerTestCase.java:96) > >> at > >> > org.apache.solr.handler.dataimport.TestTikaEntityProcessor.testTikaHTMLMapperSubEntity(TestTikaEntityProcessor.java:124) > >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > >> at > >> > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > >> at > >> > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > >> at java.lang.reflect.Method.invoke(Method.java:601) > >> at > >> > com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) > >> at > >> > com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) > >> at > >> > com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) > >> at > >> > com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) > >> at > >> > com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) > >> at > >> > com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) > >> at > >> > org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) > >> at > >> > org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) > >> at > >> > org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) > >> at > >> > com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) > >> at > >> > org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) > >> at > >> > org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) > >> at > >> > org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) > >> at > >> > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > >> at > >> > com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) > >> at > >> > com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) > >> at > >> > com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) > >> at > >> > com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) > >> at > >> > com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) > >> at > >> > com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) > >> at > >> > com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) > >> at > >> > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > >> at > >> > com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) > >> at > >> > org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) > >> at > >> > org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) > >> at > >> > com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) > >> at > >> > com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) > >> at > >> > com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) > >> at > >> > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > >> at > >> > org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) > >> at > >> > org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) > >> at > >> > org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) > >> at > >> > org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) > >> at > >> > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > >> at > >> > com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) > >> at java.lang.Thread.run(Thread.java:722) > >> > >> > >> > >> On Fri, Sep 27, 2013 at 3:55 AM, Andreas Owen <a...@conx.ch> wrote: > >> > >>> i removed the FieldReaderDataSource and dataSource="fld" but it didn't > >>> help. i get the following for each document: > >>> DataImportHandlerException: Exception in invoking url null > >>> Processing Document # 9 > >>> nullpointerexception > >>> > >>> > >>> On 26. Sep 2013, at 8:39 PM, P Williams wrote: > >>> > >>>> Hi, > >>>> > >>>> Haven't tried this myself but maybe try leaving out the > >>>> FieldReaderDataSource entirely. From my quick searching looks like > it's > >>>> tied to SQL. Did you try copying the > >>>> http://wiki.apache.org/solr/TikaEntityProcessor Advanced Parsing > example > >>>> exactly? What happens when you leave out FieldReaderDataSource? > >>>> > >>>> Cheers, > >>>> Tricia > >>>> > >>>> > >>>> On Thu, Sep 26, 2013 at 4:17 AM, Andreas Owen <a...@conx.ch> wrote: > >>>> > >>>>> i'm using solr 4.3.1 and the dataimporter. i am trying to use > >>>>> XPathEntityProcessor within the TikaEntityProcessor for indexing > >>> html-pages > >>>>> but i'm getting this error for each document. i have also tried > >>>>> dataField="tika.text" and dataField="text" to no avail. the nested > >>>>> XPathEntityProcessor "detail" creates the error, the rest works fine. > >>> what > >>>>> am i doing wrong? > >>>>> > >>>>> error: > >>>>> > >>>>> ERROR - 2013-09-26 12:08:49.006; > >>>>> org.apache.solr.handler.dataimport.SqlEntityProcessor; The query > failed > >>>>> 'null' > >>>>> java.lang.ClassCastException: java.io.StringReader cannot be cast to > >>>>> java.util.Iterator > >>>>> at > >>>>> > >>> > org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59) > >>>>> at > >>>>> > >>> > org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73) > >>>>> at > >>>>> > >>> > org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243) > >>>>> at > >>>>> > >>> > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:465) > >>>>> at > >>>>> > >>> > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:491) > >>>>> at > >>>>> > >>> > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:491) > >>>>> at > >>>>> > >>> > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404) > >>>>> at > >>>>> > >>> > org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:319) > >>>>> at > >>>>> > >>> > org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:227) > >>>>> at > >>>>> > >>> > org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:422) > >>>>> at > >>>>> > >>> > org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487) > >>>>> at > >>>>> > >>> > org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:179) > >>>>> at > >>>>> > >>> > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) > >>>>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820) > >>>>> at > >>>>> > >>> > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656) > >>>>> at > >>>>> > >>> > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359) > >>>>> at > >>>>> > >>> > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) > >>>>> at org.eclipse.jetty.server.Server.handle(Server.java:365) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998) > >>>>> at > >>> org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:856) > >>>>> at > >>>>> org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) > >>>>> at java.lang.Thread.run(Unknown Source) > >>>>> ERROR - 2013-09-26 12:08:49.022; > org.apache.solr.common.SolrException; > >>>>> Exception in entity : > >>>>> detail:org.apache.solr.handler.dataimport.DataImportHandlerException: > >>>>> java.lang.ClassCastException: java.io.StringReader cannot be cast to > >>>>> java.util.Iterator > >>>>> at > >>>>> > >>> > org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:65) > >>>>> at > >>>>> > >>> > org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73) > >>>>> at > >>>>> > >>> > org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243) > >>>>> at > >>>>> > >>> > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:465) > >>>>> at > >>>>> > >>> > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:491) > >>>>> at > >>>>> > >>> > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:491) > >>>>> at > >>>>> > >>> > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404) > >>>>> at > >>>>> > >>> > org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:319) > >>>>> at > >>>>> > >>> > org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:227) > >>>>> at > >>>>> > >>> > org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:422) > >>>>> at > >>>>> > >>> > org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487) > >>>>> at > >>>>> > >>> > org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:179) > >>>>> at > >>>>> > >>> > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) > >>>>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820) > >>>>> at > >>>>> > >>> > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656) > >>>>> at > >>>>> > >>> > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359) > >>>>> at > >>>>> > >>> > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) > >>>>> at org.eclipse.jetty.server.Server.handle(Server.java:365) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998) > >>>>> at > >>> org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:856) > >>>>> at > >>>>> org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) > >>>>> at > >>>>> > >>> > org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) > >>>>> at java.lang.Thread.run(Unknown Source) > >>>>> Caused by: java.lang.ClassCastException: java.io.StringReader cannot > be > >>>>> cast to java.util.Iterator > >>>>> at > >>>>> > >>> > org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59) > >>>>> ... 41 more > >>>>> > >>>>> > >>>>> > >>>>> data-config.xml > >>>>> > >>>>> <dataConfig> > >>>>> <dataSource type="BinURLDataSource" name="dataFile"/> > >>>>> <dataSource type="BinURLDataSource" name="dataUrl"/> > >>>>> <dataSource type="URLDataSource" name="main"/> > >>>>> <dataSource type="FieldReaderDataSource" name="fld"/> > >>>>> <document> > >>>>> <entity name="rec" processor="XPathEntityProcessor" > >>>>> > >>> > url="file:///C:\ColdFusion10\cfusion\solr\solr\tkbintranet\docImportUrl.xml" > >>>>> forEach="/docs/doc" dataSource="main"> > >>>>> <field column="title" xpath="//title" /> > >>>>> <field column="id" xpath="//id" /> > >>>>> <field column="file" xpath="//file" /> > >>>>> <field column="url" xpath="//url" /> > >>>>> <field column="urlParse" xpath="//urlParse" /> > >>>>> <field column="last_modified" xpath="//last_modified" /> > >>>>> <field column="Author" xpath="//author" /> > >>>>> > >>>>> <entity name="tika" processor="TikaEntityProcessor" > >>>>> url="${rec.urlParse}" dataSource="dataUrl" onError="skip" > format="html"> > >>>>> <field column="text"/> > >>>>> > >>>>> <entity name="detail" > type="XPathEntityProcessor" > >>>>> forEach="/html" dataSource="fld" dataField="${tika.text}" > >>> rootEntity="true" > >>>>> onError="skip"> > >>>>> <field xpath="//h1" column="h_1" /> > >>>>> </entity> > >>>>> </entity> > >>>>> </entity> > >>>>> </document> > >>>>> </dataConfig> > >>> > >>> > >