Yes, I want to use DIH, and I tried to configure my dataconfig file, but something in it is wrong. This is my config:

<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://ipAddress;databaseName=VTC_Edu"
              user="myuser"
              password="mypass"
              name="VTCEduDocument"/>
  <dataSource type="BinURLDataSource" name="dsurl"/>
  <document>
    <entity name="VTCEduDocument"
            pk="pk_document_id"
            query="select TOP 10 pk_document_id, s_path_origin from [VTC_Edu].[dbo].[tbl_Document]"
            transformer="vn.vtc.solr.transformer.ImageFilter,vn.vtc.solr.transformer.RemoveHTML,RegexTransformer,TemplateTransformer,vn.vtc.solr.transformer.vntransformer,vn.vtc.solr.correctUnicodeString.correctUnicodeString,vn.vtc.solr.unescapeHtmlString.UnescapeHtmlString,vn.vtc.solr.correctISOString.correctISOString">
      <field column="pk_document_id" name="pk_document_id"/>
      <field column="s_path_origin" name="s_path_origin"/>
    </entity>
    <entity processor="TikaEntityProcessor"
            dataSource="dsurl"
            format="text"
            url="http://media.gox.vn/edu/document/original/${VTCEduDocument.s_path_origin}">
      <field column="Author" name="author" meta="true"/>
      <field column="title" name="title" meta="true"/>
      <field column="text" name="text"/>
    </entity>
  </document>
</dataConfig>

And here is the error:

SEVERE: Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException: Exception in invoking url null Processing Document # 1
    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
    at org.apache.solr.handler.dataimport.BinURLDataSource.getData(BinURLDataSource.java:89)
    at org.apache.solr.handler.dataimport.BinURLDataSource.getData(BinURLDataSource.java:38)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:591)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:267)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:186)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:353)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:411)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:392)
Caused by: java.net.MalformedURLException: no protocol: nullselect TOP 10 pk_document_id, s_path_origin from [VTC_Edu].[dbo].[tbl_Document]
    at java.net.URL.<init>(URL.java:567)
    at java.net.URL.<init>(URL.java:464)
    at java.net.URL.<init>(URL.java:413)
    at org.apache.solr.handler.dataimport.BinURLDataSource.getData(BinURLDataSource.java:81)
    ... 10 more

What is wrong with it? Thanks

--
View this message in context: http://lucene.472066.n3.nabble.com/indexing-data-from-rich-documents-Tika-with-solr3-1-tp3322555p3348149.html
Sent from the Solr - User mailing list archive at Nabble.com.