indexing data from rich documents - Tika with solr3.1
Hi everyone,

I have a problem with Tika and Solr. I have successfully indexed data from various file formats (pdf, doc, ...) using an absolute file path. Now I have a link to a file on the internet (e.g. http://myweb/filename.pdf) and I want to index from this link, but it does not work and I don't know why.

This is my dataconfig.xml:

[...] url="http://myweb/filename.pdf" format="text" dataSource="bin" > [...]

When I replace url="http://myweb/filename.pdf" with an absolute file path, it works very well. Does anyone know what is wrong?

Thanks for your help.

-- View this message in context: http://lucene.472066.n3.nabble.com/indexing-data-from-rich-documents-Tika-with-solr3-1-tp3322555p3322555.html
Sent from the Solr - User mailing list archive at Nabble.com.
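For readers following the thread, a complete data-config of the kind described above might look like the minimal sketch below. Only url, format="text" and dataSource="bin" come from the message; the entity name "tika", the BinURLDataSource type for the "bin" source and the field mapping onto a Solr field called "text" are assumptions made for illustration.

```xml
<dataConfig>
  <!-- Assumption: "bin" is a binary data source that fetches over HTTP. -->
  <!-- For local files, BinFileDataSource would be used instead. -->
  <dataSource name="bin" type="BinURLDataSource"/>
  <document>
    <!-- url, format and dataSource are taken from the message; the rest is illustrative. -->
    <entity name="tika" processor="TikaEntityProcessor"
            url="http://myweb/filename.pdf" format="text" dataSource="bin">
      <!-- "text" is the column in which TikaEntityProcessor exposes the extracted body. -->
      <!-- The target field name is hypothetical and must exist in schema.xml. -->
      <field column="text" name="text"/>
    </entity>
  </document>
</dataConfig>
```

If the "bin" source is declared as BinFileDataSource, an http:// URL cannot be opened by it, which would be consistent with the file path working and the link failing. The thread does not confirm that this was the cause.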
Re: indexing data from rich documents - Tika with solr3.1
Oh, that works for me. Thank you very much, Erik Hatcher-4. I have managed to index from https.
Re: indexing data from rich documents - Tika with solr3.1
Hi,

Can you explain this problem to me? I have indexed data from multiple files using the Tika libraries, and I have indexed data over HTTP, but only from a single file (e.g. http://myweb/filename.pdf). Now I have files in many formats under one HTTP path (e.g. http://myweb/files/). I tried to index data from that HTTP path, but it does not work.

This is my data-config:

[...] baseDir="http://www.lc.unsw.edu.au/onlib/pdf/" recursive="true" rootEntity="false" transformer="DateFormatTransformer" > [...]

Error:

Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException: 'baseDir' value: http://www.lc.unsw.edu.au/onlib/pdf/ is not a directory Processing Document # 1
    at org.apache.solr.handler.dataimport.FileListEntityProcessor.init(FileListEntityProcessor.java:124)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:69)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:552)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:267)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:186)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:353)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:411)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:392)

Thanks for your help.
Re: indexing data from rich documents - Tika with solr3.1
Hi Erick Erickson,

We have files in many formats (doc, ppt, pdf, ...). The purpose is to let users search the detailed educational content inside those files. Because I am new to Solr, I may not understand Apache Tika in enough depth. At the moment I cannot index multiple PDF files over HTTP, although indexing a single file works.

Thank you for your attention.
Re: indexing data from rich documents - Tika with solr3.1
Hi Erik Hatcher-4,

I tried to index using the approach from your URL, but I have a problem. In your case you knew the absolute path of the files (Dir.new("/Users/erikhatcher/apache-solr-3.3.0/docs")), so you could index them. In my case I don't know the absolute path of the files. I only know the HTTP address where the files are located (for example, see http://www.lc.unsw.edu.au/onlib/pdf/). Is there another way?

Thanks
Re: indexing data from rich documents - Tika with solr3.1
Yes, I want to use DIH, and I tried to configure my dataconfig file, but it is wrong. This is my config:

[...] "http://media.gox.vn/edu/document/original/${VTCEduDocument.s_path_origin}"> [...]

And here is the error:

SEVERE: Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException: Exception in invoking url null Processing Document # 1
    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
    at org.apache.solr.handler.dataimport.BinURLDataSource.getData(BinURLDataSource.java:89)
    at org.apache.solr.handler.dataimport.BinURLDataSource.getData(BinURLDataSource.java:38)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:591)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:267)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:186)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:353)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:411)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:392)
Caused by: java.net.MalformedURLException: no protocol: nullselect TOP 10 pk_document_id, s_path_origin from [VTC_Edu].[dbo].[tbl_Document]
    at java.net.URL.<init>(URL.java:567)
    at java.net.URL.<init>(URL.java:464)
    at java.net.URL.<init>(URL.java:413)
    at org.apache.solr.handler.dataimport.BinURLDataSource.getData(BinURLDataSource.java:81)
    ... 10 more

What is wrong here?

Thanks
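The stack trace above shows SqlEntityProcessor handing the SQL query to BinURLDataSource, which then tries to parse "null" + query as a URL. That pattern suggests the SQL entity was not bound to a JDBC data source. A minimal sketch of a config that names both sources and binds each entity explicitly follows. The entity name VTCEduDocument, the query and the URL prefix come from the message; the data source names, the JDBC driver, connection URL, credentials and Solr field names are placeholders assumed for illustration.

```xml
<dataConfig>
  <!-- Assumption: SQL Server, inferred from "TOP 10" and [dbo]. -->
  <!-- driver, url, user and password are placeholders. -->
  <dataSource name="db" type="JdbcDataSource"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost;databaseName=VTC_Edu"
              user="solr" password="secret"/>
  <!-- Binary source used only by the Tika entity. -->
  <dataSource name="bin" type="BinURLDataSource"/>
  <document>
    <!-- Outer entity: reads rows from the database through the JDBC source. -->
    <entity name="VTCEduDocument" dataSource="db"
            query="select TOP 10 pk_document_id, s_path_origin from [VTC_Edu].[dbo].[tbl_Document]">
      <!-- Hypothetical mapping of the primary key onto the Solr unique key. -->
      <field column="pk_document_id" name="id"/>
      <!-- Inner entity: fetches one file per row over HTTP and extracts its text. -->
      <entity name="tika" processor="TikaEntityProcessor" dataSource="bin" format="text"
              url="http://media.gox.vn/edu/document/original/${VTCEduDocument.s_path_origin}">
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

With explicit dataSource attributes on both entities, the SQL query goes to JdbcDataSource and only the resolved file URL reaches BinURLDataSource.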
Re: indexing data from rich documents - Tika with solr3.1
Hi all,

Thanks very much to everyone who helped me. I have now indexed over HTTP using DIH.
How to skip current document to index data from DIH
Hi,

Can anyone help me with this problem? I'm using Tika to index data from rich documents that are fetched by HTTP request. I query the database to get the fields and then combine them with Tika. Everything works, but I run into a FileNotFoundException. I understand the error, but I want to skip the affected documents and continue indexing.

Thanks for your help.
Re: How to skip current document to index data from DIH
Hi,

Thanks for your reply. But when I set the attribute onError="skip", no data is imported. This is my config:

[...] "http://media.gox.vn/edu/document/original/${VTCEduDocument.s_path_origin}" transformer="com.vtc.search.Converter" onError="skip" > [...]

Thanks
Re: How to skip current document to index data from DIH
Hi Erick Erickson,

Thank you for your reply. With my config I have successfully indexed data over HTTP using Tika. I combine a database field with a URL prefix in the Tika entity to fetch each file over HTTP. During indexing, however, some URLs turn out not to exist, and I see:

Caused by: java.io.FileNotFoundException: http://media.gox.vn/edu/document/original/1/2704201010071760_Bai25.ppt

This means the file does not exist on the server. I want to skip such files (documents) and go on to index the next ones. I tried onError="skip" to continue indexing the remaining rich documents, but it does not work and the import stops at that file. Is there a way to overcome this problem?

Best regards,
Thanks
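For context, DIH documents three values for the entity attribute onError: "abort" (the default), "skip" (drop the current document and move on) and "continue" (carry on as if the error had not happened). The fragment below is a sketch of where the attribute sits on the inner Tika entity. The URL, transformer and onError value come from the messages above; the entity name, the dataSource name "bin" and the field mapping are assumptions. Whether a FileNotFoundException raised while the data source opens the URL is covered by onError is not confirmed anywhere in this thread and may depend on the Solr version.

```xml
<!-- Fragment only: this entity is assumed to be nested inside the SQL entity VTCEduDocument. -->
<entity name="tika" processor="TikaEntityProcessor" dataSource="bin" format="text"
        url="http://media.gox.vn/edu/document/original/${VTCEduDocument.s_path_origin}"
        transformer="com.vtc.search.Converter"
        onError="skip">
  <!-- Hypothetical target field; "text" is the column carrying the extracted body. -->
  <field column="text" name="text"/>
</entity>
```

If "skip" leaves the index empty, as reported above, trying onError="continue" on the same entity is one way to see whether the failure is raised before the entity's error handling applies.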
Autocomplete with Solr 3.1
Hi all,

I want to use autocomplete to suggest terms the way Google does (http://www.google.com/webhp?complete=1&hl=en), and I followed http://solr.pl/en/2010/11/15/solr-and-autocomplete-part-2/ to configure my project. But when I test with two or more terms in my query, the results are not right, and I don't know why. Can anyone tell me?

Thanks for your help.
Re: Different options for autocomplete/autosuggestion
Hi Bell,

I used autocomplete in Solr 3.1, configured like this:

autocomplete org.apache.solr.spelling.suggest.Suggester org.apache.solr.spelling.suggest.jaspell.JaspellLookup autocomplete true

I followed http://solr.pl/en/2010/11/15/solr-and-autocomplete-part-2/ to index my data, and I ran into a problem. With one word it works very well, but when I type two or more words the results returned are not right, and I don't know why. Does anyone know about this problem?

Thanks for your help.
Re: Autocomplete with Solr 3.1
Hi Klein,

Thanks for your reply. I tried some of the suggestions with Solr and the results were good. However, I want to use the search component in Solr 3.1, and I now have some problems with the Suggester. I think my problem may be in the schema file.

This is my schema file: [...]

I defined the fields: [...]

where the fieldType text_auto is: [...]

In solrconfig.xml I defined:

suggest org.apache.solr.spelling.suggest.Suggester org.apache.solr.spelling.suggest.tst.TSTLookup search_autocomplete true true suggest 10 true spellcheck-autocomplete

Can anyone help?
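Since the archive kept only the values of the solrconfig.xml snippet above, the sketch below shows one plausible way those values map onto a Suggester setup in Solr 3.1. The class names, the field search_autocomplete and the values "suggest", "true" and "10" come from the message; the element and parameter names they are attached to, the component and handler names, and the added spellcheck.collate parameter are assumptions.

```xml
<!-- Assumed layout: a SpellCheckComponent configured with the Suggester. -->
<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <!-- Field whose indexed terms feed the suggestion dictionary. -->
    <str name="field">search_autocomplete</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<!-- Hypothetical handler path; only the parameter values appear in the message. -->
<requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <!-- Not in the original: asks for a combined suggestion for multi-word input. -->
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```

One possible reason for the multi-word behaviour described in this thread is that the default query converter splits the input on whitespace, so each word is looked up separately. With spellcheck.collate enabled the response also contains a collation that joins the per-word suggestions. This is offered as a direction to check, not as a confirmed diagnosis.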
Re: Autocomplete with Solr 3.1
Can nobody help me?