indexing data from rich documents - Tika with solr3.1

2011-09-09 Thread scorpking
Hi everyone,
I have a problem with Tika and Solr. I can successfully index data from
various file formats (pdf, doc, ...) when I give an absolute file path. Now I
have a link on the internet (e.g. http://myweb/filename.pdf) and I want to
index from this link, but it does not work and I don't know why. This is my
dataconfig.xml file:

[...]
url="http://myweb/filename.pdf" format="text" dataSource="bin">
[...]

When I replace url="http://myweb/filename.pdf" with an absolute file path, it
works very well.
Does anyone know why?
Thanks for your help.
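
For comparison, a data-config.xml that fetches a single remote file through TikaEntityProcessor usually has the following shape. This is only a sketch: it assumes the data source named bin is declared as a BinURLDataSource (BinFileDataSource reads local paths only), and the entity name and the target field text are placeholders that must match your own schema.

```xml
<dataConfig>
  <!-- BinURLDataSource streams raw bytes from an http(s) URL.
       Assumption: the "bin" data source is of this type. -->
  <dataSource type="BinURLDataSource" name="bin"/>
  <document>
    <!-- "remote-pdf" is a hypothetical entity name -->
    <entity name="remote-pdf" processor="TikaEntityProcessor"
            url="http://myweb/filename.pdf" format="text" dataSource="bin">
      <!-- TikaEntityProcessor exposes the extracted body as column "text";
           the Solr field name "text" is an assumption about the schema -->
      <field column="text" name="text"/>
    </entity>
  </document>
</dataConfig>
```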

--
View this message in context: 
http://lucene.472066.n3.nabble.com/indexing-data-from-rich-documents-Tika-with-solr3-1-tp3322555p3322555.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: indexing data from rich documents - Tika with solr3.1

2011-09-11 Thread scorpking
Oh, that works for me. Thank you very much, Erik Hatcher-4. I have managed to
index from https.



Re: indexing data from rich documents - Tika with solr3.1

2011-09-12 Thread scorpking
Hi,
Can you explain this problem to me?
I have indexed data from multiple files using the Tika libraries, and I have
indexed data over HTTP, but only for a single file (e.g.
http://myweb/filename.pdf). Now I have files in many formats under one HTTP
path (e.g. http://myweb/files/). I tried to index data from that HTTP path,
but it does not work. This is my data-config:

[...]
baseDir="http://www.lc.unsw.edu.au/onlib/pdf/"
recursive="true" rootEntity="false"
transformer="DateFormatTransformer">
[...]

Error: 
Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException: 'baseDir' value: http://www.lc.unsw.edu.au/onlib/pdf/ is not a directory Processing Document # 1
at org.apache.solr.handler.dataimport.FileListEntityProcessor.init(FileListEntityProcessor.java:124)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:69)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:552)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:267)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:186)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:353)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:411)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:392)

Thanks for your help.
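
The exception is raised in FileListEntityProcessor.init, which expects baseDir to be a directory on the local filesystem, so an http:// address is rejected. One possible workaround, sketched below, is to copy the remote files to a local directory first and list that directory instead. The path /data/onlib/pdf, the entity names, and the field names id and text are all assumptions for illustration.

```xml
<dataConfig>
  <!-- BinFileDataSource reads bytes from local files -->
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <!-- Outer entity: lists files under a local directory (hypothetical path).
         fileName is a regular expression; rootEntity="false" makes each
         inner row become one Solr document. -->
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/data/onlib/pdf" fileName=".*\.pdf"
            recursive="true" rootEntity="false" dataSource="null">
      <!-- fileAbsolutePath is an implicit column of FileListEntityProcessor;
           mapping it to "id" is an assumption about the schema -->
      <field column="fileAbsolutePath" name="id"/>
      <!-- Inner entity: runs Tika on each listed file -->
      <entity name="tika" processor="TikaEntityProcessor"
              url="${files.fileAbsolutePath}" format="text" dataSource="bin">
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```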




Re: indexing data from rich documents - Tika with solr3.1

2011-09-14 Thread scorpking
Hi Erick Erickson,
We have files in many formats (doc, ppt, pdf, ...). Their purpose is to let
people search the detailed educational content inside them. Because I am new
to Solr, I may not understand Apache Tika in enough depth. At the moment I
cannot index PDF files from an HTTP path; a single file works. Thank you for
your attention.





Re: indexing data from rich documents - Tika with solr3.1

2011-09-18 Thread scorpking
Hi Erik Hatcher-4,
I tried indexing with the approach from your URL, but I have a problem. In
your case, you knew the absolute path of the files
(Dir.new("/Users/erikhatcher/apache-solr-3.3.0/docs")), so you could index
them. In my case, I don't know the absolute path of the files. I only know
the HTTP address where the files are (for reference, see this link:
http://www.lc.unsw.edu.au/onlib/pdf/). Is there another way? Thanks





Re: indexing data from rich documents - Tika with solr3.1

2011-09-19 Thread scorpking
Yes, I want to use DIH, and I tried to write my dataconfig file, but it is
wrong. This is my config:

[...]
url="http://media.gox.vn/edu/document/original/${VTCEduDocument.s_path_origin}">
[...]

And here is the error:
SEVERE: Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException: Exception in invoking url null Processing Document # 1
at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
at org.apache.solr.handler.dataimport.BinURLDataSource.getData(BinURLDataSource.java:89)
at org.apache.solr.handler.dataimport.BinURLDataSource.getData(BinURLDataSource.java:38)
at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:591)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:267)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:186)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:353)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:411)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:392)
Caused by: java.net.MalformedURLException: no protocol: nullselect TOP 10 pk_document_id, s_path_origin from [VTC_Edu].[dbo].[tbl_Document]
at java.net.URL.<init>(URL.java:567)
at java.net.URL.<init>(URL.java:464)
at java.net.URL.<init>(URL.java:413)
at org.apache.solr.handler.dataimport.BinURLDataSource.getData(BinURLDataSource.java:81)
... 10 more

???
Thanks
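
The trace shows SqlEntityProcessor.initQuery calling BinURLDataSource.getData, and the "URL" it tried to open is the text null followed by the SQL query. That pattern suggests the SQL entity was bound to the binary URL data source instead of a JDBC one. A sketch of a layout that keeps the two apart follows. The JDBC driver class, connection URL, credentials, the inner entity name, and the field names id and text are assumptions; only the query, the outer entity name and the document URL pattern come from the post.

```xml
<dataConfig>
  <!-- Assumed SQL Server connection; replace driver, url, user, password -->
  <dataSource name="db" type="JdbcDataSource"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost:1433;databaseName=VTC_Edu"
              user="solr" password="secret"/>
  <!-- Separate data source for fetching the binary files over HTTP -->
  <dataSource name="bin" type="BinURLDataSource"/>
  <document>
    <!-- The SQL entity names the JDBC source explicitly -->
    <entity name="VTCEduDocument" dataSource="db"
            query="select TOP 10 pk_document_id, s_path_origin from [VTC_Edu].[dbo].[tbl_Document]">
      <field column="pk_document_id" name="id"/>
      <!-- The Tika entity names the binary URL source explicitly -->
      <entity name="tika" processor="TikaEntityProcessor" dataSource="bin" format="text"
              url="http://media.gox.vn/edu/document/original/${VTCEduDocument.s_path_origin}">
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```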



Re: indexing data from rich documents - Tika with solr3.1

2011-09-20 Thread scorpking
Hi all, thank you very much to everyone who helped me. I have indexed from HTTP using DIH.



How to skip current document to index data from DIH

2011-09-30 Thread scorpking
Hi,
Can anyone help me with this problem? I'm using Tika to index data from rich
documents that are fetched over HTTP. I query a database to get the fields
and then combine them with Tika. Everything is OK, except that I run into a
"FileNotFoundException". I understand this error, but I want to skip those
documents and continue indexing.

Thanks for your help.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-skip-current-document-to-index-data-from-DIP-tp3381894p3381894.html


Re: How to skip current document to index data from DIH

2011-10-02 Thread scorpking
Hi, thanks for your reply.
But when I set the attribute onError="skip", no data is imported.
This is my config:
[...]
url="http://media.gox.vn/edu/document/original/${VTCEduDocument.s_path_origin}"
transformer="com.vtc.search.Converter" onError="skip">
[...]

Thanks




Re: How to skip current document to index data from DIH

2011-10-03 Thread scorpking
Hi Erick Erickson,
Thank you for your reply. With my config, I successfully indexed data over
HTTP using Tika. I combine a database field with a base URL in Tika to fetch
each file over HTTP. But during indexing, I have seen that some URLs do not
exist:

Caused by: java.io.FileNotFoundException:
http://media.gox.vn/edu/document/original/1/2704201010071760_Bai25.ppt

This means that the file does not exist on the server. I want to skip such
files (documents) and index the next ones. I tried onError="skip" to continue
indexing the rich documents, but it does not work and the import stops there.
Is there a way to overcome this problem?

Best regards,
Thanks



Autocomplete with Solr 3.1

2011-07-26 Thread scorpking
Hi all,
I want to use autocomplete to suggest like Google
(http://www.google.com/webhp?complete=1&hl=en), and I followed
http://solr.pl/en/2010/11/15/solr-and-autocomplete-part-2/ to configure my
project. But when I test with two or more terms in my query, the results are
not right, and I don't know why.
Can anyone tell me?
Thanks for your help.

 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Autocomplete-with-Solr-3-1-tp3202214p3202214.html



Re: Different options for autocomplete/autosuggestion

2011-07-27 Thread scorpking
Hi Bell,
I use autocomplete in Solr 3.1, like this:

  

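<lst name="spellchecker">
  <str name="name">autocomplete</str>
  <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
  <str name="lookupImpl">org.apache.solr.spelling.suggest.jaspell.JaspellLookup</str>
  <str name="field">autocomplete</str>
  <str name="buildOnCommit">true</str>
</lst>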
 

I followed http://solr.pl/en/2010/11/15/solr-and-autocomplete-part-2/ to
index my data, and I ran into a problem. With one word it works very well,
but when I type two or more words, the results returned are not right. I
don't know why. Does anyone know this problem? Thanks for your help.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Different-options-for-autocomplete-autosuggestion-tp2678899p3203032.html


Re: Autocomplete with Solr 3.1

2011-07-27 Thread scorpking
Hi Klein,
Thanks for your reply. I tried some of the suggestion approaches with Solr,
and the results returned were good. But I want to use the search component in
Solr 3.1, and I am having some problems with Suggester. I think my problem
may be in the schema file. This is the schema file:

[...]

And I defined these fields:

[...]

where the fieldType is text_auto:

[...]

In solrconfig.xml I defined:

 
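<lst name="spellchecker">
  <str name="name">suggest</str>
  <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
  <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
  <str name="field">search_autocomplete</str>
  <str name="buildOnCommit">true</str>
</lst>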
 

  
  

true
suggest
10
true


spellcheck-autocomplete

  

Can anyone help?
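
For comparison, the usual wiring of Suggester in solrconfig.xml looks roughly like the sketch below. It is not the exact file from this post: the handler path /suggest is an assumption, the component name is taken from the one surviving component reference (spellcheck-autocomplete), and which parameter the final true belonged to is a guess (shown here as spellcheck.collate).

```xml
<!-- Suggester is plugged in as a dictionary of SpellCheckComponent -->
<searchComponent name="spellcheck-autocomplete" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <!-- Field whose indexed terms feed the suggestions -->
    <str name="field">search_autocomplete</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<!-- Hypothetical handler path; any name starting with "/" can be used -->
<requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.count">10</str>
    <!-- Assumption: the last "true" in the post was spellcheck.collate -->
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="components">
    <str>spellcheck-autocomplete</str>
  </arr>
</requestHandler>
```

With such a handler, a request like /suggest?q=ab asks the dictionary named suggest for completions of the term ab.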



Re: Autocomplete with Solr 3.1

2011-07-28 Thread scorpking
Nobody can help me
