TikaEntityProcessor with DIH

2020-04-20 Thread Srinivas Kashyap
Hi, we were on Solr 5.2.1 and used TikaEntityProcessor to index PDF documents through DIH, and it was working fine. The jars were tika-core-1.4.jar and tika-parsers-1.4.jar. Below is my schema.xml: (P.S. All field types have been defined) And my tika-data-config.xml

Embedding XPathEntityProcessor inside TikaEntityProcessor

2019-03-12 Thread wclarke
I am pulling a large amount of data from a local source D:\foo\resource\. I am using Tika through DIH to index the multiple file formats with text and metadata. I have almost all the information I want being pulled; however, I am having a couple of issues: 1. I need to run a regex replace

Re: DIH for TikaEntityProcessor

2018-10-12 Thread Kamuela Lau
lucene.apache.org > Subject: Re: DIH for TikaEntityProcessor > > Also, just wondering, have you tried to specify dataSource="bin" for > read_file? > > On Fri, Oct 12, 2018 at 6:38 PM Kamuela Lau wrote: > > > Hi, > > > > I was unable to reproduc

SV: DIH for TikaEntityProcessor

2018-10-12 Thread Martin Frank Hansen (MHQ)
You sir just made my day!!! It worked!!! Thanks a million! Martin Frank Hansen, -Original Message- From: Kamuela Lau Sent: 12 October 2018 11:41 To: solr-user@lucene.apache.org Subject: Re: DIH for TikaEntityProcessor Also, just wondering, have you tried to specify

SV: DIH for TikaEntityProcessor

2018-10-12 Thread Martin Frank Hansen (MHQ)
-user@lucene.apache.org Subject: Re: DIH for TikaEntityProcessor Hi, I was unable to reproduce the error that you got with the information provided. Below are the data-config.xml and managed-schema fields I used; the data-config is mostly the same (I think that BinFileDataSource doesn't act

Re: DIH for TikaEntityProcessor

2018-10-12 Thread Alexandre Rafalovitch
> > > Lautrupparken 40-42, DK-2750 Ballerup > E-mail m...@kmd.dk Web www.kmd.dk > Mobile +4525571418 > > > > *From:* Martin Frank Hansen (MHQ) > *Sent:* 10 October 2018 10:15 > *To:* solr-user > *Subject:* DIH for TikaEntityProcessor

Re: DIH for TikaEntityProcessor

2018-10-12 Thread Kamuela Lau
> > > > > On Fri, Oct 12, 2018 at 4:11 PM Martin Frank Hansen (MHQ) > wrote: > >> Hi again, >> >> >> >> Can anybody help me? Any suggestions to why I am getting the error below? >> >> >> >> >> >> *Martin Frank Ha

Re: DIH for TikaEntityProcessor

2018-10-12 Thread Kamuela Lau
rken 40-42, DK-2750 Ballerup > E-mail m...@kmd.dk Web www.kmd.dk > Mobile +4525571418 > > > > *From:* Martin Frank Hansen (MHQ) > *Sent:* 10 October 2018 10:15 > *To:* solr-user > *Subject:* DIH for TikaEntityProcessor > > > > Hi, > > > > I am trying to re

SV: DIH for TikaEntityProcessor

2018-10-12 Thread Martin Frank Hansen (MHQ)
md.dk<http://www.kmd.dk/> Mobile +4525571418 From: Martin Frank Hansen (MHQ) Sent: 10 October 2018 10:15 To: solr-user Subject: DIH for TikaEntityProcessor Hi, I am trying to read documents from a file system into Solr, using dataimporthandler but keep getting the following errors:

DIH for TikaEntityProcessor

2018-10-10 Thread Martin Frank Hansen (MHQ)
Hi, I am trying to read documents from a file system into Solr, using dataimporthandler but keep getting the following errors: Exception while processing: files document : null:org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ClassC

RE: Japanese character is garbled when using TikaEntityProcessor

2017-04-12 Thread Noriyuki TAKEI
Thanks!! I appreciate your quick reply. -- View this message in context: http://lucene.472066.n3.nabble.com/Japanese-character-is-garbled-when-using-TikaEntityProcessor-tp4329217p4329657.html Sent from the Solr - User mailing list archive at Nabble.com.

RE: Japanese character is garbled when using TikaEntityProcessor

2017-04-10 Thread Allison, Timothy B.
s garbled when using TikaEntityProcessor Hi all, I use TikaEntityProcessor to extract the text content from binary or text files. But when I try to extract Japanese characters from an HTML file whose character encoding is SJIS, the content is garbled. In the case of UTF-8, it works well. The se

Japanese character is garbled when using TikaEntityProcessor

2017-04-10 Thread Noriyuki TAKEI
Hi all, I use TikaEntityProcessor to extract the text content from binary or text files. But when I try to extract Japanese characters from an HTML file whose character encoding is SJIS, the content is garbled. In the case of UTF-8, it works well. The setting of Data Import Handler is as below

Re: TikaEntityProcessor Not Finding My Files

2015-06-16 Thread Paden
.nabble.com/TikaEntityProcessor-Not-Finding-My-Files-tp4212241p4212252.html

TikaEntityProcessor Not Finding My Files

2015-06-16 Thread Paden
Hi, there's a guy who's already asked a question similar to this and I'm basically going off what he did here. It's exactly what I'm doing which is taking a file path from a database and using TikaEntityProcessor to analyze the document. The link to his question is here.

Re: TikaEntityProcessor + multivalue field as url source

2014-01-29 Thread Bustaa
Thanks for your suggestions Ahmet. We are using the Typo3 CMS (with custom extensions / db-schemas). We are using Solarium to connect to the Solr instance. The schema is pretty simple:

Re: TikaEntityProcessor + multivalue field as url source

2014-01-29 Thread Ahmet Arslan
Hi Bustaa, Can you paste your data-config.xml?  Also, did you consider using ManifoldCF [1] to crawl/index your CMS? What CMS are you using? [1] http://manifoldcf.apache.org/release/trunk/en_US/end-user-documentation.html#repositoryconnectiontypes On Wednesday, January 29, 2014 1:03 PM,

TikaEntityProcessor + multivalue field as url source

2014-01-29 Thread Bustaa
Hello Solr Users, I'm trying to get Tika's "BinFileDataSource" to take the filenames from a multivalue field (array) but I'm getting the following exception: Debug output from running dataimport (shortened): "query", "<<< LONG SQL-QUERY >>>", "time-taken",

Re: Problems using DataImportHandler and TikaEntityProcessor

2013-10-14 Thread PeteBleackley
OK, so I put my pdf files in a directory /path/to/pdf, and edited example-DIH/solr/tika/conf/tika-data-config.xml to contain the parameter <entity name="tika-test" processor="TikaEntityProcessor" url="/path/to/pdf" format="xml" > What should I do nex

Re: Problems using DataImportHandler and TikaEntityProcessor

2013-10-11 Thread Shawn Heisey
On 10/11/2013 9:32 AM, PeteBleackley wrote: > I tried changing the options to -Dauto -Dfiletypes=pdf. This gave me a 404 > error, apparently caused by post.jar adding /extract to the end of the URL In order to use post.jar, you would need the /update/extract handler, which is not defined in the ti
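As a rough sketch of the point made in this reply: a core that is driven by DataImportHandler exposes a DIH request handler rather than /update/extract, so post.jar has nothing to post to. A definition along the following lines in solrconfig.xml is the kind of thing such a core relies on. The handler name `/dataimport` and the config file name are assumptions for illustration (the file name is borrowed from the tika-data-config.xml mentioned elsewhere in this thread), not a copy of the poster's setup.

```xml
<!-- Sketch only: registers DataImportHandler for the core.
     The handler name and the config file name are assumptions;
     adjust both to match the actual core. -->
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <!-- DIH configuration file, resolved relative to the core's conf/ directory -->
    <str name="config">tika-data-config.xml</str>
  </lst>
</requestHandler>
```

With a handler like this in place, an import is normally started by requesting `/dataimport?command=full-import` on the core, instead of posting files with post.jar.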

Re: Problems using DataImportHandler and TikaEntityProcessor

2013-10-11 Thread Furkan KAMACI
g.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) > at > > org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) > at > > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) > at > > org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) > at java.lang.Thread.run(Thread.java:724) > > > I tried changing the options to -Dauto -Dfiletypes=pdf. This gave me a 404 > error, apparently caused by post.jar adding /extract to the end of the URL > > > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Problems-using-DataImportHandler-and-TikaEntityProcessor-tp4094983p4094987.html

Re: Problems using DataImportHandler and TikaEntityProcessor

2013-10-11 Thread PeteBleackley
at java.lang.Thread.run(Thread.java:724) I tried changing the options to -Dauto -Dfiletypes=pdf. This gave me a 404 error, apparently caused by post.jar adding /extract to the end of the URL -- View this message in context: http://lucene.472066.n3.nabble.com/Problems-using-DataImportHandler-and-TikaEntityProcessor-tp4094983p4094987.html

Re: Problems using DataImportHandler and TikaEntityProcessor

2013-10-11 Thread Furkan KAMACI
There may be a problem with your schema. Could you send your Solr logs? 2013/10/11 Peter Bleackley > Starting Solr with the command line > > > java -Dsolr.solr.home=example-DIH/solr -jar start.jar > > > and then trying to import some data with > > java > -Durl=http://localhost:8983/solr/tik

Problems using DataImportHandler and TikaEntityProcessor

2013-10-11 Thread Peter Bleackley
Starting Solr with the command line java -Dsolr.solr.home=example-DIH/solr -jar start.jar and then trying to import some data with java -Durl=http://localhost:8983/solr/tika/update -Dtype=application/pdf -jar post.jar *.pdf fails with error SimplePostTool: WARNING: Solr returned an error

Re: XPathEntityProcessor nested in TikaEntityProcessor query null exception

2013-10-01 Thread Andreas Owen
nection$HttpInputStream cannot be cast >> to java.io.Reader >> >> >> >> >> >> On 28. Sep 2013, at 1:39 AM, Andreas Owen wrote: >> >>> ok, I see what you're getting at, but why doesn't the following work: >>> >>>

Re: XPathEntityProcessor nested in TikaEntityProcessor query null exception

2013-09-30 Thread P Williams
tion$HttpInputStream cannot be cast > to java.io.Reader > > > > > > On 28. Sep 2013, at 1:39 AM, Andreas Owen wrote: > > > ok, I see what you're getting at, but why doesn't the following work: > > > > > > > > > > I removed th

Re: XPathEntityProcessor nested in TikaEntityProcessor query null exception

2013-09-29 Thread Andreas Owen
he wiki? > > > On 28. Sep 2013, at 12:28 AM, P Williams wrote: > >> I spent some more time thinking about this. Do you really need to use the >> TikaEntityProcessor? It doesn't offer anything new to the document you are >> building that couldn't be acco

Re: XPathEntityProcessor nested in TikaEntityProcessor query null exception

2013-09-28 Thread Andreas Owen
Thanks, but the first suggestion is already implemented and the second didn't work. I have also tried htmlMapper="identity" but nothing worked. I also tried this but the HTML was stripped in both fields, but in the end I think it's best to cut tika o

Re: XPathEntityProcessor nested in TikaEntityProcessor query null exception

2013-09-27 Thread Alexandre Rafalovitch
This is a rather complicated example to chew through, but try the following two things: *) dataField="${tika.text}" => dataField="text" (or less likely htmlMapper tika.text) You might be trying to read content of the field rather than passing reference to the field that seems to be expected. This

Re: XPathEntityProcessor nested in TikaEntityProcessor query null exception

2013-09-27 Thread Andreas Owen
u really need to use the > TikaEntityProcessor? It doesn't offer anything new to the document you are > building that couldn't be accomplished by the XPathEntityProcessor alone > from what I can tell. > > I also tried to get the Advanced > Parsing<http://wiki.ap

Re: XPathEntityProcessor nested in TikaEntityProcessor query null exception

2013-09-27 Thread P Williams
I spent some more time thinking about this. Do you really need to use the TikaEntityProcessor? It doesn't offer anything new to the document you are building that couldn't be accomplished by the XPathEntityProcessor alone from what I can tell. I also tried to get the Advanced Pa

Re: XPathEntityProcessor nested in TikaEntityProcessor query null exception

2013-09-27 Thread Andreas Owen
ms wrote: > Hi, > > Haven't tried this myself but maybe try leaving out the > FieldReaderDataSource entirely. From my quick searching looks like it's > tied to SQL. Did you try copying the > http://wiki.apache.org/solr/TikaEntityProcessor Advanced Parsing example

Re: XPathEntityProcessor nested in TikaEntityProcessor query null exception

2013-09-26 Thread P Williams
Hi, Haven't tried this myself but maybe try leaving out the FieldReaderDataSource entirely. From my quick searching looks like it's tied to SQL. Did you try copying the http://wiki.apache.org/solr/TikaEntityProcessor Advanced Parsing example exactly? What happens when you

XPathEntityProcessor nested in TikaEntityProcessor query null exception

2013-09-26 Thread Andreas Owen
I'm using Solr 4.3.1 and the dataimporter. I am trying to use XPathEntityProcessor within the TikaEntityProcessor for indexing HTML pages but I'm getting this error for each document. I have also tried dataField="tika.text" and dataField="text" to no avail. th

Re: Issue with Solr 3.5 while using TikaEntityProcessor on .docx files

2012-04-16 Thread Roman K
On 04/16/2012 06:45 PM, Roman K wrote: On 04/16/2012 04:31 PM, Jan Høydahl wrote: Hi, Solr3.6 is just out with Tika 1.0. Can you try that? Also, Solr TRUNK now has Tika 1.1... I recommend downloading Tika-App and testing your offending files directly with that http://tika.apache.org/1.1/getti

Re: Issue with Solr 3.5 while using TikaEntityProcessor on .docx files

2012-04-16 Thread Roman K
On 04/16/2012 04:31 PM, Jan Høydahl wrote: Hi, Solr3.6 is just out with Tika 1.0. Can you try that? Also, Solr TRUNK now has Tika 1.1... I recommend downloading Tika-App and testing your offending files directly with that http://tika.apache.org/1.1/gettingstarted.html -- Jan Høydahl, search s

Re: Issue with Solr 3.5 while using TikaEntityProcessor on .docx files

2012-04-16 Thread Jan Høydahl
Hi, Solr3.6 is just out with Tika 1.0. Can you try that? Also, Solr TRUNK now has Tika 1.1... I recommend downloading Tika-App and testing your offending files directly with that http://tika.apache.org/1.1/gettingstarted.html -- Jan Høydahl, search solution architect Cominvent AS - www.cominven

Issue with Solr 3.5 while using TikaEntityProcessor on .docx files

2012-04-16 Thread Roman K
Hello, I am running some tests to see whether we can use Solr in our organization. I have to be able to process MS Word .docx files and then be able to search them as if they were simple plain text. The problem is that when processing the docx files, the result that I get while running the *:* q

RE: how to store file path in Solr when using TikaEntityProcessor

2012-03-28 Thread ZHANG Liang F
: Ahmet Arslan Subject: Re: how to store file path in Solr when using TikaEntityProcessor Hi, you should change your data-config moving data that come from FileListEntityProcessor to its entity, one level up. Try this configuration
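A minimal sketch of the layout suggested in the message quoted above, where the file attributes are mapped on the FileListEntityProcessor entity (one level up) rather than inside the Tika entity. The Solr field names `path`, `size`, `lastmodified`, and `content` and the `fileName` pattern are hypothetical; the `baseDir` value is the one quoted elsewhere in this thread. `fileAbsolutePath`, `fileSize`, and `fileLastModified` are the implicit columns that FileListEntityProcessor produces.

```xml
<dataConfig>
  <dataSource name="bin" type="BinFileDataSource"/>
  <document>
    <!-- Outer entity: lists files; it needs no data source of its own.
         rootEntity="false" makes the nested entity produce the documents. -->
    <entity name="files" dataSource="null" rootEntity="false"
            processor="FileListEntityProcessor"
            baseDir="E:/my_project/ecmkit/infotouch"
            fileName=".*\.pdf" recursive="true">
      <!-- File attributes are mapped here, on the entity that produces them.
           The target field names are hypothetical and must exist in schema.xml. -->
      <field column="fileAbsolutePath" name="path"/>
      <field column="fileSize" name="size"/>
      <field column="fileLastModified" name="lastmodified"/>
      <!-- Inner entity: parses each listed file with Tika -->
      <entity name="tika" dataSource="bin" processor="TikaEntityProcessor"
              url="${files.fileAbsolutePath}" format="text">
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

The idea is that columns belong to the entity that emits them: the Tika entity only sees extracted text and metadata, so the path, size, and modification time have to be picked up from the file-listing entity.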

Re: how to store file path in Solr when using TikaEntityProcessor

2012-03-28 Thread Luca Cavanna
: On Wed, Mar 28, 2012 at 3:50 AM, ZHANG Liang F < liang.f.zh...@alcatel-sbell.com.cn> wrote: > Could you please show me how to get those values inside > TikaEntityProcessor? > > -Original Message- > From: Ahmet Arslan [mai

RE: how to store file path in Solr when using TikaEntityProcessor

2012-03-27 Thread ZHANG Liang F
Could you please show me how to get those values inside TikaEntityProcessor? -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: 27 March 2012 22:43 To: solr-user@lucene.apache.org Subject: Re: how to store file path in Solr when using TikaEntityProcessor > I am us

Re: how to store file path in Solr when using TikaEntityProcessor

2012-03-27 Thread Ahmet Arslan
BinFileDataSource" /> > > dataSource="null" rootEntity="false" > processor="FileListEntityProcessor" > baseDir="E:/my_project/ecmkit/infotouch" > fileName="

how to store file path in Solr when using TikaEntityProcessor

2012-03-27 Thread ZHANG Liang F
Hi, I am using DIH to index a local file system. But the file path, size and lastmodified fields were not stored. In the schema.xml I defined: And also defined tika-data-config.xml:

how to store file path in Solr when using TikaEntityProcessor

2012-03-26 Thread ZHANG Liang F
Hi, I am using DIH to index a local file system. But the file path, size and lastmodified fields were not stored. In the schema.xml I defined: And also defined tika-data-config.xml:

Re: [jira] [Commented] (SOLR-2961) DIH with threads and TikaEntityProcessor JDBC ISsue

2011-12-10 Thread Mikhail Khludnev
IH process immediately stops and performs a rollback. >> >> This is preventing me from using DIH to load and maintain my production >> index. Any help is greatly appreciated since I am now at the 11th hour. :) >> >> Solr and all components have been stellar up to

Re: TikaEntityProcessor not working?

2011-11-21 Thread kumar8anuj
> > Regards, > Gora

Re: TikaEntityProcessor not working?

2011-11-21 Thread Gora Mohanty
On Mon, Nov 21, 2011 at 8:45 PM, kumar8anuj wrote: > So where can i get some information on this issue, Can you please help ? Have you tried simple things like searching Google, using the Tika site, and, failing these, asking on a Tika-specific mailing list? No offence, but you might do some basi

Re: TikaEntityProcessor not working?

2011-11-21 Thread kumar8anuj
37 AM, kumar8anuj > wrote: > > Erick, > > Need your help on this. Waiting for resolution. Please help ... > > > > -- > > View this message in context: > htt

Re: TikaEntityProcessor not working?

2011-11-21 Thread Erick Erickson
Sorry, but I don't really have that info. Erick On Mon, Nov 21, 2011 at 9:37 AM, kumar8anuj wrote: > Erick, >          Need your help on this. Waiting for resolution. Please help ... > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/TikaEnti

Re: TikaEntityProcessor not working?

2011-11-21 Thread kumar8anuj
Erick, Need your help on this. Waiting for resolution. Please help ... -- View this message in context: http://lucene.472066.n3.nabble.com/TikaEntityProcessor-not-working-tp856965p3524881.html

Re: TikaEntityProcessor not working?

2011-11-14 Thread kumar8anuj
(EntityProcessorWrapper.java:238) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:596) ... 7 more -- View this message in context: http://lucene.472066.n3.nabble.com/TikaEntityProcessor-not-working-tp856965p3506596.html

Re: TikaEntityProcessor not working?

2011-11-08 Thread Erick Erickson
i m not clear to you. > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/TikaEntityProcessor-not-working-tp856965p3490077.html

Re: TikaEntityProcessor not working?

2011-11-08 Thread kumar8anuj
if i m not clear to you. -- View this message in context: http://lucene.472066.n3.nabble.com/TikaEntityProcessor-not-working-tp856965p3490077.html

Re: TikaEntityProcessor not working?

2011-11-07 Thread Erick Erickson
s related to that. > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/TikaEntityProcessor-not-working-tp856965p3486898.html

Re: TikaEntityProcessor not working?

2011-11-07 Thread kumar8anuj
message in context: http://lucene.472066.n3.nabble.com/TikaEntityProcessor-not-working-tp856965p3486898.html

TikaEntityProcessor is filling logs

2011-08-02 Thread O. Klein
I want to use TikaEntityProcessor for URLs defined in "link" from the parent entity. This field can be empty as well. While the dataimport is working OK, the logging is filling up with exceptions in case link is null. Is there a way to prevent this?

Re: Issue while extracting content from MS Excel 2007 file using TikaEntityProcessor

2011-05-27 Thread Gora Mohanty
On Thu, May 26, 2011 at 6:52 PM, Rahul Warawdekar wrote: > Hi All, > > I am using Solr 3.1 for one of our search based applications. > We are using DIH to index our data and TikaEntityProcessor to index > attachments. > Currently we are running into an issue while extracting c

Re: Issue while extracting content from MS Excel 2007 file using TikaEntityProcessor

2011-05-26 Thread Rahul Warawdekar
e are using DIH to index our data and TikaEntityProcessor to index >> attachments. >> Currently we are running into an issue while extracting content from one >> of >> our MS Excel 2007 files, using TikaEntityProcessor. >> >> The issue is the TikaEntityProcessor

Re: Issue while extracting content from MS Excel 2007 file using TikaEntityProcessor

2011-05-26 Thread Markus Jelsma
Can you rule out Tika or Solr by trying to parse the file with a stand-alone Tika? > Hi All, > > I am using Solr 3.1 for one of our search based applications. > We are using DIH to index our data and TikaEntityProcessor to index > attachments. > Currently we are running i

Issue while extracting content from MS Excel 2007 file using TikaEntityProcessor

2011-05-26 Thread Rahul Warawdekar
Hi All, I am using Solr 3.1 for one of our search based applications. We are using DIH to index our data and TikaEntityProcessor to index attachments. Currently we are running into an issue while extracting content from one of our MS Excel 2007 files, using TikaEntityProcessor. The issue is the

Re: solr 3.1 - TikaEntityProcessor

2011-05-01 Thread firdous_kind86
thanks koji :), it works -- View this message in context: http://lucene.472066.n3.nabble.com/solr-3-1-TikaEntityProcessor-tp2883520p2885679.html

Re: solr 3.1 - TikaEntityProcessor

2011-04-30 Thread Koji Sekiguchi
(11/05/01 3:45), firdous_kind86 wrote: Hi all, I have Solr 3.1 with DIH running fine, but I am unable to use TikaEntityProcessor. I tried replacing the tika-parsers and tika-core 0.9 jars (in contrib/extraction/lib) but still no success. Maybe I am missing something; could anyone define the

solr 3.1 - TikaEntityProcessor

2011-04-30 Thread firdous_kind86
Hi all, I have Solr 3.1 with DIH running fine, but I am unable to use TikaEntityProcessor. I tried replacing the tika-parsers and tika-core 0.9 jars (in contrib/extraction/lib) but still no success. Maybe I am missing something; could anyone define the exact steps to start using with

Re: TikaEntityProcessor

2011-04-20 Thread firdous_kind86
after reading this post i hoped that i could achieve.. but couldnt find any success in almost a week http://lucene.472066.n3.nabble.com/TikaEntityProcessor-not-working-td856965.html#a867572 -- View this message in context: http://lucene.472066.n3.nabble.com/TikaEntityProcessor-tp2839188p2843084

Re: TikaEntityProcessor

2011-04-20 Thread Andreas Kemkes
1 12:38:02 AM Subject: Re: TikaEntityProcessor hi, i asked that :) didnt get that.. what dependencies? i am using solr 1.4 and tika 0.9 i replaced tika-core 0.9 and tika-parsers 0.9 at /contrib/extraction/lib also replaced old version of dataimporthandler-extras by apache-solr-dataimporthandler-extras-

Re: TikaEntityProcessor

2011-04-20 Thread firdous_kind86
.. someone pointed bug SOLR-2116 to me but i guess it is only for solr-3.1 -- View this message in context: http://lucene.472066.n3.nabble.com/TikaEntityProcessor-tp2839188p2841936.html

Re: TikaEntityProcessor

2011-04-19 Thread Oleg Tikhonov
rybody, > >> > >> Recently, I got a message from a guy who was asking about > >> TikaEntityProcessor. > >> He uses Solr 1.4 and Tika 0.8. > >> Here is a stack: > >> SEVERE: Full Import failed > >> org.apache.solr.handler. > >

TikaEntityProcessor

2011-04-19 Thread Oleg Tikhonov
Hello everybody, Recently, I got a message from a guy who was asking about TikaEntityProcessor. He uses Solr 1.4 and Tika 0.8. Here is a stack: SEVERE: Full Import failed org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to load EntityProcessor implementation for entity

Re: TikaEntityProcessor

2011-04-19 Thread Li
Looks like dependencies. Did you or he include the dependencies in the solrconfig? Sent from my iPhone On Apr 19, 2011, at 8:35 AM, Oleg Tikhonov wrote: >> Hello everybody, >> >> Recently, I got a message from a guy who was asking about >> TikaEntityProcessor. >

Re: TikaEntityProcessor

2011-04-19 Thread Oleg Tikhonov
> Hello everybody, > > Recently, I got a message from a guy who was asking about > TikaEntityProcessor. > He uses Solr 1.4 and Tika 0.8. > Here is a stack: > SEVERE: Full Import failed > org.apache.solr.handler. > dataimport.DataImportHandlerException: Unable

TikaEntityProcessor and metadata

2010-10-07 Thread Peter Blokland
Hi, I'm using Solr to index documents both through a combination of DataImportHandler/TikaEntityProcessor and Solr's ExtractingRequestHandler. The latter gives me the option of dynamically mapping metadata to fields using "uprefix='attr_'" in the configuration. Is it
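For readers unfamiliar with the `uprefix` mechanism mentioned here, the following is a hedged sketch of the ExtractingRequestHandler side only: metadata fields that are not defined in the schema get the prefix, and a matching dynamic field catches them. The `fmap.content` target and the `text_general` field type are assumptions; whether the same behaviour is available through TikaEntityProcessor is the open question of the post and is not answered by this fragment.

```xml
<!-- solrconfig.xml (sketch): unknown metadata fields are renamed attr_<name> -->
<requestHandler name="/update/extract"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="uprefix">attr_</str>
    <!-- assumption: extracted body text goes to a field called "text" -->
    <str name="fmap.content">text</str>
  </lst>
</requestHandler>

<!-- schema.xml (sketch): dynamic field that accepts the prefixed metadata.
     The field type name is an assumption. -->
<dynamicField name="attr_*" type="text_general"
              indexed="true" stored="true" multiValued="true"/>
```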

Indexing Rich Format Documents using Data Import Handler (DIH) and the TikaEntityProcessor

2010-06-23 Thread Tod
Please refer to this thread for history: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201006.mbox/%3c4c1b6bb6.7010...@gmail.com%3e I'm trying to integrate the TikaEntityProcessor as suggested. I'm using Solr Version: 1.4.0 and getting the follo

RE: TikaEntityProcessor on Solr 1.4?

2010-06-08 Thread Tim Gilbert
[mailto:six...@sfko.com] Sent: Tuesday, June 08, 2010 3:53 PM To: solr-user@lucene.apache.org Subject: Re: TikaEntityProcessor on Solr 1.4? 2010/5/22 Noble Paul നോബിള്‍ नोब्ळ् : > just copy the dih-extras jar file from the nightly should be fine Now that I've finally got a server on

Re: TikaEntityProcessor on Solr 1.4?

2010-06-08 Thread Sixten Otto
2010/5/22 Noble Paul നോബിള്‍ नोब्ळ् : > just copy the dih-extras jar file from the nightly should be fine Now that I've finally got a server on which to attempt to set these things up... this turns out not to be a viable solution. The extras jar does contain the TikaEntityProcessor cl

Re: TikaEntityProcessor not working?

2010-06-04 Thread Brad Greenlee
've not tried to get this to work and am > not sure what config is needed to make this work. I simply installed Tika > 0.6 which can be downloaded from the apache tika website. > -- > View this message in context: > http://lucene.472066.n3.nabble.com/TikaEntityProcessor-not-worki

Re: TikaEntityProcessor not working?

2010-06-03 Thread David George
http://lucene.472066.n3.nabble.com/TikaEntityProcessor-not-working-tp856965p867572.html

Re: TikaEntityProcessor not working?

2010-05-31 Thread Brad Greenlee
It is a file. Only the filename is stored in the database. Brad On May 31, 2010, at 2:59 AM, Noble Paul നോബിള്‍ नो ब्ळ् wrote: BinFileDataSource will only work with file, Try FieldStreamDataSource On Mon, May 31, 2010 at 3:30 AM, Brad Greenlee wrote: Hi. I'm trying to get Solr to i

Re: TikaEntityProcessor not working?

2010-05-31 Thread Noble Paul നോബിള്‍ नोब्ळ्
BinFileDataSource will only work with file, Try FieldStreamDataSource On Mon, May 31, 2010 at 3:30 AM, Brad Greenlee wrote: > Hi. I'm trying to get Solr to index a database in which one column is a > filename of a PDF document I'd like to index. My configuration looks like > this: > > > url=
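A minimal sketch of the case described in this thread, where a database column holds only the filename of a PDF and the file itself sits on disk, which is the situation BinFileDataSource is meant for. The JDBC driver, connection URL, credentials, table, column, and field names are all hypothetical, and the sketch assumes the column stores a path that Solr can open directly. FieldStreamDataSource, as suggested above, would be the alternative when the document content itself is stored in a column.

```xml
<dataConfig>
  <!-- Hypothetical JDBC settings: replace driver, url, user and password -->
  <dataSource name="db" type="JdbcDataSource"
              driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost/mydb"
              user="solr" password="changeme"/>
  <!-- Binary file source used by the nested Tika entity -->
  <dataSource name="bin" type="BinFileDataSource"/>
  <document>
    <!-- Outer entity: one row per document, carrying the filename column -->
    <entity name="doc" dataSource="db"
            query="SELECT id, filename FROM documents">
      <field column="id" name="id"/>
      <!-- Inner entity: opens the file named in the row and extracts its text -->
      <entity name="pdf" dataSource="bin" processor="TikaEntityProcessor"
              url="${doc.filename}" format="text">
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```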

TikaEntityProcessor not working?

2010-05-30 Thread Brad Greenlee
Hi. I'm trying to get Solr to index a database in which one column is a filename of a PDF document I'd like to index. My configuration looks like this: I'm using Solr from trunk (as of two days ago). The import process completes without errors, and

Re: TikaEntityProcessor on Solr 1.4?

2010-05-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
just copy the dih-extras jar file from the nightly should be fine On Sat, May 22, 2010 at 3:12 AM, Sixten Otto wrote: > On Fri, May 21, 2010 at 5:30 PM, Chris Harris wrote: >> Actually, rather than cherry-pick just the changes from SOLR-1358 and >> SOLR-1583 what I did was to merge in all DataIm

Re: TikaEntityProcessor on Solr 1.4?

2010-05-21 Thread Sixten Otto
On Fri, May 21, 2010 at 5:30 PM, Chris Harris wrote: > Actually, rather than cherry-pick just the changes from SOLR-1358 and > SOLR-1583 what I did was to merge in all DataImportHandler-related > changes from between the 1.4 release up through Solr trunk r890679 > (inclusive). I'm not sure if that

Re: TikaEntityProcessor on Solr 1.4?

2010-05-21 Thread Chris Harris
You are right that TikaEntityProcessor has a couple of other prereqs beyond stock Solr 1.4. I think the main point is that they're relatively minor. I've merged TikaEntityProcessor (and some prereqs) and its dependencies into my Solr 1.4 tree and it compiles fine, though I haven't

Re: TikaEntityProcessor on Solr 1.4?

2010-05-21 Thread Sixten Otto
2010/5/19 Noble Paul നോബിള്‍ नोब्ळ् : > I guess it should work because Tika Entityprocessor does not use any > new 1.4 APIs > > On Wed, May 19, 2010 at 1:17 AM, Sixten Otto wrote: >> The TikaEntityProcessor class that enables DataImportHandler to >> process business docum

Re: TikaEntityProcessor on Solr 1.4?

2010-05-19 Thread Noble Paul നോബിള്‍ नोब्ळ्
I guess it should work because Tika Entityprocessor does not use any new 1.4 APIs On Wed, May 19, 2010 at 1:17 AM, Sixten Otto wrote: > Sorry to repeat this question, but I realized that it probably > belonged in its own thread: > > The TikaEntityProcessor class that enables DataImpo

TikaEntityProcessor on Solr 1.4?

2010-05-18 Thread Sixten Otto
Sorry to repeat this question, but I realized that it probably belonged in its own thread: The TikaEntityProcessor class that enables DataImportHandler to process business documents was added after the release of Solr 1.4, along with some other changes (like the binary DataSources) to support it

Re: TikaEntityProcessor in Solr1.4

2010-04-27 Thread Monmohan Singh
typo: Also, is there a timeframe on Solr1. release? should be Also, is there a timeframe on Solr1.5 release? On Tue, Apr 27, 2010 at 8:10 AM, monmohan wrote: > > Hi, > I would like to use TikaEntityProcessor with Solr1.4. > https://issues.apache.org/jira/browse/SOLR-1358 shows

TikaEntityProcessor in Solr1.4

2010-04-26 Thread monmohan
Hi, I would like to use TikaEntityProcessor with Solr1.4. https://issues.apache.org/jira/browse/SOLR-1358 shows that this is added in Solr1.5. Can anyone please point me on the steps to patch Solr 1.4 with these changes (if this is possible/allowed). Also, is there a timeframe on Solr1. release

Re: DataImportHandler TikaEntityProcessor FieldReaderDataSource

2010-02-05 Thread Jorg Heymans
>> wrote: > >> > Hi, > >> > I'm having some troubles getting this to work on a snapshot from 3rd > feb > >> My > >> > config looks as follows > >> > url="..

Re: DataImportHandler TikaEntityProcessor FieldReaderDataSource

2010-02-05 Thread Noble Paul നോബിള്‍ नोब्ळ्
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1 >> >         at >> > >> org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:98) >> >         at >> > >> org.apache.solr.handler.dataimport.Enti

Re: DataImportHandler TikaEntityProcessor FieldReaderDataSource

2010-02-05 Thread Jorg Heymans
> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1 > > at > > > org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:98) > > at > > > org.apache.solr.handler.dataimport.EntityProcessorWra

Re: DataImportHandler TikaEntityProcessor FieldReaderDataSource

2010-02-04 Thread Jorg Heymans
നോബിള്‍ नोब्ळ् > There is no corresponding DataSource which can be used with > TikaEntityProcessor which reads from BLOB > I have opened an issue. https://issues.apache.org/jira/browse/SOLR-1737 > > On Mon, Jan 25, 2010 at 10:57 PM, Shah, Nirmal wrote: > > Hi, > > > >

Re: DataImportHandler TikaEntityProcessor FieldReaderDataSource

2010-01-26 Thread Noble Paul നോബിള്‍ नोब्ळ्
There is no corresponding DataSource which can be used with TikaEntityProcessor which reads from BLOB. I have opened an issue. https://issues.apache.org/jira/browse/SOLR-1737 On Mon, Jan 25, 2010 at 10:57 PM, Shah, Nirmal wrote: > Hi, > > > > I am fairly new to Solr and would like to

RE: DataImportHandler TikaEntityProcessor FieldReaderDataSource

2010-01-26 Thread Shah, Nirmal
Nirmal Shah -Original Message- From: Jorg Heymans [mailto:jorg.heym...@gmail.com] Sent: Tuesday, January 26, 2010 3:43 AM To: solr-user@lucene.apache.org Subject: Re: DataImportHandler TikaEntityProcessor FieldReaderDataSource Hi Shah, I am assuming you are talking about

Re: DataImportHandler TikaEntityProcessor FieldReaderDataSource

2010-01-26 Thread Jorg Heymans
, Shah, Nirmal wrote: > Hi, > > > > I am fairly new to Solr and would like to use the DIH to pull rich text > files (pdfs, etc) from BLOB fields in my database. > > > > There was a suggestion made to use the FieldReaderDataSource with the > recently committed

DataImportHandler TikaEntityProcessor FieldReaderDataSource

2010-01-25 Thread Shah, Nirmal
Hi, I am fairly new to Solr and would like to use the DIH to pull rich text files (pdfs, etc) from BLOB fields in my database. There was a suggestion made to use the FieldReaderDataSource with the recently committed TikaEntityProcessor. Has anyone accomplished this? This is my