Re: Indexing PDF files in SqlBase database

2019-04-03 Thread Arunas Spurga
Yes, I know the reasons to put this work on a client rather than use Solr directly, and that should maybe be my next task. But first I need to finish my current task: indexing PDF files stored in a SqlBase database. The PDF files are pretty simple, sometimes only a few dozen lines of text. Regards, Aruna On Wed

Re: Indexing PDF files in SqlBase database

2019-04-03 Thread Erick Erickson
For a lot of reasons, I greatly prefer to put this work on a client rather than use Solr directly. Here’s a place to get started: it connects to a DB and also scans a local file directory for docs to push through (local) Tika and index. So you should be able to modify it relatively easily to get t
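A minimal sketch of the client-side approach described above, not the program Erick links to. Everything here is an assumption for illustration: `sqlite3` stands in for the SqlBase connection, the `documents` table and its columns are hypothetical, and `extract_text` is a placeholder for whatever Tika call the client would really make. Sending the resulting documents to Solr is left out.

```python
import sqlite3
from typing import Callable, Iterator


def docs_from_db(conn, extract_text: Callable[[bytes], str]) -> Iterator[dict]:
    """Read PDF blobs from the database and turn each into a Solr-style document."""
    # Hypothetical table layout: documents(id, pdf).
    for doc_id, pdf_bytes in conn.execute("SELECT id, pdf FROM documents"):
        yield {"id": str(doc_id), "content": extract_text(pdf_bytes)}


# Stand-in database with one fake "PDF" so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, pdf BLOB)")
conn.execute("INSERT INTO documents (pdf) VALUES (?)", (b"%PDF-1.4 fake",))


def fake_extract(data: bytes) -> str:
    """Placeholder extractor; a real client would call Tika here."""
    return data.decode("latin-1")


docs = list(docs_from_db(conn, fake_extract))
print(docs)
```

Keeping the extractor as a parameter means a crash or hang in text extraction stays in the client process, which is the point made in the thread.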

RE: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-20 Thread Allison, Timothy B.
>http - however, the big advantage of doing your indexing on different machine >is that the heavy lifting that tika does in extracting text from documents, >finding metadata etc is not happening on the server. If the indexer crashes, >it doesn’t affect Solr either. +1 for what can go wrong:

RE: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-20 Thread Phil Scadden
: ZiYuan [mailto:ziyu...@gmail.com] Sent: Tuesday, 20 June 2017 11:29 p.m. To: solr-user@lucene.apache.org Subject: Re: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context Dear Erick and Timothy, I also took a look at the Python clients (say, SolrClient and

RE: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-20 Thread Allison, Timothy B.
Yeah, Chris knows a thing or two about Tika. :) -Original Message- From: ZiYuan [mailto:ziyu...@gmail.com] Sent: Tuesday, June 20, 2017 8:00 AM To: solr-user@lucene.apache.org Subject: Re: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context No

Re: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-20 Thread ZiYuan
No intention of spamming but I also want to mention tika-python in the toolchain. Ziyuan On Tue, Jun 20, 2017 at 2:29 PM, ZiYuan wrote: > Dear Erick and Timothy, > > I also took a look at the Python clients (say, SolrClient and pysolr) > because Py

Re: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-20 Thread ZiYuan
Dear Erick and Timothy, I also took a look at the Python clients (say, SolrClient and pysolr) because Python is my main programming language. I have an impression that 1. they send HTTP requests to the server according to the server APIs; 2. they are not official and thus possibly not up to date.

Re: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-19 Thread ZiYuan
Dear Erick and Timothy, yes I will parse from the client for all the benefits. I am just trying to figure out what is going on by indexing one or two PDF files first. Thank you both. Best regards, Ziyuan On Mon, Jun 19, 2017 at 6:17 PM, Erick Erickson wrote: > bq: Hope that there is no side ef

Re: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-19 Thread Erick Erickson
bq: Hope that there is no side effect of not mapping the PDF Well, yes it will have that side effect. You can cure that with a copyField directive from content to _text_. But do really consider running this as a SolrJ program on the client. Tim knows in far more painful detail than I do what kind
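A hedged sketch of the copyField suggestion, assuming a managed schema so that the Schema API `add-copy-field` command can be used. The collection name `mycollection` is hypothetical, and the code only builds the JSON body; it does not send it.

```python
import json


def copy_field_command(source: str, dest: str) -> str:
    """Build the JSON body for a Schema API add-copy-field command."""
    return json.dumps({"add-copy-field": {"source": source, "dest": dest}})


# Hypothetical collection name; the body would be POSTed to this URL
# with Content-Type: application/json.
SCHEMA_URL = "http://localhost:8983/solr/mycollection/schema"

body = copy_field_command("content", "_text_")
print(body)
```

With a hand-edited schema.xml, the equivalent would be a copyField element with the same source and dest.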

Re: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-19 Thread ZiYuan
Hi Erick, Now it is clear. I have to update the request handler of /update/extract/ from "defaults":{"fmap.content":"_text_"} to "defaults":{"fmap.content":"content"} to fill the field. Hope that there is no side effect of not mapping the PDF content to _text_. Thank you for the hint. Best regar
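The change described above could be expressed as a Config API `update-requesthandler` command; the following is a sketch under that assumption. Only the `fmap.content` default comes from the message. The handler class name is the standard Solr Cell one, the collection name is hypothetical, and because the command replaces the whole handler definition, any other defaults in the existing configuration would have to be restated.

```python
import json


def extract_handler_command(content_field: str) -> str:
    """Build a Config API body that maps extracted content to `content_field`."""
    return json.dumps({
        "update-requesthandler": {
            "name": "/update/extract",
            "class": "solr.extraction.ExtractingRequestHandler",
            # Only this default is taken from the thread; add others as needed.
            "defaults": {"fmap.content": content_field},
        }
    })


# Hypothetical collection name; the body would be POSTed here as JSON.
CONFIG_URL = "http://localhost:8983/solr/mycollection/config"

payload = extract_handler_command("content")
print(payload)
```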

RE: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-19 Thread Allison, Timothy B.
file. Finally, and I mean it this time, I heartily second Erik's point about SolrJ and the need to keep your file processing outside of Solr's JVM, VM and M! -Original Message- From: Erik Hatcher [mailto:erik.hatc...@gmail.com] Sent: Monday, June 19, 2017 6:56 AM To: solr-us

Re: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-19 Thread Erik Hatcher
Ziyuan - You may be interested in the example/files that ships with Solr too. It’s got schema and config and even UI for file indexing and searching. Check out the README.txt under example/files in your Solr install. Erik > On Jun 19, 2017, at 6:52 AM, ZiYuan wrote: > > Hi Erick, >

Re: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-19 Thread ZiYuan
Hi Erick, thanks very much for the explanations! Clarification for question 2: more specifically I cannot see the field content in the returned JSON, with the same definitions as in the post

Re: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-18 Thread Erick Erickson
1> Yes, you can use your single definition. The author identifies the "text" field as a catch-all. Somewhere in the schema there'll be a copyField directive copying (perhaps) many different fields to the "text" field. That permits simple searches against a single field rather than, say, using edism

Re: indexing pdf files using post tool

2016-03-19 Thread Francisco Andrés Fernández
Vidya, I don't know if I'm understanding it very well but, I think that the best way is to parse your text using a routine outside Solr. You might need to map the different parts of your document using your domain knowledge and use such routine to produce an XML document for example, with correspon

Re: indexing pdf files using post tool

2016-03-19 Thread Binoy Dalal
Take a look at the CloneFieldUpdateProcessorFactory here: http://www.solr-start.com/info/update-request-processors/ On Wed, 16 Mar 2016, 18:25 Binoy Dalal, wrote: > Like Francisco said, use a custom update processor to map the fields the > way you want and add it to your update chain. > > On Wed

Re: indexing pdf files using post tool

2016-03-18 Thread Binoy Dalal
Like Francisco said, use a custom update processor to map the fields the way you want and add it to your update chain. On Wed, 16 Mar 2016, 18:16 Francisco Andrés Fernández, wrote: > Vidya, I don't know if I'm understanding it very well but, I think that the > best way is to parse your text usin

Re: indexing pdf files using post tool

2016-03-18 Thread Jan Høydahl
Hi, You can look at the Apache Tika project or the PDFBox project to parse your files before sending to Solr. Alternatively, if your processing is very simple, you can use the built-in Tika as you just did, and then deploy some UpdateRequestProcessors in order to modify the Tika output into whate

Re: indexing pdf files using post tool

2016-03-16 Thread vidya
Sorry for conveying it the wrong way. I want the data of one PDF file to be indexed into different fields of a Solr document according to the data in it, like name; id; title; content etc. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/indexing-pdf-files-using-post-tool-tp42

Re: indexing pdf files using post tool

2016-03-15 Thread roshan agarwal
Yes vidya, you just have to use copy field Roshan On Tue, Mar 15, 2016 at 3:07 PM, vidya wrote: > Hi > I got data into my content field. But I wanted to have different fields to > be > allocated for data in my file. How can I achieve this ? > > > > -- > View this message in context: > http://luce

Re: indexing pdf files using post tool

2016-03-15 Thread Binoy Dalal
You should use copy fields. https://cwiki.apache.org/confluence/display/solr/Copying+Fields On Tue, 15 Mar 2016, 15:07 vidya, wrote: > Hi > I got data into my content field. But I wanted to have different fields to > be > allocated for data in my file. How can I achieve this ? > > > > -- > View th

Re: indexing pdf files using post tool

2016-03-15 Thread vidya
Hi I got data into my content field. But I wanted to have different fields to be allocated for data in my file. How can I achieve this ? -- View this message in context: http://lucene.472066.n3.nabble.com/indexing-pdf-files-using-post-tool-tp4263811p4263840.html Sent from the Solr - User mailing

Re: indexing pdf files using post tool

2016-03-15 Thread Binoy Dalal
Do you have a "content" field defined in your schema? Is it stored? By default, the content from the docs uploaded through post should be mapped to a field called "content". On Tue, 15 Mar 2016, 12:47 vidya, wrote: > Hi > I am trying to index a pdf file by using post tool in my linux system,Whe

Re: Indexing pdf files - question.

2013-09-08 Thread Nutan Shinde
Error got resolved,solution was must be within tag. On Sun, Sep 8, 2013 at 3:31 AM, Furkan KAMACI wrote: > Could you show us logs you get when you start your web container? > > > 2013/9/4 Nutan Shinde > > > My solrconfig.xml is: > > > > > > > > > class="solr.extraction.ExtractingRequestHandl

Re: Indexing pdf files - question.

2013-09-07 Thread Furkan KAMACI
Could you show us logs you get when you start your web container? 2013/9/4 Nutan Shinde > My solrconfig.xml is: > > > > class="solr.extraction.ExtractingRequestHandler" > > > > > descwhich > is defined as shown below in schem.xml--> > > true > > attr_ > > true > > > > > > > > > > Schem

Re: Indexing pdf files - question.

2013-09-04 Thread Nutan Shinde
My solrconfig.xml is: desc true attr_ true Schema.xml: doc_id I have created extract directory and copied all required .jar and solr-cell jar files into this extract directory and given its path in lib tag in solrconfig.xml When I try

Re: Indexing PDF Files

2013-04-24 Thread Jack Krupansky
or these "lib" elements ("INFO org.apache.solr.core.SolrConfig – Adding specified lib dirs to ClassLoader"). -- Jack Krupansky -Original Message- From: Furkan KAMACI Sent: Wednesday, April 24, 2013 6:50 PM To: solr-user@lucene.apache.org Subject: Re: Indexing PD

Re: Indexing PDF Files

2013-04-24 Thread Jan Høydahl
In your schema you have written > class="solr.StrField" /> Note that XML tag and param names are case sensitive, so instead of fieldtype you should use fieldType I see that you have the same error for several fieldTypes in your schema, probably resulting in other similar errors too. -- Jan H
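One way to check a schema for the case problem described above is to look for element names that match `fieldType` only when case is ignored. This is a small sketch, not a Solr tool; the sample schema is made up for illustration, with only `class="solr.StrField"` taken from the thread.

```python
import xml.etree.ElementTree as ET


def wrong_case_tags(schema_xml: str, expected: str = "fieldType") -> list:
    """Return element names that equal `expected` only when case is ignored."""
    root = ET.fromstring(schema_xml)
    return [
        el.tag
        for el in root.iter()
        if el.tag.lower() == expected.lower() and el.tag != expected
    ]


# Hypothetical schema fragment with one wrongly cased element.
sample = """<schema name="example" version="1.5">
  <fieldtype name="string" class="solr.StrField"/>
  <fieldType name="text" class="solr.TextField"/>
</schema>"""

print(wrong_case_tags(sample))
```

An empty list means no wrongly cased elements were found for the given name.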

Re: Indexing PDF Files

2013-04-24 Thread Furkan KAMACI
Hi Alex; What do you mean by wrong case? Could you tell me what I should do? 2013/4/25 Alexandre Rafalovitch > You still seem to have 'fieldtype' with wrong case. Can you try that > simple thing before doing other complicated steps? And yes, restart > Solr after you change schema.xml > > Regar

Re: Indexing PDF Files

2013-04-24 Thread Alexandre Rafalovitch
You still seem to have 'fieldtype' with wrong case. Can you try that simple thing before doing other complicated steps? And yes, restart Solr after you change schema.xml Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time

Re: Indexing PDF Files

2013-04-24 Thread Furkan KAMACI
Here is my definition for handler: text true attr_ true 2013/4/25 Furkan KAMACI > I just want to search on rich documents but I still get same error. I have > copied example folder into anywhere else at my computer. I have copied dist > and contrib folders from my build folder into that

Re: Indexing PDF Files

2013-04-24 Thread Furkan KAMACI
I just want to search rich documents, but I still get the same error. I have copied the example folder to another location on my computer. I have copied the dist and contrib folders from my build folder into that copy of the example folder (because solr-cell etc. are within those folders). However I still get same

Re: Indexing PDF Files

2013-04-24 Thread Erik Hatcher
Did you restart after adding those fields and types? On Apr 24, 2013, at 16:59, Furkan KAMACI wrote: > I have added that fields: > > > stored="true" multiValued="true"/> > > > and I have that definition: > > class="solr.StrField" /> > > here is my error: > > > > > 400 > 4154 > > >

Re: Indexing PDF Files

2013-04-24 Thread Alexandre Rafalovitch
Wrong case for ? Though I would have thought Solr would complain about that when it hits dynamicField with unknown type. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events fro

Re: Indexing PDF Files

2013-04-24 Thread Furkan KAMACI
I have added that fields: and I have that definition: here is my error: 400 4154 ERROR: [doc=1] unknown field 'ignored_meta' 400 What should I do more? 2013/4/24 Erik Hatcher > Also, at Solr startup time it logs what it loads from those > elements, so you can see whether it is

Re: Indexing PDF Files

2013-04-24 Thread Erik Hatcher
Also, at Solr startup time it logs what it loads from those elements, so you can see whether it is loading the files you intend to or not. Erik On Apr 24, 2013, at 10:05 , Alexandre Rafalovitch wrote: > Have you tried using absolute path to the relevant urls? That will > cleanly split

Re: Indexing PDF Files

2013-04-24 Thread Alexandre Rafalovitch
Have you tried using absolute path to the relevant urls? That will cleanly split the problem into 'still not working' and 'wrong relative path'. Regards, Alex. On Wed, Apr 24, 2013 at 9:02 AM, Furkan KAMACI wrote: > > Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.li

Re: Indexing PDF-Files using Solr Cell

2012-09-17 Thread Jack Krupansky
ber 17, 2012 1:12 AM To: solr-user@lucene.apache.org Subject: Re: Indexing PDF-Files using Solr Cell Thank you for your response. I'm writing my Bachelor-Thesis about Solr and my company doesn't want me to use a beta-version. I dont want to be annoying, but "how" do i direct the

Re: Indexing PDF-Files using Solr Cell

2012-09-16 Thread Alexander Troost
ng. > > Again, this is all simplified in Solr 4.0-BETA. > > > -- Jack Krupansky > > -Original Message- From: Alexander Troost > Sent: Sunday, September 16, 2012 11:59 PM > To: solr-user@lucene.apache.org > Subject: Re: Indexing PDF-Files using Solr Cell > >

Re: Indexing PDF-Files using Solr Cell

2012-09-16 Thread Jack Krupansky
n Solr 4.0-BETA. -- Jack Krupansky -Original Message- From: Alexander Troost Sent: Sunday, September 16, 2012 11:59 PM To: solr-user@lucene.apache.org Subject: Re: Indexing PDF-Files using Solr Cell Hi, first of all: Thank you for that quick response! But i am not sure if i am doing this r

Re: Indexing PDF-Files using Solr Cell

2012-09-16 Thread Alexander Troost
Hi, first of all: Thank you for that quick response! But I am not sure if I am doing this right. From my point of view the command now has to look like: curl "http://localhost:8983/solr/update/extract?literal.id=doc11&literal.filename=markus&fmap.content=text&commit=true" -F "myfile=@markus.pdf

Re: Indexing PDF-Files using Solr Cell

2012-09-16 Thread Jack Krupansky
The content will be sent to the "content" field, which you can redirect using the &fmap.content=some-field request parameter. You need to explicitly set the file name field yourself, using the &literal.your-file-name-field=file-name request parameter. Also, if using Solr 4.0-BETA, you can simp
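The two request parameters mentioned above can be combined when building the extract URL. The helper below is an illustrative sketch: the function name and its arguments are assumptions, while `fmap.content`, `literal.*`, `commit` and the `/update/extract` path come from the thread. It only builds the URL and does not upload a file.

```python
from urllib.parse import urlencode


def extract_url(base, doc_id, content_field, literals=None, commit=True):
    """Build a Solr Cell URL that redirects content and sets literal fields."""
    params = {"literal.id": doc_id, "fmap.content": content_field}
    # Each extra literal becomes a literal.<field>=<value> parameter.
    for name, value in (literals or {}).items():
        params[f"literal.{name}"] = value
    if commit:
        params["commit"] = "true"
    return f"{base}/update/extract?{urlencode(params)}"


url = extract_url(
    "http://localhost:8983/solr", "doc11", "text", {"filename": "markus.pdf"}
)
print(url)
```

The file itself would still be sent as the request body, for example with curl's `-F` option as in the other messages of this thread.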

Re: Indexing pdf files - question.

2011-04-08 Thread Mike
Hi Erick, Thank you for the Reply. Now I am able to index the PDF files and search. I am left with couple of questions: 1. Can I add custom field to Search Response XML (Ex: Need to as description which gives brief description about the PDF file). 2. Currently Solr runs as a separate applicatio

Re: Indexing pdf files - question.

2011-04-07 Thread Erick Erickson
Did you try the curl commands that Adam suggested as part of this e-mail thread? If so, what happened? Best Erick On Wed, Apr 6, 2011 at 7:50 AM, Mike wrote: > Hi All, > > I am new to solr. I have gone through solr documents to index pdf files, > But > it was hard to find the exact procedure to

Re: Indexing pdf files - question.

2011-04-07 Thread Mike
Hi All, I am new to Solr. I have gone through the Solr documents on indexing PDF files, but it was hard to find the exact procedure to get started. I need a step-by-step procedure to do this. Could you please let me know the steps to index PDF files. Thanks, Mike -- View this message in context: http://

Re: Indexing pdf files - question.

2010-12-13 Thread Wodek Siebor
The sample /docs/tutorial.pdf does not require OCR. -- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-pdf-files-question-tp2079505p2080307.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Indexing pdf files - question.

2010-12-13 Thread Adam Estrada
Hi, I use the following commands to post PDF files. $ curl "http://localhost:8983/solr/update/extract?stream.file=C:\temp\document.docx&stream.contentType=application/msword&literal.id=esc.doc&commit=true" $ curl "http://localhost:8983/solr/update/extract?stream.file=C:\temp\features.pdf&stream