subject:"PDF indexing"

Re: Regarding pdf indexing issue

2018-07-11 Thread Terry Steichen

Walter, Well said. (And I love the hamburger conversion analogy - very apt.) The only thing I will add is that when you have a collection of similar rich text documents, you might be able to construct queries to respect internal structures within the documents. If all/most of your documents hav

Re: Regarding pdf indexing issue

2018-07-11 Thread Shamik Sinha

You could try the Tesseract tool to check data extraction from PDFs or images and then proceed accordingly. As far as I understand, the PDF is an image and not data. A searchable PDF overlays the selectable text as hidden text over the PDF image. These PDFs can be indexed and extracted

Re: Regarding pdf indexing issue

2018-07-11 Thread Walter Underwood

PDF is not a structured document format. It is a printer control format. PDF does not have a paragraph marker. Instead, it says to move to this spot on the page, choose this font, and print this letter. For a paragraph, it moves farther. For the next letter in a word, it moves a little bit. Extra
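
Walter's description can be illustrated with a small sketch. It is not how Tika or PDFBox work internally; it only shows the kind of guesswork an extractor faces when all it has is positioned letters. The `Glyph` tuple, the coordinates, and the `word_gap` threshold are hypothetical.

```python
from typing import List, NamedTuple


class Glyph(NamedTuple):
    x: float      # horizontal position where the letter is printed
    width: float  # advance width of the letter
    char: str


def glyphs_to_words(glyphs: List[Glyph], word_gap: float = 2.0) -> List[str]:
    """Guess word boundaries from the horizontal gaps between glyphs.

    A gap larger than `word_gap` (an assumed threshold) starts a new word.
    """
    words: List[str] = []
    current = ""
    prev_end = None
    for g in sorted(glyphs, key=lambda g: g.x):
        if prev_end is not None and g.x - prev_end > word_gap:
            words.append(current)
            current = ""
        current += g.char
        prev_end = g.x + g.width
    if current:
        words.append(current)
    return words


# Five letters on one line; the larger jump before "i" is read as a space.
line = [
    Glyph(0, 5, "P"), Glyph(5, 5, "D"), Glyph(10, 5, "F"),
    Glyph(20, 5, "i"), Glyph(25, 5, "s"),
]
print(glyphs_to_words(line))
```

Paragraph and heading detection would need the same kind of heuristic applied to vertical movement and font changes, which is why the results vary so much between documents.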

Re: Regarding pdf indexing issue

2018-07-11 Thread Erick Erickson

Solr will not do this automatically; the Extracting Request Handler simply indexes the entire contents of the doc without regard to things like paragraphs. Ditto with HTML. This is actually a task that requires getting into Tika and using all the bells and whistles there. I'd recommend two thi

Regarding pdf indexing issue

2018-07-11 Thread Rahul Prasad Dwivedi

Hello Team, I am using Solr to index and search PDF documents. I have gone through the documentation on your website and installed Solr, but I am unable to index and search the documents. For example: suppose we have a PDF file that has a number of paragraphs, each with a separate heading. So if I search for t

Apache Solr - Pdf Indexing.

2014-04-29 Thread vignesh

Hi Team, I am indexing PDFs using Apache Solr 3.6. By passing around 3000 keywords with the OR operator (gardens OR flowers OR time OR train OR trees OR etc.) I am able to get the files containing these keywords. But not every PDF file will contain all the keywords; some may contai

Re: Apache Solr - Pdf Indexing.

2014-04-29 Thread Gora Mohanty

On Apr 29, 2014 2:52 PM, "vignesh" wrote: > > Hi Team, > > > > I am indexing PDF using Apache Solr 3.6 . Passing around 3000 keywords using the OR operator and able to get the files containing the keywords. Kindly guide me to get the keyword list in a .PDF file. What do you mean? Do

Re: Apache Solr - Pdf Indexing.

2014-04-29 Thread Alexandre Rafalovitch

Your question is not terribly clear. Are you having trouble indexing PDFs in general? Try the tutorial and specifically look for the extract handler. Or have you already got the PDF into the system but your 3000-keyword query does not match it? In which case it might be just that PDF extraction is limited by d

Apache Solr - Pdf Indexing.

2014-04-29 Thread vignesh

Hi Team, I am indexing PDFs using Apache Solr 3.6. By passing around 3000 keywords with the OR operator I am able to get the files containing the keywords. Kindly guide me on how to get the keyword list in a PDF file. Note: In schema.xml I have declared a unique tag "id". Than

Re: PDF Indexing

2014-04-02 Thread Jack Krupansky

: Wednesday, April 2, 2014 3:35 PM To: solr-user@lucene.apache.org Subject: Re: PDF Indexing Hi Sujatha, There is no built in mechanism. Prepare page documents outside of the solr. http://searchhub.org/2012/02/14/indexing-with-solrj/ And you may want to save text content somewhere too. If you change

Re: PDF Indexing

2014-04-02 Thread Ahmet Arslan

Hi Sujatha, There is no built-in mechanism. Prepare page documents outside of Solr. http://searchhub.org/2012/02/14/indexing-with-solrj/ And you may want to save the text content somewhere too. If you change something in index analysis/schema you need to reindex. If you save text data, you can

PDF Indexing

2014-04-02 Thread Sujatha Arun

Hi, I am able to use Tika and DIH to index a PDF as a single document. However, I need each page to be a single document. Is there any built-in mechanism to achieve this, or do I have to use PDFBox or some other tool? Regards
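
A minimal sketch of the "prepare page documents outside of Solr" idea from the replies in this thread, assuming the per-page text has already been extracted by some other tool (PDFBox, for example). The field names `id`, `page`, and `content`, and the `<file>_p<N>` id scheme, are illustrative assumptions, not something the thread prescribes.

```python
import json
from typing import Dict, List


def pages_to_docs(file_name: str, pages: List[str]) -> List[Dict[str, object]]:
    """Turn a list of per-page texts into one Solr-style document per page."""
    docs = []
    for number, text in enumerate(pages, start=1):
        docs.append({
            "id": f"{file_name}_p{number}",  # hypothetical unique key scheme
            "page": number,                  # lets a hit be traced back to its page
            "content": text.strip(),
        })
    return docs


pages = ["First page text.\n", "Second page text.\n"]
docs = pages_to_docs("report.pdf", pages)
print(json.dumps(docs, indent=2))  # a JSON array of per-page documents
```

Keeping the extracted page text around, as suggested in the replies, means a schema change only requires re-running this step and reindexing, not re-extracting the PDFs.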

Re: PDF indexing issues

2013-11-18 Thread Marcello Lorenzi

You should check the Apache PDFBox project. A similar question: https://issues.apache.org/jira/browse/PDFBOX-940 2013/11/15 Marcello Lorenzi Hi, during you testing of Apache SOLR 4.3, we have noticed some errors occurred for PDF indexing: ERROR - 2013-11-15 15:14:2

Re: PDF indexing issues

2013-11-15 Thread Furkan KAMACI

You should check the Apache PDFBox project. A similar question: https://issues.apache.org/jira/browse/PDFBOX-940 2013/11/15 Marcello Lorenzi > Hi, > during you testing of Apache SOLR 4.3, we have noticed some errors > occurred for PDF indexing: > > ERROR - 2013-11-

PDF indexing issues

2013-11-15 Thread Marcello Lorenzi

Hi, during our testing of Apache SOLR 4.3, we noticed some errors during PDF indexing: ERROR - 2013-11-15 15:14:26.248; org.apache.pdfbox.pdmodel.font.PDCIDFont; Error: Could not parse predefined CMAP file for 'PDFXC30-Indentity0-UCS2' ERROR - 2013-11-15 15

Re: Document is missing mandatory uniqueKey field: id for Solr PDF indexing

2013-04-27 Thread Furkan KAMACI

solr/ExtractingRequestHandler >> >> Again, DO NOT MIX the instructions from the two. >> >> post.jar is designed so that you do not need to know or care exactly how >> rich document indexing works. >> >> -- Jack Krupansky >> >> -Original Message

Re: Document is missing mandatory uniqueKey field: id for Solr PDF indexing

2013-04-26 Thread Furkan KAMACI

nsky > > -Original Message- From: Furkan KAMACI > Sent: Friday, April 26, 2013 5:30 AM > To: solr-user@lucene.apache.org > Subject: Document is missing mandatory uniqueKey field: id for Solr PDF > indexing > > > I use Solr 4.2.1 and these are my fiel

Re: Document is missing mandatory uniqueKey field: id for Solr PDF indexing

2013-04-26 Thread Jack Krupansky

Krupansky -Original Message- From: Furkan KAMACI Sent: Friday, April 26, 2013 5:30 AM To: solr-user@lucene.apache.org Subject: Document is missing mandatory uniqueKey field: id for Solr PDF indexing I use Solr 4.2.1 and these are my fields: I run th

Re: Document is missing mandatory uniqueKey field: id for Solr PDF indexing

2013-04-26 Thread Furkan KAMACI

I think that I should start a new thread for my question to help people who search for the same situation. 2013/4/26 Furkan KAMACI > If you can help me it would be nice. I get that error: > > SimplePostTool version 1.5 > Posting files to base url http://localhost:8983/solr/update/extract.. > Enter

Re: Document is missing mandatory uniqueKey field: id for Solr PDF indexing

2013-04-26 Thread Furkan KAMACI

If you can help me it would be nice. I get that error: SimplePostTool version 1.5 Posting files to base url http://localhost:8983/solr/update/extract.. Entering auto mode. File endings considered are xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log POSTing f

Re: Document is missing mandatory uniqueKey field: id for Solr PDF indexing

2013-04-26 Thread Jan Høydahl

http://wiki.apache.org/solr/post.jar -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com 26. apr. 2013 kl. 13:28 skrev Furkan KAMACI : > Hi Raymond; > > Now I get that error: SimplePostTool: WARNING: IOException while reading > respons

Re: Document is missing mandatory uniqueKey field: id for Solr PDF indexing

2013-04-26 Thread Furkan KAMACI

Hi Raymond; Now I get that error: SimplePostTool: WARNING: IOException while reading response: java.io.FileNotFoundException: 2013/4/26 Raymond Wiker > You could start by doing > > java post.jar -help > > --- the 7th example shows exactly what you need to do to add a document id. > > On Fri, Ap

Re: Document is missing mandatory uniqueKey field: id for Solr PDF indexing

2013-04-26 Thread Raymond Wiker

You could start by doing java -jar post.jar -help --- the 7th example shows exactly what you need to do to add a document id. On Fri, Apr 26, 2013 at 11:30 AM, Furkan KAMACI wrote: > I use Solr 4.2.1 and these are my fields: > > multiValued="false" /> > > > > > multiValued="true"/> > > stored=

Document is missing mandatory uniqueKey field: id for Solr PDF indexing

2013-04-26 Thread Furkan KAMACI

I use Solr 4.2.1 and these are my fields: I run that command: java -Durl=http://localhost:8983/solr/update/extract -jar post.jar 523387.pdf However I get that error, any ideas? Apr 26, 2013 12:26:51 PM org.apache.solr.common.SolrException log SEVERE: org.apache
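
The error in the subject line means that no value reached the `id` field. One way to supply one with the extract handler is the `literal.id` request parameter. The sketch below only builds such a URL; the host and port come from the command above, and using the file name without its extension as the id is an assumption made for illustration.

```python
from pathlib import Path
from urllib.parse import urlencode


def extract_url(base_url: str, pdf_path: str) -> str:
    """Build an /update/extract URL that sets the uniqueKey via literal.id."""
    params = {
        "literal.id": Path(pdf_path).stem,  # assumed id: file name without extension
        "commit": "true",
    }
    return f"{base_url}?{urlencode(params)}"


print(extract_url("http://localhost:8983/solr/update/extract", "523387.pdf"))
```

The PDF would then be POSTed to the resulting URL; any scheme that yields a stable, unique id per file would serve equally well.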

Re: PDF indexing

2012-05-08 Thread Lance Norskog

ctingRequestHandler >> >> -- Jack Krupansky >> >> -Original Message- From: Tolga Sent: Monday, May 07, 2012 3:24 PM >> To: solr-user@lucene.apache.org Subject: PDF indexing >> Hi, >> >> From what I have read, I think I have to use Tika (?) to ind

Re: PDF indexing

2012-05-07 Thread Tolga

On 05/07/2012 10:35 PM, Jack Krupansky wrote: Try SolrCell (ExtractingRequestHandler). See: http://wiki.apache.org/solr/ExtractingRequestHandler -- Jack Krupansky -Original Message- From: Tolga Sent: Monday, May 07, 2012 3:24 PM To: solr-user@lucene.apache.org Subject: PDF indexing

Re: PDF indexing

2012-05-07 Thread Jack Krupansky

Try SolrCell (ExtractingRequestHandler). See: http://wiki.apache.org/solr/ExtractingRequestHandler -- Jack Krupansky -Original Message- From: Tolga Sent: Monday, May 07, 2012 3:24 PM To: solr-user@lucene.apache.org Subject: PDF indexing Hi, From what I have read, I think I have

PDF indexing

2012-05-07 Thread Tolga

Hi, From what I have read, I think I have to use Tika (?) to index PDF, xls, doc, etc files. How do I start? Do I use mvn clean install in the source directory to get all the jar files to begin? Centos doesn't provide mvn, how do I build Tika after getting it from http://maven.apache.org ?

PDF indexing

2011-09-29 Thread Jón Helgi Jónsson

Good day, I'm checking if Solr would work for indexing PDFs. My requirements are: 1) I must know which page has what contents. 2) Right-to-left search support, such as for Hebrew. This has been the trickiest to achieve. I would also prefer to know the position of the searched contents on the page but

Re: response time for pdf indexing

2011-06-23 Thread simon

How long are the documents ? indexing a large document can be slow (although 2 seconds is very slow indeed). 2011/6/22 Rode González (libnova) : > Hi ! > > > > We are using Zend Search based on Lucene. Our indexing pdf consultations > take longer than 2 seconds. > > We want to change to solr to tr

RE: response time for pdf indexing

2011-06-22 Thread Steven A Rowe

o Iglesias; Leo; Marcos; Mario Crespo > (Silvereme); 'Rode' > Subject: response time for pdf indexing > > Hi ! > > > > We are using Zend Search based on Lucene. Our indexing pdf consultations > take longer than 2 seconds. > > > > We want to chan

response time for pdf indexing

2011-06-22 Thread libnova

Hi! We are using Zend Search, which is based on Lucene. Our queries against indexed PDFs take longer than 2 seconds. We want to switch to Solr to try to solve this problem. i. Can anyone tell me the response time for queries on PDF documents in Solr? ii. Can anyone tell me some strategies to reduce

32 matches
