mmit(solr, "prindex");
return true;
-----Original Message-----
From: Erick Erickson
Sent: Wednesday, 31 October 2018 06:00
To: solr-user
Subject: Re: Indexing PDF file in Apache SOLR via Apache TIKA
All of the above work, but for robust production situations you'll wa
> > > let me introduce myself. My name is Mohammad Kevin Putra (you
> > > can call me Kevin), from Indonesia, I am a beginner backend developer, I
> > > use Linux Mint, I use Apache SOLR 7.5.0 and Apache TIKA 1.91.0.
> > >
> > > I have a little bit problem
0, 2018 at 3:40 PM adiyaksa kevin
> wrote:
>
> > Hello there, let me introduce myself. My name is Mohammad Kevin Putra (you
> > can call me Kevin), from Indonesia, I am a beginner backend developer, I
> > use Linux Mint, I use Apache SOLR 7.5.0 and Apache TIKA 1.91.0.
> can call me Kevin), from Indonesia, I am a beginner backend developer, I
> use Linux Mint, I use Apache SOLR 7.5.0 and Apache TIKA 1.91.0.
>
> I have a small problem with how to index a PDF file via Apache TIKA. I
> understand how SOLR or TIKA works, but I don't know how th
Hello there, let me introduce myself. My name is Mohammad Kevin Putra (you
can call me Kevin), from Indonesia, I am a beginner backend developer, I
use Linux Mint, I use Apache SOLR 7.5.0 and Apache TIKA 1.91.0.
I have a small problem with how to index a PDF file via Apache TIKA. I
:8983/solr/techproducts/update/extract?literal.id=doc1&commit=true' -F "myfile=@example/exampledocs/solr-word.pdf"
On Wed, Jun 14, 2017 at 1:30 PM, Vasiliy Boldyrev <
vasiliy.boldy...@gmail.com> wrote:
> Hello,
>
> I used Apache Solr™ version 6.6.0 but can't upload
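The curl command in the message above posts a file to the Solr Cell handler with `-F`, which sends it as a multipart form field. A minimal sketch of the same request in Python, using only the standard library, might look like the following. The host `http://localhost:8983` is an assumption (the original command is cut off before the port), the helper name `build_extract_request` is hypothetical, and the placeholder bytes stand in for a real PDF. The request is only built here, not sent.

```python
import uuid
import urllib.request
from urllib.parse import urlencode


def build_extract_request(base_url, core, doc_id, field, filename, data):
    """Build (but do not send) a multipart POST to /update/extract."""
    # literal.id and commit=true are the two parameters used in the curl command.
    query = urlencode({"literal.id": doc_id, "commit": "true"})
    url = f"{base_url.rstrip('/')}/solr/{core}/update/extract?{query}"

    # Hand-rolled multipart/form-data body with a single file part,
    # which is what curl's -F "myfile=@..." produces.
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")

    return urllib.request.Request(
        url,
        data=head + data + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# Assumed host and placeholder file content, for illustration only.
req = build_extract_request(
    "http://localhost:8983", "techproducts", "doc1",
    "myfile", "solr-word.pdf", b"%PDF-1.4 placeholder",
)
print(req.full_url)
```

Against a running Solr instance, the request could then be sent with `urllib.request.urlopen(req)`.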
Hello,
I use Apache Solr™ version 6.6.0 but can't upload a PDF file to a core.
The instructions and example were taken from
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika
Added to solrconfig.xml additional paths to /dist/ and /contrib/extractio
Thanks for the replies!
----- Original Message -----
From: "Upayavira"
To: solr-user@lucene.apache.org
Sent: Tuesday, 5 February 2013 9:05:58
Subject: Re: Indexing several parts of PDF file
This would involve you querying against every page in your document,
which will be too many
Yes, I think the same. Better to index each page as a document.
On Tue, Feb 5, 2013 at 7:35 PM, Upayavira wrote:
> This would involve you querying against every page in your document,
> which will be too many fields and will break quickly.
>
> The best way to do it is to index pages as documents
This would involve you querying against every page in your document,
which will be too many fields and will break quickly.
The best way to do it is to index pages as documents. You can use field
collapsing to group pages from the same document together.
Upayavira
On Tue, Feb 5, 2013, at 02:00 PM
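The advice above (index each page as its own document, then use field collapsing to group pages of the same PDF) can be sketched roughly as follows. The field names `doc_id`, `page` and `text` and both helper functions are hypothetical; `group` and `group.field` are Solr's result grouping parameters. How the per-page text is obtained is outside this sketch.

```python
from urllib.parse import urlencode


def page_documents(doc_id, pages):
    """Turn a list of per-page texts into one Solr document per page."""
    return [
        # The shared doc_id field is what the grouped query collapses on.
        {"id": f"{doc_id}_p{n}", "doc_id": doc_id, "page": n, "text": text}
        for n, text in enumerate(pages, start=1)
    ]


def grouped_query(q):
    """Query string that groups matching pages by their parent document."""
    return urlencode({"q": q, "group": "true", "group.field": "doc_id"})


docs = page_documents("manual", ["intro text", "install steps"])
print(docs[1]["id"])
print(grouped_query("text:install"))
```

Each hit then identifies both the parent document (`doc_id`) and the matching page (`page`), which is the requirement described in the original question.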
Hi:
I'm working on a search engine for several PDF documents. Right now one of the
requirements is that we provide not only the documents matching the search
criteria but also the pages that match the criteria. Normally Tika only extracts the
text content and does not make this distinction, but usi
I'd also suggest trying to extract the text using tika-app (shipped with the Tika
distribution as an executable jar) on the PDF(s) in question to see if the problem
is with extraction or with indexing.
Rav
On Mon, Apr 2, 2012 at 1:55 PM, Erick Erickson wrote:
> You can index 2B tokens, so upping maxFieldLength
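The suggestion above, running tika-app directly on the PDF to separate extraction problems from indexing problems, could be scripted along these lines. The jar name `tika-app.jar` and the file name `problem.pdf` are placeholders (real tika-app jars carry a version number), and the helper names are hypothetical; `--text` is the tika-app option for plain-text output.

```python
import subprocess
from pathlib import Path


def tika_text_command(jar, pdf):
    """Command line that asks tika-app for the plain-text content of a file."""
    return ["java", "-jar", str(jar), "--text", str(pdf)]


def extract_text(jar, pdf):
    """Run tika-app and return the extracted text (requires Java and the jar)."""
    result = subprocess.run(
        tika_text_command(jar, pdf),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Placeholder paths; only the command is built here, nothing is executed.
cmd = tika_text_command(Path("tika-app.jar"), Path("problem.pdf"))
print(" ".join(cmd))
```

If the text returned by `extract_text` is already truncated or empty, the problem lies with extraction rather than with Solr.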
You can index 2B tokens, so upping maxFieldLength should have
fixed your problem at least as far as Solr is concerned. How
many tokens get indexed? I'm not as familiar with Tika, but
there may be some kind of parameter there (although I
don't remember this coming up before)...
Did you restart Solr
Hello Guys,
I am using Apache Solr 3.3.0 with Tika 1.0.
I have PDF files which I am pushing into Solr for content searching. Apache
Solr is indexing the PDF files and I can see them in the Solr admin interface
for search. But the issue is that Solr is not indexing the whole file content.
It is index
Howdy Folks,
I'm stumped and hope somebody can give me some clues on how to work around
this occasional error I'm getting.
I've got a .Net console program using SolrNet to scour certain folders at
certain times and extract text from PDF files and index them. It succeeds on
a majority of the fi
Hello,
I'm having trouble adding a pdf file to my index. It's multicored. My server
object instantiates properly (StreamingUpdateSolrServer). In my request object
(ContentStreamUpdateRequest) I add a couple of literals to populate fields in
the index that the parsed content of the
I changed data-config-sql.xml to
There are no errors, but the indexed PDF is converted to numbers:
200 1 202 1 203 1 212 1 222 1 236 1 242 1 244 1 254 1 255
--
Best Regards,
Roy Liu
On Mon, Apr 11, 2011 at 2:02 PM, Roy Liu wrote:
>
Hi,
I have copied
\apache-solr-3.1.0\dist\apache-solr-dataimporthandler-extras-3.1.0.jar
into \apache-tomcat-6.0.32\webapps\solr\WEB-INF\lib\
Other Errors:
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Unclosed
quotation mark after the character string 'B@3e574'.
--
Best Regards,
Hi, all
Thank you very much for your kind help.
*1. I have upgraded from Solr 1.4 to Solr 3.1*
*2. Changed data-config-sql.xml*
*3. solrconfig.xml and schema.xml are NOT changed.*
However, when I
You have to upgrade completely to the Apache Solr 3.1 release. It is
worth the effort. You cannot copy any jars between Solr releases.
Also, you cannot copy over jars from newer Tika releases.
On Fri, Apr 8, 2011 at 10:47 AM, Darx Oman wrote:
> Hi again
> what you are missing is field mapping
>
Hi again
what you are missing is field mapping
no need for TikaEntityProcessor since you are not accessing pdf files
Hi there
TikaEntityProcessor is available as part of DIH-extras*.jar in 3.x and 4.0
Thanks Lance,
I'm using Solr 1.4.
If I want to use TikaEP, do I need to upgrade to Solr 3.1 or import jar files?
Best Regards,
Roy Liu
On Fri, Apr 8, 2011 at 10:22 AM, Lance Norskog wrote:
> You need the TikaEntityProcessor to unpack the PDF image. You are
> sticking binary blobs into the index.
You need the TikaEntityProcessor to unpack the PDF image. You are
sticking binary blobs into the index. Tika unpacks the text out of the
file.
TikaEP is not in Solr 1.4, but it is in the new Solr 3.1 release.
On Thu, Apr 7, 2011 at 7:14 PM, Roy Liu wrote:
> Hi,
>
> I have a table named *attachme
Hi,
I have a table named *attachment* in MS SQL Server 2008.
COLUMN      TYPE
----------  ------------
id          int
title       varchar(200)
attachment  image
I need to index the attachment column (which stores PDF files) from the database via
DIH.
After accessing this URL, it returns "Ind
Check your libraries for Tika-related jar files. Tika-related files must be on
the classpath of Solr.
-
Grijesh
--
View this message in context:
http://lucene.472066.n3.nabble.com/Internal-Server-Error-when-indexing-a-pdf-file-tp2214617p2226374.html
Sent from the Solr - User mailing list archive
Hi,
I was trying to use Solr Cell (through the Java API) to index a pdf file.
The class has been extracted from
http://wiki.apache.org/solr/ContentStreamUpdateRequestExample
public class Solr {
public static void main(String[] args) {
try {
String solrId = "beautiful_st
: Subject: PDF file
: References: <20100729152139.321c4...@ibis>
:
: In-Reply-To:
http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists
When starting a new discussion on a mailing list, please do not reply to
an existing message, instead start a fresh
ur help!
Thanks,
-----Original Message-----
From: Ma, Xiaohui (NIH/NLM/LHC) [C]
Sent: Wednesday, August 11, 2010 10:36 AM
To: solr-user@lucene.apache.org
Cc: 'jayendra.patil@gmail.com'
Subject: RE: PDF file
Thanks so much for your help! I got "Remote Streaming is disabled" error.
rg
Subject: Re: PDF file
Try ...
curl "
http://lhcinternal.nlm.nih.gov:8989/solr/lhc/update/extract?stream.file=
/pub2009001.pdf&literal.id=777045&commit=true"
stream.file - specify full path
literal. - specify any extra params if needed
Regards,
Jayendra
On Tue, Aug 10, 2010
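The `stream.file` approach in the reply above tells Solr to read a file from the server's own filesystem instead of receiving an upload. A minimal sketch of building such a URL, with the path properly URL-encoded, is shown below. The endpoint, file path and helper name are assumptions for illustration. The "Remote Streaming is disabled" error reported earlier in this thread usually indicates that remote streaming is switched off in solrconfig.xml (the `enableRemoteStreaming` attribute on `requestParsers`), so this URL only works once that is enabled.

```python
from urllib.parse import urlencode


def stream_file_url(extract_endpoint, file_path, literals, commit=True):
    """Build an /update/extract URL that points Solr at a server-side file."""
    params = {"stream.file": file_path}
    # Each literal.<field> parameter sets a field value on the indexed document.
    for field, value in literals.items():
        params[f"literal.{field}"] = value
    params["commit"] = "true" if commit else "false"
    # urlencode escapes slashes and spaces in the path.
    return f"{extract_endpoint}?{urlencode(params)}"


# Hypothetical endpoint and path; the id value is the one from the thread.
url = stream_file_url(
    "http://localhost:8983/solr/mycore/update/extract",
    "/data/pdfs/report 2009.pdf",
    {"id": "777045"},
)
print(url)
```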
i (NIH/NLM/LHC) [C] <
xiao...@mail.nlm.nih.gov> wrote:
> Thanks so much for your help! I tried to index a pdf file and got the
> following. The command I used is
>
> curl '
> http://lhcinternal.nlm.nih.gov:8989/solr/lhc/update/extract?map.content=text&map.stream_name=id&
Thanks so much for your help! I tried to index a pdf file and got the
following. The command I used is
curl
'http://lhcinternal.nlm.nih.gov:8989/solr/lhc/update/extract?map.content=text&map.stream_name=id&commit=true'
-F "fi...@pub2009001.pdf"
Did I do somet
, Xiaohui (NIH/NLM/LHC) [C] [xiao...@mail.nlm.nih.gov]
Sent: Tuesday, August 10, 2010 11:57 AM
To: 'solr-user@lucene.apache.org'
Subject: RE: PDF file
Does anyone have any experience with PDF files? I would really appreciate your help!
Thanks so much in advance.
-----Original Message-----
From: M
Does anyone have any experience with PDF files? I would really appreciate your help!
Thanks so much in advance.
-----Original Message-----
From: Ma, Xiaohui (NIH/NLM/LHC) [C]
Sent: Tuesday, August 10, 2010 10:37 AM
To: 'solr-user@lucene.apache.org'
Subject: PDF file
I have a lot of pdf f
I have a lot of PDF files. I am trying to import them into Solr and index
them. I added ExtractingRequestHandler to solrconfig.xml.
Please tell me if I need to download some jar files.
The Solr 1.4 Enterprise Search Server book uses the following command to import
mccm.pdf.
curl
'http://loc
tingDocumentLoader.load(ExtractingDocumentLoader.java:158)
>> at
>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
>> at
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>> at
>> org.
solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at
>
> etc etc...
>
> --
> View this message in context:
> http://old.nabble.com/Posting-pdf-file-and-posting-from-remote-tp27512455p27512952.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
--
Lance Norskog
goks...@gmail.com
e.java:54)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at
etc etc...
--
View this message in context:
http://old.nabble.com/Posting
/old.nabble.com/Posting-pdf-file-and-posting-from-remote-tp27512455p27512455.html
Sent from the Solr - User mailing list archive at Nabble.com.
. The data for these
text fields comes from multiple PDF files. As I am currently supporting 4
locales, I will have a different PDF file for each locale. In addition I
have a number of other fields that are used by the application. Solr will be
returning a reference used by the application to