q=doc_content? Try q=id:""
Solr Cell and DIH are comparable (in that they are about getting content into
Solr) but "unrelated" to TVRH. TVRH is about inspecting indexed content,
regardless of how it got in.
Erik
> On May 1, 2019, at 3:14 PM, Geoffrey Will
I am using Solr in a web app to extract text from .pdf and .docx files. I was
wondering if I can access the TermFreq and TermPosition vectors via the HTTP
interface exposed by Solr Cell. I'm posting/getting documents fine, and I've
enabled term vectors (TV, TFV, etc.) in the managed schema:
http://localhost
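For reference, term vectors are served by the TermVectorComponent (the TVRH handler Erik mentions above), not by Solr Cell itself. A minimal SolrJ sketch, assuming a core at http://localhost:8983/solr/docs and a "content" field declared with termVectors="true" and termPositions="true" (the core and field names are illustrative):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TermVectorExample {
    public static void main(String[] args) throws Exception {
        HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/docs").build();
        SolrQuery q = new SolrQuery("id:doc1");
        q.setRequestHandler("/tvrh");     // route the request to the TermVectorComponent
        q.set("tv.fl", "content");        // field(s) to return vectors for
        q.set("tv.tf", true);             // include term frequencies
        q.set("tv.positions", true);      // include term positions
        QueryResponse rsp = solr.query(q);
        // Vector data comes back in the "termVectors" section of the response
        System.out.println(rsp.getResponse().get("termVectors"));
        solr.close();
    }
}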
Several things:
1> Please don't use add-unknown…. It's fine for prototyping, but it guesses field
definitions.
2> The solrconfig appears to be malformed; I'm surprised it fires up at all.
This never terminates, for instance:
able to point her in the right direction
more quickly than I can.
Here is her original inquiry:
I am pulling data from a local drive for indexing. I am using Solr Cell and
Tika in schemaless mode. I am attempting to rewrite certain field information
prior to indexing using
The tika.config param is documented here:
https://lucene.apache.org/solr/guide/7_5/uploading-data-with-solr-cell-using-apache-tika.html#configuring-the-solr-extractingrequesthandler
I notice that the code
(https://github.com/apache/lucene-solr/blob/964cc88cee7d62edf03a923e3217809d630af5d5/solr
Robertson, Eric J wrote:
Hello all,
Currently trying to define a Tika config to use when posting a PDF to Solr Cell,
as we may want to override the default Tika configuration depending on the type of
document being ingested.
In the docs it lists tika.config as an input parameter to the Solr Cell
endpoint. Though in my
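If it turns out the handler only reads tika.config once at init time (which is what the code pointer above hints at), a fallback in the spirit of this list's usual advice is to run Tika in your own process with a per-type TikaConfig. A rough sketch, assuming Tika on the classpath; the config and input filenames are illustrative:

import org.apache.tika.config.TikaConfig;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;
import java.io.File;
import java.io.InputStream;
import java.nio.file.Files;

public class PerTypeTika {
    public static void main(String[] args) throws Exception {
        // Load an alternate Tika configuration for this document type (hypothetical file)
        TikaConfig config = new TikaConfig(new File("tika-config-pdf.xml"));
        AutoDetectParser parser = new AutoDetectParser(config);
        BodyContentHandler handler = new BodyContentHandler(-1); // -1 = no write limit
        try (InputStream in = Files.newInputStream(new File("sample.pdf").toPath())) {
            parser.parse(in, handler, new Metadata());
        }
        System.out.println(handler.toString()); // send this to Solr however you like
    }
}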
process can improve the overall stability of the Solr service.
--
Rahul Singh
rahul.si...@anant.us
Anant Corporation
On Apr 25, 2018, 12:49 PM -0400, Shawn Heisey , wrote:
> On 4/25/2018 4:02 AM, Lee Carroll wrote:
> > *We don't recommend using solr-cell for production indexing.*
>
On 4/25/2018 4:02 AM, Lee Carroll wrote:
*We don't recommend using solr-cell for production indexing.*
Ok. Are the reasons:
Performance? I think we have rather modest index requirements (1,000 a day...
on a busy day).
Security? The index workflow is: upload files to public facing s
Agreed. The app will have a few implementations for storing the binary
file. Easiest for a user to configure for prototyping would be the store-in-index
impl. A live impl would probably be fs
On 4/24/2018 10:26 AM, Lee Carroll wrote:
> Does the Solr Cell contrib give access to the file's raw content along with
> the extracted metadata?
That's not usually the kind of information you want to have in a Solr
index. Most of the time, there will be an entry in the Solr index
Does the Solr Cell contrib give access to the file's raw content along with
the extracted metadata?
cheers Lee C
Tika (Solr Cell) to extract content from HTML document
instead of Solr's MostlyPassthroughHtmlMapper ?
As a bonus here's a Dropwizard Tika wrapper that gives you a Tika web service
https://github.com/mattflax/dropwizard-tika-server written by a colleague of
mine at Flax. Hope this is
includes Charlie's advice and
> the link to Erick's blog post whenever Tika is used. 😊
> >
> >
> > -Original Message-
> > From: Charlie Hull [mailto:char...@flax.co.uk]
> > Sent: Monday, April 9, 2018 12:44 PM
> > To: solr-user@lucene.apac
Oh this is great! Saves me a whole bunch of manual work.
Thanks!
-Original Message-
From: Charlie Hull [mailto:char...@flax.co.uk]
Sent: Monday, April 09, 2018 2:15 PM
To: solr-user@lucene.apache.org
Subject: [EXT] Re: How to use Tika (Solr Cell) to extract content from HTML
document
Thank you Charlie, Tim.
I will integrate Tika in my Java app and use SolrJ to send data to Solr.
-Original Message-
From: Allison, Timothy B. [mailto:talli...@mitre.org]
Sent: Monday, April 09, 2018 11:24 AM
To: solr-user@lucene.apache.org
Subject: [EXT] RE: How to use Tika (Solr Cell
I'd recommend you run Tika externally to Solr, which will allow you to
catch this kind of problem and prevent it bringing down your Solr
installation.
Cheers
Charlie
On 9 April 2018 at 16:59, Hanjan, Harinder wrote:
Hello!
Solr (i.e. Tika) throws a "zip bomb" exception with certain documents we have
in our SharePoint system. I have used the tika-app.jar directly to extract the
document in question and it does _not_ throw an exception; it extracts the
contents just fine. So it would seem Solr is doing someth
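Running Tika in your own process, as Charlie recommends above, lets you catch exactly this kind of parser failure without touching Solr. A rough sketch with Tika and SolrJ; the core name, field names, and input file are illustrative:

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ExternalTikaIndexer {
    public static void main(String[] args) throws Exception {
        HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/docs").build();
        AutoDetectParser parser = new AutoDetectParser();
        Path file = Paths.get("suspect.docx");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", file.getFileName().toString());
        try (InputStream in = Files.newInputStream(file)) {
            BodyContentHandler handler = new BodyContentHandler(-1);
            parser.parse(in, handler, new Metadata());
            doc.addField("content", handler.toString());
        } catch (Exception e) {
            // A zip bomb (or any other parser blow-up) lands here instead of inside Solr
            System.err.println("Tika failed on " + file + ": " + e.getMessage());
            return;
        }
        solr.add(doc);
        solr.commit();
        solr.close();
    }
}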
> Hi all, I have been having an issue with Solr, using the
> ExtractingRequestHandler. Basically, when indexing a PDF (for
> example) I get all the metadata mixed into the "content" field along
> with the content. See:
> <https://stackoverflow.com/questions/47934257/importing-files-with-solr-cell-tika-is-mixing-metadata-fields-with-content>
> for the gory details.
>
> I'm guessing this is the same basic issue as
> <https://issues.apache.org/jira/browse/SOLR-9178>
Hi all, I have been having an issue with Solr, using the
ExtractingRequestHandler. Basically, when indexing a PDF (for
example) I get all the metadata mixed into the "content" field along
with the content. See:
<https://stackoverflow.com/questions/47934257/importing-files-with-solr
Hello,
I have been successfully able to index archive files (zip, tar, and the
like) using Solr Cell, but the archive is returned as a single document
when I do queries. Is there a way to configure it so that files are
extracted recursively and indexed separately?
I know that if I set the
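One way to get per-file documents, if you are willing to run Tika yourself rather than through Solr Cell: Tika's RecursiveParserWrapper returns one Metadata object per embedded file, and each one can be turned into its own Solr document. A sketch against the newer Tika handler API (the archive name is illustrative, and the metadata key constants vary a little between Tika versions, so string keys are used here):

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.RecursiveParserWrapper;
import org.apache.tika.sax.BasicContentHandlerFactory;
import org.apache.tika.sax.RecursiveParserWrapperHandler;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ArchiveExploder {
    public static void main(String[] args) throws Exception {
        RecursiveParserWrapper wrapper = new RecursiveParserWrapper(new AutoDetectParser());
        RecursiveParserWrapperHandler handler = new RecursiveParserWrapperHandler(
                new BasicContentHandlerFactory(BasicContentHandlerFactory.HANDLER_TYPE.TEXT, -1));
        try (InputStream in = Files.newInputStream(Paths.get("bundle.zip"))) {
            wrapper.parse(in, handler, new Metadata(), new ParseContext());
        }
        // One Metadata per embedded file; index each as its own SolrInputDocument
        for (Metadata m : handler.getMetadataList()) {
            System.out.println(m.get("resourceName") + " -> " + m.get("X-TIKA:content"));
        }
    }
}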
You will have to configure your schema.xml in Solr.
What version are you using?
On Fri, May 20, 2016 at 2:17 AM, scott.chu wrote:
I have a MySQL table with over 300M blog articles. The records are in HTML
format. Is it possible to import these records using only Solr CELL+TIKA+DIH to
some Solr with a schema? I mean, when importing, can I map the MySQL schema to
the schema in Solr?
scott.chu,scott@udngroup.com
2016/5/20 (Fri)
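If DIH+Tika proves awkward at that scale, one alternative (in line with the SolrJ advice further down this page) is to pull rows over JDBC and do the column-to-field mapping yourself. A skeleton, with the JDBC URL, table, column, and field names all invented for illustration; it assumes the MySQL driver and SolrJ are on the classpath:

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class BlogIndexer {
    public static void main(String[] args) throws Exception {
        HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/blogs").build();
        try (Connection conn = DriverManager.getConnection("jdbc:mysql://localhost/blog", "user", "pass");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT id, title, body_html FROM articles")) {
            List<SolrInputDocument> batch = new ArrayList<>();
            while (rs.next()) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", rs.getString("id"));       // explicit MySQL-to-Solr field mapping
                doc.addField("title", rs.getString("title"));
                // strip the HTML at analysis time (e.g. HTMLStripCharFilterFactory) or here
                doc.addField("content_html", rs.getString("body_html"));
                batch.add(doc);
                if (batch.size() == 1000) { solr.add(batch); batch.clear(); } // batch for 300M rows
            }
            if (!batch.isEmpty()) solr.add(batch);
            solr.commit();
        }
        solr.close();
    }
}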
these formats:
>
> yyyy-MM-dd'T'HH:mm:ss'Z'
> yyyy-MM-dd'T'HH:mm:ss
> yyyy-MM-dd
> yyyy-MM-dd hh:mm:ss
> yyyy-MM-dd HH:mm:ss
> EEE MMM d hh:mm:ss z yyyy
> EEE, dd MMM yyyy HH:mm:ss zzz
> EEEE, dd-MMM-yy HH:mm:ss zzz
> EEE MMM d HH:mm:ss yyyy
yyyy-MM-dd'T'HH:mm:ss'Z'
yyyy-MM-dd'T'HH:mm:ss
yyyy-MM-dd
yyyy-MM-dd hh:mm:ss
yyyy-MM-dd HH:mm:ss
EEE MMM d hh:mm:ss z yyyy
EEE, dd MMM yyyy HH:mm:ss zzz
EEEE, dd-MMM-yy HH:mm:ss zzz
EEE MMM d HH:mm:ss yyyy
See:
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+
Thanks in advance
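For what it's worth, these are java.text.SimpleDateFormat patterns, so a quick standalone check shows what each of Solr Cell's default date formats looks like:

import java.text.SimpleDateFormat;
import java.util.Date;

public class DateFormatCheck {
    public static void main(String[] args) {
        String[] patterns = {
            "yyyy-MM-dd'T'HH:mm:ss'Z'", "yyyy-MM-dd'T'HH:mm:ss", "yyyy-MM-dd",
            "yyyy-MM-dd hh:mm:ss", "yyyy-MM-dd HH:mm:ss",
            "EEE MMM d hh:mm:ss z yyyy", "EEE, dd MMM yyyy HH:mm:ss zzz",
            "EEEE, dd-MMM-yy HH:mm:ss zzz", "EEE MMM d HH:mm:ss yyyy"
        };
        Date now = new Date();
        for (String p : patterns) {
            // Render today's date in each pattern to see what Solr Cell would accept
            System.out.println(p + "  ->  " + new SimpleDateFormat(p).format(now));
        }
    }
}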
You can have a look here:
http://solr.pl/en/2011/04/04/indexing-files-like-doc-pdf-solr-and-tika-integration/
2013/10/10 Peter Bleackley
I'm trying to index a set of PDF documents with Solr 4.5.0. So far I can
get Solr to ingest the entire document as one long string, stored in the
index as "content". However, I want to index structure within the documents.
I know that the ExtractingRequestHandler uses Apache Tika to convert the
Thanks Erick. This is how I was doing it, but when I saw the Solr Cell
stuff I figured I'd give it a go. What I ended up doing is the following:
ModifiableSolrParams params = indexer.index(artifact);
params.add("fmap.content", "my_custom_field");
params.a
control over what's done.
Here's a skeletal program with indexing from a DB mixed in, but
it shouldn't be hard at all to pull the DB parts out.
http://searchhub.org/dev/2012/02/14/indexing-with-solrj/
FWIW,
Erick
On Thu, Sep 5, 2013 at 5:28 PM, Jamie Johnson wrote:
Is it possible to configure Solr Cell to only extract and store the body of
a document when indexing? I'm currently doing the following, which I
thought would work:
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("defaultField", "content");
params.
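For comparison, here is that idea as a standalone sketch; the uprefix trick shunts every Tika-generated field you did not map into an ignored_* dynamic field, which assumes your schema defines one (the core, file, and field names here are illustrative):

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
import java.io.File;

public class SolrCellBodyOnly {
    public static void main(String[] args) throws Exception {
        HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/docs").build();
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        req.addFile(new File("example.pdf"), "application/pdf");
        req.setParam("literal.id", "example-1");
        req.setParam("fmap.content", "content");  // map the extracted body to your field
        req.setParam("uprefix", "ignored_");      // shunt all other Tika fields to ignored_*
        req.setParam("commit", "true");
        solr.request(req);
        solr.close();
    }
}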
Another option similar to this would be the new file system
WatchService available in Java 7:
http://docs.oracle.com/javase/tutorial/essential/io/notification.html
Arcadius.
On 15 March 2013 15:22, Michael Della Bitta
wrote:
> Niklas,
>
> In Linux, the API for watching for filesystem changes i
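A minimal sketch of that WatchService approach (the watched directory is illustrative); each event would be handed to whatever indexing code you already have:

import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

public class DirWatcher {
    public static void main(String[] args) throws Exception {
        Path dir = Paths.get("/data/docs");
        WatchService watcher = FileSystems.getDefault().newWatchService();
        dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE,
                              StandardWatchEventKinds.ENTRY_MODIFY);
        while (true) {
            WatchKey key = watcher.take(); // blocks until something changes
            for (WatchEvent<?> event : key.pollEvents()) {
                Path changed = dir.resolve((Path) event.context());
                System.out.println(event.kind() + ": " + changed);
                // hand 'changed' to your indexing code here
            }
            if (!key.reset()) break; // directory no longer accessible
        }
    }
}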
Take a look at ManifoldCF, which has a file system crawler that can track
changed files.
-- Jack Krupansky
-Original Message-
From: Niklas Langvig
Sent: Friday, March 15, 2013 11:10 AM
To: solr-user@lucene.apache.org
Subject: solr cell
We have all our documents (doc, docx, pdf) on a
Niklas,
In Linux, the API for watching for filesystem changes is called
inotify. You'd need to write something to listen to those events and
react accordingly.
Here's a brief discussion about it:
http://stackoverflow.com/questions/4062806/inotify-how-to-use-it-linux
Michael Della Bitta
---
Hi Chris, thank you for replying. My "content" field in the schema is
stored="true" and indexed="false" because I am copying the "content" field
into the "text" field, which is indexed="true" by default.
My question was: I am able to search in the HTML documents I had
fed to Solr, but as the
: Hi everyone, I am new to Solr technology and not getting a way to get back
: the original HTML document with hits highlighted in it. What
: configuration can I use, and where, to instruct Solr Cell / Tika so that it does
: not strip down the tags of the HTML document in the content field.
I _think_ w
-Original Message-
From: Divyanand Tiwari
Sent: Monday, February 18, 2013 10:52 PM
To: solr-user@lucene.apache.org
Subject: Re: How can I instruct the Solr / Solr Cell to output the original
HTML document which was fed to it?
Thank you for replying sir !!!
I have two queries related to this -
1) So in this case, which request handler do I have to use? Because
'ExtractingRequestHandler' by default strips the HTML content, and the
default handler 'UpdateRequestHandler' does not accept HTML contents.
2) How can I 'Ext
highlighting.
See:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory
-- Jack Krupansky
-Original Message-
From: Divyanand Tiwari
Sent: Monday, February 18, 2013 7:28 AM
To: solr-user@lucene.apache.org
Subject: How can i instruct the Solr/ Solr
approach...
FWIW,
Erick
On Tue, Sep 25, 2012 at 10:04 AM, wrote:
a separate process) to minimize thread issues, GC issues, hung parsers, etc.
-- Jack Krupansky
-Original Message-
From: Alexandre Rafalovitch
Sent: Tuesday, September 25, 2012 10:24 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr Cell Questions
Are you by any chance committing
http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working. (Anonymous - via GTD
book)
On Mon, Sep 24, 2012 at 10:04 AM, wrote:
The difference with Solr Cell is that I'm sending every single document
to Solr Cell and don't collect them until I have a couple of them in my
memory.
I'm mainly using the code from here:
http://wiki.apache.org/solr/ExtractingRequestHandler#SolrJ
Erick Erickson wrote on 25.09.201
Best
Erick
On Tue, Sep 25, 2012 at 5:23 AM, wrote:
Thank you Erick for your response,
I've already tried what you've suggested and got some out of memory
exceptions. Because of this I like the solution with Solr Cell where I can
send the files directly to Solr via stream and don't collect them in my
memory.
And another question
d to do the indexing
Best
Erick
On Mon, Sep 24, 2012 at 10:04 AM, wrote:
> Hi,
>
> Im currently experimenting with Solr Cell to index files to Solr. During
> this some questions came up.
>
> 1. Is it possible (and wise) to connect to Solr Cell with multiple Threads
> at th
Hi,
I'm currently experimenting with Solr Cell to index files to Solr. During
this, some questions came up.
1. Is it possible (and wise) to connect to Solr Cell with multiple threads
at the same time, to index several documents at the same time?
This question came up because my program takes
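On question 1: the extract handler can be called from several threads, and SolrJ's HttpSolrClient is documented as thread-safe, so a simple pool works. A sketch; the pool size, directory, and content type are illustrative, and it is worth measuring before assuming it is faster, since Tika parsing on the server is CPU-bound:

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
import java.io.File;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelExtract {
    public static void main(String[] args) throws Exception {
        HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/docs").build();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (File f : new File("/data/inbox").listFiles()) {
            pool.submit(() -> {
                try {
                    ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
                    req.addFile(f, "application/octet-stream");
                    req.setParam("literal.id", f.getName());
                    solr.request(req); // HttpSolrClient can be shared across threads
                } catch (Exception e) {
                    System.err.println("failed: " + f + ": " + e);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        solr.commit(); // one commit at the end, not per document
        solr.close();
    }
}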
Sent: Monday, September 17, 2012 1:12 AM
To: solr-user@lucene.apache.org
Subject: Re: Indexing PDF-Files using Solr Cell
Thank you for your response.
I'm writing my bachelor's thesis about Solr, and my company doesn't want me to
use a beta version.
I don't want to be annoying, but "how" do I direct the
Again, this is all simplified in Solr 4.0-BETA.
-- Jack Krupansky
-Original Message-
From: Alexander Troost
Sent: Sunday, September 16, 2012 11:59 PM
To: solr-user@lucene.apache.org
Subject: Re: Indexing PDF-Files using Solr Cell
Hi, first of all: Thank you for that quick response!
But I am not sure if I am doing this r
From: Alexander Troost
Sent: Sunday, September 16, 2012 10:16 PM
To: solr-user@lucene.apache.org
Subject: Indexing PDF-Files using Solr Cell
Hello *,
I've got a problem indexing and searching PDF files.
It seems like Solr doesn't index the name of the file.
In return I only get
A28240application/pdfdoc52012-09-17T01:45:39Z
It finds the right document, but no content or title is displayed in the
XML response. Where do I config tha
It's pretty easy to accidentally run into the AWT stuff if you're
doing anything that involves image processing, which I would expect a
generic RTF parser might do.
Michael Della Bitta
Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
The backstory here is that Tika uses a library that for some crazy
reason is inside the Java AWT graphics toolkit. (I think the RTF
parser?)
On Wed, Aug 15, 2012 at 5:57 AM, Ahmet Arslan wrote:
>> You can try passing
>> -Djava.awt.headless=true as one of the arguments
>> when you start Jetty to s
> You can try passing
> -Djava.awt.headless=true as one of the arguments
> when you start Jetty to see if you can get this to go away
> with no ill
> effects.
I started Jetty using 'java -Djava.awt.headless=true -jar start.jar' and
successfully indexed two PDF files. That icon didn't appear.
You can try passing -Djava.awt.headless=true as one of the arguments
when you start Jetty to see if you can get this to go away with no ill
effects.
Michael Della Bitta
Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
On 15 August 2012 at 13:03, Ahmet Arslan wrote:
> Hi Paul, thanks for the explanation. So is it nothing to worry about?
It is nothing to worry about, except to remember that you can't run this step in
a daemon-like process.
(On Linux, I had to set up a VNC server for similar tasks.)
paul
> the dock icon appears when AWT starts, e.g. when a font is
> loaded.
> You can prevent it using the headless mode but this is
> likely to trigger an exception.
> Same if your user is not UI-logged-in.
Hi Paul, thanks for the explanation. So is it nothing to worry about?
Ahmet,
the dock icon appears when AWT starts, e.g. when a font is loaded.
You can prevent it using the headless mode but this is likely to trigger an
exception.
Same if your user is not UI-logged-in.
hope it helps.
Paul
On 15 August 2012 at 01:30, Ahmet Arslan wrote:
> Hi All,
>
> I have set
> When I send a scanned pdf to extraction request
> handler, below icon appears in my Dock.
>
> http://tinypic.com/r/2mpmo7o/6
> http://tinypic.com/r/28ukxhj/6
I found that text-extractable PDF files trigger the weird icon above too.
curl
"http://localhost:8983/solr/update/extract?literal.id=solr-
PostScript fonts.
Try a "normal" PDF for comparison.
-- Jack Krupansky
-Original Message-
From: Ahmet Arslan
Sent: Tuesday, August 14, 2012 7:30 PM
To: solr-user@lucene.apache.org
Subject: scanned pdf with solr cell
Hi All,
I have set of rich documents. Some of them are scanned
Hi All,
I have a set of rich documents. Some of them are scanned PDF files. When I send a
scanned PDF to the extraction request handler, the icon below appears in my Dock.
http://tinypic.com/r/2mpmo7o/6
http://tinypic.com/r/28ukxhj/6
Does anyone know what this is?
curl
"http://localhost:8983/solr/docum
Hi,
I'm using Solr Cell (SolrJ) to index plain text files, but am encountering an
IllegalCharsetNameException. Could you please point out if anything should
be added in the schema.xml file? I can index the other MIME types
without problems. I gave the field type as
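One thing worth ruling out (a guess, not a confirmed fix): pass an explicit content type with a charset when you hand the stream over, so Tika is not left to guess the encoding. With SolrJ, for example, where the core and filename are illustrative:

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
import java.io.File;

public class PlainTextPost {
    public static void main(String[] args) throws Exception {
        HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/docs").build();
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        // Declare the charset up front instead of letting Tika detect it
        req.addFile(new File("notes.txt"), "text/plain; charset=UTF-8");
        req.setParam("literal.id", "notes-1");
        req.setParam("commit", "true");
        solr.request(req);
        solr.close();
    }
}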
Hi John,
See discussion about the issue of indexing contents of ZIP files:
https://issues.apache.org/jira/browse/SOLR-2416
Depending on your use case, you may be able to write a Tika parser which
handles your specific case, such as uncompressing a GZIP file and using
AutoDetect on its contents
Is it possible to extract content for file types that Tika doesn’t support
without changing and rebuilding Tika? Do I need to specify a tika.config
file in the solrconfig.xml file, and if so, what is the format of that file?
One example that I’m trying to solve is for a document management syst
: Caused by: org.apache.solr.common.SolrException: Error loading class
'solr.extraction.ExtractingRequestHandler'
:
: With the jetty and the provided example, I have no problem. It all happens
when I use tomcat and solr.
:
: My setup is as follows:
:
: I downloaded the apache-solr-3.3.0 and
Can you please guide me through step-by-step usage of the Solr Cell installation?
regards,
Sina
--
Sina Fakhraee , PhD candidate
Department of Computer Science
Wayne State University
5057 Woodward Avenue
3rd floor, Suite 3105
Detroit, Michigan 48202
(517)974-8437(Cell)
http://uwerg.c
Latest version is 3.4, and it is fairly compatible with 1.4.1, but you have to
reindex.
First step migration can be to continue using your 1.4 schema on new solr.war
(and SolrJ), but I suggest you take a few hours upgrading your schema and
config as well.
--
Jan Høydahl, search solution archite
Hi,
If you have 4GB on your server total, try giving about 1GB to Solr, leaving 3GB
for the OS, OS caching, and memory allocation outside the JVM.
Also, add 'ulimit -v unlimited' and 'ulimit -s 10240' to /etc/profile to
increase the virtual memory and stack limits.
And you should also consider upgrading to
On 10/07/2011 6:21 PM, wrote:
Hi,
What Solr version?
Solr Implementation Version: 1.4.1 955763M - mark - 2010-06-17 18:06:42.
It's running on a SUSE Linux VM.
How often do you do commits, or do you use autocommit?
I had been doing commits every 100 documents (the entire set is about
3
solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com
On 7. okt. 2011, at 20:19, Tod wrote:
I'm batching documents into solr using solr cell with the 'stream.url'
parameter. Everything is working fine until I get to about 5k documents
in and then it starts issuing 'read timeout 500' errors on every document.
The sysadmin says there's plenty of CP
I am new to both Solr and Cell, so sorry if I am misusing some of the
terminologies. So the problem I am trying to solve is to index a PDF document
using Solr Cell where I want to exclude part of it via XPATH. I am using Solr
release 3.1. When researching the user list, I came across one entry
Hello everyone,
I have just gotten extraction of information from files with Solr Cell working. Some of
the files we are indexing are large and have a lot of content. I would like to
limit the amount of data I index to a specified limit of characters (for example
300 chars), which I will use as a document preview. Is this possible to set as
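If you end up driving Tika yourself (as several replies on this list suggest), Tika's BodyContentHandler takes a write limit that does exactly this truncation. A sketch with an illustrative filename:

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class PreviewExtractor {
    public static void main(String[] args) throws Exception {
        // Keep only the first 300 characters of extracted text for the preview field
        BodyContentHandler handler = new BodyContentHandler(300);
        try (InputStream in = Files.newInputStream(Paths.get("big.pdf"))) {
            new AutoDetectParser().parse(in, handler, new Metadata());
        } catch (Exception e) {
            // Tika signals hitting the write limit with an exception;
            // the first 300 chars are already in the handler, so it can be ignored here
        }
        System.out.println(handler.toString());
    }
}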
On 30. mai 2011, at 22.46, Greg Georges wrote:
Hello everyone,
We have our infrastructure on Amazon cloud servers, and we use the S3 file
system. We need to index files using Solr Cell. From what I have read, we need
to stream files to Solr in order for it to extract the metadata into the index.
If we stream data through a public URL there
Hi,
I have a question about Solr Cell please.
I index some files. For example, if I want to extract the filename, then use
a hash function on it like MD5 and then store it in Solr; the correct way
is to use Tika "manually" to extract the metadata I want, do the
transformations on it and
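The hashing step itself is just the JDK's MessageDigest; a small sketch (the filename and field name are illustrative):

import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class Md5Name {
    public static void main(String[] args) throws Exception {
        String filename = "report.pdf";
        byte[] digest = MessageDigest.getInstance("MD5")
                                     .digest(filename.getBytes(StandardCharsets.UTF_8));
        // zero-pad to 32 hex characters
        String md5 = String.format("%032x", new BigInteger(1, digest));
        System.out.println(md5); // store this as the field value, e.g. via literal.filename_md5
    }
}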
__
From: Markus Jelsma [markus.jel...@openindex.io]
Sent: Friday, March 25, 2011 1:23 PM
To: solr-user@lucene.apache.org
Cc: Upayavira
Subject: Re: Multiple Cores with Solr Cell for indexing documents
You can only set properties for a lib dir that must be used in solrconfig.xml.
You can use shared
> > solr.xml file is sharedLib="lib">. That is housed in .../example/solr/. So, does it
> > look in .../example/lib or .../example/solr/lib?
> >
> > ~Brandon Waterloo
> > ____
> > From: Markus Jelsma [markus.jel...@openindex.io]
Sounds like the Tika jar is not on the class path. Add it to a directory where
Solr's looking for libs.
On Thursday 24 March 2011 16:24:17 Brandon Waterloo wrote:
Hello everyone,
I've been trying for several hours now to set up Solr with multiple cores with
Solr Cell working on each core. The only items being indexed are PDF, DOC, and
TXT files (with the possibility of expanding this list, but for now, just
assume the only things in the index shou
In case the exact problem was not clear to somebody:
The problem with FileUpload interpreting file data as regular form fields is
that Solr thinks there are no content streams in the request and throws a
"missing_content_stream" exception.
On Thu, Mar 10, 2011 at 10:59 AM, Karthik Shiraly <
karth
Hi,
I'm using Solr 1.4.1.
The scenario involves a user uploading multiple files. These have content
extracted using Solr Cell, then indexed by Solr along with other information
about the user.
ContentStreamUpdateRequest seemed like the right choice for this - use
addFile() to send file data, and use
Working with the latest Solr trunk code, it seems the Tika handlers
for Solr Cell (ExtractingDocumentLoader.java) and the Data Import Handler
(TikaEntityProcessor.java) fail to index zip file contents again.
They just index the file names again.
This issue was addressed some time back, late last
apache.org/2009-09/msg00037.html
Looking at my libraries, it seems I am using PDFBox 0.7.3. I am using Maven
for building, and PDFBox 0.7.3 appears to have come from the tika-parsers 0.4
POM file, which in turn appears to have come from the solr-cell 1.4.0 POM file. In my
project's Maven POM file I hav