Hi All,
I have the same issue. I have installed a Solr instance on Tomcat 6. When I try
to index a PDF I run into the exception below:
11 Apr, 2011 12:11:55 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NoClassDefFoundError:
org/apache/tika/exception/TikaException
at java.
Hi Yonik !
Thanks for your reply.
I decided to switch to 3.1 and see if the performance would settle down
after building up a proper index. Looking at the average response time from
both installations, I can see that 3.1 is now actually performing much better
than 1.4.1 (1.4.1 shows an average of
I changed data-config-sql.xml to
There are no errors, but the indexed PDF is converted to numbers:
200 1 202 1 203 1 212 1 222 1 236 1 242 1 244 1 254 1 255
--
Best Regards,
Roy Liu
On Mon, Apr 11, 2011 at 2:02 PM, Roy Liu wrote:
>
\apache-solr-3.1.0\contrib\extraction\lib\tika*.jar
--
Best Regards,
Roy Liu
On Mon, Apr 11, 2011 at 3:10 PM, Mike wrote:
> Hi All,
>
> I have the same issue. I have installed a Solr instance on Tomcat 6. When I try
> to index a PDF I run into the exception below:
>
> 11 Apr, 2011 12:11:55 PM
Hi Lance,
you are right:
XPathEntityProcessor has the attribute "xsl", so I can use XSLT to generate an
XML file "in the form of the standard Solr update schema".
I will check the performance of this.
Best regards
Karsten
btw. "flatten" is an attribute of the "field"-Tag, not of XPathEntityP
Hi All,
I have installed a Solr instance on Tomcat 6. When I tried to index the PDF
file I was able to see the response:
0
479
Query:
http://localhost:8080/solr/update/extract?stream.file=D:\mike\lucene\apache-solr-1.4.1\example\exampledocs\Struts%202%20Design%20and%20Programming1.pdf&stream.cont
Hi Roy,
Thank you for the quick reply. When I tried to index the PDF file I was able
to see the response:
0
479
Query:
http://localhost:8080/solr/update/extract?stream.file=D:\mike\lucene\apache-solr-1.4.1\example\exampledocs\Struts%202%20Design%20and%20Programming1.pdf&stream.contentType=app
Jayendra,
Thanks for the info - been keeping an eye on this list in case this
topic cropped up again. It's currently a background task for me, so
I'll try and take a look at the patches and re-test soon.
Joey - glad you brought this issue up again. I haven't progressed any
further with it.
Hello,
I have some synonyms for city names. Sometimes there are multiple names for
one city, for example:
newyork, newyork city, big apple
I search for "big apple" and get results with new york (synonym).
If somebody searches for "big aple" I want a spelling suggestion like: big
apple. How can I fix th
Did you configure synonyms for your field at query time?
Ludovic.
2011/4/11 royr [via Lucene]
> Hello,
>
> I have some synonyms for city names. Sometimes there are multiple names for
> one city, for example:
>
> newyork, newyork city, big apple
>
> I search for "big apple" and get results with ne
Yes, it looks like this:
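(The configuration snippet appears to have been stripped by the list; see the
markup thread below. A typical query- and index-time synonym setup, with the
type name as a placeholder, would be something like:)

  <fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="true"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="true"/>
    </analyzer>
  </fieldType>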
It will work at query and index time, I think.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Spellchecker-with-synonyms-tp2806028p2806157.html
Sent from the Solr - User mailing list archive at Nabble.com
All:
Lately I've been seeing a lot of posts where people paste in parts of their
schema.xml or solrconfig.xml and the results are...er...disappointing. None
of the less-than or greater-than symbols show and the formatting is all over
the map.
Since some mails would come through with the XML forma
In the Distributed Search page (
http://wiki.apache.org/solr/DistributedSearch), it is documented that in
order to perform a distributed search over a sharded index, I should use the
"shards" request parameter, listing the shards to participate in the search
(e.g. ?shards=localhost:8983/solr,localh
Tom,
I think I see where this may be -- it looks like another > 2B terms
bug in Lucene (we are using an int instead of a long in the
TermInfoAndOrd class inside TermInfosReader.java), only present in
3.1.
I'm also mad that Test2BTerms fails to catch this!! I will go fix
that test and confirm it
You can add the "shards" parameter to your search handler:
host1/solr, host2/solr
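For example, as a default in solrconfig.xml (a sketch; the handler name and
hosts are placeholders):

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="shards">host1/solr,host2/solr</str>
    </lst>
  </requestHandler>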
Is it what you are looking for?
Ludovic.
2011/4/11 Ran Peled [via Lucene] <
ml-node+2806331-346788257-383...@n3.nabble.com>
> In the Distributed Search page (
> http://wiki.apache.org/solr/DistributedSearc
Hi All,
I am new to Solr. I want to implement Solr search.
I have to implement two search buttons (1. books and 2. computers; both
are in the same data source) which are completely different; there is no
relation between them.
Could you please let me know how to define the entities in data-con
If it's of any help, I've split the processing of PDF files from the
indexing. I put the PDF content into a text file (but I guess you could load
it into a database) and use that as part of the indexing. My processing of
the PDF files also compares timestamps on the document and the text file so
th
Hi,
Apparently, when one RELOADs a core, the synonyms file is not reloaded. Is
this
the expected behaviour? Is it the desired behaviour?
Here's the use-case:
When one is doing purely query-time synonym expansion, ideally one would be
able
to edit synonyms.txt and get it reloaded, so that
I have not worked with shards/distributed, but I think you can probably
specify them as defaults in your requesthandler in your solrconfig.xml
instead.
Somewhere there is (or was) a wiki page on this that I can't find right now.
There's a way to specify (for a particular request handler) a default
Hi,
Perhaps you should give Lucene/Solr trunk a try and compare! The Wildcard
query
in trunk should be much faster.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
- Original Message
> From: Ueland
> To: solr
Hi,
Can one actually *force* replication of the index from the master without a
commit being issued on the master since the last replication?
I do see "Force a fetchindex on slave from master command:
http://slave_host:port/solr/replication?command=fetchindex"; on
http://wiki.apache.org/solr/S
Thanks Mike,
At first I thought this couldn't be related to the 2.1 Billion terms issue
since the only place we have tons of terms is in the OCR field and this is not
the OCR field. But then I remembered that the total number of terms in all
fields is what matters. We've had no problems with re
I found a simpler command-line method to update the PDF files. On some
documents it does so perfectly: the result is a pixel-for-pixel match and none of
the OCR text (which is what all these PDFs are, newspaper articles that have
been passed through OCR) is lost. However, on other documents the
Right, it's the total number of terms across all fields... unfortunately.
This class is used to enroll a term into the terms cache that wraps
the terms dictionary, so in theory you could also hit this issue
during normal searching when a term is looked up once, and then
looked up again (the 2nd t
Thanks Mike,
With the unpatched version, the first time I run the facet query on topicStr it
works fine, but the second time I get the ArrayIndexOutOfBoundsException. If
I try different facets such as language, I don't see the same symptoms. Maybe
the number of facet values needs to exceed s
A quick reminder that there's one week left on special pricing for Lucene
Revolution 2011. Sign up this week and save some serious cash:
- Conference Registration, now $545, a savings of $180 over the $725 late
registration price
- Training Package with 2-day Training plus Conference Re
Hi Lance,
I used XPathEntityProcessor with the attribute "xsl" to generate an XML file
"in the form of the standard Solr update schema".
I lost a lot of performance; it is a pity that XPathEntityProcessor only
uses one thread.
My tests with a collection of 350T documents:
1. use of XPathRecordRea
What is the slave replication behavior if a replication request to pull
indexes takes longer than the replication interval itself?
In other words, if my replication interval is set to every 30 seconds,
and my indexes are large enough to take longer than 30
seconds to transfer, is t
Yes. It will wait whatever the replication interval is after the most recent
replication completes before attempting again.
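For reference, the interval in question is the slave's pollInterval; a typical
slave section in solrconfig.xml (host and port are placeholders) looks like:

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://master_host:8983/solr/replication</str>
      <str name="pollInterval">00:00:30</str>
    </lst>
  </requestHandler>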
On Apr 11, 2011, at 2:42 PM, Parker Johnson wrote:
>
> What is the slave replication behavior if a replication request to pull
> indexes takes longer than the replication
Hi,
Using quotes means "use this as a phrase", not "use this as a literal". :)
I think copying to unstemmed field is the only/common work-around.
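A sketch of that work-around in schema.xml (field and type names are just
examples): define an unstemmed sibling field and copy into it:

  <field name="title" type="text" indexed="true" stored="true"/>
  <field name="title_exact" type="text_unstemmed" indexed="true" stored="false"/>
  <copyField source="title" dest="title_exact"/>

Then search title_exact when you need exact, unstemmed matching.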
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
- Original Message
Thanks Larry.
-Parker
On 4/11/11 12:14 PM, "Green, Larry (CMG - Digital)"
wrote:
>Yes. It will wait whatever the replication interval is after the most
>recent replication completes before attempting again.
>
>On Apr 11, 2011, at 2:42 PM, Parker Johnson wrote:
>
>>
>> What is the slave replic
All,
I have a question on the Dismax plugin for the search handler. I have
two test instances of Solr. In one I am using the default search
handler. In this case, the fields that I am working with (slug and
story) are indexed via the all_text field and the searches are done on
the all_text fiel
Hi Raj,
I'm guessing your slug field is much shorter and thus a match in that field has
more weight than a match in a much longer story field. If you omit norms for
those fields in the schema (and reindex), I believe you will see File 4 drop to
position #4.
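Something like this in schema.xml (the field types are guesses; field names
are taken from your mail):

  <field name="slug" type="text" indexed="true" stored="true" omitNorms="true"/>
  <field name="story" type="text" indexed="true" stored="true" omitNorms="true"/>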
Otis
Sematext :: http://semate
Thank you guys for your answers.
I didn't realise that it would be so easy to do; the example from
http://wiki.apache.org/solr/UpdateJSON#Example works perfectly for me.
Regards,
Andrew
--
View this message in context:
http://lucene.472066.n3.nabble.com/Mongo-REST-interface-and-full-data-impo
Does anyone have any thoughts on this one?
On Fri, Apr 8, 2011 at 9:26 AM, Brian Lamb wrote:
> I've looked at both wiki pages and none really clarify the difference
> between these two. If I copy and paste an existing index value for field and
> do an mlt search, it shows up under match but not r
Hi,
I get this SolrJ error in the development environment.
org.apache.solr.client.solrj.SolrServerException: java.net.SocketException:
Too many open files
At the time there was no reindexing or any writes to the index. There were
only different queries generated using SolrJ to hit the Solr server:
I'm curious to know why Solr is not respecting the phrase.
If it considers "manager" as a phrase... shouldn't it return only documents
containing that phrase?
-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: April-11-11 3:42 PM
To: solr-user@lucene.apache
> I'm curious to know why Solr is not respecting the phrase.
> If it considers "manager" as a phrase... shouldn't it return only documents
> containing that phrase?
A phrase means to solr (or rather to the lucene and dismax query parsers, which
are what understand double-quoted phrases) "these t
Paul: can you elaborate a little bit on what exactly your problem is?
- what is the full component list you are using?
- how are you changing the param value (ie: what does the code look like)
- what isn't working the way you expect?
: I've been using my own QueryComponent (that extends the s
: Q1. Is it possible to pass *analyzed* content to the
:
: public abstract class Signature {
No, analysis happens as the documents are being written to the lucene
index, well after the UpdateProcessors have had a chance to interact with
the values.
: Q2. Method calculate() is using concatenat
I see the same problem (missing markup) in Thunderbird. Seems like
Nabble might be the culprit?
-Mike
On 4/11/2011 8:13 AM, Erick Erickson wrote:
All:
Lately I've been seeing a lot of posts where people paste in parts of their
schema.xml or solrconfig.xml and the results are...er...disappoint
Hi,
I only read the short story. :)
Note that you should post questions like this on solr-user@lucene list, which
is
where I'm replying now.
Since you are just starting with Solr, why not grab the recently released 3.1?
That way you'll get the latest Lucene and the latest Solr.
Otis
Sem
: I have a core with 120+ segment files and I tried partial optimize specify
: maxNumSegments=10, after the optimize the segment files reduced to 64 files;
a) the option you want to specify is "maxSegments" .. not "maxNumSegments"
http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes
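For example, as an XML update message:

  <optimize maxSegments="10"/>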
: I tried it with the example json documents, and even if I add
: overwrite=false to the URL, it still overwrites.
:
: Do this twice:
: curl 'http://localhost:8983/solr/update/json?commit=true&overwrite=false'
: --data-binary @books.json -H 'Content-type:application/json'
...the JSON Update Reque
: I see the same problem (missing markup) in Thunderbird. Seems like Nabble
: might be the culprit?
if someone can cite some specific examples (by email message-id, or
subject, or date+sender, or url from nabble, or url from any public
archive, or anything more specific than "posts from nabble
Awesome. Thanks Jayendra. I hadn't caught these patches yet.
I applied SOLR-2416 patch to the solr-3.1 release tag. This resolved the
problem of archive files not being unpacked and indexed with Solr CELL.
Thanks for the FYI.
https://issues.apache.org/jira/browse/SOLR-2416
On Mon, Apr 11, 2011 a
Thanks for the clarification. This makes sense.
-Original Message-
From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
Sent: April-11-11 7:54 PM
To: solr-user@lucene.apache.org
Subject: FW: Exact match on a field with stemming
> I'm curious to know why Solr is not respecting the phrase.
>
Match is the document that's the top result of the query (q param)
that you specify.
Response is the list of documents that are similar to the 'match' document.
-Mike
On Mon, Apr 11, 2011 at 4:55 PM, Brian Lamb
wrote:
> Does anyone have any thoughts on this one?
>
> On Fri, Apr 8, 2011 at 9:26
The DIH has multi-threading. You can have one thread fetching files
and then give them to different threads.
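A sketch, assuming the entity-level "threads" attribute from the 3.x DIH (the
rest of the entity is a placeholder):

  <entity name="docs"
          processor="XPathEntityProcessor"
          url="${files.fileAbsolutePath}"
          forEach="/doc"
          threads="4"/>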
On Mon, Apr 11, 2011 at 11:40 AM, wrote:
> Hi Lance,
>
> I used XPathEntityProcessor with the attribute "xsl" to generate an XML file "in
> the form of the standard Solr update schema".
> I l
Hi Mike-
Please start a new thread for this.
On Mon, Apr 11, 2011 at 2:47 AM, Mike wrote:
> Hi All,
>
> I have installed solr instance on tomcat6. When i tried to index the PDF
> file i was able to see the response:
>
>
> 0
> 479
>
>
> Query:
> http://localhost:8080/solr/update/extract?stream.fi
SOLR-1499 is a plug-in for the DIH that uses Solr as a DataSource.
This means that you can read the database and PDFs separately. You
could index all of the PDF content in one DIH script. Then, when
there's a database update, you have a separate DIH script that reads
the old row from Solr, and pul
Ah! Did you set the UTF-8 parameter in Tomcat?
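For reference, that usually means the URIEncoding attribute on the HTTP
connector in Tomcat's conf/server.xml, e.g.:

  <Connector port="8080" protocol="HTTP/1.1" URIEncoding="UTF-8"/>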
On Mon, Apr 11, 2011 at 2:49 AM, Mike wrote:
> Hi Roy,
>
> Thank you for the quick reply. When i tried to index the PDF file i was able
> to see the response:
>
>
> 0
> 479
>
>
>
> Query:
> http://localhost:8080/solr/update/extract?stream.file=D:\mik
Marius: "I have copied the configuration from 1.4.1 to the 3.1."
Does the Directory implementation show up in the JMX beans? In
admin/statistics.jsp? Or the Solr startup logs? (Sorry, don't have a
Solr available.)
Yonik:
> What platform are you on? I believe the Lucene Directory
> implementatio
Has anyone tried doing this? Got any tips for someone getting started?
Thanks,
Adam
Sent from my iPhone
Looking at the code, issuing a fetchindex will cause the fetch to occur right
away, with no respect for polling.
- Mark
On Apr 11, 2011, at 12:37 PM, Otis Gospodnetic wrote:
> Hi,
>
> Can one actually *force* replication of the index from the master without a
> commit being issued on the mast
Hoss,
as of now I managed to adjust this in the client code before it touches the
server so it is not urgent at all anymore.
I wanted to avoid touching the client code (which is giving, oh great fun, MSIE
concurrency miseries) hence I wanted a server-side rewrite of the maximum
number of hits
Thanks, Ludovic and Jonathan. Yes, this configuration default is exactly
what I was looking for.
Ran
On Mon, Apr 11, 2011 at 7:12 PM, Jonathan Rochkind wrote:
> I have not worked with shards/distributed, but I think you can probably
> specify them as defaults in your requesthandler in your so