Bump?
On Mon, Jun 27, 2011 at 06:17:42PM +0100, me said:
> On the SolrJetty page
>
> http://wiki.apache.org/solr/SolrJetty
>
> there's a link to a tar ball
>
> http://wiki.apache.org/solr/SolrJetty?action=AttachFile&do=view&target=DEMO_multiple_webapps_jetty_6.1.3.tgz
>
> which fails with the error
On the SolrJetty page
http://wiki.apache.org/solr/SolrJetty
there's a link to a tar ball
http://wiki.apache.org/solr/SolrJetty?action=AttachFile&do=view&target=DEMO_multiple_webapps_jetty_6.1.3.tgz
which fails with the error
You are not allowed to do AttachFile on this page.
Can someone fix this?
First, a couple of assumptions.
We have boxes with a large amount of memory (~70 GB) on which we're
running Solr. We've currently set -Xmx to 25 GB with the GC settings
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:+CMSIncrementalMode
-XX:+CMSIncrementalPacing
We're reluctant to raise the -Xmx any further.
Due to some emergency maintenance I needed to run delete on a large
number of documents in a 200Gb index.
The problem is that it's taking an inordinately long time (2+
hours so far and counting) and is steadily eating up disk space -
presumably up to 2x index size which is getting awf
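For context, the deletes in question go through the update handler as delete-by-query requests. A minimal sketch of building such a request (the base URL and the `type:foo` query are illustrative assumptions, not details from this thread):

```python
# Sketch: build a delete-by-query body for Solr's XML update handler.
# Base URL and query below are illustrative assumptions.
from urllib.parse import urljoin

SOLR_BASE = "http://localhost:8983/solr/"  # assumed

def delete_by_query(query: str) -> str:
    """Return the XML body to POST to /update for a delete-by-query."""
    return "<delete><query>%s</query></delete>" % query

body = delete_by_query("type:foo")
update_url = urljoin(SOLR_BASE, "update")
# POST `body` to `update_url` (Content-Type: text/xml), then commit.
```

Note that the merge work triggered afterwards is what eats the time and the transient disk space, not the delete request itself.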
On Mon, Apr 25, 2011 at 05:02:12PM -0400, Yonik Seeley said:
> It really shouldn't be that slow... how many documents are in your
> index, and how many match -type:foo?
Total number of docs is 161,000,000
type:foo 39,000,000
-type:foo 122,200,000
type:bar 90,000,000
We're aware it's large an
On Mon, Apr 25, 2011 at 04:34:05PM -0400, Jonathan Rochkind said:
> This is what I do instead, to rewrite the query to mean the same thing but
> not give the lucene query parser trouble:
>
> fq=( (*:* AND -type:foo) OR restriction_id:1)
>
> "*:*" means "everything", so (*:* AND -type:foo) means
I have a field 'type' that has several values. If it's type 'foo' then
it also has a field 'restriction_id'.
What I want is a filter query which says "either it's not a 'foo' or if
it is then it has the restriction '1'"
I expect two matches - one of type 'bar' and one of type 'foo'
Neither
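Jonathan's rewrite works because a purely negative clause gives the Lucene query parser nothing to subtract from; anchoring it to the match-all query *:* fixes that. A sketch of assembling the request with that filter (the handler path is an assumption):

```python
# Sketch: URL-encode the rewritten filter query from this thread.
from urllib.parse import urlencode, parse_qs

fq = "((*:* AND -type:foo) OR restriction_id:1)"
params = urlencode({"q": "*:*", "fq": fq})
request = "/select?" + params  # handler path assumed
```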
On Wed, Apr 06, 2011 at 12:05:57AM +0200, Jan Høydahl said:
> Just curious, was there any resolution to this?
Not really.
We tuned the GC pretty aggressively - we use these options
-server
-Xmx20G -Xms20G -Xss10M
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:+CMSIncrementalMode
-XX:+CMSIncrem
On Fri, Jan 28, 2011 at 12:29:18PM -0500, Yonik Seeley said:
> That's odd - there should be nothing special about negative numbers.
> Here are a couple of ideas:
> - if you have a really big index and querying by a negative number
> is much more rare, it could just be that part of the index wasn'
On Mon, Feb 07, 2011 at 02:06:00PM +0100, Markus Jelsma said:
> Heap usage can spike after a commit. Existing caches are still in use and new
> caches are being generated and/or auto warmed. Can you confirm this is the
> case?
We see spikes after replication which I suspect is, as you say, becau
On Thu, Jan 27, 2011 at 11:32:26PM +0000, me said:
> If I do
>
> qt=dismax
> fq=uid:1
>
> (or any other positive number) then queries are as quick as normal - in
> the 20ms range.
For what it's worth uid is a TrieIntField with precisionStep=0,
omitNorms=true, positionIncrementGap=0
On Tue, Jan 25, 2011 at 01:28:16PM +0100, Markus Jelsma said:
> Are you sure you need CMS incremental mode? It's only adviced when running on
> a machine with one or two processors. If you have more you should consider
> disabling the incremental flags.
I'll test again but we added those to get b
If I do
qt=dismax
fq=uid:1
(or any other positive number) then queries are as quick as normal - in
the 20ms range.
However, any of
fq=uid:\-1
or
fq=uid:[* TO -1]
or
fq=uid:[-1 to -1]
or
fq=-uid:[0 TO *]
then queries are incredibly slow - in the 9 *second* range
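For anyone reproducing this, the four slow variants can be generated mechanically (a sketch; the handler path and the match-all main query are assumptions):

```python
# Sketch: the four slow filter-query variants from this thread,
# URL-encoded. Handler path and q=*:* are assumptions.
from urllib.parse import urlencode, parse_qs

variants = [
    r"uid:\-1",          # escaped literal negative
    "uid:[* TO -1]",     # open-ended range
    "uid:[-1 TO -1]",    # single-value range
    "-uid:[0 TO *]",     # negated positive range
]
urls = ["/select?" + urlencode({"q": "*:*", "fq": fq})
        for fq in variants]
```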
On Mon, Jan 24, 2011 at 10:55:59AM -0800, Em said:
> Could it be possible that your slaves not finished their replicating until
> the new replication-process starts?
> If so, there you got the OOM :).
This was one of my thoughts as well - we're currently running a slave
which has no queries in it
On Mon, Jan 24, 2011 at 08:00:53PM +0100, Markus Jelsma said:
> Are you using 3rd-party plugins?
No third party plugins - this is actually pretty much stock tomcat6 +
solr from Ubuntu. The only difference is that we've adapted the
directory layout to fit in with our house style
We have two slaves replicating off one master every 2 minutes.
Both using the CMS + ParNew Garbage collector. Specifically
-server -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
-XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing
but periodically they both get into a GC storm and just keel over.
Looki
On Mon, Jan 10, 2011 at 05:58:42PM -0500, François Schiettecatte said:
> http://www.oracle.com/technetwork/java/gc-tuning-5-138395.html (you
> need to read this one)
>
> http://java.sun.com/performance/reference/whitepapers/tuning.html (and
> this one).
Yeah, I have these two pages b
On Mon, Jan 10, 2011 at 01:56:27PM -0500, Brian Burke said:
> This sounds like it could be garbage collection related, especially
> with a heap that large. Depending on your jvm tuning, a FGC could
> take quite a while, effectively 'pausing' the JVM.
>
> Have you looked at something like jstat
I have a fairly classic master/slave set up.
Response times on the slave are generally good with blips periodically,
apparently when replication is happening.
Occasionally however the process will have one incredibly slow query and
will peg the CPU at 100%.
The weird thing is that it will rema
We've got a largish corpus (~94 million documents). We'd like to be able
to sort on one of the string fields. However this takes an incredibly
long time. A warming query for that field takes ~20 minutes.
However most of the time the result sets are small since we use filters
heavily - typ
On Fri, Nov 19, 2010 at 12:01:09AM +0000, me said:
> I'm baffled - I've had way bigger indexes than this before with no
> performance problems. At first it was the frequent updates but the fact
> that it happens even when the indexer isn't running seems to put paid to
> that.
More information:
We currently have a 30G index with 73M of .tii files running on a
machine with 4 Intel 2.27GHz Xeons with 15G of memory.
About once a second a process indexes ~10-20 smallish documents using
the XML Update Handler. A commit happens after every update. However we
see this behaviour even if the i
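A commit after every update is expensive; since Solr 1.4 the update message itself can ask the server to fold commits together via commitWithin instead. A sketch of building such an update body (the field names and the 10s window are assumptions):

```python
# Sketch: an <add> body whose commit is deferred via commitWithin
# (available from Solr 1.4). Field names below are illustrative.
from xml.sax.saxutils import escape

def add_docs_xml(docs, commit_within_ms=10000):
    """Build an XML <add> that lets the server batch up the commit."""
    doc_xml = "".join(
        "<doc>" + "".join(
            '<field name="%s">%s</field>' % (escape(name), escape(str(value)))
            for name, value in doc.items()
        ) + "</doc>"
        for doc in docs
    )
    return '<add commitWithin="%d">%s</add>' % (commit_within_ms, doc_xml)

body = add_docs_xml([{"id": "doc1", "title": "A & B"}])
```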
On Mon, Nov 01, 2010 at 05:42:51PM -0700, Lance Norskog said:
> You should query against the indexer. I'm impressed that you got 5s
> replication to work reliably.
That's our current solution - I was just wondering if there was anything
I was missing.
Thanks!
We've been trying to get a setup in which a slave replicates from a
master every few seconds (ideally every second but currently we have it
set at every 5s).
Everything seems to work fine until, periodically, the slave just stops
responding from what looks like it running out of memory:
org.ap
On Mon, Oct 11, 2010 at 07:17:43PM +0100, me said:
> It was just an idea though and I was hoping that there would be a
> simpler more orthodox way of doing it.
In the end, for anyone who cares, we used dynamic fields.
There are a lot of them but we haven't seen performance impacted that
badly s
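For reference, the dynamic-field approach boils down to a couple of schema.xml entries along these lines (the names and types here are made up for illustration, not taken from the thread, and assume the referenced field types are defined):

```
<!-- illustrative names/types, not from the thread -->
<dynamicField name="custom_*_i" type="tint"   indexed="true" stored="true"/>
<dynamicField name="custom_*_s" type="string" indexed="true" stored="true"/>
```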
On Sat, Oct 09, 2010 at 06:31:19PM -0400, Erick Erickson said:
> I'm confused. What do you mean that a user can "set any
> number of arbitrarily named fields on a document". It sounds
> like you are talking about a user adding arbitrarily many entries
> to a multi-valued field? Or is it some kind of
On Fri, Oct 08, 2010 at 04:56:38PM -0700, kenf_nc said:
>
> What behavior are you trying to see? You are allowed to sort on fields that
> are potentially empty, they just sort to the top or bottom depending on your
> sort order. Now, if you Query on the fields that could be empty, you won't
> see
We have a set of documents - which have a standard set of fields.
However they can also have an arbitrary number of custom fields which may
each have a value. So some docs may look like
id: 1
title: Document 1
created: 2010-10-09 15:23:00
custom_fields:
- foo : 5
- bar : 6
id: 2
titl
On Wed, Sep 01, 2010 at 01:05:47AM +0100, me said:
> I'm trying to index a latLon field.
>
>
>
Turns out changing it to
fixed it.
I'm trying to index a latLon field.
I have a fieldType in my schema.xml that looks like
and a field that looks like
I'm trying upload via the JSON update handler but I'm getting a 400
error
undefined field location_0_latLon
FWIW the JSON looks like
"location": "38.044337,-103.513824"
On Thu, Apr 22, 2010 at 02:15:08AM +0100, me said:
> It looks like org.apache.lucene.search.highlight.TextFragment has the
> right information to do this (i.e textStartPos)
Turns out that it doesn't seem to have the right information in that
textStartPos always seems to be 0 (and textEndPos just
Having poked around a little it doesn't look like there's a query param
to turn this on, but it'd be really useful if highlighted fragments could
have a character offset returned somehow - maybe something like
Lorem ipsum dolor sit amet, consectetur adipisicing
On Wed, Feb 03, 2010 at 07:38:13PM -0800, Lance Norskog said:
> The debugQuery parameter shows you how the query is parsed into a tree
> of Lucene query objects.
Well, that's kind of what I'm asking - I know how the query is being
parsed:
myers 8e psychology chapter 9
myers 8e psychology chapte
According to my logs
org.apache.solr.handler.component.QueryComponent.process()
takes a significant amount of time (>5s, and I've seen up to 15s) when a
query has an odd pattern of numbers in it, e.g.
"neodymium megagauss-oersteds (MGOe) (1 MG·Oe = 7,958·10³ T·A/m = 7,958
kJ/m³"
"myers 8e psycholo
The spellchecker in my 1.4 install started behaving increasingly
erratically, and suggestions would only be returned some of the time for
the same query.
I tried to force a rebuild using
spellcheck.build=yes
The full request being
/select/?q=alexandr the great&
indent=on&
fl=title&
spellchec
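The rebuild request can be sketched like so (the spellcheck parameters after `fl` are assumptions, since the original preview is truncated; spellcheck.build=yes is the value used above):

```python
# Sketch: the spellcheck rebuild request from this thread. Parameters
# after `fl` are assumptions; the original preview was truncated.
from urllib.parse import urlencode, parse_qs

params = urlencode({
    "q": "alexandr the great",
    "indent": "on",
    "fl": "title",
    "spellcheck": "true",       # assumed
    "spellcheck.build": "yes",  # value used in the thread
})
url = "/select/?" + params
```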
I have a Master server with two Slaves populated via Solr 1.4 native
replication.
Slave1 syncs at a respectable speed, i.e. around 100MB/s, but Slave2 runs
much, much slower - the peak I've seen is 56KB/s.
Both are running off the same hardware with the same config -
compression is set to 'internal'
On Mon, Nov 23, 2009 at 12:10:42PM -0800, Chris Hostetter said:
> ...hmm, you shouldn't have to reindex everything. are you sure you
> restarted solr after making the enablePositionIncrements="true" change to
> the query analyzer?
Yup - definitely restarted
> what do the offsets look like whe
On Tue, Nov 17, 2009 at 11:09:38AM -0800, Chris Hostetter said:
>
> Several things about your message don't make sense...
Hmm, sorry - a byproduct of building up the mail over time I think.
The query
?q="Here there be dragons"
&fl=id,title,score
&debugQuery=on
&qt=dismax
&qf=title
gets echoed
I have a document with the title "Here, there be dragons" and a body.
When I search for
q = Here, there be dragons
qf = title^2.0 body^0.8
qt = dismax
Which is parsed as
+DisjunctionMaxQuery((content:"here dragon"^0.8 | title:"here
dragon"^2.0)~0.01) ()
I get the document as the first hit
On Fri, Oct 30, 2009 at 11:20:19AM +0530, Shalin Shekhar Mangar said:
> That is very strange. IndexReaders do get re-opened after commits. Do you
> see a commit message in the Solr logs?
Sorry for the delay - I've been trying to puzzle over this some more.
The code looks like
server.add(do
We've been trying to build an indexing pipeline using SolrJ but we've
run into a couple of issues - namely that IndexReaders don't seem to get
reopened after a commit().
After an index or delete the change doesn't show up until I restart
solr.
I've tried commit() and commit(true, true) just to
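For what it's worth, SolrJ's commit(true, true) corresponds to the waitFlush/waitSearcher parameters on the HTTP update handler, so the same commit can be reproduced outside SolrJ (a sketch; the base URL is assumed):

```python
# Sketch: HTTP equivalent of SolrJ commit(waitFlush=true,
# waitSearcher=true). Base URL is an assumption.
from urllib.parse import urlencode

commit_url = "http://localhost:8983/solr/update?" + urlencode({
    "commit": "true",
    "waitFlush": "true",
    "waitSearcher": "true",
})
# GET/POST this URL; waitSearcher=true blocks until a new searcher
# is registered, which is the behaviour the reader-reopen test needs.
```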
We have an indexing script which has been running for a couple of weeks
now without problems. It indexes documents and then periodically
commits (which is a tad redundant I suppose), both via the HTTP interface.
All documents are indexed to a master and a slave rsyncs them off using
the standard
Our index has some items in it which basically contain a title and a
single word body.
If the user searches for a word in the title (especially if the title is
itself only one word) then that doc will get scored quite highly,
despite the fact that, in this case, it's not really relevant.
I've t
I know that the Solr FAQ says
"Users should decide for themselves which Servlet Container they
consider the easiest/best for their use cases based on their
needs/experience. For high traffic scenarios, investing time for tuning
the servlet container can often make a big difference."
but is th