Anybody have an idea?
Dec 5, 2012 3:52:32 PM org.apache.solr.client.solrj.impl.HttpClientUtil
createClient
INFO: Creating new http client,
config:maxConnections=500&maxConnectionsPerHost=16
Dec 5, 2012 3:52:33 PM
org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [intradesk] webapp=/s
Hi
Queries with wildcards or fuzzy operators are called multi-term queries and do
not pass through the field's analyzer as you might expect.
See: http://wiki.apache.org/solr/MultitermQueryAnalysis
-Original message-
> From:Pratyul Kapoor
> Sent: Thu 06-Dec-2012 06:28
> To: solr-user
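A common client-side workaround (my own sketch, not something from the wiki page) is to normalize wildcard or fuzzy terms yourself before sending the query, since they may skip the field's analyzer:

```java
import java.util.Locale;

// Illustrative only: normalize a multi-term query term client-side
// (here just lowercasing) because wildcard/fuzzy terms may bypass
// the field's analysis chain on the server.
public class MultiTermNormalizer {
    static String normalize(String term) {
        return term.toLowerCase(Locale.ROOT);
    }
}
```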
Hi,
You can either use omitTermFreqAndPositions on that field or set a custom
similarity for that field that returns 1 for tf > 0.
http://wiki.apache.org/solr/SchemaXml#Common_field_options
http://wiki.apache.org/solr/SchemaXml#Similarity
-Original message-
> From:Amit Jha
> Sent: T
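The custom-similarity option above could look roughly like this (a standalone sketch; in a real Solr 4.x deployment the class would extend Lucene's DefaultSimilarity, override tf(), and be registered on the field via schema.xml):

```java
// Hypothetical sketch of the "tf flattened to 1" idea. Written as a
// plain class so the logic is visible; a real implementation would
// extend org.apache.lucene.search.similarities.DefaultSimilarity.
public class FlatTfSimilarity {
    // Any positive term frequency contributes the same score factor.
    public float tf(float freq) {
        return freq > 0f ? 1.0f : 0.0f;
    }
}
```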
Sounds like it's worth a try! Thanks Andre.
Tom
On 5 Dec 2012, at 17:49, Andre Bois-Crettez wrote:
> If you do grouping on source_id, it should be enough to request 3 times
> more documents than you need, then reorder and drop the bottom.
>
> Is a 3x overhead acceptable ?
>
>
>
> On 12/05/20
Hi,
The file descriptor count is always quite low. At the moment, after heavy
usage for a few days, file descriptor counts are between 100 and 150 and I don't
have any errors in the logs. My worry at the moment is around all the
CLOSE_WAIT connections I am seeing. This is particularly true on the
Hey all,
I'm in the process of migrating a single Solr 4.0 instance to a SolrCloud
setup for availability reasons.
After studying the wiki page for SolrCloud I'm not sure what the absolute
minimum setup is that would allow for one machine to go down.
Would it be enough to have one shard with one
Hi,
I currently have this setup:
Bring in data into the "description" schema and then have this code:
To then truncate the description and move it to "truncated_description".
This works fine.
I was wondering: is it possible that when I bring in data from another
source I actually bring it
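If the goal is only a length cap (an assumption on my part — the code in the original message didn't come through), schema.xml's copyField already supports this via maxChars:

```xml
<!-- Sketch: copy at most 300 chars of description into the second field.
     Field names are from the thread; the 300 limit is arbitrary. -->
<copyField source="description" dest="truncated_description" maxChars="300"/>
```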
-Original message-
> From:Mark Miller
> Sent: Wed 05-Dec-2012 23:23
> To: solr-user@lucene.apache.org
> Subject: Re: The shard called `properties`
>
> See the custom hashing issue - the UI has to be updated to ignore this.
Ah yes, i see it in clusterstate.json.
Thanks for the pointer
I'm sorry, I don't see how the resource loader awareness is relevant to
schema awareness? Or perhaps you didn't imply that? Good to know about the
resource loader though. Although I'm not sure what the resource loader does
at this point (but that's a side track).
I guess I use the core to get to t
It depends on if you are running embedded zk or an external zk ensemble.
One leader and a replica is all you need for Solr to allow one machine to go
down - but if those same machines are running zookeeper, you need 3.
You could also run zookeeper on one external machine and then it would be fine
"but if those same machines are running zookeeper, you need 3."
And one of those 3 can go down? I thought 3 was the minimum number of
zookeepers.
-- Jack Krupansky
-Original Message-
From: Mark Miller
Sent: Thursday, December 06, 2012 9:30 AM
To: solr-user@lucene.apache.org
Subject:
The quorum is the minimum, so it depends on how many you have running in the
ensemble. If it's three or four, then two is the quorum and therefore the
minimum. Three is regarded as a minimum in the ensemble because two makes no
sense.
-Original message-
> From:Jack Krupansky
> Sent: T
On Thu, Dec 6, 2012 at 9:56 AM, Markus Jelsma
wrote:
> The quorum is the minimun, so it depends on how many you have running in the
> ensemble. If it's three or four, then two is the quorum
I think that for 4 ZK servers, then 3 would be the quorum?
-Yonik
http://lucidworks.com
Hi users,
Could you please help us with tuning Solr search performance? We have tried
some performance testing on a Solr instance with 8 GB RAM and 50,000 records in
the index, and we got 33 concurrent users hitting the instance at an average of
17.5 hits per second with a response time of 2 seconds. As it is very high
On Dec 6, 2012, at 6:54 AM, Yonik Seeley wrote:
> On Thu, Dec 6, 2012 at 9:56 AM, Markus Jelsma
> wrote:
>> The quorum is the minimun, so it depends on how many you have running in the
>> ensemble. If it's three or four, then two is the quorum
>
> I think that for 4 ZK servers, then 3 would b
-Original message-
> From:Yonik Seeley
> Sent: Thu 06-Dec-2012 16:01
> To: solr-user@lucene.apache.org
> Subject: Re: Minimum HA Setup with SolrCloud
>
> On Thu, Dec 6, 2012 at 9:56 AM, Markus Jelsma
> wrote:
> > The quorum is the minimun, so it depends on how many you have running in
First, forget about master/slave with SolrCloud! Leaders really exist to
resolve conflicts, the old notion of M/S replication is largely irrelevant.
Updates can go to any node in the cluster, leader, replica, whatever. The
node forwards the doc to the correct leader based on a hash of the
, which
Thanks a lot guys!
On Thu, Dec 6, 2012 at 4:22 PM, Markus Jelsma wrote:
>
> -Original message-
> > From:Yonik Seeley
> > Sent: Thu 06-Dec-2012 16:01
> > To: solr-user@lucene.apache.org
> > Subject: Re: Minimum HA Setup with SolrCloud
> >
> > On Thu, Dec 6, 2012 at 9:56 AM, Markus Jelsma
However the tomcat logs are reporting:
INFO: Adding
'file:/opt/solr/contrib/extraction/lib/juniversalchardet-1.0.3.jar' to
classloader
Dec 6, 2012 3:42:57 PM org.apache.solr.core.SolrResourceLoader
replaceClassLoader
Original Message
Subject:Tika error
Date: Thu
On Dec 6, 2012, at 9:50 AM, "joe.cohe...@gmail.com"
wrote:
> Is there an out-of-the-box or have anyone already implemented a feature for
> collecting statistics on queries?
What sort of statistics are you talking about? Are you talking about
collecting information in aggregate about queries
On Wed, Dec 5, 2012 at 5:17 PM, Mark Miller wrote:
> See the custom hashing issue - the UI has to be updated to ignore this.
>
> Unfortunately, it seems that clients have to be hard coded to realize
> properties is not a shard unless we add another nested layer.
Yeah, I talked about this a while
Hi,
In most of the examples I have seen for configuring the
DirectSolrSpellChecker the minPrefix attribute is set to 1 (and this is the
default value as well).
Is there any specific reason for this - would performance take a hit if it
was set to 0? We'd like to support returning corrections which
Yeah, the main problem with it didn't really occur to me until I saw the
properties shard in the cluster view.
I started working on the UI to ignore it the other day and then never got there
because I was getting all sorts of weird 'busy' errors from svn for a while and
didn't have a clean chec
Hi All,
I followed the advice Michael and the timings reduced to couple of hours
now from 6-8 hours :-)
I have attached the solrconfig.xml we're using, can you let me know if I'm
missing something..
Thanks,
Sandeep
: I'm sorry, I don't see how the resource loader awareness is relevant to
: schema awareness? Or perhaps you didn't imply that? Good to know about the
No, my mistake ... typed one when I meant the other one.
: I guess I use the core to get to the schema then. Hmm, I may recall trying
: that at so
You'll need to tell us more about your custom component so that we can
make some suggestions as to how to update it to work with SolrCloud.
In particular: what exactly are you doing with the result from
getConfigDir() ? ... if you are just using it to build a path to a File
that you open to co
http://lucenerevolution.org/
Lucene Revolution 2013 will take place at The Westin San Diego on April 29
- May 2, 2013. Many of the brightest minds in open source search will
convene at this 4th annual Lucene Revolution to discuss topics and trends
driving the next generation of search. The co
: Hi - no we're not getting any errors because we enabled positions on all
: fields that are also listed in the qf-parameter. If we don't, and send a
: phrase query we would get an error such as:
:
: java.lang.IllegalStateException: field "h1" was indexed without position
data; cannot run
: Ph
Grouping should work:
group=true&group.field=source_id&group.limit=3&group.main=true
On Thu, Dec 6, 2012 at 2:35 AM, Tom Mortimer wrote:
> Sounds like it's worth a try! Thanks Andre.
> Tom
>
> On 5 Dec 2012, at 17:49, Andre Bois-Crettez wrote:
>
> > If you do grouping on source_id, it should be
Thanks, but even with group.main=true the results are not in relevancy (score)
order, they are in group order. Which is why I can't use it as is.
Tom
On 6 Dec 2012, at 19:00, Way Cool wrote:
> Grouping should work:
> group=true&group.field=source_id&group.limit=3&group.main=true
>
> On Thu,
Jason,
Thanks for raising it!
Erick,
That's what I've wanted to discuss for a long time. Frankly speaking, the
question is:
if old-school (master/slave) search deployments don't comply with the vision of
SolrCloud/ElasticSearch, does it mean that they are wrong?
Let me enumerate kinds of 'old-school sear
Hello,
What's your OS/CPU? Is it a VM or real hardware? Which JVM do you run? With
which parameters? Have you checked the GC log? What's the index size? What are
typical query parameters? What's the average number of results in the
query? Have you tried to run a query with debugQuery=true during hard loa
Hi Joe,
http://sematext.com/search-analytics/index.html is free and will give you a
bunch of reports about search (Solr or anything else). Not queries by IP,
though - for that you better grep logs.
Yes, you could also implement your own SearchComponent, assuming the
servers/LBs in front of Solr
1 is the minimum :)
2 makes no sense.
3 must be the most common number in the zoo.
Otis
--
Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html
On Thu, Dec 6, 2012 at 9:46 AM, Jack Krupansky wrote:
> "but if those same
One is the loneliest number that you'll ever do,
Two can be as bad as one, it's the loneliest number since the single Zoo.
Michael Della Bitta
Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271
www.appinions.com
Where Influence Isn
We measured that for just 3 nodes the overhead is around 100ms. We also noticed
that CPU spikes to 100% and some queries get blocked; this happens only when
the cloud has multiple nodes but does not happen on a single node. All the nodes
have the exact same configuration and JVM settings and hardware configu
Rewind.
If 1 is the minimum, what is the 3 "minimum" all about?
The zk web page does say "Three ZooKeeper servers is the minimum recommended
size for an ensemble, and we also recommend that they run on separate
machines" - but it does say "recommended".
But back to the original question - it
Slightly more recent link:
http://zookeeper.apache.org/doc/r3.4.5/zookeeperAdmin.html
-- Jack Krupansky
-Original Message-
From: Jack Krupansky
Sent: Thursday, December 06, 2012 5:21 PM
To: solr-user@lucene.apache.org
Subject: Re: Minimum HA Setup with SolrCloud
Rewind.
If 1 is the m
And that link includes this sentence: "For example, with four machines
ZooKeeper can only handle the failure of a single machine; if two machines
fail, the remaining two machines do not constitute a majority."
wunder
On Dec 6, 2012, at 2:25 PM, Jack Krupansky wrote:
> Slightly more recent link
In case you missed the parallel thread running right now, a read of the main
zookeeper admin web page is a good background to have:
http://zookeeper.apache.org/doc/r3.4.5/zookeeperAdmin.html
-- Jack Krupansky
-Original Message-
From: Jamie Johnson
Sent: Thursday, December 06, 2012 5:
On Thu, Dec 6, 2012 at 5:21 PM, Jack Krupansky wrote:
> If 1 is the minimum, what is the 3 "minimum" all about?
The minimum for running an ensemble (a cluster) and having any sort of
fault tolerance?
> The zk web page does say "Three ZooKeeper servers is the minimum recommended
> size for an ens
Jack,
The recommended ensemble configured size takes into consideration that you
might have a node failure. You can still run with two while you replace the
third, so it's sort of like RAID-5.
If you run with four configured nodes, you're still running with
RAID-5-like failure survival characteri
I just rethought what I wrote and it doesn't make any sense. :)
If you have two nodes left in a three-node ensemble,
how are ties broken? Or does ZooKeeper not resolve ties since it doesn't
tolerate partitions?
Michael
Michael Della Bitta
--
3 is the minimum if you want to allow a node to go down.
1 is the minimum if you want the thing to work at all - but if the 1 goes down,
ZooKeeper may stop working…
- Mark
On Dec 6, 2012, at 2:21 PM, Jack Krupansky wrote:
> Rewind.
>
> If 1 is the minimum, what is the 3 "minimum" all about?
On Dec 6, 2012, at 2:32 PM, Michael Della Bitta
wrote:
> I just rethought what I wrote and it doesn't make any sense. :)
>
> If you have two remaining nodes left when you have a three node ensemble,
> how are ties broken? Or does Zookeeper not resolve ties since it doesn't
> tolerate partition
I trust that you have the right answer, Mark, but maybe I'm just struggling
to parse this statement: "the remaining two machines do not constitute a
majority."
If you start with 3 zk and lose one, you have an ensemble that does not
"constitute a majority".
So, the question is what can or can
On Dec 6, 2012, at 2:55 PM, Jack Krupansky wrote:
> I mean, there must be some hard-core downside, other than that you can't lose
> any more nodes.
Nope, not really. You just can't lose any more nodes.
Technically, you will also lose a bit of read performance - but write
performance generall
I also did a test running load directed at one single server in the cloud
and checked the CPU usage of the other servers. It seems that even if no load
is directed to those servers there is a CPU spike each minute. Did you
also do this test on SolrCloud? Any observations or suggestions?
On Thu, Dec 6, 2012 at 5:55 PM, Jack Krupansky wrote:
> I trust that you have the right answer, Mark, but maybe I'm just struggling
> to parse this statement: "the remaining two machines do not constitute a
> majority."
>
> If you start with 3 zk and lose one, you have an ensemble that does not
>
On Thu, Dec 6, 2012 at 12:17 PM, Sandeep Mestry wrote:
> I followed the advice Michael and the timings reduced to couple of hours now
> from 6-8 hours :-)
Just changing from mmap to NIO, eh? What does your system look like?
operating system, JVM, drive, memory, etc?
-Yonik
http://lucidworks.com
see: http://wiki.apache.org/solr/DistributedSearch
joins aren't supported in distributed search. Any time you have more than
one shard in SolrCloud, you are, by definition, doing distributed search.
Best
Erick
On Wed, Dec 5, 2012 at 10:16 AM, adm1n wrote:
> Hi,
>
> I'm running some join query
I suspect that you're seeing a timeout issue and the simplest fix might be
to up the timeouts, probably at the servlet-level.
You might get some evidence that this is the issue if your log files for
the time when this happens show some unusual activity, garbage collection
is a popular reason for t
I've seen the "Waiting until we see..." message as well, it seems for me to
be an artifact of bouncing servers rapidly. It took a lot of patience to
wait until the timeoutin value got all the way to 0, but when it did the
system recovered.
As to your original problem, are you possibly getting page
Not quite. Too much memory for the JVM means that it starves the operating
system and the caching that goes on there. Objects that consume memory are created
all the time in Solr. They won't be recovered until some threshold is
passed. So you can be sure that the more memory you allocate to the JVM,
the m
Why not do this at the app level? You can simply reorder the docs returned
in your groups by score and display it that way.
Or am I misunderstanding your requirement?
Best
Erick
On Thu, Dec 6, 2012 at 11:03 AM, Tom Mortimer wrote:
> Thanks, but even with group.main=true the results are not in
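The app-level reordering suggested above might look something like this (Doc is a stand-in for whatever result object the client actually uses; nothing here is SolrJ API):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Sketch of client-side re-sorting of grouped results by score:
// flatten the groups (up to group.limit docs each) into one list
// and sort that list in descending score order.
public class Regroup {
    static class Doc {
        final String id;
        final float score;
        Doc(String id, float score) { this.id = id; this.score = score; }
    }

    static List<Doc> byScore(List<List<Doc>> groups) {
        List<Doc> flat = new ArrayList<Doc>();
        for (List<Doc> g : groups) {
            flat.addAll(g);
        }
        Collections.sort(flat, new Comparator<Doc>() {
            public int compare(Doc a, Doc b) {
                return Float.compare(b.score, a.score); // descending
            }
        });
        return flat;
    }
}
```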
+1 to using IntelliJ's remote debugging facilities.
I've done this with Tomcat too - just edit catalina.sh to add the parameters to
the JVM invocation that the IntelliJ remote run configuration suggests.
With Tomcat you'll have to build the war using the Ant build, but that's more
sensible anyw
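For reference, the catalina.sh edit described above usually amounts to a line like this (the port is arbitrary and should match the IntelliJ remote run configuration, which prints the exact string to copy):

```shell
# e.g. appended to catalina.sh or bin/setenv.sh; address/port is arbitrary
JAVA_OPTS="$JAVA_OPTS -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005"
```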
But that is the context I was originally referring to - that with 4 zk you
can lose only one, that you can't lose two. So, if you want to tolerate a
loss on one, 4 zk would be the minimum... but then it was claimed that you
COULD start with 3 zk and loss of one would be fine. I mean whether you
well, you could probably do what you want. Go ahead and index on the "super
cool AWS instance", just don't bring the replicas up yet. All the indexing
is going to this machine. Once your index is constructed, bring up
replicas. Old-style replication will take place and you should be off to
the race
The Zookeeper ensemble knows the total size. It does not adjust it each time
that a machine is partitioned or down.
Two machines is not a quorum for a four machine ensemble.
Why do you think that the documentation would get this wrong?
wunder
On Dec 6, 2012, at 4:14 PM, Jack Krupansky wrote:
Yes - it means that 001 went down (or more likely had its connection to
ZooKeeper interrupted? That's what I mean about a session timeout - if the
solr->zk link is broken for longer than the session timeout that will trigger a
leader election and when the connection is reestablished, the node w
It's still an unresolved mystery, for now.
-- Jack Krupansky
-Original Message-
From: Walter Underwood
Sent: Thursday, December 06, 2012 7:30 PM
To: solr-user@lucene.apache.org
Subject: Re: Minimum HA Setup with SolrCloud
The Zookeeper ensemble knows the total size. It does not adjust
What is the mystery? Two is not more than half of four. Therefore, two machines
is not a quorum for a four machine Zookeeper ensemble.
wunder
On Dec 6, 2012, at 4:50 PM, Jack Krupansky wrote:
> It's still an unresolved mystery, for now.
>
> -- Jack Krupansky
>
> -Original Message- Fro
Ok, we think we found the issue here. When SolrCloud is started without
specifying the numShards argument, it starts with a single shard but still
thinks that there are multiple shards, so it forwards every single query to
all the nodes in the cloud. We did a tcpdump on the node where queries
There are some gains to be made in Solr's distributed search code. A few
weeks ago I spent time profiling dist search using dtrace/btrace and
found some areas for improvement. I planned on writing up some blog posts
and providing patches but I'll list them off now in case others have input.
1)
The part I still find confusing is that if you start with 3 and lose 1, you
have 2, which means you can't always break a tie, right? How is this
explained? As opposed to saying that 4 is the minimum if you need to
tolerate a loss of 1.
-- Jack Krupansky
-Original Message-
From: Walt
Configure an ensemble of three. When one goes down, you still have an ensemble
of three, but with one down. The ensemble size is not reset after failures.
wunder
On Dec 6, 2012, at 5:20 PM, Jack Krupansky wrote:
> The part I still find confusing is that if you start with 3 and lose 1, your
> h
And this is precisely why the mystery remains - because you're only
describing half the picture! Describe the rest of the picture - including
what exactly those two zks can and can't do, including resolution of ties
and the concept of "constitu.
-- Jack Krupansky
-Original Message-
F
Oops...
And this is precisely why the mystery remains - because you're only
describing half the picture! Describe the rest of the picture - including
what exactly those two zks can and can't do, including resolution of ties
and the concept of "constituting a majority" and a quorum.
I'm not s
I should have sent this some time ago:
https://issues.apache.org/jira/browse/SOLR-3940 "Rejoining the leader election
incorrectly triggers the code path for a fresh cluster start rather than fail
over."
The above is a somewhat ugly bug.
It means that if you are playing around with recovery and
Ryan, my new best friend! Please, file JIRA issue(s) for these items!
I'm sure you will get some feedback.
- Mark
On Dec 6, 2012, at 5:09 PM, Ryan Zezeski wrote:
> There are some gains to be made in Solr's distributed search code. A few
> weeks about I spent time profiling dist search using d
On Dec 6, 2012, at 5:08 PM, sausarkar wrote:
> We solved the issue by explicitly adding numShards=1 argument to the solr
> start up script. Is this a bug?
Sounds like it…perhaps related to SOLR-3971…not sure though.
- Mark
Hi,
I'm running Solr 3.3 with Apache Tomcat 7.0.19. Sometimes my
Tomcat gets hung, giving the below error in the log.
SEVERE: Full Import
failed:org.apache.solr.handler.dataimport.DataImportHandlerException:
java.lang.OutOfMemoryError: PermGen space
at
org.apache.solr.handler.dataimport.Do
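PermGen exhaustion is usually addressed by giving the permanent generation more room; a common Tomcat-era workaround (values are illustrative, not from the thread) is:

```shell
# e.g. in Tomcat's bin/setenv.sh; the 256m value is illustrative
CATALINA_OPTS="$CATALINA_OPTS -XX:MaxPermSize=256m"
```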
On Thu, Dec 6, 2012 at 8:42 PM, Jack Krupansky wrote:
> And this is precisely why the mystery remains - because you're only
> describing half the picture! Describe the rest of the picture - including
> what exactly those two zks can and can't do, including resolution of ties
> and the concept of "
You might consider implementing some jmx tooling. Nagios is one of several
such engines.
wiki.apache.org/tomcat/FAQ/Monitoring
On Thursday, December 6, 2012, aniljayanti wrote:
> Hi,
>
> Im generating SOLR using SOLR 3.3, Apache Tomcat 7.0.19. Some times my
> Tomcat get hanged giving below
Thank you. I will read about these commands.
I don't copy anything anywhere. I just edit the code and click Run, IDEA does
everything for me. I guess, IDEA's artifacts are exactly for these routines.
Anyway, there are no such instructions, you described, anywhere in the solr
wiki, so it's hard t
I think I've figured out how to express it: A zk node can offer its services
if it is able to communicate with more than half of the specified ensemble
size, which assures that there is no split brain, where two or more
competing groups of inter-communicating nodes could offer services that
con
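The formulation above reduces to a little arithmetic (illustrative only; class and method names are mine):

```java
// Illustrative quorum arithmetic for a ZooKeeper ensemble.
// The ensemble size is the *configured* number of servers; it is not
// reduced when a server fails, which is why 2-of-4 cannot serve.
public class ZkQuorum {
    // A strict majority of the configured ensemble.
    static int quorum(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // The ensemble can serve (no split brain possible) only while a
    // majority of the configured servers can talk to each other.
    static boolean canServe(int ensembleSize, int live) {
        return live >= quorum(ensembleSize);
    }
}
```

This matches the thread: an ensemble of 3 survives one failure (2 >= 2), while an ensemble of 4 also survives only one failure, since 2 of 4 is not a majority.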
On 7 December 2012 12:30, Zeng Lames wrote:
> Hi,
>
> wanna to know is there any plugin / tool to import data from Excel to Solr.
[...]
You could export to CSV from Excel, and import the CSV into Solr:
http://wiki.apache.org/solr/UpdateCSV
Regards,
Gora