RE: Scaling Solr on VMWare

2013-03-26 Thread Frank Wennerdahl
Hi Otis and thanks for your response. We are indeed suspecting that the problem with only 2 cores being used might be caused by the virtual environment. We're hoping that someone with experience of running Solr on VMWare might know more about this or the other issues we have. The servlet we're ru

Accessing multicore setup using solrj

2013-03-26 Thread J Mohamed Zahoor
Hi, I have a multicore setup with 2 cores, "core0" and "core1". How do I insert a document into core1? I am using the code below: searchServer = new CloudSolrServer(zooQourumUrl); searchServer.setDefaultCollection("core1"); searchServer.connect(); and I get a "No live solr servers" exception. But I could s

multicore vs multi collection

2013-03-26 Thread J Mohamed Zahoor
Hi, I am kind of confused about multicore versus multi-collection. The docs don't seem to clarify this. Can someone enlighten me on the difference between a core and a collection? Are they the same? ./zahoor

Re: lucene 42 codec

2013-03-26 Thread Mario Casola
thank you very much. Mario 2013/3/25 Chris Hostetter : > > : I noticed that apache solr 4.2 uses the lucene codec 4.1. How can I > : switch to 4.2? > > Unless you've configured something oddly, Solr is already using the 4.2 > codec. > > What you are probably seeing is that the fileformat for seve

Re: multicore vs multi collection

2013-03-26 Thread Furkan KAMACI
Did you check that document: http://wiki.apache.org/solr/SolrCloud#A_little_about_SolrCores_and_Collections It says: On a single instance, Solr has something called a SolrCore that is essentially a single index. If you want multiple indexes, you create multiple S

Re: multicore vs multi collection

2013-03-26 Thread J Mohamed Zahoor
Thanks. This makes it clearer than the wiki. How do you create multiple collections that can have different schemas? ./zahoor On 26-Mar-2013, at 3:52 PM, Furkan KAMACI wrote: > Did you check that document: > http://wiki.apache.org/solr/SolrCloud#A_little_about_SolrCores_and_Collections It > says:

Debugging Map Reduce Jobs at Solr

2013-03-26 Thread Furkan KAMACI
Is there an easy way (tools, etc.) to debug Map Reduce jobs in Solr?

Customize Solr Fragment

2013-03-26 Thread meghana
I want to use the Regexp Fragmenter in my Solr highlighting feature to customize my fragments. As per our requirements, we need to return 25 words before and after the highlighted term. To do so, I have written the regular expression below: ((?:\w+\W*){25})\b(span class)\b((?:\W*\w+){25}) This regular expression
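The idea behind the expression above can be tried outside Solr. The following is only a sketch, assuming Python's `re` module rather than the Java regex engine that Solr uses; the helper names and the `MARK` marker are hypothetical. It uses `{0,n}` instead of an exact count so that a match is still found when fewer than n words surround the marker.

```python
import re

def context_pattern(marker, n):
    """Build a pattern matching `marker` plus up to n words on either side."""
    before = r"((?:\w+\W+){0,%d})" % n   # up to n words, each followed by separators
    after = r"((?:\W+\w+){0,%d})" % n    # up to n words, each preceded by separators
    return re.compile(before + "(" + re.escape(marker) + ")" + after)

def fragment(text, marker, n):
    """Return the marker with its surrounding context, or None if absent."""
    match = context_pattern(marker, n).search(text)
    return match.group(0) if match else None

if __name__ == "__main__":
    sample = "one two three four MARK five six seven eight"
    print(fragment(sample, "MARK", 2))
```

With an exact repetition count such as `{25}`, the same pattern would return no match for any text that has fewer than 25 words on one side of the marker.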

Solr Phonetic Search Highlight issue in search results

2013-03-26 Thread Soumyanayan Kar
When we are issuing a query with Phonetic Search, it is returning the correct documents but not returning the highlights. When we use Stemming or Synonym searches we are getting the proper highlights. For example, when we execute a phonetic query for the term fakt(ContentSearchPhonetic:fakt) in

Re: Solr 4 automatic DB updates for sync using Delta query DIH with scheduler

2013-03-26 Thread majiedahamed
kindly help to update the index rather that adding the index -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-automatic-DB-updates-for-sync-using-Delta-query-DIH-with-scheduler-tp4051114p4051340.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr 4 automatic DB updates for sync using Delta query DIH with scheduler

2013-03-26 Thread Gora Mohanty
On 26 March 2013 14:06, majiedahamed wrote: > kindly help to update the index rather that adding the index If the documents have uniqueKey fields (defined in schema.xml, and named id by default) with values that already exist in the Solr index, such documents will automatically be updated, rather
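The overwrite behaviour described above can be pictured with a toy model. This is only a sketch that treats the index as a dictionary keyed by the uniqueKey value; it is not Solr code, and the function name is hypothetical.

```python
def add_documents(index, docs, unique_key="id"):
    """Add docs to a toy index; a doc whose key already exists replaces the old one."""
    for doc in docs:
        index[doc[unique_key]] = doc
    return index

if __name__ == "__main__":
    index = add_documents({}, [{"id": "1", "title": "old"}])
    add_documents(index, [{"id": "1", "title": "new"}, {"id": "2", "title": "other"}])
    print(len(index), index["1"]["title"])
```

In this model a re-imported row with the same key never produces a second entry, which matches the "updated rather than added" behaviour the reply describes.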

Re: Debugging Map Reduce Jobs at Solr

2013-03-26 Thread Jan Høydahl
Hi, Please elaborate your question. Solr does not have any M/R jobs, so you have to let us know your setup. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com 26. mars 2013 kl. 11:34 skrev Furkan KAMACI : > Is there any easy way(tool

Re: Solr 4 automatic DB updates for sync using Delta query DIH with scheduler

2013-03-26 Thread majiedahamed
Thanks. This concept is a bit clearer now. When using delta-import in Solr 4.x specifically, the automatic update feature adds a new document to the index with the latest value in the field (because we use the _version_ field in our schema). Therefore we will have two results with different _vers

Re: Accessing multicore setup using solrj

2013-03-26 Thread Mark Miller
Are you using SolrCloud mode? - Mark On Mar 26, 2013, at 4:49 AM, J Mohamed Zahoor wrote: > Hi I am having a multi core setup with 2 core "core0" and core1". > How do i insert doc in core 1? > > I am using as below. > > searchServer = new CloudSolrServer(zooQourumUrl); > searchServer.setDefau

Re: Solr 4 automatic DB updates for sync using Delta query DIH with scheduler

2013-03-26 Thread majiedahamed
It is working now! I was actually using a case-insensitive type for the unique key, and I am surprised it does not work. When I changed it to a plain string, automatic updates started working.

Re: Solr 4.2 - Slave Index version is higher than Master

2013-03-26 Thread Uomesh
Hi Mark, Further details: My master has not changed in the last 24 hours, but the slave index Version and Gen have increased. If I do a full import, the slave is replicated and Version and Gen are reset. Version Gen Size Master: 1364238678758 111 768.23 KB Slave: 13642992

RE: Solr 4.2 - Slave Index version is higher than Master

2013-03-26 Thread John, Phil (CSS)
Sorry, Mark, just realised. Yes, we're replicating: schema.xml, stopwords.txt Regards, Phil. -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: 22 March 2013 20:31 To: solr-user@lucene.apache.org Subject: Re: Solr 4.2 - Slave Index version is higher than

Slow performance on distributed search

2013-03-26 Thread qungg
Hi, I have 40 shards running on a 48-core machine with 256GB RAM (the data is about 40 GB). I am using the legacy distributed search setup, so I have one additional shard with no data. Queries go to this shard, and it merges the results from the other 40 shards. From the log, I see

Re: Slow performance on distributed search

2013-03-26 Thread Otis Gospodnetic
Hi, Does your query really need to search all 40 shards? If not, dispatching the query only to shards that need to be queried will help. Otis -- SOLR Performance Monitoring - http://sematext.com/spm/index.html On Tue, Mar 26, 2013 at 11:17 AM, qungg wrote: > Hi, > > I have 40 shards runnin

Re: Slow performance on distributed search

2013-03-26 Thread qungg
Thank you for the reply. Queries do need to go to all 40 shards, though 40 is not a final number; my setup can be changed if search time decreases. I'm using 40 shards because we have a tool that can index faster with more shards.

Re: Debugging Map Reduce Jobs at Solr

2013-03-26 Thread Otis Gospodnetic
Hi, Solr doesn't really do MapReduce jobs. Maybe you mean distributed search where queries are dispatched to N servers and then responses are merged/reduced to top N and returned? Otis -- Solr & ElasticSearch Support http://sematext.com/ On Tue, Mar 26, 2013 at 6:34 AM, Furkan KAMACI wrote

Re: Debugging Map Reduce Jobs at Solr

2013-03-26 Thread Furkan KAMACI
Ok, thanks for your responses. Actually I was wondering about indexing and reindexing from Nutch to Solr and debugging them. According to your responses, I think it makes no difference on the Solr side whether the data comes through MapReduce or not. 2013/3/26 Otis Gospodnetic > Hi, > > Solr doesn't

Re: Debugging Map Reduce Jobs at Solr

2013-03-26 Thread Gora Mohanty
On 26 March 2013 21:32, Furkan KAMACI wrote: > > Ok, thanks for your responses. Actually I was wondering about indexing and > reindexing from nutch to Solr and debugging them. I think according to > your > responses there is no difference for Solr side that data is coming through > a map reduce or

RE: Any experience with adding documents batch sizes?

2013-03-26 Thread Benjamin, Roy
Thanks Otis! -Original Message- From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] Sent: Monday, March 25, 2013 7:41 PM To: solr-user@lucene.apache.org Subject: Re: Any experience with adding documents batch sizes? Hi, You'll have to test because there is no general rule that wo

Solr Fuzzy search on short string

2013-03-26 Thread Jimmy Dean
I did a fuzzy search on solr. The result is a little strange to me. Query "carj~" can match "carl". But "cari" can't match "carl". As a matter of fact, car[x]~, [x]>"i" can match "carl". Is this the correct behavior? Jimmey
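For reference, plain edit distance can be checked by hand. This is a minimal sketch of the classic Levenshtein distance in Python; it is not Lucene's fuzzy-query implementation, which may apply further rules on top of raw edit distance.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute, or keep if equal
        prev = cur
    return prev[-1]

if __name__ == "__main__":
    for term in ("carj", "cari"):
        print(term, levenshtein(term, "carl"))
```

On this measure "carj" and "cari" are both exactly one substitution away from "carl", so edit distance alone would not explain a difference between the two queries.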

Loadtesting solr/tomcat7 and tomcat stops responding entirely

2013-03-26 Thread Nate Fox
I'm new to solr and I'm load testing our setup to see what we can handle. I'm using solrmeter and my problem is a bit odd: * When I set solrmeter to run 4000 queries/min, it will handle a few hundred queries and then tomcat will stop responding completely to requests (even though according to lsof

Re: Solr Fuzzy search on short string

2013-03-26 Thread Jack Krupansky
Could you provide the precise query URLs? I don't quite follow the notation you are using, especially: car[x]~, [x]>"i". I mean, are you saying that q=cari~ does not match "carl"? (You left out the tilde in your message.) -- Jack Krupansky -Original Message- From: Jimmy Dean Sent:

RE: Slow performance on distributed search

2013-03-26 Thread Michael Ryan
What are the values of the start and rows parameters you are using? When you say the controller shard takes a long time, how long is it taking - 100ms, 1s, 10s...? -Michael -Original Message- From: qungg [mailto:qzheng1...@gmail.com] Sent: Tuesday, March 26, 2013 11:17 AM To: solr-user

Re: Solr 4.2 - Slave Index version is higher than Master

2013-03-26 Thread Mark Miller
That's pretty interesting. The slave should have no way of doing this without a commit… - Mark On Mar 26, 2013, at 11:07 AM, Uomesh wrote: > Hi Mark, > > Further details: My master details has not changed since last 24 hours but > Slave index version and Gen has increased. If i do the full im

Re: Loadtesting solr/tomcat7 and tomcat stops responding entirely

2013-03-26 Thread Otis Gospodnetic
Hi, In short, certain data structures need to load from index in the beginning, (for sorting and faceting) caches need to warm up, JVM needs to warm up, etc., so going slowly in the beginning makes sense. Why things die after that is a different Q. Maybe it OOMs? Maybe queries are very complex?

Re: Query question

2013-03-26 Thread Chris Hostetter
: So as I said, the search result I want is the one with the highest score, : but I was hoping to find a way to boost the score based on the number of : terms it finds (or matches well) so that I can differentiate between a close : match and nowhere near. Any suggestions? In general, this already

Re: Conditional Field Search without affecting score.

2013-03-26 Thread Chris Hostetter
: document accordingly. This works good in most cases. but we had a case where : we ran into issue. : : DocA // Common title and is same for all county so no additional titles. : Fighter : : DocB : The Ultimate Street Fighter // Default : Ultimate Fighter // For UK : : : now querying f

Re: Loadtesting solr/tomcat7 and tomcat stops responding entirely

2013-03-26 Thread Nate Fox
I was wondering if the warmup stuff was one of the culprits (we don't have warmups at all - the configs are pretty stock). As for the system, it seems capable of quite a bit more: memory usage is ~30%, jvm-memory (from the dashboard) is very low (~220MB out of 3GB) and load is below 1.00. The seed da

RE: Slow performance on distributed search

2013-03-26 Thread qungg
For start=100,000&rows=10: even though each individual shard takes only < 10ms to query, the merging process done by the controller takes about a minute. Looking at the logs, each shard is giving the controller shard 100,010 rows of data, and because there are 40 shards in total, the controller i

Re: Loadtesting solr/tomcat7 and tomcat stops responding entirely

2013-03-26 Thread Otis Gospodnetic
Hi Nate, Try adding some warmup queries and making sure the setting for using the cold searcher in solrconfig.xml is set to false. Your warmup queries should use facets and sorting if your normal queries use them. In SPM you'll actually see how much time warming up takes, so you'll get a better

Re: OutOfMemoryError

2013-03-26 Thread Shawn Heisey
On 3/25/2013 1:34 AM, Arkadi Colson wrote: I changed my system memory to 12GB. Solr now gets -Xms2048m -Xmx8192m as parameters. I also added -XX:+UseG1GC to the java process. But now the whole machine crashes! Any idea why? Mar 22 20:30:01 solr01-gs kernel: [716098.077809] java invoked oom-kille

Re: To get Term Offsets of a term per document

2013-03-26 Thread Chris Hostetter
: Is there a way to get Term Offsets of a given term per document without : enabling the termVectors ? : : Is it that Lucene index stores the positions but not the offsets by default : - is it correct ? correct -- unless you specifically enable termVectors, the offset information isn't availa

Re: Slow performance on distributed search

2013-03-26 Thread Joel Bernstein
Take a look at this jira: https://issues.apache.org/jira/browse/SOLR-659 I have not tried this out myself but it looks promising. On Tue, Mar 26, 2013 at 2:55 PM, qungg wrote: > for start=100,000&row=10. event though each individual shard take only < > 10ms > to query, the merging process don

Re: Error creating collection using CORE-API

2013-03-26 Thread Joel Bernstein
What version of Solr are you using? I'll see if I can reproduce the problem. Your initial error, which I believe was because Solr could not find the matching config-set in zookeeper, I've run into many times. But I've never had a problem when I've re-issued the command with a matching config-set.

Re: Slow performance on distributed search

2013-03-26 Thread Walter Underwood
Why on earth are you starting at row 100,000? What use case is that? --wunder On Mar 26, 2013, at 11:55 AM, qungg wrote: > for start=100,000&row=10. event though each individual shard take only < 10ms > to query, the merging process done by controller would take about a minutes. > > By looking

Re: Tlog File not removed after hard commit

2013-03-26 Thread Shawn Heisey
On 3/24/2013 10:02 AM, Niran Fajemisin wrote: We import about 1.5 million documents on a nightly basis using DIH. During this time, we need to ensure that all documents make it into index otherwise rollback on any errors; which DIH takes care of for us. We also disable autoCommit in DIH but in

Re: lucene 42 codec

2013-03-26 Thread Shawn Heisey
On 3/25/2013 10:59 AM, Mario Casola wrote: I noticed that apache solr 4.2 uses the lucene codec 4.1. How can I switch to 4.2? The index format did not change significantly enough to warrant new filenames. The only significant change I'm actually aware of is that now termvectors are compresse

Re: Too many fields to Sort in Solr

2013-03-26 Thread Joel Bernstein
I'm pretty sure that the boost function is going to load up the fieldCache for that field, which will have the same memory footprint as sorting. You may want to try out disk-based doc values (new in 4.2). Documentation is sparse on this, but you can specify a field as being docValues="true" and set it to a

Re: Loadtesting solr/tomcat7 and tomcat stops responding entirely

2013-03-26 Thread Michael Della Bitta
Nate, We just cleared up a problem similar to this by ditching Elastic Load Balancer and switching over to the APR connector in Tomcat. Are you using either of those? Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-

Re: Slow performance on distributed search

2013-03-26 Thread Jack Krupansky
(You mean, other than "deep paging".) -- Jack Krupansky -Original Message- From: Walter Underwood Sent: Tuesday, March 26, 2013 3:47 PM To: solr-user@lucene.apache.org Subject: Re: Slow performance on distributed search Why on earth are you starting at row 100,000? What use case is h

Re: Slow performance on distributed search

2013-03-26 Thread Walter Underwood
That is extremely deep paging. That is page 10,000 with ten hits on each page. No human will look at ten thousand pages of results. The system really does need to rank the first 100,000 before it knows which document should be at rank 100,001. There is no way around that. wunder On Mar 26, 201
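The cost discussed in this thread can be worked through with the figures quoted (40 shards, start=100,000, rows=10). The sketch below is a toy model in Python, not Solr's merge code; the function names are hypothetical, and each shard's hits are assumed to arrive as (score, doc_id) pairs sorted by descending score.

```python
import heapq
from itertools import islice

def rows_fetched(shards, start, rows):
    """Rows each shard must return, and the total the merging node receives."""
    per_shard = start + rows      # any shard could hold all of the top hits
    return per_shard, shards * per_shard

def merged_page(shard_results, start, rows):
    """Merge per-shard hit lists and cut out the requested page."""
    merged = heapq.merge(*shard_results, key=lambda hit: -hit[0])
    return list(islice(merged, start, start + rows))

if __name__ == "__main__":
    print(rows_fetched(40, 100_000, 10))
    page = merged_page([[(0.9, "a"), (0.5, "b")], [(0.8, "c"), (0.1, "d")]], 1, 2)
    print(page)
```

With those figures each shard sends 100,010 rows and the merging node handles 4,000,400 of them to produce a page of ten, which is why the per-shard query time says little about the total latency.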

Re: Slow performance on distributed search

2013-03-26 Thread Michael Della Bitta
We've been able to speed up deep paging through big sets by using a filter query to segment them as well as start/rows paging. Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence
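One way to picture the segmenting approach is shown below. This is only a sketch under stated assumptions: the field name `doc_num` is hypothetical, it is assumed to be a numeric field whose ranges partition the documents, and only the standard `q`, `fq`, `start`, and `rows` request parameters are used. The original message does not say how the segments were actually chosen.

```python
def segment_filters(field, lo, hi, segment_size):
    """Split the inclusive range [lo, hi] into range filter queries."""
    filters = []
    start = lo
    while start <= hi:
        end = min(start + segment_size - 1, hi)
        filters.append(f"{field}:[{start} TO {end}]")
        start = end + 1
    return filters

def page_params(query, fq, page, rows):
    """Request parameters for one page inside one segment."""
    return {"q": query, "fq": fq, "start": page * rows, "rows": rows}

if __name__ == "__main__":
    for fq in segment_filters("doc_num", 0, 24, 10):
        print(page_params("*:*", fq, 0, 10))
```

Because paging restarts at zero inside each segment, `start` stays small. The trade-off is that the combined result order follows the segments, so this fits bulk traversal better than ranked browsing.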

RE: Slow performance on distributed search

2013-03-26 Thread Michael Ryan
Depending on your use case and the particulars of your system, a previous post I made about using a FieldCache in SolrIndexSearcher for id retrieval (see http://osdir.com/ml/solr-user.lucene.apache.org/2013-01/msg01574.html) may help you. In your case, it might not be the merging process on the

There are no SolrCores running. Using the Solr Admin UI currently requires at least one SolrCore.

2013-03-26 Thread Furkan KAMACI
I use Solr 4.2 on CentOS 6.4 at AWS, and I have deployed Solr WARs to Tomcat on two different Amazon instances. *When I run them without SolrCloud they are OK.* However, I want to run them as SolrCloud, starting the embedded ZooKeeper on one of them. When I run: ps aux | grep catalina I get t

Re: Loadtesting solr/tomcat7 and tomcat stops responding entirely

2013-03-26 Thread Nate Fox
We're not using ELB and I have no idea which connector I'm using - I'm guessing whatever is default (I'm a total noob). This is from my server.xml: -- Nate Fox Sr Systems Engineer o: 310.658.5775 m: 714.248.5350 Follow us @NEOGOV and on Facebook

Re: There are no SolrCores running. Using the Solr Admin UI currently requires at least one SolrCore.

2013-03-26 Thread Mark Miller
java.lang.NoSuchMethodError: There must be something off with the jars you are using - a mix of versions or something. - Mark On Mar 26, 2013, at 5:18 PM, Furkan KAMACI wrote: > I use Solr 4.2 on Centos 6.4 at AWS and I have deployed solr wars into two > different amazon instances at tomcats

Custom ValueSource, filtering with frange, and caching woes

2013-03-26 Thread Timothy Potter
Hi, I have a custom ValueSource that I'd like to use as a filter, something like: fq={!frange l=1 u=1}MYFUNC(some_field, addl args) Based on the args passed in and the value in some_field, MYFUNC returns either 1 or 0. This works but it doesn't seem like the results get cached as subsequent req

Re: Custom ValueSource, filtering with frange, and caching woes

2013-03-26 Thread Chris Hostetter
: I have a custom ValueSource that I'd like to use as a filter, something : like: ... : My question is whether I should expect Solr to cache this filter in the : filterCache? In other words, is there any reason to expect frange filters Did you remember to implement consistent and meaningf

Re: There are no SolrCores running. Using the Solr Admin UI currently requires at least one SolrCore.

2013-03-26 Thread Furkan KAMACI
Yes, I cleaned and recompiled with ant and that fixed it, because there were somehow some other jars in my lib. How could you tell that there was a mix of jars? Just because of the NoSuchMethodError, or from something else? 2013/3/26 Mark Miller > java.lang.NoSuchMethodError: > > There must be somethin

[ScriptUpdateProcessor] Params aren't being picked up from solrconfig

2013-03-26 Thread Rene Nederhand
Hi, I'm trying to use the new ScriptUpdateProcessor in trunk with a simple Python script. Everything seems to work nicely, but none of the params I specify in solrconfig.xml are being picked up. The error I'm getting is: "NameError: global name 'params' is not defined". Am I doing something w

SolrCloud On Different AWS Instances With Embedded Zookeeper

2013-03-26 Thread Furkan KAMACI
I have two Amazon Web Services instances and have set up SolrCloud on them. The Solr WARs are deployed into Tomcat. When I start the Solr instance that runs ZooKeeper, it is OK; as usual, it cannot find the second shard. When I start the second Solr instance, it throws an error. This is the first Solr config: JAVA_OPTS="$JAVA_OPTS -Dbo

Re: multicore vs multi collection

2013-03-26 Thread Furkan KAMACI
Also from there http://wiki.apache.org/solr/SolrCloud: *Q:* What is the difference between a Collection and a SolrCore? *A:* In classic single node Solr, a SolrCore is basically equivalent to a Collection. It presents one

Re: [ScriptUpdateProcessor] Params aren't being picked up from solrconfig

2013-03-26 Thread Chris Hostetter
: none of the params I specify in solrconfig.xml are being picked up. The : error I'm getting is: "NameError: global name 'params' is not defined". ... : : : summarize.py : : : : abstract : summary : ...that list of "params" isn't in

Re: Solr Phonetic Search Highlight issue in search results

2013-03-26 Thread Erick Erickson
How would you expect it to highlight successfully? The term is "fakt", there's nothing built in (and, indeed couldn't be) to un-phoneticize it into "fact" and apply that to the Content field. The whole point of phonetic processing is to do a lossy translation from the word into some variant, losing

Re: Solr 4 automatic DB updates for sync using Delta query DIH with scheduler

2013-03-26 Thread Erick Erickson
Hmmm, glad it's working, but something you mentioned really makes me nervous. You say you use the _version_ field in your schema. I hope you're not setting that field in your Solr document, if you ever turn on any of the SolrCloud options I'm not sure what would happen, that field is best left alon

Re: Tlog File not removed after hard commit

2013-03-26 Thread Erick Erickson
Shawn: If you do hard commits, no matter what the openSearcher value, and the machine crashes when it comes back up you'll see those commits. How I'd approach it if I absolutely _had_ to do a complete rollback would be something like force a replication to a dedicated machine before the import, t

Re: There are no SolrCores running. Using the Solr Admin UI currently requires at least one SolrCore.

2013-03-26 Thread Erick Erickson
Answering for Mark, yep. NoSuchMethod indicates that a class being referenced is in one of the jars, but that a method in that class is not found, which is exceedingly rare since the compiler should have already complained if one references a method in a class that truly isn't there. FWIW, Eric

Re: Slow performance on distributed search

2013-03-26 Thread Erick Erickson
See if this one makes it through: bq: Sorting 4 million values really shouldn't take that long True. But transmitting 100,010 results back to the originating machine from each of 40 machines, unpacking them and _then_ sorting them could be another story ... Erick On Tue, Mar 26, 2013 at 5:07 PM

Re: Loadtesting solr/tomcat7 and tomcat stops responding entirely

2013-03-26 Thread Chris Hostetter
: * When I set solrmeter to run 4000 queries/min, it will handle a few : hundred queries and then tomcat will stop responding completely to requests : (even though according to lsof -i it is still listening and the java : process is still running). Have you tried using jstack to generate

[WEBINAR] - "Lucene/Solr 4 – A Revolution in Enterprise Search Technology"

2013-03-26 Thread Erik Hatcher
Excuse the blatant marketing, though for the benefit of the community... Join me tomorrow/today (March 27) for a webinar on what's new and improved in Lucene and Solr 4. It's the last call to register. Help me break the webinar syst

Re: Custom ValueSource, filtering with frange, and caching woes

2013-03-26 Thread Timothy Potter
Thanks for the reply Chris - equals and hashCode are implemented correctly ... I ended up solving my issue by enabling the PostFilter support in the frange parser using: {!frange l=1 u=1 cost=200 cache=false} This works for me because the queries that use my custom ValueSource are pretty tightly

Re: Loadtesting solr/tomcat7 and tomcat stops responding entirely

2013-03-26 Thread Michael Della Bitta
You're using the blocking IO connector, which isn't so great for heavy loads. Give this a shot... You'll end up with 8192 max connections by default, although this is tunable too: Run: apt-get install libapr1 libtcnative-1 Add this to the list of Listeners at the top of server.xml: These inst