RE: SolrCloud loadbalancing, replication, and failover

2013-04-18 Thread David Parks
Wow! That was the most pointed, concise discussion of hardware requirements I've seen to date, and it's fabulously helpful, thank you Shawn! We currently have 2 servers that I can dedicate about 12GB of ram to Solr on (we're moving to these 2 servers now). I can upgrade further if it's needed & ju

Re: TooManyClauses: maxClauseCount is set to 1024

2013-04-18 Thread Shawn Heisey
On 4/18/2013 11:02 PM, sawanverma wrote: > Giving content:[* TO *] gives the same error but when I give content:[a TO z] > it works fine. Can you please explain what does it mean when I give > content:[a TO z]? Can I use this as workaround? The datatype of content field > is text_en. That synta

RE: TooManyClauses: maxClauseCount is set to 1024

2013-04-18 Thread sawanverma
Shawn, Giving content:[* TO *] gives the same error, but when I give content:[a TO z] it works fine. Can you please explain what it means when I give content:[a TO z]? Can I use this as a workaround? The datatype of the content field is text_en. Thanks again for your replies and suggestions. Regar

Re: SolrCloud loadbalancing, replication, and failover

2013-04-18 Thread Shawn Heisey
On 4/18/2013 8:12 PM, David Parks wrote: > I think I still don't understand something here. > > My concern right now is that query times are very slow for 120GB index (14s > on avg), I've seen a lot of disk activity when running queries. > > I'm hoping that distributing that query across 2 serve

DirectSolrSpellChecker : vastly varying spellcheck QTime times.

2013-04-18 Thread SandeepM
Hi! I am using SOLR 4.2.1. My solrconfig.xml contains the following: text_spell MySpellchecker spell solr.DirectSolrSpellChecker internal 0.5 2 1 5 3 0.01 10 id MySpell

RE: SolrCloud loadbalancing, replication, and failover

2013-04-18 Thread David Parks
I think I still don't understand something here. My concern right now is that query times are very slow for 120GB index (14s on avg), I've seen a lot of disk activity when running queries. I'm hoping that distributing that query across 2 servers is going to improve the query time, specifically I

Re: Solr indexing

2013-04-18 Thread uohzoaix
Just change the date fieldType to string. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-indexing-tp4057017p4057136.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr system and numbers

2013-04-18 Thread Alexandre Rafalovitch
Do you mean a range (e.g. [4 TO 17]) or a prefix (e.g. 10*)? For range you need to index it as a number. For prefix, string is probably better. Then, just use standard query parameters. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrera

Re: Solr system and numbers

2013-04-18 Thread uohzoaix
If I want to search on subsets of numbers, what can I do? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-system-and-numbers-tp482519p4057134.html

Re: What are the pros and cons Having More Replica at SolrCloud

2013-04-18 Thread Manuel Le Normand
On the query side, another downside I see would be that for a given memory pool, you'd have to share it with more cores because every replica uses its own cache. True for the inner Solr caching (JVM's heap) and OS caching as well. Adding a replicated core creates a new data set (index) that will

Updating clusterstate from the zookeeper

2013-04-18 Thread Manuel Le Normand
Hello, After creating a distributed collection on several different servers I sometimes get to deal with failing servers (cores appear "not available" = grey) or failing cores ("Down / unable to recover" = brown / red). In case I wish to delete this erroneous collection (through collection API) on

Re: What are the pros and cons Having More Replica at SolrCloud

2013-04-18 Thread Timothy Potter
re: more replicas - pro: you can scale your query processing workload because you have more nodes available to service queries, eg 1,000 QPS sent to Solr with 5 replicas, then each is only processing roughly 200 QPS. If you need to scale up to 10K QPS, then add more replicas to distribute the incr
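The arithmetic in the replica example above can be sketched in a few lines. This is only an illustration, assuming queries are spread perfectly evenly across replicas (real load balancing is rougher); the function names are hypothetical and not part of Solr, and the 200 QPS target is simply carried over from the example.

```python
import math


def qps_per_replica(total_qps: float, replicas: int) -> float:
    """Average query load per replica, assuming an even spread (an idealization)."""
    if replicas < 1:
        raise ValueError("need at least one replica")
    return total_qps / replicas


def replicas_needed(total_qps: float, target_qps_per_replica: float) -> int:
    """Smallest replica count that keeps each replica at or below the target load."""
    return math.ceil(total_qps / target_qps_per_replica)


print(qps_per_replica(1000, 5))      # 200.0
print(replicas_needed(10_000, 200))  # 50
```

Under these assumptions, holding each replica at roughly 200 QPS while serving 10K QPS would take about 50 replicas per shard.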

PositionLengthAttribute - Does it do anything at all?

2013-04-18 Thread Hayden Muhl
I've been playing around with the PositionLengthAttribute for a few days, and it doesn't seem to have any effect at all. I'm aware that position length is not stored in the index, as explained in this blog post. http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html Howeve

Re: Paging and sorting in Solr

2013-04-18 Thread hassancrowdc
thnx -- View this message in context: http://lucene.472066.n3.nabble.com/Paging-and-sorting-in-Solr-tp4057000p4057098.html

Re: Change the response of delta import

2013-04-18 Thread Shawn Heisey
On 4/18/2013 1:59 PM, hassancrowdc wrote: Is there any way i can change the response xml from delta import query: locathost:8080/solr/devices/dataimport?command=delta-import&commit=true I want to change the response. The response is created by the dataimporthandler source code. It's a contri

Change the response of delta import

2013-04-18 Thread hassancrowdc
Is there any way i can change the response xml from delta import query: locathost:8080/solr/devices/dataimport?command=delta-import&commit=true I want to change the response. -- View this message in context: http://lucene.472066.n3.nabble.com/Change-the-response-of-delta-import-tp4057093.html

RE: Making fields unavailable for return to specific end points.

2013-04-18 Thread Andrew Lundgren
Hmm... Just found this JIRA: https://issues.apache.org/jira/browse/SOLR-3191 I think I have answered my question. -Original Message- From: Andrew Lundgren [mailto:lundg...@familysearch.org] Sent: Thursday, April 18, 2013 1:21 PM To: solr-user@lucene.apache.org Subject: Making fields un

Making fields unavailable for return to specific end points.

2013-04-18 Thread Andrew Lundgren
We have a few internal fields that we would like to restrict from being returned in result sets. I have seen how fl is used to specify fields that you do want returned; I am looking for the opposite. There are just a few fields that don't make sense to return to our clients. Is there

updating documents unintentionally adds extra values to certain fields

2013-04-18 Thread joyce chan
Hi I am using solr 4.2, and have set up spatial search config as below http://wiki.apache.org/solr/SpatialSearch#Schema_Configuration But every time I make an update to a document, http://wiki.apache.org/solr/UpdateJSON#Updating_a_Solr_Index_with_JSON more values of the *_coordinates fields get

Re: TooManyClauses: maxClauseCount is set to 1024

2013-04-18 Thread Shawn Heisey
On 4/18/2013 11:53 AM, sawanverma wrote: Shawn, Thanks a lot for your reply. But I am confused again if the following query is complex. http://localhost:8983/solr/test/select/?q=content:*&fl=content&hl=true&hl.fl=content&hl.maxAnalyzedChars=31375&start=64&rows=1&sort=obs_date%20desc I hardly

Re: Query Elevation Component

2013-04-18 Thread davers
I want to elevate certain documents differently depending on a certain fq parameter in the request. I've read of somebody coding solr to do this but no code was shared. Where would I start looking to implement this feature myself? -- View this message in context: http://lucene.472066.n3.nabble.c

RE: TooManyClauses: maxClauseCount is set to 1024

2013-04-18 Thread sawanverma
Shawn, Thanks a lot for your reply. But I am confused again if the following query is complex. http://localhost:8983/solr/test/select/?q=content:*&fl=content&hl=true&hl.fl=content&hl.maxAnalyzedChars=31375&start=64&rows=1&sort=obs_date%20desc Is that because of content : *? The only unusual thin

Re: SolrCloud vs Solr master-slave replication

2013-04-18 Thread Lance Norskog
Run checksums on all files in both master and slave, and verify that they are the same. TCP/IP has a checksum algorithm that was state-of-the-art in 1969. On 04/18/2013 02:10 AM, Victor Ruiz wrote: Also, I forgot to say... the same error started to happen again.. the index is again corrupted :(
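The checksum comparison suggested above could look roughly like the following. It is a sketch under stated assumptions: both index directories are readable from one machine (for example after copying or mounting the slave's copy), SHA-256 is an arbitrary choice of digest, and `checksum_tree` and `differing_files` are hypothetical helper names, not Solr tooling.

```python
import hashlib
from pathlib import Path


def checksum_tree(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB chunks so large segment files are not loaded whole.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def differing_files(master_dir, slave_dir):
    """Sorted relative paths that are missing on one side or whose digests differ."""
    a, b = checksum_tree(master_dir), checksum_tree(slave_dir)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

An empty list from `differing_files(master_index_dir, slave_index_dir)` would mean the two copies match byte for byte; any names returned are the files worth inspecting. The comparison is only meaningful while neither side is mid-replication.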

Sorting on alias fields

2013-04-18 Thread Stephane Gamard
Hi all, I am trying to sort results based on multiple fields aliased as one. Is that possible? While solr does not complain (no error, results OK, etc etc etc) it fails to sort the hits appropriately. I've attached the query, relevant schema part and result. I am very curious to know if that

shard query return 500 on large data set

2013-04-18 Thread Jie Sun
Hi - when I execute a shard query like: [myhost]:8080/solr/mycore/select?q=type:message&rows=14&...&qt=standard&wt=standard&explainOther=&hl.fl=&shards=solrserver1:8080/solr/mycore,solrserver2:8080/solr/mycore,solrserver3:8080/solr/mycore everything works fine until I query against a large

Re: solr 3.5 core rename issue

2013-04-18 Thread Jie Sun
yeah I realize using ${solr.core.name} for dataDir must be the cause for the issue we see... it is fair to say the SWAP and RENAME just create an alias that still points to the old datadir. if they can not fix it then it is not a bug :-) at least we understand exactly what is going on there. than

Re: Max http connections in CloudSolrServer

2013-04-18 Thread Shawn Heisey
On 4/18/2013 6:42 AM, J Mohamed Zahoor wrote: I dont yet know if this is the reason... I am looking if jetty has some limit on accepting connections.. Are you using the Jetty included with Solr, or a Jetty installed separately? The Jetty included with Solr has a maxThreads value of 1 in

Re: TooManyClauses: maxClauseCount is set to 1024

2013-04-18 Thread Shawn Heisey
On 4/18/2013 6:02 AM, sawanverma wrote: Hi Yonik, Thanks for your reply. I tried increasing the maxClauseCount to a bigger value. But what could be the ideal value and will not that hit the performance? What are the chances that if we increase the value we will not face this issue again? Ch

Re: Solr indexing

2013-04-18 Thread Jack Krupansky
Solr dates are always "Z", GMT. -- Jack Krupansky -Original Message- From: hassancrowdc Sent: Thursday, April 18, 2013 11:49 AM To: solr-user@lucene.apache.org Subject: Solr indexing Solr is not showing the dates i have in database. any help? is solr following any specific timezone?

Re: facet.method enum vs fc

2013-04-18 Thread Mingfeng Yang
20G is allocated to Solr already. Ming On Wed, Apr 17, 2013 at 11:56 PM, Toke Eskildsen wrote: > On Wed, 2013-04-17 at 20:06 +0200, Mingfeng Yang wrote: > > I am doing faceting on an index of 120M documents, > > on the field of url[...] > > I would guess that you would need 3-4GB for that. > H

Re: Paging and sorting in Solr

2013-04-18 Thread Jack Krupansky
Maybe you have your name field as "text" rather than "string". Don't try sorting "text" fields - make a copy (copyField) to a string field and sort the string field. So, for example, have "name" as "text" for keyword search, and "name_s" as "string" for sorting (and faceting.) -- Jack Krupansk

Re: Solr indexing

2013-04-18 Thread Andy Lester
On Apr 18, 2013, at 10:49 AM, hassancrowdc wrote: > Solr is not showing the dates i have in database. any help? is solr following > any specific timezone? On my database my date is 2013-04-18 11:29:33 but > solr shows me "2013-04-18T15:29:33Z". Any help Solr knows nothing of timezones. Solr
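The four-hour gap in the question is what a conversion to UTC would produce. Below is a minimal sketch of that conversion, assuming the database holds local wall-clock time at UTC-4; that offset is inferred from the two timestamps in the question, not stated anywhere in the thread, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the database stores local time at a fixed UTC-4 offset.
# This is inferred from the 11:29:33 vs 15:29:33 example, not confirmed.
DB_OFFSET = timezone(timedelta(hours=-4))


def db_local_to_solr(db_value: str, tz=DB_OFFSET) -> str:
    """Convert a 'YYYY-MM-DD HH:MM:SS' local timestamp to Solr's UTC 'Z' form."""
    local = datetime.strptime(db_value, "%Y-%m-%d %H:%M:%S").replace(tzinfo=tz)
    return local.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def solr_to_db_local(solr_value: str, tz=DB_OFFSET) -> str:
    """Convert a Solr UTC timestamp back to local wall-clock time for display."""
    utc = datetime.strptime(solr_value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return utc.astimezone(tz).strftime("%Y-%m-%d %H:%M:%S")


print(db_local_to_solr("2013-04-18 11:29:33"))   # 2013-04-18T15:29:33Z
print(solr_to_db_local("2013-04-18T15:29:33Z"))  # 2013-04-18 11:29:33
```

In other words, the stored value is the same instant; converting back to the local zone on the client side for display recovers the database's wall-clock time.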

Solr indexing

2013-04-18 Thread hassancrowdc
Solr is not showing the dates i have in database. any help? is solr following any specific timezone? On my database my date is 2013-04-18 11:29:33 but solr shows me "2013-04-18T15:29:33Z". Any help -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-indexing-tp4057017.htm

Re: Paging and sorting in Solr

2013-04-18 Thread hassancrowdc
Hi, I double-checked. It is the field. If I sort on the manufacturer field it sorts, but if I sort on name it does not. Both fields are defined identically. Is there any difference in sorting alphabetically or by size of the word? -- View this message in context: http://lucene.472066.n3

Re: Solr 4.2 fl issue

2013-04-18 Thread Yonik Seeley
When using a field name that doesn't follow conventions (basically like Java identifiers), try this: fl=field(098765-765-788558-7654_userid) Or enclose it in quotes if it's really a wacky field name: fl=field("098765-765-788558-7654_userid") -Yonik http://lucidworks.com On Thu, Apr 18, 2013 a
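A client could apply the workaround above automatically when building the fl parameter. The sketch below is an assumption-laden illustration: the identifier pattern is a simplified stand-in for "basically like Java identifiers", `fl_entry` is a hypothetical helper, and names containing a double quote are rejected rather than escaped because the thread does not say how escaping works.

```python
import re
from urllib.parse import urlencode

# Simplified stand-in for "looks like a Java identifier" (an assumption).
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def fl_entry(name: str) -> str:
    """Return the field name as-is if conventional, else wrap it in field("...")."""
    if _IDENT.match(name):
        return name
    if '"' in name:
        raise ValueError("double quotes in field names are not handled by this sketch")
    return f'field("{name}")'


print(fl_entry("userid"))                         # userid
print(fl_entry("098765-765-788558-7654_userid"))  # field("098765-765-788558-7654_userid")

# URL-encode the parameters before sending the request.
query_string = urlencode({"q": "*:*", "fl": fl_entry("098765-765-788558-7654_userid")})
```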

Re: Paging and sorting in Solr

2013-04-18 Thread Oussama Jilal
I am sure it does the sorting first (since I have always done that). On 04/18/2013 02:49 PM, hassancrowdc wrote: I have done paging using solr rows and start query attributes. But now it shows me result with that is sorted page wise. I meant if i have the following scenario: rows=25&start=0&sort=ma
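The "sort first, then page" behaviour described in this reply can be illustrated with a small in-memory model. This is not Solr code, only a sketch of the semantics; the `page` helper and the sample manufacturer values are made up for the illustration.

```python
def page(docs, sort_key, start=0, rows=10):
    """Sort the full result set first, then slice out one page (start/rows)."""
    ordered = sorted(docs, key=sort_key)
    return ordered[start:start + rows]


# Hypothetical sample data.
docs = [{"manufacturer": m} for m in ["Sony", "Apple", "Nokia", "HTC", "Dell"]]

first = page(docs, lambda d: d["manufacturer"], start=0, rows=2)
second = page(docs, lambda d: d["manufacturer"], start=2, rows=2)

print([d["manufacturer"] for d in first])   # ['Apple', 'Dell']
print([d["manufacturer"] for d in second])  # ['HTC', 'Nokia']
```

Because the slice is taken from the fully sorted list, consecutive pages continue one global ordering rather than each page being sorted on its own.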

Paging and sorting in Solr

2013-04-18 Thread hassancrowdc
I have done paging using the Solr rows and start query attributes. But now it shows me results that are sorted page-wise. I mean, if I have the following scenario: rows=25&start=0&sort=manufacturer asc It will give me the first 25 matching results and then sort only those. I want it to sort all

solr4 : disable updateLog

2013-04-18 Thread Jamel ESSOUSSI
Hi, If I disable (comment out) the updateLog block, will this affect the indexing result: -- View this message in context: http://lucene.472066.n3.nabble.com/solr4-disable-updateLog-tp4056998.html

more results when adding more criterias

2013-04-18 Thread Kai Becker
Hi, I have a field which has data like this: Where can have from 1 to 10 letters strings and can have up to 4 digits. It is defined like this: When the user enters foo, i search for foo directly or something that starts with "foo ". I don't wa

Re: SolrCloud loadbalancing, replication, and failover

2013-04-18 Thread Timothy Potter
Hi Dave, This sounds more like a budget / deployment issue vs. anything architectural. You want 2 shards with replication so you either need sufficient capacity on each of your 2 servers to host 2 Solr instances or you need 4 servers. You need to avoid starving Solr of necessary RAM, disk performa

RE: SolrCloud loadbalancing, replication, and failover

2013-04-18 Thread David Parks
But my concern is this, when we have just 2 servers: - I want 1 to be able to take over in case the other fails, as you point out. - But when *both* servers are up I don't want the SolrCloud load balancer to have Shard1 and Replica2 do the work (as they would both reside on the same physical serv

RE: Tokenize on paragraphs and sentences

2013-04-18 Thread Alex Cougarman
Thanks, Jack. Sorry, took me a while to reply :) It sounds like sentence/paragraph level searches won't be easy. Warm regards, Alex -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: 15 April 2013 5:09 PM To: solr-user@lucene.apache.org Subject: Re: Tokenize

Re: Select Queris While Merging Indexes

2013-04-18 Thread Otis Gospodnetic
If you understand the underlying lucene searcher it will be easy to understand what's happening at solr level. Otis Solr & ElasticSearch Support http://sematext.com/ On Apr 18, 2013 3:22 AM, "Furkan KAMACI" wrote: > Thanks for explanations. I should read deep about the lifecycle of Searcher > ob

Re: SolrCloud loadbalancing, replication, and failover

2013-04-18 Thread Otis Gospodnetic
Correct. This is what you want if server 2 goes down. Otis Solr & ElasticSearch Support http://sematext.com/ On Apr 18, 2013 3:11 AM, "David Parks" wrote: > Step 1: distribute processing > > We have 2 servers in which we'll run 2 SolrCloud instances on. > > We'll define 2 shards so that both ser

Re: Solr 4.2 fl issue

2013-04-18 Thread Otis Gospodnetic
Hi, What is the issue though? :) Otis Solr & ElasticSearch Support http://sematext.com/ On Apr 18, 2013 2:53 AM, "William Bell" wrote: > We are getting an issue when using a GUID got a field in Solr 4.2. Solr 3.6 > is fine. Something like: > > fl=098765-765-788558-7654_userid as a string stored

Re: zkState changes too often

2013-04-18 Thread Mark Miller
On Apr 18, 2013, at 8:40 AM, jmozah wrote: > > > On 16-Apr-2013, at 11:16 PM, Mark Miller wrote: > >> Are you using the concurrent low pause garbage collector or perhaps G1? > > > I use the default one which comes in jdk 1.7. It varies by platform, but 99% that means you are using the

Re: Max http connections in CloudSolrServer

2013-04-18 Thread J Mohamed Zahoor
I dont yet know if this is the reason... I am looking if jetty has some limit on accepting connections.. ./zahoor On 18-Apr-2013, at 12:52 PM, J Mohamed Zahoor wrote: > > Thanks for this. > The reason i asked this was.. when i fire 30 queries simultaneously from 30 > threads using the same

Re: zkState changes too often

2013-04-18 Thread jmozah
On 16-Apr-2013, at 11:16 PM, Mark Miller wrote: > Are you using the concurrent low pause garbage collector or perhaps G1? I use the default one which comes in jdk 1.7. > > Are you able to use something like visualvm to pinpoint what the bottleneck > might be? Unfortunately.. it is pro

stats.facet not working for timestamp field

2013-04-18 Thread J Mohamed Zahoor
Hi I am using Solr 4.1 with 6 shards. I want to find out some "price" stats for all the days in my index. I ended up using the stats component like "stats=true&stats.field=price&stats.facet=timestamp", but it throws an error like Invalid Date String:' #1;#0;#0;#0;'[my(#0;' My Question is :

Re: Solr using a ridiculous amount of memory

2013-04-18 Thread John Nielsen
> You are missing an essential part: Both the facet and the sort > structures needs to hold one reference for each document > _in_the_full_index_, even when the document does not have any values in > the fields. > Wow, thank you for this awesome explanation! This is where the penny dropped for me.

RE: TooManyClauses: maxClauseCount is set to 1024

2013-04-18 Thread sawanverma
Yonik, When i remove the sort part from the query below it works fine. But with sort it throws the exception http://localhost:8983/solr/test/select/?q=content:*&fl=content&hl=true&hl.fl=content&hl.maxAnalyzedChars=31375&start=64&rows=1&sort=obs_date%20desc -- > Throws Exception http://localhos

RE: TooManyClauses: maxClauseCount is set to 1024

2013-04-18 Thread sawanverma
Hi Yonik, Thanks for your reply. I tried increasing the maxClauseCount to a bigger value. But what could be the ideal value and will not that hit the performance? What are the chances that if we increase the value we will not face this issue again? As you asked pasting below the full trace of

Re: TooManyClauses: maxClauseCount is set to 1024

2013-04-18 Thread pravesh
Update: Also remove your range queries from the main query and specify it as a filter query. Best Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/TooManyClauses-maxClauseCount-is-set-to-1024-tp4056965p4056969.html

RE: TooManyClauses: maxClauseCount is set to 1024

2013-04-18 Thread sawanverma
Thanks Pravesh. But won't that hit the query performance? Still what would be the ideal value to increase? Say this error may come even if we increase the value from 1024 to say 5120? Have tried increasing the value and it had hit the performance. Regards, Sawan From: pravesh [via Lucene] [mai

Re: TooManyClauses: maxClauseCount is set to 1024

2013-04-18 Thread Yonik Seeley
Can you provide a full stack trace of the exception? There's a maxClauseCount in solrconfig.xml that you can increase to work around the issue. -Yonik http://lucidworks.com On Thu, Apr 18, 2013 at 7:31 AM, sawanverma wrote: > Its quite confusing about this error. > > I had a situation where i

Re: TooManyClauses: maxClauseCount is set to 1024

2013-04-18 Thread pravesh
Just increase the value of /maxClauseCount/ in your solrconfig.xml. Keep it large enough. Best Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/TooManyClauses-maxClauseCount-is-set-to-1024-tp4056965p4056966.html

TooManyClauses: maxClauseCount is set to 1024

2013-04-18 Thread sawanverma
It's quite confusing about this error. I had a situation where I have to turn on the highlighting. In some cases though the number of docs found for a particular query was for example say 2, the highlighting was coming only for 1. I did some checks and found that that particular text searched was i

Re: Solr using a ridiculous amount of memory

2013-04-18 Thread Toke Eskildsen
On Thu, 2013-04-18 at 11:59 +0200, John Nielsen wrote: > Yes, thats right. No search from any given client ever returns > anything from another client. Great. That makes the 1 core/client solution feasible. [No sort & facet warmup is performed] [Suggestion 1: Reduce the number of sort fields by

Re: Solr using a ridiculous amount of memory

2013-04-18 Thread John Nielsen
> > > http://172.22.51.111:8000/solr/default1_Danish/search > > [...] > > > &fq=site_guid%3a(10217) > > This constraints to hits to a specific customer, right? Any search will > only be in a single customer's data? > Yes, thats right. No search from any given client ever returns anything from anot

Re: SolrCloud vs Solr master-slave replication

2013-04-18 Thread Victor Ruiz
Also, I forgot to say... the same error started to happen again.. the index is again corrupted :( -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-vs-Solr-master-slave-replication-tp4055541p4056926.html

Re: SolrCloud vs Solr master-slave replication

2013-04-18 Thread Victor Ruiz
Thank you again for your answer Shawn. Network card seems to work fine, but we've found segmentation faults, so now our hosting provider is going to run a full hw check. Hopefully they'll replace the server and problem will be solved Regards, Victor -- View this message in context: http://l

Re: Solr using a ridiculous amount of memory

2013-04-18 Thread Toke Eskildsen
On Thu, 2013-04-18 at 08:34 +0200, John Nielsen wrote: > [Toke: Can you find the facet fields in any of the other caches?] > Yes, here it is, in the field cache: > http://screencast.com/t/mAwEnA21yL > Ah yes, mystery solved, my mistake. > http://172.22.51.111:8000/solr/default1_Danish/search

Re: Max http connections in CloudSolrServer

2013-04-18 Thread J Mohamed Zahoor
Thanks for this. The reason I asked this was.. when I fire 30 queries simultaneously from 30 threads using the same CloudSolrServer instance, some queries get fired after a delay.. sometimes the delay is 30-50 seconds... In solr logs i can see.. 20+ queries get fired almost immediately... but s

Re: Select Queris While Merging Indexes

2013-04-18 Thread Furkan KAMACI
Thanks for the explanations. I should read in depth about the lifecycle of Searcher objects. Should I read about them in a Lucene book, or is there any Solr documentation or book that covers it? 2013/4/18 Jack Krupansky > "merging indexes" > > The proper terminology is "merging segments". > > Until the new, merg

SolrCloud loadbalancing, replication, and failover

2013-04-18 Thread David Parks
Step 1: distribute processing We have 2 servers on which we'll run 2 SolrCloud instances. We'll define 2 shards so that both servers are busy for each request (improving response time of the request). Step 2: Failover We would now like to ensure that if either of the servers goes down (we