Migrating from Solr 3.6.1 to Solr 4

2013-01-05 Thread Jorge Luis Betancourt Gonzalez
Hi:

I'm currently working with solr 3.6.1, but solr 4 has great features like the 
ones bundled with SolrCloud. The content in the index is really not the problem 
for the transition; the thing is that I have a large app written in PHP + Solarium 
that interacts with the index in solr 3. As far as I know there is no support 
for solr 4 in solarium. So my question is: is it possible to use a solr 3.6.1 
frontend that gets the data from a solr 4 behind the scenes, or is there any 
other workaround for this?

Greetings!
10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS 
INFORMATICAS...
CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION

http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci


Re: Migrating from Solr 3.6.1 to Solr 4

2013-01-05 Thread Upayavira
Try pointing your app at 4.0. I converted an app recently. Here are the
steps I took (as I recall):

 * get the original solrconfig.xml for the release I'm using
 * diff that and my solrconfig.xml
 * apply those changes to a 4.0 solrconfig.xml
 * try to start up solr with this new solrconfig, an old schema and
 an old index
 * fix each problem you find in the schema 
- some class names have changed
- you may want to delete some field definitions that you're not
using
- you'll need to copy the _version_ field from the 4.0 schema

I found my app was able to search/index without any difficulty via the
XML/HTTP interface.
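
As a quick sanity check after the schema fixes, something like this SolrJ
sketch can index and fetch one document (the URL and core name are
assumptions; the app in this thread is PHP/Solarium, but any client
exercising the HTTP interface will do):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class MigrationSmokeTest {
    public static void main(String[] args) throws Exception {
        // SolrJ 4.0 client pointed at one core
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

        // index a throwaway document
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "migration-smoke-1");
        solr.add(doc);
        solr.commit();

        // and make sure it comes back
        long hits = solr.query(new SolrQuery("id:migration-smoke-1"))
                .getResults().getNumFound();
        System.out.println("hits: " + hits);
    }
}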

Your mileage may vary, but for that particular app, that is what it
took. 

Note, 4.0 can work in a 3.x way (old style replication, etc). You don't
need to use SolrCloud etc when using 4.0.

Upayavira

On Sat, Jan 5, 2013, at 08:20 AM, Jorge Luis Betancourt Gonzalez wrote:
> Hi:
> 
> I'm currently working with solr 3.6.1, but solr 4 has great features like
> the ones bundled with SolrCloud. The content in the index is really not
> the problem for the transition; the thing is that I have a large app written
> in PHP + Solarium that interacts with the index in solr 3. As far as I
> know there is no support for solr 4 in solarium. So my question is: is it
> possible to use a solr 3.6.1 frontend that gets the data from a solr 4
> behind the scenes, or is there any other workaround for this?
> 
> Greetings!


Re: Does solr care about section order in schema.xml?

2013-01-05 Thread Shawn Heisey

On 1/5/2013 2:32 AM, Alexandre Rafalovitch wrote:

Does schema.xml have a formal XML schema? In other words, what are the
restrictions on the order in which declarations inside the file can appear?

I have seen some examples where types come before fields and some where it
is reversed. Some with copyFields separate and some where they seem to be
intermixed.


The order of fields and types was intentionally reversed in newer 
examples because they wanted a beginner to readily see the fields when 
they look at the example.  With the fieldType entries first, the actual 
field information can get a little lost.  I seem to remember seeing a 
message from Yonik about this reversal either on the dev list or in a 
Jira issue, but now I can't find it.


I don't think order matters much for the major sections.  My addled 
brain seems to remember something about copyField directives being 
processed in the order they appear, but I could be wrong about that part.


In my own config, copyField entries are at the end.  Looking at the 
newest 4x example, they are between fields and types.


Thanks,
Shawn



Re: Solr 3.6.2 or 4.0

2013-01-05 Thread vijeshnair
Thanks guys, it really helps. I will now go ahead with my original plan of
using 4.0 for this project; I should be able to update you soon on
this.





Question about GC logging timestamps

2013-01-05 Thread Shawn Heisey
I have a question about java GC logging.  Here's a log entry that I'm 
looking at:


2013-01-04T16:37:32.694-0700: 101832.244: [GC 101832.244: [ParNew: 
3722124K->419392K(3774912K), 9.1200100 secs] 
5800224K->2591046K(7969216K), 9.1201970 secs] [Times: user=10.46 
sys=45.66, real=9.12 secs]
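
(For context: entries in this format come from flags along the lines of
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log; the
first field is the wall-clock date stamp and the second is seconds since
JVM start.)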


This is a GC that took over 9 seconds.  The timestamp is at 16:37:32 ... 
but is that the time that the GC started, or is it the time that it 
ended?  If it is the start time, then the problem I am investigating is 
most likely caused by GC pauses, but if it is end time, then I need to 
look for another cause.


I have been unable to find any definitive answer one way or the other. 
I even went as far as grabbing the OpenJDK source and trying to decipher 
that, but that proved too large a task.  I'm actually using the Oracle 
JVM, but I couldn't locate the full Oracle source code.


If anyone knows the answer to this question, please let me know.  An 
official URL explaining the situation would be very nice.


Thanks,
Shawn


Re: SolrCloud and Join Queries

2013-01-05 Thread Hassan

Thanks Per and Otis,

It is much clearer now, but I have a question about adding new solr nodes 
and collections.
I have a dedicated zookeeper instance. Let's say I have uploaded my 
configuration to zookeeper using "zkcli" and named it, say, 
"configuration1".
Now I want to create a new solrcloud from scratch with two solr nodes. I 
need to create a new collection (with one shard) called "customer1" 
using the configuration name "configuration1". I have tried different 
ways using the Collections API and zkcli linkconfig/downconfig, but I cannot 
get it to work. The collection is only available on one node. The example 
"collection1" works as expected, where one node has the leader shard and 
the other node has the replica. See the cloud graph: 
http://imageshack.us/f/706/selection008p.png/


What is the correct way to dynamically add collections to already 
existing nodes and new nodes?


Thank you,
Hs
On 05/01/13 09:07, Otis Gospodnetic wrote:

Hi,

I think things will work for Hassan as he described them.  The key is not
to shard in his case, that's all.

Hassan, yes, 1-2M docs is small. But beware of creating a crazy
number (e.g. thousands) of collections per server, as each collection has
some cost.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Fri, Jan 4, 2013 at 5:28 AM, Per Steffensen  wrote:


On 1/4/13 9:21 AM, Hassan wrote:


Hi,

I am considering SolrCloud for our applications, but I have run into the
limitation of not being able to use Join Queries in distributed searches.
Our requirements are the following:
- SolrCloud will serve many applications where each application "index"
is separate from the others. Each application really is a customer
deployment, and we need to isolate customers' data from each other.
- Join queries are required. Queries will only look at one customer at a
time.
- Since data volume for each customer is small by Solr/Lucene standards
(1-2 million documents is small, right?


Yes

  ), we are really interested in the replication aspect of SolrCloud more
than distributed search.

I am considering the following SolrCloud design, with questions:
- Start SolrCloud with 1 shard only. This should allow join queries to
work correctly since all documents will be available in the same shard
(index). Is this a correct assumption?
- Each customer will have its own collection in the SolrCloud.


You can't have only one shard and several collections. A collection
consists of a number of shards, but a shard "belongs" to a collection, so
two different collections do not use the same shard. Shard is "below"
collection in the concept-hierarchy, so to speak.

  Do collections provide me with data isolation between customers?
Yes?
Depends on what you mean by "isolation". Since different collections
use different shards, and each shard basically has its own lucene index
(a set of lucene indices if you use replication), and distinct lucene indices
typically persist in different disk-folders, you will get "isolation" in
the sense that data for different customers will be stored in
different disk-folders.

  - Adding more nodes as replicas of the single shard to achieve
replication and fault tolerance.

Thank you,
Hs


Not sure I understand completely what you want to achieve, but you might
want to have a collection per customer. One shard per collection = one
shard per customer = (as long as we do not consider replication) one lucene
index per customer = one data-disk-folder per customer. You should be able
to do join queries inside the specific customer's shard.
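
A sketch of such a query, with an invented child/parent field pair
(parent_id/id) and an assumed host:

http://localhost:8983/solr/customer1/select?q={!join from=parent_id to=id}type:comment

This returns documents whose id matches the parent_id of documents matching
type:comment, all within the single shard of the customer1 collection.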

Regards, Per Steffensen






Re: SolrCloud and Join Queries

2013-01-05 Thread Per Steffensen
Did you remember to add the replicationFactor parameter when you created your 
"customer1" and "customer2" collections/shards?
http://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_Collections_API 
(note that the maxShardsPerNode and createNodeSet params are not available 
in 4.0.0, but will be in 4.1)
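
For 4.0.0 the whole sequence would be roughly as follows (the zkhost, solr
host/port and replicationFactor=2 are assumptions; the config and collection
names are the ones from your mail):

cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd upconfig -confdir ./conf -confname configuration1

http://localhost:8983/solr/admin/collections?action=CREATE&name=customer1&numShards=1&replicationFactor=2&collection.configName=configuration1

With two nodes and replicationFactor=2 you should then see a leader and a
replica of the single shard, as with the "collection1" example.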


Regards, Per Steffensen

On 1/5/13 11:55 AM, Hassan wrote:

Thanks Per and Otis,

It is much clearer now, but I have a question about adding new solr 
nodes and collections.
I have a dedicated zookeeper instance. Let's say I have uploaded my 
configuration to zookeeper using "zkcli" and named it, say, 
"configuration1".
Now I want to create a new solrcloud from scratch with two solr nodes. 
I need to create a new collection (with one shard) called "customer1" 
using the configuration name "configuration1". I have tried different 
ways using the Collections API and zkcli linkconfig/downconfig, but I cannot 
get it to work. The collection is only available on one node. The example 
"collection1" works as expected, where one node has the leader shard 
and the other node has the replica. See the cloud graph: 
http://imageshack.us/f/706/selection008p.png/


What is the correct way to dynamically add collections to already 
existing nodes and new nodes?


Thank you,
Hs





Re: SolrCloud and Join Queries

2013-01-05 Thread Hassan

Missed the replicationFactor parameter. Works great now.
http://imm.io/RM66
Thanks a lot for your help.

One last question: in terms of scalability, with this design of one 
collection per customer, with one shard and many replicas, a query will 
be handled by one shard (or replica) on one node only, and scalability 
here is really about load balancing queries between the replicas only, 
i.e. no distributed search. Is this correct?


Hassan

On 05/01/13 15:47, Per Steffensen wrote:
Did you remember to add the replicationFactor parameter when you created 
your "customer1" and "customer2" collections/shards?
http://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_Collections_API 
(note that the maxShardsPerNode and createNodeSet params are not available 
in 4.0.0, but will be in 4.1)


Regards, Per Steffensen

On 1/5/13 11:55 AM, Hassan wrote:

Thanks Per and Otis,

It is much clearer now, but I have a question about adding new solr 
nodes and collections.
I have a dedicated zookeeper instance. Let's say I have uploaded my 
configuration to zookeeper using "zkcli" and named it, say, 
"configuration1".
Now I want to create a new solrcloud from scratch with two solr 
nodes. I need to create a new collection (with one shard) called 
"customer1" using the configuration name "configuration1". I have 
tried different ways using the Collections API and zkcli 
linkconfig/downconfig, but I cannot get it to work. The collection is only 
available on one node. The example "collection1" works as expected, 
where one node has the leader shard and the other node has the 
replica. See the cloud graph: 
http://imageshack.us/f/706/selection008p.png/


What is the correct way to dynamically add collections to already 
existing nodes and new nodes?


Thank you,
Hs








Re: background merge hit exception AND read past EOF: NIOFSIndexInput

2013-01-05 Thread Karan jindal
Thanks Otis,
the index was indeed corrupted.
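
For reference, the invocation looks roughly like this (the jar version and
paths are placeholders):

java -ea:org.apache.lucene... -cp lucene-core-3.6.1.jar org.apache.lucene.index.CheckIndex /path/to/solr/data/index

Run without options it only reports; re-running with -fix at the end drops
the corrupt segments (the documents in them are lost).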


On Sat, Jan 5, 2013 at 1:17 AM, Otis Gospodnetic  wrote:

> Sounds like you may have a corrupt index. Try running the CheckIndex tool.
>
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/
> On Jan 3, 2013 8:59 AM, "Karan jindal"  wrote:
>
> > Hi everyone,
> >
> > I have a solr index which is built using solr 3.2.
> >
> > I am facing two problems with that solr index.
> >
> > While searching:
> > 1) Searching (on a particular field) on that index gives a
> > "*read past EOF: NIOFSIndexInput*" error, but works fine for a *:* query.
> > 2) Searching works fine with solr 3.2.
> >
> > While optimizing:
> > 1) For both solr versions 3.2 and 3.6 it gives a "background merge hit
> > exception" error while optimizing. I read various discussions about this
> > on the solr-user list; most of them complain about a *no disk space*
> > error, but that is not happening in my case. I have free disk space of
> > 140GB and the index size is 5GB.
> >
> > Any Ideas what is going on?
> >
> > Thanks
> > Karan
> >
>


Java memory tuning advice - ignore what I used to say!

2013-01-05 Thread Shawn Heisey
I would like to offer a formal apology to anyone who has followed my 
memory tuning advice that includes the "-XX:NewRatio=1" option.  I have 
lately learned that this setting is causing me a problem I didn't even 
know about - extremely long pauses during young generation (ParNew) GC, 
leading to occasional super-long queries and load balancer health check 
failures.  I was warned this might happen, so some of you get to say "I 
told you so!"


I am still working on my config, so I don't think it would be 
appropriate to share it at this time.  After I've done a lot more 
testing and can reach some reasonable conclusions, I'll share the final 
results.


Thanks,
Shawn


Re: Migrating from Solr 3.6.1 to Solr 4

2013-01-05 Thread Jorge Luis Betancourt Gonzalez
So, from my "php app point of view", if I want to use solrcloud features, 
changes will be needed, right? One more thing: are the responses generated 
by solr4 in any way different from the ones generated by solr3? I ask because 
solarium parses the JSON response from the server to provide high-level objects 
encapsulating the response and its content.

Greetings!

- Original message -
From: "Upayavira" 
To: solr-user@lucene.apache.org
Sent: Saturday, January 5, 2013 4:49:01
Subject: Re: Migrating from Solr 3.6.1 to Solr 4

Try pointing your app at 4.0. I converted an app recently. Here are the
steps I took (as I recall):

 * get the original solrconfig.xml for the release I'm using
 * diff that and my solrconfig.xml
 * apply those changes to a 4.0 solrconfig.xml
 * try to start up solr with this new solrconfig, an old schema and
 an old index
 * fix each problem you find in the schema
- some class names have changed
- you may want to delete some field definitions that you're not
using
- you'll need to copy the _version_ field from the 4.0 schema

I found my app was able to search/index without any difficulty via the
XML/HTTP interface.

Your mileage may vary, but for that particular app, that is what it
took.

Note, 4.0 can work in a 3.x way (old style replication, etc). You don't
need to use SolrCloud etc when using 4.0.

Upayavira

On Sat, Jan 5, 2013, at 08:20 AM, Jorge Luis Betancourt Gonzalez wrote:
> Hi:
>
> I'm currently working with solr 3.6.1, but solr 4 has great features like
> the ones bundled with SolrCloud. The content in the index is really not
> the problem for the transition; the thing is that I have a large app written
> in PHP + Solarium that interacts with the index in solr 3. As far as I
> know there is no support for solr 4 in solarium. So my question is: is it
> possible to use a solr 3.6.1 frontend that gets the data from a solr 4
> behind the scenes, or is there any other workaround for this?
>
> Greetings!

10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS 
INFORMATICAS...
CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION

http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci



custom solr sort

2013-01-05 Thread andy
Hi,

Maybe this is an old thread, or maybe it's different from previous ones.

I want to customize solr sorting and pass a solr param from the client to the
solr server, so I implemented a SearchComponent named MySortComponent (see the
code below), along with a FieldComparatorSource and a FieldComparator. When I
use the "mysearch" request handler, I found that the custom sort only affects
the current page when I get multi-page results, but the sort is as expected
when I set rows large enough to contain all the results. Does anybody know
the reason, or how to solve it?

code snippet:

public class MySortComponent extends SearchComponent implements
SolrCoreAware {

// shared by MyComparatorSource below; must be a field, not a local variable
private RestTemplate restTemplate = new RestTemplate();

public void inform(SolrCore arg0) {
}

@Override
public void prepare(ResponseBuilder rb) throws IOException {
SolrParams params = rb.req.getParams();
String uid = params.get("uid");

MyComparatorSource comparator = new MyComparatorSource(uid);
SortSpec sortSpec = rb.getSortSpec();
if (sortSpec.getSort() == null) {
sortSpec.setSort(new Sort(new SortField[] {
new SortField("relation",
comparator),SortField.FIELD_SCORE }));
  
} else {
  
SortField[] current = sortSpec.getSort().getSort();
ArrayList sorts = new ArrayList(
current.length + 1);
sorts.add(new SortField("relation", comparator));
for (SortField sf : current) {
sorts.add(sf);
}
sortSpec.setSort(new Sort(sorts.toArray(new
SortField[sorts.size()])));
  
}

}

@Override
public void process(ResponseBuilder rb) throws IOException {

}

//
-
// SolrInfoMBean
//
-

@Override
public String getDescription() {
return "Custom Sorting";
}

@Override
public String getSource() {
return "";
}

@Override
public URL[] getDocs() {
try {
return new URL[] { new URL(
"http://wiki.apache.org/solr/QueryComponent") };
} catch (MalformedURLException e) {
throw new RuntimeException(e);
}
}

public class MyComparatorSource extends FieldComparatorSource {
private BitSet dg1;
private BitSet dg2;
private BitSet dg3;

public MyComparatorSource(String uid) throws IOException {

SearchResponse responseBody = restTemplate.postForObject(
"http://search.test.com/userid/search/" + uid, null,
SearchResponse.class);

String d1 = responseBody.getOneDe();
String d2 = responseBody.getTwoDe();
String d3 = responseBody.getThreeDe();

if (StringUtils.hasLength(d1)) {
byte[] bytes = Base64.decodeBase64(d1);
dg1 = BitSetHelper.loadFromBzip2ByteArray(bytes);
}
 
if (StringUtils.hasLength(d2)) {
byte[] bytes = Base64.decodeBase64(d2);
dg2 = BitSetHelper.loadFromBzip2ByteArray(bytes);
}
   
if (StringUtils.hasLength(d3)) {
byte[] bytes = Base64.decodeBase64(d3);
dg3 = BitSetHelper.loadFromBzip2ByteArray(bytes);
}
   
}

@Override
public FieldComparator newComparator(String fieldname,
final int numHits, int sortPos, boolean reversed)
throws IOException {
return new RelationComparator(fieldname, numHits);
}

class RelationComparator extends FieldComparator {
private int[] uidDoc;
private float[] values;
private float bottom;
String fieldName;

public RelationComparator(String fieldName, int numHits)
throws IOException {
values = new float[numHits];
this.fieldName = fieldName;
}

@Override
public int compare(int slot1, int slot2) {
if (values[slot1] > values[slot2])
return -1;
if (values[slot1] < values[slot2])
return 1;
return 0;
}

@Override
public int compareBottom(int doc) throws IOException {
float docDistance = getRelation(doc);
if (bottom < docDistance)
return -1;
if (bottom > docDistance)
return 1;
return 0;
}

@Override
public void copy(int slot, int doc) throws IOException {
values[slot] = getRelation(doc);

Re: Question about GC logging timestamps

2013-01-05 Thread Otis Gospodnetic
What if you write an app that creates lots of objects, connect to it with
jconsole, and try forcing/requesting gc? Or just do it from the app itself.
Then you can log the start and stop times and correlate them with the times
in the gc log.
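
A minimal sketch of that idea (class name invented; run with something like
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log and
compare the printed times against the log entry):

import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

public class GcTimestampProbe {
    public static void main(String[] args) {
        // create plenty of short-lived objects so there is something to collect
        List<byte[]> garbage = new ArrayList<byte[]>();
        for (int i = 0; i < 200000; i++) {
            garbage.add(new byte[1024]);
        }
        garbage.clear();

        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS");
        System.out.println("GC requested at " + fmt.format(new Date()));
        System.gc(); // a request, not a guarantee, but usually honored here
        System.out.println("GC returned at  " + fmt.format(new Date()));
    }
}

If the gc.log timestamp matches the "requested at" time rather than the
"returned at" time, it is the start time.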

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Jan 5, 2013 5:49 AM, "Shawn Heisey"  wrote:

> I have a question about java GC logging.  Here's a log entry that I'm
> looking at:
>
> 2013-01-04T16:37:32.694-0700: 101832.244: [GC 101832.244: [ParNew:
> 3722124K->419392K(3774912K), 9.1200100 secs] 5800224K->2591046K(7969216K),
> 9.1201970 secs] [Times: user=10.46 sys=45.66, real=9.12 secs]
>
> This is a GC that took over 9 seconds.  The timestamp is at 16:37:32 ...
> but is that the time that the GC started, or is it the time that it ended?
>  If it is the start time, then the problem I am investigating is most
> likely caused by GC pauses, but if it is end time, then I need to look for
> another cause.
>
> I have been unable to find any definitive answer one way or the other. I
> even went as far as grabbing the OpenJDK source and trying to decipher
> that, but that proved too large a task.  I'm actually using the Oracle JVM,
> but I couldn't locate the full Oracle source code.
>
> If anyone knows the answer to this question, please let me know.  An
> official URL explaining the situation would be very nice.
>
> Thanks,
> Shawn
>


Re: Frequent OOM - (Unknown source in logs).

2013-01-05 Thread Otis Gospodnetic
Out of curiosity, did you figure out the cause or did the OOMs just go away
with a nightly build?

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Sat, Dec 29, 2012 at 8:08 PM, shreejay  wrote:

> Otis,
>
> As of now I have disabled caches. And we are hardly running any queries at
> this point. I filter mostly on string fields and two int fields, 2 dates
> (one is a dynamic date field) and one dynamic string field. Same goes for
> faceting also, except I do not use facets on the dynamic field.
>
> In terms of documents , my index is around 5.5 million.
>
> I am not trying to rule out the possibility of queries causing this, but I
> think it's highly unlikely. The cluster is being used only by a select few,
> and I have noticed this behaviour generally during indexing (which happens
> mostly during the night).
>
> The autocommit with 0 values seems to be working. I am constantly
> monitoring the logs and I can see commits happening only at 30 and 60
> minute intervals. These commits are run by a cron job.
>
> Shawn,
>
> Your assumption is right! Thanks for the tips on the solrconfig changes. They
> have been working well most of the time. I am assuming these issues come up
> after a couple of million documents are indexed. During the initial
> indexing (up to 3-4 million docs) everything seems to be fine. I am going to
> try the recent nightly build apache-solr-4.1-2012-12-28_12-29-23 now. Let's
> hope using 4.1 fixes these issues. I have already started indexing my
> documents on a 3.6.2 Solr box as a backup. I am also going to reduce the
> JVM heap size and experiment between 8 - 10 GB, since I think some of the
> ZK connection issues were happening due to longer GC pauses.
>
>
> Thanks Jack for verifying it in the code.
>
> --Shreejay
>
>
>
>
>
>
>


Re: SolrCloud and Join Queries

2013-01-05 Thread Otis Gospodnetic
Hi Hassan,

Correct. If you have a single shard, then the query will execute
on only one node, and that is it.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Sat, Jan 5, 2013 at 9:06 AM, Hassan  wrote:

> Missed the replicationFactor parameter. Works great now.
> http://imm.io/RM66
> Thanks a lot for your help.
>
> One last question: in terms of scalability, with this design of one
> collection per customer, with one shard and many replicas, a query will be
> handled by one shard (or replica) on one node only, and scalability here is
> really about load balancing queries between the replicas only, i.e. no
> distributed search. Is this correct?
>
> Hassan
>
>
> On 05/01/13 15:47, Per Steffensen wrote:
>
>> Did you remember to add the replicationFactor parameter when you created your
>> "customer1" and "customer2" collections/shards?
>> http://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_Collections_API
>> (note that the maxShardsPerNode and createNodeSet params are not available in
>> 4.0.0, but will be in 4.1)
>>
>> Regards, Per Steffensen
>>
>> On 1/5/13 11:55 AM, Hassan wrote:
>>
>>> Thanks Per and Otis,
>>>
>>> It is much clearer now, but I have a question about adding new solr nodes
>>> and collections.
>>> I have a dedicated zookeeper instance. Let's say I have uploaded my
>>> configuration to zookeeper using "zkcli" and named it, say,
>>> "configuration1".
>>> Now I want to create a new solrcloud from scratch with two solr nodes. I
>>> need to create a new collection (with one shard) called "customer1" using
>>> the configuration name "configuration1". I have tried different ways using
>>> the Collections API and zkcli linkconfig/downconfig, but I cannot get it to
>>> work. The collection is only available on one node. The example
>>> "collection1" works as expected, where one node has the leader shard and
>>> the other node has the replica. See the cloud graph:
>>> http://imageshack.us/f/706/selection008p.png/
>>>
>>> What is the correct way to dynamically add collections to already
>>> existing nodes and new nodes?
>>>
>>> Thank you,
>>> Hs
>>>
>>
>>
>>
>>
>


RE: Question about GC logging timestamps

2013-01-05 Thread Michael Ryan
From my own experience, the timestamp seems to be logged at the start of the
garbage collection.

-Michael


Searching for Solr Stop Words

2013-01-05 Thread Cool Techi

One of my solr fields is configured in the following manner:

[the schema.xml excerpt was stripped from the archived message; from the
description below, it is a text field whose analyzer includes a stop-word
filter and no stemming]
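A field type of the shape described would look something like this (a
reconstruction under common Solr analyzer conventions, not the original
excerpt):

<fieldType name="text_nostem" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
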
This works in cases where I don't want stemming, but now there is 
another use case which is causing a problem: people are beginning to 
search for the following combinations.

1. The Ivy: in this case, results with just "ivy" are being returned, 
when the expected results would include "The". I understand that this is 
because of the stop words, but is there a way to achieve this? For example, 
if they search for "the ivy" within quotes then this should work.
2. (Mom & Me) OR ("mom and me"): in this case the "&" is also 
dropped, or results including both "mom" and "me" in some part of the 
statement are returned.

I am ok if only new data behaves in the right way, but I wouldn't be able
to reindex. Also, would changing the schema.xml file trigger a full 
replication?


Regards,

Ayush