Re: Querying a transitive closure?

2013-03-28 Thread Upayavira
Why don't you index all ancestor classes with the document, as a
multivalued field? Then you could get it all in one hit. Am I missing
something?
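
For illustration, a minimal sketch of such a field in schema.xml (the field
name "ancestors" is made up here):

<field name="ancestors" type="string" indexed="true" stored="true"
       multiValued="true"/>

With every ancestor indexed on each document, isA?(M,B) then reduces to a
single query such as q=id:M AND ancestors:B.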

Upayavira

On Thu, Mar 28, 2013, at 01:59 AM, Jack Park wrote:
> Hi Otis,
> That's essentially the answer I was looking for: each shard (are we
> talking master + replicas?) has the plug-in custom query handler.  I
> need to build it to find out.
> 
> What I mean is that there is a taxonomy, say one with a single root
> for sake of illustration, which grows all the classes, subclasses, and
> instances. If I have an object that is somewhere in that taxonomy,
> then it has a zigzag chain of parents up that tree (I've seen that
> called a "transitive closure". If class B is way up that tree from M,
> no telling how many queries it will take to find it.  Hmmm...
> recursive ascent, I suppose.
> 
> Many thanks
> Jack
> 
> On Wed, Mar 27, 2013 at 6:52 PM, Otis Gospodnetic
>  wrote:
> > Hi Jack,
> >
> > I don't fully understand the exact taxonomy structure and your needs,
> > but in terms of reducing the number of HTTP round trips, you can do it
> > by writing a custom SearchComponent that, upon getting the initial
> > request, does everything "locally", meaning that it talks to the
> > local/specified shard before returning to the caller.  In SolrCloud
> > setup with N shards, each of these N shards could be queried in such a
> > way in parallel, running query/queries on their local shards.
> >
> > Otis
> > --
> > Solr & ElasticSearch Support
> > http://sematext.com/
> >
> >
> >
> >
> >
> > On Wed, Mar 27, 2013 at 3:11 PM, Jack Park  wrote:
> >> Hi Otis,
> >>
> >> I fully expect to grow to SolrCloud -- many shards. For now, it's
> >> solo. But, my thinking relates to cloud. I look for ways to reduce the
> >> number of HTTP round trips through SolrJ. Maybe you have some ideas?
> >>
> >> Thanks
> >> Jack
> >>
> >> On Wed, Mar 27, 2013 at 10:04 AM, Otis Gospodnetic
> >>  wrote:
> >>> Hi Jack,
> >>>
> >>> Is this really about HTTP and Solr vs. SolrCloud or more whether
> >>> Solr(Cloud) is the right tool for the job and if so how to structure
> >>> the schema and queries to make such lookups efficient?
> >>>
> >>> Otis
> >>> --
> >>> Solr & ElasticSearch Support
> >>> http://sematext.com/
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Wed, Mar 27, 2013 at 12:53 PM, Jack Park  
> >>> wrote:
>  This is a question about "isA?"
> 
>  We want to know if M isA B   isA?(M,B)
> 
>  For some M, one might be able to look into M to see its type or which
>  class(es) for which it is a subClass. We're talking taxonomic queries
>  now.
>  But, for some M, one might need to ripple up the "transitive closure",
>  looking at all the super classes, etc, recursively.
> 
>  It seems unreasonable to do that over HTTP; it seems more reasonable
>  to grab a core and write a custom isA query handler. But, how do you
>  do that in a SolrCloud?
> 
>  Really curious...
> 
>  Many thanks in advance for ideas.
>  Jack


Re: Setup solrcloud on tomcat

2013-03-28 Thread Furkan KAMACI
First of all, can you check your catalina.out log? It gives the details
about what is wrong. Secondly, you can separate that kind of JVM parameter
from solr.xml and put them into a setenv.sh file (you will create it under
the bin folder of Tomcat). Here is what you should do:

#!/bin/sh
JAVA_OPTS="$JAVA_OPTS
-Dbootstrap_confdir=/usr/share/solrhome/collection1/conf
-Dcollection.configName=custom_conf -DnumShards=2 -DzkRun"
export JAVA_OPTS

You should change /usr/share/solrhome here to wherever your Solr home is.

That should start up an embedded ZooKeeper.

On the other hand, a client that will connect to the embedded ZooKeeper
should have this setenv.sh:

#!/bin/sh
JAVA_OPTS="$JAVA_OPTS -DzkHost=**.**.***.**:2181"
export JAVA_OPTS

I have masked the IP address; you should put yours.


2013/3/28 하정대 

> Hi, all
>
> I tried to set up SolrCloud on Tomcat, but I couldn't see the Cloud tab in
> the Solr menu. I think the embedded ZooKeeper might not have been loaded.
> This is my solr.xml file that was supposed to run ZooKeeper.
>
> <solr>
>   <cores host="${host:}" hostPort="8080" hostContext="${hostContext:}"
>          numShards="2" zkRun="http://localhost:9081"
>          zkClientTimeout="${zkClientTimeout:15000}">
>     ...
>   </cores>
> </solr>
>
> What should I do? I need your help.
> Also, an example file or tutorial would be a big help for me.
> I am working on this with the SolrCloud wiki.
>
> Thanks. All.
>
>
> 
> "The safest name in the world - AhnLab"
> Jungdae Ha, Senior Researcher / ASD Dept.
> Tel: 031-722-8338
> e-mail: jungdae...@ahnlab.com  http://www.ahnlab.com
> (Zip) 463-400, 673 Sampyeong-dong, Bundang-gu, Seongnam-si, Gyeonggi-do
> 
>
>


How to update synonyms.txt without restart?

2013-03-28 Thread Kaneyama Genta
Dear all,

I am investigating how to update synonyms.txt.
Some people say a CORE RELOAD will reload synonyms.txt.

But the Solr wiki says:
```
Starting with Solr4.0, the RELOAD command is implemented in a way that
results a "live" reloads of the SolrCore, reusing the existing various
objects such as the SolrIndexWriter. As a result, some configuration
options can not be changed and made active with a simple RELOAD...
```
http://wiki.apache.org/solr/CoreAdmin#RELOAD

And https://issues.apache.org/jira/browse/SOLR-3592 is marked as unresolved.

The problem is: how can I update synonyms.txt in a production environment?
A workaround is to restart the Solr process, but that does not look good to me.

Will someone tell me the best practice for updating synonyms.txt?

Thanks in advance.


Re: multicore vs multi collection

2013-03-28 Thread hupadhyay
Does that mean I can create multiple collections with different
configurations?
Can you please outline the basic steps to create multiple collections,
because I am not able to create them on Solr 4.0?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/multicore-vs-multi-collection-tp4051352p4052002.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Querying a transitive closure?

2013-03-28 Thread Jens Grivolla
Exactly, you should usually design your schema to fit your queries, and 
if you need to retrieve all ancestors then you should index all 
ancestors so you can query for them easily.


If that doesn't work for you then either Solr is not the right tool for 
the job, or you need to rethink your schema.


The description of doing lookups within a tree structure doesn't sound 
at all like what you would use a text retrieval engine for, so you might 
want to rethink why you want to use Solr for this. But if that 
"transitive closure" is something you can calculate at indexing time 
then the correct solution is the one Upayavira provided.


If you want people to be able to help you, you need to actually describe
your problem (i.e. what is my data, and what are my queries) instead of 
diving into technical details like "reducing HTTP roundtrips". My guess 
is that if you need to "reduce HTTP roundtrips" you're probably doing it 
wrong.


HTH,
Jens

On 03/28/2013 08:15 AM, Upayavira wrote:

Why don't you index all ancestor classes with the document, as a
multivalued field? Then you could get it all in one hit. Am I missing
something?

Upayavira

On Thu, Mar 28, 2013, at 01:59 AM, Jack Park wrote:

Hi Otis,
That's essentially the answer I was looking for: each shard (are we
talking master + replicas?) has the plug-in custom query handler.  I
need to build it to find out.

What I mean is that there is a taxonomy, say one with a single root
for sake of illustration, which grows all the classes, subclasses, and
instances. If I have an object that is somewhere in that taxonomy,
then it has a zigzag chain of parents up that tree (I've seen that
called a "transitive closure". If class B is way up that tree from M,
no telling how many queries it will take to find it.  Hmmm...
recursive ascent, I suppose.

Many thanks
Jack

On Wed, Mar 27, 2013 at 6:52 PM, Otis Gospodnetic
 wrote:

Hi Jack,

I don't fully understand the exact taxonomy structure and your needs,
but in terms of reducing the number of HTTP round trips, you can do it
by writing a custom SearchComponent that, upon getting the initial
request, does everything "locally", meaning that it talks to the
local/specified shard before returning to the caller.  In SolrCloud
setup with N shards, each of these N shards could be queried in such a
way in parallel, running query/queries on their local shards.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Wed, Mar 27, 2013 at 3:11 PM, Jack Park  wrote:

Hi Otis,

I fully expect to grow to SolrCloud -- many shards. For now, it's
solo. But, my thinking relates to cloud. I look for ways to reduce the
number of HTTP round trips through SolrJ. Maybe you have some ideas?

Thanks
Jack

On Wed, Mar 27, 2013 at 10:04 AM, Otis Gospodnetic
 wrote:

Hi Jack,

Is this really about HTTP and Solr vs. SolrCloud or more whether
Solr(Cloud) is the right tool for the job and if so how to structure
the schema and queries to make such lookups efficient?

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Wed, Mar 27, 2013 at 12:53 PM, Jack Park  wrote:

This is a question about "isA?"

We want to know if M isA B   isA?(M,B)

For some M, one might be able to look into M to see its type or which
class(es) for which it is a subClass. We're talking taxonomic queries
now.
But, for some M, one might need to ripple up the "transitive closure",
looking at all the super classes, etc, recursively.

It seems unreasonable to do that over HTTP; it seems more reasonable
to grab a core and write a custom isA query handler. But, how do you
do that in a SolrCloud?

Really curious...

Many thanks in advance for ideas.
Jack







Re: How to update synonyms.txt without restart?

2013-03-28 Thread Jack Krupansky
You should be fine for synonym and other schema changes since they are 
unrelated to the IndexWriter.


But... if you are using synonyms in your "index" analyzer, as opposed to in
your "query" analyzer, then you need to do a full reindex anyway, which is
best done by deleting the contents of the Solr data directory for the
collection, restarting Solr, and resending all of the source documents.
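
For reference, a query-side synonym setup looks roughly like this in
schema.xml (the field type name here is illustrative):

<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>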


-- Jack Krupansky

-Original Message- 
From: Kaneyama Genta

Sent: Thursday, March 28, 2013 5:11 AM
To: solr-user@lucene.apache.org
Subject: How to update synonyms.txt without restart?

Dear all,

I am investigating how to update synonyms.txt.
Some people say a CORE RELOAD will reload synonyms.txt.

But the Solr wiki says:
```
Starting with Solr4.0, the RELOAD command is implemented in a way that
results a "live" reloads of the SolrCore, reusing the existing various
objects such as the SolrIndexWriter. As a result, some configuration
options can not be changed and made active with a simple RELOAD...
```
http://wiki.apache.org/solr/CoreAdmin#RELOAD

And https://issues.apache.org/jira/browse/SOLR-3592 is marked as unresolved.

The problem is: how can I update synonyms.txt in a production environment?
A workaround is to restart the Solr process, but that does not look good to me.

Will someone tell me the best practice for updating synonyms.txt?

Thanks in advance. 



Re: How to update synonyms.txt without restart?

2013-03-28 Thread Erik Hatcher
The issue pointed to from SOLR-3592 indicates it is resolved.

I just tried it on my local 4x branch checkout, using the analysis page
(text_general analyzing "foo"): added a synonym, went to Core Admin, clicked
"Reload", and saw the synonym appear afterwards.

Erik


On Mar 28, 2013, at 05:11 , Kaneyama Genta wrote:

> Dear all,
> 
> I am investigating how to update synonyms.txt.
> Some people say a CORE RELOAD will reload synonyms.txt.
> 
> But the Solr wiki says:
> ```
> Starting with Solr4.0, the RELOAD command is implemented in a way that
> results a "live" reloads of the SolrCore, reusing the existing various
> objects such as the SolrIndexWriter. As a result, some configuration
> options can not be changed and made active with a simple RELOAD...
> ```
> http://wiki.apache.org/solr/CoreAdmin#RELOAD
> 
> And https://issues.apache.org/jira/browse/SOLR-3592 is marked as unresolved.
> 
> The problem is: how can I update synonyms.txt in a production environment?
> A workaround is to restart the Solr process, but that does not look good to me.
> 
> Will someone tell me the best practice for updating synonyms.txt?
> 
> Thanks in advance.



Re: Too many fields to Sort in Solr

2013-03-28 Thread Joel Bernstein
Hi,

I tested this config on Solr 4.2 this morning and it worked:

<fieldType name="dvLong" class="solr.TrieLongField" docValuesFormat="Disk"
           positionIncrementGap="0"/>

I also loaded data, ran a sort, and looked at the heap with jvisualvm; the
longs were not loaded into the JVM's heap. The sort was also very fast,
although only on 600,000 records.

Possibly you are not on Solr 4.2? Can you post both your fieldType
definition and your field definition?

Joel





On Thu, Mar 28, 2013 at 12:57 AM, adityab  wrote:

> Hi Joel,
> You are correct, the boost function populates the field cache. Well, I was
> not aware of docValues, so while trying the example you provided I see the
> error when I define the field type:
>
> Caused by: org.apache.solr.common.SolrException: FieldType 'dvLong' is
> configured with a docValues format, but the codec does not support it:
> class
> org.apache.solr.core.SolrCore$3
> at org.apache.solr.core.SolrCore.initCodec(SolrCore.java:854)
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:719)
> ... 13 more
>
> My field definition:
> <fieldType name="dvLong" class="solr.TrieLongField"
> positionIncrementGap="0" docValuesFormat="Disk"/>
>
> What am I missing here?
>
> thanks
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Too-many-fields-to-Sort-in-Solr-tp4049139p4051960.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Joel Bernstein
Professional Services LucidWorks


Re:

2013-03-28 Thread anuj vats
Waiting for your assistance to get config entries for a 3-server SolrCloud setup.


Thanks in advance


Anuj

From: "anuj vats"
Sent: Fri, 22 Mar 2013 17:32:10
To: "solr-user@lucene.apache.org"
Cc: "mayank...@gmail.com"
Subject:
Hi Shawan,

I have seen your post on SolrCloud master-master configuration on two
servers. I have to use the same Solr structure, but for a long time I have
not been able to configure it to communicate between two servers; on a
single server it works fine.
Can you please help me out with the required config changes, so that Solr
can communicate between two servers?

http://grokbase.com/t/lucene/solr-user/132pb1pe34/solrcloud-master-master

Regards
Anuj Vats

Re: multicore vs multi collection

2013-03-28 Thread Jack Krupansky

Unable? In what way?

Did you look at the Solr "example"?

Did you look at solr.xml?

Did you see the <core> element? (Needs to be one per core/collection.)

Did you see the "multicore" directory in the example?

Did you look at the solr.xml file in multicore?

Did you see how there are separate directories for each collection/core in 
multicore?


Did you see how there is a <core> element in solr.xml in multicore, one for
each collection directory (instance)?


Did you try setting up your own test directory parallel to multicore in 
example?


Did you read the README.txt files in the Solr example directories?

Did you see the command to start Solr with a specific Solr "home"
directory?


   java -Dsolr.solr.home=multicore -jar start.jar

Did you try that for your own test solr home directory created above?

So... what exactly was the problem you were encountering? Be specific.

My guess is that you simply need to re-read the README.txt files more 
carefully in the Solr "example" directories.


If you have questions about what the README.txt files say, please ask them, 
but please be specific.


-- Jack Krupansky

-Original Message- 
From: hupadhyay

Sent: Thursday, March 28, 2013 5:35 AM
To: solr-user@lucene.apache.org
Subject: Re: multicore vs multi collection

Does that mean I can create multiple collections with different
configurations?
Can you please outline the basic steps to create multiple collections,
because I am not able to create them on Solr 4.0?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/multicore-vs-multi-collection-tp4051352p4052002.html
Sent from the Solr - User mailing list archive at Nabble.com. 



How to make Solr complex Join Query

2013-03-28 Thread ashimbose
Hi 

I need to do complex joins in a single core with multiple tables,

like inner, outer, left, right, and so on.

I am working with Solr 4.

Can I work with any type of join in Solr 4?

Is there any way to do so? Please give your suggestions, it's very important.
Please help me.

Thanks in advance.

Ashim



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-male-Solr-complex-Join-Query-tp4052023.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to make Solr complex Join Query

2013-03-28 Thread Karol Sikora

Hi Ashim,

You are probably doing something the wrong way if you need such complex
joins.

Remember that Solr isn't a relational database.
You should probably revisit your schema and flatten your data structure.

Regards,
Karol


On 28.03.2013 13:45, ashimbose wrote:

Hi

I need to do complex joins in a single core with multiple tables,

like inner, outer, left, right, and so on.

I am working with Solr 4.

Can I work with any type of join in Solr 4?

Is there any way to do so? Please give your suggestions, it's very important.
Please help me.

Thanks in advance.

Ashim



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-male-Solr-complex-Join-Query-tp4052023.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re:

2013-03-28 Thread Tomás Fernández Löbbe
Could you give more details on what's not working? Have you followed the
instructions here: http://wiki.apache.org/solr/SolrCloud#Getting_Started
Are you using an embedded Zookeeper or an external server? How many of
them? Are you using numShards=1?2?

What do you see in the Solr UI, in the "cloud" section?

Tomás


On Thu, Mar 28, 2013 at 8:44 AM, anuj vats  wrote:

> Waiting for your assistance to get config entries for a 3-server SolrCloud
> setup.
>
>
> Thanks in advance
>
>
> Anuj
>
> From: "anuj vats"
> Sent: Fri, 22 Mar 2013 17:32:10
> To: "solr-user@lucene.apache.org"
> Cc: "mayank...@gmail.com"
> Subject:
> Hi Shawan,
>
> I have seen your post on SolrCloud master-master configuration on two
> servers. I have to use the same Solr structure, but for a long time I have
> not been able to configure it to communicate between two servers; on a
> single server it works fine.
> Can you please help me out with the required config changes, so that Solr
> can communicate between two servers?
>
> http://grokbase.com/t/lucene/solr-user/132pb1pe34/solrcloud-master-master
>
> Regards
> Anuj Vats


Re: Too many fields to Sort in Solr

2013-03-28 Thread adityab
Here is the field type definition, the same as what you posted yesterday,
just with a different name:

<fieldType name="dvLong" class="solr.TrieLongField" docValuesFormat="Disk"
           positionIncrementGap="0"/>

And the field definition:

<field name="..." type="dvLong" stored="true" default="0" docValues="true"/>

As soon as I restart the server I see the exception in the log. Removing
*docValuesFormat="Disk"* from the field type, I don't see this exception.

01:49:37,177 ERROR [org.apache.solr.core.CoreContainer]
(coreLoadExecutor-3-thread-1) Unable to create core: collection1:
org.apache.solr.common.SolrException: FieldType 'dvLong' is configured with
a docValues format, but the codec does not support it: class
org.apache.solr.core.SolrCore$3
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:806)
[solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13]
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:619)
[solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13]
at
org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021)
[solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13]
at
org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051)
[solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13]
at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
[solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13]
at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
[solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13]
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
[rt.jar:1.7.0_09]
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
[rt.jar:1.7.0_09]
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
[rt.jar:1.7.0_09]
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
[rt.jar:1.7.0_09]
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
[rt.jar:1.7.0_09]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
[rt.jar:1.7.0_09]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
[rt.jar:1.7.0_09]
at java.lang.Thread.run(Thread.java:722) [rt.jar:1.7.0_09]
Caused by: org.apache.solr.common.SolrException: FieldType 'dvLong' is
configured with a docValues format, but the codec does not support it: class
org.apache.solr.core.SolrCore$3
at org.apache.solr.core.SolrCore.initCodec(SolrCore.java:854)
[solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13]
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:719)
[solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13]
... 13 more

01:49:37,202 ERROR [org.apache.solr.core.CoreContainer]
(coreLoadExecutor-3-thread-1) null:org.apache.solr.common.SolrException:
Unable to create core: collection1
at
org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1672)
at
org.apache.solr.core.CoreContainer.create(CoreContainer.java:1057)
at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.common.SolrException: FieldType 'dvLong' is
configured with a docValues format, but the codec does not support it: class
org.apache.solr.core.SolrCore$3
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:806)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:619)
at
org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021)
at
org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051)
... 10 more
Caused by: org.apache.solr.common.SolrException: FieldType 'dvLong' is
configured with a docValues format, but the codec does not support it: class
org.apache.solr.core.SolrCore$3
at org.apache.solr.core.SolrCore.initCodec(SolrCore.java:854)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:719)
... 13 more




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Too-many-fields-to-Sort-in-Solr-tp4049139p4052036.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Too many fields to Sort in Solr

2013-03-28 Thread Joel Bernstein
OK, you'll need to re-index. Shut down, delete the data, re-index.


On Thu, Mar 28, 2013 at 9:12 AM, adityab  wrote:

> Here is the field type definition, the same as what you posted yesterday,
> just with a different name:
>
> <fieldType name="dvLong" class="solr.TrieLongField"
> docValuesFormat="Disk" positionIncrementGap="0"/>
>
> And the field definition:
> <field name="..." type="dvLong" stored="true"
> default="0" docValues="true"/>
>
>
> As soon as I restart the server I see the exception in the log. Removing
> *docValuesFormat="Disk"* from the field type, I don't see this exception.
>
> 01:49:37,177 ERROR [org.apache.solr.core.CoreContainer]
> (coreLoadExecutor-3-thread-1) Unable to create core: collection1:
> org.apache.solr.common.SolrException: FieldType 'dvLong' is configured with
> a docValues format, but the codec does not support it: class
> org.apache.solr.core.SolrCore$3
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:806)
> [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13]
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:619)
> [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13]
> at
> org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021)
> [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13]
> at
> org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051)
> [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13]
> at
> org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
> [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13]
> at
> org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
> [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13]
> at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> [rt.jar:1.7.0_09]
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> [rt.jar:1.7.0_09]
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> [rt.jar:1.7.0_09]
> at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> [rt.jar:1.7.0_09]
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> [rt.jar:1.7.0_09]
> at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> [rt.jar:1.7.0_09]
> at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> [rt.jar:1.7.0_09]
> at java.lang.Thread.run(Thread.java:722) [rt.jar:1.7.0_09]
> Caused by: org.apache.solr.common.SolrException: FieldType 'dvLong' is
> configured with a docValues format, but the codec does not support it:
> class
> org.apache.solr.core.SolrCore$3
> at org.apache.solr.core.SolrCore.initCodec(SolrCore.java:854)
> [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13]
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:719)
> [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13]
> ... 13 more
>
> 01:49:37,202 ERROR [org.apache.solr.core.CoreContainer]
> (coreLoadExecutor-3-thread-1) null:org.apache.solr.common.SolrException:
> Unable to create core: collection1
> at
> org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1672)
> at
> org.apache.solr.core.CoreContainer.create(CoreContainer.java:1057)
> at
> org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
> at
> org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
> at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: org.apache.solr.common.SolrException: FieldType 'dvLong' is
> configured with a docValues format, but the codec does not support it:
> class
> org.apache.solr.core.SolrCore$3
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:806)
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:619)
> at
> org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021)
> at
> org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051)
> ... 10 more
> Caused by: org.apache.solr.common.SolrException: FieldType 'dvLong' is
> configured with a docValues format, but the codec does not support it:
> class
> org.apache.solr.core.SolrCore$3
> at org.apache.solr.core.SolrCore.initCodec(SolrCore.java:854)
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:719)
> ... 13 more
>
>
>
>
> --
> View this message in context:
> http://luce

Re: [ANNOUNCE] Solr wiki editing change

2013-03-28 Thread Andy Lester

On Mar 24, 2013, at 10:18 PM, Steve Rowe  wrote:

> The wiki at http://wiki.apache.org/solr/ has come under attack by spammers 
> more frequently of late, so the PMC has decided to lock it down in an attempt 
> to reduce the work involved in tracking and removing spam.
> 
> From now on, only people who appear on 
> http://wiki.apache.org/solr/ContributorsGroup will be able to 
> create/modify/delete wiki pages.
> 
> Please request either on the solr-user@lucene.apache.org or on 
> d...@lucene.apache.org to have your wiki username added to the 
> ContributorsGroup page - this is a one-time step.


Please add my username, AndyLester, to the approved editors list.  Thanks.

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance



Re: [ANNOUNCE] Solr wiki editing change

2013-03-28 Thread Steve Rowe
On Mar 28, 2013, at 9:25 AM, Andy Lester  wrote:
> On Mar 24, 2013, at 10:18 PM, Steve Rowe  wrote:
>> Please request either on the solr-user@lucene.apache.org or on 
>> d...@lucene.apache.org to have your wiki username added to the 
>> ContributorsGroup page - this is a one-time step.
> 
> Please add my username, AndyLester, to the approved editors list.  Thanks.

Added to solr ContributorsGroup.

Is deltaQuery mandatory ?

2013-03-28 Thread A. Lotfi
Is deltaQuery mandatory in data-config.xml?

I did it like this:

 
Then my manager came and said we don't need it; it is only for incremental
updates.
I took off the line that starts with deltaQuery. Now in:

http://localhost:8983/solr/#/db/dataimport//dataimport


the entity is empty, and when I click the Execute button, nothing happens.

thanks.

Re: Querying a transitive closure?

2013-03-28 Thread Jack Park
Thank you for this. I had thought about it but reasoned in a naive
way: who would do such a thing?

Doing so makes the query local: once the object has been retrieved, no
further HTTP queries are required. Implementation perhaps entails one
request to fetch the presumed parent in order to harvest its
transitive closure.  I need to think about that.

Many thanks
Jack

On Thu, Mar 28, 2013 at 5:06 AM, Jens Grivolla  wrote:
> Exactly, you should usually design your schema to fit your queries, and if
> you need to retrieve all ancestors then you should index all ancestors so
> you can query for them easily.
>
> If that doesn't work for you then either Solr is not the right tool for the
> job, or you need to rethink your schema.
>
> The description of doing lookups within a tree structure doesn't sound at
> all like what you would use a text retrieval engine for, so you might want
> to rethink why you want to use Solr for this. But if that "transitive
> closure" is something you can calculate at indexing time then the correct
> solution is the one Upayavira provided.
>
> If you want people to be able to help you, you need to actually describe your
> problem (i.e. what is my data, and what are my queries) instead of diving
> into technical details like "reducing HTTP roundtrips". My guess is that if
> you need to "reduce HTTP roundtrips" you're probably doing it wrong.
>
> HTH,
> Jens
>
>
> On 03/28/2013 08:15 AM, Upayavira wrote:
>>
>> Why don't you index all ancestor classes with the document, as a
>> multivalued field? Then you could get it all in one hit. Am I missing
>> something?
>>
>> Upayavira
>>
>> On Thu, Mar 28, 2013, at 01:59 AM, Jack Park wrote:
>>>
>>> Hi Otis,
>>> That's essentially the answer I was looking for: each shard (are we
>>> talking master + replicas?) has the plug-in custom query handler.  I
>>> need to build it to find out.
>>>
>>> What I mean is that there is a taxonomy, say one with a single root
>>> for sake of illustration, which grows all the classes, subclasses, and
>>> instances. If I have an object that is somewhere in that taxonomy,
>>> then it has a zigzag chain of parents up that tree (I've seen that
>>> called a "transitive closure". If class B is way up that tree from M,
>>> no telling how many queries it will take to find it.  Hmmm...
>>> recursive ascent, I suppose.
>>>
>>> Many thanks
>>> Jack
>>>
>>> On Wed, Mar 27, 2013 at 6:52 PM, Otis Gospodnetic
>>>  wrote:

 Hi Jack,

 I don't fully understand the exact taxonomy structure and your needs,
 but in terms of reducing the number of HTTP round trips, you can do it
 by writing a custom SearchComponent that, upon getting the initial
 request, does everything "locally", meaning that it talks to the
 local/specified shard before returning to the caller.  In SolrCloud
 setup with N shards, each of these N shards could be queried in such a
 way in parallel, running query/queries on their local shards.

 Otis
 --
 Solr & ElasticSearch Support
 http://sematext.com/





 On Wed, Mar 27, 2013 at 3:11 PM, Jack Park 
 wrote:
>
> Hi Otis,
>
> I fully expect to grow to SolrCloud -- many shards. For now, it's
> solo. But, my thinking relates to cloud. I look for ways to reduce the
> number of HTTP round trips through SolrJ. Maybe you have some ideas?
>
> Thanks
> Jack
>
> On Wed, Mar 27, 2013 at 10:04 AM, Otis Gospodnetic
>  wrote:
>>
>> Hi Jack,
>>
>> Is this really about HTTP and Solr vs. SolrCloud or more whether
>> Solr(Cloud) is the right tool for the job and if so how to structure
>> the schema and queries to make such lookups efficient?
>>
>> Otis
>> --
>> Solr & ElasticSearch Support
>> http://sematext.com/
>>
>>
>>
>>
>>
>> On Wed, Mar 27, 2013 at 12:53 PM, Jack Park 
>> wrote:
>>>
>>> This is a question about "isA?"
>>>
>>> We want to know if M isA B   isA?(M,B)
>>>
>>> For some M, one might be able to look into M to see its type or which
>>> class(es) for which it is a subClass. We're talking taxonomic queries
>>> now.
>>> But, for some M, one might need to ripple up the "transitive
>>> closure",
>>> looking at all the super classes, etc, recursively.
>>>
>>> It seems unreasonable to do that over HTTP; it seems more reasonable
>>> to grab a core and write a custom isA query handler. But, how do you
>>> do that in a SolrCloud?
>>>
>>> Really curious...
>>>
>>> Many thanks in advance for ideas.
>>> Jack
>>
>>
>
>


RE: Is deltaQuery mandatory ?

2013-03-28 Thread Swati Swoboda
No, it's not mandatory. You can't do delta imports without delta queries 
though; you'd need to do a full-import. Per your query, you'd only ever do 
objects with rownum<=5000.

-Original Message-
From: A. Lotfi [mailto:majidna...@yahoo.com] 
Sent: Thursday, March 28, 2013 10:07 AM
To: gene...@lucene.apache.org; solr-user@lucene.apache.org
Subject: Is deltaQuery mandatory ?

Is deltaQuery mandatory in data-config.xml?

I did it like this:

 
Then my manager came and said we don't need it; it is only for incremental
updates.
I took off the line that starts with deltaQuery. Now in:

http://localhost:8983/solr/#/db/dataimport//dataimport


the entity is empty, and when I click the Execute button, nothing happens.

thanks.


RE: SOLR - "Unable to execute query" error - DIH

2013-03-28 Thread Dyer, James
You may want to run your jdbc driver in trace mode just to see if it is picking 
up these different options.  I know from experience that the "selectMethod" 
parameter can sometimes be important to prevent SQLServer drivers from caching 
the entire resultset in memory.  

But something seems very wrong here and maybe driver tuning is really not what 
you need.  18 minutes to index 500 documents is extreme.  Unless the documents 
were huge or you were doing something very unusual, I'd expect this to happen in seconds 
(<1 second?).  Are you indexing on a Raspberry Pi?

Possibly, you have a Cartesian join somewhere in your SQL, or some other little 
mistake?  If you post your entire data-config.xml possibly someone will see the 
error.  Or, could you be extremely memory constrained because of bad JVM heap 
choices?  Do your logs show you the jvm constantly in GC cycles?

Just a little note: batchSize goes on the <dataSource> tag, not on <entity>.
I really don't think tweaking batchSize is going to fix this though.
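
For illustration, those options would typically sit on the DIH dataSource
element roughly like this (connection details are placeholders):

<dataSource type="JdbcDataSource" batchSize="-1"
            driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
            url="jdbc:sqlserver://yourhost;databaseName=yourdb;selectMethod=cursor;responseBuffering=adaptive"
            user="..." password="..."/>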

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: kobe.free.wo...@gmail.com [mailto:kobe.free.wo...@gmail.com] 
Sent: Thursday, March 28, 2013 1:43 AM
To: solr-user@lucene.apache.org
Subject: RE: SOLR - "Unable to execute query" error - DIH

Thanks James.

We have tried the following options *(individually)* including the one you
suggested,

1."selectMethod=cursor" 
2. "batchSize=-1"
3."responseBuffering=adaptive"

But the indexing process doesn't seem to be improving at all. When we try to
index a set of 500 rows, it works well and gets completed in 18 min. For 1000K
rows it took 22 hours (long) to index. But when we try to index the complete
set of 750K rows, it doesn't show any progress and keeps on executing.

Currently both the SQL Server machine and the Solr machine are running on 4 GB
of RAM. With this configuration, does the above scenario stand justified? If we
think of upgrading the RAM, which machine should that be, the Solr machine
or the SQL Server machine?

Are there any other efficient methods to import/index data from SQL Server
to Solr?

Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-Unable-to-execute-query-error-DIH-tp4051028p4051981.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Is deltaQuery mandatory ?

2013-03-28 Thread Dyer, James
You do not need "deltaQuery" unless you're doing delta (incremental) updates.  
To configure a full import, try starting with this example:

http://wiki.apache.org/solr/DataImportHandler#A_shorter_data-config
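
For illustration, a minimal full-import configuration in the spirit of that
wiki example (driver, URL, and query are placeholders):

<dataConfig>
  <dataSource driver="org.hsqldb.jdbcDriver"
              url="jdbc:hsqldb:/temp/example/ex" user="sa"/>
  <document>
    <entity name="item" query="select * from item"/>
  </document>
</dataConfig>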

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: A. Lotfi [mailto:majidna...@yahoo.com] 
Sent: Thursday, March 28, 2013 9:07 AM
To: gene...@lucene.apache.org; solr-user@lucene.apache.org
Subject: Is deltaQuery mandatory ?

Is deltaQuery mandatory in data-config.xml?

I did it like this:

 
Then my manager came and said we don't need it; it is only for incremental
updates.
I took off the line that starts with deltaQuery. Now in:

http://localhost:8983/solr/#/db/dataimport//dataimport


the entity is empty, and when I click the Execute button, nothing happens.

thanks.



RE: SOLR - "Unable to execute query" error - DIH

2013-03-28 Thread Swati Swoboda
What version of Solr4 are you running? We are on 3.6.2 so I can't be confident 
whether these settings still exist (they probably do...), but here is what we 
do to speed up full-indexing:

In solrconfig.xml, increase your ramBufferSize to 128MB.
Increase mergeFactor to 20.
Make sure autoCommit is disabled.
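
As a sketch, on 3.x those first two settings live in solrconfig.xml roughly
like this (on 4.x the equivalent section is <indexConfig>):

<indexDefaults>
  <ramBufferSizeMB>128</ramBufferSizeMB>
  <mergeFactor>20</mergeFactor>
</indexDefaults>
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- no <autoCommit> block: commits happen only when the client issues one -->
</updateHandler>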

Basically, you want to minimize how often Lucene/Solr flushes (as that is very 
time consuming). Merging is also very time consuming, so you want large 
segments and fewer merges (hence the merge factor increase). We use these 
settings when we are doing our initial full-indexing and then switch them over 
to saner defaults to do our regular/delta indexing.

Roll-backs concern me; why did your query roll back? Did it give an error -- it 
should have. Should be in your solr log file. Was it because the connection 
timed out? It's important to find out. We prevented roll backs by effectively 
splitting our data across entities and then indexing one entity at a time. This 
allowed us to make sure that if one "sector" failed, it didn't impact the 
entire process. (This can be done by using autoCommit, but that slows down 
indexing.) 

If you're getting OOM errors, be sure that your Xmx value is set high enough 
(and that you have enough memory). You may be able to increase ramBufferSize 
depending on how much memory you had (we didn't have much). 

Hope this helps.
Swati


-Original Message-
From: kobe.free.wo...@gmail.com [mailto:kobe.free.wo...@gmail.com] 
Sent: Thursday, March 28, 2013 2:43 AM
To: solr-user@lucene.apache.org
Subject: RE: SOLR - "Unable to execute query" error - DIH

Thanks James.

We have tried the following options *(individually)* including the one you 
suggested,

1."selectMethod=cursor" 
2. "batchSize=-1"
3."responseBuffering=adaptive"

But the indexing process doesn't seem to be improving at all. When we try to
index a set of 500 rows, it works well and gets completed in 18 min. For 1000K
rows it took 22 hours (long) to index. But when we try to index the complete
set of 750K rows, it doesn't show any progress and keeps on executing.

Currently both the SQL Server machine and the Solr machine are running on 4 GB
of RAM. With this configuration, does the above scenario stand justified? If we
think of upgrading the RAM, which machine should that be, the Solr machine or
the SQL Server machine?

Are there any other efficient methods to import/index data from SQL Server to
Solr?

Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-Unable-to-execute-query-error-DIH-tp4051028p4051981.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.2 - Slave Index version is higher than Master

2013-03-28 Thread Uomesh
Yes. The only thing is, on the master a delta import runs every half hour,
but as there has been no data change in the last 24 hours, I think the index
version still remains the same. Another thing I notice is that after a full
import the index Gen is bumped directly higher than the slave's. Could that
mean the master is not increasing Version and Gen with delta-import
correctly? See below.

*Before Full Import*

Master: Version 1364331607690, Gen 154, Size 88.28 KB
Slave:  Version 1364395321127, Gen 241, Size 98.75 KB

*After Full Import*

Master: Version 1364395566324, Gen 242, Size 88.28 KB
Slave:  Version 1364395321127, Gen 241, Size 98.75 KB

On Tue, Mar 26, 2013 at 1:05 PM, Mark Miller-3 [via Lucene] <
ml-node+s472066n4051477...@n3.nabble.com> wrote:

> That's pretty interesting. The slave should have no way of doing this
> without a commit…
>
> - Mark
>
> On Mar 26, 2013, at 11:07 AM, Uomesh <[hidden 
> email]>
> wrote:
>
> > Hi Mark,
> >
> > Further details: My master details has not changed since last 24 hours
> but
> > Slave index version and Gen has increased. If i do the full import slave
> > is replicated and Version and Gen is reset.
> > Master: Version 1364238678758, Gen 111, Size 768.23 KB
> > Slave:  Version 1364299206396, Gen 155, Size 768.02 KB
> >
> >
> >
> > On Fri, Mar 22, 2013 at 3:32 PM, Mark Miller-3 [via Lucene] <
> > [hidden email] >
> wrote:
> >
> >> That was to you Phil.
> >>
> >> So it seems this is a problem with the configuration replication case I
> >> would guess - I didn't really look at that path in the 4.2 fixes I
> worked
> >> on.
> >>
> >> I did add it to the new testing I'm doing since I've suspected it (it
> will
> >> prompt a core reload that doesn't happen when configs don't replicate).
> >> I'll see what I can do to try and get a test to catch it.
> >>
> >> - mark
> >>
> >> On Mar 22, 2013, at 1:49 PM, Mark Miller <[hidden email]<
> http://user/SendEmail.jtp?type=node&node=4050577&i=0>>
> >> wrote:
> >>
> >>> And your also on 4.2?
> >>>
> >>> - Mark
> >>>
> >>> On Mar 22, 2013, at 12:41 PM, Uomesh <[hidden email]<
> http://user/SendEmail.jtp?type=node&node=4050577&i=1>>
> >> wrote:
> >>>
>  Also, I am replicating only on commit and startup.
> 
>  Thanks,
>  Umesh
> 
>  On Fri, Mar 22, 2013 at 11:23 AM, Umesh Sharma <[hidden email]<
> http://user/SendEmail.jtp?type=node&node=4050577&i=2>>
> >> wrote:
> 
> > Hi Mrk,
> >
> > I am replicating the config files below, but not replicating
> > solrconfig.xml.
> >
> > confFiles: schema.xml, elevate.xml, stopwords.txt,
> > mapping-FoldToASCII.txt, mapping-ISOLatin1Accent.txt, protwords.txt,
> > spellings.txt, synonyms.txt
> >
> >
> > Also strange: I am seeing a big Gen difference between master and slave.
> > My master's Gen is 2 while the slave's is 56. If I do a full import then
> > the Gen gets higher than the slave's and it replicates. I have more than
> > 30 cores on my Solr instance and all are scheduled to replicate at the
> > same time.
> >
> > Master: Index Version 1363903243590, Gen 2, Size 94 bytes
> > Slave:  Index Version 1363967579193, Gen 56, Size 94 bytes
> >
> > Thanks,
> > Umesh
> >
> >
> > On Fri, Mar 22, 2013 at 10:42 AM, Mark Miller-3 [via Lucene] <
> > [hidden email] >
>
> >> wrote:
> >
> >> Are you replicating configuration files as well?
> >>
> >> - Mark
> >>
> >> On Mar 22, 2013, at 6:38 AM, "John, Phil (CSS)" <[hidden email]<
> >> http://user/SendEmail.jtp?type=node&node=4050075&i=0>>
> >> wrote:
> >>
> >>> To add to the discussion.
> >>>
> >>> We're running classic master/slave replication (not solrcloud)
> with
> >> 1
> >> master and 2 slaves and I noticed the slave having a higher version
> >> number
> >> than the master the other day as well.
> >>>
> >>> In our case, knock on wood, it hasn't stopped replication.
> >>>
> >>> If you'd like a copy of our config I can provide off-list.
> >>>
> >>> Regards,
> >>>
> >>> Phil.
> >>>
> >>> 
> >>>
> >>> From: Mark Miller [mailto:[hidden email]<
> >> http://user/SendEmail.jtp?type=node&node=4050075&i=1>]
> >>
> >>> Sent: Fri 22/03/2013 06:32
> >>> To: [hidden email]<
> >> http://user/SendEmail.jtp?type=node&node=4050075&i=2>
> >>> Subject: Re: Solr 4.2 - Slave Index version is higher than Master
> >>>
> >>>
> >>>
> >>> The other odd thing here is that this should not stop replication
> at
> >> all. When the slave is ahead, it will still have it's index
> replaced.
> >>>
> >>> - Mark
> >>>
> >>> On Mar 22, 2013, at 1:26 AM, Mark Miller <[hidden email]<
> >> http://user/SendEmail.jtp?type=node&node=4050075&i=3>>
> >> wrote:
> >>>
>  I'm

Re: Too many fields to Sort in Solr

2013-03-28 Thread adityab
Still no luck.

Performed:
1. Stopped the application server (JBoss)
2. Deleted everything under data
3. Started the server
4. Observed the exception in the log (I have uploaded the file)

On a side note, do I need any additional jar files in the Solr home lib
folder? Currently it's empty.


docValueException.log
  




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Too-many-fields-to-Sort-in-Solr-tp4049139p4052070.html
Sent from the Solr - User mailing list archive at Nabble.com.


Batch Search Query

2013-03-28 Thread Mike Haas
Hello. My company is currently thinking of switching over to Solr 4.2,
coming off of SQL Server. However, what we need to do is a bit weird.

Right now, we have ~12 million segments and growing. Usually these are
sentences but can be other things. These segments are what will be stored
in Solr. I’ve already done that.

Now, what happens is a user will upload say a word document to us. We then
parse it and process it into segments. It very well could be 5000 segments
or even more in that word document. Each one of those ~5000 segments needs
to be searched for similar segments in solr. I’m not quite sure how I will
do the query (whether proximate or something else). The point though, is to
get back similar results for each segment.

However, I think I’m seeing a bigger problem first. I have to search
against ~5000 segments. That would be 5000 http requests. That’s a lot! I’m
pretty sure that would take a LOT of hardware. Keep in mind this could be
happening with maybe 4 different users at once right now (and of course
more in the future). Is there a good way to send a batch query over one (or
at least a lot fewer) http requests?

If not, what kinds of things could I do to implement such a feature (if
feasible, of course)?


Thanks,

Mike


Re: Batch Search Query

2013-03-28 Thread Timothy Potter
Hi Mike,

Interesting problem - here's some pointers on where to get started.

For finding similar segments, check out Solr's More Like This support -
it's built in to the query request processing so you just need to enable it
with query params.
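
For instance, a More Like This lookup for a single indexed segment might look
like this (core URL and field name are placeholders):

http://localhost:8983/solr/select?q=id:123&mlt=true&mlt.fl=segment_text&mlt.count=10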

There's nothing built in for doing batch queries from the client side. You
might look into implementing a custom search component and register it as a
first-component in your search handler (take a look at solrconfig.xml for
how search handlers are configured, e.g. /browse).
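
A sketch of that registration in solrconfig.xml (the component class name
here is hypothetical):

<searchComponent name="batchQuery" class="com.example.BatchQueryComponent"/>
<requestHandler name="/batch" class="solr.SearchHandler">
  <arr name="first-components">
    <str>batchQuery</str>
  </arr>
</requestHandler>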

Cheers,
Tim


On Thu, Mar 28, 2013 at 9:43 AM, Mike Haas  wrote:

> Hello. My company is currently thinking of switching over to Solr 4.2,
> coming off of SQL Server. However, what we need to do is a bit weird.
>
> Right now, we have ~12 million segments and growing. Usually these are
> sentences but can be other things. These segments are what will be stored
> in Solr. I’ve already done that.
>
> Now, what happens is a user will upload say a word document to us. We then
> parse it and process it into segments. It very well could be 5000 segments
> or even more in that word document. Each one of those ~5000 segments needs
> to be searched for similar segments in solr. I’m not quite sure how I will
> do the query (whether proximate or something else). The point though, is to
> get back similar results for each segment.
>
> However, I think I’m seeing a bigger problem first. I have to search
> against ~5000 segments. That would be 5000 http requests. That’s a lot! I’m
> pretty sure that would take a LOT of hardware. Keep in mind this could be
> happening with maybe 4 different users at once right now (and of course
> more in the future). Is there a good way to send a batch query over one (or
> at least a lot fewer) http requests?
>
> If not, what kinds of things could I do to implement such a feature (if
> feasible, of course)?
>
>
> Thanks,
>
> Mike
>


Re: [ANNOUNCE] Solr wiki editing change

2013-03-28 Thread Jilal Oussama
Please add OussamaJilal to the group.

Thank you.


2013/3/28 Steve Rowe 

> On Mar 28, 2013, at 9:25 AM, Andy Lester  wrote:
> > On Mar 24, 2013, at 10:18 PM, Steve Rowe  wrote:
> >> Please request either on the solr-user@lucene.apache.org or on
> d...@lucene.apache.org to have your wiki username added to the
> ContributorsGroup page - this is a one-time step.
> >
> > Please add my username, AndyLester, to the approved editors list.
>  Thanks.
>
> Added to solr ContributorsGroup.


Re: Too many fields to Sort in Solr

2013-03-28 Thread adityab
Update ---

I was able to fix the exception by adding the following line in
solrconfig.xml:

<codecFactory class="solr.SchemaCodecFactory"/>

Not sure if it's mentioned in any document that this needs to be declared in
the config file.
I am now re-indexing the data on the master and will perform tests to see if
it works as expected.

thanks for your support. 

Aditya 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Too-many-fields-to-Sort-in-Solr-tp4049139p4052091.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: [ANNOUNCE] Solr wiki editing change

2013-03-28 Thread Steve Rowe
On Mar 28, 2013, at 11:57 AM, Jilal Oussama  wrote:
> Please add OussamaJilal to the group.

Added to solr ContributorsGroup.


Re: Solr and OpenPipe

2013-03-28 Thread Fabio Curti
git clone https://github.com/kolstae/openpipe
cd openpipe
mvn install

regards



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-and-OpenPipe-tp484777p4052079.html
Sent from the Solr - User mailing list archive at Nabble.com.


bootstrap_conf without restarting

2013-03-28 Thread jimtronic
I'm making fairly frequent changes to my data-config.xml files on some of my
cores in a SolrCloud setup. Is there any way to get these files active
and up to ZooKeeper without restarting the instance?

I've noticed that if I just launch another instance of Solr with the
bootstrap_conf flag set to true, it uploads the new settings, but it dies
because there's already a Solr instance running on that port. It also seems
to make the original one unresponsive, or at least "down" in ZooKeeper's
eyes. I then just restart that instance and everything is back up. It'd be
nice if I could bootstrap without actually starting Solr.
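
For context, the relaunch described above is roughly this (the zkHost value
is a placeholder):

java -Dbootstrap_conf=true -DzkHost=localhost:2181 -jar start.jar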

What's the best practice for deploying changes to data-config.xml?

Thanks, Jim



--
View this message in context: 
http://lucene.472066.n3.nabble.com/bootstrap-conf-without-restarting-tp4052092.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Batch Search Query

2013-03-28 Thread Roman Chyla
Apologies if you already do something similar, but perhaps of general
interest...

One different approach to your problem is to implement a local
fingerprint - if you want to find documents with overlapping segments, this
algorithm will dramatically reduce the number of segments you create/search
for every document:

http://theory.stanford.edu/~aiken/publications/papers/sigmod03.pdf

Then you simply end up indexing each document, and upon submission:
computing fingerprints and querying for them. I don't know (i.e. remember)
exact numbers, but my feeling is that you end up storing ~13% of document
text (besides, it is a one-token fingerprint, therefore quite fast to
search for - you could even try one huge boolean query with 1024 clauses,
ouch... :))
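
For the curious, a minimal sketch of that fingerprinting idea (winnowing, per
the paper above): hash every k-gram, then keep the minimum hash in each
sliding window of w hashes. The k and w values below are illustration values,
not numbers from the paper or this thread.

import java.util.HashSet;
import java.util.Set;

public class Winnow {
    // Returns the winnowed fingerprint set for a piece of text.
    public static Set<Integer> fingerprints(String text, int k, int w) {
        int n = text.length() - k + 1;           // number of k-grams
        if (n < w) return new HashSet<>();
        int[] hashes = new int[n];
        for (int i = 0; i < n; i++) {            // hash every k-gram
            hashes[i] = text.substring(i, i + k).hashCode();
        }
        Set<Integer> selected = new HashSet<>();
        for (int i = 0; i + w <= n; i++) {       // slide a window of w hashes
            int min = hashes[i];
            for (int j = i + 1; j < i + w; j++) {
                min = Math.min(min, hashes[j]);  // keep the window minimum
            }
            selected.add(min);                   // set dedups across windows
        }
        return selected;
    }
}

Each selected hash can then be indexed as a token on the document; matching a
new segment becomes a boolean OR over its fingerprints.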

roman

On Thu, Mar 28, 2013 at 11:43 AM, Mike Haas  wrote:

> Hello. My company is currently thinking of switching over to Solr 4.2,
> coming off of SQL Server. However, what we need to do is a bit weird.
>
> Right now, we have ~12 million segments and growing. Usually these are
> sentences but can be other things. These segments are what will be stored
> in Solr. I’ve already done that.
>
> Now, what happens is a user will upload say a word document to us. We then
> parse it and process it into segments. It very well could be 5000 segments
> or even more in that word document. Each one of those ~5000 segments needs
> to be searched for similar segments in solr. I’m not quite sure how I will
> do the query (whether proximate or something else). The point though, is to
> get back similar results for each segment.
>
> However, I think I’m seeing a bigger problem first. I have to search
> against ~5000 segments. That would be 5000 http requests. That’s a lot! I’m
> pretty sure that would take a LOT of hardware. Keep in mind this could be
> happening with maybe 4 different users at once right now (and of course
> more in the future). Is there a good way to send a batch query over one (or
> at least a lot fewer) http requests?
>
> If not, what kinds of things could I do to implement such a feature (if
> feasible, of course)?
>
>
> Thanks,
>
> Mike
>


Re: Solr and OpenPipe

2013-03-28 Thread Giovanni Bricconi
Nice!
I see we're having fun.
 On 28/mar/2013 17:11, "Fabio Curti"  wrote:

> git clone https://github.com/kolstae/openpipe
> cd openpipe
> mvn install
>
> regards
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-and-OpenPipe-tp484777p4052079.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Batch Search Query

2013-03-28 Thread Mike Haas
Thanks for your reply, Roman. Unfortunately, the business has been running
this way forever so I don't think it would be feasible to switch to a whole
document store versus segments store. Even then, if I understand you
correctly it would not work for our needs. I'm thinking because we don't
care about any other parts of the document, just the segment. If a similar
segment is in an entirely different document, we want that segment.

I'll keep taking any and all feedback however so that I can develop an idea
and present it to my manager.


On Thu, Mar 28, 2013 at 11:16 AM, Roman Chyla  wrote:

> Apologies if you already do something similar, but perhaps of general
> interest...
>
> One (different approach) to your problem is to implement a local
> fingerprint - if you want to find documents with overlapping segments, this
> algorithm will dramatically reduce the number of segments you create/search
> for every document
>
> http://theory.stanford.edu/~aiken/publications/papers/sigmod03.pdf
>
> Then you simply end up indexing each document, and upon submission:
> computing fingerprints and querying for them. I don't know (ie. remember)
> exact numbers, but my feeling is that you end up storing ~13% of document
> text (besides, it is a one token fingerprint, therefore quite fast to
> search for - you could even try one huge boolean query with 1024 clauses,
> ouch... :))
>
> roman
>
> On Thu, Mar 28, 2013 at 11:43 AM, Mike Haas  wrote:
>
> > Hello. My company is currently thinking of switching over to Solr 4.2,
> > coming off of SQL Server. However, what we need to do is a bit weird.
> >
> > Right now, we have ~12 million segments and growing. Usually these are
> > sentences but can be other things. These segments are what will be stored
> > in Solr. I’ve already done that.
> >
> > Now, what happens is a user will upload say a word document to us. We
> then
> > parse it and process it into segments. It very well could be 5000
> segments
> > or even more in that word document. Each one of those ~5000 segments
> needs
> > to be searched for similar segments in solr. I’m not quite sure how I
> will
> > do the query (whether proximate or something else). The point though, is
> to
> > get back similar results for each segment.
> >
> > However, I think I’m seeing a bigger problem first. I have to search
> > against ~5000 segments. That would be 5000 http requests. That’s a lot!
> I’m
> > pretty sure that would take a LOT of hardware. Keep in mind this could be
> > happening with maybe 4 different users at once right now (and of course
> > more in the future). Is there a good way to send a batch query over one
> (or
> > at least a lot fewer) http requests?
> >
> > If not, what kinds of things could I do to implement such a feature (if
> > feasible, of course)?
> >
> >
> > Thanks,
> >
> > Mike
> >
>


Re: Batch Search Query

2013-03-28 Thread Walter Underwood
This might not be a good match for Solr, or for many other systems. It does 
seem like a natural fit for MarkLogic. That natively searches and selects over 
XML documents.

Disclaimer: I worked at MarkLogic for a couple of years.

wunder

On Mar 28, 2013, at 9:27 AM, Mike Haas wrote:

> Thanks for your reply, Roman. Unfortunately, the business has been running
> this way forever so I don't think it would be feasible to switch to a whole
> document store versus segments store. Even then, if I understand you
> correctly it would not work for our needs. I'm thinking because we don't
> care about any other parts of the document, just the segment. If a similar
> segment is in an entirely different document, we want that segment.
> 
> I'll keep taking any and all feedback however so that I can develop an idea
> and present it to my manager.
> 
> 
> On Thu, Mar 28, 2013 at 11:16 AM, Roman Chyla  wrote:
> 
>> Apologies if you already do something similar, but perhaps of general
>> interest...
>> 
>> One (different approach) to your problem is to implement a local
>> fingerprint - if you want to find documents with overlapping segments, this
>> algorithm will dramatically reduce the number of segments you create/search
>> for every document
>> 
>> http://theory.stanford.edu/~aiken/publications/papers/sigmod03.pdf
>> 
>> Then you simply end up indexing each document, and upon submission:
>> computing fingerprints and querying for them. I don't know (ie. remember)
>> exact numbers, but my feeling is that you end up storing ~13% of document
>> text (besides, it is a one token fingerprint, therefore quite fast to
>> search for - you could even try one huge boolean query with 1024 clauses,
>> ouch... :))
>> 
>> roman
>> 
>> On Thu, Mar 28, 2013 at 11:43 AM, Mike Haas  wrote:
>> 
>>> Hello. My company is currently thinking of switching over to Solr 4.2,
>>> coming off of SQL Server. However, what we need to do is a bit weird.
>>> 
>>> Right now, we have ~12 million segments and growing. Usually these are
>>> sentences but can be other things. These segments are what will be stored
>>> in Solr. I’ve already done that.
>>> 
>>> Now, what happens is a user will upload say a word document to us. We
>> then
>>> parse it and process it into segments. It very well could be 5000
>> segments
>>> or even more in that word document. Each one of those ~5000 segments
>> needs
>>> to be searched for similar segments in solr. I’m not quite sure how I
>> will
>>> do the query (whether proximate or something else). The point though, is
>> to
>>> get back similar results for each segment.
>>> 
>>> However, I think I’m seeing a bigger problem first. I have to search
>>> against ~5000 segments. That would be 5000 http requests. That’s a lot!
>> I’m
>>> pretty sure that would take a LOT of hardware. Keep in mind this could be
>>> happening with maybe 4 different users at once right now (and of course
>>> more in the future). Is there a good way to send a batch query over one
>> (or
>>> at least a lot fewer) http requests?
>>> 
>>> If not, what kinds of things could I do to implement such a feature (if
>>> feasible, of course)?
>>> 
>>> 
>>> Thanks,
>>> 
>>> Mike
>>> 
>> 

--
Walter Underwood
wun...@wunderwood.org





Re: Batch Search Query

2013-03-28 Thread Mike Haas
Thanks Timothy,

In regards to you mentioning using MoreLikeThis, do you know what kind of
algorithm it uses? My searching didn't reveal anything.


On Thu, Mar 28, 2013 at 10:51 AM, Timothy Potter wrote:

> Hi Mike,
>
> Interesting problem - here's some pointers on where to get started.
>
> For finding similar segments, check out Solr's More Like This support -
> it's built in to the query request processing so you just need to enable it
> with query params.
>
> There's nothing built in for doing batch queries from the client side. You
> might look into implementing a custom search component and register it as a
> first-component in your search handler (take a look at solrconfig.xml for
> how search handlers are configured, e.g. /browse).
>
> Cheers,
> Tim
>
>
> On Thu, Mar 28, 2013 at 9:43 AM, Mike Haas  wrote:
>
> > Hello. My company is currently thinking of switching over to Solr 4.2,
> > coming off of SQL Server. However, what we need to do is a bit weird.
> >
> > Right now, we have ~12 million segments and growing. Usually these are
> > sentences but can be other things. These segments are what will be stored
> > in Solr. I’ve already done that.
> >
> > Now, what happens is a user will upload say a word document to us. We
> then
> > parse it and process it into segments. It very well could be 5000
> segments
> > or even more in that word document. Each one of those ~5000 segments
> needs
> > to be searched for similar segments in solr. I’m not quite sure how I
> will
> > do the query (whether proximate or something else). The point though, is
> to
> > get back similar results for each segment.
> >
> > However, I think I’m seeing a bigger problem first. I have to search
> > against ~5000 segments. That would be 5000 http requests. That’s a lot!
> I’m
> > pretty sure that would take a LOT of hardware. Keep in mind this could be
> > happening with maybe 4 different users at once right now (and of course
> > more in the future). Is there a good way to send a batch query over one
> (or
> > at least a lot fewer) http requests?
> >
> > If not, what kinds of things could I do to implement such a feature (if
> > feasible, of course)?
> >
> >
> > Thanks,
> >
> > Mike
> >
>
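
For reference, a bare-bones skeleton of the custom search component Tim describes is sketched below. The class name and the batching logic are hypothetical; it would be registered in solrconfig.xml and listed as a first-component on a handler, exactly as he says:

import java.io.IOException;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

// Hypothetical skeleton: a component that could accept a batch of segments
// in one request and run the per-segment queries server-side.
public class BatchSegmentComponent extends SearchComponent {

    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        // read custom params here, e.g. the list of segments sent by the client
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        // for each segment: build a query, run it against the local searcher
        // via rb.req.getSearcher(), and append the results to rb.rsp
    }

    @Override
    public String getDescription() {
        return "Batched segment similarity lookup (sketch)";
    }

    @Override
    public String getSource() {
        return null; // abstract in Solr 4.x, so it must be implemented
    }
}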


Re: Batch Search Query

2013-03-28 Thread Roman Chyla
On Thu, Mar 28, 2013 at 12:27 PM, Mike Haas  wrote:

> Thanks for your reply, Roman. Unfortunately, the business has been running
> this way forever so I don't think it would be feasible to switch to a whole
>

sure, no arguing against that :)


> document store versus segments store. Even then, if I understand you
> correctly it would not work for our needs. I'm thinking because we don't
> care about any other parts of the document, just the segment. If a similar
> segment is in an entirely different document, we want that segment.
>

the algo should work for this case - the beauty of the local winnowing is
that it is *local*, ie it tends to select the same segments from the text
(ie. you process two documents, written by two different people - but if
they cited the same thing, and it is longer than 'm' tokens, you will have
at least one identical fingerprint from both documents - which means:
match!) then of course, you can store the position offset of the original
words of the fingerprint and retrieve the original, compute ratio of
overlap etc... but a database seems to be better suited for these kind of
jobs...

let us know what you adopt!

ps: MoreLikeThis selects 'significant' tokens from the document you
selected and then constructs a new boolean query searching for those.
http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/

>
> I'll keep taking any and all feedback however so that I can develop an idea
> and present it to my manager.
>
>
> On Thu, Mar 28, 2013 at 11:16 AM, Roman Chyla 
> wrote:
>
> > Apologies if you already do something similar, but perhaps of general
> > interest...
> >
> > One (different approach) to your problem is to implement a local
> > fingerprint - if you want to find documents with overlapping segments,
> this
> > algorithm will dramatically reduce the number of segments you
> create/search
> > for every document
> >
> > http://theory.stanford.edu/~aiken/publications/papers/sigmod03.pdf
> >
> > Then you simply end up indexing each document, and upon submission:
> > computing fingerprints and querying for them. I don't know (ie. remember)
> > exact numbers, but my feeling is that you end up storing ~13% of document
> > text (besides, it is a one token fingerprint, therefore quite fast to
> > search for - you could even try one huge boolean query with 1024 clauses,
> > ouch... :))
> >
> > roman
> >
> > On Thu, Mar 28, 2013 at 11:43 AM, Mike Haas 
> wrote:
> >
> > > Hello. My company is currently thinking of switching over to Solr 4.2,
> > > coming off of SQL Server. However, what we need to do is a bit weird.
> > >
> > > Right now, we have ~12 million segments and growing. Usually these are
> > > sentences but can be other things. These segments are what will be
> stored
> > > in Solr. I’ve already done that.
> > >
> > > Now, what happens is a user will upload say a word document to us. We
> > then
> > > parse it and process it into segments. It very well could be 5000
> > segments
> > > or even more in that word document. Each one of those ~5000 segments
> > needs
> > > to be searched for similar segments in solr. I’m not quite sure how I
> > will
> > > do the query (whether proximate or something else). The point though,
> is
> > to
> > > get back similar results for each segment.
> > >
> > > However, I think I’m seeing a bigger problem first. I have to search
> > > against ~5000 segments. That would be 5000 http requests. That’s a lot!
> > I’m
> > > pretty sure that would take a LOT of hardware. Keep in mind this could
> be
> > > happening with maybe 4 different users at once right now (and of course
> > > more in the future). Is there a good way to send a batch query over one
> > (or
> > > at least a lot fewer) http requests?
> > >
> > > If not, what kinds of things could I do to implement such a feature (if
> > > feasible, of course)?
> > >
> > >
> > > Thanks,
> > >
> > > Mike
> > >
> >
>
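
For anyone wanting to try the winnowing scheme Roman links above, the core selection step is small enough to sketch. This assumes the k-gram hashing of the token stream is done elsewhere; the window logic follows the SIGMOD'03 paper:

import java.util.ArrayList;
import java.util.List;

public class Winnow {
    // w = window size. Any match of length >= w + k - 1 tokens is guaranteed
    // to share at least one selected fingerprint between the two documents.
    public static List<Long> winnow(long[] kGramHashes, int w) {
        List<Long> fingerprints = new ArrayList<Long>();
        int lastSelected = -1;
        for (int i = 0; i + w <= kGramHashes.length; i++) {
            int min = i;
            for (int j = i + 1; j < i + w; j++) {
                if (kGramHashes[j] <= kGramHashes[min]) min = j; // rightmost minimum breaks ties
            }
            if (min != lastSelected) {        // record each window minimum only once
                fingerprints.add(kGramHashes[min]);
                lastSelected = min;
            }
        }
        return fingerprints;
    }
}

That guarantee is what makes the single-token fingerprint queries cheap: two documents quoting the same long-enough passage will always collide on at least one fingerprint.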


multiple SolrCloud clusters with one ZooKeeper ensemble?

2013-03-28 Thread Bill Au
Can I use a single ZooKeeper ensemble for multiple SolrCloud clusters or
would each SolrCloud cluster requires its own ZooKeeper ensemble?

Bill


Re: multiple SolrCloud clusters with one ZooKeeper ensemble?

2013-03-28 Thread Chris Hostetter

: Can I use a single ZooKeeper ensemble for multiple SolrCloud clusters or
: would each SolrCloud cluster requires its own ZooKeeper ensemble?

https://wiki.apache.org/solr/SolrCloud#Zookeeper_chroot

(I'm going to FAQ this)


-Hoss
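
Concretely, the chroot from the wiki page above is just a suffix on the zkHost string, so two clusters stay isolated under one ensemble. A hedged SolrJ illustration (host names and chroot paths invented):

import org.apache.solr.client.solrj.impl.CloudSolrServer;

public class ChrootExample {
    public static void main(String[] args) throws Exception {
        // Each SolrCloud cluster lives under its own ZooKeeper chroot:
        CloudSolrServer clusterA = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181/solr-clusterA");
        CloudSolrServer clusterB = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181/solr-clusterB");
        clusterA.setDefaultCollection("collection1");
        clusterA.connect(); // the two clusters never see each other's state
        clusterB.connect();
    }
}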


Re: Too many fields to Sort in Solr

2013-03-28 Thread Joel Bernstein
I didn't have to do anything with the codecs to make it work. Checked my
solrconfig.xml and the codecFactory element is not present.  I'm running
the out of the box jetty setup.


On Thu, Mar 28, 2013 at 11:58 AM, adityab  wrote:

> Update ---
>
> I was able to fix the exception by adding the following line in solrconfig.xml
>
> <codecFactory class="solr.SchemaCodecFactory"/>
>
> Not sure if it's mentioned in any document to have this declared in the config
> file.
> I am now re-indexing the data on the master and will perform a test to see if
> it works as expected.
>
> thanks for your support.
>
> Aditya
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Too-many-fields-to-Sort-in-Solr-tp4049139p4052091.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Joel Bernstein
Professional Services LucidWorks


Re: Batch Search Query

2013-03-28 Thread Mike Haas
I will definitely let you all know what we end up doing. I realized I
forgot to mention something that might make what we do more clear.

Right now we use sql server full text to get back fairly similar matches
for each segment. We do this with some funky sql stuff which I didn't write
and haven't even looked at. It gives us back 100 results. They are not
really all that good of matches though, it just gives us something to work
with. So although some results are good, some are horrible. Then, to truly
make sure we have a good match we take each one of those ~100 results and
run it through a levenshtein algorithm implemented in c# code. Levenshtein
gives back a % match. We then use the highest match so long as it is above
85%.

Hope this makes it a little more clear what we are doing.


On Thu, Mar 28, 2013 at 11:39 AM, Roman Chyla  wrote:

> On Thu, Mar 28, 2013 at 12:27 PM, Mike Haas  wrote:
>
> > Thanks for your reply, Roman. Unfortunately, the business has been
> running
> > this way forever so I don't think it would be feasible to switch to a
> whole
> >
>
> sure, no arguing against that :)
>
>
> > document store versus segments store. Even then, if I understand you
> > correctly it would not work for our needs. I'm thinking because we don't
> > care about any other parts of the document, just the segment. If a
> similar
> > segment is in an entirely different document, we want that segment.
> >
>
> the algo should work for this case - the beauty of the local winnowing is
> that it is *local*, ie it tends to select the same segments from the text
> (ie. you process two documents, written by two different people - but if
> they cited the same thing, and it is longer than 'm' tokens, you will have
> at least one identical fingerprint from both documents - which means:
> match!) then of course, you can store the position offset of the original
> words of the fingerprint and retrieve the original, compute ratio of
> overlap etc... but a database seems to be better suited for these kind of
> jobs...
>
> let us know what you adopt!
>
> ps: MoreLikeThis selects 'significant' tokens from the document you
> selected and then constructs a new boolean query searching for those.
> http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/
>
> >
> > I'll keep taking any and all feedback however so that I can develop an
> idea
> > and present it to my manager.
> >
> >
> > On Thu, Mar 28, 2013 at 11:16 AM, Roman Chyla 
> > wrote:
> >
> > > Apologies if you already do something similar, but perhaps of general
> > > interest...
> > >
> > > One (different approach) to your problem is to implement a local
> > > fingerprint - if you want to find documents with overlapping segments,
> > this
> > > algorithm will dramatically reduce the number of segments you
> > create/search
> > > for every document
> > >
> > > http://theory.stanford.edu/~aiken/publications/papers/sigmod03.pdf
> > >
> > > Then you simply end up indexing each document, and upon submission:
> > > computing fingerprints and querying for them. I don't know (ie.
> remember)
> > > exact numbers, but my feeling is that you end up storing ~13% of
> document
> > > text (besides, it is a one token fingerprint, therefore quite fast to
> > > search for - you could even try one huge boolean query with 1024
> clauses,
> > > ouch... :))
> > >
> > > roman
> > >
> > > On Thu, Mar 28, 2013 at 11:43 AM, Mike Haas 
> > wrote:
> > >
> > > > Hello. My company is currently thinking of switching over to Solr
> 4.2,
> > > > coming off of SQL Server. However, what we need to do is a bit weird.
> > > >
> > > > Right now, we have ~12 million segments and growing. Usually these
> are
> > > > sentences but can be other things. These segments are what will be
> > stored
> > > > in Solr. I’ve already done that.
> > > >
> > > > Now, what happens is a user will upload say a word document to us. We
> > > then
> > > > parse it and process it into segments. It very well could be 5000
> > > segments
> > > > or even more in that word document. Each one of those ~5000 segments
> > > needs
> > > > to be searched for similar segments in solr. I’m not quite sure how I
> > > will
> > > > do the query (whether proximate or something else). The point though,
> > is
> > > to
> > > > get back similar results for each segment.
> > > >
> > > > However, I think I’m seeing a bigger problem first. I have to search
> > > > against ~5000 segments. That would be 5000 http requests. That’s a
> lot!
> > > I’m
> > > > pretty sure that would take a LOT of hardware. Keep in mind this
> could
> > be
> > > > happening with maybe 4 different users at once right now (and of
> course
> > > > more in the future). Is there a good way to send a batch query over
> one
> > > (or
> > > > at least a lot fewer) http requests?
> > > >
> > > > If not, what kinds of things could I do to implement such a feature
> (if
> > > > feasible, of course)?
> > > >
> > > >
> > > > Thanks,
> > > >
> > > > Mike
> > > >
> > >
> >
>
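
Since the 85% cutoff Mike describes above is the final arbiter, that re-scoring step ports to Java in a few lines. A generic sketch - the percentage convention below is one common choice and may differ from the C# original:

public class LevenshteinMatch {
    // Classic dynamic-programming edit distance.
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    // Similarity as a percentage of the longer string's length.
    static double similarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        if (max == 0) return 100.0;
        return 100.0 * (max - distance(a, b)) / max;
    }

    public static void main(String[] args) {
        // keep a candidate only if it clears the 85% bar from the thread
        System.out.println(similarity("the quick brown fox", "the quick brown box") >= 85.0);
    }
}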


Re: bootstrap_conf without restarting

2013-03-28 Thread Joel Bernstein
You can use the upconfig command which is described on the Solr Cloud wiki
page, followed by a collection reload also described on the wiki. Here is a
sample command upconfig:

java -classpath example/solr-webapp/WEB-INF/lib/*
org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost 127.0.0.1:9983
-confdir example/solr/collection1/conf -confname conf1 -solrhome
example/solr




On Thu, Mar 28, 2013 at 12:05 PM, jimtronic  wrote:

> I'm doing fairly frequent changes to my data-config.xml files on some of my
> cores in a solr cloud setup. Is there any way to get these files active
> and up to Zookeeper without restarting the instance?
>
> I've noticed that if I just launch another instance of solr with the
> bootstrap_conf flag set to true, it uploads the new settings, but it dies
> because there's already a solr instance running on that port. It also seems
> to make the original one unresponsive or at least "down" in zookeeper's
> eyes. I then just restart that instance and everything is back up. It'd be
> nice if I could bootstrap without actually starting solr.
>
> What's the best practice for deploying changes to data-config.xml?
>
> Thanks, Jim
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/bootstrap-conf-without-restarting-tp4052092.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Joel Bernstein
Professional Services LucidWorks


Re: Solr sorting and relevance

2013-03-28 Thread scallawa
Thanks for the fast response.  I am still just learning solr so please bear
with me.  

This still sounds like the wrong products would appear at the top if they
have more inventory unless I am misunderstanding.  High boost low boost
seems to make sense to me.  That alone would return the more relevant items
at the top but once we do a query boost on inventory, wouldn't jeans (using
the aforementioned example) with more inventory than boots appear at the top?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-sorting-and-relevance-tp4051918p4052122.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Too many fields to Sort in Solr

2013-03-28 Thread adityab
Whoa, that's strange. 

I tried toggling the codecFactory line in solrconfig.xml (attached in
this post):
commenting it out gives me an error, whereas un-commenting it works. 

can you please take a look into config and let me know if anything wrong
there?

thanks
Aditya 


solrconfig.xml
  



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Too-many-fields-to-Sort-in-Solr-tp4049139p4052131.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr sorting and relevance

2013-03-28 Thread Joel Bernstein
If you had a high boost on the title with a moderate boost on the inventory
it sounds like you'd get boots first ordered by inventory followed by jeans
ordered by inventory. Because the heavy title boost would move the boots to
the top. You can play with the boost factors to try and get the mix you're
looking for.


On Thu, Mar 28, 2013 at 1:20 PM, scallawa  wrote:

> Thanks for the fast response.  I am still just learning solr so please bear
> with me.
>
> This still sounds like the wrong products would appear at the top if they
> have more inventory unless I am misunderstanding.  High boost low boost
> seems to make sense to me.  That alone would return the more relevant items
> at the top but once we do a query boost on inventory, wouldn't jeans (using
> the aforementioned example) with more inventory than boots appear at the top?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-sorting-and-relevance-tp4051918p4052122.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Joel Bernstein
Professional Services LucidWorks
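
As a concrete (hypothetical) version of the high/moderate mix Joel describes, using edismax - field names are invented, and the log() damping is one way to keep very large inventories from swamping the title score:

import org.apache.solr.client.solrj.SolrQuery;

public class BoostSketch {
    public static SolrQuery build(String userQuery) {
        SolrQuery q = new SolrQuery(userQuery);
        q.set("defType", "edismax");
        q.set("qf", "title^10 description^1");          // heavy title boost
        q.set("boost", "log(sum(inventory_count,1))");  // moderate, damped inventory boost
        return q;
    }
}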


Re: multiple SolrCloud clusters with one ZooKeeper ensemble?

2013-03-28 Thread Bill Au
Thanks.

Now I have to go back and re-read the entire SolrCloud Wiki to see what
other info I missed and/or forgot.

Bill


On Thu, Mar 28, 2013 at 12:48 PM, Chris Hostetter
wrote:

>
> : Can I use a single ZooKeeper ensemble for multiple SolrCloud clusters or
> : would each SolrCloud cluster requires its own ZooKeeper ensemble?
>
> https://wiki.apache.org/solr/SolrCloud#Zookeeper_chroot
>
> (I'm going to FAQ this)
>
>
> -Hoss
>


Could not load config for solrconfig.xml

2013-03-28 Thread A. Lotfi
Hi,
solr setup in windows worked fine,
I tried to follow the steps for installing solr in unix; when I started tomcat I got this
exception:


SEVERE: Unable to create core: collection1
org.apache.solr.common.SolrException: Could not load config for solrconfig.xml
        at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:991)
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051)
        at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
        at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: Can't find resource 'solrconfig.xml' in classpath or '/home/javaguys/solr-home/collection1/conf/', cwd=/home/spbear/javaguys/apache-tomcat-7.0.39/bin
        at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:318)
        at org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:283)
        at org.apache.solr.core.Config.<init>(Config.java:103)
        at org.apache.solr.core.Config.<init>(Config.java:73)
        at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:117)
        at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:989)
        ... 11 more
Mar 28, 2013 1:39:43 PM org.apache.solr.common.SolrException log
SEVERE: null:org.apache.solr.common.SolrException: Unable to create core: collection1
        at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1672)
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1057)
        at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
        at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.solr.common.SolrException: Could not load config for solrconfig.xml
        at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:991)
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051)
        ... 10 more
Caused by: java.io.IOException: Can't find resource 'solrconfig.xml' in classpath or '/home/javaguys/solr-home/collection1/conf/', cwd=/home/spbear/javaguys/apache-tomcat-7.0.39/bin
        at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:318)
        at org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:283)
        at org.apache.solr.core.Config.<init>(Config.java:103)
        at org.apache.solr.core.Config.<init>(Config.java:73)
        at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:117)
        at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:989)
        ... 11 more

Mar 28, 2013 1:39:43 PM org.apache.solr.servlet.SolrDispatchFilter init
INFO: user.dir=/home/spbear/javaguys/apache-tomcat-7.0.39/bin
Mar 28, 2013 1:39:43 PM org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init() done
Mar 28, 2013 1:39:43 PM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /home/spbear/javaguys/apache-tomcat-7.
INFO: Registering Log Listener
Mar 28, 2013 1:39:42 PM org.apache.solr.core.CoreContainer create
INFO: Creating SolrCore 'collection1' using instanceDir: /home/javaguys/solr-home/collection1
Mar 28, 2013 1:39:42 PM org.apache.solr.core.SolrResourceLoader <init>
INFO: new SolrResourceLoader for directory: '/home/javaguys/solr-home/collection1/'
Mar 28, 2013 1:39:43 PM org.apache.solr.core.CoreContainer recordAndThrow
SEVERE: Unable to create core: collection1
org.apache.solr.common.SolrException: Could not load config for solrconfig.xml
        at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:991)
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051)
        at org.apache.solr.core.CoreC

Re: bootstrap_conf without restarting

2013-03-28 Thread Mark Miller
Couple notes though:

> java -classpath example/solr-webapp/WEB-INF/lib/*
> org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost 127.0.0.1:9983
> -confdir example/solr/collection1/conf -confname conf1 -solrhome
> example/solr

I don't think you want that -solrhome - if I remember right, that's for 
testing/local purposes and is just for when you want to run zk internally from 
the cmd. Generally that should be ignored. I think you also might want to put 
the -classpath value in quotes, or your OS can do some auto expanding that 
causes issues…so I think it might be better to do like:

> java -classpath "example/solr-webapp/WEB-INF/lib/*"
> org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost 127.0.0.1:9983
> -confdir example/solr/collection1/conf -confname conf1

I think the examples on the wiki should probably be updated. -solrhome is only 
needed with the bootstrap option I believe.

- Mark

On Mar 28, 2013, at 1:14 PM, Joel Bernstein  wrote:

> You can use the upconfig command which is described on the Solr Cloud wiki
> page, followed by a collection reload also described on the wiki. Here is a
> sample command upconfig:
> 
> java -classpath example/solr-webapp/WEB-INF/lib/*
> org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost 127.0.0.1:9983
> -confdir example/solr/collection1/conf -confname conf1 -solrhome
> example/solr
> 
> 
> 
> 
> On Thu, Mar 28, 2013 at 12:05 PM, jimtronic  wrote:
> 
>> I'm doing fairly frequent changes to my data-config.xml files on some of my
>> cores in a solr cloud setup. Is there any way to get these files active
>> and up to Zookeeper without restarting the instance?
>> 
>> I've noticed that if I just launch another instance of solr with the
>> bootstrap_conf flag set to true, it uploads the new settings, but it dies
>> because there's already a solr instance running on that port. It also seems
>> to make the original one unresponsive or at least "down" in zookeeper's
>> eyes. I then just restart that instance and everything is back up. It'd be
>> nice if I could bootstrap without actually starting solr.
>> 
>> What's the best practice for deploying changes to data-config.xml?
>> 
>> Thanks, Jim
>> 
>> 
>> 
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/bootstrap-conf-without-restarting-tp4052092.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>> 
> 
> 
> 
> -- 
> Joel Bernstein
> Professional Services LucidWorks



Re: Solr sorting and relevance

2013-03-28 Thread Otis Gospodnetic
Hi,

But can you ever get this universally right?
In some cases there is very little inventory and in some cases there is
a ton of inventory, so even if you use a small boost for inventory,
when the inventory is very large, that will overpower the title boost,
no?

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Thu, Mar 28, 2013 at 2:27 PM, Joel Bernstein  wrote:
> If you had a high boost on the title with a moderate boost on the inventory
> it sounds like you'd get boots first ordered by inventory followed by jeans
> ordered by inventory. Because the heavy title boost would move the boots to
> the top. You can play with the boost factors to try and get the mix you're
> looking for.
>
>
> On Thu, Mar 28, 2013 at 1:20 PM, scallawa  wrote:
>
>> Thanks for the fast response.  I am still just learning solr so please bear
>> with me.
>>
>> This still sounds like the wrong products would appear at the top if they
>> have more inventory unless I am misunderstanding.  High boost low boost
>> seems to make sense to me.  That alone would return the more relevant items
>> at the top but once we do a query boost on inventory, wouldn't jeans (using
>> the aforementioned example) with more inventory than boots appear at the top?
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Solr-sorting-and-relevance-tp4051918p4052122.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>
>
> --
> Joel Bernstein
> Professional Services LucidWorks


Re: Could not load config for solrconfig.xml

2013-03-28 Thread Gora Mohanty
On 29 March 2013 00:19, A. Lotfi  wrote:
> Hi,
> solr setup in windows worked fine,
> I tried to follow the steps for installing solr in unix; when I started tomcat I got this
> exception:
[...]

Seems it cannot find solrconfig.xml. The relevant part from the logs is:
Caused by: java.io.IOException: Can't find resource 'solrconfig.xml' in classpath or '/home/javaguys/solr-home/collection1/conf/', cwd=/home/spbear/javaguys/apache-tomcat-7.0.39/bin

Have you defined the solr/home property properly in your Solr
configuration file?

Regards,
Gora


Re: SOLR - Documents with large number of fields ~ 450

2013-03-28 Thread Marcin Rzewucki
Hi John,

Mark is right. DocValues can be enabled in two ways: RAM resident (default)
or on-disk. You can read more here:
http://www.slideshare.net/LucidImagination/column-stride-fields-aka-docvalues

Regards.

On 22 March 2013 16:55, John Nielsen  wrote:

> "with the on disk option".
>
> Could you elaborate on that?
> On 22/03/2013 05.25, "Mark Miller"  wrote:
>
> > You might try using docvalues with the on disk option and try and let the
> > OS manage all the memory needed for all the faceting/sorting. This would
> > require Solr 4.2.
> >
> > - Mark
> >
> > On Mar 21, 2013, at 2:56 AM, kobe.free.wo...@gmail.com wrote:
> >
> > > Hello All,
> > >
> > > Scenario:
> > >
> > > My data model consist of approx. 450 fields with different types of
> > data. We
> > > want to include each field for indexing as a result it will create a
> > single
> > > SOLR document with *450 fields*. The total of number of records in the
> > data
> > > set is *755K*. We will be using the features like faceting and sorting
> on
> > > approx. 50 fields.
> > >
> > > We are planning to use SOLR 4.1. Following is the hardware
> configuration
> > of
> > > the web server that we plan to install SOLR on:-
> > >
> > > CPU: 2 x Dual Core (4 cores) | RAM: 12GB | Storage: 212 GB
> > >
> > > Questions :
> > >
> > > 1)What's the best approach when dealing with documents with large
> number
> > of
> > > fields. What's the drawback of having a single document with a very
> large
> > > number of fields. Does SOLR support documents with large number of
> > fields as
> > > in my case?
> > >
> > > 2)Will there be any performance issue if i define all of the 450 fields
> > for
> > > indexing? Also if faceting is done on 50 fields with document having
> > large
> > > number of fields and huge number of records?
> > >
> > > 3)The name of the fields in the data set are quiet lengthy around 60
> > > characters. Will it be a problem defining fields with such a huge name
> in
> > > the schema file? Is there any best practice to be followed related to
> > naming
> > > convention? Will big field names create problem during querying?
> > >
> > > Thanks!
> > >
> > >
> > >
> > > --
> > > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/SOLR-Documents-with-large-number-of-fields-450-tp4049633.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.
> >
> >
>


Re: Solr sorting and relevance

2013-03-28 Thread Joel Bernstein
Otis brings up a good point. Possibly you could put logic in your function
query to account for this. But it may be that you can't achieve the mix
you're looking for without taking direct control.

That is the main reason that SOLR-4465 was put out there, for cases where
direct control is needed. I have to reiterate that SOLR-4465 is
experimental at this point and subject to change.



On Thu, Mar 28, 2013 at 3:00 PM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Hi,
>
> But can you ever get this universally right?
> In some cases there is very little inventory and in some cases there is
> a ton of inventory, so even if you use a small boost for inventory,
> when the inventory is very large, that will overpower the title boost,
> no?
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
>
>
>
>
>
> On Thu, Mar 28, 2013 at 2:27 PM, Joel Bernstein 
> wrote:
> > If you had a high boost on the title with a moderate boost on the
> inventory
> > it sounds like you'd get boots first ordered by inventory followed by
> jeans
> > ordered by inventory. Because the heavy title boost would move the boots
> to
> > the top. You can play with the boost factors to try and get the mix
> you're
> > looking for.
> >
> >
> > On Thu, Mar 28, 2013 at 1:20 PM, scallawa  wrote:
> >
> >> Thanks for the fast response.  I am still just learning solr so please
> bear
> >> with me.
> >>
> >> This still sounds like the wrong products would appear at the top if
> they
> >> have more inventory unless I am misunderstanding.  High boost low boost
> >> seems to make sense to me.  That alone would return the more relevant
> items
> >> at the top but once we do a query boost on inventory, wouldn't jeans
> (using
> >> the aforementioned example) with more inventory than boots appear at
> top.
> >>
> >>
> >>
> >> --
> >> View this message in context:
> >>
> http://lucene.472066.n3.nabble.com/Solr-sorting-and-relevance-tp4051918p4052122.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> >
> >
> >
> > --
> > Joel Bernstein
> > Professional Services LucidWorks
>



-- 
Joel Bernstein
Professional Services LucidWorks


Re: Could not load config for solrconfig.xml

2013-03-28 Thread A. Lotfi
Thanks, 
my path to the solr home was missing something; it's working now, but no results. The 
same solr app with the same configuration files worked in windows.
 
Abdel



 From: Gora Mohanty 
To: solr-user@lucene.apache.org; A. Lotfi  
Cc: "gene...@lucene.apache.org"  
Sent: Thursday, March 28, 2013 3:22 PM
Subject: Re: Could not load config for solrconfig.xml
 
On 29 March 2013 00:19, A. Lotfi  wrote:
> Hi,
> solr setup in windows worked fine,
> I tried to follow the steps for installing solr in unix; when I started tomcat I got this
> exception:
[...]

Seems it cannot find solrconfig.xml. The relevant part from the logs is:
Caused by: java.io.IOException: Can't find resource 'solrconfig.xml' in classpath or '/home/javaguys/solr-home/collection1/conf/', cwd=/home/spbear/javaguys/apache-tomcat-7.0.39/bin

Have you defined the solr/home property properly in your Solr
configuration file?

Regards,
Gora

Re: Too many fields to Sort in Solr

2013-03-28 Thread Joel Bernstein
Not sure that making changes to the solrconfig.xml is going down the right
path here. There might be something else with your setup that's causing this
issue. I'm not sure what it would be though.


On Thu, Mar 28, 2013 at 1:38 PM, adityab  wrote:

> Whoa, that's strange.
>
> I tried toggling the codecFactory line in solrconfig.xml (attached in
> this post):
> commenting it out gives me an error, whereas un-commenting it works.
>
> can you please take a look into config and let me know if anything wrong
> there?
>
> thanks
> Aditya
>
>
> solrconfig.xml
> 
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Too-many-fields-to-Sort-in-Solr-tp4049139p4052131.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Joel Bernstein
Professional Services LucidWorks


Re: bootstrap_conf without restarting

2013-03-28 Thread Timothy Potter
I do this frequently, but use the scripts provided in cloud-scripts, e.g.

export ZK_HOST=...

cloud-scripts/zkcli.sh -zkhost $ZK_HOST -cmd upconfig -confdir
$COLLECTION_INSTANCE_DIR/conf -confname $COLLECTION_NAME

Also, once you do this, you still have to reload the collection so that it
picks up the change:

curl -i -v "
http://URL/solr/admin/collections?action=RELOAD&name=COLLECTION_NAME";




On Thu, Mar 28, 2013 at 1:03 PM, Mark Miller  wrote:

> Couple notes though:
>
> > java -classpath example/solr-webapp/WEB-INF/lib/*
> > org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost 127.0.0.1:9983
> > -confdir example/solr/collection1/conf -confname conf1 -solrhome
> > example/solr
>
> I don't think you want that -solrhome - if I remember right, that's for
> testing/local purposes and is just for when you want to run zk internally
> from the cmd. Generally that should be ignored. I think you also might want
> to put the -classpath value in quotes, or your OS can do some auto
> expanding that causes issues…so I think it might be better to do like:
>
> > java -classpath "example/solr-webapp/WEB-INF/lib/*"
> > org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost 127.0.0.1:9983
> > -confdir example/solr/collection1/conf -confname conf1
>
> I think the examples on the wiki should probably be updated. -solrhome is
> only needed with the bootstrap option I believe.
>
> - Mark
>
> On Mar 28, 2013, at 1:14 PM, Joel Bernstein  wrote:
>
> > You can use the upconfig command which is described on the Solr Cloud
> wiki
> > page, followed by a collection reload also described on the wiki. Here
> is a
> > sample command upconfig:
> >
> > java -classpath example/solr-webapp/WEB-INF/lib/*
> > org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost 127.0.0.1:9983
> > -confdir example/solr/collection1/conf -confname conf1 -solrhome
> > example/solr
> >
> >
> >
> >
> > On Thu, Mar 28, 2013 at 12:05 PM, jimtronic  wrote:
> >
> >> I'm doing fairly frequent changes to my data-config.xml files on some
> of my
> >> cores in a solr cloud setup. Is there any way to get these files
> active
> >> and up to Zookeeper without restarting the instance?
> >>
> >> I've noticed that if I just launch another instance of solr with the
> >> bootstrap_conf flag set to true, it uploads the new settings, but it
> dies
> >> because there's already a solr instance running on that port. It also
> seems
> >> to make the original one unresponsive or at least "down" in zookeeper's
> >> eyes. I then just restart that instance and everything is back up. It'd
> be
> >> nice if I could bootstrap without actually starting solr.
> >>
> >> What's the best practice for deploying changes to data-config.xml?
> >>
> >> Thanks, Jim
> >>
> >>
> >>
> >> --
> >> View this message in context:
> >>
> http://lucene.472066.n3.nabble.com/bootstrap-conf-without-restarting-tp4052092.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> >
> >
> >
> > --
> > Joel Bernstein
> > Professional Services LucidWorks
>
>


Re: Could not load config for solrconfig.xml

2013-03-28 Thread Gora Mohanty
On 29 March 2013 01:59, A. Lotfi  wrote:
> Thanks,
> my path to the solr home was missing something; it's working now, but no results.
> The same solr app with the same configuration files worked in windows.

What do you mean by "no results"? Have you indexed stuff, and
are not able to search for it? Are you expecting to copy Solr files
from an old setup with an index, and have things work? That would
be OK, provided that the Solr index formats were compatible,
but you would also need to copy the index, and define dataDir
properly in solrconfig.xml.

Regards,
Gora


Re: Solr Cloud update process

2013-03-28 Thread Walter Underwood
There are lots of small issues, though. 

1. Is Solr tested with a mix of current and previous versions? Is it safe to 
run a cluster that is a mix of 4.1 and 4.2, even for a little bit?

2. Can Solr 4.2 run with Solr 4.1 config files? This means all of conf/, not 
just the main XML files.

3. We don't want a cluster with config files that are ahead of the software 
version, so I think we need:

* Update all the war files and restart each Solr process.
* Upload the new config files 
* Reload each collection on each Solr process

But this requires that Solr 4.2 be able to start with Solr 4.1 config files.

4. Do we need to stop updates, wait for all nodes to sync, and not restart 
until the whole cluster is upgraded?

5. I'd like a bit more detail about exactly what upconfig is supposed to do, 
because I spent a lot of time with it doing things that did not result in a 
working Solr cluster. For example, for files in the directory argument, where 
exactly do they end up in the Zookeeper space? Currently, I've been doing 
updates with bootstrap, because it was the only thing I could get to work.

wunder

On Mar 27, 2013, at 11:56 AM, Shawn Heisey wrote:

> On 3/27/2013 12:34 PM, Walter Underwood wrote:
>> What do people do for updating, say from 4.1 to 4.2.1, on a live cluster?
>> 
>> I need to help our release engineering team create the Jenkins scripts for 
>> deployment.
> 
> Aside from replacing the .war file and restarting your container, there 
> hopefully won't be anything additional required.
> 
> The subject says SolrCloud, so your config(s) should be in zookeeper. It 
> would generally be a good idea to update luceneMatchVersion to LUCENE_42 in 
> the config(s), unless you happen to know that you're relying on behavior from 
> the old version that changed in the new version.
> 
> I also make a point of deleting the old extracted version of the .war before 
> restarting, just to be sure there won't be any problems.  In theory a servlet 
> container should be able to handle this without intervention, but I don't 
> like taking the chance.
> 
> Thanks,
> Shawn
> 






Re: Solr Cloud update process

2013-03-28 Thread Timothy Potter
Hi Walter,

I just did our upgrade from a nightly build of 4.1 (a few weeks before the
release) to 4.2 - thankfully it went off with 0 downtime and no issues ;-)

First and foremost, I had a staging environment that I upgraded first so I
already had a good feeling that things would be fine. Hopefully you have a
sandbox environment where you can mess around with the upgrade first.

On Thu, Mar 28, 2013 at 3:01 PM, Walter Underwood wrote:

> There are lots of small issues, though.
>
> 1. Is Solr tested with a mix of current and previous versions? Is it safe
> to run a cluster that is a mix of 4.1 and 4.2, even for a little bit?
>
>
I did a rolling upgrade and no issues. So I dropped a node, waited until
that was noticed by Zk (almost instant). This left me with a new leader
still on 4.1 and then I brought up a replica on 4.2. Then I took down the
leader on 4.1 (so Solr failed over to my 4.2 node) and brought it up to 4.2



> 2. Can Solr 4.2 run with Solr 4.1 config files? This means all of conf/,
> not just the main XML files.
>

Afaik yes - I didn't change any configuration between 4.1 and 4.2 other
than some newSearcher warming queries and cache settings


>
> 3. We don't want a cluster with config files that are ahead of the
> software version, so I think we need:
>
> * Update all the war files and restart each Solr process.
> * Upload the new config files
> * Reload each collection on each Solr process
>
> But this requires that Solr 4.2 be able to start with Solr 4.1 config
> files.
>

This is what I did too.


>
> 4. Do we need to stop updates, wait for all nodes to sync, and not restart
> until the whole cluster is upgraded?
>

Can't help you on this one as I was not accepting updates during the
upgrade.


>
> 5. I'd like a bit more detail about exactly what upconfig is supposed to
> do, because I spent a lot of time with it doing things that did not result
> in a working Solr cluster. For example, for files in the directory
> argument, where exactly do they end up in the Zookeeper space? Currently,
> I've been doing updates with bootstrap, because it was the only thing I
> could get to work.
>
>
So when you do upconfig, you pass the collection name, so the files get put
under: /configs/COLLECTION_NAME
You can test this by doing the upconfig and then going into the admin
console: Cloud > Tree > /configs and verifying your updates are correct.




> wunder
>
> On Mar 27, 2013, at 11:56 AM, Shawn Heisey wrote:
>
> > On 3/27/2013 12:34 PM, Walter Underwood wrote:
> >> What do people do for updating, say from 4.1 to 4.2.1, on a live
> cluster?
> >>
> >> I need to help our release engineering team create the Jenkins scripts
> for deployment.
> >
> > Aside from replacing the .war file and restarting your container, there
> hopefully won't be anything additional required.
> >
> > The subject says SolrCloud, so your config(s) should be in zookeeper. It
> would generally be a good idea to update luceneMatchVersion to LUCENE_42 in
> the config(s), unless you happen to know that you're relying on behavior
> from the old version that changed in the new version.
> >
> > I also make a point of deleting the old extracted version of the .war
> before restarting, just to be sure there won't be any problems.  In theory
> a servlet container should be able to handle this without intervention, but
> I don't like taking the chance.
> >
> > Thanks,
> > Shawn
> >
>
>
>
>
>
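
One way to script the verification Tim describes (instead of eyeballing the admin tree) is to list the znodes directly. A sketch using Solr's own ZooKeeper client, with the address and confname taken from the upconfig examples above:

import java.util.List;
import org.apache.solr.common.cloud.SolrZkClient;

public class VerifyConfigs {
    public static void main(String[] args) throws Exception {
        // connect straight to ZooKeeper and list what upconfig uploaded;
        // "conf1" is the confname used in the earlier examples
        SolrZkClient zk = new SolrZkClient("127.0.0.1:9983", 10000);
        try {
            List<String> files = zk.getChildren("/configs/conf1", null, true);
            for (String f : files) System.out.println(f);
        } finally {
            zk.close();
        }
    }
}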


Re: Solr Cloud update process

2013-03-28 Thread Mark Miller
Comments inline below. Overall - we need to focus on upgrades at some 
point, but there is little that should stop the old distrib update process from 
working (multi node clusters pre solrcloud).

However, we should have tests and stuff. If only the days were twice as long.

On Mar 28, 2013, at 5:27 PM, Timothy Potter  wrote:

> Hi Walter,
> 
> I just did our upgrade from a nightly build of 4.1 (a few weeks before the
> release) to 4.2 - thankfully it went off with 0 downtime and no issues ;-)
> 
> First and foremost, I had a staging environment that I upgraded first so I
> already had a good feeling that things would be fine. Hopefully you have a
> sandbox environment where you can mess around with the upgrade first.
> 
> On Thu, Mar 28, 2013 at 3:01 PM, Walter Underwood 
> wrote:
> 
>> There are lots of small issues, though.
>> 
>> 1. Is Solr tested with a mix of current and previous versions? Is it safe
>> to run a cluster that is a mix of 4.1 and 4.2, even for a little bit?
>> 
>> 
> I did a rolling upgrade and no issues. So I dropped a node, waited until
> that was noticed by Zk (almost instant). This left me with a new leader
> still on 4.1 and then I brought up a replica on 4.2. Then I took down the
> leader on 4.1 (so Solr failed over to my 4.2 node) and brought it up to 4.2
> 
> 
> 
>> 2. Can Solr 4.2 run with Solr 4.1 config files? This means all of conf/,
>> not just the main XML files.
>> 
> 
> Afaik yes - I didn't change any configuration between 4.1 and 4.2 other
> than some newSearcher warming queries and cache settings

That's generally been how things work - old config works with new versions. 
Occasionally, things might get deprecated. That's why there is the version 
thing in solrconfig.xml.

> 
> 
>> 
>> 3. We don't want a cluster with config files that are ahead of the
>> software version, so I think we need:
>> 
>> * Update all the war files and restart each Solr process.
>> * Upload the new config files
>> * Reload each collection on each Solr process
>> 
>> But this requires that Solr 4.2 be able to start with Solr 4.1 config
>> files.
>> 
> 
> This is what I did too.
> 
> 
>> 
>> 4. Do we need to stop updates, wait for all nodes to sync, and not restart
>> until the whole cluster is upgraded?
>> 
> 
> Can't help you on this one as I was not accepting updates during the
> upgrade.

This should generally work fine.

> 
> 
>> 
>> 5. I'd like a bit more detail about exactly what upconfig is supposed to
>> do, because I spent a lot of time with it doing things that did not result
>> in a working Solr cluster. For example, for files in the directory
>> argument, where exactly do they end up in the Zookeeper space? Currently,
>> I've been doing updates with bootstrap, because it was the only thing I
>> could get to work.
>> 
>> 
> So when you do upconfig, you pass the collection name, so the files get put
> under: /configs/COLLECTION_NAME
> You can test this by doing the upconfig and then going into the admin
> console: Cloud > Tree > /configs and verifying your updates are correct.

The main different between using bootstrap and upconfig is that upconfig does 
not link a collection to a config set.

You must have a link from a collection to a config set. The following rules 
apply for this:

1. If there is only one config set, when you start a new collection without an 
explicit link, it will link to it.
2. If a collection does not have an explicit link, but shares the name of a 
config set, it will link to it.
3. You can set an explicit link.

Also, you can link before creating the collection - it will sit in zk waiting 
for the collection to find it.

- Mark

> 
> 
> 
> 
>> wunder
>> 
>> On Mar 27, 2013, at 11:56 AM, Shawn Heisey wrote:
>> 
>>> On 3/27/2013 12:34 PM, Walter Underwood wrote:
 What do people do for updating, say from 4.1 to 4.2.1, on a live
>> cluster?
 
 I need to help our release engineering team create the Jenkins scripts
>> for deployment.
>>> 
>>> Aside from replacing the .war file and restarting your container, there
>> hopefully won't be anything additional required.
>>> 
>>> The subject says SolrCloud, so your config(s) should be in zookeeper. It
>> would generally be a good idea to update luceneMatchVersion to LUCENE_42 in
>> the config(s), unless you happen to know that you're relying on behavior
>> from the old version that changed in the new version.
>>> 
>>> I also make a point of deleting the old extracted version of the .war
>> before restarting, just to be sure there won't be any problems.  In theory
>> a servlet container should be able to handle this without intervention, but
>> I don't like taking the chance.
>>> 
>>> Thanks,
>>> Shawn
>>> 
>> 
>> 
>> 
>> 
>> 
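
The "explicit link" in Mark's rules can be set with the same ZkCLI tool used for upconfig earlier in the thread. A sketch, with placeholder collection and config names, assuming the Solr 4.x jars are on the classpath and ZooKeeper at 127.0.0.1:9983:

public class LinkConfig {
    public static void main(String[] args) throws Exception {
        // invokes the linkconfig command programmatically; the same flags
        // work from the command line with java -classpath "..." ZkCLI
        org.apache.solr.cloud.ZkCLI.main(new String[] {
            "-cmd", "linkconfig",
            "-collection", "collection1",
            "-confname", "conf1",
            "-zkhost", "127.0.0.1:9983"
        });
    }
}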



Re: [ANNOUNCE] Solr wiki editing change

2013-03-28 Thread Tomás Fernández Löbbe
Steve, could you add me to the contrib group? TomasFernandezLobbe

Thanks!

Tomás


On Thu, Mar 28, 2013 at 1:04 PM, Steve Rowe  wrote:

> On Mar 28, 2013, at 11:57 AM, Jilal Oussama 
> wrote:
> > Please add OussamaJilal to the group.
>
> Added to solr ContributorsGroup.
>


Re: [ANNOUNCE] Solr wiki editing change

2013-03-28 Thread Steve Rowe
On Mar 28, 2013, at 5:43 PM, Tomás Fernández Löbbe  
wrote:
> Steve, could you add me to the contrib group? TomasFernandezLobbe

Added to solr ContributorsGroup.

How to shut down the SolrCloud?

2013-03-28 Thread Li, Qiang
How to shut down the SolrCloud? Just kill all nodes?

Regards,
Ivan



Re: Solrcloud 4.1 Collection with multiple slices only use

2013-03-28 Thread Chris R
So, by using the numshards  at initialization time, using the sample
collection1 solr.xml, I'm able to create a sharded and distributed index.
Also, by removing any initial cores from the solr.xml file, I'm able to use
the collections API via the web to create multiple collections with sharded
indexes that work correctly; however, I can't create distributed
collections by using the solr.xml alone.  Adding the numShards parameter
to the first instance of a collection core in the solr.xml file is ignored:
cores are created, but update distribution doesn't happen.  When booting up
Solr, the config INFO messages show numShards=null.  I get the impression
from the documentation that you should be able to do this, but I haven't
seen a specific example.

Without that, it seems that I'm relegated to the shard names, locations,
etc provided by the collections API.  I've done this testing under 4.1

True or False?

Chris
 On Mar 27, 2013 9:46 PM, "corg...@gmail.com"  wrote:

> I realized my error shortly, more docs, better spread.  I continued to do
> some testing to see how I could manually lay out the shards in what I
> thought was a more organized manner and with more descriptive  names than
> the numshards parameter alone produced.  I also gen'd up a few thousand
> docs and schema to test with.
>
> Appreciate the help.
>
>
>
> - Reply message -
> From: "Erick Erickson" 
> To: 
> Subject: Solrcloud 4.1 Collection with multiple slices only use
> Date: Wed, Mar 27, 2013 9:30 pm
>
>
> First, three documents isn't enough to really test. The formula for
> assigning shards is to hash on the unique ID. It _is_ possible that
> all three just happened to land on the same shard. If you index all 32
> docs in the example dir and they're all on the same shard, we should
> talk.
>
> Second, a regular query to the cluster will always search all the
> shards. Use &distrib=false on the URL to restrict the search to just
> the node you fire the request at.
>
> Let us know if you index more docs and still see the problem.
>
> Best
> Erick
>
> On Wed, Mar 27, 2013 at 9:39 AM, Chris R  wrote:
> > So - I must be missing something very basic here and I've gone back to
> the
> > Wiki example.  After setting up the two shard example in the first
> tutorial
> > and indexing the three example documents, look at the shards in the Admin
> > UI.  The documents are stored in the index where the update was
> directed -
> > they aren't distributed across both shards.
> >
> > Release notes state that the compositeId router is the default when using
> > the numshards parameter?  I want an even distribution of documents based
> on
> > ID across all shards; suggestions on what I'm screwing up?
> >
> > Chris
> >
> > On Mon, Mar 25, 2013 at 11:34 PM, Mark Miller 
> wrote:
> >
> >> I'm guessing you didn't specify numShards. Things changed in 4.1 - if
> you
> >> don't specify numShards it goes into a mode where it's up to you to
> >> distribute updates.
> >>
> >> - Mark
> >>
> >> On Mar 25, 2013, at 10:29 PM, Chris R  wrote:
> >>
> >> > I have two issues and I'm unsure if they are related:
> >> >
> >> > Problem:  After setting up a multiple collection Solrcloud 4.1
> instance
> >> on
> >> > seven servers, when I index the documents they aren't distributed
> across
> >> > the index slices.  It feels as though, I don't actually have a "cloud"
> >> > implementation, yet everything I see in the admin interface and
> zookeeper
> >> > implies I do.  I feel as I'm overlooking something obvious, but have
> not
> >> > been able to figure out what.
> >> >
> >> > Configuration: Seven servers and four collections, each with 12 slices
> >> (no
> >> > replica shards yet).  Zookeeper configured in a three node ensemble.
> >>  When
> >> > I send documents to Server1/Collection1 (which holds two slices of
> >> > collection1), all the documents show up in a single index shard
> (core).
> >> > Perhaps related, I have found it impossible to get Solr to recognize
> the
> >> > server names with anything but a literal host="servername" parameter
> in
> >> the
> >> > solr.xml.  hostname parameters, host files, network, dns, are all
> >> > configured correctly
> >> >
> >> > I have a Solr 4.0 single collection set up similarly and it works just
> >> > fine.  I'm using the same schema.xml and solrconfig.xml files on the
> 4.1
> >> > implementation with only the luceneMatchVersion changed to LUCENE_41.
> >> >
> >> > sample solr.xml from server1
> >> >
> >> > 
> >> > 
> >> > <cores ...
> >> > shareSchema="true" zkClientTimeout="6">
> >> > <core ...
> >> > instanceDir="/solr/col201301/col201301s04sh01" name="col201301s04sh01"
> >> > dataDir="/solr/col201301/col201301s04sh01/data"/>
> >> > <core ...
> >> > instanceDir="/solr/col201301/col201301s11sh01" name="col201301s11sh01"
> >> > dataDir="/solr/col201301/col201301s11sh01/data"/>
> >> > <core ...
> >> > instanceDir="/solr/col201302/col201302s06sh01" name="col201302s06sh01"
> >> > dataDir="/solr/col201302/col201302s06sh01/data"/>
> >> > <core ... instance

Re: Solrcloud 4.1 Collection with multiple slices only use

2013-03-28 Thread Mark Miller
True - though I think for 4.2. numShards has never been respected in the

In 4.0 and 4.1, things should have still worked though - you didn't need
to give numShards and everything should work just based on configuring
different shard names for core or accepting the default shard names.

In 4.2 this went away - not passing numShards now means that you must
distrib updates yourself. There are various technical reasons for this
given new features that are being added.

So, you can only really pre configure *one* collection in solr.xml and
then use the numShards sys prop. If you wanted to create another collection
the same way with a *different* number of shards, you would have to stop
Solr, do a new numShards sys prop after pre configuring the next
collection, then start Solr. Not really a good option.

And so, the collections API is the way to go - and it's fairly poor in 4.1
due to its lack of result responses (you have to search the overseer
logs). It's slightly better in 4.2 (you will get some response) and much
better in 4.2.1 (you will get decent responses).

Now that it's much more central, it will continue to improve rapidly.

- Mark

On Mar 28, 2013, at 6:08 PM, Chris R  wrote:

> So, by using the numshards  at initialization time, using the sample
> collection1 solr.xml, I'm able to create a sharded and distributed index.
> Also, by removing any initial cores from the solr.xml file, I'm able to use
> the collections API via the web to create multiple collections with sharded
> indexes that work correctly; however, I can't create distributed
> collections by using the solr.xml alone.   Adding the numshards parameter
> to the first instance of a collection core in the solr.xml file is ignored,
> cores are created, but update distribution doesn't happen.  When booting up
> Solr, the configs INFO messages show numShards= null.  I get the impression
> from the documentation that you should be able to do this, but I haven't
> seen a specific example.
> 
> Without that, it seems that I'm relegated to the shard names, locations,
> etc. provided by the collections API.  I've done this testing under 4.1
> 
> True or False?
> 
> Chris
> On Mar 27, 2013 9:46 PM, "corg...@gmail.com"  wrote:
> 
>> I realized my error shortly, more docs, better spread.  I continued to do
>> some testing to see how I could manually lay out the shards in what I
>> thought was a more organized manner and with more descriptive  names than
>> the numshards parameter alone produced.  I also gen'd up a few thousand
>> docs and schema to test with.
>> 
>> Appreciate the help.
>> 
>> 
>> 
>> - Reply message -
>> From: "Erick Erickson" 
>> To: 
>> Subject: Solrcloud 4.1 Collection with multiple slices only use
>> Date: Wed, Mar 27, 2013 9:30 pm
>> 
>> 
>> First, three documents isn't enough to really test. The formula for
>> assigning shards is to hash on the unique ID. It _is_ possible that
>> all three just happened to land on the same shard. If you index all 32
>> docs in the example dir and they're all on the same shard, we should
>> talk.
>> 
>> Second, a regular query to the cluster will always search all the
>> shards. Use &distrib=false on the URL to restrict the search to just
>> the node you fire the request at.
>> 
>> Let us know if you index more docs and still see the problem.
>> 
>> Best
>> Erick
>> 
>> On Wed, Mar 27, 2013 at 9:39 AM, Chris R  wrote:
>>> So - I must be missing something very basic here and I've gone back to
>>> the Wiki example.  After setting up the two shard example in the first
>>> tutorial and indexing the three example documents, look at the shards
>>> in the Admin UI.  The documents are stored in the index where the update
>>> was directed - they aren't distributed across both shards.
>>> 
>>> Release notes state that the compositeId router is the default when using
>>> the numshards parameter?  I want an even distribution of documents based
>>> on ID across all shards; any suggestions on what I'm screwing up?
>>> 
>>> Chris
>>> 
>>> On Mon, Mar 25, 2013 at 11:34 PM, Mark Miller 
>> wrote:
>>> 
 I'm guessing you didn't specify numShards. Things changed in 4.1 - if
>> you
 don't specify numShards it goes into a mode where it's up to you to
 distribute updates.
 
 - Mark
 
 On Mar 25, 2013, at 10:29 PM, Chris R  wrote:
 
> I have two issues and I'm unsure if they are related:
> 
> Problem:  After setting up a multiple collection Solrcloud 4.1
>> instance
 on
> seven servers, when I index the documents they aren't distributed
>> across
> the index slices.  It feels as though, I don't actually have a "cloud"
> implementation, yet everything I see in the admin interface and
>> zookeeper
> implies I do.  I feel as I'm overlooking something obvious, but have
>> not
> been able to figure out what.
> 
> Configuration: Seven servers and four collections, each with 12 slices
 (no
> replica shards yet).  Zookeeper configured in a three node ensemble.
 When
> I send documents to Server1/Collection1 (which holds two slices of
> collection1), all the documents show up in a single index shard
>> (core).
> Perhaps related, I have found it impossible to get Solr to recognize
>> the
> server names with anything but a literal host="servername" parameter
>> in
 the
> solr.xml.  hostname parameters, host files, network, dns, are all
> configured correctly
> 
> I have a Solr 4.0 single collection set up similarly and it works just
> fine.  I'm using the same schema.xml and solrconfig.xml files on the
>> 4.1
> implementation with only the luceneMatchVersion changed to LUCENE_41.
> 
> sample solr.xml from server1
> 
> 
> 
> <cores ...
> shareSchema="true" zkClientTimeout="6">
> <core ...
> instanceDir="/solr/col201301/col201301s04sh01" name="col201301s04sh01"
> dataDir="/solr/col201301/col201301s04sh01/data"/>
> <core ...
> instanceDir="/solr/col201301/col201301s11sh01" name="col201301s11sh01"
> dataDir="/solr/col201301/col201301s11sh01/data"/>
> >

Re: How to shut down the SolrCloud?

2013-03-28 Thread Mark Miller
Currently, yes. Stop each web container in the normal fashion. That will do a 
clean shutdown.
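
With the stock Jetty example, assuming you started it with a stop port
configured, that would be something like:

  java -DSTOP.PORT=7983 -DSTOP.KEY=secret -jar start.jar --stop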

- Mark

On Mar 28, 2013, at 5:48 PM, "Li, Qiang"  wrote:

> How to shut down the SolrCloud? Just kill all nodes?
> 
> Regards,
> Ivan
> 
> This email message and any attachments are for the sole use of the intended 
> recipients and may contain proprietary and/or confidential information which 
> may be privileged or otherwise protected from disclosure. Any unauthorized 
> review, use, disclosure or distribution is prohibited. If you are not an 
> intended recipient, please contact the sender by reply email and destroy the 
> original message and any copies of the message as well as any attachments to 
> the original message. Local registered entity information: 
> http://www.msci.com/legal/local_registered_entities.html



Re: Solrcloud 4.1 Collection with multiple slices only use

2013-03-28 Thread Chris R
Interesting, I've been doing battle with it while coming from a 4.0
environment.  I only had a single collection then and just created the
solr.xml files for each server up front.  They each supported a half dozen
cores for a single collection.

As for 4.1 and collections API, the only issue I've had is the
maxCoresPerNode.   As you said, the responses all say OK even when it's
not.  I'll probably move up to 4.2 tomorrow.

Thanks for the reply.
On Mar 28, 2013 6:23 PM, "Mark Miller"  wrote:

> True - though I think for 4.2. numShards has never been respected in the
> 
> In 4.0 and 4.1, things should have still worked though - you didn't need
> to give numShards and everything should work just based on configuring
> different shard names for core or accepting the default shard names.
>
> In 4.2 this went away - not passing numShards now means that you must
> distrib updates yourself. There are various technical reasons for this
> given new features that are being added.
>
> So, you can only really pre configure *one* collection in solr.xml and
> then use the numShards sys prop. If you wanted to create another collection
> the same way with a *different* number of shards, you would have to stop
> Solr, do a new numShards sys prop after pre configuring the next
> collection, then start Solr. Not really a good option.
>
> And so, the collections API is the way to go - and it's fairly poor in 4.1
> due to its lack of result responses (you have to search the overseer
> logs). It's slightly better in 4.2 (you will get some response) and much
> better in 4.2.1 (you will get decent responses).
>
> Now that it's much more central, it will continue to improve rapidly.
>
> - Mark
>
> On Mar 28, 2013, at 6:08 PM, Chris R  wrote:
>
> > So, by using the numshards  at initialization time, using the sample
> > collection1 solr.xml, I'm able to create a sharded and distributed index.
> > Also, by removing any initial cores from the solr.xml file, I'm able to
> > use the collections API via the web to create multiple collections with
> > sharded indexes that work correctly; however, I can't create distributed
> > collections by using the solr.xml alone.   Adding the numshards parameter
> > to the first instance of a collection core in the solr.xml file is
> > ignored, cores are created, but update distribution doesn't happen.  When
> > booting up Solr, the configs INFO messages show numShards= null.  I get
> > the impression from the documentation that you should be able to do this,
> > but I haven't seen a specific example.
> >
> > Without that, it seems that I'm relegated to the shard names, locations,
> > etc. provided by the collections API.  I've done this testing under 4.1
> >
> > True or False?
> >
> > Chris
> > On Mar 27, 2013 9:46 PM, "corg...@gmail.com"  wrote:
> >
> >> I realized my error shortly, more docs, better spread.  I continued to
> do
> >> some testing to see how I could manually lay out the shards in what I
> >> thought was a more organized manner and with more descriptive  names
> than
> >> the numshards parameter alone produced.  I also gen'd up a few thousand
> >> docs and schema to test with.
> >>
> >> Appreciate the help.
> >>
> >>
> >>
> >> - Reply message -
> >> From: "Erick Erickson" 
> >> To: 
> >> Subject: Solrcloud 4.1 Collection with multiple slices only use
> >> Date: Wed, Mar 27, 2013 9:30 pm
> >>
> >>
> >> First, three documents isn't enough to really test. The formula for
> >> assigning shards is to hash on the unique ID. It _is_ possible that
> >> all three just happened to land on the same shard. If you index all 32
> >> docs in the example dir and they're all on the same shard, we should
> >> talk.
> >>
> >> Second, a regular query to the cluster will always search all the
> >> shards. Use &distrib=false on the URL to restrict the search to just
> >> the node you fire the request at.
> >>
> >> Let us know if you index more docs and still see the problem.
> >>
> >> Best
> >> Erick
> >>
> >> On Wed, Mar 27, 2013 at 9:39 AM, Chris R  wrote:
> >>> So - I must be missing something very basic here and I've gone back to
> >>> the Wiki example.  After setting up the two shard example in the first
> >>> tutorial and indexing the three example documents, look at the shards
> >>> in the Admin UI.  The documents are stored in the index where the
> >>> update was directed - they aren't distributed across both shards.
> >>>
> >>> Release notes state that the compositeId router is the default when
> >>> using the numshards parameter?  I want an even distribution of documents
> >>> based on ID across all shards; any suggestions on what I'm screwing up?
> >>>
> >>> Chris
> >>>
> >>> On Mon, Mar 25, 2013 at 11:34 PM, Mark Miller 
> >> wrote:
> >>>
>  I'm guessing you didn't specify numShards. Things changed in 4.1 - if
> >> you
>  don't specify numShards it goes into a mode where it's up to you to
>  distribute updates.
> 
>  - Mark
> 

Re: Solrcloud 4.1 Collection with multiple slices only use

2013-03-28 Thread Mark Miller

On Mar 28, 2013, at 6:30 PM, Chris R  wrote:

> I'll probably move up to 4.2 tomorrow.

4.2.1 should be ready as soon as I have time to publish it - we have a passing 
vote and I think we are close to 72 hours after. I just have to stock up on 
some beer first - Robert tells me it's like a 20 beer event…

- Mark



Re: Batch Search Query

2013-03-28 Thread Chris Hostetter

: Now, what happens is a user will upload say a word document to us. We then
: parse it and process it into segments. It very well could be 5000 segments
: or even more in that word document. Each one of those ~5000 segments needs
: to be searched for similar segments in solr. I’m not quite sure how I will
: do the query (whether proximate or something else). The point though, is to
: get back similar results for each segment.

You've described your black box (an index of small textual documents) 
and you've described your input (a large document that will be broken down 
into N=~5000 small textual snippets) but you haven't really clarified what 
your desired output should be...

* N textual documents from your index, where each doc is the 1 'best' 
match to 1 of the N textual input snippets.

* Some fixed number Y textual documents from your index representing the 
"best of the best" matches against your textual input snippets (ie: if one 
input snippet is a "really good" match for multiple indexed docs, return 
all of those "really good" matches, but don't return any matches from 
other snippets if the only matches are "poor".)

* Some variable number Y textual documents from your index representing 
the "best of hte best" matches against your textual input snippets based 
on some minimum threshhold of matching criteria.

* etc...

Forget for a moment that we are talking about solr at all -- describe 
some hypothetical data, some hypothetical query examples, and some 
hypothetical results you would like to get back (or not get back) 
from each of those query examples (ideally in pseudo-code) and let's see if 
that doesn't help suggest an implementation strategy.


-Hoss

Re: Solrcloud 4.1 Collection with multiple slices only use

2013-03-28 Thread corgone
That's my kind of release!

Sent from my Verizon Wireless Phone

- Reply message -
From: "Mark Miller" 
To: 
Subject: Solrcloud 4.1 Collection with multiple slices only use
Date: Thu, Mar 28, 2013 6:34 pm



On Mar 28, 2013, at 6:30 PM, Chris R  wrote:

> I'll probably move up to 4.2 tomorrow.

4.2.1 should be ready as soon as I have time to publish it - we have a passing 
vote and I think we are close to 72 hours after. I just have to stock up on 
some beer first - Robert tells me it's like a 20 beer event…

- Mark



Re: How to update synonyms.txt without restart?

2013-03-28 Thread Chris Hostetter

: But solr wiki says:
: ```
: Starting with Solr4.0, the RELOAD command is implemented in a way that
: results a "live" reloads of the SolrCore, reusing the existing various
: objects such as the SolrIndexWriter. As a result, some configuration
: options can not be changed and made active with a simple RELOAD...

Directly below that sentence are bullet points listing exactly which 
config options can't be changed with a simple reload...

>>>   * IndexWriter related settings in <indexConfig>
>>>   * <dataDir> location

: http://wiki.apache.org/solr/CoreAdmin#RELOAD


-Hoss


Re: Solr Cloud update process

2013-03-28 Thread Shawn Heisey

On 3/28/2013 3:01 PM, Walter Underwood wrote:

There are lots of small issues, though.

1. Is Solr tested with a mix of current and previous versions? Is it safe to 
run a cluster that is a mix of 4.1 and 4.2, even for a little bit?

2. Can Solr 4.2 run with Solr 4.1 config files? This means all of conf/, not 
just the main XML files.

3. We don't want a cluster with config files that are ahead of the software 
version, so I think we need:

* Update all the war files and restart each Solr process.
* Upload the new config files
* Reload each collection on each Solr process

But this requires that Solr 4.2 be able to start with Solr 4.1 config files.

4. Do we need to stop updates, wait for all nodes to sync, and not restart 
until the whole cluster is upgraded?

5. I'd like a bit more detail about exactly what upconfig is supposed to do, 
because I spent a lot of time with it doing things that did not result in a 
working Solr cluster. For example, for files in the directory argument, where 
exactly do they end up in the Zookeeper space? Currently, I've been doing 
updates with bootstrap, because it was the only thing I could get to work.


Solr 4.2 will work just fine with config files from 4.1.

I have a SolrCloud that was running a 4.1 snapshot.  I upgraded it to 
4.2.1 built from source with no problem.  The exact steps that I did were:


1) Replace solr.war.
2) Replace lucene-analyzers-icu-4.1-SNAPSHOT.jar with 
lucene-analyzers-icu-4.2.1-SNAPSHOT.jar

3) Upgrade all of my jetty jars from 8.1.7 to 8.1.9.
4) Repeat the steps above on the other server.
5) Use zkcli.sh to 'upconfig' a replacement config set with only one 
change - luceneMatchVersion went from LUCENE_40 to LUCENE_42.

6) Restart both Solr instances.

Upgrading jetty is something applicable to only my install, and was not 
a necessary step.  The jetty version currently included in Solr as of 
4.1 is 8.1.8 - see SOLR-4155.


The upconfig command on zkcli.sh will add/replace the config set with 
the one that you specify.  It will go into /configs in your zookeeper 
ensemble.  If you specify a chroot on your zkhost parameter, then it 
will go into /path/to/chroot/configs instead.  Most of the time a chroot 
will only have one element, so /chroot/configs would be the most likely 
location.
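
For example (placeholder hosts, paths, and names):

  sh zkcli.sh -zkhost zk1:2181,zk2:2181,zk3:2181/solr \
     -cmd upconfig -confdir /path/to/conf -confname myconf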


I actually would like more detail on upconfig myself - what if you 
delete files from the config directory on disk?  Will they be deleted 
from zookeeper?  I use a solrconfig that has xinclude statements, and 
occasionally those files do get deleted or renamed.


Thanks,
Shawn



Re: Solrcloud 4.1 Collection with multiple slices only use

2013-03-28 Thread Shawn Heisey

On 3/28/2013 4:23 PM, Mark Miller wrote:

True - though I think for 4.2. numShards has never been respected in the 

Can't you leave numShards out completely, then include a numShards 
parameter on a collection api CREATE url, possibly giving a different 
numShards to each collection?
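
For example (hypothetical names), something like:

http://host:8983/solr/admin/collections?action=CREATE&name=colA&numShards=12&collection.configName=conf1
http://host:8983/solr/admin/collections?action=CREATE&name=colB&numShards=4&collection.configName=conf1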


Thanks,
Shawn



Re: SOLR - "Unable to execute query" error - DIH

2013-03-28 Thread Chris Hostetter

: I am trying to index data from SQL Server view to the SOLR using the DIH

Have you ruled out the view itself being the bottle neck?

Try running whatever command line SQLServer client exists on your SOLR 
server to connect remotely to your existing SQL server and run "select * 
from view" and redirect thek output to a file.

that will give you a minimal absolute baseline for the best possible 
performance you could expect to hope for when indexing into Solr -- and tip 
you off to whether the view is the problem when asking for more than a 
handful of documents.
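
For example, with Microsoft's sqlcmd client (hypothetical server/db/view 
names):

  sqlcmd -S dbhost -d mydb -Q "select * from my_view" -o /tmp/view_dump.txt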



-Hoss


Re: Solrcloud 4.1 Collection with multiple slices only use

2013-03-28 Thread Mark Miller

On Mar 28, 2013, at 7:30 PM, Shawn Heisey  wrote:

> Can't you leave numShards out completely, then include a numShards parameter 
> on a collection api CREATE url, possibly giving a different numShards to each 
> collection?
> 
> Thanks,
> Shawn
> 

Yes - that's why I say the collections API is the way forward - it has none of 
these limitations. The limitations are all around pre-configuring everything in 
solr.xml and not using the collections API.

- Mark

Re: Solr Cloud update process

2013-03-28 Thread Mark Miller

On Mar 28, 2013, at 7:27 PM, Shawn Heisey  wrote:

> 
> I actually would like more detail on upconfig myself - what if you delete 
> files from the config directory on disk?  Will they be deleted from 
> zookeeper?  I use a solrconfig that has xinclude statements, and occasionally 
> those files do get deleted or renamed.
> 
> Thanks,
> Shawn
> 

Currently, it's a straight upload - if files went away locally, they will stay 
in zk. It will just replace what you upload. Happy to help implement a sync 
option or something if you create a JIRA for it.

- mark

Re: How to update synonyms.txt without restart?

2013-03-28 Thread Upayavira
Not sure, but if you put it in the data dir, I think it picks it up and
reloads on commit.

Upayavira

On Thu, Mar 28, 2013, at 09:11 AM, Kaneyama Genta wrote:
> Dear all,
> 
> I am investigating how to update synonyms.txt.
> Some people say CORE RELOAD will reload synonyms.txt.
> 
> But solr wiki says:
> ```
> Starting with Solr4.0, the RELOAD command is implemented in a way that
> results a "live" reloads of the SolrCore, reusing the existing various
> objects such as the SolrIndexWriter. As a result, some configuration
> options can not be changed and made active with a simple RELOAD...
> ```
> http://wiki.apache.org/solr/CoreAdmin#RELOAD
> 
> And https://issues.apache.org/jira/browse/SOLR-3592 is marked as
> unresolved.
> 
> The problem is: how can I update synonyms.txt in a production environment?
> A workaround is to restart the Solr process, but that does not look good to me.
> 
> Will someone tell me the best practice for updating synonyms.txt?
> 
> Thanks in advance.


Re: How to update synonyms.txt without restart?

2013-03-28 Thread Mark Miller
But this is fixed in 4.2 - now the index writer is rebooted on core reload.

So that's just 4.0 and 4.1.
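
So on 4.2+ a plain core RELOAD should pick up an edited synonyms.txt, e.g.
(hypothetical host/core name):

  http://localhost:8983/solr/admin/cores?action=RELOAD&core=collection1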

- Mark

On Mar 28, 2013, at 6:48 PM, Chris Hostetter  wrote:

> 
> : But solr wiki says:
> : ```
> : Starting with Solr4.0, the RELOAD command is implemented in a way that
> : results a "live" reloads of the SolrCore, reusing the existing various
> : objects such as the SolrIndexWriter. As a result, some configuration
> : options can not be changed and made active with a simple RELOAD...
> 
> Directly below that sentence are bullet points listing exactly which 
> config options can't be changed with a simple reload...
> 
  * IndexWriter related settings in <indexConfig>
  * <dataDir> location
> 
> : http://wiki.apache.org/solr/CoreAdmin#RELOAD
> 
> 
> -Hoss



Re: How to update synonyms.txt without restart?

2013-03-28 Thread Mark Miller
Though I think *another* JIRA made data dir not changeable over core reload for 
some reason I don't recall exactly. But the other stuff is back to being 
changeable :)

- Mark

On Mar 28, 2013, at 8:04 PM, Mark Miller  wrote:

> But this is fixed in 4.2 - now the index writer is rebooted on core reload.
> 
> So that's just 4.0 and 4.1.
> 
> - Mark
> 
> On Mar 28, 2013, at 6:48 PM, Chris Hostetter  wrote:
> 
>> 
>> : But solr wiki says:
>> : ```
>> : Starting with Solr4.0, the RELOAD command is implemented in a way that
>> : results a "live" reloads of the SolrCore, reusing the existing various
>> : objects such as the SolrIndexWriter. As a result, some configuration
>> : options can not be changed and made active with a simple RELOAD...
>> 
>> Directly below that sentence are bullet points listing exactly which 
>> config options can't be changed with a simple reload...
>> 
> * IndexWriter related settings in <indexConfig>
> * <dataDir> location
>> 
>> : http://wiki.apache.org/solr/CoreAdmin#RELOAD
>> 
>> 
>> -Hoss
> 



Basic auth on SolrCloud /admin/* calls

2013-03-28 Thread Vaillancourt, Tim
Hey guys,

I've recently set up basic auth under Jetty 8 for all my Solr 4.x '/admin/*' 
calls, in order to protect my Collections and Cores API.
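
For reference, the Jetty-side constraint looks roughly like this in the
webapp descriptor (role and realm names below are just examples; a matching
realm also has to be configured in Jetty):

<security-constraint>
  <web-resource-collection>
    <web-resource-name>Solr admin</web-resource-name>
    <url-pattern>/admin/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>solr-admin</role-name>
  </auth-constraint>
</security-constraint>
<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>Solr Realm</realm-name>
</login-config>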

Although the security constraint is working as expected ('/admin/*' calls 
require Basic Auth or return 401), when I use the Collections API to create a 
collection, I receive a 200 OK to the Collections API CREATE call, but the 
background Cores API calls that are run on the Collections API's behalf fail on 
the Basic Auth on other nodes with a 401 code, as I should have foreseen, but 
didn't.

Is there a way to tell SolrCloud to use authentication on internal Cores API 
calls that are spawned on Collections API's behalf, or is this a new feature 
request?

To reproduce:

1.   Implement basic auth on '/admin/*' URIs.

2.   Perform a CREATE Collections API call to a node (which will return 200 
OK).

3.   Notice all Cores API calls fail (Collection isn't created). See stack 
trace below from the node that was issued the CREATE call.

The stack trace I get is:

"org.apache.solr.common.SolrException: Server at http://:8983/solr returned non ok status:401, 
message:Unauthorized
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
at 
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:169)
at 
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:135)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:662)"

Cheers!

Tim




Re: Could not load config for solrconfig.xml

2013-03-28 Thread A. Lotfi
In Windows when I hit the Execute Query button I got these results: a
response header with status=0, QTime=181, indent=true, q=streetname:mdw,
wt=xml, followed by matching documents (MEADOW2501001ABN 1MD 262,
MEADOW2501001ABRM1MD 472, MEADOW2501001ADMS1MD 350, ...).

In Unix with the same setup, I got this result: a response header with
status=0, QTime=2, indent=true, q=*:*, wt=xml, and no documents.

I did not understand why.
thanks, your help is appreciated.



 From: Gora Mohanty 
To: solr-user@lucene.apache.org; A. Lotfi  
Sent: Thursday, March 28, 2013 4:40 PM
Subject: Re: Could not load config for solrconfig.xml
 
On 29 March 2013 01:59, A. Lotfi  wrote:
> Thanks,
> my path to solr home was missing something, it's working, but no results, 
> the same solr app with same configuration files worked in windows.

What do you mean by "no results"? Have you indexed stuff, and
are not able to search for it? Are you expecting to copy Solr files
from an old setup with an index, and have things work? That would
be OK, provided that the Solr index formats were compatible,
but you would also need to copy the index, and define dataDir
properly in solrconfig.xml.

Regards,
Gora

Re: Could not load config for solrconfig.xml

2013-03-28 Thread Gora Mohanty
On 29 March 2013 07:23, A. Lotfi  wrote:
> In Windows when I hit the Execute Query button I got these results:
[...]

There seem to be no documents in your Solr index on the
UNIX system. As I mentioned in my previous message, you
either need to copy the index files from the Windows system
(provided that the Solr index format has not changed, this
will work), or reindex on the UNIX system.
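
For example, reindexing with the post tool shipped in example/exampledocs
(adjust the update URL for your setup):

  java -Durl=http://localhost:8983/solr/update -jar post.jar *.xml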

Regards,
Gora


Re: Could not load config for solrconfig.xml

2013-03-28 Thread A. Lotfi
In Unix, in data/index there is:
segments.gen  20 B 3/28/2013   rw-r--r--
segments_1    45 B 3/28/2013 rw-r--r--

I don't know how this was generated; should I delete them from the directory, 
or from some other place?
If so, how do I reindex on the UNIX system?


thanks a lot.
 


 From: Gora Mohanty 
To: A. Lotfi  
Cc: "solr-user@lucene.apache.org"  
Sent: Thursday, March 28, 2013 9:59 PM
Subject: Re: Could not load config for solrconfig.xml
 
On 29 March 2013 07:23, A. Lotfi  wrote:
> In Windows when I hit the Execute Query button I got these results:
[...]

There seem to be no documents in your Solr index on the
UNIX system. As I mentioned in my previous message, you
either need to copy the index files from the Windows system
(provided that the Solr index format has not changed, this
will work), or reindex on the UNIX system.

Regards,
Gora