Latency Comparison between cloud hosting Vs Dedicated hosting

2013-04-09 Thread Sujatha Arun
Hi,

We are comparing search request latency between Amazon vs dedicated
hosting [Rackspace]. For the comparison we used Solr version 3.6.1 and an
Amazon small instance. The index size was less than 1 GB.

We see that the latency is about 75-100% higher on Amazon. Does anybody who
has migrated from dedicated hosting to the cloud have any pointers for
improving latency?

Would a bigger instance improve latency?

Regards
Sujatha


Re: Indexed data not searchable

2013-04-09 Thread Max Bo
The XML files are formatted like this. I think the problem lies there.


  

   
   T0084-00371-DOWNLOAD - Blatt 184r
   T0084-00371-DOWNLOAD
   application/pdf
   
 
   2012-11-08T00:09:57.531+01:00
  
2012-11-08T00:09:57.531+01:00
   2012-11-08T00:09:57.531+01:00
   ..




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexed-data-not-searchable-tp4054473p4054651.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr 4.2.1 still has problems with index version and index generation

2013-04-09 Thread Bernd Fehling
Looking a bit deeper showed that replication?command=commit reports the
right indexversion, generation and filelist.

 
<lst name="commit">
  <long name="indexversion">1365357951589</long>
  <long name="generation">198</long>
  <arr name="filelist">
...

And with replication?command=details I also see the correct commit part as
above, BUT where the hell is the wrong info below the commit array
coming from?
<str name="isMaster">true</str>
<str name="isSlave">false</str>
<long name="indexVersion">1365357774190</long>
<long name="generation">197</long>

The command replication?command=filelist&generation=197
replies with
invalid index generation

Have a look into the sources:
Ahh, it is built in getReplicationDetails with:
details.add("isMaster", String.valueOf(isMaster));
details.add("isSlave", String.valueOf(isSlave));
long[] versionAndGeneration = getIndexVersion();
details.add("indexVersion", versionAndGeneration[0]);
details.add(GENERATION, versionAndGeneration[1]);

So getIndexVersion() gets a wrong version and generation, but why?
It first gets the searcher from the core and then tries to get
the IndexCommit via the IndexReader, and then the commitData.

I think I should use remote debugging on master.
At least I now know that it is the master.
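
(A minimal sketch of that, assuming the master runs via the stock Jetty
start.jar: launch the JVM with the standard JDWP options, e.g.

  java -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8000 -jar start.jar

where port 8000 is arbitrary, and attach the IDE debugger to that port.)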

Regards
Bernd


On 09.04.2013 08:35, Bernd Fehling wrote:
> Hi Hoss,
> 
> we don't use autoCommit and autoSoftCommit.
> We don't use openSearcher.
> We don't use transaction log.
> 
> I can see it in the AdminGUI and with
> http://master_host:port/solr/replication?command=indexversion
> 
> All files are replicated from master to slave, nothing lost.
> It is just that the gen/version differs and breaks our cronjobs which
> worked since solr 2.x.
> As you mentioned, it seems that the latest commit is fetched.
> 
> Strange thing is,
> we start with a clean, empty index on master. With all commands we send
> a commit=true and, where applicable, an optimize=true.
> The master is always in state "optimized and current" when replicating.
> 
> How can it be that the searcher on master is referring to an older commit
> point if there is no such point? The logs show _AFTER_ the last optimize has
> finished a new searcher is started and the old one is closed.
> 
> Also, we have replicateAfter startup, commit and optimize set but the AdminGUI
> and replication details only report replicateAfter commit and startup.
> Not really an error, but not what is really set in the config!
> 
> Very strange, I will try the patch.
> 
> Regards
> Bernd
> 
> 
> On 08.04.2013 20:12, Chris Hostetter wrote:
>> : I know there was some effort to fix this but I must report
>> : that solr 4.2.1 has still problems with index version and
>> : index generation numbering in master/slave mode with replication.
>>  ...
>> : RESULT: slave has different (higher) version number and is one generation
>> ahead :-(
>>
>> Can you please provide more details...
>>
>> * are you using autocommit? with what settings?
>> * are you using openSearcher=false in any of your commits?
>> * where exactly are you looking that you see the master/slave out of sync?
>> * are you observing any actual problems, or just seeing that the 
>> gen/version are reported as different?
>>
>> As Joel mentioned, there is an open Jira related purely to the *display* 
>> of information about gen/version between master & slave, because in many 
>> cases the "searcher" in use on the master may refer to an older commit 
>> point, but it doesn't mean there is any actual problem in replication -- 
>> the slave is still fetching & searching the latest commit from the master 
>> as intended
>>
>> https://issues.apache.org/jira/browse/SOLR-4661
>>
>>
>> -Hoss
>>

-- 
*
Bernd Fehling              Bielefeld University Library
Dipl.-Inform. (FH)         LibTec - Library Technology
Universitätsstr. 25        and Knowledge Management
33615 Bielefeld
Tel. +49 521 106-4060      bernd.fehling(at)uni-bielefeld.de

BASE - Bielefeld Academic Search Engine - www.base-search.net
*


Re: Sub field indexing

2013-04-09 Thread Toke Eskildsen
On Tue, 2013-04-09 at 08:40 +0200, It-forum wrote:
> On 08/04/2013 20:02, Toke Eskildsen wrote:
> > compatible_engine:productZ/85 to get all products compatible with productZ, 
> > version 85
> > compatible_engine:productZ* to get all products compatible with any version 
> > of productZ.

Whoops, slash triggers regexes, so you probably need to search for
compatible_engine:"productZ/85"
or
compatible_engine:productZ\/85

- Toke



Re: Indexed data not searchable

2013-04-09 Thread Gora Mohanty
On 9 April 2013 13:10, Max Bo  wrote:
> The XML files are formatted like this. I think there is the problem.
[...]

Yes, to use curl to post to /solr/update you need to
have XML in the form described at
http://wiki.apache.org/solr/UpdateXmlMessages
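
For reference, a minimal update message in that form looks like this (the
field names below are placeholders, not your schema):

<add>
  <doc>
    <field name="id">some-unique-id</field>
    <field name="title">some title</field>
  </doc>
</add>

posted with something like:

curl 'http://localhost:8983/solr/update?commit=true' -H 'Content-Type: text/xml' --data-binary @docs.xml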

Else, you can use  FileListEntityProcessor and
XPathEntityProcessor with FileDataSource from
the Solr DataImportHandler. Please see examples
at http://wiki.apache.org/solr/DataImportHandler

Regards,
Gora


Re: Empty Solr 4.2.1 can not create Collection

2013-04-09 Thread A.Eibner

Hi,
thanks for your fast answer.

You don't use the Collection API - may I ask why?
Without it you have to set up everything (replicas, ...) manually,
which I would like to avoid.


Also what I don't understand is why my steps work in 4.0 but not in 4.2.1...
Any clues ?

Kind Regards
Alexander

On 2013-04-08 19:12, Joel Bernstein wrote:

The steps that I use to setup the collection are slightly different:


1) Start zk and upconfig the config set. Your approach is the same.
2) Start appservers with Solr zkHost set to the zk started in step 1.
3) Use a core admin command to spin up a new core and collection.


http://app01/solr/admin/cores?action=CREATE&name=storage-core&collection=storage&numShards=1&collection.configName=storage-conf
&shard=shard1

This will spin up the new collection and initial core. I'm not using a
replication factor because the following commands manually bind the
replicas.

4) Spin up replica with a core admin command:
http://app02/solr/admin/cores?action=CREATE&name=storage-core&collection=storage&;
shard=shard1

5) Same command as above on the 3rd server to spin up another replica.

This will spin up a new core and bind it to shard1 of the storage
collection.





On Mon, Apr 8, 2013 at 9:34 AM, A.Eibner  wrote:


Hi,

I have a problem with setting up my solr cloud environment (on three
machines).
If I want to create my collections from scratch I do the following:

*) Start ZooKeeper on all machines.

*) Upload the configuration (on app02) for the collection via the
following command:
 zkcli.sh -cmd upconfig --zkhost app01:4181,app02:4181,app03:**4181
--confdir config/solr/storage/conf/ --confname storage-conf

*) Linking the configuration (on app02) via the following command:
 zkcli.sh -cmd linkconfig --collection storage --confname storage-conf
--zkhost app01:4181,app02:4181,app03:**4181

*) Start Tomcats (containing Solr) on app02,app03

*) Create Collection via:
http://app03/solr/admin/**collections?action=CREATE&**
name=storage&numShards=1&**replicationFactor=2&**
collection.configName=storage-**conf

This creates the replication of the shard on app02 and app03, but neither
of them is marked as leader; both are marked as DOWN.
And afterwards I cannot access the collection.
In the browser I get:
"SEVERE: org.apache.solr.common.**SolrException: no servers hosting
shard:"

In the log files the following error is present:
SEVERE: Error from shard: app02:9985/solr
org.apache.solr.common.**SolrException: Error CREATEing SolrCore
'storage_shard1_replica1':
 at org.apache.solr.client.solrj.**impl.HttpSolrServer.request(**
HttpSolrServer.java:404)
 at org.apache.solr.client.solrj.**impl.HttpSolrServer.request(**
HttpSolrServer.java:181)
 at org.apache.solr.handler.**component.HttpShardHandler$1.**
call(HttpShardHandler.java:**172)
 at org.apache.solr.handler.**component.HttpShardHandler$1.**
call(HttpShardHandler.java:**135)
 at java.util.concurrent.**FutureTask$Sync.innerRun(**
FutureTask.java:334)
 at java.util.concurrent.**FutureTask.run(FutureTask.**java:166)
 at java.util.concurrent.**Executors$RunnableAdapter.**
call(Executors.java:471)
 at java.util.concurrent.**FutureTask$Sync.innerRun(**
FutureTask.java:334)
 at java.util.concurrent.**FutureTask.run(FutureTask.**java:166)
 at java.util.concurrent.**ThreadPoolExecutor.runWorker(**
ThreadPoolExecutor.java:1110)
 at java.util.concurrent.**ThreadPoolExecutor$Worker.run(**
ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.**java:722)
Caused by: org.apache.solr.common.cloud.**ZooKeeperException:
 at org.apache.solr.core.**CoreContainer.registerInZk(**
CoreContainer.java:922)
 at org.apache.solr.core.**CoreContainer.registerCore(**
CoreContainer.java:892)
 at org.apache.solr.core.**CoreContainer.register(**
CoreContainer.java:841)
 at org.apache.solr.handler.admin.**CoreAdminHandler.**
handleCreateAction(**CoreAdminHandler.java:479)
 ... 19 more
Caused by: org.apache.solr.common.**SolrException: Error getting leader
from zk for shard shard1
 at org.apache.solr.cloud.**ZkController.getLeader(**
ZkController.java:864)
 at org.apache.solr.cloud.**ZkController.register(**
ZkController.java:776)
 at org.apache.solr.cloud.**ZkController.register(**
ZkController.java:727)
 at org.apache.solr.core.**CoreContainer.registerInZk(**
CoreContainer.java:908)
 ... 22 more
Caused by: java.lang.**InterruptedException: sleep interrupted
 at java.lang.Thread

Re: Empty Solr 4.2.1 can not create Collection

2013-04-09 Thread A.Eibner

Hi,

you are right, I have removed "collection1" from the solr.xml but set
defaultCoreName="storage".

Also, this works in 4.0 but not in 4.2.1 - any clues?

Kind Regards
Alexander

On 2013-04-08 20:06, Joel Bernstein wrote:

The scenario above needs to have collection1 removed from the solr.xml to
work. This, I believe, is the "Empty Solr" scenario that you are talking
about. If you don't remove collection1 from solr.xml on all the solr
instances, they will get tripped up on collection1 during these steps.

If you startup with collection1 in solr.xml it's best to startup the
initial Solr instance with the bootstrap-conf parameter so Solr can
properly create this collection.


On Mon, Apr 8, 2013 at 1:12 PM, Joel Bernstein  wrote:


The steps that I use to setup the collection are slightly different:


1) Start zk and upconfig the config set. Your approach is the same.
2) Start appservers with Solr zkHost set to the zk started in step 1.
3) Use a core admin command to spin up a new core and collection.


http://app01/solr/admin/cores?action=CREATE&name=storage-core&collection=storage&numShards=1&collection.configName=storage-conf
&shard=shard1

This will spin up the new collection and initial core. I'm not using a
replication factor because the following commands manually bind the
replicas.

4) Spin up replica with a core admin command:

http://app02/solr/admin/cores?action=CREATE&name=storage-core&collection=storage&;
shard=shard1

5) Same command as above on the 3rd server to spin up another replica.

This will spin up a new core and bind it to shard1 of the storage
collection.





On Mon, Apr 8, 2013 at 9:34 AM, A.Eibner  wrote:


Hi,

I have a problem with setting up my solr cloud environment (on three
machines).
If I want to create my collections from scratch I do the following:

*) Start ZooKeeper on all machines.

*) Upload the configuration (on app02) for the collection via the
following command:
 zkcli.sh -cmd upconfig --zkhost app01:4181,app02:4181,app03:**4181
--confdir config/solr/storage/conf/ --confname storage-conf

*) Linking the configuration (on app02) via the following command:
 zkcli.sh -cmd linkconfig --collection storage --confname storage-conf
--zkhost app01:4181,app02:4181,app03:**4181

*) Start Tomcats (containing Solr) on app02,app03

*) Create Collection via:
http://app03/solr/admin/**collections?action=CREATE&**
name=storage&numShards=1&**replicationFactor=2&**
collection.configName=storage-**conf

This creates the replication of the shard on app02 and app03, but neither
of them is marked as leader; both are marked as DOWN.
And afterwards I cannot access the collection.
In the browser I get:
"SEVERE: org.apache.solr.common.**SolrException: no servers hosting
shard:"

In the log files the following error is present:
SEVERE: Error from shard: app02:9985/solr
org.apache.solr.common.**SolrException: Error CREATEing SolrCore
'storage_shard1_replica1':
 at org.apache.solr.client.solrj.**impl.HttpSolrServer.request(**
HttpSolrServer.java:404)
 at org.apache.solr.client.solrj.**impl.HttpSolrServer.request(**
HttpSolrServer.java:181)
 at org.apache.solr.handler.**component.HttpShardHandler$1.**
call(HttpShardHandler.java:**172)
 at org.apache.solr.handler.**component.HttpShardHandler$1.**
call(HttpShardHandler.java:**135)
 at java.util.concurrent.**FutureTask$Sync.innerRun(**
FutureTask.java:334)
 at java.util.concurrent.**FutureTask.run(FutureTask.**java:166)
 at java.util.concurrent.**Executors$RunnableAdapter.**
call(Executors.java:471)
 at java.util.concurrent.**FutureTask$Sync.innerRun(**
FutureTask.java:334)
 at java.util.concurrent.**FutureTask.run(FutureTask.**java:166)
 at java.util.concurrent.**ThreadPoolExecutor.runWorker(**
ThreadPoolExecutor.java:1110)
 at java.util.concurrent.**ThreadPoolExecutor$Worker.run(**
ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.**java:722)
Caused by: org.apache.solr.common.cloud.**ZooKeeperException:
 at org.apache.solr.core.**CoreContainer.registerInZk(**
CoreContainer.java:922)
 at org.apache.solr.core.**CoreContainer.registerCore(**
CoreContainer.java:892)
 at org.apache.solr.core.**CoreContainer.register(**
CoreContainer.java:841)
 at org.apache.solr.handler.admin.**CoreAdminHandler.**
handleCreateAction(**CoreAdminHandler.java:479)
 ... 19 more
Caused by: org.apache.solr.common.**SolrException: Error getting leader
from zk for shard shard1
 at org.apache.solr.cloud.**ZkController.getLead

Average Solr Server Spec.

2013-04-09 Thread Furkan KAMACI
This question may not have a general answer and may be open ended, but is
there any commodity server spec for a typical Solr machine? I mean,
what is the average server specification for a Solr machine? (e.g. for a
Hadoop system it is not recommended to have machines with very large
storage capacity.) I will use Solr for indexing web crawled data.


Re: SOLR-4581

2013-04-09 Thread Shalin Shekhar Mangar
Hi Alexander,

I have put up a test case reproducing your issue. Perhaps someone more
familiar with the faceting code can debug this.

For now, you can work around this issue by adding facet.method=fc to your
queries.
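
For example (host and field name are placeholders):

http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=myfield&facet.method=fc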


On Mon, Apr 8, 2013 at 2:14 PM, Alexander Buhr  wrote:

> Hello,
>
> I created
> https://issues.apache.org/jira/browse/SOLR-4581
> on 14.03.2013. Can anyone help me out with this?
> Thank You.
>
> Alexander Buhr
> Software Engineer
>
> ePages GmbH
> Pilatuspool 2
> 20355 Hamburg
> Germany
>
> +49-40-350 188-266 phone
> +49-40-350 188-222 fax
>
> a.b...@epages.com
> www.epages.com
> www.epages.com/blog
> www.epages.com/twitter
> www.epages.com/facebook
>
> e-commerce. now plug&play.
>
> Geschäftsführer: Wilfried Beeck
> Handelsregister: Amtsgericht Hamburg HRB 120861
> Sitz der Gesellschaft: Pilatuspool 2, 20355 Hamburg
> Steuernummer: 48/718/02195
> USt-Ident.-Nr.: DE 282 947 700
>
>


-- 
Regards,
Shalin Shekhar Mangar.


Doc Transformer with SolrDocumentList object

2013-04-09 Thread neha yadav
I am trying to modify the results of Solr output. Basically I need to
change the ranking of the output of Solr for a query.

So please, can anyone help?

I wrote Java code that returns a SolrDocumentList object which is a
union of the results. I want this object to be displayed by Solr.

That is, once the query is hit, Solr runs the Java code I wrote and the
output returned by the Java code is what gets written to the screen.


I have tried to use the code as a data transformer. But I am getting this
error:


org.apache.solr.handler.dataimport.SolrWriter upload
WARNING: Error creating document : SolrInputDocument[id=44,
category=Apparel & Fash Accessories, _version_=1431753044032225280,
price=ERROR:SCHEMA-INDEX-MISMATCH,stringValue=1400, description=for girls,
brand=Wrangler, price_c=1400,USD,
size=ERROR:SCHEMA-INDEX-MISMATCH,stringValue=12]
org.apache.solr.common.SolrException: version conflict for 44
expected=1431753044032225280 actual=-1


Please can anyone help ?


Re: conditional queries?

2013-04-09 Thread Koji Sekiguchi

Hi Mark,

> Is it possible to do a conditional query if another query has no results?
> For example, say I want to search against a given field for:
>
> - Search for "car".  If there are results, return them.
> - Else, search for "car*".  If there are results, return them.
> - Else, search for "car~".  If there are results, return them.
>
> Is this possible in one query?  Or would I need to make 3 separate queries by
> implementing this logic within my client?


As far as I know, there is no such SearchComponent.
But the idea of a "FallbackRequestHandler" has been discussed; see SOLR-1878
for example:

https://issues.apache.org/jira/browse/SOLR-1878

koji
--
http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html


How to configure shards with SSL?

2013-04-09 Thread eShard
Good morning everyone,
I'm running Solr 4.0 final with ManifoldCF v1.2dev on Tomcat 7.0.37. I
had shards up and running over http, but when I migrated to SSL it wouldn't
work anymore.
First I got an IOException, but then I changed my configuration in
solrconfig.xml to this:
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="wt">xml</str>
    <str name="indent">true</str>
    <str name="q">*:*</str>
    <str name="fl">id, solr.title, content, category, link, pubdateiso</str>
    <str name="shards">dev:7443/solr/ProfilesJava/|dev:7443/solr/C3Files/|dev:7443/solr/Blogs/|dev:7443/solr/Communities/|dev:7443/solr/Wikis/|dev:7443/solr/Bedeworks/|dev:7443/solr/Forums/|dev:7443/solr/Web/|dev:7443/solr/Bookmarks/</str>
  </lst>
  <shardHandlerFactory class="HttpShardHandlerFactory">
    <str name="urlScheme">https://</str>
    <int name="connTimeout">1000</int>
    <int name="socketTimeout">5000</int>
  </shardHandlerFactory>
</requestHandler>

And Now I'm getting this error:
org.apache.solr.client.solrj.SolrServerException: No live SolrServers
available to handle this request:
How do I configure shards with SSL?
Thanks,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-configure-shards-with-SSL-tp4054735.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Latency Comparison between cloud hosting Vs Dedicated hosting

2013-04-09 Thread Michael Della Bitta
On Tue, Apr 9, 2013 at 3:33 AM, Sujatha Arun  wrote:
> Would a bigger instance improve latency?

Yes, and prewarming caches would help, too.
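
A minimal sketch of what the prewarming can look like in solrconfig.xml (the
warming query and autowarm count below are just examples):

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">some popular query</str></lst>
  </arr>
</listener>

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>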


Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


Re: Best practice for rebuild index in SolrCloud

2013-04-09 Thread Michael Della Bitta
We're setting up two collection aliases. One's a read alias, one's a
write alias.

When we need to start over with a new collection, we create the
collection alongside the original, and point the write alias at it.

When indexing is done, we point the read alias at it.

Then you can delete the old collection when you feel good about the new one.

Obviously this means that none of your clients should point at the
collection directly, but rather one of the aliases depending on
whether they're reading or writing.
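
As a sketch of that flow with the Collections API (alias support is in Solr
4.2+; host, alias and collection names here are just examples):

# point the write alias at the new collection
http://host:8983/solr/admin/collections?action=CREATEALIAS&name=items_write&collections=items_v2

# once reindexing is done, point the read alias at it too
http://host:8983/solr/admin/collections?action=CREATEALIAS&name=items_read&collections=items_v2

# later, drop the old collection
http://host:8983/solr/admin/collections?action=DELETE&name=items_v1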

HTH,

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Mon, Apr 8, 2013 at 5:45 PM, Bill Au  wrote:
> We are using SolrCloud for replication and dynamic scaling but not
> distribution so we are only using a single shard.  From time to time we
> make changes to the index schema that requires rebuilding of the index.
>
> Should I treat the rebuilding as just any other index operation?  It seems
> to me it would be better if I can somehow take a node "offline" and rebuild
> the index there, then put it back online and let the new index be
> replicated from there.  But I am not sure how to do the latter.
>
> Bill


Re: conditional queries?

2013-04-09 Thread Walter Underwood
We do this on the client side with multiple queries. It is fairly efficient, 
because most responses are from the first, exact query.
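
A sketch of that client-side loop in SolrJ (server URL is a placeholder,
exception handling omitted):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
// exact query first; most requests stop here
QueryResponse rsp = server.query(new SolrQuery("car"));
if (rsp.getResults().getNumFound() == 0) {
    // fall back to the wildcard form
    rsp = server.query(new SolrQuery("car*"));
}
if (rsp.getResults().getNumFound() == 0) {
    // last resort: the fuzzy form
    rsp = server.query(new SolrQuery("car~"));
}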

wunder

On Apr 9, 2013, at 6:15 AM, Koji Sekiguchi wrote:

> Hi Mark,
> 
> > Is it possible to do a conditional query if another query has no results?  
> > For example, say I want to search against a given field for:
>> 
>> - Search for "car".  If there are results, return them.
>> - Else, search for "car*" .  If there are results, return them.
>> - Else, search for "car~" .  If there are results, return them.
>> 
>> Is this possible in one query?  Or would I need to make 3 separate queries 
>> by implementing this logic within my client?
> 
> As far as I know, there is no such SearchComponent.
> But the idea of a "FallbackRequestHandler" has been discussed; see SOLR-1878,
> for example:
> 
> https://issues.apache.org/jira/browse/SOLR-1878
> 
> koji
> -- 
> http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html







Execution of Queries in Parallel: geotagged textual documents in Solr

2013-04-09 Thread Massimiliano Ruocco
I have around 100M textual documents, geotagged (lat,long). These
documents are indexed with Solr 1.4. I am testing a retrieval model
(written over Terrier). This model requires frequent execution of
queries (bounding-box filter). These queries could be executed in
parallel, one for each specific geographic tile.

I was wondering if a solution exists for speeding up the execution of
queries in parallel. My naive idea is to split the index into many parts
according to the geographical tiles (how to do that? SolrCloud? Solr Index
Replication? What is the maximum number of eventual replications?)

Any further practical suggestions?

Thanks in advance

Massimiliano



Re: Search data who does not have "x" field

2013-04-09 Thread Victor Ruiz
Sorry, I didn't explain myself well. I mean you have to create an
additional field 'hasCategory' in your schema, and then, before indexing,
set the field 'hasCategory' in the indexed document to true if your
document has categories, or to false if it has none. With this you
will save computation time, since a query on a boolean field is much
cheaper for Solr than checking for an empty string field.

The query should be => q=*:*&fq=hasCategory:true
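
A minimal sketch of the schema entry, assuming the stock "boolean" field type:

<field name="hasCategory" type="boolean" indexed="true" stored="false"/>

Note that the query you tried filters on category:true - the filter has to go
against the new hasCategory field instead.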


anurag.jain wrote
> "another solution would be to add a boolean field, hasCategory, and use it
> for filtering 
> q=
> 
> &fq=hasCategory:true "
> 
> 
> I am not getting result.
> 
> 
> i am trying
> 
> localhost:8983/search?q=*:*&fq=category:true
> 
> it is giving zero result.
> 
> by the way first technique is working fine.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Search-data-who-does-not-have-x-field-tp4046959p4054763.html
Sent from the Solr - User mailing list archive at Nabble.com.


corrupted index in slave?

2013-04-09 Thread Victor Ruiz
Hi guys,

I'm getting exceptions in a Solr slave when accessing the TermVector component
and the RealTimeGetHandler. The weird thing is that on the master and on one of
the 2 slaves the documents are OK, and the same query doesn't return any
exception. For now, the only way I have to solve the problem is deleting
these documents and indexing them again.

I upgraded Solr from 4.0 directly to 4.2, then to 4.2.1 last week. These
exceptions seem to appear since the upgrade to 4.2.
I didn't run the script for migrating the index files (as I did in the
migration from 3.6 to 4.0), should I? Has the format of the index changed?
If not, is this a known bug? If it is, sorry, I couldn't find it in JIRA.

These are the exceptions I get:

{"responseHeader":{"status":500,"QTime":1},"response":{"numFound":1,"start":0,"docs":[{"itemid":"105266867","text":"exklusiver
kann man kaum würzen  safran ist das teuerste gewürz der welt handverlesen
und in mühevoller kleinstarbeit hergestellt ist safran sehr selten und wird
in winzigen mengen gehandelt und
verwendet","title":"safran","domainid":4287,"date_i":"2012-11-21T17:01:23Z","date":"2012-11-21T17:01:09Z","category":["kultur","literatur","gesellschaft","umwelt","trinken","essen"]}]},"termVectors":["uniqueKeyFieldName","itemid","105266867",["uniqueKey","105266867"]],"error":{"trace":"java.lang.ArrayIndexOutOfBoundsException\n\tat
org.apache.lucene.codecs.compressing.LZ4.decompress(LZ4.java:132)\n\tat
org.apache.lucene.codecs.compressing.CompressionMode$4.decompress(CompressionMode.java:135)\n\tat
org.apache.lucene.codecs.compressing.CompressingTermVectorsReader.get(CompressingTermVectorsReader.java:493)\n\tat
org.apache.lucene.index.SegmentReader.getTermVectors(SegmentReader.java:175)\n\tat
org.apache.lucene.index.BaseCompositeReader.getTermVectors(BaseCompositeReader.java:97)\n\tat
org.apache.lucene.index.IndexReader.getTermVector(IndexReader.java:385)\n\tat
org.apache.solr.handler.component.TermVectorComponent.process(TermVectorComponent.java:313)\n\tat
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)\n\tat
org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)\n\tat
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)\n\tat
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)\n\tat
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)\n\tat
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)\n\tat
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)\n\tat
org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)\n\tat
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)\n\tat
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)\n\tat
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)\n\tat
org.mortbay.jetty.Server.handle(Server.java:326)\n\tat
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)\n\tat
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:926)\n\tat
org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)\n\tat
org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)\n\tat
org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)\n\tat
org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)\n\tat
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)\n","code":500}}


{"error":{"trace":"java.lang.ArrayIndexOutOfBoundsException\n\tat
org.apache.lucene.codecs.compressing.LZ4.decompress(LZ4.java:132)\n\tat
org.apache.lucene.codecs.compressing.CompressionMode$4.decompress(CompressionMode.java:135)\n\tat
org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.visitDocument(CompressingStoredFieldsReader.java:258)\n\tat
org.apache.lucene.index.SegmentReader.document(SegmentReader.java:139)\n\tat
org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:116)\n\tat
org.apache.lucene.index.IndexReader.document(IndexReader.java:436)\n\tat
org.apache.solr.search.SolrIndexSearcher.doc(SolrIndexSearcher.java:640)\n\tat
org.apache.solr.search.SolrIndexSearcher.doc(SolrIndexSearcher.java:568)\n\tat
org.apache.solr.handler.component.RealTimeGetComponent.process(RealTimeGetComponent.java:176)\n\tat
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)\n\tat
org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)\n\ta

Re: corrupted index in slave?

2013-04-09 Thread Victor Ruiz
sorry, I forgot to say: the exceptions are not for every document, only
for a few...

regards,
Victor

Victor Ruiz wrote
> Hi guys,
> 
> I'm getting exceptions in a Solr slave when accessing the TermVector
> component and the RealTimeGetHandler. The weird thing is that on the master
> and on one of the 2 slaves the documents are OK, and the same query
> doesn't return any exception. For now, the only way I have to solve the
> problem is deleting these documents and indexing them again.
> 
> I upgraded Solr from 4.0 directly to 4.2, then to 4.2.1 last week. These
> exceptions seem to appear since the upgrade to 4.2.
> I didn't run the script for migrating the index files (as I did in the
> migration from 3.6 to 4.0), should I? Has the format of the index changed?
> If not, is this a known bug? If it is, sorry, I couldn't find it in JIRA.
> 
> These are the exceptions I get:
> 
> {"responseHeader":{"status":500,"QTime":1},"response":{"numFound":1,"start":0,"docs":[{"itemid":"105266867","text":"exklusiver
> kann man kaum würzen  safran ist das teuerste gewürz der welt handverlesen
> und in mühevoller kleinstarbeit hergestellt ist safran sehr selten und
> wird in winzigen mengen gehandelt und
> verwendet","title":"safran","domainid":4287,"date_i":"2012-11-21T17:01:23Z","date":"2012-11-21T17:01:09Z","category":["kultur","literatur","gesellschaft","umwelt","trinken","essen"]}]},"termVectors":["uniqueKeyFieldName","itemid","105266867",["uniqueKey","105266867"]],"error":{"trace":"java.lang.ArrayIndexOutOfBoundsException\n\tat
> org.apache.lucene.codecs.compressing.LZ4.decompress(LZ4.java:132)\n\tat
> org.apache.lucene.codecs.compressing.CompressionMode$4.decompress(CompressionMode.java:135)\n\tat
> org.apache.lucene.codecs.compressing.CompressingTermVectorsReader.get(CompressingTermVectorsReader.java:493)\n\tat
> org.apache.lucene.index.SegmentReader.getTermVectors(SegmentReader.java:175)\n\tat
> org.apache.lucene.index.BaseCompositeReader.getTermVectors(BaseCompositeReader.java:97)\n\tat
> org.apache.lucene.index.IndexReader.getTermVector(IndexReader.java:385)\n\tat
> org.apache.solr.handler.component.TermVectorComponent.process(TermVectorComponent.java:313)\n\tat
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)\n\tat
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)\n\tat
> org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)\n\tat
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)\n\tat
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)\n\tat
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)\n\tat
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)\n\tat
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)\n\tat
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)\n\tat
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)\n\tat
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)\n\tat
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)\n\tat
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)\n\tat
> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)\n\tat
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)\n\tat
> org.mortbay.jetty.Server.handle(Server.java:326)\n\tat
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)\n\tat
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:926)\n\tat
> org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)\n\tat
> org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)\n\tat
> org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)\n\tat
> org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)\n\tat
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)\n","code":500}}
> 
> 
> {"error":{"trace":"java.lang.ArrayIndexOutOfBoundsException\n\tat
> org.apache.lucene.codecs.compressing.LZ4.decompress(LZ4.java:132)\n\tat
> org.apache.lucene.codecs.compressing.CompressionMode$4.decompress(CompressionMode.java:135)\n\tat
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.visitDocument(CompressingStoredFieldsReader.java:258)\n\tat
> org.apache.lucene.index.SegmentReader.document(SegmentReader.java:139)\n\tat
> org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:116)\n\tat
> org.apache.lucene.index.IndexReader.document(IndexReader.java:436)\n\tat
> org.apache.solr.search.SolrIndexSearcher.doc(SolrIndexSearcher.java:640)\n\tat
> org.apache.solr.search.SolrIndexSearcher.doc(SolrIndexSearcher.java:568)\n\tat
> org.apache.solr.handler.component.RealTimeGetComponent.process(RealTimeGetComponent.java:176)\n\tat

Re: solr 4.2.1 still has problems with index version and index generation

2013-04-09 Thread Chris Hostetter

: And with replication?command=details I also see the correct commit part as
: above, BUT where the hell are the wrong info below the commit array are
: coming from?

Please read the details in the previously mentioned Jira issue...

https://issues.apache.org/jira/browse/SOLR-4661

The indexVersion and generation you are looking at refer to the specifics 
of the IndexReader as used by the *searcher* on the master server -- but 
in addition to situations like openSearcher=false, there are some 
optimizations in place such that Solr/Lucene is smart enough to realize 
when an "empty commit" doesn't change the IndexReader, and it continues to use 
the previous commit point...

https://issues.apache.org/jira/browse/SOLR-4661?focusedCommentId=13620195&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13620195

...but from the perspective of the slave, this is still a commit that 
needs to be replicated and loaded.

Hence the current objective of the patch in SOLR-3855: add more details to 
the command=details response (as well as the Admin UI) to clearly 
distinguish between the gen/ver of the currently replicatable commit and 
the gen/ver of the currently open searcher.

All available information suggests that this is purely a problem of 
conveying information to users via command=details -- replication is 
behaving as designed, using the correct information about the commit 
points.



-Hoss


How can I set configuration options?

2013-04-09 Thread Edd Grant
Hi all,

I have been working through the examples on the SolrCloud page:
http://wiki.apache.org/solr/SolrCloud

I am now at the point where, rather than firing up Solr through start.jar,
I'm deploying the Solr war in to Tomcat instances. Taking the following
command as an example:

java -Dbootstrap_confdir=./solr/collection1/conf
-Dcollection.configName=myconf -DzkRun
-DzkHost=localhost:9983,localhost:8574,localhost:9900 -DnumShards=2
-jar start.jar

I can't figure out from the documentation how/where I set the above
properties when deploying Solr as a war file. I initially thought these
might be configurable through solr.xml but can't find anything in the
documentation to support this.

Most grateful for any pointers here.

Cheers,

Edd
-- 
Web: http://www.eddgrant.com
Email: e...@eddgrant.com
Mobile: +44 (0) 7861 394 543


Re: conditional queries?

2013-04-09 Thread Miguel
I'm not sure, but you could create a class extending SearchComponent and 
include it in the last-components of your request handler; that way you can 
add optional actions for whatever query hits your Solr server.

Example solrconfig.xml:

<arr name="last-components">
  <str>actions</str>
</arr>

Regards

On 09/04/2013 17:05, Walter Underwood wrote:

We do this on the client side with multiple queries. It is fairly efficient, 
because most responses are from the first, exact query.

wunder

On Apr 9, 2013, at 6:15 AM, Koji Sekiguchi wrote:


Hi Mark,


Is it possible to do a conditional query if another query has no results?  For 
example, say I want to search against a given field for:

- Search for "car".  If there are results, return them.
- Else, search for "car*" .  If there are results, return them.
- Else, search for "car~" .  If there are results, return them.

Is this possible in one query?  Or would I need to make 3 separate queries by 
implementing this logic within my client?

As far as I know, there is no such SearchComponent.
But the idea of a "FallbackRequestHandler" has been discussed; see SOLR-1878,
for example:

https://issues.apache.org/jira/browse/SOLR-1878

koji
--
http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html










Index Replication Failing in Solr 4.2.1

2013-04-09 Thread Umesh Prasad
Hi All,
  I am migrating from Solr 3.5.0 to Solr 4.2.1, and everything is running
fine and ready to go except the master-slave replication.

We use master-slave replication with multiple cores (1 master, 10 slaves and
20-plus cores).

My Configuration is :

Master :  Solr 3.5.0,  Has existing index, and delta import running using
DIH.
Slave : Solr 4.2.1 ,  Has no startup index


Apr 9, 2013 9:18:40 PM org.apache.solr.core.SolrCore execute
INFO: [phcare] webapp= path=/replication
params={command=fetchindex&_=1365522520521&wt=json} status=0 QTime=1
Apr 9, 2013 9:18:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
*INFO: Master's generation: 107876
*Apr 9, 2013 9:18:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
*INFO: Slave's generation: 79248
*Apr 9, 2013 9:18:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Starting replication process
*Apr 9, 2013 9:18:40 PM org.apache.solr.handler.SnapPuller fetchFileList
SEVERE: No files to download for index generation: 107876
*Apr 9, 2013 9:18:40 PM org.apache.solr.core.SolrCore execute
INFO: [phcare] webapp= path=/replication
params={command=details&_=1365522520556&wt=json} status=0 QTime=7

In both master and slave, the file list for the replicable version is correct.

On slave:

{
  "masterDetails": {
    "indexSize": "4.31 MB",
    "indexPath": "/var/lib/fk-w3-sherlock/cores/phcare/data/index.20130124235012",
    "commits": [
      [
        "indexVersion", 1323961124638,
        "generation", 107856,
        "filelist", [
          "_45e1.tii",
          "_45e1.nrm",
          ...


On master:

[
  "indexVersion", 1323961124638,
  "generation", 107856,
  "filelist", [
    "_45e1.tii",
    "_45e1.nrm",
    "_45e2_1.del",
    "_45e2.frq",
    "_45e1_3.del",
    "_45e1.tis",
    ...



Can someone help? Our whole migration to Solr 4.2 is blocked on this
replication issue.

---
Thanks & Regards
Umesh Prasad


SolrCloud: Result Grouping - no groups with field type with precisionStep > 0

2013-04-09 Thread Elodie Sannier

Hello,

I am using the Result Grouping feature with SolrCloud, and it seems that
grouping does not work with field types having a precisionStep property
greater than 0, in distributed mode.

I updated the "SolrCloud - Getting Started" page example A (Simple two
shard cluster).
In my schema.xml, the "popularity" field has an "int" type where I
changed precisionStep from 0 to 4 :

<fieldType name="int" class="solr.TrieIntField" precisionStep="4" positionIncrementGap="0"/>
When I'm requesting in distributed mode, the grouping on this field does
not return groups :
http://localhost:8983/solr/select?q=*:*&group=true&group.field=popularity&distrib=true



<lst name="grouped">
  <lst name="popularity">
    <int name="matches">1</int>
    ...
    <result name="doclist" numFound="0" start="0"/>
  </lst>
</lst>

When I'm requesting on a single core, the grouping on this field returns
a group:
http://localhost:8983/solr/select?q=*:*&group=true&group.field=popularity&distrib=false


<lst name="grouped">
  <lst name="popularity">
    <int name="matches">10</int>
    <arr name="groups">
      <lst>
        <result name="doclist" ...>
          <doc>
            <str name="id">MA147LL/A</str>
            ...
            <int name="popularity">10</int>
            ...




If I go back to the original configuration, changing the "int" type back to
precisionStep="0", the distributed request works:

<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
A precisionStep > 0 can be useful for range queries, but is it normal
that it is not compatible with grouping queries in distributed mode only?

Elodie Sannier

Kelkoo SAS
Société par Actions Simplifiée
Au capital de € 4.168.964,30
Siège social : 8, rue du Sentier 75002 Paris
425 093 069 RCS Paris

This message and its attachments are confidential and intended exclusively 
for their addressees. If you are not the intended recipient of this message, 
please destroy it and notify the sender.


Re: Execution of Queries in Parallel: geotagged textual documents in Solr

2013-04-09 Thread Otis Gospodnetic
Hi,

I'd move to SolrCloud 4.2.1 to benefit from sharding, replication, and
the latest Lucene.  How many queries you will then be able to run in
parallel will depend on their complexity, index size, query
cachability, latency requirements... But move to the
latest setup first.

Otis
--
SOLR Performance Monitoring - http://sematext.com/spm/index.html





On Tue, Apr 9, 2013 at 11:10 AM, Massimiliano Ruocco  wrote:
> I have around 100M textual documents, geotagged (lat,long). These documents
> are indexed with Solr 1.4. I am testing a retrieval model (written over
> Terrier). This model requires frequent execution of queries (bounding-box
> filter). These queries could be executed in parallel, one for each specific
> geographic tile.
>
> I was wondering if a solution exists for speeding up the execution of queries
> in parallel. My naive idea is to split the index into many parts according to
> the geographical tiles (how to do that? SolrCloud? Solr Index Replication?
> What is the maximum number of eventual replications?)
>
> Any further practical suggestions?
>
> Thanks in advance
>
> Massimiliano
>


Re: Latency Comparison between cloud hosting Vs Dedicated hosting

2013-04-09 Thread Otis Gospodnetic
Hi Sujatha,

You should really do the same stuff to improve latency in the cloud as
what you would do on a dedicated server.
Amazon-specific stuff:
Bigger EC2 instances have better IO.  EBS performance varies.  Some
people mount N of them and stripe across them.  Some people try N EBS
volumes to find the best performing one(s) and discard the rest.  Some
people pay for provisioned IOPS.

Otis
--
SOLR Performance Monitoring - http://sematext.com/spm/index.html





On Tue, Apr 9, 2013 at 3:33 AM, Sujatha Arun  wrote:
> Hi,
>
> We are comparing search request latency between Amazon vs dedicated
> hosting [Rackspace]. For the comparison we used Solr version 3.6.1 and an
> Amazon small instance. The index size was less than 1 GB.
>
> We see that the latency is about 75-100% higher on Amazon. Does anybody who
> has migrated from dedicated hosting to the cloud have any pointers for
> improving latency?
>
> Would a bigger instance improve latency?
>
> Regards
> Sujatha


Solr 4.2.1 SSLInitializationException

2013-04-09 Thread Sarita Nair
Hi All,

Deploying Solr 4.2.1 to GlassFish 3.1.1 results in the error below. I have 
seen similar problems being reported with Solr 4.2, and my take-away was 
that 4.2.1 contains the necessary fix.

Any help with this will be appreciated.

Thanks!


2013-04-09 10:45:06,144 [main] ERROR org.apache.solr.servlet.SolrDispatchFilter - Could not start Solr. Check solr/home property and the logs
2013-04-09 10:45:06,224 [main] ERROR org.apache.solr.core.SolrCore - null:org.apache.http.conn.ssl.SSLInitializationException: Failure initializing default system SSL context
Caused by: java.io.IOException: Keystore was tampered with, or password was incorrect
    at sun.security.provider.JavaKeyStore.engineLoad(JavaKeyStore.java:772)
    at sun.security.provider.JavaKeyStore$JKS.engineLoad(JavaKeyStore.java:55)
    at java.security.KeyStore.load(KeyStore.java:1214)
    at org.apache.http.conn.ssl.SSLSocketFactory.createSystemSSLContext(SSLSocketFactory.java:281)
    at org.apache.http.conn.ssl.SSLSocketFactory.createSystemSSLContext(SSLSocketFactory.java:366)
    ... 50 more
Caused by: java.security.UnrecoverableKeyException: Password verification failed
    at sun.security.provider.JavaKeyStore.engineLoad(JavaKeyStore.java:770)

Re: Solr 4.2.1 SSLInitializationException

2013-04-09 Thread Chris Hostetter

: Deploying Solr 4.2.1 to GlassFish 3.1.1 results in the error below.  I 
: have seen similar problems being reported with Solr 4.2

Are you trying to use server SSL with glassfish?

can you please post the full stack trace so we can see where this error is 
coming from.

My best guess is that this is coming from the changes made in 
SOLR-4451 to use system defaults correctly when initializing HttpClient, 
which suggests that your problem is exactly what the error message says...

  "Keystore was tampered with, or password was incorrect"

Is it possible that the default keystore for your JVM (or as 
overridden by glassfish defaults - possibly using the 
"javax.net.ssl.keyStore" sysprop) has a password set on it?  If so you 
need to configure your JVM with the standard java system properties to 
specify what that password is.
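
For example, the standard properties are (values here are placeholders):

  -Djavax.net.ssl.keyStore=/path/to/keystore.jks
  -Djavax.net.ssl.keyStorePassword=changeit
  -Djavax.net.ssl.trustStore=/path/to/truststore.jks
  -Djavax.net.ssl.trustStorePassword=changeit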

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201303.mbox/%3c1364232676233-4051159.p...@n3.nabble.com%3E

:     2013-04-09 10:45:06,144 [main] ERROR 
: org.apache.solr.servlet.SolrDispatchFilter - Could not start Solr. Check 
solr/home property and the logs
:     2013-04-09 10:45:06,224 [main] ERROR 
: org.apache.solr.core.SolrCore - 
: null:org.apache.http.conn.ssl.SSLInitializationException: Failure 
: initializing default system SSL context
:     Caused by: java.io.IOException: Keystore was tampered with, or password 
was incorrect
:   at sun.security.provider.JavaKeyStore.engineLoad(JavaKeyStore.java:772)
:     at sun.security.provider.JavaKeyStore$JKS.engineLoad(JavaKeyStore.java:55)
:     at java.security.KeyStore.load(KeyStore.java:1214)
:     at
:  
org.apache.http.conn.ssl.SSLSocketFactory.createSystemSSLContext(SSLSocketFactory.java:281)
:     at 
org.apache.http.conn.ssl.SSLSocketFactory.createSystemSSLContext(SSLSocketFactory.java:366)
 
:     ... 50 more
: Caused by: java.security.UnrecoverableKeyException: Password verification 
failed
:     at sun.security.provider.JavaKeyStore.engineLoad(JavaKeyStore.java:770)

-Hoss

Re: Execution of Queries in Parallel: geotagged textual documents in Solr

2013-04-09 Thread Chris Hostetter

: I'd move to SolrCloud 4.2.1 to benefit from sharding, replication, and
: the latest Lucene.  How many queries you will then be able to run in
: parallel will depend on their complexity, index size, query
: cachability, index size, latency requirements... But move to the
: latest setup first.

Not to mention that geospatial query support is vastly improved in Solr 4.x 
vs what was possible in Solr 1.4.

-Hoss


query regarding the use of boost across the fields in edismax query

2013-04-09 Thread Rohan Thakur
hi all

I wanted to know what the difference between the results could be if I apply
boosts across, say, 5 fields in a query, e.g. for

first: title^10.0 features^7.0 cat^5.0 color^3.0 root^1.0 and
second settings like: title^10.0 features^5.0 cat^3.0 color^2.0 root^1.0

What would the difference be, given that the weights are decreasing in the
same order?

thanks in advance

regards
Rohan


Re: query regarding the use of boost across the fields in edismax query

2013-04-09 Thread Otis Gospodnetic
Not sure if I'm missing something, but in the first case the features, cat,
and color fields have more weight, so matches on them will have a bigger
contribution to the overall relevancy score.
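
(For reference, with edismax these weights go in the qf parameter, e.g.

http://localhost:8983/solr/select?defType=edismax&q=some+term&qf=title^10.0+features^7.0+cat^5.0+color^3.0+root^1.0

where host and query are placeholders.)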

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Tue, Apr 9, 2013 at 1:52 PM, Rohan Thakur  wrote:
> hi all
>
> I wanted to know what the difference between the results could be if I apply
> boosts across, say, 5 fields in a query, e.g. for
>
> first: title^10.0 features^7.0 cat^5.0 color^3.0 root^1.0 and
> second settings like: title^10.0 features^5.0 cat^3.0 color^2.0 root^1.0
>
> What would the difference be, given that the weights are decreasing in the
> same order?
>
> thanks in advance
>
> regards
> Rohan


Re: Average Solr Server Spec.

2013-04-09 Thread Otis Gospodnetic
Hi,

You are right, there is no average.  I saw a Solr cluster with a
few EC2 micro instances yesterday, and I regularly see Solr running on 16
or 32 GB RAM and sometimes well over 100 GB RAM.  Sometimes they have
just 2 CPU cores, sometimes 32 or more.  Some use SSDs, some HDDs,
some local storage, some SAN, some EBS on AWS, etc.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Tue, Apr 9, 2013 at 7:04 AM, Furkan KAMACI  wrote:
> This question may not have a general answer and may be open ended, but is
> there any commodity server spec for a typical Solr machine? I mean,
> what is the average server specification for a Solr machine? (e.g. for a
> Hadoop system it is not recommended to have machines with very large
> storage capacity.) I will use Solr for indexing web crawled data.


Re: Average Solr Server Spec.

2013-04-09 Thread Walter Underwood
We mostly run m1.xlarge with an 8GB heap. --wunder

On Apr 9, 2013, at 10:57 AM, Otis Gospodnetic wrote:

> Hi,
> 
> You are right, there is no average.  I saw a Solr cluster with a
> few EC2 micro instances yesterday, and I regularly see Solr running on 16
> or 32 GB RAM and sometimes well over 100 GB RAM.  Sometimes they have
> just 2 CPU cores, sometimes 32 or more.  Some use SSDs, some HDDs,
> some local storage, some SAN, some EBS on AWS, etc.
> 
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
> 
> 
> 
> 
> 
> On Tue, Apr 9, 2013 at 7:04 AM, Furkan KAMACI  wrote:
>> This question may not have a general answer and may be open ended, but is
>> there any commodity server spec for a typical Solr machine? I mean,
>> what is the average server specification for a Solr machine? (e.g. for a
>> Hadoop system it is not recommended to have machines with very large
>> storage capacity.) I will use Solr for indexing web crawled data.







Results Order When Performing Wildcard Query

2013-04-09 Thread P Williams
Hi,

I wrote a test of my application which revealed a Solr oddity (I think).
The test, which I wrote on Windows 7 and which makes use of the
solr-test-framework, fails under Ubuntu 12.04 because the Solr results I
expected for a wildcard query of the test data are ordered differently under
Ubuntu than under Windows.  On both Windows and Ubuntu all items in the result
set have a score of 1.0 and appear to be ordered by docid (which looks like it
corresponds to alphabetical unique id on Windows but not Ubuntu).  I'm guessing
that the root of my issue is that a different docid was assigned to the same
document on each operating system.

The data was imported using a DataImportHandler configuration during a
@BeforeClass step in my JUnit test on both systems.

Any suggestions on how to ensure a consistently ordered wildcard query
result set for testing?

Thanks,
Tricia


Re: How can I set configuration options?

2013-04-09 Thread Nate Fox
In Ubuntu, I've added it to /etc/default/tomcat7 in the JAVA_OPTS options.

For example, I have:
JAVA_OPTS="-Djava.awt.headless=true -Xmx2048m -XX:+UseConcMarkSweepGC"
JAVA_OPTS="${JAVA_OPTS} -DnumShards=2 -Djetty.port=8080
-DzkHost=zookeeper01.dev.:2181 -Dbootstrap_conf=true"



--
Nate Fox
Sr Systems Engineer

o: 310.658.5775
m: 714.248.5350

Follow us @NEOGOV and on Facebook

NEOGOV  is among the top fastest growing software
companies in the USA, recognized by Inc 500|5000, Deloitte Fast 500, and
the LA Business Journal. We are hiring!



On Tue, Apr 9, 2013 at 8:55 AM, Edd Grant  wrote:

> Hi all,
>
> I have been working through the examples on the SolrCloud page:
> http://wiki.apache.org/solr/SolrCloud
>
> I am now at the point where, rather than firing up Solr through start.jar,
> I'm deploying the Solr war in to Tomcat instances. Taking the following
> command as an example:
>
> java -Dbootstrap_confdir=./solr/collection1/conf
> -Dcollection.configName=myconf -DzkRun
> -DzkHost=localhost:9983,localhost:8574,localhost:9900 -DnumShards=2
> -jar start.jar
>
> I can't figure out from the documentation how/ where I set the above
> properties when deploying Solr as a war file. I initially thought these
> might be configurable through solr.xml but can't find anything in the
> documentation to support this.
>
> Most grateful for any pointers here.
>
> Cheers,
>
> Edd
> --
> Web: http://www.eddgrant.com
> Email: e...@eddgrant.com
> Mobile: +44 (0) 7861 394 543
>


Re: How can I set configuration options?

2013-04-09 Thread Furkan KAMACI
Hi Edd;

The parameters you mentioned are JVM parameters. There are two ways to
define them.
The first one: if you are using an IDE you can set them as JVM
parameters, i.e. if you are using IntelliJ IDEA, when you open your
Run/Debug configurations there is a line called VM Options. You can write
your parameters there, without the "java" word in front of them.

The second one is deploying your war file into Tomcat without using an IDE (I
think this is what you want). Here is what to do:

Go to the Tomcat home folder and under the bin folder create a file called
setenv.sh. Then add these lines:

#!/bin/sh
#
#
export JAVA_OPTS="$JAVA_OPTS
-Dbootstrap_confdir=./solr/collection1/conf
-Dcollection.configName=myconf -DzkRun
-DzkHost=localhost:9983,localhost:8574,localhost:9900 -DnumShards=2"



2013/4/9 Edd Grant 

> Hi all,
>
> I have been working through the examples on the SolrCloud page:
> http://wiki.apache.org/solr/SolrCloud
>
> I am now at the point where, rather than firing up Solr through start.jar,
> I'm deploying the Solr war in to Tomcat instances. Taking the following
> command as an example:
>
> java -Dbootstrap_confdir=./solr/collection1/conf
> -Dcollection.configName=myconf -DzkRun
> -DzkHost=localhost:9983,localhost:8574,localhost:9900 -DnumShards=2
> -jar start.jar
>
> I can't figure out from the documentation how/ where I set the above
> properties when deploying Solr as a war file. I initially thought these
> might be configurable through solr.xml but can't find anything in the
> documentation to support this.
>
> Most grateful for any pointers here.
>
> Cheers,
>
> Edd
> --
> Web: http://www.eddgrant.com
> Email: e...@eddgrant.com
> Mobile: +44 (0) 7861 394 543
>


Re: Pointing to Hbase for Docuements or Directly Saving Documents at Hbase

2013-04-09 Thread Otis Gospodnetic
You may also be interested in looking at things like solrbase (on Github).

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Sat, Apr 6, 2013 at 6:01 PM, Furkan KAMACI  wrote:
> Hi;
>
> First of all I should mention that I am new to Solr and doing research
> on it. What I am trying to do: I will crawl some websites with Nutch
> and then I will index them with Solr. (Nutch 2.1, Solr-SolrCloud 4.2)
>
> I wonder about something. I have a cloud of machines that crawls websites
> and stores those documents. Then I send those documents into SolrCloud. Solr
> indexes those documents, generates indexes and saves them. I know from
> Information Retrieval theory that it *may* not be efficient to store
> indexes in a NoSQL database (they are something like linked lists and if
> you store them in that kind of database you *may* have a sparse
> representation - by the way, there may be some solutions for that. If you
> can explain them, you are welcome.)
>
> However Solr stores some documents too (i.e. highlights), so some of my
> documents will be doubled somehow. If I consider that I will have many
> documents, those doubled documents may cause a problem for me. So is there
> any way of not storing those documents in Solr and pointing to them in
> Hbase (where I save my crawled documents), or of directly storing them in
> Hbase instead of pointing (is that efficient or not)?


Re: Average Solr Server Spec.

2013-04-09 Thread Furkan KAMACI
Hi Walter;

Could I ask what the average size of your Solr indexes is, and the average
queries per second to your Solr? Maybe I can come up with an estimate.

2013/4/9 Walter Underwood 

> We mostly run m1.xlarge with an 8GB heap. --wunder
>
> On Apr 9, 2013, at 10:57 AM, Otis Gospodnetic wrote:
>
> > Hi,
> >
> > You are right there is no average.  I saw a Solr cluster with a
> > few EC2 micro instances yesterday and regularly see Solr running on 16
> > or 32 GB RAM and sometimes well over 100 GB RAM.  Sometimes they have
> > just 2 CPU cores, sometimes 32 or more.  Some use SSDs, some HDDs,
> > some local storage, some SAN, some EBS on AWS. etc.
> >
> > Otis
> > --
> > Solr & ElasticSearch Support
> > http://sematext.com/
> >
> >
> >
> >
> >
> > On Tue, Apr 9, 2013 at 7:04 AM, Furkan KAMACI 
> wrote:
> >> This question may not have a general answer and may be open ended, but is
> >> there any commodity server spec. for a usual Solr machine? I mean,
> >> what is the average server specification for a Solr machine? (e.g. for a
> >> system running Hadoop it is not recommended to have machines with very
> >> large storage capacity.) I will use Solr for indexing web-crawled data.
>
>
>
>
>
>


Indexing and searching documents in different languages

2013-04-09 Thread dev


Hello,

I'm trying to index a large number of documents in different languages.
I don't know the language of the document, so I'm using  
TikaLanguageIdentifierUpdateProcessorFactory to identify it.


So, this is my configuration in solrconfig.xml:

  <updateRequestProcessorChain name="langid">
    <processor class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory">
      <bool name="langid">true</bool>
      <str name="langid.fl">title,subtitle,content</str>
      <str name="langid.langField">language_s</str>
      <float name="langid.threshold">0.3</float>
      <str name="langid.fallback">general</str>
      <str name="langid.whitelist">en,fr,de,it,es</str>
      <bool name="langid.map">true</bool>
      <bool name="langid.map.keepOrig">true</bool>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

So, the detection works fine and I put some dynamic fields in  
schema.xml to store the results:
  stored="true" multiValued="true"/>
  stored="true" multiValued="true"/>
  stored="true" multiValued="true"/>
  stored="true" multiValued="true"/>
  stored="true" multiValued="true"/>


My main problem now is how to search documents without knowing the
language of the document being searched.
I don't want to have a huge query string like
?q=title_en:+term+subtitle_en:+term+title_de:+term...
Okay, I could use copyField and copy all fields into the "text" field, but
"text" has the type text_general, so the language-specific analysis is
not applied. I could at least use a combined field for every language
(like text_en, text_fr...), but then my query string still gets very long
and adding new languages is terribly awkward.


So, what can I do? Is there a better solution for indexing and searching
documents in many languages without knowing the language of the
document or the query in advance?


- Geschan



Re: Number of segments

2013-04-09 Thread Michael Long
My main concern was just making sure we were getting the best search 
performance, and that we did not have too many segments. Every attempt I 
made to adjust the segment count resulted in no difference (segment 
count never changed). Looking at that blog page, it looks like 30-40 
segments is probably the norm.


On 04/08/2013 08:43 PM, Chris Hostetter wrote:

: How do I determine how many tiers it has?

You may find this blog post from mccandless helpful...

http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

(don't ignore the videos! watching them is really helpful for understanding
what he is talking about)

Once you've absorbed that, please revisit your question, specifically
Upayavira's key point: what is the problem you are trying to solve?

https://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an "XY Problem" ... that is: you are dealing
with "X", you are assuming "Y" will help you, and you are asking about "Y"
without giving more details about the "X" so that we can understand the
full issue.  Perhaps the best solution doesn't involve "Y" at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341


-Hoss




Re: Indexing and searching documents in different languages

2013-04-09 Thread Otis Gospodnetic
Hi,

Typically people try to figure out the query language somehow.
Queries are short, so LID on them is hard.  But user profile could
indicate a language, or users can be asked and such.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Tue, Apr 9, 2013 at 2:32 PM,   wrote:
>
> Hello,
>
> I'm trying to index a large number of documents in different languages.
> I don't know the language of the document, so I'm using
> TikaLanguageIdentifierUpdateProcessorFactory to identify it.
>
> So, this is my configuration in solrconfig.xml:
>
>   <updateRequestProcessorChain name="langid">
>     <processor class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory">
>       <bool name="langid">true</bool>
>       <str name="langid.fl">title,subtitle,content</str>
>       <str name="langid.langField">language_s</str>
>       <float name="langid.threshold">0.3</float>
>       <str name="langid.fallback">general</str>
>       <str name="langid.whitelist">en,fr,de,it,es</str>
>       <bool name="langid.map">true</bool>
>       <bool name="langid.map.keepOrig">true</bool>
>     </processor>
>     <processor class="solr.LogUpdateProcessorFactory"/>
>     <processor class="solr.RunUpdateProcessorFactory"/>
>   </updateRequestProcessorChain>
>
> So, the detection works fine and I put some dynamic fields in schema.xml to
> store the results:
>   <dynamicField name="*_en" type="text_en" indexed="true" stored="true" multiValued="true"/>
>   <dynamicField name="*_fr" type="text_fr" indexed="true" stored="true" multiValued="true"/>
>   <dynamicField name="*_de" type="text_de" indexed="true" stored="true" multiValued="true"/>
>   <dynamicField name="*_it" type="text_it" indexed="true" stored="true" multiValued="true"/>
>   <dynamicField name="*_es" type="text_es" indexed="true" stored="true" multiValued="true"/>
>
> My main problem now is how to search documents without knowing the
> language of the document being searched.
> I don't want to have a huge query string like
> ?q=title_en:+term+subtitle_en:+term+title_de:+term...
> Okay, I could use copyField and copy all fields into the "text" field, but "text"
> has the type text_general, so the language-specific analysis is not applied.
> I could at least use a combined field for every language (like text_en,
> text_fr...), but then my query string still gets very long and adding new
> languages is terribly awkward.
>
> So, what can I do? Is there a better solution for indexing and searching documents
> in many languages without knowing the language of the document or the query
> in advance?
>
> - Geschan
>


Re: Solr metrics in Codahale metrics and Graphite?

2013-04-09 Thread Walter Underwood
If it isn't obvious, I'm glad to help test a patch for this. We can run a 
simulated production load in dev and report to our metrics server.

wunder

On Apr 8, 2013, at 1:07 PM, Walter Underwood wrote:

> That approach sounds great. --wunder
> 
> On Apr 7, 2013, at 9:40 AM, Alan Woodward wrote:
> 
>> I've been thinking about how to improve this reporting, especially now that 
>> metrics-3 (which removes all of the funky thread issues we ran into last 
>> time I tried to add it to Solr) is close to release.  I think we could go 
>> about it as follows:
>> 
>> * refactor the existing JMX reporting to use metrics-3.  This would mean 
>> replacing the SolrCore.infoRegistry map with a MetricsRegistry, and adding a 
>> JmxReporter, keeping the existing config logic to determine which JMX server 
>> to use.  PluginInfoHandler and SolrMBeanInfoHandler translate the metrics-3 
>> data back into SolrMBean format to keep the reporting backwards-compatible.  
>> This seems like a lot of work for no visible benefit, but…
>> * we can then add the ability to define other metrics reporters in 
>> solrconfig.xml.  There are already reporters for Ganglia and Graphite - you 
>> just add then to the Solr lib/ directory, configure them in solrconfig, and 
>> voila - Solr can be monitored using the same devops tools you use to monitor 
>> everything else.
>> 
>> Does this sound sane?
>> 
>> Alan Woodward
>> www.flax.co.uk
>> 
>> 
>> On 6 Apr 2013, at 20:49, Walter Underwood wrote:
>> 
>>> Wow, that really doesn't help at all, since these seem to only be reported 
>>> in the stats page. 
>>> 
>>> I don't need another non-standard app-specific set of metrics, especially 
>>> one that needs polling. I need metrics delivered to the common system that 
>>> we use for all our servers.
>>> 
>>> This is also why SPM is not useful for us, sorry Otis.
>>> 
>>> Also, there is no time period on these stats. How do you graph the 95th 
>>> percentile? I know there was a lot of work on these, but they seem really 
>>> useless to me. I'm picky about metrics, working at Netflix does that to you.
>>> 
>>> wunder
>>> 
>>> On Apr 3, 2013, at 4:01 PM, Walter Underwood wrote:
>>> 
 In the Jira, but not in the docs. 
 
 It would be nice to have VM stats like GC, too, so we can have common 
 monitoring and alerting on all our services.
 
 wunder
 
 On Apr 3, 2013, at 3:31 PM, Otis Gospodnetic wrote:
 
> It's there! :)
> http://search-lucene.com/?q=percentile&fc_project=Solr&fc_type=issue
> 
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
> 
> On Wed, Apr 3, 2013 at 6:29 PM, Walter Underwood  
> wrote:
>> That sounds great. I'll check out the bug, I didn't see anything in the 
>> docs about this. And if I can't find it with a search engine, it 
>> probably isn't there.  --wunder
>> 
>> On Apr 3, 2013, at 6:39 AM, Shawn Heisey wrote:
>> 
>>> On 3/29/2013 12:07 PM, Walter Underwood wrote:
 What are folks using for this?
>>> 
>>> I don't know that this really answers your question, but Solr 4.1 and
>>> later includes a big chunk of codahale metrics internally for request
>>> handler statistics - see SOLR-1972.  First we tried including the jar
>>> and using the API, but that created thread leak problems, so the source
>>> code was added.
>>> 
>>> Thanks,
>>> Shawn
>>> 
>>> 
>>> 
>>> 
>> 
> 
> --
> Walter Underwood
> wun...@wunderwood.org
> 
> 
> 

--
Walter Underwood
wun...@wunderwood.org





Re: Indexing and searching documents in different languages

2013-04-09 Thread Alexandre Rafalovitch
Have you looked at edismax and the 'qf' fields parameter? It allows you to
define the fields to search. Also, you can define those parameters in
solrconfig.xml and not have to send them down the wire.

Finally, you can define several different request handlers (e.g. /ensearch,
/frsearch) and have each of them use different 'qf' values, possibly with
'fl' field also defined and with field name aliasing from language-specific
to generic names.
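For example, a minimal sketch (the host, core, and field names here are
illustrative, not taken from the original message):

curl "http://localhost:8983/solr/collection1/select?defType=edismax&q=some+term\
&qf=title_en+subtitle_en+content_en+title_fr+subtitle_fr+content_fr"

With defType and qf set as defaults on a request handler in solrconfig.xml,
the client only has to send q=some+term.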

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Tue, Apr 9, 2013 at 2:32 PM,  wrote:

>
> Hello,
>
> I'm trying to index a large number of documents in different languages.
> I don't know the language of the document, so I'm using
> TikaLanguageIdentifierUpdateProcessorFactory to identify it.
>
> So, this is my configuration in solrconfig.xml:
>
>   <updateRequestProcessorChain name="langid">
>     <processor class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory">
>       <bool name="langid">true</bool>
>       <str name="langid.fl">title,subtitle,content</str>
>       <str name="langid.langField">language_s</str>
>       <float name="langid.threshold">0.3</float>
>       <str name="langid.fallback">general</str>
>       <str name="langid.whitelist">en,fr,de,it,es</str>
>       <bool name="langid.map">true</bool>
>       <bool name="langid.map.keepOrig">true</bool>
>     </processor>
>     <processor class="solr.LogUpdateProcessorFactory"/>
>     <processor class="solr.RunUpdateProcessorFactory"/>
>   </updateRequestProcessorChain>
>
> So, the detection works fine and I put some dynamic fields in schema.xml
> to store the results:
>   <dynamicField name="*_en" type="text_en" indexed="true" stored="true" multiValued="true"/>
>   <dynamicField name="*_fr" type="text_fr" indexed="true" stored="true" multiValued="true"/>
>   <dynamicField name="*_de" type="text_de" indexed="true" stored="true" multiValued="true"/>
>   <dynamicField name="*_it" type="text_it" indexed="true" stored="true" multiValued="true"/>
>   <dynamicField name="*_es" type="text_es" indexed="true" stored="true" multiValued="true"/>
>
> My main problem now is how to search the document without knowing the
> language of the searched document.
> I don't want to have a huge querystring like
>  ?q=title_en:+term+subtitle_en:+term+title_de:+term...
> Okay, using copyField and copy all fields into the "text" field...but
> "text" has the type text_general, so the language specific indexing is not
> working. I could use at least a combined field for every language (like
> text_en, text_fr...) but still, my querystring gets very long and to add
> new languages is terribly uncomfortable.
>
> So, what can I do? Is there a better solution to index and search
> documents in many languages without knowing the language of the document
> and the query before?
>
> - Geschan
>
>


Re: Slow qTime for distributed search

2013-04-09 Thread Manuel Le Normand
Thanks for replying.
My config:

   - 40 dedicated servers, dual-core each
   - Running Tomcat servlet on Linux
   - 12 GB RAM per server, split evenly between OS and Solr
   - Complex queries (up to 30 conditions on different fields), 1 qps rate

Sharding my index was done for two reasons, based on 2 servers (4shards)
tests:

   1. As the index grew above a few million docs, qTime rose greatly, while
   sharding the index into smaller pieces (about 0.5M docs) gave much better
   results, so I bound every shard to have 0.5M docs.
   2. Tests showed I was CPU-bound during queries. As I have a low qps rate
   (note: lower than the expected qTime) and as a query runs single-threaded
   on each shard, it made sense to dedicate a CPU to each shard.

For the same number of docs per shard I do expect a rise in total qTime
for these reasons:

   1. The response has to wait for the slowest shard
   2. Merging the responses from 40 different shards takes time

What I understand from your explanation is that it's the merging that takes
time, and as qTime ends only after the second retrieval phase, the qTime on
each shard will be longer. Meaning that during a significant proportion of the
first query phase (right after the [id,score] pairs are retrieved), all CPUs are
idle except the response-merger thread running on a single CPU. I thought
of the merge as a simple sort of [id,score] pairs, far simpler than an
additional 300 ms of CPU time.

Why would a RAM increase improve my performance, as it's a
"response-merge" (CPU resource) bottleneck?

Thanks in advance,
Manu


On Mon, Apr 8, 2013 at 10:19 PM, Shawn Heisey  wrote:

> On 4/8/2013 12:19 PM, Manuel Le Normand wrote:
>
>> It seems that sharding my collection to many shards slowed down
>> unreasonably, and I'm trying to investigate why.
>>
>> First, I created "collection1" - 4 shards*replicationFactor=1 collection
>> on
>> 2 servers. Second I created "collection2" - 48 shards*replicationFactor=2
>> collection on 24 servers, keeping same config and same num of documents
>> per
>> shard.
>>
>
> The primary reason to use shards is for index size, when your index is so
> big that a single index cannot give you reasonable performance. There are
> also sometimes performance gains when you break a smaller index into
> shards, but there is a limit.
>
> Going from 2 shards to 3 shards will have more of an impact than going
> from 8 shards to 9 shards.  At some point, adding shards makes things
> slower, not faster, because of the extra work required for combining
> multiple queries into one result response.  There is no reasonable way to
> predict when that will happen.
>
>  Observations showed the following:
>>
>> 1. Total qTime for the same query set is 5 times higher in collection2
>> (150ms->700ms)
>> 2. Adding to collection2 the *shards.info=true* param in the query
>> shows
>>
>> that each shard is much slower than each shard was in collection1
>> (about 4
>> times slower)
>> 3.  Querying only specific shards on collection2 (by adding the
>>
>> shards=shard1,shard2...shard12 param) gave me much better qTime per
>> shard
>> (only 2 times higher than in collection1)
>> 4. I have a low qps rate, thus i don't suspect the replication factor
>>
>> for being the major cause of this.
>> 5. The avg. cpu load on servers during querying was much higher in
>>
>> collection1 than in collection2, and I didn't catch any other
>> bottleneck.
>>
>
> A distributed query actually consists of up to two queries per shard. The
> first query just requests the uniqueKey field, not the entire document.  If
> you are sorting the results, then the sort field(s) are also requested,
> otherwise the only additional information requested is the relevance score.
>  The results are compiled into a set of unique keys, then a second query is
> sent to the proper shards requesting specific documents.
>
>
>  Q:
>> 1. Why does the amount of shards affect the qTime of each shard?
>> 2. How can I overcome to reduce back the qTime of each shard?
>>
>
> With more shards, it takes longer for the first phase to compile the
> results, so the second phase (document retrieval) gets delayed, and the
> QTime goes up.
>
> One way to reduce the total time is to reduce the number of shards.
>
> You haven't said anything about how complex your queries are, your index
> size(s), or how much RAM you have on each server and how it is allocated.
>  Can you provide this information?
>
> Getting good performance out of Solr requires plenty of RAM in your OS
> disk cache.  Query times of 150 to 700 milliseconds seem very high, which
> could be due to query complexity or a lack of server resources (especially
> RAM), or possibly both.
>
> Thanks,
> Shawn
>
>
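For reference, a sketch of the kind of requests behind observations 2 and 3
above (host and collection names are hypothetical):

# per-shard timing breakdown included in the response
curl "http://host1:8983/solr/collection2/select?q=field:value&shards.info=true"

# restrict the query to a subset of shards
curl "http://host1:8983/solr/collection2/select?q=field:value&shards=shard1,shard2"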


Re: Results Order When Performing Wildcard Query

2013-04-09 Thread Shawn Heisey

On 4/9/2013 12:08 PM, P Williams wrote:

I wrote a test of my application which revealed a Solr oddity (I think).
  The test which I wrote on Windows 7 and makes use of the
solr-test-framework
fails
under Ubuntu 12.04 because the Solr results I expected for a wildcard query
of the test data are ordered differently under Ubuntu than Windows.  On
both Windows and Ubuntu all items in the result set have a score of 1.0 and
appear to be ordered by docid (which looks like it corresponds to the
alphabetical unique id on Windows but not on Ubuntu).  I'm guessing that the
root of my issue is that a different docid was assigned to the same
document on each operating system.


It might be due to differences in how Java works on the two platforms, 
or even something as simple as different Java versions.  I don't know a 
lot about the underlying Lucene stuff, so this next sentence may not be 
correct: If you are not starting from an index where the actual 
index directory was deleted before the test started (rather than 
deleting all documents), that might produce different internal Lucene 
document ids.



The data was imported using a DataImportHandler configuration during a
@BeforeClass step in my JUnit test on both systems.

Any suggestions on how to ensure a consistently ordered wildcard query
result set for testing?


Include an explicit sort parameter.  That way it will depend on the 
data, not the internal Lucene representation.
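For example (hypothetical core and field names; the uniqueKey field works
well as a deterministic tiebreaker):

curl "http://localhost:8983/solr/collection1/select?q=title:*&sort=id+asc"

An explicit sort makes the ordering independent of internal Lucene docids.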


Thanks,
Shawn



Re: Slow qTime for distributed search

2013-04-09 Thread Shawn Heisey

On 4/9/2013 2:10 PM, Manuel Le Normand wrote:

Thanks for replying.
My config:

- 40 dedicated servers, dual-core each
- Running Tomcat servlet on Linux
- 12 GB RAM per server, split evenly between OS and Solr
- Complex queries (up to 30 conditions on different fields), 1 qps rate

Sharding my index was done for two reasons, based on 2 servers (4shards)
tests:

1. As the index grew above a few million docs, qTime rose greatly, while
sharding the index into smaller pieces (about 0.5M docs) gave much better
results, so I bound every shard to have 0.5M docs.
2. Tests showed I was CPU-bound during queries. As I have a low qps rate
(note: lower than the expected qTime) and as a query runs single-threaded
on each shard, it made sense to dedicate a CPU to each shard.

For the same amount of docs per shards I do expect a raise in total qTime
for the reasons:

1. The response should wait for the slowest shard
2. Merging the responses from 40 different shards takes time

What i understand from your explanation is that it's the merging that takes
time and as qTime ends only after the second retrieval phase, the qTime on
each shard will take longer. Meaning during a significant proportion of the
first query phase (right after the [id,score] pairs are retrieved), all CPUs are
idle except the response-merger thread running on a single cpu. I thought
of the merge as a simple sorting of [id,score], way more simple than
additional 300 ms cpu time.

Why would a RAM increase improve my performance, as it's a
"response-merge" (CPU resource) bottleneck?


If you have not tweaked the Tomcat configuration, that can lead to 
problems, but if your total query volume is really only one query per 
second, this is probably not a worry for you.  A tomcat connector can be 
configured with a maxThreads parameter.  The recommended value there is 
10000, but Tomcat defaults to 200.


You didn't include the index sizes.  There's half a million docs per 
shard, but I don't know what that translates to in terms of MB or GB of 
disk space.


On another email thread you mention that your documents are about 50KB 
each.  That would translate to an index that's at least 25GB, possibly 
more.  That email thread also says that optimization for you takes an 
hour, further indications that you've got some really big indexes.


You're saying that you have given 6GB out of the 12GB to Solr, leaving 
only 6GB for the OS and caching.  Ideally you want to have enough RAM to 
cache the entire index, but in reality you can usually get away with 
caching between half and two thirds of the index.  Exactly what ratio 
works best is highly dependent on your schema.


If my numbers are even close to right, then you've got a lot more index 
on each server than available RAM.  Based on what I can deduce, you 
would want 24 to 48GB of RAM per server.  If my numbers are wrong, then 
this estimate is wrong.


I would be interested in seeing your queries.  If the complexity can be 
expressed as filter queries that get re-used a lot, the filter cache can 
be a major boost to performance.  Solr's caches in general can make a 
big difference.  There is no guarantee that caches will help, of course.
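As an illustration of that idea (a sketch with made-up field names, not the
poster's actual queries), conditions that repeat across requests can be moved
from q into fq clauses so the filter cache can serve them:

curl "http://localhost:8983/solr/select?q=text:energy&fq=type:article&fq=lang:en"

Each distinct fq string is cached as a bitset and reused by later queries.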


Thanks,
Shawn



Re: How can I set configuration options?

2013-04-09 Thread Edd Grant
Thanks for the replies. The problem I have is that setting them at the JVM
level would mean that all instances of Solr deployed in the Tomcat instance
are forced to use the same settings. I actually want to set the properties
at the application level (e.g. in solr.xml, zoo.conf or maybe an
application level Tomcat Context.xml file).

I'll grab the Solr source and see if there's any way to do this, unless
anyone knows how off the top of their head?

Cheers,

Edd


On 9 April 2013 19:21, Furkan KAMACI  wrote:

> Hi Edd;
>
> The parameters you mentioned are JVM parameters. There are two ways to
> define them.
> First one is if you are using an IDE you can indicate them as JVM
> parameters. i.e. if you are using Intellij IDEA when you click your
> Run/Debug configurations there is a line called VM Options. You can write
> your parameters there without putting the word "java" in front of them.
>
> Second one is deploying your war file into Tomcat without using an IDE (I
> think this is what you want). Here is what to do:
>
> Go to the Tomcat home folder and under the bin folder create a file called
> setenv.sh. Then add these lines:
>
> #!/bin/sh
> #
> #
> export JAVA_OPTS="$JAVA_OPTS
> -Dbootstrap_confdir=./solr/collection1/conf
> -Dcollection.configName=myconf -DzkRun
> -DzkHost=localhost:9983,localhost:8574,localhost:9900 -DnumShards=2"
>
>
>
> 2013/4/9 Edd Grant 
>
> > Hi all,
> >
> > I have been working through the examples on the SolrCloud page:
> > http://wiki.apache.org/solr/SolrCloud
> >
> > I am now at the point where, rather than firing up Solr through
> start.jar,
> > I'm deploying the Solr war in to Tomcat instances. Taking the following
> > command as an example:
> >
> > java -Dbootstrap_confdir=./solr/collection1/conf
> > -Dcollection.configName=myconf -DzkRun
> > -DzkHost=localhost:9983,localhost:8574,localhost:9900 -DnumShards=2
> > -jar start.jar
> >
> > I can't figure out from the documentation how/ where I set the above
> > properties when deploying Solr as a war file. I initially thought these
> > might be configurable through solr.xml but can't find anything in the
> > documentation to support this.
> >
> > Most grateful for any pointers here.
> >
> > Cheers,
> >
> > Edd
> > --
> > Web: http://www.eddgrant.com
> > Email: e...@eddgrant.com
> > Mobile: +44 (0) 7861 394 543
> >
>



-- 
Web: http://www.eddgrant.com
Email: e...@eddgrant.com
Mobile: +44 (0) 7861 394 543


Re: Results Order When Performing Wildcard Query

2013-04-09 Thread P Williams
Hey Shawn,

My gut says the difference in assignment of docids has to do with how the
FileListEntityProcessor works on the two operating systems. My guess is
that the documents are updated/imported in a different order, but I haven't
tested that theory. I still think it's kind of odd that there would be a
difference.

Indexes are created from scratch in my test, so it's not that. java -version
reports the same values on both machines:
java version "1.7.0_17"
Java(TM) SE Runtime Environment (build 1.7.0_17-b02)
Java HotSpot(TM) Client VM (build 23.7-b01, mixed mode)

The explicit (arbitrary non-score) sort parameter will work as a
work-around to get my test to pass in both environments while I think about
this some more. Thanks!

Cheers,
Tricia


On Tue, Apr 9, 2013 at 2:13 PM, Shawn Heisey  wrote:

> On 4/9/2013 12:08 PM, P Williams wrote:
>
>> I wrote a test of my application which revealed a Solr oddity (I think).
>>   The test which I wrote on Windows 7 and makes use of the
>> solr-test-framework fails
>> under Ubuntu 12.04 because the Solr results I expected for a wildcard
>> query
>> of the test data are ordered differently under Ubuntu than Windows.  On
>> both Windows and Ubuntu all items in the result set have a score of 1.0
>> and
>> appear to be ordered by docid (which looks like in corresponds to
>> alphabetical unique id on Windows but not Ubuntu).  I'm guessing that the
>> root of my issue is that a different docid was assigned to the same
>> document on each operating system.
>>
>
> It might be due to differences in how Java works on the two platforms, or
> even something as simple as different Java versions.  I don't know a lot
> about the underlying Lucene stuff, so this next sentence may not be
> correct: If you have are not starting from an index where the actual index
> directory was deleted before the test started (rather than deleting all
> documents), that might produce different internal Lucene document ids.
>
>
>  The data was imported using a DataImportHandler configuration during a
>> @BeforeClass step in my JUnit test on both systems.
>>
>> Any suggestions on how to ensure a consistently ordered wildcard query
>> result set for testing?
>>
>
> Include an explicit sort parameter.  That way it will depend on the data,
> not the internal Lucene representation.
>
> Thanks,
> Shawn
>
>


Re: Slow qTime for distributed search

2013-04-09 Thread Furkan KAMACI
Hi Shawn;

You say that:

*... your documents are about 50KB each.  That would translate to an index
that's at least 25GB*

I know we cannot say an exact size, but what is the approximate ratio of
document size to index size in your experience?


2013/4/9 Shawn Heisey 

> On 4/9/2013 2:10 PM, Manuel Le Normand wrote:
>
>> Thanks for replying.
>> My config:
>>
>> - 40 dedicated servers, dual-core each
>> - Running Tomcat servlet on Linux
>> - 12 GB RAM per server, split evenly between OS and Solr
>> - Complex queries (up to 30 conditions on different fields), 1 qps
>> rate
>>
>> Sharding my index was done for two reasons, based on 2 servers (4shards)
>> tests:
>>
>> 1. As index grew above few million of docs qTime raised greatly, while
>> sharding the index to smaller pieces (about 0.5M docs) gave way better
>> results, so I bound every shard to have 0.5M docs.
>> 2. Tests showed i was cpu-bounded during queries. As i have low qps
>> rate
>> (emphasize: lower than expected qTime) and as a query runs
>> single-threaded
>> on each shard, it made sense to accord a cpu to each shard.
>>
>> For the same amount of docs per shards I do expect a raise in total qTime
>> for the reasons:
>>
>> 1. The response should wait for the slowest shard
>> 2. Merging the responses from 40 different shards takes time
>>
>> What i understand from your explanation is that it's the merging that
>> takes
>> time and as qTime ends only after the second retrieval phase, the qTime on
>> each shard will take longer. Meaning during a significant proportion of
>> the
>> first query phase (right after the [id,score] pairs are retrieved), all CPUs are
>> idle except the response-merger thread running on a single cpu. I thought
>> of the merge as a simple sorting of [id,score], way more simple than
>> additional 300 ms cpu time.
>>
>> Why would a RAM increase improve my performances, as it's a
>> "response-merge" (CPU resource) bottleneck?
>>
>
> If you have not tweaked the Tomcat configuration, that can lead to
> problems, but if your total query volume is really only one query per
> second, this is probably not a worry for you.  A tomcat connector can be
> configured with a maxThreads parameter.  The recommended value there is
> 10000, but Tomcat defaults to 200.
>
> You didn't include the index sizes.  There's half a million docs per
> shard, but I don't know what that translates to in terms of MB or GB of
> disk space.
>
> On another email thread you mention that your documents are about 50KB
> each.  That would translate to an index that's at least 25GB, possibly
> more.  That email thread also says that optimization for you takes an hour,
> further indications that you've got some really big indexes.
>
> You're saying that you have given 6GB out of the 12GB to Solr, leaving
> only 6GB for the OS and caching.  Ideally you want to have enough RAM to
> cache the entire index, but in reality you can usually get away with
> caching between half and two thirds of the index.  Exactly what ratio works
> best is highly dependent on your schema.
>
> If my numbers are even close to right, then you've got a lot more index on
> each server than available RAM.  Based on what I can deduce, you would want
> 24 to 48GB of RAM per server.  If my numbers are wrong, then this estimate
> is wrong.
>
> I would be interested in seeing your queries.  If the complexity can be
> expressed as filter queries that get re-used a lot, the filter cache can be
> a major boost to performance.  Solr's caches in general can make a big
> difference.  There is no guarantee that caches will help, of course.
>
> Thanks,
> Shawn
>
>


Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-09 Thread Furkan KAMACI
Is there anybody who can help me guess the approximate RAM needed for
5000 queries/second on a Solr machine?


Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-09 Thread Jack Krupansky
It all depends on the nature of your query and the nature of the data in the 
index. Does returning results from a result cache count in your QPS? Not to 
mention how many cores and CPU speed and CPU caching as well. Not to mention 
network latency.


The best way to answer is to do a proof of concept implementation and 
measure it yourself.
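As a rough first pass at such a measurement (tool choice and query are
illustrative, not from this thread), something like Apache Bench can replay a
query at increasing concurrency levels:

ab -n 10000 -c 50 "http://localhost:8983/solr/collection1/select?q=text:energy"

Replaying a single query mostly measures the caches, though, so a realistic
test should use a large sample of real production queries.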


-- Jack Krupansky

-Original Message- 
From: Furkan KAMACI

Sent: Tuesday, April 09, 2013 6:06 PM
To: solr-user@lucene.apache.org
Subject: Approximately needed RAM for 5000 query/second at a Solr machine?

Are there anybody who can help me about how to guess the approximately
needed RAM for 5000 query/second at a Solr machine? 



Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-09 Thread Furkan KAMACI
Actually I will propose a system and I need to figure out the machine
specifications. There will be no faceting mechanism at first, just simple
search queries for a web search engine. We can assume that I will have a
commodity server (I don't know whether there is any benchmark for a usual
Solr machine).

2013/4/10 Jack Krupansky 

> It all depends on the nature of your query and the nature of the data in
> the index. Does returning results from a result cache count in your QPS?
> Not to mention how many cores and CPU speed and CPU caching as well. Not to
> mention network latency.
>
> The best way to answer is to do a proof of concept implementation and
> measure it yourself.
>
> -- Jack Krupansky
>
> -Original Message- From: Furkan KAMACI
> Sent: Tuesday, April 09, 2013 6:06 PM
> To: solr-user@lucene.apache.org
> Subject: Approximately needed RAM for 5000 query/second at a Solr machine?
>
>
> Are there anybody who can help me about how to guess the approximately
> needed RAM for 5000 query/second at a Solr machine?
>


Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-09 Thread Walter Underwood
On Apr 9, 2013, at 3:06 PM, Furkan KAMACI wrote:

> Are there anybody who can help me about how to guess the approximately
> needed RAM for 5000 query/second at a Solr machine?

No.

That depends on the kind of queries you have, the size and content of the 
index, the required response time, how frequently the index is updated, and 
many more factors. So anyone who can guess that is wrong.

You can only find that out by running your own benchmarks with your own queries 
against your own index.

In our system, we can meet our response time requirements at a rate of 4000 
queries/minute. We have several cores, but most traffic goes to a 3M document 
index. This index is small documents, mostly titles and authors of books. We 
have no wildcard queries and less than 5% of our queries use fuzzy matching. We 
update once per day and have cache hit rates of around 30%.

We run new benchmarks twice each year, before our busy seasons. We use the 
current index and configuration and the queries from the busiest day of the 
previous season.

Our key benchmark is the 95th percentile response time, but we also measure 
median, 90th, and 99th percentile.

We are currently on Solr 3.3 with some customizations. We're working on 
transitioning to Solr 4.

wunder
--
Walter Underwood
wun...@wunderwood.org





Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-09 Thread Furkan KAMACI
Hi Walter;

Firstly, thanks for your detailed reply. I know that this is not a very
detailed question, but I don't have any metrics yet. If we talk about your
system, what is the average RAM size of your Solr machines? Maybe that can
help me to make a comparison.

2013/4/10 Walter Underwood 

> On Apr 9, 2013, at 3:06 PM, Furkan KAMACI wrote:
>
> > Are there anybody who can help me about how to guess the approximately
> > needed RAM for 5000 query/second at a Solr machine?
>
> No.
>
> That depends on the kind of queries you have, the size and content of the
> index, the required response time, how frequently the index is updated, and
> many more factors. So anyone who can guess that is wrong.
>
> You can only find that out by running your own benchmarks with your own
> queries against your own index.
>
> In our system, we can meet our response time requirements at a rate of
> 4000 queries/minute. We have several cores, but most traffic goes to a 3M
> document index. This index is small documents, mostly titles and authors of
> books. We have no wildcard queries and less than 5% of our queries use
> fuzzy matching. We update once per day and have cache hit rates of around
> 30%.
>
> We run new benchmarks twice each year, before our busy seasons. We use the
> current index and configuration and the queries from the busiest day of the
> previous season.
>
> Our key benchmark is the 95th percentile response time, but we also
> measure median, 90th, and 99th percentile.
>
> We are currently on Solr 3.3 with some customizations. We're working on
> transitioning to Solr 4.
>
> wunder
> --
> Walter Underwood
> wun...@wunderwood.org
>
>
>
>


Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-09 Thread Walter Underwood
We are using Amazon EC2 M1 Extra Large instances (m1.xlarge).

http://aws.amazon.com/ec2/instance-types/

wunder

On Apr 9, 2013, at 3:35 PM, Furkan KAMACI wrote:

> Hi Walter;
> 
> Firstly thank for your detailed reply. I know that this is not a well
> detailed question but I don't have any metrics yet. If we talk about your
> system, what is the average RAM size of your Solr machines? Maybe that can
> help me to make a comparison.
> 
> 2013/4/10 Walter Underwood 
> 
>> On Apr 9, 2013, at 3:06 PM, Furkan KAMACI wrote:
>> 
>>> Are there anybody who can help me about how to guess the approximately
>>> needed RAM for 5000 query/second at a Solr machine?
>> 
>> No.
>> 
>> That depends on the kind of queries you have, the size and content of the
>> index, the required response time, how frequently the index is updated, and
>> many more factors. So anyone who can guess that is wrong.
>> 
>> You can only find that out by running your own benchmarks with your own
>> queries against your own index.
>> 
>> In our system, we can meet our response time requirements at a rate of
>> 4000 queries/minute. We have several cores, but most traffic goes to a 3M
>> document index. This index is small documents, mostly titles and authors of
>> books. We have no wildcard queries and less than 5% of our queries use
>> fuzzy matching. We update once per day and have cache hit rates of around
>> 30%.
>> 
>> We run new benchmarks twice each year, before our busy seasons. We use the
>> current index and configuration and the queries from the busiest day of the
>> previous season.
>> 
>> Our key benchmark is the 95th percentile response time, but we also
>> measure median, 90th, and 99th percentile.
>> 
>> We are currently on Solr 3.3 with some customizations. We're working on
>> transitioning to Solr 4.
>> 
>> wunder
>> --
>> Walter Underwood
>> wun...@wunderwood.org
>> 
>> 
>> 
>> 

--
Walter Underwood
wun...@wunderwood.org





Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-09 Thread Furkan KAMACI
Thanks for your answer.

2013/4/10 Walter Underwood 

> We are using Amazon EC2 M1 Extra Large instances (m1.xlarge).
>
> http://aws.amazon.com/ec2/instance-types/
>
> wunder
>
> On Apr 9, 2013, at 3:35 PM, Furkan KAMACI wrote:
>
> > Hi Walter;
> >
> > Firstly thank for your detailed reply. I know that this is not a well
> > detailed question but I don't have any metrics yet. If we talk about your
> > system, what is the average RAM size of your Solr machines? Maybe that
> can
> > help me to make a comparison.
> >
> > 2013/4/10 Walter Underwood 
> >
> >> On Apr 9, 2013, at 3:06 PM, Furkan KAMACI wrote:
> >>
> >>> Are there anybody who can help me about how to guess the approximately
> >>> needed RAM for 5000 query/second at a Solr machine?
> >>
> >> No.
> >>
> >> That depends on the kind of queries you have, the size and content of
> the
> >> index, the required response time, how frequently the index is updated,
> and
> >> many more factors. So anyone who can guess that is wrong.
> >>
> >> You can only find that out by running your own benchmarks with your own
> >> queries against your own index.
> >>
> >> In our system, we can meet our response time requirements at a rate of
> >> 4000 queries/minute. We have several cores, but most traffic goes to a
> 3M
> >> document index. This index is small documents, mostly titles and
> authors of
> >> books. We have no wildcard queries and less than 5% of our queries use
> >> fuzzy matching. We update once per day and have cache hit rates of
> around
> >> 30%.
> >>
> >> We run new benchmarks twice each year, before our busy seasons. We use
> the
> >> current index and configuration and the queries from the busiest day of
> the
> >> previous season.
> >>
> >> Our key benchmark is the 95th percentile response time, but we also
> >> measure median, 90th, and 99th percentile.
> >>
> >> We are currently on Solr 3.3 with some customizations. We're working on
> >> transitioning to Solr 4.
> >>
> >> wunder
> >> --
> >> Walter Underwood
> >> wun...@wunderwood.org
> >>
> >>
> >>
> >>
>
> --
> Walter Underwood
> wun...@wunderwood.org
>
>
>
>


Re: Pushing a whole set of pdf-files to solr

2013-04-09 Thread sdspieg
If anybody could still help me out with this, I'd really appreciate it.
Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Pushing-a-whole-set-of-pdf-files-to-solr-tp4025256p4054885.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Pushing a whole set of pdf-files to solr

2013-04-09 Thread Furkan KAMACI
Apache Solr 4 Cookbook says that:

curl "http://localhost:8983/solr/update/extract?literal.id=1&commit=true" \
  -F "myfile=@cookbook.pdf"

is that what you want?

2013/4/10 sdspieg 

> If anybody could still help me out with this, I'd really appreciate it.
> Thanks!
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Pushing-a-whole-set-of-pdf-files-to-solr-tp4025256p4054885.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Pushing a whole set of pdf-files to solr

2013-04-09 Thread Jack Krupansky
The newer release of SimplePostTool with Solr 4.x makes it easy to post PDF 
files from a directory, including automatically adding the file name to a 
field. But SolrCell is the direct API that it uses as well.
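A sketch of what that looks like (paths are illustrative; run
java -jar post.jar -help to see which options your post.jar version supports):

cd example/exampledocs
java -Dauto=yes -Dfiletypes=pdf -jar post.jar /path/to/pdfs/*.pdf

Auto mode detects the content type and routes rich documents to
/update/extract (SolrCell) instead of posting them as XML.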


-- Jack Krupansky
-Original Message- 
From: Furkan KAMACI

Sent: Tuesday, April 09, 2013 6:58 PM
To: solr-user@lucene.apache.org
Subject: Re: Pushing a whole set of pdf-files to solr

Apache Solr 4 Cookbook says that:

curl "http://localhost:8983/solr/update/extract?literal.id=1&commit=true" \
  -F "myfile=@cookbook.pdf"

is that what you want?

2013/4/10 sdspieg 


If anybody could still help me out with this, I'd really appreciate it.
Thanks!



--
View this message in context:
http://lucene.472066.n3.nabble.com/Pushing-a-whole-set-of-pdf-files-to-solr-tp4025256p4054885.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: Slow qTime for distributed search

2013-04-09 Thread Shawn Heisey

On 4/9/2013 3:50 PM, Furkan KAMACI wrote:

Hi Shawn;

You say that:

*... your documents are about 50KB each.  That would translate to an index
that's at least 25GB*

I know we can not say an exact size but what is the approximately ratio of
document size / index size according to your experiences?


If you store the fields, that is actual size plus a small amount of 
overhead.  Starting with Solr 4.1, stored fields are compressed.  I 
believe that it uses LZ4 compression.  Some people store all fields, 
some people store only a few or one - an ID field.  The size of stored 
fields does have an impact on how much OS disk cache you need, but not 
as much as the other parts of an index.


It's been my experience that termvectors take up almost as much space as 
stored data for the same fields, and sometimes more.  Starting with Solr 
4.2, termvectors are also compressed.


Adding docValues (new in 4.2) to the schema will also make the index 
larger.  The requirements here are similar to stored fields.  I do not 
know whether this data gets compressed, but I don't think it does.


As for the indexed data, this is where I am less clear about the storage 
ratios, but I think you can count on it needing almost as much space as 
the original data.  If the schema uses types or filters that produce a 
lot of information, the indexed data might be larger than the original 
input.  Examples of data explosions in a schema: trie fields with a 
non-zero precisionStep, the edgengram filter, the shingle filter.


Thanks,
Shawn



Re: How can I set configuration options?

2013-04-09 Thread Chris Hostetter
: Thanks for the replies. The problem I have is that setting them at the JVM
: level would mean that all instances of Solr deployed in the Tomcat instance
: are forced to use the same settings. I actually want to set the properties
: at the application level (e.g. in solr.xml, zoo.conf or maybe an
: application level Tomcat Context.xml file).

the thing to keep in mind is that most of the params you referred to are 
things you would not typically want in a deployed setup.  Others are 
just ways of specifying defaults that are substituted into configs...

: > > java -Dbootstrap_confdir=./solr/collection1/conf

you don't want this option for a normal setup; it's just for bootstrapping 
(hence it's only a system property).  In a production setup you would use 
the zookeeper tools to load the configs into your zk quorum.

https://wiki.apache.org/solr/SolrCloud#Config_Startup_Bootstrap_Params
...vs...
https://wiki.apache.org/solr/SolrCloud#Command_Line_Util
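A sketch of that with the zkcli script shipped under example/cloud-scripts
(paths and the ZooKeeper address list are illustrative):

cloud-scripts/zkcli.sh -zkhost localhost:9983,localhost:8574,localhost:9900 \
  -cmd upconfig -confdir ./solr/collection1/conf -confname myconf

Once the config set is in ZooKeeper, collections reference it by name and no
bootstrap properties are needed at startup.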

: > > -Dcollection.configName=myconf -DzkRun

ditto for collection.configName -- it's only for bootstrapping

zkRun is something you only use in trivial setups like the examples in the 
SolrCloud tutorial to run zookeeper embedded in Solr.  If you are running 
a production cluster where you want to be able to add/remove Solr nodes on 
the fly, then you are going to want to set up specific machines running 
standalone ZooKeeper.

: > > -DzkHost=localhost:9983,localhost:8574,localhost:9900 -DnumShards=2

zkHost can be specified in solr.xml (although I'm not sure why the 
example solr.xml doesn't include it; I'll update SOLR-4622 to address 
this), or it can be overridden by a system property.


-Hoss


Re: Field exist in schema.xml but returns

2013-04-09 Thread deniz
Raymond Wiker wrote
> You have misspelt the tag name in the field definition... you have "fiald"
> instead of "field".

thank you Raymond, it was really hard to spot in a massive schema
file



-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Field-exist-in-schema-xml-but-returns-tp4054634p4054903.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-09 Thread Shawn Heisey

On 4/9/2013 4:06 PM, Furkan KAMACI wrote:

Are there anybody who can help me about how to guess the approximately
needed RAM for 5000 query/second at a Solr machine?


You've already gotten some good replies, and I'm aware that they haven't 
really answered your question.  This is the kind of question that cannot 
be answered.


The amount of RAM that you'll need for extreme performance actually 
isn't hard to figure out - you need enough free RAM for the OS to cache 
the maximum amount of disk space all your indexes will ever use. 
Normally this will be twice the size of all the indexes on the machine, 
because that's how much disk space will likely be used in a worst-case 
merge scenario (optimize).  That's very expensive, so it is cheaper to 
budget for only the size of the index.


A load of 5000 queries per second is pretty high, and probably something 
you will not achieve with a single-server (not counting backup) 
approach.  All of the tricks that high-volume website developers use are 
also applicable to Solr.


Once you have enough RAM, you need to worry more about the number of 
servers, the number of CPU cores in each server, and the speed of those 
CPU cores.  Testing with actual production queries is the only way to 
find out what you really need.


Beyond hardware design, making the requests as simple as possible and 
taking advantage of caches is important.  Solr has caches for queries, 
filters, and documents.  You can also put a caching proxy (something 
like Varnish) in front of Solr, but that would make NRT updates pretty 
much impossible, and that kind of caching can be difficult to get 
working right.


Thanks,
Shawn



Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-09 Thread Furkan KAMACI
These are really good metrics for me:

You say that RAM size should be at least the index size, and that it is
better to have RAM twice the index size (because of the worst-case scenario).

On the other hand, let's assume that I have more RAM than twice the size of
the indexes on the machine. Can Solr use that extra RAM, or is twice the
index size an approximate upper limit?


2013/4/10 Shawn Heisey 

> On 4/9/2013 4:06 PM, Furkan KAMACI wrote:
>
>> Are there anybody who can help me about how to guess the approximately
>> needed RAM for 5000 query/second at a Solr machine?
>>
>
> You've already gotten some good replies, and I'm aware that they haven't
> really answered your question.  This is the kind of question that cannot be
> answered.
>
> The amount of RAM that you'll need for extreme performance actually isn't
> hard to figure out - you need enough free RAM for the OS to cache the
> maximum amount of disk space all your indexes will ever use. Normally this
> will be twice the size of all the indexes on the machine, because that's
> how much disk space will likely be used in a worst-case merge scenario
> (optimize).  That's very expensive, so it is cheaper to budget for only the
> size of the index.
>
> A load of 5000 queries per second is pretty high, and probably something
> you will not achieve with a single-server (not counting backup) approach.
>  All of the tricks that high-volume website developers use are also
> applicable to Solr.
>
> Once you have enough RAM, you need to worry more about the number of
> servers, the number of CPU cores in each server, and the speed of those CPU
> cores.  Testing with actual production queries is the only way to find out
> what you really need.
>
> Beyond hardware design, making the requests as simple as possible and
> taking advantage of caches is important.  Solr has caches for queries,
> filters, and documents.  You can also put a caching proxy (something like
> Varnish) in front of Solr, but that would make NRT updates pretty much
> impossible, and that kind of caching can be difficult to get working right.
>
> Thanks,
> Shawn
>
>


Re: Results Order When Performing Wildcard Query

2013-04-09 Thread Chris Hostetter

: My gut says the difference in assignment of docids has to do with how the
: FileListEntityProcessor works

docids just represent the order documents are added to the index.  If you
use DIH with FileListEntityProcessor to create one doc per file, then the
order of the documents will (if I remember correctly) correspond to the
order of the files returned by the OS, which may vary.

even if the files are ordered consistently by modification date: 1) the
modification date of these files on your machines might be different; 2) the
granularity of file modification dates supported by the filesystem or file
IO layer in the JVM on each machine might be different -- causing two
files to appear to have identical mod times on one machine, but different
mod times on the other machine.


-Hoss


Re: Pushing a whole set of pdf-files to solr

2013-04-09 Thread sdspieg
Thanks for those replies. I will look into them. But if anyone knows of a
site that describes step by step how a windows user who has already
installed solr (and tomcat) can easily feed a folder (and subfolders) with
100s of pdfs into solr, or would be willing to write down down those steps,
I would really appreciate the reference. And I bet you there are lots of
people like me... 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Pushing-a-whole-set-of-pdf-files-to-solr-tp4025256p4054915.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Pushing a whole set of pdf-files to solr

2013-04-09 Thread sdspieg
I am able to run the "java -jar post.jar -help" command which I found here:
http://docs.lucidworks.com/display/solr/Running+Solr. But now how can I tell
post to post all pdf files in a certain folder (preferably recursively) to a
collection? Could anybody please post the exact command for that? 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Pushing-a-whole-set-of-pdf-files-to-solr-tp4025256p4054916.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.2.1 SSLInitializationException

2013-04-09 Thread Sarita Nair
Hi Chris,

Thanks for your response.

My understanding is that GlassFish specifies the keystore as a system property, 
but does not specify the password  in order to protect it from 
snooping. There's
a keychain that requires a password to be passed from the DAS in order to 
unlock the key for the keystore. 

Is there some way to specify a 
different HttpClient implementation (e.g. DefaultHttpClient rather than 
SystemDefaultHttpClient), as we don't want the application to have 
access to the keystore?


I have also pasted the entire stack trace below:

2013-04-09 10:45:06,144 [main] ERROR org.apache.solr.servlet.SolrDispatchFilter 
- Could not start Solr. Check solr/home property and the logs
    2013-04-09 10:45:06,224 [main] ERROR org.apache.solr.core.SolrCore - 
null:org.apache.http.conn.ssl.SSLInitializationException: Failure initializing 
default system SSL context
    at 
org.apache.http.conn.ssl.SSLSocketFactory.createSystemSSLContext(SSLSocketFactory.java:368)
    at 
org.apache.http.conn.ssl.SSLSocketFactory.getSystemSocketFactory(SSLSocketFactory.java:204)
    at 
org.apache.http.impl.conn.SchemeRegistryFactory.createSystemDefault(SchemeRegistryFactory.java:82)
    at 
org.apache.http.impl.client.SystemDefaultHttpClient.createClientConnectionManager(SystemDefaultHttpClient.java:118)
    at 
org.apache.http.impl.client.AbstractHttpClient.getConnectionManager(AbstractHttpClient.java:466)
    at 
org.apache.solr.client.solrj.impl.HttpClientUtil.setMaxConnections(HttpClientUtil.java:179)
    at 
org.apache.solr.client.solrj.impl.HttpClientConfigurer.configure(HttpClientConfigurer.java:33)
    at 
org.apache.solr.client.solrj.impl.HttpClientUtil.configureClient(HttpClientUtil.java:115)
    at 
org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:105)
    at 
org.apache.solr.handler.component.HttpShardHandlerFactory.init(HttpShardHandlerFactory.java:134)
    at 
com.sun.enterprise.glassfish.bootstrap.GlassFishImpl.start(GlassFishImpl.java:79)
    at 
com.sun.enterprise.glassfish.bootstrap.GlassFishDecorator.start(GlassFishDecorator.java:63)
    at 
com.sun.enterprise.glassfish.bootstrap.osgi.OSGiGlassFishImpl.start(OSGiGlassFishImpl.java:69)
    at 
com.sun.enterprise.glassfish.bootstrap.GlassFishMain$Launcher.launch(GlassFishMain.java:117)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at 
com.sun.enterprise.glassfish.bootstrap.GlassFishMain.main(GlassFishMain.java:97)
    at com.sun.enterprise.glassfish.bootstrap.ASMain.main(ASMain.java:55)
Caused by: java.io.IOException: Keystore was tampered with, or password was 
incorrect
  at sun.security.provider.JavaKeyStore.engineLoad(JavaKeyStore.java:772)
    at sun.security.provider.JavaKeyStore$JKS.engineLoad(JavaKeyStore.java:55)
    at java.security.KeyStore.load(KeyStore.java:1214)
    at 
org.apache.http.conn.ssl.SSLSocketFactory.createSystemSSLContext(SSLSocketFactory.java:281)
    at 
org.apache.http.conn.ssl.SSLSocketFactory.createSystemSSLContext(SSLSocketFactory.java:366)
 
    ... 50 more
Caused by: java.security.UnrecoverableKeyException: Password verification failed
    at sun.security.provider.JavaKeyStore.engineLoad(JavaKeyStore.java:770)
    ... 54 more






 







 From: Chris Hostetter 
To: "solr-user@lucene.apache.org" ; Sarita Nair 
 
Sent: Tuesday, April 9, 2013 1:31 PM
Subject: Re: Solr 4.2.1 SSLInitializationException
 

: Deploying Solr 4.2.1 to GlassFish 3.1.1 results in the error below.  I 
: have seen similar problems being reported with Solr 4.2

Are you trying to use server SSL with glassfish?

can you please post the full stack trace so we can see where this error is 
coming from.

My best guess is that this is coming from the changes made in 
SOLR-4451 to use system defaults correctly when initializing HttpClient, 
which suggests that your problem is exactly what the error message says...

  "Keystore was tampered with, or password was incorrect"

Is it possible that the default keystore for your JVM (or as 
overridden by glassfish defaults - possibly using the 
"javax.net.ssl.keyStore" sysprop) has a password set on it?  If so, you 
need to configure your JVM with the standard java system properties to 
specify what that password is.

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201303.mbox/%3c1364232676233-4051159.p...@n3.nabble.com%3E

:     2013-04-09 10:45:06,144 [main] ERROR 
: org.apache.solr.servlet.SolrDispatchFilter - Could not start Solr. Check 
solr/home property and the logs
:     2013-04-09 10:45:06,224 [main] ERROR 
: org.apache.solr.core.SolrCore - 
: null:org.apache.http.conn.ssl.SSLInitializationException: Failure 
: initializing default system SSL context
:     Caus
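For reference, the standard JVM properties Hoss mentions are set like this
(paths and passwords are placeholders, not values from this thread):

export JAVA_OPTS="$JAVA_OPTS \
  -Djavax.net.ssl.keyStore=/path/to/keystore.jks \
  -Djavax.net.ssl.keyStorePassword=changeit \
  -Djavax.net.ssl.trustStore=/path/to/truststore.jks \
  -Djavax.net.ssl.trustStorePassword=changeit"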

Re: Pushing a whole set of pdf-files to solr

2013-04-09 Thread Gora Mohanty
On 10 April 2013 07:28, sdspieg  wrote:
> I am able to run the "java -jar post.jar -help" command which I found here:
> http://docs.lucidworks.com/display/solr/Running+Solr. But now how can I tell
> post to post all pdf files in a certain folder (preferably recursively) to a
> collection? Could anybody please post the exact command for that?
[...]

There are two options:
* I am not familiar with Microsoft Windows, but writing some kind of batch
  script that recurses down a directory and posts files to Solr should be
  easy (see the sketch after this list).
* One could use the Solr DataImportHandler with FileDataSource to handle
   the filesystem traversal, and TikaEntityProcessor to handle the indexing of
   rich content. Please see:
   http://wiki.apache.org/solr/DataImportHandler
   http://wiki.apache.org/solr/TikaEntityProcessor
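
A minimal sketch of the first option as a POSIX shell loop (URL and paths are
illustrative; a Windows batch equivalent would use for /R; filenames containing
commas or semicolons would need extra escaping for curl's -F parsing):

i=0
find /path/to/pdfs -name '*.pdf' | while read -r f; do
  i=$((i+1))
  # send each PDF to the extracting request handler with a synthetic id
  curl "http://localhost:8983/solr/update/extract?literal.id=doc$i&commit=false" \
    -F "myfile=@$f"
done
curl "http://localhost:8983/solr/update?commit=true"

Committing once at the end avoids paying the commit cost per file.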

Regards,
Gora


Re: Pushing a whole set of pdf-files to solr

2013-04-09 Thread sdspieg
Another progress report. I 'flattened' all the folders which contained the
pdf files with Fileboss and then moved the pdf files to the directory where
I found the post.jar file (in solr-4.2.1\solr-4.2.1\example\exampledocs). I
then ran "java -Ddata=files -jar post.jar *.pdf" and in the command window
it seemed to be working fine (these are just academic articles in pdf-format
that I downloaded with Zotero from EBSCO):
04/10/2013  12:20 AM           159,224 Vorontsov - 2012 - The Korea- Russia Gas Pipeline Project Past, Pres.pdf
04/10/2013  12:12 AM         3,885,056 Walker - 2012 - Asia competes for energy security.pdf
04/10/2013  12:45 AM            66,195 Whitmill - 2012 - Is UK Energy Policy Driving Energy Innovation - or.pdf
04/10/2013  12:29 AM         2,208,367 Wietfeld - 2011 - Understanding Middle East Gas Exporting Behavior.pdf
04/10/2013  12:59 AM         3,011,185 Wiseman - 2011 - Expanding Regional Renewable Governance.pdf
04/10/2013  12:38 AM           180,692 Woudhuysen - 2012 - Innovation in Energy Expressions of a Crisis, and.pdf
04/10/2013  12:49 AM           229,991 Yergin - 2012 - How Is Energy Remaking the World.pdf
04/10/2013  12:40 AM         3,397,328 Young - 2012 - Industrial Gases. (cover story).pdf
04/10/2013  01:36 AM            73,125 Zimmerer - 2011 - New Geographies of Energy Introduction to the Spe.pdf
... and so on, all together some 300 articles.

But then when I looked in solr, I saw the following:
04:34:41
SEVERE
SolrCore
org.apache.solr.common.SolrException: Invalid UTF-8 middle byte 0xe3 (at
char #10, byte #-1)
04:34:41
SEVERE
SolrCore
org.apache.solr.common.SolrException: Invalid UTF-8 middle byte 0xe3 (at
char #10, byte #-1)

... and a lot more of those.

I'd like to think I made SOME progress, but it also seems like I'm still not
close to being there. Any suggestions from the experts here on what I am
doing wrong? 

Thanks!

-Stephan



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Pushing-a-whole-set-of-pdf-files-to-solr-tp4025256p4054920.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-09 Thread Shawn Heisey
On 4/9/2013 7:03 PM, Furkan KAMACI wrote:
> These are really good metrics for me:
> 
> You say that RAM size should be at least index size, and it is better to
> have a RAM size twice the index size (because of worst case scenario).
> 
> On the other hand let's assume that I have a RAM size that is bigger than
> twice of indexes at machine. Can Solr use that extra RAM or is it a
> approximately maximum limit (to have twice size of indexes at machine)?

What we have been discussing is the OS cache, which is memory that is
not used by programs.  The OS uses that memory to make everything run
faster.  The OS will instantly give that memory up if a program requests it.

Solr is a java program, and java uses memory a little differently, so
Solr most likely will NOT use more memory when it is available.

In a "normal" directly executable program, memory can be allocated at
any time, and given back to the system at any time.

With Java, you tell it the maximum amount of memory the program is ever
allowed to use.  Because of how memory is used inside Java, most
long-running Java programs (like Solr) will allocate up to the
configured maximum even if they don't really need that much memory.
Most Java virtual machines will never give the memory back to the system
even if it is not required.
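
(That configured maximum is the standard -Xmx flag, e.g. "java -Xmx2g -jar
start.jar" for the bundled Jetty example; the 2g value here is only an
illustration.)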

Thanks,
Shawn



Re: Pushing a whole set of pdf-files to solr

2013-04-09 Thread Gora Mohanty
On 10 April 2013 08:11, sdspieg  wrote:
> Another progress report. I 'flattened' all the folders which contained the
> pdf files with Fileboss and then moved the pdf files to the directory where
> I found the post.jar file (in solr-4.2.1\solr-4.2.1\example\exampledocs). I
> then ran "java -Ddata=files -jar post.jar *.pdf" and in the command window
> it seemed to be working fine (these are just academic articles in pdf-format
> that I downloaded with Zotero from EBSCO):
[...]

If it works, great, but it is not generally advisable to have a large number
of files under one directory. However, that is not the source of your error
here.
> But then when I looked in solr, I saw the following:
> 04:34:41
> SEVERE
> SolrCore
> org.apache.solr.common.SolrException: Invalid UTF-8 middle byte 0xe3 (at
> char #10, byte #-1)
[...]

Your files seem to have some encoding other than UTF-8: My random
guess would be Windows-1252. You need to convert the files to UTF-8.
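
If that guess is right, a minimal Java re-encoding sketch would be the
following (verify the source encoding first; windows-1252 here is only the
guess above):

import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReEncode {
    public static void main(String[] args) throws Exception {
        Path in = Paths.get(args[0]);
        // Decode assuming Windows-1252, then write the same text out as UTF-8.
        String text = new String(Files.readAllBytes(in), Charset.forName("windows-1252"));
        Files.write(Paths.get(args[0] + ".utf8"), text.getBytes("UTF-8"));
    }
}

One caveat: this applies to plain-text files only. PDFs are binary, so if the
error comes from posting the PDFs themselves, the fix is to send them with the
correct content type (e.g. to the extracting handler) rather than to re-encode
them.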

Regards,
Gora


Re: Pushing a whole set of pdf-files to solr

2013-04-09 Thread Jack Krupansky
The newer SimplePostTool can in fact recurse a directory of PDFs. Just get 
the usage for the tool. I'm sure it lists the command options.
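
From memory - please verify against "java -jar post.jar -help", since the
options vary by release - the 4.x invocation is along the lines of:

java -Dauto=yes -Drecursive=yes -jar post.jar C:\path\to\pdfs

where -Dauto makes the tool guess the content type from each file's extension.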


-- Jack Krupansky

-Original Message- 
From: sdspieg

Sent: Tuesday, April 09, 2013 9:48 PM
To: solr-user@lucene.apache.org
Subject: Re: Pushing a whole set of pdf-files to solr

Thanks for those replies. I will look into them. But if anyone knows of a
site that describes step by step how a Windows user who has already
installed Solr (and Tomcat) can easily feed a folder (and subfolders) with
100s of PDFs into Solr, or would be willing to write down those steps,
I would really appreciate the reference. And I bet you there are lots of
people like me...



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Pushing-a-whole-set-of-pdf-files-to-solr-tp4025256p4054915.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: edismax returns very less matches than regular

2013-04-09 Thread Erick Erickson
Adding &debugQuery=true is your friend. I suspect that you'll find
your first query is actually searching
name:coldfusion OR defaultsearchfield:cache, and you _think_ it's
searching for both coldfusion and cache in the name field.
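
For example (illustrative queries): q=name:(coldfusion cache) applies the name
field to both terms, while q=name:coldfusion cache parses as name:coldfusion
plus cache against the default search field - often a very different result
set.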

Best
Erick

On Mon, Apr 8, 2013 at 2:50 AM, amit  wrote:
> I have a simple system. I put the title of webpages into the "name" field and
> content of the web pages into the "Description" field.
> I want to search both fields and give the name a little more boost.
> A search on name field or description field returns records close to
> hundreds.
>
> http://localhost:8983/solr/select/?q=name:%28coldfusion^2%20cache^1%29&fq=author:[*%20TO%20*]%20AND%20-author:chinmoyp&start=0&rows=10&fl=author,score,%20id
>
> But search on both fields using boost just gives 5 matches.
>
> http://localhost:8983/solr/mindfire/?q=%28%20coldfusion^2%20cache^1%29*&defType=edismax&qf=name^1.5%20description^1.0*&fq=author:[*%20TO%20*]%20AND%20-author:chinmoyp&start=0&rows=10&fl=author,score,%20id
>
> I am wondering what is wrong, because there are valid results returned by
> the 1st query that are ignored by edismax. I am on Solr 3.6.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/edismax-returns-very-less-matches-than-regular-tp4054442.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-09 Thread Furkan KAMACI
I am sorry, but you said:

*you need enough free RAM for the OS to cache the maximum amount of disk
space all your indexes will ever use*

Let me make an assumption about the indexes on my machine: say they total
5 GB. So it is better to have at least 5 GB of RAM? OK, Solr will use RAM
up to whatever I define for its Java process. When you talk about the OS
caching the on-disk indexes in RAM, do you mean having more than 5 GB - or
10 GB - of RAM for my machine?

2013/4/10 Shawn Heisey 

> On 4/9/2013 7:03 PM, Furkan KAMACI wrote:
> > These are really good metrics for me:
> >
> > You say that RAM size should be at least index size, and it is better to
> > have a RAM size twice the index size (because of worst case scenario).
> >
> > On the other hand let's assume that I have a RAM size that is bigger than
> > twice of indexes at machine. Can Solr use that extra RAM or is it a
> > approximately maximum limit (to have twice size of indexes at machine)?
>
> What we have been discussing is the OS cache, which is memory that is
> not used by programs.  The OS uses that memory to make everything run
> faster.  The OS will instantly give that memory up if a program requests
> it.
>
> Solr is a java program, and java uses memory a little differently, so
> Solr most likely will NOT use more memory when it is available.
>
> In a "normal" directly executable program, memory can be allocated at
> any time, and given back to the system at any time.
>
> With Java, you tell it the maximum amount of memory the program is ever
> allowed to use.  Because of how memory is used inside Java, most
> long-running Java programs (like Solr) will allocate up to the
> configured maximum even if they don't really need that much memory.
> Most Java virtual machines will never give the memory back to the system
> even if it is not required.
>
> Thanks,
> Shawn
>
>


Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-09 Thread Shawn Heisey
On 4/9/2013 9:12 PM, Furkan KAMACI wrote:
> I am sorry, but you said:
> 
> *you need enough free RAM for the OS to cache the maximum amount of disk
> space all your indexes will ever use*
> 
> Let me make an assumption about the indexes on my machine: say they total
> 5 GB. So it is better to have at least 5 GB of RAM? OK, Solr will use RAM
> up to whatever I define for its Java process. When you talk about the OS
> caching the on-disk indexes in RAM, do you mean having more than 5 GB - or
> 10 GB - of RAM for my machine?

If your index is 5GB, and you give 3GB of RAM to the Solr JVM, then you
would want at least 8GB of total RAM for that machine - the 3GB of RAM
given to Solr, plus the rest so the OS can cache the index in RAM.  If
you plan for double the cache memory, you'd need 13 to 14GB.

Thanks,
Shawn



RE: Solr index Backup and restore of large indexs

2013-04-09 Thread Sandeep Kumar Anumalla
Any update on this, please?

-Original Message-
From: Sandeep Kumar Anumalla
Sent: 31 March, 2013 12:08 PM
To: solr-user@lucene.apache.org
Cc: 'Joel Bernstein'
Subject: RE: Solr index Backup and restore of large indexs

Hi,

I am exploring all the possible options.

We want to distribute 1 TB of traffic among 3 Solr shards (masters) and 3 
corresponding Solr slaves.

Initially I used a master/slave setup. But in this case the traffic rate on the 
master is very high, and because of this we are facing the below issue while 
replicating to the slave.

-
SnapPull failed
SEVERE: SnapPull failed :org.apache.solr.common.SolrException: Unable to 
download _xv0_Lucene41_0.doc completely. Downloaded 0!=5935


In this case the slave machine also has to have the same hardware and software 
configuration as the master; this seems to be more expensive.

-

Then I decided to run multiple Solr instances on a single machine, access them 
using "EmbeddedSolrServer", and query all of these instances to get the 
required result.
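
For reference, a minimal EmbeddedSolrServer bootstrap is sketched below. Treat
it as a sketch only: the CoreContainer API changed across 4.x releases, so
check it against your exact version, and the solr home path and core name are
placeholders.

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.core.CoreContainer;

public class EmbeddedExample {
    public static void main(String[] args) throws Exception {
        // Point at a Solr home directory containing solr.xml and the core dirs.
        System.setProperty("solr.solr.home", "/path/to/solr/home");
        CoreContainer container = new CoreContainer.Initializer().initialize();
        SolrServer server = new EmbeddedSolrServer(container, "core1");
        System.out.println("ping status: " + server.ping().getStatus());
        container.shutdown();
    }
}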

In this case there is no need for a slave machine; we just need to take a 
backup, which we can store on any external hard disk.

There are 2 issues I am facing here:

1. Loading is not as fast as it is with the database.
2. How do I take an incremental backup? I don't want to take a full backup 
every time.

-

Thanks
Sandeep A

-Original Message-
From: Joel Bernstein [mailto:joels...@gmail.com]
Sent: 28 March, 2013 04:51 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr index Backup and restore of large indexs

Hi,

Are you running Solr Cloud or Master/Slave? I'm assuming with 1TB a day you're 
sharding.

With master/slave you can configure incremental index replication to another 
core. The backup core can be local on the server, on a separate sever or in a 
separate data center.

With Solr Cloud, replicas can be set up to automatically keep redundant copies of 
the index. These copies, though, are live copies and will handle queries. 
Replicating data to a separate data center is typically not done through Solr 
Cloud replication.

Joel


On Mon, Mar 25, 2013 at 11:43 PM, Otis Gospodnetic < 
otis.gospodne...@gmail.com> wrote:

> Hi,
>
> Try something like this: http://host/solr/replication?command=backup
>
> See: http://wiki.apache.org/solr/SolrReplication
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
>
>
>
>
>
> On Thu, Mar 21, 2013 at 3:23 AM, Sandeep Kumar Anumalla
>  wrote:
> >
> > Hi,
> >
> > We are loading daily 1TB (Apprx) of index data .Please let me know
> > the
> best procedure to take Backup and restore of the indexes. I am using
> Solr 4.2.
> >
> >
> >
> > Thanks & Regards
> > Sandeep A
> > Ext : 02618-2856
> > M : 0502493820
> >
>



--
Joel Bernstein
Professional Services LucidWorks



Re: query regarding the use of boost across the fields in edismax query

2013-04-09 Thread Rohan Thakur
Hi Otis,

Can you explain that in some more depth? For example, if I search for
"led" in both cases, what would the difference in the results be?

Thanks in advance.
Regards,
Rohan


On Tue, Apr 9, 2013 at 11:25 PM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Not sure if I'm missing something, but in the first case the features, cat,
> and color fields have more weight, so matches on them will have a bigger
> contribution to the overall relevancy score.
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
>
>
>
>
>
> On Tue, Apr 9, 2013 at 1:52 PM, Rohan Thakur  wrote:
> > hi all
> >
> > wanted to know what could be the difference between the results if I
> > apply boost across, say, 5 fields in a query, like:
> >
> > first: title^10.0 features^7.0 cat^5.0 color^3.0 root^1.0 and
> > second settings like : title^10.0 features^5.0 cat^3.0 color^2.0 root^1.0
> >
> > what could be the difference, given that the weights are in the same
> > decreasing order?
> >
> > thanks in advance
> >
> > regards
> > Rohan
>
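
To make the point concrete with made-up numbers: suppose that for the query
"led", document A matches only in features with a raw field score of 1.0, and
document B matches only in color with a raw score of 2.4. Under the first
weighting A scores 1.0 x 7.0 = 7.0 and B scores 2.4 x 3.0 = 7.2, so B ranks
first; under the second, A scores 1.0 x 5.0 = 5.0 and B scores 2.4 x 2.0 = 4.8,
so A ranks first. What matters is less the absolute boosts than their ratios
(7/3 vs 5/2 here), which is exactly where the two settings differ.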


RE: Solr 4.2 Incremental backups

2013-04-09 Thread Sandeep Kumar Anumalla
HI Erick,

My main point is that if I use replication, I have to use a similar kind of 
setup (hardware, storage space) as the master, which is not cost effective. 
That is why I am looking at incremental backup options, so that I can keep 
these backups anywhere, e.g. on external hard disks or tapes.

Moreover, when using replication we are facing the below issue while 
replicating to the slave.

-
SnapPull failed
SEVERE: SnapPull failed :org.apache.solr.common.SolrException: Unable to 
download _xv0_Lucene41_0.doc completely. Downloaded 0!=5935


Thanks


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 25 March, 2013 07:11 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 4.2 Incremental backups

That's essentially what replication does; it only copies the parts of the index 
that have changed. However, when segments merge, that might mean the entire 
index needs to be replicated.
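
If the goal is a point-in-time copy that can be shipped to tape or an external
disk rather than a live slave, the replication handler's backup command (see
the SolrReplication wiki page linked earlier in this thread) may fit better,
along these lines - host, port and value are placeholders:

http://master:8983/solr/replication?command=backup&numberToKeep=2

where numberToKeep caps how many old snapshots are retained.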

Best
Erick


On Sun, Mar 24, 2013 at 12:08 AM, Sandeep Kumar Anumalla < 
sanuma...@etisalat.ae> wrote:

> Hi,
>
> Is there any option to do Incremental backups in Solr 4.2?
>
> Thanks & Regards
> Sandeep A
> Ext : 02618-2856
> M : 0502493820
>
>



Re: Solr 4.2.1 SSLInitializationException

2013-04-09 Thread Uwe Klosa
You have to add two new Java options to your Glassfish config (example if
you use the standard keystore and truststore):

asadmin create-jvm-options -- -Djavax.net.ssl.keyStorePassword=changeit
asadmin create-jvm-options -- -Djavax.net.ssl.trustStorePassword=changeit

/Uwe


On 10 April 2013 03:59, Sarita Nair  wrote:

> Hi Chris,
>
> Thanks for your response.
>
> My understanding is that GlassFish specifies the keystore as a system
> property,
> but does not specify the password  in order to protect it from
> snooping. There's
> a keychain that requires a password to be passed from the DAS in order to
> unlock the key for the keystore.
>
> Is there some way to specify a
> different HttpClient implementation (e.g. DefaultHttpClient rather than
> SystemDefaultHttpClient), as we don't want the application to have
> access to the keystore?
>
>
> I have also pasted the entire stack trace below:
>
> 2013-04-09 10:45:06,144 [main] ERROR
> org.apache.solr.servlet.SolrDispatchFilter - Could not start Solr. Check
> solr/home property and the logs
> 2013-04-09 10:45:06,224 [main] ERROR org.apache.solr.core.SolrCore -
> null:org.apache.http.conn.ssl.SSLInitializationException: Failure
> initializing default system SSL context
> at
> org.apache.http.conn.ssl.SSLSocketFactory.createSystemSSLContext(SSLSocketFactory.java:368)
> at
> org.apache.http.conn.ssl.SSLSocketFactory.getSystemSocketFactory(SSLSocketFactory.java:204)
> at
> org.apache.http.impl.conn.SchemeRegistryFactory.createSystemDefault(SchemeRegistryFactory.java:82)
> at
> org.apache.http.impl.client.SystemDefaultHttpClient.createClientConnectionManager(SystemDefaultHttpClient.java:118)
> at
> org.apache.http.impl.client.AbstractHttpClient.getConnectionManager(AbstractHttpClient.java:466)
> at
> org.apache.solr.client.solrj.impl.HttpClientUtil.setMaxConnections(HttpClientUtil.java:179)
> at
> org.apache.solr.client.solrj.impl.HttpClientConfigurer.configure(HttpClientConfigurer.java:33)
> at
> org.apache.solr.client.solrj.impl.HttpClientUtil.configureClient(HttpClientUtil.java:115)
> at
> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:105)
> at
> org.apache.solr.handler.component.HttpShardHandlerFactory.init(HttpShardHandlerFactory.java:134)
> at
> com.sun.enterprise.glassfish.bootstrap.GlassFishImpl.start(GlassFishImpl.java:79)
> at
> com.sun.enterprise.glassfish.bootstrap.GlassFishDecorator.start(GlassFishDecorator.java:63)
> at
> com.sun.enterprise.glassfish.bootstrap.osgi.OSGiGlassFishImpl.start(OSGiGlassFishImpl.java:69)
> at
> com.sun.enterprise.glassfish.bootstrap.GlassFishMain$Launcher.launch(GlassFishMain.java:117)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at
> com.sun.enterprise.glassfish.bootstrap.GlassFishMain.main(GlassFishMain.java:97)
> at com.sun.enterprise.glassfish.bootstrap.ASMain.main(ASMain.java:55)
> Caused by: java.io.IOException: Keystore was tampered with, or password
> was incorrect
>   at sun.security.provider.JavaKeyStore.engineLoad(JavaKeyStore.java:772)
> at
> sun.security.provider.JavaKeyStore$JKS.engineLoad(JavaKeyStore.java:55)
> at java.security.KeyStore.load(KeyStore.java:1214)
> at
> org.apache.http.conn.ssl.SSLSocketFactory.createSystemSSLContext(SSLSocketFactory.java:281)
> at
> org.apache.http.conn.ssl.SSLSocketFactory.createSystemSSLContext(SSLSocketFactory.java:366)
> ... 50 more
> Caused by: java.security.UnrecoverableKeyException: Password verification
> failed
> at sun.security.provider.JavaKeyStore.engineLoad(JavaKeyStore.java:770)
> ... 54 more
>
> 
>  From: Chris Hostetter 
> To: "solr-user@lucene.apache.org" ; Sarita
> Nair 
> Sent: Tuesday, April 9, 2013 1:31 PM
> Subject: Re: Solr 4.2.1 SSLInitializationException
>
>
> : Deploying Solr 4.2.1 to GlassFish 3.1.1 results in the error below.  I
> : have seen similar problems being reported with Solr 4.2
>
> Are you trying to use server SSL with glassfish?
>
> can you please post the full stack trace so we can see where this error is
> coming from.
>
> My best guess is that this is coming from the changes made in
> SOLR-4451 to use system defaults correctly when initializing HttpClient,
> which suggests that your problem is exactly what the error message says...
>
>   "Keystore was tampered with, or password was incorrect"
>
> Is it possible that the default keystore password for your JVM (or as
> overridden by glassfish defaults - possibly using the
> "javax.net.ssl.keyStore" sysprop) has a password set on it?  If so you
> need to configure your JVM with the standard Java system properties to
> specify what that pas