Is there any C API for Solr??

2013-01-02 Thread Romita Saha
Hi All,

Is there any C API for Solr??

Thanks and regards,
Romita 

Re: Is there any C API for Solr??

2013-01-02 Thread Rafał Kuć
Hello!

Although it is C++ rather than C, there is a project aiming at this:
http://code.google.com/p/solcpp/

However, I don't know how usable it is; you can also just make plain HTTP
calls and process the response.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

> Hi All,

> Is there any C API for Solr??

> Thanks and regards,
> Romita



Re: Solr 4.0 NRT Search

2013-01-02 Thread Per Steffensen

On 1/1/13 2:07 PM, hupadhyay wrote:

I was reading a solr wiki located at
http://wiki.apache.org/solr/NearRealtimeSearch

It says all commitWithin are now soft commits.

Can anyone explain what it means?
A soft commit means that the documents indexed before the soft commit
become searchable, but are not necessarily persisted and flushed to disk (so
you might lose data that has only been soft-committed, not hard-committed,
in case of a crash).
A hard commit means that the documents indexed before the hard commit
become searchable and are persisted and flushed to disk.

Does it mean commitWithin will not cause a hard commit?

Yes


Moreover, that wiki itself is insufficient as far as the NRT feature goes.

Can anyone list the config steps to enable NRT in Solr 4.0?
In your solrconfig.xml (sub-section "updateHandler") make sure that
"autoSoftCommit" and/or "autoCommit" (hard commit) are not commented
out, and that you have considered the values of maxDocs/maxTime.
http://wiki.apache.org/solr/SolrConfigXml?#Update_Handler_Section
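
To make the distinction concrete, here is a small, untested SolrJ sketch (the
URL, document values and commitWithin interval are only examples, not anything
from this thread):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CommitExample {
  public static void main(String[] args) throws Exception {
    // Example URL only; point this at your own core.
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-1");

    // commitWithin 10s: in Solr 4.0 this triggers a soft commit, so the document
    // becomes searchable but is not guaranteed to be flushed to disk yet.
    server.add(doc, 10000);

    // Explicit hard commit: also makes the document durable on disk.
    server.commit();
  }
}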


Thanks








RE: Solr 4.0 NRT Search

2013-01-02 Thread hupadhyay
Thanks for this valuable explanation.

It was very helpful.

Best Regards

Hardik Upadhyay







Re: Compound Terms query parser

2013-01-02 Thread Mikhail Khludnev
Arcadius,

It can be easily achieved by extending RequestHandlerBase and implementing
straightforward looping through other request handlers via
solrCore.getRequestHandler(name).handleRequest(req,resp).

I have no spare time to contribute it - it's about #10 on my TODO list.
I'm replying to the mailing list; please use it to follow up further.
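
A rough, untested sketch of that chaining idea (the "handlers" init parameter,
the handler names, and the empty-result check via ResultContext are my own
assumptions, not an existing Solr feature):

import java.util.List;

import org.apache.solr.common.util.NamedList;
import org.apache.solr.handler.RequestHandlerBase;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.ResultContext;
import org.apache.solr.response.SolrQueryResponse;

public class CascadingSearchHandler extends RequestHandlerBase {
  private List<String> handlerNames;

  @Override
  @SuppressWarnings("unchecked")
  public void init(NamedList args) {
    super.init(args);
    // Assumed config: <arr name="handlers"><str>/select-title</str><str>/select-summary</str></arr>
    handlerNames = (List<String>) args.get("handlers");
  }

  @Override
  public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
    for (String name : handlerNames) {
      SolrQueryResponse sub = new SolrQueryResponse();
      // The chaining described above: delegate to an already configured handler.
      req.getCore().getRequestHandler(name).handleRequest(req, sub);
      rsp.setAllValues(sub.getValues()); // keep the latest handler's output
      Object r = sub.getValues().get("response");
      if (r instanceof ResultContext && ((ResultContext) r).docs.matches() > 0) {
        break; // stop at the first handler that finds something
      }
    }
  }

  // Depending on the Solr version, other SolrInfoMBean methods may also need overriding.
  @Override
  public String getDescription() { return "Sketch: try handlers in order until one returns hits"; }

  @Override
  public String getSource() { return ""; }
}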

Good luck.



On Sat, Dec 29, 2012 at 7:01 PM, Arcadius Ahouansou wrote:

>
> Good morning Mikhail.
>
> I hope you had a nice Christmas.
>
> I came across your excellent presentation at:
>
>
> http://archive.apachecon.com/eu2012/presentations/07-Wednesday/L1R-Lucene/aceu-2012-compound-terms-query-parser-for-great-shopping-experience.pdf
>
>
>
> We are doing movie search but the situation is very similar to the one you
> were talking about.
>
> People can search for movies by title, summary, actors, director etc in
> ine single search field.
>
> For instance when people search for "Die Hard", the current eDismax will
> return results where for instance the title contains "Die"  and the summary
> contains "Hard".
>
> We want to avoid searching across multiple fields in one go and instead,
> search for the full keyword in the title first then when nothing is found
> ,search in the summary... etc
>
> The flowchart shown on your slide 52 (Captain Obvious to the rescue) may
> help us achieve this.
>
> I would like to ask whether you could share a bit more detail about how
> this was implemented.
>
> Thank you very much.
>
>
> Arcadius.
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


Re: Max number of core in Solr multi-core

2013-01-02 Thread Erick Erickson
This is a common approach to this problem: having separate
cores keeps the apps from influencing each other when it comes
to term frequencies etc. It also keeps the chances of returning
the wrong data to a minimum.

As to how many cores can fit, "it depends" (tm). There's lots of
work going on right now, see: http://wiki.apache.org/solr/LotsOfCores.

But having all those cores does allow you to expand your system
pretty easily if you do run over the limit your hardware can handle:
just move the entire core to a new machine. Only testing will tell
you where that limit is.

Best
Erick


On Wed, Jan 2, 2013 at 7:18 AM, Parvin Gasimzade  wrote:

> Hi all,
>
> We have a system that enables users to create applications and store data
> in their applications. We want to separate the index of each application. We
> create a core for each application and search on the given application when
> a user makes a query. Since there isn't any relation between the applications,
> this solution could perform better than storing all indexes together.
>
> I have two questions related to this.
> 1. Is this a good solution? If not could you please suggest any better
> solution?
> 2. Is there a limit on the number of cores that I can create in Solr? There
> will be thousands, maybe more, applications on the system.
>
> P.S. This question is also asked in the
> stackoverflow<
> http://stackoverflow.com/questions/14121624/max-number-of-core-in-solr-multi-core
> >
> .
>
> Thanks,
> Parvin
>


index copy omits documents

2013-01-02 Thread UnConundrum
I replicate from a live server to a backup server.  That backup server is
also used for development, so every night by cron, and sometimes manually, I
execute the following script to update a development core on the backup
server:


date
echo "stopping mysql slave"
mysqladmin -u intranet -ppassword stop-slave
echo "stopping solr replication"
curl
http://my_live_server.com:8983/solr/production/admin/replication?command=disablereplication/
echo "copying and uploading mysql tables"
mysqldump -u intranet -ppassword intranet_beta | mysql -u intranet
-ppassword intranet_beta_dev
echo copying production solr index to development
rm -rf /solr/test_site/development/data/*
cp -rf  /solr/test_site/production/data/* /solr/test_site/development/data/
echo "restarting solr replication"
curl
http://my_live_server.com:8983/solr/production/admin/replication?command=enablereplication/
echo "restarting mysql replication/slave"
mysqladmin -u intranet -ppassword start-slave
echo "finished"
date


I noticed last night that many documents are being omitted.  On the
production core, numDocs is 2476185, while on the development core numDocs
is 2357785, about 5% less.  The live server reports 2476192 docs (diff is
due to delay on my part).  

Does anyone have any idea why a copy of the directory doesn't result in the
same number of documents?
(FYI, this is affecting my development as queries are missing documents).

Thanks in advance.






Re: index copy omits documents

2013-01-02 Thread Jack Krupansky
Do you have any soft commits (or commitWithin, which is a soft commit) that
show up in queries but haven't yet been committed to disk? You have to do a
hard commit to flush those to disk.
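
If that turns out to be the cause, a minimal sketch of the fix, assuming the
core whose data directory gets copied is reachable at the URL below (the URL
is just a placeholder):

import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class FlushBeforeCopy {
  public static void main(String[] args) throws Exception {
    // Assumed URL: the core whose data directory is about to be copied.
    HttpSolrServer core = new HttpSolrServer("http://localhost:8983/solr/production");
    // Hard commit: flushes soft-committed documents to disk so the copied
    // index directory contains everything that is visible to queries.
    core.commit();
  }
}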


-- Jack Krupansky




CPU spikes on trunk

2013-01-02 Thread Markus Jelsma
Hi,

We have two clusters running on similar machines equipped with SSDs. One runs
a 6-month-old trunk checkout and another always has a very recent checkout.
Both sometimes receive a few documents to index. The old cluster actually
processes queries.

We've seen performance differences before; the idle new cluster is always
slower to respond than the old one. Top and other monitoring tools show frequent
CPU spikes even when nothing is going on, and CPU usage increases when a proxy
starts to admin/ping them.

Is anyone familiar with this observation? Did I miss something?

Thanks,
Markus


Re: CPU spikes on trunk

2013-01-02 Thread Alan Woodward
Hi Markus,

How recent a checkout from trunk are you running? I added a bunch of
statistics-recording code a few months back which we had to back out over
Christmas because it was causing memory leaks.

Alan Woodward
a...@flax.co.uk


On 2 Jan 2013, at 15:15, Markus Jelsma wrote:

> Hi,
> 
> We have two clusters running on similar machines equipped with SSD's. One 
> runs a 6 month old trunk check out and another always has a very recent check 
> out. Both sometimes receive a few documents to index. The old cluster 
> actually processes queries.
> 
> We've seen performance differences before, the idle new cluster is always 
> more slow to respond than the old one. Top and other monitoring tools show 
> frequent CPU-spikes even when nothing is going on, CPU usage increases when a 
> proxy starts to admin/ping them.
> 
> Is anyone familiar with this observation? Did i miss something?
> 
> Thanks,
> Markus



Re: Max number of core in Solr multi-core

2013-01-02 Thread Per Steffensen
Furthermore, if you plan to index "a lot" of data per application, and 
you are using Solr 4.0.0+ (including Solr Cloud), you probably want to 
consider creating a collection per application instead of a core per 
application.







RE: CPU spikes on trunk

2013-01-02 Thread Markus Jelsma
Hi Alan,

I noticed that issue, but I'm using today's checkout.

Thanks,
Markus

 
 
-Original message-
> From:Alan Woodward 
> Sent: Wed 02-Jan-2013 16:30
> To: solr-user@lucene.apache.org
> Subject: Re: CPU spikes on trunk
> 
> Hi Markus,
> 
> How recent a check-out from trunk are you running?  I added a bunch of 
> statistics recording a few months back which we had to back out over 
> Christmas because it was causing memory leaks.
> 
> Alan Woodward
> a...@flax.co.uk
> 
> 
> On 2 Jan 2013, at 15:15, Markus Jelsma wrote:
> 
> > Hi,
> > 
> > We have two clusters running on similar machines equipped with SSD's. One 
> > runs a 6 month old trunk check out and another always has a very recent 
> > check out. Both sometimes receive a few documents to index. The old cluster 
> > actually processes queries.
> > 
> > We've seen performance differences before, the idle new cluster is always 
> > more slow to respond than the old one. Top and other monitoring tools show 
> > frequent CPU-spikes even when nothing is going on, CPU usage increases when 
> > a proxy starts to admin/ping them.
> > 
> > Is anyone familiar with this observation? Did i miss something?
> > 
> > Thanks,
> > Markus
> 
> 


Re: index copy omits documents

2013-01-02 Thread UnConundrum
Jack Krupansky-2 wrote
> Do you have any soft commits (or commitWithin, which is a soft commit) that
> show up in queries but haven't yet been committed to disk?

I don't think so, especially since the Live production core and replicated
production core are only 7 documents apart. I haven't used the copied
development core at all, nor do I ever use the replicated production core
since it serves as a backup to the live production core and I want to keep
it pristine.





Upgrading from 3.6 to 4.0

2013-01-02 Thread Benjamin, Roy
Will the existing 3.6 indexes work with 4.0 binary ?

Will 3.6 solrJ clients work with 4.0 servers ?


Thanks
Roy


Re: Upgrading from 3.6 to 4.0

2013-01-02 Thread Lance Norskog
Indexes will not work. I have not heard of an index upgrader. If you run 
your 3.6 and new 4.0 Solr at the same time, you can upload all the data 
with a DataImportHandler script using the SolrEntityProcessor.


How large are your indexes? 4.1 indexes will not match 4.0, so you will 
have to upload everything twice. You might want to wait, or use a build 
from the 4.x trunk.


SolrJ client apps should work with 4.0.

On 01/02/2013 10:04 AM, Benjamin, Roy wrote:

Will the existing 3.6 indexes work with 4.0 binary ?

Will 3.6 solrJ clients work with 4.0 servers ?


Thanks
Roy




Re: Upgrading from 3.6 to 4.0

2013-01-02 Thread Upayavira
Indexes will not work???

I've upgraded a 3.6 index. I used the new solrconfig from 4.0, and had
to do some hacking to my schema (e.g. add a _version_ field) to make it
work, but once I'd done that, all was fine.

As I understand it, any Lucene instance can understand the format of an
index from one major version below. Thus, my 4.0 system could read the
index of my 3.6 cores, but it couldn't read the one core that hadn't been
updated since 1.4 (and refused to load that core).

Optimising will cause your index to be rewritten in the new format, as
will waiting for background merges to happen. That's what I understand,
anyway.

Upayavira

On Wed, Jan 2, 2013, at 06:10 PM, Lance Norskog wrote:
> Indexes will not work. I have not heard of an index upgrader. If you run 
> your 3.6 and new 4.0 Solr at the same time, you can upload all the data 
> with a DataImportHandler script using the SolrEntityProcessor.
> 
> How large are your indexes? 4.1 indexes will not match 4.0, so you will 
> have to upload everything twice. You might want to wait, or use a build 
> from the 4.x trunk.
> 
> SolrJ client apps should work with 4.0.
> 
> On 01/02/2013 10:04 AM, Benjamin, Roy wrote:
> > Will the existing 3.6 indexes work with 4.0 binary ?
> >
> > Will 3.6 solrJ clients work with 4.0 servers ?
> >
> >
> > Thanks
> > Roy
> 


Re: Upgrading from 3.6 to 4.0

2013-01-02 Thread Tomás Fernández Löbbe
AFAIK Solr 4 should be able to read Solr 3.6 indexes. Soon those files will
be updated to 4.0 format and will not be readable by Solr 3.6 anymore. See
http://wiki.apache.org/lucene-java/BackwardsCompatibility
You should not use a 3.6 SolrJ client with a Solr 4 server.

Tomás


On Wed, Jan 2, 2013 at 3:04 PM, Benjamin, Roy  wrote:

> Will the existing 3.6 indexes work with 4.0 binary ?
>
> Will 3.6 solrJ clients work with 4.0 servers ?
>
>
> Thanks
> Roy
>


RE: Upgrading from 3.6 to 4.0

2013-01-02 Thread Benjamin, Roy
Thanks all,

>> You should not use a a 3.6 SolrJ client with Solr 4 server.

I run 100 shards and 20 clients. If the above is correct, then the entire
system must be shut down for many hours for an upgrade...

Thanks
Roy

-Original Message-
From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com] 
Sent: Wednesday, January 02, 2013 10:23 AM
To: solr-user@lucene.apache.org
Subject: Re: Upgrading from 3.6 to 4.0

AFAIK Solr 4 should be able to read Solr 3.6 indexes. Soon those files will be 
updated to 4.0 format and will not be readable by Solr 3.6 anymore. See 
http://wiki.apache.org/lucene-java/BackwardsCompatibility
You should not use a a 3.6 SolrJ client with Solr 4 server.

Tomás


On Wed, Jan 2, 2013 at 3:04 PM, Benjamin, Roy  wrote:

> Will the existing 3.6 indexes work with 4.0 binary ?
>
> Will 3.6 solrJ clients work with 4.0 servers ?
>
>
> Thanks
> Roy
>


Re: Upgrading from 3.6 to 4.0

2013-01-02 Thread Shawn Heisey

On 1/2/2013 11:34 AM, Benjamin, Roy wrote:

Thanks all,


You should not use a a 3.6 SolrJ client with Solr 4 server.

I run 100 shards and 20 clients. If above is correct then the entire
system must be shut down for many hours for an upgrade...


Using SolrJ 3.6 against a Solr 4 server will *probably* work, as long as 
you are not attempting to use SolrCloud, and your Solr 4 config is 
otherwise completely compatible with the older client.  If you just make 
the minimum adjustments required to get your 3.6 config to work in 4, 
rather than refactoring it to conform to the latest best practices, then 
you might be OK.  No guarantees can be made of course.  The best thing 
you can do is test everything on development hardware.


Additional note: If you are already using the newer server objects in 
SolrJ 3.6 (HttpSolrServer in most cases) you might be able to drop the 
v4 SolrJ jar into your 3.6 SolrJ app and have it continue to work.  You 
could give that a try before you even upgrade your server side.


Thanks,
Shawn



Best practices for Solr highlighter for CJK

2013-01-02 Thread Tom Burton-West
Hello all,

What are the best practices for setting up the highlighter to work with CJK?
We are using the ICUTokenizer with the CJKBigramFilter, so overlapping
bigrams are what are actually being searched. However, the highlighter seems
to only highlight the first of any two overlapping bigrams, i.e. ABC is
searched as AB BC, but only AB gets highlighted even if the matching string is
ABC. (Where ABC are Chinese characters such as 大亚湾, searched as 大亚 亚湾,
but only 大亚 is highlighted rather than 大亚湾.)

Is there some highlighting parameter that might fix this?

Tom Burton-West


Re: Upgrading from 3.6 to 4.0

2013-01-02 Thread Shawn Heisey

On 1/2/2013 11:48 AM, Shawn Heisey wrote:
Additional note: If you are already using the newer server objects in 
SolrJ 3.6 (HttpSolrServer in most cases) you might be able to drop the 
v4 SolrJ jar into your 3.6 SolrJ app and have it continue to work.  
You could give that a try before you even upgrade your server side.


That statement was incomplete - there are slightly different dependent 
jars for the updated SolrJ, so you'd have to update those too.


Thanks,
Shawn



Re: Best practices for Solr highlighter for CJK

2013-01-02 Thread Walter Underwood
Speaking from experience: if you are using bigrams for CJK, do not highlight. 
The results will look very wrong to someone who knows the language.

Even with a dictionary-based tokenizer, you'll need a client dictionary for 
local terms.

wunder

On Jan 2, 2013, at 10:51 AM, Tom Burton-West wrote:

> Hello all,
> 
> What are the best practices for setting up the highlighter to work with CJK?
> We are using the ICUTokenizer with the CJKBigramFilter, so overlapping
> bigrams are what are actually being searched. However the highlighter seems
> to only highlight the first of any two overlapping bigrams.   i.e.  ABC =>
> searched as AB BC  only AB gets highlighted even if the matching string is
> ABC. (Where ABC are chinese characters such as 大亚湾  => searched as 大亚 亚湾,
> but only   大亚 is highlighted rather than 大亚湾)
> 
> Is there some highlighting parameter that might fix this?
> 
> Tom Burton-West






RE: Override wt parameter

2013-01-02 Thread Manepalli, Kalyan
Any other ideas to resolve this issue would be really helpful.

Thanks
Kalyan

Thanks,
Kalyan Manepalli

-Original Message-
From: Manepalli, Kalyan [mailto:kalyan.manepa...@orbitz.com] 
Sent: Friday, December 28, 2012 9:30 PM
To: solr-user@lucene.apache.org
Subject: Re: Override wt parameter

I tried using invariants wt=xml, but it doesn't work.
Has anyone tried playing around with SolrCore.java changes?

On 12/28/12 7:17 PM, "Shawn Heisey"  wrote:

>On 12/28/2012 6:07 PM, Stefan Matheis wrote:
>> Kalyan
>>
>> I didn't test that .. but perhaps it may work out for you -- 
>>specifying "invariants" (possible per SearchHandler) like it's shown in the 
>>wiki:
>>http://wiki.apache.org/solr/SearchHandler#Configuration ?
>
>Stefan, your reply hadn't arrived before I sent my second reply.  I 
>suspect that if you make wt=xml an invariant option, SolrJ will still 
>send wt=javabin and when it gets the response, it will be expecting 
>javabin, not xml, which will result in an exception.  I could be 
>completely wrong; SolrJ may be smart enough, but I suspect that it's not.
>
>Thanks,
>Shawn
>
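
One possible angle, sketched below and untested (the URL is a placeholder):
keep wt=xml as the server-side invariant and switch the SolrJ client's
response parser to XML so it no longer expects javabin:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;
import org.apache.solr.client.solrj.response.QueryResponse;

public class XmlClientExample {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
    // SolrJ defaults to wt=javabin; with an XML parser it requests and parses XML,
    // so a server-side wt=xml invariant no longer clashes with the client.
    server.setParser(new XMLResponseParser());
    QueryResponse rsp = server.query(new SolrQuery("*:*"));
    System.out.println("Found " + rsp.getResults().getNumFound() + " docs");
  }
}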



Re: indexing cpu utilization

2013-01-02 Thread Uwe Reh

Hi,

While trying to optimize our indexing workflow I reached the same point
that gabriel shen described in his mail: my Solr server won't
utilize more than 40% of the computing power.
I made some tests, but I'm not able to find the bottleneck. Could
anybody help solve this?


At first let me describe the environment:

Server:
- Two socket Opteron (interlagos) => 32 cores
- 64Gb Ram (1600Mhz)
- SATA Disks: spindle and ssd
- Solaris 5.11
- JRE 1.7.0
- Solr 4.0
- ApplicationServer Jetty
- 1Gb network interface

Client:
- same hardware as the server
- either multi threaded solrj client using multiple instances of 
HttpSolrServer
- or multi threaded solrj client using a ConcurrentUpdateSolrServer with 
100 threads


Problem:
- 10,000,000 docs of bibliographic data (~4k each)
- with a simplified schema definition it takes 10 hours to index <=> 
~250docs/second

- with the real schema.xml it takes 50 hours to index <=> ~50 docs/second
In both cases the client takes just 2% of the CPU resources and the
server 35%. It's obvious that there is some optimization potential in
the schema definition, but why does the server never use more than 40% of
the CPU power?



Discarded possible bottlenecks:
- Ram for the JVM
Solr takes only up to 12G of heap and there is just a negligible gc 
activity. So the increase from 16G to 32G of possible heap made no 
difference.

- Bandwidth of the net
The transmitted data is identical in both cases. The size of the 
transmitted data is somewhat below 50G. Since both machines have a 
dedicated 1G line to the switch, the raw transmission should not take 
much more than 10 minutes

- Performance of the client
As above, the client is fast enough for the simplified case (10h). A
dry run (just preprocessing, not indexing) finishes after 75 minutes.

- Server's disk IO
The size of the simpler index is ~100G, the size of the other is ~150G.
That is a factor of 1.5, not 5. The difference between an SSD and a spinning
disk is not noticeable. The output of 'iostat' and 'zpool iostat' is
unsuspicious.

- Bad thread distribution
'mpstat' shows a well distributed load over all cpus and a sensible 
amount of crosscalls (less than ten/cpu)

- Solr update parameters (solrconfig.xml)
Inspired by
http://www.hathitrust.org/blogs/large-scale-search/forty-days-and-forty-nights-re-indexing-7-million-books-part-1 I'm using:

256
40
1024
native
true

Any changes on this Parameters made it worse.

To get an idea of what's going on, I've done some statistics with VisualVM
(see attachment).
The distribution of real and CPU time looks significant, but I'm not
sure how to interpret the results.
The method
org.apache.lucene.index.ThreadAffinityDocumentsWriterThreadPool.getAndLock()
is active 80% of the time but takes only 1% of the CPU time. On the
other hand, the method
org.apache.commons.codec.language.bm.PhoneticEngine$PhonemeBuilder.append()
is active 12% of the time and is always running on a CPU.


So again the question: when there are free resources in all dimensions,
why does Solr not utilize more than 40% of the computing power?

Bandwidth of the RAM?? I can't believe that. How would I verify it?
???

Any hints are welcome.
Uwe








Re: Spatial filter in solr 4.0 - "Intersects" operation with parameters

2013-01-02 Thread David Smiley (@MITRE.org)
Mladen,
  FYI I just committed this to 4.x: 
https://issues.apache.org/jira/browse/SOLR-4230
~ David


mladen micevic wrote
> Hi,
> I went through example for spatial search in Solr 4.0
> (http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4)
> Both indexing and searching work fine.
> 
> Example is: fq=geo:"Intersects(-74.093 41.042 -69.347 44.558)" 
> 
> My problem is how to send values to "Intersects" operation as parameters.
> If I would like to send custom parameters in the URL: 
> ...&lon1=-74.093&lat1=41.042&lon2=-69.347&lat2=44.558
> and have default filter query:
>   fq=geo:"Intersects($lon1 $lat1 $lon2 $lat2)
> I tried this approach - but it did not work.
> 
> How do I do this?
> 
> Using {!bbox} is not documented in 4.0 wiki.
> Anyways, I tried to use it against "geo" field but got following error:
>field does not support spatial filtering ...
> Can I use {!bbox}  in 4.0 ?
> 
> 
> Thanks.
> Mladen





-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book


Re: indexing cpu utilization (attachement)

2013-01-02 Thread Uwe Reh

On 02.01.2013 22:39, Uwe Reh wrote:

To get an idea whats going on, I've done some statistics with visualvm.
(see attachement)


"merde" the listserver stripes attachments.
You'll find the screen shot at 
>http://fantasio.rz.uni-frankfurt.de/solrtest/HotSpot.gif


uwe



Re: index copy omits documents

2013-01-02 Thread UnConundrum
Well, after hours of fighting with this, I decided to turn on replication
between the production and development cores, use curl commands to
disable replication to the development core, and only run replication after
the script updates the development DB (doing a sleep 5m), then disable
replication again. It seems to be working.





Solr 4.0 filter start-up error in Eclipse

2013-01-02 Thread Pradeep Pujari
Hi,
I took the Solr 4.0 code from lucene_solr_branch_4x and set it up in Eclipse. I am
using a Tomcat 7 server in Eclipse. I am getting a start-up error, although Solr
comes up correctly. I do not see this error at Solr 3.6 start-up.



INFO: Starting service Catalina
Jan 2, 2013 12:17:02 PM org.apache.catalina.core.StandardEngine startInternal
INFO: Starting Servlet Engine: Apache Tomcat/7.0.26
Jan 2, 2013 12:17:03 PM org.apache.catalina.core.StandardContext filterStart
SEVERE: Exception starting filter SolrRequestFilter
java.lang.ClassNotFoundException: org.apache.solr.servlet.SolrDispatchFilter
at 
org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1701)
at 
org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1546)
at 
org.apache.catalina.core.DefaultInstanceManager.loadClass(DefaultInstanceManager.java:525)
at 
org.apache.catalina.core.DefaultInstanceManager.loadClassMaybePrivileged(DefaultInstanceManager.java:507)
at 
org.apache.catalina.core.DefaultInstanceManager.newInstance(DefaultInstanceManager.java:124)
at 
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:256)
at 
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:382)
at 
org.apache.catalina.core.ApplicationFilterConfig.&lt;init&gt;(ApplicationFilterConfig.java:103)
at 
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4638)
at 
org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5294)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at 
org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1566)
at 
org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1556)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Jan 2, 2013 12:17:03 PM org.apache.catalina.core.StandardContext startInternal
SEVERE: Error filterStart
Jan 2, 2013 12:17:03 PM org.apache.catalina.core.StandardContext startInternal
SEVERE: Context [] startup failed due to previous errors
Jan 2, 2013 12:17:03 PM org.apache.catalina.loader.WebappClassLoader 
validateJarFile
I


Re: Where is ISOLatin1AccentFilterFactory (Solr4)?

2013-01-02 Thread Uwe Reh

Hi,

I like the best of both worlds:

 

 Mask some specials like "C++" to "cplusplus" or "C#" to "csharp" ...

 

 Tokenize and identify tokens on Unicode whitespaces and charsets

 

 Well known splitter for composed words

 

 Perfect superset of 
 or the ISOLatin1AccentFilterFactory because it can handle composed and 
decomposed accents and umlauts

 
 Nice workaround for missing whitespace as word separator in these
languages.



On 01.01.2013 17:48, Jack Krupansky wrote:

Hmmm... quite some time ago I switched from ASCIIFoldingFilterFactory
to MappingCharFilterFactory, because I was told (by who I can't recall)
that the latter was "better/preferred". Is there any particular reason
to favor one over the other?

-Original Message- From: Erick Erickson
ASCIIFoldingFilterFactory is preferred, does that suit your needs?




Solr Cloud with external zookeeper ensemble not starting

2013-01-02 Thread davers
When I try to start a solr server in my solr cloud I am receiving the error:

SEVERE: null:org.apache.solr.common.SolrException:
org.apache.zookeeper.server.quorum.QuorumPeerConfig$ConfigException: Error
processing /srv/solr//zoo.cfg

But I don't understand why I would need zoo.cfg in solr/home when I am
running an external ensemble.

I am starting my tomcat servers with the following:

-DzkRun -DzkHost=zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181

If I am specifying my ensemble then why is solr looking for a zoo.cfg in
solr/home ?





Re: Solr 4.0 SolrCloud with AWS Auto Scaling

2013-01-02 Thread Mark Miller

On Jan 2, 2013, at 5:51 PM, Bill Au  wrote:

> Is anyone running Solr 4.0 SolrCloud with AWS auto scaling?
> 
> My concern is that as AWS auto scaling add and remove instances to
> SolrCloud, the number of nodes in SolrCloud Zookeeper config will grow
> indefinitely as removed instances will never be used again.  AWS auto
> scaling will keep on adding new instances, and there is no way to remove
> them from Zookeeper, right?

You can unload them and that removes them.
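
A sketch of what that might look like with SolrJ's CoreAdminRequest (the host
and core name are made up, and this is untested):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class UnloadExample {
  public static void main(String[] args) throws Exception {
    // Assumed URL/core name: the node being scaled down and its replica core.
    HttpSolrServer node = new HttpSolrServer("http://retired-node:8983/solr");
    // Unloading the core removes it from the node (and, per the above, from the cluster state).
    CoreAdminRequest.unloadCore("collection1", node);
  }
}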

>  What's the effect of have all these phantom
> nodes?

Unless they are only replicas, they would need to be removed.

Also, unless you are using elastic ips, 
https://issues.apache.org/jira/browse/SOLR-4078 may be of interest.

- Mark

Re: Solr Cloud with external zookeeper ensemble not starting

2013-01-02 Thread Mark Miller
Don't use zkRun, just zkHost. zkRun is for running the internal ZooKeeper.

- Mark

On Jan 2, 2013, at 6:44 PM, davers  wrote:

> When I try to start a solr server in my solr cloud I am receiving the error:
> 
> SEVERE: null:org.apache.solr.common.SolrException:
> org.apache.zookeeper.server.quorum.QuorumPeerConfig$ConfigException: Error
> processing /srv/solr//zoo.cfg
> 
> But I don't understand why I would need zoo.cfg in solr/home when I am
> running an external ensemble.
> 
> I am starting my tomcat servers with the following:
> 
> -DzkRun -DzkHost=zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181
> 
> If I am specifying my ensemble then why is solr looking for a zoo.cfg in
> solr/home ?
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-Cloud-with-external-zookeeper-ensemble-not-starting-tp4030160.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: CPU spikes on trunk

2013-01-02 Thread Mark Miller
Any chance you can hook up to a node with something like visual vm and sample 
some method calls or something?

- Mark

On Jan 2, 2013, at 10:15 AM, Markus Jelsma  wrote:

> Hi,
> 
> We have two clusters running on similar machines equipped with SSD's. One 
> runs a 6 month old trunk check out and another always has a very recent check 
> out. Both sometimes receive a few documents to index. The old cluster 
> actually processes queries.
> 
> We've seen performance differences before, the idle new cluster is always 
> more slow to respond than the old one. Top and other monitoring tools show 
> frequent CPU-spikes even when nothing is going on, CPU usage increases when a 
> proxy starts to admin/ping them.
> 
> Is anyone familiar with this observation? Did i miss something?
> 
> Thanks,
> Markus



Re: Exception on getMBeanInfo

2013-01-02 Thread Mark Miller
Solr 4.0?

I think there is a JIRA and a fix for this in 4.1.

- Mark

On Dec 28, 2012, at 7:20 AM, Marcin Rzewucki  wrote:

> Hi,
> 
> I found in logs that sometimes the following error (more lines at the end
> of this mail) occurs on Solr startup or core reload:
> 
> Dec 28, 2012 8:42:01 AM
> org.apache.solr.core.JmxMonitoredMap$SolrDynamicMBean getMBeanInfo
> WARNING: Could not getStatistics on info bean
> org.apache.solr.handler.ReplicationHandler
> java.lang.IllegalArgumentException: /solr/cores/my_core/data/index does not
> exist
> 
> Indeed, this directory does not exist. However, there
> is /solr/cores/my_core/data/index.20121228091226651
> There's also "index.properties" file with the name of index directory.
> Generally, there is no other issue with that - querying and indexing works.
> Does it mean that getMBeanInfo is not using this file and always looks for
> .../data/index ? Or maybe this is typical for some issue with Solr
> configuration (I use SolrCloud4x) ? If so, how to avoid it ?
> 
> Dec 28, 2012 8:42:01 AM
> org.apache.solr.core.JmxMonitoredMap$SolrDynamicMBean getMBeanInfo
> WARNING: Could not getStatistics on info bean
> org.apache.solr.handler.ReplicationHandler
> java.lang.IllegalArgumentException: /solr/cores/my_core/data/index does not
> exist
>at
> org.apache.commons.io.FileUtils.sizeOfDirectory(FileUtils.java:2074)
>at
> org.apache.solr.handler.ReplicationHandler.getIndexSize(ReplicationHandler.java:477)
>at
> org.apache.solr.handler.ReplicationHandler.getStatistics(ReplicationHandler.java:525)
>at
> org.apache.solr.core.JmxMonitoredMap$SolrDynamicMBean.getMBeanInfo(JmxMonitoredMap.java:231)
>at
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(Unknown
> Source)
>at
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(Unknown
> Source)
>at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(Unknown
> Source)
>at
> org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:140)
>at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:51)
>at
> org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:633)
>at org.apache.solr.core.SolrCore.&lt;init&gt;(SolrCore.java:736)
>at org.apache.solr.core.SolrCore.&lt;init&gt;(SolrCore.java:566)
>at org.apache.solr.core.CoreContainer.create(CoreContainer.java:850)
>at org.apache.solr.core.CoreContainer.load(CoreContainer.java:534)
>at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
>at
> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:308)
>at
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
>at
> org.eclipse.jetty.servlet.FilterHolder.doStart(FilterHolder.java:114)
>at
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
>at
> org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:754)
>at
> org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:258)
>at
> org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1221)
>at
> org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:699)
>at
> org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:454)
>at
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
>at
> org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:36)
>at
> org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:183)
>at
> org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:491)
>at
> org.eclipse.jetty.deploy.DeploymentManager.addApp(DeploymentManager.java:138)
>at
> org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:142)
>at
> org.eclipse.jetty.deploy.providers.ScanningAppProvider$1.fileAdded(ScanningAppProvider.java:53)
>at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:604)
>at
> org.eclipse.jetty.util.Scanner.reportDifferences(Scanner.java:535)
>at org.eclipse.jetty.util.Scanner.scan(Scanner.java:398)
>at org.eclipse.jetty.util.Scanner.doStart(Scanner.java:332)
>at
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
>at
> org.eclipse.jetty.deploy.providers.ScanningAppProvider.doStart(ScanningAppProvider.java:118)
>at
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
>at
> org.eclipse.jetty.deploy.DeploymentManager.startAppProvider(DeploymentManager.java:552)
>at
> org.eclipse.jetty.deploy.DeploymentManager.doStart(DeploymentManager.java:227)
>at
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:5

Re: indexing cpu utilization

2013-01-02 Thread Mark Miller
32 cores eh? You probably have to raise some limits to take advantage of that.

https://issues.apache.org/jira/browse/SOLR-4078
support configuring IndexWriter max thread count in solrconfig

That's coming in 4.1 and is likely important - the default is only 8.

You might also want to experiment with using more merge threads? I think the
default may be 3.

Beyond that, you may want to look at running multiple jvms on the one host and 
doing distributed. That can certainly have benefits, but you have to weigh 
against the management costs. And make sure process->processor affinity is in 
gear.

Finally, make sure you are using many threads to add docs...
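
For reference, a minimal sketch of a multi-threaded SolrJ setup (the URL, queue
size and thread count are illustrative only):

import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ConcurrentIndexing {
  public static void main(String[] args) throws Exception {
    // Queue up to 10000 docs and drain them with 8 background sender threads.
    ConcurrentUpdateSolrServer server =
        new ConcurrentUpdateSolrServer("http://localhost:8983/solr/collection1", 10000, 8);

    for (int i = 0; i < 100000; i++) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-" + i);
      server.add(doc); // queued; sent concurrently by the background threads
    }

    server.blockUntilFinished(); // wait until the queue is drained
    server.commit();
  }
}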

- Mark

On Jan 2, 2013, at 4:39 PM, Uwe Reh  wrote:

> Hi,
> 
> while trying to optimize our indexing workflow I reached the same endpoint 
> like gabriel shen described in his mail. My Solr server won't utilize more 
> than 40% of the computing power.
> I made some tests, but i'm not able to find the bottleneck. Could anybody 
> help to solve this quest?
> 
> At first let me describe the environment:
> 
> Server:
> - Two socket Opteron (interlagos) => 32 cores
> - 64Gb Ram (1600Mhz)
> - SATA Disks: spindle and ssd
> - Solaris 5.11
> - JRE 1.7.0
> - Solr 4.0
> - ApplicationServer Jetty
> - 1Gb network interface
> 
> Client:
> - same hardware as client
> - either multi threaded solrj client using multiple instances of 
> HttpSolrServer
> - or multi threaded solrj client using a ConcurrentUpdateSolrServer with 100 
> threads
> 
> Problem:
> - 10,000,000 docs of bibliographic data (~4k each)
> - with a simplified schema definition it takes 10 hours to index <=> 
> ~250docs/second
> - with the real schema.xml it takes 50 hours to index  <=> ~50docs/second
> In both cases the client takes just 2% of the cpu resources and the server 
> 35%. It's obvious that there is some optimization potential in the schema 
> definition, but why uses the Server never more than 40% of the cpu power?
> 
> 
> Discarded possible bottlenecks:
> - Ram for the JVM
> Solr takes only up to 12G of heap and there is just a negligible gc activity. 
> So the increase from 16G to 32G of possible heap made no difference.
> - Bandwidth of the net
> The transmitted data is identical in both cases. The size of the transmitted 
> data is somewhat below 50G. Since both machines have a dedicated 1G line to 
> the switch, the raw transmission should not take much more than 10 minutes
> - Performance of the client
> Like above, the client ist fast enough for the simplified case (10h). A dry 
> run (just preprocessing not indexing) may finish after 75 minutes.
> - Servers disk IO
> The size of the simpler index is ~100G the size of the other is ~150G. This 
> makes factor of 1.5 not 5. The difference between a ssd and a real disk is 
> not noticeable. The output of 'iostat' and 'zpool iostat' is unsuspicious.
> - Bad thread distribution
> 'mpstat' shows a well distributed load over all cpus and a sensible amount of 
> crosscalls (less than ten/cpu)
> - Solr update parameter (solrconfig.xml)
> Inspired from 
> >http://www.hathitrust.org/blogs/large-scale-search/forty-days-and-forty-nights-re-indexing-7-million-books-part-1
>  I'm using:
>> 256
>> 40
>> 1024
>> native
>> true
> Any changes on this Parameters made it worse.
> 
> To get an idea whats going on, I've done some statistics with visualvm. (see 
> attachement)
> The distribution of real and cpu time looks significant, but Im not smart 
> enough to interpret the results.
> The method 
> org.apache.lucene.index.treadAffinityDocumentsWriterThreadPool.getAndLock() 
> is active at 80% of the time but takes only 1% of the cpu time. On the other 
> hand the second method 
> org.apache.commons.codec.language.bm.PhoneticEngine$PhonemeBuilder.append() 
> is active at 12% of the time and is always running on a cpu
> 
> So again the question "When there are free resources in all dimensions, why 
> utilizes Solr not more than 40% of the computing Power"?
> Bandwidth of the RAM?? I can't believe this. How to verify?
> ???
> 
> Any hints are welcome.
> Uwe
> 
> 
> 
> 
> 
> 



Re: Compound Terms query parser

2013-01-02 Thread Arcadius Ahouansou
Thanks Mikhail.

I will have a look at the RequestHandlerBase.

Arcadius.



On 2 January 2013 12:22, Mikhail Khludnev wrote:

> Arcadius,
>
> It can be easily achieved by extending RequestHandlerBase and implementing
> straightforward looping through other request handlers via
> solrCore.getRequestHandler(name).handleRequest(req,resp).
>
> I have no spare time to contribute it - it's about #10 in my TODO list.
> I'm replying to mail list please use it to follow up further.
>
> Good luck.
>
>
>
> On Sat, Dec 29, 2012 at 7:01 PM, Arcadius Ahouansou 
> wrote:
>
>>
>> Good morning Mikhail.
>>
>> I hope you had a nice Christmas.
>>
>> I came across your excellent presentation at:
>>
>>
>> http://archive.apachecon.com/eu2012/presentations/07-Wednesday/L1R-Lucene/aceu-2012-compound-terms-query-parser-for-great-shopping-experience.pdf
>>
>>
>>
>> We are doing movie search but the situation is very similar to the one
>> you were talking about.
>>
>> People can search for movies by title, summary, actors, director etc in
>> ine single search field.
>>
>> For instance when people search for "Die Hard", the current eDismax will
>> return results where for instance the title contains "Die"  and the summary
>> contains "Hard".
>>
>> We want to avoid searching across multiple fields in one go and instead,
>> search for the full keyword in the title first then when nothing is found
>> ,search in the summary... etc
>>
>> The flowchart shown on your slide 52 (Captain Obvious to the rescue) may
>> help us achieve this.
>>
>> I would like to ask whether you could share a bit more detail about how
>> this was implemented.
>>
>> Thank you very much.
>>
>>
>> Arcadius.
>>
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
>  
>


Re: Solr Collection API doesn't seem to be working

2013-01-02 Thread Mark Miller
Unfortunately, for 4.0, the collections API was pretty bare bones. You don't 
actually get back responses currently - you just pass off the create command to 
zk for the Overseer to pick up and execute.

So you actually have to check the logs of the Overseer to see what the problem 
may be. I'm working on making sure we address this for 4.1.

If you look at the admin UI, in the zk tree, you should be able to see what 
node is the overseer (look for its election node). The logs for that node 
should indicate the problem.

FYI, if I remember right, replication factor is not currently optional.

In the future, I'd like it so you can say like replicationFactor=max_int, and 
the overseer will periodically try to match that given the nodes it sees - but 
we don't have that yet.

When you add new nodes, to add them to a current collection you will either
have to use the CoreAdmin API or pre-configure the cores in solr.xml. All you need
to do is specify a matching collection name for the new core.

- Mark

On Jan 2, 2013, at 7:58 PM, davers  wrote:

> Hello I have an external zookeeper ensemble and I have started 6 tomcat
> servers with solr running with the default solr.xml
> 
> I have uploaded the config directory to zookeeper and linked the collection.
> 
> http://d.pr/i/BkT
> 
> My solr.xml is nearly the default with a minor adjustment for the jetty.port
> parameter:
> 
>   host="${host:}" hostPort="${jetty.port:8080}" hostContext="${hostContext:}"
> zkClientTimeout="${zkClientTimeout:15000}">
>  
> 
> When I try to create the collection issuing the command:
> /solr/admin/collections?action=CREATE&name=productindex&numShards=3 (I leave
> replicationFactor out because I want my cloud to automatically re-size
> vertically as I add or remove replicant servers) I get output in the log:
> 
> INFO: Creating Collection : numShards=3&name=productindex&action=CREATE
> 
> But nothing happens. The solr.xml files are not updated. What am I doing
> wrong?
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-Collection-API-doesn-t-seem-to-be-working-tp4030182.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr Collection API doesn't seem to be working

2013-01-02 Thread davers
This is what I get from the leader overseer log:

2013-01-02 18:04:24,663 - INFO  [ProcessThread:-1:PrepRequestProcessor@419]
- Got user-level KeeperException when processing sessionid:0x23bfe1d4c280001
type:create cxid:0x58 zxid:0xfffe txntype:unknown reqpath:n/a
Error Path:/overseer Error:KeeperErrorCode = NodeExists for /overseer





Re: Solr Collection API doesn't seem to be working

2013-01-02 Thread Yonik Seeley
On Wed, Jan 2, 2013 at 9:21 PM, davers  wrote:
> So providing the correct replicationFactor parameter for the number of
> servers has fixed my issue.
>
> So can you not provide a higher replicationFactor than you have live_nodes?
> What if you want to add more replicants to the collection in the future?

I advocated that replicationFactor / maxShardsPerNode only be a
target, not a requirement in
https://issues.apache.org/jira/browse/SOLR-4114
and I hope that's in what will be 4.1, but I haven't verified.

-Yonik
http://lucidworks.com


Re: SolrJ | IOException while Indexing a PDF document with additional fields

2013-01-02 Thread Lance Norskog

You did not include the stack trace. Oops.

Try using fewer threads with the concurrent uploader, or use the 
single-threaded one.


On 01/01/2013 03:55 PM, uwe72 wrote:

the problem occurs when I add a lot of values to a multivalued field. If I
add just a few, then it works.

this is the full stack trace:









Re: indexing cpu utilization

2013-01-02 Thread Gora Mohanty
On 3 January 2013 05:55, Mark Miller  wrote:
>
> 32 cores eh? You probably have to raise some limits to take advantage of
> that.
>
> https://issues.apache.org/jira/browse/SOLR-4078
> support configuring IndexWriter max thread count in solrconfig
>
> That's coming in 4.1 and is likely important - the default is only 8.
>
> You might always want to experiment with using more merge threads? I think
> the default may be 3.
>
> Beyond that, you may want to look at running multiple jvms on the one host
> and doing distributed. That can certainly have benefits, but you have to
> weigh against the management costs. And make sure process->processor
> affinity is in gear.
>
> Finally, make sure you are using many threads to add docs...
[...]

Yes, making sure to use many threads is definitely good.
We also found that indexing to multiple Solr cores, and
doing one merge of all the indices at the end dramatically
improved indexing time. As long as we had roughly one
CPU core per Solr core (I am guessing that had to do
with threading) indexing speed increased linearly with the
number of Solr cores. Yes, the merge at the end is slow,
and needs large disk space (at least twice the total index
size), but one wins overall.

Regards,
Gora


Re: Solr 4.0 SolrCloud with AWS Auto Scaling

2013-01-02 Thread Otis Gospodnetic
We've considered using AWS Beanstalk (hmm, what's the difference between
AWS auto scaling and elastic beanstalk? not sure.) for search-lucene.com ,
but the idea of something adding and removing nodes seems scary.  The
scariest part to me is automatic removal of wrong nodes that ends up in
data loss or insufficient number of replicas.

But if somebody has done this and has written up a how-to, I'd love to see
it!

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Wed, Jan 2, 2013 at 5:51 PM, Bill Au  wrote:

> Is anyone running Solr 4.0 SolrCloud with AWS auto scaling?
>
> My concern is that as AWS auto scaling add and remove instances to
> SolrCloud, the number of nodes in SolrCloud Zookeeper config will grow
> indefinitely as removed instances will never be used again.  AWS auto
> scaling will keep on adding new instances, and there is no way to remove
> them from Zookeeper, right?  What's the effect of have all these phantom
> nodes?
>
> Bill
>


Re: indexing cpu utilization

2013-01-02 Thread Otis Gospodnetic
I, too, was going to point out the number of threads, but was going to
suggest using fewer of them because the server has 32 cores and there was a
mention of 100 threads being used from the client.  Thus, my guess was that
the machine is busy juggling threads and context switching (how's vmstat 2
output, Uwe?) instead of doing the real work.

Mark wanted to point out this other issue:
https://issues.apache.org/jira/browse/SOLR-3929 though, so try that, too.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Wed, Jan 2, 2013 at 11:13 PM, Gora Mohanty  wrote:

> On 3 January 2013 05:55, Mark Miller  wrote:
> >
> > 32 cores eh? You probably have to raise some limits to take advantage of
> > that.
> >
> > https://issues.apache.org/jira/browse/SOLR-4078
> > support configuring IndexWriter max thread count in solrconfig
> >
> > That's coming in 4.1 and is likely important - the default is only 8.
> >
> > You might always want to experiment with using more merge threads? I
> think
> > the default may be 3.
> >
> > Beyond that, you may want to look at running multiple jvms on the one
> host
> > and doing distributed. That can certainly have benefits, but you have to
> > weigh against the management costs. And make sure process->processor
> > affinity is in gear.
> >
> > Finally, make sure you are using many threads to add docs...
> [...]
>
> Yes, making sure to use many threads is definitely good.
> We also found that indexing to multiple Solr cores, and
> doing one merge of all the indices at the end dramatically
> improved indexing time. As long as we had roughly one
> CPU core per Solr core (I am guessing that had to do
> with threading) indexing speed increased linearly with the
> number of Solr cores. Yes, the merge at the end is slow,
> and needs large disk space (at least twice the total index
> size), but one wins overall.
>
> Regards,
> Gora
>


Re: Upgrading from 3.6 to 4.0

2013-01-02 Thread Otis Gospodnetic
Hi Roy,

Unless your servers are maxed out, I'd go with:

1. Set up Solr 4.0 on the same servers... or just 1 (additional even)
2. Reindex to Solr 4.0 or use SolrEntityProcessor if all fields from 3.6
index are stored
3. Add the super secret, for your eyes only 21st client with Solr 3.6 and
point it to the Solr 4.0 index.

I'd put my money on "Solr 3.6 client will work with Solr 4.0".
And yes, Solr 4.0 should be able to read Solr 3.6.* indices.
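
For step 2, a rough, untested SolrJ sketch of copying stored documents over
(an alternative to the SolrEntityProcessor route; the URLs, the "id" uniqueKey
and the page size are assumptions, and it only works if every field you need
is stored):

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.SolrInputDocument;

public class Reindex36To40 {
  public static void main(String[] args) throws Exception {
    HttpSolrServer src = new HttpSolrServer("http://old-host:8983/solr");  // 3.6
    HttpSolrServer dst = new HttpSolrServer("http://new-host:8983/solr");  // 4.0
    int rows = 1000;
    for (int start = 0; ; start += rows) {
      SolrQuery q = new SolrQuery("*:*");
      q.setStart(start);
      q.setRows(rows);
      q.addSortField("id", SolrQuery.ORDER.asc); // stable paging order; assumes an "id" key
      SolrDocumentList page = src.query(q).getResults();
      if (page.isEmpty()) {
        break;
      }
      List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
      for (SolrDocument d : page) {
        SolrInputDocument in = new SolrInputDocument();
        for (String f : d.getFieldNames()) {
          in.addField(f, d.getFieldValue(f)); // copy every stored field as-is
        }
        batch.add(in);
      }
      dst.add(batch);
    }
    dst.commit();
  }
}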

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Wed, Jan 2, 2013 at 1:34 PM, Benjamin, Roy  wrote:

> Thanks all,
>
> >> You should not use a a 3.6 SolrJ client with Solr 4 server.
>
> I run 100 shards and 20 clients. If above is correct then the entire
> system must be shut down for many hours for an upgrade...
>
> Thanks
> Roy
>
> -Original Message-
> From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com]
> Sent: Wednesday, January 02, 2013 10:23 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Upgrading from 3.6 to 4.0
>
> AFAIK Solr 4 should be able to read Solr 3.6 indexes. Soon those files
> will be updated to 4.0 format and will not be readable by Solr 3.6 anymore.
> See http://wiki.apache.org/lucene-java/BackwardsCompatibility
> You should not use a a 3.6 SolrJ client with Solr 4 server.
>
> Tomás
>
>
> On Wed, Jan 2, 2013 at 3:04 PM, Benjamin, Roy  wrote:
>
> > Will the existing 3.6 indexes work with 4.0 binary ?
> >
> > Will 3.6 solrJ clients work with 4.0 servers ?
> >
> >
> > Thanks
> > Roy
> >
>