stream_size null when using HttpSolrServer

2012-11-06 Thread sh

Good day,

I recently moved to SolrJ 3.6.1. As the CommonsHttpSolrServer class is deprecated in that version, I 
migrated to HttpSolrServer. But now Tika does not generate the stream_size field correctly: in the 
result response for an arbitrary JPEG file it is null. Is there any known way to fix that?

The extract handler is defined as follows in solrconfig.xml [the XML markup
was stripped by the list archive; only the literal values survive]:

  true
  file_owner
  file_path

The field in schema.xml looks like this [the definition was also stripped]:

Kind regards,

Silvio
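
For comparison, a typical ExtractingRequestHandler definition looks like the
sketch below. This is not the poster's actual config (which was stripped); the
mapping and prefix parameters are illustrative assumptions:

```xml
<requestHandler name="/update/extract"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- capture unmapped Tika metadata (including stream_size) into attr_* fields -->
    <str name="uprefix">attr_</str>
    <!-- map the extracted body into the main text field; name is illustrative -->
    <str name="fmap.content">text</str>
  </lst>
</requestHandler>
```

With uprefix in place, the raw Tika metadata lands in dynamic fields, which
makes it easier to see whether stream_size is missing from Tika's output or
being dropped during mapping.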


Solr / Velocity url rewrite

2012-11-06 Thread Sébastien Dartigues
Hi all,

Today I'm using Solritas as the front end for the Solr search engine.

But I would like to do URL rewriting to deliver URLs that are more
SEO-friendly.

First, the end user types this kind of URL: http://host.com/query/myquery

This URL should then be rewritten internally (a kind of reverse proxy) to
http://localhost:8983/query?q=myquery.

This internal URL should not be displayed to the end user, and when the
result page is displayed, all the links in the page should be rewritten
as SEO-friendly URLs.

I tried to perform some tests with an Apache front end using mod_proxy,
but I didn't succeed in passing URL parameters.
Has anyone ever tried to do SEO with the Solr search engine (Solritas front end)?

Thanks for your help.
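
For reference, the kind of rule being attempted could look like this in the
Apache config (the internal URL is taken from the message; the exact paths
are assumptions, and the [P] flag requires mod_proxy to be enabled):

```
RewriteEngine On
RewriteRule ^/query/(.+)$ http://localhost:8983/query?q=$1 [P,QSA]
ProxyPassReverse /query/ http://localhost:8983/
```

The [P] flag proxies the request internally so the rewritten URL is never
shown to the user; [QSA] appends any additional query-string parameters to
the rewritten URL, which is the usual fix when parameters fail to pass through.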


RE: Continuous Ping query caused exception: java.util.concurrent.RejectedExecutionException

2012-11-06 Thread Markus Jelsma
https://issues.apache.org/jira/browse/SOLR-4037

 
 
-Original message-
> From:Mark Miller 
> Sent: Sat 03-Nov-2012 14:24
> To: solr-user@lucene.apache.org
> Subject: Re: Continuous Ping query caused exception: 
> java.util.concurrent.RejectedExecutionException
> 
> 
> On Nov 1, 2012, at 5:39 AM, Markus Jelsma  wrote:
> 
> > File bug?
> 
> Please.
> 
> - Mark


RE: SolrCloud indexing blocks if node is recovering

2012-11-06 Thread Markus Jelsma

https://issues.apache.org/jira/browse/SOLR-4038
Still trying to gather the logs
 
 
-Original message-
> From:Mark Miller 
> Sent: Sat 03-Nov-2012 14:17
> To: Markus Jelsma 
> Cc: solr-user@lucene.apache.org
> Subject: Re: SolrCloud indexing blocks if node is recovering
> 
> The OOM machine and any surrounding if possible (eg especially the leader of 
> the shard).
> 
> Not sure what I'm looking for yet, so the more info the better.
> 
> - Mark
> 
> On Nov 3, 2012, at 5:23 AM, Markus Jelsma  wrote:
> 
> > Hi - yes, i should be able to make sense out of them next monday. I assume 
> > you're not too interested in the OOM machine but all surrounding nodes that 
> > blocked instead? 
> > 
> > 
> > 
> > -Original message-
> >> From:Mark Miller 
> >> Sent: Sat 03-Nov-2012 03:14
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: SolrCloud indexing blocks if node is recovering
> >> 
> >> Doesn't sound right. Still have the logs?
> >> 
> >> - Mark
> >> 
> >> On Fri, Nov 2, 2012 at 9:45 AM, Markus Jelsma
> >>  wrote:
> >>> Hi,
> >>> 
> >>> We just tested indexing some million docs from Hadoop to a 10 node 2 rep 
> >>> SolrCloud cluster with this week's trunk. One of the nodes gave an OOM 
> >>> but indexing continued without interruption. When i restarted the node 
> >>> indexing stopped completely, the node tried to recover - which was 
> >>> unsuccessful. I restarted the node again but that wasn't very helpful 
> >>> either. Finally i decided to stop the node completely and see what 
> >>> happens - indexing resumed.
> >>> 
> >>> Why or how won't the other nodes accept incoming documents when one node 
> >>> behaves really bad? The dying node wasn't the node we were sending 
> >>> documents to and we are not using CloudSolrServer yet (see other thread). 
> >>> Is this known behavior? Is it a bug?
> >>> 
> >>> Thanks,
> >>> Markus
> >> 
> >> 
> >> 
> >> -- 
> >> - Mark
> >> 
> 
> 


Re: Where to get more documents or references about SolrCloud?

2012-11-06 Thread Lance Norskog
LucidFind is a searchable archive of Solr documentation and email lists:

http://find.searchhub.org/?q=solrcloud

- Original Message -
| From: "Jack Krupansky" 
| To: solr-user@lucene.apache.org
| Sent: Monday, November 5, 2012 4:44:46 AM
| Subject: Re: Where to get more documents or references about sold cloud?
| 
| Is most of the Web blocked in your location? When I Google
| "SolrCloud",
| Google says that there are "About 61,400 results" with LOTS of
| informative
| links, including blogs, videos, slideshares, etc. just on the first
| two
| pages of search results alone.
| 
| If you have specific questions, please ask them with specific detail,
| but
| try reading a few of the many sources of information available on the
| Web
| first.
| 
| -- Jack Krupansky
| 
| -Original Message-
| From: SuoNayi
| Sent: Monday, November 05, 2012 3:32 AM
| To: solr-user@lucene.apache.org
| Subject: Where to get more documents or references about sold cloud?
| 
| Hi all, there is only one entry about solr cloud on the
| wiki,http://wiki.apache.org/solr/SolrCloud.
| I have googled a lot and found no more details about solr cloud, or
| maybe I
| miss something?
| 
| 


Re: Does SolrCloud supports MoreLikeThis?

2012-11-06 Thread Lance Norskog
The question you meant to ask is: "Does MoreLikeThis support Distributed 
Search?" and the answer apparently is no. This is the issue to get it working:

https://issues.apache.org/jira/browse/SOLR-788

("Distributed Search" is independent of SolrCloud.) If you want to write unit 
tests, that would really help: they won't work now, but they will make it easier 
for someone to get the patch working again. Also, the patch will not get 
committed without unit tests.

Lance

- Original Message -
| From: "Luis Cappa Banda" 
| To: solr-user@lucene.apache.org
| Sent: Monday, November 5, 2012 7:54:59 AM
| Subject: Re: Does SolrCloud supports MoreLikeThis?
| 
| Thanks for the answer, Darren! I still have the hope that MLT is
| supported
| in the current version. An important feature of the product that I´m
| developing depends on that, and even if I can emulate MLT with a
| Dismax or
| E-dismax component, the thing is that MLT fits and works perfectly...
| 
| Regards,
| 
| Luis Cappa.
| 
| 
| 2012/11/5 Darren Govoni 
| 
| > There is a ticket for that with some recent activity (sorry I don't
| > have
| > it handy right now), but I'm not sure if that work made it into the
| > trunk,
| > so probably solrcloud does not support MLT...yet. Would love an
| > update from
| > the dev team though!
| >
| > --- Original Message ---
| > On 11/5/2012  10:37 AM Luis Cappa Banda wrote:That´s the
| > question, :-)
| > 
| > Regards,
| > 
| > Luis Cappa.
| > 
| >
| 


GC stalls cause Zookeeper timeout during uninvert for facet field

2012-11-06 Thread Arend-Jan Wijtzes
Hi,

We are running a small Solr cluster with 8 cores on 4 machines. This
database has about 1E9 very small documents. One of the statistics we
need requires a facet on a text field with high cardinality.

During the uninvert phase of this text field the searchers experience
long stalls because of garbage collection (20+ second pauses), which
causes Solr to lose the ZooKeeper lease. Often they do not recover
gracefully, and as a result the cluster becomes degraded:

"SEVERE: There was a problem finding the leader in
zk:org.apache.solr.common.SolrException: Could not get leader props"

This is a known open issue.

I explored several options to try and work around this. However, I'm new
to Solr and need some help.

We tried running more cores:
We went from 4 to 8 cores. Does it make sense to go to 16 cores on 4
machines?


GC tuning:
This helped a lot but not enough to prevent the lease expirations. I'm
by no means a Java GC expert and would appreciate any tips to improve
this further. Current settings are:

Java HotSpot(TM) 64-Bit Server VM (20.0-b11)
-Xloggc:/home/solr/solr/log/gc.log
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintTenuringDistribution
-XX:+PrintClassHistogram
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=75
-XX:MaxGCPauseMillis=1
-XX:+CMSIncrementalMode
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-Djava.awt.headless=true
-Xss256k
-Xmx18g
-Xms1g
-DzkHost=ds30:2181,ds31:2181,ds32:2181
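
Two of the settings above are worth a second look: -XX:+CMSIncrementalMode is
intended for machines with very few CPUs (this hardware has 24 cores), and
-Xms1g combined with -Xmx18g forces repeated heap resizing during warm-up. A
commonly suggested variant looks like the following (illustrative values, not
a tuned recommendation):

```
-Xms18g -Xmx18g
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=60
-XX:NewSize=2g
-XX:MaxNewSize=2g
```

Raising zkClientTimeout in solr.xml can also buy headroom against lease
expiry while the GC behavior is being tuned.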

Actual memory stats according to top are: 74GB virtual, 11GB resident.
The GC log shows:
- age   1:   39078968 bytes,   39078968 total
: 342633K->38290K(345024K), 24.7992520 secs]
9277535K->9058682K(11687832K) icms_dc=73 , 24.7993810 secs] [Times:
user=366.87 sys=26.31, real=24.79 secs]
Total time for which application threads were stopped: 24.8005790
seconds
975.478: [GC 975.478: [ParNew
Desired survivor size 19628032 bytes, new threshold 1 (max 4)
- age   1:   38277672 bytes,   38277672 total
: 343750K->37537K(345024K), 22.4217640 secs]
9364142K->9131962K(11687832K) icms_dc=73 , 22.4218650 secs] [Times:
user=331.25 sys=23.85, real=22.42 secs]
Total time for which application threads were stopped: 22.4231750
seconds

etc.


Solr version:
4.0.0.2012.10.06.03.04.33

Current hardware consists of 4 machines, of which each has:
2x E5645 CPU, total of 24 cores
48GB mem
8 x SATA 7200RPM in raid 10


What would be a good strategy to try and get this database to perform
the way we need it? Would it make sense to split it up into 16 shards?
Ways to improve the GC behavior?

Any help would be greatly appreciated.

AJ

-- 
Arend-Jan Wijtzes -- Wiseguys -- www.wise-guys.nl


Re: SolrCloud - configuration management in ZooKeeper

2012-11-06 Thread Tomás Fernández Löbbe
Hi Alexey, responses are inline:

Zookeeper manages not only the cluster state, but also the common
> configuration files.
> My question is, what are the exact rules of precedence? That is, when SOLR
> node will decide to download new configuration files?
>
When the SolrCore is started.


> Will configuration files be updated from ZooKeeper every time the core is
> refreshed?

Yes, every time the SolrCore is reloaded. If you need to force this, you
can either reload all the cores or reload the collection:
https://issues.apache.org/jira/browse/SOLR-3488

> What if bootstrapping is defined (bootstrap_configdir)? Will the
> node always try to upload?
>
If bootstrap_confdir is set and the config name is always the same, every
time you start Solr it will upload the configuration files and overwrite the
old ones in the same ZooKeeper location.

> What are the best practices for production environment? Is it better to use
> external tool (ZkCLI) to trigger configuration changes?
>
I would at least not attach bootstrap_confdir to a start script; make the
upload explicit instead. There are some Solr-specific ZooKeeper scripts that you can use.
See
http://wiki.apache.org/solr/SolrCloud#Getting_your_Configuration_Files_into_ZooKeeper

I would use Solr's zk script for managing the configuration.
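
As a sketch (the zkhost, paths, and config name here are assumptions),
uploading a config set with the script bundled in the Solr example directory
looks like:

```
example/cloud-scripts/zkcli.sh -zkhost localhost:9983 \
  -cmd upconfig -confdir /path/to/myconf -confname myconf
```

Reloading the cores (or the collection) afterwards makes running SolrCores
pick up the newly uploaded files.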

Tomás

>
> Thanks
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SolrCloud-configuration-management-in-ZooKeeper-tp4018432.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Solr4 data import skipdoc and regex

2012-11-06 Thread Randy
Hi *,

I want to import some data to build a Solr index. For this import, I need to
skip some documents. In my data-config file it looks like this [the XML was
stripped by the list archive]:

As I also need to search my 'titles' I tried this [also stripped]:

This couldn't work - that's now clear to me ;-) But how can I do it?

Thanks in advance :-)

Randy



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr4-data-import-skipdoc-and-regex-tp4018495.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Add new shard will be treated as replicas in Solr4.0?

2012-11-06 Thread Erick Erickson
bq: where can i find all the items on the road map?

Well, you really can't ... There's no "official" roadmap. I happen to
know this since I follow the developers' list and I've seen references to
this being important to the folks doing SolrCloud development work, and it's
been a recurring theme on the users' list. It's one of those things that
_everybody_ understands would be useful in certain circumstances, but that
nobody has had time to actually implement yet.

You can track this at: https://issues.apache.org/jira/browse/SOLR-2592

Best
Erick



On Mon, Nov 5, 2012 at 7:57 PM, Zeng Lames  wrote:

> btw, where can i find all the items in the road map? thanks!
>
>
> On Tue, Nov 6, 2012 at 8:55 AM, Zeng Lames  wrote:
>
> > hi Erick, thanks for your kindly response. hv got the information from
> the
> > SolrCloud wiki.
> > think we may need to defined the shard numbers when we really rollout it.
> >
> > thanks again
> >
> >
> > On Mon, Nov 5, 2012 at 8:40 PM, Erick Erickson  >wrote:
> >
> >> Not at present. What you're interested in is "shard splitting" which is
> >> certainly on the roadmap but not implemented yet. To expand the
> >> number of shards you'll have to reconfigure, then re-index.
> >>
> >> Best
> >> Erick
> >>
> >>
> >> On Mon, Nov 5, 2012 at 4:09 AM, Zeng Lames 
> wrote:
> >>
> >> > Dear All,
> >> >
> >> > we have an existing solr collection, 2 shards, numOfShard is 2. and
> >> there
> >> > are already records in the index files. now we start another solr
> >> instance
> >> > with ShardId= shard3, and found that Solr treat it as replicas.
> >> >
> >> > check the zookeeper data, found the range of shard doesn't
> >> > change correspondingly. shard 1 is 0-7fff, while shard 2 is
> >> > 8000-.
> >> >
> >> > is there any way to increase new shard for existing collection?
> >> >
> >> > thanks a lot!
> >> > Lames
> >> >
> >>
> >
> >
>


Re: How to re-read the config files in Solr, on a commit

2012-11-06 Thread Erick Erickson
Not that I know of. This would be extremely expensive in the usual case.
Loading up configs, reconfiguring all the handlers etc. would add a huge
amount of overhead to the commit operation, which is heavy enough as it is.

What's the use-case here? Changing your configs really often and reading
them on commit sounds like a way to make for a very confusing application!

But if you really need to re-read all this info on a running system,
consider the core admin RELOAD command.
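
As a sketch (host, port, and core name are assumed), a RELOAD call against the
CoreAdmin API looks like:

```
http://localhost:8983/solr/admin/cores?action=RELOAD&core=collection1
```

This re-reads the configuration (including files such as synonyms) for that
one core without restarting the JVM or touching other cores.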

Best
Erick


On Mon, Nov 5, 2012 at 8:43 PM, roz dev  wrote:

> Hi All
>
> I am keen to find out if Solr exposes any event listener or other hooks
> which can be used to re-read configuration files.
>
>
> I know that we have firstSearcher event but I am not sure if it causes
> request handlers to reload themselves and read the conf files again.
>
> For example, if I change the synonym file and solr gets a commit, will it
> re-initialize request handlers and re-read the conf files.
>
> Or, are there some events which can be listened to?
>
> Any inputs are welcome.
>
> Thanks
> Saroj
>


Searching for Partial Words

2012-11-06 Thread Sohail Aboobaker
Hi,

Given following values in the document:

Doc1: Engine
Doc2. Engineer
Doc3. ResidentEngineer

We need to return all three documents when someone searches for "engi".

Basically we need to implement partial-word search. Currently, we have a
wildcard on the right side of the search term (term*). Is it possible to have
wildcards on both sides of a search term?

Regards,
Sohail Aboobaker.


Re: load balance with SolrCloud

2012-11-06 Thread Erick Erickson
I think you're conflating shards and cores. Shards are physical slices of a
single logical index. An incoming query is sent to each and every shard and
the results are tallied.

The case you're talking about seems to be more you have N separate indexes
(cores), where each core is for a specific user. This is vastly different
from SolrCloud, which puts all the data into one huge logical index!

Furthermore, presently there's no way to direct specific documents to
specific shards in SolrCloud (although a pluggable sharding mechanism is
under development).

You might be interested in SOLR-1293 (under development) for managing lots
of cores.






On Mon, Nov 5, 2012 at 4:26 PM, Jie Sun  wrote:

> we are using solr 3.5 in production and we deal with customers data of
> terabytes.
>
> we are using shards for large customers and write our own replica
> management
> in our software.
>
> Now with the rapid growth of data, we are looking into solrcloud for its
> robustness of sharding and replications.
>
> I understand by read some documents on line that there is no SPOF using
> solrcloud, so any instance in the cluster can server the query/index.
> However, is it true that we need to write our own load balancer in front of
> solrCloud?
>
> For example if we want to implement a model similar to Loggly, i.e. each
> customer start indexing into the small shard of its own, then if any of the
> customers grow more than the small shard's limit, we switch to index into
> another small shard (we call it front end shard), meanwhile merge the just
> released small shard to next level larger shard.
>
> Since the merge can happen between two instances on different servers, we
> probably end up with synch the index files for the merging shards and then
> use solr merge.
>
> I am curious if there is anything solr provide to help on these kind of
> strategy dealing with unevenly grow big customer data (a core)? or do we
> have to write these in our software layer from scratch?
>
> thanks
> Jie
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/load-balance-with-SolrCloud-tp4018367.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr / Velocity url rewrite

2012-11-06 Thread Erick Erickson
Velocity/Solritas was never intended to be a user-facing app. How are you
locking things down so a user can't enter, for instance,
q=*:*&commit=true?

I'd really recommend a proper middleware layer unless you have a trusted
user base...

FWIW,
Erick


On Tue, Nov 6, 2012 at 4:20 AM, Sébastien Dartigues <
sebastien.dartig...@gmail.com> wrote:

> Hi all,
>
> Today i'm using solritas as front-end for the solr search engine.
>
> But i would like to do url rewriting to deliver urls more compliant with
> SEO.
>
> First the end user types that kind of url : http://host.com/query/myquery
>
> So this url should be rewriten internally (kind of reverse proxy) in
> http://localhost:8983/query?q=myquery.
>
> This internal url should not be displayed to the end user and in return
> when the result page is displayed all the links in the page should be
> rewritten with a SEO compliant url.
>
> I tried to perform some tests with an apache front end by using mod_proxy
> but i didn't succeed to pass url parameters.
> Does someone ever tried to do SEO with solr search engine (solritas front)?
>
> Thanks for your help.
>


Re: Solr / Velocity url rewrite

2012-11-06 Thread Sébastien Dartigues
Hi Erick,

Thanks for your help.
OK, apart from the PHP client delivered as a sample, do you have a preference
for an "out of the box" front end that is easily deployable?
My main use case is to be SEO-compliant, or at least to provide a nice
(URL) entry point.

Thanks.


2012/11/6 Erick Erickson 

> Velocity/Solaritas was never intended to be a user-facing app. How are you
> locking things down so a user can't enter, or instance,
> q=*:*&commit=true?
>
> I'd really recommend a proper middleware layer unless you have a trusted
> user base...
>
> FWIW,
> Erick
>
>
> On Tue, Nov 6, 2012 at 4:20 AM, Sébastien Dartigues <
> sebastien.dartig...@gmail.com> wrote:
>
> > Hi all,
> >
> > Today i'm using solritas as front-end for the solr search engine.
> >
> > But i would like to do url rewriting to deliver urls more compliant with
> > SEO.
> >
> > First the end user types that kind of url :
> http://host.com/query/myquery
> >
> > So this url should be rewriten internally (kind of reverse proxy) in
> > http://localhost:8983/query?q=myquery.
> >
> > This internal url should not be displayed to the end user and in return
> > when the result page is displayed all the links in the page should be
> > rewritten with a SEO compliant url.
> >
> > I tried to perform some tests with an apache front end by using mod_proxy
> > but i didn't succeed to pass url parameters.
> > Does someone ever tried to do SEO with solr search engine (solritas
> front)?
> >
> > Thanks for your help.
> >
>


Re: Searching for Partial Words

2012-11-06 Thread Jack Krupansky
Add an "edge" n-gram filter (EdgeNGramFilterFactory) to your "index" 
analyzer. This will add all the prefixes of words to the index, so that a 
query of "engi" will be equivalent to, but much faster than, the wildcard 
engi*. You can specify a minimum size, such as 3 or 4, to eliminate tons of 
too-short prefixes, if you want.


See:
http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/ngram/EdgeNGramFilterFactory.html
http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/ngram/EdgeNGramTokenFilter.html
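
A minimal index-time analyzer using this filter might look like the sketch
below (the field type name and gram sizes are illustrative). Note that
matching in the middle of a single token such as "ResidentEngineer" would
need the non-edge NGramFilterFactory instead:

```xml
<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- emits engi, engin, engine, ... for each token -->
    <filter class="solr.EdgeNGramFilterFactory"
            minGramSize="4" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The query analyzer deliberately omits the n-gram filter so that "engi" is
matched as-is against the indexed prefixes.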

-- Jack Krupansky

-Original Message- 
From: Sohail Aboobaker

Sent: Tuesday, November 06, 2012 8:08 AM
To: solr-user@lucene.apache.org
Subject: Searching for Partial Words

Hi,

Given following values in the document:

Doc1: Engine
Doc2. Engineer
Doc3. ResidentEngineer

We need to return all three documents when someone searches for "engi".

Basically we need to implement partial word search. Currently, we have a
wild card on the right side of search term (term*). Is it possible to have
wild card on both sides of a search term?

Regards,
Sohail Aboobaker. 



Re: Solr 4.0 simultaneous query problem

2012-11-06 Thread Rohit Harchandani
So is it a better approach to query for fewer rows, say 500, and keep
increasing the start parameter? Wouldn't that be slower, since I have an
increasing start parameter and I will also be sorting by the same field in
each of my queries made to the multiple shards?

Also, does it make sense to have all these documents in the same shard? I
went for this approach because the shard which is queried the most is small
and gives a lot of benefit in terms of time taken for all the stats
queries. This shard is only about 5 gb whereas the entire index will be
about 50 gb.

Thanks for the help,
Rohit

On Mon, Nov 5, 2012 at 4:02 PM, Walter Underwood wrote:

> Don't query for 5000 documents. That is going to be slow no matter how it
> is implemented.
>
> wunder
>
> On Nov 5, 2012, at 1:00 PM, Rohit Harchandani wrote:
>
> > Hi,
> > So it seems that when I query multiple shards with the sort criteria for
> > 5000 documents, it queries all shards and gets a list of document ids and
> > then adds the document ids to the original query and queries all the
> shards
> > again.
> > This process of doing the join of query results with the unique ids and
> > getting the remaining fields is turning out to be really slow. It takes a
> > while to search for a list of unique ids. Is there any config change  to
> > make this process faster?
> > Also what does isDistrib=false mean when solr generates the queries
> > internally?
> > Thanks,
> > Rohit
> >
> > On Fri, Oct 19, 2012 at 5:23 PM, Rohit Harchandani  >wrote:
> >
> >> Hi,
> >>
> >> The same query is fired always for 500 rows. The only thing different is
> >> the "start" parameter.
> >>
> >> The 3 shards are in the same instance on the same server. They all have
> >> the same schema. But the inherent type of the documents is different.
> Also
> >> most of the apps queries goes to shard "A" which has the smallest index
> >> size (4gb).
> >>
> >> The query is made to a "master" shard which by default goes to all 3
> >> shards for results. (also, the query that i am trying matches documents
> >> only only in shard "A" mentioned above)
> >>
> >> Will try debugQuery now and post it here.
> >>
> >> Thanks,
> >> Rohit
> >>
> >>
> >>
> >>
> >> On Thu, Oct 18, 2012 at 11:00 PM, Otis Gospodnetic <
> >> otis.gospodne...@gmail.com> wrote:
> >>
> >>> Hi,
> >>>
> >>> Maybe you can narrow this down a little further.  Are there some
> >>> queries that are faster and some slower?  Is there a pattern?  Can you
> >>> share examples of slow queries?  Have you tried &debugQuery=true?
> >>> These 3 shards is each of them on its own server or?  Is the slow
> >>> one always the one that hits the biggest shard?  Do they hold the same
> >>> type of data?  How come their sizes are so different?
> >>>
> >>> Otis
> >>> --
> >>> Search Analytics - http://sematext.com/search-analytics/index.html
> >>> Performance Monitoring - http://sematext.com/spm/index.html
> >>>
> >>>
> >>> On Thu, Oct 18, 2012 at 12:22 PM, Rohit Harchandani  >
> >>> wrote:
>  Hi all,
>  I have an application which queries a solr instance having 3
> shards(4gb,
>  13gb and 30gb index size respectively) having 6 million documents in
> >>> all.
>  When I start 10 threads in my app to make simultaneous queries (with
>  rows=500 and different start parameter, sort on 1 field and no facets)
> >>> to
>  solr to return 500 different documents in each query, sometimes I see
> >>> that
>  most of the responses come back within no time (500ms-1000ms), but the
> >>> last
>  response takes close to 50 seconds (Qtime).
>  I am using the latest 4.0 release. What is the reason for this delay?
> Is
>  there a way to prevent this?
>  Thanks and regards,
>  Rohit
> >>>
> >>
> >>
>
> --
> Walter Underwood
> wun...@wunderwood.org
>
>
>
>


Re: SolrCloud failover behavior

2012-11-06 Thread Nick Chase
Thanks a million, Erick!  You're right about killing both nodes hosting 
the shard.  I'll get the wiki corrected.


  Nick

On 11/3/2012 10:51 PM, Erick Erickson wrote:

SolrCloud doesn't work unless every shard has at least one server that is
up and running.

I _think_ you might be killing both nodes that host one of the shards. The
admin
page has a link showing you the state of your cluster. So when this happens,
does that page show both nodes for that shard being down?

And yeah, SolrCloud requires a quorum of ZK nodes to be up. So with only one ZK
node, killing that will bring down the whole cluster. Which is why the usual
recommendation is that ZK be run externally, usually with an odd number of ZK
nodes (three or more).

Anyone can create a login and edit the Wiki, so any clarifications are
welcome!

Best
Erick


On Sat, Nov 3, 2012 at 12:17 PM, Nick Chase  wrote:


I think there's a change in the behavior of SolrCloud vs. what's in the
wiki, but I was hoping someone could confirm for me.  I checked JIRA and
there were a couple of issues requesting partial results if one server
comes down, but that doesn't seem to be the issue here.  I also checked
CHANGES.txt and don't see anything that seems to apply.

I'm running "Example B: Simple two shard cluster with shard replicas" from
the wiki at https://wiki.apache.org/solr/SolrCloud and everything starts out
as expected. However, when I get to the part about failover behavior, things
get a little wonky.
about fail over behavior is when things get a little wonky.

I added data to the shard running on 7475.  If I kill 7500, a query to any
of the other servers works fine.  But if I kill 7475, rather than getting
zero results on a search to 8983 or 8900, I get a 503 error:



<response>
  <lst name="responseHeader">
    <int name="status">503</int>
    <int name="QTime">5</int>
    <lst name="params">
      <str name="q">*:*</str>
    </lst>
  </lst>
  <lst name="error">
    <str name="msg">no servers hosting shard:</str>
    <int name="code">503</int>
  </lst>
</response>



I don't see any errors in the consoles.

Also, if I kill 8983, which includes the Zookeeper server, everything
dies, rather than just staying in a steady state; the other servers
continually show:

Nov 03, 2012 11:39:34 AM org.apache.zookeeper.ClientCnxn$SendThread startConnect
INFO: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:9983
Nov 03, 2012 11:39:35 AM org.apache.zookeeper.ClientCnxn$SendThread run
WARNING: Session 0x13ac6cf87890002 for server null, unexpected error,
closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused: no further information
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1143)

Nov 03, 2012 11:39:35 AM org.apache.zookeeper.ClientCnxn$SendThread startConnect

over and over again, and a call to any of the servers shows a connection
error to 8983.

This is the current 4.0.0 release, running on Windows 7.

If this is the proper behavior and the wiki needs updating, fine; I just
need to know.  Otherwise if anybody has any clues as to what I may be
missing, I'd be grateful. :)

Thanks...

---  Nick





Re: lukeall.jar for Solr4r?

2012-11-06 Thread Carrie Coy
Thank you very much for taking the time to do this.   This version is 
able to read the index files, but there is at least one issue:


The home screen reports "ERROR: can't count terms per field" and  this 
exception is thrown:


java.util.NoSuchElementException
at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1098)
at java.util.TreeMap$KeyIterator.next(TreeMap.java:1154)
at 
java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)

at org.getopt.luke.IndexInfo.countTerms(IndexInfo.java:64)
at org.getopt.luke.IndexInfo.getNumTerms(IndexInfo.java:109)
at org.getopt.luke.Luke$3.run(Luke.java:1165)


On 11/05/2012 05:08 PM, Shawn Heisey wrote:

On 11/5/2012 2:52 PM, Shawn Heisey wrote:
No idea whether I did it right, or even whether it works.  All my 
indexes are either 3.5 or 4.1-SNAPSHOT, so I can't actually test it.  
You can get to the resulting jar and my patch against the 
luke-4.0.0-ALPHA source:


https://dl.dropbox.com/u/97770508/luke-4.0.0-unofficial.patch
https://dl.dropbox.com/u/97770508/lukeall-4.0.0-unofficial.jar

If you have an immediate need for 4.0.0 support in Luke, please try 
it out and let me know whether it works.  If it doesn't work, or when 
the official luke 4.0.0 is released, I will remove those files from 
my dropbox.


I just realized that the version I uploaded there was compiled with 
java 1.7.0_09.  I don't know if this is actually a problem, but just 
in case, I re-did the compile on a machine with 1.6.0_29.  The 
filename referenced above now points to this version and I have 
included a file that indicates its java7 origins:


https://dl.dropbox.com/u/97770508/lukeall-4.0.0-unofficial-java7.jar

Thanks,
Shawn



custom request handler

2012-11-06 Thread Lee Carroll
Hi, we are extending SearchHandler to provide a custom search request
handler. Basically we've added NamedLists called allowed, whiteList,
maxMinList, etc.

These look like the defaults, appends, and invariants NamedLists in the
standard search handler config. In handleRequestBody we then remove params
not listed in the allowed NamedList, whitelist values as per the white
list, and so on.

The idea is to have a "safe" request handler which the big bad world could
be exposed to. I'm worried: what have we missed that a front-end app could
give us?

Also, removing params from SolrParams is a bit clunky. We are basically
converting SolrParams into a NamedList, processing a new NamedList from this,
and then calling .setParams(SolrParams.toSolrParams(nlNew)). Is there a better
way? In particular, NamedLists are not set up for key lookups...
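
One possible alternative to the NamedList round-trip is ModifiableSolrParams,
which wraps parameters and supports set/remove directly. A sketch (the
allowedParams whitelist set is an assumption based on the description above,
not an existing API):

```java
// In handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp),
// before delegating to super: keep only whitelisted parameters.
SolrParams original = req.getParams();
ModifiableSolrParams filtered = new ModifiableSolrParams();
for (Iterator<String> it = original.getParameterNamesIterator(); it.hasNext(); ) {
    String name = it.next();
    if (allowedParams.contains(name)) {   // allowedParams: a Set<String> built from config
        filtered.set(name, original.getParams(name));
    }
}
req.setParams(filtered);
super.handleRequestBody(req, rsp);
```

This avoids building an intermediate NamedList and keeps the filtering logic
in terms of plain key lookups.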

Anyway, basically: is having a custom request handler doing the above the way
to go?

Cheers


Re: Searching for Partial Words

2012-11-06 Thread Sohail Aboobaker
Thanks Jack.
In the configuration below [the analyzer XML was stripped by the list archive]:

What are the possible values for "side"?

If I understand it correctly, minGramSize=3 and side=front will
include eng* but not en*. Is this correct? So minGramSize is the minimum
number of characters for grams emitted from the specified side.

Does it allow side=both :) or something similar?

Regards,
Sohail
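
For what it's worth, that reading of minGramSize is right: with side="front"
and minGramSize=3, "eng" is the shortest gram emitted and "en" never is (the
valid values for side in this Lucene version are "front" and "back"). A
plain-Java sketch of the front-edge behavior follows; this is illustrative
code, not the Lucene implementation:

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of what an edge n-gram filter with side="front"
// emits for a single token.
public class EdgeNGramDemo {
    static List<String> frontEdgeNGrams(String term, int minGram, int maxGram) {
        List<String> grams = new ArrayList<String>();
        for (int len = minGram; len <= Math.min(maxGram, term.length()); len++) {
            grams.add(term.substring(0, len));   // always anchored at the front
        }
        return grams;
    }

    public static void main(String[] args) {
        // minGramSize=3: "eng" is indexed, "en" is not
        System.out.println(frontEdgeNGrams("engineer", 3, 8));
        // prints [eng, engi, engin, engine, enginee, engineer]
    }
}
```

Searching for "engi" then matches because "engi" itself was indexed as one of
these grams.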


Re: migrating from solr3 to solr4

2012-11-06 Thread Michael Della Bitta
> I got the following error in browser console:
> http://localhost:8080/indexer/$%7BcontextPath%7D$%7BadminPath%7D?wt=json

We can't see the contents of that link. Could you post it on
pastebin.com or something?

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Tue, Nov 6, 2012 at 8:35 AM, Carlos Alexandro Becker
 wrote:
> I
> got the following error in browser console:
>
> http://localhost:8080/indexer/$%7BcontextPath%7D$%7BadminPath%7D?wt=json


Re: migrating from solr3 to solr4

2012-11-06 Thread Carlos Alexandro Becker
Hi Michael, thanks for your answer.

I already posted it on Stack Overflow (
http://stackoverflow.com/questions/13236383/migrating-from-solr3-to-solr4 ),
but this looks like an encoding issue; actually, that is exactly the error.

I'm not sure, but I looked in all the XML files in my JBoss and also in the
app, and none of them mention these variables (contextPath and adminPath)
related to Solr.

So either there is something that I should configure and don't know how, or
there is some trouble with the encoding that is escaping the "$" and "{"
around the var (I'm not sure; I didn't find the file where the app variable
is populated).

Thanks in advance.




On Tue, Nov 6, 2012 at 1:49 PM, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:

> > I got the following error in browser console:
> > http://localhost:8080/indexer/$%7BcontextPath%7D$%7BadminPath%7D?wt=json
>
> We can't see the contents of that link.. Could you post it on
> pastebin.com or something?
>
> Michael Della Bitta
>
> 
> Appinions
> 18 East 41st Street, 2nd Floor
> New York, NY 10017-6271
>
> www.appinions.com
>
> Where Influence Isn’t a Game
>
>
> On Tue, Nov 6, 2012 at 8:35 AM, Carlos Alexandro Becker
>  wrote:
> > I
> > got the following error in browser console:
> >
> > http://localhost:8080/indexer/$%7BcontextPath%7D$%7BadminPath%7D?wt=json
>



-- 
Best regards,
*Carlos Alexandro Becker*
https://profiles.google.com/caarlos0


Re: SolrCloud Tomcat configuration: problems and doubts.

2012-11-06 Thread Luis Cappa Banda
Forward to solr-user mailing list. We forgot to reply to it, :-/

2012/11/5 Luis Cappa Banda 

> Hello, Mark!
>
> I've been testing more and more and things are going better. I have tested
> what you told me about "-Dbootstrap_conf=true" and it works fine, but the
> problem is that if I include that application parameter in every Tomcat
> instance, then when I deploy all the Solr servers each one loads all the
> solrCore configurations into ZooKeeper again.
>
> There should be something like a Tomcat master server which alone has the
> following parameters that define the basic SolrCloud configuration:
>
> JAVA_OPTS="-DzkHost=127.0.0.1:9000 -DnumShards=2 -Dbootstrap_conf=true"
>
> Then the other Tomcat servers should have only:
>
> JAVA_OPTS="-DzkHost=127.0.0.1:9000"
>
>
> However, I think that is not the best way to proceed. It is 2012, it's the
> end of the world; God (well, one of them) is angry and attacks my
> production environment. Imagine that all servers go down and a Monit
> service restarts them in random order. Maybe an ordinary Tomcat server
> finishes its startup faster than the designated Tomcat master server, so
> those SolrCloud configuration parameters won't be loaded first. That's a
> problem.
>
> One posibility is to write a simple script to be executed in every Tomcat
> launch execution that consists on something like:
>
> " I´m the first Tomcat and I´m launching! I´ll write a
> solrcloud.config.lock file in a well-known path (or maybe into Zookeeper)
> to announce the other Tomcats that I´ll start to load SolrCloud
> configuration files into Zookeeper. I am the Tomcat master server, so I´ll
> load* JAVA_OPTS="-DzkHost=127.0.0.1:9000 -DnumShards=2
> -Dbootstrap_conf=true"* ".
>
> " I´m a second Tomcat and I´m launching! First I check if any
> solrcloud.config.lock file exists. If exists, I simple load *
> JAVA_OPTS="-DzkHost=127.0.0.1:9000"* "
>
>
> And so on.
>
>
>
> I don't like this solution too much because it's not elegant and it's very
> ad hoc, but it works. What do you think about it? I've just started with
> SolrCloud four or five days ago and maybe I'm missing something that could
> solve this problem.
>
> Thank you very much, Mark.
>
> Regards,
>
> Luis Cappa.
>
>
>
> 2012/11/3 Mark Miller 
>
>> On Fri, Nov 2, 2012 at 9:05 AM, Luis Cappa Banda 
>> wrote:
>> > Hello, Mark!
>> >
>> > How are you? Thanks a lot for helping me. You were right about
>> jetty.host
>> > parameter. My fianl test solr.xml looks like:
>> >
>> >   > > host="localhost" hostPort="9080" hostContext="items_en">
>> > 
>> >   
>> >
>> >
>> > I´ve noticed that 'hostContext' parameter was also required, so I
>> included
>> > it.
>>
>> It should default to /solr if you don't set it - it is there in case
>> you deploy to a different context though.
>>
>> > After those corrections the Cloud graph tree looks right, and executing
>> > queries doesn't return a 503 error. Phew! However, I checked in the Cloud
>> > graph tree that a "collection1" appears too, pointing to
>> > http://localhost:8983/solr. I will continue testing in case I missed
>> > something, but it looks like it is creating another collection with
>> > default parameters (collection name, port) without control.
>>
>> It should only create what it finds in solr.xml - let me know what you
>> find.
>>
>> >
>> > While using Apache Tomcat I was forced to include in catalina.sh (or
>> > setenv.sh) the following environment parameters, as I told you before:
>> >
>> > JAVA_OPTS="-DzkHost=127.0.0.1:9000 -Dcollection.configName=items_en"
>>
>> You should only need -DzkHost= - see below.
>>
>> >
>> >
>> > Just three questions more:
>> >
>> > 1. That´s a problem for me, because I would like to deploy in each
>> Tomcat
>> > instance more than one Solr server with different configurations file (I
>> > mean, differents configName parameters), so including that JAVA_OPTS
>> forces
>> > to me to deploy in that Tomcat server only Solr servers with this kind
>> of
>> > configuration. In a production environment I would like to deploy in a
>> > single Tomcat instance at least for Solr servers, one per each kind of
>> > documents that I will index and query to. Do you know any way to
>> configure
>> > the configName per each Solr server instance? Is it posible to
>> configure it
>> > inside solr.xml file? Also, it make sense to deploy in each Solr server
>> a
>> > multi-core configuration, each core with each configName allocated in
>> > Zookeeper, but again using that kind of JAVA_OPTS on-fire params
>> > configuration makes it impossible, :-(
>>
>> That config name sys prop is not being used here - it's only used when
>> you use -Dbootstrap_confdir=, and then only the first time you
>> start up.
>>
>> Collections are linked to configuration sets in ZooKeeper. If you use
>> -Dbootstrap_conf=true, a special rule is used that auto-links
>> collections and config sets with the same name as the collection.
>> Otherwise, you can use the ZkCLI command-line tool to link any collection
>> to any config in ZooKeeper.

Re: migrating from solr3 to solr4

2012-11-06 Thread Stefan Matheis
Hey Carlos

just had a quick look at our changes and figured out the revision which 
introduced this change, which might help you while having another look?

http://svn.apache.org/viewvc?view=revision&revision=1297578

The LoadAdminUiServlet is responsible for replacing those placeholders which 
are causing your problems

HTH at least a bit
Stefan



On Tuesday, November 6, 2012 at 5:02 PM, Carlos Alexandro Becker wrote:

> just found this in the admin.html head:
>  
> https://gist.github.com/4025669
>  
>  
> On Tue, Nov 6, 2012 at 1:57 PM, Carlos Alexandro Becker
> mailto:caarl...@gmail.com)>wrote:
>  
> > Hi Michael, thank for your answer.
> >  
> > I already posted it in stackoverflow (
> > http://stackoverflow.com/questions/13236383/migrating-from-solr3-to-solr4 ),
> > but, this looks like a encoding issue, actually, is exactly the error.
> >  
> > I'm not sure, but I look in all xml files in my JBoss and also in app,
> > neither mention this variables (contextPath and adminPath) related to solr.
> >  
> > So, or there is something that I should configure and don't know how, or
> > some trouble with the encoding that are escaping the "$" and "{" around the
> > var (not sure, I didn't find the file where the app variable is populated).
> >  
> > Thanks in advance.
> >  
> >  
> >  
> >  
> > On Tue, Nov 6, 2012 at 1:49 PM, Michael Della Bitta <
> > michael.della.bi...@appinions.com 
> > (mailto:michael.della.bi...@appinions.com)> wrote:
> >  
> > > > I got the following error in browser console:
> > > http://localhost:8080/indexer/$%7BcontextPath%7D$%7BadminPath%7D?wt=json
> > >  
> > > We can't see the contents of that link.. Could you post it on
> > > pastebin.com (http://pastebin.com) or something?
> > >  
> > > Michael Della Bitta
> > >  
> > > 
> > > Appinions
> > > 18 East 41st Street, 2nd Floor
> > > New York, NY 10017-6271
> > >  
> > > www.appinions.com (http://www.appinions.com)
> > >  
> > > Where Influence Isn’t a Game
> > >  
> > >  
> > > On Tue, Nov 6, 2012 at 8:35 AM, Carlos Alexandro Becker
> > > mailto:caarl...@gmail.com)> wrote:
> > > > I
> > > > got the following error in browser console:
> > >  
> > >  
> > > http://localhost:8080/indexer/$%7BcontextPath%7D$%7BadminPath%7D?wt=json
> >  
> >  
> >  
> >  
> > --
> > Atenciosamente,
> > *Carlos Alexandro Becker*
> > https://profiles.google.com/caarlos0
>  
>  
>  
>  
> --  
> Atenciosamente,
> *Carlos Alexandro Becker*
> https://profiles.google.com/caarlos0





Re: lukeall.jar for Solr4r?

2012-11-06 Thread Shawn Heisey

On 11/6/2012 7:45 AM, Carrie Coy wrote:
> Thank you very much for taking the time to do this.  This version is
> able to read the index files, but there is at least one issue:
>
> The home screen reports "ERROR: can't count terms per field" and this
> exception is thrown:
>
> java.util.NoSuchElementException
>         at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1098)
>         at java.util.TreeMap$KeyIterator.next(TreeMap.java:1154)
>         at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
>         at org.getopt.luke.IndexInfo.countTerms(IndexInfo.java:64)
>         at org.getopt.luke.IndexInfo.getNumTerms(IndexInfo.java:109)
>         at org.getopt.luke.Luke$3.run(Luke.java:1165)


That particular change, around IndexInfo.java line 64 (and a few other 
locations as well), is the one part of my changes that I actually had 
confidence in.  I have no idea how to fix it.  I'll go ahead and remove 
the jars from my dropbox, since they don't work.


Thanks,
Shawn



Re: GC stalls cause Zookeeper timeout during uninvert for facet field

2012-11-06 Thread Gil Tene
On Nov 6, 2012, at 6:06 AM, Arend-Jan Wijtzes <ajwyt...@wise-guys.nl> wrote:
...
During the uninvert phase of this text field the searchers experience
long stalls because of the garbage collecting (20+ seconds pauses) which
causes Solr to lose the Zookeeper lease. Often they do not recover
gracefully and as a result the cluster becomes degraded:

"SEVERE: There was a problem finding the leader in
zk:org.apache.solr.common.SolrException: Could not get leader props"

This is a known open issue.



Using the Zing JVM is a simple, immediate way to get around this and other known
GC-related issues. Zing eliminates GC pauses as a concern for enterprise
applications such as this, driving worst-case JVM-related hiccups down to the
milliseconds level. This behavior will tend to happen out of the box, with
little or no tuning, and at any heap size your server can support. For example,
on the specific server configuration you mention (24 vcores, 48GB of RAM) you
should be able to comfortably run with an -Xmx of 30GB and no longer worry about
pauses. We've had people run much larger than that (e.g.
http://blog.mikemccandless.com/2012/07/lucene-index-in-ram-with-azuls-zing-jvm.html).

In full disclosure, I work for (and am the CTO at) Azul.

-- Gil.


Re: migrating from solr3 to solr4

2012-11-06 Thread Carlos Alexandro Becker
Hi Stefan,

Thank you very much. I just realized that I hadn't updated the web.xml, so
I didn't have the LoadAdminUiServlet configured; that's why it was not working.

For now, the only problem I still have is that it tries to access
solr.home/collection1/conf, and I used to have it in solr.home/conf.

How can I fix this?

Thank you very much for your help.


On Tue, Nov 6, 2012 at 3:01 PM, Stefan Matheis wrote:

> Hey Carlos
>
> just had a quick look at our changes and figured out the revision which
> introduced this change, which might help you while having another look?
>
> http://svn.apache.org/viewvc?view=revision&revision=1297578
>
> The LoadAdminUiServlet is responsible for replacing those placeholders
> which are causing your problems
>
> HTH at least a bit
> Stefan
>
>
>
> On Tuesday, November 6, 2012 at 5:02 PM, Carlos Alexandro Becker wrote:
>
> > just found this in the admin.html head:
> >
> > https://gist.github.com/4025669
> >
> >
> > On Tue, Nov 6, 2012 at 1:57 PM, Carlos Alexandro Becker
> > mailto:caarl...@gmail.com)>wrote:
> >
> > > Hi Michael, thank for your answer.
> > >
> > > I already posted it in stackoverflow (
> > >
> http://stackoverflow.com/questions/13236383/migrating-from-solr3-to-solr4),
> > > but, this looks like a encoding issue, actually, is exactly the error.
> > >
> > > I'm not sure, but I look in all xml files in my JBoss and also in app,
> > > neither mention this variables (contextPath and adminPath) related to
> solr.
> > >
> > > So, or there is something that I should configure and don't know how,
> or
> > > some trouble with the encoding that are escaping the "$" and "{"
> around the
> > > var (not sure, I didn't find the file where the app variable is
> populated).
> > >
> > > Thanks in advance.
> > >
> > >
> > >
> > >
> > > On Tue, Nov 6, 2012 at 1:49 PM, Michael Della Bitta <
> > > michael.della.bi...@appinions.com (mailto:
> michael.della.bi...@appinions.com)> wrote:
> > >
> > > > > I got the following error in browser console:
> > > >
> http://localhost:8080/indexer/$%7BcontextPath%7D$%7BadminPath%7D?wt=json
> > > >
> > > > We can't see the contents of that link.. Could you post it on
> > > > pastebin.com (http://pastebin.com) or something?
> > > >
> > > > Michael Della Bitta
> > > >
> > > > 
> > > > Appinions
> > > > 18 East 41st Street, 2nd Floor
> > > > New York, NY 10017-6271
> > > >
> > > > www.appinions.com (http://www.appinions.com)
> > > >
> > > > Where Influence Isn’t a Game
> > > >
> > > >
> > > > On Tue, Nov 6, 2012 at 8:35 AM, Carlos Alexandro Becker
> > > > mailto:caarl...@gmail.com)> wrote:
> > > > > I
> > > > > got the following error in browser console:
> > > >
> > > >
> > > >
> http://localhost:8080/indexer/$%7BcontextPath%7D$%7BadminPath%7D?wt=json
> > >
> > >
> > >
> > >
> > > --
> > > Atenciosamente,
> > > *Carlos Alexandro Becker*
> > > https://profiles.google.com/caarlos0
> >
> >
> >
> >
> > --
> > Atenciosamente,
> > *Carlos Alexandro Becker*
> > https://profiles.google.com/caarlos0
>
>
>
>


-- 
Best regards,
*Carlos Alexandro Becker*
http://caarlos0.github.com/about


Re: Reply:Re: Where to get more documents or references about sold cloud?

2012-11-06 Thread Otis Gospodnetic
Hi,

On Mon, Nov 5, 2012 at 8:24 PM, SuoNayi  wrote:

> Thanks jack and thanks for the great country.
> All big famous websites such as google, slideshares and blogspot etc are
> blocked.
> What I want to know is more details about SolrCloud; here are my
> questions:
> 1.Can we control the relocation of shard / replica dynamically?
>

Don't think so, if you think manually.


> 2.Can we move shard between solr instances?
>

SolrCloud does this, there is no manual moving option now.


> 3.Is one solr instance related to one shard / replica?
>

A single shard, or a single replica of a shard, lives on just one Solr
server. A replica of a shard hosted on server A can and should live on
server B.

> 4. What's the sharding key algorithm?
>

Hashing on the doc key and # of nodes, I believe.
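As a rough illustration of that idea (an assumption based on the description above, not the exact Solr 4.0 code, which assigns hash ranges per shard), hash-based routing boils down to:

```java
public class HashRouting {
    // Route a document to one of numShards shards by hashing its unique key.
    // A simplified hash-mod sketch: stable for a given key as long as the
    // shard count does not change.
    static int shardFor(String docId, int numShards) {
        int h = docId.hashCode() & 0x7fffffff; // mask the sign bit
        return h % numShards;
    }

    public static void main(String[] args) {
        // The same key always lands on the same shard.
        System.out.println(shardFor("doc-42", 2) == shardFor("doc-42", 2)); // true
    }
}
```

The stability of this mapping is also why adding a shard later is hard: changing numShards silently re-routes existing keys, which is what shard splitting is meant to solve.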


> 5.Does it support custom sharding key?
>

Not yet, I believe.

See
http://search-lucene.com/?q=cloud+sharding&fc_project=Solr&fc_type=mail+_hash_+user&fc_type=jira

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html



>
>
> At 2012-11-05 20:44:46,"Jack Krupansky"  wrote:
> >Is most of the Web blocked in your location? When I Google "SolrCloud",
> >Google says that there are "About 61,400 results" with LOTS of informative
> >links, including blogs, videos, slideshares, etc. just on the first two
> >pages pf search results alone.
> >
> >If you have specific questions, please ask them with specific detail, but
> >try reading a few of the many sources of information available on the Web
> >first.
> >
> >-- Jack Krupansky
> >
> >-Original Message-
> >From: SuoNayi
> >Sent: Monday, November 05, 2012 3:32 AM
> >To: solr-user@lucene.apache.org
> >Subject: Where to get more documents or references about sold cloud?
> >
> >Hi all, there is only one entry about solr cloud on the
> >wiki,http://wiki.apache.org/solr/SolrCloud.
> >I have googled a lot and found no more details about solr cloud, or maybe
> I
> >miss something?
> >
>


New Index directory regardless of Solr.xml

2012-11-06 Thread Rasmussen, Chris
I have a five-node SolrCloud implementation running as a test with no
replication, using a three-node ZooKeeper ensemble.  Admittedly, I'm new to Solr
and just grinding it out.  I accidentally re-initialized ZooKeeper with the wrong
conf dir and I'm trying to recover.  I re-ran the initialization with the
correct conf dir, but now the indexes are reporting 0 documents.  Logs also
report that a new index was created in the dataDir, called "index". Previous
indexes were in a named directory based on slice/shard.  The previous indexes
don't appear to have any issues; I just can't "re-point" the Solr cores to
them.  The solr.xml file for one of the servers is:



  

  


I think I'm missing exactly what the instanceDir provides.  These were the 
directories created when I first set up the servers and where the indexes exist 
that I want to use.
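(The solr.xml content pasted above was stripped by the archive. For orientation only: a legacy-style Solr 4.x solr.xml looks roughly like the sketch below, where instanceDir is the per-core root directory that conf/ and, by default, data/ are resolved against. Names and paths here are placeholders, not the poster's actual values.)

```xml
<solr persistent="true">
  <cores adminPath="/admin/cores" host="localhost" hostPort="8983">
    <!-- instanceDir holds conf/ (and data/ unless dataDir re-points the index) -->
    <core name="shard1" instanceDir="shard1/" shard="shard1" collection="collection1"/>
  </cores>
</solr>
```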

Any thoughts? Or am I just completely off base here in my description of
the issue?

Chris


Deleting an individual document while delta index is running

2012-11-06 Thread Josh Turmel
Running Solr 3.3

We're running into issues where deleting individual documents (by ID) will
time out, but it only seems to happen when our hourly delta index is being
run to pull in new documents. Is there a way to work around this?

Thank you,
Josh


RE: Access DIH from inside application (via Solrj)?

2012-11-06 Thread Dyer, James
DIH & SolrJ don't really support what you want to do.  But you can make it work 
with code like this, which reloads the DIH configuration and checks for the 
response.  Just note this is quite brittle:  whenever the response changes in 
future versions of DIH, it'll break your code.

Map<String, String> paramMap = new HashMap<String, String>();
paramMap.put("command", "reload-config");
SolrParams params = new MapSolrParams(paramMap);
DirectXmlRequest req = new DirectXmlRequest("/dataimporthandler", null);
req.setMethod(METHOD.GET);
req.setParams(params);
NamedList<Object> nl = server.request(req);
String importResponse = (String) nl.get("importResponse");
boolean reloaded = false;
if ("Configuration Re-loaded sucessfully".equals(importResponse)) {
  reloaded = true;
}

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Billy Newman [mailto:newman...@gmail.com] 
Sent: Tuesday, November 06, 2012 3:00 PM
To: solr-user@lucene.apache.org
Subject: Access DIH from inside application (via Solrj)?

I know that you can access the DIH interface RESTfully, which works
pretty well for most cases.  I would like to know, however, if it is
possible to send/receive commands to a DIH via the SolrJ library.
Basically I would just like to be able to kick off the DIH and maybe
check its status.  I can work around this, but Java is not the best client
for handling/dealing with raw HTTP/XML.  If I could use SolrJ, the code
would probably be a lot more straightforward.

Thanks guys/gals,

Billy




Re: Solr / Velocity url rewrite

2012-11-06 Thread Erick Erickson
Not really. Mostly it's whatever you are most comfortable with. Since the
app <-> solr connection is just HTTP, the front-end is wide open.

FWIW,
Erick


On Tue, Nov 6, 2012 at 8:30 AM, Sébastien Dartigues <
sebastien.dartig...@gmail.com> wrote:

> Hi Erick,
>
> Thanks for your help.
> OK, except for the PHP client delivered as a sample, do you have a preference
> for an "out of the box", easily deployable front end?
> My main use case is to be compliant with SEO, or at least to give a nice
> (URL) entry point.
>
> Thanks.
>
>
> 2012/11/6 Erick Erickson 
>
> > Velocity/Solritas was never intended to be a user-facing app. How are you
> > locking things down so a user can't enter, for instance,
> > q=*:*&commit=true?
> >
> > I'd really recommend a proper middleware layer unless you have a trusted
> > user base...
> >
> > FWIW,
> > Erick
> >
> >
> > On Tue, Nov 6, 2012 at 4:20 AM, Sébastien Dartigues <
> > sebastien.dartig...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > Today i'm using solritas as front-end for the solr search engine.
> > >
> > > But i would like to do url rewriting to deliver urls more compliant
> with
> > > SEO.
> > >
> > > First the end user types that kind of url :
> > http://host.com/query/myquery
> > >
> > > So this url should be rewritten internally (kind of reverse proxy) to
> > > http://localhost:8983/query?q=myquery.
> > >
> > > This internal url should not be displayed to the end user and in return
> > > when the result page is displayed all the links in the page should be
> > > rewritten with a SEO compliant url.
> > >
> > > I tried to perform some tests with an apache front end by using
> mod_proxy
> > > but i didn't succeed to pass url parameters.
> > > Does someone ever tried to do SEO with solr search engine (solritas
> > front)?
> > >
> > > Thanks for your help.
> > >
> >
>
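For the mod_proxy attempt mentioned in the quoted question, a minimal mod_rewrite sketch (a hedged illustration, not a tested configuration; host names and the Solr handler path are assumptions) that proxies a clean URL to Solr while passing the path segment as the q parameter would be:

```apache
# In the host.com virtual host, with mod_rewrite and mod_proxy enabled.
RewriteEngine On
# Map /query/myquery to the internal Solr select handler; the [P] flag
# proxies the request instead of redirecting, so the internal URL is
# never shown to the end user.
RewriteRule ^/query/(.+)$ http://localhost:8983/solr/select?q=$1 [P,QSA]
```

Rewriting the links inside the result page back to SEO-friendly URLs still has to happen in the templates (or a middleware layer), since mod_rewrite only touches the request URL.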


Re: SolrCloud failover behavior

2012-11-06 Thread Erick Erickson
I was right for once ..

Thanks for updating the Wiki!

Erick


On Tue, Nov 6, 2012 at 9:42 AM, Nick Chase  wrote:

> Thanks a million, Erick!  You're right about killing both nodes hosting
> the shard.  I'll get the wiki corrected.
>
>   Nick
>
>
> On 11/3/2012 10:51 PM, Erick Erickson wrote:
>
>> SolrCloud doesn't work unless every shard has at least one server that is
>> up and running.
>>
>> I _think_ you might be killing both nodes that host one of the shards. The
>> admin
>> page has a link showing you the state of your cluster. So when this
>> happens,
>> does that page show both nodes for that shard being down?
>>
>> And yeah, SolrCloud requires a quorum of ZK nodes up. So with only one ZK
>> node, killing that will bring down the whole cluster. Which is why the
>> usual
>> recommendation is that ZK be run externally and usually an odd number of
>> ZK
>> nodes (three or more).
>>
>> Anyone can create a login and edit the Wiki, so any clarifications are
>> welcome!
>>
>> Best
>> Erick
>>
>>
>> On Sat, Nov 3, 2012 at 12:17 PM, Nick Chase  wrote:
>>
>>  I think there's a change in the behavior of SolrCloud vs. what's in the
>>> wiki, but I was hoping someone could confirm for me.  I checked JIRA and
>>> there were a couple of issues requesting partial results if one server
>>> comes down, but that doesn't seem to be the issue here.  I also checked
>>> CHANGES.txt and don't see anything that seems to apply.
>>>
>>> I'm running "Example B: Simple two shard cluster with shard replicas"
>>> from
>>> the wiki at https://wiki.apache.org/solr/SolrCloud and
>>> everything starts out as expected.  However, when I get to the part
>>>
>>> about fail over behavior is when things get a little wonky.
>>>
>>> I added data to the shard running on 7475.  If I kill 7500, a query to
>>> any
>>> of the other servers works fine.  But if I kill 7475, rather than getting
>>> zero results on a search to 8983 or 8900, I get a 503 error:
>>>
>>> <response>
>>>   <lst name="responseHeader">
>>>     <int name="status">503</int>
>>>     <int name="QTime">5</int>
>>>     <lst name="params">
>>>       <str name="q">*:*</str>
>>>     </lst>
>>>   </lst>
>>>   <lst name="error">
>>>     <str name="msg">no servers hosting shard:</str>
>>>     <int name="code">503</int>
>>>   </lst>
>>> </response>
>>>
>>> I don't see any errors in the consoles.
>>>
>>> Also, if I kill 8983, which includes the Zookeeper server, everything
>>> dies, rather than just staying in a steady state; the other servers
>>> continually show:
>>>
>>> Nov 03, 2012 11:39:34 AM org.apache.zookeeper.ClientCnxn$SendThread startConnect
>>> INFO: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:9983
>>> Nov 03, 2012 11:39:35 AM org.apache.zookeeper.ClientCnxn$SendThread run
>>> WARNING: Session 0x13ac6cf87890002 for server null, unexpected error,
>>> closing socket connection and attempting reconnect
>>> java.net.ConnectException: Connection refused: no further information
>>>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>         at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
>>>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1143)
>>> Nov 03, 2012 11:39:35 AM org.apache.zookeeper.ClientCnxn$SendThread startConnect
>>>
>>> over and over again, and a call to any of the servers shows a connection
>>> error to 8983.
>>>
>>> This is the current 4.0.0 release, running on Windows 7.
>>>
>>> If this is the proper behavior and the wiki needs updating, fine; I just
>>> need to know.  Otherwise if anybody has any clues as to what I may be
>>> missing, I'd be grateful. :)
>>>
>>> Thanks...
>>>
>>> ---  Nick
>>>
>>>
>>


Re: load balance with SolrCloud

2012-11-06 Thread Erick Erickson
This is a complex setup, all right.

A pluggable sharding strategy is definitely something that is on the
roadmap for SolrCloud, but hasn't made it into the code base yet.

Keep in mind, though, that all the SolrCloud goodness centers around the
idea of a single index that may be sharded. I don't think SolrCloud has had
time to really think about handling the situation in which you have a bunch
of cores that may or may not be sharded but are running on the same server.
I don't know that it _doesn't_ work, mind you, but that scenario doesn't
seem like the prime use-case for SolrCloud.

That said, I don't know that such a situation is  _not_ do-able in
SolrCloud. Mostly I haven't explored that kind of functionality yet.

Not much help, I know. I suspect that this is one of those cases where _we_
will learn from _you_ if you try to meld SolrCloud with your setup. Sounds
like a great Wiki page if you do pursue this!


Best
Erick


On Tue, Nov 6, 2012 at 4:58 PM, Jie Sun  wrote:

> Hi Eric,
> thanks for your information. I read all the related issues with SOLR-1293
> as
> your just pointed me to.
>
> It seems they are not very suitable for our scenario.
>
> We do have a couple of hundred cores (you are right, each customer
> corresponds to a core), typically on one Solr instance, and all of them
> need to be actively serving indexing and queries. So we do not have tens
> of thousands of cores of which only a part need to be loaded.
>
> Our issue is with some servers that host very large customers: they run
> out of disk space after some time due to the large amount of index data. I
> have written a RESTful service, deployed with Solr on Tomcat, that
> identifies the large customers' (cores') indexing requests and consults a
> DNS service; it then offloads the indexing requests to additional Solr
> servers, and supports queries using Solr shards on these servers going
> forward.
>
> We also have replicas for each shard, managed by our own software using a
> peer model (I am thinking about using Solr replication after 1.4).
>
> To me, SolrCloud is like sharding + replication + ZooKeeper. I could be
> wrong. But if I am right, then with very big existing data in our service,
> and with a lot of software already in place working pretty well on Solr
> 1.4, I am just trying to figure out whether it will be worth it to migrate
> the production system to SolrCloud.
>
> The problem we need to fix is in one area: I need to automate the off-load
> (sharding) process. Right now we use a monitoring system to watch the
> growth on each server. When we find a fast-growing large core (customer),
> we start to manually configure our DNS directory and add shard(s) to it
> (basically we create the same core name on a different Solr
> server/instance). My RESTful service going forward will then direct the
> queries for the customer onto these sharded cores using Solr shards.
>
> If SolrCloud cannot really help me automate this process, it is not very
> attractive to me right now. I have read some of the topics; I looked into
> distributed indexing, the distributed update processor... none of them
> help in the way I have been looking for. So I guess, using SolrCloud or
> not, I will need to write my own kind of 'load balancer' for indexing,
> unless I am wrong.
>
> I did come across Jon's white paper on Loggly, and I have designed a model
> based on what he has done. The solution should be able to create shards
> automatically, but it will need to rsync index files for a core to a
> different server and use Solr merge to merge small cores into larger
> cores, or use the core admin API to add a new core on the fly.
>
> Does this approach sound like something someone is already familiar with
> and has an out-of-the-box solution for? When I looked into SolrCloud, I
> was expecting some pluggable index-distribution policy factory I could
> customize. The closest thing I found was SOLR-2593 (a new core admin
> action 'split' for splitting an index), but that is not exactly what I
> wanted. Let me know if you can advise me more on this.
>
> thanks
> Jie
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/load-balance-with-SolrCloud-tp4018367p4018609.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Add new shard will be treated as replicas in Solr4.0?

2012-11-06 Thread Zeng Lames
got it. thanks a lot


On Tue, Nov 6, 2012 at 8:43 PM, Erick Erickson wrote:

> bq: where can i find all the items on the road map?
>
> Well, you really can't ... There's no "official" roadmap. I happen to
> know this since I follow the developer's list and I've seen references to
> this being important to the folks doing SolrCloud development work and it's
> been a recurring theme on the user's list. It's one of those things that
> _everybody_ understands would be useful in certain circumstances, but
> haven't had time to actually implement yet.
>
> You can track this at: https://issues.apache.org/jira/browse/SOLR-2592
>
> Best
> Erick
>
>
>
> On Mon, Nov 5, 2012 at 7:57 PM, Zeng Lames  wrote:
>
> > btw, where can i find all the items in the road map? thanks!
> >
> >
> > On Tue, Nov 6, 2012 at 8:55 AM, Zeng Lames  wrote:
> >
> > > Hi Erick, thanks for your kind response. I have got the information from
> > > the SolrCloud wiki.
> > > I think we may need to define the shard numbers when we really roll it
> > > out.
> > >
> > > thanks again
> > >
> > >
> > > On Mon, Nov 5, 2012 at 8:40 PM, Erick Erickson <
> erickerick...@gmail.com
> > >wrote:
> > >
> > >> Not at present. What you're interested in is "shard splitting" which
> is
> > >> certainly on the roadmap but not implemented yet. To expand the
> > >> number of shards you'll have to reconfigure, then re-index.
> > >>
> > >> Best
> > >> Erick
> > >>
> > >>
> > >> On Mon, Nov 5, 2012 at 4:09 AM, Zeng Lames 
> > wrote:
> > >>
> > >> > Dear All,
> > >> >
> > >> > We have an existing Solr collection, 2 shards, numOfShard is 2, and
> > >> > there are already records in the index files. Now we start another
> > >> > Solr instance with shardId=shard3, and found that Solr treats it as
> > >> > a replica.
> > >> >
> > >> > check the zookeeper data, found the range of shard doesn't
> > >> > change correspondingly. shard 1 is 0-7fff, while shard 2 is
> > >> > 8000-.
> > >> >
> > >> > is there any way to increase new shard for existing collection?
> > >> >
> > >> > thanks a lot!
> > >> > Lames
> > >> >
> > >>
> > >
> > >
> >
>


Re: How to re-read the config files in Solr, on a commit

2012-11-06 Thread roz dev
Erick

We have a requirement where a search admin can add or remove some synonyms
and would want these changes to be reflected in search thereafter.

Yes, we looked at the RELOAD command and it seems to be suitable for that
purpose. We have a master and slave setup, so it should be OK to issue the
reload command on the master. I expect that the slaves will pull the latest
config files.

Is the reload operation very costly, in terms of time and CPU? We have a
multicore setup and would need to issue a reload on multiple cores.

Thanks
Saroj


On Tue, Nov 6, 2012 at 5:02 AM, Erick Erickson wrote:

> Not that I know of. This would be extremely expensive in the usual case.
> Loading up configs, reconfiguring all the handlers etc. would add a huge
> amount of overhead to the commit operation, which is heavy enough as it is.
>
> What's the use-case here? Changing your configs really often and reading
> them on commit sounds like a way to make for a very confusing application!
>
> But if you really need to re-read all this info on a running system,
> consider the core admin RELOAD command.
>
> Best
> Erick
>
>
> On Mon, Nov 5, 2012 at 8:43 PM, roz dev  wrote:
>
> > Hi All
> >
> > I am keen to find out if Solr exposes any event listener or other hooks
> > which can be used to re-read configuration files.
> >
> >
> > I know that we have firstSearcher event but I am not sure if it causes
> > request handlers to reload themselves and read the conf files again.
> >
> > For example, if I change the synonym file and solr gets a commit, will it
> > re-initialize request handlers and re-read the conf files.
> >
> > Or, are there some events which can be listened to?
> >
> > Any inputs are welcome.
> >
> > Thanks
> > Saroj
> >
>
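
For reference, a core RELOAD is just an HTTP request to the CoreAdmin handler.
A minimal sketch — the host, port, and core name "core0" are hypothetical;
adjust them for your setup:

```shell
# Build the CoreAdmin RELOAD URL for one core ("core0" is a placeholder name).
core="core0"
url="http://localhost:8983/solr/admin/cores?action=RELOAD&core=${core}"
echo "$url"
# curl -s "$url"   # uncomment to actually issue the reload against a live Solr
```

For a multicore setup you would loop over the core names and issue one such
request per core.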


Re: load balance with SolrCloud

2012-11-06 Thread Jie Sun
thanks for your feedback Erick.

I am also aware of the current limitation that the number of shards in a
collection is fixed; changing the number requires re-configuring and
re-indexing. If that limitation gets lifted in a near-future release, I would
then consider setting up a collection for each customer, each with a varying
number of shards and replicas (depending on the customer's size; it should
grow dynamically).

So this will lead to having multiple collections on one Solr server
instance... I assume setting up n collections on one server is not an issue,
or is it? I am skeptical; see the example on the Solr wiki below, which seems
to start a Solr instance with a specific collection and its config:
cd example
java -Dbootstrap_confdir=./solr/collection1/conf
-Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar

thanks
Jie



--
View this message in context: 
http://lucene.472066.n3.nabble.com/load-balance-with-SolrCloud-tp4018367p4018659.html
Sent from the Solr - User mailing list archive at Nabble.com.
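
For reference, additional collections can also be created against a running
SolrCloud cluster through the Collections API, rather than via bootstrap
system properties at startup. A minimal sketch — the collection name and
shard count here are hypothetical, and the set of supported parameters
varies by Solr version:

```shell
# Build a Collections API CREATE URL ("customer_a" is a placeholder name).
name="customer_a"
shards=3
url="http://localhost:8983/solr/admin/collections?action=CREATE&name=${name}&numShards=${shards}"
echo "$url"
# curl -s "$url"   # uncomment to actually create the collection on a live cluster
```

Each collection created this way shares the same Solr instance, so several
collections per server is a supported setup.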


Re: How to re-read the config files in Solr, on a commit

2012-11-06 Thread Otis Gospodnetic
Hi,

Note about modifying synonyms: you really need to reindex if you are using
index-time synonyms. And if you are using search-time synonyms, you have the
multi-word synonym issue described on the wiki.

Otis
--
Performance Monitoring - http://sematext.com/spm
On Nov 6, 2012 11:02 PM, "roz dev"  wrote:

> Erick
>
> We have a requirement where a search admin can add or remove some synonyms and
> would want these changes to be reflected in search thereafter.
>
> yes, we looked at reload command and it seems to be suitable for that
> purpose. We have a master and slave setup so it should be OK to issue
> reload command on master. I expect that slaves will pull the latest config
> files.
>
> Is reload operation very costly, in terms of time and cpu? We have a
> multicore setup and would need to issue reload on multiple cores.
>
> Thanks
> Saroj
>
>
> On Tue, Nov 6, 2012 at 5:02 AM, Erick Erickson  >wrote:
>
> > Not that I know of. This would be extremely expensive in the usual case.
> > Loading up configs, reconfiguring all the handlers etc. would add a huge
> > amount of overhead to the commit operation, which is heavy enough as it
> is.
> >
> > What's the use-case here? Changing your configs really often and reading
> > them on commit sounds like a way to make for a very confusing
> application!
> >
> > But if you really need to re-read all this info on a running system,
> > consider the core admin RELOAD command.
> >
> > Best
> > Erick
> >
> >
> > On Mon, Nov 5, 2012 at 8:43 PM, roz dev  wrote:
> >
> > > Hi All
> > >
> > > I am keen to find out if Solr exposes any event listener or other hooks
> > > which can be used to re-read configuration files.
> > >
> > >
> > > I know that we have firstSearcher event but I am not sure if it causes
> > > request handlers to reload themselves and read the conf files again.
> > >
> > > For example, if I change the synonym file and solr gets a commit, will
> it
> > > re-initialize request handlers and re-read the conf files.
> > >
> > > Or, are there some events which can be listened to?
> > >
> > > Any inputs are welcome.
> > >
> > > Thanks
> > > Saroj
> > >
> >
>
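
For reference, the index-time vs. query-time choice is made per analyzer
chain in schema.xml. A minimal sketch of a field type that applies synonyms
only at query time — the field type name and filter ordering here are
illustrative, not a recommendation:

```xml
<!-- Hypothetical field type: synonyms expanded only at query time. -->
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- synonyms.txt lives in the core's conf directory -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With index-time synonyms the same filter would sit in the index analyzer
instead, which is exactly why changing synonyms.txt then requires a reindex,
while query-time synonyms only need a core RELOAD.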


Two questions about solrcloud

2012-11-06 Thread SuoNayi
Hi all, sorry for newbie questions about SolrCloud.
Here are my two questions:
1. If I have a SolrCloud cluster with two shards and 0 replicas on two
different servers: when one of the servers restarts, will the Solr instance
on that server replay the transaction log to make sure those operations are
persisted to the index files (i.e., commit the transaction log)?

2. Assuming I have a 3-shard cluster on 4 different servers, it will form a
cluster with 3 shards and 1 replica. Can I remove one server to reduce the
number of servers? If so, do I just need to shut down the server and
manually remove its node from ZooKeeper?
 
Regards

SuoNayi

Re: Solr Replication is not Possible on RAMDirectory?

2012-11-06 Thread deniz
Erik Hatcher-4 wrote
> There's an open issue (with a patch!) that enables this, it seems:
> ;
> 
>   Erik

Well, the patch does not seem to do that... I have tried it and am still
getting some error lines about the directory types.




-
Zeki ama calismiyor... Calissa yapar... (Smart, but doesn't apply himself... he'd manage if he did...)
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Replication-is-not-Possible-on-RAMDirectory-tp4017766p4018670.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to re-read the config files in Solr, on a commit

2012-11-06 Thread roz dev
Thanks Otis for pointing this out.

We may end up using search-time synonyms for single-word synonyms and
index-time synonyms for multi-word synonyms.

-Saroj


On Tue, Nov 6, 2012 at 8:09 PM, Otis Gospodnetic  wrote:

> Hi,
>
> Note about modifying synonyms - you need to reindex, really, if using
> index-time synonyms. And if you're using search-time synonyms you have
> multi-word synonym issue described on the Wiki.
>
> Otis
> --
> Performance Monitoring - http://sematext.com/spm
> On Nov 6, 2012 11:02 PM, "roz dev"  wrote:
>
> > Erick
> >
> > We have a requirement where a search admin can add or remove some synonyms
> and
> > would want these changes to be reflected in search thereafter.
> >
> > yes, we looked at reload command and it seems to be suitable for that
> > purpose. We have a master and slave setup so it should be OK to issue
> > reload command on master. I expect that slaves will pull the latest
> config
> > files.
> >
> > Is reload operation very costly, in terms of time and cpu? We have a
> > multicore setup and would need to issue reload on multiple cores.
> >
> > Thanks
> > Saroj
> >
> >
> > On Tue, Nov 6, 2012 at 5:02 AM, Erick Erickson  > >wrote:
> >
> > > Not that I know of. This would be extremely expensive in the usual
> case.
> > > Loading up configs, reconfiguring all the handlers etc. would add a
> huge
> > > amount of overhead to the commit operation, which is heavy enough as it
> > is.
> > >
> > > What's the use-case here? Changing your configs really often and
> reading
> > > them on commit sounds like a way to make for a very confusing
> > application!
> > >
> > > But if you really need to re-read all this info on a running system,
> > > consider the core admin RELOAD command.
> > >
> > > Best
> > > Erick
> > >
> > >
> > > On Mon, Nov 5, 2012 at 8:43 PM, roz dev  wrote:
> > >
> > > > Hi All
> > > >
> > > > I am keen to find out if Solr exposes any event listener or other
> hooks
> > > > which can be used to re-read configuration files.
> > > >
> > > >
> > > > I know that we have firstSearcher event but I am not sure if it
> causes
> > > > request handlers to reload themselves and read the conf files again.
> > > >
> > > > For example, if I change the synonym file and solr gets a commit,
> will
> > it
> > > > re-initialize request handlers and re-read the conf files.
> > > >
> > > > Or, are there some events which can be listened to?
> > > >
> > > > Any inputs are welcome.
> > > >
> > > > Thanks
> > > > Saroj
> > > >
> > >
> >
>