Re: Starts with Query

2012-06-15 Thread Michael Kuhlmann
It's not necessary to do this. You can simply rely on the fact 
that all digits are ordered contiguously in Unicode, so you can use a range 
query:


(f)q={!frange l=0 u=\: incl=true incu=false}title

This finds all documents where any token from the title field starts 
with a digit, so if you want to only find documents where the whole 
title starts with a digit, you need a second field with a string or 
untokenized text type. Use the copyField directive then, as Jack 
Krupansky already suggested in a previous reply.
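
For completeness, a sketch of issuing that filter from SolrJ (title_exact is 
hypothetical and stands for that untokenized copy field):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
SolrQuery q = new SolrQuery("*:*");
// ':' is the first character after '9' in Unicode, so the half-open
// range [0, ':') matches any value whose first character is a digit
q.addFilterQuery("{!frange l=0 u=\\: incl=true incu=false}title_exact");
QueryResponse rsp = server.query(q);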


Greetings,
Kuli


On 15.06.2012 08:38, Afroz Ahmad wrote:

If you are not searching for a specific digit and want to match all
documents that start with any digit, you could, as part of the indexing
process, have another field, say startsWithDigit, and set it to true if
the title begins with a digit. All you need to do at query time then
is query for startsWithDigit:true.
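
A minimal SolrJ sketch of that indexing-side approach (the field names, and
startsWithDigit in particular, are hypothetical):

import org.apache.solr.common.SolrInputDocument;

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", id);
doc.addField("title", title);
// compute the flag once, at index time
doc.addField("startsWithDigit",
    title.length() > 0 && Character.isDigit(title.charAt(0)));
server.add(doc);
// query time: fq=startsWithDigit:true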
Thanks
Afroz


From: nutchsolruser
Sent: 6/14/2012 11:03 PM
To: solr-user@lucene.apache.org
Subject: Re: Starts with Query
Thanks Jack for the valuable response. Actually I am trying to match *any*
numeric pattern at the start of each document. I don't know the documents in
the index; I just want documents whose title starts with any digit.





Re: IndexWrite in Lucene/Solr 3.5 is slower?

2012-06-15 Thread Ramprakash Ramamoorthy
On Fri, Jun 15, 2012 at 12:20 PM, pravesh  wrote:

> BTW, Have you changed the MergePolicy & MergeScheduler settings also? Since
> Lucene 3.x/3.5 onwards,
> there have been new MergePolicy & MergeScheduler implementations available,
> like TieredMergePolicy & ConcurrentMergeScheduler.
>
> Regards
> Pravesh
>
>

Thanks for the reply Pravesh. Yes I initially used the default
 TieredMergePolicy and later set the merge policy in both the versions to
LogByteSizeMergePolicy, in order to maintain congruence. But still Lucene
3.5 lagged behind by 2X approx.
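
(For reference, a minimal sketch of switching the merge policy on the Lucene
3.5 API; not necessarily the exact code used here:)

import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LogByteSizeMergePolicy;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

IndexWriterConfig conf = new IndexWriterConfig(
    Version.LUCENE_35, new StandardAnalyzer(Version.LUCENE_35));
// the default in 3.5 is TieredMergePolicy; override it explicitly
conf.setMergePolicy(new LogByteSizeMergePolicy());
IndexWriter writer = new IndexWriter(
    FSDirectory.open(new File("/path/to/index")), conf);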

-- 
With Thanks and Regards,
Ramprakash Ramamoorthy,
Engineer Trainee,
Zoho Corporation.
+91 9626975420


Re: DIH idle in transaction forever

2012-06-15 Thread Jasper Floor
Btw, I removed the batchSize, but performance is better with batchSize=1.
I haven't done further testing to see what the best setting is, but the
difference between setting it to 1 and not setting it is almost double the
indexing time (~20 minutes vs ~37 minutes).

On Thu, Jun 14, 2012 at 4:49 PM, Jasper Floor  wrote:
> Actually, the readOnly=true makes things worse.
> What it does (among other things) is:
>            c.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);
>
> which leads to:
> Caused by: org.postgresql.util.PSQLException: Cannot change
> transaction isolation level in the middle of a transaction.
>
> because the connection is idle in transaction.
>
> I found this issue:
> https://issues.apache.org/jira/browse/SOLR-2045
>
> Patching DIH with the code they suggest seems to work.
>
> mvg,
> Jasper
>
> On Thu, Jun 14, 2012 at 4:36 PM, Dyer, James  
> wrote:
>> Try readOnly="true" in the dataSource configuration.  This causes several 
>> defaults to get set in the JDBC connection, and often will solve problems 
>> like this. (see 
>> http://wiki.apache.org/solr/DataImportHandler#Configuring_JdbcDataSource)  
>> Also, try a batch size of 0 to let your jdbc driver pick what it thinks is 
>> optimal.  This might be better than 1.
>>
>> There is also an issue in that it doesn't explicitly close the resultset but 
>> relies on closing the connection to implicitly close the child objects.  I 
>> know when I tried using DIH with Derby a while back this had at the least 
>> caused some log warnings, and it wouldn't work at all without 
>> readOnly=false.  Not sure about PostgreSql.
>>
>> James Dyer
>> E-Commerce Systems
>> Ingram Content Group
>> (615) 213-4311
>>
>>
>> -Original Message-
>> From: Jasper Floor [mailto:jasper.fl...@m4n.nl]
>> Sent: Thursday, June 14, 2012 8:21 AM
>> To: solr-user@lucene.apache.org
>> Subject: DIH idle in transaction forever
>>
>> Hi all,
>>
>> It seems that DIH always holds two connections open to the database.
>> One of them is almost always 'idle in transaction'. It may sometimes
>> seem to do a little work but then it goes idle again.
>>
>>
>> datasource definition:
>>        <dataSource jndiName="java:ext_solr_datafeeds_dba" type="JdbcDataSource"
>> autoCommit="false" batchSize="1" />
>>
>> We have a datasource defined in the jndi:
>>        
>>                ext_solr_datafeeds_dba
>>                
>> ext_solr_datafeeds_dba_realm
>>                
>> jdbc:postgresql://db1.live.mbuyu.nl/datafeeds
>>                0
>>                5
>>                
>> TRANSACTION_READ_COMMITTED
>>                org.postgresql.Driver
>>                3
>>                5
>>                SELECT 1
>>                SELECT 
>> 1
>>        
>>
>>
>> If we set autocommit to true then we get an OOM on indexing so that is
>> not an option.
>>
>> Does anyone have any idea why this happens? I would guess that DIH
>> doesn't close the connection, but reading the code I can't be sure of
>> this. The ResultSet object should close itself once it reaches the
>> end.
>>
>> mvg,
>> JAsper


FileListEntityProcessor limit at 11 files?

2012-06-15 Thread Roland Ucker
Hello,

I'm using the DIH to index some PDFs.
Everything works fine for the first 11 files.
But after indexing 11 PDFs the process stops, regardless of which PDFs
are being indexed or of the directory structure (recursive="true").
The Lucene index for these 11 documents is valid.

Is there anything like a FileListEntityProcessor limit that can be set?

Regards,
Roland


Re: FilterCache - maximum size of document set

2012-06-15 Thread Erick Erickson
Test first, of course, but slave on 3.6 and master on 3.5 should be
fine. If you're
getting evictions with the cache settings that high, you really want
to look at why.

Note that in particular, using NOW in your filter queries virtually guarantees
that they won't be re-used as per the link I sent yesterday.
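
(Illustration with a hypothetical timestamp field: fq=timestamp:[NOW-1DAY TO
NOW] resolves NOW to the current millisecond, so every request creates a new
cache entry, while fq=timestamp:[NOW/DAY-1DAY TO NOW/DAY] rounds to day
boundaries and can be re-used all day.)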

Best
Erick

On Fri, Jun 15, 2012 at 1:15 AM, Pawel Rog  wrote:
> It may be true that the filter cache max size is set to too high a value.
> We looked at evictions and hit rate earlier. Maybe you are right that
> evictions are not always unwanted. Some time ago we ran tests. There is not
> such a big difference in hit rate between a filter maxSize of 4000 (hit rate
> about 85%) and 16000 (hit rate about 91%). I think that using an LFU cache
> could also be helpful, but it would require me to migrate to 3.6. Do you
> think it is reasonable to use a slave on version 3.6 and a master on 3.5?
>
> Once again, Thanks for your help
>
> --
> Pawel
>
> On Thu, Jun 14, 2012 at 7:22 PM, Erick Erickson 
> wrote:
>
>> Hmmm, your maxSize is pretty high, it may just be that you've set this
>> much higher
>> than is wise. The maxSize setting governs the number of entries. I'd start
>> with
>> a much lower number here, and monitor the solr/admin page for both
>> hit ratio and evictions. Well, and size too. 16,000 entries puts a
>> ceiling of, what,
>> 48G on it? Ouch! It sounds like what's happening here is you're just
>> accumulating
>> more and more fqs over the course of the evening and blowing memory.
>>
>> Not all FQs will be that big, there's some heuristics in there to just
>> store the
>> document numbers for sparse filters, maxDocs/8 is pretty much the upper
>> bound though.
>>
>> Evictions are not necessarily a bad thing, the hit-ratio is important
>> here. And
>> if you're using a bare NOW in your filter queries, you're probably never
>> re-using them anyway, see:
>>
>> http://www.lucidimagination.com/blog/2012/02/23/date-math-now-and-filter-queries/
>>
>> I really question whether this limit is reasonable, but you know your
>> situation best.
>>
>> Best
>> Erick
>>
>> On Wed, Jun 13, 2012 at 5:40 PM, Pawel Rog  wrote:
>> > Thanks for your response.
>> > Yes, maybe you are right. I thought that filters could be larger than 3M.
>> > Do all kinds of filters use a BitSet?
>> > Moreover, maxSize of filterCache is set to 16000 in my case. There are
>> > evictions during day traffic but not during night traffic.
>> >
>> > The version of Solr I use is 3.5.
>> >
>> > I haven't used a memory analyzer yet. Could you write more details about it?
>> >
>> > --
>> > Regards,
>> > Pawel
>> >
>> > On Wed, Jun 13, 2012 at 10:55 PM, Erick Erickson <
>> erickerick...@gmail.com>wrote:
>> >
>> >> Hmmm, I think you may be looking at the wrong thing here. Generally, a
>> >> filterCache
>> >> entry will be maxDocs/8 (plus some overhead), so in your case they
>> really
>> >> shouldn't be all that large, on the order of 3M/filter. That shouldn't
>> >> vary based
>> >> on the number of docs that match the fq, it's just a bitset. To see if
>> >> that makes any
>> >> sense, take a look at the admin page and the number of evictions in
>> >> your filterCache. If
>> >> that is > 0, you're probably using all the memory you're going to in
>> >> the filterCache during
>> >> the day..
>> >>
>> >> But you haven't indicated what version of Solr you're using, I'm going
>> >> from a
>> >> relatively recent 3x knowledge-base.
>> >>
>> >> Have you put a memory analyzer against your Solr instance to see where
>> >> the memory
>> >> is being used?
>> >>
>> >> Best
>> >> Erick
>> >>
>> >> On Wed, Jun 13, 2012 at 1:05 PM, Pawel  wrote:
>> >> > Hi,
>> >> > I have a Solr index with about 25M documents. I optimized the
>> >> > FilterCache size to reach the best performance (considering the traffic
>> >> > characteristics that my Solr handles). I see that the only way to limit
>> >> > the size of a filter cache is to set the number of document sets that
>> >> > Solr can cache. There is no way to set a memory limit (e.g. 2GB, 4GB or
>> >> > something like that). When I process standard traffic (during the day)
>> >> > everything is fine. But when Solr handles night traffic (and the
>> >> > characteristics of requests change) some problems appear. There is a
>> >> > JVM out of memory error. I know what the reason is. Some filters on
>> >> > some fields are quite poor filters. They return 15M documents or even
>> >> > more.
>> >> > You could say 'Just put that into q'. I tried to put those filters into
>> >> > the "Query" part but then the request processing time statistics
>> >> > (during the day) become much worse. Reducing the Filter Cache maxSize
>> >> > is also not a good solution because during the day cached filters are
>> >> > very, very helpful.
>> >> > You might be interested in the type of filters that I use. These are
>> >> > range filters (I tried standard range filters and frange) - e.g.
>> >> > price:[* TO 1]. Some fq with price can return a few thousa

SolrCloud subdirs in conf bootstrap dir

2012-06-15 Thread Markus Jelsma
Hi,

We'd like to create subdirectories for each collection in our conf bootstrap 
directory for cleaner maintenance and not having to include the collection name 
in each configuration file. However, it is not working:

2012-06-15 11:31:08,483 ERROR [solr.core.CoreContainer] - [main] - : 
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode 
for /configs/COLLECTION_NAME/solrconfig.xml

The solrconfig.xml is in boostrap_conf/dirname/solrconfig.xml and solr.xml's 
solrconfig attribute points to the proper file.

A better question might be: how can I nicely maintain multiple collection 
configuration directories in SolrCloud?

Thanks,
Markus


Re: Dedupe and overwriteDupes setting

2012-06-15 Thread Shameema Umer
Hi,
My solrconfig dedupe setting is as follows.

 

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <bool name="overwriteDupes">false</bool>
    <str name="signatureField">dupesign</str>
    <str name="fields">title,url</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

Even though overwriteDupes is set to false, search query results show the
contents are overwritten.

Is this because there are duplicate contents in Solr and the query results
display only the latest entry from the duplicates?

I actually need the "date" field not to be overwritten. Please help.

Thanks
Shameema




Re: IndexWrite in Lucene/Solr 3.5 is slower?

2012-06-15 Thread Ramprakash Ramamoorthy
On Fri, Jun 15, 2012 at 12:50 PM, Ramprakash Ramamoorthy <
youngestachie...@gmail.com> wrote:

>
>
> On Fri, Jun 15, 2012 at 12:20 PM, pravesh  wrote:
>
>> BTW, Have you changed the MergePolicy & MergeScheduler settings also?
>> Since
>> Lucene 3.x/3.5 onwards,
>> there have been new MergePolicy & MergeScheduler implementations
>> available,
>> like TieredMergePolicy & ConcurrentMergeScheduler.
>>
>> Regards
>> Pravesh
>>
>>
>
> Thanks for the reply Pravesh. Yes I initially used the default
>  TieredMergePolicy and later set the merge policy in both the versions to
> LogByteSizeMergePolicy, in order to maintain congruence. But still Lucene
> 3.5 lagged behind by 2X approx.
>
>
> --
> With Thanks and Regards,
> Ramprakash Ramamoorthy,
> Engineer Trainee,
> Zoho Corporation.
> +91 9626975420
>
>
Can someone help me with this please?

-- 
With Thanks and Regards,
Ramprakash Ramamoorthy,
Engineer Trainee,
Zoho Corporation.
+91 9626975420


Re: Building a heat map from geo data in index

2012-06-15 Thread Jamie Johnson
So I've tried this a bit, but I can't get it to look quite right.
What I was doing up until now was taking the center point of the
geohash cell as location for the value I am getting from the index.
Doing this you end up with what appears to be islands (using
HeatMap.js currently).  I guess what I would like to do is take this
information and generate a static image so I can quickly prototype
some things.  Are there any good Java based heatmap tools?  Also if
anyone has done this before any thoughts on how to do this would
really be appreciated.
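
(For reference, the prefix approach from the Stack Overflow link quoted below
boils down to indexing each point's geohash truncated to several lengths in a
multivalued field, then doing plain field faceting with a prefix, e.g.
facet=true&facet.field=geohash&facet.prefix=dr5&facet.mincount=1, where the
field name is hypothetical; each returned term/count pair is then a cell and
its heat value.)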

On Mon, Jun 11, 2012 at 12:52 PM, Jamie Johnson  wrote:
> Yeah I'll have to play to see how useful it is, I really don't know at
> this point.
>
> On another note, we are already using some binning like what is described in
> the wiki you sent, specifically http://code.google.com/p/javageomodel/ for
> other purposes.  Not sure if that could be used or not, guess I'd have
> to think on it harder.
>
>
> On Mon, Jun 11, 2012 at 12:04 PM, Tanguy Moal  wrote:
>> Yes it looks interesting and is not too difficult to do.
>> However, the length of the geohashes gives you very little control on the
>> size of the regions to colorize. Quoting wikipedia :
>> geohash length   km error
>>       1           ±2500
>>       2           ±630
>>       3           ±78
>>       4           ±20
>>       5           ±2.4
>>       6           ±0.61
>>       7           ±0.076
>>       8           ±0.019
>> This is interesting also : http://wiki.openstreetmap.org/wiki/QuadTiles
>> But it does what you're looking for, somehow :)
>>
>> --
>> Tanguy
>>
>>
>> 2012/6/11 Jamie Johnson 
>>
>>> If you look at the Stack response from David he had suggested breaking
>>> the geohash up into pieces and then using a prefix for refining
>>> precision.  I hadn't imagined limiting this to a particular area, just
>>> limiting it based on the prefix (which would be based on users zoom
>>> level or something) allowing the information to become more precise as
>>> the user zoomed in.  That seemed a very reasonable approach to the
>>> problem.
>>>
>>> On Mon, Jun 11, 2012 at 10:55 AM, Tanguy Moal 
>>> wrote:
>>> > There is definitely something interesting to do around geohashes.
>>> >
>>> > I'm wondering how one could map the N by N requested tiles to a range
>>> > of geohashes. (Where the gap would be a function of N.)
>>> > What I mean is that I don't know if a bijective function exists
>>> > between tiles and geohash ranges.
>>> > I don't even know if a contiguous range of geohashes ends up in a square
>>> > box.
>>> >
>>> > Because if you can find such a function, then you could probably solve
>>> > the issue by asking Solr for facet ranges on a geohash field.
>>> >
>>> > I don't know if that helps but the topic is very interesting to me...
>>> > Please share your findings, if any :-)
>>> >
>>> > --
>>> > Tanguy
>>> >
>>> > 2012/6/11 Dmitry Kan 
>>> >
>>> >> so it sounds to me that the geohash is just a hash representation of
>>> >> lat,lon coordinates for easier referencing (see e.g.
>>> >> http://en.wikipedia.org/wiki/Geohash).
>>> >> I would probably start with something easier, having bbox lat,lon
>>> >> coordinate pairs of top left corner (or in some coordinate systems, it
>>> is
>>> >> down left corner), break each bbox into cells of size w/N, h/N (and
>>> >> probably, that's equal numbers). Then you can loop over the cells and
>>> >> compute your facet counts with bbox of a cell. You could then evolve
>>> this
>>> >> to geohashes, if you want, but at least you would know where to start.
>>> >>
>>> >> -- Dmitry
>>> >>
>>> >> On Mon, Jun 11, 2012 at 4:48 PM, Jamie Johnson 
>>> wrote:
>>> >>
>>> >> > That is certainly an option but the collecting of the heat map data is
>>> >> > really the question.
>>> >> >
>>> >> > I saw this
>>> >> >
>>> >> >
>>> >> >
>>> >>
>>> http://stackoverflow.com/questions/8798711/solr-using-facets-to-sum-documents-based-on-variable-precision-geohashes
>>> >> >
>>> >> > but don't have a really good understanding of how this would be
>>> >> > accomplished.  I need to get a more firm understanding of geohashes as
>>> >> > my understanding is extremely lacking at this point.
>>> >> >
>>> >> > On Mon, Jun 11, 2012 at 8:55 AM, Stefan Matheis
>>> >> >  wrote:
>>> >> > > I'm not entirely sure, that it has to be that complicated .. what
>>> about
>>> >> > using for example http://www.patrick-wied.at/static/heatmapjs/ ? You
>>> >> > could collect all the geo-related data and do the (heat)map stuff on
>>> the
>>> >> > client.
>>> >> > >
>>> >> > >
>>> >> > >
>>> >> > > On Sunday, June 10, 2012 at 7:49 PM, Jamie Johnson wrote:
>>> >> > >
>>> >> > >> I had a request from a customer which to this point I have not seen
>>> >> > >> much similar so I figured I'd pose the question here. I've been
>>> asked
>>> >> > >> if it was possible to build a heat map from the results of a
>>> query. I
>>> >> > >> can imagine a process to do this through some post processing, but
>>> >> > >> that sounds very expensive for large/distributed indices so 

Re: FilterCache - maximum size of document set

2012-06-15 Thread Pawel Rog
Thanks.
I don't use NOW in queries. All my filters with timestamps are rounded to
hundreds of seconds to increase the hit rate. The only problem could be the
price filters, which vary a lot (users are unpredictable :P), but taking
those filters out of fq, or setting cache=false, is also a bad idea ...
checked it :) Load rose three times :)

--
Pawel

On Fri, Jun 15, 2012 at 1:30 PM, Erick Erickson wrote:

> Test first, of course, but slave on 3.6 and master on 3.5 should be
> fine. If you're
> getting evictions with the cache settings that high, you really want
> to look at why.
>
> Note that in particular, using NOW in your filter queries virtually
> guarantees
> that they won't be re-used as per the link I sent yesterday.
>
> Best
> Erick
>
> On Fri, Jun 15, 2012 at 1:15 AM, Pawel Rog  wrote:
> > It may be true that the filter cache max size is set to too high a value.
> > We looked at evictions and hit rate earlier. Maybe you are right that
> > evictions are not always unwanted. Some time ago we ran tests. There is
> > not such a big difference in hit rate between a filter maxSize of 4000
> > (hit rate about 85%) and 16000 (hit rate about 91%). I think that using
> > an LFU cache could also be helpful, but it would require me to migrate to
> > 3.6. Do you think it is reasonable to use a slave on version 3.6 and a
> > master on 3.5?
> >
> > Once again, Thanks for your help
> >
> > --
> > Pawel
> >
> > On Thu, Jun 14, 2012 at 7:22 PM, Erick Erickson  >wrote:
> >
> >> Hmmm, your maxSize is pretty high, it may just be that you've set this
> >> much higher
> >> than is wise. The maxSize setting governs the number of entries. I'd
> start
> >> with
> >> a much lower number here, and monitor the solr/admin page for both
> >> hit ratio and evictions. Well, and size too. 16,000 entries puts a
> >> ceiling of, what,
> >> 48G on it? Ouch! It sounds like what's happening here is you're just
> >> accumulating
> >> more and more fqs over the course of the evening and blowing memory.
> >>
> >> Not all FQs will be that big, there's some heuristics in there to just
> >> store the
> >> document numbers for sparse filters, maxDocs/8 is pretty much the upper
> >> bound though.
> >>
> >> Evictions are not necessarily a bad thing, the hit-ratio is important
> >> here. And
> >> if you're using a bare NOW in your filter queries, you're probably never
> >> re-using them anyway, see:
> >>
> >>
> http://www.lucidimagination.com/blog/2012/02/23/date-math-now-and-filter-queries/
> >>
> >> I really question whether this limit is reasonable, but you know your
> >> situation best.
> >>
> >> Best
> >> Erick
> >>
> >> On Wed, Jun 13, 2012 at 5:40 PM, Pawel Rog 
> wrote:
> >> > Thanks for your response.
> >> > Yes, maybe you are right. I thought that filters could be larger than 3M.
> >> > Do all kinds of filters use a BitSet?
> >> > Moreover, maxSize of filterCache is set to 16000 in my case. There are
> >> > evictions during day traffic but not during night traffic.
> >> >
> >> > The version of Solr I use is 3.5.
> >> >
> >> > I haven't used a memory analyzer yet. Could you write more details about it?
> >> >
> >> > --
> >> > Regards,
> >> > Pawel
> >> >
> >> > On Wed, Jun 13, 2012 at 10:55 PM, Erick Erickson <
> >> erickerick...@gmail.com>wrote:
> >> >
> >> >> Hmmm, I think you may be looking at the wrong thing here. Generally,
> a
> >> >> filterCache
> >> >> entry will be maxDocs/8 (plus some overhead), so in your case they
> >> really
> >> >> shouldn't be all that large, on the order of 3M/filter. That
> shouldn't
> >> >> vary based
> >> >> on the number of docs that match the fq, it's just a bitset. To see
> if
> >> >> that makes any
> >> >> sense, take a look at the admin page and the number of evictions in
> >> >> your filterCache. If
> >> >> that is > 0, you're probably using all the memory you're going to in
> >> >> the filterCache during
> >> >> the day..
> >> >>
> >> >> But you haven't indicated what version of Solr you're using, I'm
> going
> >> >> from a
> >> >> relatively recent 3x knowledge-base.
> >> >>
> >> >> Have you put a memory analyzer against your Solr instance to see
> where
> >> >> the memory
> >> >> is being used?
> >> >>
> >> >> Best
> >> >> Erick
> >> >>
> >> >> On Wed, Jun 13, 2012 at 1:05 PM, Pawel 
> wrote:
> >> >> > Hi,
> >> >> > I have a Solr index with about 25M documents. I optimized the
> >> >> > FilterCache size to reach the best performance (considering the
> >> >> > traffic characteristics that my Solr handles). I see that the only
> >> >> > way to limit the size of a filter cache is to set the number of
> >> >> > document sets that Solr can cache. There is no way to set a memory
> >> >> > limit (e.g. 2GB, 4GB or something like that). When I process standard
> >> >> > traffic (during the day) everything is fine. But when Solr handles
> >> >> > night traffic (and the characteristics of requests change) some
> >> >> > problems appear. There is a JVM out of memory error. I know

SolrCloud and split-brain

2012-06-15 Thread Otis Gospodnetic
Hi,

How exactly does SolrCloud handle split brain situations?

Imagine a cluster of 10 nodes.
Imagine 3 of them being connected to the network by some switch and imagine the 
out port of this switch dies.
When that happens, these 3 nodes will be disconnected from the other 7 nodes 
and we'll have 2 clusters, one with 3 nodes and one with 7 nodes and we'll have 
a split brain situation.  
Imagine we had 3 ZK nodes in the original 10-node cluster, 2 of which are 
connected to the dead switch and are thus aware only of the 3 node cluster now, 
and 1 ZK instance which is on a different switch and is thus aware only of the 
7 node cluster.

At this point how exactly does ZK make SolrCloud immune to split brain?


Does LBHttpSolrServer play a key role here? (I see LBHttpSolrServer mentioned 
only once on http://wiki.apache.org/solr/SolrCloud and with a question mark 
next to it)


Thanks,
Otis

Performance Monitoring for Solr / ElasticSearch / HBase - 
http://sematext.com/spm


Re: SolrCloud and split-brain

2012-06-15 Thread Yury Kats
On 6/15/2012 12:49 PM, Otis Gospodnetic wrote:
> Hi,
> 
> How exactly does SolrCloud handle split brain situations?
> 
> Imagine a cluster of 10 nodes.
> Imagine 3 of them being connected to the network by some switch and imagine 
> the out port of this switch dies.
> When that happens, these 3 nodes will be disconnected from the other 7 nodes 
> and we'll have 2 clusters, one with 3 nodes and one with 7 nodes and we'll 
> have a split brain situation.  
> Imagine we had 3 ZK nodes in the original 10-node cluster, 2 of which are 
> connected to the dead switch and are thus aware only of the 3 node cluster 
> now, and 1 ZK instance which is on a different switch and is thus aware only 
> of the 7 node cluster.
> 
> At this point how exactly does ZK make SolrCloud immune to split brain?

A quorum of N/2+1 nodes is required to operate (that's also the reason you need 
at least 3 to begin with)
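
(For N=3 ZK nodes that quorum is floor(3/2)+1 = 2, so in the example above the
partition holding 2 of the 3 ZK nodes keeps a functioning ensemble, while the
side with only 1 ZK node does not.)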


StreamingUpdateSolrServer Connection Timeout Setting

2012-06-15 Thread Kissue Kissue
Hi,

Does anybody know what the default connection timeout setting is for
StreamingUpdateSolrServer? Can i explicitly set one and how?

Thanks.


Re: SolrCloud and split-brain

2012-06-15 Thread Mark Miller
Zookeeper avoids split brain using Paxos (or something very like it - I can't 
remember if they extended it or modified and/or what they call it).

So you will only ever see one Zookeeper cluster - the smaller partition will be 
down. There is a proof for Paxos if I remember right.

Zookeeper then acts as the system of record for Solr. Solr won't auto form its 
own new little clusters - *the* cluster is modeled in Zookeeper and that's the 
cluster. So Solr does not find it self organizing new mini clusters on 
partition splits.

When we lose our connection to Zookeeper, update requests are no longer 
accepted, because we may have a stale cluster view and not know it for a long 
period of time.


On Jun 15, 2012, at 12:49 PM, Otis Gospodnetic wrote:

> Hi,
> 
> How exactly does SolrCloud handle split brain situations?
> 
> Imagine a cluster of 10 nodes.
> Imagine 3 of them being connected to the network by some switch and imagine 
> the out port of this switch dies.
> When that happens, these 3 nodes will be disconnected from the other 7 nodes 
> and we'll have 2 clusters, one with 3 nodes and one with 7 nodes and we'll 
> have a split brain situation.  
> Imagine we had 3 ZK nodes in the original 10-node cluster, 2 of which are 
> connected to the dead switch and are thus aware only of the 3 node cluster 
> now, and 1 ZK instance which is on a different switch and is thus aware only 
> of the 7 node cluster.
> 
> At this point how exactly does ZK make SolrCloud immune to split brain?
> 
> 
> Does LBHttpSolrServer play a key role here? (I see LBHttpSolrServer mentioned 
> only once on http://wiki.apache.org/solr/SolrCloud and with a question mark 
> next to it)
> 
> 
> Thanks,
> Otis
> 
> Performance Monitoring for Solr / ElasticSearch / HBase - 
> http://sematext.com/spm

- Mark Miller
lucidimagination.com













Re: SolrCloud and split-brain

2012-06-15 Thread Otis Gospodnetic
Hi,
 
> Zookeeper avoids split brain using Paxos (or something very like it - I 

> can't remember if they extended it or modified and/or what they call it).
> 
> So you will only ever see one Zookeeper cluster - the smaller partition will 
> be 
> down. There is a proof for Paxos if I remember right.
> 
> Zookeeper then acts as the system of record for Solr. Solr won't auto form 
> its own new little clusters - *the* cluster is modeled in Zookeeper and 
> that's the cluster. So Solr does not find it self organizing new mini 
> clusters on partition splits.
> 
> When we lose our connection to Zookeeper, update requests are no longer 
> accepted, because we may have a stale cluster view and not know it for a long 
> period of time.


Does this work even when outside clients (apps for indexing or searching) send 
their requests directly to individual nodes?
Let's use the example from my email where we end up with 2 groups of nodes: 
7-node group with 2 ZK nodes on the same network and 3-node group with 1 ZK 
node on the same network.

If a client sends a request to a node in the 7-node group what happens?
And if a client sends a request to a node in the 3-node group what happens?

Yury wrote:
> A quorum of N/2+1 nodes is required to operate (that's also the reason you 
>need at least 3 to begin with)

N=3 (ZK nodes), right?
So in that case we need at least 3/2+1 => 2.5 ZK nodes to operate.  So in my 
example neither the 7-node group nor the 3-node group will operate (does that 
mean request rejection or something else?) because neither sees 2.5 ZK nodes?

Thanks,
Otis

Performance Monitoring for Solr / ElasticSearch / HBase - 
http://sematext.com/spm 




> On Jun 15, 2012, at 12:49 PM, Otis Gospodnetic wrote:
> 
>>  Hi,
>> 
>>  How exactly does SolrCloud handle split brain situations?
>> 
>>  Imagine a cluster of 10 nodes.
>>  Imagine 3 of them being connected to the network by some switch and imagine 
> the out port of this switch dies.
>>  When that happens, these 3 nodes will be disconnected from the other 7 
> nodes and we'll have 2 clusters, one with 3 nodes and one with 7 nodes and 
> we'll have a split brain situation.  
>>  Imagine we had 3 ZK nodes in the original 10-node cluster, 2 of which are 
> connected to the dead switch and are thus aware only of the 3 node cluster 
> now, 
> and 1 ZK instance which is on a different switch and is thus aware only of 
> the 7 
> node cluster.
>> 
>>  At this point how exactly does ZK make SolrCloud immune to split brain?
>> 
>> 
>>  Does LBHttpSolrServer play a key role here? (I see LBHttpSolrServer 
> mentioned only once on http://wiki.apache.org/solr/SolrCloud and with a 
> question 
> mark next to it)
>> 
>> 
>>  Thanks,
>>  Otis
>>  
>>  Performance Monitoring for Solr / ElasticSearch / HBase - 
> http://sematext.com/spm
> 
> - Mark Miller
> lucidimagination.com
>


Re: SolrCloud and split-brain

2012-06-15 Thread Mark Miller

On Jun 15, 2012, at 1:44 PM, Otis Gospodnetic wrote:

> Does this work even when outside clients (apps for indexing or searching) 
> send their requests directly to individual nodes?
> Let's use the example from my email where we end up with 2 groups of nodes: 
> 7-node group with 2 ZK nodes on the same network and 3-node group with 1 ZK 
> node on the same network.

The 3-node group with 1 ZK would not have a functioning zk - so it would stop 
accepting updates. If it could serve a complete view of the index, it would 
though, for searches.

The 7-node group would have a working ZK it could talk to, and it would 
continue to accept updates as long as a node for a shard for that hash range is 
up. It would also of course serve searches.

In this case, hitting a box in the 3-node group for searches would start 
becoming stale. A smart client would no longer hit those boxes though.

If you have a 'dumb' client or load balancer, then yes - you would have to 
remove the bad nodes from rotation.

We could improve this or make the behavior configurable. At least initially 
though, we figured it was better if we kept serving searches even when we 
cannot talk to zookeeper.
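
(For concreteness, a "smart client" here is something like SolrJ's
CloudSolrServer, which reads live cluster state from ZooKeeper instead of
being pointed at fixed nodes; a rough sketch:)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

CloudSolrServer client = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
client.setDefaultCollection("collection1");
// requests are only routed to nodes that ZooKeeper reports as live
QueryResponse rsp = client.query(new SolrQuery("*:*"));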

> 
> If a client sends a request to a node in the 7-node group what happens?
> And if a client sends a request to a node in the 3-node group what happens?

- Mark Miller
lucidimagination.com













Re: StreamingUpdateSolrServer Connection Timeout Setting

2012-06-15 Thread Sami Siren
The api doc for version 3.6.0 is available here:
http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.html

I think the default is coming from your OS if you are not setting it explicitly.
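
For example (a sketch against SolrJ 3.x, where these setters are inherited
from CommonsHttpSolrServer; values are in milliseconds):

import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;

StreamingUpdateSolrServer server =
    new StreamingUpdateSolrServer("http://localhost:8983/solr", 20, 4);
server.setConnectionTimeout(5000); // time allowed to establish the connection
server.setSoTimeout(30000);        // socket read timeout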

--
 Sami Siren

On Fri, Jun 15, 2012 at 8:22 PM, Kissue Kissue  wrote:
> Hi,
>
> Does anybody know what the default connection timeout setting is for
> StreamingUpdateSolrServer? Can i explicitly set one and how?
>
> Thanks.


Re: SolrCloud and split-brain

2012-06-15 Thread Otis Gospodnetic
Ola,

Thanks Mark!
 
>>  Does this work even when outside clients (apps for indexing or searching) 

> send their requests directly to individual nodes?
>>  Let's use the example from my email where we end up with 2 groups of 
> nodes: 7-node group with 2 ZK nodes on the same network and 3-node group with 
> 1 
> ZK node on the same network.
> 
> The 3-node group with 1 ZK would not have a functioning zk - so it would stop 
> accepting updates. If it could serve a complete view of the index, it would 
> though, for searches.


So in this case information in this 1 ZK node would tell the 3 Solr nodes 
whether they have all index data or if some shards are missing (i.e. were only 
on nodes in the other 7-node group)?
And if nodes figure out they don't have all index data they will reject search 
requests?  Or will they accept and perform searches, but return responses that 
tell the client that the searched index was not complete?

> The 7-node group would have a working ZK it could talk to, and it would 
> continue 
> to accept updates as long as a node for a shard for that hash range is up. It 
> would also of course serve searches.


Right, so if the node for the shard where a doc is supposed to go to is in that 
3-node group, then the indexing request will be rejected.  Is this correct?

> In this case, hitting a box in the 3-node group for searches would start 
> becoming stale. A smart client would no longer hit those boxes though.
> 
> If you have a 'dumb' client or load balancer, then yes - you would have 
> to remove the bad nodes from rotation.


Aha, yes and yes.

> We could improve this or make the behavior configurable. At least initially 
> though, we figured it was better if we kept serving searches even when we 
> cannot 
> talk to zookeeper.


Makes sense.  Do responses carry something to alert the client that "something 
is rotten in the state of cluster"?

Thanks,

Otis

Performance Monitoring for Solr / ElasticSearch / HBase - 
http://sematext.com/spm 


Re: SolrCloud and split-brain

2012-06-15 Thread Mark Miller

On Jun 15, 2012, at 2:12 PM, Otis Gospodnetic wrote:

> Makes sense.  Do responses carry something to alert the client that 
> "something is rotten in the state of cluster"?

No, I don't think so - we should probably add that to the header similar to how 
I assume partial results will work.

Feel free to fire up a JIRA issue for that.

- Mark Miller
lucidimagination.com













Re: SolrCloud and split-brain

2012-06-15 Thread Otis Gospodnetic
Thanks Mark, will open an issue in a bit.

But I think the following is the real meat of the Q about split brain and 
SolrCloud, especially when it comes to how indexing is handled during split 
brain:

>>  Does this work even when outside clients (apps for indexing or searching) 

> send their requests directly to individual nodes?
>>  Let's use the example from my email where we end up with 2 groups of 
> nodes: 7-node group with 2 ZK nodes on the same network and 3-node group with 
> 1 
> ZK node on the same network.
> 
> The 3-node group with 1 ZK would not have a functioning zk - so it would stop 
> accepting updates. If it could serve a complete view of the index, it would 
> though, for searches.

So in this case information in this 1 ZK node would tell the 3 Solr nodes 
whether they have all index data or if some shards are missing (i.e. were only 
on nodes in the other 7-node group)?
And if nodes figure out they don't have all index data they will reject search 
requests?  Or will they accept and perform searches, but return responses that 
tell the client that the searched index was not complete?

> The 7-node group would have a working ZK it could talk to, and it would 
> continue 
> to accept updates as long as a node for a shard for that hash range is up. It 
> would also of course serve searches.

Right, so if the node for the shard where a doc is supposed to go to is in that 
3-node group, then the indexing request will be rejected.  Is this correct? 



Otis 

Performance Monitoring for Solr / ElasticSearch / HBase - 
http://sematext.com/spm 



- Original Message -
> From: Mark Miller 
> To: solr-user 
> Cc: 
> Sent: Friday, June 15, 2012 2:22 PM
> Subject: Re: SolrCloud and split-brain
> 
> 
> On Jun 15, 2012, at 2:12 PM, Otis Gospodnetic wrote:
> 
>>  Makes sense.  Do responses carry something to alert the client that 
> "something is rotten in the state of cluster"?
> 
> No, I don't think so - we should probably add that to the header similar to 
> how I assume partial results will work.
> 
> Feel free to fire up a JIRA issue for that.
> 
> - Mark Miller
> lucidimagination.com
>


WordBreak and default dictionary crash Solr

2012-06-15 Thread Carrie Coy

Is this a configuration problem or a bug?

We use two dictionaries: default (spellcheckerFreq) and 
solr.WordBreakSolrSpellChecker.  When a query contains 2 misspellings, 
one corrected by the default dictionary and the other corrected by the 
wordbreak dictionary ("strawberryn shortcake"), Solr crashes with the error 
below.  It doesn't matter which dictionary is checked first.


java.lang.NullPointerException
at 
org.apache.solr.handler.component.SpellCheckComponent.toNamedList(SpellCheckComponent.java:566)
at 
org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:177)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:204)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at 
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:1555)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)

at java.lang.Thread.run(Thread.java:662)


Multiple errors corrected by the SAME dictionary (either wordbreak or 
default) do not crash Solr.  Here is an excerpt from our solrconfig.xml:



<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">spell</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">1</int>
  </lst>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="spellcheckIndexDir">spellcheckerFreq</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>





  ...
  <str name="spellcheck.dictionary">wordbreak</str>
  <str name="spellcheck.dictionary">default</str>
  <str name="spellcheck.count">3</str>
  <str name="spellcheck.collate">true</str>
  <str name="spellcheck.extendedResults">false</str>







Re: SolrCloud and split-brain

2012-06-15 Thread Mark Miller

On Jun 15, 2012, at 3:21 PM, Otis Gospodnetic wrote:

> Thanks Mark, will open an issue in a bit.
> 
> But I think the following is the real meat of the Q about split brain and 
> SolrCloud, especially when it comes to how indexing is handled during split 
> brain:
> 
>>>   Does this work even when outside clients (apps for indexing or searching) 
> 
>> send their requests directly to individual nodes?
>>>   Let's use the example from my email where we end up with 2 groups of 
>> nodes: 7-node group with 2 ZK nodes on the same network and 3-node group 
>> with 1 
>> ZK node on the same network.
>>  
>> The 3-node group with 1 ZK would not have a functioning zk - so it would 
>> stop 
>> accepting updates. If it could serve a complete view of the index, it would 
>> though, for searches.
> 
> So in this case information in this 1 ZK node would tell the 3 Solr nodes 
> whether they have all index data or if some shards are missing (i.e. were 
> only on nodes in the other 7-node group)?
> And if nodes figure out they don't have all index data they will reject 
> search requests?  Or will they accept and perform searches, but return 
> responses that tell the client that the searched index was not complete?

The 1 ZK node will not function, so the 3 Solr nodes will not accept updates.

If there is one replica for each shard available, search will still work. I 
don't think partial results has been committed yet for distrib search. In that 
case, we will put something in the header to indicate a full copy of the index 
was not available. I think we can also add something in the header if we know 
we cannot talk to zookeeper to let the client know it could be seeing stale 
state. SmartClients that talked to zookeeper would see those nodes appear as 
down in zookeeper and stop trying to talk to them.

> 
>> The 7-node group would have a working ZK it could talk to, and it would 
>> continue 
>> to accept updates as long as a node for a shard for that hash range is up. 
>> It 
>> would also of course serve searches.
> 
> Right, so if the node for the shard where a doc is supposed to go to is in 
> that 3-node group, then the indexing request will be rejected.  Is this 
> correct? 

it depends on what is available - but you will need at least one replica for 
each shard available - eg your partition needs to have one copy of the index - 
otherwise updates are rejected if there are no nodes hosting a shard of the 
hash range. So if a replica made it into the larger partition, you will be fine 
- it will become the leader.

> 
> 
> 
> Otis 
> 
> Performance Monitoring for Solr / ElasticSearch / HBase - 
> http://sematext.com/spm 
> 
> 
> 
> - Original Message -
>> From: Mark Miller 
>> To: solr-user 
>> Cc: 
>> Sent: Friday, June 15, 2012 2:22 PM
>> Subject: Re: SolrCloud and split-brain
>> 
>> 
>> On Jun 15, 2012, at 2:12 PM, Otis Gospodnetic wrote:
>> 
>>> Makes sense.  Do responses carry something to alert the client that 
>> "something is rotten in the state of cluster"?
>> 
>> No, I don't think so - we should probably add that to the header similar to 
>> how I assume partial results will work.
>> 
>> Feel free to fire up a JIRA issue for that.
>> 
>> - Mark Miller
>> lucidimagination.com
>> 

- Mark Miller
lucidimagination.com













RE: WordBreak and default dictionary crash Solr

2012-06-15 Thread Dyer, James
Carrie,

Thank you for trying out new features!  I'm pretty sure you've found a bug 
here.  Could you tell me whether you're using a build from Trunk or Solr_4x ?  
Also, do you know the svn revision or the Jenkins build # (or timestamp) you're 
working from?

Could you try instead to use DirectSolrSpellChecker instead of 
IndexBasedSpellChecker for your "default" dictionary?  (In Trunk and the 4.x 
branch, the Solr Example now uses DirectSolrSpellChecker as its default.)  It 
could be this is a problem related to using WordBreakSolrSpellChecker with the 
older IndexBasedSpellChecker.  So if you have better luck with 
DirectSolrSpellChecker, that would be helpful in honing in on the exact problem.

Also, judging from the line that is failing, could it be you're using a build 
based on svn revision pre-r1346489 (Trunk) or pre-r1346499 (Branch_4x) ?  
https://issues.apache.org/jira/browse/SOLR-2993  Shortly after the initial 
commit of this feature, a bug similar to the one you're reporting was later 
fixed with these subsequent revisions.  

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Carrie Coy [mailto:c...@ssww.com] 
Sent: Friday, June 15, 2012 2:46 PM
To: solr-user@lucene.apache.org
Subject: WordBreak and default dictionary crash Solr

Is this a configuration problem or a bug?

We use two dictionaries: default (spellcheckerFreq) and 
solr.WordBreakSolrSpellChecker.  When a query contains 2 misspellings, 
one corrected by the default dictionary and the other corrected by the 
wordbreak dictionary ("strawberryn shortcake"), Solr crashes with the error 
below.  It doesn't matter which dictionary is checked first.

java.lang.NullPointerException
 at 
org.apache.solr.handler.component.SpellCheckComponent.toNamedList(SpellCheckComponent.java:566)
 at 
org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:177)
 at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:204)
 at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at 
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1555)
 at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
 at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
 at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
 at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
 at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
 at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
 at java.lang.Thread.run(Thread.java:662)


Multiple errors corrected by the SAME dictionary (either wordbreak or 
default) do not crash Solr.  Here is an excerpt from our solrconfig.xml:


<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">spell</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">1</int>
  </lst>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="spellcheckIndexDir">spellcheckerFreq</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>





  ...
  <str name="spellcheck.dictionary">wordbreak</str>
  <str name="spellcheck.dictionary">default</str>
  <str name="spellcheck.count">3</str>
  <str name="spellcheck.collate">true</str>
  <str name="spellcheck.extendedResults">false</str>







Re: How to boost a field with another field's value?

2012-06-15 Thread smita
Actually I have a title field that I am searching for my query term, and the
documents have a rating field that I want to boost the results by, so the
higher rated items appear before the lower rated documents.

I am also boosting results on another field using bq:

q=summer&df=title&bq=sponsored:true^5.0&qf=rating^2.0&defType=dismax

However, when I use qf to boost the results by rating, Solr is trying to
match the query in the rating field. How can I accomplish boosting by rating
using query-time boosting?







Writing index files that have the right owner

2012-06-15 Thread Mike O'Leary
I have been putting together an application using Quartz to run several 
indexing jobs in sequence using SolrJ and Tomcat on Windows. I would like the 
Quartz job to do the following:

1.   Delete index directories from the cores so each indexing job starts 
fresh with empty indexes to populate.

2.   Start the Tomcat server.

3.   Run the indexing job.

4.   Stop the Tomcat server.

5.   Copy the index directories to an archive.

Steps 2-5 work fine, but I haven't been able to find a way to delete the index 
directories from within Java. I also can't delete them from a Windows command 
shell window: I get an error message that says "Access is denied". The reason 
for this is that the index directories and files have the owner 
"BUILTIN\Administrators". Although I am an administrator on this machine, the 
fact that these files have a different owner means that I can only delete them 
in a Windows command shell window if I start it with "Run as administrator". I 
spent a bunch of time today trying every Java function and Windows shell 
command I could find that would let me change the owner of these files, grant 
my user account the capability to delete the files, etc. Nothing I tried 
worked, likely because along with not having permission to delete the files, I 
also don't have permission to give myself permission to delete the files.
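
(For reference, step 1 amounts to nothing more than a recursive delete; a
plain-Java sketch like this one fails here with "Access is denied" for the
ownership reasons described above:)

import java.io.File;
import java.io.IOException;

static void deleteRecursively(File f) throws IOException {
    File[] children = f.listFiles(); // null when f is not a directory
    if (children != null) {
        for (File child : children) {
            deleteRecursively(child);
        }
    }
    if (!f.delete()) {
        throw new IOException("Unable to delete " + f);
    }
}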

At a certain point I stopped wondering how to change the files owner or 
permissions and started wondering why the files have "BUILTIN\Administrators" 
as owner, and the permissions associated with that owner, in the first place. 
Is there somewhere in the Solr or Tomcat configuration files, or in the SolrJ 
code, where I can set who the owner of files written to the index directories 
should be?
Thanks,
Mike


Re: Solr Search Count Variance

2012-06-15 Thread Jack Krupansky
The "variance" is likely simply due to the fact that your "text" field is 
analyzed differently than the source fields you include in your dismax "qf". 
For example, some of them may be "string" fields with no analysis. So, fewer 
of those fields are matching on your query terms when using dismax.
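
(For example, if "id" were a "string" field, the term "Log" would only match a
document whose id is exactly "Log", while the analyzed catch-all "text" field
also matches documents that merely contain "Log" as a word.)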


Look at the results of both queries and then try querying on the specific 
fields of a document that is found by the traditional Lucene/Solr query 
parser but not found using dismax.


-- Jack Krupansky

-Original Message- 
From: mechravi25

Sent: Friday, June 15, 2012 1:16 AM
To: solr-user@lucene.apache.org
Subject: Solr Search Count Variance

Hi all,

When we give a search request to Solr, the part of the request URL
having the search query will be as follows:

/select/?qf=name%5e2.3+text+r_name%5e0.3+id%5e0.3+xid%5e0.3&fl=*&f.tFacet.facet.mincount=1&facet.field=tFacet&f.rFacet.facet.mincount=1&facet.field=rFacet&facet=true&hl.fl=*&hl=true&rows=10&start=0&q=test+Log&debugQuery=on?

We find the number of documents returned to be 5000 (approx.). Here, it makes
use of the standard handler and we get the parsed query as follows

(text:Cxx1 text:test) (text:Dyy3 text:Log)
(text:Cxx1 text:test) (text:Dyy3
text:Log)

here, text is the default field and this is used by the standard handler and
it is the destination field for all the other fields.

The same way, when we alter the above url to fetch the result by using the
dismax handler,

/select/?qf=name%5e2.3+text+r_name%5e0.3+id%5e0.3+xid%5e0.3&qt=dismax&fl=*&f.tFacet.facet.mincount=1&facet.field=tFacet&f.rFacet.facet.mincount=1&facet.field=rFacet&facet=true&hl.fl=*&hl=true&rows=10&start=0&q=test+Log&debugQuery=on?

We find the number of documents found to be 710 and the parsed query is as
follows

+((DisjunctionMaxQuery((xid:test^0.3 | id:test^0.3 |
((r_name:Cxx1 r_name:test)^0.3) | (text:Cxx1 text:test) | ((name:Cxx1
name:test)^2.3))) DisjunctionMaxQuery((xid:Log^0.3 | id:Log^0.3 |
((r_name:Dyy3 r_name:Log)^0.3) | (text:Dyy3 text:Log) | ((name:Dyy3
name:Log)^2.3~2) ()
 +(((xid:test^0.3 | id:test^0.3 |
((r_name:Cxx1 r_name:test)^0.3) | (text:Cxx1 text:test) | ((name:Cxx1
name:test)^2.3)) (xid:Log^0.3 | id:Log^0.3 | ((r_name:Dyy3 r_name:Log)^0.3)
| (text:Dyy3 text:Log) | ((name:Dyy3 name:Log)^2.3)))~2) ()

If we try to give the boosts like dismax in the q parameter for standard, it's
working fine, i.e. the total number of documents fetched is 710. The query
used is as follows

q:(name:test^2.3 AND name:Log^2.3)OR(text:test AND
text:Log)OR(r_name:test^0.3 AND r_name:Log^0.3)OR(id:test^0.3 AND
id:Log^0.3)OR(xid:test^0.3 AND xid:Log^0.3)

I have two doubts here

1. Why is there a count difference of this extent between the standard and
dismax handler?
2. Does the dismax handler use AND operation in the phrase query (when we
use with/without quotes)?

Can you please explain me the same?

Thanks in advance




Re: SolrCloud and split-brain

2012-06-15 Thread Otis Gospodnetic
Thanks Mark.

The reason I asked this is because I saw mentions of SolrCloud being resilient 
to split brain because it uses ZooKeeper.
However, if my half brain understands what split brain is then I think that's 
not a completely true claim because one can get unlucky and get a SolrCloud 
cluster partitioned in a way that one or even all partitions reject indexing 
(and update and deletion) requests if they do not have a complete index.

In my example of a 10-node cluster that gets split into a 7-node and a 3-node 
partition, if neither partition ends up containing the full index (i.e. at 
least one copy of each shard) then neither partition will accept updates.

And here is one more Q.
* Imagine a client is adding documents and, for simplicity, imagine SolrCloud 
routes all these documents to the same shard, call it S.
* Imagine that both the 7-node and the 3-node partition end up with a complete 
index and thus both accept updates.
* This means that both the 7-node and the 3-node partition have at least one 
replica of shard S, lets call then S7 and S3.
* Now imagine if the client sending documents for indexing happened to be 
sending documents to 2 nodes, say in round-robin fashion.
* And imagine that each of these 2 nodes ended up in a different partition.

The client now keeps sending docs to these 2 nodes and both happily take and 
index documents in their own copies of S.
To the client everything looks normal - all documents are getting indexed.
But S7 and S3 are no longer the same - they contain different documents!

Problem, no?
What happens with somebody fixes the cluster and all nodes are back in the same 
10-node cluster?  What happens to S7 and S3?
Wouldn't SolrCloud have to implement bi-directional synchronization to fix 
things and "unify" S7 and S3?

And if there are updates and deletes involved, things get even messier :(

Otis

Performance Monitoring for Solr / ElasticSearch / HBase - 
http://sematext.com/spm 



- Original Message -
> From: Mark Miller 
> To: solr-user 
> Cc: 
> Sent: Friday, June 15, 2012 5:07 PM
> Subject: Re: SolrCloud and split-brain
> 
> 
> On Jun 15, 2012, at 3:21 PM, Otis Gospodnetic wrote:
> 
>>  Thanks Mark, will open an issue in a bit.
>> 
>>  But I think the following is the real meat of the Q about split brain and 
> SolrCloud, especially when it comes to how indexing is handled during split 
> brain:
>> 
>>> Does this work even when outside clients (apps for indexing or searching)
>>> send their requests directly to individual nodes?
>>> Let's use the example from my email where we end up with 2 groups of
>>> nodes: 7-node group with 2 ZK nodes on the same network and 3-node group
>>> with 1 ZK node on the same network.
>>
>> The 3-node group with 1 ZK would not have a functioning zk - so it would
>> stop accepting updates. If it could serve a complete view of the index, it
>> would though, for searches.
>> 
>>  So in this case information in this 1 ZK node would tell the 3 Solr nodes 
> whether they have all index data or if some shards are missing (i.e. were 
> only 
> on nodes in the other 7-node group)?
>>  And if nodes figure out they don't have all index data they will reject 
> search requests?  Or will they accept and perform searches, but return 
> responses 
> that tell the client that the searched index was not complete?
> 
> The 1 ZK node will not function, so the 3 Solr nodes will not accept updates.
> 
> If there is one replica for each shard available, search will still work. I 
> don't think partial results has been committed yet for distrib search. In 
> that case, we will put something in the header to indicate a full copy of the 
> index was not available. I think we can also add something in the header if 
> we 
> know we cannot talk to zookeeper to let the client know it could be seeing 
> stale 
> state. SmartClients that talked to zookeeper would see those nodes appear as 
> down in zookeeper and stop trying to talk to them.
> 
>> 
>>>  The 7-node group would have a working ZK it could talk to, and it would 
> continue 
>>>  to accept updates as long as a node for a shard for that hash range is 
> up. It 
>>>  would also of course serve searches.
>> 
>>  Right, so if the node for the shard where a doc is supposed to go to is in 
> that 3-node group, then the indexing request will be rejected.  Is this 
> correct? 
> 
> 
> it depends on what is available - but you will need at least one replica for 
> each shard available - eg your partition needs to have one copy of the index 
> - 
> otherwise updates are rejected if there are no nodes hosting a shard of the 
> hash 
> range. So if a replica made it into the larger partition, you will be fine - 
> it 
> will become the leader.
> 
>> 
>> 
>> 
>>  Otis 
>>  
>>  Performance Monitoring for Solr / ElasticSearch / HBase - 
> http://sematext.com/spm 
>> 
>> 
>> 
>>  - Original Message -
>>>  From: Mark Miller 
>>>  To

solr exception occur during index creating

2012-06-15 Thread robinsolr
Hi All,

I am using the Solr server to create an index, but during index creation I am
facing the following error. Let me know if anybody has resolved this issue:


request:
http://solr.c2gcampus.com/update/extract?literal.entityid=1013#2&literal.entitycode=2&literal.docfilepath=/school_3/documentType_1/data/52.pdf&literal.docfiletype=1&literal.doccreatedby=101&literal.docfilename=Applicant_Report-BY_CATEGORY.pdf&literal.dockeyword=gdffdg&literal.libauthor=hghgh&literal.libtitle=book98&literal.libonlinecount=1&literal.libisno=53456&literal.libpublisher=&literal.libsubjectid=3&literal.libcategory=0&literal.cschoolid=3&literal.libkeywords=gdffdg&literal.libdocid=52&commit=true&waitFlush=true&waitSearcher=true&wt=xml&version=2.2>
2012-06-15 10:59:28,243 ERROR
[com.connecttwogurukul.base.solrj.util.SolrPlainUpdateQuery.createLibraryIndex:87]
- <[SolrPlainUpdateQuery][createLibraryIndex] SolrException
ERROR:org.apache.solr.common.SolrException: Not Found

Not Found
