Re: Couple issues with edismax in 3.5

2012-02-29 Thread Ahmet Arslan
> 1. Search for 4X6 generated the following parsed query:
> +DisjunctionMaxQuery(((id:4 id:x id:6)^1.2) | ((name:4
> name:x
> name:6)^1.025) )
> while the search for "4 X 6" (with space in between) 
> generated the query
> below: (I like this one)
> +DisjunctionMaxQuery((id:4^1.2 | name:4^1.025))
> +DisjunctionMaxQuery((id:x^1.2 | name:x^1.025))
> +DisjunctionMaxQuery((id:6^1.2 | name:6^1.025))
> 
> Is that really intentional? The first query is pretty weird
> because it will
> return all of the docs with one of 4, x, 6.

The Minimum Should Match (mm) parameter controls how many of the search terms 
must match. For example, you can set &mm=100%.

You can also tweak relevancy by setting the phrase fields (pf) parameter.
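
Concretely, a request along these lines (a sketch; the qf boosts are taken
from your parsed query):

select?defType=edismax&q=4 X 6&qf=id^1.2 name^1.025&mm=100%&pf=name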

> Any easy way we can force "4X6" search to be the same as "4
> X 6"?
> 
> 2. Issue with multi words synonym because edismax separates
> keywords to
> multiple words via the line below:
> clauses = splitIntoClauses(userQuery, false);
> and seems like edismax doesn't quite respect fieldType at
> query time, for
> example, handling stopWords differently than what's
> specified in schema.
> 
> For example: I have the following synonym:
> AAA BBB, AAABBB, AAA-BBB, CCC DDD
> 
> When I search for "AAA-BBB", it works, however search for
> "CCC DDD" was not
> returning results containing AAABBB. What is interesting is
> that
> admin/analysis.jsp is returning great results.

The query string is tokenized on whitespace before it reaches the 
analyzer. https://issues.apache.org/jira/browse/LUCENE-2605
That's why multi-word synonyms are not recommended at query time. 

Analysis.jsp does not perform actual query parsing.


Re: Is there a way to implement a IntRangeField in Solr?

2012-02-29 Thread Mikhail Khludnev
Hold on. It's possible to find the leases active at a particular date and
collapse them to apartments. But it looks impossible to negate those busy
apartments. Or I don't know how.

Let's try with http://wiki.apache.org/solr/Join
Suppose you have lease documents with an "FK" field LEASE_APT_FK *and*
apartment documents with a "PK" field APT_PK. You can search leases for the
given date (overlappingleases= below), then join leases to apartments
(&busyapts=) and negate those apartments (&fq=).

overlappingleases=(from:[$need_from TO $need_to]) OR (to:[$need_from TO $need_to]) OR ((from:[* TO $need_from]) AND (to:[$need_from TO *]))
&busyapts={!join from=LEASE_APT_FK to=APT_PK v=$overlappingleases}
&fq=NOT _query_:{!v=$busyapts}
&q=*:*

FWIW i use some magic from http://wiki.apache.org/solr/LocalParams and
http://wiki.apache.org/solr/SolrQuerySyntax
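
For completeness, a rough SolrJ sketch of the same request (untested; the
parameter strings are exactly the ones above, SolrJ just sets them, and the
join syntax needs a trunk/4.x build):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FreeApartments {
  public static void main(String[] args) throws Exception {
    CommonsHttpSolrServer server =
        new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrQuery q = new SolrQuery("*:*");
    // the requested date window (example values)
    q.set("need_from", "2012-03-01T00:00:00Z");
    q.set("need_to", "2012-03-10T00:00:00Z");
    // leases overlapping that window
    q.set("overlappingleases",
        "(from:[$need_from TO $need_to]) OR (to:[$need_from TO $need_to])"
            + " OR ((from:[* TO $need_from]) AND (to:[$need_from TO *]))");
    // join the busy leases to their apartments
    q.set("busyapts",
        "{!join from=LEASE_APT_FK to=APT_PK v=$overlappingleases}");
    // keep only apartments that are NOT busy
    q.addFilterQuery("NOT _query_:\"{!v=$busyapts}\"");
    QueryResponse rsp = server.query(q);
    System.out.println(rsp.getResults().getNumFound() + " free apartments");
  }
}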


On Tue, Feb 28, 2012 at 11:11 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

>
>
> On Tue, Feb 28, 2012 at 6:44 PM, federico.wachs  > wrote:
>
>> Hi Mikhail, thanks for your concern and reply.
>>
>> I've read a few dozen times your reply and I think I get what you mean,
>> but
>> I'm not exactly sure how to go forward with your approach. You are saying
>> that I should be able to have nested documents, but I haven't been able to
>> submit a Document with another Document on it so far.
>>
>> I'm using SolrJ to integrate with my Solr servers, do you think you could
>> guide me a bit on how you would accomplish to nest two different kinds of
>> documents?
>>
>
> Ok, start from Lease documents with ApartmentID and then query leases and
> specify &group.field=ApartmentID&group=true
>
> Disclaimer: I've never done grouping and I am not really familiar with
> SolrJ.
>
>
>>
>> Thank you for your time and explanation I really appreciate it!
>>
>> Regards,
>> Federico
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Is-there-a-way-to-implement-a-IntRangeField-in-Solr-tp3782083p3784220.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Lucid Certified
> Apache Lucene/Solr Developer
> Grid Dynamics
>
> 
>  
>
>


-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics


 


Re: Building a resilient cluster

2012-02-29 Thread Andre Bois-Crettez

You have to run ZK on at least 3 different machines for fault
tolerance (a ZK ensemble).
http://wiki.apache.org/solr/SolrCloud#Example_C:_Two_shard_cluster_with_shard_replicas_and_zookeeper_ensemble


Ranjan Bagchi wrote:

Hi,

I'm interested in setting up a solr cluster where each machine [at least
initially] hosts a separate shard of a big index [too big to sit on the
machine].  I'm able to put a cloud together by telling it that I have (to
start out with) 4 nodes, and then starting up nodes on 3 machines pointing
at the zkInstance.  I'm able to load my sharded data onto each machine
individually and it seems to work.

My concern is that it's not fault tolerant:  if one of the non-zookeeper
machines falls over, the whole cluster won't work.  Also, I can't create a
shard with more data, and have it work within the existing cloud.

I tried using -DshardId=shard5 [on an existing 4-shard cluster], but it
just started replicating, which doesn't seem right.

Are there ways around this?

Thanks,
Ranjan Bagchi




--
André Bois-Crettez

Search technology, Kelkoo
http://www.kelkoo.com/




How to only count distinct facet values of each group

2012-02-29 Thread Mathias Hodler
Hi,

I'm looking for a parameter like "group.truncate=true", except that I want
to count facets based not only on the most relevant document of each
group but on all documents. Moreover, if a facet value appears in more
than one document of a group, it should only be counted once.

Example:

Doc 1:
type: shirt
color: green

Doc 2:
type: shirt
color: green

Doc 3:
type: shirt
color: blue

Doc 4:
type: pants
color: black

grouping by the 'type' field should produce the following facet:

Color:
green: 1
blue: 1
black: 1
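
Roughly, the request I have in mind would look like this (a sketch; the
distinct per-group counting is the part that seems to be missing):

q=*:*&group=true&group.field=type&facet=true&facet.field=color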

Thanks.


Re: Unique key constraint and optimistic locking (versioning)

2012-02-29 Thread Per Steffensen
Created SOLR-3178 covering the versioning/optimistic-locking part. In 
combination, SOLR-3173 and SOLR-3178 should provide the features I am 
missing, and that I believe lots of other SOLR users will be able to 
benefit from. Please help shape them by commenting on the Jira issues. Thanks.


Per Steffensen wrote:
Created SOLR-3173 on the part about making insert fail if a document 
(with the same uniqueKey) already exists. SOLR-3173 also includes making 
"update" not insert the document if it does not already exist - just for 
consistency with normal RDBMS behaviour. So basically the feature 
allows you to turn on this behaviour of having "database" (RDBMS) 
semantics, and when you do you get both.
Tomorrow I will create another Jira issue on the "versioning/optimistic 
locking" part.


Per Steffensen wrote:

Hi

Does solr/lucene provide any mechanism for "unique key constraint" 
and "optimistic locking (versioning)"?
Unique key constraint: That a client will not succeed creating a new 
document in solr/lucene if a document already exists having the same 
value in some field (e.g. an id field). Of course implemented right, 
so that even though two or more threads are concurrently trying to 
create a new document with the same value in this field, only one of 
them will succeed.
Optimistic locking (versioning): That a client will only succeed in 
updating a document if this updated document is based on the version 
of the document currently stored in solr/lucene. Implemented in the 
optimistic way: clients, during an update, have to tell which version 
of the document they fetched from Solr and therefore used as a 
starting-point for their updated document. So basically having a 
version field on the document that clients increase by one before 
sending to solr for update, and some code in Solr that only makes the 
update succeed if the version number of the updated document is 
exactly one higher than the version number of the document already 
stored. Of course again implemented right, so that even though two or 
more threads are concurrently trying to update a document, and they 
all have their updated document based on the current version in 
solr/lucene, only one of them will succeed.
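
In pseudocode, the client side of that protocol would be something like this
(hypothetical names, just to make the idea concrete):

  doc = solr.getById(id)            // fetch the current document
  base = doc.get("version")         // remember the version it is based on
  updated = modify(doc)             // apply the client's changes
  updated.set("version", base + 1)  // claim the next version
  solr.update(updated)              // Solr accepts only if the stored
                                    // version is still 'base'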


Or do I have to do stuff like this myself outside solr/lucene - e.g. 
in the client using solr?


Regards, Per Steffensen








Too many values for UnInvertedField faceting on field topic

2012-02-29 Thread Michael Jakl
Our Solr started to throw the following exception when requesting the
facets of a multivalued field holding a lot of terms.

SEVERE: org.apache.solr.common.SolrException: Too many values for
UnInvertedField faceting on field topic
at 
org.apache.solr.request.UnInvertedField.uninvert(UnInvertedField.java:390)
at 
org.apache.solr.request.UnInvertedField.<init>(UnInvertedField.java:180)
at 
org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:871)
at 
org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:287)
at 
org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:319)
at 
org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:72)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:193)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1373)
at 
org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:54)
at org.apache.solr.core.SolrCore$4.call(SolrCore.java:1198)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:139)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:909)
at java.lang.Thread.run(Thread.java:662)
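
For reference, the request is an ordinary field facet on that field,
something like:

http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=topic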

Is there a way around it, maybe a setting to increase the limit?
Using facet.method=enum, as suggested in a thread in 2009, is far too
slow, at least in the experiments I did.

I'm using Solr 3.5.0 on Linux (192GB RAM), so faceting was pretty fast
after an initial cache warming.

Cheers,
Michael


Re: indexing but not able to search

2012-02-29 Thread vibhoreng04
Hi Sawmya,

Are you able to resolve your problem?
If not, check the field type in the solr schema. It should be text if you
are tokenising and searching.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/indexing-but-not-able-to-search-tp3144695p3787592.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: indexing but not able to search

2012-02-29 Thread somer81
Hi,
No, unfortunately I have not been able to solve it yet.
To be sure, I made my fields the same as those in the Solr schema.
For example, for my "name" field I either used the name field in the Solr
schema, or made my own "name2" and copied the same specifications as the
"name" field in Solr. For my "coord" field I used Solr's stored field, which
is identical.
My coord field also holds latitude, longitude values.

But when I search by name I can see the Solr fields and values, while mine
have disappeared.

Thanks in advance

Omer


2012/2/29 vibhoreng04 [via Lucene] 

> Hi Sawmya,
>
> Are you able to resolve your problem?
> If not, check the field type in the solr schema. It should be text if you
> are tokenising and searching.
>
>



-- 
Ömer SEVİNÇ
Lecturer
Vezirköprü MYO


--
View this message in context: 
http://lucene.472066.n3.nabble.com/indexing-but-not-able-to-search-tp3144695p3787629.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Building a resilient cluster

2012-02-29 Thread Ranjan Bagchi
Hi,

At this point I'm ok with one zk instance being a point of failure, I just
want to create sharded solr instances, bring them into the cluster, and be
able to shut them down without bringing down the whole cluster.

According to the wiki page, I should be able to bring up a new shard by using
shardId [-DshardId], but when I did that, the logs showed it replicating
an existing shard.

Ranjan
Andre Bois-Crettez wrote:

> You have to run ZK on at least 3 different machines for fault
> tolerance (a ZK ensemble).
>
> http://wiki.apache.org/solr/SolrCloud#Example_C:_Two_shard_cluster_with_shard_replicas_and_zookeeper_ensemble
>
> Ranjan Bagchi wrote:
> > Hi,
> >
> > I'm interested in setting up a solr cluster where each machine [at least
> > initially] hosts a separate shard of a big index [too big to sit on the
> > machine].  I'm able to put a cloud together by telling it that I have (to
> > start out with) 4 nodes, and then starting up nodes on 3 machines pointing
> > at the zkInstance.  I'm able to load my sharded data onto each machine
> > individually and it seems to work.
> >
> > My concern is that it's not fault tolerant:  if one of the non-zookeeper
> > machines falls over, the whole cluster won't work.  Also, I can't create a
> > shard with more data, and have it work within the existing cloud.
> >
> > I tried using -DshardId=shard5 [on an existing 4-shard cluster], but it
> > just started replicating, which doesn't seem right.
> >
> > Are there ways around this?
> >
> > Thanks,
> > Ranjan Bagchi
> >
> >


[SolrCloud] Too many open files - internal server error

2012-02-29 Thread Markus Jelsma
Hi,

We're doing some tests with the latest trunk revision on a cluster of five 
high-end machines. There is one collection, five shards and one replica per 
shard on some other node.

We're filling the index from a MapReduce job, 18 processes run concurrently. 
This is plenty when indexing to a single high-end node but with SolrCloud 
things go down pretty soon.

First we get a Too Many Open Files error on all nodes almost at the same time. 
When shutting down the indexer the nodes won't respond anymore except for an 
Internal Server Error.

First the too many open files stack trace:

2012-02-29 15:22:51,067 ERROR [solr.core.SolrCore] - [http-80-6] - : 
java.io.FileNotFoundException: /opt/solr/openindex_b/data/index/_h5_0.tim (Too 
many open files)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
at 
org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:449)
at 
org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:288)
at 
org.apache.lucene.codecs.BlockTreeTermsWriter.<init>(BlockTreeTermsWriter.java:149)
at 
org.apache.lucene.codecs.lucene40.Lucene40PostingsFormat.fieldsConsumer(Lucene40PostingsFormat.java:66)
at 
org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.addField(PerFieldPostingsFormat.java:118)
at 
org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:322)
at 
org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:92)
at org.apache.lucene.index.TermsHash.flush(TermsHash.java:117)
at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
at 
org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:81)
at 
org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:475)
at 
org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:422)
at 
org.apache.lucene.index.DocumentsWriter.postUpdate(DocumentsWriter.java:320)
at 
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:389)
at 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1533)
at 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1505)
at 
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:168)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:56)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:53)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:354)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:451)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:258)
at 
org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:118)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:135)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1539)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:406)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:255)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:662)



A similar exception sometimes begins with:

2012-02-29 15:25:36,137 ERROR [solr.update.CommitTracker] - [pool-5-thread-1] 
- : auto commit error...:jav

Re: Building a resilient cluster

2012-02-29 Thread Jamie Johnson
That is correct, the cloud does not currently elastically expand.
Essentially, when you first start up you define something like
numShards; once numShards is reached, everything else goes in as replicas.  If
you manually specify the shards using the create core commands you can
define the layout however you please, but that still doesn't change
the fact that SolrCloud doesn't support elastically expanding after
initially provisioning the cluster.

I've seen this on the roadmap before, but don't know where it falls on
the current wish list; it's high on mine :)
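
For reference, manually assigning a new core to a shard goes through the
CoreAdmin CREATE command, roughly like this (a sketch; I'm going from memory
on the exact trunk parameter names):

http://localhost:8983/solr/admin/cores?action=CREATE&name=core5&collection=collection1&shard=shard5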

On Wed, Feb 29, 2012 at 10:36 AM, Ranjan Bagchi  wrote:
> Hi,
>
> At this point I'm ok with one zk instance being a point of failure, I just
> want to create sharded solr instances, bring them into the cluster, and be
> able to shut them down without bringing down the whole cluster.
>
> According to the wiki page, I should be able to bring up a new shard by using
> shardId [-DshardId], but when I did that, the logs showed it replicating
> an existing shard.
>
> Ranjan
> Andre Bois-Crettez wrote:
>
>> You have to run ZK on at least 3 different machines for fault
>> tolerance (a ZK ensemble).
>>
>> http://wiki.apache.org/solr/SolrCloud#Example_C:_Two_shard_cluster_with_shard_replicas_and_zookeeper_ensemble
>>
>> Ranjan Bagchi wrote:
>> > Hi,
>> >
>> > I'm interested in setting up a solr cluster where each machine [at least
>> > initially] hosts a separate shard of a big index [too big to sit on the
>> > machine].  I'm able to put a cloud together by telling it that I have (to
>> > start out with) 4 nodes, and then starting up nodes on 3 machines pointing
>> > at the zkInstance.  I'm able to load my sharded data onto each machine
>> > individually and it seems to work.
>> >
>> > My concern is that it's not fault tolerant:  if one of the non-zookeeper
>> > machines falls over, the whole cluster won't work.  Also, I can't create a
>> > shard with more data, and have it work within the existing cloud.
>> >
>> > I tried using -DshardId=shard5 [on an existing 4-shard cluster], but it
>> > just started replicating, which doesn't seem right.
>> >
>> > Are there ways around this?
>> >
>> > Thanks,
>> > Ranjan Bagchi
>> >
>> >


Re: Building a resilient cluster

2012-02-29 Thread Jamie Johnson
Rereading your email, perhaps this doesn't answer the question though.
Can you provide your solr.xml so we can get a better idea of your
configuration?

On Wed, Feb 29, 2012 at 10:41 AM, Jamie Johnson  wrote:
> That is correct, the cloud does not currently elastically expand.
> Essentially, when you first start up you define something like
> numShards; once numShards is reached, everything else goes in as replicas.  If
> you manually specify the shards using the create core commands you can
> define the layout however you please, but that still doesn't change
> the fact that SolrCloud doesn't support elastically expanding after
> initially provisioning the cluster.
>
> I've seen this on the roadmap before, but don't know where it falls on
> the current wish list; it's high on mine :)
>
> On Wed, Feb 29, 2012 at 10:36 AM, Ranjan Bagchi  
> wrote:
>> Hi,
>>
>> At this point I'm ok with one zk instance being a point of failure, I just
>> want to create sharded solr instances, bring them into the cluster, and be
>> able to shut them down without bringing down the whole cluster.
>>
>> According to the wiki page, I should be able to bring up a new shard by using
>> shardId [-DshardId], but when I did that, the logs showed it replicating
>> an existing shard.
>>
>> Ranjan
>> Andre Bois-Crettez wrote:
>>
>>> You have to run ZK on at least 3 different machines for fault
>>> tolerance (a ZK ensemble).
>>>
>>> http://wiki.apache.org/solr/SolrCloud#Example_C:_Two_shard_cluster_with_shard_replicas_and_zookeeper_ensemble
>>>
>>> Ranjan Bagchi wrote:
>>> > Hi,
>>> >
>>> > I'm interested in setting up a solr cluster where each machine [at least
>>> > initially] hosts a separate shard of a big index [too big to sit on the
>>> > machine].  I'm able to put a cloud together by telling it that I have (to
>>> > start out with) 4 nodes, and then starting up nodes on 3 machines pointing
>>> > at the zkInstance.  I'm able to load my sharded data onto each machine
>>> > individually and it seems to work.
>>> >
>>> > My concern is that it's not fault tolerant:  if one of the non-zookeeper
>>> > machines falls over, the whole cluster won't work.  Also, I can't create a
>>> > shard with more data, and have it work within the existing cloud.
>>> >
>>> > I tried using -DshardId=shard5 [on an existing 4-shard cluster], but it
>>> > just started replicating, which doesn't seem right.
>>> >
>>> > Are there ways around this?
>>> >
>>> > Thanks,
>>> > Ranjan Bagchi
>>> >
>>> >


Re: [SolrCloud] Too many open files - internal server error

2012-02-29 Thread Sami Siren
Hi Markus,

> The Linux machines have proper settings for ulimit and friends, 32k open files
> allowed so i suspect there's another limit which i am unaware of. I also
> listed the number of open files while the errors were coming in but it did not
> exceed 11k at any given time.

How did you check the number of file descriptors used? Did you get this
number from the system info handler
(http://hostname:8983/solr/admin/system?indent=on&wt=json) or somehow
differently?

--
 Sami Siren


Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-02-29 Thread Matthew Parker
Mark,

Nothing appears to be wrong in the logs. I wiped the indexes and imported
37 files from SharePoint using Manifold. All 37 make it in, but SOLR still
has issues with the results being inconsistent.

Let me run my setup by you, and see whether that is the issue.

On one machine, I have three zookeeper instances, four solr instances, and
a data directory for solr and zookeeper config data.

Step 1. I modified each zoo.cfg configuration file to have:

Zookeeper 1 - Create /zookeeper1/conf/zoo.cfg

tickTime=2000
initLimit=10
syncLimit=5
dataDir=[DATA_DIRECTORY]/zk1_data
clientPort=2181
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890

Zookeeper 1 - Create /[DATA_DIRECTORY]/zk1_data/myid with the following
contents:
==
1

Zookeeper 2 - Create /zookeeper2/conf/zoo.cfg
==
tickTime=2000
initLimit=10
syncLimit=5
dataDir=[DATA_DIRECTORY]/zk2_data
clientPort=2182
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890

Zookeeper 2 - Create /[DATA_DIRECTORY]/zk2_data/myid with the following
contents:
==
2

Zookeeper 3 - Create /zookeeper3/conf/zoo.cfg

tickTime=2000
initLimit=10
syncLimit=5
dataDir=[DATA_DIRECTORY]/zk3_data
clientPort=2183
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890

Zookeeper 3 - Create /[DATA_DIRECTORY]/zk3_data/myid with the following
contents:

3

Step 2 - SOLR Build
===

I pulled the latest SOLR trunk down. I built it with the following commands:

   ant example dist

I modified the solr.war files and added the solr cell and extraction
libraries to WEB-INF/lib. I couldn't get the extraction to work
any other way. Will Zookeeper pick up jar files stored with the rest of the
configuration files in Zookeeper?

I copied the contents of the example directory to each of my SOLR
directories.

Step 3 - Starting Zookeeper instances
===

I ran the following commands to start the zookeeper instances:

start .\zookeeper1\bin\zkServer.cmd
start .\zookeeper2\bin\zkServer.cmd
start .\zookeeper3\bin\zkServer.cmd

Step 4 - Start Main SOLR instance
==
I ran the following command to start the main SOLR instance

java -Djetty.port=8081 -Dhostport=8081
-Dbootstrap_configdir=[DATA_DIRECTORY]/solr/conf -Dnumshards=2
-Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar

Starts up fine.

Step 5 - Start the Remaining 3 SOLR Instances
==
I ran the following commands to start the other 3 instances from their home
directories:

java -Djetty.port=8082 -Dhostport=8082
-Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar

java -Djetty.port=8083 -Dhostport=8083
-Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar

java -Djetty.port=8084 -Dhostport=8084
-Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar

All startup without issue.

Step 6 - Modified solrconfig.xml to have a custom request handler
===

<requestHandler name="/update/extract"
class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
 <str name="update.chain">sharepoint-pipeline</str>
 <str name="fmap.content">text</str>
 <str name="lowernames">true</str>
 <str name="uprefix">ignored</str>
 <str name="captureAttr">true</str>
 <str name="fmap.a">links</str>
 <str name="fmap.div">ignored</str>
  </lst>
</requestHandler>

<updateRequestProcessorChain name="sharepoint-pipeline">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
  <bool name="enabled">true</bool>
  <str name="signatureField">id</str>
  <bool name="overwriteDupes">true</bool>
  <str name="fields">url</str>
  <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>


Hopefully this will shed some light on why my configuration is having
issues.

Thanks for your help.

Matt



On Tue, Feb 28, 2012 at 8:29 PM, Mark Miller  wrote:

> Hmm...this is very strange - there is nothing interesting in any of the
> logs?
>
> In clusterstate.json, all of the shards have an active state?
>
>
> There are quite a few of us doing exactly this setup recently, so there
> must be something we are missing here...
>
> Any info you can offer might help.
>
> - Mark
>
> On Feb 28, 2012, at 1:00 PM, Matthew Parker wrote:
>
> > Mark,
> >
> > I got the codebase from the 2/26/2012, and I got the same inconsistent
> > results.
> >
> > I have solr running on four ports 8081-8084
> >
> > 8081 and 8082 are the leaders for shard 1, and shard 2, respectively
> >
> > 8083 - is assigned to shard 1
> > 8084 - is assigned to shard 2
> >
> > queries come in and sometimes it seems the windows for 8081 and 8083 move
> > as if responding to the query, but there are no results.
> >
> > if the queries run on 8081/8082 or 8081/8084 then results come back ok.
> >
> > The query is nothing more than: q=*:*
> >
> > Regards,
> >
> > Matt
> >
> >
> > On Mon, Feb 27, 2012 at 9:26 PM, Matthew Parker <
> > mpar...@apogeeintegration.com> wrote:
> >
> >> I'll have to check on the commit situation. We have been pushing data
> from
> >> SharePoint the last week or so. Would that somehow block the documents
> >> moving between the solr instances?
> >>
> >> I'll try another version tomorrow. Thanks for the suggestions.
> >>
> >> On Mon, 

Re: [SolrCloud] Too many open files - internal server error

2012-02-29 Thread Markus Jelsma
Sami,

As superuser:
$ lsof | wc -l

But, just now, i also checked the system handler and it told me:
(error executing: ulimit -n)

This is rather strange, it seems. lsof | wc -l is not higher than 6k right now 
and ulimit -n is 32k. Is lsof not to be trusted in this case or... something 
else? 
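
FWIW, a per-process count might be more telling than a global lsof,
something like:

ls /proc/<tomcat6 pid>/fd | wc -l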

Thanks

On Wednesday 29 February 2012 16:44:58 Sami Siren wrote:
> Hi Markus,
> 
> > The Linux machines have proper settings for ulimit and friends, 32k open
> > files allowed so i suspect there's another limit which i am unaware of.
> > I also listed the number of open files while the errors were coming in
> > but it did not exceed 11k at any given time.
> 
> How did you check the number of file descriptors used? Did you get this
> number from the system info handler
> (http://hostname:8983/solr/admin/system?indent=on&wt=json) or somehow
> differently?
> 
> --
>  Sami Siren

-- 
Markus Jelsma - CTO - Openindex


Re: Is there a way to implement a IntRangeField in Solr?

2012-02-29 Thread federico.wachs
I'll give this a try. I'm not sure I completely understand how to do that
because I don't have so much experience with Solr. Do I have to use another
core to post a different kind of document and then join it?

Thanks!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-there-a-way-to-implement-a-IntRangeField-in-Solr-tp3782083p3787873.html
Sent from the Solr - User mailing list archive at Nabble.com.


searching top matches of each facet

2012-02-29 Thread Paul
Let's say that I have a facet named 'subject' that contains one of:
physics, chemistry, psychology, mathematics, etc

I'd like to do a search for the top 5 documents in each category. I
can do this with a separate search for each facet, but it seems like
there would be a way to combine the searches. Is there a way?

That is, if the user searches for "my search", I can now search for it
with the facet of "physics" and rows=5, then do a separate search with
the facet of "chemistry", etc...

Can I do that in one search to decrease the load on the server? Or,
when I do the first search, will the results be cached, so that the
rest of the searches are pretty cheap?


Re: [SolrCloud] Too many open files - internal server error

2012-02-29 Thread Sami Siren
On Wed, Feb 29, 2012 at 5:53 PM, Markus Jelsma
 wrote:
> Sami,
>
> As superuser:
> $ lsof | wc -l
>
> But, just now, i also checked the system handler and it told me:
> (error executing: ulimit -n)

That's odd, you should see something like this there:

"openFileDescriptorCount":131,
"maxFileDescriptorCount":4096,

Which jvm do you have?

> This is rather strange, it seems. lsof | wc -l is not higher than 6k right now
> and ulimit -n is 32k. Is lsof not to be trusted in this case or... something
> else?

I am not sure what is going on, are you sure the open file descriptor
(32k) limit is active for the user running solr?

--
 Sami Siren


Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-02-29 Thread Matthew Parker
I tried running SOLR Cloud with the default number of shards (i.e. 1), and
I get the same results.

On Wed, Feb 29, 2012 at 10:46 AM, Matthew Parker <
mpar...@apogeeintegration.com> wrote:

> Mark,
>
> Nothing appears to be wrong in the logs. I wiped the indexes and imported
> 37 files from SharePoint using Manifold. All 37 make it in, but SOLR still
> has issues with the results being inconsistent.
>
> Let me run my setup by you, and see whether that is the issue.
>
> On one machine, I have three zookeeper instances, four solr instances, and
> a data directory for solr and zookeeper config data.
>
> Step 1. I modified each zoo.cfg configuration file to have:
>
> Zookeeper 1 - Create /zookeeper1/conf/zoo.cfg
> 
> tickTime=2000
> initLimit=10
> syncLimit=5
> dataDir=[DATA_DIRECTORY]/zk1_data
> clientPort=2181
> server.1=localhost:2888:3888
> server.2=localhost:2889:3889
> server.3=localhost:2890:3890
>
> Zookeeper 1 - Create /[DATA_DIRECTORY]/zk1_data/myid with the following
> contents:
> ==
> 1
>
> Zookeeper 2 - Create /zookeeper2/conf/zoo.cfg
> ==
> tickTime=2000
> initLimit=10
> syncLimit=5
> dataDir=[DATA_DIRECTORY]/zk2_data
> clientPort=2182
> server.1=localhost:2888:3888
> server.2=localhost:2889:3889
> server.3=localhost:2890:3890
>
> Zookeeper 2 - Create /[DATA_DIRECTORY]/zk2_data/myid with the following
> contents:
> ==
> 2
>
> Zookeeper 3 - Create /zookeeper3/conf/zoo.cfg
> 
> tickTime=2000
> initLimit=10
> syncLimit=5
> dataDir=[DATA_DIRECTORY]/zk3_data
> clientPort=2183
> server.1=localhost:2888:3888
> server.2=localhost:2889:3889
> server.3=localhost:2890:3890
>
> Zookeeper 3 - Create /[DATA_DIRECTORY]/zk3_data/myid with the following
> contents:
> 
> 3
>
> Step 2 - SOLR Build
> ===
>
> I pulled the latest SOLR trunk down. I built it with the following
> commands:
>
>ant example dist
>
> I modified the solr.war files and added the solr cell and extraction
> libraries to WEB-INF/lib. I couldn't get the extraction to work
> any other way. Will Zookeeper pick up jar files stored with the rest of the
> configuration files in Zookeeper?
>
> I copied the contents of the example directory to each of my SOLR
> directories.
>
> Step 3 - Starting Zookeeper instances
> ===
>
> I ran the following commands to start the zookeeper instances:
>
> start .\zookeeper1\bin\zkServer.cmd
> start .\zookeeper2\bin\zkServer.cmd
> start .\zookeeper3\bin\zkServer.cmd
>
> Step 4 - Start Main SOLR instance
> ==
> I ran the following command to start the main SOLR instance
>
> java -Djetty.port=8081 -Dhostport=8081
> -Dbootstrap_configdir=[DATA_DIRECTORY]/solr/conf -Dnumshards=2
> -Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
>
> Starts up fine.
>
> Step 5 - Start the Remaining 3 SOLR Instances
> ==
> I ran the following commands to start the other 3 instances from their
> home directories:
>
> java -Djetty.port=8082 -Dhostport=8082
> -Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
>
> java -Djetty.port=8083 -Dhostport=8083
> -Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
>
> java -Djetty.port=8084 -Dhostport=8084
> -Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
>
> All startup without issue.
>
> Step 6 - Modified solrconfig.xml to have a custom request handler
> ===
>
>  class="solr.extraction.ExtractingRequestHandler">
>   
>  sharepoint-pipeline
>  text
>  true
>  ignored
>  true
>  links
>  ignored
>   
> 
>
> 
>
>   true
>   id
>   true
>   url
>   solr.processor.Lookup3Signature
>
>
>
> 
>
>
> Hopefully this will shed some light on why my configuration is having
> issues.
>
> Thanks for your help.
>
> Matt
>
>
>
> On Tue, Feb 28, 2012 at 8:29 PM, Mark Miller wrote:
>
>> Hmm...this is very strange - there is nothing interesting in any of the
>> logs?
>>
>> In clusterstate.json, all of the shards have an active state?
>>
>>
>> There are quite a few of us doing exactly this setup recently, so there
>> must be something we are missing here...
>>
>> Any info you can offer might help.
>>
>> - Mark
>>
>> On Feb 28, 2012, at 1:00 PM, Matthew Parker wrote:
>>
>> > Mark,
>> >
>> > I got the codebase from the 2/26/2012, and I got the same inconsistent
>> > results.
>> >
>> > I have solr running on four ports 8081-8084
>> >
>> > 8081 and 8082 are the leaders for shard 1, and shard 2, respectively
>> >
>> > 8083 - is assigned to shard 1
>> > 8084 - is assigned to shard 2
>> >
>> > queries come in and sometimes it seems the windows for 8081 and 8083 move
>> > as if responding to the query, but there are no results.
>> >
>> > if the 

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-02-29 Thread Matthew Parker
I also took out my requestHandler and used the standard /update/extract
handler. Same result.

On Wed, Feb 29, 2012 at 11:47 AM, Matthew Parker <
mpar...@apogeeintegration.com> wrote:

> I tried running SOLR Cloud with the default number of shards (i.e. 1), and
> I get the same results.
>
> On Wed, Feb 29, 2012 at 10:46 AM, Matthew Parker <
> mpar...@apogeeintegration.com> wrote:
>
>> Mark,
>>
>> Nothing appears to be wrong in the logs. I wiped the indexes and imported
>> 37 files from SharePoint using Manifold. All 37 make it in, but SOLR still
>> has issues with the results being inconsistent.
>>
>> Let me run my setup by you, and see whether that is the issue.
>>
>> On one machine, I have three zookeeper instances, four solr instances,
>> and a data directory for solr and zookeeper config data.
>>
>> Step 1. I modified each zoo.cfg configuration file to have:
>>
>> Zookeeper 1 - Create /zookeeper1/conf/zoo.cfg
>> 
>> tickTime=2000
>> initLimit=10
>> syncLimit=5
>> dataDir=[DATA_DIRECTORY]/zk1_data
>> clientPort=2181
>> server.1=localhost:2888:3888
>> server.2=localhost:2889:3889
>> server.3=localhost:2890:3890
>>
>> Zookeeper 1 - Create /[DATA_DIRECTORY]/zk1_data/myid with the following
>> contents:
>> ==
>> 1
>>
>> Zookeeper 2 - Create /zookeeper2/conf/zoo.cfg
>> ==
>> tickTime=2000
>> initLimit=10
>> syncLimit=5
>> dataDir=[DATA_DIRECTORY]/zk2_data
>> clientPort=2182
>> server.1=localhost:2888:3888
>> server.2=localhost:2889:3889
>> server.3=localhost:2890:3890
>>
>> Zookeeper 2 - Create /[DATA_DIRECTORY]/zk2_data/myid with the following
>> contents:
>> ==
>> 2
>>
>> Zookeeper 3 - Create /zookeeper3/conf/zoo.cfg
>> 
>> tickTime=2000
>> initLimit=10
>> syncLimit=5
>> dataDir=[DATA_DIRECTORY]/zk3_data
>> clientPort=2183
>> server.1=localhost:2888:3888
>> server.2=localhost:2889:3889
>> server.3=localhost:2890:3890
>>
>> Zookeeper 3 - Create /[DATA_DIRECTORY]/zk3_data/myid with the following
>> contents:
>> 
>> 3
>>
>> Step 2 - SOLR Build
>> ===
>>
>> I pulled the latest SOLR trunk down. I built it with the following
>> commands:
>>
>>ant example dist
>>
>> I modified the solr.war files and added the solr cell and extraction
>> libraries to WEB-INF/lib. I couldn't get the extraction to work
>> any other way. Will Zookeeper pick up jar files stored with the rest of the
>> configuration files in Zookeeper?
>>
>> I copied the contents of the example directory to each of my SOLR
>> directories.
>>
>> Step 3 - Starting Zookeeper instances
>> ===
>>
>> I ran the following commands to start the zookeeper instances:
>>
>> start .\zookeeper1\bin\zkServer.cmd
>> start .\zookeeper2\bin\zkServer.cmd
>> start .\zookeeper3\bin\zkServer.cmd
>>
>> Step 4 - Start Main SOLR instance
>> ==
>> I ran the following command to start the main SOLR instance
>>
>> java -Djetty.port=8081 -Dhostport=8081
>> -Dbootstrap_configdir=[DATA_DIRECTORY]/solr/conf -Dnumshards=2
>> -Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
>>
>> Starts up fine.
>>
>> Step 5 - Start the Remaining 3 SOLR Instances
>> ==
>> I ran the following commands to start the other 3 instances from their
>> home directories:
>>
>> java -Djetty.port=8082 -Dhostport=8082
>> -Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
>>
>> java -Djetty.port=8083 -Dhostport=8083
>> -Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
>>
>> java -Djetty.port=8084 -Dhostport=8084
>> -Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
>>
>> All startup without issue.
>>
>> Step 6 - Modified solrconfig.xml to have a custom request handler
>> ===
>>
>> <requestHandler name="/update/extract"
>> class="solr.extraction.ExtractingRequestHandler">
>>   <lst name="defaults">
>>  <str name="update.chain">sharepoint-pipeline</str>
>>  <str name="fmap.content">text</str>
>>  <str name="lowernames">true</str>
>>  <str name="uprefix">ignored</str>
>>  <str name="captureAttr">true</str>
>>  <str name="fmap.a">links</str>
>>  <str name="fmap.div">ignored</str>
>>   </lst>
>> </requestHandler>
>>
>> <updateRequestProcessorChain name="sharepoint-pipeline">
>>   <processor class="solr.processor.SignatureUpdateProcessorFactory">
>>   <bool name="enabled">true</bool>
>>   <str name="signatureField">id</str>
>>   <bool name="overwriteDupes">true</bool>
>>   <str name="fields">url</str>
>>   <str name="signatureClass">solr.processor.Lookup3Signature</str>
>>   </processor>
>>   <processor class="solr.LogUpdateProcessorFactory" />
>>   <processor class="solr.RunUpdateProcessorFactory" />
>> </updateRequestProcessorChain>
>>
>>
>> Hopefully this will shed some light on why my configuration is having
>> issues.
>>
>> Thanks for your help.
>>
>> Matt
>>
>>
>>
>> On Tue, Feb 28, 2012 at 8:29 PM, Mark Miller wrote:
>>
>>> Hmm...this is very strange - there is nothing interesting in any of the
>>> logs?
>>>
>>> In clusterstate.json, all of the shards have an active state?
>>>
>>>
>>> There are quite a few of us doing exactly this setup recently, so there
>>> must be something we are missing here...
>>>
>>> Any info you can offer might help.
>>>
>>> - Mark
>>>
>>> On Feb 28, 2012, at 1:00 PM, Matthew Parker wrote:
>>>
>>> > Mark,
>>> >
>>> > I got the codebase from the 2/26/2012, and I got the same inconsistent
>>> > results.
>

Re: [SolrCloud] Too many open files - internal server error

2012-02-29 Thread Yonik Seeley
On Wed, Feb 29, 2012 at 10:32 AM, Markus Jelsma
 wrote:
> The Linux machines have proper settings for ulimit and friends, 32k open files
> allowed

Maybe you can expand on this point.

cat /proc/sys/fs/file-max
cat /proc/sys/fs/nr_open

Those take precedence over ulimit.  Not sure if there are others...

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10


Re: [SolrCloud] Too many open files - internal server error

2012-02-29 Thread Markus Jelsma
On Wednesday 29 February 2012 17:52:55 Sami Siren wrote:
> On Wed, Feb 29, 2012 at 5:53 PM, Markus Jelsma
> 
>  wrote:
> > Sami,
> > 
> > As superuser:
> > $ lsof | wc -l
> > 
> > But, just now, i also checked the system handler and it told me:
> > (error executing: ulimit -n)
> 
> That's odd, you should see something like this there:
> 
> "openFileDescriptorCount":131,
> "maxFileDescriptorCount":4096,
> 
> Which jvm do you have?

Standard issue SUN Java 6 on Debian. We run that JVM on all machines. But i 
see the same (error executing: ulimit -n) locally with Jetty and Solr trunk 
and Solr 3.5 and on a production server with Solr 3.2 with Tomcat6.

> 
> > This is rather strange, it seems. lsof | wc -l is not higher than 6k
> > right now and ulimit -n is 32k. Is lsof not to be trusted in this case
> > or... something else?
> 
> I am not sure what is going on, are you sure the open file descriptor
> (32k) limit is active for the user running solr?

I get the correct output for ulimit -n as the tomcat6 user. However, i did find 
a mistake in /etc/security/limits.conf where i misspelled the tomcat6 user 
(shame). On recent systems ulimit and sysctl alone are not enough, so spelling 
tomcat6 correctly should fix the open files issue. 
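
For reference, the entries in question take this shape (the 32k value is
ours; the username column must exactly match the account running Tomcat):

tomcat6  soft  nofile  32768
tomcat6  hard  nofile  32768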

Now we only have the issue of (error executing: ulimit -n).

> 
> --
>  Sami Siren

-- 
Markus Jelsma - CTO - Openindex


Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-02-29 Thread Sami Siren
On Wed, Feb 29, 2012 at 7:03 PM, Matthew Parker
 wrote:
> I also took out my requestHandler and used the standard /update/extract
> handler. Same result.

How did you install/start the system this time? The same way as
earlier? What kind of queries do you run?

Would it be possible for you to check out the latest version from svn.
In there we have some dev scripts for linux that can be used to setup
a test system easily (you need svn, jdk and ant).

Essentially the steps would be:

#Checkout the sources:
svn co http://svn.apache.org/repos/asf/lucene/dev/trunk

#build and start solrcloud (1 shard, no replicas)
cd solr/cloud-dev
sh ./control.sh rebuild
sh ./control.sh reinstall 1
sh ./control.sh start 1

#index content
java -jar ../example/exampledocs/post.jar ../example/exampledocs/*.xml

#after that you can run your queries

--
 Sami Siren


Re: [SolrCloud] Too many open files - internal server error

2012-02-29 Thread Markus Jelsma
Thanks. They are set properly. But i misspelled the tomcat6 username in 
limits.conf :(

On Wednesday 29 February 2012 18:08:55 Yonik Seeley wrote:
> On Wed, Feb 29, 2012 at 10:32 AM, Markus Jelsma
> 
>  wrote:
> > The Linux machines have proper settings for ulimit and friends, 32k open
> > files allowed
> 
> Maybe you can expand on this point.
> 
> cat /proc/sys/fs/file-max
> cat /proc/sys/fs/nr_open
> 
> Those take precedence over ulimit.  Not sure if there are others...
> 
> -Yonik
> lucenerevolution.com - Lucene/Solr Open Source Search Conference.
> Boston May 7-10

-- 
Markus Jelsma - CTO - Openindex



Re: Couple issues with edismax in 3.5

2012-02-29 Thread Way Cool
Thanks Ahmet for your reply.

I don't think mm will help here because it already defaults to 100%, via the
following code:

if (parsedUserQuery != null && doMinMatched) {
  String minShouldMatch = solrParams.get(DMP.MM, "100%");
  if (parsedUserQuery instanceof BooleanQuery) {
    U.setMinShouldMatch((BooleanQuery) parsedUserQuery, minShouldMatch);
  }
}

Regarding multi-word synonyms, what is the best way to handle them now? Make
them a phrase with quotes, or add a "-" in between?
I don't like index-time expansion because it adds lots of noise.

That's good to know that Analysis.jsp does not perform actual query parsing. I
was hoping edismax could do something similar to the analysis tool because it
shows everything I need for multi-word synonyms.

Thanks.

On Wed, Feb 29, 2012 at 1:23 AM, Ahmet Arslan  wrote:

> > 1. Search for 4X6 generated the following parsed query:
> > +DisjunctionMaxQuery(((id:4 id:x id:6)^1.2) | ((name:4
> > name:x
> > name:6)^1.025) )
> > while the search for "4 X 6" (with space in between)
> > generated the query
> > below: (I like this one)
> > +DisjunctionMaxQuery((id:4^1.2 | name:4^1.025))
> > +DisjunctionMaxQuery((id:x^1.2 | name:x^1.025))
> > +DisjunctionMaxQuery((id:6^1.2 | name:6^1.025))
> >
> > Is that really intentional? The first query is pretty weird
> > because it will
> > return all of the docs with one of 4, x, 6.
>
> The Minimum Should Match (mm) parameter controls how many of the search
> terms must match. For example, you can set &mm=100%.
>
> You can also tweak relevancy by setting the phrase fields (pf) parameter.
>
> > Any easy way we can force "4X6" search to be the same as "4
> > X 6"?
> >
> > 2. Issue with multi words synonym because edismax separates
> > keywords to
> > multiple words via the line below:
> > clauses = splitIntoClauses(userQuery, false);
> > and seems like edismax doesn't quite respect fieldType at
> > query time, for
> > example, handling stopWords differently than what's
> > specified in schema.
> >
> > For example: I have the following synonym:
> > AAA BBB, AAABBB, AAA-BBB, CCC DDD
> >
> > When I search for "AAA-BBB", it works, however search for
> > "CCC DDD" was not
> > returning results containing AAABBB. What is interesting is
> > that
> > admin/analysis.jsp is returning great results.
>
> The query string is tokenized on whitespace before it reaches the
> analyzer. https://issues.apache.org/jira/browse/LUCENE-2605
> That's why multi-word synonyms are not recommended at query time.
>
> Analysis.jsp does not perform actual query parsing.
>


Re: [SolrCloud] Too many open files - internal server error

2012-02-29 Thread Carlos Alberto Schneider
I had this problem some time ago.
It happened on our staging machine.

There were 3 solr instances running: 1 master and 2 slaves.
My solution was: I stopped the slaves, deleted both data folders, ran an
optimize and then started them again.

I tried to raise the OS open file limit first, but I think it was not a
good idea... so I tried this...


On Wed, Feb 29, 2012 at 2:07 PM, Markus Jelsma
wrote:

> I get the correct output for ulimit -n as tomcat6 user. However, i did
> find a




-- 
Carlos Alberto Schneider
Informant -(47) 38010919 - 9904-5517


Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-02-29 Thread Matthew Parker
Sami,

I have the latest as of the 26th. My system is running on a standalone
network so it's not easy to get code updates without a wave of paperwork.

I installed as per the detailed instructions I laid out a couple of
messages ago, earlier today (2/29/2012).

I'm running the following query:

http://localhost:8081/solr/collection1/select?q=*:*

which gets translated to the following:

http://localhost:8081/solr/collection1/select?q=*:*&version=2.2&start=0&rows=10&indent=on

I just tried it running only two solr nodes, and I get the same results.

Regards,

Matt

On Wed, Feb 29, 2012 at 12:25 PM, Sami Siren  wrote:

> On Wed, Feb 29, 2012 at 7:03 PM, Matthew Parker
>  wrote:
> > I also took out my requestHandler and used the standard /update/extract
> > handler. Same result.
>
> How did you install/start the system this time? The same way as
> earlier? What kind of queries do you run?
>
> Would it be possible for you to check out the latest version from svn.
> In there we have some dev scripts for linux that can be used to setup
> a test system easily (you need svn, jdk and ant).
>
> Essentially the steps would be:
>
> #Checkout the sources:
> svn co http://svn.apache.org/repos/asf/lucene/dev/trunk
>
> #build and start solrcloud (1 shard, no replicas)
> cd solr/cloud-dev
> sh ./control.sh rebuild
> sh ./control.sh reinstall 1
> sh ./control.sh start 1
>
> #index content
> java -jar ../example/exampledocs/post.jar ../example/exampledocs/*.xml
>
> #after that you can run your queries
>
> --
>  Sami Siren
>



Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-02-29 Thread Matthew Parker
Mark/Sami

I ran the system with 3 zookeeper nodes, 2 solr cloud nodes, and left
numShards set to its default value (i.e. 1).

It looks like it finally sync'd with the other one after quite a while, but
it's throwing lots of errors like the following:

org.apache.solr.common.SolrException: missing _version_ on update from
leader at
org.apache.solr.update.processor.DistributedUpdateProcessor.versionDelete(
DistributedUpdateProcessor.java:712)




Is it normal to sync long after the documents were sent for indexing?

I'll have to check and see whether the 4 solr node instance with 2 shards
works after waiting for the system to sync.

Regards,

Matt

On Wed, Feb 29, 2012 at 12:03 PM, Matthew Parker <
mpar...@apogeeintegration.com> wrote:

> I also took out my requestHandler and used the standard /update/extract
> handler. Same result.
>
> On Wed, Feb 29, 2012 at 11:47 AM, Matthew Parker <
> mpar...@apogeeintegration.com> wrote:
>
>> I tried running SOLR Cloud with the default number of shards (i.e. 1),
>> and I get the same results.
>>
>> On Wed, Feb 29, 2012 at 10:46 AM, Matthew Parker <
>> mpar...@apogeeintegration.com> wrote:
>>
>>> Mark,
>>>
>>> Nothing appears to be wrong in the logs. I wiped the indexes and
>>> imported 37 files from SharePoint using Manifold. All 37 make it in, but
>>> SOLR still has issues with the results being inconsistent.
>>>
>>> Let me run my setup by you, and see whether that is the issue.
>>>
>>> On one machine, I have three zookeeper instances, four solr instances,
>>> and a data directory for solr and zookeeper config data.
>>>
>>> Step 1. I modified each zoo.cfg configuration file to have:
>>>
>>> Zookeeper 1 - Create /zookeeper1/conf/zoo.cfg
>>> 
>>> tickTime=2000
>>> initLimit=10
>>> syncLimit=5
>>> dataDir=[DATA_DIRECTORY]/zk1_data
>>> clientPort=2181
>>> server.1=localhost:2888:3888
>>> server.2=localhost:2889:3889
>>> server.3=localhost:2890:3890
>>>
>>> Zookeeper 1 - Create /[DATA_DIRECTORY]/zk1_data/myid with the following
>>> contents:
>>> ==
>>> 1
>>>
>>> Zookeeper 2 - Create /zookeeper2/conf/zoo.cfg
>>> ==
>>> tickTime=2000
>>> initLimit=10
>>> syncLimit=5
>>> dataDir=[DATA_DIRECTORY]/zk2_data
>>> clientPort=2182
>>> server.1=localhost:2888:3888
>>> server.2=localhost:2889:3889
>>> server.3=localhost:2890:3890
>>>
>>> Zookeeper 2 - Create /[DATA_DIRECTORY]/zk2_data/myid with the following
>>> contents:
>>> ==
>>> 2
>>>
>>> Zookeeper 3 - Create /zookeeper3/conf/zoo.cfg
>>> 
>>> tickTime=2000
>>> initLimit=10
>>> syncLimit=5
>>> dataDir=[DATA_DIRECTORY]/zk3_data
>>> clientPort=2183
>>> server.1=localhost:2888:3888
>>> server.2=localhost:2889:3889
>>> server.3=localhost:2890:3890
>>>
>>> Zookeeper 3 - Create /[DATA_DIRECTORY]/zk3_data/myid with the following
>>> contents:
>>> 
>>> 3
>>>
>>> Step 2 - SOLR Build
>>> ===
>>>
>>> I pulled the latest SOLR trunk down. I built it with the following
>>> commands:
>>>
>>>ant example dist
>>>
>>> I modified the solr.war files and added the solr cell and extraction
>>> libraries to WEB-INF/lib. I couldn't get the extraction to work
>>> any other way. Will Zookeeper pick up jar files stored with the rest of
>>> the configuration files in Zookeeper?
>>>
>>> I copied the contents of the example directory to each of my SOLR
>>> directories.
>>>
>>> Step 3 - Starting Zookeeper instances
>>> ===
>>>
>>> I ran the following commands to start the zookeeper instances:
>>>
>>> start .\zookeeper1\bin\zkServer.cmd
>>> start .\zookeeper2\bin\zkServer.cmd
>>> start .\zookeeper3\bin\zkServer.cmd
>>>
>>> Step 4 - Start Main SOLR instance
>>> ==
>>> I ran the following command to start the main SOLR instance
>>>
>>> java -Djetty.port=8081 -Dhostport=8081
>>> -Dbootstrap_configdir=[DATA_DIRECTORY]/solr/conf -Dnumshards=2
>>> -Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
>>>
>>> Starts up fine.
>>>
>>> Step 5 - Start the Remaining 3 SOLR Instances
>>> ==
>>> I ran the following commands to start the other 3 instances from their
>>> home directories:
>>>
>>> java -Djetty.port=8082 -Dhostport=8082
>>> -Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
>>>
>>> java -Djetty.port=8083 -Dhostport=8083
>>> -Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
>>>
>>> java -Djetty.port=8084 -Dhostport=8084
>>> -Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
>>>
>>> All startup without issue.
>>>
>>> Step 6 - Modified solrconfig.xml to have a custom request handler
>>> ===
>>>
>>> <requestHandler name="/update/extract"
>>> class="solr.extraction.ExtractingRequestHandler">
>>>   <lst name="defaults">
>>>  <str name="update.chain">sharepoint-pipeline</str>
>>>  <str name="fmap.content">text</str>
>>>  <str name="lowernames">true</str>
>>>  <str name="uprefix">ignored</str>

handling case insensitive and regex

2012-02-29 Thread Neil Hart
I'm just starting out...

for either
testing QA
TESTING QA

I can query with the following strings and find my text:
testing
TESTING
testing*

but the following doesn't work.
TESTING*

any ideas?
thanks
Neil


Re: Is there a way to implement a IntRangeField in Solr?

2012-02-29 Thread Mikhail Khludnev
AFAIK the join is done within a single core. The same core should hold both
types of documents.
Please let me know how it goes.

On Wed, Feb 29, 2012 at 8:46 PM, federico.wachs
wrote:

> I'll give this a try. I'm not sure I completely understand how to do that
> because I don't have so much experience with Solr. Do I have to use another
> core to post a different kind of document and then join it?
>
> Thanks!
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Is-there-a-way-to-implement-a-IntRangeField-in-Solr-tp3782083p3787873.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics


 


Re: handling case insensitive and regex

2012-02-29 Thread Emmanuel Espina
What query parser are you using? It looks like the Lucene query parser or
edismax. The cause is that wildcard queries do not get analyzed. So even if
you have a lowercase filter in the analysis chain, it is not applied when
you search using *.
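
A common workaround is to lowercase the term on the client side before
appending the wildcard, so it still lines up with the lowercased index
terms: send testing* rather than TESTING*.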

Thanks
Emmanuel

2012/2/29 Neil Hart :
> I'm just starting out...
>
> for either
> testing QA
> TESTING QA
>
> I can query with the following strings and find my text:
> testing
> TESTING
> testing*
>
> but the following doesn't work.
> TESTING*
>
> any ideas?
> thanks
> Neil


Re: searching top matches of each facet

2012-02-29 Thread Emmanuel Espina
I think that what you want is FieldCollapsing:

http://wiki.apache.org/solr/FieldCollapsing

For example
&q=my search&group=true&group.field=subject&group.limit=5

Test it to see if that is what you want.

Thanks
Emmanuel


2012/2/29 Paul :
> Let's say that I have a facet named 'subject' that contains one of:
> physics, chemistry, psychology, mathematics, etc
>
> I'd like to do a search for the top 5 documents in each category. I
> can do this with a separate search for each facet, but it seems like
> there would be a way to combine the searches. Is there a way?
>
> That is, if the user searches for "my search", I can now search for it
> with the facet of "physics" and rows=5, then do a separate search with
> the facet of "chemistry", etc...
>
> Can I do that in one search to decrease the load on the server? Or,
> when I do the first search, will the results be cached, so that the
> rest of the searches are pretty cheap?


Re: Too many values for UnInvertedField faceting on field topic

2012-02-29 Thread Emmanuel Espina
No. But probably we can find another way to do what you want. Please
describe the problem and include some "numbers" to give us an idea of
the sizes that you are handling. Number of documents, size of the
index, etc.

Thanks
Emmanuel

2012/2/29 Michael Jakl :
> Our Solr started to throw the following exception when requesting the
> facets of a multivalued field holding a lot of terms.
>
> SEVERE: org.apache.solr.common.SolrException: Too many values for
> UnInvertedField faceting on field topic
>        at 
> org.apache.solr.request.UnInvertedField.uninvert(UnInvertedField.java:390)
>        at 
> org.apache.solr.request.UnInvertedField.(UnInvertedField.java:180)
>        at 
> org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:871)
>        at 
> org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:287)
>        at 
> org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:319)
>        at 
> org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:72)
>        at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:193)
>        at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1373)
>        at 
> org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:54)
>        at org.apache.solr.core.SolrCore$4.call(SolrCore.java:1198)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:139)
>        at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:909)
>        at java.lang.Thread.run(Thread.java:662)
>
> Is there a way around it, maybe a setting to increase the limit?
> Using facet.method=enum, as suggested in a thread in 2009, is far too
> slow, at least in the experiments I did.
>
> I'm using Solr 3.5.0 on Linux (192GB RAM), so faceting was pretty fast
> after an initial cache warming.
>
> Cheers,
> Michael


Solr sorting question to boost a certain field first

2012-02-29 Thread Mike Austin
I have content that I index for several different domains.  What I'd like
to do is have all search results found for domainA returned first and
results for domainB, C, D, etc. returned second.  I could do two different
searches but was wondering if there was a way to only do one query but
return results from a certain domain first followed by results from the
rest of the domains second.

I thought about trying to boost but I question if the boost would always
make domainA return first?  Could someone please suggest a way to do this?
Thanks!

Example: Query for "apple" on "domainA" plus give me other domains that
have "apple" in them also

Example results:
1. DomainA, score .85
2. DomainA, score .84
3. DomainA, score .75
4. DomainA, score .65
5. DomainA, score .55
6. DomainA, score .35
--- now network results 
7. DomainC, score .94
8. DomainE, score .75
9. DomainB, score .68
10. DomainG, score .55
11. DomainC, score .35

Thanks,
Mike


Solr Design question on spatial search

2012-02-29 Thread Venu Shankar
Hello,

I have a design question for Solr.

I work for an enterprise which has a lot of retail stores (approx. 20K).
These retail stores are spread across the world.  My search requirement is
to find all the cities which are within x miles of a retail store.

So let's say we have a retail store in San Francisco: if I search for
"San" then San Francisco, Santa Clara, San Jose, San Juan, etc  should be
returned as they are within x miles from San Francisco. I also want to rank
the search results by their distance.

I can create an index with all the cities in it, but I am not sure how to
ensure that the cities returned in a search result have a nearby retail
store. Any suggestions ?

Thanks,
Venu,


Re: Solr sorting question to boost a certain field first

2012-02-29 Thread Mike Austin
Boom!

This works: sort=map(query($qq,-1),0, ,
1)+desc,score+desc&qq=domain:domainA
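
(Here query($qq,-1) returns the document's score against the $qq query, or -1
when it does not match, and map(x,min,max,target) maps every value in
[min,max] to target - so the first sort key is effectively a "matches domainA"
flag. Filled in with an assumed upper bound large enough to cover any score,
the parameter reads:)

sort=map(query($qq,-1),0,100000,1)+desc,score+desc&qq=domain:domainA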

Thanks,
Mike

On Wed, Feb 29, 2012 at 3:45 PM, Mike Austin  wrote:

> I have content that I index for several different domains.  What I'd like
> to do is have all search results found for domainA returned first and
> results for domainB, C, D, etc. returned second.  I could do two different
> searches but was wondering if there was a way to only do one query but
> return results from a certain domain first followed by results from the
> rest of the domains second.
>
> I thought about trying to boost but I question if the boost would always
> make domainA return first?  Could someone please suggest a way to do this?
> Thanks!
>
> Example: Query for "apple" on "domainA" plus give me other domains that
> have "apple" in them also
>
> Example results:
> 1. DomainA, score .85
> 2. DomainA, score .84
> 3. DomainA, score .75
> 4. DomainA, score .65
> 5. DomainA, score .55
> 6. DomainA, score .35
> --- now network results 
> 7. DomainC, score .94
> 8. DomainE, score .75
> 9. DomainB, score .68
> 10. DomainG, score .55
> 11. DomainC, score .35
>
> Thanks,
> Mike
>


Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-02-29 Thread Mark Miller
Do you have a _version_ field in your schema? I actually just came back to
this thread with that thought and then saw your error - so that remains my
guess.

I'm going to improve the doc on the wiki around what needs to be defined
for SolrCloud - so far we have things in the example defaults, but it's not
clear enough to users what needs to be there if they are using an old
schema or modifying the example.

  <field name="_version_" type="long" indexed="true" stored="true"/>

If you do have a _version_ field, there is something to track down here for
sure.

On Wed, Feb 29, 2012 at 1:15 PM, Matthew Parker <
mpar...@apogeeintegration.com> wrote:

> Mark/Sami
>
> I ran the system with 3 zookeeper nodes, 2 solr cloud nodes, and left
> numShards set to its default value (i.e. 1)
>
> It looks like it finally synced with the other one after quite a while, but
> it's throwing lots of errors like the following:
>
> org.apache.solr.common.SolrException: missing _version_ on update from
> leader at
> org.apache.solr.update.processor.DistributedUpdateProcessor.versionDelete(
> DistributedUpdateProcessor.java: 712)
> 
> 
> 
>
> Is it normal to sync long after the documents were sent for indexing?
>
> I'll have to check and see whether the 4 solr node instance with 2 shards
> works after waiting for the system to sync.
>
> Regards,
>
> Matt
>
> On Wed, Feb 29, 2012 at 12:03 PM, Matthew Parker <
> mpar...@apogeeintegration.com> wrote:
>
> > I also took out my requestHandler and used the standard /update/extract
> > handler. Same result.
> >
> > On Wed, Feb 29, 2012 at 11:47 AM, Matthew Parker <
> > mpar...@apogeeintegration.com> wrote:
> >
> >> I tried running SOLR Cloud with the default number of shards (i.e. 1),
> >> and I get the same results.
> >>
> >> On Wed, Feb 29, 2012 at 10:46 AM, Matthew Parker <
> >> mpar...@apogeeintegration.com> wrote:
> >>
> >>> Mark,
> >>>
> >>> Nothing appears to be wrong in the logs. I wiped the indexes and
> >>> imported 37 files from SharePoint using Manifold. All 37 make it in,
> but
> >>> SOLR still has issues with the results being inconsistent.
> >>>
> >>> Let me run my setup by you to see whether that is the issue.
> >>>
> >>> On one machine, I have three zookeeper instances, four solr instances,
> >>> and a data directory for solr and zookeeper config data.
> >>>
> >>> Step 1. I modified each zoo.xml configuration file to have:
> >>>
> >>> Zookeeper 1 - Create /zookeeper1/conf/zoo.cfg
> >>> 
> >>> tickTime=2000
> >>> initLimit=10
> >>> syncLimit=5
> >>> dataDir=[DATA_DIRECTORY]/zk1_data
> >>> clientPort=2181
> >>> server.1=localhost:2888:3888
> >>> server.2=localhost:2889:3889
> >>> server.3=localhost:2890:3890
> >>>
> >>> Zookeeper 1 - Create /[DATA_DIRECTORY]/zk1_data/myid with the following
> >>> contents:
> >>> ==
> >>> 1
> >>>
> >>> Zookeeper 2 - Create /zookeeper2/conf/zoo.cfg
> >>> ==
> >>> tickTime=2000
> >>> initLimit=10
> >>> syncLimit=5
> >>> dataDir=[DATA_DIRECTORY]/zk2_data
> >>> clientPort=2182
> >>> server.1=localhost:2888:3888
> >>> server.2=localhost:2889:3889
> >>> server.3=localhost:2890:3890
> >>>
> >>> Zookeeper 2 - Create /[DATA_DIRECTORY]/zk2_data/myid with the following
> >>> contents:
> >>> ==
> >>> 2
> >>>
> >>> Zookeeper 3 - Create /zookeeper3/conf/zoo.cfg
> >>> 
> >>> tickTime=2000
> >>> initLimit=10
> >>> syncLimit=5
> >>> dataDir=[DATA_DIRECTORY]/zk3_data
> >>> clientPort=2183
> >>> server.1=localhost:2888:3888
> >>> server.2=localhost:2889:3889
> >>> server.3=localhost:2890:3890
> >>>
> >>> Zookeeper 3 - Create /[DATA_DIRECTORY]/zk3_data/myid with the following
> >>> contents:
> >>> 
> >>> 3
> >>>
> >>> Step 2 - SOLR Build
> >>> ===
> >>>
> >>> I pulled the latest SOLR trunk down. I built it with the following
> >>> commands:
> >>>
> >>>ant example dist
> >>>
> >>> I modified the solr.war files and added the solr cell and extraction
> >>> libraries to WEB-INF/lib. I couldn't get the extraction to work
> >>> any other way. Will ZooKeeper pick up jar files stored with the rest of
> >>> the configuration files in Zookeeper?
> >>>
> >>> I copied the contents of the example directory to each of my SOLR
> >>> directories.
> >>>
> >>> Step 3 - Starting Zookeeper instances
> >>> ===
> >>>
> >>> I ran the following commands to start the zookeeper instances:
> >>>
> >>> start .\zookeeper1\bin\zkServer.cmd
> >>> start .\zookeeper2\bin\zkServer.cmd
> >>> start .\zookeeper3\bin\zkServer.cmd
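> >>>
> >>> (Side note: once all three are up, the ensemble can be sanity-checked with
> >>> the CLI that ships with ZooKeeper, e.g.
> >>>
> >>>     zookeeper1\bin\zkCli.cmd -server localhost:2181
> >>>
> >>> then run "ls /" at the prompt to confirm the quorum answers.)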
> >>>
> >>> Step 4 - Start Main SOLR instance
> >>> ==
> >>> I ran the following command to start the main SOLR instance
> >>>
> >>> java -Djetty.port=8081 -Dhostport=8081
> >>> -Dbootstrap_configdir=[DATA_DIRECTORY]/solr/conf -Dnumshards=2
> >>> -Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
> >>>
> >>> Starts up fine.
> >>>
>

Re: Building a resilient cluster

2012-02-29 Thread Mark Miller
Doh! Sorry - this was broken - I need to fix the doc or add it back.

The shard id is actually set in solr.xml since it's per core - the sys prop
was a sugar option we had set up. So either add 'shard' to the core in
solr.xml, or to make it work like it does in the doc, do:

  <core name="collection1" instanceDir="." shard="${shard:}" />

That sets shard to the 'shard' system property if it's set, or, as a default,
acts as if it wasn't set.
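
With that in place, a node can be started with an explicit shard id the same
way the other system properties in this thread are passed - a sketch:

java -Dshard=shard5 -Djetty.port=8085 -Dzkhost=localhost:2181 -jar start.jar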

I've been working with custom shard ids mainly through solrj, so I hadn't
noticed this.

- Mark

On Wed, Feb 29, 2012 at 10:36 AM, Ranjan Bagchi wrote:

> Hi,
>
> At this point I'm ok with one zk instance being a point of failure, I just
> want to create sharded solr instances, bring them into the cluster, and be
> able to shut them down without bringing down the whole cluster.
>
> According to the wiki page, I should be able to bring up a new shard by using
> shardId [-DshardId], but when I did that, the logs showed it replicating
> an existing shard.
>
> Ranjan
> Andre Bois-Crettez wrote:
>
> > You have to run ZK on a at least 3 different machines for fault
> > tolerance (a ZK ensemble).
> >
> >
> > http://wiki.apache.org/solr/SolrCloud#Example_C:_Two_shard_cluster_with_shard_replicas_and_zookeeper_ensemble
> >
> > Ranjan Bagchi wrote:
> > > Hi,
> > >
> > > I'm interested in setting up a solr cluster where each machine [at
> least
> > > initially] hosts a separate shard of a big index [too big to sit on the
> > > machine].  I'm able to put a cloud together by telling it that I have
> (to
> > > start out with) 4 nodes, and then starting up nodes on 3 machines
> > pointin=
> > g
> > > at the zkInstance.  I'm able to load my sharded data onto each machine
> > > individually and it seems to work.
> > >
> > > My concern is that it's not fault tolerant:  if one of the non-zookeeper
> > > machines falls over, the whole cluster won't work.  Also, I can't create a
> > > shard with more data, and have it work within the existing cloud.
> > >
> > > I tried using -DshardId=shard5 [on an existing 4-shard cluster], but it
> > > just started replicating, which doesn't seem right.
> > >
> > > Are there ways around this?
> > >
> > > Thanks,
> > > Ranjan Bagchi
> > >
> > >
>



-- 
- Mark

http://www.lucidimagination.com


Re: SolrCloud on Trunk

2012-02-29 Thread Mark Miller

On Feb 28, 2012, at 9:33 AM, Jamie Johnson wrote:

> where specifically this is on the roadmap for SolrCloud.  Anyone
> else have those details?

I think we would like to do this sometime in the near future, but I don't know
exactly what time frame it fits in yet. There is a lot to do still, and we also
need to get a 4.0 release of both Lucene and Solr out to users soon. It could be
in a point release later - but it's open source - it really just depends on
people picking it up and getting it done. I will say it's something I'd like
to see done.

With what we have now, one option we have talked about in the past was to just 
install multiple shards on a single machine - later you can start up a replica 
on a new machine when you are ready to grow and kill the original shard.

i.e. you could startup 15 shards on a single machine, and then over time 
migrate shards off nodes and onto new hardware. It's as simple as starting up a 
new replica on the new hardware and removing the core on machines you want to 
stop serving that shard from. This would let you expand to a 15 shard/machine 
cluster with N replicas (scaling replicas is as simple as starting a new node 
or stopping an old one).
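
A sketch of that migration via the core admin HTTP API (host, core, and shard
names here are illustrative; treat the exact parameter set as an assumption,
not a fixed recipe):

# bring up a replica of shard3 on the new hardware
curl "http://newhost:8983/solr/admin/cores?action=CREATE&name=shard3&collection=collection1&shard=shard3"

# once it is active, unload the core on the machine that should stop serving shard3
curl "http://oldhost:8983/solr/admin/cores?action=UNLOAD&core=shard3"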

- Mark Miller
lucidimagination.com


Re: Solr Cloud, Commits and Master/Slave configuration

2012-02-29 Thread Mark Miller
We actually do currently batch updates - we are being somewhat loose when we 
say a document at a time. There is a buffer of updates per replica that gets 
flushed depending on the requests coming through and the buffer size.

- Mark Miller
lucidimagination.com

On Feb 28, 2012, at 3:38 AM, eks dev wrote:

> SolrCluod is going to be great, NRT feature is really huge step
> forward, as well as central configuration, elasticity ...
> 
> The only thing I do not yet understand is treatment of cases that were
> traditionally covered by Master/Slave setup. Batch update
> 
> If I get it right (?), updates to replicas are sent one by one,
> meaning when one server receives update, it gets forwarded to all
> replicas. This is great for reduced update latency case, but I do not
> know how is it implemented if you hit it with "batch" update. This
> would cause huge amount of update commands going to replicas. Not so
> good for throughput.
> 
> - Master slave does distribution at segment level, (no need to
> replicate analysis, far less network traffic). Good for batch updates
> - SolrCloud distributes per update command (low latency, but chatty, and
> Analysis step is done N_Servers times). Good for incremental updates
> 
> Ideally, some sort of "batching" is going to be available in
> SolrCloud, with some control over it, e.g. forwarding batches of 1000
> documents (basically keep update log slightly longer and forward it as
> a batch update command). This would still cause duplicate analysis,
> but would reduce network traffic.
> 
> Please bear in mind, this is more of a question than a statement; I
> didn't look at the cloud code. It might be I am completely wrong here!
> 
> 
> 
> 
> 
> On Tue, Feb 28, 2012 at 4:01 AM, Erick Erickson  
> wrote:
>> As I understand it (and I'm just getting into SolrCloud myself), you can
>> essentially forget about master/slave stuff. If you're using NRT,
>> the soft commit will make the docs visible; you don't need to do a hard
>> commit (unlike the master/slave days). Essentially, the update is sent
>> to each shard leader and then fanned out into the replicas for that
>> leader. All automatically. Leaders are elected automatically. ZooKeeper
>> is used to keep the cluster information.
>> 
>> Additionally, SolrCloud keeps a transaction log of the updates, and replays
>> them if the indexing is interrupted, so you don't risk data loss the way
>> you used to.
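>> 
>> For example, on trunk a soft commit can be requested per update request
>> (/update?softCommit=true) or configured in solrconfig.xml - a sketch,
>> assuming the Solr 4 example's updateHandler section:
>> 
>>   <updateHandler class="solr.DirectUpdateHandler2">
>>     <autoSoftCommit>
>>       <maxTime>1000</maxTime> <!-- make new docs searchable every second -->
>>     </autoSoftCommit>
>>   </updateHandler>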
>> 
>> There aren't really masters/slaves in the old sense any more, so
>> you have to get out of that thought-mode (it's hard, I know).
>> 
>> The code is under pretty active development, so any feedback is
>> valuable
>> 
>> Best
>> Erick
>> 
>> On Mon, Feb 27, 2012 at 3:26 AM, roz dev  wrote:
>>> Hi All,
>>> 
>>> I am trying to understand features of Solr Cloud, regarding commits and
>>> scaling.
>>> 
>>> 
>>>   - If I am using Solr Cloud then do I need to explicitly call commit
>>>   (hard-commit)? Or, a soft commit is okay and Solr Cloud will do the job of
>>>   writing to disk?
>>> 
>>> 
>>>   - Do We still need to use  Master/Slave setup to scale searching? If we
>>>   have to use Master/Slave setup then do i need to issue hard-commit to make
>>>   my changes visible to slaves?
>>>   - If I were to use NRT with Master/Slave setup with soft commit then
>>>   will the slave be able to see changes made on master with soft commit?
>>> 
>>> Any inputs are welcome.
>>> 
>>> Thanks
>>> 
>>> -Saroj


Re: SolrCloud on Trunk

2012-02-29 Thread Jamie Johnson
Mark,

Is there a ticket around doing this?  If the work/design was written
down somewhere the community might have a better idea of how exactly
we could help.

On Wed, Feb 29, 2012 at 11:21 PM, Mark Miller  wrote:
>
> On Feb 28, 2012, at 9:33 AM, Jamie Johnson wrote:
>
>> where specifically this is on the roadmap for SolrCloud.  Anyone
>> else have those details?
>
> I think we would like to do this sometime in the near future, but I don't
> know exactly what time frame it fits in yet. There is a lot to do still, and we
> also need to get a 4.0 release of both Lucene and Solr out to users soon. It
> could be in a point release later - but it's open source - it really just
> depends on people picking it up and getting it done. I will say it's
> something I'd like to see done.
>
> With what we have now, one option we have talked about in the past was to 
> just install multiple shards on a single machine - later you can start up a 
> replica on a new machine when you are ready to grow and kill the original 
> shard.
>
> i.e. you could startup 15 shards on a single machine, and then over time 
> migrate shards off nodes and onto new hardware. It's as simple as starting up 
> a new replica on the new hardware and removing the core on machines you want 
> to stop serving that shard from. This would let you expand to a 15 
> shard/machine cluster with N replicas (scaling replicas is as simple as 
> starting a new node or stopping an old one).
>
> - Mark Miller
> lucidimagination.com
>


Re: SolrCloud on Trunk

2012-02-29 Thread Yonik Seeley
On Thu, Mar 1, 2012 at 12:27 AM, Jamie Johnson  wrote:
> Is there a ticket around doing this?

Around splitting shards?

The easiest thing to consider is just splitting a single shard in two
reusing some of the existing buffering/replication mechanisms we have.
1) create two new shards to represent each half of the old index
2) make sure leaders are forwarding updates to them and that the
shards are buffering them
3) do a commit and split the current index
4) proceed with recovery as normal on the two new shards (replicate
the halves, apply the buffered updates)
5) some unresolved stuff such as how to transition leadership from the
single big shard to the smaller shards.  maybe just handle like leader
failure.

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10


Re: Solr Design question on spatial search

2012-02-29 Thread Dirceu Vieira
I believe that what you need is spatial search...

Have a look at the documentation:  http://wiki.apache.org/solr/SpatialSearch
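
For the distance filtering and ranking pieces, a Solr 3.x-style sketch
(assuming each city document carries a LatLonType field named "location",
searching around one store's coordinates; d is in kilometers):

http://localhost:8983/solr/select?q=name:San*&fq={!geofilt}&sfield=location&pt=37.7749,-122.4194&d=80&sort=geodist()+asc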

On Wed, Feb 29, 2012 at 10:54 PM, Venu Shankar wrote:

> Hello,
>
> I have a design question for Solr.
>
> I work for an enterprise which has a lot of retail stores (approx. 20K).
> These retail stores are spread across the world.  My search requirement is
> to find all the cities which are within x miles of a retail store.
>
> So let's say we have a retail store in San Francisco: if I search for
> "San" then San Francisco, Santa Clara, San Jose, San Juan, etc  should be
> returned as they are within x miles from San Francisco. I also want to rank
> the search results by their distance.
>
> I can create an index with all the cities in it, but I am not sure how to
> ensure that the cities returned in a search result have a nearby retail
> store. Any suggestions ?
>
> Thanks,
> Venu,
>



-- 
Dirceu Vieira Júnior
---
+47 9753 2473
dirceuvjr.blogspot.com
twitter.com/dirceuvjr