Re: Issue with large html indexing

2013-10-24 Thread Raheel Hasan
ok. see this:
http://s23.postimg.org/yck2s5k1n/html_indexing.png



On Wed, Oct 23, 2013 at 10:45 PM, Erick Erickson wrote:

> Attachments and images are often eaten by the mail server, your image is
> not visible at least to me. Can you describe what you're seeing? Or post
> the image somewhere and provide a link?
>
> Best,
> Erick
>
>
On Wed, Oct 23, 2013 at 11:07 AM, Raheel Hasan wrote:
>
> > Hi,
> >
> > I have an issue here while indexing large HTML. Here is the configuration
> > for that:
> >
> > 1) Data is imported via URLDataSource / PlainTextEntityProcessor (DIH)
> >
> > 2) Schema has this for the field:
> > type="text_en_splitting" indexed="true" stored="false" required="false"
> >
> > 3) text_en_splitting applies the following analysis chain for indexing:
> > HTMLStripCharFilterFactory
> > WhitespaceTokenizerFactory (create tokens)
> > StopFilterFactory
> > WordDelimiterFilterFactory
> > ICUFoldingFilterFactory
> > PorterStemFilterFactory
> > RemoveDuplicatesTokenFilterFactory
> > LengthFilterFactory
> >
> > However, the indexed data is like this (as in the attached image):
> > [image: Inline image 1]
> >
> >
> > so what are these numbers?
> > If I index a small HTML file, it works fine, but as the size of the HTML
> > file increases, this is what happens.
> >
> > --
> > Regards,
> > Raheel Hasan
> >
>



-- 
Regards,
Raheel Hasan


Re: Minor bug with CloudSolrServer and collection-alias.

2013-10-24 Thread Thomas Egense
Thanks to both of you for fixing the bug. Impressive response time for the
fix (7 hours).

Thomas Egense


On Wed, Oct 23, 2013 at 7:16 PM, Mark Miller  wrote:

> I filed https://issues.apache.org/jira/browse/SOLR-5380 and just
> committed a fix.
>
> - Mark
>
> On Oct 23, 2013, at 11:15 AM, Shawn Heisey  wrote:
>
> > On 10/23/2013 3:59 AM, Thomas Egense wrote:
> >> Using cloudSolrServer.setDefaultCollection(collectionId) does not work
> as
> >> intended for an alias spanning more than 1 collection.
> >> The virtual collection-alias collectionID is recognized as an existing
> >> collection, but it only queries one of the collections it is mapped
> to.
> >>
> >> You can confirm this easily in AliasIntegrationTest.
> >>
> >> The test-class AliasIntegrationTest creates two cores with 2 and 3
> different
> >> documents, and then creates an alias pointing to both of them.
> >>
> >> Line 153:
> >>// search with new cloud client
> >>CloudSolrServer cloudSolrServer = new
> >> CloudSolrServer(zkServer.getZkAddress(), random().nextBoolean());
> >>cloudSolrServer.setParallelUpdates(random().nextBoolean());
> >>query = new SolrQuery("*:*");
> >>query.set("collection", "testalias");
> >>res = cloudSolrServer.query(query);
> >>cloudSolrServer.shutdown();
> >>assertEquals(5, res.getResults().getNumFound());
> >>
> >> No unit-test bug here; however, if you change it to set the collection
> >> id on CloudSolrServer instead of on the query, it will produce
> >> the bug:
> >>
> >>// search with new cloud client
> >>CloudSolrServer cloudSolrServer = new
> >>CloudSolrServer(zkServer.getZkAddress(), random().nextBoolean());
> >>cloudSolrServer.setDefaultCollection("testalias");
> >>cloudSolrServer.setParallelUpdates(random().nextBoolean());
> >>query = new SolrQuery("*:*");
> >>//query.set("collection", "testalias");
> >>res = cloudSolrServer.query(query);
> >>cloudSolrServer.shutdown();
> >>assertEquals(5, res.getResults().getNumFound());  <-- Assertion
> failure
> >>
> >> Should I create a Jira issue for this?
> >
> > Thomas,
> >
> > I have confirmed this with the following test patch, which adds to the
> > test rather than changing what's already there:
> >
> > http://apaste.info/9ke5
> >
> > I'm about to head off to the train station to start my commute, so I
> > will be unavailable for a little while.  If you haven't gotten the jira
> > filed by the time I get to another computer, I will create it.
> >
> > Thanks,
> > Shawn
> >
>
>


RE: New shard leaders or existing shard replicas depends on zookeeper?

2013-10-24 Thread Hoggarth, Gil
Absolutely, the scenario I'm seeing does _sound_ like I've not specified
the number of shards, but I think I have - the evidence is:
- -DnumShards=24 defined within the /etc/sysconfig/solrnode* files

- -DnumShards=24 seen on each 'ps' line (two nodes listed here):
" tomcat   26135 1  5 09:51 ?00:00:22 /opt/java/bin/java
-Djava.util.logging.config.file=/opt/tomcat_instances/solrnode1/conf/log
ging.properties
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
-Xms512m -Xmx5120m -Dsolr.solr.home=/opt/solrnode1 -Duser.language=en
-Duser.country=uk -Dbootstrap_confdir=/opt/solrnode1/ldwa01/conf
-Dcollection.configName=ldwa01cfg -DnumShards=24
-Dsolr.data.dir=/opt/data/solrnode1/ldwa01/data
-DzkHost=zk01.solr.wa.bl.uk:9983,zk02.solr.wa.bl.uk:9983,zk03.solr.wa.bl
.uk:9983 -Djava.endorsed.dirs=/opt/tomcat/endorsed -classpath
/opt/tomcat/bin/bootstrap.jar:/opt/tomcat/bin/tomcat-juli.jar
-Dcatalina.base=/opt/tomcat_instances/solrnode1
-Dcatalina.home=/opt/tomcat
-Djava.io.tmpdir=/opt/tomcat_instances/solrnode1/tmp
org.apache.catalina.startup.Bootstrap start
tomcat   26225 1  5 09:51 ?00:00:19 /opt/java/bin/java
-Djava.util.logging.config.file=/opt/tomcat_instances/solrnode2/conf/log
ging.properties
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
-Xms512m -Xmx5120m -Dsolr.solr.home=/opt/solrnode2 -Duser.language=en
-Duser.country=uk -Dbootstrap_confdir=/opt/solrnode2/ldwa01/conf
-Dcollection.configName=ldwa01cfg -DnumShards=24
-Dsolr.data.dir=/opt/data/solrnode2/ldwa01/data
-DzkHost=zk01.solr.wa.bl.uk:9983,zk02.solr.wa.bl.uk:9983,zk03.solr.wa.bl
.uk:9983 -Djava.endorsed.dirs=/opt/tomcat/endorsed -classpath
/opt/tomcat/bin/bootstrap.jar:/opt/tomcat/bin/tomcat-juli.jar
-Dcatalina.base=/opt/tomcat_instances/solrnode2
-Dcatalina.home=/opt/tomcat
-Djava.io.tmpdir=/opt/tomcat_instances/solrnode2/tmp
org.apache.catalina.startup.Bootstrap start"

- The Solr node dashboard shows "-DnumShards=24" in its list of Args for
each node

And yet, the ldwa01 nodes are leader and replica of shard 17 and there
are no other shard leaders created. Plus, if I only change the ZK
ensemble declarations in /etc/sysconfig/solrnode* to the different dev ZK
servers, all 24 leaders are created before any replicas are added.

I can also mention, when I browse the Cloud view, I can see both the
ldwa01 collection and the ukdomain collection listed, suggesting that
this information comes from the ZKs - I assume this is as expected.
Plus, the correct node addresses (e.g., 192.168.45.17:8984) are listed
for ldwa01 but these addresses are also listed as 'Down' in the ukdomain
collection (except for :8983 which only shows in the ldwa01 collection).

Any help very gratefully received.
Gil

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: 23 October 2013 18:50
To: solr-user@lucene.apache.org
Subject: Re: New shard leaders or existing shard replicas depends on
zookeeper?

My first impulse would be to ask how you created the collection. It sure
_sounds_ like you didn't specify 24 shards and thus have only a single
shard, one leader and 23 replicas

bq: ...to point to the zookeeper ensemble also used for the ukdomain
collection...

so my guess is that this ZK ensemble has the ldwa01 collection defined
as having only one shard

I admit I pretty much skimmed your post though...

Best,
Erick


On Wed, Oct 23, 2013 at 12:54 PM, Hoggarth, Gil 
wrote:

> Hi solr-users,
>
>
>
> I'm seeing some confusing behaviour in Solr/zookeeper and hope you can

> shed some light on what's happening/how I can correct it.
>
>
>
> We have two physical servers running automated builds of RedHat 6.4 
> and Solr 4.4.0 that host two separate Solr services. The first server 
> (called ld01) has 24 shards and hosts a collection called 'ukdomain'; 
> the second server (ld02) also has 24 shards and hosts a different 
> collection called 'ldwa01'. It's evidently important to note that 
> previously both of these physical servers provided the 'ukdomain'
> collection, but the 'ldwa01' server has been rebuilt for the new 
> collection.
>
>
>
> When I start the ldwa01 solr nodes with their zookeeper configuration 
> (defined in /etc/sysconfig/solrnode* and with collection.configName as
> 'ldwa01cfg') pointing to the development zookeeper ensemble, all nodes

> initially become shard leaders and then replicas as I'd expect. But if

> I change the ldwa01 solr nodes to point to the zookeeper ensemble also

> used for the ukdomain collection, all ldwa01 solr nodes start on the 
> same shard (that is, the first ldwa01 solr node becomes the shard 
> leader, then every other solr node becomes a replica for this shard). 
> The significant point here is no other ldwa01 shards gain leaders (or
replicas).
>
>
>
> The ukdomain collection uses a zookeeper collection.configName of 
> 'ukdomaincfg', and prior to the creation of this ldwa01 service the 
> collection.configName of 'ldwa01cfg' has never previously 

Re: Terms function join with a Select function ?

2013-10-24 Thread Bruno Mannina

Dear All,

Ok, I have an answer concerning the first question (the limit):
it's the terms.limit parameter.
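For example (field name hypothetical), asking the TermsComponent for the top
50 terms instead of the default 10:

http://localhost:8983/solr/terms?terms.fl=myfield&terms.limit=50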

But I can't find a way to apply a Terms request to a query result.

any idea ?

Bruno

On 23/10/2013 23:19, Bruno Mannina wrote:

Dear Solr users,

I use the Terms function to see the frequency data in a field but it's 
for the whole database.


I have 2 questions:
- Is it possible to increase the number of statistics? Currently I get only
the 10 most frequent terms.


- Is it possible to limit these statistics to the results of a request?

PS: the second question is very important for me.

Many thanks








SolrCloud: optimizing a core triggers optimizations of all cores in that collection?

2013-10-24 Thread michael.boom
Hi!

I have a SolrCloud setup: 3 shards on two servers, replicationFactor=2.
Today I triggered an optimize on core *shard2_replica2*, which only
contained 3M docs and 2.7G.
The sizes of the other shards were shard3=2.7G and shard1=48G (the routing is
implicit, but after some update deadlocks and restarts the shard range in
Zookeeper became null, and everything since then has apparently been indexed
to shard1).

So, half an hour after I triggered the optimize via the Admin UI, I
noticed that used space was increasing a lot on *both servers* for cores
*shard1_replica1 and shard1_replica2*.
It was now 67G and increasing. In the end, about 40 minutes after the
start of the operation, shard1 was done optimizing on both servers, leaving
shard1_replica1 and shard1_replica2 at about 33G.

Any idea what is happening, and why the core I wanted to optimize got no
optimization, while another shard got optimized instead, on both servers?



-
Thanks,
Michael
--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-optimizing-a-core-triggers-optimizations-of-all-cores-in-that-collection-tp4097499.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Spellcheck with Distributed Search (sharding).

2013-10-24 Thread Luis Cappa Banda
Any idea?


2013/10/23 Luis Cappa Banda 

> More info:
>
> When executing the Query to a single Solr server it works:
> http://solr1:8080/events/data/suggest?q=m&wt=json
>
> {
>   "responseHeader": { "status": 0, "QTime": 1 },
>   "response": { "numFound": 0, "start": 0, "docs": [] },
>   "spellcheck": {
>     "suggestions": [
>       "m",
>       {
>         "numFound": 4,
>         "startOffset": 0,
>         "endOffset": 1,
>         "suggestion": [ "marca", "marcacom", "mis", "mispelotas" ]
>       }
>     ]
>   }
> }
>
>
> But when choosing the Request handler this way it doesn't:
> http://solr1:8080/events/data/select?qt=/suggest&wt=json&q=*:*
>
>
>
>
> 2013/10/23 Luis Cappa Banda 
>
>> Hello!
>>
>> I've been trying to enable spellchecking using sharding following the
>> steps from the Wiki, but I failed :-( What I do is:
>>
>> *Solrconfig.xml*
>>
>>
>> <searchComponent name="suggest" class="solr.SpellCheckComponent">
>>   <lst name="spellchecker">
>>     <str name="name">suggest</str>
>>     <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
>>     <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
>>     <str name="field">suggestion</str>
>>     <str name="buildOnOptimize">true</str>
>>   </lst>
>> </searchComponent>
>>
>>
>> <requestHandler name="/suggest" class="solr.SearchHandler">
>>   <lst name="defaults">
>>     <str name="df">suggestion</str>
>>     <str name="spellcheck">true</str>
>>     <str name="spellcheck.dictionary">suggest</str>
>>     <str name="spellcheck.count">10</str>
>>   </lst>
>>   <arr name="components">
>>     <str>suggest</str>
>>   </arr>
>> </requestHandler>
>>
>>
>> *Note:* I have two shards (solr1 and solr2) and both have the same
>> solrconfig.xml. Also, both indexes were optimized to create the spellchecker
>> indexes.
>>
>> *Query*
>>
>>
>> solr1:8080/events/data/select?q=m&qt=/suggestion&shards.qt=/suggestion&wt=json&shards=solr1:8080/events/data,solr2:8080/events/data
>>
>> *Response*
>> {
>>   "responseHeader": {
>>     "status": 404,
>>     "QTime": 12,
>>     "params": {
>>       "shards": "solr1:8080/events/data,solr2:8080/events/data",
>>       "shards.qt": "/suggestion",
>>       "q": "m",
>>       "wt": "json",
>>       "qt": "/suggestion"
>>     }
>>   },
>>   "error": {
>>     "msg": "Server at http://solr1:8080/events/data returned non ok status:404, message:Not Found",
>>     "code": 404
>>   }
>> }
>>
>> More query syntaxes that I tried and that don't work:
>>
>>
>> http://solr1:8080/events/data/select?q=m&qt=suggestion&shards.qt=/suggestion&wt=json&shards=solr1:8080/events/data,solr2:8080/events/data
>>
>>
>> http://solr1:8080/events/data/select?q=*:*&spellcheck.q=m&qt=suggestion&shards.qt=/suggestion&wt=json&shards=solr1:8080/events/data,solr2:8080/events/data
>>
>>
>> Any idea of what I'm doing wrong?
>>
>> Thank you very much in advance!
>>
>> Best regards,
>>
>> --
>> - Luis Cappa
>>
>
>
>
> --
> - Luis Cappa
>



-- 
- Luis Cappa


Proposal for new feature, cold replicas, brainstorming

2013-10-24 Thread yriveiro
I've been wondering for some time whether it's possible to have replicas of a
shard synchronized but in a state where they accept only updates, not queries.

A replica in this "replication" mode would only wake up to accept queries if
it is the last replica alive, and would go back to replication mode when
another replica becomes alive and synchronized.

The motivation is simple: I want replication, but I don't want to have n
active replicas with full resources allocated (caches and so on). This is
useful in environments where replication is needed but a high query
throughput is not fundamental and resources are limited.

I know that right now this is not possible, but I think it's a feature that
could be implemented in an easy way by creating a new status for shards.

The bottom-line question is: am I the only one with this kind of
requirement? Does functionality like this make sense?



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Proposal-for-new-feature-cold-replicas-brainstorming-tp4097501.html
Sent from the Solr - User mailing list archive at Nabble.com.


Query & result caching with custom functions

2013-10-24 Thread Mathias Lux
Hi all!

Got a question on the Solr cache :)

I've written a custom function, which is able to provide a distance
based on some DocValues to re-sort result lists. This basically works
great, but we've got the problem that if I don't change the query, only
the function parameters, Solr delivers a cached result without
re-ordering. I turned off caching and, sure enough, problem solved. But
of course this is not an avenue I want to pursue further, as it doesn't
make sense for a production system.

Do you have any ideas (beyond fake query modification and turning off
caching) to counteract?

btw. I'm using Solr 4.4 (so if you are aware of the issue and it has
been resolved in 4.5 I'll port it :) The code I'm using is at
https://bitbucket.org/dermotte/liresolr

regards,
Mathias

-- 
Dr. Mathias Lux
Assistant Professor, Klagenfurt University, Austria
http://tinyurl.com/mlux-itec


Solr subset searching in 100-million document index

2013-10-24 Thread Sandeep Gupta
Hi,

We have a Solr index of around 100 million documents, each assigned a region
id, growing at a rate of about 10 million documents per month - the average
document size is around 10KB of pure text. The region ids themselves number
around 2.5 million.

We want to search for a query with a given list of region ids. The number
of region ids in this list is usually around 250-300 (most of the time),
but can be up to 500, with a maximum cap of around 2000 ids in one request.


What is the best way to model such queries, besides using an IN-style list in
the query or a filter query (fq)? Are there any other, faster
methods available?


If it helps, the index is on a VM with 4 virtual cores and currently has
4GB of Java memory allocated out of 16GB in the machine. The number of
queries does not exceed 1 per minute for now. If needed, we can
throw more hardware at the index - but the index will still be only on a
single machine for at least 6 months.

Regards,
Sandeep Gupta


Re: Terms function join with a Select function ?

2013-10-24 Thread Erik Hatcher
That would be called faceting :)

http://wiki.apache.org/solr/SimpleFacetParameters
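For example (field name hypothetical) - the counts are then computed only over
the documents matching q and any fq filters:

http://localhost:8983/solr/select?q=your+query&rows=0&facet=true&facet.field=myfield&facet.limit=50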




On Oct 24, 2013, at 5:23 AM, Bruno Mannina  wrote:

> Dear All,
> 
> Ok, I have an answer concerning the first question (the limit):
> it's the terms.limit parameter.
> 
> But I can't find a way to apply a Terms request to a query result.
> 
> any idea ?
> 
> Bruno
> 
> On 23/10/2013 23:19, Bruno Mannina wrote:
>> Dear Solr users,
>> 
>> I use the Terms function to see the frequency data in a field but it's for 
>> the whole database.
>> 
>> I have 2 questions:
>> - Is it possible to increase the number of statistics? Currently I get only
>> the 10 most frequent terms.
>> 
>> - Is it possible to limit these statistics to the results of a request?
>> 
>> PS: the second question is very important for me.
>> 
>> Many thanks
>> 
>> 
> 
> 
> 



Basic query process question with fl=id

2013-10-24 Thread Manuel Le Normand
Hi

Any distributed lookup is basically composed of two stages: a first that
collects all the matching documents from every shard, and a second that
fetches additional information about specific ids (e.g. stored fields,
termVectors).

It can be seen in the logs of each shard (isShard=true), where a first
request logs the number of hits the query received on the
specific shard, and a second contains the ids field (ids=...) for the
additional fetch.
At the end of both I get a total QTime of the query and the total num of
hits.

My question is about the case where only ids are requested (fl=id). Such a
query should make only one request against each shard, while it actually
performs both.

Looks like the response builder has to go through these two stages no
matter what kind of query it is.

My question:
1. Is it normal that the response builder has to go through both stages?
2. Does the first request get internal Lucene docIds or the actual
uniqueKey id?
3. For a query as above (fl=id), where is the id read from? Is it fetched
from the stored fields, or from the docValues file if it exists? Because if
it is fetched from the stored fields, a high rows param (say 1000 in my case)
would need 1000 lookups, which could badly hurt performance.

Thanks
Manuel


RE: Spellcheck with Distributed Search (sharding).

2013-10-24 Thread Dyer, James
Is it that your request handler is named "/suggest" but you are setting 
"shards.qt" to "/suggestion" ?

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Luis Cappa Banda [mailto:luisca...@gmail.com] 
Sent: Thursday, October 24, 2013 6:22 AM
To: solr-user@lucene.apache.org
Subject: Re: Spellcheck with Distributed Search (sharding).

Any idea?


2013/10/23 Luis Cappa Banda 

> More info:
>
> When executing the Query to a single Solr server it works:
> http://solr1:8080/events/data/suggest?q=m&wt=json
>
> {
>   "responseHeader": { "status": 0, "QTime": 1 },
>   "response": { "numFound": 0, "start": 0, "docs": [] },
>   "spellcheck": {
>     "suggestions": [
>       "m",
>       {
>         "numFound": 4,
>         "startOffset": 0,
>         "endOffset": 1,
>         "suggestion": [ "marca", "marcacom", "mis", "mispelotas" ]
>       }
>     ]
>   }
> }
>
>
> But when choosing the Request handler this way it doesn't:
> http://solr1:8080/events/data/select?qt=/suggest&wt=json&q=*:*
>
>
>
>
> 2013/10/23 Luis Cappa Banda 
>
>> Hello!
>>
>> I've been trying to enable spellchecking using sharding following the
>> steps from the Wiki, but I failed :-( What I do is:
>>
>> *Solrconfig.xml*
>>
>>
>> <searchComponent name="suggest" class="solr.SpellCheckComponent">
>>   <lst name="spellchecker">
>>     <str name="name">suggest</str>
>>     <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
>>     <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
>>     <str name="field">suggestion</str>
>>     <str name="buildOnOptimize">true</str>
>>   </lst>
>> </searchComponent>
>>
>>
>> <requestHandler name="/suggest" class="solr.SearchHandler">
>>   <lst name="defaults">
>>     <str name="df">suggestion</str>
>>     <str name="spellcheck">true</str>
>>     <str name="spellcheck.dictionary">suggest</str>
>>     <str name="spellcheck.count">10</str>
>>   </lst>
>>   <arr name="components">
>>     <str>suggest</str>
>>   </arr>
>> </requestHandler>
>>
>>
>> *Note:* I have two shards (solr1 and solr2) and both have the same
>> solrconfig.xml. Also, both indexes were optimized to create the spellchecker
>> indexes.
>>
>> *Query*
>>
>>
>> solr1:8080/events/data/select?q=m&qt=/suggestion&shards.qt=/suggestion&wt=json&shards=solr1:8080/events/data,solr2:8080/events/data
>>
>> *Response*
>> {
>>   "responseHeader": {
>>     "status": 404,
>>     "QTime": 12,
>>     "params": {
>>       "shards": "solr1:8080/events/data,solr2:8080/events/data",
>>       "shards.qt": "/suggestion",
>>       "q": "m",
>>       "wt": "json",
>>       "qt": "/suggestion"
>>     }
>>   },
>>   "error": {
>>     "msg": "Server at http://solr1:8080/events/data returned non ok status:404, message:Not Found",
>>     "code": 404
>>   }
>> }
>>
>> More query syntaxes that I tried and that don't work:
>>
>>
>> http://solr1:8080/events/data/select?q=m&qt=suggestion&shards.qt=/suggestion&wt=json&shards=solr1:8080/events/data,solr2:8080/events/data
>>
>>
>> http://solr1:8080/events/data/select?q=*:*&spellcheck.q=m&qt=suggestion&shards.qt=/suggestion&wt=json&shards=solr1:8080/events/data,solr2:8080/events/data
>>
>>
>> Any idea of what I'm doing wrong?
>>
>> Thank you very much in advance!
>>
>> Best regards,
>>
>> --
>> - Luis Cappa
>>
>
>
>
> --
> - Luis Cappa
>



-- 
- Luis Cappa



Re: Proposal for new feature, cold replicas, brainstorming

2013-10-24 Thread Toke Eskildsen
On Thu, 2013-10-24 at 13:27 +0200, yriveiro wrote:
> The motivation is simple: I want replication, but I don't want to have n
> active replicas with full resources allocated (caches and so on). This is
> useful in environments where replication is needed but a high query
> throughput is not fundamental and resources are limited.

Coincidentally we recently talked about the exact same setup.

We are looking at sharding a 20 TB index into 20 * 1 TB shards, each
located on their own dedicated physical SSD, which has more than enough
horsepower for our needs. For replication, we have a remote storage
system capable of serving requests for 2-4 shards with acceptable
latency.

Projected performance for the SSD setup is superior (5-10 times) to our
remote storage, so we would like to hit only the SSDs if possible.
Setting up a cloud to issue all requests to the SSD shards unless a
catastrophic failure happened to one of them, and in that case fall back to
the remote storage replica for only that shard, would be perfect.

> I know that right now is not possible, but I think that it's a feature that
> can be implemented in a easy way creating a new status for shards.

shardIsLastResort=true? On paper it seems like a simple addition, but I
am not familiar enough with the SolrCloud code to guess whether it is easy
to implement.

- Toke Eskildsen, State and University Library, Denmark




Searching on special characters

2013-10-24 Thread johnmunir
Hi,


How should I set up Solr so I can search and get hits on special characters such
as: + - && || ! ( ) { } [ ] ^ " ~ * ? : \


My need is, if a user has text like so:


Doc-#1: "(Solr)"
Doc-#2: "Solr"


And if they type "(solr)", I want a hit on "(solr)" only in document #1, with the
brackets matching.  And if they type "solr", they will get a hit in document #2
only.


An additional nice-to-have is, if they type "solr", I want a hit in both 
document #1 and #2.


Here is what my current schema.xml looks like:


[fieldType definition stripped by the mail server]



Currently, special characters are being stripped.



Any idea how I can configure Solr to do this?  I'm using Solr 3.6.



Thanks !!


-MJ


Re: Searching on special characters

2013-10-24 Thread Jack Krupansky
Have two or three copies of the text: one field could be a raw string,
boosted heavily for exact match; a second could be text using the keyword
tokenizer but with a lowercase filter, also heavily boosted; and the third a
general, tokenized text field with a lower boost. You could also have a copy
that uses the keyword tokenizer to maintain a single token but also applies
a regex filter to strip special characters and a lowercase filter, and give
that an intermediate boost.
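As a rough sketch of that layout (all field and type names here are made up,
and the boosts would go on the query side):

<!-- schema.xml: one source field copied into differently-analyzed fields -->
<field name="body" type="text_general" indexed="true" stored="true"/>
<field name="body_raw" type="string" indexed="true" stored="false"/>
<field name="body_keyword" type="keyword_lower" indexed="true" stored="false"/>
<copyField source="body" dest="body_raw"/>
<copyField source="body" dest="body_keyword"/>

<!-- keyword_lower keeps the whole value as a single lowercased token -->
<fieldType name="keyword_lower" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

A query can then search all three, e.g.
q=body_raw:"(Solr)"^10 OR body_keyword:"(solr)"^5 OR body:solr
so an exact match, brackets and all, outranks a plain tokenized match.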


-- Jack Krupansky

-Original Message- 
From: johnmu...@aol.com

Sent: Thursday, October 24, 2013 9:20 AM
To: solr-user@lucene.apache.org
Subject: Searching on special characters

Hi,


How should I set up Solr so I can search and get hits on special characters
such as: + - && || ! ( ) { } [ ] ^ " ~ * ? : \



My need is, if a user has text like so:


Doc-#1: "(Solr)"
Doc-#2: "Solr"


And if they type "(solr)", I want a hit on "(solr)" only in document #1, with
the brackets matching.  And if they type "solr", they will get a hit in
document #2 only.



An additional nice-to-have is, if they type "solr", I want a hit in both 
document #1 and #2.



Here is what my current schema.xml looks like:



<fieldType ...>   [opening tag and tokenizer stripped by the mail server]
  <filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" enablePositionIncrements="true"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1"
          catenateNumbers="1" catenateAll="1" splitOnCaseChange="0"
          splitOnNumerics="1" stemEnglishPossessive="1" preserveOriginal="1"/>
  <filter class="..." protected="protwords.txt"/>   [stemmer class stripped]
</fieldType>



Currently, special characters are being stripped.



Any idea how I can configure Solr to do this?  I'm using Solr 3.6.



Thanks !!


-MJ 



Re: Spellcheck with Distributed Search (sharding).

2013-10-24 Thread Luis Cappa Banda
It's just a typo in my email, sorry about that! The request handler name is
spelled correctly in the real configuration, and it still doesn't work.


2013/10/24 Dyer, James 

> Is it that your request handler is named "/suggest" but you are setting
> "shards.qt" to "/suggestion" ?
>
> James Dyer
> Ingram Content Group
> (615) 213-4311
>
>
> -Original Message-
> From: Luis Cappa Banda [mailto:luisca...@gmail.com]
> Sent: Thursday, October 24, 2013 6:22 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Spellcheck with Distributed Search (sharding).
>
> Any idea?
>
>
> 2013/10/23 Luis Cappa Banda 
>
> > More info:
> >
> > When executing the Query to a single Solr server it works:
> > http://solr1:8080/events/data/suggest?q=m&wt=json
> >
> > {
> >   "responseHeader": { "status": 0, "QTime": 1 },
> >   "response": { "numFound": 0, "start": 0, "docs": [] },
> >   "spellcheck": {
> >     "suggestions": [
> >       "m",
> >       {
> >         "numFound": 4,
> >         "startOffset": 0,
> >         "endOffset": 1,
> >         "suggestion": [ "marca", "marcacom", "mis", "mispelotas" ]
> >       }
> >     ]
> >   }
> > }
> >
> >
> > But when choosing the Request handler this way it doesn't:
> > http://solr1:8080/events/data/select?qt=/suggest&wt=json&q=*:*
> >
> >
> >
> >
> >
> > 2013/10/23 Luis Cappa Banda 
> >
> >> Hello!
> >>
> >> I've been trying to enable spellchecking using sharding following the
> >> steps from the Wiki, but I failed :-( What I do is:
> >>
> >> *Solrconfig.xml*
> >>
> >>
> >> <searchComponent name="suggest" class="solr.SpellCheckComponent">
> >>   <lst name="spellchecker">
> >>     <str name="name">suggest</str>
> >>     <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
> >>     <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
> >>     <str name="field">suggestion</str>
> >>     <str name="buildOnOptimize">true</str>
> >>   </lst>
> >> </searchComponent>
> >>
> >>
> >> <requestHandler name="/suggest" class="solr.SearchHandler">
> >>   <lst name="defaults">
> >>     <str name="df">suggestion</str>
> >>     <str name="spellcheck">true</str>
> >>     <str name="spellcheck.dictionary">suggest</str>
> >>     <str name="spellcheck.count">10</str>
> >>   </lst>
> >>   <arr name="components">
> >>     <str>suggest</str>
> >>   </arr>
> >> </requestHandler>
> >>
> >>
> >> *Note:* I have two shards (solr1 and solr2) and both have the same
> >> solrconfig.xml. Also, both indexes were optimized to create the
> spellchecker
> >> indexes.
> >>
> >> *Query*
> >>
> >>
> >>
> solr1:8080/events/data/select?q=m&qt=/suggestion&shards.qt=/suggestion&wt=json&shards=solr1:8080/events/data,solr2:8080/events/data
> >>
> >> *Response*
> >> {
> >>   "responseHeader": {
> >>     "status": 404,
> >>     "QTime": 12,
> >>     "params": {
> >>       "shards": "solr1:8080/events/data,solr2:8080/events/data",
> >>       "shards.qt": "/suggestion",
> >>       "q": "m",
> >>       "wt": "json",
> >>       "qt": "/suggestion"
> >>     }
> >>   },
> >>   "error": {
> >>     "msg": "Server at http://solr1:8080/events/data returned non ok status:404, message:Not Found",
> >>     "code": 404
> >>   }
> >> }
> >>
> >> More query syntaxes that I used and that doesn't work:
> >>
> >>
> >>
> http://solr1:8080/events/data/select?q=m&qt=suggestion&shards.qt=/suggestion&wt=json&shards=solr1:8080/events/data,solr2:8080/events/data
> >>
> >>
> >>
> http://solr1:8080/events/data/select?q=*:*&spellcheck.q=m&qt=suggestion&shards.qt=/suggestion&wt=json&shards=solr1:8080/events/data,solr2:8080/events/data
> >>
> >>
> >> Any idea of what I'm doing wrong?
> >>
> >> Thank you very much in advance!
> >>
> >> Best regards,
> >>
> >> --
> >> - Luis Cappa
> >>
> >
> >
> >
> > --
> > - Luis Cappa
> >
>
>
>
> --
> - Luis Cappa
>
>


-- 
- Luis Cappa


Re: Searching on special characters

2013-10-24 Thread johnmunir
I'm not sure what you mean.  Based on what you are saying, is there an example 
of how I can set up my schema.xml to get the result I need?


Also, the way I execute a search is using 
http://localhost:8080/solr/select/?q=  Does your solution require 
me to change this?  If so, in what way?


It would be great if all this were documented somewhere, so I won't have to bug
you guys !!!



--MJ



-Original Message-
From: Jack Krupansky 
To: solr-user 
Sent: Thu, Oct 24, 2013 9:39 am
Subject: Re: Searching on special characters


Have two or three copies of the text, one field could be raw string and 
boosted heavily for exact match, a second could be text using the keyword 
tokenizer but with lowercase filter also heavily boosted, and the third 
field general, tokenized text with a lower boost. You could also have a copy 
that uses the keyword tokenizer to maintain a single token but also applies 
a regex filter to strip special characters and applies a lower case filter 
and give that an intermediate boost.

-- Jack Krupansky

-Original Message- 
From: johnmu...@aol.com
Sent: Thursday, October 24, 2013 9:20 AM
To: solr-user@lucene.apache.org
Subject: Searching on special characters

Hi,


How should I set up Solr so I can search and get hits on special characters
such as: + - && || ! ( ) { } [ ] ^ " ~ * ? : \


My need is, if a user has text like so:


Doc-#1: "(Solr)"
Doc-#2: "Solr"


And if they type "(solr)", I want a hit on "(solr)" only in document #1, with
the brackets matching.  And if they type "solr", they will get a hit in
document #2 only.


An additional nice-to-have is, if they type "solr", I want a hit in both 
document #1 and #2.


Here is what my current schema.xml looks like:



[fieldType definition stripped by the mail server]



Currently, special characters are being stripped.



Any idea how I can configure Solr to do this?  I'm using Solr 3.6.



Thanks !!


-MJ 


 



Re: Issue with large html indexing

2013-10-24 Thread Shawn Heisey
On 10/24/2013 2:11 AM, Raheel Hasan wrote:
> ok. see this:
> http://s23.postimg.org/yck2s5k1n/html_indexing.png

A recap.  You said your index analysis chain is this:

HTMLStripCharFilterFactory
WhitespaceTokenizerFactory (create tokens)
StopFilterFactory
WordDelimiterFilterFactory
ICUFoldingFilterFactory
PorterStemFilterFactory
RemoveDuplicatesTokenFilterFactory
LengthFilterFactory

Your picture says you have 1 document, and this field contains 1036
terms. The numbers are likely numbers that are in your html document.
You never showed us the input document.  It is likely that the
whitespace tokenizer and/or the WordDelimiter filter are producing these
numbers as standalone tokens.  The tokenizer is pretty easy to
understand - it splits on whitespace.  Please see the following to know
what the options for WordDelimiterFilterFactory will do:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
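For reference, that chain corresponds to a fieldType along these lines (the
filter order comes from your list; the attribute values here are assumptions,
and it is exactly options like generateNumberParts="1" that emit standalone
number tokens):

<fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" splitOnNumerics="1"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.LengthFilterFactory" min="2" max="256"/>
  </analyzer>
</fieldType>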

Thanks,
Shawn



Re: Query & result caching with custom functions

2013-10-24 Thread Shawn Heisey
On 10/24/2013 5:35 AM, Mathias Lux wrote:
> I've written a custom function, which is able to provide a distance
> based on some DocValues to re-sort result lists. This basically works
> great, but we've got the problem that if I don't change the query, but
> the function parameters, Solr delivers a cached result without
> re-ordering. I turned off caching and see there, problem solved. But
> of course this is not an avenue I want to pursue further as it doesn't
> make sense for a production system.
> 
> Do you have any ideas (beyond fake query modification and turning off
> caching) to counteract?
> 
> btw. I'm using Solr 4.4 (so if you are aware of the issue and it has
> been resolved in 4.5 I'll port it :) The code I'm using is at
> https://bitbucket.org/dermotte/liresolr

I suspect that the queryResultCache is not paying attention to the fact
that parameters for your plugin have changed.  This probably means that
your plugin must somehow inform the "cache check" code that something
HAS changed.

How you actually do this is a mystery to me because it involves parts of
the code that are beyond my understanding, but it MIGHT involve making
sure that parameters related to your code are saved as part of the entry
that goes into the cache.

Thanks,
Shawn



Re: Solr subset searching in 100-million document index

2013-10-24 Thread Joel Bernstein
Sandeep,

This type of operation can often be expressed as a PostFilter very
efficiently. This is particularly true if the region ids are integer keys.
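For illustration, a minimal sketch of that approach against the Solr 4.x APIs
(the class name and the "region_id" docValues field are hypothetical, and a
real version would be built by a QParserPlugin that parses the id list out of
an fq parameter):

import java.io.IOException;
import java.util.Set;

import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.search.IndexSearcher;
import org.apache.solr.search.DelegatingCollector;
import org.apache.solr.search.ExtendedQueryBase;
import org.apache.solr.search.PostFilter;

public class RegionIdPostFilter extends ExtendedQueryBase implements PostFilter {

  private final Set<Long> allowedIds;  // the 250-2000 region ids of one request

  public RegionIdPostFilter(Set<Long> allowedIds) {
    this.allowedIds = allowedIds;
  }

  @Override
  public boolean getCache() {
    return false;  // post filters are not cached
  }

  @Override
  public int getCost() {
    // a cost of 100 or more tells Solr to run this as a post filter
    return Math.max(super.getCost(), 100);
  }

  @Override
  public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
    return new DelegatingCollector() {
      private NumericDocValues regionIds;

      @Override
      public void setNextReader(AtomicReaderContext context) throws IOException {
        regionIds = context.reader().getNumericDocValues("region_id");
        super.setNextReader(context);
      }

      @Override
      public void collect(int doc) throws IOException {
        // only documents whose region id is in the allowed set reach the
        // delegate collectors (ranking, faceting, etc.)
        if (regionIds != null && allowedIds.contains(regionIds.get(doc))) {
          super.collect(doc);
        }
      }
    };
  }
}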

Joel

On Thu, Oct 24, 2013 at 7:46 AM, Sandeep Gupta  wrote:

> Hi,
>
> We have a Solr index of around 100 million documents with each document
> being given a region id growing at a rate of about 10 million documents per
> month - the average document size being around 10KB of pure text. The total
> number of region ids are themselves in the range of 2.5 million.
>
> We want to search for a query with a given list of region ids. The number
> of region ids in this list is usually around 250-300 (most of the time),
> but can be up to 500, with a maximum cap of around 2000 ids in one request.
>
>
> What is the best way to model such queries besides using an IN param in the
> query, or using a Filter FQ in the query? Are there any other faster
> methods available?
>
>
> If it may help, the index is on a VM with 4 virtual-cores and has currently
> 4GB of Java memory allocated out of 16GB in the machine. The number of
> queries do not exceed more than 1 per minute for now. If needed, we can
> throw more hardware to the index - but the index will still be only on a
> single machine for at least 6 months.
>
> Regards,
> Sandeep Gupta
>



--


Re: Query & result caching with custom functions

2013-10-24 Thread Joel Bernstein
Mathias,

I'd have to do a close review of the function sort code to be sure, but I
suspect if you implement the equals() method on the ValueSource it should
solve your caching issue. Also implement hashCode().
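Something along these lines, as a minimal sketch (class and field names are
hypothetical, and the getValues() body is just a placeholder):

import java.io.IOException;
import java.util.Arrays;
import java.util.Map;

import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.queries.function.FunctionValues;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.docvalues.DoubleDocValues;

public class ParamDistanceValueSource extends ValueSource {

  private final String field;
  private final double[] reference;  // the query-time parameter

  public ParamDistanceValueSource(String field, double[] reference) {
    this.field = field;
    this.reference = reference;
  }

  @Override
  public FunctionValues getValues(Map context, AtomicReaderContext readerContext)
      throws IOException {
    return new DoubleDocValues(this) {
      @Override
      public double doubleVal(int doc) {
        return 0d;  // placeholder: compute the per-document distance here
      }
    };
  }

  // Identity must include the parameters: two instances with different
  // reference vectors must not compare equal, or cached results get reused.
  @Override
  public boolean equals(Object o) {
    if (!(o instanceof ParamDistanceValueSource)) return false;
    ParamDistanceValueSource other = (ParamDistanceValueSource) o;
    return field.equals(other.field) && Arrays.equals(reference, other.reference);
  }

  @Override
  public int hashCode() {
    return 31 * field.hashCode() + Arrays.hashCode(reference);
  }

  @Override
  public String description() {
    return "paramdistance(" + field + "," + Arrays.toString(reference) + ")";
  }
}

The cache key of a sorted query includes the sort, and function sorts compare
via the ValueSource's equals()/hashCode(), so once these reflect the
parameters, differently-parameterized requests should stop colliding.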

Joel


On Thu, Oct 24, 2013 at 10:35 AM, Shawn Heisey  wrote:

> On 10/24/2013 5:35 AM, Mathias Lux wrote:
> > I've written a custom function, which is able to provide a distance
> > based on some DocValues to re-sort result lists. This basically works
> > great, but we've got the problem that if I don't change the query, but
> > the function parameters, Solr delivers a cached result without
> > re-ordering. I turned off caching and see there, problem solved. But
> > of course this is not an avenue I want to pursue further as it doesn't
> > make sense for a production system.
> >
> > Do you have any ideas (beyond fake query modification and turning off
> > caching) to counteract?
> >
> > btw. I'm using Solr 4.4 (so if you are aware of the issue and it has
> > been resolved in 4.5 I'll port it :) The code I'm using is at
> > https://bitbucket.org/dermotte/liresolr
>
> I suspect that the queryResultCache is not paying attention to the fact
> that parameters for your plugin have changed.  This probably means that
> your plugin must somehow inform the "cache check" code that something
> HAS changed.
>
> How you actually do this is a mystery to me because it involves parts of
> the code that are beyond my understanding, but it MIGHT involve making
> sure that parameters related to your code are saved as part of the entry
> that goes into the cache.
>
> Thanks,
> Shawn
>
>


Re: Solr not indexing everything from MongoDB

2013-10-24 Thread Michael Della Bitta
That's typical for an index that receives updates to the same document:
re-adding a document with an existing unique key deletes the old version, and
those deleted documents still count toward maxDoc until segments merge. Are
you sure your keys are unique?

Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions  | g+:
plus.google.com/appinions
w: appinions.com 


On Wed, Oct 23, 2013 at 5:57 PM, gohome190  wrote:

> numFound is 10.
> numDocs is 10, maxDoc is 23.  Yeah, Solr 4.x!
>
> Thanks!
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-not-indexing-everything-from-MongoDB-tp4097302p4097340.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


RE: New shard leaders or existing shard replicas depends on zookeeper?

2013-10-24 Thread Hoggarth, Gil
I think my question is now simpler, because I think the problem below was
caused by the very first startup of the 'ldwa01' collection/'ldwa01cfg'
zk config name not specifying the number of shards (and thus
defaulting to 1).

So, how can I change the number of shards for an existing collection/zk
collection name, especially when the ZK ensemble in question is the
production one and supports other Solr collections that I do not
want to interrupt? (Which I think means that I can't just delete
clusterstate.json and restart the ZKs, as this would also lose the other
Solr collections' information.)

Thanks in advance, Gil

-Original Message-
From: Hoggarth, Gil [mailto:gil.hogga...@bl.uk] 
Sent: 24 October 2013 10:13
To: solr-user@lucene.apache.org
Subject: RE: New shard leaders or existing shard replicas depends on
zookeeper?

Absolutely, the scenario I'm seeing does _sound_ like I've not specified
the number of shards, but I think I have - the evidence is:
- -DnumShards=24 defined within the /etc/sysconfig/solrnode* files

- -DnumShards=24 seen on each 'ps' line (two nodes listed here):
" tomcat   26135 1  5 09:51 ?00:00:22 /opt/java/bin/java
-Djava.util.logging.config.file=/opt/tomcat_instances/solrnode1/conf/log
ging.properties
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
-Xms512m -Xmx5120m -Dsolr.solr.home=/opt/solrnode1 -Duser.language=en
-Duser.country=uk -Dbootstrap_confdir=/opt/solrnode1/ldwa01/conf
-Dcollection.configName=ldwa01cfg -DnumShards=24
-Dsolr.data.dir=/opt/data/solrnode1/ldwa01/data
-DzkHost=zk01.solr.wa.bl.uk:9983,zk02.solr.wa.bl.uk:9983,zk03.solr.wa.bl
.uk:9983 -Djava.endorsed.dirs=/opt/tomcat/endorsed -classpath
/opt/tomcat/bin/bootstrap.jar:/opt/tomcat/bin/tomcat-juli.jar
-Dcatalina.base=/opt/tomcat_instances/solrnode1
-Dcatalina.home=/opt/tomcat
-Djava.io.tmpdir=/opt/tomcat_instances/solrnode1/tmp
org.apache.catalina.startup.Bootstrap start
tomcat   26225 1  5 09:51 ?00:00:19 /opt/java/bin/java
-Djava.util.logging.config.file=/opt/tomcat_instances/solrnode2/conf/log
ging.properties
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
-Xms512m -Xmx5120m -Dsolr.solr.home=/opt/solrnode2 -Duser.language=en
-Duser.country=uk -Dbootstrap_confdir=/opt/solrnode2/ldwa01/conf
-Dcollection.configName=ldwa01cfg -DnumShards=24
-Dsolr.data.dir=/opt/data/solrnode2/ldwa01/data
-DzkHost=zk01.solr.wa.bl.uk:9983,zk02.solr.wa.bl.uk:9983,zk03.solr.wa.bl
.uk:9983 -Djava.endorsed.dirs=/opt/tomcat/endorsed -classpath
/opt/tomcat/bin/bootstrap.jar:/opt/tomcat/bin/tomcat-juli.jar
-Dcatalina.base=/opt/tomcat_instances/solrnode2
-Dcatalina.home=/opt/tomcat
-Djava.io.tmpdir=/opt/tomcat_instances/solrnode2/tmp
org.apache.catalina.startup.Bootstrap start"

- The Solr node dashboard shows "-DnumShards=24" in its list of Args for
each node

And yet, the ldwa01 nodes are leader and replica of shard 17 and there
are no other shard leaders created. Plus, if I only change the ZK
ensemble declarations in /etc/sysconfig/solrnode* to the different dev ZK
servers, all 24 leaders are created before any replicas are added.

I can also mention, when I browse the Cloud view, I can see both the
ldwa01 collection and the ukdomain collection listed, suggesting that
this information comes from the ZKs - I assume this is as expected.
Plus, the correct node addresses (e.g., 192.168.45.17:8984) are listed
for ldwa01 but these addresses are also listed as 'Down' in the ukdomain
collection (except for :8983 which only shows in the ldwa01 collection).

Any help very gratefully received.
Gil

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 23 October 2013 18:50
To: solr-user@lucene.apache.org
Subject: Re: New shard leaders or existing shard replicas depends on
zookeeper?

My first impulse would be to ask how you created the collection. It sure
_sounds_ like you didn't specify 24 shards and thus have only a single
shard, one leader and 23 replicas

bq: ...to point to the zookeeper ensemble also used for the ukdomain
collection...

so my guess is that this ZK ensemble has the ldwa01 collection defined
as having only one shard

I admit I pretty much skimmed your post though...

Best,
Erick


On Wed, Oct 23, 2013 at 12:54 PM, Hoggarth, Gil 
wrote:

> Hi solr-users,
>
>
>
> I'm seeing some confusing behaviour in Solr/zookeeper and hope you can

> shed some light on what's happening/how I can correct it.
>
>
>
> We have two physical servers running automated builds of RedHat 6.4 
> and Solr 4.4.0 that host two separate Solr services. The first server 
> (called ld01) has 24 shards and hosts a collection called 'ukdomain'; 
> the second server (ld02) also has 24 shards and hosts a different 
> collection called 'ldwa01'. It's evidently important to note that 
> previously both of these physical servers provided the 'ukdomain'
> collection, but the 'ldwa01' server has been rebuilt for the new 
> collection.
>
>
>
> When I start the ldwa01

Re: New shard leaders or existing shard replicas depends on zookeeper?

2013-10-24 Thread Daniel Collins
Ah yes, I was about to mention that, -DnumShards is only actually used when
the collection is being created for the first time.  After that point (i.e.
once the collection exists in ZK), passing it along the command line is
redundant (Solr won't actually read it).  I know the preferred mechanism for
creating collections is to use the Collections API, in which case you never
use -DnumShards at all.  Having it on the command line can be confusing
(we've fallen into that trap too!)

The only way to change the number of shards on a collection is to use the
Collections API to split a shard (and currently you can only do that in
steps of 2, so you'll need to do 1->2, 2->4, 4->8, 8->16.  You can't get
from 1 -> 24 as it's not a power of 2 :(   What you want is
https://issues.apache.org/jira/browse/SOLR-5004

Otherwise, you'll need to create a new collection and re-index everything
into that.
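For reference, creating a collection through the Collections API (and later
splitting a shard) looks roughly like this (host and port are placeholders):

http://localhost:8983/solr/admin/collections?action=CREATE&name=ldwa01&numShards=24&collection.configName=ldwa01cfg

http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=ldwa01&shard=shard1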


On 24 October 2013 16:35, Hoggarth, Gil  wrote:

> I think my question is easier, because I think the problem below was
> caused by the very first startup of the 'ldwa01' collection/'ldwa01cfg'
> zk collection name didn't specify the number of shards (and thus
> defaulted to 1).
>
> So, how can I change the number of shards for an existing collection/zk
> collection name, especially when the ZK ensemble in question is the
> production version and supporting other Solr collections that I do not
> want to interrupt. (Which I think means that I can't just delete the
> clusterstate.json and restart the ZKs as this will also lose the other
> Solr collection information.)
>
> Thanks in advance, Gil
>
> -Original Message-
> From: Hoggarth, Gil [mailto:gil.hogga...@bl.uk]
> Sent: 24 October 2013 10:13
> To: solr-user@lucene.apache.org
> Subject: RE: New shard leaders or existing shard replicas depends on
> zookeeper?
>
> Absolutely, the scenario I'm seeing does _sound_ like I've not specified
> the number of shards, but I think I have - the evidence is:
> - -DnumShards=24 defined within the /etc/sysconfig/solrnode* files
>
> - -DnumShards=24 seen on each 'ps' line (two nodes listed here):
> " tomcat   26135 1  5 09:51 ?00:00:22 /opt/java/bin/java
> -Djava.util.logging.config.file=/opt/tomcat_instances/solrnode1/conf/log
> ging.properties
> -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
> -Xms512m -Xmx5120m -Dsolr.solr.home=/opt/solrnode1 -Duser.language=en
> -Duser.country=uk -Dbootstrap_confdir=/opt/solrnode1/ldwa01/conf
> -Dcollection.configName=ldwa01cfg -DnumShards=24
> -Dsolr.data.dir=/opt/data/solrnode1/ldwa01/data
> -DzkHost=zk01.solr.wa.bl.uk:9983,zk02.solr.wa.bl.uk:9983,zk03.solr.wa.bl
> .uk:9983 -Djava.endorsed.dirs=/opt/tomcat/endorsed -classpath
> /opt/tomcat/bin/bootstrap.jar:/opt/tomcat/bin/tomcat-juli.jar
> -Dcatalina.base=/opt/tomcat_instances/solrnode1
> -Dcatalina.home=/opt/tomcat
> -Djava.io.tmpdir=/opt/tomcat_instances/solrnode1/tmp
> org.apache.catalina.startup.Bootstrap start
> tomcat   26225 1  5 09:51 ?00:00:19 /opt/java/bin/java
> -Djava.util.logging.config.file=/opt/tomcat_instances/solrnode2/conf/log
> ging.properties
> -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
> -Xms512m -Xmx5120m -Dsolr.solr.home=/opt/solrnode2 -Duser.language=en
> -Duser.country=uk -Dbootstrap_confdir=/opt/solrnode2/ldwa01/conf
> -Dcollection.configName=ldwa01cfg -DnumShards=24
> -Dsolr.data.dir=/opt/data/solrnode2/ldwa01/data
> -DzkHost=zk01.solr.wa.bl.uk:9983,zk02.solr.wa.bl.uk:9983,zk03.solr.wa.bl
> .uk:9983 -Djava.endorsed.dirs=/opt/tomcat/endorsed -classpath
> /opt/tomcat/bin/bootstrap.jar:/opt/tomcat/bin/tomcat-juli.jar
> -Dcatalina.base=/opt/tomcat_instances/solrnode2
> -Dcatalina.home=/opt/tomcat
> -Djava.io.tmpdir=/opt/tomcat_instances/solrnode2/tmp
> org.apache.catalina.startup.Bootstrap start"
>
> - The Solr node dashboard shows "-DnumShards=24" in its list of Args for
> each node
>
> And yet, the ldwa01 nodes are leader and replica of shard 17 and there
> are no other shard leaders created. Plus, if I only change the ZK
> ensemble declarations in /etc/sysconfig/solrnode* to the different dev ZK
> servers, all 24 leaders are created before any replicas are added.
>
> I can also mention, when I browse the Cloud view, I can see both the
> ldwa01 collection and the ukdomain collection listed, suggesting that
> this information comes from the ZKs - I assume this is as expected.
> Plus, the correct node addresses (e.g., 192.168.45.17:8984) are listed
> for ldwa01 but these addresses are also listed as 'Down' in the ukdomain
> collection (except for :8983 which only shows in the ldwa01 collection).
>
> Any help very gratefully received.
> Gil
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: 23 October 2013 18:50
> To: solr-user@lucene.apache.org
> Subject: Re: New shard leaders or existing shard replicas depends on
> zookeeper?
>
> My first impulse would be to ask how you created the collection. It sure
> _sounds_ like you di

Re: Query & result caching with custom functions

2013-10-24 Thread Mathias Lux
That's a possibility,  I'll try that and report on the effects.  Thanks,
Mathias
Am 24.10.2013 16:52 schrieb "Joel Bernstein" :

> Mathias,
>
> I'd have to do a close review of the function sort code to be sure, but I
> suspect if you implement the equals() method on the ValueSource it should
> solve your caching issue. Also implement hashCode().
>
> Joel
>
>
> On Thu, Oct 24, 2013 at 10:35 AM, Shawn Heisey  wrote:
>
> > On 10/24/2013 5:35 AM, Mathias Lux wrote:
> > > I've written a custom function, which is able to provide a distance
> > > based on some DocValues to re-sort result lists. This basically works
> > > great, but we've got the problem that if I don't change the query, but
> > > the function parameters, Solr delivers a cached result without
> > > re-ordering. I turned off caching and see there, problem solved. But
> > > of course this is not an avenue I want to pursue further as it doesn't
> > > make sense for a production system.
> > >
> > > Do you have any ideas (beyond fake query modification and turning off
> > > caching) to counteract?
> > >
> > > btw. I'm using Solr 4.4 (so if you are aware of the issue and it has
> > > been resolved in 4.5 I'll port it :) The code I'm using is at
> > > https://bitbucket.org/dermotte/liresolr
> >
> > I suspect that the queryResultCache is not paying attention to the fact
> > that parameters for your plugin have changed.  This probably means that
> > your plugin must somehow inform the "cache check" code that something
> > HAS changed.
> >
> > How you actually do this is a mystery to me because it involves parts of
> > the code that are beyond my understanding, but it MIGHT involve making
> > sure that parameters related to your code are saved as part of the entry
> > that goes into the cache.
> >
> > Thanks,
> > Shawn
> >
> >
>


Re: Proposal for new feature, cold replicas, brainstorming

2013-10-24 Thread Yago Riveiro
With a shard in a "listening" status, and some logic in the mechanism that does
the load balancing between replicas, we can achieve the goal.

The SPLITSHARD action makes replicas from the original shard which are in an
"inactive" state; these shards buffer the updates, and when the operation
ends the parent shard becomes "inactive" and the new replicas are promoted to
the "active" state.

Like the "inactive" state, we could have a "listening" state that never becomes
"active" unless a leader election happens and the shard with "listening"
status is the only one still alive.

In addition, it is necessary to add new metadata to the shard in the
clusterstate.json file to mark the replica as one that exists for replication
purposes only, and that steps back when another replica becomes active.


-- 
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Thursday, October 24, 2013 at 2:16 PM, Toke Eskildsen wrote:

> On Thu, 2013-10-24 at 13:27 +0200, yriveiro wrote:
> > The motivation is simple: I want replication, but I don't want to have
> > n active replicas with full resources allocated (caches and so on).
> > This is useful in environments where replication is needed but a high query
> > throughput is not fundamental and resources are limited.
> > 
> 
> 
> Coincidentally we recently talked about the exact same setup.
> 
> We are looking at sharding a 20 TB index into 20 * 1 TB shards, each
> located on their own dedicated physical SSD, which has more than enough
> horsepower for our needs. For replication, we have a remote storage
> system capable of serving requests for 2-4 shards with acceptable
> latency.
> 
> Projected performance for the SSD setup is superior (5-10 times) to our
> remote storage, so we would like to hit only the SSDs if possible.
> Setting up a cloud to issue all requests to the SSD-shards unless a
> catastrophic failure happened to one of them, and in that case fall back to
> the remote storage replica for only that shard, would be perfect.
> 
> > I know that right now is not possible, but I think that it's a feature that
> > can be implemented in a easy way creating a new status for shards.
> > 
> 
> 
> shardIsLastResort=true? On paper it seems like a simple addition, but I
> am not familiar enough with the SolrCloud code to guess whether it is easy
> to implement.
> 
> - Toke Eskildsen, State and University Library, Denmark 



[ANNOUNCE] Apache Solr 4.5.1 released.

2013-10-24 Thread Mark Miller
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

October 2013, Apache Solr™ 4.5.1 available

The Lucene PMC is pleased to announce the release of Apache Solr 4.5.1

Solr is the popular, blazing fast, open source NoSQL search platform
from the Apache Lucene project. Its major features include powerful
full-text search, hit highlighting, faceted search, dynamic clustering,
database integration, rich document (e.g., Word, PDF) handling, and
geospatial search. Solr is highly scalable, providing fault tolerant
distributed search and indexing, and powers the search and navigation
features of many of the world's largest internet sites.

Solr 4.5.1 includes 16 bug fixes as well as Lucene 4.5.1 and its bug
fixes. The release is available for immediate download at:

http://lucene.apache.org/solr/mirrors-solr-latest-redir.html


See the CHANGES.txt file included with the release for a full list of
changes and further details.

Please report any feedback to the mailing lists
(http://lucene.apache.org/solr/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring network
for distributing releases. It is possible that the mirror you are using
may not have replicated the release yet. If that is the case, please try
another mirror. This also goes for Maven access.

Happy searching,

Lucene/Solr developers
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJSaUdSAAoJED+/0YJ4eWrI90UP/RGSmLBdvrc/5NZEb7LSCSjW
z4D3wJ2i4a0rLpiW2qA547y/NZ5KZcmrDSzJu0itf8Q/0q+tm7/d30uPg/cdRlgl
wGERcxsyfPfTqBjzdSNNGgNm++tnkkqRJbYEfsG5ApWrKicitU7cPb82m8oCdlnn
4wnhYt6tfu/EPCglt9ixF7Ukv5o7txMnwWGmkGTbUt8ugp9oOMN/FfGHex/FVxcF
xHhWBLymIJy24APEEF/Mq3UW12hQT+aRof66xBch0fEPVlbDitBa9wNuRNQ98M90
ZpTl8o0ITMUKjTKNkxZJCO5LQeNwhYaOcM5nIykGadWrXBZo5Ob611ZKeYPZBWCW
Ei88dwJQkXaDcVNLZ/HVcAePjmcALHd3nc4uNfcJB8zvgZOPagMpXW2rRSXFACHM
FdaRezTdH8Uh5zp2n3hsqYCbpDreRoXGXaiOgVZ+8EekVMGYUnMFKdqNlqhVnF6r
tzp+aaCBhGDUD5xUw2w2fb5c9Jh1oIQ9f7fsVH78kgsHShySnte3NbfoFWUClPMX
PwrfWuZpmu9In2ZiJVYSOD6MBqmJ+z3N1bnf1kqsitv7MonkvQkOoDIafW835vG9
3aajknE1vazOATSGHIxCtJfqzTEqeqFqVbjG/qS72XIhMey8tVAwjrjcgFnayk9Z
xrG1W1o2sjrYkioJ7nZK
=8++G
-END PGP SIGNATURE-


Re: [ANNOUNCE] Apache Solr 4.5.1 released.

2013-10-24 Thread Jack Park
Download redirects to 4.5.0
Is there a typo in the server path?

On Thu, Oct 24, 2013 at 9:14 AM, Mark Miller  wrote:
> October 2013, Apache Solr™ 4.5.1 available
>
> The Lucene PMC is pleased to announce the release of Apache Solr 4.5.1
> [...]


Re: [ANNOUNCE] Apache Solr 4.5.1 released.

2013-10-24 Thread Jack Park
Using a different server than the default gets 4.5.1.

On Thu, Oct 24, 2013 at 9:35 AM, Jack Park  wrote:
> Download redirects to 4.5.0
> Is there a typo in the server path?
>
> On Thu, Oct 24, 2013 at 9:14 AM, Mark Miller  wrote:
>> October 2013, Apache Solr™ 4.5.1 available
>>
>> The Lucene PMC is pleased to announce the release of Apache Solr 4.5.1
>> [...]


Re: Changing indexed property on a field from false to true

2013-10-24 Thread Aloke Ghoshal
Upayavira - Nice idea, pushing in a nominal update when all fields are
stored, and it does work. The nominal update could be sent to a boolean-type
dynamic field that's not used for anything other than maybe identifying
documents that are done re-indexing.
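
For illustration, a minimal sketch of such a nominal atomic update over HTTP,
assuming a uniqueKey field named "id", a schema where all fields are stored,
and a dynamic boolean field pattern like *_b (all assumptions, not details
from this thread):

curl 'http://localhost:8983/solr/collection1/update?commit=true' \
  -H 'Content-Type: application/json' \
  -d '[{"id":"doc1","reindexed_b":{"set":true}}]'

Because this is an atomic update, Solr rebuilds the whole document from its
stored fields and re-indexes it, which is exactly the side effect being
exploited here.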


On Wed, Oct 23, 2013 at 7:47 PM, Upayavira  wrote:

> The content needs to be re-indexed, the question is whether you can use
> the info in the index to do it rather than pushing fresh copies of the
> documents to the index.
>
> I've often wondered whether atomic updates could be used to handle this
> sort of thing. If all fields are stored, push a nominal update to cause
> the document to be re-indexed. I've never tried it though. I'd be
> curious to know if it works.
>
> Upayavira
>
> On Wed, Oct 23, 2013, at 02:25 PM, michael.boom wrote:
> > Being given
> > <field ... indexed="false" stored="true" multiValued="false" />
> > Changed to
> > <field ... indexed="true" stored="true" multiValued="false" />
> >
> > Once the above is done and the collection reloaded, is there a way I can
> > build the index on that field without reindexing everything?
> >
> > Thank you!
> >
> >
> >
> > -
> > Thanks,
> > Michael
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/Changing-indexed-property-on-a-field-from-false-to-true-tp4097213.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: New query-time multi-word synonym expander

2013-10-24 Thread Otis Gospodnetic
Jack - watch https://issues.apache.org/jira/browse/SOLR-5379 -
comments from the author are there.
Markus - ah, yes.  I see I even managed to (re)name SOLR-5379
*exactly* the same as SOLR-4381 :)  But the author of SOLR-5379 points
out its advantages over SOLR-4381.

Would be great if people could try it and leave comments with any
issues, so we can iterate on the patch to make it committable.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Wed, Oct 23, 2013 at 1:13 PM, Markus Jelsma
 wrote:
> Nice, but now we've got three multi-word synonym parsers? Didn't the LUCENE-4499 
> or SOLR-4381 patches work? I know the latter has had a reasonable number of 
> users and committers on github, but it was never brought back to ASF it seems.
>
> -Original message-
>> From:Otis Gospodnetic 
>> Sent: Wednesday 23rd October 2013 18:54
>> To: solr-user@lucene.apache.org
>> Subject: New query-time multi-word synonym expander
>>
>> Hi,
>>
>> Heads up that there is new query-time multi-word synonym expander
>> patch in https://issues.apache.org/jira/browse/SOLR-5379
>>
>> This worked for our customer and we hope it works for others.
>>
>> Any feedback would be greatly appreciated.
>>
>> Thanks,
>> Otis
>> --
>> Performance Monitoring * Log Analytics * Search Analytics
>> Solr & Elasticsearch Support * http://sematext.com
>>


Re: Changing indexed property on a field from false to true

2013-10-24 Thread Upayavira
Where this gets interesting is if we had batch atomic updates. Imagine
you could do indexCount++ for all docs matching the query
category:sport. Could be really useful. /dreaming.

Upayavira

On Thu, Oct 24, 2013, at 05:40 PM, Aloke Ghoshal wrote:
>  Upayavira - Nice idea pushing in a nominal update when all fields are
> stored, and it does work.
> [...]


Re: Multiple facet fields in "defaults" section of a Request Handler

2013-10-24 Thread Chris Hostetter

: Now a client wants to use multi select faceting. He calls the following API:
: http://localhost:8983/solr/collection1/search?q=*:*&facet.field={!ex=foo}category&fq={!tag=foo}category:"cat"

: Putting the facet definitions in "appends" causes it to facet on category
: twice.
: 
: Is there a way where he does not have to provide all the facet.field
: parameters in the API call?

What you are asking is essentially "I want to configure faceting on X and 
Y by default, but i want clients to be able to add faceting on Z and have 
that disable faceting on X while still faceting on Y"

It doesn't matter that X and Z are both field facets based around the 
field name "category" -- the tag exclusion makes them completely 
different.

The basic default/invariants/appends logic doesn't give you any easy 
mechanism to ignore arbitrary params like that - you could probably write 
a custom component that inspected the params and dropped ones you don't 
want, but this wouldn't make sense as generalized logic in the 
FacetComponent since faceting on a field both with and w/o a tag 
exclusion at the same time is a very common use case.




-Hoss


Re: Solr subset searching in 100-million document index

2013-10-24 Thread Sandeep Gupta
Hi Joel,

Thanks a lot for the information - I haven't worked with PostFilters
before, but found an example at
http://java.dzone.com/articles/custom-security-filtering-solr.
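
For anyone else following along, a rough sketch of what such a post filter
could look like, loosely modeled on that article. The class name, the
"region_id" field name, and the constructor are all made up for illustration,
and this targets the Solr 4.x APIs:

import java.io.IOException;
import java.util.Set;

import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.IndexSearcher;
import org.apache.solr.search.DelegatingCollector;
import org.apache.solr.search.ExtendedQueryBase;
import org.apache.solr.search.PostFilter;

public class RegionIdPostFilter extends ExtendedQueryBase implements PostFilter {
  private final Set<Integer> regionIds;

  public RegionIdPostFilter(Set<Integer> regionIds) {
    this.regionIds = regionIds;
    setCache(false);  // post filters must not be cached
    setCost(100);     // cost >= 100 is what makes Solr run this after other filters
  }

  @Override
  public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
    return new DelegatingCollector() {
      private FieldCache.Ints regionIdValues;

      @Override
      public void setNextReader(AtomicReaderContext context) throws IOException {
        super.setNextReader(context);
        // pull the per-segment integer values for the region id field
        regionIdValues = FieldCache.DEFAULT.getInts(context.reader(), "region_id", false);
      }

      @Override
      public void collect(int doc) throws IOException {
        // only let documents whose region id is in the requested set through
        if (regionIds.contains(regionIdValues.get(doc))) {
          super.collect(doc);
        }
      }
    };
  }
}

A real implementation would also need a QParserPlugin to build the filter
from request parameters, plus equals()/hashCode() as for any Lucene Query.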

Will try it over the next few days and come back if still have questions.

Thanks again!



Keep Walking,
~ Sandeep


On Thu, Oct 24, 2013 at 8:25 PM, Joel Bernstein  wrote:

> Sandeep,
>
> This type of operation can often be expressed as a PostFilter very
> efficiently. This is particularly true if the region id's are integer keys.
>
> Joel
>
> On Thu, Oct 24, 2013 at 7:46 AM, Sandeep Gupta 
> wrote:
>
> > Hi,
> >
> > We have a Solr index of around 100 million documents, with each document
> > being given a region id, growing at a rate of about 10 million documents
> > per month - the average document size being around 10KB of pure text. The
> > total number of region ids is itself in the range of 2.5 million.
> >
> > We want to search for a query with a given list of region ids. The number
> > of region ids in this list is usually around 250-300 (most of the time),
> > but can be up to 500, with a maximum cap of around 2000 ids in one
> > request.
> >
> >
> > What is the best way to model such queries, besides using an IN param in
> > the query, or using a filter (fq) in the query? Are there any other faster
> > methods available?
> >
> >
> > If it may help, the index is on a VM with 4 virtual cores and currently
> > has 4GB of Java memory allocated out of 16GB in the machine. The number
> > of queries does not exceed 1 per minute for now. If needed, we can
> > throw more hardware at the index - but the index will still be only on a
> > single machine for at least 6 months.
> >
> > Regards,
> > Sandeep Gupta
> >
>
>
>
> --
>


Re: Terms function join with a Select function ?

2013-10-24 Thread Bruno Mannina

Dear,

Hum... I don't know how I can use it.

I tried the following.

My query:
ti:snowboard (3095 results)

I would like to have, at the end of my XML response, the terms statistics
for the field AP (the applicant field of a patent notice),

but I don't get that...

Please help,
Bruno

/select?q=ti%3Asnowboard&version=2.2&start=0&rows=10&indent=on&facet=true&f.ap.facet.limit=10

On 24/10/2013 14:04, Erik Hatcher wrote:

That would be called faceting :)

 http://wiki.apache.org/solr/SimpleFacetParameters




On Oct 24, 2013, at 5:23 AM, Bruno Mannina  wrote:


Dear All,

Ok I have an answer concerning the first question (limit)
It's the terms.limit parameters.

But I can't find how to apply a Terms request on a query result

any idea ?

Bruno

On 23/10/2013 23:19, Bruno Mannina wrote:

Dear Solr users,

I use the Terms function to see the frequency data in a field, but it's for
the whole database.

I have 2 questions:
- Is it possible to increase the number of statistics? Currently I get the
10 most frequent terms.

- Is it possible to limit these statistics to the results of a request?

PS: the second question is very important for me.

Many thanks







Re: Terms function join with a Select function ?

2013-10-24 Thread Bruno Mannina

Hmm, facet performance is very bad (Solr 3.6.0).
My index is around 87,000,000 docs (4 dual-core processors, 24 GB RAM).

I thought facets would work only on the result set, but it seems that's not
the case.

My request:
http://localhost:2727/solr/select?q=ti:snowboard&rows=0&facet=true&facet.field=ap&facet.limit=5

Do you think my request is wrong?

Maybe it's not possible to have statistics on a field (like the Terms
function) restricted to a query.

Thanks for your help,

Bruno


On 24/10/2013 19:40, Bruno Mannina wrote:
[...]



Re: Terms function join with a Select function ?

2013-10-24 Thread Bruno Mannina

Just a small follow-up: Solr went down after running my URL :( so bad...

On 24/10/2013 22:04, Bruno Mannina wrote:
[...]



Join Query Behavior

2013-10-24 Thread Andy Pickler
We're attempting to upgrade from Solr 4.2 to 4.5 but are finding that 4.5
is not "honoring" this join query:

...
&
fq={!join from=project_id_i to=project_id_im}user_id_i:65615 -role_id_i:18
type:UserRole
&
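
(For anyone reproducing this: an equivalent way to write the same filter is
with the v= local parameter, which makes it explicit that the whole boolean
query is meant to feed the join -- just another spelling of the filter above,
not a known fix:)

fq={!join from=project_id_i to=project_id_im v='user_id_i:65615 -role_id_i:18 type:UserRole'}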


On our Solr 4.2 instance adding/removing that query gives us different (and
expected) results, while the query doesn't affect the results at all in
4.5.  Is there any known join query behavior differences/fixes between 4.2
and 4.5 that might explain this, or should I be looking at other factors?

Thanks,
Andy Pickler


Post filter cache question

2013-10-24 Thread Eric Grobler
Hi

If I run this query it is very fast (<10 ms) because it uses a "TopList"
filter:
q=*:*
fl=adr_geopoint,adr_city,filterflags
fq=(filterflags:TopList)
and the number of relevant documents is 3000 out of 7 million.

If I run the same query but add a spatial filter with cost:
q=*:*
fl=adr_geopoint,adr_city,filterflags
fq=(filterflags:TopList)
pt=49.594,8.468
sfield=adr_geopoint
fq={!bbox d=30}
fq={!frange l=15 u=30 cache=false cost=200}geodist()

It takes over 3 seconds even though it should only scan around 3000
documents from the first cached filter?
Could it be a problem with my cache settings in solrconfig.xml (solr 3.1)
or is my query wrong?

Thanks & regards
Ericz


Re: measure result set quality

2013-10-24 Thread Chris Hostetter

: As a first approach I will evaluate (manually :( ) hits that are out of the
: intersection set for every query in each system. Anyway I will keep

FYI: LucidWorks has a "Relevancy Workbench" tool that serves as a simple 
UI designed explicitly for the purpose of comparing the result sets 
from different solr query configurations...

http://www.lucidworks.com/market_app/lucidworks-relevancy-workbench/


-Hoss


Re: difference between apache tomcat vs Jetty

2013-10-24 Thread Jonathan Rochkind
This is good to know, and I find it welcome advice; I would recommend 
making sure this advice is clearly highlighted in the relevant Solr 
docs, such as any getting started docs.


I'm not sure everyone realizes this, and some go down the tomcat route 
without realizing the Solr committers recommend jetty -- or use a stock 
jetty without realizing the 'example' jetty is recommended and actually 
intended to be used by Solr users in production!  I think it's easy to 
not catch this advice.


On 10/20/13 5:55 PM, Shawn Heisey wrote:

On 10/20/2013 2:57 PM, Shawn Heisey wrote:

We recommend jetty.  The solr example uses jetty.


I have a clarification for this statement.  We actually recommend using
the jetty that's included in the Solr 4.x example.  It is stripped of
all unnecessary features and its config has had some minor tuning so
it's optimized for Solr.  The jetty binaries in 4.x are completely
unmodified from the upstream download, we just don't include all of
them.  On the 1.x and 3.x examples, there was a small bug in Jetty 6, so
those versions included modified binaries.

If you download jetty from eclipse.org or install it from your operating
system's repository, it will include components you don't need and its
config won't be optimized for Solr, but it will still be a lot closer to
what's actually tested than tomcat is.

Thanks,
Shawn



Re: Post filter cache question

2013-10-24 Thread Chris Hostetter

: Could it be a problem with my cache settings in solrconfig.xml (solr 3.1)
: or is my query wrong?

3.1? ouch ... PostFilter wasn't even added until 3.4...
https://wiki.apache.org/solr/CommonQueryParameters#Caching_of_filters

...so your spatial filter is definitely being applied to the entire index 
and then getting cached.

. . .

Below is what i wrote before i saw that 3.4 comment at the end of your 
email...

: If I run the same query but add a spatial filter with cost:
: q=*:*
: fl=adr_geopoint,adr_city,filterflags
: fq=(filterflags:TopList)
: pt=49.594,8.468
: sfield=adr_geopoint
: fq={!bbox d=30}
: fq={!frange l=15 u=30 cache=false cost=200}geodist()
: 
: It takes over 3 seconds even though it should only scan around 3000
: documents from the first cached filter?

You've also added a "bbox" filter, which will be computed against the 
entire index and cached.

I'm not sure what FieldType you are using, and i don't know a lot of the 
details about the spatial queries -- but things you should look into...

1) does the bbox gain you anything if you are already doing the geodist 
filter as a post filter?  (my hunch would be that the only point of a bbox 
fq is if you are *scoring* documents by distance and you want to ignore 
things beyond a set distance)

2) does {!bbox} support PostFilter on your FieldType? does 
adding "cache=false cost=150" to the bbox filter improve things?
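
(For reference, once on a PostFilter-capable release, the revised request
being suggested here would look roughly like this -- same fields and values
as in the original mail, with both spatial filters forced to run as post
filters:)

q=*:*
fl=adr_geopoint,adr_city,filterflags
fq=filterflags:TopList
pt=49.594,8.468
sfield=adr_geopoint
fq={!bbox d=30 cache=false cost=150}
fq={!frange l=15 u=30 cache=false cost=200}geodist()

The lower cost on the bbox filter means it runs before the more expensive
geodist() frange filter.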



-Hoss


Re: difference between apache tomcat vs Jetty

2013-10-24 Thread Tim Vaillancourt
I agree with Jonathan (and Shawn on the Jetty explanation), I think the
docs should make this a bit more clear - I notice many people choosing
Tomcat and then learning these details after, possibly regretting it.

I'd be glad to modify the docs but I want to be careful how it is worded.
Is it fair to go as far as saying Jetty is 100% THE "recommended" container
for Solr, or should a recommendation be avoided, and maybe just a list of
pros/cons?

Cheers,

Tim


Re: Reclaiming disk space from (large, optimized) segments

2013-10-24 Thread Chris Hostetter

I didn't dig into the details of your mail too much, but a few things 
jumped out at me...

: - At some time in the past, a manual force merge / optimize with
: maxSegments=2 was run to troubleshoot high disk i/o and remove "too many

Have you tried a simple commit using expungeDeletes=true?  It should be a 
little less intensive than optimizing.  (under the covers it does 
IndexWriter.forceMergeDeletes())
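
For example -- assuming the default update handler path and a core named
"collection1" -- such a commit can be issued as:

curl 'http://localhost:8983/solr/collection1/update' \
  -H 'Content-Type: text/xml' \
  -d '<commit expungeDeletes="true"/>'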


: - Merge policies are all at Solr 4 defaults. Index size is currently ~50M
: maxDocs, ~35M numDocs, 276GB.

"Solr 4 defaults" is way to vague to be meaningful: 4.0? 4.1? ... 4.4? 

Do you mean you are using the example configs that came with that version 
of Solr, or do you mean you have no mergePolicy configured and you are 
getting the hardcoded defaults? .. either way it's important to specify 
exactly which version of Solr are you running and exactly what does your 
entire  section looks like since both the example configs 
and the hardcoded default behavior when configs aren't specified have 
evolved since 4.0-ALPHA.



-Hoss


Problem with glassfish and zookeeper 3.4.5

2013-10-24 Thread kaustubh147
Hi,

Glassfish 3.1.2.2
Solr 4.5
Zookeeper 3.4.5

We have set up a SolrCloud with 4 Solr nodes and 3 zookeeper instances. It
seems to be working fine from Solr admin page.

but when I am trying to connect to it from a web application using SolrJ 4.5,
I am creating my Solr cloud server as suggested on the wiki page:

LBHttpSolrServer lbHttpSolrServer = new LBHttpSolrServer(
SOLR_INSTANCE01,
SOLR_INSTANCE02,
SOLR_INSTANCE03,
SOLR_INSTANCE04);
solrServer = new CloudSolrServer(zk1:p1, zk2:p1, zk3:p1, lbHttpSolrServer);
solrServer.setDefaultCollection(collection);


It seems to be working fine for a while even though I am getting a WARNING
as below
-
SASL configuration failed: javax.security.auth.login.LoginException: No JAAS
configuration section named 'Client' was found in specified JAAS
configuration file: 'XYZ_path/SolrCloud_04/config/login.conf'. Will continue
connection to Zookeeper server without SASL authentication,​ if Zookeeper
server allows it.
--

The application is deployed on a single node cluster on glassfish. 

as soon as my application has made some queries to the Solr server, it will
start throwing errors in the solrServer.runQuery() method. The reason for
the error is not clear.

Application logs shows following error trace many times...

-
[#|2013-10-24T14:07:53.750-0700|WARNING|glassfish3.1.2|org.apache.zookeeper.ClientCnxn|_ThreadID=1434;_ThreadName=Thread-2;|SASL
configuration failed: javax.security.auth.login.LoginException: No JAAS
configuration section named 'Client' was found in specified JAAS
configuration file: 'XYZ_PATH/config/login.conf'. Will continue connection
to Zookeeper server without SASL authentication, if Zookeeper server allows
it.|#]

[#|2013-10-24T14:07:53.750-0700|INFO|glassfish3.1.2|org.apache.zookeeper.ClientCnxn|_ThreadID=1434;_ThreadName=Thread-2;|Opening
socket connection to server server_name/IP3:2181|#]

[#|2013-10-24T14:07:53.750-0700|INFO|glassfish3.1.2|org.apache.solr.common.cloud.ConnectionManager|_ThreadID=1435;_ThreadName=Thread-2;|Watcher
org.apache.solr.common.cloud.ConnectionManager@187eaada
name:ZooKeeperConnection Watcher:IP1:2181,IP2:2181,IP3:2181 got event
WatchedEvent state:AuthFailed type:None path:null path:null type:None|#]

[#|2013-10-24T14:07:53.750-0700|INFO|glassfish3.1.2|org.apache.solr.common.cloud.ConnectionManager|_ThreadID=1435;_ThreadName=Thread-2;|Client->ZooKeeper
status change trigger but we are already closed|#]

[#|2013-10-24T14:07:53.751-0700|INFO|glassfish3.1.2|org.apache.zookeeper.ClientCnxn|_ThreadID=1434;_ThreadName=Thread-2;|Socket
connection established to server_name/IP3:2181, initiating session|#]

[#|2013-10-24T14:07:53.751-0700|INFO|glassfish3.1.2|org.apache.solr.common.cloud.ConnectionManager|_ThreadID=1420;_ThreadName=Thread-2;|Watcher
org.apache.solr.common.cloud.ConnectionManager@4ba50169
name:ZooKeeperConnection Watcher:IP1:2181,IP2:2181,IP3:2181 got event
WatchedEvent state:Disconnected type:None path:null path:null type:None|#]

[#|2013-10-24T14:07:53.751-0700|WARNING|glassfish3.1.2|org.apache.zookeeper.ClientCnxn|_ThreadID=1434;_ThreadName=Thread-2;|Session
0x0 for serverserver_name/IP3:2181, unexpected error, closing socket
connection and attempting reconnect
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
at sun.nio.ch.IOUtil.read(IOUtil.java:166)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
at
org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
|#]


--

before this happen the zookeeper logs on all the 3 instances starts showing
following warning

2013-10-24 14:05:55,200 [myid:3] - WARN 
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too
many connections from /IP_APPLICATION_SEVER - max is 200

This means that my application is making too many connections to ZooKeeper,
exceeding the limit, which is set to 200.


Is there a way I can control the number of connections my application is
making with ZooKeeper?
The only component connecting to ZooKeeper in my application is the
CloudSolrServer object.

As per my investigation, the SASL warning is related to an existing bug in
Zookeeper 3.4.5, is being solved for Zookeeper 3.5, and should not be the
cause of this issue.

I need help and guidance..

Thanks,
Kaustubh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-with-gla

Re: Problem with glassfish and zookeeper 3.4.5

2013-10-24 Thread Shawn Heisey

On 10/24/2013 4:30 PM, kaustubh147 wrote:

Glassfish 3.1.2.2
Solr 4.5
Zookeeper 3.4.5

We have set up a SolrCloud with 4 Solr nodes and 3 zookeeper instances. It
seems to be working fine from Solr admin page.

but when I am trying to connect it to web application using Solrj 4.5.
I am creating my Solr Cloud Server as suggested on the wiki page

LBHttpSolrServer lbHttpSolrServer = new LBHttpSolrServer(
SOLR_INSTANCE01,
SOLR_INSTANCE02,
SOLR_INSTANCE03,
SOLR_INSTANCE04);
solrServer = new CloudSolrServer(zk1:p1, zk2:p1, zk3:p1, lbHttpSolrServer);
solrServer.setDefaultCollection(collection);


If this is what you are seeing as instructions for connecting from SolrJ 
to SolrCloud, then something's really screwy.  Can you give me the URL 
that shows this, so I can see about getting it changed?  The following 
code example is how you should be doing that.  For this example, 
zookeeper is using the default port of 2181 and the zookeeper hosts are 
zoo1, zoo2, and zoo3.


String zkHost = "zoo1:2181,zoo2:2181,zoo3:2181";
// If you are using a chroot, use something like this instead:
// String zkHost = "zoo1:2181,zoo2:2181,zoo3:2181/chroot";
CloudSolrServer server = new CloudSolrServer(zkHost);
server.setDefaultCollection("collection1");


It seems to be working fine for a while even though I am getting a WARNING
as below
-
SASL configuration failed: javax.security.auth.login.LoginException: No JAAS
configuration section named 'Client' was found in specified JAAS
configuration file: 'XYZ_path/SolrCloud_04/config/login.conf'. Will continue
connection to Zookeeper server without SASL authentication,​ if Zookeeper
server allows it.
--


Later you said this sounds like a bug you saw in ZK 3.4.5. It may be 
that glassfish turns on some system-wide setting related to 
authentication that zookeeper picks up on.  I would tend to agree that 
this probably is not related to the other problems mentioned below.



before this happen the zookeeper logs on all the 3 instances starts showing
following warning

2013-10-24 14:05:55,200 [myid:3] - WARN
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too
many connections from /IP_APPLICATION_SEVER - max is 200

it means that my application is making too many connections with the
zookeeper and it is exceeding the limit which is set to 200.


Are you creating one CloudSolrServer object (static would be OK) and 
using it for all interaction with SolrCloud, or are you creating many 
CloudSolrServer objects over the life of your application?  Is there 
more than one thread or instance of your application running, and each 
one has its own CloudSolrServer object?  It is strongly recommended that 
you only create one object for your entire application and use it for 
all queries, updates, etc.  You can set the "collection" parameter on 
each query or request object that you use, if you need to use more than one.
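
A hypothetical sketch of that "one object for the entire application" advice;
the zookeeper hosts and collection name here are placeholders:

import java.net.MalformedURLException;

import org.apache.solr.client.solrj.impl.CloudSolrServer;

public final class SolrClientHolder {
  private static CloudSolrServer server;

  private SolrClientHolder() {}

  // lazily create a single shared CloudSolrServer for the whole application
  public static synchronized CloudSolrServer get() throws MalformedURLException {
    if (server == null) {
      server = new CloudSolrServer("zoo1:2181,zoo2:2181,zoo3:2181");
      server.setDefaultCollection("collection1");
    }
    return server;
  }
}

Every thread then calls SolrClientHolder.get() instead of constructing its
own instance; CloudSolrServer is intended to be shared between threads.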


If you *DO* create many CloudSolrServer objects over the life of your 
application and cannot immediately change your code so that it uses one 
object, be sure to shutdown() each one when it is no longer required.  
Depending on the exact nature of your application, you may also need to 
increase the maximum number of connections allowed in your zookeeper config.
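
The relevant setting is ZooKeeper's per-host connection cap; for example, in
each ZooKeeper server's zoo.cfg (500 is just an illustrative value):

# raise the per-client-host connection limit (the "max is 200" in the logs)
maxClientCnxns=500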


Thanks,
Shawn



Re: difference between apache tomcat vs Jetty

2013-10-24 Thread Anshum Gupta
Thought you may want to have a look at this:

https://issues.apache.org/jira/browse/SOLR-4792

P.S: There are no timelines for 5.0 for now, but it's the future
nevertheless.



On Fri, Oct 25, 2013 at 3:39 AM, Tim Vaillancourt wrote:

> I agree with Jonathan (and Shawn on the Jetty explanation), I think the
> docs should make this a bit more clear - I notice many people choosing
> Tomcat and then learning these details after, possibly regretting it.
> [...]



-- 

Anshum Gupta
http://www.anshumgupta.net


Re: difference between apache tomcat vs Jetty

2013-10-24 Thread Tim Vaillancourt
Hmm, thats an interesting move. I'm on the fence on that one but it surely
simplifies some things. Good info, thanks!

Tim


On 24 October 2013 16:46, Anshum Gupta  wrote:

> Thought you may want to have a look at this:
>
> https://issues.apache.org/jira/browse/SOLR-4792
>
> P.S: There are no timelines for 5.0 for now, but it's the future
> nevertheless.
> [...]


Re: Post filter cache question

2013-10-24 Thread Eric Grobler
Hi Chris

Thank you for your response.
I will try to migrate to Solr 4.4 first!

Best regards



On Thu, Oct 24, 2013 at 10:44 PM, Chris Hostetter
wrote:

> 3.1? ouch ... PostFilter wasn't even added until 3.4...
> https://wiki.apache.org/solr/CommonQueryParameters#Caching_of_filters
> [...]


First test cloud error question...

2013-10-24 Thread Jack Park
Background: all testing done on a Win7 platform. This is my first
migration from a single Solr server to a simple cloud. Everything is
configured exactly as specified in the wiki.

I created a simple 3-node cloud, all on localhost with different server
URLs, and a lone external zookeeper.  The online admin shows they are
all up.

I then start an agent which sends in documents to "bootstrap" the
index. That's when issues start.  A clip from the log shows this:
First, I create a SolrDocument with this JSON data:

DEBUG 2013-10-24 18:00:09,143 [main] - SolrCloudClient.mapToDocument-
{"locator":"EssayNodeType","smallIcon":"\/images\/cogwheel.png","subOf":["NodeType"],"details":["The
TopicQuests NodeTypes typology essay
type."],"isPrivate":"false","creatorId":"SystemUser","label":["Essay
Type"],"largeIcon":"\/images\/cogwheel_sm.png","lastEditDate":Thu Oct
24 18:00:09 PDT 2013,"createdDate":Thu Oct 24 18:00:09 PDT 2013}

Then, send it in from SolrJ which has a CloudSolrServer initialized
with localhost:2181 and an instance of LBHttpSolrServer initialized
with http://localhost:8983/solr/

That trace follows

INFO  2013-10-24 18:00:09,145 [main] - Initiating client connection,
connectString=localhost:2181 sessionTimeout=1
watcher=org.apache.solr.common.cloud.ConnectionManager@e6c
INFO  2013-10-24 18:00:09,148 [main] - Waiting for client to connect
to ZooKeeper
INFO  2013-10-24 18:00:09,150 [main-SendThread(0:0:0:0:0:0:0:1:2181)]
- Opening socket connection to server
0:0:0:0:0:0:0:1/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate
using SASL (Unable to locate a login configuration)
ERROR 2013-10-24 18:00:09,151 [main-SendThread(0:0:0:0:0:0:0:1:2181)]
- Unable to open socket to 0:0:0:0:0:0:0:1/0:0:0:0:0:0:0:1:2181
WARN  2013-10-24 18:00:09,151 [main-SendThread(0:0:0:0:0:0:0:1:2181)]
- Session 0x0 for server null, unexpected error, closing socket
connection and attempting reconnect
java.net.SocketException: Address family not supported by protocol
family: connect
at sun.nio.ch.Net.connect(Native Method)
at sun.nio.ch.SocketChannelImpl.connect(Unknown Source)
at 
org.apache.zookeeper.ClientCnxnSocketNIO.registerAndConnect(ClientCnxnSocketNIO.java:266)

I can watch the Zookeeper console running; it's mostly complaining
about too many connections from /127.0.0.1 ; I am seeing the errors in
the agent's log file.

Following that trace in the log is this:

INFO  2013-10-24 18:00:09,447 [main-SendThread(127.0.0.1:2181)] -
Opening socket connection to server 127.0.0.1/127.0.0.1:2181. Will not
attempt to authenticate using SASL (Unable to locate a login
configuration)
INFO  2013-10-24 18:00:09,448 [main-SendThread(127.0.0.1:2181)] -
Socket connection established to 127.0.0.1/127.0.0.1:2181, initiating
session
DEBUG 2013-10-24 18:00:09,449 [main-SendThread(127.0.0.1:2181)] -
Session establishment request sent on 127.0.0.1/127.0.0.1:2181
DEBUG 2013-10-24 18:00:09,449 [main-SendThread(127.0.0.1:2181)] -
Could not retrieve login configuration: java.lang.SecurityException:
Unable to locate a login configuration
INFO  2013-10-24 18:00:09,501 [main-SendThread(127.0.0.1:2181)] -
Session establishment complete on server 127.0.0.1/127.0.0.1:2181,
sessionid = 0x141ece7e6160017, negotiated timeout = 1
INFO  2013-10-24 18:00:09,501 [main-EventThread] - Watcher
org.apache.solr.common.cloud.ConnectionManager@42bad8a8
name:ZooKeeperConnection Watcher:localhost:2181 got event WatchedEvent
state:SyncConnected type:None path:null path:null type:None
INFO  2013-10-24 18:00:09,502 [main] - Client is connected to ZooKeeper
DEBUG 2013-10-24 18:00:09,502 [main-SendThread(127.0.0.1:2181)] -
Could not retrieve login configuration: java.lang.SecurityException:
Unable to locate a login configuration
DEBUG 2013-10-24 18:00:09,502 [main-SendThread(127.0.0.1:2181)] -
Could not retrieve login configuration: java.lang.SecurityException:
Unable to locate a login configuration
DEBUG 2013-10-24 18:00:09,503 [main-SendThread(127.0.0.1:2181)] -
Could not retrieve login configuration: java.lang.SecurityException:
Unable to locate a login configuration
DEBUG 2013-10-24 18:00:09,503 [main-SendThread(127.0.0.1:2181)] -
Could not retrieve login configuration: java.lang.SecurityException:
Unable to locate a login configuration
DEBUG 2013-10-24 18:00:09,504 [main-SendThread(127.0.0.1:2181)] -
Could not retrieve login configuration: java.lang.SecurityException:
Unable to locate a login configuration
DEBUG 2013-10-24 18:00:09,504 [main-SendThread(127.0.0.1:2181)] -
Could not retrieve login configuration: java.lang.SecurityException:
Unable to locate a login configuration
DEBUG 2013-10-24 18:00:09,505 [main-SendThread(127.0.0.1:2181)] -
Could not retrieve login configuration: java.lang.SecurityException:
Unable to locate a login configuration
DEBUG 2013-10-24 18:00:09,506 [main-SendThread(127.0.0.1:2181)] -
Reading reply sessionid:0x141ece7e6160017, packet:: clientPath:null
serverPath:null finished:false header:: 1,3  replyHeader:: 1,541,0
request:: '/clus

Solr 4.5.1 and Illegal to have multiple roots (start tag in epilog?). (perhaps SOLR-4327 bug?)

2013-10-24 Thread Michael Tracey
Hey Solr-users,

I've got a single solr 4.5.1 node with 96GB ram, a 65GB index (105 million 
records) and a lot of daily churn of newly indexed files (auto softcommit and 
commits).  I'm trying to bring another matching node into the mix, and am 
getting these errors on the new node:

org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: 
Illegal to have multiple roots (start tag in epilog?).

On the old server, still running, I'm getting: 

shard update error StdNode: 
http://server1:/solr/collection/:org.apache.solr.client.solrj.SolrServerException:
 Server refused connection at: http://server2:/solr/collection

The new core never actually comes online; it stays in recovery mode.  The other 
two tiny cores (100,000+ records each, not updated frequently) work just 
fine.

is this the SOLR-4327 bug?  https://issues.apache.org/jira/browse/SOLR-5331   And 
if so, how can I get the new node up and running so I can get back into 
production with some redundancy and speed?

I'm running an external zookeeper, and that is all running just fine.  Also 
internal Solrj/jetty with little to no modifications.  

Any ideas would be appreciated, thanks, 

M.


Solr indexing on email mime body and attachment

2013-10-24 Thread neerajp
Hi,
I am integrating the Solr search engine with my email clients. I am sending
POST requests to Solr using REST.
I am successfully able to post an email's To, From, Subject, etc. headers to
Solr for indexing.
Since an email can have MIME-typed bodies and attachments, I am not able to
work out how to post the email body and attachments so that Solr can index
them.
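
In case it is useful context, the usual route for binary attachments is
Solr's ExtractingRequestHandler (Solr Cell); a sketch, assuming the example
config's /update/extract endpoint, a core named "collection1", and made-up
literal values:

curl 'http://localhost:8983/solr/collection1/update/extract?literal.id=msg-001&commit=true' \
  -F 'file=@attachment.pdf'

The literal.* parameters let you attach already-extracted header fields
(to, from, subject) to the same document Tika builds from the attachment.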

Any help is highly appreciated.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-indexing-on-email-mime-body-and-attachment-tp4097692.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Reclaiming disk space from (large, optimized) segments

2013-10-24 Thread Otis Gospodnetic
Only skimmed your email, but purge every 4 hours jumped out at me. Would it
make sense to have time-based indices that can be periodically dropped
instead of being purged?
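
As a sketch of that idea in a SolrCloud setup (collection names and
parameters here are purely illustrative), one collection per day could be
created and dropped via the Collections API:

curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=docs_20131024&numShards=1&replicationFactor=1'
curl 'http://localhost:8983/solr/admin/collections?action=DELETE&name=docs_20131017'

Dropping a whole collection reclaims its disk space immediately, with no
merge activity involved.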

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Oct 23, 2013 10:33 AM, "Scott Lundgren" 
wrote:

> *Background:*
>
> - Our use case is to use SOLR as a massive FIFO queue.
>
> - Document additions and updates happen continuously.
>
> - Documents are being added at sustained a rate of 50 - 100 documents
> per second.
>
> - About 50% of these document are updates to existing docs, indexed
> using atomic updates: the original doc is thus deleted and re-added.
>
> - There is a separate purge operation running every four hours that deletes
> the oldest docs, if required based on a number of unrelated configuration
> parameters.
>
> - At some time in the past, a manual force merge / optimize with
> maxSegments=2 was run to troubleshoot high disk i/o and remove "too many
> segments" as a potential variable.  Currently, the largest fdts are 74G and
> 43G.   There are 47 total segments, the largest other sizes are all around
> 2G.
>
> - Merge policies are all at Solr 4 defaults. Index size is currently ~50M
> maxDocs, ~35M numDocs, 276GB.
>
> *Issue:*
>
> The background purge operation is deleting docs on schedule, but the disk
> space is not being recovered.
>
> *Presumptions:*
> I presume, but have not confirmed (how?) the 15M deleted documents are
> predominately in the two large segments.  Because they are largely in the
> two large segments, and those large segments still have (some/many) live
> documents, the segment backing files are not deleted.
>
> *Questions:*
>
> - When will those segments get merged and documents recovered?  Does it
> happen when _all_ the documents in those segments are deleted?  Some
> percentage of the segment is filled with deleted documents?
> - Is there a way to do it right now vs. just waiting?
> - In some cases, the purge delete conditional is _just_ free disk space:
>  when index > free space, delete oldest.  Those setups are now in scenarios
> where index >> free space, and getting worse.  How does low disk space
> effect above two questions?
> - Is there a way for me to determine stats on a per-segment basis?
>- for example, how many deleted documents in a particular segment?
> - On the flip side, can I determine in what segment a particular document
> is located?
>
> Thank you,
>
> Scott
>
> --
> Scott Lundgren
> Director of Engineering
> Carbon Black, Inc.
> (210) 204-0483 | scott.lundg...@carbonblack.com
>


Solr search in case the first keyword are not index

2013-10-24 Thread dtphat
I have a problem with Solr search:
I have a keyword, for example "apache solr reference", and I index "apache",
"solr", "reference".
When I search with the keywords below, it works:
- apache solr reference -> OK
- apache -> OK
- solr -> OK
- and so on.

But when the first keyword is not indexed and the other keywords are indexed,
Solr cannot find it
(for example, I search with: apacheee solr reference).

Can anyone help me solve this problem?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-search-in-case-the-first-keyword-are-not-index-tp4097699.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multiple facet fields in "defaults" section of a Request Handler

2013-10-24 Thread Varun Thacker
I think you have explained perfectly how the tag exclusion makes it a
different facet field, and that no default/invariants/appends logic would be
able to solve this.

I went with the custom component approach.

Although a very hacky solution could be defining the two facet fields in
"defaults" as:
{!ex=foo}category
{!ex=bar}brand
(a sketch of such a handler definition follows below)

And ensuring that whenever clients filter docs they use the following syntax:
&fq={!tag=foo}category:"cat"
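
For concreteness, roughly what that could look like in solrconfig.xml -- the
handler name and field names are from this thread, the rest is a sketch:

<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="facet">true</str>
    <str name="facet.field">{!ex=foo}category</str>
    <str name="facet.field">{!ex=bar}brand</str>
  </lst>
</requestHandler>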


On Thu, Oct 24, 2013 at 11:01 PM, Chris Hostetter
wrote:

> What you are asking is essentially "I want to configure faceting on X and
> Y by default, but i want clients to be able to add faceting on Z and have
> that disable faceting on X while still faceting on Y"
> [...]



-- 


Regards,
Varun Thacker
http://www.vthacker.in/