Re: Issue with multivalued fields in UIMA

2014-10-29 Thread Darx Oman
Hi there
I ran into the same problem.
Would you please explain how you solved it?

Thanks,
Darx

On Fri, Aug 29, 2014 at 11:26 PM, Tommaso Teofili  wrote:

> Hi,
>
> it'd be good if you could open a Jira issue (preferably with a patch)
> describing your findings.
>
> Thanks,
> Tommaso
>
>
> 2014-08-29 18:34 GMT+02:00 mkhordad :
>
> > I solved it. It was caused by a bug in UIMAUpdateRequestProcessor.
> >
> >
> >
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/Issue-with-multivalued-fields-in-UIMA-tp4155609p4155864.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>


Re: Slow forwarding requests to collection leader

2014-10-29 Thread Daniel Collins
I kind of think this might be "working as designed", but I'll be happy to
be corrected by others :)

We had a similar issue, which we discovered by accident: we had 2 or 3
collections spread across some machines, and we accidentally sent an
indexing request to a node in the cloud that didn't have a replica of
collection1 (but it had other collections). We saw an instant jump in
indexing latency to 5s, which, given that the previous latencies had been
~20ms, was rather obvious!

Querying seems to be fine with this kind of forwarding approach, but
indexing would logically require ZK information (to find the right shard
for the destination collection and the leader of that shard), so I'm
wondering if a node in the cloud that has a replica of collection1 has that
information cached, whereas a node in the (same) cloud that only has a
collection2 replica only has collection2 information cached, and has to go
to ZK for every "forwarding" request.

I haven't checked the code recently, but that seems plausible to me. Would
you really want all your collection2 nodes to be running ZK watches for all
collection1 updates as well as their own collection2 watches? That would
clog them up processing updates that, in all honesty, they shouldn't have
to deal with. Every node in the cloud would have to have a watch on
everything else, which, if you have a lot of independent collections, would
be an unnecessary burden on each of them.

If you use SolrJ as a client, it will route requests to the correct node in
the cloud (which is what we ended up using, through JNI, which was
"interesting"), but if you are using plain HTTP to index, that's something
your application has to take care of.
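
A minimal sketch of that SolrJ approach (assuming SolrJ 4.x, where the
ZooKeeper-aware client is CloudSolrServer; the ZK addresses and collection
name are placeholders):

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CloudIndexer {
  public static void main(String[] args) throws Exception {
    // CloudSolrServer watches cluster state in ZooKeeper and sends each
    // update straight to the leader of the correct shard, so requests are
    // not forwarded through nodes that don't host the collection.
    CloudSolrServer server = new CloudSolrServer("zkhost1:2181,zkhost2:2181,zkhost3:2181");
    server.setDefaultCollection("collection1");

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-1");
    doc.addField("title", "example document");

    server.add(doc);
    server.commit();
    server.shutdown();
  }
}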

On 28 October 2014 19:29, Matt Hilt  wrote:

> I have three equal machines each running solr cloud (4.8). I have multiple
> collections that are replicated but not sharded. I also have document
> generation processes running on these nodes which involves querying the
> collection ~5 times per document generated.
>
> Node 1 has a replica of collection A and is running document generation
> code that pushes to the HTTP /update/json handler.
> Node 2 is the leader of collection A.
> Node 3 does not have a replica of collection A, but is running document
> generation code for collection A.
>
> The issue I see is that node 1 can push documents into Solr 3-5 times
> faster than node 3 when they both talk to the solr instance on their
> localhost. If either of them talk directly to the solr instance on node 2,
> the performance is excellent (on par with node 1). To me it seems that the
> only difference in these cases is the query/put request forwarding. Does
> this involve some slow zookeeper communication that should be avoided? Any
> other insights?
>
> Thanks


Solr, Invalid chunk header when indexing

2014-10-29 Thread Diego Marconato
Hi all,

with Solr 3.3.0 when indexing I get the following errors (sometimes):

org.apache.solr.common.SolrException log
org.apache.solr.common.SolrException: Invalid chunk header
Caused by: com.ctc.wstx.exc.WstxIOException: Invalid chunk header
Caused by: java.io.IOException: Invalid chunk header
at
org.apache.coyote.http11.filters.ChunkedInputFilter.throwIOException(ChunkedInputFilter.java:610)


Any idea of a workaround to solve the problem? Can I do something
in solrconfig.xml?


thanks in advance

Diego
(Venice, Italy)


facet on field aliases of same field

2014-10-29 Thread Dan Field
Hi, we have a use case where we are trying to create multiple facet ranges 
based on a single field. 

I have successfully aliased the field by using the fl parameter, e.g. 
fl=date_decade:date,date_year:date,date_month:date,date_day:date where date is 
the original field and date_decade etc. are the aliases. 

What I am failing to do is to create multiple facet ranges based on these 
aliased fields e.g:

&facet.field={!key=date_month 
ex=date_month}date_month&facet.field={!key=date_day 
ex=date_day}date&facet.range={!key=date_decade 
ex=date_decade}date_decade&facet.range={!key=date_year 
ex=date_year}date_year&f.date_decade.facet.range.start=1600-01-01T00:00:00Z&f.date_decade.facet.range.end=2000-01-01T00:00:00Z&f.date_decade.facet.range.gap=+10YEARS&f.date_year.facet.range.start=1600-01-01T00:00:00Z&f.date_year.facet.range.end=2000-01-01T00:00:00Z&f.date_year.facet.range.gap=+1YEARS"

We’re using Solarium here to generate the query and facet ranges but if we can 
do this in a raw HTTP request, that’s fine. I’m just not sure whether Solr will 
allow us to generate multiple facet ranges based on a single data field. Or am 
I approaching the problem in the wrong way?

Server is Solr 4.1

Any help appreciated

-- 
Dan Field <d...@llgc.org.uk>
Ffôn/Tel. +44 1970 632 582
Pennaeth Uned Datblygu / Head of Development Unit
Llyfrgell Genedlaethol Cymru / National Library of Wales



function results' names include trailing whitespace

2014-10-29 Thread Michael Sokolov
I noticed that when you include a function as a result field, the 
corresponding key in the result markup includes trailing whitespace, 
which seems like a bug.  I wonder if anyone knows if there is a ticket 
for this already?


Example:

fl="id field(units_used) archive_id"

ends up returning results like this:

{
  "id":"nest.epubarchive.1",
  "archive_id":"urn:isbn:97849D42C5A01",
  "field(units_used)  ":123
                    ^^
}

A workaround is to use something like:

fl="id field(units_used)archive_id"

instead, but it seems inelegant, and not consistent with the treatment 
of other fields


-Mike


Sharding for Multi Cores

2014-10-29 Thread kumar
I have a situation where I need to get responses from two cores for a single request.

I need to use a custom request handler to get the responses.

For Example:

core1 and core2 both are having the request handler namely "/abc".

If I query each core individually in the following way, I get proper results.

http://localhost:/solr/core1/abc/suggest?s=abc

http://localhost:/solr/core2/abc/suggest?s=abc

But how can I write a single query to combine both of the above queries?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Sharding-for-Multi-Cores-tp4166434.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Log message "zkClient has disconnected".

2014-10-29 Thread Mark Miller


> On Oct 28, 2014, at 9:31 AM, Shawn Heisey  wrote:
> 
> exceed a 15 second zkClientTimeout

Which is too low even with good GC settings. Anyone with config still using 15 
or 10 seconds should move it to at least 30.
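
For reference, in the newer-style solr.xml this is set roughly like the
following (a sketch; only the relevant setting is shown, and the 30-second
default here is illustrative):

<solr>
  <solrcloud>
    <!-- raise the ZooKeeper session timeout to 30 seconds -->
    <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
  </solrcloud>
</solr>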

- Mark

http://about.me/markrmiller

RE: facet on field aliases of same field

2014-10-29 Thread Michael Ryan
It is indeed possible. Just need to use a different syntax. As far as I know, 
the facet parameters need to be local parameters, like this...

&facet.range={!key=date_decade facet.range.start=1600-01-01T00:00:00Z 
facet.range.end=2000-01-01T00:00:00Z 
facet.range.gap=%2B10YEARS}date&facet.range={!key=date_year 
facet.range.start=1600-01-01T00:00:00Z facet.range.end=2000-01-01T00:00:00Z 
facet.range.gap=%2B1YEARS}date

-Michael

-Original Message-
From: Dan Field [mailto:d...@llgc.org.uk] 
Sent: Wednesday, October 29, 2014 5:54 AM
To: solr-user@lucene.apache.org
Subject: facet on field aliases of same field

Hi, we have a use case where we are trying to create multiple facet ranges 
based on a single field. 

I have successfully aliased the field by using the fl parameter e.g. 
fl=date_decade:date,date_year:date,date_month:date,date_day:date where date is 
the original field and the day_decade etc are the aliases. 

What I am failing to do is to create multiple facet ranges based on these 
aliased fields e.g:

&facet.field={!key=date_month 
ex=date_month}date_month&facet.field={!key=date_day 
ex=date_day}date&facet.range={!key=date_decade 
ex=date_decade}date_decade&facet.range={!key=date_year 
ex=date_year}date_year&f.date_decade.facet.range.start=1600-01-01T00:00:00Z&f.date_decade.facet.range.end=2000-01-01T00:00:00Z&f.date_decade.facet.range.gap=+10YEARS&f.date_year.facet.range.start=1600-01-01T00:00:00Z&f.date_year.facet.range.end=2000-01-01T00:00:00Z&f.date_year.facet.range.gap=+1YEARS"

We’re using Solarium here to generate the query and facet ranges but if we can 
do this in a raw HTTP request, that’s fine. I’m just not sure whether Solr will 
allow us to generate multiple facet ranges based on a single data field. Or am 
I approaching the problem in the wrong way?

Server is Solr 4.1

Any help appreciated

-- 
Dan Field <d...@llgc.org.uk>
Ffôn/Tel. +44 1970 632 582
Pennaeth Uned Datblygu / Head of Development Unit
Llyfrgell Genedlaethol Cymru / National Library of Wales



Re: Solr, Invalid chunk header when indexing

2014-10-29 Thread Shawn Heisey
On 10/29/2014 3:09 AM, Diego Marconato wrote:
> with Solr 3.3.0 when indexing I get the following errors (sometimes):
> 
> org.apache.solr.common.SolrException log
> org.apache.solr.common.SolrException: Invalid chunk header
> Caused by: com.ctc.wstx.exc.WstxIOException: Invalid chunk header
> Caused by: java.io.IOException: Invalid chunk header
> at
> org.apache.coyote.http11.filters.ChunkedInputFilter.throwIOException(ChunkedInputFilter.java:610)
> 
> 
> Any idea of a workaround to solve the problem . I can act in some way
> in solrconfig.xml?

The class org.apache.coyote.http11.filters.ChunkedInputFilter is a
tomcat class.  Tomcat seems to be complaining about the HTTP connection
from the client.  HTTP chunking (a feature of http 1.1, related to
keepalive if I'm not mistaken) seems to be having problems.  Solr itself
is not involved with this exception.

This is most likely a client issue, a bug in tomcat, or a configuration
issue in tomcat.  The client issue could be an issue with a proxy, too.
 The haproxy load balancer (which I use with Solr) usually requires the
http-pretend-keepalive option when talking to a servlet container like
jetty or tomcat.

http://cbonte.github.io/haproxy-dconv/configuration-1.5.html#4-option%20http-pretend-keepalive
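
For reference, a minimal haproxy 1.5 sketch with that option enabled (hosts,
ports and names are placeholders):

defaults
    mode http
    timeout connect 5s
    timeout client  60s
    timeout server  60s

frontend solr_front
    bind *:8983
    default_backend solr_back

backend solr_back
    # make haproxy keep its server-side connections alive the way
    # servlet containers such as jetty or tomcat expect
    option http-pretend-keepalive
    server solr1 127.0.0.1:8080 check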

Thanks,
Shawn



Re: Sharding for Multi Cores

2014-10-29 Thread Erick Erickson
You can't AFAIK. Solr treats these cores as completely
separate entities. Plus, scores across the separate
cores cannot be assumed to be comparable. In fact,
there's not even any guarantee that the two
cores have _any_ fields in common, so this isn't
something that can be solved generally.

The app layer, or the custom request handler, needs
to handle this.
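
A rough sketch of the app-layer approach (SolrJ 4.x; the handler path and
port are assumptions based on the URLs in the original message, and note
again that scores from the two cores are not comparable):

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class TwoCoreQuery {
  public static void main(String[] args) throws Exception {
    HttpSolrServer core1 = new HttpSolrServer("http://localhost:8983/solr/core1");
    HttpSolrServer core2 = new HttpSolrServer("http://localhost:8983/solr/core2");

    SolrQuery query = new SolrQuery();
    query.setRequestHandler("/abc/suggest");  // hypothetical handler path
    query.set("s", "abc");

    // query each core separately and concatenate the results in the app layer
    List<SolrDocument> merged = new ArrayList<SolrDocument>();
    for (HttpSolrServer core : new HttpSolrServer[] { core1, core2 }) {
      QueryResponse rsp = core.query(query);
      if (rsp.getResults() != null) {
        merged.addAll(rsp.getResults());
      }
    }
    System.out.println("combined results: " + merged.size());
  }
}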

Best,
Erick

On Wed, Oct 29, 2014 at 5:28 AM, kumar  wrote:
> I have a situation that getting response from two cores for a single request.
>
> I need to use custom request handler to get the responses.
>
> For Example:
>
> core1 and core2 both are having the request handler namely "/abc".
>
> If i use individually in the following way i am getting proper results.
>
> http://localhost:/solr/core1/abc/suggest?s=abc
>
> http://localhost:/solr/core2/abc/suggest?s=abc
>
> But how can i write a single query to combine both the above queries.
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Sharding-for-Multi-Cores-tp4166434.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Clear Solr Admin Interface Logging page's logs

2014-10-29 Thread David Philip
Hi,

Is there a way to clear the Solr admin interface Logging page's logs?

I understand that we can change the logging level, but what if I just want
to clear the logs, then (say) reload a collection and expect to see only the
latest entries and not the past ones? Is there a manual way, or somewhere I
should clear, so that I just see the latest logs?


Thanks
David


Re: function results' names include trailing whitespace

2014-10-29 Thread Chris Hostetter

: fl="id field(units_used) archive_id"

I didn't even realize until today that fl was documented to support 
space-separated fields.  I've only ever used commas...

  fl="id,field(units_used),archive_id"

Please go ahead and file a bug in Jira for this, and note in the summary 
that using commas instead of spaces is the workaround.


-Hoss
http://www.lucidworks.com/


Score phrases higher than the records containing the words?

2014-10-29 Thread hschillig
So I have a few titles like so:

1. When a dog bites fight back : what you need to know, what to do, what not
to do / [prepared by the law firm] Slater & Zurz LLP.
2. First things first [book on cd] : [the rules of being a Warner-- what
works, what doesn't and what really matters most] / Kurt & Brenda Warner,
with Jennifer Schuchmann.
3. What if? : serious scientific answers to absurd hypothetical questions /
Randall Munroe.

Now when I put this in my query field:
title:what if

It returns the first two BEFORE it returns the book that has the actual
"what if" phrase, when that one should be listed first.
If I do title:"what if", none of them get returned.

Here is my schema.xml file:
http://apaste.info/7r5   

I want the titles that contain the phrase "what if" to be returned first,
and then the rest ranked by the individual words "what" and "if". The double
quotes don't seem to match the phrase. I removed the stop words because "if"
was in the list and I didn't want the indexing to ignore it.

Thank you for any help!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Score-phrases-higher-than-the-records-containing-the-words-tp4166488.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr Memory Usage

2014-10-29 Thread Vijay Kokatnur
I am observing some weird behavior with how Solr is using memory.  We are
running both Solr and zookeeper on the same node.  We tested memory
settings on Solr Cloud Setup of 1 shard with 146GB index size, and 2 Shard
Solr setup with 44GB index size.  Both are running on similar beefy
machines.

 After running the setup for 3-4 days, I see that a lot of memory is
inactive in all the nodes -

 99052952  total memory
 98606256  used memory
 19143796  active memory
 75063504  inactive memory

And inactive memory is never reclaimed by the OS.  When the total memory
size is reached, latency and disk IO shoot up.  We observed this behavior in
both the Solr Cloud setup with 1 shard and the Solr setup with 2 shards.

For the Solr Cloud setup, we are running a cron job with the following command
to clear out the inactive memory.  It is working as expected.  Even though
the index size of Cloud is 146GB, the used memory is always below 55GB.
Our response times are better and no errors/exceptions are thrown. (This
command causes issue in 2 Shard setup)

echo 3 > /proc/sys/vm/drop_caches

We have disabled the query, doc and solr caches in our setup.  Zookeeper is
using around 10GB of memory and we are not running any other process in
this system.

Has anyone faced this issue before?


Re: Score phrases higher than the records containing the words?

2014-10-29 Thread Erick Erickson
First thing is add &debug=query to the URL and see what the parsed
form of the query is to be sure the stop words issue is resolved.

Once that's determined, add the phrase with a high boost, something like
q=title:(what if) OR title:"what if"^10

where the boost factor is TBD.

Or add the title field to the "pf" parameter if you're using edismax,
possibly with a boost.

Or add a "bq" clause to the edismax.

Or add a "boost" to the main query, similar to:
https://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents
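
For example, with edismax a request along these lines (parameter values are
illustrative) boosts documents where the terms appear as a phrase in the title:

q=what+if&defType=edismax&qf=title&pf=title^10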

Best,
Erick


On Wed, Oct 29, 2014 at 10:30 AM, hschillig  wrote:
> So I have a few titles like so:
>
> 1. When a dog bites fight back : what you need to know, what to do, what not
> to do / [prepared by the law firm] Slater & Zurz LLP.
> 2. First things first [book on cd] : [the rules of being a Warner-- what
> works, what doesn't and what really matters most] / Kurt & Brenda Warner,
> with Jennifer Schuchmann.
> 3. What if? : serious scientific answers to absurd hypothetical questions /
> Randall Munroe.
>
> Now when I put this in my query field:
> title:what if
>
> It returns the first two BEFORE it returns the book that has the actual
> "what if" phrase.. when that one should be listed first..
> If I do title:"what if", none of them get returned.
>
> Here is my schema.xml file:
> http://apaste.info/7r5 
>
> I want the titles that contain the phrase "what if" to be returned first.
> And then index by "what", "if".. The double quotes doesn't seem to contain
> the phrase. I removed the stop words because "if" was in the list and I
> didn't want the indexing to ignore that.
>
> Thank you for any help!
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Score-phrases-higher-than-the-records-containing-the-words-tp4166488.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Memory Usage

2014-10-29 Thread Shawn Heisey
On 10/29/2014 11:43 AM, Vijay Kokatnur wrote:
> I am observing some weird behavior with how Solr is using memory.  We are
> running both Solr and zookeeper on the same node.  We tested memory
> settings on Solr Cloud Setup of 1 shard with 146GB index size, and 2 Shard
> Solr setup with 44GB index size.  Both are running on similar beefy
> machines.
>
>  After running the setup for 3-4 days, I see that a lot of memory is
> inactive in all the nodes -
>
>  99052952  total memory
>  98606256  used memory
>  19143796  active memory
>  75063504  inactive memory
>
> And inactive memory is never reclaimed by the OS.  When total memory size
> is reached, latency and disk IO shoots up.  We observed this behavior in
> both Solr Cloud setup with 1 shard and Solr setup with 2 shards.

Where are these numbers coming from?  If they are coming from the
operating system and not Java, then you have nothing to worry about.

> For the Solr Cloud setup, we are running a cron job with following command
> to clear out the inactive memory.  It  is working as expected.  Even though
> the index size of Cloud is 146GB, the used memory is always below 55GB.
> Our response times are better and no errors/exceptions are thrown. (This
> command causes issue in 2 Shard setup)
>
> echo 3 > /proc/sys/vm/drop_caches

Don't do that.  You're throwing away almost every performance advantage
the operating system has to offer.  If this changes the numbers so they
look better to you, then I can almost guarantee you that you are not
having any actual problem, and that dropping the caches like this is
*hurting* performance, not helping it.

It's completely normal for a correctly functioning system to report an
extremely low amount of memory as free.  The operating system is using
the spare memory in your system as a filesystem cache, which makes
everything run a lot faster.  If a program needs more memory, the
operating system will instantly give up some of its disk cache in order
to satisfy the memory allocation.

The "virtual memory" part of this blog post (which has direct relevance
for Solr) hopefully can explain it better than I can.  The entire blog
post is worth reading.

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Thanks,
Shawn



RE: Solr Memory Usage

2014-10-29 Thread Toke Eskildsen
Vijay Kokatnur [kokatnur.vi...@gmail.com] wrote:
> For the Solr Cloud setup, we are running a cron job with following command
> to clear out the inactive memory.  It  is working as expected.  Even though
> the index size of Cloud is 146GB, the used memory is always below 55GB.
> Our response times are better and no errors/exceptions are thrown. (This
> command causes issue in 2 Shard setup)

> echo 3 > /proc/sys/vm/drop_caches

As Shawn points out, this is under normal circumstances a very bad idea, but...

> Has anyone faced this issue before?

We did have some problems on a 256GB machine churning terabytes of data through 
40 concurrent Tika processes and into Solr. After some days, performance got 
really bad. When we did a top, we noticed that most of the time was used in the 
kernel (the 'sy' on the '%Cpu(s):'-line). The drop_caches trick worked for us 
too. Our systems guys explained that it was because of virtual memory space 
fragmentation, so the OS had to spend a lot of resources just bookkeeping 
memory.

Try keeping an eye on the fraction of processing power spent in the kernel from 
when you clear the cache until performance gets bad again. If it rises 
drastically, you might have the same problem.

- Toke Eskildsen


Re: Clear Solr Admin Interface Logging page's logs

2014-10-29 Thread Ramzi Alqrainy
Yes, sure. If you use the Jetty container to run Solr, you can remove the
solr.log file from

$SOLR_HOME/example/logs

by using this command on Linux/Unix:

rm -rf $SOLR_HOME/example/logs/solr.log

For Windows:

DEL %SOLR_HOME%\example\logs\solr.log

After that, you can check the logging interface.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Clear-Solr-Admin-Interface-Logging-page-s-logs-tp4166463p4166500.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Clear Solr Admin Interface Logging page's logs

2014-10-29 Thread Jorge Luis Betancourt González
That said, this does look like it would be a nice & simple addition to the web interface.

- Original Message -
From: "Ramzi Alqrainy" 
To: solr-user@lucene.apache.org
Sent: Wednesday, October 29, 2014 3:18:26 PM
Subject: Re: Clear Solr Admin Interface Logging page's logs

Yes sure, if you use jetty container to run solr, you can remove solr.log
file from
 
$SOLR_HOME/example/logs

by using this command for Linux/Unix
 
rm -rf $SOLR_HOME/example/logs/solr.log

For windows

DEL  $SOLR_HOME/example/logs/solr.log

After that, you can check the logging interface.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Clear-Solr-Admin-Interface-Logging-page-s-logs-tp4166463p4166500.html
Sent from the Solr - User mailing list archive at Nabble.com.


solr-map-reduce API

2014-10-29 Thread Pritesh Patel
What exactly does this API do?

--Pritesh


Re: solr-map-reduce API

2014-10-29 Thread Michael Della Bitta

Check this out:

http://www.slideshare.net/cloudera/solrhadoopbigdatasearch

On 10/29/14 16:31, Pritesh Patel wrote:

What exactly does this API do?

--Pritesh





Questions about Solrj indexing/updateRequest API with regard to enabling HTTP Basic Auth inside Tomcat (HTTP POST method)

2014-10-29 Thread Yuan Jerry
Hi Solr User List,

I have started using SolrJ (Solr and SolrJ 4.1.0, and also 4.10.1) to send 
indexing/update requests to a Solr server hosted inside Tomcat, with HTTP 
BASIC auth enabled in this Solr server's web.xml.

(1) The client code looks like below:

String solrServerUrl = "http://localhost:8983/solr/core";
String userName = "solr_admin";
String password = "solr_pwd";

DefaultHttpClient client = new DefaultHttpClient();
HttpClientUtil.setBasicAuth(client, userName, password);

HttpSolrServer solrServer = new HttpSolrServer(solrServerUrl, client);

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "id_" + System.currentTimeMillis());
doc.addField("name", "Name_" + System.currentTimeMillis());
doc.addField("title", "Title_" + System.currentTimeMillis());

try {
   UpdateResponse updateResponse = solrServer.add(doc, 1);
   ..
} catch (Exception ex) {
}
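
(For reference, an equivalent way to build the authenticated client is to let
HttpClientUtil create it from params; this is only a sketch using the SolrJ
4.x property constants and is not necessarily related to the failure
described below:)

import org.apache.http.client.HttpClient;
import org.apache.solr.client.solrj.impl.HttpClientUtil;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.params.ModifiableSolrParams;

ModifiableSolrParams params = new ModifiableSolrParams();
params.set(HttpClientUtil.PROP_BASIC_AUTH_USER, userName);
params.set(HttpClientUtil.PROP_BASIC_AUTH_PASS, password);
HttpClient authClient = HttpClientUtil.createClient(params);
HttpSolrServer altServer = new HttpSolrServer(solrServerUrl, authClient);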

(2) The Solr server web.xml is configured with the following HTTP BASIC Auth 
configurations:

web.xml:

   <login-config>
      <auth-method>BASIC</auth-method>
      <realm-name>Solr</realm-name>
   </login-config>

   <security-constraint>
      <web-resource-collection>
         <web-resource-name>Secured Solr Access</web-resource-name>
         <url-pattern>/*</url-pattern>
      </web-resource-collection>
      <auth-constraint>
         <role-name>solr_secure</role-name>
      </auth-constraint>
   </security-constraint>

(3) The Tomcat container has the following role defined for being used in the 
above security constraints:

tomcat-users.xml:

   <tomcat-users>
      <role rolename="solr_secure"/>
      <user username="solr_admin" password="solr_pwd" roles="solr_secure"/>
   </tomcat-users>

When I ran the above client code to add a single SolrInputDocument, it always 
failed with the following exception:

org.apache.solr.client.solrj.SolrServerException: IOException occured when 
talking to server at: http://localhost:8983/solr/core

Are there any users out there who have used the SolrJ APIs to index and store 
documents in a Solr server that is configured with HTTP Basic Auth like the 
above? If so, please let me know whether you have encountered similar 
exceptions, or whether there could be an issue with the configurations shown 
above. Your information would be highly appreciated.

Jerry Yuan




v4.0 upgrade to v4.1 with custom fq

2014-10-29 Thread nbosecker
I've inherited some code that filters requests for ACL by implementing a
servlet Filter and wrapping the request to add parameters (user/groups) to
the request as fq, and also handling getParameter()/getParameterMap() that
Solr invokes to get those values from the request.

Solrconfig.xml has placeholders for the parameters that are injected when
those methods are invoked:
 <lst name="appends">
   <str name="fq">{!acl user=$user groups=$groups}</str>
 </lst>

This all works exactly as expected using Solr 4.0.0 with Tomcat.

I'm attempting to upgrade to Solr 4.1.0, and these values are no longer
injected. I've traced the problem to the fact that Solr no longer invokes
getParameter()/getParameterMap() on my filtered, wrapped request as it did
in 4.0.0.

I note that in 4.0.0, the package org.apache.solr.request contained
ServletSolrParams class that was used to get the parameters from the request
using getParameters()/getParameterMap().

4.1.0 version of org.apache.solr.request no longer contains
ServletSolrParams, but there is no mention of this deprecation in the
release notes for Solr 4.1.0.

Is there a new standard way to access a filtered request with Solr 4.1.0?
Why don't the release notes for 4.1.0 mention this class deprecation?

Thanks for your help!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/v4-0-upgrade-to-v4-1-with-custom-fq-tp4166520.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: v4.0 upgrade to v4.1 with custom fq

2014-10-29 Thread Erick Erickson
Hmmm, my first question is whether you really mean 4.1.0 or 4.10?

Because if it's the former, I really have to ask why you'd use such
an old version. I'm assuming that's a typo.

BTW, 4.10.2 is being released as we speak, so you'll really want to
consider using that version, assuming you meant 4.10.

That said, I have no real help to give here.

Best,
Erick

On Wed, Oct 29, 2014 at 1:51 PM, nbosecker  wrote:
> I've inherited some code that filters requests for ACL by implementing a
> servlet Filter and wrapping the request to add parameters (user/groups) to
> the request as fq, and also handling getParameter()/getParameterMap() that
> Solr invokes to get those values from the request.
>
> Solrconfig.xml has placeholders for the parameters that are injected when
> those methods are invoked:
>  <lst name="appends">
>    <str name="fq">{!acl user=$user groups=$groups}</str>
>  </lst>
>
> This all works exactly as expected using Solr 4.0.0 with Tomcat.
>
> I'm attempting to upgrade to Solr 4.1.0, and these values are no longer
> injected. I've traced the problem to the fact that Solr no longer invokes
> getParameter()/getParameterMap() on my filtered, wrapped request as it did
> in 4.0.0.
>
> I note that in 4.0.0, the package org.apache.solr.request contained
> ServletSolrParams class that was used to get the parameters from the request
> using getParameters()/getParameterMap().
>
> 4.1.0 version of org.apache.solr.request no longer contains
> ServletSolrParams, but there is no mention of this deprecation in the
> release notes for Solr 4.1.0.
>
> Is there a new standard way to access a filtered request with Solr 4.1.0?
> Why don't the release notes for 4.1.0 mention this class deprecation?
>
> Thanks for your help!
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/v4-0-upgrade-to-v4-1-with-custom-fq-tp4166520.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: v4.0 upgrade to v4.1 with custom fq

2014-10-29 Thread nbosecker
No, I mean 4.1.0, not 4.10, although my ultimate goal is to get to 4.10. (And
now 4.10.2 as you suggest!)

I tried 4.0->4.10 first, ran into this issue, and decided to go one step at
a time and try going from 4.0->4.1. 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/v4-0-upgrade-to-v4-1-with-custom-fq-tp4166520p4166525.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Solr Memory Usage

2014-10-29 Thread Will Martin
This command only touches OS level caches that hold pages destined for (or
not) the swap cache. Its use means that disk will be hit on future requests,
but in many instances the pages were headed for ejection anyway.

It does not have anything whatsoever to do with Solr caches.  It also is not
fragmentation related; it is a result of the kernel managing virtual pages
in an "as designed manner". The proper command is

#sync; echo 3 >/proc/sys/vm/drop_caches. 

http://linux.die.net/man/5/proc

I have encountered resistance on the use of this on long-running processes
for years ... from people who don't even research the matter.



-Original Message-
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] 
Sent: Wednesday, October 29, 2014 3:06 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr Memory Usage

Vijay Kokatnur [kokatnur.vi...@gmail.com] wrote:
> For the Solr Cloud setup, we are running a cron job with following 
> command to clear out the inactive memory.  It  is working as expected.  
> Even though the index size of Cloud is 146GB, the used memory is always
below 55GB.
> Our response times are better and no errors/exceptions are thrown. 
> (This command causes issue in 2 Shard setup)

> echo 3 > /proc/sys/vm/drop_caches

As Shawn points out, this is under normal circumstances a very bad idea,
but...

> Has anyone faced this issue before?

We did have some problems on a 256GB machine churning terabytes of data
through 40 concurrent Tika processes and into Solr. After some days,
performance got really bad. When we did a top, we noticed that most of the
time was used in the kernel (the 'sy' on the '%Cpu(s):'-line). The
drop_caches trick worked for us too. Our systems guys explained that it was
because of virtual memory space fragmentation, so the OS had to spend a lot
of resources just bookkeeping memory.

Try keeping an eye on the fraction of processing power spend on the kernel
from you clear the cache until it performance gets bad again. If it rises
drastically, you might have the same problem.

- Toke Eskildsen



Exporting Error in 4.10.1

2014-10-29 Thread Joseph Obernberger
Hi - I'm trying to use 4.10.1 with /export.  I've defined a field as
follows:


I then call:
http://server:port/solr/COLLECT1/export?q=Collection:COLLECT2000&sort=DocumentId
desc&fl=DocumentId

The error I receive is:
java.io.IOException: DocumentId must have DocValues to use this feature.
at
org.apache.solr.response.SortingResponseWriter.getFieldWriters(SortingResponseWriter.java:228)
at
org.apache.solr.response.SortingResponseWriter.write(SortingResponseWriter.java:119)
at
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:765)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:426)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:745)

Any ideas what I'm doing wrong?
Thank you!

-Joe Obernberger


RE: Solr Memory Usage

2014-10-29 Thread Will Martin
Oops. My wording was poor. My reference to those who don't research the
matter was pointing at a large number of engineers I have worked with; not
this list.

-Original Message-
From: Will Martin [mailto:wmartin...@gmail.com] 
Sent: Wednesday, October 29, 2014 6:38 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: Solr Memory Usage

This command only touches OS level caches that hold pages destined for (or
not) the swap cache. Its use means that disk will be hit on future requests,
but in many instances the pages were headed for ejection anyway.

It does not have anything whatsoever to do with Solr caches.  It also is not
fragmentation related; it is a result of the kernel managing virtual pages
in an "as designed manner". The proper command is

#sync; echo 3 >/proc/sys/vm/drop_caches. 

http://linux.die.net/man/5/proc

I have encountered resistance on the use of this on long-running processes
for years ... from people who don't even research the matter.



-Original Message-
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
Sent: Wednesday, October 29, 2014 3:06 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr Memory Usage

Vijay Kokatnur [kokatnur.vi...@gmail.com] wrote:
> For the Solr Cloud setup, we are running a cron job with following 
> command to clear out the inactive memory.  It  is working as expected.
> Even though the index size of Cloud is 146GB, the used memory is always
below 55GB.
> Our response times are better and no errors/exceptions are thrown. 
> (This command causes issue in 2 Shard setup)

> echo 3 > /proc/sys/vm/drop_caches

As Shawn points out, this is under normal circumstances a very bad idea,
but...

> Has anyone faced this issue before?

We did have some problems on a 256GB machine churning terabytes of data
through 40 concurrent Tika processes and into Solr. After some days,
performance got really bad. When we did a top, we noticed that most of the
time was used in the kernel (the 'sy' on the '%Cpu(s):'-line). The
drop_caches trick worked for us too. Our systems guys explained that it was
because of virtual memory space fragmentation, so the OS had to spend a lot
of resources just bookkeeping memory.

Try keeping an eye on the fraction of processing power spend on the kernel
from you clear the cache until it performance gets bad again. If it rises
drastically, you might have the same problem.

- Toke Eskildsen



Re: Shared Directory for two Solr Clouds(Writer and Reader)

2014-10-29 Thread Jaeyoung Yoon
Hi Erick,

Thanks for your kind reply.

In order to deal with more documents in SolrCloud, we are thinking of using
many collections, and each collection will also have several shards.
The basic idea for dealing with a large number of documents is that when a
collection fills up, we will create a new collection.
That is, we will create many collections to contain more data. Currently we
are thinking of using the timestamp of each file to decide which collection
will contain which documents.
For example, we might create a new collection every day or every hour, but
not at a fixed interval. We will maintain the start/end time of each
collection. Once a collection reaches its document limit, we will create a
new one. In order to avoid too many running collections in a SolrCloud, we
are also thinking of unloading (disabling) old collections without deleting
their index files, so that in the future we could enable a collection again
on demand in a different SolrCloud.

This is how we are thinking of dealing with a large number of documents.

But the problem is this: because our document ingest rate is very high, most
of the resources (CPU/memory) in Solr are used for indexing. So when we try
to run queries on the same machine, the queries might be slow because of the
lack of resources, and the queries also reduce indexing performance.
So we have been investigating using two separate Solr Clouds: one for
indexing and the other for querying. These two clouds will share data but
use separate computing resources.


Here is what we have already set up in our prototype.

Setup

Currently we have set up two separate Solr Clouds.

   1. Two Solr Clouds.
   2. One ZooKeeper for each SolrCloud. The indexing SolrCloud needs to know
   the search SolrCloud's ZooKeeper address, but the search SolrCloud doesn't
   need to know the indexing ZooKeeper.
   3. The indexing SolrCloud and the query SolrCloud have the same collection
   name and the same number of shards for the collection.
   4. The indexing SolrCloud and the query SolrCloud use their own solrHome,
   but the index data directories are shared between them.
   5. In the indexing SolrCloud, each shard has only one node.
   6. In the query SolrCloud, each shard can have more than one node for more
   query capability.

How it works

In order to keep a consistent view between the indexing and search Solr Clouds,

   1. Search Solr Cloud doesn't have any updateHandler/commit. It uses
   ReadOnlyDirectory(Factory) and NoOpUpdateHandler for /update. Also
   solrcloud.skip.autorecovery=true
   2. Search Solr Cloud doesn't open "Searcher" by itself. It opens
   "Searcher" only when it receives "openSearcherCmd" from Indexing Solr Cloud.
   3. The indexing SolrCloud sends "openSearcherCmd" to the search SolrCloud
   after commit. That is, after each commit on the indexing SolrCloud, it
   schedules "openSearcherCmd" with a remoteOpenSearcherMaxDelayAfterCommit
   interval. After the interval (default is 80 secs), the indexing SolrCloud
   sends "openSearcherCmd" to the search SolrCloud.
   4. The indexing SolrCloud has its own deletionPolicy to keep old commit
   points which might still be used by running queries on the search cloud.
   Currently the indexing SolrCloud keeps the last 20 minutes of commit points.
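
(A rough sketch of how the stock SolrDeletionPolicy in solrconfig.xml can
express this kind of retention window; the values are illustrative and the
custom policy mentioned above may differ:)

<deletionPolicy class="solr.SolrDeletionPolicy">
  <!-- keep a window of recent commit points so searchers that are still
       reading an older commit on the search cloud are not broken -->
  <str name="maxCommitsToKeep">20</str>
  <str name="maxCommitAge">20MINUTES</str>
</deletionPolicy>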

Any feedback or your opinion would be very helpful to us.

Thanks in advance.
Jae


On Tue, Oct 21, 2014 at 7:30 AM, Erick Erickson 
wrote:

> Hmmm, I sure hope you have _lots_ of shards. At that rate, a single
> shard is probably going to run up against internal limits in a _very_
> short time (the most docs I've seen successfully served on a single
> shard run around 300M).
>
> It seems, to handle any reasonable retention period, you need lots and
> lots and lots of physical machines out there. Which hints at using
> regular SolrCloud since each machine would then be handling much less
> of the load.
>
> This is what I mean by "the XY problem". Your setup, at least from
> what you've told us so far, has so many unknowns that it's impossible
> to say much. If you go with your original e-mail and get it all set up
> and running on, say, 3 shards, it would work fine for about an hour.
> At that point you would have 300M docs on each shard and your query
> performance would start having... problems. You'd be hitting the hard
> limit of 2B docs/shard in less than 10 hours. And all the work you've
> put into this complex coordination setup would be totally wasted.
>
> So, you _really_ have to explain a lot more about the problem before
> we talk about writing code. You might want to review:
> http://wiki.apache.org/solr/UsingMailingLists
>
> Best,
> Erick
>
> On Tue, Oct 21, 2014 at 12:34 AM, Jaeyoung Yoon 
> wrote:
> > In my case, injest rate is very high(above 300K docs/sec) and data are
> kept
> > inserted. So CPU is already bottleneck because of indexing.
> >
> > older-style master/slave replication with http or scp takes long to copy
> > big files from master/slave.
> >
> > That's why I setup two separate Solr Clouds. One for indexing and the
> oth

Re: Solr Memory Usage

2014-10-29 Thread Shawn Heisey
On 10/29/2014 1:05 PM, Toke Eskildsen wrote:
> We did have some problems on a 256GB machine churning terabytes of data 
> through 40 concurrent Tika processes and into Solr. After some days, 
> performance got really bad. When we did a top, we noticed that most of the 
> time was used in the kernel (the 'sy' on the '%Cpu(s):'-line). The 
> drop_caches trick worked for us too. Our systems guys explained that it was 
> because of virtual memory space fragmentation, so the OS had to spend a lot 
> of resources just bookkeeping memory.

There's always at least one exception to any general advice, including
whatever I come up with!  It's really too bad that it didn't Just Work
(tm) for you.  Weird things can happen when you start down the path of
extreme scaling, though.

Thank you for exploring the bleeding edge for us!

Shawn



Re: function results' names include trailing whitespace

2014-10-29 Thread Michael Sokolov
OK, I opened SOLR-6672; not sure how I stumbled into using white space; 
I would ordinarily use commas too, I think.


-Mike

On 10/29/14 1:23 PM, Chris Hostetter wrote:

: fl="id field(units_used) archive_id"

I didn't even realize until today that fl was documented to support space
seperated fields.  i've only ever used commas...

   fl="id,field(units_used),archive_id"

Please go ahead and file a bug in Jira for this, and note in the summary
that using commas instead of spaces is the workarround.


-Hoss
http://www.lucidworks.com/




Migrating cloud to another set of machines

2014-10-29 Thread Jakov Sosic

Hi guys


I was wondering: is there some smart way to migrate a Solr cloud from one set 
of machines to another?


Specifically, I have 2 cores, each of them with 2 replicas and 2 shards, 
spread across 4 machines.


We bought new HW and are in a process of moving to new 4 machines.


What are my options?


1) - Create new cluster on new set of machines.
   - stop write operations
   - copy data directories from old machines to new machines
   - start solrs on new machines


2) - expand number of replicas from 2 to 4
   - add new solr nodes to cloud
   - wait for resync
   - stop old solr nodes
   - shrink number of replicas from 4 back to 2


Is there any other path to achieve this?

I'm leaning towards no1, because I don't feel too comfortable with doing 
all those changes explained in no2 ...


Ideas?


Re: Migrating cloud to another set of machines

2014-10-29 Thread Otis Gospodnetic
Hi/Bok Jakov,

2) sounds good to me.  It means no down-time.  1) means stoppage.  If
stoppage is not OK, but falling behind with indexing new content is OK, you
could:
* add a new cluster
* start reading from old index and indexing into the new index
* stop old cluster when done
* index new content to new cluster (or maybe you can be doing this all
along if indexing old + new at the same time is OK for you)
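
For option 2, the grow/shrink steps can be done with the Collections API; a
rough sketch, assuming Solr 4.8 or later (where ADDREPLICA is available) and
placeholder host/collection/core names:

# add a replica of shard1 on one of the new machines
http://newhost1:8983/solr/admin/collections?action=ADDREPLICA&collection=collection1&shard=shard1&node=newhost1:8983_solr

# once it is active and in sync, drop the replica on the old machine
http://oldhost1:8983/solr/admin/collections?action=DELETEREPLICA&collection=collection1&shard=shard1&replica=core_node1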

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Wed, Oct 29, 2014 at 10:18 PM, Jakov Sosic  wrote:

> Hi guys
>
>
> I was wondering is there some smart way to migrate Solr cloud from 1 set
> of machines to another?
>
> Specificaly, I have 2 cores, each of them with 2 replicas and 2 shards,
> spread across 4 machines.
>
> We bought new HW and are in a process of moving to new 4 machines.
>
>
> What are my options?
>
>
> 1) - Create new cluster on new set of machines.
>- stop write operations
>- copy data directories from old machines to new machines
>- start solrs on new machines
>
>
> 2) - expand number of replicas from 2 to 4
>- add new solr nodes to cloud
>- wait for resync
>- stop old solr nodes
>- shrink number of replicas from 4 back to 2
>
>
> Is there any other path to achieve this?
>
> I'm leaning towards no1, because I don't feel too comfortable with doing
> all those changes explained in no2 ...
>
> Ideas?
>