Re: Solr Full Import frozen after indexing a fixed number of records

2014-07-27 Thread Gora Mohanty
On 27 July 2014 12:13, Aniket Bhoi  wrote:

> On Fri, Jul 25, 2014 at 8:32 PM, Aniket Bhoi 
> wrote:
>
> > I have Apache Solr hosted on my Apache Tomcat server, with a SQL Server
> > backend.
>
> [...]

> > After I run a full import, indexing proceeds successfully, but it seems
> > to freeze every time after fetching a fixed number of records. What I
> > mean is that after it fetches 10730 records it just freezes and doesn't
> > process any more.
> >
> > Excerpt from dataimport.xml:
> >
> > <str name="Time Elapsed">0:15:31.959</str>
> > <str name="Total Requests made to DataSource">0</str>
> > <str name="Total Rows Fetched">*10730*</str>
> > <str name="Total Documents Processed">3579</str>
> > <str name="Total Documents Skipped">0</str>
> > <str name="Full Dump Started">2014-07-25 10:44:39</str>
> >
> > This seems to happen every time.
> >
> > I checked the Tomcat log. Following is the excerpt from when Solr freezes:
> >
> > INFO:  Generating record for Unique ID :null attachment Ref:null
> > parent ref :nullexecuted by thread:25
>

[...]

Something is wrong with your DIH config file: You seem to be getting null
for a document unique ID. Please share the file with us.

Regards,
Gora


Re: integrating Accumulo with solr

2014-07-27 Thread Jack Krupansky
Right, and that's exactly what DataStax Enterprise provides (at great
engineering effort!) - synchronization of database updates and search
indexing. Sure, you can do it as well, but that's a significant engineering
challenge on both sides of the equation, not a simple "plug and play"
configuration setting achieved by writing a simple "connector."


But, hey, if you consider yourself one of those "true hard-core gunslingers" 
then you'll be able to code that up in a weekend without any of our 
assistance, right?


In short, synchronizing two data stores is a real challenge. Yes, it is 
doable, but... it is non-trivial. Especially if both stores are distributed 
clusters. Maybe now you can guess why the Sqrrl guys went the Lucene route 
instead of Solr.


I'm certainly not suggesting that it can't be done. Just highlighting the 
challenge of such a task.


Just to be clear, you are referring to "sync mode" and not mere "ETL", which 
people do all the time with batch scripts, Java extraction and ingestion 
connectors, and cron jobs.


Give it a shot and let us know how it works out.

-- Jack Krupansky

-Original Message- 
From: Ali Nazemian

Sent: Sunday, July 27, 2014 1:20 AM
To: solr-user@lucene.apache.org
Subject: Re: integrating Accumulo with solr

Dear Jack,
Hi,
One more thing to mention: I don't want to use Solr or Lucene for indexing
Accumulo or for full-text search inside it. I am looking to have both in
sync mode. I mean, import some parts of the data to Solr for indexing. For
this purpose I probably need something like a trigger in an RDBMS: I have
to define something (probably with an Accumulo iterator) to import to Solr
on inserting new data.
Regards.

On Fri, Jul 25, 2014 at 12:59 PM, Ali Nazemian 
wrote:


Dear Jack,
Actually, I am going to do a cost-benefit analysis of in-house development
versus going for Sqrrl support.
Best regards.


On Thu, Jul 24, 2014 at 11:48 PM, Jack Krupansky 
wrote:


Like I said, you're going to have to be a real, hard-core gunslinger to
do that well. Sqrrl uses Lucene directly, BTW:

"Full-Text Search: Utilizing open-source Lucene and custom indexing
methods, Sqrrl Enterprise users can conduct real-time, full-text search
across data in Sqrrl Enterprise."

See:
http://sqrrl.com/product/search/

Out of curiosity, why are you not using that integrated Lucene support of
Sqrrl Enterprise?


-- Jack Krupansky

-Original Message- From: Ali Nazemian
Sent: Thursday, July 24, 2014 3:07 PM

To: solr-user@lucene.apache.org
Subject: Re: integrating Accumulo with solr

Dear Jack,
Thank you. I am aware of DataStax, but I am looking at integrating
Accumulo with Solr. This is something like what the Sqrrl guys offer.
Regards.


On Thu, Jul 24, 2014 at 7:27 PM, Jack Krupansky 
wrote:

 If you are not a "true hard-core gunslinger" who is willing to dive in

and
integrate the code yourself, instead you should give serious
consideration
to a product such as DataStax Enterprise that fully integrates and
packages
a NoSQL database (Cassandra) and Solr for search. The security aspects
are
still a work in progress, but certainly headed in the right direction.
And
it has Hadoop and Spark integration as well.

See:
http://www.datastax.com/what-we-offer/products-services/datastax-enterprise

-- Jack Krupansky

-Original Message- From: Ali Nazemian
Sent: Thursday, July 24, 2014 10:30 AM
To: solr-user@lucene.apache.org
Subject: Re: integrating Accumulo with solr


Thank you very much. Nice idea, but how can Solr and Accumulo be
synchronized in this way?
I know that Solr can be integrated with HDFS, and also that Accumulo works
on top of HDFS. So can I use HDFS as the integration point? I mean, set
Solr to use HDFS as a source of documents as well as the destination of
documents.
Regards.


On Thu, Jul 24, 2014 at 4:33 PM, Joe Gresock  wrote:

 Ali,



Sounds like a good choice.  It's pretty standard to store the primary
storage id as a field in Solr so that you can search the full text in Solr
and then retrieve the full document elsewhere.

I would recommend creating a document structure in Solr with whatever
fields you want indexed (most likely as text_en, etc.), and then store a
"string" field named "content_id", which would be the Accumulo row id that
you look up with a scan.

One caveat -- Accumulo will be protected at the cell level, but if you need
your Solr search results to be protected by complex authorization strings
similar to Accumulo, you will need to write your own QParserPlugin and use
post filtering:
http://java.dzone.com/articles/custom-security-filtering-solr
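For reference, the core of such a post filter is roughly the following (a
sketch against the Solr 4.x API; the "authorizations" field name, the
comma-separated format, and the class name are illustrative assumptions,
and a small QParserPlugin would construct this query from the fq
parameter):

import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

import org.apache.lucene.search.IndexSearcher;
import org.apache.solr.search.DelegatingCollector;
import org.apache.solr.search.ExtendedQueryBase;
import org.apache.solr.search.PostFilter;

public class AuthzPostFilter extends ExtendedQueryBase implements PostFilter {

  private final Set<String> userAuths;

  public AuthzPostFilter(String authsCsv) {
    // The user's authorizations, e.g. "U,FOUO", parsed from the request
    this.userAuths = new HashSet<String>(Arrays.asList(authsCsv.split(",")));
  }

  @Override
  public boolean getCache() {
    return false; // post filters must not be cached
  }

  @Override
  public int getCost() {
    return Math.max(super.getCost(), 100); // cost >= 100 marks a post filter
  }

  @Override
  public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
    return new DelegatingCollector() {
      @Override
      public void collect(int doc) throws IOException {
        // Simple (slow) variant: read the stored field for each candidate
        // doc and let it through only if its token is among the user's
        // authorizations. A real implementation would use doc values.
        String docAuths = context.reader().document(doc).get("authorizations");
        if (docAuths != null && userAuths.contains(docAuths)) {
          super.collect(doc);
        }
      }
    };
  }
}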

The code you see in that article is written for an earlier version of Solr,
but it's not too difficult to adjust it for the latest (we've done so in
our project).  Once you've implemented this, you would store an
"authorizations" string field in each Solr document, and pass in the
authorizations that the user has access to in the fq parameter of every
query.  It's also not too bad to write something that parses the A

Re: solr always loading and not any response

2014-07-27 Thread IJ
I always get the "Loading" message on the Solr Admin Console if I use IE.
However - the page loads perfectly fine when I use Google Chrome or Mozilla
Firefox.
Could you check if your problem resolves itself if you use a different
browser ???




--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-always-loading-and-not-any-response-tp4148960p4149341.html
Sent from the Solr - User mailing list archive at Nabble.com.


Content-Charset header in HttpSolrServer

2014-07-27 Thread Michael Ryan
I was reviewing the httpclient code in HttpSolrServer and noticed that it sets 
a "Content-Charset" header. As far as I know this is not a real header and is 
not necessary. Anyone know a reason for this to be there? I'm guessing this was 
just a mistake when converting from httpclient3 to httpclient4.

-Michael


Re: Latest jetty

2014-07-27 Thread Shalin Shekhar Mangar
Yes, we are on Java7 so we can move now. I'll open an issue.


On Sun, Jul 27, 2014 at 5:39 AM, Bill Bell  wrote:

> Since we are now on latest Java JDK can we move to Jetty 9?
>
> Thoughts ?
>
> Bill Bell
> Sent from mobile
>
>


-- 
Regards,
Shalin Shekhar Mangar.


Re: Latest jetty

2014-07-27 Thread Shalin Shekhar Mangar
I found SOLR-4839 so we'll use that issue.

https://issues.apache.org/jira/browse/SOLR-4839


On Sun, Jul 27, 2014 at 8:06 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Yes, we are on Java7 so we can move now. I'll open an issue.
>
>
> On Sun, Jul 27, 2014 at 5:39 AM, Bill Bell  wrote:
>
>> Since we are now on latest Java JDK can we move to Jetty 9?
>>
>> Thoughts ?
>>
>> Bill Bell
>> Sent from mobile
>>
>>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



-- 
Regards,
Shalin Shekhar Mangar.


How to Get Highlighting Working in Velocity (Solr 4.8.0)

2014-07-27 Thread Olivier FOSTIER
Maybe you missed that your field "dom_title" should be declared with
indexed="true" termVectors="true" termPositions="true" termOffsets="true".


Re: how to achieve static boost in solr

2014-07-27 Thread Erick Erickson
Yep, Query Elevation is a pretty blunt instrument. You should
be able to get the configuration file to re-load by issuing a reload
command rather than re-starting.
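
For the reload, a quick SolrJ sketch (core name and URL are placeholders;
a curl against /admin/cores?action=RELOAD&core=collection1 does the same):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class ReloadCore {
  public static void main(String[] args) throws Exception {
    HttpSolrServer admin = new HttpSolrServer("http://localhost:8983/solr");
    // Re-reads the core's config files, including elevate.xml, without a restart
    CoreAdminRequest.reloadCore("collection1", admin);
    admin.shutdown();
  }
}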

But your problem of having a bunch of different queries
return the same top doc is, indeed, the problem. You need
a complete list of query terms and each one needs an entry.

The only real alternative is to be able to somehow encode the
_reason_ these docs need to be returned first, which you
haven't articulated. If it's an arbitrary reason (e.g. "sponsored
search" or some such) it's pretty hard, because there's no
rule to turn into an algorithm.

Best,
Erick


On Thu, Jul 24, 2014 at 4:30 AM, rahulmodi  wrote:

> Thanks a lot Erick,
>
> I have looked at the Query Elevation Component. It works, but the problem
> is that if I need to add a new <query> tag or update an existing <query>
> tag in the elevate.xml file, then I need to restart the server for the
> change to take effect.
>
> I have also used "forceElevation=true", but even then it requires
> restarting the server.
>
> Is there any way by which we can achieve this without restarting the
> server?
>
> Also, there is another issue: it works only when we use the exact query.
> An example is below. elevate.xml has an entry like:
>
> <query text="energy">
>   <doc id="http://welcome.energy.com/" />
> </query>
>
> If I use "energy" as the query then I get the correct url,
> "http://welcome.energy.com/".
> But if I use "power energy" as the query then I get another url, but here
> too I want the url "http://welcome.energy.com/" to be displayed.
>
> Please suggest how to achieve this.
> Thanks in advance.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/how-to-achieve-static-boost-in-solr-tp4148788p4148999.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Auto Suggest

2014-07-27 Thread Erick Erickson
No, although there's been some joy with using shingles. Autosuggest
works off of the _indexed tokens_. So the problem is really reducing
the tokenization to something that is  multi-word.
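
For example, a shingle-producing field type might look like this (a
sketch; the analyzer details are illustrative):

<fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2"
            maxShingleSize="3" outputUnigrams="true"/>
  </analyzer>
</fieldType>

Building the suggester on a field of this type gives it multi-word tokens
to suggest from.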

Best,
Erick


On Thu, Jul 24, 2014 at 5:11 AM, benjelloun  wrote:

> Hello,
>
> Does solr.SuggestComponent work on a multiValued field, to auto-suggest
> not only one word but the whole sentence?
>
>  indexed="true"/>
>
> Regards,
> Anass BENJELLOUN
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Auto-Suggest-tp4149004.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: how to achieve static boost in solr

2014-07-27 Thread Ahmet Arslan
Hi,

Can't you use the elevateIds parameter?
https://wiki.apache.org/solr/QueryElevationComponent#elevateIds.2FexcludeIds
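
For example, something like this (assuming your uniqueKey is the document
URL; the id value would need URL-encoding in practice):

http://localhost:8983/solr/collection1/elevate?q=power+energy&elevateIds=http://welcome.energy.com/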



On Thursday, July 24, 2014 2:30 PM, rahulmodi  wrote:
Thanks a lot Erick,

I have looked at the Query Elevation Component. It works, but the problem
is that if I need to add a new <query> tag or update an existing <query>
tag in the elevate.xml file, then I need to restart the server for the
change to take effect.

I have also used "forceElevation=true", but even then it requires
restarting the server.

Is there any way by which we can achieve this without restarting the
server?

Also, there is another issue: it works only when we use the exact query.
An example is below. elevate.xml has an entry like:

<query text="energy">
  <doc id="http://welcome.energy.com/" />
</query>

If I use "energy" as the query then I get the correct url,
"http://welcome.energy.com/".
But if I use "power energy" as the query then I get another url, but here
too I want the url "http://welcome.energy.com/" to be displayed.

Please suggest how to achieve this.
Thanks in advance.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-achieve-static-boost-in-solr-tp4148788p4148999.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Passivate core in Solr Cloud

2014-07-27 Thread Erick Erickson
"Does not play nice" really means it was designed to run in a
non-distributed mode. There has
been no work done to verify that it does work in cloud mode, I fully expect
some "interesting"
problems in that mode. If/when we get to it that is.

About replication: I haven't heard of any problems, but I also haven't
heard of it
working in that environment. I expect that it'll only try to replicate when
it's
loaded, so that might be interesting

Best,
Erick


On Thu, Jul 24, 2014 at 6:49 AM, Aurélien MAZOYER <
aurelien.mazo...@francelabs.com> wrote:

> Thank you Erick and Alex for your answers. The lots-of-cores stuff seems
> to meet my requirement, but it is a problem if it does not work with Solr
> Cloud. Is there an issue opened for this problem?
> If I understand well, the only solution for me is to use multiple
> standalone instances of Solr using transient cores and to distribute the
> cores for my tenants manually (I assume the LRU mechanism will be less
> effective, as it will be done per Solr instance).
> When you say "does NOT play nice with distributed mode", does that also
> include the standard replication mechanism?
>
> Thanks,
>
> Regards,
>
> Aurelien
>
>
>
> Le 23/07/2014 17:21, Erick Erickson a écrit :
>
>> Do note that the lots-of-cores stuff does NOT play nice with
>> distributed mode (yet).
>>
>> Best,
>> Erick
>>
>>
>> On Wed, Jul 23, 2014 at 6:00 AM, Alexandre Rafalovitch
>> wrote:
>>
>>> Solr has some support for a large number of cores, including transient
>>> cores: http://wiki.apache.org/solr/LotsOfCores
>>>
>>> Regards,
>>> Alex.
>>> Personal: http://www.outerthoughts.com/ and @arafalov
>>> Solr resources: http://www.solr-start.com/ and @solrstart
>>> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>>>
>>>
>>> On Wed, Jul 23, 2014 at 7:55 PM, Aurélien MAZOYER
>>>   wrote:
>>>
>>>> Hello,
>>>>
>>>> We want to set up a Solr Cloud cluster in order to handle a high volume
>>>> of documents with a multi-tenant architecture. The problem is that
>>>> application-level isolation for a tenant (using a shared index with a
>>>> "customer" field) is not enough to fit our requirements. As a result,
>>>> we need 1 collection per customer. There are more than a thousand
>>>> customers, and it seems unreasonable to create thousands of collections
>>>> in Solr Cloud... But as we know that there is less than 1
>>>> query/customer/day, we are currently looking for a way to passivate
>>>> collections when they are not in use. Can it be a good idea? If yes,
>>>> are there best practices to implement this? What side effects can we
>>>> expect? Do we need to put some application-level logic on top of the
>>>> Solr Cloud cluster to choose which collection we have to unload (and
>>>> maybe there is something smarter (and quicker?) than simply
>>>> loading/unloading the core when it is not in use)?
>>>>
>>>> Thank you for your answer(s),
>>>>
>>>> Aurelien


>


Re: To warm the whole cache of Solr other than the only autowarmcount

2014-07-27 Thread Erick Erickson
Why do you think you _need_ to autowarm the entire cache? It
is, after all, an LRU cache, the theory being that the most recent
queries are most likely to be reused.

Personally, I'd run some tests using small autowarm counts
before getting at all mixed up in some complex scheme that
may not be useful at all. Say an autowarm count of 16. Then
measure using that, then say 32, then... Ensure you have a real
problem before worrying about a solution! ;)
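
In solrconfig.xml that is just, for instance (sizes illustrative):

<filterCache class="solr.FastLRUCache"
             size="4096"
             initialSize="1024"
             autowarmCount="16"/>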

Best,
Erick


On Fri, Jul 25, 2014 at 6:45 AM, Shawn Heisey  wrote:

> On 7/24/2014 8:45 PM, YouPeng Yang wrote:
> > To Matt
> >
> >   Thank you, your opinion is very valuable, so I have checked the
> > source code for how the cache warms up. It seems to just put items from
> > the old caches into the new caches.
> >   I will pull Mark Miller into this discussion. He is one of the
> > developers of Solr whom I had contacted.
> >
> >  To Mark Miller
> >
> >    Would you please check out what we are discussing in the last two
> > posts? I need your help.
>
> Matt is completely right.  Any commit can drastically change the Lucene
> document id numbers.  It would be too expensive to determine which
> numbers haven't changed.  That means Solr must throw away all cache
> information on commit.
>
> Two of Solr's caches support autowarming.  Those caches use queries as
> keys and results as values.  Autowarming works by re-executing the top N
> queries (keys) in the old cache to obtain fresh Lucene document id
> numbers (values).  The cache code does take *keys* from the old cache
> for the new cache, but not *values*.  I'm very sure about this, as I
> wrote the current (and not terribly good) LFUCache.
>
> Thanks,
> Shawn
>
>


Re: SolrCloud extended warmup support

2014-07-27 Thread Erick Erickson
Hmmm, well _I_ don't know what to say then...

This is puzzling. How much of a latency difference are you seeing?

It'd be interesting to see what happens if you experiment with
only going to a single shard (add &distrib=false to the query). Each
cache is local to the shard, so it's vaguely possible that you're
seeing queries hit different shards and in aggregate reduce your
total latency. But I'm really shooting in the dark here.
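For the single-shard test, that's just something like
http://host:8983/solr/collection1/select?q=your+query&distrib=false
(host and collection name are placeholders).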

Best,
Erick


On Mon, Jul 21, 2014 at 5:57 PM, Erick Erickson 
wrote:

> I've never seen it necessary to run "thousands of queries"
> to warm Solr. Usually less than a dozen will work fine. My
> challenge would be for you to measure performance differences
> on queries after running, say, 12 well-chosen queries as
> opposed to hundreds/thousands. I bet that if
> 1> you search across all the relevant fields, you'll fill up the
>  low-level caches for those fields.
> 2> you facet on all the fields you intend to facet on.
> 3> you sort on all the fields you intend to sort on.
> 4> you specify some filter queries. This is fuzzy since it
>  really depends on you being able to predict what
>  those will be for firstSearcher. Things like "in the
>  last day/week/month" can be pre-configured, but
>  others you won't get. BTW, here's a blog about
>  why "in the last day" fq clauses can be tricky.
>http://searchhub.org/2012/02/23/date-math-now-and-filter-queries/
>
> that you'll pretty much nail warmup and be fine. Note that
> you can do all the faceting on a single query. Specifying
> the primary, secondary & etc. sorts will fill those caches.
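>
> As a concrete sketch of the firstSearcher side, the solrconfig.xml
> fragment is roughly (queries and field names are placeholders):
>
> <listener event="firstSearcher" class="solr.QuerySenderListener">
>   <arr name="queries">
>     <lst>
>       <str name="q">*:*</str>
>       <str name="sort">price asc</str>
>       <str name="facet">true</str>
>       <str name="facet.field">category</str>
>     </lst>
>   </arr>
> </listener>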
>
> Best,
> Erick
>
>
> On Mon, Jul 21, 2014 at 5:07 PM, Jeff Wartes 
> wrote:
>
>>
>> On 7/21/14, 4:50 PM, "Shawn Heisey"  wrote:
>>
>> >On 7/21/2014 5:37 PM, Jeff Wartes wrote:
>> >> I'd like to ensure an extended warmup is done on each SolrCloud node
>> >> prior to that node serving traffic.
>> >> I can do certain things prior to starting Solr, such as pump the index
>> >> dir through /dev/null to pre-warm the filesystem cache, and post-start
>> >> I can use the ping handler with a health check file to prevent the
>> >> node from entering the client's load balancer until I'm ready.
>> >> What I seem to be missing is control over when a node starts
>> >> participating in queries sent to the other nodes.
>> >>
>> >> I can, of course, add solrconfig.xml firstSearcher queries, which I
>> >> assume (and fervently hope!) happens before a node registers itself in
>> >> ZK clusterstate.json as ready for work, but that doesn't scale so well
>> >> if I want that initial warmup to run thousands of queries, or run them
>> >> with some parallelism. I'm storing solrconfig.xml in ZK, so I'm
>> >> sensitive to the size.
>> >>
>> >> Any ideas, or corrections to my assumptions?
>> >
>> >I think that firstSearcher/newSearcher (and making sure useColdSearcher
>> >is set to false) is going to be the only way you can do this in a way
>> >that's compatible with SolrCloud.  If you were doing manual distributed
>> >search without SolrCloud, you'd have more options available.
>> >
>> >If useColdSearcher is set to false, that should keep *everything* from
>> >using the searcher until the warmup has finished.  I cannot be certain
>> >that this is the case, but I have some reasonable confidence that this
>> >is how it works.  If you find that it doesn't behave this way, I'd call
>> >it a bug.
>> >
>> >Thanks,
>> >Shawn
>>
>>
>> Thanks for the quick reply. Since distributed search latency is the max
>> of the shard sub-requests, I'm trying my best to minimize any spikes in
>> cluster latency due to node restarts.
>> I double-checked useColdSearcher was false, but the doc says this means
>> requests "block until the first searcher is done warming", which
>> translates pretty clearly to "latency spike". The more I think about it,
>> the more worried I am that a node might indeed register itself in
>> live_nodes and get distributed requests before it's got a searcher to
>> work with. *Especially* if I have lots of serial firstSearcher queries.
>>
>> I'll look through the code myself tomorrow, but if anyone can help
>> confirm/deny the order of operations here, I'd appreciate it.
>>
>>
>


Re: Slow inserts when using Solr Cloud

2014-07-27 Thread Erick Erickson
bq: Whoa! That's awesome!

And scary.

Ian: Thanks a _lot_ for trying this out and reporting back.

Also, let me say that this was a nice writeup, I wish more people would post
as thorough a problem statement!

Best,
Erick


On Sat, Jul 26, 2014 at 5:08 AM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Whoa! That's awesome!
>
>
> On Fri, Jul 25, 2014 at 8:03 PM, ian  wrote:
>
> > I've built and installed the latest snapshot of Solr 4.10 using the same
> > SolrCloud configuration and that gave me a tenfold increase in
> throughput,
> > so it certainly looks like SOLR-6136 was the issue that was causing my
> slow
> > insert rate/high latency with shard routing and replicas.  Thanks for
> your
> > help.
> >
> >
> > Timothy Potter wrote
> > > Hi Ian,
> > >
> > > What's the CPU doing on the leader? Have you tried attaching a
> > > profiler to the leader while running and then seeing if there are any
> > > hotspots showing. Not sure if this is related but we recently fixed an
> > > issue in the area of leader forwarding to replica that used too many
> > > CPU cycles inefficiently - see SOLR-6136.
> > >
> > > Tim
> >
> >
> >
> >
> >
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/Slow-inserts-when-using-Solr-Cloud-tp4146087p4149219.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: how to extract stats component with solrj 4.9.0

2014-07-27 Thread Erick Erickson
Have you tried the getFieldStatsInfo method in the QueryResponse object?
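
A rough sketch against the SolrJ 4.x API, reusing the query from your mail
(the URL is a placeholder):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.FieldStatsInfo;
import org.apache.solr.client.solrj.response.QueryResponse;

public class StatsDemo {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server =
        new HttpSolrServer("http://localhost:8983/solr/collection1");
    SolrQuery query = new SolrQuery("categories:cat1 OR categories:cat2");
    query.add("stats", "true");
    query.add("stats.field", "count");
    query.add("stats.facet", "block_num");

    QueryResponse rsp = server.query(query);
    FieldStatsInfo countStats = rsp.getFieldStatsInfo().get("count");
    // One FieldStatsInfo per block_num value, each carrying its own sum
    for (FieldStatsInfo perBlock : countStats.getFacets().get("block_num")) {
      System.out.println(perBlock.getName() + " -> sum=" + perBlock.getSum());
    }
    server.shutdown();
  }
}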

Best,
Erick


On Sat, Jul 26, 2014 at 3:36 PM, Edith Au  wrote:

> I have a solr query like this
>
> q=categories:cat1 OR
> categories:cat2&stats=true&stats.field=count&stats.facet=block_num
>
> Basically, I want to get the sum(count) group by block num.
>
>
> This query works on a browser. But with solrj, I could not access the stats
> fields from the Response obj. I can do a response.getFieldStatsInfo(). But
> it is not what I want. Here is how I construct the query
>
> SolrQuery query = new SolrQuery(q);
> query.add("stats", "true");
> query.add("stats.field", "count");
> query.add("stats.facet", "block_num");
>
> With a debugger, I could see that the response has a private statsInfo
> object and it has the information I am looking for. But there is no api to
> access the private object.
>
> I would like to know if there is
>
>1. a better way to construct my query. I only need the sum of (count),
>group by block num
>2. a way to access the hidden statsInfo object in the query response()?
>[it is so frustrating. I can see all the info I need in the private obj
>on my debugger!]
>
> Thanks!
>
>
> ps. I posted this question on stackoverflow but have gotten no response so
> far.  Any help will be greatly appreciated!
>
> Thanks!
>


Re: /solr/admin/ping causing exceptions in log?

2014-07-27 Thread Shawn Heisey
On 7/26/2014 5:15 PM, Nathan Neulinger wrote:
> Recently deployed haproxy in front of my solr instances, and seeing a
> large number of exceptions in the logs now... Example below. I can pound
> the server with requests against /solr/admin/ping via curl, with no
> obvious issue, but the haproxy checks appear to be aggravating something.
> 
> Solr 4.8.0 w/ solr cloud, 2 nodes, 3 zk, linux x86_64
> 
> It seems like when the issue occurs, I get a set of the errors all in a
> burst (below), never just one.
> 
> Suggestions?
> 
> -- Nathan
> 
> 
> Nathan Neulinger   nn...@neulinger.org
> Neulinger Consulting   (573) 612-1412
> 
> 
> 
> 2014-07-26 23:04:36,506 ERROR qtp1532385072-4864
> [g.apache.solr.servlet.SolrDispatchFilter]  -
> null:org.eclipse.jetty.io.EofException

EofException means that the client has disconnected the TCP connection
before Solr has responded to the request.

I assume that this is the httpchk config to make sure that the server is
operational.  If so, you need to increase the "timeout check" value,
because it is too small.  The ping request is taking longer to run than
you have allowed in the timeout.  Here's part of my haproxy config:

listen  idx_nc
bind 0.0.0.0:8984
option  httpchk GET /solr/ncmain/admin/ping
balance leastconn
timeout check   4990
server  idxa1 10.100.0.240:8981 check inter 5s fastinter 2s rise 3 fall 2 weight 100
server  idxb1 10.100.0.241:8981 check inter 5s fastinter 2s rise 3 fall 2 weight 100 backup
server  idxa2 10.100.0.242:8981 check inter 15s fastinter 2s rise 2 fall 1 weight 2 backup
server  idxb2 10.100.0.243:8981 check inter 15s fastinter 2s rise 2 fall 1 weight 1 backup

If you have allowed what you think is plenty of time, then you may need
to investigate Solr's performance or the specific query that you are
using for the ping.

http://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn



Re: Latest jetty

2014-07-27 Thread Shawn Heisey
On 7/27/2014 8:37 AM, Shalin Shekhar Mangar wrote:
> I found SOLR-4839 so we'll use that issue.
> 
> https://issues.apache.org/jira/browse/SOLR-4839

I hope you have better luck than I did.  It wasn't a simple matter of
upgrading the jars and locating simple API changes, a job that I've
tackled a few times.  More extensive knowledge of jetty will be
required, knowledge that I do not have.

Thanks,
Shawn



Re: /solr/admin/ping causing exceptions in log?

2014-07-27 Thread Nathan Neulinger
Cool. That's likely exactly it: since I don't have one set, it's using the
check interval, which occasionally must just be too short.


Thank you!

-- Nathan



I assume that this is the httpchk config to make sure that the server is
operational.  If so, you need to increase the "timeout check" value,
because it is too small.  The ping request is taking longer to run than
you have allowed in the timeout.  Here's part of my haproxy config:



--

Nathan Neulinger   nn...@neulinger.org
Neulinger Consulting   (573) 612-1412


Re: /solr/admin/ping causing exceptions in log?

2014-07-27 Thread Nathan Neulinger

Unfortunately, doesn't look like this clears the symptom.

The ping is responding almost instantly every time. I've tried setting a
15-second timeout on the check, with no change in occurrences of the error.


Looking at a packet capture on the server side, there is a clear distinction between working and 
failing/error-triggering connections.


It looks like in a "working" case, I see two packets immediately back to back (one with header, and next a continuation 
with content) with no ack in between, followed by ack, rst+ack, rst.


In the failing request, I see the GET request, acked, then the http/1.1 200 Ok response from Solr, a single ack, and 
then an almost instantaneous reset sent by the client.



I'm only seeing this on traffic to/from haproxy checks. If I do a simple:

while [ true ]; do curl -s http://host:8983/solr/admin/ping; done

from the same box, that flood runs with generally 10-20ms request times and 
zero errors.

-- Nathan

On 07/27/2014 07:12 PM, Nathan Neulinger wrote:

Cool. That's likely exactly it: since I don't have one set, it's using the
check interval, which occasionally must just be too short.

Thank you!

-- Nathan



I assume that this is the httpchk config to make sure that the server is
operational.  If so, you need to increase the "timeout check" value,
because it is too small.  The ping request is taking longer to run than
you have allowed in the timeout.  Here's part of my haproxy config:





--

Nathan Neulinger   nn...@neulinger.org
Neulinger Consulting   (573) 612-1412


solr-working.cap
Description: application/vnd.tcpdump.pcap


solr-cutoff2.cap
Description: application/vnd.tcpdump.pcap


Re: /solr/admin/ping causing exceptions in log?

2014-07-27 Thread Nathan Neulinger

Either way, looks like this is not a SOLR issue, but rather haproxy.

Thanks.

-- Nathan

On 07/27/2014 08:23 PM, Nathan Neulinger wrote:

Unfortunately, doesn't look like this clears the symptom.

The ping is responding almost instantly every time. I've tried setting a
15-second timeout on the check, with no change in occurrences of the error.

Looking at a packet capture on the server side, there is a clear distinction 
between working and
failing/error-triggering connections.

It looks like in a "working" case, I see two packets immediately back to back 
(one with header, and next a continuation
with content) with no ack in between, followed by ack, rst+ack, rst.

In the failing request, I see the GET request, acked, then the http/1.1 200 Ok 
response from Solr, a single ack, and
then an almost instantaneous reset sent by the client.


I'm only seeing this on traffic to/from haproxy checks. If I do a simple:

 while [ true ]; do curl -s http://host:8983/solr/admin/ping; done

from the same box, that flood runs with generally 10-20ms request times and 
zero errors.

-- Nathan

On 07/27/2014 07:12 PM, Nathan Neulinger wrote:

Cool. That's likely exactly it: since I don't have one set, it's using the
check interval, which occasionally must just be too short.

Thank you!

-- Nathan



I assume that this is the httpchk config to make sure that the server is
operational.  If so, you need to increase the "timeout check" value,
because it is too small.  The ping request is taking longer to run than
you have allowed in the timeout.  Here's part of my haproxy config:







--

Nathan Neulinger   nn...@neulinger.org
Neulinger Consulting   (573) 612-1412


Re: /solr/admin/ping causing exceptions in log?

2014-07-27 Thread Shawn Heisey
On 7/27/2014 7:23 PM, Nathan Neulinger wrote:
> Unfortunately, doesn't look like this clears the symptom.
> 
> The ping is responding almost instantly every time. I've tried setting a
> 15-second timeout on the check, with no change in occurrences of the error.
> 
> Looking at a packet capture on the server side, there is a clear
> distinction between working and failing/error-triggering connections.
> 
> It looks like in a "working" case, I see two packets immediately back to
> back (one with header, and next a continuation with content) with no ack
> in between, followed by ack, rst+ack, rst.
> 
> In the failing request, I see the GET request, acked, then the http/1.1
> 200 Ok response from Solr, a single ack, and then an almost
> instantaneous reset sent by the client.
> 
> 
> I'm only seeing this on traffic to/from haproxy checks. If I do a simple:
> 
> while [ true ]; do curl -s http://host:8983/solr/admin/ping; done
> 
> from the same box, that flood runs with generally 10-20ms request times
> and zero errors.

I won't claim to understand what's going on here, but it might be a
matter of the haproxy options.  Here are the options I'm using in the
"defaults" section of the config:

defaults
log global
modehttp
option  httplog
option  dontlognull
option  redispatch
option  abortonclose
option  http-server-close
option  http-pretend-keepalive
retries 1
maxconn 1024
timeout connect 1s
timeout client  5s
timeout server  30s

One bit of information I came across when I first started setting
haproxy up for Solr is that servlet containers like Jetty and Tomcat
require the "http-pretend-keepalive" option to work properly.  Are you
using this option?

Thanks,
Shawn



Re: Solr Full Import frozen after indexing a fixed number of records

2014-07-27 Thread Aniket Bhoi
On Sun, Jul 27, 2014 at 12:28 PM, Gora Mohanty  wrote:

> On 27 July 2014 12:13, Aniket Bhoi  wrote:
>
> > On Fri, Jul 25, 2014 at 8:32 PM, Aniket Bhoi 
> > wrote:
> >
> > > I have Apache Solr,hosted on my apache Tomcat Server with SQLServer
> > > Backend.
> >
> > [...]
>
> > > After I run a full import, indexing proceeds successfully, but it
> > > seems to freeze every time after fetching a fixed number of records.
> > > What I mean is that after it fetches 10730 records it just freezes
> > > and doesn't process any more.
> > >
> > > Excerpt from dataimport.xml:
> > >
> > > <str name="Time Elapsed">0:15:31.959</str>
> > > <str name="Total Requests made to DataSource">0</str>
> > > <str name="Total Rows Fetched">*10730*</str>
> > > <str name="Total Documents Processed">3579</str>
> > > <str name="Total Documents Skipped">0</str>
> > > <str name="Full Dump Started">2014-07-25 10:44:39</str>
> > >
> > > This seems to happen every time.
> > >
> > > I checked the Tomcat log. Following is the excerpt from when Solr freezes:
> > >
> > > INFO:  Generating record for Unique ID :null attachment Ref:null
> > > parent ref :nullexecuted by thread:25
> >
>
> [...]
>
> Something is wrong with your DIH config file: You seem to be getting null
> for a document unique ID. Please share the file with us.
>
> Regards,
> Gora
>


Hi,

The thing is that I have 3 Solr instances deployed across Dev, QA and
Production. They have the exact same configuration files but point to
different databases. The DIH config is the same across all three
instances. It is only in QA that this issue occurs, though. Thoughts on
this?

Regards,
Aniket


Re: To warm the whole cache of Solr other than the only autowarmcount

2014-07-27 Thread YouPeng Yang
Hi Erick

We do the DIH job from the DB and commit frequently. It takes a long time
to autowarm the filterCache after a commit or soft commit happens, even
when setting autowarmCount=1024, which I do think is small enough.
So the idea came up of whether we could directly pass the references of
the old caches over to the new caches, so that the autowarm processing
would take much less time.



2014-07-28 2:30 GMT+08:00 Erick Erickson :

> Why do you think you _need_ to autowarm the entire cache? It
> is, after all, an LRU cache, the theory being that the most recent
> queries are most likely to be reused.
>
> Personally I'd run some tests on using small autowarm counts
> before getting at all mixed up in some complex scheme that
> may not be useful at all. Say an autowarm count of 16. Then
> measure using that, then say 32 then... Insure you have a real
> problem before worrying about a solution! ;)
>
> Best,
> Erick
>
>
> On Fri, Jul 25, 2014 at 6:45 AM, Shawn Heisey  wrote:
>
> > On 7/24/2014 8:45 PM, YouPeng Yang wrote:
> > > To Matt
> > >
> > >   Thank you, your opinion is very valuable, so I have checked the
> > > source code for how the cache warms up. It seems to just put items
> > > from the old caches into the new caches.
> > >   I will pull Mark Miller into this discussion. He is one of the
> > > developers of Solr whom I had contacted.
> > >
> > >  To Mark Miller
> > >
> > >    Would you please check out what we are discussing in the last two
> > > posts? I need your help.
> >
> > Matt is completely right.  Any commit can drastically change the Lucene
> > document id numbers.  It would be too expensive to determine which
> > numbers haven't changed.  That means Solr must throw away all cache
> > information on commit.
> >
> > Two of Solr's caches support autowarming.  Those caches use queries as
> > keys and results as values.  Autowarming works by re-executing the top N
> > queries (keys) in the old cache to obtain fresh Lucene document id
> > numbers (values).  The cache code does take *keys* from the old cache
> > for the new cache, but not *values*.  I'm very sure about this, as I
> > wrote the current (and not terribly good) LFUCache.
> >
> > Thanks,
> > Shawn
> >
> >
>


Re: To warm the whole cache of Solr other than the only autowarmcount

2014-07-27 Thread YouPeng Yang
Hi Shawn
  No offense to your work; I am still confused about the cache warming
process after your explanation, so I checked the warm method of
FastLRUCache, as in [1].
  As far as I can see, there is no refresh of values during the warm
process; *regenerator.regenerateItem* just puts the old value into the
new cache.

 Did I miss anything?

[1]--
  public void warm(SolrIndexSearcher searcher, SolrCache old) {
if (regenerator == null) return;
long warmingStartTime = System.nanoTime();
FastLRUCache other = (FastLRUCache) old;
// warm entries
if (isAutowarmingOn()) {
  int sz = autowarm.getWarmCount(other.size());
  Map items = other.cache.getLatestAccessedItems(sz);
  Map.Entry[] itemsArr = new Map.Entry[items.size()];
  int counter = 0;
  for (Object mapEntry : items.entrySet()) {
itemsArr[counter++] = (Map.Entry) mapEntry;
  }
  for (int i = itemsArr.length - 1; i >= 0; i--) {
try {
  boolean continueRegen = regenerator.regenerateItem(searcher,
  this, old, itemsArr[i].getKey(), itemsArr[i].getValue());
  if (!continueRegen) break;
}
catch (Exception e) {
  SolrException.log(log, "Error during auto-warming of key:" +
itemsArr[i].getKey(), e);
}
  }
}
warmupTime = TimeUnit.MILLISECONDS.convert(System.nanoTime() -
warmingStartTime, TimeUnit.NANOSECONDS);
  }


2014-07-25 21:45 GMT+08:00 Shawn Heisey :

> On 7/24/2014 8:45 PM, YouPeng Yang wrote:
> > To Matt
> >
> >   Thank you, your opinion is very valuable, so I have checked the
> > source code for how the cache warms up. It seems to just put items from
> > the old caches into the new caches.
> >   I will pull Mark Miller into this discussion. He is one of the
> > developers of Solr whom I had contacted.
> >
> >  To Mark Miller
> >
> >    Would you please check out what we are discussing in the last two
> > posts? I need your help.
>
> Matt is completely right.  Any commit can drastically change the Lucene
> document id numbers.  It would be too expensive to determine which
> numbers haven't changed.  That means Solr must throw away all cache
> information on commit.
>
> Two of Solr's caches support autowarming.  Those caches use queries as
> keys and results as values.  Autowarming works by re-executing the top N
> queries (keys) in the old cache to obtain fresh Lucene document id
> numbers (values).  The cache code does take *keys* from the old cache
> for the new cache, but not *values*.  I'm very sure about this, as I
> wrote the current (and not terribly good) LFUCache.
>
> Thanks,
> Shawn
>
>


Re: Slow inserts when using Solr Cloud

2014-07-27 Thread Shalin Shekhar Mangar
I'm benchmarking this right now so I'll share some numbers soon.


On Mon, Jul 28, 2014 at 12:45 AM, Erick Erickson 
wrote:

> bq: Whoa! That's awesome!
>
> And scary.
>
> Ian: Thanks a _lot_ for trying this out and reporting back.
>
> Also, let me say that this was a nice writeup, I wish more people would
> post
> as thorough a problem statement!
>
> Best,
> Erick
>
>
> On Sat, Jul 26, 2014 at 5:08 AM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
> > Whoa! That's awesome!
> >
> >
> > On Fri, Jul 25, 2014 at 8:03 PM, ian  wrote:
> >
> > > I've built and installed the latest snapshot of Solr 4.10 using the
> same
> > > SolrCloud configuration and that gave me a tenfold increase in
> > throughput,
> > > so it certainly looks like SOLR-6136 was the issue that was causing my
> > slow
> > > insert rate/high latency with shard routing and replicas.  Thanks for
> > your
> > > help.
> > >
> > >
> > > Timothy Potter wrote
> > > > Hi Ian,
> > > >
> > > > What's the CPU doing on the leader? Have you tried attaching a
> > > > profiler to the leader while running and then seeing if there are any
> > > > hotspots showing. Not sure if this is related but we recently fixed
> an
> > > > issue in the area of leader forwarding to replica that used too many
> > > > CPU cycles inefficiently - see SOLR-6136.
> > > >
> > > > Tim
> > >
> > >
> > >
> > >
> > >
> > > --
> > > View this message in context:
> > >
> >
> http://lucene.472066.n3.nabble.com/Slow-inserts-when-using-Solr-Cloud-tp4146087p4149219.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.
> > >
> >
> >
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
> >
>



-- 
Regards,
Shalin Shekhar Mangar.


copy EnumField to text field

2014-07-27 Thread Elran Dvir
Hi all,

I have an EnumField called severity.
These are its relevant definitions in schema.xml:
  


And in enumsConfig.xml:

<enum name="severity">
  <value>Not Available</value>
  <value>Low</value>
  <value>Medium</value>
  <value>High</value>
  <value>Critical</value>
</enum>

The default field for free text search is text.

An enum field can be sent with its integer value or with its string value;
the value will be stored and indexed as an integer and displayed as a
string.
When severity is sent as "Not Available", there will be matches for the
free-text search of "Not Available".
When severity is sent as "0" (the integer equivalent of "Not Available"),
there will be no matches for the free-text search of "Not Available".
In order to enable matching in both cases, the following change should be
made in DocumentBuilder.

Instead of:

// Perhaps trim the length of a copy field
Object val = v;

the code would be:

// Perhaps trim the length of a copy field
Object val = sfield.getType().toExternal(sfield.createField(v, 1.0f));

Am I right? It seems to work.
I think this change is suitable for all field types. What do you think?

But when no value is sent with severity, and the default of 0 is used, the fix 
doesn't seem to work.
How can I make it work also for default values?
  
Thanks.


Re: copy EnumField to text field

2014-07-27 Thread Alexandre Rafalovitch
On Mon, Jul 28, 2014 at 1:31 PM, Elran Dvir  wrote:
> But when no value is sent with severity, and the default of 0 is used, the 
> fix doesn't seem to work.

I guess the default in this case is figured out at query time, because
there is no empty value as such. So that would be too late for copyField.
If I am right, then you could probably use an UpdateRequestProcessor to
set the default value explicitly (DefaultValueUpdateProcessorFactory), as
sketched below.
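
Something along these lines in solrconfig.xml (a sketch; the chain still
has to be attached to your update handler):

<updateRequestProcessorChain name="add-defaults">
  <processor class="solr.DefaultValueUpdateProcessorFactory">
    <str name="fieldName">severity</str>
    <str name="value">Not Available</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

With that, every incoming document gets an explicit severity before the
copyField logic runs.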

Regards,
   Alex.

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853