Re: Perl Solr help - doing *:* query
On Jul 9, 2013, at 2:48 PM, Shawn Heisey wrote:

> This is primarily to Andy Lester, who wrote the WebService::Solr module
> on CPAN, but I'll take a response from anyone who knows what I can do.
>
> If I use the following Perl code, I get an error.

What error do you get? Never say "I get an error." Always say "I get this error:" and then paste the exact error.

> If I try to build some other query besides *:* to request all documents,
> the script runs, but the query doesn't do what I asked it to do.

What DOES it do?

> http://apaste.info/3j3Q

For the sake of future readers, please put your code in the message. This message will get archived, and future readers of the list will not be able to read the code at some arbitrary paste site. Shawn's code is:

    use strict;
    use WebService::Solr;
    use WebService::Solr::Query;
    use WebService::Solr::Response;

    my $url      = "http://idx.REDACTED.com:8984/solr/ncmain";
    my $solr     = WebService::Solr->new($url);
    my $query    = WebService::Solr::Query->new("*:*");
    my $response = $solr->search($query, {'rows' => '0'});
    my $numFound = $response->content->{response}->{numFound};

    print "nf: $numFound\n";

xoa

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance
Re: Email regular expression.
On Jul 30, 2013, at 9:53 AM, Luis Cappa Banda wrote:

> The syntax is the following:
>
> E-mail:
> text:/[a-z0-9_\|-]+(\.[a-z0-9_\|-]|)*@[a-z0-9-]|(\.[a-z0-9-]|)*\.([a-z]{2,4})/

Please note that the question of "How do I write a regex to match an email address?" is one of the most discussed on the Internet. Googling for "email address regular expression" will give you many, many hits discussing how to do it, and lots of hotly-contested debates. The topic is not nearly as simple as you might think at first glance.

There is no "right" way to do it. Every approach you take will involve tradeoffs. Read up on this already well-discussed topic and decide what answer is best for you in your case.
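If you just want something serviceable, one common pragmatic tradeoff (a sketch only, not a recommendation) is a deliberately loose pattern:

    text:/[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}/

It accepts some invalid addresses, and keep in mind that a regex query matches against whole indexed terms, so how your field is tokenized matters as much as the pattern itself.

xoa

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance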
Re: Schema Lint
On Aug 6, 2013, at 9:55 AM, Steven Bower wrote:

> Is there an easy way in code / command line to lint a solr config (or even
> just a solr schema)?

No, there's not. I would love there to be one, especially for the DIH.
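The closest workaround I know of (a partial check only, not a real lint) is to run the config files through xmllint, which at least catches malformed XML before Solr chokes on it:

    xmllint --noout schema.xml solrconfig.xml

That validates well-formedness only, not Solr semantics.

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance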
Re: [ANNOUNCE] Solr wiki editing change
On Mar 24, 2013, at 10:18 PM, Steve Rowe wrote:

> The wiki at http://wiki.apache.org/solr/ has come under attack by spammers
> more frequently of late, so the PMC has decided to lock it down in an attempt
> to reduce the work involved in tracking and removing spam.
>
> From now on, only people who appear on
> http://wiki.apache.org/solr/ContributorsGroup will be able to
> create/modify/delete wiki pages.
>
> Please request either on the solr-user@lucene.apache.org or on
> d...@lucene.apache.org to have your wiki username added to the
> ContributorsGroup page - this is a one-time step.

Please add my username, AndyLester, to the approved editors list. Thanks.

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance
Re: Solr indexing
On Apr 18, 2013, at 10:49 AM, hassancrowdc wrote:

> Solr is not showing the dates i have in database. any help? is solr following
> any specific timezone? On my database my date is 2013-04-18 11:29:33 but
> solr shows me "2013-04-18T15:29:33Z". Any help

Solr knows nothing of timezones. Solr expects everything is in UTC. If you want time zone support, you'll have to convert local time to UTC before importing, and then convert back to local time from UTC when you read from Solr.
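For example, in Perl (a sketch, assuming the DateTime module and an Eastern-US local zone, which matches the four-hour offset shown above):

    use strict;
    use warnings;
    use DateTime;

    # The local wall-clock time stored in the database
    my $dt = DateTime->new(
        year      => 2013,
        month     => 4,
        day       => 18,
        hour      => 11,
        minute    => 29,
        second    => 33,
        time_zone => 'America/New_York',   # assumed local zone
    );

    # Shift to UTC and print in the format Solr expects
    $dt->set_time_zone('UTC');
    print $dt->strftime('%Y-%m-%dT%H:%M:%SZ'), "\n";   # 2013-04-18T15:29:33Z

xoa

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance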
Re: Any estimation for solr 4.3?
On May 2, 2013, at 3:36 AM, "Jack Krupansky" wrote:

> RC4 of 4.3 is available now. The final release of 4.3 is likely to be within
> days.

How can I see the Changelog of what will be in it?

Thanks,
xoa

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance
Re: Any estimation for solr 4.3?
On May 2, 2013, at 9:03 AM, Yago Riveiro wrote:

> The road map has this release note, but I think that most of it will be moved
> to 4.3.1 or 4.4
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310230&version=12324128

So, is there a way I can see what is currently pending to go in 4.3?

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance
Re: Any estimation for solr 4.3?
On May 2, 2013, at 9:11 AM, Yago Riveiro wrote:

> In attachment the change log of solr 4.3 RC3

And where would I find that? I don't see anything at http://lucene.apache.org/solr/downloads.html to download. Do I need to check out the Subversion repo? Is there a page somewhere that describes how the process is set up?

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance
Re: Any estimation for solr 4.3?
On May 2, 2013, at 9:20 AM, Alexandre Rafalovitch wrote:

> Hopefully, this is not a secret, but the RCs are built and available
> for download and announced on the dev mailing list.

Thanks for the link. I don't think it's a secret, but I sure don't see anything that says "This is how the dev process works."

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance
Why do FQs make my spelling suggestions so slow?
I'm working on using spellcheck for giving suggestions, and collations are giving me good results, but they turn out to be very slow if my original query has any FQs in it. We can do 100 maxCollationTries in no time at all, but if there are FQs in the query, things get very slow. As maxCollationTries and the count of FQs increase, things get very slow very quickly.

    MaxCollationTries     1    10    20    50   100
    0 FQs                 8     9    10    11    10
    1 FQ                 11   160   599  1597  1668
    2 FQs                20   346  1163  3360  3361
    3 FQs                29   474  1852  5039  5095
    4 FQs                36   589  2463  6797  6807

All times are QTimes in ms. See that top row? With no FQs, 50 MaxCollationTries comes back instantly. Add just one FQ, though, and things go bad, and they get worse as I add more of the FQs. Also note that things seem to level off at 100 MaxCollationTries.

Here's a query that I've been using as a test:

    df=title_tracings_t&
    fl=flrid,nodeid,title_tracings_t&
    q=bagdad+AND+diaries+AND+-parent_tracings:(bagdad+AND+diaries)&
    spellcheck.q=bagdad+AND+diaries&
    rows=4&
    wt=xml&
    sort=popular_score+desc,+grouping+asc,+copyrightyear+desc,+flrid+asc&
    spellcheck=true&
    spellcheck.dictionary=direct&
    spellcheck.onlyMorePopular=false&
    spellcheck.count=15&
    spellcheck.extendedResults=false&
    spellcheck.collate=true&
    spellcheck.maxCollations=10&
    spellcheck.maxCollationTries=50&
    spellcheck.collateExtendedResults=true&
    spellcheck.alternativeTermCount=5&
    spellcheck.maxResultsForSuggest=10&
    debugQuery=off&
    fq=((grouping:"1"+OR+grouping:"2"+OR+grouping:"3")+OR+solrtype:"N")&
    fq=((item_source:"F"+OR+item_source:"B"+OR+item_source:"M")+OR+solrtype:"N")&
    fq={!tag%3Dgrouping}((grouping:"1"+OR+grouping:"2")+OR+solrtype:"N")&
    fq={!tag%3Dlanguagecode}(languagecode:"eng"+OR+solrtype:"N")&

The only thing that changes between tests is the value of spellcheck.maxCollationTries and how many FQs are at the end.

Am I doing something wrong? Do the collation internals not handle FQs correctly? The lookup/hit counts on filterCache seem to be increasing just fine. It will do N lookups, N hits, so I'm not thinking that caching is the problem.

We'd really like to be able to use the spellchecker, but the results with only 10-20 maxCollationTries aren't nearly as good as if we can bump that up to 100, and we can't afford the slow response time. We also can't do without the FQs.

Thanks,
Andy

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance
Re: Why do FQs make my spelling suggestions so slow?
Thanks for looking at this.

> What are the QTimes for the 0fq, 1fq, 2fq, 3fq & 4fq cases with spellcheck
> entirely turned off? Is it about (or a little more than) half the total when
> maxCollationTries=1?

With spellcheck off I get 8ms for the 4fq query.

> Also, with the varying # of fq's, how many collation tries does it take to
> get 10 collations?

I don't know. How can I tell?

> Possibly, a better way to test this is to set maxCollations =
> maxCollationTries. The reason is that it quits "trying" once it finds
> "maxCollations", so with 0fq's, lots of combinations can generate hits and
> it doesn't need to try very many to get to 10. But with more fq's, fewer
> collations will pan out, so now it is trying more, up to 100, before (if
> ever) it gets to 10.

It does just fine doing 100 collations so long as there are no FQs. It seems to me that the FQs are taking an inordinate amount of extra time: 100 collations take (roughly) the same amount of time as a single collation, so long as there are no FQs. Why are the FQs such a drag on the collation process?

> (I'm assuming you have all non-search components like faceting turned off.)

Yes, definitely.

> So say with 2fq's it takes 10ms for the query to complete with spellcheck
> off, and 20ms with "maxCollation = maxCollationTries = 1", then it will take
> about 110ms with "maxCollation = maxCollationTries = 10".

I can do maxCollation = maxCollationTries = 100 and it comes back in 14ms, so long as I have FQs off. Add a single FQ and it becomes 13499ms.

I can do maxCollation = maxCollationTries = 1000 and it comes back in 45ms, so long as I have FQs off. Add a single FQ and it becomes 62038ms.

> But I think you're just setting maxCollationTries too high. You're asking it
> to do too much work in trying tens of combinations.

The results I get back with 100 tries are about twice as many as I get with 10 tries. That's a big difference to the user when it's trying to figure out misspelled phrases.

Andy

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance
Re: Why do FQs make my spelling suggestions so slow?
On May 29, 2013, at 9:46 AM, "Dyer, James" wrote:

> Just a sanity check: I see I had misspelled "maxCollations" as
> "maxCollation" in my prior response. When you tested with this set the same
> as "maxCollationTries", did you correct my spelling?

Yes, definitely.

Thanks for the ticket. I am looking at the effects of setting spellcheck.onlyMorePopular to true, which reduces the number of collations it seems to do, but doesn't affect the underlying question of "is the spellchecker doing FQs properly?"

Thanks,
Andy

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance
Re: Solr Security
On Jun 24, 2013, at 12:51 AM, Aaron Greenspan wrote:

> all of them are terrible
>
> it looks like you can edit some XML files (if you can find them)
>
> The wiki itself is full of semi-useless information, which is pretty
> infuriating since it's supposed to be the best source.
>
> Statements like "standard Java web security can be added by tuning the
> container and the Solr web application configuration itself via web.xml" are
> not helpful to me.
>
> this giant mess
>
> It's just common sense.
>
> Netscape Enterprise Server prompted you to do that a decade and a half ago
>
> But either way, that's a pretty ridiculous solution.
>
> I don't know of any other server product that disregards security so
> willingly.

Why are you wasting your time with such an inferior project? Perhaps ElasticSearch is more to your liking.

xoxo,
Andy

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance
Re: stopwords in solr
On Nov 28, 2012, at 12:33 AM, Joe Zhang wrote:

> that is really strange. so basic stopwords such as "a" and "the" are not
> eliminated from the index?

There is no list of "basic stopwords" anywhere. If you want stop words, you have to put them in the file yourself. There are not really any sensible defaults for stopwords, so Solr doesn't provide them.

Just add them to stopwords.txt and reindex your core.
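For example (a sketch; your analyzer chain will differ), the filter that consumes stopwords.txt looks like this in schema.xml:

    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

And stopwords.txt itself is just one word per line:

    a
    an
    the

xoa

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance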
Re: solr searchHandler/searchComponent for query statistics
On Dec 6, 2012, at 9:50 AM, "joe.cohe...@gmail.com" wrote:

> Is there an out-of-the-box feature, or has anyone already implemented one,
> for collecting statistics on queries?

What sort of statistics are you talking about? Are you talking about collecting information in aggregate about queries over time? Or for giving statistics about individual queries, like time breakouts for benchmarking?

For the latter, you want "debugQuery=true" and you get a raft of stats down in the <lst name="debug"> section of the response.
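For example (the core name is hypothetical):

    http://localhost:8983/solr/mycore/select?q=title:dogs&debugQuery=true

The debug section then shows how the query was parsed, plus a per-component timing breakdown (query, facet, highlight, etc.).

xoa

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance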
How can I limit my Solr search to an arbitrary set of 100,000 documents?
We've got an 11,000,000-document index. Most documents have a unique ID called "flrid", plus a different ID called "solrid" that is Solr's PK. For some searches, we need to be able to limit the searches to a subset of documents defined by a list of FLRID values. The list of FLRID values can change between every search, and it will be rare enough to call it "never" that any two searches will have the same set of FLRIDs to limit on.

What we're doing right now is, roughly:

    q=title:dogs AND (flrid:(123 125 139 34823) OR flrid:(34837 ... 59091) OR ... OR flrid:(101294813 ... 103049934))

Each of those flrid:(...) parentheticals can be 1,000 FLRIDs strung together. We have to subgroup to get past Solr's limitations on the number of terms that can be ORed together.

The problem with this approach (besides that it's clunky) is that it seems to perform O(N^2) or so. With 1,000 FLRIDs, the search comes back in 50ms or so. If we have 10,000 FLRIDs, it comes back in 400-500ms. With 100,000 FLRIDs, that jumps up to about 75000ms. We want it to be on the order of 1000-2000ms at most in all cases up to 100,000 FLRIDs.

How can we do this better? Things we've tried or considered:

* Tried: Using dismax with minimum-match mm:0 to simulate an OR query. No improvement.
* Tried: Putting the FLRIDs into the fq instead of the q. No improvement.
* Considered: Dumping all the FLRIDs for a given search into another core and doing a join between it and the main core (sketched below), but if we do five or ten searches per second, it seems like Solr would die from all the commits. The set of FLRIDs is unique between searches so there is no reuse possible.
* Considered: Translating FLRIDs to SolrIDs and then limiting on SolrID instead, so that Solr doesn't have to hit the documents in order to translate FLRID->SolrID to do the matching.

What we're hoping for:

* An efficient way to pass a long set of IDs, or for Solr to be able to pull them from the app's Oracle database.
* Have Solr do big ORs as a set operation, not as (what we assume is) a naive one-at-a-time matching.
* A way to create a match vector that gets passed to the query, because strings of fqs in the query seems to be a suboptimal way to do it.

I've searched SO and the web and found people asking about this type of situation a few times, but no answers that I see beyond what we're doing now.

* http://stackoverflow.com/questions/11938342/solr-search-within-subset-defined-by-list-of-keys
* http://stackoverflow.com/questions/9183898/searching-within-a-subset-of-data-solr
* http://lucene.472066.n3.nabble.com/Filtered-search-for-subset-of-ids-td502245.html
* http://lucene.472066.n3.nabble.com/Search-within-a-subset-of-documents-td1680475.html

Thanks,
Andy
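For reference, the cross-core join we considered would be something like this (a sketch; the "listcore" core and its fields are hypothetical):

    fq={!join fromIndex=listcore from=flrid to=flrid}listid:12345

The syntax isn't the problem; the problem is that we'd have to commit a fresh set of FLRIDs into listcore for every single search.

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance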
Re: [Beginner] wants to contribute in open source project
On Mar 11, 2013, at 11:14 AM, chandresh pancholi wrote:

> I am beginner in this field. It would be great if you help me out. I love
> to code in java.
> can you guys share some link so that i can start contributing in
> solr/lucene project.

This article I wrote about getting started contributing to projects may give you some ideas:

http://blog.smartbear.com/software-quality/bid/167051/14-Ways-to-Contribute-to-Open-Source-without-Being-a-Programming-Genius-or-a-Rock-Star

I don't have tasks specifically for the Solr project (does Solr have such a list for newcomers to help on?), but I hope that you'll get some ideas.

xoa

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance
Re: How can I limit my Solr search to an arbitrary set of 100,000 documents?
On Mar 12, 2013, at 1:21 PM, Chris Hostetter wrote:

> How are these sets of flrids created/defined? (understanding the source
> of the filter information may help inspire alternative suggestions, ie:
> XY Problem)

It sounds like you're looking for patterns that could potentially provide groupings for these FLRIDs. We've been down that road, too, but we don't see how there could be one. The arbitrariness comes from the fact that the lists are maintained by users and can be changed at any time.

Each book in the database has an FLRID. Each user can create lists of books. These lists can be modified at any time. That looks like this in Oracle:

    USER 1->M LIST 1->M LISTDETAIL M<-1 TITLE

The sizes we're talking about: tens of thousands of users; hundreds of thousands of lists, with up to 100,000 items per list; tens of millions of listdetails.

We have a feature that lets the user do a keyword search on books within his list. We can't update the Solr record to keep track of which lists it appears on, because there may be, say, 20 people every second updating the contents of their lists, and those 20 people expect that their next search-within-a-list will have those new results.

Andy

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance
Re: Importing datetime
On Mar 19, 2013, at 12:04 PM, Spadez wrote:

> This is the datetime format SOLR requires as I understand it:
>
> 1995-12-31T23:59:59Z
>
> When I try to store this as a datetime field in MySQL it says it isn't
> valid. My question is, ideally I would want to keep a datetime in my
> database so I can sort by date rather than just making it a varchar, so I
> would store it like this:
>
> 1995-12-31 23:59:59
>
> Can I import a date in this format into SOLR from MySQL?

Yes. Don't change the storage type of your column in MySQL. Changing to VARCHAR would be sad.

What you'll need to do is use a date formatting function in your SELECT out of the MySQL database to get the date into the format that Solr likes. See https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_date-format
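For example, in the SELECT you feed to the DIH (a sketch; the table and column names are made up):

    SELECT id,
           DATE_FORMAT(created_at, '%Y-%m-%dT%H:%i:%sZ') AS created_at
    FROM   books

Note that the trailing Z is a literal here: this assumes the column already holds UTC times, since Solr will treat the value as UTC.

xoa

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance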
Solr makes long requests about once a minute
I'm having a problem with Solr under Tomcat unexpectedly taking a long time to respond to queries. As part of some stress testing, I wrote a bot that just does random word searches on my Solr install, and my responses typically come back in 10-50 ms. The queries are just 1-3 random words from /usr/share/dict/words, and I cap off the results at 2500 hits. The queries run just fine and I typically get responses up to 50ms for large result sets. Here's an example of my log (a few of the HITS/MS columns didn't survive the paste):

    TIME      HITS    MS  SEARCH WORDS
    12:33:20          15  hoovey Aruru kwachas
    12:33:20          85  blinis twyver
    12:33:20  2500    34  prework burlily sunshine
    12:33:20  1928    30  rendu Solly
    12:33:20              unnethe
    12:33:20              gadwell afterpeak
    12:33:20   792    14  steen
    12:33:20          47  blanchi repaving
    12:33:20   326        torbanite Storz ungag
    12:33:20          75  chemostat
    12:33:20   156        Guauaenok Adao lakist
    12:33:20          66  bechance viny
    12:33:20   206        chagigah
    12:33:22   532  2404  bonne
    12:33:22  1439        nonman Norrie
    12:33:22   246        repealers
    12:33:22              Pfosi laniard locutory
    12:33:22   516        sexipolar wordsmith enshield
    12:33:22              loggiest Aryanise koels
    12:33:22              fogyish unforcing
    12:33:22          45  Millvale chokies
    12:33:22          56  Melfa ripal Olva
    12:33:22   156        apio Heraea latimeria
    12:33:22          45  nonnitric parleying

See that one line that took 2404ms to return? I get those about once a minute, but not at a regular interval. I ran this for two hours and got 122 spikes in 120 minutes. I ran it overnight and came in to work to find that there were 1283 spikes in 1260 minutes. So that once-a-minute rate is a pattern.

As I write this, I'm in IRC with Chris Hostetter and he says:

--snip--
Probably need to tweak your garbage collector settings to something that doesn't involve "stop the world" ... the specifics of the changes largely depend on what JVM you are using, what options you already have set, etc. markrmiller wrote a good blog about this a little while back: http://searchhub.org/dev/2011/03/27/garbage-collection-bootcamp-1-0/

There's also some notes here in the LucidWorks Solr Ref Guide: http://lucidworks.lucidimagination.com/display/solr/JVM+Settings
--snip--

GC certainly sounds like a reasonable suspect. Any other suggestions? Any hints on Solr-specific GC tuning? I'm currently scouring Google.

Thanks,
xoa
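For anyone else hitting this: the usual first experiment (a sketch only; the flags and numbers are illustrative, not a tuned recommendation) is to switch Tomcat's JVM to the concurrent collector so old-generation collections don't stop the world:

    # $CATALINA_BASE/bin/setenv.sh -- illustrative values only
    CATALINA_OPTS="$CATALINA_OPTS \
        -XX:+UseConcMarkSweepGC \
        -XX:+UseParNewGC \
        -XX:CMSInitiatingOccupancyFraction=75 \
        -XX:+UseCMSInitiatingOccupancyOnly"

CMS still pauses briefly for its initial-mark and remark phases, but the multi-second full-GC pauses usually go away.

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance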
Re: Solr makes long requests about once a minute
On Aug 8, 2012, at 10:53 AM, Michael Della Bitta wrote:

> What version of Solr are you running and what Directory implementation
> are you using? How much RAM does your system have, and how much is
> available for use by Solr?

Solr 3.6.0

I don't know what "directory implementation" means. Are you asking about <directoryFactory>? All I have in my solrconfig.xml is the default entry.

The box has 16GB in it and currently has literally nothing else running on it. As to "how much is available for use by Solr", is there somewhere that I'm setting that in a config file?

Clearly, I'm entirely new to the whole JVM ecosystem. I'm coming from the world of Perl.

Thanks,
xoa
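For reference, heap size isn't set in any Solr config file; it's a JVM argument passed to Tomcat. A sketch (the sizes are examples only):

    # $CATALINA_BASE/bin/setenv.sh -- sizes are examples only
    CATALINA_OPTS="$CATALINA_OPTS -Xms2g -Xmx4g"

Whatever the JVM doesn't take, the OS can use to cache the index files, which matters at least as much as heap.

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance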
Re: Holy cow do I love 4.0's admin screen
> can you elaborate on your comment related to your polling script written in
> ruby and how the new data import status screen makes your polling app
> obsolete?

The 4.0 admin tools have a screen that gives the status in the web app, so I don't have to run the CLI tool to check the indexing status. However, the polling script will still be necessary if I need to wait for indexing to complete in, for example, a Makefile or a script.

xoxo,
Andy

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance
Re: Bitmap field in solr
On Aug 23, 2012, at 2:54 PM, Rohit Harchandani wrote:

> Hi all,
> Is there any way to have a bitmap field in Solr?
> I have a use case where I need to search specific attributes of a document.
> Rather than having an is_A, is_B, is_C (all related to each other) etc., how
> would I store all this data in a single field and still be able to query
> it? Can it be done in any way apart from storing them as strings in a text
> field?

You can have a field that is multiValued. It still needs a base type, like "string" or "int".

For instance, in my book database, I have a field called "classifications" and it is multivalued. A classification of 1 means "spiralbound", 2 means "large print", 3 means "multilingual", and so on. So if my user wants to search for a multilingual book, I search for "classifications:3". If you want spiralbound large print, you'd search for "classifications:1 classifications:2".
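In schema.xml that looks something like this (a sketch; "classifications" matches my example, and your base type may differ):

    <field name="classifications" type="int" indexed="true" stored="true" multiValued="true"/>

A single document can then carry any number of classification values, and each one is individually searchable.

xoa

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance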
Re: Solr Index problem
On Aug 23, 2012, at 4:46 PM, ranmatrix S wrote:

> The schema and fields in db-data-config.xml are one and the same.

Please attach or post both the schema and the DIH config XML files so we can see them. The DIH can be pretty tricky.

You say you can see that 9 records are returned back. How do you see that?

xoa

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance
Re: Debugging DIH
On Aug 24, 2012, at 9:17 AM, Hasan Diwan wrote:

> <dataSource ... url="jdbc:h2:tcp://192.168.1.6/finance" user="sa" />
> [rest of data-config.xml stripped by the mail archive]
>
> and I've added the appropriate fields to schema.xml:
> [field definitions stripped by the mail archive]
>
> There's nothing in my index and 343 rows in my table. What is going on? -- H

I don't see anything in the DIH config that tells which columns from the query go into which fields in the index. You need something like:

    <field column="COLUMN_NAME" name="solr_field_name"/>
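A complete minimal data-config.xml for an H2 source would look something like this (a sketch; the entity, table, and column names are made up):

    <dataConfig>
      <dataSource type="JdbcDataSource" driver="org.h2.Driver"
                  url="jdbc:h2:tcp://192.168.1.6/finance" user="sa"/>
      <document>
        <entity name="transactions"
                query="SELECT ID, AMOUNT, MEMO FROM TRANSACTIONS">
          <field column="ID"     name="id"/>
          <field column="AMOUNT" name="amount"/>
          <field column="MEMO"   name="memo"/>
        </entity>
      </document>
    </dataConfig>

Each <field> maps a column of the SELECT onto a field that must also be declared in schema.xml.

xoa

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance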
Searchers, threads and performance
We're getting close to deploying our Solr search solution, and we're doing performance testing, and we've run into some questions and concerns.

Our number one problem: Doing a commit from loading records, which can happen throughout the day, makes all queries stop for 5-7 seconds. This is a showstopper for deployment.

Here's what we've observed: Upon commit, Solr finishes processing queries in flight, starts up a new searcher, warms it, shuts down the old searcher and puts the new searcher into effect. Does the old searcher stop taking requests before the new searcher is warmed, or after? How wide is the window of time wherein Solr is not serving requests? For us, it's about five seconds, and we need to drop that dramatically.

In general, what is the difference between accepting the delay of waiting for warming vs. accepting the delay of running useColdSearcher=true? (The solrconfig.xml settings in question are sketched below.)

Is there any such thing as/any sense in running more than one searcher in our scenario? What are the benefits of multiple searchers? Erick Erickson posted in 2012: "Unless you have warming happening, there should only be a single searcher open at any given time." Except: "If your queries run across several commits you'll get multiple searchers open." Not sure if this is a general observation, or specific to the particular poster's situation.

Finally, what do people mean when they blog that they have Solr set up for "n threads"? Is that the same thing as saying that Solr can be processing n requests simultaneously?

Thanks for any insight, or even links to relevant pages. We've been Googling all over and haven't found answers to the above.

Thanks,
Andy
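For reference, the warming knobs referred to above live in the <query> section of solrconfig.xml. A sketch with illustrative values (the cache sizes and the warming query are made up):

    <query>
      <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>
      <listener event="newSearcher" class="solr.QuerySenderListener">
        <arr name="queries">
          <lst><str name="q">title:dogs</str></lst>
        </arr>
      </listener>
      <useColdSearcher>false</useColdSearcher>
      <maxWarmingSearchers>2</maxWarmingSearchers>
    </query>

The tradeoff is exactly here: autowarmCount and the newSearcher queries control how long warming takes, and useColdSearcher controls whether requests wait for it.

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance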
Re: DataImportHandler in Solr 1.4 bug?
On Nov 15, 2012, at 8:02 AM, Sébastien Lorber wrote:

> [config example stripped by the mail archive]

I don't know where you're getting the ${JOB_EXEC.JOB_INSTANCE_ID}. I believe that if you want to get parameters passed in, it looks like this:

    WHERE batchid = ${dataimporter.request.batchid}

when I kick off the DIH like this:

    $url/dih?command=full-import&entity=titles&commit=true&batchid=47

At least that's how it works for me in 3.6 and 4.0.

xoa

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance
How do I best detect when my DIH load is done?
A little while back, I needed a way to tell if my DIH load was done, so I made up a little Ruby program to query /dih?command=status. The program is here:

http://petdance.com/2012/07/a-little-ruby-program-to-monitor-solr-dih-imports/

Is this the best way to do it? Is there some other tool or interface that I should be using instead?
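The guts of that program boil down to polling the status URL until the importer stops reporting "busy". In shell (a sketch; the /dih handler path matches my setup, yours may be /dataimport):

    # Poll the DIH until the import is no longer busy
    while curl -s 'http://localhost:8983/solr/mycore/dih?command=status' | grep -q '>busy<'; do
        sleep 5
    done

Thanks,
xoa

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance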
Cacti monitoring of Solr and Tomcat
Is anyone using Cacti to track trends over time in Solr and Tomcat metrics? We have Nagios set up for alerts, but we want to track trends over time. I've found a couple of examples online, but none have worked completely for me. I'm looking at this one next:

http://forums.cacti.net/viewtopic.php?f=12&t=19744&start=15

It looks promising, although it doesn't monitor Solr itself.

Suggestions?

Thanks,
Andy

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance
Re: Cacti monitoring of Solr and Tomcat
On Nov 19, 2012, at 1:46 PM, Otis Gospodnetic wrote:

> My favourite topic ;) See my sig below for SPM for Solr. At my last
> company we used Cacti but it felt very 1990s almost. Some ppl use zabbix,
> some graphite, some newrelic, some SPM, some nothing!

SPM looks mighty tasty, but we must have it in-house on our own servers, for monitoring internal dev systems, and we'd like it to be open source. We already have Cacti up and running, but it's possible we could use something else.

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance