Keyword match distance rule issue

2015-09-28 Thread anil.vadhavane
Hello,

I'm using Lucene/Solr 4.10.4 for keyword match functionality and I found some
issues with the distance rule.
I have added a search keyword with edit distance 2: "Bridgewater~2".
When I search, it does not return "bridwater" in the results, although it should.

If I move the 'ge' to any other place it works, e.g. "Bridwgeater~2".

Has anyone faced similar issues, and are there possible solutions?

Thanks.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Keyword-match-distance-rule-issue-tp4231624.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: More Like This on numeric fields - BF accepted by MLT handler

2015-09-28 Thread Alessandro Benedetti
Hi Upaya,
thanks for the explanation. I actually already did some investigation
about it (my starting point was
http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/) and
then I took a look at the code.

I was just wondering what the community is thinking about
including/providing numerical similarity (approaches, ideas, possibly
existing solutions).
Customisation should be the last step if something is already available.

Thanks for the support anyway !

Cheers

2015-09-25 12:47 GMT+01:00 Upayavira :

> Alessandro,
>
> I'd suggest you review the code of the MoreLikeThisHandler. It is a
> little knotty, but it would be worth your while understanding what is
> going on there.
>
> Basically, there are three phases:
>
> phase #1: parse the source document into a list of terms (avoided if
> term vectors enabled and source doc is in index)
> phase #2: calculate a score for each of these terms and select the n
> highest scoring ones (default 25)
> phase #3: build and execute a boolean query using these 25 terms
>
> Phase #2 uses a TF/IDF like approach to calculate the scores for those
> "interesting terms".
>
> Once you understand what MLT is doing, you will probably not find it so
> hard to create your own version which is better suited to your own
> use-case.
>
> Of course, this would probably be better constructed as a QueryParser
> rather than a request handler, but that's a detail.
>
> Upayavira
>
> On Fri, Sep 25, 2015, at 11:08 AM, Alessandro Benedetti wrote:
> > Hi guys,
> > was just investigating a little bit in how to include numeric fields in
> > the
> > MLT calculations.
> >
> > As we know, we are currently building a smart lucene query based on the
> > document in input ( the one to search for similar ones) and run this
> > query
> > to obtain the similar docs.
> > Because the MLT is currently built on TF/IDF , it is mainly thought for
> > textual fields.
> > What about we want to include a numeric factor  in the similarity
> > calculus ?
> >
> > e.g.
> > Solr Document ( Hotel)
> > mlt.fl=description,stars,trip_advisor_rating
> >
> > To find the similarity based not only on the description, but also on the
> > numeric fields ( stars and rating) .
> >
> > The first thought I had , is to add a support for boosting functions.
> > In this way we are more flexible and we can add how many functions we
> > want.
> >
> > For example adding :
> > bf=div(1,dist(2,seedDocumentRatingA,seedDocumentRatingB,ratingA,ratingB))
> >
> > Also other kind of functions can be applied.
> > What do you think ? Do you have any alternative ideas ?
> >
> > Cheers
> > --
> > --
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
>



-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: query parsing

2015-09-28 Thread Alessandro Benedetti
Happy to read that! Regarding the spellcheck, that is a different thing, so let
us know if you need further details.

Cheers

2015-09-27 18:59 GMT+01:00 Mark Fenbers :

> I am delighted to announce that I have it all working again!  Well, not
> all, just the searching!
>
> I deleted my core and created a new one from the command-line (solr
> create_core -c EventLog2) using the basic_configs option. Then I had to add
> my columns to the schema.xml and the dataimport handler to solrconfig.xml
> and tweak a couple of other details. But to make a long story short,
> parsing is working and I can search on terms without wrapping asterisks!!
> Yay!  Thanks for the help!
>
> Spell-checking still isn't working, though, and I'm apprehensive about
> working with it today.  But I will eventually.  The complaint is it can't
> find ELspell, which I had defined in the old setup that I blew away, so
> I'll have to redefine it at some point!  For now, I'm just gonna delight in
> having searching working again!
>
> Mark
>
>
> On 9/26/2015 11:05 PM, Erick Erickson wrote:
>
>> No need to re-install Solr, just create a new core, this time it'd
>> probably be
>> easiest to use the bin/solr create_core command. In the Solr
>> directory just type bin/solr create_core -help to see the options.
>>
>> We're pretty much trying to migrate to using bin/solr for all the
>> maintenance
>> we can, but as always the documentation lags the code.
>>
>> Yeah, things are a bit ragged. The admin UI/core UI is really a legacy
>> bit of code that has _always_ been confusing, I'm hoping we can pretty
>> much remove it at some point since it's as trappy as it is.
>>
>> Best,
>> Erick
>>
>> On Sat, Sep 26, 2015 at 12:49 PM, Mark Fenbers 
>> wrote:
>>
>>> OK, a lot of dialog while I was gone for two days!  I read the whole
>>> thread,
>>> but I'm a newbie to Solr, so some of the dialog was Greek to me.  I
>>> understand the words, of course, but applying it so I know exactly what
>>> to
>>> do without screwing something else up is the problem.  After all, that is
>>> how I got into the mess in the first place.  I'm glad I have good help to
>>> untangle the knots I've made!
>>>
>>> I'd like to start over (option 1 below), but does this mean delete all my
>>> config and reinstalling Solr??  Maybe that is not a bad idea, but I will
>>> at
>>> least save off my data-config.xml as that is clearly the one thing that
>>> is
>>> probably working right.  However, I did do quite a bit of editing that I
>>> would have to do again. Please advise...
>>>
>>> To be fair, I must answer Erick's question of how I created the data
>>> index
>>> in the first place, because this might be relevant...
>>>
>>> The bulk of the data is read from 9000+ text files, where each file was
>>> manually typed.  Before inserting into the database, I do a little bit of
>>> processing of the text using "sed" to delete the top few and bottom few
>>> lines, and to substitute each single-quote character with a pair of
>>> single-quotes (so PostgreSQL doesn't choke).  Line-feed characters are
>>> preserved as ASCII 10 (hex 0A), but there shouldn't be (and I am not
>>> aware
>>> of) any characters aside from what is on the keyboard.
>>>
>>> Next, I insert it with this command:
>>> psql -U awips -d OHRFC -c "INSERT INTO EventLogText VALUES('$postDate',
>>> '$user', '$postDate', '$entryText', '$postCatVal');"
>>>
>>> In case you are wondering about my table, it is defined in this way:
>>> CREATE TABLE eventlogtext (
>>>posttime timestamp without time zone NOT NULL, -- Timestamp of this
>>> entry's original posting
>>>username character varying(8), -- username (logname) of the original
>>> poster
>>>lastmodtime timestamp without time zone, -- Last time record was
>>> altered
>>>logtext text, -- text of the log entry
>>>category integer, -- bit-wise category value
>>>CONSTRAINT eventlogtext_pkey PRIMARY KEY (posttime)
>>> )
>>>
>>> To do the indexing, I merely use /dataimport?full-import, but it knows
>>> what
>>> to do from my data-config.xml; which is here:
>>>
>>> 
>>>  >> url="jdbc:postgresql://dx1f/OHRFC" user="awips" />
>>>  
>>>  >>  deltaQuery="SELECT posttime AS id FROM eventlogtext
>>> WHERE
>>> lastmodtime > '${dataimporter.last_index_time}';">
>>>  
>>>  
>>>  
>>>  
>>> 
>>>
>>> Hope this helps!
>>>
>>> Thanks,
>>> Mark
>>>
>>> On 9/24/2015 10:57 AM, Erick Erickson wrote:
>>>
 Geraint:

 Good Catch! I totally missed that. So all of our focus on schema.xml has
 been... totally irrelevant. Now that you pointed that out, there's also
 the
 addition: add-unknown-fields-to-the-schema, which indicates you started
 this up in "schemaless" mode.

 In short, solr is trying to guess what your field types should be and
 guessing wrong (again and again and again). This is the classic weakness
 of
 schemaless. It's great for indexing stuff fast, but if it guesses wrong

RE: New Project setup too clunky

2015-09-28 Thread Duck Geraint (ext) GBJH
Huh, strange - I didn't even notice that you could create cores through the UI. 
I suppose it depends on what order you read the documentation in and what you infer from it.

See "Create a Core":
https://cwiki.apache.org/confluence/display/solr/Running+Solr

I followed the "solr create -help" option to work out how to create a 
non-datadriven core (i.e. solr create_core), but I suppose this could be a 
little more explicit.
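
For reference, the kind of invocation that ends up working (the core name and
configset below are just examples, not anything from this thread) is along the
lines of:

  bin/solr create_core -c mycore -d basic_configs

where -d points at one of the bundled configsets instead of the default
data-driven one.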

Geraint


Geraint Duck
Data Scientist
Toxicology and Health Sciences
Syngenta UK
Email: geraint.d...@syngenta.com

-Original Message-
From: Mark Fenbers [mailto:mark.fenb...@noaa.gov]
Sent: 27 September 2015 19:07
To: solr-user@lucene.apache.org
Subject: Re: New Project setup too clunky

On 9/27/2015 12:49 PM, Alexandre Rafalovitch wrote:
> Mark,
>
> Thank you for your valuable feedback. The newbie's views are always 
> appreciated.
>
> Admin Admin UI command is designed for creating a collection based on
> the configuration you already have. Obviously, it makes that point
> somewhat less than obvious.
>
> To create a new collection with configuration files all in place, you
> can bootstrap it from a configset. Which is basically what you did
> when you run "solr -e", except "-e" also populates the files and does
> other tricks.
>
> So, if you go back to the command line and run "solr" you will see a
> bunch of options. The one you are looking for is "solr create_core"
> which will tell you all the parameters as well as the available
> configurations to bootstrap from.
>
> I hope this helps.

Yes!  It does help!  But it took a post and a response on the user-forum for me 
to learn this!  Rather, it should be added to the "Solr Quick Start" document.
Mark







What kind of nutch documents does Solr index?

2015-09-28 Thread Daniel Holmes
Hi,
I am using Apache Nutch 1.7 to crawl and Apache Solr 4.7.2 for indexing. In
my tests there is a gap between the number of results Nutch fetches and the
number of documents indexed in Solr. For example, one of the crawls fetched
23343 pages and 1146 images successfully, while in Solr 19250 docs are
indexed and 500 of them are image URLs.

My question is: what kind of pages are indexed in Solr, and why?
Does Solr index pages with other statuses or not?
What kind of images does Solr index?

Thanks.


Re: faceting is unusable slow since upgrade to 5.3.0

2015-09-28 Thread Toke Eskildsen
On Sun, 2015-09-27 at 14:47 +0200, Uwe Reh wrote:
> Like Walter Underwood wrote, in technical sense faceting on authors 
> isn't a good idea.

In a technical sense, there is no good or bad about faceting on
high-cardinality fields in Solr. The faceting code is fairly efficient
(modulo the newly discovered regression) and scales well with the number
of references and unique terms. It gives the expected performance when
used with high-cardinality fields: Relatively heavy and with substantial
worst-case processing time.

As such, it should be enabled with care and a clear understanding of the
cost. But the same can be said of a great many other features when
building an IT system. Labelling it a good or bad idea only makes sense
when looking at the specific context.

I am being a stickler about this because high-cardinality faceting in
Solr has an undeserved bad rep. Rather than discouraging it, we should
be better at describing the consequences of using it.

> In the worst case, the relation book to author is 
> n:n. Never the less, thanks to authority files (which are intensively 
> used in Germany) the facet 'author' is often helpful.

We have been faceting on Author (10M uniques) since 2007. It helps our
users navigate the corpus. It is a good idea for us.

We tried faceting on 6 billion uniques/machine as default in our Net
Archive (custom hack). It raised our non-pathological 75th percentile to
2½ seconds, with little value for the researchers. It was a bad idea for
us.

- Toke Eskildsen, State and University Library, Denmark




Re: Keyword match distance rule issue

2015-09-28 Thread Alessandro Benedetti
Maybe it's a silly observation...
But are you lowercasing at indexing/querying time?
Can you show us the schema analysis config for the field type you use?
Because, strictly speaking about Levenshtein distance, bridwater is 3 edits
from Bridgewater (the case change counts as an edit), so a distance of 2
cannot match unless both sides are lowercased.
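
For illustration, a field type that lowercases at both index and query time
(the type name and tokenizer here are assumptions, not your actual schema)
would look something like this in schema.xml:

  <fieldType name="text_lower" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

With that in place both sides should be compared in lower case, and
bridgewater vs. bridwater is back within an edit distance of 2.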

Cheers

2015-09-28 8:26 GMT+01:00 anil.vadhavane :

> Hello,
>
> I'm using Lucene/Solr 4.10.4 for keyword match functionality and I found some
> issues with the distance rule.
> I have added a search keyword with edit distance 2: "Bridgewater~2".
> When I search, it does not return "bridwater" in the results, although it
> should.
>
> If I move the 'ge' to any other place it works, e.g. "Bridwgeater~2".
>
> Has anyone faced similar issues, and are there possible solutions?
>
> Thanks.
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Keyword-match-distance-rule-issue-tp4231624.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: position of the search term

2015-09-28 Thread Alessandro Benedetti
So, based on my knowledge, it is not possible (except if you customise the
component).
Read here :
http://lucene.472066.n3.nabble.com/How-do-I-recover-the-position-and-offset-a-highlight-for-solr-4-1-4-2-td4051763.html

Another data structure you may find useful is the term vector: store term
vectors for your docs and use them.
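
Enabling that is a per-field setting in schema.xml; a sketch (the field name
is just an example, not taken from your schema):

  <field name="logtext" type="text_general" indexed="true" stored="true"
         termVectors="true" termPositions="true" termOffsets="true"/>

The stored term vectors can then be read back, for instance via the
TermVectorComponent, to get positions and offsets for matched terms.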

Cheers

2015-09-27 19:58 GMT+01:00 Mark Fenbers :

> For the brief period that I had spell-checking working, I noticed that the
> results record had the start/end position within the text of the misspelled
> word.  Is there anyway to get the same start/end position when doing a
> search?  I want to be able to highlight the search term in the text.
> Default config puts  tags around the search, but I'm not using an HTML
> renderer and I don't want characters of any sort inserted into the text
> returned in the result set. rather, I just want the start/end position.
> How do I configure that?
>
> Mark
>



-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: firstSearcher cache warming with own QuerySenderListener

2015-09-28 Thread Christian Reuschling
Erick, Walter and all,

As I wrote, I am aware of the firstSearcher event; we tried it manually before
we chose to enhance the QuerySenderListener.

I think our usage scenario (which I didn't describe, for simplicity) is a bit
different from yours, which makes this necessary. We are implementing our own
Solr searchHandler module that fires several queries on its own, and our
customer implements against this searchHandler, giving some 'seed queries',
which lead to further queries (dozens to hundreds).

I assume there will be a set of typical or common queries, but given the more
heterogeneous nature of the final queries that arrive at the cores, that set
is bigger and hard to determine manually.

The server is restarted so often because the searchHandler is still under
development. And because one customer query results in so many queries being
executed, non-warmed caches make a big difference.

Let me formulate my question differently; it is the same thing:

I have inserted a query for warming inside the firstSearcher event in
solrconfig.xml. If I then call it from the browser once, the response time is
still much larger than for succeeding invocations, which gives the impression
that some caches are not filled. Here is my query (with an mlt query parser):

http://localhost:8014/solr/etrCollection/select?q=+%28+%28dynaqCategory:brandwatch%29%29%20_query_:%27{!mlt%20qf=%22body%22%20v=%22http://www.usatoday.com/story/news/nation/2013/02/14/drought-farmers-midwest/1920577/%22}&rows=10&fl=dataEntityId,title,creator,score&wt=json
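
For comparison, the plain way to feed such a warming query to the firstSearcher
event is a listener entry in solrconfig.xml; a minimal sketch (the query and
parameters below are illustrative, not the exact warming set):

  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">dynaqCategory:brandwatch</str>
        <str name="rows">10</str>
        <str name="fl">dataEntityId,title,creator,score</str>
      </lst>
    </arr>
  </listener>

Each <lst> is one warming request, so facets, sorts and filter queries used in
production can be added there as further parameters.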



thanks again,

Christian



Walter wrote:

> Right.
>
> I chose the twenty most frequent terms from our documents and use those for 
> cache warming.
> The list of most frequent terms is pretty stable in most collections.


Erick wrote:

> That's what the firstSearcher event in solrconfig.xml is for, exactly the
> case of autowarming Solr when it's just been started. The queries you put
> in that event are fired only when the server starts.
>
> So I'd just put my queries there. And you do not have to put a zillion
> queries here. Start with one that mentions all the facets you intend to
> use, sorts by all the various sort fields you use, perhaps (if you have any
> _very_ common filter queries) put those in too.
>
> Then analyze the queries that are still slow when issued the first time
> after startup and add what you suspect are the relevant bits to the
> firstSearcher query (or queries).
>
> I suggest that this is a much easier thing to do, and focus efforts on why
> you are shutting down your Solr servers often enough that anyone notices..
>
> Best,
> Erick




On 25.09.2015 17:31, Christian Reuschling wrote:
> Hey all,
> 
> we want to avoid cold start performance issues when the caches are cleared 
> after a server restart.
> 
> For this, we have written a SearchComponent that saves least recently used 
> queries. These are
> written to a file inside a closeHook of a SolrCoreAware at server shutdown.
> 
> The plan is to perform these queries at server startup to warm up the caches. 
> For this, we have
> written a derivative of the QuerySenderListener and configured it as 
> firstSearcher listener in
> solrconfig.xml. The only difference to the origin QuerySenderListener is that 
> it gets it's queries
> from the formerly dumped lru queries rather than getting them from the config 
> file.
> 
> It seems that everything is called correctly, and we have the impression that 
> the query response
> times for the dumped queries are sometimes slightly better than without this 
> warming.
> 
> Nevertheless, there is still a huge difference against the times when we 
> manually perform the same
> queries once, e.g. from a browser. If we do this, the second time we perform 
> these queries they
> respond much faster (up to 10 times) than the response times after the 
> implemented warming.
> 
> It seems that not all caches are warmed up during our warming. And because of 
> these huge
> differences, I doubt we missed something.
> 
> The index has about 25M documents, and is splitted into two shards in a cloud 
> configuration, both
> shards are on the same server instance for now, for testing purposes.
> 
> Does anybody have an idea? I tried to disable lazy field loading as a 
> potential issue, but with no
> success.
> 
> 
> Cheers,
> 
> Christian
> 


PathHierarchyTokenizerFactory and facet_count

2015-09-28 Thread Moen Endre
How does facet_count work with a facet field whose type uses
solr.PathHierarchyTokenizerFactory?

I have multiple records that contain the field Parameter, which is of a
PathHierarchyTokenizerFactory-based type.
E.g
"Parameter": [
  "EARTH SCIENCE>OCEANS>OCEAN TEMPERATURE>WATER TEMPERATURE",
  "EARTH SCIENCE>OCEANS>OCEAN PRESSURE>WATER PRESSURE",
  "EARTH SCIENCE>OCEANS>OCEAN ACOUSTICS>ACOUSTIC VELOCITY",
  "EARTH SCIENCE>ACOUSTIC",
  "EARTH SCIENCE>VELOCITY",
  "EARTH SCIENCE>ACOBAR | ACOUSTIC TECHNOLOGY FOR OBSERVING THE 
INTERIOR OF THE ARCTIC OCEAN",
  "EARTH SCIENCE>GEOGRAPHIC REGION>POLAR",
  "EARTH SCIENCE>GEOGRAPHIC REGION>ARCTIC"
],

But when I run a query to get all facet counts for Parameter - with this query:
http://localhost:8983/solr/nmdc/query? 
q=*:*&facet=true&rows=0&facet.mincount=1&facet.field=Parameter

the last two entries from this record,
"EARTH SCIENCE>GEOGRAPHIC REGION>POLAR",
"EARTH SCIENCE>GEOGRAPHIC REGION>ARCTIC"

are missing from the facet_counts, which look like:

  "facet_counts":{

"facet_queries":{},

"facet_fields":{

  "Parameter":[

"EARTH SCIENCE",228,

"EARTH SCIENCE>OCEANS",128,

"EARTH SCIENCE>OCEANS>OCEAN TEMPERATURE",100,

"EARTH SCIENCE>OCEANS>SALINITY/DENSITY",90,
...

Im running solr 5.0

Why does the query seem to omit some of the Parameter entries from records?
Path is configured with:
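
(The XML did not survive the list archive. A typical definition consistent with
the fragments quoted later in this thread, with the field type name and the
query-time tokenizer as assumptions, would be:)

  <fieldType name="descendent_path" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter=">" />
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory" />
    </analyzer>
  </fieldType>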









Cheers
Endre



Re: What kind of nutch documents does Solr index?

2015-09-28 Thread Upayavira
I suspect you may be better off asking this on the Nutch user list. The
decisions you are describing will be within the Nutch codebase, not
Solr. Someone here may know (hopefully) but you may get more support
over on the Nutch list.

One suggestion - start with a clean, empty index. Run a crawl. Look at
the maxDocs vs numDocs (visible via the admin UI for your
core/collection). If maxDocs>numDocs, it means that some docs have been
overwritten - i.e. the ID field that Nutch is using is not unique.
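
If it helps, those two counters are also exposed by the CoreAdmin STATUS call
(the core name here is a placeholder):

  http://localhost:8983/solr/admin/cores?action=STATUS&core=yourcore

and the index section of the response reports numDocs and maxDoc for the core.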

Upayavira

On Mon, Sep 28, 2015, at 10:19 AM, Daniel Holmes wrote:
> Hi,
> I am using apache Nutch 1.7 to crawl and apache Solr 4.7.2 for indexing.
> In
> my tests there is a gap between number of fetched results of Nutch and
> number of indexed documents in Solr. For example one of the crawls is
> fetched 23343 pages and 1146 images successfully while in the Solr 19250
> docs is indexed and 500 of them is image urls.
> 
> My question is that what kind of pages are indexed is solr and why?
> Does Solr index pages whit other status or not?
> what kind of images does Solr index?
> 
> Thanks.


[ANNOUNCE] Luke 5.3.0 released

2015-09-28 Thread Dmitry Kan
This is a major release supporting lucene / solr 5.3.0. Download the zip
here:
https://github.com/DmitryKey/luke/releases/tag/luke-5.3.0

This release runs on Java8 and does not run on Java7.

The release includes a number of pull requests and github issues. Worth
mentioning:

https://github.com/DmitryKey/luke/pull/38 upgrade to 5.3.0 itself

https://github.com/DmitryKey/luke/pull/28 Added LUKE_PATH env variable to
luke.sh
https://github.com/DmitryKey/luke/pull/35 Added copy, cut, paste etc.
shortcuts, using Mac command key
https://github.com/DmitryKey/luke/pull/34 Fixed lastAnalyzer retrieval
(this feature remembers the last used analyzer on the Search tab)
https://github.com/DmitryKey/luke/issues/31 200 stargazers on GitHub (by
the time of this release the number had crossed 250). Great to see the Luke
community growing.

If any of you still require Java 7 compatibility for Luke 5.3.0,
please file an issue on Luke's GitHub: https://github.com/DmitryKey/luke


Luke Team

-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


Re: PathHierarchyTokenizerFactory and facet_count

2015-09-28 Thread Upayavira
There is also facet.limit, which says how many facet entries to return.
Is that catching you out?

The document either matches your query, or doesn't. If it does, then all
values of the Parameter field should be included in your faceting. But
perhaps not all facet buckets are being returned to you - hence try a larger
facet.limit (or -1 for unlimited).
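
Concretely, re-running the earlier query with an explicit, unlimited facet
limit (illustrative, reusing the collection name from the original post):

  http://localhost:8983/solr/nmdc/query?q=*:*&rows=0&facet=true&facet.mincount=1&facet.field=Parameter&facet.limit=-1

should list every Parameter bucket, including the GEOGRAPHIC REGION entries.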

Upayavira

On Mon, Sep 28, 2015, at 11:47 AM, Moen Endre wrote:
> How does facet_count work with a facet field that is defined as solr.
> PathHierarchyTokenizerFactory?
> 
> I have multiple records that contains field Parameter which is of type
> PathHierarchyTokenizerFactory.
> E.g
> "Parameter": [
>   "EARTH SCIENCE>OCEANS>OCEAN TEMPERATURE>WATER TEMPERATURE",
>   "EARTH SCIENCE>OCEANS>OCEAN PRESSURE>WATER PRESSURE",
>   "EARTH SCIENCE>OCEANS>OCEAN ACOUSTICS>ACOUSTIC VELOCITY",
>   "EARTH SCIENCE>ACOUSTIC",
>   "EARTH SCIENCE>VELOCITY",
>   "EARTH SCIENCE>ACOBAR | ACOUSTIC TECHNOLOGY FOR OBSERVING THE
>   INTERIOR OF THE ARCTIC OCEAN",
>   "EARTH SCIENCE>GEOGRAPHIC REGION>POLAR",
>   "EARTH SCIENCE>GEOGRAPHIC REGION>ARCTIC"
> ],
> 
> But when I run a query to get all facet counts for Parameter - with this
> query:
> http://localhost:8983/solr/nmdc/query?
> q=*:*&facet=true&rows=0&facet.mincount=1&facet.field=Parameter
> 
> the two last entries from this record;
> "EARTH SCIENCE>GEOGRAPHIC REGION>POLAR",
> "EARTH SCIENCE>GEOGRAPHIC REGION>ARCTIC"
> 
> is missing from the facet_count - which looks like:
> 
>   "facet_counts":{
> 
> "facet_queries":{},
> 
> "facet_fields":{
> 
>   "Parameter":[
> 
> "EARTH SCIENCE",228,
> 
> "EARTH SCIENCE>OCEANS",128,
> 
> "EARTH SCIENCE>OCEANS>OCEAN TEMPERATURE",100,
> 
> "EARTH SCIENCE>OCEANS>SALINITY/DENSITY",90,
> ...
> 
> Im running solr 5.0
> 
> Why does the query seem to omit some of the Parameter entries from
> records?
> Path is configured with:
> 
> 
>  class="solr.PathHierarchyTokenizerFactory"
> delimiter=">" />
> 
> 
>  />
> 
> 
> 
> Cheers
> Endre
> 


Cost of having multiple search handlers?

2015-09-28 Thread Oliver Schrenk
Hi,

I want to register multiple but identical search handlers to have multiple 
buckets for measuring performance for our different APIs and consumers (and to 
find out who is actually using Solr).

Are there costs associated with having multiple search handlers? Are they 
negligible?

Cheers,
Oliver

RE: String index out of range exception from Spell check

2015-09-28 Thread Dyer, James
This looks similar to SOLR-4489, which is marked fixed for version 4.5.  If 
you're using an older version, the fix is to upgrade.  

Also see SOLR-3608, which is similar but here it seems as if the user's query 
is more than spellcheck was designed to handle.  This should still be looked at 
and possibly we can come up with a way to handle these cases.

A way to work around these bugs is to strip your query down to raw terms, 
separated by spaces, and use "spellcheck.q" with the raw terms only.
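
As an illustration of that workaround (the field names and terms here are made
up, not taken from the original report), the request would look something like:

  /select?q=title:(grene apple) AND category:fruit&spellcheck=true
     &spellcheck.q=grene apple&spellcheck.collate=true

so the collator only ever sees the plain terms in spellcheck.q, not the full
query syntax.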

James Dyer
Ingram Content Group


-Original Message-
From: davidphilip cherian [mailto:davidphilipcher...@gmail.com] 
Sent: Sunday, September 27, 2015 3:50 PM
To: solr-user@lucene.apache.org
Subject: String index out of range exception from Spell check

There are irregular exceptions from the spellcheck component. Below is the
stack trace. This is not common to all q terms, but I have often seen the
exceptions occurring for specific queries after enabling the
spellcheck.collate method.



String index out of range: -3



java.lang.StringIndexOutOfBoundsException: String index out of range: -3 at
java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:789) at
java.lang.StringBuilder.replace(StringBuilder.java:266) at
org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:235)
at
org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:92)
at
org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:230)
at
org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:197)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:226)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1976) at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:497) at
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310) at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:722)



500


RE: PathHierarchyTokenizerFactory and facet_count

2015-09-28 Thread Moen Endre
Yes, that solved my problem. There must be an implicit facet.limit set, because 
I tried the same URL query with facet.limit=1 and got back records with 
"EARTH SCIENCE>GEOGRAPHIC REGION>ARCTIC"

Cheers!
Endre

-Original Message-
From: Upayavira [mailto:u...@odoko.co.uk] 
Sent: 28. september 2015 14:01
To: solr-user@lucene.apache.org
Subject: Re: PathHierarchyTokenizerFactory and facet_count

There is also facet.limit which says how many facet entries to return.
Is that catching you?

The document either matches your query, or doesn't. If it does, then all values 
of the Parameter field should be included in your faceting. But, perhaps not 
all facet buckets are being returned to you - hence try facet.limit = 100 or 
such

Upayavira

On Mon, Sep 28, 2015, at 11:47 AM, Moen Endre wrote:
> How does facet_count work with a facet field that is defined as solr.
> PathHierarchyTokenizerFactory?
> 
> I have multiple records that contains field Parameter which is of type 
> PathHierarchyTokenizerFactory.
> E.g
> "Parameter": [
>   "EARTH SCIENCE>OCEANS>OCEAN TEMPERATURE>WATER TEMPERATURE",
>   "EARTH SCIENCE>OCEANS>OCEAN PRESSURE>WATER PRESSURE",
>   "EARTH SCIENCE>OCEANS>OCEAN ACOUSTICS>ACOUSTIC VELOCITY",
>   "EARTH SCIENCE>ACOUSTIC",
>   "EARTH SCIENCE>VELOCITY",
>   "EARTH SCIENCE>ACOBAR | ACOUSTIC TECHNOLOGY FOR OBSERVING THE
>   INTERIOR OF THE ARCTIC OCEAN",
>   "EARTH SCIENCE>GEOGRAPHIC REGION>POLAR",
>   "EARTH SCIENCE>GEOGRAPHIC REGION>ARCTIC"
> ],
> 
> But when I run a query to get all facet counts for Parameter - with 
> this
> query:
> http://localhost:8983/solr/nmdc/query?
> q=*:*&facet=true&rows=0&facet.mincount=1&facet.field=Parameter
> 
> the two last entries from this record; "EARTH SCIENCE>GEOGRAPHIC 
> REGION>POLAR", "EARTH SCIENCE>GEOGRAPHIC REGION>ARCTIC"
> 
> is missing from the facet_count - which looks like:
> 
>   "facet_counts":{
> 
> "facet_queries":{},
> 
> "facet_fields":{
> 
>   "Parameter":[
> 
> "EARTH SCIENCE",228,
> 
> "EARTH SCIENCE>OCEANS",128,
> 
> "EARTH SCIENCE>OCEANS>OCEAN TEMPERATURE",100,
> 
> "EARTH SCIENCE>OCEANS>SALINITY/DENSITY",90,
> ...
> 
> Im running solr 5.0
> 
> Why does the query seem to omit some of the Parameter entries from 
> records?
> Path is configured with:
> 
> 
>  class="solr.PathHierarchyTokenizerFactory"
> delimiter=">" />
> 
> 
>  />
> 
> 
> 
> Cheers
> Endre
> 


Re: More Like This on numeric fields - BF accepted by MLT handler

2015-09-28 Thread Upayavira
You could use the MLT query parser, and combine that with other queries,
whether as filters or boosts.

You can't use stream.body with it yet, so you would need to use the handler if
you need that.
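
A rough sketch of the query-parser route, with a boost on a numeric field
(the document id, field names, constant and function are all assumptions for
the hotel example, not something taken from the handler):

  q={!boost b=recip(abs(sub(stars,4)),1,1,1) v=$qq}
  qq={!mlt qf=description mintf=2 mindf=5}HOTEL_123

Here {!mlt} builds the "interesting terms" query from the seed document, and
the boost function rewards documents whose stars value is close to the seed
document's rating, passed in as the constant 4 (function queries can only read
fields of the candidate document, so the seed's value has to be supplied by
the client).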

Upayavira

On Mon, Sep 28, 2015, at 09:53 AM, Alessandro Benedetti wrote:
> Hi Upaya,
> thanks for the explanation, I actually already did some investigations
> about it ( my first foundation was :
> http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/ ) and
> then I took a look to the code.
> 
> Was just wondering what the community was thinking about
> including/providing numerical similarity ( approaches, ideas, possible
> existent solutions).
> Customisation should be the last step, if anything already available.
> 
> Thanks for the support anyway !
> 
> Cheers
> 
> 2015-09-25 12:47 GMT+01:00 Upayavira :
> 
> > Alessandro,
> >
> > I'd suggest you review the code of the MoreLikeThisHandler. It is a
> > little knotty, but it would be worth your while understanding what is
> > going on there.
> >
> > Basically, there are three phases:
> >
> > phase #1: parse the source document into a list of terms (avoided if
> > term vectors enabled and source doc is in index)
> > phase #2: calculate a score for each of these terms and select the n
> > highest scoring ones (default 25)
> > phase #3: build and execute a boolean query using these 25 terms
> >
> > Phase #2 uses a TF/IDF like approach to calculate the scores for those
> > "interesting terms".
> >
> > Once you understand what MLT is doing, you will probably not find it so
> > hard to create your own version which is better suited to your own
> > use-case.
> >
> > Of course, this would probably be better constructed as a QueryParser
> > rather than a request handler, but that's a detail.
> >
> > Upayavira
> >
> > On Fri, Sep 25, 2015, at 11:08 AM, Alessandro Benedetti wrote:
> > > Hi guys,
> > > was just investigating a little bit in how to include numeric fields in
> > > the
> > > MLT calculations.
> > >
> > > As we know, we are currently building a smart lucene query based on the
> > > document in input ( the one to search for similar ones) and run this
> > > query
> > > to obtain the similar docs.
> > > Because the MLT is currently built on TF/IDF , it is mainly thought for
> > > textual fields.
> > > What about we want to include a numeric factor  in the similarity
> > > calculus ?
> > >
> > > e.g.
> > > Solr Document ( Hotel)
> > > mlt.fl=description,stars,trip_advisor_rating
> > >
> > > To find the similarity based not only on the description, but also on the
> > > numeric fields ( stars and rating) .
> > >
> > > The first thought I had , is to add a support for boosting functions.
> > > In this way we are more flexible and we can add how many functions we
> > > want.
> > >
> > > For example adding :
> > > bf=div(1,dist(2,seedDocumentRatingA,seedDocumentRatingB,ratingA,ratingB))
> > >
> > > Also other kind of functions can be applied.
> > > What do you think ? Do you have any alternative ideas ?
> > >
> > > Cheers
> > > --
> > > --
> > >
> > > Benedetti Alessandro
> > > Visiting card : http://about.me/alessandro_benedetti
> > >
> > > "Tyger, tyger burning bright
> > > In the forests of the night,
> > > What immortal hand or eye
> > > Could frame thy fearful symmetry?"
> > >
> > > William Blake - Songs of Experience -1794 England
> >
> 
> 
> 
> -- 
> --
> 
> Benedetti Alessandro
> Visiting card - http://about.me/alessandro_benedetti
> Blog - http://alexbenedetti.blogspot.co.uk
> 
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
> 
> William Blake - Songs of Experience -1794 England


Re: Cost of having multiple search handlers?

2015-09-28 Thread Upayavira
I would expect this to be negligible.

Upayavira

On Mon, Sep 28, 2015, at 01:30 PM, Oliver Schrenk wrote:
> Hi,
> 
> I want to register multiple but identical search handler to have multiple
> buckets to measure performance for our different apis and consumers (and
> to find out who is actually using Solr).
> 
> What are there some costs associated with having multiple search
> handlers? Are they neglible?
> 
> Cheers,
> Oliver


Re: Cost of having multiple search handlers?

2015-09-28 Thread Shawn Heisey
On 9/28/2015 6:30 AM, Oliver Schrenk wrote:
> I want to register multiple but identical search handler to have multiple 
> buckets to measure performance for our different apis and consumers (and to 
> find out who is actually using Solr).
> 
> What are there some costs associated with having multiple search handlers? 
> Are they neglible?

Unless you are creating hundreds or thousands of them, I doubt you'll
notice any significant increase in resource usage from additional
handlers.  Each handler definition creates an additional URL endpoint
within the servlet container, additional object creation within Solr,
and perhaps an additional thread pool and threads to go with it, so it's
not free, but I doubt that it's significant.  The resources required for
actually handling a request is likely to dwarf what's required for more
handlers.
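
For what it's worth, an extra bucket is just another requestHandler entry in
solrconfig.xml pointing at the same class; the handler names and defaults below
are examples only:

  <requestHandler name="/select-mobile" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="rows">10</str>
    </lst>
  </requestHandler>

  <requestHandler name="/select-web" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="rows">10</str>
    </lst>
  </requestHandler>

Each consumer then gets its own endpoint, and per-handler request statistics
show up separately in the admin UI and JMX.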

Disclaimer: I have not delved into the code to figure out exactly what
gets created with a search handler config, so I don't know exactly what
happens.  I'm basing this on general knowledge about how Java programs
are constructed by expert developers, not specifics about Solr.

There are others on the list who have a much better idea than I do, so
if I'm wrong, I'm sure one of them will let me know.

Thanks,
Shawn



Re: bulk reindexing 5.3.0 issue

2015-09-28 Thread Gili Nachum
Were all of the shard replicas in the active state (green in the admin UI)
before starting?
It sounds like they were; otherwise you wouldn't hit the replica that is out
of sync.

Replicas can get out of sync, and still report being in sync, after a sequence
of stop/start without a chance to complete synchronization.
See if that might have happened to you:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201412.mbox/%3CCAOOKt53XTU_e0m2ioJ-S4SfsAp8JC6m-=nybbd4g_mjh60b...@mail.gmail.com%3E
On Sep 27, 2015 06:56, "Ravi Solr"  wrote:

> Erick...There is only one type of String
> "sun.org.mozilla.javascript.internal.NativeString:" and no other variations
> of that in my index, so no question of missing it. Point taken regarding
> the CURSORMARK stuff, yes you are correct, my head so numb at this point
> after working 3 days on this, I wasnt thinking straight.
>
> BTW I found the real issue, I have a total of 8 servers in the solr cloud.
> The leader for this specific collection was the one that was returning 0
> for the searches. All other 7 servers had roughly 800K docs still needing
> the string replacement. So maybe the real issue is sync among servers. Just
> to prove to myself I shutdown the solr  that was giving zero results (i.e.
> all uuid strings have already been somehow devoid of spurious
> sun.org.mozilla.javascript.internal.NativeString on that server). Now it
> ran perfectly fine and is about to finish as last 103K are still left when
> I was writing this email.
>
> So the real question is how can we ensure that the Sync is always
> maintained and what to do if it ever goes out of Sync, I did see some Jira
> tickets from previous 4.10.x versions where Sync was an issue. Can you
> please point me to any doc which says how SolrCloud synchs/replicates ?
>
> Thanks,
>
> Ravi Kiran Bhaskar
>
> Thanks
>
> Rvai Kiran Bhaskar
>
> On Sat, Sep 26, 2015 at 11:00 PM, Erick Erickson 
> wrote:
>
> > bq: 3. Erick, I wasnt getting all 1.4 mill in one shot. I was initially
> > using
> > 100 docs batch, which, I later increased to 500 docs per batch. Also it
> > would not be a infinite loop if I commit for each batch, right !!??
> >
> > That's not the point at all. Look at the basic logic here:
> >
> > You run for a while processing 100 (or 500 or 1,000) docs per batch
> > and change all uuid fields with this statement:
> >
> > uuid.replace("sun.org.mozilla.javascript.internal.NativeString:", "");
> >
> > and then update the doc. You run this as long as you have any docs
> > that satisfy the query "q=uuid:sun.org.mozilla*", _changing_
> > every one that has this string!
> >
> > At that point, theoretically, no document in your index has this string.
> So
> > running your update program immediately after should find _zero_
> documents.
> >
> > I've been assuming your complaint is that you don't process 1.4 M docs
> (in
> > batches), you process some lower number then exit and you think this is
> > wrong.
> > I'm claiming that you should only expect to find as many docs as have
> been
> > indexed since the last time the program ran.
> >
> > As far as the infinite loop is concerned, again trace the logic in the
> old
> > code.
> > Forget about commits and all the mechanics, just look at the logic.
> > You're querying on "sun.org.mozilla*". But you only change if you get a
> > match on
> > "sun.org.mozilla.javascript.internal.NativeString:"
> >
> > Now imagine you have a doc that has sun.org.mozilla.erick in it. That doc
> > gets
> > returned from the query but does _not_ get modified because it doesn't
> > match your pattern. In the older code, it would be found again and
> > returned next
> > time you queried. Then not modified again. Eventually you'd be in a
> > position
> > where you never changed any docs, just kept getting the same docList back
> > over and over again. Marching through based on the unique key should not
> > have the same potential issue.
> >
> > You should not be mixing the new query stuff with CURSORMARK. Deep paging
> > supposes the exact same query is being run over and over and you're
> > _paging_
> > through the results. You're changing the query every time so the results
> > aren't
> > very predictable.
> >
> > Best,
> > Erick
> >
> >
> > On Sat, Sep 26, 2015 at 5:01 PM, Ravi Solr  wrote:
> > > Erick & Shawn I incrporated your suggestions.
> > >
> > >
> > > 0. Shut off all other indexing processes.
> > > 1. As Shawn mentioned set batch size to 1.
> > > 2. Loved Erick's suggestion about not using filter at all and sort by
> > > uniqueId and put last known uinqueId as next queries start while still
> > > using cursor marks as follows
> > >
> > > SolrQuery q = new SolrQuery("+uuid:sun.org.mozilla* +uniqueId:{" +
> > > markerSysId + " TO
> > > *]").setRows(1).addSort("uniqueId",ORDER.asc).setFields(new
> > > String[]{"uniqueId","uuid"});
> > > q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
> > >
> > > 3. As per Shawn's advise commented autocommit and soft commit in
> > > solrconfig.xml and set openSearcher to false

entity processing order during updates

2015-09-28 Thread Roxana Danger
Hello,
 I am importing into Solr 2 entities coming from 2 different tables, and
I have defined an update request processor chain with two custom processor
factories:
 - the first processor factory needs to be executed first for one type
of entity and then for the other (I differentiate the "entity type" with
a field called table). In the data import config file I keep the order in
which the entities need to be processed.
 - the second processor needs to be executed after the first one has
completed.
 When I run the updates with only the first processor, the updates
all work fine. However, when I added the second processor, it seems that
the first update processor is not getting the entities in the order I
expected.
 Has anyone had this problem before? Could anyone help me to configure
this?
 Thank you very much in advance,
 Roxana
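
For readers following along, the kind of setup being described is a chain in
solrconfig.xml with two custom factories; the chain and class names here are
hypothetical placeholders, not the actual code:

  <updateRequestProcessorChain name="two-step-chain">
    <processor class="com.example.FirstCustomProcessorFactory"/>
    <processor class="com.example.SecondCustomProcessorFactory"/>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

The chain is then referenced from the import handler via the update.chain
parameter. Note that each processor sees one document at a time, in the order
the documents are submitted, so ordering across entities is determined by the
data import itself, not by the chain.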









Re: PathHierarchyTokenizerFactory and facet_count

2015-09-28 Thread Alessandro Benedetti
From the Solr wiki, the default facet.limit should be 100!
Anyway, I find the way field facets are displayed for path-hierarchy-tokenized
fields not so user friendly.
Ideally, for those fields we should show a facet representation similar to
facet pivot.
It would be nice to come up with an idea for doing that.


Cheers

2015-09-28 14:47 GMT+01:00 Moen Endre :

> Yes, that solved my problem. There must be an implisite facet.limit set
> because I tried the same url query with face.limit=1. And got back
> records with "EARTH SCIENCE>GEOGRAPHIC REGION>ARCTIC"
>
> Cheers!
> Endre
>
> -Original Message-
> From: Upayavira [mailto:u...@odoko.co.uk]
> Sent: 28. september 2015 14:01
> To: solr-user@lucene.apache.org
> Subject: Re: PathHierarchyTokenizerFactory and facet_count
>
> There is also facet.limit which says how many facet entries to return.
> Is that catching you?
>
> The document either matches your query, or doesn't. If it does, then all
> values of the Parameter field should be included in your faceting. But,
> perhaps not all facet buckets are being returned to you - hence try
> facet.limit = 100 or such
>
> Upayavira
>
> On Mon, Sep 28, 2015, at 11:47 AM, Moen Endre wrote:
> > How does facet_count work with a facet field that is defined as solr.
> > PathHierarchyTokenizerFactory?
> >
> > I have multiple records that contains field Parameter which is of type
> > PathHierarchyTokenizerFactory.
> > E.g
> > "Parameter": [
> >   "EARTH SCIENCE>OCEANS>OCEAN TEMPERATURE>WATER TEMPERATURE",
> >   "EARTH SCIENCE>OCEANS>OCEAN PRESSURE>WATER PRESSURE",
> >   "EARTH SCIENCE>OCEANS>OCEAN ACOUSTICS>ACOUSTIC VELOCITY",
> >   "EARTH SCIENCE>ACOUSTIC",
> >   "EARTH SCIENCE>VELOCITY",
> >   "EARTH SCIENCE>ACOBAR | ACOUSTIC TECHNOLOGY FOR OBSERVING THE
> >   INTERIOR OF THE ARCTIC OCEAN",
> >   "EARTH SCIENCE>GEOGRAPHIC REGION>POLAR",
> >   "EARTH SCIENCE>GEOGRAPHIC REGION>ARCTIC"
> > ],
> >
> > But when I run a query to get all facet counts for Parameter - with
> > this
> > query:
> > http://localhost:8983/solr/nmdc/query?
> > q=*:*&facet=true&rows=0&facet.mincount=1&facet.field=Parameter
> >
> > the two last entries from this record; "EARTH SCIENCE>GEOGRAPHIC
> > REGION>POLAR", "EARTH SCIENCE>GEOGRAPHIC REGION>ARCTIC"
> >
> > is missing from the facet_count - which looks like:
> >
> >   "facet_counts":{
> >
> > "facet_queries":{},
> >
> > "facet_fields":{
> >
> >   "Parameter":[
> >
> > "EARTH SCIENCE",228,
> >
> > "EARTH SCIENCE>OCEANS",128,
> >
> > "EARTH SCIENCE>OCEANS>OCEAN TEMPERATURE",100,
> >
> > "EARTH SCIENCE>OCEANS>SALINITY/DENSITY",90,
> > ...
> >
> > Im running solr 5.0
> >
> > Why does the query seem to omit some of the Parameter entries from
> > records?
> > Path is configured with:
> > 
> > 
> >  > class="solr.PathHierarchyTokenizerFactory"
> > delimiter=">" />
> > 
> > 
> >  > />
> > 
> > 
> >
> > Cheers
> > Endre
> >
>



-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Cost of having multiple search handlers?

2015-09-28 Thread Gili Nachum
A different solution to the same need: I'm measuring response times of
different collections, keeping online and batch queries apart, using New
Relic. I've added a servlet filter that analyses the request and makes this
info available to New Relic via a request argument.

The built-in New Relic Solr plugin doesn't provide much.
On Sep 28, 2015 17:16, "Shawn Heisey"  wrote:

> On 9/28/2015 6:30 AM, Oliver Schrenk wrote:
> > I want to register multiple but identical search handler to have
> multiple buckets to measure performance for our different apis and
> consumers (and to find out who is actually using Solr).
> >
> > What are there some costs associated with having multiple search
> handlers? Are they neglible?
>
> Unless you are creating hundreds or thousands of them, I doubt you'll
> notice any significant increase in resource usage from additional
> handlers.  Each handler definition creates an additional URL endpoint
> within the servlet container, additional object creation within Solr,
> and perhaps an additional thread pool and threads to go with it, so it's
> not free, but I doubt that it's significant.  The resources required for
> actually handling a request is likely to dwarf what's required for more
> handlers.
>
> Disclaimer: I have not delved into the code to figure out exactly what
> gets created with a search handler config, so I don't know exactly what
> happens.  I'm basing this on general knowledge about how Java programs
> are constructed by expert developers, not specifics about Solr.
>
> There are others on the list who have a much better idea than I do, so
> if I'm wrong, I'm sure one of them will let me know.
>
> Thanks,
> Shawn
>
>


Re: Cost of having multiple search handlers?

2015-09-28 Thread Walter Underwood
We did the same thing, but reporting performance metrics to Graphite.

But we won’t be able to add servlet filters in 6.x, because it won’t be a 
webapp.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Sep 28, 2015, at 11:32 AM, Gili Nachum  wrote:
> 
> A different solution to the same need: I'm measuring response times of
> different collections measuring  online/batch queries apart using New
> Relic. I've added a servlet filter that analyses the request and makes this
> info available to new relic over a request argument.
> 
> The built in new relic solr plug in doesn't provide much.
> On Sep 28, 2015 17:16, "Shawn Heisey"  wrote:
> 
>> On 9/28/2015 6:30 AM, Oliver Schrenk wrote:
>>> I want to register multiple but identical search handler to have
>> multiple buckets to measure performance for our different apis and
>> consumers (and to find out who is actually using Solr).
>>> 
>>> What are there some costs associated with having multiple search
>> handlers? Are they neglible?
>> 
>> Unless you are creating hundreds or thousands of them, I doubt you'll
>> notice any significant increase in resource usage from additional
>> handlers.  Each handler definition creates an additional URL endpoint
>> within the servlet container, additional object creation within Solr,
>> and perhaps an additional thread pool and threads to go with it, so it's
>> not free, but I doubt that it's significant.  The resources required for
>> actually handling a request is likely to dwarf what's required for more
>> handlers.
>> 
>> Disclaimer: I have not delved into the code to figure out exactly what
>> gets created with a search handler config, so I don't know exactly what
>> happens.  I'm basing this on general knowledge about how Java programs
>> are constructed by expert developers, not specifics about Solr.
>> 
>> There are others on the list who have a much better idea than I do, so
>> if I'm wrong, I'm sure one of them will let me know.
>> 
>> Thanks,
>> Shawn
>> 
>> 



Solr java.lang.OutOfMemoryError: Java heap space

2015-09-28 Thread Ajinkya Kale
Hi,

I am trying to retrieve all the documents from a solr index in a batched
manner.
I have 100M documents. I am retrieving them using the method proposed here
https://nowontap.wordpress.com/2014/04/04/solr-exporting-an-index-to-an-external-file/
I am dumping 10M document splits in each file. I get "OutOfMemoryError" if
start is at 50M. I get the same error even if rows=10 for start=50M.
Curl on start=0 rows=50M in one go works fine too. But things go bad when
start is at 50M.
My Solr version is 4.4.0.

Caused by: java.lang.OutOfMemoryError: Java heap space
at
org.apache.lucene.search.TopDocsCollector.topDocs(TopDocsCollector.java:146)
at
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1502)
at
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1363)
at
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:474)
at
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:434)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)

--aj


Re: bulk reindexing 5.3.0 issue

2015-09-28 Thread Ravi Solr
Gili, I was constantly checking the cloud admin UI and it always stayed
green; that is why I initially overlooked sync issues. Finally, when all other
options were exhausted, I went to each node individually and queried it, and
that is when I found the out-of-sync issue. The way I resolved my issue was to
shut down the leader that was not syncing properly and let another node become
the leader, then reindex all docs. Once the reindexing was done I started the
node that was causing the issue and it synced properly :-)

Thanks

Ravi Kiran Bhaskar



On Mon, Sep 28, 2015 at 10:26 AM, Gili Nachum  wrote:

> Were all of shard replica in active state (green color in admin ui) before
> starting?
> Sounds like it otherwise you won't hit the replica that is out of sync.
>
> Replicas can get out of sync, and report being in sync after a sequence of
> stop start w/o a chance to complete sync.
> See if it might have happened to you:
>
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201412.mbox/%3CCAOOKt53XTU_e0m2ioJ-S4SfsAp8JC6m-=nybbd4g_mjh60b...@mail.gmail.com%3E
> On Sep 27, 2015 06:56, "Ravi Solr"  wrote:
>
> > Erick...There is only one type of String
> > "sun.org.mozilla.javascript.internal.NativeString:" and no other
> variations
> > of that in my index, so no question of missing it. Point taken regarding
> > the CURSORMARK stuff, yes you are correct, my head so numb at this point
> > after working 3 days on this, I wasnt thinking straight.
> >
> > BTW I found the real issue, I have a total of 8 servers in the solr
> cloud.
> > The leader for this specific collection was the one that was returning 0
> > for the searches. All other 7 servers had roughly 800K docs still needing
> > the string replacement. So maybe the real issue is sync among servers.
> Just
> > to prove to myself I shutdown the solr  that was giving zero results
> (i.e.
> > all uuid strings have already been somehow devoid of spurious
> > sun.org.mozilla.javascript.internal.NativeString on that server). Now it
> > ran perfectly fine and is about to finish as last 103K are still left
> when
> > I was writing this email.
> >
> > So the real question is how can we ensure that the Sync is always
> > maintained and what to do if it ever goes out of Sync, I did see some
> Jira
> > tickets from previous 4.10.x versions where Sync was an issue. Can you
> > please point me to any doc which says how SolrCloud synchs/replicates ?
> >
> > Thanks,
> >
> > Ravi Kiran Bhaskar
> >
> > Thanks
> >
> > Rvai Kiran Bhaskar
> >
> > On Sat, Sep 26, 2015 at 11:00 PM, Erick Erickson <
> erickerick...@gmail.com>
> > wrote:
> >
> > > bq: 3. Erick, I wasnt getting all 1.4 mill in one shot. I was initially
> > > using
> > > 100 docs batch, which, I later increased to 500 docs per batch. Also it
> > > would not be a infinite loop if I commit for each batch, right !!??
> > >
> > > That's not the point at all. Look at the basic logic here:
> > >
> > > You run for a while processing 100 (or 500 or 1,000) docs per batch
> > > and change all uuid fields with this statement:
> > >
> > > uuid.replace("sun.org.mozilla.javascript.internal.NativeString:", "");
> > >
> > > and then update the doc. You run this as long as you have any docs
> > > that satisfy the query "q=uuid:sun.org.mozilla*", _changing_
> > > every one that has this string!
> > >
> > > At that point, theoretically, no document in your index has this
> string.
> > So
> > > running your update program immediately after should find _zero_
> > documents.
> > >
> > > I've been assuming your complaint is that you don't process 1.4 M docs
> > (in
> > > batches), you process some lower number then exit and you think this is
> > > wrong.
> > > I'm claiming that you should only expect to find as many docs as have
> > been
> > > indexed since the last time the program ran.
> > >
> > > As far as the infinite loop is concerned, again trace the logic in the
> > old
> > > code.
> > > Forget about commits and all the mechanics, just look at the logic.
> > > You're querying on "sun.org.mozilla*". But you only change if you get a
> > > match on
> > > "sun.org.mozilla.javascript.internal.NativeString:"
> > >
> > > Now imagine you have a doc that has sun.org.mozilla.erick in it. That
> doc
> > > gets
> > > returned from the query but does _not_ get modified because it doesn't
> > > match your pattern. In the older code, it would be found again and
> > > returned next
> > > time you queried. Then not modified again. Eventually you'd be in a
> > > position
> > > where you never changed any docs, just kept getting the same docList
> back
> > > over and over again. Marching through based on the unique key should
> not
> > > have the same potential issue.
> > >
> > > You should not be mixing the new query stuff with CURSORMARK. Deep
> paging
> > > supposes the exact same query is being run over and over and you're
> > > _paging_
> > > through the results. You're changing the query every time so the
> results
> > > aren't
> > > very predictable.
> > 

RE: Solr java.lang.OutOfMemoryError: Java heap space

2015-09-28 Thread Markus Jelsma
Hi - you need to use the CursorMark feature for larger sets: 
https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
M.
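
For reference, a minimal, untested SolrJ sketch of that cursorMark loop (the core
URL, the 10,000-row batch size and the uniqueKey field "id" below are assumptions,
and the server side needs Solr 4.7 or later):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrQuery.SortClause;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CursorMarkParams;

public class CursorMarkDump {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery("*:*");
        q.setRows(10000);
        q.setSort(SortClause.asc("id"));  // cursorMark requires a sort on the uniqueKey
        String cursorMark = CursorMarkParams.CURSOR_MARK_START;
        boolean done = false;
        while (!done) {
            q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
            QueryResponse rsp = client.query(q);
            // ... append rsp.getResults() to the current dump file here ...
            String next = rsp.getNextCursorMark();
            done = cursorMark.equals(next);  // an unchanged mark means everything was read
            cursorMark = next;
        }
        client.close();
    }
}

Memory stays flat because Solr never has to buffer "start" rows ahead of the page you
asked for, which is what blows the heap with start=50M.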

 
 
-Original message-
> From:Ajinkya Kale 
> Sent: Monday 28th September 2015 20:46
> To: solr-user@lucene.apache.org; java-u...@lucene.apache.org
> Subject: Solr java.lang.OutOfMemoryError: Java heap space
> 
> Hi,
> 
> I am trying to retrieve all the documents from a solr index in a batched
> manner.
> I have 100M documents. I am retrieving them using the method proposed here
> https://nowontap.wordpress.com/2014/04/04/solr-exporting-an-index-to-an-external-file/
> I am dumping 10M document splits in each file. I get "OutOfMemoryError" if
> start is at 50M. I get the same error even if rows=10 for start=50M.
> Curl on start=0 rows=50M in one go works fine too. But things go bad when
> start is at 50M.
> My Solr version is 4.4.0.
> 
> Caused by: java.lang.OutOfMemoryError: Java heap space
> at
> org.apache.lucene.search.TopDocsCollector.topDocs(TopDocsCollector.java:146)
> at
> org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1502)
> at
> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1363)
> at
> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:474)
> at
> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:434)
> at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
> 
> --aj
> 


Re: Solr java.lang.OutOfMemoryError: Java heap space

2015-09-28 Thread Ajinkya Kale
If I am not wrong, this works only with Solr version > 4.7.0?
On Mon, Sep 28, 2015 at 12:23 PM Markus Jelsma 
wrote:

> Hi - you need to use the CursorMark feature for larger sets:
> https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
> M.
>
>
>
> -Original message-
> > From:Ajinkya Kale 
> > Sent: Monday 28th September 2015 20:46
> > To: solr-user@lucene.apache.org; java-u...@lucene.apache.org
> > Subject: Solr java.lang.OutOfMemoryError: Java heap space
> >
> > Hi,
> >
> > I am trying to retrieve all the documents from a solr index in a batched
> > manner.
> > I have 100M documents. I am retrieving them using the method proposed
> here
> >
> https://nowontap.wordpress.com/2014/04/04/solr-exporting-an-index-to-an-external-file/
> > I am dumping 10M document splits in each file. I get "OutOfMemoryError"
> if
> > start is at 50M. I get the same error even if rows=10 for start=50M.
> > Curl on start=0 rows=50M in one go works fine too. But things go bad when
> > start is at 50M.
> > My Solr version is 4.4.0.
> >
> > Caused by: java.lang.OutOfMemoryError: Java heap space
> > at
> >
> org.apache.lucene.search.TopDocsCollector.topDocs(TopDocsCollector.java:146)
> > at
> >
> org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1502)
> > at
> >
> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1363)
> > at
> >
> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:474)
> > at
> >
> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:434)
> > at
> >
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
> > at
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
> >
> > --aj
> >
>


highlighting

2015-09-28 Thread Mark Fenbers

Greetings!

I have highlighting turned on in my Solr searches, but what I get back
is <em> tags surrounding the found term.  Since I use a SWT StyledText
widget to display my search results, what I really want is the offset 
and length of each found term, so that I can highlight it in my own way 
without HTML.  Is there a way to configure Solr to do that?  I couldn't 
find it.  If not, how do I go about posting this as a feature request?


Thanks,
Mark
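
There may not be a built-in way to get raw offsets back, but a hedged, untested
workaround sketch: hl.simple.pre/hl.simple.post (set here through SolrJ) let you swap
the <em>/</em> tags for sentinel strings that are easy to locate and strip client-side;
offsets are then relative to the returned snippet (hl.fragsize controls snippet size,
and 0 means the whole field). The query, field name and URL below are placeholders:

import java.util.List;
import java.util.Map;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class MarkerHighlighting {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery("body:bridgewater");   // query and field are placeholders
        q.setHighlight(true);
        q.addHighlightField("body");
        q.setHighlightSimplePre("[[");    // sentinel instead of <em>
        q.setHighlightSimplePost("]]");   // sentinel instead of </em>
        QueryResponse rsp = client.query(q);
        // docId -> (field -> highlighted snippets); scan each snippet for the sentinels
        // to compute offsets/lengths, then strip them before handing text to StyledText
        Map<String, Map<String, List<String>>> hl = rsp.getHighlighting();
        System.out.println(hl);
        client.close();
    }
}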


Re: Solr java.lang.OutOfMemoryError: Java heap space

2015-09-28 Thread Gili Nachum
If you can't use CursorMark, then I suggest not using the start parameter.
Instead, sort ascending by a unique field and range the query to records with
a field value larger than that of the last doc you read. Then set rows to
whatever you find can fit in memory.
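
A rough, untested SolrJ sketch of that range-walk (the uniqueKey field "id",
string-typed ids, the URL and the batch size are all assumptions):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrQuery.SortClause;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.util.ClientUtils;
import org.apache.solr.common.SolrDocumentList;

public class RangePagingDump {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/collection1");
        String lastId = null;            // uniqueKey field "id" is an assumption
        while (true) {
            SolrQuery q = new SolrQuery("*:*");
            q.setRows(10000);            // whatever fits in memory
            q.setSort(SortClause.asc("id"));
            if (lastId != null) {
                // only ids strictly greater than the last one already read; start stays at 0
                q.addFilterQuery("id:{" + ClientUtils.escapeQueryChars(lastId) + " TO *]");
            }
            SolrDocumentList page = client.query(q).getResults();
            if (page.isEmpty()) break;
            // ... write this page out ...
            lastId = (String) page.get(page.size() - 1).getFieldValue("id");
        }
        client.close();
    }
}

Unlike deep start offsets, each request here only has to collect "rows" documents,
so the heap cost per page stays constant.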

On Mon, Sep 28, 2015 at 10:59 PM, Ajinkya Kale 
wrote:

> If I am not wrong this works only with Solr version > 4.7.0 ?
> On Mon, Sep 28, 2015 at 12:23 PM Markus Jelsma  >
> wrote:
>
> > Hi - you need to use the CursorMark feature for larger sets:
> > https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
> > M.
> >
> >
> >
> > -Original message-
> > > From:Ajinkya Kale 
> > > Sent: Monday 28th September 2015 20:46
> > > To: solr-user@lucene.apache.org; java-u...@lucene.apache.org
> > > Subject: Solr java.lang.OutOfMemoryError: Java heap space
> > >
> > > Hi,
> > >
> > > I am trying to retrieve all the documents from a solr index in a
> batched
> > > manner.
> > > I have 100M documents. I am retrieving them using the method proposed
> > here
> > >
> >
> https://nowontap.wordpress.com/2014/04/04/solr-exporting-an-index-to-an-external-file/
> > > I am dumping 10M document splits in each file. I get "OutOfMemoryError"
> > if
> > > start is at 50M. I get the same error even if rows=10 for start=50M.
> > > Curl on start=0 rows=50M in one go works fine too. But things go bad
> when
> > > start is at 50M.
> > > My Solr version is 4.4.0.
> > >
> > > Caused by: java.lang.OutOfMemoryError: Java heap space
> > > at
> > >
> >
> org.apache.lucene.search.TopDocsCollector.topDocs(TopDocsCollector.java:146)
> > > at
> > >
> >
> org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1502)
> > > at
> > >
> >
> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1363)
> > > at
> > >
> >
> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:474)
> > > at
> > >
> >
> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:434)
> > > at
> > >
> >
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
> > > at
> > >
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> > > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
> > >
> > > --aj
> > >
> >
>


Passing Basic Auth info to HttpSolrClient

2015-09-28 Thread Steven White
Hi,

I'm using HttpSolrClient to connect to Solr.  Everything worked until I
enabled basic authentication in Jetty.  My question is, how do I pass the
basic auth info to SolrJ so that I don't get a 401 error?

Thanks in advance

Steve
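
For reference, a hedged, untested sketch of one common approach with SolrJ 5.x:
build the underlying HttpClient through HttpClientUtil with the basic-auth properties
and hand it to HttpSolrClient. The URL, collection and credentials below are
placeholders; the property names come from org.apache.solr.client.solrj.impl.HttpClientUtil:

import org.apache.http.client.HttpClient;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpClientUtil;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.params.ModifiableSolrParams;

public class BasicAuthSolrClient {
    public static void main(String[] args) throws Exception {
        // Build an HttpClient whose credentials provider answers Jetty's 401 challenge.
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set(HttpClientUtil.PROP_BASIC_AUTH_USER, "solr");
        params.set(HttpClientUtil.PROP_BASIC_AUTH_PASS, "SolrRocks");
        HttpClient httpClient = HttpClientUtil.createClient(params);

        SolrClient client = new HttpSolrClient("http://localhost:8983/solr/collection1", httpClient);
        System.out.println(client.ping().getStatus());   // 0 instead of a 401 error
        client.close();
    }
}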


error reporting during indexing

2015-09-28 Thread Matteo Grolla
Hi,
if I need fine-grained error reporting I use HttpSolrServer and send
1 doc per request using the add method.
I report errors on exceptions from the add method.
I'm using autocommit so I'm not seeing errors related to commits.
Am I losing some errors? Is there a better way?

Thanks


CloudSolrClient timeout settings

2015-09-28 Thread Arcadius Ahouansou
CloudSolrClient has zkClientTimeout/zkConnectTimeout for access to
ZooKeeper.


It would be handy to also have the possibility of setting something like
soTimeout/connectTimeout for accessing the Solr nodes, similar to the old
non-cloud client.

Currently, in order to set a timeout for the client to connect to Solr, one
has to create a custom HttpClient.
Same goes for maxConnections.
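
For context, a rough, untested sketch of the workaround described above. The property
names come from SolrJ's HttpClientUtil; the exact CloudSolrClient constructor taking an
HttpClient may differ between SolrJ versions, and the ZK ensemble, timeouts and
collection name are placeholders:

import org.apache.http.client.HttpClient;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.impl.HttpClientUtil;
import org.apache.solr.common.params.ModifiableSolrParams;

public class CloudClientTimeouts {
    public static void main(String[] args) throws Exception {
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set(HttpClientUtil.PROP_CONNECTION_TIMEOUT, 5000);   // connect timeout to Solr nodes, ms
        params.set(HttpClientUtil.PROP_SO_TIMEOUT, 30000);          // socket (read) timeout, ms
        params.set(HttpClientUtil.PROP_MAX_CONNECTIONS, 128);       // overall connection cap
        HttpClient httpClient = HttpClientUtil.createClient(params);

        CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181", httpClient);
        client.setZkClientTimeout(15000);    // the ZK-side timeouts that already exist today
        client.setZkConnectTimeout(15000);
        client.setDefaultCollection("collection1");
        // ... use the client ...
        client.close();
    }
}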

Is this worth a Jira ticket?

Thanks.

Arcadius.


Re: Cost of having multiple search handlers?

2015-09-28 Thread Jeff Wartes

One would hope that https://issues.apache.org/jira/browse/SOLR-4735 will
be done by then. 


On 9/28/15, 11:39 AM, "Walter Underwood"  wrote:

>We did the same thing, but reporting performance metrics to Graphite.
>
>But we won’t be able to add servlet filters in 6.x, because it won’t be a
>webapp.
>
>wunder
>Walter Underwood
>wun...@wunderwood.org
>http://observer.wunderwood.org/  (my blog)
>
>
>> On Sep 28, 2015, at 11:32 AM, Gili Nachum  wrote:
>> 
>> A different solution to the same need: I'm measuring response times of
>> different collections measuring  online/batch queries apart using New
>> Relic. I've added a servlet filter that analyses the request and makes
>>this
>> info available to new relic over a request argument.
>> 
>> The built in new relic solr plug in doesn't provide much.
>> On Sep 28, 2015 17:16, "Shawn Heisey"  wrote:
>> 
>>> On 9/28/2015 6:30 AM, Oliver Schrenk wrote:
 I want to register multiple but identical search handler to have
>>> multiple buckets to measure performance for our different apis and
>>> consumers (and to find out who is actually using Solr).
 
 Are there some costs associated with having multiple search
>>> handlers? Are they negligible?
>>> 
>>> Unless you are creating hundreds or thousands of them, I doubt you'll
>>> notice any significant increase in resource usage from additional
>>> handlers.  Each handler definition creates an additional URL endpoint
>>> within the servlet container, additional object creation within Solr,
>>> and perhaps an additional thread pool and threads to go with it, so
>>>it's
>>> not free, but I doubt that it's significant.  The resources required
>>>for
>>> actually handling a request is likely to dwarf what's required for more
>>> handlers.
>>> 
>>> Disclaimer: I have not delved into the code to figure out exactly what
>>> gets created with a search handler config, so I don't know exactly what
>>> happens.  I'm basing this on general knowledge about how Java programs
>>> are constructed by expert developers, not specifics about Solr.
>>> 
>>> There are others on the list who have a much better idea than I do, so
>>> if I'm wrong, I'm sure one of them will let me know.
>>> 
>>> Thanks,
>>> Shawn
>>> 
>>> 
>



RE: Solr java.lang.OutOfMemoryError: Java heap space

2015-09-28 Thread will martin
http://opensourceconnections.com/blog/2014/07/13/reindexing-collections-with-solrs-cursor-support/



-Original Message-
From: Ajinkya Kale [mailto:kaleajin...@gmail.com] 
Sent: Monday, September 28, 2015 2:46 PM
To: solr-user@lucene.apache.org; java-u...@lucene.apache.org
Subject: Solr java.lang.OutOfMemoryError: Java heap space

Hi,

I am trying to retrieve all the documents from a solr index in a batched manner.
I have 100M documents. I am retrieving them using the method proposed here 
https://nowontap.wordpress.com/2014/04/04/solr-exporting-an-index-to-an-external-file/
I am dumping 10M document splits in each file. I get "OutOfMemoryError" if 
start is at 50M. I get the same error even if rows=10 for start=50M.
Curl on start=0 rows=50M in one go works fine too. But things go bad when start 
is at 50M.
My Solr version is 4.4.0.

Caused by: java.lang.OutOfMemoryError: Java heap space at
org.apache.lucene.search.TopDocsCollector.topDocs(TopDocsCollector.java:146)
at
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1502)
at
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1363)
at
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:474)
at
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:434)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)

--aj



Re: error reporting during indexing

2015-09-28 Thread Erick Erickson
You shouldn't be losing errors with HttpSolrServer. Are you
seeing evidence that you are, or is this mostly a curiosity question?

Do note that it's better to batch up docs; your throughput will increase
a LOT. That said, when you do batch (e.g. send 500 docs per update
or whatever) and you get an error back, you're not quite sure what
doc failed. So what people do is retry a failed batch one document
at a time when the batch has errors and rely on Solr overwriting
any docs in the batch that were indexed the first time.
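
A hedged sketch (not from the thread) of that batch-then-retry-individually pattern
in SolrJ; SolrClient is the 5.x name, and the 500-doc batch size and uniqueKey
field "id" are assumptions:

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer {
    static final int BATCH_SIZE = 500;

    static void indexAll(SolrClient client, Iterable<SolrInputDocument> docs) {
        List<SolrInputDocument> batch = new ArrayList<>();
        for (SolrInputDocument doc : docs) {
            batch.add(doc);
            if (batch.size() == BATCH_SIZE) {
                indexBatch(client, batch);
                batch = new ArrayList<>();
            }
        }
        if (!batch.isEmpty()) indexBatch(client, batch);
    }

    // If the whole batch fails, retry one doc at a time so the offending document
    // can be reported precisely; docs that were indexed on the first attempt are
    // simply overwritten on the retry.
    static void indexBatch(SolrClient client, List<SolrInputDocument> batch) {
        try {
            client.add(batch);
        } catch (Exception batchFailure) {
            for (SolrInputDocument doc : batch) {
                try {
                    client.add(doc);
                } catch (Exception docFailure) {
                    System.err.println("Failed doc " + doc.getFieldValue("id") + ": " + docFailure);
                }
            }
        }
    }
}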

Best,
Erick

On Mon, Sep 28, 2015 at 2:27 PM, Matteo Grolla  wrote:
> Hi,
> if I need fine-grained error reporting I use HttpSolrServer and send
> 1 doc per request using the add method.
> I report errors on exceptions from the add method.
> I'm using autocommit so I'm not seeing errors related to commits.
> Am I losing some errors? Is there a better way?
>
> Thanks


Re: CloudSolrClient timeout settings

2015-09-28 Thread Shawn Heisey
On 9/28/2015 4:04 PM, Arcadius Ahouansou wrote:
> CloudSolrClient has zkClientTimeout/zkConnectTimeout  for access to
> zookeeper.
>
> It would be handy to also have the possibility to set something like
>  soTimeout/connectTimeout for accessing the solr nodes similarly to the old
> non-cloud client.
>
> Currently, in order to set a timeout for the client to connect to solr, one
> has to create a custom HttpClient.
> Same goes for maxConnection.

Currently SolrJ is using HttpClient classes and methods that are
deprecated as of HC 4.3.  SolrJ in the latest version is using HC 4.5,
which still has all that deprecated code.

In order to remove HC deprecations, SolrJ must move to immutable
HttpClient objects, so the current methods on HttpSolrClient that modify
the HttpClient object, like setConnectionTimeout, will no longer work.

All of these settings (and more) can be configured on the *requests*
made through HttpClient, and this is the way that the HttpComponents
project recommends writing code using their library, so what we really
need to have is simply numbers for these settings stored in the class
implementing SolrClient (like HttpSolrClient, CloudSolrClient, etc),
which get passed down to the internal code that makes the request.  We
have an issue for removing HC deprecations, but because it's a massive
undertaking requiring a fair amount of experience with new
HttpComponents classes/methods, nobody has attempted to do it:

https://issues.apache.org/jira/browse/SOLR-5604
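
For illustration, the per-request configuration idiom being referred to looks roughly
like this in HttpComponents 4.3+ (plain HttpClient code, not SolrJ; the timeouts and
URL are placeholders):

import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public class PerRequestConfig {
    public static void main(String[] args) throws Exception {
        // Timeouts are attached to the request rather than mutated on a shared client.
        RequestConfig config = RequestConfig.custom()
                .setConnectTimeout(5000)     // connect timeout, ms
                .setSocketTimeout(30000)     // socket (read) timeout, ms
                .build();
        HttpGet get = new HttpGet("http://localhost:8983/solr/collection1/select?q=*:*");
        get.setConfig(config);
        try (CloseableHttpClient hc = HttpClients.createDefault();
             CloseableHttpResponse rsp = hc.execute(get)) {
            System.out.println(rsp.getStatusLine());
        }
    }
}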

Thanks,
Shawn



Re: Cost of having multiple search handlers?

2015-09-28 Thread Walter Underwood
We built our own because there was no movement on that. Don’t hold your breath.

Glad to contribute it. We’ve been running it in production for a year, but the 
config is pretty manual.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Sep 28, 2015, at 4:41 PM, Jeff Wartes  wrote:
> 
> 
> One would hope that https://issues.apache.org/jira/browse/SOLR-4735 will
> be done by then. 
> 
> 
> On 9/28/15, 11:39 AM, "Walter Underwood"  wrote:
> 
>> We did the same thing, but reporting performance metrics to Graphite.
>> 
>> But we won’t be able to add servlet filters in 6.x, because it won’t be a
>> webapp.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>> 
>>> On Sep 28, 2015, at 11:32 AM, Gili Nachum  wrote:
>>> 
>>> A different solution to the same need: I'm measuring response times of
>>> different collections measuring  online/batch queries apart using New
>>> Relic. I've added a servlet filter that analyses the request and makes
>>> this
>>> info available to new relic over a request argument.
>>> 
>>> The built in new relic solr plug in doesn't provide much.
>>> On Sep 28, 2015 17:16, "Shawn Heisey"  wrote:
>>> 
 On 9/28/2015 6:30 AM, Oliver Schrenk wrote:
> I want to register multiple but identical search handler to have
 multiple buckets to measure performance for our different apis and
 consumers (and to find out who is actually using Solr).
> 
> Are there some costs associated with having multiple search
 handlers? Are they negligible?
 
 Unless you are creating hundreds or thousands of them, I doubt you'll
 notice any significant increase in resource usage from additional
 handlers.  Each handler definition creates an additional URL endpoint
 within the servlet container, additional object creation within Solr,
 and perhaps an additional thread pool and threads to go with it, so
 it's
 not free, but I doubt that it's significant.  The resources required
 for
 actually handling a request is likely to dwarf what's required for more
 handlers.
 
 Disclaimer: I have not delved into the code to figure out exactly what
 gets created with a search handler config, so I don't know exactly what
 happens.  I'm basing this on general knowledge about how Java programs
 are constructed by expert developers, not specifics about Solr.
 
 There are others on the list who have a much better idea than I do, so
 if I'm wrong, I'm sure one of them will let me know.
 
 Thanks,
 Shawn
 
 
>> 
>