RE: analyzer for Code

2013-06-17 Thread Gian Maria Ricci
I'll have a look at it, thanks to everyone.

--
Gian Maria Ricci
Mobile: +39 320 0136949



-Original Message-
From: Steve Rowe [mailto:sar...@gmail.com] 
Sent: Thursday, June 13, 2013 9:03 PM
To: solr-user@lucene.apache.org
Subject: Re: analyzer for Code

Hi Gian Maria,

OpenGrok has a bunch of JFlex-based computer language tokenizers for
Lucene. Not sure how much work it would be to use them in another
project, though.

There are also a bunch of JFlex grammars around, though most (almost
all?) are not integrated with Lucene.


Looks like at least the Jsyntaxpane and RSyntaxTextArea projects have
multiple programming language lexers.

Steve

On Jun 13, 2013, at 1:40 PM, Gian Maria Ricci 
wrote:

> Thanks for the suggestions, I'll try with the 
> WordDelimiterFilterFactory. My aim is not to have a perfect analysis, 
> just a way to quick search for words in the whole history of a 
> codebase.
>  
> --
> Gian Maria Ricci
> Mobile: +39 320 0136949
>
>  
>  
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Thursday, June 13, 2013 1:24 PM
> To: solr-user@lucene.apache.org; Gian Maria Ricci
> Subject: Re: analyzer for Code
>  
> Well, WordDelimiterFilterFactory would split on the punctuation, so 
> you could add it to the analyzer chain along with StandardAnalyzer.
>  
> You could use one of the regex filters to break up tokens that make it 
> through the analyzer as you see fit.
>  
> But in general, this will be a bunch of compromises since programming 
> languages are, shall we say, not standard.
>  
> Best
> Erick
>  
> 
> On Thu, Jun 13, 2013 at 4:19 AM, Gian Maria Ricci
 wrote:
> I did a little search around and did not find anything interesting. Anyone
know if any analyzers exist to better index source code (e.g. C#, C++, Java,
etc.)?
>  
> The standard analyzer is quite good, but I wish to know if there are more
specific analyzers that can do a better job. E.g., I did a little try with
C# and the full class name was indexed without splitting on the dots, so
MyLib.Helpers.MyClass became one token, and when I searched for MyClass I
found no matches.
>  
> Thanks in advance.
>  
> --
> Gian Maria Ricci
> Mobile: +39 320 0136949
>
>  
>  
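
As a rough sketch, Erick's WordDelimiterFilterFactory suggestion could be
wired into schema.xml like this (the field type name and option values here
are illustrative, not from the thread):

  <fieldType name="text_code" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- splits MyLib.Helpers.MyClass on the dots (and on case changes)
           into MyLib / Helpers / MyClass; preserveOriginal keeps the full
           dotted token searchable as well -->
      <filter class="solr.WordDelimiterFilterFactory"
              generateWordParts="1" generateNumberParts="1"
              splitOnCaseChange="1" preserveOriginal="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>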




RE: analyzer for Code

2013-06-17 Thread Gian Maria Ricci
Thanks very much for the suggestion, it sounds interesting.

--
Gian Maria Ricci
Mobile: +39 320 0136949



-Original Message-
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] 
Sent: Friday, June 14, 2013 2:59 AM
To: solr-user@lucene.apache.org; alkamp...@nablasoft.com
Subject: Re: analyzer for Code

Gian,

Lucene in Action has a case study from Krugle about their analysis for a
code search engine, if you want to look there.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Thu, Jun 13, 2013 at 4:19 AM, Gian Maria Ricci
wrote:

> I did a little search around and did not find anything interesting.
> Anyone know if any analyzers exist to better index source code (e.g.
> C#, C++, Java, etc.)?
>
> The standard analyzer is quite good, but I wish to know if there are
> more specific analyzers that can do a better job. E.g., I did a little
> try with C# and the full class name was indexed without splitting on
> the dots, so MyLib.Helpers.MyClass became one token, and when I
> searched for MyClass I found no matches.
>
> Thanks in advance.
>
> --
> Gian Maria Ricci
> Mobile: +39 320 0136949
>


Different scores for exact and non-exact matching

2013-06-17 Thread Daniel Mosesson
What I am looking to do is take a field (called "name", for example) that
contains a string like:

"This is a sample string"

and then query by that field so that a search for "This" gets x points (exact 
match), "sam" gets y points (partial match).

I attempted to do this via the sort and query parameters, like so:

sort=if(name==This,100,50) but this gives me an error:
sort param could not be parsed as a query, and is not a field that exists in 
the index: if(name==This,100,50)

Full URL:
http://localhost:8983/solr/db/select?q=name%3A*&sort=if(name%3D%3D%22This%22%2C100%2C50)+asc&fl=price%2Cname&wt=xml&indent=true

Is there a way to do this?

Note: I believe that I can at least get the documents that need to be sorted 
via (name:This AND name:This*) but then I do not know where to go from there 
(as I can't seem to get sort working for any functions).

Can anyone provide some examples for how to do this kind of thing?

Thank you.





Re: Solr Server Add causes java.net.SocketException: No buffer space available

2013-06-17 Thread Snubbel
Hello,

thanks for your replies,

maybe I'm opening too many connections, only I don't know how, because I
don't do this explicitly. Here is a snippet of my code; can anyone explain
where the connections are opened and how I can close them?

SolrServer solrServer = new HttpSolrServer(url);
QueryResponse result = solrServer.query(query);

for (SolrDocument doc : result.getResults()) {
    Map<String, List<String>> atomic_update_map_navigateTo =
        new HashMap<String, List<String>>();
    List<String> idList = new ArrayList<String>();

    idList.add("ID123456789");
    atomic_update_map_navigateTo.put("add", idList);

    doc.setField("my_field_name", atomic_update_map_navigateTo);

    solrServer.add(ClientUtils.toSolrInputDocument(doc));
    commitCounter++;
    if (commitCounter % 50 == 0) {
        solrServer.commit();
    }
}





AW: Best way to match umlauts

2013-06-17 Thread André Widhani
We configure both baseletter conversion (removing accents and umlauts) and 
alternate spelling through the mapping file.

For baseletter conversion and mostly german content we transform all accents 
that are not used in german language (like french é, è, ê etc.) to their 
baseletter. We do not do this for German umlauts, because the assumption is
that a user will know the correct spelling in his or her native language but 
probably not in foreign languages.

For alternate spelling, we use the following mapping:

  # * Alternate spelling
  #
  # Additionally, german umlauts are converted to their base form ("ä" => "ae"),
  # and "ß" is converted to "ss". Which means both spellings can be used to find
  # either one.
  #
  "\u00C4" => "AE"
  "\u00D6" => "OE"
  "\u00DC" => "UE"
  "\u00E4" => "ae"
  "\u00F6" => "oe"
  "\u00DF" => "ss"
  "\u00FC" => "ue"


André
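
A mapping file like that is hooked into the field type with a char filter,
so it runs before tokenization; a minimal sketch (the field type and mapping
file names are placeholders):

  <fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <!-- with the mappings above plus lowercasing, "Müller" and "Mueller"
           both end up indexed as "mueller" -->
      <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-german.txt"/>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>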


Highlighting Response

2013-06-17 Thread Furkan KAMACI
Here is my highlight handler:

<requestHandler name="..." class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">content^0.5 anchor^1.0 title^1.2</str>
    <str name="pf">content^0.5 anchor^1.5 title^1.2 site^1.5</str>
    <str name="fl">url</str>
    <int name="ps">100</int>
    <bool name="hl">true</bool>
    <str name="q.alt">*:*</str>
    <str name="hl.fl">title url content</str>
    <int name="f.title.hl.fragsize">0</int>
    <str name="f.title.hl.alternateField">title</str>
    <int name="f.url.hl.fragsize">0</int>
    <str name="f.url.hl.alternateField">url</str>
    <str name="f.content.hl.fragmenter">regex</str>
  </lst>
</requestHandler>
The return value is as follows (collapsed in the XML view):

  <response> ... </response>
  <highlighting> ... </highlighting>

The response element has just the url list of the results; the highlighting
element has the title, url and content of the results. I don't want the
first list. How can I do that?


Re: Different scores for exact and non-exact matching

2013-06-17 Thread Upayavira
q="This is a sample string"^10 (This is a sample string)^5 fuzzy:(This
is a sample string)

You'd have to define the 'fuzzy' field as an EdgeNGram field, such that
'sample' gets indexed as:

 s
 sa
 sam
 samp
 sampl
 sample

Obviously, that'll take more space in your index, but I believe it would
get you the result you want.

Thus, exact phrase matches are boosted 10, docs that contain some of the
words are boosted 5, and fuzzy matches effectively get a boost of 1 (the
default).

Upayavira
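
A minimal sketch of such an EdgeNGram field type (the name and gram sizes
are illustrative); the grams are produced only at index time, so the query
side analyzes normally:

  <fieldType name="text_edge" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- "sample" -> s, sa, sam, samp, sampl, sample -->
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>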

On Mon, Jun 17, 2013, at 08:46 AM, Daniel Mosesson wrote:
> What I am looking to do is take a field (called "name", for example) that
> contains a string like:
> 
> "This is a sample string"
> 
> and then query by that field so that a search for "This" gets x points
> (exact match), "sam" gets y points (partial match).
> 
> I attempted to do this via the sort and query parameters, like so:
> 
> sort=if(name==This,100,50) but this gives me an error:
> sort param could not be parsed as a query, and is not a field that exists
> in the index: if(name==This,100,50)
> 
> Full URL:
> http://localhost:8983/solr/db/select?q=name%3A*&sort=if(name%3D%3D%22This%22%2C100%2C50)+asc&fl=price%2Cname&wt=xml&indent=true
> 
> Is there a way to do this?
> 
> Note: I believe that I can at least get the documents that need to be
> sorted via (name:This AND name:This*) but then I do not know where to go
> from there (as I can't seem to get sort working for any functions).
> 
> Can anyone provide some examples for how to do this kind of thing?
> 
> Thank you.
> 
> 
> 


Re: sort=geodist() asc

2013-06-17 Thread Erick Erickson
Hmmm, could you simply store a single-valued
point field to use for sorting etc? It seems like
the problem here is partly the same as for
multiValued fields in general: which one
should be used?

Best
Erick

On Mon, Jun 17, 2013 at 1:50 AM, William Bell  wrote:
> This simple feature of "sort=geodist() asc" is very powerful since it
> enables us to move from SOLR 3 to SOLR 4 without rewriting all our queries.
>
> We also use boost=geodist() in some cases, and some bf/bq.
>
> bf=recip(geodist(),2,200,20)&sort=score desc
>
> OR
>
> boost=recip(geodist(),2,200,20)&sort=score desc
>
> I know it specifically says it won't work for geohash multivalue points,
> but what would it take to do it?
>
>
>
>
> --
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076


Re: Solr Server Add causes java.net.SocketException: No buffer space available

2013-06-17 Thread Erick Erickson
Wild shot in the dark here, but try taking the
solrServer.commit() out and rely on the
autocommit parameters in solrconfig.xml.

And configure autocommit to commit, say,
every 5 minutes and do _not_ configure
the "numDocs" bit for autocommit.

If you do that and don't have this problem,
we can talk more about why but it'd
be a fast test.

Best
Erick

On Mon, Jun 17, 2013 at 4:03 AM, Snubbel
 wrote:
> Hello,
>
> thanks for your replies,
>
> maybe I'm opening too many connections, only I don't know how, because I
> don't do this explicitly. Here is a snippet of my code; can anyone explain
> where the connections are opened and how I can close them?
>
> SolrServer solrServer = new HttpSolrServer(url);
> QueryResponse result = solrServer.query(query);
>
> for (SolrDocument doc : result.getResults()) {
>     Map<String, List<String>> atomic_update_map_navigateTo =
>         new HashMap<String, List<String>>();
>     List<String> idList = new ArrayList<String>();
>
>     idList.add("ID123456789");
>     atomic_update_map_navigateTo.put("add", idList);
>
>     doc.setField("my_field_name", atomic_update_map_navigateTo);
>
>     solrServer.add(ClientUtils.toSolrInputDocument(doc));
>     commitCounter++;
>     if (commitCounter % 50 == 0) {
>         solrServer.commit();
>     }
> }
>
>
>


RE: out of memory during indexing due to large incoming queue

2013-06-17 Thread Yoni Amir
Thanks Shawn,
This was very helpful. Indeed I had some terminology problem regarding the 
segment merging. In any case, I tweaked those parameters that you recommended 
and it helped a lot.

I was wondering about your recommendation to use facet.method=enum? Can you 
explain what is the trade-off here? I understand that I gain a benefit by using 
less memory, but what with I lose? Is it speed?

Also, do you know if there is an answer to my original question in this thread? 
Solr has a queue of incoming requests, which, in my case, kept on growing. I 
looked at the code but couldn't find it, I think maybe it is an implicit queue 
in the form of Java's concurrent thread pool or something like that.

Is it possible to limit the size of this queue, or to determine its size during 
runtime? This is the last issue that I am trying to figure out right now.

Also, to answer your question about the field all_text: all the fields are 
stored in order to support partial-update of documents. Most of the fields are 
used for highlighting, all_text is used for searching. I'll gladly omit 
all_text from being stored, but then partial-update won't work.
The reason I didn't use edismax to search all the fields is that the list of
fields is very long. Can edismax handle several hundred fields in the list?
What about dynamic fields? Edismax requires the list to be fixed in the
configuration file, so I can't include dynamic fields there. I could pass
the full list in the 'qf' parameter with every search request, but that
seems wasteful. Also, what about performance? I was told that the best
practice in this case (you have lots of fields and want to search
everything) is to copy everything to a catch-all field.

Thanks again,
Yoni

-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Monday, June 03, 2013 17:08
To: solr-user@lucene.apache.org
Subject: Re: out of memory during indexing due to large incoming queue

On 6/3/2013 1:06 AM, Yoni Amir wrote:
> Solrconfig.xml -> http://apaste.info/dsbv
> 
> Schema.xml -> http://apaste.info/67PI
> 
> This solrconfig.xml file has optimization enabled. I had another file which I 
> can't locate at the moment, in which I defined a custom merge scheduler in 
> order to disable optimization.
> 
> When I say 1000 segments, I mean that's the number I saw in Solr UI. I assume 
> there were much more files than that.

I think we have a terminology problem happening here.  There's nothing you can 
put in a solrconfig.xml file to enable optimization.  Solr will only optimize 
when you explicitly send an optimize command to it.  There is segment merging, 
but that's not the same thing.  Segment merging is completely normal.  Normally 
it's in the background and indexing will continue while it's occurring, but if 
you get too many merges happening at once, that can stop indexing.  I have a 
solution for that:

At the following URL is my indexConfig section, geared towards heavy indexing.
The TieredMergePolicy settings are the equivalent of a legacy mergeFactor of 
35.  I've gone with a lower-than-default ramBufferSizeMB here, to reduce memory 
usage.  The default value for this setting as of version 4.1 is 100:

http://apaste.info/4gaD

One thing that this configuration does which might directly impact on your 
setup is increase the maxMergeCount.  I believe the default value for this is 
3.  This means that if you get more than three "levels" of merging happening at 
the same time, indexing will stop until the number of levels drops.
Because Solr always does the biggest merge first, this can really take a long 
time.  The combination of a large mergeFactor and a larger-than-normal 
maxMergeCount will ensure that this situation never happens.

If you are not using SSD, don't increase maxThreadCount beyond one.  The 
random-access characteristics of regular hard disks will make things go slower 
with more threads, not faster.  With SSD, increasing the threads can make 
things go faster.
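
Since paste links rot, here is a sketch of such an indexConfig section; the
values mirror what is described above, but the exact numbers are
illustrative:

  <indexConfig>
    <ramBufferSizeMB>48</ramBufferSizeMB>
    <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
      <!-- roughly equivalent to a legacy mergeFactor of 35 -->
      <int name="maxMergeAtOnce">35</int>
      <int name="segmentsPerTier">35</int>
    </mergePolicy>
    <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
      <int name="maxMergeCount">6</int>
      <int name="maxThreadCount">1</int>  <!-- keep at 1 unless you have SSDs -->
    </mergeScheduler>
  </indexConfig>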

There are a few high-memory-use things going on in your config/schema.

The first thing that jumped out at me is facets.  They use a lot of memory.  
You can greatly reduce the memory use by adding &facet.method=enum to the 
query.  The default for the method is fc, which means fieldcache.  The size of 
the Lucene fieldcache cannot be directly controlled by Solr, unlike Solr's own 
caches.  It gets as big as it needs to be, and facets using the fc method will 
put all the facet data for the entire index in the fieldcache.

The second thing that jumped out at me is the fact that all_text is being 
stored.  Apparently this is for highlighting.  I will admit that I do not know 
anything about highlighting, so you might need separate help there.  You are 
using edismax for your query parser, which is perfectly capable of searching 
all the fields that make up all_text, so in my mind, all_text doesn't need to 
exist at all.

If you wrote a custom merge scheduler that d

Re: Solr large boolean filter

2013-06-17 Thread Igor Kustov
> Where do the long list of IDs come from? 

I'm indexing a database, so the ID list is a security access control list.





RE: PostingsHighlighter and analysis

2013-06-17 Thread Markus Jelsma
Hi,

Any intelligent suggestions for this issue?

Thanks,
Markus 
 
-Original message-
> From:Trey Hyde 
> Sent: Mon 11-Mar-2013 21:44
> To: solr-user@lucene.apache.org
> Subject: PostingsHighlighter and analysis
> 
> debug=timing has told me for a very long time that 99% of my query time for 
> slow queries is in the highlighting component so I've been eagerly awaiting 
> the PostingsHighlighter for quite some time. Mean query times are 50ms or
> less, with certain queries able to generate > 30s worth of highlighting.
> Now that it's here I've been somewhat disappointed: I can't use it, since
> so many common analyzers emit tokens out of order, which, apparently, is
> not compatible with storeOffsetsWithPositions.
> 
> The only analyzer in the "bad" list according to LUCENE-4641 that is
> really critical to our searches is the WordDelimiter filter.
> 
> My current index time filter config (which I believe has been unchanged
> for me for 5+ years):
>
>   <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>           generateNumberParts="1" catenateWords="1"
>           catenateNumbers="1" catenateAll="0"/>
> 
> Does anyone have any suggestions to deal with this? Perhaps limiting
> certain options will always produce tokens in order?
> 
> Thanks
> 
> Trey Hyde 
> Director of Engineering
> Email th...@centraldesktop.com
> 
> Central Desktop. Work together in ways you never thought possible. 
> Connect with us   Website  |  Twitter  |  Facebook  |  LinkedIn  |  Google+  
> |  Blog 
> 
> 
> 


Re: Solr Server Add causes java.net.SocketException: No buffer space available

2013-06-17 Thread Snubbel
Hello,

I did set autoCommit to 5 minutes and removed all commit statements but
one, because, you see, my test case is as follows:

I need a huge number of documents in Solr. Then I want to update them with
atomic updates and, for comparison, the "classical" way, like we did before
Solr 4.3.
So, I need to do at least one hard commit to get the initial documents into
Solr and to be able to load them by query.

That one commit already seems to be enough to exhaust the available
connections.

Actually, the exception happens when adding the documents, though, not on
commit:

solrServer.add(ClientUtils.toSolrInputDocument(docModule));
 
17.06.2013 13:45:44 org.apache.http.impl.client.DefaultRequestDirector
tryConnect
INFO: I/O exception (java.net.SocketException) caught when connecting to the
target host: No buffer space available (maximum connections reached?):
connect
17.06.2013 13:45:44 org.apache.http.impl.client.DefaultRequestDirector
tryConnect
INFO: Retrying connect
org.apache.solr.client.solrj.SolrServerException: IOException occured when
talking to server at: http://localhost:/solr-4.3.0
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:413)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
at
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102)
at
de.exxcellent.connect.portal.business.solr.AtomicUpdateTest.addKontaktgehaeuseToModule(AtomicUpdateTest.java:234)
at
de.exxcellent.connect.portal.business.solr.AtomicUpdateTest.performanceAtomicVSclassicUpdateTest(AtomicUpdateTest.java:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:80)
at org.testng.internal.Invoker.invokeMethod(Invoker.java:714)
at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:901)
at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1231)
at
org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:127)
at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:111)
at org.testng.TestRunner.privateRun(TestRunner.java:767)
at org.testng.TestRunner.run(TestRunner.java:617)
at org.testng.SuiteRunner.runTest(SuiteRunner.java:334)
at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:329)
at org.testng.SuiteRunner.privateRun(SuiteRunner.java:291)
at org.testng.SuiteRunner.run(SuiteRunner.java:240)
at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86)
at org.testng.TestNG.runSuitesSequentially(TestNG.java:1198)
at org.testng.TestNG.runSuitesLocally(TestNG.java:1123)
at org.testng.TestNG.run(TestNG.java:1031)
at org.testng.remote.RemoteTestNG.run(RemoteTestNG.java:111)
at org.testng.remote.RemoteTestNG.initAndRun(RemoteTestNG.java:204)
at org.testng.remote.RemoteTestNG.main(RemoteTestNG.java:175)
at org.testng.RemoteTestNGStarter.main(RemoteTestNGStarter.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
Caused by: java.net.SocketException: No buffer space available (maximum
connections reached?): connect
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:529)
at
org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127)
at
org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180)
at
org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
at
org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:640)
at
org.a

Searching for cache stats

2013-06-17 Thread John Nielsen

  
  
Hi,

I am looking for an automated way of getting cache stats from Solr.

Specifically, what I am looking for are the cumulative evictions for
each cache type for each core:

http://screencast.com/t/IrD0VItfVduk

An example of how I would like to be able to query the cache
information is basically how I get core information, like this:

http://URL:8000/solr/admin/cores

Does anything similar exist which will allow me to get the cache
information?

--
Med venlig hilsen / Best regards

John Nielsen
Programmer

MCB A/S
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk



Re: Solr Server Add causes java.net.SocketException: No buffer space available

2013-06-17 Thread Snubbel
I did try something else: I added a list of SolrInputDocuments containing
500 documents at a time. Now it works.

Adding the documents one at a time seems to be too much after a while (even
with commits after every 500 documents).
But this is only in Solr 4.3; in 4.0 this was possible without a problem.

So, could this be a bug?

Best regards, Snubbel





Re: Searching for cache stats

2013-06-17 Thread Stefan Matheis

John,

The UI is using /solr/collection1/admin/mbeans?stats=true to get those
values. Does this help?

- Stefan
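
For scripting it, the same handler can be filtered and formatted with
request parameters, e.g. (assuming a core named collection1):

http://URL:8000/solr/collection1/admin/mbeans?stats=true&cat=CACHE&wt=json

The cumulative evictions you are after should show up as the
cumulative_evictions stat under each cache entry.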

 
On Monday, June 17, 2013 at 2:32 PM, John Nielsen wrote:


   


   
   
> Hi,
>
> I am looking for an automated way of getting cache stats from Solr.
>
> Specifically, what I am looking for are the cumulative evictions for
> each cache type for each core:
>
> http://screencast.com/t/IrD0VItfVduk
>
> An example of how I would like to be able to query the cache
> information is basically how I get core information, like this:
>
> http://URL:8000/solr/admin/cores
>
> Does anything similar exist which will allow me to get the cache
> information?





Re: Solr large boolean filter

2013-06-17 Thread Jack Krupansky

That would have been one of my top guesses.

Take a look at LucidWorks Search and how they have a built-in role-based 
document access control component. They call the feature "Search Filters":


http://docs.lucidworks.com/display/help/Search+Filters+for+Access+Control

-- Jack Krupansky

-Original Message- 
From: Igor Kustov

Sent: Monday, June 17, 2013 6:52 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr large boolean filter


> Where do the long list of IDs come from?

I'm indexing a database, so the ID list is a security access control list.






Re: AW: Best way to match umlauts

2013-06-17 Thread Jack Krupansky
And this is a key advantage of using the mapping char filter rather than the 
simple ASCII folding token filter - you can easily go in and modify the 
mappings for application/domain/environment-specific character mappings such 
as these.


-- Jack Krupansky

-Original Message- 
From: André Widhani

Sent: Monday, June 17, 2013 4:27 AM
To: solr-user@lucene.apache.org
Subject: AW: Best way to match umlauts

We configure both baseletter conversion (removing accents and umlauts) and 
alternate spelling through the mapping file.


For baseletter conversion and mostly german content we transform all accents 
that are not used in german language (like french é, è, ê etc.) to their 
baseletter. We do not do this for German umlauts, because the assumption
is that a user will know the correct spelling in his or her native language 
but probably not in foreign languages.


For alternate spelling, we use the following mapping:

 # * Alternate spelling
 #
 # Additionally, german umlauts are converted to their base form ("ä" => "ae"),
 # and "ß" is converted to "ss". Which means both spellings can be used to find
 # either one.
 #
 "\u00C4" => "AE"
 "\u00D6" => "OE"
 "\u00DC" => "UE"
 "\u00E4" => "ae"
 "\u00F6" => "oe"
 "\u00DF" => "ss"
 "\u00FC" => "ue"


André



503 - server is shutting down error

2013-06-17 Thread gururaj kosuru
Hi,
   I am trying to run Solr 4.3 on a standalone system using Tomcat 6
and I am facing an ExceptionInInitializerError. I am attaching the trace of
the log for details. I also get a log4j error (no such file or directory),
but I think it is not related to this. The error log says:

[main] ERROR org.apache.solr.core.SolrCore  ?
null:java.lang.ExceptionInInitializerError
  at
org.apache.solr.core.SolrResourceLoader.reloadLuceneSPI(SolrResourceLoader.java:201)
  at
org.apache.solr.core.SolrResourceLoader.<init>(SolrResourceLoader.java:115)
  at
org.apache.solr.core.SolrResourceLoader.<init>(SolrResourceLoader.java:250)
  at org.apache.solr.core.CoreContainer.load(CoreContainer.java:380)
  at org.apache.solr.core.CoreContainer.load(CoreContainer.java:358)
  at
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:326)
  at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:124)
  at
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
  at
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
  at
org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:115)
  at
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4072)
  at
org.apache.catalina.core.StandardContext.start(StandardContext.java:4726)
  at
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
  at
org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
  at
org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601)
  at
org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:943)
  at
org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:778)
  at
org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:504)
  at
org.apache.catalina.startup.HostConfig.start(HostConfig.java:1317)
  at
org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
  at
org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
  at
org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1065)
  at
org.apache.catalina.core.StandardHost.start(StandardHost.java:840)
  at
org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1057)
  at
org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463)
  at
org.apache.catalina.core.StandardService.start(StandardService.java:525)
  at
org.apache.catalina.core.StandardServer.start(StandardServer.java:754)
  at org.apache.catalina.startup.Catalina.start(Catalina.java:595)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
  at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
  Caused by: java.lang.ClassCastException: class
org.apache.lucene.facet.codecs.facet42.Facet42DocValuesFormat
  at java.lang.Class.asSubclass(Class.java:3027)
  at
org.apache.lucene.util.SPIClassIterator.next(SPIClassIterator.java:137)
  at
org.apache.lucene.util.NamedSPILoader.reload(NamedSPILoader.java:65)
  at
org.apache.lucene.util.NamedSPILoader.<init>(NamedSPILoader.java:47)
  at
org.apache.lucene.util.NamedSPILoader.<init>(NamedSPILoader.java:37)
  at
org.apache.lucene.codecs.DocValuesFormat.<init>(DocValuesFormat.java:43)
  ... 34 more


Any help is appreciated.
Thanks,
Gururaj


Re: Slow Highlighter Performance Even Using FastVectorHighlighter

2013-06-17 Thread Floyd Wu
Hi Michael, how do I configure the PostingsHighlighter with my Solr 4.2 box?
Please kindly point me in the right direction. Many thanks.
On 2013/6/15 at 10:48 PM, "Michael McCandless" wrote:

> You could also try the new[ish] PostingsHighlighter:
>
> http://blog.mikemccandless.com/2012/12/a-new-lucene-highlighter-is-born.html
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Sat, Jun 15, 2013 at 8:50 AM, Michael Sokolov
>  wrote:
> > If you have very large documents (many MB) that can lead to slow
> > highlighting, even with FVH.
> >
> > See https://issues.apache.org/jira/browse/LUCENE-3234
> >
> > and try setting phraseLimit=1 (or some bigger number, but not infinite,
> > which is the default)
> >
> > -Mike
> >
> >
> >
> > On 6/14/13 4:52 PM, Andy Brown wrote:
> >>
> >> Bryan,
> >>
> >> For specifics, I'll refer you back to my original email where I
> >> specified all the fields/field types/handlers I use. Here's a general
> >> overview.
> >>   I really only have 3 fields that I index and search against: "name",
> >> "description", and "content". All of which are just general text
> >> (string) fields. I have a catch-all field called "text" that is only
> >> used for querying. It's indexed but not stored. The "name",
> >> "description", and "content" fields are copied into the "text" field.
> >>   For partial word matching, I have 4 more fields: "name_par",
> >> "description_par", "content_par", and "text_par". The "text_par" field
> >> has the same relationship to the "*_par" fields as "text" does to the
> >> others (only used for querying). Those partial word matching fields are
> >> of type "text_general_partial" which I created. That field type is
> >> analyzed different than the regular text field in that it goes through
> >> an EdgeNGramFilterFactory with the minGramSize="2" and maxGramSize="7"
> >> at index time.
> >> I query against both "text" and "text_par" fields using edismax deftype
> >> with my qf set to "text^2 text_par^1" to give full word matches a higher
> >> score. This part returns back very fast as previously stated. It's when
> >> I turn on highlighting that I take the huge performance hit.
> >>   Again, I'm using the FastVectorHighlighting. The hl.fl is set to "name
> >> name_par description description_par content content_par" so that it
> >> returns highlights for full and partial word matches. All of those
> >> fields have indexed, stored, termPositions, termVectors, and termOffsets
> >> set to "true".
> >>   It all seems redundant just to allow for partial word
> >> matching/highlighting but I didn't know of a better way. Does anything
> >> stand out to you that could be the culprit? Let me know if you need any
> >> more clarification.
> >>   Thanks!
> >>   - Andy
> >>
> >> -Original Message-
> >> From: Bryan Loofbourrow [mailto:bloofbour...@knowledgemosaic.com]
> >> Sent: Wednesday, May 29, 2013 5:44 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: RE: Slow Highlighter Performance Even Using
> >> FastVectorHighlighter
> >>
> >> Andy,
> >>
> >>> I don't understand why it's taking 7 secs to return highlights. The size
> >>> of the index is only 20.93 MB. The JVM heap Xms and Xmx are both set to
> >>> 1024 for this verification purpose and that should be more than enough.
> >>>
> >>> The processor is plenty powerful enough as well.
> >>>
> >>> Running VisualVM shows all my CPU time being taken by mainly these 3
> >>> methods:
> >>>
> >>>
> >>> org.apache.lucene.search.vectorhighlight.FieldPhraseList$WeightedPhraseInfo.getStartOffset()
> >>> org.apache.lucene.search.vectorhighlight.FieldPhraseList$WeightedPhraseInfo.getStartOffset()
> >>> org.apache.lucene.search.vectorhighlight.FieldPhraseList.addIfNoOverlap()
> >>
> >> That is a strange and interesting set of things to be spending most of
> >> your CPU time on. The implication, I think, is that the number of term
> >> matches in the document for terms in your query (or, at least, terms
> >> matching exact words or the beginning of phrases in your query) is
> >> extremely high. Perhaps that's coming from this "partial word match"
> >> you mention -- how does that work?
> >>
> >> -- Bryan
> >>
> >>> My guess is that this has something to do with how I'm handling partial
> >>> word matches/highlighting. I have setup another request handler that
> >>> only searches the whole word fields and it returns in 850 ms with
> >>> highlighting.
> >>>
> >>> Any ideas?
> >>>
> >>> - Andy
> >>>
> >>>
> >>> -Original Message-
> >>> From: Bryan Loofbourrow [mailto:bloofbour...@knowledgemosaic.com]
> >>> Sent: Monday, May 20, 2013 1:39 PM
> >>> To: solr-user@lucene.apache.org
> >>> Subject: RE: Slow Highlighter Performance Even Using
> >>> FastVectorHighlighter
> >>>
> >>> My guess is that the problem is those 200M documents.
> >>> FastVectorHighlighter is fast at deciding whether a match, especially a
> >>> phrase, appear

How to get SolrJ-serialization / binary-size statistics?

2013-06-17 Thread Ralf Heyde

Hi Everybody,

The as-is situation:
We have an application (on Server 1) which fires many (up to 20)
Solr queries (against Server 2) to produce its result. Since we have network
latency for transport and serialization, we will shift the query part to
Server 2. The idea behind this is that the complete search logic and
"re-crawling" must not be done over the network interface via a "slow"
connection.
The queries depend on each other: when query 1 produces a result,
depending on that result further queries may be fired (or not).

The logic is mostly shifted and is working. We get feasible results
(using a binary format) for serialization / delivery, and we can measure
what time it takes to serialize the result set and see the message size
which is passed over the network.

The question:
Is there queryable statistical information where I can get details
about the serialization (SolrJ) and the message size?
If not, is there a way to build a custom component / filter which can
access / log this information?

We currently have a filter on Tomcat level for slow-query purposes,
but this filter only surrounds:

chain.doFilter(request, response);

What I want to know: how expensive is the serialization (SolrJ) of the
20 queries mentioned above in comparison to the single serialization of
the object described before.

I hope you understand what I want.

Regards, Ralf


Re: sort=geodist() asc

2013-06-17 Thread Smiley, David W.
Bill, I added this comment:
https://issues.apache.org/jira/browse/SOLR-2345?focusedCommentId=13685627&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13685627

On 6/17/13 1:50 AM, "William Bell"  wrote:

>This simple feature of "sort=geodist() asc" is very powerful since it
>enables us to move from SOLR 3 to SOLR 4 without rewriting all our
>queries.
>
>We also use boost=geodist() in some cases, and some bf/bq.
>
>bf=recip(geodist(),2,200,20)&sort=score desc
>
>OR
>
>boost=recip(geodist(),2,200,20)&sort=score desc
>
>I know it specifically says it won't work for geohash multivalue points,
>but what would it take to do it?
>
>
>
>
>-- 
>Bill Bell
>billnb...@gmail.com
>cell 720-256-8076



Need assistance in defining solr to process user generated query text

2013-06-17 Thread Mysurf Mail
Hi,
I have been reading the Solr wiki pages and configured Solr successfully
over my flat table.
I have a few questions though regarding the querying and parsing of user
generated text.

1. I have understood (through the wiki) that I want to use dismax, and the
wiki shows how to select it using localparams. But I think the best way is
to define this in my xml files. Can I do this?

2. In the Solr tutorial the following query appears:

http://localhost:8983/solr/#/collection1/query?q=video

When I want to query my fact table I have to query using *video*;
just video retrieves nothing.
How can I query it using video only?

3. The wiki says that "Extended DisMax is already configured in the example
configuration, with the name edismax", but I see it only in the /browse
requestHandler, as follows:

<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    ...
    <str name="defType">edismax</str>

Do I also use it when I use select in my url?

4. In general, I want to transfer user generated text to my url request
using the most standard rules (translate "", +, - signs to the q parameter
value). What is the best way to do that?

Thanks.


SolrJ Howto get local params from QueryResponse

2013-06-17 Thread Holger Rieß

Hi,
how can I get local params like '{!ex=dyn,cls}AAA001001_0_1.1.1_ss' from a
QueryResponse? I've tagged filter queries and facet fields with different
tags (e.g. 'dyn', 'cls').
I can see the tags in the QueryResponse XML facet.field section:

{!ex=dyn}AAA001001_0_1.1.1_ss
...

But the FacetField class has no method List<String> getLocalParams().
My goal is to build a server-side representation of the QueryResponse
components and sort the facet fields on the client side by the tag.

Thanks, Holger


Filtered Query in Solr

2013-06-17 Thread Prathik Puthran
Hi,

I am making a select request to Solr with 'fq=asset_type:MUSIC ALBUM'
(see query 1 below) as one of the GET parameters. This request does not
return any results. However when I send the select request with the
parameter 'asset_type=MUSIC ALBUM'(see query 2 below) I get the results.

Does the filtered query parser do anything special (like split based on the
spaces) before processing the request? How do I avoid this from happening?

Query 1 -->
http://localhost:8080/solr/assets/select?q=amitabh&fq=asset_type%3AMUSIC%20ALBUM&wt=json

Query 2 -->
http://localhost:8080/solr/assets/select?wt=json&q=amitabh&indent=true&sort=release_year%20desc&asset_type=MUSIC%20ALBUM


Thanks,
Prathik


Re: Need assistance in defining solr to process user generated query text

2013-06-17 Thread Jack Krupansky
It sounds like you have your text indexed in a "string" field (why the 
wildcards are needed), or that maybe you are using the "keyword" tokenizer 
rather than the standard tokenizer.


What is your default or query fields for dismax/edismax? And what are the 
field types for those fields?


-- Jack Krupansky

-Original Message- 
From: Mysurf Mail

Sent: Monday, June 17, 2013 10:51 AM
To: solr-user@lucene.apache.org
Subject: Need assistance in defining solr to process user generated query 
text


Hi,
I have been reading the Solr wiki pages and configured Solr successfully
over my flat table.
I have a few questions though regarding the querying and parsing of user
generated text.

1. I have understood (through the wiki) that I want to use dismax, and the
wiki shows how to select it using localparams. But I think the best way is
to define this in my xml files. Can I do this?

2. In the Solr tutorial the following query appears:

http://localhost:8983/solr/#/collection1/query?q=video

When I want to query my fact table I have to query using *video*;
just video retrieves nothing.
How can I query it using video only?

3. The wiki says that "Extended DisMax is already configured in the example
configuration, with the name edismax", but I see it only in the /browse
requestHandler.

Do I also use it when I use select in my url?

4. In general, I want to transfer user generated text to my url request
using the most standard rules (translate "", +, - signs to the q parameter
value). What is the best way to do that?

Thanks.



Any way to have the suggest component be filter query aware?

2013-06-17 Thread Brendan Grainger
Hi All,

I expect the answer is no, but just to be sure I am wondering if there is
any way to make the suggest component (http://wiki.apache.org/solr/Suggester)
filter query aware, i.e. I'd like to have suggestions for a given context,
so say if I were searching in the book lucene in action suggestions would
be offered for terms that exist in that book not the entire index.

Otherwise I guess I should look at using EdgeNGramFilter?

Thanks
Brendan

-- 
Brendan Grainger
www.kuripai.com


Re: Filtered Query in Solr

2013-06-17 Thread Upayavira
Your fq query is:

fq=asset_type:MUSIC ALBUM

This is actually interpreted as:
fq=asset_type:MUSIC text:ALBUM

You probably want:
fq=asset_type:"MUSIC ALBUM"
or
fq=asset_type:(+MUSIC +ALBUM)
or even:
fq={!term f=asset_type}MUSIC ALBUM
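
(The {!term} form bypasses query-time analysis and matches one raw indexed
term, so it works when the field holds the whole value as a single term,
e.g. a string field.)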

Upayavira

On Mon, Jun 17, 2013, at 03:57 PM, Prathik Puthran wrote:
> Hi,
> 
> I am making a select request to solr with with 'fq=asset_type:MUSIC
> ALBUM'
> (see query 1 below) as one of the GET parameter. This request does not
> return any results. However when I send the select request with the
> parameter 'asset_type=MUSIC ALBUM'(see query 2 below) I get the results.
> 
> Does the filtered query parser do anything special (like split based on
> the
> spaces) before processing the request? How do I avoid this from
> happening?
> 
> Query 1 -->
> http://localhost:8080/solr/assets/select?q=amitabh&fq=asset_type%3AMUSIC%20ALBUM&wt=json
> 
> Query 2 -->
> http://localhost:8080/solr/assets/select?wt=json&q=amitabh&indent=true&sort=release_year%20desc&asset_type=MUSIC%20ALBUM
> 
> 
> Thanks,
> Prathik


Re: Filtered Query in Solr

2013-06-17 Thread Prathik Puthran
The first one i.e. fq=asset_type:"MUSIC ALBUM" doesen't work.

However the 2nd one works
fq=asset_type:(+MUSIC +ALBUM)

Thanks for the response.

Regards,
Prathik


On Mon, Jun 17, 2013 at 8:41 PM, Upayavira  wrote:

> Your fq query is:
>
> fq=asset_type:MUSIC ALBUM
>
> This is actually interpreted as:
> fq=asset_type:MUSIC text:ALBUM
>
> You probably want:
> fq=asset_type:"MUSIC ALBUM"
> or
> fq=asset_type:(+MUSIC +ALBUM)
> or even:
> fq={!term f=asset_type}MUSIC ALBUM
>
> Upayavira
>
> On Mon, Jun 17, 2013, at 03:57 PM, Prathik Puthran wrote:
> > Hi,
> >
> > I am making a select request to Solr with 'fq=asset_type:MUSIC
> > ALBUM'
> > (see query 1 below) as one of the GET parameter. This request does not
> > return any results. However when I send the select request with the
> > parameter 'asset_type=MUSIC ALBUM'(see query 2 below) I get the results.
> >
> > Does the filtered query parser do anything special (like split based on
> > the
> > spaces) before processing the request? How do I avoid this from
> > happening?
> >
> > Query 1 -->
> >
> http://localhost:8080/solr/assets/select?q=amitabh&fq=asset_type%3AMUSIC%20ALBUM&wt=json
> >
> > Query 2 -->
> >
> http://localhost:8080/solr/assets/select?wt=json&q=amitabh&indent=true&sort=release_year%20desc&asset_type=MUSIC%20ALBUM
> >
> >
> > Thanks,
> > Prathik
>


Re: Filtered Query in Solr

2013-06-17 Thread Prathik Puthran
Can you please explain why the 2nd one works?




On Mon, Jun 17, 2013 at 8:49 PM, Prathik Puthran <
prathik.puthra...@gmail.com> wrote:

> The first one i.e. fq=asset_type:"MUSIC ALBUM" doesen't work.
>
> However the 2nd one works
> fq=asset_type:(+MUSIC +ALBUM)
>
> Thanks for the response.
>
> Regards,
> Prathik
>
>
> On Mon, Jun 17, 2013 at 8:41 PM, Upayavira  wrote:
>
>> Your fq query is:
>>
>> fq=asset_type:MUSIC ALBUM
>>
>> This is actually interpreted as:
>> fq=asset_type:MUSIC text:ALBUM
>>
>> You probably want:
>> fq=asset_type:"MUSIC ALBUM"
>> or
>> fq=asset_type:(+MUSIC +ALBUM)
>> or even:
>> fq={!term f=asset_type}MUSIC ALBUM
>>
>> Upayavira
>>
>> On Mon, Jun 17, 2013, at 03:57 PM, Prathik Puthran wrote:
>> > Hi,
>> >
>> > I am making a select request to Solr with 'fq=asset_type:MUSIC
>> > ALBUM'
>> > (see query 1 below) as one of the GET parameter. This request does not
>> > return any results. However when I send the select request with the
>> > parameter 'asset_type=MUSIC ALBUM'(see query 2 below) I get the results.
>> >
>> > Does the filtered query parser do anything special (like split based on
>> > the
>> > spaces) before processing the request? How do I avoid this from
>> > happening?
>> >
>> > Query 1 -->
>> >
>> http://localhost:8080/solr/assets/select?q=amitabh&fq=asset_type%3AMUSIC%20ALBUM&wt=json
>> >
>> > Query 2 -->
>> >
>> http://localhost:8080/solr/assets/select?wt=json&q=amitabh&indent=true&sort=release_year%20desc&asset_type=MUSIC%20ALBUM
>> >
>> >
>> > Thanks,
>> > Prathik
>>
>
>


Re: Need assistance in defining solr to process user generated query text

2013-06-17 Thread Mysurf Mail
I have one fact table with a lot of string columns and a few GUIDs just for
retrieval (not for search).



On Mon, Jun 17, 2013 at 6:01 PM, Jack Krupansky wrote:

> It sounds like you have your text indexed in a "string" field (why the
> wildcards are needed), or that maybe you are using the "keyword" tokenizer
> rather than the standard tokenizer.
>
> What is your default or query fields for dismax/edismax? And what are the
> field types for those fields?
>
> -- Jack Krupansky
>
> -Original Message- From: Mysurf Mail
> Sent: Monday, June 17, 2013 10:51 AM
> To: solr-user@lucene.apache.org
> Subject: Need assistance in defining solr to process user generated query
> text
>
>
> Hi,
> I have been reading the Solr wiki pages and configured Solr successfully
> over my flat table.
> I have a few questions though regarding the querying and parsing of user
> generated text.
>
> 1. I have understood (through the wiki) that I want to use dismax, and
> the wiki shows how to select it using localparams. But I think the best
> way is to define this in my xml files. Can I do this?
>
> 2. In the Solr tutorial the following query appears:
>
> http://localhost:8983/solr/#/collection1/query?q=video
>
> When I want to query my fact table I have to query using *video*;
> just video retrieves nothing.
> How can I query it using video only?
>
> 3. The wiki says that "Extended DisMax is already configured in the
> example configuration, with the name edismax", but I see it only in the
> /browse requestHandler.
>
> Do I also use it when I use select in my url?
>
> 4. In general, I want to transfer user generated text to my url request
> using the most standard rules (translate "", +, - signs to the q
> parameter value). What is the best way to do that?
>
> Thanks.
>


Re: Filtered Query in Solr

2013-06-17 Thread Jack Krupansky
What does the actual indexed data look like? Maybe "ALBUM" doesn't 
immediately follow "MUSIC", at least in that particular field. Or, maybe you 
added "MUSIC" and "ALBUM" as two separate values for that field and Solr 
then implicitly added the +100 position gap between them.


-- Jack Krupansky

-Original Message- 
From: Prathik Puthran

Sent: Monday, June 17, 2013 11:20 AM
To: solr-user@lucene.apache.org
Subject: Re: Filtered Query in Solr

Can you please explain why the 2nd one works?




On Mon, Jun 17, 2013 at 8:49 PM, Prathik Puthran <
prathik.puthra...@gmail.com> wrote:


The first one i.e. fq=asset_type:"MUSIC ALBUM" doesen't work.

However the 2nd one works
fq=asset_type:(+MUSIC +ALBUM)

Thanks for the response.

Regards,
Prathik


On Mon, Jun 17, 2013 at 8:41 PM, Upayavira  wrote:


Your fq query is:

fq=asset_type:MUSIC ALBUM

This is actually interpreted as:
fq=asset_type:MUSIC text:ALBUM

You probably want:
fq=asset_type:"MUSIC ALBUM"
or
fq=asset_type:(+MUSIC +ALBUM)
or even:
fq={!term f=asset_type}MUSIC ALBUM

Upayavira

On Mon, Jun 17, 2013, at 03:57 PM, Prathik Puthran wrote:
> Hi,
>
> I am making a select request to Solr with 'fq=asset_type:MUSIC
> ALBUM'
> (see query 1 below) as one of the GET parameter. This request does not
> return any results. However when I send the select request with the
> parameter 'asset_type=MUSIC ALBUM'(see query 2 below) I get the 
> results.

>
> Does the filtered query parser do anything special (like split based on
> the
> spaces) before processing the request? How do I avoid this from
> happening?
>
> Query 1 -->
>
http://localhost:8080/solr/assets/select?q=amitabh&fq=asset_type%3AMUSIC%20ALBUM&wt=json
>
> Query 2 -->
>
http://localhost:8080/solr/assets/select?wt=json&q=amitabh&indent=true&sort=release_year%20desc&asset_type=MUSIC%20ALBUM
>
>
> Thanks,
> Prathik








Re: Sorting by field is slow

2013-06-17 Thread Shane Perry
Using 4.3.1-SNAPSHOT I have identified where the issue is occurring.  For a
query in the format (it returns one document, sorted by field4)

+(field0:UUID0) -field1:string0 +field2:string1 +field3:text0
+field4:"text1"


with the field types [the fieldType definitions were stripped from the
archived message],

in the method FieldCacheImpl$SortedDocValuesCache#createValue the reader
reports 2,640,449 terms. As a result, the loop on line 1198 is executed
2,640,449 times and the inner loop is executed a total of 658,310,778 times.
My index contains 56,180,128 documents.

My configuration file sets the warming queries for the newSearcher and
firstSearcher listeners to a query ("static firstSearcher warming in
solrconfig.xml") that sorts on field4,

which does not appear to affect the speed.  I'm not sure how replication
plays into the equation outside the fact that we are relatively aggressive
on the replication (every 60 seconds).  I fear I may be at the end of my
knowledge without really getting into the code so any help at this point
would be greatly appreciated.

Shane

On Thu, Jun 13, 2013 at 4:11 PM, Shane Perry  wrote:

> I've dug through the code and have narrowed the delay down
> to TopFieldCollector$OneComparatorNonScoringCollector.setNextReader() at
> the point where the comparator's setNextReader() method is called (line 98
> in the lucene_solr_4_3 branch).  That line is actually two method calls so
> I'm not yet certain which path is the cause.  I'll continue to dig through
> the code but am on thin ice so input would be great.
>
> Shane
>
>
> On Thu, Jun 13, 2013 at 7:56 AM, Shane Perry  wrote:
>
>> Erick,
>>
>> We do have soft commits turned on.  Initially, autoCommit was set at 15000
>> and autoSoftCommit at 1000.  We did up those to 120 and 60
>> respectively.  However, since the core in question is a slave, we don't
>> actually do writes to the core but rely on replication only to populate the
>> index.  In this case wouldn't autoCommit and autoSoftCommit essentially be
>> no-ops?  I thought I had pulled out all hard commits but a double check
>> shows one instance where it still occurs.
>>
>> Thanks for your time.
>>
>> Shane
>>
>> On Thu, Jun 13, 2013 at 5:19 AM, Erick Erickson 
>> wrote:
>>
>>> Shane:
>>>
>>> You've covered all the config stuff that I can think of. There's one
>>> other possibility. Do you have the soft commits turned on and are
>>> they very short? Although soft commits shouldn't invalidate any
>>> segment-level caches (but I'm not sure whether the sorting buffers
>>> are low-level or not).
>>>
>>> About the only other thing I can think of is that you're somehow
>>> doing hard commits from, say, the client but that's really
>>> stretching.
>>>
>>> All I can really say at this point is that this isn't a problem I've seen
>>> before, so it's _likely_ some innocent-seeming config has changed.
>>> I'm sure it'll be obvious once you find it ...
>>>
>>> Erick
>>>
>>> On Wed, Jun 12, 2013 at 11:51 PM, Shane Perry  wrote:
>>> > Erick,
>>> >
>>> > I agree, it doesn't make sense.  I manually merged the solrconfig.xml
>>> from
>>> > the distribution example with my 3.6 solrconfig.xml, pulling out what I
>>> > didn't need.  There is the possibility I removed something I shouldn't
>>> have
>>> > though I don't know what it would be.  Minus removing the dynamic
>>> fields, a
>>> > custom tokenizer class, and changing all my fields to be stored, the
>>> > schema.xml file should be the same as well.  I'm not currently in the
>>> > position to do so, but I'll double check those two files.  Finally, the
>>> > data was re-indexed when I moved to 4.3.
>>> >
>>> > My statement about field values wasn't stated very well.  What I meant
>>> is
>>> > that the 'text' field has more unique terms than some of my other
>>> fields.
>>> >
>>> > As for this being an edge case, I'm not sure why it would manifest
>>> itself
>>> > in 4.3 but not in 3.6 (short of me having a screwy configuration
>>> setting).
>>> >  If I get a chance, I'll see if I can duplicate the behavior with a
>>> small
>>> > document count in a sandboxed environment.
>>> >
>>> > Shane
>>> >
>>> > On Wed, Jun 12, 2013 at 5:14 PM, Erick Erickson <
>>> erickerick...@gmail.com>wrote:
>>> >
>>> >> This doesn't make much sense, particularly the fact
>>> >> that you added first/new searchers. I'm assuming that
>>> >> these are sorting on the same field as your slow query.
>>> >>
>>> >> But sorting on a text field for which
>>> >> "Overall, the values of the field are unique"
>>> >> is a red-flag. Solr doesn't sort on fields that have
>>> >> more than one term, so you might as well use a
>>> >> string field and be done with it, it's possible you're
>>> >> hitting some edge case.
>>> >>
>>> >> Did you just copy your 3.6 schema and configs to
>>> >> 4.3? Did you re-index?
>>> >>
>>> >> Best
>>> >> Erick
>>> >>
>>> >> On Wed, Jun 12, 2013 at 5:11 PM, Shane Perry 
>>> wrote:
>>> >> > Thanks for the responses.
>>> >> >
>>> >> > Setting first/newSearcher had no noticeable effect.  I'm sorting on
>>> a
>>> >> > sto

How to define my data in schema.xml

2013-06-17 Thread Mysurf Mail
Hi,
I have created a flat table from my DB and defined a solr core on it.
It works excellent so far.

My problem is that my table has two hierarchies, so when flattened it is too
big.
Let's consider the following example scenario.

My Tables are

School
Students (1:n with school)
Teachers(1:n with school)

Now, each school has many students and teachers, but each student/teacher
also has a multivalued field, i.e. the following tables:

studentHobbies - 1:N with students
teacherCourses - 1:N with teachers

My main entity is School, and that is what I want to get in the result.
Flattening does not help me much and is very expensive.

Can you direct me to how I define 1:n relationships (and 1:n:n)
in data-config.xml?
Thanks.


Is there a way to encrypt username and pass in the solr config file

2013-06-17 Thread Mysurf Mail
Hi,
I want to encrypt (RSA maybe?) my username/password in Solr.
I can't leave simple plain text on the server.
What is the recommended way?
Thanks.


Re: out of memory during indexing do to large incoming queue

2013-06-17 Thread Shawn Heisey

On 6/17/2013 4:32 AM, Yoni Amir wrote:

I was wondering about your recommendation to use facet.method=enum? Can you 
explain what the trade-off is here? I understand that I gain a benefit by using 
less memory, but what do I lose? Is it speed?


The problem with facet.method=fc (the default) and memory is that every 
field and query that you use for faceting ends up separately cached in 
the FieldCache, and the memory required grows as your index grows.  If 
you only use facets on one or two fields, then the normal method is 
fine, and subsequent facets will be faster.  It does eat a lot of java 
heap memory, though ... and the bigger your java heap is, the more 
problems you'll have with garbage collection.


With enum, it must gather the data out of the index for every facet run. 
 If you have plenty of extra memory for the OS disk cache, this is not 
normally a major issue, because it will be pulled out of RAM, similar to 
what happens with fc, except that it's not java heap memory.  The OS is 
a lot more efficient with how it uses memory than Java is.
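
If it helps, the method can also be switched per field rather than 
globally, so only the memory-hungry fields pay the enum cost.  A hedged 
SolrJ sketch -- the "category" field name is just an illustration, not 
something from your schema:

    import org.apache.solr.client.solrj.SolrQuery;

    SolrQuery q = new SolrQuery("*:*");
    q.setFacet(true);
    q.addFacetField("category");
    // per-field override; any other facet fields keep the default (fc)
    q.set("f.category.facet.method", "enum");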



Also, do you know if there is an answer to my original question in this thread? 
Solr has a queue of incoming requests, which, in my case, kept on growing. I 
looked at the code but couldn't find it; I think maybe it is an implicit queue 
in the form of Java's concurrent thread pool or something like that.

Is it possible to limit the size of this queue, or to determine its size during 
runtime? This is the last issue that I am trying to figure out right now.


I do not know the answer to this.


Also, to answer your question about the field all_text: all the fields are 
stored in order to support partial-update of documents. Most of the fields are 
used for highlighting, all_text is used for searching. I'll gladly omit 
all_text from being stored, but then partial-update won't work.


Your copyFields will still work just fine with atomic updates even if 
they are not stored.  Behind the scenes, an atomic update is a delete 
and an add with the stored data plus the changes... if all your source 
fields are stored, then the copyField should be generated correctly from 
all the source fields.


The wiki page on the subject actually says that copyField destinations 
*MUST* be set to stored=false.


http://wiki.apache.org/solr/Atomic_Updates#Caveats_and_Limitations
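
As a sanity check, an atomic update from SolrJ (4.x) looks roughly like 
this, assuming your updateLog is enabled -- the URL and the "id"/"title" 
field names are placeholders, not taken from your setup:

    import java.util.Collections;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc1");
    // the map-with-modifier syntax makes this an atomic "set", not a replace
    doc.addField("title", Collections.singletonMap("set", "new title"));
    server.add(doc);
    server.commit();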


The reason I didn't use edismax to search all the fields is that the list 
of all fields is very long. Can edismax handle several hundred fields in the 
list? What about dynamic fields? Edismax requires the list to be fixed in the 
configuration file, so I can't include dynamic fields there. I can pass along 
the full list in the 'qf' parameter in every search request, but this seems 
like a waste? Also, what about performance? I was told that the best practice 
in this case (you have lots of fields and want to search everything) is to copy 
everything to a catch-all field.


If there is ever any situation where you can come up with some searches 
that only need to search against some of the fields and other searches 
that need to search against different fields, then you might consider 
creating different search handlers with different qf lists.  If you 
always want to search against all the fields, then it's probably more 
efficient to keep your current method.
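
If you do go the multiple-handler route, note that qf can also be set per 
request from SolrJ, so nothing forces the list into the config file.  A 
hedged sketch with made-up field names and boosts:

    import org.apache.solr.client.solrj.SolrQuery;

    SolrQuery q = new SolrQuery("some words");
    q.set("defType", "edismax");
    q.set("qf", "title^2.0 body author");  // illustrative only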


Thanks,
Shawn



RE: shardkey

2013-06-17 Thread Joshi, Shital
Thanks for the links. It was very useful.

Is there a way to use implicit router WITH numShards parameter? We have 5 
shards and business day (Monday-Friday) is our shardkey. We want to be able to 
say Monday -> shard1, Tuesday -> shard2.




-Original Message-
From: Joel Bernstein [mailto:joels...@gmail.com] 
Sent: Thursday, June 13, 2013 2:38 PM
To: solr-user@lucene.apache.org
Subject: Re: shardkey

Also you might want to check this blog post, just went up today.

http://searchhub.org/2013/06/13/solr-cloud-document-routing/


On Wed, Jun 12, 2013 at 2:18 PM, James Thomas  wrote:

> This page has some good information on custom document routing:
>
> http://docs.lucidworks.com/display/solr/Shards+and+Indexing+Data+in+SolrCloud
>
>
>
> -Original Message-
> From: Rishi Easwaran [mailto:rishi.easwa...@aol.com]
> Sent: Wednesday, June 12, 2013 1:40 PM
> To: solr-user@lucene.apache.org
> Subject: Re: shardkey
>
> From my understanding.
> In SOLR cloud the CompositeIdDocRouter uses HashbasedDocRouter.
> CompositeId router is default if your numShards>1 on collection creation.
> CompositeId router generates a hash using the uniqueKey defined in your
> schema.xml to route your documents to a dedicated shard.
>
> You can use select?q=xyz&shard.keys=uniquekey to focus your search to hit
> only the shard that has your shard.key
>
>
>
>  Thanks,
>
> Rishi.
>
>
>
> -Original Message-
> From: Joshi, Shital 
> To: 'solr-user@lucene.apache.org' 
> Sent: Wed, Jun 12, 2013 10:01 am
> Subject: shardkey
>
>
> Hi,
>
> We are using Solr 4.3.0 SolrCloud (5 shards, 10 replicas). I have couple
> questions on shard key.
>
> 1. Looking at the admin GUI, how do I know which field is being
> used for shard key.
> 2. What is the default shard key used?
> 3. How do I override the default shard key?
>
> Thanks.
>
>
>


-- 
Joel Bernstein
Professional Services LucidWorks


Re: Filtered Query in Solr

2013-06-17 Thread Prathik Puthran
"MUSIC ALBUM" is the value of one of the field (asset_type) in the indexed
document.


On Mon, Jun 17, 2013 at 9:06 PM, Jack Krupansky wrote:

> What does the actual indexed data look like? Maybe "ALBUM" doesn't
> immediately follow "MUSIC", at least in that particular field. Or, maybe
> you added "MUSIC" and "ALBUM" as two separate values for that field and
> Solr then implicitly added the +100 position gap between them.
>
> -- Jack Krupansky
>
> -Original Message- From: Prathik Puthran
> Sent: Monday, June 17, 2013 11:20 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Filtered Query in Solr
>
>
> Can you please explain why the 2nd one works?
>
>
>
>
> On Mon, Jun 17, 2013 at 8:49 PM, Prathik Puthran <
> prathik.puthra...@gmail.com> wrote:
>
>  The first one i.e. fq=asset_type:"MUSIC ALBUM" doesn't work.
>>
>> However the 2nd one works
>> fq=asset_type:(+MUSIC +ALBUM)
>>
>> Thanks for the response.
>>
>> Regards,
>> Prathik
>>
>>
>> On Mon, Jun 17, 2013 at 8:41 PM, Upayavira  wrote:
>>
>>  Your fq query is:
>>>
>>> fq=asset_type:MUSIC ALBUM
>>>
>>> This is actually interpreted as:
>>> fq=asset_type:MUSIC text:ALBUM
>>>
>>> You probably want:
>>> fq=asset_type:"MUSIC ALBUM"
>>> or
>>> fq=asset_type:(+MUSIC +ALBUM)
>>> or even:
>>> fq={!term f=asset_type}MUSIC ALBUM
>>>
>>> Upayavira
>>>
>>> On Mon, Jun 17, 2013, at 03:57 PM, Prathik Puthran wrote:
>>> > Hi,
>>> >
>>> > I am making a select request to solr with 'fq=asset_type:MUSIC
>>> > ALBUM'
>>> > (see query 1 below) as one of the GET parameter. This request does not
>>> > return any results. However when I send the select request with the
>>> > parameter 'asset_type=MUSIC ALBUM' (see query 2 below) I get the
>>> > results.
>>> >
>>> > Does the filtered query parser do anything special (like split based on
>>> > the
>>> > spaces) before processing the request? How do I avoid this from
>>> > happening?
>>> >
>>> > Query 1 -->
>>> >
>>> http://localhost:8080/solr/**assets/select?q=amitabh&fq=**
>>> asset_type%3AMUSIC%20ALBUM&wt=**json
>>> >
>>> > Query 2 -->
>>> >
>>> http://localhost:8080/solr/**assets/select?wt=json&q=**
>>> amitabh&indent=true&sort=**release_year%20desc&asset_**
>>> type=MUSIC%20ALBUM
>>> >
>>> >
>>> > Thanks,
>>> > Prathik
>>>
>>>
>>
>>
>


Re: Solr large boolean filter

2013-06-17 Thread Igor Kustov

Meanwhile I'm currently trying to write a custom QParser which will use
FieldCacheTermsFilter

So I'm using a query like
http://127.0.0.1:8080/solr/select?q=*:*&fq={!mqparser}id:%281%202%203%29

And I couldn't make it work - I just couldn't find a proper constructor, and
I'm also not sure that I'm filtering appropriately. 

private class MyQParser extends QParser {

    List<String> idsList;

    MyQParser(String queryString, SolrParams localParams, SolrParams solrParams,
              SolrQueryRequest solrQueryRequest) {
        super(queryString, localParams, solrParams, solrQueryRequest);
        idsList = ... // extract ids from params
    }

    @Override
    public Query parse() throws SyntaxError {
        Filter filter = new FieldCacheTermsFilter("id",
                idsList.toArray(new String[idsList.size()]));
        // first problem: id is just an int in my case, but this seems like
        // the only normal constructor
        return new FilteredQuery(new BooleanQuery(), filter);
        // my goal here is to get only filtered data, but does BooleanQuery()
        // equal *:*?
    }
}









Infinite Solr's node recovery loop after ungraceful shutdown of majority of nodes in a cluster

2013-06-17 Thread serhiy.ivanov
Hi Solr Community,
We're currently experimenting with a test SolrCloud setup and doing some weird
failover test scenarios to check how the system reacts. 
Basically, I have 3 nodes in my Solr Cloud. The cloud is using an external
ZooKeeper ensemble with 3 nodes. 
ZooKeeper seems to be working pretty predictably, and requires a majority of
its nodes to be up to work correctly (also tested with 5 nodes and 9 nodes
(3 groups, 3 nodes per group) in the ZooKeeper ensemble).

On the other hand, there are some cases where SolrCloud can't handle the
recovery process for a node.
E.g., 
Cloud -> 1 UP (leader), 2 UP, 3 UP.
If I perform an immediate, ungraceful shutdown (simply closing the terminal
window where Jetty with Solr is running), the 3rd node goes into an infinite
recovery loop:


Jun 5, 2013 1:25:09 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: Wait 2.0 seconds before trying to recover again (1)
Jun 5, 2013 1:25:09 PM org.apache.solr.cloud.ShardLeaderElectionContext
shouldIBeLeader
INFO: Checking if I should try and be the leader.
Jun 5, 2013 1:25:09 PM org.apache.solr.cloud.ShardLeaderElectionContext
shouldIBeLeader
*INFO: My last published State was recovering, I won't be the leader.*
Jun 5, 2013 1:25:09 PM org.apache.solr.cloud.ShardLeaderElectionContext
rejoinLeaderElection
*INFO: There may be a better leader candidate than us - going back into
recovery*
Jun 5, 2013 1:25:09 PM org.apache.solr.update.DefaultSolrCoreState
doRecovery
*INFO: Running recovery - first canceling any ongoing recovery*
Jun 5, 2013 1:25:09 PM org.apache.solr.cloud.RecoveryStrategy close
WARNING: Stopping recovery for
zkNodeName=10.25.12.66:8083_solr_docas1-collection_shard1_replica1 core=docas1-collection_shard1_replica1
Jun 5, 2013 1:25:10 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: Finished recovery process. core=docas1-collection_shard1_replica1
Jun 5, 2013 1:25:10 PM org.apache.solr.cloud.RecoveryStrategy run
INFO: Starting recovery process.  core=docas1-collection_shard1_replica1
recoveringAfterStartup=false
Jun 5, 2013 1:25:10 PM org.apache.solr.cloud.ZkController publish
INFO: publishing core=docas1-collection_shard1_replica1 state=recovering
Jun 5, 2013 1:25:10 PM org.apache.solr.cloud.ZkController publish
INFO: numShards not found on descriptor - reading it from system property
Jun 5, 2013 1:25:10 PM org.apache.solr.client.solrj.impl.HttpClientUtil
createClient
INFO: Creating new http client,
config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
Jun 5, 2013 1:25:10 PM org.apache.solr.cloud.ShardLeaderElectionContext
runLeaderProcess
INFO: Running the leader process.
Jun 5, 2013 1:25:10 PM org.apache.solr.common.SolrException log
SEVERE: Error while trying to recover.
core=docas1-collection_shard1_replica1:org.apache.solr.client.solrj.SolrServerException:
Server refused connection at: http://10.25.12.66:8082/solr
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:409)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
at
org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:202)
at
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:346)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:223)
Caused by: org.apache.http.conn.HttpHostConnectException: Connection to
http://10.25.12.66:8082 refused
at
org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:190)
at
org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
at
org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645)
at
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480)
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:353)
... 4 more
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:432)
at java.net.Socket.connect(Socket.java:529)
at
org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127)
at
org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180)
... 11 more

Also weird that if I kil

Re: Sorting by field is slow

2013-06-17 Thread Shane Perry
Turns out it was a case of an oversight.  My warming queries weren't setting
the sort order and as a result didn't successfully complete.  After setting
the sort order things appear to be responding quickly.

Thanks for the help.

On Mon, Jun 17, 2013 at 9:45 AM, Shane Perry  wrote:

> Using 4.3.1-SNAPSHOT I have identified where the issue is occurring.  For
> a query in the format (it returns one document, sorted by field4)
>
> +(field0:UUID0) -field1:string0 +field2:string1 +field3:text0
> +field4:"text1"
>
>
> with the field types
>
> [the field type definitions from schema.xml were stripped by the mail
> archive; only omitNorms="true" and an analyzer section survive]
>
>
> In the method FieldCacheImpl$SortedDocValuesCache#createValue, the reader
> reports 2640449 terms.  As a result, the loop on line 1198 is
> executed 2640449 times and the inner loop a total of 658310778 times.  My
> index contains 56180128 documents.
>
> My configuration file sets the warming query for the newSearcher and
> firstSearcher listeners to the value (XML tags stripped by the mail
> archive; only the text content survives):
>
>    static firstSearcher warming in solrconfig.xml
>    field4
>
>
> which does not appear to affect the speed.  I'm not sure how replication
> plays into the equation outside the fact that we are relatively aggressive
> on the replication (every 60 seconds).  I fear I may be at the end of my
> knowledge without really getting into the code so any help at this point
> would be greatly appreciated.
>
> Shane
>
> On Thu, Jun 13, 2013 at 4:11 PM, Shane Perry  wrote:
>
>> I've dug through the code and have narrowed the delay down
>> to TopFieldCollector$OneComparatorNonScoringCollector.setNextReader() at
>> the point where the comparator's setNextReader() method is called (line 98
>> in the lucene_solr_4_3 branch).  That line is actually two method calls so
>> I'm not yet certain which path is the cause.  I'll continue to dig through
>> the code but am on thin ice so input would be great.
>>
>> Shane
>>
>>
>> On Thu, Jun 13, 2013 at 7:56 AM, Shane Perry  wrote:
>>
>>> Erick,
>>>
>>> We do have soft commits turned on.  Initially, autoCommit was set at 15000
>>> and autoSoftCommit at 1000.  We did up those to 120 and 60
>>> respectively.  However, since the core in question is a slave, we don't
>>> actually do writes to the core but rely on replication only to populate the
>>> index.  In this case wouldn't autoCommit and autoSoftCommit essentially be
>>> no-ops?  I thought I had pulled out all hard commits but a double check
>>> shows one instance where it still occurs.
>>>
>>> Thanks for your time.
>>>
>>> Shane
>>>
>>> On Thu, Jun 13, 2013 at 5:19 AM, Erick Erickson >> > wrote:
>>>
 Shane:

 You've covered all the config stuff that I can think of. There's one
 other possibility. Do you have the soft commits turned on and are
 they very short? Although soft commits shouldn't invalidate any
 segment-level caches (but I'm not sure whether the sorting buffers
 are low-level or not).

 About the only other thing I can think of is that you're somehow
 doing hard commits from, say, the client but that's really
 stretching.

 All I can really say at this point is that this isn't a problem I've
 seen
 before, so it's _likely_ some innocent-seeming config has changed.
 I'm sure it'll be obvious once you find it ...

 Erick

 On Wed, Jun 12, 2013 at 11:51 PM, Shane Perry 
 wrote:
 > Erick,
 >
 > I agree, it doesn't make sense.  I manually merged the solrconfig.xml
 from
 > the distribution example with my 3.6 solrconfig.xml, pulling out what
 I
 > didn't need.  There is the possibility I removed something I
 shouldn't have
 > though I don't know what it would be.  Minus removing the dynamic
 fields, a
 > custom tokenizer class, and changing all my fields to be stored, the
 > schema.xml file should be the same as well.  I'm not currently in the
 > position to do so, but I'll double check those two files.  Finally,
 the
 > data was re-indexed when I moved to 4.3.
 >
 > My statement about field values wasn't stated very well.  What I
 meant is
 > that the 'text' field has more unique terms than some of my other
 fields.
 >
 > As for this being an edge case, I'm not sure why it would manifest
 itself
 > in 4.3 but not in 3.6 (short of me having a screwy configuration
 setting).
 >  If I get a chance, I'll see if I can duplicate the behavior with a
 small
 > document count in a sandboxed environment.
 >
 > Shane
 >
 > On Wed, Jun 12, 2013 at 5:14 PM, Erick Erickson <
 erickerick...@gmail.com>wrote:
 >
 >> This doesn't make much sense, particularly the fact
 >> that you added first/new searchers. I'm assuming that
 >> these are sorting on the same field as your slow query.
 >>
 >> But sorting on a text field for which
 >> "Overall, the values of the field are unique"
 >> is a red-flag. Solr doesn't sort on fields that have more than one term.

Re: Filtered Query in Solr

2013-06-17 Thread Upayavira
You have likely indexed it as a text/analysed field, not as a string
field. Your usage suggests that "MUSIC ALBUM" should be a single term,
thus you should index it as a string field.
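
Once it is a string field, the term query parser is the safest of the
three options I listed, since it bypasses query-time analysis entirely.
A hedged SolrJ sketch of the same filter:

    import org.apache.solr.client.solrj.SolrQuery;

    SolrQuery q = new SolrQuery("amitabh");
    // {!term} treats everything after the closing brace as one exact term
    q.addFilterQuery("{!term f=asset_type}MUSIC ALBUM");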

Upayavira

On Mon, Jun 17, 2013, at 05:21 PM, Prathik Puthran wrote:
> "MUSIC ALBUM" is the value of one of the field (asset_type) in the
> indexed
> document.
> 
> 
> On Mon, Jun 17, 2013 at 9:06 PM, Jack Krupansky
> wrote:
> 
> > What does the actual indexed data look like? Maybe "ALBUM" doesn't
> > immediately follow "MUSIC", at least in that particular field. Or, maybe
> > you added "MUSIC" and "ALBUM" as two separate values for that field and
> > Solr then implicitly added the +100 position gap between them.
> >
> > -- Jack Krupansky
> >
> > -Original Message- From: Prathik Puthran
> > Sent: Monday, June 17, 2013 11:20 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Filtered Query in Solr
> >
> >
> > Can you please explain why the 2nd one works?
> >
> >
> >
> >
> > On Mon, Jun 17, 2013 at 8:49 PM, Prathik Puthran <
> > prathik.puthra...@gmail.com> wrote:
> >
> >>  The first one i.e. fq=asset_type:"MUSIC ALBUM" doesn't work.
> >>
> >> However the 2nd one works
> >> fq=asset_type:(+MUSIC +ALBUM)
> >>
> >> Thanks for the response.
> >>
> >> Regards,
> >> Prathik
> >>
> >>
> >> On Mon, Jun 17, 2013 at 8:41 PM, Upayavira  wrote:
> >>
> >>  Your fq query is:
> >>>
> >>> fq=asset_type:MUSIC ALBUM
> >>>
> >>> This is actually interpreted as:
> >>> fq=asset_type:MUSIC text:ALBUM
> >>>
> >>> You probably want:
> >>> fq=asset_type:"MUSIC ALBUM"
> >>> or
> >>> fq=asset_type:(+MUSIC +ALBUM)
> >>> or even:
> >>> fq={!term f=asset_type}MUSIC ALBUM
> >>>
> >>> Upayavira
> >>>
> >>> On Mon, Jun 17, 2013, at 03:57 PM, Prathik Puthran wrote:
> >>> > Hi,
> >>> >
> >>> > I am making a select request to solr with 'fq=asset_type:MUSIC
> >>> > ALBUM'
> >>> > (see query 1 below) as one of the GET parameter. This request does not
> >>> > return any results. However when I send the select request with the
> >>> > parameter 'asset_type=MUSIC ALBUM' (see query 2 below) I get the
> >>> > results.
> >>> >
> >>> > Does the filtered query parser do anything special (like split based on
> >>> > the
> >>> > spaces) before processing the request? How do I avoid this from
> >>> > happening?
> >>> >
> >>> > Query 1 -->
> >>> >
> >>> http://localhost:8080/solr/**assets/select?q=amitabh&fq=**
> >>> asset_type%3AMUSIC%20ALBUM&wt=**json
> >>> >
> >>> > Query 2 -->
> >>> >
> >>> http://localhost:8080/solr/**assets/select?wt=json&q=**
> >>> amitabh&indent=true&sort=**release_year%20desc&asset_**
> >>> type=MUSIC%20ALBUM
> >>> >
> >>> >
> >>> > Thanks,
> >>> > Prathik
> >>>
> >>>
> >>
> >>
> >


Solr Cloud Hangs consistently .

2013-06-17 Thread Rishi Easwaran


Hi All,

I am trying to benchmark SOLR Cloud and it consistently hangs. 
Nothing in the logs, no stack trace, no errors, no warnings, just seems stuck.

A little bit about my set up. 
I have 3 benchmark hosts, each with 96GB RAM, 24 CPU's and 1TB SSD. Each host 
is configured to have 8 SOLR cloud nodes running at 4GB each.
JVM configs: http://apaste.info/57Ai

My cluster has 12 shards with replication factor 2- http://apaste.info/09sA

I originally started with SOLR 4.2, Tomcat 5 and JDK 6, as we are already 
running this configuration in production in non-Cloud form. 
It got stuck repeatedly.

I decided to upgrade to the latest and greatest of everything: SOLR 4.3, JDK 7 
and Tomcat 7. 
It still shows the same behaviour and hangs during the test.

My test schema and config.
Schema.xml - http://apaste.info/imah
SolrConfig.xml - http://apaste.info/ku4F

The test is pretty simple. It's a jmeter test with an update command via SOAP rpc 
(round-robin requests across every node), adding in 5 fields from a csv file - 
id, guid, subject, body, compositeID (guid!id).
Number of jmeter threads = 150, loop count = 20, number of messages to add per 
guid = 3; total 150*3*20 = 9000 documents.  

When the cloud gets stuck, I don't get anything in the logs, but when I run netstat 
I see the following.
Sample netstat on a stuck run. http://apaste.info/hr0O 
hycl-d20 is my jmeter host. ssd-d01/2/3 are my cloud hosts.

 
At the moment my benchmarking efforts are at a standstill.

Any help from the community would be great, I got some heap dumps and stack 
dumps, but haven't found a smoking gun yet.
If I can provide anything else to diagnose this issue. just let me know.

Thanks,

Rishi.










Re: Solr Cloud Hangs consistently .

2013-06-17 Thread Mark Miller
Could you give a simple stack trace dump as well?

It's likely the distributed update deadlock that has been reported a few times 
now - I think usually with a replication factor greater than 2, but I can't be 
sure. The deadlock involves sending docs concurrently to replicas and I 
wouldn't have expected it to be so easily hit with only 2 replicas per shard. I 
should be able to tell from a stack trace though.

If it is that, it's on my short list to investigate (been there a long time now 
though - but I still hope to look at it soon).

- Mark

On Jun 17, 2013, at 1:44 PM, Rishi Easwaran  wrote:

> 
> 
> Hi All,
> 
> I am trying to benchmark SOLR Cloud and it consistently hangs. 
> Nothing in the logs, no stack trace, no errors, no warnings, just seems stuck.
> 
> A little bit about my set up. 
> I have 3 benchmark hosts, each with 96GB RAM, 24 CPU's and 1TB SSD. Each host 
> is configured to have 8 SOLR cloud nodes running at 4GB each.
> JVM configs: http://apaste.info/57Ai
> 
> My cluster has 12 shards with replication factor 2- http://apaste.info/09sA
> 
> I originally stated with SOLR 4.2., tomcat 5 and jdk 6, as we are already 
> running this configuration in production in Non-Cloud form. 
> It got stuck repeatedly.
> 
> I decided to upgrade to the latest and greatest of everything, SOLR 4.3, JDK7 
> and tomcat7. 
> It still shows same behaviour and hangs through the test.
> 
> My test schema and config.
> Schema.xml - http://apaste.info/imah
> SolrConfig.xml - http://apaste.info/ku4F
> 
> The test is pretty simple. its a jmeter test with update command via SOAP rpc 
> (round robin request across every node), adding in 5 fields from a csv file - 
> id, guid, subject, body, compositeID (guid!id).
> number of jmeter threads = 150. loop count = 20, num of messages to add/per 
> guid = 3; total 150*3*20 = 9000 documents.  
> 
> When cloud gets stuck, i don't get anything in the logs, but when i run 
> netstat i see the following.
> Sample netstat on a stuck run. http://apaste.info/hr0O 
> hycl-d20 is my jmeter host. ssd-d01/2/3 are my cloud hosts.
> 
> 
> At the moment my benchmarking efforts are at a stand still.
> 
> Any help from the community would be great, I got some heap dumps and stack 
> dumps, but haven't found a smoking gun yet.
> If I can provide anything else to diagnose this issue. just let me know.
> 
> Thanks,
> 
> Rishi.
> 
> 
> 
> 
> 
> 
> 
> 



Re: How to define my data in schema.xml

2013-06-17 Thread Gora Mohanty
On 17 June 2013 21:39, Mysurf Mail  wrote:
> Hi,
> I have created a flat table from my DB and defined a solr core on it.
> It works excellent so far.
>
> My problem is that my table has two hierarchies. So when flatted it is too
> big.

What do you mean by "too big"? Have you actually tried
indexing the data into Solr, and does the performance
not meet your needs, or are you guessing from the size
of the tables?

> Lets consider the following example scenario
>
> My Tables are
>
> School
> Students (1:n with school)
> Teachers(1:n with school)
[...]

Um, all of this crucially depends on what your 'n' is.
Plus, you need to describe your use case in much
more detail. At the moment, you are asking us to
guess at what you are trying to do, which is inefficient,
and unlikely to solve your problem.

Regards,
Gora


Re: Is there a way to encrypt username and pass in the solr config file

2013-06-17 Thread Gora Mohanty
On 17 June 2013 21:41, Mysurf Mail  wrote:
> Hi,
> I want to encrypt (rsa maybe?) my user name/pass in solr .
> Cant leave a simple plain text on the server.
> What is the recomended way?

I don't think that there is a way to encrypt this information
at the moment.

The recommended way would be to never expose your
Solr server to the external world. The way to do that
depends on your OS, and possibly the container in
which you are running Solr.

Regards,
Gora


Refresh implicit core properties after a SWAP

2013-06-17 Thread aus...@3bx.org
I noticed that Shawn mentioned (
https://issues.apache.org/jira/browse/SOLR-4732) that “when you rename or
swap cores, the solr.core.name property does NOT get updated until you
restart Solr”.  I’m wondering if there’s any way possible to update this
property other than restarting the entire Solr application.  I’m currently
using Solr 4.3.0.



My original thought was that I may be able to update this property by
issuing a core RELOAD after the SWAP, which would mean that Solr is at
least responding to search requests while it’s reloading the core.  Would
this work?
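
For reference, the sequence I have in mind via SolrJ's CoreAdminRequest
(4.x) is roughly the following -- the core names are placeholders, and
whether the RELOAD actually refreshes ${solr.core.name} is exactly the
open question:

    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.request.CoreAdminRequest;
    import org.apache.solr.common.params.CoreAdminParams.CoreAdminAction;

    HttpSolrServer admin = new HttpSolrServer("http://localhost:8983/solr");
    CoreAdminRequest swap = new CoreAdminRequest();
    swap.setAction(CoreAdminAction.SWAP);
    swap.setCoreName("live");
    swap.setOtherCoreName("staging");
    swap.process(admin);
    // reload under the (new) name, hoping it picks up solr.core.name
    CoreAdminRequest.reloadCore("live", admin);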



I’m trying to avoid restarting the entirety of Solr, as I have over 100
cores, and the startup time can be a minute or two.  Additionally, I’m
attempting to share the same schema/configuration among these 100 cores,
which is why I'm using the ${solr.core.name} property for replication and
the data import handler, so that I only need one copy of these files.



I’d imagine that similar issues may be encountered while implementing
https://issues.apache.org/jira/browse/SOLR-4478, since this intends to
share config sets across multiple cores.  I would think that the core name
may be an important property to use in the config set scenario.



Thanks!


RE: yet another optimize question

2013-06-17 Thread Petersen, Robert
Hi Otis,

Right I didn't restart the JVMs except on the one slave where I was 
experimenting with using G1GC on the 1.7.0_21 JRE.   Also some time ago I made 
all our caches small enough to keep us from getting OOMs while still having a 
good hit rate.  Our index has about 50 fields which are mostly int IDs and 
there are some dynamic fields also.  These dynamic fields can be used for 
custom faceting.  We have some standard facets we always facet on and other 
dynamic facets which are only used if the query is filtering on a particular 
category.  There are hundreds of these fields but since they are only for a 
small subset of the overall index they are very sparsely populated with regard 
to the overall index.  With CMS GC we get a sawtooth on the old generation (I 
guess every replication and commit causes its usage to drop down to 10GB or 
so) and it seems to be the old generation which is the main space consumer.  
With the G1GC, the memory map looked totally different!  I was a little lost 
looking at memory consumption with that GC.  Maybe I'll try it again now that 
the index is a bit smaller than it was last time I tried it.  After four days 
without running an optimize now it is 21GB.  BTW our indexing speed is mostly 
bound by the DB so reducing the segments might be ok...

Here is a quick snapshot of one slave's memory map as reported by PSI-Probe, but 
unfortunately I guess I can't send the history graphics to the solr-user list 
to show their changes over time:
Name                 Used      Committed  Max        Initial    Group
Par Survivor Space   20.02 MB  108.13 MB  108.13 MB  108.13 MB  HEAP
CMS Perm Gen         42.29 MB  70.66 MB   82.00 MB   20.75 MB   NON_HEAP
Code Cache           9.73 MB   9.88 MB    48.00 MB   2.44 MB    NON_HEAP
CMS Old Gen          20.22 GB  30.94 GB   30.94 GB   30.94 GB   HEAP
Par Eden Space       42.20 MB  865.31 MB  865.31 MB  865.31 MB  HEAP
Total                20.33 GB  31.97 GB   32.02 GB   31.92 GB   TOTAL

And here's our current cache stats from a random slave:

name:queryResultCache  
class:   org.apache.solr.search.LRUCache  
version: 1.0  
description: LRU Cache(maxSize=488, initialSize=6, autowarmCount=6, 
regenerator=org.apache.solr.search.SolrIndexSearcher$3@461ff4c3)  
stats:  lookups : 619 
hits : 36 
hitratio : 0.05 
inserts : 592 
evictions : 101 
size : 488 
warmupTime : 2949 
cumulative_lookups : 681225 
cumulative_hits : 73126 
cumulative_hitratio : 0.10 
cumulative_inserts : 602396 
cumulative_evictions : 428868


 name:   fieldCache  
class:   org.apache.solr.search.SolrFieldCacheMBean  
version: 1.0  
description: Provides introspection of the Lucene FieldCache, this is 
**NOT** a cache that is managed by Solr.  
stats:  entries_count : 359


name:documentCache  
class:   org.apache.solr.search.LRUCache  
version: 1.0  
description: LRU Cache(maxSize=2048, initialSize=512, autowarmCount=10, 
regenerator=null)  
stats:  lookups : 12710 
hits : 7160 
hitratio : 0.56 
inserts : 5636 
evictions : 3588 
size : 2048 
warmupTime : 0 
cumulative_lookups : 10590054 
cumulative_hits : 6166913 
cumulative_hitratio : 0.58 
cumulative_inserts : 4423141 
cumulative_evictions : 3714653


name:fieldValueCache  
class:   org.apache.solr.search.FastLRUCache  
version: 1.0  
description: Concurrent LRU Cache(maxSize=280, initialSize=280, 
minSize=252, acceptableSize=266, cleanupThread=false, autowarmCount=6, 
regenerator=org.apache.solr.search.SolrIndexSearcher$1@143eb77a)  
stats:  lookups : 1725 
hits : 1481 
hitratio : 0.85 
inserts : 122 
evictions : 0 
size : 128 
warmupTime : 4426 
cumulative_lookups : 3449712 
cumulative_hits : 3281805 
cumulative_hitratio : 0.95 
cumulative_inserts : 83261 
cumulative_evictions : 3479


name:filterCache  
class:   org.apache.solr.search.FastLRUCache  
version: 1.0  
description: Concurrent LRU Cache(maxSize=248, initialSize=12, minSize=223, 
acceptableSize=235, cleanupThread=false, autowarmCount=10, 
regenerator=org.apache.solr.search.SolrIndexSearcher$2@36e831d6)  
stats:  lookups : 3990 
hits : 3831 
hitratio : 0.96 
inserts : 239 
evictions : 26 
size : 244 
warmupTime : 1 
cumulative_lookups : 5745011 
cumulative_hits : 5496150 
cumulative_hitratio : 0.95 
cumulative_inserts : 351485 
cumulative_evictions : 276308

-Original Message-
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] 
Sent: Saturday, June 15, 2013 5:52 AM
To: solr-user@lucene.apache.org
Subject: Re: yet another optimize question

Hi Robi,

I'm going to guess you are seeing smaller heap also simply because you 
restarted the JVM recently (hm, you don't say you restarted, maybe I'm making 
this up). If you are indeed indexing continuously then you shouldn't optimize.

Re: New operator.

2013-06-17 Thread Yanis Kakamaikis
Hi all,   thanks for your reply.
I want to be able to ask a combined query - a normal Solr query, but one of
the query fields should get its answer not from within the Solr engine
but from an external engine.
The rest should work normally, with the ability to do more tasks on the
answer, like faceting for example.
The external engine will use the same object ids as Solr, so the boolean
query that uses this engine's answer will be executed correctly.
For example, let's say I want to find a person by his name, age, address, and
also by his picture. I have a picture-indexing engine, and I want to create a
combined query that will call this engine like any other query field.   I hope
it's more clear now...


On Sun, Jun 16, 2013 at 4:02 PM, Jack Krupansky wrote:

> It all depends on what you mean by an "operator".
>
> Start by describing in more detail what problem you are trying to solve.
>
> And how do you expect your users or applications to use this "operator".
> Give some examples.
>
> Solr and Lucene do not have "operators" per se, except in query parser
> syntax, but that is hard-wired into the individual query parsers.
>
> -- Jack Krupansky
>
> -Original Message- From: Yanis Kakamaikis
> Sent: Sunday, June 16, 2013 2:01 AM
> To: solr-user@lucene.apache.org
> Subject: New operator.
>
>
> Hi all,I want to add a new operator to my solr.   I need that operator
> to call my proprietary engine and build an answer vector to solr, in a way
> that this vector will be part of the boolean query at the next step.   How
> do I do that?
> Thanks
>


Re: Solr large boolean filter

2013-06-17 Thread Mikhail Khludnev
No no no, mate! I warned you before: 'Mind term encoding due to field type!'

You need to obtain the schema from the request, then access the field type and
convert the external string representation into (possibly) tricky encoded
bytes via readableToIndexed(); see FieldType.getFieldQuery().

Btw, it's a really frequent pain on this list - feel free to contribute when
you're done!

An empty BooleanQuery matches nothing. There is a MatchAllDocsQuery().
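
Something along these lines should compile against 4.3 -- a sketch only,
assuming the ids arrive whitespace-separated in the query string and that
"id" is the field you filter on:

    import org.apache.lucene.search.FieldCacheTermsFilter;
    import org.apache.lucene.search.FilteredQuery;
    import org.apache.lucene.search.MatchAllDocsQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.util.BytesRef;
    import org.apache.solr.common.params.SolrParams;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.schema.FieldType;
    import org.apache.solr.search.QParser;
    import org.apache.solr.search.SyntaxError;

    public class IdsQParser extends QParser {

      public IdsQParser(String qstr, SolrParams localParams,
                        SolrParams params, SolrQueryRequest req) {
        super(qstr, localParams, params, req);
      }

      @Override
      public Query parse() throws SyntaxError {
        FieldType ft = req.getSchema().getField("id").getType();
        String[] raw = qstr.trim().split("\\s+");
        BytesRef[] terms = new BytesRef[raw.length];
        for (int i = 0; i < raw.length; i++) {
          BytesRef br = new BytesRef();
          ft.readableToIndexed(raw[i], br);  // encode as the field type expects
          terms[i] = br;
        }
        // match all docs, then keep only those whose id is in the list
        return new FilteredQuery(new MatchAllDocsQuery(),
                                 new FieldCacheTermsFilter("id", terms));
      }
    }

You'd still register it through a QParserPlugin in solrconfig.xml.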

On Mon, Jun 17, 2013 at 8:35 PM, Igor Kustov  wrote:

>
> Meanwhile I'm currently trying to write a custom QParser which will use
> FieldCacheTermsFilter
>
> So I'm using query like
> http://127.0.0.1:8080/solr/select?q=*:*&fq={!mqparser}id:%281%202%203%29
>
> And I couldn't make it work - I just couldn't find a proper constructor and
> also not sure that i'm filtering appropriately.
>
> private class MyQParser {
>
> List idsList;
>
> MyQParser(String queryString, SolrParams localParams, SolrParams
> solrParams,
> SolrQueryRequestsolrQueryRequest) throws SyntaxError {
> super(queryString,localParams,solrParams, solrQueryRequest);
>  idsList = // extract ids from params
> }
>
> @Override
> public Query parse() throws SyntaxError {
>FieldCacheTerms filter = new
> FieldCacheTermsFilter("id",idsList.toArray())
> // first problem id is just an int in my case, but this seems like the only
> normal constructor
>return new FilteredQuery(new BooleanQuery(), filter);
> // my goal here is to get only filtered data, but does BooleanQuery()
> equals
> to *:*?
> }
>
>
>
>
>
>
>
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


Re: New operator.

2013-06-17 Thread Roman Chyla
Hello Yanis,

We are probably using something similar - 'functional operators', e.g.
edismax() to treat everything inside the brackets as an argument for
edismax, or pos() to search for authors based on their position. And
invenio(), which is exactly what you describe, gets results from an external
engine. Depending on the level of complexity, you may need any/all of the
following:

1. a query parser that understands the operator syntax and can build some
'external search' query object
2. the 'query object' that knows how to contact the external service and return
lucene docids - so you will need some translation
externalIds<->luceneDocIds - you can for example index the same primary key
in both solr and the ext engine, and then use a cache for the mapping (see
the rough sketch below)

To solve (1), you could use the
https://issues.apache.org/jira/browse/LUCENE-5014 - sorry for the shameless
plug :) - but this is what we use and what i am familiar with, you can see
a grammar that gives you the 'functional operator' here - if you dig
deeper, you will see how it is building different query objects for
different operators:
https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/grammars/ADS.g

and here an example how to ask the external engine for results and return
lucene docids:
https://github.com/romanchyla/montysolr/blob/master/contrib/invenio/src/java/org/apache/lucene/search/InvenioWeight.java

it is a bit messy and you should probably ignore how we are getting the
results, just look at nextDoc()
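
and here, very roughly, is a crude way to warm the externalId -> luceneDocId
cache I mentioned above, assuming the shared primary key is indexed in a
"pk" field with one live document per key (a sketch of the idea only, not
the actual montysolr code):

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.lucene.index.AtomicReader;
    import org.apache.lucene.index.DocsEnum;
    import org.apache.lucene.index.SlowCompositeReaderWrapper;
    import org.apache.lucene.index.Terms;
    import org.apache.lucene.index.TermsEnum;
    import org.apache.lucene.util.BytesRef;

    Map<String, Integer> pkToDocId = new HashMap<String, Integer>();
    AtomicReader ar = SlowCompositeReaderWrapper.wrap(reader);  // reader: your IndexReader
    Terms terms = ar.terms("pk");
    if (terms != null) {
      TermsEnum te = terms.iterator(null);
      BytesRef term;
      while ((term = te.next()) != null) {
        DocsEnum de = te.docs(ar.getLiveDocs(), null, DocsEnum.FLAG_NONE);
        int doc = de.nextDoc();  // assumes exactly one live doc per key
        pkToDocId.put(term.utf8ToString(), doc);
      }
    }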

HTH,

  roman


On Mon, Jun 17, 2013 at 2:34 PM, Yanis Kakamaikis <
yanis.kakamai...@gmail.com> wrote:

> Hi all,   thanks for your reply.
> I want to be able to ask a combined query,  a normal solr querym but one of
> the query fields should get it's answer not from within the solr engine,
> but from an external engine.
> the rest should work normaly with the ability to do more tasks on the
> answer like faceting for example.
> The external engine will use the same objects ids like solr, so the boolean
> query that uses this engine answer be executed correctly.
> For example, let say I want to find a person by his name, age, address, and
> also by his picture. I have a picture indexing engine, I want to create a
> combined query that will call this engine like other query field.   I hope
> it's more clear now...
>
>
> On Sun, Jun 16, 2013 at 4:02 PM, Jack Krupansky  >wrote:
>
> > It all depends on what you mean by an "operator".
> >
> > Start by describing in more detail what problem you are trying to solve.
> >
> > And how do you expect your users or applications to use this "operator".
> > Give some examples.
> >
> > Solr and Lucene do not have "operators" per se, except in query parser
> > syntax, but that is hard-wired into the individual query parsers.
> >
> > -- Jack Krupansky
> >
> > -Original Message- From: Yanis Kakamaikis
> > Sent: Sunday, June 16, 2013 2:01 AM
> > To: solr-user@lucene.apache.org
> > Subject: New operator.
> >
> >
> > Hi all,I want to add a new operator to my solr.   I need that
> operator
> > to call my proprietary engine and build an answer vector to solr, in a
> way
> > that this vector will be part of the boolean query at the next step.
> How
> > do I do that?
> > Thanks
> >
>


Re: Solr Cloud Hangs consistently .

2013-06-17 Thread Rishi Easwaran
Mark,

I got a few stack dumps of the instance that was stuck ssdtest-d03:8011

http://apaste.info/cofK
http://apaste.info/sv4M
http://apaste.info/cxUf

 


 I can get dumps of others if needed.

Thanks,

Rishi.

 

-Original Message-
From: Mark Miller 
To: solr-user 
Sent: Mon, Jun 17, 2013 1:57 pm
Subject: Re: Solr Cloud Hangs consistently .


Could you give a simple stack trace dump as well?

It's likely the distributed update deadlock that has been reported a few times 
now - I think usually with a replication factor greater than 2, but I can't be 
sure. The deadlock involves sending docs concurrently to replicas and I 
wouldn't 
have expected it to be so easily hit with only 2 replicas per shard. I should 
be 
able to tell from a stack trace though.

If it is that, it's on my short list to investigate (been there a long time now 
though - but I still hope to look at it soon).

- Mark

On Jun 17, 2013, at 1:44 PM, Rishi Easwaran  wrote:

> 
> 
> Hi All,
> 
> I am trying to benchmark SOLR Cloud and it consistently hangs. 
> Nothing in the logs, no stack trace, no errors, no warnings, just seems stuck.
> 
> A little bit about my set up. 
> I have 3 benchmark hosts, each with 96GB RAM, 24 CPU's and 1TB SSD. Each host 
is configured to have 8 SOLR cloud nodes running at 4GB each.
> JVM configs: http://apaste.info/57Ai
> 
> My cluster has 12 shards with replication factor 2- http://apaste.info/09sA
> 
> I originally stated with SOLR 4.2., tomcat 5 and jdk 6, as we are already 
running this configuration in production in Non-Cloud form. 
> It got stuck repeatedly.
> 
> I decided to upgrade to the latest and greatest of everything, SOLR 4.3, JDK7 
and tomcat7. 
> It still shows same behaviour and hangs through the test.
> 
> My test schema and config.
> Schema.xml - http://apaste.info/imah
> SolrConfig.xml - http://apaste.info/ku4F
> 
> The test is pretty simple. its a jmeter test with update command via SOAP rpc 
(round robin request across every node), adding in 5 fields from a csv file - 
id, guid, subject, body, compositeID (guid!id).
> number of jmeter threads = 150. loop count = 20, num of messages to add/per 
guid = 3; total 150*3*20 = 9000 documents.  
> 
> When cloud gets stuck, i don't get anything in the logs, but when i run 
netstat i see the following.
> Sample netstat on a stuck run. http://apaste.info/hr0O 
> hycl-d20 is my jmeter host. ssd-d01/2/3 are my cloud hosts.
> 
> 
> At the moment my benchmarking efforts are at a stand still.
> 
> Any help from the community would be great, I got some heap dumps and stack 
dumps, but haven't found a smoking gun yet.
> If I can provide anything else to diagnose this issue. just let me know.
> 
> Thanks,
> 
> Rishi.
> 
> 
> 
> 
> 
> 
> 
> 


 


Re: SolrJ Howto get local params from QueryResponse

2013-06-17 Thread Jack Krupansky
The "LocalParams" are just the prefix on the query parameters  (e.g., 
"facet.field") themselves - what you sent on the original query. I mean, you 
constructed those original parameters in your app code, right?


You can also call QueryResponse#getHeader and then locate the original query 
parameters in there, if you need to.
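
A hedged sketch, with "server" and "q" being your SolrServer and the
original SolrQuery, and assuming echoParams is on so the header actually
carries the parameters:

    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.util.NamedList;

    QueryResponse rsp = server.query(q);
    NamedList<?> params = (NamedList<?>) rsp.getHeader().get("params");
    // may be multi-valued if several facet.field parameters were sent
    Object facetField = params.get("facet.field");  // e.g. "{!ex=dyn}AAA001001_0_1.1.1_ss"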


You could also set a custom label for each facet field if you wanted to 
encode extra metadata for each facet field in the facet response.


-- Jack Krupansky

-Original Message- 
From: Holger Rieß

Sent: Monday, June 17, 2013 10:53 AM
To: solr-user@lucene.apache.org
Subject: SolrJ Howto get local params from QueryResponse


Hi,
how can I get local params like '{!ex=dyn,cls} AAA001001_0_1.1.1_ss' from 
QueryResponse? I've tagged filter queries and facet fields with different 
tags (e.g. 'dyn', 'cls').

I can see the tags in the QueryResponse XML facet.field section:

 {!ex=dyn}AAA001001_0_1.1.1_ss
...


But the FacetField class has no method List getLocalParams().
My goal is to build a server-side representation of the QueryResponse 
components and sort the facet fields on the client side by the tag.


Thanks, Holger 



Re: Solr Cloud Hangs consistently .

2013-06-17 Thread Rishi Easwaran
FYI, you can ignore the http4ClientExpiryService thread in the stack dumps.
It's a dummy executor service I created to test out something unrelated to 
this issue.  
 

 

 

-Original Message-
From: Rishi Easwaran 
To: solr-user 
Sent: Mon, Jun 17, 2013 2:54 pm
Subject: Re: Solr Cloud Hangs consistently .


Mark,

I got a few stack dumps of the instance that was stuck ssdtest-d03:8011

http://apaste.info/cofK
http://apaste.info/sv4M
http://apaste.info/cxUf

 


 I can get dumps of others if needed.

Thanks,

Rishi.

 

-Original Message-
From: Mark Miller 
To: solr-user 
Sent: Mon, Jun 17, 2013 1:57 pm
Subject: Re: Solr Cloud Hangs consistently .


Could you give a simple stack trace dump as well?

It's likely the distributed update deadlock that has been reported a few times 
now - I think usually with a replication factor greater than 2, but I can't be 
sure. The deadlock involves sending docs concurrently to replicas and I 
wouldn't 

have expected it to be so easily hit with only 2 replicas per shard. I should 
be 

able to tell from a stack trace though.

If it is that, it's on my short list to investigate (been there a long time now 
though - but I still hope to look at it soon).

- Mark

On Jun 17, 2013, at 1:44 PM, Rishi Easwaran  wrote:

> 
> 
> Hi All,
> 
> I am trying to benchmark SOLR Cloud and it consistently hangs. 
> Nothing in the logs, no stack trace, no errors, no warnings, just seems stuck.
> 
> A little bit about my set up. 
> I have 3 benchmark hosts, each with 96GB RAM, 24 CPU's and 1TB SSD. Each host 
is configured to have 8 SOLR cloud nodes running at 4GB each.
> JVM configs: http://apaste.info/57Ai
> 
> My cluster has 12 shards with replication factor 2- http://apaste.info/09sA
> 
> I originally stated with SOLR 4.2., tomcat 5 and jdk 6, as we are already 
running this configuration in production in Non-Cloud form. 
> It got stuck repeatedly.
> 
> I decided to upgrade to the latest and greatest of everything, SOLR 4.3, JDK7 
and tomcat7. 
> It still shows same behaviour and hangs through the test.
> 
> My test schema and config.
> Schema.xml - http://apaste.info/imah
> SolrConfig.xml - http://apaste.info/ku4F
> 
> The test is pretty simple. its a jmeter test with update command via SOAP rpc 
(round robin request across every node), adding in 5 fields from a csv file - 
id, guid, subject, body, compositeID (guid!id).
> number of jmeter threads = 150. loop count = 20, num of messages to add/per 
guid = 3; total 150*3*20 = 9000 documents.  
> 
> When cloud gets stuck, i don't get anything in the logs, but when i run 
netstat i see the following.
> Sample netstat on a stuck run. http://apaste.info/hr0O 
> hycl-d20 is my jmeter host. ssd-d01/2/3 are my cloud hosts.
> 
> 
> At the moment my benchmarking efforts are at a stand still.
> 
> Any help from the community would be great, I got some heap dumps and stack 
dumps, but haven't found a smoking gun yet.
> If I can provide anything else to diagnose this issue. just let me know.
> 
> Thanks,
> 
> Rishi.
> 
> 
> 
> 
> 
> 
> 
> 


 

 


Re: yet another optimize question

2013-06-17 Thread Upayavira
The key figures are numdocs vs maxdocs. Maxdocs-numdocs is the number of
deleted docs in your index.

This is a 3.6 system you say. But has it been upgraded? I've seen folks
who've upgraded from 1.4 or 3.0/3.1 over time, keeping the old config.
The consequence of this is that they don't get the right config for the
TieredMergePolicy, and therefore don't get to use it, seeing the old
behaviour which does require periodic optimise.

Upayavira

On Mon, Jun 17, 2013, at 07:21 PM, Petersen, Robert wrote:
> Hi Otis,
> 
> Right I didn't restart the JVMs except on the one slave where I was
> experimenting with using G1GC on the 1.7.0_21 JRE.   Also some time ago I
> made all our caches small enough to keep us from getting OOMs while still
> having a good hit rate.Our index has about 50 fields which are mostly
> int IDs and there are some dynamic fields also.  These dynamic fields can
> be used for custom faceting.  We have some standard facets we always
> facet on and other dynamic facets which are only used if the query is
> filtering on a particular category.  There are hundreds of these fields
> but since they are only for a small subset of the overall index they are
> very sparsely populated with regard to the overall index.  With CMS GC we
> get a sawtooth on the old generation (I guess every replication and
> commit causes it's usage to drop down to 10GB or so) and it seems to be
> the old generation which is the main space consumer.  With the G1GC, the
> memory map looked totally different!  I was a little lost looking at
> memory consumption with that GC.  Maybe I'll try it again now that the
> index is a bit smaller than it was last time I tried it.  After four days
> without running an optimize now it is 21GB.  BTW our indexing speed is
> mostly bound by the DB so reducing the segments might be ok...
> 
> Here is a quick snapshot of one slaves memory map as reported by
> PSI-Probe, but unfortunately I guess I can't send the history graphics to
> the solr-user list to show their changes over time:
> Name                 Used      Committed  Max        Initial    Group
> Par Survivor Space   20.02 MB  108.13 MB  108.13 MB  108.13 MB  HEAP
> CMS Perm Gen         42.29 MB  70.66 MB   82.00 MB   20.75 MB   NON_HEAP
> Code Cache           9.73 MB   9.88 MB    48.00 MB   2.44 MB    NON_HEAP
> CMS Old Gen          20.22 GB  30.94 GB   30.94 GB   30.94 GB   HEAP
> Par Eden Space       42.20 MB  865.31 MB  865.31 MB  865.31 MB  HEAP
> Total                20.33 GB  31.97 GB   32.02 GB   31.92 GB   TOTAL
> 
> And here's our current cache stats from a random slave:
> 
> name:queryResultCache  
> class:   org.apache.solr.search.LRUCache  
> version: 1.0  
> description: LRU Cache(maxSize=488, initialSize=6, autowarmCount=6,
> regenerator=org.apache.solr.search.SolrIndexSearcher$3@461ff4c3)  
> stats:  lookups : 619 
> hits : 36 
> hitratio : 0.05 
> inserts : 592 
> evictions : 101 
> size : 488 
> warmupTime : 2949 
> cumulative_lookups : 681225 
> cumulative_hits : 73126 
> cumulative_hitratio : 0.10 
> cumulative_inserts : 602396 
> cumulative_evictions : 428868
> 
> 
>  name:fieldCache  
> class:   org.apache.solr.search.SolrFieldCacheMBean  
> version: 1.0  
> description: Provides introspection of the Lucene FieldCache, this is
> **NOT** a cache that is managed by Solr.  
> stats:  entries_count : 359
> 
> 
> name:documentCache  
> class:   org.apache.solr.search.LRUCache  
> version: 1.0  
> description: LRU Cache(maxSize=2048, initialSize=512,
> autowarmCount=10, regenerator=null)  
> stats:  lookups : 12710 
> hits : 7160 
> hitratio : 0.56 
> inserts : 5636 
> evictions : 3588 
> size : 2048 
> warmupTime : 0 
> cumulative_lookups : 10590054 
> cumulative_hits : 6166913 
> cumulative_hitratio : 0.58 
> cumulative_inserts : 4423141 
> cumulative_evictions : 3714653
> 
> 
> name:fieldValueCache  
> class:   org.apache.solr.search.FastLRUCache  
> version: 1.0  
> description: Concurrent LRU Cache(maxSize=280, initialSize=280,
> minSize=252, acceptableSize=266, cleanupThread=false, autowarmCount=6,
> regenerator=org.apache.solr.search.SolrIndexSearcher$1@143eb77a)  
> stats:  lookups : 1725 
> hits : 1481 
> hitratio : 0.85 
> inserts : 122 
> evictions : 0 
> size : 128 
> warmupTime : 4426 
> cumulative_lookups : 3449712 
> cumulative_hits : 3281805 
> cumulative_hitratio : 0.95 
> cumulative_inserts : 83261 
> cumulative_evictions : 3479
> 
> 
> name:filterCache  
> class:   org.apache.solr.search.FastLRUCache  
> version: 1.0  
> description: Concurrent LRU Cache(maxSize=248, initialSize=12,
> minSize=223, acceptableSize=235, cleanupThread=false, autowarmCount=10,
> regenerator=org.apache.solr.search.SolrIndexSearcher$2@36e831d6)  
> stats

Solr data files

2013-06-17 Thread Mysurf Mail
Where are the core data files located?
Can I just delete folders/files in order to quickly clean the core/indexes?
Thanks


SOLR Cloud - Disable Transaction Logs

2013-06-17 Thread Rishi Easwaran
Hi,

Is there a way to disable transaction logs in SOLR Cloud? As far as I can tell, 
no.
Just curious why do we need transaction logs, seems like an I/O intensive 
operation.
As long as I have replicationFactor >1, if a node (leader) goes down, the 
replica can take over and maintain a durable state of my index.

I understand from the previous discussions, that it was intended for update 
durability and realtime get.
But unless I am missing something, an ability to disable it in SOLR Cloud when 
not needed would be good.

Thanks,

Rishi.  



Re: How to define my data in schema.xml

2013-06-17 Thread Mysurf Mail
Thanks for your quick reply. Here are some notes:

1. Consider that all tables in my example have two columns: Name &
Description which I would like to index and search.
2. I have no other reason to create a flat table other than for Solr, so I
would like to see if I can avoid it.
3. If in my example I have a flat table then obviously it will hold a
lot of rows for a single school.
By searching the exact school name I will likely receive a lot of rows.
(my flat table has its own pk)
That is something I would like to avoid, and I thought I could avoid this
by defining teachers and students as multivalued fields or something like this
and then teacherCourses and studentHobbies as 1:n respectively.
This is quite similar to my real-life requirement, so I came here to get
some tips as a Solr noob.


On Mon, Jun 17, 2013 at 9:08 PM, Gora Mohanty  wrote:

> On 17 June 2013 21:39, Mysurf Mail  wrote:
> > Hi,
> > I have created a flat table from my DB and defined a solr core on it.
> > It works excellent so far.
> >
> > My problem is that my table has two hierarchies. So when flatted it is
> too
> > big.
>
> What do you mean by "too big"? Have you actually tried
> indexing the data into Solr, and does the performance
> not meet your needs, or are you guessing from the size
> of the tables?
>
> > Lets consider the following example scenario
> >
> > My Tables are
> >
> > School
> > Students (1:n with school)
> > Teachers(1:n with school)
> [...]
>
> Um, all of this crucially depends on what your 'n' is.
> Plus, you need to describe your use case in much
> more detail. At the moment, you are asking us to
> guess at what you are trying to do, which is inefficient,
> and unlikely to solve your problem.
>
> Regards,
> Gora
>


Re: Solr data files

2013-06-17 Thread Alexandre Rafalovitch
The index files are under the collection's directory in the
subdirectory called 'data'. Right next to the directory called 'conf'
where your schema.xml and solrconfig.xml live.

If the Solr is not running, you can delete that directory to clear the
index content. I don't think you can do that while Solr is running.
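
While it is running, the supported way to empty a core is a delete-by-query
followed by a commit. A hedged SolrJ sketch (the URL is a placeholder):

    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
    server.deleteByQuery("*:*");  // removes all documents, no file surgery needed
    server.commit();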

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Mon, Jun 17, 2013 at 3:33 PM, Mysurf Mail  wrote:
> Where are the core data files located?
> Can I just delete folder/files in order to quick clean the core/indexes?
> Thanks


Re: SOLR Cloud - Disable Transaction Logs

2013-06-17 Thread Shalin Shekhar Mangar
It is also necessary for near real-time replication, peer sync and recovery.


On Tue, Jun 18, 2013 at 1:04 AM, Rishi Easwaran wrote:

> Hi,
>
> Is there a way to disable transaction logs in SOLR cloud. As far as I can
> tell no.
> Just curious why do we need transaction logs, seems like an I/O intensive
> operation.
> As long as I have replicatonFactor >1, if a node (leader) goes down, the
> replica can take over and maintain a durable state of my index.
>
> I understand from the previous discussions, that it was intended for
> update durability and realtime get.
> But, unless I am missing something an ability to disable it in SOLR cloud
> if not needed would be good.
>
> Thanks,
>
> Rishi.
>
>


-- 
Regards,
Shalin Shekhar Mangar.


Re: shardkey

2013-06-17 Thread Shalin Shekhar Mangar
No, there is no way to do that right now. I think you'd be better off using
custom sharding, because you can't really control that two shardKeys go
to two different shards. We can only guarantee that docs with the same
shardKey will go to the same shard.
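As an illustration (field values hypothetical): under the default compositeId
router, a document indexed with id "Monday!order-17" is hashed on the "Monday"
prefix, and a query like

    /select?q=xyz&shard.keys=Monday!

is restricted to whichever shard that prefix hashed to - but, as said above,
you cannot pick which physical shard that turns out to be.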


On Mon, Jun 17, 2013 at 9:47 PM, Joshi, Shital  wrote:

> Thanks for the links. It was very useful.
>
> Is there a way to use implicit router WITH numShards parameter? We have 5
> shards and business day (Monday-Friday) is our shardkey. We want to be able
> to say Monday -> shard1, Tuesday -> shard2.
>
>
>
>
> -Original Message-
> From: Joel Bernstein [mailto:joels...@gmail.com]
> Sent: Thursday, June 13, 2013 2:38 PM
> To: solr-user@lucene.apache.org
> Subject: Re: shardkey
>
> Also you might want to check this blog post, just went up today.
>
> http://searchhub.org/2013/06/13/solr-cloud-document-routing/
>
>
> On Wed, Jun 12, 2013 at 2:18 PM, James Thomas  wrote:
>
> > This page has some good information on custom document routing:
> >
> >
> http://docs.lucidworks.com/display/solr/Shards+and+Indexing+Data+in+SolrCloud
> >
> >
> >
> > -Original Message-
> > From: Rishi Easwaran [mailto:rishi.easwa...@aol.com]
> > Sent: Wednesday, June 12, 2013 1:40 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: shardkey
> >
> > From my understanding.
> > In SOLR cloud the CompositeIdDocRouter uses HashbasedDocRouter.
> > CompositeId router is default if your numShards>1 on collection creation.
> > CompositeId router generates an hash using the uniqueKey defined in your
> > schema.xml to route your documents to a dedicated shard.
> >
> > You can use select?q=xyz&shard.keys=uniquekey to focus your search to hit
> > only the shard that has your shard.key
> >
> >
> >
> >  Thanks,
> >
> > Rishi.
> >
> >
> >
> > -Original Message-
> > From: Joshi, Shital 
> > To: 'solr-user@lucene.apache.org' 
> > Sent: Wed, Jun 12, 2013 10:01 am
> > Subject: shardkey
> >
> >
> > Hi,
> >
> > We are using Solr 4.3.0 SolrCloud (5 shards, 10 replicas). I have couple
> > questions on shard key.
> >
> > 1. Looking at the admin GUI, how do I know which field is being
> > used for shard key.
> > 2. What is the default shard key used?
> > 3. How do I override the default shard key?
> >
> > Thanks.
> >
> >
> >
>
>
> --
> Joel Bernstein
> Professional Services LucidWorks
>



-- 
Regards,
Shalin Shekhar Mangar.


Start custom Java component on Solr start?

2013-06-17 Thread Otis Gospodnetic
Hi,

What is the best thing in Solr to hook into that would allow me to
start (and keep running) a custom piece of code when Solr starts?  Say
I want to have something that pulls data from an external queue from
within Solr and indexes it into Solr and I want it start and stop
together with the Solr process.  Is there any place in Solr where one
could do that?

Thanks,
Otis
--
Solr & ElasticSearch Support
http://sematext.com/


Re: Start custom Java component on Solr start?

2013-06-17 Thread Al Wold
I've used a servlet context listener before and it works pretty well. You just
have to write a small Java class to receive the event when the app is started,
then add it to web.xml.

I don't think there's much good official documentation, but this blog post 
outlines it pretty simply:

http://www.mkyong.com/servlet/what-is-listener-servletcontextlistener-example/

-Al
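For reference, a minimal listener of the kind Al describes (class name and
worker body are hypothetical; it is registered with a <listener> element in
web.xml):

import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;

public class QueueConsumerListener implements ServletContextListener {
    private Thread consumer;

    public void contextInitialized(ServletContextEvent sce) {
        // Fires when the webapp (and therefore Solr) starts
        consumer = new Thread(new Runnable() {
            public void run() {
                // ... pull from the external queue and index into Solr ...
            }
        }, "queue-consumer");
        consumer.setDaemon(true);
        consumer.start();
    }

    public void contextDestroyed(ServletContextEvent sce) {
        // Fires when the webapp stops, so the worker stops with Solr
        if (consumer != null) {
            consumer.interrupt();
        }
    }
}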

On Jun 17, 2013, at 1:03 PM, Otis Gospodnetic wrote:

> Hi,
> 
> What is the best thing in Solr to hook into that would allow me to
> start (and keep running) a custom piece of code when Solr starts?  Say
> I want to have something that pulls data from an external queue from
> within Solr and indexes it into Solr and I want it start and stop
> together with the Solr process.  Is there any place in Solr where one
> could do that?
> 
> Thanks,
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/



Re: Start custom Java component on Solr start?

2013-06-17 Thread Shalin Shekhar Mangar
I assume you don't want it per-core. Custom CoreAdminHandler maybe?


On Tue, Jun 18, 2013 at 1:33 AM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Hi,
>
> What is the best thing in Solr to hook into that would allow me to
> start (and keep running) a custom piece of code when Solr starts?  Say
> I want to have something that pulls data from an external queue from
> within Solr and indexes it into Solr and I want it start and stop
> together with the Solr process.  Is there any place in Solr where one
> could do that?
>
> Thanks,
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
>



-- 
Regards,
Shalin Shekhar Mangar.


Parallel queries on a single core

2013-06-17 Thread Manuel Le Normand
Hello all,
Assuming I have a single shard with a single core, how do I run
multi-threaded queries on Solr 4.x?

Specifically, if one user sends a heavy query (a legitimate wildcard query
running for 10 sec), what happens to all other users querying during this period?

If the response is that simultaneous queries (say 2) run multi-threaded, a
single CPU would switch between those two query threads, and in the case of 2
CPUs each CPU would run its own thread. But the latter case does not give
any performance advantage to repFactor > 1, as it's close to the same
as a single replica running with > 1 CPUs. So I am a bit confused about this,

Thanks,

Manu


column of linked table can not be displayed

2013-06-17 Thread Jenny Huang
I am importing data from two database tables into Solr.  The main table is
called 'gene'.  The other table is called 'taxon'.  The two tables are
connected through the 'taxon' column in the 'gene' table and the 'taxon_oid'
column in the 'taxon' table.  In other words, 'gene.taxon = taxon.taxon_oid'.

I want the 'domain' column in the 'taxon' table to show up together with the
columns in the gene table.  Unfortunately, the 'domain' column refuses to be
displayed.  See below for the related markup in data-config.xml and schema.xml.

data-config.xml:

[XML stripped by the mail archive: it defined a JDBC dataSource, a root
entity on the 'gene' table, and a nested entity on the 'taxon' table
joined via gene.taxon = taxon.taxon_oid that selects the 'domain' column.]

schema.xml:

[Field declarations also stripped by the archive.]

The import goes smoothly.  However, when I run a query in the admin
browser, the 'domain' column never shows up.

http://localhost:8983/solr/imgdb/select?q=*%3A*&wt=xml&indent=true


  0
  7
  
true
*:*
1371500131137
xml
  
  
637000454
hypothetical protein
637000261
SBO_2569
637808244  



I want the  to be something like:

  
637000454
hypothetical protein
637000261
SBO_2569
637808244
bacteria 

Could someone help me and let me know what went wrong?  Thanks ahead.
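Since the archive stripped the XML above, here is a generic sketch of the
parent/child DIH mapping the post describes (every name not given in the
prose is hypothetical, and a common culprit for a missing column is the
field not being declared, or not stored, in schema.xml):

<dataConfig>
  <dataSource driver="..." url="..." user="..." password="..."/>
  <document>
    <entity name="gene" query="SELECT ... FROM gene">
      <entity name="taxon"
              query="SELECT domain FROM taxon WHERE taxon_oid = '${gene.taxon}'">
        <field column="domain" name="domain"/>
      </entity>
    </entity>
  </document>
</dataConfig>

and in schema.xml the joined column needs its own (stored) field, e.g.:

<field name="domain" type="string" indexed="true" stored="true"/>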


Avoiding OOM fatal crash

2013-06-17 Thread Manuel Le Normand
Hello again,

After a heavy query on my index (returning 100K docs in a single query) my
JVM heap floods and I get a Java OOM exception, and after that my
GC cannot collect anything (GC overhead limit exceeded), as these memory
chunks are not disposable.

I want to allow queries like this; my concern is that this case provokes a
total Solr crash, returning a 503 Internal Server Error while trying to
*index*.

Is there any way to separate these two logics? I'm fine with Solr not being
able to return any response after this OOM, but I don't see the
justification for the query flooding the JVM's internal (bounded) write
buffers.

Thanks,
Manuel


Re: Avoiding OOM fatal crash

2013-06-17 Thread Manuel Le Normand
One of my users requested it; users are less aware of what's allowed, and I
don't want to block them a priori for long specific requests (there are other
params that might end up OOMing me).

I thought of the timeAllowed restriction, but this solution cannot
guarantee that during this delay the JVM heap won't get flooded (for
example if I already have everything cached and my RAM I/O is very fast).
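For reference, that restriction is a per-request parameter, e.g.
&timeAllowed=5000 for a five-second cap; as noted, it bounds search time, not
the memory used to build the response.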


On Mon, Jun 17, 2013 at 11:47 PM, Walter Underwood wrote:

> Don't request 100K docs in a single query. Fetch them in smaller batches.
>
> wunder
>
> On Jun 17, 2013, at 1:44 PM, Manuel Le Normand wrote:
>
> > Hello again,
> >
> > After a heavy query on my index (returning 100K docs in a single query)
> my
> > JVM heap's floods and I get an JAVA OOM exception, and then that my
> > GC cannot collect anything (GC
> > overhead limit exceeded) as these memory chunks are not disposable.
> >
> > I want to afford queries like this, my concern is that this case
> provokes a
> > total Solr crash, returning a 503 Internal Server Error while trying to *
> > index.*
> >
> > Is there anyway to separate these two logics? I'm fine with solr not
> being
> > able to return any response after returning this OOM, but I don't see the
> > justification the query to flood JVM's internal (bounded) buffers for
> > writings.
> >
> > Thanks,
> > Manuel
>
>
>
>
>
>


Re: Solr large boolean filter

2013-06-17 Thread Alexandre Rafalovitch
On Mon, Jun 17, 2013 at 12:35 PM, Igor Kustov  wrote:
> So I'm using query like
> http://127.0.0.1:8080/solr/select?q=*:*&fq={!mqparser}id:%281%202%203%29

If the IDs are purely numeric, I wonder if the better way is to send a
bitset. So, bit 1 is on if ID:1 is included, bit 2000 is on if ID:2000
is included. Even using URL-encoding rules, you can fit at least 65
sequential ID flags per character and I am sure there are more
efficient encoding schemes for long empty sequences.

Regards,
   Alex.



Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)
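A sketch of the client-side encoding (plain Java, names hypothetical; the
Solr side would need a custom query parser that reverses it):

import java.util.BitSet;
import javax.xml.bind.DatatypeConverter;

public class IdBitsetCodec {
    // One bit per document ID, Base64-encoded for the URL parameter
    static String encode(int... ids) {
        BitSet bits = new BitSet();
        for (int id : ids) {
            bits.set(id);
        }
        return DatatypeConverter.printBase64Binary(bits.toByteArray()); // Java 7+
    }

    // Inside the custom parser: recover the ID set
    static BitSet decode(String param) {
        return BitSet.valueOf(DatatypeConverter.parseBase64Binary(param));
    }

    public static void main(String[] args) {
        String param = encode(1, 2, 3, 2000);
        System.out.println(param + " -> " + decode(param));
    }
}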


Re: Avoiding OOM fatal crash

2013-06-17 Thread Walter Underwood
Don't request 100K docs in a single query. Fetch them in smaller batches.

wunder
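A SolrJ sketch of that batching (URL, query and page size are hypothetical;
each page keeps the response, and hence the heap cost, small):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PagedFetch {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
        final int page = 1000;                 // docs per request instead of 100K at once
        SolrQuery q = new SolrQuery("body:report").setRows(page);
        for (int start = 0; ; start += page) {
            q.setStart(start);
            QueryResponse rsp = solr.query(q);
            // ... process rsp.getResults() ...
            if (start + page >= rsp.getResults().getNumFound()) {
                break;                         // past the last page
            }
        }
    }
}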

On Jun 17, 2013, at 1:44 PM, Manuel Le Normand wrote:

> Hello again,
> 
> After a heavy query on my index (returning 100K docs in a single query) my
> JVM heap's floods and I get an JAVA OOM exception, and then that my
> GC cannot collect anything (GC
> overhead limit exceeded) as these memory chunks are not disposable.
> 
> I want to afford queries like this, my concern is that this case provokes a
> total Solr crash, returning a 503 Internal Server Error while trying to *
> index.*
> 
> Is there anyway to separate these two logics? I'm fine with solr not being
> able to return any response after returning this OOM, but I don't see the
> justification the query to flood JVM's internal (bounded) buffers for
> writings.
> 
> Thanks,
> Manuel







Re: SOLR Cloud - Disable Transaction Logs

2013-06-17 Thread Rishi Easwaran
Shalin,
 
Just some thoughts.

Near-real-time replication: don't we use SolrCmdDistributor, which sends
requests immediately to replicas with a cloned request? As an option, can't we
achieve something similar from CloudSolrServer in SolrJ instead of the leader
doing it? As long as 2 nodes receive writes and acknowledge, durability should
be high.
Peer sync and recovery: can we achieve that by merging indexes from the leader
as needed, instead of replaying the transaction logs?

Rishi.
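For reference, a sketch of the SolrJ cloud client mentioned above (ZooKeeper
addresses and the collection name are placeholders; in 4.3 the leader still
does the distribution, client-side document routing only arrived in later 4.x
releases):

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CloudIndexDemo {
    public static void main(String[] args) throws Exception {
        CloudSolrServer solr = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        solr.setDefaultCollection("collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "someguid!1"); // compositeId: hashed on the "someguid" prefix
        doc.addField("subject", "hello");
        solr.add(doc);
        solr.commit();
        solr.shutdown();
    }
}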

 

 

 

-Original Message-
From: Shalin Shekhar Mangar 
To: solr-user 
Sent: Mon, Jun 17, 2013 3:43 pm
Subject: Re: SOLR Cloud - Disable Transaction Logs


It is also necessary for near real-time replication, peer sync and recovery.


On Tue, Jun 18, 2013 at 1:04 AM, Rishi Easwaran wrote:

> Hi,
>
> Is there a way to disable transaction logs in SOLR cloud. As far as I can
> tell no.
> Just curious why do we need transaction logs, seems like an I/O intensive
> operation.
> As long as I have replicatonFactor >1, if a node (leader) goes down, the
> replica can take over and maintain a durable state of my index.
>
> I understand from the previous discussions, that it was intended for
> update durability and realtime get.
> But, unless I am missing something an ability to disable it in SOLR cloud
> if not needed would be good.
>
> Thanks,
>
> Rishi.
>
>


-- 
Regards,
Shalin Shekhar Mangar.

 


Spread the word - Opening at AOL Mail Team in Dulles VA

2013-06-17 Thread Rishi Easwaran
Hi All,

With the economy the way it is and many folks still looking, I figured this
is as good a place as any to post this.

Just today, we got an opening for mid-senior level Software Engineer in our 
team.
Experience with SOLR is a big+.
Feel free to have a look at this position.
http://www.linkedin.com/jobs?viewJob=&jobId=6073910

If interested, send your current resume to rishi.easwa...@aol.com.
I will take it to my Director.   

This position is in Dulles, VA.

Thanks,

Rishi.


Re: Avoiding OOM fatal crash

2013-06-17 Thread Walter Underwood
Make them aware of what is required. Solr is not designed to return huge
result sets in a single request.

If you need to do this, you will need to run the JVM with a big enough heap to 
build the request. You are getting OOM because the JVM does not have enough 
memory to build a response with 100K documents.

wunder

On Jun 17, 2013, at 1:57 PM, Manuel Le Normand wrote:

> One of my users requested it, they are less aware of what's allowed and I
> don't want apriori blocking them for long specific request (there are other
> params that might end up OOMing me).
> 
> I thought of timeAllowed restriction, but also this solution cannot
> guarantee during this delay I would not get the JVM heap flooded (for
> example I already have all cashed and my RAM io's are very fast)
> 
> 
> On Mon, Jun 17, 2013 at 11:47 PM, Walter Underwood 
> wrote:
> 
>> Don't request 100K docs in a single query. Fetch them in smaller batches.
>> 
>> wunder
>> 
>> On Jun 17, 2013, at 1:44 PM, Manuel Le Normand wrote:
>> 
>>> Hello again,
>>> 
>>> After a heavy query on my index (returning 100K docs in a single query)
>> my
>>> JVM heap's floods and I get an JAVA OOM exception, and then that my
>>> GC cannot collect anything (GC
>>> overhead limit exceeded) as these memory chunks are not disposable.
>>> 
>>> I want to afford queries like this, my concern is that this case
>> provokes a
>>> total Solr crash, returning a 503 Internal Server Error while trying to *
>>> index.*
>>> 
>>> Is there anyway to separate these two logics? I'm fine with solr not
>> being
>>> able to return any response after returning this OOM, but I don't see the
>>> justification the query to flood JVM's internal (bounded) buffers for
>>> writings.
>>> 
>>> Thanks,
>>> Manuel
>> 






Re: Solr large boolean filter

2013-06-17 Thread Otis Gospodnetic
Btw. ElasticSearch has a nice feature here.  Not sure what it's
called, but I call it "named filter".

http://www.elasticsearch.org/blog/terms-filter-lookup/

Maybe that's what OP was after?

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Mon, Jun 17, 2013 at 4:59 PM, Alexandre Rafalovitch
 wrote:
> On Mon, Jun 17, 2013 at 12:35 PM, Igor Kustov  wrote:
>> So I'm using query like
>> http://127.0.0.1:8080/solr/select?q=*:*&fq={!mqparser}id:%281%202%203%29
>
> If the IDs are purely numeric, I wonder if the better way is to send a
> bitset. So, bit 1 is on if ID:1 is included, bit 2000 is on if ID:2000
> is included. Even using URL-encoding rules, you can fit at least 65
> sequential ID flags per character and I am sure there are more
> efficient encoding schemes for long empty sequences.
>
> Regards,
>Alex.
>
>
>
> Personal website: http://www.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)


Re: Parallel queries on a single core

2013-06-17 Thread Otis Gospodnetic
If I understand your question correctly - what happens with Solr and N
parallel queries is not much different from what happens with N
processes running in the OS - they all get a slice of the CPU time to
do their work.  Not sure if that answers your question...?

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Mon, Jun 17, 2013 at 4:32 PM, Manuel Le Normand
 wrote:
> Hello all,
> Assuming I have a single shard with a single core, how do run
> multi-threaded queries on Solr 4.x?
>
> Specifically, if one user sends a heavy query (legitimate wildcard query
> for 10 sec), what happens to all other users quering during this period?
>
> If the repsonse is that simultaneous queries (say 2) run multi-threaded, a
> single CPU would switch between those two query-threads, and in case of 2
> CPU's each CPU would run his own thread. But the latter case does not give
> any advantage to repFactor > 1 perfomance speaking, as it's close to same
> as a single replica running wth >1 CPU's. So I am bit confused about this,
>
> Thanks,
>
> Manu


Re: Solr Cloud Hangs consistently .

2013-06-17 Thread Yago Riveiro
I can confirm that the deadlock happens with only 2 replicas per shard. I need
to shut down one node that hosts a replica of the shard to recover indexing
capability.

-- 
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Monday, June 17, 2013 at 6:44 PM, Rishi Easwaran wrote:

> 
> 
> Hi All,
> 
> I am trying to benchmark SOLR Cloud and it consistently hangs. 
> Nothing in the logs, no stack trace, no errors, no warnings, just seems stuck.
> 
> A little bit about my set up. 
> I have 3 benchmark hosts, each with 96GB RAM, 24 CPU's and 1TB SSD. Each host 
> is configured to have 8 SOLR cloud nodes running at 4GB each.
> JVM configs: http://apaste.info/57Ai
> 
> My cluster has 12 shards with replication factor 2- http://apaste.info/09sA
> 
> I originally stated with SOLR 4.2., tomcat 5 and jdk 6, as we are already 
> running this configuration in production in Non-Cloud form. 
> It got stuck repeatedly.
> 
> I decided to upgrade to the latest and greatest of everything, SOLR 4.3, JDK7 
> and tomcat7. 
> It still shows same behaviour and hangs through the test.
> 
> My test schema and config.
> Schema.xml - http://apaste.info/imah
> SolrConfig.xml - http://apaste.info/ku4F
> 
> The test is pretty simple. its a jmeter test with update command via SOAP rpc 
> (round robin request across every node), adding in 5 fields from a csv file - 
> id, guid, subject, body, compositeID (guid!id).
> number of jmeter threads = 150. loop count = 20, num of messages to add/per 
> guid = 3; total 150*3*20 = 9000 documents. 
> 
> When cloud gets stuck, i don't get anything in the logs, but when i run 
> netstat i see the following.
> Sample netstat on a stuck run. http://apaste.info/hr0O 
> hycl-d20 is my jmeter host. ssd-d01/2/3 are my cloud hosts.
> 
> At the moment my benchmarking efforts are at a stand still.
> 
> Any help from the community would be great, I got some heap dumps and stack 
> dumps, but haven't found a smoking gun yet.
> If I can provide anything else to diagnose this issue. just let me know.
> 
> Thanks,
> 
> Rishi. 



Re: Start custom Java component on Solr start?

2013-06-17 Thread Otis Gospodnetic
Hi,

Hm, right, although once Solr stops being a webapp, this won't work any more...

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Mon, Jun 17, 2013 at 4:14 PM, Al Wold  wrote:
> I've used a servlet context listener before and it works pretty well. You 
> just have a write a small Java class to receive the event when the app is 
> started, then add it to web.xml.
>
> I don't think there's much good official documentation, but this blog post 
> outlines it pretty simply:
>
> http://www.mkyong.com/servlet/what-is-listener-servletcontextlistener-example/
>
> -Al
>
> On Jun 17, 2013, at 1:03 PM, Otis Gospodnetic wrote:
>
>> Hi,
>>
>> What is the best thing in Solr to hook into that would allow me to
>> start (and keep running) a custom piece of code when Solr starts?  Say
>> I want to have something that pulls data from an external queue from
>> within Solr and indexes it into Solr and I want it start and stop
>> together with the Solr process.  Is there any place in Solr where one
>> could do that?
>>
>> Thanks,
>> Otis
>> --
>> Solr & ElasticSearch Support
>> http://sematext.com/
>


Re: Parallel queries on a single core

2013-06-17 Thread Manuel Le Normand
Yes, that answers the first part of my question, thanks.

So saying N (equally heavy) queries agains N CPUs would run simultaneously,
right?

Previous posting suggest high qps rate can be solved perfomance-wise by
having high replicationFactor. But what's the  benefit (performance wise)
compared to having a single replica served by many CPU's?




On Tue, Jun 18, 2013 at 12:14 AM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> If I understand your question correctly - what happens with Solr and N
> parallel queries is not much different from what happens with N
> processes running in the OS - they all get a slice of the CPU time to
> do their work.  Not sure if that answers your question...?
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
>
>
>
>
>
> On Mon, Jun 17, 2013 at 4:32 PM, Manuel Le Normand
>  wrote:
> > Hello all,
> > Assuming I have a single shard with a single core, how do run
> > multi-threaded queries on Solr 4.x?
> >
> > Specifically, if one user sends a heavy query (legitimate wildcard query
> > for 10 sec), what happens to all other users quering during this period?
> >
> > If the repsonse is that simultaneous queries (say 2) run multi-threaded,
> a
> > single CPU would switch between those two query-threads, and in case of 2
> > CPU's each CPU would run his own thread. But the latter case does not
> give
> > any advantage to repFactor > 1 perfomance speaking, as it's close to same
> > as a single replica running wth >1 CPU's. So I am bit confused about
> this,
> >
> > Thanks,
> >
> > Manu
>


Re: Solr cloud: zkHost in solr.xml gets wiped out

2013-06-17 Thread Al Wold
Hi Erick,
I tried out your changes from the branch_4x branch. It looks good in terms of
preserving the zkHost, but I'm running into an exception because it isn't
persisting the instanceDir attribute on the <core> element.

I've got a few other things I need to take care of, but as soon as I have time 
I'll dig in and see if I can figure out what's going on, and see what changed 
to make this not work.

Here are details on what the files looked like before/after CREATE call:

original solr.xml:

<solr persistent="true">
  <cores adminPath="/admin/cores" zkHost="..."
         hostContext="/"/>
</solr>

here's what was produced with 4.3 branch + a quick mod to preserve zkHost:

[XML stripped by the mail archive; the persisted <cores> element kept
zkHost, and each <core> element carried its instanceDir.]

here's what was produced with branch_4x 4.4-SNAPSHOT:

[XML stripped by the mail archive; the <core> elements were persisted
without the instanceDir attribute.]

and here's the error from solr.log after restarting after the CREATE:

2013-06-17 21:37:07,083 1874 [pool-2-thread-1] ERROR 
org.apache.solr.core.CoreContainer  - null:java.lang.NullPointerException: 
Missing required 'instanceDir'
at org.apache.solr.core.CoreDescriptor.doInit(CoreDescriptor.java:133)
at org.apache.solr.core.CoreDescriptor.<init>(CoreDescriptor.java:87)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:365)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:221)
at 
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:190)
at 
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:124)
at 
org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:277)
at 
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:258)
at 
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:382)
at 
org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:103)
at 
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4638)
at 
org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5294)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at 
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:895)
at 
org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:871)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:615)
at 
org.apache.catalina.startup.HostConfig.deployDirectory(HostConfig.java:1099)
at 
org.apache.catalina.startup.HostConfig$DeployDirectory.run(HostConfig.java:1621)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)


On Jun 16, 2013, at 5:38 AM, Erick Erickson wrote:

> Al:
> 
> As it happens, I hope sometime today to put up a patch for SOLR-4910
> that should harden up many things in persisting solr.xml, I'll be sure
> to include this. It's kind of a pain to create an automated test for
> this, so I'll give it a whirl manually.
> 
> As you say, most of this is going away in 5.0, but it needs to work for 4.x.
> 
> And when I get the patch up, if you could give it a "real world" try
> it'd be great!
> 
> Thanks,
> Erick
> 
> On Fri, Jun 14, 2013 at 6:15 PM, Al Wold  wrote:
>> Hi,
>> I'm working on setting up a solr cloud test environment, and the target 
>> environment I need to put it in has multiple webapps per tomcat instance. 
>> With that in mind, I wanted/had to avoid putting any configs in system 
>> properties. I tried putting the zkHost in solr.xml, like this:
>> 
>>> 
>>> <solr persistent="true">
>>>   <cores adminPath="/admin/cores" zkHost="..."
>>> hostContext="/"/>
>>> </solr>
>> Everything works fine when I first start things up, create collections, 
>> upload docs, search, etc. Creating the collection, however, modifies the 
>> solr.xml file, and doesn't keep the zkHost setting:
>> 
>>> <solr persistent="true">
>>>   <cores adminPath="/admin/cores" hostContext="/">
>>>     <core instanceDir="directory_shard2_replica1/" transient="false"
>>> name="directory_shard2_replica1" collection="directory"/>
>>>     <core instanceDir="directory_shard1_replica1/" transient="false"
>>> name="directory_shard1_replica1" collection="directory"/>
>>>   </cores>
>>> </solr>
>> 
>> 
>> With that in mind, once I restart tomcat, it no longer knows it's supposed 
>> to be talking to zookeeper, so it looks for local configs and blows up.
>> 
>> I traced this back to the code in CoreContainer.java, in the method 
>> persistFile(), where it seems to contain no code to write out the zkHost 
>> when it updates solr.xml. I upped the logging on my solr instance to verify 
>> this code is executing, so I'm pretty sure it's the right spot.
>> 
>> Is anyone else using zkHost in their solr.xml successfully? I can't see how 
>> it would work given this pr

Re: Parallel queries on a single core

2013-06-17 Thread Shawn Heisey

On 6/17/2013 2:32 PM, Manuel Le Normand wrote:

Hello all,
Assuming I have a single shard with a single core, how do run
multi-threaded queries on Solr 4.x?


To run multithreaded queries, just send them at the same time, as you 
mention below.  Solr will run them in parallel, within the limits of the 
system(s) you're using.
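A minimal SolrJ sketch of this (core URL and queries are hypothetical); each
submitted query is just an ordinary concurrent request:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class ParallelQueryDemo {
    public static void main(String[] args) throws Exception {
        // HttpSolrServer is thread-safe, so one instance can be shared
        final HttpSolrServer solr =
            new HttpSolrServer("http://localhost:8983/solr/collection1");
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 4; i++) {
            final int n = i;
            pool.submit(new Runnable() {
                public void run() {
                    try {
                        // Runs concurrently with the other submitted queries
                        long hits = solr.query(new SolrQuery("*:*"))
                                        .getResults().getNumFound();
                        System.out.println("query " + n + ": " + hits + " hits");
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        }
        pool.shutdown();
    }
}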



Specifically, if one user sends a heavy query (legitimate wildcard query
for 10 sec), what happens to all other users quering during this period?

If the repsonse is that simultaneous queries (say 2) run multi-threaded, a
single CPU would switch between those two query-threads, and in case of 2
CPU's each CPU would run his own thread. But the latter case does not give
any advantage to repFactor > 1 perfomance speaking, as it's close to same
as a single replica running wth >1 CPU's. So I am bit confused about this,


Your question isn't very clear.  Otis has answered one possibility.  I 
have something different to address.


The main advantage of repFactor > 1 is redundancy and high performance 
in high-volume situations.


With multiple replicas, each one will be on a completely separate 
machine.  Whether that performs better than a single machine with 
multiple CPUs will depend on a bunch of factors.  Some examples: the 
amount of data returned by a query and whether Solr can get the data out 
of the disk cache or whether it has to actually go to the disk.


In memory-limited situations, having the disk I/O bandwidth of multiple 
machines might overcome the latency introduced by the distributed 
communication over the network.  Also, it's likely that you'll have more 
total RAM with more machines, so splitting the index into shards on 
separate machines makes it more likely that each shard will fit into 
available memory.


Thanks,
Shawn



Re: Avoiding OOM fatal crash

2013-06-17 Thread Manuel Le Normand
Unfortunately my organisation is too big to control or teach every employee
what the limits are, and the limits can vary (how many facets are OK?
asking for too many fields combined with too many rows, etc.)

Don't you think it is preferable to commit the maxBufferSize in the JVM
heap to indexing only?

On Tue, Jun 18, 2013 at 12:11 AM, Walter Underwood wrote:

> Make them aware of what is required. Solr is not designed to return huge
> requests.
>
> If you need to do this, you will need to run the JVM with a big enough
> heap to build the request. You are getting OOM because the JVM does not
> have enough memory to build a response with 100K documents.
>
> wunder
>
> On Jun 17, 2013, at 1:57 PM, Manuel Le Normand wrote:
>
> > One of my users requested it, they are less aware of what's allowed and I
> > don't want apriori blocking them for long specific request (there are
> other
> > params that might end up OOMing me).
> >
> > I thought of timeAllowed restriction, but also this solution cannot
> > guarantee during this delay I would not get the JVM heap flooded (for
> > example I already have all cashed and my RAM io's are very fast)
> >
> >
> > On Mon, Jun 17, 2013 at 11:47 PM, Walter Underwood <
> wun...@wunderwood.org>wrote:
> >
> >> Don't request 100K docs in a single query. Fetch them in smaller
> batches.
> >>
> >> wunder
> >>
> >> On Jun 17, 2013, at 1:44 PM, Manuel Le Normand wrote:
> >>
> >>> Hello again,
> >>>
> >>> After a heavy query on my index (returning 100K docs in a single query)
> >> my
> >>> JVM heap's floods and I get an JAVA OOM exception, and then that my
> >>> GC cannot collect anything (GC
> >>> overhead limit exceeded) as these memory chunks are not disposable.
> >>>
> >>> I want to afford queries like this, my concern is that this case
> >> provokes a
> >>> total Solr crash, returning a 503 Internal Server Error while trying
> to *
> >>> index.*
> >>>
> >>> Is there anyway to separate these two logics? I'm fine with solr not
> >> being
> >>> able to return any response after returning this OOM, but I don't see
> the
> >>> justification the query to flood JVM's internal (bounded) buffers for
> >>> writings.
> >>>
> >>> Thanks,
> >>> Manuel
> >>
>
>
>
>
>


Re: Solr Cloud Hangs consistently .

2013-06-17 Thread Rishi Easwaran
Update!!

This happens with replicationFactor=1.
Just for kicks I created a collection with 24 shards, replicationFactor=1,
on my existing benchmark env.
Same behaviour: SOLR cloud just hangs. Nothing in the logs; top/heap/cpu and
most metrics look fine.
Only indication seems to be netstat showing incoming requests not being read
in.
 
Yago,

I saw your previous post 
(http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html#a4067631)
Following it, Last week, I upgraded to SOLR 4.3, to see if the issue gets 
fixed, but no luck.
Looks like this is a dominant and easily reproducible issue on SOLR cloud.


Thanks,

Rishi. 





 

 

 

-Original Message-
From: Yago Riveiro 
To: solr-user 
Sent: Mon, Jun 17, 2013 5:15 pm
Subject: Re: Solr Cloud Hangs consistently .


I can confirm that the deadlock happen with only 2 replicas by shard. I need 
shutdown one node that host a replica of the shard to recover the indexation 
capability.

-- 
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Monday, June 17, 2013 at 6:44 PM, Rishi Easwaran wrote:

> 
> 
> Hi All,
> 
> I am trying to benchmark SOLR Cloud and it consistently hangs. 
> Nothing in the logs, no stack trace, no errors, no warnings, just seems stuck.
> 
> A little bit about my set up. 
> I have 3 benchmark hosts, each with 96GB RAM, 24 CPU's and 1TB SSD. Each host 
is configured to have 8 SOLR cloud nodes running at 4GB each.
> JVM configs: http://apaste.info/57Ai
> 
> My cluster has 12 shards with replication factor 2- http://apaste.info/09sA
> 
> I originally stated with SOLR 4.2., tomcat 5 and jdk 6, as we are already 
running this configuration in production in Non-Cloud form. 
> It got stuck repeatedly.
> 
> I decided to upgrade to the latest and greatest of everything, SOLR 4.3, JDK7 
and tomcat7. 
> It still shows same behaviour and hangs through the test.
> 
> My test schema and config.
> Schema.xml - http://apaste.info/imah
> SolrConfig.xml - http://apaste.info/ku4F
> 
> The test is pretty simple. its a jmeter test with update command via SOAP rpc 
(round robin request across every node), adding in 5 fields from a csv file - 
id, guid, subject, body, compositeID (guid!id).
> number of jmeter threads = 150. loop count = 20, num of messages to add/per 
guid = 3; total 150*3*20 = 9000 documents. 
> 
> When cloud gets stuck, i don't get anything in the logs, but when i run 
netstat i see the following.
> Sample netstat on a stuck run. http://apaste.info/hr0O 
> hycl-d20 is my jmeter host. ssd-d01/2/3 are my cloud hosts.
> 
> At the moment my benchmarking efforts are at a stand still.
> 
> Any help from the community would be great, I got some heap dumps and stack 
dumps, but haven't found a smoking gun yet.
> If I can provide anything else to diagnose this issue. just let me know.
> 
> Thanks,
> 
> Rishi. 


 


Re: Solr Cloud Hangs consistently .

2013-06-17 Thread Mark Miller
If it actually happens with replicationFactor=1, it likely doesn't have
anything to do with the update handler issue I'm referring to. In cases
like these, people have better luck with Jetty than Tomcat - we test it much
more. For instance, it's set up to help avoid search-side distributed deadlocks.

In any case, there is something special about it - I have done and seen a lot of
heavy indexing to SolrCloud, by me and others, without running into this, both
with replicationFactor=1 and greater. So there is something specific in how
the load is being done or what features/methods are being used that likely
causes it or makes it easier to cause.

But again, the issue I know about involves threads that are not even created in 
the replicationFactor = 1 case, so that could be a first report afaik.

- Mark

On Jun 17, 2013, at 5:52 PM, Rishi Easwaran  wrote:

> Update!!
> 
> This happens with replicationFactor=1
> Just for kicks I created a collection with a 24 shards, replicationfactor=1 
> cluster on my exisiting benchmark env.
> Same behaviour, SOLR cloud just hangs. Nothing in the logs, top/heap/cpu most 
> metrics looks fine.
> Only indication seems to be netstat showing incoming request not being read 
> in.
> 
> Yago,
> 
> I saw your previous post 
> (http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html#a4067631)
> Following it, Last week, I upgraded to SOLR 4.3, to see if the issue gets 
> fixed, but no luck.
> Looks like this is a dominant and easily reproducible issue on SOLR cloud.
> 
> 
> Thanks,
> 
> Rishi. 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> -Original Message-
> From: Yago Riveiro 
> To: solr-user 
> Sent: Mon, Jun 17, 2013 5:15 pm
> Subject: Re: Solr Cloud Hangs consistently .
> 
> 
> I can confirm that the deadlock happen with only 2 replicas by shard. I need 
> shutdown one node that host a replica of the shard to recover the indexation 
> capability.
> 
> -- 
> Yago Riveiro
> Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
> 
> 
> On Monday, June 17, 2013 at 6:44 PM, Rishi Easwaran wrote:
> 
>> 
>> 
>> Hi All,
>> 
>> I am trying to benchmark SOLR Cloud and it consistently hangs. 
>> Nothing in the logs, no stack trace, no errors, no warnings, just seems 
>> stuck.
>> 
>> A little bit about my set up. 
>> I have 3 benchmark hosts, each with 96GB RAM, 24 CPU's and 1TB SSD. Each 
>> host 
> is configured to have 8 SOLR cloud nodes running at 4GB each.
>> JVM configs: http://apaste.info/57Ai
>> 
>> My cluster has 12 shards with replication factor 2- http://apaste.info/09sA
>> 
>> I originally stated with SOLR 4.2., tomcat 5 and jdk 6, as we are already 
> running this configuration in production in Non-Cloud form. 
>> It got stuck repeatedly.
>> 
>> I decided to upgrade to the latest and greatest of everything, SOLR 4.3, 
>> JDK7 
> and tomcat7. 
>> It still shows same behaviour and hangs through the test.
>> 
>> My test schema and config.
>> Schema.xml - http://apaste.info/imah
>> SolrConfig.xml - http://apaste.info/ku4F
>> 
>> The test is pretty simple. its a jmeter test with update command via SOAP 
>> rpc 
> (round robin request across every node), adding in 5 fields from a csv file - 
> id, guid, subject, body, compositeID (guid!id).
>> number of jmeter threads = 150. loop count = 20, num of messages to add/per 
> guid = 3; total 150*3*20 = 9000 documents. 
>> 
>> When cloud gets stuck, i don't get anything in the logs, but when i run 
> netstat i see the following.
>> Sample netstat on a stuck run. http://apaste.info/hr0O 
>> hycl-d20 is my jmeter host. ssd-d01/2/3 are my cloud hosts.
>> 
>> At the moment my benchmarking efforts are at a stand still.
>> 
>> Any help from the community would be great, I got some heap dumps and stack 
> dumps, but haven't found a smoking gun yet.
>> If I can provide anything else to diagnose this issue. just let me know.
>> 
>> Thanks,
>> 
>> Rishi. 
> 
> 
> 



dynamic field

2013-06-17 Thread Mingfeng Yang
How is a dynamic field in Solr implemented?  Does it get saved into the same
document as other regular fields in the Lucene index?

Ming-


Re: dynamic field

2013-06-17 Thread Rafał Kuć
Hello!

A dynamic field is just a regular field from the Lucene point of view, so
its content will be treated just like the content of other fields. The
difference is on the Solr level.
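For example, a schema.xml declaration like

    <dynamicField name="*_s" type="string" indexed="true" stored="true"/>

matches any incoming field name ending in _s; once matched, the value is
written to the Lucene document exactly like an explicitly declared field.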

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch

> How is daynamic field in solr implemented?  Does it get saved into the same
> Document as other regular fields in lucene index?

> Ming-



Re: Solr Cloud Hangs consistently .

2013-06-17 Thread Yago Riveiro
I do all the indexing through HTTP POST; with replicationFactor=1 there is no
problem, but if it is higher, deadlock problems can appear.

A stack trace like this
http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html#a4067862
is what I get.

-- 
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Monday, June 17, 2013 at 11:03 PM, Mark Miller wrote:

> If it actually happens with replicationFactor=1, it doesn't likely have 
> anything to do with the update handler issue I'm referring to. In some cases 
> like these, people have better luck with Jetty than Tomcat - we test it much 
> more. For instance, it's setup to help avoid search side distributed 
> deadlocks.
> 
> In any case, there is something special about it - I do and have seen a lot 
> of heavy indexing to SolrCloud by me and others without running into this. 
> Both with replicationFacotor=1 and greater. So there is something specific in 
> how the load is being done or what features/methods are being used that 
> likely causes it or makes it easier to cause.
> 
> But again, the issue I know about involves threads that are not even created 
> in the replicationFactor = 1 case, so that could be a first report afaik.
> 
> - Mark
> 
> On Jun 17, 2013, at 5:52 PM, Rishi Easwaran <rishi.easwa...@aol.com> wrote:
> 
> > Update!!
> > 
> > This happens with replicationFactor=1
> > Just for kicks I created a collection with a 24 shards, replicationfactor=1 
> > cluster on my exisiting benchmark env.
> > Same behaviour, SOLR cloud just hangs. Nothing in the logs, top/heap/cpu 
> > most metrics looks fine.
> > Only indication seems to be netstat showing incoming request not being read 
> > in.
> > 
> > Yago,
> > 
> > I saw your previous post 
> > (http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html#a4067631)
> > Following it, Last week, I upgraded to SOLR 4.3, to see if the issue gets 
> > fixed, but no luck.
> > Looks like this is a dominant and easily reproducible issue on SOLR cloud.
> > 
> > 
> > Thanks,
> > 
> > Rishi. 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > -Original Message-
> > From: Yago Riveiro <yago.rive...@gmail.com>
> > To: solr-user <solr-user@lucene.apache.org>
> > Sent: Mon, Jun 17, 2013 5:15 pm
> > Subject: Re: Solr Cloud Hangs consistently .
> > 
> > 
> > I can confirm that the deadlock happen with only 2 replicas by shard. I 
> > need 
> > shutdown one node that host a replica of the shard to recover the 
> > indexation 
> > capability.
> > 
> > -- 
> > Yago Riveiro
> > Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
> > 
> > 
> > On Monday, June 17, 2013 at 6:44 PM, Rishi Easwaran wrote:
> > 
> > > 
> > > 
> > > Hi All,
> > > 
> > > I am trying to benchmark SOLR Cloud and it consistently hangs. 
> > > Nothing in the logs, no stack trace, no errors, no warnings, just seems 
> > > stuck.
> > > 
> > > A little bit about my set up. 
> > > I have 3 benchmark hosts, each with 96GB RAM, 24 CPU's and 1TB SSD. Each 
> > > host 
> > > 
> > 
> > is configured to have 8 SOLR cloud nodes running at 4GB each.
> > > JVM configs: http://apaste.info/57Ai
> > > 
> > > My cluster has 12 shards with replication factor 2- 
> > > http://apaste.info/09sA
> > > 
> > > I originally stated with SOLR 4.2., tomcat 5 and jdk 6, as we are already 
> > running this configuration in production in Non-Cloud form. 
> > > It got stuck repeatedly.
> > > 
> > > I decided to upgrade to the latest and greatest of everything, SOLR 4.3, 
> > > JDK7 
> > and tomcat7. 
> > > It still shows same behaviour and hangs through the test.
> > > 
> > > My test schema and config.
> > > Schema.xml - http://apaste.info/imah
> > > SolrConfig.xml - http://apaste.info/ku4F
> > > 
> > > The test is pretty simple. its a jmeter test with update command via SOAP 
> > > rpc 
> > (round robin request across every node), adding in 5 fields from a csv file 
> > - 
> > id, guid, subject, body, compositeID (guid!id).
> > > number of jmeter threads = 150. loop count = 20, num of messages to 
> > > add/per 
> > 
> > guid = 3; total 150*3*20 = 9000 documents. 
> > > 
> > > When cloud gets stuck, i don't get anything in the logs, but when i run 
> > netstat i see the following.
> > > Sample netstat on a stuck run. http://apaste.info/hr0O 
> > > hycl-d20 is my jmeter host. ssd-d01/2/3 are my cloud hosts.
> > > 
> > > At the moment my benchmarking efforts are at a stand still.
> > > 
> > > Any help from the community would be great, I got some heap dumps and 
> > > stack 
> > dumps, but haven't found a smoking gun yet.
> > > If I can provide anything else to diagnose this issue. just let me know.
> > > 
> > > Thanks,
> > > 
> > > Rishi. 



Re: Start custom Java component on Solr start?

2013-06-17 Thread Chris Hostetter

: What is the best thing in Solr to hook into that would allow me to
: start (and keep running) a custom piece of code when Solr starts?  Say
: I want to have something that pulls data from an external queue from
: within Solr and indexes it into Solr and I want it start and stop
: together with the Solr process.  Is there any place in Solr where one
: could do that?

My advice for this has always been a SolrCoreAware RequestHandler which 
spins up whatever threads it needs in its init method and uses its 
handleRequest method to return status information or receive commands 
about pausing.  

This approach has worked very well for me in the past, going as far back 
as Solr 1.0 -- which was before we even had the SolrCoreAware interface; I 
used a newSearcher listener back then to fire a "start" command to my 
custom RequestHandler.


-Hoss
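A rough sketch of that pattern against Solr 4.x (class name and worker logic
are hypothetical, and the exact abstract methods of RequestHandlerBase vary a
little between versions):

import org.apache.solr.core.CloseHook;
import org.apache.solr.core.SolrCore;
import org.apache.solr.handler.RequestHandlerBase;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.util.plugin.SolrCoreAware;

public class QueuePullerHandler extends RequestHandlerBase implements SolrCoreAware {
    private volatile Thread worker;

    public void inform(SolrCore core) {
        // Called once the core is fully initialized: safe to start threads here
        worker = new Thread(new Runnable() {
            public void run() {
                // ... pull from the external queue and index until interrupted ...
            }
        }, "queue-puller");
        worker.setDaemon(true);
        worker.start();
        core.addCloseHook(new CloseHook() {
            public void preClose(SolrCore c) {
                worker.interrupt(); // stop together with the core
            }
            public void postClose(SolrCore c) { }
        });
    }

    public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) {
        rsp.add("running", worker != null && worker.isAlive()); // status endpoint
    }

    public String getDescription() { return "external queue puller"; }

    public String getSource() { return "n/a"; }
}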


RE: yet another optimize question

2013-06-17 Thread Petersen, Robert
Hi Upayavira,

You might have gotten it.  Yes we noticed maxdocs was way bigger than numdocs.  
There were a lot of files ending in '.del' in the index folder also.  We 
started on 1.3 also.   I don't currently have any solr config settings for 
MergePolicy at all.  Am I going to want to put something like this into my 
index defaults section?


<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
</mergePolicy>


Thanks
Robi

-Original Message-
From: Upayavira [mailto:u...@odoko.co.uk] 
Sent: Monday, June 17, 2013 12:29 PM
To: solr-user@lucene.apache.org
Subject: Re: yet another optimize question

The key figures are numdocs vs maxdocs. Maxdocs-numdocs is the number of 
deleted docs in your index.

This is a 3.6 system you say. But has it been upgraded? I've seen folks who've 
upgraded from 1.4 or 3.0/3.1 over time, keeping the old config.
The consequence of this is that they don't get the right config for the 
TieredMergePolicy, and therefore don't get to use it, seeing the old behaviour 
which does require periodic optimise.

Upayavira

On Mon, Jun 17, 2013, at 07:21 PM, Petersen, Robert wrote:
> Hi Otis,
> 
> Right I didn't restart the JVMs except on the one slave where I was
> experimenting with using G1GC on the 1.7.0_21 JRE.   Also some time ago I
> made all our caches small enough to keep us from getting OOMs while still
> having a good hit rate.Our index has about 50 fields which are mostly
> int IDs and there are some dynamic fields also.  These dynamic fields 
> can be used for custom faceting.  We have some standard facets we 
> always facet on and other dynamic facets which are only used if the 
> query is filtering on a particular category.  There are hundreds of 
> these fields but since they are only for a small subset of the overall 
> index they are very sparsely populated with regard to the overall 
> index.  With CMS GC we get a sawtooth on the old generation (I guess 
> every replication and commit causes it's usage to drop down to 10GB or 
> so) and it seems to be the old generation which is the main space 
> consumer.  With the G1GC, the memory map looked totally different!  I 
> was a little lost looking at memory consumption with that GC.  Maybe 
> I'll try it again now that the index is a bit smaller than it was last 
> time I tried it.  After four days without running an optimize now it 
> is 21GB.  BTW our indexing speed is mostly bound by the DB so reducing the 
> segments might be ok...
> 
> Here is a quick snapshot of one slaves memory map as reported by 
> PSI-Probe, but unfortunately I guess I can't send the history graphics 
> to the solr-user list to show their changes over time:
>   NameUsedCommitted   Max 
> Initial Group
>Par Survivor Space 20.02 MB108.13 MB   108.13 MB   
> 108.13 MB   HEAP
>CMS Perm Gen   42.29 MB70.66 MB82.00 MB20.75 
> MBNON_HEAP
>Code Cache 9.73 MB 9.88 MB 48.00 MB2.44 MB NON_HEAP
>CMS Old Gen20.22 GB30.94 GB30.94 GB
> 30.94 GBHEAP
>Par Eden Space 42.20 MB865.31 MB   865.31 MB   865.31 
> MB   HEAP
>Total  20.33 GB31.97 GB32.02 GB
> 31.92 GBTOTAL
> 
> And here's our current cache stats from a random slave:
> 
> name:queryResultCache  
> class:   org.apache.solr.search.LRUCache  
> version: 1.0  
> description: LRU Cache(maxSize=488, initialSize=6, autowarmCount=6,
> regenerator=org.apache.solr.search.SolrIndexSearcher$3@461ff4c3)
> stats:  lookups : 619
> hits : 36
> hitratio : 0.05
> inserts : 592
> evictions : 101
> size : 488
> warmupTime : 2949
> cumulative_lookups : 681225
> cumulative_hits : 73126
> cumulative_hitratio : 0.10
> cumulative_inserts : 602396
> cumulative_evictions : 428868
> 
> 
>  name:fieldCache  
> class:   org.apache.solr.search.SolrFieldCacheMBean  
> version: 1.0  
> description: Provides introspection of the Lucene FieldCache, this is
> **NOT** a cache that is managed by Solr.  
> stats:  entries_count : 359
> 
> 
> name:documentCache  
> class:   org.apache.solr.search.LRUCache  
> version: 1.0  
> description: LRU Cache(maxSize=2048, initialSize=512,
> autowarmCount=10, regenerator=null)
> stats:  lookups : 12710
> hits : 7160
> hitratio : 0.56
> inserts : 5636
> evictions : 3588
> size : 2048
> warmupTime : 0
> cumulative_lookups : 10590054
> cumulative_hits : 6166913
> cumulative_hitratio : 0.58
> cumulative_inserts : 4423141
> cumulative_evictions : 3714653
> 
> 
> name:fieldValueCache  
> class:   org.apache.solr.search.FastLRUCache  
> version: 1.0  
> description: Concurrent LRU Cache(maxSize=280, initialSize=280,
> minSize=252, acceptableSize=266, cleanupThread=false, autowarmCount=6,
> regenerator=org.apache.solr.search.SolrIndexSearcher$1@143eb77a)
> stats:  lookups : 1725
> hits : 1481
> hitratio : 

Re: Avoiding OOM fatal crash

2013-06-17 Thread Mark Miller
There is a java cmd line arg that lets you run a command on OOM - I'd configure 
it to log and kill -9 Solr. Then use runit or something to supervice Solr - so 
that if it's killed, it just restarts.

I think that is the best way to deal with OOM's. Other than that, you have to 
write a middle layer and put limits on user requests before making Solr 
requests.

- Mark
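The HotSpot flag Mark refers to looks like this (supervisor choice is yours):

    java -XX:OnOutOfMemoryError="kill -9 %p" -jar start.jar

where %p expands to the JVM's pid; runit/supervisord then restarts the killed
process.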

On Jun 17, 2013, at 4:44 PM, Manuel Le Normand  
wrote:

> Hello again,
> 
> After a heavy query on my index (returning 100K docs in a single query) my
> JVM heap's floods and I get an JAVA OOM exception, and then that my
> GC cannot collect anything (GC
> overhead limit exceeded) as these memory chunks are not disposable.
> 
> I want to afford queries like this, my concern is that this case provokes a
> total Solr crash, returning a 503 Internal Server Error while trying to *
> index.*
> 
> Is there anyway to separate these two logics? I'm fine with solr not being
> able to return any response after returning this OOM, but I don't see the
> justification the query to flood JVM's internal (bounded) buffers for
> writings.
> 
> Thanks,
> Manuel



Re: yet another optimize question

2013-06-17 Thread Otis Gospodnetic
Yes, in one of the example solrconfig.xml files this is right above
the merge factor definition.

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/





On Mon, Jun 17, 2013 at 8:00 PM, Petersen, Robert
 wrote:
> Hi Upayavira,
>
> You might have gotten it.  Yes we noticed maxdocs was way bigger than 
> numdocs.  There were a lot of files ending in '.del' in the index folder 
> also.  We started on 1.3 also.   I don't currently have any solr config 
> settings for MergePolicy at all.  Am I going to want to put something like 
> this into my index defaults section?
>
> <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
>   <int name="maxMergeAtOnce">10</int>
>   <int name="segmentsPerTier">10</int>
> </mergePolicy>
>
> Thanks
> Robi
>
> -Original Message-
> From: Upayavira [mailto:u...@odoko.co.uk]
> Sent: Monday, June 17, 2013 12:29 PM
> To: solr-user@lucene.apache.org
> Subject: Re: yet another optimize question
>
> The key figures are numdocs vs maxdocs. Maxdocs-numdocs is the number of 
> deleted docs in your index.
>
> This is a 3.6 system you say. But has it been upgraded? I've seen folks 
> who've upgraded from 1.4 or 3.0/3.1 over time, keeping the old config.
> The consequence of this is that they don't get the right config for the 
> TieredMergePolicy, and therefore don't get to use it, seeing the old 
> behaviour which does require periodic optimise.
>
> Upayavira
>
> On Mon, Jun 17, 2013, at 07:21 PM, Petersen, Robert wrote:
>> Hi Otis,
>>
>> Right I didn't restart the JVMs except on the one slave where I was
>> experimenting with using G1GC on the 1.7.0_21 JRE.   Also some time ago I
>> made all our caches small enough to keep us from getting OOMs while still
>> having a good hit rate.Our index has about 50 fields which are mostly
>> int IDs and there are some dynamic fields also.  These dynamic fields
>> can be used for custom faceting.  We have some standard facets we
>> always facet on and other dynamic facets which are only used if the
>> query is filtering on a particular category.  There are hundreds of
>> these fields but since they are only for a small subset of the overall
>> index they are very sparsely populated with regard to the overall
>> index.  With CMS GC we get a sawtooth on the old generation (I guess
>> every replication and commit causes it's usage to drop down to 10GB or
>> so) and it seems to be the old generation which is the main space
>> consumer.  With the G1GC, the memory map looked totally different!  I
>> was a little lost looking at memory consumption with that GC.  Maybe
>> I'll try it again now that the index is a bit smaller than it was last
>> time I tried it.  After four days without running an optimize now it
>> is 21GB.  BTW our indexing speed is mostly bound by the DB so reducing the 
>> segments might be ok...
>>
>> Here is a quick snapshot of one slaves memory map as reported by
>> PSI-Probe, but unfortunately I guess I can't send the history graphics
>> to the solr-user list to show their changes over time:
>>   NameUsedCommitted   Max
>>  Initial Group
>>Par Survivor Space 20.02 MB108.13 MB   108.13 MB  
>>  108.13 MB   HEAP
>>CMS Perm Gen   42.29 MB70.66 MB82.00 MB20.75 
>> MBNON_HEAP
>>Code Cache 9.73 MB 9.88 MB 48.00 MB2.44 MB 
>> NON_HEAP
>>CMS Old Gen20.22 GB30.94 GB30.94 GB   
>>  30.94 GBHEAP
>>Par Eden Space 42.20 MB865.31 MB   865.31 MB   865.31 
>> MB   HEAP
>>Total  20.33 GB31.97 GB32.02 GB   
>>  31.92 GBTOTAL
>>
>> And here's our current cache stats from a random slave:
>>
>> name:queryResultCache
>> class:   org.apache.solr.search.LRUCache
>> version: 1.0
>> description: LRU Cache(maxSize=488, initialSize=6, autowarmCount=6,
>> regenerator=org.apache.solr.search.SolrIndexSearcher$3@461ff4c3)
>> stats:  lookups : 619
>> hits : 36
>> hitratio : 0.05
>> inserts : 592
>> evictions : 101
>> size : 488
>> warmupTime : 2949
>> cumulative_lookups : 681225
>> cumulative_hits : 73126
>> cumulative_hitratio : 0.10
>> cumulative_inserts : 602396
>> cumulative_evictions : 428868
>>
>>
>>  name:fieldCache
>> class:   org.apache.solr.search.SolrFieldCacheMBean
>> version: 1.0
>> description: Provides introspection of the Lucene FieldCache, this is
>> **NOT** a cache that is managed by Solr.
>> stats:  entries_count : 359
>>
>>
>> name:documentCache
>> class:   org.apache.solr.search.LRUCache
>> version: 1.0
>> description: LRU Cache(maxSize=2048, initialSize=512,
>> autowarmCount=10, regenerator=null)
>> stats:  lookups : 12710
>> hits : 7160
>> hitratio : 0.56
>> inserts : 5636
>> evictions : 3588
>> size : 2048
>> warmupTime : 0
>> cumulative_lookups : 10590054
>> cumulative_hits : 6166913
>> cumulative_hitratio : 0.58
>> cumulative_inserts : 4423141
>> cumulative_evictions : 3714653
>>
>>
>> name:fieldValueCache
>> class: 

Re: yet another optimize question

2013-06-17 Thread Otis Gospodnetic
Hi Robi,

This goes against the original problem of getting OOMEs, but it looks
like each of your Solr caches could be a little bigger if you want to
eliminate evictions, with the query results one possibly not being
worth keeping if you can't get the hit % up enough.

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/





On Mon, Jun 17, 2013 at 2:21 PM, Petersen, Robert
 wrote:
> Hi Otis,
>
> Right, I didn't restart the JVMs except on the one slave where I was
> experimenting with using G1GC on the 1.7.0_21 JRE. Also, some time ago I
> made all our caches small enough to keep us from getting OOMs while still
> having a good hit rate. Our index has about 50 fields, which are mostly int
> IDs, and there are some dynamic fields also. These dynamic fields can be used
> for custom faceting. We have some standard facets we always facet on and
> other dynamic facets which are only used if the query is filtering on a
> particular category. There are hundreds of these fields, but since they are
> only for a small subset of the overall index they are very sparsely populated
> with regard to the overall index. With CMS GC we get a sawtooth on the old
> generation (I guess every replication and commit causes its usage to drop
> down to 10GB or so), and it seems to be the old generation which is the main
> space consumer. With G1GC, the memory map looked totally different! I
> was a little lost looking at memory consumption with that GC. Maybe I'll try
> it again now that the index is a bit smaller than it was last time I tried
> it. After four days without running an optimize, it is now 21GB. BTW our
> indexing speed is mostly bound by the DB, so reducing the segments might be
> ok...
>
> Here is a quick snapshot of one slave's memory map as reported by PSI-Probe,
> but unfortunately I guess I can't send the history graphics to the solr-user
> list to show their changes over time:
>
> Name                Used      Committed  Max        Initial    Group
> Par Survivor Space  20.02 MB  108.13 MB  108.13 MB  108.13 MB  HEAP
> CMS Perm Gen        42.29 MB  70.66 MB   82.00 MB   20.75 MB   NON_HEAP
> Code Cache          9.73 MB   9.88 MB    48.00 MB   2.44 MB    NON_HEAP
> CMS Old Gen         20.22 GB  30.94 GB   30.94 GB   30.94 GB   HEAP
> Par Eden Space      42.20 MB  865.31 MB  865.31 MB  865.31 MB  HEAP
> Total               20.33 GB  31.97 GB   32.02 GB   31.92 GB   TOTAL
>
> And here's our current cache stats from a random slave:
>
> name: queryResultCache
> class:   org.apache.solr.search.LRUCache
> version: 1.0
> description: LRU Cache(maxSize=488, initialSize=6, autowarmCount=6, 
> regenerator=org.apache.solr.search.SolrIndexSearcher$3@461ff4c3)
> stats:  lookups : 619
> hits : 36
> hitratio : 0.05
> inserts : 592
> evictions : 101
> size : 488
> warmupTime : 2949
> cumulative_lookups : 681225
> cumulative_hits : 73126
> cumulative_hitratio : 0.10
> cumulative_inserts : 602396
> cumulative_evictions : 428868
>
>
> name: fieldCache
> class:   org.apache.solr.search.SolrFieldCacheMBean
> version: 1.0
> description: Provides introspection of the Lucene FieldCache, this is 
> **NOT** a cache that is managed by Solr.
> stats:  entries_count : 359
>
>
> name: documentCache
> class:   org.apache.solr.search.LRUCache
> version: 1.0
> description: LRU Cache(maxSize=2048, initialSize=512, autowarmCount=10, 
> regenerator=null)
> stats:  lookups : 12710
> hits : 7160
> hitratio : 0.56
> inserts : 5636
> evictions : 3588
> size : 2048
> warmupTime : 0
> cumulative_lookups : 10590054
> cumulative_hits : 6166913
> cumulative_hitratio : 0.58
> cumulative_inserts : 4423141
> cumulative_evictions : 3714653
>
>
> name: fieldValueCache
> class:   org.apache.solr.search.FastLRUCache
> version: 1.0
> description: Concurrent LRU Cache(maxSize=280, initialSize=280, 
> minSize=252, acceptableSize=266, cleanupThread=false, autowarmCount=6, 
> regenerator=org.apache.solr.search.SolrIndexSearcher$1@143eb77a)
> stats:  lookups : 1725
> hits : 1481
> hitratio : 0.85
> inserts : 122
> evictions : 0
> size : 128
> warmupTime : 4426
> cumulative_lookups : 3449712
> cumulative_hits : 3281805
> cumulative_hitratio : 0.95
> cumulative_inserts : 83261
> cumulative_evictions : 3479
>
>
> name: filterCache
> class:   org.apache.solr.search.FastLRUCache
> version: 1.0
> description: Concurrent LRU Cache(maxSize=248, initialSize=12, 
> minSize=223, acceptableSize=235, cleanupThread=false, autowarmCount=10, 
> regenerator=org.apache.solr.search.SolrIndexSearcher$2@36e831d6)
> stats:  lookups : 3990
> hits : 3831
> hitratio : 0.96
> inserts : 239
> evictions : 26
> size : 244
> warmupTime : 1
> cumulative_lookups : 5745011

mm (Minimum 'Should' Match)

2013-06-17 Thread anand_solr
I am not sure if this is supported out of the box in Solr.

The search gives multiple facet fields, a query containing a set of values
for each facet field, and a minimum-should-match parameter for each facet.
The result should be the documents that satisfy the minimum match for every
facet field.

E.g., Solr documents for electronic products, so each document will contain
category [lcd,led,plasma] and manufacture [sony,samsung,apple] information.

A query something like

http://localhost:8983/solr/select?q=(category:lcd+OR+category:led+OR+category:plasma)+AND+(manufacture:sony+OR+manufacture:samsung+OR+manufacture:apple)&facet.field=category&facet.field=manufacture&fl=id&mm=2

should return the documents whose category matches at least 2 keywords
[lcd,led or lcd,plasma or another valid combination] AND whose manufacture
matches at least 2 keywords [sony,samsung or samsung,apple or another valid
combination]...









Re: Avoiding OOM fatal crash

2013-06-17 Thread Roman Chyla
I think you can modify the response writer and stream results instead of
building them first and then sending them in one go. I am using this
technique to dump millions of docs in JSON format - but in your case you may
have to figure out how to dump during streaming if you don't want to save
the data to disk first.
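
A minimal sketch of the idea, assuming Solr 4.x APIs (QueryResponseWriter,
and the DocList sitting in a ResultContext under the "response" key) - it is
untested and streams bare internal doc ids, just to show the shape; the
class name is made up:

import java.io.IOException;
import java.io.Writer;

import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.QueryResponseWriter;
import org.apache.solr.response.ResultContext;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.search.DocIterator;

public class StreamingIdsWriter implements QueryResponseWriter {

  public void init(NamedList args) {}

  public String getContentType(SolrQueryRequest req, SolrQueryResponse rsp) {
    return "text/plain";
  }

  public void write(Writer out, SolrQueryRequest req, SolrQueryResponse rsp)
      throws IOException {
    ResultContext ctx = (ResultContext) rsp.getValues().get("response");
    DocIterator it = ctx.docs.iterator();
    while (it.hasNext()) {
      // written per doc, so nothing accumulates in memory
      out.write(Integer.toString(it.nextDoc()));
      out.write('\n');
    }
  }
}

Register it in solrconfig.xml with
<queryResponseWriter name="streamids" class="com.example.StreamingIdsWriter"/>
and select it per request with wt=streamids.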

Roman
On 17 Jun 2013 20:02, "Mark Miller"  wrote:

> There is a java cmd line arg that lets you run a command on OOM - I'd
> configure it to log and kill -9 Solr. Then use runit or something to
> supervise Solr - so that if it's killed, it just restarts.
>
> I think that is the best way to deal with OOM's. Other than that, you have
> to write a middle layer and put limits on user requests before making Solr
> requests.
>
> - Mark
>
> On Jun 17, 2013, at 4:44 PM, Manuel Le Normand 
> wrote:
>
> > Hello again,
> >
> > After a heavy query on my index (returning 100K docs in a single query)
> > my JVM heap floods and I get a Java OOM exception, and then my GC
> > cannot collect anything (GC overhead limit exceeded) as these memory
> > chunks are not disposable.
> >
> > I want to allow queries like this; my concern is that this case provokes
> > a total Solr crash, returning a 503 Internal Server Error while trying
> > to *index*.
> >
> > Is there any way to separate these two logics? I'm fine with Solr not
> > being able to return any response after this OOM, but I don't see the
> > justification for the query flooding the JVM's internal (bounded)
> > buffers for writes.
> >
> > Thanks,
> > Manuel
>
>
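
For reference, the cmd line arg Mark mentions is -XX:OnOutOfMemoryError on
HotSpot JVMs; a sketch (the script path is invented - point it at whatever
logs and kills for you):

java -XX:OnOutOfMemoryError="/path/to/on-oom.sh %p" -jar start.jar

%p expands to the JVM's pid, so on-oom.sh can log a line and kill -9 that
pid; runit (or any supervisor) then takes care of the restart.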


Re: mm (Minimum 'Should' Match)

2013-06-17 Thread Jack Krupansky

The "mm" parameter only applies to the top level query, not nested queries.

At the top level you have:

(...) AND (...)

And it's an AND, not OR.

The LucidWorks Search query parser does support minMatch at any level, such 
as:


(...)~2 AND (...)~2
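
With stock Solr you can get something close by nesting one edismax subquery
per field, each carrying its own mm, via the _query_ hook - a sketch,
untested and shown unescaped for readability:

http://localhost:8983/solr/select?q=_query_:"{!edismax qf=category mm=2}lcd led plasma" AND _query_:"{!edismax qf=manufacture mm=2}sony samsung apple"&fl=id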

-- Jack Krupansky




Upgrading from 3.6.1 to 4.3.0 and Custom collector

2013-06-17 Thread Peyman Faratin
Hi,

I am migrating from Lucene 3.6.1 to 4.3.0. I am, however, not sure how to
migrate my custom collector below. This page
http://lucene.apache.org/core/4_3_0/MIGRATE.html gives some hints, but the
instructions are incomplete, and looking at the source examples of custom
collectors makes me want to go and eat cheesecake - every time!!!

Any advice would be very much appreciated.

Thank you


import java.io.IOException;
import java.util.HashSet;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.Scorer;

public class AllInLinks extends Collector {
  private Scorer scorer;
  private int docBase;
  private String[] store;
  private HashSet<String> outLinks = new HashSet<String>();

  public boolean acceptsDocsOutOfOrder() {
    return true;
  }
  public void setScorer(Scorer scorer) {
    this.scorer = scorer;
  }
  public void setNextReader(IndexReader reader, int docBase)
      throws IOException {
    this.docBase = docBase;
    // 3.x field cache: one String per doc for the "title" field
    store = FieldCache.DEFAULT.getStrings(reader, "title");
  }
  public void collect(int doc) throws IOException {
    String page = store[doc];
    outLinks.add(page);
  }
  public void reset() {
    outLinks.clear();
    store = null;
  }
  public int getOutLinks() {
    return outLinks.size();
  }
}
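
For what it's worth, a sketch of one possible 4.3 translation (untested, so
treat it as a starting point): setNextReader() now takes an
AtomicReaderContext, and FieldCache.getStrings() is gone, replaced by
getTerms(), which hands back per-segment BinaryDocValues:

import java.io.IOException;
import java.util.HashSet;

import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.util.BytesRef;

public class AllInLinks extends Collector {
  private BinaryDocValues store;
  private final HashSet<String> outLinks = new HashSet<String>();

  @Override
  public boolean acceptsDocsOutOfOrder() {
    return true;
  }

  @Override
  public void setScorer(Scorer scorer) {
    // scores are not used by this collector
  }

  @Override
  public void setNextReader(AtomicReaderContext context) throws IOException {
    // values are resolved per segment, so no docBase bookkeeping is needed
    store = FieldCache.DEFAULT.getTerms(context.reader(), "title");
  }

  @Override
  public void collect(int doc) throws IOException {
    BytesRef ref = new BytesRef();
    store.get(doc, ref);  // fills ref with this doc's "title" value
    outLinks.add(ref.utf8ToString());
  }

  public void reset() {
    outLinks.clear();
    store = null;
  }

  public int getOutLinks() {
    return outLinks.size();
  }
}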



Re: mm (Minimum 'Should' Match)

2013-06-17 Thread anand_solr
Thanks, Jack. Do you have a link to any documents I can refer to for more?

I saw a link mentioning something similar that I could extend - do you
think this will help?
http://everydaydeveloper.blogspot.com/2013/03/minimum-match-per-index-field-solr.html?m=1




Re: mm (Minimum 'Should' Match)

2013-06-17 Thread Jack Krupansky
No, although the Query Parsers chapter of the book has a little more
description of "mm". I'm hoping I'll have a draft of the book published on
Friday.

The LucidWorks doc for minMatch in their query parser is here:

http://docs.lucidworks.com/display/lweug/Minimum+Match+for+Simple+Queries

-- Jack Krupansky




Re: How to define my data in schema.xml

2013-06-17 Thread Gora Mohanty
On 18 June 2013 01:10, Mysurf Mail  wrote:
> Thanks for your quick reply. Here are some notes:
>
> 1. Consider that all tables in my example have two columns, Name &
> Description, which I would like to index and search.
> 2. I have no reason to create a flat table other than for Solr, so I
> would like to see if I can avoid it.
> 3. If in my example I have a flat table, then obviously it will hold a
> lot of rows for a single school.
> By searching the exact school name I will likely receive a lot of rows.
> (my flat table has its own pk)

Yes, all of this is definitely the case, but in practice
it does not matter. Solr can efficiently search through
millions of rows. To start with, just try the simplest
approach, and only complicate things as and when
needed.

> That is something I would like to avoid, and I thought I could avoid this
> by defining teachers and students as multi-valued fields or something like
> that, and then teacherCourses and studentHobbies as 1:n respectively.
> This is quite similar to my real-life requirement, so I came here to get
> some tips as a Solr noob.

You have still not described what searches you want
to run. Again, I would suggest starting with the
most straightforward approach.
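
To make that concrete, here is a sketch of the schema.xml side of the flat
approach (the field names are invented for illustration; text_general is the
stock example type):

<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="school_name" type="text_general" indexed="true" stored="true"/>
<field name="name" type="text_general" indexed="true" stored="true"/>
<field name="description" type="text_general" indexed="true" stored="true"/>

One Solr document per row of the flattened join; your searches then simply
query the name and description fields.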

Regards,
Gora


what does a zero score mean?

2013-06-17 Thread Joe Zhang
I issued a simple query ("apple") to my collection and got 201 documents
back, all of which are scored 0. What does this mean? --- The documents do
contain the query words.


Re: what does a zero score mean?

2013-06-17 Thread Gora Mohanty
On 18 June 2013 10:49, Joe Zhang  wrote:
> I issued a simple query ("apple") to my collection and got 201 documents
> back, all of which are scored 0. What does this mean? --- The documents do
> contain the query words.

My guess is that the float-valued score is getting
converted to an integer. You could also try your
query with the parameter &debugQuery=on
to get an explanation of the scoring:
http://wiki.apache.org/solr/CommonQueryParameters#debugQuery
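
For example (note that the score is only returned when you ask for it
in fl), something like:

http://localhost:8983/solr/select?q=apple&fl=*,score&debugQuery=on

The debug output then shows exactly how each document's score was computed.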

Regards,
Gora

