Re: Problem with migration to SolrAdaptersForLuceneSpatial4

2012-11-29 Thread Viacheslav Davidovich
Yes, re-indexing is the first thing I do when a field is changed.

And now it looks like the migration to Solr 4 is finished.

Thank you for your answers, David.

WBR Viacheslav.

On 28.11.2012, at 18:25, David Smiley (@MITRE.org) wrote:

> Viacheslav,
> Did you re-index?  Clearly re-indexing is needed when changing field types.
> ~ David
> 
> From: Viacheslav Davidovich [via Lucene] 
> [ml-node+s472066n4022861...@n3.nabble.com]
> Sent: Wednesday, November 28, 2012 4:42 AM
> To: Smiley, David W.
> Subject: Re: Problem with migration to SolrAdaptersForLuceneSpatial4
> 
> Hi David,
> 
> thank you for reply.
> 
> Actually, when I change the fieldType to
> 
>   <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate" />
> 
> some magic happens and the old query starts to work.
> 
> And this change resolves my problems with distance calculation even without 
> using the solr.SpatialRecursivePrefixTreeFieldType field type.
> 
> WBR Viacheslav.
> 
> On 26.11.2012, at 18:52, David Smiley (@MITRE.org) wrote:
> 
>> Hi Viacheslav,
>> 
>> 1. You don't need JTS unless you're using polygons or WKT, and your example
>> uses neither.  So you can remove the spatialContext attribute to use the
>> default, and remove the JTS jar.  But that shouldn't be related to your
>> reported problem.
>> 
>> 2. The units for d= in the circle are in degrees (111.2 km per degree) --
>> this is why units="degrees" on the field type.  Granted this
>> misunderstanding should yield a larger circle than what you intended, and so
>> doesn't explain your reported problem.
>> 
>> Honestly I'm a bit stumped, as what you are doing should work.  I assume of
>> course you indexed your points into this field like
>> <field name="some_loc">45.15,-93.85</field>.  Can you try varying some things, like
>> using a rectangle query, perhaps even of the whole world?  Remove the
>> spatialContext attribute (no need to re-index for points)?
>> 
>> By the way, the "boost" attribute on your field is strange to me; I didn't
>> know you could do that; are you sure you can?  I suspect it is erroneous.
>> 
>> ~ David
>> 
>> 
>> 
>> 
>> -
>> Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/Problem-with-migration-to-SolrAdaptersForLuceneSpatial4-tp4022298p4022384.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>> 
> 
> 
> 
> 
> 
> 
> 
> 
> -
> Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Problem-with-migration-to-SolrAdaptersForLuceneSpatial4-tp4022298p4022959.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Benchmarking Solr 3.3 vs. 4.0

2012-11-29 Thread Daniel Exner

Hi Solr community,

I'm currently doing some benchmarking of a real Solr 3.3 instance vs the 
same ported to Solr 4.0.


Benchmarking is done using JMeter from localhost.
The test scenario is a constant stream of queries from a production log file, 
at a targeted 50 QPS.
After some time (marked in the graph) I push the whole index data (796 MB of 
XML) via the REST interface, wait some time, and do an optimize via REST.


The test machine is a VM on an "Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz", 
with one core and 2 GB RAM attached.
Both Solr instances are running in the same Tomcat and are not used 
for anything other than testing.


The expected results were a lower overall load for Solr 4 and a lower 
latency while pushing new data.


In the graph you can see high CPU load, all the time. This is even the 
case if I reduce the QPS down to 5, so CPU is no good metric for 
comparison between Solr 3.3 and 4.0 (at least on this machine).
The missing memory data is due to the PerfMon JMeter Plugin having 
time-outs sometimes.


You can also see no real increase in latency when pushing data into the 
index. This is puzzling me, as rumours say one should not push new data 
while under high load, as this would hurt query performance.


Has anyone done similar tests before and can comment on that?

Greetings
Daniel Exner
--
Daniel Exner
Software Development & Application Support
ESEMOS GmbH



Solr hangs after core reload

2012-11-29 Thread O. Klein
Every time I try to do something with the cores from the admin UI, Solr hangs
with no exceptions.

Anyone else experiencing this?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-hangs-after-core-reload-tp4023206.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Downloading files from the solr replication Handler

2012-11-29 Thread Eva Lacy
I tried downloading them with my browser and also with a C# WebRequest.
If I skip the first and last 4 bytes it seems to work fine.


On Thu, Nov 29, 2012 at 2:28 AM, Erick Erickson wrote:

> How are you downloading them? I suspect the issue is
> with the download process rather than Solr, but I'm just guessing.
>
> Best
> Erick
>
>
> On Wed, Nov 28, 2012 at 12:19 PM, Eva Lacy  wrote:
>
> > Just to add to that, I'm using solr 3.6.1
> >
> >
> > On Wed, Nov 28, 2012 at 5:18 PM, Eva Lacy  wrote:
> >
> > > I downloaded some configuration and data files directly from solr in an
> > > attempt to develop a backup solution.
> > > I noticed there is some characters at the start and end of the file
> that
> > > aren't in configuration files, I notice the same characters at the
> start
> > > and end of the data files.
> > > Anyone with any idea how I can download these files without the extra
> > > characters or predict how many there are going to be so I can skip
> them?
> > >
> >
>


Re: Multi word synonyms

2012-11-29 Thread O. Klein
Found an article about the issue of multi word synonyms.

Not sure it's the solution I'm looking for, but it may be for someone else.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multi-word-synonyms-tp3716292p4023220.html
Sent from the Solr - User mailing list archive at Nabble.com.


query for dimensional type

2012-11-29 Thread saoussen

I have a problem with a query for a dimensional type.

In fact, in my schema.xml I added this field:



with type:



In the Java class used for indexing I have this declaration:
@Field("myObjects")
List<MyObject> myObject;

and after indexing I can see this:
[MyObject(x=1, y=32, z=247)]

Now I want to search for objects having z=247, but with this query
"q=myObjects:*, *, 247" the result is 0.

Which syntax should I use?

Thanks for the help!




--
View this message in context: 
http://lucene.472066.n3.nabble.com/query-for-dimensional-type-tp4023215.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multi word synonyms

2012-11-29 Thread Bernd Fehling
There are also other solutions:

Multi-word synonym filter (synonym expansion)
https://issues.apache.org/jira/browse/LUCENE-4499

Since Solr 3.4 I have had my own solution, which might become obsolete once
LUCENE-4499 is in a released version.
http://www.ub.uni-bielefeld.de/~befehl/base/solr/eurovoc.html


Am 29.11.2012 13:44, schrieb O. Klein:
> Found an article about the issue of  multi word synonyms
>   .
> 
> Not sure it's the solution I'm looking for, but it may be for someone else.
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Multi-word-synonyms-tp3716292p4023220.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 


Re: Excluding caching of queryresult

2012-11-29 Thread richardg
Thanks Erick. I just found this ticket implying that it can be used by
the main query as well:

https://issues.apache.org/jira/browse/SOLR-2429
  

As you stated, the queryResultCache is cheap. I guess it would be nice to get a
definitive answer and an example, as it could be used for the other caches too.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Excluding-caching-of-queryresult-tp4023105p4023233.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr - Jetty Form Too Large Exception

2012-11-29 Thread Marcin Rzewucki
Hi,

I think you should change/set the value of the multipartUploadLimitInKB
attribute of <requestParsers> in solrconfig.xml.
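For reference, that attribute lives on the <requestParsers> element inside
<requestDispatcher> in solrconfig.xml; a minimal sketch (the limit value here
is only an illustration):

  <requestDispatcher>
    <requestParsers enableRemoteStreaming="true"
                    multipartUploadLimitInKB="2048000" />
  </requestDispatcher>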

Regards.

On 29 November 2012 07:58, deniz  wrote:

> hello,
>
> during tests, I keep getting
>
> SEVERE: null:java.lang.IllegalStateException: Form too large305367>20
> at
> org.eclipse.jetty.server.Request.extractParameters(Request.java:279)
> at
> org.eclipse.jetty.server.Request.getParameterMap(Request.java:705)
> at
> org.apache.solr.request.ServletSolrParams.(ServletSolrParams.java:29)
> at
>
> org.apache.solr.servlet.StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:394)
> at
>
> org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:115)
> at
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
> at
>
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
> at
>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
> at
>
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
> at
>
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
> at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
> at
>
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
> at
>
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
> at
>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
> at
>
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
> at
>
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
> at
>
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
> at org.eclipse.jetty.server.Server.handle(Server.java:351)
> at
>
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
> at
>
> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
> at
>
> org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900)
> at
>
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:954)
> at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857)
> at
> org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
> at
>
> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
> at
>
> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
> at
>
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
> at
>
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
> at java.lang.Thread.run(Unknown Source)
>
>
>
> After googling a little bit, i found where to change the value, but it
> didnt
> work, I have changed etc/jetty.xml's related part, making the value from
> 20 to 99something, but after restarting, I keep getting the same
> error... is there any other place that I should check for this jetty form
> size error?
>
>
>
> -
> Zeki ama calismiyor... Calissa yapar...
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-Jetty-Form-Too-Large-Exception-tp4023185.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Delete documents in SOLR 1.4.1

2012-11-29 Thread Illu.Y.Ying (mis.sh04.Newegg) 41417
I tried your second case with Solr 3.5. It runs fine and the record gets 
deleted when you only configure deletedPkQuery.
Could you consider upgrading your Solr installation to version 3.5?
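A rough sketch of such an entity in db-data-config.xml (the table and column
names here are made up for illustration); deletedPkQuery is evaluated during a
delta-import run:

  <entity name="item" pk="ID"
          query="SELECT ID, NAME FROM ITEM"
          deletedPkQuery="SELECT ID FROM ITEM_DELETES
                          WHERE DELETED_AT &gt; '${dataimporter.last_index_time}'">
    <field column="ID" name="id"/>
    <field column="NAME" name="name"/>
  </entity>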

Best Regards,
Illu Ying

-----Original Message-----
From: RPSolrUser [mailto:roopa.parek...@gmail.com] 
Sent: 27 November 2012 23:49
To: solr-user@lucene.apache.org
Subject: Delete documents in SOLR 1.4.1

Question: We have Solr 1.4.1 in production currently. We need to delete 
documents identified by ids from Solr on a daily basis.

--Following deletes by keys and delta data load works--
-- snippet of db-data-config.xml


  
 

  






-- snippet of solrconfig.xml

  <requestHandler name="/dataimport"
                  class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">db-data-config.xml</str>
    </lst>
  </requestHandler>

I am invoking Delta-Import using dataimport.jsp. Everything works well.
Delta data (identified by SOLR_PART_SUPPLIER_AGREE_DELTA) gets indexed and a bunch 
of documents identified by solr_part_supplier_agreement_del get deleted.
 
However, I have a need to just delete the documents from another solr core 
without loading delta data. How can I achieve that?

-- This does not work. It does not delete any docs, as deletedPkQuery is supposed 
to only work with delta loads --






Question: How can I just delete the documents by id? Is deleteDocById the 
answer? How can I use it? I tried but the DIH does not show any updates. Can we 
use any other approach?

Do I need to patch SOLR 1.4.1 with any bug fix?
Example of deleteDocById is appreciated.

Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Delete-documents-in-SOLR-1-4-1-tp4022660.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Benchmarking Solr 3.3 vs. 4.0

2012-11-29 Thread Shawn Heisey

On 11/29/2012 3:15 AM, Daniel Exner wrote:
I'm currently doing some benchmarking of a real Solr 3.3 instance vs 
the same ported to Solr 4.0.


Benchmarking is done using JMeter from localhost.
Test scenario is a constant stream of queries from a log file out of 
production, at targeted 50 QPS.
After some time (marked in graph) I do a push via REST interface of 
the whole index data (796M XML), wait some time and do a optimize via 
REST.


Testmachine is a VM on a "Intel(R) Core(TM)2 Quad CPU Q9400 @2.66GH", 
one core and 2Gb RAM attached.
Both Solr instances are running in the same Tomcat and are not used 
otherwise than testing.


Expected results where a lower overall load for Solr 4 and a lower 
latency while pushing new data.


In the graph you can see high CPU load, all the time. This is even the 
case if I reduce the QPS down to 5, so CPU is no good metric for 
comparison between Solr 3.3 and 4.0 (at least on this machine).
The missing memory data is due to the PerfMon JMeter Plugin having 
time-outs sometimes.


You can also see no real increase in latency when pushing data into 
the index. This is puzzling me, as rumours say one should not push new 
data while under high load, as this would hurt query performance.


I don't see any attachments, or any links to external attachments, so I 
can't see the graph.  I can only make general statements, and I can't 
guarantee that they'll even be applicable to your scenario.  You may 
need to use an external attachment service and just send us a link.


Are you seeing lower performance, or just worried about the CPU load?  
Solr4 should be able to handle concurrent indexing and querying better 
than 3.x.  It is able to do things concurrently that were not possible 
before.


One way that performance improvements happen is that developers find 
slow sections of code where the CPU is fairly idle, and rewrite them so 
they are faster, but also exercise the CPU harder.  When the new code 
runs, CPU load goes higher, but it all runs faster.


Thanks,
Shawn



Re: Multi word synonyms

2012-11-29 Thread Jack Krupansky
Yes, it is sad but true that multi-word synonym processing does not "work 
right out of the box" for all common interesting cases, although it does work 
semi-well for index-time processing. But even there, matching synonyms of 
varying lengths within larger phrases will sometimes work and sometimes not, 
unless you allow some amount of phrase slop.


The LucidWorks Search query parser does handle query-time synonyms 
reasonably well, but using some complicated, ad hoc processing that is not 
easy to replicate in your average application that doesn't have that extra, 
proprietary "magic". If you want robust, query-time processing of synonyms 
(which is a lot more flexible than index-time processing), you would need to 
replicate some form of that logic.


A couple of months ago I did propose that we design and implement a set of 
interfaces to support robust handling of multi-word synonyms at query time, 
but there was... NO interest expressed by any developers. Since then, the 
Lucene and Solr query parsers have diverged even further, making the support 
for such an interface even more problematic - unless we just bite the bullet 
and say that the Lucene query parser is a hopeless dinosaur and leave it 
behind in the dust as a remnant of "the early days" of Lucene and Solr. 
Also, the fact that we still have three distinct main Solr query parsers 
(SolrQueryParser, a derivative of the classic Lucene query parser, dismax, 
and edismax) makes this task rather problematic, and the number of other 
"niche" query parsers which could also use better synonym processing makes 
it a very daunting one. If we ever do integrate the "big three" (and write 
off the Lucene query parser), then maybe the time will be ripe to revisit 
robust query-time multi-word synonym support.


(Or, maybe LucidWorks will finally donate their query parser!)

-- Jack Krupansky

-Original Message- 
From: Bernd Fehling

Sent: Thursday, November 29, 2012 8:19 AM
To: solr-user@lucene.apache.org
Subject: Re: Multi word synonyms

There are also other solutions:

Multi-word synonym filter (synonym expansion)
https://issues.apache.org/jira/browse/LUCENE-4499

Since Solr 3.4 i have my own solution which might be obsolete if
LUCENE-4499 will be in a released version.
http://www.ub.uni-bielefeld.de/~befehl/base/solr/eurovoc.html


Am 29.11.2012 13:44, schrieb O. Klein:

Found an article about the issue of  multi word synonyms
  .

Not sure it's the solution I'm looking for, but it may be for someone 
else.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multi-word-synonyms-tp3716292p4023220.html

Sent from the Solr - User mailing list archive at Nabble.com.





Re: Excluding caching of queryresult

2012-11-29 Thread richardg
Using cache=false does seem to keep the query result out of the cache. I ran
queries against our master server, which doesn't get web traffic, with and
without the parameter, and only saw cache inserts when the parameter wasn't
included.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Excluding-caching-of-queryresult-tp4023105p4023241.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Benchmarking Solr 3.3 vs. 4.0

2012-11-29 Thread Shawn Heisey

On 11/29/2012 3:15 AM, Daniel Exner wrote:
I'm currently doing some benchmarking of a real Solr 3.3 instance vs 
the same ported to Solr 4.0.


Another note specifically related to this part: Have you used the same 
configuration and done the minimal changes required to make it run, or 
have you tried to update the config for 4.0 and its considerable list of 
new features?  Did you start with a blank index on 4.0, or did you copy 
the 3.3 index over?


There's no wrong answer to these questions.  Depending on exactly what 
you are trying to do, what is right for someone else may not be right 
for you.  The answers will help narrow the discussion.


Thanks,
Shawn



RE: Permanently Full Old Generation...

2012-11-29 Thread Shawn Heisey
> My jvm settings:
>
>
> -Xmx8192M -Xms8192M -XX:+CMSScavengeBeforeRemark -XX:NewRatio=2
> -XX:+CMSParallelRemarkEnabled -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
> -XX:+AggressiveOpts -XX:CMSInitiatingOccupancyFraction=70
> -XX:+UseCMSInitiatingOccupancyOnly -XX:-CMSIncrementalPacing
> -XX:CMSIncrementalDutyCycle=75
>
> I turned off IncrementalPacing, and enabled
> CMSInitiatingOccupancyFraction,
> after issues with nodes being reported as down due to large Garbage
> collection pauses.  The problem with the memory profile was visible before
> the drop down to 1.2GB (this was when I reloaded the core), my concern was
> that the collection of the old generation didn't seem to free any of the
> heap, and we went from occasionally collecting to always collecting the
> old
> gen.
>
> Please see the attached gc log.

I am on the train for my morning commute, so I have some time, but no
access to the log or graph.

Confession time: GC logs make me go glassy eyed and babble incoherently,
but I did take a look at it. I saw 18 CMS collections and three entries
near the end that said Full GC. It looks like these collections take 6 to
8 seconds. That is pretty nasty, but probably unavoidable, so the goal is
to make them happen extremely infrequently - do young generation
collections instead.

The thing that seems to make GC less of a problem for solr is maximizing
the young generation memory pool. Based on the available info, I would
start with making NewRatio 1 instead of 2.  This will increase the eden
size and decrease the old gen size. You may even want to use an explicit
-Xmn of 6144.  If that doesn't help, you might actually need 6GB or so of
old gen heap, so try increasing the overall heap size to 9 or 10 GB and
going back to a NewRatio of 2.
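
A sketch of what those two options would look like on the existing command
line (the "..." stands for the rest of the current CMS flags, unchanged):

  # option 1: bigger young generation via the ratio
  -Xmx8192M -Xms8192M -XX:NewRatio=1 ...

  # option 2: explicit young generation size instead of a ratio
  -Xmx8192M -Xms8192M -Xmn6144M ...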

Thanks,
Shawn




Re: Benchmarking Solr 3.3 vs. 4.0

2012-11-29 Thread Daniel Exner

I'll answer both your mails in one.

Shawn Heisey wrote:

On 11/29/2012 3:15 AM, Daniel Exner wrote:

I'm currently doing some benchmarking of a real Solr 3.3 instance vs
the same ported to Solr 4.0.

[..]

In the graph you can see high CPU load, all the time. This is even the
case if I reduce the QPS down to 5, so CPU is no good metric for
comparison between Solr 3.3 and 4.0 (at least on this machine).
The missing memory data is due to the PerfMon JMeter Plugin having
time-outs sometimes.

You can also see no real increase in latency when pushing data into
the index. This is puzzling me, as rumours say one should not push new
data while under high load, as this would hurt query performance.


I don't see any attachments, or any links to external attachments, so I
can't see the graph.  I can only make general statements, and I can't
guarantee that they'll even be applicable to your scenario.  You may
need to use an external attachment service and just send us a link.
Indeed, it seems like the mailing list daemon scrubbed my attachment. I 
dropped it into my Dropbox, here: http://db.tt/EjYCqbpn



Are you seeing lower performance, or just worried about the CPU load?
Solr4 should be able to handle concurrent indexing and querying better
than 3.x.  It is able to do things concurrently that were not possible
before.
In general I'm interested in how much better Solr 4 performs and if it 
may be feasible to use less powerful machines to get the same low 
latency, or do more data pushes etc.



One way that performance improvements happen is that developers find
slow sections of code where the CPU is fairly idle, and rewrite them so
they are faster, but also exercise the CPU harder.  When the new code
runs, CPU load goes higher, but it all runs faster.
Graphs show a slightly better latency for Solr 4.0 compared to 3.3, but 
not while pushing data.




Another note specifically related to this part: Have you used the same
configuration and done the minimal changes required to make it run, or
have you tried to update the config for 4.0 and its considerable list of
new features?  Did you start with a blank index on 4.0, or did you copy
the 3.3 index over?

I used the same configuration and did the minimal changes.
The first runs were using the same data from Solr 3.3 in Solr 4.0 (in 
fact it was even the same data dir..) but further runs used freshly 
filled different indices.



Greetings
Daniel Exner
--
Daniel Exner
Software Development & Application Support
ESEMOS GmbH


Re: Benchmarking Solr 3.3 vs. 4.0

2012-11-29 Thread Shawn Heisey

On 11/29/2012 8:29 AM, Daniel Exner wrote:

I'll answer both your mails in one.

Shawn Heisey wrote:

On 11/29/2012 3:15 AM, Daniel Exner wrote:

I'm currently doing some benchmarking of a real Solr 3.3 instance vs
the same ported to Solr 4.0.

[..]

In the graph you can see high CPU load, all the time. This is even the
case if I reduce the QPS down to 5, so CPU is no good metric for
comparison between Solr 3.3 and 4.0 (at least on this machine).
The missing memory data is due to the PerfMon JMeter Plugin having
time-outs sometimes.

You can also see no real increase in latency when pushing data into
the index. This is puzzling me, as rumours say one should not push new
data while under high load, as this would hurt query performance.


I don't see any attachments, or any links to external attachments, so I
can't see the graph.  I can only make general statements, and I can't
guarantee that they'll even be applicable to your scenario.  You may
need to use an external attachment service and just send us a link.
Indeed, it seems like the mailing list daemon scrubbed my attachement. 
I dropped it into my Dropbox, here http://db.tt/EjYCqbpn



Are you seeing lower performance, or just worried about the CPU load?
Solr4 should be able to handle concurrent indexing and querying better
than 3.x.  It is able to do things concurrently that were not possible
before.
In general I'm interested in how much better Solr 4 performs and if it 
may be feasonable to use less powerful machines to get the same low 
latency, or do more data pushes etc.



One way that performance improvements happen is that developers find
slow sections of code where the CPU is fairly idle, and rewrite them so
they are faster, but also exercise the CPU harder. When the new code
runs, CPU load goes higher, but it all runs faster.
Graphs show a slightly better latency for Solr 4.0 compared to 3.3, 
but not while pushing data.




Another note specifically related to this part: Have you used the same
configuration and done the minimal changes required to make it run, or
have you tried to update the config for 4.0 and its considerable list of
new features?  Did you start with a blank index on 4.0, or did you copy
the 3.3 index over?

I used the same configuration and did the minimal changes.
The first runs where using the same data from Solr 3.3 in Solr 4.0 (in 
fact it was even the same data dir..) but further runs used freshly 
filled different indices.


For best results, you'll want to ensure that Solr4 is working completely 
from scratch, that it has never seen a 3.3 index, so that it will use 
its own native format.  It may be a good idea to look into the example 
Solr4 config/schema and see whether there are improvements you can 
make.  One note: the updateLog feature in the update handler config will 
generally cause performance to be lower.  The features that require 
updateLog would make this less of an apples to apples comparison, so I 
wouldn't enable it unless I knew I needed it.
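
In the Solr 4.0 example solrconfig.xml that is roughly this block inside the
update handler; leaving it out (or commented, as below) keeps the transaction
log off:

  <updateHandler class="solr.DirectUpdateHandler2">
    <!--
    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>
    -->
  </updateHandler>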


Unless the lines are labelled wrong in the legend, the graph does show 
higher CPU usage during the push, but lower CPU usage during the 
optimize and most of the rest of the time.


The graph shows that Solr4 has lower latency than 3.3 during both the 
push and the optimize, as well as most of the rest of the time.  The 
latency numbers however are a lot higher than I would expect, seeming to 
average out at around 100 seconds (10^5 ms).  That is terrible 
performance from both versions.  On my own Solr installation, which is 
distributed and has 78 million documents, I have a median latency of 8 
milliseconds and a 95th percentile latency of 248 milliseconds.


Is this a 64-bit platform with a 64-bit Java?  How much memory have you 
allocated for the java heap?  How big is the index?


Thanks,
Shawn



Re: Permanently Full Old Generation...

2012-11-29 Thread Andy Kershaw
Thanks for responding Shawn.

Annette is away until Monday so I am looking into this in the meantime.
Looking at the times of the Full GC entries at the end of the log, I think
they are collections we started manually through jconsole to try and reduce
the size of the old generation. This only seemed to have an effect when we
reloaded the core first though.

It is my understanding that the eden size is deliberately smaller to keep
the ParNew collection time down. If it takes too long then the node is
flagged as down.

On 29 November 2012 15:28, Shawn Heisey  wrote:

> > My jvm settings:
> >
> >
> > -Xmx8192M -Xms8192M -XX:+CMSScavengeBeforeRemark -XX:NewRatio=2
> > -XX:+CMSParallelRemarkEnabled -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
> > -XX:+AggressiveOpts -XX:CMSInitiatingOccupancyFraction=70
> > -XX:+UseCMSInitiatingOccupancyOnly -XX:-CMSIncrementalPacing
> > -XX:CMSIncrementalDutyCycle=75
> >
> > I turned off IncrementalPacing, and enabled
> > CMSInitiatingOccupancyFraction,
> > after issues with nodes being reported as down due to large Garbage
> > collection pauses.  The problem with the memory profile was visible
> before
> > the drop down to 1.2GB (this was when I reloaded the core), my concern
> was
> > that the collection of the old generation didn't seem to free any of the
> > heap, and we went from occasionally collecting to always collecting the
> > old
> > gen.
> >
> > Please see the attached gc log.
>
> I am on the train for my morning commute, so I have some time, but no
> access to the log or graph.
>
> Confession time: GC logs make me go glassy eyed and babble incoherently,
> but I did take a look at it. I saw 18 CMS collections and three entries
> near the end that saif Full GC. It looks like these collections take 6 to
> 8 seconds. That is pretty nasty, but probably unavoidable, so the goal is
> to make them happen extremely infrequently - do young generation
> collections instead.
>
> The thing that seems to make GC less of a problem for solr is maximizing
> the young generation memory pool. Based on the available info, I would
> start with making NewRatio 1 instead of 2.  This will increase the eden
> size and decrease the old gen size. You may even want to use an explicit
> -Xmn of 6144.  If that doesn't help, you might actually need 6GB or so of
> old gen heap, so try increasing the overall heap size to 9 or 10 GB and
> going back to a NewRatio of 2.
>
> Thanks,
> Shawn
>


Re: Permanently Full Old Generation...

2012-11-29 Thread Shawn Heisey

On 11/29/2012 10:44 AM, Andy Kershaw wrote:

Annette is away until Monday so I am looking into this in the meantime.
Looking at the times of the Full GC entries at the end of the log, I think
they are collections we started manually through jconsole to try and reduce
the size of the old generation. This only seemed to have an effect when we
reloaded the core first though.

It is my understanding that the eden size is deliberately smaller to keep
the ParNew collection time down. If it takes too long then the node is
flagged as down.


Your ParNew collections are taking less than 1 second (some WAY less 
than one second) to complete and the CMS collections are taking far 
longer -- 6 seconds seems to be a common number in the GC log.  GC is 
unavoidable with Java, so if there has to be a collection, you 
definitely want it to be on the young generation (ParNew).


Controversial idea coming up, nothing concrete to back it up.  This 
means that you might want to wait for a committer to weigh in:  I have 
seen a lot of recent development work relating to SolrCloud and shard 
stability.  You may want to check out branch_4x from SVN and build that, 
rather than use 4.0.  I don't have any idea what the timeline for 4.1 
is, but based on what I saw for 3.x releases, it should be released 
relatively soon.


The above advice is a bad idea if you have to be able to upgrade from 
one 4.1 snapshot to a later one without reindexing. There is a 
possibility that the 4.1 index format will change before release and 
require a reindex, it has happened at least twice already.


Thanks,
Shawn



Re: Solr cloud recovery, why does restarting leader need replicas?

2012-11-29 Thread Daniel Collins
Hi Mark,

I get that use case, if the non-leader dies, when it comes back it has to
allow for recovery, that makes perfect sense.

I guess I was (naively!) assuming there was an optimized scenario: if the
leader dies and is the first one to come back (and is therefore still the
leader), there is no recovery to do, as by definition no updates can have
been made whilst the shard was inactive.

Aside: interesting point about Solr only acking updates when they are on every
replica; are you talking about when the records are removed from the
transaction log?

My understanding was that the external "update" request completes as soon as
the document has made it to the leader's transaction log (might not even
have committed into the leader index), and the replicas then were pushed
those updates as they became available.

If a single replica dies, the leader can still process update/add document
requests, so it can't be waiting for replicas in that scenario?

> On Nov 28, 2012, at 11:58 AM, Mark Miller  wrote:
>
>
>  and we don't want to lose any updates.
>
>
> That's probably somewhat inaccurate - in this case it's more about
> consistency - we only ack updates once they are on every replica. So it's
> not a lost updates issue, but a consistency issue.
>
> The lost updates part is more like when you stop the cluster, than you
> start an old shard or something before starting more recent shards - you
> don't want that thing to become the leader because the other shards were
> not up yet.
>
> - Mark
>
>


Re: Permanently Full Old Generation...

2012-11-29 Thread Walter Underwood
Several suggestions.

1. Adjust the traffic load for about 75% CPU. When you hit 100%, you are 
already in an overload state and the variance of the response times goes way 
up. You'll have very noisy benchmark data.

2. Do not force manual GCs during a benchmark.

3. Do not force merge (optimise). That is a very expensive operation and will 
cause slowdowns.

4. Make eden big enough to hold all data allocated during a request for all 
simultaneous requests. All that stuff is garbage after the end of the request. 
If eden fills up, it will be allocated from the tenured space and cause that to 
grow unnecessarily. We use an 8GB heap and 2GB eden. I like setting the size 
better than setting ratios.

5. What version of the JVM are you using?

wunder

On Nov 29, 2012, at 10:15 AM, Shawn Heisey wrote:

> On 11/29/2012 10:44 AM, Andy Kershaw wrote:
>> Annette is away until Monday so I am looking into this in the meantime.
>> Looking at the times of the Full GC entries at the end of the log, I think
>> they are collections we started manually through jconsole to try and reduce
>> the size of the old generation. This only seemed to have an effect when we
>> reloaded the core first though.
>> 
>> It is my understanding that the eden size is deliberately smaller to keep
>> the ParNew collection time down. If it takes too long then the node is
>> flagged as down.
> 
> Your ParNew collections are taking less than 1 second (some WAY less than one 
> second) to complete and the CMS collections are taking far longer -- 6 
> seconds seems to be a common number in the GC log.  GC is unavoidable with 
> Java, so if there has to be a collection, you definitely want it to be on the 
> young generation (ParNew).
> 
> Controversial idea coming up, nothing concrete to back it up.  This means 
> that you might want to wait for a committer to weigh in:  I have seen a lot 
> of recent development work relating to SolrCloud and shard stability.  You 
> may want to check out branch_4x from SVN and build that, rather than use 4.0. 
>  I don't have any idea what the timeline for 4.1 is, but based on what I saw 
> for 3.x releases, it should be released relatively soon.
> 
> The above advice is a bad idea if you have to be able to upgrade from one 4.1 
> snapshot to a later one without reindexing. There is a possibility that the 
> 4.1 index format will change before release and require a reindex, it has 
> happened at least twice already.
> 
> Thanks,
> Shawn
> 

--
Walter Underwood
wun...@wunderwood.org





Re: Solr cloud recovery, why does restarting leader need replicas?

2012-11-29 Thread Mark Miller

On Nov 29, 2012, at 1:26 PM, Daniel Collins  wrote:

> Hi Mark,
> 
> I get that use case, if the non-leader dies, when it comes back it has to
> allow for recovery, that makes perfect sense.
> 
> I guess I was (naively!) assuming there was an optimized scenario if the
> leader dies, and is the first one to come back (is still therefore leader),
> there is no recovery to do as by definition no updates can have been made
> whilst the shard was inactive.
> 
> Aside: Interesting point about Solr only ack updates when they are on every
> replica, are you talking about when the records are removed from the
> transaction log?
> 
> My understanding was the the external "update" request completes as soon as
> the document has made it to the leader's transaction log (might not even
> have committed into the leader index), and the replicas then were pushed
> those updates as they became available.

No, currently it won't return until the update hits the replicas - it's sent to 
the replicas in parallel.

> 
> If a single replica dies, the leader can still process update/add document
> requests, so it can't be waiting for replicas in that scenario?

There should be no wait if there are any nodes waiting in line to be leader - 
it should only wait when a node comes up and realizes it's the leader and no 
one else was in line to be leader.

- Mark

> 
>> On Nov 28, 2012, at 11:58 AM, Mark Miller  wrote:
>> 
>> 
>> and we don't want to lose any updates.
>> 
>> 
>> That's probably somewhat inaccurate - in this case it's more about
>> consistency - we only ack updates once they are on every replica. So it's
>> not a lost updates issue, but a consistency issue.
>> 
>> The lost updates part is more like when you stop the cluster, than you
>> start an old shard or something before starting more recent shards - you
>> don't want that thing to become the leader because the other shards were
>> not up yet.
>> 
>> - Mark
>> 
>> 



Re: How to boost certain docs when search certain terms

2012-11-29 Thread Mikhail Khludnev
Hello Floyd,

I can suggest to start from
http://wiki.apache.org/solr/ExtendedDisMax#bq_.28Boost_Query.29 specifying
bq=id:(1 2 3 4 ... )
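
A hedged example of what such a request could look like (the ids and the
boost factor are purely illustrative):

  q=user+query&defType=edismax&bq=id:(101 102 103)^10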

Good Luck


On Wed, Nov 28, 2012 at 11:34 PM, Floyd Wu  wrote:

> Sorry if this is a duplicated question, I have no luck to get started.
>
> What's possible solution to do this, please guide me a way.
>
> Many Thanks !
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


Grouping by a date field

2012-11-29 Thread sdanzig
I'm trying to create a SOLR query that groups/field collapses by date.  I
have a field in yyyy-MM-dd'T'HH:mm:ss'Z' format, "datetime", and I'm looking
to group by just per day.  When grouping on this field using
group.field=datetime in the query, SOLR responds with a group for every
second.  I'm able to easily use this field to create day-based facets, but
not groups.  Advice please?

- Scott



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Grouping-by-a-date-field-tp4023318.html
Sent from the Solr - User mailing list archive at Nabble.com.


surround parser not working for span queries

2012-11-29 Thread Anirudha Jadhav
I was trying to port the surround parser from 4.0 to 3.5.

After getting the plugin to work I am not able to get the following results:

http://localhost:8983/solr/collection1/select?q=_query_:{!surround}features:(document 3w shiny)
this works on 4.0 but not on 3.5 with the plugin installed

3.5 query
http://localhost:8983/solr/select?q=_query_:{!surround}features:(document 3w shiny)

queries with and/or/not work correctly.

any suggestion as to what might be the problem ?

thanks,


code
-
import org.apache.lucene.queryParser.surround.query.SrndQuery;
import org.apache.lucene.queryParser.ParseException;//import
org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.search.Query;
import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.handler.SnapPuller;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.schema.IndexSchema;
import org.apache.lucene.queryParser.surround.parser.*;
import org.apache.lucene.queryParser.surround.query.*;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;
import org.apache.solr.search.SolrQueryParser;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * porting solr 4.0 surround parser to 3.5
 */

public class SurroundQParserPlugin extends QParserPlugin {
  public static String NAME = "surround";

  @Override
  public void init(NamedList args) {
  }

  @Override
  public QParser createParser(String qstr, SolrParams localParams,
  SolrParams params, SolrQueryRequest req) {
return new SurroundQParser(qstr, localParams, params, req);
  }

}

class SurroundQParser extends QParser {
  protected static final Logger LOG = LoggerFactory
.getLogger(SurroundQParser.class);
  static final int DEFMAXBASICQUERIES = 1000;
  static final String MBQParam = "maxBasicQueries";

  String sortStr;
  SolrQueryParser lparser;
  int maxBasicQueries;

  public SurroundQParser(String qstr, SolrParams localParams,
  SolrParams params, SolrQueryRequest req) {
super(qstr, localParams, params, req);
  }

  @Override
  public Query parse()
  throws org.apache.lucene.queryParser.ParseException {
SrndQuery sq;
String qstr = getString();
if (qstr == null)
  return null;
String mbqparam = getParam(MBQParam);
if (mbqparam == null) {
  this.maxBasicQueries = DEFMAXBASICQUERIES;
} else {
  try {
this.maxBasicQueries = Integer.parseInt(mbqparam);
  } catch (Exception e) {
        LOG.warn("Couldn't parse maxBasicQueries value " + mbqparam + ", using default of 1000");
this.maxBasicQueries = DEFMAXBASICQUERIES;
  }
}
// ugh .. colliding ParseExceptions
try {
  sq = org.apache.lucene.queryParser.surround.parser.QueryParser
  .parse(qstr);
    } catch (org.apache.lucene.queryParser.surround.parser.ParseException pe) {
  throw new org.apache.lucene.queryParser.ParseException(
  pe.getMessage());
}

// so what do we do with the SrndQuery ??
// processing based on example in LIA Ch 9

    BasicQueryFactory bqFactory = new BasicQueryFactory(this.maxBasicQueries);
    String defaultField = getDefaultField(getReq().getSchema(), getParam(CommonParams.DF));
    Query lquery = sq.makeLuceneQueryField(defaultField, bqFactory);
    return lquery;
  }


  public static String getDefaultField(final IndexSchema s, final String df) {
    return df != null ? df : s.getDefaultSearchFieldName();
  }

}


-- 
Anirudha P. Jadhav


Re: Downloading files from the solr replication Handler

2012-11-29 Thread Lance Norskog
Maybe these are text encoding markers?

- Original Message -
| From: "Eva Lacy" 
| To: solr-user@lucene.apache.org
| Sent: Thursday, November 29, 2012 3:53:07 AM
| Subject: Re: Downloading files from the solr replication Handler
| 
| I tried downloading them with my browser and also with a c#
| WebRequest.
| If I skip the first and last 4 bytes it seems work fine.
| 
| 
| On Thu, Nov 29, 2012 at 2:28 AM, Erick Erickson
| wrote:
| 
| > How are you downloading them? I suspect the issue is
| > with the download process rather than Solr, but I'm just guessing.
| >
| > Best
| > Erick
| >
| >
| > On Wed, Nov 28, 2012 at 12:19 PM, Eva Lacy  wrote:
| >
| > > Just to add to that, I'm using solr 3.6.1
| > >
| > >
| > > On Wed, Nov 28, 2012 at 5:18 PM, Eva Lacy  wrote:
| > >
| > > > I downloaded some configuration and data files directly from
| > > > solr in an
| > > > attempt to develop a backup solution.
| > > > I noticed there is some characters at the start and end of the
| > > > file
| > that
| > > > aren't in configuration files, I notice the same characters at
| > > > the
| > start
| > > > and end of the data files.
| > > > Anyone with any idea how I can download these files without the
| > > > extra
| > > > characters or predict how many there are going to be so I can
| > > > skip
| > them?
| > > >
| > >
| >
| 


Re: Indexing performance with solrj vs. direct lucene API

2012-11-29 Thread Mark Bennett
Hi Robert,

SolrJ is sending data over a socket so that might explain some of the lag.
Are your SolrJ app and the Solr server running on the same physical
machine?

I thought Mark M's idea sounded good.

One other idea:

When initializing SolrJ's connection for normal searching you probably use
HttpSolrServer.

But when doing massive updates, you might consider using
ConcurrentUpdateSolrServer instead.
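
A minimal sketch of the latter (the URL, queue size and thread count are
illustrative values, not tuned recommendations):

  import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class BulkIndexSketch {
    public static void main(String[] args) throws Exception {
      // queue up to 1000 docs and send them with 4 background threads
      ConcurrentUpdateSolrServer server =
          new ConcurrentUpdateSolrServer("http://localhost:8983/solr", 1000, 4);
      for (int i = 0; i < 500; i++) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-" + i);
        server.add(doc);            // buffered; flushed by the background threads
      }
      server.blockUntilFinished();  // drain the queue before committing
      server.commit();
      server.shutdown();
    }
  }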

--
Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513


On Wed, Nov 28, 2012 at 10:02 AM, Robert Stewart wrote:

> I have a project where I am porting existing application from direct
> Lucene API usage to using SOLR and SOLRJ client API.
>
> The problem I have is that indexing is 2-5x slower using SOLRJ+SOLR
> than using direct Lucene API.
>
> I am creating batches of documents between 200 and 500 documents per
> call to add() using SOLRJ.
>
> I tried adjusting SOLR parameters for indexing but did not make any
> difference.
>
> Documents are identical (same fields) in both cases.
>
> Nearly identical settings for tokenizing/analyzing/indexing/storing
> for each field with Lucene and SOLR.
>
> What could be the possible bottleneck in this case?   Can there
> significant over-head unpacking batch of documents in request?  Is
> there some SOLR over-head in update handler?
>
> I have tried both SOLR 3.6 and 4.0 with very similar results.
>
> When using SOLR 4.0 I have transaction logging (for NRT search) turned off.
>
> I am also NOT using a unique ID field.
>
> Performance for indexing 200 documents is around 250ms on SOLR, about
> 60ms on Lucene.
>
> I see that response time wrapping call to SOLRJ API add() method, and
> response time logged in SOLR log is nearly the same, so there is very
> little network overhead in this case.
>
> Is this typical amount of overhead to use SOLRJ+SOLR vs local Lucene API?
>
> The reason it matters in this case is application needs to rebuilt
> index once per day which currently takes about 45 minutes.  Using
> SOLRJ+SOLR it will take several hours, which is a show stopper in this
> case.
>
> Thanks.
>


Problem installing Solr-4.0 in Linux

2012-11-29 Thread dm_tim
Howdy,

I'm having rather a lot of difficulty getting Solr 4.0 running under Linux
(I got it up-and-running under Windows very quickly). My web server is
Glassfish 3.1.1. Additionally, my solr/home dir is /opt/solr/solr-4.0 and my
data dir is /opt/solr/data .

When I deploy the solr war file or restart glassfish I get the following
exception: 
[#|2012-11-29T15:42:10.439-0800|WARNING|glassfish3.1.1|javax.enterprise.system.container.web.com.sun.enterprise.web|_ThreadID=25;_ThreadName=Thread-2;|StandardWrapperValve[LoadAdminUI]:
PWC1406: Servlet.service() for servlet LoadAdminUI threw exception
java.lang.NoClassDefFoundError: org/apache/commons/lang/StringEscapeUtils

This makes no sense to me as that class is inside the war file. At this
point I have no idea what to do to get past this. I've been hammering at
this for quite some time now. Any suggestions?

Regards,

Tim



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-installing-Solr-4-0-in-Linux-tp4023357.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: inconsistent number of results returned in solr cloud

2012-11-29 Thread Buttler, David
Sorry, yes, I had been using the BETA version.  I have deleted all of that, 
replaced the jars with the released versions (reduced my core count), and now I 
have consistent results.
I guess I missed that JIRA ticket, sorry for the false alarm.
Dave


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Friday, November 23, 2012 4:25 AM
To: solr-user@lucene.apache.org
Subject: Re: inconsistent number of results returned in solr cloud

Dave:

I should have asked this first. What version of Solr are you using? I'm not sure 
whether it was fixed in BETA or not (it certainly is in the 4.0 GA release). There 
was a problem with adding a doclist via solrj, here's one related JIRA, 
although it wasn't the main fix:
https://issues.apache.org/jira/browse/SOLR-3001. I suspect that's the "known 
problem" Mark mentioned.

Because what you're seeing _sure_ sounds similar

Best
Erick


On Mon, Nov 19, 2012 at 12:49 PM, Buttler, David  wrote:

> Answers inline below
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Saturday, November 17, 2012 6:40 AM
> To: solr-user@lucene.apache.org
> Subject: Re: inconsistent number of results returned in solr cloud
>
> Hmmm, first an aside. If by "commit after every batch of documents " 
> you mean after every call to server.add(doclist), there's no real need 
> to do that unless you're striving for really low latency. the usual 
> recommendation is to use commitWithin when adding and commit only at 
> the very end of the run. This shouldn't actually be germane to your 
> issue, just an FYI.
>
> DB> Good point.  The code for committing docs to solr is fairly old.  
> DB> I
> will update it since I don't have a latency requirement.
>
> So you're saying that the inconsistency is permanent? By that I mean 
> it keeps coming back inconsistently for minutes/hours/days?
>
> DB> Yes, it is permanent.  I have collections that have been up for 
> DB> weeks,
> and are still returning inconsistent results, and I haven't been 
> adding any additional documents.
> DB> Related to this, I seem to have a discrepancy between the number 
> DB> of
> documents I think I am sending to solr, and the number of documents it 
> is reporting.  I have tried reducing the number of shards for one of 
> my small collections, so I deleted all references to this collections, 
> and reloaded it. I think I have 260 documents submitted (counted from a 
> hadoop job).
>  Solr returns a count of ~430 (it varies), and the first returned 
> document is not consistent.
>
> I guess if I were trying to test this I'd need to know how you added 
> subsequent collections. In particular what you did re: zookeeper as 
> you added each collection.
>
> DB> These are my steps
> DB> 1. Create the collection via the HTTP API: http://
> :/solr/admin/collections?action=CREATE&name=&n
> umShards=6&%20collection.configName=
> DB> 2. Relaunch one of my JVM processes, bootstrapping the collection:
> DB> java -Xmx16g -Dcollection.configName= 
> DB> -Djetty.port=
> -DzkHost= -Dsolr.solr.home= -DnumShards=6 
> -Dbootstrap_confdir=conf -jar start.jar
> DB> load data
>
> DB> Let me know if something is unclear.  I can run through the 
> DB> process
> again and document it more carefully.
> DB>
> DB> Thanks for looking at it,
> DB> Dave
>
> Best
> Erick
>
>
> On Fri, Nov 16, 2012 at 2:55 PM, Buttler, David  wrote:
>
> > My typical way of adding documents is through SolrJ, where I commit 
> > after every batch of documents (where the batch size is 
> > configurable)
> >
> > I have now tried committing several times, from the command line 
> > (curl) with and without openSearcher=true.  It does not affect anything.
> >
> > Dave
> >
> > -Original Message-
> > From: Mark Miller [mailto:markrmil...@gmail.com]
> > Sent: Friday, November 16, 2012 11:04 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: inconsistent number of results returned in solr cloud
> >
> > How did you do the final commit? Can you try a lone commit (with
> > openSearcher=true) and see if that affects things?
> >
> > Trying to determine if this is a known issue or not.
> >
> > - Mark
> >
> > On Nov 16, 2012, at 1:34 PM, "Buttler, David"  wrote:
> >
> > > Hi all,
> > > I buried an issue in my last post, so let me pop it up.
> > >
> > > I have a cluster with 10 collections on it.  The first collection 
> > > I
> > loaded works perfectly.  But every subsequent collection returns an 
> > inconsistent number of results for each query.  The queries can be 
> > simply *:*, or more complex facet queries.  If I go to individual 
> > cores and
> issue
> > the query, with distrib=false, I get a consistent number of results.  
> > I
> am
> > wondering if there is some delay in returning results from my 
> > shards, and the queried node just times out and displays the number 
> > of results that
> it
> > has received so far.  If there is such a timeout, it must be very 
> > small,
> as
> > my QTime is ar

Re: Solr - Jetty Form Too Large Exception

2012-11-29 Thread deniz
Marcin Rzewucki wrote
> 
> I think you should change/set value for multipartUploadLimitInKB attribute
> of requestParsers in solrconfig.xml


The value for multipartUploadLimitInKB is shown as 2048000 in the config, but
in the error logs I see 20, related to Jetty... I have changed some part
in the source code and will test soon... I don't know if it works or not for
now...
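
For what it's worth, the limit in that stack trace is Jetty's
maxFormContentSize, which is separate from Solr's multipartUploadLimitInKB.
One way it is commonly raised -- not verified against this exact Jetty
version -- is a system property at startup (value illustrative):

  java -Dorg.eclipse.jetty.server.Request.maxFormContentSize=2000000 -jar start.jar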



-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Jetty-Form-Too-Large-Exception-tp4023185p4023367.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr asp.net integration challenge

2012-11-29 Thread Gora Mohanty
On 27 November 2012 21:19, Paul Tester  wrote:
> Hi all,
>
> At our company we have an asp.net webapplication hosted in IIS 7.5. This
> application have a search module which is using solr. For communication
> with the solr instance we use a 3th party plugin. For every search we show
> the total count of the results and also 10 or 15 records. What we're now
> trying to achieve is that the user can select all the records from his
> search, which involves that all the doc ids should be available in the
> asp.net application in IIS as fast as possible. Our problem is that the
> count of his search easily can contain 1.000.000 records (or even more),
> which takes way to long to transport them to the application via a json
> result over http. So I'm looking for an alternative solution which is way
> faster.

Retrieving, and displaying, all of a million records is definitely
going to be slow. Are you not paginating your displayed results?
If so, you could fetch results from Solr in smaller batches, keeping
a small window of pages around the current one.
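
For example, the ids could be pulled page by page with the standard paging
parameters (values illustrative):

  /solr/select?q=<the user's query>&fl=id&start=0&rows=1000
  /solr/select?q=<the user's query>&fl=id&start=1000&rows=1000
  ...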

Regards,
Gora


Re: Grouping by a date field

2012-11-29 Thread Amit Nithian
Why not create a new field that just contains the day component? Then you
can group by this field.
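
For instance (the field name is purely illustrative), a plain string field
holding just the day, written by the indexing client since copyField cannot
truncate the value:

  <field name="datetime_day" type="string" indexed="true" stored="true"/>

and then:

  q=*:*&group=true&group.field=datetime_day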


On Thu, Nov 29, 2012 at 12:38 PM, sdanzig  wrote:

> I'm trying to create a SOLR query that groups/field collapses by date.  I
> have a field in -MM-dd'T'HH:mm:ss'Z' format, "datetime", and I'm
> looking
> to group by just per day.  When grouping on this field using
> group.field=datetime in the query, SOLR responds with a group for every
> second.  I'm able to easily use this field to create day-based facets, but
> not groups.  Advice please?
>
> - Scott
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Grouping-by-a-date-field-tp4023318.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


SolrCloud - Sorting Problem

2012-11-29 Thread deniz
Hello, I am having a weird problem with SolrCloud and sorting. I will open a
bug ticket about this too, but I am wondering if anyone has had similar
problems.

Background: Basically, I have added a new feature to Solr after getting the
source code. Similar to the way we get "score" in the result set, I am now able
to get position (or ranking) information for each document in the list, i.e.
if there are 5 documents in the result set, each of them has its position
information if you add "fl=*,position" to the query.

Problem: Briefly, when a Solr instance is standalone, there is no problem
with sorting and the position information of each document, but when the same
Solr is on a cloud (as a master), the result set is somewhat shuffled and
the position information is incorrect.

So it looks like this:

Both the standalone instance and the one on the cloud find the same number of
documents in the index (say 15000), which is filled from the same data source.
So up to this point everything seems normal.

But here are the results

Standalone Solr:

   doc a -> position 1
   doc b -> position 2
   doc c -> position 3
   doc d -> position 4
   doc e -> position 5
   doc f -> position 6

Same Solr on Cloud (as master):

   doc z -> position 4
   doc x -> position 6
   doc y -> position 1
   doc v -> position 3
   doc r -> position 2
   doc o -> position 5



As is clear above, the *same config with the same query and sorting
parameter* returns *different documents and totally shuffled
position* information.


Does anyone have any ideas on this?





-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-Sorting-Problem-tp4023382.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Grouping by a date field

2012-11-29 Thread Jack Krupansky
Or group by a function query which is the date field converted to 
milliseconds divided by the number of milliseconds in a day.


Such as:

 q=*:*&group=true&group.func=rint(div(ms(date_dt),mul(24,mul(60,mul(60,1000)))))
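
(For the record, ms(date_dt) is milliseconds since the Unix epoch, and
mul(24,mul(60,mul(60,1000))) works out to 24 * 60 * 60 * 1000 = 86,400,000 ms
per day, so the function maps each timestamp to a day-sized bucket number.
Note that rint rounds to the *nearest* whole day, so the bucket boundaries
fall at noon UTC rather than midnight; use that with care if exact calendar
days matter.)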

-- Jack Krupansky

-Original Message- 
From: Amit Nithian

Sent: Thursday, November 29, 2012 10:29 PM
To: solr-user@lucene.apache.org
Subject: Re: Grouping by a date field

Why not create a new field that just contains the day component? Then you
can group by this field.


On Thu, Nov 29, 2012 at 12:38 PM, sdanzig  wrote:


I'm trying to create a SOLR query that groups/field collapses by date.  I
have a field in yyyy-MM-dd'T'HH:mm:ss'Z' format, "datetime", and I'm
looking
to group by just per day.  When grouping on this field using
group.field=datetime in the query, SOLR responds with a group for every
second.  I'm able to easily use this field to create day-based facets, but
not groups.  Advice please?

- Scott



--
View this message in context:
http://lucene.472066.n3.nabble.com/Grouping-by-a-date-field-tp4023318.html
Sent from the Solr - User mailing list archive at Nabble.com.





localHostContext should not contain a / ... why not?

2012-11-29 Thread Chris Hostetter


Can anyone shed some light on this code in ZkController...

if (localHostContext.contains("/")) {
  throw new IllegalArgumentException("localHostContext ("
  + localHostContext + ") should not contain a /");
}

...I don't really understand this limitation.  There's nothing in the 
servlet spec that prevents a context path from containing '/' characters 
-- I can, for instance, modify the jetty context file that ships with solr 
like so and jetty will happily run solr rooted at 
http://localhost:8983/solr/hoss/man ...



hossman@frisbee:~/lucene/dev$ svn diff solr/example/contexts/solr.xml
Index: solr/example/contexts/solr.xml
===================================================================
--- solr/example/contexts/solr.xml  (revision 1415493)
+++ solr/example/contexts/solr.xml  (working copy)
@@ -1,8 +1,8 @@
 <?xml version="1.0"?>
 <!DOCTYPE Configure PUBLIC "-//Jetty//Configure//EN" "http://www.eclipse.org/jetty/configure.dtd">
 <Configure class="org.eclipse.jetty.webapp.WebAppContext">
-  <Set name="contextPath">/solr</Set>
+  <Set name="contextPath">/solr/hoss/man</Set>
   <Set name="war"><SystemProperty name="jetty.home"/>/webapps/solr.war</Set>
   <Set name="defaultsDescriptor"><SystemProperty name="jetty.home"/>/etc/webdefault.xml</Set>
   <Set name="tempDirectory"><Property name="jetty.home" default="."/>/solr-webapp</Set>
-</Configure>
\ No newline at end of file
+</Configure>



My best guesses as to the intent of this code are:

1) that it was really meant to ensure the localHostContext didn't *start* 
with a redundant "/"


2) that there is some reason why the nodeName shouldn't include slashes, 
and the nodeName is built using the localHostContext, so the restriction 
propagates.


If it's #1, it seems like a trivial bug with an easy fix. #2 doesn't really 
make sense to me -- but it may just be my ZK ignorance: Aren't nodePaths 
in ZK hierarchical by nature, so shouldn't allowing "/" be fine? Is there 
some reason introducing multiple "sub directories" (with a single child) 
in ZK for a single solr node would be bad? ... If so, then wouldn't a simple 
solution be to URL encode the localHostContext (or escape the "/" in some 
other way) when building the nodeName so that we can eliminate this 
limitation?
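
For the record, the escaping idea would look something like this (just a
sketch, assuming the nodeName is built roughly as host:port_context -- not
checked against the real ZkController code):

import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class NodeNameSketch {

    /** Build a ZK-safe node name even if the servlet context contains slashes. */
    static String buildNodeName(String host, int port, String localHostContext) {
        try {
            // "solr/hoss/man" becomes "solr%2Fhoss%2Fman", so it no longer
            // introduces extra levels in the ZooKeeper path.
            String escaped = URLEncoder.encode(localHostContext, "UTF-8");
            return host + ":" + port + "_" + escaped;
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e); // UTF-8 is always present
        }
    }
}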





-Hoss


Re: Grouping by a date field

2012-11-29 Thread Amit Nithian
What's the performance impact of doing this?


On Thu, Nov 29, 2012 at 7:54 PM, Jack Krupansky wrote:

> Or group by a function query which is the date field converted to
> milliseconds divided by the number of milliseconds in a day.
>
> Such as:
>
>  q=*:*&group=true&group.func=rint(div(ms(date_dt),mul(24,mul(60,mul(60,1000)))))
>
> -- Jack Krupansky
>
> -Original Message- From: Amit Nithian
> Sent: Thursday, November 29, 2012 10:29 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Grouping by a date field
>
>
> Why not create a new field that just contains the day component? Then you
> can group by this field.
>
>
> On Thu, Nov 29, 2012 at 12:38 PM, sdanzig  wrote:
>
>  I'm trying to create a SOLR query that groups/field collapses by date.  I
>> have a field in yyyy-MM-dd'T'HH:mm:ss'Z' format, "datetime", and I'm
>> looking
>> to group by just per day.  When grouping on this field using
>> group.field=datetime in the query, SOLR responds with a group for every
>> second.  I'm able to easily use this field to create day-based facets, but
>> not groups.  Advice please?
>>
>> - Scott
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Grouping-by-a-date-field-tp4023318.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>


Re: SolrCloud - Sorting Problem

2012-11-29 Thread deniz
After playing with this more, I think I have some clue...

On the standalone Solr, when I give start 11 and rows 20, I can see
documents with positions ranging from 12 to 31, which is correct... on the
cloud, when I give the same parameters, I again get the same documents, but
this time the positions range from 1 to 20...

So my question: does the cloud use a different class for responding to the
search request? If so, are there any ways to find those classes other than
digging through the code?



-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-Sorting-Problem-tp4023382p4023399.html
Sent from the Solr - User mailing list archive at Nabble.com.


IOFileUploadException(Too many open files) occurs while indexing using ExtractingRequestHandler

2012-11-29 Thread Shigeki Kobayashi
Hello everyone

I use ManifoldCF (a file crawler) to crawl and index file contents into
Solr 3.6.
ManifoldCF uses ExtractingRequestHandler to extract contents from files.
Somehow an IOFileUploadException occurs and reports that there are too many
open files.

Does Solr open a lot of temporary files under /var/tmp/? Are there any cases
where those files remain open?

Also, after the IOFileUploadException occurs, LockObtainFailedException tends
to happen a lot. Do you think this is related to the IOFileUploadException?


2012/11/30 04:11:19
ERROR[solr.servlet.SolrDispatchFilter]-[TP-Processor1962]-:org.apache.commons.fileupload.FileUploadBase$IOFileUploadException:
Processing of multipart/form-data request failed.
/var/tmp/upload_4f3502de_13b4ac3d1f6__8000_24519177.tmp (Too many open
files)
at
org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:367)
at
org.apache.commons.fileupload.servlet.ServletFileUpload.parseRequest(ServletFileUpload.java:126)
at
org.apache.solr.servlet.MultipartRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:344)
at
org.apache.solr.servlet.StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:397)
at
org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:115)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:244)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
filters.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:122)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at
org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:190)
at
org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:291)
at org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:774)
at
org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:703)
at
org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:896)
at
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:690)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.FileNotFoundException:
/var/tmp/upload_4f3502de_13b4ac3d1f6__8000_24519177.tmp (Too many open
files)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
at java.io.FileOutputStream.<init>(FileOutputStream.java:145)
at
org.apache.commons.io.output.DeferredFileOutputStream.thresholdReached(DeferredFileOutputStream.java:181)
at
org.apache.commons.io.output.ThresholdingOutputStream.checkThreshold(ThresholdingOutputStream.java:226)
at
org.apache.commons.io.output.ThresholdingOutputStream.write(ThresholdingOutputStream.java:130)
at org.apache.commons.fileupload.util.Streams.copy(Streams.java:101)
at org.apache.commons.fileupload.util.Streams.copy(Streams.java:64)
at
org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:362)
... 23 more






2012/11/30 06:11:08
ERROR[solr.servlet.SolrDispatchFilter]-[TP-Processor1940]-:org.apache.lucene.store.LockObtainFailedException:
Lock obtain timed out: NativeFSLock@/usr/local/solr/data/index/write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:84)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1098)
at
org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:84)
at
org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:101)
at
org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:171)
at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:219)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
at
org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:141)
at
org.apache.solr.handler.extraction.Extrac