Re: additional requests sent to solr

2013-08-11 Thread alxsss
Hi,

Could someone please confirm whether this must be so or whether it is a bug in Solr.

In short, I see three log entries in Solr for the single request
http://server1:8983/solr/mycollection/select?q=alex&wt=xml&defType=edismax&facet.field=school&facet.field=company&facet=true&facet.limit=10&facet.mincount=1&qf=school_txt+company_txt+name&shards=server1:8983/solr/mycollection,server2.com:8983/solr/mycollection

for the case when facet=true.  

The third log entry looks like:
INFO: [mycollection] webapp=/solr path=/select

params={facet=true&facet.mincount=1&company__terms=Google&ids=957642543183429632,957841245982425088,67612781366,56659036467,50875569066,957707339232706560,465078975511&facet.limit=10&qf=school_txt+company_txt+name&distrib=false&wt=javabin&version=2&rows=10&defType=edismax&NOW=1374191542130&shard.url=server1:8983/solr/mycollection&school__terms=Michigan+State+University,Brigham+Young+University,Northeastern+University&q=alex&facet.field={!terms%3D$school__terms}school&facet.field={!terms%3D$company__terms}company&isShard=true}
 status=0 QTime=6

where the company__terms and school__terms values are taken from the facet
values for the company and school fields.

When the data set is large, this produces log entries containing all facet
values, which considerably slows performance. The issue is observed in
distributed mode only.
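If the logging itself is the bottleneck, one workaround (a sketch, assuming
the stock log4j setup that ships with Solr 4.3+; note it also hides normal
request logs) is to raise the threshold of the logger that emits these lines:

    # hypothetical log4j.properties tweak: suppress per-request INFO logging
    log4j.logger.org.apache.solr.core.SolrCore=WARN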

Thanks in advance.
Alex.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/additional-requests-sent-to-solr-tp4079007p4083799.html
Sent from the Solr - User mailing list archive at Nabble.com.


What do you use for solr's logging analysis?

2013-08-11 Thread adfel70
Hi
I'm looking for a tool that could help me perform Solr logging analysis.
I use SolrCloud on multiple servers, so the tool should be able to collect
logs from multiple servers.

Any tool you use and can advise on?

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/What-do-you-use-for-solr-s-logging-analysis-tp4083809.html
Sent from the Solr - User mailing list archive at Nabble.com.


Edismax vs Dismax

2013-08-11 Thread heaven
Hi, the application I am working on switched to the edismax parser and I found
some weird behavior.

I have this field:

   [field and fieldType definitions stripped by the mail archive]

The string that is indexed is: facebook.com/profile.php?id=123456789

With the dismax parser the query returns one result; with edismax it returns
0. Here are the queries I tried:
1 result:
fq=type%3ASite&sort=score+desc&q=facebook.com%2Fprofile.php%3Fid%3D1571031169&fl=%2A+score&qf=url_url&defType=dismax&mm=1&start=0&rows=20&

0 results:
fq=type%3ASite&sort=score+desc&q=facebook.com%2Fprofile.php%3Fid%3D1571031169&fl=%2A+score&qf=url_url&defType=edismax&mm=1&start=0&rows=20&

Can someone please help me figure this out?

Thank you,
Alex



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Edismax-vs-Dismax-tp4083812.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Configuring SpellCehckComponent

2013-08-11 Thread tamanjit.bin...@yahoo.co.in
The searchComponent would be placed in your solrconfig.xml. There is no
specific place for it. This is what the comment in your solrconfig.xml says:

 Search Components

   Search components are registered to SolrCore and used by
   instances of SearchHandler (which can access them by name)

   By default, the following components are available:

   <searchComponent name="query"     class="solr.QueryComponent" />
   <searchComponent name="facet"     class="solr.FacetComponent" />
   <searchComponent name="mlt"       class="solr.MoreLikeThisComponent" />
   <searchComponent name="highlight" class="solr.HighlightComponent" />
   <searchComponent name="stats"     class="solr.StatsComponent" />
   <searchComponent name="debug"     class="solr.DebugComponent" />

   Default configuration in a requestHandler would look like:

   <arr name="components">
     <str>query</str>
     <str>facet</str>
     <str>mlt</str>
     <str>highlight</str>
     <str>stats</str>
     <str>debug</str>
   </arr>

   If you register a searchComponent to one of the standard names,
   that will be used instead of the default.

   To insert components before or after the 'standard' components, use:

   <arr name="first-components">
     <str>myFirstComponentName</str>
   </arr>

   <arr name="last-components">
     <str>myLastComponentName</str>
   </arr>

   NOTE: The component registered with the name "debug" will
   always be executed after the "last-components"


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Configuring-SpellCehckComponent-tp4083731p4083815.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Spelling suggestions.

2013-08-11 Thread tamanjit.bin...@yahoo.co.in
I think the issue is that you are trying to use WordBreakSolrSpellChecker
(which was introduced in Solr 4.x) in your Solr 3.5 application.
You need to correct that.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spelling-suggestions-tp4083519p4083816.html
Sent from the Solr - User mailing list archive at Nabble.com.


commit vs soft-commit

2013-08-11 Thread tamanjit.bin...@yahoo.co.in
Hi,
Some confusion in my head.
http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optimize.22
says that
/A soft commit is much faster since it only makes index changes visible and
does not fsync index files or write a new index descriptor./

So this means that even with every soft commit a new searcher opens, right?
If it does, isn't that still very heavy?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/commit-vs-soft-commit-tp4083817.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Edismax vs Dismax

2013-08-11 Thread Jack Krupansky
Just escape the special characters of the URL with a backslash, or put the 
entire URL in quotes. The slash is particularly problematic since it 
introduces a regular expression. Dismax has a less sophisticated syntax and 
automatically escapes more special characters.
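Applied to the query above, either of these forms (shown un-URL-encoded for
readability; they would still need to be encoded in the request) should avoid
the problem:

    q="facebook.com/profile.php?id=1571031169"
    q=facebook.com\/profile.php\?id=1571031169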


-- Jack Krupansky

-Original Message- 
From: heaven

Sent: Sunday, August 11, 2013 8:53 AM
To: solr-user@lucene.apache.org
Subject: Edismax vs Dismax

Hi, the application I am working on switched to edismax parser and I found
some weird behavior.

I have this field:

   [field and fieldType definitions stripped by the mail archive]
The string that is indexed is: facebook.com/profile.php?id=123456789

With the dismax parser the query returns one result; with edismax it returns
0. Here are the queries I tried:
1 result:
fq=type%3ASite&sort=score+desc&q=facebook.com%2Fprofile.php%3Fid%3D1571031169&fl=%2A+score&qf=url_url&defType=dismax&mm=1&start=0&rows=20&

0 results:
fq=type%3ASite&sort=score+desc&q=facebook.com%2Fprofile.php%3Fid%3D1571031169&fl=%2A+score&qf=url_url&defType=edismax&mm=1&start=0&rows=20&

Can someone please help me figure this out?

Thank you,
Alex



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Edismax-vs-Dismax-tp4083812.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Problem running Solr indexing in Amazon EMR

2013-08-11 Thread Erick Erickson
What version of Solr is Cloudera's CDH built on? Looks to me like
the Solr you're using to read the M/R-produced index is different
from the one used to build it. Or the version specified in the
Solr configs, as evidenced by the LUCENE40 in the error
message. See <luceneMatchVersion> in solrconfig.xml.
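For a 4.4 install, that element would normally read something like:

    <luceneMatchVersion>LUCENE_44</luceneMatchVersion>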

But probably a better question to ask Cloudera...

Erick


On Fri, Aug 9, 2013 at 3:50 PM, Dmitriy Shvadskiy wrote:

> Hello,
> We are trying to utilize Amazon Elastic Map Reduce to build Solr indexes. We
> are using embedded Solr in the Reduce phase to create the actual index.
> However we run into the following error and are not sure what is causing it.
> Solr version is 4.4. The job runs fine locally in a Cloudera CDH 4.3 VM.
>
> Thanks,
> Dmitriy
>
>
> 2013-08-09 14:52:02,602 FATAL org.apache.hadoop.mapred.Child (main): Error
> running child : java.lang.VerifyError: (class:
> org/apache/lucene/codecs/lucene40/Lucene40FieldInfosRead
> er, method: read signature:
>
> (Lorg/apache/lucene/store/Directory;Ljava/lang/String;Lorg/apache/lucene/store/IOContext;)Lorg/apache/lucene/index/FieldInfos;)
> Incompatible argument
> to function
> at
>
> org.apache.lucene.codecs.lucene40.Lucene40FieldInfosFormat.(Lucene40FieldInfosFormat.java:99)
> at
>
> org.apache.lucene.codecs.lucene40.Lucene40Codec.(Lucene40Codec.java:49)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
> at
>
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> at
>
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> at java.lang.Class.newInstance(Class.java:374)
> at
> org.apache.lucene.util.NamedSPILoader.reload(NamedSPILoader.java:67)
> at
> org.apache.lucene.util.NamedSPILoader.(NamedSPILoader.java:47)
> at
> org.apache.lucene.util.NamedSPILoader.(NamedSPILoader.java:37)
> at org.apache.lucene.codecs.Codec.(Codec.java:41)
> at
>
> org.apache.solr.core.SolrResourceLoader.reloadLuceneSPI(SolrResourceLoader.java:185)
> at
> org.apache.solr.core.SolrResourceLoader.(SolrResourceLoader.java:121)
> at
> org.apache.solr.core.SolrResourceLoader.(SolrResourceLoader.java:235)
> at
> org.apache.solr.core.CoreContainer.(CoreContainer.java:149)
> at
>
> org.finra.ss.solr.SolrIndexingReducer.getEmbeddedSolrServer(SolrIndexingReducer.java:195)
> at
> org.finra.ss.solr.SolrIndexingReducer.reduce(SolrIndexingReducer.java:94)
> at
> org.finra.ss.solr.SolrIndexingReducer.reduce(SolrIndexingReducer.java:33)
> at
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:528)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:429)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
>
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
> at org.apache.hadoop.mapred.Child.main(Child.java:249)
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Problem-running-Solr-indexing-in-Amazon-EMR-tp4083636.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Shard splitting failure, with and without composite hashing

2013-08-11 Thread Erick Erickson
The very first thing I'd do is go to Solr 4.4. There have been
a lot of improvements in this code in the intervening 3
versions.

If the problem still occurs in 4.4, it'll get a lot more attention
than 4.1.

FWIW,
Erick


On Fri, Aug 9, 2013 at 7:32 PM, Greg Preston wrote:

> Howdy,
>
> I'm trying to test shard splitting, and it's not working for me.  I've got
> a 4 node cloud with a single collection and 2 shards.
>
> I've indexed 170k small documents, and I'm using the compositeId router,
> with an internal "client id" as the shard key, with 4 distinct values
> across the data set.  For my testing, the values of the shard keys are 1
> through 4.  Before splitting, shard1 contains 100k docs (all of the docs
> for shard keys 1 and 4) and shard2 contains 70k docs (all of the docs for
> shard keys 2 and 3).
>
> In prod, we're going to have thousands of unique shard keys, but for now,
> I'm testing at a smaller scale.  I attempt to split shard2 with
>
> http://host0:8983/solr/admin/collections?action=SPLITSHARD&collection=coll&shard=shard2
>
> I understand the shard splitting is on hash range, not document count, and
> it shouldn't split up documents within a single shard key, so I'm ok with
> it if both shard keys end up in the same sub-shard.
>
> I see the following in the logs:
>
> 689524 [qtp259549756-119] ERROR org.apache.solr.servlet.SolrDispatchFilter
>  – null:java.lang.RuntimeException: java.lang.IllegalArgumentException:
> maxValue must be non-negative (got: -1)
> at
>
> org.apache.solr.handler.admin.CoreAdminHandler.handleSplitAction(CoreAdminHandler.java:290)
> at
>
> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:186)
> at
>
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> at
>
> org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:611)
> at
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:209)
> at
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
> at
>
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
> at
>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
> at
>
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> at
>
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
> at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
> at
>
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
> at
>
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
> at
>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> at
>
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> at
>
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
> at
>
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
> at org.eclipse.jetty.server.Server.handle(Server.java:368)
> at
>
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
> at
>
> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
> at
>
> org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
> at
>
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
> at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
> at
> org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
> at
>
> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
> at
>
> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
> at
>
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
> at
>
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.IllegalArgumentException: maxValue must be
> non-negative (got: -1)
> at
> org.apache.lucene.util.packed.PackedInts.bitsRequired(PackedInts.java:1184)
> at
>
> org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:140)
> at
>
> org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:92)
> at
>
> 

Re: Purging unused segments.

2013-08-11 Thread Erick Erickson
Robert:

Thanks a million, that'll teach me to grep for the obvious ...

It's not even clear (I'm working twice-removed) that there _are_
unused files. I'm grasping at straws here...

Thanks again,
Erick


On Fri, Aug 9, 2013 at 9:32 PM, Robert Muir  wrote:

> On Fri, Aug 9, 2013 at 7:48 PM, Erick Erickson 
> wrote:
> >
> > So is there a good way, without optimizing, to purge any segments not
> > referenced in the segments file? Actually I doubt that optimizing would
> > even do it if I _could_, any phantom segments aren't visible from the
> > segments file anyway...
> >
>
> I don't know why you have these files (windows? deletion policy?) but
> maybe you are interested in this:
>
>
> http://lucene.apache.org/core/4_4_0/core/org/apache/lucene/index/IndexWriter.html#deleteUnusedFiles%28%29
>
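For completeness, a minimal sketch of invoking that from code (paths and the
analyzer are placeholders, and this assumes no other writer holds the index
lock):

    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    // open a writer on the index and drop files no longer referenced
    // by the current commit point
    Directory dir = FSDirectory.open(new File("/path/to/index"));
    IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_44,
        new StandardAnalyzer(Version.LUCENE_44));
    IndexWriter writer = new IndexWriter(dir, cfg);
    writer.deleteUnusedFiles();
    writer.close();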


Re: What do you use for solr's logging analysis?

2013-08-11 Thread Shreejay Nair
There are a lot of tools out there with varying degrees of functionality
(and ease of setup). We also have multiple Solr servers in production (both
cloud and single nodes) and we have decided to use
http://loggly.com. We will probably be setting it up for
all our servers in the next few weeks.

There are plenty of other such log analysis tools. It all depends on your
particular use case.

--Shreejay



On Sunday, August 11, 2013, adfel70 wrote:

> Hi
> I'm looking at a tool that could help me perform solr logging analysis.
> I use SolrCloud on multiple servers, so the tool should be able to collect
> logs from multiple servers.
>
> Any tool you use and can advice of?
>
> Thanks
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/What-do-you-use-for-solr-s-logging-analysis-tp4083809.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


-- 
-- 
Shreejay Nair
Sent from my mobile device. Please excuse brevity and typos.


Re: Could not load config for solrconfig.xml

2013-08-11 Thread Erick Erickson
bq: I have no idea what to do

First thing to do is look at the full stack trace
in the log. The offending bits are usually farther
down the stack.

Best
Erick


On Sat, Aug 10, 2013 at 2:10 PM, shuargan  wrote:

> Do you remember what your mistake was?
> I'm having the same issue.
>
> I have this solr.xml conf under Catalina/localhost:
>
> <Context docBase="…" privileged="true" crossContext="true">
>   <Environment name="solr/home" type="java.lang.String"
>                value="/home/seba/aux" override="true"/>
> </Context>
>
>
> and is throwing the same error...
> HTTP Status 500 - {msg=SolrCore 'collection1' is not available due to init
> failure: Could not load config for
> solrconfig.xml,trace=org.apache.solr.common.SolrException: SolrCore
> 'collection1' is not available due to init failure: Could not load config
> for solrconfig.xml at
> org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:860) at
>
>
> I have no idea what to do
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Could-not-load-config-for-solrconfig-xml-tp4052152p4083741.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: commit vs soft-commit

2013-08-11 Thread Shreejay Nair
Yes, a new searcher is opened with every soft commit. It's still considered
faster because it does not write to disk, which is a slow I/O operation
that can take a lot more time.

On Sunday, August 11, 2013, tamanjit.bin...@yahoo.co.in wrote:

> Hi,
> Some confusion in my head.
> http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optimize.22
> says that
> /A soft commit is much faster since it only makes index changes visible and
> does not fsync index files or write a new index descriptor./
>
> So this means that even with every softcommit a new searcher opens right?
> If
> it does, isn't it still very heavy?
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/commit-vs-soft-commit-tp4083817.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


-- 
-- 
Shreejay Nair
Sent from my mobile device. Please excuse brevity and typos.


Re: commit vs soft-commit

2013-08-11 Thread Erick Erickson
Soft commits also do not rebuild certain per-segment caches,
etc. They do, however, invalidate the "top level" caches, including
the caches you configure in solrconfig.xml.

So no, it's not free at all. Your soft commit interval should still
be as long as makes sense in your app. But soft commits are
still much faster than hard commits with openSearcher
set to false.
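A typical arrangement along these lines in solrconfig.xml (the intervals are
illustrative placeholders, not recommendations):

    <autoCommit>
      <maxTime>60000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>

    <autoSoftCommit>
      <maxTime>5000</maxTime>
    </autoSoftCommit>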

Best
Erick




On Sun, Aug 11, 2013 at 11:00 AM, Shreejay Nair  wrote:

> Yes a new searcher is opened with every soft commit. It's still considered
> faster because it does not write to the disk which is a slow IO operation
> and might take a lot more time.
>
> On Sunday, August 11, 2013, tamanjit.bin...@yahoo.co.in wrote:
>
> > Hi,
> > Some confusion in my head.
> > http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optimize.22
> > says that
> > /A soft commit is much faster since it only makes index changes visible
> and
> > does not fsync index files or write a new index descriptor./
> >
> > So this means that even with every softcommit a new searcher opens right?
> > If
> > it does, isn't it still very heavy?
> >
> >
> >
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/commit-vs-soft-commit-tp4083817.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>
>
> --
> --
> Shreejay Nair
> Sent from my mobile device. Please excuse brevity and typos.
>


Re: Configuring SpellCehckComponent

2013-08-11 Thread tamanjit.bin...@yahoo.co.in
There are two parts here:
1. Building a dictionary. Since you are using IndexBasedSpellChecker, you
have to tell Solr which field from your index to build the
dictionary from.
2. Actually being able to search for your corrected spellings. For this you
need a new requestHandler to query Solr.

1. Config settings for building a dictionary:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.FSTLookup</str>
    <str name="field">XYZ</str>
    <str name="buildOnCommit">true</str>
    <str name="spellcheckIndexDir">./</str>
  </lst>
</searchComponent>

2. To make a requestHandler:

<requestHandler name="…" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">5</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
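Assuming the handler above was registered under a name like /spell, a query
against it would then look something like:

    http://localhost:8983/solr/spell?q=helo&spellcheck=true&spellcheck.build=true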


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Configuring-SpellCehckComponent-tp4083731p4083842.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: commit vs soft-commit

2013-08-11 Thread tamanjit.bin...@yahoo.co.in
Erick-
/It does invalidate the "top level" caches, including the caches you
configure in solrconfig.xml. /

Could you elucidate?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/commit-vs-soft-commit-tp4083817p4083844.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Spellchecker suggests Tokens

2013-08-11 Thread tamanjit.bin...@yahoo.co.in
I think the issue lies in the analysis of the field you use for
spellchecking. It also contains NGramFilterFactory. So either copy your data
to another field with some other fieldType that does not include the
NGramFilterFactory analysis, and then try this out.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spellchecker-suggests-Tokens-tp4082821p4083846.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: commit vs soft-commit

2013-08-11 Thread Erick Erickson
Take a look at solrconfig.xml. You configure filterCache,
documentCache, and queryResultCache there. These (and
some others I believe, but certainly these) are _not_
per-segment caches, so they are invalidated on soft commit.
Any autowarming you've specified also gets executed
if applicable.

On the other hand, if you specify short autocommit
intervals for tight NRT searching capabilities, it's
likely these caches aren't being re-used all that much
anyway.
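For illustration, the kind of top-level cache definition in question (sizes
are arbitrary examples):

    <filterCache class="solr.FastLRUCache"
                 size="512"
                 initialSize="512"
                 autowarmCount="128"/>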

Best
Erick


On Sun, Aug 11, 2013 at 11:22 AM, tamanjit.bin...@yahoo.co.in <
tamanjit.bin...@yahoo.co.in> wrote:

> Erik-
> /It does invalidate the "top level" caches, including the caches you
> configure in solrconfig.xml. /
>
> Could you elucidate?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/commit-vs-soft-commit-tp4083817p4083844.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Shard splitting failure, with and without composite hashing

2013-08-11 Thread Greg Preston
Oops, I somehow forgot to mention that. The errors I'm seeing are with the
release version of Solr 4.4.0. I mentioned 4.1.0 because that's what we
currently have in prod, and we want to upgrade to 4.4.0 so we can do shard
splitting. Toward that end, I'm testing shard splitting in 4.4.0 and
seeing these errors.

-Greg


On Sun, Aug 11, 2013 at 7:51 AM, Erick Erickson wrote:

> The very first thing I'd do is go to Solr 4.4. There have been
> a lot of improvements in this code in the intervening 3
> versions.
>
> If the problem still occurs in 4.4, it'll get a lot more attention
> than 4.1
>
> FWIW,
> Erick
>
>
> On Fri, Aug 9, 2013 at 7:32 PM, Greg Preston  >wrote:
>
> > Howdy,
> >
> > I'm trying to test shard splitting, and it's not working for me.  I've
> got
> > a 4 node cloud with a single collection and 2 shards.
> >
> > I've indexed 170k small documents, and I'm using the compositeId router,
> > with an internal "client id" as the shard key, with 4 distinct values
> > across the data set.  For my testing, the values of the shard keys are 1
> > through 4.  Before splitting, shard1 contains 100k docs (all of the docs
> > for shard keys 1 and 4) and shard2 contains 70k docs (all of the docs for
> > shard keys 2 and 3).
> >
> > In prod, we're going to have thousands of unique shard keys, but for now,
> > I'm testing at a smaller scale.  I attempt to split shard2 with
> >
> >
> http://host0:8983/solr/admin/collections?action=SPLITSHARD&collection=coll&shard=shard2
> >
> > I understand the shard splitting is on hash range, not document count,
> and
> > it shouldn't split up documents within a single shard key, so I'm ok with
> > it if both shard keys end up in the same sub-shard.
> >
> > I see the following in the logs:
> >
> > 689524 [qtp259549756-119] ERROR
> org.apache.solr.servlet.SolrDispatchFilter
> >  – null:java.lang.RuntimeException: java.lang.IllegalArgumentException:
> > maxValue must be non-negative (got: -1)
> > at
> >
> >
> org.apache.solr.handler.admin.CoreAdminHandler.handleSplitAction(CoreAdminHandler.java:290)
> > at
> >
> >
> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:186)
> > at
> >
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> > at
> >
> >
> org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:611)
> > at
> >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:209)
> > at
> >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
> > at
> >
> >
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
> > at
> >
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
> > at
> >
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> > at
> >
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
> > at
> >
> >
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> > at
> >
> >
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
> > at
> > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
> > at
> >
> >
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
> > at
> >
> >
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
> > at
> >
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> > at
> >
> >
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> > at
> >
> >
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
> > at
> >
> >
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
> > at org.eclipse.jetty.server.Server.handle(Server.java:368)
> > at
> >
> >
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
> > at
> >
> >
> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
> > at
> >
> >
> org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
> > at
> >
> >
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
> > at
> org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
> > at
> > org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
> > at
> >
> >
> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
> > at
> >
> >
> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
> > at
> >
> >
> org.eclipse.jett

Re: Problem running Solr indexing in Amazon EMR

2013-08-11 Thread Dmitriy Shvadskiy
Erick,

Thank you for the reply. The Cloudera image includes Solr 4.3. I'm not sure
what version Amazon EMR includes. We are not directly referencing or using
their version of Solr; instead we build our jar against Solr 4.4 and include
all dependencies in our jar file. Also, the error occurs not while reading an
existing index but simply while creating an instance of EmbeddedSolrServer. I
think there is a conflict between the jars that the EMR process loads and
those our map/reduce job requires, but I can't figure out what it is.

Dmitriy



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-running-Solr-indexing-in-Amazon-EMR-tp4083636p4083855.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: What do you use for solr's logging analysis?

2013-08-11 Thread William Bell
Loggly cannot accept our Solr queries as fast as we get them in production.
We get 2.5M lines of queries in the log file every 10 minutes, and
sending them to Loggly takes literally 1.5 hours, even with 20 Hadoop
servers sending them.

What we really need from Loggly is a way to point Loggly at an S3 log file
and have Loggly load it in.


On Sun, Aug 11, 2013 at 8:57 AM, Shreejay Nair  wrote:

> There are a lot of tools out there with varying degrees of functionality (
> and ease of setup) we also have multiple solr servers in production ( both
> cloud and single nodes ) and we have decided to use
> http://loggly.com. We will probably be setting it up for
> all our servers in the next few weeks. .
>
> There are plenty of other such log analysis tools. It all depends on your
> particular use case.
>
> --Shreejay
>
>
>
> On Sunday, August 11, 2013, adfel70 wrote:
>
> > Hi
> > I'm looking at a tool that could help me perform solr logging analysis.
> > I use SolrCloud on multiple servers, so the tool should be able to
> collect
> > logs from multiple servers.
> >
> > Any tool you use and can advice of?
> >
> > Thanks
> >
> >
> >
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/What-do-you-use-for-solr-s-logging-analysis-tp4083809.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>
>
> --
> --
> Shreejay Nair
> Sent from my mobile device. Please excuse brevity and typos.
>



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: Problem running Solr indexing in Amazon EMR

2013-08-11 Thread Erick Erickson
Have you checked the luceneMatchVersion in all your solrconfig.xml
files? I'm guessing it's set to 4.0 somewhere in the process, as
evidenced by the line:
org.apache.lucene.codecs.lucene40.Lucene40FieldInfosFormat.(
Lucene40FieldInfosFormat.java:99)
so it looks like somehow a Lucene 4.0 codec is being used to try to read
a more recent format.

You have three different Solrs you're trying to mix and match, so
getting them all coordinated is "interesting". I'm guessing that
when you instantiate the embedded Solr, you're pointing it at a
pre-existing index, but that's only a guess...

Best
Erick


On Sun, Aug 11, 2013 at 2:05 PM, Dmitriy Shvadskiy wrote:

> Erick,
>
> Thank you for the reply. Cloudera image includes Solr 4.3. I'm not sure
> what
> version Amazon EMR includes. We are not directly referencing or using their
> version of Solr but instead build our jar against Solr 4.4 and include all
> dependencies in our jar file.  Also error occurs not while reading existing
> index but simply creating an instance of EmbeddedSolrServer. I think there
> is a conflict between jars that EMR process loads and that our map/reduce
> job requires but I can't figure out what it is.
>
> Dmitriy
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Problem-running-Solr-indexing-in-Amazon-EMR-tp4083636p4083855.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Multipoint date ranges with spatial - Invalid Longitude Exception?

2013-08-11 Thread zonski
Hi, 

I'm trying to implement date range searching using spatial features as per:
http://lucene.472066.n3.nabble.com/Modeling-openinghours-using-multipoints-td4025336.html

I've followed the steps and read through the linked articles but I can't get
past an exception: 

InvalidShapeException: Invalid longitude: longitudes are range -180
to 180: provided lon: [2013224.0]

I am trying to model date ranges so a thing (in my case a grant that you can
apply for) could be open for a few months, then close for a few
months, and then re-open, etc. I want to find all grants that are open for
applicants in a specific date range (e.g. what can I apply for between
1-Mar-2013 and 1-Apr-2013).

I have a field type like so:

   [fieldType definition stripped by the mail archive]

And a field like so:

   [field definition stripped by the mail archive]

Then I store values in this field, using a simple/rough calc of (year * 1000
+ dayOfYear); for example, 12-Aug-2013 is day 224 of 2013, giving 2013224. I
know this is not a perfect mapping for duration, but I think it should be
enough for my purposes and is easy to read/debug.

So I end up with something like: 

2013224 2013301 

Then I query on this using something like: 

grantRoundDates:"Intersects(0, 2013224, 2014231, 300)"

And I get the above exception about 2013224 not being a valid longitude. I'm
not sure why Solr is trying to interpret this as a longitude when I have
geo=false, but I admit my understanding of this whole space is pretty basic
at this stage.

The examples in the links provided all use nice, small numbers. If I use
small numbers like this: 

grantRoundDates:"Intersects(0, 100, 100, 300)"

Then it doesn't error, but it also returns no results (as expected). Am I
supposed to map my range to fit between -180 and 180, or is there something
more I have to do to get Solr to allow my larger numbers?

Thanks,
Dan

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multipoint-date-ranges-with-spatial-Invalid-Longitude-Exception-tp4083882.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Problem running Solr indexing in Amazon EMR

2013-08-11 Thread Dmitriy Shvadskiy
Erick,

It's actually supposed to be just one version of Solr, the one bundled with
our map/reduce jar. To be clear: the map/reduce job is generating a new index,
not reading an existing one. But it fails even before an instance of
EmbeddedSolrServer is created, at the first line of the following code:

// point the container at the solr home directory prepared for the job
CoreContainer coreContainer = new CoreContainer(solrhomedir);
coreContainer.load();
// run Solr in-process against the "collection1" core
EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer,
"collection1");

Dmitriy



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-running-Solr-indexing-in-Amazon-EMR-tp4083636p4083884.html
Sent from the Solr - User mailing list archive at Nabble.com.


very simple boolean query not working

2013-08-11 Thread S L
When I do this query:

q=catcode:CC001

I get a bunch of results. One of them contains:

    catcode: CC001
    start_url_title: Cooper, John
If I then do this query:

q=start_url_title:cooper

I also match the record above, as expected.

But, if I do this:

q=(catcode:CC001 AND start_url_title:cooper)

I get no results.

And, if I do this:

q=(catcode:CC001 OR start_url_title:cooper)

I also get no results.

schema.xml has this declaration of catcode:

    [field declaration stripped by the mail archive]

And, this for start_url_title:

    [field declaration stripped by the mail archive]
What am I missing?

Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/very-simple-boolean-query-not-working-tp4083895.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Adding Postgres and Mysql JDBC drivers to Solr

2013-08-11 Thread Tim Vaillancourt
Another option is defining the location of these jars in your 
solrconfig.xml and storing the libraries external to jetty, which has 
some advantages.


Eg: the MySQL connector is located at '/opt/mysql_connector', and you add 
this to your solrconfig.xml alongside the other lib entries:

   <lib dir="/opt/mysql_connector" regex=".*\.jar" />
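Once the jar is picked up, a DataImportHandler data source can reference the
driver; a sketch with placeholder connection details:

   <dataSource type="JdbcDataSource"
               driver="com.mysql.jdbc.Driver"
               url="jdbc:mysql://localhost:3306/mydb"
               user="solr" password="secret"/>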
Cheers,

Tim

On 06/08/13 08:02 AM, Spadez wrote:

Thank you very much



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Adding-Postgres-and-Mysql-JDBC-drivers-to-Solr-tp4082806p4082832.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: very simple boolean query not working

2013-08-11 Thread Jack Krupansky

What query parser and release of Solr are you using?

There was a bug at one point where a fielded term immediately after a left 
parenthesis was not handled properly.


If I recall, just insert a space after the left parenthesis.
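For the query above, that would be, e.g.:

    q=( catcode:CC001 AND start_url_title:cooper)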

Also, the dismax query parser does not support parentheses.

-- Jack Krupansky

-Original Message- 
From: S L

Sent: Monday, August 12, 2013 12:48 AM
To: solr-user@lucene.apache.org
Subject: very simple boolean query not working

When I do this query:

   q=catcode:CC001

I get a bunch of results. One of them contains:

    catcode: CC001
    start_url_title: Cooper, John

If I then do this query:

   q=start_url_title:cooper

I also match the record above, as expected.

But, if I do this:

   q=(catcode:CC001 AND start_url_title:cooper)

I get no results.

And, if I do this:

   q=(catcode:CC001 OR start_url_title:cooper)

I also get no results.

schema.xml has this declaration of catcode:

    [field declaration stripped by the mail archive]

And, this for start_url_title:

    [field declaration stripped by the mail archive]

What am I missing?

Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/very-simple-boolean-query-not-working-tp4083895.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Internal shard communication - performance?

2013-08-11 Thread Tim Vaillancourt
For me the biggest issue with increased chatter between SolrCloud nodes is 
object creation and GCs.

The resulting CPU load from the increased GCing seems to affect 
performance for me in some load tests, but I'm still trying to gather 
hard numbers on it.


Cheers,

Tim

On 07/08/13 04:05 PM, Shawn Heisey wrote:

On 8/7/2013 2:45 PM, Torsten Albrecht wrote:

I would like to run zookeeper external at my old master server.

So I have two zookeeper to control my cloud. The third and fourth 
zookeeper will be a virtual machine.


For true HA with zookeeper, you need at least three instances on 
separate physical hardware.  If you want to use VMs, that would be 
fine, but you must ensure that you aren't running more than one 
instance on the same physical server.


For best results, use an odd number of ZK instances.  With three ZK 
instances, one can go down and everything still works.  With five, two 
can go down and everything still works.
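The rule of thumb here is majority quorum: an ensemble of N nodes tolerates
floor(N/2) failures, which is why even counts buy you nothing extra. A minimal
sketch of a three-node ensemble in zoo.cfg (hostnames are placeholders):

    server.1=zk1.example.com:2888:3888
    server.2=zk2.example.com:2888:3888
    server.3=zk3.example.com:2888:3888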


If you've got a fully switched network that's at least gigabit speed, 
then the network latency involved in internal communication shouldn't 
really matter.


Thanks,
Shawn



Re: very simple boolean query not working

2013-08-11 Thread S L
Jack Krupansky-2 wrote
> What query parser and release of Solr are you using?
> 
> There was a bug at one point where a fielded term immediately after a left 
> parenthesis was not handled properly.
> 
> If I recall, just insert a space after the left parenthesis.
> 
> Also, the dismax query parser does not support parentheses.
> 
> -- Jack Krupansky

I'm using "standard" as the default query parser.

Solr version is 3.4.0. Yes, that's old.

Adding a space after the left parenthesis didn't make a difference.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/very-simple-boolean-query-not-working-tp4083895p4083904.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Internal shard communication - performance?

2013-08-11 Thread Alexey Kozhemiakin
Hi Tim, Torsten,

Please review the following threads, which cover the chatty shard-to-shard and 
shard-to-replica conversations; since you index large volumes of data, this 
can be a potential bottleneck in your case.

http://lucene.472066.n3.nabble.com/Sharding-and-Replication-td4071614.html

http://lucene.472066.n3.nabble.com/Performance-vs-maxBufferedAddsPerServer-10-td4080283.html
 

https://issues.apache.org/jira/browse/SOLR-4956 


-Original Message-
From: Tim Vaillancourt [mailto:t...@elementspace.com] 
Sent: Monday, August 12, 2013 08:19
To: solr-user@lucene.apache.org
Subject: Re: Internal shard communication - performance?

For me the biggest deal with increased chatter between SolrCloud is object 
creation and GCs.

The resulting CPU load from the increase GCing seems to affect performance for 
me in some load tests, but I'm still trying to gather hard numbers on it.

Cheers,

Tim

On 07/08/13 04:05 PM, Shawn Heisey wrote:
> On 8/7/2013 2:45 PM, Torsten Albrecht wrote:
>> I would like to run zookeeper external at my old master server.
>>
>> So I have two zookeeper to control my cloud. The third and fourth 
>> zookeeper will be a virtual machine.
>
> For true HA with zookepeer, you need at least three instances on 
> separate physical hardware.  If you want to use VMs, that would be 
> fine, but you must ensure that you aren't running more than one 
> instance on the same physical server.
>
> For best results, use an odd number of ZK instances.  With three ZK 
> instances, one can go down and everything still works.  With five, two 
> can go down and everything still works.
>
> If you've got a fully switched network that's at least gigabit speed, 
> then the network latency involved in internal communication shouldn't 
> really matter.
>
> Thanks,
> Shawn
>