Re: update to 4.3

2013-05-07 Thread Arkadi Colson

Found it on http://wiki.apache.org/solr/SolrLogging!

Thx

On 05/07/2013 08:40 AM, Arkadi Colson wrote:

Any tips on what to do with the configuration files?
Where do I have to store them and what should they look like? Any 
examples?



May 07, 2013 6:16:27 AM org.apache.catalina.core.AprLifecycleListener 
init
INFO: The APR based Apache Tomcat Native library which allows optimal 
performance in production environments was not found on the 
java.library.path: 
/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib

May 07, 2013 6:16:28 AM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler ["http-bio-8983"]
May 07, 2013 6:16:28 AM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler ["ajp-bio-8009"]
May 07, 2013 6:16:28 AM org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 621 ms
May 07, 2013 6:16:28 AM org.apache.catalina.core.StandardService 
startInternal

INFO: Starting service Catalina
May 07, 2013 6:16:28 AM org.apache.catalina.core.StandardEngine 
startInternal

INFO: Starting Servlet Engine: Apache Tomcat/7.0.39
May 07, 2013 6:16:28 AM org.apache.catalina.startup.HostConfig deployWAR
INFO: Deploying web application archive 
/usr/local/apache-tomcat-7.0.39/webapps/solr.war
log4j:WARN No appenders could be found for logger 
(org.apache.solr.servlet.SolrDispatchFilter).

log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig 
for more info.
May 07, 2013 6:16:33 AM org.apache.catalina.startup.HostConfig 
deployDirectory
INFO: Deploying web application directory 
/usr/local/apache-tomcat-7.0.39/webapps/host-manager
May 07, 2013 6:16:33 AM org.apache.catalina.startup.HostConfig 
deployDirectory
INFO: Deploying web application directory 
/usr/local/apache-tomcat-7.0.39/webapps/docs
May 07, 2013 6:16:33 AM org.apache.catalina.startup.HostConfig 
deployDirectory
INFO: Deploying web application directory 
/usr/local/apache-tomcat-7.0.39/webapps/manager
May 07, 2013 6:16:34 AM org.apache.catalina.startup.HostConfig 
deployDirectory
INFO: Deploying web application directory 
/usr/local/apache-tomcat-7.0.39/webapps/ROOT
May 07, 2013 6:16:34 AM org.apache.catalina.startup.HostConfig 
deployDirectory
INFO: Deploying web application directory 
/usr/local/apache-tomcat-7.0.39/webapps/examples

May 07, 2013 6:16:34 AM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler ["http-bio-8983"]
May 07, 2013 6:16:34 AM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler ["ajp-bio-8009"]
May 07, 2013 6:16:34 AM org.apache.catalina.startup.Catalina start
INFO: Server startup in 6000 ms

BR,
Arkadi

On 05/06/2013 10:13 PM, Jan Høydahl wrote:

Hi,

The reason is that, as of Solr 4.3, you need to provide the SLF4J logging
jars of your choice when deploying Solr to an external servlet container.

The simplest approach is to copy all jars from example/lib/ext into tomcat/lib:

cd solr-4.3.0/example/lib/ext
cp * /usr/local/apache-tomcat-7.0.39/lib/

Please see CHANGES.TXT for more info 
http://lucene.apache.org/solr/4_3_0/changes/Changes.html#4.3.0.upgrading_from_solr_4.2.0
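
You will also need a log4j.properties on the classpath (e.g. copied into
tomcat/lib next to the jars; the one shipped in the Solr example is a good
starting point). A minimal sketch, assuming a rolling file appender whose
path you will want to adapt:

log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=logs/solr.log
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.MaxBackupIndex=9
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss.SSS} %-5p %c - %m%n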


--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 6 May 2013 at 16:50, Arkadi Colson wrote:


Hi

After update to 4.3 I got this error:

May 06, 2013 2:30:08 PM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler ["http-bio-8983"]
May 06, 2013 2:30:08 PM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler ["ajp-bio-8009"]
May 06, 2013 2:30:08 PM org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 610 ms
May 06, 2013 2:30:08 PM org.apache.catalina.core.StandardService 
startInternal

INFO: Starting service Catalina
May 06, 2013 2:30:08 PM org.apache.catalina.core.StandardEngine 
startInternal

INFO: Starting Servlet Engine: Apache Tomcat/7.0.39
May 06, 2013 2:30:08 PM org.apache.catalina.startup.HostConfig 
deployWAR
INFO: Deploying web application archive 
/usr/local/apache-tomcat-7.0.39/webapps/solr.war
May 06, 2013 2:30:45 PM org.apache.catalina.util.SessionIdGenerator 
createSecureRandom
INFO: Creation of SecureRandom instance for session ID generation 
using [SHA1PRNG] took [36,697] milliseconds.
May 06, 2013 2:30:45 PM org.apache.catalina.core.StandardContext 
startInternal

SEVERE: Error filterStart
May 06, 2013 2:30:45 PM org.apache.catalina.core.StandardContext 
startInternal

SEVERE: Context [/solr] startup failed due to previous errors
May 06, 2013 2:30:45 PM org.apache.catalina.startup.HostConfig 
deployDirectory
INFO: Deploying web application directory 
/usr/local/apache-tomcat-7.0.39/webapps/host-manager
May 06, 2013 2:30:45 PM org.apache.catalina.startup.HostConfig 
deployDirectory
INFO: Deploying web application directory 
/usr/local/apache-tomcat-7.0.39/webapps/docs
May 06, 2013 2:30:45 PM org.apache.catalina.startup.HostConfig 
deployDirectory
INFO: Deploying web application director

Re: When a search query comes to a replica what happens?

2013-05-07 Thread Furkan KAMACI
Hi Otis;

I've read somewhere that if you have one replica and a search rate of 1000
queries per second, and you switch to 5 replicas, each replica may see a 200
qps search rate. What do you think about that, and how does Solr parallelize
searching across replicas? By the way, when you say replica, do you mean
both the replicas and the leader (because the leader is a replica too), or
only the nodes of a shard other than the leader?

2013/4/17 Otis Gospodnetic 

> No.
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
>
>
>
>
>
> On Tue, Apr 16, 2013 at 6:23 PM, Furkan KAMACI 
> wrote:
> > All in all, will the replica ask its leader where the remaining data
> > is, or does it ask ZooKeeper directly?
> >
> > 2013/4/17 Otis Gospodnetic 
> >
> >> Hi,
> >>
> >> No, I believe "redirect" from replica to leader would happen only at
> >> index time, so a doc first gets indexed to leader and from there it's
> >> replicated to non-leader shards.  At query time there is no redirect
> >> to leader, I imagine, as that would quickly turn leaders into
> >> hotspots.
> >>
> >> Otis
> >> --
> >> Solr & ElasticSearch Support
> >> http://sematext.com/
> >>
> >>
> >>
> >>
> >>
> >> On Tue, Apr 16, 2013 at 6:01 PM, Furkan KAMACI 
> >> wrote:
> >> > I want to make this clear in my mind:
> >> >
> >> > When a search query comes to a replica, what happens?
> >> >
> >> > - Does the replica forward the search query to the leader, which
> >> > collects all the data and prepares the response (this would cause a
> >> > performance issue, because the leader is responsible for indexing at
> >> > the same time)?
> >> > or
> >> > - Does the replica communicate with the leader to learn where the
> >> > remaining data is (the leader asks ZooKeeper and tells the replica),
> >> > and then the replica collects all the data and responds itself?
> >>
>


RE: Solr Cloud with large synonyms.txt

2013-05-07 Thread Roman Chyla
We have synonym files bigger than 5MB, so even with compression that would
probably still fail (we are not using Solr Cloud yet).
Roman
On 6 May 2013 23:09, "David Parks"  wrote:

> Wouldn't it make more sense to only store a pointer to a synonyms file in
> zookeeper? Maybe just make the synonyms file accessible via http so other
> boxes can copy it if needed? Zookeeper was never meant for storing
> significant amounts of data.
>
>
> -Original Message-
> From: Jan Høydahl [mailto:jan@cominvent.com]
> Sent: Tuesday, May 07, 2013 4:35 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Cloud with large synonyms.txt
>
> See discussion here
> http://lucene.472066.n3.nabble.com/gt-1MB-file-to-Zookeeper-td3958614.html
>
> One idea was compression. Perhaps if we add gzip support to SynonymFilter
> it
> can read synonyms.txt.gz which would then fit larger raw dicts?
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> On 6 May 2013 at 18:32, Son Nguyen wrote:
>
> > Hello,
> >
> > I'm building a Solr Cloud (version 4.1.0) with 2 shards and a Zookeeper
> (the Zookeeper is on a different machine, version 3.4.5).
> > I've tried to start with a 1.7MB synonyms.txt, but got a
> "ConnectionLossException":
> > Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for /configs/solr1/synonyms.txt
> >at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> >at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> >at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1266)
> >at
> org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:270)
> >at
> org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:267)
> >at
>
> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java
> :65)
> >at
> org.apache.solr.common.cloud.SolrZkClient.setData(SolrZkClient.java:267)
> >at
> org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:436)
> >at
> org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:315)
> >at
> org.apache.solr.cloud.ZkController.uploadToZK(ZkController.java:1135)
> >at
> org.apache.solr.cloud.ZkController.uploadConfigDir(ZkController.java:955)
> >at
> org.apache.solr.core.CoreContainer.initZooKeeper(CoreContainer.java:285)
> >... 43 more
> >
> > I did some research on the internet and found out that this is because
> > the Zookeeper znode size limit is 1MB. I tried to increase the system
> > property "jute.maxbuffer" but it didn't work.
> > Does anyone have experience dealing with this?
> >
> > Thanks,
> > Son
>
>


Re: Rearranging Search Results of a Search?

2013-05-07 Thread Furkan KAMACI
Can I use Transformers for my purpose?

2013/5/3 Furkan KAMACI 

> I think this looks like what I am searching for:
> https://issues.apache.org/jira/browse/SOLR-4465
>
> How about a post filter for Lucene, can it help with my purpose?
>
> 2013/5/3 Otis Gospodnetic 
>
>> Hi,
>>
>> You should use search more often :)
>>
>> http://search-lucene.com/?q=scriptable+collector&sort=newestOnTop&fc_project=Solr&fc_type=issue
>>
>> Coincidentally, what you see there happens to be a good example of a
>> Solr component that does something behind the scenes to deliver those
>> search results even though my original query was bad.  Kind of
>> similar to what you are after.
>>
>> Otis
>> --
>> Solr & ElasticSearch Support
>> http://sematext.com/
>>
>>
>>
>>
>>
>> On Thu, May 2, 2013 at 4:47 PM, Furkan KAMACI 
>> wrote:
>> > I know that I can use boosting at query time for a field or a search
>> > term, in solrconfig.xml, and the query elevator, so I can arrange the
>> > results of a search. However, after I get the top documents, how can I
>> > change the order of the results? Is Lucene's postfilter meant for that?
>>
>
>


Re: Solr Cloud with large synonyms.txt

2013-05-07 Thread Jan Høydahl
Hi,

SolrCloud is designed with the assumption that you should be able to upload your 
whole disk-based conf folder into ZK, and that you should be able to add an 
empty Solr node to a cluster and have it download all config from ZK. So a 
splitting strategy for large files, handled automatically by ZkSolrResourceLoader, 
could be one way forward, i.e. store synonyms.txt as e.g. 
__001_synonyms.txt, __002_synonyms.txt

Feel free to open a JIRA issue for this so we can get a proper resolution.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 7 May 2013 at 09:55, Roman Chyla wrote:

> We have synonym files bigger than 5MB, so even with compression that would
> probably still fail (we are not using Solr Cloud yet).
> Roman
> On 6 May 2013 23:09, "David Parks"  wrote:
> 
>> Wouldn't it make more sense to only store a pointer to a synonyms file in
>> zookeeper? Maybe just make the synonyms file accessible via http so other
>> boxes can copy it if needed? Zookeeper was never meant for storing
>> significant amounts of data.
>> 
>> 
>> -Original Message-
>> From: Jan Høydahl [mailto:jan@cominvent.com]
>> Sent: Tuesday, May 07, 2013 4:35 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr Cloud with large synonyms.txt
>> 
>> See discussion here
>> http://lucene.472066.n3.nabble.com/gt-1MB-file-to-Zookeeper-td3958614.html
>> 
>> One idea was compression. Perhaps if we add gzip support to SynonymFilter
>> it
>> can read synonyms.txt.gz which would then fit larger raw dicts?
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> 
>> On 6 May 2013 at 18:32, Son Nguyen wrote:
>> 
>>> Hello,
>>> 
>>> I'm building a Solr Cloud (version 4.1.0) with 2 shards and a Zookeeper
>> (the Zookeeper is on a different machine, version 3.4.5).
>>> I've tried to start with a 1.7MB synonyms.txt, but got a
>> "ConnectionLossException":
>>> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException:
>> KeeperErrorCode = ConnectionLoss for /configs/solr1/synonyms.txt
>>>   at
>> org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>>>   at
>> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>>>   at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1266)
>>>   at
>> org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:270)
>>>   at
>> org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:267)
>>>   at
>> 
>> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java
>> :65)
>>>   at
>> org.apache.solr.common.cloud.SolrZkClient.setData(SolrZkClient.java:267)
>>>   at
>> org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:436)
>>>   at
>> org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:315)
>>>   at
>> org.apache.solr.cloud.ZkController.uploadToZK(ZkController.java:1135)
>>>   at
>> org.apache.solr.cloud.ZkController.uploadConfigDir(ZkController.java:955)
>>>   at
>> org.apache.solr.core.CoreContainer.initZooKeeper(CoreContainer.java:285)
>>>   ... 43 more
>>> 
>>> I did some research on the internet and found out that this is because
>>> the Zookeeper znode size limit is 1MB. I tried to increase the system
>>> property "jute.maxbuffer" but it didn't work.
>>> Does anyone have experience dealing with this?
>>> 
>>> Thanks,
>>> Son
>> 
>> 



Re: Delete from Solr Cloud 4.0 index..

2013-05-07 Thread Annette Newton
Hi Erick,

Thanks for the tip.

Will docValues help with memory usage?  It seemed a bit complicated to set
up...

The index size saving was nice, because it means that potentially I could
use smaller provisioned-IOPS volumes, which cost less...

Thanks.


On 3 May 2013 18:27, Erick Erickson  wrote:

> Annette:
>
> Be a little careful with the index size savings; they really don't
> mean much for _searching_. The stored field compression
> significantly reduces the size on disk, but only for the stored
> data, which is only accessed when returning the top N docs. In
> terms of how many docs you can fit on your hardware, it's pretty
> irrelevant.
>
> The *.fdt and *.fdx files in your index directory contain the stored
> data, so when looking at the effects of various options (including
> compression), you can pretty much ignore these files.
>
> FWIW,
> Erick
>
> On Fri, May 3, 2013 at 2:03 AM, Annette Newton
>  wrote:
> > Thanks Shawn.
> >
> > I have played around with soft commits before and didn't see any
> > improvement, but with the current load testing I am doing I will give it
> > another go.
> >
> > I have researched docValues and came across the fact that it would
> increase
> > the index size.  With the upgrade to 4.2.1 the index size has reduced by
> > approx 33% which is pleasing and I don't really want to lose that saving.
> >
> > We do use the facet.enum method - which works really well, but I will
> > verify that we are using that in every instance, we have numerous
> > developers working on the product and maybe one or two have slipped
> > through.
> >
> > Right from the first I upped the zkClientTimeout to 30 as I wanted to
> give
> > extra time for any network blips that we experience on AWS.  We only seem
> > to drop communication on a full garbage collection though.
> >
> > I am coming to the conclusion that we need to have more shards to cope
> with
> > the writes, so I will play around with adding more shards and see how I
> go.
> >
> >
> > I appreciate you having a look over our setup and the advice.
> >
> > Thanks again.
> >
> > Netty.
> >
> >
> > On 2 May 2013 23:17, Shawn Heisey  wrote:
> >
> >> On 5/2/2013 4:24 AM, Annette Newton wrote:
> >> > Hi Shawn,
> >> >
> >> > Thanks so much for your response.  We are basically very write
> >> > intensive, and write throughput is pretty essential to our product.
> >> > Reads are sporadic and actually functioning really well.
> >> >
> >> > We write on average (at the moment) 8-12 batches of 35 documents per
> >> > minute.  But we really will be looking to write more in the future, so
> >> need
> >> > to work out scaling of solr and how to cope with more volume.
> >> >
> >> > Schema (I have changed the names) :
> >> >
> >> > http://pastebin.com/x1ry7ieW
> >> >
> >> > Config:
> >> >
> >> > http://pastebin.com/pqjTCa7L
> >>
> >> This is very clean.  There's probably more you could remove/comment, but
> >> generally speaking I couldn't find any glaring issues.  In particular,
> >> you have disabled autowarming, which is a major contributor to commit
> >> speed problems.
> >>
> >> The first thing I think I'd try is increasing zkClientTimeout to 30 or
> >> 60 seconds.  You can use the startup commandline or solr.xml, I would
> >> probably use the latter.  Here's a solr.xml fragment that uses a system
> >> property or a 15 second default:
> >>
> >> <?xml version="1.0" encoding="UTF-8" ?>
> >> <solr persistent="true">
> >>   <cores adminPath="/admin/cores"
> >>     zkClientTimeout="${zkClientTimeout:15000}" hostPort="${jetty.port:}"
> >>     hostContext="solr">
> >>
> >> General thoughts, these changes might not help this particular issue:
> >> You've got autoCommit with openSearcher=true.  This is a hard commit.
> >> If it were me, I would set that up with openSearcher=false and either do
> >> explicit soft commits from my application or set up autoSoftCommit with
> >> a shorter timeframe than autoCommit.
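> >>
> >> Something like this sketch in solrconfig.xml (the 15s/2s values are just
> >> an illustration; tune them for your load):
> >>
> >> <autoCommit>
> >>   <maxTime>15000</maxTime>
> >>   <openSearcher>false</openSearcher>
> >> </autoCommit>
> >> <autoSoftCommit>
> >>   <maxTime>2000</maxTime>
> >> </autoSoftCommit>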
> >>
> >> This might simply be a scaling issue, where you'll need to spread the
> >> load wider than four shards.  I know that there are financial
> >> considerations with that, and they might not be small, so let's leave
> >> that alone for now.
> >>
> >> The memory problems might be a symptom/cause of the scaling issue I just
> >> mentioned.  You said you're using facets, which can be a real memory hog
> >> even with only a few of them.  Have you tried facet.method=enum to see
> >> how it performs?  You'd need to switch to it exclusively, never go with
> >> the default of fc.  You could put that in the defaults or invariants
> >> section of your request handler(s).
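> >>
> >> For example (a sketch, assuming your handler is named "/select"):
> >>
> >> <requestHandler name="/select" class="solr.SearchHandler">
> >>   <lst name="invariants">
> >>     <str name="facet.method">enum</str>
> >>   </lst>
> >> </requestHandler>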
> >>
> >> Another way to reduce memory usage for facets is to use disk-based
> >> docValues on version 4.2 or later for the facet fields, but this will
> >> increase your index size, and your index is already quite large.
> >> Depending on your index contents, the increase may be small or large.
> >>
> >> Something to just mention: It looks like your solrconfig.xml has
> >> hard-coded absolute paths for dataDir and updateLog.  This is fine if
> >> you'll only ever have one core/

Re: solr adding unique values

2013-05-07 Thread Nikhil Kumar
Thanks Erick, for the reply! I know about 'set', but that's not my goal; I
should have given a better example.
If I add another list_c, I want this:

user a[
id:a
lists[
 list_a,
 list_b
   ]
]

to look like:

user a[
id:a
lists[
 list_a,
 list_b,
 list_c
   ]
]

However, if I again add list_a, it should *not* become:

user a[
id:a
lists[
 list_a,
 list_b,
 list_c,
 list_a
   ]
]

I am *not* reindexing the documents.

Depends on your goal here. I'm guessing you're using
atomic updates, in which case you need to use "set"
rather than "add" as the former replaces the contents.
See: http://wiki.apache.org/solr/UpdateJSON#Solr_4.0_Example

If you're simply re-indexing the documents, just send the entire
fresh document to solr and it'll replace the earlier document
completely.

Best
Erick
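
For example, a sketch of the read-then-set route (assuming the client first
fetches the user's current lists, dedupes them in application code, and
writes the full list back with an atomic "set"; localhost:8983 is just the
example default):

curl 'http://localhost:8983/solr/update?commit=true' \
  -H 'Content-type:application/json' \
  -d '[{"id":"a","lists":{"set":["list_a","list_b","list_c"]}}]'

Solr itself will not deduplicate values inside a multiValued field, so the
deduplication has to happen on the client before the "set".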


On Mon, May 6, 2013 at 1:44 PM, Nikhil Kumar wrote:

> Hey,
>I have recently started using Solr. I have a list of users, each of
> which is subscribed to some lists.
> e.g.
> user a[
> id:a
> lists[
>  list_a
>]
> ]
> user b[
>id:b
> lists[
>  list_a
>]
> ]
> I am using {"id": a, "lists":{"add":"list_a"}} to add a particular list to
> a user.
> But if I use the same command again, it adds the same list again, which I
> want to avoid:
> user a[
> id:a
> lists[
>  list_a,
>  list_a
>]
> ]
> I searched the documentation and tutorials, and I found:
>
>- overwrite = "true" | "false" — default is "true", meaning newer
>  documents will replace previously added documents with the same uniqueKey.
>- commitWithin = "(milliseconds)" — if the "commitWithin" attribute is
>  present, the document will be added within that time. Solr 1.4. See
>  CommitWithin.
>- (deprecated) allowDups = "true" | "false" — default is "false"
>- (deprecated) overwritePending = "true" | "false" — default is negation
>  of allowDups
>- (deprecated) overwriteCommitted = "true" | "false" — default is negation
>  of allowDups
>
>
>But using overwrite and allowDups didn't solve the problem either,
>seemingly because there is no unique id, just a value.
>
>So the question is: how do I solve this problem?
>
> --
> Thank You and Regards,
> Nikhil Kumar
> +91-9916343619
> Technical Analyst
> Hashed In Technologies Pvt. Ltd.
>



-- 
Thank You and Regards,
Nikhil Kumar
+91-9916343619
Technical Analyst
Hashed In Technologies Pvt. Ltd.


Lazy load Error on UI analysis area

2013-05-07 Thread yriveiro
Hi,

I was exploring the admin UI, and in the analysis section I got a lazy-load
error.

The log says:

INFO  - 2013-05-07 11:52:06.412; org.apache.solr.core.SolrCore; []
webapp=/solr path=/admin/luke params={_=1367923926380&show=schema&wt=json}
status=0 QTime=23
ERROR - 2013-05-07 11:52:06.499; org.apache.solr.common.SolrException;
null:org.apache.solr.common.SolrException: lazy loading error
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:258)
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:931)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
at
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1004)
at
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
at
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:312)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.common.SolrException: Error loading class
'solr.solr.FieldAnalysisRequestHandler'
at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:464)
at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:396)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:518)
at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:592)
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249)
... 20 more
Caused by: java.lang.ClassNotFoundException:
solr.solr.FieldAnalysisRequestHandler
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:789)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:266)
at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:448)
... 24 more
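
The "solr.solr.FieldAnalysisRequestHandler" in the trace suggests a doubled
"solr." prefix on the class attribute of the handler declaration in
solrconfig.xml. For comparison, the stock declaration looks like this
(a sketch; check your own solrconfig.xml):

<requestHandler name="/analysis/field"
                startup="lazy"
                class="solr.FieldAnalysisRequestHandler" />

Dropping the extra "solr." should fix the lazy loading error.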



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Lazy-load-Error-on-UI-analysis-area-tp4061291.html


Re: Search performance: shards or replications?

2013-05-07 Thread Jan Høydahl
Hi,

It depends(TM) on what kind of search performance problems you are seeing.
If you simply have such a high query load that the server starts to kneel, it will
definitely not help to shard, since ALL the shards will still be hit with
ALL the queries, and you add some extra overhead with sharding as well.

But if your QPS is moderate and you have tons of documents, you may gain
better performance both for indexing latency and search latency by sharding.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 7 May 2013 at 13:09, Stanislav Sandalnikov wrote:

> Hi,
> 
> We are moving to SolrCloud architecture. And I have question about search
> performance and its correlation with shards or replicas. What will be more
> efficient: to split all index we have to several shards or create several
> replications of index? Is parallel search works with both shards and
> replicas?
> 
> Please share your experience regarding this matter.
> 
> Thanks in advance.
> 
> Regards,
> Stanislav



Re: Search performance: shards or replications?

2013-05-07 Thread Stanislav Sandalnikov
Hi Yan,

Thanks for the quick reply.

Thus, replication seems to be the preferable solution. Does QTime decrease
in proportion to the number of replicas, or are there other drawbacks?

Just to clarify, what amount of documents counts as "tons of documents" in
your opinion? :)


2013/5/7 Jan Høydahl 

> Hi,
>
> It depends(TM) on what kind of search performance problems you are seeing.
> If you simply have such a high query load that the server starts to kneel,
> it will
> definitely not help to shard, since ALL the shards will still be hit with
> ALL the queries, and you add some extra overhead with sharding as well.
>
> But if your QPS is moderate and you have tons of documents, you may gain
> better performance both for indexing latency and search latency by
> sharding.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> On 7 May 2013 at 13:09, Stanislav Sandalnikov wrote:
>
> > Hi,
> >
> > We are moving to SolrCloud architecture. And I have question about search
> > performance and its correlation with shards or replicas. What will be
> more
> > efficient: to split all index we have to several shards or create several
> > replications of index? Is parallel search works with both shards and
> > replicas?
> >
> > Please share your experience regarding this matter.
> >
> > Thanks in advance.
> >
> > Regards,
> > Stanislav
>
>


Re: Search performance: shards or replications?

2013-05-07 Thread Stanislav Sandalnikov
P.S. Sorry for misspelling your name, Jan


2013/5/7 Stanislav Sandalnikov 

> Hi Yan,
>
> Thanks for the quick reply.
>
> Thus, replication seems to be the preferable solution. Does QTime decrease
> in proportion to the number of replicas, or are there other drawbacks?
>
> Just to clarify, what amount of documents counts as "tons of documents"
> in your opinion? :)
>
>
> 2013/5/7 Jan Høydahl 
>
>> Hi,
>>
>> It depends(TM) on what kind of search performance problems you are seeing.
>> If you simply have such a high query load that the server starts to kneel,
>> it will
>> definitely not help to shard, since ALL the shards will still be hit with
>> ALL the queries, and you add some extra overhead with sharding as well.
>>
>> But if your QPS is moderate and you have tons of documents, you may gain
>> better performance both for indexing latency and search latency by
>> sharding.
>>
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>>
>> On 7 May 2013 at 13:09, Stanislav Sandalnikov <s.sandalni...@gmail.com> wrote:
>>
>> > Hi,
>> >
>> > We are moving to SolrCloud architecture. And I have question about
>> search
>> > performance and its correlation with shards or replicas. What will be
>> more
>> > efficient: to split all index we have to several shards or create
>> several
>> > replications of index? Is parallel search works with both shards and
>> > replicas?
>> >
>> > Please share your experience regarding this matter.
>> >
>> > Thanks in advance.
>> >
>> > Regards,
>> > Stanislav
>>
>>
>


How to get Term Vector Information on Distributed Search

2013-05-07 Thread meghana
Hi,

I am using a distributed query to fetch records. The Distributed Search
document on the wiki says distributed queries are supported, but I am
getting an error while querying. Not sure if I am doing anything wrong.

below is my Query to fetch Term Vector with Distributed Search. 

http://localhost:8080/solr/core1/tvrh?q=id:3426545&tv.all=true&f.text.tv.tf_idf=false&f.text.tv.df=false&tv.fl=text&shards=localhost:8080/solr/core1,localhost:8080/solr/core2,localhost:8080/solr/core3&shards.qt=select&debugQuery=on

Below is error coming... 

java.lang.NullPointerException
at
org.apache.solr.handler.component.TermVectorComponent.finishStage(TermVectorComponent.java:437)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:317)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:242)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:280)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:248)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:275)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:161)
at
org.jboss.as.web.security.SecurityContextAssociationValve.invoke(SecurityContextAssociationValve.java:153)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:155)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:368)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:877)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:671)
at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:930)
at java.lang.Thread.run(Unknown Source)


Please help me on this. 
Thanks 
Meghana




--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-get-Term-Vector-Information-on-Distributed-Search-tp4061313.html


RE: How to get Term Vector Information on Distributed Search

2013-05-07 Thread Markus Jelsma
Hi - this is a known issue: https://issues.apache.org/jira/browse/SOLR-4479

 
 
-Original message-
> From:meghana 
> Sent: Tue 07-May-2013 14:28
> To: solr-user@lucene.apache.org
> Subject: How to get Term Vector Information on Distributed Search
> 
> Hi,
> 
> I am using a distributed query to fetch records. The Distributed Search
> document on the wiki says distributed queries are supported, but I am
> getting an error while querying. Not sure if I am doing anything wrong.
> 
> below is my Query to fetch Term Vector with Distributed Search. 
> 
> http://localhost:8080/solr/core1/tvrh?q=id:3426545&tv.all=true&f.text.tv.tf_idf=false&f.text.tv.df=false&tv.fl=text&shards=localhost:8080/solr/core1,localhost:8080/solr/core2,localhost:8080/solr/core3&shards.qt=select&debugQuery=on
> 
> Below is error coming... 
> 
> java.lang.NullPointerException
>   at
> org.apache.solr.handler.component.TermVectorComponent.finishStage(TermVectorComponent.java:437)
>   at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:317)
>   at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>   at
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:242)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
>   at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
>   at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
>   at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
>   at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:280)
>   at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:248)
>   at
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:275)
>   at
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:161)
>   at
> org.jboss.as.web.security.SecurityContextAssociationValve.invoke(SecurityContextAssociationValve.java:153)
>   at
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:155)
>   at
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>   at
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>   at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:368)
>   at
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:877)
>   at
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:671)
>   at 
> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:930)
>   at java.lang.Thread.run(Unknown Source)
> 
> 
> Please help me on this. 
> Thanks 
> Meghana
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-get-Term-Vector-Information-on-Distributed-Search-tp4061313.html
> 


Solr 1.4 - Proximity Search - Where is configuration for storing positions?

2013-05-07 Thread KnightRider
I have an index built using Solr 1.4 with one field.
I was able to run a proximity search (e.g. word1 within5 word2), but nowhere in
the configuration do I see any information about storing/indexing the positions
or offsets of the terms.

My understanding is that we need to store/index term vector
positions/offsets for proximity search to work.

Can someone please tell me whether positions are indexed by default in Solr 1.4?

FYI, here is the configuration of the field in schema.xml
(to keep it simple I am only including the fieldType and field definition):

<fieldType ... omitNorms="true" sortMissingLast="true">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter ... />
    <filter class="solr.StopFilterFactory" enablePositionIncrements="true"
            ignoreCase="true" words="stop-words.txt" />
  </analyzer>
</fieldType>

<field ... type="string" />

Thanks
-kRider



-
Thanks
-K'Rider
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-1-4-Proximity-Search-Where-is-configuration-for-storing-positions-tp4061315.html


RE: Solr 1.4 - Proximity Search - Where is configuration for storing positions?

2013-05-07 Thread Markus Jelsma
Hi - they are indexed by default but can be omitted since 3.4:
http://wiki.apache.org/solr/SchemaXml#Common_field_options
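
For example, a field that opts out of positions might look like this sketch
(assuming a field named "text"; omitPositions requires Solr 3.4+):

<field name="text" type="text" indexed="true" stored="true"
       omitPositions="true" />

Note that phrase and proximity queries will no longer work on a field that
omits positions.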

 
 
-Original message-
> From:KnightRider 
> Sent: Tue 07-May-2013 14:41
> To: solr-user@lucene.apache.org
> Subject: Solr 1.4 - Proximity Search - Where is configuration for storing 
> positions?
> 
> I have an index built using Solr 1.4 with one field.
> I was able to run a proximity search (e.g. word1 within5 word2), but nowhere
> in the configuration do I see any information about storing/indexing the
> positions or offsets of the terms.
> 
> My understanding is that we need to store/index term vector
> positions/offsets for proximity search to work.
> 
> Can someone please tell me whether positions are indexed by default in Solr
> 1.4?
> 
> FYI, here is the configuration of the field in schema.xml
> (to keep it simple I am only including the fieldType and field definition):
> 
> <fieldType ... omitNorms="true" sortMissingLast="true">
>   <analyzer>
>     <tokenizer class="solr.StandardTokenizerFactory" />
>     <filter ... />
>     <filter class="solr.StopFilterFactory" enablePositionIncrements="true"
>             ignoreCase="true" words="stop-words.txt" />
>   </analyzer>
> </fieldType>
> 
> <field ... type="string" />
> 
> Thanks
> -kRider
> 
> 
> 
> -
> Thanks
> -K'Rider
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-1-4-Proximity-Search-Where-is-configuration-for-storing-positions-tp4061315.html
> 


Re: Search performance: shards or replications?

2013-05-07 Thread Andre Bois-Crettez

Some clarifications :

1) *lots of docs, few queries* : If you have a high number of documents
(+dozen millions) and lowish number of queries per second (say less than
10), replicas will not help to reduce the Qtime. For this kind of task
it is better to shard the index, as each query will effectively be
processed in parallel by N shards, thus reducing Qtime.

2) *few docs, lots of queries* : less than 10M docs and 30+ qps, on the
contrary, you want more replicas to handle more traffic, and avoid
overloaded servers (which would increase the Qtime).

3) *lots of docs, lots of queries* : do both sharding and replicas.

Actual numbers depend on the hardware, the type of docs and queries, etc.
The best is to benchmark your setup varying the load so that you can
trace a hockey-stick graph of Qtime versus qps.
Feel free to ask for details if needed.



André

On 05/07/2013 01:56 PM, Stanislav Sandalnikov wrote:

Hi Yan,

Thanks for the quick reply.

Thus, replication seems to be the preferable solution. Does QTime decrease
in proportion to the number of replicas, or are there other drawbacks?

Just to clarify, what amount of documents counts as "tons of documents" in
your opinion? :)


2013/5/7 Jan Høydahl


Hi,

It depends(TM) on what kind of search performance problems you are seeing.
If you simply have such a high query load that the server starts to kneel,
it will
definitely not help to shard, since ALL the shards will still be hit with
ALL the queries, and you add some extra overhead with sharding as well.

But if your QPS is moderate and you have tons of documents, you may gain
better performance both for indexing latency and search latency by
sharding.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 7 May 2013 at 13:09, Stanislav Sandalnikov wrote:
Hi,

We are moving to SolrCloud architecture. And I have question about search
performance and its correlation with shards or replicas. What will be

more

efficient: to split all index we have to several shards or create several
replications of index? Is parallel search works with both shards and
replicas?

Please share your experience regarding this matter.

Thanks in advance.

Regards,
Stanislav




--
André Bois-Crettez

Search technology, Kelkoo
http://www.kelkoo.com/


Kelkoo SAS
Société par Actions Simplifiée
Au capital de € 4.168.964,30
Siège social : 8, rue du Sentier 75002 Paris
425 093 069 RCS Paris

This message and its attachments are confidential and intended exclusively
for their addressees. If you are not the intended recipient of this message,
please destroy it and notify the sender.


custom facet.sort

2013-05-07 Thread Giovanni Bricconi
I have a string field containing values such as "1khz" "1ghz" "1mhz" etc.

I use this field to show a facet; currently I'm showing results in
facet.sort=count order. Now I'm asked to reorder the facet according to the
unit of measure (khz/mhz/ghz).

I also have 3-4 other custom sort orders to implement.

Is it possible to plug in a custom java class to provide custom facet.sort
modes?

Thank you

Giovanni


RE: Solr 1.4 - Proximity Search - Where is configuration for storing positions?

2013-05-07 Thread KnightRider
Thanks Markus.



-
Thanks
-K'Rider
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-1-4-Proximity-Search-Where-is-configuration-for-storing-positions-tp4061315p4061325.html


SOLR query performance

2013-05-07 Thread Kamal Palei
Dear All
I am using Apache SOLR 3.6.2 version for my search engine in a job site.

I am observing a Solr query taking around 15 seconds to complete. I am
sure there is something wrong in my approach or I am doing the indexing
wrongly, and I need some assistance/pointers to resolve this issue. I am
providing a detailed background of the work I have done; kindly give me some
pointers on how to resolve this.

I am using the Drupal 7.15 framework for the job site and Apache Solr 3.6.2 as
my search engine. When a user registers a profile, I create a node
(page) and attach the document to that node. Every hour I run the cron
task and index new or updated nodes.

When an employer searches for keywords, say java, mysql, php, etc., I use the
APIs provided by Drupal to interact with SOLR and get the documents that
contain keywords such as java, mysql, drupal, etc.

There is a parameter "rows". If I specify rows as 100 or 200, the query
returns fast (takes around half a second). If I specify rows as 3000, it
takes around 15 seconds to return.

Now, my question is: is there any mechanism to tell Solr that my
start row is X and rows is Y, so that it returns search results from the Xth
row with Y rows? (Please note that this is similar to the LIMIT clause
provided by MySQL.)

Kindly let me know. This will help us to great extent.

Best Regards
Kamal


Re: Solr Cloud with large synonyms.txt

2013-05-07 Thread Mark Miller

On May 6, 2013, at 12:32 PM, Son Nguyen  wrote:

> I did some research on the internet and found out that this is because the
> Zookeeper znode size limit is 1MB. I tried to increase the system property
> "jute.maxbuffer" but it didn't work.
> Does anyone have experience dealing with this?

Perhaps hit up the ZK list? They doc it as simply raising jute.maxbuffer, 
though you have to do it for each ZK instance.

- Mark



Re: SOLR query performance

2013-05-07 Thread Alexandre Rafalovitch
Yes, that's what the 'start' and 'rows' parameters do in the query
string. I would check the queries Solr sees when you do that long
request. There is usually a delay in retrieving items further down the
sorted list, but 15 seconds does feel excessive.

http://wiki.apache.org/solr/CommonQueryParameters#start
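
For example, a sketch of a paged request (assuming Solr listens on
localhost:8983 and the default /select handler):

http://localhost:8983/solr/select?q=java&start=300&rows=100

The numFound attribute on the result element gives the total number of
matching documents, which is what you would use to render pager links.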

Regards,
   Alex.

On Tue, May 7, 2013 at 10:10 AM, Kamal Palei  wrote:
> Now, my question is, Is there any mechanism, I can tell to solr that, my
> start row is X, rows is Y, then it will return search result from Xth row
> with Y number of rows (Please note that this is similar with LIMIT stuff
> provided by mysql).



Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


Re: Solr Cloud with large synonyms.txt

2013-05-07 Thread Mark Miller

On May 7, 2013, at 10:24 AM, Mark Miller  wrote:

> 
> On May 6, 2013, at 12:32 PM, Son Nguyen  wrote:
> 
>> I did some research on the internet and found out that this is because the
>> Zookeeper znode size limit is 1MB. I tried to increase the system property
>> "jute.maxbuffer" but it didn't work.
>> Does anyone have experience dealing with this?
> 
> Perhaps hit up the ZK list? They doc it as simply raising jute.maxbuffer, 
> though you have to do it for each ZK instance.
> 
> - Mark
> 

"the system property must be set on all servers and clients otherwise problems 
will arise."

Make sure you try passing it both to ZK *and* to Solr.

- Mark
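
For reference, a sketch of how the property might be passed on both sides,
assuming a 4MB limit (adjust to your largest file):

# ZooKeeper side, e.g. in conf/java.env:
export JVMFLAGS="-Djute.maxbuffer=4194304"

# Solr-on-Tomcat side, e.g. in bin/setenv.sh:
export CATALINA_OPTS="$CATALINA_OPTS -Djute.maxbuffer=4194304"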



Get Suggester to return same phrase as query

2013-05-07 Thread Rounak Jain
Hi,

I'm using the Suggester component in Solr, and if I search for "iPhone 5"
the suggestions never give me the same phrase, that is "iPhone 5." Is there
any way to alter this behaviour to return "iPhone 5" as well?

A backup option could be to always display what the user has entered in the
UI, but I want it to be displayed *only* if there are results for it in
Solr, which is only possible if Solr returns the term.

Rounak


Re: SOLR query performance

2013-05-07 Thread Kamal Palei
Thanks a lot Alex.

I will go and try to make use of the start parameter and report back.

Meantime, I need to know how many total matching records there are.
Example: let's say I am searching for the keyword "java".

There might be 1000 documents containing the java keyword, and I need to
show only 100 records at a time.

When I query, I need the result to include the total number of records,
plus the data for only 100 records.

At the bottom of the web page, I am showing something like

*Prev 1 2 3 4 5 6 7 8 9 10 Next*

When the user clicks 4, I will set the "start" parameter to 300 and the
"rows" parameter to 100 and run the query. As the query result, I expect a
row count of 1000 and the data for 100 records (row numbers 301 to 400).

Is this possible?

Alex, kindly guide me.

Thanks
kamal



On Tue, May 7, 2013 at 7:55 PM, Alexandre Rafalovitch wrote:

> Yes, that's what the 'start' and 'rows' parameters do in the query
> string. I would check the queries Solr sees when you do that long
> request. There is usually a delay in retrieving items further down the
> sorted list, but 15 seconds does feel excessive.
>
> http://wiki.apache.org/solr/CommonQueryParameters#start
>
> Regards,
>Alex.
>
> On Tue, May 7, 2013 at 10:10 AM, Kamal Palei 
> wrote:
> > Now, my question is, Is there any mechanism, I can tell to solr that, my
> > start row is X, rows is Y, then it will return search result from Xth row
> > with Y number of rows (Please note that this is similar with LIMIT stuff
> > provided by mysql).
>
>
>
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
>


Re: SOLR query performance

2013-05-07 Thread Shawn Heisey
On 5/7/2013 8:45 AM, Kamal Palei wrote:
> When user clicks, 4, I will set "start" filter as 300, "rows" filter as 100
> and do the query. As query result, I am expecting row count as 1000, and
> 100 records data (row number 301 to 400).

This is what using the start and rows parameter with Solr will do.  A
nitpick: It will be row number 300 to 399 - the first page is accessed
with "start=0".

Requesting 3000 rows (or even a start value of 3000) should not take 15
seconds.  You should review this wiki page that I wrote for possible
problems with your install:

http://wiki.apache.org/solr/SolrPerformanceProblems

One thing that is not on the wiki page, I will need to add it: Solr
performs best when it is the only thing running on a server.  Other
applications (like a web server running Drupal) compete for resources.
Performance of both Solr and the other applications will suffer.  For
low-volume installations on really good hardware this may not be a
problem, but if your volume is high and/or your server is undersized,
then sharing is not a good idea.

Thanks,
Shawn



Storing positions and offsets vs FieldType IndexOptions DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS

2013-05-07 Thread KnightRider
I see that Lucene 4.x has FieldInfo.IndexOptions that can be used to tell
lucene whether to Index Documents/Frequencies/Positions/Offsets.

We are in the process of upgrading from Lucene 2.9 to Lucene 4.x, and I was
wondering whether there was a way to tell Lucene whether to index
docs/freqs/positions/offsets in the older versions (2.9), or did it always
index positions and offsets by default?

Also I see that Lucene 4.x has FieldType.setStoreTermVectorPositions and
FieldType.setStoreTermVectorOffsets.
Can someone please tell me a usecase for storing positions and offsets in
index?
Is it necessary to store termvector positions and offsets when using
IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS?
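
For concreteness, a sketch of the two independent sets of knobs in the 4.x
API (the "body" field name and values here are just an illustration, not a
recommendation to enable everything):

import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.FieldInfo.IndexOptions;

FieldType ft = new FieldType();
ft.setIndexed(true);
ft.setTokenized(true);
// postings-level frequencies/positions/offsets live in the inverted index
ft.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
// term vectors are a separate per-document copy of terms/positions/offsets,
// used e.g. by term-vector-based highlighting and MoreLikeThis
ft.setStoreTermVectors(true);
ft.setStoreTermVectorPositions(true);
ft.setStoreTermVectorOffsets(true);
ft.freeze();
Field body = new Field("body", "some text", ft);

As far as I understand, phrase/proximity queries use the postings-level
positions, so storing term vector positions/offsets is not required just
because you index DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS.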

Thanks
-kRider



-
Thanks
-K'Rider
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Storing-positions-and-offsets-vs-FieldType-IndexOptions-DOCS-AND-FREQS-AND-POSITIONS-AND-OFFSETS-tp4061354.html


Re: Search performance: shards or replications?

2013-05-07 Thread Stanislav Sandalnikov
Thank you, everything seems clear.
 On 07.05.2013 at 20:17, "Andre Bois-Crettez" wrote:

> Some clarifications :
>
> 1) *lots of docs, few queries* : If you have a high number of documents
> (+dozen millions) and lowish number of queries per second (say less than
> 10), replicas will not help to reduce the Qtime. For this kind of task
> it is better to shard the index, as each query will effectively be
> processed in parallel by N shards, thus reducing Qtime.
>
> 2) *few docs, lots of queries* : less than 10M docs and 30+ qps, on the
> contrary, you want more replicas to handle more traffic, and avoid
> overloaded servers (which would increase the Qtime).
>
> 3) *lots of docs, lots of queries* : do both sharding and replicas.
>
> Actual numbers depend on the hardware, the type of docs and queries, etc.
> The best is to benchmark your setup varying the load so that you can
> trace a hockey-stick graph of Qtime versus qps.
> Feel free to ask for details if needed.
>
>
>
> André
>
> On 05/07/2013 01:56 PM, Stanislav Sandalnikov wrote:
>
>> Hi Yan,
>>
>> Thanks for the quick reply.
>>
>> Thus, replication seems to be the preferable solution. Does QTime decrease
>> in proportion to the number of replicas, or are there other drawbacks?
>>
>> Just to clarify, what amount of documents counts as "tons of documents"
>> in your opinion? :)
>>
>>
>> 2013/5/7 Jan Høydahl
>>
>>  Hi,
>>>
>>> It depends(TM) on what kind of search performance problems you are
>>> seeing.
>>> If you simply have such a high query load that the server starts to
>>> kneel, it will
>>> definitely not help to shard, since ALL the shards will still be hit with
>>> ALL the queries, and you add some extra overhead with sharding as well.
>>>
>>> But if your QPS is moderate and you have tons of documents, you may gain
>>> better performance both for indexing latency and search latency by
>>> sharding.
>>>
>>> --
>>> Jan Høydahl, search solution architect
>>> Cominvent AS - www.cominvent.com
>>>
>>> On 7 May 2013 at 13:09, Stanislav Sandalnikov wrote:
>>>
 Hi,

 We are moving to SolrCloud architecture. And I have question about
 search
 performance and its correlation with shards or replicas. What will be

>>> more
>>>
 efficient: to split all index we have to several shards or create
 several
 replications of index? Is parallel search works with both shards and
 replicas?

 Please share your experience regarding this matter.

 Thanks in advance.

 Regards,
 Stanislav

>>>
>>>
>> --
>> André Bois-Crettez
>>
>> Search technology, Kelkoo
>> http://www.kelkoo.com/
>>
>
> Kelkoo SAS
> Société par Actions Simplifiée
> Au capital de € 4.168.964,30
> Siège social : 8, rue du Sentier 75002 Paris
> 425 093 069 RCS Paris
>
> This message and its attachments are confidential and intended exclusively
> for their addressees. If you are not the intended recipient of this
> message, please destroy it and notify the sender.
>


Re: FieldCache insanity with field used as facet and group

2013-05-07 Thread Chris Hostetter

: I am using the Lucene FieldCache with SolrCloud and I have "insane" instances
: with messages like:

FWIW: I'm the one that named the result of these "sanity checks" 
"FieldCacheInsanity" and I have regretted it ever since -- a better label 
would have been "inconsistency".

: VALUEMISMATCH: Multiple distinct value objects for
: SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)+merchantid
: 'SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)'=>'merchantid',class
:   org.apache.lucene.index.SortedDocValues,0.5=>org.apache.lucene.search.FieldCacheImpl$SortedDocValuesImpl#557711353
: 'SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)'=>'merchantid',int,null=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#1105988713
: 'SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)'=>'merchantid',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#1105988713
: 
: All insane instances are for a field "merchantid" of type "int" used as facet
: and group field.

Interesting: it appears that the grouping code and the facet code are not 
being consistent in how they build the field cache, so you are 
getting two objects in the cache for each segment.

I haven't checked if this happens much with the example configs, but if 
you could: please file a bug with the details of which Solr version you 
are using, the schema fieldType & field declarations for your 
merchantid field, and the mbean stats output showing the field 
cache insanity after executing two queries like...

/select?q=*:*&facet=true&facet.field=merchantid
/select?q=*:*&group=true&group.field=merchantid

(that way we can rule out your custom SearchComponent as having a bug in 
it)

: This insanity can have performance impact ?
: How can I fix it ?

the impact is just that more RAM is being used than is probably strictly 
necessary.  Unless there is something unusual in your fieldType 
declaration, I don't think there is an easy fix you can apply -- we need to 
fix the underlying code.

-Hoss

RE: Solr Cloud with large synonyms.txt

2013-05-07 Thread Son Nguyen
Mark,

I tried to set that property on both ZK (I have only one ZK instance) and Solr, 
but it still didn't work.
But I read somewhere that ZK is not really designed for keeping large data 
files, so this solution of increasing jute.maxbuffer (if I can get it working) 
should only be temporary.

Son

-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: Tuesday, May 07, 2013 9:35 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Cloud with large synonyms.txt


On May 7, 2013, at 10:24 AM, Mark Miller  wrote:

> 
> On May 6, 2013, at 12:32 PM, Son Nguyen  wrote:
> 
>> I did some research on the internet and found out that this is because the
>> Zookeeper znode size limit is 1MB. I tried to increase the system property
>> "jute.maxbuffer" but it didn't work.
>> Does anyone have experience dealing with this?
> 
> Perhaps hit up the ZK list? They doc it as simply raising jute.maxbuffer, 
> though you have to do it for each ZK instance.
> 
> - Mark
> 

"the system property must be set on all servers and clients otherwise problems 
will arise."

Make sure you try passing it both to ZK *and* to Solr.

- Mark



RE: Solr Cloud with large synonyms.txt

2013-05-07 Thread Son Nguyen
Jan,

Thank you for your answer.
I've opened a JIRA issue with your suggestion.
https://issues.apache.org/jira/browse/SOLR-4793

Son

-Original Message-
From: Jan Høydahl [mailto:jan@cominvent.com] 
Sent: Tuesday, May 07, 2013 4:16 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Cloud with large synonyms.txt

Hi,

SolrCloud is designed with the assumption that you should be able to upload your 
whole disk-based conf folder into ZK, and that you should be able to add an 
empty Solr node to a cluster and have it download all config from ZK. So a 
splitting strategy for large files, handled automatically by ZkSolrResourceLoader, 
could be one way forward, i.e. store synonyms.txt as e.g. 
__001_synonyms.txt, __002_synonyms.txt

Feel free to open a JIRA issue for this so we can get a proper resolution.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 7 May 2013 at 09:55, Roman Chyla wrote:

> We have synonym files bigger than 5MB, so even with compression that 
> would probably be failing (not using solr cloud yet)
> Roman
> 
> On 6 May 2013 23:09, "David Parks"  wrote:
> 
>> Wouldn't it make more sense to only store a pointer to a synonyms 
>> file in zookeeper? Maybe just make the synonyms file accessible via 
>> http so other boxes can copy it if needed? Zookeeper was never meant 
>> for storing significant amounts of data.
>> 
>> 
>> -Original Message-
>> From: Jan Høydahl [mailto:jan@cominvent.com]
>> Sent: Tuesday, May 07, 2013 4:35 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr Cloud with large synonyms.txt
>> 
>> See discussion here
>> http://lucene.472066.n3.nabble.com/gt-1MB-file-to-Zookeeper-td3958614.html
>> 
>> One idea was compression. Perhaps if we add gzip support to 
>> SynonymFilter it can read synonyms.txt.gz which would then fit larger 
>> raw dicts?
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> 
>> On 6 May 2013 at 18:32, Son Nguyen wrote:
>> 
>>> Hello,
>>> 
>>> I'm building a Solr Cloud (version 4.1.0) with 2 shards and a 
>>> Zookeeper
>> (the Zookeeer is on different machine, version 3.4.5).
>>> I've tried to start with a 1.7MB synonyms.txt, but got a "ConnectionLossException":
>>> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /configs/solr1/synonyms.txt
>>>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>>>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>>>   at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1266)
>>>   at org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:270)
>>>   at org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:267)
>>>   at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:65)
>>>   at org.apache.solr.common.cloud.SolrZkClient.setData(SolrZkClient.java:267)
>>>   at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:436)
>>>   at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:315)
>>>   at org.apache.solr.cloud.ZkController.uploadToZK(ZkController.java:1135)
>>>   at org.apache.solr.cloud.ZkController.uploadConfigDir(ZkController.java:955)
>>>   at org.apache.solr.core.CoreContainer.initZooKeeper(CoreContainer.java:285)
>>>   ... 43 more
>>> 
>>> I did some research on the internet and found out that the Zookeeper 
>>> znode size limit is 1MB. I tried to increase the system property 
>>> "jute.maxbuffer" but it didn't work.
>>> Does anyone have experience of dealing with it?
>>> 
>>> Thanks,
>>> Son
>> 
>> 
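
Jan's gzip idea quoted above would amount to transparently unwrapping the
stream before the synonym parser sees it. A minimal Java sketch (illustrative
only, not actual Lucene code):

  import java.io.*;
  import java.nio.charset.StandardCharsets;
  import java.util.zip.GZIPInputStream;

  // Open a synonyms resource, decompressing transparently if it ends in .gz,
  // so the existing rule parser always sees plain text.
  static Reader openSynonyms(String name) throws IOException {
      InputStream in = new FileInputStream(name);   // e.g. "synonyms.txt.gz"
      if (name.endsWith(".gz")) {
          in = new GZIPInputStream(in);
      }
      return new InputStreamReader(in, StandardCharsets.UTF_8);
  }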



stats cache

2013-05-07 Thread J Mohamed Zahoor
Hi

I am computing lots of stats as part of a query…
looks like the Solr caching is not helping here… 

Does Solr cache the stats of a query?

./zahoor

facet.pivot limit

2013-05-07 Thread J Mohamed Zahoor
Hi

Is there a limit for facet.pivot like we have in facet.limit?

./zahoor


Re: Solr Cloud with large synonyms.txt

2013-05-07 Thread Mark Miller
I'm not so worried about the large file in zk issue myself.

The concern is that you start storing and accessing lots of large files in ZK. 
This is not what it was made for, and everything stays in RAM, so they guard 
against this type of usage.

We are talking about a config file that is loaded on Core load though. It's 
uploaded and read very rarely. On modern hardware and networks, making that 
file 5MB rather than 1MB is not going to ruin your day. It just won't. Solr 
does not use ZooKeeper heavily - in a steady state cluster, it doesn't read or 
write from ZooKeeper at all to any degree that registers. I'm going to have to 
see problems loading these larger config files from ZooKeeper before I'm 
worried that it's a problem.

- Mark

On May 7, 2013, at 12:21 PM, Son Nguyen  wrote:

> Mark,
> 
> I tried to set that property on both ZK (I have only one ZK instance) and 
> Solr, but it still didn't work.
> But I read somewhere that ZK is not really designed for keeping large data 
> files, so this solution - increasing jute.maxbuffer (if I can implement it) - 
> should only be temporary.
> 
> Son
> 
> -Original Message-
> From: Mark Miller [mailto:markrmil...@gmail.com] 
> Sent: Tuesday, May 07, 2013 9:35 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Cloud with large synonyms.txt
> 
> 
> On May 7, 2013, at 10:24 AM, Mark Miller  wrote:
> 
>> 
>> On May 6, 2013, at 12:32 PM, Son Nguyen  wrote:
>> 
>>> I did some research on the internet and found out that the Zookeeper 
>>> znode size limit is 1MB. I tried to increase the system property 
>>> "jute.maxbuffer" but it didn't work.
>>> Does anyone have experience of dealing with it?
>> 
>> Perhaps hit up the ZK list? They doc it as simply raising jute.maxbuffer, 
>> though you have to do it for each ZK instance.
>> 
>> - Mark
>> 
> 
> "the system property must be set on all servers and clients otherwise 
> problems will arise."
> 
> Make sure you try passing it both to ZK *and* to Solr.
> 
> - Mark
> 



Re: stats cache

2013-05-07 Thread Otis Gospodnetic
Hi,

Yes, in the query cache.  You should see it in your monitoring tool or
your Solr Stats Admin page.  It doesn't help if queries don't repeat or
if cache settings are poor.

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
SOLR Performance Monitoring - http://sematext.com/spm/index.html





On Tue, May 7, 2013 at 12:48 PM, J Mohamed Zahoor  wrote:
> Hi
>
> I am computing lots of stats as part of a query…
> looks like the Solr caching is not helping here…
>
> Does Solr cache the stats of a query?
>
> ./zahoor
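
The cache Otis refers to is configured in solrconfig.xml; a sketch with
illustrative sizes (autowarmCount controls how many entries are replayed
into the new cache after a commit):

  <queryResultCache class="solr.LRUCache"
                    size="512"
                    initialSize="512"
                    autowarmCount="128"/>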


Use case for storing positions and offsets in index?

2013-05-07 Thread KnightRider
Can someone please tell me the use case for storing term positions and offsets
in the index?

I am trying to understand the difference between storing positions/offsets
vs indexing positions/offsets.

Thanks
KR
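
For context, "indexing" and "storing" map to different Lucene APIs:
IndexOptions controls what goes into the postings lists of the inverted
index, while term vectors keep a per-document copy (used e.g. for
highlighting). A hedged Lucene 4.x sketch:

  import org.apache.lucene.document.FieldType;
  import org.apache.lucene.document.TextField;
  import org.apache.lucene.index.FieldInfo.IndexOptions;

  FieldType ft = new FieldType(TextField.TYPE_STORED);
  // "indexing" positions/offsets: they live in the postings lists
  ft.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
  // "storing" positions/offsets: a per-document term vector copy
  ft.setStoreTermVectors(true);
  ft.setStoreTermVectorPositions(true);
  ft.setStoreTermVectorOffsets(true);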



-
Thanks
-K'Rider


Re: Delete from Solr Cloud 4.0 index..

2013-05-07 Thread Erick Erickson
bq: Will docValues help with memory usage?

I'm still a bit fuzzy on all the ramifications of DocValues, but I
somewhat doubt they'll result in index size savings; they _really_
help with loading the values for a field, but the end result is still
the values in memory.

People who know what they're talking about, _please_ correct this if
I'm off base.

Sure, stored field compression will help with disk space, no question.
I was mostly cautioning against extrapolating from disk size to memory
requirements without taking this into account.

Best
Erick
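
For anyone who wants to experiment despite the setup complexity: in Solr
4.2+ docValues is a per-field schema attribute and requires a reindex (the
field name and type below are hypothetical):

  <field name="price" type="tint" indexed="true" stored="true" docValues="true"/>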

On Tue, May 7, 2013 at 6:46 AM, Annette Newton
 wrote:
> Hi Erick,
>
> Thanks for the tip.
>
> Will docValues help with memory usage?  It seemed a bit complicated to set
> up..
>
> The index size saving was nice because that means that potentially I could
> use smaller provisioned IOP volumes which cost less...
>
> Thanks.
>
>
> On 3 May 2013 18:27, Erick Erickson  wrote:
>
>> Anette:
>>
>> Be a little careful with the index size savings, they really don't
>> mean much for _searching_. The stored field compression
>> significantly reduces the size on disk, but only for the stored
>> data which is only accessed when returning the top N docs. In
>> terms of how many docs you can fit on your hardware, it's pretty
>> irrelevant.
>>
>> The *.fdt and *.fdx files in your index directory contain the stored
>> data, so when looking at the effects of various options (including
>> compression), you can pretty much ignore these files.
>>
>> FWIW,
>> Erick
>>
>> On Fri, May 3, 2013 at 2:03 AM, Annette Newton
>>  wrote:
>> > Thanks Shawn.
>> >
>> > I have played around with Soft Commits before and didn't seem to have any
>> > improvement, but with the current load testing I am doing I will give it
>> > another go.
>> >
>> > I have researched docValues and came across the fact that it would
>> increase
>> > the index size.  With the upgrade to 4.2.1 the index size has reduced by
>> > approx 33% which is pleasing and I don't really want to lose that saving.
>> >
>> > We do use the facet.enum method - which works really well, but I will
>> > verify that we are using that in every instance, we have numerous
>> > developers working on the product and maybe one or two have slipped
>> > through.
>> >
>> > Right from the first I upped the zkClientTimeout to 30 as I wanted to
>> give
>> > extra time for any network blips that we experience on AWS.  We only seem
>> > to drop communication on a full garbage collection though.
>> >
>> > I am coming to the conclusion that we need to have more shards to cope
>> with
>> > the writes, so I will play around with adding more shards and see how I
>> go.
>> >
>> >
>> > I appreciate you having a look over our setup and the advice.
>> >
>> > Thanks again.
>> >
>> > Netty.
>> >
>> >
>> > On 2 May 2013 23:17, Shawn Heisey  wrote:
>> >
>> >> On 5/2/2013 4:24 AM, Annette Newton wrote:
>> >> > Hi Shawn,
>> >> >
>> >> > Thanks so much for your response.  We basically are very write
>> intensive
>> >> > and write throughput is pretty essential to our product.  Reads are
>> >> > sporadic and actually is functioning really well.
>> >> >
>> >> > We write on average (at the moment) 8-12 batches of 35 documents per
>> >> > minute.  But we really will be looking to write more in the future, so
>> >> need
>> >> > to work out scaling of solr and how to cope with more volume.
>> >> >
>> >> > Schema (I have changed the names) :
>> >> >
>> >> > http://pastebin.com/x1ry7ieW
>> >> >
>> >> > Config:
>> >> >
>> >> > http://pastebin.com/pqjTCa7L
>> >>
>> >> This is very clean.  There's probably more you could remove/comment, but
>> >> generally speaking I couldn't find any glaring issues.  In particular,
>> >> you have disabled autowarming, which is a major contributor to commit
>> >> speed problems.
>> >>
>> >> The first thing I think I'd try is increasing zkClientTimeout to 30 or
>> >> 60 seconds.  You can use the startup commandline or solr.xml, I would
>> >> probably use the latter.  Here's a solr.xml fragment that uses a system
>> >> property or a 15 second default:
>> >>
>> >> <solr ...>
>> >>   <cores ... zkClientTimeout="${zkClientTimeout:15000}" hostPort="${jetty.port:}"
>> >> hostContext="solr">
>> >>
>> >> General thoughts, these changes might not help this particular issue:
>> >> You've got autoCommit with openSearcher=true.  This is a hard commit.
>> >> If it were me, I would set that up with openSearcher=false and either do
>> >> explicit soft commits from my application or set up autoSoftCommit with
>> >> a shorter timeframe than autoCommit.
>> >>
>> >> This might simply be a scaling issue, where you'll need to spread the
>> >> load wider than four shards.  I know that there are financial
>> >> considerations with that, and they might not be small, so let's leave
>> >> that alone for now.
>> >>
>> >> The memory problems might be a symptom/cause of the scaling issue I just
>> >> mentioned.  You said you're using facets, which can be a real memory h

Re: Rearranging Search Results of a Search?

2013-05-07 Thread Erick Erickson
No, DocTransformers work on a single document at a time, which is
pretty clear if you look at the methods you must implement.

Really, you'd do yourself a favor by doing a little more research
before asking questions, you might review:
http://wiki.apache.org/solr/UsingMailingLists
and consider that most of us are volunteers with limited time. So a
little evidence that you're putting forth some effort before pinging
the list would be well received.

Best
Erick

On Tue, May 7, 2013 at 4:04 AM, Furkan KAMACI  wrote:
> Can I use Transformers for my purpose?
>
> 2013/5/3 Furkan KAMACI 
>
>> I think this looks like what I search for:
>> https://issues.apache.org/jira/browse/SOLR-4465
>>
>> How about post filter for Lucene, can it help me for my purpose?
>>
>> 2013/5/3 Otis Gospodnetic 
>>
>>> Hi,
>>>
>>> You should use search more often :)
>>>
>>> http://search-lucene.com/?q=scriptable+collector&sort=newestOnTop&fc_project=Solr&fc_type=issue
>>>
>>> Coincidentally, what you see there happens to be a good example of a
>>> Solr component that does something behind the scenes to deliver those
>>> search results even though my original query was bad.  Kind of
>>> similar to what you are after.
>>>
>>> Otis
>>> --
>>> Solr & ElasticSearch Support
>>> http://sematext.com/
>>>
>>>
>>>
>>>
>>>
>>> On Thu, May 2, 2013 at 4:47 PM, Furkan KAMACI 
>>> wrote:
>>> > I know that I can use boosting at query time for a field, for a search
>>> > term, in solrconfig.xml, and the query elevator, so I can arrange the
>>> > results of a search. However, after I get the top documents, how can I
>>> > change the order of the results? Does Lucene's postfilter stand for that?
>>>
>>
>>


Re: Lazy load Error on UI analysis area

2013-05-07 Thread Erick Erickson
It looks like you have old jars in the classpath somewhere, class not found
just shouldn't be happening.

If this can be reproduced on a fresh install (and even better on a machine
that's never had Solr installed) it would be something we'd need to pursue...

Best
Erick

On Tue, May 7, 2013 at 6:56 AM, yriveiro  wrote:
> Hi,
>
> I was exploring the UI interface and in the analysis section I had a lazy
> load error.
>
> The logs says:
>
> INFO  - 2013-05-07 11:52:06.412; org.apache.solr.core.SolrCore; []
> webapp=/solr path=/admin/luke params={_=1367923926380&show=schema&wt=json}
> status=0 QTime=23
> ERROR - 2013-05-07 11:52:06.499; org.apache.solr.common.SolrException;
> null:org.apache.solr.common.SolrException: lazy loading error
> at
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:258)
> at
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
> at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
> at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
> at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
> at
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
> at
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
> at
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
> at
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
> at
> org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:931)
> at
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
> at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
> at
> org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1004)
> at
> org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
> at
> org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:312)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: org.apache.solr.common.SolrException: Error loading class
> 'solr.solr.FieldAnalysisRequestHandler'
> at
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:464)
> at
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:396)
> at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:518)
> at 
> org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:592)
> at
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249)
> ... 20 more
> Caused by: java.lang.ClassNotFoundException:
> solr.solr.FieldAnalysisRequestHandler
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
> at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:789)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:266)
> at
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:448)
> ... 24 more
>
>
>
> -
> Best regards
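
Separately from stale jars: the trace shows Solr trying to load
'solr.solr.FieldAnalysisRequestHandler' (note the doubled prefix), so the
declaration in solrconfig.xml is worth comparing against the stock one:

  <requestHandler name="/analysis/field"
                  startup="lazy"
                  class="solr.FieldAnalysisRequestHandler" />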


RE: Unsubscribing from JIRA

2013-05-07 Thread johnmunir

For someone like me, who wants to follow dev discussions but not JIRA, having a 
separate mailing list subscription for each would be ideal.  The incoming mail 
traffic would be cut drastically (for me, I get far more irrelevant emails 
from JIRA than from dev).


-- MJ
 
-Original Message-
From: Raymond Wiker [mailto:rwi...@gmail.com] 
Sent: Wednesday, May 01, 2013 2:01 PM
To: solr-user@lucene.apache.org
Subject: Re: Unsubscribing from JIRA
 
On May 1, 2013, at 19:07, johnmunir@aol.com wrote:
> Are you saying because I'm subscribed to dev, which I'm, is why I'm getting 
> JIRA mails too, and the only way I can stop JIRA mails is to unsubscribe from 
> dev?  I don't think so.  I'm subscribed to other projects, both dev and user, 
> and yet I do not receive JIRA mails.
> 
 
I'm pretty sure that's the case... I subscribed to dev, and got the JIRA mails. 
I unsubscribed from dev, and the JIRA mails stopped.


Re: Get Suggester to return same phrase as query

2013-05-07 Thread Erick Erickson
Hmmm, R. Muir did some work here:
https://issues.apache.org/jira/browse/SOLR-3143, note that it's 4.0 or
later. I haven't implemented this, but this is a common problem so if
you do dig into it and get it to work (warning, I haven't a clue) it'd
be a great contribution to the Wiki.

Best
Erick

On Tue, May 7, 2013 at 10:41 AM, Rounak Jain  wrote:
> Hi,
>
> I'm using the Suggester component in Solr, and if I search for "iPhone 5"
> the suggestions never give me the same phrase, that is "iPhone 5." Is there
> any way to alter this behaviour to return "iPhone 5" as well?
>
> A backup option could be to always display what the user has entered in the
> UI, but I want it to be displayed *only *if there are results for it in
> Solr, which is only possible if Solr returns the term.
>
> Rounak


Re: Unsubscribing from JIRA

2013-05-07 Thread Alexandre Rafalovitch
Email filters? I mean, you may have a point, but the cost of change at
this moment is probably too high. Personal email filters, on the other
hand, seems like an easy solution.

Regards,
   Alex.

On Tue, May 7, 2013 at 2:01 PM,   wrote:
> For someone link me, who want to follow dev discussions but not JIRA, having 
> a separate mailing list subscription for each would be ideal.  The incoming 
> mail traffic would be cut drastically (for me, I get far more non relevant 
> emails from JIRA vs. dev).



Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


Search identifier fields containing blanks

2013-05-07 Thread Silvio Hermann

Hello,

I am about to index identifier fields containing blanks (shelfmarks), e.g. G 23/60 
12.
The field type is set to solr.string. To get the exact matching hit (the doc 
with the shelfmark mentioned above) the user must quote the search term. Is there a 
way to omit the quotes?

Best,

Silvio


Re: Storing positions and offsets vs FieldType IndexOptions DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS

2013-05-07 Thread Shawn Heisey

On 5/7/2013 9:50 AM, KnightRider wrote:

I see that Lucene 4.x has FieldInfo.IndexOptions that can be used to tell
lucene whether to Index Documents/Frequencies/Positions/Offsets.


I really don't like giving unhelpful responses like this, but I don't 
think there's any other way to go.


This is the solr-user mailing list.  Most of the end-users here (and a 
few of the regulars, including myself) have very little experience with 
Lucene, even though Solr is a Lucene application and the source code is 
part of Lucene.


There are a number of lucene-specific discussion places available:

http://lucene.apache.org/core/discussion.html

Thanks,
Shawn



dataimport handler

2013-05-07 Thread Eric Myers
In the data import handler I have multiple entities.  Each one
generates a date in dataimport.properties, i.e. entityname.last_index_time.

How do I reference the specific entity time in my delta queries?

Thanks

Eric


Re: solr.LatLonType type vs solr.SpatialRecursivePrefixTreeFieldType

2013-05-07 Thread Smiley, David W.
Hi Barani,

This identical question was posed at the same time on StackOverflow, and I
answered it there already:

http://stackoverflow.com/questions/16407110/solr-4-2-solr-latlontype-type-vs-solr-spatialrecursiveprefixtreefieldtype/16409327#16409327

~ David

On 5/6/13 12:28 PM, "bbarani"  wrote:

>Hi,
>
>I am currently using SOLR 4.2 to index geospatial data. I have configured
>my
>geospatial field as below.
>
> <fieldType ... class="solr.LatLonType" subFieldSuffix="_coordinate"/>
>
> <field ... stored="false" multiValued="true"/>
>
>I just want to make sure that I am using the correct SOLR class for
>performing geospatial search, since I am not sure which of the 2
>classes (LatLonType vs SpatialRecursivePrefixTreeFieldType) will be
>supported
>by future versions of SOLR.
>
>I assume latlong is an upgraded version of
>SpatialRecursivePrefixTreeFieldType, can someone please confirm if I am
>right?
>
>Thanks,
>Barani 
>
>
>



Re: ConcurrentUpdateSolrServer "Missing ContentType" error on SOLR 4.2.1

2013-05-07 Thread cleardot
This is resolved: I switched in the 4.2.1 jars and also corrected a mismatch
between the compile and runtime JDKs; for some reason the system was
overriding my JAVA_HOME setting (6.1) and running the client with a 5.0 JVM. 
I did not have to use setParser.

I did try running the 'new' 4.2.1 SolrJ client against SOLR 3.6 and got this
error in the server log:

2013-05-07 16:14:34,835 WARN 
[org.apache.solr.handler.XmlUpdateRequestHandler]
(http-0.0.0.0-18841-Processor15) Unknown attribute doc/field/@update

so I've settled for separate 3.6 and 4.2.1 versions.

Your info helped a lot, thanks Shawn.

DK






Storing and retrieving Objects using ByteField

2013-05-07 Thread zqzuk
Hi

I need to store and retrieve some custom Java objects using Solr, and I have
used ByteField and Java serialisation for this. Using the embedded Jetty
server I can see the byte data, but when I use the SolrJ API to retrieve the
data it is not available. Details are below:

My schema:
--
(field definitions stripped by the mail archiver; the schema declared an "id"
field and a stored "value" field typed as ByteField)

My query using jetty embedded solr server:
--
http://localhost:8983/solr/collection1/select?q=id:1843921115&wt=xml&indent=true
-


And I can see the following results in the browser:
-
<result name="response" numFound="1" start="0">
  <doc>
    <str name="id">1843921115</str>
    <str name="value">rO0ABXNyABNqYXZhLnV0aWwuQXJyYX..blahblahblah</str>
    <long name="_version_">1434407268842995712</long>
  </doc>
</result>
-

So it looks like the data are created properly.

However, when I use SolrJ to retrieve this record like this:

---
 ModifiableSolrParams params = new ModifiableSolrParams();
 params.set("q", "id:1843921115");
 QueryResponse response = server.query(params);
 SolrDocument doc = response.getResults().get(0);
 if (doc.getFieldValues("value") == null)
     System.out.println("data unavailable");
---

I can see that "doc" only have two fields: id and version, and the field
"value" is never available.

Please suggestions what have I done wrong?

Many thanks!
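
One diagnostic worth trying (a sketch under the assumptions above, not a
confirmed fix): request the field explicitly via fl and inspect the runtime
type of what SolrJ hands back, since a stored ByteField may arrive as a raw
byte[] that still needs manual deserialisation:

  import java.io.ByteArrayInputStream;
  import java.io.ObjectInputStream;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.common.SolrDocument;
  import org.apache.solr.common.params.ModifiableSolrParams;

  // 'server' is the existing SolrServer instance from the snippet above.
  ModifiableSolrParams params = new ModifiableSolrParams();
  params.set("q", "id:1843921115");
  params.set("fl", "id,value");                 // ask for the stored field explicitly
  QueryResponse response = server.query(params);
  SolrDocument doc = response.getResults().get(0);
  Object raw = doc.getFieldValue("value");
  System.out.println(raw == null ? "value missing" : raw.getClass());
  if (raw instanceof byte[]) {                  // deserialise the original object
      ObjectInputStream ois =
          new ObjectInputStream(new ByteArrayInputStream((byte[]) raw));
      Object restored = ois.readObject();
  }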





Index compatibility between Solr releases.

2013-05-07 Thread Skand Gupta
We have fairly large (on the order of 10s of TB) indices built using Solr
3.5. We are considering migrating to Solr 4.3 and were wondering what the
policy is on maintaining backward compatibility of the indices. Will 4.3
work with my 3.5 indexes? Because of the large data size, I would ideally
like to move new data to 4.3 and gradually re-index all the 3.5 indices.

Thanks,
- Skand.




Re: Index compatibility between Solr releases.

2013-05-07 Thread Shawn Heisey

On 5/7/2013 3:11 PM, Skand Gupta wrote:

We have fairly large (on the order of 10s of TB) indices built using Solr
3.5. We are considering migrating to Solr 4.3 and were wondering what the
policy is on maintaining backward compatibility of the indices. Will 4.3
work with my 3.5 indexes? Because of the large data size, I would ideally
like to move new data to 4.3 and gradually re-index all the 3.5 indices.


Solr 4.x will read 3.x indexes with no problem.  When Solr 5.x comes 
out, it will read 4.x indexes, but it will not read 3.x indexes.


If the 4.x server does any updates on a 3.x index, it will write new 
segments in the new format, and if existing segments get merged, they 
will be in the new format.


If you do an optimize in that situation, which would take forever with 
terabytes of data, Solr would convert the index format.  Reindexing is 
MUCH better, but you've already stated that as a goal, so I won't 
mention any more about that.


Due to advances and bugfixes, you might see some unusual behavior until 
you reindex.  This happens due to changes in the way analyzers and query 
parsers work as compared to the way things worked on 3.5 when you built 
the index.  The more complicated your analyzer chains are in your 
schema, the more likely you are to run into this.


One thing that might be of immediate concern - in 4.0 and later, the 
forward slash is a special query character and must be escaped with a 
backslash.  It is safe to send this escaped character to 3.5 as well. 
The utility method in SolrJ for escaping queries 
(ClientUtils#escapeQueryChars) has been updated to include the forward 
slash in newer SolrJ versions.


Thanks,
Shawn
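
In SolrJ terms, the escaping Shawn mentions is a single call (the input
string here is hypothetical):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.util.ClientUtils;

  // escapeQueryChars handles the lucene QParser special characters,
  // including '/' in 4.x versions of SolrJ.
  String escaped = ClientUtils.escapeQueryChars("a/b c");  // -> a\/b\ c
  SolrQuery query = new SolrQuery("myField:" + escaped);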



Re: dataimport handler

2013-05-07 Thread Shalin Shekhar Mangar
Using ${dih.<entity name>.last_index_time} should work. Make sure you put
it in quotes in your query.


On Tue, May 7, 2013 at 12:07 PM, Eric Myers  wrote:

> In the  data import handler  I have multiple entities.  Each one
> generates a date in the
> dataimport.properties i.e. entityname.last_index_time.
>
> How do I reference the specific entity time in my delta queries?
>
> Thanks
>
> Eric
>



-- 
Regards,
Shalin Shekhar Mangar.
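
As a sketch, with a hypothetical entity named "item" (the matching key in
dataimport.properties would then be item.last_index_time):

  <entity name="item" pk="id"
          query="SELECT id, name FROM item"
          deltaQuery="SELECT id FROM item
                      WHERE last_modified &gt; '${dih.item.last_index_time}'"/>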


Re: stats cache

2013-05-07 Thread Yonik Seeley
On Tue, May 7, 2013 at 12:48 PM, J Mohamed Zahoor  wrote:
> Hi
>
> I am computing lots of stats as part of a query…
> looks like the Solr caching is not helping here…
>
> Does Solr cache the stats of a query?

No.  Neither the facet counts nor the stats part of a request is cached.  The
query cache only caches the top N docs (plus scores if applicable) for a
given query + filters.

If the whole request is identical, then you can use an HTTP caching
mechanism though.

-Yonik
http://lucidworks.com
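
The HTTP caching Yonik mentions is switched on in the requestDispatcher
section of solrconfig.xml; a sketch based on the stock example (the max-age
value is illustrative):

  <requestDispatcher handleSelect="false">
    <httpCaching lastModifiedFrom="openTime" etagSeed="Solr">
      <cacheControl>max-age=30, public</cacheControl>
    </httpCaching>
  </requestDispatcher>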


Re: Index compatibility between Solr releases.

2013-05-07 Thread Skand S Gupta
Thank you Shawn. This was detailed and very helpful.

Skand. 

On May 7, 2013, at 5:54 PM, Shawn Heisey  wrote:

> On 5/7/2013 3:11 PM, Skand Gupta wrote:
>> We have fairly large (on the order of 10s of TB) indices built using Solr
>> 3.5. We are considering migrating to Solr 4.3 and were wondering what the
>> policy is on maintaining backward compatibility of the indices. Will 4.3
>> work with my 3.5 indexes? Because of the large data size, I would ideally
>> like to move new data to 4.3 and gradually re-index all the 3.5 indices.
> 
> Solr 4.x will read 3.x indexes with no problem.  When Solr 5.x comes out, it 
> will read 4.x indexes, but it will not read 3.x indexes.
> 
> If the 4.x server does any updates on a 3.x index, it will write new segments 
> in the new format, and if existing segments get merged, they will be in the 
> new format.
> 
> If you do an optimize in that situation, which would take forever with 
> terabytes of data, Solr would convert the index format.  Reindexing is MUCH 
> better, but you've already stated that as a goal, so I won't mention any more 
> about that.
> 
> Due to advances and bugfixes, you might see some unusual behavior until you 
> reindex.  This happens due to changes in the way analyzers and query parsers 
> work as compared to the way things worked on 3.5 when you built the index.  
> The more complicated your analyzer chains are in your schema, the more likely 
> you are to run into this.
> 
> One thing that might be of immediate concern - in 4.0 and later, the forward 
> slash is a special query character and must be escaped with a backslash.  It 
> is safe to send this escaped character to 3.5 as well. The utility method in 
> SolrJ for escaping queries (ClientUtils#escapeQueryChars) has been updated to 
> include the forward slash in newer SolrJ versions.
> 
> Thanks,
> Shawn
> 


Index corrupted detection from http get command.

2013-05-07 Thread Michel Dion
Hello,

I'm looking for a way to detect Solr index corruption using an HTTP GET
command. I've looked at the /admin/ping and /admin/luke request handlers, but
I am not sure whether their status provides guarantees that everything is all
right. The idea is to be able to tell a load balancer to take a given Solr
instance out of rotation if its index is corrupted.

Thanks

Michel
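
For the load-balancer half of this, the stock ping handler is the usual hook;
note it only proves the core can execute a query, it is not an index
integrity check (the file name below is the conventional one):

  <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
    <lst name="invariants">
      <str name="q">solrpingquery</str>
    </lst>
    <!-- delete this file on a node to take it out of rotation -->
    <str name="healthcheckFile">server-enabled.txt</str>
  </requestHandler>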


Re: Storing positions and offsets vs FieldType IndexOptions DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS

2013-05-07 Thread KnightRider
Thanks Shawn. I'll reach out to the Lucene discussion group.



-
Thanks
-K'Rider


Re: Questions about the performance of Solr

2013-05-07 Thread joo
Thank you.
However, fq is already in use.
In my opinion, it may be slow because the data for 70 million reviews is
contained in a single core. Do you have examples of a document count beyond
which performance may start to decrease?





Re: Search identifier fields containing blanks

2013-05-07 Thread Chris Hostetter

: I am about to index identifier fields containing blanks (shelfmarks), e.g. G
: 23/60 12
: The field type is set to solr.string. To get the exact matching hit (the doc
: with the shelfmark mentioned above) the user must quote the search term. Is there
: a way to omit the quotes?

whitespace has to be quoted when using the lucene QParser because it's a 
semantically significant character that means "end boolean query clause"

if you want to search for a literal string w/o needing any escaping, use 
the term QParser...

{!term f=yourFieldName}G 23/60 12

Of course, if you are putting this in a URL (ie: testing in a browser) it 
still needs to be URL escaped...

/select?q={!term+f=yourFieldName}G+23/60+12


-Hoss


Re: Unsubscribing from JIRA

2013-05-07 Thread Chris Hostetter

: Email filters? I mean, you may have a point, but the cost of change at
: this moment is probably too high. Personal email filters, on the other
: hand, seems like an easy solution.

The reason for having Jira notifications go to the devs list is that all 
of the comments & discussion in jira are the bulk of the discussion about 
developing Solr/Lucene.  The goal is to make it easy to subscribe to one 
list and then be notified about everything related to the development 
efforts.

As mentioned in a previous comment, the appropriate place to suggest 
policy changes to the dev list would be on the dev list -- but Alex's 
comment is probably what you are going to hear from most people.


-Hoss


Re: Search identifier fields containing blanks

2013-05-07 Thread Upayavira


On Wed, May 8, 2013, at 02:07 AM, Chris Hostetter wrote:
> 
> : I am about to index identifier fields containing blanks (shelfmarks),
> : e.g. G 23/60 12
> : The field type is set to solr.string. To get the exact matching hit
> : (the doc with the shelfmark mentioned above) the user must quote the
> : search term. Is there a way to omit the quotes?
> 
> whitespace has to be quoted when using the lucene QParser because it's a 
> semantically significant character that means "end boolean query clause"
> 
> if you want to search for a literal string w/o needing any escaping, use 
> the term QParser...
> 
>   {!term f=yourFieldName}G 23/60 12
> 
> Of course, if you are putting this in a URL (ie: testing in a browser) it 
> still needs to be URL escaped...
> 
>   /select?q={!term+f=yourFieldName}G+23/60+12

I'm surprised you didn't offer the improvement (a technique I learned
from you..:-) ):

/select?q={!term f=yourFieldName v=$productCode}&productCode=G 23/60 12

which allows you to present the code as a separate request parameter.

Upayavira


Re: Scores dilemma after providing boosting with bq as same weigtage for 2 condition

2013-05-07 Thread nishi
"ab_1eb83ef9bc0896":"
0.17063755 = (MATCH) sum of:
  3.085E-4 = (MATCH) MatchAllDocsQuery, product of:
3.085E-4 = queryNorm
  0.009742409 = (MATCH) product of:
0.019484818 = (MATCH) sum of:
  0.016588148 = (MATCH) sum of:
0.0034696688 = (MATCH) weight(articleTopic:Food^1.2 in 2441)
[DefaultSimilarity], result of:
  0.0034696688 = score(doc=2441,freq=1.0 = termFreq=1.0
), product of:
0.0012905049 = queryWeight, product of:
  1.2 = boost
  2.6886134 = idf(docFreq=52556, maxDocs=284437)
  3.085E-4 = queryNorm
2.6886134 = fieldWeight in 2441, product of:
  1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
  2.6886134 = idf(docFreq=52556, maxDocs=284437)
  1.0 = fieldNorm(doc=2441)
0.013118479 = (MATCH) weight(articleTopic:Office^1.2 in 2441)
[DefaultSimilarity], result of:
  0.013118479 = score(doc=2441,freq=1.0 = termFreq=1.0
), product of:
0.0025093278 = queryWeight, product of:
  1.2 = boost
  5.2278857 = idf(docFreq=4147, maxDocs=284437)
  3.085E-4 = queryNorm
5.2278857 = fieldWeight in 2441, product of:
  1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
  5.2278857 = idf(docFreq=4147, maxDocs=284437)
  1.0 = fieldNorm(doc=2441)
  7.967604E-4 = (MATCH) product of:
0.0051789423 = (MATCH) sum of:
  0.0017619515 = (MATCH) weight(subTopic:Protein in 2441)
[DefaultSimilarity], result of:
0.0017619515 = score(doc=2441,freq=1.0 = termFreq=1.0
), product of:
  4.5981447E-4 = queryWeight, product of:
3.8318748 = idf(docFreq=16753, maxDocs=284437)
1.1999726E-4 = queryNorm
  3.8318748 = fieldWeight in 2441, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
3.8318748 = idf(docFreq=16753, maxDocs=284437)
1.0 = fieldNorm(doc=2441)
  0.0034169909 = (MATCH) weight(subTopic:Printers in 2441)
[DefaultSimilarity], result of:
0.0034169909 = score(doc=2441,freq=1.0 = termFreq=1.0
), product of:
  5.019797E-4 = queryWeight, product of:
0.3 = boost
4.18326 = idf(docFreq=11789, maxDocs=284437)
3.085E-4 = queryNorm
  4.18326 = fieldWeight in 2441, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
4.18326 = idf(docFreq=11789, maxDocs=284437)
1.0 = fieldNorm(doc=2441)
0.5 = coord(3/6)
  0.16049515 = (MATCH)
FunctionQuery(0.08/(3.16E-11*float(ms(const(136779840),date(pubDate)))+0.5)),
product of:
0.16049883 =
0.08/(3.16E-11*float(ms(const(136779840),date(pubDate)=1367847578000))+0.5)
2500.0 = boost
3.085E-4 = queryNorm
",
.
.
.
..

This is the explain description for the score coming up, but it is not in an
easily understandable format... any pointer would be helpful; meantime I am
looking into it to understand more.


