Re: Facet and totaltermfreq

2012-05-05 Thread Dmitry Kan
The query:

http://localhost:8983/solr/select/?qt=tvrh&q=query:the&tv.fl=query&tv.all=true&f.id.tv.tf=true&facet.field=id&facet=true&facet.limit=-1&facet.mincount=1

Be careful with facet.limit=-1: it will pull everything matching the query,
so paging would probably make more sense in your case.
f.id.tv.tf=true makes sure term frequencies are output.

In my schema the field "query" contains some "text" data to search against
(don't get mixed up ;) ). You will also need termVectors="true"
termPositions="true" termOffsets="true" for your searching field.
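For reference, the relevant bits of configuration look roughly like this (a
sketch based on the stock example configs; adjust the field name and type to
your own schema):

  <!-- schema.xml: the searched field must carry full term vector data -->
  <field name="query" type="text" indexed="true" stored="true"
         termVectors="true" termPositions="true" termOffsets="true"/>

  <!-- solrconfig.xml: the qt=tvrh handler with the TermVectorComponent wired in -->
  <searchComponent name="tvComponent" class="solr.TermVectorComponent"/>
  <requestHandler name="tvrh" class="solr.SearchHandler">
    <lst name="defaults">
      <bool name="tv">true</bool>
    </lst>
    <arr name="last-components">
      <str>tvComponent</str>
    </arr>
  </requestHandler>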

Relevant excerpts from the response:



<doc>
  <str name="id">721</str>
  <str name="query">the doors "the end"</str>
  <date name="...">2002-08-09T04:05:28Z</date>
  <str name="...">wifcbcdr</str>
</doc>
...

<lst name="facet_counts">
  <lst name="facet_fields">
    <lst name="id">
      <int name="721">1</int>
      ...
    </lst>
  </lst>
</lst>

<lst name="termVectors">
  <lst name="doc-...">
    <str name="uniqueKey">721</str>
    <lst name="query">
      <lst name="doors">
        <int name="tf">1</int>
        <lst name="offsets">
          <int name="start">4</int>
          <int name="end">9</int>
        </lst>
        <lst name="positions">
          <int name="position">1</int>
        </lst>
        <int name="df">37</int>
        <double name="tf-idf">0.02702702702702703</double>
      </lst>
      <lst name="end">
        <int name="tf">1</int>
        <lst name="offsets">
          <int name="start">15</int>
          <int name="end">18</int>
        </lst>
        <lst name="positions">
          <int name="position">3</int>
        </lst>
        <int name="df">43</int>
        <double name="tf-idf">0.023255813953488372</double>
      </lst>
      <lst name="the">
        <int name="tf">2</int>
        <lst name="offsets">
          <int name="start">0</int>
          <int name="end">3</int>
          <int name="start">11</int>
          <int name="end">14</int>
        </lst>
        <lst name="positions">
          <int name="position">0</int>
          <int name="position">2</int>
        </lst>
        <int name="df">787</int>
        <double name="tf-idf">0.0025412960609911056</double>
      </lst>
    </lst>
  </lst>
</lst>
...


-Dmitry

On Sat, May 5, 2012 at 12:05 AM, Jamie Johnson  wrote:

> it might be...can you provide an example of the request/response?
>
> On Fri, May 4, 2012 at 3:31 PM, Dmitry Kan  wrote:
> > I have tried (as a test) combining facets and term vectors (
> > http://wiki.apache.org/solr/TermVectorComponent ) in one query and was
> able
> > to get a list of facets and for each facet there was a term freq under
> > termVectors section. Not sure, if that's what you are trying to achieve.
> >
> > -Dmitry
> >
> > On Fri, May 4, 2012 at 8:37 PM, Jamie Johnson  wrote:
> >
> >> Is it possible when faceting to return not only the strings but also
> >> the total term frequency for those facets?  I am trying to avoid
> >> building a customized faceting component and making multiple queries.
> >> In our scenario we have multivalued fields which may have duplicates
> >> and I would like to be able to get a count of how many documents that
> >> term appears (currently what faceting does) but also how many times
> >> that term appears in general.
> >>
> >
> >
> >
> > --
> > Regards,
> >
> > Dmitry Kan
>



-- 
Regards,

Dmitry Kan


Re: >1MB file to Zookeeper

2012-05-05 Thread Jan Høydahl
ZK is not really designed for keeping large data files, from 
http://zookeeper.apache.org/doc/current/zookeeperProgrammers.html#Data+Access:
> ZooKeeper was not designed to be a general database or large object 
> store. If large data storage is needed, the usual pattern of dealing 
> with such data is to store it on a bulk storage system, such as NFS or HDFS, 
> and store pointers to the storage locations in ZooKeeper.

So perhaps we should think about adding K/V store support to ResourceLoader? If
a file is >1MB, a reference to the file is stored in ZK under the original
resource name, in a way that lets ResourceLoader tell that it is a reference,
not the complete file. We then make a simple LargeObjectStoreInterface (with
get/put/del) which ResourceLoader uses to fetch the complete file based on the
reference. To start with we can make a ZkLargeFileStoreImpl where the
put(key,val) method chops the file up and stores it across multiple <1MB ZK
nodes, and the get(key) method reassembles the parts and returns the object.
That would be good enough for most users, but if you need something better you
can easily implement support for CouchDB, Voldemort or whatever.
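To make that concrete, a rough sketch of the ZkLargeFileStoreImpl idea follows
(hypothetical code, nothing like this exists in Solr today; error handling,
deletes and the ResourceLoader hookup are left out):

  import java.io.ByteArrayOutputStream;
  import java.util.Collections;
  import java.util.List;
  import org.apache.zookeeper.CreateMode;
  import org.apache.zookeeper.ZooDefs;
  import org.apache.zookeeper.ZooKeeper;

  /** Hypothetical store that spreads one large resource across several ZK nodes. */
  public class ZkLargeFileStore {
    private static final int CHUNK = 900 * 1024; // stay safely under the 1MB znode limit
    private final ZooKeeper zk;

    public ZkLargeFileStore(ZooKeeper zk) { this.zk = zk; }

    /** put: write the value as <key>/part-0000, <key>/part-0001, ... */
    public void put(String key, byte[] value) throws Exception {
      zk.create(key, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
      for (int off = 0, part = 0; off < value.length; off += CHUNK, part++) {
        int len = Math.min(CHUNK, value.length - off);
        byte[] chunk = new byte[len];
        System.arraycopy(value, off, chunk, 0, len);
        zk.create(String.format("%s/part-%04d", key, part), chunk,
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
      }
    }

    /** get: read the parts back in order and reassemble the original bytes. */
    public byte[] get(String key) throws Exception {
      List<String> parts = zk.getChildren(key, false);
      Collections.sort(parts); // part-0000, part-0001, ... sort lexicographically
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      for (String part : parts) {
        out.write(zk.getData(key + "/" + part, false, null));
      }
      return out.toByteArray();
    }
  }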

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 4. mai 2012, at 19:09, Yonik Seeley wrote:

> On Fri, May 4, 2012 at 12:50 PM, Mark Miller  wrote:
>>> And how should we detect if data is compressed when
>>> reading from ZooKeeper?
>> 
>> I was thinking we could somehow use file extensions?
>> 
>> eg synonyms.txt.gzip - then you can use different compression algs depending 
>> on the ext, etc.
>> 
>> We would want to try and make it as transparent as possible though...
> 
> At first I thought about adding a marker to the beginning of a file, but
> file extensions could work too, as long as the resource loader made it
> transparent
> (i.e. code would just need to ask for synonyms.txt, but the resource
> loader would search
> for synonyms.txt.gzip, etc, if the original name was not found)
> 
> Hmmm, but this breaks down for things like watches - I guess that's
> where putting the encoding inside the file would be a better option.
> 
> -Yonik
> lucenerevolution.com - Lucene/Solr Open Source Search Conference.
> Boston May 7-10



Re: >1MB file to Zookeeper

2012-05-05 Thread Yonik Seeley
On Sat, May 5, 2012 at 8:39 AM, Jan Høydahl  wrote:
> support for CouchDb, Voldemort or whatever.

Hmmm... Or Solr!

-Yonik


solr: adding a string on to a field via DIH

2012-05-05 Thread okayndc
Hello,

Is it possible to concatenate a field via DIH?
For example for the id field, in order to make it unique
I want to add 'project' to the beginning of the id field.
So the field would look like 'project1234'
Is this possible?



Thanks


Re: solr: adding a string on to a field via DIH

2012-05-05 Thread Michael Della Bitta
There might be a Solr way of accomplishing this, but I've always done
stuff like this in SQL (i.e. the CONCAT function). Doing it a Solr-native
way would probably be better in terms of bandwidth consumption, but I'm just
giving you that option early in case there's not a better one.
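For example, the prefixing could be pushed into the DIH entity query itself,
something like this (a sketch; the table and column names are made up):

  <entity name="item"
          query="SELECT CONCAT('project', id) AS id, name, description FROM items"/>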

Michael

On Sat, 2012-05-05 at 09:12 -0400, okayndc wrote:
> Hello,
> 
> Is it possible to concatenate a field via DIH?
> For example for the id field, in order to make it unique
> I want to add 'project' to the beginning of the id field.
> So the field would look like 'project1234'
> Is this possible?
> 
> 
> 
> Thanks




Re: solr: adding a string on to a field via DIH

2012-05-05 Thread Jack Krupansky
Sounds like you need a "Template Transformer": "... it helps to concatenate 
multiple values or add extra characters to field for injection."




<entity name="e" transformer="TemplateTransformer" ...>
  <field column="namedesc" template="hello${e.name},${eparent.surname}" />
  ...
</entity>


See:
http://wiki.apache.org/solr/DataImportHandler#TemplateTransformer
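Applied to your id example, it would presumably look something like this
(untested sketch; the entity name is made up):

  <entity name="e" transformer="TemplateTransformer" ...>
    <field column="id" template="project${e.id}"/>
  </entity>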

Or did you have something different in mind?

-- Jack Krupansky

-Original Message- 
From: okayndc

Sent: Saturday, May 05, 2012 9:12 AM
To: solr-user@lucene.apache.org
Subject: solr: adding a string on to a field via DIH

Hello,

Is it possible to concatenate a field via DIH?
For example for the id field, in order to make it unique
I want to add 'project' to the beginning of the id field.
So the field would look like 'project1234'
Is this possible?



Thanks 



Re: solr: adding a string on to a field via DIH

2012-05-05 Thread okayndc
Thanks guys.  I had taken a quick look at
the Template Transformer and it looks like it does
what I need it to do. I didn't see the 'hello' part
when reviewing earlier.

On Sat, May 5, 2012 at 11:47 AM, Jack Krupansky wrote:

> Sounds like you need a "Template Transformer": "... it helps to
> concatenate multiple values or add extra characters to field for injection."
>
> <entity name="e" transformer="TemplateTransformer" ...>
>   <field column="namedesc" template="hello${e.name},${eparent.surname}" />
>   ...
> </entity>
>
> See:
> http://wiki.apache.org/solr/DataImportHandler#TemplateTransformer
>
> Or did you have something different in mind?
>
> -- Jack Krupansky
>
> -Original Message- From: okayndc
> Sent: Saturday, May 05, 2012 9:12 AM
> To: solr-user@lucene.apache.org
> Subject: solr: adding a string on to a field via DIH
>
>
> Hello,
>
> Is it possible to concatenate a field via DIH?
> For example for the id field, in order to make it unique
> I want to add 'project' to the beginning of the id field.
> So the field would look like 'project1234'
> Is this possible?
>
> 
>
> Thanks
>


Re: Solr Merge during off peak times

2012-05-05 Thread Shawn Heisey

On 5/4/2012 8:10 PM, Lance Norskog wrote:

Optimize takes a 'maxSegments' option. This tells it to stop when
there are N segments instead of just one.

If you use a very high mergeFactor and then call optimize with a sane
number like 50, it only merges the little teeny segments.


When I optimize, I want only one segment.  My main concern in doing 
occasional optimizes is removing deleted documents.  Whatever speedup I 
get from having only one segment is just a nice bonus.
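(For reference, the kind of partial optimize Lance describes is just an optimize
message with the maxSegments attribute posted to the update handler, e.g.
<optimize maxSegments="50"/>, with whatever segment count you pick.)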


When it comes to only merging the small segments, I am concerned about 
that happening when regular indexing builds up enough segments to do a 
merge.  If I start with one large optimized segment, then do indexing 
operations such that I reach segmentsPerTier, will it leave the large 
segment alone and just work on the little ones?  I am using Solr 3.5 
with the following config:



<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">35</int>
  <int name="segmentsPerTier">35</int>
  <int name="maxMergeAtOnceExplicit">105</int>
</mergePolicy>


Thanks,
Shawn



Re: >1MB file to Zookeeper

2012-05-05 Thread Mark Miller

On May 5, 2012, at 8:39 AM, Jan Høydahl wrote:

> ZK is not really designed for keeping large data files, 
> from http://zookeeper.apache.org/doc/current/zookeeperProgrammers.html#Data+Access:
>> ZooKeeper was not designed to be a general database or large object 
>> store. If large data storage is needed, the usual pattern of dealing 
>> with such data is to store it on a bulk storage system, such as NFS or HDFS, 
>> and store pointers to the storage locations in ZooKeeper.
> 

I don't really think it's that big a deal where Solr is concerned. We don't use 
ZooKeeper intensively. Only for state changes, and initially loading config for 
SolrCore starts or reloads. ZooKeeper perf should not be a big deal for Solr. I 
think the main issue is going to be that zk wants everything in ram - but 
again, a few conf files of a couple MB each should be no big deal AFAICT.

Other large scale applications that constantly and intensively use zookeeper 
are a different story - when Solr is in its running state, it doesn't do 
anything with ZooKeeper other than maintain its heartbeat.

- Mark Miller
lucidimagination.com


Re: Understanding RecoveryStrategy

2012-05-05 Thread Mark Miller

On May 5, 2012, at 2:37 AM, Trym R. Møller wrote:

> Hi
> 
> Using Solr trunk with the replica feature, I see the below exception 
> repeatedly in the Solr log.
> I have been looking into the code of RecoveryStrategy#commitOnLeader and read 
> the code as follows:
> 1. sends a commit request (with COMMIT_END_POINT=true) to the Solr instance 
> containing the leader of the slice
> 2. sends a commit request to the Solr instance containing the leader of the 
> slice
> The first results in a commit on the shards in the single leader Solr 
> instance and the second results in a commit on the shards in the single 
> leader Solr plus on all other Solrs having slices or replica belonging to the 
> collection.
> 
> I would expect that the first request is the relevant (and enough to do a 
> recovery of the specific replica).
> Am I reading the second request wrong or is it a bug?

You're right - that second server.commit() looks like a bug - I don't think it 
should be harmful - it would just send out a commit to the cluster - but it 
should not be there.

I'm not sure why that second commit is timing out though. I guess it may be 
that reopening the searcher so quickly twice in a row is taking longer than the 
30 second timeout. I guess we have to consider if we even need that commit to 
open a new searcher - but either way it should be a lot better once we remove 
that second commit.
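In other words, after the fix commitOnLeader should presumably boil down to just
the first request from your excerpt, roughly:

   UpdateRequest ureq = new UpdateRequest();
   ureq.setParams(new ModifiableSolrParams());
   ureq.getParams().set(DistributedUpdateProcessor.COMMIT_END_POINT, true);
   ureq.getParams().set(RecoveryStrategy.class.getName(), baseUrl);
   // commit and open a new searcher on the leader only; the trailing
   // server.commit() that fans out to the whole cluster goes away
   ureq.setAction(AbstractUpdateRequest.ACTION.COMMIT, false, true).process(server);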

> 
> The code I'm referring to is
>UpdateRequest ureq = new UpdateRequest();
>ureq.setParams(new ModifiableSolrParams());
>ureq.getParams().set(DistributedUpdateProcessor.COMMIT_END_POINT, true);
>ureq.getParams().set(RecoveryStrategy.class.getName(), baseUrl);
> 1.ureq.setAction(AbstractUpdateRequest.ACTION.COMMIT, false, 
> true).process(
>server);
> 2.server.commit();
> 
> Thanks in advance for any input.
> 
> Best regards Trym R. Møller
> 
> Apr 21, 2012 10:14:11 AM org.apache.solr.common.SolrException log
> SEVERE: Error while trying to 
> recover:org.apache.solr.client.solrj.SolrServerException: 
> http://myIP:8983/solr/myShardId
>at 
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:493)
>at 
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:264)
>at 
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:103)
>at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:180)
>at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:156)
>at 
> org.apache.solr.cloud.RecoveryStrategy.commitOnLeader(RecoveryStrategy.java:170)
>at 
> org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:120)
>at 
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:341)
>at 
> org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:206)
> Caused by: java.net.SocketTimeoutException: Read timed out
>at java.net.SocketInputStream.socketRead0(Native Method)
>at java.net.SocketInputStream.read(SocketInputStream.java:129)
>at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>at 
> org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
>at 
> org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
>at 
> org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
>at 
> org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
>at 
> org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
>at 
> org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
>at 
> org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
>at 
> org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
>at 
> org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
>at 
> org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
>at 
> org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
>at 
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:440)
>... 8 more

- Mark Miller
lucidimagination.com


Re: Understanding RecoveryStrategy

2012-05-05 Thread Mark Miller
https://issues.apache.org/jira/browse/SOLR-3437

On May 5, 2012, at 12:46 PM, Mark Miller wrote:

> 
> On May 5, 2012, at 2:37 AM, Trym R. Møller wrote:
> 
>> Hi
>> 
>> Using Solr trunk with the replica feature, I see the below exception 
>> repeatedly in the Solr log.
>> I have been looking into the code of RecoveryStrategy#commitOnLeader and 
>> read the code as follows:
>> 1. sends a commit request (with COMMIT_END_POINT=true) to the Solr instance 
>> containing the leader of the slice
>> 2. sends a commit request to the Solr instance containing the leader of the 
>> slice
>> The first results in a commit on the shards in the single leader Solr 
>> instance and the second results in a commit on the shards in the single 
>> leader Solr plus on all other Solrs having slices or replica belonging to 
>> the collection.
>> 
>> I would expect that the first request is the relevant (and enough to do a 
>> recovery of the specific replica).
>> Am I reading the second request wrong or is it a bug?
> 
> You're right - that second server.commit() looks like a bug - I don't think it 
> should be harmful - it would just send out a commit to the cluster - but it 
> should not be there.
> 
> I'm not sure why that second commit is timing out though. I guess it may be 
> that reopening the searcher so quickly twice in a row is taking longer than 
> the 30 second timeout. I guess we have to consider if we even need that 
> commit to open a new searcher - but either way it should be a lot better once 
> we remove that second commit.
> 
>> 
>> The code I'm referring to is
>>   UpdateRequest ureq = new UpdateRequest();
>>   ureq.setParams(new ModifiableSolrParams());
>>   ureq.getParams().set(DistributedUpdateProcessor.COMMIT_END_POINT, true);
>>   ureq.getParams().set(RecoveryStrategy.class.getName(), baseUrl);
>> 1.ureq.setAction(AbstractUpdateRequest.ACTION.COMMIT, false, 
>> true).process(
>>   server);
>> 2.server.commit();
>> 
>> Thanks in advance for any input.
>> 
>> Best regards Trym R. Møller
>> 
>> Apr 21, 2012 10:14:11 AM org.apache.solr.common.SolrException log
>> SEVERE: Error while trying to 
>> recover:org.apache.solr.client.solrj.SolrServerException: 
>> http://myIP:8983/solr/myShardId
>>   at 
>> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:493)
>>   at 
>> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:264)
>>   at 
>> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:103)
>>   at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:180)
>>   at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:156)
>>   at 
>> org.apache.solr.cloud.RecoveryStrategy.commitOnLeader(RecoveryStrategy.java:170)
>>   at 
>> org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:120)
>>   at 
>> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:341)
>>   at 
>> org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:206)
>> Caused by: java.net.SocketTimeoutException: Read timed out
>>   at java.net.SocketInputStream.socketRead0(Native Method)
>>   at java.net.SocketInputStream.read(SocketInputStream.java:129)
>>   at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>>   at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>>   at 
>> org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
>>   at 
>> org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
>>   at 
>> org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
>>   at 
>> org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
>>   at 
>> org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
>>   at 
>> org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
>>   at 
>> org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
>>   at 
>> org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
>>   at 
>> org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
>>   at 
>> org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
>>   at 
>> org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
>>   at 
>> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:440)
>>   ... 8 more
> 
> - Mark Miller
> lucidimagination.com
> 

- Mark Miller
lucidimagination.com


Re: Invalid version (expected 2, but 60) on CentOS in production please Help!!!

2012-05-05 Thread Erick Erickson
The first thing I'd check is if, in the log, there is a replication happening
immediately prior to the error. I confess I'm not entirely up on the
version thing, but is it possible you're replicating an index that
is built with some other version of Solr?

That would at least explain your statement that it runs OK, but then
fails sometime later.

Best
Erick

On Fri, May 4, 2012 at 1:50 PM, Ravi Solr  wrote:
> Hello,
>         We recently migrated our SOLR 3.6 server OS from Solaris
> to CentOS and from then on we started seeing "Invalid version
> (expected 2, but 60)" errors on one of the query servers (oddly one
> other query server seems fine). If we restart the server having issue
> everything will be alright, but the next day in the morning again we
> get the same exception. I made sure that all the client applications
> are using SOLR 3.6 version.
>
> The Glassfish on which all the applications  and SOLR are deployed use
> Java  1.6.0_29. The only difference I could see
>
> 1. The process indexing to the server having issues is using java1.6.0_31
> 2. The process indexing to the server that DOES NOT have issues is
> using java1.6.0_29
>
> Could the Java minor version being greater than the SOLR instance be
> the cause of this issue  ???
>
> Can anybody please help me debug this a bit more ? what else can I
> look at to understand the underlying problem. The stack trace is given
> below
>
>
> [#|2012-05-04T09:58:43.985-0400|SEVERE|sun-appserver2.1.1|xxx...|_ThreadID=32;_ThreadName=httpSSLWorkerThread-9001-7;_RequestID=a19f92cc-2a8c-47e8-b159-a20330f14af5;
> org.apache.solr.client.solrj.SolrServerException: Error executing query
>        at 
> org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
>        at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:311)
>        at 
> com.wpost.ipad.feeds.FeedController.findLinksetNewsBySection(FeedController.java:743)
>        at 
> com.wpost.ipad.feeds.FeedController.findNewsBySection(FeedController.java:347)
>        at sun.reflect.GeneratedMethodAccessor282.invoke(Unknown Source)
>        at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at 
> org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod(HandlerMethodInvoker.java:175)
>        at 
> org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod(AnnotationMethodHandlerAdapter.java:421)
>        at 
> org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle(AnnotationMethodHandlerAdapter.java:409)
>        at 
> org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:774)
>        at 
> org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:719)
>        at 
> org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:644)
>        at 
> org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:549)
>        at javax.servlet.http.HttpServlet.service(HttpServlet.java:734)
>        at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
>        at 
> org.apache.catalina.core.ApplicationFilterChain.servletService(ApplicationFilterChain.java:427)
>        at 
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:315)
>        at 
> org.apache.catalina.core.StandardContextValve.invokeInternal(StandardContextValve.java:287)
>        at 
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:218)
>        at 
> org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648)
>        at 
> org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593)
>        at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:94)
>        at 
> com.sun.enterprise.web.PESessionLockingStandardPipeline.invoke(PESessionLockingStandardPipeline.java:98)
>        at 
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:222)
>        at 
> org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648)
>        at 
> org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593)
>        at 
> org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:587)
>        at 
> org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:1093)
>        at 
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:166)
>        at 
> org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648)
>        at 
> org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593)
>        at 
> org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:587)
>        at 
> org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:1093)
>        at 
> org.a

Re: Single Index to Shards

2012-05-05 Thread Erick Erickson
Oh, isn't that easier! Need more coffee before suggesting things..

Thanks,
Erick

On Fri, May 4, 2012 at 8:16 PM, Lance Norskog  wrote:
> If you are not using SolrCloud, splitting an index is simple:
> 1) copy the index
> 2) remove what you do not want via "delete-by-query"
> 3) Optimize!
>
> #2 brings up a basic design question: you have to decide which
> documents go to which shards. Mostly people use a value generated by a
> hash on the actual id- this allows you to assign docs evenly.
>
> http://wiki.apache.org/solr/UniqueKey
>
> On Fri, May 4, 2012 at 4:28 PM, Young, Cody  wrote:
>> You can also make a copy of your existing index, bring it up as a second 
>> instance/core and then send delete queries to both indexes.
>>
>> -Original Message-
>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>> Sent: Friday, May 04, 2012 8:37 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Single Index to Shards
>>
>> There's no way to split an _existing_ index into multiple shards, although 
>> some of the work on SolrCloud is considering being able to do this. You have 
>> a couple of choices here:
>>
>> 1> Just reindex everything from scratch into two shards
>> 2> delete all the docs from your index that will go into shard 2 and just
>>     index the docs for shard 2 in your new shard
>>
>> But I want to be sure you're on the right track here. You only need to shard 
>> if your index contains "too many" documents for your hardware to produce 
>> decent query rates. If you are getting (and I'm picking this number out of 
>> thin air) 50 QPS on your hardware (i.e. you're not stressing memory
>> etc) and just want to get to 150 QPS, use replication rather than sharding.
>>
>> see: http://wiki.apache.org/solr/SolrReplication
>>
>> Best
>> Erick
>>
>> On Fri, May 4, 2012 at 9:44 AM, michaelsever  wrote:
>>> If I have a single Solr index running on a Core, can I split it or
>>> migrate it into 2 shards?
>>>
>>> --
>>> View this message in context:
>>> http://lucene.472066.n3.nabble.com/Single-Index-to-Shards-tp3962380.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
>
> --
> Lance Norskog
> goks...@gmail.com


Re: Single Index to Shards

2012-05-05 Thread Lance Norskog
We did it at my last job. It took a few days to split a 500M-doc index.

On Sat, May 5, 2012 at 9:55 AM, Erick Erickson  wrote:
> Oh, isn't that easier! Need more coffee before suggesting things..
>
> Thanks,
> Erick
>
> On Fri, May 4, 2012 at 8:16 PM, Lance Norskog  wrote:
>> If you are not using SolrCloud, splitting an index is simple:
>> 1) copy the index
>> 2) remove what you do not want via "delete-by-query"
>> 3) Optimize!
>>
>> #2 brings up a basic design question: you have to decide which
>> documents go to which shards. Mostly people use a value generated by a
>> hash on the actual id- this allows you to assign docs evenly.
>>
>> http://wiki.apache.org/solr/UniqueKey
>>
>> On Fri, May 4, 2012 at 4:28 PM, Young, Cody  wrote:
>>> You can also make a copy of your existing index, bring it up as a second 
>>> instance/core and then send delete queries to both indexes.
>>>
>>> -Original Message-
>>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>>> Sent: Friday, May 04, 2012 8:37 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Single Index to Shards
>>>
>>> There's no way to split an _existing_ index into multiple shards, although 
>>> some of the work on SolrCloud is considering being able to do this. You 
>>> have a couple of choices here:
>>>
>>> 1> Just reindex everything from scratch into two shards
>>> 2> delete all the docs from your index that will go into shard 2 and just
>>>     index the docs for shard 2 in your new shard
>>>
>>> But I want to be sure you're on the right track here. You only need to 
>>> shard if your index contains "too many" documents for your hardware to 
>>> produce decent query rates. If you are getting (and I'm picking this number 
>>> out of thin air) 50 QPS on your hardware (i.e. you're not stressing memory
>>> etc) and just want to get to 150 QPS, use replication rather than sharding.
>>>
>>> see: http://wiki.apache.org/solr/SolrReplication
>>>
>>> Best
>>> Erick
>>>
>>> On Fri, May 4, 2012 at 9:44 AM, michaelsever  wrote:
 If I have a single Solr index running on a Core, can I split it or
 migrate it into 2 shards?

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Single-Index-to-Shards-tp3962380.html
 Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com



-- 
Lance Norskog
goks...@gmail.com


Re: Why would solr norms come up different from Lucene norms?

2012-05-05 Thread Lance Norskog
Which Similarity class do you use for the Lucene code? Solr has a custom one.

On Fri, May 4, 2012 at 6:30 AM, Benson Margulies  wrote:
> So, I've got some code that stores the same documents in a Lucene
> 3.5.0 index and a Solr 3.5.0 instance. It's only five documents.
>
> For a particular field, the Solr norm is always 0.625, while the
> Lucene norm is .5.
>
> I've watched the code in NormsWriterPerField in both cases.
>
> In Solr we've got .577, in naked Lucene it's .5.
>
> I tried to check for boosts, and I don't see any non-1.0 document or
> field boosts.
>
> The Solr field is:
>
> <field name="..." type="..."
>  stored="true" multiValued="false" />



-- 
Lance Norskog
goks...@gmail.com


Re: problem with date searching.

2012-05-05 Thread Lance Norskog
Use debugQuery=true to see exactly how the dismax parser sees this query.

Also, since this is a binary (match/no-match) condition rather than something
that needs scoring, you can use filter queries instead. Those use the Lucene
syntax.
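Something along these lines, using the field and dates from the thread below
(keep whatever your dismax q is and move the range into fq):

q=<your dismax terms>&fq=scanneddate:[2011-09-22T22:40:30Z TO 2012-02-02T01:30:52Z]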

On Fri, May 4, 2012 at 8:14 AM, Erick Erickson  wrote:
> Right, you need to do the explicit qualification of the date field.
> dismax parsing is intended to work with text-type fields, not
> numeric or date fields. If you attach &debugQuery=on, you'll
> see that your "scanneddate" field is just dropped.
>
> Furthermore, dismax was never intended to work with range
> queries. Note this from the DisMaxQParserPlugin page:
>
> " extremely simplified subset of the Lucene QueryParser syntax"
>
> I'll expand on this a bit on the Wiki page.
>
>
> Best
> Erick
>
> On Fri, May 4, 2012 at 6:45 AM, Dmitry Kan  wrote:
>> Unless something else is wrong, my question would be: do you have the
>> documents in Solr stamped with these dates?
>> also could try for a test specifying the field name directly:
>>
>> q=scanneddate:["2011-09-22T22:40:30Z" TO "2012-02-02T01:30:52Z"]
>>
>> also, in your first e-mail you said you have used
>>
>> [*"2012-02-02T01:30:52Z" TO "2012-02-02T01:30:52Z"*]
>>
>> with asterisks *, what scanneddate values did you then get?
>>
>> On Fri, May 4, 2012 at 1:37 PM, ayyappan  wrote:
>>
>>> Thanks for the quick response.
>>>
>>>  I tried your advice with ["2011-09-22T22:40:30Z" TO "2012-02-02T01:30:52Z"],
>>> but even so I am not getting any results.
>>>
>>> --
>>> View this message in context:
>>> http://lucene.472066.n3.nabble.com/problem-with-date-searching-tp3961761p3961833.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>
>>
>>
>> --
>> Regards,
>>
>> Dmitry Kan



-- 
Lance Norskog
goks...@gmail.com


Re: Why would solr norms come up different from Lucene norms?

2012-05-05 Thread Benson Margulies
On Sat, May 5, 2012 at 7:59 PM, Lance Norskog  wrote:
> Which Similarity class do you use for the Lucene code? Solr has a custom one.

I am embarrassed to report that I also have a custom similarity that I
didn't know about, and once I configured that into Solr all was well.
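(For anyone hitting the same thing later: that is the global similarity setting
near the bottom of schema.xml, e.g.

  <similarity class="com.example.MyCustomSimilarity"/>

where the class name is whatever your Lucene code already uses.)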


>
> On Fri, May 4, 2012 at 6:30 AM, Benson Margulies  
> wrote:
>> So, I've got some code that stores the same documents in a Lucene
>> 3.5.0 index and a Solr 3.5.0 instance. It's only five documents.
>>
>> For a particular field, the Solr norm is always 0.625, while the
>> Lucene norm is .5.
>>
>> I've watched the code in NormsWriterPerField in both cases.
>>
>> In Solr we've got .577, in naked Lucene it's .5.
>>
>> I tried to check for boosts, and I don't see any non-1.0 document or
>> field boosts.
>>
>> The Solr field is:
>>
>> <field name="..." type="..."
>>  stored="true" multiValued="false" />
>
>
>
> --
> Lance Norskog
> goks...@gmail.com