Re: sort by function

2010-05-16 Thread MitchK

Can you please do some math to show the principle?

Do you want to do something like this:
finalScore = score * rank
or like this:
finalScore = rank
?

If the first is the case, then it is done by default (have a look at the
wiki example for making more recent documents more relevant).
If the second is the case, then I would say you need a new sort function
(I have never implemented something like that).

Hope this helps
- Mitch
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/sort-by-function-tp814380p821239.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: sort by function

2010-05-16 Thread MitchK

Forget what I said about the second case.
The second case is a simple sort on your field. 
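
To make the two cases from this thread concrete, here is a minimal sketch of the arithmetic only. It does not call Solr; the documents, the `score` values and the `rank` field values are made-up assumptions for illustration.

```python
# Hypothetical documents: "score" stands for the relevance score of a query,
# "rank" for a stored numeric field. All values are invented.
docs = [
    {"id": "a", "score": 5.0, "rank": 1.0},
    {"id": "b", "score": 1.0, "rank": 3.0},
    {"id": "c", "score": 1.2, "rank": 2.0},
]


def order_by_boosted_score(documents):
    """Case 1: finalScore = score * rank, highest first."""
    ranked = sorted(documents, key=lambda d: d["score"] * d["rank"], reverse=True)
    return [d["id"] for d in ranked]


def order_by_rank(documents):
    """Case 2: finalScore = rank, i.e. a plain descending sort on the field."""
    ranked = sorted(documents, key=lambda d: d["rank"], reverse=True)
    return [d["id"] for d in ranked]
```

With these invented values the two cases give different orderings, which is why it matters which one is wanted: in the first the relevance score still contributes, in the second it is ignored entirely.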


Re: Connection Pool

2010-05-16 Thread Monmohan Singh
Sorry for hijacking the thread, but I have an additional question.
Is there a way to achieve similar (SUSS-like) performance when targeting
the extract request handler (/update/extract)?
I guess one way would be to extract the content on the client side and then use
SUSS to send the update request, but then extraction needs to be taken care of
locally in an asynchronous/batch manner.
Regards
Monmohan

On Sun, May 16, 2010 at 5:19 AM, Lance Norskog  wrote:

> Connection pooling is specified by the underlying Apache Commons
> connection manager when you create the Server.
>
> The SUSS does socket pooling by default and is the preferred way to do
> concurrent indexing. There are some quirks in the Server
> implementation set, and SUSS avoids them. Unless you are willing to
> root around in the SolrJ Server code and understand exactly how it
> works, stay with the SUSS.
>
> On Fri, May 14, 2010 at 6:44 AM, gabriele renzi  wrote:
> > On Fri, May 14, 2010 at 3:35 PM, Anderson vasconcelos
> >  wrote:
> >> Hi
> >> I want to know if there is any connection pool client to manage the
> >> connections with Solr. In my system, we have a lot of concurrent index
> >> requests. I can't share my connection; I need to create one per
> >> transaction. But if I create one per transaction, I think the
> >> performance will go down.
> >>
> >> How do you resolve this problem?
> >
> > The CommonsHttpSolrServer class does connection pooling, and IIRC also
> > the StreamingUpdateSolrServer.
> >
> >
> >
> > --
> > blog en: http://www.riffraff.info
> > blog it: http://riffraff.blogsome.com
> >
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>


match to non tokenizable word ("helloworld")

2010-05-16 Thread siping liu

I get no match when searching for "helloworld", even though I have "hello 
world" in my index. How do people usually deal with this? Write a custom 
analyzer, with help from a collection of all dictionary words?

 

thanks for suggestions/comments.
  

Re: bi-directional replication on solr 1.4?

2010-05-16 Thread Mark Miller

On 5/14/10 8:08 PM, Chris Hostetter wrote:


> : It looks like SnapPuller.java doesn't allow for the possibility of the
> : slave having a later index version than the master. It only checks
> : whether the versions are equal.
> :
> : It's easy enough to add that check and prevent the index fetch when
> : the slave has a later version (in fact I'm running it in a sandbox
>
> I'm not 100% positive, but I believe a change like that could cause
> problems if the index on the master is completely rebuilt from scratch.
>
> indexVersion is guaranteed to increase as the index is modified (i.e. add
> or merge segments), but I think an entirely new index (i.e. delete the
> entire index directory, as deleteByQuery("*:*") does, and then reindex) could
> conceivably result in a new index with a lower indexVersion number than
> the index it replaces.


I think you should be good because the version starts as the current 
time in milliseconds? (and then is incremented by 1 on every commit 
thereafter)
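
A minimal sketch of that reasoning, assuming (as stated above, not verified against the Solr source) that a new index starts at the current time in milliseconds and that each commit adds 1. The function names and the numbers are hypothetical.

```python
def index_version(created_ms, commits):
    """Assumed model: version = creation time in ms, plus 1 per commit."""
    return created_ms + commits


def rebuilt_index_is_newer(old_created_ms, old_commits, rebuilt_created_ms):
    """Compare a freshly rebuilt index (no commits yet) against the old one."""
    rebuilt = index_version(rebuilt_created_ms, 0)
    old = index_version(old_created_ms, old_commits)
    return rebuilt > old
```

Under this model a rebuilt index only ends up with a lower version if the old index saw more commits than the number of milliseconds that passed before the rebuild. For example, an old index with 5,000 commits is overtaken by one rebuilt an hour later, but not by one rebuilt a single second after the old index was created.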




> Yonik / Miller: does the SolrCloud branch already have support for master
> failover in a situation like this (i.e. a two-node "cloud")?


No - no master/failover support in SolrCloud yet. We haven't integrated
at all with replication yet.





> -Hoss




--
- Mark

http://www.lucidimagination.com


Re: match to non tokenizable word ("helloworld")

2010-05-16 Thread Erick Erickson
You might want to look at ngrams and/or shingles. In this
case I suspect that ngrams are better suited; I don't
think shingles apply in the direction you stated, but
your problem description is so short that I thought I'd
mention them.

Although your collection of words can work (think synonyms) if you have a
pre-determined, probably small, list of equivalencies...
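
To illustrate why ngrams can help here, a simplified sketch follows. It is a plain-Python model of character n-grams, not Solr's analyzer chain; the whitespace tokenization and the gram size of 3 are assumptions for the example.

```python
def token_ngrams(text, n=3):
    """Split on whitespace, lowercase, and collect the character n-grams of each token."""
    grams = set()
    for token in text.lower().split():
        grams.update(token[i:i + n] for i in range(len(token) - n + 1))
    return grams


indexed = token_ngrams("hello world")   # grams stored for the document
query = token_ngrams("helloworld")      # grams produced for the search term
shared = indexed & query                # grams the two have in common
```

In this model every gram of the indexed text also occurs in the query, so "helloworld" can match "hello world" even though the whole tokens differ. The query additionally produces grams that span the missing space (such as "low" and "owo"), which is why all-grams-must-match queries would still fail.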

Best
Erick

On Sun, May 16, 2010 at 12:58 PM, siping liu  wrote:

>
> I get no match when searching for "helloworld", even though I have "hello
> world" in my index. How do people usually deal with this? Write a custom
> analyzer, with help from a collection of all dictionary words?
>
>
>
> thanks for suggestions/comments.
>
>


Re: date slider

2010-05-16 Thread Ahmet Arslan

> Now I also want to offer a slider to define the range to
> include in the result set. However here I do not want to do
> faceting, instead I just want to find out the min and max
> date values in the result (without any of the facet filters
> applied) so I know the start and end points for the slider.
> The user can then move the sliders to further filter the
> result set.
> 
> How can I best go about fetching just those min and max
> values, ideally without having to add a separate query just
> for this?

http://wiki.apache.org/solr/StatsComponent can give you min and max values.
Since it calculates additional statistics, I am not sure which one is faster:
fetching min and max separately with the two queries below, or using the
stats component.

q=query&start=0&rows=1&fl=date&sort=date asc
q=query&start=0&rows=1&fl=date&sort=date desc
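
A small sketch of building those two requests from client code, using only the Python standard library. The base URL and the helper name `min_max_queries` are assumptions for illustration; only the parameters (`q`, `start`, `rows`, `fl`, `sort`) come from the queries above.

```python
from urllib.parse import urlencode


def min_max_queries(base_url, query, field):
    """Return (min_url, max_url): one-row queries sorted ascending and descending."""
    def build(direction):
        params = {
            "q": query,
            "start": 0,
            "rows": 1,                       # only the first document is needed
            "fl": field,                     # return just the date field
            "sort": f"{field} {direction}",  # "asc" gives the min, "desc" the max
        }
        return f"{base_url}?{urlencode(params)}"

    return build("asc"), build("desc")


# Hypothetical select URL; adjust to the actual Solr instance.
min_url, max_url = min_max_queries("http://localhost:8983/solr/select", "solr", "date")
```

The first document of the ascending query carries the minimum date and the first document of the descending query the maximum, at the cost of two extra requests.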



  


Connection Reset Errors on a Distributed Index

2010-05-16 Thread harish.agarwal

Hello,

For reference, I've posted about this before (but have new information now):
http://lucene.472066.n3.nabble.com/Connection-reset-errors-during-commits-optimize-td484058.html#a484058

and have seen other similar posts as well:
http://lucene.472066.n3.nabble.com/Question-on-Solr-Distributed-Search-td495188.html#a495191

During the aftermath of commits on a distributed index (3 shards, about 3M
documents each with many many facets), I'm getting ConnectionReset errors
(see below for the full trace).  The place in the code where it happens is
where the 'master server' is waiting on results from the other shards.  I've
been combing through the logs of all the shards at the time of the
exceptions and have noticed that every exception is thrown on a search which
does not appear in one of the other shard's logs.  In addition, the shard
which doesn't record the search is usually busy warming up uninverted fields
for facet searches, via a newSearcher query, at the time.  These exceptions
are problematic because they tend to slow the server down to a crawl,
sometimes permanently.

Does anyone have any advice for me on how to proceed?  Is it possible that a
shard would stop responding while uninverting fields?

Thanks,
-Harish

Trace:

SEVERE: org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
    at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:472)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
    at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:422)
    at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:394)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
    ... 1 more
Caused by: java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:168)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
    at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
    at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
    at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
    at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
    at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
    at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
    at org.apache.commons.httpc

Re: date slider

2010-05-16 Thread Ahmet Arslan


> http://wiki.apache.org/solr/StatsComponent can give you
> min and max values.

Sorry, my bad. I just tested StatsComponent with a tdate field, and it does not
work for date-typed fields. The wiki says it is for numeric fields.


  


Re: date slider

2010-05-16 Thread Lukas Kahwe Smith

On 16.05.2010, at 21:01, Ahmet Arslan  wrote:




> > http://wiki.apache.org/solr/StatsComponent can give you
> > min and max values.
>
> Sorry my bad, I just tested StatsComponent with tdate field. And it
> is not working for date typed fields. Wiki says it is for numeric
> fields.


ok thx for checking. is my use case really so unusual? i guess i could  
store a unix timestamp or i just do a fixed range.


hmm if i use facets with a really large gap will it always give me at  
least the min and max maybe? will try it out when i get home.
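
A rough sketch of the "store a unix timestamp" idea: convert each date to seconds since the epoch so that numeric min/max can be applied, then convert the two bounds back for the slider. The helper names are hypothetical, and the date layout (`YYYY-MM-DDThh:mm:ssZ`, UTC) is an assumption about how the dates are represented.

```python
from datetime import datetime, timezone

# Assumed date layout: UTC timestamps such as 2010-05-16T00:00:00Z.
DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def to_unix_seconds(date_string):
    """Parse a UTC date string into whole seconds since 1970-01-01."""
    parsed = datetime.strptime(date_string, DATE_FORMAT).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


def from_unix_seconds(seconds):
    """Format seconds since the epoch back into the assumed date layout."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(DATE_FORMAT)


def slider_bounds(date_strings):
    """Return (earliest, latest) using purely numeric min/max on the timestamps."""
    stamps = [to_unix_seconds(d) for d in date_strings]
    return from_unix_seconds(min(stamps)), from_unix_seconds(max(stamps))
```

The timestamp would have to be stored in an additional numeric field at index time; the sketch only shows the conversion in both directions.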


regards
Lukas


RE: Solr Deployment Question

2010-05-16 Thread Maduranga Kannangara
They are two web applications running on a single Tomcat instance.

Thanks
Madu



-Original Message-
From: findbestopensource [mailto:findbestopensou...@gmail.com] 
Sent: Friday, 14 May 2010 4:38 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Deployment Question

Please explain how you have handled two indexes in a single VM. Is it
multi-core?

To identify memory consumption, you need to calculate the used memory before
and after loading the indexes; basically, calculate the used memory before and
after any checkpoint you want to analyse. The difference will give you the
actual memory consumption.
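
The before/after principle can be sketched as follows. This is only an analogy in Python using `tracemalloc`, which tracks Python allocations; inside a JVM the corresponding reading would be `Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory()` taken before and after the checkpoint. The helper name `used_memory_delta` is hypothetical.

```python
import tracemalloc


def used_memory_delta(action):
    """Run action() and return (result, growth in traced bytes across the call)."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()  # used memory before the checkpoint
    result = action()                                # e.g. load an index
    after, _peak = tracemalloc.get_traced_memory()   # used memory after the checkpoint
    tracemalloc.stop()
    return result, after - before


# Stand-in for "loading an index": allocate a large list and keep it alive.
loaded, delta_bytes = used_memory_delta(lambda: [0] * 1_000_000)
```

The result is kept referenced while the second reading is taken; otherwise the memory could be released before it is measured and the difference would understate consumption.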

Regards
Aditya
http://www.findbestopensource.com


On Fri, May 14, 2010 at 11:14 AM, Maduranga Kannangara <
mkannang...@infomedia.com.au> wrote:

> But even when we used a single index, we were running out of memory.
> What do you mean by "active"? No queries on the masters.
> Only one index is being processed/optimized.
>
> Also, if I may add to my same question, how can I find the
> amount of memory that an index would use, theoretically?
> i.e.: Is there a formula etc.?
>
> Thanks
> Madu
>
>
>
> -Original Message-
> From: findbestopensource [mailto:findbestopensou...@gmail.com]
> Sent: Friday, 14 May 2010 3:34 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Deployment Question
>
> You may use one index at a time, but both indexes are active and have loaded
> all their terms in memory. Memory consumption will certainly be higher.
>
> Regards
> Aditya
> http://www.findbestopensource.com
>
> On Fri, May 14, 2010 at 10:28 AM, Maduranga Kannangara <
> mkannang...@infomedia.com.au> wrote:
>
> > Hi
> >
> > We use separate JVMs to Index and Query.
> > (Client applications will query only slaves,
> > while master does only indexing)
> >
> > Recently we moved two master indexes to
> > a single JVM. Our memory allocation for
> > each index was 512MB and 1GB.
> >
> > Once we moved both indexes to a single VM,
> > we thought it would still index using 1GB as we
> > use only one index at a time. But to our surprise
> > it needed more than that (1.2GB) even though
> > only one index was used at a time.
> >
> > Can I know why, or can I know how to find
> > why this is?
> >
> > Solr 1.4
> > Java 1.6.0_20
> >
> > We use a VPS for deployment.
> >
> > Thanks in advance
> > Madu
> >
> >
> >
>


Solr Search problem; cannot search the existing word in the index content

2010-05-16 Thread Mint o_O!
Hi,

I've recently been working on an index/search project and I found Solr, which
is very fascinating to me.

I followed the tutorial page successfully: starting up Jetty and
adding the new XML files (user:~/solr/example/exampledocs$ *java -jar post.jar
*.xml*). So far so good at this stage.

Now I have created my own test file, westpac.xml, with the real data I intend
to use, put it in exampledocs and again ran the command
(user:~/solr/example/exampledocs$ *java -jar post.jar westpac.xml*).
Everything went very well; however, when I searched for "*rhode*", which is
in the content, the index returned nothing.

Could anyone tell me what I did wrong and why I couldn't search for that word
even though it is in my index content?

thanks,

Mint