Re: thresholding results by percentage drop from maxScore in lucene/solr

2010-05-01 Thread MitchK

I am curious:
What is your use case, or what type of data is this? Web pages? Blog posts?
Product items?

Can you provide some real examples, so that we can discuss ideas other than
doing it by the score?
I think this is either impossible or really difficult to achieve, since
you don't know what the highest score will be until every document that
matches the query has been found.
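If the goal is only to trim a page of results that Solr has already returned, one workaround is a client-side filter. When the score is requested (for example by adding `score` to `fl`), Solr typically reports a `maxScore` for the result set, and the client can drop hits that fall more than a chosen fraction below it. This is a minimal sketch assuming the hits are available as (id, score) pairs; the function name and the 30% cutoff are hypothetical. It does not change `numFound` or paging, which is exactly the limitation Mitch points out.

```python
def threshold_by_max_score(hits, max_score, max_drop):
    """Keep hits whose score is within max_drop of max_score.

    hits: list of (doc_id, score) pairs (hypothetical shape).
    max_drop: fraction between 0 and 1, e.g. 0.3 keeps scores >= 70% of max.
    """
    if not 0.0 <= max_drop <= 1.0:
        raise ValueError("max_drop must be between 0 and 1")
    cutoff = max_score * (1.0 - max_drop)
    return [(doc_id, score) for doc_id, score in hits if score >= cutoff]


hits = [("a", 2.0), ("b", 1.6), ("c", 1.2), ("d", 0.4)]
print(threshold_by_max_score(hits, max_score=2.0, max_drop=0.3))
# -> [('a', 2.0), ('b', 1.6)]
```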

Kind regards
- Mitch
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/thresholding-results-by-percentage-drop-from-maxScore-in-lucene-solr-tp768872p770063.html
Sent from the Solr - User mailing list archive at Nabble.com.


Random Field

2010-05-01 Thread Blargy

Can someone explain a useful case for the RandomSortField? 





Re: Random Field

2010-05-01 Thread Yonik Seeley
On Sat, May 1, 2010 at 10:23 AM, Blargy  wrote:
> Can someone explain a useful case for the RandomSortField?

People sometimes have requirements to show different results to
everyone (essentially randomly shuffling matches per person).
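A sketch of how the per-person shuffle might be wired up, under stated assumptions: Solr's example schema declares a dynamic field of type `solr.RandomSortField` (e.g. `random_*`), and, as we understand it, sorting on a name such as `random_1234 desc` gives an order that depends on the field name and stays the same while the index is unchanged. Deriving the suffix from a user identifier therefore gives each person a stable order of their own. The helper name and the `random_` prefix are hypothetical. A stable hash is used because Python's built-in `hash()` of strings is randomized per process.

```python
import hashlib


def per_user_random_sort(user_id, field_prefix="random_"):
    """Build a Solr sort parameter giving each user a stable random order.

    Assumes (hypothetically) that the schema maps field_prefix + '*' to a
    solr.RandomSortField type.
    """
    # SHA-1 yields the same digest in every process, unlike hash() on str.
    digest = hashlib.sha1(user_id.encode("utf-8")).hexdigest()
    seed = int(digest[:8], 16)  # first 32 bits keep the field name short
    return f"{field_prefix}{seed} desc"


print(per_user_random_sort("alice"))
```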

-Yonik
Apache Lucene Eurocon 2010
18-21 May 2010 | Prague



Re: Solr Dismax query - prefix matching

2010-05-01 Thread Ahmet Arslan

> Folks,
> Greetings.
> Using the dismax query parser, is there a way to perform a prefix
> match? For example, if I have a field called 'booktitle' with the
> actual values 'Code Complete' and 'Coding standard 101', I'd like to
> search for the query string 'cod' and have dismax match both book
> titles, since 'cod' is a prefix of 'code' and 'coding'.

dismax does not support PrefixQuery (cod*), if that is what you are asking.
edismax, or Extended DisMax [1], supports the full Lucene query syntax.

[1]https://issues.apache.org/jira/browse/SOLR-1553
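For illustration, the request parameters for a prefix match through edismax could be built like this. It is a sketch assuming a Solr build that includes edismax (SOLR-1553) and a field analyzed with lowercasing; in Solr versions of this era wildcard and prefix terms were not analyzed, so the sketch lowercases the input itself. The helper name is hypothetical; `defType`, `qf` and `q` are the standard parameter names.

```python
from urllib.parse import urlencode


def edismax_prefix_params(field, user_input):
    """Build query parameters for a prefix match via the edismax parser.

    Assumes edismax is available and that `field` is indexed lowercased.
    """
    prefix = user_input.strip().lower()  # prefix terms are not analyzed
    return urlencode({
        "defType": "edismax",  # select the extended dismax parser
        "qf": field,           # field(s) to search, as with dismax
        "q": prefix + "*",     # trailing * turns the term into a prefix query
    })


print(edismax_prefix_params("booktitle", "Cod"))
```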


  


Re: Random Field

2010-05-01 Thread Static Void
What would be more useful would be randomizing closely related hits,
i.e. hits whose scores are within 5% of each other.
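Yonik's suggestion is to use the random field in a function query. As a different, client-side technique, one can sort the returned hits by score, group them into bands where every hit scores within 5% of the band's top hit, and shuffle only inside each band. Anchoring each band at its first (highest) hit is a design choice of this sketch, and the function name and per-user seed are hypothetical.

```python
import random


def shuffle_within_bands(hits, tolerance=0.05, seed=None):
    """Shuffle hits whose scores are within `tolerance` of each other.

    hits: list of (doc_id, score). Each band starts at its highest-scoring
    hit and takes every following hit scoring >= top * (1 - tolerance).
    Bands keep their relative order; only hits inside a band are shuffled.
    """
    rng = random.Random(seed)  # pass e.g. a per-user seed for stable order
    ordered = sorted(hits, key=lambda h: h[1], reverse=True)
    result, band, band_top = [], [], None
    for doc_id, score in ordered:
        if band and score < band_top * (1.0 - tolerance):
            rng.shuffle(band)  # close the current band
            result.extend(band)
            band = []
        if not band:
            band_top = score   # this hit anchors a new band
        band.append(doc_id)
    rng.shuffle(band)          # flush the last band
    result.extend(band)
    return result


hits = [("a", 10.0), ("b", 9.8), ("c", 9.6), ("d", 5.0)]
print(shuffle_within_bands(hits, tolerance=0.05, seed=42))
```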


Sent from my iPhone



Re: Random Field

2010-05-01 Thread Yonik Seeley
On Sat, May 1, 2010 at 12:32 PM, Static Void  wrote:
> What would be more useful would be randomizing closely related hits. IE hits
> within 5% of each other

This is not the use case I've encountered multiple times in the past, but
it should also be doable by using the random field in a function query.

-Yonik


Solr commit issue

2010-05-01 Thread Indika Tantrigoda
Hi all,

I've been working with Solr for a few weeks and have gotten SolrJ
to connect to it, index documents, and search them.

However, I am having an issue when a document is committed.
After a document is committed, it does not show up in the results of a
*:* search, but if I search for it with some text, it is shown in the
results. Only after another document is committed is the previous
document found by a *:* search.

Is this because of the SolrJ client, or do I have to pass additional
parameters to commit()?

Thanks in advance.

Regards,
Indika


Re: Random Field

2010-05-01 Thread Static Void
Example use case: we have a bunch of items sold by multiple sellers. I
would rather show closely related items distributed across sellers than
clumps of items from the same seller. This would be a more "fair"
scoring for sellers. The scores should be within a certain percentage
of each other so that the results are still relevant.
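One way to get the seller spread described above, sketched client-side rather than inside Solr: take hits that are already in relevance order (for example only those whose scores are within a given percentage of each other) and interleave them round-robin by seller, keeping each seller's own order. The (item_id, seller) shape and the function name are hypothetical.

```python
from collections import deque


def interleave_by_seller(hits):
    """Round-robin hits across sellers, preserving order within each seller.

    hits: list of (item_id, seller) in relevance order. Sellers take turns
    in the order they first appear.
    """
    groups = {}  # dicts keep insertion order, so first-seen seller goes first
    for item_id, seller in hits:
        groups.setdefault(seller, deque()).append(item_id)
    result = []
    while groups:
        for seller in list(groups):  # copy keys: we delete while iterating
            result.append(groups[seller].popleft())
            if not groups[seller]:
                del groups[seller]
    return result


hits = [("i1", "s1"), ("i2", "s1"), ("i3", "s2"), ("i4", "s1"), ("i5", "s3")]
print(interleave_by_seller(hits))
# -> ['i1', 'i3', 'i5', 'i2', 'i4']
```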


Would you mind providing an example of random sort in a boost
function/function query? Thanks.





Re: Solr commit issue

2010-05-01 Thread Erick Erickson
The underlying IndexReader must be reopened. If you're
searching for a document with a searcher that was opened
before the document was indexed, it won't show up on the
search results.
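The point-in-time behaviour Erick describes can be illustrated with a toy model: a searcher sees a frozen snapshot of what was committed when it was opened, so a document committed afterwards only appears to a newly opened searcher. This is a hypothetical illustration with made-up class names, not Solr's or Lucene's implementation.

```python
class ToyIndex:
    """Toy model of point-in-time searchers (not Solr's actual code)."""

    def __init__(self):
        self._committed = []  # documents visible to searchers opened later
        self._pending = []    # documents added but not yet committed

    def add(self, doc):
        self._pending.append(doc)

    def commit(self):
        # Pending documents become visible to searchers opened after this.
        self._committed.extend(self._pending)
        self._pending.clear()

    def open_searcher(self):
        # A searcher captures a frozen snapshot of the committed documents.
        return ToySearcher(tuple(self._committed))


class ToySearcher:
    def __init__(self, snapshot):
        self._snapshot = snapshot

    def match_all(self):
        """Rough analogue of a *:* query: every document in the snapshot."""
        return list(self._snapshot)


index = ToyIndex()
old_searcher = index.open_searcher()
index.add("doc1")
index.commit()
print(old_searcher.match_all())           # -> []
print(index.open_searcher().match_all())  # -> ['doc1']
```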

I'm guessing that the document showing up when you search for it
with some text is a coincidence, but that's just a guess.

HTH
Erick



Embedded Server and Webapp using same Index???

2010-05-01 Thread John Gillies
I've been searching in vain for an answer to my question, so hopefully someone
on the mailing list has the answer or a solution.

I am attempting to use the Solr Embedded Server to generate my index and then
use the Solr Web Application to query the same index. It works great if I
restart the Solr Web Application after I've created the index, but that is
not ideal.

Is what I want to do possible?  

The application that is doing the indexing and the Solr web application are
both running fine in JBoss, and I want to use the embedded server for indexing
due to its better performance. I have 250,000 documents to index once a week,
and I'd like the index to be available for searching while it is being built.
I commit to the embedded server every 100 documents so that each batch isn't
too large and doesn't take too long to process.
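The commit-every-100-documents loop can be sketched as below. In SolrJ the two calls would roughly correspond to `SolrServer.add(Collection)` and `SolrServer.commit()`; here they are plain callables (hypothetical stand-ins) so the sketch runs without a Solr instance.

```python
def index_in_batches(docs, add_batch, commit, batch_size=100):
    """Add docs in batches of batch_size, committing after each batch.

    add_batch and commit are hypothetical stand-ins for a client's add
    and commit calls. Returns the number of commits issued.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    commits = 0
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            add_batch(batch)
            commit()
            commits += 1
            batch = []  # start a fresh list; the sent one is not reused
    if batch:  # flush the final, partial batch
        add_batch(batch)
        commit()
        commits += 1
    return commits


sent = []
print(index_in_batches(range(250), sent.append, lambda: None))
# -> 3
```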

So I guess the root question is: can the embedded server and the Solr
webapp look at and use the same core index at the same time?

Should I use master/slave replication, where the embedded server builds the
index as the master and the Solr web app queries the slave index? If so, how
do you set up the embedded server as a master, since there is no URL to
point the Solr web app to?

Am I missing something simple and fundamental? Probably.

I'm currently running all of this on Windows for development, but in
production it will be on Linux. Does that make a difference?

Is there a specific locking mechanism I need to set? (I've tried Native,
Simple and Single without success.)

Is the answer simply that the embedded and webapp Solr instances can't work
off the same index at the same time, and I should just have my application
index the data via the Solr web app?

Thank you,

John



  

Re: Embedded Server and Webapp using same Index???

2010-05-01 Thread Erick Erickson
The problem here, I think, is that you're updating the
index in a manner that the regular SOLR webapp doesn't
know about. So the index changes without SOLR knowing
it has to reopen the index to see the modifications.

Something to try:
curl http://localhost:8983/solr/update -F stream.body=' '
This *might* force SOLR to reopen the underlying index, but I
admit I haven't tried it (stolen from the SOLR 1.4 e-book). I'm not
entirely sure that this will actually do what you want, it's possible
that the SOLR webapp will no-op it since the webapp hasn't seen
the changes.
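For a client other than curl, the equivalent is a POST of Solr's XML `<commit/>` update message to the `/update` handler. This is a sketch that only builds the request (the base URL is an assumption matching the example port), and it carries the same caveat Erick gives: it may not make the webapp see changes written by a different process.

```python
import urllib.request


def build_commit_request(solr_base_url):
    """Build (but do not send) a POST of Solr's XML <commit/> message.

    solr_base_url is assumed to look like http://localhost:8983/solr.
    """
    return urllib.request.Request(
        url=solr_base_url.rstrip("/") + "/update",
        data=b"<commit/>",
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


req = build_commit_request("http://localhost:8983/solr")
# To actually send it against a running server: urllib.request.urlopen(req)
```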

When you stop and restart the server, you're forcing the underlying
IndexReader to reopen, which is why you're getting the changes after
you restart.

I'm pretty sure that what you want is possible without restarting the
server, it's just the mechanics of how to make it happen that I'm not
entirely clear about.

In general, you should have one process updating the index, but
as many processes *reading* the index as you want.

HTH
Erick



run on reboot on windows

2010-05-01 Thread S Ahmed
Hi,

I'm trying to get Solr to run on Windows so that the Solr service
is running again after the machine reboots.

How can I do this?


RE: Embedded Server and Webapp using same Index???

2010-05-01 Thread Titash Neogi
You could even do the indexing via the CSV update handler, with something like this:

http://:/solr/update/csv?commit=true&separator=~&escape="&stream.contentType=text/plain;charset=utf-8&stream.file=

Call it from a backend process. Because commit=true is passed, the documents
become available for searching as soon as the commit happens, so if you run

http://:/solr/select?q=

from a browser or any other process, you should see the results immediately.
We are doing this in our app.
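A backend process could assemble that CSV update URL with proper escaping along these lines. The host, port and file path are placeholders (the original elides them), and the function name is hypothetical. Note that `stream.file` makes Solr read the file from its own filesystem, which typically requires remote streaming to be enabled in solrconfig.xml.

```python
from urllib.parse import urlencode


def build_csv_update_url(host, port, csv_path):
    """Build a CSV update URL with commit=true, escaping each parameter.

    host, port and csv_path are placeholders; csv_path is a path on the
    Solr server's own filesystem.
    """
    params = {
        "commit": "true",      # commit once the file has been loaded
        "separator": "~",      # field separator used in the file
        "escape": '"',         # escape character used in the file
        "stream.contentType": "text/plain;charset=utf-8",
        "stream.file": csv_path,
    }
    return f"http://{host}:{port}/solr/update/csv?{urlencode(params)}"


print(build_csv_update_url("localhost", 8983, "/data/items.csv"))
```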

HTH
Titash





Re: Solr commit issue

2010-05-01 Thread Indika Tantrigoda
Thanks for the reply.
Here is another thread I found about a similar problem:
http://www.mail-archive.com/solr-user@lucene.apache.org/msg28236.html

From what I understand, the IndexReaders get reopened after a commit.

Regards,
Indika



Re: run on reboot on windows

2010-05-01 Thread Dave Searle
Set the tomcat6 service to start automatically on boot (if you are running Tomcat).

