Re: Any problem in running two solr instances on the same machine using the same directory?

2008-09-27 Thread Jagadish Rath
I am indexing data provided by the users of our web site. If load on the site
increases, the rate of the commits also increases. The nature of the data is
such that it should be reflected in the index instantaneously.

On Sat, Sep 27, 2008 at 4:00 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote:

> On Fri, Sep 26, 2008 at 2:18 AM, Jagadish Rath <[EMAIL PROTECTED]> wrote:
> >   - What are the other solutions to the problem of the "maxWarmingSearchers
> >     limit exceeded" error?
>
> Don't commit so rapidly?
> What is the reason for your high commit rate?
>
> -Yonik
>
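
For anyone hitting the same error: each commit opens and warms a new searcher, and overlapping commits can exceed the warming-searcher cap. One common mitigation (a sketch against a stock solrconfig.xml; the values are illustrative, not recommendations) is to stop committing from the client on every update and let the server batch commits via autoCommit:

```xml
<!-- solrconfig.xml sketch (illustrative values): let the server batch
     commits instead of the client committing on every update. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>1000</maxDocs>   <!-- commit after 1000 pending docs -->
    <maxTime>60000</maxTime>  <!-- ...or after 60 seconds, whichever comes first -->
  </autoCommit>
</updateHandler>

<!-- inside the <query> section: the cap on concurrently warming searchers;
     the error in question fires when commits outpace warming. -->
<maxWarmingSearchers>2</maxWarmingSearchers>
```

With batched commits the rate of new searchers drops, so the cap is no longer hit under load.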


Re: Searching Question

2008-09-27 Thread Grant Ingersoll


On Sep 26, 2008, at 2:10 PM, Otis Gospodnetic wrote:

It might be easiest to store the thread ID and the number of replies in the
thread in each post Document in Solr.


Yeah, but that would mean updating every document in a thread every time a
new reply is added.


I still keep going back to the solution of putting all the replies in a
single document, and then using a custom Similarity that overrides the TF
function and/or the length normalization. Still, this suffers from having to
update the document for every new reply.


Let's take a step back...

Can I ask why you want the scoring this way? What have you seen in your
results that leads you to believe it is the correct way? Note, I'm not
trying to convince you it's wrong, I just want to better understand what's
going on.






Otherwise it sounds like you'll have to combine some search results or data
post-search.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: Jake Conk <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Friday, September 26, 2008 1:50:37 PM
Subject: Re: Searching Question

Grant,

Each post is its own document, but I can merge them all into a single
document under one thread if that will allow me to do what I want.
The number of replies is stored both in Solr and the DB.

Thanks,

- JC

On Fri, Sep 26, 2008 at 5:24 AM, Grant Ingersoll wrote:
Is a thread and all of its posts a single document? In other words, how are
you modeling your posts as Solr documents? Also, where are you keeping track
of the number of replies? Is that in Solr or in a DB?

-Grant

On Sep 25, 2008, at 8:51 PM, Jake Conk wrote:


Hello,

We are using Solr for our new forums search feature. If possible, when
searching for the word "Halo" we would like threads that contain the word
"Halo" the most, with the least number of posts in that thread, to have a
higher score.

For instance, if we have a thread with 10 posts and the word "Halo" shows up
5 times, then that should have a lower score than a thread that has the word
"Halo" 3 times within its posts and has 5 replies.

Basically, the thread that shows the search string most frequently relative
to the number of posts in the thread should be the one with the highest
score.

Is something like this possible?

Thanks,
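
An alternative sketch to the custom-Similarity route, assuming a per-thread document with a hypothetical num_posts field: fold the reply count into the score with a dismax boost function, so threads matching the query with fewer posts rank higher. recip(num_posts,1,1,1) evaluates to 1/(1+num_posts):

```xml
<!-- solrconfig.xml sketch; the handler name, qf field, and num_posts
     field are all hypothetical. recip(num_posts,1,1,1) = 1/(1+num_posts),
     so fewer posts means a larger boost for the same term frequency. -->
<requestHandler name="/threadsearch" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">post_text</str>
    <str name="bf">recip(num_posts,1,1,1)</str>
  </lst>
</requestHandler>
```

This shares the drawback noted above (the thread document must still be updated per reply), but it avoids writing custom Similarity code.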




Re: Any problem in running two solr instances on the same machine using the same directory?

2008-09-27 Thread Otis Gospodnetic
Solr today is not suited for real-time search (seeing newly added docs in
search results as soon as they've been added, the way databases work, for
example). Work on that is in progress, though.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Jagadish Rath <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Saturday, September 27, 2008 6:24:02 AM
> Subject: Re: Any problem in running two solr instances on the same machine
> using the same directory?
> 
> I am indexing data provided by the users of our web site. If load on the site
> increases, the rate of the commits also increases. The nature of the data is
> such that it should be reflected in the index instantaneously.
> 
> On Sat, Sep 27, 2008 at 4:00 AM, Yonik Seeley wrote:
> 
> > On Fri, Sep 26, 2008 at 2:18 AM, Jagadish Rath wrote:
> > >   - What are the other solutions to the problem of the "maxWarmingSearchers
> > >     limit exceeded" error?
> >
> > Don't commit so rapidly?
> > What is the reason for your high commit rate?
> >
> > -Yonik
> >



Re: Any problem in running two solr instances on the same machine using the same directory?

2008-09-27 Thread Jason Rutherglen
The question I have is: what is the optimal approach for integrating
realtime search into Solr? What classes should be extended or created?

On Sat, Sep 27, 2008 at 9:40 AM, Otis Gospodnetic
<[EMAIL PROTECTED]> wrote:
> Solr today is not suited for real-time search (seeing newly added docs in 
> search results as soon as they've been added - the way databases work, for 
> example).  Work on that is in progress, though.
>
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> - Original Message 
>> From: Jagadish Rath <[EMAIL PROTECTED]>
>> To: solr-user@lucene.apache.org
>> Sent: Saturday, September 27, 2008 6:24:02 AM
>> Subject: Re: Any problem in running two solr instances on the same machine
>> using the same directory?
>>
>> I am indexing data provided by the users of our web site. If load on the site
>> increases, the rate of the commits also increases. The nature of the data is
>> such that it should be reflected in the index instantaneously.
>>
>> On Sat, Sep 27, 2008 at 4:00 AM, Yonik Seeley wrote:
>>
>> > On Fri, Sep 26, 2008 at 2:18 AM, Jagadish Rath wrote:
>> > >   - What are the other solutions to the problem of the "maxWarmingSearchers
>> > >     limit exceeded" error?
>> >
>> > Don't commit so rapidly?
>> > What is the reason for your high commit rate?
>> >
>> > -Yonik
>> >
>
>


Updating the index with a csv file

2008-09-27 Thread vmaidel

Hello, 

I would like to update my index with a csv file, but for some reason I get
the following error:
"The request sent by the client was syntactically incorrect (missing content stream)"
I get it after using the following statement:

curl http://localhost:8983/solr/update/csv --data-binary @blog.csv -H 'Content-type:text/plain; charset=utf-8'

I use the Windows version of curl, running this statement from the curl
folder where the blog.csv file resides as well.

Thank you.
-- 
View this message in context: 
http://www.nabble.com/Updating-the-index-with-a-csv-file-tp19706582p19706582.html
Sent from the Solr - User mailing list archive at Nabble.com.



DataImportHandler: way to merge multiple db-rows to 1 doc using transformer?

2008-09-27 Thread Britske

Looking at the wiki and the code of DataImportHandler, it looks impressive.
There's talk about ways to use Transformers to create several rows (solr
docs) based on a single db row.

I'd like to know if it's possible to do the exact opposite: to build custom
transformers that take multiple db-rows and merge them into a single
solr-row/document. If so, how?

Thanks, 
Britske
-- 
View this message in context: 
http://www.nabble.com/DataImportHandler%3A-way-to-merge-multiple-db-rows-to-1-doc-using-transformer--tp19706722p19706722.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: DataImportHandler: way to merge multiple db-rows to 1 doc using transformer?

2008-09-27 Thread Jon Baer
If I understand your question right... you would not need a transformer;
basically you nest entities under each other, ie:

<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/nhldb?connectTimeout=0&amp;autoReconnect=true"
              user="root" password="" batchSize="-1"/>
  <document>
    <entity name="..." query="...">
      <field column="..." name="..."/>
      <entity name="..." query="..."
              processor="org.apache.solr.handler.dataimport.CachedSqlEntityProcessor">
        <field column="..." name="..."/>
      </entity>
    </entity>
  </document>
</dataConfig>

I believe those are the basic steps. Look up CachedSqlEntityProcessor to see
if you need it.


- Jon

On Sep 27, 2008, at 5:47 PM, Britske wrote:



Looking at the wiki and the code of DataImportHandler, it looks impressive.
There's talk about ways to use Transformers to create several rows (solr
docs) based on a single db row.

I'd like to know if it's possible to do the exact opposite: to build custom
transformers that take multiple db-rows and merge them into a single
solr-row/document. If so, how?

Thanks,
Britske





Re: DataImportHandler: way to merge multiple db-rows to 1 doc using transformer?

2008-09-27 Thread Walter Underwood
Make a view in your database and index that. No point in duplicating
database views in Solr.
--wunder

On 9/27/08 2:47 PM, "Britske" <[EMAIL PROTECTED]> wrote:
> 
> Looking at the wiki and the code of DataImportHandler, it looks impressive.
> There's talk about ways to use Transformers to create several rows (solr
> docs) based on a single db row.
> 
> I'd like to know if it's possible to do the exact opposite: to build custom
> transformers that take multiple db-rows and merge them into a single
> solr-row/document. If so, how?
> 
> Thanks, 
> Britske
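
Walter's suggestion sketched as a DIH config, with a hypothetical view name, columns, and connection settings: do the merging in a database view, then index the view with a single flat entity, so no custom transformer is needed:

```xml
<!-- data-config.xml sketch; merged_posts_view, its columns, and the
     connection settings are all hypothetical. -->
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb" user="root" password=""/>
  <document>
    <entity name="merged" query="SELECT id, combined_text FROM merged_posts_view">
      <field column="id" name="id"/>
      <field column="combined_text" name="text"/>
    </entity>
  </document>
</dataConfig>
```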



Re: Updating the index with a csv file

2008-09-27 Thread Chris Hostetter
: "The request sent by the client was syntactically incorrect (missing content
: stream)" 

That usually means either the content type wasn't set, or there was no post
data.

: curl http://localhost:8983/solr/update/csv --data-binary @blog.csv -H
: 'Content-type:text/plain; charset=utf-8'
: 
: I use the windows version of curl, running this statement from the curl
: folder where the blog.csv file resides as well. 

My gut assumption was that you needed some whitespace in the content type
(ie: 'Content-type: text/plain; charset=utf-8'), but I was able to get this
to work just fine on Linux using the example setup...

curl 'http://localhost:8983/solr/update/csv?commit=true' --data-binary @books.csv -H 'Content-type:text/plain; charset=utf-8'

...perhaps there is some eccentricity about the Windows curl?


-Hoss
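
One thing worth ruling out, though it is an assumption rather than something confirmed in the thread: cmd.exe does not treat single quotes as quoting characters, so the -H argument may reach curl mangled, the content type never gets set, and Solr reports a missing content stream. On Windows, double quotes are the usual fix:

```bat
REM Windows cmd.exe sketch: use double quotes; cmd.exe passes
REM single quotes through literally, mangling the -H header.
curl "http://localhost:8983/solr/update/csv?commit=true" --data-binary @blog.csv -H "Content-type:text/plain; charset=utf-8"
```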