Re: how to balance index and search

2007-03-18 Thread James liu

2007/3/17, Chris Hostetter <[EMAIL PROTECTED]>:



If your indexing while searching is causing problems, one way to reduce
the impact is to index on a master instance and then use the replication
scripts to sync it up with a slave instance (where all of your searches
happen).



I think that will be a problem, since we use Windows 2003, and I remember the replication
scripts having problems on FreeBSD.

If you are specifically seeing high CPU when indexing HTML, that's probably
because the HTML Analyzers have to do a lot of complex stuff to strip out
the HTML ... another option might be to parse that HTML on the client side
before sending it to Solr.



Our spider crawls HTML data into MS SQL Server. I just get the data from SQL Server
and curl it to Solr.
Tomorrow I will test this option.
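One way to follow that suggestion with this setup is to strip the markup in the program that reads rows from SQL Server, so that Solr only receives plain text. The following is a minimal sketch. It assumes Python's standard-library `html.parser` is good enough for the crawled pages. The field names `id` and `body` are hypothetical and must match your schema.xml.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape


class _TextExtractor(HTMLParser):
    """Collects text content, skipping the bodies of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &amp; etc. into characters
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Strip tags and collapse whitespace before the text is sent to Solr."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def solr_add_xml(doc_id, text):
    """Build a Solr <add> update message; 'id' and 'body' are hypothetical field names."""
    return ('<add><doc>'
            '<field name="id">%s</field>'
            '<field name="body">%s</field>'
            '</doc></add>') % (escape(doc_id), escape(text))
```

The resulting XML can be posted to the update handler with curl, as before. The HTML-stripping analyzer is then no longer needed on the indexed field, so that work moves off the Tomcat box.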


: I find that indexing HTML makes Tomcat use 100% CPU. It makes search become
: slow.
:
: So how do I balance indexing and searching?
:
:
: Web: I use Apache + PHP.
:
: Solr: I use Tomcat 6 + Java 1.6.


-Hoss






--
regards
jl


Re: how to balance index and search

2007-03-18 Thread James liu

2007/3/17, Chris Hostetter <[EMAIL PROTECTED]>:



: Can people from CNET tell us how Solr is used at CNET.com?

I really don't understand your question; here are some links to CNET.com
pages that use Solr...

http://www.cnet.com/4244-5_1-0.html?query=ipod
http://search.news.com/search?q=apple
http://reviews.cnet.com/4566-3121-0.html



I just want to know CNET.com's indexing and search architecture, if it can be
made public.
Many people who use Solr, or want to use it, would like to know about it and learn from it.



-Hoss






--
regards
jl


Score / Sort question

2007-03-18 Thread shai deljo

How do I force Solr to score documents that contain ALL terms first,
before results that contain only some of the terms?
The problem is that I don't want to use an AND (since I am also
interested in the OR results), but I do want to score documents that
contain all terms higher.
Please advise,
Thanks


Re: how to balance index and search

2007-03-18 Thread Chris Hostetter

: I think that will be a problem, since we use Windows 2003, and I remember the replication

The scripts that come with Solr don't work on Windows because they rely on
hardlinks to efficiently copy only things that have changed, but the
principle of indexing on one server, creating "snapshots" (which could be
true copies instead of hardlinks) and then replicating those snapshots out
to slave servers for searching is still a solid one.

The hooks Solr provides for triggering snapshot creation on the master and
snapshot installation on the slave make it possible for you to implement
those any way that makes sense for your environment.
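Windows lacks the hardlink trick, so the snapshot step can fall back to a full copy. The slave-side step can still avoid resending unchanged files. The following is a minimal sketch of that idea, with these assumptions:

- The index directory is flat.
- Segment files are not modified in place once written, so comparing names and sizes is a reasonable, if imperfect, change check.
- The directory names are hypothetical.

Ideally the snapshot would be triggered from the post-commit hook mentioned above, so the index is not changing mid-copy.

```python
import os
import shutil


def take_snapshot(index_dir, snapshot_root, name):
    """Master side: copy the whole index directory into snapshot_root/name.

    This is a real copy instead of hardlinks, so it also works on Windows.
    """
    dest = os.path.join(snapshot_root, name)
    shutil.copytree(index_dir, dest)
    return dest


def sync_snapshot(snapshot_dir, slave_index_dir):
    """Slave side: copy only new or changed files, then delete files the snapshot no longer has.

    Assumes a flat directory and uses name + size as a cheap change check.
    Returns the sorted list of file names that were copied.
    """
    os.makedirs(slave_index_dir, exist_ok=True)
    wanted = set(os.listdir(snapshot_dir))
    copied = []
    for name in sorted(wanted):
        src = os.path.join(snapshot_dir, name)
        dst = os.path.join(slave_index_dir, name)
        if not os.path.exists(dst) or os.path.getsize(dst) != os.path.getsize(src):
            shutil.copy2(src, dst)
            copied.append(name)
    for name in os.listdir(slave_index_dir):
        if name not in wanted:
            os.remove(os.path.join(slave_index_dir, name))
    return copied
```

Across machines, the copy would go over a network share or an rsync port. This sketch does not handle two things. First, the slave Solr still has to be told to open the new index afterward. Second, swapping files in under a running Solr needs care, especially on Windows, where open files cannot be deleted.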



-Hoss



Re: how to balance index and search

2007-03-18 Thread Chris Hostetter

: I just want to know CNET.com's indexing and search architecture, if it can be
: made public.
: Many people who use Solr, or want to use it, would like to know about it and learn from it.

I'm not sure what to tell you: Solr *is* our search architecture.  We have a dozen
or so Solr indexes; all of them use the master/slave model, but they
are all configured in various ways based on the nature of the data and the
types of queries we do.  The news collection doesn't do faceted search, and
surfacing new stories immediately is crucial, so they have small cache
configs, with very low autowarming, and replication cranked up to happen
very frequently.  Meanwhile the product index, where an update latency of 20
minutes isn't the end of the world but where we do want to support faceted
searching, does snapinstalls only every 15 minutes (I think), with big
caches that are 100% autowarmed.





-Hoss



Re: Score / Sort question

2007-03-18 Thread Chris Hostetter

: How do I force Solr to score documents that contain ALL terms first,
: before results that contain only some of the terms?

Generally speaking, this is the result you will usually get on random data ...
under the covers Lucene uses TF/IDF-based weighting of terms, with a coord
factor that penalizes documents that don't match all clauses, but I'm sure
it's possible that sometimes the tf is so high and the idf so low that
the score from one term can dominate.

The only solution to your problem that I can think of is to write a custom
Similarity class where tf and idf are fixed, so only the coord factor
matters.
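Fixing tf and idf leaves coord in charge of the score. To show why, the following models Lucene's classic scoring in simplified form. It assumes the classic DefaultSimilarity formulas:

- tf = sqrt(freq)
- idf = 1 + ln(numDocs / (docFreq + 1))
- coord = overlap / maxOverlap

It ignores field norms, boosts and queryNorm. This is an illustration of the arithmetic only, not a Lucene Similarity subclass. In Lucene itself, the custom class would roughly be a DefaultSimilarity subclass whose tf and idf return a constant. The collection statistics below are made up.

```python
import math


def default_tf(freq):
    # classic DefaultSimilarity: tf = sqrt(freq)
    return math.sqrt(freq)


def default_idf(doc_freq, num_docs):
    # classic DefaultSimilarity: idf = 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def flat_tf(freq):
    # the "custom Similarity" idea: every matching term counts the same
    return 1.0


def flat_idf(doc_freq, num_docs):
    return 1.0


def score(doc, query_terms, doc_freqs, num_docs, tf=default_tf, idf=default_idf):
    """Simplified OR-query score: sum of tf * idf^2 over matched terms, times coord."""
    overlap = 0
    total = 0.0
    for term in query_terms:
        freq = doc.get(term, 0)
        if freq:
            overlap += 1
            weight = idf(doc_freqs[term], num_docs)
            # idf appears twice: once in the query weight, once in the doc weight
            total += tf(freq) * weight * weight
    coord = overlap / len(query_terms)
    return total * coord


# Hypothetical collection statistics: "ipod" is rare, "case" is common.
NUM_DOCS = 1000
DOC_FREQS = {"ipod": 1, "case": 500}
QUERY = ["ipod", "case"]
one_term_doc = {"ipod": 50}             # mentions only ipod, many times
all_terms_doc = {"ipod": 1, "case": 1}  # mentions both terms once
```

With the default formulas, `one_term_doc` scores about 184 against about 55 for `all_terms_doc`. The high tf of the rare term outweighs the missing common term, even after coord halves its score. With `flat_tf` and `flat_idf`, the score reduces to overlap * overlap / maxOverlap, so `all_terms_doc` wins by 2.0 to 0.5.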




-Hoss



Re: Returning xx number of each group in a single query?

2007-03-18 Thread Chris Hostetter

There's nothing like that in Solr right now, but you could write a custom
RequestHandler to do it.

In theory you could even write a request handler that just successively
called another request handler by name (which could be a param), altering
the request params each time based on its input, and consolidating all of
the results.
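Until such a handler exists, the same "call once per group and consolidate" loop can run on the client. The following is a minimal sketch, with these assumptions:

- The list of group values is known up front.
- `search` is a hypothetical callable you would back with an HTTP request to Solr's select handler.
- The base URL passed to `select_url` is assumed to be the example server's default address.

```python
from urllib.parse import urlencode


def select_url(base, q, rows):
    """Build a select URL; base is assumed to be e.g. http://localhost:8983/solr/select."""
    return base + "?" + urlencode({"q": q, "rows": rows})


def top_n_per_group(search, user_query, group_field, group_values, n):
    """Run one query per group value, restricted to that group, and collect the results.

    search(q, rows) is a hypothetical function returning a list of documents.
    """
    results = {}
    for value in group_values:
        # restrict the user's query to a single group
        q = "(%s) AND %s:%s" % (user_query, group_field, value)
        results[value] = search(q, n)
    return results
```

This costs one round trip per group. That is fine for a handful of groups but grows linearly with their number.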

: Date: Fri, 16 Mar 2007 17:59:56 -0700 (PDT)
: From: Brian Lucas <[EMAIL PROTECTED]>
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Returning xx number of each group in a single query?
:
:
: Is there a way to fetch 5 records with group_id:1, 5 records with group_id:2,
: 5 records with group_id:3, and so forth in a single query?
:
: The facet features don't seem to give me what I need -- same with rows.  Any
: ideas on how to do something like this?
:
:
: --
: View this message in context: 
http://www.nabble.com/Returning-xx-number-of-each-group-in-a-single-query--tf3417627.html#a9525144
: Sent from the Solr - User mailing list archive at Nabble.com.
:



-Hoss



Re: Score / Sort question

2007-03-18 Thread shai deljo

I assumed tf/idf would behave like this, but it's behaving VERY
differently/wrongly, so I wonder whether something is wrong with my
indexing strategy?
For a quicker solution (OK, a hack :) ), I think I'll run two different
queries (AND, OR) and merge them.
Does Solr support some kind of merging I can leverage, or do I need to
do it manually?
Thanks,
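Done manually, the merge is short. The following is a minimal sketch. It assumes each query returns a ranked list of unique document ids. AND hits come first, followed by the OR hits that were not already seen.

```python
def merge_and_or(and_ids, or_ids, rows):
    """Return up to `rows` ids: all-terms (AND) matches first, then the remaining OR matches."""
    seen = set()
    merged = []
    for doc_id in list(and_ids) + list(or_ids):
        if doc_id in seen:
            continue  # every AND hit also appears in the OR results
        seen.add(doc_id)
        merged.append(doc_id)
        if len(merged) == rows:
            break
    return merged
```

Some OR hits duplicate AND hits. To fill a page of `rows` results, you may therefore need to request more rows from the OR query than you display.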


On 3/18/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:


: How do I force Solr to score documents that contain ALL terms first,
: before results that contain only some of the terms?

Generally speaking, this is the result you will usually get on random data ...
under the covers Lucene uses TF/IDF-based weighting of terms, with a coord
factor that penalizes documents that don't match all clauses, but I'm sure
it's possible that sometimes the tf is so high and the idf so low that
the score from one term can dominate.

The only solution to your problem that I can think of is to write a custom
Similarity class where tf and idf are fixed, so only the coord factor
matters.




-Hoss




Re: how to balance index and search

2007-03-18 Thread James liu

2007/3/19, Chris Hostetter <[EMAIL PROTECTED]>:



: I think that will be a problem, since we use Windows 2003, and I remember the replication

The scripts that come with Solr don't work on Windows because they rely on
hardlinks to efficiently copy only things that have changed, but the
principle of indexing on one server, creating "snapshots" (which could be
true copies instead of hardlinks) and then replicating those snapshots out
to slave servers for searching is still a solid one.



Now I'm reading up on cwRsync, which is rsync for Windows.

The hooks Solr provides for triggering snapshot creation on the master and
snapshot installation on the slave make it possible for you to implement
those any way that makes sense for your environment.


-Hoss






--
regards
jl


Re: how to balance index and search

2007-03-18 Thread James liu

2007/3/19, Chris Hostetter <[EMAIL PROTECTED]>:



: I just want to know CNET.com's indexing and search architecture, if it can be
: made public.
: Many people who use Solr, or want to use it, would like to know about it and learn from it.

I'm not sure what to tell you: Solr *is* our search architecture.



The information below is what I wanted to learn. Thanks, Chris.

Maybe this should be added to the wiki. I think people will be happy reading
it.

We have a dozen
or so Solr indexes; all of them use the master/slave model, but they
are all configured in various ways based on the nature of the data and the
types of queries we do.  The news collection doesn't do faceted search, and
surfacing new stories immediately is crucial, so they have small cache
configs, with very low autowarming, and replication cranked up to happen
very frequently.  Meanwhile the product index, where an update latency of 20
minutes isn't the end of the world but where we do want to support faceted
searching, does snapinstalls only every 15 minutes (I think), with big
caches that are 100% autowarmed.





-Hoss





--
regards
jl


Re: Returning xx number of each group in a single query?

2007-03-18 Thread nick19701

How about returning at most 1 result from each group in a single query?

For example, a website may have a lot of pages. When Google returns
search results, it only shows at most one result for each website. I have
a similar situation. Is there an easy way to solve this kind of problem?
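Nothing in this thread does this on the server side. A rough client-side approximation is to request more rows than you need and keep only the first hit per site, which preserves Solr's ranking among the documents that are kept. The following is a minimal sketch. It assumes a hypothetical `site` field stored on every document.

```python
def collapse(ranked_docs, group_of, max_per_group=1):
    """Keep at most max_per_group docs per group, preserving rank order.

    ranked_docs: documents in the order Solr returned them.
    group_of: function mapping a document to its group (e.g. its site).
    """
    counts = {}
    kept = []
    for doc in ranked_docs:
        group = group_of(doc)
        if counts.get(group, 0) < max_per_group:
            counts[group] = counts.get(group, 0) + 1
            kept.append(doc)
    return kept
```

How many extra rows to fetch is a guess that depends on how clustered the results are. If the collapsed list comes back shorter than a page, fetch the next batch and continue.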



Re: Score / Sort question

2007-03-18 Thread shai deljo

Hey Chris,
wouldn't doing something like this in the query:
(field1:tag1 tag2) OR (field1:tag1 AND tag2)

achieve a similar effect?

The documents that have all the tags (tag1 and tag2) will match
both conditions and get scores from both, while the documents that
don't have both tags will only get a score from the first (the OR)
condition and therefore won't score as high.

Is this right?
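One detail in the query above: in Lucene query syntax, a field prefix applies only to the term (or parenthesized group) right after it. So `field1:tag1 tag2` searches `tag2` in the default field, not in `field1`.

The following sketch builds the intended form. It uses explicit `OR` so the result does not depend on the configured default operator. The boost of 10 on the all-terms clause is an arbitrary, hypothetical value. It widens the gap, but for the tf/idf reasons above it does not strictly guarantee the ordering. Terms containing query-syntax characters would need escaping, which the sketch does not do.

```python
def all_terms_boost_query(field, terms, boost=10):
    """Match any term, plus a boosted clause that only matches docs containing every term."""
    any_clause = "%s:(%s)" % (field, " OR ".join(terms))
    # each term is required (+) inside the boosted group
    all_clause = "(%s)^%s" % (" ".join("+%s:%s" % (field, t) for t in terms), boost)
    return "%s OR %s" % (any_clause, all_clause)
```

For example, `all_terms_boost_query("field1", ["tag1", "tag2"])` yields `field1:(tag1 OR tag2) OR (+field1:tag1 +field1:tag2)^10`.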



On 3/18/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:


: How do I force Solr to score documents that contain ALL terms first,
: before results that contain only some of the terms?

Generally speaking, this is the result you will usually get on random data ...
under the covers Lucene uses TF/IDF-based weighting of terms, with a coord
factor that penalizes documents that don't match all clauses, but I'm sure
it's possible that sometimes the tf is so high and the idf so low that
the score from one term can dominate.

The only solution to your problem that I can think of is to write a custom
Similarity class where tf and idf are fixed, so only the coord factor
matters.




-Hoss




Re: Returning xx number of each group in a single query?

2007-03-18 Thread Ryan McKinley

You may want to take a look at the related discussion:
http://www.nabble.com/result-grouping--tf2910425.html#a8131895

Yonik suggested a dynamic priority queue... If the number of things
you are grouping by is small, it is probably easier to make multiple
calls to Solr.
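For many groups, one reading of the priority-queue suggestion is to keep a small bounded heap per group while walking the scored hits. Memory then stays at n entries per group, however many hits there are. The following is a minimal sketch of that data structure only. Inside Solr it would live in a custom request handler, which is not shown. The `(score, group, doc_id)` tuples are a hypothetical input format.

```python
import heapq


def top_n_per_group_heap(scored_docs, n):
    """scored_docs: iterable of (score, group, doc_id) in any order.

    Returns {group: [doc_id, ...]} with the n best-scoring ids per group, best first.
    """
    heaps = {}
    for score, group, doc_id in scored_docs:
        heap = heaps.setdefault(group, [])
        item = (score, doc_id)
        if len(heap) < n:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            # heap[0] is the weakest entry kept so far for this group; replace it
            heapq.heapreplace(heap, item)
    return {g: [d for s, d in sorted(h, reverse=True)] for g, h in heaps.items()}
```

Unlike running one query per group, this needs a single pass over the matches. The trade-off is that the pass visits every matching document rather than just the top rows.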

ryan


On 3/16/07, Brian Lucas <[EMAIL PROTECTED]> wrote:


Is there a way to fetch 5 records with group_id:1, 5 records with group_id:2,
5 records with group_id:3, and so forth in a single query?

The facet features don't seem to give me what I need -- same with rows.  Any
ideas on how to do something like this?






Re: Score / Sort question

2007-03-18 Thread Walter Underwood
An example would help. A query and the results that you see.

wunder

On 3/18/07 6:48 PM, "shai deljo" <[EMAIL PROTECTED]> wrote:

> I assumed tf/idf would behave like this, but it's behaving VERY
> differently/wrongly, so I wonder whether something is wrong with my
> indexing strategy?
> For a quicker solution (OK, a hack :) ), I think I'll run two different
> queries (AND, OR) and merge them.
> Does Solr support some kind of merging I can leverage, or do I need to
> do it manually?
> Thanks,
> 
> 
> On 3/18/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>> 
>> : How do I force Solr to score documents that contain ALL terms first,
>> : before results that contain only some of the terms?
>> 
>> Generally speaking, this is the result you will usually get on random data ...
>> under the covers Lucene uses TF/IDF-based weighting of terms, with a coord
>> factor that penalizes documents that don't match all clauses, but I'm sure
>> it's possible that sometimes the tf is so high and the idf so low that
>> the score from one term can dominate.
>> 
>> The only solution to your problem that I can think of is to write a custom
>> Similarity class where tf and idf are fixed, so only the coord factor
>> matters.
>> 
>> 
>> 
>> 
>> -Hoss
>> 
>> 



Re: Returning xx number of each group in a single query?

2007-03-18 Thread Brian Lucas

Thanks to everyone who responded thus far.  

Simple is good for right now.  Chris, is there a way to do what you describe
here (write a request handler that successively calls another request
handler by name) in the solrconfig.xml file, or does this require me to
write a custom RequestHandler in Java?

Brian

Chris Hostetter wrote:
> 
> 
> There's nothing like that in Solr right now, but you could write a custom
> RequestHandler to do it.
> 
> In theory you could even write a request handler that just successively
> called another request handler by name (which could be a param), altering
> the request params each time based on its input, and consolidating all of
> the results.
> 
> : Date: Fri, 16 Mar 2007 17:59:56 -0700 (PDT)
> : From: Brian Lucas <[EMAIL PROTECTED]>
> : Reply-To: solr-user@lucene.apache.org
> : To: solr-user@lucene.apache.org
> : Subject: Returning xx number of each group in a single query?
> :
> :
> : Is there a way to fetch 5 records with group_id:1, 5 records with group_id:2,
> : 5 records with group_id:3, and so forth in a single query?
> :
> : The facet features don't seem to give me what I need -- same with rows.  Any
> : ideas on how to do something like this?
> :
> :
> :
> 
> 
> 
> -Hoss
> 
> 
> 
