Re: capping term frequency?

2008-04-14 Thread peter360
Thanks, that worked! Otis Gospodnetic wrote: > > Hi, > > Probably by writing your own Similarity (Lucene codebase) and implementing > the following method with capping: > > /** Implemented as sqrt(freq). */ > public float tf(float freq) { > return (float)Math.sqrt(freq); > } > > The
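Otis's suggestion can be sketched standalone. The cap value of 10 below is an illustrative assumption; in a real deployment this body would live inside your own Lucene Similarity subclass's tf(float freq) override rather than in a plain class.

```java
// Minimal sketch of capped term frequency, assuming an illustrative cap
// of 10. In Lucene this logic would go in a Similarity subclass's
// tf(float freq) override; it is a plain class here so it runs standalone.
public class CappedTf {
    static final float CAP = 10f; // assumed cap, tune for your data

    // Same sqrt curve as the default, but flat above CAP, so repeating a
    // term hundreds of times stops inflating the score.
    public static float tf(float freq) {
        return (float) Math.sqrt(Math.min(freq, CAP));
    }
}
```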

Re: Interleaved results from different sources

2008-04-14 Thread Mike Klaas
On 14-Apr-08, at 6:14 PM, s d wrote: We have an index of documents from different sources and we want to make sure the results we display are interleaved from the different sources and not only ranked based on relevancy. Is there a way to do this? By far the easiest way is to get the top N
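Mike's approach (query each source separately, then merge client-side) can be sketched as a round-robin merge; the method name and list-of-lists shape below are illustrative assumptions, not Solr API.

```java
import java.util.ArrayList;
import java.util.List;

public class InterleaveResults {
    // Round-robin merge of per-source ranked lists. Assumes one Solr query
    // per source (e.g. filtering on a "source" field) has already produced
    // each inner list in relevance order.
    public static <T> List<T> interleave(List<List<T>> perSource, int limit) {
        List<T> out = new ArrayList<T>();
        for (int rank = 0; out.size() < limit; rank++) {
            boolean tookAny = false;
            for (List<T> src : perSource) {
                if (rank < src.size() && out.size() < limit) {
                    out.add(src.get(rank)); // best remaining hit from this source
                    tookAny = true;
                }
            }
            if (!tookAny) break; // every source exhausted
        }
        return out;
    }
}
```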

Re: Snipets Solr/nutch

2008-04-14 Thread Mike Klaas
On 13-Apr-08, at 3:25 AM, khirb7 wrote: it doesn't work; Solr still uses the default value fragsize=100. Also I am not able to specify the regex fragmenter, due to this version problem I suppose, or the way I am declaring ..highlighting> because both of: Hi khirb, It might be easi

Interleaved results from different sources

2008-04-14 Thread s d
We have an index of documents from different sources and we want to make sure the results we display are interleaved from the different sources and not only ranked based on relevancy. Is there a way to do this? Thanks, S.

Re: Zappos's new Solr Site

2008-04-14 Thread Brian Mansell
Matthew - Thanks for sharing this example. The Zeta site search works well and provided results to my test queries instantly. cheers, --bemansell On Fri, Apr 11, 2008 at 10:35 AM, Matthew Runo <[EMAIL PROTECTED]> wrote: > Hello folks! > > First, the link: https://zeta.zappos.com (it's a very ear

RE: capping term frequency?

2008-04-14 Thread Norskog, Lance
Doing this well is harder. Giving a spam score to each page and boosting by a function on this score is probably a stronger tool. Can't remember where I found it. Gives a solid spam score algorithm for several easy-to-code text analyses and a scoring function. This assumes you pre-process. Detectin

Fuzzy queries in dismax specs?

2008-04-14 Thread Walter Underwood
I've started implementing something to use fuzzy queries for selected fields in dismax. The request handler spec looks like this: exact~0.7^4.0 stemmed^2.0 If anyone has already done this, I'd be glad to use it. I'm working with an older version of Solr, so I won't have a 1.2 patch right away
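For context, Walter's spec extends the stock dismax qf parameter, which in a plain solrconfig.xml request handler looks something like the sketch below; the field names are placeholders, and the ~0.7 fuzzy marker is his proposed addition, not standard dismax syntax.

```xml
<!-- Hypothetical dismax handler config; field names are placeholders. -->
<requestHandler name="dismax" class="solr.DisMaxRequestHandler">
  <lst name="defaults">
    <!-- Stock syntax is field^boost. Walter's proposal adds field~minSim^boost,
         e.g. exact~0.7^4.0, to make per-field query terms fuzzy. -->
    <str name="qf">exact^4.0 stemmed^2.0</str>
  </lst>
</requestHandler>
```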

Re: too many queries?

2008-04-14 Thread Otis Gospodnetic
It's hard to tell from the info given, though something doesn't sound ideal. Even if Solr's caching doesn't help, with only 4M documents, your Solr search slaves should be able to keep the whole index in RAM, assuming your index is not huge. How large is the index? (GB on disk) Is it optimized

too many queries?

2008-04-14 Thread Jonathan Ariel
Hi, I have some questions about performance for you guys. So basically I have 2 slave Solr servers and 1 master Solr server, load balanced, with around 100 requests/second, approx. 50 requests per second per Solr server. My index is about 4 million documents and the average query response time is 0.6 sec

Re: Searching for popular phrases or words

2008-04-14 Thread Chris Hostetter
it depends on your definition of "popular". If you mean "occurs in a lot of documents" then take a look at the LukeRequestHandler ... it can give you info on terms with high frequencies (and you can use a Shingles based tokenizer to index "phrases" as terms). If by popular you mean "occurs in a lo
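The shingle idea Hoss mentions (indexing word n-grams as single terms) boils down to the sketch below; in Solr you would do this with a shingle token filter at index time, so this standalone bigram builder only illustrates the concept.

```java
import java.util.ArrayList;
import java.util.List;

public class ShingleSketch {
    // Builds two-word shingles ("bigrams") from a token sequence. Indexing
    // these as single terms lets term-frequency tools like the
    // LukeRequestHandler report frequent *phrases*, not just single words.
    public static List<String> bigrams(String[] tokens) {
        List<String> out = new ArrayList<String>();
        for (int i = 0; i + 1 < tokens.length; i++) {
            out.add(tokens[i] + " " + tokens[i + 1]);
        }
        return out;
    }
}
```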

Re: Can I find the which field matched?

2008-04-14 Thread Chris Hostetter
the Lucene Scorers don't keep track of component scores as they go, the cumulative score is calculated all at once. For specific documents your plugin could use the SolrIndexSearcher.explain method to execute logic that will build up a data structure showing the intermediate calculations. -H

Re: issues with solr

2008-04-14 Thread Erik Hatcher
There is an "Ant script" section on that mySolr page. But there is no need to use any of that for your project. All you need is Solr's WAR file and the appropriate Solr configuration files and you're good to go. Erik On Apr 14, 2008, at 9:12 AM, dudes dudes wrote: thanks Erik,

Re: solr query time

2008-04-14 Thread neil22
Thanks for the info. Is the whole result set read into memory, meaning that the number of matches I can have for a query is limited by my machine's memory? Otis Gospodnetic wrote: > > Hi, > rows=N param just tells Solr how many top N results to return. Solr (and > Lucene, really) still n

Re: how to apply a patch

2008-04-14 Thread Grant Ingersoll
Note, this patch has been applied to trunk. At any rate, here's what I did for this patch: Downloaded it to my Solr patches directory from the JIRA website. My setup looks like: ./patches ./solr-clean #contains a clean copy of Solr. It never has uncommitted patches on it. On the command line >c

Re: how to apply a patch

2008-04-14 Thread khirb7
Grant Ingersoll-6 wrote: > > I generally do: > > svn up (make sure I am up to date) > patch -p 0 -i [--dry-run] > > I usually do the --dry-run first to see if it applies cleanly, then > drop it if it does. > > HTH, > Grant > > On Apr 13, 2008, at 10:37 AM, khirb7 wrote: > >> >> hello

RE: capturing page numbers

2008-04-14 Thread Binkley, Peter
Tricia Williams is working on this problem in https://issues.apache.org/jira/browse/SOLR-380, and there is a patch you can try (instructions at https://issues.apache.org/jira/browse/SOLR-380?focusedCommentId=12541699#action_12541699). It uses Lucene payloads to carry the page information, and requ

Count of facet count

2008-04-14 Thread Dawe
Hello, how can I get the count of distinct facet_fields, like a numFacetFound, in this example: http://localhost:8983/solr/select?q=xxx&rows=0&facet=true&facet.limit=10&facet.field=county Thanks

Re: Multicore Issue with nightly build

2008-04-14 Thread Ryan McKinley
On Apr 10, 2008, at 3:48 PM, kirk beers wrote: Hi Ryan, I still can't seem to get my Solr cores, core0 and core1, to accept new documents. I changed the appropriate code in the Perl client to accommodate the core as you mentioned in the previous email. I am able to delete docs. Is there

capturing page numbers

2008-04-14 Thread AHMET ARSLAN
I have extracted text from .pdf files and I also inserted the page numbers of the .pdf file into the text. My document looks something like: ..Some Text.. ..Some Text.. .. ... I indexed my data using Solr and I am making highli

Re: solr query time

2008-04-14 Thread Otis Gospodnetic
Hi, rows=N param just tells Solr how many top N results to return. Solr (and Lucene, really) still needs to find all documents that match the query and then score them (and optionally sort them). The more documents and matches you have, the more time the query will take. Otis -- Sematext --
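Otis's point, and the memory question it answers, comes down to how top-N collection works: every match is scored, so time grows with the number of matches, but only N entries are kept in a bounded heap, so memory does not grow with the result set. A self-contained sketch of that collection step, assuming plain float scores (not the actual Lucene collector API):

```java
import java.util.PriorityQueue;

public class TopNCollector {
    // Keeps the N best of an arbitrarily long stream of scores. Time is
    // proportional to the number of matches scored; memory is bounded by N,
    // which is why rows=10 doesn't pull the whole result set into RAM.
    public static float[] topN(float[] scores, int n) {
        PriorityQueue<Float> heap = new PriorityQueue<Float>(); // min-heap
        for (float s : scores) {
            if (heap.size() < n) {
                heap.offer(s);
            } else if (s > heap.peek()) {
                heap.poll();   // evict the weakest of the current top N
                heap.offer(s);
            }
        }
        float[] out = new float[heap.size()];
        for (int i = out.length - 1; i >= 0; i--) {
            out[i] = heap.poll(); // drain ascending, fill descending
        }
        return out;
    }
}
```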

solr query time

2008-04-14 Thread neil22
It seems that response time to a query is linear with the size of the result set, even if I only ever want the first 10 hits back. Testing I did - 1 million documents that have "feature1", all with the same score - query time = 3 seconds to get first 10 hits. 10 million documents t

RE: issues with solr

2008-04-14 Thread dudes dudes
thanks Erik, Basically I have used the build file from Solr, not from that page... I have had a look and couldn't really find their build.xml file! thanks ak > From: [EMAIL PROTECTED] > Subject: Re: issues with solr > Date: Mon, 14 Apr 2008 08:54:39 -

Re: issues with solr

2008-04-14 Thread Erik Hatcher
The mysolr.dist target is defined in the Ant file on that page. My guess is that you were not using the Ant build file bits there. My take is that the mySolr page is not quite what folks should be cloning for incorporation of Solr into their application. Maybe that page should be removed

issues with solr

2008-04-14 Thread dudes dudes
Hello there, I'm new to Solr. I'm trying to deploy the example under http://wiki.apache.org/solr/mySolr. However, every time I issue ant mysolr.dist it generates: Buildfile: build.xml BUILD FAILED Target "mysolr.dist" does not exist in the project "solr". I'm running Ubuntu getty an