boost type:true

2009-02-26 Thread sunnyfr
Hello everybody, Little question: status_official:true^1,5 How come this doesn't return any data, whereas if I remove status_official it does return data? I tried adding status_official:false^1 but nothing comes up, and if I remove this param I get some values. I would like to boost some statuses.

boost several boolean in bq

2009-02-26 Thread sunnyfr
Hi, I don't get where I'm wrong. I would like to boost some types of my books. So if I do: &bq=status_official:0^1.5+status_creative:0^1.5 I get one result. If I do: &bq=status_official:1^1.5+status_creative:1^1.5 nothing. I think the results should still come up even if they don't have this status.

Re: boost several boolean in bq

2009-02-26 Thread sunnyfr
I've actually added (status_official:1 OR status_creative:1)^2.5 sunnyfr wrote: > > Hi > > I dont get where I'm wrong. > I would like to boost some type of my books. > > So If I do : &bq=status_official:0^1.5+status_creative:0^1.5 > I've one result > > If I do: &bq=status_official:1^1.5+st

dataDir configuration

2009-02-26 Thread Thijs
Hi, I just upgraded from solr-1.3-dev to 1.4-dev and I'm having issues with the location of the dataDir. I configure Solr through -Dsolr.solr.home= /u01/app/apptest/solr In v1.3 the dataDir is located in /u01/app/apptest/solr/data However, when I drop the 1.4 war in place the dataDir is opened

custom reranking

2009-02-26 Thread CIF Search
We have a distributed index consisting of several shards. There could be some documents repeated across shards. We want to remove the duplicate records from the documents returned from the shards, and re-order the results by grouping them on the basis of a clustering algorithm and reranking the doc

What is the best scalable scheme to support multiple users?

2009-02-26 Thread Vikram B. Kumar
Hi All, Our web based document management system has few thousand users and is growing rapidly. Like any SaaS, while we support a lot of customers, only few of them (those logged in) will be reading their index and only a subset of those logged in (who are adding documents) will be writing to

dismax + and -

2009-02-26 Thread sunnyfr
Hi, how come if I put in my query q=+wow-kill wow-kill dismax I get books which contain wow and kill instead of books which have wow in the title without kill? Thanks a lot,

Re: custom reranking

2009-02-26 Thread Grant Ingersoll
On Feb 26, 2009, at 6:04 AM, CIF Search wrote: We have a distributed index consisting of several shards. There could be some documents repeated across shards. We want to remove the duplicate records from the documents returned from the shards, and re-order the results by grouping them on the

Re: unique result

2009-02-26 Thread Grant Ingersoll
I presume these all have different unique ids? If you can address it at indexing time, then have a look at https://issues.apache.org/jira/browse/SOLR-799 Otherwise, you might look at https://issues.apache.org/jira/browse/SOLR-236 On Feb 25, 2009, at 6:54 PM, Cheng Zhang wrote: Is it possibl

Re: What is the best scalable scheme to support multiple users?

2009-02-26 Thread Walter Underwood
1a. Multiple Solr instances partitioned by user_id%N, with index files segmented by user_id field. That can scale rather gracefully, though it does need reindexing to add a server. wunder On 2/26/09 3:44 AM, "Vikram B. Kumar" wrote: > Hi All, > > Our web based document management system has f

Re: dismax + and -

2009-02-26 Thread Jeff Newburn
In your example there is no space between +wow and -kill, so my guess is that Solr is interpreting it as wow-kill, all one word. Then, depending on the field type, the tokenizer is probably splitting wow and kill into 2 words along the -. -- Jeff Newburn > From: sunnyfr > Reply-To: > Date: Thu, 26 Fe
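
A minimal sketch of the point above, assuming the query is sent as an ordinary URL query string: a raw "+" in a URL is decoded as a space, so the leading "+" never reaches the query parser unless it is percent-encoded, and the space before "-kill" has to be present in the user query. The helper name build_query is hypothetical and only the q parameter is shown.

```python
from urllib.parse import parse_qs, urlencode


def build_query(user_query):
    """Encode a user query so that '+' and spaces survive URL decoding."""
    # urlencode turns '+' into %2B and a space into '+'.
    return urlencode({"q": user_query})


# What the server sees when the raw string from the original post is sent:
# the '+' is decoded as a space and wow-kill stays a single token.
raw_decoded = parse_qs("q=+wow-kill")["q"][0]

# What the server sees when the query has a space and is encoded first.
encoded = build_query("+wow -kill")
encoded_decoded = parse_qs(encoded)["q"][0]

print(repr(raw_decoded))
print(encoded)
print(repr(encoded_decoded))
```

Whether the "-" is then treated as an exclusion still depends on the query parser and on how the field is tokenized, as Jeff notes.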

Re: What is the best scalable scheme to support multiple users?

2009-02-26 Thread Vikram Kumar
Hi Wunder, Can you please elaborate? Vikram On Thu, Feb 26, 2009 at 10:13 AM, Walter Underwood wrote: > 1a. Multiple Solr instances partitioned by user_id%N, with index > files segmented by user_id field. > > That can scale rather gracefully, though it does need reindexing > to add a server. > >

order of word in the request

2009-02-26 Thread sunnyfr
Hi guys, I'm looking for a parameter or a way to boost on the order of the words in the query. Let's imagine people look for the book "rich & famous" ... so in the search they will just write rich & famous, and let's imagine a book with a better rating and a lot of views is titled something like famous & very rich is there

Use of scanned documents for text extraction and indexing

2009-02-26 Thread Sudarsan, Sithu D.
Hi All: Is there any study / research done on using scanned paper documents as images (may be PDF), and then use some OCR or other technique for extracting text, and the resultant index quality? Thanks in advance, Sithu D Sudarsan sithu.sudar...@fda.hhs.gov sdsudar...@ualr.edu

Re: Use of scanned documents for text extraction and indexing

2009-02-26 Thread Hannes Carl Meyer
Hi Sithu, there is a project called ocropus done by the DFKI, check the online demo here: http://demo.iupr.org/cgi-bin/main.cgi And also http://sites.google.com/site/ocropus/ Regards Hannes m...@hcmeyer.com http://mimblog.de On Thu, Feb 26, 2009 at 5:29 PM, Sudarsan, Sithu D. < sithu.sudar...

Re: order of word in the request

2009-02-26 Thread Yonik Seeley
On Thu, Feb 26, 2009 at 11:25 AM, sunnyfr wrote: > How can I tell it to put a lot of more weight for the book which has exactly > the same title. A sloppy phrase query should work. See the "pf" param in the dismax query parser. -Yonik http://www.lucidimagination.com
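
A rough sketch of what Yonik's suggestion could look like as request parameters. qf, pf and ps are dismax parameters (query fields, phrase fields, phrase slop); the field name "title", the boost of 10 and the slop of 2 are assumptions chosen for illustration, not values from the thread.

```python
from urllib.parse import urlencode


def phrase_boost_params(user_query, field="title", phrase_boost=10, slop=2):
    """Build dismax parameters that reward documents matching the whole phrase."""
    return {
        "q": user_query,
        # Field searched for the individual words (assumed field name).
        "qf": field,
        # Same field again as a phrase field, with a hypothetical boost.
        "pf": f"{field}^{phrase_boost}",
        # Phrase slop: how far apart the words may be and still count.
        "ps": str(slop),
    }


params = phrase_boost_params("rich famous")
query_string = urlencode(params)
print(query_string)
```

With a small slop, a title containing the words close together and in the same order should score higher than one where they are reversed or far apart; the right values need tuning against real data.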

Re: What is the best scalable scheme to support multiple users?

2009-02-26 Thread Walter Underwood
With five servers, assign 1/5 of user_id's to each server. Choose the number of servers to handle the number of logged-in users. Each user's searches go to the single server with their data. Partitioning by user_id is common with relational databases. We do this to hold our two billion movie ratin
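
The user_id%N scheme Walter describes can be sketched as follows. The host names are hypothetical, and the count at the end only illustrates why adding a server means reindexing: most users map to a different server once N changes.

```python
def server_for_user(user_id, servers):
    """Pick the single server that holds this user's documents."""
    return servers[user_id % len(servers)]


# Hypothetical host names, five servers as in the example above.
five = [f"solr{i}.example.com" for i in range(5)]
six = five + ["solr5.example.com"]

home = server_for_user(12, five)

# Users whose server changes when a sixth server is added.
moved = sum(
    1
    for user_id in range(1000)
    if server_for_user(user_id, five) != server_for_user(user_id, six)
)

print(home)
print(moved)
```

Both indexing and searching go through the same function, so a user's documents and queries always land on the same server.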

RE: Use of scanned documents for text extraction and indexing

2009-02-26 Thread Sudarsan, Sithu D.
Thanks Hannes, The tool looks good. Sincerely, Sithu D Sudarsan sithu.sudar...@fda.hhs.gov sdsudar...@ualr.edu -Original Message- From: hannesc...@googlemail.com [mailto:hannesc...@googlemail.com] On Behalf Of Hannes Carl Meyer Sent: Thursday, February 26, 2009 11:35 AM To: solr-user@l

Re: unique result

2009-02-26 Thread Cheng Zhang
It's exactly what I'm looking for. Thank you Grant. - Original Message From: Grant Ingersoll To: solr-user@lucene.apache.org Sent: Thursday, February 26, 2009 6:56:22 AM Subject: Re: unique result I presume these all have different unique ids? If you can address it at indexing time,

Re: Use of scanned documents for text extraction and indexing

2009-02-26 Thread Shashi Kant
Another project worth investigating is Tesseract. http://code.google.com/p/tesseract-ocr/ - Original Message From: Hannes Carl Meyer To: solr-user@lucene.apache.org Sent: Thursday, February 26, 2009 11:35:14 AM Subject: Re: Use of scanned documents for text extraction and indexing H

SolrCoreAware analyzer

2009-02-26 Thread Bojan Šmid
Hello, I am writing a custom analyzer for my field type. This analyzer would need to use SolrResourceLoader and SolrConfig, so I want to make it SolrCoreAware. However, it seems that Analyzer classes aren't supposed to be used in this way (as described in http://wiki.apache.org/solr/SolrPlugins).

Lucene sync bottleneck?

2009-02-26 Thread Matthew Runo
Hello folks! I was under the impression that this sync bottleneck was fixed in recent versions of Solr/Lucene, but we're seeing it with 1.4-dev right now. When we load test a server with >100 threads (using jmeter), we see several threads all blocked at the same spot: "http-8080-exec-505"

Re: Lucene sync bottleneck?

2009-02-26 Thread Yonik Seeley
That's interesting. We should be using read-only readers, which should not synchronize on the deleted docs check. But as your stack trace shows, you're using SegmentReader and MultiSegmentReader. Right now, if I look at the admin/statistics page at the searcher, it shows the following for the rea

Re: Lucene sync bottleneck?

2009-02-26 Thread Matthew Runo
I see a ReadOnlySegmentReader now - we're on an optimized index now which gets around the isDeleted() check. (solr4, optimized) searcherName : searc...@260f8e27 main caching : true numDocs : 139583 maxDoc : 139583 readerImpl : ReadOnlySegmentReader readerDir : org.apache.lucene.store.NIOFSDirec

RE: Use of scanned documents for text extraction and indexing

2009-02-26 Thread Renaud Waldura
There is quite a bit of literature available on this topic. This paper presents a summary. Nothing immediately applicable I'm afraid. Retrieving OCR Text: A survey of current approaches Steven M. Beitzel, Eric C. Jensen, David A Grossman Illinois Institute of Technology It lists a number of othe

Re: Number of webapps

2009-02-26 Thread Alexander Ramos Jardim
Another simple solution for your requirement is to use multicore. This way you will have only one Solr webapp loaded with as many indexes as you need. See more at http://wiki.apache.org/solr/MultiCore 2009/2/25 Michael Della Bitta > Unfortunately, I think the way this works is the container cre

Re: SolrCoreAware analyzer

2009-02-26 Thread Chris Hostetter
: I am writing a custom analyzer for my field type. This analyzer would need : to use SolrResourceLoader and SolrConfig, so I want to make it : SolrCoreAware. 1) Solr's support for using Analyzer instances is mainly just to make it easy for people who already have existing Analyzer impls that th

Re: Lucene sync bottleneck?

2009-02-26 Thread Chris Hostetter
: We should be using read-only readers, which should not synchronize on FWIW: skimming through code that i don't normally look at to see the new read only changes i noticed this in SolrCore... // gets a non-caching searcher public SolrIndexSearcher newSearcher(String name, boolean readOnly)

Re: Use of scanned documents for text extraction and indexing

2009-02-26 Thread Vikram Kumar
Tesseract is pure OCR. Ocropus builds on Tesseract. Vikram On Thu, Feb 26, 2009 at 12:11 PM, Shashi Kant wrote: > Another project worth investigating is Tesseract. > > http://code.google.com/p/tesseract-ocr/ > > > > > - Original Message > From: Hannes Carl Meyer > To: solr-user@lucene.

Re: warming question

2009-02-26 Thread Jonathan Haddad
Does anyone have any good documentation that explains how to set up the warming feature within the config? On Wed, Feb 25, 2009 at 11:58 AM, Marc Sturlese wrote: > > Shalin your patch worked perfect for my use case. > Thank's both for the information! > > > > Amit Nithian wrote: >> >> I'm actuall

Re: custom reranking

2009-02-26 Thread CIF Search
I believe the query component will generate the query in such a way that i get the results that i want, but not process the returned results, is that correct? Is there a way in which i can group the returned results, and rank each group separately, and return the results together. In other words wh

Re: Use of scanned documents for text extraction and indexing

2009-02-26 Thread Shashi Kant
Can anyone back that up? IMHO Tesseract is the state-of-the-art in OCR, but not sure that "Ocropus builds on Tesseract". Can you confirm that Vikram has a point? Shashi - Original Message From: Vikram Kumar To: solr-user@lucene.apache.org; Shashi Kant Sent: Thursday, February 26,

Re: rsync

2009-02-26 Thread Otis Gospodnetic
Hi, If the master goes down and the slave(s) already have the index, search keeps working. If the master goes down during replication, search will keep working, but the slave will not have/see the most recent index changes. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nu

Re: Use of scanned documents for text extraction and indexing

2009-02-26 Thread Bastian Buch
You can use Tesseract, an open-source OCR engine owned by Google. It's native C code, and to use it from Java you should use JNI or direct process creation. There is no PDF support, but you can use imagemagick to convert those docs on the fly. The engine scans documents line by line without trying
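
A minimal sketch of the "direct process creation" route, written in Python rather than Java for brevity. It assumes the ImageMagick convert and tesseract command-line tools are installed and on the PATH, and that the input is a single-page PDF; the file names and the 300 dpi density are illustrative choices, and some Tesseract versions may need a specific image format.

```python
import subprocess


def ocr_commands(pdf_path, image_path="page.tif", output_base="page"):
    """Return the two command lines: PDF to image, then image to text."""
    # ImageMagick: rasterize the PDF at an assumed 300 dpi.
    convert_cmd = ["convert", "-density", "300", pdf_path, image_path]
    # Tesseract writes its result to <output_base>.txt.
    tesseract_cmd = ["tesseract", image_path, output_base]
    return [convert_cmd, tesseract_cmd]


def run_ocr(pdf_path, output_base="page"):
    """Run both tools in sequence and return the recognized text."""
    for cmd in ocr_commands(pdf_path, output_base=output_base):
        subprocess.run(cmd, check=True)
    with open(output_base + ".txt", encoding="utf-8") as handle:
        return handle.read()


commands = ocr_commands("scan.pdf")
print(commands)
```

The text returned by run_ocr could then be sent to Solr as an ordinary field value.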

Re: dataDir configuration

2009-02-26 Thread Noble Paul നോബിള്‍ नोब्ळ्
I guess this is a bug introduced by SOLR-943. We shall raise an issue. (JIRA is down now) --Noble On Thu, Feb 26, 2009 at 4:26 PM, Thijs wrote: > Hi, > > I just upgraded from solr-1.3-dev to 1.4-dev and I'm having issues with the > location of the dataDir. > > I configure solr through -Dsolr.solr.