Re: Tomcat vs Jetty: A Comparative Analysis?

2010-02-16 Thread Steve Radhouani
Thanks Ron. Actually, I'm developing a Web search engine. Would that matter? Thanks. 2010/2/16 Ron Chan > > I'd doubt if a performance benchmark would be very useful, it ultimately > depends on what you are trying to do and what you are comfortable with. > > We've had successful deployments on

Re: Copying dynamic fields into default text field messing up fieldNorm?

2010-02-16 Thread Chris Hostetter
: > I believe Koji was mistaken. Looking at DocumentBuilder.toDocument, the : > boosts have been propagated to copyField destinations since that method was : > added in 2007 (initially it didn't deal with copyfields at all, but once : > that was fixed it copied the boosts as well.) ... : Hm

Re: Upgrading from solr1.3 to solr1.4

2010-02-16 Thread Rakhi Khatwani
Hi, Solr home: 1.3.0/examples/multicore Type of Queries: Recursive e.g. I search in the index for some name that returns some rows. For each row there is a field called parentid which is a unique key for some other row in the index. The next queries search the index for the parentid . This continue

Re: Performance-Issues and raising numbers of "cumulative inserts"

2010-02-16 Thread Antonio Lobato
I've actually run into this issue; huge, 30 minute warm up times. I've found that reducing the auto-warm count on caches (and the general size of the cache) helped a -lot-, as did making sure my warm up query wasn't something like: q=*:*&facet=true&facet.field=somethingWithAWholeLotOfTerms T

Re: Preventing mass index delete via DataImportHandler full-import

2010-02-16 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Wed, Feb 17, 2010 at 8:03 AM, Chris Hostetter wrote: > > : I have a small worry though. When I call the full-import functions, can > : I configure Solr (via the XML files) to make sure there are rows to > : index before wiping everything? What worries me is if, for some unknown > : reason, we h

Re: Performance-Issues and raising numbers of "cumulative inserts"

2010-02-16 Thread Lance Norskog
These are some very large numbers. 700k ms is 700 seconds, or about 11 minutes; 4M ms is 4k seconds, or about 66 minutes. No Solr installation should take this long to warm up. There is something very wrong here. Have you optimized lately? What queries do you run to warm it up? And, the basics: how many documents, how much da
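For reference, here is the conversion behind these figures as a small Python sketch. The two millisecond values are the ones reported in this thread, and nothing Solr-specific is involved.

```python
def ms_to_seconds(ms: int) -> float:
    """Convert milliseconds to seconds."""
    return ms / 1000

def ms_to_minutes(ms: int) -> float:
    """Convert milliseconds to minutes (60,000 ms per minute)."""
    return ms / 60_000

# Warm-up times reported in this thread.
for label, ms in (("usual warm-up", 700_000), ("slow warm-up", 4_000_000)):
    print(f"{label}: {ms:,} ms = {ms_to_seconds(ms):,.0f} s = {ms_to_minutes(ms):.1f} min")
```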

Re: regarding ranking

2010-02-16 Thread Lance Norskog
Norms are generally not calculated. You need to change the field you want with this attribute: omitNorms="false". On Tue, Feb 16, 2010 at 2:38 PM, Ahmet Arslan wrote: >> After getting aware of all >> these combinations, it seems not >> wise to proceed blindly by punishing whatever we want. >> Th

Re: Tool for analyzing data in solr

2010-02-16 Thread Lance Norskog
This is the CheckIndex program in Lucene. I don't have a link handy for running it, but it is in the lucene-core jar file in solr/lib. On Tue, Feb 16, 2010 at 11:08 AM, dipti khullar wrote: > Hi All > > Is there any tool to analyze corrupted data in Solr? I am aware of Luke. > But does it show s

Re: Question on Index Replication

2010-02-16 Thread Lance Norskog
When you change an index you do not have to copy the entire index again. The new part of the index is in separate files and the replication code knows to only pull the differences. Indexing on a master and copying to slaves works very well - there are thousands of Solr installations using that tec
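A simplified Python sketch of the idea follows. It assumes, as is the case in Lucene, that segment files are write-once, so a slave only needs the files in the master's current commit that it does not already hold. This is not the ReplicationHandler's actual code: it compares only file names and sizes, and the file names and sizes below are invented for illustration.

```python
from typing import Dict, List

def files_to_fetch(master: Dict[str, int], slave: Dict[str, int]) -> List[str]:
    """Return the names of files in the master's commit that the slave lacks.

    Both arguments map file name -> size in bytes. A file is fetched if the
    slave does not have it, or has a file of that name with a different size.
    """
    return sorted(name for name, size in master.items() if slave.get(name) != size)

# Hypothetical state: since the slave's last pull, the master has flushed one
# new segment (_2.cfs) and written a new commit point (segments_3).
master_files = {"_0.cfs": 52_000_000, "_1.cfs": 8_000_000,
                "_2.cfs": 300_000, "segments_3": 98}
slave_files = {"_0.cfs": 52_000_000, "_1.cfs": 8_000_000, "segments_2": 90}

print(files_to_fetch(master_files, slave_files))  # only the new files are pulled
```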

Re: schema design - catch all field question

2010-02-16 Thread Lance Norskog
The data copied from title to content is exactly the strings that you give. The data is copied around, then each field is analyzed. Changing 'title' from text to string makes no difference. On Mon, Feb 15, 2010 at 6:48 AM, adeelmahmood wrote: > > I am just trying to understand the difference betw
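Below is a toy Python model of the ordering described here. copyField hands the destination the raw input string, and each field is analyzed afterwards by its own type. As a result, the type of title does not change what ends up in content. The analyzers and the document are simplified assumptions, not Solr code.

```python
from typing import Callable, Dict, List, Tuple

# Toy stand-ins for Solr field types (assumptions, not Solr's analyzers):
# "text" lowercases and splits on whitespace, "string" keeps the value whole.
ANALYZERS: Dict[str, Callable[[str], List[str]]] = {
    "text": lambda s: s.lower().split(),
    "string": lambda s: [s],
}

def index_document(doc: Dict[str, str], field_types: Dict[str, str],
                   copy_fields: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Copy raw input values first, then analyze each field with its own type."""
    raw: Dict[str, List[str]] = {name: [value] for name, value in doc.items()}
    for source, dest in copy_fields:
        if source in doc:
            # The destination receives the original string, not the source's tokens.
            raw.setdefault(dest, []).append(doc[source])
    return {
        name: [tok for value in values for tok in ANALYZERS[field_types[name]](value)]
        for name, values in raw.items()
    }

doc = {"title": "Solr In Action", "content": "A book about search"}
copies = [("title", "content")]
for title_type in ("text", "string"):
    types = {"title": title_type, "content": "text"}
    print(title_type, index_document(doc, types, copies)["content"])
```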

Re: VelocityResponseWriter: Image References

2010-02-16 Thread Lance Norskog
You can add a static content container to the jetty example. This is a patch against example/etc/jetty.xml. You then make a directory example/webapp/ROOT. This works the same as ROOT in tomcat: http://localhost:8983/image.png comes from webapp/ROOT/image.png. It is static and the files are not cop

Re: implementing profanity detector

2010-02-16 Thread Lance Norskog
A problem is that your profanity list will not stop growing, and with each new word you will want to rescrub the index. We had a thousand-word NOT clause in every query (a filter query would be true for 99% of the index) until we switched to another arrangement. Another small problem was that I k

Re: Merge several queries into one result?

2010-02-16 Thread Erick Erickson
It's generally a bad idea to try to think of various SOLR/Lucene indexes in a database-like way; Lucene isn't built to do RDBMS-like stuff. The first suggestion is usually to consider flattening your data. That would be something like adding NY and "New York" to each document. If that's not possib

Re: How to retrieve relevance "debug/explain" info in code?

2010-02-16 Thread Erick Erickson
Thanks for bringing closure. Erick On Tue, Feb 16, 2010 at 7:13 PM, uwdanny wrote: > > update - found the answer > > API getExplainList in org.apache.solr.util.SolrPluginUtils > > works. > > > > > uwdanny wrote: > > > > Hi, > > > > I was trying to get the detailed "explain" info in (java) code

Re: and DisMaxRequestHandler

2010-02-16 Thread Chris Hostetter
: no but you can set a default for the qf parameter with the same value good call... https://issues.apache.org/jira/browse/SOLR-1776 -Hoss

Re: Deleting spelll checker index

2010-02-16 Thread Chris Hostetter
: But still I can't stop thinking about this. : I deleted my entire index and now I have 0 documents. : : Now if I make a query with accrd I still get a suggestion of accord even : though there are no documents returned since I deleted my entire index. I : hope it also clears the spell check index fi

Re: Question about custom Lucene filters and Solr

2010-02-16 Thread Chris Hostetter
: I'm interested in using Solr with a custom Lucene Filter (like the one : described in section 6.4.1 of the Lucene In Action, Second Edition : book). I'd like to filter search results from a Lucene index against : information stored in a relational database. I don't want to move the : relat

Re: Preventing mass index delete via DataImportHandler full-import

2010-02-16 Thread Chris Hostetter
: I have a small worry though. When I call the full-import functions, can : I configure Solr (via the XML files) to make sure there are rows to : index before wiping everything? What worries me is if, for some unknown : reason, we have an empty database, then the full-import will just wipe : t

Re: Copying dynamic fields into default text field messing up fieldNorm?

2010-02-16 Thread Koji Sekiguchi
Chris Hostetter wrote: : According to this email exchange between Koji and Mat Brown, : : http://www.mail-archive.com/solr-user@lucene.apache.org/msg23759.html : : The boost value from copyField's shouldn't be accumulated into the boost for : the text field, can anyone else verify this? This s

Re: Collating results from multiple indexes

2010-02-16 Thread Will Johnson
Jan Hoydal / Otis, First off, thanks for mentioning us. We do use some utility functions from SOLR, but our index engine is built on top of Lucene only; there are no Solr cores involved. We do have a JOIN operator that allows us to perform relational searches while still acting like a search en

Re: Upgrading from solr1.3 to solr1.4

2010-02-16 Thread Chris Hostetter
:i have indexed some data on solr 1.3.0. Now i wanna upgrade to solr : 1.4.0 but on the same data. : so here are the following steps i performed: : 1. extract solr 1.4.0 : 2. copied the conf and data folder of my index from solr : 1.3.0/examples/multicore to solr1.4.0/examples/multicore/ :

Re: Request time out in solr

2010-02-16 Thread Chris Hostetter
: I want to know How can I set request timeout through perl by : webservice::solr end or solr end so that I could handle request timeout I've never used WebService::Solr, but its docs say it takes a user agent object (i.e. LWP::UserAgent), so that's where you can specify the client-side time

Seattle Hadoop/Lucene/NoSQL Meetup; Wed Feb 24th, Feat. MongoDB

2010-02-16 Thread Bradford Stephens
Greetings, It's time for another awesome Seattle Hadoop/Lucene/Scalability/NoSQL Meetup! As always, it's at the University of Washington, Allen Computer Science building, Room 303 at 6:45pm. You can find a map here: http://www.washington.edu/home/maps/southcentral.html?cse Last month, we had a g

Re: How to query multiple fields with phrases

2010-02-16 Thread Chris Hostetter
: I need to do a search that will search 3 different fields and combine : the results. First, it needs to not break the phrase into tokens, but : rather treat it as a phrase for one field. The other fields need to be : parsed with their normal analyzers. Your description of your goal is a littl
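The following Python sketch shows one possible way to express such a request in standard Lucene query syntax. It is not necessarily what the reply goes on to recommend. The phrase field gets a quoted phrase, and the other fields get the same text as plain terms so that each field's own analyzer tokenizes it. The field names and helper functions are hypothetical, and the escaped characters follow the classic QueryParser's special-character list.

```python
import re
from typing import List

# Characters with special meaning in the classic Lucene/Solr query syntax.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\&|])')

def escape_terms(text: str) -> str:
    """Backslash-escape query-syntax characters so text is parsed as plain terms."""
    return _SPECIAL.sub(r"\\\1", text)

def quote_phrase(text: str) -> str:
    """Wrap text in double quotes, escaping backslashes and embedded quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

def multi_field_query(text: str, phrase_field: str, term_fields: List[str]) -> str:
    """Phrase match on one field, OR'ed with term matches on the others."""
    clauses = [f"{phrase_field}:{quote_phrase(text)}"]
    clauses += [f"{field}:({escape_terms(text)})" for field in term_fields]
    return " OR ".join(clauses)

print(multi_field_query("new york pizza", "name", ["description", "tags"]))
```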

Re: Copying dynamic fields into default text field messing up fieldNorm?

2010-02-16 Thread Chris Hostetter
: According to this email exchange between Koji and Mat Brown, : : http://www.mail-archive.com/solr-user@lucene.apache.org/msg23759.html : : The boost value from copyField's shouldn't be accumulated into the boost for : the text field, can anyone else verify this? This seem to go against what I

Re: ConstantScoreQuery and wildcards

2010-02-16 Thread Ahmet Arslan
> It seems that when I do a search with a wildcard (eg, > +text:abc*) the Solr > standard SearchHandler will construct a ConstantScoreQuery > passing in a > Filter, so all the documents in the result set are scored > the same. Is there > a way to make Solr construct a BooleanQuery instead so that >

ConstantScoreQuery and wildcards

2010-02-16 Thread TCK
Hi, It seems that when I do a search with a wildcard (eg, +text:abc*) the Solr standard SearchHandler will construct a ConstantScoreQuery passing in a Filter, so all the documents in the result set are scored the same. Is there a way to make Solr construct a BooleanQuery instead so that scoring ba

Re: How to retrieve relevance "debug/explain" info in code?

2010-02-16 Thread uwdanny
update - found the answer API getExplainList in org.apache.solr.util.SolrPluginUtils works. uwdanny wrote: > > Hi, > > I was trying to get the detailed "explain" info in (java) code using the > APIs, see codes below, > > - > ResponseBuilder rb (from some inherited proces

Re: regarding ranking

2010-02-16 Thread Ahmet Arslan
> After getting aware of all > these combinations, it seems not > wise to proceed blindly by punishing whatever we want. > Thank you very > much for letting me know. Generally, most people are happy with the default Solr scoring, especially in web-like search. I am not sure, but you can find t

Re: Question about custom Lucene filters and Solr

2010-02-16 Thread Jon Bodner
Hi Israel (et al), I don't think that I need an Update Handler; I don't intend to change the values in the search index (in fact, the goal is to build a Lucene index with Hadoop and then point a Solr instance at it). What I'm trying to do is split the document into two locations: one is the Lu

Merge several queries into one result?

2010-02-16 Thread Daniel Shane
Hi all! I'm trying to join 2 indexes together to produce a final result using only Solr + Velocity Response Writer. The problem is that each "hit" of the main index contains references to some common documents located in another index. For example, the hit could have a field that describes in

Re: Question about custom Lucene filters and Solr

2010-02-16 Thread Israel Ekpo
Hi Jon, You will need to write a plugin. You will need a custom query parser and an Update Handler, depending on what you are doing. Implementing an Update Handler or Update Request Processor is not recommended because it is considered advanced. Take a look at the following links for

Re: Deleting spelll checker index

2010-02-16 Thread darniz
Thanks Hoss, and apologies for flooding the post. But still I can't stop thinking about this. I deleted my entire index and now I have 0 documents. Now if I make a query with accrd I still get a suggestion of accord even though there are no documents returned since I deleted my entire index. I hope it al

RE: filter queries not fully filtering

2010-02-16 Thread Nagelberg, Kallin
Problem solved. I wasn't quoting the value. Since I was using names such as 'Gary Bettman' solr must have been giving all the Garys. -Original Message- From: Nagelberg, Kallin [mailto:knagelb...@globeandmail.com] Sent: Tuesday, February 16, 2010 3:22 PM To: 'solr-user@lucene.apache.org'
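Here is a minimal Python sketch of building such a request with the filter value quoted. Without the quotes, the standard query parser reads people:Gary Bettman as people:Gary plus a separate Bettman clause against the default search field. The parameter names are standard Solr request parameters. The helper name field_filter is hypothetical, and its escaping covers only backslashes and double quotes inside the phrase.

```python
from urllib.parse import parse_qs, urlencode

def field_filter(field: str, value: str) -> str:
    """Build an fq clause that matches the whole value as one quoted phrase."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'

params = [
    ("q", "dog"),
    ("facet", "true"),
    ("facet.field", "people"),
    ("facet.field", "category"),
    ("fq", field_filter("people", "Gary Bettman")),
]
query_string = urlencode(params)  # append to .../select?
print(query_string)
```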

Re: Re: Updating index: Replacing data directory recommended?

2010-02-16 Thread Peter Karich
Hi, oops, sorry. I didn't notice the answer because it was in the bulk folder. I thought this procedure would be a lot faster, with less overhead. Just two lines of shell script. What do you think? Regards, Peter. This should work on Linux. The rsync-based replication scripts used to

Range Queries, Geospatial

2010-02-16 Thread Fuad Efendi
Hi, I've read a very interesting interview with Ryan, http://www.lucidimagination.com/Community/Hear-from-the-Experts/Podcasts-and-Videos/Interview-Ryan-McKinley Another finding is https://issues.apache.org/jira/browse/SOLR-773 (lucene/contrib/spatial). Is there any more stuff going on for SOLR

filter queries not fully filtering

2010-02-16 Thread Nagelberg, Kallin
Hi everyone, I am attempting to implement a faceted drill down feature with Solr. I am having problems explaining some results of the fq parameter. Let's say I have two fields, 'people' and 'category'. I do a search for 'dog' and ask to facet on the people and category fields. I am told that t

Re: regarding ranking

2010-02-16 Thread Smith G
Hello, Thanks for your detailed explanation. > Do you want to punish *more* long documents? Not a lot, but a bit more than the default implementation. It seems "lengthNorm" is field-based, and punishing lengthy fields fits most of the cases in our project. > There will be a trade-off
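For comparison, here is a Python sketch of the default length normalization next to a steeper, hypothetical alternative. In Lucene 2.9/3.0, DefaultSimilarity.lengthNorm returns 1/sqrt(numTerms). A custom Similarity that overrides it can be registered in Solr's schema.xml with the <similarity class="..."/> element. Norms are computed at index time and stored in a single byte, so changing the formula requires reindexing, and small differences get rounded away.

```python
import math

def default_length_norm(num_terms: int) -> float:
    # Lucene 2.9/3.0 DefaultSimilarity.lengthNorm: 1 / sqrt(numTerms).
    return 1.0 / math.sqrt(num_terms)

def steeper_length_norm(num_terms: int, exponent: float = 0.75) -> float:
    # Hypothetical variant: any exponent above 0.5 penalizes long fields more
    # than the default; 0.75 is an arbitrary illustrative choice.
    return 1.0 / num_terms ** exponent

for n in (1, 4, 16, 64):
    print(f"{n:>3} terms: default={default_length_norm(n):.3f}  "
          f"steeper={steeper_length_norm(n):.3f}")
```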

Question about custom Lucene filters and Solr

2010-02-16 Thread Jon Bodner
Hello, I'm interested in using Solr with a custom Lucene Filter (like the one described in section 6.4.1 of the Lucene In Action, Second Edition book). I'd like to filter search results from a Lucene index against information stored in a relational database. I don't want to move the relationa

Preventing mass index delete via DataImportHandler full-import

2010-02-16 Thread Daniel Shane
I've set up a simple DIH import handler with Solr that connects to my data via a database. I have a small worry though. When I call the full-import functions, can I configure Solr (via the XML files) to make sure there are rows to index before wiping everything? What worries me is if, for some u
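One way to get this safety check outside Solr is sketched below in Python. It assumes the caller counts the source rows first (for example with a SELECT COUNT(*)) and only triggers the import when the count is plausible. command, clean and commit are DataImportHandler request parameters. clean=true, the default for full-import, deletes the existing index before importing. Passing clean=false instead keeps existing documents, at the cost of not removing rows that were deleted from the database. The /dataimport path and the helper name are assumptions.

```python
from urllib.parse import urlencode

def full_import_url(solr_base: str, row_count: int, min_rows: int = 1) -> str:
    """Return a DataImportHandler full-import URL, or raise if the source looks empty.

    row_count is assumed to come from the caller, e.g. a SELECT COUNT(*) run
    against the source database just before triggering the import.
    """
    if row_count < min_rows:
        raise RuntimeError(
            f"source has {row_count} rows (< {min_rows}); refusing a cleaning full-import")
    # clean=true wipes the index before importing; "/dataimport" is the handler
    # path used in the Solr examples and may differ in your solrconfig.xml.
    params = {"command": "full-import", "clean": "true", "commit": "true"}
    return f"{solr_base.rstrip('/')}/dataimport?{urlencode(params)}"

print(full_import_url("http://localhost:8983/solr/", row_count=1200))
```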

Re: How to retrieve relevance "debug/explain" info in code?

2010-02-16 Thread uwdanny
Hi Erick, thanks for the reply. My query URL includes "debugQuery=on" and the result page correctly shows all the debug/explain info. The problem I'm facing is that I cannot get the same debug/explain info in code. I've been trying the IndexSearcher.explain(Weight, int) API, as well as Search

Tool for analyzing data in solr

2010-02-16 Thread dipti khullar
Hi All, Is there any tool to analyze corrupted data in Solr? I am aware of Luke, but does it somehow show that the data is corrupted, e.g. that some segments are missing or that some documents have been corrupted (not fully indexed)? Thanks Dipti

Re: How to retrieve relevance "debug/explain" info in code?

2010-02-16 Thread Erick Erickson
Any details? This is pretty ambiguous. Tacking debugQuery=true onto a URL brings back some stuff; in Lucene, IndexSearcher.explain()? Erick On Tue, Feb 16, 2010 at 1:21 PM, uwdanny wrote: > > any hints? > -- > View this message in context: > http://old.nabble.com/How-to-retrieve-relevance

Re: Updating index: Replacing data directory recommended?

2010-02-16 Thread Peter Karich
Hi, any hints or suggestions? Does anyone do the updating this way? Regards, Peter. Hi solr community! Is it recommended to replace the data directory of a heavily used solr instance? (I am aware of the http queries, but that will be too slow) I need a fast way to push development data to pr

Re: How to retrieve relevance "debug/explain" info in code?

2010-02-16 Thread uwdanny
any hints? -- View this message in context: http://old.nabble.com/How-to-retrieve-relevance-%22debug-explain%22-info-in-code--tp27602530p27612814.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Upgrading Tika in Solr

2010-02-16 Thread Grant Ingersoll
I've got a task open to upgrade to 0.6. Will try to get to it this week. Upgrading is usually pretty trivial. On Feb 14, 2010, at 12:37 AM, Liam O'Boyle wrote: > Afternoon, > > I've got a large collection of documents which I'm attempting to add to > a Solr index using Tika via the Extracti

Re: cannot match on phrase queries

2010-02-16 Thread Kevin Osborn
It definitely had something to do with omitTermFreqAndPositions. As soon as I disabled the option and re-indexed, my queries started working as expected. I suspect it has something to do with terms occupying the same position and that information being lost by using omitTermFreqAndPositions, but I

Strict Hierarchical Facets (SOLR-64)

2010-02-16 Thread Wadim Kruse
Hi all, I am getting the same recursively concatenated results as the people in the comments (http://issues.apache.org/jira/browse/SOLR-64). I couldn't get hiefacets working with either release-1.4.0 or branch-1.4.0. I've got a 1.4.0-dev including SOLR-64 running, and in parallel a 1.4.0-final. I w

Re: Delete by query discrepancy

2010-02-16 Thread Mat Brown
Cool, thanks - just wanted to make sure I'm not insane. Makes sense that there would be a difference if the index is built fresh in that case. On Tue, Feb 16, 2010 at 11:59, Mark Miller wrote: > Mat Brown wrote: >> Hi all, >> >> Trying to debug a very sneaky bug in a small Solr extension that I >

Re: regarding ranking

2010-02-16 Thread Ahmet Arslan
> Hello, > Thanks. That clears my > doubts. Coming to point two, can > you please tell me which part of the Similarity takes care > of > this? Is it possible to implement it in such a way that we > give more > preference to the "number of found terms"? public float coord(int overlap,
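The coord factor mentioned in this reply can be modeled in a few lines. In Lucene 2.9/3.0, DefaultSimilarity.coord(overlap, maxOverlap) returns overlap/maxOverlap, and a BooleanQuery multiplies the sum of its matching clause scores by that factor. The Python sketch below uses this simplified model (queryNorm and other factors are ignored). The clause scores are invented, and the squared variant is a hypothetical way to give more weight to the number of matched terms.

```python
from typing import Callable, List

def default_coord(overlap: int, max_overlap: int) -> float:
    # Lucene 2.9/3.0 DefaultSimilarity: fraction of query clauses matched.
    return overlap / max_overlap

def squared_coord(overlap: int, max_overlap: int) -> float:
    # Hypothetical variant: squaring rewards matching more of the query terms.
    return (overlap / max_overlap) ** 2

def boolean_score(clause_scores: List[float], max_overlap: int,
                  coord: Callable[[int, int], float] = default_coord) -> float:
    """Simplified flat OR query: coord(overlap, max) * sum of matching clause scores."""
    return coord(len(clause_scores), max_overlap) * sum(clause_scores)

# A 3-term query: doc A matches all three terms weakly, doc B one term strongly.
doc_a, doc_b = [0.4, 0.4, 0.4], [4.0]
for name, fn in (("default", default_coord), ("squared", squared_coord)):
    print(name, round(boolean_score(doc_a, 3, fn), 3), round(boolean_score(doc_b, 3, fn), 3))
```

With the default coord, doc B still edges out doc A. With the squared variant, the document matching all three terms wins.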

Re: Delete by query discrepancy

2010-02-16 Thread Mark Miller
Mat Brown wrote: > Hi all, > > Trying to debug a very sneaky bug in a small Solr extension that I > wrote, and I've come across an odd situation. Here's what my test > suite does: > > deleteByQuery("*:*"); > // add some documents > commit(); > // test the search > > This works fine. The test suite

Delete by query discrepancy

2010-02-16 Thread Mat Brown
Hi all, Trying to debug a very sneaky bug in a small Solr extension that I wrote, and I've come across an odd situation. Here's what my test suite does: deleteByQuery("*:*"); // add some documents commit(); // test the search This works fine. The test suite that exposed the error (which is actua

Re: persistent cache

2010-02-16 Thread Jason Rutherglen
On a related note, maybe it'd be good to have a wiki page of experiences and possibly stats for various SSD drives, either on the Lucene or Solr wiki sites? 2010/2/16 Tim Terlegård : > 2010/2/15 Toke Eskildsen : >> From: Tim Terlegård [tim.terleg...@gmail.com] >>> If the index size is more than you can

Re: Getting max/min dates from solr index

2010-02-16 Thread Mark N
Thanks. Is it possible to do date faceting on multiple Solr shards? I am using an index created in two different shards to do date faceting on field "DATE": http://localhost:8983/solr/1_13_1_3/select?&shards=localhost:8983/solr/index1/,localhost_two:8983/solr/index/&start=0&rows=20&q=*&facet=true&

Re: multivalued : how to get file names

2010-02-16 Thread Kranti™ K K Parisa
That has answered my concern about the index size/duplicated data. But the other one is about presenting the search results: there should be one result with a list of files. So in this case I would need to write some logic before showing the results, right? (maybe like comparing each result solrdocument/

Re: multivalued : how to get file names

2010-02-16 Thread Erick Erickson
Unless you have *evidence* that the indexing each pdf with the form data as a single SOLR document is a problem, I would just index the fields with each document rather than try to index the PDFs as multivalued. The space used by duplicating the form field data is probably a tiny fraction of the da
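Here is a small Python sketch of that layout, using hypothetical field names (form_id, file_name, content). Each PDF becomes its own document carrying a copy of the form fields, and the client collapses the hits back into one entry per form for display. The grouping is done client-side and is only an illustration.

```python
from typing import Dict, List

def docs_for_submission(form_id: str, form_fields: Dict[str, str],
                        pdf_texts: Dict[str, str]) -> List[Dict[str, str]]:
    """Build one Solr document per PDF, each carrying a copy of the form fields."""
    docs = []
    for file_name, text in pdf_texts.items():
        doc = dict(form_fields)  # duplicated form data, one copy per PDF
        doc.update({"id": f"{form_id}/{file_name}", "form_id": form_id,
                    "file_name": file_name, "content": text})
        docs.append(doc)
    return docs

def group_hits_by_form(hits: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Collapse per-PDF hits into one entry per form, preserving hit order."""
    grouped: Dict[str, List[str]] = {}
    for hit in hits:
        grouped.setdefault(hit["form_id"], []).append(hit["file_name"])
    return grouped

docs = docs_for_submission("form-42", {"applicant": "J. Smith"},
                           {"a.pdf": "first attachment", "b.pdf": "second attachment"})
hits = [docs[1], docs[0]]  # pretend Solr ranked b.pdf above a.pdf
print(group_hits_by_form(hits))
```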

dataimporthandler and expungeDeletes=false

2010-02-16 Thread Jorg Heymans
Hi, Can anybody tell me if [1] still applies as of version trunk 03/02/2010? I am removing documents from my index using deletedPkQuery and a deltaimport. I can tell from the logs that the removal seems to be working: 16-Feb-2010 15:32:54 org.apache.solr.handler.dataimport.DocBuilder collectDelt

Re: regarding ranking

2010-02-16 Thread Smith G
Hello, Thanks. That clears my doubts. Coming to point two, can you please tell me which part of the Similarity takes care of this? Is it possible to implement it in such a way that we give more preference to the "number of found terms"? Also, here in our case we need to give more import

multivalued : how to get file names

2010-02-16 Thread Kranti™ K K Parisa
Hi, When we index using SOLR, we have an option called multivalued. How does that work with multiple files associated with the same document? For example: submitting a form with some fields + a list of PDF files. Index process: 1) considering all the form fields as individual solr input document fields (

IndexSchema object

2010-02-16 Thread Gargate, Siddharth
How can we get an instance of the IndexSchema object in a Tokenizer subclass?

Re: Tomcat vs Jetty: A Comparative Analysis?

2010-02-16 Thread Ron Chan
I doubt a performance benchmark would be very useful; it ultimately depends on what you are trying to do and what you are comfortable with. We've had successful deployments on both. Any difference in performance is far outweighed by ease of setup/support that you personally find in each

Re: Query or FilterQuery for exact field match

2010-02-16 Thread gabriele renzi
On Tue, Feb 16, 2010 at 2:04 PM, NarasimhaRaju wrote: > Hi, > > using filterQuery(fq) is more efficient because SolrIndexSearcher will make > use of filterCache > and in your case it returns entire set from the cache instead of searching > from the entire index. > more info about solrCaches at

Re: Query or FilterQuery for exact field match

2010-02-16 Thread NarasimhaRaju
Hi, using a filter query (fq) is more efficient because SolrIndexSearcher will make use of the filterCache, and in your case it returns the entire set from the cache instead of searching the entire index. More info about Solr caches at http://wiki.apache.org/solr/SolrCaching#filterCache Regards, P.N
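To make the filterCache point concrete, here is a toy Python model, not Solr's implementation. The first request for a given fq string computes the matching document set by scanning the index and stores it. Later requests with the same fq reuse the stored set. In Solr the cache is bounded and configured in solrconfig.xml, and filter results are unscored, which is why fq suits an unscored exact-match lookup such as brand:xxx. The documents and field values below are made up.

```python
from typing import Dict, FrozenSet, List

class ToySearcher:
    """Toy model of fq caching: first use scans the index, later uses hit the cache."""

    def __init__(self, docs: List[Dict[str, str]]):
        self.docs = docs
        self.filter_cache: Dict[str, FrozenSet[int]] = {}
        self.scans = 0  # number of full index scans performed

    def _filter(self, fq: str) -> FrozenSet[int]:
        if fq not in self.filter_cache:
            field, value = fq.split(":", 1)
            self.scans += 1
            self.filter_cache[fq] = frozenset(
                i for i, doc in enumerate(self.docs) if doc.get(field) == value)
        return self.filter_cache[fq]

    def search(self, fq: str) -> List[int]:
        """Return matching doc ids (unscored), computing the set at most once per fq."""
        return sorted(self._filter(fq))

docs = [{"brand": "acme"}, {"brand": "zenith"}, {"brand": "acme"}]
s = ToySearcher(docs)
print(s.search("brand:acme"), s.search("brand:acme"), s.scans)
```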

Tomcat vs Jetty: A Comparative Analysis?

2010-02-16 Thread Steve Radhouani
Hi there, Is there any analysis out there that may help to choose between Tomcat and Jetty to deploy Solr? I wonder whether there's a significant difference between them in terms of performance. Any advice would be much appreciated, -Steve

Upgrading from solr1.3 to solr1.4

2010-02-16 Thread Rakhi Khatwani
Hi, I have indexed some data on Solr 1.3.0. Now I want to upgrade to Solr 1.4.0 but on the same data. So here are the steps I performed: 1. extracted solr 1.4.0 2. copied the conf and data folder of my index from solr 1.3.0/examples/multicore to solr1.4.0/examples/multicore/ 3. started

Fwd: Performance-Issues and raising numbers of "cumulative inserts"

2010-02-16 Thread Bohnsack, Sven
Hi Shalin! Thanks for the quick response. Sadly it tells me that I have to look elsewhere to fix the problem. Does anyone have an idea what could cause the increasing warm-up times? If required I can post some stats. Thanking you in anticipation! Regards, Sven Feed: Solr-Mailing-List Berei

Query or FilterQuery for exact field match

2010-02-16 Thread gabriele renzi
Hi everyone, in our app we sometimes use solr programmatically to retrieve all the elements that have a certain value in a single-valued single-token field (brand:xxx). Since we are not interested in scoring these results, I was thinking that maybe this should be performed as a filterQuery (fq="br

Pragmatic more or less high availability option on 2 servers

2010-02-16 Thread Robert Krüger
Hi, I have to set up a SOLR cluster with some availability concept (is allowed to require manual interaction on fault, however, if there is a better way, I'd be interested in recommendations). I have two servers (A and B for the example) at my disposal. What I was thinking about was the follo

Re: persistent cache

2010-02-16 Thread Tim Terlegård
2010/2/15 Toke Eskildsen : > From: Tim Terlegård [tim.terleg...@gmail.com] >> If the index size is more than you can have in RAM, do you recommend >> to split the index to several servers so it can all be in RAM? >> >> I do expect phrase queries. Total index size is 107 GB. *prx files are >> total

Re: Updating index: Replacing data directory recommended?

2010-02-16 Thread Shalin Shekhar Mangar
On Mon, Feb 15, 2010 at 3:30 PM, Peter Karich wrote: > Hi solr community! > > Is it recommended to replace the data directory of a heavily used solr > instance? > (I am aware of the http queries, but that will be too slow) > > I need a fast way to push development data to production servers. > I tr

Re: Performance-Issues and raising numbers of "cumulative inserts"

2010-02-16 Thread Shalin Shekhar Mangar
On Tue, Feb 16, 2010 at 1:06 PM, Bohnsack, Sven wrote: > Hey IT-Crowd! > > I'm dealing with some performance issues during warmup of the > queryResultCache. Normally it takes about 11 minutes (~700,000 ms), but > now it takes about 4 MILLION and more ms. All I can see in the solr.log > is that the

Re: Discovering Slaves

2010-02-16 Thread Shalin Shekhar Mangar
On Tue, Feb 16, 2010 at 4:23 AM, wojtekpia wrote: > > Is there a way to 'discover' slaves using ReplicationHandler? I'm writing a > quick dashboard, and don't have access to a list of slaves, but would like > to show some stats about their health. > No, the master does not know about any slave.