Re: Dynamic Boosting at query time with boost value as another fieldvalue

2008-12-12 Thread Pooja Verlani
Hi, Will this currentDate work with epoch time only or can work with any date format as specified by the "simpleDateFormat" class of Java ?? Thank you, Regards, Pooja On Thu, Dec 11, 2008 at 7:20 PM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > Take a look at FunctionQuery support i

Re: ExtractingRequestHandler and XmlUpdateHandler

2008-12-12 Thread Jacob Singh
Hi Grant, Happy to. Currently we are sending over documents by building a big XML file of all of the fields of that document. Something like this: $document = new Apache_Solr_Document(); $document->id = apachesolr_document_id($node->nid); $document->title = $node->title; $document->b

Re: ExtractingRequestHandler and XmlUpdateHandler

2008-12-12 Thread Grant Ingersoll
Hmmm, I think I see the disconnect, but I'm not sure. Sending to the ERH (ExtractingReqHandler) is not an XML command at all, it's a file- upload/ multi-part encoding. I think you will need an API that does something like: (Just making this up, this is not real code) File file = new File(f

RE: Taxonomy Support on Solr

2008-12-12 Thread Jana, Kumar Raja
Thanks all. This workaround was very helpful for my case. However, it would be wonderful if there was a way to make Solr have a copy of my classification so that I need not create a big string at the client side everytime I need to index a document. I am sure there are many others out there who d

RE: Returning snippets with results

2008-12-12 Thread Jana, Kumar Raja
Hi Grant, Thanks for the help. I have decided to store only the first MB in Solr and return snippets for results matching within that MB. For the rest of the results, tough luck -Original Message- From: Grant Ingersoll [mailto:gsing...@apache.org] Sent: Saturday, December 06, 2008

Solr and Aperture Framework

2008-12-12 Thread Rogerio Pereira
Hi! Is there someone in the list that worked on integrate Aperture Framework ( http://aperture.sourceforge.net/) with Solr? -- Regards, Rogério (_rogerio_) [Blog: http://faces.eti.br] [Sandbox: http://bmobile.dyndns.org] [Twitter: http://twitter.com/ararog] "Faça a diferença! Ajude o seu paí

Unwanted clustering of search results after sorting by score

2008-12-12 Thread Max Scheffler
Hello, We have a website on which you can search through a large number of products from different shops. The information describing the products is provided to us by the shops which sell these products. If we sort a search result by score many products of the same shop are clustered together.

Solr 1.3 - DataInputHandler DIH integration

2008-12-12 Thread Rakesh Sinha
[Changing subject accordingly ] . Thanks Noble. I grabbed one of the nightlies at - http://people.apache.org/builds/lucene/solr/nightly/ . I could not find the DataImportHandler in the same. May be I am missing something about the sources of DataImportHandler. Can somebody suggest on where to

Re: Taxonomy Support on Solr

2008-12-12 Thread Walter Underwood
I designed and built the taxonomy and classification support in the Ultraseek search engine. There are many kinds of taxonomies, even different "shapes": tree, DAG, facets, tree + links (e.g. ANSI/NISO Z39.19, LCSH, Yahoo directory), and even mixtures of those. It would be a serious limitation to

Re: not string or text fields and shards

2008-12-12 Thread Ian Connor
That problem related to the score being computed when it was not in the sort: https://issues.apache.org/jira/browse/SOLR-626 To see if this is also your problem, you can make the score in the sort to see if that stops the error. If you have the same issue, you can take the patch and compile your

Re: Solr 1.3 - DataInputHandler DIH integration

2008-12-12 Thread Rakesh Sinha
Ooops . Sorry - Never mind - they are present under contrib directory. /opt/programs/solr $ find contrib -name *.java | grep Handler contrib/dataimporthandler/src/main/java/org/apache/solr/handler/dataimport/DataImportHandlerException.java contrib/dataimporthandler/src/main/java/org/apache/solr/ha

Re: Query Performance while updating the index

2008-12-12 Thread oleg_gnatovskiy
Hey Otis, Do you think our problem is slow warm time, or too few items that are being copied? Oleg -- View this message in context: http://www.nabble.com/Query-Performance-while-updating-the-index-tp20452835p20980523.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Query Performance while updating the index

2008-12-12 Thread oleg_gnatovskiy
Here’s what we have on one of the data slaves for the autowarming. -- Dec 12, 2008 8:46:02 AM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming searc...@3f32ca2b main from searc...@443ad545 main filterCache{lookups=351993,hits=347055,hitratio=0.98,inserts=8332,eviction
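The cache statistics in the log excerpt above are plain key=value pairs, so the reported hit ratio can be rechecked by hand. What follows is a minimal sketch in Python, assuming only the counters visible in the excerpt (lookups, hits, inserts); the helper name parse_cache_stats is made up for illustration and is not part of Solr.

```python
import re

def parse_cache_stats(fragment):
    """Collect integer key=value pairs from a cache statistics fragment."""
    # Only whole-number values are kept; hitratio=0.98 is skipped on purpose
    # so that it can be recomputed from the raw counters.
    pairs = re.findall(r"(\w+)=(\d+)(?![\d.])", fragment)
    return {key: int(value) for key, value in pairs}

# Fragment copied from the log line above (the original line is truncated).
fragment = "filterCache{lookups=351993,hits=347055,hitratio=0.98,inserts=8332"
stats = parse_cache_stats(fragment)

# Hit ratio recomputed from the raw counters.
ratio = stats["hits"] / stats["lookups"]
print(stats, ratio)
```

A high ratio alongside a large number of lookups is what the later replies in this thread point to when they say the filter cache is worth autowarming.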

Solr 1.3 DataImportHandler iBatis integration ..

2008-12-12 Thread Rakesh Sinha
Hi - I was planning to check more details about integrating ibatis query resultsets with the query required for tags . Before I start experimenting more along the lines - I am just curious if there had been some effort done earlier on this end (specifically - how to better integrate DataImport

Re: Query Performance while updating the index

2008-12-12 Thread Otis Gospodnetic
It looks like cache warming is taking about 12 seconds. It sounds like you need to see if performance is bad during warming, or right after warming (and right after the new searcher gets exposed to queries). Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Mes

Re: Unwanted clustering of search results after sorting by score

2008-12-12 Thread Otis Gospodnetic
Max - field collapsing may be your friend - https://issues.apache.org/jira/browse/SOLR-236 This field collapsing keeps coming up... Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Max Scheffler > To: solr-user@lucene.apache.org > Sent: F

Re: Solr and Aperture Framework

2008-12-12 Thread Otis Gospodnetic
Rogerio, I think it might be better to specify which part of Aperture specifically - e.g. parsers or crawler or ...? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Rogerio Pereira > To: solr-user@lucene.apache.org > Sent: Friday, Decemb

RE: Query Performance while updating the index

2008-12-12 Thread Feak, Todd
It's spending 4-5 seconds warming up your query cache. If 4-5 seconds is too much, you could reduce the number of queries to auto-warm with on that cache. Notice that the 4-5 seconds is spent only putting about 420 queries into the query cache. Your autowarm of 5 for the query cache seems a bi

Re: Solr and Aperture Framework

2008-12-12 Thread Rogerio Pereira
Parsers to be more specific. 2008/12/12 Otis Gospodnetic > Rogerio, > > I think it might be better to specify which part of Aperture specifically - > e.g. parsers or crawler or ...? > > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message >

Re: Solr and Aperture Framework

2008-12-12 Thread Otis Gospodnetic
Rogerio, You may want to look at http://wiki.apache.org/solr/ExtractingRequestHandler Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Rogerio Pereira > To: solr-user@lucene.apache.org > Sent: Friday, December 12, 2008 1:47:35 PM > Subject:

Re: Solr and Aperture Framework

2008-12-12 Thread Grant Ingersoll
Yes, I have for my book and other things. On Dec 12, 2008, at 9:47 AM, Rogerio Pereira wrote: Hi! Is there someone in the list that worked on integrate Aperture Framework ( http://aperture.sourceforge.net/) with Solr? -- Regards, Rogério (_rogerio_) [Blog: http://faces.eti.br] [Sandbox:

Re: Query Performance while updating the index

2008-12-12 Thread Yonik Seeley
Right, query cache typically has a lower hit ratio, and one check per request - often not worth autowarming much. The filter cache can be a different story with a higher hit ratio, and higher number of checks per request. -Yonik On Fri, Dec 12, 2008 at 1:35 PM, Feak, Todd wrote: > It's spending

Re: Solr and Aperture Framework

2008-12-12 Thread Ryan McKinley
If you're up for a bit of integration you may also want to look at: http://incubator.apache.org/droids/ droids + http://wiki.apache.org/solr/ExtractingRequestHandler + some polish could be an alternative to Aperture. With Aperture, I feel like I spend most of my time getting stuff out of RDF

RE: Query Performance while updating the index

2008-12-12 Thread oleg_gnatovskiy
The auto warm time is not an issue. We take the server off the load balancer while it is autowarming. It seems that the slowness occurs after autowarm is done. Feak, Todd wrote: > > It's spending 4-5 seconds warming up your query cache. If 4-5 seconds is > too much, you could reduce the number

RE: Query Performance while updating the index

2008-12-12 Thread oleg_gnatovskiy
I just verified this. The slowness occurs after auto warm is done. Oleg -- View this message in context: http://www.nabble.com/Query-Performance-while-updating-the-index-tp20452835p20982068.html Sent from the Solr - User mailing list archive at Nabble.com.

RE: Query Performance while updating the index

2008-12-12 Thread Feak, Todd
Sorry, my bad. Didn't read the entire thread. Look at your filter cache first. You are autowarming 1000, and there are exactly 1000 in there. Yet it looks like there may be tens of thousands of filter queries in your system. I would try autowarming more. Try 10,000 or 20,000 and see if it helps. S

Re: Solr and Aperture Framework

2008-12-12 Thread Rogerio Pereira
Thanks guys for all the answers. I'll take a look at ExtractingRequestHandler and keep tracking Otis' progress on Tika integration. 2008/12/12 Ryan McKinley > If you're up for a bit of integration you may also want to look at: > http://incubator.apach

RE: Query Performance while updating the index

2008-12-12 Thread oleg_gnatovskiy
Should this autowarm value be set based on the number of lookups? From the info I provided, that's about 60k. filterCache{lookups=58522 Will 25k be enough? Also, does that mean that we have to increase the size and initial size to at least the autowarm value? Feak, Todd wrote: > > Sorry,

Re: Taxonomy Support on Solr

2008-12-12 Thread Shalin Shekhar Mangar
On Fri, Dec 12, 2008 at 9:11 PM, Walter Underwood wrote: > > One feature that is very useful is to update the category tag > after the document has been indexed. We ran into that again > and again when implementing taxonomies at Verity. Take a look at SOLR-828. There's no patch there but there h

Re: Solr 1.3 DataImportHandler iBatis integration ..

2008-12-12 Thread Shalin Shekhar Mangar
On Fri, Dec 12, 2008 at 11:50 PM, Rakesh Sinha wrote: > Hi - > I was planning to check more details about integrating ibatis query > resultsets with the query required for tags . Before I > start experimenting more along the lines - I am just curious if there > had been some effort done earlie

Re: Solr 1.3 DataImportHandler iBatis integration ..

2008-12-12 Thread Rakesh Sinha
Trivial answer - I already have quite a bit of iBatis queries as part of the project ( a large consumer facing website) that I want to reuse. Also - the iBatis layer already has all the db authentication tokens / sqlmap wired on ( as part of sql-map-config.xml ). When I create the dataConfig xml

Re: Applying Field Collapsing Patch

2008-12-12 Thread John Martyniak
That worked perfectly!!! Thank you. I wonder why it didn't work in the same way off the downloaded build. -John On Dec 11, 2008, at 9:40 PM, Doug Steigerwald wrote: Have you tried just checking out (or exporting) the source from SVN and applying the patch? Works fine for me that way. $ s

Solr - DataImportHandler - Large Dataset results ?

2008-12-12 Thread Kay Kay
As per the example in the wiki - http://wiki.apache.org/solr/DataImportHandler - I am seeing the following fragment. .. My scaled-down application looks very similar along these lines but where my resultset is s

Using Regex fragmenter to extract paragraphs

2008-12-12 Thread Mark Ferguson
Hello, I am trying to use the regex fragmenter and am having a hard time getting the results I want. I am trying to get fragments that start on a word character and end on punctuation, but for some reason the fragments being returned to me seem to be very inflexible, despite that I've provided a l

Re: Solr - DataImportHandler - Large Dataset results ?

2008-12-12 Thread Shalin Shekhar Mangar
DataImportHandler is designed to stream rows one by one to create Solr documents. As long as your database driver supports streaming, you should be fine. Which database are you using? On Sat, Dec 13, 2008 at 2:20 AM, Kay Kay wrote: > As per the example in the wiki - > http://wiki.apache.org/solr

Re: new faceting algorithm

2008-12-12 Thread wojtekpia
It looks like my filterCache was too big. I reduced my filterCache size from 700,000 to 20,000 (without changing the heap size) and all my performance issues went away. I experimented with various GC settings, but none of them made a significant difference. I see a 16% increase in throughput by a

Re: Solr - DataImportHandler - Large Dataset results ?

2008-12-12 Thread Kay Kay
I am using MySQL. I believe (since MySQL 5) supports streaming. On more about streaming - can we assume that when the database driver supports streaming , the resultset iterator is a forward directional iterator. If , say the streaming size is 10K records and we are trying to retrieve a total

Re: Solr - DataImportHandler - Large Dataset results ?

2008-12-12 Thread Bryan Talbot
It only supports streaming if properly enabled which is completely lame: http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-implementation-notes.html By default, ResultSets are completely retrieved and stored in memory. In most cases this is the most efficient way to operate, and

Re: Solr - DataImportHandler - Large Dataset results ?

2008-12-12 Thread Kay Kay
Thanks Bryan . That clarifies a lot. But even with streaming - retrieving one document at a time and adding to the IndexWriter seems to making it more serializable . So - may be the DataImportHandler could be optimized to retrieve a bunch of results from the query and add the Documents in a

Re: Using Regex fragmenter to extract paragraphs

2008-12-12 Thread Mark Ferguson
Someone helped me with the regex and pointed out a couple mistakes, most notably the extra quantifier in .*{400,600}. My new regex is this: \w.{400,600}[\.!?] Unfortunately, my results still aren't any better. Some results start with a word character, some don't, and none seem to end with punctua
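To see what the corrected pattern matches on its own, outside Solr, here is a small illustration using Python's re module. It is a sketch under stated assumptions: the bounds are shortened from 400-600 to 10-30 so a one-line sample suffices, the sample text is invented, and Python's regex engine stands in for the Java one used by the regex fragmenter. It says nothing about how the fragmenter applies the pattern internally.

```python
import re

def find_fragments(text, min_len, max_len):
    """Return non-overlapping matches of \\w.{min,max}[.!?] in text."""
    # Same shape as the pattern in the message above, with adjustable bounds.
    pattern = re.compile(r"\w.{%d,%d}[.!?]" % (min_len, max_len))
    return [match.group(0) for match in pattern.finditer(text)]

# Invented sample text; bounds shortened from 400-600 for readability.
sample = "Solr is a search server. It is built on Lucene! Does it scale? Yes."
fragments = find_fragments(sample, 10, 30)
for fragment in fragments:
    print(repr(fragment))
```

Because the quantifier is greedy, each match runs to the last punctuation mark that still fits inside the upper bound, so the final fragment spans two sentences. Note also that `.` does not match a newline unless the dot-all flag is set, which matters for text that contains line breaks.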

negative boosts

2008-12-12 Thread Kevin Osborn
My index has a category field and I would like to apply a negative boost to certain categories. For example, if I search for "thinkpad", it should push results for the laptop bag and other accessory categories to the bottom. So, I first tried altering the bq field with category:(batteries bags

Re: ExtractingRequestHandler and XmlUpdateHandler

2008-12-12 Thread Jacob Singh
Hi Grant, Thanks for the quick response. My Colleague looked into the code a bit, and I did as well, here is what I see (my Java sucks): http://svn.apache.org/repos/asf/lucene/solr/trunk/contrib/extraction/src/main/java/org/apache/solr/handler/extraction/SolrContentHandler.java //handle the lite

Re: Solr - DataImportHandler - Large Dataset results ?

2008-12-12 Thread Shalin Shekhar Mangar
On Sat, Dec 13, 2008 at 4:51 AM, Kay Kay wrote: > Thanks Bryan . > > That clarifies a lot. > > But even with streaming - retrieving one document at a time and adding to > the IndexWriter seems to making it more serializable . > We have experimented with making DataImportHandler multi-threaded in

Re: Solr 1.3 DataImportHandler iBatis integration ..

2008-12-12 Thread Shalin Shekhar Mangar
Ok makes sense. I don't think anybody has reported trying this. If you decide to do it, it might be worth contributing back. I guess it may be more difficult than just using plain sql queries. On Sat, Dec 13, 2008 at 2:10 AM, Rakesh Sinha wrote: > Trivial answer - I already have quite a bit of iB

Re: Solr - DataImportHandler - Large Dataset results ?

2008-12-12 Thread Kay Kay
Thanks Shalin for the clarification. The case about Lucene taking more time to index the Document when compared to DataImportHandler creating the input is definitely intuitive. But just curious about the underlying architecture on which the test was being run. Was this performed on a multi-co

Stopping / Starting IndexReaders in Solr 1.3+

2008-12-12 Thread Kay Kay
For a particular application of ours - we need to suspend the Solr server from doing any query operation ( IndexReader-s) for sometime, and then after sometime in the near future ( in minutes ) - reinitialize / warm IndexReaders once again and get moving. It is a little bit different from sin

Re: Solr - DataImportHandler - Large Dataset results ?

2008-12-12 Thread Shalin Shekhar Mangar
On Sat, Dec 13, 2008 at 11:03 AM, Kay Kay wrote: > Thanks Shalin for the clarification. > > The case about Lucene taking more time to index the Document when compared > to DataImportHandler creating the input is definitely intuitive. > > But just curious about the underlying architecture on which

Re: Solr - DataImportHandler - Large Dataset results ?

2008-12-12 Thread Kay Kay
Shalin Shekhar Mangar wrote: On Sat, Dec 13, 2008 at 11:03 AM, Kay Kay wrote: Thanks Shalin for the clarification. The case about Lucene taking more time to index the Document when compared to DataImportHandler creating the input is definitely intuitive. But just curious about the underly

Re: Solr - DataImportHandler - Large Dataset results ?

2008-12-12 Thread Shalin Shekhar Mangar
On Sat, Dec 13, 2008 at 11:45 AM, Kay Kay wrote: > True - Currently , playing around with mysql . But I was trying to > understand more about how the Statement object is getting created (in the > case of a platform / vendor specific query like this ). Are we going through > JPA internally in Solr

Re: multiword query using dismax

2008-12-12 Thread Chris Hostetter
: 2<-25% : my query is = monty+python+scandal : i just issue monty+python i get bunch of documents but when i issue : monty+python+scandal then i just get 1. isn't the case that i should : get documents which match monty+python+scandal then followed by : documents that match monty+pytho
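For readers following the mm value quoted above, the arithmetic behind a spec such as 2<-25% can be worked through in a few lines. This is a simplified sketch of the documented minimum-match rules (one conditional at most, percentages truncated toward zero), written in Python for illustration; the function name min_should_match is hypothetical and this is not Solr's own code.

```python
def min_should_match(clause_count, spec):
    """Number of optional clauses that must match for a simple mm spec."""
    if "<" in spec:
        # Conditional form "N<rest": with N clauses or fewer, all are required.
        bound, rest = spec.split("<", 1)
        if clause_count <= int(bound):
            return clause_count
        spec = rest
    if spec.endswith("%"):
        raw = clause_count * int(spec[:-1]) / 100
        # A negative percentage is the share that may be missing;
        # int() truncates toward zero, so -0.75 becomes 0.
        required = clause_count + int(raw) if raw < 0 else int(raw)
    else:
        value = int(spec)
        required = clause_count + value if value < 0 else value
    return max(0, min(clause_count, required))

# Three terms (monty, python, scandal) with mm=2<-25%.
needed_for_three = min_should_match(3, "2<-25%")
print(needed_for_three)
```

Under these rules, 25% of three clauses truncates to zero clauses that may be missing, so all three terms are required, which is consistent with the three-word query returning far fewer documents than the two-word one.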

Re: minimum match issue with dismax

2008-12-12 Thread Chris Hostetter
: do any one know how to make sure minimum match in dismax is working? i : change the values and try doing solrCtl restart indexname but i don't : see it taking into effect. any body have an idea on this? use debugQuery=true, and then look at the parsedquery ... it can be somewhat confusing

Re: negative boosts

2008-12-12 Thread Chris Hostetter
: My index has a category field and I would like to apply a negative boost : to certain categories. For example, if I search for "thinkpad", it : should push results for the laptop bag and other accessory categories to : the bottom. : So, I first tried altering the bq field with category:(batt

Re: Stopping / Starting IndexReaders in Solr 1.3+

2008-12-12 Thread Erik Hatcher
Maybe the PingRequestHandler can help? It can check for the existence of a file (see solrconfig.xml for healthcheck) and return an error if it is not there. This wouldn't prevent Solr from responding to requests, but if a client used that information to determine whether to make requests

Re: Dismax Minimum Match/Stopwords Bug

2008-12-12 Thread Chris Hostetter
: I have discovered some weirdness with our Minimum Match functionality. : Essentially it comes up with absolutely no results on certain queries. : Basically, searches with 2 words and 1 being "the" don't have a return : result. From what we can gather the minimum match criteria is making it : su