RE: Replacing existing documents

2007-08-22 Thread Ard Schrijvers
Hello, "Recently someone mentioned that it would be possible to have a 'replace existing document' feature rather than just dropping and adding documents with the same unique id." AFAIK, this is not possible. You have the update in lucene, but internally it just does a delete/add operation "We

Major update to Solrsharp

2007-08-22 Thread Jeff Rodenburg
A big update was just posted to the Solrsharp project. This update now provides for first-class support for highlighting in the library. The implementation is really robust and provides the following features: - Structured highlight parameter assignment based on the SolrField object - F

defining fields to be returned when using mlt

2007-08-22 Thread Stefan Rinner
Hi, is there any way to define the number/type of fields of the documents returned in the "moreLikeThis" part of the response, when "mlt" is set to true? Currently I'm using morelikethis to show the number and sources of similar documents - therefore I'd need only the "source" field of th

Re: Replacing existing documents

2007-08-22 Thread Erik Hatcher
On Aug 21, 2007, at 9:25 PM, Lance Norskog wrote: Recently someone mentioned that it would be possible to have a 'replace existing document' feature rather than just dropping and adding documents with the same unique id. There is such a patch: https://issues.apache.org/jira/browse/SOLR-13

Indexing HTML content... (Embed HTML into XML?)

2007-08-22 Thread Ravish Bhagdev
Hello, Sorry for stupid question. I'm trying to index html file as one of the fields in Solr, I've setup appropriate analyzer in schema but I'm not sure how to add html content to Solr. Encapsulating HTML content within field tag is obviously not valid. How do I add html content? Hope the query

Re: Indexing HTML content... (Embed HTML into XML?)

2007-08-22 Thread Jérôme Etévé
You need to encode your html content so it can be included as a normal 'string' value in your xml element. As far as I remember, the only unsafe characters you have to encode as entities are: < -> &lt; > -> &gt; " -> &quot; & -> &amp; (google xml entities to be sure). I don't know what language you use, but fo
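The encoding step described above can be sketched in Python with the standard library. This is only an illustration: the field names "id" and "content" and the helper name solr_add_xml are assumptions, not something taken from the thread.

```python
from xml.sax.saxutils import escape


def solr_add_xml(doc_id, html):
    """Build a Solr <add> message with the HTML escaped as plain text.

    The field names "id" and "content" are hypothetical; use whatever
    your schema.xml defines.
    """
    # escape() handles &, < and >; the extra mapping also escapes double quotes.
    body = escape(html, {'"': "&quot;"})
    return (
        "<add><doc>"
        f'<field name="id">{escape(doc_id)}</field>'
        f'<field name="content">{body}</field>'
        "</doc></add>"
    )


print(solr_add_xml("doc1", '<p class="x">Tom & Jerry</p>'))
```

An XML parser on the receiving side decodes the entities again, so the field value that reaches the analyzer is the original HTML string.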

Re: Indexing HTML content... (Embed HTML into XML?)

2007-08-22 Thread Ravish Bhagdev
Thanks Jérôme! It seems to work now. I just hope the provided HTMLStripWhitespaceTokenizerFactory will strip the right tags now. I use Java and used HtmlEncoder provided in http://itext.ugent.be/library/api/ for encoding with success. (just in case someone happens to search this thread) Ravi

almost realtime updates with replication

2007-08-22 Thread mike topper
Hello, Currently in our application we are using the master/slave setup and have a batch update/commit about every 5 minutes. There are a couple queries that we would like to run almost realtime so I would like to have it so our client sends an update on every new document and then have solr

RE: Query optimisation - multiple filter caches?

2007-08-22 Thread Jonathan Woods
I understand - thanks, Yonik. I notice that LuceneQueryOptimizer is still used in SolrIndexSearcher.search(Query, Filter, Sort) - is the idea then that this method is deprecated, or that the config parameter query/boolTofilterOptimizer is no longer to be used? As for the other search() methods, t

Re: Query optimisation - multiple filter caches?

2007-08-22 Thread Yonik Seeley
On 8/22/07, Jonathan Woods <[EMAIL PROTECTED]> wrote: > I notice that LuceneQueryOptimizer is still used in > SolrIndexSearcher.search(Query, Filter, Sort) - is the idea then that this > method is deprecated, Hmmm, so it is. I hadn't noticed because that method is not called from any query handle

RE: Query optimisation - multiple filter caches?

2007-08-22 Thread Jonathan Woods
Not high priority, but a few thoughts occur, then: - perhaps it would be better to use org.apache.lucene.search.Searcher by composition and have SolrIndexSearcher merely implement Searchable. - or... perhaps search(...) should perform optimally cache-aware searches - else integrators might wrongl

Apache web server logs in solr

2007-08-22 Thread Andrew Nagy
Hello, I was thinking that solr - with its built in faceting - would make for a great apache log file storage system. I was wondering if anyone knows of any module or library for apache to write log files directly to solr or to a lucene index? Thanks Andrew

RE: SolJava --- which attachments are valid?

2007-08-22 Thread Teruhiko Kurosaka
Sorry for revisiting this 3-week-old thread. I downloaded the nightly yesterday. I noticed that some classes have API docs (.html) but no source code (.java). For example, there is a javadoc for org.apache.solr.client.solrj.util.ClientUtils but no ClientUtils.java: bash-3.00$ find . -type f | gre

Solr and terracotta

2007-08-22 Thread Jonathan Ariel
Recently I ran into this topic. I googled it a little and didn't find much information. It would be great to have solr working with RAMDirectory and Terracotta. We could stop using crons for rsync, right? Has anyone tried that out?

Solr scoring: relative or absolute?

2007-08-22 Thread Lance Norskog
Are the score values generated in Solr relative to the index or are they against an absolute standard? Is it possible to create a scoring algorithm with this property? Are there parts of the score inputs that are absolute? My use case is this: I would like to do a parallel search against two Solr

Re: Solr scoring: relative or absolute?

2007-08-22 Thread Sean Timm
Indexes cannot be directly compared unless they have similar collection statistics. That is, the same terms occur with the same frequency across all indexes and the average document lengths are about the same (though the default similarity in Lucene may not care about average document length--I
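To make the point about collection statistics concrete, here is a small sketch of the inverse document frequency term, assuming the formula used by Lucene's classic default similarity, 1 + ln(numDocs / (docFreq + 1)). The document counts are made-up numbers chosen only to show the effect.

```python
import math


def idf(doc_freq, num_docs):
    """Inverse document frequency, assuming Lucene's classic default formula."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical statistics for the same term in two separate indexes.
idf_a = idf(doc_freq=10, num_docs=1_000_000)      # term is rare in index A
idf_b = idf(doc_freq=10_000, num_docs=1_000_000)  # term is common in index B

print(round(idf_a, 2), round(idf_b, 2))
```

The same term contributes a noticeably larger weight in the index where it is rare, so raw scores from the two indexes are not on a common scale.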

RE: Solr and terracotta

2007-08-22 Thread Jeryl Cook
tried it, didn't work that well...so I ended up making my own little faceted Search engine directly using RAMDirectory and clustering it via Terracotta...not as good as SOLR(smile), but it worked. i actually posted some questions awhile back in trying to get it to work. so terracotta can "hook"

Re: Solr and terracotta

2007-08-22 Thread Jonathan Ariel
How come it didn't work? How did you add RAMDir support to solr? On 8/22/07, Jeryl Cook <[EMAIL PROTECTED]> wrote: > > tried it, didn't work that well...so I ended up making my own little > faceted Search engine directly using RAMDirectory and clustering it via > Terracotta...not as good as SOLR(s

Running into problems with distributed index and search

2007-08-22 Thread Kasi Sankaralingam
Hi All, This is the scenario, I have two search SOLR instances running on two different partitions, I am treating one of the servers strictly read-only (for search) (search server) and the other Instance (index server) for indexing. The index file data directory reside on a NFS partition, I am

How to extract constrained fields from query

2007-08-22 Thread Martin Grotzke
Hello, in my custom request handler, I want to determine which fields are constrained by the user. E.g. the query (q) might be "ipod AND brand:apple" and there might be a filter query (fq) like "color:white" (or more). What I want to know is that "brand" and "color" are constrained. AFAICS I co
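The idea in the question above can be roughed out with a regular expression over the raw query strings. This is a naive sketch, not how Solr or Lucene parse queries: it treats any identifier directly followed by a colon as a field name, and it can be fooled by colons inside quoted phrases. A robust version would inspect the parsed query objects instead. The function name constrained_fields is hypothetical.

```python
import re

# Naive pattern: an identifier immediately followed by ':' counts as a field.
FIELD = re.compile(r'(?<![\w"])([A-Za-z_]\w*):')


def constrained_fields(q, fqs=()):
    """Return the sorted field names that appear in q and the filter queries."""
    fields = set()
    for query in (q, *fqs):
        fields.update(FIELD.findall(query))
    return sorted(fields)


print(constrained_fields("ipod AND brand:apple", ["color:white"]))
# prints ['brand', 'color']
```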

RE: Solr and terracotta

2007-08-22 Thread Orion Letizi
Jeryl, I remember you asking about how to hook in the RAMDirectory a while back. It seemed like there was maybe some support within Solr that you needed. I assume you're suggesting adding an issue in the Solr JIRA, right? Is there something that the Terracotta team can do to help? Cheers, Or

Web statistics for solr?

2007-08-22 Thread Matthew Runo
Hello! I was wondering if anyone has written a script that displays any stats from SOLR.. queries per second, number of docs added.. this sort of thing. Sort of a general dashboard for SOLR. I'd rather not write it myself if I don't need to, and I didn't see anything conclusive in the ar

Re: defining fields to be returned when using mlt

2007-08-22 Thread Pieter Berkel
Hi Stefan, Currently there is no way to specify the list of fields to be returned by the MoreLikeThis handler. I've been looking to address this issue in https://issues.apache.org/jira/browse/SOLR-295 (point 3) however in the broader scheme of things, it seems logical to wait until https://issues

Re: Web statistics for solr?

2007-08-22 Thread Pieter Berkel
Matthew, Maybe the SOLR Statistics page would suit your purpose? (click on "statistics" from the main solr page or use the following url) http://localhost:8983/solr/admin/stats.jsp cheers, Piete On 23/08/07, Matthew Runo <[EMAIL PROTECTED]> wrote: > > Hello! > > I was wondering if anyone has w

Re: almost realtime updates with replication

2007-08-22 Thread Walter Underwood
At Infoseek, we ran a separate search index with today's updates and merged that in once each day. It requires a little bit of federated search to prefer the new content over the big index, but the daily index can be very nimble for update. wunder On 8/22/07 7:58 AM, "mike topper" <[EMAIL PROTECT
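The "prefer the new content over the big index" step might look roughly like the sketch below. It is not the Infoseek implementation, only an illustration under the assumption that each hit is a dict carrying a unique "id"; it ignores score merging entirely.

```python
def merge_prefer_fresh(fresh_hits, main_hits, key="id"):
    """Merge two result lists, letting the fresh index win on duplicate ids."""
    seen = set()
    merged = []
    # Fresh hits are visited first, so a stale copy from the main index is dropped.
    for hit in list(fresh_hits) + list(main_hits):
        if hit[key] not in seen:
            seen.add(hit[key])
            merged.append(hit)
    return merged


fresh = [{"id": "a", "title": "updated today"}]
main = [{"id": "a", "title": "old version"}, {"id": "b", "title": "unchanged"}]
print(merge_prefer_fresh(fresh, main))
```

Here document "a" is returned once, in its updated form, followed by "b" from the main index.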

Re: Solr and terracotta

2007-08-22 Thread Jonathan Ariel
If I am not wrong once you have the RAMDir feature mounting Terracotta should be transparent and fast, right? On 8/22/07, Orion Letizi <[EMAIL PROTECTED]> wrote: > > > Jeryl, > > I remember you asking about how to hook in the RAMDirectory a while back. > It seemed like there was maybe some support

RE: SolJava --- which attachments are valid?

2007-08-22 Thread Chris Hostetter
: I noticed that some classes have API docs (.html) but no source code : (.java). : For example, there is a javadoc for : org.apache.solr.client.solrj.util.ClientUtils : but no ClientUtils.java: i believe the issue is that none of the source from the client directory is included in the builds at

Re: almost realtime updates with replication

2007-08-22 Thread Chris Hostetter
: : There are a couple queries that we would like to run almost realtime so : I would like to have it so our client sends an update on every new : document and then have solr configured to do an autocommit every 5-10 : seconds. : : reading the Wiki, it seems like this isn't possible because of the

Re: Running into problems with distributed index and search

2007-08-22 Thread Chris Hostetter
: 3) I had to bounce the tomcat search SOLR Webapp instance for it to : read the index files, is it mandatory? In a distributed environment, do : we always have to : : Bounce the SOLR Webapp instances to reflect the changes in the index : files? it sounds like you essentially have a master/sl

Re: How to extract constrained fields from query

2007-08-22 Thread Chris Hostetter
: in my custom request handler, I want to determine which fields are : constrained by the user. : : E.g. the query (q) might be "ipod AND brand:apple" and there might : be a filter query (fq) like "color:white" (or more). : : What I want to know is that "brand" and "color" are constrained. techni

Re: Structured Lucene documents

2007-08-22 Thread Chris Hostetter
: aren't expandable at query time. It would be quite cool if Solr could do : query-time expansions of dynamic fields (e.g. hl.fl=page_*) however that : would require some knowledge of the dynamic fields already stored in the : index, which I don't think is currently available in either Solr or Lu

Constraining date facets

2007-08-22 Thread raikoe
Hello, I am using faceting in a project and would like to do date faceting with facet.date. That works fine, but it also returns dates which have no resulting pages underneath, i.e. the facet count equals 0. Is it possible to constrain this just to dates for which results exist similar to facet.m
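Whatever the server-side answer turns out to be, empty buckets can also be dropped on the client. The sketch below assumes the date facet has already been decoded into a dict mapping bucket start dates to counts, possibly alongside non-count entries such as "gap" and "end". The sample data and the function name are made up for illustration.

```python
def nonzero_date_facets(date_facet):
    """Keep only date buckets whose count is greater than zero."""
    return {
        bucket: count
        for bucket, count in date_facet.items()
        # Entries such as "gap" or "end" are not integer counts and are skipped.
        if isinstance(count, int) and count > 0
    }


# Hypothetical decoded facet data for one date field.
sample = {
    "2007-08-20T00:00:00Z": 3,
    "2007-08-21T00:00:00Z": 0,
    "2007-08-22T00:00:00Z": 7,
    "gap": "+1DAY",
    "end": "2007-08-23T00:00:00Z",
}
print(nonzero_date_facets(sample))
```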

RE: Solr and terracotta

2007-08-22 Thread Jonathan Woods
Note that Hoss was earlier calling for someone to submit an implementation of SolrDirectoryFactory... http://www.nabble.com/forum/ViewPost.jtp?post=12260989&framed=y Jon > -Original Message- > From: Jonathan Ariel [mailto:[EMAIL PROTECTED] > Sent: 23 August 2007 03:23 > To: solr-user@lu