Re: unable to figure out nutch type highlighting in solr....

2007-10-04 Thread Ravish Bhagdev
Thanks all for the help. Just to make sure I understand correctly, am I right in summarizing it this way, then? No significance of using HTML: unlike Nutch, Solr doesn't parse HTML, so it ignores the anchors, titles, etc. and is not good for PageRank-esque indexing. HTMLAnalyser (by which you probably mean

Re: unable to figure out nutch type highlighting in solr....

2007-10-04 Thread J.J. Larrea
At 3:45 PM -0700 10/4/07, Mike Klaas wrote: >I'm actually somewhat surprised that several people are interested in this but >none have been sufficiently interested to implement a solution to >contribute: > >http://issues.apache.org/jira/browse/SOLR-42 I just devised a workaround earlier in

Re: unable to figure out nutch type highlighting in solr....

2007-10-04 Thread Walter Underwood
Wow, well-formed HTML. That's a rare beast. --wunder On 10/4/07 7:08 PM, "Chris Hostetter" <[EMAIL PROTECTED]> wrote: > if you have wellformed HTML documents, use an HTML parser to extract the > real content.
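
As a rough illustration of "use an HTML parser to extract the real content", here is a minimal sketch using Python's standard html.parser. It assumes reasonably well-formed input, keeps the title separate from the body text (so each could go into its own Solr field), and skips script/style contents. It is not Nutch's parser and not part of Solr.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible body text and the <title>, skipping <script>/<style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # >0 while inside script/style
        self._in_title = False
        self.title = ""
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip_depth:
            return  # ignore script/style contents
        if self._in_title:
            self.title += data
        elif data.strip():
            self.parts.append(data.strip())


def extract(html):
    """Return (title, body_text) for an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), " ".join(parser.parts)
```

The plain text returned here is what you would send to Solr instead of raw markup, which also keeps highlighted snippets free of stray tags.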

Re: Does Solr Have?

2007-10-04 Thread Chris Hostetter
: Is there, or are there plans to start, a plugin and extension repository? Strictly speaking, there is no reason why Solr Plugins would have to live in an Apache repository. If people write plugins that would be generally useful to several people, and they wish to contribute them to Apache,

Re: unable to figure out nutch type highlighting in solr....

2007-10-04 Thread Chris Hostetter
: In general, I don't recommend indexing HTML content straight to Solr. None of : the Solr contributors do this so the use case hasn't received a lot of love. I second that comment ... the HTML Striping code was never intended to be an "HTML Parser" it was designed to be a workaround for deali

Re: unable to figure out nutch type highlighting in solr....

2007-10-04 Thread Adrian Sutton
On 05/10/2007, at 8:45 AM, Mike Klaas wrote: In general, I don't recommend indexing HTML content straight to Solr. None of the Solr contributors do this so the use case hasn't received a lot of love. We're indexing XHTML straight to Solr and it's working great so far. I'm actually somewhat

Re: Handling empty query

2007-10-04 Thread Christopher Triggs
Have a look at http://www.mail-archive.com/solr-user@lucene.apache.org/msg03394.html The thread goes on to describe that just using q=*:* is efficient and is very useful for getting facets for browsing / navigation. Regards, Triggsie On 4-Oct-07, at 3:25 PM, Guangwei Yuan wrote: Does S
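
A minimal sketch of the q=*:* approach for facet browsing: it builds a /select URL that matches every document and returns only facet counts. The base URL and the facet field name are placeholders for illustration; facet and facet.field are standard Solr request parameters.

```python
from urllib.parse import urlencode


def browse_all_url(base, facet_fields, rows=0):
    """Match-all query (q=*:*) that returns facet counts; rows=0 skips the docs."""
    params = [("q", "*:*"), ("rows", str(rows)), ("facet", "true")]
    # facet.field may be repeated once per field to facet on.
    params += [("facet.field", f) for f in facet_fields]
    return base.rstrip("/") + "/select?" + urlencode(params)


# Hypothetical Solr location and field name.
print(browse_all_url("http://localhost:8983/solr", ["category"]))
```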

Re: Handling empty query

2007-10-04 Thread Mike Klaas
On 4-Oct-07, at 3:25 PM, Guangwei Yuan wrote: Does Solr support empty queries? It'll be nice if Solr can return all results if q is null. Otherwise, I guess I'll have to write a customized request handler. Any thoughts? The dismax handler has a "q.alt" parameter which is used as the quer
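
Rather than a custom request handler, the client can fall back when the user query is blank. The sketch below assumes the dismax handler is selected with qt=dismax (as in the example solrconfig.xml) and uses the dismax q.alt parameter with *:* so an empty search returns all documents; the qf value is a hypothetical field list.

```python
def dismax_params(user_query, qf):
    """Build dismax request params, falling back to q.alt=*:* for blank input."""
    params = {"qt": "dismax", "qf": qf}
    if user_query is None or not user_query.strip():
        # No user input: q.alt is parsed with the standard syntax, so *:* matches all.
        params["q.alt"] = "*:*"
    else:
        params["q"] = user_query.strip()
    return params


print(dismax_params("", "title^2 body"))
```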

Re: unable to figure out nutch type highlighting in solr....

2007-10-04 Thread Mike Klaas
On 4-Oct-07, at 3:19 PM, Adrian Sutton wrote: I see that you're using the HTML analyzer. Unfortunately that does not play very well with highlighting at the moment. You may get garbled output. Is it the HTML analyzer or the fact that it's HTML content? If it's just the analyzer you could

RE: Handling empty query

2007-10-04 Thread Lance Norskog
If a field is required, and always has data, this query will enumerate all documents: field:[* TO *] -Original Message- From: Guangwei Yuan [mailto:[EMAIL PROTECTED] Sent: Thursday, October 04, 2007 3:26 PM To: solr-user@lucene.apache.org Subject: Handling empty query Hi, Does Solr su

Handling empty query

2007-10-04 Thread Guangwei Yuan
Hi, Does Solr support empty queries? It'll be nice if Solr can return all results if q is null. Otherwise, I guess I'll have to write a customized request handler. Any thoughts? Thanks in advance. - Guangwei

RE: how to make sure a particular query is ALWAYS cached

2007-10-04 Thread Britske
I need the documents in order, so FilterCache is no use. Moreover, I already use lots of the filtercache for other fq-queries. About 99% of the 6000 fields I mentioned have their values separately in the filtercache. There must be room for optimization there, but that's a different story ;-) //G

Re: unable to figure out nutch type highlighting in solr....

2007-10-04 Thread Adrian Sutton
I see that you're using the HTML analyzer. Unfortunately that does not play very well with highlighting at the moment. You may get garbled output. Is it the HTML analyzer or the fact that it's HTML content? If it's just the analyzer you could always just copy the HTML content to another

RE: how to make sure a particular query is ALWAYS cached

2007-10-04 Thread Lance Norskog
You could make these filter queries. Filters are a separate cache and as long as you have more cache than queries they will remain pinned in RAM. Your code has to remember these special queries in special-case code, and create dummy query strings to fetch the filter query. "field:[* TO *]" will do
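
A sketch of the suggestion above: send the costly query as an fq (served from the filterCache) and use a cheap match-everything q as the dummy query string. The field names here are hypothetical. As noted in the reply about ordering, documents are then scored only by the dummy q, so the costly query's relevance order is not preserved.

```python
from urllib.parse import urlencode


def filter_backed_query(costly_query, dummy_q="id:[* TO *]", rows=10):
    """Query string that runs costly_query as a cached filter (fq).

    dummy_q must match every document (e.g. a required, always-populated
    field with an open range); 'id' is only a placeholder field name.
    """
    return urlencode({"q": dummy_q, "fq": costly_query, "rows": str(rows)})


print(filter_backed_query("price:[0 TO 100] AND inStock:true"))
```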

Re: Indexing HTML

2007-10-04 Thread Mike Klaas
On 3-Oct-07, at 3:26 AM, Ravish Bhagdev wrote: Because of this I cannot present the resulting html in a webpage. Is it possible to strip out all HTML tags completely in result set? Would you recommend sending stripped out text to solr instead? But doesn't Solr use HTML features while searchin

Re: unable to figure out nutch type highlighting in solr....

2007-10-04 Thread Mike Klaas
On 2-Oct-07, at 12:52 AM, Ravish Bhagdev wrote: I see that you're using the HTML analyzer. Unfortunately that does not play very well with highlighting at the moment. You may get garbled output. -Mike

RE: question about bi-gram analysis on query

2007-10-04 Thread Keene, David
Hi, Thanks for responding. I should have been clearer. By "actual search" I meant hitting the search demo page on the Solr admin page. So I get no results on this query: /solr/select/?q=%E7%BE%8E%E8%81%AF&version=2.2&start=0&rows=10&indent=on But the same query (with the data in my index) o
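
Two quick checks that may help narrow this down. First, the q value in the URL above is exactly the UTF-8 percent-encoding of 美聯, so the request itself looks correctly encoded (assuming the servlet container decodes URIs as UTF-8). Second, a bigram analyzer emits overlapping two-character tokens; the helper below is a generic sketch of that idea, not Solr's CJK tokenizer, to show what tokens the query and indexed text should share.

```python
from urllib.parse import quote


def cjk_bigrams(text):
    """Overlapping character bigrams; a single character yields itself."""
    if len(text) < 2:
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


print(quote("美聯"))          # percent-encoded UTF-8, as in the URL above
print(cjk_bigrams("美聯社"))  # overlapping bigrams
```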

Re: unable to figure out nutch type highlighting in solr....

2007-10-04 Thread Mike Klaas
On 2-Oct-07, at 12:52 AM, Ravish Bhagdev wrote: I have tried very hard to follow documentation and forums that try to answer questions about how to return snippets with highlights for relevant searched term using Solr (as nutch does with such ease). I will be really grateful if someone can guid

Re: Solr - Lucene Query

2007-10-04 Thread Mike Klaas
On 4-Oct-07, at 11:07 AM, Jae Joo wrote: In the schema.xml, this field is defined by type="text" indexed="true" /> Is there any way to find the document by querying - The Appraisal Station? Sure: if you query trade1:(the appraisal station), that document should be found. If you w

Re: how to make sure a particular query is ALWAYS cached

2007-10-04 Thread Britske
hossman wrote: > > > : I want a couple of costly queries to be cached at all times in the > : queryResultCache. (unless I have a new searcher of course) > > first off: you can ensure that certain queries are in the cache, even if > there is a newSearcher, just configure a newSearcher Event L

RE: question about bi-gram analysis on query

2007-10-04 Thread Teruhiko Kurosaka
Hello David, > And if I do a search in Luke and the solr analysis page > for 美聯, I get a hit. But on the actual search, I don't. I think you need to tell us what you mean by "actual search" and your code that interfaces with Solr. -kuro

Solr - Lucene Query

2007-10-04 Thread Jae Joo
In the schema.xml, this field is defined by Is there any way to find the document by querying - The Appraisal Station? Thanks, Jae

facet and field collapse

2007-10-04 Thread Xuesong Luo
Hi, there, Our index stores employee working history information. For each employee, there could be multiple index records. The requirement is: 1. The search result should be sorted on score. 2. Each employee should only appear once regardless of how many matches are found. 3. The resul
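
Without server-side field collapsing (not available in stock Solr at the time), requirements 1 and 2 can be approximated on the client: sort hits by score and keep the first record seen for each employee. This is a hedged sketch; employee_id is a hypothetical field name, and collapsing a single page of results can leave pages short, so you would typically fetch more rows than you display.

```python
def collapse_by_employee(docs, key="employee_id"):
    """Keep the highest-scoring record per employee, ordered by score (desc)."""
    seen = set()
    collapsed = []
    for doc in sorted(docs, key=lambda d: d["score"], reverse=True):
        if doc[key] not in seen:  # first occurrence is the best-scoring one
            seen.add(doc[key])
            collapsed.append(doc)
    return collapsed


hits = [
    {"employee_id": "e1", "score": 0.4},
    {"employee_id": "e2", "score": 0.9},
    {"employee_id": "e1", "score": 0.7},
]
print([d["employee_id"] for d in collapse_by_employee(hits)])
```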

Re: Does Solr Have?

2007-10-04 Thread Matthew Runo
Boo, thank you for the reply. That's what I get for customizing it and taking out all the other code I guess. Sorry about that. ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 +

Re: Does Solr Have?

2007-10-04 Thread Ryan McKinley
add: class="org.apache.solr.handler.admin.LukeRequestHandler" /> to your solrconfig.xml It is in the example solrconfig.xml that comes with 1.2 Matthew Runo wrote: How does one set up the LukeRequestHandler? I didn't see a document in the wiki about how to add new handlers, and my install

Re: Real-time replication

2007-10-04 Thread Walter Underwood
We don't use Solr replication. Each server is independent and does its own indexing. This has several advantages: * all installations are identical * no single point of failure * no inter-server version or config dependencies * we can run a different version or config on one server for testing Th

Re: Real-time replication

2007-10-04 Thread Matthew Runo
The only problem that I see possibly happening is that you may end up committing more often than SOLR can open/prewarm new searchers. This happens at the peak of the day on our servers, leaving us with 5-10 searchers just hanging around waiting for prewarming to finish, only to be closed as soon as

Re: Does Solr Have?

2007-10-04 Thread Matthew Runo
How does one set up the LukeRequestHandler? I didn't see a document in the wiki about how to add new handlers, and my install (a 1.1 install upgraded to 1.2) does not have this handler available. I'd like to see what we're talking about; it sounds very interesting, but I can't find how to

Real-time replication

2007-10-04 Thread John Reuning
Apologies if this has been covered. I searched the archives and didn't see a thread on this topic. Has anyone experimented with a near real-time replication scheme similar to RDBMS replication? There's large efficiency in using rsync to copy the lucene index files to slaves, but what if you

Re: Re: Replication

2007-10-04 Thread ycrux
Hi Eric! I can help with that if you want, even if I'm new to Solr. What are you planning to do? cheers Younès Original message >From: Erik Hatcher >Subject: Re: Replication >Date: Thu, 4 Oct 2007 09:09:03 -0400 >To: solr-user@lucene.apache.org

Re: how to make sure a particular query is ALWAYS cached

2007-10-04 Thread Chris Hostetter
: I want a couple of costly queries to be cached at all times in the : queryResultCache. (unless I have a new searcher of course) first off: you can ensure that certain queries are in the cache, even if there is a newSearcher, just configure a newSearcher Event Listener that forcibly warms the
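
The newSearcher warming suggestion corresponds to a listener in solrconfig.xml using solr.QuerySenderListener (as in the example config). The sketch below generates that XML from a list of query parameter sets, so the queries kept warm stay in sync with the ones the application sends. The query values are hypothetical; for a queryResultCache hit, the warming query should use the same q, sort, and filters as the real request.

```python
import xml.etree.ElementTree as ET


def new_searcher_listener(queries):
    """Build a <listener event="newSearcher"> element that warms each query."""
    listener = ET.Element(
        "listener", {"event": "newSearcher", "class": "solr.QuerySenderListener"}
    )
    arr = ET.SubElement(listener, "arr", {"name": "queries"})
    for params in queries:
        lst = ET.SubElement(arr, "lst")
        for name, value in params.items():
            s = ET.SubElement(lst, "str", {"name": name})
            s.text = value
    return ET.tostring(listener, encoding="unicode")


# Hypothetical costly query; paste the output into solrconfig.xml's <query> section.
print(new_searcher_listener([{"q": "category:books", "sort": "price asc", "rows": "10"}]))
```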

Re: Does Solr Have?

2007-10-04 Thread Robert Young
Is there, or are there plans to start, a plugin and extension repository? Cheers Rob On 10/4/07, Ryan McKinley <[EMAIL PROTECTED]> wrote: > dooh, should check all my email first! > > >> > >> Will Solr automatically reload the file if it changes or does it have > >> to be informed of the change? >

Re: Solr live at Netflix

2007-10-04 Thread Walter Underwood
Gamera and Gamers do not stem to the same word, but the old Netflix engine did conflate those two words. The Metaphones for those are KMR and KMRS, respectively, and the old engine did fuzzy matching on Metaphones, something I don't recommend. It also matched "skiing" to "sings". wunder On 10/4/0
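
To see why fuzzy matching on Metaphone codes conflates these words, compare the codes quoted above: KMR and KMRS are a single edit apart, so any fuzzy match that tolerates one edit treats Gamera and Gamers as the same. The sketch computes plain Levenshtein distance on those given codes; it does not compute Metaphone itself.

```python
def levenshtein(a, b):
    """Classic edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb))) # substitution
        prev = cur
    return prev[-1]


print(levenshtein("KMR", "KMRS"))  # Metaphone codes for Gamera / Gamers
```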

RE: Solr live at Netflix

2007-10-04 Thread Wagner,Harry
Otis, Take a look at KStem: http://ciir.cs.umass.edu/cgi-bin/downloads/downloads.cgi It's less aggressive than Porter. I modified the Lucene version to work with Solr, but don't know if it was adopted into the Solr source. Let me know if you are interested and I'll send you a jar file. Cheers!

Re: Solr live at Netflix

2007-10-04 Thread Otis Gospodnetic
I'm curious about this one. I'm assuming Porter stemmer would stem Gamers and Gamera to the same stem (Game?). If the stems are different, which stemmer are you using? A smarter custom morphological stemmer? Thanks, Otis - Original Message From: Tom Hill <[EMAIL PROTECTED]> To: solr

Re: Does Solr Have?

2007-10-04 Thread Ryan McKinley
dooh, should check all my email first! Will Solr automatically reload the file if it changes or does it have to be informed of the change? I'll expose my confusion here and say that I don't know for sure, but I'm pretty sure that once it's been loaded it won't get reloaded without bouncing

Re: Does Solr Have?

2007-10-04 Thread Ryan McKinley
Robert Young wrote: Hi, We're just about to start work on a project in Solr and there are a couple of points which I haven't been able to find out from the wiki which I'm interested in. 1. Is there a REST interface for getting index stats? I would particularly like access to terms and their doc

Re: Replication

2007-10-04 Thread Erik Hatcher
Eric, there are tons here. Start there and hit us up here if anything is amiss. Erik On Oct 4, 2007, at 9:05 AM, Eric Treece wrote: Hello All, I am interested in some of the joys, tribulations and processes of running a replica

Replication

2007-10-04 Thread Eric Treece
Hello All, I am interested in some of the joys, tribulations and processes of running a replicated Solr environment. Can anyone point to any particular links, documents and/or personal experiences. Thanks, Eric Treece [EMAIL PROTECTED]

Re: Does Solr Have?

2007-10-04 Thread Erik Hatcher
On Oct 4, 2007, at 6:10 AM, Robert Young wrote: Brilliant, thank you, that LukeRequestHandler looks very useful. On 10/4/07, Erik Hatcher <[EMAIL PROTECTED]> wrote: 3. Is it possible to change stopword and synonym sets at runtime? Only if the underlying text file is changed. Will Solr aut

Re: how to make sure a particular query stays cached (and is not overwritten)

2007-10-04 Thread Britske
The title of my original post was misguided. // Geert-Jan Britske wrote: > > I want a couple of costly queries to be cached at all times in the > queryResultCache. (unless I have a new searcher of course) > > As far as I know the only parameters to be supplied to the > LRU-implementation of

Re: Does Solr Have?

2007-10-04 Thread Robert Young
Brilliant, thank you, that LukeRequestHandler looks very useful. On 10/4/07, Erik Hatcher <[EMAIL PROTECTED]> wrote: > > 3. Is it possible to change stopword and synonym sets at runtime? > > Only if the underlying text file is changed. Will Solr automatically reload the file if it changes or does

Re: Does Solr Have?

2007-10-04 Thread Erik Hatcher
On Oct 4, 2007, at 4:38 AM, Robert Young wrote: 1. Is there a REST interface for getting index stats? I would particularly like access to terms and their document frequencies, preferably filtered by a query. Yes, the Luke request handler provides deeper views into the index information:
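
A small sketch of calling the Luke request handler over HTTP for a field's top terms. It assumes the handler is registered at /admin/luke, as in the 1.2 example solrconfig.xml, and uses its fl and numTerms parameters; the base URL and field name are placeholders. Luke reports index-wide statistics, so it does not restrict term counts to a query.

```python
from urllib.parse import urlencode


def luke_url(base, field, num_terms=20):
    """URL asking the Luke handler for the top terms of one field."""
    params = {"fl": field, "numTerms": str(num_terms)}
    return base.rstrip("/") + "/admin/luke?" + urlencode(params)


print(luke_url("http://localhost:8983/solr", "title"))
```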

how to make sure a particular query is ALWAYS cached

2007-10-04 Thread Britske
I want a couple of costly queries to be cached at all times in the queryResultCache. (unless I have a new searcher of course) As far as I know the only parameters to be supplied to the LRU-implementation of the queryResultCache are size-related, which doesn't give me this guarantee. What would
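
To make the missing guarantee concrete: a plain LRU cache evicts whatever was used least recently, however costly it was to compute. The sketch below is a hypothetical LRU with pinned keys that are never evicted. It only illustrates the desired behavior; it is not Solr's LRUCache and is not a configurable Solr feature.

```python
from collections import OrderedDict


class PinnedLRUCache:
    """LRU cache that never evicts pinned keys (illustrative, not Solr's API)."""

    def __init__(self, capacity, pinned=()):
        self.capacity = capacity
        self.pinned = set(pinned)
        self._data = OrderedDict()  # oldest entry first

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.capacity:
            # Evict the least recently used entry that is not pinned.
            victim = next((k for k in self._data if k not in self.pinned), None)
            if victim is None:
                break  # only pinned entries remain; allow overflow
            del self._data[victim]
```

With a plain LRU of size 2, inserting two cheap queries after a costly one would push the costly one out; here it survives.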

Re: Searching combined English-Japanese index

2007-10-04 Thread Maximilian Hütter
You were right, the indexing is already wrong. I debugged Solr and saw that the indexwriter gets the wrong values. That was because of the missing charset in the Content-Type of the update requests. It was just text/xml without the charset=utf-8. So it was interpreted as ISO-8859-1, I think. Changing the charset t
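
A minimal sketch of an update request that declares its encoding explicitly, so non-ASCII text (Japanese here) is not decoded as ISO-8859-1. The update URL and the field name are placeholders; the request is only built, not sent.

```python
import urllib.request


def update_request(url, xml_body):
    """POST request for Solr's XML update handler with an explicit UTF-8 charset."""
    return urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),  # bytes must match the declared charset
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


doc = '<add><doc><field name="title">日本語</field></doc></add>'
req = update_request("http://localhost:8983/solr/update", doc)
# urllib.request.urlopen(req) would send it to a running Solr instance.
```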

Does Solr Have?

2007-10-04 Thread Robert Young
Hi, We're just about to start work on a project in Solr and there are a couple of points which I haven't been able to find out from the wiki which I'm interested in. 1. Is there a REST interface for getting index stats? I would particularly like access to terms and their document frequencies, pre