Re: Solr 1.4.1 field collapse

2010-07-31 Thread Otis Gospodnetic
Moazzam, I think you are thinking of Carrot2 clustering: http://wiki.apache.org/solr/ClusteringComponent Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/

Re: how to ignore position in indexing?

2010-07-31 Thread Otis Gospodnetic
The fact that https://issues.apache.org/jira/browse/LUCENE-2048 is still open tells me that omitting only positional info isn't possible yet. You can also look around here: http://search-lucene.com/?q=omit+position+frequency&fc_project=Solr Otis

how to ignore position in indexing?

2010-07-31 Thread Li Li
Hi all, in Lucene we can store only the tf in a term's inverted list. In my application I only offer dismax queries with boolean queries and don't support queries that need position info, such as phrase queries, so I don't want to store position info in the prx file. How do I turn it off? And if I turn off i

Re: Solr 1.4.1 field collapse

2010-07-31 Thread Moazzam Khan
Thanks, Hoss. I read something about clustering in the config file (and read somewhere that it would be in 1.4.1), so I was curious. - Moazzam On Thu, Jul 29, 2010 at 2:20 PM, Chris Hostetter wrote: > > : I read somewhere that Solr 1.4.1 has field collapse support by default > : (without patching it)

Re: Solr searching performance issues, using large documents

2010-07-31 Thread Lance Norskog
Ah! You're not just highlighting, you're snippetizing. This makes it easier. Highlighting does not stream: it pulls the entire stored contents into one string and then pulls out the snippet. If you want this to be fast, you have to split up the text into small pieces and only snippetize from the
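The splitting step suggested above could look roughly like the following. This is a minimal sketch, not anything taken from the thread: the function name `split_into_chunks`, the record keys, and the 50-line default are all hypothetical, and how the resulting pieces get indexed is left open.

```python
from typing import Dict, Iterator, Union


def split_into_chunks(
    text: str, lines_per_chunk: int = 50
) -> Iterator[Dict[str, Union[int, str]]]:
    """Yield small records, each holding a fixed number of lines of `text`.

    The keys (chunk_index, first_line, body) are hypothetical names chosen
    for illustration; they are not fields from any real schema.
    """
    if lines_per_chunk < 1:
        raise ValueError("lines_per_chunk must be at least 1")
    lines = text.splitlines()
    for start in range(0, len(lines), lines_per_chunk):
        yield {
            "chunk_index": start // lines_per_chunk,
            "first_line": start + 1,  # 1-based line number in the original text
            "body": "\n".join(lines[start:start + lines_per_chunk]),
        }
```

Each yielded record is small enough that pulling its whole stored body into one string is cheap, which is the point of the advice. Keeping `first_line` lets a hit be mapped back to its position in the original file.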

DIH, UTF8 and default DIH encoding value

2010-07-31 Thread Amit Nithian
All, I am not sure if this is overly obvious or not (it wasn't to me) but in trying to index some international characters from XML files using the DIH, I found that setting the encoding attribute on the dataSource element to "UTF-8" fixed my problem. My question is why the default isn't UTF-8
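The symptom described here, international characters going wrong until the encoding is stated explicitly, can be reproduced outside Solr. The sketch below is only an illustration: it assumes the bytes on disk are UTF-8 and uses Latin-1 as a stand-in for whatever non-UTF-8 charset a reader might fall back to. It makes no claim about what the DIH default actually is.

```python
def decode_bytes(raw: bytes, encoding: str) -> str:
    """Decode raw file bytes with an explicitly chosen encoding."""
    return raw.decode(encoding)


# Sample text with non-ASCII characters, encoded the way a UTF-8 XML file stores it.
original = "café €"
raw = original.encode("utf-8")

# Decoding with the matching encoding round-trips cleanly.
correct = decode_bytes(raw, "utf-8")

# Decoding the same bytes as Latin-1 (a stand-in for a wrong default) does not
# raise an error; it silently yields one character per byte, i.e. mojibake.
garbled = decode_bytes(raw, "latin-1")
```

Because the wrong decoding does not fail, the damage only becomes visible once the text is indexed or displayed, which matches the "it wasn't obvious to me" experience in the post.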

Re: Solr searching performance issues, using large documents

2010-07-31 Thread Peter Spam
However, I do need to search the entire document, or else the highlighting will sometimes be blank :-( Thanks! - Peter ps. sorry for the many responses - I'm rushing around trying to get this working. On Jul 31, 2010, at 1:11 PM, Peter Spam wrote: > Correction - it went from 17 seconds to 10

Re: Solr searching performance issues, using large documents

2010-07-31 Thread Peter Spam
Correction - it went from 17 seconds to 10 seconds - I was changing the hl.regex.maxAnalyzedChars the first time. Thanks! -Peter On Jul 31, 2010, at 1:06 PM, Peter Spam wrote: > On Jul 30, 2010, at 1:16 PM, Peter Karich wrote: > >> did you already try other values for hl.maxAnalyzedChars=21474

Re: Solr searching performance issues, using large documents

2010-07-31 Thread Peter Spam
On Jul 30, 2010, at 1:16 PM, Peter Karich wrote: > did you already try other values for hl.maxAnalyzedChars=2147483647 Yes, I tried dropping it down to 21, but it didn't have much of an impact (one search I just tried went from 17 seconds to 15.8 seconds, and this is an 8-core Mac Pro with 6GB
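For comparing different `hl.maxAnalyzedChars` values as discussed in this exchange, it can help to build the request parameters programmatically. The following is a rough sketch: `hl`, `hl.fl`, `hl.snippets` and `hl.maxAnalyzedChars` are standard Solr highlighting parameters, but the function name, the field name `body`, and the values used are assumptions made for illustration.

```python
from urllib.parse import urlencode


def build_highlight_query(terms: str, field: str, max_analyzed_chars: int) -> str:
    """Return a URL-encoded query string for a highlighted search.

    `field` is whatever stored field holds the document text; the name
    passed in by the caller is an assumption, not taken from the thread.
    """
    params = {
        "q": terms,
        "hl": "true",
        "hl.fl": field,
        "hl.snippets": 3,
        "hl.maxAnalyzedChars": max_analyzed_chars,
    }
    return urlencode(params)


# Two variants of the same search that differ only in the analysis limit.
unlimited = build_highlight_query("error", "body", 2147483647)
limited = build_highlight_query("error", "body", 51200)
```

Timing the two variants against the same index is one way to see how much of the latency comes from the highlighter. A lower limit means text beyond that offset is not considered, so snippets can come back empty when the match sits further into the document.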

Re: Solr searching performance issues, using large documents

2010-07-31 Thread Peter Spam
On Jul 30, 2010, at 7:04 PM, Lance Norskog wrote: > Wait- how much text are you highlighting? You say these logfiles are X > big- how big are the actual documents you are storing? I want it to be like Google - I put the entire (sometimes 60MB) doc in a field, and then just highlight 2-4 lines of

Re: search with special chars like € @ % §

2010-07-31 Thread Erick Erickson
Could you provide some more details on your use case? This sounds like an XY problem (see http://people.apache.org/~hossman/#xyproblem). The reason I say this is that you're probably going to shoot yourself in the foot if you require such symbols, leading to an "interesting" user experience. That

Boosting DisMax queries with !boost component

2010-07-31 Thread Martynas Miliauskas
Hi guys, I have the following boosting query: b=scale(popularity,0,1) So far I have tried the following queries, and got the following results: 1. http://localhost:8080/solr/select?q={!boost%20$b=scale(popularity,0,1)%20v=$qq%20defType=dismax}&qf=title+tags&fl=*,score&qq=s
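Hand-writing percent-escapes in a URL like the one quoted above is easy to get wrong, so one option is to let a library encode each parameter. The sketch below is an assumption about what the poster may have intended, not a confirmed fix: it passes the boost function as a separate `b` parameter and references it from the local params as `b=$b`. The function name `build_boosted_query` and the example search terms are hypothetical.

```python
from urllib.parse import urlencode


def build_boosted_query(user_terms: str) -> str:
    """Return a URL-encoded query string for a boosted dismax search.

    Assumption: the boost function is supplied as its own request
    parameter (`b`) and dereferenced inside the local params via `$b`.
    """
    params = {
        "q": "{!boost b=$b v=$qq defType=dismax}",
        "b": "scale(popularity,0,1)",
        "qq": user_terms,
        "qf": "title tags",
        "fl": "*,score",
    }
    return urlencode(params)


# Hypothetical search terms; the original value is cut off in the post.
query_string = build_boosted_query("example terms")
```

The result can be appended to the `/solr/select?` URL from the post. Braces, dollar signs, commas and spaces are all escaped consistently, which rules out encoding as a source of differing results.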

DIH: Rows fetch OK, Total Documents Failed??

2010-07-31 Thread scrapy
Hi, I'm a bit lost with this. I'm trying to import a new XML file via DIH; all rows are fetched but no documents are indexed, and I can't find any log or error. Any ideas? Here is the STATUS: status idle 1 7554 0 2010-07-31 10:14:33 0 7554 0:0:4.720 My xml file looks like this: M