date:20111104

Proper analyzer / tokenizer for syslog data?

2011-11-04 Thread Peter Spam

Example data: 01/23/2011 05:12:34 [Test] a=1; hello_there=50; data=[1,5,30%]; I would love to be able to just "grep" the data, i.e. if I search for "ello", it finds and returns "ello", and if I search for "hello_there=5", it would match too. Here's what I'm using now:

Re: Can Solr handle large text files?

2011-11-04 Thread Peter Spam

Solr 4.0 (11/1 snapshot). Data: 80k files, average size 2.5MB, largest is 750MB. Solr: each document is max 256k; total docs = 800k. Machine: Early 2009 Mac Pro, 6GB RAM, 1GB min/2GB max given to Solr Java; Admin shows 30% mem usage. I originally tried injecting the entire file into a single Solr doc
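The 256k-per-document limit quoted above implies that each file is split before indexing. A minimal sketch of such a split follows, assuming a fixed byte budget per chunk. The function name `split_into_chunks` and the 256 KiB constant are illustrative and not taken from the thread; a real splitter would probably cut on line boundaries so that log entries and multi-byte characters stay intact.

```python
CHUNK_BYTES = 256 * 1024  # assumed reading of "max 256k" per Solr document


def split_into_chunks(data: bytes, chunk_bytes: int = CHUNK_BYTES) -> list:
    """Split raw file content into consecutive pieces of at most chunk_bytes."""
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    return [data[i:i + chunk_bytes] for i in range(0, len(data), chunk_bytes)]


# Stand-in for one file's content; each chunk would become one Solr document.
sample = b"x" * 600_000
chunks = split_into_chunks(sample)
chunk_sizes = [len(c) for c in chunks]
print(chunk_sizes)
```

As a sanity check on the numbers in the post: 80k files at 2.5MB average is roughly 200GB, which at 256k per chunk comes to somewhat under 800k documents, consistent with the total quoted.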

Zookeeper aware Replication in SolrCloud

2011-11-04 Thread prakash chandrasekaran

Hi, I'm using SolrCloud and I wanted to add the Replication feature to it. I followed the steps in the Solr Wiki, but when the client tried to poll for data from the server I got the error message below in the master log: Nov 3, 2011 8:34:00 PM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.

Re: Proper analyzer / tokenizer for syslog data?

2011-11-04 Thread Ahmet Arslan

> Example data: > 01/23/2011 05:12:34 [Test] a=1; hello_there=50; > data=[1,5,30%]; > > I would love to be able to just "grep" the data - ie. if I > search for "ello", it finds and returns "ello", and if I > search for "hello_there=5", it would match too. > > Here's what I'm using now: > > c

Solr, MultiValues and links...

2011-11-04 Thread Tiernan OToole

Right, not sure how to ask this question, or what the terminology is, but hopefully my explanation will help... We are chucking data into Solr for queries. I can't mention the exact data, but the closest thing I can think of is as follows: - Unique ID for the Solr record (DB ID in this case) - A

Comparing apples & oranges?

2011-11-04 Thread Martin Koch

Hi List, I have a Solr index where I want to include numerical fields in my ranking function as well as keyword relevance. For example, each document has a document view count, and I'd like to increase the relevancy of documents that are read often, and penalize documents with a very low view count

Solr

2011-11-04 Thread KARHU Toni

Hi, when is the SolrCloud version planned to be released/stable, and what are your thoughts on using it in a serious production environment? Br, Toni

Re: Ordered proximity search

2011-11-04 Thread Rahul Warawdekar

Hi Thomas, Do you always need the ordered proximity search by default? You may want to check SpanNearQuery at http://www.lucidimagination.com/blog/2009/07/18/the-spanquery/. We are using the edismax query parser provided by Solr. I had a similar type of requirement in our project, and here is how

Re: Ordered proximity search

2011-11-04 Thread LT.thomas

Thanks for your reply, I will check this advice.

Fwd: Assist please

2011-11-04 Thread Oleg Tikhonov

-- Forwarded message -- From: NDIAYE Bacar Date: Fri, Nov 4, 2011 at 12:05 PM Subject: Assist please To: d...@tika.apache.org, u...@tika.apache.org Hi, I need your assistance, please, to configure Apache Tika for Solr attachments in Drupal 7. I have tried to confi

Re: limiting searches to particular sources

2011-11-04 Thread Erick Erickson

How are you crawling your info? Somewhere you have to inject the source into the document, won't do the trick because there's no source available. If you're crawling the data by yourself, you can just add the source to the document. If you're using DIH, you can specify the field as a constant

Re: Highlighting "text" field when query is for "string" field

2011-11-04 Thread Erick Erickson

Try this with &debugQuery=on. I suspect you're not getting the query you think you are and I'd straighten that out before worrying about highlighting. Usually, for instance, AND should be capitalized to be an operator. So try with &debugQuery=on and see what happens. The highlighter, I believe, w

Re: Extended Dismax and Proximity Queries

2011-11-04 Thread Erick Erickson

Yes. Just try it with &debugQuery=on and you can see the parsed form of the query. Best Erick On Wed, Nov 2, 2011 at 6:20 PM, Jamie Johnson wrote: > Is it possible to do Proximity queries using edismax? I saw I could > do the following > > q="batman movie"&qs=100 > > but I wanted to be able to
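For readers who want to try the suggestion, here is a hedged sketch of how such a request URL could be assembled. The `q` and `qs` values come from the quoted question and `debugQuery=on` from the reply; the host, port, and `/solr/select` path are assumptions for a default local install, and `defType=edismax` is added on the assumption that the handler does not already default to edismax.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; host, port, and handler path are assumptions.
BASE_URL = "http://localhost:8983/solr/select"

params = {
    "q": '"batman movie"',  # phrase query from the thread
    "qs": 100,              # query phrase slop from the thread
    "defType": "edismax",   # assumed: select the extended dismax parser
    "debugQuery": "on",     # ask Solr to include the parsed query in the response
}

query_string = urlencode(params)  # percent-encodes the quotes and the space
url = BASE_URL + "?" + query_string
print(url)
```

The parsed form of the query then appears in the debug section of the response, which is what the reply suggests inspecting.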

Re: limiting searches to particular sources

2011-11-04 Thread Markus Jelsma

Your Nutch indexes the site and host fields. If that is not enough you can use its subcollection plugin to write values for URL patterns. On Wednesday 02 November 2011 15:52:37 Fred Zimmerman wrote: > I want to be able to list some searches to particular sources, e.g. "wiki > only", "crawled only

Re: Solr real-time update taking time

2011-11-04 Thread Erick Erickson

I think that 1-2 second requirement is unreasonable. The first thing I'd do is push back and understand whether this is actually a requirement or just somebody picking numbers out of thin air. Committing often enough for this to work is just *asking* for trouble with 3.3. I'd take a look at the Ne

Re: best way for sum of fields

2011-11-04 Thread Erick Erickson

Please define "sum of fields". The total number of unique terms for all the fields? The sum of some values of some fields for each document? The count of the number of fields in the index? Other??? Best Erick On Thu, Nov 3, 2011 at 11:43 AM, stockii wrote: > i am searching for the best way to ge

Re: Three questions about: Commit, single index vs multiple indexes and implementation advice

2011-11-04 Thread Erick Erickson

Let's see... 1> Committing every second, even with commitWithin, is probably going to be a problem. I usually think that 1 second latency is overkill, but that's up to your product manager. Look at the NRT (Near Real Time) stuff if you really need this. I thought that NRT was

Re: [Profiling] How to profile/tune Solr server

2011-11-04 Thread karsten-solr

Hi Spark, in 2009 there was a monitor from lucidimagination: http://www.lucidimagination.com/about/news/releases/lucid-imagination-releases-performance-monitoring-utility-open-source-apache-lucene A colleague of mine calls the sematext-monitor "trojan" because "SPM phone home": "Easy in, easy out -

InvalidTokenOffsetsException when using MappingCharFilterFactory, DictionaryCompoundWordTokenFilterFactory and Highlighting

2011-11-04 Thread Edwin Steiner

Hello all, I would like to handle German accents (Umlaute) by replacing the accented char with its two-letter substitute (e.g. ä => ae). For this reason I use the char-filter solr.MappingCharFilterFactory configured with a mapping file containing entries like “ä” => “ae”. I also want to use the

Re: [Profiling] How to profile/tune Solr server

2011-11-04 Thread Andre Bois-Crettez

SolrMeter is useful too, it can be plugged into a production server just to watch the evolution of cache usage: http://code.google.com/p/solrmeter/wiki/Screenshots#CacheHistoryStatistic André

Re: limiting searches to particular sources

2011-11-04 Thread Fred Zimmerman

Yes -- how do I specify the field as a constant in DIH? On Fri, Nov 4, 2011 at 11:17 AM, Erick Erickson wrote: > How are you crawling your info? Somewhere you have to inject the > source into the document, won't do the trick because > there's no source available > > If you're crawling the da

Batch indexing documents using ContentStreamUpdateRequest

2011-11-04 Thread Tod

This is a code fragment of how I am doing a ContentStreamUpdateRequest using CommonsHttpSolrServer: ContentStreamBase.URLStream csbu = new ContentStreamBase.URLStream(url); InputStream is = csbu.getStream(); FastInputStream fis = new FastInputStream(is); csur.addContentStream(csbu); c

overwrite=false support with SolrJ client

2011-11-04 Thread Ken Krugler

Hi list, I'm working on improving the performance of the Solr scheme for Cascading. This supports generating a Solr index as the output of a Hadoop job. We use SolrJ to write the index locally (via EmbeddedSolrServer). There are mentions of using overwrite=false with the CSV request handler, as

Re: performance - dynamic fields versus static fields

2011-11-04 Thread Erick Erickson

Dynamic fields are just fields, man. There's really no overhead that I know of. I tend to prefer non-dynamic fields whenever possible to reduce hard-to-find errors where, say, I've made a typo and the dynamic pattern matches, but that's largely a personal preference. Best Erick On Thu, Nov 3, 20

Re: overwrite=false support with SolrJ client

2011-11-04 Thread Jason Rutherglen

It should be supported in SolrJ, I'm surprised it's been lopped out. Bulk indexing is extremely common. On Fri, Nov 4, 2011 at 1:16 PM, Ken Krugler wrote: > Hi list, > > I'm working on improving the performance of the Solr scheme for Cascading. > > This supports generating a Solr index as the out

Term frequency question

2011-11-04 Thread Craig Stadler

I am using this reference link: http://www.mail-archive.com/solr-user@lucene.apache.org/msg26389.html However the article is a bit old and when I try to compile the class (using newest solr 3.4 / java version "1.7.0_01" / Java(TM) SE Runtime Environment (build 1.7.0_01-b08) / Java HotSpot(TM)

Re: Three questions about: Commit, single index vs multiple indexes and implementation advice

2011-11-04 Thread Gustavo Falco

First of all, thanks a lot for your answer. 1) I could use 5 to 15 seconds between each commit and give it a try. Is this an acceptable configuration? I'll take a look at NRT. 2) Currently I'm using a single core, the simplest setup. I don't expect to have an overwhelming quantity of records, but

solr/jetty request log strangeness

2011-11-04 Thread Shawn Heisey

If the URL being sent to Solr is too long to be completely displayed in the jetty request log, the next log entry is recorded on the same line. The following line from my log is actually three separate requests: 10.100.0.240 - - [04/Nov/2011:00:00:00 +] "GET /solr/s1live/select?qt=lbche

RE: Three questions about: Commit, single index vs multiple indexes and implementation advice

2011-11-04 Thread Brian Gerby

Gustavo - Even with the most basic requirements, I'd recommend setting up a multi-core configuration so you can RELOAD the main core you will be using when you make simple changes to config files. This is much cleaner than bouncing solr each time. There are other benefits to doing it, but thi

Re: Three questions about: Commit, single index vs multiple indexes and implementation advice

2011-11-04 Thread Gustavo Falco

Hi Brian, I'll take a look at what you mentioned. I didn't think about that. I'll finish the implementation at the app level and then I'll read a little more about multi-core setups. Maybe I don't know yet all the benefits it has. Thanks a lot for your advice. 2011/11/4 Brian Gerby > > Gustav

Re: Batch indexing documents using ContentStreamUpdateRequest

2011-11-04 Thread Tod

Answering my own question. ContentStreamUpdateRequest (csur) needs to be within the while loop not outside as I had it. Still not seeing any dramatic performance improvements over perl though (the point of this exercise). Indexing locks after about 30-45 minutes of activity, even a commit wo

Solrj 3.3.0 Method SolrQuery.setSortField not working

2011-11-04 Thread tech20nn

Setting SolrQuery.setSortField("field1", ORDER.asc) on a SolrQuery is not adding the sort parameter to the Solr query. Has anyone faced this issue?

Re: Proper analyzer / tokenizer for syslog data?

2011-11-04 Thread Peter Spam

Wow, I tried with minGramSize=1 and maxGramSize=1000 (I want someone to be able to search on any substring, just like "grep"), and the index is multiple orders of magnitude larger than my data! There's got to be a better way to support full grep-like searching? Thanks! Pete On Nov 4, 2011, at
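The size blow-up reported above follows from simple counting. The sketch below is a back-of-the-envelope illustration, not Solr code: it assumes every substring of a token with length between the minimum and maximum gram size is emitted, which is how character n-gramming behaves in general. The function names are hypothetical.

```python
def ngram_count(token_len: int, min_gram: int, max_gram: int) -> int:
    """Number of character n-grams with lengths min_gram..max_gram in one token."""
    longest = min(max_gram, token_len)
    return sum(token_len - n + 1 for n in range(min_gram, longest + 1))


def ngram_chars(token_len: int, min_gram: int, max_gram: int) -> int:
    """Total characters across all of those n-grams."""
    longest = min(max_gram, token_len)
    return sum(n * (token_len - n + 1) for n in range(min_gram, longest + 1))


token = "hello_there=50"  # 14 characters, taken from the example data
grams = ngram_count(len(token), 1, 1000)
chars = ngram_chars(len(token), 1, 1000)
capped = ngram_count(len(token), 1, 3)
print(grams, chars, capped)
```

With an effectively unbounded maximum, a token of length n yields n(n+1)/2 grams, so growth is quadratic in token length (and cubic in total characters). Capping the maximum gram size makes the count grow only linearly with token length, at the cost of no longer matching longer substrings as a single term.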

Solr Score Normalization

2011-11-04 Thread sangrish

Hi, I have a (dismax) request handler which has the following 3 scoring components (1 qf & 2 bf) : qf = "field1^2 field2^3" bf = func1(field3)^2 func2(field4)^3 Both func1 & func2 return scores between 0 & 1. The score returned by textual match (qf) ranges from 0 to To allow

Re: Solr Score Normalization

2011-11-04 Thread Chris Hostetter

:To allow better combination of text match & my functions, I want the text : score to be normalized between 0 & 1. Is there any way I can achieve that : here? It is achievable, but it is not usually meaningful... https://wiki.apache.org/lucene-java/ScoresAsPercentages -Hoss
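One way to see why a normalized score is "not usually meaningful" is to try the most common approach, dividing every score by the top score of the result set. This is a sketch of that approach under my own assumptions (the reply does not name a method, and the score values are invented for illustration).

```python
def normalize_by_top(scores):
    """Scale scores into [0, 1] by dividing by the highest score in the list."""
    if not scores:
        return []
    top = max(scores)
    if top <= 0:
        return [0.0 for _ in scores]
    return [s / top for s in scores]


# Invented raw scores: one query with strong matches, one with weak matches.
strong_query = [8.0, 4.0, 2.0]
weak_query = [0.5, 0.25, 0.125]

strong_norm = normalize_by_top(strong_query)
weak_norm = normalize_by_top(weak_query)
print(strong_norm, weak_norm)
```

Both lists normalize to the same values, so the best hit always scores 1.0 regardless of how good the match really was. The normalized number only ranks documents within a single result set and says nothing comparable across queries.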

Re: Question about dismax and score boost with date

2011-11-04 Thread Chris Hostetter

: /solr/ftf/dismax/?q=libya : &debugQuery=off : &hl=true : &start= : &rows=10 : -- : : I am trying to factor in created to the SCORE. (boost) I have tried a million : ways to do this, no success. I know the dates are populating correctly because : I can

Re: [Profiling] How to profile/tune Solr server

2011-11-04 Thread yu shen

Really helpful, thanks so much. Spark 2011/11/4 > Hi Spark, > > 2009 there was a monitor from lucidimagination: > > http://www.lucidimagination.com/about/news/releases/lucid-imagination-releases-performance-monitoring-utility-open-source-apache-lucene > > A colleague of mine calls the sematext-

Re: [Profiling] How to profile/tune Solr server

2011-11-04 Thread yu shen

Thank you for the information. 2011/11/5 yu shen > Really helpful, thanks so much. > > Spark > > 2011/11/4 > > Hi Spark, >> >> 2009 there was a monitor from lucidimagination: >> >> http://www.lucidimagination.com/about/news/releases/lucid-imagination-releases-performance-monitoring-utility-open

Re: DIH doesn't handle bound namespaces?

2011-11-04 Thread Lance Norskog

Yes, the xpath thing is a custom lightweight thing for high-speed use. There is a separate full XSL processor. http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1 I think this lets you run real XSL on input files. I assume it lets you throw in your favorite XSL implem

39 matches
