Proper analyzer / tokenizer for syslog data?

2011-11-04 Thread Peter Spam
Example data: 01/23/2011 05:12:34 [Test] a=1; hello_there=50; data=[1,5,30%]; I would love to be able to just "grep" the data, i.e., if I search for "ello", it finds and returns "ello", and if I search for "hello_there=5", it would match too. Here's what I'm using now:

Re: Can Solr handle large text files?

2011-11-04 Thread Peter Spam
Solr 4.0 (11/1 snapshot) Data: 80k files, average size 2.5MB, largest is 750MB; Solr: Each document is max 256k; total docs = 800k Machine: Early 2009 Mac Pro, 6GB RAM, 1GBmin/2GBmax given to Solr Java; Admin shows 30% mem usage I originally tried injecting the entire file into a single Solr doc
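
The splitting described above (one large file broken into documents of at most 256k each) could look roughly like the sketch below. It is only an illustration: the field names, the `<file_id>_<n>` id scheme, and the choice to count characters rather than bytes are assumptions, not details from the original message.

```python
def split_into_chunks(text, max_chars=256 * 1024):
    """Yield consecutive slices of text, each at most max_chars long."""
    for start in range(0, len(text), max_chars):
        yield text[start:start + max_chars]


def make_docs(file_id, text, max_chars=256 * 1024):
    """Build one dict per chunk; id and field names are hypothetical."""
    return [
        {"id": f"{file_id}_{n}", "file": file_id, "body": chunk}
        for n, chunk in enumerate(split_into_chunks(text, max_chars))
    ]


# Tiny demonstration with a 10-character "file" and 4-character chunks.
demo_docs = make_docs("log1", "a" * 10, max_chars=4)
```

As a sanity check on the numbers in the post: 80k files at an average of 2.5MB, cut into 256k pieces, comes to about 800k documents, which matches the stated total.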

Zookeeper aware Replication in SolrCloud

2011-11-04 Thread prakash chandrasekaran
Hi, I'm using SolrCloud and wanted to add the Replication feature to it. I followed the steps in the Solr Wiki, but when the client tried to poll for data from the server I got the error message below in the master log: Nov 3, 2011 8:34:00 PM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.

Re: Proper analyzer / tokenizer for syslog data?

2011-11-04 Thread Ahmet Arslan
> Example data: > 01/23/2011 05:12:34 [Test] a=1; hello_there=50; > data=[1,5,30%]; > > I would love to be able to just "grep" the data - ie. if I > search for "ello", it finds and returns "ello", and if I > search for "hello_there=5", it would match too. > > Here's what I'm using now: > >     c

Solr, MultiValues and links...

2011-11-04 Thread Tiernan OToole
Right, not sure how to ask this question or what the terminology is, but hopefully my explanation will help... We are chucking data into Solr for queries. I can't mention the exact data, but the closest thing I can think of is as follows: - Unique ID for the Solr record (DB ID in this case) - A

Comparing apples & oranges?

2011-11-04 Thread Martin Koch
Hi List I have a solr index where I want to include numerical fields in my ranking function as well as keyword relevance. For example, each document has a document view count, and I'd like to increase the relevancy of documents that are read often, and penalize documents with a very low view count
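
One common way to mix a numeric field such as a view count into a relevance score is to damp it with a logarithm before multiplying. The sketch below shows only the arithmetic; the formula and the function name are hypothetical and are not a Solr built-in or anything proposed in the thread.

```python
import math


def boosted_score(text_score, view_count):
    """Scale a keyword-relevance score by a log-damped popularity factor.

    Hypothetical formula: the factor is 1 for a document with zero views
    and grows by 1 for every tenfold increase in (view_count + 1).
    """
    return text_score * (1.0 + math.log10(1 + view_count))


# (label, keyword score, view count); the values are made up for illustration.
docs = [("rarely read", 3.0, 0), ("popular", 2.0, 999)]
ranked = sorted(docs, key=lambda d: boosted_score(d[1], d[2]), reverse=True)
```

Note that this form only rewards popular documents; it never pushes an unread document below its plain keyword score, so an explicit penalty for very low counts would need a different factor.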

Solr

2011-11-04 Thread KARHU Toni
Hi, when is the SolrCloud version planned to be released/stable, and what are your thoughts on using it in a serious production environment? Br, Toni

Re: Ordered proximity search

2011-11-04 Thread Rahul Warawdekar
Hi Thomas, Do you always need the ordered proximity search by default? You may want to check SpanNearQuery at http://www.lucidimagination.com/blog/2009/07/18/the-spanquery/ . We are using the edismax query parser provided by Solr. I had a similar type of requirement in our project and here is how

Re: Ordered proximity search

2011-11-04 Thread LT.thomas
Thanks for your reply, I will check this advice

Fwd: Assist please

2011-11-04 Thread Oleg Tikhonov
-- Forwarded message -- From: NDIAYE Bacar Date: Fri, Nov 4, 2011 at 12:05 PM Subject: Assist please To: d...@tika.apache.org, u...@tika.apache.org Hi, I need your assistance, please, to configure Apache Tika for Solr attachments in Drupal 7. I have tried to confi

Re: limiting searches to particular sources

2011-11-04 Thread Erick Erickson
How are you crawling your info? Somewhere you have to inject the source into the document, won't do the trick because there's no source available If you're crawling the data by yourself, you can just add the source to the document. If you're using DIH, you can specify the field as a constant

Re: Highlighting "text" field when query is for "string" field

2011-11-04 Thread Erick Erickson
Try this with &debugQuery=on. I suspect you're not getting the query you think you are and I'd straighten that out before worrying about highlighting. Usually, for instance, AND should be capitalized to be an operator. So try with &debugQuery=on and see what happens. The highlighter, I believe, w

Re: Extended Dismax and Proximity Queries

2011-11-04 Thread Erick Erickson
Yes. Just try it with &debugQuery=on and you can see the parsed form of the query. Best Erick On Wed, Nov 2, 2011 at 6:20 PM, Jamie Johnson wrote: > Is it possible to do Proximity queries using edismax?  I saw I could > do the following > > q="batman movie"&qs=100 > > but I wanted to be able to

Re: limiting searches to particular sources

2011-11-04 Thread Markus Jelsma
Your Nutch indexes the site and host fields. If that is not enough you can use its subcollection plugin to write values for URL patterns. On Wednesday 02 November 2011 15:52:37 Fred Zimmerman wrote: > I want to be able to list some searches to particular sources, e.g. "wiki > only", "crawled only

Re: Solr real-time update taking time

2011-11-04 Thread Erick Erickson
I think that 1-2 second requirement is unreasonable. The first thing I'd do is push back and understand whether this is actually a requirement or just somebody picking numbers out of thin air. Committing often enough for this to work is just *asking* for trouble with 3.3. I'd take a look at the Ne

Re: best way for sum of fields

2011-11-04 Thread Erick Erickson
Please define "sum of fields". The total number of unique terms for all the fields? The sum of some values of some fields for each document? The count of the number of fields in the index? Other??? Best Erick On Thu, Nov 3, 2011 at 11:43 AM, stockii wrote: > i am searching for the best way to ge

Re: Three questions about: Commit, single index vs multiple indexes and implementation advice

2011-11-04 Thread Erick Erickson
Let's see... 1> Committing every second, even with commitWithin, is probably going to be a problem. I think that 1 second latency is usually overkill, but that's up to your product manager. Look at the NRT (Near Real Time) stuff if you really need this. I thought that NRT was

Re: [Profiling] How to profile/tune Solr server

2011-11-04 Thread karsten-solr
Hi Spark, in 2009 there was a monitor from lucidimagination: http://www.lucidimagination.com/about/news/releases/lucid-imagination-releases-performance-monitoring-utility-open-source-apache-lucene A colleague of mine calls the sematext-monitor "trojan" because "SPM phone home": "Easy in, easy out -

InvalidTokenOffsetsException when using MappingCharFilterFactory, DictionaryCompoundWordTokenFilterFactory and Highlighting

2011-11-04 Thread Edwin Steiner
Hello all, I would like to handle German accents (Umlaute) by replacing the accented char with its two-letter substitute (e.g. ä => ae). For this reason I use the char-filter solr.MappingCharFilterFactory configured with a mapping file containing entries like “ä” => “ae”. I also want to use the
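
A small sketch of why a length-changing character mapping makes offsets delicate: once "ä" becomes "ae", positions in the mapped text no longer line up with positions in the original, so anything that highlights the original has to translate offsets back. This is plain Python, not Solr's implementation, and whether an offset mismatch is the actual cause of the exception in this thread is not established by the snippet. Only the ä entry comes from the message; the ö and ü entries are assumed.

```python
# Only "ä" => "ae" is from the original post; the other entries are assumed.
MAPPING = {"ä": "ae", "ö": "oe", "ü": "ue"}


def map_chars(text, mapping=MAPPING):
    """Return (mapped_text, offsets).

    offsets[i] is the index in the original text of the character that
    produced mapped_text[i].
    """
    out = []
    offsets = []
    for i, ch in enumerate(text):
        for r in mapping.get(ch, ch):
            out.append(r)
            offsets.append(i)
    return "".join(out), offsets


original = "Gerät kaufen"
mapped, offsets = map_chars(original)

# "kaufen" starts one position later in the mapped text than in the original.
start_in_mapped = mapped.find("kaufen")
start_in_original = offsets[start_in_mapped]
```

Slicing the original with the uncorrected mapped offset would cut the wrong characters; slicing with the translated offset recovers the word.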

Re: [Profiling] How to profile/tune Solr server

2011-11-04 Thread Andre Bois-Crettez
SolrMeter is useful too; it can be plugged into a production server just to watch the evolution of cache usage: http://code.google.com/p/solrmeter/wiki/Screenshots#CacheHistoryStatistic André

Re: limiting searches to particular sources

2011-11-04 Thread Fred Zimmerman
Yes -- how do I specify the field as a constant in DIH? On Fri, Nov 4, 2011 at 11:17 AM, Erick Erickson wrote: > How are you crawling your info? Somewhere you have to inject the > source into the document, won't do the trick because > there's no source available > > If you're crawling the da

Batch indexing documents using ContentStreamUpdateRequest

2011-11-04 Thread Tod
This is a code fragment of how I am doing a ContentStreamUpdateRequest using CommonsHttpSolrServer: ContentStreamBase.URLStream csbu = new ContentStreamBase.URLStream(url); InputStream is = csbu.getStream(); FastInputStream fis = new FastInputStream(is); csur.addContentStream(csbu); c

overwrite=false support with SolrJ client

2011-11-04 Thread Ken Krugler
Hi list, I'm working on improving the performance of the Solr scheme for Cascading. This supports generating a Solr index as the output of a Hadoop job. We use SolrJ to write the index locally (via EmbeddedSolrServer). There are mentions of using overwrite=false with the CSV request handler, as

Re: performance - dynamic fields versus static fields

2011-11-04 Thread Erick Erickson
Dynamic fields are just fields, man. There's really no overhead that I know of. I tend to prefer non-dynamic fields whenever possible to reduce hard-to-find errors where, say, I've made a typo and the dynamic pattern matches, but that's largely a personal preference. Best Erick On Thu, Nov 3, 20

Re: overwrite=false support with SolrJ client

2011-11-04 Thread Jason Rutherglen
It should be supported in SolrJ, I'm surprised it's been lopped out. Bulk indexing is extremely common. On Fri, Nov 4, 2011 at 1:16 PM, Ken Krugler wrote: > Hi list, > > I'm working on improving the performance of the Solr scheme for Cascading. > > This supports generating a Solr index as the out

Term frequency question

2011-11-04 Thread Craig Stadler
I am using this reference link: http://www.mail-archive.com/solr-user@lucene.apache.org/msg26389.html However the article is a bit old and when I try to compile the class (using newest solr 3.4 / java version "1.7.0_01" / Java(TM) SE Runtime Environment (build 1.7.0_01-b08) / Java HotSpot(TM)

Re: Three questions about: Commit, single index vs multiple indexes and implementation advice

2011-11-04 Thread Gustavo Falco
First of all, thanks a lot for your answer. 1) I could use 5 to 15 seconds between each commit and give it a try. Is this an acceptable configuration? I'll take a look at NRT. 2) Currently I'm using a single core, the simplest setup. I don't expect to have an overwhelming quantity of records, but

solr/jetty request log strangeness

2011-11-04 Thread Shawn Heisey
If the URL being sent to Solr is too long to be completely displayed in the jetty request log, the next log entry is recorded on the same line. The following line from my log is actually three separate requests: 10.100.0.240 - - [04/Nov/2011:00:00:00 +] "GET /solr/s1live/select?qt=lbche

RE: Three questions about: Commit, single index vs multiple indexes and implementation advice

2011-11-04 Thread Brian Gerby
Gustavo - Even with the most basic requirements, I'd recommend setting up a multi-core configuration so you can RELOAD the main core you will be using when you make simple changes to config files. This is much cleaner than bouncing solr each time. There are other benefits to doing it, but thi

Re: Three questions about: Commit, single index vs multiple indexes and implementation advice

2011-11-04 Thread Gustavo Falco
Hi Brian, I'll take a look at what you mentioned. I didn't think about that. I'll finish the implementation at the app level and then I'll read a little more about multi-core setups. Maybe I don't know yet all the benefits it has. Thanks a lot for your advice. 2011/11/4 Brian Gerby > > Gustav

Re: Batch indexing documents using ContentStreamUpdateRequest

2011-11-04 Thread Tod
Answering my own question. ContentStreamUpdateRequest (csur) needs to be within the while loop not outside as I had it. Still not seeing any dramatic performance improvements over perl though (the point of this exercise). Indexing locks after about 30-45 minutes of activity, even a commit wo

Solrj 3.3.0 Method SolrQuery.setSortField not working

2011-11-04 Thread tech20nn
Calling SolrQuery.setSortField("field1", ORDER.asc) is not adding the sort parameter to the Solr query. Has anyone faced this issue?

Re: Proper analyzer / tokenizer for syslog data?

2011-11-04 Thread Peter Spam
Wow, I tried with minGramSize=1 and maxGramSize=1000 (I want someone to be able to search on any substring, just like "grep"), and the index is multiple orders of magnitude larger than my data! There's got to be a better way to support full grep-like searching? Thanks! Pete On Nov 4, 2011, at
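
The blow-up is easy to see with a little counting. Assuming each token is expanded into every substring whose length lies between the minimum and maximum gram size (the general idea of n-gram indexing, not a claim about Solr internals), a token of length L produces L(L+1)/2 grams when the minimum is 1 and the maximum is at least L. A rough sketch:

```python
def ngrams(token, min_gram, max_gram):
    """Return every substring of token with length in [min_gram, max_gram]."""
    longest = min(max_gram, len(token))
    return [
        token[i:i + n]
        for n in range(min_gram, longest + 1)
        for i in range(len(token) - n + 1)
    ]


def ngram_count(token_len, min_gram, max_gram):
    """Count the grams without materialising them."""
    longest = min(max_gram, token_len)
    return sum(token_len - n + 1 for n in range(min_gram, longest + 1))


# "hello_there=50" is taken from the example data earlier in the thread.
grams = ngrams("hello_there=50", 1, 1000)
```

For the 14-character token `hello_there=50` that is 105 grams, and both "ello" and "hello_there=5" are among them, which is why the grep-style queries match. A narrower window such as 3 to 5 would give 33 grams for the same token, at the cost of not matching shorter or longer substrings directly.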

Solr Score Normalization

2011-11-04 Thread sangrish
Hi, I have a (dismax) request handler which has the following 3 scoring components (1 qf & 2 bf) : qf = "field1^2 field2^3" bf = func1(field3)^2 func2(field4)^3 Both func1 & func2 return scores between 0 & 1. The score returned by textual match (qf) ranges from 0 to To allow

Re: Solr Score Normalization

2011-11-04 Thread Chris Hostetter
:To allow better combination of text match & my functions, I want the text : score to be normalized between 0 & 1. Is there any way I can achieve that : here? It is achievable, but it is not usually meaningful... https://wiki.apache.org/lucene-java/ScoresAsPercentages -Hoss
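
To make the "not usually meaningful" point concrete, here is a minimal sketch of the most obvious normalization, dividing every score in a result set by the top score. The score values are invented for illustration and the function is not part of Solr.

```python
def normalize(scores):
    """Divide each score by the largest one so the top hit gets 1.0."""
    top = max(scores)
    if top <= 0:
        return list(scores)
    return [s / top for s in scores]


# Made-up raw scores: one query with strong matches, one with weak matches.
strong_query = [8.0, 4.0, 2.0]
weak_query = [0.5, 0.25, 0.125]

strong_norm = normalize(strong_query)
weak_norm = normalize(weak_query)
```

Both result sets normalize to the same values, so a normalized 1.0 says only "best hit for this query", not "good match". Combining that number with function scores in the 0 to 1 range therefore weights a poor top hit exactly as heavily as an excellent one.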

Re: Question about dismax and score boost with date

2011-11-04 Thread Chris Hostetter
: /solr/ftf/dismax/?q=libya : &debugQuery=off : &hl=true : &start= : &rows=10 : -- : : I am trying to factor in created to the SCORE. (boost) I have tried a million : ways to do this, no success. I know the dates are populating correctly because : I can

Re: [Profiling] How to profile/tune Solr server

2011-11-04 Thread yu shen
Really helpful, thanks so much. Spark 2011/11/4 > Hi Spark, > > 2009 there was a monitor from lucidimagination: > > http://www.lucidimagination.com/about/news/releases/lucid-imagination-releases-performance-monitoring-utility-open-source-apache-lucene > > A colleague of mine calls the sematext-

Re: [Profiling] How to profile/tune Solr server

2011-11-04 Thread yu shen
Thank you for the information. 2011/11/5 yu shen > Really helpful, thanks so much. > > Spark > > 2011/11/4 > > Hi Spark, >> >> 2009 there was a monitor from lucidimagination: >> >> http://www.lucidimagination.com/about/news/releases/lucid-imagination-releases-performance-monitoring-utility-open

Re: DIH doesn't handle bound namespaces?

2011-11-04 Thread Lance Norskog
Yes, the xpath thing is a custom lightweight thing for high-speed use. There is a separate full XSL processor. http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1 I think this lets you run real XSL on input files. I assume it lets you throw in your favorite XSL implem