Solr 4? You mean the Solr 'trunk' source or the Solr 1.4.1 release?
The 1.4.1 release does not have the TikaEntityProcessor, only the /extract code.
The Solr 3.x branch and the trunk have the TikaEP. I use the 3.x
branch and, well, the TikaEP has a few problems but can be hacked
around.
Whatever
Wow, would you put a diagram somewhere up on the Solr site?
Or, here, and I will put it somewhere there.
And, what is a VIP?
Dennis Gearon
Signature Warning
It is always a good idea to learn from your own mistakes. It is usually a
better idea to learn from others' mistakes,
On Wed, Dec 1, 2010 at 10:56 AM, Jerry Li wrote:
> Hi team
>
> My solr version is 1.4
> There is an ArrayIndexOutOfBoundsException when I sort on one field; the
> following is my code and log info.
> Any help will be appreciated.
>
> Code:
>
> SolrQuery query = new SolrQuery();
> que
Greetings,
The Seattle Scalability Meetup isn't slacking for the holidays. We've
got an awesome lineup for Wed, December 8 at 7pm:
http://www.meetup.com/Seattle-Hadoop-HBase-NoSQL-Meetup/
-Jake Mannix from Twitter will talk about the Twitter Search
infrastructure (with distributed Lucene)
-Chris
Hi team
My solr version is 1.4
There is an ArrayIndexOutOfBoundsException when I sort on one field; the
following is my code and log info.
Any help will be appreciated.
Code:
SolrQuery query = new SolrQuery();
query.setSortField("author", ORDER.desc);
query.addFilterQuery
Hi Upayavira,
this is a good start for solving my problem. Can you please tell me what
such a replication URL looks like?
Thanks,
Tommaso
2010/12/1 Upayavira
> Hi Tommaso,
>
> I believe you can tell each server to act as a master (which means it
> can have its indexes pulled from it).
>
> You ca
Hi, A diagram will be very much appreciated.
Thanks,
Jayant
> From: u...@odoko.co.uk
> To: solr-user@lucene.apache.org
> Subject: Re: distributed architecture
> Date: Wed, 1 Dec 2010 00:39:40 +
>
> I cannot say how mature the code for B) is, but it is not yet included
> in a release.
>
> I
On 11/30/2010 3:49 PM, Robert Petersen wrote:
That raises another question: top shows only 20 GB free out of 64,
but the tomcat/solr process shows it's using only half of that. What is
using the rest? The numbers don't add up...
Chances are that it's your operating system disk cache. Below
You may implement your own MergePolicy to keep one large index and
merge all the other small ones,
or simply set the merge factor to 2 and keep the largest index from being
merged by setting maxMergeDocs to less than the number of docs in the
largest one.
So there is one large index and a small one. When adding a few
docs, they wi
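For reference, a minimal solrconfig.xml sketch of that second suggestion
(the values here are illustrative assumptions, not from the original
message):

  <mergeFactor>2</mergeFactor>
  <maxMergeDocs>1000000</maxMergeDocs>

With maxMergeDocs set below the document count of the big segment, merges
leave it untouched.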
On Tue, Nov 30, 2010 at 6:04 PM, Robert Petersen wrote:
> My question is this. Why in the world would all of my slaves, after
> running fine for some days, suddenly all at the exact same minute
> experience OOM heap errors and go dead?
If there is no change in query traffic when this happens, th
1. Make sure the port is not in use.
2. Run ./bin/shutdown.sh && tail -f logs/xxx to see what the server is doing.
If you have just fed data or modified the index without flushing/committing,
it will do some work while shutting down.
2010/12/1 Robert Petersen :
> Greetings, we're wondering why we can issue th
What would I do with the heap dump though? Run one of those Java heap
analyzers looking for memory leaks or something? I have no experience
with those. I saw there was a bug fix in Solr 1.4.1 for a 100-byte memory
leak occurring on each commit, but it would take thousands of commits to
make that ad
On 11/30/2010 2:27 PM, Cinquini, Luca (3880) wrote:
Hi,
I'd like to know if anybody has suggestions/opinions on what is
currently the best architecture for a distributed search system using Solr. The
use case is that of a system composed
of N indexes, each hosted on a separate machine,
I cannot say how mature the code for B) is, but it is not yet included
in a release.
If you want the ability to distribute content across multiple nodes (due
to volume) and want resilience, then use both.
I've had one setup where we have two master servers, each with four
cores. Then we have two
Hi Tommaso,
I believe you can tell each server to act as a master (which means it
can have its indexes pulled from it).
You can then include the master hostname in the URL that triggers a
replication process. Thus, if you triggered replication from outside
solr, you'd have control over which mast
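As a sketch (host names and ports are placeholders), the 1.4
ReplicationHandler lets you trigger such a pull on a slave with:

  http://slave-host:8983/solr/replication?command=fetchindex&masterUrl=http://master-host:8983/solr/replication

Whichever masterUrl you pass is the server whose index gets pulled.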
After a recent Windows 7 crash (:-\), upon restart, Solr starts giving
LockObtainFailedException errors: (excerpt)
30-Nov-2010 23:10:51 org.apache.solr.common.SolrException log
SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock
obtain timed out:
NativeFSLock@solr\.\.\data0\index
I don't know who you are replying to here, but...
There's nothing to stop you doing:
* import 2m docs
* sleep 2 days
* import 2m docs
* sleep 2 days
* repeat above until done
* commit
There's no reason why you should commit regularly. If you need to slow
down for your DB, do, but that does
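A minimal SolrJ sketch of that pattern (assuming SolrJ 1.4's
CommonsHttpSolrServer; the URL, field names, batch sizes and sleep are
placeholder assumptions):

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchImport {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        for (int batch = 0; batch < 4; batch++) {
            for (int i = 0; i < 2000000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", batch + "-" + i);
                server.add(doc);        // sent to Solr, but not committed
            }
            Thread.sleep(1000L * 60);   // throttle here if the DB needs a rest
        }
        server.commit();                // one commit at the very end
    }
}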
Hi Robert,
I'd recommend launching Tomcat with -XX:+HeapDumpOnOutOfMemoryError
and -XX:HeapDumpPath=, so then
you have something to look at versus a Gedankenexperiment :)
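For Tomcat that might look like the following in setenv.sh or wherever
CATALINA_OPTS is set (the dump path is a placeholder):

  CATALINA_OPTS="$CATALINA_OPTS -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/tmp/tomcat-dumps"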
-- Ken
On Nov 30, 2010, at 3:04pm, Robert Petersen wrote:
Greetings, we are running one master and four slaves of our
Greetings, we are running one master and four slaves of our multicore
solr setup. We just served searches for our catalog of 8 million
products with this farm during black Friday and cyber Monday, our
busiest days of the year, and the servers did not break a sweat! Index
size is about 28GB.
H
Greetings, we're wondering why we can issue the command to shutdown
tomcat/solr but the process remains visible in memory (by using the top
command) and we have to manually kill the PID for it to release its
memory before we can (re)start tomcat/solr? Anybody have any ideas?
The process is using 1
Hi,
I'd like to know if anybody has suggestions/opinions on what is
currently the best architecture for a distributed search system using Solr. The
use case is that of a system composed
of N indexes, each hosted on a separate machine, each index containing unique
content.
Options that I
I set maxFieldLength to 2147483647, restarted Tomcat and re-indexed the PDF
files again. I also commented out the one in the section. Unfortunately
the files are still truncated if the file size is more than 20MB.
Any suggestions? I really appreciate your help!
Xiaohui
-Original Message-
Bump. Anyone?
-J
On Nov 29, 2010, at 3:17 PM, John Williams wrote:
> Recently, we have started to get "Bad file descriptor" errors in one of our
> Solr instances. This instance is a searcher and its index is stored on a
> local SSD. The master however has its index stored on NFS, which seems
We've got a largish corpus (~94 million documents). We'd like to be able
to sort on one of the string fields. However this takes an incredibly
long time. A warming query for that field takes about ~20 minutes.
However most of the time the result sets are small since we use filters
heavily - typ
Thanks so much for your help!
Xiaohui
-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tuesday, November 30, 2010 2:01 PM
To: solr-user@lucene.apache.org
Subject: Re: how to set maxFieldLength to unlimitd
Set the value in solrconfig.xml to, say, 2147483647
Hi,
I am using the cached SQL entity processor in my data config; please find
below the structure of my data config file.
Object property and relationship need to be matched against each object.
Whenever the data is being returned by all the 3 entities (all 3 sel
Hi,
Thanks Jacob and Ken for your replies.
I am not able to change project architecture to add Lucandra even if it
looks like a nice solution.
Going the VIP way can definitely an option even if I'd be more keen to solve
that "inside" Solr.
I am thinking to try and play with Collection Distribution
Set the value in solrconfig.xml to, say, 2147483647
Also, see this thread for a common gotcha:
http://lucene.472066.n3.nabble.com/Solr-ignoring-maxFieldLength-td473263.html
It appears you can just comment out the one in the section.
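For reference, a sketch of the setting itself (in the 1.4 example
solrconfig.xml the element appears twice, in the indexDefaults and mainIndex
sections, and the second silently overrides the first; that is presumably
the gotcha in the thread above):

  <maxFieldLength>2147483647</maxFieldLength>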
Best
Erick
On Tue, Nov 30, 2010 at 1:48 PM, Ma, Xiaohui (NI
On Tue, Nov 30, 2010 at 3:09 PM, Yonik Seeley
wrote:
> On Tue, Nov 30, 2010 at 8:24 AM, Martin Grotzke
> wrote:
>> Still I'm wondering, why this issue does not occur with the plain
>> example solr setup with 2 indexed docs. Any explanation?
>
> It's an old option you have in your solrconfig.xml t
I need index and search some pdf files which are very big (around 1000 pages
each). How can I set maxFieldLength to unlimited?
Thanks so much for your help in advance,
Xiaohui
Hi Tommaso,
On Nov 30, 2010, at 7:41am, Tommaso Teofili wrote:
Hi all,
in a replication environment if the host where the master is running
goes
down for some reason, is there a way to communicate to the slaves to
point
to a different (backup) master without manually changing
configurati
Okay.
The query kills the database because no index on modified is set ...
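A hedged illustration of the fix (table and column names are placeholder
assumptions for whatever the deltaQuery actually reads):

  CREATE INDEX idx_modified ON sessions (modified);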
I don't know, you'll have to debug it to see if it's the thing that takes so
long. Solr
should be able to handle 1,200 updates in a very short time unless there's
something
else going on, like you're committing after every update or something.
This may help you track down performance with DIH
htt
Hi,
I am running multiple Solr cores (solr-tomcat 1.4.0+ds1-1ubuntu1) under
Tomcat (6.0.24-2ubuntu1.4) on Ubuntu 10.04.1. I have a master server where
all Solr writes go, and a slave server that replicates all cores from the
master, and accepts all read-only queries.
After maxing out PermGen spac
Rather, have a master and multiple-slave combination, with the master only
being used for writes and the slaves used for reads.
Master-to-slave replication is easily configurable.
Two Solr instances sharing the same index is not at all a good idea with
both writing to the same index.
Regards,
Jayendra
On T
fieldNorm is the combination of the length of the field with index- and
query-time boosts.
1. lengthNorm = measure of the importance of a term according to the
total number of terms in the field
   1. Implementation: 1/sqrt(numTerms)
   2. Implication: a term matched in fields with
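A quick worked example of that implementation: a field with 4 terms gets
lengthNorm = 1/sqrt(4) = 0.5, while a field with 100 terms gets
1/sqrt(100) = 0.1, so a match in the shorter field contributes five times
the norm.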
Your best bet might be to look into Lucandra:
https://github.com/tjake/Lucandra
On Tue, Nov 30, 2010 at 10:41 AM, Tommaso Teofili wrote:
> Hi all,
>
> in a replication environment if the host where the master is running goes
> down for some reason, is there a way to communicate to the slaves to
Hi all,
in a replication environment if the host where the master is running goes
down for some reason, is there a way to communicate to the slaves to point
to a different (backup) master without manually changing configuration (and
restarting the slaves or their cores)?
Basically I'd like to be
On Tue, Nov 30, 2010 at 9:45 AM, Jacob Elder wrote:
> Right. CJK doesn't tend to have a lot of whitespace to begin with. In the
> past, we were using a patched version of StandardTokenizer which treated
> @twitteruser and #hashtag better, but this became a release engineering
> nightmare so we swi
Hmm, I found some similar queries on stackoverflow and they did not recommend
exposing the lucene docId.
So, I guess my question becomes: What is the best way, from within my custom
QParser, to take a list of solr primary keys (that were retrieved from
elsewhere) and turn them into docIds? I a
Right. CJK doesn't tend to have a lot of whitespace to begin with. In the
past, we were using a patched version of StandardTokenizer which treated
@twitteruser and #hashtag better, but this became a release engineering
nightmare so we switched to Whitespace.
Perhaps I could rephrase the question a
Hello,
Can someone explain the difference between queryNorm and fieldNorm in
debugQuery?
Why, if I push one bf boost up, does the queryNorm go down?
I made some modifications.. before, the situation was different. Why?
Thanks
--
Gastone Penzo
+1
That's exactly what we need, too.
On Mon, Nov 29, 2010 at 5:28 PM, Shawn Heisey wrote:
> On 11/29/2010 3:15 PM, Jacob Elder wrote:
>
>> I am looking for a clear example of using more than one tokenizer for a
>> single source field. My application has a single "body" field which until
>> rece
I copied the wrong query, hence the 10 hours ;)
I didn't test the query with 28 million records, but with a few million, and
it works fine. ...
Before I used DIH, I used PHP and imported documents directly into Solr, but
I want to use DIH because of the better performance, I think ... grml ...
On Tue, Nov 30, 2010 at 8:24 AM, Martin Grotzke
wrote:
> Still I'm wondering, why this issue does not occur with the plain
> example solr setup with 2 indexed docs. Any explanation?
It's an old option you have in your solrconfig.xml that causes a
different code path to be followed in Solr:
How do you think the deltaQuery could be better? XD
Every day ~30,000 documents and every hour ~1,200.
Multiple threads with DIH? How does that work?
Please provide more data. Specifically:
- How many documents are updated?
- Have you tried running this query without Solr? In other words,
have you investigated whether the speed issue is simply your
SQL executing slowly?
- Why are you selecting the last 10 hours' data when all you want
i
Hello.
The index is about 28 million documents. When I start a delta-import it
looks at modified, but the delta import takes too long: Solr needs over an
hour for the delta.
That's my query: all sessions from the last hour should be updated, plus
everything changed. I think it's normal that Solr needs a long time for t
See below. If this still doesn't make sense, could you show us some
examples?
Best
Erick
On Tue, Nov 30, 2010 at 8:33 AM, Greg Smith wrote:
> Bernd,
>
> Looking at the results returned in the search results the field is
> populated
> with all of the information regardless of whether there was an
Bernd,
Looking at the results returned in the search results, the field is
populated with all of the information regardless of whether there was an
email contained in the contents.
Would the analysers and tokens be handled differently if using a copy
field?
Thanks
On 30 November 2010 10:54
On Tue, Nov 30, 2010 at 10:29 AM, Michael McCandless
wrote:
> Hmm this is in fact a regression.
>
> TopFieldCollector expects (but does not verify) that numHits is > 0.
>
> I guess to fix this we could fix TopFieldCollector.create to return a
> NullCollector when numHits is 0.
Fixing this in luce
Solr doesn't lock anything as far as I know, it just executes the
query you specify. The query you specify may well do bad things
to your database, but that's not Solr's fault. What happens if you
simply try executing the query outside Solr? Do you see the
same "locking" behavior?
You might want t
We do a lot of precisely this sort of thing. Ours is a commercial
product (Honeycomb Lexicon) that extracts behavioural information from
logs, events and network data (don't worry, I'm not pushing this on
you!) - only to say that there are a lot of considerations beyond base
Solr when it comes to h
I know it's not Solr, but perhaps you should have a look at it:
http://www.cloudera.com/blog/2010/09/using-flume-to-collect-apache-2-web-server-logs/
On Tue, Nov 30, 2010 at 12:58 PM, Peter Karich wrote:
> take a look into this:
> http://vimeo.com/16102543
>
> for that amount of data it isn'
Hi,
I have a windows cluster that I would like to install Solr onto, there
are two nodes that provide basic failover. I was thinking of this setup:
Tomcat installed as win service
Two solr instances sharing the same index
The second instance would take over when the first fails, so you should
take a look into this:
http://vimeo.com/16102543
for that amount of data it isn't that easy :-)
We are looking into building a reporting feature and investigating solutions
which will allow us to search through our logs for downloads, searches and
view history.
Each log item is relatively smal
Hi,
I was wondering how I would go about getting the lucene docid included in the
results from a solr query?
I've built a QueryParser to query another Solr instance and join the
results of the two instances through the use of a Filter. The Filter needs the
lucene docid to work. This is th
On Nov 29, 2010, at 5:17 PM, Shawn Heisey wrote:
> I was just in a meeting where we discussed customer feedback on our website.
> One thing that the users would like to see is "galleries" where photos that
> are part of a set are grouped together under a single result. This is
> basically fi
Hi,
I found the problem:
The class name changed in 1.4.1:
From: import org.apache.solr.response.SolrQueryResponse;
To: import org.apache.solr.request.SolrQueryResponse;
Best,
---
Hong-Thai
-Original Message-
De : Hong-Thai Nguyen [mailto:hong-thai.ngu...@polyspot
On 30.11.2010 10:56, Greg Smith wrote:
> Hi,
>
> I have written a plugin to filter on email types and keep those tokens;
> however, when I run it in the analysis page in the admin UI, it all works fine.
>
> But when I use the data import handler to import the data and set the field
> type it doesn't rem
Hi,
I have written a plugin to filter on email types and keep those tokens;
however, when I run it in the analysis page in the admin UI, it all works
fine.
But when I use the data import handler to import the data and set the field
type it doesn't remove the other tokens and keeps the field in the original
Ahhh I see.. good point.. yes, for a high number of unique scores the
secondary sort won't have any effect..
On 30 November 2010 09:32, Jason Brown wrote:
> Hi - you do understand my case - we tried what you suggested but as the
> relevancy is very precise we couldn't get it to do a dual-sort.
Hi - you do understand my case - we tried what you suggested but as the
relevancy is very precise we couldn't get it to do a dual-sort.
I like the idea of using one of the dismax parameters (bf) to in-effect
increase the boost on a newer document.
Thanks for all replies, most useful.
---
Hmm this is in fact a regression.
TopFieldCollector expects (but does not verify) that numHits is > 0.
I guess to fix this we could fix TopFieldCollector.create to return a
NullCollector when numHits is 0.
But: why is your app doing this? Ie, if numHits (rows) is 0, the only
useful thing you ca
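If an application really must ask for rows=0, one workaround (a sketch
against the Lucene 2.9/3.x Collector API, not committed code) is a collector
that only counts:

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

// Counts hits without allocating a priority queue, so a numHits of 0 never
// reaches TopFieldCollector.create.
public class CountingCollector extends Collector {
    private int count = 0;
    public void setScorer(Scorer scorer) {}
    public void collect(int doc) throws IOException { count++; }
    public void setNextReader(IndexReader reader, int docBase) {}
    public boolean acceptsDocsOutOfOrder() { return true; }
    public int getCount() { return count; }
}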
hi,
I might not understand your case right but can you not add an extra
publishedDate field and then specify a secondary (after relevance) sort by
that?
On 30 November 2010 08:05, wrote:
> You could also put a short representation of the date (I suggest days since
> 01.01.2010) as payload and c
The index itself isn't corrupt - just one of the segment files. This
means you can read the index (less the offending segment(s)), but once
this happens it's no longer possible to
access the documents that were in that segment (they're gone forever),
nor write/commit to the index (depending on the
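If you end up in that state, one recovery option (jar name and index path
are placeholders; note it permanently drops the unreadable segments) is
Lucene's CheckIndex tool:

  java -cp lucene-core-2.9.1.jar org.apache.lucene.index.CheckIndex /path/to/index -fix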
> As mentioned, in the typical case it's important that the field names be
> included in the signature, but i imagine there would be cases where you
> wouldn't want them included (like a simple concat Signature for building
> basic composite keys)
>
> I think the Signature API could definitely
Aha aha :D
Hmm, I don't know. We import in 2-million steps because we think that Solr
locks our database and we want better control of the import ...
We had the same problem for our fields and we wrote a Tokenizer using the
icu4j library, breaking tokens at script changes and dealing with them
according to the script and the configured BreakIterators.
This works out very well, as we also add the "script" information to the
token so later filter
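A tiny sketch of the script-change detection underlying that approach
(assuming icu4j on the classpath; this is not the poster's actual Tokenizer):

import com.ibm.icu.lang.UScript;

public class ScriptBoundaries {
    public static void main(String[] args) {
        String text = "Tokyo\u6771\u4eacabc";  // Latin, then CJK, then Latin
        int prevScript = -1;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            int script = UScript.getScript(cp);   // ICU script code
            if (prevScript != -1 && script != prevScript) {
                System.out.println("script change at offset " + i);
            }
            prevScript = script;
            i += Character.charCount(cp);
        }
    }
}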
I found the problem: solr.EnglishPorterFilterFactory, as shown in the
parsedquery.
Here is the result with &debugQuery.
For term annual:
  rawquerystring: annual
  querystring: annual
  parsedquery: text:year text:twelve-month text:onceayear text:yearbook
  parsedquery_toString: text:year text:twelve-month text:onceayear text:yearbook
  QParser: LuceneQParser
  time: 63.0
For term welcome:
  rawquerystring: welcome
  querystring: welcome
  parsedquery: text:welcom
  parsedquery_toString: text:welcom
You could also put a short representation of the date (I suggest days since
01.01.2010) as payload and calculate the boost with the payload function of
the similarity.
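As a sketch of that idea (assuming the Lucene 2.9 API bundled with Solr 1.4;
check the exact scorePayload signature for your version, and the ramp factor
below is an arbitrary assumption):

import org.apache.lucene.analysis.payloads.PayloadHelper;
import org.apache.lucene.search.DefaultSimilarity;

// The payload holds days since 01.01.2010, written at index time with
// PayloadHelper.encodeInt; newer documents get a slightly larger multiplier.
public class RecencyPayloadSimilarity extends DefaultSimilarity {
    public float scorePayload(int docId, String fieldName, int start, int end,
                              byte[] payload, int offset, int length) {
        if (payload == null || length == 0) return 1.0f;
        int days = PayloadHelper.decodeInt(payload, offset);
        return 1.0f + days * 0.001f;  // linear recency ramp (assumed)
    }
}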
>-Original Message-
>From: ext Jason Brown [mailto:jason.br...@sjp.co.uk]
>Sent: Montag, 29. November 2010 17:28
>To: solr-user@