spellcheck file format - multiple words on a line?

2012-03-23 Thread geeky2
hello all, for business reasons, we are sourcing the spellcheck file from another business group. the file we receive looks like the example data below. can solr support this type of format - or do i need to process this file into a format that has a single word on a single line? thanks for a
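If a one-word-per-line file turns out to be needed, the conversion is small. A minimal sketch, assuming the words in the received file are separated by whitespace (the example data is not shown here, so the delimiter is an assumption) and using hypothetical file names:

```python
def one_word_per_line(lines):
    """Split whitespace-separated words so that each word gets its own entry."""
    words = []
    for line in lines:
        # str.split() with no argument drops empty strings and blank lines.
        words.extend(line.split())
    return words


def convert_file(src_path, dst_path):
    """Rewrite src_path (several words per line) as dst_path (one word per line)."""
    with open(src_path, encoding="utf-8") as src:
        words = one_word_per_line(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(words) + "\n")


# Hypothetical usage:
# convert_file("spellings_received.txt", "spellings.txt")
```

A different delimiter (commas, tabs) would only change the `line.split()` call.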

Re: querying on shards

2012-03-23 Thread Shawn Heisey
On 3/23/2012 9:55 AM, stockii wrote: How does the requestHandler of your broker look? I'm thinking about using your idea to do the same ;) Here's what I have for the default request handler in my broker core, which is called ncmain. The "rollingStatistics" section is applicable to the SOLR-1972 patch.

Re: Field length and scoring

2012-03-23 Thread Ahmet Arslan
> Also, the field length is encoded in a byte (as I remember). So it's quite possible that, even if the lengths of these fields were 3 and 4 instead of both being 1, the value stored for the length norms would be the same number.
Exactly. http://search-lucene.com/m/uGKRu1pvRjw

Practical Optimization

2012-03-23 Thread dw5ight
Hey All- we run a http://carsabi.com car search engine with Solr and did some benchmarking recently after we switched from a hosted service to self-hosting. In brief, we went from 800ms complex range queries on a 1.5M document corpus to 43ms. The major shifts were switching from EC2 Large to EC2

Re: Field length and scoring

2012-03-23 Thread Erick Erickson
Erik: The field length is, I believe, based on _tokens_, not characters. Both of your examples are exactly one token long, so the scores are probably identical. Also, the field length is encoded in a byte (as I remember). So it's quite possible that, even if the lengths of these fields were 3
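The byte-encoding point can be illustrated with a small sketch. It assumes the length norm is 1/sqrt(number of tokens) and that the norm is squeezed into one byte with a 3-bit mantissa, modeled on Lucene 3.x's DefaultSimilarity and SmallFloat as I understand them; the function names here are my own, not Lucene's API:

```python
import math
import struct

ZERO_EXP_SHIFTED = (63 - 15) << 3  # smallest representable exponent, shifted


def float_to_byte315(f):
    """Encode a float into one byte: 3 mantissa bits, 5 exponent bits."""
    bits = struct.unpack(">i", struct.pack(">f", f))[0]
    small = bits >> (24 - 3)  # keep sign, exponent and top 3 mantissa bits
    if small <= ZERO_EXP_SHIFTED:
        return 0 if bits <= 0 else 1  # underflow
    if small >= ZERO_EXP_SHIFTED + 0x100:
        return 255  # overflow
    return small - ZERO_EXP_SHIFTED


def byte315_to_float(b):
    """Decode the byte back into a (lossy) float."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << (24 - 3)) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def length_norm(num_tokens):
    """Assumed length norm: shorter fields get a higher norm."""
    return 1.0 / math.sqrt(num_tokens)


for n in range(1, 6):
    encoded = float_to_byte315(length_norm(n))
    print(n, length_norm(n), encoded, byte315_to_float(encoded))
```

Under these assumptions, field lengths 3 and 4 encode to the same byte, so they would score the same on length alone.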

Re: Slave index size growing fast

2012-03-23 Thread Erick Erickson
Alexandre: Have you changed anything like on your slave? And do you have more than one slave? If you do, have you considered just blowing away the entire .../data directory on the slave and letting it re-start from scratch? I'd take the slave out of service for the duration of this operation, or

Field length and scoring

2012-03-23 Thread Erik Fäßler
Hello there, I have a quite basic question, but my Solr is behaving in a way I can't quite explain. The setup is simple: I have a field "suggestionText" in which single strings are indexed. Schema: Since I want this field to serve for a suggestion-search, the input string is

Re: Solr 4.0 replication problem

2012-03-23 Thread Erick Erickson
In that case, I'm kind of stuck. You've already rebuilt your index from scratch and removed it from your slaves. That should have cleared out most everything that could be an issue. I'd suggest you set up a pair of machines from scratch and try to set up an index/replication with your current schem

Tags and Folksonomies

2012-03-23 Thread Nishant Chandra
Suppose I have content which has a title and a description. Users can tag content and search content based on tag, title and description. Tags carry more weight. Any inputs on how indexing and retrieval will work, given there is content and tags, using Solr? Has anyone implemented search based on collab

Re: Slave index size growing fast

2012-03-23 Thread Alexandre Rocco
Tomás, The 300+GB size is only inside the index.20110926152410 dir. Inside there are a lot of files. I am almost convinced that something is messed up, like someone committed on this slave machine. Thanks 2012/3/23 Tomás Fernández Löbbe > Alexandre, additionally to what Erick said, you may want t

Re: Slave index size growing fast

2012-03-23 Thread Alexandre Rocco
Erick, The master /data dir contains only an index dir with a bunch of files. In the slave, the /data dir contains an index.20110926152410 dir with a lot more files than the master. That is quite strange to me. I guess that the config is right, since we have another slave that is running fine wi

Re: Commit Strategy for SolrCloud when Talking about 200 million records.

2012-03-23 Thread Mark Miller
On Mar 23, 2012, at 12:49 PM, I-Chiang Chen wrote: > Caused by: java.lang.OutOfMemoryError: Map failed Hmm...looks like this is the key info here. - Mark Miller lucidimagination.com

Re: Commit Strategy for SolrCloud when Talking about 200 million records.

2012-03-23 Thread I-Chiang Chen
We saw a couple of distinct errors, and all machines in a shard are identical: -On the leader of the shard Mar 21, 2012 1:58:34 AM org.apache.solr.common.SolrException log SEVERE: shard update error StdNode: http://blah.blah.net:8983/solr/master2-slave1/:org.apache.solr.common.SolrException: Map failed a

Unexpected Tika Exception extracting text from a PDF file.

2012-03-23 Thread Jon Dragt
Howdy Folks, I'm stumped and hope somebody can give me some clues on how to work around this occasional error I'm getting. I've got a .Net console program using SolrNet to scour certain folders at certain times and extract text from PDF files and index them. It succeeds on a majority of the fi

Re: Solr 4.0 replication problem

2012-03-23 Thread Hakan İlter
Hi Erick, It's not possible because both master and slaves are using the same binaries. Thanks... On Fri, Mar 23, 2012 at 5:30 PM, Erick Erickson wrote: > Hmmm, looking at your stack trace in a bit more detail, this is really > suspicious: > > Caused by: org.apache.lucene.index.IndexFormatTooNewExcept

Re: querying on shards

2012-03-23 Thread stockii
@Shawn Heisey-4 How does the requestHandler of your broker look? I'm thinking about using your idea to do the same ;) - --- System One Server, 12 GB RAM, 2 Solr Instances, 8 Cores, 1 Core with 45 Million Documents other Cores < 200.000

Re: Slave index size growing fast

2012-03-23 Thread Tomás Fernández Löbbe
Alexandre, in addition to what Erick said, you may want to check on the slave whether what's 300+GB is the "data" directory or the "index." directory. On Fri, Mar 23, 2012 at 12:25 PM, Erick Erickson wrote: > not really, unless perhaps you're issuing commits or optimizes > on the _slave_ (which you s

Re: Solr 4.0 replication problem

2012-03-23 Thread Erick Erickson
Hmmm, looking at your stack trace in a bit more detail, this is really suspicious: Caused by: org.apache.lucene.index.IndexFormatTooNewException: Format version is not supported in file 'segments_1': -12 (needs to be between -9 and -11) This *looks* like your Solr version on your slave is older t

Re: Slave index size growing fast

2012-03-23 Thread Erick Erickson
not really, unless perhaps you're issuing commits or optimizes on the _slave_ (which you should NOT do). Replication happens based on the version of the index on the master. True, it starts out as a timestamp, but then successive versions just have that number incremented. The version number in th

Re: Simple Slave Replication Question

2012-03-23 Thread Tomás Fernández Löbbe
Also, what happens if, instead of adding the 40K docs you add just one and commit? 2012/3/23 Tomás Fernández Löbbe > Have you changed the mergeFactor or are you using 10 as in the example > solrconfig? > > What do you see in the slave's log during replication? Do you see any line > like "Skippin

Re: Simple Slave Replication Question

2012-03-23 Thread Tomás Fernández Löbbe
Have you changed the mergeFactor or are you using 10 as in the example solrconfig? What do you see in the slave's log during replication? Do you see any line like "Skipping download for..."? On Fri, Mar 23, 2012 at 11:57 AM, Ben McCarthy < ben.mccar...@tradermedia.co.uk> wrote: > I just have a i

Re: Solr 4.0 replication problem

2012-03-23 Thread Hakan İlter
Hi Erick, I've already tried steps 2 and 3 but they didn't help. It's almost impossible for us to do step 1 because of the project deadline. Do you have any other suggestions? Thanks for your reply. On Fri, Mar 23, 2012 at 4:56 PM, Erick Erickson wrote: > Hmmm, that is odd. But a trunk build from that lon

Re: Slave index size growing fast

2012-03-23 Thread Alexandre Rocco
Erick, We're using Solr 3.3 on Linux (CentOS 5.6). The /data dir on master is actually 1.2G. I haven't tried to recreate the index yet. Since it's a production environment, I guess that I can stop replication and indexing and then recreate the master index to see if it makes any difference. Also

RE: Simple Slave Replication Question

2012-03-23 Thread Ben McCarthy
I just have an index directory. I push the documents through with a change to a field. I'm using SOLRJ to do this. I'm using the guide from the wiki to set up the replication. When the feed of updates to the master finishes I call a commit again using SOLRJ. I then have a poll period of 5 minut

Re: Solr 4.0 replication problem

2012-03-23 Thread Erick Erickson
Hmmm, that is odd. But a trunk build from that long ago is going to be almost impossible to debug/fix. The problem with working from trunk is that this kind of problem won't get much attention. I have three suggestions: 1> update to current trunk. NOTE: you'll have to completely reindex your

Re: Slave index size growing fast

2012-03-23 Thread Erick Erickson
What version of Solr and what operating system? But regardless, this shouldn't be happening. Indexes can temporarily double in size, but any extras should be cleaned up relatively soon. On the master, what's the total size of the /data directory? I'm a little suspicious of the on your master, bu

Re: Simple Slave Replication Question

2012-03-23 Thread Tomás Fernández Löbbe
Hi Ben, only new segments are replicated from master to slave. In a situation where all the segments are new, this will cause the index to be fully replicated, but this rarely happens with incremental updates. It can also happen if the slave Solr assumes it has an "invalid" index. Are you committing
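The "only new segments" behaviour can be pictured with a deliberately simplified model: compare the index file names on both sides and fetch what the slave lacks. The file names below are made up for illustration, and the real replication handler may compare more than names:

```python
def files_to_fetch(master_files, slave_files):
    """Return the master's index files that the slave does not have yet."""
    return sorted(set(master_files) - set(slave_files))


# Incremental update: one new segment plus a new segments_N file.
slave = ["_a.cfs", "_b.cfs", "segments_3"]
master_incremental = ["_a.cfs", "_b.cfs", "_c.cfs", "segments_4"]
print(files_to_fetch(master_incremental, slave))

# After a merge or optimize every segment is new, so everything is copied.
master_rewritten = ["_d.cfs", "segments_5"]
print(files_to_fetch(master_rewritten, slave))
```

In this model a full copy happens exactly when no file name on the master matches one on the slave, which is the "all the segments are new" case.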

Slave index size growing fast

2012-03-23 Thread Alexandre Rocco
Hello, We have a Solr index that averages 1.19 GB in size. After configuring the replication, the index size on the slave machine is growing exponentially. Currently we have a slave with 323.44 GB in size. Is there anything that could cause this behavior? The current replication config is be

RE: Simple Slave Replication Question

2012-03-23 Thread Ben McCarthy
So do you just simply address this with big NICs and network pipes? -Original Message- From: Martin Koch [mailto:m...@issuu.com] Sent: 23 March 2012 14:07 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question I guess this would depend on network bandwidth, but we mo

Re: Simple Slave Replication Question

2012-03-23 Thread Martin Koch
I guess this would depend on network bandwidth, but we move around 150G/hour when hooking up a new slave to the master. /Martin On Fri, Mar 23, 2012 at 12:33 PM, Ben McCarthy < ben.mccar...@tradermedia.co.uk> wrote: > Hello, > > I'm looking at the replication from a master to a number of slaves.

"Error 500 seek past EOF" : SOLR bug?

2012-03-23 Thread Martin Koch
Hi list In my ~6M index served from a slave that is replicating from a master, I'm trying to do this query : localhost:8080/solr/core0/select?q=car&qf=document%5E1&defType=edismax Can anybody explain the below error that I get as a result? It may (or may not) be related to another problem that w
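For reference, the `%5E` in that query string is the URL-encoded `^` boost operator, so `qf=document%5E1` means `qf=document^1`. A minimal sketch of building the same request URL with Python's standard library; the `http://` scheme and the helper name are assumptions added for illustration:

```python
from urllib.parse import urlencode


def build_select_url(base, params):
    """Append URL-encoded query parameters to a Solr select endpoint."""
    return base + "?" + urlencode(params)


url = build_select_url(
    "http://localhost:8080/solr/core0/select",
    {"q": "car", "qf": "document^1", "defType": "edismax"},
)
print(url)
```

Letting `urlencode` handle the escaping avoids hand-encoding characters such as `^`.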

Re: Grouping queries

2012-03-23 Thread Martijn v Groningen
> Where is Join documented? I looked at http://wiki.apache.org/solr/Join and see no reference to "fromIndex". Also does this work in a distributed environment?
The "fromIndex" isn't documented in the wiki. It is mentioned in the issue and you can find it in the Solr code: https://issues.ap

Re: Grouping queries

2012-03-23 Thread Jamie Johnson
On Fri, Mar 23, 2012 at 6:37 AM, Martijn v Groningen wrote: > On 22 March 2012 03:10, Jamie Johnson wrote: > >> I need to apologize I believe that in my example I have too grossly >> over simplified the problem and it's not clear what I am trying to do, >> so I'll try again. >> >> I have a situat

Have made site for comparison of Solr and other enterprise search engines

2012-03-23 Thread Runar Buvik
Hi all, FYI, I am working on a website for doing side-by-side comparison of several common enterprise search engines, including some that are based on Solr. Currently I have Searchdaimon ES, Microsoft SSE 2010, SearchBlox, Google Mini, Thunderstone, Constellio, mnoGoSearch and Ibm OmniFind Yahoo ru

Simple Slave Replication Question

2012-03-23 Thread Ben McCarthy
Hello, I'm looking at the replication from a master to a number of slaves. I have configured it and it appears to be working. When updating 40K records on the master, is it standard to always copy over the full index, currently 5 GB in size? If this is standard what do people do who have massiv

Re: Commit Strategy for SolrCloud when Talking about 200 million records.

2012-03-23 Thread Markus Jelsma
We did some tests too with many millions of documents and auto-commit enabled. It didn't take long for the indexer to stall and in the meantime the number of open files exploded, to over 16k, then 32k. On Friday 23 March 2012 12:20:15 Mark Miller wrote: > What issues? It really shouldn't be a pr

Re: Commit Strategy for SolrCloud when Talking about 200 million records.

2012-03-23 Thread Mark Miller
What issues? It really shouldn't be a problem. On Mar 22, 2012, at 11:44 PM, I-Chiang Chen wrote: > At this time we are not leveraging the NRT functionality. This is the > initial data load process where the idea is to just add all 200 million > records first. Then do a single commit at the e

Re: Faceted range based on float within velocity not working properly

2012-03-23 Thread Marcelo Carvalho Fernandes
I went deeper into the problem and discovered that... $math.toInteger("10.1") returns 101 $math.toInteger("10,1") returns 10 Although I'm using Strings in the previous examples, I have a Float variable from Solr. I'm not sure if it is just a Solr problem, just a Velocity problem or somewhere betw
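One possible explanation, not confirmed in the thread, is locale-dependent number parsing: if the parser treats "," as the decimal separator and "." as the grouping separator, both results follow. A small sketch that mimics that behaviour; `to_integer` here is a hypothetical stand-in, not Velocity's MathTool:

```python
def to_integer(text, decimal_sep, group_sep):
    """Parse text using the given separators, then truncate to an integer."""
    # Drop grouping separators first, then normalize the decimal separator.
    normalized = text.replace(group_sep, "").replace(decimal_sep, ".")
    return int(float(normalized))


# Assumed locale with "," as decimal separator and "." for grouping:
print(to_integer("10.1", decimal_sep=",", group_sep="."))  # 101
print(to_integer("10,1", decimal_sep=",", group_sep="."))  # 10

# Locale with "." as decimal separator and "," for grouping:
print(to_integer("10.1", decimal_sep=".", group_sep=","))  # 10
```

If that is the cause, the Float from Solr is probably being rendered as a String with "." and then parsed under a different locale.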

Re: Grouping queries

2012-03-23 Thread Martijn v Groningen
On 22 March 2012 03:10, Jamie Johnson wrote: > I need to apologize I believe that in my example I have too grossly > over simplified the problem and it's not clear what I am trying to do, > so I'll try again. > > I have a situation where I have a set of access controls say user, > super user and

Re: RequestHandler versus SearchComponent

2012-03-23 Thread Michael Kuhlmann
Am 23.03.2012 11:17, schrieb Michael Kuhlmann: Adding an own SearchComponent after the regular QueryComponent (or better as a "last-element") is goof ... Of course, I meant "good", not "goof"! ;) Greetings, Kuli

Re: RequestHandler versus SearchComponent

2012-03-23 Thread Michael Kuhlmann
Am 23.03.2012 10:29, schrieb Ahmet Arslan: I'm looking at the following. I want to (1) map some query fields to some other query fields and add some things to FL, and then (2) rescore. I can see how to do it as a RequestHandler that makes a parser to get the fields, or I could see making a Searc

Re: RequestHandler versus SearchComponent

2012-03-23 Thread Ahmet Arslan
> I'm looking at the following. I want > to (1) map some query fields to > some other query fields and add some things to FL, and then > (2) > rescore. > > I can see how to do it as a RequestHandler that makes a > parser to get > the fields, or I could see making a SearchComponent that was > stuck

Re: Trouble Setting Up Development Environment

2012-03-23 Thread Li Li
here is my method. 1. check out the latest source code from trunk or download the tarball: svn checkout http://svn.apache.org/repos/asf/lucene/dev/trunk lucene_trunk 2. create a dynamic web project in eclipse and close it. for example, I create a project named lucene-solr-trunk in my workspace.