Backup and Restore of index files

2008-06-27 Thread Jacob Singh
Hi, I see this has been discussed: http://www.mail-archive.com/solr-user@lucene.apache.org/msg08150.html and I've read the wiki. I've got replication working okay, but I'm not trying to do replication. Rather, I want to: 1. Get a hot backup of a master server (meaning no interruption of servic

Re: Suggestion for short text matching using dictionary

2008-06-27 Thread Grant Ingersoll
below On Jun 27, 2008, at 1:18 AM, climbingrose wrote: Firstly, my apologies for being off topic. I'm asking this question because I think there are some machine learning and text processing experts on this mailing list. Basically, my task is to normalize a fairly unstructured set of sh

Re: Suggestion for short text matching using dictionary

2008-06-27 Thread climbingrose
Thanks Grant. I did try Secondstring before and found out that it wasn't particularly good for doing a lot of text matching. I'm leaning toward the combination of Lucene and Secondstring. Googling around a bit, I came across this project http://datamining.anu.edu.au/projects/linkage.html. Looks inter

Re: Suggestion for short text matching using dictionary

2008-06-27 Thread Grant Ingersoll
Yeah, I don't know if SecondString scales. Note that Lucene now has an implementation of Jaro-Winkler, which is a pretty good distance measure, so you may want to give that a try, plus if you see speedups, feel free to contrib a patch ;-) I'm wondering if Hadoop couldn't help w/ the scale
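For readers unfamiliar with the measure Grant mentions, here is a standalone Java sketch of the textbook Jaro-Winkler similarity. It uses a match window of max(len1, len2)/2 - 1, counts transposed matches, and adds a bonus for a shared prefix of up to 4 characters with the usual scaling factor 0.1. It is not Lucene's own implementation, and the class and method names are hypothetical. Scores run from 0 (nothing in common) to 1 (identical strings).

```java
// Standalone sketch of Jaro-Winkler similarity (hypothetical class, not Lucene's implementation).
final class JaroWinkler {

    /** Jaro similarity in [0, 1]. */
    static double jaro(String s1, String s2) {
        if (s1.equals(s2)) return 1.0;
        int len1 = s1.length(), len2 = s2.length();
        if (len1 == 0 || len2 == 0) return 0.0;
        // Two characters "match" if they are equal and no further apart than this window.
        int window = Math.max(0, Math.max(len1, len2) / 2 - 1);
        boolean[] matched1 = new boolean[len1];
        boolean[] matched2 = new boolean[len2];
        int matches = 0;
        for (int i = 0; i < len1; i++) {
            int start = Math.max(0, i - window);
            int end = Math.min(len2 - 1, i + window);
            for (int j = start; j <= end; j++) {
                if (!matched2[j] && s1.charAt(i) == s2.charAt(j)) {
                    matched1[i] = true;
                    matched2[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;
        // Walk the matched characters of both strings in order; each mismatch is half a transposition.
        int halfTranspositions = 0;
        int k = 0;
        for (int i = 0; i < len1; i++) {
            if (!matched1[i]) continue;
            while (!matched2[k]) k++;
            if (s1.charAt(i) != s2.charAt(k)) halfTranspositions++;
            k++;
        }
        double m = matches;
        double t = halfTranspositions / 2.0;
        return (m / len1 + m / len2 + (m - t) / m) / 3.0;
    }

    /** Jaro-Winkler: boosts the Jaro score for a common prefix of up to 4 characters. */
    static double jaroWinkler(String s1, String s2) {
        double j = jaro(s1, s2);
        int maxPrefix = Math.min(4, Math.min(s1.length(), s2.length()));
        int prefix = 0;
        while (prefix < maxPrefix && s1.charAt(prefix) == s2.charAt(prefix)) prefix++;
        double p = 0.1; // conventional prefix scaling factor
        return j + prefix * p * (1.0 - j);
    }

    public static void main(String[] args) {
        System.out.println(jaroWinkler("MARTHA", "MARHTA"));
    }
}
```

For the classic MARTHA/MARHTA pair used in `main`, this gives roughly 0.961 (Jaro score about 0.944 plus the bonus for the shared prefix "MAR").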

Re: Backup and Restore of index files

2008-06-27 Thread Jeremy Hinegardner
Hi, I've been working with the snapshot / replication scripts recently, and I believe you could use / alter them to do what you want. On Fri, Jun 27, 2008 at 03:49:49PM +0530, Jacob Singh wrote: > Hi, > > I see this has been discussed: > http://www.mail-archive.com/solr-user@lucene.apache.org/msg

Re: SpellCheckerRequestHandler & qt parameter

2008-06-27 Thread Geoffrey Young
I had null pointer exceptions left and right while composing this email... then I added spellcheck.build=true to one and they went away. Do you need to rebuild the spelling index every time you alter (certain parts) of solrconfig.xml? It was very consistent as reported below, but after simpl

Re: Solr Security and XSRF

2008-06-27 Thread Chris Hostetter
: > A basic technique that can be used to mitigate the risk of a possible CSRF : > attack like this is to configure your Servlet Container so that access to : > paths which can modify the index (i.e. /update, /update/csv, etc.) are : > restricted either to specific client IPs, or using HTTP Authe

Re: Backup and Restore of index files

2008-06-27 Thread Chris Hostetter
: Those would be the individual files inside the Lucene index. The "index" you : want to back up is the entire 'solr/data/index' directory. I imagine you want : to think of that directory as an atomic unit; the whole directory at once : describes your index, not the individual files in it. Correc
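Treating 'solr/data/index' as one unit suggests taking a point-in-time copy of the whole directory rather than picking individual files. Below is a minimal Java sketch that hard-links every file into a new sibling directory and then renames that directory into place. It rests on several assumptions: the index directory is flat (no subdirectories), no commit or optimize runs while the links are made, the snapshot lives on the same filesystem, and Lucene does not rewrite index files in place after creating them, which is what makes hard links a safe and cheap copy. The `IndexSnapshot` class and the `snapshot.<timestamp>` / `temp-snapshot.<timestamp>` names are hypothetical choices for this sketch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: point-in-time snapshot of a flat Lucene index directory via hard links.
final class IndexSnapshot {

    static Path snapshot(Path indexDir) throws IOException {
        String stamp = LocalDateTime.now().format(DateTimeFormatter.ofPattern("yyyyMMddHHmmss"));
        Path tmp = indexDir.resolveSibling("temp-snapshot." + stamp);
        Path target = indexDir.resolveSibling("snapshot." + stamp);

        // List the files first so the directory stream is closed before we start linking.
        List<Path> files;
        try (Stream<Path> s = Files.list(indexDir)) {
            files = s.collect(Collectors.toList());
        }

        // Build the snapshot under a temporary name so a half-finished copy is never mistaken for a real one.
        Files.createDirectory(tmp);
        for (Path f : files) {
            // createLink(link, existing): the new name shares the existing file's data, no bytes are copied.
            Files.createLink(tmp.resolve(f.getFileName()), f);
        }

        // Rename into place in one step (same filesystem assumed).
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java IndexSnapshot /path/to/solr/data/index
        System.out.println(snapshot(Paths.get(args[0])));
    }
}
```

Once the snapshot directory exists, it could be copied off the machine at leisure. Under the assumption above, later merges delete the original names rather than altering the linked files, so the copy stays consistent.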

Re: AW: nonexistent filter class in schema.xml

2008-06-27 Thread Chris Hostetter
I just tested this using some tweaks to the example schema and I cannot reproduce ... as Shalin described, I get an expected "Error loading class" message logged on startup, and when attempting to load the admin screen I get a 500 page informing me there were errors in my config with the full d