Re: SOLR 1.2 - Updates sent containing fields that are not on the Schema fail silently
Hi Hoss.

Well, I'll enable this "ignore" option for fields that aren't declared in my schema. Thanks.

Exactly -- you can try it really easily: just remove one of your fields from the example schema config and try to add content using the Java client API. I'm using SolrJ and it returns no error code for me. But anyway, don't you think the server should also log something to say that documents are being discarded?

Cheers,
Daniel

On 28/11/07 19:25, "Chris Hostetter" <[EMAIL PROTECTED]> wrote:

> : I didn't know that trick.
>
> erik is referring to this in the example schema.xml... [the "ignored" dynamicField declaration]
>
> ...but it sounds like you are having some other problem ... you said that when you POST your documents with "extra" fields you get a 200 response but the documents aren't getting indexed at all, correct?
>
> that is not supposed to happen, Solr should be generating an error. can you give us more info on your setup: what does your schema.xml look like, what does your update code look like (you said you were using SolrJ i believe?), what does Solr log when these updates happen, etc...
>
> -Hoss
Re: Schema class configuration syntax
Norskog, Lance wrote:
> Hi -
> What is the element, inside an analyzer element, that will load this class: org.apache.lucene.analysis.cn.ChineseFilter
> This did not work: [the filter declaration was lost from the archived message]
> This is in Solr 1.2.

the class needs to point to a FilterFactory (not a Filter). 1.3-dev adds FilterFactories for all the lucene contrib filters. Using 1.2, add a jar file with this class and you should be all set:
http://svn.apache.org/repos/asf/lucene/solr/trunk/src/java/org/apache/solr/analysis/ChineseFilterFactory.java

ryan
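If you'd rather not pull the file from svn, the factory is tiny -- roughly this (a sketch from memory, so check it against the linked source before relying on it):

  package org.apache.solr.analysis;

  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.cn.ChineseFilter;

  // Sketch of a TokenFilterFactory that wraps the Lucene contrib ChineseFilter.
  // The 1.3-dev ChineseFilterFactory is essentially this.
  public class ChineseFilterFactory extends BaseTokenFilterFactory {
    public TokenStream create(TokenStream in) {
      return new ChineseFilter(in);
    }
  }

Put it in a jar on Solr's classpath and reference it from your filter element by class name.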
Re: SOLR 1.2 - Updates sent containing fields that are not on the Schema fail silently
To be clear: solr *should* fail with an error if you send an unknown field. I just tested this with a clean checkout of 1.3-dev and 1.2 and in both cases I get an error 400 "unknown field 'asgasdgasgd'".

The suggestion to look at the "ignore option" is to make sure you don't have one -- this should be the only way to add an arbitrary unknown field without an error.

From a clean 1.2/1.3-dev install, how can you reproduce the error? I tried:

  $ ant example
  $ cd example/
  $ java -jar start.jar

In another terminal, edit mem.xml to add a field that is not in the schema, something like:

  <field name="asgasdgasgd">5</field>

  $ cd example/exampledocs
  $ ./post.sh mem.xml

this gives:

  HTTP ERROR: 400 ERROR: unknown field 'asgasdgasgd'

running either 1.2 or 1.3.

ryan

Daniel Alheiros wrote:
> Hi Hoss. Well I'll enable this ignore option for fields that aren't declared in my schema. Thanks. [...]
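P.S. For reference, the SolrJ side of the same test should behave like this (a sketch, untested; class names are from the solrj trunk of the time, and the bogus field name is just an example):

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class UnknownFieldCheck {
    public static void main(String[] args) throws Exception {
      SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "TEST1");
      doc.addField("asgasdgasgd", 5);   // not declared in the example schema

      try {
        server.add(doc);                // should fail with "unknown field 'asgasdgasgd'"
        server.commit();
        System.out.println("add succeeded -- that would be the bug");
      } catch (Exception e) {
        System.out.println("got the expected error: " + e.getMessage());
      }
    }
  }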
Re: LowerCaseFilterFactory and spellchecker
It seems the best thing to do would be a case-insensitive spellcheck, but provide the suggestion preserving the original case that the user provided -- or at least make this an option. Users are often lazy about capitalization, especially with search, where they've learned from web search engines that case (typically) doesn't matter. So, for example, Thurne would return Thorne, but thurne would return thorne.

-Sean

John Stewart wrote:
> Rob,
>
> Let's say it worked as you want it to in the first place. If the query is for Thurne, wouldn't you get thorne (lower-case 't') as the suggestion? This may look weird for proper names.
>
> jds
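To make the idea concrete: the check could be case-insensitive while the displayed suggestion re-applies the user's capitalization -- something along these lines (just an illustration, not code from the spellcheck handler):

  public class CasePreservingSuggestion {
    // Re-apply the capitalization of the user's input to a lowercased suggestion.
    static String matchCase(String userInput, String suggestion) {
      if (userInput.length() > 0 && suggestion.length() > 0
          && Character.isUpperCase(userInput.charAt(0))) {
        return Character.toUpperCase(suggestion.charAt(0)) + suggestion.substring(1);
      }
      return suggestion;
    }

    public static void main(String[] args) {
      System.out.println(matchCase("Thurne", "thorne"));  // Thorne
      System.out.println(matchCase("thurne", "thorne"));  // thorne
    }
  }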
How much disk space does Solr consume?
Hello,

If the index size is 100 GB and I want to run the optimize command, how much more space do I need for this?

Also, if I run snapshooter, does it take more space while the snapshot is being made than the final snapshot itself?

Thank you,
Gene
can I do *thing* substring searches at all?
With a fieldtype of string, can I do any sort of *thing* search? I can do thing* but not *thing or *thing*. Workarounds?
Re: can I do *thing* substring searches at all?
Store a copy with the string reversed in another field. Then you can search that field for gniht* ...

Also, I believe I saw some comments about leading wildcards being available in some upcoming release (1.3?) ... sorry I can't remember any better than that. Google may help ...

-Charlie

On Nov 29, 2007 2:51 PM, Brian Whitman <[EMAIL PROTECTED]> wrote:
> With a fieldtype of string, can I do any sort of *thing* search? I
> can do thing* but not *thing or *thing*. Workarounds?
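To spell out the reversed-copy trick (the field name title_rev below is made up, and the reversal is done client-side in plain Java -- just a sketch):

  public class ReversedFieldTrick {
    static String reverse(String s) {
      return new StringBuilder(s).reverse().toString();
    }

    public static void main(String[] args) {
      // At index time, put the reversed value in a second field alongside the original:
      //   doc.addField("title",     "something");
      //   doc.addField("title_rev", reverse("something"));   // "gnihtemos"
      //
      // At query time, rewrite the leading wildcard as a trailing one on the reversed field:
      String userTerm = "thing";                        // user asked for *thing
      String query = "title_rev:" + reverse(userTerm) + "*";
      System.out.println(query);                        // title_rev:gniht*
    }
  }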
Document field data not getting indexed
Hi,

I have 22 documents. I index these by posting them using LWP::UserAgent, all with HTTP status 200 OK.

One of my documents (id=44) contains the word "Campeau" in the "ocr" field. But according to Luke this term does not appear in the index. Yet when I delete the index (delete by query *:*, or restart the server after deleting /index) and index just document id=44, its ocr field data does appear in the index according to Luke.

Also I notice that the numTerms for 22 documents is 5579 and for just the doc id=44 it's 2194. Hard to believe that 22 documents only increase the number of terms by so little.

Why/how could this be happening?

Thanks, Phil

---

My schema.xml: [the field and field type definitions were lost from the archived message]

Luke output after indexing the 22 docs: numDocs 22, maxDoc 22, numTerms 5579, version 1196382086904, optimized true, current true, hasDeletions false, lastModified 2007-11-30T00:22:06Z; ocr field: type mytext, IT--- (unstored field), docs 22, distinct 5513 [...]

Luke output after indexing just doc id=44: numDocs 1, maxDoc 1, numTerms 2194, version 1196381821086, optimized true, current true, hasDeletions false, lastModified 2007-11-30T00:17:21Z; ocr field: type mytext, IT--- (unstored field), docs 1, distinct 2191 [...]
Distribution without SSH?
Hello,

I recently set up Solr with distribution on a couple of servers. I just learned that our network policies do not permit us to use SSH with passphraseless keys, and the snappuller script uses SSH to examine the master Solr instance's state before it pulls the newest index via rsync.

We plan to attempt to rewrite the snappuller (and possibly other distribution scripts, as required) to eliminate this dependency on SSH. I thought I'd ask the list in case anyone has experience with this same situation, or any insights into the reasoning behind requiring SSH access to the master instance.

Thanks,
Justin Knoll
Re: Document field data not getting indexed
On Nov 29, 2007 7:29 PM, Phillip Farber <[EMAIL PROTECTED]> wrote: > One of my documents (id=44) contains the word "Campeau" in the "ocr" > field. But according to luke this term does not appear in the index. AFAIK the Luke handler lists the top terms, not necessarily all of them. Do a search for ocr:Campeau and see if it returns anything. -Yonik
Re: Document field data not getting indexed
see yonik's comments regarding Luke and whether or not your term is indexed. as for this point:

: Also I notice that the numTerms for 22 documents is 5579 and for just the doc
: id=44 it's 2194. Hard to believe that 22 documents only increase the number
: of terms by so little.

this is not surprising. numTerms is the number of *unique* terms, independent of how many documents each term appears in -- if the word "eclipse" appears in the ocr field of 17 documents a total of 457 times, it is still only counted once in numTerms.

-Hoss
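P.S. a trivial way to see why the count grows so slowly -- numTerms behaves like the size of a set (plain Java, purely illustrative):

  import java.util.HashSet;
  import java.util.Set;

  public class UniqueTermsDemo {
    public static void main(String[] args) {
      Set<String> terms = new HashSet<String>();
      // "eclipse" occurring 457 times across 17 docs still contributes one entry
      for (int i = 0; i < 457; i++) {
        terms.add("eclipse");
      }
      System.out.println(terms.size());   // 1
    }
  }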
Re: LowerCaseFilterFactory and spellchecker
: think i'm just doing something wrong...
:
: was experimenting with the spellcheck handler with the nightly
: checkout from 11-28; seems my spellchecking is case-sensitive, even
: tho i think i'm adding the LowerCaseFilterFactory to both the index
: and query analyzers.

I'm not very familiar with the SpellCheckerRequestHandler, but i don't think you are doing anything wrong. a quick skim of the code indicates that the "q" param isn't being analyzed by that handler, so the raw input string is passed to the SpellChecker.suggestSimilar method.

This may or may not have been intentional. I personally can't think of any reason why it wouldn't make sense to get the query analyzer for the termSourceField and use it to analyze the q param before getting suggestions.

-Hoss
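P.S. for illustration, the idea is roughly this -- a sketch against the Lucene token API of that era, with StandardAnalyzer standing in for the termSourceField's query analyzer; it is not the actual handler code:

  import java.io.StringReader;
  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.analysis.Token;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.standard.StandardAnalyzer;

  public class AnalyzeBeforeSuggest {
    public static void main(String[] args) throws Exception {
      Analyzer queryAnalyzer = new StandardAnalyzer();  // stand-in for the field's query analyzer
      String q = "Thurne";

      TokenStream ts = queryAnalyzer.tokenStream("word", new StringReader(q));
      Token t = ts.next();
      String analyzed = (t == null) ? q : t.termText();

      // analyzed ("thurne") is what would be handed to SpellChecker.suggestSimilar()
      System.out.println(analyzed);
    }
  }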
Re: SOLR/Lucene sorting - Question/ requesting suggestion
Kasi Sankaralingam wrote:
> When we have the following set of data, they are first sorted based on capital letters and then lower case. Is there a way to make them sort regardless of character case?
>
> Avaneesh
> Bruce
> Veda
> caroleY
> jonathan
> junit
>
> So carole would come after Bruce. Thanks

sorting is based on the indexed token, not the stored field. Use a fieldType that includes the LowerCaseFilterFactory.

Check the 'alphaOnlySort' fieldType in the example schema.xml -- it produces a single lowercased token and tosses any non-letter characters (you can get rid of the PatternReplaceFilterFactory, but it is a good example).

ryan
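P.S. the effect is roughly the same as sorting on a lowercased key -- plain Java, just to illustrate the ordering (alphaOnlySort additionally strips non-letters):

  import java.util.ArrayList;
  import java.util.Arrays;
  import java.util.Collections;
  import java.util.List;

  public class CaseInsensitiveSortDemo {
    public static void main(String[] args) {
      List<String> names = new ArrayList<String>(
          Arrays.asList("Avaneesh", "Bruce", "Veda", "caroleY", "jonathan", "junit"));
      Collections.sort(names, String.CASE_INSENSITIVE_ORDER);
      // [Avaneesh, Bruce, caroleY, jonathan, junit, Veda] -- carole now follows Bruce
      System.out.println(names);
    }
  }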
Re: Distribution without SSH?
Your company's network policies seem to be a good thing. I've worked at places with this same policy, for good reason. But it does tend to complicate operations sometimes.

Some options you might pursue:

* Set up ssh-agent on the clients and use passphrase-protected keys. Downside to this: someone on your ops team will inevitably be awoken at 4am to type in the passphrase.

* Try to get an exception to the policy by running Solr under a new user account inside a jail. Use a restricted login shell to make sure it can do only what you intend, so that if the key is compromised, the damage is contained.

* Or, write a custom server/client running on a different port. In this case you lose over-the-wire encryption, and if your server is buggy, you get pwn3d anyway.

--Matt

On Nov 29, 2007, at 7:48 PM, Justin Knoll wrote:
> Hello, I recently set up Solr with distribution on a couple of servers. I just learned that our network policies do not permit us to use SSH with passphraseless keys... [...]

--
Matt Kangas / [EMAIL PROTECTED]
SOLR/Lucene sorting - Question/ requesting suggestion
When we have the following set of data, they are first sorted based on capital letters and then lower case. Is there a way to make them sort regardless of character case?

Avaneesh
Bruce
Veda
caroleY
jonathan
junit

So carole would come after Bruce.

Thanks
Re: LowerCaseFilterFactory and spellchecker
On 29-Nov-07, at 5:40 PM, Chris Hostetter wrote:
> I'm not very familiar with the SpellCheckerRequestHandler, but i don't think you are doing anything wrong. a quick skim of the code indicates that the "q" param isn't being analyzed by that handler, so the raw input string is passed to the SpellChecker.suggestSimilar method.
>
> This may or may not have been intentional. I personally can't think of any reason why it wouldn't make sense to get the query analyzer for the termSourceField and use it to analyze the q param before getting suggestions.

It does make some sense, but I'm not sure that it should be blindly analyzed without adding logic to handle certain cases (like the QueryParser does). What happens if the analyzer produces two tokens? The spellchecker has to deal with this appropriately.

Spell checkers should be able to "reverse analyze" the suggestions as well, so "Pyhton" gets corrected to "Python" and not "python". Similarly, "ad-hco" should probably suggest "ad-hoc" and not "adhoc".

-Mike
Re: SOLR 1.2 - Updates sent containing fields that are not on the Schema fail silently
: Exactly, you can try it really easily, just remove one of your fields from the
: example schema config and try to add content using the Java client API...
: Well I'm using SolrJ and it returns no error code for me. But anyway don't
: you think the server should also have some logging informing that documents
: are being discarded?

As someone who is not very familiar with SolrJ, I can imagine that perhaps it has a bug where it might not return an error code in situations like this (it would surprise me, but i can imagine it); however I'm really confused by your comment that the server isn't logging that documents are being discarded.

If you try to index a document with a field Solr doesn't recognize, it logs quite a big exception. This is easily reproducible using post.jar and the example schema (unchanged). Running this command...

  java -Ddata=args -jar post.jar '<add><doc><field name="hoss">hoss</field></doc></add>'

...triggers this log message in Solr...

  Nov 29, 2007 6:09:28 PM org.apache.solr.common.SolrException log
  SEVERE: org.apache.solr.common.SolrException: ERROR:unknown field 'hoss'
          at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:245)
          at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:66)
          at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:196)
          ...

...which leads me to suspect there's something wonky with your setup. exactly which version of Solr are you using, what does your SolrJ code look like, and what log messages do you see when a document is *successfully* indexed? you should see something like...

  INFO: {add=[SOLR1000]} 0 102

...where the uniqueKey of your doc is in the []. If you don't see those messages, then you aren't looking in the right place for Solr's log messages.

-Hoss