Re: sorting on aggregate averages
Thanks! I'll have a look at that.

On Wed, Apr 2, 2008 at 6:25 AM, Chris Hostetter <[EMAIL PROTECTED]> wrote:

> : I am computing a sorted rank list and returning a slice (for pagination),
> : but I have to recompute the result for each request. The actual q and fq
> : parameters would be cached, but not the sorted list, which I could cache
> : to reuse on subsequent requests.
> :
> : I might have a look at the caching also, any suggestions in this regard.
>
> Take a look at "User/Generic Caches" here...
>
> http://wiki.apache.org/solr/SolrCaching
>
> Your custom handler/component can use SolrIndexSearcher.getCache to see if
> a cache with a specific name has been defined; if it has, you can do the
> normal get/put operations on it. The cache will worry about expulsion of
> items if it's full (the only impl that comes with Solr is an LRUCache, but
> you could write your own if you want), and SolrCore will worry about
> giving you a new cache instance when a new reader is opened. If you
> implement a CacheRegenerator (and configure it for this cache) then you
> can put whatever custom code in it that you want for autowarming entries
> in the cache based on the keys/values of the old cache (ie: warm all the
> keys, warm the "first" N keys, warm all the keys whose values indicate
> they were expensive to compute, etc).
>
> (Just make sure your custom handler/component can function OK even if the
> cache doesn't exist, or if there are cache misses even when you don't
> expect them -- it is, after all, just a cache; good code should be able to
> function (slowly) without it if it's turned off.)
>
> -Hoss
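For illustration, a minimal Java sketch of the pattern Hoss describes,
assuming a user cache named "myRankCache" has been declared in
solrconfig.xml; the cache name and computeRankedList() are illustrative,
not part of Solr:

    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.search.SolrCache;
    import org.apache.solr.search.SolrIndexSearcher;

    public class RankListHelper {
        // Assumes solrconfig.xml declares a user/generic cache, e.g.:
        //   <cache name="myRankCache" class="solr.LRUCache"
        //          size="4096" initialSize="1024" autowarmCount="0"/>
        public Object getRankedList(SolrQueryRequest req, Object cacheKey) {
            SolrIndexSearcher searcher = req.getSearcher();
            // null if no cache with this name is configured
            SolrCache cache = searcher.getCache("myRankCache");

            Object ranked = (cache != null) ? cache.get(cacheKey) : null;
            if (ranked == null) {
                // cache miss (or no cache at all): recompute and store
                ranked = computeRankedList(req);
                if (cache != null) {
                    cache.put(cacheKey, ranked);
                }
            }
            return ranked;
        }

        private Object computeRankedList(SolrQueryRequest req) {
            return null; // placeholder for the expensive aggregate sort
        }
    }

Because the cache belongs to the searcher, entries are naturally dropped
when a new reader is opened, which matches the lifecycle Hoss describes.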
Search exact terms
Hi all,

Is there a Solr-wide setting with which I can achieve the following: if I
search for q=onderwij, I also receive documents matching "onderwijs" etc.
This is of course the behavior that is documented, but even when I search
on "onderwij" I still get the "onderwijs" hits. For this field I use the
type "text" from the schema.xml that ships with the default Solr.

Is there a global setting in Solr to always search exactly?

Greetings,

Tim

Info Support - http://www.infosupport.com
problem with ShowFileRequestHandler
Edward.Zhang had reported this problem before:

> I want to programmatically retrieve the schema and the config from the
> ShowFileRequestHandler, and I ran into some trouble. There are CJK
> characters in the xml files, such as 记录号. But I get a confusing response
> from Solr using "/admin/file/?file=schema.xml"; IE and Firefox both
> report parse errors. I tried
> "/admin/file/?file=schema.x&contentType=text/plain" and I get the same
> garbled result, e.g. "?/uniqueKey>".
>
> BTW: the xml files are encoded in UTF-8 and they work fine when I open
> them locally using IE. And I set Tomcat's 8080 connector "URIEncoding"
> argument to "UTF-8" too. So is there anything missing for me? Or is it a
> bug?
>
> Every reply would be appreciated.

Ryan has changed the RawResponseWriter to use the Reader, but the problem
seems unsolved. For example: my schema.xml is a UTF-8 file, but the
reader's default encoding is "GBK", so I still can't get the right String.
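The "GBK" symptom above is the classic platform-default-charset trap; a
minimal sketch of the difference, assuming the file really is UTF-8:

    import java.io.FileInputStream;
    import java.io.FileReader;
    import java.io.InputStreamReader;
    import java.io.Reader;

    public class EncodingDemo {
        public static void main(String[] args) throws Exception {
            // FileReader always decodes with the JVM default charset
            // (GBK on a Chinese-locale machine), mangling UTF-8 files:
            Reader platformDefault = new FileReader("schema.xml");
            platformDefault.close();

            // An InputStreamReader with an explicit charset decodes
            // the same bytes correctly regardless of locale:
            Reader utf8 = new InputStreamReader(
                    new FileInputStream("schema.xml"), "UTF-8");
            utf8.close();
        }
    }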
Re: Search exact terms
If you want this behavior then the field type should not be 'text'. For the
default fieldtype=text there are many filters applied before the values are
indexed; this includes stemming (reducing the word to its root, removing
the "s" in your case).

Try using fieldtype=string instead; this will match strictly against the
values in the field (exact match, case sensitive).

Also try tweaking schema.xml in the conf folder. You can adjust the type
definitions in this file to use delimiter/case filters as seems fit for
your case; see the sketch below.

-umar

2008/4/2 Tim Mahy <[EMAIL PROTECTED]>:
> Is there a global setting on Solr to always search Exact?
> [...]
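For illustration, a sketch of what such a tweaked type might look like in
schema.xml - essentially the stock "text" type with the stemming filter
left out, so terms are indexed verbatim (modulo lowercasing). The type name
is illustrative:

    <fieldType name="text_unstemmed" class="solr.TextField"
               positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1"
                catenateWords="1" catenateNumbers="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- no EnglishPorterFilterFactory here, so "onderwijs"
             stays "onderwijs" instead of being stemmed -->
      </analyzer>
    </fieldType>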
Wildcard search + case insensitive
Hi all,

I use a "text"-style type definition in my schema.xml whose analyzer chain
includes LowerCaseFilterFactory. When I have a document with the term
"demo" in it and I search for dem*, I receive the document back from Solr,
but when I search on Dem* I don't get the document.

Is the LowerCaseFilterFactory not executed when a wildcard search is being
performed?

Greetings,

Tim

Info Support - http://www.infosupport.com
java.io.FileNotFoundException?
We just started hitting a FileNotFoundException for no real apparent reason
for both our regular index and our spellchecker index, only a few minutes
after we restarted Solr. I did some searching and didn't find much that
helped.

We started to do some load testing, and after about 10 minutes we started
getting these errors. We hit the spellchecker every request through a
SpellcheckComponent that we created (ie, code ripped out of
SpellCheckRequestHandler for now). It runs essentially the same code as the
spellcheck request handler when we specify a parameter (spellcheck=true).

We have 34 cores. All but two cores are fully optimized (haven't been
updated in 2 months); only two cores are actively updated. We started Solr
around 11:45am; not much happened until 12:27 when we started load testing
(just a few queries, maybe 100 updates).

    find /home/dsteiger/local/solr/cores/*/data/index|wc -l  => 414
    find /home/dsteiger/local/solr/cores/*/data/spell|wc -l  => 6

(only the two 'active' cores use the spell checker). So, not many files are
open. Anyone have any idea what might cause the two errors below?

When I restarted Solr around 11:45am it was to test a new patch that set
the mergeFactor in the Lucene spellchecker to 2 instead of 300, because we
kept running into 'too many files open' errors when rebuilding more than
one spell index at a time. The spell indexes had been rebuilt manually
using the mergeFactor of 300, Solr restarted, and any subsequent rebuild of
the spell index would use a mergeFactor of 2.

After we hit this error, I rebuilt the spell indexes with the new code,
replicated them to the slave, restarted Solr, and all has been well. We ran
the load testing for more than an hour and the issue hasn't returned.

Could the old spell indexes that were created using the high mergeFactor
cause an issue like this somehow? Could the opening and closing of
searchers so fast cause this? I don't have the slightest idea. All of our
search queries hit the slave, and the master just handles updates. The
master had no issues through all of this.

Caused by: java.io.IOException: cannot read directory
org.apache.lucene.store.FSDirectory@/home/dsteiger/local/solr/cores/qaa/data/spell: list() returned null
        at org.apache.lucene.index.SegmentInfos.getCurrentSegmentGeneration(SegmentInfos.java:115)
        at org.apache.lucene.index.IndexReader.indexExists(IndexReader.java:506)
        at org.apache.lucene.search.spell.SpellChecker.setSpellIndex(SpellChecker.java:102)
        at org.apache.lucene.search.spell.SpellChecker.<init>(SpellChecker.java:89)

And this happened, I believe, when running the snapinstaller (done through
cron)...

Caused by: java.io.FileNotFoundException: no segments* file found in
org.apache.lucene.store.FSDirectory@/home/dsteiger/local/solr/cores/qab/data/index: files: null
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:587)
        at org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:63)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:209)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:173)
        at org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:93)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:706)

We're running r614955.

Thanks.
Doug
Re: How to use Solr in java program
hossman wrote:
>
> : > I recommend using Solr as a webservice, even if your client is Java,
> : > but there are options for embedding Solr directly into your
> : > applications using [...]
>
> : thank you hossman for your response, I have another question: I have
> : written a small java program using sockets to send an http query to
> : solr which is running under tomcat, then I got a response in xml
> : format. Is that an example of using it as a web service, since the
> : communication is based on http/xml; or is using tools such as Axis
> : mandatory to talk about web services (or is Solr in itself, by its
> : behaviour, a web service)?
>
> Semantics are either wonderful or horrible - depending on perspective.
> To some people, the term "webservice" has a *very* specific meaning; I,
> however, was just using it in the more relaxed sense of communicating
> over HTTP - so yes, you understood my meaning.
>
> But really: opening your own raw Socket to do the HTTP communication is
> one level lower than anyone should ever consider coding. It's HTTP;
> there are lots of libraries that will take care of the nitty gritty
> details for you and make your life easier.
>
> Like I said before: look at the wiki, try out SolrJ, it should make your
> life much easier.
>
> -Hoss

Thank you Hossman for your reply, now I see Solr differently and clearly;
I will try SolrJ.
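For illustration, a minimal SolrJ query along the lines Hoss suggests,
assuming the CommonsHttpSolrServer client from Solr trunk of that era and
the example server URL; the query string is illustrative:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class SolrJExample {
        public static void main(String[] args) throws Exception {
            // SolrJ hides the raw HTTP/XML plumbing behind a small API
            CommonsHttpSolrServer server =
                    new CommonsHttpSolrServer("http://localhost:8983/solr");

            SolrQuery query = new SolrQuery("title:solr");
            query.setRows(10);

            QueryResponse rsp = server.query(query);
            System.out.println("hits: " + rsp.getResults().getNumFound());
        }
    }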
RE: Wildcard search + case insensitive
Hi all,

I already found the answer to my question on the following blog:
http://michaelkimsal.com/blog/2007/04/solr-case-sensitivty/

greetings,

Tim
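The workaround that blog describes boils down to the fact that
wildcard/prefix queries bypass the analyzer chain, so LowerCaseFilterFactory
never sees them; a common client-side fix is simply to lowercase the query
before sending it. A sketch:

    import java.util.Locale;

    public class WildcardCase {
        public static void main(String[] args) {
            // "Dem*" is compared against already-lowercased index terms
            // as-is, so lowercase it on the client before querying Solr:
            String userInput = "Dem*";
            String q = userInput.toLowerCase(Locale.ENGLISH);
            System.out.println(q); // dem* -- now matches "demo"
        }
    }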
Re: Multiple unique field?
> Thank you for your reply.
> In other words, can I set 2 unique key fields?

Directly in Solr: no.

In your own code, yes -- either in the client or in a custom plugin.

ryan
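For illustration, the client-side flavor of that: build one composite
uniqueKey out of the two fields that should be unique together. A sketch
using SolrJ's SolrInputDocument; all field names are illustrative:

    import org.apache.solr.common.SolrInputDocument;

    public class CompositeKey {
        public static SolrInputDocument makeDoc(String userId, String itemId) {
            SolrInputDocument doc = new SolrInputDocument();
            // the single uniqueKey is the concatenation of both parts
            doc.addField("id", userId + "_" + itemId);
            // keep the parts as separate fields so they stay queryable
            doc.addField("user_id", userId);
            doc.addField("item_id", itemId);
            return doc;
        }
    }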
Help with XmlPullParserException
Hello all,

I'm indexing a body of OCR text and encountered this exception. Apparently
it's some kind of XML parser error. Out of thousands of documents, which I
create with significant processing to make sure they are XML compliant,
only this one appears to have a problem. But can anyone tell me what this
specific error message means?

SEVERE: org.xmlpull.v1.XmlPullParserException: character reference (with
decimal value) may not contain a (position: START_TAG seen ...dieses aus
dem \nZusammenbestehen der Gleichungen \n\naajj2 -)- 2a... @21781:16)

Thanks!

Phil

== Full trace:

SEVERE: org.xmlpull.v1.XmlPullParserException: character reference (with
decimal value) may not contain a (position: START_TAG seen ...dieses aus
dem \nZusammenbestehen der Gleichungen \n\naajj2 -)- 2a... @21781:16)
        at org.xmlpull.mxp1.MXParser.parseEntityRef(MXParser.java:2195)
        at org.xmlpull.mxp1.MXParser.nextImpl(MXParser.java:1275)
        at org.xmlpull.mxp1.MXParser.next(MXParser.java:1093)
        at org.xmlpull.mxp1.MXParser.nextText(MXParser.java:1058)
        at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:332)
        at org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:162)
        at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:191)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:159)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
        at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
        at org.mortbay.jetty.Server.handle(Server.java:285)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
        at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
        at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Re: problem with ShowFileRequestHandler
On Apr 2, 2008, at 5:03 AM, 李银松 wrote:

> [...]

I just changed this to use the same ContentStream code we use for posting
files -- so it should now respect the "contentType" param.

You should be able to see things properly with:

    ?file=xxx&contentType=UTF-8

ryan
Re: Search exact terms
Search is based on the fields you index and how you index them. If you
index using the "text" field -- with stemming etc. -- you will have to
search with the same criteria. If you want exact search, consider the
"string" type.

If you want both, you can use copyField to copy the same content into
multiple fields so it is searchable multiple ways.

ryan

On Apr 2, 2008, at 4:46 AM, Tim Mahy wrote:

> Is there a global setting on Solr to always search Exact?
> [...]
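For illustration, a schema.xml sketch of the copyField approach: the same
content lands in a stemmed field and an exact field, and each query picks
whichever behavior it wants. Field names are illustrative:

    <field name="body"       type="text"   indexed="true" stored="true"/>
    <field name="body_exact" type="string" indexed="true" stored="false"/>

    <copyField source="body" dest="body_exact"/>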
Re: java.io.FileNotFoundException?
Hi Doug,

Sounds fishy, especially increasing/decreasing mergeFactor to "funny
values" (try changing your OS setting instead). My guess is this is
happening only with the 2 indices that are being modified, and I'll guess
that the FNFE is due to a bad/incomplete rsync from the master. Do
snappuller logs mention any errors?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 
From: Doug Steigerwald <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, April 1, 2008 4:12:25 PM
Subject: java.io.FileNotFoundException?

> [...]
Brazilian Portuguese synonyms
Hi guys!

Lucas, I would like to know more about your work on support for Brazilian
Portuguese synonyms in Solr.

Thanks for any help.

--
Yours truly (Atenciosamente),
Rogério (_rogerio_)
http://faces.eti.br
Re: java.io.FileNotFoundException?
The user that runs our apps is configured to allow 65536 open files in
limits.conf. We shouldn't even come close to that number. Solr is the only
app we have running on these machines as our app user. We hit the same
type of issue when we had our mergeFactor set to 40 for all of our indexes;
we lowered it to 5 and have been fine since.

No errors in the snappuller for either core. The spellcheck index is
rebuilt once a night around midnight and copied to the slave afterwards. I
had even rebuilt the spell index manually for the two cores, pulled them,
installed them, and tested to make sure it was working with a few queries
before the load testing started (this was before we released the patch to
lower the spell index mergeFactor).

We were even getting errors trying to run our postCommit script on the
slave (it doesn't end up doing anything since it's the slave):

SEVERE: java.io.IOException: Cannot run program "./solr/bin/snapctl":
java.io.IOException: error=24, Too many open files
        at java.lang.ProcessBuilder.start(Unknown Source)
        at java.lang.Runtime.exec(Unknown Source)

And a correction from my previous email: the errors started 10 -seconds-
after load testing started. This was about 40 minutes after Solr started,
and fewer than 30 queries had been run on the server before load testing
started. Load testing has been fine since I restarted Solr and rebuilt the
spellcheck indexes with the lowered mergeFactor.

Doug

Otis Gospodnetic wrote:
> [...]
Re: Help with XmlPullParserException
I just looked at this again, and I think the problem is that the message is
complaining about the garbage string "2a...": it is being parsed as a
decimal numeric character reference, but it contains the letter 'a', which
is only a valid hex digit, not a decimal one. I'll have to go back to my
OCR cleanup routine...

Thanks for reading.

Phil

Phillip Farber wrote:
> [...]
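For anyone hitting the same OCR problem: the parser is choking on something
shaped like a decimal character reference that contains a non-digit. A
sketch of one cleanup rule that strips such malformed references before
posting - an assumption about the input, not Phil's actual routine, and
real OCR cleanup will likely need more rules than this:

    import java.util.regex.Pattern;

    public class OcrXmlCleaner {
        // Strips numeric character references that are neither valid
        // decimal ("&#8217;") nor valid hex ("&#x2a;") forms, e.g. a
        // decimal reference with a letter in it, which is what the
        // MXParser error above complains about.
        private static final Pattern BAD_NUMERIC_REF =
                Pattern.compile("&#(?!x[0-9a-fA-F]+;|\\d+;)[^;]{1,10};");

        public static String clean(String s) {
            return BAD_NUMERIC_REF.matcher(s).replaceAll("");
        }

        public static void main(String[] args) {
            System.out.println(clean("aajj2 &#21a4; ok &#8217;"));
            // -> "aajj2  ok &#8217;" (bad ref removed, valid one kept)
        }
    }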
Re: Brazilian Portuguese synonyms
Synonyms support? Actually, we just have a big list of Portuguese synonyms;
I was talking about a Portuguese stemmer. Interested?

Anything, just mail me @ [EMAIL PROTECTED]

[]s,

Lucas

Rogerio Pereira wrote:
> Hi guys!
> Lucas, I would like to know more about your work on support for
> Brazilian Portuguese synonyms in Solr.
> Thanks for any help.
Re: Wildcard search + case insensitive
Hmm. I'd like the ability to turn case sensitivity on or off in the
config... I'm looking forward to this patch.

Thanks!

Matthew Runo
Software Developer
Zappos.com
702.943.7833

On Apr 2, 2008, at 5:48 AM, Tim Mahy wrote:

> I already found the answer to my question on the following blog:
> http://michaelkimsal.com/blog/2007/04/solr-case-sensitivty/
> [...]
Re: Brazilian Portuguese synonyms
Yes!

2008/4/2, Lucas F. A. Teixeira <[EMAIL PROTECTED]>:
> Synonyms support?
>
> Actually, we just have a big list of Portuguese synonyms;
> I was talking about a Portuguese stemmer. Interested?
>
> Anything, just mail me @ [EMAIL PROTECTED]
>
> []s,
>
> Lucas

--
Yours truly (Atenciosamente),
Rogério (_rogerio_)
http://faces.eti.br
numDocs and maxDoc
Hi,

I am trying to update the index by posting in 2 stages: part of each entry
is posted in stage 1 by 1.xml, then after a while the rest of the entry is
posted by 2.xml. Both 1.xml and 2.xml have 3 documents, and id is used as
the unique field. What I see in the admin panel confuses me:

numDocs : 3
maxDoc : 6

Which number is the count of documents that exist in the system? Is maxDoc
just a stat, not involved in any calculating process? If maxDoc is the true
number of documents in the system, is the optimization tool the only way to
compress the index?

Thank you,
Vinci
Re: Indexing a word in url
I also couldn't get the exact results I wanted for indexing URL components
using WordDelimiterFilter or PatternTokenizer, so I resorted to adding a
new field ('pathparts'), plus a few lines of code to generate the tokens in
our content preprocessor which submits documents to Solr for indexing.

-Simon

On Tue, Apr 1, 2008 at 7:24 PM, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>
> : Actually I want to use anything that is not a letter or digit as the
> : separator - anything between them will be a word (so that I can use
> : the URL fragment to see what is indexed about this site)... any
> : suggestion?
>
> In addition to Mike's suggestion of trying out the WordDelimiterFilter,
> take a look at the PatternTokenizerFactory.
>
> -Hoss
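For illustration, a schema.xml sketch of the PatternTokenizerFactory route:
every run of non-alphanumeric characters becomes a separator, so each URL
fragment is indexed as its own term. The type name is illustrative:

    <fieldType name="url_parts" class="solr.TextField"
               positionIncrementGap="100">
      <analyzer>
        <!-- split on anything that is not a letter or digit -->
        <tokenizer class="solr.PatternTokenizerFactory"
                   pattern="[^A-Za-z0-9]+"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

With this, "http://example.com/foo/bar-baz" would tokenize to http,
example, com, foo, bar, baz.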
Re: numDocs and maxDoc
On 2-Apr-08, at 11:29 AM, Vinci wrote:

> numDocs : 3
> maxDoc : 6
>
> Which number is the count of documents that exist in the system?
> [...]

When you add a document that has the same unique id as a document currently
in the index, the previous document is marked as "deleted" and the new one
is added. This results in 6 documents physically on disk (BUT when
searching you will never see the deleted docs).

Deleted documents are purged during segment merging, which occurs for the
whole index during optimization and happens naturally as you add more
documents to the system without optimization. Normally it isn't something
to worry about.

-Mike
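If you do want to force the purge, an explicit optimize does it: post the
standard optimize message to the update handler and numDocs and maxDoc
converge again. The URL assumes the example server:

    <optimize/>

e.g. POSTed to http://localhost:8983/solr/update with
Content-type: text/xml.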
Re: Wildcard search + case insensitive
: Hmm. I'd like the ability to turn case sensitivity on or off in the
: config... I'm looking forward to this patch.

FYI: here's the relevant issue...

http://issues.apache.org/jira/browse/SOLR-218

NOTE: no one has ever contributed any patches to address this problem
(although Yonik did flesh out a POC patch for an alternate "DWIM" approach
in SOLR-219).

-Hoss
Re: numDocs and maxDoc
: I am trying to update the index by posting in 2 stages: part of each
: entry is posted in stage 1 by 1.xml, then after a while the rest of the
: entry is posted by 2.xml. Both 1.xml and 2.xml have 3 documents, and id
: is used as the unique field; what I see in the admin panel [...]

My gut tells me that what you mean by this is that you want to index fields
A and B for documents 1, 2, and 3; and then later you want to provide
values for additional fields C and D for the same documents (1, 2 and 3).

"Updating" documents is not currently supported in Solr. There has been
lots of discussion about it in the past, and some patches exist in Jira
that approach the problem, but it's a lot harder than it seems like it
should be because of the way Lucene works - essentially Solr under the
covers would have to do the exact same thing you currently have to do: keep
a record of all the fields for all the documents, and reindex the *whole*
document once you have them.

: me feels confusing:
: numDocs : 3
: maxDoc : 6

numDocs is the number of unique "live" documents in the index; it's how
many docs you would get back from a query for *:*. maxDoc is the maximum
internal document id currently in use. The difference between those numbers
gives you an idea of how many "deleted" (or replaced) documents are
currently still in the index ... they gradually get cleaned up as segments
get merged or when the index gets optimized.

-Hoss
RE: Search exact terms
This is confusing advice to a beginner. A string field will not find a word
in the middle of a sentence.

To get normal searches without this confusion, copy the 'text' type and
make a variant without the stemmer. The problem is that you are using an
English-language stemmer for what appears to be Dutch. There is a Dutch
stemmer; it might be better for your needs if the content is all Dutch.

To make an exact-search field which still has helpful searching properties,
make another variant of 'text' that breaks up words but does not stem. You
might also want to add the ISOLatin1 filter, which maps all European
characters to US-ASCII equivalents. This is also very helpful for
multi-language searching.

Lance

-----Original Message-----
From: Ryan McKinley [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, April 02, 2008 7:06 AM
To: solr-user@lucene.apache.org
Subject: Re: Search exact terms

> [...]
dataimport handler multiple databases
Hi, I have a situation where I am using the dataimport handler with a
development db, and I am going to use it with a production database in the
production environment.

I have an entry in solrconfig.xml for the DataImportHandler with these
settings:

    /home/username/data-config.xml
    com.mysql.jdbc.Driver
    jdbc:mysql://localhost/dbname
    db_username
    db_password

I understand I can add another datasource called datasource-2, but how can
I use this datasource to index data?

Currently I am calling something like /dataimport?command=full-import or
/dataimport?command=delta-import. How can I define a particular db to be
used, so it indexes the dev db on the development machine and the prod db
in the production environment?

thanks
searching like RDBMS way
This is a very general requirement and I am sure somebody might have
thought about the solution.

Sample scenario to explain my question
---
There is a many-to-many relationship between 2 entities - Sales Person and
Client. One sales person can work for many clients; one client may be
served by many sales persons.

I will have 3 separate index storages:
1. Only for sales persons
2. ID combinations for sales persons and clients (the many-to-many list)
3. Only for clients

Query requirement -> Get all the clients for a given sales person. For this
I need to hook into indexes 2 and 3 to get the full result.

One immediate solution would be: make a first query to get client ids from
the 2nd index, and then make another query using those client ids to pull
client detail information from the 3rd index. But I cannot make 2 separate
search calls: since there could be thousands of clients for a sales person,
the second query results in a maxClauseCount error. I know how to increase
it, but that is not a good solution.

Thanks
Sunil
Solr and OpenPipe
Hi!

Has somebody been working with Solr and OpenPipe?

--
Yours truly (Atenciosamente),
Rogério (_rogerio_)
http://faces.eti.br
Re: problem with ShowFileRequestHandler
Thanks Ryan

2008/4/2, Ryan McKinley <[EMAIL PROTECTED]>:
> I just changed this to use the same ContentStream code we use for posting
> files -- so it should now respect the "contentType" param.
>
> You should be able to see things properly with:
>     ?file=xxx&contentType=UTF-8
>
> ryan
Re: searching like RDBMS way
On Wed, 2 Apr 2008 15:31:43 -0500 [EMAIL PROTECTED] wrote:

> This is a very general requirement and I am sure somebody might have
> thought about the solution.

Hi Sunil,

- please don't hijack the thread :)
- why don't you use the right tool for the problem? From what you said, an
RDBMS sounds like what you need.

B
_
{Beto|Norberto|Numard} Meijome

Sysadmins can't be sued for malpractice, but surgeons don't have to deal
with patients who install new versions of their own innards.

I speak for myself, not my employer. Contents may be hot. Slippery when
wet. Reading disclaimers makes you go blind. Writing them is worse. You
have been Warned.
Re: dataimport handler multiple databases
Each entity has an optional attribute called dataSource. If you have
multiple dataSources, give them a name and use that name as the entity's
dataSource attribute. So your config must define both sources (datasource-1
and datasource-2, each with its own driver and JDBC url), and each entity's
dataSource attribute then refers to the one it should read from - see the
sketch below.

But as I see it, you have a usecase where prod and qa use different dbs; so
between prod and qa you can simply change the config.

--Noble

On undefined, Ismail Siddiqui <[EMAIL PROTECTED]> wrote:
> Hi, I have a situation where I am using the dataimport handler with a
> development db, and I am going to use it with a production database in
> the production environment.
> [...]

--
--Noble Paul
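For illustration, a data-config.xml sketch of two named dataSources with an
entity picking one by name - hosts, credentials, and the query are
illustrative, and the source definitions can equivalently live in the
handler's defaults in solrconfig.xml as Noble describes:

    <dataConfig>
      <dataSource name="datasource-1" driver="com.mysql.jdbc.Driver"
                  url="jdbc:mysql://devhost/devdb"
                  user="db_username" password="db_password"/>
      <dataSource name="datasource-2" driver="com.mysql.jdbc.Driver"
                  url="jdbc:mysql://prodhost/proddb"
                  user="db_username" password="db_password"/>

      <document>
        <!-- each entity names the dataSource it reads from -->
        <entity name="item" dataSource="datasource-1"
                query="select id, name from item"/>
      </document>
    </dataConfig>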
Re: numDocs and maxDoc
Hi,

Thanks hossman, this is exactly what I want to do. Final question: so I
need to merge the fields by myself first? (Actually my original plan is to
do 2 consecutive postings, so merging is possible.)

Thank you,
Vinci


hossman wrote:
> [...]
Re: Multiple unique field?
Hi,

Thank you for your reply. When I set 2 unique key fields, it looks like
Solr only accepts the first definition in schema.xml... question: so once
the uniqueKey is defined, it can't be overridden?

Thank you,
Vinci


ryantxu wrote:
>> Thank you for your reply.
>> In other words, can I set 2 unique key fields?
>
> Directly in Solr: no.
>
> In your own code, yes -- either in the client or in a custom plugin.
>
> ryan