Stemming and other tokenizers
Hello,

I want to implement some kind of auto-stemming that will detect the language of a field based on a tag at the start of that field, like #en#. The field is stored on disk, but I don't want this tag to be stored. Is there a way to keep the tag out of the stored value?

As I understand it, filters and tokenizers act only on the indexed value, not the stored one. Am I wrong?

Would it be possible to write such a filter?

Patrick.
Re: Master Slave Question
Real-time indexing (Solr 4), or decrease the replication poll interval and the autocommit time.

2011/9/10 Jamie Johnson
> Is it appropriate to query the master servers when replicating? I ask
> because there could be a case where we index, say, 50 documents to the
> master, they have not yet been replicated, and a user asks for page 2;
> the request could be sent to a slave and return 0 results. Is there a
> way to avoid this? My thought was to not allow querying of the master,
> but I'm not sure that this can be configured in Solr.
Re: Stemming and other tokenizers
I can't create one field per language, that is the problem, but I'll dig into it following your indications. I'll let you know what I come up with. Patrick.

2011/9/11 Jan Høydahl
> Hi,
>
> You'll not be able to detect the language and change the stemmer on the same field
> in one go. You need to create one fieldType in your schema per language you
> want to use, and then use LanguageIdentification (SOLR-1979) to do the magic
> of detecting the language and renaming the field. If you set
> langid.override=false and langid.map=true and populate your "language" field
> with the known language, you will probably get the desired effect.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> On 10. sep. 2011, at 03:24, Patrick Sauts wrote:
>
>> Hello,
>>
>> I want to implement some kind of auto-stemming that will detect the language
>> of a field based on a tag at the start of that field, like #en#. The field is
>> stored on disk, but I don't want this tag to be stored. Is there a way to
>> keep the tag out of the stored value?
>>
>> As I understand it, filters and tokenizers act only on the indexed value,
>> not the stored one.
>>
>> Am I wrong?
>>
>> Would it be possible to write such a filter?
>>
>> Patrick.
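A minimal SolrJ sketch of a purely client-side variant of what Jan describes: strip the leading #xx# tag before indexing (so it is never stored) and route the text to a per-language field. The field names text_en/text_fr, the "language" field and the URL are assumptions, not anything defined in this thread.

    import java.io.IOException;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class TaggedFieldIndexer {
        // Matches a leading language tag such as "#en# " or "#fr# ".
        private static final Pattern LANG_TAG = Pattern.compile("^#([a-z]{2})#\\s*");

        public static void index(SolrServer server, String id, String taggedText)
                throws SolrServerException, IOException {
            String lang = "en";            // default when no tag is present (assumption)
            String text = taggedText;
            Matcher m = LANG_TAG.matcher(taggedText);
            if (m.find()) {
                lang = m.group(1);                     // e.g. "en" or "fr"
                text = taggedText.substring(m.end());  // the tag is dropped, so it is never stored
            }
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", id);
            doc.addField("language", lang);     // the known language, as Jan suggests
            doc.addField("text_" + lang, text); // one field per language, each with its own stemmer
            server.add(doc);
        }

        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
            index(server, "42", "#fr# mon texte à indexer");
            server.commit();
        }
    }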
RE: Weird behaviors with not operators.
Maybe this will answer your question: http://wiki.apache.org/solr/FAQ

Why does 'foo AND -baz' match docs, but 'foo AND (-bar)' doesn't?

Boolean queries must have at least one "positive" expression (ie: MUST or SHOULD) in order to match. Solr tries to help with this: if asked to execute a BooleanQuery that contains only negated clauses _at the topmost level_, it adds a match-all-docs query (ie: *:*).

If the top-level BooleanQuery contains somewhere inside of it a nested BooleanQuery which contains only negated clauses, that nested query will not be modified, and it (by definition) can't match any documents -- if it is required, that means the outer query will not match either.

More detail:
* https://issues.apache.org/jira/browse/SOLR-80
* https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201006.mbox/%3Calpine.deb.1.10.1006011609080.29...@radix.cryptio.net%3E

Patrick.

-----Original Message-----
From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sent: Monday, September 12, 2011 3:04 PM
To: solr-user@lucene.apache.org
Subject: Re: Weird behaviors with not operators.

: I'm crashing into a weird behavior with - operators.

I went ahead and added a FAQ on this using some text from a previous nearly identical email ...

https://wiki.apache.org/solr/FAQ#Why_does_.27foo_AND_-baz.27_match_docs.2C_but_.27foo_AND_.28-bar.29.27_doesn.27t_.3F

please reply if you have followup questions.

-Hoss
RE: Weird behaviors with not operators.
I mean it's a known bug. Hostetter AND (-chris *:*) should do the trick, or, depending on your request, NAME:(-chris *:*).

-----Original Message-----
From: Patrick Sauts [mailto:patrick.via...@gmail.com]
Sent: Monday, September 12, 2011 3:57 PM
To: solr-user@lucene.apache.org
Subject: RE: Weird behaviors with not operators.

Maybe this will answer your question: http://wiki.apache.org/solr/FAQ

Why does 'foo AND -baz' match docs, but 'foo AND (-bar)' doesn't?

[...]
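A small SolrJ sketch of the workaround discussed here, using the FAQ's placeholder terms foo and bar (the Solr URL is an assumption):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class NegatedClauseWorkaround {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

            // Matches nothing: the nested BooleanQuery "(-bar)" contains only a prohibited clause.
            SolrQuery broken = new SolrQuery("foo AND (-bar)");

            // Matches: "*:*" gives the nested clause a positive expression to subtract from.
            SolrQuery fixed = new SolrQuery("foo AND (-bar *:*)");

            System.out.println("broken: " + server.query(broken).getResults().getNumFound()); // usually 0
            System.out.println("fixed:  " + server.query(fixed).getResults().getNumFound());
        }
    }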
facet.method=fc
Is the facet.method=fc parameter still needed? Thank you. Patrick.
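In case it helps, a sketch of how the parameter can still be set explicitly from SolrJ, globally or per field; the field name "brand" and the URL are only example values:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class FacetMethodExample {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

            SolrQuery q = new SolrQuery("*:*");
            q.setFacet(true);
            q.addFacetField("brand");
            q.set("facet.method", "fc");            // field-cache based counting for all facet fields
            q.set("f.brand.facet.method", "enum");  // or override the method for a single field

            QueryResponse rsp = server.query(q);
            System.out.println(rsp.getFacetField("brand").getValues());
        }
    }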
Limitations of prohibited clauses in sub-expression - pure negative query
I can't find the answer: has this problem been solved in Solr 1.4.1? Thanks for your answers.
RE: Limitations of prohibited clauses in sub-expression - pure negative query
Maybe the SOLR-80 Jira issue? As written in the Solr 1.4 book, "a pure negative query doesn't work correctly"; you have to add 'AND *:*'. Thanks.

From: Patrick Sauts [mailto:patrick.via...@gmail.com]
Sent: Tuesday, 28 September 2010 11:53
To: 'solr-user@lucene.apache.org'
Subject: Limitations of prohibited clauses in sub-expression - pure negative query

I can't find the answer: has this problem been solved in Solr 1.4.1? Thanks for your answers.
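A naive sketch of doing that rewrite in client code before the query is sent to Solr. It is a crude string check, not a real query parser, and the field name instock is only an example:

    public class PureNegativeRewrite {

        // True when every whitespace-separated clause is prohibited (starts with '-'),
        // ignoring the AND/OR operators. Deliberately simplistic.
        static boolean looksPurelyNegative(String q) {
            for (String clause : q.trim().split("\\s+")) {
                if (!clause.startsWith("-") && !clause.equals("AND") && !clause.equals("OR")) {
                    return false; // found a positive clause, nothing to do
                }
            }
            return true;
        }

        static String rewrite(String q) {
            return looksPurelyNegative(q) ? q + " AND *:*" : q;
        }

        public static void main(String[] args) {
            System.out.println(rewrite("-instock:false"));         // -> "-instock:false AND *:*"
            System.out.println(rewrite("foo AND -instock:false")); // unchanged
        }
    }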
Re: Huge load and long response times during search
Try solr.FastLRUCache instead of solr.LRUCache; it is the new cache implementation for Solr 1.4. And maybe set true in the main index section, or lower the mergeFactor. See http://wiki.apache.org/lucene-java/ImproveSearchingSpeed

Tomasz Kępski wrote:

Hi,

I'm using Solr (1.4) to search among about 3,500,000 documents. After the server kernel was updated to 64-bit, the system has started to suffer. Our server has 8G of RAM and two Intel Core 2 Duo CPUs. We used to have average loads around 2-2.5. It was not as good as it should be, but as long as HTTP response times were acceptable we did not care too much ;-)

Since a few days the average load is usually around 6 and sometimes goes even to 20. The PHP, MySQL and PostgreSQL based application is rather fine, but when it tries to access Solr it takes ages to load a page. In top, the Java process (Jetty) takes 200-250% of CPU, and iotop shows that most of the disk operations are done by Solr threads as well. When we shut down Jetty, the load goes down to 1.5 or even less than 1.

My index is ~12G. Below is a part of my solrconfig.xml:

[solrconfig.xml excerpt with the XML tags stripped: cache settings (size 1024), newSearcher and firstSearcher warming queries, and a dismax handler with tie 0.01, qf boosts name^90.0 scategory^450.0 brand^90.0 text^0.01 description^30, fl brand,description,id,name,price,score, and mm 4<100% 5<90%]

Sample query parameters from the log look like this:

2009-11-20 21:07:15 org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/select params={spellcheck=true&wt=json&rows=20&json.nl=map&start=520&facet=true&spellcheck.collate=true&fl=id,name,description,preparation,url,shop_id&q=camera&qt=dismax&version=1.3&hl.fl=name,description,atributes,brand,url&facet.field=shop_id&facet.field=brand&hl.fragsize=200&spellcheck.count=5&hl.snippets=3&hl=true} hits=3784 status=0 QTime=83

2009-11-20 21:07:15 org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/spellCheckCompRH params={spellcheck=true&wt=json&rows=20&json.nl=map&start=520&facet=true&spellcheck.collate=true&fl=id,name,description,preparation,url,shop_id&q=camera&qt=dismax&version=1.3&hl.fl=name,description,atributes,brand,url&facet.field=shop_id&facet.field=brand&hl.fragsize=200&spellcheck.count=5&hl.snippets=3&hl=true} hits=3784 status=0 QTime=16

And at last the question ;-) How do I speed up the search? Which parameters should I check first to find out where the bottleneck is?

Sorry for the verbose entry, but I wanted to give as clear a picture as possible.

Thanks in advance,
Tom
StreamingUpdateSolrServer
Hi All,

I'm testing StreamingUpdateSolrServer for indexing, but I don't see the final "finished: org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer$Runner@..." line in my logs. Do I have to call a special method to wait until the update is effective?

Another question (maybe easy for you): I'm running Solr on a Tomcat 5.0.28, and sometimes (not during an rsync, a commit, or heavy traffic) it stops responding and the load reported by uptime is very high.

Thank you for your help. Patrick.
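If it is the background sender you are waiting for, a minimal sketch of draining the queue before committing might look like this; the URL, queue size and thread count are just example values:

    import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class StreamingIndexer {
        public static void main(String[] args) throws Exception {
            StreamingUpdateSolrServer server =
                    new StreamingUpdateSolrServer("http://localhost:8983/solr", 100, 10);

            for (int i = 0; i < 1000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "doc-" + i);
                server.add(doc);            // queued and sent asynchronously by the Runner threads
            }

            server.blockUntilFinished();    // wait until every queued update has been flushed
            server.commit();                // then make the documents searchable
        }
    }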
Invalid CRLF - StreamingUpdateSolrServer ?
I'm using Solr 1.4 on Tomcat 5.0.28, with a StreamingUpdateSolrServer client (10 threads, XML communication via the POST method). Is there a way to avoid this error (it loses data)? And is StreamingUpdateSolrServer reliable?

GRAVE: org.apache.solr.common.SolrException: Invalid CRLF
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:72)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:174)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:874)
        at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
        at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
        at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
        at java.lang.Thread.run(Thread.java:619)
Caused by: com.ctc.wstx.exc.WstxIOException: Invalid CRLF
Re: Invalid CRLF - StreamingUpdateSolrServer ?
Thank you Yonik for your answer.

The platform encoding is "fr_FR.UTF-8", so it's still UTF-8; should it be "en_US.UTF-8"?

I've also tested LBHttpSolrServer (we wanted to use it as a "backup" for HAProxy) and it appears not to be thread safe (what is also curious about it is that there's no way to manage the connection pool). If you're interested in the logs, I can send them to you.

*Will there be a Solr 1.4.1 that fixes those problems?* Because using a SNAPSHOT doesn't seem like a good idea to me.

I have another question, but I don't know if I should start a new thread: can I use "-Dmaster=disabled" in JAVA_OPTS for a server that is both a slave and a repeater?

Patrick.

Yonik Seeley wrote:

It could be this bug, fixed in trunk:

* SOLR-1595: StreamingUpdateSolrServer used the platform default character set when streaming updates, rather than using UTF-8 as the HTTP headers indicated, leading to an encoding mismatch. (hossman, yonik)

Could you try a recent nightly build (or build your own from trunk) and see if it fixes it?

-Yonik
http://www.lucidimagination.com

On Thu, Dec 31, 2009 at 5:07 AM, Patrick Sauts wrote:

I'm using Solr 1.4 on Tomcat 5.0.28, with a StreamingUpdateSolrServer client (10 threads, XML communication via the POST method). Is there a way to avoid this error (it loses data)? And is StreamingUpdateSolrServer reliable?

GRAVE: org.apache.solr.common.SolrException: Invalid CRLF
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:72)
        [...]
Caused by: com.ctc.wstx.exc.WstxIOException: Invalid CRLF
Re: Invalid CRLF - StreamingUpdateSolrServer ?
The issue was sometimes getting null results during facet navigation or a simple search; the results were back after a refresh. We tried changing the cache configuration, but saw the same behaviour.

*My implementation was:* (maybe wrong?)

    LBHttpSolrServer solrServer = new LBHttpSolrServer(new HttpClient(),
            new XMLResponseParser(), solrServerUrl.split(","));
    solrServer.setConnectionManagerTimeout(CONNECTION_TIMEOUT);
    solrServer.setConnectionTimeout(CONNECTION_TIMEOUT);
    solrServer.setSoTimeout(READ_TIMEOUT);
    solrServer.setAliveCheckInterval(CHECK_HEALTH_INTERVAL_MS);

*What I was suggesting:* as LBHttpSolrServer is a wrapper around CommonsHttpSolrServer:

    CommonsHttpSolrServer search1 = new CommonsHttpSolrServer("http://mysearch1");
    search1.setConnectionTimeout(CONNECTION_TIMEOUT);
    search1.setSoTimeout(READ_TIMEOUT);
    search1.setConnectionManagerTimeout(solr.CONNECTION_MANAGER_TIMEOUT);
    search1.setDefaultMaxConnectionsPerHost(MAX_CONNECTIONS_PER_HOST1);
    search1.setMaxTotalConnections(MAX_TOTAL_CONNECTIONS1);
    search1.setParser(new XMLResponseParser());

    CommonsHttpSolrServer search2 = new CommonsHttpSolrServer("http://mysearch2");
    search2.setConnectionTimeout(CONNECTION_TIMEOUT);
    search2.setSoTimeout(READ_TIMEOUT);
    search2.setConnectionManagerTimeout(solr.CONNECTION_MANAGER_TIMEOUT);
    search2.setDefaultMaxConnectionsPerHost(MAX_CONNECTIONS_PER_HOST1);
    search2.setMaxTotalConnections(MAX_TOTAL_CONNECTIONS1);
    search2.setParser(new XMLResponseParser());

    // the constructor I am suggesting:
    LBHttpSolrServer solrServers = new LBHttpSolrServer(search1, search2);

So we could manage the parameters per server.

Thank you for your time. Patrick.

Shalin Shekhar Mangar wrote:

On Mon, Jan 4, 2010 at 6:11 PM, Patrick Sauts wrote:

I've also tested LBHttpSolrServer (we wanted to use it as a "backup" for HAProxy) and it appears not to be thread safe (what is also curious about it is that there's no way to manage the connection pool). If you're interested in the logs, I can send them to you.

What is the issue that you are facing? What is it exactly that you want to change?
Re: Invalid CRLF - StreamingUpdateSolrServer ?
The issue was sometimes getting null results during facet navigation or a simple search; the results were back after a refresh. We tried changing the cache configuration, but saw the same behaviour.

That is strange. Just to make sure: you were using the same LBHttpSolrServer instance for all requests, weren't you?

Yes, it was a single static instance for all requests on the same core/index. We have 6 different indexes on one Tomcat. When testing locally I had no problem, but the dysfunction happened on the application servers under real traffic. That's why I think it might not be thread safe.

*My implementation was:* (maybe wrong?)

    LBHttpSolrServer solrServer = new LBHttpSolrServer(new HttpClient(),
            new XMLResponseParser(), solrServerUrl.split(","));
    solrServer.setConnectionManagerTimeout(CONNECTION_TIMEOUT);
    solrServer.setConnectionTimeout(CONNECTION_TIMEOUT);
    solrServer.setSoTimeout(READ_TIMEOUT);
    solrServer.setAliveCheckInterval(CHECK_HEALTH_INTERVAL_MS);

*What I was suggesting:* as LBHttpSolrServer is a wrapper around CommonsHttpSolrServer...

I think that is a good idea. Can you open a jira issue?

I have opened SOLR-1700. I hope it is precise enough.
readOnly=true IndexReader
On the wiki page http://wiki.apache.org/lucene-java/ImproveSearchingSpeed, I found:

"Open the IndexReader with readOnly=true. This makes a big difference when multiple threads are sharing the same reader, as it removes certain sources of thread contention."

How do I open the IndexReader with readOnly=true? I can't find anything related to this parameter.

Do the JVM parameters -Dslave=disabled or -Dmaster=disabled have any effect on Solr with a standard solrconfig.xml?

Thank you for your answers. Patrick.
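For what it's worth, that readOnly flag is part of the raw Lucene API rather than a solrconfig.xml setting; a minimal Lucene sketch (the index path is just an example) looks like this:

    import java.io.File;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class ReadOnlyReaderExample {
        public static void main(String[] args) throws Exception {
            Directory dir = FSDirectory.open(new File("/path/to/solr/data/index"));

            // The second argument is readOnly: the reader takes no write lock and
            // avoids some synchronization when shared between threads.
            IndexReader reader = IndexReader.open(dir, true);
            try {
                System.out.println("maxDoc: " + reader.maxDoc());
            } finally {
                reader.close();
                dir.close();
            }
        }
    }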
schema.xml and XInclude
As the schema.xml files are the same for all our indexes, I'd like to share the common parts via XInclude, so I tried it with xmlns:xi="http://www.w3.org/2001/XInclude" declared in the schema. My syntax might not be correct? Or is it not possible yet?

Thank you again for your time. Patrick.
Re: Invalid CRLF - StreamingUpdateSolrServer ?
I've patched the SolrJ 1.4 release (tag) with SOLR-1595; it has been online for about two weeks now and it's working just fine. Thanks a lot. Patrick.

P.S.: It's a pity there is no plan for a 1.4.1 release.

Yonik Seeley wrote:

It could be this bug, fixed in trunk:

* SOLR-1595: StreamingUpdateSolrServer used the platform default character set when streaming updates, rather than using UTF-8 as the HTTP headers indicated, leading to an encoding mismatch. (hossman, yonik)

Could you try a recent nightly build (or build your own from trunk) and see if it fixes it?

-Yonik
http://www.lucidimagination.com

On Thu, Dec 31, 2009 at 5:07 AM, Patrick Sauts wrote:

I'm using Solr 1.4 on Tomcat 5.0.28, with a StreamingUpdateSolrServer client (10 threads, XML communication via the POST method). Is there a way to avoid this error (it loses data)? And is StreamingUpdateSolrServer reliable?

GRAVE: org.apache.solr.common.SolrException: Invalid CRLF
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:72)
        [...]
Caused by: com.ctc.wstx.exc.WstxIOException: Invalid CRLF
Re: If you could have one feature in Solr...
Synchronisation between the slaves, so that they all switch to the new index at the same time after replication.

Grant Ingersoll wrote:

What would it be?