Problems with long named parameters with update extract handler
org.apache.commons.fileupload.MultipartStream$MalformedStreamException: Header section has more than 10240 bytes (maybe it is not properly terminated)
    at org.apache.commons.fileupload.MultipartStream.readHeaders(MultipartStream.java:544)
    at org.apache.commons.fileupload.FileUploadBase$FileItemIteratorImpl.findNextItem(FileUploadBase.java:1038)
    at org.apache.commons.fileupload.FileUploadBase$FileItemIteratorImpl.hasNext(FileUploadBase.java:1106)
    at org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:339)
    ... 42 more

Below a certain number of characters in the parameter names, the update request completes correctly, but I was not able to determine the exact threshold. I could not find any advice or documented limitation in the Solr documentation, so I am wondering whether you consider this a bug. Otherwise, can you tell me what the exact limit is so I can prevent this error from happening, and can this limit be changed via configuration? Regards, Julien
Re: collection API timeout
I forgot to mention that we are using Solr 4.9.0 and ZooKeeper 3.4.6. Thanks, Julien

On 04/11/2015 11:37, Julien DAVID - Decalog wrote:
> Hi all, We have a production environment composed of 6 SolrCloud servers and 3 ZooKeeper nodes. [...]
> Thanks for your help -- Julien
Re: collection API timeout
Seems I'll need to upgrade to 5.3.1. Is it possible to upgrade directly from 4.9 to 5.3, or do I need to deploy all the intermediate versions? Thanks
My solr server finishes itself
Hello, I'm facing a strange problem: my Solr server stops itself randomly, with the message: Graceful shutdown SocketConnector@0.0.0.0:8983. You will find my solr.log attached. I don't understand why: there is no crontab running, and there is nothing in my log explaining why Solr shut itself down. Any help would be very helpful. Thanks in advance
Can't post comment on Confluence pages under "Apache Solr Reference Guide"
Hi, I would like to post a comment about the problem below on the Solr Confluence documentation, but comments are disabled right now for confluence-users (at least at the time I'm writing this; it was confirmed on IRC a minute ago). The page I would like to comment on is: https://cwiki.apache.org/confluence/display/solr/Result+Grouping. It seems to me that there is a minor mistake in the following sentence: "Grouped faceting only supports facet.field for string based fields that are not tokenized and are not multivalued." The point is: grouped faceting DOES support multivalued fields. Indeed, as can be read in the "request parameter" table on the same page: "Grouped faceting supports single and multivalued fields". I did many tests today that confirm that multivalued fields are supported for grouped faceting. If someone can confirm this and has the rights to modify the documentation (or to post a comment), it would be great. Many thanks in advance. -- Julien Canquelain
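To make the claim concrete, here is a minimal SolrJ sketch of grouped faceting over a multivalued field. The core URL and the field names (category as the single-valued grouping field, tags as the multivalued facet field) are illustrative assumptions, and the Builder API is from recent SolrJ versions:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class GroupedFacetCheck {
        public static void main(String[] args) throws Exception {
            try (HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
                SolrQuery q = new SolrQuery("*:*");
                q.set("group", true);
                q.set("group.field", "category");  // single-valued string field to group on
                q.set("group.facet", true);        // compute facet counts per group
                q.setFacet(true);
                q.addFacetField("tags");           // multivalued string field
                QueryResponse rsp = solr.query(q);
                // grouped counts come back for the multivalued field as well
                System.out.println(rsp.getFacetField("tags").getValues());
            }
        }
    }

A test along these lines is what confirms the "request parameter" table rather than the sentence quoted above.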
Issue with Solr and copyFields
Hello, I am running into an issue with Solr and copyFields (and probably other things too). Here is a description of the issue; if anyone could help, it would be greatly appreciated, as I have searched everywhere and am not able to figure out what's happening. My Solr is configured as follows. DIH request: SELECT d.label, d.description, r.name, r.id, r.phone FROM tab_r r INNER JOIN tab_d d ON d.rid = r.rid schema.xml If I leave everything as is, all is working fine. BUT, if I add the following lines to the schema.xml, then the import command gives: 135860 Is there anything I'm doing wrong? Thanks
pruning search result with search score gradient
Hi everyone, I would like to be able to prune my search results by removing the less relevant documents. I'm thinking about using the search score: I take the search scores of the document set (I assume they are sorted in descending order), normalise them (0 would be the lowest value and 1 the greatest value) and then calculate the gradient of the normalised scores. The documents with a gradient below a threshold value would be rejected. If the scores are linearly decreasing, then no document is rejected. However, if there is a brutal score drop, then the documents below the drop are rejected. The threshold value would still have to be tuned, but I believe it would make a much stronger metric than an absolute search score. What do you think about this approach? Do you see any problem with it? Are there any Solr tools that could help me deal with that? Thanks for your answer. Julien
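To make the idea concrete, here is a client-side sketch of the gradient pruning described above. Solr itself has no built-in support for this, so the assumption is that you request scores (fl=*,score) and post-process the sorted list yourself; the threshold used in main() is an arbitrary value for illustration:

    import java.util.Arrays;

    public class ScoreGradientPruner {

        /** Returns how many of the (descending) scores to keep. */
        public static int keepCount(float[] scores, double gradientThreshold) {
            int n = scores.length;
            if (n < 2) return n;
            // normalise so the top score maps to 1 and the bottom to 0
            double range = Math.max(scores[0] - scores[n - 1], 1e-9); // avoid /0
            for (int i = 1; i < n; i++) {
                // gradient of the normalised curve; negative, since scores descend
                double gradient = (scores[i] - scores[i - 1]) / range;
                if (gradient < gradientThreshold) {
                    return i; // brutal drop: reject everything from i onward
                }
            }
            return n; // roughly linear decrease: keep all documents
        }

        public static void main(String[] args) {
            float[] scores = {9.1f, 8.8f, 8.6f, 3.1f, 2.9f}; // drop after the 3rd doc
            int keep = keepCount(scores, -0.3);
            System.out.println("keep " + keep + " docs: "
                    + Arrays.toString(Arrays.copyOf(scores, keep)));
        }
    }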
Specify increment gap with PatternTokenizerFactory
Hi, Is there a way to specify an increment gap between tokens with the PatternTokenizerFactory or do I need to customise it? For instance if I split on commas in "*Books, Online Shopping, Book Store*" I want to be able to put a 100 position gap between say "books" and "online shopping". There is of course the positionIncrementGap at the field type level but that won't help. Am currently using v1.3 Thanks Julien -- DigitalPebble Ltd http://www.digitalpebble.com
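One way to customise it: wrap the tokenizer output in a small TokenFilter that raises the position increment of every token after the first. The sketch below uses the attribute-based TokenStream API of later Lucene releases rather than the 1.3 API in use above (under 1.3 the same idea would be expressed with Token.setPositionIncrement() inside next()); the class name is made up:

    import java.io.IOException;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

    /** Inserts a fixed position gap between consecutive tokens. */
    public final class PositionGapFilter extends TokenFilter {
        private final PositionIncrementAttribute posIncr =
                addAttribute(PositionIncrementAttribute.class);
        private final int gap;
        private boolean first = true;

        public PositionGapFilter(TokenStream input, int gap) {
            super(input);
            this.gap = gap;
        }

        @Override
        public boolean incrementToken() throws IOException {
            if (!input.incrementToken()) return false;
            if (first) {
                first = false;                      // first token keeps its position
            } else {
                posIncr.setPositionIncrement(gap);  // e.g. 100 between "books" and "online shopping"
            }
            return true;
        }

        @Override
        public void reset() throws IOException {
            super.reset();
            first = true;
        }
    }

A matching TokenFilterFactory would then be placed right after the PatternTokenizerFactory in the field type's analyzer chain.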
Re: boost function on field of type double
Have tried using * * for the field domain_score but still get:

0.0 = (MATCH) FunctionQuery(sfloat(domain_score)), product of:
  0.0 = sfloat(domain_score)=0.0
  1.0 = boost
  0.07387746 = queryNorm

Thanks! Julien

2008/6/20 Julien Nioche <[EMAIL PROTECTED]>:
> Hi guys, I am using SOLR 2.2. I am trying to boost documents using the domain_score field [...]
> Any clues? Thanks, Julien

-- DigitalPebble Ltd http://www.digitalpebble.com
Re: boost function on field of type double
Hi Grant, Thanks for your help. I've just found the explanation for my problem: the fields need to be indexed in order to be used in a bf, which was even stated clearly in the documentation ;-) Hopefully someone will make the same mistake at some point and find this. I'm now using the SVN trunk version. I wanted to give SOLR-572 a try and I really like it! Cool stuff - thanks to the contributors. Julien

2008/6/20 Grant Ingersoll <[EMAIL PROTECTED]>:
> Hey Julien,
>
> What's your actual query look like? The original, the parsed, etc. (I think a bunch of the variations get output when using debugQuery=true)
>
> There is a DoubleFieldSource in the trunk as of SOLR-324, so that probably explains why you are seeing the FloatFieldSource (as I recall, the DoubleField actually used the FloatFieldSource).
>
> However, I am not sure why it is not working otherwise. Perhaps you could try trunk and see what happens?
>
> -Grant
>
> On Jun 20, 2008, at 7:44 AM, Julien Nioche wrote:
>> Hi guys, I am using SOLR 2.2. I am trying to boost documents using the domain_score field [...]
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ

-- DigitalPebble Ltd http://www.digitalpebble.com
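For anyone landing here later, a sketch of the working setup in SolrJ. It is written against a recent SolrJ API rather than the 2008-era one; the core URL and query text are placeholders, and domain_score is assumed to be declared with indexed="true" per the fix above:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;

    public class BoostFunctionDemo {
        public static void main(String[] args) throws Exception {
            try (HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
                SolrQuery q = new SolrQuery("video");
                q.set("defType", "dismax");
                q.set("bf", "domain_score");   // only works if the field is indexed
                q.set("debugQuery", "true");   // shows the FunctionQuery contribution
                System.out.println(solr.query(q).getResults().getNumFound());
            }
        }
    }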
Re: Entity extraction?
Hi, Open Source NLP platforms like GATE (http://gate.ac.uk) or Apache UIMA are typically used for these types of tasks. GATE in particular comes with an application called ANNIE which does Named Entity Recognition. OpenCalais does that as well and should be easy to embed, but it can't be tuned to do more specific things, unlike UIMA or GATE based applications. Depending on the architecture you have in mind, it could be worth investigating Nutch and adding the NER as a custom plugin; NLP being often a CPU-intensive task, you could leverage the scalability of Hadoop in Nutch. There is a patch which allows delegating the indexing to SOLR. As someone else already said, these named entities could then be used as facets. HTH Julien -- DigitalPebble Ltd http://www.digitalpebble.com

2008/10/24 Rogerio Pereira <[EMAIL PROTECTED]>
> I agree Ryan, and I would like to see a complete integration between solr, nutch, tika and mahout in the future.
>
> 2008/10/24 Ryan McKinley <[EMAIL PROTECTED]>
>> This is not something solr does currently... It sounds like something that should be added to Mahout: http://lucene.apache.org/mahout/
>>
>> On Oct 24, 2008, at 4:18 PM, Charlie Jackson wrote:
>>> During a recent sales pitch to my company by FAST, they mentioned entity extraction. I'd never heard of it before, but they described it as basically recognizing people/places/things in documents being indexed and then being able to do faceting on this data at query time. Does anything like this already exist in SOLR? If not, I'm not opposed to developing it myself, but I could use some pointers on where to start.
>>>
>>> Thanks,
>>> - Charlie
>
> --
> Regards,
> Rogério (_rogerio_)
> [Blog: http://faces.eti.br] [Sandbox: http://bmobile.dyndns.org] [Twitter: http://twitter.com/ararog]
>
> "Make a difference! Help your country grow; don't hold back knowledge, share it and learn more."
> (http://faces.eti.br/2006/10/30/conhecimento-e-amadurecimento)
boost function on field of type double
Hi guys, I am using SOLR 2.2. I am trying to boost documents such as this one:

9.600311 1.8212872 content d340da6d1483f028110b0ffc2402c417 14730 http://www.bebo.com/Video.jsp 20080529185637 www.bebo.com Video 20080529195525711 http://www.bebo.com/Video.jsp 6

using the domain_score field. The field type double is defined in my schema as: <fieldType name="double" class="solr.DoubleField" omitNorms="true"/>

I added bf=domain_score to my query but can't see any change in the way the documents are sorted. The debug info shows that:

0.0 = (MATCH) FunctionQuery(org.apache.solr.search.function.FloatFieldSource:float(domain_score)), product of:
  0.0 = float(domain_score)=0.0
  1.0 = boost
  0.1766438 = queryNorm

Does the field have to be of type float to be used in a Function Query? I tried using bf=ord(domain_score) but to no avail. Any clues? Thanks Julien -- DigitalPebble Ltd http://www.digitalpebble.com
Unified highlighter
Hi Solr community, I would like some help with a strange behavior that I observe with the unified highlighter. Here is the configuration of my highlighter:

on unified false <span class="em"> </span> content_fr content_en exactContent true CHARACTER html 200 51200

I indexed some HTML documents from the www.datafari.com website. The problem is that on some documents (not all), there is not enough "context" wrapping the found search terms. For example, by searching "France labs", here is the highlighting obtained for a certain document:

"content_en":["France class=\"em\">Labs"]

Now, if I perform the same query but with hl.bs.type set to SENTENCE instead of CHARACTER, I obtain the following highlighting for the same document:

"content_en":["Trusted by About Contact Home Migrating GSA © 2018 Datafari by class=\"em\">France Labs"]

This is way better, but I strongly prefer using the WORD or CHARACTER types because the highlighting can be too big with the SENTENCE or LINE types, depending on the indexed documents. I tried changing hl.bs.type to WORD and increasing hl.fragsize up to 1000, but with any hl.bs.type other than SENTENCE or LINE, the highlighting is limited to the found words only, which is not enough for what I need. Is there something I am missing in the configuration? For info, I am using Solr 6.6.4. Thanks for your help. Julien
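For reference, the request that produced the better output above, expressed as a SolrJ sketch; the core URL is a placeholder, and the hl.* names are the standard unified highlighter request parameters:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class UnifiedHighlightDemo {
        public static void main(String[] args) throws Exception {
            try (HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
                SolrQuery q = new SolrQuery("France labs");
                q.setHighlight(true);
                q.set("hl.method", "unified");
                q.set("hl.fl", "content_en,content_fr");
                q.set("hl.bs.type", "SENTENCE");   // the setting that restores context
                q.set("hl.fragsize", "200");
                QueryResponse rsp = solr.query(q);
                System.out.println(rsp.getHighlighting());
            }
        }
    }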
Error 500 with update extract handler on Solr 7.4.0
org.apache.commons.fileupload.MultipartStream$MalformedStreamException: Header section has more than 10240 bytes (maybe it is not properly terminated)
    at org.apache.commons.fileupload.MultipartStream.readHeaders(MultipartStream.java:544)
    at org.apache.commons.fileupload.FileUploadBase$FileItemIteratorImpl.findNextItem(FileUploadBase.java:1038)
    at org.apache.commons.fileupload.FileUploadBase$FileItemIteratorImpl.hasNext(FileUploadBase.java:1106)
    at org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:339)
    ... 42 more

Below a certain number of characters in the parameter names, the update request completes correctly, but I was not able to determine the exact threshold. I could not find any advice or documented limitation in the Solr documentation, so I am wondering whether you consider this a bug. Otherwise, can you tell me what the exact limit is so I can prevent this error from happening, and can this limit be changed via configuration? Regards, Julien
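For what it's worth, the 10240-byte cap is the header-buffer constant in commons-fileupload's MultipartStream, which bounds the header section of each multipart part (a part's field name travels in its Content-Disposition header); it appears to be hard-coded in that library rather than exposed through Solr configuration. A possible workaround, sketched below with SolrJ (core URL, file, and field name are placeholders): with a single content stream, SolrJ typically sends the request parameters on the URL rather than as multipart form fields, which sidesteps the per-part header limit — worth verifying against your Solr version:

    import java.io.File;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

    public class ExtractWithLongParams {
        public static void main(String[] args) throws Exception {
            try (HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
                ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
                req.addFile(new File("doc.pdf"), "application/pdf");
                req.setParam("literal.id", "doc-1");
                // a deliberately long literal field name, as in the report above
                req.setParam("literal.some_very_long_field_name", "value");
                req.setParam("commit", "true");
                solr.request(req); // single stream: params travel on the URL, not as parts
            }
        }
    }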
collection API timeout
Hi all, We have a production environment composed of 6 SolrCloud servers and 3 ZooKeeper nodes. We've got around 30 collections, with 6 shards each. We recently moved from 3 Solr servers to 6, splitting the shards (3 to 6). As the last weeks were a low-traffic period, we didn't notice any problem. But since Monday, the collections API calls systematically time out. We use calls to CLUSTERSTATUS, but LIST or OVERSEERSTATUS give the same results, whatever the node. We don't have any problem on the qualification environment, which is identical except for the load. The error message is:

org.apache.solr.common.SolrException: CLUSTERSTATUS the collection time out:180s
    at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:368)
    at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:320)
    at org.apache.solr.handler.admin.CollectionsHandler.handleClusterStatus(CollectionsHandler.java:639)
    at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:220)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:729)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:267)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:368)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Thread.java:745)

Thanks for your help -- Julien
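For reference, a SolrJ sketch of the call that times out. In Solr 4.9, CLUSTERSTATUS and the other collections API actions are processed through the Overseer queue, so a backlogged Overseer tends to time all of them out at the 180s mark, whatever the node you ask. The sketch uses a recent SolrJ client class (4.9-era code would use HttpSolrServer instead), and the base URL is a placeholder:

    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.QueryRequest;
    import org.apache.solr.common.params.ModifiableSolrParams;
    import org.apache.solr.common.util.NamedList;

    public class ClusterStatusCheck {
        public static void main(String[] args) throws Exception {
            try (HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
                ModifiableSolrParams params = new ModifiableSolrParams();
                params.set("action", "CLUSTERSTATUS");
                QueryRequest req = new QueryRequest(params);
                req.setPath("/admin/collections");  // collections API endpoint
                NamedList<Object> rsp = solr.request(req);
                System.out.println(rsp.get("cluster"));
            }
        }
    }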