Re: newbie Q regarding schema configuration
: so.. my first question in schema.xml, can you have a composite key as
: the 'uniquekey' field, or do i need to do this on the client side?

at the moment this would need to be done client side, but you're not the first person to ask so i've added it to the TaskList ... it doesn't seem like it would be too hard.

: can you have complex types which are multivalued?
: I'd like to store something like
: a tag-name with a corresponding tag-weighting.

There's nothing like that built into Solr - the best way to model that would probably be to use the term frequency to represent the weight - you could have an analyzer that parsed input like...

"blue state"^2 "democrat"^1 "john kerry"^5

...and converted it into a stream of tokens like...

[blue state] [blue state] [democrat] [john kerry] [john kerry]...

...kind of kludgy, but that's the best mechanism Lucene has at the moment (there are plans to add more generic term attributes, but that's still at the design stage)

: can you do sum(*) type queries in lucene/solr? it is efficient ? or
: are you better having a 2nd index which has these sum(*) values in it
: and keep it up to date instead.

sums across multiple documents, or sums of values in a single document? in the latter case, you don't need a separate index, just another field. in the former case it's really a question of which sets of documents you want sums across. if it's all of them then you could just store that info in a flat file, or a special metadata document in your index. if what you want is more of a run-time calculation then you can certainly do it in a custom request handler (and you can use a SolrCache and a custom CacheRegenerator to make sure the values are cached for as long as the searcher is open, and autowarmed when a new one is opened).

Generally the best way to do math operations on sets of documents in Lucene is using the FieldCache, and this is certainly available to Solr request handlers.

-Hoss
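The term-frequency idea above can be sketched outside of Lucene. This is only an illustration of the token expansion, written in Python rather than as a real Lucene analyzer; the function name `expand_weighted_tags` and the regular expression are assumptions, and the `"phrase"^weight` syntax is taken from the example input in the message.

```python
import re

# Assumed input syntax, copied from the example in the thread: "phrase"^weight
WEIGHTED_TAG = re.compile(r'"([^"]+)"\^(\d+)')

def expand_weighted_tags(text):
    """Emit each tag once per unit of weight, so term frequency encodes the weight."""
    tokens = []
    for phrase, weight in WEIGHTED_TAG.findall(text):
        tokens.extend([phrase] * int(weight))
    return tokens

tokens = expand_weighted_tags('"blue state"^2 "democrat"^1 "john kerry"^5')
print(tokens)
```

A real implementation would live in a custom tokenizer or filter on the Java side; the sketch only shows the shape of the output stream such a component would need to produce.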
Re: Wildcard Query
: I have been using Lucene for about a month now and trying to port the same : functionality to Solr. How do I do a wildcard query with a leading "*" : ...This is possible with Lucene if you do not use the standard query : parser. How do you do this with Solr This is probably very easy but I : can not find any information in docs or mailing list. There is no easy way to change this just by modifying configuration -- you'll need to write your own request handler which uses the QueryParser of your choice. -Hoss
Re: Wildcard Query
Ok, before I go start writing a new request handler, let me ask a dumb question and see if I am approaching this wrong in Solr. If I am trying to search a field where I have one doc with a field that has a value of "Hello World"...if the search query is "ello" ...currently is there a way to make this query match this field?

> From: Chris Hostetter <[EMAIL PROTECTED]>
> Reply-To: solr-user@lucene.apache.org
> Date: Tue, 20 Jun 2006 01:09:52 -0700 (PDT)
> To: solr-user@lucene.apache.org
> Subject: Re: Wildcard Query
>
> : I have been using Lucene for about a month now and trying to port the same
> : functionality to Solr. How do I do a wildcard query with a leading "*"
> : ...This is possible with Lucene if you do not use the standard query
> : parser. How do you do this with Solr This is probably very easy but I
> : can not find any information in docs or mailing list.
>
> There is no easy way to change this just by modifying configuration --
> you'll need to write your own request handler which uses the QueryParser
> of your choice.
>
> -Hoss
Re: Wildcard Query
On Jun 20, 2006, at 6:07 AM, Pace Davis wrote:

> Ok, before I go start writing a new request handler, let me ask a dumb
> question and see if I am approaching this wrong in Solr. If I am trying to
> search a field where I have one doc with a field that has a value of "Hello
> World"...if the search query is "ello" ...currently is there a way to make
> this query match this field?

This is more a Lucene question than a Solr one. You will need to either do special analysis that would index "hello" in various pieces such as "ello", "llo"... or do as Hoss suggested and create a custom request handler that searches and returns results however you like.

I have written several custom request handlers in my application. You could do this pretty easily yourself by copying StandardRequestHandler to your own class name, modifying how it creates the Query, and configuring it in the solrconfig.xml file.

Erik
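The "index it in various pieces" suggestion can be illustrated with a few lines of Python. This is a sketch of the idea only, not a Lucene analyzer: the function name `suffix_tokens`, the lowercasing, and the `min_length` cutoff are all assumptions made for the example.

```python
def suffix_tokens(term, min_length=2):
    """Return every suffix of `term` that is at least `min_length` characters long."""
    term = term.lower()  # assumption: the field is lowercased at index time
    return [term[i:] for i in range(len(term) - min_length + 1)]

index_terms = suffix_tokens("Hello")
print(index_terms)
```

With these extra terms in the index, a plain term query for "ello" matches directly, at the cost of several extra terms per indexed word.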
Re: Wildcard Query
If it is just a matter of matching lower case to upper case and upper case to lower case, one can simply use the LowerCaseFilter.

Bill

On 6/20/06, Erik Hatcher <[EMAIL PROTECTED]> wrote:

On Jun 20, 2006, at 6:07 AM, Pace Davis wrote:

> Ok, before I go start writing a new request handler, let me ask a
> dumb question and see if I am approaching this wrong in Solr. If I am
> trying to search a field where I have one doc with a field that has a value
> of "Hello World"...if the search query is "ello" ...currently is there a way
> to make this query match this field?

This is more a Lucene question than a Solr one. You will need to either do special analysis that would index "hello" in various pieces such as "ello", "llo"... or do as Hoss suggested and create a custom request handler that searches and returns results however you like.

I have written several custom request handlers in my application. You could do this pretty easily yourself by copying StandardRequestHandler to your own class name, modifying how it creates the Query, and configuring it in the solrconfig.xml file.

Erik
Re: Wildcard Query
On 6/20/06, Pace Davis <[EMAIL PROTECTED]> wrote:

> I have been using Lucene for about a month now and trying to port the same
> functionality to Solr. How do I do a wildcard query with a leading "*"
> ...This is possible with Lucene if you do not use the standard query
> parser.

It's not really possible to do efficiently with Lucene out-of-the-box either. Terms are sorted, so foo* is a relatively quick query, but *foo is horribly slow since all terms must be scanned.

You can do things like what Erik suggests... index all the variants: "Hello" "ello" "llo", etc.

Another more limited form that would take up less index space would be to index the reverse of the token as well: Index="olleH", query="olle*"

We don't yet have an analyzer to do this (and neither does Lucene AFAIK). As Chris points out, in addition to analysis components, the QueryParser would need to be changed as well.

I've thought about hooking in the QueryParser to the FieldTypes more before... One reason is to know if something like a prefix query should be lowercased or not. Another reason could be to handle special construction of wildcard queries when there is support for "*foo".

-Yonik
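The reversed-token trick can be sketched with a sorted list standing in for Lucene's sorted term dictionary. Everything here is illustrative: the function names and the sample terms are assumptions, and a real implementation would be an analyzer plus a query-parser change as described above.

```python
from bisect import bisect_left

def build_reversed_terms(terms):
    """Sorted list of reversed terms, standing in for a sorted term dictionary."""
    return sorted(t[::-1] for t in terms)

def suffix_search(reversed_terms, suffix):
    """Find terms ending in `suffix` by prefix-scanning the reversed terms."""
    prefix = suffix[::-1]
    start = bisect_left(reversed_terms, prefix)  # jump straight to the first candidate
    matches = []
    for rev in reversed_terms[start:]:
        if not rev.startswith(prefix):
            break  # sorted order: no later term can share the prefix
        matches.append(rev[::-1])
    return matches

reversed_terms = build_reversed_terms(["Hello", "Jello", "World", "Help"])
print(suffix_search(reversed_terms, "ello"))
```

The point of the sketch is that a query like `*ello` becomes the prefix query `olle*` on the reversed field, so only the matching slice of the sorted terms is visited rather than every term.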
Re: Wildcard Query
Thanks for all the help. The only field where I need this is to search sku fields...example being "19-JN910" The search needs to be able to pull a match if the query were "JN" ...Erik's solution is the way to go and simple to implement. > From: "Yonik Seeley" <[EMAIL PROTECTED]> > Reply-To: solr-user@lucene.apache.org > Date: Tue, 20 Jun 2006 09:53:53 -0400 > To: solr-user@lucene.apache.org > Subject: Re: Wildcard Query > > On 6/20/06, Pace Davis <[EMAIL PROTECTED]> wrote: >> I have been using Lucene for about a month now and trying to port the same >> functionality to Solr. How do I do a wildcard query with a leading "*" >> ...This is possible with Lucene if you do not use the standard query >> parser. > > It's not really possible to do efficiently with Lucene out-of-the-box either. > Terms are sorted, so foo* is a relatively quick query, but *foo is > horribly slow since all terms must be scanned. > > You can do things like what Erik suggests... index all the variants: > "Hello" "ello" "llo", etc > Another more limited form that would take up less index space would be > to index the reverse of the token as well: > Index="olleH", query="olle*" > > We don't yet have an analyzer to do this (and neither does Lucene AFAIK). > As Chris points out, in addition to analysis components, the > QueryParse would need to be changed as well. > > I've thought about hooking in the QueryParser to the FieldTypes more before... > One reason is to know if something like a prefix query should be > lowercased or not. > Another reason could be to handle special construction of wildcard > queries when there is support for "*foo". > > > -Yonik >
Re: newbie Q regarding schema configuration
> can you have complex types which are multivalued?
> I'd like to store something like a tag-name with a corresponding tag-weighting.

How much work it is might depend on how static or dynamic the tag-weighting is. If it's very static, you could simply use index-time boosts.

> can you do sum(*) type queries in lucene/solr? is it efficient?

If all tag weights were the same, you would get summing for "free" via Lucene scoring, I think... It all depends on the exact details of what you are trying to do: how many tags there are, how the weights are calculated, whether the sums are across all tags or dynamically determined by some query, etc...

-Yonik
Re: Wildcard Query
On 6/20/06, Pace Davis <[EMAIL PROTECTED]> wrote:

> Thanks for all the help. The only field where I need this is to search sku
> fields...example being "19-JN910" The search needs to be able to pull a
> match if the query were "JN" ...Erik's solution is the way to go and simple
> to implement.

For SKUs, another possible solution is to use the WordDelimiterFilter. It's good if a person is typing in SKUs and might use a different delimiter by mistake.

19-JN910 would be indexed as "19 JN 910", and the following queries would all match it: "19" "JN" "910" "19JN" "JN 910" "19JN910" "19/JN-910", etc.

-Yonik
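A rough Python approximation of the splitting described above may help show why all of those queries match. This is not the WordDelimiterFilter itself, which has more rules and options; the regular expression and the names `split_sku` and `sku_matches` are assumptions made for this sketch.

```python
import re

# Assumption: parts are runs of letters or runs of digits; anything else is a delimiter.
PARTS = re.compile(r"[A-Za-z]+|[0-9]+")

def split_sku(text):
    """Split on delimiters and on letter/digit transitions."""
    return PARTS.findall(text)

def sku_matches(indexed, query):
    """True if the query's parts occur as a contiguous run in the indexed parts."""
    doc, q = split_sku(indexed), split_sku(query)
    return bool(q) and any(doc[i:i + len(q)] == q
                           for i in range(len(doc) - len(q) + 1))

print(split_sku("19-JN910"))
print(sku_matches("19-JN910", "19/JN-910"))
```

Because both the indexed value and the query are split the same way, the delimiter the user types (or omits) stops mattering.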
Invalid XML returned from Solr
I have an application that I recently ported to Solr and am running into a few problems with the XML responses from Solr. An XML response which came from a Solr query returned XML data that was not properly escaped (no CDATA tag, or entity substitution). In particular the "summary" field contains '<' characters. An example of such a response can be found here: http://www.willetts.com/mike/response.xml

I looked through the source code for XMLWriter and it appears to be using util.XML.escape to escape the data, so I do not see how this response is able to occur. Does anyone have any ideas?

Here is the requestHandler tag in the Solr config file:

On another note: I also noticed that I get non-utf8 characters in the response even though the encoding line at the top of the XML document specifies utf8 encoding. I did not see anywhere in the XMLWriter code that checked the encoding of the output. Is this by design, or am I missing something?

Thanks in advance, the feedback I have received from the user lists has been invaluable.

Regards,
Mike
Re: Invalid XML returned from Solr
On 6/20/06, Mike Richmond <[EMAIL PROTECTED]> wrote:

> I have an application that I recently ported to Solr and am running
> into a few problems with the XML responses from Solr. An XML response
> which came from a Solr query returned XML data that was not properly
> escaped (no CDATA tag, or entity substitution). In particular the
> "summary" field contains '<' characters. An example of such a response
> can be found here: http://www.willetts.com/mike/response.xml

Hmmm, that is interesting... I haven't seen that before. I'll try and duplicate it with your example "summary" field.

> On another note:
> I also noticed that I get non-utf8 characters in the response even
> though the encoding line at the top of the XML document specifies utf8
> encoding.

Are you using the bundled version of Jetty? People have been having problems with international chars with that. You might try using Tomcat.

> I did not see anywhere in the XMLWriter code that checked
> the encoding of the output. Is this by design, or am I missing
> something?

By design... XMLWriter writes Java characters and strings, and the servlet container handles encoding to UTF-8.

-Yonik
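The two-stage split described above (escape characters first, encode to bytes later) can be shown in a small Python sketch. It is an analogy only, not Solr's XMLWriter: `write_field` and `send` are hypothetical names, and `send` merely stands in for whatever the servlet container does.

```python
from xml.sax.saxutils import escape

def write_field(name, value):
    """Stage 1: produce escaped XML as a text string (no byte encoding yet)."""
    return '<str name="%s">%s</str>' % (name, escape(value))

def send(text):
    """Stage 2: the transport layer turns characters into UTF-8 bytes."""
    return text.encode("utf-8")

xml_text = write_field("summary", "size < 10 & I\u2019ll")
xml_bytes = send(xml_text)
print(xml_text)
print(xml_bytes)
```

If stage 1 works but stage 2 mangles characters at or above 128, the symptom is the one reported here: well-formed escaping of '<' and '&' but wrong bytes for international characters.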
Re: Invalid XML returned from Solr
Hi Yonik, Thanks for the quick reply. I am willing to give you access to my index, config files, or any other pieces that you may need if it would help. I am basically running the example application (which uses Jetty), but with a modified schema.xml and a couple other small changes. I'll look into giving Tomcat a try over Jetty. --Mike On 6/20/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: On 6/20/06, Mike Richmond <[EMAIL PROTECTED]> wrote: > I have a application that I recently ported to Solr and am running > into a few problems with the XML responses from Solr. An XML response > which came from a Solr query, returned XML data that was not properly > escaped (no CDATA tag, or entity substitution). In particular the > "summary" field contains '<' characters. An example of such a response > can be found here: http://www.willetts.com/mike/response.xml Hmmm, that is interesting... I haven't seen that before. I'll try and duplicate it with your example "summary" field. > On another note: > I also noticed that I get non-utf8 characters in the response even > though the encoding line at the top of the XML document specifies utf8 > encoding. Are you using the bundled version of Jetty? People have been having problems with international chars with that. You might try using Tomcat. > I did not see anywhere in the XMLWriter code that checked > the encoding of the output. Is this by design, or am I missing > something? By design... XMLWriter writes java characters and strings, and the servlet container handles encoding to UTF-8. -Yonik
Re: Invalid XML returned from Solr
I've confirmed this is a Jetty bug related to international chars (>=128) and their output writer. When I moved the example to Tomcat 5.5, everything worked as expected.

For the exact same Lucene index file, Tomcat outputs I¹ll and Jetty outputs I¹ll

We should really look into switching the appserver we bundle for the example.

-Yonik
Error posting document
I am getting the following error when trying to post any document.

org.xmlpull.v1.XmlPullParserException: only whitespace content allowed before start tag and not = (position: START_DOCUMENT seen =... @1:1)
    at org.xmlpull.mxp1.MXParser.parseProlog(MXParser.java:1519)
    at org.xmlpull.mxp1.MXParser.nextImpl(MXParser.java:1395)
    at org.xmlpull.mxp1.MXParser.next(MXParser.java:1093)
    at org.xmlpull.mxp1.MXParser.nextTag(MXParser.java:1078)
    at org.apache.solr.core.SolrCore.update(SolrCore.java:642)
    at org.apache.solr.servlet.SolrUpdateServlet.doPost(SolrUpdateServlet.java:52)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:709)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:856)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.processConnection(Http11Protocol.java:744)
    at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
    at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
    at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
    at java.lang.Thread.run(Unknown Source)

Has anyone experienced this?

--
Kerry Wilson
Lead Developer
Williams Web
[EMAIL PROTECTED] | 423.485.4747
Re: Error posting document
On 6/20/06, Kerry Wilson <[EMAIL PROTECTED]> wrote:

> I am getting the following error when trying to post any document.

Hi Kerry, could you provide an example document that shows this?

-Yonik
Re: Error posting document
It is happening on all documents but just for fun here is one I have used: Post to SOLR http://localhost:8080/solr/update"; method="POST"> It happens to be the form I am submitting it from, here is the output: org.xmlpull.v1.XmlPullParserException: only whitespace content allowed before start tag and not = (position: START_DOCUMENT seen =... @1:1) at org.xmlpull.mxp1.MXParser.parseProlog(MXParser.java:1519) at org.xmlpull.mxp1.MXParser.nextImpl(MXParser.java:1395) at org.xmlpull.mxp1.MXParser.next(MXParser.java:1093) at org.xmlpull.mxp1.MXParser.nextTag(MXParser.java:1078) at org.apache.solr.core.SolrCore.update(SolrCore.java:642) at org.apache.solr.servlet.SolrUpdateServlet.doPost(SolrUpdateServlet.java:52) at javax.servlet.http.HttpServlet.service(HttpServlet.java:709) at javax.servlet.http.HttpServlet.service(HttpServlet.java:802) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:856) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.processConnection(Http11Protocol.java:744) at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527) at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80) at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684) 
at java.lang.Thread.run(Unknown Source) Yonik Seeley wrote: On 6/20/06, Kerry Wilson <[EMAIL PROTECTED]> wrote: I am getting the following error when trying to post any document. Hi Kerry, could you provide an example document that show this? -Yonik -- Kerry Wilson Lead Developer Williams Web [EMAIL PROTECTED] | 423.485.4747
Re: Error posting document
Yeah, Solr currently only accepts an HTTP-POST with an XML document as the post body. That's not what you will get with a browser-based form data / file post. -Yonik On 6/20/06, Kerry Wilson <[EMAIL PROTECTED]> wrote: It is happening on all documents but just for fun here is one I have used: Post to SOLR http://localhost:8080/solr/update"; method="POST"> It happens to be the form I am submitting it from, here is the output: org.xmlpull.v1.XmlPullParserException: only whitespace content allowed before start tag and not = (position: START_DOCUMENT seen =... @1:1) at org.xmlpull.mxp1.MXParser.parseProlog(MXParser.java:1519) at org.xmlpull.mxp1.MXParser.nextImpl(MXParser.java:1395) at org.xmlpull.mxp1.MXParser.next(MXParser.java:1093) at org.xmlpull.mxp1.MXParser.nextTag(MXParser.java:1078) at org.apache.solr.core.SolrCore.update(SolrCore.java:642) at org.apache.solr.servlet.SolrUpdateServlet.doPost(SolrUpdateServlet.java:52) at javax.servlet.http.HttpServlet.service(HttpServlet.java:709) at javax.servlet.http.HttpServlet.service(HttpServlet.java:802) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:856) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.processConnection(Http11Protocol.java:744) at 
org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527) at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80) at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684) at java.lang.Thread.run(Unknown Source) Yonik Seeley wrote: > On 6/20/06, Kerry Wilson <[EMAIL PROTECTED]> wrote: >> I am getting the following error when trying to post any document. > > Hi Kerry, could you provide an example document that show this? > > -Yonik
Re: Invalid XML returned from Solr
Hi Yonik,

Thanks again for the quick help. I switched to Tomcat and all the problems went away. Not sure what the process would be, but I'd be willing to migrate the example application to Tomcat and update the existing documentation. I would like to give back to this project as it has done quite a bit for me.

--Mike

On 6/20/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:

> I've confirmed this is a Jetty bug related to international chars
> (>=128) and their output writer. When I moved the example to Tomcat 5.5,
> everything worked as expected.
>
> For the exact same Lucene index file, Tomcat outputs I¹ll and Jetty outputs I¹ll
>
> We should really look into switching the appserver we bundle for the example.
>
> -Yonik
Re: Error posting document
: Yeah, Solr currently only accepts an HTTP-POST with an XML document as
: the post body. That's not what you will get with a browser-based form
: data / file post.

Also, if i'm reading this comment correctly...

: > It happens to be the form I am submitting it from, here is the output:

...then the document you are trying to submit to Solr *is* the same HTML form you listed. Solr XML documents have a specific format with information about what the field/values are that you want Solr to index -- it doesn't parse raw text or HTML documents.

This wiki page has more information on the specifics...

http://wiki.apache.org/solr/UpdateXmlMessages

...as well as links to several example docs...

https://svn.apache.org/repos/asf/incubator/solr/trunk/example/exampledocs/

-Hoss
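To make the difference from a browser form post concrete, here is a minimal Python sketch that builds an `<add>` message and wraps it in a POST whose body is the XML itself. The update URL is the one quoted in the thread; the field names `id` and `name` and both helper function names are assumptions, since the real field names depend on your schema.xml.

```python
import urllib.request
import xml.etree.ElementTree as ET

def build_add_message(fields):
    """Build an <add><doc>...</doc></add> message from (name, value) pairs."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields:
        field = ET.SubElement(doc, "field", name=name)
        field.text = value  # ElementTree escapes <, > and & when serializing
    return ET.tostring(add, encoding="utf-8")

def build_update_request(url, body):
    """Wrap the XML bytes in a POST whose body is the document itself."""
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )

# Field names here are hypothetical; use the ones defined in your schema.
body = build_add_message([("id", "19-JN910"), ("name", "Example <widget>")])
request = build_update_request("http://localhost:8080/solr/update", body)
# To actually send it: urllib.request.urlopen(request).read()
```

The sketch deliberately stops short of sending the request. The key point is that the request body starts with XML, whereas a form post starts with `name=value` pairs, which is consistent with the parser complaining about content before the start tag.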
Re: newbie Q regarding schema configuration
thanks for the input Chris (and Yonik) i'm not sure lucene is the best answer for what I want to do ;( regards Ian On 20/06/2006, at 5:58 PM, Chris Hostetter wrote: : so.. my first question in schema.xml, can you have a composite key as : the 'uniquekey' field, or do i need to do this on the client side? at the moment this would need to be done client site, but you're not the first person to ask so i've added it to the TaskList ... it doesn't seem like it would be too hard. : can you have complex types which are multivalued? : I'd like to store something like : a tag-name with a corresponding tag-weighting. There's nothing like that built into Solr - the best way to model that would probably be to use the term frequency to represent the weight - you could have an analyzer that parsed input like... "blue state"^2 "democrat"^1 "john kerry"^5 ...and converted it into a stream of tokens like... [blue state] [blue state] [democrat] [john kerry] [john kerry]... ..kind of kludgy, but that's the best mechanism Lucene has at the moment (there are plans to add more generic term attributes, but that's still currently a design thing) : can you do sum(*) type queries in lucene/solr? it is efficient ? or : are you better having a 2nd index which has these sum(*) values in it : and keep it up to date instead. sum's across multiple documents, or sums of values in a single document? in the later case, you don't need a seperate index, just another field. in the former case it's really a question of what sets of documents you want sums across? .. if it's all of them then you could just store that info in a flat file, or a special metadata document in your index .. if what you want is more of a run-time calculation then you can certainly do it in a custom request handler (and you can use a SolrCache and a custom CacheRegnirator to make sure the values are cached for as long as the searcher is open, and autowarmed when a new one is opened). 
Generally the best way to do math operations on sets of documents in Lucene is using the FieldCache, and this is certainly available to Lucene request handlers. -Hoss
Duplicates in MultiValued fields
Is there any way to not allow duplicates inside of a multivalued field using Solr?

Thanks,
Mike
Re: Duplicates in MultiValued fields
On 6/21/06, Mike Richmond <[EMAIL PROTECTED]> wrote:

> Is there any way to not allow duplicates inside of a multivalued field
> using Solr?

Not currently. Do you want uniqueness within a single document, or across many documents? If it's within a single document, this is probably easiest to implement in the client sending the document.

-Yonik
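For the single-document case, the client-side approach can be as small as the following Python sketch. The document layout and the names `dedupe_values` and `tags` are assumptions for illustration; the only technique shown is order-preserving de-duplication before the document is sent.

```python
def dedupe_values(values):
    """Drop repeated values while keeping first-seen order."""
    return list(dict.fromkeys(values))

# Hypothetical document with a multivalued "tags" field.
doc = {"id": "42", "tags": ["solr", "lucene", "solr", "search", "lucene"]}
doc["tags"] = dedupe_values(doc["tags"])
print(doc["tags"])
```

Note that this compares raw values exactly; if the field is lowercased or otherwise analyzed at index time, values that differ only in case would still end up as the same indexed term.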