Re: How to search multiphrase or a middle term in a word???
Thanks Hoss. As you said, I tried NGramTokenizer, but I still didn't get the result. I just added it in schema.xml for both indexing and querying, but it didn't work. Even after adding this tokenizer, the following scenario still didn't work. Index data: sweetheart, sweetHeart. Search data: Heart or heart. Result: sweetHeart. As I explored it, I realized I haven't correctly understood the behaviour of the NGram tokenizer. I want to know the usage and the configuration in code using the NGramTokenizer. Please help me out. Thanks, Nithya.

your "master" examples work part of the time because the WordDelimiterFilter can tell at index time that the capital M in the middle of the word is a good place to split on. without hints like that at index time, the only way to do "middle of the word" searches is with wildcard type queries -- which are really inefficient. You might want to read a bit about N-Grams and consider using an NGramTokenizer to chunk up your input words into ngrams for easy searching on pieces of words. : -- View this message in context: http://www.nabble.com/How-to-search-multiphrase-or-a-middle-term-in-a-wordtp15484754p15850225.html Sent from the Solr - User mailing list archive at Nabble.com.
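For reference, an index-time n-gram setup along the lines Hoss describes might look like the fragment below in schema.xml. This is an illustrative sketch, not a tested config: the factory and attribute names (solr.NGramTokenizerFactory, minGramSize, maxGramSize) assume a Solr 1.3-era build, and the field type name is made up. The idea is to produce n-grams only at index time, so a plain query token like "heart" can match a gram of "sweetheart" (maxGramSize must be at least as long as the query terms you expect):

```xml
<!-- Hypothetical field type: n-grams at index time, plain tokens at query time. -->
<fieldType name="text_ngram" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="10"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The LowerCaseFilterFactory on both sides would also cover the Heart/heart case mentioned above.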
JSONRequestWriter
We're using localsolr and the RubyResponseWriter. When we do a request with the localsolr component in our requestHandler we're seeing issues with the display of a multivalued field when it only has one value.

'class'=>['showtime']'showtime', <--
'genre'=>['Drama', 'Suspsense/Triller'],

With no localsolr component it works fine.

Looks like the issue is with the JSONRequestWriter.writeSolrDocument(). Here's the small patch for it that seems to fix it.

Index: src/java/org/apache/solr/request/JSONResponseWriter.java
===
--- src/java/org/apache/solr/request/JSONResponseWriter.java (revision 614955)
+++ src/java/org/apache/solr/request/JSONResponseWriter.java (working copy)
@@ -416,7 +416,7 @@
         writeVal(fname, val);
         writeArrayCloser();
       }
-      writeVal(fname, val);
+      else writeVal(fname, val);
     }

     if (pseudoFields !=null && pseudoFields.size()>0) {

We're running solr trunk r614955 (Jan 23rd), and r75 of localsolr.

Result snippet with the patch:

'class'=>['showtime'],
'genre'=>['Drama', 'Suspsense/Triller'],

Has anyone come across an issue like this? Is this fixed in a newer build of Solr? It looks like we'd still need this patch even in a build of the solr trunk from yesterday, but maybe not.

--
Doug Steigerwald
Software Developer
McClatchy Interactive
[EMAIL PROTECTED]
919.861.1287
Re: JSONRequestWriter
Thanks Doug, I just checked in your fix. This was a recent bug... writing of SolrDocument was recently added and is not touched by normal code paths, except for distributed search. -Yonik On Wed, Mar 5, 2008 at 9:29 AM, Doug Steigerwald <[EMAIL PROTECTED]> wrote: > We're using localsolr and the RubyResponseWriter. When we do a request with > the localsolr component > in our requestHandler we're seeing issues with the display of a multivalued > field when it only has > one value. > > 'class'=>['showtime']'showtime', <-- > 'genre'=>['Drama', > 'Suspsense/Triller'], > > With no localsolr component it works fine. > > Looks like the issue is with the JSONRequestWriter.writeSolrDocument(). > Here's the small patch for > it that seems to fix it. > > Index: src/java/org/apache/solr/request/JSONResponseWriter.java > === > --- src/java/org/apache/solr/request/JSONResponseWriter.java(revision > 614955) > +++ src/java/org/apache/solr/request/JSONResponseWriter.java(working > copy) > @@ -416,7 +416,7 @@ > writeVal(fname, val); > writeArrayCloser(); > } > -writeVal(fname, val); > +else writeVal(fname, val); > } > > if (pseudoFields !=null && pseudoFields.size()>0) { > > > We're running solr trunk r614955 (Jan 23rd), and r75 of localsolr. > > Result snippet with the patch: > > 'class'=>['showtime'], > 'genre'=>['Drama', > 'Suspsense/Triller'], > > Has anyone come across an issue like this? Is this fixed in a newer build > of Solr? It looks like > we'd still need this patch even in a build of the solr trunk from yesterday, > but maybe not. > > -- > Doug Steigerwald > Software Developer > McClatchy Interactive > [EMAIL PROTECTED] > 919.861.1287 >
Re: JSONRequestWriter
Sweet. Thanks. Doug Yonik Seeley wrote: Thanks Doug, I just checked in your fix. This was a recent bug... writing of SolrDocument was recently added and is not touched by normal code paths, except for distributed search. -Yonik On Wed, Mar 5, 2008 at 9:29 AM, Doug Steigerwald <[EMAIL PROTECTED]> wrote: We're using localsolr and the RubyResponseWriter. When we do a request with the localsolr component in our requestHandler we're seeing issues with the display of a multivalued field when it only has one value. 'class'=>['showtime']'showtime', <-- 'genre'=>['Drama', 'Suspsense/Triller'], With no localsolr component it works fine. Looks like the issue is with the JSONRequestWriter.writeSolrDocument(). Here's the small patch for it that seems to fix it. Index: src/java/org/apache/solr/request/JSONResponseWriter.java === --- src/java/org/apache/solr/request/JSONResponseWriter.java(revision 614955) +++ src/java/org/apache/solr/request/JSONResponseWriter.java(working copy) @@ -416,7 +416,7 @@ writeVal(fname, val); writeArrayCloser(); } -writeVal(fname, val); +else writeVal(fname, val); } if (pseudoFields !=null && pseudoFields.size()>0) { We're running solr trunk r614955 (Jan 23rd), and r75 of localsolr. Result snippet with the patch: 'class'=>['showtime'], 'genre'=>['Drama', 'Suspsense/Triller'], Has anyone come across an issue like this? Is this fixed in a newer build of Solr? It looks like we'd still need this patch even in a build of the solr trunk from yesterday, but maybe not. -- Doug Steigerwald Software Developer McClatchy Interactive [EMAIL PROTECTED] 919.861.1287
Ranking search results by content type
Hi there, What I am trying to do is get search results sorted by content type, since there is an order in which results are preferred, for example: * Topics * Postings * Replies * News in that order. The content type is encoded in a "type" field that is stored in the document. Am I right in thinking that I need custom Analyzers to accomplish this or is this something that can be done with a configuration file? Now what I am seeing is the default ranking which only analyzes the significance of the search term, which mixes up the content types. What is the best way to approach a requirement like this? At the query level, the solr level, or the Lucene level? -jim -- -- Jim Wiegand --- Home: [EMAIL PROTECTED] AIM: originaljimdandy WWW: http://www.netzingers.com Cell: 215 284 8160
Re: JSONRequestWriter
Note that we now have to add a default param to the requestHandler: explicit map collapse localsolr facet

If you don't add the json.nl=map to your params, then you can't eval() what you get back in Ruby ("can't convert String into Integer"). Not sure if this can be put into the RubyResponseWriter as a default. Also not sure if this is an issue with the python writer (since I don't use python).

Doug

Yonik Seeley wrote: Thanks Doug, I just checked in your fix. This was a recent bug... writing of SolrDocument was recently added and is not touched by normal code paths, except for distributed search. -Yonik On Wed, Mar 5, 2008 at 9:29 AM, Doug Steigerwald <[EMAIL PROTECTED]> wrote: We're using localsolr and the RubyResponseWriter. When we do a request with the localsolr component in our requestHandler we're seeing issues with the display of a multivalued field when it only has one value. 'class'=>['showtime']'showtime', <-- 'genre'=>['Drama', 'Suspsense/Triller'], With no localsolr component it works fine. Looks like the issue is with the JSONRequestWriter.writeSolrDocument(). Here's the small patch for it that seems to fix it. Index: src/java/org/apache/solr/request/JSONResponseWriter.java === --- src/java/org/apache/solr/request/JSONResponseWriter.java(revision 614955) +++ src/java/org/apache/solr/request/JSONResponseWriter.java(working copy) @@ -416,7 +416,7 @@ writeVal(fname, val); writeArrayCloser(); } -writeVal(fname, val); +else writeVal(fname, val); } if (pseudoFields !=null && pseudoFields.size()>0) { We're running solr trunk r614955 (Jan 23rd), and r75 of localsolr. Result snippet with the patch: 'class'=>['showtime'], 'genre'=>['Drama', 'Suspsense/Triller'], Has anyone come across an issue like this? Is this fixed in a newer build of Solr? It looks like we'd still need this patch even in a build of the solr trunk from yesterday, but maybe not. -- Doug Steigerwald Software Developer McClatchy Interactive [EMAIL PROTECTED] 919.861.1287
Unparseable date
Hi people

I've got a date(&time) indexed with every document, defined as: multiValued="false" />

According to the schema.xml-file "The format for this date field is of the form 1995-12-31T23:59:59Z". Yet I'm getting the following error on SOME queries:

Mar 5, 2008 10:32:53 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.RuntimeException: java.text.ParseException: Unparseable date: "2008-02-12T15:02:06Z"
        at org.apache.solr.schema.DateField.toObject(DateField.java:173)
        at org.apache.solr.schema.DateField.toObject(DateField.java:83)
        at org.apache.solr.update.DocumentBuilder.loadStoredFields(DocumentBuilder.java:285)
        at com.pjaol.search.solr.component.LocalSolrQueryComponent.luceneDocToSolrDoc(LocalSolrQueryComponent.java:403)
        at com.pjaol.search.solr.component.LocalSolrQueryComponent.mergeResultsDistances(LocalSolrQueryComponent.java:363)
        at com.pjaol.search.solr.component.LocalSolrQueryComponent.process(LocalSolrQueryComponent.java:305)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:158)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:118)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:944)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:326)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:278)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
        at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
        at org.mortbay.jetty.Server.handle(Server.java:285)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
        at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
        at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Caused by: java.text.ParseException: Unparseable date: "2008-02-12T15:02:06Z"
        at java.text.DateFormat.parse(DateFormat.java:335)
        at org.apache.solr.schema.DateField.toObject(DateField.java:170)
        ... 27 more

Could this be because we're using 24h instead of 12h? (the example seems to imply that 24h is what should be used though)

Thanks in advance!

Kind regards,
Daniel
Re: JSONRequestWriter
On Wed, Mar 5, 2008 at 11:25 AM, Doug Steigerwald <[EMAIL PROTECTED]> wrote: > If you don't add the json.nl=map to your params, then you can't eval() what > you get back in Ruby > ("can't convert String into Integer"). Can you show what the problematic ruby output is? json.nl=map isn't the default because some things need to be ordered, and eval of a map in python & ruby loses that order. -Yonik
Re: Unparseable date
Solr does use 24 hour dates. Are you positive there are no extraneous characters at the end of your date string such as carriage returns, spaces, or tabs? I have the same format in the code I've written and have never had a date parsing problem (yet).

Ryan Grange, IT Manager
DollarDays International, LLC
[EMAIL PROTECTED]
480-922-8155 x106

Daniel Andersson wrote: Hi people I've got a date(&time) indexed with every document, defined as: multiValued="false" /> According to the schema.xml-file "The format for this date field is of the form 1995-12-31T23:59:59Z". Yet I'm getting the following error on SOME queries: Mar 5, 2008 10:32:53 AM org.apache.solr.common.SolrException log SEVERE: java.lang.RuntimeException: java.text.ParseException: Unparseable date: "2008-02-12T15:02:06Z" at org.apache.solr.schema.DateField.toObject(DateField.java:173) at org.apache.solr.schema.DateField.toObject(DateField.java:83) at org.apache.solr.update.DocumentBuilder.loadStoredFields(DocumentBuilder.java:285) at com.pjaol.search.solr.component.LocalSolrQueryComponent.luceneDocToSolrDoc(LocalSolrQueryComponent.java:403) at com.pjaol.search.solr.component.LocalSolrQueryComponent.mergeResultsDistances(LocalSolrQueryComponent.java:363) at com.pjaol.search.solr.component.LocalSolrQueryComponent.process(LocalSolrQueryComponent.java:305) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:158) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:118) at org.apache.solr.core.SolrCore.execute(SolrCore.java:944) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:326) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:278) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) at org.mortbay.jetty.Server.handle(Server.java:285) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442) Caused by: java.text.ParseException: Unparseable date: "2008-02-12T15:02:06Z" at java.text.DateFormat.parse(DateFormat.java:335) at org.apache.solr.schema.DateField.toObject(DateField.java:170) ... 27 more Could this be because we're using 24h instead of 12h? (the example seems to imply that 24h is what should be used though) Thanks in advance! Kind regards, Daniel
Re: JSONRequestWriter
Sure. The default (json.nl=flat): 'response',{'numFound'=>41,'start'=>0, Adding json.nl=map makes output correct: 'response'=>{'numFound'=>41,'start'=>0, This also changes facet output (which was evaluating fine): FLAT: 'facet_counts',{ 'facet_queries'=>{}, 'facet_fields'=>{ 'movies_movie_genre_facet'=>[ 'Drama',22, 'Action/Adventure',11, 'Comedy',11, 'Suspense/Thriller',11, 'SciFi/Fantasy',5, 'Animation',4, 'Documentary',4, 'Family',3, 'Horror',3, 'Musical',2, 'Romance',2, 'Concert',1, 'War',1]}, 'facet_dates'=>{}} MAP: 'facet_counts'=>{ 'facet_queries'=>{}, 'facet_fields'=>{ 'movies_movie_genre_facet'=>{ 'Drama'=>22, 'Action/Adventure'=>11, 'Comedy'=>11, 'Suspense/Thriller'=>11, 'SciFi/Fantasy'=>5, 'Animation'=>4, 'Documentary'=>4, 'Family'=>3, 'Horror'=>3, 'Musical'=>2, 'Romance'=>2, 'Concert'=>1, 'War'=>1}}, 'facet_dates'=>{}} Doug Yonik Seeley wrote: On Wed, Mar 5, 2008 at 11:25 AM, Doug Steigerwald <[EMAIL PROTECTED]> wrote: If you don't add the json.nl=map to your params, then you can't eval() what you get back in Ruby ("can't convert String into Integer"). Can you show what the problematic ruby output is? json.nl=map isn't the default because some things need to be ordered, and eval of a map in python & ruby looses that order. -Yonik
Re: JSONRequestWriter
The output you showed is indeed incorrect, but I can't reproduce that with stock solr. Here is an example of what I get:

{
 'responseHeader'=>{
  'status'=>0,
  'QTime'=>16,
  'params'=>{
        'wt'=>'ruby',
        'indent'=>'true',
        'q'=>'*:*',
        'facet'=>'true',
        'highlight'=>'true'}},
 'response'=>{'numFound'=>0,'start'=>0,'docs'=>[]
 },
 'facet_counts'=>{
  'facet_queries'=>{},
  'facet_fields'=>{},
  'facet_dates'=>{}}}

-Yonik

On Wed, Mar 5, 2008 at 12:00 PM, Doug Steigerwald <[EMAIL PROTECTED]> wrote: > Sure. > > The default (json.nl=flat): > > 'response',{'numFound'=>41,'start'=>0, > > Adding json.nl=map makes output correct: > > 'response'=>{'numFound'=>41,'start'=>0, > > This also changes facet output (which was evaluating fine): > > FLAT: > > 'facet_counts',{ >'facet_queries'=>{}, >'facet_fields'=>{ > 'movies_movie_genre_facet'=>[ > 'Drama',22, > 'Action/Adventure',11, > 'Comedy',11, > 'Suspense/Thriller',11, > 'SciFi/Fantasy',5, > 'Animation',4, > 'Documentary',4, > 'Family',3, > 'Horror',3, > 'Musical',2, > 'Romance',2, > 'Concert',1, > 'War',1]}, >'facet_dates'=>{}} > > MAP: > > 'facet_counts'=>{ >'facet_queries'=>{}, >'facet_fields'=>{ > 'movies_movie_genre_facet'=>{ > 'Drama'=>22, > 'Action/Adventure'=>11, > 'Comedy'=>11, > 'Suspense/Thriller'=>11, > 'SciFi/Fantasy'=>5, > 'Animation'=>4, > 'Documentary'=>4, > 'Family'=>3, > 'Horror'=>3, > 'Musical'=>2, > 'Romance'=>2, > 'Concert'=>1, > 'War'=>1}}, >'facet_dates'=>{}} > > Doug > > > > Yonik Seeley wrote: > > On Wed, Mar 5, 2008 at 11:25 AM, Doug Steigerwald > > <[EMAIL PROTECTED]> wrote: > >> If you don't add the json.nl=map to your params, then you can't eval() > what you get back in Ruby > >> ("can't convert String into Integer"). > > > > Can you show what the problematic ruby output is? > > > > json.nl=map isn't the default because some things need to be ordered, > > and eval of a map in python & ruby looses that order. > > > > -Yonik >
Re: Ranking search results by content type
You could have each doctype correspond to a number that's saved in the type field, and sort by the number and then score. Or you can do what I do and when you search, just weight each type differently. My types are all just one letter, so for instance: q=((search string) AND type:A^1) OR ((search string) AND type:B^10) OR etc etc -Reece On Wed, Mar 5, 2008 at 10:18 AM, James Wiegand <[EMAIL PROTECTED]> wrote: > Hi there, > > What I am trying to do is get search results sorted by content type, since > there is an order in which results are preferred, for example: > > * Topics > * Postings > * Replies > * News > > in that order. The content type is encoded in a "type" field that is stored > in the document. > Am I right in thinking that I need custom Analyzers to accomplish this or is > this something that can be done with a configuration file? > Now what I am seeing is the default ranking which only analyzes the > significance of the search term, which mixes up the content types. > What is the best way to approach a requirement like this? At the query > level, the solr level, or the Lucene level? > > -jim > > -- > -- > Jim Wiegand > --- > Home: [EMAIL PROTECTED] > AIM: originaljimdandy > WWW: http://www.netzingers.com > Cell: 215 284 8160 >
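Reece's first suggestion (a per-type rank number, sorted before score) can be sketched outside Solr like this; the type names come from the question above, but the rank values and result data are made up for illustration. In Solr itself you'd index the rank as a sortable field and sort on it first, score second:

```python
# Hypothetical rank table: lower rank = preferred content type.
TYPE_RANK = {"Topics": 0, "Postings": 1, "Replies": 2, "News": 3}

# Made-up search results: note the News doc has the best relevance score.
results = [
    {"id": 1, "type": "News", "score": 9.5},
    {"id": 2, "type": "Topics", "score": 2.1},
    {"id": 3, "type": "Postings", "score": 7.0},
]

# Sort by type rank first, then by descending relevance score within a type.
ordered = sorted(results, key=lambda d: (TYPE_RANK[d["type"]], -d["score"]))
print([d["id"] for d in ordered])  # → [2, 3, 1]
```

The boosting approach in the query above is softer: it nudges preferred types up without strictly segregating them, which may or may not be what you want.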
Re: JSONRequestWriter
Looks like it's only happening when we use the LocalSolrQueryComponent from localsolr. rsp.add("response", sdoclist); sdoclist is a SolrDocumentList. Could that be causing an issue instead of it being just a DocList? Doug Yonik Seeley wrote: The output you showed is indeed incorrect, but I can't reproduce that with stock solr. Here is a example of what I get: { 'responseHeader'=>{ 'status'=>0, 'QTime'=>16, 'params'=>{ 'wt'=>'ruby', 'indent'=>'true', 'q'=>'*:*', 'facet'=>'true', 'highlight'=>'true'}}, 'response'=>{'numFound'=>0,'start'=>0,'docs'=>[] }, 'facet_counts'=>{ 'facet_queries'=>{}, 'facet_fields'=>{}, 'facet_dates'=>{}}} -Yonik On Wed, Mar 5, 2008 at 12:00 PM, Doug Steigerwald <[EMAIL PROTECTED]> wrote: Sure. The default (json.nl=flat): 'response',{'numFound'=>41,'start'=>0, Adding json.nl=map makes output correct: 'response'=>{'numFound'=>41,'start'=>0, This also changes facet output (which was evaluating fine): FLAT: 'facet_counts',{ 'facet_queries'=>{}, 'facet_fields'=>{ 'movies_movie_genre_facet'=>[ 'Drama',22, 'Action/Adventure',11, 'Comedy',11, 'Suspense/Thriller',11, 'SciFi/Fantasy',5, 'Animation',4, 'Documentary',4, 'Family',3, 'Horror',3, 'Musical',2, 'Romance',2, 'Concert',1, 'War',1]}, 'facet_dates'=>{}} MAP: 'facet_counts'=>{ 'facet_queries'=>{}, 'facet_fields'=>{ 'movies_movie_genre_facet'=>{ 'Drama'=>22, 'Action/Adventure'=>11, 'Comedy'=>11, 'Suspense/Thriller'=>11, 'SciFi/Fantasy'=>5, 'Animation'=>4, 'Documentary'=>4, 'Family'=>3, 'Horror'=>3, 'Musical'=>2, 'Romance'=>2, 'Concert'=>1, 'War'=>1}}, 'facet_dates'=>{}} Doug Yonik Seeley wrote: > On Wed, Mar 5, 2008 at 11:25 AM, Doug Steigerwald > <[EMAIL PROTECTED]> wrote: >> If you don't add the json.nl=map to your params, then you can't eval() what you get back in Ruby >> ("can't convert String into Integer"). > > Can you show what the problematic ruby output is? > > json.nl=map isn't the default because some things need to be ordered, > and eval of a map in python & ruby looses that order. 
> > -Yonik
Re: JSONRequestWriter
I think the container is the issue (the type of rsp). Here is the Javadoc for SimpleOrderedMap:

/** SimpleOrderedMap is a {@link NamedList} where access by key is more
 * important than maintaining order when it comes to representing the
 * held data in other forms, as ResponseWriters normally do.
 * It's normally not a good idea to repeat keys or use null keys, but this
 * is not enforced. If key uniqueness enforcement is desired, use a regular {@link Map}.
 *
 * For example, a JSON response writer may choose to write a SimpleOrderedMap
 * as {"foo":10,"bar":20} and may choose to write a NamedList as
 * ["foo",10,"bar",20]. An XML response writer may choose to render both
 * the same way.
 *
 * This class does not provide efficient lookup by key, it's main purpose is
 * to hold data to be serialized. It aims to minimize overhead and to be
 * efficient at adding new elements.
 */

-Yonik

On Wed, Mar 5, 2008 at 1:56 PM, Doug Steigerwald <[EMAIL PROTECTED]> wrote: > Looks like it's only happening when we use the LocalSolrQueryComponent from > localsolr. > > rsp.add("response", sdoclist); > > sdoclist is a SolrDocumentList. Could that be causing an issue instead of > it being just a DocList? > > Doug > > > > Yonik Seeley wrote: > > The output you showed is indeed incorrect, but I can't reproduce that > > with stock solr. > > Here is a example of what I get: > > > > { > > 'responseHeader'=>{ > > 'status'=>0, > > 'QTime'=>16, > > 'params'=>{ > > 'wt'=>'ruby', > > 'indent'=>'true', > > 'q'=>'*:*', > > 'facet'=>'true', > > 'highlight'=>'true'}}, > > 'response'=>{'numFound'=>0,'start'=>0,'docs'=>[] > > }, > > 'facet_counts'=>{ > > 'facet_queries'=>{}, > > 'facet_fields'=>{}, > > 'facet_dates'=>{}}} > > > > > > -Yonik > > > > On Wed, Mar 5, 2008 at 12:00 PM, Doug Steigerwald > > <[EMAIL PROTECTED]> wrote: > >> Sure.
> >> > >> The default (json.nl=flat): > >> > >> 'response',{'numFound'=>41,'start'=>0, > >> > >> Adding json.nl=map makes output correct: > >> > >> 'response'=>{'numFound'=>41,'start'=>0, > >> > >> This also changes facet output (which was evaluating fine): > >> > >> FLAT: > >> > >> 'facet_counts',{ > >>'facet_queries'=>{}, > >>'facet_fields'=>{ > >> 'movies_movie_genre_facet'=>[ > >> 'Drama',22, > >> 'Action/Adventure',11, > >> 'Comedy',11, > >> 'Suspense/Thriller',11, > >> 'SciFi/Fantasy',5, > >> 'Animation',4, > >> 'Documentary',4, > >> 'Family',3, > >> 'Horror',3, > >> 'Musical',2, > >> 'Romance',2, > >> 'Concert',1, > >> 'War',1]}, > >>'facet_dates'=>{}} > >> > >> MAP: > >> > >> 'facet_counts'=>{ > >>'facet_queries'=>{}, > >>'facet_fields'=>{ > >> 'movies_movie_genre_facet'=>{ > >> 'Drama'=>22, > >> 'Action/Adventure'=>11, > >> 'Comedy'=>11, > >> 'Suspense/Thriller'=>11, > >> 'SciFi/Fantasy'=>5, > >> 'Animation'=>4, > >> 'Documentary'=>4, > >> 'Family'=>3, > >> 'Horror'=>3, > >> 'Musical'=>2, > >> 'Romance'=>2, > >> 'Concert'=>1, > >> 'War'=>1}}, > >>'facet_dates'=>{}} > >> > >> Doug > >> > >> > >> > >> Yonik Seeley wrote: > >> > On Wed, Mar 5, 2008 at 11:25 AM, Doug Steigerwald > >> > <[EMAIL PROTECTED]> wrote: > >> >> If you don't add the json.nl=map to your params, then you can't > eval() what you get back in Ruby > >> >> ("can't convert String into Integer"). > >> > > >> > Can you show what the problematic ruby output is? > >> > > >> > json.nl=map isn't the default because some things need to be ordered, > >> > and eval of a map in python & ruby looses that order. > >> > > >> > -Yonik > >> >
Re: Random search result
There is a built-in fieldType to support random results. Every time you want a different random ordering, query a dynamic field of that type and change the name (increase a counter, or simply include a random number in the name). -Yonik On Tue, Mar 4, 2008 at 11:49 AM, Evgeniy Strokin <[EMAIL PROTECTED]> wrote: > I want to get sample from my search result. Not first 10 but 10 random > (really random, not pseudo random) documents. > For example if I run simple query like STATE:NJ no order by any field, just > the query and get 10 first documents from my result set, will it be random 10 > or pseudo random, like first 10 indexed or something like this? > > Thank you > Gene
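For the archives, the built-in support Yonik mentions looks roughly like this in schema.xml. The fieldType/dynamicField pattern below follows the shipped example schema, but treat the exact names and attributes as assumptions rather than a verified config:

```xml
<!-- Sketch: a random-sort type plus a dynamic field so any random_* name works. -->
<fieldType name="random" class="solr.RandomSortField" indexed="true"/>
<dynamicField name="random_*" type="random" indexed="true"/>
```

A query can then sort with e.g. sort=random_1234 asc; changing the number (random_1235, random_42, ...) reseeds the ordering, which is the "increase a counter" trick above.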
passing params into SOLR
Hi all,

I'm using solrJ to build a wrapper for ColdFusion (a scripting language similar to PHP). What's the best practice for passing search parameters into solr from a web app? What are the shortcomings of each approach? Currently, I'm explicitly setting the params with solrQuery.setParam("name","value") and solrQuery.addFacetField("facet","value") etc. How would I go about passing a valid query string into solr? Do I need to 'decompose' it into parameters and then set them with setParam()s, or is there a method that will take the entire URL and execute it as is?

pt

Paul Treszczotko
Architect, Client Systems
INPUT
11720 Plaza America Drive, Suite 1200
Reston, Virginia 20190
Direct: 703-707-3524; Fax 703-707-6201

This email and any files transmitted with it are confidential and are intended solely for the use of the individual or entity to which they are addressed. If you are not the intended recipient or the person responsible for delivering the email to the intended recipient, be advised that you have received this email and any such files in error and that any use, dissemination, forwarding, printing or copying of this email and/or any such files is strictly prohibited. If you have received this email in error please immediately notify [EMAIL PROTECTED] and destroy the original message and any such files.
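As far as I know there is no SolrJ call that takes a whole URL and executes it as-is, so decomposing the query string into name/value pairs and feeding them to setParam()-style calls is the usual route. Here is a rough sketch of the decomposition step (in Python for brevity; the parameter values are made up). The key point is that repeated parameters like facet.field must be preserved, so parse into pairs rather than a flat dict:

```python
from collections import defaultdict
from urllib.parse import parse_qsl

query_string = "q=ipod&facet=true&facet.field=cat&facet.field=inStock"

# parse_qsl keeps repeated params as separate pairs (a plain dict would drop one).
pairs = parse_qsl(query_string)

# Group values by name, mirroring setParam(name, value1, value2, ...).
params = defaultdict(list)
for name, value in pairs:
    params[name].append(value)

print(dict(params))
```

In SolrJ you would then loop over the grouped entries and call solrQuery.setParam(name, values...) for each one.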
Re: boost ignored with wildcard queries
: > Using the StandardRequestHandler, it appears that the index boost values are : > ignored when the query has a wildcard in it. For example, if I have 2 : > 's and one has a boost of 1.0 and another has a boost of 10.0, then I : > do a search for "bob*", both records will be returned with the same score of : > 1.0. If I just do a normal search then the that has the higher boost : > has the higher score as expected. : A feature :-) : Solr uses ConstantScoreRangeQuery and ConstantScorePrefixQuery to : avoid getting exceptions from too many terms.

Hmmm... except for the fact that the name would be even more misleading, there's really no performance related reason why ConstantScoreRangeQuery and ConstantScorePrefixQuery couldn't use the fieldNorms (if they exist) when computing the score. the "constant score" part of their names referred to not doing term expansion to find tf/idf factors ... but the doc/field/length info encoded into the norms could still be factored into the score fairly efficiently. this would be something to submit as a patch to Lucene-Java if anyone is interested.

-Hoss
Re: boost ignored with wildcard queries
On Wed, Mar 5, 2008 at 4:07 PM, Chris Hostetter <[EMAIL PROTECTED]> wrote: > : > Using the StandardRequestHandler, it appears that the index boost > values are > > : > ignored when the query has a wildcard in it. For example, if I have 2 > : > 's and one has a boost of 1.0 and another has a boost of 10.0, > then I > : > do a search for "bob*", both records will be returned with the same > score of > : > 1.0. If I just do a normal search then the that has the higher > boost > : > has the higher score as expected. > > : A feature :-) > > : Solr uses ConstantScoreRangeQuery and ConstantScorePrefixQuery to > : avoid getting exceptions from too many terms. > > Hmmm... except for the fact that the name would be even more missleading, > there's really no performance related reason why ConstantScoreRangeQuery > and ConstantScorePrefixQuery couldn't use the fieldNorms (if they exist) > when computing the score. the "constant score" part of their names > refered to not doing term expansion to find tf/idf factors ... but the > doc/field/length info encoded into the norms could still be factored into > the score fairly efficiently. Doug & I talked about this a while ago. At a minimum, it would require byte[maxDoc()] to store scores in a compressed 8 bit format. It would certainly impact performance too. -Yonik
Re: boost ignored with wildcard queries
: Doug & I talked about this a while ago. At a minimum, it would require
: byte[maxDoc()] to store scores in a compressed 8 bit format. It would
: certainly impact performance too.

Why would you have to store the scores? why not just add an optional byte[] norms param to ConstantScoreQuery, if it's null, things work as they currently do, if it's non null ConstantWeight.scorer returns a new subclass of ConstantScorer where score is implemented as...

public float score() throws IOException {
  return theScore * normDecoder[norms[doc] & 0xFF];
}

(where normDecoder is just like in TermScorer)

-Hoss
Re: boost ignored with wildcard queries
On Wed, Mar 5, 2008 at 4:27 PM, Chris Hostetter <[EMAIL PROTECTED]> wrote: > > : Doug & I talked about this a while ago. At a minimum, it would require > > : byte[maxDoc()] to store scores in a compressed 8 bit format. It would > : certainly impact performance too. > > Why would you have to store the scores? why not just add an optional > byte[]norms param to ConstantScoreQuery Ah, I see what you mean. Good idea. It doesn't handle score accumulation when multiple terms hit the same doc (prefix query), and doesn't balance lengthNorm with tf, but it's a lot better than nothing and still serves to pop index-boosted docs to the top. -Yonik >, if it's null, things work as they > currently do, if it's non null ConstantWeight.scorer returns a new > subclass of ConstantScorer where score is implemented as... > > public float score() throws IOException { > return theScore * normDecoder[norms[doc] & 0xFF]; > } > > (where normDecoder is just like in TermScorer)
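To make the snippet in this thread concrete, here is a toy model of the scheme being discussed: one byte per document plus a 256-entry decode table, so a constant-scoring query can still reflect index-time boosts. The encoding below is a simple linear quantization invented for illustration; Lucene's actual norm encoding (Similarity.encodeNorm/decodeNorm) uses its own 8-bit float format:

```python
# Hypothetical 256-entry decode table: byte value -> norm float.
MAX_NORM = 4.0
NORM_DECODER = [MAX_NORM * b / 255.0 for b in range(256)]

def constant_scores_with_norms(matching_docs, norms, the_score):
    """Scale a constant score by each doc's decoded norm, mirroring
    theScore * normDecoder[norms[doc] & 0xFF] from the thread."""
    return {doc: the_score * NORM_DECODER[norms[doc] & 0xFF]
            for doc in matching_docs}

norms = bytes([255, 128, 0, 64])  # per-document encoded norms (doc 2 doesn't match)
scores = constant_scores_with_norms([0, 1, 3], norms, the_score=1.0)
print(scores)
```

This captures why index-boosted docs pop to the top even though the query itself contributes a constant: the per-doc byte, not tf/idf, differentiates the scores.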
Re: Unparseable date
: According to the schema.xml-file "The format for this date field is of the : form 1995-12-31T23:59:59Z". : : Yet I'm getting the following error on SOME queries: : : Mar 5, 2008 10:32:53 AM org.apache.solr.common.SolrException log : SEVERE: java.lang.RuntimeException: java.text.ParseException: Unparseable : date: "2008-02-12T15:02:06Z" : at org.apache.solr.schema.DateField.toObject(DateField.java:173) : at org.apache.solr.schema.DateField.toObject(DateField.java:83) : at org.apache.solr.update.DocumentBuilder.loadStoredFields : (DocumentBuilder.java:285) : at : com.pjaol.search.solr.component.LocalSolrQueryComponent.luceneDocToSolrDoc(LocalSolrQueryComponent.java:403) : at : com.pjaol.search.solr.component.LocalSolrQueryComponent.mergeResultsDistances(LocalSolrQueryComponent.java:363)

Hmmm... this seems related to SOLR-470 in the sense that it has to do with reusing the same SimpleDateParser for more things than it was meant for ... looking at the current code for DateField.toObject(Fieldable) it seems inherently broken, attempting to parse a string right after concatenating 'Z' on the end even though the parser expects the Z to already be gone -- i'm not sure how this path could *ever* work, regardless of the input. ugh.

just to clarify, this stack trace doesn't look like you are actually doing a "query", it seems like it's happening during an "update" of some kind (using DocumentBuilder.loadStoredFields to populate a SolrDocument from a Document) .. can you elaborate on what you are doing here?

-Hoss
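The failure mode Hoss describes -- appending 'Z' and then handing the string to a parser whose pattern expects the 'Z' to already be stripped -- is easy to reproduce in miniature. Python is used here for illustration; the pattern is assumed to mirror a DateField-style yyyy-MM-dd'T'HH:mm:ss format with no trailing 'Z':

```python
from datetime import datetime

raw = "2008-02-12T15:02:06"
fmt = "%Y-%m-%dT%H:%M:%S"  # pattern expects NO trailing 'Z'

# Parsing the bare string works fine.
ok = datetime.strptime(raw, fmt)

# Concatenating 'Z' first, as the broken code path does, makes the same
# parser fail ("unconverted data remains") -- the analogue of the
# ParseException in the stack trace above.
try:
    datetime.strptime(raw + "Z", fmt)
    failed = False
except ValueError:
    failed = True
print(ok.isoformat(), failed)
```

Either the 'Z' must be stripped before parsing, or the pattern must account for it; doing neither consistently is exactly the bug described.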
Re: Unparseable date
It's stored in MySQL (datatype: datetime), then extracted and run through the following code: $date = substr($date, 0, 10) . "T" . substr($date, 11) . "Z"; If there were some odd chars at the end, I would have assumed they would have been included in the error message. SEVERE: java.lang.RuntimeException: java.text.ParseException: Unparseable date: "2008-02-12T15:02:06Z" at org.apache.solr.schema.DateField.toObject(DateField.java:173) Seems to imply that that's all it's getting.. Cheers, Daniel On Mar 5, 2008, at 5:40 PM, Ryan Grange wrote: Solr does use 24 hour dates. Are you positive there are no extraneous characters at the end of your date string such as carriage returns, spaces, or tabs? I have the same format in the code I've written and have never had a date parsing problem (yet). Ryan Grange, IT Manager DollarDays International, LLC [EMAIL PROTECTED] 480-922-8155 x106 Daniel Andersson wrote: Hi people I've got a date (& time) indexed with every document, defined as: multiValued="false" /> According to the schema.xml file "The format for this date field is of the form 1995-12-31T23:59:59Z".
Yet I'm getting the following error on SOME queries: Mar 5, 2008 10:32:53 AM org.apache.solr.common.SolrException log SEVERE: java.lang.RuntimeException: java.text.ParseException: Unparseable date: "2008-02-12T15:02:06Z" at org.apache.solr.schema.DateField.toObject(DateField.java:173) at org.apache.solr.schema.DateField.toObject(DateField.java:83) at org.apache.solr.update.DocumentBuilder.loadStoredFields(DocumentBuilder.java:285) at com.pjaol.search.solr.component.LocalSolrQueryComponent.luceneDocToSolrDoc(LocalSolrQueryComponent.java:403) at com.pjaol.search.solr.component.LocalSolrQueryComponent.mergeResultsDistances(LocalSolrQueryComponent.java:363) at com.pjaol.search.solr.component.LocalSolrQueryComponent.process(LocalSolrQueryComponent.java:305) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:158) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:118) at org.apache.solr.core.SolrCore.execute(SolrCore.java:944) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:326) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:278) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) at org.mortbay.jetty.Server.handle(Server.java:285) at
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442) Caused by: java.text.ParseException: Unparseable date: "2008-02-12T15:02:06Z" at java.text.DateFormat.parse(DateFormat.java:335) at org.apache.solr.schema.DateField.toObject(DateField.java:170) ... 27 more Could this be because we're using 24h instead of 12h? (the example seems to imply that 24h is what should be used though) Thanks in advance! Kind regards, Daniel
Re: Unparseable date
: looking at the current code for DateField.toObject(Fieldable) it seems : inherently broken, attempting to parse a string right after concatenating 'Z' : on the end even though the parser expects the Z to already be gone -- I'm : not sure how this code path could *ever* work, regardless of the input. Duh, I forgot a key aspect of DateFormat.parse(String): "The method may not use the entire text of the given string." ... which is why the code can work sometimes: the parser will happily ignore the 'Z' ... the exception is about the fact that there are no milliseconds (which are supposed to be optional), so this really is the exact same bug as SOLR-470 ... the problem is just more significant than I thought -- I assumed the only way to trigger this was a DateMath expression that didn't have milliseconds, but DateField.toObject(Fieldable) has the same bug, so direct usage of it (by code like DocumentBuilder.loadStoredFields) will also cause this problem if the original date didn't have any milliseconds. -Hoss
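Hoss's two observations -- parse() silently ignores trailing text it doesn't consume, but fails outright when the mandatory '.SSS' part of the pattern is missing -- can be reproduced in a few lines of standalone Java (the pattern is assumed to match the shape DateField uses internally):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

public class DateParseDemo {
    // The '.SSS' milliseconds are mandatory in the pattern, even though
    // Solr's docs treat them as optional in the input.
    static SimpleDateFormat newFormat() {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt;
    }

    static boolean parses(String s) {
        try {
            newFormat().parse(s);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Succeeds: parse() stops before the trailing 'Z', because the
        // method "may not use the entire text of the given string".
        System.out.println(parses("2008-02-12T15:02:06.000Z")); // true

        // Fails at the 'Z' where '.SSS' was expected -- the same
        // "Unparseable date" failure as in the stack trace above.
        System.out.println(parses("2008-02-12T15:02:06Z")); // false
    }
}
```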
Re: Unparseable date
On Mar 5, 2008, at 10:46 PM, Chris Hostetter wrote: : According to the schema.xml file "The format for this date field is of the form 1995-12-31T23:59:59Z". : : Yet I'm getting the following error on SOME queries: : : Mar 5, 2008 10:32:53 AM org.apache.solr.common.SolrException log : SEVERE: java.lang.RuntimeException: java.text.ParseException: Unparseable date: "2008-02-12T15:02:06Z" : at org.apache.solr.schema.DateField.toObject(DateField.java:173) : at org.apache.solr.schema.DateField.toObject(DateField.java:83) : at org.apache.solr.update.DocumentBuilder.loadStoredFields(DocumentBuilder.java:285) : at com.pjaol.search.solr.component.LocalSolrQueryComponent.luceneDocToSolrDoc(LocalSolrQueryComponent.java:403) : at com.pjaol.search.solr.component.LocalSolrQueryComponent.mergeResultsDistances(LocalSolrQueryComponent.java:363) Hmmm... this seems related to SOLR-470 in the sense that it has to do with reusing the same SimpleDateParser for more things than it was meant for ... looking at the current code for DateField.toObject(Fieldable) it seems inherently broken, attempting to parse a string right after concatenating 'Z' on the end even though the parser expects the Z to already be gone -- I'm not sure how this code path could *ever* work, regardless of the input. ugh. just to clarify, this stack trace doesn't look like you are actually doing a "query", it seems like it's happening during an "update" of some kind (using DocumentBuilder.loadStoredFields to populate a SolrDocument from a Document) .. can you elaborate on what you are doing here? It is a query, which is run through LocalSolr/LocalLucene. CC'ing them in, since it seems you're suggesting that they might be re-using something incorrectly..? Cheers, Daniel
Re: Unparseable date
On Mar 5, 2008, at 10:57 PM, Chris Hostetter wrote: : looking at the current code for DateField.toObject(Fieldable) it seems : inherently broken, attempting to parse a string right after concatenating 'Z' : on the end even though the parser expects the Z to already be gone -- I'm : not sure how this code path could *ever* work, regardless of the input. Duh, I forgot a key aspect of DateFormat.parse(String): "The method may not use the entire text of the given string." ... which is why the code can work sometimes: the parser will happily ignore the 'Z' ... the exception is about the fact that there are no milliseconds (which are supposed to be optional), so this really is the exact same bug as SOLR-470 ... the problem is just more significant than I thought -- I assumed the only way to trigger this was a DateMath expression that didn't have milliseconds, but DateField.toObject(Fieldable) has the same bug, so direct usage of it (by code like DocumentBuilder.loadStoredFields) will also cause this problem if the original date didn't have any milliseconds. So if I add :00 to every time, it should be fine? ie "2008-02-12T15:02:06:00Z" instead of "2008-02-12T15:02:06Z" Cheers, Daniel
Re: Unparseable date
: So if I add :00 to every time, it should be fine? : ie : "2008-02-12T15:02:06:00Z" instead of "2008-02-12T15:02:06Z" It's ".000" not ":00" ... "2008-02-12T15:02:06.000Z" but like I said: that stack trace is odd, the time doesn't seem like it actually comes from any query params, it looks like it's coming from a previously indexed doc. To work around this you may need to reindex all of your docs with those optional milliseconds. -Hoss
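For producing that full ".000" form from client code, a minimal sketch in Java (the original poster's app uses PHP; this only illustrates the target format and the need for a UTC timezone):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class SolrDateFormat {
    // Formats a Date into the full "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'" form,
    // which the parser discussed in this thread can always read back.
    static String toSolr(Date d) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
        // Without this, the format would use the JVM's local timezone,
        // silently shifting every indexed time.
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(d);
    }

    public static void main(String[] args) {
        System.out.println(toSolr(new Date(0L))); // 1970-01-01T00:00:00.000Z
    }
}
```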
Re: Unparseable date
On Mar 5, 2008, at 11:08 PM, Chris Hostetter wrote: It's ".000" not ":00" ... "2008-02-12T15:02:06.000Z" but like I said: that stack trace is odd, the time doesn't seem like it actually comes from any query params, it looks like it's coming from a previously indexed doc. To work around this you may need to reindex all of your docs with those optional milliseconds. Ah, re-indexing now. Thanks for your help! / d
Re: Ranking search results by content type
: Or you can do what I do and when you search, just weight each type : differently. My types are all just one letter, so for instance: : : q=((search string) AND type:A^1) OR ((search string) AND type:B^10) OR etc etc a simpler approach would be... q = +(search string) type:A^1 type:B^10 type:C^3.5 ... this is what the "bq" param was designed for by the way... q = search string bq = type:A^1 type:B^10 type:C^3.5 -Hoss
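The "bq" (boost query) parameter Hoss mentions belongs to the DisMax request handler; one way to apply it is to bake the boosts into a handler's defaults in solrconfig.xml. A sketch, assuming a Solr 1.2/1.3-era configuration (the handler name and boost values are illustrative, not from the thread):

```xml
<!-- Illustrative solrconfig.xml fragment: boost documents by type
     via a default "bq" so clients need not repeat it per query. -->
<requestHandler name="dismax" class="solr.DisMaxRequestHandler">
  <lst name="defaults">
    <str name="bq">type:A^1 type:B^10 type:C^3.5</str>
  </lst>
</requestHandler>
```

Clients can still override the default by passing their own bq parameter on the request.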
Re: passing params into SOLR
Paul Treszczotko wrote: Hi all, I'm using solrJ to build a wrapper for ColdFusion (a scripting language, like PHP). What's the best practice for passing search parameters into Solr from a web app? What are the shortcomings of each approach? Currently, I'm explicitly setting the params with solrQuery.setParam("name","value") and solrQuery.addFacetField("facet","value") etc. How would I go about passing a valid query string into Solr? Do I need to 'decompose' it into parameters and then set them with setParam()s, or is there a method that will take the entire URL and execute it as is? I'm not sure I follow what the problem is. The solrj API takes a SolrParams argument. SolrQuery is a subclass of SolrParams that has helper functions to set the standard params. For example: public void setQuery(String query) { this.set(CommonParams.Q, query); } public String getQuery() { return this.get(CommonParams.Q); } So it is exactly equivalent to use: solrQuery.setQuery( "hello" ); or solrQuery.set( "q", "hello" ) though stylistically the former may be nicer and easier to maintain. --- Also, when you post a message to the group, start a new message rather than replying to another one -- it makes things hard to follow. http://people.apache.org/~hossman/#threadhijack ryan Paul Treszczotko Architect, Client Systems INPUT 11720 Plaza America Drive, Suite 1200 Reston, Virginia 20190 Direct: 703-707-3524; Fax 703-707-6201
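The equivalence ryan describes -- typed helpers that are thin wrappers over a generic name/value store -- can be modeled without solrj on the classpath. MiniParams below is a stand-in written for illustration, not the real SolrParams/SolrQuery API:

```java
import java.util.HashMap;
import java.util.Map;

public class MiniParams {
    // Generic name->value store, playing the role of SolrParams.
    private final Map<String, String> params = new HashMap<String, String>();

    public void set(String name, String value) { params.put(name, value); }
    public String get(String name) { return params.get(name); }

    // Typed helper, playing the role of SolrQuery.setQuery(...): it just
    // delegates to the generic setter with a fixed key.
    public void setQuery(String query) { set("q", query); }
    public String getQuery() { return get("q"); }

    public static void main(String[] args) {
        MiniParams a = new MiniParams();
        a.setQuery("hello");          // typed route

        MiniParams b = new MiniParams();
        b.set("q", "hello");          // generic route

        // Both routes store the same parameter.
        System.out.println(a.get("q").equals(b.getQuery())); // true
    }
}
```

The typed route is easier to maintain (the key lives in one place); the generic route is what lets a wrapper pass through arbitrary decomposed query-string parameters.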
Some problem with ShowFileRequestHandler
I want to programmatically retrieve the schema and the config from the ShowFileRequestHandler, but I've run into some trouble. There are CJK characters in the xml files, as follows: > > 记录号 > But I get a confusing response from solr using "/admin/file/?file=schema.xml": IE and Firefox both report parse errors. I tried "/admin/file/?file=schema.x&contentType=text/plain" and I get the same result, as follows: > > ?/uniqueKey> BTW: The xml files are encoded in UTF-8 and they work fine when I open them locally using IE. And I set tomcat's 8080 connector "URIEncoding" argument to "UTF-8" too. So is there something I'm missing? Or is it a bug? Every reply would be appreciated.
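One likely failure mode (an assumption -- the report doesn't show the response headers) is that the UTF-8 bytes of the file are being decoded with a single-byte charset somewhere between the servlet container and the browser, e.g. a response served without an explicit charset and read back as ISO-8859-1. A self-contained illustration of that mismatch:

```java
import java.nio.charset.Charset;

public class CharsetDemo {
    static final Charset UTF8 = Charset.forName("UTF-8");
    static final Charset LATIN1 = Charset.forName("ISO-8859-1");

    // Encodes a string as UTF-8 (what's on disk / on the wire) and decodes
    // it with some charset, as a misconfigured reader might.
    static String roundTrip(String s, Charset decodeAs) {
        return new String(s.getBytes(UTF8), decodeAs);
    }

    public static void main(String[] args) {
        // "\u8bb0\u5f55\u53f7" is the CJK field name from the schema (记录号).
        String original = "\u8bb0\u5f55\u53f7";

        // Wrong charset on the read side: each UTF-8 byte becomes a
        // separate garbage character, so the text no longer matches.
        System.out.println(roundTrip(original, LATIN1).equals(original)); // false

        // Matching charsets round-trip cleanly.
        System.out.println(roundTrip(original, UTF8).equals(original)); // true
    }
}
```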