Re: detecting duplicates using the field type 'text'
: id : document_title : whoa... that's a pretty out-there use case ... i don't think i've ever seen someone use their uniqueKey field as the target of a copyField. off the top of my head, i suspect maybe the copyField is taking place after the duplicate detection? ... but i'm not sure... : When I add a document with a duplicate title (numeric only), it does not : get duplicated ...and now i'm *really* not sure, that doesn't make much sense to me at all. : I can ensure duplicates DO NOT get added when using the field type : 'string'. hmm... could you perhaps add the value directly to your "id" field (string) and then copyField it into document_title? based on what you've said, that should work -- although i would agree, what you describe when using your current schema definitely sounds like a bug. it would be great if you could open a Jira issue describing this problem ... it would be even better if after posting the issue you could make fixing it easier by attaching a test case. :) -Hoss
highlight exception
I have thousands of docs in my solr instance. The following doc (maybe others) causes an exception every time highlighting is turned on.
Best buy - Acer Aspire AS5610-2273 - $599. Windows vista, 1 GB RAM
The exception is like this:
java.lang.StringIndexOutOfBoundsException: String index out of range: -52
    at java.lang.String.substring(String.java:1768)
    at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:235)
    at org.apache.solr.util.HighlightingUtils.doHighlighting(HighlightingUtils.java:252)
    at org.apache.solr.request.StandardRequestHandler.handleRequest(StandardRequestHandler.java:161)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:587)
    at org.apache.solr.servlet.SolrServlet.doGet(SolrServlet.java:92)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
    at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
    at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
    at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
    at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
    at java.lang.Thread.run(Thread.java:595)
This exception only occurs when highlighting is on and the above doc is in the response. So for example, these three requests all cause the exception:
hl=on&hl.fl=topicTitle&hl.fragsize=0&hl.simple.pre=&hl.simple.post=&q=topicTitle:best+buy;replies desc&start=40&rows=10
hl=on&hl.fl=topicTitle&hl.fragsize=0&hl.simple.pre=&hl.simple.post=&q=topicTitle:acer;replies desc&start=0&rows=10
hl=on&hl.fl=topicTitle&hl.fragsize=0&hl.simple.pre=&hl.simple.post=&q=topicTitle:vista;replies desc&start=60&rows=10
Below is the field definition for topicTitle. What's so special about the above doc?
-- View this message in context: http://www.nabble.com/highlight-exception-tf3234528.html#a8987980 Sent from the Solr - User mailing list archive at Nabble.com.
Re: Tagging
On 2/15/07, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> One way around this is to get support for ParallelReader (I believe ParallelWriter is still in JIRA, contributed by Chuck) into Solr.
> http://lucene.apache.org/java/docs/api/org/apache/lucene/index/ParallelReader.html
> Then you'd keep your big fields in one index, and the frequently modified and shorter fields in another index. But I never understood how you'd keep doc IDs in sync between the two, which is something that ParallelReader requires.
Aye, that's the rub. ParallelReader keeps popping into my head too, but then I think about what it takes to keep those IDs in sync, and it seems like everything needs to be re-indexed in the smaller index on a change to that index. It doesn't seem easy or fast/scalable. I'd love to know what Chuck is doing with this stuff.
-Yonik
Re: Tagging
On Feb 15, 2007, at 2:55 AM, Otis Gospodnetic wrote:
> Then you'd keep your big fields in one index, and the frequently modified and shorter fields in another index. But I never understood how you'd keep doc IDs in sync between the two, which is something that ParallelReader requires.
I've never understood that either. I'd love to hear more about how folks use it. Doug elaborated on it once, but *woosh* over my head. :)
Erik
Re: highlight exception
Thanks for the report Nick, could you open a JIRA bug for this? Thanks, -Yonik
Re: highlight exception
On 2/15/07, nick19701 <[EMAIL PROTECTED]> wrote:
> Best buy - Acer Aspire AS5610-2273 - $599. Windows vista, 1 GB RAM
Doesn't look particularly out of the ordinary.
> The exception is like this:
> java.lang.StringIndexOutOfBoundsException: String index out of range: -52
>     at java.lang.String.substring(String.java:1768)
>     at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:235)
>     at org.apache.solr.util.HighlightingUtils.doHighlighting(HighlightingUtils.java:252)
Corresponds to:
    startOffset = tokenGroup.matchStartOffset;
    endOffset = tokenGroup.matchEndOffset;
    tokenText = text.substring(startOffset, endOffset);
where the offsets are token offsets from analysis, and should not be -52. Are you using term vectors? Is the field multi-valued? Also, what version of Solr are you using?
Could you c&p the output of verbose analysis of this text in the solr admin?
thanks,
-Mike
Re: Tagging
I explicitly asked on java-user once, "Hey, how does/can this thing work, blah blah", but got no responses. As far as I know, Chuck is the only ParallelReader user. :)
Otis
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/ - Tag - Search - Share
----- Original Message -----
From: Erik Hatcher <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Friday, February 16, 2007 12:49:08 AM
Subject: Re: Tagging
On Feb 15, 2007, at 2:55 AM, Otis Gospodnetic wrote:
> Then you'd keep your big fields in one index, and the frequently
> modified and shorter fields in another index. But I never
> understood how you'd keep doc IDs in sync between the two, which is
> something that ParallelReader requires.
I've never understood that either. I'd love to hear more about how folks use it. Doug elaborated on it once, but *woosh* over my head. :)
Erik
Range search problem on float values
Hi all, I'm having a problem doing a range search on float values. The field types for longitude and latitude were text; I then changed them to float to give it a try, but I'm still having problems. The correct search string would be:
latitude:[32.71852 TO 32.792765] AND longitude:[-117.159316 TO -116.966504]
which doesn't work, but if I invert the longitude bounds:
latitude:[32.71852 TO 32.792765] AND longitude:[-116.966504 TO -117.159316]
it works fine, even though that isn't the correct way of writing the range. Any thoughts? Thanks.
Re: highlight exception
Mike Klaas wrote:
> Corresponds to:
>     startOffset = tokenGroup.matchStartOffset;
>     endOffset = tokenGroup.matchEndOffset;
>     tokenText = text.substring(startOffset, endOffset);
> where the offsets are token offsets from analysis, and should not be -52. Are you using term vectors? Is the field multi-valued? Also, what version of Solr are you using?
> Could you c&p the output of verbose analysis of this text in the solr admin?
> thanks,
> -Mike
As far as I know, I'm not using term vectors and this field is single-valued. Solr version is 1.1.0 dated on 12/17/2006. Below is the verbose analysis:
Index Analyzer
org.apache.solr.analysis.WhitespaceTokenizerFactory {}
term position      1 2 3 4 5 6 7 8 9 10 11 12 13
term text          Best buy - Acer Aspire AS5610-2273 - $599. Windows vista, 1 GB RAM
term type          word word word word word word word word word word word word word
source start,end   0,4 5,8 9,10 11,15 16,22 23,34 35,36 37,42 43,50 51,57 58,59 60,62 63,66
org.apache.solr.analysis.SynonymFilterFactory {expand=true, ignoreCase=true, synonyms=index_synonyms.txt}
term position      1 2 3 4 5 6 7 8 9 10 11 12 13
term text          bestbuy buy - Acer Aspire AS5610-2273 - $599. Windows vista, 1 GB RAM
                   bb gib
                   best gigabyte
                   gigabytes
term type          word word word word word word word word word word word word word
                   word word
                   word word
                   word
source start,end   0,8 0,8 9,10 11,15 16,22 23,34 35,36 37,42 43,50 51,57 58,59 60,8 63,66
                   0,8 60,8
                   0,8 60,8
                   60,8
org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt, ignoreCase=true}
term position      1 2 3 4 5 6 7 8 9 10 11 12 13
term text          bestbuy buy - Acer Aspire AS5610-2273 - $599. Windows vista, 1 GB RAM
                   bb gib
                   best gigabyte
                   gigabytes
term type          word word word word word word word word word word word word word
                   word word
                   word word
                   word
source start,end   0,8 0,8 9,10 11,15 16,22 23,34 35,36 37,42 43,50 51,57 58,59 60,8 63,66
                   0,8 60,8
                   0,8 60,8
                   60,8
org.apache.solr.analysis.WordDelimiterFilterFactory {catenateWords=1, catenateNumbers=1, catenateAll=0, generateNumberParts=1, generateWordParts=1}
term position      1 2 3 4 5 6 7 8 9 10 11 12 13
term text          bestbuy buy Acer Aspire AS 5610 2273 599 Windows vista 1 GB RAM
                   bb 56102273 gib
                   best gigabyte
                   gigabytes
term type          word word word word word word word word word word word word word
                   word word word
                   word word
                   word
source start,end   0,8 0,8 11,15 16,22 23,25 25,29 30,34 38,41 43,50 51,56 58,59 60,8 63,66
                   0,8 25,34 60,8
                   0,8 60,8
                   60,8
org.apache.solr.analysis.LowerCaseFilterFactory {}
term position      1 2 3 4 5 6 7 8 9 10 11 12 13
term text          bestbuy buy acer aspire as 5610 2273 599 windows vista 1 gb ram
                   bb 56102273 gib
                   best gigabyte
                   gigabytes
term type          word word word word word word word word word word word word word
                   word word word
                   word word
                   word
source start,end   0,8 0,8 11,15 16,22 23,25 25,29 30,34 38,41 43,50 51,56 58,59 60,8 63,66
                   0,8 25,34 60,8
                   0,8 60,8
                   60,8
org.apache.solr.analysis.EnglishPorterFilterFactory {protected=protwords.txt}
term position      1 2 3 4 5 6 7 8 9 10 11 12 13
term text          bestbuy buy acer aspir as 5610 2273 599 window vista 1 gb ram
                   bb 56102273 gib
                   best gigabyt
                   gigabyt
term type          word word word word word word word word word word word word word
                   word word word
                   word word
                   word
source start,end   0,8 0,8 11,15 16,22 23,25 25,29 30,34 38,41 43,50 51,56 58,59 60,8 63,66
                   0,8 25,34 60,8
                   0,8 60,8
                   60,8
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
term position      1 2 3 4 5 6 7 8 9 10 11 12 13
term text          bestbuy buy acer aspir
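A side note on the numbers, offered as a guess rather than a diagnosis: several tokens in the analysis output above are listed with a source start,end of 60,8, that is, an end offset smaller than the start offset. On the Java versions in use at the time, String.substring(begin, end) reported end - begin in its exception message when end was less than begin, and 8 - 60 = -52, which matches the trace. The sketch below only demonstrates that arithmetic on the posted title. The class and both helper methods are made up for illustration and are not part of Lucene or Solr, and newer JDKs word the exception message differently.

```java
final class OffsetCheck {
    // Title text exactly as posted in the thread (66 characters).
    static final String TITLE =
        "Best buy - Acer Aspire AS5610-2273 - $599. Windows vista, 1 GB RAM";

    // Hypothetical guard: returns the token text, or null when the
    // offsets cannot be used with substring().
    static String safeTokenText(String text, int startOffset, int endOffset) {
        if (startOffset < 0 || endOffset > text.length() || startOffset > endOffset) {
            return null;
        }
        return text.substring(startOffset, endOffset);
    }

    // True if text.substring(start, end) throws for this offset pair.
    static boolean substringThrows(String text, int start, int end) {
        try {
            text.substring(start, end);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Offsets 60,62 come from the tokenizer stage of the analysis output.
        System.out.println(safeTokenText(TITLE, 60, 62));
        // Offsets 60,8 come from the later stages of the analysis output.
        System.out.println(substringThrows(TITLE, 60, 8));
        // The difference that older JDKs put in the exception message.
        System.out.println(8 - 60);
    }
}
```

If this reading is right, the interesting question is which filter first produces the 60,8 pair; in the posted output it first appears after SynonymFilterFactory.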
Re: Range search problem on float values
On 2/15/07, Peter McPeterson <[EMAIL PROTECTED]> wrote:
> Hi all, I'm having a problem doing a range search on float values. The field types for longitude and latitude were text, then I changed to float to give it a try but I'm still having problems. The correct search string would be:
> latitude:[32.71852 TO 32.792765] AND longitude:[-117.159316 TO -116.966504]
> which doesn't work
Did you re-index after you changed the field type? If you compare the field values as text, -116 comes before -117
-Yonik
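To make the point about text ordering concrete, here is a small sketch in plain Java with no Solr involved; the class and method names are illustrative only. Compared character by character, "-116.966504" sorts before "-117.159316" because '6' precedes '7', so a range written from -117.159316 to -116.966504 has its lower bound after its upper bound when the values are treated as strings.

```java
final class TextRangeOrder {
    // True if lower <= upper under plain lexicographic (string) comparison.
    static boolean isOrderedAsText(String lower, String upper) {
        return lower.compareTo(upper) <= 0;
    }

    // True if lower <= upper when both bounds are parsed as numbers.
    static boolean isOrderedAsNumber(String lower, String upper) {
        return Double.parseDouble(lower) <= Double.parseDouble(upper);
    }

    public static void main(String[] args) {
        String west = "-117.159316";
        String east = "-116.966504";
        System.out.println(isOrderedAsNumber(west, east)); // numeric view of the bounds
        System.out.println(isOrderedAsText(west, east));   // string view of the same bounds
        System.out.println(isOrderedAsText(east, west));   // the "inverted" query from the question
    }
}
```

The latitude bounds in the same query happen to be ordered the same way under both comparisons, which would explain why only the negative longitude range needed inverting.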
Re: Range search problem on float values
: Hi all, I'm having a problem doing a range search on float values. The field : types for longitude and latitude were text, then I changed to float to give : it a try but I'm still having problems. are you using "float" or "sfloat" ... float stores floats, but doesn't do the super special magic sauce that makes them sort properly (which is necessary for doing range queries) -Hoss
Re: Range search problem on float values
Yonik, yes, I did re-index the data after changing the field type. And Chris, yes, I am using float. Any other thoughts on what could be causing it to behave this way? Such weird behaviour. Thanks.
From: Chris Hostetter <[EMAIL PROTECTED]>
Reply-To: solr-user@lucene.apache.org
To: solr-user@lucene.apache.org
Subject: Re: Range search problem on float values
Date: Thu, 15 Feb 2007 21:02:05 -0800 (PST)
: Hi all, I'm having a problem doing a range search on float values. The field
: types for longitude and latitude were text, then I changed to float to give
: it a try but I'm still having problems.
are you using "float" or "sfloat" ... float stores floats, but doesn't do the super special magic sauce that makes them sort properly (which is necessary for doing range queries)
-Hoss
Re: Range search problem on float values
On 2/16/07, Peter McPeterson <[EMAIL PROTECTED]> wrote:
> Yonik, yes, I did re-index the data after changing the field type. And Chris, yes, I am using float.
Ah ha... the comments in the example schema say it all. You need sfloat if you need range queries. Slightly confusing to have these different numeric types, I know... it's because lucene sort of had a type of float for sorting purposes.
-Yonik
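For readers wondering what a sortable encoding buys you, here is a rough sketch of one well-known trick for turning a float into a string whose lexicographic order matches numeric order. It is an illustration only and is not claimed to be the encoding that sfloat uses internally; the class and method names are invented for this example.

```java
final class SortableFloatSketch {
    // Maps a float to an 8-character hex key whose string order
    // follows the numeric order of the original values.
    static String toSortableKey(float value) {
        int bits = Float.floatToIntBits(value);
        // Negative floats: flip the magnitude bits so that larger
        // magnitudes (more negative values) produce smaller keys.
        if (bits < 0) {
            bits ^= 0x7fffffff;
        }
        // Flip the sign bit so negatives sort before positives
        // once the bits are printed as unsigned hex.
        bits ^= 0x80000000;
        return String.format("%08x", bits);
    }

    // Range check done purely with string comparison on the keys.
    static boolean inRange(float value, float lower, float upper) {
        String key = toSortableKey(value);
        return toSortableKey(lower).compareTo(key) <= 0
            && key.compareTo(toSortableKey(upper)) <= 0;
    }

    public static void main(String[] args) {
        // Longitude bounds taken from the query in this thread.
        float lower = -117.159316f;
        float upper = -116.966504f;
        System.out.println(inRange(-117.0f, lower, upper));
        System.out.println(inRange(-116.5f, lower, upper));
    }
}
```

With keys like these, the range from -117.159316 to -116.966504 can be written in its natural order and still be evaluated as a plain term range, which is the property a range query on an indexed field needs.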
Re: Range search problem on float values
Ah ha! Awesome thanks.
From: "Yonik Seeley" <[EMAIL PROTECTED]>
Reply-To: solr-user@lucene.apache.org
To: solr-user@lucene.apache.org
Subject: Re: Range search problem on float values
Date: Fri, 16 Feb 2007 01:01:33 -0500
On 2/16/07, Peter McPeterson <[EMAIL PROTECTED]> wrote:
> Yonik, yes, I did re-index the data after changing the field type. And Chris, yes, I am using float.
Ah ha... the comments in the example schema say it all. You need sfloat if you need range queries. Slightly confusing to have these different numeric types, I know... it's because lucene sort of had a type of float for sorting purposes.
sortMissingLast="true" omitNorms="true"/>
sortMissingLast="true" omitNorms="true"/>
sortMissingLast="true" omitNorms="true"/>
sortMissingLast="true" omitNorms="true"/>
-Yonik