Re: detecting duplicates using the field type 'text'

2007-02-15 Thread Chris Hostetter
:  id
:  document_title
:  

whoa... that's a pretty out there use case ... i don't think i've ever seen
someone use their uniqueKey field as the target of a copyField.

off the top of my head, i suspect maybe the copy field is taking place
after the duplicate detection? ... but i'm not sure...

: When I add a document with a duplicate title (numeric only), it does not
: get duplicated

...and now i'm *really* not sure, that doesn't make much sense to me at
all.

: I can ensure duplicates DO NOT get added when using the field type
: 'string'.

hmm... could you perhaps add the value directly to your "id" field
(string) and then copyField it into document_title ?  based on what
you've said, that should work -- although i would agree, what you describe
when using your current schema definitely sounds like a bug.

it would be great if you could open a Jira issue describing this problem
... it would be even better if after posting the issue you could
make fixing it easier by attaching a test case. :)



-Hoss



highlight exception

2007-02-15 Thread nick19701

I have thousands of docs in my solr instance.
The following doc (maybe others) is causing an exception every time
highlight is turned on.



Best buy - Acer Aspire AS5610-2273 - $599. Windows vista, 1 GB RAM



The exception is like this:

java.lang.StringIndexOutOfBoundsException: String index out of range: -52
at java.lang.String.substring(String.java:1768)
at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:235)
at org.apache.solr.util.HighlightingUtils.doHighlighting(HighlightingUtils.java:252)
at org.apache.solr.request.StandardRequestHandler.handleRequest(StandardRequestHandler.java:161)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:587)
at org.apache.solr.servlet.SolrServlet.doGet(SolrServlet.java:92)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
at java.lang.Thread.run(Thread.java:595)

This exception only occurs when highlight is on and the above doc is in the
response.
So for example, these three requests all cause the exception:

hl=on&hl.fl=topicTitle&hl.fragsize=0&hl.simple.pre=&hl.simple.post=&q=topicTitle:best+buy;replies desc&start=40&rows=10

hl=on&hl.fl=topicTitle&hl.fragsize=0&hl.simple.pre=&hl.simple.post=&q=topicTitle:acer;replies desc&start=0&rows=10

hl=on&hl.fl=topicTitle&hl.fragsize=0&hl.simple.pre=&hl.simple.post=&q=topicTitle:vista;replies desc&start=60&rows=10


Below is the field definition for topicTitle. What's so special about the
above doc?

[field definition: the XML markup was stripped by the archive]


-- 
View this message in context: 
http://www.nabble.com/highlight-exception-tf3234528.html#a8987980
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Tagging

2007-02-15 Thread Yonik Seeley

On 2/15/07, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

One way around this is to get support for ParallelReader (I believe 
ParallelWriter is still in JIRA, contributed by Chuck) into Solr.
http://lucene.apache.org/java/docs/api/org/apache/lucene/index/ParallelReader.html

Then you'd keep your big fields in one index, and the frequently modified and 
shorter fields in another index.  But I never understood how you'd keep doc IDs 
in sync between the two, which is something that ParallelReader requires.


Aye, that's the rub.

ParallelReader keeps popping into my head too, but then I think about
what it takes to keep those ids in sync, and it seems like everything
needs to be re-indexed in the smaller index on a change to that index.
It doesn't seem easy or fast/scalable.  I'd love to know what Chuck
is doing with this stuff.

-Yonik
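
To make the doc ID concern above concrete, here is a toy model in plain Java. It is not the Lucene API: the two lists merely stand in for the "big fields" index and the "frequently modified" index, and the class and method names are made up for illustration. It assumes that an update is a delete followed by an add, so the updated document receives a new number at the end of its index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ParallelDocIdDemo {
    // Toy stand-in for an index: a document's number is its slot in the list,
    // and null marks a deleted document. Returns the document's new number.
    static int update(List<String> index, int docNum, String newValue) {
        index.set(docNum, null); // delete the old version
        index.add(newValue);     // re-add: the new version lands at the end
        return index.size() - 1;
    }

    public static void main(String[] args) {
        List<String> big = new ArrayList<>(Arrays.asList("body A", "body B", "body C"));
        List<String> small = new ArrayList<>(Arrays.asList("tags A", "tags B", "tags C"));

        // Change only the small, frequently modified fields of document 1.
        int moved = update(small, 1, "tags B v2");

        // Document 1 of the big index no longer has a partner at slot 1,
        // and the updated tags now sit at a number the big index lacks.
        System.out.println("big[1]   = " + big.get(1));
        System.out.println("small[1] = " + small.get(1));
        System.out.println("updated tags moved to doc " + moved
                + ", big index has " + big.size() + " docs");
    }
}
```

In this model the only way to realign the two lists is to rebuild the small one in the same order as the big one, which is the re-indexing cost described above.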


Re: Tagging

2007-02-15 Thread Erik Hatcher


On Feb 15, 2007, at 2:55 AM, Otis Gospodnetic wrote:
Then you'd keep your big fields in one index, and the frequently  
modified and shorter fields in another index.  But I never  
understood how you'd keep doc IDs in sync between the two, which is  
something that ParallelReader requires.


I've never understood that either.  I'd love to hear more about how  
folks use it.  Doug elaborated on it once, but *woosh* over my head. :)


Erik



Re: highlight exception

2007-02-15 Thread Yonik Seeley

Thanks for the report Nick,
could you open a JIRA bug for this?
Thanks,
-Yonik





Re: highlight exception

2007-02-15 Thread Mike Klaas

On 2/15/07, nick19701 <[EMAIL PROTECTED]> wrote:




Best buy - Acer Aspire AS5610-2273 - $599. Windows vista, 1 GB RAM




Doesn't look particularly out of the ordinary.


The exception is like this:

java.lang.StringIndexOutOfBoundsException: String index out of range: -52
at java.lang.String.substring(String.java:1768)
at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:235)
at org.apache.solr.util.HighlightingUtils.doHighlighting(HighlightingUtils.java:252)


Corresponds to:
   startOffset = tokenGroup.matchStartOffset;
   endOffset = tokenGroup.matchEndOffset;
   tokenText = text.substring(startOffset, endOffset);

where the offsets are token offsets from analysis, and should not be
-52.  Are you using term vectors?  Is the field multi-valued?  Also,
what version of Solr are you using?

Could you c&p the output of verbose analysis of this text in the solr admin?

thanks,
-Mike


Re: Tagging

2007-02-15 Thread Otis Gospodnetic
I explicitly asked on java-user once, "Hey, how does/can this thing 
work, blah blah", but got no responses.  As far as I know, Chuck is the only 
ParallelReader user. :)

Otis
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share







Range search problem on float values

2007-02-15 Thread Peter McPeterson
Hi all, I'm having a problem doing a range search on float values. The field 
types for longitude and latitude were text, then I changed to float to give 
it a try but I'm still having problems.


The correct search string would be:
latitude:[32.71852 TO 32.792765] AND longitude:[-117.159316 TO -116.966504]
which doesn't work

but if I invert the longitude:
latitude:[32.71852 TO 32.792765] AND longitude:[-116.966504 TO -117.159316]
it works fine, but isn't the correct way of doing it

Any thoughts?

Thanks.





Re: highlight exception

2007-02-15 Thread nick19701


Mike Klaas wrote:
> 
> Corresponds to:
>    startOffset = tokenGroup.matchStartOffset;
>    endOffset = tokenGroup.matchEndOffset;
>    tokenText = text.substring(startOffset, endOffset);
> 
> where the offsets are token offsets from analysis, and should not be
> -52.  Are you using term vectors?  Is the field multi-valued?  Also,
> what version of Solr are you using?
> 
> Could you c&p the output of verbose analysis of this text in the solr
> admin?
> 
> thanks,
> -Mike
> 
> 

As far as I know, I'm not using term vectors and this field is
single-valued.
Solr version is 1.1.0 dated on 12/17/2006.

Below is the verbose analysis:

Index Analyzer
org.apache.solr.analysis.WhitespaceTokenizerFactory   {}

term position
1  2  3  4  5  6  7  8  9  10  11  12  13

term text
Best  buy  -  Acer  Aspire  AS5610-2273  -  $599.  Windows  vista,  1  GB  RAM

term type
word  word  word  word  word  word  word  word  word  word  word  word  word

source start,end
0,4  5,8  9,10  11,15  16,22  23,34  35,36  37,42  43,50  51,57  58,59  60,62  63,66


org.apache.solr.analysis.SynonymFilterFactory   {expand=true, ignoreCase=true, synonyms=index_synonyms.txt}

term position
1  2  3  4  5  6  7  8  9  10  11  12  13

term text
bestbuy  buy  -  Acer  Aspire  AS5610-2273  -  $599.  Windows  vista,  1  GB  RAM
bb  gib
best  gigabyte
gigabytes

term type
word  word  word  word  word  word  word  word  word  word  word  word  word
word  word
word  word
word

source start,end
0,8  0,8  9,10  11,15  16,22  23,34  35,36  37,42  43,50  51,57  58,59  60,8  63,66
0,8  60,8
0,8  60,8
60,8


org.apache.solr.analysis.StopFilterFactory   {words=stopwords.txt, ignoreCase=true}

term position
1  2  3  4  5  6  7  8  9  10  11  12  13

term text
bestbuy  buy  -  Acer  Aspire  AS5610-2273  -  $599.  Windows  vista,  1  GB  RAM
bb  gib
best  gigabyte
gigabytes

term type
word  word  word  word  word  word  word  word  word  word  word  word  word
word  word
word  word
word

source start,end
0,8  0,8  9,10  11,15  16,22  23,34  35,36  37,42  43,50  51,57  58,59  60,8  63,66
0,8  60,8
0,8  60,8
60,8


org.apache.solr.analysis.WordDelimiterFilterFactory   {catenateWords=1, catenateNumbers=1, catenateAll=0, generateNumberParts=1, generateWordParts=1}

term position
1  2  3  4  5  6  7  8  9  10  11  12  13

term text
bestbuy  buy  Acer  Aspire  AS  5610  2273  599  Windows  vista  1  GB  RAM
bb  56102273  gib
best  gigabyte
gigabytes

term type
word  word  word  word  word  word  word  word  word  word  word  word  word
word  word  word
word  word
word

source start,end
0,8  0,8  11,15  16,22  23,25  25,29  30,34  38,41  43,50  51,56  58,59  60,8  63,66
0,8  25,34  60,8
0,8  60,8
60,8


org.apache.solr.analysis.LowerCaseFilterFactory   {}

term position
1  2  3  4  5  6  7  8  9  10  11  12  13

term text
bestbuy  buy  acer  aspire  as  5610  2273  599  windows  vista  1  gb  ram
bb  56102273  gib
best  gigabyte
gigabytes

term type
word  word  word  word  word  word  word  word  word  word  word  word  word
word  word  word
word  word
word

source start,end
0,8  0,8  11,15  16,22  23,25  25,29  30,34  38,41  43,50  51,56  58,59  60,8  63,66
0,8  25,34  60,8
0,8  60,8
60,8


org.apache.solr.analysis.EnglishPorterFilterFactory   {protected=protwords.txt}

term position
1  2  3  4  5  6  7  8  9  10  11  12  13

term text
bestbuy  buy  acer  aspir  as  5610  2273  599  window  vista  1  gb  ram
bb  56102273  gib
best  gigabyt
gigabyt

term type
word  word  word  word  word  word  word  word  word  word  word  word  word
word  word  word
word  word
word

source start,end
0,8  0,8  11,15  16,22  23,25  25,29  30,34  38,41  43,50  51,56  58,59  60,8  63,66
0,8  25,34  60,8
0,8  60,8
60,8


org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory   {}

term position
1  2  3  4  5  6  7  8  9  10  11  12  13

term text
bestbuy  buy  acer  aspir
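
Several "source start,end" entries in the output above read 60,8, i.e. an end offset smaller than the start offset, and 8 - 60 = -52, which matches the index in the exception message. The sketch below only demonstrates the substring behaviour with such a pair. It does not call the Lucene highlighter, and the class and helper names are made up for illustration. On the JDK in the trace the message reports end minus start; newer JDKs word the message differently, so the sketch checks only the exception type.

```java
class InvertedOffsetDemo {
    // Title text from the report; offsets are character positions in it.
    static final String TEXT =
        "Best buy - Acer Aspire AS5610-2273 - $599. Windows vista, 1 GB RAM";

    // Returns the token text, or null when substring rejects the offsets.
    static String tokenText(int startOffset, int endOffset) {
        try {
            return TEXT.substring(startOffset, endOffset);
        } catch (StringIndexOutOfBoundsException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Offsets reported by the whitespace tokenizer for "GB".
        System.out.println(tokenText(60, 62));
        // Offsets reported after the synonym filter: end < start.
        System.out.println(tokenText(60, 8));
    }
}
```

Whether the synonym filter is really where the offsets go wrong would still need to be confirmed against the Solr source; the analysis output only shows that the inverted pair first appears at that stage.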

Re: Range search problem on float values

2007-02-15 Thread Yonik Seeley

On 2/15/07, Peter McPeterson <[EMAIL PROTECTED]> wrote:

Hi all, I'm having a problem doing a range search on float values. The field
types for longitude and latitude were text, then I changed to float to give
it a try but I'm still having problems.

The correct search string would be:
latitude:[32.71852 TO 32.792765] AND longitude:[-117.159316 TO -116.966504]
which doesn't work


Did you re-index after you changed the field type?
If you compare the field values as text, -116 comes before -117

-Yonik
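
A small Java sketch of the point about text ordering, assuming the values are compared as whole, untokenized strings (the class and helper names are made up for illustration; this is plain string comparison, not a Lucene query):

```java
class TextRangeOrderDemo {
    // True when lower <= value <= upper under plain lexicographic ordering.
    static boolean inTextRange(String value, String lower, String upper) {
        return lower.compareTo(value) <= 0 && value.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        String west = "-117.159316";
        String east = "-116.966504";

        // As text, "-116..." sorts before "-117..." because '6' < '7'.
        System.out.println(east.compareTo(west) < 0);

        // Numerically correct bounds form an empty range in text order...
        System.out.println(inTextRange("-117.000000", west, east));
        // ...while the inverted bounds match.
        System.out.println(inTextRange("-117.000000", east, west));
    }
}
```

This is the same pattern as in the report: the numerically correct range finds nothing, and the inverted one "works".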


Re: Range search problem on float values

2007-02-15 Thread Chris Hostetter
: Hi all, I'm having a problem doing a range search on float values. The field
: types for longitude and latitude were text, then I changed to float to give
: it a try but I'm still having problems.

are you using "float" or "sfloat" ... float stores floats, but doesn't do
the super special magic sauce that makes them sort properly (which is
necessary for doing range queries)



-Hoss
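
For readers wondering what kind of "magic sauce" makes floats sort properly as strings, here is one common technique sketched in Java. It is an illustration of the general idea only, not Solr's actual sfloat encoding; the class name, method name, and hex output format are assumptions made for this example, and NaN handling is ignored.

```java
class SortableFloatKeyDemo {
    // Maps a float to a fixed-width hex string whose lexicographic order
    // matches the numeric order of the original values.
    static String sortableKey(float value) {
        int bits = Float.floatToIntBits(value);
        if (bits < 0) {
            // Negative floats: flip the magnitude bits so that a larger
            // magnitude produces a smaller key.
            bits ^= 0x7fffffff;
        }
        // Flip the sign bit so negatives sort before positives when the
        // bits are read as an unsigned number.
        bits ^= 0x80000000;
        return String.format("%08x", bits);
    }

    public static void main(String[] args) {
        String lower = sortableKey(-117.159316f);
        String upper = sortableKey(-116.966504f);
        String inside = sortableKey(-117.0f);

        // With sortable keys, the numerically correct bounds also work
        // under string comparison.
        System.out.println(lower.compareTo(inside) <= 0
                && inside.compareTo(upper) <= 0);
    }
}
```

With an encoding of this kind, the index can keep comparing terms as strings while range queries still follow numeric order.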



Re: Range search problem on float values

2007-02-15 Thread Peter McPeterson
Yonik, yes, I did re-index the data after changing the field type. And 
Chris, yes, I am using float.


Any other thoughts on what could be causing it to behave this way? Such weird 
behaviour.


Thanks.










Re: Range search problem on float values

2007-02-15 Thread Yonik Seeley

On 2/16/07, Peter McPeterson <[EMAIL PROTECTED]> wrote:

Yonik, yes, I did re-index the data after changing the field type. And
Chris, yes, I am using float.


Ah ha... the comments in the example schema say it all.  You need
sfloat if you need range queries.  Slightly confusing to have these
different numeric types, I know... it's because lucene sort of had a
type of float for sorting purposes.

[example schema.xml excerpt: the XML markup was stripped by the archive]

-Yonik


Re: Range search problem on float values

2007-02-15 Thread Peter McPeterson

Ah ha! Awesome thanks.



