Spellchecker index rebuild error

2008-01-14 Thread Doug Steigerwald
Lately I've been having issues with the spellchecker failing to properly rebuild my spell index.  I 
used to be able to delete the spell directory and reload the core and build the index fine if it 
ever crapped out, but now I can't even build it.


java.io.FileNotFoundException: /home/dsteiger/solr/data/spell/_8c.cfs (No such file or directory)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:506)
at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:536)
at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:445)
at org.apache.lucene.index.CompoundFileReader.<init>(CompoundFileReader.java:70)
at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:181)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:167)
...

Here's the query: /solr/dsteiger/select/?q=test&qt=spellchecker&cmd=rebuild

Here's my config snippet:



1
0.5

spell
spell


Anyone have any ideas?

Doug


field:(-null) returns records where field was not specified

2008-01-14 Thread Karen Loughran


Hi all,

We are indexing different types of documents, some with certain fields set and 
some without, some fields sometimes in both.

If a particular field is missing in a newly added record, I would have 
expected the query:

field_name:(-null)

not to return this particular record in the response, ie, I'm assuming the 
field is set to null.

But the response we see includes empty docs:

..

..

 

 

 
etc, etc
..


Can someone explain why field_name:(-null) returns the records where 
field_name is missing ?

We note that if we do the range operation we can get a response without the 
records with no field_name:

field_name:[* TO *]

Many thanks
Karen


Re: field:(-null) returns records where field was not specified

2008-01-14 Thread Erick Erickson
Have you seen this page?
http://lucene.apache.org/java/docs/queryparsersyntax.html

From that page:
Note: The NOT operator cannot be used with just one term. For example, the
following search will return no results:
NOT "jakarta apache"


Erick




Re: LNS - or - "now i know we've succeeded"

2008-01-14 Thread Walter Underwood
Yes, they are reputable. They've been doing consulting with Verity,
Ultraseek, and other platforms for many years.  --wunder

On 1/12/08 1:22 AM, "Chris Hostetter" <[EMAIL PROTECTED]> wrote:

> It is pretty cool to see a reputable
> Search company (is ideaeng.com a reputable search consulting company?



batch indexing takes more time than shown on SOLR output --> something to do with IO?

2008-01-14 Thread Britske

I have a batch program which inserts items into a Solr/Lucene index. 
All is going fine and I get update messages in the console like: 

14-jan-2008 16:40:52 org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {add=[10485, 10488, 10489, 10490, 10491, 10495, 10497, 10498, ...(42 more)]} 0 875

However, when I time this call on the client side (I use SolrJ -->
req.process(server)) I get totally different numbers. In the beginning the
client-side measured time is about 2 seconds on average, but after some time
it goes up to about 30-40 seconds, although the time Solr reports stays
between 0.8 and 1.3 seconds.

Does this have anything to do with costly IO activity that is not accounted
for in the Solr output? If so, what tool do you recommend for monitoring
IO activity?

Thanks, 
Geert-Jan 



Re: field:(-null) returns records where field was not specified

2008-01-14 Thread Karen Loughran

Hi Erick, thanks for your reply,

I had read this page.  But I'm not using the "NOT" operator,  I'm using 
the "-" operator.  I'm assuming there is a subtle difference between them in 
that NOT qualifies something else, hence needs 2 terms.  Isn't the "-" 
operator supposed to be a complement to the "+" operator, ie. excludes 
something rather than requiring it ?

thanks
Karen







new to solr

2008-01-14 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Hello,

I am new to Solr. I followed the Solr online tutorial to get the example
working. The search result is XML. I wonder if there is a way to show
the result in a form. I saw there is an example.xsl in the conf/xslt
directory, but I really don't know how to use it. Does anyone have some
ideas for me? I would really appreciate it!

Thanks,
Xiaohui 


Re: new to solr

2008-01-14 Thread Ryan McKinley

Ma, Xiaohui (NIH/NLM/LHC) [C] wrote:
> Hello,
>
> I am new to solr.

Welcome!

> I followed solr online tutorial to get the example
> work. The search result is xml. I wonder if there is a way to show
> result in a form. I saw there is example.xsl in conf/xslt directory. I
> really don't know how to do it. Anyone has some ideas for me. I really
> appreciate it!

Are you asking how to display results for people to see?  A nicely 
formatted website?

Solr (a database) does not aim to solve the display side... but there 
are lots of clients to help integrate with your website: 
php/java/.net/ruby/etc


ryan





RE: new to solr

2008-01-14 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thanks so much for your reply! Please tell me what example.xsl in
conf/xslt is for.

Please let me know where the search result is located. I can use PHP or
.NET to display the result on the web. Is it created on the fly?

Thanks,
Xiaohui 






Re: new to solr

2008-01-14 Thread Ryan McKinley

the example.xsl is an example using XSLT to format results.  Check:
http://wiki.apache.org/solr/XsltResponseWriter

For php, check:
http://wiki.apache.org/solr/SolPHP

ryan











RE: new to solr

2008-01-14 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thanks very much, Ryan. I really appreciate it. I will take a look at
both.

Best regards,
Xiaohui 




Re: new to solr

2008-01-14 Thread Stuart Sierra
On Jan 14, 2008 11:55 AM, Ryan McKinley <[EMAIL PROTECTED]> wrote:
> the example.xsl is an example using XSLT to format results.  Check:
> http://wiki.apache.org/solr/XsltResponseWriter

To add to the above: I think the XsltResponseWriter is not intended
for formatting results for display on your web site.  Normally you
would use your server-side language (PHP, Python, etc.) to query the
Solr server, get the results, and format them for display.  Solr
doesn't provide the "front-end" search interface for your web site --
you have to create that yourself.
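
A minimal sketch of that division of labour, in Python: the server-side code receives Solr's response and turns it into HTML itself. It assumes the response was requested in JSON form (wt=json); the sample body, the "title" field, and the function name are hypothetical and only illustrate the shape of the work.

```python
import html
import json

# Hypothetical response body, shaped like Solr's JSON output (wt=json).
SAMPLE_RESPONSE = """
{"responseHeader": {"status": 0, "QTime": 3},
 "response": {"numFound": 2, "start": 0, "docs": [
   {"id": "1", "title": "Solr <in> Action"},
   {"id": "2", "title": "Lucene Basics"}]}}
"""


def render_results(raw_json):
    """Turn a Solr JSON response into a small HTML list."""
    data = json.loads(raw_json)
    docs = data["response"]["docs"]
    items = []
    for doc in docs:
        # Escape field values so stored text cannot inject markup.
        title = html.escape(str(doc.get("title", "(untitled)")))
        items.append(f"<li>{title}</li>")
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    print(render_results(SAMPLE_RESPONSE))
```

In a real site the JSON would come from an HTTP request to the Solr server rather than from a constant, and the markup would follow the site's own templates.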

-Stuart
altlaw.org


Re: Documents with One-to-many

2008-01-14 Thread Stuart Sierra
On Jan 11, 2008 10:44 AM, Evgeniy Strokin <[EMAIL PROTECTED]> wrote:
> Hello. If I need documents which has number of fields but also I have number 
> of other documents which related to the first one one-to-many. For example a 
> person, could have several addresses. I want to have all of them in search 
> result if I look for people. Also I want to search people by address.
> How it could be done in Solr?

It may be easier to perform this type of query in a relational
database.  With Solr, I think you would have to copy all of the "many"
fields into a single field in your "one" document.  So, a "person"
document would have a single "address" field containing all the
addresses for that person.
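
As a rough sketch of that flattening step, assuming the "address" field is declared multiValued in the schema and that each address is reduced to one text value: the field names and the helper functions below are hypothetical.

```python
def format_address(address):
    """Collapse one address record into a single searchable string."""
    parts = [address.get("street"), address.get("city"), address.get("zip")]
    return ", ".join(p for p in parts if p)


def flatten_person(person, addresses):
    """Build one Solr document per person, carrying all of their addresses."""
    doc = dict(person)  # copy so the caller's record is left untouched
    doc["address"] = [format_address(a) for a in addresses]
    return doc


if __name__ == "__main__":
    person = {"id": "p1", "name": "Jane Doe"}
    addresses = [
        {"street": "1 Main St", "city": "Springfield", "zip": "12345"},
        {"street": "9 Elm Rd", "city": "Shelbyville"},
    ]
    print(flatten_person(person, addresses))
```

A search on the address field then matches the person document if any of its addresses matches, and the stored values come back together in the result.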

-Stuart
altlaw.org


Re: batch indexing takes more time than shown on SOLR output --> something to do with IO?

2008-01-14 Thread Otis Gospodnetic
Re monitoring IO activity: iostat, vmstat, sar and such under Linux, for 
example.

Yes, Solr doesn't count how long it takes to send the response back to the 
client, so if the response is large and/or network is slow, the actual number 
is going to be higher than the number that Solr logs.
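
One way to see how large that gap is: time the call on the client and compare it with the QTime the server reports. The sketch below is in Python rather than SolrJ; send_request and fake_update are hypothetical stand-ins for whatever actually sends the update, and the response is assumed to expose responseHeader.QTime in milliseconds.

```python
import time


def timed_call(send_request):
    """Run send_request() and return (wall_ms, server_qtime_ms)."""
    start = time.perf_counter()
    response = send_request()
    wall_ms = (time.perf_counter() - start) * 1000.0
    qtime_ms = response["responseHeader"]["QTime"]
    return wall_ms, qtime_ms


def fake_update():
    """Stand-in for a real request: sleeps, then reports a small QTime."""
    time.sleep(0.05)
    return {"responseHeader": {"status": 0, "QTime": 5}}


if __name__ == "__main__":
    wall, qtime = timed_call(fake_update)
    # The difference is time spent outside the part Solr measures.
    print(f"client: {wall:.0f} ms, server QTime: {qtime} ms, gap: {wall - qtime:.0f} ms")
```

Logging the gap per batch shows whether the extra time grows steadily, which helps decide where to look with the IO tools mentioned above.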

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch







Re: Spellchecker index rebuild error

2008-01-14 Thread Otis Gospodnetic
I haven't looked at the Spellchecker in a while, but it sounds like you are 
deleting the index files manually.  Any reason for that?  Shouldn't that 
rebuild command run smoothly even with a pre-existing index there?  (Funny that 
I ask this, considering this was my doing.)

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch






Text Summarizer

2008-01-14 Thread Ycrux

Hi!

I'm looking for a good "text summarizer" for my personal search engine
based on Solr.

Currently I'm using "ots" (Open Text Summarizer), but the result
is far from perfect.

Here's an example of usage:
$ elinks "http://lucene.apache.org/solr/" -force-html -no-numbering \
-no-references  2>/dev/null | ots -r 40 | less -S

The result is OK for this site, but I would like to obtain something 
similar to a Google "text snippet" (a real excerpt).

Any advice is welcome.

N.B.: all the HTML pages I'm indexing are converted to text with "elinks" 
(the text browser), as in the previous example.

Thanks in advance.

cheers
Younès


MoreLikeThis similarity field boosting

2008-01-14 Thread Vladimir Garvardt

Hello.

I'm using Solr for searching our system, with MoreLikeThis for related 
content searching. The URL currently used for search looks like this:
http://localhost:8983/solr/mlt?q=nid:7280&mlt=true&mlt.fl=title,teaser,body&mlt.mindf=1&mlt.mintf=1&fl=nid,title,score
Here "nid" is the uniqueKey and "title,teaser,body" are stored fields with 
multiValued set to "true".


The question is:
Is it possible to boost terms for one or more similarity fields?
For example, I'd like something like mlt.fl=title^3,teaser^10,body, so that 
terms from teaser have the highest weight, then title terms, and body terms 
the lowest.


Thanks.


Re: Text Summarizer

2008-01-14 Thread Ycrux

Hi Otis,

I don't really know what the name for that is.

cheers
Y.

Otis Gospodnetic wrote:
> Sounds like you are looking for a highlighter/KWIC, not a summarizer?
>
> Otis




 

  




Re: Text Summarizer

2008-01-14 Thread Otis Gospodnetic
Sounds like you are looking for a highlighter/KWIC, not a summarizer?

Otis 

--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch






unique ID question

2008-01-14 Thread Evgeniy Strokin
If I make one of my fields a unique ID, it doesn't increase/decrease 
performance of searching by this field. Right?
For example, if I have two fields, I know for sure both of them are unique and 
of the same type, and I make one of them the Solr unique ID, the general 
performance should be the same whether I retrieve a document by the first field 
or by the second.
Am I correct? Any general ideas or comments on this topic would be helpful to 
better understand how the unique ID works.
 
Thank you
Gene

Re: unique ID question

2008-01-14 Thread Ryan McKinley

Evgeniy Strokin wrote:
> If I make one of my fields a unique ID, it doesn't increase/decrease 
> performance of searching by this field. Right?
> For example, if I have two fields, I know for sure both of them are unique and 
> of the same type, and I make one of them the Solr unique ID, the general 
> performance should be the same whether I retrieve a document by the first field 
> or by the second.
> Am I correct? Any general ideas or comments on this topic would be helpful to 
> better understand how the unique ID works.

correct - search performance only depends on the lucene index 
characteristics.

The field you declare as: <uniqueKey>id</uniqueKey> is just a marker to 
solr to say what field it should use to check if the document overwrites 
another one.

From the searching side, there is nothing special about the uniqueKey 
field, it is only for /update that it gets used.
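
A toy model of that behaviour, as a sketch only: it is not how Lucene or Solr store documents, and the class and field names are hypothetical. The unique key decides what an add replaces, while searching treats every field the same way.

```python
class ToyIndex:
    """In-memory stand-in for an index with a declared unique key field."""

    def __init__(self, unique_key):
        self.unique_key = unique_key
        self.docs = {}

    def add(self, doc):
        # An add with an existing key value overwrites the earlier document.
        self.docs[doc[self.unique_key]] = doc

    def search(self, field, value):
        # The unique key field gets no special treatment at search time.
        return [d for d in self.docs.values() if d.get(field) == value]


if __name__ == "__main__":
    index = ToyIndex("id")
    index.add({"id": "a", "sku": "x", "version": 1})
    index.add({"id": "a", "sku": "x", "version": 2})  # replaces the first
    print(index.search("id", "a"))
    print(index.search("sku", "x"))
```

Both searches go through the same code path; only add() consults the unique key.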


ryan


index out of disk space, CorruptIndexException

2008-01-14 Thread Brian Whitman
We had an index run out of disk space. Queries work fine but commits return

500 doc counts differ for segment _18lu: fieldsReader shows 104 but segmentInfo shows 212

org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _18lu: fieldsReader shows 104 but segmentInfo shows 212
	at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:191)

I've made room, restarted resin, and now solr won't start. No useful 
messages in the startup, just a

[21:01:49.105] Could not start SOLR. Check solr/home property
[21:01:49.105] java.lang.NullPointerException
[21:01:49.105]  at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:100)


What can I do from here?







Re: index out of disk space, CorruptIndexException

2008-01-14 Thread Ryan McKinley

ug -- maybe someone else has better ideas, but you can try:
http://svn.apache.org/repos/asf/lucene/java/trunk/src/java/org/apache/lucene/index/CheckIndex.java

I think that converts (what it can) to a 2.3 index.

The NullPointerException should be gone in trunk, that is just an 
artifact of stuff going wrong during initialization.


ryan












Re: index out of disk space, CorruptIndexException

2008-01-14 Thread Brian Whitman


On Jan 14, 2008, at 4:08 PM, Ryan McKinley wrote:

> ug -- maybe someone else has better ideas, but you can try:
> http://svn.apache.org/repos/asf/lucene/java/trunk/src/java/org/apache/lucene/index/CheckIndex.java

Thanks for the tip. I did run that, but I stopped it 30 minutes in, as 
it was still on the first (out of 46) segment. The index is (was) 
129GB.

I just restored to an older index and made this ticket: 
https://issues.apache.org/jira/browse/SOLR-455





Re: MoreLikeThis similarity field boosting

2008-01-14 Thread Ken Krugler

> I'm using Solr for searching our system.
> Using MoreLikeThis for related content searching.
> Now url used for search is like this:
> http://localhost:8983/solr/mlt?q=nid:7280&mlt=true&mlt.fl=title,teaser,body&mlt.mindf=1&mlt.mintf=1&fl=nid,title,score
> Where "nid" is uniqueKey and "title,teaser,body" are stored fields 
> with multiValued set to "true".
>
> The question is:
> Is it possible to boost terms for one or more similarity fields?
> For example I'd like something like mlt.fl=title^3,teaser^10,body - 
> terms from teaser will have highest weight, then title terms and the 
> lowest terms weight for body.


A while ago I had a similar issue, and (at least back then) I don't 
think this was possible.


What I did was use Solr's copy-field support to create a "boosted" 
version of a field, where I copied the field in multiple times.
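
The same idea can be sketched on the client side in Python, as an alternative to doing it with copy-field in the schema: repeat each source field's value in one combined field so its terms occur more often. The target field name "mlt_text", the weights, and the function are hypothetical; the combined field is assumed to be multiValued and would then be the one named in mlt.fl.

```python
def build_mlt_doc(doc, weights, target="mlt_text"):
    """Copy each weighted field into `target` as many times as its weight."""
    boosted = []
    for field, weight in weights.items():
        value = doc.get(field)
        if value:
            # Repeating the value raises its terms' frequency in the target field.
            boosted.extend([value] * weight)
    out = dict(doc)  # leave the caller's document unchanged
    out[target] = boosted
    return out


if __name__ == "__main__":
    source = {"nid": 7280, "title": "A title", "teaser": "A teaser", "body": "Body text"}
    doc = build_mlt_doc(source, {"title": 3, "teaser": 10, "body": 1})
    print(len(doc["mlt_text"]), "values in mlt_text")
```

This is a blunt instrument: it inflates term frequency rather than applying a true per-field boost, and it grows the index in proportion to the weights.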


-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"If you can't find it, you can't fix it"


Re: Text Summarizer

2008-01-14 Thread Ycrux

Maybe the right name is "Snippet". Like Google snippets.

cheers
Y.

Otis Gospodnetic wrote:
> Sounds like you are looking for a highlighter/KWIC, not a summarizer?
>
> Otis




 

  




Re: index out of disk space, CorruptIndexException

2008-01-14 Thread Chris Hostetter

: I've made room, restarted resin, and now solr won't start. No useful messages
: in the startup, just a
: 
: [21:01:49.105] Could not start SOLR. Check solr/home property
: [21:01:49.105] java.lang.NullPointerException
: [21:01:49.105]  at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:100)

that message usually comes after some earlier (possibly much earlier) 
error about the real cause of the problem (usually with a meaningful 
stack trace).  I'm guessing that the meaningful error in this case, however, 
is something along the lines of "index corrupted", but it might have just 
been a stray lock file.

-Hoss



Re: Text Summarizer

2008-01-14 Thread Ycrux

Hi Mike and Otis,


Mike Klaas wrote:
> See http://wiki.apache.org/solr/HighlightingParameters .  The default 
> behaviour will provide snippets like google does.
>
> Note that you need to "store" the text of fields you want to highlight 
> for this to work.




Thanks for the help. Works like a charm.

cheers
Y.


RE: field:(-null) returns records where field was not specified

2008-01-14 Thread Lance Norskog
The *:* (star colon star) means "all records". The trick is to use (*:* AND
-field:[* TO *]). It's silly, but there it is.

A performance note: we switched from empty fields to fields with a standard
'empty' value. This way we don't have to do a range check to find records
with empty fields.

Lance Norskog




Re: Text Summarizer

2008-01-14 Thread Mike Klaas
See http://wiki.apache.org/solr/HighlightingParameters .  The default  
behaviour will provide snippets like google does.


Note that you need to "store" the text of fields you want to  
highlight for this to work.
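
A small sketch of a request that turns highlighting on, written in Python. The hl, hl.fl, hl.snippets and hl.fragsize parameters are the ones described on the wiki page above; the base URL, the "content" field, and the function name are assumptions for illustration.

```python
from urllib.parse import urlencode


def highlight_url(base, query, fields, snippets=1, fragsize=100):
    """Build a select URL that asks Solr for highlighted snippets."""
    params = {
        "q": query,
        "hl": "true",                # switch highlighting on
        "hl.fl": ",".join(fields),   # stored fields to take snippets from
        "hl.snippets": snippets,     # snippets per field
        "hl.fragsize": fragsize,     # approximate snippet size in characters
    }
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    print(highlight_url("http://localhost:8983/solr/select", "solr",
                        ["title", "content"]))
```

The snippets come back in a separate highlighting section of the response, keyed by document, so the display code has to join them to the matching documents.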


cheers,
-Mike












RE: LNS - or - "now i know we've succeeded"

2008-01-14 Thread Lance Norskog
Now that Microsoft is buying FAST (!!) the open source world needs a
matching technology :) 




RE: field:(-null) returns records where field was not specified

2008-01-14 Thread Chris Hostetter

Several things in this thread should be clarified (note: order of 
quotations munged for clarity)...

: I had read this page.  But I'm not using the "NOT" operator,  I'm using the
: "-" operator.  I'm assuming there is a subtle difference between them in
: that NOT qualifies something else, hence needs 2 terms.  Isn't the "-" 
: operator supposed to be a complement to the "+" operator, ie. excludes
: something rather than requiring it ?

"The NOT operator" and "the - operator" are in fact the same thing ... the 
duplicate syntax comes from Lucene trying to appease people who 
want boolean-style operator syntax (AND/OR/NOT) even though the query 
parser is not a boolean syntax.

: > Have you seen this page?
: > http://lucene.apache.org/java/docs/queryparsersyntax.html
: >
: > From that page:
: > Note: The NOT operator cannot be used with just one term. For example, 
: > the following search will return no results:
: > NOT "jakarta apache"

In Solr, the query parser can in fact support purely negative queries, by 
internally transforming the query, this is noted on the Solr query syntax 
wiki...

http://wiki.apache.org/solr/SolrQuerySyntax

: > > field_name:(-null)

"null" is not a special keyword, if you look at the debugging output when 
doing that query you'll see that it is the same as:   -field_name:null  
... which is a search for all docs that do not contain the term "null" in the 
field "field_name". Docs with no value for the field trivially do not contain 
that term, so they match.

: The *:* (star colon star) means "all records". The trick is to use (*:* AND
: -field:[* TO *]). It's silly, but there it is.

as i mentioned, you can do pure negative queries now, so a simple search 
for -field_name:[* TO *] will find all docs that have no indexed values 
for that field at all.

: A performance note: we switched from empty fields to fields with a standard
: 'empty' value. This way we don't have to do a range check to find records
: with empty fields.

Your mileage may vary depending on how many docs you have with "no value" 
... this also isn't practical when dealing with numeric, boolean, or date 
based fields.  (and depending on how much churn there is in your index, 
the filterCache can probably make the difference negligible on average 
anyway).
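
The query forms discussed in this thread can be collected into a short Python sketch. The query strings themselves come from the messages above; the helper names and the base URL are hypothetical, and the pure_negative_ok flag only chooses between the Solr form and the explicit *:* form.

```python
from urllib.parse import urlencode


def has_field_query(field):
    """Docs that have at least one indexed value in `field`."""
    return f"{field}:[* TO *]"


def missing_field_query(field, pure_negative_ok=True):
    """Docs that have no indexed value in `field`."""
    negative = f"-{field}:[* TO *]"
    # Where purely negative queries are not supported, subtract from "all docs".
    return negative if pure_negative_ok else f"*:* AND {negative}"


def select_url(base, q):
    """URL-encode the query so brackets, colons and spaces survive transport."""
    return base + "?" + urlencode({"q": q})


if __name__ == "__main__":
    for q in (has_field_query("field_name"),
              missing_field_query("field_name"),
              missing_field_query("field_name", pure_negative_ok=False)):
        print(q, "->", select_url("http://localhost:8983/solr/select", q))
```

Adding debugQuery=on to such a request shows how the parser interpreted the query, which is how the -field_name:null reading above can be confirmed.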




-Hoss