Spellchecker index rebuild error
Lately I've been having issues with the spellchecker failing to properly rebuild my spell index. I used to be able to delete the spell directory, reload the core, and build the index fine if it ever crapped out, but now I can't even build it.

java.io.FileNotFoundException: /home/dsteiger/solr/data/spell/_8c.cfs (No such file or directory)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
    at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:506)
    at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:536)
    at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:445)
    at org.apache.lucene.index.CompoundFileReader.<init>(CompoundFileReader.java:70)
    at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:181)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:167)
    ...

Here's the query:

/solr/dsteiger/select/?q=test&qt=spellchecker&cmd=rebuild

Here's my config snippet: 1 0.5 spell spell

Anyone have any ideas?

Doug
field:(-null) returns records where field was not specified
Hi all, We are indexing different types of documents, some with certain fields set and some without, some fields sometimes in both. If a particular field is missing in a newly added record, I would have expected the query: field_name:(-null) not to return this particular record in the response, ie, I'm assuming the field is set to null. But the response we see includes empty docs: .. .. etc, etc .. Can someone explain why field_name:(-null) returns the records where field_name is missing ? We note that if we do the range operation we can get a response without the records with no field_name: field_name:[* TO *] Many thanks Karen
Re: field:(-null) returns records where field was not specified
Have you seen this page? http://lucene.apache.org/java/docs/queryparsersyntax.html

From that page:

Note: The NOT operator cannot be used with just one term. For example, the following search will return no results: NOT "jakarta apache"

Erick
Re: LNS - or - "now i know we've succeeded"
Yes, they are reputable. They've been doing consulting with Verity, Ultraseek, and other platforms for many years. --wunder On 1/12/08 1:22 AM, "Chris Hostetter" <[EMAIL PROTECTED]> wrote: > It is pretty cool to see a reputable > Search company (is ideaeng.com a reputable search consulting company?
batch indexing takes more time than shown on SOLR output --> something to do with IO?
I have a batch program which inserts items into a Solr/Lucene index. All is going fine, and I get update messages in the console like:

14-jan-2008 16:40:52 org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {add=[10485, 10488, 10489, 10490, 10491, 10495, 10497, 10498, ...(42 more) ]} 0 875

However, when timing this instruction on the client side (I use SolrJ --> req.process(server)) I get totally different numbers. In the beginning the client-side measured time is about 2 seconds on average, but after some time it goes up to about 30-40 seconds, although the Solr-reported time stays between 0.8 and 1.3 seconds.

Does this have anything to do with costly IO activity that is not accounted for in the Solr output? If so, what tool do you recommend for monitoring IO activity?

Thanks,
Geert-Jan
Re: field:(-null) returns records where field was not specified
Hi Erick, thanks for your reply,

I had read this page. But I'm not using the "NOT" operator, I'm using the "-" operator. I'm assuming there is a subtle difference between them in that NOT qualifies something else, hence needs 2 terms. Isn't the "-" operator supposed to be a complement to the "+" operator, i.e. it excludes something rather than requiring it?

thanks
Karen

On Monday 14 January 2008 15:14:05 Erick Erickson wrote:
> Have you seen this page?
> http://lucene.apache.org/java/docs/queryparsersyntax.html
>
> From that page:
> Note: The NOT operator cannot be used with just one term. For example, the
> following search will return no results:
> NOT "jakarta apache"
>
> Erick
new to solr
Hello, I am new to Solr. I followed the Solr online tutorial to get the example working. The search result is XML. I wonder if there is a way to show the result in a form. I saw there is an example.xsl in the conf/xslt directory, but I really don't know how to use it. Does anyone have some ideas for me? I would really appreciate it! Thanks, Xiaohui
Re: new to solr
Ma, Xiaohui (NIH/NLM/LHC) [C] wrote:
> Hello,
>
> I am new to solr.

Welcome!

> I followed solr online tutorial to get the example work. The search result is xml. I wonder if there is a way to show result in a form. I saw there is example.xsl in conf/xslt directory. I really don't know how to do it. Anyone has some ideas for me. I really appreciate it!

Are you asking how to display results for people to see? A nicely formatted website?

Solr (a database) does not aim to solve the display side... but there are lots of clients to help integrate with your website: php/java/.net/ruby/etc.

ryan
RE: new to solr
Thanks so much for your reply! Please tell me what example.xsl in conf/xslt is for. Please also let me know where the search result is located. I can use PHP or .NET to display the result on the web. Is it created on the fly? Thanks, Xiaohui
Re: new to solr
The example.xsl is an example of using XSLT to format results. Check: http://wiki.apache.org/solr/XsltResponseWriter For PHP, check: http://wiki.apache.org/solr/SolPHP ryan
RE: new to solr
Thanks very much, Ryan. I really appreciate it. I will take a look at both. Best regards, Xiaohui
Re: new to solr
On Jan 14, 2008 11:55 AM, Ryan McKinley <[EMAIL PROTECTED]> wrote: > the example.xsl is an example using XSLT to format results. Check: > http://wiki.apache.org/solr/XsltResponseWriter To add to the above: I think the XsltResponseWriter is not intended for formatting results for display on your web site. Normally you would use your server-side language (PHP, Python, etc.) to query the Solr server, get the results, and format them for display. Solr doesn't provide the "front-end" search interface for your web site -- you have to create that yourself. -Stuart altlaw.org
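A minimal sketch of what Stuart describes, in Python rather than PHP: the server-side code takes Solr's XML response and turns it into HTML for the page. The sample response and the field names ("id", "name") are made up for illustration, and a real application would fetch the XML from its own Solr URL instead of using a hard-coded string.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like Solr's default XML response.
# The field names "id" and "name" are assumptions for this sketch.
SAMPLE_RESPONSE = """<response>
  <result name="response" numFound="2" start="0">
    <doc><str name="id">1</str><str name="name">Solr &amp; Lucene</str></doc>
    <doc><str name="id">2</str><str name="name">Search basics</str></doc>
  </result>
</response>"""


def parse_docs(xml_text):
    """Turn each <doc> element into a dict of field name -> text value."""
    root = ET.fromstring(xml_text)
    docs = []
    for doc in root.iter("doc"):
        docs.append({child.get("name"): (child.text or "") for child in doc})
    return docs


def render_html(docs, title_field="name"):
    """Render the documents as a simple HTML list, escaping field values."""
    items = "".join(
        "<li>%s</li>" % html.escape(d.get(title_field, "")) for d in docs
    )
    return "<ul>%s</ul>" % items


if __name__ == "__main__":
    print(render_html(parse_docs(SAMPLE_RESPONSE)))
```

The same split (query, parse, render) applies whatever language the web site is written in.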
Re: Documents with One-to-many
On Jan 11, 2008 10:44 AM, Evgeniy Strokin <[EMAIL PROTECTED]> wrote: > Hello. If I need documents which has number of fields but also I have number > of other documents which related to the first one one-to-many. For example a > person, could have several addresses. I want to have all of them in search > result if I look for people. Also I want to search people by address. > How it could be done in Solr? It may be easier to perform this type of query in a relational database. With Solr, I think you would have to copy all of the "many" fields into a single field in your "one" document. So, a "person" document would have a single "address" field containing all the addresses for that person. -Stuart altlaw.org
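A rough sketch of the flattening Stuart suggests, assuming the addresses live in a separate table and that the Solr schema declares "address" as a multi-valued text field. The field names and the helper functions are hypothetical; a single concatenated string would work along the same lines.

```python
def format_address(address):
    """Join the parts of one address record into a single searchable string."""
    parts = [address.get("street"), address.get("city"), address.get("country")]
    return ", ".join(p for p in parts if p)


def flatten_person(person, addresses):
    """Build one Solr document per person, with every address copied into it."""
    doc = dict(person)  # copy so the caller's record is not modified
    doc["address"] = [format_address(a) for a in addresses]
    return doc


if __name__ == "__main__":
    person = {"id": "p1", "name": "Jane Doe"}
    addresses = [
        {"street": "1 Main St", "city": "Springfield", "country": "US"},
        {"street": "2 Side Rd", "city": "Shelbyville"},
    ]
    print(flatten_person(person, addresses))
```

With a document shaped like this, a search on the address field finds the person, and all of that person's addresses come back in the result.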
Re: batch indexing takes more time than shown on SOLR output --> something to do with IO?
Re monitoring IO activity: iostat, vmstat, sar and such under Linux, for example. Yes, Solr doesn't count how long it takes to send the response back to the client, so if the response is large and/or the network is slow, the actual time is going to be higher than the number that Solr logs. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
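One way to see how much time falls outside the figure Solr logs is to time the call on the client and subtract the server-reported time. This is only a sketch: `send` stands in for whatever actually issues the request (the SolrJ call in the original question), and it assumes the last number in the log line (875 above) is the server-side time in milliseconds.

```python
import time


def timed_call(send, *args):
    """Run send(*args) and return (result, elapsed wall-clock milliseconds)."""
    start = time.perf_counter()
    result = send(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


def unaccounted_ms(client_ms, server_ms):
    """Time seen by the client that the server-side figure does not cover."""
    return max(0.0, client_ms - server_ms)


if __name__ == "__main__":
    def fake_send(batch):
        # Hypothetical stand-in for the real update request.
        time.sleep(0.05)
        return {"server_ms": 10}

    response, client_ms = timed_call(fake_send, ["doc1", "doc2"])
    print("outside server time: %.1f ms" % unaccounted_ms(client_ms, response["server_ms"]))
```

If the gap grows over the course of a batch run while the server figure stays flat, that points at the network, the client, or time spent waiting outside the request handler rather than at the update itself.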
Re: Spellchecker index rebuild error
I haven't looked at the Spellchecker in a while, but it sounds like you are deleting the index files manually. Any reason for that? Shouldn't that rebuild command run smoothly even with a pre-existing index there? (Funny that I ask this, considering this was my doing.) Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
Text Summarizer
Hi!

I'm looking for a good "text summarizer" for my personal search engine based on Solr. Currently I'm using "ots" (Open Text Summarizer), but the result is far from perfect. Here's an example of usage:

$ elinks "http://lucene.apache.org/solr/" -force-html -no-numbering \
    -no-references 2>/dev/null | ots -r 40 | less -S

The result is OK for this site, but I would like to obtain something similar to Google's "text snippet" (a real excerpt). Advice is welcome.

N.B.: all the HTML pages I'm indexing are converted to text with "elinks" (the text browser), as in the previous example.

Thanks in advance.

cheers
Younès
MoreLikeThis similarity field boosting
Hello. I'm using Solr for searching our system, and MoreLikeThis for related-content searching. The URL currently used for the search looks like this: http://localhost:8983/solr/mlt?q=nid:7280&mlt=true&mlt.fl=title,teaser,body&mlt.mindf=1&mlt.mintf=1&fl=nid,title,score where "nid" is the uniqueKey and "title,teaser,body" are stored fields with multiValued set to "true". The question is: is it possible to boost terms for one or more similarity fields? For example, I'd like something like mlt.fl=title^3,teaser^10,body, so that terms from teaser have the highest weight, then title terms, and body terms the lowest. Thanks.
Re: Text Summarizer
Hi Otis,

I don't really know what the name for that is.

cheers
Y.

Otis Gospodnetic wrote:
> Sounds like you are looking for a highlighter/KWIC, not a summarizer?
Re: Text Summarizer
Sounds like you are looking for a highlighter/KWIC, not a summarizer? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
unique ID question
If I make one of my fields the unique ID, it doesn't increase or decrease the performance of searching by this field, right? For example, say I have two fields of the same type, I know for sure both of them are unique, and I make one of them the Solr unique ID. The general performance should be the same whether I retrieve a document by the first field or by the second. Am I correct? Any general ideas or comments on this topic would be helpful to better understand how the unique ID works. Thank you, Gene
Re: unique ID question
Evgeniy Strokin wrote:
> If I make one of my field as a unique ID, id doesn't increase/decrease performance of searching by this field. Right? For example if I have two fields, I know for sure both of them are unique, both the same type, and make one of them as a Solr Unique ID. The general performance should be the same if I want to retrieve a document by first field or by the second. Am I correct? Any general ideas or comments on this topic would be helpful to better understand how unique ID works.

correct - search performance only depends on the Lucene index characteristics. The field you declare as:

<uniqueKey>id</uniqueKey>

is just a marker to Solr to say what field it should use to check if the document overwrites another one.

From the searching side, there is nothing special about the uniqueKey field; it is only for /update that it gets used.

ryan
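A toy model of the behaviour Ryan describes, not Solr code: the unique key only matters when a document is added (a document with the same key replaces the old one), while searching treats every field the same way. The class and field names are invented for illustration, and the linear scan merely stands in for a real index lookup.

```python
class ToyIndex:
    """Toy stand-in for an index with a unique key; not how Solr stores data."""

    def __init__(self, unique_key):
        self.unique_key = unique_key
        self.docs = {}

    def add(self, doc):
        """Add a document; return True if it overwrote one with the same key."""
        key = doc[self.unique_key]
        overwritten = key in self.docs
        self.docs[key] = doc
        return overwritten

    def search(self, field, value):
        """Search by any field; the unique key gets no special treatment here."""
        return [d for d in self.docs.values() if d.get(field) == value]


if __name__ == "__main__":
    index = ToyIndex(unique_key="id")
    index.add({"id": "1", "sku": "A-100", "title": "first"})
    print(index.add({"id": "1", "sku": "A-100", "title": "second"}))
    print(index.search("sku", "A-100"))
```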
index out of disk space, CorruptIndexException
We had an index run out of disk space. Queries work fine but commits return

500 doc counts differ for segment _18lu: fieldsReader shows 104 but segmentInfo shows 212

org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _18lu: fieldsReader shows 104 but segmentInfo shows 212
    at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:191)

I've made room, restarted resin, and now solr won't start. No useful messages in the startup, just a

[21:01:49.105] Could not start SOLR. Check solr/home property
[21:01:49.105] java.lang.NullPointerException
[21:01:49.105]     at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:100)

What can I do from here?
Re: index out of disk space, CorruptIndexException
ug -- maybe someone else has better ideas, but you can try: http://svn.apache.org/repos/asf/lucene/java/trunk/src/java/org/apache/lucene/index/CheckIndex.java I think that converts (what it can) to a 2.3 index. The NullPointerException should be gone in trunk; that is just an artifact of stuff going wrong during initialization. ryan
Re: index out of disk space, CorruptIndexException
On Jan 14, 2008, at 4:08 PM, Ryan McKinley wrote:
> ug -- maybe someone else has better ideas, but you can try:
> http://svn.apache.org/repos/asf/lucene/java/trunk/src/java/org/apache/lucene/index/CheckIndex.java

Thanks for the tip. I did run that, but I stopped it 30 minutes in, as it was still on the first (out of 46) segment. The index is (was) 129GB.

I just restored to an older index and made this ticket: https://issues.apache.org/jira/browse/SOLR-455
Re: MoreLikeThis similarity field boosting
> I'm using Solr for searching our system. Using MoreLikeThis for related content searching. Now url used for search is like this: http://localhost:8983/solr/mlt?q=nid:7280&mlt=true&mlt.fl=title,teaser,body&mlt.mindf=1&mlt.mintf=1&fl=nid,title,score Where "nid" is uniqueKey and "title,teaser,body" are stored fields with multiValued set to "true".
>
> The question is: Is it possible to boost terms for one or more similarity fields? For example I'd like something like mlt.fl=title^3,teaser^10,body - terms from teaser will have highest weight, then title terms and the lowest terms weight for body.

A while ago I had a similar issue, and (at least back then) I don't think this was possible. What I did was use Solr's copy-field support to create a "boosted" version of a field, where I copied the field in multiple times.

-- Ken

--
Ken Krugler Krugle, Inc. +1 530-210-6378 "If you can't find it, you can't fix it"
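Ken does the repetition with copy-field rules in the schema. The sketch below shows the same idea done on the client before indexing instead, which is a substitution and not his setup: each source field's values are repeated according to a weight and collected into one combined multi-valued field. The combined field name "mlt_text" and the weights are assumptions taken from the example in the question.

```python
def build_weighted_values(doc, weights):
    """Repeat each field's values `weight` times and collect them in one list."""
    values = []
    for field, weight in weights.items():
        for value in doc.get(field, []):
            values.extend([value] * weight)
    return values


def with_boosted_field(doc, weights, target="mlt_text"):
    """Return a copy of doc with the combined, repeated values in `target`."""
    boosted = dict(doc)
    boosted[target] = build_weighted_values(doc, weights)
    return boosted


if __name__ == "__main__":
    doc = {"nid": "7280", "title": ["A title"], "teaser": ["A teaser"], "body": ["Body text"]}
    weights = {"title": 3, "teaser": 10, "body": 1}
    print(len(with_boosted_field(doc, weights)["mlt_text"]))
```

MoreLikeThis would then be pointed at the combined field, so terms from the heavily repeated fields get a higher term frequency. The cost is a larger index, since the repeated text is indexed as many times as it is copied.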
Re: Text Summarizer
Maybe the right name is "Snippet". Like Google snippets.

cheers
Y.

Otis Gospodnetic wrote:
> Sounds like you are looking for a highlighter/KWIC, not a summarizer?
Re: index out of disk space, CorruptIndexException
: I've made room, restarted resin, and now solr won't start. No useful messages
: in the startup, just a
:
: [21:01:49.105] Could not start SOLR. Check solr/home property
: [21:01:49.105] java.lang.NullPointerException
: [21:01:49.105] at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:100)

That message usually comes after some earlier (possibly much earlier) error about the real cause of the problem (usually with a meaningful stack trace). I'm guessing that the meaningful error in this case, however, is something along the lines of "index corrupted", but it might have just been a stray lock file.

-Hoss
Re: Text Summarizer
Hi Mike and Otis,

Mike Klaas wrote:
> See http://wiki.apache.org/solr/HighlightingParameters . The default behaviour will provide snippets like google does. Note that you need to "store" the text of fields you want to highlight for this to work.

Thanks for the help. Works like a charm.

cheers
Y.
RE: field:(-null) returns records where field was not specified
The *:* (star colon star) means "all records". The trick is to use (*:* AND -field:[* TO *]). It's silly, but there it is. A performance note: we switched from empty fields to fields with a standard 'empty' value. This way we don't have to do a range check to find records with empty fields. Lance Norskog
Re: Text Summarizer
See http://wiki.apache.org/solr/HighlightingParameters . The default behaviour will provide snippets like Google does. Note that you need to "store" the text of fields you want to highlight for this to work.

cheers,
-Mike

On 14-Jan-08, at 2:17 PM, Ycrux wrote:
> Maybe the right name is "Snippet". Like Google snippets.
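As a small illustration of the highlighting parameters Mike points to, here is a sketch that builds a select URL with highlighting switched on. The base URL, the field name "content" and the snippet settings are assumptions for the example; the field has to be stored, as noted above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def highlight_url(base_url, query, fields, snippets=1, fragsize=100):
    """Build a Solr select URL that asks for highlighted snippets."""
    params = {
        "q": query,
        "hl": "true",               # turn highlighting on
        "hl.fl": ",".join(fields),  # fields to take snippets from
        "hl.snippets": snippets,    # snippets per field (illustrative value)
        "hl.fragsize": fragsize,    # approximate snippet length in characters
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # The host, port and field name below are hypothetical.
    print(highlight_url("http://localhost:8983/solr/select", "solr", ["content"]))
```

The snippets come back in a separate highlighting section of the response, keyed by each document's unique key, so the page code has to join them with the result documents itself.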
RE: LNS - or - "now i know we've succeeded"
Now that Microsoft is buying FAST (!!) the open source world needs a matching technology :)
RE: field:(-null) returns records where field was not specified
Several things in this thread should be clarified (note: order of quotations munged for clarity)...

: I had read this page. But I'm not using the "NOT" operator, I'm using the
: "-" operator. I'm assuming there is a subtle difference between them in
: that NOT qualifies something else, hence needs 2 terms. Isn't the "-"
: operator supposed to be a complement to the "+" operator, ie. excludes
: something rather than requiring it ?

"The NOT operator" and "the - operator" are in fact the same thing ... the duplicate syntax comes from Lucene trying to appease people that want boolean-style operator syntax (AND/OR/NOT) even though the query parser is not a boolean syntax.

: > Have you seen this page?
: > http://lucene.apache.org/java/docs/queryparsersyntax.html
: >
: > From that page:
: > Note: The NOT operator cannot be used with just one term. For example,
: > the following search will return no results:
: > NOT "jakarta apache"

In Solr, the query parser can in fact support purely negative queries, by internally transforming the query. This is noted on the Solr query syntax wiki... http://wiki.apache.org/solr/SolrQuerySyntax

: > > field_name:(-null)

"null" is not a special keyword. If you look at the debugging output when doing that query you'll see that it is the same as: -field_name:null ... which is a search for all docs that do not contain the string "null" in the field "field_name".

: The *:* (star colon star) means "all records". The trick is to use (*:* AND
: -field:[* TO *]). It's silly, but there it is.

As I mentioned, you can do purely negative queries now, so a simple search for -field_name:[* TO *] will find all docs that have no indexed values for that field at all.

: A performance note: we switched from empty fields to fields with a standard
: 'empty' value. This way we don't have to do a range check to find records
: with empty fields.

Your mileage may vary depending on how many docs you have with "no value" ... this also isn't practical when dealing with numeric, boolean, or date based fields. (And depending on how much churn there is in your index, the filterCache can probably make the difference negligible on average anyway.)

-Hoss
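To make the two forms from this thread concrete, here is a sketch that builds both "field is missing" queries and URL-encodes them for a select request. The base URL and the field name are placeholders; the query strings themselves are the ones given by Hoss and Lance above.

```python
from urllib.parse import urlencode, parse_qs


def missing_field_query(field):
    """Purely negative form: docs with no indexed value for the field."""
    return "-%s:[* TO *]" % field


def missing_field_query_explicit(field):
    """Explicit form for parsers that do not accept a purely negative query."""
    return "*:* AND -%s:[* TO *]" % field


def has_field_query(field):
    """Range form from the original question: docs that do have a value."""
    return "%s:[* TO *]" % field


def select_url(base_url, query):
    """URL-encode the query so brackets, colons and spaces survive transport."""
    return base_url + "?" + urlencode({"q": query})


if __name__ == "__main__":
    base = "http://localhost:8983/solr/select"  # hypothetical host and core
    print(select_url(base, missing_field_query("field_name")))
    print(select_url(base, missing_field_query_explicit("field_name")))
```

Encoding matters here because an unescaped space or bracket in the URL can be mangled before Solr ever sees the query.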