Highlight - get terms used by lucene

2008-03-25 Thread Tim Mahy
Hi All,

we use highlighting and snippets for our searches. Besides those two, I would
like to get the list of terms that Lucene actually used for the highlighting, so that
from a query like "Tim OR Antwerpen AND Ekeren" I can pull out the terms
Antwerpen and Ekeren if, say, those are the only terms that produced matches ...

Is there any way to achieve this?

Greetings,

Tim






Term frequency

2008-03-26 Thread Tim Mahy
Hi All,

Is there a way to get the term frequency per found result back from Solr?

Greetings,
Tim






RE: Highlight - get terms used by lucene

2008-03-27 Thread Tim Mahy
Hi,

thanks for the answer; with that information I can pull out the term frequency.
The reason for all this is that we want to use this scoring algorithm:
http://download-uk.oracle.com/docs/cd/B19306_01/text.102/b14218/ascore.htm

But is there a performance cost to the explain output that could be painful in
production (16 million documents), since we would have to request the explain
information on every query?

hoping someone can answer this and help us out,

greetings,
Tim


-----Original Message-----
From: Chris Hostetter [mailto:[EMAIL PROTECTED]
Sent: Thu 27-3-2008 7:36
To: solr-user@lucene.apache.org
Subject: Re: Highlight - get terms used by lucene


: we use highlighting and snippets for our searches. Besides those two, I
: would want to have a list of terms that lucene used for the
: highlighting, so that I can pull out of a "Tim OR Antwerpen AND Ekeren"
: the following terms : Antwerpen, Ekeren if let's say these are the only
: terms that gave results ...

the closest you can get is the "explain" info in the debugging output.
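
For reference, the explain info shows up when you ask for debug output on the
request; a minimal sketch (host, port and handler are just the example setup):

  http://localhost:8983/solr/select?q=Tim+OR+Antwerpen+AND+Ekeren&debugQuery=on

the response then carries a debug section with an "explain" entry per returned
document, listing the term weights that contributed to its score.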

currently that comes back as a big string you would need to parse, but
since the topic of programmatically accessing that data seems to have come up
quite a bit more than I ever really expected, I will point out that
internally it's a fairly well structured class that could be output as a
hierarchy of NamedLists (funny bit of trivia: I wrote that code once upon
a time before Solr was an Apache project, but it wouldn't work because the
XmlResponseWriter had a bug where it couldn't handle NamedLists more than
3 levels deep).

a patch would be fairly simple if someone wanted to write one.



-Hoss








RE: Highlight - get terms used by lucene

2008-03-28 Thread Tim Mahy
Hi,

Solr returns the max score and the score per document.
This means that the best hit is always 100%, which is not always what you want,
because that article itself could still be quite irrelevant...

groeten,
Tim


-----Original Message-----
From: Chris Hostetter [mailto:[EMAIL PROTECTED]
Sent: Fri 28-3-2008 4:34
To: solr-user@lucene.apache.org
Subject: RE: Highlight - get terms used by lucene


: thanks for the answer, with that information I can pull out the term
: frequency. Reason for all this, is that we want to use this scoring
: algorithm:
: http://download-uk.oracle.com/docs/cd/B19306_01/text.102/b14218/ascore.htm

Uh, why?  Based on the description this sounds exactly like the Lucene
scoring formula with some of the details glossed over ... why not just use
the score Solr computes for you?


-Hoss








Search exact terms

2008-04-02 Thread Tim Mahy
Hi all,

is there a Solr-wide setting with which I can achieve the following:

if I search for q=onderwij, I also receive documents containing "onderwijs"
etc. This is of course the documented behavior, but even if I search on the
quoted term "onderwij" I still get the "onderwijs" hits. For this field I use
the type "text" from the schema.xml that is supplied with the default Solr.

Is there a global setting in Solr to always search for exact terms?

Greetings,

Tim







Wildcard search + case insensitive

2008-04-02 Thread Tim Mahy
Hi all,

I use this type definition in my schema.xml (the fieldType XML was stripped by
the list archive):

When I have a document with the term "demo" in it and I search for dem*, I
receive the document back from Solr, but when I search on Dem* I don't get the
document back.

Is the LowerCaseFilterFactory not applied when a wildcard search is
performed?
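
For reference, the stripped type definition above most likely contained an
index-time and query-time analyzer with a lowercase filter; a rough
reconstruction, not the exact posted XML:

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>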

Greetings,
Tim






RE: Wildcard search + case insensitive

2008-04-02 Thread Tim Mahy
Hi all,

I already found the answer to my question on the following blog:
http://michaelkimsal.com/blog/2007/04/solr-case-sensitivty/
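
(For the archives: wildcard and prefix terms are not run through the field's
analyzer at query time, so the LowerCaseFilterFactory never sees them; the usual
workaround is to lowercase the term on the client side before sending it, e.g.
q=dem* rather than q=Dem*.)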

greetings,
Tim


-----Original Message-----
From: Tim Mahy [mailto:[EMAIL PROTECTED]
Sent: Wed 2-4-2008 13:19
To: solr-user@lucene.apache.org
Subject: Wildcard search + case insensitive
 
Hi all,

I use this type definition in my schema.xml (the fieldType XML was stripped by
the list archive):

When I have a document with the term "demo" in it and I search for dem*, I
receive the document back from Solr, but when I search on Dem* I don't get the
document back.

Is the LowerCaseFilterFactory not applied when a wildcard search is
performed?

Greetings,
Tim








Multi language, one "body" field, multi stopwords ?

2008-04-23 Thread Tim Mahy
Hi all,

we are in the situation that we want to store documents in a number of
languages, but at query time we want to search a single field, while at indexing
time we want a different stopword file to be used depending on the language of
the uploaded document.

I thought of creating a body field per language and using copyField to copy the
contents into one combined body field; that would let me configure a different
field type, and therefore a different stopword file, per "body language field".
This only works, I think, if copyField hands each field the original content
rather than already-analyzed tokens; is that the case?
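
A sketch of the idea (type, field and file names are only examples):

  <fieldType name="text_nl" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_nl.txt"/>
    </analyzer>
  </fieldType>
  <!-- a text_fr type analogous to the above, but with words="stopwords_fr.txt" -->

  <field name="body_nl" type="text_nl" indexed="true" stored="true"/>
  <field name="body_fr" type="text_fr" indexed="true" stored="true"/>
  <field name="body"    type="text"    indexed="true" stored="true"/>

  <copyField source="body_nl" dest="body"/>
  <copyField source="body_fr" dest="body"/>

(as far as I know, copyField hands the destination field the original incoming
value, not the tokens produced by the source field's analyzer, so each field
applies its own stopword list)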

Greetings,
Tim



Delete's increase while adding new documents

2008-04-25 Thread Tim Mahy
Hi all,

we send XML add-document messages to Solr and we notice something very strange.
We autocommit at 100,000 documents, starting from a totally clean index (we removed
the data folder). When we start uploading we notice that docsPending goes up, but
also that deletesPending goes up very fast. After reaching the first 100,000 we
queried Solr to return everything, and the total result count was not 100,000 but
somewhere around 77,000, which is exactly 100,000 minus docsDeleted from the
stats page.
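
(For reference, the document-count autocommit is configured on the update
handler in solrconfig.xml; roughly like this, with the threshold shown only as
an example value:

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxDocs>100000</maxDocs>
    </autoCommit>
  </updateHandler>
)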

We used that Solr instance before, so my question is: is it possible that Solr
remembers the unique identities somewhere else than in the data folder? By the
way, we stopped Solr, removed the data folder and restarted Solr, and then this
behavior began...

greetings,
Tim



RE: Delete's increase while adding new documents

2008-04-28 Thread Tim Mahy
Hi all,

thank you for your reply. The IDs that we send are unique, so we still have no
clue what is happening :)

greetings,
Tim

-----Original Message-----
From: Mike Klaas [mailto:[EMAIL PROTECTED]
Sent: Sat 26-4-2008 1:52
To: solr-user@lucene.apache.org
Subject: Re: Delete's increase while adding new documents

On 25-Apr-08, at 4:27 AM, Tim Mahy wrote:
>
> Hi all,
>
> we send XML add-document messages to Solr and we notice something
> very strange.
> We autocommit at 100,000 documents, starting from a totally clean index
> (we removed the data folder). When we start uploading we notice that
> docsPending goes up but also that deletesPending goes up very fast.
> After reaching the first 100,000 we queried Solr to return everything
> and the total result count was not 100,000 but somewhere around 77,000,
> which is exactly 100,000 minus docsDeleted from the stats page.
>
> We used that Solr instance before, so my question is: is it possible
> that Solr remembers the unique identities somewhere else than in the
> data folder? By the way, we stopped Solr, removed the data folder and
> restarted Solr, and then this behavior began...

Are you sure that all the documents you added were unique?

(btw, deletePending doesn't necessarily mean that an old version of
the doc was in the index, I think).

-Mike





RE: Delete's increase while adding new documents

2008-05-06 Thread Tim Mahy
Hi all,

it seems that we get errors during the auto-commit :


java.io.FileNotFoundException: /opt/solr/upload/nl/archive/data/index/_4x.fnm (No such file or directory)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
        at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:501)
        at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:526)

The _4x.fnm file is indeed not on the file system. When we switch from autocommit
to manual commits sent through XML messages we get the same kind of errors.
Any idea what could be wrong in our configuration to cause these exceptions?

Greetings,
Tim
____
From: Tim Mahy [EMAIL PROTECTED]
Sent: Monday, April 28, 2008 12:11
To: solr-user@lucene.apache.org
Subject: RE: Delete's increase while adding new documents

Hi all,

thank you for your reply. The IDs that we send are unique, so we still have no
clue what is happening :)

greetings,
Tim

-----Original Message-----
From: Mike Klaas [mailto:[EMAIL PROTECTED]
Sent: Sat 26-4-2008 1:52
To: solr-user@lucene.apache.org
Subject: Re: Delete's increase while adding new documents

On 25-Apr-08, at 4:27 AM, Tim Mahy wrote:
>
> Hi all,
>
> we send XML add-document messages to Solr and we notice something
> very strange.
> We autocommit at 100,000 documents, starting from a totally clean index
> (we removed the data folder). When we start uploading we notice that
> docsPending goes up but also that deletesPending goes up very fast.
> After reaching the first 100,000 we queried Solr to return everything
> and the total result count was not 100,000 but somewhere around 77,000,
> which is exactly 100,000 minus docsDeleted from the stats page.
>
> We used that Solr instance before, so my question is: is it possible
> that Solr remembers the unique identities somewhere else than in the
> data folder? By the way, we stopped Solr, removed the data folder and
> restarted Solr, and then this behavior began...

Are you sure that all the documents you added were unique?

(btw, deletePending doesn't necessarily mean that an old version of
the doc was in the index, I think).

-Mike






RE: multi-language searching with Solr

2008-05-06 Thread Tim Mahy
Hi,

you could also use multiple Solr instances, each with language-specific settings
and stopwords for the same field, and upload each document to the matching
instance,

and then merge the indexes into one searchable index ...
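
The merge step itself can be done at the Lucene level, for instance with
IndexWriter.addIndexes(), or with the IndexMergeTool from Lucene's contrib/misc
jar; a sketch (jar names and paths are only examples, and check that the tool is
present in your Lucene version):

  java -cp lucene-core.jar:lucene-misc.jar org.apache.lucene.misc.IndexMergeTool \
       /path/to/merged-index /path/to/index-nl /path/to/index-fr

(the first argument is the destination directory, the rest are the source indexes)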

greetings,
Tim

From: Eli K [EMAIL PROTECTED]
Sent: Tuesday, May 6, 2008 18:26
To: solr-user@lucene.apache.org
Subject: Re: multi-language searching with Solr

Peter,

Thanks for your help, I will prototype your solution and see if it
makes sense for me.

Eli

On Mon, May 5, 2008 at 5:38 PM, Binkley, Peter
<[EMAIL PROTECTED]> wrote:
> It won't make much difference to the index size, since you'll only be
>  populating one of the language fields for each document, and empty
>  fields cost nothing. The performance may suffer a bit but Lucene may
>  surprise you with how good it is with that kind of boolean query.
>
>  I agree that as the number of fields and languages increases, this is
>  going to become a lot to manage. But you're up against some basic
>  problems when you try to model this in Solr: for each token, you care
>  about not just its value (which is all Lucene cares about) but also its
>  language and its stem; and the stem for a given token depends on the
>  language (different stemming rules); and at query time you may not know
>  the language. I don't think you're going to get a solution without some
>  redundancy; but solving problems by adding redundant fields is a common
>  method in Solr.
>
>
>  Peter
>
>
>  -Original Message-
>  From: Eli K [mailto:[EMAIL PROTECTED]
>  Sent: Monday, May 05, 2008 2:28 PM
>  To: solr-user@lucene.apache.org
>  Subject: Re: multi-language searching with Solr
>
>  Wouldn't this impact both indexing and search performance and the size
>  of the index?
>  It is also probable that I will have more than one free text field
>  later on, and with at least 20 languages this approach does not seem very
>  manageable.  Are there other options for making this work with stemming?
>
>  Thanks,
>
>  Eli
>
>
>  On Mon, May 5, 2008 at 3:41 PM, Binkley, Peter
>  <[EMAIL PROTECTED]> wrote:
>  > I think you would have to declare a separate field for each language
>  > (freetext_en, freetext_fr, etc.), each with its own appropriate
>  > stemming. Your ingestion process would have to assign the free text
>  > content for each document to the appropriate field; so, for each
>  > document, only one of the freetext fields would be populated. At
>  > search time, you would either search against the appropriate field if
>  > you know the search language, or search across them with
>  > "freetext_fr:query OR freetext_en:query OR ...". That way your query
>  > will be interpreted by each language field using that language's
>  > stemming rules.
>  >
>  >  Other options for combining indexes, such as copyfield or dynamic
>  > fields  (see http://wiki.apache.org/solr/SchemaXml), would lead to a
>  > single  field type and therefore a single type of stemming. You could
>  > always use  copyfield to create an unstemmed common index, if you
>  > don't care about  stemming when you search across languages (since
>  > you're likely to get  odd results when a query in one language is
>  > stemmed according to the  rules of another language).
>  >
>  >  Peter
>  >
>  >
>  >
>  >  -Original Message-
>  >  From: Eli K [mailto:[EMAIL PROTECTED]
>  >  Sent: Monday, May 05, 2008 8:27 AM
>  >  To: solr-user@lucene.apache.org
>  >  Subject: multi-language searching with Solr
>  >
>  >  Hello folks,
>  >
>  >  Let me start by saying that I am new to Lucene and Solr.
>  >
>  >  I am in the process of designing a search back-end for a system that
>  >  receives 20k documents a day and needs to keep them available for 30
>  >  days.  The documents should be searchable on a free text field and on
>  >  about 8 other fields.
>  >
>  >  One of my requirements is to index and search documents in multiple
>  >  languages.  I would like to have the ability to stem and provide the
>  >  advanced search features that are based on it.  This will only affect
>  >  the free text field because the rest of the fields are in English.
>  >
>  >  I can find out the language of the document before indexing and I
>  >  might be able to provide the language to search on.  I also need to
>  >  have the ability to search across all indexed languages (there will
>  >  be 20 in total).
>  >
>  >  Given these requirements do you think this is doable with Solr?  A
>  > major  limiting factor is that I need to stick to the 1.2 GA version
>  > and I  cannot utilize the multi-core features in the 1.3 trunk.
>  >
>  >  I considered writing my own analyzer that will call the appropriate
>  > Lucene analyzer for the given language but I did not see any way for
>  > it  to access the field that specifies the language of the document.
>  >
>  >  Thanks,
>  >
>  >  Eli
>  >
>  >  p.s. I am looking for an experienced Lucene/Solr consultant to help
>  > 

RE: Delete's increase while adding new documents

2008-05-07 Thread Tim Mahy
Hi all,

it seems that we simply post too much too fast to Solr.

When we post 100 documents (in separate calls) and perform a commit, everything
goes well, but as soon as we start sending thousands of documents and then use
autocommit or send the commit message, we end up with a lot of documents missing
from the index although they were sent to Solr ...

Does anyone have experience with how many documents you can import, and at what
speed, so that Solr stays stable?

We use Tomcat 5.5 and our Java memory limit is 2 GB.

Greetings,
Tim

From: Mike Klaas [EMAIL PROTECTED]
Sent: Tuesday, May 6, 2008 20:17
To: solr-user@lucene.apache.org
Subject: Re: Delete's increase while adding new documents

On 6-May-08, at 4:56 AM, Tim Mahy wrote:

> Hi all,
>
> it seems that we get errors during the auto-commit :
>
>
> java.io.FileNotFoundException: /opt/solr/upload/nl/archive/data/
> index/_4x.fnm (No such file or directory)
>at java.io.RandomAccessFile.open(Native Method)
>at java.io.RandomAccessFile.<init>
> (RandomAccessFile.java:212)
>at org.apache.lucene.store.FSDirectory$FSIndexInput
> $Descriptor.<init>(FSDirectory.java:501)
>at org.apache.lucene.store.FSDirectory
> $FSIndexInput.<init>(FSDirectory.java:526)
>
> the _4x.fnm file is not on the file system. When we switch from
> autocommit to manual commits sent through XML messages we get the same
> kind of errors.
> Any idea what could be wrong in our configuration to cause these
> exceptions ?

I have only heard of that error appearing in two cases.  Either the
index is corrupt, or something else deleted the file.  Are you sure
that there is only one Solr instance that accesses the directory, and
that nothing else ever touches it?

Can you reproduce the deletion issue with a small number of documents
(something that could be tested by one of us)?

-Mike






Duplicates results when using a non optimized index

2008-05-13 Thread Tim Mahy
Hi all,

is this expected behavior when having an index like this :

numDocs : 9479963
maxDoc : 12622942
readerImpl : MultiReader

which is in the process of being optimized: when we search through the index we
get this back (ids only):

15257559
15257559
17177888
11825631
11825631

The id field is declared like this (the field definition XML was stripped by the
list archive) and is set as the unique key in the schema.xml:

  <uniqueKey>id</uniqueKey>

So the question: is this expected behavior, and if so, is there a way to let
Solr only return unique documents?

greetings and thanks in advance,
Tim






RE: how to clean an index ?

2008-05-13 Thread Tim Mahy
Hi,

you can issue a delete query matching all your documents, using the query "*:*".
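
For example, posting this to the /update handler (followed by a commit) empties
the index:

  <delete><query>*:*</query></delete>
  <commit/>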

greetings,
Tim

From: Pierre-Yves LANDRON [EMAIL PROTECTED]
Sent: Tuesday, May 13, 2008 11:53
To: solr-user@lucene.apache.org
Subject: how to clean an index ?

Hello,

I want to clean an index (i.e. delete all documents), but cannot delete the index
directory.
Is it possible with the REST interface?

Thanks,

Pierre-Yves Landron

_
Explore the seven wonders of the world
http://search.msn.com/results.aspx?q=7+wonders+world&mkt=en-US&form=QBRE






RE: Duplicates results when using a non optimized index

2008-05-14 Thread Tim Mahy
Hi,

thanks for the answer,

- do duplicates go away after optimization is done?
--> no, even after the index is optimized we still get the duplicate results,
and also when we search on one of the slave servers, which have the same index
through synchronization ...
By the way, this is the first time we notice this; the only issue we have had
before was the known "too many open files" problem, which we fixed using ulimit
and a restart of the Tomcat server.

- are the duplicate IDs that you are seeing IDs of previously deleted documents?
--> it is possible that these documents were uploaded earlier and have been
replaced...

- which Solr version are you using and can you try a recent nightly?
--> we use the 1.2 stable build

greetings,
Tim

From: Otis Gospodnetic [EMAIL PROTECTED]
Sent: Wednesday, May 14, 2008 6:11
To: solr-user@lucene.apache.org
Subject: Re: Duplicates results when using a non optimized index

Hm, not sure why that is happening, but here is some info regarding other stuff 
from your email

- there should be no duplicates even if you are searching an index that is 
being optimized
- why are you searching an index that is being optimized?  It's doable, but 
people typically perform index-modifying operations on a Solr master and 
read-only operations on Solr query slave(s)
- do duplicates go away after optimization is done?
- are the duplicate IDs that you are seeing IDs of previously deleted documents?
- which Solr version are you using and can you try a recent nightly?


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 
> From: Tim Mahy <[EMAIL PROTECTED]>
> To: "solr-user@lucene.apache.org" 
> Sent: Tuesday, May 13, 2008 5:59:28 AM
> Subject: Duplicates results when using a non optimized index
>
> Hi all,
>
> is this expected behavior when having an index like this :
>
> numDocs : 9479963
> maxDoc : 12622942
> readerImpl : MultiReader
>
> which is in the process of being optimized: when we search through the index we
> get this back (ids only):
>
> 15257559
> 15257559
> 17177888
> 11825631
> 11825631
>
> The id field is declared like this (the field definition XML was stripped by
> the list archive) and is set as the unique key in the schema.xml:
>
>   <uniqueKey>id</uniqueKey>
>
> so the question : is this expected behavior and if so is there a way to let 
> Solr
> only return unique documents ?
>
> greetings and thanx in advance,
> Tim
>
>
>
>
> Please see our disclaimer, http://www.infosupport.be/Pages/Disclaimer.aspx







RE: Duplicates results when using a non optimized index

2008-05-15 Thread Tim Mahy
Hi,

yep, it is a very strange problem that we never encountered before.
We are uploading all the documents again to see if that solves the problem
(hoping that the delete will also remove the multiple instances of each document).

greetings,
Tim

From: Otis Gospodnetic [EMAIL PROTECTED]
Sent: Wednesday, May 14, 2008 23:18
To: solr-user@lucene.apache.org
Subject: Re: Duplicates results when using a non optimized index

Tim,

Hm, not sure what caused this.  1.2 is now quite old (yes, I know it's the last 
stable release), so if I were you I would consider moving to 1.3-dev.  It 
sounds like the index is already "polluted" with duplicate documents, so you'll 
want to rebuild the index whether you decide to stay with 1.2 or move to 
1.3-dev.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message ----
> From: Tim Mahy <[EMAIL PROTECTED]>
> To: "solr-user@lucene.apache.org" 
> Sent: Wednesday, May 14, 2008 3:59:23 AM
> Subject: RE: Duplicates results when using a non optimized index
>
> Hi,
>
> thanks for the answer,
>
> - do duplicates go away after optimization is done?
> --> no, even after the index is optimized we still get the duplicate results,
> and also when we search on one of the slave servers, which have the same index
> through synchronization ...
> By the way, this is the first time we notice this; the only issue we have had
> before was the known "too many open files" problem, which we fixed using ulimit
> and a restart of the Tomcat server.
>
> - are the duplicate IDs that you are seeing IDs of previously deleted documents?
> --> it is possible that these documents were uploaded earlier and have been
> replaced...
>
> - which Solr version are you using and can you try a recent nightly?
> --> we use the 1.2 stable build
>
> greetings,
> Tim
> 
> From: Otis Gospodnetic [EMAIL PROTECTED]
> Sent: Wednesday, May 14, 2008 6:11
> To: solr-user@lucene.apache.org
> Subject: Re: Duplicates results when using a non optimized index
>
> Hm, not sure why that is happening, but here is some info regarding other 
> stuff
> from your email
>
> - there should be no duplicates even if you are searching an index that is 
> being
> optimized
> - why are you searching an index that is being optimized?  It's doable, but
> people typically perform index-modifying operations on a Solr master and
> read-only operations on Solr query slave(s)
> - do duplicates go away after optimization is done?
> - are the duplicate IDs that you are seeing IDs of previously deleted documents?
> - which Solr version are you using and can you try a recent nightly?
>
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
> - Original Message 
> > From: Tim Mahy
> > To: "solr-user@lucene.apache.org"
> > Sent: Tuesday, May 13, 2008 5:59:28 AM
> > Subject: Duplicates results when using a non optimized index
> >
> > Hi all,
> >
> > is this expected behavior when having an index like this :
> >
> > numDocs : 9479963
> > maxDoc : 12622942
> > readerImpl : MultiReader
> >
> > which is in the process of being optimized: when we search through the index
> > we get this back (ids only):
> >
> > 15257559
> > 15257559
> > 17177888
> > 11825631
> > 11825631
> >
> > The id field is declared like this (the field definition XML was stripped by
> > the list archive) and is set as the unique key in the schema.xml:
> >
> >   <uniqueKey>id</uniqueKey>
> >
> > so the question : is this expected behavior and if so is there a way to let
> Solr
> > only return unique documents ?
> >
> > greetings and thanx in advance,
> > Tim
> >
> >
> >
> >
> > Please see our disclaimer, http://www.infosupport.be/Pages/Disclaimer.aspx
>
>
>
>
>
> Please see our disclaimer, http://www.infosupport.be/Pages/Disclaimer.aspx







RE: solr search

2008-05-16 Thread Tim Mahy
Hi,

1) did you perform a commit after the delete?
2) in the default schema.xml there are comments on the different analyzers,
which should help you get started, I think

greetings,
Tim

From: dharhsana [EMAIL PROTECTED]
Sent: Friday, May 16, 2008 13:56
To: solr-user@lucene.apache.org
Subject: Re: solr search

hi Umar, thanks for your reply ...

As per your suggestion I have done the search, and it worked perfectly well...

I have two more questions:
1) how do I delete a document by id in Solr?

I tried some examples but there are no changes in my index.

2) How do I choose an Analyzer for querying and indexing? For example, I have
indexed java1.5, java1.6 and java in the index. While searching, if I enter a
query like "what is java" it does not fetch any results; only if I give "java",
or "java is an object oriented language", or something like that, do I see the
result. I have no idea which analyzer I have to use.

waiting for your reply,

Thank you.

with regards
Rekha.









RE: hi umar

2008-05-16 Thread Tim Mahy
Hi,

you can send a delete query; a delete query uses the same syntax as a normal
search.

So if your id field is called "ID", you can send the query ID:"1450" instead of
*:* (which deletes everything), and that will delete only the document with id
1450.
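
As update messages that would be, for example:

  <delete><query>ID:"1450"</query></delete>

or, using the unique key value directly:

  <delete><id>1450</id></delete>

followed by a <commit/> to make the deletion visible.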

greetings,
Tim

From: dharhsana [EMAIL PROTECTED]
Sent: Friday, May 16, 2008 16:27
To: solr-user@lucene.apache.org
Subject: hi umar

hi, thank you for your reply..

As per your suggestion the index has been deleted; can you please help me out
with deleting from the index by 'ID' (but not the whole index)?


For the analyzer, I have given "text" as the field name, but I did not get the
proper loose search.

Can you give me some more examples for the analyzer...

waiting for your reply,

with regards,
T.Rekha.







bitwise comparer

2008-05-16 Thread Tim Mahy
Hi all,

is there an existing patch or feature that allows searching bitwise?

Like you would do in MySQL with "myField & 5",

which in this case would return all documents whose myField value has the 4 bit
or the 1 bit set?

greetings,
Tim






Highlighting - field criteria highlights in other fields

2008-05-20 Thread Tim Mahy
Hi all,

we have a situation in which we have documents
with an introduction (text), a body (text) and some metadata fields
(mostly integers).

when we create a query like this (broken over lines here for readability; the
hl.simple.pre/post values were stripped by the list archive):

q=( +(body_nl:( brussel) ) AND ( (+publicationid:("3430" OR "3451")) )
&fq=+publishdateAsString:[20070520 TO 20080520]
&start=0&rows=11
&hl=on&hl.fl=body_nl&hl.snippets=3&hl.fragsize=320
&hl.simple.pre=&hl.simple.post=
&sort=publishdateAsString desc,publicationname desc
&fl=id,score,introduction

we get nice highlighting from the body_nl field, but Solr also highlights 3430
and 3451 if such a "word" occurs in body_nl, while we were expecting only the
word "brussel" to be highlighted in body_nl.
So it seems that all possible query terms are highlighted in any of the given
highlighting fields. Is it possible to disable this (with some kind of
parameter or something) and only let each hl.fl field highlight the terms that
were queried against that field?
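
(For what it's worth, newer Solr versions appear to expose a standard-highlighter
switch for exactly this, hl.requireFieldMatch=true, which only highlights terms
from query clauses against the field being highlighted; I do not know whether it
exists in the version we run.)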

greetings,
Tim






Fetching the first 10 results and the last result

2008-05-21 Thread Tim Mahy
Hi all,

is there a way to let Solr not only return the total number of found articles,
but also the data of the last document, when for example we only request the
first 10 documents?

We could do this with a separate query, by either letting a second query fetch
1 row starting at position previouscount - 1, or by reversing the sort order and
fetching only the first row, but it would be nicer if we didn't have to do this
second query, so is there any solution to our problem?
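
(The second query we have in mind would look roughly like this, with numFound
taken from the first response and the placeholders filled in per request:

  ...&q=<same query>&sort=<same sort>&start=<numFound - 1>&rows=1

or, equivalently, the same query with the sort order reversed and start=0&rows=1.)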

greetings and thanks in advance,
Tim



