Highlight - get terms used by lucene
Hi all, we use highlighting and snippets for our searches. Besides those two, I would like to have a list of the terms that Lucene actually used for the highlighting, so that from a query like "Tim OR Antwerpen AND Ekeren" I can pull out the terms Antwerpen and Ekeren if, let's say, these are the only terms that gave results... Is there any way of achieving this? Greetings, Tim

Info Support - http://www.infosupport.com
All information in this e-mail message is provided without obligation. Info Support is in no way liable for mistakes or inaccuracies in this message and does not guarantee the correct and complete transmission of its contents. Unless explicitly agreed otherwise, our General Terms and Conditions, filed with the Chamber of Commerce in Utrecht under no. 30135370, apply to all work carried out by Info Support; a copy is available free of charge on request. The information in this e-mail message is intended solely for the addressee; use, disclosure, reproduction or distribution of this information by or to third parties is not permitted. This e-mail may contain confidential information; if you have received this message in error, Info Support would appreciate it if you notified the sender by replying to this e-mail and then destroyed it.
Term frequency
Hi all, is there a way to get the term frequency per found result back from Solr? Greetings, Tim
RE: Highlight - get terms used by lucene
Hi, thanks for the answer; with that information I can pull out the term frequency. The reason for all this is that we want to use this scoring algorithm: http://download-uk.oracle.com/docs/cd/B19306_01/text.102/b14218/ascore.htm. But is there a performance cost to the explain feature that can be painful for production (16 million documents), since we would have to use explain for every request? Hoping someone can answer this and help us out. Greetings, Tim

-----Original Message-----
From: Chris Hostetter [mailto:[EMAIL PROTECTED]]
Sent: Thu 27-3-2008 7:36
To: solr-user@lucene.apache.org
Subject: Re: Highlight - get terms used by lucene

: we use highlighting and snippets for our searches. Besides those two, I
: would want to have a list of terms that lucene used for the
: highlighting, so that I can pull out of a "Tim OR Antwerpen AND Ekeren"
: the following terms : Antwerpen, Ekeren if let's say these are the only
: terms that gave results ...

The closest you can get is the "explain" info in the debugging output. Currently that comes back as a big string you would need to parse, but since the topic of programmatically accessing that data seems to have come up quite a bit more than I ever really expected, I will point out that internally it's a fairly well structured class that could be output as a hierarchy of NamedLists. (Funny bit of trivia: I wrote that code once upon a time before Solr was an Apache project, but it wouldn't work because the XmlResponseWriter had a bug where it couldn't handle NamedLists more than 3 levels deep.) A patch would be fairly simple if someone wanted to write one.

-Hoss
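For reference, the debugging output Hoss mentions is enabled with Solr's standard debugQuery parameter; a request might look like this (host, port and query values are assumptions for illustration):

```
http://localhost:8983/solr/select?q=Tim+OR+Antwerpen+AND+Ekeren&debugQuery=on
```

The response then includes an "explain" section in the debug output, one large formatted string per matching document, which is what would need to be parsed.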
RE: Highlight - get terms used by lucene
Hi, Solr returns the max score and the score per document. This means that the best hit is always 100%, which is not always what you want, because the article itself could still be quite irrelevant... Greetings, Tim

-----Original Message-----
From: Chris Hostetter [mailto:[EMAIL PROTECTED]]
Sent: Fri 28-3-2008 4:34
To: solr-user@lucene.apache.org
Subject: RE: Highlight - get terms used by lucene

: thanks for the answer, with that information I can pull out the term
: frequency. Reason for all this, is that we want to use this scoring
: algorithm:
: http://download-uk.oracle.com/docs/cd/B19306_01/text.102/b14218/ascore.htm

Uh... why? Based on the description this sounds exactly like the Lucene scoring formula with some of the details glossed over... why not just use the score Solr computes for you?

-Hoss
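Tim's point can be illustrated with a small sketch (plain Python, hypothetical score values): normalizing each document's score against maxScore always makes the top hit 100%, regardless of how relevant it is in absolute terms.

```python
def relative_scores(scores):
    """Normalize raw scores against the maximum, as percentages."""
    max_score = max(scores)
    return [round(100 * s / max_score) for s in scores]

# A strong result set and a weak one normalize to the same percentages:
print(relative_scores([8.2, 4.1, 2.05]))  # -> [100, 50, 25]
print(relative_scores([0.4, 0.2, 0.1]))   # -> [100, 50, 25]
```

This is why a relative percentage says nothing about absolute relevance: the weak result set's best hit still comes out at 100%.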
Search exact terms
Hi all, is there a Solr-wide setting with which I can achieve the following: if I now search for q=onderwij, I also receive documents with results for "onderwijs" etc. This is of course the behavior that is described, but even if I search on "onderwij" as a phrase, I still get the "onderwijs" hits. I use for this field the type "text" from the schema.xml that is supplied with the default Solr. Is there a global setting in Solr to always search exact? Greetings, Tim
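The prefix matching described here typically comes from the stemming filter in the default "text" field type. As a sketch of an alternative (the type name "text_exact" and this exact filter chain are assumptions, not a prescribed answer), a field type without a stemmer keeps tokens intact so "onderwij" no longer matches "onderwijs":

```xml
<!-- Sketch: tokenized and lowercased, but no stemming filter -->
<fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

There is no single global "exact" switch; the behavior is determined per field type in schema.xml, so the field would have to be re-declared with a type like this and the documents re-indexed.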
Wildcard search + case insensitive
Hi all, I use this type definition in my schema.xml : When I have a document with the term "demo" in it and I search for dem*, I receive the document back from Solr, but when I search on Dem* I don't get the document. Is the LowerCaseFilterFactory not executed when a wildcard search is being performed? Greetings, Tim
RE: Wildcard search + case insensitive
Hi all, I already found the answer to my question on the following blog: http://michaelkimsal.com/blog/2007/04/solr-case-sensitivty/ Greetings, Tim

-----Original Message-----
From: Tim Mahy [mailto:[EMAIL PROTECTED]]
Sent: Wed 2-4-2008 13:19
To: solr-user@lucene.apache.org
Subject: Wildcard search + case insensitive

Hi all, I use this type definition in my schema.xml : When I have a document with the term "demo" in it and I search for dem*, I receive the document back from Solr, but when I search on Dem* I don't get the document. Is the LowerCaseFilterFactory not executed when a wildcard search is being performed? Greetings, Tim
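The usual explanation (consistent with the blog post's topic) is that wildcard terms bypass the field's analyzer chain, so they are never lowercased by the LowerCaseFilterFactory. A common workaround is to lowercase the wildcard term on the client before sending it; a minimal sketch (plain Python, function name made up for this example):

```python
def normalize_wildcard(term):
    """Lowercase a wildcard query term before sending it to Solr.

    Wildcard terms skip the field's analyzers, so the client must
    supply them already in the lowercased form stored in the index.
    """
    return term.lower()

print(normalize_wildcard("Dem*"))  # -> dem*
```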
Multi language, one "body" field, multi stopwords ?
Hi all, we are in the situation that we want to store documents in x number of languages, and at query time we want to query the same field, but at indexing time we want a different stopword text file to be used depending on the language of the uploaded document. I thought of creating a body field per language and using copyField to copy the contents into one common body field; this would work because I could set different type settings per "body language field", so that each can use a different stop word file. This only works, I think, if copyField does the copy after the analysis; is that the case? Greetings, Tim

The Info Support disclaimer applies to this e-mail message; see http://www.infosupport.nl/disclaimer
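A sketch of the schema arrangement described above (the field and type names are invented for this example). Note that copyField duplicates the raw input value rather than the analyzed tokens, so each destination field applies its own analyzer independently:

```xml
<!-- Per-language body fields, each type configured with its own stopword file -->
<field name="body_nl" type="text_nl" indexed="true" stored="false"/>
<field name="body_en" type="text_en" indexed="true" stored="false"/>
<field name="body"    type="text"    indexed="true" stored="true"/>

<!-- copyField copies the raw source value; the common "body" field is then
     analyzed with its own single type, not with the per-language settings -->
<copyField source="body_nl" dest="body"/>
<copyField source="body_en" dest="body"/>
```

This means the common "body" field cannot inherit per-language stopword handling from the source fields; it gets exactly one analyzer of its own.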
Delete's increase while adding new documents
Hi all, we send XML add-document messages to Solr and we notice something very strange. We autocommit at 10 documents, starting from a totally clean index (we removed the data folder). When we start uploading, we notice that docsPending is going up, but also that deletesPending is going up very fast. After reaching the first 10 we queried Solr to return everything, and the total result count was not 10 but somewhere around 77000, which is exactly 10 - docsDeleted from the stats page. We used that Solr instance before, so my question is: is it possible that Solr remembers the unique identities somewhere other than in the data folder? By the way, we stopped Solr, removed the data folder and restarted Solr, and then this behavior began... Greetings, Tim
RE: Delete's increase while adding new documents
Hi all, thank you for your reply. The ids that we send are unique, so we still have no clue what is happening :) Greetings, Tim

-----Original Message-----
From: Mike Klaas [mailto:[EMAIL PROTECTED]]
Sent: Sat 26-4-2008 1:52
To: solr-user@lucene.apache.org
Subject: Re: Delete's increase while adding new documents

On 25-Apr-08, at 4:27 AM, Tim Mahy wrote:
> Hi all,
>
> we send xml add document messages to Solr and we notice something
> very strange. We autocommit at 10 documents, starting from a total
> clean index (removed the data folder); when we start uploading we
> notice that the docsPending is going up but also that the
> deletesPending is going up very fast. After reaching the first 10 we
> queried solr to return everything and the total results count was not
> 10 but somewhere around 77000, which is exactly 10 - docsDeleted
> from the stats page.
>
> We used that Solr instance before, so my question is : is it
> possible that Solr remembers the unique identities somewhere else as
> in the data folder ? Btw we stopped Solr, removed the data folder
> and restarted Solr and then this behavior began...

Are you sure that all the documents you added were unique? (btw, deletePending doesn't necessarily mean that an old version of the doc was in the index, I think).

-Mike
RE: Delete's increase while adding new documents
Hi all, it seems that we get errors during the auto-commit:

java.io.FileNotFoundException: /opt/solr/upload/nl/archive/data/index/_4x.fnm (No such file or directory)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
        at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:501)
        at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:526)

The _4x.fnm file is not on the file system. When we switch from autocommit to manual commits through XML messages we get the same kind of errors. Any idea what could be wrong in our configuration to cause these exceptions? Greetings, Tim

____
From: Tim Mahy [EMAIL PROTECTED]
Sent: Monday 28 April 2008 12:11
To: solr-user@lucene.apache.org
Subject: RE: Delete's increase while adding new documents

Hi all, thank you for your reply. The ids that we send are unique, so we still have no clue what is happening :) Greetings, Tim

-----Original Message-----
From: Mike Klaas [mailto:[EMAIL PROTECTED]]
Sent: Sat 26-4-2008 1:52
To: solr-user@lucene.apache.org
Subject: Re: Delete's increase while adding new documents

On 25-Apr-08, at 4:27 AM, Tim Mahy wrote:
> Hi all,
>
> we send xml add document messages to Solr and we notice something
> very strange. We autocommit at 10 documents, starting from a total
> clean index (removed the data folder); when we start uploading we
> notice that the docsPending is going up but also that the
> deletesPending is going up very fast. After reaching the first 10 we
> queried solr to return everything and the total results count was not
> 10 but somewhere around 77000, which is exactly 10 - docsDeleted
> from the stats page.
>
> We used that Solr instance before, so my question is : is it
> possible that Solr remembers the unique identities somewhere else as
> in the data folder ? Btw we stopped Solr, removed the data folder
> and restarted Solr and then this behavior began...

Are you sure that all the documents you added were unique? (btw, deletePending doesn't necessarily mean that an old version of the doc was in the index, I think).

-Mike
RE: multi-language searching with Solr
Hi, you could also use multiple Solr instances, each having specific settings and stopwords etc. for the same field, upload your documents to the correct instance, and then merge the indexes into one searchable index... Greetings, Tim

____
From: Eli K [EMAIL PROTECTED]
Sent: Tuesday 6 May 2008 18:26
To: solr-user@lucene.apache.org
Subject: Re: multi-language searching with Solr

Peter,

Thanks for your help, I will prototype your solution and see if it makes sense for me.

Eli

On Mon, May 5, 2008 at 5:38 PM, Binkley, Peter <[EMAIL PROTECTED]> wrote:
> It won't make much difference to the index size, since you'll only be
> populating one of the language fields for each document, and empty
> fields cost nothing. The performance may suffer a bit, but Lucene may
> surprise you with how good it is with that kind of boolean query.
>
> I agree that as the number of fields and languages increases, this is
> going to become a lot to manage. But you're up against some basic
> problems when you try to model this in Solr: for each token, you care
> about not just its value (which is all Lucene cares about) but also its
> language and its stem; and the stem for a given token depends on the
> language (different stemming rules); and at query time you may not know
> the language. I don't think you're going to get a solution without some
> redundancy; but solving problems by adding redundant fields is a common
> method in Solr.
>
> Peter
>
> -----Original Message-----
> From: Eli K [mailto:[EMAIL PROTECTED]]
> Sent: Monday, May 05, 2008 2:28 PM
> To: solr-user@lucene.apache.org
> Subject: Re: multi-language searching with Solr
>
> Wouldn't this impact both indexing and search performance and the size
> of the index? It is also probable that I will have more than one free
> text field later on, and with at least 20 languages this approach does
> not seem very manageable. Are there other options for making this work
> with stemming?
>
> Thanks,
> Eli
>
> On Mon, May 5, 2008 at 3:41 PM, Binkley, Peter <[EMAIL PROTECTED]> wrote:
> > I think you would have to declare a separate field for each language
> > (freetext_en, freetext_fr, etc.), each with its own appropriate
> > stemming. Your ingestion process would have to assign the free text
> > content for each document to the appropriate field; so, for each
> > document, only one of the freetext fields would be populated. At
> > search time, you would either search against the appropriate field if
> > you know the search language, or search across them with
> > "freetext_fr:query OR freetext_en:query OR ...". That way your query
> > will be interpreted by each language field using that language's
> > stemming rules.
> >
> > Other options for combining indexes, such as copyfield or dynamic
> > fields (see http://wiki.apache.org/solr/SchemaXml), would lead to a
> > single field type and therefore a single type of stemming. You could
> > always use copyfield to create an unstemmed common index, if you
> > don't care about stemming when you search across languages (since
> > you're likely to get odd results when a query in one language is
> > stemmed according to the rules of another language).
> >
> > Peter
> >
> > -----Original Message-----
> > From: Eli K [mailto:[EMAIL PROTECTED]]
> > Sent: Monday, May 05, 2008 8:27 AM
> > To: solr-user@lucene.apache.org
> > Subject: multi-language searching with Solr
> >
> > Hello folks,
> >
> > Let me start by saying that I am new to Lucene and Solr.
> >
> > I am in the process of designing a search back-end for a system that
> > receives 20k documents a day and needs to keep them available for 30
> > days. The documents should be searchable on a free text field and on
> > about 8 other fields.
> >
> > One of my requirements is to index and search documents in multiple
> > languages. I would like to have the ability to stem and provide the
> > advanced search features that are based on it. This will only affect
> > the free text field, because the rest of the fields are in English.
> >
> > I can find out the language of the document before indexing and I
> > might be able to provide the language to search on. I also need to
> > have the ability to search across all indexed languages (there will
> > be 20 in total).
> >
> > Given these requirements do you think this is doable with Solr? A
> > major limiting factor is that I need to stick to the 1.2 GA version
> > and I cannot utilize the multi-core features in the 1.3 trunk.
> >
> > I considered writing my own analyzer that will call the appropriate
> > Lucene analyzer for the given language, but I did not see any way for
> > it to access the field that specifies the language of the document.
> >
> > Thanks,
> >
> > Eli
> >
> > p.s. I am looking for an experienced Lucene/Solr consultant to help
RE: Delete's increase while adding new documents
Hi all, it seems that we just post too much, too fast to Solr. When we post 100 documents (separate calls) and perform a commit, everything goes well, but as soon as we start sending thousands of documents and then use autocommit or send the commit message, we have the situation that a lot of documents are not in the index although they were sent to Solr... Does anyone have experience with how many documents you can import, and at which speed, so that Solr stays stable? We use Tomcat 5.5 and our Java memory limit is 2 GB. Greetings, Tim

____
From: Mike Klaas [EMAIL PROTECTED]
Sent: Tuesday 6 May 2008 20:17
To: solr-user@lucene.apache.org
Subject: Re: Delete's increase while adding new documents

On 6-May-08, at 4:56 AM, Tim Mahy wrote:
> Hi all,
>
> it seems that we get errors during the auto-commit :
>
> java.io.FileNotFoundException: /opt/solr/upload/nl/archive/data/index/_4x.fnm (No such file or directory)
>        at java.io.RandomAccessFile.open(Native Method)
>        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
>        at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:501)
>        at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:526)
>
> the _4x.fnm file is not on the file system. When we switch from
> autocommit to manual commits through XML messages we get the same
> kind of errors.
> Any idea what could be wrong in our configuration to cause these
> exceptions ?

I have only heard of that error appearing in two cases. Either the index is corrupt, or something else deleted the file. Are you sure that there is only one Solr instance that accesses the directory, and that nothing else ever touches it? Can you reproduce the deletion issue with a small number of documents (something that could be tested by one of us)?

-Mike

Please see our disclaimer, http://www.infosupport.be/Pages/Disclaimer.aspx
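One generic mitigation for this kind of firehose indexing (a sketch of a common pattern, not something prescribed in the thread) is to group documents into fixed-size batches and commit once per batch instead of posting thousands of single documents with one big commit at the end:

```python
def batches(docs, size):
    """Split a list of documents into fixed-size batches.

    Each batch would be posted as one XML add message, followed
    by a single commit, instead of thousands of individual posts.
    """
    return [docs[i:i + size] for i in range(0, len(docs), size)]

# e.g. 2500 documents posted as 25 batches of 100
groups = batches(list(range(2500)), 100)
print(len(groups))     # -> 25
print(len(groups[0]))  # -> 100
```

Batching reduces both the request overhead and the number of commits, which is usually where a loaded Solr instance starts falling behind.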
Duplicates results when using a non optimized index
Hi all, is this expected behavior when having an index like this:

numDocs : 9479963
maxDoc : 12622942
readerImpl : MultiReader

which is in the process of optimizing, that when we search through the index we get these ids back:

15257559
15257559
17177888
11825631
11825631

The id field is declared like this : and is set as the unique identity like this in the schema.xml : id

So the question: is this expected behavior, and if so, is there a way to let Solr only return unique documents? Greetings and thanks in advance, Tim
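Until the index itself is repaired, duplicates like these can be filtered out on the client side; a minimal sketch (plain Python, with the result rows reduced to hypothetical id-only dictionaries) that keeps the first hit for each unique id:

```python
def dedupe_by_id(results):
    """Drop duplicate result documents, keeping the first hit per id."""
    seen = set()
    unique = []
    for doc in results:
        if doc["id"] not in seen:
            seen.add(doc["id"])
            unique.append(doc)
    return unique

hits = [{"id": 15257559}, {"id": 15257559}, {"id": 17177888},
        {"id": 11825631}, {"id": 11825631}]
print([d["id"] for d in dedupe_by_id(hits)])  # -> [15257559, 17177888, 11825631]
```

This is only a workaround for display purposes; it does not fix result counts or facet numbers, which still see the duplicate documents.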
RE: how to clean an index ?
Hi, you can send a delete query matching all your documents, like the query *:*. Greetings, Tim

____
From: Pierre-Yves LANDRON [EMAIL PROTECTED]
Sent: Tuesday 13 May 2008 11:53
To: solr-user@lucene.apache.org
Subject: how to clean an index ?

Hello,

I want to clean an index (i.e. delete all documents), but I cannot delete the index directory. Is it possible with the REST interface?

Thanks,
Pierre-Yves Landron
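In Solr's XML update format, the delete-everything operation suggested above would be posted to the update handler, followed by a commit (a sketch using the standard update message syntax; whether the *:* delete query is accepted may depend on the Solr version in use):

```xml
<!-- Posted to /update: delete every document, then commit -->
<delete><query>*:*</query></delete>
<commit/>
```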
RE: Duplicates results when using a non optimized index
Hi, thanks for the answer.

- do duplicates go away after optimization is done?
--> No; if we search the index even after it is optimized, we still get the duplicate results, and even if we search on one of the slave servers, which have the same index through synchronization... By the way, this is the first time we have noticed this; the only thing we have had before was the known "too many open files" problem, which we fixed using ulimit and a restart of the Tomcat server.

- are the duplicate IDs that you are seeing IDs of previously deleted documents?
--> It is possible that these documents were uploaded earlier and have been replaced...

- which Solr version are you using and can you try a recent nightly?
--> We use the 1.2 stable build.

Greetings, Tim

____
From: Otis Gospodnetic [EMAIL PROTECTED]
Sent: Wednesday 14 May 2008 6:11
To: solr-user@lucene.apache.org
Subject: Re: Duplicates results when using a non optimized index

Hm, not sure why that is happening, but here is some info regarding other stuff from your email:

- there should be no duplicates even if you are searching an index that is being optimized
- why are you searching an index that is being optimized? It's doable, but people typically perform index-modifying operations on a Solr master and read-only operations on Solr query slave(s)
- do duplicates go away after optimization is done?
- are the duplicate IDs that you are seeing IDs of previously deleted documents?
- which Solr version are you using and can you try a recent nightly?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
> From: Tim Mahy <[EMAIL PROTECTED]>
> To: "solr-user@lucene.apache.org"
> Sent: Tuesday, May 13, 2008 5:59:28 AM
> Subject: Duplicates results when using a non optimized index
>
> Hi all,
>
> is this expected behavior when having an index like this :
>
> numDocs : 9479963
> maxDoc : 12622942
> readerImpl : MultiReader
>
> which is in the process of optimizing, that when we search through the index we
> get this :
>
> 15257559
> 15257559
> 17177888
> 11825631
> 11825631
>
> The id field is declared like this :
> and is set as the unique identity like this in the schema xml :
> id
>
> so the question : is this expected behavior and if so is there a way to let Solr
> only return unique documents ?
>
> greetings and thanks in advance,
> Tim
RE: Duplicates results when using a non optimized index
Hi, yep it is a very strange problem that we never encountered before. We are uploading all the documents again to see if that solves the problem (hoping that the delete will delete also the multiple document instances) greetings, Tim Van: Otis Gospodnetic [EMAIL PROTECTED] Verzonden: woensdag 14 mei 2008 23:18 Aan: solr-user@lucene.apache.org Onderwerp: Re: Duplicates results when using a non optimized index Tim, Hm, not sure what caused this. 1.2 is now quite old (yes, I know it's the last stable release), so if I were you I would consider moving to 1.3-dev. It sounds like the index is already "polluted" with duplicate documents, so you'll want to rebuild the index whether you decide to stay with 1.2 or move to 1.3-dev. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message ---- > From: Tim Mahy <[EMAIL PROTECTED]> > To: "solr-user@lucene.apache.org" > Sent: Wednesday, May 14, 2008 3:59:23 AM > Subject: RE: Duplicates results when using a non optimized index > > Hi, > > thanks for the answer, > > - do duplicates go away after optimization is done? > --> no, if we search the index even after it is optimized, we still get the > duplicate results and even if we search on one of the slaves servers which > have > the same index through synchronization ... > btw this is the first time we notice this, the only thing we have had was the > known problem with the "too many open files" which we fixed using the ulimit > and > rebooted the tomcat server > > - do duplicate IDs that you are seeing IDs of previously deleted documents? > --> it is possible that these documenst were uploaded earlier and have been > replaced... > > - which Solr version are you using and can you try a recent nightly? 
> --> we use the 1.2 stable build > > greetings, > Tim > > Van: Otis Gospodnetic [EMAIL PROTECTED] > Verzonden: woensdag 14 mei 2008 6:11 > Aan: solr-user@lucene.apache.org > Onderwerp: Re: Duplicates results when using a non optimized index > > Hm, not sure why that is happening, but here is some info regarding other > stuff > from your email > > - there should be no duplicates even if you are searching an index that is > being > optimized > - why are you searching an index that is being optimized? It's doable, but > people typically perform index-modifying operations on a Solr master and > read-only operations on Solr query slave(s) > - do duplicates go away after optimization is done? > - do duplicate IDs that you are seeing IDs of previously deleted documents? > - which Solr version are you using and can you try a recent nightly? > > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > - Original Message > > From: Tim Mahy > > To: "solr-user@lucene.apache.org" > > Sent: Tuesday, May 13, 2008 5:59:28 AM > > Subject: Duplicates results when using a non optimized index > > > > Hi all, > > > > is this expected behavior when having an index like this : > > > > numDocs : 9479963 > > maxDoc : 12622942 > > readerImpl : MultiReader > > > > which is in the process of optimizing that when we search through the index > > we > > get this : > > > > > > 15257559 > > > > > > 15257559 > > > > > > 17177888 > > > > > > 11825631 > > > > > > 11825631 > > > > > > The id field is declared like this : > > > > > > and is set as the unique identity like this in the schema xml : > > id > > > > so the question : is this expected behavior and if so is there a way to let > Solr > > only return unique documents ? 
> >
> > greetings and thanks in advance,
> > Tim

Please see our disclaimer, http://www.infosupport.be/Pages/Disclaimer.aspx
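Re-uploading all documents, as the thread above suggests, cleans up the duplicates because a Solr `<add>` command replaces any stored document that has the same uniqueKey. A minimal Python sketch of building such an add command (the field name besides "id" is made up for illustration; posting it to the `/update` handler and committing is left as comments):

```python
from xml.sax.saxutils import escape

def add_command(doc: dict) -> str:
    """Build a Solr <add> XML body. Posting it to /update (followed by
    <commit/>) replaces any existing document with the same uniqueKey."""
    fields = "".join(
        f'<field name="{name}">{escape(str(value))}</field>'
        for name, value in doc.items()
    )
    return f"<add><doc>{fields}</doc></add>"

# Re-adding id 15257559 would overwrite the duplicated copies:
body = add_command({"id": 15257559, "body_text": "re-uploaded content"})
```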
RE: solr search
Hi,

1) Did you perform a commit after the delete?
2) In the default schema there are some comments on the different analyzers which should help you get started, I think.

greetings,
Tim

From: dharhsana [EMAIL PROTECTED]
Sent: Friday, May 16, 2008 13:56
To: solr-user@lucene.apache.org
Subject: Re: solr search

hi umar,

thanks for your reply. As per your suggestion I have done the search, and it worked perfectly. I have two more questions:

1) How do I delete a document by id in Solr? I tried some examples but there were no changes in my index.
2) How do I use an Analyzer for querying and indexing? For example, I have indexed java1.5, java1.6 and java. While searching, if I enter a query like "what is java", it does not fetch any results; only if I give "java", or "java is an object oriented language", or something like that, do I see results. I have no idea which analyzer I have to use.

waiting for your reply,
Thank you.

with regards,
Rekha.

--
View this message in context: http://www.nabble.com/solr-search-tp17249602p17273000.html
Sent from the Solr - User mailing list archive at Nabble.com.
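Whether a query like "what is java" matches depends on the analyzer chain of the field being searched. As a hedged sketch in the spirit of the example schema.xml that ships with Solr (the field-type name and stopwords file are illustrative, not taken from the thread), a stop filter is typically what removes common words like "what" and "is", leaving only "java" to match:

```xml
<!-- Sketch of a text field type. The StopFilter drops common words such as
     "what" and "is", so "what is java" effectively reduces to "java". -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Using the same analyzer chain for indexing and querying, as above, keeps the two sides consistent.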
RE: hi umar
Hi,

you can send a delete query; a delete query uses the same syntax as a normal search. So if your id field is called "ID", you can send as the query:

ID:"1450"

instead of *:* (which would delete everything). In this example, that deletes the document with id 1450.

greetings,
Tim

From: dharhsana [EMAIL PROTECTED]
Sent: Friday, May 16, 2008 16:27
To: solr-user@lucene.apache.org
Subject: hi umar

hi,

thank you for your reply. As per your suggestion the index has been deleted; can you please help me out with deleting from the index by "ID" (but not the whole index)?

For the analyzer, I have given "text" as the field name, but I did not get the proper loose search. Can you give me some more examples for the analyzer?

waiting for your reply,

with regards,
T.Rekha.

--
View this message in context: http://www.nabble.com/hi-umar-tp17276060p17276060.html
Sent from the Solr - User mailing list archive at Nabble.com.
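The delete-by-query described above is sent to Solr's update handler as an XML command, followed by a commit. A minimal Python sketch (the host, port and core path are illustrative; the actual HTTP posting requires a running server and is therefore left as comments):

```python
import urllib.request

SOLR_UPDATE_URL = "http://localhost:8983/solr/update"  # illustrative host/port

def delete_command(field: str, value: str) -> bytes:
    """Build the XML body for a delete-by-query, e.g. ID:"1450"."""
    return f'<delete><query>{field}:"{value}"</query></delete>'.encode("utf-8")

def post_update(body: bytes) -> None:
    """POST an update command to Solr (requires a running server)."""
    req = urllib.request.Request(
        SOLR_UPDATE_URL, data=body, headers={"Content-Type": "text/xml"}
    )
    urllib.request.urlopen(req).read()

# Delete document 1450 by its ID field, then commit so the change is visible:
#   post_update(delete_command("ID", "1450"))
#   post_update(b"<commit/>")
```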
bitwise comparer
Hi all,

is there any existing patch or feature which allows searching bitwise? Like you would do in MySQL with "myField & 5", which in this case returns all the documents which have myField values 4 and 1 (any value sharing a bit with the mask)?

greetings,
Tim
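As far as I know there is no built-in bitwise operator here, so anything beyond the question itself is speculative. For clarity, a small Python sketch of the MySQL `myField & 5` semantics, plus one possible workaround idea of indexing each set bit as its own token so that a plain OR query can emulate the mask (the field and token names are made up):

```python
def matches_mask(value: int, mask: int) -> bool:
    """MySQL-style "myField & mask" as a boolean filter: true when the value
    shares at least one bit with the mask (5 = binary 101 = bits 0 and 2)."""
    return (value & mask) != 0

def bit_tokens(value: int) -> list[str]:
    """Workaround idea: at index time, store one token per set bit
    ("bit_0", "bit_2", ...); then "bits:bit_0 OR bits:bit_2" emulates "& 5"."""
    return [f"bit_{i}" for i in range(value.bit_length()) if (value >> i) & 1]

docs = [{"id": 1, "myField": 4}, {"id": 2, "myField": 1}, {"id": 3, "myField": 2}]
hits = [d["id"] for d in docs if matches_mask(d["myField"], 5)]
# 4 & 5 = 4 and 1 & 5 = 1, while 2 & 5 = 0, so ids 1 and 2 match
```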
Highlighting - field criteria highlights in other fields
Hi all,

we have a situation in which we have documents that have an introduction (text), a body (text) and some metadata fields (mostly integers). When we create a query like this:

q=( +(body_nl:( brussel) ) AND ( (+publicationid:("3430" OR "3451")) )&fq=+publishdateAsString:[20070520 TO 20080520]&start=0&rows=11&hl=on&hl.fl=body_nl&hl.snippets=3&hl.fragsize=320&hl.simple.pre=&hl.simple.post=&sort=publishdateAsString desc,publicationname desc&fl=id,score,introduction

we get nice highlighting from the body_nl field, but Solr also highlights 3430 and 3451 if there is such a "word" in body_nl, while we were expecting only to get highlighting for the word "brussel" in body_nl. So it seems that all possible criteria terms are highlighted in any of the given highlighting fields. Is it possible to disable this (with some kind of parameter or something) and only let each hl.fl field highlight the criteria for its own field?

greetings,
Tim
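One thing worth checking, offered as an assumption rather than a confirmed fix for this particular build: the highlighter parameter hl.requireFieldMatch, which, where supported, restricts highlighting in each field to query terms that were actually searched against that field. A Python sketch that only builds the request query string (host and request handler are omitted):

```python
from urllib.parse import urlencode

# Sketch: the request from above with hl.requireFieldMatch added. Whether the
# parameter exists in this particular Solr build is an assumption to verify;
# when supported, it keeps publicationid terms out of the body_nl snippets.
params = {
    "q": '+(body_nl:(brussel)) AND (+publicationid:("3430" OR "3451"))',
    "fq": "+publishdateAsString:[20070520 TO 20080520]",
    "start": "0",
    "rows": "11",
    "hl": "on",
    "hl.fl": "body_nl",
    "hl.snippets": "3",
    "hl.fragsize": "320",
    "hl.requireFieldMatch": "true",
    "fl": "id,score,introduction",
}
query_string = urlencode(params)
```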
Fetching the first 10 results and the last result
Hi all,

is there a way to let Solr not only return the total number of found documents, but also the data of the last document, when for example only requesting the first 10 documents? We could do this with a separate query, either by letting the second query fetch 1 row from position = total count - 1, or by reversing the sort and fetching only the first row, but it would be nicer if we did not need this second query. So, is there any solution to our problem?

greetings and thanks in advance,
Tim
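The two-query workaround described above can be sketched in Python. Only the query strings are built here; the parameter names follow the standard select syntax, and numFound is assumed to come from the first response:

```python
from urllib.parse import urlencode

def first_page_params(q: str, sort: str, rows: int = 10) -> str:
    """First query: the usual first page; the response also carries numFound."""
    return urlencode({"q": q, "sort": sort, "start": 0, "rows": rows})

def last_doc_params(q: str, sort: str, num_found: int) -> str:
    """Second query: fetch the single row at position numFound - 1, i.e. the
    last document under the same sort order as the first query."""
    return urlencode({"q": q, "sort": sort, "start": num_found - 1, "rows": 1})

# With numFound = 9479963 from the first response, the follow-up request
# carries start=9479962 and rows=1.
```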