Sort results on a field not ordered
I have some data which I am querying with

?q=aseptic technique&fl=id,score,chapterTitle&sort=chapterTitle asc&rows=200

The results are being reordered (at least they are no longer in score order), but the order makes no sense:

Communication
Drug administration: cytotoxic drugs
Elimination: bowel care
Elimination: bladder lavage and irrigation
The context of nursing
Observations
Pain management and assessment
Pain management: Entonox administration
Pain management: epidural and intrathecal analgesia
Abdominal paracentesis
Barrier nursing: nursing the infectious or immunosuppressed patient
Perioperative care
Personal hygiene: eye care
Personal hygiene: mouth care
Discharge planning
Positioning
Drug administration: general principles
Haematological procedures
Assessment and the process of care
Cardiopulmonary resuscitation
Scalp cooling
Breast aspiration and seroma drainage
Specimen collection for microbiological analysis
Spinal cord compression management
Elimination: stoma care
Nutritional support
Aseptic technique
Compression therapy in the management of lymphoedema
Gene therapy for the management of cancer
Radioactive source therapy: sealed sources
Transfusion of blood and blood products
The unconscious patient
Radioactive source therapy and diagnostic procedures: unsealed sources
Elimination: continent urinary diversions
Elimination: urinary
Vascular access devices: insertion and management
Venepuncture
Renal replacement therapy: peritoneal dialysis and continuous venovenous haemodiafiltration
Violence: prevention and management
Tracheostomy care and laryngectomy care (including voice rehabilitation)
Wound management

Can anyone shed any light on this?
Re: How to extract terms associated with a field
Thanks a lot. I was able to extract all the terms in a field for any query. I was also wondering how I can extract nearest-term info for autocomplete-style suggestions. In one of my earlier posts I asked the same kind of question related to faceted search. -RB

On 4/27/08, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> Take a look at the LukeRequestHandler ... it can list all the terms in a
> field (or in many fields) along with their frequencies.
>
> -Hoss
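For reference, a request along those lines would go to the LukeRequestHandler; the host, port and field name below are illustrative assumptions only, and parameter support can vary by Solr version:

    http://localhost:8983/solr/admin/luke?fl=title&numTerms=25

This asks Luke to report the top 25 terms (with their frequencies) for the title field.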
Re: Sort results on a field not ordered
What field type is chapterTitle? I'm betting it is an analyzed field with multiple values (tokens/terms) per document. To successfully sort, you'll need to have a single value per document - using copyField can help with this to have both a searchable field and a sortable version.

Erik

On May 2, 2008, at 6:42 AM, Peter Hickman wrote:

I have some data which I am querying with ?q=aseptic technique&fl=id,score,chapterTitle&sort=chapterTitle asc&rows=200 The results are being reordered (at least they are no longer in score order) but the order makes no sense: [...] Can anyone shed any light on this?
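A minimal sketch of Erik's copyField suggestion - the field and type names here are assumptions for illustration, not taken from the poster's schema - adds a single-valued string copy of the title and sorts on that:

    <field name="chapterTitle" type="text" indexed="true" stored="true"/>
    <field name="chapterTitleSort" type="string" indexed="true" stored="false"/>
    <copyField source="chapterTitle" dest="chapterTitleSort"/>

Queries would then keep searching on chapterTitle but use sort=chapterTitleSort asc.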
Re: Sort results on a field not ordered
Erik Hatcher wrote: What field type is chapterTitle? I'm betting it is an analyzed field with multiple values (tokens/terms) per document. To successfully sort, you'll need to have a single value per document - using copyField can help with this to have both a searchable field and a sortable version.

does this apply to facet fields as well? I noticed that if I set facet.sort="true" the results are indeed sorted by count... until the counts are the same, after which they are in random order (instead of ascii alpha).

--Geoff
Re: Sort results on a field not ordered
Erik Hatcher wrote:
> What field type is chapterTitle? I'm betting it is an analyzed
> field with multiple values (tokens/terms) per document. To
> successfully sort, you'll need to have a single value per document -
> using copyField can help with this to have both a searchable field
> and a sortable version.
>
> Erik

The relevant parts of schema.xml are: ... which uses this definition of the field type: ... I believe these are the 'out of the box' settings, but I could be wrong. Is this of any help?
Re: Sort results on a field not ordered
OK, thanks - it seems I have found the answer. Changing to a string field type fixes the problem. As we do not search over the chapterTitle field, this is no loss. Thanks again for pointing me in the right direction.
Re: Zappos's new Solr Site
Hey Matt - congratulations on your new site -- it looks great. I'm curious, after a few weeks of having run this way, what your findings are regarding running the shared index on NFS. Any problems as of yet? I assume you're indexing from one machine and calling 'commit' on the others on some schedule to get them to 'see' changes. How is that working out for you?

--- Alok K. Dhir [EMAIL PROTECTED] Symplicity Corporation 1 703 351 0200 x 8080 www.symplicity.com

On Apr 11, 2008, at 1:35 PM, Matthew Runo wrote:

Hello folks! First, the link: https://zeta.zappos.com (it's a very early open beta... we're just very proud of everyone's work and wanted to share it with you all) We've been working on a new site here at Zappos for about the last 7 months, with planning going back almost two years. We looked at Endeca, we looked at Fast, we looked at so many commercial search engine technologies in that time that I can't even remember them all. We ended up choosing Solr, and not just because it's free. Solr has a truly wonderful group of users here who respond to support questions far faster than most paid support contracts. I've never had a question that I couldn't get answered on this list, no matter how stupid it's been (sorry Hoss!) =p Zappos has a long history of using open source technologies to drive its business, and has used Apache 1.3 + Perl 5 for the past 8 years. Our new site is written in Java, and is really built around our Solr index. Solr powers all the navigation and facets, as well as the brand list and brand pages. One of the issues with our old site was how database-heavy it was, with some pages generating hundreds of queries. Zeta is much better in this regard, and we really think Solr is going to serve us very well. Here are some stats on our Solr index... 158,821 documents in about 2 gigs of disk space, running in Tomcat 6 with 10 gigs of RAM set aside. We have 5 servers clustered together, and each runs an instance of zeta.zappos.com and a local copy of Solr. For now, each of these servers reads from a single Solr index stored on NFS - we'll see how this works out, and are prepared to store a local copy of the index on each server. Thanks, and we'd love any feedback on the new site (keep in mind, some parts of it aren't quite done).

Matthew Runo Software Developer Zappos.com 702.943.7833
Re: Sort results on a field not ordered
On Fri, May 2, 2008 at 8:17 AM, Geoffrey Young <[EMAIL PROTECTED]> wrote:
> does this apply to facet fields as well? I noticed that if I set
> facet.sort="true" the results are indeed sorted by count... until the counts
> are the same, after which they are in random order (instead of ascii alpha).

facet.sort should be the default. Ties in count are broken by order in the term index (not random). This should correspond to alphabetical (ascii) order.

-Yonik
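As a concrete illustration of the behaviour Yonik describes (the host, port and field name are assumptions, not taken from the thread), a count-sorted facet request looks like:

    http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=chapterTitle&facet.sort=true

Facet values with equal counts should then come back in term-index (roughly ascii) order rather than in arbitrary order.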
sometimes, snapshooter doesn't work
Hi, Here is the config info in solrconfig.xml: /opt/solr-jetty/solr/bin/snapshooter /opt/solr-jetty/solr/data true arg1 arg2 MYVAR=val1 /opt/solr-jetty/solr/bin/snapshooter /opt/solr-jetty/solr/data true Our system updates the solr index once daily. Mostly, the snapshooter works. Last night, it did not work. Thanks, Feng
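The XML tags in the config above were eaten by the list archive; for reference, a postCommit/postOptimize listener pair of the kind shipped in the stock solrconfig.xml - shown here as a hedged reconstruction using the values quoted above, not necessarily the poster's exact config - looks like:

    <listener event="postCommit" class="solr.RunExecutableListener">
      <str name="exe">/opt/solr-jetty/solr/bin/snapshooter</str>
      <str name="dir">/opt/solr-jetty/solr/data</str>
      <bool name="wait">true</bool>
      <arr name="args"> <str>arg1</str> <str>arg2</str> </arr>
      <arr name="env"> <str>MYVAR=val1</str> </arr>
    </listener>

    <listener event="postOptimize" class="solr.RunExecutableListener">
      <str name="exe">/opt/solr-jetty/solr/bin/snapshooter</str>
      <str name="dir">/opt/solr-jetty/solr/data</str>
      <bool name="wait">true</bool>
    </listener>

The listener runs the snapshooter script after each commit (and optimize), which is what should have produced the missing snapshot.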
Re: sometimes, snapshooter doesn't work
Hi Feng,

That's not enough information for anyone to help. You should have a look at the snapshooter log. Here are some other ideas:
- What does "did not work" mean? No snapshot was created?
- Is your disk/partition full?
- Can you run commit or optimize now and see if snapshooter will work?

Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message
> From: Feng Gao <[EMAIL PROTECTED]>
> To: "solr-user@lucene.apache.org"
> Sent: Friday, May 2, 2008 5:09:49 PM
> Subject: sometimes, snapshooter doesn't work
>
> Hi,
>
> Here is the config info in solrconfig.xml:
> [...]
> Our system updates the solr index once daily. Mostly, the snapshooter works.
> Last night, it did not work.
>
> Thanks,
> Feng
Re: Master / slave setup with multicore
snapinstall calls commit to trigger Solr to use the new index. Do you see the commit request in your Solr log? Anything in the snapinstaller log?

Bill

On Thu, May 1, 2008 at 8:35 PM, James Brady <[EMAIL PROTECTED]> wrote:
> Hi Ryan, thanks for that!
>
> I have one outstanding question: when I take a snapshot on the master,
> snappull and snapinstall on the slave, the new index is not being used;
> restarting the slave server does pick up the changes, however.
>
> Has anyone else had this problem with recent development builds?
>
> In case anyone is trying to do multicore replication, here are some of the
> things I've done to get it working. These could go on the wiki somewhere -
> what do people think?
>
> Obviously, having as much shared configuration as possible is ideal. On the
> master, I have core-specific:
> - scripts.conf, for webapp_name, master_data_dir and master_status_dir
> - solrconfig.xml, for the post-commit and post-optimise snapshooter
> locations
>
> On the slave, I have core-specific:
> - scripts.conf, as above
>
> I've also customised snappuller to accept a different rsync module name
> (hard coded to 'solr' at present). This module name is set in the slave
> scripts.conf
>
> James
>
> On 29 Apr 2008, at 13:44, Ryan McKinley wrote:
> > On Apr 29, 2008, at 3:09 PM, James Brady wrote:
> > > Hi all,
> > > I'm aiming to use the new multicore features in development versions
> > > of Solr. My ideal setup would be to have master / slave servers on the
> > > same machine, snapshotting across from the 'write' to the 'read' server
> > > at intervals.
> > >
> > > This was all fine with Solr 1.2, but the rsync & snappuller
> > > configuration doesn't seem to be set up to allow for multicore
> > > replication in 1.3.
> > >
> > > The rsyncd.conf file allows for several data directories to be
> > > defined, but the snappuller script only handles a single directory,
> > > expecting the Lucene index to be directly inside that directory.
> > >
> > > What's the best practice / best suggestions for replicating a
> > > multicore update server out to search servers?
> >
> > Currently, for multicore replication you will need to install the snap*
> > scripts for _each_ core. The scripts all expect a single core so for
> > multiple cores, you will just need to install it multiple times.
> >
> > ryan
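For anyone following along, a per-core scripts.conf along the lines James describes might look roughly like this - the hostnames, ports and paths are invented for illustration and are not taken from his setup:

    user=solr
    solr_hostname=localhost
    solr_port=8983
    rsyncd_port=18983
    data_dir=/opt/solr/core0/data
    webapp_name=solr/core0
    master_host=master.example.com
    master_data_dir=/opt/solr/core0/data
    master_status_dir=/opt/solr/core0/logs/clients

Each core would get its own copy of this file, along with its own installed snap* scripts, as Ryan notes above.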
RE: sometimes, snapshooter doesn't work
Hi Otis,

Thanks. Before I sent my first email to solr-user, I checked the following:
1. The disk is not full; there is 40G available.
2. No snapshot was created last midnight under the solr/data folder.
3. I checked the log and there is no new log entry for the snapshooter.
4. I ran the snapshooter command manually at 2008/05/02 11:04:00, and the snapshot was created.

Here is part of the log:
---
2008/05/01 05:30:50 started by solr
2008/05/01 05:30:50 command: /opt/solr-jetty/solr/bin/snapshooter
2008/05/01 05:30:50 taking snapshot /opt/solr-jetty/solr/data/snapshot.20080501053050
2008/05/01 05:30:50 ended (elapsed time: 0 sec)
2008/05/01 19:53:20 started by solr
2008/05/01 19:53:20 command: /opt/solr-jetty/solr/bin/snapshooter arg1 arg2
2008/05/01 19:53:20 taking snapshot /opt/solr-jetty/solr/data/snapshot.20080501195320
2008/05/01 19:53:20 ended (elapsed time: 0 sec)
2008/05/02 11:04:00 started by solr
2008/05/02 11:04:00 command: ./snapshooter
2008/05/02 11:04:00 taking snapshot /opt/solr-jetty/solr/data/snapshot.20080502110400
2008/05/02 11:04:00 ended (elapsed time: 0 sec)
---

After 2008/05/01 19:53:20, I am sure that we committed and optimized once, around 2008/05/02 00:53:00. I checked the log of our program which sends the commit and optimize to Solr: there is no exception, and I can see the commit and optimize being sent to Solr. So I don't think the problem is in Solr - there is probably a bug in our program... though I don't really think so either (really??) - checking again. The program and Solr have been working fine for a few months; this is the second time I have run into this kind of problem.

Thanks,
Feng

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: May 2, 2008 11:46 AM
To: solr-user@lucene.apache.org
Subject: Re: sometimes, snapshooter doesn't work
[...]
Re: Multiple open SegmentReaders?
Hah, thank you for doing this. Sometimes I see MultiSegmentReaders, sometimes SegmentReaders, so both show up from time to time. Right now we've got two MultiSegmentReaders open.

Thanks!

Matthew Runo Software Developer Zappos.com 702.943.7833

On May 1, 2008, at 7:19 PM, Koji Sekiguchi wrote:

I can reproduce with the solr/example setup. What I did:
1. $ svn co http://svn.apache.org/repos/asf/lucene/solr/trunk TEMP
2. $ cd TEMP
3. $ ant clean example
4. $ cd example
5. $ java -jar start.jar
(to post and commit)
6. $ cd $SOLR_HOME/example/exampledocs
7. $ ./post.sh

then see admin > statistics. I can see MultiSegmentReader instead of SegmentReader, though.

name: [EMAIL PROTECTED]
main class: org.apache.solr.search.SolrIndexSearcher
version: 1.0
description: index searcher
stats:
caching : true
numDocs : 0
maxDoc : 0
readerImpl : MultiSegmentReader
readerDir : [EMAIL PROTECTED]:\Project\jakarta\lucene\solr\TEMP\example\solr\data\index
indexVersion : 1209693930226
openedAt : Fri May 02 11:05:30 JST 2008
registeredAt : Fri May 02 11:05:30 JST 2008

name: [EMAIL PROTECTED]
main class: org.apache.solr.search.SolrIndexSearcher
version: 1.0
description: index searcher
stats:
caching : true
numDocs : 0
maxDoc : 0
readerImpl : MultiSegmentReader
readerDir : [EMAIL PROTECTED]:\Project\jakarta\lucene\solr\TEMP\example\solr\data\index
indexVersion : 1209693930226
openedAt : Fri May 02 11:06:13 JST 2008
registeredAt : Fri May 02 11:06:13 JST 2008

Koji

Yonik Seeley wrote:
Hmmm, if there is a bug, odds are it's due to multicore stuff - probably nothing else has touched core stuff like that recently. Can you reproduce (or rather help others to reproduce) with the solr/example setup?
-Yonik

On Wed, Apr 30, 2008 at 5:39 PM, Matthew Runo <[EMAIL PROTECTED]> wrote:
Hello! In using the SVN head version of Solr, I've found that recently we started getting multiple open SegmentReaders, all registered... etc..
Any ideas why this would happen? They don't go away unless the server is restarted, and don't go away with commits, etc. In fact, commits seem to cause the issue. They're causing issues since really stale searchers stay around...
For example, right now...

org.apache.solr.search.SolrIndexSearcher
caching : true
numDocs : 153312
maxDoc : 153324
readerImpl : SegmentReader
readerDir : org.apache.lucene.store.FSDirectory@/opt/solr/data/index
indexVersion : 1205944085143
openedAt : Wed Apr 30 14:04:15 PDT 2008
registeredAt : Wed Apr 30 14:04:15 PDT 2008

(and right below that one...)

org.apache.solr.search.SolrIndexSearcher
caching : true
numDocs : 153312
maxDoc : 153324
readerImpl : SegmentReader
readerDir : org.apache.lucene.store.FSDirectory@/opt/solr/data/index
indexVersion : 1205944085143
openedAt : Wed Apr 30 14:30:02 PDT 2008
registeredAt : Wed Apr 30 14:30:02 PDT 2008

Thanks!

Matthew Runo Software Developer Zappos.com 702.943.7833
Re: Master / slave setup with multicore
Ah, wait, my fault - I didn't have the right Solr port configured in the slave, so snapinstaller was committing the master :/

Thanks,
James

On 2 May 2008, at 09:17, Bill Au wrote:

snapinstall calls commit to trigger Solr to use the new index. Do you see the commit request in your Solr log? Anything in the snapinstaller log?

Bill

[...]
Re: Zappos's new Solr Site
We have a dedicated server set up as the "master", with its own local index. We have an NFS mount (read-only) on each of the other machines, to which the master copies its index every 20 minutes. We then run a commit on each "slave" to force them to open new readers. So far, it's worked fine. I would suggest having the reading and writing done to different indexes though - it makes it easier when you can have a read-only NFS-mounted index (no chance of another server updating it at all).

Thanks!

Matthew Runo Software Developer Zappos.com 702.943.7833

On May 2, 2008, at 6:41 AM, Alok Dhir wrote:

Hey Matt - congratulations on your new site -- it looks great. I'm curious, after a few weeks of having run this way, what your findings are regarding running the shared index on NFS. Any problems as of yet? I assume you're indexing from one machine and calling 'commit' on the others on some schedule to get them to 'see' changes. How is that working out for you?

--- Alok K. Dhir [EMAIL PROTECTED] Symplicity Corporation 1 703 351 0200 x 8080 www.symplicity.com

[...]
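A rough sketch of that kind of scheduled slave commit (the host, port and schedule are illustrative assumptions, not Zappos's actual configuration) is a cron entry on each slave that posts a commit to its local Solr:

    */20 * * * * curl -s http://localhost:8983/solr/update -H 'Content-Type: text/xml' --data-binary '<commit/>' > /dev/null

The commit makes the slave's Solr open a new IndexReader, so it picks up whatever the master last copied onto the NFS mount.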
Re: Multiple open SegmentReaders?
On Fri, May 2, 2008 at 1:08 PM, Matthew Runo <[EMAIL PROTECTED]> wrote:
> Hah, thank you for doing this. Sometimes I see MultiSegmentReaders,
> sometimes SegmentReaders, so both show up from time to time. Right now we've
> got two MultiSegmentReaders open.

OK, this implies there's a leak and the initial searcher that is opened never gets closed. Could you open a JIRA issue for this?

-Yonik

[...]
Too many open files
I'm getting this with Solr 1.2 trying to load a large db. Is there a workaround?
Re: Multiple open SegmentReaders?
This bug was introduced in SOLR-509 (committed April 17th). I'm working on a fix now.

-Yonik

On Fri, May 2, 2008 at 2:32 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On Fri, May 2, 2008 at 1:08 PM, Matthew Runo <[EMAIL PROTECTED]> wrote:
> > Hah, thank you for doing this. Sometimes I see MultiSegmentReaders,
> > sometimes SegmentReaders, so both show up from time to time. Right now we've
> > got two MultiSegmentReaders open.
>
> OK, this implies there's a leak and the initial searcher that is
> opened never gets closed.
> Could you open a JIRA issue for this?
>
> -Yonik
>
> [...]
Re: Too many open files
I'm not sure what "large db" you are referring to (indexing an RDBMS into Solr?), but the first thing to do is run ulimit -a (or some flavour of it, depending on the OS) and increase the open file descriptors limit if the one you see there is very low (e.g. 1024). If that limit is not low, make sure that things are getting closed properly by your app, so there are no file descriptor leaks. Also, make sure mergeFactor is not ridiculously high.

Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message
> From: "Wagner,Harry" <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Friday, May 2, 2008 9:04:56 PM
> Subject: Too many open files
>
> I'm getting this with Solr 1.2 trying to load a large db. Is there a
> workaround?
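For illustration only (the limit and mergeFactor values below are arbitrary examples, not tuned recommendations), checking and raising the descriptor limit from a shell looks like:

    ulimit -a          # show current limits, including "open files"
    ulimit -n 8192     # raise the open-file limit for this shell/session

and mergeFactor is set in the indexDefaults / mainIndex sections of solrconfig.xml, e.g.:

    <mergeFactor>10</mergeFactor>

A lower mergeFactor means fewer index segments, and therefore fewer files held open at once.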
Distributed Search (shard) w/ Multicore?
Hi,

I'm trying to figure out if I can do this or if something else needs to be set: trying to run a query over multiple cores with the shards param. I seem to be getting the correct number of results back but no data ... any ideas?

Thanks.

- Jon
Re: Distributed Search (shard) w/ Multicore?
On Fri, May 2, 2008 at 3:36 PM, Jon Baer <[EMAIL PROTECTED]> wrote: > Im trying to figure out if I can do this or if something else needs to be > set, trying to run a query over multiple cores w/ the shard param? I seem > to be getting the correct number of results back but no data ... any ideas? Should work OK (note that schemas should match across cores... distributed search is not federated search). You might need to be a little more explicit about what you are sending and what you are getting back (the actual URL of the request, and the actual XML of the response). -Yonik
Re: Shared index base
Here's another question on this rather old thread -- while poring through various options in solrconfig, I came across the 'native' lockType option. That seems to indicate that Solr/Lucene should work fine with multiple writers, as long as a proper locking mechanism is in place, such as would be provided by a POSIX-compliant cluster file system (GPFS, GFS, Ibrix, OCFS2...). A single shared index, multiple readers/writers, as long as the underlying filesystem implements fs locks properly. Is this correct?

--- Alok K. Dhir [EMAIL PROTECTED] Symplicity Corporation 1 703 351 0200 x 8080 www.symplicity.com

On Feb 27, 2008, at 3:10 AM, Otis Gospodnetic wrote:

Alok: correct - commit causes Solr to re-open the index.

Gene: That should work just fine. While you can't have multiple concurrent writers, you can send multiple concurrent indexing requests to a single Solr instance designated to be the master.

Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message From: Alok K. Dhir <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Tuesday, February 26, 2008 7:51:19 PM Subject: Re: Shared index base

Thanks for your response - I've been waiting for this very clarification. So 'commit()' makes readers re-read the indexes?

On Feb 26, 2008, at 7:03 PM, Mike Klaas wrote:

There hasn't really been a concrete answer given in this thread, so: it works to point multiple Solrs at a single data dir, but you can't have more than one writer. If you try, the index could become corrupted or inconsistent (especially if you are using the 'simple' lock type). Also, the Solrs do not communicate with each other. You have to tell the readers manually that the index is updated (via commit() -- autoCommit will not work).

-Mike

On 26-Feb-08, at 9:39 AM, Alok Dhir wrote:

Are you saying all the servers will use the same 'data' dir? Is that a supported config?

On Feb 26, 2008, at 12:29 PM, Matthew Runo wrote:

We're about to do the same thing here, but have not tried yet. We currently run Solr with replication across several servers. So long as only one server is doing updates to the index, I think it should work fine.

Thanks!

Matthew Runo Software Developer Zappos.com 702.943.7833

On Feb 26, 2008, at 7:51 AM, Evgeniy Strokin wrote:

I know there have been discussions on the subject, but I want to ask again if somebody could share more information. We are planning to have several separate servers for our search engine. One of them will be the index/search server, and all the others search-only. We want to use a SAN (BTW: should we consider something else?) and give access to it from all servers. So all servers will use the same index base, without any replication - the same files. Is this a good practice? Did somebody do the same? Any problems noticed? Or any suggestions, even about different configurations, are highly appreciated.

Thanks,
Gene
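For context, the option being referred to is set in solrconfig.xml; this snippet is a sketch of the stock 1.3-era layout, not Alok's actual configuration:

    <indexDefaults>
      ...
      <lockType>native</lockType>
    </indexDefaults>

The value only selects how Lucene's index write lock is implemented (e.g. simple, native, single).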
Re: Distributed Search (shard) w/ Multicore?
Sorry about that. I'm sending something simple like:

http://search.company.com:8115/solr/search/players?q=Smith&shards=box1:8115/search/players,box2:8115/search/players

I'm getting back a response with status 0 and QTime 18, but an empty result list. Identical schemas - it found the correct 13 but no docs attached. In the logs I can see the results come back (w/ wt=javabin&isShard=true) ...

- Jon

On May 2, 2008, at 3:41 PM, Yonik Seeley wrote:

On Fri, May 2, 2008 at 3:36 PM, Jon Baer <[EMAIL PROTECTED]> wrote: Im trying to figure out if I can do this or if something else needs to be set, trying to run a query over multiple cores w/ the shard param? I seem to be getting the correct number of results back but no data ... any ideas?

Should work OK (note that schemas should match across cores... distributed search is not federated search). You might need to be a little more explicit about what you are sending and what you are getting back (the actual URL of the request, and the actual XML of the response).

-Yonik
Re: Distributed Search (shard) w/ Multicore?
Try adding echoParams=all to the request. Maybe there is a default rows=0 or something. Are you using a recent version of Solr?

-Yonik

On Fri, May 2, 2008 at 4:30 PM, Jon Baer <[EMAIL PROTECTED]> wrote:
> Sorry about that. I'm sending something simple like:
>
> http://search.company.com:8115/solr/search/players?q=Smith&shards=box1:8115/search/players,box2:8115/search/players
>
> I'm getting back a response with status 0 and QTime 18, but an empty result list.
>
> Identical schemas - it found the correct 13 but no docs attached. In the
> logs I can see the results come back (w/ wt=javabin&isShard=true) ...
>
> - Jon
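For example, taking the request URL quoted above and adding the debug parameter Yonik suggests (the rows value here is just an illustrative guess):

    http://search.company.com:8115/solr/search/players?q=Smith&shards=box1:8115/search/players,box2:8115/search/players&echoParams=all&rows=10

echoParams=all makes the responseHeader list every parameter that was actually in effect, so a defaulted rows=0 or a restrictive fl would show up immediately.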
Re: solr on ubuntu 8.04
Hardy has Solr packages already. You might want to look at how they packaged Solr if you cannot move to that version. Did you just drop the war file, or did you use JNDI? You probably need to configure solr/home, and maybe fiddle with the security manager stuff.

Albert

On Thu, May 1, 2008 at 6:46 PM, Jack Bates <[EMAIL PROTECTED]> wrote:
> I am trying to evaluate Solr for an open source records management
> project to which I contribute: http://code.google.com/p/qubit-toolkit/
>
> I installed the Ubuntu solr-tomcat5.5 package:
> http://packages.ubuntu.com/hardy/solr-tomcat5.5
>
> - and pointed my browser at: http://localhost:8180/solr/admin (the
> Ubuntu and Debian Tomcat packages run on port 8180)
>
> However, in response I get a Tomcat 404: The requested
> resource (/solr/admin) is not available.
>
> This differs from the response I get accessing a random URL:
> http://localhost:8180/foo/bar
> - which displays a blank page.
>
> From this I gather that the solr-tomcat5.5 package installed
> *something*, but that it's misconfigured or missing something.
> Unfortunately I lack the Java / Tomcat experience to track down this
> problem. Can someone recommend where to look, to learn why the Ubuntu
> solr-tomcat5.5 package is not working?
>
> I started an Ubuntu wiki page to eventually describe the process of
> installing Solr on Ubuntu: https://wiki.ubuntu.com/Solr
>
> Thanks, Jack
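As a sketch of the JNDI route Albert mentions (the paths below are illustrative guesses, not the actual Ubuntu package layout), a Tomcat context fragment such as /etc/tomcat5.5/Catalina/localhost/solr.xml would declare the webapp and point solr/home at the directory containing conf/ and data/:

    <Context docBase="/usr/share/solr/solr.war" debug="0" crossContext="true">
      <Environment name="solr/home" type="java.lang.String" value="/usr/share/solr" override="true"/>
    </Context>

If Tomcat's security manager is enabled (as the Debian/Ubuntu packages typically do by default), Solr's home and data directories may also need read/write permissions granted in the Tomcat policy file - the "securitymanager stuff" Albert refers to.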
Question on WhitespaceTokenizerFactory concatenateAll
Hi,

I have a requirement that one of the fields I had indexed as a text field earlier should now return results when searched with blank spaces in between the words. I tried to use the example in the wiki (http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-83c527b144cd9f71c341e7c4a061daee382bca40) to do so. I changed my schema to this: ... However, I still am not getting back any results when the searched word has a space in it. I have not re-indexed the data after the change in schema.xml. Is there a way to get back results without having to re-index?

Warm Regards,
Sundar Sankarnarayanan
Software Engineer @ University of Phoenix
Re: Shared index base
On 2-May-08, at 1:20 PM, Alok Dhir wrote:

Here's another question on this rather old thread -- while poring through various options in solrconfig, I came across the 'native' lockType option. That seems to indicate that Solr/Lucene should work fine with multiple writers, as long as a proper locking mechanism is in place, such as would be provided by a POSIX-compliant cluster file system (GPFS, GFS, Ibrix, OCFS2...). A single shared index, multiple readers/writers, as long as the underlying filesystem implements fs locks properly. Is this correct?

No. You will avoid index corruption, but deletions/updates may not be handled properly.

-Mike