Sort results on a field not ordered

2008-05-02 Thread Peter Hickman

I have some data which I am querying with

?q=aseptic technique&fl=id,score,chapterTitle&sort=chapterTitle asc&rows=200

The results are being reordered (at least they are no longer in score order)
but the order makes no sense:

Communication
Drug administration: cytotoxic drugs
Elimination: bowel care
Elimination: bladder lavage and irrigation
The context of nursing
Observations
Pain management and assessment
Pain management: Entonox administration
Pain management: epidural and intrathecal analgesia
Abdominal paracentesis
Barrier nursing: nursing the infectious or immunosuppressed patient
Perioperative care
Personal hygiene: eye care
Personal hygiene: mouth care
Discharge planning
Positioning
Drug administration: general principles
Haematological procedures
Assessment and the process of care
Cardiopulmonary resuscitation
Scalp cooling
Breast aspiration and seroma drainage
Specimen collection for microbiological analysis
Spinal cord compression management
Elimination: stoma care
Nutritional support
Aseptic technique
Compression therapy in the management of lymphoedema
Gene therapy for the management of cancer
Radioactive source therapy: sealed sources
Transfusion of blood and blood products
The unconscious patient
Radioactive source therapy and diagnostic procedures: unsealed sources
Elimination: continent urinary diversions
Elimination: urinary
Vascular access devices: insertion and management
Venepuncture
Renal replacement therapy: peritoneal dialysis and continuous venovenous haemodiafiltration
Violence: prevention and management
Tracheostomy care and laryngectomy care (including voice rehabilitation)
Wound management

Can anyone shed some light on this?

-- 
View this message in context: 
http://www.nabble.com/Sort-results-on-a-field-not-ordered-tp17013905p17013905.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to extract terms associated with a field

2008-05-02 Thread Rantjil Bould
Thanks a lot. I was able to extract all terms in a field for any query.
I was also wondering how I can extract nearest-term information for
autocomplete-style suggestions. In one of my earlier posts I asked a similar
question related to faceted search.

-RB


On 4/27/08, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>
>
> Take a look at the LukeRequestHandler ... it can list all the terms in a
> field (or in many fields) along with their frequencies.
>
>
>
>
> -Hoss
>
>
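To make Hoss's pointer concrete, here is a minimal sketch that only builds the request URL. It assumes the handler is mapped at `/admin/luke` (as in the example solrconfig.xml) and that Solr runs at `http://localhost:8983/solr`; `fl` names the field and `numTerms` caps how many top terms (with their frequencies) come back. Check the parameter names against your Solr version.

```python
from urllib.parse import urlencode


def luke_terms_url(base_url, field, num_terms=20):
    """Build a LukeRequestHandler URL listing the top terms of one field.

    Assumption: the handler is registered at /admin/luke under base_url.
    """
    params = {"fl": field, "numTerms": num_terms}
    return f"{base_url.rstrip('/')}/admin/luke?{urlencode(params)}"


# Hypothetical host/port; prints
# http://localhost:8983/solr/admin/luke?fl=chapterTitle&numTerms=50
print(luke_terms_url("http://localhost:8983/solr", "chapterTitle", 50))
```

Fetching that URL returns the field's top terms; filtering them by prefix is one crude starting point for the autocomplete-style suggestions RB asks about.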


Re: Sort results on a field not ordered

2008-05-02 Thread Erik Hatcher
What field type is chapterTitle?   I'm betting it is an analyzed  
field with multiple values (tokens/terms) per document.  To  
successfully sort, you'll need to have a single value per document -  
using copyField can help with this to have both a searchable field  
and a sortable version.


Erik
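Erik's explanation can be illustrated with a toy model. It is a simplification, not Lucene's actual code: it assumes that when a tokenized field is used for sorting, each document is effectively represented by a single one of its tokens (here the alphabetically greatest), and it uses a made-up tokenizer and stop-word list in place of the real analyzer. Even so, it reproduces the relative order in which these five titles appear in Peter's results, while an untokenized (string) value sorts them alphabetically.

```python
import re

# Hypothetical stop words; the real analyzer chain in schema.xml may differ.
STOP_WORDS = {"a", "and", "the", "of", "or", "in", "for"}


def tokenize(title):
    """Toy analyzer: lowercase, split on non-letters, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", title.lower()) if t not in STOP_WORDS]


def tokenized_sort_key(title):
    """Simplified model: only one token per document survives for sorting."""
    return max(tokenize(title))


def string_sort_key(title):
    """Untokenized string field: the whole value is a single term."""
    return title


titles = [
    "Pain management and assessment",
    "Aseptic technique",
    "Elimination: bowel care",
    "Communication",
    "Drug administration: cytotoxic drugs",
]

print("tokenized field:", sorted(titles, key=tokenized_sort_key))
print("string field:   ", sorted(titles, key=string_sort_key))
```

Sorting on a single untokenized value per document, as Erik suggests, removes the guesswork about which token stands in for the document.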







Re: Sort results on a field not ordered

2008-05-02 Thread Geoffrey Young



Erik Hatcher wrote:
> What field type is chapterTitle?   I'm betting it is an analyzed field
> with multiple values (tokens/terms) per document.  To successfully sort,
> you'll need to have a single value per document - using copyField can
> help with this to have both a searchable field and a sortable version.

Does this apply to facet fields as well?  I noticed that if I set
facet.sort="true" the results are indeed sorted by count... until the
counts are the same, after which they are in random order (instead of
ascii alpha).


--Geoff


Re: Sort results on a field not ordered

2008-05-02 Thread Peter Hickman


Erik Hatcher wrote:
> 
> What field type is chapterTitle?   I'm betting it is an analyzed  
> field with multiple values (tokens/terms) per document.  To  
> successfully sort, you'll need to have a single value per document -  
> using copyField can help with this to have both a searchable field  
> and a sortable version.
> 
>   Erik
> 

The parts from schema.xml are:

It uses this definition of the field type, which I believe is the 'out of
the box' setting, but I could be wrong.



Is this of any help?




Re: Sort results on a field not ordered

2008-05-02 Thread Peter Hickman

OK, thanks for the question; it seems I have found the answer. Changing
chapterTitle to a string field type fixes the problem. As we do not search
over the chapterTitle field, this is no loss.

Thanks again for pointing me in the right direction.




Re: Zappos's new Solr Site

2008-05-02 Thread Alok Dhir

Hey Matt - congratulations on your new site -- it looks great.

I'm curious, after a few weeks of having run this way, what your  
findings are regarding running the shared index on NFS.  Any problems  
as of yet?


I assume you're indexing from one machine and calling 'commit' on the  
others on some schedule to get them to 'see' changes.


How is that working out for you?

---
Alok K. Dhir
[EMAIL PROTECTED]
Symplicity Corporation
1 703 351 0200 x 8080
www.symplicity.com

On Apr 11, 2008, at 1:35 PM, Matthew Runo wrote:


> Hello folks!
>
> First, the link: https://zeta.zappos.com (it's a very early open
> beta... we're just very proud of everyone's work and wanted to share
> it with you all)
>
> We've been working on a new site here at Zappos for about the last 7
> months, with planning going back almost two years. We looked at
> Endeca, we looked at Fast, we looked at so many commercial
> search engine technologies in that time that I can't even remember
> them all. We ended up choosing Solr, and not just because it's free.
> Solr has a truly wonderful group of users here who respond to
> support questions far faster than most paid support contracts. I've
> never had a question that I couldn't get answered on this list, no
> matter how stupid it's been (sorry Hoss!) =p
>
> Zappos has a long history of using open source technologies to drive
> its business, and has used Apache 1.3 + Perl 5 for the past 8
> years. Our new site is written in Java, and is really built around
> our Solr index. Solr powers all the navigation and facets, as well
> as the brand list and brand pages. One of the issues with our old
> site was how database-heavy it was, with some pages generating 100s
> of queries. Zeta is much better in this regard, and we really think
> Solr is going to serve us very well.
>
> Here are some stats on our Solr index: 158,821 documents in about 2
> GB of disk space, running in Tomcat 6 with 10 GB of RAM set
> aside. We have 5 servers clustered together, and each runs an
> instance of zeta.zappos.com and a local copy of Solr. For now, each
> of these servers reads from a single Solr index stored on NFS;
> we'll see how this works out, and are prepared to store a local copy
> of the index on each server.
>
> Thanks, and we'd love any feedback on the new site (keep in mind,
> some parts of it aren't quite done).
>
> Matthew Runo
> Software Developer
> Zappos.com
> 702.943.7833





Re: Sort results on a field not ordered

2008-05-02 Thread Yonik Seeley
On Fri, May 2, 2008 at 8:17 AM, Geoffrey Young
<[EMAIL PROTECTED]> wrote:
>  does this apply to facet fields as well?  I noticed that if I set
> facet.sort="true" the results are indeed sorted by count... until the counts
> are the same, after which they are in random order (instead of ascii alpha).

facet.sort should be the default.
Ties in count are broken by order in the term index (not random).
This should correspond to alphabetical (ascii).

-Yonik
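Yonik's description of facet.sort amounts to a two-level sort key: count descending, then term-index order. A minimal sketch with made-up facet values, modelling term-index order as plain string comparison (which agrees with Lucene's term order for ASCII terms):

```python
def sort_facet_counts(counts):
    """Order (term, count) pairs by count descending; ties fall back to term order."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical facet counts with a three-way tie at 12.
facets = {"sandals": 12, "boots": 40, "clogs": 12, "flats": 12}
for term, count in sort_facet_counts(facets):
    print(term, count)
# boots 40
# clogs 12
# flats 12
# sandals 12
```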


sometimes, snapshooter doesn't work

2008-05-02 Thread Feng Gao
Hi,

Here is the config info in solrconfig.xml:


  /opt/solr-jetty/solr/bin/snapshooter
  /opt/solr-jetty/solr/data
  true
   arg1 arg2 
   MYVAR=val1 



  /opt/solr-jetty/solr/bin/snapshooter
  /opt/solr-jetty/solr/data
  true


Our system updates the Solr index once a day. The snapshooter usually works,
but last night it did not.


Thanks,

Feng


Re: sometimes, snapshooter doesn't work

2008-05-02 Thread Otis Gospodnetic
Hi Feng,

That's not enough information for anyone to help.  You should have a look at 
the snapshooter log.
Here are some other ideas:
- What does "did not work" mean?  No snapshot was created?
- Is your disk/partition full?
- Can you run commit or optimize now and see if snapshooter will work?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
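Otis's first two checks (was a snapshot created? is the partition full?) can be scripted. A minimal sketch, assuming snapshot directories are named `snapshot.YYYYMMDDhhmmss` inside the data directory, as in Feng's snapshooter log; the data path is the one from Feng's solrconfig.xml and is only used if it exists on the machine running the script.

```python
import os
import re
import shutil
from datetime import datetime

# Snapshot directory names look like snapshot.20080501195320.
SNAPSHOT_RE = re.compile(r"^snapshot\.(\d{14})$")


def snapshot_times(names):
    """Parse snapshot.YYYYMMDDhhmmss names into sorted datetimes, ignoring other entries."""
    times = []
    for name in names:
        match = SNAPSHOT_RE.match(name)
        if match:
            times.append(datetime.strptime(match.group(1), "%Y%m%d%H%M%S"))
    return sorted(times)


def latest_snapshot(data_dir):
    """Newest snapshot time found in data_dir, or None if there are none."""
    times = snapshot_times(os.listdir(data_dir))
    return times[-1] if times else None


def free_gb(path):
    """Free space, in gigabytes, on the partition that holds path."""
    return shutil.disk_usage(path).free / 1e9


data_dir = "/opt/solr-jetty/solr/data"  # path from Feng's solrconfig.xml
if os.path.isdir(data_dir):
    print("latest snapshot:", latest_snapshot(data_dir))
    print(f"free space: {free_gb(data_dir):.1f} GB")
```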





Re: Master / slave setup with multicore

2008-05-02 Thread Bill Au
snapinstaller calls commit to trigger Solr to use the new index. Do you see
the commit request in your Solr log? Anything in the snapinstaller log?

Bill

On Thu, May 1, 2008 at 8:35 PM, James Brady <[EMAIL PROTECTED]> wrote:

> Hi Ryan, thanks for that!
>
> I have one outstanding question: when I take a snapshot on the master,
> snappull and snapinstall on the slave, the new index is not being used:
> restarting the slave server does pick up the changes, however.
>
> Has anyone else had this problem with recent development builds?
>
> In case anyone is trying to do multicore replication, here are some of the
> things I've done to get it working. These could go on the wiki somewhere;
> what do people think?
>
> Obviously, having as much shared configuration as possible is ideal. On the
> master, I have core-specific:
> - scripts.conf, for webapp_name, master_data_dir and master_status_dir
> - solrconfig.xml, for the post-commit and post-optimise snapshooter
>   locations
>
> On the slave, I have core-specific:
> - scripts.conf, as above
>
> I've also customised snappuller to accept a different rsync module name
> (hard coded to 'solr' at present). This module name is set in the slave
> scripts.conf
>
> James
>
>
> On 29 Apr 2008, at 13:44, Ryan McKinley wrote:
>
>
> > On Apr 29, 2008, at 3:09 PM, James Brady wrote:
> >
> > > Hi all,
> > > I'm aiming to use the new multicore features in development versions
> > > of Solr. My ideal setup would be to have master / slave servers on the
> > > same machine, snapshotting across from the 'write' to the 'read' server
> > > at intervals.
> > >
> > > This was all fine with Solr 1.2, but the rsync & snappuller
> > > configuration doesn't seem to be set up to allow for multicore replication
> > > in 1.3.
> > >
> > > The rsyncd.conf file allows for several data directories to be
> > > defined, but the snappuller script only handles a single directory,
> > > expecting the Lucene index to be directly inside that directory.
> > >
> > > What's the best practice / best suggestions for replicating a
> > > multicore update server out to search servers?
> > >
> > >
> > Currently, for multicore replication you will need to install the snap*
> > scripts for _each_ core.  The scripts all expect a single core so for
> > multiple cores, you will just need to install it multiple times.
> >
> > ryan
> >
>
>
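Ryan's advice (one copy of the snap* scripts per core) and James's list of core-specific settings boil down to generating one scripts.conf per core. A sketch under stated assumptions: the keys `webapp_name`, `master_data_dir` and `master_status_dir` come from James's message, `solr_port` is assumed to be the key the scripts use to reach the local Solr, and the `solr/<core>` webapp name and all paths are hypothetical.

```python
def scripts_conf(core, solr_port, data_root="/opt/solr"):
    """Render a per-core scripts.conf body.

    All paths and the solr/<core> webapp_name are hypothetical; adapt them
    to the real multicore layout.
    """
    settings = {
        "solr_port": solr_port,  # assumed key; must point at the right instance
        "webapp_name": f"solr/{core}",
        "master_data_dir": f"{data_root}/{core}/data",
        "master_status_dir": f"{data_root}/{core}/status",
    }
    return "".join(f"{key}={value}\n" for key, value in settings.items())


for core in ("core0", "core1"):
    print(f"# scripts.conf for {core}")
    print(scripts_conf(core, 8983))
```

Everything else stays shared; only these few lines differ per core, in line with James's "as much shared configuration as possible".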


RE: sometimes, snapshooter doesn't work

2008-05-02 Thread Feng Gao
Hi Otis,

Thanks,
Before I sent my first email to solr-user, I checked the following:

1. The disk is not full; there is 40G available.
2. No snapshot was created last midnight under the solr/data folder.
3. I checked the log, and there is no new entry for the snapshooter.
4. I ran the snapshooter command manually at 2008/05/02 11:04:00, and the
snapshot was created.

Here is part of the log:
---
2008/05/01 05:30:50 started by solr
2008/05/01 05:30:50 command: /opt/solr-jetty/solr/bin/snapshooter
2008/05/01 05:30:50 taking snapshot /opt/solr-jetty/solr/data/snapshot.20080501053050
2008/05/01 05:30:50 ended (elapsed time: 0 sec)
2008/05/01 19:53:20 started by solr
2008/05/01 19:53:20 command: /opt/solr-jetty/solr/bin/snapshooter arg1 arg2
2008/05/01 19:53:20 taking snapshot /opt/solr-jetty/solr/data/snapshot.20080501195320
2008/05/01 19:53:20 ended (elapsed time: 0 sec)
2008/05/02 11:04:00 started by solr
2008/05/02 11:04:00 command: ./snapshooter
2008/05/02 11:04:00 taking snapshot /opt/solr-jetty/solr/data/snapshot.20080502110400
2008/05/02 11:04:00 ended (elapsed time: 0 sec)
---
After 2008/05/01 19:53:20, I am sure that we committed and optimized once,
around 2008/05/02 00:53:00.

I also checked the log of our program, which sends the commit and optimize to
Solr. There are no exceptions, and the log shows the commit and optimize being
sent to Solr. I don't think the problem is in Solr. It could be a bug in our
program, though I don't think so either (really?); I'm checking again...

The program and Solr have been working fine for a few months. This is the
second time I have run into this kind of problem.

Thanks,


Feng
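Feng's check (did snapshooter run after the 00:53 commit?) can be automated by scanning the snapshooter log. A minimal sketch, assuming every run logs a `YYYY/MM/DD hh:mm:ss started by ...` line as in the excerpt above; the one-hour window is an arbitrary choice.

```python
from datetime import datetime, timedelta

LOG_TIME_FORMAT = "%Y/%m/%d %H:%M:%S"


def snapshot_start_times(log_lines):
    """Timestamps of the 'started by' lines in a snapshooter log."""
    starts = []
    for line in log_lines:
        if " started by " in line:
            # The first 19 characters hold "YYYY/MM/DD hh:mm:ss".
            starts.append(datetime.strptime(line[:19], LOG_TIME_FORMAT))
    return starts


def snapshot_ran_within(log_lines, commit_time, window=timedelta(hours=1)):
    """True if a snapshooter run started within `window` after commit_time."""
    return any(commit_time <= t <= commit_time + window
               for t in snapshot_start_times(log_lines))


LOG = [
    "2008/05/01 05:30:50 started by solr",
    "2008/05/01 05:30:50 command: /opt/solr-jetty/solr/bin/snapshooter",
    "2008/05/01 19:53:20 started by solr",
    "2008/05/01 19:53:20 command: /opt/solr-jetty/solr/bin/snapshooter arg1 arg2",
    "2008/05/02 11:04:00 started by solr",
    "2008/05/02 11:04:00 command: ./snapshooter",
]

# The commit/optimize sent around 00:53 has no matching snapshooter run.
print(snapshot_ran_within(LOG, datetime(2008, 5, 2, 0, 53)))  # False
```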






Re: Multiple open SegmentReaders?

2008-05-02 Thread Matthew Runo
Hah, thank you for doing this. Sometimes I see MultiSegmentReaders,
sometimes SegmentReaders, so both show up from time to time. Right now
we've got two MultiSegmentReaders open.


Thanks!

Matthew Runo
Software Developer
Zappos.com
702.943.7833

On May 1, 2008, at 7:19 PM, Koji Sekiguchi wrote:

> I can reproduce this with the solr/example setup.
> What I did:
>
> 1. $ svn co http://svn.apache.org/repos/asf/lucene/solr/trunk TEMP
> 2. $ cd TEMP
> 3. $ ant clean example
> 4. $ cd example
> 5. $ java -jar start.jar
>
> (to post commit)
> 6. $ cd $SOLR_HOME/example/exampledocs
> 7. $ ./post.sh
>
> Then see admin > statistics. I can see MultiSegmentReader instead of
> SegmentReader, though.
>
> name: [EMAIL PROTECTED] main
> class: org.apache.solr.search.SolrIndexSearcher
> version: 1.0
> description: index searcher
> stats:
> caching : true
> numDocs : 0
> maxDoc : 0
> readerImpl : MultiSegmentReader
> readerDir : [EMAIL PROTECTED]:\Project\jakarta\lucene\solr\TEMP\example\solr\data\index
> indexVersion : 1209693930226
> openedAt : Fri May 02 11:05:30 JST 2008
> registeredAt : Fri May 02 11:05:30 JST 2008
>
> name: [EMAIL PROTECTED] main
> class: org.apache.solr.search.SolrIndexSearcher
> version: 1.0
> description: index searcher
> stats:
> caching : true
> numDocs : 0
> maxDoc : 0
> readerImpl : MultiSegmentReader
> readerDir : [EMAIL PROTECTED]:\Project\jakarta\lucene\solr\TEMP\example\solr\data\index
> indexVersion : 1209693930226
> openedAt : Fri May 02 11:06:13 JST 2008
> registeredAt : Fri May 02 11:06:13 JST 2008
>
> Koji
>
> Yonik Seeley wrote:
>
> > Hmmm, if there is a bug, odds are it's due to multicore stuff -
> > probably nothing else has touched core stuff like that recently.
> > Can you reproduce (or rather help others to reproduce) with the
> > solr/example setup?
> >
> > -Yonik
> >
> > On Wed, Apr 30, 2008 at 5:39 PM, Matthew Runo <[EMAIL PROTECTED]> wrote:
> >
> > > Hello!
> > >
> > > In using the SVN head version of Solr, I've found that recently we started
> > > getting multiple open SegmentReaders, all registered... etc.
> > >
> > > Any ideas why this would happen? They don't go away unless the server is
> > > restarted, and don't go away with commits, etc. In fact, commits seem to
> > > cause the issue. They're a problem because they keep really stale
> > > searchers around...
> > >
> > > For example, right now...
> > > org.apache.solr.search.SolrIndexSearcher
> > > caching : true
> > > numDocs : 153312
> > > maxDoc : 153324
> > > readerImpl : SegmentReader
> > > readerDir : org.apache.lucene.store.FSDirectory@/opt/solr/data/index
> > > indexVersion : 1205944085143
> > > openedAt : Wed Apr 30 14:04:15 PDT 2008
> > > registeredAt : Wed Apr 30 14:04:15 PDT 2008
> > >
> > > (and right below that one...)
> > > org.apache.solr.search.SolrIndexSearcher
> > > caching : true
> > > numDocs : 153312
> > > maxDoc : 153324
> > > readerImpl : SegmentReader
> > > readerDir : org.apache.lucene.store.FSDirectory@/opt/solr/data/index
> > > indexVersion : 1205944085143
> > > openedAt : Wed Apr 30 14:30:02 PDT 2008
> > > registeredAt : Wed Apr 30 14:30:02 PDT 2008
> > >
> > > Thanks!
> > >
> > > Matthew Runo
> > > Software Developer
> > > Zappos.com
> > > 702.943.7833












Re: Master / slave setup with multicore

2008-05-02 Thread James Brady
Ah, wait, my fault: I didn't have the right Solr port configured on the
slave, so snapinstaller was committing to the master :/


Thanks,
James
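James's problem shows that the commit snapinstaller sends only helps if it reaches the right instance. A sketch of building such a commit request explicitly, assuming the usual XML update handler at `/solr/update` (or `/solr/<core>/update` in a multicore setup); the slave port 8984 and the core name are hypothetical. The request is only built here, not sent.

```python
from urllib.request import Request


def commit_request(host, port, core=None):
    """Build (but do not send) an XML <commit/> POST for one Solr instance.

    Assumes /solr/<core>/update for a multicore layout and /solr/update
    otherwise; getting host and port right is the whole point.
    """
    path = f"/solr/{core}/update" if core else "/solr/update"
    return Request(
        f"http://{host}:{port}{path}",
        data=b"<commit/>",
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


req = commit_request("localhost", 8984, core="core0")
print(req.full_url)  # http://localhost:8984/solr/core0/update
# urllib.request.urlopen(req) would actually send the commit.
```

Printing the target URL before sending makes a wrong port (the master's instead of the slave's) easy to spot.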









Re: Zappos's new Solr Site

2008-05-02 Thread Matthew Runo
We have a dedicated server set up as the "master", with its own local
index. Each of the other machines has a read-only NFS mount, to which
the master copies its index every 20 minutes. We then run a commit on
each "slave" to force it to open a new reader. So far, it's worked
fine. I would suggest doing the reading and writing on different
indexes, though: it makes things easier when you can have a read-only
NFS-mounted index (no chance of another server updating it at all).


Thanks!

Matthew Runo
Software Developer
Zappos.com
702.943.7833
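Matt's setup is a copy-then-commit cycle: the master publishes its index to the NFS share, then each slave gets a commit so it opens a new reader. A minimal sketch of one cycle; `copy_index` and `send_commit` are hypothetical hooks (for example a file copy and an HTTP `<commit/>` POST), and running it every 20 minutes is left to cron or similar.

```python
from typing import Callable, Iterable, List, Tuple


def refresh_slaves(copy_index: Callable[[], None],
                   slaves: Iterable[str],
                   send_commit: Callable[[str], None]) -> List[Tuple[str, OSError]]:
    """Run one distribution cycle and return the slaves whose commit failed.

    The copy finishes before any commit is sent, so slaves are only told to
    reopen after the new index is in place. One unreachable slave does not
    stop the others from being refreshed.
    """
    copy_index()
    failed = []
    for slave in slaves:
        try:
            send_commit(slave)
        except OSError as exc:  # urllib's URLError is an OSError subclass
            failed.append((slave, exc))
    return failed


if __name__ == "__main__":
    log = []
    failures = refresh_slaves(lambda: log.append("copy"),
                              ["web1", "web2"],
                              lambda slave: log.append(f"commit {slave}"))
    print(log, failures)  # ['copy', 'commit web1', 'commit web2'] []
```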








Re: Multiple open SegmentReaders?

2008-05-02 Thread Yonik Seeley
On Fri, May 2, 2008 at 1:08 PM, Matthew Runo <[EMAIL PROTECTED]> wrote:
> Hah, thank you for doing this. Sometimes I see MultiSegmentReaders,
> sometimes SegmentReaders, so both show up from time to time. Right now we've
> got two MultiSegmentReaders open..

OK, this implies there's a leak and the initial searcher that is
opened never gets closed.
Could you open a JIRA issue for this?

-Yonik
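Yonik's diagnosis (the initial searcher is never closed) is a classic reference-counting leak: Solr hands out searchers through reference counting, so one missing release keeps a searcher, and its reader, open indefinitely. The toy model below is hypothetical, not Solr's classes and not necessarily the exact SOLR-509 bug; it shows how a single extra reference on the first searcher produces the symptom Matt saw, an old searcher that survives every commit.

```python
class RefCountedSearcher:
    """Toy stand-in for a reference-counted searcher."""

    def __init__(self, name):
        self.name = name
        self.refs = 1  # the core's own reference
        self.closed = False

    def incref(self):
        self.refs += 1

    def decref(self):
        self.refs -= 1
        if self.refs == 0:
            self.closed = True  # the underlying reader would be closed here


class ToyCore:
    def __init__(self, leak_first=False):
        self.current = RefCountedSearcher("searcher-0")
        if leak_first:
            self.current.incref()  # simulated bug: this reference is never released
        self.opened = [self.current]

    def commit(self):
        """Open a new searcher and drop the core's reference to the old one."""
        old = self.current
        self.current = RefCountedSearcher(f"searcher-{len(self.opened)}")
        self.opened.append(self.current)
        old.decref()

    def open_searchers(self):
        return [s.name for s in self.opened if not s.closed]


healthy, leaky = ToyCore(), ToyCore(leak_first=True)
for core in (healthy, leaky):
    core.commit()
    core.commit()
print(healthy.open_searchers())  # ['searcher-2']
print(leaky.open_searchers())    # ['searcher-0', 'searcher-2']
```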




Too many open files

2008-05-02 Thread Wagner,Harry
I'm getting this error with Solr 1.2 while trying to load a large db. Is there
a workaround?



Re: Multiple open SegmentReaders?

2008-05-02 Thread Yonik Seeley
This bug was introduced in SOLR-509 (committed April 17th).
I'm working on a fix now.

-Yonik



Re: Too many open files

2008-05-02 Thread Otis Gospodnetic
I'm not sure what "large db" you are referring to (indexing an RDBMS to Solr?), 
but the first thing to do is run ulimit -a (or some flavour of it, depending on 
the OS) and increase the open file descriptor limit if the one you see there 
is just very low (e.g. 1024).  If that limit is not low, make sure that things 
are getting closed properly by your app, so there are no file descriptor leaks.
Also, make sure mergeFactor is not ridiculously high.
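If you want to check the limit (and, on Linux, the current descriptor count) from a script rather than by hand, here is a minimal Python sketch using the standard `resource` module. The 1024 threshold is only the "very low" example above, not a Solr requirement, and `/proc/<pid>/fd` exists on Linux only. Note that it reports the limit of the process running the script; the Solr JVM inherits whatever limit the shell or init script that starts it had, so that is where `ulimit -n` would need to be raised.

```python
import os
import resource

# "Very low" example threshold from the message above; not a Solr-mandated value.
LOW_FD_LIMIT = 1024


def check_open_file_limit(low=LOW_FD_LIMIT):
    """Return (soft, hard, is_low) for this process's open-file limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY means "unlimited", which is never considered low.
    is_low = soft != resource.RLIM_INFINITY and soft <= low
    return soft, hard, is_low


def count_open_fds(pid="self"):
    """Count open descriptors of a process via /proc (Linux only).

    Reading another process's fd directory needs the same user or root.
    The count for "self" includes the descriptor used to list the directory.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


if __name__ == "__main__":
    soft, hard, is_low = check_open_file_limit()
    print(f"open-file limit: soft={soft} hard={hard} low={is_low}")
    if os.path.isdir("/proc/self/fd"):
        print(f"descriptors currently open here: {count_open_fds()}")
```

Calling `count_open_fds()` with the Solr JVM's pid every so often while indexing would show whether the count keeps climbing, which would suggest a leak, or levels off.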

Otis 

--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 
> From: "Wagner,Harry" <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Friday, May 2, 2008 9:04:56 PM
> Subject: Too many open files
> 
> I'm getting this with Solr 1.2 trying to load a large db. Is there a
> workaround?
> 
> 




Distributed Search (shard) w/ Multicore?

2008-05-02 Thread Jon Baer

Hi,

I'm trying to figure out whether I can do this or whether something else
needs to be set: I'm trying to run a query over multiple cores w/ the
shards param. I seem to be getting the correct number of results back
but no data ... any ideas?


Thanks.

- Jon


Re: Distributed Search (shard) w/ Multicore?

2008-05-02 Thread Yonik Seeley
On Fri, May 2, 2008 at 3:36 PM, Jon Baer <[EMAIL PROTECTED]> wrote:
>  I'm trying to figure out if I can do this or if something else needs to be
> set, trying to run a query over multiple cores w/ the shard param?  I seem
> to be getting the correct number of results back but no data ... any ideas?

Should work OK (note that schemas should match across cores...
distributed search is not federated search).
You might need to be a little more explicit about what you are sending
and what you are getting back (the actual URL of the request, and the
actual XML of the response).

-Yonik


Re: Shared index base

2008-05-02 Thread Alok Dhir
Here's another question on this rather old thread -- while poring
through various options in solrconfig, I came across the 'native'
lockType option.


That seems to indicate that SOLR/Lucene should work fine with multiple
writers, as long as a proper locking mechanism is in place, such as
one provided by a POSIX-compliant cluster file system like
GPFS, GFS, Ibrix, OCFS2...


Single shared index, multiple readers/writers, as long as the  
underlying filesystem implements fs locks properly.


Is this correct?

---
Alok K. Dhir
[EMAIL PROTECTED]
Symplicity Corporation
1 703 351 0200 x 8080
www.symplicity.com

On Feb 27, 2008, at 3:10 AM, Otis Gospodnetic wrote:


Alok: correct - commit causes Solr to re-open the index.

Gene: That should work just fine.  While you can't have multiple  
concurrent writers, you can send multiple concurrent indexing  
requests to a single Solr instance designated to be the master.


Otis

--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 

From: Alok K. Dhir <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, February 26, 2008 7:51:19 PM
Subject: Re: Shared index base

Thanks for your response - I've been waiting for this very
clarification.  So 'commit()' makes readers re-read the indexes?
On Feb 26, 2008, at 7:03 PM, Mike Klaas wrote:


There hasn't really been a concrete answer given in this thread,
so:  It works to point multiple Solr's at a single data dir, but you
can't have more than one writer.  If you try, the index could become
corrupted or inconsistent (especially if you are using 'simple' lock
type).  Also, the Solrs do not communicate with each other.  You
have to tell the readers manually that the index is updated (via
commit()--autoCommit will not work).
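In the setup Mike describes, the single writer would, after committing its own changes, also have to tell each read-only Solr to reopen the index. Below is a minimal Python sketch of that notification, assuming each reader exposes the standard XML update handler at `<base>/update`; the helper names are illustrative, not part of Solr.

```python
import urllib.request

# Standard XML update message asking Solr to commit and reopen its searcher.
COMMIT_BODY = b"<commit/>"


def build_commit_request(solr_base):
    """Build (but do not send) a POST of <commit/> to <solr_base>/update."""
    return urllib.request.Request(
        solr_base.rstrip("/") + "/update",
        data=COMMIT_BODY,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def notify_readers(reader_bases, timeout=30):
    """Send a commit to each read-only Solr so it reopens the shared index.

    Call this only after the single writer has committed its own changes;
    the readers make no changes of their own. Returns base URL -> HTTP status.
    """
    statuses = {}
    for base in reader_bases:
        with urllib.request.urlopen(build_commit_request(base), timeout=timeout) as resp:
            statuses[base] = resp.status
    return statuses
```

Something like `notify_readers(["http://reader1:8983/solr", "http://reader2:8983/solr"])` (hypothetical hosts) would then run after each commit on the writer, since autoCommit on the readers will not pick up the change.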

-Mike

On 26-Feb-08, at 9:39 AM, Alok Dhir wrote:


Are you saying all the servers will use the same 'data' dir?  Is
that a supported config?

On Feb 26, 2008, at 12:29 PM, Matthew Runo wrote:


We're about to do the same thing here, but have not tried yet. We
currently run Solr with replication across several servers. So
long as only one server is doing updates to the index, I think it
should work fine.


Thanks!

Matthew Runo
Software Developer
Zappos.com
702.943.7833

On Feb 26, 2008, at 7:51 AM, Evgeniy Strokin wrote:


I know there have been discussions about this subject, but I want
to ask again in case somebody can share more information.
We are planning to have several separate servers for our search
engine. One of them will be an index/search server, and all the others
will be search only.
We want to use a SAN (BTW: should we consider something else?) and
give access to it from all servers. So all servers will use the
same index base, without any replication: the same files.
Is this a good practice? Has anybody done the same? Any problems
noticed? Any suggestions, even about different configurations,
are highly appreciated.

Thanks,
Gene
















Re: Distributed Search (shard) w/ Multicore?

2008-05-02 Thread Jon Baer

Sorry about that, I'm sending something simple like:

http://search.company.com:8115/solr/search/players?q=Smith&shards=box1:8115/search/players,box2:8115/search/players

I'm getting back:


  
0
18
  
  


Identical schemas; it found the correct 13 but no docs attached.  In
the logs I can see the results come back (w/
wt=javabin&isShard=true) ...


- Jon

On May 2, 2008, at 3:41 PM, Yonik Seeley wrote:


On Fri, May 2, 2008 at 3:36 PM, Jon Baer <[EMAIL PROTECTED]> wrote:
I'm trying to figure out if I can do this or if something else needs
to be set, trying to run a query over multiple cores w/ the shard param?
I seem to be getting the correct number of results back but no data ...
any ideas?


Should work OK (note that schemas should match across cores...
distributed search is not federated search).
You might need to be a little more explicit about what you are sending
and what you are getting back (the actual URL of the request, and the
actual XML of the response).

-Yonik




Re: Distributed Search (shard) w/ Multicore?

2008-05-02 Thread Yonik Seeley
Try adding echoParams=all to the request.
Maybe there is a default rows=0 or something.

Are you using a recent version of Solr?

-Yonik
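For reference, the request Jon describes plus Yonik's `echoParams=all` suggestion can also be built programmatically. This is a minimal Python sketch: `build_shard_query` is a hypothetical helper, not a Solr API, the host names are the ones from Jon's message, and `rows` is set explicitly on the assumption (Yonik's guess above) that a handler default such as `rows=0` might be hiding the documents. Request parameters override a handler's *defaults*, though not its *invariants*.

```python
from urllib.parse import urlencode


def build_shard_query(base_url, q, shards, rows=10, echo_all=True):
    """Build a distributed-search URL (hypothetical helper, not a Solr API).

    `shards` is a list of host:port/path entries; Solr expects them
    comma-separated and without an http:// prefix.
    """
    params = {
        "q": q,
        "shards": ",".join(shards),
        # An explicit rows overrides a handler default such as rows=0.
        "rows": str(rows),
    }
    if echo_all:
        # Asks Solr to echo every parameter it used, defaults included.
        params["echoParams"] = "all"
    # Keep ':', '/' and ',' readable in the shard list; everything else is escaped.
    return base_url + "?" + urlencode(params, safe=":/,")


if __name__ == "__main__":
    print(build_shard_query(
        "http://search.company.com:8115/solr/search/players",
        "Smith",
        ["box1:8115/search/players", "box2:8115/search/players"],
    ))
```

Run as a script, this prints `http://search.company.com:8115/solr/search/players?q=Smith&shards=box1:8115/search/players,box2:8115/search/players&rows=10&echoParams=all`; the `params` list in the response header should then show whether a `rows` value is coming from the handler configuration.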

On Fri, May 2, 2008 at 4:30 PM, Jon Baer <[EMAIL PROTECTED]> wrote:
> Sorry about that, I'm sending something simple like:
>
> http://search.company.com:8115/solr/search/players?q=Smith&shards=box1:8115/search/players,box2:8115/search/players
>
>  I'm getting back:
>
>  
>   
> 0
> 18
>   
>   
>  
>
>  Identical schemas, it found the correct 13 but no docs attached.  In the
> logs I can see the results come back (w/ wt=javabin&isShard=true) ...
>
>  - Jon
>
>
>
>  On May 2, 2008, at 3:41 PM, Yonik Seeley wrote:
>
>
> > On Fri, May 2, 2008 at 3:36 PM, Jon Baer <[EMAIL PROTECTED]> wrote:
> >
> > > I'm trying to figure out if I can do this or if something else needs to be
> > > set, trying to run a query over multiple cores w/ the shard param?  I seem
> > > to be getting the correct number of results back but no data ... any ideas?
> > >
> >
> > Should work OK (note that schemas should match across cores...
> > distributed search is not federated search).
> > You might need to be a little more explicit about what you are sending
> > and what you are getting back (the actual URL of the request, and the
> > actual XML of the response).
> >
> > -Yonik


Re: solr on ubuntu 8.04

2008-05-02 Thread Albert Ramstedt
Hardy already has solr packages. You might want to look at how they packaged
solr if you cannot move to that version.
Did you just drop the war file in? Or did you use JNDI? You probably need to
configure solr/home, and maybe fiddle with the securitymanager settings.

Albert

On Thu, May 1, 2008 at 6:46 PM, Jack Bates <[EMAIL PROTECTED]> wrote:

> I am trying to evaluate Solr for an open source records management
> project to which I contribute: http://code.google.com/p/qubit-toolkit/
>
> I installed the Ubuntu solr-tomcat5.5 package:
> http://packages.ubuntu.com/hardy/solr-tomcat5.5
>
> - and pointed my browser at: http://localhost:8180/solr/admin (The
> Ubuntu and Debian Tomcat packages run on port 8180)
>
> However, in response I get a Tomcat 404: The requested
> resource (/solr/admin) is not available.
>
> This differs from the response I get accessing a random URL:
> http://localhost:8180/foo/bar
>
> - which displays a blank page.
>
> From this I gather that the solr-tomcat5.5 package installed
> *something*, but that it's misconfigured or missing something.
> Unfortunately I lack the Java / Tomcat experience to track down this
> problem. Can someone recommend where to look, to learn why the Ubuntu
> solr-tomcat5.5 package is not working?
>
> I started an Ubuntu wiki page to eventually describe the process of
> installing Solr on Ubuntu: https://wiki.ubuntu.com/Solr
>
> Thanks, Jack
>


Question on WhitespaceTokenizerFactory concatenateAll

2008-05-02 Thread Sundar Sankaranarayanan
Hi,
I have a requirement that one of the fields that I had indexed as a
Text Field earlier should now return me results when searched with blank
spaces in between the word. I had tried to use the example in
wiki(http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-83
c527b144cd9f71c341e7c4a061daee382bca40) to do so. I changed my schema to
this.
 

  
  
  
  
  
  
  
  
  
  
  

 

However, I am still not getting back any results when the searched word
has a space in it. I have not re-indexed the data since the change
to schema.xml.
Is there a way to still get back results without having to re-index?
 
 
Warm Regards,
 
Sundar Sankarnarayanan
Software Engineer
@University of Phoenix
 


Re: Shared index base

2008-05-02 Thread Mike Klaas


On 2-May-08, at 1:20 PM, Alok Dhir wrote:

Here's another question on this rather old thread -- while poring
through various options in solrconfig, I came across the
'native' lockType option.


That seems to indicate that SOLR/Lucene should work fine with
multiple writers, as long as a proper locking mechanism is in place,
such as one provided by a POSIX-compliant cluster file system like
GPFS, GFS, Ibrix, OCFS2...


Single shared index, multiple readers/writers, as long as the  
underlying filesystem implements fs locks properly.


Is this correct?


No.  You will avoid index corruption, but deletions/updates may not be
handled properly.


-Mike