Re: Tagging

2007-02-23 Thread Erik Hatcher


On Feb 22, 2007, at 11:30 PM, Gmail Account wrote:
I use Solr for searching and facets and love it. The performance is awesome.


However, I am about to add tagging to my application, and I'm having a hard
time deciding whether I should just keep my tags in a database for now until
a better Solr solution is worked out. Does anyone know what technology some
of the larger sites use for tagging? A database (MySQL, SQL Server) with
denormalized cache tables everywhere, something similar to Solr/Lucene, or
something else?


Simpy, Otis Gospodnetic's creation, uses Lucene.  I suspect most of  
the others use a relational database and lots and lots of caching...  
especially since the others use tags but not full-text search.  Simpy  
is special!


Erik
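
As a hedged aside on the Solr route: tags are commonly modeled as a multiValued string field, with facet counts standing in for tag counts. The field and type names below are illustrative, not from this thread:

```xml
<!-- schema.xml: hypothetical tag field; names are illustrative -->
<field name="tag" type="string" indexed="true" stored="true" multiValued="true"/>
```

Tag counts for a result set would then come back from a facet query such as /select?q=*:*&facet=true&facet.field=tag.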



index browsing with solr

2007-02-23 Thread Pierre-Yves LANDRON

Hello everybody,

I'm new to this mailing list, so excuse me if my question has already been 
debated here (I've searched on the web and found nothing about it).


I've used Solr for two weeks now, and so far it's a really neat solution.
I've replaced my previous index-searcher app with Solr in my current project,
but cannot find a way to substitute the browseIndex(field, startTerm,
numberOfTermsReturned) function I've used so far. It's a very useful method
and I'm sure it can be accomplished with Solr, but I can't figure out how.
Lucene does it through the terms method of the IndexReader class, I think:

abstract TermEnum terms(Term t) : Returns an enumeration of all terms after
a given term.


Does an implementation of this method exist in Solr?

If not, is it difficult to develop new functionality for Solr? Where should
I start?


Thanks !
Pierre-Yves Landron





Re: index browsing with solr

2007-02-23 Thread Ryan McKinley


Does an implementation of this method exist in Solr?



I don't think so.



If not, is it difficult to develop new functionality for Solr? Where should
I start?



It will be easy to add. Take a look at a simple SolrRequestHandler:

http://svn.apache.org/repos/asf/lucene/solr/trunk/src/java/org/apache/solr/handler/IndexInfoRequestHandler.java

This gets the IndexReader and writes out some stuff.


Problem indexing

2007-02-23 Thread Otis Gospodnetic
Hi,

This is related to SOLR-81.  Things bomb when I try indexing with my probably 
misconfigured schema.xml:
  - see it at http://www.krumpir.com/schema.xml - I added a few new fieldTypes, 
fields, and copyFields
  - just the diff: http://www.krumpir.com/schema.xml-diff.txt

I've created a dictionary.xml file with the following content:


  
application
  


I tried posting that, like this:
$ java -jar post.jar http://localhost:8983/solr/update dictionary.xml

That bombed with this:

SimplePostTool: WARNING: Unexpected response from Solr: '<result status="1">java.lang.NullPointerException
at org.apache.solr.schema.FieldType.storedToIndexed(FieldType.java:248)
at org.apache.solr.update.UpdateHandler.getIndexedId(UpdateHandler.java:134)
at org.apache.solr.update.DirectUpdateHandler2.overwriteBoth(DirectUpdateHandler2.java:380)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:236)
 
Does this look familiar to anyone?  Does anything in that schema.xml look
fishy or plain wrong?

Thanks,
Otis




Re: Problem indexing

2007-02-23 Thread Yonik Seeley

On 2/23/07, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

Hi,

This is related to SOLR-81.  Things bomb when I try indexing with my probably 
misconfigured schema.xml:
  - see it at http://www.krumpir.com/schema.xml - I added a few new fieldTypes, 
fields, and copyFields
  - just the diff: http://www.krumpir.com/schema.xml-diff.txt

I've created a dictionary.xml file with the following content:


  
application
  


I tried posting that, like this:
$ java -jar post.jar http://localhost:8983/solr/update dictionary.xml

That bombed with this:

SimplePostTool: WARNING: Unexpected response from Solr: '<result status="1">java.lang.NullPointerException
at org.apache.solr.schema.FieldType.storedToIndexed(FieldType.java:248)
at org.apache.solr.update.UpdateHandler.getIndexedId(UpdateHandler.java:134)
at org.apache.solr.update.DirectUpdateHandler2.overwriteBoth(DirectUpdateHandler2.java:380)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:236)

Does this look familiar to anyone?  Does anything in that schema.xml look
fishy or plain wrong?


I think the document you are adding is missing the uniqueKeyField

-Yonik
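
For illustration, a well-formed add message includes the schema's uniqueKey field. This sketch assumes the uniqueKey is named id and the dictionary entry goes in a field named word; both names are guesses, since the original XML was stripped from the archive:

```xml
<add>
  <doc>
    <field name="id">1</field>
    <field name="word">application</field>
  </doc>
</add>
```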


Re: Problem indexing

2007-02-23 Thread Otis Gospodnetic
Oh, look at that, adding the id field (value 1) took care of the bombing,
nice!

Thanks,
Otis

> I tried posting that, like this:
> $ java -jar post.jar http://localhost:8983/solr/update dictionary.xml
>
> That bombed with this:
>
> SimplePostTool: WARNING: Unexpected response from Solr: '<result status="1">java.lang.NullPointerException
> at org.apache.solr.schema.FieldType.storedToIndexed(FieldType.java:248)
> at org.apache.solr.update.UpdateHandler.getIndexedId(UpdateHandler.java:134)
> at org.apache.solr.update.DirectUpdateHandler2.overwriteBoth(DirectUpdateHandler2.java:380)
> at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:236)
>
> Does this look familiar to anyone?  Anything in that schema.xml looks fishy 
> or plain wrong?

I think the document you are adding is missing the uniqueKeyField

-Yonik





Re: Problem indexing

2007-02-23 Thread Walter Underwood
It is a bug, though. That should send an error message, not a
stack trace. --wunder


On 2/23/07 10:39 AM, "Otis Gospodnetic" <[EMAIL PROTECTED]> wrote:

> Oh, look at that, adding 1 took care of the bombing,
> nice!
> 
> Thanks,
> Otis
> 
>> I tried posting that, like this:
>> $ java -jar post.jar http://localhost:8983/solr/update dictionary.xml
>> 
>> That bombed with this:
>> 
>> SimplePostTool: WARNING: Unexpected response from Solr: '<result status="1">java.lang.NullPointerException
>> at org.apache.solr.schema.FieldType.storedToIndexed(FieldType.java:248)
>> at org.apache.solr.update.UpdateHandler.getIndexedId(UpdateHandler.java:134)
>> at org.apache.solr.update.DirectUpdateHandler2.overwriteBoth(DirectUpdateHandler2.java:380)
>> at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:236)
>> 
>> Does this look familiar to anyone?  Anything in that schema.xml looks fishy
>> or plain wrong?
> 
> I think the document you are adding is missing the uniqueKeyField
> 
> -Yonik
> 
> 
> 



Re: Problem indexing

2007-02-23 Thread Chris Hostetter

: It is a bug, though. That should send an error message, not a
: stack trace. --wunder

I opened SOLR-172 to track getting a better exception than an NPE in this
case, but that *is* an error message being returned to the client; the
message just happens to be a stack trace ... SOLR-141 should hopefully
make the way errors get reported more uniform (if/when i/we ever get
around to tackling it)


-Hoss



Re: index browsing with solr

2007-02-23 Thread Yonik Seeley

On 2/23/07, Pierre-Yves LANDRON <[EMAIL PROTECTED]> wrote:

I've used Solr for two weeks now, and so far it's a really neat solution.
I've replaced my previous index-searcher app with Solr in my current project,
but cannot find a way to substitute the browseIndex(field, startTerm,
numberOfTermsReturned) function I've used so far. It's a very useful method
and I'm sure it can be accomplished with Solr, but I can't figure out how.
Lucene does it through the terms method of the IndexReader class, I think:

abstract TermEnum terms(Term t) : Returns an enumeration of all terms after
a given term.

Does an implementation of this method exist in Solr?


You can get this functionality from the current faceting implementation;
the downside is that it will be slower.

-Yonik
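
A hedged sketch of that faceting approach: a facet query over the field returns its terms with counts, with facet.limit playing the role of numberOfTermsReturned (the field name is illustrative):

```
http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=myfield&facet.limit=20
```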


WordNet Ontologies

2007-02-23 Thread rubdabadub

Hi:

Does Solr support ontologies somehow? Has it been tried? Any tips on
how I should go about doing it?

Thanks.
Regards


lots of inserts very fast, out of heap or file descs

2007-02-23 Thread Brian Whitman
I'm trying to add lots of documents at once (hundreds of thousands)  
in a loop. I don't need these docs to appear as results until I'm  
done, though.


For a simple test, I call the post.sh script in a loop with the same  
moderately sized xml file. This adds a 20K doc and then commits.  
Repeat hundreds of thousands of times.


This works fine for a while, but eventually (only 10K docs in or so)
the Solr instance starts taking longer and longer to respond to my
adds (I print out the curl time; near the end an add takes 10s),
and the web server (Resin 3.0) eventually dies with "out of heap
space" in its log (my max heap is 1GB on a 4GB machine).


I also see the "(Too many open files in system)" stacktrace coming
from Lucene's SegmentReader during this test. My fs.file-max was
361990, which I bumped up to 2M, but I don't know how/why Solr/Lucene
would open that many.



My question is about best practices for this sort of "bulk add."  
Since insert time is not a concern, I have some leeway. Should I  
commit after every add? Should I optimize every so many commits? Is  
there some reaper on a thread or timer that I should let breathe?

Re: lots of inserts very fast, out of heap or file descs

2007-02-23 Thread Yonik Seeley

On 2/23/07, Brian Whitman <[EMAIL PROTECTED]> wrote:

I'm trying to add lots of documents at once (hundreds of thousands)
in a loop. I don't need these docs to appear as results until I'm
done, though.

For a simple test, I call the post.sh script in a loop with the same
moderately sized xml file. This adds a 20K doc and then commits.
Repeat hundreds of thousands of times.


Try not committing so often (perhaps until you are done).
Don't use post.sh, or modify it to remove the commit.

-Yonik
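
One way to follow this advice, sketched as hypothetical helper code (not from the thread): build a single <add> message holding many docs, POST it once to /solr/update, and send one <commit/> at the end. The field names are illustrative.

```python
# Hypothetical sketch: batch many docs into one Solr <add> message
# instead of posting (and committing) one document at a time.
from xml.sax.saxutils import escape, quoteattr

def build_add_xml(docs):
    """Render a list of {field: value} dicts as a single <add> message."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            # Escape attribute and text content so arbitrary values stay well-formed.
            parts.append("<field name=%s>%s</field>" % (quoteattr(name), escape(str(value))))
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)

batch = build_add_xml([{"id": 1, "text": "first"}, {"id": 2, "text": "second"}])
# POST `batch` to http://localhost:8983/solr/update, then POST "<commit/>" once at the end.
```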


Re: lots of inserts very fast, out of heap or file descs

2007-02-23 Thread Mike Klaas

On 2/23/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:

On 2/23/07, Brian Whitman <[EMAIL PROTECTED]> wrote:
> I'm trying to add lots of documents at once (hundreds of thousands)
> in a loop. I don't need these docs to appear as results until I'm
> done, though.
>
> For a simple test, I call the post.sh script in a loop with the same
> moderately sized xml file. This adds a 20K doc and then commits.
> Repeat hundreds of thousands of times.

Try not committing so often (perhaps until you are done).
Don't use post.sh, or modify it to remove the commit.


Part of the problem might be repeatedly inserting the same doc over
and over again-- that is an odd pattern of deletes, which might be
triggering a bad performance case on the lucene or solr side (I'm
assuming the doc has a unique key).

-Mike


Re: lots of inserts very fast, out of heap or file descs

2007-02-23 Thread Brian Whitman


Try not committing so often (perhaps until you are done).
Don't use post.sh, or modify it to remove the commit.



OK, I modified it to not commit afterwards, and I also realized I had
SOLR-126 (autocommit) on, which I disabled. Is there a rule of thumb
on when to commit / optimize?




Part of the problem might be repeatedly inserting the same doc over
and over again-- that is an odd pattern of deletes, which might be
triggering a bad performance case on the lucene or solr side (I'm
assuming the doc has a unique key).


Could be, but the same issue occurs on 400K unique docs. I made the  
post.sh test case to see what exactly the issue is.



I've discovered something wonky with commits and open files...

If either autoCommit is on, or I commit after every add, the number  
of open file descriptors from Lucene goes up way high and does not go  
back down. I just ran


# lsof | grep solr | wc -l

after adding 1000 docs and got 125265 open fdescs. If I stop adding
docs this count does not go down -- it does not go down until I
restart Solr. This would be the cause of my too-many-open-files
problem. Turning off autocommit / not committing after every add keeps
this count steady at 100-200. The files are all of this type:


java  32254 bwhitman 3654r  REG  8,1 12
15417767 /home/bwhitman/solr/working/data/index/_86u.nrm (deleted)
java  32254 bwhitman 3655r  REG  8,1  42024
15417813 /home/bwhitman/solr/working/data/index/_86t.fdt (deleted)
java  32254 bwhitman 3656r  REG  8,1 16
15417814 /home/bwhitman/solr/working/data/index/_86t.fdx (deleted)
java  32254 bwhitman 3657r  REG  8,1  27420
15417817 /home/bwhitman/solr/working/data/index/_86t.tis (deleted)
java  32254 bwhitman 3658r  REG  8,1    368
15417818 /home/bwhitman/solr/working/data/index/_86t.tii (deleted)
java  32254 bwhitman 3659r  REG  8,1   7652
15417815 /home/bwhitman/solr/working/data/index/_86t.frq (deleted)
java  32254 bwhitman 3660r  REG  8,1  24860
15417816 /home/bwhitman/solr/working/data/index/_86t.prx (deleted)
java  32254 bwhitman 3661r  REG  8,1 20
15417819 /home/bwhitman/solr/working/data/index/_86t.nrm (deleted)
java  32254 bwhitman 3662r  REG  8,1 20
15417819 /home/bwhitman/solr/working/data/index/_86t.nrm (deleted)
java  32254 bwhitman 3663r  REG  8,1 20
15417819 /home/bwhitman/solr/working/data/index/_86t.nrm (deleted)
java  32254 bwhitman 3664r  REG  8,1 20
15417819 /home/bwhitman/solr/working/data/index/_86t.nrm (deleted)
java  32254 bwhitman 3665r  REG  8,1 20
15417819 /home/bwhitman/solr/working/data/index/_86t.nrm (deleted)
java  32254 bwhitman 3666r  REG  8,1 20
15417819 /home/bwhitman/solr/working/data/index/_86t.nrm (deleted)
java  32254 bwhitman 3667r  REG  8,1 20
15417819 /home/bwhitman/solr/working/data/index/_86t.nrm (deleted)
java  32254 bwhitman 3668r  REG  8,1 20
15417819 /home/bwhitman/solr/working/data/index/_86t.nrm (deleted)
java  32254 bwhitman 3669r  REG  8,1  21012
15417669 /home/bwhitman/solr/working/data/index/_85y.fdt
java  32254 bwhitman 3670r  REG  8,1  8
15417670 /home/bwhitman/solr/working/data/index/_85y.fdx
java  32254 bwhitman 3671r  REG  8,1  46736
15417673 /home/bwhitman/solr/working/data/index/_85y.tis
java  32254 bwhitman 3672r  REG  8,1    503
15417674 /home/bwhitman/solr/working/data/index/_85y.tii
java  32254 bwhitman 3673r  REG  8,1   43936224
15417671 /home/bwhitman/solr/working/data/index/_85y.frq
java  32254 bwhitman 3674r  REG  8,1  12430
15417672 /home/bwhitman/solr/working/data/index/_85y.prx
java  32254 bwhitman 3675r  REG  8,1  80004
15417675 /home/bwhitman/solr/working/data/index/_85y.nrm
java  32254 bwhitman 3676r  REG  8,1  80004
15417675 /home/bwhitman/solr/working/data/index/_85y.nrm
java  32254 bwhitman 3677r  REG  8,1  80004
15417675 /home/bwhitman/solr/working/data/index/_85y.nrm
java  32254 bwhitman 3678r  REG  8,1  80004
15417675 /home/bwhitman/solr/working/data/index/_85y.nrm
java  32254 bwhitman 3679r  REG  8,1  80004
15417675 /home/bwhitman/solr/working/data/index/_85y.nrm
java  32254 bwhitman 3680r  REG  8,1  80004
15417675 /home/bwhitman/solr/working/data/index/_85y.nrm
java  32254 bwhitman 3681r  REG  8,1  80004
15417675 /home/bwhitman/solr/working/data/index/_85y.nrm
java  32254 bwhitman 3682r  REG  8,1  80004
15417675 /home/bwhitman/solr/working/d

Re: WordNet Ontologies

2007-02-23 Thread Erik Hatcher


On Feb 23, 2007, at 5:33 PM, rubdabadub wrote:

Does Solr supports ontology somehow? Has it been tried? Any tips on
how should I go about doing so?


What are you wanting to do exactly?

Erik



Re: lots of inserts very fast, out of heap or file descs

2007-02-23 Thread Yonik Seeley

On 2/23/07, Brian Whitman <[EMAIL PROTECTED]> wrote:

>
> Try not committing so often (perhaps until you are done).
> Don't use post.sh, or modify it to remove the commit.
>

OK, I modified it to not commit after and I also realized I had
SOLR-126 (autocommit) on, which I disabled. Is there a rule of thumb
on when to commit / optimize?


There is a map entry (UniqueKey->Integer) per document added/deleted,
and that's really the only in-memory state that is kept.  So you
should be fine with at least 100K docs between commits.


If either autoCommit is on, or I commit after every add, the number
of open file descriptors from Lucene goes up way high and does not go
back down.


Do you have any warming configured?
Too many searchers trying to initialize fieldcache entries can blow
out the memory, causing most of the CPU to be consumed by the garbage
collector.


I just ran

# lsof | grep solr | wc -l

after adding 1000 docs and got 125265 open fdescs. If I stop adding
docs this count does not go down


Is there detectable activity going on (like CPU usage)?
Does the admin page list all of these open searchers (check the
statistics page under "CORE")?


-- it does not go down until I
restart solr. This would be the cause of my too many files open
problem. Turning off autocommit / not commiting after every add keeps
this count steady at 100-200. The files are all of type:

[...]

Bug or feature?


If the searchers holding these index files open are still working,
then this is a problem, but not exactly a bug.  If not, you may have
hit a new bug in searcher synchronization.

A workaround is to limit the number of warming searchers (see
maxWarmingSearchers in solrconfig.xml)
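
That workaround is a top-level element in solrconfig.xml; a sketch, with 2 as an arbitrary example value:

```xml
<!-- solrconfig.xml: cap the number of searchers warming concurrently -->
<maxWarmingSearchers>2</maxWarmingSearchers>
```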

-Yonik


Re: lots of inserts very fast, out of heap or file descs

2007-02-23 Thread Brian Whitman

On Feb 23, 2007, at 8:31 PM, Yonik Seeley wrote:


-- it does not go down until I
restart solr. This would be the cause of my too many files open
problem. Turning off autocommit / not commiting after every add keeps
this count steady at 100-200. The files are all of type:

[...]

Bug or feature?


If the searchers holding these index files open are still working,
then this is a problem, but not exactly a bug.  If not, you may have
hit a new bug in searcher synchronization.


It doesn't look like it. I hope I'm not getting a reputation on here
for "discovering" bugs that turn out to be my own fault; you'd all laugh
if you knew how much time I wasted before posting about it this
evening...


But I just narrowed this down to a bad line in my solrconfig.xml.

For some reason, the one I was using said this:

class="solr.XmlUpdateRequestHandler"

Changing my line to the trunk line fixed the fdesc problem.

The confounding thing to me is that the Solr install otherwise worked
fine. I don't know why removing the /xml path would leave a ton of
files open while everything else worked OK.


If you want to reproduce it:

1) Download trunk/nightly
2) Change line 347 of example/solr/conf/solrconfig.xml to

3) java -jar start.jar...
4) Run post.sh a bunch of times on the same xml file (in a shell
script or whatever)
5) After a few seconds/minutes jetty will crash with "too many open
files"


Now, to see if this also caused my heap overflow problems. Thanks  
Mike and Yonik...






Re: lots of inserts very fast, out of heap or file descs

2007-02-23 Thread Chris Hostetter

it sounds like we may have a very bad bug in the XmlUpdateRequestHandler

to clarify for people who may not know: the long-standing "/update" URL
has historically been handled using a custom servlet; recently some of that
code was refactored into a RequestHandler, along with a new Dispatcher
for RequestHandlers that works based on path mapping -- the goal being to
allow more customizable update processing and to start accepting updates in a
variety of input formats ... if XmlUpdateRequestHandler is mapped to the
name "/update" it intercepts requests to the legacy update servlet, and
should have functioned exactly the same way.
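
To the best of my knowledge, the mapping described above looks roughly like this in the example solrconfig.xml:

```xml
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
```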

Based on Brian's email, it sounds like it didn't work in *exactly* the
same way, because it caused some file descriptor leaks (and possibly some
memory leaks)

Hopefully Ryan will be a rock star and spot the problem immediately --
but i'll try to look into it later this weekend.


: Date: Fri, 23 Feb 2007 22:33:10 -0500
: From: Brian Whitman <[EMAIL PROTECTED]>
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Re: lots of inserts very fast, out of heap or file descs
:
: On Feb 23, 2007, at 8:31 PM, Yonik Seeley wrote:
:
: >> -- it does not go down until I
: >> restart solr. This would be the cause of my too many files open
: >> problem. Turning off autocommit / not commiting after every add keeps
: >> this count steady at 100-200. The files are all of type:
: > [...]
: >> Bug or feature?
: >
: > If the searchers holding these index files open are still working,
: > then this is a problem, but not exactly a bug.  If not, you may have
: > hit a new bug in searcher synchronization.
:
: It doesn't look like it. I hope I'm not getting a reputation on here
: for "discovering" bugs that seem to be my own fault, you'd all laugh
: if you knew how much time I wasted before posting about it this
: evening...
:
: But I just narrowed this down to a bad line in my solrconfig.xml.
:
: The one I was using said this for some reason :
:
:
: 3) java -jar start.jar...
: 4) Run post.sh a bunch of times on the same xml file... (in a shell
: script or whatever)
: 5) After a few seconds/minutes jetty will crash with "too many open
: files"
:
: Now, to see if this also caused my heap overflow problems. Thanks
: Mike and Yonik...
:
:
:



-Hoss