Re: Hardware config for SOLR

2008-09-19 Thread Karl Wettin


On 19 Sep 2008, at 23:22, Grant Ingersoll wrote:

As for HDDs, people have noted some nice speedups in Lucene using  
Solid-state drives, if you can afford them.


I've seen the average response time cut by a factor of 5-10 when
switching to SSD. 64GB SSDs start at EUR 200, so replacing the disk
can be a lot cheaper than getting more servers, given you can fit
your index on one of those.



 karl


QParserPlugin

2009-01-26 Thread Karl Wettin

Hi forum,

I'm trying to get QParserPlugin to work. I've got


but still get "Unknown query type 'myqueryparser'" when I request
/solr/select/?defType=myqueryparser&q=foo

There is no warning about myqueryparser from Solr at startup.

I do however manage to get this working:
  


  

So it shouldn't be my Solr environment or a classpath problem? That's
about the extent of my Solr setup skills; I'm left with no clues as to
why it doesn't register.



gratefully,

karl


Re: QParserPlugin

2009-01-27 Thread Karl Wettin

So it was me defining it in schema.xml rather than solrconfig.xml.

17:17 < erikhatcher> where are you defining the qparser plugin?
17:18 < erikhatcher> it's very odd... if it isn't picking them up but  
you reference them, it would certainly give an error
17:18 < karlwettin> as a first level child to schema element in  
schema.xml

17:19 < erikhatcher> qparser plugins go in solrconfig, not schema
17:19 < karlwettin> aha
17:19 < karlwettin> :)
17:19 < erikhatcher> :)
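
For the record, the registration belongs in solrconfig.xml. A minimal
sketch of the kind of element that goes there (the class name below is
just a placeholder, not the actual class from this thread):

  <!-- child of <config> in solrconfig.xml -->
  <queryParser name="myqueryparser" class="a.b.MyQParserPlugin"/>

With that in place, defType=myqueryparser on the request selects the
parser.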


On 27 Jan 2009, at 08:25, Erik Hatcher wrote:

Karl - where did you put your a.b.QParserPlugin?  You should put it
in <solr.home>/lib within a JAR file.  I'm surprised you aren't
seeing an error though.


Erik

On Jan 27, 2009, at 1:07 AM, Karl Wettin wrote:


Hi forum,

I'm trying to get QParserPlugin to work. I've got


but still get "Unknown query type 'myqueryparser'" when I request
/solr/select/?defType=myqueryparser&q=foo

There is no warning about myqueryparser from Solr at startup.

I do however manage to get this working:

  
  


So it shouldn't be my Solr environment or a classpath problem?
That's about the extent of my Solr setup skills; I'm left with no
clues as to why it doesn't register.



gratefully,

  karl






Re: Text classification with Solr

2009-01-27 Thread Karl Wettin


On 27 Jan 2009, at 17:23, Neal Richter wrote:



Is it really necessary to use Solr for it? Things go much faster
with the Lucene low-level API, and much faster still if you're
loading the classification corpus into RAM.


Good points.  At the moment I'd rather have a daemon with a service
API.. as well as the filtering/tokenization capabilities Solr has
built in.  Probably will attempt to get the corpus' index in memory
via large memory allocation.

If it doesn't scale then I'll either go to Lucene api or implement a
custom inverted index via memcached.

Other note /at the moment/ is that it's not going to be a deeply
hierarchical taxonomy, much less a full indexing of an RDF/OWL
schema.. there are some gotchas for that.


If your corpus is small enough you may want to take a look at lucene/
contrib/instantiated. It was made just for this sort of thing.



karl




facet count on partial results

2009-02-13 Thread Karl Wettin

Hi Solr,

I pass a rather large number of OR clauses to Solr, ending up with
lots and lots of results. However, only the results above a certain
score threshold are interesting to me, so I'd like to get facet
counts only for the results within the threshold. How can I do that?




  karl 


Re: facet count on partial results

2009-02-14 Thread karl wettin
On Fri, Feb 13, 2009 at 12:24 PM, Karl Wettin  wrote:
>
> I pass a rather large number of OR clauses to Solr, ending up with lots and
> lots of results. However, only the results above a certain score
> threshold are interesting to me, so I'd like to get facet counts only
> for the results within the threshold. How can I do that?

I've been browsing the Solr code and it looks like quite a bit of work
to modify the facet handler to get it working with scores. Perhaps the
simplest way is to limit my result set right after hit collection. My
idea is that the facet counting would then only be aware of the
documents within my threshold. Or does the facet counting use a
different query and hit collection than the one that produced the
results?

However, I can't seem to find the class where any of this takes place.
Is it QueryComponent?

Also, as my threshold is based on the distance in score from the
first result, it sounds like using a result start position greater than
0 is something I have to look out for. Or?
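
For reference, the post-collection trimming I have in mind would look
something like this against the plain Lucene API (a sketch only: the
relative threshold is made up, and it ignores Solr's DocSet plumbing):

import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

/** Keeps only the hits scoring within a relative distance of the top hit. */
class ScoreTrimmer {
  static List<ScoreDoc> trimByScore(TopDocs topDocs, float relativeThreshold) {
    List<ScoreDoc> kept = new ArrayList<ScoreDoc>();
    if (topDocs.scoreDocs.length == 0) return kept;
    // hits arrive in score-descending order, so the first one is the max
    float cutoff = topDocs.scoreDocs[0].score * relativeThreshold;
    for (ScoreDoc hit : topDocs.scoreDocs) {
      if (hit.score < cutoff) break;
      kept.add(hit);
    }
    return kept;
  }
}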


karl


Re: facet count on partial results

2009-02-16 Thread Karl Wettin


On 15 Feb 2009, at 20:15, Yonik Seeley wrote:

On Sat, Feb 14, 2009 at 6:45 AM, karl wettin  wrote:

Also, as my threshold is based on the distance in score from the
first result, it sounds like using a result start position greater
than 0 is something I have to look out for. Or?


Hmmm - this isn't that easy in general as it requires knowledge of the
max score, right?


Hmmm indeed. Does Solr not collect 0-20 even though the request is for  
10-20? Wouldn't it then be possible to inject some code that limits  
the DocSet at that layer?


There is more. Not important, but a nice thing to get: I create
multiple documents per entity from my primary data source (e.g. each
entity a book and each document a paragraph from the book) but I only
want to present the top scoring document per entity. I handle this
with client-side post-processing of the results. This means that I
potentially get facet counts from documents that I don't actually
present to the user. It would be nice to handle this in the same layer
as my score threshold restriction, but it would require loading the
primary key from the document rather early. And it would also mean
that even though I might get 2000 results within the threshold, the
actual number of results I want to pass on to the client is a lot less
than that. I.e. I'll have to request more results than I want in order
to ensure I get enough even after filtering out documents that point
at an entity already in the result list with a greater score.
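
For context, a sketch of the client-side collapse I describe above
(Hit is a stand-in for however the primary key and score are read
from each result; the real field names are not shown here):

import java.util.LinkedHashMap;
import java.util.Map;

/** One search hit: the entity primary key plus its score. */
class Hit {
  final String entityKey; // e.g. the book id
  final float score;
  Hit(String entityKey, float score) { this.entityKey = entityKey; this.score = score; }
}

class EntityCollapser {
  /** Keeps only the top-scoring document per entity, preserving the
      original (score-descending) order of first appearance. */
  static Map<String, Hit> collapseByEntity(Iterable<Hit> hits) {
    Map<String, Hit> best = new LinkedHashMap<String, Hit>();
    for (Hit hit : hits) {
      Hit seen = best.get(hit.entityKey);
      if (seen == null || hit.score > seen.score) {
        best.put(hit.entityKey, hit);
      }
    }
    return best;
  }
}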


The question is whether I can fit all this in the same layer as the
score threshold result set limiter.



I'm rather lost in the Solr code. Pointers to class and method names
are most welcome.




 karl


Re: Multilanguage

2009-03-04 Thread Karl Wettin


On 17 Feb 2009, at 21:26, Grant Ingersoll wrote:

I believe Karl Wettin submitted a Lucene patch for a Language  
guesser: http://issues.apache.org/jira/browse/LUCENE-826 but it is  
marked as won't fix.


The test case of LUCENE-1039 is a language classifier. I've used the
patch to detect the language of user queries (where I know the text
is rather simple to classify as a specific language).



 karl


Re: Spell Check Handler

2007-08-11 Thread karl wettin


On 11 Aug 2007, at 10:36, climbingrose wrote:


There is an issue on the Lucene issue tracker regarding a multi-word
spellchecker:
https://issues.apache.org/jira/browse/LUCENE-550


I think you mean LUCENE-626, which sort of depends on LUCENE-550.


--
karl





Re: Spell Check Handler

2007-08-11 Thread karl wettin


On 12 Aug 2007, at 02:35, climbingrose wrote:

I think you mean LUCENE-626



Yeah. Is it possible to use it in a production environment?


It's been running live for a long time at this one place, but the
code is stuck at Lucene 2.0 and an old version of 550. I don't really
do much more with Solr than monitor the forums and use some analysis
code, so I couldn't say how much work it would take you to get it
running.


I'm aiming to give the code an overhaul and bring it up to date with
the Lucene trunk any day, week, month or year now, depending on
workload and whether I manage to finish a version of 550 that is
accepted into the trunk.


You are welcome to break out the TokenPhraseSuggester and
NgramTokenSuggester, the parts I think you are interested in. If you
do, feel free to report on it and post a patch in the issue.


--
karl


Re: Spell Check Handler

2007-08-17 Thread karl wettin
I updated LUCENE-626 last night. It should now run smoothly without
LUCENE-550, but smoother with it.


Perhaps it is something you can use.


On 12 Aug 2007, at 14:24, climbingrose wrote:

I'm happy to contribute code for the SpellCheckerRequestHandler. I'll
post the code once I strip off stuff related to our product.

On 8/12/07, Pieter Berkel <[EMAIL PROTECTED]> wrote:


On 11/08/07, climbingrose <[EMAIL PROTECTED]> wrote:


That's exactly what I did with my custom version of the
SpellCheckerHandler. However, I didn't handle suggestionCount and
only returned the one corrected phrase which contains the "best"
corrected terms. There is an issue on the Lucene issue tracker
regarding a multi-word spellchecker:

https://issues.apache.org/jira/browse/LUCENE-550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel


I'd be interested to take a look at your modifications to the
SpellCheckerHandler; how did you handle phrase queries? Maybe we can
open a JIRA issue to expand the spell checking functionality to
perform analysis on multi-word input values.

I did find http://issues.apache.org/jira/browse/LUCENE-626 after
looking at LUCENE-550, but since these patches are not yet included
in the Lucene trunk it might be a little difficult to justify
implementing them in Solr.






--
Regards,

Cuong Hoang




Re: largish test data set?

2007-09-17 Thread Karl Wettin


On 17 Sep 2007, at 12:06, David Welton wrote:



I'm in the process of evaluating solr and sphinx, and have come to
realize that actually having a large data set to run them against
would be handy.  However, I'm pretty new to both systems, so thought
that perhaps asking around might produce something useful.

What *I* mean by largish is something that won't fit into memory - say
5 or 6 gigs, which is probably puny for some and huge for others.


IMDB is about 1.2GB of data:



You can extract real queries from the TPB data collection; it should
contain about 1M queries in the movie category:






--
karl


UserTagDesign

2007-09-17 Thread Karl Wettin
I've been looking at the UserTagDesign wiki page on and off for a
while and think all the use cases could be explained with simple UML
class diagram semantics:



[Taggable] (tag:Tag)-- {0..*} ----|---- {0..*} --(tag:Tag) [Tagger]
                                  |
                              [Tagging]


Rendered: 

This is of course a design that might not fit everybody; it could be
represented using an n-ary association or what not. But I find the
text on the wiki much easier to follow with this in my head.


How (or even if) one would represent this in an index is a
completely different story.


Translated to Java the diagram would look something like this:


/** the user */
class Tagger {
  Map<Tag, List<Tagging>> taggingsByTag;
}

/** the content */
class Taggable {
  Map<Tag, List<Tagging>> taggingsByTag;
}

/** content tagging */
class Tagging {
  Tagger tagger;
  Taggable tagged;
  Date created;
}

class Tag {
  String text;
}


Thought it was better to let you people decide whether or not this  
fits in the wiki.



--
karl




End user session tracking

2007-10-16 Thread Karl Wettin

Where in Solr would I add my own services? Do I really want to do that?

For reinforcement-learning reasons I would like to keep track of all
queries placed during an end-user session, and as it expires I want
to feed this information to an aggregating class used by a request
handler.


I believe it makes more sense to keep track of this session data in
Solr rather than in the Solr client code.


But then there is the possible future distribution and replication.
It would make sense if all nodes shared the same session handler. So
perhaps it makes more sense for this to be some stand-alone service?
Perhaps that is premature optimization and I should really just focus
on getting my code running first?


So I'll start with an ad hoc session manager within Solr. Where in  
Solr should I add such a service?



--
karl


Re: End user session tracking

2007-10-17 Thread Karl Wettin


On 16 Oct 2007, at 17:12, Ryan McKinley wrote:

So I'll start with an ad hoc session manager within Solr. Where in  
Solr should I add such a service?


I am using a custom filter that extends SolrDispatchFilter.


Alright, thanks!

--
karl
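
For anyone finding this thread later, a rough sketch of Ryan's
suggestion (the filter class, attribute name and recording logic here
are made up; it only shows where the hook goes):

import java.io.IOException;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import org.apache.solr.servlet.SolrDispatchFilter;

/** Records the q parameter of each request in the user's HTTP session
    before handing the request on to Solr as usual. */
public class SessionTrackingDispatchFilter extends SolrDispatchFilter {
  @Override
  public void doFilter(ServletRequest request, ServletResponse response,
                       FilterChain chain) throws IOException, ServletException {
    if (request instanceof HttpServletRequest) {
      HttpServletRequest http = (HttpServletRequest) request;
      String q = http.getParameter("q");
      if (q != null) {
        // hypothetical attribute; a real implementation would append to a
        // per-session list and flush it to an aggregator on session expiry
        http.getSession(true).setAttribute("lastQuery", q);
      }
    }
    super.doFilter(request, response, chain);
  }
}

The filter would replace the stock SolrDispatchFilter mapping in the
webapp's web.xml.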


simple ui?

2008-05-27 Thread Karl Wettin
It would be perfect if all I had to do was to define a couple of facet  
fields, a default text query field and some title/body/class type to  
render the results.


Is there such a formula 1A JSP/servlet (or PHP) user interface for  
Solr? Perhaps something in the example or admin that I missed? If none  
of above, is there any commercial solution I can buy and have up and  
running today? I'm hysterically bad at UI.



  karl


Re: simple ui?

2008-05-30 Thread Karl Wettin


On 28 May 2008, at 14:15, Erik Hatcher wrote:


On May 28, 2008, at 2:34 AM, Karl Wettin wrote:

It would be perfect if all I had to do was to define a couple of  
facet fields, a default text query field and some title/body/class  
type to render the results.


Is there such a formula 1A JSP/servlet (or PHP) user interface for
Solr? Perhaps something in the example or admin that I missed? If
none of the above, is there any commercial solution I can buy and
have up and running today? I'm hysterically bad at UI.


Solr Flare to the rescue :)   (sort of)

http://wiki.apache.org/solr/Flare


Excellent stuff! Thanks!


karl


!Solr

2006-06-01 Thread karl wettin
Hi all,

I need to get something up and running in 12 hours, so I thought
it could be fun to see if Solr would work out of the box for me.

Neither the example nor the dist war would start. 

No big deal, I'll hack something up another way. Just thought it
would be a good thing to report this.

I'm on IBM 1.5 on my PPC Linux. 

Here are the logs:

[EMAIL PROTECTED]:~/download/solr-nightly/example$ java -jar start.jar
18:52:16.463 INFO   [main] org.mortbay.log.LogImpl.add(LogImpl.java:110) >14> 
added [EMAIL PROTECTED]
18:52:16.215 INFO   [main] 
org.mortbay.util.FileResource.(FileResource.java:61) >09> Checking 
Resource aliases
18:52:16.774 WARN!! [main] org.mortbay.xml.XmlParser.(XmlParser.java:82) 
>10> Schema validation may not be supported
18:52:17.335 INFO   [main] 
org.mortbay.http.HttpServer.doStart(HttpServer.java:686) >07> Version 
Jetty/5.1.11RC0
18:52:17.537 INFO   [main] org.mortbay.util.Container.start(Container.java:75) 
>11> Started [EMAIL PROTECTED]
18:52:17.645 INFO   [main] org.mortbay.util.Container.start(Container.java:75) 
>08> Started ServletHttpContext[/,/]
18:52:17.756 INFO   [main] 
org.mortbay.http.SocketListener.start(SocketListener.java:206) >08> Started 
SocketListener on 127.0.0.1:8081
18:52:17.864 INFO   [main] org.mortbay.util.Container.start(Container.java:75) 
>06> Started [EMAIL PROTECTED]
18:52:18.417 INFO   [main] 
org.mortbay.http.HttpServer.setStatsOn(HttpServer.java:1131) >12> Statistics on 
= false for [EMAIL PROTECTED]
18:52:18.533 INFO   [main] 
org.mortbay.http.HttpServer.doStart(HttpServer.java:686) >07> Version 
Jetty/5.1.11RC0
18:52:18.987 INFO   [main] 
org.mortbay.jetty.servlet.WebApplicationContext.resolveWebApp(WebApplicationContext.java:249)
 >10> Extract 
jar:file:/home/kalle/download/solr-nightly/example/webapps/solr.war!/ to 
/tmp/Jetty__8983__solr/webapp
18:52:19.427 WARN!! [main] org.mortbay.xml.XmlParser.(XmlParser.java:82) 
>14> Schema validation may not be supported
18:52:22.850 WARN!! [main] 
org.mortbay.jetty.servlet.WebApplicationContext.doStart(WebApplicationContext.java:520)
 >09> Configuration error on 
jar:file:/home/kalle/download/solr-nightly/example/webapps/solr.war!/
java.net.MalformedURLException: Absolute URL required with null context: 
../../../conf/web.external.xml
   at java.net.URL.<init>(libgcj.so.7)
   at java.net.URL.<init>(libgcj.so.7)
   at gnu.xml.aelfred2.SAXDriver.absolutize(libgcj.so.7)
   at gnu.xml.aelfred2.XmlParser.parseEntityDecl(libgcj.so.7)
   at gnu.xml.aelfred2.XmlParser.parseMarkupdecl(libgcj.so.7)
   at gnu.xml.aelfred2.XmlParser.parseDoctypedecl(libgcj.so.7)
   at gnu.xml.aelfred2.XmlParser.parseProlog(libgcj.so.7)
   at gnu.xml.aelfred2.XmlParser.parseDocument(libgcj.so.7)
   at gnu.xml.aelfred2.XmlParser.doParse(libgcj.so.7)
   at gnu.xml.aelfred2.SAXDriver.parse(libgcj.so.7)
   at gnu.xml.aelfred2.XmlReader.parse(libgcj.so.7)
   at javax.xml.parsers.SAXParser.parse(libgcj.so.7)
   at org.mortbay.xml.XmlParser.parse(XmlParser.java:218)
   at org.mortbay.xml.XmlParser.parse(XmlParser.java:235)
   at 
org.mortbay.jetty.servlet.XMLConfiguration.configureWebApp(XMLConfiguration.java:190)
   at 
org.mortbay.jetty.servlet.WebApplicationContext.configureWebApp(WebApplicationContext.java:422)
   at 
org.mortbay.jetty.servlet.WebApplicationContext.doStart(WebApplicationContext.java:481)
   at org.mortbay.util.Container.start(Container.java:73)
   at org.mortbay.http.HttpServer.doStart(HttpServer.java:708)
   at org.mortbay.util.Container.start(Container.java:73)
   at org.mortbay.jetty.Server.main(Server.java:466)
   at java.lang.reflect.Method.invoke(libgcj.so.7)
   at org.mortbay.start.Main.invokeMain(Main.java:151)
   at org.mortbay.start.Main.start(Main.java:481)
   at org.mortbay.start.Main.main(Main.java:99)

18:52:23.264 INFO   [main] 
org.mortbay.http.SocketListener.start(SocketListener.java:206) >08> Started 
SocketListener on 0.0.0.0:8983
18:52:23.375 WARN!! [main] org.mortbay.jetty.Server.main(Server.java:454) >05> 
EXCEPTION
org.mortbay.util.MultiException[java.net.MalformedURLException: Absolute URL 
required with null context: ../../../conf/web.external.xml]
   at org.mortbay.http.HttpServer.doStart(HttpServer.java:686)
   at org.mortbay.util.Container.start(Container.java:73)
   at org.mortbay.jetty.Server.main(Server.java:466)
   at java.lang.reflect.Method.invoke(libgcj.so.7)
   at org.mortbay.start.Main.invokeMain(Main.java:151)
   at org.mortbay.start.Main.start(Main.java:481)
   at org.mortbay.start.Main.main(Main.java:99)
java.net.MalformedURLException: Absolute URL required with null context: 
../../../conf/web.external.xml
   at java.net.URL.<init>(libgcj.so.7)
   at java.net.URL.<init>(libgcj.so.7)
   at gnu.xml.aelfred2.SAXDriver.absolutize(libgcj.so.7)
   at gnu.xml.aelfred2.XmlParser.parseEntityDecl(libgcj.so.7)
   at gnu.xml.aelfred2.XmlParser.parseMarkupdecl(libgcj.so.7)
   at gnu.xml.aelfred2.XmlParser.parseDoctypedecl(libgcj.so.7)
   at gnu.xml.aelfred2.XmlParser.parseProlog(libgcj.so.7)
   at gnu.xml.aelfre

Re: !Solr

2006-06-04 Thread karl wettin
That was it. Runs smooth now.

On Thu, 2006-06-01 at 13:19 -0400, Yonik Seeley wrote:
> Thanks for the report Karl, much appreciated.
> It looks like a problem with your servlet container/JVM not liking the
> XML entity "../../../conf/web.external.xml" in the web.xml
> I guess the IBM JVM uses some stricter XML parsing rules or something.
> 
> If you remove that from the web.xml, it should be fine (in fact I had
> removed it in the past already... I don't know how it came back).
> I'll remove it now so it will be fixed for the next nightly build.
> 
> -Yonik
> 
> 
> On 6/1/06, karl wettin <[EMAIL PROTECTED]> wrote:
> > Hi all,
> >
> > I need to get something up and running in 12 hours, so I thought
> > it could be fun to see if Solr would work out of the box for me.
> >
> > Neither the example nor the dist war would start.
> >
> > No big deal, I'll hack something up another way. Just thought it
> > would be a good thing to report this.
> >
> > I'm on IBM 1.5 on my PPC Linux.
> >
> > Here are the logs:
> >
> > [EMAIL PROTECTED]:~/download/solr-nightly/example$ java -jar start.jar
> > 18:52:16.463 INFO   [main] org.mortbay.log.LogImpl.add(LogImpl.java:110) 
> > >14> added [EMAIL PROTECTED]
> > 18:52:16.215 INFO   [main] 
> > org.mortbay.util.FileResource.(FileResource.java:61) >09> Checking 
> > Resource aliases
> > 18:52:16.774 WARN!! [main] 
> > org.mortbay.xml.XmlParser.(XmlParser.java:82) >10> Schema validation 
> > may not be supported
> > 18:52:17.335 INFO   [main] 
> > org.mortbay.http.HttpServer.doStart(HttpServer.java:686) >07> Version 
> > Jetty/5.1.11RC0
> > 18:52:17.537 INFO   [main] 
> > org.mortbay.util.Container.start(Container.java:75) >11> Started [EMAIL 
> > PROTECTED]
> > 18:52:17.645 INFO   [main] 
> > org.mortbay.util.Container.start(Container.java:75) >08> Started 
> > ServletHttpContext[/,/]
> > 18:52:17.756 INFO   [main] 
> > org.mortbay.http.SocketListener.start(SocketListener.java:206) >08> Started 
> > SocketListener on 127.0.0.1:8081
> > 18:52:17.864 INFO   [main] 
> > org.mortbay.util.Container.start(Container.java:75) >06> Started [EMAIL 
> > PROTECTED]
> > 18:52:18.417 INFO   [main] 
> > org.mortbay.http.HttpServer.setStatsOn(HttpServer.java:1131) >12> 
> > Statistics on = false for [EMAIL PROTECTED]
> > 18:52:18.533 INFO   [main] 
> > org.mortbay.http.HttpServer.doStart(HttpServer.java:686) >07> Version 
> > Jetty/5.1.11RC0
> > 18:52:18.987 INFO   [main] 
> > org.mortbay.jetty.servlet.WebApplicationContext.resolveWebApp(WebApplicationContext.java:249)
> >  >10> Extract 
> > jar:file:/home/kalle/download/solr-nightly/example/webapps/solr.war!/ to 
> > /tmp/Jetty__8983__solr/webapp
> > 18:52:19.427 WARN!! [main] 
> > org.mortbay.xml.XmlParser.(XmlParser.java:82) >14> Schema validation 
> > may not be supported
> > 18:52:22.850 WARN!! [main] 
> > org.mortbay.jetty.servlet.WebApplicationContext.doStart(WebApplicationContext.java:520)
> >  >09> Configuration error on 
> > jar:file:/home/kalle/download/solr-nightly/example/webapps/solr.war!/
> > java.net.MalformedURLException: Absolute URL required with null context: 
> > ../../../conf/web.external.xml
> >at java.net.URL.<init>(libgcj.so.7)
> >at java.net.URL.<init>(libgcj.so.7)
> >at gnu.xml.aelfred2.SAXDriver.absolutize(libgcj.so.7)
> 



a thought on cache

2006-08-03 Thread karl wettin
I don't do Solr, but I had a thought that might be interesting: instead
of associating the cache with an IndexSearcher, it could stand by itself.
When new documents are inserted (if I understand it right, Solr has
some kind of notification system for this) the cached queries are run
against the new documents (indexed in a Memory- or InstantiatedIndex
[Lucene issue 550]) to see if they affect the cached results. If not,
the cache is kept. If so, the cache is rebuilt or removed. With
pre-tokenized fields (Lucene issue 580) it would not consume that many
resources at all, but perhaps that will not fit in the Solr scheme.

Any immediate comments on that? I'd like to implement something like
this for myself, as I notice the CPU working a bit harder than I want
it to every time I update an index.
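
A sketch of the check itself, assuming Lucene's contrib MemoryIndex
(field handling is simplified to a single text field):

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.memory.MemoryIndex;
import org.apache.lucene.search.Query;

class CacheInvalidator {
  /** Returns true if the cached query is unaffected by a new document:
      a hit against the in-memory copy of the document means the cached
      result could have changed and should be rebuilt or dropped. */
  static boolean stillValid(Query cachedQuery, String field,
                            String newDocText, Analyzer analyzer) {
    MemoryIndex index = new MemoryIndex();
    index.addField(field, newDocText, analyzer);
    return index.search(cachedQuery) == 0.0f; // 0.0f means no match
  }
}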



Re: a thought on cache

2006-08-04 Thread karl wettin
On Thu, 2006-08-03 at 23:53 -0700, Chris Hostetter wrote:

>   1) as new docs come in, add them to a purely in memory index
>   2) when it becomes time to "commit" the new documents, test all queries
>  in the cache against this in memory index.
>   3) any query in the cache which has a hit on this in memory index should
>  be invalidated, any query which does not have a hit is still valid.

You got it.

> ...this could probably work if the index was purely additive 

> check if one of the cached queries matched on the deleted document

Hmm, didn't see that one coming. Quick and dirty would be to rebuild
the document from the original source. I'll have to think of a better
solution than that though.

> the next segment merge could collapse doc ids above deleted docs which
> were totally unrelated to any docs that were added or deleted -- so
> you would think they are still valid even though the doc ids in the
> cache don't correspond to the same documents anymore.

This is not the first time I think of low-level hooks in the index. If
an optimization could report changes, this would not be a problem, or?

> while the "old" IndexSearcher is still being used by external requests
> (and still using it's cache) a new "on deck" IndexSearcher is opened,
> and an internal thread is running queries against it (the results of

I do something similar to that. But all those queries (in some cases
tens of thousands, against a frequently updated index) hog more CPU
than I think they have to. I'm low on CPU (spent on real-time
collaborative filtering etc.) but have a more or less unlimited
amount of RAM.



Re: a thought on cache

2006-08-05 Thread karl wettin
On Fri, 2006-08-04 at 11:18 -0400, Yonik Seeley wrote:
> On 8/4/06, karl wettin <[EMAIL PROTECTED]> wrote:
> > When new documents are inserted (if I understand it right, Solr have
> > some kind of notification system for this) the cached queries are placed
> > on the new documents (indexed in a Memory- or InstantiatedIndex [Lucene
> > issue 550]) to see if they affect the cached results.
> 
> It would be complicated enough for a filter cache (just the docs that
> match), but doesn't even seem possible for a query cache where
> relevancy scores could change due to changes in idf.  Perhaps doable
> if one were willing to drop all idf terms from scoring...

Ouch. Yes, this is a hard nut to crack. I'll most definitely sleep on it
for a couple of nights though.

Thanks all for the input!



Re: Incremental updates/Sorting problem

2006-08-08 Thread karl wettin
On Tue, 2006-08-08 at 02:14 -0700, bo_b wrote:
> Is there any solution to this problem? I would like to be able to sort, but
> we can't live with 264 seconds of downtime after every commit.

There have been many long threads in the Lucene-users forum on this
subject. Try searching for "sorting" in the subject. I personally
suggest a List<Integer> where the index is the document number and
the value is the global sort order, set by iterating a TermEnum and
TermDocs at index time.

But many people think this is a bad solution. So read the threads to
catch up on the alternatives.
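
A sketch of building that structure with the Lucene API of the time
(assuming one indexed term per document in the sort field; error
handling omitted):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.TermEnum;

class SortOrderBuilder {
  /** order[docNumber] = global sort position of that document's term,
      built by walking the (already sorted) term dictionary once. */
  static int[] build(IndexReader reader, String field) throws IOException {
    int[] order = new int[reader.maxDoc()];
    TermEnum terms = reader.terms(new Term(field, ""));
    TermDocs termDocs = reader.termDocs();
    int rank = 0;
    try {
      do {
        Term t = terms.term();
        if (t == null || !t.field().equals(field)) break;
        termDocs.seek(t);
        while (termDocs.next()) {
          order[termDocs.doc()] = rank;
        }
        rank++;
      } while (terms.next());
    } finally {
      termDocs.close();
      terms.close();
    }
    return order;
  }
}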



Re: Spelling

2007-02-06 Thread karl wettin


On 6 Feb 2007, at 04:19, Michael Kimsal wrote:

Thanks Erik.  That worked, then threw me for another loop, which I
sort of have fixed, I think.

I'm using the highlighter functionality, but it doesn't seem to
highlight the 'matched' word if it's a partial match, although it
does in fact return that record.  Am I missing something obvious
here, or is highlighting of partial matches not supported?


You need to rewrite the query. See Query.rewrite.

(I think that's it.)


But,

fuzzy queries are sort of slow, at least compared to many other things.
Depending on your server load and corpus size, I would perhaps
recommend some sort of "did you mean" functionality rather than fuzzy
queries.
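
Roughly, assuming the contrib Highlighter (the method and variable
names here are made up; the key line is the rewrite, which expands
fuzzy and wildcard queries into plain term queries the highlighter
can see):

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;

class PartialMatchHighlighter {
  /** Highlights a stored field value against a (possibly fuzzy) query. */
  static String highlight(Query query, IndexReader reader, Analyzer analyzer,
                          String fieldName, String fieldText) throws Exception {
    Query rewritten = query.rewrite(reader); // expand multi-term queries
    Highlighter highlighter = new Highlighter(new QueryScorer(rewritten));
    return highlighter.getBestFragment(analyzer, fieldName, fieldText);
  }
}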



--
karl


Re: cache warming optimization

2007-02-07 Thread karl wettin


On 7 Feb 2007, at 19:04, Erik Hatcher wrote:

I'm interested in improving my existing custom cache warming by  
being selective about what updates rather than rebuilding completely.


I know it is not Solr, but I've made great progress on my cache that
updates affected results only, on insert and delete. It's available
in LUCENE-550, and is based on the InstantiatedIndex and
NotifiableIndex available in the same patch. Java 1.5. Perhaps that
is something you can take a look at for some ideas.


--
karl


Re: Solr logo poll

2007-04-11 Thread karl wettin

B.

It is juicy. I like juice. Also, I think it fits better with the  
other Lucene sub-project logos.



On 6 Apr 2007, at 19:51, Yonik Seeley wrote:


Quick poll...  Solr 2.1 release planning is underway, and a new logo
may be a part of that.
What "form" of logo do you prefer, A or B?  There may be further
tweaks to these pictures, but I'd like to get a sense of what the user
community likes.

A) http://issues.apache.org/jira/secure/attachment/12349897/logo-solr-d.jpg

B) http://issues.apache.org/jira/secure/attachment/12353535/12353535_solr-nick.gif


Just respond to this thread with your preference.

-Yonik




Re: Sort on multiple fields not working?

2007-04-13 Thread karl wettin


On 12 Apr 2007, at 17:06, Yonik Seeley wrote:


Sorting works on indexed tokens, and hence doesn't really work on
analyzed fields that produce more than one token per document.  I
suspect your title field falls into that category.  You could also
index the title field into another field that is indexed as a string
(non-tokenized), but that might take up a lot of memory if you have
long titles.


It just hit me (and I did not consider it any further) that perhaps
one could store String.valueOf(theTitle.hashCode()) in an alternative
field and sort by that instead? It will not be 100% accurate, but in
most cases it will be. However, I'm not sure how negative values will
be handled. If that would be a problem, one could convert the integer
to something alphanumeric. That should also save a bunch of memory.


--
karl


Re: Sort on multiple fields not working?

2007-04-13 Thread karl wettin


On 13 Apr 2007, at 15:48, Yonik Seeley wrote:


On 4/13/07, karl wettin <[EMAIL PROTECTED]> wrote:

It just hit me (and I did not consider it any further) that perhaps
one could store String.valueOf(theTitle.hashCode()) in an alternative
field and sort by that instead? It will not be 100% accurate, but in
most cases it will be.


That would only mostly work for titles around 5 characters long,
right?  It seems like after that, the correlation between hashCode and
sort order breaks down almost immediately since you lose the leftmost
hash bits.


That might be true; as I said, I didn't really think about it too
long. But some alternative hashCode could probably be implemented,
one that uses all the available bits in a string rather than being
limited to the 32 bits of an integer.


--
karl
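
A concrete illustration of the divergence, in plain Java:

public class HashOrderDemo {
  public static void main(String[] args) {
    // Lexicographically "aa" sorts before "b" ...
    System.out.println("aa".compareTo("b")); // negative
    // ... but its hash is far larger, so hash order is reversed
    System.out.println("aa".hashCode());     // 3104
    System.out.println("b".hashCode());      // 98
  }
}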


Re: Sort on multiple fields not working?

2007-04-13 Thread karl wettin


On 13 Apr 2007, at 20:11, Chris Hostetter wrote:



: That might be true, as I said, I didn't really think about it too
: long. But some alternative hashCode could probably be implemented,
: one that use all available bits in a string, rather than the 32 bit
: limitation of an integer.

if you're going to use all the bits in the string, and not confine
yourself to an integer, how is that different from sorting on the
string itself?


Smaller string values do not consume as much memory?

I might not understand your question.

--
karl