Re: Some new SOLR features

Ryan McKinley Mon, 15 Sep 2008 08:44:53 -0700

Here are my gut reactions to this list... in general, most of this comes down to "sounds great, if someone did the work I'm all for it"!

Also, no need to post to solr-user AND solr-dev, probably better to think of solr-user as a superset of solr-dev.

1. Machine learning based suggest feature
https://issues.apache.org/jira/browse/LUCENE-626, which is similar to
what Google does in its suggest implementation.  The fuzzy-based
spellchecker is OK, but it would be better to incorporate user
behavior.
2. Realtime updates https://issues.apache.org/jira/browse/LUCENE-1313
and work being planned for IndexWriter
3. Realtime untokenized field updates
https://issues.apache.org/jira/browse/LUCENE-1292


Without knowing the details of these patches, everything sounds great.

In my view, SOLR should offer a nice interface to anything in lucene core/contrib.


4. BM25 Scoring


Again, no idea, but if it is implemented in lucene, yes.


5. Integration with an open source SQL database such as H2.  This
would mean under the hood, SOLR would enable storing data in a
relational database to allow for joins and things.  It would need to
be combined with realtime updates.  H2 has Lucene integration but it
is the usual index everything at once, non-incrementally.  The new
system would simply index as a new row in a table is added.  The SOLR
schema could allow for certain fields being stored in an SQL database.


Sounds interesting -- what is the basic problem you are addressing?

(It seems you are pointing to something specific, and describing your solution)


6. SOLR schema allowing for multiple indexes without using the
multicore.  The indexes could be defined like SQL tables in the
schema.xml file.

Is this just a configuration issue? I definitely hope we can make configuration easier in the future.

As is, a custom handler can look at multiple indexes... why is there a need to have multiple lucene indexes within a single SolrCore?


6. Crowd by feature ala GBase
http://code.google.com/apis/base/attrs-queries.html#crowding which is
similar to Field Collapsing.  I am thinking it is advantageous from a
performance perspective to obtain an excessive amount of results, then
filter down the result set, rather than first sort a result set.


Again, sounds great!  I would love to see it.
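
For anyone unfamiliar with crowding, here is a minimal Python sketch of the "fetch extra results, then filter down" idea described in the item above. It assumes the results are already ranked and are plain dicts; `crowd`, `field`, and `limit` are hypothetical names for illustration, not part of any Solr, Lucene, or GBase API.

```python
from collections import defaultdict


def crowd(results, field, limit):
    """Keep at most `limit` results per distinct value of `field`.

    `results` is assumed to be in rank order already; that order is preserved.
    """
    seen = defaultdict(int)  # value of `field` -> number of results kept so far
    kept = []
    for doc in results:
        key = doc.get(field)
        if seen[key] < limit:
            seen[key] += 1
            kept.append(doc)
    return kept


# Hypothetical over-fetched, ranked result list.
ranked = [
    {"id": 1, "site": "a.com"},
    {"id": 2, "site": "a.com"},
    {"id": 3, "site": "b.com"},
    {"id": 4, "site": "a.com"},
    {"id": 5, "site": "b.com"},
]

top = crowd(ranked, "site", 2)
```

The filter is a single pass over the over-fetched list, so its cost grows with the number of results fetched rather than with the size of the index.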


7. Improved relevance based on user clicks of individual query results
for individual queries.  This can be thought of as similar to what
Digg does.  I'm sure Google does something similar.  It is a feature
that would be of value to almost any SOLR implementation.

Agreed -- if there is a good way to quickly update a field used for sorting/scoring, this would happen.


8. Integration of LocalSolr into the standard SOLR distribution.
Location is something many sites use these days and is standard in
GBase and most likely other products like FAST.

I'm working on it... it will be a lucene contrib package and cooked into the core solr distribution.


9. Distributed search and updates using object serialization, which
could use https://issues.apache.org/jira/browse/LUCENE-1336.  This
allows span queries, custom payload queries, custom similarities,
custom analyzers, without compiling and deploying a new SOLR war
file to individual servers.



sounds good (but I have no technical basis to say so)


ryan
