Hello,

There are a few features I would like to see in SOLR going forward and
I am interested in finding out what other folks think about them to
get a priority list.  I believe there are many features that Google
and FAST have that SOLR and Lucene will want to implement in future
releases.

1. Machine learning based suggest feature
https://issues.apache.org/jira/browse/LUCENE-626 which is similar to
what Google does in their suggest implementation.  The fuzzy based
spellchecker is ok, but it would be better to incorporate user
behavior.
2. Realtime updates https://issues.apache.org/jira/browse/LUCENE-1313
and work being planned for IndexWriter
3. Realtime untokenized field updates
https://issues.apache.org/jira/browse/LUCENE-1292
4. BM25 Scoring
5. Integration with an open source SQL database such as H2.  This
would mean under the hood, SOLR would enable storing data in a
relational database to allow for joins and things.  It would need to
be combined with realtime updates.  H2 has Lucene integration but it
is the usual index everything at once, non-incrementally.  The new
system would simply index each new row as it is added to a table.  The
SOLR schema could allow for certain fields being stored in an SQL
database.
6. SOLR schema allowing for multiple indexes without using
multicore.  The indexes could be defined like SQL tables in the
schema.xml file.
7. Crowd by feature a la GBase
http://code.google.com/apis/base/attrs-queries.html#crowding which is
similar to Field Collapsing.  I am thinking it is advantageous from a
performance perspective to fetch more results than needed, then
filter down the result set, rather than sorting the result set first.
8. Improved relevance based on user clicks of individual query results
for individual queries.  This can be thought of as similar to what
Digg does.  I'm sure Google does something similar.  It is a feature
that would be of value to almost any SOLR implementation.
9. Integration of LocalSolr into the standard SOLR distribution.
Location is something many sites use these days and is standard in
GBase and most likely other products like FAST.
10. Distributed search and updates using object serialization, which
could use https://issues.apache.org/jira/browse/LUCENE-1336.  This
allows span queries, custom payload queries, custom similarities, and
custom analyzers without compiling and deploying a new SOLR war
file to individual servers.
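
To make the crowding item above a bit more concrete, here is a rough
sketch of the "over-fetch, then filter" idea in plain Python.  This is
not SOLR or GBase code; the function name crowd, its parameters, and
the sample hits are all made up for illustration, and it assumes the
hits arrive already ranked.

```python
from collections import defaultdict


def crowd(results, field, per_value, rows):
    """Keep at most `per_value` hits per distinct `field` value.

    `results` is assumed to be an over-fetched, already ranked list of
    dicts; rank order is preserved and at most `rows` hits are returned.
    """
    seen = defaultdict(int)  # field value -> number of hits kept so far
    kept = []
    for doc in results:
        key = doc.get(field)
        if seen[key] < per_value:
            seen[key] += 1
            kept.append(doc)
            if len(kept) == rows:
                break  # enough hits collected, stop scanning
    return kept


# Hypothetical over-fetched hits, best first.
hits = [
    {"id": 1, "site": "a.com"},
    {"id": 2, "site": "a.com"},
    {"id": 3, "site": "a.com"},
    {"id": 4, "site": "b.com"},
    {"id": 5, "site": "c.com"},
]

top = crowd(hits, "site", per_value=2, rows=3)
print([d["id"] for d in top])  # [1, 2, 4]
```

The filter is a single pass over the fetched hits, so its cost depends
on how many extra results are requested, not on the size of the index.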

Cheers,
Jason
