On 22-Nov-07, at 6:02 AM, Jörg Kiegeland wrote:
1. Does Solr support this kind of index access with better
performance?
Is there anything special to define in schema.xml?
No... Solr uses Lucene at its core, and all matching documents for a
query are scored.
So it is not possible to get "Google"-like performance with
Solr, i.e. to search for a set of keywords and list only the 10 best
documents, without touching the other millions of (web)
documents matching fewer keywords.
In fact, I would not know how to program such an index; however,
Google has done it somehow.
I can be fairly certain that Google does not execute queries that
match millions of documents on a single machine. The default query
operator is (mostly) AND, so the set of possible matches is much
smaller. Also, I imagine they have relatively few documents per
machine.
2. Can one switch off this ordering and just return any 100
documents fulfilling the query (though getting best-matching
documents would be a nice feature if it were fast)?
A feature like this could be developed... but what is the use case for
this? What are you trying to accomplish where either relevancy or
complete matching doesn't matter? There may be an easier workaround
for your specific case.
This is not an actual use case for my project; I just
wanted to know if it would be possible.
Because of the performance results, we designed a new type of
query. I would like to know how fast it would be before I implement
the following query:
I have N keywords and execute a query of the form
keyword1 AND keyword2 AND .. AND keywordN
There may again be millions of matching documents, and I want
to get the first 100 documents.
To have an ordering criterion, each Solr document has a field named
"REV" which holds a natural number. The returned 100 documents shall
be those with the lowest numbers in the "REV" field.
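
For illustration, the request described above might be built as in the
following sketch. It assumes a Solr version that accepts the "sort"
request parameter; the host, port and handler path are hypothetical
placeholders, and build_select_url is a made-up helper name.

```python
from urllib.parse import urlencode


def build_select_url(keywords, base_url="http://localhost:8983/solr/select", rows=100):
    """Build a select URL: all keywords ANDed, sorted by REV ascending.

    base_url is a hypothetical placeholder, not taken from the thread.
    """
    query = " AND ".join(keywords)
    params = {
        "q": query,          # keyword1 AND keyword2 AND ... AND keywordN
        "sort": "REV asc",   # lowest REV values first (assumes the "sort" parameter is supported)
        "rows": rows,        # only the first 100 documents are returned
    }
    return base_url + "?" + urlencode(params)


url = build_select_url(["keyword1", "keyword2", "keyword3"])
print(url)
```

Note that "rows" only limits how many documents are returned; it says
nothing about how many documents are examined to produce them.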
My questions now are:
(1) Will the query perform in O(100) or in O(all possible matches)?
O(all possible matches)
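
A rough sketch of why this is so, in plain Python rather than actual
Lucene code: picking the 100 lowest "REV" values needs only a small
bounded heap, but every matching document still has to be looked at
once, because any of them could hold one of the lowest values. The
function name and the toy data are assumptions made for illustration.

```python
import heapq


def lowest_rev(matches, k=100):
    """Return the k matches with the lowest REV, plus how many matches were visited.

    matches: iterable of (doc_id, rev) pairs, in index order.
    """
    visited = 0

    def counted():
        # Wrap the input so we can count how many matches are consumed.
        nonlocal visited
        for match in matches:
            visited += 1
            yield match

    # nsmallest keeps at most k candidates in memory, but consumes the whole input.
    top = heapq.nsmallest(k, counted(), key=lambda match: match[1])
    return top, visited


# Toy data: 10000 matching documents with scrambled REV values.
matches = [(doc_id, (doc_id * 7919) % 10007) for doc_id in range(10000)]
top, visited = lowest_rev(matches, k=100)
print(len(top), visited)
```

Memory stays proportional to 100, but the work is proportional to the
number of matches.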
(2) If the answer to (1) is O(all possible matches), what will be
the performance if I don't order by the "REV" field? Will Solr
order the results by the time a document was created/modified?
What do I have to do to finally get O(100) complexity?
Ordering by natural document order in the index is sufficient to
achieve O(100), but you'll have to insert code in Solr to stop after
100 docs (another alternative is to stop processing after a given
amount of time). Also, using O() in this case isn't quite accurate:
there are costs that vary based on the number of docs in the index too.
-Mike
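
The early-termination idea from the reply above can be sketched as
follows. This is not Solr or Lucene code, only a toy model under stated
assumptions: each keyword has a posting list of doc ids sorted in index
order, the AND query is their intersection, and the loop stops as soon
as 100 hits are collected. first_k_matches and the example lists are
hypothetical.

```python
def first_k_matches(posting_lists, k=100):
    """Intersect sorted posting lists and stop after k hits.

    Returns (hits, examined), where examined counts the postings read.
    """
    if not posting_lists or k <= 0:
        return [], 0
    iterators = [iter(postings) for postings in posting_lists]
    hits, examined, current = [], 0, []

    # Read the first posting of every list.
    for it in iterators:
        doc = next(it, None)
        if doc is None:
            return hits, examined  # an empty list means no document can match
        examined += 1
        current.append(doc)

    while len(hits) < k:
        target = max(current)
        if all(doc == target for doc in current):
            hits.append(target)  # every list contains this doc id
            if len(hits) == k:
                break            # early termination: remaining postings are never read
            target += 1          # force every list to move past the hit
        for i, it in enumerate(iterators):
            while current[i] < target:
                doc = next(it, None)
                if doc is None:
                    return hits, examined  # one list is exhausted, no more hits
                examined += 1
                current[i] = doc
    return hits, examined


# Two "keywords": one in every 2nd document, one in every 3rd.
evens = range(0, 1_000_000, 2)
thirds = range(0, 1_000_000, 3)
hits, examined = first_k_matches([evens, thirds], k=100)
print(len(hits), examined)
```

The hits come back in doc-id order, not by relevancy or by "REV", which
is the trade-off described above. The number of postings read depends
on how dense the matches are, so the cost is not strictly O(100) here
either.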