a small problem of distributed search

Li Li Mon, 16 Aug 2010 20:23:08 -0700

  The current implementation of distributed search uses the unique key in
the STAGE_EXECUTE_QUERY stage.


  public int distributedProcess(ResponseBuilder rb) throws IOException {
    ...
    if (rb.stage == ResponseBuilder.STAGE_EXECUTE_QUERY) {
      createMainQuery(rb);
      return ResponseBuilder.STAGE_GET_FIELDS;
    }
    ...
  }

  In createMainQuery:
  sreq.params.set(CommonParams.FL,
      rb.req.getSchema().getUniqueKeyField().getName() + ",score");
  which sets fl=url,score.
  Here url is my unique key; it is indexed without analysis and stored.
  The url value is actually loaded in BinaryResponseWriter.writeDocList,
which calls
  Document doc = searcher.doc(id, returnFields); // url is in returnFields

  So the urls of all top N docs are read from the .fdt file.
  But the unique key is usually short and can be loaded into memory, so we
can use StringIndex to cache it.
  In my application, we need the top 100 docs for collapsing and
reranking. Caching the key saves more than 50 ms per query (we use SCSI
disks), and the slowest queries become less frequent.
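
  A minimal sketch of the caching idea, assuming one unique key per
document. It does not use the Lucene API. UniqueKeyCache and its methods
are hypothetical names, modeled loosely on the lookup/order arrays of
Lucene's StringIndex. The point is only that resolving the top N doc ids
to keys becomes an array lookup instead of a stored-field read.

```java
import java.util.Arrays;

/** Hypothetical in-memory unique-key cache (not the Lucene class). */
class UniqueKeyCache {
    // Sorted distinct keys; slot 0 is reserved for "document has no key".
    private final String[] lookup;
    // order[docId] is the position of that document's key in lookup.
    private final int[] order;

    /** keysByDocId[i] is the unique key of doc i, or null if it has none. */
    UniqueKeyCache(String[] keysByDocId) {
        String[] sorted = Arrays.stream(keysByDocId)
                .filter(k -> k != null)
                .distinct()
                .sorted()
                .toArray(String[]::new);
        lookup = new String[sorted.length + 1];
        System.arraycopy(sorted, 0, lookup, 1, sorted.length);

        order = new int[keysByDocId.length];
        for (int doc = 0; doc < keysByDocId.length; doc++) {
            String key = keysByDocId[doc];
            // binarySearch over [1, length) returns an index into the full array.
            order[doc] = (key == null)
                    ? 0
                    : Arrays.binarySearch(lookup, 1, lookup.length, key);
        }
    }

    /** Resolve one doc id to its key without touching stored fields. */
    String keyFor(int docId) {
        return lookup[order[docId]];
    }

    /** Resolve the top N doc ids of a query in one pass. */
    String[] keysFor(int[] topDocIds) {
        String[] keys = new String[topDocIds.length];
        for (int i = 0; i < topDocIds.length; i++) {
            keys[i] = keyFor(topDocIds[i]);
        }
        return keys;
    }
}
```

  The cache would be built once per searcher and reused for every query.
The memory cost is roughly one int per document plus the key strings.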
