Re: Return Lucene DocId in Solr Results

Erick Erickson Thu, 02 Dec 2010 12:07:11 -0800

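You have to call termDocs.next() after termDocs.seek(term). After seek() the
enumeration isn't positioned on a document yet; next() moves it to the first
match, or returns false if there is none. Something like:

termDocs.seek(term);
if (termDocs.next()) {
   // there was a term/doc match, so termDocs.doc() is now valid
   docIds[i] = termDocs.doc();
}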
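A minimal sketch of that seek/next/doc protocol, using only the JDK. PostingsCursor is a hypothetical in-memory stand-in, not Lucene's TermDocs: it maps a key to sorted doc IDs and refuses doc() until next() has returned true. docIdForKey follows the pattern above and returns -1, an arbitrary sentinel chosen here, when the key is absent.

```java
import java.util.Map;

/**
 * Hypothetical in-memory stand-in for a TermDocs-style cursor (NOT Lucene's
 * class). It models one rule: after seek(), the cursor sits before the first
 * matching doc, so doc() is only valid once next() has returned true.
 */
class PostingsCursor {
    private final Map<String, int[]> postings; // key -> sorted doc IDs
    private int[] current = null;              // postings of the last seek()
    private int pos = -1;                      // -1 means "not on a doc yet"

    PostingsCursor(Map<String, int[]> postings) {
        this.postings = postings;
    }

    void seek(String key) {
        current = postings.getOrDefault(key, new int[0]);
        pos = -1; // reset to just before the first doc for this key
    }

    boolean next() {
        if (current == null || pos + 1 >= current.length) {
            return false; // no (further) matching doc
        }
        pos++;
        return true;
    }

    int doc() {
        if (current == null || pos < 0) {
            // with Lucene the result is invalid here; in this thread it surfaced as an NPE
            throw new IllegalStateException("doc() called before next() returned true");
        }
        return current[pos];
    }
}

class SeekThenNext {
    /** Returns the doc ID for a unique key, or -1 (arbitrary sentinel) if absent. */
    static int docIdForKey(PostingsCursor cursor, String key) {
        cursor.seek(key);
        if (cursor.next()) { // a term/doc matched, so doc() is valid now
            return cursor.doc();
        }
        return -1;
    }
}
```

The explicit exception in doc() is a design choice for the sketch: it turns the "forgot to call next()" mistake into an immediate, descriptive failure.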


On Thu, Dec 2, 2010 at 10:22 AM, Lohrenz, Steven
<steven.lohr...@hmhpub.com> wrote:

> I must be missing something, as I'm getting an NPE on the line
> docIds[i] = termDocs.doc();
> Here's what I came up with:
>
> private int[] getDocIdsFromPrimaryKey(SolrQueryRequest req,
>         List<Favorites> favsBeans) throws ParseException {
>     // open the core & get data directory
>     String indexDir = req.getCore().getIndexDir();
>
>     FSDirectory indexDirectory = null;
>     try {
>         indexDirectory = FSDirectory.open(new File(indexDir));
>     } catch (IOException e) {
>         throw new ParseException("IOException, cannot open the index at: "
>                 + indexDir + " " + e.getMessage());
>     }
>
>     //String pkQueryString = "resourceId:" + favBean.getResourceId();
>     //Query pkQuery = new QueryParser(Version.LUCENE_CURRENT,
>     //        "resourceId", new StandardAnalyzer()).parse(pkQueryString);
>
>     IndexSearcher searcher = null;
>     TopScoreDocCollector collector = null;
>     IndexReader indexReader = null;
>     TermDocs termDocs = null;
>
>     try {
>         searcher = new IndexSearcher(indexDirectory, true);
>         indexReader = new FilterIndexReader(searcher.getIndexReader());
>         termDocs = indexReader.termDocs();
>     } catch (IOException e) {
>         throw new ParseException("IOException, cannot open the index at: "
>                 + indexDir + " " + e.getMessage());
>     }
>
>     int[] docIds = new int[favsBeans.size()];
>     int i = 0;
>     for (Favorites favBean : favsBeans) {
>         Term term = new Term("resourceId", favBean.getResourceId());
>         try {
>             termDocs.seek(term);
>             docIds[i] = termDocs.doc();
>         } catch (IOException e) {
>             throw new ParseException("IOException, cannot seek to the primary key "
>                     + favBean.getResourceId() + " in : " + indexDir + " "
>                     + e.getMessage());
>         }
>         //ScoreDoc[] hits = collector.topDocs().scoreDocs;
>         //if(hits != null && hits[0] != null) {
>
>         i++;
>         //}
>     }
>
>     Arrays.sort(docIds);
>     return docIds;
> }
>
> Thanks,
> Steve
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: 02 December 2010 14:20
> To: solr-user@lucene.apache.org
> Subject: Re: Return Lucene DocId in Solr Results
>
> Ahhh, you're already down in Lucene. That makes things easier...
>
> See TermDocs, particularly seek(Term). That'll access the indexed unique key
> directly rather than having to form a bunch of queries.
>
> Best
> Erick
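To illustrate why seek(Term) beats a query per key, here is a rough in-memory model (hypothetical KeyIndex class, plain JDK). The unique-key field behaves like a sorted dictionary from key to doc IDs (Lucene also keeps its term dictionary sorted; a TreeMap stands in for it here), so finding a document is a direct lookup with no query parsing, analysis or scoring. Skipping analysis matters for correctness too: a QueryParser with StandardAnalyzer can alter the key (it lowercases terms, for example), while a Term lookup uses the key exactly as indexed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/**
 * Hypothetical model of an indexed unique-key field: a sorted dictionary from
 * each key to the doc IDs containing it. Only the access pattern is meant to
 * carry over to Lucene; the real structures are far more compact.
 */
class KeyIndex {
    private final TreeMap<String, List<Integer>> terms = new TreeMap<>();

    /** Builds the index from keys listed in doc ID order (doc 0, doc 1, ...). */
    static KeyIndex fromKeys(String... keysInDocOrder) {
        KeyIndex index = new KeyIndex();
        for (int docId = 0; docId < keysInDocOrder.length; docId++) {
            index.terms.computeIfAbsent(keysInDocOrder[docId], k -> new ArrayList<>())
                       .add(docId);
        }
        return index;
    }

    /**
     * Direct lookup of the exact key: no query parsing, no analyzer, no scoring.
     * Returns the first doc ID for the key, or -1 if the key is not indexed.
     */
    int lookup(String key) {
        List<Integer> docs = terms.get(key);
        return (docs == null || docs.isEmpty()) ? -1 : docs.get(0);
    }
}
```

For example, KeyIndex.fromKeys("RES-A", "RES-B", "RES-C").lookup("RES-B") gives 1, while lookup("res-b") gives -1 because the key is matched exactly, not analyzed.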
>
>
> On Thu, Dec 2, 2010 at 8:59 AM, Lohrenz, Steven
> <steven.lohr...@hmhpub.com> wrote:
>
> > I would be interested in hearing about some ways to improve the algorithm.
> > I have done a very straightforward Lucene query within a loop to get the
> > docIds.
> >
> > Here's what I did to get it working, where favsBeans are the objects
> > returned from a query of the second core, but there is probably a better
> > way to do it:
> >
> > private int[] getDocIdsFromPrimaryKey(SolrQueryRequest req,
> >         List<Favorites> favsBeans) throws ParseException {
> >     // open the core & get data directory
> >     String indexDir = req.getCore().getIndexDir();
> >     FSDirectory index = null;
> >     try {
> >         index = FSDirectory.open(new File(indexDir));
> >     } catch (IOException e) {
> >         throw new ParseException("IOException, cannot open the index at: "
> >                 + indexDir + " " + e.getMessage());
> >     }
> >
> >     int[] docIds = new int[favsBeans.size()];
> >     int i = 0;
> >     for (Favorites favBean : favsBeans) {
> >         String pkQueryString = "resourceId:" + favBean.getResourceId();
> >         Query pkQuery = new QueryParser(Version.LUCENE_CURRENT,
> >                 "resourceId", new StandardAnalyzer()).parse(pkQueryString);
> >
> >         IndexSearcher searcher = null;
> >         TopScoreDocCollector collector = null;
> >         try {
> >             searcher = new IndexSearcher(index, true);
> >             collector = TopScoreDocCollector.create(1, true);
> >             searcher.search(pkQuery, collector);
> >         } catch (IOException e) {
> >             throw new ParseException("IOException, cannot search the index at: "
> >                     + indexDir + " " + e.getMessage());
> >         }
> >
> >         ScoreDoc[] hits = collector.topDocs().scoreDocs;
> >         if (hits != null && hits[0] != null) {
> >             docIds[i] = hits[0].doc;
> >             i++;
> >         }
> >     }
> >
> >     Arrays.sort(docIds);
> >     return docIds;
> > }
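One detail worth guarding in the query-per-key loop: when a key matches nothing, scoreDocs comes back as an empty array, so hits[0] throws an ArrayIndexOutOfBoundsException rather than yielding null. Also, any slots of docIds that are never filled stay 0, which sorts to the front and is itself a valid Lucene doc ID. A small sketch (hypothetical helper topHitsSorted, with each int[] standing in for one query's ranked hits) that keeps only the IDs actually found:

```java
import java.util.Arrays;
import java.util.List;

class CollectDocIds {
    /**
     * Hypothetical helper: hitsPerKey.get(k) holds the ranked doc IDs one query
     * returned for the k-th key (possibly empty). Keeps each key's top hit,
     * skips keys with no hit, and returns the found IDs sorted ascending.
     */
    static int[] topHitsSorted(List<int[]> hitsPerKey) {
        int[] found = new int[hitsPerKey.size()];
        int n = 0;
        for (int[] hits : hitsPerKey) {
            // test the length: indexing an empty array throws, it never yields null
            if (hits != null && hits.length > 0) {
                found[n++] = hits[0];
            }
        }
        // trim unused slots before sorting; a leftover 0 would look like doc 0
        int[] result = Arrays.copyOf(found, n);
        Arrays.sort(result);
        return result;
    }
}
```

With hits {5, 9}, {} and {2} for three keys, the result is {2, 5}: the unmatched key contributes nothing instead of a spurious 0.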
> >
> > -----Original Message-----
> > From: Erick Erickson [mailto:erickerick...@gmail.com]
> > Sent: 02 December 2010 13:46
> > To: solr-user@lucene.apache.org
> > Subject: Re: Return Lucene DocId in Solr Results
> >
> > Sounds good, especially because your old scenario was fragile. The doc IDs
> > in your first core could change as a result of a single doc deletion
> > followed by an optimize, so the doc IDs stored in the second core would
> > then be wrong...
> >
> > Your user-defined unique key is definitely a better way to go. There are
> > some tricks you could try if there are performance issues...
> >
> > Best
> > Erick
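A toy simulation of that fragility, assuming a simplified model in which a doc ID is just a document's position and an optimize drops deleted documents and renumbers the survivors in order (DocIdRenumbering is a hypothetical name). A doc ID stored before the optimize no longer identifies the same document afterwards, while the unique key still does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical, simplified model: a doc ID is a document's position in the
 * index. Optimizing drops deleted documents and renumbers the survivors
 * densely, keeping their relative order.
 */
class DocIdRenumbering {
    /** docs.get(id) is the unique key of the document with that doc ID. */
    static List<String> optimize(List<String> docs, Set<Integer> deletedIds) {
        List<String> compacted = new ArrayList<>();
        for (int id = 0; id < docs.size(); id++) {
            if (!deletedIds.contains(id)) {
                compacted.add(docs.get(id)); // new doc ID = compacted.size() - 1
            }
        }
        return compacted;
    }
}
```

For example, with keys res-a, res-b, res-c at IDs 0, 1, 2, deleting res-a and optimizing moves res-c to ID 1, so a stored ID of 2 now points past the end of the index; looking up "res-c" by key still finds it.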
> >
> > On Thu, Dec 2, 2010 at 7:47 AM, Lohrenz, Steven
> > <steven.lohr...@hmhpub.com> wrote:
> >
> > > I know the doc IDs from one core have nothing to do with the other. I was
> > > going to use the docId returned from the first core in the Solr results
> > > and store it in the second core; that way the second core knows about the
> > > doc IDs from the first core. So when you query the second core from the
> > > Filter in the first core, you get back a set of data that includes the
> > > docId from the first core that each document relates to.
> > >
> > > I have backed off from this approach and now have a user-defined primary
> > > key in the firstCore, which is stored as the reference in the secondCore.
> > > When the filter performs the search, it queries the firstCore for each
> > > primary key and gets the Lucene docId from the returned doc.
> > >
> > > Thanks,
> > > Steve
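For the primary-key approach, the Filter ultimately needs one bit per matching doc ID in the reader it runs against. A sketch of that last step, assuming a hypothetical keyToDocId map already resolved against that same reader (for example via TermDocs as discussed in this thread) and the resourceIds fetched from the second core; a java.util.BitSet stands in for the filter's doc ID set. In Lucene 2.9/3.x, org.apache.lucene.util.DocIdBitSet can wrap such a BitSet as a DocIdSet.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Map;

class FavoritesFilterBits {
    /**
     * keyToDocId:   hypothetical map from the first core's resourceId to its doc
     *               ID, resolved against the same reader the filter will run on.
     * favoriteKeys: resourceIds fetched from the second core.
     * maxDoc:       number of doc ID slots in that reader.
     */
    static BitSet build(Map<String, Integer> keyToDocId,
                        List<String> favoriteKeys, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (String key : favoriteKeys) {
            Integer docId = keyToDocId.get(key);
            // keys missing from the first core (e.g. since deleted) are skipped
            if (docId != null && docId >= 0 && docId < maxDoc) {
                bits.set(docId); // this document passes the filter
            }
        }
        return bits;
    }
}
```

Resolving keys against the same reader matters because, as noted elsewhere in the thread, doc IDs are only meaningful for the index snapshot they came from.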
> > >
> > > -----Original Message-----
> > > From: Erick Erickson [mailto:erickerick...@gmail.com]
> > > Sent: 02 December 2010 02:19
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Return Lucene DocId in Solr Results
> > >
> > > On the face of it, this doesn't make sense, so perhaps you can explain a
> > > bit. The doc IDs from one Solr instance have no relation to the doc IDs
> > > from another Solr instance. So anything that uses doc IDs from one Solr
> > > instance to create a filter on another instance doesn't seem to be
> > > something you'd want to do...
> > >
> > > Which may just mean I don't understand what you're trying to do. Can you
> > > back up a bit and describe the higher-level problem? This seems like it
> > > may be an XY problem, see:
> > > http://people.apache.org/~hossman/#xyproblem
> > >
> > > Best
> > > Erick
> > >
> > > On Tue, Nov 30, 2010 at 6:57 AM, Lohrenz, Steven
> > > <steven.lohr...@hmhpub.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I was wondering how I would go about getting the Lucene docid included
> > > > in the results from a Solr query?
> > > >
> > > > I've built a QueryParser to query another Solr instance and join the
> > > > results of the two instances through the use of a Filter. The Filter
> > > > needs the Lucene docid to work. This is the only bit I'm missing right
> > > > now.
> > > >
> > > > Thanks,
> > > > Steve
> > > >
> > > >
> > >
> >
>
