RE: Return Lucene DocId in Solr Results

Lohrenz, Steven Thu, 02 Dec 2010 07:23:41 -0800

I must be missing something, as I'm getting an NPE on the line:
docIds[i] = termDocs.doc();
Here's what I came up with:


    private int[] getDocIdsFromPrimaryKey(SolrQueryRequest req,
            List<Favorites> favsBeans) throws ParseException {
        // open the core & get data directory
        String indexDir = req.getCore().getIndexDir();

        FSDirectory indexDirectory = null;
        try {
            indexDirectory = FSDirectory.open(new File(indexDir));
        } catch (IOException e) {
            throw new ParseException("IOException, cannot open the index at: "
                    + indexDir + " " + e.getMessage());
        }

        //String pkQueryString = "resourceId:" + favBean.getResourceId();
        //Query pkQuery = new QueryParser(Version.LUCENE_CURRENT, "resourceId",
        //        new StandardAnalyzer()).parse(pkQueryString);

        IndexSearcher searcher = null;
        TopScoreDocCollector collector = null;
        IndexReader indexReader = null;
        TermDocs termDocs = null;

        try {
            searcher = new IndexSearcher(indexDirectory, true);
            indexReader = new FilterIndexReader(searcher.getIndexReader());
            termDocs = indexReader.termDocs();
        } catch (IOException e) {
            throw new ParseException("IOException, cannot open the index at: "
                    + indexDir + " " + e.getMessage());
        }

        int[] docIds = new int[favsBeans.size()];
        int i = 0;
        for(Favorites favBean: favsBeans) {
            Term term = new Term("resourceId", favBean.getResourceId());
            try {
                termDocs.seek(term);
                docIds[i] = termDocs.doc();
            } catch (IOException e) {
                throw new ParseException("IOException, cannot seek to the primary key "
                        + favBean.getResourceId() + " in : " + indexDir + " "
                        + e.getMessage());
            }
            //ScoreDoc[] hits = collector.topDocs().scoreDocs;
            //if(hits != null && hits[0] != null) {

            i++;
            //}
        }

        Arrays.sort(docIds);
        return docIds;
    }

Thanks,
Steve
-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: 02 December 2010 14:20
To: solr-user@lucene.apache.org
Subject: Re: Return Lucene DocId in Solr Results

Ahhh, you're already down in Lucene. That makes things easier...

See TermDocs. Particularly seek(Term). That'll directly access the indexed
unique key rather than having to form a bunch of queries.

Best
Erick
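
A possible explanation for the NPE reported at the top of this message, going by the TermDocs javadoc for Lucene 2.9/3.0: seek(Term) only positions the enumeration, and doc() is not valid until next() has been called and has returned true. Calling doc() straight after seek(), as in the loop above, reads from an enumeration that is not on a document yet. The sketch below shows the seek, next, doc order against a stand-in cursor so that it runs without the Lucene jars. DocCursor, MapCursor and docIdsFor are hypothetical names made up for the illustration, not Lucene classes, and the stand-in's NullPointerException only imitates the symptom.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Sketch: seek, then next(), and only then doc(). Stand-in types, not Lucene classes. */
final class SeekThenNext {

    /** Hypothetical cursor with the same seek/next/doc shape as a term-docs enumeration. */
    interface DocCursor {
        void seek(String field, String value); // positions BEFORE the first match
        boolean next();                        // advances; false when nothing is left
        int doc();                             // only valid after next() returned true
    }

    /** In-memory stand-in (assumption): at most one doc ID per "field:value" key. */
    static final class MapCursor implements DocCursor {
        private final Map<String, Integer> postings;
        private Integer pending; // match waiting to be consumed by next()
        private Integer current; // stays null until next() succeeds

        MapCursor(Map<String, Integer> postings) {
            this.postings = postings;
        }

        public void seek(String field, String value) {
            pending = postings.get(field + ":" + value);
            current = null;
        }

        public boolean next() {
            current = pending;
            pending = null;
            return current != null;
        }

        public int doc() {
            return current; // unboxing null throws NullPointerException
        }
    }

    /** Looks up each unique key and returns the sorted doc IDs of the keys that exist. */
    static int[] docIdsFor(DocCursor cursor, String field, List<String> keys) {
        List<Integer> found = new ArrayList<Integer>();
        for (String key : keys) {
            cursor.seek(field, key);
            if (cursor.next()) { // skip keys that are not in the index
                found.add(cursor.doc());
            }
        }
        int[] ids = new int[found.size()];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = found.get(i);
        }
        Arrays.sort(ids);
        return ids;
    }
}
```

With the real API the same shape would be termDocs.seek(term) followed by if (termDocs.next()) { ... termDocs.doc() ... }. Keys that are missing from the index are then skipped instead of being recorded as doc 0.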


On Thu, Dec 2, 2010 at 8:59 AM, Lohrenz, Steven
<steven.lohr...@hmhpub.com> wrote:

> I would be interested in hearing about some ways to improve the algorithm.
> I have done a very straightforward Lucene query within a loop to get the
> docIds.
>
> Here's what I did to get it working where favsBean are objects returned
> from a query of the second core, but there is probably a better way to do
> it:
>
> private int[] getDocIdsFromPrimaryKey(SolrQueryRequest req, List<Favorites>
> favsBeans) throws ParseException {
>        // open the core & get data directory
>        String indexDir = req.getCore().getIndexDir();
>        FSDirectory index = null;
>        try {
>            index = FSDirectory.open(new File(indexDir));
>        } catch (IOException e) {
>            throw new ParseException("IOException, cannot open the index at: "
>                    + indexDir + " " + e.getMessage());
>        }
>
>        int[] docIds = new int[favsBeans.size()];
>        int i = 0;
>        for(Favorites favBean: favsBeans) {
>            String pkQueryString = "resourceId:" + favBean.getResourceId();
>            Query pkQuery = new QueryParser(Version.LUCENE_CURRENT,
> "resourceId", new StandardAnalyzer()).parse(pkQueryString);
>
>            IndexSearcher searcher = null;
>            TopScoreDocCollector collector = null;
>            try {
>                searcher = new IndexSearcher(index, true);
>                collector = TopScoreDocCollector.create(1, true);
>                searcher.search(pkQuery, collector);
>            } catch (IOException e) {
>                throw new ParseException("IOException, cannot search the index at: "
>                        + indexDir + " " + e.getMessage());
>            }
>
>            ScoreDoc[] hits = collector.topDocs().scoreDocs;
>            if(hits != null && hits[0] != null) {
>                docIds[i] = hits[0].doc;
>                i++;
>            }
>        }
>
>        Arrays.sort(docIds);
>        return docIds;
>     }
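
Two things in the quoted loop may be worth a second look. First, docIds is sized to favsBeans.size() but i only advances on a hit, so every key that is not found leaves a 0 at the end of the array; after Arrays.sort those zeros move to the front and are indistinguishable from doc ID 0, which is a valid ID. Second, as far as I know scoreDocs is an empty array rather than null when nothing matches, so hits[0] would throw instead of being null, and checking hits.length > 0 is the safer test. A minimal sketch of dropping the misses before sorting, in plain Java; FoundDocIds, NOT_FOUND and compactAndSort are hypothetical names, not part of Solr or Lucene:

```java
import java.util.Arrays;

/** Sketch: keep only real hits before sorting, so misses do not show up as doc 0. */
final class FoundDocIds {

    /** Hypothetical marker for "this key was not found"; real doc IDs are never negative. */
    static final int NOT_FOUND = -1;

    /** Returns the hits from lookups in ascending order, with NOT_FOUND entries removed. */
    static int[] compactAndSort(int[] lookups) {
        int[] hits = new int[lookups.length];
        int count = 0;
        for (int id : lookups) {
            if (id != NOT_FOUND) {
                hits[count++] = id;
            }
        }
        // Trim the unused tail instead of leaving default zeros in the result.
        int[] trimmed = Arrays.copyOf(hits, count);
        Arrays.sort(trimmed);
        return trimmed;
    }
}
```

The caller would store NOT_FOUND for a key with no match, then pass the whole array through compactAndSort before handing it to the filter.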
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: 02 December 2010 13:46
> To: solr-user@lucene.apache.org
> Subject: Re: Return Lucene DocId in Solr Results
>
> Sounds good, especially because your old scenario was fragile. The doc IDs in
> your first core could change as a result of a single doc deletion and
> optimize, so the doc IDs stored in the second core would then be wrong...
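
To make the fragility concrete, here is a toy model in plain Java. It is not Lucene code: it assumes a doc ID is simply a document's position in a list, and that optimizing after a delete squeezes out the deleted slot so that later documents shift down. DocIdRenumbering, docIdOf and deleteAndOptimize are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model: a doc ID is a position in a list. An illustration only, not Lucene code. */
final class DocIdRenumbering {

    /** Returns the current position ("doc ID") of a unique key, or -1 if it is absent. */
    static int docIdOf(List<String> segment, String uniqueKey) {
        return segment.indexOf(uniqueKey);
    }

    /** Models delete + optimize: the deleted slot disappears and later docs shift down. */
    static List<String> deleteAndOptimize(List<String> segment, String uniqueKey) {
        List<String> merged = new ArrayList<String>(segment);
        merged.remove(uniqueKey);
        return merged;
    }

    public static void main(String[] args) {
        List<String> before = new ArrayList<String>();
        before.add("res-1");
        before.add("res-2");
        before.add("res-3");

        // The ID that would have been copied into the second core.
        int stored = docIdOf(before, "res-3");

        List<String> after = deleteAndOptimize(before, "res-1");
        System.out.println("stored doc ID for res-3: " + stored);
        System.out.println("doc ID for res-3 after optimize: " + docIdOf(after, "res-3"));
    }
}
```

In this model the stored ID no longer points at res-3 after the optimize, while a lookup by the unique key still finds it.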
>
> Your user-defined unique key is definitely a better way to go. There are some
> tricks you could try if there are performance issues....
>
> Best
> Erick
>
> On Thu, Dec 2, 2010 at 7:47 AM, Lohrenz, Steven
> <steven.lohr...@hmhpub.com> wrote:
>
> > I know the doc ids from one core have nothing to do with the other. I was
> > going to take the docId returned from the first core in the solr results and
> > store it in the second core, so that the second core knows about the doc ids
> > from the first core. That way, when you query the second core from the Filter
> > in the first core, you get back a set of data that includes the docId from
> > the first core that each document relates to.
> >
> > I have backed off from this approach and now have a user-defined primary key
> > in the firstCore, which is stored as the reference in the secondCore. When
> > the filter performs the search, it queries the firstCore for each primary
> > key and gets the lucene docId from the returned doc.
> >
> > Thanks,
> > Steve
> >
> > -----Original Message-----
> > From: Erick Erickson [mailto:erickerick...@gmail.com]
> > Sent: 02 December 2010 02:19
> > To: solr-user@lucene.apache.org
> > Subject: Re: Return Lucene DocId in Solr Results
> >
> > On the face of it, this doesn't make sense, so perhaps you can explain a
> > bit. The doc IDs from one Solr instance have no relation to the doc IDs from
> > another Solr instance. So anything that uses doc IDs from one Solr instance
> > to create a filter on another instance doesn't seem to be something you'd
> > want to do...
> >
> > Which may just mean I don't understand what you're trying to do. Can you back
> > up a bit and describe the higher-level problem? This seems like it may be an
> > XY problem, see:
> > http://people.apache.org/~hossman/#xyproblem
> >
> > Best
> > Erick
> >
> > On Tue, Nov 30, 2010 at 6:57 AM, Lohrenz, Steven
> > <steven.lohr...@hmhpub.com> wrote:
> >
> > > Hi,
> > >
> > > I was wondering how I would go about getting the lucene docid included in
> > > the results from a solr query?
> > >
> > > I've built a QueryParser to query another solr instance and join the
> > > results of the two instances through the use of a Filter. The Filter needs
> > > the lucene docid to work. This is the only bit I'm missing right now.
> > >
> > > Thanks,
> > > Steve
> > >
> > >
> >
>
