I'll try reindexing the timestamp.
The id-creation approach suggested by Erick sounds attractive, but the
Nutch/Solr integration seems rather tight. I don't know where to break in
to insert the id into Solr.
On Mon, Jul 29, 2013 at 4:11 AM, Erick Erickson wrote:
No, SolrJ doesn't provide this automatically. You'd be providing the
counter by inserting it into the document as you created new docs.
You could do this with any kind of document creation you are
using.
Best
Erick
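
A minimal sketch of the counter idea above, in Python rather than SolrJ and independent of any indexing client. The helper name assign_counters and the sample ids are hypothetical; the field name "counter" follows the one used elsewhere in this thread. It assumes the highest counter already in the index has been looked up beforehand (for example with sort=counter desc) and is passed in as last_counter.

```python
from itertools import count


def assign_counters(docs, last_counter):
    """Return copies of docs, each with a strictly increasing 'counter' field.

    last_counter is assumed to be the highest counter value already indexed.
    """
    numbered = []
    for value, doc in zip(count(last_counter + 1), docs):
        # Copy the document so the caller's dicts are left untouched.
        numbered.append(dict(doc, counter=value))
    return numbered


# Hypothetical documents about to be sent to the index.
batch = assign_counters(
    [{"id": "http://example.com/a"}, {"id": "http://example.com/b"}],
    last_counter=41,
)
```

The numbered documents would then be submitted by whatever client does the indexing; with several concurrent indexers the counter would need to be coordinated, which this sketch does not attempt.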
On Mon, Jul 29, 2013 at 2:51 AM, Aditya wrote:
Hi,
The easiest solution would be to have the timestamp indexed. Is there any
issue with re-indexing?
If you want to process records in batches, then you need an ordered list and
a bookmark. You require a field to sort on, and a counter / last id kept as
bookmark. This is mandatory to solve your probl
Basically, I was thinking about running a range query like Shawn suggested
on the tstamp field, but unfortunately it was not indexed. Range queries
only work on indexed fields, right?
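
For reference, a minimal sketch of what such a range query could look like once tstamp is indexed. It assumes tstamp is a Solr date field and that the Solr version accepts mixed inclusive/exclusive range brackets (Solr 4.0 and later); the helper names solr_date and tstamp_range_params are hypothetical.

```python
from datetime import datetime, timezone


def solr_date(dt):
    """Format a timezone-aware datetime the way Solr date fields expect (UTC, trailing Z)."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def tstamp_range_params(start, end, rows=100):
    """Build query parameters for documents with start <= tstamp < end."""
    # '[' makes the lower bound inclusive and '}' makes the upper bound
    # exclusive, so consecutive time windows do not overlap.
    fq = "tstamp:[" + solr_date(start) + " TO " + solr_date(end) + "}"
    return {"q": "*:*", "fq": fq, "sort": "tstamp asc", "rows": rows}


params = tstamp_range_params(
    datetime(2013, 7, 27, tzinfo=timezone.utc),
    datetime(2013, 7, 29, tzinfo=timezone.utc),
)
```

The end of one window can be stored as the bookmark and reused as the start of the next run.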
On Sun, Jul 28, 2013 at 9:49 PM, Joe Zhang wrote:
I've been thinking about the tstamp solution in the past few days, but too
bad, the field is available but not indexed...
I'm not familiar with SolrJ. Again, sounds like SolrJ is providing the
counter value. If yes, that would be equivalent to an autoincrement id. I'm
indexing from Nutch though; don'
Why wouldn't a simple timestamp work for the ordering? Although
I guess "simple timestamp" isn't really simple if the time settings
change.
So how about a simple counter field in your documents? Assuming
you're indexing from SolrJ, your setup is to query q=*:*&sort=counter desc.
Take the counter f
In both cases, for better performance, I'd first load just the IDs and
then load each document during processing.
As for the incremental requirement, it should not be difficult to
write a hash function which maps a non-numerical ID to a value.
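
A minimal sketch of such a hash function, assuming SHA-1 truncated to 64 bits is acceptable (the helper name id_to_number and the sample URLs are hypothetical). Note that the resulting numbers are stable and practically unique, but they do not reflect insertion order, so this gives a numeric, rangeable key rather than an auto-increment id.

```python
import hashlib


def id_to_number(doc_id, bits=64):
    """Map a string id to a non-negative integer below 2**bits.

    bits is assumed to be a multiple of 8 and at most 160 (the SHA-1 digest size).
    """
    digest = hashlib.sha1(doc_id.encode("utf-8")).digest()
    # Keep only the leading bytes so the value fits in the requested width.
    return int.from_bytes(digest[: bits // 8], "big")


first = id_to_number("http://example.com/a")
second = id_to_number("http://example.com/b")
```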
On Jul 27, 2013 7:03 AM, "Joe Zhang
On Sat, Jul 27, 2013 at 4:17 PM, Shawn Heisey wrote:
On 7/27/2013 11:38 AM, Joe Zhang wrote:
> I have a constantly growing index, so not updating the index can't be
> practical...
>
> Going back to the beginning of this thread: when we use the vanilla
> "*:*"+pagination approach, would the ordering of documents remain stable?
> the index is dyn
I have a constantly growing index, so not updating the index can't be
practical...
Going back to the beginning of this thread: when we use the vanilla
"*:*"+pagination approach, would the ordering of documents remain stable?
the index is dynamic: update/insertion only, no deletion.
On 7/27/2013 11:17 AM, Joe Zhang wrote:
> Thanks for sharing, Roman. I'll look into your code.
>
> One more thought on your suggestion, Shawn. In fact, for the id, we need
> more than "unique" and "rangeable"; we also need some sense of atomic
> values. Your approach might run into risk with a tex
Thanks for sharing, Roman. I'll look into your code.
One more thought on your suggestion, Shawn. In fact, for the id, we need
more than "unique" and "rangeable"; we also need some sense of atomic
values. Your approach might run into risk with a text-based id field, say:
the id/key has values 'a',
Dear list,
I've written a special processor for exactly this kind of operation:
https://github.com/romanchyla/montysolr/tree/master/contrib/adsabs/src/java/org/apache/solr/handler/batch
This is how we use it:
http://labs.adsabs.harvard.edu/trac/ads-invenio/wiki/SearchEngineBatch
It is capable of
Thanks.
On Fri, Jul 26, 2013 at 11:34 PM, Shawn Heisey wrote:
On 7/27/2013 12:30 AM, Joe Zhang wrote:
> ==> so a "url" field would work fine?
As long as it's guaranteed unique on every document (especially if it is
your uniqueKey) and goes into the index as a single token, that should
work just fine for the range queries I've described.
Thanks,
Shawn
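
A minimal sketch of paging through the index with range queries on a unique, single-token field such as url, as discussed above. It is not taken from Shawn's own description; the helper names range_page_params and traverse are hypothetical, and fetch stands for whatever function sends the parameters to Solr and returns the list of documents. The sketch assumes the query parser accepts double-quoted range endpoints and mixed range brackets (Solr 4.0 and later).

```python
def range_page_params(field, last_value=None, rows=500):
    """Build parameters for the next page of documents sorted by a unique field."""
    if last_value is None:
        # First page: no lower bound yet.
        lower, bracket = "*", "["
    else:
        # Quote the bookmark and escape backslashes and quotes inside it;
        # '{' excludes the bookmark itself from the next page.
        escaped = last_value.replace("\\", "\\\\").replace('"', '\\"')
        lower, bracket = '"' + escaped + '"', "{"
    fq = field + ":" + bracket + lower + " TO *]"
    return {"q": "*:*", "fq": fq, "sort": field + " asc", "rows": rows}


def traverse(fetch, field, rows=500):
    """Yield every document, using the last value seen as the bookmark."""
    last = None
    while True:
        docs = fetch(range_page_params(field, last, rows))
        if not docs:
            return
        for doc in docs:
            yield doc
        last = docs[-1][field]
```

Because each page is defined by the field value rather than by an offset, documents added between requests do not shift the pages that were already processed. Documents whose key sorts before the current bookmark are only picked up on a later full pass.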
On Fri, Jul 26, 2013 at 11:18 PM, Shawn Heisey wrote:
On 7/26/2013 11:50 PM, Joe Zhang wrote:
> ==> Essentially we are doing pagination here, right? If performance is not
> the concern, given that the index is dynamic, does the order of
> entries remain stable over time?
Yes, it's pagination. Just like the other method that I've described in
detail
On a related note, inspired by what you said, Shawn, an auto-increment id
seems perfect here. Yet I found there is no such support in Solr. The UUID
only guarantees uniqueness.
On Fri, Jul 26, 2013 at 10:50 PM, Joe Zhang wrote:
Thanks for your kind reply, Shawn.
On Fri, Jul 26, 2013 at 10:27 PM, Shawn Heisey wrote:
On 7/26/2013 11:02 PM, Joe Zhang wrote:
> I have an ever-growing solr repository, and I need to process every single
> document to extract statistics. What would be a reasonable process that
> satisfies the following properties:
>
> - Exhaustive: I have to traverse every single document
> - Increme
Dear list:
I have an ever-growing Solr repository, and I need to process every single
document to extract statistics. What would be a reasonable process that
satisfies the following properties:
- Exhaustive: I have to traverse every single document
- Incremental: in other words, it has to allow me