Is there a simple way to get solr to maintain shards as rolling partitions by
date, e.g., the last day's documents in one shard, the week before yesterday
in the next shard, the month before that in the next shard, and so on? I
really don't need querying to be fast on the entire index, but it is
cr
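One way to approximate this by hand (a sketch only; the core names, ports, and windows below are made up) is to run each time window as its own core or instance, have an indexing job roll documents over between them, and point queries at just the partitions of interest with the shards parameter:

  curl 'http://localhost:8983/solr/select?q=foo&shards=localhost:8983/solr/day,localhost:8984/solr/week'

Solr 3.x has no built-in time-based partitioning, so the rollover itself would have to live outside Solr.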
I have recently started getting the error pasted below with solr-3.6 on
/select queries. I don't know of anything that changed in the config to
start causing this error. I am also running a second independent solr server
on the same machine, which continues to run fine and has the same
configuration.
For the first install, I copied over all files in the directory "example"
into, let's call it, "install1". I did the same for "install2". The two
installs run on different ports, use different jar files, are not really
related to each other in any way as far as I can see. In particular, they
are no
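For reference, a common way to run two independent copies of the example is to give the second one its own Jetty port (directory names here are just the ones above):

  cd install1/example && java -jar start.jar
  cd install2/example && java -Djetty.port=8984 -jar start.jar

Each instance then has its own solr home, data directory, and logs under its own example/ tree.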
Erick, thanks for pointing that out. I was going to say in my original post
that it is almost like some limit on max documents got violated all of a
sudden, but the rest of the symptoms didn't seem to quite match. But now
that I think about it, the problem probably happened at 2B (corresponding
exa
Yes, wonky indeed.
numDocs : -2006905329
maxDoc : -1993357870
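Assuming those counters wrapped a signed 32-bit int (Lucene 3.x can hold at most 2^31 - 1 = 2,147,483,647 documents per index), the unsigned readings would be roughly:

  2^32 - 2,006,905,329 = 2,288,061,967   (numDocs, ~2.29 billion)
  2^32 - 1,993,357,870 = 2,301,609,426   (maxDoc,  ~2.30 billion)

i.e. both just past the limit, which matches the 2B estimate above.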
And yes, I meant that the holes are in the database auto-increment ID space,
nothing to do with lucene IDs.
I will set up sharding. But is there any way to retrieve most of the current
index? Currently, all select queries even in
Thanks. Do you know if the tons of index files with names like '_zxt.tis' in
the index/data/ directory have the lucene IDs embedded in the binaries? The
files look good to me and are partly readable even if in binary. I am
wondering if I could just set up a new solr instance and move these index
files.
Erick, thanks for the advice, but let me make sure you haven't misunderstood
what I was asking.
I am not trying to split the huge existing index in install1 into shards. I
am also not trying to make the huge install1 index as one shard of a sharded
solr setup. I plan to use a sharded setup only fo
Erick, much thanks for detailing these options. I am currently trying the
second one as that seems a little easier and quicker to me.
I successfully deleted documents with IDs after the problem time, which I know to an accuracy of a couple of hours. Now, the stats are:
numDocs : 2132454075
maxDo
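For reference, that kind of delete can be sent straight to the update handler; a minimal sketch, with the field name 'id' and the cutoff value as stand-ins:

  curl 'http://localhost:8983/solr/update?commit=true' \
    -H 'Content-Type: text/xml' \
    --data-binary '<delete><query>id:[2100000000 TO *]</query></delete>'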
So, I tried 'optimize', but it failed because of lack of space on the first machine. I then moved the whole thing to a different machine where the index was pretty much the only thing on disk and was using about 37% of it, but optimize still failed with a "No space left on device" IOException. Also, the
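A full optimize can transiently need roughly 2x the index size in free space (more in some setups), so one thing worth trying is a partial optimize down to a target segment count, which peaks lower; a sketch with an arbitrary segment count:

  curl 'http://localhost:8983/solr/update' \
    -H 'Content-Type: text/xml' \
    --data-binary '<optimize maxSegments="16"/>'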
I get a JSON parse error (pasted below) when I send an update to a replica
node. I downloaded solr 4 alpha and followed the instructions at
http://wiki.apache.org/solr/SolrCloud/ and setup numShards=1 with 3 total
servers managed by a zookeeper ensemble, the primary at 8983 and the other
two at 757
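For comparison, a well-formed JSON update to one of the nodes looks roughly like this (the document fields and the 7574 port are placeholders):

  curl 'http://localhost:7574/solr/update/json?commit=true' \
    -H 'Content-Type: application/json' \
    -d '[{"id":"doc1","name":"test doc"}]'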
I am trying to wrap my head around replication in SolrCloud. I tried the
setup at http://wiki.apache.org/solr/SolrCloud/. I mainly need replication
for high query throughput. The setup at the URL above appears to maintain
just one copy of the index at the primary node (instead of a replicated
index
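The wiki's single-shard example boils down to roughly the following (a sketch; the ports, paths, and config name are taken from that example and may not match this setup exactly):

  # first node: runs embedded ZooKeeper, uploads the config, one logical shard
  java -DzkRun -Dbootstrap_confdir=./solr/conf -Dcollection.configName=myconf -DnumShards=1 -jar start.jar

  # each additional node: point it at ZooKeeper and it joins shard1 as another replica
  java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar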
My understanding is that the DIH in solr only enters last_index_time in
dataimport.properties, but not, say, a last_indexed_id for a primary key 'id'.
How can I efficiently get the max(id) (note that 'id' is an auto-increment
field in the database)? Maintaining max(id) outside of solr is brittle and
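For context, DIH's delta-import machinery keys entirely off that timestamp; a typical entity in data-config.xml looks roughly like this (table and column names are made up):

  <entity name="item" pk="id"
          query="SELECT * FROM item"
          deltaQuery="SELECT id FROM item WHERE updated_at &gt; '${dataimporter.last_index_time}'"
          deltaImportQuery="SELECT * FROM item WHERE id = '${dataimporter.delta.id}'"/>

so there is indeed nothing like a last_indexed_id tracked out of the box.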
exactly how you are adding the document?
E.g., what update handler are you using, and what is the document you are adding?
On Jul 8, 2012, at 12:52 PM, avenka wrote:
> I get a JSON parse error (pasted below) when I send an update to a replica
> node. I downloaded solr 4 alpha and followed t
> ...there is no master/slave setup any more. And you do
> _not_ have to configure replication.
>
> Best
> Erick
>
> On Sun, Jul 8, 2012 at 1:03 PM, avenka <[hidden email]> wrote:
>
> > I am trying to wrap my head around replication in SolrCloud. I tried the
> > setup at
Hmm, never mind my question about replicating using symlinks. Given that
replication on a single machine improves throughput, I should be able to get
a similar improvement by simply sharding on a single machine. As also
observed at
http://carsabi.com/car-news/2012/03/23/optimizing-solr-7x-your-se
Thanks. Can you explain more the first TermsComponent option to obtain
max(id)? Do I have to modify schema.xml to add a new field? How exactly do I
query for the lowest value of "1 - id"?
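For reference, the usual shape of that trick (a sketch; the field name and constant are placeholders, not something from the earlier replies): index an extra field containing CONSTANT - id, so the smallest term in that field corresponds to the largest id, then ask TermsComponent for the first term in index order:

  http://localhost:8983/solr/terms?terms.fl=reversed_id&terms.limit=1&terms.sort=index
  max(id) = CONSTANT - (the term returned)

A zero-padded string field is the least fiddly choice here, since the raw terms of a trie-encoded numeric field are not human-readable.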