Hi David,
Good to know that sorting solved your problem.
I understand perfectly that, given the urgency of your situation, having the
solution ready takes priority over continuing with the investigation.
I would still recommend opening a Jira issue in Apache Solr with all the
information gathered
Hi Erick & Alessandro,
I have solved my problem by re-ordering the data in the SQL query. I don't
know why it works but it does. I can consistently reproduce the problem
without changing anything else except the database table. As our Solr build is
scripted and we always build a new Solr s
I didn't mean to imply that _you'd_ changed things, the _defaults_ may
have changed. So the "string" fieldType may be defined with
docValues="true" in your new schema and "false" in your old schema
without you intentionally changing anything at _all_.
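
A minimal sketch of that comparison, assuming the field type definitions from the two snapshots have already been exported as lists of dictionaries (the snapshot contents below are hypothetical, not taken from either schema). It only reports attributes whose values differ, so a "docValues" that appears in one snapshot and not the other stands out:

```python
def diff_field_types(old_types, new_types):
    """Return {type_name: {attr: (old_value, new_value)}} for attributes that differ.

    None means the attribute is not set explicitly in that snapshot,
    so whatever default applies there is in effect.
    """
    old_by_name = {t["name"]: t for t in old_types}
    new_by_name = {t["name"]: t for t in new_types}
    changes = {}
    # Only compare field types that exist in both snapshots.
    for name in sorted(old_by_name.keys() & new_by_name.keys()):
        old, new = old_by_name[name], new_by_name[name]
        attrs = {}
        for key in sorted(old.keys() | new.keys()):
            if old.get(key) != new.get(key):
                attrs[key] = (old.get(key), new.get(key))
        if attrs:
            changes[name] = attrs
    return changes


# Hypothetical snapshots for illustration only.
old_snapshot = [{"name": "string", "class": "solr.StrField", "sortMissingLast": True}]
new_snapshot = [{"name": "string", "class": "solr.StrField", "sortMissingLast": True,
                 "docValues": True}]

print(diff_field_types(old_snapshot, new_snapshot))
```

An empty result would suggest the explicit definitions match, in which case any difference would have to come from defaults rather than from the schema text itself.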
That's why the LukeRequestHandler will hel
Hi Erick,
I'm 99% sure that I haven't changed the field types between the two snapshots
as all of my test runs are completely scripted and build a new Solr server from
scratch (both the virtual machine and the Solr software). I can diff the
scripts between two runs to make sure I haven't acci
Well, I'm not entirely sure either ;)
Here's what I'm seeing. And, BTW, I'm making a couple of assumptions here. In
the one listing, your biggest segment starts with _7l and in the other
it's _zd. The aggregate size is
2,815M for _7l and 705M for _zd. So multiplying the individual files
in _zd by 4 (p
Hi Erick,
Thinking some more about the differences between the two sort orders has
suggested another possibility. We also have a geospatial field defined in the
index:
echo "$(date) Creating geoLocation field"
curl -X POST -H 'Content-type:application/json' --data-binary '{
"add-fiel
Hi Erick,
Below is the file listing for when the index is loaded with the table ordered
in a way that produces the smaller index.
I have checked the console, and we have no deleted docs and we have the same
number of docs in the index as there are rows in the staging table that we load
from.
Hi Alessandro,
There are 14,061,990 records in the staging table and that is how many
documents we end up with in Solr. I would be surprised if we have a
problem with the id, as we use the primary key of the table as the id in Solr
so it must be unique.
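
The check being described can be sketched as follows. This is only an illustration: numDocs and maxDoc are the counts Solr reports for a core, and the values passed in below assume maxDoc equals numDocs, which is an assumption rather than a figure quoted in this message.

```python
def summarize_counts(db_rows, num_docs, max_doc):
    """Compare the staging-table row count with Solr's document counts."""
    deleted_docs = max_doc - num_docs       # docs marked deleted but not yet merged away
    missing_from_solr = db_rows - num_docs  # > 0 hints at overwritten (duplicate) ids
    return {
        "deleted_docs": deleted_docs,
        "missing_from_solr": missing_from_solr,
        "consistent": deleted_docs == 0 and missing_from_solr == 0,
    }


# 14,061,990 rows and documents; max_doc is assumed equal to num_docs here.
print(summarize_counts(14_061_990, 14_061_990, 14_061_990))
```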
The primary key of the staging ta
It's a silly thing, but to confirm the direction that Erick is suggesting:
how many rows are in the DB?
If updates are happening on Solr (causing the deletes), I would expect a
greater number of documents in the DB than in the Solr index.
Is the DB primary key (if any) the same as the uniqueKey fie
Hi Emir,
We have no copy field definitions. To keep things simple, we have a one to one
mapping between the columns in our staging table and the fields in our Solr
index.
Regards,
David
David Howe
Java Domain Architect
Postal Systems
Level 16, 111 Bourke Street Melbourne VIC 3000
T 039106
Hi David,
I skimmed through the thread and don’t see whether this was already eliminated, so I will ask: can
you check whether any copyField rules are triggered when the new field
is added? You mentioned that ordering fixed the size of the index, but it might be
worth checking.
Emir
--
Monitoring - Log Managem
This isn't terribly useful without a similar dump of "the other" index
directory. The point is to compare the different extensions for some
segment where the sum of all the files in that segment is roughly
equal. So if you have a listing of the old index around, that would
help.
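
As a rough sketch of that comparison: the snippet below groups a directory listing by segment and file extension and reports each extension's share of its segment. The file names and sizes are made up for illustration, and the parsing assumes names of the form _<segment>[_suffix].<ext>; anything else (segments_N, write.lock) is skipped.

```python
from collections import defaultdict


def sizes_by_segment(listing):
    """listing: iterable of (file_name, size_in_bytes) pairs from an index directory."""
    totals = defaultdict(lambda: defaultdict(int))
    for name, size in listing:
        if not name.startswith("_") or "." not in name:
            continue  # not a per-segment file (e.g. segments_N, write.lock)
        stem, ext = name.rsplit(".", 1)
        # "_7l_Lucene70_0" and "_7l" both belong to segment "_7l".
        segment = "_" + stem[1:].split("_", 1)[0]
        totals[segment][ext] += size
    return {seg: dict(exts) for seg, exts in totals.items()}


def shares(ext_sizes):
    """Fraction of a segment's total size taken by each extension."""
    total = sum(ext_sizes.values())
    return {ext: size / total for ext, size in ext_sizes.items()}


# Made-up listing; real sizes would come from the two index directories.
listing = [
    ("_7l.fdt", 300),
    ("_7l_Lucene70_0.dvd", 100),
    ("_zd.cfs", 50),
    ("segments_5", 1),
    ("write.lock", 0),
]

by_segment = sizes_by_segment(listing)
print(by_segment)
print(shares(by_segment["_7l"]))
```

Comparing shares rather than raw sizes lets segments of different total size be set side by side.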
bq: We don't have any
Hi Erick,
I have the full dump of the Solr index file sizes as well if that is of any
help. I have attached it below this message.
We don't have any deleted docs in our index, as we always build it from a brand
new virtual machine with a brand new installation of Solr.
The ordering is defini
David:
Rats, the cfs files make everything I'd hoped to understand with the
sizes ambiguous, since they conceal the underlying sizes of each other
extension. We can approach it a bit differently though. Take one
segment that's _not_ in cfs format where the total size of all files
making up that se
@Alessandro I will see if I can reproduce the same issue just by turning
off omitNorms on field type. I'll open another mail thread if required.
Thanks.
On Thu, Feb 15, 2018 at 6:12 AM, Howe, David
wrote:
>
> Hi Alessandro,
>
> Some interesting testing today that seems to have gotten me closer t
Hi Alessandro,
Some interesting testing today that seems to have gotten me closer to what the
issue is. When I run the version of the index that is working correctly
against my database table that has the extra field in it, the index suddenly
increases in size. This is even though the data i
@Pratik: you should have investigated. I understand that it solved your issue,
but if you needed norms, it doesn't make sense that they would cause your index to
grow by a factor of 30. You must have hit a nasty bug if it was just
the norms.
@Howe :
*Compound File* .cfs, .cfe An optional "virtua
Subject: RE: Index size increases disproportionately to size of added field
when indexed=false
I have set docValues=false on all of the string fields in our index that have
indexed=false and stored=true. This gave a small improvement in the index size
from 13.3GB to 12.82GB.
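
For reference, a sketch of how that selection could be scripted. It picks the fields of type "string" that are explicitly indexed=false and stored=true and builds one Schema API "replace-field" body per field with docValues set to false. The field names are hypothetical, and the snippet only prints the JSON; it does not send anything to Solr.

```python
import json


def docvalues_off_payloads(fields):
    """Build one replace-field payload per stored-only string field."""
    payloads = []
    for field in fields:
        stored_only = field.get("indexed") is False and field.get("stored") is True
        if field.get("type") == "string" and stored_only:
            # replace-field takes the full definition, so copy it and override docValues.
            payloads.append({"replace-field": dict(field, docValues=False)})
    return payloads


# Hypothetical field definitions for illustration only.
fields = [
    {"name": "addressLine", "type": "string", "indexed": False, "stored": True},
    {"name": "suburb", "type": "string", "indexed": True, "stored": True},
]

for payload in docvalues_off_payloads(fields):
    print(json.dumps(payload, indent=2))
```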
I have also tried
You are right, in my case this field type was applied to many text fields.
These include many copy fields and dynamic fields as well. In my case,
only specifying omitNorms=true for field type "text_general" fixed the
issue. I didn't do anything else or have any other bug.
On Wed, Feb 14, 2018 at 1
Hi Pratik,
how is it possible that just the norms for a single field were causing such
a massive index size increase in your case?
In your case I think it was for a field type used by multiple fields, but
it's still suspicious in my opinion;
norms shouldn't be that big.
If I remember correctly in
Thanks Hoss. I will try setting docValues to false, as we only ever want to be
able to retrieve the value of this field.
Regards,
David
David Howe
Java Domain Architect
Postal Systems
Level 16, 111 Bourke Street Melbourne VIC 3000
T 0391067904
M 0424036591
E david.h...@auspost.com.au
W
Hi Erick,
Thanks for responding. You are correct that we don't have any deleted docs.
When we want to re-index (once a fortnight), we build a brand new installation
of Solr from scratch and re-import the new data into an empty index.
I will try setting docValues to false and see if that make
Hi Alessandro,
The docker image is like a disk image of the entire server, so it includes the
operating system, the Solr installation and the data. Because we run in the
cloud and our index isn't that big, this is an easy and fast way for us to
scale our Solr cluster without having to configu
To piggyback on this, what would be the right scenarios for using
docValues='true'?
On Tue, Feb 13, 2018 at 1:10 PM, Chris Hostetter
wrote:
>
> : We are using Solr 7.1.0 to index a database of addresses. We have found
> : that our index size increases massively when we add one extra field to
> :
: We are using Solr 7.1.0 to index a database of addresses. We have found
: that our index size increases massively when we add one extra field to
: the index, even though that field is stored and not indexed, and doesn’t
what about docValues?
: When we run an index load without the problema
David:
Right, Optimize Is Evil. Well, actually in your case it's not. In your
specific case you can optimize every time you build your index and be
OK, gory details here:
https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/
But that's just for background. The key
Hi David,
Given that you are actually building a new index from scratch, my
shot in the dark didn't hit any target.
When you say: "Once the import finishes we save the docker image in the
AWS docker repository. We then build our cluster using that image as the
base"
Do you mean just c
Hi Alessandro,
Thanks for responding. We rebuild the index every time starting from a fresh
installation of Solr. Because we are running at AWS, we have automated our
deployment so we start with the base docker image, configure Solr and then
import our data every time the data changes (it onl
I assume you re-index in full, right?
My shot in the dark is that this increase is temporary.
You re-index, so you effectively delete and add all documents (this means that
even if the new field is just stored, you rebuild the entire index for all
the fields).
Create new segments and the old docs ar
Hi,
We are using Solr 7.1.0 to index a database of addresses. We have found that
our index size increases massively when we add one extra field to the index,
even though that field is stored and not indexed, and doesn’t contain a lot of
data. When this occurs, we also observe a significant i
33 matches