Hi,
We are using Solr 7.1.0 to index a database of addresses. We have found that
our index size increases massively when we add one extra field to the index,
even though that field is stored and not indexed, and doesn’t contain a lot of
data. When this occurs, we also observe a significant i
Hi Alessandro,
Thanks for responding. We rebuild the index every time starting from a fresh
installation of Solr. Because we are running on AWS, we have automated our
deployment so we start with the base docker image, configure Solr and then
import our data every time the data changes (it onl
Hi Alessandro,
The docker image is like a disk image of the entire server, so it includes the
operating system, the Solr installation and the data. Because we run in the
cloud and our index isn't that big, this is an easy and fast way for us to
scale our Solr cluster without having to configu
Hi Erick,
Thanks for responding. You are correct that we don't have any deleted docs.
When we want to re-index (once a fortnight), we build a brand new installation
of Solr from scratch and re-import the new data into an empty index.
I will try setting docValues to false and see if that make
Thanks Hoss. I will try setting docValues to false, as we only ever want to be
able to retrieve the value of this field.
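The change described above can be sketched with the Solr Schema API. The sketch uses the same echo/curl style as the build scripts quoted later in this thread. The core name `addresses`, the field name `extraField` and the `string` type are hypothetical placeholders. The script only sends the request when `SOLR_URL` is set. Setting `docValues` to `false` on the field itself overrides whatever the field type declares, and `stored: true` keeps the value retrievable in results.

```shell
#!/bin/sh
# Hypothetical placeholders: the core "addresses" and the field "extraField"
# are illustrative names, not taken from the thread.
CORE="${CORE:-addresses}"

# Print a Schema API "add-field" command for a field that is stored (so it
# can be returned in results) but neither indexed nor backed by docValues.
# Arguments: field name, field type.
stored_only_field_json() {
  printf '{"add-field":{"name":"%s","type":"%s","indexed":false,"stored":true,"docValues":false}}\n' \
    "$1" "$2"
}

# Only contact Solr when SOLR_URL is set, e.g. SOLR_URL=http://localhost:8983
if [ -n "${SOLR_URL:-}" ]; then
  echo "$(date) Creating extraField field"
  curl -X POST -H 'Content-type:application/json' \
    --data-binary "$(stored_only_field_json extraField string)" \
    "$SOLR_URL/solr/$CORE/schema"
fi
```

If the field already exists, the Schema API's `replace-field` command accepts the same field definition. Either way, the documents need to be re-indexed afterwards, and the from-scratch rebuild described in this thread already does that.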
Regards,
David
David Howe
Java Domain Architect
Postal Systems
Level 16, 111 Bourke Street Melbourne VIC 3000
T 0391067904
M 0424036591
E david.h...@auspost.com.au
W auspost.com.au
W startrack.com.au
-Original Message-
From: Howe, David [mailto:david.h...@auspost.com.au]
Sent: Wednesday, 14 February 2018 7:26 AM
To: solr-user@lucene.apache.org
Subject: RE: Index size increases disproportionately to size of added field
when indexed=false
Thanks Hoss. I will
-Original Message-
From: Howe, David [mailto:david.h...@auspost.com.au]
Sent: Wednesday, 14 February 2018 12:49 PM
To: solr-user@lucene.apache.org
Hi Alessandro,
I did some interesting testing today that seems to have gotten me closer to
the issue. When I run the version of the index that is working correctly
against my database table that has the extra field in it, the index suddenly
increases in size. This is even though the data i
Hi Erick,
I have the full dump of the Solr index file sizes as well if that is of any
help. I have attached it below this message.
We don't have any deleted docs in our index, as we always build it from a brand
new virtual machine with a brand new installation of Solr.
The ordering is defini
Hi Emir,
We have no copy field definitions. To keep things simple, we have a one-to-one
mapping between the columns in our staging table and the fields in our Solr
index.
Regards,
David
Hi Alessandro,
There are 14,061,990 records in the staging table, and that is how many
documents we end up with in Solr. I would be surprised if we have a
problem with the id, as we use the primary key of the table as the id in Solr
so it must be unique.
The primary key of the staging ta
Hi Erick,
Below is the file listing for when the index is loaded with the table ordered
in a way that produces the smaller index.
I have checked the console, and we have no deleted docs and we have the same
number of docs in the index as there are rows in the staging table that we load
from.
Hi Erick,
Thinking some more about the differences between the two sort orders has
suggested another possibility. We also have a geospatial field defined in the
index:
echo "$(date) Creating geoLocation field"
curl -X POST -H 'Content-type:application/json' --data-binary '{
"add-fiel
Hi Erick,
I'm 99% sure that I haven't changed the field types between the two snapshots
as all of my test runs are completely scripted and build a new Solr server from
scratch (both the virtual machine and the Solr software). I can diff the
scripts between two runs to make sure I haven't acci
You will need to use your favourite Java tooling to take the code that you have
written and package it as a jar file. In our case, we use Maven, so I have set
my custom extensions up as a Maven project, and in my POM file (which tells
Maven what dependencies your project has), I declare:
Hi Aakanksha,
We use the following for geo queries, which works for us:
/solr/core/select?defType=edismax&indent=on&ps=0&start=0&wt=json&sow=true&hl=on&hl.fl=*&fq=%7B!geofilt%7D&pt=-6.08165,145.8612430&d=10&sfield=geoLocation&sort=geodist()%20asc&rows=10&fl=*,score,distance:geodist()
This gives u
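The geo query above can also be sent with `curl -G --data-urlencode`. This way curl encodes values such as `{!geofilt}` and `geodist() asc` itself, instead of them being escaped by hand. This sketch keeps only the parameters needed for the radius filter and the distance sort. `geoLocation` and the coordinates come from the URL above. The `core` default and the `SOLR_URL` variable are assumptions for illustration, and nothing is sent unless `SOLR_URL` is set.

```shell
#!/bin/sh
# CORE defaults to the "core" placeholder from the URL above (an assumption);
# SOLR_URL must be set before any request is sent.
CORE="${CORE:-core}"

# Print one name=value query parameter per line for a radius search around
# LAT,LON, sorted nearest first. Arguments: lat, lon, radius d (kilometres by default).
geo_params() {
  printf '%s\n' \
    'defType=edismax' \
    'fq={!geofilt}' \
    'sfield=geoLocation' \
    "pt=$1,$2" \
    "d=$3" \
    'sort=geodist() asc' \
    'rows=10' \
    'fl=*,score,distance:geodist()' \
    'wt=json'
}

if [ -n "${SOLR_URL:-}" ]; then
  # Turn each parameter into a --data-urlencode argument; curl encodes the
  # value after the first "=", and -G appends everything to the query string.
  set --
  while IFS= read -r p; do
    set -- "$@" --data-urlencode "$p"
  done <<EOF
$(geo_params -6.08165 145.8612430 10)
EOF
  curl -G "$@" "$SOLR_URL/solr/$CORE/select"
fi
```

Keeping the parameters in a function makes it easy to add back the others from the original URL, such as the highlighting options, without hand-encoding them.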
Hi Erick & Alessandro,
I have solved my problem by re-ordering the data in the SQL query. I don't
know why it works, but it does. I can consistently reproduce the problem
without changing anything else except the database table. As our Solr build is
scripted and we always build a new Solr s
Hi Rick,
Thanks for your response. The reason that we do it like this is that the
localities are also part of another indexed field that contains the entire
address. We actually do the search over that field, and we are only using the
highlighting on the problematic field so that we can tell