Index size increases disproportionately to size of added field when indexed=false

2018-02-12 Thread Howe, David
Hi, We are using Solr 7.1.0 to index a database of addresses. We have found that our index size increases massively when we add one extra field to the index, even though that field is stored and not indexed, and doesn’t contain a lot of data. When this occurs, we also observe a significant i

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-13 Thread Howe, David
Hi Alessandro, Thanks for responding. We rebuild the index every time starting from a fresh installation of Solr. Because we are running at AWS, we have automated our deployment so we start with the base docker image, configure Solr and then import our data every time the data changes (it onl

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-13 Thread Howe, David
Hi Alessandro, The docker image is like a disk image of the entire server, so it includes the operating system, the Solr installation and the data. Because we run in the cloud and our index isn't that big, this is an easy and fast way for us to scale our Solr cluster without having to configu

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-13 Thread Howe, David
Hi Erick, Thanks for responding. You are correct that we don't have any deleted docs. When we want to re-index (once a fortnight), we build a brand new installation of Solr from scratch and re-import the new data into an empty index. I will try setting docValues to false and see if that make

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-13 Thread Howe, David
Thanks Hoss. I will try setting docValues to false, as we only ever want to be able to retrieve the value of this field. Regards, David
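
For a field that is only ever retrieved, the change discussed above could be expressed through Solr's Schema API roughly as follows. This is a minimal sketch, not the poster's actual script: the collection name `addresses`, the field name `extraField`, the `string` type and the host are all hypothetical placeholders.

```shell
#!/bin/sh
# Hypothetical placeholders: collection "addresses" and field "extraField"
# are not taken from the thread.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr/addresses}"

# Print the Schema API payload for a field that is stored for retrieval
# only: not indexed, and with docValues switched off.
stored_only_field() {
  printf '{"add-field":{"name":"%s","type":"string","indexed":false,"stored":true,"docValues":false}}\n' "$1"
}

# Send the payload to the Schema API (needs a running Solr instance).
add_stored_only_field() {
  curl -X POST -H 'Content-type:application/json' \
    --data-binary "$(stored_only_field "$1")" "$SOLR_URL/schema"
}

# Show the payload without contacting Solr.
stored_only_field extraField
```

Calling `add_stored_only_field extraField` would post the definition. Setting `docValues` explicitly matters because a field type may enable it by default, in which case the field still contributes doc values data to the index even with `indexed=false`.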

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-13 Thread Howe, David
auspost.com.au W startrack.com.au -Original Message- From: Howe, David [mailto:david.h...@auspost.com.au] Sent: Wednesday, 14 February 2018 7:26 AM To: solr-user@lucene.apache.org Subject: RE: Index size increases disproportionately to size of added field when indexed=false Thanks Hoss. I will

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-14 Thread Howe, David
16, 111 Bourke Street Melbourne VIC 3000 T 0391067904 M 0424036591 E david.h...@auspost.com.au W auspost.com.au W startrack.com.au -Original Message- From: Howe, David [mailto:david.h...@auspost.com.au] Sent: Wednesday, 14 February 2018 12:49 PM To: solr-user@lucene.apache.org

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-15 Thread Howe, David
Hi Alessandro, Some interesting testing today that seems to have gotten me closer to what the issue is. When I run the version of the index that is working correctly against my database table that has the extra field in it, the index suddenly increases in size. This is even though the data i

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-15 Thread Howe, David
Hi Erick, I have the full dump of the Solr index file sizes as well if that is of any help. I have attached it below this message. We don't have any deleted docs in our index, as we always build it from a brand new virtual machine with a brand new installation of Solr. The ordering is defini

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Howe, David
Hi Emir, We have no copy field definitions. To keep things simple, we have a one to one mapping between the columns in our staging table and the fields in our Solr index. Regards, David

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Howe, David
Hi Alessandro, There are 14,061,990 records in the staging table and that is how many documents we end up with in Solr. I would be surprised if we have a problem with the id, as we use the primary key of the table as the id in Solr so it must be unique. The primary key of the staging ta

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Howe, David
Hi Erick, Below is the file listing for when the index is loaded with the table ordered in a way that produces the smaller index. I have checked the console, and we have no deleted docs and we have the same number of docs in the index as there are rows in the staging table that we load from.

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Howe, David
Hi Erick, Thinking some more about the differences between the two sort orders has suggested another possibility. We also have a geo spatial field defined in the index: echo "$(date) Creating geoLocation field" curl -X POST -H 'Content-type:application/json' --data-binary '{ "add-fiel

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Howe, David
Hi Erick, I'm 99% sure that I haven't changed the field types between the two snapshots as all of my test runs are completely scripted and build a new Solr server from scratch (both the virtual machine and the Solr software). I can diff the scripts between two runs to make sure I haven't acci

RE: Solr Plugins Documentation

2018-02-17 Thread Howe, David
You will need to use your favourite java tooling to take the code that you have written and package it as a jar file. In my case, we use maven so I have set my custom extensions up as a maven project, and in my POM file (which tells maven what dependencies your project has), I declare:

RE: Getting the error - The field '*********' does not support spatial filtering

2018-02-18 Thread Howe, David
Hi Aakanksha, We use the following for geo queries which works for us: /solr/core/select?defType=edismax&indent=on&ps=0&start=0&wt=json&sow=true&hl=on&hl.fl=*&fq=%7B!geofilt%7D&pt=-6.08165,145.8612430&d=10&sfield=geoLocation&sort=geodist()%20asc&rows=10&fl=*,score,distance:geodist() This gives u
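
The same geo query can be issued with curl so that the encoding of `{!geofilt}` and `geodist()` is handled automatically. The sketch below reuses only the spatial parameters from the URL above (`fq`, `pt`, `d`, `sfield`, `sort`, `fl`, `rows`, `wt`); the host and port are assumptions, and the `CURL` variable is a hypothetical convenience for dry runs.

```shell
#!/bin/sh
# Assumed host and port; the core name "core" comes from the query path above.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr/core}"
# Set CURL=echo to print the arguments instead of sending the request.
CURL="${CURL:-curl}"

# Return documents within a distance (kilometres) of a "lat,lon" point,
# nearest first, with the computed distance included in each result.
geo_query() {
  point="$1"
  dist="$2"
  "$CURL" -G "$SOLR_URL/select" \
    --data-urlencode 'fq={!geofilt}' \
    --data-urlencode 'sfield=geoLocation' \
    --data-urlencode "pt=$point" \
    --data-urlencode "d=$dist" \
    --data-urlencode 'sort=geodist() asc' \
    --data-urlencode 'fl=*,score,distance:geodist()' \
    --data-urlencode 'rows=10' \
    --data-urlencode 'wt=json'
}
```

For example, `geo_query -6.08165,145.8612430 10` asks for the ten nearest documents within 10 km of that point. The field named in `sfield` has to be a spatial field type for the filter to be accepted.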

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-18 Thread Howe, David
Hi Erick & Alessandro, I have solved my problem by re-ordering the data in the SQL query. I don't know why it works but it does. I can consistently reproduce the problem without changing anything else except the database table. As our Solr build is scripted and we always build a new Solr s

RE: Highlighter throwing InvalidTokenOffsetsException for field with large number of synonyms

2018-03-08 Thread Howe, David
Hi Rick, Thanks for your response. The reason that we do it like this is that the localities are also part of another indexed field that contains the entire address. We actually do the search over that field, and we are only using the highlighting on the problematic field so that we can tell