Re: Index Replication Failure

2020-10-20 Thread Parshant Kumar
Hi all, please check the details On Sat, Oct 17, 2020 at 5:52 PM Parshant Kumar wrote: > > > *Architecture is master->repeater->slave servers in hierarchy.* > > *One of the exceptions below occurs whenever replication fails.* > > 1) WARN : Error in fetching file: _4rnu_t.liv (downloaded 0 o

Re: Index Replication Failure

2020-10-17 Thread Parshant Kumar
*Architecture is master->repeater->slave servers in hierarchy.* *One of the exceptions below occurs whenever replication fails.* 1) WARN : Error in fetching file: _4rnu_t.liv (downloaded 0 of 11505507 bytes) java.io.EOFException: Unexpected end of ZLIB input stream at java.util.zip.I

Re: Index Replication Failure

2020-10-17 Thread Erick Erickson
None of your images made it through the mail server. You’ll have to put them somewhere and provide a link. > On Oct 17, 2020, at 5:17 AM, Parshant Kumar > wrote: > > Architecture image: If not visible in previous mail > > > > > On Sat, Oct 17, 2020 at 2:38 PM Parshant Kumar > wrote: > Hi

Re: Index Replication Failure

2020-10-17 Thread Parshant Kumar
Architecture image: If not visible in previous mail [image: image.png] On Sat, Oct 17, 2020 at 2:38 PM Parshant Kumar wrote: > Hi all, > > We are having solr architecture as below. > > > > *We are facing the frequent replication failure between master to repeater > server as well as between r

Re: Index Deeply Nested documents and retrieve a full nested document in solr

2020-09-24 Thread Alexandre Rafalovitch
It is yes to both questions, but I am not sure if they play well together for historical reasons. For storing/parsing original JSON in any (custom) format: https://lucene.apache.org/solr/guide/8_6/transforming-and-indexing-custom-json.html (srcField parameter) For indexing nested children (with na

Re: Index files on Windows fileshare

2020-06-25 Thread Fiz N
Thanks Jason. Appreciate your response. Thanks Fiz N. On Thu, Jun 25, 2020 at 5:42 AM Jason Gerlowski wrote: > Hi Fiz, > > Since you're just looking for a POC solution, I think Solr's > "bin/post" tool would probably help you achieve your first > requirement. > > But I don't think "bin/post" gi

Re: Index files on Windows fileshare

2020-06-25 Thread Jason Gerlowski
Hi Fiz, Since you're just looking for a POC solution, I think Solr's "bin/post" tool would probably help you achieve your first requirement. But I don't think "bin/post" gives you much control over the fields that get indexed - if you need the file path to be stored, you might be better off writi

Re: Index file on Windows fileshare..

2020-06-23 Thread Erick Erickson
The program I pointed you to should take about an hour to make work. But otherwise, you can try the post tool: https://lucene.apache.org/solr/guide/7_2/post-tool.html Best, Erick > On Jun 23, 2020, at 8:45 AM, Fiz N wrote: > > Thanks Erick. Is there easy way of doing this? Index files from win

Re: Index file on Windows fileshare..

2020-06-23 Thread Fiz N
Thanks Erick. Is there easy way of doing this? Index files from windows share folder to SOLR. This is for POC only. Thanks Nadian. On Mon, Jun 22, 2020 at 3:54 PM Erick Erickson wrote: > Consider running Tika in a client and indexing the docs to Solr. > At that point, you have total control ove

Re: Index file on Windows fileshare..

2020-06-22 Thread Erick Erickson
Consider running Tika in a client and indexing the docs to Solr. At that point, you have total control over what’s indexed. Here’s a skeletal program to get you started: https://lucidworks.com/post/indexing-with-solrj/ Best, Erick > On Jun 22, 2020, at 1:21 PM, Fiz N wrote: > > Hello Solr exp

Re: Index download speed while replicating is fixed at 5.1 in replication.html

2020-06-16 Thread Florin Babes
Hello, The patch is to fix the display. It doesn't configure or limit the speed :) On Tue, Jun 16, 2020 at 14:26, Shawn Heisey wrote: > On 6/14/2020 12:06 AM, Florin Babes wrote: > > While checking ways to optimize the speed of replication I've noticed > that > > the index download speed is

Re: Index download speed while replicating is fixed at 5.1 in replication.html

2020-06-16 Thread Shawn Heisey
On 6/14/2020 12:06 AM, Florin Babes wrote: While checking ways to optimize the speed of replication I've noticed that the index download speed is fixed at 5.1 in replication.html. There is a reason for that? If not, I would like to submit a patch with the fix. We are using solr 8.3.1. Looking a

Re: index join without query criteria

2020-06-08 Thread Mikhail Khludnev
or probably -director_id:[* TO *] On Mon, Jun 8, 2020 at 10:56 PM Hari Iyer wrote: > Hi, > > It appears that a query criteria is mandatory for a join. Taking this > example from the documentation: fq={!join from=id fromIndex=movie_directors > to=director_id}has_oscar:true. What if I want to find
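
For readers following the join example quoted in this message, here is a minimal Python sketch of how such a filter could be URL-encoded for a request. The base URL and the collection name `movies` are assumptions made for illustration; only the `fq` value comes from the documentation example quoted in the thread.

```python
from urllib.parse import urlencode

# Hypothetical base URL and collection name; adjust for a real deployment.
BASE = "http://localhost:8983/solr/movies/select"

def join_query_url(inner_query):
    """Build a select URL whose fq joins movie_directors.id to director_id."""
    fq = "{!join from=id fromIndex=movie_directors to=director_id}" + inner_query
    return BASE + "?" + urlencode({"q": "*:*", "fq": fq})

url = join_query_url("has_oscar:true")
print(url)
```

Passing a different inner query to `join_query_url` changes only the part after the closing brace, which is the "query criteria" the original question is about.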

Re: Index using CSV file

2020-04-18 Thread Jörn Franke
Please also do not forget that you should create a schema in the Solr collection so that the data is correctly indexed so that you get fast and correct query result. I usually recommend to read one of the many Solr books out there to get started. This will save you a lot of time. > Am 18.04.2

Re: Index using CSV file

2020-04-18 Thread Jörn Franke
This you don’t do via the Solr UI. You have many choices amongst others 1) write a client yourself that parses the csv and post it to the standard Update handler https://lucene.apache.org/solr/guide/8_4/uploading-data-with-index-handlers.html 2) use the Solr post tool https://lucene.apache.org/
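
As a rough illustration of option 1 above (a client that parses the CSV itself), the following Python sketch turns CSV text into a JSON array of documents. The sample columns `id` and `title` are hypothetical, and the sketch stops at building the request body; it assumes the body would then be sent to the update handler with a JSON content type.

```python
import csv
import io
import json

def csv_to_solr_docs(csv_text):
    """Parse CSV text (first row is the header) into one dict per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))

# Hypothetical sample data; the field names are illustrative only.
sample = "id,title\n1,First doc\n2,Second doc\n"
docs = csv_to_solr_docs(sample)
payload = json.dumps(docs)  # JSON array that a client could POST to /update
print(payload)
```

Every value stays a string here, so the schema mentioned elsewhere in this thread still decides how each field is typed and indexed.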

Re: Index growing and growing until restart

2020-01-21 Thread Jörn Franke
Ok, i created collection from scratch based on config Unfortunately, it does not improve. It is just growing and growing. Except when I stop solr and then during startup the unnecessary index files are purged. Even with the previous config this did not happen in older Solr versions (for sure not i

Re: Index growing and growing until restart

2020-01-21 Thread Jörn Franke
After testing update?commit=true I now face an error: "Maximum lock count exceeded". Strange, this is the first time I see this in the lockfiles and when doing commit=true. java.lang.Error: Maximum lock count exceeded at java.base/java.util.concurrent.locks.ReentrantReadWriteLock$Sync.fullTryA

Re: Index growing and growing until restart

2020-01-21 Thread Jörn Franke
The only weird thing I see is that, for instance, I have ${solr.autoCommit.maxTime:15000} and similar entries. It looks like a template gone wrong, but this was not caused by internal development. It must have come from a Solr version. On Tue, Jan 21, 2020 at 10:49 PM Jörn Franke wrote

Re: Index growing and growing until restart

2020-01-21 Thread Jörn Franke
It is btw. a Linux system and autosoftcommit is set to -1. However, indeed openSearcher is set to false. A commit is set to true after doing all the updates, but the index is not shrinking. The files are not disappearing during shutdown, but they disappear after starting up again. On Tue, Jan 21,

Re: Index growing and growing until restart

2020-01-21 Thread Jörn Franke
Thanks for the answer, I will look into it - it is a possible explanation. > On 20.01.2020 at 14:30, Erick Erickson wrote: > > Jörn: > > The only thing I can think of that _might_ cause this (I’m not all that > familiar with the code) is if your solrconfig settings never open a searcher. >

Re: Index growing and growing until restart

2020-01-20 Thread Erick Erickson
Jörn: The only thing I can think of that _might_ cause this (I’m not all that familiar with the code) is if your solrconfig settings never open a searcher. Either you need to be sure openSearcher is set to true in the autocommit section in solrconfig.xml or your autoSoftCommit is set to somethi
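
The check described above can be sketched in Python as a small helper that reads an `updateHandler` fragment and reports whether any automatic commit would open a searcher. This is only a sketch under stated assumptions: it treats a positive `autoSoftCommit` `maxTime` as "enabled" (the sentence above is cut off, so that reading is an assumption), it ignores `maxDocs`, and the sample fragment is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

def resolve(text):
    """Return the default from a ${name:default} placeholder, else the text."""
    m = re.fullmatch(r"\$\{[^:}]+:([^}]*)\}", text.strip())
    return m.group(1) if m else text.strip()

def commits_open_searcher(update_handler_xml):
    """True if a hard autocommit opens a searcher or a soft autocommit is on."""
    root = ET.fromstring(update_handler_xml.strip())
    hard = root.findtext("autoCommit/openSearcher", default="true")
    hard_time = root.findtext("autoCommit/maxTime", default="-1")
    soft_time = root.findtext("autoSoftCommit/maxTime", default="-1")
    hard_opens = resolve(hard).lower() == "true" and int(resolve(hard_time)) > 0
    soft_opens = int(resolve(soft_time)) > 0
    return hard_opens or soft_opens

# Hypothetical fragment mirroring the settings reported in this thread.
sample = """
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
  </autoSoftCommit>
</updateHandler>
"""
print(commits_open_searcher(sample))
```

The helper only looks at the defaults written in the file; a system property set at startup would override them, which this sketch cannot see.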

Re: Index growing and growing until restart

2020-01-20 Thread Jörn Franke
From what I see it basically duplicates the index files, but does not delete the old ones. It uses caffeine cache. What I observe is that there is an exception when shutting down for the collection that is updated - timeout waiting for all directory ref counts to be released - gave up waiting

Re: Index growing and growing until restart

2020-01-20 Thread Jörn Franke
Sorry, I missed a line - it is not the tlog that is growing but the /data/index folder - until restart, when it seems to be purged. > On 20.01.2020 at 10:47, Jörn Franke wrote: > > Hi, > > I have a test system here with Solr 8.4 (but this is also reproducible in > older Solr versions), which has

Re: Index fetch failed

2019-09-03 Thread Erick Erickson
15460487 62144 6007 >> -/+ buffers/cache: 9308 6639 >> Swap:0 0 0 >> >> >> Thanks & Regards, >> Akreeti Agarwal >> >> -Original Message- >> From: Akreeti

Re: Index fetch failed

2019-09-03 Thread Shankar Ramalingam
gust 28, 2019 2:45 PM > To: solr-user@lucene.apache.org > Subject: RE: Index fetch failed > > Yes I am using solr-5.5.5. > This error is intermittent. I don't think there must be any issue with > master connection limits. This error is accompanied by

RE: Index fetch failed

2019-09-02 Thread Akreeti Agarwal
0 Thanks & Regards, Akreeti Agarwal -Original Message- From: Akreeti Agarwal Sent: Wednesday, August 28, 2019 2:45 PM To: solr-user@lucene.apache.org Subject: RE: Index fetch failed Yes I am using solr-5.5.5. This error is intermittent. I don't think there must be

RE: Index fetch failed

2019-08-28 Thread Akreeti Agarwal
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555) at java.lang.Thread.run(Thread.java:748) Thanks & Regards, Akreeti Agarwal -Original Message- From: Atita Arora Sent: Wednesday, August 28, 2019 2:23 PM To: solr-user@lucene.apache.org Subject: Re: Index fetch fai

Re: Index fetch failed

2019-08-28 Thread Atita Arora
99G 45G 49G 48% / > tmpfs 7.8G 0 7.8G 0% /dev/shm > > Thanks & Regards, > Akreeti Agarwal > > -Original Message- > From: Atita Arora > Sent: Wednesday, August 28, 2019 11:15 AM > To: solr-user@lucene.apache.org > Subject: Re: Index fetch fai

RE: Index fetch failed

2019-08-28 Thread Akreeti Agarwal
7.8G 0 7.8G 0% /dev/shm Thanks & Regards, Akreeti Agarwal -Original Message- From: Atita Arora Sent: Wednesday, August 28, 2019 11:15 AM To: solr-user@lucene.apache.org Subject: Re: Index fetch failed Hii, Do you have enough memory free for the index chunk to be fet

Re: Index fetch failed

2019-08-27 Thread Atita Arora
Hii, Do you have enough memory free for the index chunk to be fetched/Downloaded on the slave node? On Wed, Aug 28, 2019 at 6:57 AM Akreeti Agarwal wrote: > Hello Everyone, > > I am getting this error continuously on Solr slave, can anyone tell me the > solution for this: > > 642141666 ERROR (

Re: Upgrading Solr 6.3.0 to 7.5.0 without having to re-index

2019-04-17 Thread Shawn Heisey
On 4/17/2019 3:52 AM, Ritesh Kumar wrote: Field type in old configuration - string (solr.StrField) indexed and stored set to true. Field type in new configuration - solr.SortableTextField (docValues enabled) On your schema, you have changed the field class -- from StrField to SortableTextFie

Upgrading Solr 6.3.0 to 7.5.0 without having to re-index

2019-04-17 Thread Ritesh Kumar
Hello Team, I have been trying to upgrade Solr 6.3.0 to 7.5.0 and I do not want to re-index. I tried it using the Index Upgrader Tool <https://lucene.apache.org/solr/guide/7_5/indexupgrader-tool.html>. The tool did its part and the current index is according to the current file format

Re: Solr exception: java.lang.IllegalStateException: unexpected docvalues type NUMERIC for field 'weight' (expected one of [BINARY, NUMERIC, SORTED, SORTED_NUMERIC, SORTED_SET]). Re-index with correct

2019-04-10 Thread Erick Erickson
"Re-index with correct docvalues”. I.e. define weight to have docValues=true in your schema. WARNING: you have to totally get rid of your current data, I’d recommend starting with a new collection. > On Apr 10, 2019, at 12:21 AM, Alex Broitman > wrote: > > We got the So

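One way to express the schema change suggested above is a Schema API `replace-field` command. The Python sketch below only builds the JSON body. The field type `pint` is an assumption, since the thread does not say which type `weight` uses, and any other attributes the real field needs would have to be added.

```python
import json

def replace_field_command(name, field_type):
    """Build a Schema API 'replace-field' body that turns docValues on."""
    return {"replace-field": {"name": name, "type": field_type,
                              "docValues": True}}

# "pint" is an assumed field type, chosen only for illustration.
body = json.dumps(replace_field_command("weight", "pint"))
print(body)
```

As the reply above warns, changing the definition does not fix data that is already indexed; the existing documents still have to be removed and indexed again.
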
Solr exception: java.lang.IllegalStateException: unexpected docvalues type NUMERIC for field 'weight' (expected one of [BINARY, NUMERIC, SORTED, SORTED_NUMERIC, SORTED_SET]). Re-index with correct doc

2019-04-10 Thread Alex Broitman
'weight' (expected one of [BINARY, NUMERIC, SORTED, SORTED_NUMERIC, SORTED_SET]). Re-index with correct docvalues type. java.lang.IllegalStateException: unexpected docvalues type NUMERIC for field 'weight' (expected one of [BINARY, NUMERIC, SORTED, SORTED_NUMERIC, S

RE: Index database with SolrJ using xml file directly throws an error

2019-03-04 Thread sami
Thanks James, it works! -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

RE: Index database with SolrJ using xml file directly throws an error

2019-03-01 Thread Dyer, James
Instead of dataConfig=data-config.xml, use config=data-config.xml. From: sami Sent: Friday, March 1, 2019 3:05 AM To: solr-user@lucene.apache.org Subject: RE: Index database with SolrJ using xml file directly throws an error Hi James, Thanks for your reply. I am not absolutely sure I

RE: Index database with SolrJ using xml file directly throws an error

2019-03-01 Thread sami
Hi James, Thanks for your reply. I am not absolutely sure I understood everything correctly here. I would like to index my database to start with a fresh index. I have already done it with the DIH execute function. It works absolutely fine

RE: Index database with SolrJ using xml file directly throws an error

2019-02-28 Thread Dyer, James
The parameter "dataConfig" should hold an actual xml document to override the data-config.xml file you store in zookeeper (cloud) or the configuration directory (standalone). Typically you do not use this parameter. Instead, specify the "config" parameter with the filename (eg. data-config.xml
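
To make the distinction above concrete, here is a small Python sketch that builds a full-import request naming the configuration file through the `config` parameter. The host, the core name `mycore` and the `/dataimport` handler path are assumptions for illustration; the thread itself uses SolrJ, so this shows only the request parameters, not the client code.

```python
from urllib.parse import urlencode

def dih_full_import_url(base_url, config_file):
    """Build a DIH full-import URL that names the config file via 'config'."""
    params = {"command": "full-import", "config": config_file}
    return base_url.rstrip("/") + "/dataimport?" + urlencode(params)

# Hypothetical core URL; the handler path depends on solrconfig.xml.
url = dih_full_import_url("http://localhost:8983/solr/mycore", "data-config.xml")
print(url)
```

Sending the file name in `dataConfig` instead would make the handler try to parse the name itself as an XML document, which matches the error discussed in this thread.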

Re: Index database with SolrJ using xml file directly throws an error

2019-02-28 Thread Erick Erickson
That error usually means there are characters (even spaces) at the _beginning_ of the xml file. DIH may be more forgiving on that front. Basically, anything preceding the opening tag may cause this error. Best, Erick On Thu, Feb 28, 2019 at 8:24 AM sami wrote: > > I would like to index my datab

Re: index size, stored vs indexed

2018-11-14 Thread Erick Erickson
Can't really be answered. For instance, stored data is held in *.fdt files and is largely irrelevant to searching since that data is only consulted for returning stored fields of the top N docs. So if your index consists of 90% stored data it's one answer, if 10% it's totally another. the stored da

Re: Index optimization takes too long

2018-11-04 Thread Toke Eskildsen
On Sat, 2018-11-03 at 21:41 -0700, Wei wrote: > Thanks everyone! I checked the system metrics during the optimization > process. CPU usage is quite low, there is no I/O wait, and memory > usage is not much different from before the docValues change. So I > wonder what could be the bottleneck. Ar

Re: Index optimization takes too long

2018-11-03 Thread Wei
Thanks everyone! I checked the system metrics during the optimization process. CPU usage is quite low, there is no I/O wait, and memory usage is not much different from before the docValues change. So I wonder what could be the bottleneck. Thanks, Wei On Sat, Nov 3, 2018 at 1:38 PM Erick Ericks

Re: Index optimization takes too long

2018-11-03 Thread Erick Erickson
Going from my phone so it'll be terse. See uninvertingmergeuodateprocessor (or something like that). Also, there's an idea in SOLR-12259 IIRC, but that'll be in 7.6 at the earliest. On Sat, Nov 3, 2018, 07:13 Shawn Heisey On 11/3/2018 5:32 AM, Dave wrote: > > On a side note, does adding docvalue

Re: Index optimization takes too long

2018-11-03 Thread Shawn Heisey
On 11/3/2018 5:32 AM, Dave wrote: On a side note, does adding docvalues to an already indexed field, and then optimizing, prevent the need to reindex to take advantage of docvalues? I was under the impression you had to reindex the content. You must reindex when changing the schema to add doc

Re: Index optimization takes too long

2018-11-03 Thread Dave
On a side note, does adding docvalues to an already indexed field, and then optimizing, prevent the need to reindex to take advantage of docvalues? I was under the impression you had to reindex the content. > On Nov 3, 2018, at 4:41 AM, Deepak Goel wrote: > > I would start by monitoring the h

Re: Index optimization takes too long

2018-11-03 Thread Deepak Goel
I would start by monitoring the hardware (CPU, Memory, Disk) & software (heap, threads) utilization's and seeing where the bottlenecks are. Or what is getting utilized the most. And then tune that parameter. I would also look at profiling the software. Deepak "The greatness of a nation can be ju

Re: Index optimization takes too long

2018-11-02 Thread Shawn Heisey
On 11/2/2018 5:00 PM, Wei wrote: After a recent schema change, it takes almost 40 minutes to optimize the index. The schema change is to enable docValues for all sort/facet fields, which increase the index size from 12G to 14G. Before the change it only takes 5 minutes to do the optimization.

Re: Index fetch failed. Exception: Server refused connection

2018-10-25 Thread Walter Underwood
A 1 Gb heap is probably too small on the master. Run with 8 Gb like the slaves. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 24, 2018, at 10:20 PM, Bharat Yadav wrote: > > Hello Team, > > We are now a days frequently facing below issue on o

Re: Index size issue in SOLR-6.5.1

2018-10-08 Thread Dominique Bejean
Hi, In the Solr Admin console, you can access the "Segment info" page for each core. You can see if there are more deleted documents in segments on server X. Dominique On Mon, Oct 8, 2018 at 07:29, SOLR4189 wrote: > About which details do you ask? Yesterday we restarted all our solr > ser

Re: Index size issue in SOLR-6.5.1

2018-10-07 Thread SOLR4189
About which details do you ask? Yesterday we restarted all our solr services and index size in serverX decreased from 82Gb to 60Gb, and in serverY index size didn't change (49Gb). -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Index size issue in SOLR-6.5.1

2018-10-07 Thread Dominique Bejean
Hi, What about the cores' segment details in the admin UI? More deleted documents? Regards Dominique On Sun, Oct 7, 2018 at 08:22, SOLR4189 wrote: > Hi all, > > We use SOLR-6.5.1 and we have very strange issue. In our collection index > size is very different from server to server (33gb

Re: Index Upgrader tool

2018-08-24 Thread Shawn Heisey
On 8/24/2018 12:44 AM, dami...@gmail.com wrote: Shawn, Is it possible to run optimize on the live collection? For example, /solr/collection/update?commit=true&optimize=true For all the reasons in the blog post that Erick referenced, we recommend that you do not do this. Something to note:  T

Re: Index Upgrader tool

2018-08-24 Thread Erick Erickson
Yes, it's possible to run optimize on a live index. I wouldn't though, see: https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/ Lucene has _never_ guaranteed proper functioning of an index created with version X-2 with version X. It hasn't been super-obvious, but

Re: Index Upgrader tool

2018-08-23 Thread damienk
Shawn, Is it possible to run optimize on the live collection? For example, /solr/collection/update?commit=true&optimize=true On Wed, 22 Aug 2018 at 06:50, Shawn Heisey wrote: > On 8/21/2018 2:29 AM, Artjoms Laivins wrote: > > We are running Solr cloud with 3 nodes v. 6.6.2 > > We started with ve

Re: Index Upgrader tool

2018-08-21 Thread Shawn Heisey
On 8/21/2018 2:29 AM, Artjoms Laivins wrote: We are running Solr cloud with 3 nodes v. 6.6.2 We started with version 5 so we have some old index that we need safely move over to v. 7 now. New data comes in several times per day. Our questions are: Should we run IndexUpgrader tool on one slave n

Re: Index protected zip

2018-05-29 Thread Cassandra Targett
Someone needs to update the Ref Guide. That can be a patch submitted on a JIRA issue, or a committer could forego a patch and make changes directly with commits. Otherwise, this wiki page is making a bad situation even worse. On Tue, May 29, 2018 at 12:06 PM Tim Allison wrote: > I’m happy to co

Re: Index protected zip

2018-05-29 Thread Tim Allison
I’m happy to contribute to this message in any way I can. Let me know how I can help. On Tue, May 29, 2018 at 2:31 PM Cassandra Targett wrote: > It's not as simple as a banner. Information was added to the wiki that does > not exist in the Ref Guide. > > Before you say "go look at the Ref Guide

Re: Index protected zip

2018-05-29 Thread Cassandra Targett
It's not as simple as a banner. Information was added to the wiki that does not exist in the Ref Guide. Before you say "go look at the Ref Guide" you need to make sure it says what you want it to say, and the creation of this page just 3 days ago indicates to me that the Ref Guide is missing somet

Re: Index protected zip

2018-05-29 Thread Erick Erickson
On further reflection, +1 to marking the Wiki page superseded by the reference guide. I'd be fine with putting a banner at the top of all the Wiki pages saying "check the Solr reference guide first" ;) On Tue, May 29, 2018 at 10:59 AM, Cassandra Targett wrote: > Couldn't the same information on t

Re: Index protected zip

2018-05-29 Thread Cassandra Targett
Couldn't the same information on that page be put into the Solr Ref Guide? I mean, if that's what we recommend, it should be documented officially that it's what we recommend. I mean, is anyone surprised people keep stumbling over this? Shawn's wiki page doesn't point to the Ref Guide (instead po

Re: Index protected zip

2018-05-26 Thread Erick Erickson
Thanks! now I can just record the URL and then paste it in ;) Who knows, maybe people will see it first too! On Sat, May 26, 2018 at 9:48 AM, Tim Allison wrote: > W00t! Thank you, Shawn! > > The "don't use ERH in production" response comes up frequently enough >> that I have created a wiki page

Re: Index protected zip

2018-05-26 Thread Tim Allison
W00t! Thank you, Shawn! The "don't use ERH in production" response comes up frequently enough > that I have created a wiki page we can use for responses: > > https://wiki.apache.org/solr/RecommendCustomIndexingWithTika > > Tim, you are extremely well-qualified to expand and correct this page. > Er

Re: Index protected zip

2018-05-26 Thread Shawn Heisey
On 5/26/2018 4:52 AM, Tim Allison wrote: Please see Erick Erickson’s evergreen advice and linked blog post: https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201805.mbox/%3ccan4yxve_0gn0a1y7wjpr27inuddo6+jzwwfgvzkfs40gh3r...@mail.gmail.com%3e The "don't use ERH in production" response

Re: Index protected zip

2018-05-26 Thread Tim Allison
On third thought, I can’t think of how you’d easily inject a PasswordProvider into Solr’s integration. Please see Erick Erickson’s evergreen advice and linked blog post: https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201805.mbox/%3ccan4yxve_0gn0a1y7wjpr27inuddo6+jzwwfgvzkfs40gh3r...@m

Re: Index protected zip

2018-05-26 Thread Tim Allison
You’ll need to provide a PasswordProvider in the ParseContext. I don’t think that is currently possible in the Solr integration. Please open a ticket if SolrJ doesn’t meet your needs. On Thu, May 24, 2018 at 1:03 PM Alexandre Rafalovitch wrote: > Hmm. If it works, then it is Tika magic. Which m

Re: Index protected zip

2018-05-24 Thread Alexandre Rafalovitch
Hmm. If it works, then it is Tika magic. Which may mean they may have a setting for passwords. Which would need to be configured and then exposed through Solr. So, I would check if you can extract text with Tika standalone first. Regards, Alex On Thu, May 24, 2018, 5:05 AM Dimitris Kardarako

Re: Index filename while indexing JSON file

2018-05-23 Thread Shawn Heisey
On 5/18/2018 1:47 PM, S.Ashwath wrote: > I have 2 directories: 1 with txt files and the other with corresponding > JSON (metadata) files (around 9 of each). There is one JSON file for > each CSV file, and they share the same name (they don't share any other > fields). > > The txt files just hav

Re: Index filename while indexing JSON file

2018-05-21 Thread Bernd Fehling
I don't know if DIH can solve your problem but I would go for a simple self-programmed ETL in Java and use SolrJ for loading. Best regards, Bernd On 18.05.2018 at 21:47, S.Ashwath wrote: Hello, I have 2 directories: 1 with txt files and the other with corresponding JSON (metadata) files (aro

Re: Index filename while indexing JSON file

2018-05-21 Thread S.Ashwath
Thanks Raymond. As I was doing the indexing of other delimited files directly with Solr and the terminal (without a client), I thought it would be possible to index the filename of JSON files this way as well. But like you say, I'm parsing the search results in Python. So I might as well build the

Re: Index filename while indexing JSON file

2018-05-20 Thread Raymond Xie
Would you consider including the filename as another metadata field to be indexed? I think your downstream Python can do that easily. Sincerely yours, Raymond On Fri, May 18, 2018 at 3:47 PM, S.Ashwath wrote: > Hello, > > I have 2
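
The suggestion above could look roughly like the following Python sketch, which loads one JSON metadata file and adds the file's base name as an extra field before indexing. It assumes each metadata file holds a single JSON object; the field name `filename` is a hypothetical choice.

```python
import json
from pathlib import Path

def load_with_filename(json_path, field="filename"):
    """Load one JSON metadata file and add its base name as an extra field."""
    path = Path(json_path)
    doc = json.loads(path.read_text(encoding="utf-8"))
    doc[field] = path.stem  # name without extension, shared with the txt file
    return doc
```

Calling this for every file in the metadata directory yields a list of documents that carry the shared name, so a search hit can be traced back to the matching txt file.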

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-19 Thread Alessandro Benedetti
Hi David, good to know that sorting solved your problem. I understand perfectly that given the urgency of your situation, having the solution ready takes priority over continuing with the investigations. I would recommend anyway to open a Jira issue in Apache Solr with all the information gathered

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-18 Thread Howe, David
Hi Erick & Alessandro, I have solved my problem by re-ordering the data in the SQL query. I don't know why it works but it does. I can consistently re-produce the problem without changing anything else except the database table. As our Solr build is scripted and we always build a new Solr s

Re: Index data from mysql DB to Solr - From Scratch

2018-02-17 Thread Gora Mohanty
On 18 February 2018 at 08:18, @Nandan@ wrote: > Thanks Rick. > Is it possible to get some demo learning video link or web links from > where I can get overview with real example? > By which I can able to know in more details. > Searching Google for "Solr index data database" turns up many links

Re: Index data from mysql DB to Solr - From Scratch

2018-02-17 Thread @Nandan@
Thanks Rick. Is it possible to get some demo learning video link or web links from where I can get overview with real example? By which I can able to know in more details. On Feb 18, 2018 4:11 AM, "Rick Leir" wrote: > Nandan > Work backwards from your results screen. When a user has done a sear

Re: Index data from mysql DB to Solr - From Scratch

2018-02-17 Thread Rick Leir
Nandan Work backwards from your results screen. When a user has done a search, what information would you like to appear on the screen? That tells you what your Solr document needs to contain. How will you get that information into the Solr document? You will do the SQL select(s) as necessary,

Re: Index data from mysql DB to Solr - From Scratch

2018-02-17 Thread @Nandan@
Hi David , Thanks for your reply. My few questions are :- 1) I have to denormalize my MySQL data manually or some process is there. 2) is it like when Data will insert into my MySQL , it has to auto index into solr ? Please explain these . Thanks On Feb 18, 2018 1:51 AM, "David Hastings" wrote:

Re: Index data from mysql DB to Solr - From Scratch

2018-02-17 Thread David Hastings
Your first step is to denormalize your data into a flat data structure. Then index that into your solr instance. Then you’re done. On Feb 17, 2018, at 12:16 PM, @Nandan@ wrote: Hi Team, I am working on one e-commerce project in which my data is storing int
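
A minimal Python sketch of the denormalization step described above follows. The `products` and `categories` rows and all column names are hypothetical stand-ins for whatever the SQL SELECTs return; the point is only that each output document is flat and self-contained.

```python
def denormalize(products, categories):
    """Join each product row with its category row into one flat document."""
    by_id = {c["id"]: c for c in categories}
    docs = []
    for p in products:
        cat = by_id.get(p["category_id"], {})
        docs.append({"id": p["id"], "name": p["name"],
                     "category_name": cat.get("name")})
    return docs

# Hypothetical rows, standing in for the results of SQL SELECTs.
products = [{"id": 1, "name": "Kettle", "category_id": 10}]
categories = [{"id": 10, "name": "Kitchen"}]
print(denormalize(products, categories))
```

In practice the same flattening is often done with a JOIN in the SQL query itself; doing it in the client keeps the example independent of any database.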

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Erick Erickson
I didn't mean to imply that _you'd_ changed things, the _defaults_ may have changed. So the "string" fieldType may be defined with docValues="true" in your new schema and "false" in your old schema without you intentionally changing anything at _all_. That's why the LukeRequestHandler will hel

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Howe, David
Hi Erick, I'm 99% sure that I haven't changed the field types between the two snapshots as all of my test runs are completely scripted and build a new Solr server from scratch (both the virtual machine and the Solr software). I can diff the scripts between two runs to make sure I haven't acci

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Erick Erickson
Well, I'm not entirely sure either ;) What I'm seeing. And, BTW, I'm making a couple of assumptions here. In the one listing, your biggest segment starts with _7l and in the other its _zd. The aggregate size is 2,815M for _7l and 705M for _zd. So multiplying the individual files in _zd by 4 (p

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Howe, David
Hi Erick, Thinking some more about the differences between the two sort orders has suggested another possibility. We also have a geo spatial field defined in the index: echo "$(date) Creating geoLocation field" curl -X POST -H 'Content-type:application/json' --data-binary '{ "add-fiel

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Howe, David
Hi Erick, Below is the file listing for when the index is loaded with the table ordered in a way that produces the smaller index. I have checked the console, and we have no deleted docs and we have the same number of docs in the index as there are rows in the staging table that we load from.

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Howe, David
Hi Alessandro, There are 14,061,990 records in the staging table and that is how many documents that we end up with in Solr. I would be surprised if we have a problem with the id, as we use the primary key of the table as the id in Solr so it must be unique. The primary key of the staging ta

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Alessandro Benedetti
It's a silly thing, but to confirm the direction that Erick is suggesting : How many rows in the DB ? If updates are happening on Solr ( causing the deletes), I would expect a greater number of documents in the DB than in the Solr index. Is the DB primary key ( if any) the same of the uniqueKey fie

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Howe, David
Hi Emir, We have no copy field definitions. To keep things simple, we have a one to one mapping between the columns in our staging table and the fields in our Solr index. Regards, David David Howe Java Domain Architect Postal Systems Level 16, 111 Bourke Street Melbourne VIC 3000 T 039106

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Emir Arnautović
Hi David, I skimmed through thread and don’t see if already eliminated, so will ask: Can you check if there are some copyField rules that are triggered when new field is added. You mentioned that ordering fixed the size of the index, but might be worth checking. Emir -- Monitoring - Log Managem

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-15 Thread Erick Erickson
This isn't terribly useful without a similar dump of "the other" index directory. The point is to compare the different extensions some segment where the sum of all the files in that segment is roughly equal. So if you have a listing of the old index around, that would help. bq: We don't have any

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-15 Thread Howe, David
Hi Erick, I have the full dump of the Solr index file sizes as well if that is of any help. I have attached it below this message. We don't have any deleted docs in our index, as we always build it from a brand new virtual machine with a brand new installation of Solr. The ordering is defini

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-15 Thread Erick Erickson
David: Rats, the cfs files make everything I'd hoped to understand with the sizes ambiguous, since they conceal the underlying sizes of each other extension. We can approach it a bit differently though. Take one segment that's _not_ in cfs format where the total size of all files making up that se
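
The comparison Erick describes can be scripted rather than read off a directory listing. The following is only a sketch, assuming the usual Lucene naming in which index files look like `_<segment>.<ext>` or `_<segment>_<suffix>.<ext>`; the function names `sizes_by_segment` and `compound_segments` are hypothetical helpers and are not part of Solr or Lucene.

```python
import os
from collections import defaultdict


def sizes_by_segment(index_dir):
    """Return {segment: {extension: total_bytes}} for one index directory.

    Assumes Lucene-style names such as _0.fdt or _0_Lucene50_0.doc.
    Files that do not start with "_" (segments_N, write.lock) are skipped.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if not name.startswith("_") or not os.path.isfile(path):
            continue
        stem, ext = os.path.splitext(name)
        # The segment name is the text between the leading "_" and the
        # next "_" (if any): "_4rnu_t" -> "_4rnu", "_0" -> "_0".
        segment = "_" + stem[1:].split("_", 1)[0]
        totals[segment][ext] += os.path.getsize(path)
    return totals


def compound_segments(totals):
    """List segments that contain a .cfs file, whose per-extension
    sizes are therefore hidden inside the compound file."""
    return sorted(seg for seg, exts in totals.items() if ".cfs" in exts)
```

Running `sizes_by_segment` on both the old and the new index directory, then comparing the per-extension totals for non-compound segments of similar overall size, should show which file type accounts for the growth.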

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-15 Thread Pratik Patel
@Alessandro I will see if I can reproduce the same issue just by turning off omitNorms on field type. I'll open another mail thread if required. Thanks.

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-15 Thread Howe, David
Hi Alessandro, Some interesting testing today that seems to have gotten me closer to what the issue is. When I run the version of the index that is working correctly against my database table that has the extra field in it, the index suddenly increases in size. This is even though the data i

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-15 Thread Alessandro Benedetti
@Pratik: you should have investigated. I understand that solved your issue, but in case you needed norms, it doesn't make sense that they cause your index to grow by a factor of 30. You must have faced a nasty bug if it was just the norms. @Howe: *Compound File* .cfs, .cfe An optional "virtua

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-14 Thread Howe, David
I have set docValues=false on all of the string fields in our index that have indexed=false and stored=true. This gave a small improvement in the index size from 13.3GB to 12.82GB. I have also tried

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-14 Thread Pratik Patel
You are right, in my case this field type was applied to many text fields. These include many copy fields and dynamic fields as well. In my case, only specifying omitNorms=true for field type "text_general" fixed the issue. I didn't do anything else or have any other bug. On Wed, Feb 14, 2018 at 1

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-14 Thread Alessandro Benedetti
Hi Pratik, how is it possible that just the norms for a single field were causing such a massive index size increment in your case? In your case I think it was for a field type used by multiple fields, but it's still suspicious in my opinion; norms shouldn't be that big. If I remember correctly in

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-14 Thread Erick Erickson

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-14 Thread Pratik Patel

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-13 Thread Howe, David

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-13 Thread Howe, David
Thanks Hoss. I will try setting docValues to false, as we only ever want to be able to retrieve the value of this field. Regards, David

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-13 Thread Howe, David
Hi Erick, Thanks for responding. You are correct that we don't have any deleted docs. When we want to re-index (once a fortnight), we build a brand new installation of Solr from scratch and re-import the new data into an empty index. I will try setting docValues to false and see if
