I can't see any problem with that, but when talking about commits I'd like to distinguish between "hard" and "soft" commits.
Hard commit -> durability
Soft commit -> visibility

I suggest this interesting read:
https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
It's an old but interesting post by Erick, and it explains the differences between the commit types better than I can.
I would put you in this scenario:

> Heavy (bulk) indexing
>
> The assumption here is that you’re interested in getting lots of data to
> the index as quickly as possible for search sometime in the future. I’m
> thinking original loads of a data source etc.
>
> - Set your soft commit interval quite long. As in 10 minutes or even
>   longer (-1 for no soft commits at all). *Soft commit is about
>   visibility,* and my assumption here is that bulk indexing isn’t about
>   near real time searching so don’t do the extra work of opening any kind of
>   searcher.
> - Set your hard commit intervals to 15 seconds, openSearcher=false.
>   Again the assumption is that you’re going to be just blasting data at Solr.
>   The worst case here is that you restart your system and have to replay 15
>   seconds or so of data from your tlog. If your system is bouncing up and
>   down more often than that, fix the reason for that first.
> - Only after you’ve tried the simple things should you consider
>   refinements, they’re usually only required in unusual circumstances. But
>   they include:
>   - Turning off the tlog completely for the bulk-load operation
>   - Indexing offline with some kind of map-reduce process
>   - Only having a leader per shard, no replicas for the load, then
>     turning on replicas later and letting them do old-style replication to
>     catch up. Note that this is automatic, if the node discovers it is “too
>     far” out of sync with the leader, it initiates an old-style replication.
>     After it has caught up, it’ll get documents as they’re indexed to the
>     leader and keep its own tlog.
>   - etc.

Actually, you could do the commit only at the end, but I can't see any advantage in that.
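To make the bulk-indexing settings above concrete, here is a minimal sketch that applies them through the Config API (available since Solr 5) instead of editing solrconfig.xml by hand. It assumes Solr listens on localhost:8983 with a core named mydb (the name Bruno uses in the thread), and that the updateHandler.* properties are editable through `set-property` on your version. The `set_prop` and `bulk_commit_settings` helpers and the DRY_RUN switch are hypothetical, only there so you can inspect the requests before sending them.

```shell
#!/usr/bin/env bash
# Minimal sketch, not an official recipe: apply the "heavy (bulk) indexing"
# commit settings through the Solr Config API.
# Assumptions: Solr 5.x at localhost:8983 with core "mydb" (override SOLR_URL).
# Set DRY_RUN=1 to print the requests instead of sending them.

SOLR_URL="${SOLR_URL:-http://localhost:8983/solr/mydb}"

# Hypothetical helper: send (or, with DRY_RUN=1, only print) one
# Config API set-property command.
set_prop() {
  local payload="{\"set-property\": {\"$1\": $2}}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "POST $SOLR_URL/config $payload"
  else
    curl -s "$SOLR_URL/config" -H 'Content-type:application/json' -d "$payload"
  fi
}

# Hypothetical helper bundling the bulk-indexing scenario from the post.
bulk_commit_settings() {
  # Hard commit every 15 s (durability) without opening a new searcher.
  set_prop updateHandler.autoCommit.maxTime 15000
  set_prop updateHandler.autoCommit.openSearcher false
  # -1 disables automatic soft commits (visibility) during the load.
  set_prop updateHandler.autoSoftCommit.maxTime -1
}
```

Run `DRY_RUN=1 bulk_commit_settings` first to check the three requests, then `bulk_commit_settings` to send them. Once the load is done, you would set the soft commit interval back to whatever your search traffic needs; the same `set_prop` helper works for that.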
I suggest you play with the auto hard/soft commit configuration to get a better idea of the situation!

Cheers

2015-06-05 16:08 GMT+01:00 Bruno Mannina <bmann...@free.fr>:

> Hi Alessandro,
>
> I'm currently on my dev computer, so I would like to post 1 000 000 XML
> files (with a structure defined in my schema.xml).
>
> I have already imported 1 000 000 XML files by using
> bin/post -c mydb /DATA0/1 /DATA0/2 /DATA0/3 /DATA0/4 /DATA0/5
> where /DATA0/X contains 20 000 XML files (I do it 20 times by just
> changing X from 1 to 50).
>
> Now I would like to do
> bin/post -c mydb /DATA1
>
> I would like to know if my Solr 5 will run fine and not throw a memory
> error because there are too many files in one post without a commit.
>
> The commit will be done at the end of the 1 000 000 files.
>
> Is that OK?
>
> On 05/06/2015 16:59, Alessandro Benedetti wrote:
>
>> Hi Bruno,
>> I can't see what your challenge is.
>> Of course you can index your data in whatever flavour you want and commit
>> whenever you want…
>> Are those XML files in Solr XML format?
>> If not, you would need to use the DIH, the extracting update handler or
>> a custom indexer application.
>> Maybe I missed your point…
>> Give me more details please!
>>
>> Cheers
>>
>> 2015-06-05 15:41 GMT+01:00 Bruno Mannina <bmann...@free.fr>:
>>
>>> Dear Solr Users,
>>>
>>> I would like to post 1 000 000 records (1 record = 1 file) in one shot
>>> and do the commit at the end.
>>>
>>> Is it possible to do that?
>>>
>>> I have several directories, each with 20 000 files inside.
>>> I would like to do:
>>> bin/post -c mydb /DATA
>>>
>>> Under DATA I have
>>> /DATA/1/*.xml (20 000 files)
>>> /DATA/2/*.xml (20 000 files)
>>> /DATA/3/*.xml (20 000 files)
>>> ....
>>> /DATA/50/*.xml (20 000 files)
>>>
>>> Currently, I post 5 directories at a time (it takes around 1h30 for
>>> 100 000 records/files).
>>>
>>> But it's Friday and I would like to let it run on its own over the weekend.
>>>
>>> Thanks for your comment,
>>>
>>> Bruno
>>>
>>> ---
>>> This email contains no viruses or malware because avast! Antivirus
>>> protection is active.
>>> https://www.avast.com/antivirus
>>
>

--
--------------------------
Benedetti Alessandro
Visiting card: http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
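For Bruno's weekend run, a minimal sketch of the "post everything, commit once at the end" approach could look like the script below: it posts /DATA/1 ... /DATA/50 one directory at a time and sends a single explicit commit once all of them are in. Assumptions, all worth checking on your install: `-commit no` is forwarded by bin/post to the underlying post tool's commit=yes|no setting (see `bin/post -h`), Solr runs on localhost:8983 with core mydb, and `run`/`bulk_post` are hypothetical helpers. With DRY_RUN=1 the commands are only printed.

```shell
#!/usr/bin/env bash
# Minimal sketch: post one directory at a time without a per-run commit,
# then send a single hard commit at the end.
# Assumptions: bin/post accepts "-commit no" (forwarded to the post tool),
# Solr at localhost:8983 with core "mydb" (override SOLR_URL).
# Set DRY_RUN=1 to print the commands instead of running them.

SOLR_URL="${SOLR_URL:-http://localhost:8983/solr/mydb}"

# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: bulk_post <root> <first> <last>
bulk_post() {
  local root="$1" first="$2" last="$3"
  local i="$first"
  while [ "$i" -le "$last" ]; do
    # Stop at the first failed directory so it can be re-posted by hand.
    run bin/post -c mydb -commit no "$root/$i" || return 1
    i=$((i + 1))
  done
  # One explicit hard commit (default openSearcher=true) makes everything visible.
  run curl -s "$SOLR_URL/update?commit=true"
}
```

Usage: `bulk_post /DATA 1 50`. If the 15-second hard autoCommit with openSearcher=false is also active, the tlog is flushed regularly during the run, and the final explicit commit is what opens a new searcher so the documents become searchable.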