https://bugs.kde.org/show_bug.cgi?id=420939
--- Comment #48 from Scott <shagooser...@gmail.com> ---
To try to understand better what is going on, I decided to delete the index file and start from scratch. I have also piped the terminal output to a file to help me do this. I will list the issues as I come across them.

1/ Baloo indexer terminated again, but I was not able to regain the prompt by pressing enter.

2/ I think I may have confused the issue of indexing with displaying duration by using the two terms interchangeably. I have now discovered that some files are indexed but do not display a duration. The following file appears in the indexed list but there is no duration displayed:

    Indexing: /mnt/pool/Entertainment/Documentaries/Movies/Battle of Jutland the Navys Bloodiest Day 2016.ts: Ok

    xdg-mime query filetype "Battle of Jutland the Navys Bloodiest Day 2016".ts
    = video/mp2t

    balooshow -x "Battle of Jutland the Navys Bloodiest Day 2016".ts
    = f8c500000031 49 63685 Battle of Jutland the Navys Bloodiest Day 2016.ts: No index information found

3/ I then did a balooctl disable > balooctl enable and compared what was indexed this time:

    First index:  5491 files indexed
    Second index: 5345 files indexed

The 2 files seem totally unrelated, with both significant overlap and significant differences. File size seems to be unrelated: some large files are indexed and many small files are not.

4/ The Jutland film was indexed on both occasions.

    stat "Battle of Jutland the Navys Bloodiest Day 2016".ts
      File: Battle of Jutland the Navys Bloodiest Day 2016.ts
      Size: 1299907200   Blocks: 2538896   IO Block: 4096   regular file
    Device: 31h/49d   Inode: 70217   Links: 1
    Access: (0775/-rwxrwxr-x)  Uid: ( 1001/ shagoo)   Gid: ( 1002/shagadmin)
    Access: 2021-08-07 08:50:24.837378054 +0800
    Modify: 2016-06-03 11:55:55.180896791 +0800
    Change: 2021-08-02 20:01:39.015217906 +0800

5/ This seems to be your "red flag":

    baloosearch -i "Battle of Jutland the Navys Bloodiest Day 2016".ts
    f3f600000031 /mnt/pool/Entertainment/Documentaries/Movies/Battle of Jutland the Navys Bloodiest Day 2016.ts
    f8c500000031 /mnt/pool/Entertainment/Documentaries/Movies/Battle of Jutland the Navys Bloodiest Day 2016.ts
    Elapsed: 0.997359 msecs

6/ No duration was displayed for this file on the first indexing or the second. These results were obtained consecutively without any other programs being run or re-booting the computer.

On Fri, Aug 6, 2021 at 4:11 PM <bugzilla_nore...@kde.org> wrote:
> https://bugs.kde.org/show_bug.cgi?id=420939
>
> --- Comment #47 from tagwer...@innerjoin.org ---
> (In reply to Scott from comment #46)
>
> No problem, we carry on troubleshooting.
>
> > I think the problem is more than just misidentifying mime types.
> Finding out about the mimetypes and that baloo would never attempt to index
> some files was one step along the way. Good to find out but there's more
> to do.
>
> > 3/ Further it reports files waiting to be indexed and files failed to index
> > both being zero when in fact approximately 1,000 of the 6,000 files in the
> > dataset have not been indexed. I have restarted baloo repeatedly and they
> > never get indexed, it re-indexes what it had before.
> It's possible that we've got another mimetype issue with these files, or they
> are your 1000 biggest files, or something else. I think you should copy one of
> them to your home directory and check with
>
>     xdg-mime query filetype ...newstrangefile...
>
> Check that the mimetype is sensible, then see what
>
>     balooshow -x ...newstrangefile...
>
> says.
>
> > 1/ baloo terminates during indexing for unknown reasons (not
> > hanging/freezing as I erroneously stated previously) without providing a
> > reason code.
> I'll ask a bit more about this. Your "balooctl status" output says
>
> > Baloo File Indexer is running
> > Indexer state: Idle
> That's what baloo says when it's alive and thinks it has nothing more to do.
> There is the content indexer process "baloo_file_extractor" that is run when
> there is indexing necessary, does its job, saves the results, stops and is run
> again when there is more to do. This would/should happen in the background and
> you wouldn't see exit codes.
>
> > 2/ On restarting the indexing baloo re-indexes the same files with an
> > erroneous message that the files have changed (see my last email) or added
> > with baloo being turned off. Baloo is not checking that these index entries
> > already exist or there is some problem with the index file itself and so
> > just duplicates them which is why baloo reports over 21,000 files indexed
> > from a dataset only containing 6,000 entries.
> The error message is:
>
> > ... id seems to have changed. Perhaps baloo was not running, and this
> > file was deleted + re-created
> Need to check the Id and see if it is really changing. Ask with "stat", you'll
> get something like:
>
>     $ stat 1.ts
>       File: 1.ts
>       Size: 41416704   Blocks: 80896   IO Block: 4096   regular file
>     Device: fc01h/64513d   Inode: 794964   Links: 1
>     Access: (0664/-rw-rw-r--)  Uid: ( 1000/ test)   Gid: ( 1000/ test)
>     Access: 2021-07-24 22:50:57.838161084 +0200
>     Modify: 2021-07-24 22:50:57.838161084 +0200
>     Change: 2021-07-24 22:51:42.686181710 +0200
>      Birth: -
>
> It's the "Device" and "Inode" numbers that you need to keep your eye on.
> The:
>
>     Device: fc01h/64513d   Inode: 794964
>
> If you reboot and these change, baloo will think it's got a new file and try to
> index it again. Keep a note of the numbers, check again after a reboot and
> compare.
>
> You could also try a baloosearch for one of the files that always seems to be
> reindexed
>
>     $ baloosearch -i ...oneofyoursavedfiles...
>
> If you are OK, baloosearch will give a single result; if the id has been
> changing, "baloosearch -i" would show several lines - with different ID numbers
> and the same file/pathname. Something like:
>
>     $ baloosearch -i testfile
>     9ca00000028 /home/test/testfile
>     9ca0000002a /home/test/testfile
>     9ca0000002c /home/test/testfile
>
> That would be a red flag...
>
> > I had to disable baloo because it somehow seriously interferes with my
> > ability to move files from the admin PC to the server. With baloo running
> > on the server any attempt to transfer files to it results in very slow
> > transfer speeds and on occasion failure to complete the move and this is
> > occurring while the indexer is reporting idle.
> I can only guess where - but you are indexing *really* large files, and there
> were a couple of fixes two months ago to stop a Mime lookup reading the whole
> file into memory. Bug 398908, fixed according to
> https://bugs.kde.org/show_bug.cgi?id=398908#c97
> with 5.83. If you don't have this version, maybe the best thing to do is wait
> until it gets to you with an update.
>
> --
> You are receiving this mail because:
> You reported the bug.

--
You are receiving this mail because:
You are watching all bug changes.
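
The "red flag" check from comment #47 (several IDs listed for the same path by "baloosearch -i") can be sketched as a small Python script. This is only an illustration, not part of Baloo: the function names are made up here, and the ID layout (low 32 bits = device number, remaining high bits = inode) is an assumption inferred from the "balooshow -x" line above, where f8c500000031 is printed next to device 49 and inode 63685.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Hypothetical helper names; nothing here is a Baloo API.
HEX_ID = re.compile(r"[0-9a-f]+")


def decode_doc_id(doc_id_hex: str) -> tuple[int, int]:
    """Split a hex document ID into (device, inode).

    ASSUMPTION: low 32 bits hold the device number and the bits above
    hold the inode, as suggested by "f8c500000031 49 63685" in the thread.
    """
    value = int(doc_id_hex, 16)
    device = value & 0xFFFFFFFF
    inode = value >> 32
    return device, inode


def find_duplicate_paths(search_output: str) -> dict[str, list[str]]:
    """Return the paths that appear with more than one ID in the
    pasted text output of "baloosearch -i"."""
    ids_by_path: dict[str, list[str]] = defaultdict(list)
    for line in search_output.splitlines():
        doc_id, sep, path = line.strip().partition(" ")
        # Skip lines that are not "<hex id> <path>", e.g. "Elapsed: ..."
        if not sep or not HEX_ID.fullmatch(doc_id):
            continue
        ids_by_path[path].append(doc_id)
    return {p: ids for p, ids in ids_by_path.items() if len(ids) > 1}


# Output copied from item 5/ of comment #48.
sample = """\
f3f600000031 /mnt/pool/Entertainment/Documentaries/Movies/Battle of Jutland the Navys Bloodiest Day 2016.ts
f8c500000031 /mnt/pool/Entertainment/Documentaries/Movies/Battle of Jutland the Navys Bloodiest Day 2016.ts
Elapsed: 0.997359 msecs"""

for path, ids in find_duplicate_paths(sample).items():
    print(path)
    for doc_id in ids:
        device, inode = decode_doc_id(doc_id)
        print(f"  {doc_id}: device={device} inode={inode}")
```

If the assumed layout is right, the two IDs decode to device 49 with inodes 62454 and 63685, while "stat" in item 4/ reported inode 70217 on the same device. That would be consistent with the inode changing between indexing runs, but it is a reading of the pasted output under that assumption, not a confirmed diagnosis.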