On Tue 07 Apr 2020 at 19:11:57 (+0300), Anastasios Lisgaras wrote:
> Youtube-dl <https://github.com/ytdl-org/youtube-dl> is indeed a
> powerful and very good software for this job with many features and
> options, but can you download videos *from anywhere ?*
>
> What I want to say is that there are many web pages which greatly
> hinder (prohibit) this possibility.
> In this case, what can we do? Can we always find the hidden link
> (source) of the video? If so, how?
> If the page requires you to be logged in, what can we do?

I'm not sure what the implications are of having to log in to a site.
But in general you need different tools for different web sites. The
BBC iPlayer and youtube-dl are two such tools, and sometimes a download
link is even available, which either the browser or wget can use (the
latter preserving the metadata).

Where videos exist in their entirety, some sites still play them by
downloading to a temporary file (and you can see the download in the
progress bar, ahead of what's actually playing). A technique there is
to examine /proc/N/fd where N is the process number of the browser tab.
(The process name used to be xul-runner, Web Content etc, and looks as
if it's currently /usr/lib/firefox-esr/firefox-esr -contentproc.)
If you find an fd number F that's pointing to a file (deleted) in /tmp,
then try copying that /proc/N/fd/F (following links). Do it when the
download progress bar has reached the end, but the file is still
playing. (Sometimes everything disappears as soon as the end is
reached.)

Another technique is where the source is streaming (and might be
open-ended). Here, the video can end up as fragments in your browser
cache. How you handle them depends on whether audio and video are
combined or in two separate streams, and whether they are timestamped.
Some are, some aren't. Timestamped fragments are relatively easy to
reassemble with ffprobe to read the timings and ffmpeg to concatenate
the pieces (and merge audio/video if necessary).

Where there's no internal timestamping, you can sometimes rely on the
filesystem's own timestamps to figure out the correct ordering. But I
prefer to run a script that watches files in the cache as they are
closed (with inotifywait), and immediately copies them out (if the
filetype is of interest) with a sequence number and the file type in
the filename. The relevant segments can then be concatenated quite
easily.
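
A minimal Python sketch of such a watcher might look like the
following. It is not the script referred to above, only an illustration
under some assumptions: inotifywait (from inotify-tools) is installed,
the function names and the six-digit counter are made up for the
example, and the file type is guessed from a few leading bytes instead
of by calling the file program.

```python
import shutil
import subprocess
from pathlib import Path


def guess_kind(path):
    """Return a short type label from the file's leading bytes, or None."""
    with open(path, "rb") as f:
        head = f.read(12)
    if head[:2] == b"G@":                          # MPEG transport stream packet
        return "ts"
    if head[4:8] in (b"ftyp", b"styp", b"moof"):   # ISO BMFF (MP4) box types
        return "mp4"
    if head[:4] == b"\x1a\x45\xdf\xa3":            # EBML header (WebM/Matroska)
        return "webm"
    return None                                    # not a type of interest


def copy_fragment(src, dest_dir, seq):
    """Copy src to dest_dir as NNNNNN.kind; return the new path, or None if skipped."""
    kind = guess_kind(src)
    if kind is None:
        return None
    dest = Path(dest_dir) / f"{seq:06d}.{kind}"
    shutil.copy2(src, dest)                        # copy2 also keeps the mtime
    return dest


def watch(cache_dir, dest_dir):
    """Copy out each interesting file as soon as it is closed after writing."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    cmd = ["inotifywait", "-m", "-r", "-q", "-e", "close_write",
           "--format", "%w%f", str(cache_dir)]
    seq = 1
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:                   # one full path per event
            src = line.rstrip("\n")
            try:
                if copy_fragment(src, dest_dir, seq) is not None:
                    seq += 1                       # only count files we kept
            except OSError:
                continue                           # file vanished before the copy
```

Calling watch() with the browser's cache directory and an output
directory (both paths are yours to supply) runs until inotifywait is
stopped. The copies then sort by name in the order they were closed.
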
A time format of %Y%m%d-%H%M%S works well as a more meaningful sequence
number, particularly if you append %N to include nanoseconds for the
necessary time resolution.

Be aware that the fragments in your cache might not all be identified
by the file program's defaults. For example, I use

    0 string G@ TS transport stream

in ~/.magic to pick up files that file might otherwise label as 'data'.

Sometimes, even then, you have to use a little ingenuity for the quiet
life: e.g. there's a UK railway site that has three webcams (two
stations and the yard) which run simultaneously on the same web page.
Fortunately, each webcam runs with a different frame speed, so it's
quick and easy to distinguish their files and divide them up.

Finally, when all else fails, and if you've read this far, you can just
capture the screen contents with ffmpeg's x11grab and record it to an
mpg file. The disadvantages are that you capture extraneous screen
decorations, and you've got to dedicate the whole screen to watching
the video, remembering to increase your blanking timeout too. If you
can only record audio through the microphone, you get more extraneous
rubbish there too.

Cheers,
David.