An npm package that provides an abstract class to scrape videos with Puppeteer.
To install video-scraper-core, run:
$ npm install video-scraper-core
This module was written because videos hosted on some websites can be watched only in the browser and are difficult to download. Even with browser tools, downloading the video can be difficult or impossible. A solution that always works is to take a screen recording while the video plays, but doing that manually is too time-consuming.
This is why I wrote this module, which uses puppeteer and puppeteer-stream under the hood to open a Google Chrome browser, play the video and record it.
The module is written in TypeScript, uses Webpack to reduce the bundle size (even though most of the size comes from the Puppeteer browser), uses euberlog for a scoped debug log and is highly configurable.
The module provides an abstract class that you can extend to create your own scraper. By overriding some simple methods, you can adapt the scraper to your needs.
The scraper calls afterPageLoaded once the page has loaded, for example if a login is needed.
An example to create a scraper for TumConf:
import { VideoScraperCore, ScrapingOptions, BrowserOptions } from 'video-scraper-core';
import { Page } from 'puppeteer';
import { Logger } from 'euberlog';

// Extend VideoScraperCore to create the scraper class
export class TumConfScraper extends VideoScraperCore {
    // The passcode used to login
    private readonly passcode: string;

    // The constructor that allows the passcode to be specified
    constructor(passcode: string, browserOptions: BrowserOptions) {
        super(browserOptions);
        this.passcode = passcode;
    }

    // The selector of the full screen button
    protected getFullScreenSelector(): string {
        return '.vjs-fullscreen-toggle-control-button';
    }

    // The selector of the play button
    protected getPlayButtonSelector(): string {
        return '.vjs-play-control';
    }

    // The selector of the video time duration
    protected getVideoDurationSelector(): string {
        return '.vjs-time-range-duration';
    }

    // After the page is loaded, login by using puppeteer
    protected async afterPageLoaded(_options: ScrapingOptions, page: Page, logger: Logger): Promise<void> {
        logger.debug('Putting the passcode to access the video');
        await page.waitForSelector('input#password');
        await page.$eval(
            'input#password',
            (el: HTMLInputElement, passcode: string) => (el.value = passcode),
            this.passcode
        );

        logger.debug('Clicking the button to access the video');
        await page.waitForSelector('.btn-primary.submit');
        await page.$eval('.btn-primary.submit', (button: HTMLButtonElement) => button.click());
    }
}

async function main() {
    // Create an instance of the scraper
    const scraper = new TumConfScraper('mypasscode', { debug: true });
    // Launch the Chrome browser
    await scraper.launch();
    // Scrape and save the video
    await scraper.scrape('https://videourl.com', './saved.webm');
    // Close the browser
    await scraper.close();
}

main();
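The example above closes the browser only when scraping succeeds. A minimal sketch of a wrapper that always closes it follows. It assumes only the launch and close methods used above, and assumes that both return promises. The Launchable interface and the withBrowser helper are hypothetical names and are not part of video-scraper-core.

```typescript
// Hypothetical structural type: anything exposing the launch() and close()
// methods used in the example above (not exported by video-scraper-core).
interface Launchable {
    launch(): Promise<void>;
    close(): Promise<void>;
}

// Hypothetical helper: launches the browser, runs the task, and closes the
// browser even if the task throws. The error, if any, is rethrown to the caller.
async function withBrowser<S extends Launchable, T>(
    scraper: S,
    task: (scraper: S) => Promise<T>
): Promise<T> {
    await scraper.launch();
    try {
        return await task(scraper);
    } finally {
        await scraper.close();
    }
}
```

With the TumConfScraper above, this could be called as withBrowser(scraper, s => s.scrape('https://videourl.com', './saved.webm')).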
The documentation site is: video-scraper-core documentation
The documentation site for development is: video-scraper-core dev documentation
The VideoScraperCore class can be extended to scrape a video from a website and save it to a file.
Constructor:
VideoScraperCore(options)
Parameters:
options: the BrowserOptions object that specifies the options for this instance.
Public methods:
[...] options parameter.
scrape: scrapes the video at url and saves it to destPath. Some ScrapingOptions can be passed.
Protected methods:
Protected and abstract methods:
BrowserOptions: the options given to the VideoScraperCore constructor.
Parameters:
Default: false. If true, the debug log is shown.
Default: 'VideoScraperCore'. The scope given to the euberlog debug logger.
Default: '/usr/bin/google-chrome'. The path to the browser executable.
Default: { width: 1920, height: 1080 }. The size of the browser window.
ScrapingOptions: the options given to a scrape method.
Parameters:
Default: null. The duration in milliseconds of the recorded video.
Default: 0. The delay in milliseconds to wait after the play button has been clicked.
Default: 15_000. The delay in milliseconds between the end of the duration and the moment the recording is stopped.
Default: false. If true, the video is recorded after it has been put in fullscreen.
Default: true. If true, the audio is recorded.
Default: true. If true, the video is recorded.
Default: 'video/webm'. The mimetype of the recorded video or audio.
Default: undefined. The chosen bitrate for the audio component of the media. If not specified, it is adaptive, depending on the sample rate and the number of channels.
Default: undefined. The chosen bitrate for the video component of the media. If not specified, the rate is 2.5 Mbps.
Default: 20. The number of milliseconds to record into each packet.
Default: true. If true, the global logger is used, ignoring the other debug options in this object.
Default: null. If null, whether the debug is shown is decided by the passed BrowserOptions. Otherwise, if useGlobalDebug is false, this specifies whether the debug is shown.
Default: null. If useGlobalDebug is true, this is ignored. Otherwise, this specifies the euberlog logger scope for the debug of this scrape.
There are also some error classes that can be thrown by this module:
The default browser executable is /usr/bin/google-chrome, because Chromium did not support the BBB videos. You can always change the browser executable path in the configuration.
If the duration is not specified (null), the duration of the recording is detected automatically from the vjs player of the page, and a stopping delay of 15 seconds is added.
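The automatic duration detection described above can be pictured as reading the duration label of the player and adding the stopping delay. This is a rough sketch, assuming the label is text in the form m:ss or h:mm:ss. The function names are hypothetical and this is not the module's actual implementation.

```typescript
// Hypothetical helper: converts a player duration label such as "1:02:03"
// or "12:34" into milliseconds. Throws on text it cannot parse.
function parseDurationLabel(label: string): number {
    const parts = label.trim().split(':');
    if (parts.length < 2 || parts.length > 3 || parts.some(part => !/^\d+$/.test(part))) {
        throw new Error(`Unrecognised duration label: "${label}"`);
    }
    // Fold hours, minutes and seconds into a total number of seconds.
    const seconds = parts.reduce((total, part) => total * 60 + Number(part), 0);
    return seconds * 1000;
}

// Hypothetical helper: total recording time when no explicit duration is given.
// The 15 000 ms default mirrors the stopping delay documented above.
function recordingTimeMs(label: string, stoppingDelayMs = 15_000): number {
    return parseDurationLabel(label) + stoppingDelayMs;
}

console.log(recordingTimeMs('12:34')); // 769000
```

A label of 12:34 is 754 seconds, so with the 15 second stopping delay the recording would run for 769 000 ms.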