JavaScript Development Space

Download Images From Instagram Using NodeJS and Puppeteer

22 March 2024, 4 min read

This article explains how to use Google's Puppeteer to download images from Instagram.

Downloading images from Instagram with Node.js and Puppeteer means automating three steps: navigating to Instagram, locating the desired images, and saving them to your local machine. This article walks through a basic example.

As a target, we'll use posts from Kim Kardashian's Instagram account (@kimkardashian).

What is Puppeteer?

Puppeteer is a Node.js library developed by Google that provides a high-level API over the Chrome DevTools Protocol. It allows you to control and automate Chromium or Chrome browser instances, enabling tasks such as web scraping, automated testing, taking screenshots, generating PDFs, and more.

Puppeteer provides a powerful set of features for interacting with web pages programmatically.

Set Up the Application

Step 1: Install Dependencies

First, create a Puppeteer configuration file, then install the library.

Create a file named .puppeteerrc.cjs:

js
const { join } = require('path');

/**
 * @type {import("puppeteer").Configuration}
 */
module.exports = {
  // Changes the cache location for Puppeteer.
  cacheDirectory: join(__dirname, '.cache', 'puppeteer'),
};

Now install Puppeteer:

npm install puppeteer

Add to your package.json file:

json
"type": "module"

Step 2: Test Puppeteer

We will attempt to create a screenshot using Puppeteer of a random post by Kim Kardashian (https://www.instagram.com/kimkardashian/p/C4lwwOYSpW-/?hl=en&img_index=1).

Create a JavaScript file, for example downloadInstagramImages.js, and write a short script to check that Puppeteer is working properly:

js
import puppeteer from 'puppeteer';

async function run() {
  // Launch a headless browser and open a new tab.
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto('https://www.instagram.com/kimkardashian/p/C4lwwOYSpW-/?hl=en&img_index=1');
  // Wait until the page has rendered a <section> element.
  await page.waitForSelector('section');
  await page.setViewport({ width: 1080, height: 1024 });
  // Save a screenshot of the whole page, not only the visible part.
  await page.screenshot({ path: 'screen.png', fullPage: true });
  await browser.close();
}

run();

Now run the code:

node downloadInstagramImages.js

The resulting screen.png looks like this:

[Image: Instagram screenshot using Puppeteer]
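The test script assumes that a section element always appears. If it does not (for example, because the page shows something other than the post), page.waitForSelector rejects after its timeout and the script stops with an error. A minimal sketch of a more forgiving wait follows; the helper name waitForPost and its default values are assumptions made for illustration, and page is expected to be a Puppeteer page like the one created above.

```javascript
// Hypothetical helper (not part of Puppeteer): wait for a selector and
// report success or failure instead of letting the error escape.
async function waitForPost(page, selector = 'section', timeoutMs = 15000) {
  try {
    // waitForSelector rejects if the selector does not appear within `timeout` ms.
    await page.waitForSelector(selector, { timeout: timeoutMs });
    return true;
  } catch (error) {
    console.error(`"${selector}" did not appear within ${timeoutMs} ms:`, error.message);
    return false;
  }
}
```

Inside run(), this could be used as `if (!(await waitForPost(page))) { await browser.close(); return; }` so that the browser is always closed, even when the page does not load as expected.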

Step 3: Create Helper Functions

We need two functions: one to download an image from a source URL, and another to check whether the destination folder exists and create it if it does not.

Function that checks whether the destination folder exists

js
const checkIfDirExists = (directory) => {
  return new Promise((resolve, reject) => {
    fs.access(directory, fs.constants.F_OK, (err) => {
      if (err) {
        // Directory doesn't exist, create it
        fs.mkdir(directory, { recursive: true }, (mkdirErr) => {
          if (mkdirErr) {
            console.error('Error creating directory:', mkdirErr);
            reject(mkdirErr);
          } else {
            console.log('Directory created successfully');
            resolve();
          }
        });
      } else {
        console.log('Directory already exists');
        resolve();
      }
      // Note: no unconditional resolve() here, otherwise the promise
      // would settle before fs.mkdir has finished.
    });
  });
};

You can also make sure the directory exists in other ways, for example by calling fs.mkdir with the recursive option directly, without checking first.
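One such alternative, sketched here as an assumption rather than as the article's method, relies on the promise-based fs API. With recursive: true, mkdir creates any missing parent folders and does not fail when the directory is already there, so no separate existence check is needed. The name ensureDir is hypothetical.

```javascript
import { mkdir } from 'node:fs/promises';

// Hypothetical alternative to checkIfDirExists.
async function ensureDir(directory) {
  // With `recursive: true`, mkdir creates missing parent folders and
  // resolves without an error if the directory already exists.
  await mkdir(directory, { recursive: true });
  return directory;
}
```

It can be awaited in the same place as the callback-based version, for example `await ensureDir('images')` before writing any files.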

Download function

js
const download = (url, destination) => {
  return new Promise((resolve, reject) => {
    checkIfDirExists('images')
      .then(() => {
        const file = fs.createWriteStream(destination);

        https
          .get(url, (response) => {
            // Stream the response body straight into the file.
            response.pipe(file);

            file.on('finish', () => {
              // Resolve only after the file has been closed.
              file.close(() => resolve(true));
            });
          })
          .on('error', (error) => {
            // Remove the partial file, then report the failure.
            fs.unlink(destination, () => reject(error));
          });
      })
      // Propagate a failure to create the destination folder.
      .catch(reject);
  });
};

Add new imports at the top of the file:

js
import fs from 'fs';
import https from 'https';
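The https-based download function writes whatever the server returns, including the body of an error response. As a hedged alternative, the sketch below uses the global fetch available in Node.js 18 and later and checks the HTTP status before writing. The name downloadWithFetch and the fetchImpl parameter are assumptions introduced for illustration (the parameter only makes the function easy to try without network access), and the destination folder is expected to exist already.

```javascript
import { writeFile } from 'node:fs/promises';

// Hypothetical variant of download(). `fetchImpl` is an illustrative
// parameter that defaults to the global fetch of Node.js 18+.
async function downloadWithFetch(url, destination, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    // Do not save error pages as if they were images.
    throw new Error(`Download failed with HTTP status ${response.status}: ${url}`);
  }
  // Read the whole body into memory, then write it in one step.
  const bytes = Buffer.from(await response.arrayBuffer());
  await writeFile(destination, bytes);
  return bytes.length; // number of bytes written
}
```

Reading the whole body into memory keeps the sketch short; for very large files, streaming as in the https version may be the better choice.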

Step 4: Write the Run Function

js
async function run() {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto('https://www.instagram.com/kimkardashian/p/C4lwwOYSpW-/?hl=en&img_index=1');
  await page.waitForSelector('section');
  await page.setViewport({ width: 1080, height: 1024 });
  await page.screenshot({ path: 'screen.png', fullPage: true });
  // Links inside the post; collected for reference, not used by the download step.
  const links = await page.evaluate(() =>
    Array.from(document.querySelectorAll('article a'), (el) => el.href),
  );
  // Image URL, alt text and file name for each image in the post.
  // Class names such as _aagv come from Instagram's markup and may change.
  const images = await page.evaluate(() =>
    Array.from(document.querySelectorAll('article div[role=button] div._aagv img'), (img) => {
      return {
        imgUrl: img.src,
        alt: img.alt,
        // File name: everything after the last "/" up to and including ".jpg".
        slug: img.src.slice(img.src.lastIndexOf('/') + 1, img.src.lastIndexOf('.jpg') + 4),
      };
    }),
  );

  await browser.close();
  // Wait for every download to finish before run() resolves.
  await Promise.all(images.map((img) => download(img.imgUrl, 'images/' + img.slug)));
}

run().catch((error) => console.error(error));
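The slug expression in run() assumes that every image URL contains ".jpg" and that the last "/" belongs to the path. When either assumption fails (another extension, or a slash inside the query string), the slice yields an empty or wrong name. A minimal sketch of a more tolerant approach, using the standard URL class; the helper name fileNameFromUrl and its fallback value are hypothetical, and the example.com addresses are placeholders.

```javascript
// Hypothetical helper: derive a file name from an image URL.
function fileNameFromUrl(imgUrl, fallback = 'image.jpg') {
  // URL.pathname excludes the query string and the fragment.
  const { pathname } = new URL(imgUrl);
  const name = pathname.slice(pathname.lastIndexOf('/') + 1);
  // Fall back to a default when the path ends with "/".
  return name || fallback;
}

console.log(fileNameFromUrl('https://cdn.example.com/v/t51/12345_n.jpg?stp=dst-jpg&cat=1'));
// prints "12345_n.jpg"
console.log(fileNameFromUrl('https://cdn.example.com/'));
// prints "image.jpg"
```

Because page.evaluate runs its callback inside the browser, a helper like this would be applied in Node.js to the returned imgUrl values, not referenced from within the evaluate callback.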

Step 5: Run the Script

Run the script using Node.js:

node downloadInstagramImages.js

Here is a complete example of the script:

js
import fs from 'fs';
import https from 'https';
import puppeteer from 'puppeteer';

const checkIfDirExists = (directory) => {
  return new Promise((resolve, reject) => {
    fs.access(directory, fs.constants.F_OK, (err) => {
      if (err) {
        // Directory doesn't exist, create it
        fs.mkdir(directory, { recursive: true }, (mkdirErr) => {
          if (mkdirErr) {
            console.error('Error creating directory:', mkdirErr);
            reject(mkdirErr);
          } else {
            console.log('Directory created successfully');
            resolve();
          }
        });
      } else {
        console.log('Directory already exists');
        resolve();
      }
      // Note: no unconditional resolve() here, otherwise the promise
      // would settle before fs.mkdir has finished.
    });
  });
};

const download = (url, destination) => {
  return new Promise((resolve, reject) => {
    checkIfDirExists('images')
      .then(() => {
        const file = fs.createWriteStream(destination);

        https
          .get(url, (response) => {
            // Stream the response body straight into the file.
            response.pipe(file);

            file.on('finish', () => {
              // Resolve only after the file has been closed.
              file.close(() => resolve(true));
            });
          })
          .on('error', (error) => {
            // Remove the partial file, then report the failure.
            fs.unlink(destination, () => reject(error));
          });
      })
      // Propagate a failure to create the destination folder.
      .catch(reject);
  });
};

async function run() {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto('https://www.instagram.com/kimkardashian/p/C4lwwOYSpW-/?hl=en&img_index=1');
  await page.waitForSelector('section');
  await page.setViewport({ width: 1080, height: 1024 });
  await page.screenshot({ path: 'screen.png', fullPage: true });
  // Links inside the post; collected for reference, not used by the download step.
  const links = await page.evaluate(() =>
    Array.from(document.querySelectorAll('article a'), (el) => el.href),
  );
  // Image URL, alt text and file name for each image in the post.
  // Class names such as _aagv come from Instagram's markup and may change.
  const images = await page.evaluate(() =>
    Array.from(document.querySelectorAll('article div[role=button] div._aagv img'), (img) => {
      return {
        imgUrl: img.src,
        alt: img.alt,
        // File name: everything after the last "/" up to and including ".jpg".
        slug: img.src.slice(img.src.lastIndexOf('/') + 1, img.src.lastIndexOf('.jpg') + 4),
      };
    }),
  );

  await browser.close();
  // Wait for every download to finish before run() resolves.
  await Promise.all(images.map((img) => download(img.imgUrl, 'images/' + img.slug)));
}

run().catch((error) => console.error(error));
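When a post contains several images, one failed download should not hide the result of the others. The sketch below is an assumption about how the final step could be organised, not part of the script above: it uses Promise.allSettled to wait for every download and returns a small summary. The name downloadAll is hypothetical, and downloadFn stands for any function with the same shape as download(url, destination).

```javascript
// Hypothetical helper: run all downloads and summarise the outcome.
// `images` is an array of { imgUrl, slug } objects as built in run().
async function downloadAll(images, downloadFn, folder = 'images') {
  // allSettled waits for every promise, whether it fulfils or rejects.
  const results = await Promise.allSettled(
    images.map(async (img) => downloadFn(img.imgUrl, `${folder}/${img.slug}`)),
  );
  const failed = [];
  results.forEach((result, index) => {
    if (result.status === 'rejected') {
      failed.push({ slug: images[index].slug, reason: String(result.reason) });
    }
  });
  return { saved: results.length - failed.length, failed };
}
```

In run(), a call such as `const summary = await downloadAll(images, download);` could take the place of the final download step, with `summary.failed` logged for inspection.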

Conclusion:

Using Puppeteer, you can automate the process of downloading images from Instagram. However, keep in mind the legal and ethical considerations involved when accessing and downloading content from websites.
