Puppeteer

Puppeteer is a tool that allows you to automate interactions with web pages. It lets you control a headless Chrome browser (which means you won't see it on your screen) and do things like fill out forms, click buttons, and navigate to different pages.

Now, let's get started with an example. Let's say we want to use Puppeteer to go to the Google website and search for "puppies". Here's what our code might look like:


const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.goto('https://www.google.com');
  await page.type('input[name="q"]', 'puppies');
  await page.click('input[type="submit"]');

  // Wait for search results to load
  await page.waitForSelector('#search');

  console.log('Search results loaded!');

  await browser.close();
})();

What each line of this code does:


const puppeteer = require('puppeteer');

This line imports the Puppeteer library into our Node.js script.


const browser = await puppeteer.launch();

This line launches a new instance of the headless Chrome browser.


const page = await browser.newPage();

This line creates a new page in the browser instance.


await page.goto('https://www.google.com');

This line navigates to the Google website.


await page.type('input[name="q"]', 'puppies');

This line finds the search bar on the page (identified by the input name "q") and types the word "puppies" into it.


await page.click('input[type="submit"]');

This line finds the submit button on the page (identified by the input type "submit") and clicks it.


await page.waitForSelector('#search');

This line waits for the search results to load (identified by the CSS selector "#search").


console.log('Search results loaded!');

This line logs a message to the console indicating that the search results have loaded.


await browser.close();

Finally, this line closes the browser instance.

That's it! With these few lines of code, we were able to automate the process of searching for "puppies" on Google using Puppeteer. I hope this helps you understand how Puppeteer works in Node.js!
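The submit-button selector used above depends on Google's markup, which changes over time. As a hedged alternative, here is a minimal sketch that submits the search by pressing Enter and waits for the resulting navigation. The helper name `searchFor` and the default selector `[name="q"]` are assumptions made for illustration, not something the page guarantees.

```javascript
// Hypothetical helper: type a query into a search box and submit it with Enter.
// `page` is a Puppeteer Page; the default selector is an assumption about the
// target site's markup and may need adjusting.
async function searchFor(page, query, selector = '[name="q"]') {
  await page.waitForSelector(selector);
  await page.type(selector, query);
  // Start waiting for the navigation before triggering it, so the
  // navigation cannot finish before we begin listening for it.
  await Promise.all([
    page.waitForNavigation(),
    page.keyboard.press('Enter'),
  ]);
  // page.url() is synchronous and returns the address we ended up on.
  return page.url();
}
```

Inside the async function of the first example, this could be called as `await searchFor(page, 'puppies')`.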

Puppeteer:

Puppeteer is a Node.js library developed by the Google Chrome team. It provides a high-level API to control headless Chrome or Chromium over the DevTools Protocol. Puppeteer allows you to automate the testing and scraping of web pages, as well as perform other tasks such as generating screenshots and PDFs of web pages.

In simple terms, Puppeteer is a tool that allows you to programmatically control a web browser (Chrome or Chromium) to interact with web pages and perform various actions, like clicking buttons, filling out forms, and navigating to different pages. With Puppeteer, you can write scripts in Node.js that automate repetitive tasks on the web, which can save you a lot of time and effort.

Puppeteer is built on top of the Chrome DevTools Protocol, which is a set of APIs for interacting with Chrome and Chromium. Puppeteer provides a simpler, more high-level API that abstracts away many of the complexities of the DevTools Protocol and makes it easier to write automation scripts.
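To make the relationship with the DevTools Protocol concrete, here is a small sketch that opens a raw protocol session from a page and reads the browser's performance counters. `Performance.enable` and `Performance.getMetrics` are DevTools Protocol commands. The helper name `getProtocolMetrics` is hypothetical, and the sketch assumes that either `page.createCDPSession()` (newer releases) or `page.target().createCDPSession()` (older releases) is available.

```javascript
// Sketch: read raw performance counters through a DevTools Protocol session.
// `page` is a Puppeteer Page; the helper name is hypothetical.
async function getProtocolMetrics(page) {
  // Newer releases expose page.createCDPSession(); older ones go through the target.
  const client = typeof page.createCDPSession === 'function'
    ? await page.createCDPSession()
    : await page.target().createCDPSession();
  try {
    await client.send('Performance.enable');
    const { metrics } = await client.send('Performance.getMetrics');
    // Convert [{ name, value }, ...] into a plain { name: value } object.
    return Object.fromEntries(metrics.map(m => [m.name, m.value]));
  } finally {
    // Always release the session, even if a command failed.
    await client.detach();
  }
}
```

In everyday scripts you rarely need this level; the point is only to show what the high-level API is wrapping.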

Overall, Puppeteer is a powerful tool for web automation and testing, and is widely used in the web development and testing communities.

Here are some of its key features:

Automating user interactions: With Puppeteer, you can simulate user interactions with a web page, such as clicking buttons, filling out forms, and navigating to different pages.

Generating screenshots and PDFs: Puppeteer allows you to generate screenshots and PDFs of web pages, which can be useful for testing and debugging.

Web scraping: Puppeteer makes it easy to scrape data from web pages, allowing you to extract information like prices, product details, and more.

Performance testing: Puppeteer provides tools for measuring the performance of web pages, including metrics like page load time and resource usage.

Mobile emulation: Puppeteer can simulate mobile devices, allowing you to test how your web pages look and perform on different devices and screen sizes.

Headless mode: Puppeteer can run in headless mode, which means it runs without a visible user interface, making it faster and more efficient.

Easy setup: Puppeteer can be installed with npm and is easy to set up, making it accessible to developers of all skill levels.
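As an illustration of the screenshot and PDF feature from the list above, the following sketch saves both for a page that has already been opened. The helper name `capturePage` and the output file names are assumptions; the options shown (`path`, `fullPage`, `format`, `printBackground`) are a small subset of what `page.screenshot()` and `page.pdf()` accept.

```javascript
// Sketch: save a full-page screenshot and a PDF of the current page.
// `page` is a Puppeteer Page that has already navigated somewhere;
// `basePath` is a hypothetical output path without extension.
async function capturePage(page, basePath) {
  const screenshotPath = `${basePath}.png`;
  const pdfPath = `${basePath}.pdf`;
  // fullPage: true captures the whole scrollable page, not just the viewport.
  await page.screenshot({ path: screenshotPath, fullPage: true });
  // page.pdf() relies on Chrome's print-to-PDF support; older Puppeteer
  // releases supported it only in headless mode.
  await page.pdf({ path: pdfPath, format: 'A4', printBackground: true });
  return { screenshotPath, pdfPath };
}
```

For example, `await capturePage(page, './report')` would be expected to write `report.png` and `report.pdf` next to the script.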

Here are some of the key classes in Puppeteer:

Browser: The Browser class represents a browser instance, which can be used to create new pages and perform other browser-level tasks.

Page: The Page class represents a web page, and provides methods for interacting with the page, such as navigating to a URL, clicking elements, and filling out forms.

ElementHandle: The ElementHandle class represents a DOM element on a web page, and provides methods for interacting with the element, such as clicking it, typing into it, and getting its properties.

Frame: The Frame class represents a frame or iframe on a web page, and provides methods for interacting with the frame, such as navigating it and evaluating JavaScript in it.

Request: The Request class (exported as HTTPRequest in current Puppeteer releases) represents a network request made by a web page, and provides information about the request, such as its URL, headers, and response.

Response: The Response class (exported as HTTPResponse in current Puppeteer releases) represents a network response received by a web page, and provides information about the response, such as its status code, headers, and content.

Here are some of the key functionalities of Puppeteer:

Web page automation: With Puppeteer, you can automate interactions with web pages, such as clicking buttons, filling out forms, and navigating to different pages. This allows you to test and debug web pages, and automate repetitive tasks.

Web scraping: Puppeteer makes it easy to scrape data from web pages, allowing you to extract information like prices, product details, and more. This can be useful for a variety of applications, such as data mining and price comparison.

Performance testing: Puppeteer provides tools for measuring the performance of web pages, including metrics like page load time and resource usage. This allows you to optimize the performance of your web pages and ensure that they are fast and responsive.

PDF and screenshot generation: Puppeteer allows you to generate PDFs and screenshots of web pages, which can be useful for testing and debugging, as well as for generating reports and documentation.

Mobile emulation: Puppeteer can simulate mobile devices, allowing you to test how your web pages look and perform on different devices and screen sizes. This can be useful for ensuring that your web pages are responsive and mobile-friendly.

Headless mode: Puppeteer can run in headless mode, which means it runs without a visible user interface. This makes it faster and more efficient, and allows you to automate tasks without being distracted by a visual interface.
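Two of the items above, mobile emulation and performance testing, can be combined in one short sketch: switch the page to a phone-sized viewport, load a URL, and read a few numbers from `page.metrics()`. The helper name `measureAsPhone` and the viewport values are assumptions chosen for illustration, not the dimensions of a specific device, and the timing is a rough wall-clock measurement rather than a precise benchmark.

```javascript
// Sketch: load a page with a phone-like viewport and collect basic metrics.
// `page` is a Puppeteer Page; the viewport numbers are illustrative only.
async function measureAsPhone(page, url) {
  await page.setViewport({
    width: 390,
    height: 844,
    deviceScaleFactor: 3,
    isMobile: true,   // honour the page's mobile viewport meta tag
    hasTouch: true,   // report touch support to the page
  });
  const start = Date.now();
  await page.goto(url, { waitUntil: 'load' });
  const loadMs = Date.now() - start; // rough time until the load event
  // page.metrics() returns counters such as JSHeapUsedSize and Nodes.
  const metrics = await page.metrics();
  return {
    loadMs,
    jsHeapUsedBytes: metrics.JSHeapUsedSize,
    domNodes: metrics.Nodes,
  };
}
```

Running the same function with a desktop-sized viewport gives a simple way to compare the two layouts.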

Mouse interactions: Puppeteer allows you to simulate mouse interactions with a web page using the mouse object. You can move the mouse to a specific point on the page using the move method, click an element using the click method, and perform other mouse actions using other methods like down, up, and wheel.


// Example: Click on a button using the mouse
await page.waitForSelector('#my-button');
const button = await page.$('#my-button');
await button.click();
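The click example goes through the element handle. To show the lower-level move, down, and up methods mentioned above, here is a sketch of a simple drag gesture using `page.mouse`. The helper name `dragBy` is hypothetical, and the sketch assumes the element is visible so that `boundingBox()` returns a box.

```javascript
// Sketch: drag an element by (dx, dy) pixels using the low-level mouse API.
// `page` is a Puppeteer Page; the helper name is hypothetical.
async function dragBy(page, selector, dx, dy) {
  const handle = await page.waitForSelector(selector);
  const box = await handle.boundingBox(); // null when the element is not visible
  if (!box) {
    throw new Error(`Element ${selector} has no visible box`);
  }
  // Start from the centre of the element.
  const from = { x: box.x + box.width / 2, y: box.y + box.height / 2 };
  const to = { x: from.x + dx, y: from.y + dy };
  await page.mouse.move(from.x, from.y);
  await page.mouse.down();
  // steps: 10 sends intermediate mousemove events instead of one jump.
  await page.mouse.move(to.x, to.y, { steps: 10 });
  await page.mouse.up();
  return { from, to };
}
```

Whether a page reacts to this as a drag depends on how it implements dragging, so treat this as a starting point.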

Keyboard interactions: Puppeteer also allows you to simulate keyboard interactions with a web page using the keyboard object. You can type text using the type method, press a single key using the press method, hold and release keys using the down and up methods, and more.


// Example: Type "hello world" into an input field using the keyboard
await page.waitForSelector('#my-input');
const input = await page.$('#my-input');
await input.type('hello world');
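The example above types through an element handle. The `page.keyboard` object can also hold modifier keys, which is useful for shortcuts. Below is a sketch that selects the existing text of a field and replaces it. The helper name `replaceText` is hypothetical, and the use of Control as the select-all modifier is an assumption that fits Windows and Linux; macOS would typically use `Meta`.

```javascript
// Sketch: replace the contents of an input using keyboard shortcuts.
// `page` is a Puppeteer Page; the helper name is hypothetical.
async function replaceText(page, selector, text) {
  await page.focus(selector);
  // Hold Control, press A, release Control: the usual select-all shortcut
  // on Windows and Linux (an assumption; macOS would typically use 'Meta').
  await page.keyboard.down('Control');
  await page.keyboard.press('KeyA');
  await page.keyboard.up('Control');
  // Delete the selection, then type the new text with a small delay per key.
  await page.keyboard.press('Backspace');
  await page.keyboard.type(text, { delay: 20 });
}
```

For example, `await replaceText(page, '#my-input', 'hello again')` would be expected to leave only the new text in the field.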

File chooser: Puppeteer provides a way to simulate the selection of a file using the FileChooser class. You obtain a FileChooser from page.waitForFileChooser(), and then pass the paths of the files to be uploaded to its accept method.


// Example: Upload a file using a file chooser
await page.waitForSelector('#my-file-input');
const input = await page.$('#my-file-input');
// Start waiting for the chooser before the click that opens it
const [fileChooser] = await Promise.all([
  page.waitForFileChooser(),
  input.click(),
]);
// accept() takes the list of file paths to select
await fileChooser.accept(['/path/to/my/file.pdf']);
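When the file input element is directly reachable, an alternative that avoids opening the chooser at all is `elementHandle.uploadFile()`. The sketch below uses it and then reads back how many files the input holds. The helper name `attachFile` is hypothetical, and the sketch assumes the selector points at an `<input type="file">` element.

```javascript
// Sketch: attach a file to an <input type="file"> without opening a chooser.
// `page` is a Puppeteer Page; the helper name is hypothetical.
async function attachFile(page, selector, filePath) {
  const input = await page.waitForSelector(selector);
  // uploadFile sets the files of the input element directly.
  await input.uploadFile(filePath);
  // Read the number of attached files from inside the page.
  return page.$eval(selector, el => el.files.length);
}
```

A call such as `await attachFile(page, '#my-file-input', '/path/to/my/file.pdf')` would be expected to resolve to 1 for a single file.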

Browser context: Puppeteer allows you to create separate browser contexts using the BrowserContext class. A browser context is like a separate instance of the browser that has its own cookies, cache, and other state. This can be useful for testing scenarios where you need to isolate the state of the browser.


// Example: Create a new browser context and navigate to a page
// (Puppeteer v22 and later name this method browser.createBrowserContext())
const context = await browser.createIncognitoBrowserContext();
const page = await context.newPage();
await page.goto('https://www.example.com');
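To illustrate the isolation that contexts provide, here is a sketch that opens the same URL in two separate contexts, so that cookies set in one are not visible in the other. The helper name `openIsolatedPages` is hypothetical. Because the creation method has been renamed between Puppeteer versions, the sketch assumes that one of `createBrowserContext` or `createIncognitoBrowserContext` exists and uses whichever it finds.

```javascript
// Sketch: open one URL in two isolated browser contexts.
// `browser` is a Puppeteer Browser; the helper name is hypothetical.
async function openIsolatedPages(browser, url) {
  // The method name differs between Puppeteer versions, so use whichever exists.
  const method = browser.createBrowserContext || browser.createIncognitoBrowserContext;
  if (typeof method !== 'function') {
    throw new Error('This browser object cannot create new contexts');
  }
  const createContext = method.bind(browser);
  const contextA = await createContext();
  const contextB = await createContext();
  const pageA = await contextA.newPage();
  const pageB = await contextB.newPage();
  await Promise.all([pageA.goto(url), pageB.goto(url)]);
  // Closing a context also closes the pages that belong to it.
  const close = async () => {
    await contextA.close();
    await contextB.close();
  };
  return { pageA, pageB, close };
}
```

This pattern can be handy for testing two logged-in users side by side without launching two browsers.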

Page navigation: Puppeteer provides a variety of methods for navigating between pages and controlling the browser history. You can navigate to a new page using the goto method, go back or forward in the browser history using the goBack and goForward methods, and reload the current page using the reload method.


// Example: Navigate to a new page and go back in the browser history
await page.goto('https://www.example.com');
await page.goBack();
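The snippet above only goes back. Here is a slightly fuller sketch that exercises goto, goBack, goForward, and reload together and reports where the page ended up after each history step. The helper name `visitAndReturn` and both URLs are placeholders supplied by the caller.

```javascript
// Sketch: walk back and forward through the history of a page.
// `page` is a Puppeteer Page; the helper name is hypothetical.
async function visitAndReturn(page, firstUrl, secondUrl) {
  await page.goto(firstUrl);
  await page.goto(secondUrl);
  await page.goBack();                 // history: back to the first page
  const afterBack = page.url();
  await page.goForward();              // history: forward to the second page
  const afterForward = page.url();
  await page.reload();                 // reload the page we ended on
  return { afterBack, afterForward };
}
```

If the site redirects, the reported URLs may differ from the ones passed in, which is often exactly what you want to check.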

Element handling: Puppeteer provides a variety of methods for interacting with elements on a page, including selecting elements by CSS selector, XPath, or other criteria, getting the text or value of an element, and more.


// Example: Get the text of a paragraph element
await page.waitForSelector('p');
const paragraph = await page.$('p');
const text = await page.evaluate(element => element.textContent, paragraph);
console.log(text);
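The same result can be reached with less code using `page.$eval` and `page.$$eval`, which find the element (or elements) and run a function on them in one step. A minimal sketch, with the hypothetical helper name `readTexts`:

```javascript
// Sketch: read the text of the first match and of all matches for a selector.
// `page` is a Puppeteer Page; the helper name is hypothetical.
async function readTexts(page, selector) {
  await page.waitForSelector(selector);
  // $eval runs the function on the first matching element, inside the page.
  const first = await page.$eval(selector, el => el.textContent.trim());
  // $$eval runs the function on an array of all matching elements.
  const all = await page.$$eval(selector, els => els.map(el => el.textContent.trim()));
  return { first, all };
}
```

Both functions run in the browser, so they can only return values that can be serialized back to Node.js, such as strings, numbers, and plain arrays.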

Network interception: Puppeteer allows you to intercept and modify network requests made by a page. You enable this with page.setRequestInterception(true) and then handle each request in a listener for the request event. You can use this to mock responses, block certain requests, or modify the request or response headers.


// Example: Intercept a network request and respond with mock data
await page.setRequestInterception(true);
page.on('request', request => {
  if (request.url().endsWith('.png')) {
    // Answer .png requests with a fake body instead of hitting the network
    request.respond({
      contentType: 'image/png',
      body: Buffer.from('fake-image-data'),
    });
  } else {
    // Let every other request through unchanged
    request.continue();
  }
});
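Blocking requests, which the text above also mentions, works the same way but calls `request.abort()`. The sketch below blocks requests by resource type, which can speed up scraping when images and fonts are not needed. The helper name `blockResourceTypes` and the default list of types are assumptions for illustration.

```javascript
// Sketch: abort requests whose resource type is in a block list.
// `page` is a Puppeteer Page; the helper name and defaults are hypothetical.
async function blockResourceTypes(page, blockedTypes = ['image', 'font']) {
  const blocked = new Set(blockedTypes);
  await page.setRequestInterception(true);
  page.on('request', request => {
    // resourceType() returns values such as 'document', 'image', 'font', 'script'.
    if (blocked.has(request.resourceType())) {
      request.abort();
    } else {
      request.continue();
    }
  });
}
```

Once interception is enabled, every request has to be continued, responded to, or aborted, otherwise the page will stall waiting for it.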

Page events: Puppeteer provides a variety of events that you can listen for on a page, such as the load event, the dialog event (which is triggered when a JavaScript alert or confirmation dialog appears), and the console event (which is triggered when a page logs a message to the console).


// Example: Log console messages to the console
page.on('console', message => console.log(message.text()));
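The dialog event mentioned above deserves its own example, because an unhandled alert or confirm box can block a script. Here is a sketch that dismisses dialogs and records them, along with uncaught page errors and failed requests. The helper name `watchPage` and the log format are assumptions for illustration.

```javascript
// Sketch: collect dialogs, uncaught page errors, and failed requests in a log.
// `page` is a Puppeteer Page; the helper name and log format are hypothetical.
function watchPage(page) {
  const log = [];
  page.on('dialog', async dialog => {
    // dialog.type() is e.g. 'alert', 'confirm', or 'prompt'.
    log.push(`dialog(${dialog.type()}): ${dialog.message()}`);
    await dialog.dismiss(); // close the dialog so the page can continue
  });
  page.on('pageerror', error => {
    log.push(`pageerror: ${error.message}`);
  });
  page.on('requestfailed', request => {
    log.push(`requestfailed: ${request.url()}`);
  });
  return log;
}
```

Register the listeners before calling `page.goto()`, so that events fired during the initial load are not missed.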

Puppeteer, together with the URL class built into Node.js, lets you navigate to pages, manipulate URLs, and retrieve information about them. Here are some examples:

Navigation: You can navigate to a new page using the goto method, which takes a URL as its argument. You can also retrieve the current URL of a page using the url method.


// Navigate to a new page
await page.goto('https://www.example.com');

// Get the current URL (page.url() is synchronous, so no await is needed)
const currentUrl = page.url();
console.log(currentUrl);

Manipulating URLs: Node.js provides a global URL class (it is part of Node.js, not of Puppeteer), which allows you to manipulate URLs by adding or removing query parameters, fragments, and more. You can create a new URL instance by passing a URL string to its constructor.


// Create a new URL object
const url = new URL('https://www.example.com');

// Add a query parameter
url.searchParams.set('key', 'value');

// Remove a fragment
url.hash = '';

// Get the updated URL string
const updatedUrl = url.toString();
console.log(updatedUrl);

Retrieving information about URLs: You can parse a URL string with the URL constructor and read its components, such as the protocol, hostname, and port, from the resulting object.


// Parse a URL string
const url = new URL('https://www.example.com/path/to/page?query=parameter');

// Get the protocol
console.log(url.protocol); // "https:"

// Get the hostname
console.log(url.hostname); // "www.example.com"

// Get the port (returns an empty string if the port is not specified)
console.log(url.port); // ""

Extracting URLs from a page: You can use Puppeteer to extract URLs from a page, for example by finding all the links on a page and retrieving their href attributes.


// Get all links on the page and extract their URLs
const links = await page.$$eval('a', elements => elements.map(element => element.href));
console.log(links);
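The extracted list usually contains duplicates, external links, and non-HTTP links such as mailto addresses. As a follow-up sketch, the plain Node.js function below keeps only unique links on the same origin as a base URL and drops fragments. The helper name `sameOriginLinks` is hypothetical, and it needs no browser at all.

```javascript
// Sketch: keep unique same-origin links and strip their fragments.
// The helper name is hypothetical; `links` is an array of href strings.
function sameOriginLinks(links, baseUrl) {
  const origin = new URL(baseUrl).origin;
  const seen = new Set();
  for (const link of links) {
    if (!link) continue; // anchors without an href report an empty string
    let url;
    try {
      url = new URL(link, baseUrl);
    } catch {
      continue; // skip strings that are not valid URLs
    }
    if (url.origin !== origin) continue; // external, mailto:, javascript:, ...
    url.hash = ''; // treat /page and /page#section as the same link
    seen.add(url.toString());
  }
  return [...seen];
}

const sample = [
  'https://www.example.com/a#top',
  'https://www.example.com/a',
  'https://other.example.org/b',
  'mailto:someone@example.com',
  '',
];
console.log(sameOriginLinks(sample, 'https://www.example.com/'));
// prints [ 'https://www.example.com/a' ]
```

In a script, the `links` array from the snippet above and `page.url()` could be passed in as the two arguments.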

Checking the URL of a page: You can use Puppeteer to check whether the URL of a page matches a certain pattern, for example to make sure that a redirect has taken you to the expected page.


// Navigate to a page and check its URL
const expectedUrl = 'https://www.example.com/expected-page';
await page.goto('https://www.example.com/redirect');
const currentUrl = page.url();
if (currentUrl === expectedUrl) {
  console.log('Redirect succeeded!');
} else {
  console.log('Redirect failed: expected URL was', expectedUrl, 'but actual URL was', currentUrl);
}

Handling URL fragments: You can retrieve and manipulate the fragment (the part of a URL after the # symbol) by building a URL object from page.url() and using its hash property.


// Navigate to a page and retrieve the fragment
await page.goto('https://www.example.com/page#fragment');
const url = new URL(await page.url());
const fragment = url.hash;
console.log(fragment);

// Modify the fragment and navigate to the updated URL
url.hash = 'new-fragment';
await page.goto(url.toString());

List of some common methods provided by Puppeteer:

browser.newPage(): Creates a new Page object in the current browser context.

page.goto(url[, options]): Navigates to the specified URL.

page.click(selector[, options]): Clicks the element specified by the given selector.

page.type(selector, text[, options]): Types the given text into the element specified by the given selector.

page.waitForSelector(selector[, options]): Waits for the element specified by the given selector to be added to the page.

page.waitForNavigation([options]): Waits for the page to navigate to a new URL.

page.screenshot([options]): Takes a screenshot of the current page and returns it as a PNG buffer.

page.evaluate(pageFunction[, ...args]): Executes the given function in the context of the page and returns its result.

page.$(selector): Finds the first element matching the given selector.

page.$$(selector): Finds all elements matching the given selector.

page.setContent(html[, options]): Sets the HTML content of the page.

page.goBack([options]): Navigates to the previous page in the history.

page.goForward([options]): Navigates to the next page in the history.

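page.waitForTimeout(timeout): Waits for the specified amount of time (in milliseconds) before continuing. This method is deprecated and has been removed from recent Puppeteer releases; a Promise wrapped around setTimeout does the same job.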

page.waitForFunction(pageFunction[, options[, ...args]]): Waits for the given function to return a truthy value before continuing.

page.setViewport(viewport): Sets the size of the viewport for the page.

page.evaluateHandle(pageFunction[, ...args]): Executes the given function in the context of the page and returns a handle to its result.

page.addScriptTag(options): Adds a script tag to the page.

page.setRequestInterception(value): Enables or disables request interception for the page.

Comparison between Puppeteer and Selenium

Sr. No.	Puppeteer	Selenium
1.	Puppeteer is developed mainly for Chromium so the tests developed are mainly executed in Chrome	Selenium can be used to execute tests on multiple browsers like Chrome, Firefox, IE, Safari, and so on.
2.	Puppeteer code can be implemented only in JavaScript.	Selenium code can be implemented in multiple languages like Java, Python, JavaScript, C#, and so on.
3.	Puppeteer provides APIs to manage headless execution in Chrome by using the DevTools protocol.	Selenium requires additional external browser drivers that trigger tests as per the user commands.
4.	Puppeteer manages the Chrome browser.	Selenium is primarily used to execute tests to automate the actions performed on the browser.
5.	Puppeteer is faster in executing tests than Selenium	Selenium is slower in executing tests than Puppeteer.
6.	Puppeteer is a module in node developed for Chromium engine.	Selenium is a dedicated test automation tool.
7.	Puppeteer can be used for API testing by utilising the requests and the responses.	API testing with Selenium is difficult.
8.	Puppeteer can be used to verify the count of CSS and JavaScript files utilised for loading a webpage.	Selenium cannot be used to verify the count of CSS and JavaScript files utilised for loading a webpage.
9.	Puppeteer can be used to work on the majority of features in the DevTools in the Chrome browser.	Selenium cannot be used to work on the majority of features in the DevTools in the Chrome browser.
10.	Puppeteer can be used to execute tests on various devices with the help of the emulators	Using an emulator with Selenium is not easy.
11.	Puppeteer can be used to obtain the time needed for a page to load.	Selenium cannot be used to obtain the time needed for a page to load.
12.	Puppeteer can be used to save a screenshot in both image and PDF formats.	Selenium can be used to save a screenshot in both image and PDF formats only in the Selenium 4 version
13.	Puppeteer was first introduced in the year 2017.	Selenium was first introduced in the year 2004.
14.	In Puppeteer, we can verify an application without image loading.	In Selenium, we can verify an application without image loading.

Demo Code1:


const puppeteer = require('puppeteer');
(async function () {
    const browser = await puppeteer.launch();
    console.log("Launched");
    const page = await browser.newPage();
    await page.goto('https://www.google.com/');
    console.log("In Site");
    await page.screenshot({ path: './Demo2.png' });
    console.log("Captured");
    await browser.close();
})();

This is what we are doing in this small script:

We import the Puppeteer library using require.

Launch a new browser.

Open a new page (tab) inside that browser.

Navigate to the Google home page.

Take a screenshot.

Close the browser.

Some of the most common wait options:

waitUntil: This option tells navigation methods such as page.goto() and page.waitForNavigation() when to consider the navigation finished. The available values are:

load: Wait until the load event fires (i.e., the page and its resources like images, stylesheets, and scripts have finished loading).

domcontentloaded: Wait until the DOMContentLoaded event fires (i.e., the HTML content has been parsed, but some resources may still be loading).

networkidle0: Wait until there have been no network connections for at least 500ms (i.e., the page is considered fully loaded when there are no pending network requests).

networkidle2: Wait until there have been no more than 2 network connections for at least 500ms, which may be useful for pages that load resources dynamically or keep a few connections open.

timeout: This option specifies the maximum amount of time (in milliseconds) to wait for the condition to be met before timing out and throwing an error.

visible: This option of page.waitForSelector() specifies whether to wait for the element to become visible on the page.

hidden: This option of page.waitForSelector() specifies whether to wait for the element to become hidden on the page or to be removed from it.

selector: This argument of page.waitForSelector() is the CSS selector of the element to wait for.
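The wait settings described above are passed as option objects to the navigation and waiting methods. Here is a sketch that combines them: it navigates with `waitUntil: 'networkidle2'`, then waits for a visible element and reports whether it appeared in time. The helper name `openAndWaitFor` and the default timeout of 15000 ms are assumptions for illustration.

```javascript
// Sketch: navigate with a waitUntil condition, then wait for a visible element.
// `page` is a Puppeteer Page; the helper name and default timeout are hypothetical.
async function openAndWaitFor(page, url, selector, timeoutMs = 15000) {
  // waitUntil decides when goto() considers the navigation finished.
  await page.goto(url, { waitUntil: 'networkidle2', timeout: timeoutMs });
  try {
    // visible: true requires the element to be displayed, not just present.
    await page.waitForSelector(selector, { visible: true, timeout: timeoutMs });
    return true;
  } catch (error) {
    // Puppeteer signals an expired wait with an error named TimeoutError.
    if (error.name === 'TimeoutError') return false;
    throw error; // anything else is a real failure
  }
}
```

Returning false on a timeout, instead of letting the error escape, makes it easy to branch on optional elements such as cookie banners.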

https://appexchange.salesforce.com/

appx-tiles-grid-ul

data-listing-name


const puppeteer = require('puppeteer');
const xlsx = require('xlsx');

(async () => {
  const browser = await puppeteer.launch({headless: false});
  const page = await browser.newPage();
  await page.goto('https://appexchange.salesforce.com/consulting');

  // Scroll to the bottom of the page to load all data
  await autoScroll(page);

  // Collect the data and save it to an Excel file
  const data = await page.evaluate(() => {
    const rows = [];
    document.querySelectorAll('.appx-tile-content-el').forEach(span => {
      const cells = [span.innerText];
      rows.push(cells);
    });
    return rows;
  });
  const wb = xlsx.utils.book_new();
  const ws = xlsx.utils.aoa_to_sheet(data);
  xlsx.utils.book_append_sheet(wb, ws, 'Data');
  xlsx.writeFile(wb, 'data.xlsx');

  await browser.close();
})();

async function autoScroll(page) {
  await page.evaluate(async () => {
    await new Promise((resolve, reject) => {
      let totalHeight = 0;
      const distance = 100;
      const scrollInterval = setInterval(() => {
        const scrollHeight = document.body.scrollHeight;
        window.scrollBy(0, distance);
        totalHeight += distance;
        if (totalHeight >= scrollHeight) {
          clearInterval(scrollInterval);
          resolve();
        }
      }, 1000); // Scroll every 1 second
    });

    await new Promise(resolve => setTimeout(resolve, 2000)); // Wait for 2 seconds after scrolling

    // Click the "Load More" button repeatedly until it's no longer present
    while (document.querySelector('#appx-load-more-button-id')) {
      document.querySelector('#appx-load-more-button-id').click();
      await new Promise(resolve => setTimeout(resolve, 2000)); // Wait for 2 seconds after clicking
    }

    await new Promise(resolve => setTimeout(resolve, 2000)); // Wait for 2 seconds after clicking all "Load More" buttons
  });
}

Salesforce Application	Sales Cloud
Paid Applications	ZOOMINFO FOR SALESFORCE, DEMANDBASE (DATA AND SALES INTELLIGENCE CLOUD), CLEARBIT - AUTOMATICALLY ENRICH LEADS AND CONTACTS IN REAL-TIME, MERCURY SMS: SEND & RECEIVE TEXT MESSAGES, SMS-MAGIC
Free Applications	DATALOADER.IO, THE #1 DATA LOADER FOR SALESFORCE, ASANA FOR SALESFORCE, NATIVE DOCUMENT GENERATION & E-SIGNATURE: PDF, WORD, XLS, EMAIL, REPORTS: S-DOCS

Key	Value
Salesforce Apps	Service Cloud Applications
AppExchange	Leading enterprise cloud marketplace
Ready-to-install	Apps, solutions, and consultants
Extend Salesforce	Every industry and department
Solutions	Sales, marketing, customer service, and more
Latest Collections	Service Cloud
Customer Service Platform	#1
Support channels	Phone, email, social media, apps, or any other channel
Solutions from AppExchange	Solve customer problems fast, get insights into their behavior
Service & Support Dashboards	Getfeedback: Surveys for Salesforce - the best rated for CSAT, CES, NPS
Lead Routing	Q-assign: Lead routing, case assignment, round robin distribution
Lead assignment	Distribution engine: Lead assignment & opportunity routing. Round robin.
Surveys	In-gage – Surveys, compliance checks, quality audits & case categorization
Service Cloud Voice Telephony Partners	Five9 for Service Cloud Voice BYOT
Contact center	Vonage for Service Cloud Voice and Contact Center, CTI, speech analytics (BYOT)
Cloud Contact Center	Talkdesk for Service Cloud Voice
Salesforce	Avaya OneCloud™ for Salesforce - Service Cloud Voice (BYOT) powered by Avaya
Cisco Contact Center Integration	B+S Connects for Service Cloud Voice
Genesys Cloud for Salesforce	CTI, omni-channel, HVS, dialer, BYOT voice
Partner Telephony	InGenius
Omnichannel	Nice CXone Agent for Service Cloud Voice (BYOT)
Odigo for Salesforce Service Cloud Voice	CTI, BYOT, phone, HVS
Mirage Connector	Mirage Connector for Service Cloud Voice - BYOT
Speech Analytics	Natterbox
Advanced Service Cloud Features	Glance
Customer Success Platform	Gainsight: The #1 Rated Customer Success Platform
Live Agent	RWS Language Weaver for Live Agent
Shipping	UPS Shipping App: Shipping, Returns, RMAs and Tracking
Service Cloud CTI Partners	Vonage for Service Cloud Voice and Contact Center, CTI, speech analytics (BYOT)
CTI/IVR/ACD/Dialer/Contact Center	Nice CXone Agent for Salesforce - CTI / IVR / ACD / Dialer / Contact Center
Intelligent Cloud Contact Center	Five9 Intelligent Cloud Contact Center
PureConnect	Interactive Intelligence
Amazon Connect CTI Adapter	CTI
Virtual Office	8x8 Virtual Office: CTI
Computer Telephony Integration	Ingenius
Offer your solution on AppExchange	-