Scrape H&M with MultiOn

This example combines MultiOn step and retrieve to scrape the H&M website catalog.

Project setup

1

Initialize project

Create a new project by running the following command in your terminal:

$ npm init

2

Install package

Install the multion package by running the following command in your terminal:

$ npm install multion

3

Import library

Create a new file called index.ts and import the required library for the example:

import { MultiOnClient } from 'multion';

4

Initialize client

Initialize the MultiOn client with your API key.

const multion = new MultiOnClient({ apiKey: "YOUR_API_KEY" });
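Hard-coding the key is fine for a quick test, but it is easy to commit by accident. A minimal sketch of reading it from the environment instead is shown here; the helper requireApiKey and the variable name MULTION_API_KEY are hypothetical and not part of the MultiOn SDK.

```typescript
// Hypothetical helper: the variable name MULTION_API_KEY is an assumption,
// not something the MultiOn SDK requires.
function requireApiKey(
  env: Record<string, string | undefined>,
  name: string = "MULTION_API_KEY"
): string {
  // Treat undefined, empty, and whitespace-only values as missing.
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`Missing environment variable ${name}`);
  }
  return value;
}

// Possible usage in index.ts, after the import above:
//   const multion = new MultiOnClient({ apiKey: requireApiKey(process.env) });
```

Failing early with a clear message tends to be easier to debug than an authentication error from the first API call.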
5

Run script

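Run your script with the following command in your terminal. Running a .ts file directly requires a Node.js version with built-in TypeScript support; on older versions, use a TypeScript runner or compile the file first. The snippets below also use top-level await, so the file must be treated as an ES module.

$ node index.ts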

Scrape first page

To scrape the first page of the H&M catalog, we can simply call retrieve.

const retrieveResponse = await multion.retrieve({
  url: "https://www2.hm.com/en_us/men/products/view-all.html",
  cmd: "Get all items and their name, price, colors, purchase url, and image url.",
  fields: ["name", "price", "colors", "purchase_url", "image_url"]
});

const data = retrieveResponse.data;
console.log(data);

However, you might notice that while the first few items are complete, the rest are incomplete and some are even broken, especially the images. This is because H&M loads images dynamically as the user scrolls down the page.

To help with this, we can use renderJs to ensure image links are included and scrollToBottom to scroll down the page.

const retrieveResponse = await multion.retrieve({
  url: "https://www2.hm.com/en_us/men/products/view-all.html",
  cmd: "Get all items and their name, price, colors, purchase url, and image url.",
  fields: ["name", "price", "colors", "purchase_url", "image_url"],
  renderJs: true,
  scrollToBottom: true
});

const data = retrieveResponse.data;
console.log(data);
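To check whether the extra options actually filled in the gaps, one option is to count how many items still lack a field. The sketch below assumes retrieveResponse.data is an array with one object per product, keyed by the names passed in fields; that shape is an assumption, and the helpers missingFields and partitionByCompleteness are hypothetical.

```typescript
// Assumed item shape: one object per product, keyed by the requested field names.
type Item = Record<string, unknown>;

const REQUIRED_FIELDS = ["name", "price", "colors", "purchase_url", "image_url"];

// Returns the required fields that are missing or empty on a single item.
function missingFields(item: Item, required: string[] = REQUIRED_FIELDS): string[] {
  return required.filter((field) => {
    const value = item[field];
    if (value === undefined || value === null) return true;
    if (typeof value === "string") return value.trim() === "";
    if (Array.isArray(value)) return value.length === 0;
    return false;
  });
}

// Splits a result set into complete and incomplete items.
function partitionByCompleteness(items: Item[]): { complete: Item[]; incomplete: Item[] } {
  const complete: Item[] = [];
  const incomplete: Item[] = [];
  for (const item of items) {
    (missingFields(item).length === 0 ? complete : incomplete).push(item);
  }
  return { complete, incomplete };
}
```

Comparing the incomplete count before and after enabling renderJs and scrollToBottom gives a rough measure of how much the options help on a given page.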

If we only want 10 items from the page, we can use maxItems to speed up the request.

const retrieveResponse = await multion.retrieve({
  url: "https://www2.hm.com/en_us/men/products/view-all.html",
  cmd: "Get all items and their name, price, colors, purchase url, and image url.",
  fields: ["name", "price", "colors", "purchase_url", "image_url"],
  renderJs: true,
  scrollToBottom: true,
  maxItems: 10
});

const data = retrieveResponse.data;
console.log(data);

Scrape multiple pages autonomously

To scrape multiple pages autonomously, we can combine retrieve with step, which navigates to the next page. To do this, we must first create a session.

const createResponse = await multion.sessions.create({
  url: "https://www2.hm.com/en_us/men/products/view-all.html"
  // Can set useProxy to true to circumvent IP block
});

const sessionId = createResponse.sessionId;
console.log("Session created: ", sessionId);

Then, we can create a while loop that keeps running until the agent reports the last page. In each iteration, the agent calls retrieve to collect the data and then step to navigate to the next page.

let hasMore = true;
while (hasMore) {
  // Extract the items from the page the session is currently on.
  const retrieveResponse = await multion.retrieve({
    sessionId: sessionId,
    cmd: "Get all items and their name, price, colors, purchase url, and image url.",
    fields: ["name", "price", "colors", "purchase_url", "image_url"],
    renderJs: true,
    scrollToBottom: true
  });
  console.log("Data retrieved: ", retrieveResponse.data);
  // Ask the agent to move the session to the next page.
  const stepResponse = await multion.sessions.step(sessionId, {
    cmd: "Keep clicking on the next page button.",
    mode: "fast",
  });
  console.log("Navigating to next page: ", stepResponse.message);
  hasMore = !stepResponse.message.includes("last page");
  // Can implement better way of checking if more pages to scrape
}
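Matching the text "last page" in the agent's message is fragile, as the comment in the loop notes. One possible alternative, sketched here, is to stop when a page yields no items that have not been seen before, with a hard cap on the page count as a safety net. The class PageTracker and its default cap of 50 are hypothetical, and the sketch assumes each item carries a purchase_url that identifies it.

```typescript
// Assumed item shape: purchase_url is used as the unique key for a product.
type Item = { purchase_url?: string };

class PageTracker {
  private seen = new Set<string>();
  private pages = 0;
  private maxPages: number;

  constructor(maxPages: number = 50) {
    this.maxPages = maxPages;
  }

  // Records one page of items and reports whether the loop should continue:
  // true only if the page had at least one new item and the cap is not reached.
  shouldContinue(items: Item[]): boolean {
    this.pages += 1;
    let fresh = 0;
    for (const item of items) {
      const url = item.purchase_url;
      if (url && !this.seen.has(url)) {
        this.seen.add(url);
        fresh += 1;
      }
    }
    return fresh > 0 && this.pages < this.maxPages;
  }

  // Number of distinct products seen so far.
  get uniqueCount(): number {
    return this.seen.size;
  }
}
```

In the loop, hasMore could then be set from tracker.shouldContinue(retrieveResponse.data) instead of, or in addition to, the message check. A page that repeats the previous items, which is what happens when the next-page click has no effect, ends the loop.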

Scrape multiple pages in parallel

To massively speed up the scraping process, we can call retrieve for every page simultaneously. This works for H&M because each page has its own numbered URL (the page query parameter).

// Start one retrieve call per page (pages 1 to 10) without waiting in between.
const pagePromises = Array.from({ length: 10 }, (_, i) => i + 1).map(async (i) => {
  const retrieveResponse = await multion.retrieve({
    url: `https://www2.hm.com/en_us/men/products/view-all.html?page=${i}`,
    cmd: "Get all items and their name, price, colors, purchase url, and image url.",
    fields: ["name", "price", "colors", "purchase_url", "image_url"],
    renderJs: true,
    scrollToBottom: true
  });
  console.log(`Data retrieved for page ${i}: `, retrieveResponse.data);
  return retrieveResponse.data;
});

// Wait for every page to finish; results are in page order.
const pages = await Promise.all(pagePromises);
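Firing every request at once may run into rate limits or IP blocks when the page count grows. A minimal sketch of a bounded alternative follows: it processes the inputs with at most limit calls in flight and keeps the results in input order. The helper mapWithConcurrency is hypothetical and independent of the MultiOn SDK; a suitable limit is an assumption to tune.

```typescript
// Runs worker over inputs with at most `limit` calls in flight at a time.
// Results are returned in the same order as the inputs.
async function mapWithConcurrency<T, R>(
  inputs: T[],
  limit: number,
  worker: (input: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(inputs.length);
  let next = 0; // index of the next input to claim

  // Each lane repeatedly claims the next unprocessed input until none remain.
  async function runLane(): Promise<void> {
    while (next < inputs.length) {
      const index = next++;
      results[index] = await worker(inputs[index]);
    }
  }

  const laneCount = Math.max(1, Math.min(limit, inputs.length));
  await Promise.all(Array.from({ length: laneCount }, () => runLane()));
  return results;
}
```

With this helper, the page numbers 1 to 10 would be the inputs and the worker would wrap the retrieve call from the example above. If any worker rejects, the returned promise rejects as well, which matches the behavior of Promise.all.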