Scrape H&M with MultiOn

This example combines MultiOn step and retrieve to scrape the H&M website catalog.

Project setup

TypeScript

Python

Initialize project

Create a new project by running the following command in your terminal:

$ npm init

Install package

Install the multion package by running the following command in your terminal:

$ npm install multion

Import library

Create a new file called index.ts and import the required library for the example:

1 import { MultiOnClient } from 'multion';

Initialize client

Initialize the MultiOn client with your API key.

1 const multion = new MultiOnClient({ apiKey: "YOUR_API_KEY" });

Run script

Run your script by running the following command in your terminal:

$ node index.ts

Scrape first page

To scrape the first page of the H&M catalog, we can simply call retrieve.

1 const retrieveResponse = await multion.retrieve({
2   url: "https://www2.hm.com/en_us/men/products/view-all.html",
3   cmd: "Get all items and their name, price, colors, purchase url, and image url.",
4   fields: ["name", "price", "colors", "purchase_url", "image_url"]
5 });
6 
7 const data = retrieveResponse.data;
8 console.log(data);

However, you might notice that while the first few items are complete, the rest are incomplete and some are even broken—especially images. This is because H&M dynamically loads the images as the user scrolls down the page.

To help with this, we can use renderJs to ensure image links are included and scrollToBottom to scroll down the page.

1 const retrieveResponse = await multion.retrieve({
2   url: "https://www2.hm.com/en_us/men/products/view-all.html",
3   cmd: "Get all items and their name, price, colors, purchase url, and image url.",
4   fields: ["name", "price", "colors", "purchase_url", "image_url"],
5   renderJs: true,
6   scrollToBottom: true
7 });
8 
9 const data = retrieveResponse.data;
10 console.log(data);

If we only want 10 items from the page, we can use maxItems to speed up the request.

1 const retrieveResponse = await multion.retrieve({
2   url: "https://www2.hm.com/en_us/men/products/view-all.html",
3   cmd: "Get all items and their name, price, colors, purchase url, and image url.",
4   fields: ["name", "price", "colors", "purchase_url", "image_url"],
5   renderJs: true,
6   scrollToBottom: true,
7   maxItems: 10
8 });
9 
10 const data = retrieveResponse.data;
11 console.log(data);

Scrape multiple pages autonomously

To scrape multiple pages autonomously, we can use retrieve with step to navigate to next page. To do this, we must first create a session.

1 const createResponse = await multion.sessions.create({
2   url: "https://www2.hm.com/en_us/men/products/view-all.html"
3   // Can set useProxy to true to circumvent IP block
4 });
5 
6 const sessionId = createResponse.sessionId;
7 console.log("Session created: ", sessionId);

Then, we can create a while loop that will keep running until the last page. At each iteration, the agent will retrieve data and step to navigate to the next page.

1 let hasMore = true;
2 while (hasMore) {
3   const retrieveResponse = await multion.retrieve({
4     sessionId: sessionId,
5     cmd: "Get all items and their name, price, colors, purchase url, and image url.",
6     fields: ["name", "price", "colors", "purchase_url", "image_url"],
7     renderJs: true,
8     scrollToBottom: true
9   });
10   console.log("Data retrieved: ", retrieveResponse.data);
11   const stepResponse = await multion.sessions.step(sessionId, {
12     cmd: "Keep clicking on the next page button.",
13     mode: "fast",
14   });
15   console.log("Navigating to next page: ", stepResponse.message);
16   hasMore = !stepResponse.message.includes("last page"); 
17   // Can implement better way of checking if more pages to scrape
18 }

Scrape multiple pages in parallel

To massively speed up the scraping process, we can call retrieve for each page simultaneously. This works for H&M because the URL is numbered for each page.

1 const pagePromises = Array.from({ length: 10 }, (_, i) => i + 1).map(async (i) => {
2   const retrieveResponse = await multion.retrieve({
3     url: `https://www2.hm.com/en_us/men/products/view-all.html?page=${i}`,
4     cmd: "Get all items and their name, price, colors, purchase url, and image url.",
5     fields: ["name", "price", "colors", "purchase_url", "image_url"],
6     renderJs: true,
7     scrollToBottom: true
8   });
9   console.log(`Data retrieved for page ${i}: `, retrieveResponse.data);
10   return retrieveResponse.data;
11 });
12 
13 await Promise.all(pagePromises);