Scrape H&M with MultiOn
This example combines MultiOn step and retrieve to scrape the H&M website catalog.
Project setup
TypeScript
Python
Scrape first page
To scrape the first page of the H&M catalog, we can simply call retrieve.
However, you might notice that while the first few items are complete, the rest are incomplete and some are even broken—especially images. This is because H&M dynamically loads the images as the user scrolls down the page.
To help with this, we can use renderJs
to ensure image links are included and scrollToBottom
to scroll down the page.
If we only want 10 items from the page, we can use maxItems
to speed up the request.
Scrape multiple pages autonomously
To scrape multiple pages autonomously, we can use retrieve with step to navigate to next page. To do this, we must first create a session.
Then, we can create a while loop that will keep running until the last page. At each iteration, the agent will retrieve data and step to navigate to the next page.
Scrape multiple pages in parallel
To massively speed up the scraping process, we can call retrieve for each page simultaneously. This works for H&M because the URL is numbered for each page.