What is retrieve?

Retrieve is a function that extracts structured data from any webpage. Use it to scrape and summarize information without writing traditional web scraping scripts.

It can be used standalone or as part of an agent session. Combine retrieve with step to create autonomous web research agents that can navigate sites and retrieve full-page data.

Retrieve data

Call retrieve with a URL and a command to create a new agent session and start retrieving data. The data is returned as a JSON array of objects. Specifying fields is optional, but we recommend it for structured output. It also helps to describe what each field means, and its desired type, in cmd.

import { MultiOnClient } from "multion";

const multion = new MultiOnClient({ apiKey: "YOUR_API_KEY" });

const retrieveResponse = await multion.retrieve({
  cmd: "Get all posts on Hackernews with title, creator, time created, points as a number, number of comments as a number, and the post URL.",
  url: "https://news.ycombinator.com/",
  fields: ["title", "creator", "time", "points", "comments", "url"]
});

const data = retrieveResponse.data;
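
The advice above about describing each field and its type in cmd can be sketched as a small helper that derives both cmd and fields from one list. This is only an illustration: `FieldSpec` and `buildRetrieveRequest` are hypothetical names that are not part of the multion package, and the sentence it generates is just one possible phrasing.

```typescript
// Hypothetical helper, not part of the multion SDK: each field is described
// once, and both the `cmd` text and the `fields` array are derived from it.
interface FieldSpec {
  name: string;        // key expected in each returned object
  description: string; // what the field means, in plain language
  type?: "string" | "number" | "boolean"; // desired type, stated in cmd
}

function buildRetrieveRequest(subject: string, specs: FieldSpec[]) {
  // Append the desired type to the description only when one is given.
  const parts = specs.map((s) =>
    s.type ? `${s.description} as a ${s.type}` : s.description
  );
  return {
    cmd: `Get ${subject} with ${parts.join(", ")}.`,
    fields: specs.map((s) => s.name),
  };
}

const request = buildRetrieveRequest("all posts on Hackernews", [
  { name: "title", description: "title" },
  { name: "points", description: "points", type: "number" },
  { name: "url", description: "the post URL" },
]);
```

The resulting object could then be spread into the retrieve call together with a url, for example `multion.retrieve({ ...request, url: "https://news.ycombinator.com/" })`.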

Local mode

Use the local flag to run retrieve locally in your browser. Make sure the browser extension is installed and API Enabled is checked.

const retrieveResponse = await multion.retrieve({
  cmd: "Get all posts on Hackernews with title, creator, time created, points as a number, number of comments as a number, and the post URL.",
  url: "https://news.ycombinator.com/",
  fields: ["title", "creator", "time", "points", "comments", "url"],
  local: true
});

Max items

Use the max_items param to limit the number of items retrieved. This is helpful for pages with lots of data, which usually take more time to retrieve. The TypeScript examples on this page pass the camelCase form of each parameter, such as maxItems.

const retrieveResponse = await multion.retrieve({
  cmd: "Get all posts on Hackernews with title, creator, time created, points as a number, number of comments as a number, and the post URL.",
  url: "https://news.ycombinator.com/",
  fields: ["title", "creator", "time", "points", "comments", "url"],
  maxItems: 10
});
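
The prose on this page names parameters in snake_case (max_items), while the TypeScript examples pass camelCase keys (maxItems). The relationship between the two spellings can be sketched as below. `toCamelCase` is a hypothetical helper written for illustration, not something the multion package is known to export.

```typescript
// Converts a snake_case parameter name, as written in the prose of this page,
// to the camelCase key used in the TypeScript examples.
function toCamelCase(param: string): string {
  return param.replace(/_([a-z])/g, (_match, letter: string) =>
    letter.toUpperCase()
  );
}

// Flag names as they appear in the prose on this page.
const proseFlags = [
  "max_items",
  "full_page",
  "render_js",
  "scroll_to_bottom",
  "include_screenshot",
];

// Keys as they appear in the code samples:
// maxItems, fullPage, renderJs, scrollToBottom, includeScreenshot
const sdkKeys = proseFlags.map(toCamelCase);
```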

Retrieve viewport only

Set the full_page flag to false to retrieve from the agent viewport only. By default, retrieve crawls the full page regardless of scroll position. Note that crawling the full page does not move the viewport, so dynamically loaded content may still be missing. To make sure all content is loaded, use scroll_to_bottom.

const retrieveResponse = await multion.retrieve({
  cmd: "Get all posts on Hackernews with title, creator, time created, points as a number, number of comments as a number, and the post URL.",
  url: "https://news.ycombinator.com/",
  fields: ["title", "creator", "time", "points", "comments", "url"],
  fullPage: false
});

Retrieve JS elements

Use the render_js flag to render and retrieve JS and ARIA elements. This is helpful for retrieving image URLs, but will slow down the request.

const retrieveResponse = await multion.retrieve({
  cmd: "Get all posts on Hackernews with title, creator, time created, points as a number, number of comments as a number, and the post URL.",
  url: "https://news.ycombinator.com/",
  fields: ["title", "creator", "time", "points", "comments", "url"],
  renderJs: true
});

Scroll to bottom

Use the scroll_to_bottom flag to scroll to the bottom of the page before retrieving data. This is helpful for websites that dynamically load more content as you scroll down. If the retrieved data has more fields in the first few items or returns only items from the top of the page, consider setting scroll_to_bottom to true.

const retrieveResponse = await multion.retrieve({
  cmd: "Get all posts on Hackernews with title, creator, time created, points as a number, number of comments as a number, and the post URL.",
  url: "https://news.ycombinator.com/",
  fields: ["title", "creator", "time", "points", "comments", "url"],
  scrollToBottom: true
});
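
The symptom described above, where the first few items carry more fields than the rest, can be checked with a rough heuristic. The sketch below is an assumption about how one might detect it on the client side. `looksPartiallyLoaded` and `RetrievedItem` are hypothetical names, and the default head size of 3 is arbitrary.

```typescript
type RetrievedItem = Record<string, unknown>;

// Counts fields that carry a usable value (not null, undefined, or "").
function filledFieldCount(item: RetrievedItem): number {
  return Object.values(item).filter(
    (v) => v !== null && v !== undefined && v !== ""
  ).length;
}

// Heuristic (an assumption, not SDK behaviour): returns true when the first
// `headSize` items average more filled fields than the remaining items.
function looksPartiallyLoaded(items: RetrievedItem[], headSize = 3): boolean {
  if (items.length <= headSize) return false; // nothing to compare against
  const average = (xs: RetrievedItem[]) =>
    xs.reduce((sum, x) => sum + filledFieldCount(x), 0) / xs.length;
  return average(items.slice(0, headSize)) > average(items.slice(headSize));
}
```

If the check returns true for `retrieveResponse.data`, rerunning the call with scrollToBottom set to true is a reasonable next step.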

Get retrieve screenshot

Use the include_screenshot flag to include a screenshot URL of the retrieval in the response.

const retrieveResponse = await multion.retrieve({
  cmd: "Get all posts on Hackernews with title, creator, time created, points as a number, number of comments as a number, and the post URL.",
  url: "https://news.ycombinator.com/",
  fields: ["title", "creator", "time", "points", "comments", "url"],
  includeScreenshot: true
});

const screenshot = retrieveResponse.screenshot;

Retrieve as part of a session

Use retrieve as part of a session to retrieve data alongside step actions. Calling retrieve without a session ID will create a new session. You can get the new session ID from the response.

import { MultiOnClient } from "multion";

const multion = new MultiOnClient({ apiKey: "YOUR_API_KEY" });

const retrieveResponse = await multion.retrieve({
  cmd: "Get all posts on Hackernews with title, creator, time created, points as a number, number of comments as a number, and the post URL.",
  url: "https://news.ycombinator.com/",
  fields: ["title", "creator", "time", "points", "comments", "url"]
});

const sessionId = retrieveResponse.sessionId;

You can also call retrieve with a session ID to use it as part of an already-created session.

import { MultiOnClient } from "multion";

const multion = new MultiOnClient({ apiKey: "YOUR_API_KEY" });

const createResponse = await multion.sessions.create({
  url: "https://news.ycombinator.com/"
});

const sessionId = createResponse.sessionId;

const retrieveResponse = await multion.retrieve({
  sessionId: sessionId,
  cmd: "Get all posts on Hackernews with title, creator, time created, points as a number, number of comments as a number, and the post URL.",
  fields: ["title", "creator", "time", "points", "comments", "url"]
});
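
The two calling styles in this section (a URL to start a new session, or a session ID to reuse one) can be captured in a small argument builder. This is a sketch under assumptions: `RetrieveArgs` and `sessionRetrieveArgs` are hypothetical local names that mirror only the argument keys shown in this page's examples, not the SDK's exported types.

```typescript
// Local sketch of the arguments used in this page's examples; illustrative
// only, not the SDK's own type definitions.
interface RetrieveArgs {
  cmd: string;
  fields: string[];
  url?: string;
  sessionId?: string;
}

// Reuses a session when an ID is known; otherwise passes the URL so that
// retrieve starts a new session.
function sessionRetrieveArgs(
  cmd: string,
  fields: string[],
  target: { sessionId: string } | { url: string }
): RetrieveArgs {
  return "sessionId" in target
    ? { cmd, fields, sessionId: target.sessionId }
    : { cmd, fields, url: target.url };
}

// First call: no session yet, so the URL is passed.
const firstArgs = sessionRetrieveArgs("Get all post titles.", ["title"], {
  url: "https://news.ycombinator.com/",
});

// Later call: reuse the session ID taken from an earlier response.
const laterArgs = sessionRetrieveArgs("Get all post titles.", ["title"], {
  sessionId: "SESSION_ID_FROM_RESPONSE", // placeholder value
});
```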

Use proxy

Use the use_proxy flag when creating a session to enable a proxy for retrieve, which helps bypass IP blocks and bot protections. When the proxy is enabled, the agent will be slightly slower to respond.

import { MultiOnClient } from "multion";

const multion = new MultiOnClient({ apiKey: "YOUR_API_KEY" });

const createResponse = await multion.sessions.create({
  url: "https://news.ycombinator.com/",
  useProxy: true
});

const sessionId = createResponse.sessionId;

const retrieveResponse = await multion.retrieve({
  sessionId: sessionId,
  cmd: "Get all posts on Hackernews with title, creator, time created, points as a number, number of comments as a number, and the post URL.",
  fields: ["title", "creator", "time", "points", "comments", "url"]
});