Retrieve
What is retrieve?
Retrieve is a function that allows you to retrieve structured data from any webpage. Use it to scrape and summarize information without creating traditional web scraping scripts.
It can be used standalone or as part of an agent session. Combine retrieve with step to create autonomous web research agents that can navigate sites and retrieve full-page data.
Retrieve data
Call retrieve with a URL and a command to create a new agent session and start retrieving data. The data is returned as a JSON array of objects. Although it is optional, we recommend specifying fields for structured output. It also helps to describe what each field means and its desired type in cmd.
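As a rough illustration of the advice above, the sketch below assembles the arguments for one retrieve call as a plain dictionary. It does not call the service. The helper name `build_retrieve_payload`, the `url` key, and the example URL are assumptions made for illustration; only `cmd` and `fields` are named in the text.

```python
import json

def build_retrieve_payload(url, cmd, fields=None):
    """Assemble retrieve arguments as a dict (hypothetical helper, not the real client)."""
    payload = {"url": url, "cmd": cmd}
    if fields:
        # fields is optional, but listing it asks for structured output
        payload["fields"] = list(fields)
    return payload

payload = build_retrieve_payload(
    url="https://example.com/products",  # placeholder URL
    # cmd spells out what each field means and its desired type
    cmd="Get each product with its name (string) and price (number, in USD).",
    fields=["name", "price"],
)
print(json.dumps(payload, indent=2))
```

Describing the meaning and type of every field inside cmd, as done here, keeps the field list short while still telling the agent what shape each value should take.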
Local mode
Use the local flag to run retrieve locally in your browser. Make sure the browser extension is installed and API Enabled is checked.
Max items
Use the max_items param to limit the number of items to retrieve. This is helpful for pages with a lot of data, which usually take longer to retrieve.
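A small sketch of adding a cap to a set of retrieve arguments, assuming they are held in a dictionary. The helper `with_max_items` is hypothetical, and the positive-integer check is a guard added for illustration, not documented behavior of the service.

```python
def with_max_items(payload, max_items):
    """Return a copy of payload with a max_items cap (hypothetical helper)."""
    if not isinstance(max_items, int) or max_items < 1:
        # Assumption: a cap below 1 is never meaningful, so reject it early
        raise ValueError("max_items should be a positive integer")
    capped = dict(payload)  # copy so the caller's dict is not mutated
    capped["max_items"] = max_items
    return capped

base = {"url": "https://example.com/listings", "cmd": "Get every listing title."}
capped = with_max_items(base, 10)
print(capped)
```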
Retrieve viewport only
Set the full_page flag to false to retrieve from the agent viewport only. By default, retrieve crawls the full page regardless of scroll position. Note that crawling the full page does not move the viewport, so dynamically loaded content may not be loaded yet. To ensure all content is loaded, use the scroll_to_bottom flag.
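The interaction between the two flags can be summarized in a short sketch. The function `page_scope_flags` and its argument names are hypothetical; only `full_page` and `scroll_to_bottom` come from this page, and `full_page` defaulting to true is taken from the description above.

```python
def page_scope_flags(viewport_only=False, lazy_loaded=False):
    """Pick page-scope flags for a retrieve call (hypothetical helper)."""
    # full_page=False restricts retrieval to the agent viewport
    flags = {"full_page": not viewport_only}
    if lazy_loaded:
        # A full-page crawl does not move the viewport, so lazily loaded
        # content needs an explicit scroll before retrieval
        flags["scroll_to_bottom"] = True
    return flags

print(page_scope_flags(viewport_only=True))
print(page_scope_flags(lazy_loaded=True))
```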
Retrieve JS elements
Use the render_js flag to render and retrieve JS and ARIA elements. This is helpful for retrieving image URLs, but it slows down the request.
Scroll to bottom
Use the scroll_to_bottom flag to scroll to the bottom of the page before retrieving data. This is helpful for websites that dynamically load more content as you scroll down. If the first few retrieved items have more fields populated than later ones, or only items from the top of the page are returned, consider setting scroll_to_bottom to true.
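The first symptom described above can be checked mechanically on the returned JSON array. The sketch below is one possible heuristic, not part of the product: `may_need_scroll` is a hypothetical name, and it only compares how many fields are filled in the first few items against the rest. It does not detect the second symptom (only items from the top of the page being returned).

```python
def may_need_scroll(items, head=3):
    """Guess whether later items are sparser than the first `head` items."""
    def filled(item):
        # Count fields that carry an actual value
        return sum(1 for v in item.values() if v not in (None, "", [], {}))

    if len(items) <= head:
        return False  # too few items to compare head against tail
    tail = items[head:]
    head_avg = sum(filled(i) for i in items[:head]) / head
    tail_avg = sum(filled(i) for i in tail) / len(tail)
    return tail_avg < head_avg

# Made-up sample data: image URLs are present only for the first three items
sample = [
    {"name": "A", "image": "https://example.com/a.png"},
    {"name": "B", "image": "https://example.com/b.png"},
    {"name": "C", "image": "https://example.com/c.png"},
    {"name": "D", "image": ""},
    {"name": "E", "image": None},
]
print(may_need_scroll(sample))
```

If the check returns true, repeating the call with scroll_to_bottom set to true is a reasonable next step.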
Get retrieve screenshot
Use the include_screenshot flag to include a screenshot URL of the retrieval in the response.
Retrieve as part of a session
Use retrieve as part of a session to retrieve data alongside step actions. Calling retrieve without a session ID will create a new session. You can get the new session ID from the response.
You can also call retrieve with a session ID to use it as part of an already-created session.
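The session flow can be sketched with a stand-in function. `fake_retrieve` below is not the real client: it only mimics the behavior described above, and the response keys `session_id` and `data` are assumed names chosen for illustration.

```python
import itertools

_ids = itertools.count(1)

def fake_retrieve(cmd, url=None, session_id=None):
    """Stand-in for retrieve that mimics only the session handling."""
    if session_id is None:
        # No session ID supplied, so a new session is created
        session_id = f"session-{next(_ids)}"
    return {"session_id": session_id, "data": []}

# First call: no session ID, so the response carries a new one
first = fake_retrieve(cmd="Get all article titles.", url="https://example.com/news")

# Second call: reuse the ID to keep working in the same session
second = fake_retrieve(cmd="Get all article authors.", session_id=first["session_id"])

print(first["session_id"], second["session_id"])
```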
Use proxy
Use the use_proxy flag with create session to enable a proxy for retrieve and bypass IP blocks and bot protections. When enabled, the agent will be slightly slower to respond.