How to extract data and download files from a URL more efficiently
💡 Do you need to download files from multiple URLs where each URL is a unique download link? We've got you covered.
Problem Description
You need to extract a list of URLs and then perform an action on each of them. For example, you need to navigate to the following web page:
https://www.i-pex.com/library/white-paper
This page contains a list of links to white papers. You need to open each link to download the PDF file, which can be tedious when done with UI elements.
Solution
The following flow provides a high-level overview of the solution:
Firstly, we will need to extract the list of white papers as a datatable using the action “Extract data from web page”. Remember to specify the store data mode as “Variable”.
While the “Extract data from web page” action is open, manually navigate to ‘https://www.i-pex.com/library/white-paper’ in the browser. Right click the first title > Select Extract Element Value > Select Href
Right click the second title > Select Extract Element Value > Select Href
The Live web helper pop-up will then automatically extract the remaining items in the list. Update the column name if required (later steps refer to this column as URL) > Click Finish
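For readers who want to see the logic behind this extraction step outside the flow designer, the following is a minimal Python sketch of collecting the Href of every link on a page. It is not how the “Extract data from web page” action is implemented; the class name LinkCollector, the function extract_links, and the sample markup are all hypothetical and only illustrate the idea.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag found in an HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the URL of the listing page.
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return all link targets in the given HTML as absolute URLs."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Hypothetical markup standing in for the white-paper listing page.
SAMPLE_HTML = """
<ul>
  <li><a href="/library/white-paper/example-one">Example one</a></li>
  <li><a href="/library/white-paper/example-two">Example two</a></li>
</ul>
"""

links = extract_links(SAMPLE_HTML, "https://www.i-pex.com/library/white-paper")
```

The resulting list plays the same role as the URL column of the extracted datatable: one entry per white-paper page.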
Use an "If" action to check if the download directory exists. If it does not, create the directory.
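As an illustration of the same check in script form, here is a small Python sketch that creates a folder only when it is missing. The function name ensure_directory and the folder name white-papers are assumptions made for this example; a temporary location is used so that running it has no lasting side effects.

```python
from pathlib import Path
import tempfile


def ensure_directory(path):
    """Create the directory if it is missing; return True only if it was created."""
    directory = Path(path)
    if directory.is_dir():
        return False
    directory.mkdir(parents=True)
    return True


# Hypothetical download folder inside a throwaway temporary directory.
download_dir = Path(tempfile.mkdtemp()) / "white-papers"
created_first = ensure_directory(download_dir)   # folder is missing, so it is created
created_second = ensure_directory(download_dir)  # folder now exists, nothing to do
```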
Then, using the action “For each”, we loop through each data row of the datatable, storing the current row in “CurrentItem”. For each iteration of the loop:
We use the action “Go to web page” to navigate to the URL “%CurrentItem['URL']%”
We use the action “Extract data from web page” to extract the file URL. Remember to specify the store data mode as “Variable”. The download step below expects the extracted value in a variable named “Link”.
While the “Extract data from web page” action is open, manually navigate to one of the items. Right click the title > Select Extract element value > Select Href > Click Finish
We use the action “Download from web” to download the file
> Specify the “URL” as “%Link%”
> Configure the “Save response” and “File name”
> Specify the “Destination folder”
Close the web browser after the end loop action.
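The loop described above can also be sketched in Python to make the control flow explicit: visit each page, find the file link, and save the file to the destination folder. This is only an illustration under stated assumptions, not the flow itself. It assumes the file link is the first link on the page ending in .pdf (in the flow, the element is picked manually), and the names FirstPdfLink, fetch_bytes, download_all, and the example.com URLs are hypothetical.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen
import tempfile


class FirstPdfLink(HTMLParser):
    """Remembers the first <a href> whose target ends in .pdf."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and self.href is None and href and href.lower().endswith(".pdf"):
            self.href = href


def fetch_bytes(url):
    """Default fetcher: a plain HTTP GET using the standard library."""
    with urlopen(url) as response:
        return response.read()


def download_all(page_urls, destination, fetch=fetch_bytes):
    """Visit each page, find its PDF link, and save the file to destination."""
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    saved = []
    for page_url in page_urls:
        parser = FirstPdfLink()
        parser.feed(fetch(page_url).decode("utf-8", errors="replace"))
        if parser.href is None:
            continue  # No PDF link found on this page, so skip it.
        file_url = urljoin(page_url, parser.href)
        # Name the saved file after the last segment of the file URL.
        target = destination / Path(urlparse(file_url).path).name
        target.write_bytes(fetch(file_url))
        saved.append(target)
    return saved


# Offline demonstration with a fake fetcher and hypothetical URLs.
FAKE_SITE = {
    "https://example.com/papers/one": b'<a href="/files/one.pdf">Download</a>',
    "https://example.com/files/one.pdf": b"%PDF-1.4 placeholder",
}
saved_files = download_all(
    ["https://example.com/papers/one"],
    tempfile.mkdtemp(),
    fetch=FAKE_SITE.__getitem__,
)
```

Passing the fetcher as a parameter keeps the example runnable without network access; leaving it at its default would perform real HTTP requests against the given URLs.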
Additional Information
Last updated on: 9 Dec 2024
Tested version(s): 2.50.00183.24303
Prerequisites: Browser (e.g. Chrome)
Dependencies: None
Known issues: None
References
Nil