If you want to scrape content from websites using Python, this tutorial should help you a lot. A few months back, I decided to scrape some site content. I did a lot of research, but I couldn't find one tutorial that covered all the steps, because each article contained only a small amount of information about Python-based web scraping. So I decided to write this article.
How to set up Python with Selenium and visit webpages
You need five things to follow this tutorial.
- Python (you must install Python on your system)
- Code editor (I prefer Visual Studio Code)
- Google Chrome (you must install Chrome on your system if you don't have it)
- chromedriver.exe file (from chromium.org)
- Selenium (install Selenium using the Python pip command)
First, install Selenium using the following pip command:
pip install selenium
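To confirm that the install worked, you can ask Python which version of the package it sees. This is a small sketch using only the standard library (Python 3.8 or newer); the helper name installed_version is my own and is not part of Selenium.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


selenium_version = installed_version("selenium")
if selenium_version is None:
    print("selenium is not installed; run: pip install selenium")
else:
    print(f"selenium {selenium_version} is installed")
```

If the script reports that Selenium is missing, check that pip and python belong to the same Python installation.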
In this tutorial, I am using Google Chrome to scrape website content, so I have to download chromedriver.exe. Find the version of Google Chrome currently installed on your system and download the matching version of the driver from chromium.org.
To find your Google Chrome version, go to
chrome://settings/help
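After you have identified the version, go to https://chromedriver.chromium.org/downloads (for Chrome 115 and newer, that page points you to the Chrome for Testing availability dashboard instead).
Download the driver, unzip it, and store the .exe file in a folder of your choice.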
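If you are not sure whether the driver you downloaded matches your browser, you can compare the two version numbers in code. The sketch below assumes that a driver fits when the first three parts of the version number (major, minor, build) agree with Chrome's. The helper names and the sample version strings are only for illustration.

```python
def build_prefix(version):
    """Return the first three dot-separated parts of a version, e.g. '114.0.5735'."""
    parts = version.strip().split(".")
    if len(parts) < 3 or not all(part.isdigit() for part in parts):
        raise ValueError(f"unexpected version format: {version!r}")
    return ".".join(parts[:3])


def driver_matches_chrome(chrome_version, driver_version):
    """True if both versions share the same major.minor.build prefix (assumed rule)."""
    return build_prefix(chrome_version) == build_prefix(driver_version)


# Illustrative values: copy yours from chrome://settings/help and the download page
print(driver_matches_chrome("114.0.5735.199", "114.0.5735.90"))
print(driver_matches_chrome("114.0.5735.199", "113.0.5672.63"))
```

The first call prints True because only the last part differs. The second prints False because the major version differs.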
Create a new Python file and add the following code.
Final code (Stage 1):-
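from selenium import webdriver
from selenium.webdriver.chrome.service import Service

Driver_location = '/pathto/chromedriver'
# Selenium 4 takes the driver path through a Service object
# (recent releases no longer accept the executable_path argument)
driver = webdriver.Chrome(service=Service(Driver_location))
driver.get('https://google.com')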
Note: Replace '/pathto/chromedriver' with the actual path of the chromedriver file.
Example:-
# The r prefix makes this a raw string, so the backslashes are kept as-is
Driver_location = r'D:\chrome-driver\chromedriver.exe'
If you run the Stage 1 script above, it will open a new Chrome browser (without any extensions) and load google.com.
If the Chrome window opens at a reduced size, use the following command to maximize it.
driver.maximize_window()
If you want to keep a webpage open for a set time and then close the browser, add the following commands to the previous ones.
import time

# Pause the script for 5 seconds
time.sleep(5)
# Close the browser window
driver.close()
Python runs the code line by line, so insert time.sleep(5) after the line that loads the webpage (driver.get('https://google.com')).
Final code (Stage 2):-
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import time

# Change this value to your chromedriver.exe file path
Driver_location = r'D:\chrome-driver\chromedriver.exe'
driver = webdriver.Chrome(service=Service(Driver_location))
driver.get('https://google.com')
driver.maximize_window()
time.sleep(5)
driver.close()
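If the script crashes between opening and closing the browser, the window stays open. One possible way to guard against that is to wrap the driver in a context manager, so the browser is shut down even when an error occurs. This is only a sketch: browser_session, make_chrome and demo are names I made up, and demo needs Selenium 4, Chrome and a matching chromedriver to actually run.

```python
from contextlib import contextmanager
import time


@contextmanager
def browser_session(create_driver):
    """Create a driver, hand it to the caller, and always quit it afterwards."""
    driver = create_driver()
    try:
        yield driver
    finally:
        # quit() ends the whole session; close() only closes the current window
        driver.quit()


def make_chrome(driver_location):
    """Build a Chrome driver from the path of the chromedriver file."""
    # Imported here so the helpers above can be loaded without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    return webdriver.Chrome(service=Service(driver_location))


def demo(driver_location):
    """Not called automatically: running it opens a real browser window."""
    with browser_session(lambda: make_chrome(driver_location)) as driver:
        driver.get("https://google.com")
        driver.maximize_window()
        time.sleep(5)
```

To try it, call demo() with the path of your own chromedriver file.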
Use the following code to navigate between Chrome tabs.
To open a new empty tab
driver.execute_script("window.open('');")
To switch between tabs (tab indexes start from zero, e.g. window_handles[0])
driver.switch_to.window(driver.window_handles[1])
Change the index in window_handles[1] to 0 or 1 to move to the previous or next tab.
Final code (Stage 3):-
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import time

# Change this value to your chromedriver.exe file path
Driver_location = r'D:\chrome-driver\chromedriver.exe'
driver = webdriver.Chrome(service=Service(Driver_location))
driver.get('https://google.com')
driver.maximize_window()
# To open a new tab
driver.execute_script("window.open('');")
time.sleep(5)
# Switch to the new tab and load a page in it
driver.switch_to.window(driver.window_handles[1])
driver.get("https://facebook.com")
# Switch back to the first tab
driver.switch_to.window(driver.window_handles[0])
time.sleep(5)
# To close the browser window
driver.close()
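Indexing window_handles directly raises a bare IndexError if the tab does not exist. A small wrapper can give a clearer message. The sketch below uses only driver.window_handles and driver.switch_to.window from the code above; the function name switch_to_tab is my own.

```python
def switch_to_tab(driver, index):
    """Switch the driver to the tab at the given zero-based index and return its handle."""
    handles = driver.window_handles
    if not 0 <= index < len(handles):
        raise IndexError(
            f"tab index {index} is out of range; {len(handles)} tab(s) are open"
        )
    driver.switch_to.window(handles[index])
    return handles[index]
```

With this helper, switch_to_tab(driver, 1) does the same thing as driver.switch_to.window(driver.window_handles[1]), but reports how many tabs are open when the index is wrong.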
To open a webpage in a new tab (even if you already have a page open)
driver.execute_script("window.open('https://www.youtube.com/', 'new_window')")
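After window.open runs, the driver still points at the old tab, so you normally have to switch to the new one yourself. The sketch below combines both steps. It passes the URL as a script argument (arguments[0]) instead of pasting it into the JavaScript string, which avoids quoting problems. The function name open_in_new_tab is my own, and I am assuming the new tab shows up in window_handles right after the script returns.

```python
def open_in_new_tab(driver, url):
    """Open url in a new tab, switch the driver to it, and return the new handle."""
    before = set(driver.window_handles)
    # Selenium passes url to the script as arguments[0], so no manual quoting is needed
    driver.execute_script("window.open(arguments[0], '_blank');", url)
    new_handles = [h for h in driver.window_handles if h not in before]
    if not new_handles:
        raise RuntimeError("no new tab was opened")
    driver.switch_to.window(new_handles[0])
    return new_handles[0]
```

Selenium 4 also offers driver.switch_to.new_window('tab'), which opens an empty tab and switches to it in one call.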
To get details of the currently open page, use the following commands.
To get the currently opened website URL
print(driver.current_url)
To get the current page title
print(driver.title)
To get the webpage source
print(driver.page_source)
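Printing the full page source usually floods the terminal. As a rough sketch, the three properties above can be collected into one dictionary that reports only the length of the source. The function name page_summary and the dictionary keys are my own choices.

```python
def page_summary(driver):
    """Collect the URL, title, and source length of the page the driver is on."""
    source = driver.page_source
    return {
        "url": driver.current_url,
        "title": driver.title,
        "source_length": len(source),
    }
```

Call print(page_summary(driver)) after driver.get(...) to see the result for the loaded page.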
Common Selenium Problems & solutions
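Problem 1: The browser closes automatically after Selenium loads the webpage, even though we did not add a driver.quit() command.
Reason for the problem:-
This most likely happens because, in newer Selenium versions, the chromedriver process is shut down when the script ends, and by default Chrome quits along with it.
Solution: Enable the experimental option named detach, which lets Chrome keep running after the script ends.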
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
# Keep Chrome open after the script ends
options.add_experimental_option('detach', True)
driver = webdriver.Chrome(options=options)
That's all. Now launch the website using the driver.get("https://www.google.com") command. Note that the URL must include the scheme (https://).
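driver.get expects a full URL, so an address like www.google.com without http:// or https:// is rejected. If your URLs come from a file or from user input, a small helper can add the missing scheme. This is a sketch with a deliberately simple rule (it only checks for "://"); the function name ensure_scheme is my own.

```python
def ensure_scheme(url, default_scheme="https"):
    """Return url unchanged if it already contains '://', else prefix a scheme."""
    url = url.strip()
    if "://" in url:
        return url
    # Simple rule: anything without '://' is treated as a bare host or path
    return f"{default_scheme}://{url}"


print(ensure_scheme("www.google.com"))
print(ensure_scheme("https://google.com"))
```

You can then call driver.get(ensure_scheme(address)) instead of passing the raw address.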