by Arun Vasudev N Jan 31, 2023 Python

Python Web Scraping with Selenium (Tutorial for Beginners)

If you want to scrape content from websites using Python, this tutorial should help you a lot. A few months back, I decided to scrape some site content. I did a lot of research, but I couldn't find a single tutorial that covered all the steps.

Each article I found covered only a small part of Python-based web scraping, so I decided to write this one.

How to set up Python with Selenium and visit webpages

You need five things to follow this tutorial.

Python (you must install Python on your system)
Code editor (I prefer Visual Studio Code)
Google Chrome (install it if you don't already have it)
chromedriver.exe file (from chromium.org)
Selenium (installed using the Python pip command)
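
Before going further, it can help to confirm from Python itself that some of these prerequisites are in place. The following is a minimal sketch using only the standard library. The function name check_prerequisites is hypothetical, and the default program name it looks up ("chromedriver") is an assumption: it will only be found if the driver's folder is on your PATH, which this tutorial does not require.

```python
import importlib.util
import shutil
import sys


def check_prerequisites(executables=("chromedriver",)):
    """Return a dict describing which prerequisites were found.

    `executables` lists program names to look up on PATH; the default
    name is an assumption and may differ on your system.
    """
    report = {
        # Version of the Python interpreter running this script
        "python_version": "%d.%d.%d" % sys.version_info[:3],
        # True if the selenium package can be located for import
        "selenium_installed": importlib.util.find_spec("selenium") is not None,
    }
    for name in executables:
        # shutil.which returns the full path, or None if not on PATH
        report[name] = shutil.which(name)
    return report


for item, value in check_prerequisites().items():
    print(f"{item}: {value}")
```

A value of None for chromedriver is not an error here, because the scripts below pass the driver's full path explicitly.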

First, install Selenium using the following pip command:

pip install selenium

In this tutorial, I am using Google Chrome to scrape website content, so I need to download chromedriver.exe. Find the version of Google Chrome installed on your system and download the matching version of the driver from chromium.org.

To find your Google Chrome version, go to

chrome://settings/help

After you have identified the version, go to https://chromedriver.chromium.org/downloads

Download the driver, unzip it, and store the .exe file in a folder of your choice.
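
Comparing version strings by eye is error-prone, so a small check can help. The helper below is only a sketch: versions_match is a hypothetical name, and the rule it applies (the first three dot-separated numbers must be equal) is an assumption based on ChromeDriver's version-selection guidance, not something Selenium checks for you. The version numbers in the example are illustrative.

```python
def versions_match(chrome_version, driver_version, parts=3):
    """Return True if the first `parts` components of both versions agree.

    Both arguments are dotted version strings such as "109.0.5414.120".
    """
    chrome_parts = chrome_version.strip().split(".")[:parts]
    driver_parts = driver_version.strip().split(".")[:parts]
    # Both strings must actually contain `parts` components to compare
    if len(chrome_parts) < parts or len(driver_parts) < parts:
        return False
    return chrome_parts == driver_parts


# Illustrative values only; read your own from chrome://settings/help
print(versions_match("109.0.5414.120", "109.0.5414.74"))
print(versions_match("109.0.5414.120", "108.0.5359.71"))
```

Pass parts=1 if you only want to compare the major version.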

Create a new Python file and add the following code.

Final code (Stage 1):-

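from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Path to the ChromeDriver executable
Driver_location = '/pathto/chromedriver'
# Selenium 4 takes the driver path via Service (executable_path is deprecated)
driver = webdriver.Chrome(service=Service(Driver_location))
driver.get('https://google.com')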

Note: Replace '/pathto/chromedriver' with the actual path of the ChromeDriver file.

Example:-

Driver_location = r'D:\chrome-driver\chromedriver.exe'
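
A wrong driver path is a common reason for the script failing to start. As a minimal sketch, the path can be checked before it is handed to Selenium. resolve_driver_path is a hypothetical helper name (not part of Selenium), and the example path is the one used above.

```python
from pathlib import Path


def resolve_driver_path(location):
    """Return the absolute driver path, or raise if the file is missing."""
    path = Path(location).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"ChromeDriver not found at: {path}")
    return str(path.resolve())


# The r prefix makes this a raw string, so backslashes in a Windows
# path are kept as-is instead of being read as escape sequences.
Driver_location = r'D:\chrome-driver\chromedriver.exe'

try:
    print(resolve_driver_path(Driver_location))
except FileNotFoundError as error:
    print(error)
```

If the file is missing, you get a readable message naming the path instead of a longer Selenium traceback.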

If you run the above script, it will open a new Chrome window (without any extensions) and load google.com.

If the Chrome window opens at a reduced size, use the following command to maximize it.

driver.maximize_window()

If you want to keep a webpage open for a set time and then close the browser, add the following commands to the previous code.

import time

# Pause the script for 5 seconds
time.sleep(5)

# Close the current browser window
driver.close()

Python runs the code line by line, so insert time.sleep(5) after the line that loads the webpage (driver.get('https://google.com')).
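
Because the statements run in order, an error raised before the closing call would leave the browser open. One way to guard against that, sketched here, is to wrap the wait and the cleanup in a context manager. keep_open is a hypothetical helper (not part of Selenium). It only assumes that the object passed in has a quit() method, which Selenium's WebDriver does.

```python
import time
from contextlib import contextmanager


@contextmanager
def keep_open(driver, seconds=5):
    """Yield the driver, wait `seconds`, then always shut the browser down."""
    try:
        yield driver
        # Runs only if the body finished without an error
        time.sleep(seconds)
    finally:
        # Runs on success and on failure; quit() ends the whole session
        driver.quit()


# Usage sketch (needs a real driver created as in the code above):
# with keep_open(driver, seconds=5) as browser:
#     browser.get('https://google.com')
```

The sleep happens after the body of the with block, so the page stays visible for the chosen time before the browser is shut down.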

Final code (Stage 2):-

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

import time

# Change this value to your chromedriver.exe file path
Driver_location = r'D:\chrome-driver\chromedriver.exe'

driver = webdriver.Chrome(service=Service(Driver_location))
driver.get('https://google.com')

driver.maximize_window()

time.sleep(5)

driver.close()

Use the following code to navigate between Chrome tabs.

To open a new empty tab

driver.execute_script("window.open('');")

To switch between tabs (tab indexes start from zero, e.g. window_handles[0])

driver.switch_to.window(driver.window_handles[1])

Change the index in window_handles[1] to 0 or 1 to move to the previous or next tab.
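
Indexing window_handles directly raises a bare IndexError when the tab does not exist. A small wrapper, sketched below, can make that failure easier to read. switch_to_tab is a hypothetical helper name. It relies only on driver.window_handles and driver.switch_to.window(), both used above.

```python
def switch_to_tab(driver, index):
    """Switch to the tab at `index` (0-based) and return its handle."""
    handles = driver.window_handles
    if not 0 <= index < len(handles):
        raise IndexError(
            f"Tab {index} does not exist; {len(handles)} tab(s) are open"
        )
    driver.switch_to.window(handles[index])
    return handles[index]


# Usage sketch (needs a running driver with at least two tabs):
# switch_to_tab(driver, 1)   # go to the second tab
# switch_to_tab(driver, 0)   # back to the first tab
```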

Final code (Stage 3):-

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import time

# Change this value to your chromedriver.exe file path
Driver_location = r'D:\chrome-driver\chromedriver.exe'

driver = webdriver.Chrome(service=Service(Driver_location))
driver.get('https://google.com')
driver.maximize_window()

# Open a new empty tab
driver.execute_script("window.open('');")
time.sleep(5)

# Switch to the new tab and load a page in it
driver.switch_to.window(driver.window_handles[1])
driver.get("https://facebook.com")
# Switch back to the first tab
driver.switch_to.window(driver.window_handles[0])
time.sleep(5)

# Close every tab and end the session
driver.quit()

To open a webpage in a new tab (even if you already have a page open)

driver.execute_script("window.open('https://www.youtube.com/', 'new_window')")
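
After window.open() runs, Selenium stays focused on the original tab until you switch. The sketch below opens a URL in a new tab and switches to it by comparing the handle list before and after the call, so it does not depend on the order of window_handles. open_in_new_tab is a hypothetical helper. It passes the URL through arguments[0], which is how execute_script exposes its extra arguments to the script, and it uses the '_blank' target to request a fresh tab on every call, whereas a named target such as 'new_window' is reused by later calls.

```python
def open_in_new_tab(driver, url):
    """Open `url` in a new tab, switch to it, and return the new handle."""
    before = set(driver.window_handles)
    # arguments[0] is the first extra argument given to execute_script
    driver.execute_script("window.open(arguments[0], '_blank');", url)
    new_handles = [h for h in driver.window_handles if h not in before]
    if not new_handles:
        raise RuntimeError("No new tab was opened")
    driver.switch_to.window(new_handles[0])
    return new_handles[0]


# Usage sketch (needs a running driver):
# open_in_new_tab(driver, 'https://www.youtube.com/')
```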

You can also read details of the currently open page.

To get the URL of the currently open page

print(driver.current_url)

To get the current page title

print(driver.title)

To get the webpage source

print(driver.page_source)
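
The three properties above can be collected in one call, which is handy when logging what was scraped. This is a minimal sketch. page_summary is a hypothetical helper, and reporting the length of the source rather than the full HTML is only a choice to keep the output short.

```python
def page_summary(driver):
    """Return the URL, title, and HTML size of the currently open page."""
    source = driver.page_source
    return {
        "url": driver.current_url,
        "title": driver.title,
        "source_length": len(source),
    }


# Usage sketch (needs a running driver with a page loaded):
# print(page_summary(driver))
```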

Common Selenium Problems & Solutions

Problem 1: The browser closes automatically after the webpage loads, even though the script never calls driver.quit().

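Reason for the problem:-

When the script reaches its last line and exits, ChromeDriver shuts down, and by default it closes the Chrome window it launched. Whether you see this can depend on the Selenium version you have installed.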

Solution: Enable the experimental option named detach

from selenium import webdriver 
from selenium.webdriver.chrome.options import Options 

options = Options() 
options.add_experimental_option('detach', True)
driver = webdriver.Chrome(options=options)

That's all. Now launch the website using the driver.get('https://www.google.com') command.
