Get The Data You Need (Web Scraping in Python)

nijanthan
3 min read · Mar 10, 2021

“Torture the data, and it will confess to anything.” — Ronald Coase
“Data is the new oil.” — Clive Humby

Introduction:

In this era, data is a treasure. Those who hold the map to the treasure (data handling techniques) will succeed. Data is available wherever you look on the internet. Most of it is public and some of it is private. Collecting private data is illegal, but most big organizations do it legally (it is not legal to talk about that 😜). So here I am going to show you how to collect public data from websites (web scraping).

What is web scraping?

Web scraping is a method to scrape or collect data from websites.

How does it work?

A scraper fetches data from DOM objects or HTML tags and then stores it in a structured format. The data is saved in whatever format the user wants, such as CSV, Excel, or JSON.
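As a rough illustration of that flow, here is a minimal sketch that uses only the Python standard library. The HTML fragment, the class names `product`, `name` and `price`, and the helpers `ProductParser`, `scrape` and `to_csv` are all made up for this example; a real scraper would download the page first and adapt the tag names to the site.

```python
import csv
import io
import json
from html.parser import HTMLParser

# Hypothetical page fragment; a real scraper would download this HTML instead.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Pen</span><span class="price">10</span></li>
  <li class="product"><span class="name">Notebook</span><span class="price">45</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collects the text of <span class="name"> and <span class="price"> tags."""

    def __init__(self):
        super().__init__()
        self.rows = []       # one dict per product
        self._field = None   # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "product":
            self.rows.append({})            # start a new record
        elif tag == "span" and attrs.get("class") in ("name", "price"):
            self._field = attrs["class"]    # remember which field the text belongs to

    def handle_data(self, data):
        if self._field and self.rows:
            self.rows[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def scrape(html):
    """Turn the HTML into a list of dicts (the structured form)."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def to_csv(rows):
    """Serialize the rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "price"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = scrape(SAMPLE_HTML)
print(json.dumps(rows))   # the same data as JSON
print(to_csv(rows))       # the same data as CSV
```

The same list of dictionaries feeds both output formats, which is why scraped data is usually collected into a structure like this before it is saved.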

Is it legal?

In India, no law expressly states whether web scraping is legal. If you want to scrape data, you should get permission from the data owner, because they can file a case against the scraper. (E.g., in a case before the Delhi High Court, OLX obtained a permanent restraining order preventing a company from using automated or manual means to scrape any data, including commercial data, from OLX’s website. After that, the company removed all the OLX content it had.)

Scraping public data is mostly legal if you are using it for non-commercial purposes.

You cannot scrape Facebook without Facebook’s express written permission.
Most websites have a robots.txt file. It tells automated clients which parts of the site they may or may not crawl. Even if a website's terms of use prohibit scraping, the paths that robots.txt allows are technically open to crawlers, though that is not the same as legal permission. So before scraping, please check the robots.txt file.
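A robots.txt check can be sketched with `urllib.robotparser` from the Python standard library. The robots.txt content, the `my-scraper` user agent, the `example.com` URLs and the `is_allowed` helper below are assumptions made up for this illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real check would read the site's own file.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def is_allowed(robots_text, user_agent, url):
    """Return True if the robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(SAMPLE_ROBOTS, "my-scraper", "https://example.com/products"))
print(is_allowed(SAMPLE_ROBOTS, "my-scraper", "https://example.com/private/data"))
```

For a live website, `RobotFileParser` can load the file itself: call `set_url()` with the address of the site's robots.txt, then `read()`, and then use `can_fetch()` in the same way.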

Why are we using Python for scraping?

Python is one of the most popular languages and is very easy to code in. There are many libraries available for scraping. The most important ways to scrape websites with Python are

  1. Scraping using BS4 (Beautiful soup)
  2. Scraping using Selenium
  3. Scraping using Scrapy

We will look at each of these methods in detail in upcoming blogs.

Applications:

  1. Price comparison: Using web scraping, we can fetch the data for a certain product from many e-commerce websites.
  2. News Monitoring: Particular news items are fetched from news websites so that we can monitor them. Many companies scrape news, compile detailed reports, and provide them to general users.
  3. Market Analysis: Stock data for companies is scraped from stock websites.
  4. Sentiment Analysis: Many companies scrape data from social media (Facebook, Twitter, etc.) to find out the general sentiment about their products.
  5. Email Marketing: Email addresses are collected from websites and used for bulk email advertising and digital marketing.
  6. And many more.
