Scraping with Python: Scrapy, Beautiful Soup, or Selenium?

Scraping is the technique of collecting public information from other websites and saving it, either to analyze it or simply to organize it into different topics. Let's see a few legal examples:

  • Reading product prices from different websites, which saves you time compared with doing it manually.
  • Converting HTML tables to Excel files.
  • Checking information on different social media accounts that you own.
  • Downloading pictures from different sites.

All of these tasks can be tedious if you do them manually across a long list of websites. If you have some programming knowledge, there are interesting alternatives to point-and-click software like Web Scraper.

Let's see a comparison of the main Python scraping libraries; I'll show some examples of each one as soon as I code them. :-)



Scrapy

Pros:
  • Robust
  • Portable
  • Efficient

Cons:
  • Requires more coding knowledge

Beautiful Soup

Pros:
  • Easy to learn
  • Friendly interface
  • Many extensions

Cons:
  • Parsing can be inconsistent across parsers
  • Big dependencies

Selenium

Pros:
  • JavaScript friendly
  • Perfect for test automation
  • Can run in headless mode

Cons:
  • Not really a full web scraper, although it can do similar things

Let's see some examples:

Scrapy: extract text from div and span elements.

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        'http://quotes.toscrape.com/page/1/',
        'http://quotes.toscrape.com/page/2/',
    ]

    def parse(self, response):
        # Each quote on the page lives in a <div class="quote"> block
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').get(),
                'author': quote.css('small.author::text').get(),
                'tags': quote.css('div.tags a.tag::text').getall(),
            }


Beautiful Soup: extract text from div and span elements.
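A minimal sketch with Beautiful Soup, extracting the same fields as the Scrapy spider above. The HTML snippet is hand-written to mimic the quotes.toscrape.com markup so the example runs without a network request; in practice you would fetch the page first (for example with the `requests` package):

```python
from bs4 import BeautifulSoup

# Hand-written snippet mimicking the quotes.toscrape.com markup.
html = """
<div class="quote">
  <span class="text">"Hello world."</span>
  <small class="author">Jane Doe</small>
  <div class="tags"><a class="tag">greeting</a><a class="tag">demo</a></div>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

quotes = []
for quote in soup.select('div.quote'):
    quotes.append({
        'text': quote.select_one('span.text').get_text(),
        'author': quote.select_one('small.author').get_text(),
        'tags': [t.get_text() for t in quote.select('div.tags a.tag')],
    })

print(quotes)
```

Note that Beautiful Soup is only a parser: unlike Scrapy, it does not fetch pages or follow links by itself, so you pair it with an HTTP client.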
 
Selenium: extract text by driving a real browser.


Code sources:

 https://docs.scrapy.org/en/latest/intro/tutorial.html