Scraping is a technique for picking up public information from other websites and saving it, either to analyse it or simply to organize it by topic. Let's look at a few legal examples:
- Reading product prices from different websites, which saves you time compared to checking them manually.
- Converting HTML tables to Excel files.
- Checking information on different social media accounts that you own.
- Downloading pictures from different sites.
All of these tasks can be annoying if you do them manually across a huge list of websites. If you have some programming knowledge, there are interesting alternatives to software like Web Scraper.
Let's see a comparison table of scraping libraries in Python; I'll show some examples of each one as soon as I code them :-)
| | Scrapy | Beautiful Soup | Selenium |
| --- | --- | --- | --- |
| Pros | Robust, portable, efficient | Easy to learn, friendly interface, many extensions | JavaScript-friendly, perfect for test automation, can run in headless mode |
| Cons | Requires coding knowledge | Can be inconsistent, big dependencies | Not really a full web scraper, although it can do similar things |
Let's see some examples:
Scrapy: extract text from div and span elements.
```python
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        'http://quotes.toscrape.com/page/1/',
        'http://quotes.toscrape.com/page/2/',
    ]

    def parse(self, response):
        # Each quote on the page lives in a <div class="quote"> element
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').get(),
                'author': quote.css('small.author::text').get(),
                'tags': quote.css('div.tags a.tag::text').getall(),
            }
```

Save it as, say, `quotes_spider.py` and run it with `scrapy runspider quotes_spider.py -o quotes.json`.
Beautiful Soup: parse the page's HTML and extract the same quote data.
<code coming soon>
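In the meantime, here is a minimal sketch of how this could look with Beautiful Soup plus `requests`, assuming the same quotes.toscrape.com pages as above (the `extract_quotes` helper is my own name, not part of the library):

```python
import requests
from bs4 import BeautifulSoup


def extract_quotes(html):
    """Return a list of (text, author) tuples from a quotes.toscrape.com-style page."""
    soup = BeautifulSoup(html, "html.parser")
    quotes = []
    # Same structure as the Scrapy example: each quote is a <div class="quote">
    for div in soup.find_all("div", class_="quote"):
        text = div.find("span", class_="text").get_text()
        author = div.find("small", class_="author").get_text()
        quotes.append((text, author))
    return quotes


if __name__ == "__main__":
    response = requests.get("http://quotes.toscrape.com/page/1/")
    for text, author in extract_quotes(response.text):
        print(author, "-", text)
```

Unlike Scrapy, Beautiful Soup only parses HTML you already fetched, which is why `requests` handles the download here.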
Selenium: drive a real browser, which is handy for JavaScript-heavy pages.
<code coming soon>
Code Sources:
https://docs.scrapy.org/en/latest/intro/tutorial.html