Google Image Scraper

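A scraper that downloads images from Google search results for a list of keywords. For example, if you configure the keywords "cat" and "dog" and run the code, the Google image results for "cat" and "dog" are saved into separate directories, one per keyword. A Chrome window may pop up while the code runs (this happens when headless is set to False); that is just the scraper driving a real browser, and you can leave it alone.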

Open source: https://github.com/ohyicong/Google-Image-Scraper

Installation and usage are simple. Once a Python environment is set up, there are three steps:

git clone https://github.com/ohyicong/Google-Image-Scraper && cd Google-Image-Scraper
pip install selenium requests pillow
python main.py

The sample code, which is simply the contents of main.py, is also short:

#Import libraries (Don't change)
from GoogleImageScrapper import GoogleImageScraper
import os
from patch import webdriver_executable

#Define file path (Don't change)
webdriver_path = os.path.normpath(os.path.join(os.getcwd(), 'webdriver', webdriver_executable()))
image_path = os.path.normpath(os.path.join(os.getcwd(), 'photos'))

#Add new search key into array ["cat","t-shirt","apple","orange","pear","fish"]
search_keys= ["cat","t-shirt"]

#Parameters
number_of_images = 10
headless = True
min_resolution=(0,0)
max_resolution=(1920,1080)

#Main program
for search_key in search_keys:
    image_scrapper = GoogleImageScraper(webdriver_path, image_path, search_key, number_of_images, headless, min_resolution, max_resolution)
    image_urls = image_scrapper.find_image_urls()
    image_scrapper.save_images(image_urls)
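
After a run, it can be handy to check how many images actually landed in each keyword directory. The following is a minimal sketch, not part of the project: it assumes the images end up under photos/<search_key>/ as described above, and the helper name count_saved_images and the list of file extensions are made up for illustration.

```python
import os

# Assumed set of image file extensions; adjust to whatever the scraper saves.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp"}


def count_saved_images(image_path, search_keys):
    """Return {search_key: number of image files} for each keyword directory.

    Hypothetical helper: assumes one sub-directory per keyword under image_path.
    """
    counts = {}
    for key in search_keys:
        key_dir = os.path.join(image_path, key)
        if not os.path.isdir(key_dir):
            # Nothing was saved for this keyword (or the layout differs).
            counts[key] = 0
            continue
        counts[key] = sum(
            1
            for name in os.listdir(key_dir)
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS
        )
    return counts


if __name__ == "__main__":
    image_path = os.path.normpath(os.path.join(os.getcwd(), "photos"))
    print(count_saved_images(image_path, ["cat", "t-shirt"]))
```

If a keyword shows fewer images than number_of_images, one likely cause is that some results fell outside the min_resolution and max_resolution bounds.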
