How to write a web crawler in Python

Environment used for this tutorial: Windows 7, Python 3.9.1, DELL G3 computer.

1. Crawler tool combinations

(1) requests + BeautifulSoup

(2) requests + lxml
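As a rough illustration of how the two combinations differ, the sketch below parses the same HTML with each library. The HTML snippet, its URLs, and the function names are made up for illustration; in practice the HTML would be the body of a requests response.

```python
from bs4 import BeautifulSoup
from lxml import html as lxml_html

# Hypothetical HTML snippet used only for illustration.
SAMPLE_HTML = """
<html><body>
  <em class="f14 l24"><a href="http://example.com/a">First story</a></em>
  <em class="f14 l24"><a href="http://example.com/b">Second story</a></em>
</body></html>
"""


def links_with_bs4(page):
    """Return (title, link) pairs using BeautifulSoup and a CSS selector."""
    soup = BeautifulSoup(page, 'lxml')
    return [(a.get_text(), a['href']) for a in soup.select('em a')]


def links_with_lxml(page):
    """Return (title, link) pairs using lxml and an XPath expression."""
    tree = lxml_html.fromstring(page)
    return [(a.text_content(), a.get('href')) for a in tree.xpath('//em/a')]


bs4_links = links_with_bs4(SAMPLE_HTML)
lxml_links = links_with_lxml(SAMPLE_HTML)
print(bs4_links)
print(lxml_links)
```

Both functions return the same pairs here. BeautifulSoup offers CSS selectors and a forgiving API, while lxml works directly with XPath.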

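2. Preparation

(1) The code is written for Python 3.x, so you need a local Python 3 environment.

Python download page: https://www.python.org/downloads/release/python-370/

(2) You also need a development tool. PyCharm, a capable Python IDE, is recommended. IDLE, the editor that ships with Python, works too.

(3) Use a good-quality IP proxy

Both free and paid proxies are available. This article recommends the 品易云 HTTP proxy.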
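For reference, requests accepts a proxies mapping on each request. The sketch below shows one way to route a request through an HTTP proxy. The helper names and the proxy address are placeholders for illustration, not values from any real provider.

```python
import requests


def build_proxies(host, port, user=None, password=None):
    """Build the proxies mapping that requests expects."""
    auth = f'{user}:{password}@' if user and password else ''
    proxy_url = f'http://{auth}{host}:{port}'
    # The same HTTP proxy is used for both http:// and https:// targets.
    return {'http': proxy_url, 'https': proxy_url}


def fetch_via_proxy(url, proxies, timeout=10):
    """Fetch a URL through the given proxy and return the response text."""
    response = requests.get(url, proxies=proxies, timeout=timeout)
    response.raise_for_status()
    return response.text


# Hypothetical proxy address; replace it with one from your provider.
proxies = build_proxies('127.0.0.1', 8080)
print(proxies)
```

Calling fetch_via_proxy('http://news.qq.com/', proxies) would then send the request through that proxy, assuming the proxy is reachable.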

3. Example

Using requests + BeautifulSoup with the select CSS selector method

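# select method
import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent so the site is less likely to reject the request.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.119 Safari/537.36'}

url = 'http://news.qq.com/'

# Download the page and stop early on an HTTP error status.
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()

# Pass the raw bytes so BeautifulSoup can detect the page encoding itself.
# The 'lxml' parser requires the lxml package to be installed.
soup = BeautifulSoup(response.content, 'lxml')

# Select every <a> inside an <em> whose class attribute is exactly "f14 l24".
em = soup.select('em[class="f14 l24"] a')
for i in em:
    title = i.get_text()
    link = i['href']
    print({'title': title, 'link': link})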
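One detail of the selector used above can be sketched offline. The form em[class="f14 l24"] compares the whole class attribute as a string, so it matches only when the classes appear in exactly that order, while em.f14.l24 matches the two classes in any order. The HTML below is invented for illustration.

```python
from bs4 import BeautifulSoup

# Invented HTML: the second <em> lists the same classes in reverse order.
SAMPLE_HTML = """
<em class="f14 l24"><a href="/a">A</a></em>
<em class="l24 f14"><a href="/b">B</a></em>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# Attribute selector: the class attribute must read exactly "f14 l24".
exact = [a['href'] for a in soup.select('em[class="f14 l24"] a')]

# Class selector: both classes must be present, in any order.
by_class = [a['href'] for a in soup.select('em.f14.l24 a')]

print(exact)
print(by_class)
```

If a page changes the order of its class names, the class selector form is the more tolerant of the two.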

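That covers the basics of writing a crawler in Python. There is a wide choice of crawler tools, and once the basic preparation is done you can run the code and try it for yourself. When a crawler collects large amounts of data, pairing it with HTTP proxy IPs makes the job easier. Hopefully this is helpful.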
