Scraping Comments on 《狂飙》 (The Knockout) with Python


Preface

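Last night 《狂飙》 (The Knockout), the first hit series of 2023, aired its finale and at one point reached first place on the trending-search list.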


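"In matters of right and wrong, a moment's carelessness can send you into a bottomless abyss; only by holding fast to your beliefs can you stay true to your original intentions."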


With so many viewers discussing the show, I decided to join in.

Let's use Python to collect the comment data for 《狂飙》.

Code


Import modules

import requests
import parsel

Disguise the request (headers)

headers = {
    'Cookie': 'll="118267"; bid=vmTru_a25m8; __utma=30149280.50068328.1675317520.1675317520.1675317520.1; __utmc=30149280; __utmz=30149280.1675317520.1.1.utmcsr=baidu|utmccn=(organic)|utmcmd=organic; ap_v=0,6.0; _pk_ref.100001.4cf6=%5B%22%22%2C%22%22%2C1675317540%2C%22https%3A%2F%2Fwww.douban.com%2F%22%5D; _pk_ses.100001.4cf6=*; __utma=223695111.62892083.1675317540.1675317540.1675317540.1; __utmb=223695111.0.10.1675317540; __utmc=223695111; __utmz=223695111.1675317540.1.1.utmcsr=douban.com|utmccn=(referral)|utmcmd=referral|utmcct=/; __gads=ID=fb33508fbeefffdc-22b1618a7fd900c1:T=1675317540:RT=1675317540:S=ALNI_Ma0hUcCRHqTpc0wmcM01k3qpX3big; __gpi=UID=0000099c3e5d1190:T=1675317540:RT=1675317540:S=ALNI_MYqY1aqMuFbXYpmO6sFDn6zMnHB9g; __yadk_uid=KpA5hjYEmww6Sf2qskRgZamuj7aaecAC; ct=y; __utmb=30149280.3.10.1675317520; _vwo_uuid_v2=D091DE0AFC99F8C5AFC3169D9CB1E30F3|b218e266efb05a6a7a8652ac6ceecfe9; _pk_id.100001.4cf6=a8eb1a0fc7d89e94.1675317540.1.1675318206.1675317540.',
    'Host': 'movie.*****.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36',
}
for page in range(0, 4000):
    print(page)

Send the request

    url = f'https://movie.***.com/subject/35465232/comments?start={page*20}&limit=20&status=P&sort=new_score'
    response = requests.get(url=url, headers=headers)
    # Parse the returned HTML
    select = parsel.Selector(response.text)
    # Each comment sits in a .comment block inside a .comment-item
    comments = select.css('.comment-item .comment')
    for comment in comments:
        name = comment.css('.comment-info a::text').get()
        try:
            # Strip the fixed parts of the rating class so only the star digit remains
            score_str = comment.css('.comment-info .rating::attr(class)').get()
            score = score_str.replace('0 rating', '').replace('allstar', '')
        except AttributeError:
            # No rating element: .get() returned None, so record a score of 0
            score = 0
        comment_time = comment.css('.comment-info .comment-time::text').get(default='').strip()
        vote_count = comment.css('.comment-vote .votes.vote-count::text').get()
        comment_content = comment.css('.comment-content span::text').get()
        print(name, score, comment_time, vote_count, comment_content)
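The score handling in the loop can also be pulled out into a small helper. The sketch below is only an illustration: `parse_score` is a hypothetical name, and it assumes the rating class looks like 'allstar50 rating', which is inferred from the two replace calls above rather than confirmed against the page.

```python
import re


def parse_score(rating_class):
    """Turn a rating class such as 'allstar50 rating' into a star count (0 if absent)."""
    if not rating_class:
        # The selector returned None: the user left no rating
        return 0
    # Assumed format: 'allstar' + one digit + '0', e.g. 'allstar40'
    match = re.search(r'allstar(\d)0', rating_class)
    return int(match.group(1)) if match else 0


print(parse_score('allstar50 rating'))
print(parse_score(None))
```

Unlike the string replacement, this returns an integer in every case, which is easier to work with if the scores are later averaged or counted.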

 

Results


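The code above can collect the first ten pages of comments.

Comments beyond that are only visible, and therefore only collectable, after logging in.

After logging in, replace the 'Cookie' value in the headers with your own and you can collect all of them.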
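As a rough sketch of the two points above, the helpers below build the per-page URLs and swap in a new cookie. The names `comment_url` and `with_cookie` are hypothetical, the host is left elided exactly as in the article, and the cookie string is a placeholder you would replace with the value from your own logged-in browser session.

```python
BASE = 'https://movie.***.com'  # host is elided in the article; substitute the real one


def comment_url(base, subject_id, page, per_page=20):
    """Build the comments URL for a zero-based page number."""
    start = page * per_page  # each page holds per_page comments
    return (f'{base}/subject/{subject_id}/comments'
            f'?start={start}&limit={per_page}&status=P&sort=new_score')


def with_cookie(headers, cookie):
    """Return a copy of headers with the Cookie replaced, leaving the original untouched."""
    return {**headers, 'Cookie': cookie}


# Without logging in, only the first ten pages are reachable
urls = [comment_url(BASE, 35465232, page) for page in range(10)]
print(urls[0])
print(len(urls))

# Placeholder cookie: paste your own logged-in value here
logged_in = with_cookie({'User-Agent': 'Mozilla/5.0'}, 'your-cookie-here')
print(sorted(logged_in))
```

With a logged-in cookie in place, the range can be widened beyond ten pages. Stopping once a page returns no comments avoids sending requests for pages that do not exist.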


 

This article is reposted from CDSN.

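Copyright notice: original article on this site, published by lucky on 2023-02-04.
Repost notice: unless otherwise stated, articles on this site are published under the CC-4.0 license; please credit the source when reposting.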