
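[Crawler] DrissionPage: Fetching a Douyin User's Videos

I had looked at DrissionPage before and was impressed: it is simpler than Selenium and friendly to beginners. I had been blindly following the reverse-engineering trend, so today I watched a livestream that walked through a DrissionPage case study and followed along. It turned out to be really pleasant to use.

DrissionPage official site: 🛰️ Overview | DrissionPage official site

Requirements analysis:

Scrape the videos posted by a Douyin user. The work is split into four steps (the complete code is given at the end).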

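Implementation steps:

1. Get the JSON data from the data packet

Analyze the traffic with the browser's developer tools:

Press F12. I found the packet on the first try, which may have been partly luck:

Code: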

# Import the automation module
from DrissionPage import ChromiumPage as cp

dp = cp()  # instantiate a browser object
dp.listen.start('web/aweme/post')  # start listening for the matching data packets
dp.get('https://www.douyin.com/user/MS4wLjABAAAAx7--dRYA0mPwhwvxNJ-35i6sB8d1Kv4Sj1WmugquqiHK19QYlB18Ikx6cECT1RVO?from_tab_name=main&showTab=post')

# Wait for the data packet and get it
data = dp.listen.wait()
json_data = data.response.body  # get the JSON data from the data packet

# Print the JSON data
print(json_data)
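As a defensive sketch, the body can be normalised to a dict before any keys are read from it. This assumes the body may arrive either already parsed or as raw JSON text or bytes, which is not verified here. The helper name `as_dict` is hypothetical and is not part of DrissionPage.

```python
import json


def as_dict(body):
    """Return the response body as a dict, parsing it first if it is raw JSON.

    Hypothetical helper for illustration; not part of DrissionPage.
    """
    if isinstance(body, (bytes, bytearray)):
        body = body.decode('utf-8')
    if isinstance(body, str):
        body = json.loads(body)
    if not isinstance(body, dict):
        raise TypeError(f'expected a JSON object, got {type(body).__name__}')
    return body


# Both forms yield the same dict.
print(as_dict({'aweme_list': []}))
print(as_dict('{"aweme_list": []}'))
```

With this in place, `json_data = as_dict(data.response.body)` would behave the same whichever form the body takes.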

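2. Parse the data

The values are simply read out of the JSON by key.

Code:

# --------Parse the data---------
# `requests` and `headers` are defined in the complete code at the end.
import os

# Iterate over the list of videos in the JSON data and take the title, video link, and video ID
json_data = json_data['aweme_list']
for i in json_data:
    title = i['desc']  # title
    video_url = i['video']['play_addr']['url_list'][0]  # video link
    video_id = i['aweme_id']  # video ID
    # Download the video
    if not os.path.exists('./video'):  # create the video folder if it does not exist
        os.mkdir('./video')
    video_content = requests.get(video_url, headers=headers).content
    with open(f'./video/{title}-{video_id}.mp4', 'wb') as f:  # save the video
        f.write(video_content)
    print(title, video_id, video_url)  # print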
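The key lookups can be tried offline on a hand-made payload. The sample below is made up for illustration and mirrors only the keys read by the parsing code (`aweme_list`, `desc`, `video` / `play_addr` / `url_list`, `aweme_id`). A real response carries many more fields. The function name `parse_items` is hypothetical.

```python
# Made-up sample that mirrors only the keys read by the parsing code.
sample = {
    'aweme_list': [
        {
            'desc': 'first clip',
            'aweme_id': '1001',
            'video': {'play_addr': {'url_list': ['https://example.com/a.mp4']}},
        },
        {
            'desc': 'second clip',
            'aweme_id': '1002',
            'video': {'play_addr': {'url_list': ['https://example.com/b.mp4']}},
        },
    ]
}


def parse_items(payload):
    """Return (title, video_id, video_url) tuples for every item in the payload."""
    rows = []
    for item in payload['aweme_list']:
        title = item['desc']
        video_url = item['video']['play_addr']['url_list'][0]  # first play address
        video_id = item['aweme_id']
        rows.append((title, video_id, video_url))
    return rows


for row in parse_items(sample):
    print(row)
```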

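3. Save the video

The try/except in the complete code is there because some posts are image-and-text posts rather than videos, and the program would raise an error on them.

Code:

    # Download the video
    import os
    if not os.path.exists('./video'):  # create the video folder if it does not exist
        os.mkdir('./video')
    video_content = requests.get(video_url, headers=headers).content
    with open(f'./video/{title}-{video_id}.mp4', 'wb') as f:  # save the video
        f.write(video_content)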
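One way to handle image posts item by item, rather than letting one of them interrupt a whole batch, is to catch the lookup errors around the URL extraction only. This is a sketch under assumptions: the helper name `video_url_or_none` is hypothetical, and it assumes an image post is missing part of the `video` / `play_addr` / `url_list` path or has an empty list, which is not confirmed here. The sample posts are made up.

```python
def video_url_or_none(item):
    """Return the first play URL of a post, or None if the post has no video.

    Assumption: a non-video post is missing part of the
    item['video']['play_addr']['url_list'] path or has an empty list.
    """
    try:
        return item['video']['play_addr']['url_list'][0]
    except (KeyError, IndexError, TypeError):
        return None


# Made-up posts: one video, one image post, one with an empty video field.
posts = [
    {'aweme_id': '1', 'video': {'play_addr': {'url_list': ['https://example.com/1.mp4']}}},
    {'aweme_id': '2', 'images': ['https://example.com/2.jpg']},
    {'aweme_id': '3', 'video': None},
]

for post in posts:
    url = video_url_or_none(post)
    if url is None:
        print(post['aweme_id'], 'skipped: no video')
        continue
    print(post['aweme_id'], url)
```

Catching only `KeyError`, `IndexError`, and `TypeError` keeps unrelated failures, such as network errors during the download, visible instead of silently swallowing them.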

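4. Scroll the page:

There are many ways to do this; see the official documentation. Here an element is located with a CSS selector and scrolled into view.

🛰️ Element interaction | DrissionPage official site

# Simulate scrolling
tab = dp.ele('css:.ayFW3zux')
dp.scroll.to_see(tab)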

Complete code:

# --------Request and download the videos---requests---------
import os

# Import the request module
import requests

# Disguise the request as a browser. The Referer (anti-hotlinking) header is required,
# because Douyin uses it to decide whether a video request comes from Douyin.
headers = {
    # Paste the Cookie string copied from your own logged-in browser session here.
    'Cookie': '<your Douyin Cookie string>',
    'Referer': 'https://www.douyin.com/user/MS4wLjABAAAAx7--dRYA0mPwhwvxNJ-35i6sB8d1Kv4Sj1WmugquqiHK19QYlB18Ikx6cECT1RVO?from_tab_name=main&showTab=post',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36 115Browser/27.0.6.3'
}

# ---------Capture the data packets---DrissionPage--------------
# Import the automation module
from DrissionPage import ChromiumPage as cp

dp = cp()  # instantiate a browser object
dp.listen.start('web/aweme/post')  # start listening for the matching data packets
dp.get('https://www.douyin.com/user/MS4wLjABAAAAx7--dRYA0mPwhwvxNJ-35i6sB8d1Kv4Sj1WmugquqiHK19QYlB18Ikx6cECT1RVO?from_tab_name=main&showTab=post')

for j in range(1, 11):
    try:
        print(f'Fetching page {j}......')
        # Wait for the data packet and get it
        data = dp.listen.wait(timeout=5)
        json_data = data.response.body  # get the JSON data from the data packet
        # Print the JSON data
        # print(json_data)
        # --------Parse the data---------
        # Iterate over the list of videos in the JSON data and take the title, video link, and video ID
        json_data = json_data['aweme_list']
        for i in json_data:
            title = i['desc']  # title
            video_url = i['video']['play_addr']['url_list'][0]  # video link
            video_id = i['aweme_id']  # video ID
            # Download the video
            if not os.path.exists('./video'):  # create the video folder if it does not exist
                os.mkdir('./video')
            video_content = requests.get(video_url, headers=headers).content
            with open(f'./video/{title}-{video_id}.mp4', 'wb') as f:  # save the video
                f.write(video_content)
            print(title, video_id, video_url)  # print
    except Exception as e:  # skip this page on a timeout or any other error
        pass
    # Simulate scrolling
    tab = dp.ele('css:.ayFW3zux')
    dp.scroll.to_see(tab)
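The file name in the code above is built from the raw title, which can contain slashes, line breaks, or other characters that are not valid in file names, so `open` may fail for some posts. A minimal sketch of one way to clean the title first follows. The helper name `safe_filename` and the `max_len` limit are assumptions made for illustration, not part of the original code.

```python
import re


def safe_filename(title, video_id, max_len=80):
    """Build '<title>-<id>.mp4' with characters that are unsafe in file names replaced."""
    # Replace path separators, Windows-reserved characters, and control
    # characters (including tabs and line breaks) with underscores.
    cleaned = re.sub(r'[\\/:*?"<>|\x00-\x1f]', '_', title)
    # Collapse runs of whitespace and trim both ends.
    cleaned = re.sub(r'\s+', ' ', cleaned).strip()
    # Keep the name reasonably short; fall back to a fixed word for empty titles.
    cleaned = cleaned[:max_len].rstrip() or 'untitled'
    return f'{cleaned}-{video_id}.mp4'


print(safe_filename('day 1/2: "hello"\n#vlog', '7300000000000000000'))
```

In the download loop this would replace the f-string inside `open`, for example `open(os.path.join('./video', safe_filename(title, video_id)), 'wb')`.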
