Scraping Bilibili Hot Videos with Python and Auto-Saving Them Locally
1. What This Project Does
What we're going to do is simple:
- Hit Bilibili's official public API for hot videos
- Extract each video's key information: title, author, view count, like count, and so on
- Save the data to local .json and .csv files
- Run automatically on a daily schedule, no manual clicking required!
The end result: a fresh copy of that day's hot-video data lands on disk every single day.
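Before the full script, here's a minimal probe of that API so you can see the response shape for yourself. This is just a sketch: the ps parameter (page size) and the code / data.list fields are exactly what the complete script in Section 3 relies on.

```python
import requests

# Quick probe of the popular-videos endpoint (ps = page size).
resp = requests.get(
    'https://api.bilibili.com/x/web-interface/popular',
    params={'ps': 3},
    headers={
        'User-Agent': 'Mozilla/5.0',
        'Referer': 'https://www.bilibili.com/',
    },
    timeout=10,
)
payload = resp.json()
print(payload['code'])                      # 0 means success
print(payload['data']['list'][0]['title'])  # title of the first hot video
```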
2. Preparation
Before you start, install the following third-party Python libraries:

```bash
pip install requests schedule
```
- requests: for making HTTP requests
- schedule: for the scheduled daily job
- (csv, json, and datetime are part of Python's standard library, so no extra install is needed)
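If schedule is new to you, its chained API is the whole trick. Here's a minimal standalone demo; the 09:30 time and the job body are placeholders:

```python
import time
import schedule

def job():
    print("job ran")

# Register the job, then poll for due jobs in a loop.
schedule.every().day.at("09:30").do(job)

while True:
    schedule.run_pending()  # execute any job whose scheduled time has arrived
    time.sleep(1)
```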
3. The Complete Code
Straight to the code! You can copy and run it as-is:
```python
import requests
import json
import csv
import time
import schedule
from datetime import datetime

# Request headers (disguise ourselves as a regular browser)
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Referer': 'https://www.bilibili.com/',
}

# Fetch Bilibili hot-video data
def get_bilibili_hot_videos(limit=20):
    url = f'https://api.bilibili.com/x/web-interface/popular?ps={limit}'
    try:
        response = requests.get(url, headers=HEADERS, timeout=10)
        response.raise_for_status()
        data = response.json()
        if data['code'] == 0:
            videos = []
            for item in data['data']['list']:
                video_info = {
                    'aid': item['aid'],
                    'bvid': item['bvid'],
                    'title': item['title'],
                    'author': item['owner']['name'],
                    'view': item['stat']['view'],
                    'danmaku': item['stat']['danmaku'],
                    'reply': item['stat']['reply'],
                    'favorite': item['stat']['favorite'],
                    'coin': item['stat']['coin'],
                    'share': item['stat']['share'],
                    'like': item['stat']['like'],
                    'duration': item['duration'],
                    'pubdate': item['pubdate'],
                    'ctime': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
                }
                videos.append(video_info)
            return videos
        else:
            print(f"Failed to fetch data: {data['message']}")
            return None
    except Exception as e:
        print(f"Request error: {str(e)}")
        return None

# Save the data to a JSON file
def save_to_json(data, filename):
    try:
        with open(filename, 'w', encoding='utf-8') as f:
            json.dump(data, f, ensure_ascii=False, indent=4)
        print(f"Data saved to {filename}")
    except Exception as e:
        print(f"Error saving JSON file: {str(e)}")

# Save the data to a CSV file
def save_to_csv(data, filename):
    if not data:
        return
    try:
        with open(filename, 'w', encoding='utf-8', newline='') as f:
            writer = csv.DictWriter(f, fieldnames=data[0].keys())
            writer.writeheader()
            writer.writerows(data)
        print(f"Data saved to {filename}")
    except Exception as e:
        print(f"Error saving CSV file: {str(e)}")

# The daily scheduled task
def daily_crawl_task():
    print(f"Starting daily crawl: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
    videos = get_bilibili_hot_videos(limit=50)
    if videos:
        date_str = datetime.now().strftime('%Y%m%d')
        json_filename = f'bilibili_hot_videos_{date_str}.json'
        csv_filename = f'bilibili_hot_videos_{date_str}.csv'
        save_to_json(videos, json_filename)
        save_to_csv(videos, csv_filename)
        print("Daily crawl finished\n")

# Main entry point
def main():
    schedule.every().day.at("10:00").do(daily_crawl_task)  # run automatically at 10:00 every day
    daily_crawl_task()  # crawl once immediately on startup
    print("Scheduled crawler started; Bilibili hot videos will be fetched daily at 10:00...")
    while True:
        schedule.run_pending()
        time.sleep(60)

if __name__ == '__main__':
    main()
```
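To sanity-check the scraper without waiting for the 10:00 trigger, you can import the fetch function and run it once by hand. A sketch, assuming you saved the script above as bili_hot.py (the module name is just a placeholder):

```python
# Assumes the full script was saved as bili_hot.py; adjust the name to your file.
from bili_hot import get_bilibili_hot_videos

videos = get_bilibili_hot_videos(limit=5)
if videos:
    for v in videos:
        print(f"{v['title']} by {v['author']}: {v['view']} views")
```

Note that main() blocks forever in the run_pending() loop, so the process has to stay alive for the schedule to fire; on a server you might prefer to run daily_crawl_task once per day from cron instead.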
4. The Results
After the program runs, you'll find files like these in the same directory:
- bilibili_hot_videos_20250423.json
- bilibili_hot_videos_20250423.csv
They record the detailed information for that day's hot videos. For example, the CSV file looks like this when opened 👇:
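You can also verify the contents programmatically instead of opening the file by hand. A minimal sketch using the standard csv module; the filename follows the pattern above, so swap in the date of your own run:

```python
import csv

# Filename is from the example above; adjust the date to your run.
with open('bilibili_hot_videos_20250423.csv', encoding='utf-8') as f:
    rows = list(csv.DictReader(f))

# Print the top 3 videos by view count.
for row in sorted(rows, key=lambda r: int(r['view']), reverse=True)[:3]:
    print(row['title'], row['view'])
```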