
Python Practice Problem: A Tool for Automatically Fetching Novels

news Source: original 2025/4/30 0:51:04

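Python Practice Problem

Problem

A tool for automatically fetching novels with Python

Problem analysis

Understanding the requirements

Build a Python tool that automatically fetches a novel's chapter list and the text of each chapter from a novel website, then saves them locally.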

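Key concepts

Network requests: use the requests library to send requests to the novel website and retrieve the page content.
HTML parsing: use the BeautifulSoup library to parse the HTML pages and extract the chapter list and the chapter text.
Data storage: save the extracted novel text to local files.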

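Implementation approach

Fetch the directory page: send a request to get the HTML of the novel's directory (table of contents) page.
Parse the chapter list: parse the directory page with BeautifulSoup and extract the link and title of each chapter.
Fetch the chapter content: iterate over the chapter list, request the HTML of each chapter page, and extract the chapter text from it.
Save the novel: write the title and text of each chapter to a local file.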
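
The chapter-list step can be tried offline before touching a real site. The sketch below is only an illustration: it uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra dependencies, and the ChapterLinkParser class, the sample HTML, and the chapter page names in it are made up for the demonstration. It keeps only the links that live under the directory page's path and turns them into absolute URLs with urljoin.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ChapterLinkParser(HTMLParser):
    """Collect (title, absolute URL) pairs for links under the directory path."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.prefix = urlparse(base_url).path  # e.g. '/1_1094/'
        self.chapters = []
        self._href = None  # href of the chapter link currently open, if any
        self._text = []    # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href') or ''
            # Keep links below the directory path, but not the directory itself
            if href.startswith(self.prefix) and href != self.prefix:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            title = ''.join(self._text).strip()
            self.chapters.append((title, urljoin(self.base_url, self._href)))
            self._href = None


# Made-up directory page, used only for this demonstration
sample_html = '''
<div id="list">
  <a href="/">Home</a>
  <a href="/1_1094/">Directory</a>
  <a href="/1_1094/5403177.html">Chapter 1</a>
  <a href="/1_1094/5403178.html">Chapter 2</a>
</div>
'''

parser = ChapterLinkParser('https://www.biquge.cm/1_1094/')
parser.feed(sample_html)
for title, url in parser.chapters:
    print(title, url)
```

Deriving the prefix from the directory URL, rather than writing it out by hand, means the same filter keeps working when the URL is changed to another novel on a site with the same layout.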

Code implementation

import os
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

# URL of the novel's directory (table of contents) page
novel_url = 'https://www.biquge.cm/1_1094/'

# Request headers that mimic a browser
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}


def get_chapter_list(url):
    """
    Get the novel's chapter list.
    :param url: URL of the novel's directory page
    :return: list of (chapter title, chapter URL) tuples
    """
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    response.encoding = response.apparent_encoding
    soup = BeautifulSoup(response.text, 'html.parser')
    # Chapter links share the directory page's path, e.g. '/1_1094/'
    prefix = urlparse(url).path
    chapter_list = []
    for a in soup.find_all('a', href=True):
        href = a['href']
        # Skip the link back to the directory page itself
        if href.startswith(prefix) and href != prefix:
            chapter_list.append((a.text.strip(), urljoin(url, href)))
    return chapter_list


def get_chapter_content(url):
    """
    Get the text of one chapter.
    :param url: URL of the chapter page
    :return: chapter text, or an empty string if the content <div> is missing
    """
    response = requests.get(url, headers=headers, timeout=10)
    response.encoding = response.apparent_encoding
    soup = BeautifulSoup(response.text, 'html.parser')
    # On this site the chapter body sits in <div id="content">
    content_div = soup.find('div', id='content')
    if content_div:
        return content_div.text
    return ''


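def save_novel(chapter_list):
    """
    Save the novel to local files, one file per chapter.
    :param chapter_list: list of (chapter title, chapter URL) tuples
    """
    os.makedirs('novel', exist_ok=True)
    for title, url in chapter_list:
        content = get_chapter_content(url)
        # Drop characters that are not allowed in file names
        safe_title = ''.join(c for c in title if c not in '\\/:*?"<>|').strip()
        file_name = os.path.join('novel', f'{safe_title}.txt')
        with open(file_name, 'w', encoding='utf-8') as f:
            f.write(title + '\n')
            f.write(content)
        print(f'Saved: {title}')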


if __name__ == "__main__":
    chapter_list = get_chapter_list(novel_url)
    save_novel(chapter_list)
    print('Novel download complete!')
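
save_novel calls get_chapter_content once per chapter, so a single failed request stops the whole run. One possible way to soften this is to retry a failed fetch after a short pause. The sketch below is an assumption layered on top of the script, not part of it: fetch_with_retry is a hypothetical helper, and flaky_fetch only simulates a connection that fails twice so that the example runs without network access.

```python
import time


def fetch_with_retry(fetch, url, retries=3, delay=1.0):
    """Call fetch(url), retrying after a pause when it raises.

    fetch is any callable that returns the chapter text or raises on failure,
    for example get_chapter_content from the script above.
    """
    attempts = max(1, retries)  # always try at least once
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except Exception as error:  # broad on purpose in this sketch
            last_error = error
            if attempt < attempts:
                time.sleep(delay)
    # Every attempt failed: surface the last error to the caller
    raise last_error


# Simulated fetcher that fails twice and then succeeds (demonstration only)
calls = []


def flaky_fetch(url):
    calls.append(url)
    if len(calls) < 3:
        raise ConnectionError('simulated failure')
    return 'chapter text'


text = fetch_with_retry(flaky_fetch, 'https://example.com/1.html', retries=3, delay=0)
print(text, len(calls))
```

In the script, the call inside save_novel would become fetch_with_retry(get_chapter_content, url). A non-zero delay also spaces out requests to the site after a failure.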

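Code explanation

get_chapter_list function:
- Sends a request to get the HTML of the novel's directory page.
- Parses the HTML with BeautifulSoup, finds all chapter links, and extracts each chapter's title and link.
- Returns the chapter list, where each element is a (chapter title, chapter link) tuple.
get_chapter_content function:
- Sends a request to get the HTML of a chapter page.
- Parses the HTML with BeautifulSoup, finds the <div> tag that holds the chapter content, and extracts its text.
- Returns the chapter text.
save_novel function:
- Creates a folder named novel to hold the chapter files.
- Iterates over the chapter list, fetches the content of each chapter, and saves it to a text file named after the chapter title.
- Prints a message for each saved chapter.
Main program:
- Calls get_chapter_list to get the chapter list.
- Calls save_novel to save the novel.
- Prints a completion message.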
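
Files named only after the chapter title do not necessarily sort in reading order, and a title can contain characters that the file system rejects. A minimal sketch of one way to handle both, assuming a hypothetical helper chapter_filename that is not part of the script above:

```python
import re


def chapter_filename(index, title, width=4):
    """Build a file name such as '0001_Chapter 1.txt'."""
    # Replace characters that Windows (and '/' on POSIX systems) reject in file names
    safe = re.sub(r'[\\/:*?"<>|]', '_', title).strip()
    # Fall back to a generic name if the title is empty
    if not safe:
        safe = 'untitled'
    # Zero-padded index so that alphabetical order matches chapter order
    return f'{index:0{width}d}_{safe}.txt'


print(chapter_filename(1, 'Chapter 1: What? Why/How'))
print(chapter_filename(2, ''))
```

Inside save_novel this could be combined with enumerate(chapter_list, start=1) to number the files in the order the chapters appear on the directory page.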

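How to run

Install the dependencies: make sure the requests and beautifulsoup4 libraries are installed, for example with pip install requests beautifulsoup4.
Change the directory page URL: replace novel_url with the directory page URL of the novel you want to fetch. The link filter in get_chapter_list and the <div> lookup in get_chapter_content must match the page structure of that site.
Run the script: run python novel_fetching_tool.py in a terminal. The script fetches the novel's chapters and saves them locally.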