
A front-end solution for rendering PDF files: pdf.js

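Contents

I. Preface

II. Introduction

1. About pdf.js

2. Package versions

III. Preview via viewer.html (recommended)

1. Overview

2. Deployment

[1] Download the package

[2] Client-side approach

[3] Server-side approach (not yet verified)

3. Usage

[1] Previewing a PDF file

[2] Triggering the pdf.js search from an external search box

IV. Rendering the PDF to canvas for preview

1. Installation

2. Implementation

[1] PDF preview

[2] Making the PDF text selectable for copying

[3] Search with highlighted matches and scrolling to the match position

[4] Reading the outline (table of contents) data of a PDF file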

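I. Preface

Rendering PDF files in the browser has long been an important but tricky front-end task. The pdf.js library makes it much easier. pdf.js is an open-source JavaScript library that renders PDF files in the browser, so documents can be viewed directly in a web page, and its API covers most of what a viewer needs. The rest of this post walks through two ways of using it.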

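II. Introduction

1. About pdf.js

pdf.js is an open-source, JavaScript-based PDF reader component that displays PDF files directly in a web page. The front-end PDF rendering components known to the author are all wrappers around pdf.js.

Repository: https://github.com/mozilla/pdf.js

Note: open source and free to use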

2. Package versions

Package                                              Version
Node                                                 v22.13.0
@types/react                                         ^18.0.33
@types/react-dom                                     ^18.0.11
pdfjs-2.5.207-es5-dist.zip (viewer.html approach)    2.5.207
pdfjs-dist (canvas rendering approach)               3.6.172

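III. Preview via viewer.html (recommended)

1. Overview

Besides previewing the PDF, this approach comes with a built-in toolbar that supports search, zoom, the document outline, printing, and more.

[Figure: demo of the viewer.html preview]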

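2. Deployment

[1] Download the package

Download: https://github.com/mozilla/pdf.js/releases/tag/v2.5.207

[2] Client-side approach

Unzip the downloaded pdfjs-2.5.207-es5-dist.zip and place the extracted folder in the project's public directory.

[3] Server-side approach (not yet verified)

The pdf.js package can also be hosted on a server, for example at http://[ip]:[port]/static/pdfjs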

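3. Usage

[1] Previewing a PDF file

1) Client-side approach (using React as an example)

import React from 'react';

const viewPDF: React.FC = () => {
  // Path of the PDF file, placed in the project's public directory
  const pdfUrl = '/A.pdf';
  // Path of the pdf.js viewer, also placed in the project's public directory
  const pdfServerUrl = '/pdfjs-2.5.207-es5-dist/web/viewer.html';
  // Percent-encode the path so it is passed to the viewer as a single `file` value
  const url = `${pdfServerUrl}?file=${encodeURIComponent(pdfUrl)}`;

  return (
    <>
      <h1>PDF search (based on pdfjs-dist, viewer.html)</h1>
      <iframe id="pdfIframe" src={url} width="100%" height="100%"></iframe>
    </>
  );
};

export default viewPDF;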
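The `file` query parameter is what tells viewer.html which document to open. When the PDF path contains spaces or its own query string, percent-encoding it keeps the whole path inside that one parameter. The helper below is a minimal sketch of this; `buildViewerUrl` is a hypothetical name, not part of pdf.js.

```typescript
// Hypothetical helper (not part of pdf.js): builds the URL for the viewer iframe.
function buildViewerUrl(viewerPath: string, fileUrl: string): string {
  // encodeURIComponent escapes "/", "?", "&", "=" and spaces, so a PDF URL
  // that carries its own query string is not confused with the viewer's.
  return `${viewerPath}?file=${encodeURIComponent(fileUrl)}`;
}

// Example with the paths used in this article.
const viewerUrl = buildViewerUrl('/pdfjs-2.5.207-es5-dist/web/viewer.html', '/A.pdf');
console.log(viewerUrl);
```

The result can be passed directly as the `src` of the iframe.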

2) Server-side approach (to be written up)

[2] Triggering the pdf.js search from an external search box

  • Jump to the first match
  • Highlight all matches

import React, { useEffect } from 'react';

const viewPDF: React.FC = () => {
  // Path of the PDF file, placed in the project's public directory
  const pdfUrl = '/A.pdf';
  // Path of the pdf.js viewer, also placed in the project's public directory
  const pdfServerUrl = '/pdfjs-2.5.207-es5-dist/web/viewer.html';
  const url = `${pdfServerUrl}?file=${encodeURIComponent(pdfUrl)}`;

  // Returns the window of the viewer iframe, or null if the iframe is not in the DOM
  const getPdfContent = (): any => {
    const pdfFrame = document.getElementById('pdfIframe') as HTMLIFrameElement | null;
    if (!pdfFrame) {
      return null;
    }
    return pdfFrame.contentWindow;
  };

  const onSearchForOut = (searchText: string) => {
    // The iframe is same-origin, so the viewer's find bar can be driven directly
    const app = getPdfContent()?.PDFViewerApplication;
    if (!app || !app.findBar) {
      return; // the viewer has not finished initializing
    }
    // Viewer state, including the current page number and the page count
    console.log('page===>', app);
    // Fill in the query and highlight all matches
    app.findBar.findField.value = searchText;
    app.findBar.highlightAll.checked = true;
    app.findBar.dispatchEvent('highlightallchange');
    // Trigger the find bar's "next" action, which jumps to the first match
    app.findBar.dispatchEvent('again', false);
  };

  useEffect(() => {
    // Crude wait for the viewer to load before running the external search
    const timer = setTimeout(() => {
      // The external search term
      onSearchForOut('阳区CBD核心区');
    }, 3 * 1000);
    return () => clearTimeout(timer);
  }, []);

  return (
    <>
      <h1>PDF search (based on pdfjs-dist, viewer.html)</h1>
      <iframe id="pdfIframe" src={url} width="100%" height="100%"></iframe>
    </>
  );
};

export default viewPDF;
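In the 2.5.207 viewer, `findBar.dispatchEvent` appears to forward its arguments as a `find` event on the viewer's event bus. The sketch below builds that payload without touching the find bar's DOM fields. The field names are an assumption based on that version of viewer.js and should be checked against the copy you deploy; `FindEvent`, `EventBusLike`, `buildFindEvent` and `searchInViewer` are hypothetical names introduced only for illustration.

```typescript
// Assumed shape of the payload of the viewer's "find" event (pdf.js 2.5.207).
interface FindEvent {
  source: unknown;
  type: string; // '' starts a new search, 'again' moves to the next match
  query: string;
  phraseSearch: boolean;
  caseSensitive: boolean;
  entireWord: boolean;
  highlightAll: boolean;
  findPrevious: boolean;
}

// Only the part of the event bus this sketch relies on.
interface EventBusLike {
  dispatch(eventName: string, data: FindEvent): void;
}

// Hypothetical helper: a "highlight everything, search forward" payload.
function buildFindEvent(query: string, type: '' | 'again' = ''): FindEvent {
  return {
    source: null,
    type,
    query,
    phraseSearch: true,
    caseSensitive: false,
    entireWord: false,
    highlightAll: true,
    findPrevious: false,
  };
}

// Hypothetical helper: sends the search to whatever event bus is passed in.
function searchInViewer(bus: EventBusLike, query: string): void {
  bus.dispatch('find', buildFindEvent(query));
}

// In the page, the bus would be taken from the iframe (assumption):
//   searchInViewer(iframe.contentWindow.PDFViewerApplication.eventBus, 'keyword');
```

Keeping the payload in a pure function makes it easy to test without an iframe, and leaves a single place to adjust if the event shape differs in another pdf.js release.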

IV. Rendering the PDF to canvas for preview

1. Installation

npm install pdfjs-dist --save

2. Implementation

[1] PDF preview

import { useState, useEffect, useRef } from 'react';
import * as pdfjsLib from 'pdfjs-dist'; // pdfjs-dist

const pdfUrl = '/zyk.pdf'; // PDF file path; the file lives in the public directory
const workerUrl = '/pdf.worker.min.js'; // web worker script, also in the public directory
pdfjsLib.GlobalWorkerOptions.workerSrc = workerUrl;

const viewPdf = (props: { height: string }) => {
  const { height } = props;
  const pdfContainerRef = useRef<any>(null);
  const [pagesList, setPagesList] = useState<any>([]);
  const scale = 2; // zoom factor

  // Render a single page (pageNumber is the 0-based index used in the DOM ids)
  const renderPage = async (page: any, pageNumber: number) => {
    const viewport = page.getViewport({ scale });
    const pageContentDom = document.createElement('div');
    pageContentDom.id = `pdfPage-content-${pageNumber}`;
    pageContentDom.style.width = `${viewport.width}px`;
    pageContentDom.style.height = `${viewport.height}px`;
    pageContentDom.style.position = 'relative';

    // Create the canvas element
    const canvas = document.createElement('canvas');
    const context = canvas.getContext('2d');
    canvas.id = `pdfPage-${pageNumber}`;
    canvas.width = viewport.width;
    canvas.height = viewport.height;
    canvas.style.border = '1px solid black';
    pageContentDom.appendChild(canvas);
    pdfContainerRef.current.appendChild(pageContentDom);

    // Render the PDF page onto the canvas
    await page.render({
      canvasContext: context,
      viewport,
    }).promise;
  };

  // Render all pages
  const renderPagesGroup = (pages: any) => {
    pages.forEach(({ page }: any, index: number) => {
      renderPage(page, index);
    });
  };

  // Load the PDF file
  const loadPdf = async (url: any) => {
    const pdf = await pdfjsLib.getDocument(url).promise;
    const pages: any[] = [];
    // getPage uses 1-based page numbers
    for (let i = 1; i <= pdf.numPages; i++) {
      const page = await pdf.getPage(i);
      const textContent = await page.getTextContent();
      pages.push({ page, textContent });
    }
    setPagesList(pages);
    renderPagesGroup(pages);
  };

  useEffect(() => {
    loadPdf(pdfUrl);
  }, []);

  return (
    <>
      <div>
        <h1>PDF search (based on pdfjs-dist, custom implementation)</h1>
        <div>
          <div style={{ height: height || '500px' }}>
            {/* PDF container */}
            <div ref={pdfContainerRef} style={{ position: 'relative', height: '100%', overflowY: 'scroll' }} />
          </div>
        </div>
      </div>
    </>
  );
};

export default viewPdf;
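The component above bakes a fixed scale of 2 into the viewport, which also doubles the on-screen size of every page. A commonly used alternative, shown here as an assumption rather than as the author's method, is to keep the viewport at the desired CSS size and enlarge only the canvas backing store by the device pixel ratio, passing the matching `transform` to `page.render`. `computeCanvasLayout` and `CanvasLayout` are hypothetical names.

```typescript
// Sizes for a canvas that stays sharp on high-DPI screens.
interface CanvasLayout {
  width: number; // backing-store width in device pixels
  height: number; // backing-store height in device pixels
  cssWidth: string; // on-screen width
  cssHeight: string; // on-screen height
  transform: number[] | undefined; // value for page.render's `transform` option
}

// Hypothetical helper: derives the canvas sizes from the viewport size and
// an output scale (typically window.devicePixelRatio).
function computeCanvasLayout(
  viewportWidth: number,
  viewportHeight: number,
  outputScale: number,
): CanvasLayout {
  return {
    width: Math.floor(viewportWidth * outputScale),
    height: Math.floor(viewportHeight * outputScale),
    cssWidth: `${Math.floor(viewportWidth)}px`,
    cssHeight: `${Math.floor(viewportHeight)}px`,
    // A pure scaling matrix; not needed when the output scale is 1.
    transform: outputScale !== 1 ? [outputScale, 0, 0, outputScale, 0, 0] : undefined,
  };
}

// Browser usage (sketch):
//   const layout = computeCanvasLayout(viewport.width, viewport.height, window.devicePixelRatio || 1);
//   canvas.width = layout.width;
//   canvas.height = layout.height;
//   canvas.style.width = layout.cssWidth;
//   canvas.style.height = layout.cssHeight;
//   await page.render({ canvasContext: context, viewport, transform: layout.transform }).promise;
```

With this split, the `scale` passed to `getViewport` controls only the zoom level, and sharpness follows the screen.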

[2] Making the PDF text selectable for copying

...
// Based on the code from "[1] PDF preview"; only renderPage changes.
// The pdfViewer and textLayer class names are styled by pdfjs-dist/web/pdf_viewer.css,
// so this assumes that stylesheet is loaded.

// Render a single page
const renderPage = async (page: any, pageNumber: number) => {
  const viewport = page.getViewport({ scale });
  const pageContentDom = document.createElement('div');
  pageContentDom.id = `pdfPage-content-${pageNumber}`;
  // add-begin: selectable text. Reuse the library's class name (it must not be
  // changed) so that the text layer and the canvas layer overlap.
  pageContentDom.className = 'pdfViewer';
  pageContentDom.style.setProperty('--scale-factor', String(scale));
  // add-end: selectable text
  pageContentDom.style.width = `${viewport.width}px`;
  pageContentDom.style.height = `${viewport.height}px`;
  pageContentDom.style.position = 'relative';

  // Create the canvas element
  const canvas = document.createElement('canvas');
  const context = canvas.getContext('2d');
  canvas.id = `pdfPage-${pageNumber}`;
  canvas.width = viewport.width;
  canvas.height = viewport.height;
  canvas.style.border = '1px solid black';
  pageContentDom.appendChild(canvas);
  pdfContainerRef.current.appendChild(pageContentDom);

  // Render the PDF page onto the canvas
  await page.render({
    canvasContext: context,
    viewport,
  }).promise;

  // add-begin: selectable text
  const textLayerDiv = document.createElement('div');
  textLayerDiv.style.width = `${viewport.width}px`;
  textLayerDiv.style.height = `${viewport.height}px`;
  // Reuse the library's class name so the text layer overlaps the canvas layer
  textLayerDiv.className = 'textLayer';
  const textContent = await page.getTextContent();
  pdfjsLib.renderTextLayer({
    textContentSource: textContent,
    container: textLayerDiv,
    viewport: viewport,
    textDivs: [],
  });
  pageContentDom.appendChild(textLayerDiv);
  // add-end: selectable text
};

[3] Search with highlighted matches and scrolling to the match position

import { Button } from 'antd';
import { useState, useEffect, useRef } from 'react';
import * as pdfjsLib from 'pdfjs-dist'; // pdfjs-dist

const pdfUrl = '/zyk.pdf'; // PDF file path; the file lives in the public directory
const workerUrl = '/pdf.worker.min.js'; // web worker script, also in the public directory
pdfjsLib.GlobalWorkerOptions.workerSrc = workerUrl;

// One matched slice of a text block. A match that spans several blocks yields several items.
interface MatchBlockItem {
  blockIndex: number; // index of the text block within the page
  textStartIndex: number; // start offset inside the block (inclusive)
  textEndIndex: number; // end offset inside the block (exclusive, as with slice)
  textBlock: any; // the pdf.js text item
}

const viewPdf = (props: { height: string }) => {
  const { height } = props;
  const [searchText, setSearchText] = useState('');
  const pdfContainerRef = useRef<any>(null);
  const [pagesList, setPagesList] = useState<any>([]);
  const scale = 2; // zoom factor

  // Create one highlight-layer canvas per page
  const createHeightLightCanvas = (viewport: any, pageNumber: number, parentDom: any) => {
    const highlightCanvas = document.createElement('canvas');
    highlightCanvas.id = `highlightCanvas-${pageNumber}`;
    highlightCanvas.className = 'highlightCanvas';
    highlightCanvas.width = viewport.width;
    highlightCanvas.height = viewport.height;
    highlightCanvas.style.position = 'absolute';
    highlightCanvas.style.top = '0';
    highlightCanvas.style.left = '0';
    highlightCanvas.style.zIndex = '1';
    parentDom.appendChild(highlightCanvas);
  };

  // Sum of the heights of all pages before pageIndex (0-based)
  const getHeightBeforePage = (pageIndex: number) => {
    let beforePageHeight = 0;
    for (let i = 0; i < pageIndex; i++) {
      const canvasParentDom = pdfContainerRef.current.querySelector(`#pdfPage-content-${i}`);
      const canvasParentHeight = canvasParentDom.style.height.replace('px', '');
      beforePageHeight += Number(canvasParentHeight);
    }
    return beforePageHeight;
  };

  // Scroll to a page; pageIndex is 0-based
  const jumpToPage = (pageIndex: number) => {
    pdfContainerRef.current.scrollTo({
      top: getHeightBeforePage(pageIndex), // vertical scroll position
      behavior: 'smooth',
    });
  };

  const getCurrentTextContentY = (canvas: any, match: any) => {
    // pdf.js coordinates have their origin at the bottom-left corner and
    // transform[5] is the y of the text baseline. Add the text height to reach
    // the top edge, then flip the axis. The 2 is a small manual adjustment.
    const { textBlock } = match;
    const { transform, height } = textBlock;
    return canvas.height - (transform[5] + height - 2) * scale;
  };

  // Scroll to the first match of the given page entry
  const scrollToMatch = (match: any) => {
    const { pageIndex, matchList } = match;
    const firstMatchContent = matchList[0];
    // Height of the scrollable area
    const scrollHeight = pdfContainerRef.current.scrollHeight;
    console.log('Scrollable height:', scrollHeight);
    // Height of the pages before the current one
    const beforePageHeight = getHeightBeforePage(pageIndex);
    // Vertical position of the match inside its page
    const currentPageCanvas = pdfContainerRef.current.querySelector(`#pdfPage-${pageIndex}`);
    const textContentY = getCurrentTextContentY(currentPageCanvas, firstMatchContent);
    const offsetTop = 50; // keep the target text away from the top edge
    const targetScrollTop = beforePageHeight + textContentY - offsetTop;
    pdfContainerRef.current.scrollTo({
      top: targetScrollTop, // vertical scroll position
      behavior: 'smooth',
    });
  };

  // Draw the highlight rectangles
  const drawHighlights = (canvas: any, matchesList: MatchBlockItem[]) => {
    if (matchesList.length === 0) {
      return;
    }
    const context = canvas.getContext('2d');
    context.fillStyle = 'rgba(255, 255, 0, 0.5)'; // semi-transparent yellow
    matchesList.forEach((match: any) => {
      const { textBlock } = match;
      const { transform, width, height, str } = textBlock;
      // Approximate the width of one character as the block width divided by its length
      const charWidth = width / str.length;
      const lightWidth = (match.textEndIndex - match.textStartIndex) * charWidth;
      const lightHeight = height;
      const x = transform[4] + match.textStartIndex * charWidth;
      const y = getCurrentTextContentY(canvas, match);
      context.fillRect(
        Math.floor(x * scale),
        Math.floor(y),
        Math.ceil(lightWidth * scale),
        Math.ceil(lightHeight * scale),
      );
    });
  };

  // Render a single page
  const renderPage = async (page: any, pageNumber: number) => {
    const viewport = page.getViewport({ scale });
    const pageContentDom = document.createElement('div');
    pageContentDom.id = `pdfPage-content-${pageNumber}`;
    // Reuse the library's class name so the text layer overlaps the canvas layer
    pageContentDom.className = 'pdfViewer';
    pageContentDom.style.setProperty('--scale-factor', String(scale));
    pageContentDom.style.width = `${viewport.width}px`;
    pageContentDom.style.height = `${viewport.height}px`;
    pageContentDom.style.position = 'relative';

    // Create the canvas element
    const canvas = document.createElement('canvas');
    const context = canvas.getContext('2d');
    canvas.id = `pdfPage-${pageNumber}`;
    canvas.width = viewport.width;
    canvas.height = viewport.height;
    canvas.style.border = '1px solid black';
    pageContentDom.appendChild(canvas);
    createHeightLightCanvas(viewport, pageNumber, pageContentDom);
    pdfContainerRef.current.appendChild(pageContentDom);

    // Render the PDF page onto the canvas
    await page.render({
      canvasContext: context,
      viewport,
    }).promise;

    // Render the text layer
    const textLayerDiv = document.createElement('div');
    textLayerDiv.style.width = `${viewport.width}px`;
    textLayerDiv.style.height = `${viewport.height}px`;
    // Reuse the library's class name so the text layer overlaps the canvas layer
    textLayerDiv.className = 'textLayer';
    const textContent = await page.getTextContent();
    pdfjsLib.renderTextLayer({
      textContentSource: textContent,
      container: textLayerDiv,
      viewport: viewport,
      textDivs: [],
    });
    pageContentDom.appendChild(textLayerDiv);
  };

  // Render all pages
  const renderPagesGroup = (pages: any) => {
    pages.forEach(({ page }: any, index: number) => {
      renderPage(page, index);
    });
  };

  // Load the PDF file
  const loadPdf = async (url: any) => {
    const pdf = await pdfjsLib.getDocument(url).promise;
    const pages: any[] = [];
    for (let i = 1; i <= pdf.numPages; i++) {
      const page = await pdf.getPage(i);
      const textContent = await page.getTextContent();
      pages.push({ page, textContent });
    }
    setPagesList(pages);
    renderPagesGroup(pages);
  };

  // Return every index at which `pattern` occurs in `text`.
  // indexOf is used rather than a RegExp so that characters such as "(" or "."
  // are matched literally.
  const findAllMatches = (text: string, pattern: string) => {
    const positions: number[] = [];
    let index = text.indexOf(pattern);
    while (index !== -1) {
      positions.push(index);
      index = text.indexOf(pattern, index + 1);
    }
    return positions;
  };

  // todo: reduce the number of parameters
  const getMatchesList = (
    items: any,
    currentItem: any,
    currentItemIndex: number,
    currentTextIndex: number,
    searchStr: string,
  ): MatchBlockItem[] => {
    let matchSearchList: MatchBlockItem[] = [];
    if (currentItem.str.length - (currentTextIndex + 1) < searchStr.length - 1) {
      // The rest of the current block is shorter than the search string,
      // so keep reading from the following blocks
      let itemText = currentItem.str.slice(currentTextIndex); // rest of the current block
      const tempMatchSearchList: MatchBlockItem[] = [
        {
          blockIndex: currentItemIndex,
          textStartIndex: currentTextIndex,
          textEndIndex: currentItem.str.length, // exclusive, because slice is used throughout
          textBlock: currentItem,
        },
      ]; // collects the slices taken from the following blocks
      let index = currentItemIndex;
      const otherSearchLength = searchStr.length - 1;
      while (itemText.length <= otherSearchLength) {
        index = index + 1;
        if (index >= items.length) {
          break; // ran out of text blocks on this page
        }
        const currentOtherSearchLength = otherSearchLength - itemText.length; // characters still needed, minus one
        if (items[index].str.length > currentOtherSearchLength) {
          // The block is long enough: take only the characters still needed
          itemText = `${itemText}${items[index].str.slice(0, currentOtherSearchLength + 1)}`;
          tempMatchSearchList.push({
            blockIndex: index,
            textStartIndex: 0,
            textEndIndex: currentOtherSearchLength + 1,
            textBlock: items[index],
          });
        } else {
          // The block is too short: take all of it and continue
          itemText = `${itemText}${items[index].str}`;
          tempMatchSearchList.push({
            blockIndex: index,
            textStartIndex: 0,
            textEndIndex: items[index].str.length,
            textBlock: items[index],
          });
        }
      }
      if (itemText === searchStr) {
        matchSearchList = matchSearchList.concat(tempMatchSearchList);
      }
    } else {
      // The rest of the current block is at least as long as the search string,
      // so compare a slice of the same length
      const textEndIndex = currentTextIndex + searchStr.length;
      const text = currentItem.str.slice(currentTextIndex, textEndIndex);
      if (text === searchStr) {
        console.log('Matched:', currentItem, currentItemIndex);
        matchSearchList.push({
          blockIndex: currentItemIndex,
          textStartIndex: currentTextIndex,
          textEndIndex: textEndIndex,
          textBlock: currentItem,
        });
      }
    }
    return matchSearchList;
  };

  // Find all occurrences of the search string in one page's text items
  const findAllOccurrences = (items: any, searchStr: string): MatchBlockItem[] => {
    const firstSearchStr = searchStr[0];
    let matchSearchList: MatchBlockItem[] = [];
    for (let i = 0; i < items.length; i++) {
      const currentItem = items[i];
      // Indexes in the current block where the first search character occurs
      const currentMatchIndexList = findAllMatches(currentItem.str, firstSearchStr);
      for (let j = 0; j < currentMatchIndexList.length; j++) {
        matchSearchList = [
          ...matchSearchList,
          ...getMatchesList(items, currentItem, i, currentMatchIndexList[j], searchStr),
        ];
      }
    }
    return matchSearchList;
  };

  // Clear every highlight canvas
  const clearHeightLightsCanvas = () => {
    const highlightCanvases = Array.from(pdfContainerRef.current.querySelectorAll('.highlightCanvas'));
    highlightCanvases.forEach((canvas: any) => {
      const context = canvas.getContext('2d');
      context.clearRect(0, 0, canvas.width, canvas.height);
    });
  };

  const handleSearch = () => {
    clearHeightLightsCanvas();
    if (!searchText) {
      jumpToPage(0);
      return;
    }
    const newMatches: any = [];
    console.log('pagesList', pagesList);
    // todo: matching is done page by page, so text that spans two pages may be missed
    pagesList.forEach(({ textContent }: any, pageIndex: number) => {
      const pageMatches = findAllOccurrences(textContent.items, searchText);
      newMatches.push({
        pageIndex, // page index
        matchList: pageMatches, // matches on that page
      });
    });
    console.log('newMatches', newMatches);
    const isNotMatch = newMatches.every((match: any) => match.matchList.length === 0);
    if (isNotMatch) {
      alert('No matches found');
      return;
    }
    // Redraw the highlight rectangles
    pagesList.forEach((_: any, pageIndex: number) => {
      const highlightCanvas = pdfContainerRef.current.querySelectorAll('.highlightCanvas')[pageIndex]; // highlight-layer canvas
      const currentMatches = newMatches.find((match: any) => match.pageIndex === pageIndex);
      drawHighlights(highlightCanvas, currentMatches?.matchList || []);
    });
    // Scroll to the first page that has a match
    const isExistItem = newMatches.find((match: any) => match.matchList.length > 0);
    if (isExistItem) {
      scrollToMatch(isExistItem);
    }
  };

  // Load the PDF once on mount
  useEffect(() => {
    loadPdf(pdfUrl);
  }, []);

  return (
    <>
      <div>
        <h1>PDF search (based on pdfjs-dist, custom implementation)</h1>
        <input
          type="text"
          value={searchText}
          onChange={(e) => setSearchText(e.target.value)}
          placeholder="Enter the text to search for"
        />
        <Button onClick={handleSearch}>Search</Button>
        <div>
          <div style={{ height: height || '500px' }}>
            {/* PDF container */}
            <div ref={pdfContainerRef} style={{ position: 'relative', height: '100%', overflowY: 'scroll' }} />
          </div>
        </div>
      </div>
    </>
  );
};

export default viewPdf;
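The cross-block matching above walks forward block by block from every occurrence of the first search character. A simpler way to get the same per-block slices, offered here as an alternative sketch rather than the author's method, is to join all block strings once, search the joined string with `indexOf`, and map each hit back to the blocks it overlaps. `TextItemLike`, `MatchSegment` and `findMatches` are hypothetical names; like the original, the sketch joins blocks without any separator.

```typescript
// Only the part of a pdf.js text item that this sketch needs.
interface TextItemLike {
  str: string;
}

// One slice of a match inside a single text block (end index is exclusive).
interface MatchSegment {
  blockIndex: number;
  textStartIndex: number;
  textEndIndex: number;
}

// Returns one array of segments per occurrence of `query`.
function findMatches(items: TextItemLike[], query: string): MatchSegment[][] {
  if (query.length === 0) {
    return [];
  }
  // starts[i] is the offset of items[i] inside the joined string.
  const starts: number[] = [];
  let joined = '';
  for (const item of items) {
    starts.push(joined.length);
    joined += item.str;
  }

  const results: MatchSegment[][] = [];
  let from = joined.indexOf(query);
  while (from !== -1) {
    const to = from + query.length; // exclusive end of this occurrence
    const segments: MatchSegment[] = [];
    for (let i = 0; i < items.length; i++) {
      const blockStart = starts[i];
      const blockEnd = blockStart + items[i].str.length;
      // Intersect the occurrence [from, to) with the block [blockStart, blockEnd).
      const lo = Math.max(from, blockStart);
      const hi = Math.min(to, blockEnd);
      if (lo < hi) {
        segments.push({
          blockIndex: i,
          textStartIndex: lo - blockStart,
          textEndIndex: hi - blockStart,
        });
      }
    }
    results.push(segments);
    from = joined.indexOf(query, from + 1);
  }
  return results;
}

// "world" is split across the first two blocks.
const demoItems: TextItemLike[] = [{ str: 'Hello wor' }, { str: 'ld, hel' }, { str: 'lo' }];
console.log(JSON.stringify(findMatches(demoItems, 'world')));
```

Each segment carries the same `blockIndex`, `textStartIndex` and `textEndIndex` fields that the highlight drawing code expects, so the text item would only need to be attached as `textBlock` before drawing.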

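[4] Reading the outline (table of contents) data of a PDF file

...
// Based on the code from "[1] PDF preview"
const get = async (url: any) => {
  const pdf = await pdfjsLib.getDocument(url).promise;
  // Fetch the outline data; this resolves to null when the PDF has no outline
  const pdfCatalogue = await pdf.getOutline();
  console.log('Outline data:', pdfCatalogue);
};
...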
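`getOutline()` resolves to a tree: each node has a `title` and an `items` array of child nodes, and the result is `null` when the PDF has no outline. To show it as an indented list, the tree can be flattened with a depth value per entry. The sketch below assumes only those two fields; `OutlineNodeLike`, `FlatOutlineEntry` and `flattenOutline` are hypothetical names.

```typescript
// Only the fields of a pdf.js outline node that this sketch relies on.
interface OutlineNodeLike {
  title: string;
  items?: OutlineNodeLike[];
}

interface FlatOutlineEntry {
  title: string;
  depth: number; // 0 for top-level entries
}

// Depth-first flattening; null or undefined (no outline) yields an empty list.
function flattenOutline(
  nodes: OutlineNodeLike[] | null | undefined,
  depth: number = 0,
): FlatOutlineEntry[] {
  if (!nodes) {
    return [];
  }
  const flat: FlatOutlineEntry[] = [];
  for (const node of nodes) {
    flat.push({ title: node.title, depth });
    flat.push(...flattenOutline(node.items, depth + 1));
  }
  return flat;
}

// Made-up outline data, for illustration only.
const sampleOutline: OutlineNodeLike[] = [
  { title: 'Chapter 1', items: [{ title: 'Section 1.1', items: [] }] },
  { title: 'Chapter 2', items: [] },
];
for (const entry of flattenOutline(sampleOutline)) {
  console.log(`${'  '.repeat(entry.depth)}${entry.title}`);
}
```

Each node also carries a `dest` field describing where the entry points; resolving it to a page is left out of this sketch.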
