A Review of YOLOv12

Attention-Based Enhancements vs. Previous Versions

https://arxiv.org/pdf/2504.11995

The YOLO (You Only Look Once) series has been a leading framework in real-time object detection, consistently improving the balance between speed and accuracy. However, integrating attention mechanisms into YOLO has been challenging due to their high computational overhead. YOLOv12 introduces a novel approach that successfully incorporates attention-based enhancements while preserving real-time performance. This paper provides a comprehensive review of YOLOv12's architectural innovations, including Area Attention for computationally efficient self-attention, Residual Efficient Layer Aggregation Networks for improved feature aggregation, and FlashAttention for optimized memory access. Additionally, we benchmark YOLOv12 against prior YOLO versions and competing object detectors, analyzing its improvements in accuracy, inference speed, and computational efficiency. Through this analysis, we demonstrate how YOLOv12 advances real-time object detection by refining the latency-accuracy trade-off and optimizing computational resources.

   1 Introduction   

Real-time object detection is a cornerstone of modern computer vision, playing a pivotal role in applications such as autonomous driving [1, 2, 3, 4], robotics [5, 6, 7], and video surveillance [8, 9, 10]. These domains demand not only high accuracy but also low-latency performance to ensure real-time decision-making. Among the various object detection frameworks, the YOLO (You Only Look Once) series has emerged as a dominant solution [11], striking a balance between speed and precision by continuously refining convolutional neural network (CNN) architectures [12, 13, 14, 15, 16, 17, 18, 19, 20, 21]. However, a fundamental challenge in CNN-based detectors lies in their limited ability to capture long-range dependencies, which are crucial for understanding spatial relationships in complex scenes. This limitation has led to increased research into attention mechanisms, particularly Vision Transformers (ViTs) [22, 23], which excel at global feature modeling. Despite their advantages, ViTs suffer from quadratic computational complexity [24] and inefficient memory access [25, 26], making them impractical for real-time deployment.

To address these limitations, YOLOv12 [27] introduces an attention-centric approach that integrates key innovations to enhance efficiency while maintaining real-time performance. By embedding attention mechanisms within the YOLO framework, it successfully bridges the gap between CNN-based and transformer-based detectors without compromising speed. This is achieved through several architectural enhancements that optimize computational efficiency, improve feature aggregation, and refine attention mechanisms:

  1. Area Attention (A²): A novel mechanism that partitions spatial regions to reduce the complexity of self-attention, preserving a large receptive field while improving computational efficiency. This enables attention-based models to compete with CNNs in speed.

  2. Residual Efficient Layer Aggregation Networks (R-ELAN): An enhancement over traditional ELAN, designed to stabilize training in large-scale models by introducing residual shortcuts and a revised feature aggregation strategy, ensuring better gradient flow and optimization.

  3. Architectural Streamlining: Several structural refinements, including the integration of FlashAttention for efficient memory access, the removal of positional encoding to simplify computations, and an optimized MLP ratio to balance performance and inference speed.

   2  Technical Evolution of YOLO Architectures   

The You Only Look Once (YOLO) series has revolutionized real-time object detection through continuous architectural innovation and performance optimization. The evolution of YOLO can be traced through distinct versions, each introducing significant advancements.

YOLOv1 (2015) [11], developed by Joseph Redmon et al., introduced the concept of single-stage object detection, prioritizing speed over accuracy. It divided the image into a grid and predicted bounding boxes and class probabilities directly from each grid cell, enabling real-time inference. This method significantly reduced the computational overhead compared to two-stage detectors, albeit with some trade-offs in localization accuracy.

YOLOv2 (2016) [12], also by Joseph Redmon, enhanced detection capabilities with the introduction of anchor boxes, batch normalization, and multi-scale training. Anchor boxes allowed the model to predict bounding boxes of various shapes and sizes, improving its ability to detect diverse objects. Batch normalization stabilized training and improved convergence, while multi-scale training made the model more robust to varying input resolutions.

YOLOv3 (2018) [13], again by Joseph Redmon, further improved accuracy with the Darknet-53 backbone, Feature Pyramid Networks (FPN), and logistic classifiers. Darknet-53 provided a deeper and more powerful feature extractor, while FPN enabled the model to leverage multi-scale features for improved detection of small objects. Logistic classifiers replaced softmax for class prediction, allowing for multi-label classification.

YOLOv4 (2020) [14], developed by Alexey Bochkovskiy et al., incorporated CSPDarknet, Mish activation, PANet, and Mosaic augmentation. CSPDarknet reduced computational costs while maintaining performance, Mish activation improved gradient flow, PANet enhanced feature fusion, and Mosaic augmentation increased data diversity.

YOLOv5 (2020) [15], developed by Ultralytics, marked a pivotal shift by introducing a PyTorch implementation. This significantly simplified training and deployment, making YOLO more accessible to a wider audience. It also featured auto-anchor learning, which dynamically adjusted anchor box sizes during training, and incorporated advancements in data augmentation. The transition from Darknet to PyTorch was a major change and greatly contributed to the model's popularity.

YOLOv6 (2022) [16], developed by Meituan, focused on efficiency with the EfficientRep backbone, Neural Architecture Search (NAS), and RepOptimizer. EfficientRep optimized the model's architecture for speed and accuracy, NAS automated the search for optimal hyperparameters, and RepOptimizer reduced inference time through structural re-parameterization.

YOLOv7 (2022) [17], developed by Wang et al., further improved efficiency through the Extended Efficient Layer Aggregation Network (E-ELAN) and re-parameterized convolutions. E-ELAN enhanced feature integration and learning capacity, while re-parameterized convolutions reduced computational overhead.

YOLOv8 (2023) [18], also developed by Ultralytics, introduced C2f modules, task-specific detection heads, and anchor-free detection. C2f modules enhanced feature fusion and gradient flow, task-specific detection heads allowed for more specialized detection tasks, and anchor-free detection eliminated the need for predefined anchor boxes, simplifying the detection process.

YOLOv9 (2024) [19], developed by Chien-Yao Wang et al., introduces the Generalized Efficient Layer Aggregation Network (GELAN) and Programmable Gradient Information (PGI). GELAN improves the model's ability to learn diverse features, and PGI helps to avoid information loss during deep network training.

YOLOv10 (2024) [20], developed by various research contributors, emphasizes dual label assignments, NMS-free detection, and end-to-end training. Dual label assignments enhance the model's ability to handle ambiguous object instances, NMS-free detection reduces computational overhead, and end-to-end training simplifies the training process. We say "various research contributors" because, unlike previous versions, this release has no single, universally recognized, and consistently credited developer or organization.

YOLOv11 (2024) [21], developed by Glenn Jocher and Jing Qiu, focuses on the C3K2 module, feature aggregation, and optimized training pipelines. The C3K2 module enhances feature extraction, feature aggregation improves the model's ability to integrate multi-scale features, and optimized training pipelines reduce training time. Similar to YOLOv10, the developer information is less consolidated and more collaborative.

YOLOv12 (2025) [27], the latest iteration, integrates attention mechanisms while preserving real-time efficiency. It introduces A², Residual Efficient Layer Aggregation Networks (R-ELAN), and FlashAttention, alongside a hybrid CNN-Transformer framework. These innovations refine computational efficiency and optimize the latency-accuracy trade-off, surpassing both CNN-based and transformer-based object detectors.

The evolution of YOLO models highlights a shift from Darknet-based architectures [11, 12, 13, 14] to PyTorch implementations [15, 16, 17, 18, 19, 20, 21], and more recently, towards hybrid CNN-transformer architectures [27]. Each generation has balanced speed and accuracy, incorporating advancements in feature extraction, gradient optimization, and data efficiency. Figure 1 illustrates the progression of YOLO architectures, emphasizing key innovations across versions.

Figure 1: Evolution of YOLO architectures

With YOLOv12's architectural refinements, attention mechanisms are now embedded within the YOLO framework, optimizing both computational efficiency and high-speed inference. The next section analyzes these enhancements in detail, benchmarking YOLOv12's performance across multiple detection tasks.

   3  Architectural Design of YOLOv12   

The YOLO framework revolutionized object detection by introducing a unified neural network that simultaneously performs bounding box regression and object classification in a single forward pass [28]. Unlike traditional two-stage detection methods, YOLO adopts an end-to-end approach, making it highly efficient for real-time applications. Its fully differentiable design allows seamless optimization, leading to improved speed and accuracy in object detection tasks.

At its core, the YOLOv12 architecture consists of two primary components: the backbone and the head. The backbone serves as the feature extractor, processing the input image through a series of convolutional layers to generate hierarchical feature maps at different scales. These features capture essential spatial and contextual information necessary for object detection. The head is responsible for refining these features and generating final predictions by performing multi-scale feature fusion and localization. Through a combination of upsampling, concatenation, and convolutional operations, the head enhances feature representations, ensuring robust detection of small, medium, and large objects. The backbone and head architecture of YOLOv12 is depicted in Algorithm 1.

3.1 Backbone: Feature Extraction

The backbone of YOLOv12 processes the input image through a series of convolutional layers, progressively reducing its spatial dimensions while increasing the depth of feature maps. The process begins with an initial convolutional layer that extracts low-level features, followed by additional convolutional layers that perform downsampling to capture hierarchical information. The first stage applies a 3×3 convolution with a stride of 2 to generate the initial feature map. This is followed by another convolutional layer that further reduces the spatial resolution while increasing feature depth.

As the image moves through the backbone, it undergoes multi-scale feature learning using specialized modules like C3k2 and A2C2F. The C3k2 module enhances feature representation while maintaining computational efficiency, and the A2C2F module improves feature fusion for better spatial and contextual understanding. The backbone continues this process until it generates three key feature maps: P3, P4, and P5, each representing a different scale of feature extraction. These feature maps are then passed to the detection head for further processing.
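
To make the data flow concrete, here is a minimal PyTorch sketch of the stem-and-downsampling pattern described above. It is an illustration rather than the official implementation: the channel widths and the ConvBNSiLU helper are assumptions, and the real C3k2/A2C2F aggregation blocks are omitted for brevity.

```python
import torch
import torch.nn as nn

class ConvBNSiLU(nn.Module):
    """3x3 convolution + BatchNorm + SiLU, the basic downsampling unit (assumed helper)."""
    def __init__(self, c_in, c_out, stride=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class BackboneSketch(nn.Module):
    """Toy YOLOv12-style backbone: stride-2 convs produce P3 (1/8), P4 (1/16)
    and P5 (1/32) feature maps. The real C3k2/A2C2F aggregation modules that
    would sit between the downsampling convs are omitted in this sketch."""
    def __init__(self):
        super().__init__()
        self.stem = ConvBNSiLU(3, 32, stride=2)      # 1/2
        self.down1 = ConvBNSiLU(32, 64, stride=2)    # 1/4
        self.down2 = ConvBNSiLU(64, 128, stride=2)   # 1/8  -> P3
        self.down3 = ConvBNSiLU(128, 256, stride=2)  # 1/16 -> P4
        self.down4 = ConvBNSiLU(256, 512, stride=2)  # 1/32 -> P5

    def forward(self, x):
        x = self.down1(self.stem(x))
        p3 = self.down2(x)
        p4 = self.down3(p3)
        p5 = self.down4(p4)
        return p3, p4, p5

p3, p4, p5 = BackboneSketch()(torch.randn(1, 3, 640, 640))
print(p3.shape, p4.shape, p5.shape)  # (1,128,80,80) (1,256,40,40) (1,512,20,20)
```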

3.2 Head: Feature Fusion and Object Detection

The head of YOLOv12 is responsible for merging multi-scale features and generating final object detection predictions. It employs a feature fusion strategy that combines information from different levels of the backbone to enhance detection accuracy across small, medium, and large objects. This is achieved through a series of upsampling and concatenation operations. The process begins with the coarsest feature map (P5) being upsampled using nearest-neighbor interpolation. It is then concatenated with the corresponding higher-resolution feature map (P4) to create a refined feature representation. The fused feature is further processed using the A2C2F module to enhance its expressiveness.

A similar process is repeated for the next scale by upsampling the refined feature map and concatenating it with the lower-scale feature (P3). This hierarchical fusion ensures that both low-level and high-level features contribute to the final detection, improving the model's ability to detect objects at varying scales.

After feature fusion, the network undergoes final processing to prepare for detection. The refined features are downsampled again and merged at different levels to strengthen object representations. The C3k2 module is applied at the largest scale (P5/32-large) to ensure that high-resolution features are preserved while reducing computational cost. These processed feature maps are then passed through the final detection layer, which applies classification and localization predictions across different object categories. The detailed breakdown of the backbone and head architecture is formally described in Algorithm 1.

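The fusion and prediction steps described in this subsection can be sketched in the same spirit. Again, this is a simplified illustration under assumed channel widths: plain 3×3 convolutions stand in for the A2C2F/C3k2 fusion modules, and the bottom-up downsampling pass before detection is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HeadSketch(nn.Module):
    """Toy YOLOv12-style head: nearest-neighbour upsampling plus concatenation
    fuses P5 into P4 and then into P3, after which a 1x1 conv per scale emits
    class/box predictions. Plain 3x3 convs stand in for the A2C2F/C3k2 fusion
    blocks (an assumption of this sketch)."""
    def __init__(self, c3=128, c4=256, c5=512, no=84):
        super().__init__()
        self.fuse_p4 = nn.Conv2d(c5 + c4, c4, 3, padding=1)
        self.fuse_p3 = nn.Conv2d(c4 + c3, c3, 3, padding=1)
        self.det = nn.ModuleList([nn.Conv2d(c, no, 1) for c in (c3, c4, c5)])

    def forward(self, p3, p4, p5):
        up = lambda t: F.interpolate(t, scale_factor=2, mode="nearest")
        t4 = self.fuse_p4(torch.cat([up(p5), p4], dim=1))  # P5 fused into P4
        t3 = self.fuse_p3(torch.cat([up(t4), p3], dim=1))  # then into P3
        # one prediction map per scale: small, medium, large objects
        return [d(t) for d, t in zip(self.det, (t3, t4, p5))]

outs = HeadSketch()(torch.randn(1, 128, 80, 80),
                    torch.randn(1, 256, 40, 40),
                    torch.randn(1, 512, 20, 20))
print([o.shape for o in outs])  # strides 8, 16, 32
```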

   4 Architectural Innovations of YOLOv12   

YOLOv12 introduces a novel attention-centric approach to real-time object detection, bridging the performance gap between conventional CNNs and attention-based architectures. Unlike previous YOLO versions that primarily relied on CNNs for efficiency, YOLOv12 integrates attention mechanisms without sacrificing speed. This is achieved through three key architectural improvements: the A² Module, R-ELAN, and enhancements to the overall model structure, including FlashAttention and reduced computational overhead in the multi-layer perceptron (MLP). Each of these components is detailed below.

4.1 Area Attention Module

The efficiency of attention mechanisms has traditionally been hindered by their high computational cost, particularly due to the quadratic complexity associated with self-attention operations [29]. A common strategy to mitigate this issue is linear attention [30], which reduces complexity by approximating attention interactions with more efficient transformations. However, while linear attention improves speed, it suffers from global dependency degradation [31], instability during training [32], and sensitivity to input distribution shifts [33]. Additionally, due to its low-rank representation constraints [34, 32], it struggles to retain fine-grained details in high-resolution images, limiting its effectiveness in object detection.

To address these limitations, YOLOv12 introduces the A² Module, which retains the strengths of self-attention while significantly reducing computational overhead [27]. Unlike traditional global attention mechanisms that compute interactions across the entire image, Area Attention divides the feature map into equal-sized, non-overlapping segments, either horizontally or vertically. Specifically, a feature map of dimensions (H, W) is partitioned into L segments of size (H/L, W) or (H, W/L), eliminating the need for the explicit window partitioning methods seen in other attention models such as Shifted Window [35], Criss-Cross Attention [36], or Axial Attention [37]. Those methods often introduce additional complexity and reduce computational efficiency, whereas A² achieves segmentation via a simple reshape operation, maintaining a large receptive field while significantly enhancing processing speed [27]. This approach is depicted in Figure 2.

Figure 2: Comparison of different local attention techniques, with the proposed Area Attention method

Although A² reduces the receptive field to 1/4 of the original size, it still surpasses conventional local attention methods in coverage and efficiency. Moreover, its computational cost is nearly halved: with the default of L = 4 areas, attention is computed over only n/L tokens at a time, so the total cost drops from 2n²hd (traditional self-attention) to L · 2(n/L)²hd = 2n²hd/L = n²hd/2. This efficiency gain allows YOLOv12 to process large-scale images more effectively while maintaining robust detection accuracy [27].
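
Because the partition is a pure reshape, the mechanism fits in a few lines. The sketch below is a rough illustration, not the paper's implementation: it splits the map into L = 4 horizontal strips and runs standard scaled-dot-product attention inside each strip, omitting the QKV projections and the position perceiver (the input tensor doubles as query, key, and value).

```python
import torch
import torch.nn.functional as F

def area_attention(x, num_areas=4, num_heads=4):
    """Sketch of Area Attention: cut the feature map into `num_areas` equal
    horizontal strips with a pure reshape, then run ordinary self-attention
    independently inside each strip."""
    B, C, H, W = x.shape
    assert H % num_areas == 0 and C % num_heads == 0
    hs, hd = H // num_areas, C // num_heads          # strip height, head dim
    # (B, C, H, W) -> (B*areas, heads, strip tokens, head dim): no shifted or
    # criss-cross window bookkeeping, just a reshape
    t = (x.reshape(B, num_heads, hd, num_areas, hs, W)
          .permute(0, 3, 1, 4, 5, 2)
          .reshape(B * num_areas, num_heads, hs * W, hd))
    t = F.scaled_dot_product_attention(t, t, t)      # attention stays inside each area
    # invert the reshape back to (B, C, H, W)
    return (t.reshape(B, num_areas, num_heads, hs, W, hd)
             .permute(0, 2, 5, 1, 3, 4)
             .reshape(B, C, H, W))

print(area_attention(torch.randn(2, 64, 32, 32)).shape)  # torch.Size([2, 64, 32, 32])
```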

4.2 Residual Efficient Layer Aggregation Networks (R-ELAN)

Feature aggregation plays a crucial role in improving information flow within deep learning architectures. Previous YOLO models incorporated Efficient Layer Aggregation Networks (ELAN) [17], which optimized feature fusion by splitting the output of 1×1 convolution layers into multiple parallel processing streams before merging them back together. However, this approach introduced two major drawbacks: gradient blocking and optimization difficulties. These issues were particularly evident in deeper models, where the lack of direct residual connections between the input and output impeded effective gradient propagation, leading to slow or unstable convergence.

To address these challenges, YOLOv12 introduces R-ELAN, a novel enhancement designed to improve training stability and convergence. Unlike ELAN, R-ELAN integrates residual shortcuts that connect the input directly to the output with a scaling factor (default set to 0.01) [27]. This ensures smoother gradient flow while maintaining computational efficiency. These residual connections are inspired by layer scaling techniques in Vision Transformers [38], but they are specifically adapted to convolutional architectures to prevent latency overhead, which often affects attention-heavy models.

Figure 3 illustrates a comparative overview of different architectures, including CSPNet, ELAN, C3k2, and R-ELAN, highlighting their structural distinctions.

Figure 3: Comparison of CSPNet, ELAN, C3k2, and R-ELAN Architectures.

• CSPNet (Cross-Stage Partial Network): CSPNet improves gradient flow and reduces redundant computation by splitting the feature map into two parts, processing one through a sequence of convolutions while keeping the other unaltered, and then merging them. This partial connection approach enhances efficiency while preserving representational capacity [39].

• ELAN (Efficient Layer Aggregation Networks): ELAN extends CSPNet by introducing deeper feature aggregation. It utilizes multiple parallel convolutional paths after the initial 1×1 convolution, which are concatenated to enrich feature representation. However, the absence of direct residual connections limits gradient flow, making deeper networks harder to train [17].

• C3k2: A modified version of ELAN, C3k2 incorporates additional transformations within the feature aggregation process, but it still inherits the gradient-blocking issues of ELAN. While it improves structural efficiency, it does not fully resolve the optimization challenges faced in deep networks [21, 19].

• R-ELAN: Unlike ELAN and C3k2, R-ELAN restructures feature aggregation by incorporating residual connections. Instead of first splitting the feature map and processing the parts independently, R-ELAN adjusts channel dimensions upfront, generating a unified feature map before passing it through bottleneck layers. This design significantly enhances computational efficiency by reducing redundant operations while ensuring effective feature integration [27] (see the sketch below).
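
A minimal sketch of this block structure, under assumed layer widths and a plain bottleneck, is given below; the paper's exact layer composition differs. Applying the 0.01 factor to the processed branch follows the ViT layer-scaling convention the paper cites, and is an assumption of this sketch.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Plain conv bottleneck standing in for R-ELAN's inner layers."""
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1, bias=False), nn.BatchNorm2d(c), nn.SiLU(),
            nn.Conv2d(c, c, 3, padding=1, bias=False), nn.BatchNorm2d(c), nn.SiLU())

    def forward(self, x):
        return x + self.body(x)

class RELANSketch(nn.Module):
    """Toy R-ELAN block: a 1x1 conv first unifies the channel width (instead
    of splitting the map as ELAN does), bottleneck layers process the unified
    map, and a residual shortcut ties input to output. Scaling the processed
    branch by 0.01 mirrors layer scaling in ViTs -- an assumption here."""
    def __init__(self, c_in, c_out, n=2, scale=0.01):
        super().__init__()
        self.adjust = nn.Conv2d(c_in, c_out, 1)  # upfront channel adjustment
        self.blocks = nn.Sequential(*[Bottleneck(c_out) for _ in range(n)])
        # hypothetical projection so the shortcut matches when widths differ
        self.shortcut = nn.Identity() if c_in == c_out else nn.Conv2d(c_in, c_out, 1)
        self.scale = scale

    def forward(self, x):
        return self.shortcut(x) + self.scale * self.blocks(self.adjust(x))

y = RELANSketch(128, 128)(torch.randn(1, 128, 40, 40))
print(y.shape)  # torch.Size([1, 128, 40, 40])
```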

The introduction of R-ELAN in YOLOv12 yields several advantages, including faster convergence, improved gradient stability, and reduced optimization difficulties, particularly for larger-scale models (L- and X-scale). Previous versions often faced convergence failures under standard optimizers like Adam and AdamW [17], but R-ELAN effectively mitigates these issues, making YOLOv12 more robust for deep learning applications [27].

4.3 Additional Improvements and Efficiency Enhancements

Beyond the introduction of A² and R-ELAN, YOLOv12 incorporates several additional architectural refinements to enhance overall performance:

• Streamlined Backbone with Fewer Stacked Blocks: Prior versions of YOLO [18, 19, 20, 21] incorporated multiple stacked attention and convolutional layers in the final stages of the backbone. YOLOv12 optimizes this by retaining only a single R-ELAN block, leading to faster convergence, better optimization stability, and improved inference efficiency, especially in larger models.

• Efficient Convolutional Design: To enhance computational efficiency, YOLOv12 strategically retains convolution layers where they offer advantages. Instead of using fully connected layers with Layer Normalization (LN), it adopts convolution operations combined with Batch Normalization (BN), which better suits real-time applications [27]. This allows the model to maintain CNN-like efficiency while incorporating attention mechanisms.

• Removal of Positional Encoding: Unlike traditional attention-based architectures, YOLOv12 discards explicit positional encoding and instead employs large-kernel separable convolutions (7×7) in the attention module [27], known as the Position Perceiver. This ensures spatial awareness without adding unnecessary complexity, improving both efficiency and inference speed.

• Optimized MLP Ratio: Traditional Vision Transformers typically use an MLP expansion ratio of 4, leading to computational inefficiencies in real-time settings. YOLOv12 reduces the MLP ratio to 1.2 [27], ensuring that the feed-forward network does not dominate overall runtime. This refinement helps balance efficiency and performance, preventing unnecessary computational overhead.

• FlashAttention Integration: One of the key bottlenecks in attention-based models is memory inefficiency [25, 26]. YOLOv12 incorporates FlashAttention, an optimization technique that reduces memory access overhead by restructuring computation to better utilize GPU high-speed memory (SRAM). This allows YOLOv12 to match CNNs in speed while leveraging the superior modeling capacity of attention mechanisms. A combined sketch of these last three refinements follows this list.
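
The sketch below combines the last three refinements into one toy attention block, assuming illustrative channel widths: no positional encoding, a 7×7 depthwise-separable position perceiver on the value path, a 1.2 MLP expansion ratio, and PyTorch's scaled_dot_product_attention, which dispatches to a FlashAttention kernel when hardware and dtype allow.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttnBlockSketch(nn.Module):
    """Toy block illustrating the refinements above: no positional encoding;
    a 7x7 depthwise separable conv ('position perceiver') injects spatial
    cues from the value path; the MLP uses a 1.2 expansion ratio; and
    scaled_dot_product_attention lets PyTorch pick a FlashAttention kernel
    when one is available. Widths and layout are assumptions."""
    def __init__(self, c=128, num_heads=4, mlp_ratio=1.2):
        super().__init__()
        self.qkv = nn.Conv2d(c, 3 * c, 1)
        self.pos = nn.Sequential(                     # 7x7 separable conv
            nn.Conv2d(c, c, 7, padding=3, groups=c),  # depthwise
            nn.Conv2d(c, c, 1),                       # pointwise
        )
        hidden = int(c * mlp_ratio)                   # 1.2x, not the usual 4x
        self.mlp = nn.Sequential(nn.Conv2d(c, hidden, 1), nn.SiLU(),
                                 nn.Conv2d(hidden, c, 1))
        self.heads = num_heads

    def forward(self, x):
        B, C, H, W = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=1)
        def tokens(t):  # (B, C, H, W) -> (B, heads, HW, C//heads)
            return t.reshape(B, self.heads, C // self.heads, H * W).transpose(2, 3)
        a = F.scaled_dot_product_attention(tokens(q), tokens(k), tokens(v))
        a = a.transpose(2, 3).reshape(B, C, H, W) + self.pos(v)  # position perceiver
        return x + self.mlp(a)  # conv + BN elsewhere keeps CNN-like efficiency

print(AttnBlockSketch()(torch.randn(1, 128, 40, 40)).shape)  # (1, 128, 40, 40)
```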

   5  Discussion   

(Section 7 in the original paper)

YOLOv12 represents a major step forward in object detection, building on the solid foundation of YOLOv11 while incorporating state-of-the-art architectural optimizations. The model strikes a careful balance between accuracy, speed, and computational efficiency, making it a strong choice for real-time computer vision applications.

5.1 Model Efficiency and Deployment

YOLOv12 is offered in a range of model sizes, from nano (12n) to extra-large (12x), allowing it to fit a wide variety of hardware platforms. This scalability ensures that it runs efficiently on resource-constrained edge devices while also exploiting the full potential of high-performance GPUs, maintaining high accuracy alongside optimized inference speed. The nano and small variants substantially reduce latency while preserving detection accuracy, making them well suited to real-time applications such as autonomous navigation [44, 45], robotics [5], and intelligent surveillance [46, 47, 48].
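
As a deployment-level illustration of this scalability, the snippet below assumes the Ultralytics Python package and YOLO12 checkpoint names of the form "yolo12n.pt" through "yolo12x.pt" (as published in the Ultralytics model zoo); if a distribution names the weights differently, switching sizes remains a one-line change.

```python
# Hedged usage sketch: assumes the `ultralytics` package exposes YOLO12
# weights under names like "yolo12n.pt" ... "yolo12x.pt".
from ultralytics import YOLO

model = YOLO("yolo12n.pt")        # nano: lowest latency, edge devices
# model = YOLO("yolo12x.pt")      # extra-large: highest accuracy, server GPUs
results = model.predict("bus.jpg", imgsz=640, conf=0.25)
print(results[0].boxes.xyxy)      # detected boxes in pixel coordinates
```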

5.2 Architectural Innovations and Computational Efficiency

YOLOv12 improves both feature extraction and processing efficiency through several key architectural refinements. R-ELAN improves feature fusion and gradient propagation, enabling deeper yet more efficient network structures. The introduction of 7×7 separable convolutions reduces the parameter count while preserving spatial consistency, delivering stronger feature extraction at minimal computational cost.

One of the most notable optimizations in YOLOv12 is its FlashAttention-driven Area Attention mechanism, which improves detection accuracy while reducing memory overhead. This allows the model to localize objects more precisely in complex, dynamic environments without affecting inference speed. Together, these architectural improvements deliver a higher mAP while maintaining real-time processing efficiency, making the model particularly well suited to applications that require low-latency object detection.

5.3 Performance Gains and Hardware Adaptability

Benchmarks show that YOLOv12 surpasses previous YOLO versions in both accuracy and efficiency. The YOLOv12m variant matches or exceeds the mAP of YOLOv11x with 25% fewer parameters, demonstrating a substantial gain in computational efficiency. The smaller YOLOv12s variant sharply reduces inference latency, making it particularly suitable for edge computing and embedded vision applications [49].

From a deployment standpoint, YOLOv12 exhibits excellent scalability, able to exploit the full compute of high-performance GPUs as well as adapt to low-power AI accelerators. Its optimized model variants can be flexibly deployed in real-time scenarios such as autonomous vehicles, industrial automation, and security surveillance [50, 51, 52]. The model's efficient memory utilization and low computational footprint make it an ideal choice for resource-constrained environments.
