
How to Write a Spark Program in IntelliJ IDEA

news | Source: original | 2025/4/29 5:43:37

I. Prepare the Environment

1. Install the Scala plugin

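The Community edition of IDEA does not include the Scala plugin, so install it from Settings > Plugins (Marketplace). If your Ultimate edition does not list it as installed, install it the same way.

Make sure the plugin version matches your IDEA version.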

2. Choose a build tool

sbt: suits projects with simple dependency management and fast iteration. Install the sbt tool beforehand.

Maven: suits developers familiar with the Java ecosystem and projects that need complex dependency management.

II. Create the Project

Option 1: sbt project

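When creating the project, select sbt as the build system.
Choose a Scala version compatible with your Spark release. For example, Spark 3.5.5 is published for Scala 2.12.x and 2.13.x, and the prebuilt Spark distribution uses 2.12.
Check "Add sample code" to generate the standard directory structure.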
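Spark artifacts are published per Scala binary version, which is the first two parts of the Scala version (for example, 2.12). That is why the Scala version chosen here must match the Spark build. The helper below is a minimal sketch; the object and method names are hypothetical. It derives the binary version and the matching artifact name the way sbt's %% operator appends the suffix. Its main method prints the Scala library version the program actually runs with.

```scala
object ScalaVersionCheck {
  // Scala binary version: "2.12.18" -> "2.12"; Scala 3 uses only the major version ("3.3.1" -> "3")
  def binaryVersion(fullVersion: String): String = {
    val parts = fullVersion.split('.')
    if (parts.head == "2") parts.take(2).mkString(".") else parts.head
  }

  // Artifact name with the Scala suffix, e.g. ("spark-core", "2.12.18") -> "spark-core_2.12"
  def sparkArtifact(module: String, scalaVersion: String): String =
    s"${module}_${binaryVersion(scalaVersion)}"

  def main(args: Array[String]): Unit = {
    // Version of the Scala standard library on the runtime classpath
    val running = scala.util.Properties.versionNumberString
    println(s"Scala library version: $running")
    println(s"Matching artifact: ${sparkArtifact("spark-core", running)}")
  }
}
```

If the printed suffix differs from the one in the Spark dependency, the project's Scala version and the Spark artifact do not match.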

Option 2: Maven project

Create a new Maven project and fill in the GroupId and ArtifactId.
Delete the default src directory and create submodules to hold the code.

III. Configure Dependencies

sbt project

Add the following to build.sbt:

libraryDependencies += "org.apache.spark" %% "spark-core" % "3.5.5"
// Add other modules, such as spark-sql and spark-streaming, as needed
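Putting the pieces together, a minimal build.sbt could look like the sketch below. The project name, the version string and the Scala patch version (2.12.18) are assumptions. Pick the 2.12.x (or 2.13.x) release that matches the Spark build you target.

```scala
// build.sbt: minimal sketch; name, version and Scala patch version are assumptions
name := "spark-wordcount"
version := "0.1.0"
scalaVersion := "2.12.18"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.5.5",
  // Optional module, needed only if the program uses Spark SQL
  "org.apache.spark" %% "spark-sql" % "3.5.5"
)
```

When packaging a jar for spark-submit on a cluster, these dependencies are commonly marked % "provided" so they are not bundled. For running inside IDEA, keep them on the compile classpath as shown.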

Maven project

Add the following to pom.xml, inside the <dependencies> element:

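<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.12</artifactId>
    <version>3.5.5</version>
</dependency>

The _2.12 suffix must match the project's Scala version. To compile Scala sources with Maven itself, the pom also needs a Scala compiler plugin such as scala-maven-plugin.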
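After IDEA imports the sbt or Maven project, you can confirm that spark-core actually resolved by loading one of its classes from the classpath. The sketch below uses hypothetical object and method names. It relies only on the JDK's Class.forName, so it compiles even when Spark is missing and simply reports false.

```scala
import scala.util.Try

object ClasspathCheck {
  // True if the named class can be loaded (without initializing it) from the current classpath
  def isOnClasspath(className: String): Boolean =
    Try(Class.forName(className, false, getClass.getClassLoader)).isSuccess

  def main(args: Array[String]): Unit = {
    // org.apache.spark.SparkContext ships in spark-core
    val ok = isOnClasspath("org.apache.spark.SparkContext")
    println(if (ok) "spark-core is on the classpath" else "spark-core not found: reload the sbt/Maven project")
  }
}
```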

IV. Write the Spark Program

1. Create a Scala object

Create a new Scala file under src/main/scala.

For example:

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // Local mode for running inside IDEA ("local[*]" uses all cores);
    // remove setMaster when submitting to a cluster, where spark-submit sets the master
    val conf = new SparkConf().setAppName("WordCount").setMaster("local[*]")
    val sc = new SparkContext(conf)
    sc.setLogLevel("ERROR") // reduce log output

    // HDFS path, or a local file path
    val textFile = sc.textFile("hdfs://path/to/input.txt")
    // Split lines into words, pair each word with 1, then sum the counts per word
    val wordCounts = textFile.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)

    wordCounts.collect().foreach(println)
    sc.stop()
  }
}
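Before pointing the job at real data, you can check the counting logic by mirroring the same flatMap / map / reduceByKey pipeline on ordinary Scala collections. Here groupBy plus a sum stands in for reduceByKey. This is only a sketch for unit-testing the transformation, and the object name is hypothetical. It does not use Spark and runs on a single machine.

```scala
object LocalWordCount {
  // Same steps as the Spark job: split lines into words, pair each with 1, sum per word
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))
      .map((_, 1))
      .groupBy(_._1) // groups pairs by word, standing in for reduceByKey's grouping by key
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

  def main(args: Array[String]): Unit =
    wordCount(Seq("hello spark", "hello idea")).foreach(println)
}
```

Because both versions split on a single space, consecutive spaces produce empty-string "words" in each. Changing the split pattern in one place and not the other makes their results diverge.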