Reading and Writing MongoDB with Spark SQL (Scala)
1.1 Add dependencies
Add the following dependencies to pom.xml:
<!-- Spark connector for MongoDB -->
<dependency>
    <groupId>org.mongodb.spark</groupId>
    <artifactId>mongo-spark-connector_2.11</artifactId>
    <version>2.3.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.3.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.3.1</version>
</dependency>
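Note that the _2.11 artifact suffix must match the Scala version your project is built with, and the connector version should track your Spark version. For projects built with sbt instead of Maven, a rough equivalent of the same three dependencies would be (a sketch, assuming an sbt build):

// build.sbt -- %% appends the Scala binary version (_2.11) automatically
libraryDependencies ++= Seq(
  "org.mongodb.spark" %% "mongo-spark-connector" % "2.3.1",
  "org.apache.spark"  %% "spark-core"            % "2.3.1",
  "org.apache.spark"  %% "spark-sql"             % "2.3.1"
)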
1.2 Reading MongoDB data
1.2.1 Write the code
package com.mongodb.spark

import org.apache.spark.sql.SparkSession

object ReadMongo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local")
      .appName("MyApp")
      // spark.mongodb.input.uri points at the user collection in the test database
      .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/test.user")
      .getOrCreate()
    // Set the log level
    spark.sparkContext.setLogLevel("WARN")

    // MongoSpark resolves without an import because this file sits in
    // the connector's own package (com.mongodb.spark)
    val df = MongoSpark.load(spark)
    df.show()

    // Register the DataFrame as a temp view and query it with Spark SQL
    df.createOrReplaceTempView("user")
    val resDf = spark.sql("select name, age, sex from user")
    resDf.show()

    spark.stop()
    System.exit(0)
  }
}
Output: (screenshot in the original post)
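MongoSpark.load(spark) reads from the collection named in spark.mongodb.input.uri. To read a different collection without rebuilding the session, the 2.x connector accepts a per-read ReadConfig. A minimal sketch; the orders collection name here is made up for illustration:

import com.mongodb.spark.MongoSpark
import com.mongodb.spark.config.ReadConfig

// Override the collection (and optionally the read preference) for this read;
// any setting not given here falls back to the SparkSession configuration
val readConfig = ReadConfig(
  Map("collection" -> "orders", "readPreference.name" -> "secondaryPreferred"),
  Some(ReadConfig(spark.sparkContext))
)
val ordersDf = MongoSpark.load(spark, readConfig)
ordersDf.show()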
1.3 Reading MongoDB data with a schema
Write the code:
package com.mongodb.spark

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object ReadMongoSchema {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local")
      .appName("MyApp")
      .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/test.user")
      .getOrCreate()
    // Set the log level
    spark.sparkContext.setLogLevel("WARN")

    // Declare only the fields we need; everything else is dropped on load
    val schema = StructType(
      List(
        StructField("name", StringType),
        StructField("age", IntegerType),
        StructField("sex", StringType)
      )
    )

    // Apply the schema so the load returns exactly these columns
    val df = spark.read.format("com.mongodb.spark.sql").schema(schema).load()
    df.show()

    df.createOrReplaceTempView("user")
    val resDf = spark.sql("select * from user")
    resDf.show()

    spark.stop()
    System.exit(0)
  }
}
Output: (screenshot in the original post)
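Instead of building a StructType by hand, the schema can also be derived from a case class via MongoSpark.load[T]. A minimal sketch, assuming a hypothetical User case class that mirrors the collection's fields (not part of the original post):

import com.mongodb.spark.MongoSpark

// Hypothetical case class mirroring the user collection's fields
case class User(name: String, age: Int, sex: String)

// The connector infers the schema from the case class fields,
// equivalent to the StructType declared above
val df = MongoSpark.load[User](spark)
df.show()

// With the session's implicits in scope, the same data can be
// handled as a typed Dataset
import spark.implicits._
val ds = df.as[User]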
1.4 Writing data to MongoDB
Write the code:
package com.mongodb.spark

import org.apache.spark.sql.SparkSession
import org.bson.Document

object WriteMongo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local")
      .appName("MyApp")
      // Note: writes use spark.mongodb.output.uri rather than input.uri
      .config("spark.mongodb.output.uri", "mongodb://127.0.0.1/test.user")
      .getOrCreate()
    // Set the log level
    spark.sparkContext.setLogLevel("WARN")

    // Build the BSON documents to insert
    val document1 = new Document()
    document1.append("name", "sunshangxiang").append("age", 18).append("sex", "female")
    val document2 = new Document()
    document2.append("name", "diaochan").append("age", 24).append("sex", "female")
    val document3 = new Document()
    document3.append("name", "huangyueying").append("age", 23).append("sex", "female")

    // parallelize produces an RDD[Document], not a DataFrame
    val rdd = spark.sparkContext.parallelize(Seq(document1, document2, document3))

    // Write the RDD to MongoDB
    MongoSpark.save(rdd)

    spark.stop()
    System.exit(0)
  }
}
Output: (screenshot in the original post)
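The connector can also persist a DataFrame directly through the DataFrameWriter, which picks up spark.mongodb.output.uri from the session, so there is no need to build Document objects by hand. A minimal sketch with made-up rows:

import spark.implicits._

// Build a small DataFrame and append it to the configured collection
val usersDf = Seq(
  ("zhaoyun", 25, "male"),
  ("machao", 26, "male")
).toDF("name", "age", "sex")

usersDf.write
  .format("com.mongodb.spark.sql")
  .mode("append")
  .save()

mode("append") adds rows to the existing collection; the connector also supports other standard save modes such as "overwrite".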
For details (including the output screenshots), see the author's original blog; the project code is on GitHub.
---------------------
[Reposted for sharing only; will be removed upon request]
Author: 张行之
Original: https://blog.csdn.net/qq_33689414/article/details/83421766
Copyright notice: this is the author's original article; please include a link to the original post when reposting.