[root@node01 ~]# spark-shell --master spark://node01:7077Spark context available as 'sc' (master = spark://node01:7077, app id = app-20190619142652-0000). Spark session available as 'spark'. |
// 将spark/examples/src/main/resource下有各种资源,存放到hdfs上。 scala> val rdd1 = sc.textFile("/spark_res/people.txt").map(_.split(", ")) scala> case class People(val name:String,val age:Int) scala> val rdd2 = rdd1.map(x => People(x(0),x(1).toInt)) scala> rdd2.toDF 这里报错:已存在元数据,但对本案例无影响 res3: org.apache.spark.sql.DataFrame = [name: string, age: int] scala> res3.printSchema 查看数据结构 scala> res3.show 查看数据内容 |
scala> val df = spark.read.text("/spark_res/people.txt")df: org.apache.spark.sql.DataFrame = [value: string] scala> val rdd1 = spark.read.json("/spark_res/people.json") rdd1: org.apache.spark.sql.DataFrame = [age: bigint, name: string] scala> val rdd1 = spark.read.parquet("/spark_res/users.parquet") rdd1: org.apache.spark.sql.DataFrame = [name: string, favorite_color: string ... 1 more field] |
1.启动spark-shell指定驱动包 spark-shell \ --master spark://node01:7077 \ --executor-memory 1g \ --total-executor-cores 2 \ --jars /export/servers/hive-1.1.0-cdh5.14.0/lib/mysql-connector-java-5.1.38.jar \--driver-class-path /export/servers/hive-1.1.0-cdh5.14.0/lib/mysql-connector-java-5.1.38.jar 2.从mysql中加载数据 val mysqlDF = spark.read.format("jdbc").options( Map("url" -> "jdbc:mysql://192.168.52.105:3306/iplocation", "driver" -> "com.mysql.jdbc.Driver","dbtable" -> "iplocation", "user" -> "root", "password" -> "123456")).load() scala> mysqlDF.show +----------+---------+-----------+ | longitude| latitude|total_count| +----------+---------+-----------+ |108.948024|34.263161| 1824| |116.405285|39.904989| 1535| | 107.7601| 29.32548| 85| +----------+---------+-----------+ |
scala> df1.select("name").show scala> df1.select($"age",$"age"+1).show scala> df1.filter("age is not null").select($"age",$"age"+1).show scala> df1.filter(col("age")>21).show |
scala> val df = rdd2.toDF scala> val df = rdd2.toDF scala> spark.sql("select * from t_people").show scala> spark.sql("select * from t_people where age>20").show |
欢迎光临 黑马程序员技术交流社区 (http://bbs.itheima.com/) | 黑马程序员IT技术论坛 X3.2 |