尚硅谷大数据hadoop教程-笔记01入门(代码片段)

延锋L     2023-03-23


Video address: 尚硅谷大数据Hadoop教程(Hadoop 3.x安装搭建到集群调优)

  1. 尚硅谷大数据Hadoop教程-笔记01【入门】
  2. 尚硅谷大数据Hadoop教程-笔记02【HDFS】
  3. 尚硅谷大数据Hadoop教程-笔记03【MapReduce】
  4. 尚硅谷大数据Hadoop教程-笔记04【Yarn】
  5. 尚硅谷大数据Hadoop教程-笔记05【生产调优手册】
  6. 尚硅谷大数据Hadoop教程-笔记06【源码解析】

Contents

00_尚硅谷大数据Hadoop课程整体介绍

P001【001_尚硅谷_Hadoop_开篇_课程整体介绍】08:38

01_尚硅谷大数据技术之大数据概论

P002【002_尚硅谷_Hadoop_概论_大数据的概念】04:34

P003【003_尚硅谷_Hadoop_概论_大数据的特点】07:23

P004【004_尚硅谷_Hadoop_概论_大数据的应用场景】09:58

P005【005_尚硅谷_Hadoop_概论_大数据的发展场景】08:17

P006【006_尚硅谷_Hadoop_概论_未来工作内容】06:25

02_尚硅谷大数据技术之Hadoop(入门)V3.3

P007【007_尚硅谷_Hadoop_入门_课程介绍】07:29

P008【008_尚硅谷_Hadoop_入门_Hadoop是什么】03:00

P009【009_尚硅谷_Hadoop_入门_Hadoop发展历史】05:52

P010【010_尚硅谷_Hadoop_入门_Hadoop三大发行版本】05:59

P011【011_尚硅谷_Hadoop_入门_Hadoop优势】03:52

P012【012_尚硅谷_Hadoop_入门_Hadoop1.x2.x3.x区别】03:00

P013【013_尚硅谷_Hadoop_入门_HDFS概述】06:26

P014【014_尚硅谷_Hadoop_入门_YARN概述】06:35

P015【015_尚硅谷_Hadoop_入门_MapReduce概述】01:55

P016【016_尚硅谷_Hadoop_入门_HDFS&YARN&MR关系】03:22

P017【017_尚硅谷_Hadoop_入门_大数据技术生态体系】09:17

P018【018_尚硅谷_Hadoop_入门_VMware安装】04:41

P019【019_尚硅谷_Hadoop_入门_Centos7.5软硬件安装】15:56

P020【020_尚硅谷_Hadoop_入门_IP和主机名称配置】10:50

P021【021_尚硅谷_Hadoop_入门_Xshell远程访问工具】09:05

P022【022_尚硅谷_Hadoop_入门_模板虚拟机准备完成】12:25

P023【023_尚硅谷_Hadoop_入门_克隆三台虚拟机】15:01

P024【024_尚硅谷_Hadoop_入门_JDK安装】07:02

P025【025_尚硅谷_Hadoop_入门_Hadoop安装】07:20

P026【026_尚硅谷_Hadoop_入门_本地运行模式】11:56

P027【027_尚硅谷_Hadoop_入门_scp&rsync命令讲解】15:01

P028【028_尚硅谷_Hadoop_入门_xsync分发脚本】18:14

P029【029_尚硅谷_Hadoop_入门_ssh免密登录】11:25

P030【030_尚硅谷_Hadoop_入门_集群配置】13:24

P031【031_尚硅谷_Hadoop_入门_群起集群并测试】16:52

P032【032_尚硅谷_Hadoop_入门_集群崩溃处理办法】08:10

P033【033_尚硅谷_Hadoop_入门_历史服务器配置】05:26

P034【034_尚硅谷_Hadoop_入门_日志聚集功能配置】05:42

P035【035_尚硅谷_Hadoop_入门_两个常用脚本】09:18

P036【036_尚硅谷_Hadoop_入门_两道面试题】04:15

P037【037_尚硅谷_Hadoop_入门_集群时间同步】11:27

P038【038_尚硅谷_Hadoop_入门_常见问题总结】10:57


00_尚硅谷大数据Hadoop课程整体介绍

P001【001_尚硅谷_Hadoop_开篇_课程整体介绍】08:38

Hadoop 3.x: From Beginner to Expert

I. Key upgrades in this edition
    1. YARN
    2. Production tuning guide
    3. Source code
II. Course features
    1. New: Hadoop 3.1.3
    2. Detailed: starting from cluster setup, every configuration item and every line of code is annotated, at publishable-book quality
    3. Practical: 20+ enterprise cases, 30+ enterprise tuning scenarios, source-code reading drawn from millions of lines of code
    4. Complete: a full set of course materials
III. How to get the materials
    1. Follow the 尚硅谷教育 WeChat official account and reply "大数据"
    2. 谷粒学院
    3. Bilibili
IV. Prerequisites
    Java SE, Maven + IDEA + common Linux commands

01_尚硅谷大数据技术之大数据概论

P002【002_尚硅谷_Hadoop_概论_大数据的概念】04:34

Chapter 1, The concept of big data: Big Data refers to data sets that cannot be captured, managed, and processed within a reasonable time frame using conventional software tools. They are massive, fast-growing, and diverse information assets that require new processing models to deliver stronger decision-making power, insight discovery, and process optimization.

Big data primarily addresses the collection, storage, and analysis/computation of massive data.

P003【003_尚硅谷_Hadoop_概论_大数据的特点】07:23

Chapter 2, Characteristics of big data (the 4 Vs)

  1. Volume
  2. Velocity
  3. Variety
  4. Value (low value density)

P004【004_尚硅谷_Hadoop_概论_大数据的应用场景】09:58

Chapter 3, Big data application scenarios

  1. Douyin: every recommended video is one you are likely to enjoy.
  2. On-site e-commerce advertising: recommend products users may like.
  3. Retail: analyze consumption habits to make shopping more convenient and lift sales.
  4. Logistics and warehousing: JD Logistics delivers same-afternoon for morning orders and next-morning for afternoon orders.
  5. Insurance: mining massive data for risk prediction enables precision marketing and finer-grained pricing.
  6. Finance: multi-dimensional user profiles help financial institutions identify quality customers and guard against fraud.
  7. Real estate: big data drives precise investment and marketing decisions, helping pick better land, build more suitable buildings, and sell them to the right buyers.
  8. AI + 5G + IoT + virtual and augmented reality.

P005【005_尚硅谷_Hadoop_概论_大数据的发展场景】08:17

Chapter 4, The development prospects of big data: in short, promising!

P006【006_尚硅谷_Hadoop_概论_未来工作内容】06:25

Chapter 5, Business workflow between big data departments and other teams

Chapter 6, Organizational structure inside a big data department

02_尚硅谷大数据技术之Hadoop(入门)V3.3

P007【007_尚硅谷_Hadoop_入门_课程介绍】07:29

P008【008_尚硅谷_Hadoop_入门_Hadoop是什么】03:00

P009【009_尚硅谷_Hadoop_入门_Hadoop发展历史】05:52

P010【010_尚硅谷_Hadoop_入门_Hadoop三大发行版本】05:59

The three major Hadoop distributions: Apache, Cloudera, Hortonworks.

1. Apache Hadoop

Official site: http://hadoop.apache.org

Download: https://hadoop.apache.org/releases.html

2. Cloudera Hadoop

Official site: https://www.cloudera.com/downloads/cdh

Download: https://docs.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_6_download.html

(1) Founded in 2008, Cloudera was the first company to commercialize Hadoop, providing partners with commercial Hadoop solutions, mainly support, consulting, and training.

(2) In 2009, Hadoop founder Doug Cutting joined Cloudera. Cloudera's main products are CDH, Cloudera Manager, and Cloudera Support.

(3) CDH is Cloudera's Hadoop distribution. It is fully open source, with improved compatibility, security, and stability over Apache Hadoop. Cloudera's list price is USD 10,000 per node per year.

(4) Cloudera Manager is a platform for software distribution, management, and monitoring. It can deploy a Hadoop cluster within a few hours and monitor the cluster's nodes and services in real time.

3. Hortonworks Hadoop

Official site: https://hortonworks.com/products/data-center/hdp/

Download: https://hortonworks.com/downloads/#data-platform

(1) Founded in 2011, Hortonworks was a joint venture between Yahoo and the Silicon Valley venture firm Benchmark Capital.

(2) At its founding the company absorbed roughly 25 to 30 Yahoo engineers dedicated to Hadoop. These engineers had been helping Yahoo develop Hadoop since 2005 and contributed about 80% of its code.

(3) Hortonworks' flagship product is the Hortonworks Data Platform (HDP), likewise 100% open source. Besides the usual projects, HDP includes Ambari, an open-source installation and management system.

(4) In 2018, Hortonworks was acquired by Cloudera.

P011【011_尚硅谷_Hadoop_入门_Hadoop优势】03:52

Hadoop's advantages (the 4 Highs)

  1. High reliability
  2. High scalability
  3. High efficiency
  4. High fault tolerance

P012【012_尚硅谷_Hadoop_入门_Hadoop1.x2.x3.x区别】03:00

P013【013_尚硅谷_Hadoop_入门_HDFS概述】06:26

The Hadoop Distributed File System (HDFS) is a distributed file system.

  • 1) NameNode (nn): stores file metadata, such as file names, directory structure, and file attributes (creation time, replica count, permissions), plus each file's block list and the DataNodes on which those blocks reside.
  • 2) DataNode (dn): stores block data on the local file system, along with block checksums.
  • 3) Secondary NameNode (2nn): periodically backs up the NameNode's metadata.

P014【014_尚硅谷_Hadoop_入门_YARN概述】06:35

YARN (Yet Another Resource Negotiator) is Hadoop's resource manager.

P015【015_尚硅谷_Hadoop_入门_MapReduce概述】01:55

MapReduce splits a computation into two phases: Map and Reduce.

  • 1) The Map phase processes the input data in parallel.
  • 2) The Reduce phase aggregates the Map outputs.
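The two phases above can be mimicked with a plain shell pipeline (a toy analogy, not Hadoop itself): `tr` splits lines into words (Map), `sort` brings identical keys together (Shuffle), and `uniq -c` counts each group (Reduce).

```shell
# Toy word count illustrating the MapReduce phases with coreutils.
# "Map": split each line into one word per line.
# "Shuffle": sort brings identical words together.
# "Reduce": uniq -c counts each group.
printf 'hadoop yarn\nhadoop mapreduce\n' \
  | tr ' ' '\n' \
  | sort \
  | uniq -c \
  | sort -rn
# Prints each word with its count; "hadoop" appears twice.
```

The real framework does the same grouping-by-key, only distributed across many machines.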

P016【016_尚硅谷_Hadoop_入门_HDFS&YARN&MR关系】03:22

  1. HDFS
    1. NameNode: manages the file system metadata and records which nodes hold each data block.
    2. DataNode: actually stores the data blocks on its own node.
    3. SecondaryNameNode: the NameNode's assistant; it backs up NameNode metadata and can help recover part of the NameNode's work.
  2. YARN: resource management for the entire cluster.
    1. ResourceManager: manages the resources of the whole cluster.
    2. NodeManager: manages the resources of a single node.
  3. MapReduce: the distributed computation framework.

P017【017_尚硅谷_Hadoop_入门_大数据技术生态体系】09:17

Big data technology ecosystem

Recommendation-system project architecture

P018【018_尚硅谷_Hadoop_入门_VMware安装】04:41

 

P019【019_尚硅谷_Hadoop_入门_Centos7.5软硬件安装】15:56

P020【020_尚硅谷_Hadoop_入门_IP和主机名称配置】10:50

[root@hadoop100 ~]# vim /etc/sysconfig/network-scripts/ifcfg-ens33
[root@hadoop100 ~]# ifconfig
ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.88.133  netmask 255.255.255.0  broadcast 192.168.88.255
        inet6 fe80::363b:8659:c323:345d  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:0f:0a:6d  txqueuelen 1000  (Ethernet)
        RX packets 684561  bytes 1003221355 (956.7 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 53538  bytes 3445292 (3.2 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 84  bytes 9492 (9.2 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 84  bytes 9492 (9.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

virbr0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 192.168.122.1  netmask 255.255.255.0  broadcast 192.168.122.255
        ether 52:54:00:1c:3c:a9  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@hadoop100 ~]# systemctl restart network
[root@hadoop100 ~]# cat /etc/host
cat: /etc/host: 没有那个文件或目录
[root@hadoop100 ~]# cat /etc/hostname
hadoop100
[root@hadoop100 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
[root@hadoop100 ~]# vim /etc/hosts
[root@hadoop100 ~]# ifconfig
ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.88.100  netmask 255.255.255.0  broadcast 192.168.88.255
        inet6 fe80::363b:8659:c323:345d  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:0f:0a:6d  txqueuelen 1000  (Ethernet)
        RX packets 684830  bytes 1003244575 (956.7 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 53597  bytes 3452600 (3.2 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 132  bytes 14436 (14.0 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 132  bytes 14436 (14.0 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

virbr0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 192.168.122.1  netmask 255.255.255.0  broadcast 192.168.122.255
        ether 52:54:00:1c:3c:a9  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@hadoop100 ~]# ll
总用量 40
-rw-------. 1 root root 1973 3月  14 10:19 anaconda-ks.cfg
-rw-r--r--. 1 root root 2021 3月  14 10:26 initial-setup-ks.cfg
drwxr-xr-x. 2 root root 4096 3月  14 10:27 公共
drwxr-xr-x. 2 root root 4096 3月  14 10:27 模板
drwxr-xr-x. 2 root root 4096 3月  14 10:27 视频
drwxr-xr-x. 2 root root 4096 3月  14 10:27 图片
drwxr-xr-x. 2 root root 4096 3月  14 10:27 文档
drwxr-xr-x. 2 root root 4096 3月  14 10:27 下载
drwxr-xr-x. 2 root root 4096 3月  14 10:27 音乐
drwxr-xr-x. 2 root root 4096 3月  14 10:27 桌面
[root@hadoop100 ~]# 

vim /etc/sysconfig/network-scripts/ifcfg-ens33

TYPE="Ethernet"
PROXY_METHOD="none"
BROWSER_ONLY="no"
BOOTPROTO="static"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="ens33"
UUID="3241b48d-3234-4c23-8a03-b9b393a99a65"
DEVICE="ens33"
ONBOOT="yes"

IPADDR=192.168.88.100
GATEWAY=192.168.88.2
DNS1=192.168.88.2

vim /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.88.100 hadoop100
192.168.88.101 hadoop101
192.168.88.102 hadoop102
192.168.88.103 hadoop103
192.168.88.104 hadoop104
192.168.88.105 hadoop105
192.168.88.106 hadoop106
192.168.88.107 hadoop107
192.168.88.108 hadoop108

192.168.88.151 node1 node1.itcast.cn
192.168.88.152 node2 node2.itcast.cn
192.168.88.153 node3 node3.itcast.cn

P021【021_尚硅谷_Hadoop_入门_Xshell远程访问工具】09:05

P022【022_尚硅谷_Hadoop_入门_模板虚拟机准备完成】12:25

yum install -y epel-release

systemctl stop firewalld

systemctl disable firewalld.service

P023【023_尚硅谷_Hadoop_入门_克隆三台虚拟机】15:01

vim /etc/sysconfig/network-scripts/ifcfg-ens33

vim /etc/hostname

reboot

P024【024_尚硅谷_Hadoop_入门_JDK安装】07:02

Install the JDK on hadoop102, then copy it to hadoop103 and hadoop104.
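After installing, the course distributes an environment file `/etc/profile.d/my_env.sh` to every node. A minimal sketch of what such a file typically contains (the install paths under `/opt/module/` follow the course's conventions and are assumptions here; adjust them to your actual install directories):

```shell
# /etc/profile.d/my_env.sh -- sketch; paths assume the course's /opt/module layout
#JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_212
export PATH=$PATH:$JAVA_HOME/bin

#HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
```

Run `source /etc/profile` (or log in again) for the new variables to take effect.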

P025【025_尚硅谷_Hadoop_入门_Hadoop安装】07:20

Same diagram as P024!

P026【026_尚硅谷_Hadoop_入门_本地运行模式】11:56

Apache Hadoop

http://node1:9870/explorer.html#/

[root@node1 ~]# cd /export/server/hadoop-3.3.0/share/hadoop/mapreduce/
[root@node1 mapreduce]# hadoop jar hadoop-mapreduce-examples-3.3.0.jar wordcount /wordcount/input /wordcount/output
2023-03-20 14:43:07,516 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at node1/192.168.88.151:8032
2023-03-20 14:43:09,291 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1679293699463_0001
2023-03-20 14:43:11,916 INFO input.FileInputFormat: Total input files to process : 1
2023-03-20 14:43:12,313 INFO mapreduce.JobSubmitter: number of splits:1
2023-03-20 14:43:13,173 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1679293699463_0001
2023-03-20 14:43:13,173 INFO mapreduce.JobSubmitter: Executing with tokens: []
2023-03-20 14:43:14,684 INFO conf.Configuration: resource-types.xml not found
2023-03-20 14:43:14,684 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2023-03-20 14:43:17,054 INFO impl.YarnClientImpl: Submitted application application_1679293699463_0001
2023-03-20 14:43:17,123 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1679293699463_0001/
2023-03-20 14:43:17,124 INFO mapreduce.Job: Running job: job_1679293699463_0001
2023-03-20 14:43:52,340 INFO mapreduce.Job: Job job_1679293699463_0001 running in uber mode : false
2023-03-20 14:43:52,360 INFO mapreduce.Job:  map 0% reduce 0%
2023-03-20 14:44:08,011 INFO mapreduce.Job:  map 100% reduce 0%
2023-03-20 14:44:16,986 INFO mapreduce.Job:  map 100% reduce 100%
2023-03-20 14:44:18,020 INFO mapreduce.Job: Job job_1679293699463_0001 completed successfully
2023-03-20 14:44:18,579 INFO mapreduce.Job: Counters: 54
        File System Counters
                FILE: Number of bytes read=31
                FILE: Number of bytes written=529345
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=142
                HDFS: Number of bytes written=17
                HDFS: Number of read operations=8
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
                HDFS: Number of bytes read erasure-coded=0
        Job Counters 
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=11303
                Total time spent by all reduces in occupied slots (ms)=6220
                Total time spent by all map tasks (ms)=11303
                Total time spent by all reduce tasks (ms)=6220
                Total vcore-milliseconds taken by all map tasks=11303
                Total vcore-milliseconds taken by all reduce tasks=6220
                Total megabyte-milliseconds taken by all map tasks=11574272
                Total megabyte-milliseconds taken by all reduce tasks=6369280
        Map-Reduce Framework
                Map input records=2
                Map output records=5
                Map output bytes=53
                Map output materialized bytes=31
                Input split bytes=108
                Combine input records=5
                Combine output records=2
                Reduce input groups=2
                Reduce shuffle bytes=31
                Reduce input records=2
                Reduce output records=2
                Spilled Records=4
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=546
                CPU time spent (ms)=3680
                Physical memory (bytes) snapshot=499236864
                Virtual memory (bytes) snapshot=5568684032
                Total committed heap usage (bytes)=365953024
                Peak Map Physical memory (bytes)=301096960
                Peak Map Virtual memory (bytes)=2779201536
                Peak Reduce Physical memory (bytes)=198139904
                Peak Reduce Virtual memory (bytes)=2789482496
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=34
        File Output Format Counters 
                Bytes Written=17
[root@node1 mapreduce]#

[root@node1 mapreduce]# hadoop jar hadoop-mapreduce-examples-3.3.0.jar wordcount /wc_input /wc_output
2023-03-20 15:01:48,007 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at node1/192.168.88.151:8032
2023-03-20 15:01:49,475 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1679293699463_0002
2023-03-20 15:01:50,522 INFO input.FileInputFormat: Total input files to process : 1
2023-03-20 15:01:51,010 INFO mapreduce.JobSubmitter: number of splits:1
2023-03-20 15:01:51,894 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1679293699463_0002
2023-03-20 15:01:51,894 INFO mapreduce.JobSubmitter: Executing with tokens: []
2023-03-20 15:01:52,684 INFO conf.Configuration: resource-types.xml not found
2023-03-20 15:01:52,687 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2023-03-20 15:01:53,237 INFO impl.YarnClientImpl: Submitted application application_1679293699463_0002
2023-03-20 15:01:53,487 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1679293699463_0002/
2023-03-20 15:01:53,492 INFO mapreduce.Job: Running job: job_1679293699463_0002
2023-03-20 15:02:15,329 INFO mapreduce.Job: Job job_1679293699463_0002 running in uber mode : false
2023-03-20 15:02:15,342 INFO mapreduce.Job:  map 0% reduce 0%
2023-03-20 15:02:26,652 INFO mapreduce.Job:  map 100% reduce 0%
2023-03-20 15:02:40,297 INFO mapreduce.Job:  map 100% reduce 100%
2023-03-20 15:02:41,350 INFO mapreduce.Job: Job job_1679293699463_0002 completed successfully
2023-03-20 15:02:41,557 INFO mapreduce.Job: Counters: 54
        File System Counters
                FILE: Number of bytes read=60
                FILE: Number of bytes written=529375
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=149
                HDFS: Number of bytes written=38
                HDFS: Number of read operations=8
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
                HDFS: Number of bytes read erasure-coded=0
        Job Counters 
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=8398
                Total time spent by all reduces in occupied slots (ms)=9720
                Total time spent by all map tasks (ms)=8398
                Total time spent by all reduce tasks (ms)=9720
                Total vcore-milliseconds taken by all map tasks=8398
                Total vcore-milliseconds taken by all reduce tasks=9720
                Total megabyte-milliseconds taken by all map tasks=8599552
                Total megabyte-milliseconds taken by all reduce tasks=9953280
        Map-Reduce Framework
                Map input records=4
                Map output records=6
                Map output bytes=69
                Map output materialized bytes=60
                Input split bytes=100
                Combine input records=6
                Combine output records=4
                Reduce input groups=4
                Reduce shuffle bytes=60
                Reduce input records=4
                Reduce output records=4
                Spilled Records=8
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=1000
                CPU time spent (ms)=3880
                Physical memory (bytes) snapshot=503771136
                Virtual memory (bytes) snapshot=5568987136
                Total committed heap usage (bytes)=428343296
                Peak Map Physical memory (bytes)=303013888
                Peak Map Virtual memory (bytes)=2782048256
                Peak Reduce Physical memory (bytes)=200757248
                Peak Reduce Virtual memory (bytes)=2786938880
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=49
        File Output Format Counters 
                Bytes Written=38
[root@node1 mapreduce]# pwd
/export/server/hadoop-3.3.0/share/hadoop/mapreduce
[root@node1 mapreduce]# 

P027【027_尚硅谷_Hadoop_入门_scp&rsync命令讲解】15:01

Use scp for the first full copy, then rsync for subsequent syncs.

rsync is mainly used for backup and mirroring. It is fast, avoids re-copying identical content, and supports symbolic links.

Difference between rsync and scp: rsync only updates the files that differ, so it copies faster than scp, which copies everything every time.

P028【028_尚硅谷_Hadoop_入门_xsync分发脚本】18:14

Copy and sync commands

  1. scp (secure copy)
  2. rsync: remote synchronization tool
  3. xsync: cluster distribution script

The dirname command strips the final, non-directory component from a path and prints only the directory portion.

[root@node1 ~]# dirname /home/atguigu/a.txt
/home/atguigu
[root@node1 ~]#

The basename command prints the file-name component of a path.

[root@node1 atguigu]# basename /home/atguigu/a.txt
a.txt
[root@node1 atguigu]#
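The xsync script combines the two commands: `cd -P` resolves symlinks so the parent directory is canonical, then basename recovers the bare file name. A self-contained sketch (the path `/tmp/xsync_demo/data/a.txt` is a made-up example):

```shell
# How xsync derives the parent dir and file name for each argument.
file=/tmp/xsync_demo/data/a.txt            # hypothetical example path
mkdir -p "$(dirname "$file")" && touch "$file"

pdir=$(cd -P "$(dirname "$file")"; pwd)    # canonical parent directory
fname=$(basename "$file")                  # bare file name

echo "$pdir/$fname"
```

`pdir` and `fname` are exactly the values the script later feeds to `ssh "mkdir -p $pdir"` and `rsync -av $pdir/$fname $host:$pdir`.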

#!/bin/bash

#1. Check the argument count
if [ $# -lt 1 ]
then
    echo "Not Enough Arguments!"
    exit
fi

#2. Loop over every machine in the cluster
for host in hadoop102 hadoop103 hadoop104
do
    echo ====================  $host  ====================
    #3. Loop over every file and directory given, sending each in turn

    for file in "$@"
    do
        #4. Check that the file exists
        if [ -e "$file" ]
            then
                #5. Get the canonical parent directory
                pdir=$(cd -P "$(dirname "$file")"; pwd)

                #6. Get the file name
                fname=$(basename "$file")
                ssh "$host" "mkdir -p $pdir"
                rsync -av "$pdir/$fname" "$host:$pdir"
            else
                echo "$file does not exist!"
        fi
    done
done
[root@node1 bin]# chmod 777 xsync 
[root@node1 bin]# ll
总用量 4
-rwxrwxrwx 1 atguigu atguigu 727 3月  20 16:00 xsync
[root@node1 bin]# cd ..
[root@node1 atguigu]# xsync bin/
==================== node1 ====================
sending incremental file list

sent 94 bytes  received 17 bytes  222.00 bytes/sec
total size is 727  speedup is 6.55
==================== node2 ====================
sending incremental file list
bin/
bin/xsync

sent 871 bytes  received 39 bytes  606.67 bytes/sec
total size is 727  speedup is 0.80
==================== node3 ====================
sending incremental file list
bin/
bin/xsync

sent 871 bytes  received 39 bytes  1,820.00 bytes/sec
total size is 727  speedup is 0.80
[root@node1 atguigu]# pwd
/home/atguigu
[root@node1 atguigu]# ls -al
总用量 20
drwx------  6 atguigu atguigu  168 3月  20 15:56 .
drwxr-xr-x. 6 root    root      56 3月  20 10:08 ..
-rw-r--r--  1 root    root       0 3月  20 15:44 a.txt
-rw-------  1 atguigu atguigu   21 3月  20 11:48 .bash_history
-rw-r--r--  1 atguigu atguigu   18 8月   8 2019 .bash_logout
-rw-r--r--  1 atguigu atguigu  193 8月   8 2019 .bash_profile
-rw-r--r--  1 atguigu atguigu  231 8月   8 2019 .bashrc
drwxrwxr-x  2 atguigu atguigu   19 3月  20 15:56 bin
drwxrwxr-x  3 atguigu atguigu   18 3月  20 10:17 .cache
drwxrwxr-x  3 atguigu atguigu   18 3月  20 10:17 .config
drwxr-xr-x  4 atguigu atguigu   39 3月  10 20:04 .mozilla
-rw-------  1 atguigu atguigu 1261 3月  20 15:56 .viminfo
[root@node1 atguigu]# 
连接成功
Last login: Mon Mar 20 16:01:40 2023
[root@node1 ~]# su atguigu
[atguigu@node1 root]$ cd /home/atguigu/
[atguigu@node1 ~]$ pwd
/home/atguigu
[atguigu@node1 ~]$ xsync bin/
==================== node1 ====================
The authenticity of host 'node1 (192.168.88.151)' can't be established.
ECDSA key fingerprint is SHA256:+eLT3FrOEuEsxBxjOd89raPi/ChJz26WGAfqBpz/KEk.
ECDSA key fingerprint is MD5:18:42:ad:0f:2b:97:d8:b5:68:14:6a:98:e9:72:db:bb.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node1,192.168.88.151' (ECDSA) to the list of known hosts.
atguigu@node1's password: 
atguigu@node1's password: 
sending incremental file list

sent 98 bytes  received 17 bytes  17.69 bytes/sec
total size is 727  speedup is 6.32
==================== node2 ====================
The authenticity of host 'node2 (192.168.88.152)' can't be established.
ECDSA key fingerprint is SHA256:+eLT3FrOEuEsxBxjOd89raPi/ChJz26WGAfqBpz/KEk.
ECDSA key fingerprint is MD5:18:42:ad:0f:2b:97:d8:b5:68:14:6a:98:e9:72:db:bb.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node2,192.168.88.152' (ECDSA) to the list of known hosts.
atguigu@node2's password: 
atguigu@node2's password: 
sending incremental file list

sent 94 bytes  received 17 bytes  44.40 bytes/sec
total size is 727  speedup is 6.55
==================== node3 ====================
The authenticity of host 'node3 (192.168.88.153)' can't be established.
ECDSA key fingerprint is SHA256:+eLT3FrOEuEsxBxjOd89raPi/ChJz26WGAfqBpz/KEk.
ECDSA key fingerprint is MD5:18:42:ad:0f:2b:97:d8:b5:68:14:6a:98:e9:72:db:bb.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node3,192.168.88.153' (ECDSA) to the list of known hosts.
atguigu@node3's password: 
atguigu@node3's password: 
sending incremental file list

sent 94 bytes  received 17 bytes  44.40 bytes/sec
total size is 727  speedup is 6.55
[atguigu@node1 ~]$ 
----------------------------------------------------------------------------------------
连接成功
Last login: Mon Mar 20 17:22:20 2023 from 192.168.88.151
[root@node2 ~]# su atguigu
[atguigu@node2 root]$ vim /etc/sudoers
您在 /var/spool/mail/root 中有新邮件
[atguigu@node2 root]$ su root
密码:
[root@node2 ~]# vim /etc/sudoers
[root@node2 ~]# cd /opt/
[root@node2 opt]# ll
总用量 0
drwxr-xr-x  4 atguigu atguigu 46 3月  20 11:32 module
drwxr-xr-x. 2 root    root     6 10月 31 2018 rh
drwxr-xr-x  2 atguigu atguigu 67 3月  20 10:47 software
[root@node2 opt]# su atguigu
[atguigu@node2 opt]$ cd /home/atguigu/
[atguigu@node2 ~]$ llk
bash: llk: 未找到命令
[atguigu@node2 ~]$ ll
总用量 0
drwxrwxr-x 2 atguigu atguigu 19 3月  20 15:56 bin
[atguigu@node2 ~]$ cd ~
您在 /var/spool/mail/root 中有新邮件
[atguigu@node2 ~]$ ll
总用量 0
drwxrwxr-x 2 atguigu atguigu 19 3月  20 15:56 bin
[atguigu@node2 ~]$ ll
总用量 0
drwxrwxr-x 2 atguigu atguigu 19 3月  20 15:56 bin
您在 /var/spool/mail/root 中有新邮件
[atguigu@node2 ~]$ cd bin
[atguigu@node2 bin]$ ll
总用量 4
-rwxrwxrwx 1 atguigu atguigu 727 3月  20 16:00 xsync
[atguigu@node2 bin]$ 
----------------------------------------------------------------------------------------
连接成功
Last login: Mon Mar 20 17:22:26 2023 from 192.168.88.152
[root@node3 ~]# vim /etc/sudoers
您在 /var/spool/mail/root 中有新邮件
[root@node3 ~]# cd /opt/
[root@node3 opt]# ll
总用量 0
drwxr-xr-x  4 atguigu atguigu 46 3月  20 11:32 module
drwxr-xr-x. 2 root    root     6 10月 31 2018 rh
drwxr-xr-x  2 atguigu atguigu 67 3月  20 10:47 software
[root@node3 opt]# cd ~
您在 /var/spool/mail/root 中有新邮件
[root@node3 ~]# ll
总用量 4
-rw-------. 1 root root 1340 9月  11 2020 anaconda-ks.cfg
-rw-------  1 root root    0 2月  23 16:20 nohup.out
[root@node3 ~]# ll
总用量 4
-rw-------. 1 root root 1340 9月  11 2020 anaconda-ks.cfg
-rw-------  1 root root    0 2月  23 16:20 nohup.out
您在 /var/spool/mail/root 中有新邮件
[root@node3 ~]# cd ~
[root@node3 ~]# ll
总用量 4
-rw-------. 1 root root 1340 9月  11 2020 anaconda-ks.cfg
-rw-------  1 root root    0 2月  23 16:20 nohup.out
[root@node3 ~]# su atguigu
[atguigu@node3 root]$ cd ~
[atguigu@node3 ~]$ ls
bin
[atguigu@node3 ~]$ ll
总用量 0
drwxrwxr-x 2 atguigu atguigu 19 3月  20 15:56 bin
[atguigu@node3 ~]$ cd bin
[atguigu@node3 bin]$ ll
总用量 4
-rwxrwxrwx 1 atguigu atguigu 727 3月  20 16:00 xsync
[atguigu@node3 bin]$ 
----------------------------------------------------------------------------------------
[atguigu@node1 ~]$ xsync /etc/profile.d/my_env.sh
==================== node1 ====================
atguigu@node1's password: 
atguigu@node1's password: 
.sending incremental file list

sent 48 bytes  received 12 bytes  13.33 bytes/sec
total size is 223  speedup is 3.72
==================== node2 ====================
atguigu@node2's password: 
atguigu@node2's password: 
sending incremental file list
my_env.sh
rsync: mkstemp "/etc/profile.d/.my_env.sh.guTzvB" failed: Permission denied (13)

sent 95 bytes  received 126 bytes  88.40 bytes/sec
total size is 223  speedup is 1.01
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1178) [sender=3.1.2]
==================== node3 ====================
