Keywords:
Video: 尚硅谷大数据Hadoop教程 (Hadoop 3.x, from installation and cluster setup to cluster tuning)
- 尚硅谷大数据Hadoop教程-笔记01【入门】
- 尚硅谷大数据Hadoop教程-笔记02【HDFS】
- 尚硅谷大数据Hadoop教程-笔记03【MapReduce】
- 尚硅谷大数据Hadoop教程-笔记04【Yarn】
- 尚硅谷大数据Hadoop教程-笔记05【生产调优手册】
- 尚硅谷大数据Hadoop教程-笔记06【源码解析】
Contents
P001【001_尚硅谷_Hadoop_开篇_课程整体介绍】08:38
P002【002_尚硅谷_Hadoop_概论_大数据的概念】04:34
P003【003_尚硅谷_Hadoop_概论_大数据的特点】07:23
P004【004_尚硅谷_Hadoop_概论_大数据的应用场景】09:58
P005【005_尚硅谷_Hadoop_概论_大数据的发展场景】08:17
P006【006_尚硅谷_Hadoop_概论_未来工作内容】06:25
P007【007_尚硅谷_Hadoop_入门_课程介绍】07:29
P008【008_尚硅谷_Hadoop_入门_Hadoop是什么】03:00
P009【009_尚硅谷_Hadoop_入门_Hadoop发展历史】05:52
P010【010_尚硅谷_Hadoop_入门_Hadoop三大发行版本】05:59
P011【011_尚硅谷_Hadoop_入门_Hadoop优势】03:52
P012【012_尚硅谷_Hadoop_入门_Hadoop1.x2.x3.x区别】03:00
P013【013_尚硅谷_Hadoop_入门_HDFS概述】06:26
P014【014_尚硅谷_Hadoop_入门_YARN概述】06:35
P015【015_尚硅谷_Hadoop_入门_MapReduce概述】01:55
P016【016_尚硅谷_Hadoop_入门_HDFS&YARN&MR关系】03:22
P017【017_尚硅谷_Hadoop_入门_大数据技术生态体系】09:17
P018【018_尚硅谷_Hadoop_入门_VMware安装】04:41
P019【019_尚硅谷_Hadoop_入门_Centos7.5软硬件安装】15:56
P020【020_尚硅谷_Hadoop_入门_IP和主机名称配置】10:50
P021【021_尚硅谷_Hadoop_入门_Xshell远程访问工具】09:05
P022【022_尚硅谷_Hadoop_入门_模板虚拟机准备完成】12:25
P023【023_尚硅谷_Hadoop_入门_克隆三台虚拟机】15:01
P024【024_尚硅谷_Hadoop_入门_JDK安装】07:02
P025【025_尚硅谷_Hadoop_入门_Hadoop安装】07:20
P026【026_尚硅谷_Hadoop_入门_本地运行模式】11:56
P027【027_尚硅谷_Hadoop_入门_scp&rsync命令讲解】15:01
P028【028_尚硅谷_Hadoop_入门_xsync分发脚本】18:14
P029【029_尚硅谷_Hadoop_入门_ssh免密登录】11:25
P030【030_尚硅谷_Hadoop_入门_集群配置】13:24
P031【031_尚硅谷_Hadoop_入门_群起集群并测试】16:52
P032【032_尚硅谷_Hadoop_入门_集群崩溃处理办法】08:10
P033【033_尚硅谷_Hadoop_入门_历史服务器配置】05:26
P034【034_尚硅谷_Hadoop_入门_日志聚集功能配置】05:42
P035【035_尚硅谷_Hadoop_入门_两个常用脚本】09:18
P036【036_尚硅谷_Hadoop_入门_两道面试题】04:15
P037【037_尚硅谷_Hadoop_入门_集群时间同步】11:27
P038【038_尚硅谷_Hadoop_入门_常见问题总结】10:57
00_尚硅谷大数据Hadoop课程整体介绍
P001【001_尚硅谷_Hadoop_开篇_课程整体介绍】08:38
I. Key content upgraded in this course
1. YARN
2. Production tuning manual
3. Source code
II. Course highlights
1. New: Hadoop 3.1.3
2. Detailed: starts from building the cluster, with every configuration item and every line of code annotated, at the level of a published book
3. Real: 20+ enterprise cases, 30+ enterprise tuning points, and source-code reading drawn from millions of lines of code
4. Complete: a full set of course materials
III. How to get the materials
1. Follow the 尚硅谷教育 WeChat official account and reply "大数据"
2. 谷粒学院
3. Bilibili
IV. Prerequisites
Java SE, plus Maven + IDEA + common Linux commands
01_尚硅谷大数据技术之大数据概论
P002【002_尚硅谷_Hadoop_概论_大数据的概念】04:34
Chapter 1: The concept of big data. Big Data refers to data sets that cannot be captured, managed, or processed with conventional software tools within a given time frame — massive, fast-growing, and diverse information assets that require new processing models to deliver stronger decision-making, insight discovery, and process optimization.
Big data mainly addresses the collection, storage, and analysis/computation of massive data.
P003【003_尚硅谷_Hadoop_概论_大数据的特点】07:23
Chapter 2: The characteristics of big data (the 4 Vs)
- Volume
- Velocity
- Variety
- Value (low value density)
P004【004_尚硅谷_Hadoop_概论_大数据的应用场景】09:58
Chapter 3: Big data application scenarios
- Douyin: every recommended video is one you are likely to enjoy.
- On-site e-commerce advertising: recommend products a user may want to buy.
- Retail: analyze consumer habits to make purchasing more convenient and lift sales.
- Logistics and warehousing: JD Logistics — order in the morning, delivered that afternoon; order in the afternoon, delivered the next morning.
- Insurance: mining massive data for risk prediction enables precision marketing and fine-grained pricing.
- Finance: multi-dimensional user profiles help financial institutions identify quality customers and guard against fraud.
- Real estate: big data powers precise investment decisions and marketing — choose better land, build more suitable buildings, and sell them to the right buyers.
- AI + 5G + IoT + virtual/augmented reality.
P005【005_尚硅谷_Hadoop_概论_大数据的发展场景】08:17
Chapter 4: The development prospects of big data
P006【006_尚硅谷_Hadoop_概论_未来工作内容】06:25
Chapter 5: Business workflow between big data departments
Chapter 6: Organizational structure within a big data department
02_尚硅谷大数据技术之Hadoop(入门)V3.3
P007【007_尚硅谷_Hadoop_入门_课程介绍】07:29
P008【008_尚硅谷_Hadoop_入门_Hadoop是什么】03:00
P009【009_尚硅谷_Hadoop_入门_Hadoop发展历史】05:52
P010【010_尚硅谷_Hadoop_入门_Hadoop三大发行版本】05:59
The three major Hadoop distributions: Apache, Cloudera, and Hortonworks.
1) Apache Hadoop
Official site: http://hadoop.apache.org
Download: https://hadoop.apache.org/releases.html
2) Cloudera Hadoop
Official site: https://www.cloudera.com/downloads/cdh
Download: https://docs.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_6_download.html
(1) Founded in 2008, Cloudera was the first company to commercialize Hadoop, providing partners with commercial Hadoop solutions — chiefly support, consulting, and training.
(2) In 2009, Hadoop creator Doug Cutting joined Cloudera. Cloudera's main products are CDH, Cloudera Manager, and Cloudera Support.
(3) CDH is Cloudera's Hadoop distribution. It is fully open source and improves on Apache Hadoop in compatibility, security, and stability. Cloudera's list price is USD 10,000 per node per year.
(4) Cloudera Manager is a software-distribution, management, and monitoring platform for clusters: it can deploy a Hadoop cluster within a few hours and monitor the cluster's nodes and services in real time.
3) Hortonworks Hadoop
Official site: https://hortonworks.com/products/data-center/hdp/
Download: https://hortonworks.com/downloads/#data-platform
(1) Founded in 2011, Hortonworks was a joint venture of Yahoo and the Silicon Valley venture-capital firm Benchmark Capital.
(2) At its founding it took in roughly 25 to 30 Yahoo engineers dedicated to Hadoop; these engineers had been helping Yahoo develop Hadoop since 2005 and contributed about 80% of Hadoop's code.
(3) Hortonworks' flagship product is the Hortonworks Data Platform (HDP), likewise 100% open source; beyond the usual projects, HDP also includes Ambari, an open-source installation and management system.
(4) In 2018, Hortonworks was acquired by Cloudera.
P011【011_尚硅谷_Hadoop_入门_Hadoop优势】03:52
Hadoop's advantages (the 4 "highs")
- High reliability
- High scalability
- High efficiency
- High fault tolerance
P012【012_尚硅谷_Hadoop_入门_Hadoop1.x2.x3.x区别】03:00
P013【013_尚硅谷_Hadoop_入门_HDFS概述】06:26
HDFS, the Hadoop Distributed File System, is a distributed file system.
- 1) NameNode (nn): stores file metadata — file names, directory structure, file attributes (creation time, replica count, permissions) — plus each file's block list and the DataNodes each block lives on.
- 2) DataNode (dn): stores file block data, and checksums of that data, on the local file system.
- 3) Secondary NameNode (2nn): periodically backs up the NameNode's metadata.
P014【014_尚硅谷_Hadoop_入门_YARN概述】06:35
YARN, short for Yet Another Resource Negotiator, is Hadoop's resource manager.
P015【015_尚硅谷_Hadoop_入门_MapReduce概述】01:55
MapReduce divides computation into two phases: Map and Reduce
- 1) The Map phase processes the input data in parallel
- 2) The Reduce phase aggregates the Map results
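As a rough analogy only (this is not the Hadoop API), the two phases can be sketched with a shell pipeline: `tr` plays the Map role (emit one word per line), `sort` plays the shuffle (group identical keys), and `uniq -c` plays the Reduce role (sum per key):

```shell
# Map: split the input into words, one per line.
# Shuffle: sort brings identical words together.
# Reduce: uniq -c counts each group, like summing (word, 1) pairs.
printf 'hadoop spark\nhadoop flink\n' | tr ' ' '\n' | sort | uniq -c
# hadoop appears twice; flink and spark once each
```

This is exactly the shape of the wordcount example run later in P026, just without the distribution across nodes that Hadoop adds.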
P016【016_尚硅谷_Hadoop_入门_HDFS&YARN&MR关系】03:22
- HDFS
  - NameNode: manages metadata and records which DataNode each block is stored on.
  - DataNode: stores the actual block data on its node.
  - SecondaryNameNode: the NameNode's "secretary" — backs up NameNode metadata and can take over part of the NameNode's work during recovery.
- YARN: resource management for the whole cluster.
  - ResourceManager: manages resources across the entire cluster.
  - NodeManager: manages the resources of a single node.
- MapReduce
P017【017_尚硅谷_Hadoop_入门_大数据技术生态体系】09:17
The big data technology ecosystem
Recommendation-system project architecture
P018【018_尚硅谷_Hadoop_入门_VMware安装】04:41
P019【019_尚硅谷_Hadoop_入门_Centos7.5软硬件安装】15:56
P020【020_尚硅谷_Hadoop_入门_IP和主机名称配置】10:50
[root@hadoop100 ~]# vim /etc/sysconfig/network-scripts/ifcfg-ens33
[root@hadoop100 ~]# ifconfig
ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.88.133 netmask 255.255.255.0 broadcast 192.168.88.255
inet6 fe80::363b:8659:c323:345d prefixlen 64 scopeid 0x20<link>
ether 00:0c:29:0f:0a:6d txqueuelen 1000 (Ethernet)
RX packets 684561 bytes 1003221355 (956.7 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 53538 bytes 3445292 (3.2 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 84 bytes 9492 (9.2 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 84 bytes 9492 (9.2 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
virbr0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 192.168.122.1 netmask 255.255.255.0 broadcast 192.168.122.255
ether 52:54:00:1c:3c:a9 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
[root@hadoop100 ~]# systemctl restart network
[root@hadoop100 ~]# cat /etc/host
cat: /etc/host: 没有那个文件或目录
[root@hadoop100 ~]# cat /etc/hostname
hadoop100
[root@hadoop100 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
[root@hadoop100 ~]# vim /etc/hosts
[root@hadoop100 ~]# ifconfig
ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.88.100 netmask 255.255.255.0 broadcast 192.168.88.255
inet6 fe80::363b:8659:c323:345d prefixlen 64 scopeid 0x20<link>
ether 00:0c:29:0f:0a:6d txqueuelen 1000 (Ethernet)
RX packets 684830 bytes 1003244575 (956.7 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 53597 bytes 3452600 (3.2 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 132 bytes 14436 (14.0 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 132 bytes 14436 (14.0 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
virbr0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 192.168.122.1 netmask 255.255.255.0 broadcast 192.168.122.255
ether 52:54:00:1c:3c:a9 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
[root@hadoop100 ~]# ll
总用量 40
-rw-------. 1 root root 1973 3月 14 10:19 anaconda-ks.cfg
-rw-r--r--. 1 root root 2021 3月 14 10:26 initial-setup-ks.cfg
drwxr-xr-x. 2 root root 4096 3月 14 10:27 公共
drwxr-xr-x. 2 root root 4096 3月 14 10:27 模板
drwxr-xr-x. 2 root root 4096 3月 14 10:27 视频
drwxr-xr-x. 2 root root 4096 3月 14 10:27 图片
drwxr-xr-x. 2 root root 4096 3月 14 10:27 文档
drwxr-xr-x. 2 root root 4096 3月 14 10:27 下载
drwxr-xr-x. 2 root root 4096 3月 14 10:27 音乐
drwxr-xr-x. 2 root root 4096 3月 14 10:27 桌面
[root@hadoop100 ~]#
vim /etc/sysconfig/network-scripts/ifcfg-ens33
TYPE="Ethernet"
PROXY_METHOD="none"
BROWSER_ONLY="no"
BOOTPROTO="static"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="ens33"
UUID="3241b48d-3234-4c23-8a03-b9b393a99a65"
DEVICE="ens33"
ONBOOT="yes"
IPADDR=192.168.88.100
GATEWAY=192.168.88.2
DNS1=192.168.88.2

vim /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.88.100 hadoop100
192.168.88.101 hadoop101
192.168.88.102 hadoop102
192.168.88.103 hadoop103
192.168.88.104 hadoop104
192.168.88.105 hadoop105
192.168.88.106 hadoop106
192.168.88.107 hadoop107
192.168.88.108 hadoop108
192.168.88.151 node1 node1.itcast.cn
192.168.88.152 node2 node2.itcast.cn
192.168.88.153 node3 node3.itcast.cn
P021【021_尚硅谷_Hadoop_入门_Xshell远程访问工具】09:05
P022【022_尚硅谷_Hadoop_入门_模板虚拟机准备完成】12:25
yum install -y epel-release          # install the EPEL repository
systemctl stop firewalld             # stop the firewall now
systemctl disable firewalld.service  # keep it off across reboots (acceptable for a teaching cluster)
P023【023_尚硅谷_Hadoop_入门_克隆三台虚拟机】15:01
vim /etc/sysconfig/network-scripts/ifcfg-ens33   # change IPADDR on each clone
vim /etc/hostname                                # set the clone's hostname (hadoop102/103/104)
reboot
P024【024_尚硅谷_Hadoop_入门_JDK安装】07:02
Install the JDK on hadoop102 first, then copy it to hadoop103 and hadoop104.
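The install step itself is just un-tarring the archive into /opt/module and exporting JAVA_HOME. The archive name jdk-8u212-linux-x64.tar.gz and the /opt/software, /opt/module layout are assumptions based on the course's usual setup. A runnable local sketch that uses a temp dir and a fake tarball as a stand-in, so the `tar` flags can be tried anywhere:

```shell
set -e
# Stand-in for /opt: tarballs go in software/, installs land in module/.
opt=$(mktemp -d)
mkdir -p "$opt/software" "$opt/module"
# Fake JDK tarball standing in for the real jdk-8u212-linux-x64.tar.gz.
mkdir -p "$opt/jdk1.8.0_212/bin"
tar -C "$opt" -czf "$opt/software/jdk-8u212-linux-x64.tar.gz" jdk1.8.0_212
# The actual install command from this step, pointed at the stand-in dirs:
tar -zxf "$opt/software/jdk-8u212-linux-x64.tar.gz" -C "$opt/module/"
ls "$opt/module"   # jdk1.8.0_212
```

On the real cluster, JAVA_HOME would then be exported in /etc/profile.d/my_env.sh (the env file distributed via xsync in P028), e.g. `export JAVA_HOME=/opt/module/jdk1.8.0_212` followed by `export PATH=$PATH:$JAVA_HOME/bin`.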
P025【025_尚硅谷_Hadoop_入门_Hadoop安装】07:20
Same diagram as P024!
P026【026_尚硅谷_Hadoop_入门_本地运行模式】11:56
[root@node1 ~]# cd /export/server/hadoop-3.3.0/share/hadoop/mapreduce/
[root@node1 mapreduce]# hadoop jar hadoop-mapreduce-examples-3.3.0.jar wordcount /wordcount/input /wordcount/output
2023-03-20 14:43:07,516 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at node1/192.168.88.151:8032
2023-03-20 14:43:09,291 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1679293699463_0001
2023-03-20 14:43:11,916 INFO input.FileInputFormat: Total input files to process : 1
2023-03-20 14:43:12,313 INFO mapreduce.JobSubmitter: number of splits:1
2023-03-20 14:43:13,173 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1679293699463_0001
2023-03-20 14:43:13,173 INFO mapreduce.JobSubmitter: Executing with tokens: []
2023-03-20 14:43:14,684 INFO conf.Configuration: resource-types.xml not found
2023-03-20 14:43:14,684 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2023-03-20 14:43:17,054 INFO impl.YarnClientImpl: Submitted application application_1679293699463_0001
2023-03-20 14:43:17,123 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1679293699463_0001/
2023-03-20 14:43:17,124 INFO mapreduce.Job: Running job: job_1679293699463_0001
2023-03-20 14:43:52,340 INFO mapreduce.Job: Job job_1679293699463_0001 running in uber mode : false
2023-03-20 14:43:52,360 INFO mapreduce.Job: map 0% reduce 0%
2023-03-20 14:44:08,011 INFO mapreduce.Job: map 100% reduce 0%
2023-03-20 14:44:16,986 INFO mapreduce.Job: map 100% reduce 100%
2023-03-20 14:44:18,020 INFO mapreduce.Job: Job job_1679293699463_0001 completed successfully
2023-03-20 14:44:18,579 INFO mapreduce.Job: Counters: 54
File System Counters
FILE: Number of bytes read=31
FILE: Number of bytes written=529345
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=142
HDFS: Number of bytes written=17
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
HDFS: Number of bytes read erasure-coded=0
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=11303
Total time spent by all reduces in occupied slots (ms)=6220
Total time spent by all map tasks (ms)=11303
Total time spent by all reduce tasks (ms)=6220
Total vcore-milliseconds taken by all map tasks=11303
Total vcore-milliseconds taken by all reduce tasks=6220
Total megabyte-milliseconds taken by all map tasks=11574272
Total megabyte-milliseconds taken by all reduce tasks=6369280
Map-Reduce Framework
Map input records=2
Map output records=5
Map output bytes=53
Map output materialized bytes=31
Input split bytes=108
Combine input records=5
Combine output records=2
Reduce input groups=2
Reduce shuffle bytes=31
Reduce input records=2
Reduce output records=2
Spilled Records=4
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=546
CPU time spent (ms)=3680
Physical memory (bytes) snapshot=499236864
Virtual memory (bytes) snapshot=5568684032
Total committed heap usage (bytes)=365953024
Peak Map Physical memory (bytes)=301096960
Peak Map Virtual memory (bytes)=2779201536
Peak Reduce Physical memory (bytes)=198139904
Peak Reduce Virtual memory (bytes)=2789482496
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=34
File Output Format Counters
Bytes Written=17
[root@node1 mapreduce]#
[root@node1 mapreduce]# hadoop jar hadoop-mapreduce-examples-3.3.0.jar wordcount /wc_input /wc_output
2023-03-20 15:01:48,007 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at node1/192.168.88.151:8032
2023-03-20 15:01:49,475 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1679293699463_0002
2023-03-20 15:01:50,522 INFO input.FileInputFormat: Total input files to process : 1
2023-03-20 15:01:51,010 INFO mapreduce.JobSubmitter: number of splits:1
2023-03-20 15:01:51,894 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1679293699463_0002
2023-03-20 15:01:51,894 INFO mapreduce.JobSubmitter: Executing with tokens: []
2023-03-20 15:01:52,684 INFO conf.Configuration: resource-types.xml not found
2023-03-20 15:01:52,687 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2023-03-20 15:01:53,237 INFO impl.YarnClientImpl: Submitted application application_1679293699463_0002
2023-03-20 15:01:53,487 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1679293699463_0002/
2023-03-20 15:01:53,492 INFO mapreduce.Job: Running job: job_1679293699463_0002
2023-03-20 15:02:15,329 INFO mapreduce.Job: Job job_1679293699463_0002 running in uber mode : false
2023-03-20 15:02:15,342 INFO mapreduce.Job: map 0% reduce 0%
2023-03-20 15:02:26,652 INFO mapreduce.Job: map 100% reduce 0%
2023-03-20 15:02:40,297 INFO mapreduce.Job: map 100% reduce 100%
2023-03-20 15:02:41,350 INFO mapreduce.Job: Job job_1679293699463_0002 completed successfully
2023-03-20 15:02:41,557 INFO mapreduce.Job: Counters: 54
File System Counters
FILE: Number of bytes read=60
FILE: Number of bytes written=529375
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=149
HDFS: Number of bytes written=38
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
HDFS: Number of bytes read erasure-coded=0
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=8398
Total time spent by all reduces in occupied slots (ms)=9720
Total time spent by all map tasks (ms)=8398
Total time spent by all reduce tasks (ms)=9720
Total vcore-milliseconds taken by all map tasks=8398
Total vcore-milliseconds taken by all reduce tasks=9720
Total megabyte-milliseconds taken by all map tasks=8599552
Total megabyte-milliseconds taken by all reduce tasks=9953280
Map-Reduce Framework
Map input records=4
Map output records=6
Map output bytes=69
Map output materialized bytes=60
Input split bytes=100
Combine input records=6
Combine output records=4
Reduce input groups=4
Reduce shuffle bytes=60
Reduce input records=4
Reduce output records=4
Spilled Records=8
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=1000
CPU time spent (ms)=3880
Physical memory (bytes) snapshot=503771136
Virtual memory (bytes) snapshot=5568987136
Total committed heap usage (bytes)=428343296
Peak Map Physical memory (bytes)=303013888
Peak Map Virtual memory (bytes)=2782048256
Peak Reduce Physical memory (bytes)=200757248
Peak Reduce Virtual memory (bytes)=2786938880
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=49
File Output Format Counters
Bytes Written=38
[root@node1 mapreduce]# pwd
/export/server/hadoop-3.3.0/share/hadoop/mapreduce
[root@node1 mapreduce]#
P027【027_尚硅谷_Hadoop_入门_scp&rsync命令讲解】15:01
Use scp for the first full copy; use rsync for subsequent syncs.
rsync is mainly used for backup and mirroring. It is fast, avoids re-copying identical content, and supports symbolic links.
Difference between rsync and scp: copying with rsync is faster than with scp because rsync only updates files that differ, while scp copies every file.
P028【028_尚硅谷_Hadoop_入门_xsync分发脚本】18:14
Copy and sync commands
- scp (secure copy)
- rsync: remote synchronization tool
- xsync: cluster distribution script
The dirname command extracts the path portion of a file path: it strips the non-directory suffix and prints only the directory part.
[root@node1 ~]# dirname /home/atguigu/a.txt
/home/atguigu
[root@node1 ~]#
The basename command extracts the file name, i.e. the last component of a path.
[root@node1 atguigu]# basename /home/atguigu/a.txt
a.txt
[root@node1 atguigu]#
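The xsync script below combines exactly these two commands to split a path into its absolute parent directory and its bare name. A minimal runnable sketch of that idiom, using a temp file in place of /home/atguigu/a.txt:

```shell
set -e
d=$(mktemp -d); touch "$d/a.txt"
file="$d/a.txt"
pdir=$(cd -P "$(dirname "$file")"; pwd)   # absolute parent dir, symlinks resolved (-P)
fname=$(basename "$file")                 # bare file name
echo "$fname"                             # a.txt
[ -e "$pdir/$fname" ]                     # recombining the two points back at the file
```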
#!/bin/bash
#1. check that at least one argument was given
if [ $# -lt 1 ]
then
    echo Not Enough Arguments!
    exit
fi
#2. loop over every machine in the cluster
for host in hadoop102 hadoop103 hadoop104
do
    echo ==================== $host ====================
    #3. walk through every file/directory given and send each one
    for file in $@
    do
        #4. check that the file exists
        if [ -e $file ]
        then
            #5. get the absolute parent directory (resolving symlinks)
            pdir=$(cd -P $(dirname $file); pwd)
            #6. get the bare file name
            fname=$(basename $file)
            ssh $host "mkdir -p $pdir"
            rsync -av $pdir/$fname $host:$pdir
        else
            echo $file does not exist!
        fi
    done
done
[root@node1 bin]# chmod 777 xsync
[root@node1 bin]# ll
总用量 4
-rwxrwxrwx 1 atguigu atguigu 727 3月 20 16:00 xsync
[root@node1 bin]# cd ..
[root@node1 atguigu]# xsync bin/
==================== node1 ====================
sending incremental file list
sent 94 bytes received 17 bytes 222.00 bytes/sec
total size is 727 speedup is 6.55
==================== node2 ====================
sending incremental file list
bin/
bin/xsync
sent 871 bytes received 39 bytes 606.67 bytes/sec
total size is 727 speedup is 0.80
==================== node3 ====================
sending incremental file list
bin/
bin/xsync
sent 871 bytes received 39 bytes 1,820.00 bytes/sec
total size is 727 speedup is 0.80
[root@node1 atguigu]# pwd
/home/atguigu
[root@node1 atguigu]# ls -al
总用量 20
drwx------ 6 atguigu atguigu 168 3月 20 15:56 .
drwxr-xr-x. 6 root root 56 3月 20 10:08 ..
-rw-r--r-- 1 root root 0 3月 20 15:44 a.txt
-rw------- 1 atguigu atguigu 21 3月 20 11:48 .bash_history
-rw-r--r-- 1 atguigu atguigu 18 8月 8 2019 .bash_logout
-rw-r--r-- 1 atguigu atguigu 193 8月 8 2019 .bash_profile
-rw-r--r-- 1 atguigu atguigu 231 8月 8 2019 .bashrc
drwxrwxr-x 2 atguigu atguigu 19 3月 20 15:56 bin
drwxrwxr-x 3 atguigu atguigu 18 3月 20 10:17 .cache
drwxrwxr-x 3 atguigu atguigu 18 3月 20 10:17 .config
drwxr-xr-x 4 atguigu atguigu 39 3月 10 20:04 .mozilla
-rw------- 1 atguigu atguigu 1261 3月 20 15:56 .viminfo
[root@node1 atguigu]#
连接成功
Last login: Mon Mar 20 16:01:40 2023
[root@node1 ~]# su atguigu
[atguigu@node1 root]$ cd /home/atguigu/
[atguigu@node1 ~]$ pwd
/home/atguigu
[atguigu@node1 ~]$ xsync bin/
==================== node1 ====================
The authenticity of host 'node1 (192.168.88.151)' can't be established.
ECDSA key fingerprint is SHA256:+eLT3FrOEuEsxBxjOd89raPi/ChJz26WGAfqBpz/KEk.
ECDSA key fingerprint is MD5:18:42:ad:0f:2b:97:d8:b5:68:14:6a:98:e9:72:db:bb.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node1,192.168.88.151' (ECDSA) to the list of known hosts.
atguigu@node1's password:
atguigu@node1's password:
sending incremental file list
sent 98 bytes received 17 bytes 17.69 bytes/sec
total size is 727 speedup is 6.32
==================== node2 ====================
The authenticity of host 'node2 (192.168.88.152)' can't be established.
ECDSA key fingerprint is SHA256:+eLT3FrOEuEsxBxjOd89raPi/ChJz26WGAfqBpz/KEk.
ECDSA key fingerprint is MD5:18:42:ad:0f:2b:97:d8:b5:68:14:6a:98:e9:72:db:bb.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node2,192.168.88.152' (ECDSA) to the list of known hosts.
atguigu@node2's password:
atguigu@node2's password:
sending incremental file list
sent 94 bytes received 17 bytes 44.40 bytes/sec
total size is 727 speedup is 6.55
==================== node3 ====================
The authenticity of host 'node3 (192.168.88.153)' can't be established.
ECDSA key fingerprint is SHA256:+eLT3FrOEuEsxBxjOd89raPi/ChJz26WGAfqBpz/KEk.
ECDSA key fingerprint is MD5:18:42:ad:0f:2b:97:d8:b5:68:14:6a:98:e9:72:db:bb.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node3,192.168.88.153' (ECDSA) to the list of known hosts.
atguigu@node3's password:
atguigu@node3's password:
sending incremental file list
sent 94 bytes received 17 bytes 44.40 bytes/sec
total size is 727 speedup is 6.55
[atguigu@node1 ~]$
----------------------------------------------------------------------------------------
连接成功
Last login: Mon Mar 20 17:22:20 2023 from 192.168.88.151
[root@node2 ~]# su atguigu
[atguigu@node2 root]$ vim /etc/sudoers
您在 /var/spool/mail/root 中有新邮件
[atguigu@node2 root]$ su root
密码:
[root@node2 ~]# vim /etc/sudoers
[root@node2 ~]# cd /opt/
[root@node2 opt]# ll
总用量 0
drwxr-xr-x 4 atguigu atguigu 46 3月 20 11:32 module
drwxr-xr-x. 2 root root 6 10月 31 2018 rh
drwxr-xr-x 2 atguigu atguigu 67 3月 20 10:47 software
[root@node2 opt]# su atguigu
[atguigu@node2 opt]$ cd /home/atguigu/
[atguigu@node2 ~]$ llk
bash: llk: 未找到命令
[atguigu@node2 ~]$ ll
总用量 0
drwxrwxr-x 2 atguigu atguigu 19 3月 20 15:56 bin
[atguigu@node2 ~]$ cd ~
您在 /var/spool/mail/root 中有新邮件
[atguigu@node2 ~]$ ll
总用量 0
drwxrwxr-x 2 atguigu atguigu 19 3月 20 15:56 bin
[atguigu@node2 ~]$ ll
总用量 0
drwxrwxr-x 2 atguigu atguigu 19 3月 20 15:56 bin
您在 /var/spool/mail/root 中有新邮件
[atguigu@node2 ~]$ cd bin
[atguigu@node2 bin]$ ll
总用量 4
-rwxrwxrwx 1 atguigu atguigu 727 3月 20 16:00 xsync
[atguigu@node2 bin]$
----------------------------------------------------------------------------------------
连接成功
Last login: Mon Mar 20 17:22:26 2023 from 192.168.88.152
[root@node3 ~]# vim /etc/sudoers
您在 /var/spool/mail/root 中有新邮件
[root@node3 ~]# cd /opt/
[root@node3 opt]# ll
总用量 0
drwxr-xr-x 4 atguigu atguigu 46 3月 20 11:32 module
drwxr-xr-x. 2 root root 6 10月 31 2018 rh
drwxr-xr-x 2 atguigu atguigu 67 3月 20 10:47 software
[root@node3 opt]# cd ~
您在 /var/spool/mail/root 中有新邮件
[root@node3 ~]# ll
总用量 4
-rw-------. 1 root root 1340 9月 11 2020 anaconda-ks.cfg
-rw------- 1 root root 0 2月 23 16:20 nohup.out
[root@node3 ~]# ll
总用量 4
-rw-------. 1 root root 1340 9月 11 2020 anaconda-ks.cfg
-rw------- 1 root root 0 2月 23 16:20 nohup.out
您在 /var/spool/mail/root 中有新邮件
[root@node3 ~]# cd ~
[root@node3 ~]# ll
总用量 4
-rw-------. 1 root root 1340 9月 11 2020 anaconda-ks.cfg
-rw------- 1 root root 0 2月 23 16:20 nohup.out
[root@node3 ~]# su atguigu
[atguigu@node3 root]$ cd ~
[atguigu@node3 ~]$ ls
bin
[atguigu@node3 ~]$ ll
总用量 0
drwxrwxr-x 2 atguigu atguigu 19 3月 20 15:56 bin
[atguigu@node3 ~]$ cd bin
[atguigu@node3 bin]$ ll
总用量 4
-rwxrwxrwx 1 atguigu atguigu 727 3月 20 16:00 xsync
[atguigu@node3 bin]$
----------------------------------------------------------------------------------------
[atguigu@node1 ~]$ xsync /etc/profile.d/my_env.sh
==================== node1 ====================
atguigu@node1's password:
atguigu@node1's password:
sending incremental file list
sent 48 bytes received 12 bytes 13.33 bytes/sec
total size is 223 speedup is 3.72
==================== node2 ====================
atguigu@node2's password:
atguigu@node2's password:
sending incremental file list
my_env.sh
rsync: mkstemp "/etc/profile.d/.my_env.sh.guTzvB" failed: Permission denied (13)
sent 95 bytes received 126 bytes 88.40 bytes/sec
total size is 223 speedup is 1.01
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1178) [sender=3.1.2]
==================== node3 ====================