架构师必知必会系列:系统架构设计需要知道的5大精要 (5 System Design Fundamentals)

禅与计算机程序设计艺术     2022-12-15

无论是在大厂还是初创公司,技术产品经理 (TPM)都需要具备系统设计的基础知识。从历史上看,系统设计基础知识通常是软件工程师在面试时的要求,而 TPM 不受此期望的约束。然而,现在趋势正在发生变化。作为 TPM,您需要在面试和领导产品团队的工作中,对系统设计有深入的了解。

Technical product managers (TPMs) at big tech companies and startups are required to have a fundamental knowledge of System Design. Historically, System Design fundamentals were usually a requirement for software engineers during interviews, and TPMs were exempt from that expectation. However, the trend is now changing. As a TPM, you need to have a solid understanding of System Design in interviews and on the job as you lead a product team.

在本文中,我们将讨论5 个最常见的系统设计基础知识,您必须了解这些基础知识才能在技术产品管理中取得成功,同时领导工程团队并推出出色的产品。

In this article, we’ll discuss 5 of the most common fundamentals of System Design that you must know to succeed in your role in technical product management while leading an engineering team and rolling out great products.

我们将涵盖:

  • 什么是系统设计,作为 TPM 为什么要关心?

  • 1.负载均衡

  • 2.键值存储

  • 3.限速器

  • 4. 内容分发网络 (CDN)

  • 5.数据库

  • 从这里开始系统设计

We’ll cover:

  • What is System Design, and why should you care as a TPM?

  • 1. Load balancing

  • 2. Key-value stores

  • 3. Rate limiters

  • 4. Content delivery networks (CDNs)

  • 5. Databases

  • Where to go from here on System Design

什么是系统设计,作为 TPM 为什么要关心?

系统设计是构建系统以满足所有功能和非功能需求的过程,包括 API、用例和集成。即使您作为 TPM 不直接负责此体系结构的复杂细节,您也应该了解全局,了解不同的系统组件如何支持您的组织的目标,并满足您的产品的要求。

TPM 应该具备系统设计的基础知识才能做好他们的工作。

Educative 的 CEO Fahim ul Haq 在 FAANG 公司从事分布式系统工作八年。在广泛采访了 TPM 之后,他同意了。

“TPM 应该了解可伸缩系统如何工作以及分布式系统的不同部分如何在抽象层面上相互交互以正确指导开发的所有基本概念。他们需要了解他们正在构建的系统的核心概念和构建块。” — Fahim ul Haq(前系统工程师,Educative 首席执行官)

TPM 需要了解系统设计基础知识,以便为其产品做出明智的设计决策。例如,如果你要设计 Facebook 的照片存储系统,你需要为每张图片分配一个特定的 ID,并有一个系统来唯一标识每张上传的照片。如果您了解系统设计,那么您就会知道您需要一个序列生成器来实现此功能。

因此,作为有效的 TPM,您的目标应该是构建敏捷、可扩展、可靠、可维护且健壮的系统,以在任何给定时间满足用户需求。

接下来,我们将介绍系统设计的五个基本概念,它们是您作为 TPM 在工作中取得成功所绝对必要的。

What is System Design, and why should you care as a TPM?

System Design is the process of architecting a system so that all functional and non-functional requirements are met, including APIs, use cases, and integrations. Even if you're not directly responsible for the intricate details of this architecture as a TPM, you should understand the big picture: how the different system components support your organization's goals and meet your products' requirements.

TPMs should have a fundamental knowledge of System Design to do their jobs well.

Fahim ul Haq, the CEO of Educative, has worked on distributed systems at FAANG companies for eight years. Having interviewed TPMs extensively, he agrees.

“A TPM should understand all the basic concepts of how scalable systems work and how different parts of a distributed system interact with each other at an abstract level to properly guide development. They need to know the core concepts and building blocks of the systems they are building.” — Fahim ul Haq (former systems engineer, CEO of Educative)

A TPM needs to know System Design fundamentals to make informed design decisions for their products. For instance, if you were to design Facebook’s photo storage system, you’d need to assign a specific ID to each image and have a system in place to uniquely identify every photo uploaded. If you know System Design, then you’d know that you’ll need a sequence generator to achieve this function.
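
To make the sequence generator idea concrete, here is a minimal single-process sketch in Python. The class name `SequenceGenerator`, the `node_id` parameter, and the 10-bit node field are all assumptions made for illustration; the article does not describe how any real photo storage system generates its IDs.

```python
import itertools
import threading


class SequenceGenerator:
    """Hypothetical ID generator: a per-node counter combined with a node ID."""

    def __init__(self, node_id: int, node_bits: int = 10):
        if not 0 <= node_id < (1 << node_bits):
            raise ValueError("node_id does not fit in node_bits")
        self._node_id = node_id
        self._node_bits = node_bits
        self._counter = itertools.count(1)
        self._lock = threading.Lock()

    def next_id(self) -> int:
        # Take the next local sequence number under a lock so concurrent
        # uploads handled by this node never receive the same number.
        with self._lock:
            sequence = next(self._counter)
        # Put the node ID in the low bits: two nodes with different IDs can
        # never produce the same value, even if their counters match.
        return (sequence << self._node_bits) | self._node_id
```

Each upload server would own one generator with its own `node_id`, so IDs stay unique without the servers having to coordinate on every request.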

Therefore, as an effective TPM, you should aim to architect agile, scalable, reliable, maintainable, and robust systems that meet user requirements at any given time.

Next, we’ll cover five fundamental concepts of System Design that are an absolute necessity for you, a TPM, to succeed in your job.

1.负载均衡

负载平衡是系统设计生命周期的一个组成部分,指的是在不同的计算服务器之间重新分配任务以提高系统性能和可靠性。

每秒有数百万个请求,负载均衡器将在可用资源之间平均分配任务,以确保流量顺畅流动。

负载均衡的优点

  • 提高效率:负载均衡器在不同服务器之间平均分配负载流量,从而提高效率并同时降低成本。

  • 服务器的可用性:如果一台或多台服务器发生故障,负载均衡器将绕过它们并通过在正常运行的服务器之间分配流量来确保系统保持可用。

  • 可扩展性:添加更多服务器可确保通过负载平衡同时增加应用程序容量。

为什么 TPM 应该知道负载平衡?

作为 TPM,您将经常面临需要扩展服务器以满足用户需求的情况,或者出现流量激增和故障的情况。在这种情况下,负载均衡器就会派上用场。

此外,您必须具有决策能力,可以根据定价、涉众承诺和其他变量为您的开发团队选择合适的负载均衡器算法。

负载均衡器将帮助您的系统提高可伸缩性、性能和可用性,并减少冗余。它通过以下方式做到这一点:服务器容量可以随时调整,故障服务器会被绕过而改用正常工作的服务器,并且服务器负载均匀分布。

1. Load Balancing

Load balancing is an integral part of the System Design lifecycle and refers to redistributing tasks across different computing servers to enhance system performance and reliability.

With millions of requests per second, load balancers will evenly spread tasks between available resources to ensure traffic flows smoothly.

Advantages of load balancing

  • Improving efficiency: Load balancers distribute load traffic evenly among different servers, therefore improving efficiency and cutting down costs simultaneously.

  • Availability of servers: If one or more servers break down, load balancers will bypass them and ensure the system remains available by distributing traffic among properly functioning servers.

  • Scalability: When more servers are added, the load balancer starts routing traffic to them, so application capacity grows along with the server pool.

Why should a TPM know load balancing?

As a TPM, you will constantly be faced with situations where your servers need to be scaled up to meet user demand, or where a traffic surge or a server failure occurs. In these scenarios, a load balancer will come in handy.

Additionally, you must have the decision-making capacity to select a suitable load balancer algorithm for your development team depending on pricing, stakeholder commitment, and other variables.

A load balancer helps your system improve scalability, performance, and availability, and reduces redundancy. It does this by allowing server capacity to be changed on the fly, bypassing failed servers in favor of working ones, and distributing server load evenly.
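
The behavior described in this section, spreading requests evenly and bypassing failed servers, can be sketched with a round-robin balancer. This is only an illustration: round-robin is one of several possible algorithms, and the class name `RoundRobinBalancer`, its methods, and the server names are hypothetical.

```python
class RoundRobinBalancer:
    """Hypothetical round-robin load balancer that skips unhealthy servers."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = list(servers)
        self._healthy = set(self._servers)
        self._next = 0

    def mark_down(self, server):
        # Called when a health check fails; the server stops receiving traffic.
        self._healthy.discard(server)

    def mark_up(self, server):
        # Called when a known server recovers.
        if server in self._servers:
            self._healthy.add(server)

    def pick(self):
        # Try each server at most once, starting where the last pick ended.
        for _ in range(len(self._servers)):
            server = self._servers[self._next]
            self._next = (self._next + 1) % len(self._servers)
            if server in self._healthy:
                return server
        raise RuntimeError("no healthy servers available")
```

With three servers, `pick()` cycles through them in order; after `mark_down("app-2")`, the cycle continues across the remaining two servers only.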

2.键值存储

键值存储是一种软件存储系统,它建立在关联数组数据模型(例如哈希表或字典)的基础上,为集合中的每个键分配唯一值。值可以是唯一 ID、blob 或服务器名称中的任何内容。

在分布式环境中使用传统存储系统进行扩展,同时仍保持强大且一致的可用性可能具有挑战性。几家顶级科技公司,包括 Facebook、Netflix 和亚马逊,比传统的在线事务处理 (OLTP) 数据库更依赖主键访问数据存储。根据定义,OLTP 是通过 Internet 快速实时执行大量数据库事务。

键值存储的优点

  • 可扩展性:它们可以持续处理越来越多的数据,而不会显着降低性能。

  • 速度:简单的检索和使用命令,如 get、put 和 delete,确保效率。

  • 灵活性:由于键值存储兼具可扩展性和速度,扩展任何大型业务模型都变得更加容易。

为什么 TPM 应该了解键值存储?

在将系统设计为 TPM 时,您应该考虑何时何地需要使用键值存储,以及为什么它可能是当时的最佳选择。该模型有利于存储客户个性化数据,因为它具有可扩展性、速度和灵活性。

例如,您可以提高系统的处理性能,因为您将在多台具有更多内存的计算机上处理数据集,并提高容错能力。LinkedIn、Amazon 和 MongoDB 等公司在过去几年中使用键值存储来显着扩展。

2. Key-value storage

A key-value store is a software storage system built on an associative array data model, such as a hash table or dictionary, in which each unique key in a collection maps to a value. Values can be anything: unique IDs, blobs, or server names.

It can be challenging to scale with traditional storage systems in distributed environments while still maintaining strong and consistent availability. Several top tech companies, including Facebook, Netflix, and Amazon, rely more on primary-key access data stores than on traditional online transaction processing (OLTP) databases. By definition, OLTP is the rapid, real-time execution of large numbers of database transactions over the internet.

Advantages of key-value storage

  • Scalability: They can continuously process increasingly large amounts of data without a significant drop in performance.

  • Speed: Simple retrieval and usage commands like get, put, and delete ensure efficiency.

  • Flexibility: Scaling any large business model is easier because of the combined scalability and speed key-value stores offer.

Why should a TPM know about key-value storage?

When designing a system as a TPM, you should consider when and where you need to use a key-value store and why it may be the best choice at that given time. This model is beneficial for storing customer personalization data because of the scalability, speed, and flexibility that come with it.

For instance, because the dataset is spread across multiple computers with more total memory, you can increase processing performance and also improve fault tolerance. Companies like LinkedIn, Amazon, and MongoDB have used key-value stores to scale significantly over the last couple of years.
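
The get, put, and delete operations, and the idea of spreading keys over several machines, can be sketched as follows. This is a minimal in-memory illustration: the class name `ShardedKeyValueStore` is hypothetical, and each "shard" is just a Python dict standing in for a separate server.

```python
import hashlib


class ShardedKeyValueStore:
    """Hypothetical key-value store that partitions keys across shards."""

    def __init__(self, shard_count: int = 4):
        if shard_count < 1:
            raise ValueError("shard_count must be at least 1")
        # Each dict stands in for one storage node.
        self._shards = [dict() for _ in range(shard_count)]

    def _shard_for(self, key: str) -> dict:
        # Hash the key so the same key always lands on the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._shards)
        return self._shards[index]

    def put(self, key: str, value) -> None:
        self._shard_for(key)[key] = value

    def get(self, key: str, default=None):
        return self._shard_for(key).get(key, default)

    def delete(self, key: str) -> bool:
        # Returns True if the key existed and was removed.
        shard = self._shard_for(key)
        if key in shard:
            del shard[key]
            return True
        return False
```

A call such as `store.put("user:42", {"theme": "dark"})` followed by `store.get("user:42")` shows the personalization use case mentioned above. Note that a simple modulo scheme like this one reshuffles most keys when the shard count changes, which is why production systems often use more elaborate partitioning.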

3.限速器

速率限制器确保服务仅响应设定数量的请求。超出预定义限制的任何内容都会受到限制。例如,如果服务的 API 配置为每分钟仅处理 200 个请求,则超过该请求的任何请求都将被阻止。

限速器的优点

  • 成本效率:它们有助于控制运营成本,例如,通过防止运营实验超过服务器请求的设定配额。

  • 避免资源剥夺:通过速率限制可以防止由于软件配置错误而发生的多种拒绝服务 (DoS) 攻击。

  • 分配数据流:与负载均衡器一样,速率限制器确保系统不会因大量数据而负担过重,并在需要时帮助在不同服务器之间平均分配负载。

为什么 TPM 应该了解速率限制器?

在您作为 TPM 的角色中,您希望确保您的服务器以最佳方式运行并且数据库不会因性能低下而受到损害。这是可以应用适当的速率限制算法的地方。

像 Lyft 这样的公司利用速率限制器来有效地运行他们的流程。

3. Rate limiters

A rate limiter ensures that a service responds only to a set number of requests. Anything beyond the predefined limits is throttled. For example, if an API for a service has been configured to handle only 200 requests per minute, any requests over that will be blocked.

Advantages of rate limiters

  • Cost efficiency: They help control operational costs, for instance, by preventing operational experiments from exceeding the set quota of server requests.

  • Averting resource starvation: Rate limiting prevents many denial-of-service (DoS) incidents, including those caused by software or configuration errors.

  • Distributing data flow: Like load balancers, rate limiters ensure that systems are not overburdened with a large amount of data and help evenly spread the load among different servers when required.

Why should a TPM know about rate limiters?

In your role as a TPM, you want to ensure that your servers are running optimally and databases are not being compromised by slow performance. This is where an appropriate rate limiting algorithm can be applied.

Companies like Lyft make use of rate limiters to run their processes efficiently.
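
The 200-requests-per-minute example can be sketched with a fixed-window counter, which is one of several rate limiting algorithms (token bucket and sliding window are common alternatives). The class name `FixedWindowRateLimiter` and the injectable `clock` parameter are assumptions made for illustration and testing.

```python
import time


class FixedWindowRateLimiter:
    """Hypothetical limiter: at most `limit` requests per fixed time window."""

    def __init__(self, limit: int = 200, window_seconds: float = 60.0,
                 clock=time.monotonic):
        self._limit = limit
        self._window = window_seconds
        self._clock = clock
        self._window_start = clock()
        self._count = 0

    def allow(self) -> bool:
        now = self._clock()
        # Start a fresh window once the current one has fully elapsed.
        if now - self._window_start >= self._window:
            self._window_start = now
            self._count = 0
        if self._count < self._limit:
            self._count += 1
            return True
        # Over the limit: the caller should reject or delay this request.
        return False
```

A service would call `allow()` once per incoming request and return an error (for HTTP APIs, typically status 429) whenever it returns False.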

4. 内容分发网络 (CDN)

内容分发网络是地理分布的服务器,它们协同工作以确保通过 Internet 快速高效地分发内容。CDN 使用缓存作为一种机制来加速内容在 Web 上的传输。

CDN 服务的内容可以有多种类型,包括网站数据、社交媒体内容、可下载媒体等。

一些组织使用 CDN 来加速通过 Internet 传送内容。例如,银行可能会使用 CDN 来安全地传输敏感数据。

CDN的优势

  • 提高效率:CDN 可以缩短网页加载时间,同时降低跳出率。这使用户留在页面上并防止他们放弃它。

  • 增强安全性:通过缓解分布式拒绝服务 (DDoS) 攻击,CDN 在增强安全性方面发挥着巨大作用。

  • 降低带宽成本:由于 CDN 主要依赖缓存和其他优化,它们可以显着降低服务器带宽,从而降低网站管理员和所有者的托管成本。

为什么 TPM 应该了解 CDN?

如果您的组织内容繁多,作为 TPM,您可能会发现在某些情况下使用 CDN 很有帮助。您将能够减少数据加载时间和延迟、减少冗余、提高安全性并减少带宽费用,从而为组织节省时间和成本。

4. Content delivery networks (CDNs)

Content delivery networks are geographically distributed servers that work together to ensure quick and efficient content delivery over the internet. CDNs use caching as a mechanism to speed up the delivery of content across the web.

Content serviced by CDNs can be of several types, including website data, social media content, downloadable media, and so on.

Several organizations use CDNs to accelerate the delivery of content via the internet. A bank, for instance, might use a CDN to transfer sensitive data securely.

Advantages of CDNs

  • Improving efficiency: CDNs shorten web page load times while cutting bounce rates, which keeps users on the page instead of abandoning it.

  • Enhancing security: By mitigating distributed denial-of-service (DDoS) attacks, CDNs play a massive role in boosting security.

  • Cutting down on bandwidth costs: Because CDNs primarily rely on caching and other optimizations, they can significantly reduce server bandwidth, keeping hosting costs down for website administrators and owners.

Why should a TPM know about CDNs?

If your organization is content-heavy, you, as a TPM, may find a CDN helpful in some instances. A CDN lets you reduce data load times and latency, reduce redundancy, boost security, and cut bandwidth expenses, saving the organization both time and money.
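
The caching mechanism behind the bandwidth savings can be sketched as a single edge server that keeps copies of content for a limited time. This is a simplified illustration, assuming a hypothetical `EdgeCache` class and a caller-supplied `fetch_from_origin` function; real CDNs also route users to a nearby server, which this sketch leaves out.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Hypothetical CDN edge server with a time-to-live (TTL) cache."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes],
                 ttl_seconds: float = 300.0, clock=time.monotonic):
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        # Maps a path to (expiry time, cached body).
        self._entries: Dict[str, Tuple[float, bytes]] = {}
        self.origin_requests = 0

    def get(self, path: str) -> bytes:
        now = self._clock()
        cached = self._entries.get(path)
        if cached is not None and now < cached[0]:
            # Cache hit: serve from the edge, no origin bandwidth used.
            return cached[1]
        # Cache miss or expired entry: go back to the origin server.
        self.origin_requests += 1
        body = self._fetch(path)
        self._entries[path] = (now + self._ttl, body)
        return body
```

The `origin_requests` counter makes the saving visible: repeated requests for the same path within the TTL reach the origin only once.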

5.数据库

传统的文件系统有很多缺点,因此数据库通常是首选。数据库是以易于访问、维护、管理和结构化的方式组织的数据集合,以便可以有效地更新和处理。

数据库有两种主要类型:

  • 关系数据库是在多个表、列和记录中组织的数据集的集合。关系数据库通过数据库表相互通信。结构化查询语言 (SQL) 用于使用 insert、delete、update 和 retrieve 等命令从这些数据库中操作和检索信息。

  • 非关系数据库(NoSQL) 通常以与关系数据库不同的格式存储非结构化数据。NoSQL 数据库有多种类型,包括图形、键值、文档和宽列。

数据库的优势

  • 数据一致性:数据库将确保消除数据冗余,所做的更改会立即反映在数据库中,因此不会出现数据不一致的情况。

  • 数据完整性:通过确保向所有用户提供正确和准确的信息,可以维护数据完整性。

  • 数据安全:包括密码和用户身份验证在内的多种安全功能有助于维护数据库中数据的安全性。

为什么 TPM 应该了解数据库?

在这个数字时代,每个组织都使用数据库来扩展业务并改进工作流程和效率。你作为 TPM 的角色可能经常需要你也戴上数据产品经理的帽子。在这里,您可能需要监督数据在组织内的分布和使用方式的整个生命周期。在这里,强大的数据科学背景和使用数据库将帮助您大放异彩。

根据您选择的类型,数据库有几个优势,包括数据完整性、数据一致性、数据安全性、数据持久性和易于访问等。

5. Databases

Traditional file systems come with many disadvantages, so databases are often preferred. A database is a collection of data organized in a way that is easily accessible, maintainable, manageable, and structured so that it can be updated and processed efficiently.

Databases come in two main types:

  • Relational databases are collections of datasets organized in multiple tables, columns, and records. Relational databases communicate with each other via database tables. Structured query language (SQL) is used to manipulate and retrieve information from these databases with commands like insert, delete, update, and retrieve (SELECT).

  • Non-relational databases (NoSQL) typically store unstructured data in a different format from relational databases. NoSQL databases have several types, including graph, key-value, document, and wide-column.

Advantages of databases

  • Data consistency: Databases eliminate data redundancy and reflect changes immediately, so the data does not become inconsistent.

  • Data integrity: By ensuring all users are presented with correct and accurate information, data integrity is maintained.

  • Data security: Several security features including password and user authentication help maintain the security of data in databases.

Why should a TPM know about databases?

Every organization works with databases in this digital era to scale their business and improve workflows and efficiency. Your role as a TPM may often require you to wear the hat of a data product manager too. Here, you may be required to oversee the entire lifecycle of how data is distributed and used within an organization. This is where a strong background in data science and working with databases will help you shine.

Databases have several advantages depending on the type you choose, including data integrity, data consistency, data security, data persistence, and ease of access, among others.
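
The SQL commands mentioned for relational databases can be tried out with Python's built-in `sqlite3` module. This is a small sketch, not a recommended schema: the `users` table, its columns, and the function name `run_crud_demo` are made up for illustration.

```python
import sqlite3


def run_crud_demo():
    """Insert, update, delete, and select rows in a throwaway in-memory DB."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE users ("
            "id INTEGER PRIMARY KEY, name TEXT NOT NULL, plan TEXT NOT NULL)"
        )
        # INSERT: add two records; placeholders avoid SQL injection.
        conn.executemany(
            "INSERT INTO users (name, plan) VALUES (?, ?)",
            [("Ada", "free"), ("Grace", "pro")],
        )
        # UPDATE: change one record in place.
        conn.execute("UPDATE users SET plan = ? WHERE name = ?", ("pro", "Ada"))
        # DELETE: remove one record.
        conn.execute("DELETE FROM users WHERE name = ?", ("Grace",))
        conn.commit()
        # SELECT: retrieve what is left.
        return conn.execute(
            "SELECT id, name, plan FROM users ORDER BY id"
        ).fetchall()
    finally:
        conn.close()
```

Calling `run_crud_demo()` returns the single remaining row as a tuple of `(id, name, plan)`.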

NoSQL Database

NoSQL stands for "not only SQL" and refers to non-relational databases, which do not store data in tables. NoSQL databases come in several types depending on the data model. The main types are document, key-value, wide-column, and graph databases.
NoSQL databases are often used in real-time web applications. They scale easily to large amounts of data and high user loads, and horizontal scaling across clusters of machines is straightforward.

Different types of NoSQL databases:

Document databases - store data in a JSON-like format. Values can be of different types (strings, numbers, booleans, arrays, or objects). Example: MongoDB.

Key-value databases - store items as key-value pairs, with the key acting as a unique identifier. Retrieving a value by its key is simple and does not require complex queries. Examples: Redis and DynamoDB.

Wide-column stores - store data in tables, rows, and dynamic columns. They are more flexible than relational database tables because each row does not need to have the same set of columns. Wide-column stores are used for storing IoT data and user profile data. Examples: Cassandra and HBase.

Graph databases - a graph is made up of two elements: nodes and edges. Nodes represent entities, and edges represent the relationships between them. These databases are generally used for social networks. Examples: Neo4j and JanusGraph.

从这里开始系统设计

您学习系统设计的基础知识是否有趣?无论您是刚刚考虑开始技术产品管理的职业生涯还是已经在该领域,我们希望本文能帮助您在构建可扩展的软件产品时驾驭系统设计的复杂性。

但我们只是触及了这个话题的表面。本文未讨论但作为 TPM 绝对需要学习和掌握的更多系统设计基础知识包括:

  • 域名系统 (DNS)

  • 序列器

  • 分布式缓存

  • 发布-订阅系统

  • 分片计数器

  • 分布式消息队列

  • 分布式任务调度

  • 分布式日志记录

快乐学习!

Where to go from here on System Design

Did you have fun learning about the fundamentals of System Design? Whether you are just thinking about starting a career in technical product management or are already in the field, we hope this article helped you navigate the complexities of System Design as you build scalable software products.

But we have just scratched the surface of this topic. More System Design fundamentals not discussed in this article but absolutely necessary for you as a TPM to learn and master include:

  • Domain Name Systems (DNS)

  • Sequencers

  • Distributed Caching

  • Publish-Subscribe Systems

  • Sharded Counters

  • Distributed Messaging Queues

  • Distributed Task Scheduling

  • Distributed Logging

Happy learning!
