木鸟杂记

大规模数据系统

Distributed Storage Interview Experience

I changed jobs a while ago for some reasons and interviewed for several distributed storage-related positions. I noticed there are relatively few experience-sharing posts on this topic, so I decided to share mine. Due to company privacy policies, I won’t list questions by company; instead, I’ll briefly summarize the general directions and contents of the interviews. Given my limited experience and expertise, please feel free to point out any mistakes in the comments.

Distributed storage positions cover a wide range, and can generally be categorized by direction as follows:

  1. Distributed File Storage
  2. Object Storage
  3. Distributed KV or Cache
  4. Distributed Database (NewSQL)
  5. Table Storage
  6. Block Storage

Their positioning and directions also differ slightly:

Distributed File Storage. Supports POSIX semantics or trimmed POSIX. It can serve as a storage base for disaggregated storage and compute, or be used directly by applications, such as deep learning training and intermediate storage for big data processing. Common products include Pangu File System, Polarfs, JuiceFS, etc.

Object Storage. Generally stores unstructured data such as images and videos, usually compatible with Amazon’s S3 API. Common products include Amazon S3, Alibaba Cloud OSS, and Tencent Cloud COS.

Distributed KV or Cache. Usually compatible with the Redis interface, or a more simplified KV interface. Generally seeks speed, based on memory or SSD, or even new hardware like persistent memory. Used for low-latency business caching or as the base for disaggregated storage and compute systems. Products include ByteDance’s ABase, Alibaba Cloud’s Tair, and PingCAP’s TiKV.

Distributed Database (or NewSQL). Usually provides a SQL interface and unlimited horizontal scalability. Common products include PingCAP’s TiDB, Alibaba Cloud’s PolarDB, and Tencent Cloud’s TDSQL.

Table Storage. The classic interface can refer to column-oriented HBase, which is widely used in the big data field. Products include HBase and ByteDance’s ByteTable.

Block Storage. Provides a block device interface, generally used for the system disk of cloud hosts. Products include SmartX’s hyper-converged solution.

Author: Muniao’s Notes https://www.qtmuniao.com/2021/04/17/storage-interview/ , please indicate the source when reprinting

Interview Content

Distributed storage interviews are generally divided into several parts:

  1. Project Experience
  2. Fundamentals
  3. Algorithms and Coding
  4. Domain Knowledge
  5. System Design
  6. Programming Languages

Project Experience. If your project experience is a good match, the requirements in other areas will be relatively lower, but the project itself will be examined in great detail. Each interviewer may approach the project discussion from a different angle, but there is always some assessment purpose. By purpose, they can be roughly divided into the following types:

  1. Communication Skills. This is the most intangible criterion, but generally the most important, because interviewers consciously or unconsciously have an implicit standard: whether I would be willing to work with this candidate in the future. The project experience asked about at this point may not even be related to the position. When facing this type of assessment, be especially careful not to dive straight into implementation details. A cognitively sound answer should briefly describe your workflow when you first take on a new project (note that it must be brief, because no one likes long-winded, pointless rambling, and time is limited—this is where abstraction and summarization skills are tested): What is the project background and requirements? What open-source solutions are available on the market? How did we make technical choices? Which module was I responsible for? etc. In terms of attitude, be neither arrogant nor obsequious—don’t show arrogance like “how do you not know this,” and don’t show flattery like “you’re my boss.” Clearly state what you know, and openly admit what you don’t.
  2. Matching Experience. If your experience is a good match for the position, the interviewer may ask you to quickly introduce the project overview and then dive into some details. This is also why you shouldn’t spend too long on the project introduction—otherwise, the interviewer may want to focus on key questions but feel awkward interrupting you. The assessment here is generally related to the design approach and performance optimization of a particular module. For this type of question, simply follow the interviewer’s lead and do a brief review.
  3. Leading to Other Topics. Some interviewers ask about projects just to lead into certain fundamental computer science knowledge that the project might involve, in order to assess your curiosity, depth of knowledge, and other abilities. After all, knowledge related to a project is generally understood in depth. If you say “I just used it and don’t know anything else,” it will be somewhat detrimental to your score.

Fundamentals. Since distributed systems interact quite a bit with low-level components, to optimize performance to the extreme requires a lot of knowledge about operating systems and computer networks. The most commonly tested areas include the Linux I/O stack, file systems, process scheduling, and TCP protocol details.

Algorithms and Coding. Although both take the form of solving problems, the focus of algorithm and coding assessments differs. Algorithm questions emphasize common algorithmic thinking; the problems may be relatively novel, typically focusing on binary search, greedy algorithms, divide and conquer, search, and even dynamic programming. Coding questions emphasize programming proficiency and code style; the problems may be relatively old, but the implementation can be quite tedious, typically focusing on linked lists, binary trees, and graphs. LRU and reversing a linked list in groups of k are high-frequency questions.

Domain Knowledge. Mainly classical concepts in distributed systems and storage. The most frequently tested topic is consensus protocols, such as Raft. Entry-level questions might ask you to introduce the general concepts and basic flow; more advanced questions might cover how to perform linearizable reads, how to handle the thundering herd problem, election log requirements, election details, etc. Other topics include classic projects and papers, such as certain design details of GFS, or compaction details of LSM-trees.

System Design. Since I’ve been working for several years, there will be some basic system design questions. The most commonly tested points are load balancing, failure tolerance, and message queues. For example, design a highly available, low-latency distributed KV system; design a delayed-trigger event management system; design a thread-safe LRU, etc.

Programming Languages. The storage field still uses C++ quite a lot, so you might be tested on some C++ knowledge. Why do I say “might”? Because I mainly used Go before, so most interviewers didn’t test me much on this. But if they do, the most common Go topics are its runtime, including the Goroutine M:N scheduling model and details of the tri-color garbage collector. I’ve only written small things in C++ and haven’t worked on large projects, so I don’t know many of its features—I won’t speak out of turn here.

Interview Process

Interviews usually consist of 2–4 technical rounds followed by one HR round. Technical rounds are generally divided into a basic technical round and a senior behavior round.

The basic technical rounds are usually the first two or three. The interviewers are generally your future colleagues and junior leaders; the main assessment content is as described in the previous section. An interview is a two-way selection process, and these interviewers directly determine your future work happiness, so you should also assess them: whether their personality is compatible with yours, and whether they can help you grow technically.

The senior round is usually the last technical round; it may be a department manager for the position you’re interviewing for, or a leader from another team conducting a cross-interview. This round usually doesn’t ask technical details, and the assessment style is relatively formulaic. Basically, they ask what problem impressed you the most, how you solved it, what your strengths are, and what your weaknesses are. It mainly assesses the candidate’s communication skills, intelligence, behavior, etc. However, generally speaking, this step won’t be a deal-breaker.

The HR round is also quite formulaic, generally consisting of three standard moves:

  1. Expected Salary: Current salary structure, whether you have other offers, expected total compensation. Expected salary is a perennial problem—I don’t know what to say either.
  2. Reason for Leaving: How is the company doing now, why are you leaving, and what are your expectations for the new company. Don’t disparage your previous employer; just state some objective facts.
  3. Personality Assessment: Talk about the most impressive problem you encountered at work, how you solved it, and what needs improvement. Sometimes they just directly ask if you’re introverted or extroverted, your hobbies, weaknesses, etc. Be neither arrogant nor obsequious, and don’t dig a hole for yourself.

Note that during the HR round, you can inquire about the work pace of the position to see if it meets your expectations—third-party information sources would be even better. In addition, you can ask the HR to add the previous technical interviewers on WeChat to get detailed information about future work content.

Question Summary

Consider this an appendix, summarizing detailed questions by category.

Domain Knowledge

  1. How does GFS ensure high availability of data? How are retries handled when errors occur?
  2. How does Raft implement reads from followers?
  3. After a majority of nodes in Raft have committed a certain log entry, can a node that does not contain that log entry become the leader?
  4. In LevelDB, if many compactions have occurred and the underlying file system has become fragmented, can WAL still maintain efficient sequential write performance?
  5. In the Raft paper, when a peer starts up it is a follower; can it be a candidate?
  6. How does Raft avoid the thundering herd problem?

Fundamentals

  1. Brief description of the TCP three-way handshake
  2. What is the meaning and initial value of the TCP sequence number? Is it random?
  3. What are the primitives in socket programming?
  4. What state in the state machine corresponds to TCP listen?
  5. What does 100% disk load mean?
  6. What does excessively high CPU load indicate?
  7. When the file system’s open function is executed, what happens from the upper layers down to the lower layers behind the scenes?
  8. What are the benefits of virtual memory design?
  9. Does mmap shared memory break process isolation?

System Design

  1. Design a distributed KV storage that supports user metadata and user follow relationships.
  2. Design a thread-safe LRU.

Coding and Algorithms

  1. RandomSet: design a set that supports a random interface, where random returns a value with equal probability in O(1) complexity.
  2. LRU
  3. Quick sort on a linked list
  4. Reverse a linked list in groups of k
  5. Linked list shuffle: Node0->Node1->…->Node(n-1), shuffle into: Node0->Node(n-1)->Node1…
  6. In Python, determine if two dicts are equal? Dynamic language, same type, same value, may contain cycles.
  7. Implement a read-write lock using mutexes
  8. Five-finger piano playing. Given an unordered sequence of natural numbers representing piano key positions, a single hand can cover at most five consecutive positions at a time; find the minimum number of hand movements required to play all positions in the sequence.
  9. A stick of length m is cut into n segments; how many ways are there to cut it? (m and n are both positive integers)

Project-Related

  1. How do you identify the bottleneck of system latency?
  2. How do you reduce the impact of GC on main traffic?
  3. How is the leader selected among multiple replicas?
  4. How is consistency and reliability guaranteed when writing to multiple replicas?

Programming Languages

  1. Go: What is the overhead of defer, and how can it be optimized?
  2. Go: In what form does the runtime exist? As a library or a binary?
  3. Go: Why are goroutines more lightweight?
  4. Go: How are channels implemented?

我是青藤木鸟,一个喜欢摄影、专注大规模数据系统的程序员,欢迎关注我的公众号:“木鸟杂记”,有更多的分布式系统、存储和数据库相关的文章,欢迎关注。 关注公众号后,回复“资料”可以获取我总结一份分布式数据库学习资料。 回复“优惠券”可以获取我的大规模数据系统付费专栏《系统日知录》的八折优惠券。

我们还有相关的分布式系统和数据库的群,可以添加我的微信号:qtmuniao,我拉你入群。加我时记得备注:“分布式系统群”。 另外,如果你不想加群,还有一个分布式系统和数据库的论坛(点这里),欢迎来玩耍。

wx-distributed-system-s.jpg