This is a problem I encountered a long time ago. It was quite interesting, so I still remember it to this day. The problem borrows the context of TCP and asks you to implement a key piece of TCP logic: “in-order assembly”:
- From the TCP layer’s perspective, IP layer packets are received out of order.
- From the application layer’s perspective, data delivered by the TCP layer is in order.
What’s interesting about this problem is that by borrowing the TCP context, you can first discuss some TCP fundamentals with the candidate, then pivot to introduce this problem. This way, you can test both foundational knowledge and engineering coding skills.
Problem
1 | struct Packet { |
作者:木鸟杂记 https://www.qtmuniao.com/2024/05/05/infra-interview-tcp 转载请注明出处
Discussion
Because this problem is somewhat grounded in “reality,” there are quite a few points worth discussing. Conversely, without proper modeling and simplification, the implementation complexity would be very high—you couldn’t finish it in forty-five minutes.
Regarding the Packet struct:
- Will offsets overlap? (This can happen if enough data is sent to exhaust the range of
size_t.) - Is the length fixed or variable?
Regarding the read call:
- Is
TCP::read()blocking? If it is blocking, should it block until it has accumulatedcountbytes before returning, or should it return immediately as soon as some data is available? - What does
TCP::read()return afterTCP::finish()is called?
Regarding memory issues:
- When
TCP::readfills data into the application layer’sbuf, should it copy the data? - Should the memory for
Packet::datainTCP::receivebe freed within the TCP class?
Implementation
This is essentially a producer-consumer problem. We need to maintain a thread-safe ordered data structure: the producer (TCP::receive) puts data into it, and the consumer (TCP::read) takes data out of it. The requirements are: out-of-order insertion, in-order retrieval, and splittable.
To simplify the implementation, you can agree on the following assumptions with the interviewer.
Regarding the Packet struct:
- Offsets do not overlap.
- Variable length.
Regarding whether read blocks, we can assume:
- The
readfunction is blocking. readreturns immediately once some data is received, even if it hasn’t reachedcount.
Regarding memory, we can assume:
- The TCP class is not responsible for the lifecycle of received data, but the caller must ensure that the data in each packet remains valid (not freed) throughout the entire lifetime of the TCP object.
- Data delivered to the application layer will be copied into the user-provided
buf.
Here is the code:
1 | struct Packet{ |
Key implementation points include:
- Use
std::mapto organize all IP layer packets. - Use
nextOffset_to track the next offset to deliver to the application layer. - If a packet is split, remember to put the remainder back.
Testing
Testing for this problem is also interesting:
- How to simulate out-of-order delivery.
- How to manage the lifecycle of data in all IP packets.
- How to verify the data received by the application layer.
1 | TCP tcp; |
If you want to run the code yourself, you can grab it here.
Final Notes
If there’s still time, the interviewer might discuss with you how to implement it if the assumptions you made are removed. But as long as you can solve the basic version, the rest is all bonus points.
This article is from my column “System Daily Notes” in the infra programmer interview question series. This series collects some essential problems I’ve encountered over the years while interviewing and being interviewed. So far, seven articles have been published: blocking queue, lock-free queue, event queue, hash table, in-order assembly, heap, and trie. If you’re interested, more will be updated in the future.
The column currently has over a hundred articles. If you find my writing helpful, feel free to leave comments and subscribe to support me. Your support is my greatest motivation to keep creating.
