CSCI 356 - Computer Networking

CSCI 356 / Fall 2024

Computer Networking

Homework 2

UPDATE -- I should not use the word "average" in most of these questions. I mostly mean "what would you expect to happen, in practice, in a typical scenario". There are only a few places where "arithmetical average" would actually make sense.

Submit a single PDF with your solutions. You can type your responses with interspersed photos/diagrams (preferred), or take photos of handwritten work and convert to a single PDF.

This assignment has two parts. The first part contains written problems intended to reinforce your knowledge of key networking concepts. The first midterm exam, which won't be held for several more weeks, will contain similar problems, so this part of the assignment also serves as practice for the exam. Several of these questions directly or closely derive from exercises in Kurose & Ross, chapters 1 and 2.

The second part of this assignment is a short practicum in which you will gain some familiarity with common networking administration and diagnostics tools.

Part A - Short Answer & Written Exercises

For all exercises that require calculations, use and show units as appropriate, and show your work and/or explain your reasoning (briefly -- no need for an essay please).

Q1. Consider host names, IPv4 addresses, IPv6 addresses and port numbers. Give two actual, specific examples of each, and say what host or program your addresses or port numbers belong to. In general, how many bits (not bytes) does it take to write a host name? An IPv4 address? An IPv6 address? A port number? If the size is variable, say so but still also give a typical or approximate size.

Hint: Use "nslookup" or "dig" in the terminal on logos (or on your MacOS, or "nslookup" on windows)! These can give you the IPv4 and/or IPv6 address for any host name you like, e.g. "nslookup www.instagram.com" ... though some hosts won't have an IPv6 address, or may have weird and confusing configurations.)

Q2. Consider the network below, with 8 hosts (A-D, W-Z), 6 routers (R1-R6). Link bandwidths and propagation delays are shown.

Suppose Host A is transmitting data to Host W over a connection that passes through routers R1, R3, and R2.

(a) Assuming ideal conditions and no other traffic in the network, what throughput can we expect, ~~onb average~~ in practice, for this connection? What is the propagation delay?

(b) Assuming host A sends data at exactly the rate you determined, and ignoring all delays caused by the three routers (e.g. queuing or procesing delays, i.e., as if there were a single link going directly from host A to the destination host W with the throughput and delay you determined above), how long would it take to transfer a moderately small 400 KB file from A to W? Include both transmission and propagation delays, counting from the moment A begins sending data, to the moment W receives the last bit of data.

(d) Re-calculate (b) and (c), the end-to-end transfer times for a 400 KB and a 400 MB file, taking into account store-and-forward behavior of the routers. Here, host A transmits data to R1 as a single contiguous packet (the full 400 KB or 400 MB amount), as fast as the A-to-R1 link allows. Once the packet has fully arrived at R1, R1 in turn transmits to R3 at the full R1-to-R3 link speed, and so on.

Q3. Consider the same network shown in Q2, again assuming ideal conditions and no other traffic in the network.

(a) For each of four hosts on the right side (Hosts W, X, Y, and Z), find the path with shortest propagation time from Host B to that host. Say which routers each path goes through, and what propagation time is achieved for the path.

(b) For each of four hosts on the right side (Hosts W, X, Y, and Z), find the path with highest throughput from Host B to that host. Say which routers each path goes through, and what throughput is achieved for the path. Do each pair in isolation, e.g. when B is sending to W, assume there is no other traffic in the network, then similary when B is sending to X, etc.

Q4. Consider the same network shown in Q2. Each host on the left (A, B, C, and D) wants to send data at a rate of 30 Mbps to the corresponding host on the right (W, X, Y, and Z, with A sending to W, B sending to X, etc.).

(a) If the network uses circuit-switching, is it possible for the network to support all four connections simultaneously? Explain. Either identify four 30 Mbps paths which together don't overload any of the individual links in the network, or explain why that's not possible. Note: this kind of calculation is exactly why the reservation approach used by circuit-switching is complicated and slow.

(b) If we increased the connections to 35 Mbps each, can the network support all four connections? If not, what's the maximum number of connections the network can support between the hosts on the left and the hosts on the right? Explain.

(c) If the network instead uses packet-switching, and the packets for each connection do not need to all follow the same route, what is the maximum transmision rate the network could support while still allowing for all four connections? You might, for example, have B send half of its data through routers R1, R3, and R4, and send the other half of its data through a longer path R1, R5, R6, then R4. By splitting data across multiple routes, hopefully the hosts can send at a faster aggregate rate.

Q5. In a circuit-switched network using cut-through routing, suppose there are N users each with a long-lived network connection, and all N connections happen to traverse a particular network link in common. That link has 10 Mbps total capacity and 15ms propagation delay. The applications for these network connections transmit data at a maximum rate of 1.5 Mbps, but they do not transmit continuously: instead, they alternate between idle periods, when they do not transmit any data, and busy periods when they transmit at their full 1.5 Mbps rate. Suppose on average each connection is busy for 20% of the time, and idle the remaining time. (Imagine, for example, a zoom call, where a typical user has their camera and mic on only about 20% of the time, and otherwise keeps their camera and mic off so is not transmitting data.)

(a) What is the maximum number of users N that can be supported? (Hint: This is a simple calculation... remember that in this type of network, there is no queueing delay and each connection is allocated and guaranteed some fixed, constant bandwidth for its entire duration.) (Also: For this and all exercises below that require calculations, show your work and/or explain your reasoning!)

(b) Using your answer for N, what is the resulting utilization for this network link, on average? That is, averaged over a long period of time, and accounting for the idle/busy behavior of the connections, what percentage of the 10 Mbps link capacity is actually used to transmit data for the N connections together.

Q6. Consider the same scenario as Q5 but in a packet-switched network with store-and-forward routing.

(a) Under ideal circumstances, what is the maximum number of users N that can be supported? You can assume all N users cooperate to perfectly coordinate and synchronize their idle/busy transmission behavior in whatever way is needed to achieve the best possible network performance.

(b) Using your answer for N, what is the resulting utilization for this network link, on average?

Q7. Consider the same scenario in Q5, again with packet-switching and store-and-forward routing, but now with users that transmit at random with 20% probability, rather than in perfect coordination. Consider the queueing behavior of the shared link's entrance router. At any given time, if many users happen to transmit, the link will be overloaded: it will carry 10 Mbps of data (100% utilization during at this moment) and a queue of packets will begin to grow at the entrance router. But if very few or no users happen to transmit, the link will be underutilized: any remaining queued packets will be transmitted (at full 100% link utilization), or once the queue is empty, then the link will sit idle (0% utilization).

(a) It seems unlikely that every single user user would choose to transmit at the same time. If there are N=16 users, calculate the probability that, at some given time, all of them transmit. If this unlikely event occurs, and sustains for 5 seconds, calculate the change in queue size. In other words, how much larger or smaller will the queue be at the end of those 5 seconds? Hint: Remember that data is being both added and removed from the queue during this time. (See hint below on calculating probabilities.)

(b) If there are N=16 users, calculate the probability that, at some given time, no users are transmitting. Also calculate the change in queue size if this event occurs and sustains for 5 seconds.

(c) If there are N=16 users, what is the chance that exactly X=10 of them transmit at some given moment, with the other N-X=6 of them remaining idle at that moment? Also calculate the change in queue size if this event occurs and sustains for 5 seconds.

(d) With N=16 users, we should expect the queue to sometimes grow (when many users transmit) and sometimes shrink (when few users transmit). How often should we expect the queue to grow, and how often should we expect it to shrink? In other words, at any given moment, what is the chance the queue is growing, and what is the chance it is shrinking? Hint: trial and error (guess and check) is a valid way to calculate!

(e) If we want to ensure the queue shrinks more often than it grows, what is the maximum number N of random-behavior users the system can support? What happens to the queue, over time, if there are more than this number of users? What happens to the queue, over time, if there are fewer than this number of users?

(f) If we define a "safe" situation as one where the queue shrinks at least 75% of the time, and grows at most 25% of the time, what is the maximum number N of random-behavior users the system can "safely" support?

Hints for calculating probabilities: Here are some formulas to help. Suppose we are playing multiple games of chance. In all suppose we play n trials (or "games"), each independent of the others, and there is a p=0.20 chance (20%) of winning in any given trial. We will win some number x out of the n trials, where x could be as low as 0 (we lose all the trials), as high as n (we win all the trials), or any number between (we win some and lose some).

The chance of losing any given trial is probability = 1-p. For example, if p=0.20, then there is a 1-0.20 = 0.80 chance (80%) of losing any given trial. The probabilities add to 100%, because we either win or lose, there are no other options.
The chance of winning all n trials (x=n) can be calculated by: probability = pⁿ. If p=0.20 and n=10 for example, this probability will be very small, which makes sense: its very unlikely we would win every single trial.
The chance of winning none of the n trials (x=0) can be calculated by: probability = (1-p)ⁿ. If p=0.20, this should be more likely than the "all wins" scenario (since losing is more common than winning), though still unlikely overall.
The chance of winning exactly x of n trials can be calculated by the binomial probability distribution: probability = (n choose x)p^x(1-p)^n-x. There are calculators available online, like this one: plug in your parameters and it will calculate the probabilities.

Q8. This is about queuing delays at a router in a packet-switched network with store-and-forward routing. For simplicity, all packets have length L=1500 bytes, and all links have rate R=10 Mbps. Initially the router is idle: no packets are being transmitted on any links, and no packets are queued at the router.

(a) Suppose N=20 packets arrive simultaneously (e.g. from different inbound links) at the entrance router for some common outbound link. At the very moment the packets (simulaneously) finish arriving at the router, the router immediately begins transmiting one of them on the outbound link (making the outbound link now busy), and the other packets are put in a queue for transmission when the outbound link is no longer busy. Considering all N packets, what is the average queuing delay experienced? Hint: What is the queuing delay experienced by the second packet? the third? the last? Note: When thinking about the queueing time for a packet, don't include that packet's own transmission time as it leaves the router. So the queuing delay for the first packet transmitted is zero, since it starts being transmitted immediately after arrival.

(b) Suppose N=20 packets all arrive simultaneously, and this repeats once every 25 ms. Describe the behavior of the queue over long periods of time. Does it grow without bound as more and more packets arrive? Does it stabilize at some fized level? If possible, calculate or estimate the average queueing delay that would be observed by a packet in this system.

(c) Re-analyze the situation in (b) for the case where incoming packets are no longer synchronized. Here, there are still N=20 senders, each sending at a steady rate of one packet every 25 ms. But each sender began at some a random time in the past, so the packets don't necessarily arrive simultaneously at the router in groups of 20. Is there a worst case scenario here? A best case scenario? What would you expect on average, if senders truly began at random times?

Q9. From Worcester, MA to San Francisco, CA is about 4800 kilometers if we follow a route along major highways and communications right-of-way corridors. Suppose host A, in Worcster, is sending a 50 GB file over a direct fiber-optic link to host B, in San Francisco. The link has a (rather pathetic) capacity of R = 2 Mbps. Light travels along the fiber at a speed of 2.5 x 10⁸ meters/sec.

(a) What is the propagation delay, d_prop? Use appropriate units, e.g. ms, us, ns, or similar.

(b) What is the transmission time for this file?

(c) [somewhat weird question] How many meters "long" is a single bit? Is it longer or shorter than a football field? Hint: If we transmit a single bit, that takes a tiny amount of time... then figure out how far the light reaches down the fiber during that tiny amount of time.

(d) [another weird question] Using your answer to (c), or any other method you devise, how many bits can fit "in the fiber" at once between Worcester and San Francisco. You can assume the bits are packed tightly, with no gaps.

(e) Calculate C = R * d_prob, which is known as the "bandwidth delay product". Be sure to include and convert units as appropriate to simplify your answer. Note: One might ask... now why would anyone ever want to calculate this product? Review your answers to (c) and (d) to find out. If you don't see it, you probably did something wrong.

(f) Would it be faster to put my 50 GB file on a thumb drive and simply drive it to San Francisco in a car? Rest stops optional. For context, see: Andrew Tanenbaum's station wagon. Or the now-retired AWS snowmobile.

Q10. Same scenario as Q9, but with a much smaller 1 MB file.

(a) Suppose the 1 MB file is transferred in a single continous transmission from A to B, followed by a single short acknowledgement packet from B to A to confirm the data arrived safely. How long does the transfer take in all, accounting for both the file and acknowledgement? You can assume the acknowledgement is small, e.g. just a few bytes.

(b) Suppose the file is split into 100 equal sized pieces. Each piece is sent from A to B, then B sends an acknowledgement packet to A, before the next piece can be sent. How much slower or faster is this than the method in (a)?

Part B - Practicum

These exercises can be completed on your own MacOS, Windows, or Linux computer. Some (but not all) can be done using logos (via SSH or vscode terminal). The exact commands needed for this practicum vary by operating system, so you may need to adjust the commands slightly.

The ping program is a network utility that you can use to test whether a remote host is reachable across an IP network. It also measures the round-trip time (RTT) between the local host and the remote host. From this simple RTT measurement, we can actually learn a lot: with cleverness, we can estimate things like the how far away the destination is from us (in miles or kilometers), and the bandwidth of certain links along the route. Skim the manual page (type 'man ping' in a linux terminal, or search "man ping") to see what ping can do. Ping runs forever, so you must use Control-C to kill it, or use the "-c count" option. Please DO NOT attempt to make ping send more than one packet per second—the '-i' option is impolite at best, and somewhat dangerous at worst. There may be a graphical version on MacOS (under "Network Utility"), but the command-line version suffices.

Use ping to find the RTT (round-trip-time) from your computer (or from logos, if you don't have a computer capable of running ping) to www.holycross.edu and back again. Do not use an online web-based ping program. If the first trial or two are outliers, ignore them (the server might just have been waking up from sleep). When you kill ping with Control-C, it displays a summary of the minimum, average, and maximum RTT observed, along with the standard deviation. Report these as your answer.
The packets that ping uses to measure RTT will be affected by many different kinds of delays, including queuing delays caused by other traffic within the network, transmission delays for each link, etc. If we are interested in estimating the propagation delay of the underlying network links (or, roughly, the physical length of the links or distance between endpoints, miles or kilometers), which of the reported values should we look at: first, last, minimum, maximum, or average? Why? Hint: your first intuition is probably wrong. Think carefully. The ping program sends a packet of a given size, waits for a reply, and uses the elapsed time as it's delay measurement. Consider all the underlying factors that go into the delay, things like: transmission time at the first link and at each router, propagation time across each link, processing delays at each router, queuing delays at each router, and similar for the reply message). Some of these are fixed and do not vary much. Some depend on the size of the ping packet. Some depend on the bandwidth of the links. Some vary seemingly at random depending on how much other traffic is arriving at the routers. We aren't interested here in queuing delays, transmission-related delays, or delays due to congestion from other traffic in the network.
Repeat the experiment, but this time use the '-s' option to send pings with various packet sizes. Sizes under a dozen or so bytes are too small to be useful, but try sizes up to several thousand bytes at least. Make a plot of packet size vs RTT. Use software for this, not hand drawings. Calculate the slope of the graph ("rise over run"), and the inverse of the slope (that is, 1/slope or "run over rise"). Be sure to include units (e.g. ms, or bytes, ms per byte, or byte per ms, or whatever is appropriate). What conclusions can you draw from these results? That is, what is the meaning of the slope of the graph (or its inverse)? Be sure to say if you are talking about the entire path, or a single link on the path (perhaps the "first" link, or the "last" link, or the fastest link, or the slowest link)? NOTE: Some routers and/or hosts block large pings as a security measure, or trim large pings down to a smaller size. Be sure to look at the output to see if this is happening, and interpret your results accordingly or try pinging a different host.
Repeat the previous experiments with a host chosen as far from Holy Cross as you can reasonably find. You could try our pop3 servers from the last project, but better yet try a few hosts from foreign countries, or in very distant cities, or perhaps local organizations in under-developed countries, or governments on the other side of the globe. Aim to pick a host different from anyone else in the class, and aim for as distant as possible. Give the baseline RTT estimate and your conclusion based on using ping with varying packet size. Note: some hosts do not respond to ping packets—try a variety of hosts until you find one that does.
The '-t' option (or '-m' on MacOS) to ping sets the time-to-live (TTL) of the ping packets. A packet with a TTL of N will be dropped after traversing N routers in the network (the TTL feature helps keep packets from meandering around the network forever, most operating systems use a TTL of 64 or 128 by default for all outgoing packets). Some routers will send back a "time-to-live exceeded" error packet in such cases, which ping will display on the console. Other routers will silently discard packets that have outlived their time-to-live, so ping may appear to get stuck waiting for a reply that never arrives. Use the '-t' (or '-m' on MacOS) option to estimate out how many routers sit between your laptop (or our cloud servers) and the distant host you chose in the previous question. You should be able to find a number n, such that ping still works with a TTL of n, but fails with a TTL of n-1.
Lastly, use the traceroute tool (caled tracert on Windows) to examine the path between your laptop and your distant host. (Or, if your laptop has difficulties running traceroute or tracert, use logos instead.) Traceroute uses successively larger TTL values and special error packets to discover the names and IP address of routers between your host and a distant host. Does traceroute agree with your TTL results from ping? Submit your traceroute output. Note: some routers are uncooperative, and packets can also be lost, so the output might be missing some data points. This is normal.
Now examine your traceroute output. Pick two (or more) routers on the path and try to identify who owns the router and where it is located geographically. The router names can tell you a lot—they often contain company or government names, city names, country codes, or even zip codes. Or, oddly enough, the three-letter codes for nearby airports, like "OHR" for O'Hare International Airport. You might have to investigate or guess a bit using to answer this question. Bonus points if you can identify a likely trans-oceanic underwater network link crossing, or a satellite network link, in your traceroute output.

Help on plotting: You can plot your RTT data by hand, neatly on carefully drawn graph paper. Or better yet, there are many simple online tools you can use, for example, this one. Or you can use Microsoft Excel, Google Sheets, or Mac Numbers to plot data. Details vary, but here is a rough outline of steps for Excel or Google Sheets. Put your data in two columns, with the x data (packet size) on the left, and the y data (RTT) on the right. Highlight the data, select "insert chart", and choose "scatter plot". Be sure to use scatter plot! Bar charts, line charts, etc., are not appropriate here. It doesn't matter if the dots are connected or not, but you need a scatter plot or x-y scatter plot. Find the slope by hand, or there is also a "SLOPE" function in most spreadsheets that can calculate slopes for you, or you can likely add a "linear trendline" by right-clicking the graph if you like. Look in the help for your software for specific instructions.

Submissions

Submit your single-file-PDF response here.