CSCI 356 / Fall 2024
Computer Networking
This semester, we will be writing networked programs using the Python 3 programming language. We'll be managing and deploying our code using git and GitHub. And we will be connecting to (and soon running code on) machines in datacenters across the world using Google Cloud's Compute Engine (GCE) and/or Amazon Web Services (AWS), two of the top three largest cloud providers. The goal for this project is to start getting familiar with these technologies, if you are not already. Oh, and you'll also implement a networked email protocol client.
Please use Python version 3.10 or higher. Do not use Python 2.x; it is an older and now obsolete version of the language. You will need to be familiar with at least:
If you'd like to use logos, the department server, that is fine. It already has Python 3.10.12 installed; just type "python3" at the command line.
You can install Python on your own computer (you might already have it). Or use the python3 command on logos. For simple things, repl.it can be a nice way to play with Python and other languages right in your browser.
git is an invaluable tool for distributed networked source code version control. GitHub is a centralized networked site for sharing source code. GitHub also provides a user-friendly point-and-clicky way to use git, and Visual Studio Code integrates with both git and GitHub. But you will need to learn to use git directly from the command line too, so you might as well start now.
Before the advent of web-based mail interfaces, we used local email clients—graphical ones like Eudora, Thunderbird or Outlook. Or even text-based, command-line clients like pine (from the University of Washington), GNU mailx (available and installed on logos, amazingly, though not properly configured), or mutt (slogan: "All mail clients suck. This one just sucks less.").
Email isn't dead (yet), and neither are SMTP, POP3, and IMAP. All of the above infrastructure is still around, and still used to deliver mail, though nearly everyone these days uses a web service like gmail, rather than a local mail program. But even gmail still uses and supports all three protocols. Yes, you can still fetch your gmail using POP3, and you can even configure one gmail account to retrieve mail from some other email server using POP3 or IMAP, or to send outbound email using any SMTP server of your choice.
The POP3 protocol is defined by RFC 1939. Refer to this document for details on the protocol.
The POP3 protocol is text-based, and simple enough that a human can type out the protocol messages by hand if needed. It is rarely a good idea to test code on a production system like your real email accounts and Holy Cross's (or Google's) actual POP3 and SMTP servers (you might accidentally delete all of your own email, for example). Instead, I have set up two test mail servers, each running a POP3 server (you can see the code in pop-server.py) with test accounts and test mailbox files. Both are on Google's cloud, one in Northern Virginia and the other in Dallas, Texas. You don't need to run pop-server.py yourself, since I am already running it on those two servers. Connect to the servers using the starter code in pop-client.py like this:
./pop-client.py whitehouse.kwalsh.org 110 # connects to Northern Virginia POP3 server
or:
./pop-client.py enron.kwalsh.org 110 # connects to Dallas, Texas POP3 server
I created five test user accounts on those servers: skilling, delainey, forney, lay, and kwalsh (1). All of them use hunter2 as the password.
If you find that one of the mailbox accounts is locked, it is because someone else is logged in to that mailbox, and the POP3 protocol mandates that only one connection to a mailbox is allowed at a time. Try a different username. Or, for testing purposes, I have made nine additional copies of each mailbox, at skilling1, skilling2, ..., skilling9, etc., so just try one of these usernames.
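To make the text-based nature of the protocol concrete, here is a minimal sketch of sending one command and reading a one-line reply using only Python's socket module. The helper names (open_connection, send_command, read_reply_line) are hypothetical and are not taken from pop-client.py. The sketch assumes every command and status line ends with CRLF, as RFC 1939 specifies, and it does not handle multi-line replies.

```python
import socket

def open_connection(host, port):
    """Connect to a server and return (sock, reader). Nothing here runs at import time."""
    sock = socket.create_connection((host, port))
    reader = sock.makefile("rb")      # buffered reader for line-at-a-time input
    return sock, reader

def send_command(sock, command):
    """Send one POP3 command, such as "USER skilling", terminated by CRLF."""
    sock.sendall((command + "\r\n").encode("ascii"))

def read_reply_line(reader):
    """Read one CRLF-terminated line and return it as a str without the line ending."""
    raw = reader.readline()           # bytes, including the trailing b"\r\n"
    return raw.decode("ascii", errors="replace").rstrip("\r\n")
```

With these helpers, a session might begin by calling open_connection("enron.kwalsh.org", 110), reading the greeting with read_reply_line, and then sending "USER skilling" with send_command and reading the reply to it.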
Once you have a feel for how POP3 works using pop-client.py, create a new program, email-client.py, that provides a user-friendly (but still text-based) interface. The details of what email-client.py does are up to you, but here is an example of how the user would interact with it (user input is in blue italics):
shell$ ./email-client.py nobody
Welcome to the email client!
Connecting to server enron.kwalsh.org on port 110.
Connected! Welcome to POP3 enron server for csci356
Logging in to server as user nobody with default password.
Failed. Server error was: -ERR Sorry, user name nobody doesn't seem to have a mailbox.
shell$ ./email-client.py skilling
Welcome to the email client!
Connecting to server enron.kwalsh.org on port 110.
Connected! Welcome to POP3 enron for csci356
Logging in to server as user skilling with default password.
You have 1253 messages in your mailbox.
Type 'q' at any time to quit, or hit enter to see the list of messages.
* [1] (new) (12303 bytes) dorsey@enron.com Monday's Committee Meeting
  [2] (new) (855 bytes) shankman@enron.com Re: resid business
  [3] (new) (1710 bytes) tom.mashington@enron.com Africa
  [4] (new) (1656 bytes) andrew.fastow@enron.com FW: Monday Meeting
  [5] (new) (1021 bytes) tomskilljr@enron.com Pictures
Do you want to (r)ead, (d)elete, or (s)kip message 1, (g)oto other message, or (q)uit? s
  [1] (new) (12303 bytes) dorsey@enron.com Monday's Committee Meeting
* [2] (new) (855 bytes) shankman@enron.com Re: resid business
  [3] (new) (1710 bytes) tom.mashington@enron.com Africa
  [4] (new) (1656 bytes) andrew.fastow@enron.com FW: Monday Meeting
  [5] (new) (1021 bytes) tomskilljr@enron.com Pictures
Do you want to (r)ead, (d)elete, or (s)kip message 2, (g)oto other message, or (q)uit? s
  [1] (new) (12303 bytes) dorsey@enron.com Monday's Committee Meeting
  [2] (new) (855 bytes) shankman@enron.com Re: resid business
* [3] (new) (1710 bytes) tom.mashington@enron.com Africa
  [4] (new) (1656 bytes) andrew.fastow@enron.com FW: Monday Meeting
  [5] (new) (1021 bytes) tomskilljr@enron.com Pictures
Do you want to (r)ead, (d)elete, or (s)kip message 3, (g)oto other message, or (q)uit? r
[start of message 3]
Message-ID: <19349609.1075840149401.JavaMail.evans@thyme>
Date: Sun, 15 Apr 2001 23:52:00 -0700 (PDT)
From: tom.mashington@enron.com
To: jeff.skilling@enron.com
Subject: Africa
... more lines of the message (omitted here) ...
[end of message 3]
  [1] (new) (12303 bytes) dorsey@enron.com Monday's Committee Meeting
  [2] (new) (855 bytes) shankman@enron.com Re: resid business
* [3] (old) (1710 bytes) tom.mashington@enron.com Africa
  [4] (new) (1656 bytes) andrew.fastow@enron.com FW: Monday Meeting
  [5] (new) (1021 bytes) tomskilljr@enron.com Pictures
Do you want to (r)ead, (d)elete, or (s)kip message 3, (g)oto other message, or (q)uit? d
  [1] (new) (12303 bytes) dorsey@enron.com Monday's Committee Meeting
  [2] (new) (855 bytes) shankman@enron.com Re: resid business
  [3] (old) (deleted)
* [4] (new) (1656 bytes) andrew.fastow@enron.com FW: Monday Meeting
  [5] (new) (1021 bytes) tomskilljr@enron.com Pictures
Do you want to (r)ead, (d)elete, or (s)kip message 4, (g)oto other message, or (q)uit? d
  [1] (new) (12303 bytes) dorsey@enron.com Monday's Committee Meeting
  [2] (new) (855 bytes) shankman@enron.com Re: resid business
  [3] (old) (deleted)
  [4] (new) (deleted)
* [5] (new) (1021 bytes) tomskilljr@enron.com Pictures
Do you want to (r)ead, (d)elete, or (s)kip message 5, (g)oto other message, or (q)uit? s
  [4] (new) (deleted)
  [5] (new) (1021 bytes) tomskilljr@enron.com Pictures
* [6] (new) (3071 bytes) keep.guessing@enron.com Image of Enron in India
  [7] (new) (1480 bytes) ted.hall@enron.com Meeting with Jeff Skilling at Enron
  [8] (new) (3111 bytes) tim.despain@enron.com Moody"s Annual Review Meeting
Do you want to (r)ead, (d)elete, or (s)kip message 6, (g)oto other message, or (q)uit? d
  [4] (new) (deleted)
  [5] (new) (1021 bytes) tomskilljr@enron.com Pictures
  [6] (new) (deleted)
* [7] (new) (1480 bytes) ted.hall@enron.com Meeting with Jeff Skilling at Enron
  [8] (new) (3111 bytes) tim.despain@enron.com Moody"s Annual Review Meeting
Do you want to (r)ead, (d)elete, or (s)kip message 7, (g)oto other message, or (q)uit? g
Message number (1 - 1253)? 3
  [1] (new) (12303 bytes) dorsey@enron.com Monday's Committee Meeting
  [2] (new) (855 bytes) shankman@enron.com Re: resid business
* [3] (old) (deleted)
  [4] (new) (deleted)
  [5] (new) (1021 bytes) tomskilljr@enron.com Pictures
Do you want to (u)ndelete or (s)kip message 3, (g)oto other message, or (q)uit? u
  [1] (new) (12303 bytes) dorsey@enron.com Monday's Committee Meeting
  [2] (new) (855 bytes) shankman@enron.com Re: resid business
* [3] (old) (1710 bytes) tom.mashington@enron.com Africa
  [4] (new) (deleted)
  [5] (new) (1021 bytes) tomskilljr@enron.com Pictures
Do you want to (r)ead, (d)elete, or (s)kip message 3, (g)oto other message, or (q)uit? q
You have marked 2 messages for deletion. Are you sure you want to delete them (y/n)? y
All done!
Notice that the user does not, and should not, have to know any POP3 protocol commands, and never sees any cryptic POP3 details. Your email client should hide these behind a friendly interface. Some user actions (like deleting an email) might correspond to a single POP3 command. But other actions, like starting the program, might correspond to multiple POP3 commands (logging in with username and password, then getting the number of messages, then listing the first few messages). Similarly, there is no single command for undeleting a message; you'll need to implement this using a series of POP3 commands instead. And some actions, like "skip message", don't have a direct POP3 equivalent at all.
You don't need to follow the precise format of the example interaction above, but try to follow the general idea:
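As a small illustration of one user action mapping to several protocol commands, the sketch below lists commands a client might send at startup and pulls the message count out of a STAT reply. The function names are hypothetical, and the byte total in the usage note is made up. The reply shape "+OK count size" is the one RFC 1939 gives for STAT. Listing the first few messages would need further commands that are not shown here.

```python
def startup_commands(username, password):
    """Return the POP3 commands a client might send when the program starts."""
    return ["USER " + username, "PASS " + password, "STAT"]

def parse_stat_reply(reply):
    """Extract (message_count, total_bytes) from a STAT reply line."""
    parts = reply.split()
    # parts[0] is "+OK"; the next two fields are the message count
    # and the total mailbox size in bytes.
    return int(parts[1]), int(parts[2])
```

For example, parse_stat_reply("+OK 1253 4821930") returns the pair (1253, 4821930), and the client can then show "You have 1253 messages in your mailbox." without the user ever seeing the STAT command.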
Error checking: With network programming, communicating parties don't necessarily fully trust each other. So in general, you should be very defensive about any messages you receive, making as few assumptions as possible. But to keep this project simple, we will ignore this, and assume the server is non-buggy and non-malicious. Specifically, if the POP3 server response starts with "+OK", you can assume the remaining info sent from the server is valid and has the correct format. So you don't need to check for malformed messages. But note, for many responses, the string following the "+OK" can be essentially anything at all, and may vary at random from message to message. If the response starts with "-ERR" or anything other than "+OK", you can assume your client code, or the server code, or the network connection, has fatally crashed and is no longer able to function properly. That means your program should check for POP3 errors ("-ERR" responses, or anything that doesn't start with the expected "+OK"), but you don't need to do anything particularly fancy here. You can simply display the error message to the user, then exit the program. This should not happen in practice, hopefully, because aside from the username, there should be no opportunity for the user to cause a POP3 error. For example, your program should not allow the user to attempt to read a message that does not exist, or to use a negative message number, or to take any other action that might cause an illegal POP3 command. Your program should only ever send valid, legal POP3 commands to the server, regardless of what the user types.
Aside from the built-in "socket" module, which you should use, you MUST NOT USE any built-in or third-party Python libraries related to POP3, IMAP, email, or network communication. Your task is to implement that functionality yourself, not simply use someone else's POP3 Python code.
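One way to follow this policy is sketched below: a single check applied to every server reply, plus a validator that turns whatever the user types into a legal message number or rejects it before anything is sent. Both function names are hypothetical, and the exit-on-error behavior is just the simple option described above, not a required design.

```python
import sys

def require_ok(reply):
    """Return the text after "+OK", or show the server's reply and exit."""
    if not reply.startswith("+OK"):
        print("Failed. Server error was:", reply)
        sys.exit(1)
    return reply[3:].strip()

def parse_message_number(text, total):
    """Return a message number in the range 1..total, or None if the input is unusable."""
    text = text.strip()
    # Accept plain ASCII digits only; this rejects "", "-3", "2.5", and "abc".
    if not (text.isascii() and text.isdigit()):
        return None
    number = int(text)
    if 1 <= number <= total:
        return number
    return None
```

With a 1253-message mailbox, parse_message_number("3", 1253) returns 3, while inputs such as "0", "-1", "1254", or "abc" return None, so the client can ask again instead of sending a command the server would reject.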
You MAY use other built-in or third-party python libraries ("modules") as you like, and you can borrow bits and pieces of code freely from anywhere you can find (just be sure to cite your sources, as always), so long as they do not relate directly to POP3, IMAP, or email. The main part of the program, including anything related to POP3, should be code you write yourself, entirely from scratch.
Once your code is committed to GitHub classroom, there is nothing further to submit. All code you commit to projects in the CSCI 356 GitHub classroom is shared with your instructor automatically, but is otherwise private by default.
GitHub's 10 Minute Git Handbook gives a decent and quick introduction to git. You can go a long way with just git clone, git add, git commit, git status, git push, and git pull. There are many other tutorials, like this one. You can skip the advanced stuff with branches, merging, rebasing, reflogs, etc. And you don't need to use the git init command if you use GitHub to create your projects then git clone to download them.
For a more hands-on tutorial, in our GitHub classroom you'll find a starter tutorial assignment. This is entirely optional, I've never done it myself, but you might find it useful if you are new to git and GitHub. Feel free to try it out.
There is an endless variety of python tutorials available online. Many tutorials are even interactive, allowing you to write python code right in your browser. Some recommendations are below, but feel free to find one you like better. I have included a very brief python tutorial below. Be sure to use Python 3.x, rather than Python 2.x. The main difference you might notice is that in python 3 one writes:
print("Hello World") # python 3
Whereas in python 2 the parentheses are left off for print (but only print, not other functions):
print "Hello World" # python 2
Also, Python 3 is more fussy about text encodings and the difference between bytes and strings.
Warning: Indentation is meaningful in python! Python does not use curly braces to mark blocks of code. Instead, indents (using spaces or tabs) are used to show the start and end of each block of code for loops, conditionals, etc. You need to be consistent: use only spaces, or only tabs, and make sure your code lines up in a nice column. It matters in python.
Hello: Create a file named "hello.py" containing the following:
# My first python program
print("Hello World!")

# We can do variables!
x = 2
y = 5*x + 3
if x > 0:
    y = y - 1
else:
    y = y + 1
print("The numbers are", x, "and", y)
mylist = [17.0, x, y, "hello", "goodbye" ]
print(mylist[3])
for v in mylist:
    print(v)
Running python programs: There are two ways to run a python program. Note: usually you must specify python3 when running your code, because plain python often defaults to python2. You can type this at the command line:
python3 hello.py
Linux/MacOS only: Or, you can tell the operating system that your "hello.py" file should be considered to be a program, by following these three steps:
Interactive python: A nice feature of interpreted languages like python is that you can work interactively. On the command line, type python3, then start typing python code. This is a nice way to test out little snippets of code and get a feel for the language.
More programs: Here are some simple programs. Python is very easy to read, so you can probably figure out what these do just by reading the code.
Python 3.x string vs bytes. Python 3.x is fussy about the difference. Basically, a string holds unicode text, but a bytes object holds raw data. You can encode() a string to get the raw bytes representing that text, or you can decode() some raw bytes to get back unicode text. Also, "hello" is a string, but b"hello" (with the "b" prefix) is a bytes object.
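A short sketch of the conversions described above follows. The variable names and sample text are made up for illustration. The ASCII encoding is an assumption that suits POP3 command lines; other data may call for a different encoding such as UTF-8.

```python
text = "USER skilling\r\n"        # a str: unicode text
data = text.encode("ascii")       # a bytes object, the form a socket sends

reply = b"+OK welcome\r\n"        # sockets hand back bytes, not str
line = reply.decode("ascii")      # back to a str for ordinary string operations

# Mixing the two types is an error in Python 3.
try:
    combined = "hello" + b"world"
    mixing_allowed = True
except TypeError:
    mixing_allowed = False
```

After this runs, data holds b"USER skilling\r\n", line is an ordinary string that starts with "+OK", and mixing_allowed is False because adding a str to a bytes object raises TypeError.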
Global variables: Creating a global variable is easy; just create the variable outside of any function, e.g. at the top of the program:
myvariable = 37
mylist = [ "Swords", "Fenwick", "Dinand" ]
When you are inside a function and want to modify some global variable (myvariable, for example, or mylist), you might need to put "global myvariable" or "global mylist" INSIDE each such function, just once near the top of the function. Like this:
def somefunction(...):
    global myvariable
    global mylist
    ...
    myvariable += 1
    ...
    mylist.append("whatever")
When you are inside a function and want to use but not modify some global variable, you can probably just go ahead and use it, it should work fine.
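The sketch below shows the three cases side by side, reusing the variable names from the example above; the function names are made up. It relies on standard Python scoping: the global declaration is required only when a function assigns to the global name, not when it reads the value or calls a method on the existing object.

```python
myvariable = 37
mylist = ["Swords", "Fenwick", "Dinand"]

def read_only():
    # Reading globals needs no declaration.
    return myvariable + len(mylist)

def rebind():
    # Assigning to a global name requires the declaration. Without it,
    # "myvariable += 1" would raise UnboundLocalError.
    global myvariable
    myvariable += 1

def mutate_in_place():
    # append() changes the existing list rather than assigning a new one,
    # so this works even without "global mylist".
    mylist.append("whatever")
```

Putting the global declaration in every function that modifies a global is still a reasonable habit, since it is harmless where it is not strictly needed.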
(1) There are surprisingly few public data sets containing realistic email messages to use for testing purposes in projects like this one. In fact, there is essentially only one: the Enron email archives, containing about 500,000 messages mostly from senior Enron management involved in a massive energy trading and fraud scandal. This trove of email was made public and posted to the web by the Federal Energy Regulatory Commission during its investigations. The user names and contents of the mailboxes used in this project are a tiny subset of that data, hosted here. I have not reviewed or filtered the contents of these emails in any way, and do not know what is in them. For purposes of this project, they are just convenient filler text that happens to be in the proper format for the POP3 protocol. Read them at your own discretion.
(2) There is no command in POP3 that can "undelete" a single message; there is only a command to reset all messages back to the "undeleted" state. So you'll need to implement this feature on the "client side". One way is for your client code to keep a list of message numbers that have been deleted (or an array of booleans, one for each message). Then, to "undelete" one specific message, you can remove it from your list (or modify that message's boolean in your array), then reset all messages back to "undeleted", then re-delete all the ones still in your list. There are other client-side approaches that would work too.
(3) POP3 servers do not have any notion of "old" or "new" message status. So you'll need to implement this feature on the "client side", without the server's help. One way is for your client code to maintain a list of messages that have been previously read. Or, maintain an array of booleans, one for each message.
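The client-side bookkeeping described in the last two notes could be sketched as follows. The class and method names are hypothetical, and sets stand in for the lists or boolean arrays mentioned above. The methods only return the command strings to send (DELE and RSET, both from RFC 1939) and do no network I/O themselves.

```python
class MailboxState:
    """Client-side record of which messages are deleted and which have been read."""

    def __init__(self, total):
        self.total = total
        self.deleted = set()   # message numbers marked for deletion this session
        self.read = set()      # message numbers the user has displayed

    def mark_read(self, number):
        self.read.add(number)

    def status(self, number):
        return "old" if number in self.read else "new"

    def delete(self, number):
        """Record a deletion and return the POP3 commands to send."""
        self.deleted.add(number)
        return ["DELE %d" % number]

    def undelete(self, number):
        """Forget one deletion, then reset and re-delete everything still marked."""
        self.deleted.discard(number)
        return ["RSET"] + ["DELE %d" % n for n in sorted(self.deleted)]
```

Following the example interaction, reading and deleting message 3, deleting message 4, and then undeleting message 3 yields the commands RSET and DELE 4, and message 3 still shows as "old". This sketch keeps the read set only in memory, so every message would look "new" again on the next run unless the client saves that information somewhere.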