CSCI 356 / Fall 2024
Computer Networking
Some people want more than just a little interactivity in their web pages. They want fully dynamic, multi-party interactive "web applications". To support this, a common approach is to build a client-side application as a set of html and javascript pages that run in the browser. These make HTTP requests to a custom application-specific webserver that processes those requests. The requests and responses are formatted and processed as standard HTTP messages, but the contents of those requests essentially form an entirely new protocol running "on top of HTTP" (or "inside HTTP"), piggy-backing on top of the existing HTTP infrastructure.
In this project you will implement whisper, an interactive, multi-party web app, loosely inspired by micro-blogging and social media platforms like Twitter/X or TikTok. The client side of the web app is provided, written in a combination of html/css and javascript. Your task is to implement the server side of the application in Python. Most web developers would draw from existing libraries, frameworks, etc., but we will do everything from scratch.
This project can be done individually or in teams of two. You'll build off of your webserver.py from the last project. If you believe your webserver.py is broken beyond repair, stop by office hours and we can fix it together. In office hours I can also provide enough code to get it working to the level needed for this project. As long as mime types and the basic functionality work, you should be fine.
Code for the project can be found on GitHub Classroom, using this invitation: https://classroom.github.com/a/SoNd3kPN
If working as a team, make sure to create a single team GitHub repository, rather than two separate ones.
Regarding online sources: Same policy as always. Be sure to cite and fully understand any code you find online while working on this project.
Ultimately, whisper should allow anyone to post messages (anonymously, for now), and to view messages they or anyone else has posted. Messages are organized by topic (aka hashtag). Along the left side is a list of topics that have been mentioned in messages. So if someone posts "Anyone do #StudyAbroad last year? #travel #australia", then the sidebar would show "#StudyAbroad", "#travel", and "#australia", along with other topics that have been mentioned in earlier messages. You can select one of the sidebar topics to see all messages mentioning that topic. You can't "like" individual messages, but instead there is a "Like Topic" button, and everyone can see how many "likes" each topic has gotten. As a reach goal, you can downvote or censor specific messages.
Caution: In the sidebar, and in messages, topics are shown with a leading "#". But within the back end client-server protocol, topics are usually encoded without the leading "#". This saves a little space, and avoids URL-encoding annoyances, since an unescaped "#" in a URL marks the start of the fragment and so can't appear in the path as-is.
Your server will need to store topics, messages, and other data in global variables. Soon, you'll need to implement careful locking and synchronization for these variables, but initially, just make whatever variables you need. You can organize this data however you see fit, e.g. using classes, arrays, tuples, strings, etc. You'll need to keep track of:
It's helpful to initialize some of these variables with some default data. Initialize the topics list to contain one or more default topics, e.g. "whatever", "holycross", "music". You can include the "#" prefix if you like, e.g. "#holycross", or not. The like-count and message-count for those default topics should also be initialized. All version numbers should be 0 initially, and the message counts should be zero, unless you have any default initial messages.
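One possible way to set up this initial state is sketched below. The layout (a dict of per-topic records) and every name in it, such as `DEFAULT_TOPICS`, `topics`, and `topics_version`, are assumptions made for illustration; any organization that tracks the same information is fine. Locking is left out here and comes later.

```python
# Hypothetical layout: topic names are stored without the leading "#".
DEFAULT_TOPICS = ["whatever", "holycross", "music"]

# Version number of the topic list as a whole.
topics_version = 0

# One record per topic. The field names are illustrative, not required.
topics = {
    name: {
        "likes": 0,          # like-count shown in the sidebar
        "message_count": 0,  # message-count shown in the sidebar
        "version": 0,        # version number of this topic's message feed
        "messages": [],      # message texts, oldest first
    }
    for name in DEFAULT_TOPICS
}
```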
One of the first things the whisper client javascript code does, running in the
browser, is to GET /whisper/topics?version=0. Modify the server to respond to
this HTTP request. Your server should send back a "200 OK" HTTP response with a
"text/plain" response body formatted like this:
    0\n
    7 2 whatever\n
    15 6 holycross\n
The first line is the topic list version number ("0" in this example), ending in a newline character. Each other line contains information about each topic. There are 7 messages and 2 likes for #whatever, and there are 15 messages and 6 likes for #holycross. At first, you'll probably have mostly zeros everywhere, since no messages or likes have been posted yet. Try to follow the format exactly: a number, a newline, two integers and a string separated by spaces, another newline, etc.
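As a rough sketch of that format, the helper below builds the response body from a version number and a list of topics. The function name `format_topic_list` and the `(name, message_count, like_count)` tuple layout are assumptions for illustration only.

```python
def format_topic_list(version, topics):
    """Build the text/plain body for GET /whisper/topics.

    `topics` is assumed to be a list of (name, message_count, like_count)
    tuples, with names stored without the leading "#".
    """
    lines = [f"{version}\n"]
    for name, message_count, like_count in topics:
        lines.append(f"{message_count} {like_count} {name}\n")
    return "".join(lines)


body = format_topic_list(0, [("whatever", 7, 2), ("holycross", 15, 6)])
```

Here `body` holds the same three lines as the example above: the version, then one line per topic.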
Very soon after sending the first HTTP request, the whisper client will do GET /whisper/topics?version=1 (or some number other than 0). For now, your server's worker thread should just go into an infinite loop and never return any HTTP response at all. That's right: your code in this case should not return an error, or do anything at all for now -- it should deliberately go into an infinite loop, e.g.
while True: pass
Locking and synchronization: Because multiple browsers may be requesting the topics list simultaneously, all access to the global variables containing the whisper data must be done with special care, using a lock. See notes on concurrency and synchronization below.
Testing: At this point, you should be able to run your server, open the whisper client in a browser, and you should see your default topics listed in the left sidebar. You should be able to have multiple browser windows open, and all of them should see the same list of topics appear.
The whisper client has a button to send a message (like posting a tweet, or posting a comment on a forum). When the user clicks it, the whisper client javascript will send an HTTP POST request to /whisper/messages containing the message and the list of topics mentioned in the message. Modify the server to handle "POST /whisper/messages", adding the message to each of the topics mentioned, and increasing the message count for each of those topics. If one of the topics from the HTTP request doesn't yet exist in your list of topics, you should add it along with the associated like-count, message-count, etc. Send back an HTTP "200 OK" response with a "text/plain" body containing just the word "success" in lowercase without any newlines, spaces, etc. Anything else will be considered an error by the client.
Locking and synchronization: As before, you must use a lock while accessing any of the whisper global variable data, since multiple threads will be trying to access the data at the same time. I suggest using one lock to protect the main "topics list" and its associated data, and another lock for each topic to protect the message list for that topic.
Testing: You can test this feature now in the browser, and check the python console output to see if it works. But the web page won't show anything new: the message won't appear yet, and the topic list won't update automatically either, unless you refresh the page.
POST Parameters: The message text, and accompanying list of topics mentioned in the message being posted, is *NOT* encoded into the POST request URL path. It is in the POST request payload (the body), in text/plain format. Try printing req.path and req.body to see what I mean. HTTP allows clients to put parameters in the URL path or to put arbitrary data into the payload, and for POST /whisper/messages, I chose to put the message into the payload rather than the URL path. For the interactive hello page you made in the last project, the parameter was in the URL path instead. The format of the payload will be like this:
    tags... holycross music coding\n
    message... Anyone going to #coding #music concert at #holycross?\n
There will be two lines of text. The first will be "tags..." then a space, then a list of space-separated topics, then a newline. The second line will be "message..." then a space, then the message text, then a newline.
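A minimal sketch of parsing this payload follows. The function name `parse_message_payload` is hypothetical, and the code accepts either `str` or `bytes` because the type of `req.body` in your webserver.py is not assumed here. It returns `None` for a malformed payload so the caller can send back an HTTP error.

```python
def parse_message_payload(body):
    """Split a POST /whisper/messages payload into (tags, message).

    Returns None if the payload does not match the two-line format.
    """
    if isinstance(body, bytes):
        body = body.decode("utf-8", errors="replace")
    lines = body.split("\n")
    if len(lines) < 2:
        return None
    tag_line, message_line = lines[0], lines[1]
    if not tag_line.startswith("tags..."):
        return None
    if not message_line.startswith("message..."):
        return None
    tags = tag_line[len("tags..."):].split()             # space-separated topics
    message = message_line[len("message..."):].strip()   # the message text
    return tags, message
```

Whether an empty tag list or an empty message should count as an error is a design choice the sketch leaves to the caller.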
Error handling: For this and all other requests, if anything goes wrong, you should send back an appropriate HTTP error, or supply a reasonable default value. For example:
The whisper client javascript code, running in the browser, needs to discover any new topics as soon as they appear in the server. A terrible, naive way to do this is to "poll" the server: the client could re-request the topic list over and over again, making repeated HTTP "GET /whisper/topics" requests, monitoring the responses to see when the list returned from the server changes. This is wasteful and considered bad form. Instead, whisper client and whisper server will use version numbers to help synchronize changes.
The initial topic list is version 0, containing only the default topics the server starts with. This is the version the client initially requested (GET /whisper/topics?version=0), and this is what your server initially returns (the "0" on the first line of the response). Now implement the rest of the versioning protocol. The idea is that if the client requests version X, the server should wait until the version number is at least X before replying.
Waiting in python: See below on wait/notify_all to learn how to wait for a variable to change in python. Conceptually, you want something like:
    while topic_list_version_number < X:
        wait for that variable to change
Testing: Test your app. The sidebar list of topics should update immediately when new messages are posted (though the messages themselves will still not show up on the page). This should work even across browsers or browser windows: if you post a message in one browser window mentioning some new topic, the other window should immediately react, showing the new topics mentioned, without the user needing to refresh the page.
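The requested version X arrives as a query parameter on the request path. One way to extract it by hand is sketched below; the helper name `requested_version` and the choice to fall back to a default of 0 (rather than returning an error) are assumptions, not requirements.

```python
def requested_version(path, default=0):
    """Extract the integer X from a path like '/whisper/topics?version=X'.

    Falls back to `default` if the parameter is missing or not an integer.
    """
    _, _, query = path.partition("?")
    for pair in query.split("&"):
        key, _, value = pair.partition("=")
        if key == "version":
            try:
                return int(value)
            except ValueError:
                return default
    return default
```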
Your server should keep track of all messages that have been posted so far, either one list per topic, or one large combined list. You should have variables to keep track of a version number for each topic.
In response to a "GET /whisper/feed/TOPIC?version=X" request, where TOPIC is a topic string like "holycross" or "coding", and X is an integer, your server should wait until the version for that topic is at least X, then return a list of messages for that topic. The response should have HTTP status "200 OK" with a text/plain body like this example, which is for the "coding" topic:
    3\n
    - Anyone going to #coding #music concert at #holycross?\n
    - Competitive #coding contest soon!\n
    - Got a #coding interview today. #fun\n
As always, try to follow the format precisely. The first line contains the version number for this topic (which must be X or higher, where X comes from the URL path), and a newline. The remaining lines each must contain a dash, a space, a message, and a newline. For now, the version number will also match the number of messages, since each message POST adds one to the version number, but that won't always be the case. If messages get deleted (see reach goal below), for example, that would increment the version number, but decrease the number of messages.
As with the topic list versioning, here the server should wait until the version of the requested topic is at least X, before sending back a response containing the message feed for that topic.
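The two mechanical pieces of this request, pulling TOPIC out of the path and formatting the feed, might look like the sketch below. Both function names are hypothetical, and the waiting step is deliberately left out.

```python
def feed_topic_from_path(path):
    """Extract TOPIC from '/whisper/feed/TOPIC?version=X'.

    Returns None if the path is malformed or the topic is empty.
    """
    prefix = "/whisper/feed/"
    if not path.startswith(prefix):
        return None
    topic = path[len(prefix):].partition("?")[0]
    return topic or None


def format_feed(version, messages):
    """Build the text/plain body: a version line, then '- message' lines."""
    return f"{version}\n" + "".join(f"- {text}\n" for text in messages)
```

A `None` result from `feed_topic_from_path` is a natural place to send "400 BAD REQUEST".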
Error checking: As always, perform some basic error checking. If anything goes wrong, send an HTTP error response (e.g. "404 NOT FOUND" if the requested topic doesn't exist, or "400 BAD REQUEST" if the request was malformed).
Testing: You should be able to now select a topic in the sidebar and see all messages that mention that topic. If you post a new message mentioning that topic, the new message should instantly appear in the list. This should work regardless of whether you post from the same browser window or from a different browser window.
The web app has a button to "like" the currently viewed topic. When this button is pressed, the client will POST to /whisper/like/TOPIC (where TOPIC is a string like "holycross" or "coding"). Your server should increment the "like" counter for the corresponding topic, increment the version number for the topics list, and return "200 OK" with a text/plain response body containing just the word "success" and nothing else.
Error checking: As always, do reasonable error checking and send back appropriate error responses.
Testing: The "like" button, and all other features, should now work. You should see the "like counts" appear in the sidebar, instantly, whenever a topic is "liked".
Sort topics by popularity: When replying to GET /whisper/topics, sort the list in some way. You might sort by most upvoted topic, or most posts, or by most recent activity, for example.
Limit by recency: Don't store *all* messages posted to a topic. Instead, only keep the most recent messages. You might only keep the most recent 10 messages, for example, deleting old messages as newer ones are posted. Or you might keep a timestamp for each message, and delete any message more than N minutes old.
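For the count-based variant, Python's `collections.deque` with a `maxlen` discards the oldest entry automatically once the limit is reached. This is only one way to do it; the cap of 10 and the name `MAX_MESSAGES` are assumptions.

```python
from collections import deque

MAX_MESSAGES = 10  # hypothetical cap on messages kept per topic

recent = deque(maxlen=MAX_MESSAGES)
for i in range(1, 13):
    # Once the deque is full, each append silently drops the oldest message.
    recent.append(f"message {i}")

oldest_kept = recent[0]
newest_kept = recent[-1]
```

After twelve appends, only messages 3 through 12 remain.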
Downvotes: The client already has code for a per-message downvote feature. For every message posted to your server, assign some kind of "ID". Your ID can be numeric, or a combination of words and numbers, etc., but should not have any spaces. When replying to GET /whisper/feed/TOPIC requests, rather than using a dash on each line, include the message ID at the start of each line. For example:
    3\n
    75 Anyone going to #coding #music concert at #holycross?\n
    119 Competitive #coding contest soon!\n
    47 Got a #coding interview today. #prayers\n
The numbers 75, 119, and 47 are the message IDs for those messages. When the user clicks a downvote button, the client will POST to /whisper/downvote/MSGID (where MSGID is one of the IDs from the server's list). Your server should respond by removing the corresponding message, or changing the message text to something like "(This message has been removed.)", and also incrementing the version number for that topic, then return "200 OK" with a text/plain response body "success". (Hint: For the message IDs, you could use something like "holycross.7" to represent the 7th message in the list for the holycross topic. Or "5:2" to represent 5th topic, 2nd message. Or pick a large random number when a message is posted, and keep track of them somehow. Or sequential numeric IDs assigned when messages are posted. It's up to you.)
Other server-side or client-side features: Implement any other feature, e.g. using cookies or anything else you can think of. The above features only require server-side python code, and there are many other features one could add by only changing the server. If you are familiar with html/css/javascript, you may modify the client as well. You could add a username feature, for example, tagging each message with the name of who posted it. Or store messages in a file, so they persist even after the server is restarted.
It's fine to use global variables to keep the list of topics, the lists (or list) of messages, the version numbers and like-counts, etc. It is good practice to use the "global" keyword when using global variables.
When you are inside a function and want to use but not modify some global variable, you can probably just go ahead and use it without any "global" declaration, and it should work fine. But if you forget the "global" keyword inside some function, and that function tries to assign to the variable, Python will treat the name as a second, local variable instead (and if the function reads the variable before assigning it, as in b = b + 1, it will fail with an UnboundLocalError). This leads to subtle bugs. Moral of the story: use the global keyword for global variables.
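A tiny demonstration of the difference, using a made-up variable named `counter`:

```python
counter = 0


def bump_wrong():
    counter = 1            # no "global": this creates a separate local variable


def bump_broken():
    counter = counter + 1  # no "global": reading the local before it is set fails


def bump_right():
    global counter
    counter = counter + 1  # modifies the real global variable


bump_wrong()
after_wrong = counter      # the global was never touched

try:
    bump_broken()
    broken_raised = False
except UnboundLocalError:
    broken_raised = True

bump_right()
after_right = counter
```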
IMPORTANT: You need to use locks and the wait/notify_all syntax below, otherwise this project won't work properly.
There will be multiple simultaneous connections and requests, even if you only have one browser window open. Your server will be processing multiple requests at the same time. This is normal, and necessary. For example, suppose the topic list is at version 4. While one browser might have made a request to GET /whisper/topics?version=5, this request will be on hold, waiting, while some other request from a different browser to POST /whisper/messages might be in progress, and yet other requests from other browsers might be happening as well to "like" a topic. Global variables cause trouble in this situation. To avoid trouble, you must use Python's threading.Condition() objects. Here's how.
Suppose we have a few global variables that are all related to each other in some way.
    a = [ "Foo", "Bar" ]
    b = 17
    c = "Something else"
Make another global variable to accompany them, like this:
    # the updates variable is a kind of lock, meant to protect a, b, and c
    updates = threading.Condition()
Now, whenever any piece of code ever tries to access a, b, or c, it should always do so inside of a "with updates" block, like this:
    ... some code ...             # danger zone: don't ever use a, b, or c out here
    with updates:
        a.append("Hi mom")        # this line is within the safe zone
        b = b + 1                 # this line is within the safe zone
        c = "Nevermind"           # this line is within the safe zone
    ... more code here ...        # danger zone: don't ever use a, b, or c out here
What the "with updates:" block accomplishes: Even though many different threads may be running, executing many different parts of your code at the same time, Python will ensure that only one thread at a time will ever be within any "with updates:" block. Essentially, it makes your code take turns executing the "with updates:" blocks. The "updates" variable acts like a lock, where only one thread can open (or "hold") the lock at a time.
Example: The webserver.py code from the last project has a variable named stats.lock, which is used to protect all of the server statistics variables. It works exactly as described above.
You will need an "updates" variable like this to protect the global topic list and its version number, like counts, etc. And you'll need an additional "updates" variable like this for each topic to protect the message list associated with the topic and the matching version number.
wait/notify_all: In a few places, your code needs to wait for some future event to happen. The "updates" variable described above provides this ability. For example, to wait for variable b to be 10 or higher, we can do:
    ... some code ...             # danger zone: don't ever use a, b, or c out here
    with updates:
        ...                       # this line is within the safe zone
        while b < 10:             # repeatedly checks if b is less than 10
            updates.wait()        # if so, wait until notified that something changed
        ...                       # this line is within the safe zone
    ... more code here ...        # don't ever use a, b, or c out here
That's half the code. To make it work, we need to add a line of code wherever b is modified:
    ... some code ...             # danger zone: don't ever use a, b, or c out here
    with updates:
        ...                       # this line is within the safe zone
        b = b + 1                 # this changes b
        updates.notify_all()      # this notifies everyone that something changed
        ...                       # this line is within the safe zone
    ... more code here ...        # don't ever use a, b, or c out here
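Putting the two halves together, the following self-contained sketch runs one waiting thread and one modifying thread. The function names `waiter` and `incrementer`, the list `seen`, and the starting value of `b` are made up for the demonstration.

```python
import threading

updates = threading.Condition()  # protects b
b = 7
seen = []                        # records the value the waiter finally observes


def waiter():
    """Block until b is 10 or higher, then record its value."""
    with updates:
        while b < 10:
            updates.wait()       # releases the lock while sleeping
        seen.append(b)


def incrementer():
    """Raise b three times, notifying waiters after each change."""
    global b
    for _ in range(3):
        with updates:
            b = b + 1
            updates.notify_all()


thread = threading.Thread(target=waiter)
thread.start()
incrementer()
thread.join(timeout=5)
```

The waiter wakes after each notification, re-checks the condition, and goes back to waiting until b actually reaches 10. This is why the check is a `while` loop rather than an `if`.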
Push your code to your github repository on the master branch. Be sure to include your "webserver.py" and any other files needed.
Collaboration log: Please add a collaboration log, either as a collaboration.txt file or within your webserver.py or README.md file.