CSCI 307
Python resources
Anaconda is a great distribution for getting started with Python for data mining applications: it comes pre-installed with a number of useful libraries and has a good library management system. For people more familiar with C++/Java, these resources can be useful.
The first library we will study is NumPy, and the official documentation is quite good. Beyond that, I quite like this chapter from the book "Python for Data Analysis" by Wes McKinney.
Next we move on to Pandas, which is great for reading structured tabular data, such as .csv files. It also offers a lot of functionality for data analysis and works seamlessly with NumPy, matplotlib (which we'll see later), and other libraries.
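To make the Pandas and NumPy interplay described above concrete, here is a minimal sketch. The CSV contents and column names (`name`, `height_cm`, `weight_kg`) are made up for illustration and are not course data; in practice you would pass a file path to `pd.read_csv` instead of an in-memory string.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV contents (assumption: not from the course materials).
csv_text = """name,height_cm,weight_kg
alice,165.0,60.0
bob,180.0,82.5
carol,172.5,68.0
"""

# read_csv accepts a path or any file-like object; StringIO stands in for a file.
df = pd.read_csv(io.StringIO(csv_text))

# Select the numeric columns and hand them to NumPy as a 2-D array.
numeric_cols = ["height_cm", "weight_kg"]
values = df[numeric_cols].to_numpy()

# Column-wise means computed by NumPy, then the same quantity computed by pandas.
col_means = values.mean(axis=0)
pandas_means = df[numeric_cols].mean().to_numpy()

print(values.shape)
print(col_means)
print(np.allclose(col_means, pandas_means))
```

The point of the sketch is only that a DataFrame column selection converts to a NumPy array with `to_numpy()`, so anything learned in the NumPy part of the course carries over directly.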
The primary plotting library we will be using is matplotlib.pyplot. There's a very short tutorial here that gives you lots of examples. Feel free to find the scatter plots on that page and compare them to what you do in HW2.

Math Resources
Matrix and Linear Algebra
Principal Component Analysis
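As a rough companion to the linear algebra and PCA resources listed above, the following sketch computes principal components with NumPy's SVD and draws the projected points as a scatter plot with matplotlib.pyplot. The data are synthetic, and the variable names and the output file name `pca_scatter.png` are assumptions chosen for illustration; this is one common way to compute PCA, not necessarily the formulation used in the linked material or in the homework.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data (assumption): 200 points in 3-D that mostly vary along 2 directions.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = np.array([[2.0, 0.5, 1.0],
                   [0.0, 1.0, -1.0]])
X = latent @ mixing + 0.05 * rng.normal(size=(200, 3))

# Step 1: center each column so the data has zero mean.
X_centered = X - X.mean(axis=0)

# Step 2: SVD of the centered data; the rows of Vt are the principal directions.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Step 3: variance explained by each component (sample variance, n - 1 denominator).
explained_variance = S**2 / (X.shape[0] - 1)
explained_ratio = explained_variance / explained_variance.sum()

# Step 4: project onto the first two principal directions.
Z = X_centered @ Vt[:2].T

# Scatter plot of the projected points.
fig, ax = plt.subplots()
ax.scatter(Z[:, 0], Z[:, 1], s=10)
ax.set_xlabel("PC 1")
ax.set_ylabel("PC 2")
ax.set_title("Data projected onto the first two principal components")
fig.savefig("pca_scatter.png")  # hypothetical output file name

print(explained_ratio)
```

Because the third direction in this synthetic data is almost pure noise, the first two entries of `explained_ratio` should account for nearly all of the variance.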