Let’s create a basic program that we can run as a file on the command line. We’ll start with a basic framework using a main() function.
# file_exercise.py
def main():
    pass

if __name__ == "__main__":
    main()
Save your file as file_exercise.py and run it from the command line using python file_exercise.py.
What happened? Because you ran the file directly, the file’s __name__ variable is set to __main__, which triggers the if statement to run the main() function. This is a common pattern that you’ll see in Python programs, and it comes in handy for writing programs that work both on their own and when imported into other programs. The pass statement does nothing; it’s just there to keep the empty main() function from raising a syntax error.
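To see the difference between running a file directly and importing it, here is a minimal sketch. It writes the framework to a temporary folder and uses the standard library's runpy module to execute it under two different names. The ran_main flag is a hypothetical addition for this demonstration only; it is not part of the exercise file.

```python
import os
import runpy
import tempfile

# The same framework as the exercise, plus a flag (an assumption added
# for this demo) that records whether main() was called.
SOURCE = '''
ran_main = False

def main():
    global ran_main
    ran_main = True

if __name__ == "__main__":
    main()
'''

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "file_exercise.py")
    with open(path, "w") as source_file:
        source_file.write(SOURCE)

    # Simulates `python file_exercise.py`: __name__ is "__main__".
    as_script = runpy.run_path(path, run_name="__main__")
    # Simulates `import file_exercise`: __name__ is the module's name.
    as_import = runpy.run_path(path, run_name="file_exercise")

print(f"Run directly: main() ran = {as_script['ran_main']}")
print(f"Imported:     main() ran = {as_import['ran_main']}")
```

When the name is "__main__" the guard fires and main() runs; under any other name the file only defines its functions.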
Let’s start filling in our main() function. We have a JSON file named cities.json which contains the top five cities in the US, sorted by population. You can download cities.json here. Let’s open the file and load in the data.
# file_exercise.py
import json

def main():
    cities_file = open("cities.json")
    cities_data = json.load(cities_file)
    print(cities_data)

if __name__ == "__main__":
    main()
First, we imported the built-in json library to help us decode the JSON file. Then, we opened the file using the open() function, and passed the open file handle to the json.load() function. The load() function read our data in and returned it as a Python representation - in this case, a list of dictionaries. We then print this list.
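If you don't have cities.json handy, the following sketch shows the same idea with an in-memory file. The sample entries are made up for illustration (only the name and pop keys mirror the exercise), and io.StringIO stands in for the real file handle.

```python
import io
import json

# A stand-in for an open file; the entries below are invented sample data.
sample_file = io.StringIO(
    '[{"name": "Example City", "pop": 1000}, {"name": "Sample Town", "pop": 500}]'
)

cities_data = json.load(sample_file)

print(type(cities_data))       # the outer JSON array becomes a list
print(type(cities_data[0]))    # each JSON object becomes a dict
print(cities_data[0]["name"])  # values are looked up by key
```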
(env) $ python file_exercise.py
This list is a little hard to make sense of in its raw form, so let’s print it a little nicer. Use enumerate() to go through the list and print it nicely:
# file_exercise.py
import json

def main():
    cities_file = open("cities.json")
    cities_data = json.load(cities_file)
    print("Largest cities in the US by population:")
    for index, entry in enumerate(cities_data):
        print(f"{index + 1}: {entry['name']} - {entry['pop']}")

if __name__ == "__main__":
    main()
A few new things here: first, remember that enumerate() yields a tuple of (index, entry) for each item, so we use the index and entry variables to capture those. Then, for every item in the list, we print the index (+ 1, because zero-indexed lists are sometimes hard to read), and we pull the name and population out of each entry dictionary using the dictionary [] syntax.
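As an aside, enumerate() also accepts a start argument, which is an alternative to adding 1 by hand. A small sketch with made-up names:

```python
# Invented sample data standing in for the entries in cities.json.
cities = ["Example City", "Sample Town"]

# start=1 makes the counter begin at 1 instead of 0.
lines = [f"{rank}: {name}" for rank, name in enumerate(cities, start=1)]

for line in lines:
    print(line)
```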
(env) $ python file_exercise.py
One more thing to clean up - using the open() function on its own is frowned upon, because it won’t automatically close any resources you might open. Even if you call the close() method yourself, your program might crash before it gets there, leaving important resources dangling. It’s safer to open files inside a context using the with keyword. Once your code leaves the with block, your file is automatically closed. Note: our reading and formatting code has shifted to the right because it is now indented inside the with block.
# file_exercise.py
import json

def main():
    with open("cities.json") as cities_file:
        cities_data = json.load(cities_file)
        print("Largest cities in the US by population:")
        for index, entry in enumerate(cities_data):
            print(f"{index + 1}: {entry['name']} - {entry['pop']}")
    print("The file is now closed.")

if __name__ == "__main__":
    main()
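To convince yourself that the with block really closes the file even when something goes wrong, here is a minimal sketch. It uses a throwaway file in a temporary folder and a deliberately raised RuntimeError to simulate a crash; both are assumptions for the demo, not part of the exercise.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo.json")
    with open(path, "w") as new_file:
        new_file.write("[]")

    try:
        with open(path) as demo_file:
            # Simulate a crash while the file is still open.
            raise RuntimeError("simulated crash")
    except RuntimeError:
        pass

    # The with block closed the file on the way out, despite the error.
    closed_after_error = demo_file.closed

print(f"File closed after the error: {closed_after_error}")
```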
Parsing files - especially if you didn’t create them - is often tricky, and you’re going to have to deal with less-than-perfect data. For example, go into your cities.json file and delete the last ] character. Run your program again.
(env) $ python file_exercise.py
Helpfully, the library told you (on the last line) approximately what is wrong and where. It also provides a traceback to help you see what happened, starting with your main() function, which called json.load(cities_file), and continuing into the functions used internally by the json library. This will become more useful once you start writing your own libraries, so practice reading and understanding your tracebacks.
But let’s say we’re writing a web app or user-facing app and don’t want our users to see tracebacks (they can be scary if you’re not a programmer, and they can put your security at risk by leaking information about your software). Let’s catch that JSONDecodeError and return something prettier.
# file_exercise.py
import json

def main():
    with open("cities.json") as cities_file:
        try:
            cities_data = json.load(cities_file)
            print("Largest cities in the US by population:")
            for index, entry in enumerate(cities_data):
                print(f"{index + 1}: {entry['name']} - {entry['pop']}")
        except json.decoder.JSONDecodeError as error:
            print("Sorry, there was an error decoding that json file:")
            print(f"\t {error}")
    print("The file is now closed.")

if __name__ == "__main__":
    main()
Here, we’ve wrapped our business logic in another level of indentation - the try/except block. For the except, we reach into the json library and reference the JSONDecodeError that’s part of the decoder module. We assign it to error so that we can reference it later. We then print out the entire error, prefixed with a tab character \t to make it a little easier to read. Voilà, we’ve caught our error and reported it to the user with (hopefully) helpful information (but not too much information). Run your program again.
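You can reproduce the same kind of failure without touching a file. The sketch below feeds json.loads() a made-up string with the closing ] missing. It uses json.JSONDecodeError, which is the same exception class that is also reachable as json.decoder.JSONDecodeError.

```python
import json

# Invented sample data with the final "]" deliberately left off.
broken_json = '[{"name": "Example City", "pop": 1000}'

try:
    json.loads(broken_json)
    message = "Decoded without errors."
except json.JSONDecodeError as error:
    # The exception carries the position of the problem as attributes.
    message = (
        "Sorry, there was an error decoding that json "
        f"(line {error.lineno}, column {error.colno}):\n\t {error}"
    )

print(message)
```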
Let’s review what we learned today and put it all together.
For the final exercise of today, we’re going to write a small program that requests the top repositories from GitHub, ordered by the number of stars each repository has. Then we’re going to print the results to our terminal. Create a new file called day_one.py.
You may need to install the requests library using python -m pip install requests. You may see pip used directly, but running it as python -m pip is the form recommended by the Python documentation.
Let’s start with our key function, the one that gets the data from the GitHub API. Use the requests library to do a GET request on the GitHub search API URL ("https://api.github.com/search/repositories"). Use if __name__ == "__main__" to make sure we’re running the file directly, and to call our function. Don’t forget to import requests.
Run your exercise:
(env) $ python day_one.py
Looks like we got a response from the GitHub API, but it reports an error: we’re missing a search parameter. Checking the documentation_url that GitHub helpfully provides, we can see that we’re missing the parameter q, which contains search keywords. Let’s hardcode a query string to find repos with more than 50,000 stars and try again. We’ll add our query string to the parameters dict as q, and pass it to the params argument of requests.get().
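To get a feel for what passing a params dict does, here is a rough sketch that builds the URL by hand with the standard library's urlencode() instead of requests, so it runs without a network connection. The requests library handles this step for you; the exact encoding it produces may differ in small details, so treat the output as an approximation.

```python
from urllib.parse import urlencode

gh_api_repo_search_url = "https://api.github.com/search/repositories"

# The hardcoded query: repos with more than 50,000 stars.
parameters = {"q": "stars:>50000"}

# urlencode() percent-encodes special characters such as ":" and ">".
query_string = urlencode(parameters)
request_url = f"{gh_api_repo_search_url}?{query_string}"

print(request_url)
```

The percent-encoded form (%3A for the colon, %3E for the greater-than sign) means the same thing to the server as the unencoded text.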
Whoa, we got a huge response from GitHub, including metadata for 33 repos. Let’s parse it out so we can make better sense of what we have - use response.json() to decode the returned JSON into Python data. We see that GitHub returns a list called items in our response, so let’s return that. Then, in your main function, loop through it and print out the important bits.
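Here is a sketch of the looping-and-printing step that runs offline. The sample_response dictionary is invented and heavily trimmed; it only imitates the shape of what response.json() returns. The summarize_repos() helper is a hypothetical name for this illustration, while name, language, and stargazers_count are fields GitHub includes for each repository.

```python
# Invented, trimmed-down stand-in for the decoded GitHub response.
sample_response = {
    "total_count": 2,
    "items": [
        {"name": "example-repo", "language": "Python", "stargazers_count": 60000},
        {"name": "sample-repo", "language": "Ruby", "stargazers_count": 55000},
    ],
}


def summarize_repos(response_data):
    """Return one readable line per repository in the "items" list."""
    lines = []
    for repo in response_data["items"]:
        lines.append(
            f"-> {repo['name']} is a {repo['language']} repo "
            f"with {repo['stargazers_count']} stars."
        )
    return lines


for line in summarize_repos(sample_response):
    print(line)
```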
We should now have a much more readable list of 33 or so repos, along with their number of stars. Let’s narrow down our search a bit. To use multiple search keywords, we’ll have to programmatically construct our query string. Using the GitHub API documentation, let’s make a new function to construct a query string for the repository search endpoint that searches for any number of languages, and limits our query to repos with more than 50,000 stars.
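One possible shape for such a function is sketched below. The min_stars keyword argument is an assumption added for flexibility, and this version joins the search terms with single spaces (which show up as + once the query is URL-encoded), so it does not leave a trailing separator.

```python
def create_query(languages, min_stars=50000):
    """Build a GitHub search query for the given languages.

    A sketch: `min_stars` is a hypothetical parameter, defaulting to the
    50,000-star threshold used in this exercise.
    """
    terms = [f"stars:>{min_stars}"]
    terms.extend(f"language:{language}" for language in languages)
    # Search terms are separated by spaces in the query value.
    return " ".join(terms)


print(create_query(["python", "javascript", "ruby"]))
```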
Now, let’s call our new create_query() function from repos_with_most_stars(), replacing our hardcoded query string. Add a languages argument so that we can pass in a list of languages to use to create our query. Also add sort and order parameters, which we’ll hardcode to "stars" and "desc" for now.
Finally, let’s add a languages list to limit which languages we’re interested in, and pass it to repos_with_most_stars(). Now, when we call our repos_with_most_stars() function with ["python", "javascript", "ruby"] as our languages, the create_query() function will create the q portion of a query string that looks like q=stars:>50000+language:python+language:javascript+language:ruby+&sort=stars&order=desc. Because this is a simple GET request, this gets appended to our gh_api_repo_search_url, so our actual request URL is https://api.github.com/search/repositories?q=stars:>50000+language:python+language:javascript+language:ruby+&sort=stars&order=desc.
Run your program.
Looking good, we now have a sorted list of the top Python, JavaScript, and Ruby repos. Let’s do a little bit of cleanup and error handling. We might not always want to sort by "stars" or order by "desc", so move those to keyword arguments. That way, they’ll be good defaults, but if someone calling our repos_with_most_stars() function wants to override them, they can.
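The keyword-argument idea can be sketched without making a request. Below, build_search_parameters() is a hypothetical helper that only assembles the parameters dict; in the exercise this logic would live inside repos_with_most_stars(). The create_query() shown here is likewise a minimal stand-in.

```python
def create_query(languages, min_stars=50000):
    """Minimal stand-in for the query builder from the exercise."""
    terms = [f"stars:>{min_stars}"]
    terms.extend(f"language:{language}" for language in languages)
    return " ".join(terms)


def build_search_parameters(languages, sort="stars", order="desc"):
    """Hypothetical helper: sort and order have defaults but can be overridden."""
    return {"q": create_query(languages), "sort": sort, "order": order}


# Callers who are happy with the defaults leave them out...
default_parameters = build_search_parameters(["python"])
# ...and callers who are not can override them by name.
custom_parameters = build_search_parameters(["python"], sort="forks", order="asc")

print(default_parameters)
print(custom_parameters)
```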
Next, we should handle any errors we might run into with the API. Maybe you’ve gotten one already. Let’s add some basic error handling on the response’s HTTP status code. We’ll check for a 403, a common error that GitHub uses to tell you that you’re hitting their API too quickly, and raise an error. We’ll also raise an error if the status code is anything but 200 (success).
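A minimal sketch of that status check follows. It works on a plain integer so it can run offline; in the exercise you would pass it response.status_code. The GitHubApiException class, the check_status() helper, and the message wording are all hypothetical choices for this illustration.

```python
class GitHubApiException(Exception):
    """Hypothetical error type raised when the API call does not succeed."""

    def __init__(self, status_code):
        if status_code == 403:
            message = "Rate limit reached. Please wait a minute and try again."
        else:
            message = f"HTTP Status Code was: {status_code}."
        super().__init__(message)
        self.status_code = status_code


def check_status(status_code):
    """Raise GitHubApiException for anything other than 200 (success)."""
    if status_code != 200:
        raise GitHubApiException(status_code)


check_status(200)  # success: nothing happens

try:
    check_status(403)
except GitHubApiException as error:
    print(f"Caught an error: {error}")
```

Putting the message logic inside the exception keeps the calling code short: it only needs to compare the status code with 200.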
There, your code should do the same thing, but should handle errors much better. The final code can be found below.
The Python standard library has a huge number of packages - no matter what you want to do, there’s probably a package included. Let’s practice using some of the more common ones. Create a new file and use the os module to see if you can get a file listing for the folder your new file is in.
# libraries_exercise.py
import os

my_folder = os.getcwd()
print(f"Here are the files in {my_folder}:")
with os.scandir(my_folder) as folder:
    for entry in folder:
        print(f" - {entry.name}")
(env) $ python libraries_exercise.py
sys is another commonly useful library, giving you access to some variables and functions used or maintained by the Python interpreter. Let’s try using sys to get the arguments passed into our program from the command line, and to figure out what kind of computer we’re using:
# libraries_exercise.py
import sys

arguments = sys.argv
print("We received the following arguments:")
for arg in arguments:
    print(f" - {arg}")
print(f"We are running on a '{sys.platform}' machine")
PyPI (the Python Package Index) is an awesome service that helps you find and install almost any third-party Python package. You can browse the site at PyPI.org but most of the time you will probably interact with it through Python’s pip command line tool.
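For example, earlier you may have installed the requests module. If you search pip for requests, you’ll see every package in the index containing the word requests. Note that PyPI has since disabled the search interface that pip search relies on, so with a current pip this command reports an error instead; in that case, search on PyPI.org directly.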
(env) $ python -m pip search requests
requests-hawk (1.0.0) - requests-hawk
requests-dump (0.1.3) - `requests-dump` provides hook functions for requests.
requests-aws4auth (0.9) - AWS4 authentication for Requests
...
We just want the one named requests, so we’ll install it with the install command. If you don’t have it installed, pip will install it for you. If you installed it earlier, you’ll see something like this:
(env) $ python -m pip install requests
Requirement already satisfied: requests in /usr/local/lib/python3.7/site-packages (2.21.0)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /usr/local/lib/python3.7/site-packages (from requests) (3.0.4)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/site-packages (from requests) (2019.3.9)
Requirement already satisfied: idna<2.9,>=2.5 in /usr/local/lib/python3.7/site-packages (from requests) (2.8)
Requirement already satisfied: urllib3<1.25,>=1.21.1 in /usr/local/lib/python3.7/site-packages (from requests) (1.24.1)
Python comes with a very easy-to-use unittest library built in. Write a simple function that accepts two numbers, and returns True if the first number is evenly divisible by the second.
# divisible.py
def divisible_by(check_number, divisor):
    return check_number % divisor == 0
Save your file as divisible.py. In a second file called test_divisible.py, create a TestCase using the unittest framework and use asserts to verify that the divisible_by() function returns the correct result. Don’t forget to import your divisible_by() function.
# test_divisible.py
import unittest
from divisible import divisible_by

class TestCase(unittest.TestCase):
    def test_divisible_by(self):
        self.assertTrue(divisible_by(10, 2))
        self.assertTrue(divisible_by(10, 3))

if __name__ == '__main__':
    unittest.main()
Save the file as test_divisible.py and run it:
(env) $ python test_divisible.py --verbose
You should have gotten an error: AssertionError: False is not true, caused by self.assertTrue(divisible_by(10, 3)). Makes sense, because 10 is not evenly divisible by 3.
Change the second self.assertTrue to self.assertFalse and your test should pass.
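For reference, here is a single-file sketch of the corrected test. To keep it self-contained, divisible_by() is defined inline rather than imported from divisible.py, and the test is run through a TextTestRunner instead of unittest.main(), which is an assumption made so the snippet can be pasted into an interactive session without exiting it.

```python
import unittest


def divisible_by(check_number, divisor):
    return check_number % divisor == 0


class TestCase(unittest.TestCase):
    def test_divisible_by(self):
        self.assertTrue(divisible_by(10, 2))
        # 10 is not evenly divisible by 3, so we expect False here.
        self.assertFalse(divisible_by(10, 3))


# Load and run the tests explicitly instead of calling unittest.main().
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestCase)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```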
(env) $ python test_divisible.py --verbose
test_divisible_by (__main__.TestCase) ... ok
----------------------------------------------------------------------
Ran 1 test in 0.000s
OK