More Details of the Requests Module
Once we run the get function in the requests module, AKA requests.get, we get a response object. It’s an instance of a class called Response that is defined in the requests module. You can think of it as analogous to the Turtle class. Each Response has some attributes, just like each Turtle had some attributes; different responses have different values for the same attribute. All responses can also invoke certain methods that work for things of type Response.
Previously, we saw that a response object has an attribute (AKA an “instance variable”) .text, which contains the contents of the page you are requesting: the stuff after all the HTTP headers.
Response objects have some other useful attributes and methods that we can access as well. A few are used and explained below. Others will be introduced in other chapters.
The get function inside the requests module requires one input: a URL.
There are also additional optional inputs, which you’ll see later in more detail.
Essentially, the function reaches out to the place on the internet that the URL specifies and retrieves everything that lives there, so that you can deal with it in some way in your program. A programmer might care about the text content of the page, as we just saw. A programmer might also care about other information attached to that web page, some of which most people who use the internet never notice.
import requests

page1 = requests.get("https://github.com/RunestoneInteractive/runestoneserver")
page2 = requests.get("https://github.com/RunestoneInteractive/nonsense")
page3 = requests.get("http://github.com/RunestoneInteractive/runestoneserver")

for p in [page1, page2, page3]:
    print("********")
    print("url:", p.url)
    print("status:", p.status_code)
    print("content type:", p.headers['Content-type'])
    # Only show a snippet when the page is long enough to contain one
    if len(p.text) > 1040:
        print("content snippet:", p.text[1000:1040])
    # A non-empty history means at least one redirect happened
    if len(p.history) > 0:
        print("redirection history")
        for h in p.history:
            print("  ", h.url, h.status_code)
Here’s the output that is produced when I run that code.
$ python fetching.py
********
url: https://github.com/RunestoneInteractive/runestoneserver
status: 200
content type: text/html; charset=utf-8
content snippet: Sw==" rel="stylesheet" href="https://ass
********
url: https://github.com/RunestoneInteractive/nonsense
status: 404
content type: text/html; charset=utf-8
content snippet: ut[type=text],
input[type=password
********
url: https://github.com/RunestoneInteractive/runestoneserver
status: 200
content type: text/html; charset=utf-8
content snippet: Sw==" rel="stylesheet" href="https://ass
redirection history
http://github.com/RunestoneInteractive/runestoneserver 301
First, consider the .url attribute. It is the URL that was actually accessed. We will see in a later chapter that requests.get lets us pass additional parameters that are used to construct the full URL, so this will be useful for seeing the full URL.
Next, consider the .status_code attribute.
- When a server thinks that it is sending back what was requested, it sends the code 200.
- When the requested page doesn’t exist, it sends back code 404, which is sometimes described as “File Not Found”. In the above example, that’s what happened for page2, https://github.com/RunestoneInteractive/nonsense.
- When the page has moved to a different location, it sends back code 301 and a different URL that the client is supposed to retrieve instead. The requests.get function handles this automatically: when it gets a 301, it looks at the new URL and fetches it. For example, GitHub redirects all requests using http to the corresponding page using https (the secure HTTP protocol). Thus, when we asked for page3, http://github.com/RunestoneInteractive/runestoneserver, GitHub sent back a 301 code and the URL https://github.com/RunestoneInteractive/runestoneserver. The requests.get function then fetched that URL. It reports a status of 200 and the updated URL. We have to do further inquiry to find out that a redirection occurred (see below).
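The codes discussed above can also be looked up by name. As a minimal sketch, the helper below uses Python's standard http.HTTPStatus enum to turn a numeric code into its standard phrase, and uses the hundreds range to label the broad category. The name describe_status is hypothetical and is not part of the requests module.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short description of an HTTP status code (hypothetical helper)."""
    try:
        # HTTPStatus knows the standard phrase for each registered code
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "Unknown"

    # The hundreds range of a status code identifies its general category
    if 200 <= code < 300:
        category = "success"
    elif 300 <= code < 400:
        category = "redirection"
    elif 400 <= code < 500:
        category = "client error"
    elif 500 <= code < 600:
        category = "server error"
    else:
        category = "other"
    return f"{code} {phrase} ({category})"


for code in (200, 301, 404):
    print(describe_status(code))
```

With a real response object, the same helper could be called as describe_status(p.status_code).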
The .headers attribute has as its value a dictionary consisting of keys and values. To find out all the headers, you can run the code and add a statement print(p.headers.keys()). One of the headers is ‘Content-type’. In the output above, its value is text/html; charset=utf-8 for all three pages, including page2, where the server sent back an HTML error page.
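A Content-type value such as the one above packs two pieces of information into one string: the media type and the character set. As a rough sketch, the function below separates them using the standard library's email.message.Message class, which understands this header syntax. The name parse_content_type is hypothetical, and the sample string is copied from the output shown earlier rather than fetched live.

```python
from email.message import Message


def parse_content_type(header_value):
    """Split a Content-type header value into (media type, charset)."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns None when no charset parameter is present
    return msg.get_content_type(), msg.get_content_charset()


media_type, charset = parse_content_type("text/html; charset=utf-8")
print("media type:", media_type)
print("charset:", charset)
```

With a real response object, the input would be p.headers['Content-type'].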
The .text attribute we have seen before. It contains the contents of the file (or sometimes the error message).
The .history attribute contains a list of previous responses, if there were redirects. That list is empty, except for page3. For page3, we are able to see what happened in the original request: what the URL was and the response code of 301.
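To make the relationship between .history and the final response concrete, here is a small sketch that lists every hop in order, ending with the page that was finally delivered. The function redirect_chain is a hypothetical helper. So that the example runs without a network connection, it uses hand-built stand-in objects that mimic only the .url, .status_code, and .history attributes of page3; these stand-ins are an assumption for illustration, not real Response objects.

```python
from types import SimpleNamespace


def redirect_chain(response):
    """Return (url, status_code) pairs for each redirect, then the final response."""
    hops = [(h.url, h.status_code) for h in response.history]
    hops.append((response.url, response.status_code))
    return hops


# Stand-ins that mimic page3 from the example above (not real Response objects)
first_hop = SimpleNamespace(
    url="http://github.com/RunestoneInteractive/runestoneserver",
    status_code=301,
    history=[],
)
page3_like = SimpleNamespace(
    url="https://github.com/RunestoneInteractive/runestoneserver",
    status_code=200,
    history=[first_hop],
)

for url, status in redirect_chain(page3_like):
    print(status, url)
```

Passing an actual response such as page3 to redirect_chain should work the same way, since the function reads only those three attributes.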
To summarize, a Response object has the following useful attributes that can be accessed in your program:
- .text
- .url
- .status_code
- .headers
- .history