More Details of the Requests Module

When we call the get function in the requests module (that is, requests.get), it returns a response object. That object is an instance of a class called Response, which is defined in the requests module. You can think of it as analogous to the Turtle class. Each Response has some attributes, just like each Turtle had some attributes; different responses have different values for the same attribute. Every response can also invoke the methods that are defined for objects of type Response.

Previously, we saw that a response object has an attribute (AKA an “instance variable”) .text, which contains the contents of the page you are requesting – the stuff after all the HTTP headers.

Response objects have some other useful attributes and methods that we can access as well. A few are used and explained below. Others will be introduced in other chapters.

The get function inside the requests module requires one input: a URL.

There are also additional optional inputs, which you’ll see later in more detail.

Essentially, the function reaches out to the place on the internet that the URL specifies and retrieves whatever lives there, so that you can deal with it in some way in your program. A programmer might care about the text content of the page, as we just saw. A programmer might also care about other information that comes with that web page, some of which most people who use the internet never notice.

import requests

page1 = requests.get("https://github.com/RunestoneInteractive/runestoneserver")
page2 = requests.get("https://github.com/RunestoneInteractive/nonsense")
page3 = requests.get("http://github.com/RunestoneInteractive/runestoneserver")

for p in [page1, page2, page3]:
    print("********")
    print("url:", p.url)
    print("status:", p.status_code)
    print("content type:", p.headers['Content-type'])
    if len(p.text) > 1040:
        print("content snippet:", p.text[1000:1040])
    if len(p.history) > 0:
        print("redirection history")
        for h in p.history:
            print("  ", h.url, h.status_code)

Here’s the output that is produced when I run that code.

$ python fetching.py
********
url: https://github.com/RunestoneInteractive/runestoneserver
status: 200
content type: text/html; charset=utf-8
content snippet: Sw==" rel="stylesheet" href="https://ass
********
url: https://github.com/RunestoneInteractive/nonsense
status: 404
content type: text/html; charset=utf-8
content snippet: ut[type=text],
      input[type=password
********
url: https://github.com/RunestoneInteractive/runestoneserver
status: 200
content type: text/html; charset=utf-8
content snippet: Sw==" rel="stylesheet" href="https://ass
redirection history
   http://github.com/RunestoneInteractive/runestoneserver 301

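First, consider the .url attribute. It is the URL that was actually accessed, which is not always the one you asked for: page3 was requested with an http:// URL, but its .url is the https:// URL the request was redirected to. We will also see in a later chapter that requests.get lets us pass additional parameters that are used to construct the full URL, so .url will be useful for seeing the full URL that was built.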

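Next, consider the .status_code attribute. It is the numeric HTTP status code that the server sent back. A value of 200 means the request succeeded, as it did for page1 and page3. A value of 404 means the server could not find anything at that URL, which is what happened for page2. Codes in the 300s, such as the 301 that shows up in the redirection history for page3, mean the request was redirected to a different URL.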
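
If you want the standard name that goes with a numeric status code, Python's standard library can supply it through http.HTTPStatus. The sketch below is only an illustration: describe_status is a hypothetical helper name, not part of requests, and it runs on plain integers so that no network access is needed.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a string such as '404 Not Found' for a numeric HTTP status code.

    describe_status is a hypothetical helper written for this example.
    """
    try:
        return f"{code} {HTTPStatus(code).phrase}"
    except ValueError:
        # The code is not one the standard library knows about.
        return f"{code} (unrecognized status code)"


# The three codes that appear in the output above.
for code in [200, 301, 404]:
    print(describe_status(code))
```

With a real response you would pass in the attribute directly, as in describe_status(p.status_code).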

The .headers attribute has as its value a dictionary consisting of keys and values. Lookups in this dictionary ignore case, so p.headers['content-type'] works as well as p.headers['Content-type']. To find out all the headers, you can run the code and add a statement print(p.headers.keys()). One of the headers is ‘Content-type’. In the output shown above, its value is text/html; charset=utf-8 for all three pages. That includes page2: even though we got a 404 error, the server still sent back an HTML page describing the error.
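
A Content-type value packs two pieces of information into one string: the media type and, optionally, the character encoding. As a minimal sketch, the standard library's email.message.Message class can split them apart. The function name split_content_type is hypothetical, and the header string is copied from the output above rather than fetched from the network.

```python
from email.message import Message


def split_content_type(header_value):
    """Split a Content-type header value into (media type, charset).

    split_content_type is a hypothetical helper written for this example.
    The charset is None when the header does not specify one.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_type(), msg.get_content_charset()


media_type, charset = split_content_type("text/html; charset=utf-8")
print("media type:", media_type)
print("charset:", charset)
```

With a real response, the string to pass in would be p.headers['Content-type'].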

The .text attribute we have seen before. It contains the contents of the file (or sometimes the error message).

The .history attribute contains a list of previous responses, if there were redirects. That list is empty for page1 and page2. For page3, it holds one earlier response, which lets us see what happened to the original request: its URL was the http:// address, and its status code was 301, a permanent redirect to the https:// address.
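
One way to picture a redirect is as a chain of hops: each entry in .history, followed by the final response. The sketch below builds that chain. It is an illustration under stated assumptions: redirect_chain is a hypothetical helper, and the two objects are stand-ins made with types.SimpleNamespace that only mimic the .url, .status_code, and .history attributes, so the example runs without touching the network.

```python
from types import SimpleNamespace


def redirect_chain(response):
    """Return a list of (url, status_code) pairs, earliest hop first.

    redirect_chain is a hypothetical helper; it only relies on the
    .url, .status_code, and .history attributes described in this section.
    """
    hops = [(h.url, h.status_code) for h in response.history]
    hops.append((response.url, response.status_code))
    return hops


# Stand-in objects that imitate what page3 looked like in the output above.
first_hop = SimpleNamespace(
    url="http://github.com/RunestoneInteractive/runestoneserver",
    status_code=301,
    history=[],
)
final = SimpleNamespace(
    url="https://github.com/RunestoneInteractive/runestoneserver",
    status_code=200,
    history=[first_hop],
)

for url, status in redirect_chain(final):
    print(status, url)
```

Passing a real Response object, such as page3, to redirect_chain should work the same way, since it reads only those three attributes.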

To summarize, a Response object has the following useful attributes that can be accessed in your program:

  • .text
  • .url
  • .status_code
  • .headers
  • .history