
Downloading a CSV from REST API using Python Requests

Recently, while working with OpenDataKit, I needed to download multiple CSV files from a remote server. The API was protected with HTTP Basic Authentication, and I also wanted proper error handling and logging.
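Before diving into the script, a short aside on what Basic Authentication actually sends: it is just a base64-encoded username:password pair in the Authorization header, which requests builds for you via HTTPBasicAuth. A minimal stdlib-only sketch (the helper name basic_auth_header is mine, purely for illustration):

```python
import base64

# HTTP Basic Auth sends "Basic " + base64("username:password") in the
# Authorization header; requests.auth.HTTPBasicAuth does exactly this.
def basic_auth_header(username, password):
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"

print(basic_auth_header("user", "secret"))  # Basic dXNlcjpzZWNyZXQ=
```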

I wrote a Python application to

  • Create a timestamped directory for each pull of current data
  • Create a latest_data directory with symbolic links to the latest data files, so one only needs to look in this directory to get the latest data
  • Connect to the URL and download the data
  • Handle errors when downloading data, creating directories, writing files, or creating symbolic links
  • Log progress and failures
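The timestamped-directory plus latest_data symlink pattern from the list above can be sketched in isolation like this (paths here use a throwaway temporary directory rather than the script's real layout):

```python
import os
import tempfile
from datetime import datetime

# Sketch of the pattern: each run writes into a fresh timestamped
# directory, and latest_data holds symlinks pointing at the newest copy.
base = tempfile.mkdtemp()
data = os.path.join(base, "data")
latest = os.path.join(base, "latest_data")
rundir = os.path.join(data, datetime.now().strftime("%Y-%m-%d_%H-%M-%S"))
os.makedirs(rundir)
os.makedirs(latest, exist_ok=True)

# Write a file into the timestamped directory...
target = os.path.join(rundir, "example.csv")
with open(target, "w", encoding="utf-8") as f:
    f.write("a,b\n1,2\n")

# ...then refresh the symlink in latest_data, removing any stale link first.
link = os.path.join(latest, "example.csv")
if os.path.islink(link):
    os.unlink(link)
os.symlink(target, link)

# Reading through the symlink yields the latest data.
print(open(link, encoding="utf-8").read())
```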

First, create a Python virtual environment using the following commands and install requests

python3 -m venv data_pull
cd data_pull
source bin/activate
pip3 install requests

Then write the Python script: /home/vivek/data_pull/data_pull.py

import errno
import logging
import logging.handlers
import os
from datetime import datetime
from time import sleep

import requests
from requests.auth import HTTPBasicAuth

LOG_FILENAME = './logs/Api_Pull.log'
# the log directory must exist before the file handler opens the log file
os.makedirs('./logs', exist_ok=True)

# create logger with level = DEBUG
logger = logging.getLogger("Api_pull_log")
logger.setLevel(logging.DEBUG)
# create file handler which logs even debug messages
fh = logging.handlers.RotatingFileHandler(LOG_FILENAME, maxBytes=1000000, backupCount=100)
fh.setLevel(logging.DEBUG)
# create console handler with a WARNING log level
ch = logging.StreamHandler()
ch.setLevel(logging.WARNING)
# create formatter and add it to the handlers
formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s',
                              datefmt='%Y/%m/%d %I:%M:%S %p')
fh.setFormatter(formatter)
ch.setFormatter(formatter)
# add the handlers to the logger
logger.addHandler(fh)
logger.addHandler(ch)

username = 'username@email.com'
password = 'MyPassWord'
url = "https://www.MyURl.org/downloads/"
files = ['Csvfile1', 'CSVFile2', 'CsvFile3', 'CsvFile4', 'CsvFile5']

logger.info(" ")
logger.info('======== Initiated New Data Pull ==========')

os.makedirs(os.path.join(os.getcwd(), 'data'), exist_ok=True)
os.makedirs(os.path.join(os.getcwd(), 'latest_data'), exist_ok=True)
basepath = os.path.join(os.getcwd(), 'data')
latestdatapath = os.path.join(os.getcwd(), 'latest_data')
mydir = os.path.join(basepath, datetime.now().strftime('%Y-%m-%d_%H-%M-%S'))
print(mydir)

try:
    logger.info('Trying to Create: %s', str(mydir))
    os.makedirs(mydir)
    logger.info('Created: %s', str(mydir))
except OSError as e:
    if e.errno != errno.EEXIST:
        print("Error Creating Directory :", e)
        logger.error("Error Creating Directory", exc_info=True)
    else:
        print("Directory exists :", e)
        logger.error("Directory exists :", exc_info=True)

for x in files:
    sleep(5)
    filename = x + '.csv'
    localfilename = os.path.join(mydir, filename)
    latestfilename = os.path.join(latestdatapath, filename)
    fullurl = url + filename
    print("Trying ", fullurl)
    try:
        logger.info('Trying to Connect to : %s', fullurl)
        # connect timeout = 20 seconds, read timeout = 300 seconds
        r = requests.get(fullurl, auth=HTTPBasicAuth(username, password), timeout=(20, 300))
        r.raise_for_status()  # raises HTTPError for 4xx/5xx responses
        responseContentType = r.headers.get('content-type')
        print(responseContentType)
        if responseContentType == 'text/csv':
            r.encoding = 'utf-8'
            try:
                # 'wt' = write as text; the with block closes the file even on error
                with open(localfilename, 'wt', encoding='utf-8') as csvfile:
                    csvfile.write(r.text)
                print(localfilename + " SAVED")
                logger.info("File Saved : %s", str(localfilename))
            except OSError:
                print("Something went wrong when writing to the file")
                logger.error("Something went wrong when writing to the file :", exc_info=True)
                continue
            try:
                os.unlink(latestfilename)
                logger.info("Prior Symbolic Link Removed: %s", str(latestfilename))
            except OSError:
                logger.info("No Prior Symbolic Link: %s", str(latestfilename))
            os.symlink(localfilename, latestfilename)
            logger.info("Symbolic Link Created : %s", str(latestfilename))
        else:
            print("Error Content Type is not - text/csv")
            logger.error("Error Content Type is not - text/csv")
    except requests.exceptions.ConnectionError as err_con:
        print("Error Establishing Connection:", err_con)
        logger.error("Connection Error :", exc_info=True)
    except requests.exceptions.HTTPError as err_http:
        print("HTTP Error: ", err_http)
        logger.error("HTTP Error :", exc_info=True)
    except requests.exceptions.Timeout as err_timeout:
        print("Timed Out: ", err_timeout)
        logger.error("Timeout Error :", exc_info=True)
    except requests.exceptions.TooManyRedirects as err_redirects:
        print("Too many Redirects ", err_redirects)
        logger.error("Too many redirects :", exc_info=True)
    except requests.exceptions.RequestException as e_cat:
        # catastrophic error. bail.
        print("Catastrophic Error, Exiting", e_cat)
        logger.error("Catastrophic Error :", exc_info=True)
        raise SystemExit(e_cat)

logger.info('======== Completed Data Pull ============')
logger.info(" ")
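One caveat worth noting: the script compares the Content-Type header to 'text/csv' with strict equality, but many servers append parameters such as '; charset=utf-8', which would make that check fail. A more tolerant check could look like this sketch (the helper name is_csv is hypothetical, not part of the script above):

```python
# Strip any media-type parameters (e.g. '; charset=utf-8') before
# comparing, and treat a missing header as "not CSV".
def is_csv(content_type):
    if not content_type:
        return False
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime == "text/csv"

print(is_csv("text/csv; charset=utf-8"))  # True
print(is_csv("application/json"))         # False
```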

Then add the script to crontab to run it every two hours between 9 AM and 7 PM

crontab -e
# m h dom mon dow command
0 9-18/2 * * * /usr/bin/python3 /home/vivek/data_pull/data_pull.py >> ~/cron.log 2>&1