Epidemiology & Technology

Downloading a CSV from REST API using Python Requests

Recently, while working with OpenDataKit, I needed to download multiple CSV files from a remote server. The API was protected with HTTP Basic Authentication, and I also wanted to do error handling and logging properly.

I wrote a Python application to:

  • Create a timestamped directory for each data pull
  • Create a latest_data directory with symbolic links to the latest data files, so one only needs to look in this directory to get the latest data
  • Connect to the URL and download the data
  • Handle errors when downloading data, creating directories, writing files, or creating symbolic links
  • Log everything
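
The timestamped-directory plus latest_data symlink pattern from the first two bullets can be sketched like this (the paths and the filename here are illustrative, not the ones the real script uses):

```python
import os
import tempfile
from datetime import datetime

# Illustrative base directory; the real script works from os.getcwd().
base = tempfile.mkdtemp()

# One directory per pull, named by timestamp, plus a stable latest_data dir.
data_dir = os.path.join(base, 'data', datetime.now().strftime('%Y-%m-%d_%H-%M-%S'))
latest_dir = os.path.join(base, 'latest_data')
os.makedirs(data_dir)
os.makedirs(latest_dir, exist_ok=True)

# Write a file into the timestamped directory ...
target = os.path.join(data_dir, 'example.csv')
with open(target, 'w', encoding='utf-8') as f:
    f.write('id,value\n1,42\n')

# ... then point a symlink in latest_data at it, replacing any old link.
link = os.path.join(latest_dir, 'example.csv')
if os.path.islink(link):
    os.remove(link)
os.symlink(target, link)

print(os.path.islink(link))  # True
```

Reading through latest_data/example.csv now always returns the newest pull, while every historical pull stays untouched in its own timestamped directory.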

First, create a Python virtual environment and install requests:

python3 -m venv data_pull
cd data_pull
source bin/activate
pip3 install requests

Then write the Python script: /home/vivek/data_pull/data_pull.py

import requests
from requests.auth import HTTPBasicAuth
from requests import ReadTimeout, ConnectTimeout, Timeout, HTTPError, ConnectionError
from time import sleep
import os
from datetime import datetime
import logging
import errno
import logging.handlers

LOG_FILENAME = './logs/Api_Pull.log'
os.makedirs('./logs', exist_ok=True)  # the log directory must exist before the handler opens the file

# create logger with level = DEBUG
logger = logging.getLogger("Api_pull_log")
logger.setLevel(logging.DEBUG)

# create file handler which logs even debug messages
fh = logging.handlers.RotatingFileHandler(LOG_FILENAME, maxBytes=1000000, backupCount=100)
fh.setLevel(logging.DEBUG)

# create console handler with a WARNING log level
ch = logging.StreamHandler()
ch.setLevel(logging.WARNING)

# create formatter and add it to the handlers
formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s', datefmt='%Y/%m/%d %I:%M:%S %p')
fh.setFormatter(formatter)
ch.setFormatter(formatter)

# add the handlers to the logger
logger.addHandler(fh)
logger.addHandler(ch)

username = 'username@email.com'
password = 'MyPassWord'
url = "https://www.MyURl.org/downloads/"
files = ['Csvfile1', 'CSVFile2', 'CsvFile3', 'CsvFile4', 'CsvFile5']

logger.info("  ")
logger.info('======== Initiated New Data Pull ==========')

os.makedirs(os.path.join(os.getcwd(), 'data'), exist_ok=True)
os.makedirs(os.path.join(os.getcwd(), 'latest_data'), exist_ok=True)
basepath = os.path.join(os.getcwd(), 'data')
latestdatapath = os.path.join(os.getcwd(), 'latest_data')

mydir = os.path.join(basepath, datetime.now().strftime('%Y-%m-%d_%H-%M-%S'))
try:
    logger.info('Trying to Create: %s', str(mydir))
    os.makedirs(mydir)
    logger.info('Created: %s', str(mydir))
except OSError as e:
    if e.errno != errno.EEXIST:
        print("Error Creating Directory :", e)
        logger.error("Error Creating Directory", exc_info=True)
    else:
        print("Directory exists :", e)
        logger.error("Directory exists :", exc_info=True)

for x in files:
    filename = x + '.csv'
    localfilename  = os.path.join(mydir, filename)
    latestfilename = os.path.join(latestdatapath, filename)

    fullurl = url + filename
    print("Trying  ", fullurl)
    try:
        logger.info('Trying to Connect to : %s', fullurl)
        # connect timeout = 20 seconds, read timeout = 300 seconds
        r = requests.get(fullurl, auth=HTTPBasicAuth(username, password), timeout=(20, 300))
        if r.status_code == 200:
            responseContentType = r.headers.get('content-type')
            if responseContentType == 'text/csv':
                r.encoding = 'utf-8'
                try:
                    with open(localfilename, 'wt', encoding='utf-8') as csvfile:  # wt = write as text
                        csvfile.write(r.text)
                    print(localfilename + "   SAVED")
                    logger.info("File Saved : %s", str(localfilename))
                except OSError:
                    print("Something went wrong when writing to the file")
                    logger.error("Something went wrong when writing to the file :", exc_info=True)
                finally:
                    logger.info("File Closed : %s", str(localfilename))

                try:
                    os.remove(latestfilename)
                    logger.info("Prior Symbolic Link Removed: %s", str(latestfilename))
                except FileNotFoundError:
                    logger.info("No Prior Symbolic Link: %s", str(latestfilename))
                os.symlink(localfilename, latestfilename)
                logger.info("Symbolic Link Created  : %s", str(latestfilename))
            else:
                print("Error: Content-Type is not text/csv")
                logger.error("Error: Content-Type is not text/csv")
        else:
            print(r.status_code, r.reason)

    except requests.exceptions.ConnectionError as err_con:
        print ("Error Establishing Connection:", err_con)
        logger.error("Connection Error :", exc_info=True)

    except requests.exceptions.HTTPError as err_http:
        print("HTTP Error: ", err_http)
        logger.error("HTTP Error :", exc_info=True)

    except requests.exceptions.Timeout as err_timeout:
        print("Timed Out: ", err_timeout)
        logger.error("Timeout Error :", exc_info=True)

    except requests.exceptions.TooManyRedirects as err_redirects:
        print("Too many Redirects ", err_redirects)
        logger.error("Too many redirects :", exc_info=True)
    except requests.exceptions.RequestException as e_cat:
        print("Catastrophic Error, Exiting", e_cat)    # catastrophic error. bail.
        logger.error("Catastrophic Error :", exc_info=True)
        raise SystemExit(e_cat)

logger.info('========  Completed Data Pull  ============')
logger.info("  ")

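The script above bails on the first connection error. If transient server failures are common, an optional hardening (not part of the original script) is to mount urllib3's Retry on a requests Session, so requests retry automatically before an exception is raised. A sketch, with illustrative retry parameters:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Session that retries up to 3 times, with exponential backoff, when the
# server answers with one of the listed transient status codes.
session = requests.Session()
retries = Retry(total=3, backoff_factor=2, status_forcelist=[500, 502, 503, 504])
session.mount('https://', HTTPAdapter(max_retries=retries))

# session.get(fullurl, auth=..., timeout=(20, 300)) could then replace
# requests.get(...) in the download loop.
print('retries configured:', session.adapters['https://'].max_retries.total)
```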

Then add the script to crontab to run every 2 hours between 9 AM and 7 PM. Since the script builds its paths from os.getcwd(), cd into its directory first:

crontab -e
# m    h         dom mon dow   command
  0   9-18/2      *   *   *     cd /home/vivek/data_pull && /usr/bin/python3 data_pull.py >> ~/cron.log 2>&1
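
The hour field 9-18/2 means "every second hour from 9 through 18", so the job fires at these hours (the last run of the day is 5 PM):

```python
# Expand cron's 9-18/2 hour field: start at 9, step by 2, stop at 18 inclusive.
hours = list(range(9, 19, 2))
print(hours)  # [9, 11, 13, 15, 17]
```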
