Chapter 4 Data Connections & Access

4.1 SQL Server

Our primary data source is the SQL server, which holds many databases, each with its own sets of tables accessible through different tools. Currently, the teams lean heavily on 3 of the following tools for accessing data from the server:

  • SQL Server Management Studio

  • R

  • Python

4.2 Pertinent IT Databases

There are a series of databases that we use to collect data. Your Windows profile restricts access to these servers, and a ticket with IT is required to access the database, but talk to your supervisor first to see if there are accessible points already in place before approaching IT. below are some credentials provided for a few databases:

Table 4.1: ODBC Connections
Database Driver Server Credentials
DataWarehouse odbc driver 17 for sql server TLCBDWH request from IT
Azure_Trip_Data odbc driver 17 for sql server tlcsqlmi01.fa986d691ca7.database.windows.net request from IT
APPLUS sql server 10.224.244.114 request from IT
TLC_Policy_Programs_Dev odbc driver 17 for sql server msdwvd-tlctxy01.csc.nycnet request from IT

4.3 Open Data Database

Before submitting data to the city Open Data portal, we store it in the Open Data SQL server. Contact Nikita Voevodin or Maxim Smolyaninov from IT if you ever need access to it.

4.4 Open Data

City agencies that work with data are often required to post their data to the city’s open data portal. It can be handy to know how to pull data from there. Even for our data, different departments publish different datasets, which might not be shared between departments. You can either download the whole datasets as CSV, JSON, or whatever, or you can use their API. They have helpful code snippets of working with their data in different programming languages. The first thing you should do is set up an account and get an API key. Then, you can do something like:

  • R:
library(RSocrata)

date <- Sys.Date()

test <- read.socrata(
  paste0("https://data.cityofnewyork.us/resource/rhe8-mgbb.json?last_updated_date=",date,""),
  app_token = "yourtokenhere",
  email     = "yourcreds",
  password  = "yourcreds"
)
  • Python:
from sodapy import Socrata;
import pandas as pd;

client = Socrata("data.cityofnewyork.us",
                 "token",
                 username="yourcreds",
                 password="yourcreds")

today = date.today()

results = client.get("rhe8-mgbb", limit = 20000, last_updated_date=today)

# Convert to pandas DataFrame
results_df = pd.DataFrame.from_records(results) 

4.5 Proxy Settings

At some point, you might encounter a proxy problem. That is a firewall blocking certain connections from outside. Unfortunately, everything that has to do with installing python or javascript packages is considered undesirable by our firewall and ultimately will be blocked. That is a very annoying issue to deal with for anybody. Fortunately, there is a solution. For most tasks, adding the following proxy settings will fix the problem.

Open Anaconda -> file -> preferences -> configure Conda. Paste the following in there (use your username):

channels:
  - defaults
proxy_servers:
  http: http://csc\yourusername@10.155.126.15:8080
  https: http://csc\yourusername@10.155.126.15:8080

ssl_verify: false

Proxy Settings

Here are the settings for other languages and systems:

  • Linux
#Use the following syntax to configure the proxy for http, https and ftp traffic on the Linux 
#command line :

# export http_proxy="http://bcpxy.nycnet:8080"
# export https_proxy="https://bcpxy.nycnet:8080"
# export ftp_proxy="http://bcpxy.nycnet:8080"

-------------------------------------------------------------------------------
#Use the following syntax if the proxy server requires authentication :

# export http_proxy="http://user:password@bcpxy.nycnet:8080"
# export https_proxy="https://user:password@bcpxy.nycnet:8080"
# export ftp_proxy="http://user:password@bcpxy.nycnet:8080"

-------------------------------------------------------------------------------
#Using your email address username%40agency.nyc.gov

# export http_proxy="http://username%40agency.nyc.gov:<password>@bcpxy.nycnet:8080"
# export https_proxy=https://username%40agency.nyc.gov:<password>@bcpxy.nycnet:8080
# export ftp_proxy=http://username%40agency.nyc.gov:<password>@bcpxy.nycnet:8080
  • NPM
# npm config set proxy http://bcpxy.nycnet:8080
# npm config set https-proxy http://bcpxy.nycnet:8080

If none of this helps, contact either Nikita Voevodin or Jordan Mamet from IT.


Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.