Chapter 8 Major Projects
The data team is responsible for many interesting and essential projects, some of which are critical to the agency’s day-to-day operations. Here is an overview of the five most essential data projects. This section is flexible and will be expanded or shortened as needed.
8.1 TLC Datawarehouse
- What
The TLC Datawarehouse, known as our policy dev server, is a database provided by IT where we store aggregated tables and essential metrics. Numbers are updated every 1-2 weeks to support policy work.
- When
Ongoing project. Most of the tables in the database are updated automatically following different schedules.
- Where
#Directories and sources
I:\COF\COF\_M3trics2\automation
#Catalog and data dictionaries here
I:\COF\COF\_M3trics2\automation\data_dictionaries (Data science reference)
- Who
Built by the policy analytics team, it is now maintained by Nikita Voevodin, with IT support from Maxim Smolyaninov.
- Other
New tables should be created for requests that are deemed repetitive and automatable.
8.2 Data Reports - Monthly Indicators
- What
This report is published on Open Data and our website and is reviewed with the commissioner every month. It covers data from the traditional FHV bases (trip patterns, vehicle and driver counts, etc.).
The table is published on Open Data with the following columns: Base License Number, Base Name, DBA, Year, Month, Month Name, Total Dispatched Trips, Total Dispatched Shared Trips, Unique Dispatched Vehicles.
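As a rough illustration of how the published columns relate to trip-level records, the sketch below aggregates a few made-up trips into per-base monthly totals. The input field names (`base`, `pickup`, `vehicle`, `shared`) and the sample rows are assumptions for illustration only; they are not the actual submission schema or the production pipeline.

```python
from collections import defaultdict
from datetime import date

# Hypothetical trip-level rows; field names and values are invented for illustration.
trips = [
    {"base": "B00001", "pickup": date(2020, 1, 3), "vehicle": "V1", "shared": False},
    {"base": "B00001", "pickup": date(2020, 1, 9), "vehicle": "V1", "shared": True},
    {"base": "B00001", "pickup": date(2020, 1, 9), "vehicle": "V2", "shared": False},
    {"base": "B00002", "pickup": date(2020, 2, 1), "vehicle": "V3", "shared": False},
]


def aggregate_monthly(rows):
    """Group trips by (base, year, month) and compute the three published totals."""
    groups = defaultdict(lambda: {"trips": 0, "shared": 0, "vehicles": set()})
    for row in rows:
        key = (row["base"], row["pickup"].year, row["pickup"].month)
        groups[key]["trips"] += 1
        groups[key]["shared"] += int(row["shared"])
        groups[key]["vehicles"].add(row["vehicle"])
    return [
        {
            "Base License Number": base,
            "Year": year,
            "Month": month,
            "Total Dispatched Trips": g["trips"],
            "Total Dispatched Shared Trips": g["shared"],
            "Unique Dispatched Vehicles": len(g["vehicles"]),
        }
        for (base, year, month), g in sorted(groups.items())
    ]


report = aggregate_monthly(trips)
for line in report:
    print(line)
```

Counting vehicles with a set means a vehicle that makes several trips in the month is counted once, which is what "Unique Dispatched Vehicles" implies.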
- When
It is a recurring project. It is updated on the last Monday of each month.
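For planning purposes, the "last Monday of each month" rule can be computed rather than looked up. The helper below is a minimal sketch using only the Python standard library; the function name `last_monday` is our own and is not part of any existing team script.

```python
import calendar
from datetime import date, timedelta


def last_monday(year: int, month: int) -> date:
    """Return the date of the last Monday in the given month."""
    # monthrange returns (weekday of the 1st, number of days in the month).
    last_day = calendar.monthrange(year, month)[1]
    end_of_month = date(year, month, last_day)
    # date.weekday() is 0 for Monday, so stepping back that many days
    # lands on the Monday of the month's final week.
    return end_of_month - timedelta(days=end_of_month.weekday())


print(last_monday(2021, 5))
```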
- Where
https://data.cityofnewyork.us/Transportation/FHV-Base-Aggregate-Report/2v9c-2k7f
- Who
Point: Nikita Voevodin runs the updates. Support: IT/Web: Konstantin Onishchenko; PR: Alan Fromberg, Rebecca Harshbarger
- Other
N/A
8.3 Driver Utilization data
- What
Driver utilization is calculated and loaded into our policy data warehouse. It is currently run unweighted, meaning that app logon time, which is the denominator in this calculation, is split evenly across the apps a driver is logged into simultaneously. Note that n times a year we re-evaluate utilization publicly, as required by law. Legal can provide more detail on the timeline, as Ryan wrote the rules.
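To make the even split concrete, here is a minimal sketch of how overlapping logon time could be divided between apps. It is not the production calculation: the session tuples, the `trip_minutes` numerator, and the function name are all assumptions for illustration. The source only states that logon time is the denominator and that simultaneous logon time is split evenly.

```python
from collections import defaultdict


def split_logon_time(sessions):
    """Allocate logon minutes per app, splitting overlapping time evenly.

    sessions: list of (app, start, end) tuples, with times in minutes.
    """
    # Every session start or end is a boundary; between two consecutive
    # boundaries the set of active apps does not change.
    boundaries = sorted({t for _, start, end in sessions for t in (start, end)})
    allocated = defaultdict(float)
    for seg_start, seg_end in zip(boundaries, boundaries[1:]):
        active = {
            app for app, start, end in sessions
            if start <= seg_start and end >= seg_end
        }
        for app in active:
            allocated[app] += (seg_end - seg_start) / len(active)
    return dict(allocated)


# Hypothetical driver: logged into app A for minutes 0-60 and app B for minutes 30-90.
logon = split_logon_time([("A", 0, 60), ("B", 30, 90)])

# Hypothetical numerator: minutes spent on trips for each app.
trip_minutes = {"A": 20, "B": 30}
utilization = {app: trip_minutes[app] / logon[app] for app in logon}
print(logon, utilization)
```

In this example the 30 overlapping minutes are shared equally, so each app is credited with 45 minutes of logon time and the total still equals the 90 minutes the driver was actually logged on.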
- When
It is a recurring project. It is updated during the last week of each month.
- Where
#Metrics
I:\COF\COF\_M3trics2\automation\data_dictionaries. File: "company_indicators_weekly_utilization_even"
- Who
Nikita Voevodin
- Other
N/A
8.4 TLC Data Hub
- What
TLC Data Hub offers users a new and convenient location to access and visualize taxi and for-hire industry data. TLC Data Hub uses public data available on Open Data and the TLC website and does not use, track or display any private information of the drivers or companies. The Hub currently consists of two dashboards. The ‘Trip Viz’ dashboard allows the public to run queries on TLC-collected trip data, while the ‘Industry metrics’ dashboard provides standard visualizations of monthly industry trends.
- When
This project is on pause for now.
- Where
https://tlcanalytics.shinyapps.io/dash_test/
- Who
Nikita Voevodin is the creator and maintainer of the project.
- Other
The project was put on pause due to a lack of data updates. It should be switched to monthly updates (parallel to the raw trips publishing timeline) and reinstated.
8.5 Raw Trip Records publishing
- What
Every six months, TLC aims to publicly release the previous six months of raw trip record data on our website and Open Data.
Process:
Ticket to Lana to create monthly files
Review with Chair
Send Konstantin links for him to stage (they will have predictable names based on month and industry)
Ticket to Lana to load files to AWS
Send links to Alex Finkel at DoITT to post to Open Data
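Because the staged files have predictable names based on month and industry, the list for a release can be generated instead of typed by hand. The sketch below assumes a pattern of `<industry>_tripdata_<YYYY-MM>.csv` and the industry labels `yellow`, `green`, `fhv`, and `hvfhs`; both are assumptions for illustration, so check the actual file names before relying on it.

```python
def monthly_file_names(start_year, start_month, months=6,
                       industries=("yellow", "green", "fhv", "hvfhs")):
    """List file names for each month and industry in a release window.

    The naming pattern used here is hypothetical.
    """
    names = []
    for offset in range(months):
        # Work with a running month index so the window can cross a year end.
        index = start_year * 12 + (start_month - 1) + offset
        year, month_zero_based = divmod(index, 12)
        for industry in industries:
            names.append(f"{industry}_tripdata_{year}-{month_zero_based + 1:02d}.csv")
    return names


# Six months starting September 2020, for all four assumed industries.
names = monthly_file_names(2020, 9)
print(len(names), names[0], names[-1])
```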
- When
It is a recurring project. It is updated twice a year, in the first weeks of March and September.
- Where
User Guide: https://www1.nyc.gov/assets/tlc/downloads/pdf/trip_record_user_guide.pdf
Yellow Dictionary:
https://www1.nyc.gov/assets/tlc/downloads/pdf/data_dictionary_trip_records_yellow.pdf
Green Dictionary:
https://www1.nyc.gov/assets/tlc/downloads/pdf/data_dictionary_trip_records_green.pdf
FHV Dictionary:
https://www1.nyc.gov/assets/tlc/downloads/pdf/data_dictionary_trip_records_fhv.pdf
High Volume Dictionary:
https://www1.nyc.gov/assets/tlc/downloads/pdf/data_dictionary_trip_records_hvfhs.pdf
- Who
Point: Nikita Voevodin runs the updates.
- Support
IT/Data: Lana Goldenberg; IT/Web: Konstantin Onishchenko; PR: Alan Fromberg, Rebecca Harshbarger
- Other
Recommendation: publish raw trip records monthly on a two-month delay. Reason for the two-month delay: traditional FHV bases submit their data with a varying delay (4-6 weeks). For reference, the HVFHV delay is 2-3 weeks, and the yellow and green delay is 2 weeks. I do not recommend releasing the data as it comes or on different schedules, as the process is very time-consuming. As of now, we release twice a year. Releasing monthly is 6x the workload. Releasing ‘as it comes’ is 24x the workload. Additionally, releasing on a two-month delay would allow us to catch submission errors and ensure data integrity.
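The workload multipliers above follow from counting release events per year. The short calculation below reproduces them under one assumption that is not stated in the text: that releasing "as it comes" means a separate monthly release for each of four industries (yellow, green, FHV, HVFHV).

```python
# Release events per year under each schedule.
CURRENT_RELEASES = 2          # current practice: two releases a year
MONTHLY_RELEASES = 12         # one combined release per month
INDUSTRIES = 4                # assumed: yellow, green, FHV, HVFHV
AS_IT_COMES_RELEASES = MONTHLY_RELEASES * INDUSTRIES  # each industry on its own schedule

monthly_multiplier = MONTHLY_RELEASES / CURRENT_RELEASES
as_it_comes_multiplier = AS_IT_COMES_RELEASES / CURRENT_RELEASES

print(monthly_multiplier, as_it_comes_multiplier)
```

This treats every release as the same amount of work, which is a simplification; it only shows where the two multipliers come from.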
8.6 Data Tasks Spreadsheet
There are many more tasks that we handle. Some of them are listed in the “Recurring_Tasks” document located at:
I:\COF\COF\_DA&E_\Nikita\Reports\Task_spreadsheet
This work is licensed under a Creative Commons Attribution 4.0 International License.