
Portfolio: Analysis of Toronto's Residential Development

Analysis of Toronto's development through open data

Intro

I was browsing the City of Toronto’s open data portal when I came across an extensive dataset of Toronto’s apartment buildings. The dataset is a product of Toronto’s “Apartment Building Standard (ABS) program … which defines formal criteria to identify all rental apartment buildings in the city with 3 or more storeys and 10 or more units. … This dataset contains building information for the buildings that are registered in the ABS program. This information was collected from building owners/managers during the initial registration process, and is updated on an annual basis as part of the registration renewal process.” [City of Toronto]

The sheer number of entries (3457) and columns (70) piqued my interest and I decided to analyze this dataset in order to find any interesting patterns.

Preparing the Dataset

I wrote a Python script that takes the original CSV file and uploads each entry into my personal MySQL database. The script also splits multi-value columns like AMENITIES_AVAILABLE into separate binary columns to make analysis easier. After running the data through the script and doing some basic data wrangling in Excel, I ended up with 52 unique and non-derived columns for each row. These are my resulting columns:

AIR_CONDITIONING_TYPE, AMENITIES_AVAILABLE, BALCONIES, BARRIER_FREE_ACCESSIBILTY_ENTR, BIKE_PARKING, EXTERIOR_FIRE_ESCAPE, FIRE_ALARM, GARBAGE_CHUTES, HEATING_TYPE, INTERCOM, LAUNDRY_ROOM, LOCKER_OR_STORAGE_ROOM, NO_OF_ELEVATORS, PARKING_TYPE, PETS_ALLOWED, PROP_MANAGEMENT_COMPANY_NAME, PROPERTY_TYPE, RSN, SEPARATE_GAS_METERS_EACH_UNIT, SEPARATE_HYDRO_METER_EACH_UNIT, SEPARATE_WATER_METERS_EA_UNIT, SITE_ADDRESS, SPRINKLER_SYSTEM, VISITOR_PARKING, WARD, WINDOW_TYPE, YEAR_BUILT, YEAR_REGISTERED, NO_OF_STOREYS, IS_THERE_EMERGENCY_POWER_q, NON_SMOKING_BUILDING_q, NO_OF_UNITS, NO_OF_ACCESSIBLEPARKING_SPACES, FACILITIES_AVAILABLE_q, IS_THERE_A_COOLING_ROOM_q, NO_BARRIERFREE_ACCESSBLE_UNITS, build_primary_key, outdoor_rec_fac, outdoor_pool, indoor_pool, indoor_rec_room, sauna, indoor_exercise_room, child_play_area, underground_garage, ground_level_garage, garage_thru_building, carport, sufrace_parking, parking_deck, indoor_bike_parking, outdoor_bike_parking
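
To give a feel for the column-splitting step, here is a minimal sketch of how a multi-value field like AMENITIES_AVAILABLE could be turned into 0/1 columns. It is not my original script: the comma separator, the amenity label spellings, and the AMENITY_COLUMNS mapping and split_amenities helper are assumptions made for illustration, and the sample row is made up.

```python
# Hypothetical mapping from amenity labels to indicator column names.
# The label spellings are assumptions; check them against the real CSV.
AMENITY_COLUMNS = {
    "Outdoor pool": "outdoor_pool",
    "Indoor pool": "indoor_pool",
    "Sauna": "sauna",
    "Indoor exercise room": "indoor_exercise_room",
}


def split_amenities(row, source="AMENITIES_AVAILABLE", sep=","):
    """Return a copy of `row` with one 0/1 column per known amenity."""
    raw = row.get(source) or ""
    # Normalise whitespace and case so "Indoor pool " matches "Indoor pool".
    present = {label.strip().lower() for label in raw.split(sep) if label.strip()}
    out = dict(row)  # leave the input row untouched
    for label, column in AMENITY_COLUMNS.items():
        out[column] = 1 if label.lower() in present else 0
    return out


# Made-up row, only to show the shape of the result.
example = {"RSN": "1", "AMENITIES_AVAILABLE": "Indoor pool , Sauna"}
flags = split_amenities(example)
print(flags["indoor_pool"], flags["sauna"], flags["outdoor_pool"])  # 1 1 0
```

The same idea would apply to PARKING_TYPE and BIKE_PARKING, each with its own label-to-column mapping.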

Exploration of Available Variables

So, what kind of analysis can we do with the remaining columns?

We could compute a lot of descriptive statistics with these columns and answer questions of the form “What percentage of apartment buildings have X (and also Y)?”. However, I think we can observe more interesting patterns if we work with the YEAR_BUILT variable (the year in which the building was built). YEAR_BUILT lets us turn the open data into a time series in which we can observe change over time. This allows us to go beyond purely descriptive questions and instead answer questions of the form “When did Toronto start doing X?”
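
As a rough illustration of the time-series idea, the sketch below groups buildings by the decade in which they were built and reports the share that have a given binary feature. The share_by_decade helper and the sample rows are made up for this example; it assumes each row is a dict with a YEAR_BUILT value and 0/1 feature columns like the ones listed above.

```python
from collections import defaultdict


def share_by_decade(rows, feature, year_key="YEAR_BUILT"):
    """Map each construction decade to the fraction of buildings with `feature`."""
    totals = defaultdict(int)
    with_feature = defaultdict(int)
    for row in rows:
        try:
            year = int(row[year_key])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with a missing or malformed year
        decade = year // 10 * 10
        totals[decade] += 1
        with_feature[decade] += int(row.get(feature) or 0)
    return {d: with_feature[d] / totals[d] for d in sorted(totals)}


# Made-up rows, only to show the shape of the result.
sample = [
    {"YEAR_BUILT": "1955", "underground_garage": 0},
    {"YEAR_BUILT": "1958", "underground_garage": 1},
    {"YEAR_BUILT": "1972", "underground_garage": 1},
    {"YEAR_BUILT": "", "underground_garage": 1},
]
print(share_by_decade(sample, "underground_garage"))  # {1950: 0.5, 1970: 1.0}
```

A feature whose share jumps from near zero to a large fraction between two decades is a candidate answer to a “when did Toronto start doing X?” question, subject to the limitations in the disclaimer below.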

Disclaimer:

The dataset only contains information about residential buildings that exist today, so it does not give a complete historical record: buildings that were demolished or converted are missing. It is also worth noting that not all existing residential buildings appear in the dataset (for unknown reasons). As a result, it is very likely that certain trends will be distorted or erased outright by the scope of the dataset. That does not make the dataset useless, however, as it can still provide some information.

Data Analysis

First, let’s see how Toronto grew in terms of available residential units, and let’s also group the data by municipal ward to see where the development took place. Here are the plots, grouped by overall growth rate:

wardsA.png
wardsB.png
WardsCD.png
wardsD.png
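
For reference, curves like these can be produced from a cumulative sum of NO_OF_UNITS over YEAR_BUILT within each WARD. The following is a simplified sketch of that aggregation rather than the exact code behind the plots; the cumulative_units_by_ward helper and the sample rows are made up for illustration.

```python
from collections import defaultdict
from itertools import accumulate


def cumulative_units_by_ward(rows):
    """Return {ward: [(year, cumulative_units), ...]} with years ascending."""
    per_ward = defaultdict(lambda: defaultdict(int))
    for row in rows:
        try:
            year = int(row["YEAR_BUILT"])
            units = int(row["NO_OF_UNITS"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or malformed values
        per_ward[row.get("WARD")][year] += units
    series = {}
    for ward, by_year in per_ward.items():
        years = sorted(by_year)
        running = accumulate(by_year[y] for y in years)
        series[ward] = list(zip(years, running))
    return series


# Made-up rows, only to show the shape of the result.
sample = [
    {"WARD": "12", "YEAR_BUILT": "1960", "NO_OF_UNITS": "100"},
    {"WARD": "12", "YEAR_BUILT": "1955", "NO_OF_UNITS": "40"},
    {"WARD": "12", "YEAR_BUILT": "1960", "NO_OF_UNITS": "60"},
    {"WARD": "9", "YEAR_BUILT": "1971", "NO_OF_UNITS": "25"},
]
print(cumulative_units_by_ward(sample))
# {'12': [(1955, 40), (1960, 200)], '9': [(1971, 25)]}
```

Each list of (year, cumulative units) pairs corresponds to one line in a plot. Because only buildings that are still registered are counted, the curves describe surviving stock rather than everything that was ever built.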

We can get a lot more out of these plots than it may seem at first. We can make observations about the timing, geography, and clustering of growth.

Timing: In almost all wards, the bulk of development appears to have happened between 1950 and 1980 (as indicated by the steep rise in the number of available residential units). After 1980, most wards either slowed down or plateaued in terms of new residential units being built.

Geography: Measured by the number of residential units (and not by financial cost), most real estate development happened outside Wards 9, 10, 22, 23, and 25. If we look at the scale of each plot, we can see that Wards 9, 10, 22, 23, and 25 grew on a scale of 0 to 8,000 residential units, whereas all other wards grew on a scale of 0 to 17,000 residential units, with the exception of Wards 12 and 13, which grew on a scale of 0 to 27,000 residential units.

Clustering: We can group the growth curves into three categories: normal growth (plots 1 and 2), super-growth (plot 4), and least growth (plot 3).

Next, let us look at how ‘dense’ the buildings are in terms of units per floor, and let us also group the data by ownership type. Here are the plots:

units_per_floor_year_ownership.png
units_per_floor_distr.png
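
The density measure here is simply NO_OF_UNITS divided by NO_OF_STOREYS. Below is a minimal sketch of that calculation, summarised per ownership type with a median. It assumes PROPERTY_TYPE is the ownership column; the type labels in the sample, the helper names, and the choice of the median as the summary are illustrative assumptions, not necessarily what the plots use.

```python
from collections import defaultdict
from statistics import median


def units_per_floor(row):
    """Units divided by storeys, or None when either value is unusable."""
    try:
        units = int(row["NO_OF_UNITS"])
        storeys = int(row["NO_OF_STOREYS"])
    except (KeyError, TypeError, ValueError):
        return None
    if storeys <= 0:
        return None  # avoid dividing by zero on bad records
    return units / storeys


def median_density_by_type(rows, type_key="PROPERTY_TYPE"):
    """Median units per floor for each ownership type."""
    groups = defaultdict(list)
    for row in rows:
        density = units_per_floor(row)
        if density is not None:
            groups[row.get(type_key)].append(density)
    return {ptype: median(values) for ptype, values in groups.items()}


# Made-up rows; the PROPERTY_TYPE labels are assumptions.
sample = [
    {"PROPERTY_TYPE": "PRIVATE", "NO_OF_UNITS": "120", "NO_OF_STOREYS": "10"},
    {"PROPERTY_TYPE": "PRIVATE", "NO_OF_UNITS": "40", "NO_OF_STOREYS": "5"},
    {"PROPERTY_TYPE": "TCHC", "NO_OF_UNITS": "99", "NO_OF_STOREYS": "9"},
    {"PROPERTY_TYPE": "TCHC", "NO_OF_UNITS": "50", "NO_OF_STOREYS": "0"},
]
print(median_density_by_type(sample))  # {'PRIVATE': 10.0, 'TCHC': 11.0}
```

Rows with zero or missing storeys are dropped rather than guessed at, so the per-type medians are based only on buildings with usable values.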