Numerous motion picture, music, television and streaming production activities take place in New York City every year.
Permits are generally required for exclusive use of city properties, e.g. sidewalks, streets or parks. The Film Office, a division of the Mayor's Office of Media and Entertainment (MOME), issues permits for production activities in the NYC boroughs.
This exploratory data analysis (EDA) focuses on filming activities and aims to answer the following questions for the 2012-2021 period:
Data on film permits is provided by MOME via NYC Open Data.
A copy of the dataset was downloaded from the NYC Open Data website as a CSV file on 22 May 2022. The file contains the following columns:
EventID
: Auto-generated unique event identification number
EventType
: Type of Activity for this approved permit
StartDateTime
: Activity scheduled to begin
EndDateTime
: Activity scheduled to be completed
EnteredOn
: Date permit request submitted to MOME
EventAgency
ParkingHeld
: Locations of request to hold parking in advance for permitted filming activity
Borough
: First borough of activity for the day
CommunityBoard(s)
: First Community Board of activity for the day
PolicePrecinct(s)
: First Police precinct of activity for the day
Category
: Description of production as selected by permit applicant
SubCategoryName
: More specific description of production as selected by permit applicant
Country
: Project origin
ZipCode(s)
: First zip code of production activity

According to the Data Dictionary, the data provided by MOME refers to approved permits.
The dictionary also presents information on the event type the permit refers to:
The EDA is based on permits for filming activities (Shooting) only.
# Import libraries that will be used to perform the data analysis
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.io as pio
pio.renderers.default = 'plotly_mimetype+notebook'
# Load data downloaded from NYC Open Data into the permits DataFrame
permits = pd.read_csv('./Film_Permits.csv')
permits.head(5)
EventID | EventType | StartDateTime | EndDateTime | EnteredOn | EventAgency | ParkingHeld | Borough | CommunityBoard(s) | PolicePrecinct(s) | Category | SubCategoryName | Country | ZipCode(s) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 446040 | Shooting Permit | 10/19/2018 02:00:00 PM | 10/20/2018 04:00:00 AM | 10/16/2018 11:57:27 AM | Mayor's Office of Film, Theatre & Broadcasting | THOMPSON STREET between PRINCE STREET and SPRI... | Manhattan | 2 | 1 | Television | Cable-episodic | United States of America | 10012 |
1 | 446168 | Shooting Permit | 10/19/2018 02:00:00 PM | 10/20/2018 02:00:00 AM | 10/16/2018 07:03:56 PM | Mayor's Office of Film, Theatre & Broadcasting | MARBLE HILL AVENUE between WEST 227 STREET an... | Manhattan | 12, 8 | 34, 50 | Film | Feature | United States of America | 10034, 10463 |
2 | 186438 | Shooting Permit | 10/30/2014 07:00:00 AM | 10/31/2014 02:00:00 AM | 10/27/2014 12:14:15 PM | Mayor's Office of Film, Theatre & Broadcasting | LAUREL HILL BLVD between REVIEW AVENUE and RUS... | Queens | 2, 5 | 104, 108 | Television | Episodic series | United States of America | 11378 |
3 | 445255 | Shooting Permit | 10/20/2018 07:00:00 AM | 10/20/2018 06:00:00 PM | 10/09/2018 09:34:58 PM | Mayor's Office of Film, Theatre & Broadcasting | JORALEMON STREET between BOERUM PLACE and COUR... | Brooklyn | 2 | 84 | Still Photography | Not Applicable | United States of America | 11201 |
4 | 128794 | Theater Load in and Load Outs | 11/16/2013 12:01:00 AM | 11/17/2013 06:00:00 AM | 11/07/2013 03:48:28 PM | Mayor's Office of Film, Theatre & Broadcasting | WEST 31 STREET between 7 AVENUE and 8 AVENUE... | Manhattan | 4, 5 | 14 | Theater | Theater | United States of America | 10001, 10121 |
# Summary of the permits DataFrame
permits.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 74938 entries, 0 to 74937
Data columns (total 14 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   EventID            74938 non-null  int64
 1   EventType          74938 non-null  object
 2   StartDateTime      74938 non-null  object
 3   EndDateTime        74938 non-null  object
 4   EnteredOn          74938 non-null  object
 5   EventAgency        74938 non-null  object
 6   ParkingHeld        74938 non-null  object
 7   Borough            74938 non-null  object
 8   CommunityBoard(s)  74920 non-null  object
 9   PolicePrecinct(s)  74920 non-null  object
 10  Category           74938 non-null  object
 11  SubCategoryName    74938 non-null  object
 12  Country            74938 non-null  object
 13  ZipCode(s)         74920 non-null  object
dtypes: int64(1), object(13)
memory usage: 8.0+ MB
# Number of unique permits
permits['EventID'].unique().shape[0]
74938
There are no duplicated film permits in the DataFrame, i.e. the dataset contains information on 74,938 permits.
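The uniqueness check above compares the number of distinct `EventID` values with the number of rows. A minimal sketch of the same idea on a hypothetical toy DataFrame (the values below are invented for illustration and are not taken from the permits file) shows how `nunique()` and `duplicated()` behave when a duplicate is present:

```python
import pandas as pd

# Hypothetical toy data: three rows, one EventID repeated on purpose
toy = pd.DataFrame({
    'EventID': [101, 102, 102],
    'EventType': ['Shooting Permit', 'Shooting Permit', 'Shooting Permit'],
})

n_rows = len(toy)                                        # total rows
n_unique = toy['EventID'].nunique()                      # distinct identifiers
n_duplicated = int(toy['EventID'].duplicated().sum())    # repeats after the first occurrence

print(n_rows, n_unique, n_duplicated)
```

When `n_rows` equals `n_unique`, as in the permits DataFrame (74,938 in both cases), no identifier is repeated.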
# Remove columns that will not be used in the analysis
columns_to_be_removed = ['EndDateTime', 'EnteredOn', 'EventAgency', 'ParkingHeld', 'CommunityBoard(s)',
'PolicePrecinct(s)', 'SubCategoryName', 'ZipCode(s)']
permits = permits.drop(columns_to_be_removed, axis = 1)
permits.head(5)
EventID | EventType | StartDateTime | Borough | Category | Country | |
---|---|---|---|---|---|---|
0 | 446040 | Shooting Permit | 10/19/2018 02:00:00 PM | Manhattan | Television | United States of America |
1 | 446168 | Shooting Permit | 10/19/2018 02:00:00 PM | Manhattan | Film | United States of America |
2 | 186438 | Shooting Permit | 10/30/2014 07:00:00 AM | Queens | Television | United States of America |
3 | 445255 | Shooting Permit | 10/20/2018 07:00:00 AM | Brooklyn | Still Photography | United States of America |
4 | 128794 | Theater Load in and Load Outs | 11/16/2013 12:01:00 AM | Manhattan | Theater | United States of America |
# Create new DataFrame without Load in/out, DCAS Prep/Shoot/Wrap and Rigging events
filming_permits = permits[permits['EventType'] == 'Shooting Permit'].copy(deep = True)
filming_permits.head(5)
EventID | EventType | StartDateTime | Borough | Category | Country | |
---|---|---|---|---|---|---|
0 | 446040 | Shooting Permit | 10/19/2018 02:00:00 PM | Manhattan | Television | United States of America |
1 | 446168 | Shooting Permit | 10/19/2018 02:00:00 PM | Manhattan | Film | United States of America |
2 | 186438 | Shooting Permit | 10/30/2014 07:00:00 AM | Queens | Television | United States of America |
3 | 445255 | Shooting Permit | 10/20/2018 07:00:00 AM | Brooklyn | Still Photography | United States of America |
5 | 43547 | Shooting Permit | 01/10/2012 07:00:00 AM | Brooklyn | Television | United States of America |
# Remove EventType column
filming_permits = filming_permits.drop('EventType', axis = 1)
filming_permits.head(5)
EventID | StartDateTime | Borough | Category | Country | |
---|---|---|---|---|---|
0 | 446040 | 10/19/2018 02:00:00 PM | Manhattan | Television | United States of America |
1 | 446168 | 10/19/2018 02:00:00 PM | Manhattan | Film | United States of America |
2 | 186438 | 10/30/2014 07:00:00 AM | Queens | Television | United States of America |
3 | 445255 | 10/20/2018 07:00:00 AM | Brooklyn | Still Photography | United States of America |
5 | 43547 | 01/10/2012 07:00:00 AM | Brooklyn | Television | United States of America |
# Convert StartDateTime column to datetime
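# Assign to the column directly: in recent pandas versions, assigning through .loc[:, col]
# writes into the existing object column and may leave its dtype unchanged
filming_permits['StartDateTime'] = pd.to_datetime(filming_permits['StartDateTime'],
                                                  format = '%m/%d/%Y %I:%M:%S %p')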
filming_permits.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 65621 entries, 0 to 74937
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   EventID        65621 non-null  int64
 1   StartDateTime  65621 non-null  datetime64[ns]
 2   Borough        65621 non-null  object
 3   Category       65621 non-null  object
 4   Country        65621 non-null  object
dtypes: datetime64[ns](1), int64(1), object(3)
memory usage: 3.0+ MB
# Use the StartDateTime column to create StartDay, StartDayOfWeek, StartMonth and StartYear columns
# Vectorised .dt accessors replace the row-wise apply; pd.to_datetime is a no-op if the column is already datetime
start = pd.to_datetime(filming_permits['StartDateTime'])
filming_permits['StartDay'] = start.dt.floor('d')
filming_permits['StartDayOfWeek'] = start.dt.day_name()
filming_permits['StartMonth'] = start.dt.month_name()
filming_permits['StartYear'] = start.dt.year
filming_permits.head(5)
EventID | StartDateTime | Borough | Category | Country | StartDay | StartDayOfWeek | StartMonth | StartYear | |
---|---|---|---|---|---|---|---|---|---|
0 | 446040 | 2018-10-19 14:00:00 | Manhattan | Television | United States of America | 2018-10-19 | Friday | October | 2018 |
1 | 446168 | 2018-10-19 14:00:00 | Manhattan | Film | United States of America | 2018-10-19 | Friday | October | 2018 |
2 | 186438 | 2014-10-30 07:00:00 | Queens | Television | United States of America | 2014-10-30 | Thursday | October | 2014 |
3 | 445255 | 2018-10-20 07:00:00 | Brooklyn | Still Photography | United States of America | 2018-10-20 | Saturday | October | 2018 |
5 | 43547 | 2012-01-10 07:00:00 | Brooklyn | Television | United States of America | 2012-01-10 | Tuesday | January | 2012 |
# StartYear column range
print (min(filming_permits['StartYear']), max(filming_permits['StartYear']))
2012 2022
# Remove rows that refer to 2022 activities to focus on full years only
filming_permits = filming_permits[filming_permits.loc[:, 'StartYear'] != 2022]
print (min(filming_permits['StartYear']), max(filming_permits['StartYear']))
2012 2021
# Number of unique permits (filming activities only)
filming_permits['EventID'].unique().shape[0]
65194
65,194 permits for filming activities were issued between 2012 and 2021.
While the Borough column refers to the first borough of activity for the day only, its values can still shed some light on the preferred boroughs for filming activities in NYC.
activities_by_borough = filming_permits['Borough'].value_counts(
ascending = False).rename_axis('borough').reset_index(name = 'activities')
activities_by_borough['activities (%)'] = (
activities_by_borough['activities'] / filming_permits.shape[0]) * 100
activities_by_borough
borough | activities | activities (%) | |
---|---|---|---|
0 | Manhattan | 28742 | 44.086879 |
1 | Brooklyn | 21578 | 33.098138 |
2 | Queens | 11762 | 18.041538 |
3 | Bronx | 2170 | 3.328527 |
4 | Staten Island | 942 | 1.444918 |
fig = px.bar(activities_by_borough.loc[::-1], x = 'activities (%)', y = 'borough',
title = """Over 77% of the filming activities were scheduled to happen in Manhattan or Brooklyn
<br><sup>Scheduled filming activities by borough (2012-2021)</sup>""",
orientation = 'h', height = 400)
fig.update_traces(marker_color = '#219ebc')
fig.update_layout({
'plot_bgcolor': '#ffffff',
'paper_bgcolor': '#ffffff',
})
fig.update_xaxes(showgrid = True, gridwidth = 1, gridcolor = '#e0e0e0')
fig.show()
activities_by_category = filming_permits['Category'].value_counts(
ascending = False).rename_axis('category').reset_index(name = 'activities')
activities_by_category['activities (%)'] = (
activities_by_category['activities'] / filming_permits.shape[0]) * 100
activities_by_category
category | activities | activities (%) | |
---|---|---|---|
0 | Television | 39775 | 61.010216 |
1 | Film | 11082 | 16.998497 |
2 | Commercial | 5764 | 8.841304 |
3 | Still Photography | 4295 | 6.588030 |
4 | WEB | 2717 | 4.167561 |
5 | Theater | 678 | 1.039973 |
6 | Documentary | 334 | 0.512317 |
7 | Student | 300 | 0.460165 |
8 | Music Video | 248 | 0.380403 |
9 | Red Carpet/Premiere | 1 | 0.001534 |
fig = px.bar(activities_by_category.loc[::-1], x = 'activities (%)', y = 'category',
title = """Over 78% of the filming activities referred to television and film productions
<br><sup>Scheduled filming activities by category (2012-2021)</sup>""",
orientation = 'h', height = 600)
fig.update_traces(marker_color = '#219ebc')
fig.update_layout({
'plot_bgcolor': '#ffffff',
'paper_bgcolor': '#ffffff',
})
fig.update_xaxes(showgrid = True, gridwidth = 1, gridcolor = '#e0e0e0')
fig.show()
# Create crosstab to check how production categories are distributed across NYC boroughs
category_borough_crosstab = pd.crosstab(filming_permits['Category'], filming_permits['Borough'],
rownames = ['category'], colnames = ['borough'], normalize = 'columns')
category_borough_crosstab
borough | Bronx | Brooklyn | Manhattan | Queens | Staten Island |
---|---|---|---|---|---|
category | |||||
Commercial | 0.080184 | 0.083604 | 0.117180 | 0.031882 | 0.045648 |
Documentary | 0.005069 | 0.003847 | 0.007724 | 0.001360 | 0.002123 |
Film | 0.207834 | 0.184911 | 0.174936 | 0.113076 | 0.300425 |
Music Video | 0.003687 | 0.004773 | 0.003897 | 0.001785 | 0.004246 |
Red Carpet/Premiere | 0.000000 | 0.000000 | 0.000035 | 0.000000 | 0.000000 |
Still Photography | 0.028571 | 0.064464 | 0.092826 | 0.014623 | 0.002123 |
Student | 0.005530 | 0.006349 | 0.003653 | 0.002891 | 0.012739 |
Television | 0.651613 | 0.610900 | 0.525050 | 0.807261 | 0.629512 |
Theater | 0.000922 | 0.000881 | 0.022511 | 0.000765 | 0.001062 |
WEB | 0.016590 | 0.040272 | 0.052188 | 0.026356 | 0.002123 |
fig = px.imshow(category_borough_crosstab * 100,
labels = dict(color = 'filming activities (%)'),
x = category_borough_crosstab.columns,
y = category_borough_crosstab.index,
title = """Television productions accounted for the majority of filming activities in all NYC boroughs
<br><sup>Scheduled filming activities by category and borough (2012-2021)</sup>""",
aspect = 'auto',
color_continuous_scale = 'viridis')
fig.show()
Film productions were the second most common category of filming activity in every NYC borough.
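The ranking behind this statement can be read directly off the crosstab. The sketch below re-enters the column-normalised shares from the table above by hand (so that it runs without the CSV file) and lists the two largest categories per borough; the variable names `shares` and `top_two` are illustrative and do not appear in the notebook.

```python
import pandas as pd

categories = ['Commercial', 'Documentary', 'Film', 'Music Video', 'Red Carpet/Premiere',
              'Still Photography', 'Student', 'Television', 'Theater', 'WEB']

# Shares copied from the category-by-borough crosstab (normalised by column)
shares = pd.DataFrame(
    {
        'Bronx': [0.080184, 0.005069, 0.207834, 0.003687, 0.000000,
                  0.028571, 0.005530, 0.651613, 0.000922, 0.016590],
        'Brooklyn': [0.083604, 0.003847, 0.184911, 0.004773, 0.000000,
                     0.064464, 0.006349, 0.610900, 0.000881, 0.040272],
        'Manhattan': [0.117180, 0.007724, 0.174936, 0.003897, 0.000035,
                      0.092826, 0.003653, 0.525050, 0.022511, 0.052188],
        'Queens': [0.031882, 0.001360, 0.113076, 0.001785, 0.000000,
                   0.014623, 0.002891, 0.807261, 0.000765, 0.026356],
        'Staten Island': [0.045648, 0.002123, 0.300425, 0.004246, 0.000000,
                          0.002123, 0.012739, 0.629512, 0.001062, 0.002123],
    },
    index=categories,
)

# Two largest categories in each borough, largest first
top_two = {borough: shares[borough].nlargest(2).index.tolist()
           for borough in shares.columns}

for borough, ranked in top_two.items():
    print(borough, ranked)
```

In the notebook itself, the same dictionary comprehension could presumably be applied to `category_borough_crosstab` instead of the hand-entered table.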
productions_by_country = filming_permits['Country'].value_counts(
ascending = False).rename_axis('country').reset_index(name = 'filming permits')
productions_by_country['filming permits (%)'] = (
productions_by_country['filming permits'] / filming_permits.shape[0]) * 100
productions_by_country
country | filming permits | filming permits (%) | |
---|---|---|---|
0 | United States of America | 65125 | 99.894162 |
1 | United Kingdom | 20 | 0.030678 |
2 | Canada | 17 | 0.026076 |
3 | Japan | 8 | 0.012271 |
4 | France | 7 | 0.010737 |
5 | Panama | 7 | 0.010737 |
6 | Australia | 5 | 0.007669 |
7 | Netherlands | 2 | 0.003068 |
8 | Ireland | 2 | 0.003068 |
9 | Germany | 1 | 0.001534 |
The vast majority of productions were domestic.
activities_by_year = filming_permits['StartYear'].value_counts(ascending = False).rename_axis(
'year').reset_index(name = 'activities')
activities_by_year
year | activities | |
---|---|---|
0 | 2015 | 7905 |
1 | 2018 | 7736 |
2 | 2013 | 7116 |
3 | 2014 | 7115 |
4 | 2017 | 7091 |
5 | 2019 | 7074 |
6 | 2016 | 7056 |
7 | 2012 | 6105 |
8 | 2021 | 5692 |
9 | 2020 | 2304 |
fig = px.bar(activities_by_year.sort_values(by = 'year'), x = 'year', y = 'activities',
title = """The number of scheduled activities plummeted in 2020 and did not return to 2013-2019 levels in 2021
<br><sup>Scheduled filming activities by year (2012-2021)</sup>""")
fig.update_traces(marker_color = '#219ebc')
fig.update_layout({
'plot_bgcolor': '#ffffff',
'paper_bgcolor': '#ffffff',
})
fig.update_yaxes(showgrid = True, gridwidth = 1, gridcolor = '#e0e0e0')
fig.show()
Only 2,304 filming activities were scheduled in 2020 as productions were heavily impacted by the COVID-19 pandemic.
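To put the 2020 figure in perspective, the year-over-year change can be computed from the yearly counts. The sketch below copies the counts from the table above into a Series (the names `counts` and `yoy_change` are illustrative) and uses `pct_change()`, which compares each year with the one before it:

```python
import pandas as pd

# Yearly counts copied from the activities_by_year table, in chronological order
counts = pd.Series(
    [6105, 7116, 7115, 7905, 7056, 7091, 7736, 7074, 2304, 5692],
    index=range(2012, 2022),
    name='activities',
)

# Percentage change relative to the previous year (NaN for 2012, which has no predecessor)
yoy_change = counts.pct_change() * 100

print(yoy_change.round(1))
```

On these counts, 2020 comes out at roughly two thirds below 2019. The 2021 value looks large in percentage terms only because it is measured against the depressed 2020 base.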
An activities-by-day bar chart may help understand how the pandemic affected productions in 2020.
# Create DataFrame for activities scheduled for 2020 only
filming_permits_2020 = filming_permits[filming_permits.loc[:, 'StartYear'] == 2020]
activities_by_day_2020 = filming_permits_2020['StartDay'].value_counts(ascending = False).rename_axis(
'day').reset_index(name = 'activities')
# Create activities-by-day bar chart
fig = px.bar(activities_by_day_2020.sort_values(by = 'day'), x = 'day', y = 'activities',
title = """No filming activities were scheduled for April, May or June 2020
<br><sup>Scheduled filming activities by day (2020)</sup>""")
fig.update_traces(marker_color = '#219ebc')
fig.update_layout({
'plot_bgcolor': '#ffffff',
'paper_bgcolor': '#ffffff',
})
fig.update_yaxes(showgrid = True, gridwidth = 1, gridcolor = '#e0e0e0')
fig.show()
activities_by_month = filming_permits['StartMonth'].value_counts(ascending = False).rename_axis(
'month').reset_index(name = 'activities')
activities_by_month['activities (%)'] = (
activities_by_month['activities'] / activities_by_month['activities'].sum()) * 100
activities_by_month.set_index('month', inplace = True)
activities_by_month = activities_by_month.sort_index(
key = lambda x: pd.to_datetime(x, format = "%B"))
activities_by_month
activities | activities (%) | |
---|---|---|
month | ||
January | 4176 | 6.405497 |
February | 4546 | 6.973034 |
March | 5613 | 8.609688 |
April | 4799 | 7.361107 |
May | 4730 | 7.255269 |
June | 4876 | 7.479216 |
July | 5462 | 8.378072 |
August | 6288 | 9.645059 |
September | 5960 | 9.141946 |
October | 7385 | 11.327730 |
November | 6468 | 9.921158 |
December | 4891 | 7.502224 |
fig = px.bar(activities_by_month, x = activities_by_month.index, y = 'activities (%)',
title = """October was the preferred month for filming activities
<br><sup>Scheduled filming activities by month (2012-2021)</sup>""")
fig.update_traces(marker_color = '#219ebc')
fig.update_layout({
'plot_bgcolor': '#ffffff',
'paper_bgcolor': '#ffffff',
})
fig.update_yaxes(showgrid = True, gridwidth = 1, gridcolor = '#e0e0e0')
fig.show()
Was October the preferred month for filming activities in every year of the 2012-2021 period? Small multiples, a series of similar charts using the same scale and axes, can help answer the question.
# Create DataFrame with percentages of filming activities by year by month
activities_year_month = filming_permits.groupby(
['StartYear', 'StartMonth'])['EventID'].count().reset_index()
activities_year_month.rename(columns = {'StartYear': 'year', 'StartMonth': 'month',
'EventID': 'activities'}, inplace = True)
activities_year_month['activities (%)'] = activities_year_month.groupby('year', group_keys=False)\
.apply(lambda x: 100 * x['activities'] / x['activities'].sum())
# Order rows by year and by month
activities_year_month['month'] = pd.Categorical(
activities_year_month['month'], categories =
['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October',
'November', 'December'],
ordered = True)
activities_year_month = activities_year_month.sort_values(by = ['year', 'month'])
activities_year_month
year | month | activities | activities (%) | |
---|---|---|---|---|
4 | 2012 | January | 341 | 5.585586 |
3 | 2012 | February | 441 | 7.223587 |
7 | 2012 | March | 548 | 8.976249 |
0 | 2012 | April | 548 | 8.976249 |
8 | 2012 | May | 548 | 8.976249 |
... | ... | ... | ... | ... |
106 | 2021 | August | 619 | 10.874912 |
116 | 2021 | September | 548 | 9.627547 |
115 | 2021 | October | 581 | 10.207309 |
114 | 2021 | November | 588 | 10.330288 |
107 | 2021 | December | 390 | 6.851722 |
117 rows × 4 columns
# Check for NaN values
activities_year_month.isnull().sum()
year              0
month             0
activities        0
activities (%)    0
dtype: int64
# Create small multiples
fig = px.bar(activities_year_month, x = 'month', y = 'activities (%)',
facet_col = 'year', facet_col_wrap = 5,
title = """October had the highest number of scheduled activities in 7 of the last 10 years
<br><sup>Scheduled filming activities by month and year (2012-2021)</sup>""",
height = 600)
fig.update_traces(marker_color = '#219ebc')
fig.update_layout({
'plot_bgcolor': '#ffffff',
'paper_bgcolor': '#ffffff',
})
fig.update_yaxes(showgrid = True, gridwidth = 1, gridcolor = '#e0e0e0')
fig.for_each_annotation(lambda a: a.update(text = a.text.replace('year=', '')))
fig.show()
October was the preferred month for filming activities in 2012, 2013, 2014, 2015, 2016, 2018 and 2019. August was the busiest month in 2017 and 2021.
January had the lowest number of filming activities in 2012, 2014, 2015, 2016, 2018, 2019 and 2021, and the highest number in 2020.
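Reading the busiest and quietest month off ten small charts is error-prone, so the same answer can also be obtained programmatically. The following is a minimal sketch on hypothetical toy data (the counts are invented and only the column names mirror `activities_year_month`); it uses `groupby(...).idxmax()` and `idxmin()` to pick, for each year, the row with the largest and smallest count:

```python
import pandas as pd

# Hypothetical toy data (not taken from the permits dataset)
toy = pd.DataFrame({
    'year': [2012, 2012, 2012, 2013, 2013, 2013],
    'month': ['January', 'August', 'October', 'January', 'August', 'October'],
    'activities': [10, 30, 40, 35, 25, 15],
})

# idxmax / idxmin return the row label of the extreme value within each year
busiest = toy.loc[toy.groupby('year')['activities'].idxmax(), ['year', 'month']]
quietest = toy.loc[toy.groupby('year')['activities'].idxmin(), ['year', 'month']]

print(busiest.to_string(index=False))
print(quietest.to_string(index=False))
```

Note that `idxmax()` and `idxmin()` return only the first occurrence when several months tie, so ties (such as equal counts in consecutive months) would need a separate check.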
activities_by_day_of_week = filming_permits['StartDayOfWeek'].value_counts(ascending = False).rename_axis(
'day of the week').reset_index(name = 'activities')
activities_by_day_of_week['activities (%)'] = (
activities_by_day_of_week['activities'] / activities_by_day_of_week['activities'].sum()) * 100
activities_by_day_of_week['day of the week'] = pd.Categorical(
activities_by_day_of_week['day of the week'], categories =
['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday', 'Sunday'],
ordered = True)
activities_by_day_of_week.set_index('day of the week', inplace = True)
activities_by_day_of_week = activities_by_day_of_week.sort_index()
activities_by_day_of_week
activities | activities (%) | |
---|---|---|
day of the week | ||
Monday | 10423 | 15.987668 |
Tuesday | 11834 | 18.151977 |
Wednesday | 12434 | 19.072307 |
Thursday | 12558 | 19.262509 |
Friday | 11802 | 18.102893 |
Saturday | 3314 | 5.083290 |
Sunday | 2829 | 4.339356 |
fig = px.bar(activities_by_day_of_week, x = activities_by_day_of_week.index, y = 'activities (%)',
title = """Over 90% of the filming activities were scheduled to start on weekdays
<br><sup>Scheduled filming activities by day of the week (2012-2021)</sup>""")
fig.update_traces(marker_color = '#219ebc')
fig.update_layout({
'plot_bgcolor': '#ffffff',
'paper_bgcolor': '#ffffff',
})
fig.update_yaxes(showgrid = True, gridwidth = 1, gridcolor = '#e0e0e0')
fig.show()
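The weekday share quoted in the chart title can be checked with a short calculation. This sketch re-enters the counts from the day-of-the-week table above (the names `counts` and `weekday_share` are illustrative) and divides the Monday-to-Friday total by the overall total:

```python
import pandas as pd

# Counts copied from the activities_by_day_of_week table
counts = pd.Series(
    {'Monday': 10423, 'Tuesday': 11834, 'Wednesday': 12434, 'Thursday': 12558,
     'Friday': 11802, 'Saturday': 3314, 'Sunday': 2829},
    name='activities',
)

weekend = ['Saturday', 'Sunday']

# Share (%) of activities that start on Monday to Friday
weekday_share = 100 * counts.drop(weekend).sum() / counts.sum()

print(round(weekday_share, 2))
```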
Was Sunday the least favourite day for filming activities in every year of the 2012-2021 period?
# Create DataFrame with percentages of filming activities by year by day of the week
activities_year_day_of_the_week = filming_permits.groupby(
['StartYear', 'StartDayOfWeek'])['EventID'].count().reset_index()
activities_year_day_of_the_week.rename(columns = {'StartYear': 'year', 'StartDayOfWeek': 'day of the week',
'EventID': 'activities'}, inplace = True)
activities_year_day_of_the_week['activities (%)'] = activities_year_day_of_the_week.groupby('year',
group_keys=False).apply(lambda x: 100 * x['activities'] / x['activities'].sum())
# Order rows by year and by day of the week
activities_year_day_of_the_week['day of the week'] = pd.Categorical(
activities_year_day_of_the_week['day of the week'], categories =
['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday', 'Sunday'],
ordered = True)
activities_year_day_of_the_week = activities_year_day_of_the_week.sort_values(by = ['year', 'day of the week'])
activities_year_day_of_the_week
year | day of the week | activities | activities (%) | |
---|---|---|---|---|
1 | 2012 | Monday | 982 | 16.085176 |
5 | 2012 | Tuesday | 1048 | 17.166257 |
6 | 2012 | Wednesday | 1090 | 17.854218 |
4 | 2012 | Thursday | 1130 | 18.509419 |
0 | 2012 | Friday | 1038 | 17.002457 |
... | ... | ... | ... | ... |
69 | 2021 | Wednesday | 1136 | 19.957836 |
67 | 2021 | Thursday | 1090 | 19.149684 |
63 | 2021 | Friday | 1059 | 18.605060 |
65 | 2021 | Saturday | 265 | 4.655657 |
66 | 2021 | Sunday | 171 | 3.004216 |
70 rows × 4 columns
# Check for NaN values
activities_year_day_of_the_week.isnull().sum()
year               0
day of the week    0
activities         0
activities (%)     0
dtype: int64
# Create small multiples
fig = px.bar(activities_year_day_of_the_week, x = 'day of the week', y = 'activities (%)',
facet_col = 'year', facet_col_wrap = 5,
title = """Sunday had the lowest number of scheduled filming activities in the last 10 years
<br><sup>Scheduled filming activities by day of the week and year (2012-2021)</sup>""",
height = 600)
fig.update_traces(marker_color = '#219ebc')
fig.update_layout({
'plot_bgcolor': '#ffffff',
'paper_bgcolor': '#ffffff',
})
fig.update_yaxes(showgrid = True, gridwidth = 1, gridcolor = '#e0e0e0')
fig.for_each_annotation(lambda a: a.update(text = a.text.replace('year=', '')))
fig.show()
Thursday was the preferred day for filming activities in 2012, 2013, 2014, 2015, 2016 and 2018. Wednesday had the highest number of filming activities in 2017, 2019 and 2021.
# Create DataFrame with percentages of filming activities by year by borough
activities_year_borough = filming_permits.groupby(
['StartYear', 'Borough'])['EventID'].count().reset_index()
activities_year_borough.rename(columns = {'StartYear': 'year', 'Borough': 'borough',
'EventID': 'activities'}, inplace = True)
activities_year_borough['activities (%)'] = activities_year_borough.groupby('year', group_keys=False)\
.apply(lambda x: 100 * x['activities'] / x['activities'].sum())
activities_year_borough = activities_year_borough.sort_values(by = ['year', 'borough'])
activities_year_borough
year | borough | activities | activities (%) | |
---|---|---|---|---|
0 | 2012 | Bronx | 97 | 1.588862 |
1 | 2012 | Brooklyn | 1900 | 31.122031 |
2 | 2012 | Manhattan | 3113 | 50.990991 |
3 | 2012 | Queens | 901 | 14.758395 |
4 | 2012 | Staten Island | 94 | 1.539722 |
5 | 2013 | Bronx | 131 | 1.840922 |
6 | 2013 | Brooklyn | 2171 | 30.508713 |
7 | 2013 | Manhattan | 3517 | 49.423834 |
8 | 2013 | Queens | 1189 | 16.708825 |
9 | 2013 | Staten Island | 108 | 1.517707 |
10 | 2014 | Bronx | 162 | 2.276880 |
11 | 2014 | Brooklyn | 2304 | 32.382291 |
12 | 2014 | Manhattan | 3381 | 47.519325 |
13 | 2014 | Queens | 1178 | 16.556571 |
14 | 2014 | Staten Island | 90 | 1.264933 |
15 | 2015 | Bronx | 231 | 2.922201 |
16 | 2015 | Brooklyn | 2508 | 31.726755 |
17 | 2015 | Manhattan | 3597 | 45.502846 |
18 | 2015 | Queens | 1483 | 18.760278 |
19 | 2015 | Staten Island | 86 | 1.087919 |
20 | 2016 | Bronx | 225 | 3.188776 |
21 | 2016 | Brooklyn | 2348 | 33.276644 |
22 | 2016 | Manhattan | 3100 | 43.934240 |
23 | 2016 | Queens | 1286 | 18.225624 |
24 | 2016 | Staten Island | 97 | 1.374717 |
25 | 2017 | Bronx | 294 | 4.146101 |
26 | 2017 | Brooklyn | 2363 | 33.323932 |
27 | 2017 | Manhattan | 3046 | 42.955860 |
28 | 2017 | Queens | 1273 | 17.952334 |
29 | 2017 | Staten Island | 115 | 1.621774 |
30 | 2018 | Bronx | 360 | 4.653568 |
31 | 2018 | Brooklyn | 2671 | 34.526887 |
32 | 2018 | Manhattan | 3191 | 41.248707 |
33 | 2018 | Queens | 1390 | 17.967942 |
34 | 2018 | Staten Island | 124 | 1.602896 |
35 | 2019 | Bronx | 252 | 3.562341 |
36 | 2019 | Brooklyn | 2350 | 33.220243 |
37 | 2019 | Manhattan | 2852 | 40.316653 |
38 | 2019 | Queens | 1447 | 20.455188 |
39 | 2019 | Staten Island | 173 | 2.445575 |
40 | 2020 | Bronx | 123 | 5.338542 |
41 | 2020 | Brooklyn | 813 | 35.286458 |
42 | 2020 | Manhattan | 833 | 36.154514 |
43 | 2020 | Queens | 521 | 22.612847 |
44 | 2020 | Staten Island | 14 | 0.607639 |
45 | 2021 | Bronx | 295 | 5.182713 |
46 | 2021 | Brooklyn | 2150 | 37.772312 |
47 | 2021 | Manhattan | 2112 | 37.104708 |
48 | 2021 | Queens | 1094 | 19.219958 |
49 | 2021 | Staten Island | 41 | 0.720309 |
# Check for NaN values
activities_year_borough.isnull().sum()
year              0
borough           0
activities        0
activities (%)    0
dtype: int64
# Create small multiples
fig = px.bar(activities_year_borough, x = 'borough', y = 'activities (%)',
facet_col = 'year', facet_col_wrap = 5,
title = """Manhattan's share of filming activities fell by nearly 14 percentage points between 2012 and 2021
<br><sup>Scheduled filming activities by borough and year (2012-2021)</sup>""",
height = 600)
fig.update_traces(marker_color = '#219ebc')
fig.update_layout({
'plot_bgcolor': '#ffffff',
'paper_bgcolor': '#ffffff',
})
fig.update_yaxes(showgrid = True, gridwidth = 1, gridcolor = '#e0e0e0')
fig.for_each_annotation(lambda a: a.update(text = a.text.replace('year=', '')))
fig.show()
Brooklyn's share of filming activities increased by about 6.7 percentage points between 2012 and 2021 (from 31.1% to 37.8%).
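A change in percentage points is the plain difference between two shares, not a relative (percent) change. As a rough sketch, the 2012 and 2021 shares from the table above can be re-entered by hand and subtracted; `shares` and `change_pp` are illustrative names that do not appear in the notebook:

```python
import pandas as pd

# Shares (%) copied from the 2012 and 2021 rows of the activities_year_borough table
shares = pd.DataFrame(
    {
        2012: [1.588862, 31.122031, 50.990991, 14.758395, 1.539722],
        2021: [5.182713, 37.772312, 37.104708, 19.219958, 0.720309],
    },
    index=['Bronx', 'Brooklyn', 'Manhattan', 'Queens', 'Staten Island'],
)

# Percentage-point change: share in 2021 minus share in 2012
change_pp = (shares[2021] - shares[2012]).round(2)

print(change_pp.sort_values())
```

Sorting the result puts the borough that lost the most share first and the one that gained the most last.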
# Create DataFrame with percentages of filming activities by year by category
activities_year_category = filming_permits.groupby(
['StartYear', 'Category'])['EventID'].count().reset_index()
activities_year_category.rename(columns = {'StartYear': 'year', 'Category': 'category',
'EventID': 'activities'}, inplace = True)
activities_year_category['activities (%)'] = activities_year_category.groupby('year', group_keys=False)\
.apply(lambda x: 100 * x['activities'] / x['activities'].sum())
activities_year_category = activities_year_category.sort_values(by = ['year', 'category'])
activities_year_category
year | category | activities | activities (%) | |
---|---|---|---|---|
0 | 2012 | Commercial | 699 | 11.449631 |
1 | 2012 | Documentary | 34 | 0.556921 |
2 | 2012 | Film | 1383 | 22.653563 |
3 | 2012 | Music Video | 27 | 0.442260 |
4 | 2012 | Still Photography | 452 | 7.403767 |
... | ... | ... | ... | ... |
86 | 2021 | Still Photography | 268 | 4.708363 |
87 | 2021 | Student | 17 | 0.298665 |
88 | 2021 | Television | 3951 | 69.413212 |
89 | 2021 | Theater | 19 | 0.333802 |
90 | 2021 | WEB | 270 | 4.743500 |
91 rows × 4 columns
# Check for NaN values
activities_year_category.isnull().sum()
year              0
category          0
activities        0
activities (%)    0
dtype: int64
# Create small multiples
fig = px.bar(activities_year_category, x = 'category', y = 'activities (%)',
facet_col = 'year', facet_col_wrap = 5,
title = """Filming activities related to television productions increased by 18 p.p. between 2012 and 2021
<br><sup>Scheduled filming activities by category and year (2012-2021)</sup>""",
height = 600)
fig.update_traces(marker_color = '#219ebc')
fig.update_layout({
'plot_bgcolor': '#ffffff',
'paper_bgcolor': '#ffffff',
})
fig.update_yaxes(showgrid = True, gridwidth = 1, gridcolor = '#e0e0e0')
fig.for_each_annotation(lambda a: a.update(text = a.text.replace('year=', '')))
fig.show()
Filming activities related to film productions decreased by 11 p.p. between 2012 and 2021.