Distance Between Chinatown and Financial District in Major U.S. Cities


I have lived in New York City and Boston and have visited several other cities. In all of them I noticed the same thing: Chinatown always seems to be close to the financial district.

So I decided to do a simple analysis to see whether that’s always the case.

List of Cities

I searched online and found this list: 10 best Chinatowns across the USA.

Then I did some very simple web scraping to get the city names and put them into a data frame.

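library(rvest)    # read_html(), html_nodes(), html_text()
library(dplyr)    # %>%, as_tibble(), filter(), rename(), mutate()
library(stringr)  # str_detect(), str_replace()

url <- "http://www.usatoday.com/story/travel/destinations/2014/03/08/chinatown-chinese-asian-food/6173601/"

usa_news <- read_html(url)
city <- usa_news %>% 
  html_nodes("b") %>% 
  html_text() %>% 
  as_tibble() %>%   # replaces the deprecated tbl_df(); the column is named `value`
  # keep entries that start with a digit, and drop Honolulu
  filter(str_detect(value, "^\\d"), !str_detect(value, "Honolulu")) %>% 
  rename(city = value) %>% 
  # strip the leading rank, e.g. "1. "
  mutate(city = str_replace(city, "^\\d{1,2}\\. ", ""))
city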
              City
1    San Francisco
2    New York City
3          Chicago
4          Seattle
5     Philadelphia
6           Boston
7      Los Angeles
8          Houston
9 Washington, D.C.

Now I need to find where Chinatowns and the Financial Districts are located in those cities.

Note: I removed Honolulu because I only want to consider the contiguous United States.
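
To make the two string steps in the scraping pipeline concrete, here is a small offline sketch using base R equivalents (grepl for str_detect, sub for str_replace). The input strings are made up for illustration, on the assumption that the scraped entries look like "1. San Francisco"; the rank given to Honolulu here is invented.

```r
# Made-up sample of scraped text (assumption: real entries look like "1. City")
raw <- c("1. San Francisco", "Related stories", "10. Honolulu", "2. New York City")

# Keep entries that start with a digit and are not Honolulu
keep <- grepl("^\\d", raw) & !grepl("Honolulu", raw)

# Strip the leading rank ("1. ", "10. ") from the entries that remain
cities <- sub("^\\d{1,2}\\. ", "", raw[keep])
print(cities)
```

The non-numbered line and the Honolulu entry are dropped, and the remaining names lose their rank prefix.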

List of Zip Codes

I looked up the zip codes of each Chinatown and Financial District manually and added them to the city data frame.

ct_ft <- city %>% 
  bind_cols(
    tribble(
      ~Chinatown, ~Financial,
      "94108", "94111",
      "10013", "10038",
      "60616", "60605",
      "98104", "98104",
      "19107", "19103",
      "02111", "02110",
      "90012", "90071",
      "77036", "77002",
      "20001", "20433"
    )
  )

Manual look-up is not ideal. At first I tried to find the polygon coordinates of the two areas programmatically, but the Google Maps API doesn’t seem to support that, even though Google apparently has the data :(

Then I tried to use the geocode function from the ggmap package to get the zip codes:

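library(tidyr)     # gather()
library(purrrlyr)  # invoke_rows(), which used to live in purrr
library(ggmap)     # geocode()

ct_fd <- city %>%
  mutate(city = str_replace(city, "^\\d{1,2}\\. ", ""),
         chinatown = str_c(city, " Chinatown"),
         financial_district = str_c(city, " Financial District")) %>%
  gather(area, query_name, -city) %>%   # one row per city and area
  arrange(city)

# geocode each query string and keep the result in a `geo_code` column
ct_fd_geo <- ct_fd %>%
  invoke_rows(function(query_name, ...) geocode(query_name, output = "more"), ., .to = "geo_code")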

However, in certain cities Chinatown and/or the Financial District are not well-defined, so the above function returned some empty results. As a compromise, I did some research and found the zip codes by hand, trying to pick the zip code at the center of each area.
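
One detail worth noting when hard-coding zip codes: they are kept as strings because some, such as Boston's 02111, start with a zero that a numeric type would drop. A minimal base R illustration (the variable names are only for this example):

```r
# Boston's two zip codes from the table above, stored as character
zip_chr <- c("02111", "02110")

# Converting to numeric silently loses the leading zero
zip_num <- as.numeric(zip_chr)
dropped <- as.character(zip_num)

# If zips arrive as numbers, they can be padded back to five digits
restored <- sprintf("%05d", as.integer(zip_num))

print(dropped)
print(restored)
```

Keeping the column as character from the start avoids the problem and lets the zip codes match a lookup table that also stores them as text.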

Distances

To get the distances, I used the mapdist function from the ggmap package.
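
The call itself is not shown in this post. Here is a minimal sketch of how it might look, assuming mapdist() is called with mode = "walking" and returns from, to, and the distance columns. The names walking_distances, dist_fun, and fake_mapdist are hypothetical and introduced only here; the stand-in function replaces the real Google request so the example runs offline.

```r
# Hypothetical helper: query a distance function for each Chinatown/Financial
# pair and sort the result by distance. `dist_fun` defaults to ggmap::mapdist,
# which needs the ggmap package and a Google API key.
walking_distances <- function(pairs, dist_fun = ggmap::mapdist) {
  d <- as.data.frame(dist_fun(from = pairs$Chinatown,
                              to = pairs$Financial,
                              mode = "walking"))
  # Match results back to the input rows by zip pair, not by row position
  out <- merge(pairs, d,
               by.x = c("Chinatown", "Financial"),
               by.y = c("from", "to"))
  out[order(out$m), ]
}

# Offline stand-in for mapdist (assumption), using two distances from the table
fake_mapdist <- function(from, to, mode) {
  metres <- unname(c("94108" = 1334, "98104" = 0)[from])
  data.frame(from = from, to = to, m = metres, km = metres / 1000,
             stringsAsFactors = FALSE)
}

pairs <- data.frame(city = c("San Francisco", "Seattle"),
                    Chinatown = c("94108", "98104"),
                    Financial = c("94111", "98104"),
                    stringsAsFactors = FALSE)

result <- walking_distances(pairs, dist_fun = fake_mapdist)
print(result)
```

Run against the real service for all nine cities, a call of this kind would yield the distance and duration columns shown in the table that follows.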

              City Chinatown Financial  from    to     m     km     miles seconds   minutes      hours
1          Seattle     98104     98104 98104 98104     0  0.000  0.0000000       0   0.00000 0.0000000
2    San Francisco     94108     94111 94108 94111  1334  1.334  0.8289476     989  16.48333 0.2747222
3           Boston     02111     02110 02111 02110  1875  1.875  1.1651250    1442  24.03333 0.4005556
4      Los Angeles     90012     90071 90012 90071  1893  1.893  1.1763102    1412  23.53333 0.3922222
5     Philadelphia     19107     19103 19107 19103  1932  1.932  1.2005448    1510  25.16667 0.4194444
6    New York City     10013     10038 10013 10038  2310  2.310  1.4354340    1731  28.85000 0.4808333
7 Washington, D.C.     20001     20433 20001 20433  3000  3.000  1.8642000    2390  39.83333 0.6638889
8          Chicago     60616     60605 60616 60605  4022  4.022  2.4992708    2975  49.58333 0.8263889
9          Houston     77036     77002 77036 77002 18679 18.679 11.6071306   14067 234.45000 3.9075000

The table shows that eight of the nine pairs are within 3 miles of each other, and six of the nine are within a 30-minute walk. Seattle's distance is 0 only because both areas share zip code 98104. I measured distance in walking mode because walking routes are more flexible and not affected by traffic.
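
The claims above can be checked directly from the table. This sketch re-enters the metre and second columns by hand and recomputes miles with the factor 0.6214, which reproduces the miles column (for example, 1.334 km × 0.6214 = 0.8289476).

```r
# Distances (metres) and walking times (seconds) copied from the table above
dist <- data.frame(
  city = c("Seattle", "San Francisco", "Boston", "Los Angeles", "Philadelphia",
           "New York City", "Washington, D.C.", "Chicago", "Houston"),
  m = c(0, 1334, 1875, 1893, 1932, 2310, 3000, 4022, 18679),
  seconds = c(0, 989, 1442, 1412, 1510, 1731, 2390, 2975, 14067),
  stringsAsFactors = FALSE
)

dist$miles <- dist$m / 1000 * 0.6214   # same conversion factor as the table
dist$minutes <- dist$seconds / 60

within_3_miles <- sum(dist$miles <= 3)    # 8 of 9 pairs
within_30_min <- sum(dist$minutes <= 30)  # 6 of 9 pairs
over_30_min <- dist$city[dist$minutes > 30]

print(c(within_3_miles = within_3_miles, within_30_min = within_30_min))
print(over_30_min)
```

Washington, D.C., Chicago, and Houston are the three pairs that take longer than 30 minutes on foot, and Houston is the only one beyond 3 miles.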

Draw on Maps

Finally, I made some visualizations to show Chinatown and the Financial District on each city map. I used the zipcode dataset from the zipcode package to get the longitude and latitude of those zip codes so that I could plot them on a map.

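library(zipcode)  # provides the `zipcode` data set
library(tidyr)    # gather()

data("zipcode")

ct_ft_zips <- ct_ft %>% 
  gather(Area, zip, -city) %>%   # long format: one row per city and area
  left_join(zipcode %>% 
              select(-city),     # drop zipcode's own city column to avoid a name clash
            by = "zip") %>% 
  arrange(city)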
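
For readers unfamiliar with gather(), this is roughly what the reshaping step does before the join, sketched in base R on two of the cities so that it runs without extra packages. The names ct_ft_demo and long are used only for this illustration.

```r
# Two rows of the wide table from earlier in the post
ct_ft_demo <- data.frame(city = c("Boston", "Houston"),
                         Chinatown = c("02111", "77036"),
                         Financial = c("02110", "77002"),
                         stringsAsFactors = FALSE)

# Stack the two zip columns into one, recording which area each zip belongs to
long <- data.frame(
  city = rep(ct_ft_demo$city, times = 2),
  Area = rep(c("Chinatown", "Financial"), each = nrow(ct_ft_demo)),
  zip = c(ct_ft_demo$Chinatown, ct_ft_demo$Financial),
  stringsAsFactors = FALSE
)

# Same ordering idea as arrange(city), with Area as a tie-breaker
long <- long[order(long$city, long$Area), ]
print(long)
```

Each city now contributes two rows, one per area, which is the shape needed to join coordinates by zip and to color the points by Area.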

Next, I created a function to draw the maps so that I would not have to write similar code again and again.

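map_area <- function(city, state, zoom, size) {
  # "toner-lite" is a Stamen map type, so request it from the Stamen source
  ggmap(get_map(str_c(city, ", ", state), zoom = zoom, maptype = "toner-lite", source = "stamen")) +
    # !!state forces the function argument; a bare `state == state` would
    # compare the state column with itself and keep every row
    geom_point(data = ct_ft_zips %>% filter(state == !!state), 
               aes(x = longitude, y = latitude, color = Area), size = size) +
    labs(x = "Longitude", y = "Latitude")
}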
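
A subtle point in helpers like map_area: inside filter(), a bare name is looked up in the data frame's columns first, so a function argument that shares its name with a column (here, state) needs to be disambiguated. Base R's subset() follows the same lookup rule, so the effect can be seen without any packages. The data frame and function names below are hypothetical.

```r
# Toy points table with a `state` column, like ct_ft_zips
pts <- data.frame(city = c("Seattle", "Boston"),
                  state = c("WA", "MA"),
                  stringsAsFactors = FALSE)

# Argument named like the column: both sides of `==` resolve to the column,
# so the condition is always TRUE and nothing is filtered out
pick_masked <- function(state) subset(pts, state == state)

# Argument with a distinct name: the comparison uses the value passed in
pick_fixed <- function(state_abbr) subset(pts, state == state_abbr)

n_masked <- nrow(pick_masked("WA"))
n_fixed <- nrow(pick_fixed("WA"))
print(c(masked = n_masked, fixed = n_fixed))
```

Renaming the argument, or forcing it with !! in dplyr, makes the filter keep only the rows for the requested state.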

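1. Seattle, WA

[Map of Seattle, WA with the Chinatown and Financial District zip codes marked]

2. San Francisco, CA

[Map of San Francisco, CA with the Chinatown and Financial District zip codes marked]

3. Boston, MA

[Map of Boston, MA with the Chinatown and Financial District zip codes marked]

4. Los Angeles, CA

[Map of Los Angeles, CA with the Chinatown and Financial District zip codes marked]

5. Philadelphia, PA

[Map of Philadelphia, PA with the Chinatown and Financial District zip codes marked]

6. New York City, NY

[Map of New York City, NY with the Chinatown and Financial District zip codes marked]

7. Washington, DC

[Map of Washington, DC with the Chinatown and Financial District zip codes marked]

8. Chicago, IL

[Map of Chicago, IL with the Chinatown and Financial District zip codes marked]

9. Houston, TX

[Map of Houston, TX with the Chinatown and Financial District zip codes marked]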

Summary

Personally, I am a little disappointed: the two areas are not as close as I thought they would be, and in Houston they are more than 11 miles apart.

There are also some issues with this analysis:

  • I wasn’t able to get border coordinates of the two areas in all the cities

  • In some cities, Financial Districts are not well-defined

  • The sample size is small, as there are many other major cities that are not included

  • The zoom settings differ from city to city when plotting points on the map because the cities differ in area

  • Closeness is somewhat vague and subjective. Here I used absolute distance, but I could also use relative distance that takes city size into account

The good thing is that this analysis gave me a chance to test my initial hypothesis. Whether or not the result supports the hypothesis, data analysis is always fun.
