quinn's code showcase!

I started coding in Python in 2017, back when Codecademy had a functioning free tier and Stack Overflow was alive. As a result, I now speak Python more fluently than I speak Chinese -- one of my two native languages -- and I was a TA for several introductory Python courses at NYU [1].

While studying data science at NYU, I built a strong foundation in statistical analysis and machine learning, and picked up frontend development as a hobby in my spare time. The data analysis and ML work took me to the NYC Mayor's Office of Management and Budget, where I did just that, and my frontend explorations somehow landed me collaborations with Dan Toomey of YouTube's Good Work, Michelladonna of TikTok's Shop Cats, and Neal of neal.fun.

The following are three projects that showcase my work in data analysis, statistical testing, and machine learning. The repositories are hyperlinked in the project titles. To see more of my frontend work, you can have a browse here.


⇱ The 411 on 311: NYC Service Requests, Mapped and Analyzed

Python, Data Analysis, Data Viz, HTML/CSS, JavaScript, Frontend

Background: The last apartment building I was in had a tenant group chat that I'm pretty sure fired off an average of ten 311 service requests per week. We had it all: mice, roaches, leaks, heat, noise... it made me think about how NYC is just this, but hundreds of thousands of times over.

The city's 311 system receives millions of service requests every year, routed across 17 city agencies. But not every complaint gets the same response, as most of us probably know. So I looked at whether a New Yorker's ZIP code or the type of complaint they file predicts how long they'll wait for the city to act, and mapped it out with interactive tooltips.

Data sources:

Findings: The average New Yorker waits ~20 days for a 311 complaint to close -- but that masks a range from 4 hours to nearly 2 years depending on complaint type and location. Complaints routed to the NYPD (noise, parking) close in hours; those sent to regulatory agencies (housing, food inspection, tree requests) can sit open for months. Full findings and methods are here [2], and you can read the full story here; it has pretty maps!
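As a rough illustration of the kind of aggregation behind numbers like these, here is a minimal sketch that computes time-to-close per complaint type. It is not the project's actual code: the field names (`complaint_type`, `created`, `closed`), the timestamp format, and the rows themselves are made up for the example, and it uses the median rather than the mean because a handful of very old requests can dominate an average.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

# Toy rows invented for illustration only; these are NOT real 311 records.
ROWS = [
    {"complaint_type": "Noise - Residential", "created": "2024-03-01 22:00", "closed": "2024-03-02 01:00"},
    {"complaint_type": "Noise - Residential", "created": "2024-03-05 23:30", "closed": "2024-03-06 04:30"},
    {"complaint_type": "HEAT/HOT WATER", "created": "2024-01-10 08:00", "closed": "2024-01-14 08:00"},
    {"complaint_type": "HEAT/HOT WATER", "created": "2024-01-12 09:00", "closed": "2024-01-20 09:00"},
    {"complaint_type": "HEAT/HOT WATER", "created": "2024-02-01 09:00", "closed": ""},  # still open
]

FMT = "%Y-%m-%d %H:%M"  # assumed timestamp format for this sketch


def median_hours_to_close(rows):
    """Return {complaint type: median hours from creation to closure}.

    Requests with no closing timestamp are skipped, which biases the
    result toward faster resolutions; a real analysis would need to
    account for still-open requests.
    """
    hours = defaultdict(list)
    for row in rows:
        if not row["closed"]:
            continue
        created = datetime.strptime(row["created"], FMT)
        closed = datetime.strptime(row["closed"], FMT)
        hours[row["complaint_type"]].append((closed - created).total_seconds() / 3600)
    return {ctype: median(vals) for ctype, vals in hours.items()}


result = median_hours_to_close(ROWS)
print(result)  # {'Noise - Residential': 4.0, 'HEAT/HOT WATER': 144.0}
```

Grouping by ZIP code instead of complaint type would work the same way, with a ZIP field as the dictionary key.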

Key skills: large-scale data cleaning, geospatial analysis (GeoPandas/GeoJSON), D3.js choropleth mapping, scroll-driven data storytelling (scrollytelling).


⇱ Folio: A Personalized arXiv Paper Recommendation Engine

Python, Machine Learning, NLP, Information Retrieval, Streamlit, Full Stack

Background: For our senior project in our machine learning class, our group built a personalized arXiv paper recommendation engine. The app, called Folio, generates a daily feed of arXiv papers tailored to your research interests, which you can update in real time with feedback on what you like and don't like. It's built as a local Streamlit app that runs entirely on your machine.

At onboarding, your research interests (entered as free text, curated tags, or imported from a public Google Scholar profile) are decomposed into 1–3 "research thread" vectors. Each day, those vectors drive retrieval from a k-means-clustered SPECTER2 embedding index of arXiv papers, and a custom scoring function balances relevance, recency, and feed diversity. Feedback (likes, saves, skips) updates your interest vectors in real time via an exponential moving average.
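To make the feedback loop and scoring idea concrete, here is a minimal sketch under stated assumptions. It is not Folio's actual implementation: the function names, the weights, the `alpha` step size, the recency half-life, and the "subtract the similarity to papers already in the feed" diversity penalty are all hypothetical choices for illustration, and it only models positive feedback (a like pulls the interest vector toward the paper).

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def ema_update(interest, paper, alpha=0.2):
    """Exponential moving average step: new = (1 - alpha) * old + alpha * paper.

    alpha is an assumed step size; the result is re-normalized to unit length.
    """
    blended = [(1 - alpha) * i + alpha * p for i, p in zip(interest, paper)]
    return normalize(blended)


def score(paper_vec, age_days, interest, already_picked,
          w_rel=1.0, w_rec=0.3, w_div=0.5, half_life_days=7.0):
    """Hypothetical score: relevance plus recency minus redundancy.

    - relevance: cosine similarity to the interest vector
    - recency: halves every `half_life_days`
    - redundancy: highest similarity to any paper already in the feed
    """
    relevance = cosine(interest, paper_vec)
    recency = 0.5 ** (age_days / half_life_days)
    redundancy = max((cosine(paper_vec, p) for p in already_picked), default=0.0)
    return w_rel * relevance + w_rec * recency - w_div * redundancy


# Usage: a "like" on a paper nudges the interest vector toward that paper.
interest = [1.0, 0.0]
liked_paper = [0.0, 1.0]
updated = ema_update(interest, liked_paper)
print(cosine(interest, liked_paper), cosine(updated, liked_paper))
```

A small `alpha` keeps the feed stable across a single stray click, while a larger one makes it react faster to feedback; the right value is a tuning question this sketch does not answer.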

Architecture highlights:

Key skills: vector search and embedding index design, k-means clustering for retrieval, recommendation system design (EMA centroid updates, diversity scoring), UMAP dimensionality reduction for embedding visualization, full-stack Streamlit app architecture, SQLite schema design.


⇱ NYC Congestion Pricing Air Quality Analysis

Python, Data Analysis, Data Viz, Causal Inference

Background: While I was working on the Policy & Operations task force at NYC OMB, we looked at the unexpected effects of congestion pricing, and I focused on air quality. Smog from car exhaust and other emissions contributes to poor air quality in NYC, particularly in areas with high traffic density. So when NYC implemented congestion pricing, starting January 5, 2025, we wondered whether air pollution would improve as a result. As our metric, we used PM2.5 particulate matter concentrations from the NYC Department of Environmental Protection.

Data sources:

Findings: PM2.5 levels in NYC were meaningfully lower in spring 2025 than in spring 2024, across all five boroughs and all hours of the day, but not by enough to reach a significance threshold [3]. Read more about the methods and results here!
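For readers new to difference-in-differences (DiD), here is a minimal sketch of the core calculation. It is not the analysis code from the repository: which monitors count as "treated" versus "control" is an assumption made for this example, and the PM2.5 readings are invented. The idea is to compare the change in the treated group against the change in a comparison group over the same period, so that citywide trends such as weather cancel out.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences on group means.

    (treated_post - treated_pre) - (control_post - control_pre)
    A negative value means the treated group dropped more than the control.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Toy PM2.5 readings in µg/m³, invented for illustration only.
# Assumption: "treated" = monitors affected by the policy,
# "control" = comparison monitors that were not.
treated_pre = [9.0, 10.0, 11.0]
treated_post = [7.0, 8.0, 9.0]
control_pre = [8.0, 9.0, 10.0]
control_post = [7.0, 8.0, 9.0]

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
print(effect)  # -1.0
```

A point estimate like this says nothing about significance on its own; a real analysis would pair it with a regression or a hypothesis test to get a p-value.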

Key skills: multi-source data ingestion, time series aggregation, causal inference (DiD), statistical hypothesis testing, data visualization.

  1. believe me, my dual fluency in Python and Chinese came in handy for that!

  2. i also unearthed a years-long grudge between a building off the corner of Central Park and a food vendor; more details in the README

  3. specifically, the standard significance threshold of p < 0.05 was not reached; whether or not the effect is practically significant is up for interpretation