quinn's code showcase!
I started coding in Python in 2017, back when Codecademy had a functioning free tier and Stack Overflow was alive. As a result, I now speak Python more fluently than I speak Chinese -- a language I'm natively bilingual in -- and was a TA for several introductory Python courses at NYU.
While studying data science at NYU, I developed a strong foundation in statistical analysis and machine learning, and learned frontend development in my spare time as a hobby. My work in data analysis and ML found me doing just that at the NYC Mayor's Office of Management and Budget, and my frontend explorations somehow landed me collaborations with Dan Toomey of YouTube's Good Work, Michelladonna of TikTok's Shop Cats, and Neal of neal.fun.
The following are three projects that showcase my work in data analysis, statistical testing, and machine learning. The repositories are hyperlinked in the project titles. To see more of my frontend work, you can have a browse here.
⇱ The 411 on 311: NYC Service Requests, Mapped and Analyzed
Python, Data Analysis, Data Viz, HTML/CSS, JavaScript, Frontend
Background: The last apartment building I was in had a tenant group chat that I'm pretty sure fired off an average of ten 311 service requests per week. We had it all: mice, roaches, leaks, heat, noise... it made me think about how NYC is just this, but hundreds of thousands of times over.
The city's 311 system receives millions of service requests every year, routed across 17 city agencies. But not every complaint gets the same response, as most of us probably know. So I looked at whether a New Yorker's ZIP code or the type of complaint they file predicts how long they'll wait for the city to act, and mapped it out with interactive tooltips.
Data sources:
- 15.35 million 311 service requests (Oct 2020 – Oct 2025) from NYC OpenData, covering 202 ZIP codes, 248 complaint types, and 17 agencies.
- NYC Modified Zip Code Tabulation Areas (MODZCTA) shapefile for geographic boundary mapping.
Findings: The average New Yorker waits ~20 days for a 311 complaint to close -- but that masks a range from 4 hours to nearly 2 years depending on complaint type and location. Complaints routed to the NYPD (noise, parking) close in hours; those sent to regulatory agencies (housing, food inspection, tree requests) can sit open for months. Full findings and methods are here, and you can read the full story here; it has pretty maps!
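Roughly, the wait-time measurement behind these findings can be sketched as below. This is a minimal standard-library illustration, not the actual analysis code: the field names (`created_date`, `closed_date`, `complaint_type`, `incident_zip`) are assumptions modeled on the NYC OpenData export, and the sample rows are made up for the example.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def mean_hours_to_close(requests, key):
    """Return the mean hours from creation to closure, grouped by `key`."""
    durations = defaultdict(list)
    for row in requests:
        if not row.get("closed_date"):
            continue  # request is still open, so it has no wait time yet
        created = datetime.fromisoformat(row["created_date"])
        closed = datetime.fromisoformat(row["closed_date"])
        hours = (closed - created).total_seconds() / 3600
        if hours >= 0:  # drop rows where the close precedes the creation
            durations[row[key]].append(hours)
    return {group: mean(values) for group, values in durations.items()}


# Made-up rows for illustration only; these are not real 311 requests.
sample_requests = [
    {"created_date": "2025-01-01T08:00:00", "closed_date": "2025-01-01T12:00:00",
     "complaint_type": "Noise - Residential", "incident_zip": "10003"},
    {"created_date": "2025-01-01T08:00:00", "closed_date": "2025-01-01T10:00:00",
     "complaint_type": "Noise - Residential", "incident_zip": "11201"},
    {"created_date": "2025-01-01T00:00:00", "closed_date": "2025-01-11T00:00:00",
     "complaint_type": "HEAT/HOT WATER", "incident_zip": "10003"},
    {"created_date": "2025-01-02T00:00:00", "closed_date": "",
     "complaint_type": "HEAT/HOT WATER", "incident_zip": "11201"},
]

by_type = mean_hours_to_close(sample_requests, "complaint_type")
by_zip = mean_hours_to_close(sample_requests, "incident_zip")
```

Grouping the same rows by `complaint_type` and then by `incident_zip` gives the two views compared above. Swapping `mean` for `median` would be a reasonable choice if a few very old requests dominate the average.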
Key skills: large-scale data cleaning, geospatial analysis (GeoPandas/GeoJSON), D3.js choropleth mapping, scroll-driven data storytelling (scrollytelling).
⇱ Folio: A Personalized arXiv Paper Recommendation Engine
Python, Machine Learning, NLP, Information Retrieval, Streamlit, Full Stack
Background: For our senior project in our machine learning class, our group built a personalized arXiv paper recommendation engine. The app, called Folio, generates a daily feed of arXiv papers tailored to your research interests, which you can update in real time with feedback on what you like and don't like. It's built as a local Streamlit app that runs entirely on your machine.
At onboarding, your research interests (entered as free text, curated tags, or imported from a public Google Scholar profile) are decomposed into 1–3 "research thread" vectors. Each day, those vectors drive retrieval from a k-means-clustered SPECTER2 embedding index of arXiv papers, and a custom scoring function balances relevance, recency, and feed diversity. Feedback (likes, saves, skips) updates your interest vectors in real time via an exponential moving average.
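To make the feedback step concrete, here is a minimal sketch of an exponential-moving-average update on a single interest vector. It is an illustration under assumptions, not Folio's actual code: the function name, the `FEEDBACK_WEIGHTS` values, and the default `alpha` are all hypothetical, and vectors are assumed to be unit-normalized NumPy arrays.

```python
import numpy as np

# Hypothetical feedback weights: positive pulls the interest vector toward
# the paper, negative pushes it away. The real values are not given here.
FEEDBACK_WEIGHTS = {"like": 1.0, "save": 1.0, "skip": -0.5}


def update_interest(interest, paper, feedback, alpha=0.1):
    """Blend a paper embedding into an interest vector with an EMA step."""
    step = alpha * FEEDBACK_WEIGHTS[feedback]
    # Keep most of the old vector and mix in a small signed share of the paper.
    updated = (1 - abs(step)) * interest + step * paper
    norm = np.linalg.norm(updated)
    # Re-normalize so dot products stay comparable to cosine similarity.
    return updated / norm if norm > 0 else interest


interest = np.array([1.0, 0.0])
paper = np.array([0.0, 1.0])
liked = update_interest(interest, paper, "like", alpha=0.2)
skipped = update_interest(interest, paper, "skip", alpha=0.2)
```

A small `alpha` makes the profile drift slowly, so one stray click does not rewrite a research thread; a larger one makes the feed react faster.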
Architecture highlights:
- Offline pipeline builds a memory-mapped SPECTER2 embedding matrix (~6 GB for the full corpus) and a k-means cluster index; retrieval at serve time is a dot-product search within candidate clusters, not a full scan.
- Diversity scoring uses a greedy selection algorithm with a tunable diversity index (δ) that balances user-centroid coverage and cluster saturation: if you have two research threads, the feed doesn't collapse entirely into whichever one dominates by similarity.
- Query search expands short queries into scientific retrieval text, then ranks a candidate pool with query similarity, user-profile similarity, recency, and lightweight lexical evidence from title and abstract.
- Workspace tab lets you collect saved papers, generate AI synthesis and connection graphs (optional, via OpenAI API), and open PDFs for annotation in the Research Lab.
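The first two highlights can be sketched together as below. This is a simplified illustration, not Folio's implementation: the function names and the toy 2-D data are hypothetical, and the diversity penalty here is a plain per-cluster count scaled by `delta`, which stands in for the fuller coverage and saturation logic described above.

```python
import numpy as np


def retrieve_candidates(query, embeddings, labels, centroids, n_clusters=2):
    """Score only papers in the clusters whose centroids best match the query."""
    nearest = np.argsort(centroids @ query)[::-1][:n_clusters]
    candidate_ids = np.flatnonzero(np.isin(labels, nearest))
    scores = embeddings[candidate_ids] @ query  # dot product, no full scan
    return candidate_ids, scores


def greedy_diverse_feed(candidate_ids, scores, labels, k=3, delta=0.3):
    """Pick k papers one at a time, penalizing clusters already in the feed."""
    picked, cluster_counts = [], {}
    remaining = list(range(len(candidate_ids)))

    def adjusted(i):
        cluster = int(labels[candidate_ids[i]])
        return scores[i] - delta * cluster_counts.get(cluster, 0)

    while remaining and len(picked) < k:
        best = max(remaining, key=adjusted)
        remaining.remove(best)
        paper_id = int(candidate_ids[best])
        picked.append(paper_id)
        cluster = int(labels[paper_id])
        cluster_counts[cluster] = cluster_counts.get(cluster, 0) + 1
    return picked


# Toy 2-D "embeddings" for illustration; real SPECTER2 vectors are much larger.
embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [-1.0, 0.0]])
labels = np.array([0, 0, 1, 1, 2])
centroids = np.array([[0.95, 0.05], [0.05, 0.95], [-1.0, 0.0]])
query = np.array([0.8, 0.6])

candidate_ids, scores = retrieve_candidates(query, embeddings, labels, centroids)
feed = greedy_diverse_feed(candidate_ids, scores, labels, k=3, delta=0.3)
plain = greedy_diverse_feed(candidate_ids, scores, labels, k=3, delta=0.0)
```

With `delta=0.0` the selection falls back to pure similarity ranking, so comparing `feed` against `plain` shows how the penalty pulls a paper from the second cluster up the list.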
Key skills: vector search and embedding index design, k-means clustering for retrieval, recommendation system design (EMA centroid updates, diversity scoring), UMAP dimensionality reduction for embedding visualization, full-stack Streamlit app architecture, SQLite schema design.
⇱ NYC Congestion Pricing Air Quality Analysis
Python, Data Analysis, Data Viz, Causal Inference
Background: While working on the Policy & Operations task force at NYC OMB, we looked at the unexpected effects of congestion pricing, and I focused on air quality. Exhaust and other emissions from cars contribute to poor air quality in NYC, particularly in areas with high traffic density. So when NYC implemented congestion pricing on January 5, 2025, we wondered if air pollution would improve as a result. As our metric, we used PM2.5 (fine particulate matter) concentrations from the NYC Department of Environmental Protection.
Data sources:
- Hourly PM2.5 readings from 18 NYC air quality monitoring stations across all boroughs (Jan 2024 – May 2025), from the NYC DOT/NYSDEC sensor network.
- Boston daily PM2.5 data as an external control city.
- Station location/metadata CSV for geographic filtering.
Findings: PM2.5 levels in NYC were meaningfully lower in spring 2025 than spring 2024, across all five boroughs and all hours of the day, but not by enough to reach a statistical significance threshold. Read more about the methods and results here!
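For readers new to difference-in-differences (DiD), the core arithmetic looks like the sketch below: take the before/after change in the treated city and subtract the same change in the control city. This is a bare-bones illustration, not the analysis itself; the function name is hypothetical and the numbers are placeholders, not measured PM2.5 values.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences: treated change minus control change."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Placeholder daily averages for illustration only (not real readings).
nyc_pre, nyc_post = [10.0, 12.0], [8.0, 9.0]
boston_pre, boston_post = [9.0, 11.0], [9.0, 10.0]

effect = did_estimate(nyc_pre, nyc_post, boston_pre, boston_post)
```

A negative `effect` means PM2.5 fell more in the treated city than in the control. Whether that gap is statistically significant is a separate question that needs a proper test on the real station data.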
Key skills: multi-source data ingestion, time series aggregation, causal inference (DiD), statistical hypothesis testing, data visualization.