AWS Essentials for Data Science
1. Why Cloud Computing?

April 17, 2022

Imagine you’re a coordinator for data science meet & greets in New York. A major part of your planning involves reserving a venue to accommodate your guests. You’ve always rented venues around the city, but you wonder if it’d be better to just buy your own to avoid the hassle of searching for a free one every time.

Read More

Intermediate SQL

January 17, 2022

When I started learning SQL, I found it hard to progress beyond the absolute basics. I loved DataCamp’s courses because I could just type the code directly into a console on the screen. But once the courses ended, how could I practice what I learned? And how could I continue improving, when all the tutorials I found just consisted of code snippits, without an underlying database I could query myself?

Read More

Exploring stacks and queues

November 28, 2021

In our last post, we covered data structures, or the ways that programming languages store data in memory. We touched upon abstract data types, theoretical entities that are implemented via data structures. The concept of a “vehicle” can be viewed as an abstract data type, for example, with a “bike” being the data structure.

Read More

Intro to data structures

October 19, 2021

Imagine you build a wildly popular app that is quickly growing towards a million users. (Congrats!) While users love the app, they’re complaining that the app is becoming slower and slower, to the point that some users are starting to leave. You notice that the main bottleneck is how user info is retrieved during authentication: currently, your app searches through an unsorted list of Python dictionaries until it finds the requested user ID.

Read More

A deep dive on ARIMA models

August 11, 2021

Predicting the future has forever been a universal challenge, from decisions like whether to plant crops now or next week, marry someone or remain single, sell a stock or hold, or go to college or play music full time. We will never be able to perfectly predict the future[1], but we can use tools from the statistical field of forecasting to better understand what lies ahead.

Read More

A hands-on demo of analyzing big data with Spark

June 16, 2021

Cloud services firm Domo estimates that for every minute in 2020, WhatsApp users sent 41.7 million messages, Netflix streamed 404,000 hours of video, $240,000 changed hands on Venmo, and 69,000 people applied for jobs on LinkedIn. In that firehose of data are patterns those companies use to understand the present, predict the future, and ultimately stay alive in a hyper-competitive market.

Read More

Efficient type validation for Python functions

April 18, 2021

When it comes to writing complex pipelines running in production, it’s critical to have a clear understanding of what each function does, and how its outputs affect downstream functions. But despite our best efforts to write modular, well-tested functions, bugs love hiding in the handoffs between functions, and they can be hard to catch even with end-to-end tests.

Read More

SQL vs. NoSQL databases in Python

April 5, 2021

From ancient government, library, and medical records to present-day video and IoT streams, we have always needed ways to efficiently store and retrieve data. Yesterday’s filing cabinets have become today’s computer databases, with two major paradigms for how to best organize data: the relational (SQL) versus non-relational (NoSQL) approach.

Read More

Transitioning to data science from academia

February 10, 2021

“I could always do data science if academia doesn’t work out.” It’s a recurring thought many graduate students and postdocs experience, especially if their work involves hearty servings of programming and statistics, the core elements of data science. Data science can be a rewarding alternative to academia, and academics do have many qualities that make them attractive candidates for data science roles. However, there are also often large holes in academics’ skill sets that can deter them from being hired straight off the bat.

Read More

How to enter data science
5. The people

January 21, 2021

So far, we’ve covered the technical side to data science: statistics, analytics, and software engineering. But no matter how talented you are at crunching numbers and writing code, your effectiveness as a data scientist is limited if you chase questions that don’t actually help your company, or you can’t get anyone to incorporate the results of your analyses. Similarly, how do you stay motivated and relevant in a field that’s constantly evolving?

Read More

How to enter data science
3. The analytics

December 15, 2020

Welcome to the third post in our series on how to enter data science! The first post covered how to navigate the broad diversity of data science roles in the industry, and the second was a deep dive on (some!) statistics essential to being an effective data scientist. In this post, we’ll cover skills you’ll need when manipulating and analyzing data. Get ready for lots of syntax highlighting!

Read More

How to enter data science
2. The statistics

December 5, 2020

In the last post, we defined the key elements of data science as 1) deriving insights from data and 2) communicating those insights to others. Despite the huge diversity in how these elements are expressed in actual data scientist roles, there is a core skill set that will serve you well no matter where you go. The remaining posts in this series will define and explore these skills in detail.

Read More

How to enter data science
1. The target

August 27, 2020

The data science hype is real. Glassdoor labeled data scientist as the best job in America four years in a row, nudged out of the top spot only this year. Data science is transforming medicine, healthcare, finance, business, nonprofits, and government. MIT is spending a billion dollars on a college dedicated solely to AI. An entire education industry has sprouted to train new data scientists as fast as possible to fill the burgeoning demand, and for good reason: when 90 percent of the world’s data was generated in the last two years, we’re in dire need of people who understand how to find patterns in that pile of numbers.

Read More

Perspectives on Python after R

July 24, 2020

My first programming language was R. I fell in love with the nuance R granted for visualizing data, and how with a little practice it was straightforward to pull off complex statistical analyses. I coded in R throughout my Ph.D., but I needed to switch to Python for my first non-academic job. Picking up a second language went much faster than the first, but there was a lot to get used to when I transitioned.

Read More

Visualizing the danger of multiple t-test comparisons

May 13, 2018

It’s often tempting to make multiple t-test comparisons when running analyses with multiple groups. If you have three groups, this logic would look like “I’ll run a t-test to see if Group A is significantly different from Group B, then another to check if Group A is significantly different from Group C, then one more for whether Group B is different from Group C.” This logic, while seemingly intuitive, is seriously flawed. I’ll use an R function I wrote, false_pos, to help visualize why multiple t-tests can lead to highly inflated false positive rates.

Read More

Linear regression via gradient descent

April 22, 2018

After hearing so much about Andrew Ng’s famed Machine Learning Coursera course, I started taking the course and loved it. (His demeanor can make any topic sound reassuringly simple!) Early in the course, Ng covers linear regression via gradient descent. In other words, given a series of points, how can we find the line that best represents those points? And to take it a step further, how can we do that with machine learning?

Read More

Visualizing my daily commute

November 1, 2017

I love data visualization, and one holiday my partner surprised me with the book Dear Data. The book is a series of weekly letters two data analysts wrote to one another with visualizations of data on random topics. One week they tracked the number of times they said “thank you,” for example; another week, they counted the number of times they looked at a clock. In their letters, they visualized their data. One of the most interesting parts of the book was seeing how differently they could plot the same type of data.

Read More

How to be fancy with comparisons in R

October 14, 2017

Welcome to another episode of “Random R,” where we’ll ask random programming and statistical questions and answer them with R. Today, for whatever reason, let’s say we want to dive into methods for comparing values. We’ll start simple (e.g. is 5 greater than 4? Read on to find out.) and then work our way towards trickier element-wise comparisons among multiple matrices.

Read More

Ph.D. reflections
4th year

September 18, 2017

Writing this in September 2017 after the new first-years have arrived on campus, I realize it’s now been four years since I started the PhD. Back in 2013, I had just finished a year in Germany on a Fulbright scholarship, studying social and antipredator behavior in birds at the Max Planck Institute for Ornithology. The work I had helped with there was on its way to being published in Animal Behaviour, as had my undergraduate senior thesis work. I had just received an NSF-GRFP fellowship – a great vote of confidence from the federal government – and had spent the last weeks of summer traveling and enjoying the Behaviour conference in Newcastle, UK.

Read More

For loops vs. apply - a race in efficiency

July 13, 2017

Welcome to the first Random R post, where we ask random programming questions and use R to figure them out. In this post we’ll look at the computational efficiency of for loops versus the apply function.

Read More

Ph.D. reflections
3rd year

August 12, 2016

Year three. If an American PhD takes 5-6 years, then this is when you pass the halfway point. You’re now in the thick of the weeds. The big picture science that originally got you into this PhD gets harder to remember. The questions you set off to answer years ago really need to stand up to the second guesses that come from when you start dedicating hundreds of hours to answering them. It gets hard not to hear those quiet voices asking if there’s a better way to do things: is academia the path of most happiness for me, what’s consulting or industry like, am I actually doing interesting work? And of course, the eternal question of investing in automation versus doing the mind-numbing manual work! (See xkcd for the right ratio of investment to payoff!)

Read More

Ph.D. reflections
2nd year

October 8, 2015

The 2nd year of the PhD felt like being a teenager. You’re no longer new to graduate school, and you’re starting to feel the pressure of having something to show for your time here. Within your second year, you go from a PhD student interested in a topic, to a PhD candidate who understands the topic well enough that he can convince others it’s important. This transition has felt a bit like growing up: a loss of naivety and the addition of responsibilities, but also a legitimization of who you are as a researcher. This blog post describes my experiences during this year and what I’ve learned from them.

Read More

Prelims summary and advice

July 28, 2015

Sometime within the first two years of a North American biology PhD, grad students take an exam that determines whether their research ideas hold water or whether they should leave. No pressure! This rite of passage is called the generals, qualifying, or preliminary exam (“generals,” “quals,” and “prelims”), and it’s analogous to a Masters defense in the European system. The specifics of the exam vary greatly between universities but tend to involve a written literature review and thesis proposal, sometimes a written exam, and a multiple-hour oral exam by the thesis committee. The committee, which consists of 3-5 professors who read your proposal, will ask you questions for around three hours and then decide whether you should stay.

Read More

Behind the Scenes
Couzin et al. 2011

July 16, 2015

The story behind Couzin et al. 2011: “Uninformed individuals promote democratic consensus in animal groups”

Couzin ID, Ioannou CC, Demirel G, Gross T, Torney CJ, Hartnett A, Conradt L, Levin SA, Leonard NE. 2011. Uninformed individuals promote democratic consensus in animal groups. Science. 334: 1578-1580.

Read More

Ph.D. reflections
1st year

June 5, 2014

The first year of the PhD is over… I guess! It doesn’t really feel like your second year until the new first-years arrive in September, and work hasn’t suddenly stopped with the end of the semester, unlike in college. If anything, this summer is when I’ll actually make any progress on experimental ideas I’ve been developing since I first e-mailed my advisor two years ago. But enough time has passed that I think I can share some reflections on my first year that can hopefully help someone else starting or considering starting a PhD in biology.

Read More

Writing the self-contained universe

November 23, 2013

You are not sitting next to me right now as I type these thoughts. You’re most likely not in New Jersey, and you might not even be in the U.S. The fact that it’s even possible for you to be reading these words right now highlights the power of communicating ideas through writing. Effective communication is the difference between you growing bored and leaving halfway through this blog post to explore other parts of the internet, and you reaching the end (before moving on to explore the rest of the internet!).

Read More

Gap years - trying things out

June 8, 2013

In the sciences it’s easy to get in the mindset of “go to college, go to grad school, get a postdoc, be a professor” for your career. While this trajectory works, I want to talk about the crazy idea of breaking from the path for a year or two before you throw yourself into a PhD program. This applies to people applying to professional schools like medicine or law, as well!

Read More

Application advice for the NSF-GRFP

April 16, 2013

[Disclaimer: I received the GRFP during the 2012-13 application cycle, which might not reflect what the current GRFP is looking for. But hopefully the broader themes in this post still apply!]

Read More

Nine months out of the college bubble

February 16, 2013

As the one-year anniversary of my graduation approaches, I’ve had some time to reflect on my college experience and the ways it has - and hasn’t - prepared me for graduate-level research.

Read More

How to get into grad school for bio

November 1, 2011

At this point in the year, with grad applications closing and the waiting process beginning (or continuing for some of us), this post might not seem all that relevant to the college seniors who have hopefully figured out how to apply to graduate schools. This post may seem early for juniors who are interested in grad school but figure they have time before they apply. Maybe the occasional freshman or sophomore who stumbles across this blog will think that grad school is so far in the distance it’s not even worth thinking about right now. However, the following advice is a general path for leveling up as a researcher and figuring out what about biology interests you, knowledge that will serve you well regardless if you pursue grad school.

Read More