Lessons from the first two data scientists at a startupWritten by Matt Sosna on April 29, 2021
Data science in startups is notorious for being a memorable ride. From work that pivots on a dime from spreadsheets to customer interviews to CI/CD pipelines, to being handed more responsibility than you likely know what to do with, you’re guaranteed to learn nonstop in this role.
But what about being the first data scientist at a startup, the tip of the spear? Or what if you’re the second to join, the first extra manpower for those more ambitious projects?
To answer these questions, I spoke with my great friend and colleague, Minkyung Kang (MK), who founded the data science team at our company Aquicore. As the second data scientist, I was interested to see where our experiences aligned versus diverged. Together, we identified three key summaries of our experiences, as well as three questions to ask yourself if you’re considering joining a startup as one of the first data scientists.
1. Learn constantly through trial and error
Startups invariably have fewer resources than mature companies. This means that while data science spans a broad analytics-engineering spectrum comprising several more specific occupations, all work even remotely related to data science will come to you simply because you’re the only “data person” around.
This can either be frustrating or empowering depending on your perspective. Because there might not be a dedicated product manager coordinating your roadmap, you can sharpen your business skills by working directly with stakeholders to define the user experience. Since the other software engineers are busy, you can be the one who sets up the database, containerizes the model, and deploys it.
By being involved in every step of a project $-$ from ideas scribbled on a whiteboard to releasing your model to production $-$ you gain a holistic picture of the business and the role data science can play. This end-to-end lifecycle means you can become a full-stack data scientist, a coveted “unicorn.”
But this learning doesn’t come easily. Unless you’re joining the company with years of experience under your belt, you will constantly face questions you don’t know the answers to. Is a SQL or NoSQL database the right storage for a new data source? What do I do if I build a dashboard that internal teams end up not using? Do customers care more about false negatives or false positives when being alerted about potential anomalies?
For the business or software engineering questions, you can hopefully find mentors in the company to help you figure out the answers. But when it comes to core data science questions, like identifying the right features for a model or how to deal with biased training data $-$ if you’re the first data scientist, you’re on your own! And even if you’re the second data scientist, it can be hard for the two of you to identify the scalable, best-practice way to build a product without a mentor who’s “been there” before.
The result is a lot of trial and error. It will be humbling constantly facing how much you don’t know as you take on a never-ending flow of fresh challenges. But if you’re willing to learn from your mistakes, you’ll quickly grow as an analyst, engineer, and critical thinker.
2. Develop business acumen
One of the best things about being an early data scientist in a startup is working closely with senior leadership. You never work on a task “just because” $-$ there is always a pressing business context, a clearly defined need for you to address. By meeting regularly with the CEO, or head of Product, or VP of Sales, you begin to understand the broader ecosystem your company exists in.
The result is that you develop a strong understanding of how to do impactful data science. You never have to wonder if the model or dashboard you build is actually useful $-$ your stakeholders are probably sitting at the desk next to you! If your output misses the mark, you will immediately know why.
As you iterate on their feedback, you inevitably develop a solutions-focused rather than technology-focused approach. You start thinking in terms of “What’s needed to solve this problem?” rather than “What can I do with this tool?” This mentality is incredibly valuable for building products that actually help the company.
But there is a downside to the time you spend growing a strong business sense: you’re not necessarily gaining that deep an expertise in any data science domain. Unless your company’s core market offering is image recognition, you’re not going to become an expert in computer vision. Unless your company’s product is a text summarizer, don’t expect to significantly grow your NLP skills. While this is arguably the case in most small or mid-sized companies, it’s especially true in startups; the business needs simply change too quickly.
Rather than becoming a time series expert, you’re more likely to become an expert in communicating to non-technical stakeholders. And rather than learning the latest cutting-edge analytics tool, you’re likely to become an expert in statistics fundamentals as you repeatedly explain how your model arrives at its conclusions.
3. Be the driver of your vision
Startups are biased towards action. A startup is usually not profitable until the business is well-established, meaning it’s constantly burning through investor cash. Each month, there is less money in the bank as the CEO signs checks for salaries, office space, raw materials, and other operating expenses. This cash runway is often only 12-24 months, but it’s also often terrifyingly close to zero!
That means there’s tremendous pressure on startups to get the business to a point where it pulls in more money than it loses, or at least entice investors to risk investing more. Given how much data science can help $-$ from automating simple reports and data quality assurance, to providing a competitive edge with innovative product features $-$ startups are eager for your ideas on how to keep them alive and thriving.
If you’re the only data scientist, your company will consider you the authority on data science (at least compared to the other employees!). You’ll be expected to both generate ideas (i.e. understand the business context) and carry them out (i.e. know the technical execution). While this can be a lot of pressure, it also means you have the opportunity to drive a vision for data science at the company.
Do you think scraping Twitter for customer sentiment will help inform marketing decisions? Sure, build it and let’s see. Is it worth creating an internal OCR app to automate transcribing receipts? Sure, build it and let’s see. Are you convinced a customer churn predictor will help the Customer Success team tailor their effort? Sure, build it and let’s see.
Unlike in a larger organization with established protocols and roadmaps that change slowly, your impact in a startup is limited more by how quickly you can generate and execute on ideas. It’s a bit of the Wild West $-$ if you have interesting ideas and the initiative to carry them out, you’ll do well.
If these experiences align with what you’re looking for, then you might do well as a data scientist at a startup! But before you join just any company, there are a few questions to ask to make sure you’ll be happy and productive in your new role.
1. What is the company’s vision for the role?
The most important question to ask is what the company is trying to achieve with data science. Are they trying to help business leadership make more informed decisions, do they want a software engineer who also knows machine learning, or something in between?
Your experience will look like entirely different roles depending on the answer to this question. In analytics-heavy versus engineering-heavy roles, you will have different source data (e.g. Excel files vs. databases), deliverables (e.g. a report vs. software), and people you interact with (e.g. business leadership vs. the engineering team).
But more importantly, has the company thought closely about how data science can help them, or do they have no idea what they want and are expecting you to define the role? If the company is just hiring data scientists because that’s just what everyone’s doing… then really make sure you know what you’re doing before you join. We strongly discourage joining a company like this $-$ even years of experience, clear business sense, and an assertive personality might not overcome office politics and a lack of planning.
2. Is the data ready?
Companies vary tremendously in how prepared their data is for a data scientist to use. It’s not uncommon for companies to hire data scientists even when their data isn’t “ready”! The level of data readiness falls into three buckets:
A. Ideas but no data
The company, maybe a very early-stage startup, needs a data scientist to kick off their journey and figure out what data to collect, how to get it, and how to utilize it. The company and the data scientist will try, fail, and iterate together.
In this case, your job will likely heavily involve running proofs of concept as you test out which of the company’s ideas hold water. This can be rewarding if the company’s ideas resonate with you and you want to pioneer their data sources and architecture… though be prepared to learn a lot of data engineering.
B. Some data but not in a structured or usable way
This company perhaps has data but could really use a data engineer. The data consists of Excel files on hard drives or Google Drive or Dropbox, or you need to click around Salesforce each time you want to run an analysis.
These inefficiencies might only be a nuisance if your role is to analyze data and generate reports. But if your deliverable is software, unreliable data access will be a big impediment to you being able to do your job. You’ll either need to take on this engineering work yourself, or you’ll need to educate the company on data engineering and advocate for hiring data engineers to unblock you.
C. Easily consumable data
The final possibility is that the company has all the data ready for you in some consumable way through a database, data warehouse, or set of APIs. Maybe they’ve been collecting the data for years and have clear ideas for ways a data scientist could derive value from that data. This is a major plus to consider when evaluating companies!
Finally, note that a company can fall into several of these buckets depending on the types of data $-$ maybe the core product data is tightly structured but customer metadata is scattered and hard to use.
3. Is there a data science “champion”?
Successful data science requires a good amount of surrounding architecture: well-defined business questions, data to access easily and scalably, and software engineering to integrate your work into the company software (if needed).
These resources are incredibly hard to coordinate without help. Especially as the first or second data scientist, you need a “data champion” in business leadership who can advocate for you and promote your work. Without such a champion, your work can easily be unseen, unnoticed, and undervalued, and you will constantly need to fight to prove the value you’re bringing.
Having a champion doesn’t mean you don’t need to regularly communicate your work and educate others. But even the most convincing arguments won’t get far without someone with influence who can make larger changes happen, such as hiring data engineers if the company data isn’t easily usable, or defending the amount of time research takes. You’re likely set up to fail if you don’t have this support: it will be incredibly hard to carry out the work you’re expected to do, and you won’t be able to institute the changes needed to succeed.
Joining a startup is a big decision! The breakneck pace isn’t for everyone, but there’s no better place to do immediately impactful work, especially early in your career. Startups are ideal for learning constantly, developing business sense, and implementing innovative ideas. You’ll grow to think critically about the business as a whole and the sorts of problems data science can and can’t solve.
But it’s also easy to join a company that isn’t really prepared to take on a data scientist. Make sure the company has a clear vision for your role that aligns with the work you want to do, or you’ll find yourself in a role entirely different than what you signed up for. Be prepared to handle whatever format the data (or lack thereof) comes in, even if that means building out your data engineering skills. Finally, make sure you have at least one data champion who can help grease the wheels when bigger company changes are needed.
Matt and MK
If your startup is a SaaS company, the story likely differs a little when it comes to software engineering skills. You will learn a wide range of technical skills needed to get your model across the finish line, whether that’s an internal dashboard or integrated into your company’s software. You will become comfortable with the command line as you deploy Flask APIs, use servers in the cloud, and set up a daily pipeline in Databricks or Airflow. But this knowledge will likely be patchy: you’ll know enough for your use case, but not necessarily enough to help others with theirs.