
This is a unique opportunity to join a well-funded, Y Combinator-backed startup. The candidate will work side by side with the CTO to design and build the early major releases of the company’s products and help set the technical direction and culture of the company.


Dataherald is a Y Combinator-backed SaaS start-up that helps companies of any size better leverage the world’s data to create business value. We do this by providing no-code software that automates the ingestion, transformation, analysis and visualization of data from multiple sources. Our product manages data needs end-to-end so that businesses can focus on their goals without worrying about the complexity of large datasets.

Dataherald was launched in 2020 and has grown 10x in the last year. We currently work with leading media publishers such as the Chicago Tribune and the Sacramento Bee, and are backed by a mix of Silicon Valley and media institutions such as Y Combinator, Bertelsmann Digital Media Investments (BDMI) and Garage Technology Ventures.

About the role

We seek an experienced, talented Data Scientist to join the product and engineering team at Dataherald, either in Los Angeles or remotely from anywhere in the world. You’ll bring your skills and expertise to create the data content that makes our business possible and delivers insights to our customers' audiences.

You will work closely and directly with the founders, particularly the CTO.

You will be involved throughout the product life cycle of delivering data content to our customers: interfacing with our Operations and Customer Success teams to collect customer requirements and priorities, designing and building data visualizations and dashboards, owning relationships with data partners, and exploring datasets to find key insights.


Responsibilities

  • Independently design, build and launch new ETL pipelines in production
  • Collaborate on improving the company’s data pipeline engine
  • Design schemas for master records drawn from multiple external sources
  • Own relationships with our data partners
  • Design and build data integrity and quality controls and processes
  • Design and build ML models to classify incoming data
  • Mentor other team members


Requirements

  • Previous experience working with large datasets
  • Advanced knowledge of SQL and query optimization
  • Experience with dimensional data modeling & schema design
  • Experience in ETL design, implementation and maintenance
  • Extensive experience with Python and pandas

Why us?

  • Impact – We will not ask you to work on low-priority or meaningless tasks. You will make a real contribution to our growth from day one, and the work you do will have clearly visible and quantifiable results.
  • Learn – This opportunity is like no other when it comes to learning how to grow a start-up from scratch. We have a trial-and-learn mindset when it comes to growth and run many experiments in parallel without fear of failing. So roll up your sleeves and get in on the action!
  • Flexibility – We are a highly distributed company and believe in asynchronous work rather than long, monotonous meetings. You get to work from wherever you want and choose the hours that suit you.
  • Fun – There is one thing we take extremely seriously at work and it is having fun while working. Because why else would someone work at a startup? 😜

Interview process

  • We’ll have a short intro chat with you (~30 min)
  • We’ll ask you to work on a small take-home exercise (~120 min)
  • After reviewing your work, we’ll ask you to discuss it in a conversation with a founder (~30 min)
  • And finally, hopefully we’ll send you an offer. And hopefully you’ll accept!