Lessons From The Green Owl: Observations From 4 Years At Duolingo

January 2026


Yes, Duolingo is an actual company with physical offices and real employees. When I started here in early 2022, the app had about 10 million daily active users. By the time 2026 rolled around, that number had grown 5x to more than 50 million. More people now learn on Duolingo daily than the entire Grade 1-12 population of the United States. During the same period, the app went from burning cash every quarter to becoming consistently profitable.

During these 4 years, the entire app was redesigned. The number of language courses on offer 10x-ed, and the length of existing courses doubled on average. Major features like the ability to practice conversation were launched. Users were learning faster than ever across the different competencies that we tracked (speaking, writing, reading and listening). The company also expanded beyond language learning and into math, music and chess.

This post contains some observations from my time at the company that stood out to someone who joined fresh out of college.

Keeping Learners Motivated is 90% Of The Puzzle

I've seen lots of EdTech startups launch with fresh ideas about how to teach effectively with technology. The founders tend to be engineers or academics excited by things like memory retention curves and personalized tutoring. However, most of these startups fail because users don't stick. This isn't a new problem - textbooks and videos have been around for a long time, but only people with serious reasons for learning use them.

Duolingo's core insight was that for 95% of people, learning is a motivation problem first, and a pedagogical one only later. It unlocked a new market of tens of millions of users by making learning fun.

What does this mean in practice? At work, we have to actively curb our 'engineer instincts' and look at what the data says about people's behavior. Sometimes this means gently shackling our learning scientists to prevent them from adding verb conjugation exercises. This is not easy, especially for a company filled with tech and language nerds, and it is why we've largely resisted explicit grammar instruction in favor of making people learn by doing. If a feature helps 5 people but repels 95 others, that is still a net loss of learning.

Another such example is investing heavily in delightful visuals and animations. These "frivolous" additions keep users in the seat long enough to build a habit and let the learning take effect. This is especially important in an attention market dominated by TikTok, Instagram and mobile games.

To preempt the snarky comments about Duolingo being too gamified with little learning value, I'd like to note that in this case, the incentives align - the most robust way to motivate users to come back is to ensure they are actually learning something valuable. This is the holy grail for the company and what the majority of employees work on each day. Whenever we roll out a new feature or content, we track the rate at which users progress down a course, how long they spend on exercises, how often they get things right, and a host of other metrics. We almost never launch a change that boosts engagement but hurts learning.

A/B Test Everything

Every change made to the app is rolled out to 50% of users first. Duolingo did not invent A/B testing, but the discipline with which we experiment on everything seems rare for a company of this size.
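The mechanics of a 50/50 rollout are typically a deterministic hash over the user and the experiment name, so each user sees a consistent experience across sessions. Here is a minimal sketch of that generic pattern (an illustration of the standard technique, not Duolingo's actual assignment code):

```python
import hashlib

def assign_arm(user_id: str, experiment: str, arms: int = 2) -> int:
    """Deterministically assign a user to one of `arms` buckets.

    Hashing the experiment name together with the user id keeps the
    same user in the same arm across sessions, while keeping bucket
    assignments for different experiments uncorrelated.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % arms

# Arm 0 gets the control experience; arm 1 gets the new change.
```

Because assignment is a pure function of the inputs, no per-user state needs to be stored, and adding a new experiment never reshuffles users in existing ones.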

It is the norm to run experiments even on tiny changes. Just one example: we recently ran an experiment lowering the pitch of a character's voice by 5%, and found that it improved the accuracy rate of older users on listening exercises by 0.3%. We even run experiments on changes that should not be visible to users at all, like migrating a backend database from one system to another, just to make sure nothing goes wrong. If a change would take an hour to implement without an experiment but several days of engineering work to implement as one (e.g. to serve different experiences to different users), it is completely acceptable, even expected, that we take the longer route and run the experiment.

For each change, we look at its effect on a whole host of user metrics, and only launch the change if it is neutral or positive on most of them. In the long run, this creates a ratchet effect: each change pushes the app a notch higher on the metrics we track. For certain metrics, this has a compounding effect - each tiny increase in user retention rates causes a much larger increase in daily active users over time.
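The compounding claim can be made concrete with a toy steady-state model (illustrative numbers, not Duolingo's actual figures): with a constant inflow of new users and a flat daily retention rate, long-run DAU is a geometric series, so small retention gains are amplified.

```python
def steady_state_dau(new_users_per_day: float, daily_retention: float) -> float:
    # DAU converges to new_users * (1 + r + r^2 + ...) = new_users / (1 - r),
    # assuming a flat daily retention rate r and constant user inflow.
    return new_users_per_day / (1.0 - daily_retention)

base = steady_state_dau(100_000, 0.80)    # 500,000 DAU
bumped = steady_state_dau(100_000, 0.81)  # ~526,000 DAU
lift = bumped / base - 1                  # ~5.3% more DAU from a 1pp retention gain
```

Under these assumptions, a one-percentage-point retention improvement yields roughly a five percent increase in steady-state DAU, which is the amplification the ratchet effect relies on.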

We have a very high bar for ignoring experiment data. If a Product Manager's strong intuition fails the A/B test, the change is killed. We respect the aggregate preferences of 50+ million users over the intuition of the PM.

When do we override experiment results?

Sometimes, we find that we've optimized ourselves into a local maximum, and any change that shifts us in a more promising direction would hurt metrics initially. In such cases, we register ahead of time that we intend to launch a change even if results are slightly negative, and keep track of the actual impact on users. We then run lots of follow-up experiments and sub-experiments to close the metrics gap over time. Having hard data keeps us accountable.

One example was the shift from 'Hearts' to 'Energy' as the main monetization lever in the app. Users used to have 5 hearts and would lose 1 for each mistake they made. Once they ran out, users had to either practice, watch an ad, or wait several hours to regain hearts before they could use Duolingo again. This was sunset in favor of 'Energy', which works like a battery that slowly depletes as users work through lessons. Energy does not penalize users for making mistakes, but it worsened the experience for some free power users who rarely made errors.

The Hearts mechanism was finely tuned over the years by hundreds of experiments, which meant it would be hard for any change to Energy to be positive initially. That did indeed turn out to be the case - the number of active users and subscription bookings went up, but overall time spent learning on the app went down. However, we knew the magnitude of this loss, and over the next few months, teams launched dozens of follow-up experiments to bring this new system to parity with the old one.

Don't Take People Online Too Seriously

If Reddit were to be believed, every change Duolingo made since its inception has been terrible and would lead to the immediate downfall of the app. If the metrics tell us one story and online comments tell us another, we listen to the metrics.

One notable instance was when we rolled out Duolingo V2 in 2022, which was a complete redesign of how learning content was presented to learners. Instead of allowing users to pick and choose which topics they wanted to learn, V2 would prescribe a fixed path for users, reducing friction and baking spaced repetition into the app. Online sentiment at the time of its rollout was vociferously negative. Every day, there would be dozens of posts from people describing how they would end their 1000+ day long streaks and decamp to a different app. The feedback was certainly tough to read for the people working on this change. However, the data told us a different story. It showed that users were continuing to use the app at similar rates and were learning as much or more than ever. We ultimately chose to trust the metrics and launch the redesign.

3 years later, the app is still ticking with more people than ever, and the redesign opened up an entirely new lever for us to test different learning optimizations with. This was a personal lesson to me to always look at revealed preferences, not stated preferences from the most vocal users.

Double Down on What Works

At Duolingo, if we find that an idea has worked well for a small fraction of people or in a small corner of the app, we allocate more resources and people to test out scaling that idea as much as possible, as fast as possible. For example, if a certain exercise type has done well in a particular section of a course, we try to scale it to our biggest courses as soon as possible, starting the next day, not one week or one quarter later. Every unnecessary delay is lost utility to millions of people.

This point seems obvious, but sometimes it's worth stating explicitly, because inertia can be powerful. Even when a feature or idea has worked well, orgs tend to move on to the next project on the roadmap on autopilot.

There is a corollary to this principle, which is to stop doing what doesn't work. Even if an idea seemed like a surefire win, or even if new teams were specifically staffed up to work on it, Duolingo doesn't shy away from halting all work on the idea and moving people to a higher-alpha project if the gains turn out to be minimal.

Aggressively Create and Kill Teams

Working across team boundaries is hard, especially when you are trying to convince someone on another team to do something for you when their own job does not depend on it. This is why we try as much as possible to have everyone needed for the success of a particular project on the same team, including product managers, designers and engineers.

This idea is not new - Amazon famously coined the phrase '2-pizza teams' to describe teams that are small but self-sufficient. However, I've anecdotally heard and seen lots of tales at orgs where this does not happen. For example, a friend at Google described having to coordinate with 2 other teams across 3 offices just to secure enough compute and support needed to run an ML experiment on Shopping Ads.

Applying this principle seriously means that we can't be beholden to a particular team structure. Since we try lots of new ideas at the company, doubling down on what works and stopping what doesn't, Duolingo is aggressive at shutting down teams and spinning up new ones. 'Re-orgs' happen as often as every quarter. In my area of the company, no team that exists at the time of publishing this post has been around for more than 6 months!

To be clear, this does have its downsides. It's common for teams to inherit old features to maintain from teams that no longer exist. If a bug pops up, the inheriting team may have to chase down the original creator, who probably never intended for their Google Sheets-backed tool to be used on TBs of data. Every now and then, something does fall through the cracks, with no clear owner of a legacy project.

Despite these downsides, in the low-risk domain that Duolingo operates in (no one dies if they can't learn Spanish for an hour), I think this model makes sense. It also has more subtle benefits: (i) it reduces tribalism - people are outspoken about killing projects because they are not beholden to maintaining the continued existence of their current team, (ii) people work with different colleagues over time, making future collaboration easier, and (iii) people work on things important to the company, which means they stick around longer.

If You Have a Valuable Distribution Channel, Milk It

People in developed countries no longer download new apps as much as they used to. After a flurry of activity in the decade where smartphones first came onto the scene, people have developed 'digital habits' and stick to the same few apps they've gotten used to. This means it is hard for any new app to get a foothold without tons of marketing spend.

The most important decision Duolingo made in recent years was to not develop a standalone app for each new product it was creating (Chess, Music and Math), and instead bundle everything together into the existing Duolingo app. This may seem obvious in hindsight, but at the time there were both product questions (does it make sense to shoehorn math into an app called Duo-'lingo'?) and technical questions (can we fit everything into a downloadable app of manageable size? Will this turn our codebase into a Lovecraftian nightmare?) that had to be answered. Ultimately, an executive decision was made from the top to bundle everything together, and to do whatever it took to make the technical side work.

The benefits were immediate. We had order(s) of magnitude more users in weeks on courses released after this decision (Math, Music and Chess) than we did in years on our older, standalone apps (Duolingo ABC for kids and the Duolingo English Test). This happened almost instantly, with zero dollars spent on marketing.

Getting a large pool of initial users also gave us economies of scale in a less traditional sense. It gave A/B tests greater statistical power, which meant we could improve the app faster and more scientifically, which in turn attracted more users to the new courses. It also made subscriptions more valuable instantly, because a subscription that previously only covered language learning could now be used for math, music and chess. Lastly, expensive investments like animation improvements that would only make sense for a large pool of language learners could now be reused nearly for free in these new subjects.
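The statistical-power point can be made concrete with the textbook two-proportion approximation (a standard formula, not anything specific to Duolingo's tooling): the minimum detectable effect of an A/B test shrinks with the square root of the number of users per arm.

```python
import math

def min_detectable_lift(baseline_rate: float, n_per_arm: int,
                        z_alpha: float = 1.96, z_beta: float = 0.84) -> float:
    """Smallest absolute lift detectable at ~5% significance and ~80% power.

    Uses the normal approximation for comparing two proportions; real
    experiment analysis is more involved, but the 1/sqrt(n) scaling holds.
    """
    std_err = math.sqrt(2 * baseline_rate * (1 - baseline_rate) / n_per_arm)
    return (z_alpha + z_beta) * std_err

min_detectable_lift(0.40, 10_000)     # ~0.019 (1.9pp) with 10k users per arm
min_detectable_lift(0.40, 1_000_000)  # ~0.0019 (0.19pp) with 1M users per arm
```

A 100x larger user pool resolves effects roughly 10x smaller, which is why sub-percent metric movements only become measurable at scale.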

It Doesn't Hurt To Be Nice

I've often heard Duolingo's culture compared to the early years of Google, probably because a good chunk of senior leadership came from there and took a leaf from its playbook. Beyond the colorful offices and free food, the working culture at Duolingo has stayed relatively sane, unlike some AI startups now in the news for borrowing China's 996 schedule. In addition to 20 days of vacation, the entire company is flown out to Cancun for a 3-day retreat every spring, and everyone gets 2 weeks off around Christmas and New Year's. Oh, and did I mention that every employee gets an extra week-long, all-expenses-paid vacation to Europe, Asia or Mexico after 2 years? People work with urgency to get things done, but unless the site is broken, no one is expected to respond after hours.

Admittedly, the company has not gone through truly adverse times, so I can't predict how the culture would change if that were to happen, but I've heard that the culture was relatively normal even in the early years when the company was not making any money. At the very least, it offers an existence proof that you can build a successful company while treating your employees decently (I swear this is not an ad for Duolingo, and I'm not being held at gunpoint to write this!).

What Next?

Duolingo is not perfect. Some select areas where we're not yet as good as we'd like to be (opinions are mine, not the company's):
  • Personalization. There is room to customize the platform more to reach users on either end of the motivation and learning ability curves. For example, we could do better for power users who are really motivated to learn a new language.
  • We are still in the early stages of our journey to leverage LLMs cost-efficiently so that non-subscribers, a.k.a. free users, can practice language production, like speaking and writing.
  • We are good at reinforcing concepts for math learners, but not yet great at teaching math from scratch in a smartphone form factor.
Luis (Duolingo's CEO) likes to say that the company's ultimate goal is to teach as well as the best human tutors. We're still a ways away from that, but making steady progress, with a clear line of sight to the goal.