DSF Meetup with Busuu

Tuesday, October 16, 2018 - 18:00
Data Science Festival - London

Ever wondered how language learning apps like Busuu decide which words are burned into your memory and which need to be reviewed? Join Data Science Festival - London, in partnership with Busuu, on October 16th to learn just how they do it!


Please click here to apply for a ticket: https://www.datasciencefestival.com/event/live/2018/dsf-meetup-with-busuu/

Those randomly selected and approved will then be e-mailed tickets for the event. If you do not receive an approval e-mail from us by the 12th of October 2018, you have been unsuccessful in getting a ticket for this event.

If you are allocated a ticket, please bring a printed copy or have it on your phone so that you can check in with your QR code at the event. Tickets are non-transferable.


6:00pm - Guests arrive
6:30pm - Tom Richardson
7:15pm - Break
7:45pm - Sole Galli
8:30pm - Networking
9:00pm - Close

Speaker 1: Tom Richardson - Senior Data Scientist

Modelling Memory with Busuu's Vocab Trainer

Summary: Ever wondered how language learning apps like Busuu decide which words are burned into your memory and which need to be reviewed? In this talk we describe the spaced repetition model behind Busuu’s Vocab Trainer which predicts the rate at which your memory of a word decays. We then use model predictions on real users to answer questions like, “Which words do users find the most difficult?”, “Which exercises in the app are the best measures for ‘knowing’ a word?” and “How do learning patterns change from country to country?”.

Bio: With a background spanning elections forecasting, data science consultancy and trying in vain to figure out what dark matter is, Tom is the stats and ML guy at Busuu. His current ML projects include dynamic discounting, lifetime value modelling and algorithm design for the vocab trainer.

Speaker 2: Sole Galli

Engineering and selecting features for machine learning

Summary: We use machine learning algorithms to determine patterns in past data and then predict behaviour in future observations. However, the data available in business is generally not ready for use in machine learning modelling. On the contrary, we typically devote an extensive amount of time to pre-processing and selecting the variables that we will finally feed into our models. What are the typical problems we find in data? And what are the advantages of selecting variables when building models for business? In this talk, I will describe common data issues for numerical and categorical variables, highlighting which machine learning models are susceptible. I will introduce and compare different feature engineering techniques for imputation of missing data, processing of outliers and encoding of categorical variables. I will continue with an overview of different feature selection procedures, focusing on the limitations and advantages of each technique. By the end of the talk, I hope to give you a flavour of variable pre-processing and selection for building business models that can be put in production.

Bio: Soledad is a Lead Data Scientist at LV=, with 2+ years of experience in data science and analytics in the financial sector, and 10+ years of experience in scientific research in academia. She is passionate about extracting meaningful information from data and helping institutions make solid and reliable data-driven decisions. At LV=, Soledad and the data science team are leading the implementation of machine learning across the company's multiple business areas. Having transitioned from academia to data science herself, Soledad is passionate about helping data scientists and academics transition into the field, and helping data scientists increase their breadth of knowledge. Over the last year, Soledad has shared her insights through blogs and talks in the data science community.


50 Finsbury Square, EC2A 1HD