Scraping the web with Puppeteer

Date: Thursday, February 14, 2019 - 19:00
Source: Journocoders
Attendees: 47
City: London

This month, we'll be learning to scrape data from websites using the Puppeteer library.

Puppeteer is a library that lets us control the Chrome browser using code. We'll be writing that code with Node, which lets us run JavaScript on our own computers. We'll cover how to extract data from simple sites such as Wikipedia, as well as from more complex ones such as Reddit.

Bring a laptop along, as this is a practical, hands-on workshop. No programming experience is required. However, if you feel comfortable doing so, please install Node (https://nodejs.org/en/download/current/) and a text editor such as Atom (https://atom.io/). If you have trouble doing that, don't worry: we will help you on the day.

Make sure you're signed up to Dropbox so you can view our shared doc (https://bit.ly/journocoders-feb-2019). Then please add links to the show and tell section! These can be stories, analysis, announcements: anything you think others might be interested in.

Schedule

7:00 Doors open
7:30 Show and tell
7:40 Tutorial
9:00 Drinks at the George

What is Journocoders?

Journocoders is a monthly meetup for journalists and others working in the media to learn and share technical skills for use in their reporting. We aim to bring the tech industry's culture of knowledge sharing to journalism, and to give people the chance to learn and network in a supportive environment with like-minded individuals.

The News Building

1 London Bridge Street, London, SE1 9GF