Connectors and analytics with Kafka

Date: Wednesday, March 20, 2019 - 18:30
Source: Apache Kafka London
Attendees: 133
City: London

Join us on Slack (https://kafka-london-invites.herokuapp.com/)

6.30pm - Doors open, Food + Drinks, Network

7.00pm - Talk - "Lessons Learned Building a Connector Using Kafka Connect" with Katherine Stanley and Andrew Schofield from IBM

While many companies are embracing Apache Kafka as their core event streaming platform, they may still have events they want to unlock in other systems. Kafka Connect provides a common API for developers to do just that, and the number of open-source connectors available is growing rapidly. The IBM MQ sink and source connectors allow you to flow messages between your Apache Kafka cluster and your IBM MQ queues. In this session, we will share our lessons learned and top tips for building a Kafka Connect connector. We will explain how a connector is structured, how the framework calls it, and some of the things to consider when providing configuration options. The more Kafka Connect connectors the community creates, the better, as it will enable everyone to unlock the events in their existing systems.
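For readers new to the framework, the sketch below shows the rough shape of a Kafka Connect source connector: the framework starts the Connector class with the user's configuration, asks it to split the work into task configurations, and then repeatedly polls each task for records. The class and configuration option names here are invented for illustration; this is not the IBM MQ connector itself, just a minimal example of the API the talk discusses.

import java.util.*;
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.Task;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.source.SourceConnector;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;

// Hypothetical example connector, not the actual IBM MQ connector.
public class ExampleQueueSourceConnector extends SourceConnector {
    private Map<String, String> configProps;

    @Override
    public void start(Map<String, String> props) {
        // Called once by the framework with the user-supplied configuration.
        this.configProps = props;
    }

    @Override
    public Class<? extends Task> taskClass() {
        return ExampleQueueSourceTask.class;
    }

    @Override
    public List<Map<String, String>> taskConfigs(int maxTasks) {
        // The framework asks the connector to split its work across up to maxTasks tasks.
        List<Map<String, String>> configs = new ArrayList<>();
        for (int i = 0; i < maxTasks; i++) {
            configs.add(new HashMap<>(configProps));
        }
        return configs;
    }

    @Override
    public ConfigDef config() {
        // The configuration options the connector exposes to users (names are illustrative).
        return new ConfigDef()
            .define("queue.name", ConfigDef.Type.STRING, ConfigDef.Importance.HIGH,
                    "Name of the source queue")
            .define("topic", ConfigDef.Type.STRING, ConfigDef.Importance.HIGH,
                    "Kafka topic to produce records to");
    }

    @Override
    public void stop() { }

    @Override
    public String version() { return "0.1.0"; }

    // The task does the actual data movement; the framework calls poll() in a loop.
    public static class ExampleQueueSourceTask extends SourceTask {
        private String topic;

        @Override
        public void start(Map<String, String> props) {
            topic = props.get("topic");
        }

        @Override
        public List<SourceRecord> poll() throws InterruptedException {
            // A real connector would read messages from the external system here.
            Thread.sleep(1000);
            SourceRecord record = new SourceRecord(
                Collections.singletonMap("queue", "example"),  // source partition
                Collections.singletonMap("offset", 0L),        // source offset
                topic, Schema.STRING_SCHEMA, "hello from the source system");
            return Collections.singletonList(record);
        }

        @Override
        public void stop() { }

        @Override
        public String version() { return "0.1.0"; }
    }
}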

Katherine Stanley is a Software Engineer in the IBM Event Streams team based in the UK. Through her work on IBM Event Streams she has gained knowledge about running Apache Kafka on Kubernetes and running enterprise Kafka applications. In her previous role she specialised in cloud native Java applications and microservices architectures. Katherine has co-authored an IBM Redbook on Java microservices and has contributed to the open source microservice project Game On. She enjoys sharing her experiences and has presented at conferences around the world, including JavaOne in San Francisco, DevoxxUK and OSCON in London and JFokus in Sweden.

Andrew Schofield is a Senior Technical Staff Member in the Hybrid Integration group of IBM Cloud. He has more than 25 years of experience in messaging middleware and the Internet of Things, with particular expertise in the areas of data integrity, transactions, high availability and performance. Andrew is an active contributor to Apache Kafka. He works at the Hursley Park laboratory in England.

7.50pm - Talk - "Building a Streaming Analytics Stack with Kafka and Druid" with Rachel Pedreschi from Imply.io

The maturation and development of open source technologies have made it easier than ever for companies to derive insights from vast quantities of data. In this talk, we will cover how data analytic stacks have evolved from data warehouses, to data lakes, and to more modern stream-oriented analytic stacks. We will also discuss building such a stack using Apache Kafka and Apache Druid.

Analytics pipelines running purely on Hadoop can suffer from hours of data lag. Initial attempts to solve this problem often lead to inflexible solutions, where the queries must be known ahead of time, or fragile solutions where the integrity of the data cannot be assured. Combining Hadoop with Kafka and Druid can guarantee system availability, maintain data integrity, and support fast and flexible queries.

In the described system, Kafka provides a fast message bus and is the delivery point for machine-generated event streams. Kafka Streams can be used to transform the data before loading it into Druid. Druid provides flexible, highly available, low-latency queries.
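As an illustration of that middle step, here is a minimal Kafka Streams sketch that reads raw events, lightly reshapes them, and writes them to the topic Druid ingests from. The topic names and the transformation are hypothetical, not taken from the talk.

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class EventEnrichmentApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "event-enrichment");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Raw machine-generated events arrive on one topic...
        KStream<String, String> rawEvents = builder.stream("raw-events");

        // ...get filtered and lightly reshaped...
        rawEvents.filter((key, value) -> value != null && !value.isEmpty())
                 .mapValues(value -> value.trim().toLowerCase())
                 // ...and land on the topic that Druid ingests from.
                 .to("events-for-druid");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}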

This talk is based on our real-world experience building out such a stack for many use cases across many industries.

Rachel is the Worldwide Director of Field Engineering at Imply. A “Big Data Geek-ette,” Rachel is no stranger to the world of high-performance databases and data warehouses. She is a Cassandra, Vertica, Informix and Redbrick certified DBA on top of her work with Druid and has more than 20 years of business intelligence and ETL tool experience. Rachel has an MBA from San Francisco State University and a BA in Mathematics from University of California, Santa Cruz.

Confluent

1 Bedford Street