Scala Meetup: Traceable Data Entities with Spark 2.x

One of the most common abstractions for a big data platform is the “Data Lake”. Data is brought into the lake, then filtered, parsed and transformed, and in the process many more data assets are created. Metadata describes this data, and as the amount of data grows it becomes both more important and harder to properly describe the data, its schema and its lineage.

Spark (2.x) is one of the most important tools in a data engineer's or data scientist's toolbox, but it currently offers very little help in connecting an application's input data sources to its output data sources.

Sqooba decided to extend Spark's built-in event mechanism to get more granular data about when data is used as an input or output of a Spark application. This allows event listeners to update the relevant entities in Apache Atlas, giving real-time data lineage and metadata.
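To make the listener idea concrete, here is a minimal sketch in Scala of how such a listener could look. It is not Sqooba's implementation: the event classes `DatasetRead` and `DatasetWritten`, the `LineageEdge` type and the `LineageListener` class are hypothetical names chosen for illustration. Only `SparkListener`, `SparkListenerEvent` and `onOtherEvent` come from Spark 2.x itself. How the custom events get posted onto Spark's listener bus, and how an edge is written to Apache Atlas, are deliberately left out; the sink is just a function.

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerEvent}

// Hypothetical custom events (illustrative names, not part of Spark).
// They record which application touched which data source.
sealed trait LineageEvent extends SparkListenerEvent {
  def appId: String
  def uri: String
}
case class DatasetRead(appId: String, uri: String) extends LineageEvent
case class DatasetWritten(appId: String, uri: String) extends LineageEvent

// One lineage edge: data flows from `from` to `to`.
case class LineageEdge(from: String, to: String)

// Listener that turns lineage events into edges and passes them to a sink.
// Assumption: in a real setup the sink would update entities in a metadata
// store such as Apache Atlas; here it is a plain function.
class LineageListener(sink: LineageEdge => Unit) extends SparkListener {
  override def onOtherEvent(event: SparkListenerEvent): Unit = event match {
    case DatasetRead(appId, uri)    => sink(LineageEdge(from = uri, to = appId))
    case DatasetWritten(appId, uri) => sink(LineageEdge(from = appId, to = uri))
    case _                          => () // ignore unrelated events
  }
}
```

Such a listener could be registered on a running application with `spark.sparkContext.addSparkListener(new LineageListener(edge => println(edge)))`. Keeping the sink as a function makes the listener easy to exercise without a metadata store.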

Bern Scala User Group

