January 2019

Next Scala Meetup: LuceneRDD for Search and Entity Linkage

In this talk, the design and implementation of LuceneRDD for Apache Spark will be presented. […]

November 2018

Robert, welcome to Tegonal!

Robert Stoll is our newest team member […]

November 2018

OpenOlitor presentation at Bits & Bäume

On Sunday, 18 November 2018, OpenOlitor will be presented at Bits & Bäume […]

November 2018

Lasius - Open Source Web Application for Administering Organisations

Lasius makes it easy to administer recurring services […]

September 2018

Scala Days 2018

Review of Scala Days 2018 […]

August 2018

Scala Meetup

Scala.js concepts and hands-on […]

August 2018


How about some Clojure for a change? […]

May 2018

Pension Fund: Investment Strategy Is No Secret

An ethical, ecological and transparent investment strategy for our pension savings […]

To the news archive

Scala Meetup: Traceable Data Entities with Spark 2.x

One of the most common abstractions for a big data platform is the “data lake”. Data is brought into the lake, then filtered, parsed and transformed, and in the process many more data assets are created. Metadata describes the data, and as the amount of data grows, it becomes both more important and harder to properly describe the data, its schema and its lineage.

Spark (2.x) is one of the most important tools in a data engineer’s or data scientist’s toolbox, but it currently offers very little help in connecting input data sources to output data sources.

At Sqooba, they decided to extend Spark’s built-in event mechanism to get more granular information about when data is used as an input or an output of a Spark application. Event listeners can then use these events to update the relevant entities in Apache Atlas, providing real-time data lineage and metadata.
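The idea of deriving lineage from events can be illustrated with a small, dependency-free model. This is only a sketch under stated assumptions: it uses neither Spark's listener API nor the Apache Atlas client, and every name in it (`DatasetEvent`, `EventBus`, `LineageCatalog`, the application id and the dataset paths) is hypothetical and not taken from the talk.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass(frozen=True)
class DatasetEvent:
    """Hypothetical event: an application used a dataset as input or output."""
    app_id: str
    dataset: str
    direction: str  # either "input" or "output"


class EventBus:
    """Stand-in for an engine's event mechanism (not Spark's listener bus)."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[DatasetEvent], None]] = []

    def add_listener(self, listener: Callable[[DatasetEvent], None]) -> None:
        self._listeners.append(listener)

    def post(self, event: DatasetEvent) -> None:
        # Deliver the event to every registered listener, in registration order.
        for listener in self._listeners:
            listener(event)


class LineageCatalog:
    """In-memory stand-in for a metadata store such as Apache Atlas."""

    def __init__(self) -> None:
        self.inputs: Dict[str, Set[str]] = {}
        self.outputs: Dict[str, Set[str]] = {}

    def on_event(self, event: DatasetEvent) -> None:
        # Record the dataset under the application that touched it.
        if event.direction == "input":
            target = self.inputs
        elif event.direction == "output":
            target = self.outputs
        else:
            raise ValueError(f"unknown direction: {event.direction}")
        target.setdefault(event.app_id, set()).add(event.dataset)

    def lineage(self, app_id: str) -> List[Tuple[str, str]]:
        # Every input of an application is treated as upstream of every output.
        return sorted(
            (src, dst)
            for src in self.inputs.get(app_id, set())
            for dst in self.outputs.get(app_id, set())
        )


bus = EventBus()
catalog = LineageCatalog()
bus.add_listener(catalog.on_event)

# Events an instrumented application might emit while it runs.
bus.post(DatasetEvent("daily-report", "/lake/raw/orders", "input"))
bus.post(DatasetEvent("daily-report", "/lake/raw/customers", "input"))
bus.post(DatasetEvent("daily-report", "/lake/curated/report", "output"))

edges = catalog.lineage("daily-report")
```

Because the catalog is updated as each event arrives, the lineage edges are available while the application is still running, which is the property the abstract calls real-time data lineage. Treating every input as upstream of every output is a simplification made for this sketch.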

Bern Scala User Group