Latest

Using rquery On Databricks

Introduction: In this blog, we will introduce rquery, a query tool that allows R users to implement powerful data transformations using Apache Spark on Databricks. rquery is based on Edgar F. Codd’s relational algebra, informed by our experiences using SQL and R packages such as dplyr at big data scale. Data Transformation and Codd’s Relational Algebra: rquery ...

A day at the zoo – Graphical UIs for Apache ZooKeeper

Apache ZooKeeper may not be the most interesting or appealing service, but it plays a major role in many distributed systems, such as Hadoop and Kafka, which use it to synchronize between nodes and to store their state. ZooKeeper resembles a filesystem. It has znodes, where each znode can contain leaf znodes (analogous to files) or ...

Robust Message Serialization in Apache Kafka Using Apache Avro, Part 2

Implementing a Schema Store: In Part 1, we saw the need for an Apache Avro schema provider but did not implement one. In this part, we will implement a schema provider that uses Apache Kafka as storage. In-Memory SchemaStore: First, we can implement an in-memory store for schemas. This is useful to understand the requirements for such ...

Introducing Cloudera Altus SDX (Beta)

The motivation behind Cloudera Altus SDX is to enable multiple clusters to share the same consistent view of enterprise data hosted on Amazon S3 and Microsoft ADLS. At the heart of Altus SDX is a repository of attributes describing locations and structure of data, access rights, business glossary definitions, lineage and more. We often hear from ...

Robust Message Serialization in Apache Kafka Using Apache Avro, Part 1

In Apache Kafka, Java applications called producers write structured messages to a Kafka cluster (made up of brokers). Similarly, Java applications called consumers read these messages from the same cluster. In some organizations, different groups are in charge of writing and managing the producers and consumers. In such cases, one major pain point can ...

Announcing IBM Big Replicate v2.12

Today (July 10, 2018), I am very excited to announce the GA of IBM Big Replicate for Hadoop 2.12, IBM Big Replicate for Object Stores 2.12, IBM Big Replicate for Hive 2.0, and IBM Big Replicate for Security 2.0, coming on July 13th, 2018! Big Replicate is a replication technology that gives you LIVE DATA ...

Db2 Big SQL and Big Replicate Newsletter – July 10th, 2018

Big SQL is a hybrid SQL-on-Hadoop engine delivering advanced data query for the enterprise. Use a single database connection or query for disparate sources such as HDFS, RDBMS, NoSQL databases, object stores and WebHDFS. Benefit from low latency, high performance, security, SQL compatibility, federation capabilities and the ability to do ad-hoc and complex ...