New in Cloudera 5.15: Simplifying the end user Data Catalog for the Self Service Analytic Database

Self-service BI and exploratory analytics are some of the most common use cases we see our customers running on Cloudera’s analytic database solution. Over the past year, we made significant advancements to provide a simpler user experience for SQL developers and make them more productive for their everyday self-service BI tasks and workflows by leveraging Hue as the SQL development workbench.

With the recent release of Cloudera 5.15, we continued to improve the query experience in Hue, focusing on easier data discovery and on parameterization of shared queries. We have also added a rich set of improvements and fixes that smooth both day-to-day use and the transition from less flexible legacy tools.

Read on to learn more about the improvements in this new release and try it out with one click at demo.gethue.com.

Data Catalog Exploration

Before typing any query to get insights, users need to find and explore the correct datasets. The Data Catalog search experience was introduced in Cloudera 5.11 and its usability has since been improved in each release.

The top bar of the interface offers free text search of SQL table and column names, as well as custom tags. You can also speed up repeated work by reusing saved queries. These features are particularly useful for quickly looking up a table among thousands or finding existing queries already analysing a certain dataset.

In the most recent release, the search shows more results directly via the ‘Show more’ link. Existing tags can also be listed as facets simply by typing ‘tags:’, which further speeds up the discovery process.

Some example searches:

  • usage → Returns any table matching ‘usage’ in its name, description, or tags.
  • type:view customer → Finds the view named ‘customer’.
  • tax* tags:finance → Lists all the tables and views starting with ‘tax’ and tagged with ‘finance’.

Searching all the available queries or data in the cluster

Listing the possible tags to filter on. This also works for ‘types’
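To make the search syntax above more concrete, here is a minimal sketch of how a query such as ‘tax* tags:finance’ could be split into free-text terms and facets, then matched against catalog entries. It is not Hue's implementation: `CatalogEntry`, `parse_search` and `search` are hypothetical names, and the matching rules (substring match for plain terms, wildcard match on the name only) are assumptions made for illustration.

```python
import fnmatch
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical catalog record; not a Hue class."""
    name: str
    kind: str  # "table" or "view"
    description: str = ""
    tags: set = field(default_factory=set)


def parse_search(query):
    """Split a search string into free-text terms and `type:` / `tags:` facets."""
    terms, facets = [], {}
    for token in query.split():
        key, sep, value = token.partition(":")
        if sep and key in ("type", "tags") and value:
            facets.setdefault(key, []).append(value.lower())
        else:
            terms.append(token.lower())
    return terms, facets


def matches(entry, query):
    """Return True when the entry satisfies every term and facet of the query."""
    terms, facets = parse_search(query)
    if any(entry.kind != wanted for wanted in facets.get("type", [])):
        return False
    entry_tags = {tag.lower() for tag in entry.tags}
    if any(tag not in entry_tags for tag in facets.get("tags", [])):
        return False
    haystack = " ".join([entry.name, entry.description, *sorted(entry.tags)]).lower()
    for term in terms:
        if "*" in term:
            # Assumption: wildcard terms are matched against the name only.
            if not fnmatch.fnmatchcase(entry.name.lower(), term):
                return False
        elif term not in haystack:
            # Assumption: plain terms match name, description, or tags.
            return False
    return True


def search(entries, query):
    """Return the names of all entries matching the query, in input order."""
    return [entry.name for entry in entries if matches(entry, query)]
```

With a small in-memory list of entries, `search(entries, "type:view customer")` keeps only views whose name, description, or tags contain ‘customer’. A real catalog would delegate this to a search index rather than scan a list.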

Unification and Caching of all SQL metadata

The list of tables and their columns is displayed in multiple parts of the interface. This data is costly to fetch and comes from different sources. In this new version, the information is cached and reused by all the Hue interface components. Because the sources are diverse (e.g. Apache Hive, Cloudera Navigator, Cloudera Optimizer), the returned metadata is stored in a single object, so that it is easier and faster to display without caring about the underlying technical details.
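The idea of one cached, unified object per table can be sketched as follows. This is only an illustration under stated assumptions, not Hue's code: `TableMetadata` and `MetadataCache` are hypothetical names, and each source is modeled as a plain function that returns a dictionary of fields for a table.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TableMetadata:
    """Hypothetical unified view of a table, whatever the source of each field."""
    name: str
    columns: List[str] = field(default_factory=list)
    description: str = ""
    tags: List[str] = field(default_factory=list)


class MetadataCache:
    """Fetch metadata from every source once, then serve the merged object."""

    def __init__(self, sources):
        # Assumption: each source is a callable `table_name -> dict` whose keys
        # are TableMetadata field names (columns, description, tags).
        self._sources = list(sources)
        self._entries = {}

    def get(self, table):
        if table not in self._entries:
            merged = {}
            for fetch in self._sources:
                # Later sources override earlier ones on conflicting keys.
                merged.update(fetch(table))
            self._entries[table] = TableMetadata(name=table, **merged)
        return self._entries[table]

    def invalidate(self, table):
        """Drop a cached entry, e.g. after a description or tag is edited."""
        self._entries.pop(table, None)
```

Every interface component that calls `get("web_logs")` receives the same object, so the costly fetches happen only on the first request or after `invalidate`.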

In addition to editing the tags of any SQL objects – such as tables, views, and columns – which has been available since 5.11, table descriptions can now also be edited. This allows self-service documentation of the metadata by end users, which was not possible until now because directly editing Hive comments requires Sentry admin privileges, which are not granted to regular users in a secure cluster.

Showing all the common data now cached and unified for a slicker experience

SQL Editor Variables

After querying data and finding results, users often share their queries with other collaborators. Sharing these queries was made easier with parameterization, as detailed in the previous 5.14 blog post.

e.g. select * from web_logs where country_code = "${country_code=CA, FR, US}"

Variables are now even simpler to use in edit mode. Thanks to Hue’s SQL parser, which implements 95% of Impala’s and Hive’s grammar, the editor knows which columns are tied to a variable and can provide, with one click, a sample of values or a calendar widget. This is done automatically, depending on the column type (e.g. a string or a date), and allows faster editing as there is no typing involved.
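As a rough illustration of the `${name=value, ...}` variable syntax, the sketch below extracts variables from a query, substitutes a chosen value, and picks a widget from the column type. It is not Hue's parser (which relies on a full SQL grammar): the regular expression, the function names, and the rule that the first listed value acts as the default are all assumptions made for this example.

```python
import re

# Matches ${name} or ${name=choice1, choice2, ...}
VARIABLE = re.compile(r"\$\{(\w+)(?:=([^}]*))?\}")


def _split_choices(raw):
    """Turn 'CA, FR, US' into ['CA', 'FR', 'US']."""
    return [choice.strip() for choice in (raw or "").split(",") if choice.strip()]


def find_variables(sql):
    """Return a dict mapping each variable name to its listed choices."""
    return {name: _split_choices(raw) for name, raw in VARIABLE.findall(sql)}


def render(sql, values=None):
    """Replace each variable with the supplied value or its first listed choice.

    Note: values are inserted as-is; a real editor would need proper escaping.
    """
    values = values or {}

    def substitute(match):
        name = match.group(1)
        if name in values:
            return str(values[name])
        choices = _split_choices(match.group(2))
        if choices:
            # Assumption: the first listed choice is used as the default.
            return choices[0]
        raise KeyError(f"no value provided for variable {name!r}")

    return VARIABLE.sub(substitute, sql)


def pick_widget(column_type):
    """Choose the input helper from the type of the column tied to the variable."""
    if column_type.lower() in ("date", "timestamp"):
        return "calendar"
    return "sample_values"
```

Calling `render` on the `web_logs` query with `{"country_code": "FR"}` yields the same statement with `"FR"` in place of the variable. Deciding which column a variable is tied to is the part that requires the SQL parser and is left out of this sketch.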

Clicking on the name of a variable shows a context popover

If the variable is a date or timestamp, the user gets a friendly calendar

On top of these improvements, the upstream documentation was restyled and made more searchable.

We hope that this new version of the Analytic DB interface makes self-service data discovery and analytics easier and faster. If you have any questions or feedback, feel free to comment here, on the community forum or via @gethue!
