Introduction to IoT Data Analytics and Storage

In this activity, you’ll learn about IoT data storage options and the basics of IoT data analytics in Azure IoT Hub, as well as IoT Business Intelligence.

In this step, we’ll learn about:

  • IoT data storage options
  • The basics of IoT data analytics in Azure IoT Hub

Introduction to Data Storage

There are several cloud storage options to consider when planning a storage solution for your IoT data.

Azure Cosmos DB

Azure Cosmos DB is a multi-model storage option that includes a fully managed NoSQL database service. It provides rich and familiar SQL query capabilities over JSON data with consistently low latency. Cosmos DB is a great fit for IoT solutions and many other types of applications that need seamless scale and global replication.
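To get a feel for what a SQL-style query over JSON documents does, here is a minimal plain-Python stand-in. It does not use the Cosmos DB SDK; the documents, the field names (`deviceId`, `temperature`) and the helper `hot_readings` are assumptions made up for illustration, and the query shown in the docstring is only meant to indicate the style of query the service accepts.

```python
import json

# Hypothetical telemetry documents, as they might be stored as JSON items.
raw_items = [
    '{"deviceId": "sensor-1", "temperature": 21.5}',
    '{"deviceId": "sensor-2", "temperature": 34.0}',
    '{"deviceId": "sensor-3", "temperature": 31.2}',
]


def hot_readings(items, threshold):
    """Plain-Python equivalent of a query in the style of:

    SELECT c.deviceId, c.temperature FROM c WHERE c.temperature > @threshold
    """
    docs = (json.loads(item) for item in items)
    return [
        {"deviceId": d["deviceId"], "temperature": d["temperature"]}
        for d in docs
        if d["temperature"] > threshold
    ]


# Keep only the readings above 30 degrees.
result = hot_readings(raw_items, 30)
```

In a real solution the filtering would run inside the database rather than in your application code, so only the matching documents travel over the network.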

SQL Database

SQL Database is a relational database service in the Microsoft cloud, based on the Microsoft SQL Server engine and capable of handling mission-critical workloads. SQL Database delivers predictable performance at multiple service levels, dynamic scalability with no downtime, built-in business continuity, and data protection, all with near-zero administration. These capabilities allow you to focus on rapid app development and accelerating your time to market, rather than spending time and resources on managing virtual machines and infrastructure. Because it is based on the SQL Server engine, SQL Database supports existing SQL Server tools, libraries, and APIs.

For more information, see SQL Database Documentation.

Azure Storage

Azure Storage provides the following services that can be used in your IoT solutions:

Blob Storage stores unstructured object data. A blob can be any type of text or binary data such as a document, media file, or application installer. Blob storage is also referred to as object storage.

Table Storage stores structured datasets. Table storage is a NoSQL key-attribute data store which allows for rapid development and fast access to large quantities of data.

Queue Storage provides reliable messaging for workflow processing and for communication between components of cloud services.
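The key-attribute model used by Table Storage can be sketched with a small in-memory class. This is a toy stand-in, not the Azure SDK: the class `InMemoryTable` and its methods are hypothetical. The two-part key is modelled on the partition key and row key that identify an entity in Table Storage; using the device ID as the partition key and a timestamp as the row key is an assumption chosen for this example.

```python
from typing import Any, Dict, List, Tuple


class InMemoryTable:
    """Toy stand-in for a key-attribute table; not the Azure SDK."""

    def __init__(self) -> None:
        self._entities: Dict[Tuple[str, str], Dict[str, Any]] = {}

    def upsert(self, partition_key: str, row_key: str, **attributes: Any) -> None:
        # Entities in the same table may carry different attributes (no fixed schema).
        self._entities[(partition_key, row_key)] = dict(attributes)

    def get(self, partition_key: str, row_key: str) -> Dict[str, Any]:
        # Point lookup by the full two-part key.
        return self._entities[(partition_key, row_key)]

    def query_partition(self, partition_key: str) -> List[Dict[str, Any]]:
        # Return every entity in one partition, ordered by row key.
        keys = sorted(k for k in self._entities if k[0] == partition_key)
        return [self._entities[k] for k in keys]


table = InMemoryTable()
table.upsert("sensor-1", "2024-01-01T00:00:00Z", temperature=21.5)
table.upsert("sensor-1", "2024-01-01T00:01:00Z", temperature=21.7, humidity=40)
table.upsert("sensor-2", "2024-01-01T00:00:00Z", temperature=19.0)

# All readings for one device, oldest first.
sensor_1_readings = table.query_partition("sensor-1")
```

Note that the second reading carries a `humidity` attribute that the first does not, which is the kind of flexibility a schemaless store allows.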

For more information, see Azure Storage Documentation.

Azure Data Lake Store

Azure Data Lake Store is an enterprise-wide hyper-scale repository for big data analytic workloads. Azure Data Lake Store enables you to capture data of any size, type, and ingestion speed in one single place for operational and exploratory analytics.

Azure Data Lake Store provides unlimited storage and is suitable for storing a variety of data for analytics. It does not impose any limits on account sizes, file sizes, or the amount of data that can be stored in a data lake. Individual files can range from kilobytes to petabytes in size, making it a good choice for storing any type of data. Data is stored durably by making multiple copies, and there is no limit on how long data can be kept in the data lake.

For more information, see Data Lake Store Documentation.

Data Analytics and IoT

Being able to run analytics on data in real-time and generate alerts is a key component of most IoT solutions.

Stream Analytics Job

Azure Stream Analytics is a fully managed, real-time event processing engine that helps you to unlock deep insights from your data. Stream Analytics enables you to set up real-time analytic computations on data streaming from devices, sensors, applications, and more.

[Figure: diagram using icons to depict Stream Analytics]

The Azure portal enables you to create a Stream Analytics job using the same methods that you would use to add any other service. Once the service is deployed to your resource group, you are presented with a blade that can be used to specify the input source of the streaming data, the output sink for the results of your job, and a SQL-like query expression that can be modified to transform your data. You can monitor and adjust the scale/speed of your job in the Azure portal to scale from a few kilobytes to a gigabyte or more of events processed per second. Your Stream Analytics jobs are backed by highly tuned streaming engines for time-sensitive processing.

Scenarios of real-time streaming analytics can be found across all industries: personalised, real-time stock-trading analysis and alerts offered by financial services companies; real-time fraud detection; data and identity protection services; reliable ingestion and analysis of data generated by sensors and actuators embedded in physical objects (IoT); web clickstream analytics; and customer relationship management (CRM) applications issuing alerts when customer experience within a time frame is degraded.

Configuring Inputs

The data connection to Stream Analytics is a stream of events from a data source. This is called an input. Stream Analytics has first-class integration with the Azure data stream sources Event Hubs, IoT Hub, and Blob storage, which can be in the same Azure subscription as your analytics job or in a different one.

As data is pushed to a data source, it is consumed by the Stream Analytics job and processed in real-time. Inputs are divided into two distinct types: data stream inputs and reference data inputs.

  • Data stream inputs: A data stream is an unbounded sequence of events arriving over time. Stream Analytics jobs must include at least one data stream input to be consumed and transformed by the job. Blob storage, Event Hubs, and IoT Hubs are supported as data stream input sources. Event Hubs are used to collect event streams from multiple devices and services, such as social media activity feeds, stock trade information, or data from sensors. IoT Hubs are optimized to collect data from connected devices in IoT scenarios. Blob storage can be used as an input source for ingesting bulk data as a stream.
  • Reference data: Stream Analytics supports a second type of input known as reference data. This is auxiliary data that is either static or slowly changing over time and is typically used for correlation and look-ups. Azure Blob storage is currently the only supported input source for reference data. Reference data source blobs are limited to 100 MB in size.
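The relationship between the two input types can be sketched in plain Python: a stream of events is enriched by looking each one up in a small reference table. This is only an illustration of the idea, not how Stream Analytics is implemented; the names `DEVICE_LOCATIONS` and `enrich`, the field names, and the sample values are all hypothetical.

```python
from typing import Dict, Iterable, Iterator

# Reference data: a static or slowly changing lookup table (hypothetical values).
DEVICE_LOCATIONS: Dict[str, str] = {
    "sensor-1": "Warehouse A",
    "sensor-2": "Warehouse B",
}


def enrich(events: Iterable[dict], reference: Dict[str, str]) -> Iterator[dict]:
    """Join each streaming event with the reference data on deviceId."""
    for event in events:
        location = reference.get(event["deviceId"])
        if location is None:
            continue  # inner-join behaviour: drop events with no match
        yield {**event, "location": location}


# Data stream input: in practice unbounded; a short list stands in for it here.
stream = [
    {"deviceId": "sensor-1", "temperature": 21.5},
    {"deviceId": "sensor-9", "temperature": 18.0},
    {"deviceId": "sensor-2", "temperature": 34.0},
]
enriched = list(enrich(stream, DEVICE_LOCATIONS))
```

Because `enrich` is a generator, it processes one event at a time and would work equally well on a source that never ends, which mirrors the unbounded nature of a data stream input.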

For more information on configuring inputs, see the Stream Analytics documentation.

Configuring Outputs

When authoring a Stream Analytics job, consider how the resulting data will be consumed. How will you view the results of the Stream Analytics job, and where will you store them?

To enable a variety of application patterns, Azure Stream Analytics offers different options for storing output and viewing analysis results. This makes it easy to view job output and gives you flexibility in how the output is consumed and stored, for data warehousing and other purposes. Any output configured in the job must exist before the job is started and events start flowing. For example, if you use blob storage as an output, the job will not create a storage account automatically. You need to create it before the Stream Analytics job is started.
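One way to guard against starting a job whose outputs are missing is a simple pre-flight check in your own deployment scripts. The sketch below is an assumption about how you might structure such a check; `JobConfig`, `missing_outputs`, and `start_job` are hypothetical helpers, not part of any Azure SDK, and the resource names are invented.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class JobConfig:
    """Hypothetical job description: the names of the outputs the job writes to."""
    name: str
    outputs: List[str] = field(default_factory=list)


def missing_outputs(job: JobConfig, existing_resources: Set[str]) -> List[str]:
    """Return the configured outputs that have not been provisioned yet."""
    return [o for o in job.outputs if o not in existing_resources]


def start_job(job: JobConfig, existing_resources: Set[str]) -> str:
    """Refuse to start the job until every configured output exists."""
    missing = missing_outputs(job, existing_resources)
    if missing:
        raise RuntimeError(
            f"Create these outputs before starting {job.name}: {missing}"
        )
    return f"{job.name} started"


job = JobConfig("telemetry-job", ["blob-archive", "sql-alerts"])
provisioned = {"blob-archive"}  # resources that already exist (invented names)

# "sql-alerts" has not been created yet, so the job should not be started.
not_ready = missing_outputs(job, provisioned)
```

Calling `start_job(job, provisioned)` at this point raises an error; once `sql-alerts` has been created and added to the set, the same call succeeds.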

We looked at various storage options above, so refer to that section when deciding where to send your output.

Configuring Queries

Queries in Azure Stream Analytics are expressed in a SQL-like query language, which is documented in the Stream Analytics Query Language Reference guide. Using the Stream Analytics query language in the in-browser query editor, you get IntelliSense auto-complete to help you quickly and easily implement time series queries, including temporal-based joins, windowed aggregates, temporal filters, and other common operations such as joins, aggregates, projections, and filters. In addition, in-browser query testing against a sample data file enables quick, iterative development.
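To make the idea of a windowed aggregate concrete, the sketch below computes an average per device over fixed, non-overlapping time windows, in plain Python. It is a simplified illustration and not the Stream Analytics engine: the function `tumbling_average`, the event layout, and the sample readings are assumptions, and the service's own conventions for window boundaries and result timestamps may differ. In the Stream Analytics query language, this kind of grouping is expressed with a windowing function such as `TumblingWindow` in the `GROUP BY` clause.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def tumbling_average(
    events: Iterable[Tuple[float, str, float]], window_seconds: int
) -> List[Tuple[int, str, float]]:
    """Average value per device over fixed, non-overlapping time windows.

    Each event is (timestamp_seconds, device_id, value). In this sketch a
    window covers [start, start + window_seconds) and results are keyed by
    the window start.
    """
    buckets: Dict[Tuple[int, str], List[float]] = defaultdict(list)
    for timestamp, device_id, value in events:
        # Round the timestamp down to the start of its window.
        window_start = int(timestamp // window_seconds) * window_seconds
        buckets[(window_start, device_id)].append(value)
    return [
        (start, device, sum(values) / len(values))
        for (start, device), values in sorted(buckets.items())
    ]


# Hypothetical temperature readings: (seconds since start, device, value).
readings = [
    (0, "sensor-1", 20.0),
    (30, "sensor-1", 22.0),
    (59, "sensor-2", 30.0),
    (60, "sensor-1", 24.0),
    (119, "sensor-1", 26.0),
]
averages = tumbling_average(readings, 60)
```

With 60-second windows, the first two `sensor-1` readings fall into the window starting at 0 and the last two into the window starting at 60, so each window produces its own average. Unlike this batch-style sketch, a streaming engine emits each window's result as soon as the window closes.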

For an explanation of how to implement query patterns that support real-world scenarios, review Query examples for common Stream Analytics usage patterns.

We’ve provided an overview of the many data storage options available for your IoT solution. Each has a specific purpose and you may use one or many of these options depending on the needs of your architecture. We also covered ways in which you can process data coming from your devices. We’ll talk more about how to present the data you’ve collected and stored in the next step.

You can learn more about how to work with data in a course devoted to this topic in this ExpertTrack. Whilst this gives you a taste of what’s available, you’ll need to explore the topic more deeply in other courses in order to learn how to use these tools and services in your implementation.

This article is from the free online

Microsoft Future Ready: Fundamentals of Internet of Things (IoT)

Created by
FutureLearn - Learning For Life
