Time series data is a sequence of measurements or events collected over a period of time. A time series can also show how a value has changed over time. Evaluating this type of data, whether graphed or listed, can be challenging because the data sets are typically very large, change continuously, and are time-sensitive. An important but less evident property of such data is its multi-dimensionality.
It’s important to remember that time-series data is a record of what has happened, not a projection of what will come. However, the records contained in time series can help you make educated guesses about what might happen in the future based upon extrapolating information from the data.
Examples and Uses of Time Series Data
Internet of Things
The Internet of Things (IoT) relies heavily on time-series data. Examples include data from automobiles, mobile devices, e-commerce applications, and inventory management systems, all of it time-stamped as events occur. This data needs to be ingested rapidly, because many remote devices report metrics continuously, and then retained for analysis.
Time series data is also used for monitoring computer systems. Metrics collected from those systems let people track their status. These metrics may include memory utilization and process count, which show how computer resources are being used and whether they need to be reallocated.
Key performance indicators (KPIs) are time-oriented and sampled repeatedly, so they fit naturally into time series data. Examples of these KPIs include profit, revenue, cost, conversion rate, number of transactions, and average order value. Once this information is collected and stored, it can be used to create dashboards.
Anomaly detection is used to find unexpected deviations within time series data. Because time series data records how a system's values change, it allows organizations to measure that change, determine how something changed in the past, monitor what is happening in the present, and forecast what may happen in the future.
When data is laid out in a time series plot, people can spot outliers visually. However, automated anomaly detection speeds up the process and offers real-world insights in real-time, allowing you to correlate the outliers more quickly.
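As a minimal sketch of what automated detection can look like, the Python below flags a reading when it deviates from the mean of the preceding readings by more than a chosen number of standard deviations. The function name `find_anomalies`, the window size, the threshold, and the sample readings are all assumptions made for illustration; production systems typically use more robust methods.

```python
from statistics import mean, stdev

def find_anomalies(values, window=5, threshold=3.0):
    """Return indices of values that deviate from the trailing-window mean
    by more than `threshold` standard deviations (hypothetical helper)."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]   # the `window` readings before i
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: treat any change at all as a deviation.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# Illustrative sensor readings with one obvious spike at index 6.
readings = [10.0, 10.2, 9.9, 10.1, 10.0, 10.1, 25.0, 10.0, 9.8, 10.2]
print(find_anomalies(readings))  # [6]
```

Note that the spike inflates the standard deviation of the windows that follow it, so a simple trailing z-score like this can miss anomalies that occur shortly after another one.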
What Is a Time Series Database?
A time series database is designed to handle time series data and is optimized for measuring changes that occur over time. What distinguishes it from other types of database workloads is its ability to handle large range scans, data summarization, and data lifecycle management.
A time series database stores state-change information that is indexed by time. Each record usually consists of a subject, a point in time, and the measurements. When writing time series data, high throughput is critical: writes arrive continuously and concurrently, and they make up roughly 95% of database operations.
When reading data, each read is by time range, and more recent data is more likely to be read. These hot and cold access patterns allow recent and older data to be separated. Time series databases also offer multiple levels of precision: the shorter the time interval, the more detailed and accurate the data. This supports multidimensional analysis and is well suited to data mining.
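The record shape and the time-range read described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Point` and `Series` names, the in-memory sorted list, and the half-open `[start, end)` range are hypothetical choices, not how any particular time series database is implemented.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    timestamp: int        # point in time, e.g. seconds since the epoch
    subject: str          # what was measured, e.g. a host or device id
    measurements: dict    # measured values, e.g. {"cpu": 0.35}

class Series:
    """Hypothetical in-memory series kept sorted by timestamp."""

    def __init__(self):
        self._timestamps = []   # sorted timestamps, parallel to _points
        self._points = []

    def write(self, point):
        # Appends are cheap when data arrives in ascending time order;
        # out-of-order writes fall back to an insertion in the middle.
        idx = bisect_right(self._timestamps, point.timestamp)
        self._timestamps.insert(idx, point.timestamp)
        self._points.insert(idx, point)

    def read_range(self, start, end):
        """Return points with start <= timestamp < end."""
        lo = bisect_left(self._timestamps, start)
        hi = bisect_left(self._timestamps, end)
        return self._points[lo:hi]

series = Series()
for ts, cpu in [(100, 0.31), (160, 0.35), (220, 0.90), (280, 0.42)]:
    series.write(Point(ts, "server-1", {"cpu": cpu}))

recent = series.read_range(160, 280)
print([p.timestamp for p in recent])  # [160, 220]
```

Keeping the data ordered by time is what makes a range read a pair of binary searches plus a contiguous slice rather than a scan of every record.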
Why You Need Time Series Databases
The initial time series databases were designed primarily for financial data and to study stock volatility. However, the uses have grown beyond financial data to various industries. This is especially true with the Internet of Things, with devices generating time series data from around the world and transmitting it to be stored in databases.
Databases have had to evolve to handle these new workloads, the growth in data sources and data points, and the demand for better monitoring and control.
Because a time series database is purpose-built, it can scale with time series data and can make the information cheaper and easier to store, as well as easier to query and analyze.
What Makes Time Series Databases Different from Other Types of Databases?
Time series databases are architecturally different from other types of databases. They’re built to store time-stamped information and to handle large time series scans over numerous records.
Time series databases are optimized for providing millisecond-level query times when working with long periods of high resolution data that may include millions of time points. They’re also able to maintain high precision data for a short period of time, which can be aggregated and then down-sampled to create long-term trend data. As part of this process, a period of data is deleted once it has been summarized.
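The aggregate, down-sample, and delete cycle can be illustrated with a short Python sketch. It assumes points are plain `(timestamp, value)` pairs and uses a simple average per bucket; the function names `downsample` and `roll_up`, the bucket width, and the cutoff are hypothetical, and real databases expose this through their own retention and rollup features.

```python
from collections import defaultdict
from statistics import mean

def downsample(points, bucket_seconds):
    """Average (timestamp, value) pairs into fixed-width time buckets."""
    buckets = defaultdict(list)
    for ts, value in points:
        bucket_start = ts - (ts % bucket_seconds)  # align to bucket boundary
        buckets[bucket_start].append(value)
    return [(start, mean(vals)) for start, vals in sorted(buckets.items())]

def roll_up(raw, cutoff, bucket_seconds):
    """Summarize points older than `cutoff` and drop their raw detail.

    Returns (summary, recent): the down-sampled old data and the
    untouched high-resolution points at or after the cutoff.
    """
    old = [(ts, v) for ts, v in raw if ts < cutoff]
    recent = [(ts, v) for ts, v in raw if ts >= cutoff]
    return downsample(old, bucket_seconds), recent

# Illustrative raw readings; anything before t=120 is rolled into 60 s buckets.
raw = [(0, 1.0), (20, 3.0), (40, 5.0), (60, 2.0), (80, 4.0), (120, 7.0), (130, 9.0)]
summary, recent = roll_up(raw, cutoff=120, bucket_seconds=60)
print(summary)  # [(0, 3.0), (60, 3.0)]
print(recent)   # [(120, 7.0), (130, 9.0)]
```

In this sketch the cutoff is chosen on a bucket boundary so that no bucket is summarized from partial data.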
Application developers may find it difficult to implement this type of data lifecycle management with traditional databases. In order to do this with a traditional database, application developers have to create methods to summarize data at scale and then efficiently delete the larger detail data sets. However, with a time series database, this type of functionality already exists within the system.
Widely Used Time Series Databases
InfluxDB is optimized for time series data. To achieve this, it uses a simplified form of conflict resolution, restricts deletes and updates, and expects data to be added in ascending time order, all of which help to improve write performance. There are trade-offs, however. The database is not able to store duplicate data, delete and update functionality is limited, and writing data with random timestamps decreases performance.
TimescaleDB is an open-source time series database that is built on PostgreSQL and queried with SQL. It can be used from several programming languages, including Python and Java. Because it is based on SQL, developers with SQL experience can start using the database immediately with less of a learning curve. TimescaleDB scales similarly to PostgreSQL, but it also offers time series features such as fast ingest. It also inherits many of the plugins and tools built by the sizable PostgreSQL community.
Redis recently introduced its own time series database, called RedisTimeSeries, which tailors Redis specifically to time series. RedisTimeSeries can ingest and process millions of time-stamped records per second. It offers downsampling and aggregation that keep the storage footprint small, as well as labeling and search techniques and increment and decrement counter operations.
Cassandra is an open-source, NoSQL database. It can quickly ingest and then process large amounts of data. Cassandra also offers high write speeds. Technically, Cassandra is not a time series database. However, it is useful for time series data because of its high durability and scalability. Its clustering column feature helps in storing ordered time series data even though it was not specifically designed for this purpose.
Time series data is not new. It has been around for as long as databases themselves. However, time series data has grown in popularity over the last 10 years. This is mainly due to the growth of the Internet of Things and the billions of devices that create and send time series data from sensors all over the world.