SELECT ts, pm25-DIFF(pm25), pm25 FROM weather.p1;
You can, of course, do the same thing if you are setting up a multivariate series as well. For example, if you are looking at PM25, CO, NO2, and wind speed, you can apply the same pattern to each column, with something along the lines of the following (assuming columns named co, no2, and windspeed):
SELECT ts, pm25-DIFF(pm25), pm25, co-DIFF(co), co, no2-DIFF(no2), no2, windspeed-DIFF(windspeed), windspeed FROM weather.p1;

"I’m thrilled to see that TDengine has been selected as one of the most promising startups," said Jeff Tao, founder and core developer of TDengine. "This is another indication that industry insiders understand the potential of TDengine to power modern data workflows in IoT and other rapidly growing markets. I’m confident that TDengine is well positioned to continue making technical as well as business breakthroughs going forward."

With version 3.0.1.5, TDengine Database offers support for macOS in addition to Linux and Windows. You can now test your applications more efficiently and connect to a TDengine Server running on any operating system from your own MacBook.

Huffman coding:
This is one of the most widespread compression schemes, first described in 1952 by David Huffman in his paper "A Method for the Construction of Minimum-Redundancy Codes". Huffman coding works by finding the optimal prefix code for a given alphabet. Note that each symbol is assigned a prefix code representing its value, and no symbol's code may be the prefix of another symbol's code. For example, if 0 is the prefix code for our first symbol, A, then no other symbol's code can start with 0. This property is what makes decoding of the bitstream unambiguous.
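To make the idea concrete, here is a minimal Python sketch that builds a Huffman code table from symbol frequencies (an illustration of the classic algorithm only, not any database's actual implementation):

import heapq
from collections import Counter

def huffman_codes(text):
    # Min-heap of (weight, tiebreak, node); a leaf node is a symbol,
    # an internal node is a (left, right) tuple.
    heap = [(w, i, ch) for i, (ch, w) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    n = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, n, (left, right)))
        n += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: descend both branches
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf: record the accumulated prefix code
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

print(huffman_codes("AAAABBBCCD"))  # {'A': '0', 'B': '10', 'D': '110', 'C': '111'}

Note that no code in the output is a prefix of any other, so a concatenated bitstream decodes without ambiguity.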
Run-length encoding (RLE)
A lossless compression technique that is independent of the nature of the data: it replaces consecutive runs of repeated data with a variable-length code. For example, the string "AAAABBBCCDEEEE" consists of four As, three Bs, two Cs, one D, and four Es; after RLE it compresses to 4A3B2C1D4E. Its advantages are simplicity, speed, and the ability to collapse long, highly repetitive runs into small units. Its disadvantage is equally obvious: data with little repetition compresses poorly.
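A toy version in Python (illustrative only):

from itertools import groupby
import re

def rle_encode(s):
    # Each run of identical characters becomes "<count><char>".
    return "".join(f"{len(list(run))}{ch}" for ch, run in groupby(s))

def rle_decode(s):
    # Inverse: read a count, then repeat the character that follows it.
    return "".join(ch * int(n) for n, ch in re.findall(r"(\d+)(\D)", s))

print(rle_encode("AAAABBBCCDEEEE"))   # 4A3B2C1D4E
assert rle_decode("4A3B2C1D4E") == "AAAABBBCCDEEEE"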
XOR
This algorithm is designed around the characteristics of data stored in the IEEE 754 floating-point format. The first value is stored uncompressed; each subsequent value is XORed with the preceding value. If the result is zero (the two values are identical), only a single 0 bit is stored; otherwise the XORed result is stored. The algorithm is strongly affected by fluctuation in the data: the more severe the fluctuation, the worse the compression.
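The core idea in a few lines of Python (a simplified sketch; real implementations such as Facebook's Gorilla go further and encode the XOR results at the bit level):

import struct

def float_bits(x):
    # Reinterpret a double's bytes as an unsigned 64-bit integer.
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def xor_encode(values):
    # First value kept verbatim; every later value is XORed with its predecessor.
    bits = [float_bits(v) for v in values]
    return bits[:1] + [prev ^ cur for prev, cur in zip(bits, bits[1:])]

enc = xor_encode([22.5, 22.5, 22.625])
print(enc[1])       # 0: identical neighbours XOR to zero, storable as a single bit
print(hex(enc[2]))  # 0x200000000000: only one mantissa bit differs, mostly zeros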
Delta
Differential encoding, also called incremental encoding. The first value is stored unchanged; each subsequent value is converted into its delta from the previous value. The principle is similar to XOR: both compute the difference between two adjacent values. The technique is widely used elsewhere, for example whenever you need the history of changes to a file (version control systems such as Git). In time series databases, however, it is rarely used alone; it is generally combined with RLE, Simple8b, or Zig-zag, which improves the compression considerably.
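In Python the scheme is nearly a one-liner each way (illustrative sketch):

def delta_encode(values):
    # First value as-is; the rest are differences from the previous value.
    return values[:1] + [cur - prev for prev, cur in zip(values, values[1:])]

def delta_decode(deltas):
    out = deltas[:1]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

ts = [1609459200, 1609459210, 1609459220, 1609459230]   # timestamps, 10 s apart
enc = delta_encode(ts)
print(enc)                    # [1609459200, 10, 10, 10]
assert delta_decode(enc) == ts

The repeated small deltas are exactly what RLE, Simple8b, or Zig-zag then compress well.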
Delta-of-delta
Also known as second-order differential encoding: Delta encoding is applied again on top of Delta encoding. It is well suited to monotonically increasing or decreasing sequences. For example, 2,4,4,6,8 becomes 2,2,0,2,2 after one round of delta encoding and 2,0,-2,2,0 after a second round. It is usually combined with RLE, Simple8b, or Zig-zag as well.
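Applying the delta function from the previous sketch to its own output reproduces the example from the text:

def delta_encode(values):
    # First value as-is; the rest are differences from the previous value.
    return values[:1] + [cur - prev for prev, cur in zip(values, values[1:])]

seq = [2, 4, 4, 6, 8]
print(delta_encode(seq))                # [2, 2, 0, 2, 2]
print(delta_encode(delta_encode(seq)))  # [2, 0, -2, 2, 0]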
Zig-zag
Zig-zag encoding solves the varint algorithm's poor efficiency on negative numbers. The principle is very simple: it moves the sign bit to the lowest position, so that negative and positive numbers of small magnitude alike map to small unsigned numbers whose leading zeros can be dropped. For values of small magnitude the compression efficiency is very high, but for large values it may not help and can even hurt. Zig-zag is therefore often paired with Delta encoding, which works well to turn large values into smaller ones.
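The mapping itself is just two bit operations (shown here for 64-bit values; Python's unbounded integers make the right shift behave like C's arithmetic shift on signed integers):

def zigzag_encode(n):
    # Move the sign to the lowest bit: 0->0, -1->1, 1->2, -2->3, 2->4, ...
    return (n << 1) ^ (n >> 63)

def zigzag_decode(z):
    return (z >> 1) ^ -(z & 1)

print([zigzag_encode(n) for n in (0, -1, 1, -2, 2)])  # [0, 1, 2, 3, 4]
assert all(zigzag_decode(zigzag_encode(n)) == n for n in range(-1000, 1000))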
Snappy
The Snappy compression algorithm draws on the ideas of the LZ77 algorithm. Because the pattern-matching step of LZ77 has high time complexity, Google made many optimizations to it and open-sourced the result in 2011. The principle: suppose we have the sequence S=[9,1,2,3,4,5,1,2,3,4], and we find that the subsequence at positions 2-5, [1,2,3,4], is identical to the one at positions 7-10. The sequence can then be encoded as S=[9,1,2,3,4,5,(2,4)], where 2 is the start position of the earlier match and 4 is its length. Snappy has the advantages of high speed and a reasonable compression ratio, and is widely used in many open source projects, such as Cassandra, Hadoop, MongoDB, RocksDB, Spark, and InfluxDB.
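Using it from Python is a one-liner each way, assuming the third-party python-snappy package is installed:

import snappy  # pip install python-snappy

data = b"9,1,2,3,4,5,1,2,3,4;" * 1000   # repetitive input compresses well
compressed = snappy.compress(data)
assert snappy.uncompress(compressed) == data
print(len(data), "->", len(compressed))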
LZ4
The LZ4 data compression algorithm belongs to the byte-oriented LZ77 family. Its compression ratio is not high; its defining characteristic is extremely fast decoding. According to the results of the official lzbench benchmark, its decompression speed reaches 4970 MB/s under the default configuration.
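A quick usage sketch, assuming the third-party lz4 package:

import lz4.frame  # pip install lz4

data = b"pm25=35,co=0.6,no2=18;" * 1000
compressed = lz4.frame.compress(data)
assert lz4.frame.decompress(compressed) == data
print(len(data), "->", len(compressed))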
Simple8b
Simple8b is a 64-bit algorithm that packs multiple integers (each between 0 and (1<<60)-1) into a single 64-bit word. The first 4 bits hold a selector and the remaining 60 bits store the data. Its advantages are simplicity and efficiency, and the fixed-length encoding keeps decompression fast; however, the compression ratio is poor for large integers or widely varying values. The algorithm is best suited to unsigned integers in a small range.
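A simplified Python sketch of the packing idea (the real Simple8b selector table has additional entries and special cases; this is illustrative only):

# (count, bits): how many values fit in 60 bits at each bit width.
SELECTORS = [(60, 1), (30, 2), (20, 3), (15, 4), (12, 5), (10, 6),
             (8, 7), (7, 8), (6, 10), (5, 12), (4, 15), (3, 20),
             (2, 30), (1, 60)]

def simple8b_pack(values):
    words, i = [], 0
    while i < len(values):
        for sel, (count, bits) in enumerate(SELECTORS):
            chunk = values[i:i + count]
            if len(chunk) == count and all(v < (1 << bits) for v in chunk):
                word = sel << 60                   # selector in the top 4 bits
                for j, v in enumerate(chunk):      # values in the low 60 bits
                    word |= v << (j * bits)
                words.append(word)
                i += count
                break
        else:
            raise ValueError("value out of range")
    return words

def simple8b_unpack(words):
    out = []
    for word in words:
        count, bits = SELECTORS[word >> 60]
        out.extend((word >> (j * bits)) & ((1 << bits) - 1) for j in range(count))
    return out

vals = [10, 10, 10, 10, 10]          # five small deltas fit in one 64-bit word
packed = simple8b_pack(vals)
print(len(packed))                   # 1
assert simple8b_unpack(packed) == vals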
LZO
LZO is a block compression algorithm that also belongs to the LZ77 family. Its goal is fast compression and decompression rather than a high compression ratio; LZ4, by comparison, decompresses even faster. Because a block may contain data of many different types, its overall compression ratio falls well short of algorithms tailored to a specific data type.
DEFLATE
DEFLATE is a classic lossless data compression algorithm that combines the LZ77 algorithm with Huffman coding. Strictly speaking, DEFLATE is just an algorithm for compressing a data stream: it is the default algorithm in the zip file format, and it is wrapped by formats such as gzip and zlib.
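Python's standard library exposes DEFLATE directly through the zlib module:

import zlib

data = b"ts=1609459200,pm25=35\n" * 1000
compressed = zlib.compress(data, level=6)   # zlib framing around a DEFLATE stream
assert zlib.decompress(compressed) == data
print(len(data), "->", len(compressed))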
Zstandard
Zstandard ( Zstd ) is designed to provide a compression ratio similar to the DEFLATE algorithm, but faster, especially for decompression. Its compression level can be adjusted from minus 5 (fastest) to 22 (slowest compression, but the highest compression ratio). In text log compression scenarios, its compression performance is comparable to or even better than LZ4 and Snappy. Linux kernel, HTTP protocol, Hadoop, HBase, etc. have all added support for Zstd . It is foreseeable that Zstd will be a compression algorithm that will be widely concerned in the next few years.
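A usage sketch, assuming the third-party zstandard package:

import zstandard as zstd  # pip install zstandard

data = b'{"ts": 1609459200, "pm25": 35}\n' * 1000
compressed = zstd.ZstdCompressor(level=3).compress(data)
assert zstd.ZstdDecompressor().decompress(compressed) == data
print(len(data), "->", len(compressed))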
Bit packing
The Bit-packing compression algorithm removes unnecessary bits from the data to be compressed, based on the observation that not every integer needs 32 or 64 bits of storage. For example, a 32-bit integer whose values fall in the range [0,100] can be represented in 7 bits.
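A minimal sketch of the idea in Python (fixed bit width, illustrative only):

def pack_bits(values, width):
    # Append the low `width` bits of each value to a growing bit buffer,
    # flushing whole bytes as they fill up.
    buf = bits = 0
    out = bytearray()
    for v in values:
        buf |= v << bits
        bits += width
        while bits >= 8:
            out.append(buf & 0xFF)
            buf >>= 8
            bits -= 8
    if bits:
        out.append(buf & 0xFF)
    return bytes(out)

def unpack_bits(data, width, count):
    buf = bits = pos = 0
    out = []
    while len(out) < count:
        while bits < width:               # refill the buffer byte by byte
            buf |= data[pos] << bits
            pos += 1
            bits += 8
        out.append(buf & ((1 << width) - 1))
        buf >>= width
        bits -= width
    return out

values = [0, 17, 42, 99, 100]             # all within [0, 100] -> 7 bits each
packed = pack_bits(values, 7)
print(len(packed), "bytes instead of", len(values) * 4)  # 5 bytes instead of 20
assert unpack_bits(packed, 7, len(values)) == values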
TDengine implementation
TDengine Database provides three compression options: no compression, one-stage compression, and two-stage compression.
