Redis data types

来自WHY42
Riguz留言 | 贡献2021年6月28日 (一) 12:40的版本 →‎时间复杂度

基本数据类型

Redis的数据类型如下:

Binary-safe strings
Strings in Redis are binary safe, meaning they have a known length not determined by any special terminating characters. Thus, you can store anything up to 512 megabytes in one string.
Lists
collections of string elements sorted according to the order of insertion. They are basically linked lists.
Sets
collections of unique, unsorted string elements.
Sorted sets
similar to Sets but where every string element is associated to a floating number value, called . The elements are always taken sorted by their score, so unlike Sets it is possible to retrieve a range of elements
Hashes
which are maps composed of fields associated with values. Both the field and the value are strings.
Bit arrays(bitmaps)
it is possible, using special commands, to handle String values like an array of bits: you can set and clear individual bits, count all the bits set to 1, find the first set or unset bit, and so forth.
HyperLogLogs
this is a probabilistic data structure which is used in order to estimate the cardinality of a set.
Streams
append-only collections of map-like entries that provide an abstract log data type.

Binary-safe string

Redis的字符串是字节序列。在Redis中字符串是二进制安全的,这意味着他们有一个已知的长度,是没有任何特殊字符终止决定的,所以可以存储任何东西,最大长度可达512兆。

> set mykey somevalue
OK
> get mykey
"somevalue"

虽然redis中value存储为字符串,但是有一些命令可以按照数字来处理,例如自增:

> set counter 100
OK
> incr counter
(integer) 101
> incr counter
(integer) 102
> incrby counter 50
(integer) 152

批量设置(atomic):

> mset a 10 b 20 c 30
OK
> mget a b c
1) "10"
2) "20"
3) "30"

Lists

Redis Lists are implemented with because for a database system it is crucial to be able to add elements to a very long list in a very fast way. Another strong advantage, as you'll see in a moment, is that Redis Lists can be taken at constant length in constant time.

应用场景

  • Remember the latest updates posted by users into a social network.
  • Communication between processes, using a consumer-producer pattern where the producer pushes items into a list, and a consumer (usually a worker) consumes those items and executed actions. Redis has special list commands to make this use case both more reliable and efficient.

Sets

Redis Sets are unordered collections of strings. The SADD command adds new elements to a set. It's also possible to do a number of other operations against sets like testing if a given element already exists, performing the intersection, union or difference between multiple sets, and so forth.

> sadd myset 1 2 3
(integer) 3
> smembers myset
1. 3
2. 1
3. 2

应用场景

  • tags

Sorted sets

Sorted sets are a data type which is similar to a mix between a Set and a Hash. Like sets, sorted sets are composed of unique, non-repeating string elements, so in some sense a sorted set is a set as well.


However while elements inside sets are not ordered, every element in a sorted set is associated with a floating point value, called the score (this is why the type is also similar to a hash, since every element is mapped to a value).

Moreover, elements in a sorted sets are taken in order (so they are not ordered on request, order is a peculiarity of the data structure used to represent sorted sets). They are ordered according to the following rule:

If A and B are two elements with a different score, then A > B if A.score is > B.score. If A and B have exactly the same score, then A > B if the A string is lexicographically greater than the B string. A and B strings can't be equal since sorted sets only have unique elements.

> zadd hackers 1940 "Alan Kay"
(integer) 1
> zadd hackers 1957 "Sophie Wilson"
(integer) 1
> zadd hackers 1953 "Richard Stallman"
(integer) 1

Sorted sets are implemented via a data structure containing both a skip list and a hash table, so every time we add an element Redis performs an O(log(N)) operation. That's good, but when we ask for sorted elements Redis does not have to do any work at all, it's already all sorted:


> zrange hackers 0 -1
1) "Alan Turing"
2) "Hedy Lamarr"
3) "Claude Shannon"
4) "Alan Kay"
5) "Anita Borg"
6) "Richard Stallman"
7) "Sophie Wilson"
8) "Yukihiro Matsumoto"
9) "Linus Torvalds"

应用场景

Just a final note about sorted sets before switching to the next topic. Sorted sets' scores can be updated at any time. Just calling ZADD against an element already included in the sorted set will update its score (and position) with O(log(N)) time complexity. As such, sorted sets are suitable when there are tons of updates.

Because of this characteristic a common use case is leader boards. The typical application is a Facebook game where you combine the ability to take users sorted by their high score, plus the get-rank operation, in order to show the top-N users, and the user rank in the leader board (e.g., "you are the #4932 best score here").

Hashes

Bit arrays

Bitmaps are not an actual data type, but a set of bit-oriented operations defined on the String type. Since strings are binary safe blobs and their maximum length is 512 MB, they are suitable to set up to 232 different bits.

Bit operations are divided into two groups: constant-time single bit operations, like setting a bit to 1 or 0, or getting its value, and operations on groups of bits, for example counting the number of set bits in a given range of bits (e.g., population counting).

> setbit key 0 1
(integer) 0
> setbit key 100 1
(integer) 0
> bitcount key
(integer) 2

HyperLogLogs

A HyperLogLog is a probabilistic data structure used in order to count unique things (technically this is referred to estimating the cardinality of a set). Usually counting unique items requires using an amount of memory proportional to the number of items you want to count, because you need to remember the elements you have already seen in the past in order to avoid counting them multiple times. However there is a set of algorithms that trade memory for precision: you end with an estimated measure with a standard error, which in the case of the Redis implementation is less than 1%. The magic of this algorithm is that you no longer need to use an amount of memory proportional to the number of items counted, and instead can use a constant amount of memory! 12k bytes in the worst case, or a lot less if your HyperLogLog (We'll just call them HLL from now) has seen very few elements.

> pfadd hll a b c d
(integer) 1
> pfcount hll
(integer) 4

应用场景

An example of use case for this data structure is counting unique queries performed by users in a search form every day.