Redis data types
基本数据类型
Redis的数据类型如下:
- Binary-safe strings
- Strings in Redis are binary safe, meaning they have a known length not determined by any special terminating characters. Thus, you can store anything up to 512 megabytes in one string.
- Lists
- collections of string elements sorted according to the order of insertion. They are basically linked lists.
- Sets
- collections of unique, unsorted string elements.
- Sorted sets
- similar to Sets but where every string element is associated to a floating number value, called score. The elements are always taken sorted by their score, so unlike Sets it is possible to retrieve a range of elements
- Hashes
- which are maps composed of fields associated with values. Both the field and the value are strings.
- Bit arrays(bitmaps)
- it is possible, using special commands, to handle String values like an array of bits: you can set and clear individual bits, count all the bits set to 1, find the first set or unset bit, and so forth.
- HyperLogLogs
- this is a probabilistic data structure which is used in order to estimate the cardinality of a set.
- Streams
- append-only collections of map-like entries that provide an abstract log data type.
Binary-safe string
Redis的字符串是字节序列。在Redis中字符串是二进制安全的,这意味着他们有一个已知的长度,是没有任何特殊字符终止决定的,所以可以存储任何东西,最大长度可达512兆。
> set mykey somevalue
OK
> get mykey
"somevalue"
虽然redis中value存储为字符串,但是有一些命令可以按照数字来处理,例如自增:
> set counter 100
OK
> incr counter
(integer) 101
> incr counter
(integer) 102
> incrby counter 50
(integer) 152
批量设置(atomic):
> mset a 10 b 20 c 30
OK
> mget a b c
1) "10"
2) "20"
3) "30"
时间复杂度
SET, GET, DEL, INCR, DECR
都是 $$O(1)$$。
Lists
Redis Lists are implemented with linked lists because for a database system it is crucial to be able to add elements to a very long list in a very fast way. Another strong advantage, as you'll see in a moment, is that Redis Lists can be taken at constant length in constant time.
应用场景
- Remember the latest updates posted by users into a social network.
- Communication between processes, using a consumer-producer pattern where the producer pushes items into a list, and a consumer (usually a worker) consumes those items and executed actions. Redis has special list commands to make this use case both more reliable and efficient.
时间复杂度
$$O(1)$$: LPUSH, RPUSH, LPOP, RPOP, LLEN
$$O(N)$$: LINDEX, LRANGE, LSET
使用LRANGE
从头或者尾取小范围的值还是可以被视为$$O(1)$$,例如取最新的10条数据
Sets
Redis Sets are unordered collections of strings. The SADD command adds new elements to a set. It's also possible to do a number of other operations against sets like testing if a given element already exists, performing the intersection, union or difference between multiple sets, and so forth.
> sadd myset 1 2 3
(integer) 3
> smembers myset
1. 3
2. 1
3. 2
应用场景
- 建立事物之间的关系,例如多对多映射,tags等
例如存储文章对应的tag,可以建立这样的sets:
- post:<id>:tags → [Tags]
- tag:<tag name> → [Posts]
的映射,这样如果想获取文章所有的tag:
GET post:1:tags
获取所有包含MySQL、Java的文章:
SINTER tag:MySQL:posts tag:Java:posts
时间复杂度
$$O(1)$$: SADD, SREM, SPOP, SISMEMBER, SCARD, SDIFF
$$O(N)$$: SUNION, SMEMBERS
Sorted sets
Sorted sets are a data type which is similar to a mix between a Set and a Hash. Like sets, sorted sets are composed of unique, non-repeating string elements, so in some sense a sorted set is a set as well.
However while elements inside sets are not ordered, every element in a sorted set is associated with a floating point value, called the score (this is why the type is also similar to a hash, since every element is mapped to a value).
Moreover, elements in a sorted sets are taken in order (so they are not ordered on request, order is a peculiarity of the data structure used to represent sorted sets). They are ordered according to the following rule:
If A and B are two elements with a different score, then A > B if A.score is > B.score. If A and B have exactly the same score, then A > B if the A string is lexicographically greater than the B string. A and B strings can't be equal since sorted sets only have unique elements.
> zadd hackers 1940 "Alan Kay"
(integer) 1
> zadd hackers 1957 "Sophie Wilson"
(integer) 1
> zadd hackers 1953 "Richard Stallman"
(integer) 1
Sorted sets are implemented via a dual-ported data structure containing both a skip list and a hash table, so every time we add an element Redis performs an O(log(N)) operation. That's good, but when we ask for sorted elements Redis does not have to do any work at all, it's already all sorted:
> zrange hackers 0 -1
1) "Alan Turing"
2) "Hedy Lamarr"
3) "Claude Shannon"
4) "Alan Kay"
5) "Anita Borg"
6) "Richard Stallman"
7) "Sophie Wilson"
8) "Yukihiro Matsumoto"
9) "Linus Torvalds"
应用场景
Just a final note about sorted sets before switching to the next topic. Sorted sets' scores can be updated at any time. Just calling ZADD against an element already included in the sorted set will update its score (and position) with O(log(N)) time complexity. As such, sorted sets are suitable when there are tons of updates.
Because of this characteristic a common use case is leader boards. The typical application is a Facebook game where you combine the ability to take users sorted by their high score, plus the get-rank operation, in order to show the top-N users, and the user rank in the leader board (e.g., "you are the #4932 best score here").
Hashes
Bit arrays
Bitmaps are not an actual data type, but a set of bit-oriented operations defined on the String type. Since strings are binary safe blobs and their maximum length is 512 MB, they are suitable to set up to 232 different bits.
Bit operations are divided into two groups: constant-time single bit operations, like setting a bit to 1 or 0, or getting its value, and operations on groups of bits, for example counting the number of set bits in a given range of bits (e.g., population counting).
> setbit key 0 1
(integer) 0
> setbit key 100 1
(integer) 0
> bitcount key
(integer) 2
HyperLogLogs
A HyperLogLog is a probabilistic data structure used in order to count unique things (technically this is referred to estimating the cardinality of a set). Usually counting unique items requires using an amount of memory proportional to the number of items you want to count, because you need to remember the elements you have already seen in the past in order to avoid counting them multiple times. However there is a set of algorithms that trade memory for precision: you end with an estimated measure with a standard error, which in the case of the Redis implementation is less than 1%. The magic of this algorithm is that you no longer need to use an amount of memory proportional to the number of items counted, and instead can use a constant amount of memory! 12k bytes in the worst case, or a lot less if your HyperLogLog (We'll just call them HLL from now) has seen very few elements.
> pfadd hll a b c d
(integer) 1
> pfcount hll
(integer) 4
应用场景
An example of use case for this data structure is counting unique queries performed by users in a search form every day.