聚合查询
PPG007 ... 2022-8-15 About 3 min
# 聚合查询
# Bucket 聚合
Bucket (opens new window) 类似 SQL 中的 GROUP BY,是满足特定条件的文档的集合。
首先准备一些数据:
POST /test-agg-cars/_bulk
{ "index": {}}
{ "price" : 10000, "color" : "red", "make" : "honda", "sold" : "2014-10-28" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 30000, "color" : "green", "make" : "ford", "sold" : "2014-05-18" }
{ "index": {}}
{ "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2014-07-02" }
{ "index": {}}
{ "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2014-08-19" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 80000, "color" : "red", "make" : "bmw", "sold" : "2014-01-01" }
{ "index": {}}
{ "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2014-02-12" }
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
简单聚合,得出哪个颜色的销量最好:
GET test-agg-cars/_search
{
"aggs": {
"best sold colors": {
"terms": {
"field": "color.keyword",
"size": 1
}
}
}
}
1
2
3
4
5
6
7
8
9
10
11
2
3
4
5
6
7
8
9
10
11
多个聚合,统计颜色和制造商:
GET test-agg-cars/_search
{
"aggs": {
"best sold colors": {
"terms": {
"field": "color.keyword",
"size": 1
}
},
"make_by": {
"terms": {
"field": "make.keyword",
"size": 1
}
}
}
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
聚合嵌套,计算每个制造商产品的平均价格:
GET /test-agg-cars/_search
{
"size" : 0,
"aggs": {
"makers": {
"terms": {
"field": "make.keyword"
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
前置条件的过滤,类似 MongoDB Aggregation 中的 $match:
GET /test-agg-cars/_search
{
"size": 0,
"aggs": {
"make_by": {
"filter": { "term": { "make.keyword": "honda" } },
"aggs": {
"avg_price": { "avg": { "field": "price" } }
}
}
}
}
1
2
3
4
5
6
7
8
9
10
11
12
2
3
4
5
6
7
8
9
10
11
12
对 filter 进行分组聚合:filters,先创建数据,模拟日志系统,每条日志都在文本中,包含 warning、info 等信息。
PUT /test-agg-logs/_bulk?refresh
{ "index" : { "_id" : 1 } }
{ "body" : "warning: page could not be rendered" }
{ "index" : { "_id" : 2 } }
{ "body" : "authentication error" }
{ "index" : { "_id" : 3 } }
{ "body" : "warning: connection timed out" }
{ "index" : { "_id" : 4 } }
{ "body" : "info: hello pdai" }
1
2
3
4
5
6
7
8
9
2
3
4
5
6
7
8
9
对不同的日志类型进行分组,使用 filters:
GET /test-agg-logs/_search
{
"size": 0,
"aggs" : {
"messages" : {
"filters" : {
"other_bucket_key": "other_messages",
"filters" : {
"infos" : { "match" : { "body" : "info" }},
"warnings" : { "match" : { "body" : "warning" }}
}
}
}
}
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
2
3
4
5
6
7
8
9
10
11
12
13
14
15
对 number 类型聚合,ranges 类似 MongoDB Aggregation 中的 $bucket 操作的 boundaries 字段:
GET /test-agg-cars/_search
{
"size": 0,
"aggs": {
"price_ranges": {
"range": {
"field": "price",
"ranges": [
{ "to": 20000 },
{ "from": 20000, "to": 40000 },
{ "from": 40000 }
]
}
}
}
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
::: waring
包含 from 值不包含 to 值。
::;
对 IP 类型聚合:
PUT test-agg-ips
{
"mappings": {
"properties": {
"ip_addr": {
"type": "ip"
}
}
}
}
PUT /test-agg-ips/_bulk?refresh
{ "index" : { "_id" : 1 } }
{ "ip" : "172.21.0.1 }
{ "index" : { "_id" : 2 } }
{ "ip" : "192.168.1.18" }
{ "index" : { "_id" : 3 } }
{ "ip" : "172.17.0.1" }
{ "index" : { "_id" : 4 } }
{ "ip" : "223.5.5.5" }
GET /test-agg-ips/_search
{
"size": 10,
"aggs": {
"ip_ranges": {
"ip_range": {
"field": "ip_addr",
"ranges": [
{ "to": "10.0.0.5" },
{ "from": "10.0.0.5" }
]
}
}
}
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
对日期类型聚合:
GET /test-agg-cars/_search
{
"size": 0,
"aggs": {
"range": {
"date_range": {
"field": "sold",
"format": "yyyy-MM-dd",
"ranges": [
{ "from": "2014-01-01" },
{ "to": "2014-12-31" }
]
}
}
}
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
# Metric 聚合
Mertics 聚合对 Bucket 内的文档进行统计计算。
# 单值分析
avg 平均值:
GET test-agg-cars/_search
{
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
}
},
"size": 0
}
1
2
3
4
5
6
7
8
9
10
11
2
3
4
5
6
7
8
9
10
11
max 最大值:
GET test-agg-cars/_search
{
"aggs": {
"max_price": {
"max": {
"field": "price"
}
}
},
"size": 0
}
1
2
3
4
5
6
7
8
9
10
11
2
3
4
5
6
7
8
9
10
11
min 最小值
GET test-agg-cars/_search
{
"aggs": {
"min_price": {
"min": {
"field": "price"
}
}
},
"size": 0
}
1
2
3
4
5
6
7
8
9
10
11
2
3
4
5
6
7
8
9
10
11
sum 求和:
GET test-agg-cars/_search
{
"aggs": {
"total_price": {
"sum": {
"field": "price"
}
}
},
"size": 0
}
1
2
3
4
5
6
7
8
9
10
11
2
3
4
5
6
7
8
9
10
11
去重计数:
GET test-agg-cars/_search
{
"aggs": {
"total_maker": {
"cardinality": {
"field": "make.keyword"
}
}
},
"size": 0
}
1
2
3
4
5
6
7
8
9
10
11
2
3
4
5
6
7
8
9
10
11
中位数:
GET test-agg-cars/_search
{
"aggs": {
"middle_price": {
"median_absolute_deviation": {
"field": "price"
}
}
},
"size": 0
}
1
2
3
4
5
6
7
8
9
10
11
2
3
4
5
6
7
8
9
10
11
# 非单值分析
stats,计算总数、最小值、最大值、平均值、求和:
GET test-agg-cars/_search
{
"aggs": {
"stats": {
"stats": {
"field": "price"
}
}
},
"size": 0
}
1
2
3
4
5
6
7
8
9
10
11
2
3
4
5
6
7
8
9
10
11
extended_stats,信息量更大,同样用于数值计算:
GET test-agg-cars/_search
{
"aggs": {
"stats": {
"extended_stats": {
"field": "price"
}
}
},
"size": 0
}
1
2
3
4
5
6
7
8
9
10
11
2
3
4
5
6
7
8
9
10
11
string_stats,统计字符串信息:
GET test-agg-cars/_search
{
"aggs": {
"stats": {
"string_stats": {
"field": "color.keyword"
}
}
},
"size": 0
}
1
2
3
4
5
6
7
8
9
10
11
2
3
4
5
6
7
8
9
10
11
percentiles 百分数范围:
// 可以得到各个价位段及其比重
GET test-agg-cars/_search
{
"aggs": {
"stats": {
"percentiles": {
"field": "price"
}
}
},
"size": 0
}
1
2
3
4
5
6
7
8
9
10
11
12
2
3
4
5
6
7
8
9
10
11
12
percentile_ranks 百分数排行,计算一个或多个百分位等级:
// 得到 23500 和 30000 以下价格的比例
GET test-agg-cars/_search
{
"aggs": {
"stats": {
"percentile_ranks": {
"field": "price",
"values": [23500, 30000]
}
}
},
"size": 0
}
1
2
3
4
5
6
7
8
9
10
11
12
13
2
3
4
5
6
7
8
9
10
11
12
13
# Pipeline 聚合
TODO: