聚合操作

提示

如果查询和聚合一起出现，那么ES会先根据查询过滤文档，对符合条件的文档再进行聚合操作。
需要注意的是text类型的因为会分词，尽量不要用于聚合，建议使用keyword类型的字段进行聚合。在此前提下仍需要聚合的话，有两种方式：

使用字段.keyword替代字段，即使用name.keyword作为聚合field。
设置字段类型时，加上fielddata:true的配置。

单值聚合

http

POST /product/_search
{
  "aggs": {
    "avg_aggs": {
      "avg": {           // 聚合类型： 平均值
        "field": "price"
      }
    }
  },
  "size": 0
}

聚合类型：value_count（数量汇总）、avg（平均值）、sum（求和）、max（最大值）、min（最小值）、cardinality（基数）。
cardinality作用等同于SQL中的count(distinct)，但是它是一个近似算法，ES提供了precision_threshold来设置阈值，默认是100，字段基数如果在阈值以下，几乎100%是准确的，高于阈值会为了节省内存而牺牲精度。

多值聚合

percentiles

http

POST /product/_search
{
  "aggs": {
    "percent_aggs": {
      "percentiles": {
        "field": "price",
        "percents": [
          50
        ]
      }
    }
  },
  "size": 0
}
# 结果
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "percent_aggs" : {
      "values" : {
        "50.0" : 5999.0 // 符合条件的值中，50%的数据不超过5999.0
      }
    }
  }
}

percentiles理解成百分位，用来表示所有值中百分之多少比这个值低，这个百分比是查询时传递的参数。

percentile_ranks

http

POST /product/_search
{
  "aggs": {
    "percent_aggs": {
      "percentile_ranks": {
        "field": "price",
        "values": [
          4999
        ]
      }
    }
  },
  "size": 0
}

与percentiles完全相反，查询时传递值参数，计算百分比，即百分之多少的数据低于这个值

stats

http

POST /product/_search
{
  "aggs": {
    "stats_aggs": {
      "stats": {
        "field": "price"
      }
    }
  },
  "size": 0
}

stats是count、min、max、avg、sum指标的汇总。

extended_stats

http

{
  "aggs": {
    "stats_aggs": {
      "extended_stats": {
        "field": "price"
      }
    }
  },
  "size": 0
}

extended_stats是stats的扩展，添加了标准差和平方和。

terms

http

POST /product/_search
{
  "aggs": {
    "terms_aggs": {
      "terms": {
        "field": "price",
        "size": 10, // 分片数据汇总后返回的数量
        "shard_size": "10", // 每个分片最多返回的数量
        "min_doc_count": "10", // group by后最小的count数量，低于此数量的不展示
        "include": ".*", // 通配符过滤bucket数据
        "exclude": ".*", // 排除符合的bucket数据
        "order": {
          "_key": "desc",
          "_count": "asc"
        }
      }
    }
  },
  "size": 0
}

terms中的size用于控制聚合中符合条件的doc_count数量，即group by price limit size。
terms支持对结果排序，_key表示按照字母顺序，_count表示按照聚合后的统计数量进行排序。

range

http

POST /product/_search
{
  "aggs": {
    "range_aggs": {
      "range": {
        "field": "price",
        "ranges": [
          {
            "from": 4000,
            "to": 7000
          }
        ]
      }
    }
  }
}
# 时间range聚合
POST /product/_search
{
  "aggs": {
    "date_aggs": {
      "date_range": {
        "field": "time",
        "ranges": [
          {
            "from": "now-1y/y",
            "to": 20221101
          }
        ]
      }
    }
  }
}

range支持多个区间的桶聚合，返回的结果中将from-to组合成key返回，范围区间为左闭右开。
date_range中时间字段同样支持date表达式。

histogram

http

POST /product/_search
{
  "aggs": {
    "histogram_aggs": {
      "histogram": {
        "field": "price",
        "interval": 50, 
        "min_doc_count": 0
      }
    }
  }
}
{
 "aggregations" : {
    "histogram_aggs" : {
      "buckets" : [
        {
          "key" : 4950.0, // 表示 4950 -> (4950 +interval)这个区间有1条符合条件的文档
          "doc_count" : 1
        }
      ]
    }
  }
}
# date_histogram
POST /product/_search
{
  "aggs": {
    "histogram_aggs": {
      "date_histogram": {
        "field": "time",
        "calendar_interval": "1d", // 支持分钟 (1m)、小时 (1h)、天 (1d)、星期 (1w)、月 (1M)、季度 (1q)、年 (1y)
        "missing": "20220907", 
        "time_zone": "+08:00", 
        "min_doc_count": 1
      }
    }
  }
}

histogram又称为直方图聚合，interval用于指定区间的间隔，min_doc_count用于筛选符合数量的区间。

nested

http

POST /product/_search
{
  "aggs": {
    "first_aggs": {
      "terms": {
        "field": "category"
      },
      "aggs": {
        "second_aggs": {
          "terms": {
            "field": "price",
            "size": 10
          }
        }
      }
    }
  }
}

嵌套聚合，就是在第一个聚合的基础上再次聚合。

global

http

POST /product/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {}
        }
      ],
      "filter": [
        {
          "range": {
            "time": {
              "gte": 20221001,
              "lte": 20221020
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "first_aggs": {
      "global": {},
      "aggs": {
        "first_aggs": {
          "terms": {
            "field": "category"
          }
        }
      }
    }
  }
}

如果使用了global，那么搜索查询和聚合两者互不影响，即聚合的文档不由搜索决定。

聚合操作 ​

单值聚合 ​

多值聚合 ​

percentiles ​

percentile_ranks ​

stats ​

extended_stats ​

terms ​

range ​

histogram ​

nested ​

global ​

聚合操作

单值聚合

多值聚合

percentiles

percentile_ranks

stats

extended_stats

terms

range

histogram

nested

global