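ES API: GET

The GET API is one of the most commonly used Elasticsearch operations. It is typically used to check whether a document exists, or to perform the read part of CRUD. Unlike search, GET is real-time: a document can be retrieved as soon as it has been indexed, whereas it only becomes searchable after a refresh. Making good use of these options allows Elasticsearch to be used more flexibly.
Reference: http://www.cnblogs.com/xing901022/p/5317698.html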

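Query example

The Get API retrieves a JSON document from Elasticsearch by its ID. Here is an example:

`curl -XGET 'http://localhost:9200/website/blog/123?pretty'`

The command above fetches the document with id 123 from the blog type of the website index. The response looks like this: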

{
    "_index": "website",
    "_type": "blog",
    "_id": "123",
    "_version": 1,
    "found": true,
    "_source": {
        "title": "My first blog entry",
        "text": "Just trying this out...",
        "date": "2014/01/01"
    }
}

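The response contains the document together with its metadata:

_index is the index name
_type is the type
_id is the document ID
_version is the version number
_source holds the original content of the document
found indicates whether the document was found

The API also accepts HEAD requests, which check whether a document with this ID exists without returning a body.

`curl -XHEAD -i 'http://localhost:9200/website/blog/123'`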

HTTP/1.1 200 OK
Content-Type: text/plain; charset=UTF-8
Content-Length: 0
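
As a rough sketch of how the HEAD check could be used in a script: the helper below maps an HTTP status code to a shell exit status, and a wrapper obtains that status with curl. The function names `status_means_found` and `doc_exists` are made up for illustration, and treating every non-200 status as "missing" is a simplifying assumption (a missing document answers with 404, but errors end up in the same bucket).

```shell
# Map an HTTP status code to "found" (exit 0) or "not found" (exit 1).
# Assumption: 200 means the document exists; anything else counts as missing.
status_means_found() {
  [ "$1" = "200" ]
}

# Hypothetical wrapper: send a HEAD request and keep only the status code.
# -s hides progress output, -o /dev/null discards the headers,
# -I sends HEAD, -w '%{http_code}' prints the numeric status.
doc_exists() {
  status_means_found "$(curl -s -o /dev/null -I -w '%{http_code}' "$1")"
}

# Usage (needs a running node):
#   doc_exists 'http://localhost:9200/website/blog/123' && echo "found"
```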

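Realtime

By default the get API is real-time and is not affected by the index refresh rate: a document can be fetched as soon as it has been indexed.
To turn real-time GET off for a request, set realtime=false.
It can also be disabled globally by setting action.get.realtime to false in the configuration file.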
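
The post gives no command for this option, so here is a minimal sketch of what such a request might look like. The helper name `es_get_url` and the `ES_HOST` variable are assumptions for illustration; the index, type and id are reused from the earlier example.

```shell
# Base address of the node. ES_HOST is an assumed variable that defaults to
# the host used in the earlier examples.
ES_HOST="${ES_HOST:-http://localhost:9200}"

# Print the GET URL for index $1, type $2, id $3, with optional query string $4.
es_get_url() {
  if [ -n "${4:-}" ]; then
    printf '%s/%s/%s/%s?%s\n' "$ES_HOST" "$1" "$2" "$3" "$4"
  else
    printf '%s/%s/%s/%s\n' "$ES_HOST" "$1" "$2" "$3"
  fi
}

# Real-time read (the default) versus a read with realtime switched off.
es_get_url website blog 123
es_get_url website blog 123 'realtime=false'

# To send the request to a running node:
#   curl -XGET "$(es_get_url website blog 123 'realtime=false')"
```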

Optional type

The _type part of the API is optional. To fetch a document regardless of its type, set the type to _all, which matches all types.
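
A small sketch of the idea: the hypothetical helper `es_doc_path` (not part of any Elasticsearch tooling) falls back to _all when no type is given.

```shell
# Print the document path for index $1 and id $2.
# The type ($3) is optional; when omitted, _all is used to match any type.
es_doc_path() {
  printf '/%s/%s/%s\n' "$1" "${3:-_all}" "$2"
}

es_doc_path website 123        # prints /website/_all/123
es_doc_path website 123 blog   # prints /website/blog/123

# To send the request to a running node:
#   curl -XGET "http://localhost:9200$(es_doc_path website 123)"
```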

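Source filtering

By default a get operation returns the _source field, unless the fields parameter is used or the _source field is disabled. Setting the _source parameter to false suppresses the source content (it comes back empty):

`curl -XGET 'http://localhost:9200/website/blog/123?_source=false'`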

{
    "_index": "website",
    "_type": "blog",
    "_id": "123",
    "_version": 1,
    "found": true,
    "_source": { }
}

To return only specific fields, filter with _source_include or _source_exclude. Each parameter accepts a comma-separated list of patterns, for example:

`curl -XGET 'http://localhost:9200/website/blog/123?_source_include=title,date'`
`curl -XGET 'http://localhost:9200/website/blog/123?_source_exclude=date'`
`curl -XGET 'http://localhost:9200/website/blog/123?_source_include=*&_source_exclude=date'`

With date excluded, as in the last two commands, the response is:

{
    "_index": "website",
    "_type": "blog",
    "_id": "123",
    "_version": 1,
    "found": true,
    "_source": {
        "text": "Just trying this out...",
        "title": "My first blog entry"
    }
}

Fields

The get operation accepts a fields parameter to return specific fields:

`curl -XGET 'http://localhost:9200/website/blog/123?fields=title,text'`

{
    "_index": "website",
    "_type": "blog",
    "_id": "123",
    "_version": 1,
    "found": true,
    "fields": {
        "title": [
            "My first blog entry"
        ],
        "text": [
            "Just trying this out..."
        ]
    }
}

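If the requested fields are not stored, they are parsed and extracted from _source; source filtering can be used to achieve the same result.
Field values fetched from the document itself are always returned as arrays, while metadata fields such as _routing and _parent are never returned as arrays.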

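Generated fields

If no refresh has happened since the document was indexed, GET reads the document from the translog. Some fields, however, are only generated at index time. Requesting such a generated field in this situation results in an error; set ignore_errors_on_generated_fields=true to ignore it.
About the translog: indexed data has to be persisted, but updating the Lucene structures for every single document would be far too expensive. Changes are therefore written to the translog first and flushed to the index later. A real-time GET simply reads the data in the translog that has not yet been persisted.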

Returning only _source

Use the /{index}/{type}/{id}/_source endpoint to get just the _source of a document, without the extra metadata that would waste network bandwidth.

`curl -XGET 'http://localhost:9200/website/blog/123/_source'`

{
    "title": "My first blog entry",
    "text": "Just trying this out...",
    "date": "2014/01/01"
}

Source filtering works here as well:

`curl -XGET 'http://localhost:9200/website/blog/123/_source?_source_include=title,text,date'`

{
    "date": "2014/01/01",
    "text": "Just trying this out...",
    "title": "My first blog entry"
}

HEAD is supported here too, to check whether the document exists:

`curl -XHEAD -i 'http://localhost:9200/website/blog/123/_source'`

HTTP/1.1 200 OK
Content-Type: text/plain; charset=UTF-8
Content-Length: 0

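Routing

If a routing value was specified when the document was indexed, the same routing value must be specified when fetching it.

`curl -XGET 'http://localhost:9200/XXX/XXX/XXX?routing=XXX'`

If the routing value is wrong, the document will not be found.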

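Preference

The preference parameter controls which shard copy the get request is executed on. By default the request is distributed randomly among the shard copies. It can be set to:

_primary: the operation is executed only on the primary shard.
_local: the operation is executed on a locally allocated shard when possible.
Custom (string) value: requests with the same custom value are always sent to the same shard copies. This avoids results that jump back and forth when copies are in different refresh states. A session id or a user name makes a suitable value.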
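
The post has no command for this parameter, so the following is only a sketch of how it might be passed. The helper name `es_get_with_preference` and the session id `session-42` are hypothetical, and the value is assumed to be URL-safe.

```shell
# Print a GET URL for document URL $1 with preference value $2.
# Assumption: $1 has no query string yet and $2 needs no URL encoding.
es_get_with_preference() {
  printf '%s?preference=%s\n' "$1" "$2"
}

DOC_URL='http://localhost:9200/website/blog/123'

# Read from the primary shard only.
es_get_with_preference "$DOC_URL" _primary

# Use the same custom value for every request of one session
# ("session-42" is a made-up session id).
es_get_with_preference "$DOC_URL" session-42

# To send the request to a running node:
#   curl -XGET "$(es_get_with_preference "$DOC_URL" session-42)"
```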

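Refresh

Setting the refresh parameter to true refreshes the relevant shard before the get operation, making the document searchable. Consider the performance impact before enabling it, because every refresh puts load on the system.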

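Distributed

The get operation is hashed to a specific shard id and then redirected to one of the copies of that shard. The primary shard and its replicas form a group, and any member of the group can serve the get request. This means that the more replicas there are, the better GET requests scale.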
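
To illustrate the idea of hashing a request to a shard, here is a rough sketch. It is not Elasticsearch's actual algorithm: `cksum` merely stands in for the real hash function, and the function name `shard_for` and the shard count of 5 are assumptions. The general scheme is shard = hash(routing) mod number_of_primary_shards, where the routing value defaults to the document id.

```shell
# Pick a shard number for routing value $1 among $2 primary shards.
# cksum (a CRC checksum) is only a stand-in for the real hash function.
shard_for() {
  crc=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
  echo $(( crc % $2 ))
}

# The same routing value always maps to the same shard, which is why a GET
# has to use the routing value that was used at index time.
shard_for 123 5
shard_for 123 5
```

Because the result depends on the number of primary shards, the same value can land on a different shard number when that count differs.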

https://leehoward.cn/2019/10/17/ES API 之 GET/

Author: lihao
Published: October 17, 2019
