Itheima (黑马程序员) Technical Exchange Community

Title: [Shijiazhuang Campus] Elasticsearch Full-Text Search Engine: Getting Started Tutorial (2)

Author: htb52110 Time: 2018-3-15 13:35
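4. Chinese Word Segmentation Settings
First, install a Chinese word segmentation plugin. This tutorial uses ik; other plugins, such as smartcn, are also worth considering.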
$ ./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.5.1/elasticsearch-analysis-ik-5.5.1.zip
The command above installs version 5.5.1 of the plugin, for use with Elastic 5.5.1.
Next, restart Elastic; it loads the newly installed plugin automatically on startup.
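To confirm that the plugin was actually picked up after the restart, one option is to list the node's plugins through the `_cat/plugins` endpoint and look for the ik entry. This is only a sketch: it assumes the plugin registers under the name `analysis-ik` and that Elastic listens on localhost:9200, and the helper name `has_ik_plugin` is made up for this example.

```shell
# Hypothetical helper: reads the text output of `_cat/plugins` on stdin
# and succeeds only if some line mentions the analysis-ik plugin
# (the plugin name is an assumption based on the zip file name).
has_ik_plugin() {
  grep -q 'analysis-ik'
}

# Usage against a running node (assumed to listen on localhost:9200):
#   curl -s 'localhost:9200/_cat/plugins' | has_ik_plugin && echo "ik loaded"
```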
Then create an Index and specify which fields need word segmentation. This step depends on your data structure, and the command below applies only to this tutorial. As a rule, every Chinese field you want to search needs its own analyzer setting.
$ curl -X PUT 'localhost:9200/accounts' -d '
{
  "mappings": {
    "person": {
      "properties": {
        "user": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
        },
        "title": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
        },
        "desc": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
        }
      }
    }
  }
}'
The code above creates an Index named accounts containing a Type named person. The person Type has three fields.

user
title
desc

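All three fields hold Chinese and are of type text, so each needs a Chinese analyzer rather than the default analyzer, which is geared toward English.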
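Because the three fields share identical settings, the request body can also be generated instead of typed by hand. The following is a minimal sketch, not part of the original tutorial: the helper names `ik_field` and `ik_mapping` are invented here, and the field names are assumed to contain no characters that need JSON escaping.

```shell
# Hypothetical helper: print the mapping fragment for one Chinese text field.
ik_field() {
  printf '"%s": { "type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_max_word" }' "$1"
}

# Hypothetical helper: build the whole request body for the given field names,
# using the same Type name (person) as the tutorial.
ik_mapping() {
  body='{ "mappings": { "person": { "properties": { '
  sep=''
  for name in "$@"; do
    body="$body$sep$(ik_field "$name")"
    sep=', '
  done
  printf '%s } } } }\n' "$body"
}

# Usage against a running node (assumed to listen on localhost:9200):
#   curl -X PUT 'localhost:9200/accounts' -d "$(ik_mapping user title desc)"
```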
Elastic calls its word segmenter an analyzer. We assign an analyzer to each field.
"user":{"type":"text","analyzer":"ik_max_word","search_analyzer":"ik_max_word"}
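In the code above, analyzer is the analyzer applied to the field's text at index time, and search_analyzer is the analyzer applied to the search terms. The ik_max_word analyzer is provided by the ik plugin and splits text at the finest granularity, producing the maximum number of terms.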
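To see what an analyzer does with a piece of text, the `_analyze` endpoint can be called with the analyzer name and a sample string. The sketch below only builds the request body; the helper name `analyze_body` is made up for this example, and it assumes the text contains no double quotes or backslashes, since no JSON escaping is done.

```shell
# Hypothetical helper: build the JSON body for an _analyze request.
# $1 = analyzer name, $2 = sample text (assumed free of quotes/backslashes).
analyze_body() {
  printf '{ "analyzer": "%s", "text": "%s" }\n' "$1" "$2"
}

# Usage against a running node (assumed to listen on localhost:9200):
#   curl -X POST 'localhost:9200/_analyze?pretty=true' \
#        -d "$(analyze_body ik_max_word '数据库管理')"
```

Running the same text through ik_smart, the plugin's coarser-grained analyzer, should make the difference in the number of returned tokens visible.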
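5. Data Operations
5.1 Adding a Record
Send a PUT request to /Index/Type/Id to add a record to the Index. For example, a request to /accounts/person/1 adds a person record.
$ curl -X PUT 'localhost:9200/accounts/person/1' -d '{ "user": "张三", "title": "工程师", "desc": "数据库管理"}'
The JSON object returned by the server includes the Index, Type, Id, Version, and other information.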
{"_index":"accounts","_type":"person","_id":"1","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"created":true}
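Look closely and you will see that the request path is /accounts/person/1; the trailing 1 is the record's Id. It does not have to be a number: any string (such as abc) works.
You can also add a record without specifying an Id, in which case the request must be a POST.
$ curl -X POST 'localhost:9200/accounts/person' -d '{ "user": "李四", "title": "工程师", "desc": "系统管理"}'
The code above sends a POST request to /accounts/person to add a record. This time, the _id field in the returned JSON object is a random string.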
{"_index":"accounts","_type":"person","_id":"AV3qGfrC6jMbsbXb6k1p","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"created":true}
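Since the generated Id is only known from the response, a script usually has to pull it out before it can read the record back. Here is a rough sketch using sed; the helper name `extract_id` is invented for this example, and a JSON-aware tool would be more robust than this pattern match.

```shell
# Hypothetical helper: reads an Elastic response on stdin and prints the
# value of its "_id" field. Works on compact or pretty-printed output,
# assuming the Id itself contains no double quotes.
extract_id() {
  sed -n 's/.*"_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage against a running node (assumed to listen on localhost:9200):
#   id="$(curl -s -X POST 'localhost:9200/accounts/person' \
#         -d '{ "user": "李四", "title": "工程师", "desc": "系统管理"}' | extract_id)"
#   curl "localhost:9200/accounts/person/$id?pretty=true"
```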
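Note that if the Index (accounts in this example) has not been created beforehand, running the command above does not produce an error; Elastic simply creates the specified Index. So type carefully and do not misspell the Index name.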
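One way to guard against a mistyped name is to check that the Index exists before writing to it. A HEAD request on the Index answers with status 200 if it exists and 404 if it does not. The sketch below assumes Elastic listens on localhost:9200; the helper names `index_status` and `check_index_status` are made up for this example.

```shell
# Hypothetical helper: send a HEAD request for the Index named in $1
# and print only the HTTP status code.
index_status() {
  curl -s -o /dev/null -w '%{http_code}' -I "localhost:9200/$1"
}

# Hypothetical helper: interpret a status code.
# 200 means the Index exists; 404 means a write would silently create it.
check_index_status() {
  case "$1" in
    200) echo "exists" ;;
    404) echo "missing" ;;
    *)   echo "unknown" ;;
  esac
}

# Usage against a running node:
#   check_index_status "$(index_status accounts)"
```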
5.2 Viewing a Record
Send a GET request to /Index/Type/Id to view the record.
$ curl 'localhost:9200/accounts/person/1?pretty=true'
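The command above requests the record /accounts/person/1; the URL parameter pretty=true asks for the response in a readable format.
In the returned data, the found field indicates whether the lookup succeeded, and the _source field holds the original record.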
{"_index":"accounts","_type":"person","_id":"1","_version":1,"found":true,"_source":{"user":"张三","title":"工程师","desc":"数据库管理"}}
If the Id is wrong, no data is found and the found field is false.
$ curl 'localhost:9200/accounts/person/abc?pretty=true'
{"_index":"accounts","_type":"person","_id":"abc","found":false}
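In a script, the found field can be tested directly to decide whether a record exists. The following is a minimal sketch based on pattern matching; the helper name `is_found` is invented for this example, and it tolerates the extra whitespace that pretty=true adds around the colon.

```shell
# Hypothetical helper: reads a GET response on stdin and succeeds
# only if it contains "found": true (with or without spaces).
is_found() {
  grep -Eq '"found"[[:space:]]*:[[:space:]]*true'
}

# Usage against a running node (assumed to listen on localhost:9200):
#   if curl -s 'localhost:9200/accounts/person/1' | is_found; then
#     echo "record exists"
#   else
#     echo "no such record"
#   fi
```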

Author: Yin灬Yan Time: 2018-3-15 14:12
Let me claim a spot in this thread.

Welcome to the Itheima (黑马程序员) Technical Exchange Community (http://bbs.itheima.com/)
