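OpenSearch (ES) Getting Started Tutorial - Vector Store
Introduction
OpenSearch is a search engine that AWS developed as a fork of Elasticsearch (ES). This article is a beginner tutorial for it. The examples are built with Spring Boot 2.5.3 and the opensearch-java 1.0 client.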
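1. First, one of the most important ES features, and one of the most popular right now: the vector store
Basic syntax of a vector query (embedding is the name of the vector field; keyword_embedding, k, and size are placeholders for the query vector, the number of nearest neighbors, and the number of hits to return)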
{
  "query": {
    "knn": {
      "embedding": {
        "vector": keyword_embedding,
        "k": k
      }
    }
  },
  "size": size
}
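To make the template above concrete, here is a minimal sketch that fills in the placeholders from Java using only the standard library. The class name KnnQuerySketch and the method knnQuery are made up for illustration; the field name embedding comes from the template, and the vector values are arbitrary.

```java
import java.util.StringJoiner;

// Hypothetical helper, not part of any OpenSearch client library.
final class KnnQuerySketch {

    // Builds the body of a knn query for one vector field.
    // The field name is inserted as-is, so it must be a plain identifier.
    static String knnQuery(String field, float[] vector, int k, int size) {
        StringJoiner values = new StringJoiner(",", "[", "]");
        for (float v : vector) {
            if (Float.isNaN(v) || Float.isInfinite(v)) {
                // NaN and Infinity cannot be written as JSON numbers
                throw new IllegalArgumentException("vector contains a non-finite value");
            }
            values.add(Float.toString(v));
        }
        return "{\"query\":{\"knn\":{\"" + field + "\":{\"vector\":" + values
                + ",\"k\":" + k + "}}},\"size\":" + size + "}";
    }

    public static void main(String[] args) {
        // Arbitrary 3-dimensional vector, k = 5, return at most 10 hits
        System.out.println(knnQuery("embedding", new float[] {0.5f, 0.25f, 1.0f}, 5, 10));
    }
}
```

The resulting string can be sent as the request body to the index's _search endpoint. In a real project a JSON library would be a safer way to build the body than string concatenation.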
Building the ES vector query in Java
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;
import org.apache.commons.lang3.StringUtils;
import org.springframework.util.StopWatch;

// opensearchUtils, OpenSearchCommonRes, and log are project members that this article does not show.

public Map<String, Double> getOpenSearchMap(String testStr) {
    if (StringUtils.isBlank(testStr)) {
        return null;
    }
    StopWatch stopWatch = new StopWatch();
    stopWatch.start();
    // Fetch the stored embedding for the input text
    String embeddingData = this.getEmbeddingData(testStr);
    if (StringUtils.isBlank(embeddingData)) {
        log.error("embeddingData empty, testStr:{}", testStr);
        return null;
    }
    // Build the OpenSearch query (exact k-NN through script_score)
    String index = "/opensearch_index_embedding/_search";
    String jsonStart = "{\n" +
            "  \"size\": 7,\n" +
            "  \"query\": {\n" +
            "    \"script_score\": {\n" +
            "      \"query\": {\n" +
            "        \"match_all\": {}\n" +
            "      },\n" +
            "      \"script\": {\n" +
            "        \"source\": \"knn_score\",\n" +
            "        \"lang\": \"knn\",\n" +
            "        \"params\": {\n" +
            "          \"field\": \"embedding_data\",\n" +
            "          \"space_type\": \"cosinesimil\",\n" +
            "          \"query_value\":";
    String jsonEnd = "}\n" +
            "      }\n" +
            "    }\n" +
            "  }\n" +
            "}";
    String json = jsonStart + embeddingData + jsonEnd;
    // Run the query
    OpenSearchCommonRes commonRes = opensearchUtils.queryByJson(index, "GET", json);
    stopWatch.stop();
    // The original logged an undefined variable (category); log the input instead
    log.info("getOpenSearchMap,param:{},cost:{}", testStr, stopWatch.getTotalTimeMillis());
    // Process the result: keep hits scoring above 1.9, keyed by category
    if (Objects.nonNull(commonRes) && commonRes.getHits().getTotal().getValue() > 0) {
        return commonRes.getHits().getHits().stream()
                .filter(p -> p.get_score() > 1.9d)
                .collect(Collectors.toMap(
                        p -> p.get_source().get("category").toString(),
                        p -> p.get_score(),
                        (p1, p2) -> p1));
    }
    return null;
}

// Fetch the embedding value stored for the given text
private String getEmbeddingData(String testStr) {
    if (StringUtils.isBlank(testStr)) {
        return null;
    }
    StopWatch stopWatch = new StopWatch();
    stopWatch.start();
    String index = "/opensearch_index_embedding/_search";
    String jsonStart = "{\n" +
            "  \"query\": {\n" +
            "    \"term\": {\n" +
            "      \"embedding_data\": {\n" +
            "        \"value\": \"";
    String jsonEnd = "\"\n" +
            "      }\n" +
            "    }\n" +
            "  },\n" +
            "  \"_source\": [\"embedding_data\"]\n" +
            "}";
    // Concatenate the query string. testStr is inserted as-is,
    // so it must not contain quotes or backslashes.
    String json = jsonStart + testStr + jsonEnd;
    // Run the query
    OpenSearchCommonRes commonRes = opensearchUtils.queryByJson(index, "POST", json);
    // Record the query time
    stopWatch.stop();
    log.info("getEmbeddingData,testStr:{},cost:{}", testStr, stopWatch.getTotalTimeMillis());
    if (commonRes == null || commonRes.getHits() == null
            || commonRes.getHits().getTotal() == null) {
        log.error("getEmbeddingData,testStr:{},result is null", testStr);
        return null;
    }
    // If there is exactly one hit, return its embedding_data field
    if (commonRes.getHits().getTotal().getValue() == 1) {
        return commonRes.getHits().getHits().stream()
                .map(p -> p.get_source().get("embedding_data").toString())
                .findFirst()
                .orElse(null);
    }
    return null;
}
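The 1.9 score filter in getOpenSearchMap is easier to read once the score scale is known. As far as I understand the k-NN plugin's script scoring, knn_score with the cosinesimil space returns 1 + cosine similarity, so scores fall between 0 and 2, and a threshold of 1.9 keeps only hits whose cosine similarity exceeds 0.9. The sketch below reproduces that mapping locally; the class and method names are made up for illustration, and the score formula is my reading of the plugin documentation rather than something the code above confirms.

```java
// Hypothetical local re-implementation, useful for sanity-checking thresholds.
final class CosineScoreSketch {

    // Plain cosine similarity between two vectors of equal length.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("cosine similarity is undefined for a zero vector");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Assumed score scale for knn_score with cosinesimil: 1 + cosine similarity.
    static double knnScore(double[] a, double[] b) {
        return 1.0 + cosine(a, b);
    }

    public static void main(String[] args) {
        double[] query = {1.0, 0.0};
        System.out.println(knnScore(query, new double[] {1.0, 0.0}));  // same direction
        System.out.println(knnScore(query, new double[] {0.0, 1.0}));  // orthogonal
        System.out.println(knnScore(query, new double[] {-1.0, 0.0})); // opposite direction
    }
}
```

If the threshold needs tuning, it is probably easier to reason in cosine similarity first and then add 1 to get the score cutoff.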
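2. Search algorithms in ES
- BM25
BM25 (Best Matching 25) is a ranking function based on the probabilistic retrieval framework. It estimates how relevant a document is to a query. It is an improved version of TF-IDF (term frequency-inverse document frequency) that also takes document length and term-frequency saturation into account, which addresses two weaknesses of plain TF-IDF: scores that keep growing as a term is repeated, and a bias toward long documents.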
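To show how document length and term frequency enter the score, here is a small sketch of the textbook BM25 formula over tokenized documents. It is not the engine's implementation: k1 = 1.2 and b = 0.75 are commonly used defaults, the IDF variant is the one with ln(1 + ...), and the class name and sample corpus are made up for illustration. Lucene-based engines may differ in constant factors.

```java
import java.util.Arrays;
import java.util.List;

// Textbook BM25 for illustration only.
final class Bm25Sketch {
    static final double K1 = 1.2;  // term-frequency saturation
    static final double B = 0.75;  // strength of length normalization

    // Inverse document frequency: rarer terms weigh more.
    static double idf(int docCount, int docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Contribution of one query term to one document's score.
    static double termScore(int termFreq, int docLen, double avgDocLen, int docCount, int docFreq) {
        if (termFreq == 0) {
            return 0.0;
        }
        double lengthNorm = K1 * (1.0 - B + B * docLen / avgDocLen);
        return idf(docCount, docFreq) * termFreq * (K1 + 1.0) / (termFreq + lengthNorm);
    }

    // Sum of term contributions for a query against one document of the corpus.
    static double score(List<String> query, List<String> doc, List<List<String>> corpus) {
        double avgDocLen = corpus.stream().mapToInt(List::size).average().orElse(1.0);
        double total = 0.0;
        for (String term : query) {
            int tf = (int) doc.stream().filter(term::equals).count();
            int df = (int) corpus.stream().filter(d -> d.contains(term)).count();
            total += termScore(tf, doc.size(), avgDocLen, corpus.size(), df);
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> corpus = Arrays.asList(
                Arrays.asList("open", "search", "vector"),
                Arrays.asList("open", "source"),
                Arrays.asList("vector", "vector", "index"));
        List<String> query = Arrays.asList("vector");
        for (List<String> doc : corpus) {
            System.out.println(doc + " -> " + score(query, doc, corpus));
        }
    }
}
```

Two properties are worth noticing: repeating a term raises the score, but by less each time (it can never exceed idf * (k1 + 1)), and for the same term frequency a shorter document scores higher than a longer one.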
3. A few basic queries in ES
Basic search
_search
Exact search
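As a rough illustration of the two query kinds named above, the sketch below builds request bodies for the _search endpoint: a match_all body for a basic search and a term body for an exact match. The class and method names are made up, and the field category.keyword is an assumption (the earlier code reads a category field, but whether it has a keyword subfield is not stated).

```java
// Hypothetical helper that only builds JSON strings; it sends nothing.
final class BasicQuerySketch {

    // Escapes a string so it can sit inside a JSON string literal.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Basic search: match every document, return at most `size` hits.
    static String matchAll(int size) {
        return "{\"query\":{\"match_all\":{}},\"size\":" + size + "}";
    }

    // Exact search: term query on a field that is not analyzed (for example a keyword field).
    static String term(String field, String value) {
        return "{\"query\":{\"term\":{\"" + escape(field) + "\":{\"value\":\""
                + escape(value) + "\"}}}}";
    }

    public static void main(String[] args) {
        System.out.println(matchAll(10));
        System.out.println(term("category.keyword", "books"));
    }
}
```

A term query compares the value against the indexed tokens without analysis, which is why exact matches are usually run against keyword fields rather than analyzed text fields.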
4. Data types in ES
nested type. The mapping below declares unitCard as a nested field; fields of this type are searched with a nested query.
{
  "unitCard": {
    "type": "nested",
    "properties": {
      "binNo": {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}
      },
      "color": {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}
      },
      "productName": {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}
      },
      "unitNo": {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}
      }
    }
  }
}
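For a mapping like the one above, a nested query names the nested path and wraps an ordinary query on the inner fields. The sketch below builds such a body for an exact match on unitCard.binNo.keyword; the class name, method name, and the value A-01 are made up for illustration.

```java
// Hypothetical helper that only builds the JSON body of a nested query.
final class NestedQuerySketch {

    // Minimal escaping: backslashes and double quotes only.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Nested query wrapping a term query on one inner field.
    static String nestedTerm(String path, String field, String value) {
        return "{\"query\":{\"nested\":{\"path\":\"" + escape(path)
                + "\",\"query\":{\"term\":{\"" + escape(field)
                + "\":{\"value\":\"" + escape(value) + "\"}}}}}}";
    }

    public static void main(String[] args) {
        // Inner fields are addressed with their full path, including the nested field name
        System.out.println(nestedTerm("unitCard", "unitCard.binNo.keyword", "A-01"));
    }
}
```

Each object in a nested array is matched on its own, so conditions on binNo and color inside the same nested query have to hold for the same unitCard entry, not just for any two entries of the document.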