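Introduction to Lucene Fields
In Apache Lucene, the Field class is the basic unit for storing data in a document. Different Field types hold different kinds of data (text, numbers, binary data, and so on).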


TextField
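About TextField
- Path: org.apache.lucene.document.TextField
- Purpose: holds text data that is tokenized and indexed; Field.Store.YES additionally stores the original value.
- Underlying storage: the Analyzer splits the text into terms. Each term is recorded in the inverted index, which maps it to the documents that contain it.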
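To make the term-to-document mapping concrete, here is a minimal JDK-only sketch of a toy inverted index. It is not Lucene's implementation: the class ToyInvertedIndex and its analyze method are hypothetical stand-ins, and the "analyzer" simply lowercases and splits on non-alphanumeric characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class ToyInvertedIndex { // hypothetical class, for illustration only

    // Stand-in for an Analyzer: lowercase, then split on anything that is not a letter or digit.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Maps each term to the IDs of the documents that contain it (its postings list).
    static Map<String, List<Integer>> build(List<String> docs) {
        Map<String, List<Integer>> index = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            for (String term : analyze(docs.get(docId))) {
                List<Integer> postings = index.computeIfAbsent(term, t -> new ArrayList<>());
                // Record each document at most once per term.
                if (postings.isEmpty() || postings.get(postings.size() - 1) != docId) {
                    postings.add(docId);
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> index = build(List.of("This is a sample text.", "Sample text."));
        System.out.println(index); // {a=[0], is=[0], sample=[0, 1], text=[0, 1], this=[0]}
    }
}
```

Looking up "sample" in this map returns both document IDs, which is why a query for that word can match both documents once the text has been tokenized.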
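TextField example
import cn.hutool.json.JSONObject; // assumption: JSONObject and JSONUtil are Hutool's JSON helpers
import cn.hutool.json.JSONUtil;
import java.io.IOException;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class TextFieldExample { // class name is illustrative
    static String path = "src/resources/index/app2";

    public static void main(String[] args) throws IOException, ParseException {
        Document doc = new Document();
        doc.add(new TextField("fieldName", "This is a sample text.", Field.Store.YES));
        Document doc1 = new Document();
        doc1.add(new TextField("fieldName", "Sample text.", Field.Store.YES));
        // Directory where the index is stored, the analyzer, and the writer
        try (FSDirectory directory = FSDirectory.open(Paths.get(path));
             Analyzer analyzer = new StandardAnalyzer();
             IndexWriter indexWriter = new IndexWriter(directory, new IndexWriterConfig(analyzer))) {
            // Index the documents
            indexWriter.addDocument(doc);
            indexWriter.addDocument(doc1);
            // Commit so the index is written to disk
            indexWriter.commit();
            // Query the index
            QueryParser queryParser = new QueryParser("fieldName", analyzer);
            Query query = queryParser.parse("sample");
            // The reader is closed automatically once the search is done
            try (IndexReader indexReader = DirectoryReader.open(directory)) {
                IndexSearcher indexSearcher = new IndexSearcher(indexReader);
                TopDocs topDocs_10 = indexSearcher.search(query, 10);
                ScoreDoc[] scoreDocArray = topDocs_10.scoreDocs;
                List<JSONObject> list = new ArrayList<>();
                for (ScoreDoc scoreDoc : scoreDocArray) {
                    JSONObject jsonDoc = new JSONObject();
                    int docId = scoreDoc.doc;
                    Document document = indexSearcher.doc(docId);
                    jsonDoc.set("fieldName", document.get("fieldName"));
                    jsonDoc.set("score", scoreDoc.score);
                    jsonDoc.set("shardIndex", scoreDoc.shardIndex);
                    list.add(jsonDoc);
                }
                System.out.println(JSONUtil.toJsonStr(list));
            }
            // Test deleting the indexed documents
            indexWriter.deleteAll();
        }
    }
}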
Results
Query: sample

Query: a sample text

StringField
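About StringField
- Path: org.apache.lucene.document.StringField
- Purpose: holds string data that should not be tokenized, such as a unique identifier (ID) or a cover image path.
- Underlying storage: the entire string is indexed as a single term in the inverted index; no tokenization takes place.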
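The practical consequence of "no tokenization" is that only the exact, complete value can match. The JDK-only sketch below contrasts the terms a tokenized field and a keyword-style field would produce for the same value. It is a toy model, not Lucene code: the class KeywordVersusText and both helper methods are hypothetical, and the tokenizer just lowercases and splits on whitespace.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class KeywordVersusText { // hypothetical class, for illustration only

    // Toy tokenized indexing: lowercase and split on whitespace.
    static List<String> tokenizedTerms(String value) {
        return Arrays.asList(value.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // Toy keyword indexing: the whole value is the only term, unchanged.
    static List<String> keywordTerms(String value) {
        return List.of(value);
    }

    public static void main(String[] args) {
        String value = "Unique Identifier 42";
        System.out.println(tokenizedTerms(value)); // [unique, identifier, 42]
        System.out.println(keywordTerms(value));   // [Unique Identifier 42]
        // A partial or differently cased value does not match the single keyword term.
        System.out.println(keywordTerms(value).contains("identifier")); // false
    }
}
```

This is why identifiers and file paths suit an untokenized field: a lookup either matches the full value or nothing.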
StringField example
import org.apache.lucene.document.Document;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.Field.Store;
Document doc = new Document();
doc.add(new StringField("fieldName", "unique_identifier", Store.YES));
doc.add(new StringField("cover_image", "abc.jpg", Store.YES));
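Numeric fields
The numeric field types are IntField, LongField, FloatField, and DoubleField.
About the numeric fields
- Paths:
org.apache.lucene.document.IntField
org.apache.lucene.document.LongField
org.apache.lucene.document.FloatField
org.apache.lucene.document.DoubleField
- Purpose: holds numeric data and supports range queries, for example an auto-incrementing int/long identifier (ID) or a date/time timestamp.
- Underlying storage: values are converted to byte arrays and stored in blocks, which makes range queries efficient.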
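The "converted to byte arrays" step matters because the bytes have to sort in the same order as the numbers. The JDK-only sketch below shows one common way to achieve that for an int: flip the sign bit and write the value big-endian. It illustrates the idea only; the class SortableIntBytes and its methods are hypothetical, and the block structure that Lucene builds on top of the encoded values is not modeled here.

```java
import java.util.Arrays;

public class SortableIntBytes { // hypothetical class, for illustration only

    // Flip the sign bit and write big-endian, so unsigned byte order equals numeric order.
    static byte[] encode(int value) {
        int flipped = value ^ 0x80000000;
        return new byte[] {
            (byte) (flipped >>> 24),
            (byte) (flipped >>> 16),
            (byte) (flipped >>> 8),
            (byte) flipped
        };
    }

    // A range check done purely on the encoded bytes, with both bounds inclusive.
    static boolean inRange(int value, int lower, int upper) {
        byte[] v = encode(value);
        return Arrays.compareUnsigned(encode(lower), v) <= 0
            && Arrays.compareUnsigned(v, encode(upper)) <= 0;
    }

    public static void main(String[] args) {
        for (int age : new int[] {8, 18, 6}) {
            System.out.println(age + " in [8, 30]: " + inRange(age, 8, 30));
        }
        // Negative values still sort before positive ones after encoding.
        System.out.println(Arrays.compareUnsigned(encode(-1), encode(1)) < 0); // true
    }
}
```

With the ages 8, 18, and 6, the check accepts 8 and 18 and rejects 6 for the range 8 to 30. Arrays.compareUnsigned requires Java 9 or later.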
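Numeric field example
import java.io.IOException;
import java.nio.file.Paths;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class NumericFieldExample { // class name is illustrative
    static String path = "src/resources/index/app1";

    public static void main(String[] args) throws IOException {
        // Directory where the index is stored, the analyzer, and the writer
        try (FSDirectory directory = FSDirectory.open(Paths.get(path));
             Analyzer analyzer = new StandardAnalyzer();
             IndexWriter indexWriter = new IndexWriter(directory, new IndexWriterConfig(analyzer))) {
            // Index the documents
            addDocument(indexWriter, "Cb Fc Zhang3", 8);
            addDocument(indexWriter, "Cb Fb Lie", 18);
            addDocument(indexWriter, "Cb Fb Wang", 6);
            // Commit so the index is written to disk
            indexWriter.commit();
            // Range query on the numeric field
            Query query = LongField.newRangeQuery("age", 8, 30);
            // The reader is closed automatically once the search is done
            try (IndexReader indexReader = DirectoryReader.open(directory)) {
                IndexSearcher indexSearcher = new IndexSearcher(indexReader);
                TopDocs topDocs_10 = indexSearcher.search(query, 10);
                for (ScoreDoc scoreDoc : topDocs_10.scoreDocs) {
                    Document document = indexSearcher.doc(scoreDoc.doc);
                    System.out.printf("score: %s , name: %s ,age: %s%n", scoreDoc.score, document.get("name"), document.get("age"));
                }
            }
            // Test deleting the indexed documents
            indexWriter.deleteAll();
        }
    }

    public static void addDocument(IndexWriter indexWriter, String name, Integer age) throws IOException {
        Document doc = new Document();
        doc.add(new TextField("name", name, Field.Store.YES));
        // LongField indexes the value for range queries; Field.Store.YES also stores it
        doc.add(new LongField("age", age, Field.Store.YES));
        indexWriter.addDocument(doc);
    }
}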
Results

Combined with sorting


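Note: when the same field needs both range queries and sorting, it is advisable to add it to the indexed document under two field types: a numeric or point type for range queries, plus SortedDocValuesField or NumericDocValuesField for sorting.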
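Point fields
The point field types are IntPoint, LongPoint, FloatPoint, DoublePoint, and BigIntegerPoint (the last one ships in Lucene's sandbox module rather than in the core document package).
About the point fields
- Paths:
org.apache.lucene.document.IntPoint
org.apache.lucene.document.LongPoint
org.apache.lucene.document.FloatPoint
org.apache.lucene.document.DoublePoint
- Purpose: holds numeric data and supports range queries. Only the query index is built; the original value is not stored, so pair the point field with a StoredField to keep it.
- Underlying storage: values are converted to byte arrays and stored in blocks, which makes range queries efficient.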
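Point field example
import java.io.IOException;
import java.nio.file.Paths;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.IntPoint;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class PointFieldExample { // class name is illustrative
    static String path = "src/resources/index/app1";

    public static void main(String[] args) throws IOException {
        // Directory where the index is stored, the analyzer, and the writer
        try (FSDirectory directory = FSDirectory.open(Paths.get(path));
             Analyzer analyzer = new StandardAnalyzer();
             IndexWriter indexWriter = new IndexWriter(directory, new IndexWriterConfig(analyzer))) {
            // Index the documents
            addDocument(indexWriter, "Cb Fc Zhang3", 8);
            addDocument(indexWriter, "Cb Fb Lie", 18);
            addDocument(indexWriter, "Cb Fb Wang", 6);
            // Commit so the index is written to disk
            indexWriter.commit();
            // Range query on the point field
            Query query = IntPoint.newRangeQuery("age", 8, 30);
            // The reader is closed automatically once the search is done
            try (IndexReader indexReader = DirectoryReader.open(directory)) {
                IndexSearcher indexSearcher = new IndexSearcher(indexReader);
                TopDocs topDocs_10 = indexSearcher.search(query, 10);
                for (ScoreDoc scoreDoc : topDocs_10.scoreDocs) {
                    Document document = indexSearcher.doc(scoreDoc.doc);
                    System.out.printf("score: %s , name: %s ,age: %s%n", scoreDoc.score, document.get("name"), document.get("age"));
                }
            }
            // Test deleting the indexed documents
            indexWriter.deleteAll();
        }
    }

    public static void addDocument(IndexWriter indexWriter, String name, Integer age) throws IOException {
        Document doc = new Document();
        doc.add(new TextField("name", name, Field.Store.YES));
        // IntPoint makes the value searchable; StoredField keeps the original value for retrieval
        doc.add(new IntPoint("age", age));
        doc.add(new StoredField("age", age));
        indexWriter.addDocument(doc);
    }
}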
Example results

As the output shows, only values in the 8 to 30 range were returned, and both bounds are inclusive.
StoredField
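About StoredField
- Path: org.apache.lucene.document.StoredField
- Purpose: holds data that does not need to be indexed and is only returned with search results (for example, the original value that accompanies a numeric index field).
- Underlying storage: the data is kept as raw bytes in the stored fields and is not indexed.
Example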
import org.apache.lucene.document.Document;
import org.apache.lucene.document.StoredField;
Document doc = new Document();
doc.add(new StoredField("fieldName", "This is the stored content."));
BinaryField
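About BinaryField
- Path: org.apache.lucene.document.StoredField (binary values are stored through StoredField, as in the example below; lucene-core does not appear to provide a separate BinaryField class)
- Purpose: holds binary data.
- Underlying storage: the binary data is kept as raw bytes in the stored fields and is not indexed.
Example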
import org.apache.lucene.document.Document;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.util.BytesRef;
Document doc = new Document();
byte[] byteArray = new byte[] {1, 2, 3, 4, 5};
doc.add(new StoredField("fieldName", new BytesRef(byteArray)));
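Sorting and scoring fields
These are SortedDocValuesField and NumericDocValuesField.
Overview
- Paths:
org.apache.lucene.document.SortedDocValuesField
org.apache.lucene.document.NumericDocValuesField
- Purpose: holds the field values needed for sorting and scoring. Note that the original value is not stored by default; pair the field with a StoredField to keep it.
- Underlying storage: the data is kept in a compact format in doc values, which supports efficient sorting and score computation.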
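Example
import java.io.IOException;
import java.nio.file.Paths;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class SortExample { // class name is illustrative
    static String path = "src/resources/index/app1";

    public static void main(String[] args) throws IOException, ParseException {
        // Directory where the index is stored, the analyzer, and the writer
        try (FSDirectory directory = FSDirectory.open(Paths.get(path));
             Analyzer analyzer = new StandardAnalyzer();
             IndexWriter indexWriter = new IndexWriter(directory, new IndexWriterConfig(analyzer))) {
            // Index the documents
            addDocument(indexWriter, "Cb Fc Zhang3", 8);
            addDocument(indexWriter, "Cb Fb Lie", 18);
            addDocument(indexWriter, "Cb Fb Wang", 6);
            // Commit so the index is written to disk
            indexWriter.commit();
            // Query the index
            Query query = new QueryParser("name", analyzer).parse("Cb");
            // The reader is closed automatically once the search is done
            try (IndexReader indexReader = DirectoryReader.open(directory)) {
                IndexSearcher indexSearcher = new IndexSearcher(indexReader);
                // Unsorted variant: TopDocs topDocs_10 = indexSearcher.search(query, 10);
                // Sort by score first, then by age in descending order (reverse = true)
                Sort sort = new Sort(SortField.FIELD_SCORE, new SortField("age", SortField.Type.LONG, true));
                TopDocs topDocs_10 = indexSearcher.search(query, 10, sort);
                for (ScoreDoc scoreDoc : topDocs_10.scoreDocs) {
                    Document document = indexSearcher.doc(scoreDoc.doc);
                    System.out.printf("score: %s , name: %s ,age: %s%n", scoreDoc.score, document.get("name"), document.get("age"));
                }
            }
            // Test deleting the indexed documents
            indexWriter.deleteAll();
        }
    }

    public static void addDocument(IndexWriter indexWriter, String name, Integer age) throws IOException {
        Document doc = new Document();
        doc.add(new TextField("name", name, Field.Store.YES));
        // NumericDocValuesField supplies the sort value; StoredField keeps the original for retrieval
        doc.add(new NumericDocValuesField("age", age));
        doc.add(new StoredField("age", age));
        indexWriter.addDocument(doc);
    }
}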
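Results
Plain query, unsorted

Query + sort (descending)

reverse: a closer look at the sort field
public SortField(String field, Type type, boolean reverse) {
    this.initFieldType(field, type);
    this.reverse = reverse;
}
The constructor takes a parameter named reverse.
It defaults to false; see the figure below:

Meaning of the reverse values:
- true: equivalent to SQL DESC (descending)
- false: equivalent to SQL ASC (ascending)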
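The effect of the flag can be pictured with a JDK-only sketch that sorts document IDs by a per-document column of values, roughly the role doc values play during sorting. This is a toy model rather than Lucene's sort implementation; the class ReverseSortSketch and the sortDocs method are hypothetical, and the ages reuse the 8, 18, 6 values from the example above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ReverseSortSketch { // hypothetical class, for illustration only

    // Sorts document IDs by a per-document long column, looked up by docId.
    static List<Integer> sortDocs(long[] ageByDocId, boolean reverse) {
        List<Integer> docIds = new ArrayList<>();
        for (int docId = 0; docId < ageByDocId.length; docId++) {
            docIds.add(docId);
        }
        Comparator<Integer> byAge = Comparator.comparingLong(docId -> ageByDocId[docId]);
        // reverse = true flips the natural ascending order, like DESC in SQL.
        docIds.sort(reverse ? byAge.reversed() : byAge);
        return docIds;
    }

    public static void main(String[] args) {
        long[] ages = {8, 18, 6}; // values for docId 0, 1, 2
        System.out.println(sortDocs(ages, false)); // [2, 0, 1]
        System.out.println(sortDocs(ages, true));  // [1, 0, 2]
    }
}
```

With reverse set to true, the document with age 18 comes first, followed by 8 and then 6.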
Score - NaN
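Sharp-eyed readers may have noticed that Score became NaN after sorting. With a custom Sort, the score is no longer set on the ScoreDoc itself; it moves into the fields array of each hit (the returned entries are FieldDoc instances). The values in that array follow the order of the SortFields passed to Sort, so with SortField.FIELD_SCORE listed first, fields[0] holds the score.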

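Using the same name for fields of different types
Important: apart from StoredField, do not give fields of different types the same field name; doing so causes problems.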
Reference:

https://www.syntaxspace.com/article/250428134000069.html