Apache Lucene Field / IndexableField: Types and Usage Explained

2025-04-30 14:37:41

Introduction to Lucene Field

In Apache Lucene, the Field class is the basic unit for holding data in a document. Different Field types hold different kinds of data (text, numbers, binary data, and so on).

[Figure: IndexableField]

[Figure: Field]

TextField

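About TextField

Path: org.apache.lucene.document.TextField
Purpose: tokenizes and indexes text for full-text search; the original value is also stored when Field.Store.YES is passed.
Underlying storage: the text is processed by an analyzer (Analyzer), which splits it into terms. Each term is recorded in the inverted index, which maps the term to the documents that contain it.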
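
To make the term-to-document mapping concrete, here is a minimal plain-Java sketch of an inverted index. It is not how Lucene stores data on disk: the class name InvertedIndexSketch and the tokenize helper are made up for illustration, and the helper is only a rough stand-in for a real Analyzer (it lowercases and splits on non-alphanumeric characters).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy inverted index: term -> sorted set of document ids. Not Lucene's on-disk format. */
final class InvertedIndexSketch {
    private final Map<String, TreeSet<Integer>> postings = new TreeMap<>();
    private int nextDocId = 0;

    /** Stand-in for an Analyzer: lowercase, then split on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    /** Adds a document and returns the id assigned to it. */
    int addDocument(String text) {
        int docId = nextDocId++;
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    /** Returns the ids of the documents containing the analyzed term, in ascending order. */
    List<Integer> search(String term) {
        List<String> analyzed = tokenize(term);
        if (analyzed.isEmpty()) {
            return List.of();
        }
        return new ArrayList<>(postings.getOrDefault(analyzed.get(0), new TreeSet<>()));
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.addDocument("This is a sample text.");
        index.addDocument("Sample text.");
        System.out.println(index.search("sample")); // both documents contain the term
        System.out.println(index.search("this"));   // only the first document does
    }
}
```

Because the query term goes through the same tokenize step as the indexed text, "Sample" and "sample" end up as the same term, which is why a query for "sample" can match both documents.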

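TextField example

import cn.hutool.json.JSONObject; // assumption: JSONObject and JSONUtil come from Hutool
import cn.hutool.json.JSONUtil;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

import java.io.IOException;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class TextFieldExample { // class name is illustrative
    static String path = "src/resources/index/app2";

    public static void main(String[] args) throws IOException, ParseException {
        Document doc = new Document();
        doc.add(new TextField("fieldName", "This is a sample text.", Field.Store.YES));
        Document doc1 = new Document();
        doc1.add(new TextField("fieldName", "Sample text.", Field.Store.YES));

        // Resources: index directory, analyzer, index writer
        try (FSDirectory directory = FSDirectory.open(Paths.get(path));
             Analyzer analyzer = new StandardAnalyzer();
             IndexWriter indexWriter = new IndexWriter(directory, new IndexWriterConfig(analyzer))) {
            // Index the documents
            indexWriter.addDocument(doc);
            indexWriter.addDocument(doc1);
            // Commit so the changes are written to the index files
            indexWriter.commit();

            // Search the index
            QueryParser queryParser = new QueryParser("fieldName", analyzer);
            Query query = queryParser.parse("sample");

            // Open a reader on the committed index and close it when done
            try (IndexReader indexReader = DirectoryReader.open(directory)) {
                IndexSearcher indexSearcher = new IndexSearcher(indexReader);
                TopDocs topDocs_10 = indexSearcher.search(query, 10);
                ScoreDoc[] scoreDocArray = topDocs_10.scoreDocs;

                List<JSONObject> list = new ArrayList<>();
                for (ScoreDoc scoreDoc : scoreDocArray) {
                    JSONObject jsonDoc = new JSONObject();

                    int docId = scoreDoc.doc;
                    Document document = indexSearcher.doc(docId);
                    jsonDoc.set("fieldName", document.get("fieldName"));
                    jsonDoc.set("score", scoreDoc.score);
                    jsonDoc.set("shardIndex", scoreDoc.shardIndex);

                    list.add(jsonDoc);
                }
                System.out.println(JSONUtil.toJsonStr(list));
            }

            // Delete all documents so the example starts from an empty index next time
            indexWriter.deleteAll();
        }
    }
}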

Output

Query: sample

[Figure: search results for the query "sample"]

Query: a sample text

[Figure: search results for the query "a sample text"]

StringField

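About StringField

Path: org.apache.lucene.document.StringField
Purpose: indexes string values that must not be tokenized, such as unique identifiers (IDs) or cover image paths.
Underlying storage: the whole string is indexed as a single term in the inverted index; no tokenization takes place.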
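
The practical consequence of indexing the whole value as one term is that only an exact match finds the document. The following plain-Java sketch models that behavior; KeywordIndexSketch is a hypothetical class written for this article, not part of Lucene.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy keyword index: the whole field value is the only term. Not Lucene's implementation. */
final class KeywordIndexSketch {
    private final Map<String, List<Integer>> postings = new HashMap<>();
    private int nextDocId = 0;

    /** Indexes the value verbatim (no lowercasing, no splitting) and returns the document id. */
    int addDocument(String value) {
        int docId = nextDocId++;
        postings.computeIfAbsent(value, k -> new ArrayList<>()).add(docId);
        return docId;
    }

    /** Finds documents whose value equals the argument exactly. */
    List<Integer> lookup(String value) {
        return postings.getOrDefault(value, List.of());
    }

    public static void main(String[] args) {
        KeywordIndexSketch index = new KeywordIndexSketch();
        index.addDocument("unique_identifier");
        index.addDocument("abc.jpg");
        System.out.println(index.lookup("abc.jpg")); // exact value: found
        System.out.println(index.lookup("abc"));     // partial value: not found
    }
}
```

In real Lucene code the equivalent lookup would typically be a TermQuery on the exact value, since running the query text through an analyzer could change it.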

StringField example

import org.apache.lucene.document.Document;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.Field.Store;
 
Document doc = new Document();
doc.add(new StringField("fieldName", "unique_identifier", Store.YES));
doc.add(new StringField("cover_image", "abc.jpg", Store.YES));

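Numeric fields

The numeric field types are IntField, LongField, FloatField, and DoubleField.

About the numeric types

Path:
- org.apache.lucene.document.IntField
- org.apache.lucene.document.LongField
- org.apache.lucene.document.FloatField
- org.apache.lucene.document.DoubleField
Purpose: index numeric values and support range queries, for example auto-incrementing int/long identifiers (IDs) or date/time timestamps.
Underlying storage: each value is converted to a byte array and stored in blocks, which makes range queries efficient.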
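
Why convert numbers to byte arrays at all? If the encoding preserves order, a range check can be done by comparing raw bytes. The sketch below shows one such encoding for int (flip the sign bit, write big-endian). It is an independent illustration with a made-up class name; Lucene has its own conversion utilities in NumericUtils, and this code does not call them.

```java
import java.util.Arrays;

/** Sketch of an order-preserving int -> byte[] encoding (sign bit flipped, big-endian). */
final class SortableBytesSketch {
    /** Encodes so that unsigned byte-wise comparison matches numeric order. */
    static byte[] encode(int value) {
        int flipped = value ^ 0x80000000; // flipping the sign bit makes negatives sort first
        return new byte[] {
            (byte) (flipped >>> 24), (byte) (flipped >>> 16), (byte) (flipped >>> 8), (byte) flipped
        };
    }

    /** Reverses encode. */
    static int decode(byte[] bytes) {
        int flipped = ((bytes[0] & 0xFF) << 24) | ((bytes[1] & 0xFF) << 16)
                | ((bytes[2] & 0xFF) << 8) | (bytes[3] & 0xFF);
        return flipped ^ 0x80000000;
    }

    /** True if lower <= value <= upper, decided purely on the encoded bytes. */
    static boolean inRange(byte[] value, byte[] lower, byte[] upper) {
        return Arrays.compareUnsigned(lower, value) <= 0
                && Arrays.compareUnsigned(value, upper) <= 0;
    }

    public static void main(String[] args) {
        byte[] lower = encode(8);
        byte[] upper = encode(30);
        for (int age : new int[] {8, 18, 6}) {
            System.out.println(age + " in [8, 30]: " + inRange(encode(age), lower, upper));
        }
    }
}
```

With values encoded this way, an index can group nearby values into blocks and skip whole blocks whose byte range falls outside the query range.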

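Numeric example

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

import java.io.IOException;
import java.nio.file.Paths;

public class NumericFieldExample { // class name is illustrative
    static String path = "src/resources/index/app1";

    public static void main(String[] args) throws IOException {
        // Resources: index directory, analyzer, index writer
        try (FSDirectory directory = FSDirectory.open(Paths.get(path));
             Analyzer analyzer = new StandardAnalyzer();
             IndexWriter indexWriter = new IndexWriter(directory, new IndexWriterConfig(analyzer))) {
            // Index the documents
            addDocument(indexWriter, "Cb Fc Zhang3", 8);
            addDocument(indexWriter, "Cb Fb Lie", 18);
            addDocument(indexWriter, "Cb Fb Wang", 6);
            // Commit so the changes are written to the index files
            indexWriter.commit();

            // Range query on the numeric field (both bounds inclusive)
            Query query = LongField.newRangeQuery("age", 8, 30);

            // Open a reader on the committed index and close it when done
            try (IndexReader indexReader = DirectoryReader.open(directory)) {
                IndexSearcher indexSearcher = new IndexSearcher(indexReader);
                TopDocs topDocs_10 = indexSearcher.search(query, 10);
                ScoreDoc[] scoreDocArray = topDocs_10.scoreDocs;

                for (ScoreDoc scoreDoc : scoreDocArray) {
                    int docId = scoreDoc.doc;
                    float score = scoreDoc.score;
                    Document document = indexSearcher.doc(docId);
                    System.out.printf("score: %s , name: %s ,age: %s%n", score, document.get("name"), document.get("age"));
                }
            }

            // Delete all documents so the example starts from an empty index next time
            indexWriter.deleteAll();
        }
    }

    public static void addDocument(IndexWriter indexWriter, String name, int age) throws IOException {
        Document doc = new Document();
        doc.add(new TextField("name", name, Field.Store.YES));
        // Indexed for range queries; Field.Store.YES also keeps the original value
        doc.add(new LongField("age", age, Field.Store.YES));
        indexWriter.addDocument(doc);
    }
}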

Output

[Figure: results of the numeric range query]

Combining with sorting

[Figure: combined range query and sorting, part 1]

[Figure: combined range query and sorting, part 2]

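Note: when the same field needs both range queries and sorting, index it under two field types: a numeric or point type for range queries, and SortedDocValuesField or NumericDocValuesField for sorting. (According to their Javadoc, the IntField/LongField/FloatField/DoubleField classes reintroduced in Lucene 9.x already write doc values alongside the point index, so the extra doc-values field matters mainly when you use the point types.)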

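Point types

The point types are IntPoint, LongPoint, FloatPoint, DoublePoint, and BigIntegerPoint (the last one ships in the lucene-sandbox module, so it is not listed under the paths below).

About the point types

Path:
- org.apache.lucene.document.IntPoint
- org.apache.lucene.document.LongPoint
- org.apache.lucene.document.FloatPoint
- org.apache.lucene.document.DoublePoint
Purpose: index numeric values and support range queries. A point field only builds the search index and does not store the original value; pair it with a StoredField if the value must be returned.
Underlying storage: each value is converted to a byte array and stored in blocks (a BKD tree), which makes range queries efficient.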

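Point example

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.IntPoint;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

import java.io.IOException;
import java.nio.file.Paths;

public class PointFieldExample { // class name is illustrative
    static String path = "src/resources/index/app1";

    public static void main(String[] args) throws IOException {
        // Resources: index directory, analyzer, index writer
        try (FSDirectory directory = FSDirectory.open(Paths.get(path));
             Analyzer analyzer = new StandardAnalyzer();
             IndexWriter indexWriter = new IndexWriter(directory, new IndexWriterConfig(analyzer))) {
            // Index the documents
            addDocument(indexWriter, "Cb Fc Zhang3", 8);
            addDocument(indexWriter, "Cb Fb Lie", 18);
            addDocument(indexWriter, "Cb Fb Wang", 6);
            // Commit so the changes are written to the index files
            indexWriter.commit();

            // Point range query (both bounds inclusive)
            Query query = IntPoint.newRangeQuery("age", 8, 30);

            // Open a reader on the committed index and close it when done
            try (IndexReader indexReader = DirectoryReader.open(directory)) {
                IndexSearcher indexSearcher = new IndexSearcher(indexReader);
                TopDocs topDocs_10 = indexSearcher.search(query, 10);
                ScoreDoc[] scoreDocArray = topDocs_10.scoreDocs;

                for (ScoreDoc scoreDoc : scoreDocArray) {
                    int docId = scoreDoc.doc;
                    float score = scoreDoc.score;
                    Document document = indexSearcher.doc(docId);
                    System.out.printf("score: %s , name: %s ,age: %s%n", score, document.get("name"), document.get("age"));
                }
            }

            // Delete all documents so the example starts from an empty index next time
            indexWriter.deleteAll();
        }
    }

    public static void addDocument(IndexWriter indexWriter, String name, int age) throws IOException {
        Document doc = new Document();
        doc.add(new TextField("name", name, Field.Store.YES));
        // IntPoint makes the value searchable; StoredField keeps the original value for retrieval
        doc.add(new IntPoint("age", age));
        doc.add(new StoredField("age", age));
        indexWriter.addDocument(doc);
    }
}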

Example output

[Figure: results of the IntPoint range query]

Only the documents whose age falls within 8 to 30 are returned, and both bounds are inclusive.
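
Since the range is inclusive on both ends, an exclusive bound has to be shifted by one before it is passed to the query. A minimal sketch of that adjustment follows; ExclusiveBoundsSketch and toInclusive are hypothetical helper names, and Math.addExact is used so that an overflow at Integer.MAX_VALUE or Integer.MIN_VALUE fails loudly instead of wrapping around.

```java
/** Converts exclusive int bounds to the inclusive bounds that a closed range query expects. */
final class ExclusiveBoundsSketch {
    /** Returns {lowerInclusive, upperInclusive}; throws ArithmeticException on overflow. */
    static int[] toInclusive(int lowerExclusive, int upperExclusive) {
        return new int[] {
            Math.addExact(lowerExclusive, 1),
            Math.addExact(upperExclusive, -1)
        };
    }

    /** Closed-interval check, the same semantics as the range query in the example above. */
    static boolean matches(int value, int lowerInclusive, int upperInclusive) {
        return lowerInclusive <= value && value <= upperInclusive;
    }

    public static void main(String[] args) {
        int[] bounds = toInclusive(8, 30); // "greater than 8 and less than 30"
        for (int age : new int[] {8, 18, 6}) {
            System.out.println(age + " matches: " + matches(age, bounds[0], bounds[1]));
        }
    }
}
```

The adjusted bounds would then be passed on as IntPoint.newRangeQuery("age", bounds[0], bounds[1]); with the sample data only the document with age 18 would remain.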

StoredField

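About StoredField

Path: org.apache.lucene.document.StoredField
Purpose: stores data that does not need to be indexed and is only returned with search results (for example, paired with a numeric index field to keep the original value).
Underlying storage: the data is written as raw bytes to the stored fields and is not indexed.

Example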

import org.apache.lucene.document.Document;
import org.apache.lucene.document.StoredField;
 
Document doc = new Document();
doc.add(new StoredField("fieldName", "This is the stored content."));

BinaryField

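About BinaryField

Path: org.apache.lucene.document.StoredField (Lucene core does not appear to ship a class named org.apache.lucene.document.BinaryField; binary values are stored through StoredField, as the example shows)
Purpose: stores binary data.
Underlying storage: the bytes are written as-is to the stored fields and are not indexed.

Example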

import org.apache.lucene.document.Document;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.util.BytesRef;
 
Document doc = new Document();
byte[] byteArray = new byte[] {1, 2, 3, 4, 5};
doc.add(new StoredField("fieldName", new BytesRef(byteArray)));

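Sorting and scoring types

These are SortedDocValuesField and NumericDocValuesField.

Overview

Path:
- org.apache.lucene.document.SortedDocValuesField
- org.apache.lucene.document.NumericDocValuesField
Purpose: hold the per-document values needed for sorting and scoring. Note that doc values are not stored fields, so the original value does not come back with search results; add a StoredField if it must be returned.
Underlying storage: values are kept in a compact, column-oriented format (doc values), which makes sorting and scoring efficient.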
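
One way to picture doc values is as a column indexed by document id: sorting then only needs to read that one column, not the stored documents. The sketch below models this in plain Java with a long[] array. It is a simplified mental model with hypothetical names, not Lucene's actual doc-values format.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy doc-values column: values[docId] holds the field value of that document. */
final class DocValuesSortSketch {
    /** Returns all doc ids ordered by their column value; descending when reverse is true. */
    static List<Integer> sortDocIds(long[] values, boolean reverse) {
        List<Integer> docIds = new ArrayList<>();
        for (int docId = 0; docId < values.length; docId++) {
            docIds.add(docId);
        }
        Comparator<Integer> byValue = Comparator.comparingLong(docId -> values[docId]);
        docIds.sort(reverse ? byValue.reversed() : byValue);
        return docIds;
    }

    public static void main(String[] args) {
        long[] ages = {8, 18, 6}; // doc 0, doc 1, doc 2
        System.out.println(sortDocIds(ages, false)); // ascending by age
        System.out.println(sortDocIds(ages, true));  // descending by age
    }
}
```

The column holds only what is needed for comparison, which is also why the original value has to be added separately as a StoredField if it should be returned with the hit.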

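Example

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

import java.io.IOException;
import java.nio.file.Paths;

public class DocValuesSortExample { // class name is illustrative
    static String path = "src/resources/index/app1";

    public static void main(String[] args) throws IOException, ParseException {
        // Resources: index directory, analyzer, index writer
        try (FSDirectory directory = FSDirectory.open(Paths.get(path));
             Analyzer analyzer = new StandardAnalyzer();
             IndexWriter indexWriter = new IndexWriter(directory, new IndexWriterConfig(analyzer))) {
            // Index the documents
            addDocument(indexWriter, "Cb Fc Zhang3", 8);
            addDocument(indexWriter, "Cb Fb Lie", 18);
            addDocument(indexWriter, "Cb Fb Wang", 6);
            // Commit so the changes are written to the index files
            indexWriter.commit();

            // Search the index
            Query query = new QueryParser("name", analyzer).parse("Cb");

            // Open a reader on the committed index and close it when done
            try (IndexReader indexReader = DirectoryReader.open(directory)) {
                IndexSearcher indexSearcher = new IndexSearcher(indexReader);

                // Unsorted variant:
                // TopDocs topDocs_10 = indexSearcher.search(query, 10);

                // Sort by score first, then by age in descending order (reverse = true)
                Sort sort = new Sort(SortField.FIELD_SCORE, new SortField("age", SortField.Type.LONG, true));
                TopDocs topDocs_10 = indexSearcher.search(query, 10, sort);

                ScoreDoc[] scoreDocArray = topDocs_10.scoreDocs;

                for (ScoreDoc scoreDoc : scoreDocArray) {
                    int docId = scoreDoc.doc;
                    Document document = indexSearcher.doc(docId);
                    System.out.printf("score: %s , name: %s ,age: %s%n", scoreDoc.score, document.get("name"), document.get("age"));
                }
            }

            // Delete all documents so the example starts from an empty index next time
            indexWriter.deleteAll();
        }
    }

    public static void addDocument(IndexWriter indexWriter, String name, int age) throws IOException {
        Document doc = new Document();
        doc.add(new TextField("name", name, Field.Store.YES));
        // Doc values are used for sorting; StoredField keeps the original value for retrieval
        doc.add(new NumericDocValuesField("age", age));
        doc.add(new StoredField("age", age));
        indexWriter.addDocument(doc);
    }
}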

Output

Plain query, unsorted

[Figure: results of the plain query without sorting]

Query with sorting (descending)

[Figure: results of the query sorted by age in descending order]

The `reverse` sort parameter

    public SortField(String field, Type type, boolean reverse) {
        this.initFieldType(field, type);
        this.reverse = reverse;
    }

The constructor takes a reverse parameter, whose default value is false (see the figure below):

[Figure: default value of reverse in SortField]

Values of reverse:

true - descending order, like SQL DESC
false - ascending order, like SQL ASC
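
The sort used in the example combines two keys: relevance score first (higher scores first, which is what SortField.FIELD_SCORE does), then age with reverse = true. The plain-Java sketch below models that ordering with chained comparators; the Hit class and the score values are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Plain-Java model of sorting by score first, then by age. Not Lucene code. */
final class MultiKeySortSketch {
    /** Hypothetical search hit used only in this sketch. */
    static final class Hit {
        final String name;
        final float score;
        final long age;

        Hit(String name, float score, long age) {
            this.name = name;
            this.score = score;
            this.age = age;
        }
    }

    /** Higher scores first; ties are broken by age, descending when reverseAge is true. */
    static List<String> sortedNames(List<Hit> hits, boolean reverseAge) {
        Comparator<Hit> byScoreAsc = Comparator.comparingDouble(hit -> hit.score);
        Comparator<Hit> byAgeAsc = Comparator.comparingLong(hit -> hit.age);
        Comparator<Hit> byAge = reverseAge ? byAgeAsc.reversed() : byAgeAsc;

        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(byScoreAsc.reversed().thenComparing(byAge));

        List<String> names = new ArrayList<>();
        for (Hit hit : sorted) {
            names.add(hit.name);
        }
        return names;
    }

    public static void main(String[] args) {
        // Equal scores are assumed here, so the age key alone decides the order
        List<Hit> hits = List.of(
                new Hit("Cb Fc Zhang3", 1.0f, 8),
                new Hit("Cb Fb Lie", 1.0f, 18),
                new Hit("Cb Fb Wang", 1.0f, 6));
        System.out.println(sortedNames(hits, true));  // age descending
        System.out.println(sortedNames(hits, false)); // age ascending
    }
}
```

The second key only matters when the first one ties, so a reverse flag on age changes the order only among hits with the same score.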

Score - NaN

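Sharp-eyed readers may have noticed that the score turned into NaN once sorting was applied. With a custom Sort, the score is no longer written to ScoreDoc.score; each hit is a FieldDoc, and the score sits in its fields array, whose entries follow the order of the SortField list. Here SortField.FIELD_SCORE comes first, so the score is ((FieldDoc) scoreDoc).fields[0].

[Figure: score held in the fields array of the hit]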

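Using the same name for fields of different types

Important:

Apart from StoredField, do not give fields of the different types above the same field name; doing so causes problems.

See:

[Figure: problem caused by sharing one field name across different field types]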


https://www.syntaxspace.com/article/250428134000069.html
