
Lucene 2.2.0 Source Code Reading (28)


This installment looks at IndexSearcher, the core class of Lucene's search implementation.

IndexSearcher is the central implementation class of Lucene search. It extends the abstract class Searcher, which provides implementations of several core search methods. Searcher in turn implements the Searchable interface, which acts as the abstract network protocol for search: based on this protocol, an index directory on a remote server can be accessed. This is evident from the fact that Searchable extends the java.rmi.Remote interface.

The JDK's Javadoc for java.rmi.Remote explains its role: it identifies interfaces whose methods may be invoked from a non-local virtual machine.

In other words, an interface that extends java.rmi.Remote has the following characteristics:

1. A remote interface identifies interfaces whose methods may be invoked from a non-local virtual machine;

2. any interface that extends java.rmi.Remote is thereby remotely accessible;

3. a class that implements such a sub-interface can operate on remote objects.
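A small self-contained sketch (not Lucene code; the interface name below is invented for illustration) shows the two JDK facts that make this design work: Remote is an empty marker interface, and RemoteException is a subclass of IOException, which is why Searchable's methods can declare only IOException yet still be invoked remotely.

```java
// Why Searchable can extend java.rmi.Remote while declaring only IOException:
// Remote is an empty marker interface, and RemoteException IS-A IOException.
import java.io.IOException;
import java.rmi.Remote;
import java.rmi.RemoteException;

public class RemoteMarkerDemo {

  // A hypothetical remote-capable interface in the style of Searchable.
  interface RemoteSearchable extends Remote {
    // Declaring IOException is enough: an RMI stub may throw
    // RemoteException, which is a subclass of IOException.
    int docFreq(String term) throws IOException;
  }

  public static void main(String[] args) {
    // Remote declares no methods at all -- it is purely a marker.
    System.out.println("Remote methods: " + Remote.class.getMethods().length);
    // RemoteException can be caught as an IOException.
    System.out.println("RemoteException IS-A IOException: "
        + IOException.class.isAssignableFrom(RemoteException.class));
  }
}
```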


Before studying the implementation classes, here is an overview of the search-related interfaces and abstract classes:

The Searchable interface

The Searchable interface is defined as follows:

package org.apache.lucene.search;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.FieldSelector;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.CorruptIndexException;

import java.io.IOException;

public interface Searchable extends java.rmi.Remote {

  /* The core search method, parameterized by a Weight and a Filter. Because the
     return type is void, the matching Documents are delivered to the HitCollector,
     which gathers every document whose score is greater than 0. */
  void search(Weight weight, Filter filter, HitCollector results)
      throws IOException;

  // Releases the resources associated with this searcher
  void close() throws IOException;

  // Returns the number of Documents containing the given term
  int docFreq(Term term) throws IOException;

  // Returns an array holding the document frequency of each term in the given array
  int[] docFreqs(Term[] terms) throws IOException;

  // Returns one greater than the largest possible Document number
  int maxDoc() throws IOException;

  // Search method returning the top n scoring Documents as a TopDocs
  TopDocs search(Weight weight, Filter filter, int n) throws IOException;

  /* Returns the Document with the internal number i. For example, running
     System.out.println(searcher.doc(24)); in the earlier test program prints:
     Document<stored/uncompressed,indexed<path:E:\Lucene\txt1\mytxt\FAQ.txt> stored/uncompressed,indexed<modified:200604130754>> */
  Document doc(int i) throws CorruptIndexException, IOException;

  /* Returns the Document at position n; FieldSelector acts like a file filter,
     with a single method FieldSelectorResult accept(String fieldName); */
  Document doc(int n, FieldSelector fieldSelector) throws CorruptIndexException, IOException;

  // Rewrites the given Query into a primitive form
  Query rewrite(Query query) throws IOException;

  // Returns an Explanation describing how the given doc was scored
  Explanation explain(Weight weight, int doc) throws IOException;

  // Applies the given sort and returns the top n Documents as a TopFieldDocs
  TopFieldDocs search(Weight weight, Filter filter, int n, Sort sort)
      throws IOException;

}

The Searcher abstract class

package org.apache.lucene.search;

import java.io.IOException;

import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.Term;
import org.apache.lucene.document.Document;

// An abstract base class that implements the Searchable interface
public abstract class Searcher implements Searchable {

  // Finds the Documents matching the given Query; the returned Hits object is quite rich
  public final Hits search(Query query) throws IOException {
    return search(query, (Filter)null);
    // delegates to the search() method below
  }

  public Hits search(Query query, Filter filter) throws IOException {
    return new Hits(this, query, filter);
  }

  // With a Sort specified
  public Hits search(Query query, Sort sort)
      throws IOException {
    return new Hits(this, query, null, sort);
  }

  // With both a Filter and a Sort specified
  public Hits search(Query query, Filter filter, Sort sort)
      throws IOException {
    return new Hits(this, query, filter, sort);
  }

  // Implements the Searchable method: applies the given sort and returns the top n Documents
  public TopFieldDocs search(Query query, Filter filter, int n,
                             Sort sort) throws IOException {
    return search(createWeight(query), filter, n, sort);
    // delegates to the abstract TopFieldDocs search(Weight, Filter, int, Sort) below
  }

  public void search(Query query, HitCollector results)
      throws IOException {
    search(query, (Filter)null, results);
  }

  public void search(Query query, Filter filter, HitCollector results)
      throws IOException {
    search(createWeight(query), filter, results);
  }

  public TopDocs search(Query query, Filter filter, int n)
      throws IOException {
    return search(createWeight(query), filter, n);
  }

  public Explanation explain(Query query, int doc) throws IOException {
    return explain(createWeight(query), doc);
  }

  // The Similarity implementation used by this Searcher
  private Similarity similarity = Similarity.getDefault();

  // Sets the Similarity for this Searcher
  public void setSimilarity(Similarity similarity) {
    this.similarity = similarity;
  }

  public Similarity getSimilarity() {
    return this.similarity;
  }

  // Creates a Weight that records the searcher-dependent state of the given Query
  protected Weight createWeight(Query query) throws IOException {
    return query.weight(this);
  }

  // Implements the Searchable method
  public int[] docFreqs(Term[] terms) throws IOException {
    int[] result = new int[terms.length];
    for (int i = 0; i < terms.length; i++) {
      result[i] = docFreq(terms[i]);
    }
    return result;
  }

  // Abstract methods, already listed in the Searchable interface
  abstract public void search(Weight weight, Filter filter, HitCollector results) throws IOException;
  abstract public void close() throws IOException;
  abstract public int docFreq(Term term) throws IOException;
  abstract public int maxDoc() throws IOException;
  abstract public TopDocs search(Weight weight, Filter filter, int n) throws IOException;
  abstract public Document doc(int i) throws CorruptIndexException, IOException;
  abstract public Query rewrite(Query query) throws IOException;
  abstract public Explanation explain(Weight weight, int doc) throws IOException;
  abstract public TopFieldDocs search(Weight weight, Filter filter, int n, Sort sort) throws IOException;
}
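Searcher is a textbook template-method base class: every convenience overload funnels into a small set of abstract primitives. The toy sketch below (names invented, not Lucene's) mirrors the docFreqs() method exactly — the base class implements the batch method once, in terms of the abstract docFreq() that subclasses supply.

```java
// A toy illustration of the pattern Searcher uses: the base class implements
// docFreqs() once, delegating to the abstract docFreq() primitive.
import java.util.Arrays;
import java.util.Map;

public class TemplateSearcherDemo {

  static abstract class ToySearcher {
    // Mirrors Searcher.docFreqs(): one loop over the abstract primitive.
    public int[] docFreqs(String[] terms) {
      int[] result = new int[terms.length];
      for (int i = 0; i < terms.length; i++) {
        result[i] = docFreq(terms[i]);
      }
      return result;
    }
    // The primitive a concrete searcher must implement.
    public abstract int docFreq(String term);
  }

  // A trivial subclass backed by a precomputed term -> document-count map.
  static class MapSearcher extends ToySearcher {
    private final Map<String, Integer> freqs;
    MapSearcher(Map<String, Integer> freqs) { this.freqs = freqs; }
    public int docFreq(String term) { return freqs.getOrDefault(term, 0); }
  }

  public static void main(String[] args) {
    ToySearcher s = new MapSearcher(Map.of("lucene", 3, "search", 1));
    System.out.println(Arrays.toString(
        s.docFreqs(new String[]{"lucene", "search", "missing"})));  // [3, 1, 0]
  }
}
```

The same shape explains why IndexSearcher only needs to implement the handful of abstract Weight-based methods: all the Query-based overloads above are inherited for free.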

The Weight interface

A Weight exists so that searching does not modify a Query: the same Query instance can be reused across searches without being re-created.

A Query instance is meant to be independent of any particular IndexSearcher; the searcher-dependent state of a Query is what gets recorded in a Weight.

The source code of the Weight interface is as follows:

package org.apache.lucene.search;

import java.io.IOException;

import org.apache.lucene.index.IndexReader;

public interface Weight extends java.io.Serializable {

  // Returns the Query that this Weight was created for
  Query getQuery();

  // Returns the weight value for this query
  float getValue();

  /** The sum of squared weights of contained query clauses. */
  float sumOfSquaredWeights() throws IOException;

  // Assigns the query normalization factor to this Weight
  void normalize(float norm);

  // Constructs a Scorer for this Weight (the Scorer produces document scores)
  Scorer scorer(IndexReader reader) throws IOException;

  // Returns an Explanation of the score computed for the document numbered doc
  Explanation explain(IndexReader reader, int doc) throws IOException;
}
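The interface implies a lifecycle: the weight reports sumOfSquaredWeights(), the searcher derives queryNorm = 1/√sumOfSquaredWeights (per the Similarity formulas discussed later), and normalize() folds that factor back into the weight before scoring. The toy weight below is a sketch of that flow, not Lucene's TermWeight; the idf and boost inputs are made-up illustration values.

```java
// A toy walk-through of the Weight lifecycle implied by the interface:
// sumOfSquaredWeights() -> queryNorm = 1/sqrt(sum) -> normalize() -> getValue().
public class WeightLifecycleDemo {

  static class ToyWeight {
    private final float idf;    // assumed idf of the single query term
    private final float boost;  // query-time boost
    private float value;        // final normalized weight

    ToyWeight(float idf, float boost) { this.idf = idf; this.boost = boost; }

    // Square of this clause's raw weight.
    float sumOfSquaredWeights() {
      float w = idf * boost;
      return w * w;
    }

    // Folds the searcher-computed normalization factor into the weight.
    void normalize(float queryNorm) {
      value = idf * boost * queryNorm;
    }

    float getValue() { return value; }
  }

  public static void main(String[] args) {
    ToyWeight w = new ToyWeight(2.0f, 1.0f);          // idf=2, boost=1
    float sum = w.sumOfSquaredWeights();              // 4.0
    float queryNorm = (float)(1.0 / Math.sqrt(sum));  // 0.5
    w.normalize(queryNorm);
    System.out.println(w.getValue());                 // 2 * 1 * 0.5 = 1.0
  }
}
```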

The HitCollector abstract class

package org.apache.lucene.search;

// The abstract base class for collecting the Documents found during a search
public abstract class HitCollector {
  // Called once for each matching Document, with its number and score;
  // implementations decide which documents to keep
  public abstract void collect(int doc, float score);
}
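A collector is just a callback: the search loop pushes (doc, score) pairs at it, one per match. A minimal sketch (class names invented for illustration, not Lucene's) of a subclass that keeps only positive-scoring documents:

```java
import java.util.ArrayList;
import java.util.List;

// A minimal HitCollector-style sketch: the search loop pushes (doc, score)
// pairs; the subclass decides what to keep.
public class CollectorDemo {

  static abstract class ToyHitCollector {
    public abstract void collect(int doc, float score);
  }

  static class PositiveScoreCollector extends ToyHitCollector {
    final List<Integer> docs = new ArrayList<>();
    public void collect(int doc, float score) {
      if (score > 0.0f) docs.add(doc);  // keep only positive-scoring docs
    }
  }

  public static void main(String[] args) {
    PositiveScoreCollector c = new PositiveScoreCollector();
    // Simulated search loop feeding the collector.
    c.collect(0, 0.8f);
    c.collect(1, 0.0f);   // filtered out
    c.collect(2, 0.3f);
    System.out.println(c.docs);  // [0, 2]
  }
}
```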

The Scorer abstract class

package org.apache.lucene.search;

import java.io.IOException;

// Manages the scoring of the Documents that match a Query
public abstract class Scorer {

  private Similarity similarity;

  // Constructs a Scorer.
  protected Scorer(Similarity similarity) {
    this.similarity = similarity;
  }

  public Similarity getSimilarity() {
    return this.similarity;
  }

  // Iterates over all matching Documents, passing each one to the collector
  public void score(HitCollector hc) throws IOException {
    while (next()) {
      hc.collect(doc(), score());
    }
  }

  // Collects matching Documents in a bounded range (document number < max)
  protected boolean score(HitCollector hc, int max) throws IOException {
    while (doc() < max) {
      hc.collect(doc(), score());
      if (!next())
        return false;
    }
    return true;
  }

  /** Advances to the next document matching the query. */
  public abstract boolean next() throws IOException;

  // Returns the number of the current Document
  public abstract int doc();

  // Returns the score of the current matching Document
  public abstract float score() throws IOException;

  /** Skips to the first match beyond the current one whose document number is
   * greater than or equal to a given target.
   * <br>When this method is used the {@link #explain(int)} method should not be used.
   * @param target The target document number.
   * @return true iff there is such a match.
   * <p>Behaves as if written: <pre>
   *   boolean skipTo(int target) {
   *     do {
   *       if (!next())
   *         return false;
   *     } while (target > doc());
   *     return true;
   *   }
   * </pre>Most implementations are considerably more efficient than that.
   */
  public abstract boolean skipTo(int target) throws IOException;

  public abstract Explanation explain(int doc) throws IOException;

}
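The skipTo() contract is easiest to see on a concrete iterator. The toy scorer below (not Lucene code; it exists only to demonstrate the iteration contract) backs the match stream with a sorted array of document numbers and implements skipTo() exactly as the Javadoc's reference loop describes:

```java
// A toy Scorer over a sorted array of document numbers, implementing
// skipTo() as the documented reference loop.
public class ScorerSkipToDemo {

  static class ArrayScorer {
    private final int[] docs;   // matching doc numbers, ascending
    private int pos = -1;       // positioned before the first match

    ArrayScorer(int[] docs) { this.docs = docs; }

    boolean next() { return ++pos < docs.length; }
    int doc() { return docs[pos]; }

    // The documented default behaviour of Scorer.skipTo(target).
    boolean skipTo(int target) {
      do {
        if (!next()) return false;
      } while (target > doc());
      return true;
    }
  }

  public static void main(String[] args) {
    ArrayScorer s = new ArrayScorer(new int[]{2, 5, 9, 12});
    System.out.println(s.skipTo(6));   // true
    System.out.println(s.doc());       // 9, the first match >= 6
    System.out.println(s.skipTo(20));  // false, the iterator is exhausted
  }
}
```

Real scorers honor the same contract but skip through index structures far more efficiently than this linear loop.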

The Similarity abstract class


The class-level Javadoc is the best overview of this class; it reads as follows:

org.apache.lucene.search.Similarity

Expert: Scoring API.

Subclasses implement search scoring.

The score of query q for document d correlates to the cosine-distance or dot-product between document and query vectors in a Vector Space Model (VSM) of Information Retrieval . A document whose vector is closer to the query vector in that model is scored higher. The score is computed as follows:

score(q,d) = coord(q,d) · queryNorm(q) · Σ (t in q) [ tf(t in d) · idf(t)² · t.getBoost() · norm(t,d) ]

where

  1. tf(t in d) correlates to the term's frequency, defined as the number of times term t appears in the currently scored document d. Documents that have more occurrences of a given term receive a higher score. The default computation for tf(t in d) in DefaultSimilarity is:

    tf(t in d) = √frequency
  2. idf(t) stands for Inverse Document Frequency. This value correlates to the inverse of docFreq (the number of documents in which the term t appears). This means rarer terms give a higher contribution to the total score. The default computation for idf(t) in DefaultSimilarity is:

    idf(t) = 1 + log( numDocs / (docFreq + 1) )
  3. coord(q,d) is a score factor based on how many of the query terms are found in the specified document. Typically, a document that contains more of the query's terms will receive a higher score than another document with fewer query terms. This is a search time factor computed in coord(q,d) by the Similarity in effect at search time.
  4. queryNorm(q) is a normalizing factor used to make scores between queries comparable. This factor does not affect document ranking (since all ranked documents are multiplied by the same factor), but rather just attempts to make scores from different queries (or even different indexes) comparable. This is a search time factor computed by the Similarity in effect at search time. The default computation in DefaultSimilarity is:

    queryNorm(q) = queryNorm(sumOfSquaredWeights) = 1 / √sumOfSquaredWeights


    The sum of squared weights (of the query terms) is computed by the query org.apache.lucene.search.Weight object. For example, a boolean query computes this value as:

    sumOfSquaredWeights = q.getBoost()² · Σ (t in q) ( idf(t) · t.getBoost() )²
  5. t.getBoost() is a search time boost of term t in the query q as specified in the query text (see query syntax ), or as set by application calls to setBoost(). Notice that there is really no direct API for accessing a boost of one term in a multi term query, but rather multi terms are represented in a query as multi TermQuery objects, and so the boost of a term in the query is accessible by calling the sub-query getBoost().
  6. norm(t,d) encapsulates a few (indexing time) boost and length factors:
    • Document boost - set by calling doc.setBoost() before adding the document to the index.
    • Field boost - set by calling field.setBoost() before adding the field to a document.
    • lengthNorm (field) - computed when the document is added to the index in accordance with the number of tokens of this field in the document, so that shorter fields contribute more to the score. LengthNorm is computed by the Similarity class in effect at indexing.

    When a document is added to the index, all the above factors are multiplied. If the document has multiple fields with the same name, all their boosts are multiplied together:

    norm(t,d) = doc.getBoost() · lengthNorm(field) · Π (field f in d named as t) f.getBoost()


    However the resulting norm value is encoded as a single byte before being stored. At search time, the norm byte value is read from the index directory and decoded back to a float norm value. This encoding/decoding, while reducing index size, comes at the price of precision loss - it is not guaranteed that decode(encode(x)) = x. For instance, decode(encode(0.89)) = 0.75. Also notice that search time is too late to modify this norm part of scoring, e.g. by using a different Similarity for search.
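The factors above can be combined in a small worked example. The code below is not Lucene's; it just evaluates the default formulas quoted from the Javadoc (tf = √freq, idf = 1 + ln(numDocs/(docFreq+1)), queryNorm = 1/√sumOfSquaredWeights, coord = overlap/maxOverlap) for a made-up one-term query, with boosts and norms set to 1:

```java
// A worked example of the scoring formula, using the default
// DefaultSimilarity computations quoted in the Javadoc above.
public class ScoreFormulaDemo {

  static float tf(float freq) { return (float)Math.sqrt(freq); }

  static float idf(int docFreq, int numDocs) {
    return (float)(Math.log(numDocs / (double)(docFreq + 1)) + 1.0);
  }

  static float queryNorm(float sumOfSquaredWeights) {
    return (float)(1.0 / Math.sqrt(sumOfSquaredWeights));
  }

  static float coord(int overlap, int maxOverlap) {
    return overlap / (float)maxOverlap;
  }

  public static void main(String[] args) {
    // One-term query over a 1000-document index; the term occurs in
    // 9 documents, and 4 times in the document being scored.
    int numDocs = 1000, docFreq = 9, freq = 4;
    float boost = 1.0f, norm = 1.0f;          // no boosts, trivial field norm

    float idf = idf(docFreq, numDocs);        // 1 + ln(100), about 5.605
    float sumSq = idf * idf * boost * boost;  // (idf * boost)^2 for one term
    float score = coord(1, 1) * queryNorm(sumSq)
                * tf(freq) * idf * idf * boost * norm;
    // For a single-term query, queryNorm cancels one idf: score = tf * idf
    System.out.println(score);
  }
}
```

Note how for a one-term query queryNorm exactly cancels one factor of idf, leaving the familiar tf·idf product; with multiple terms it merely makes scores from different queries comparable.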

See Also:
setDefault(Similarity)
org.apache.lucene.index.IndexWriter.setSimilarity(Similarity)
Searcher.setSimilarity(Similarity)


The source code of the abstract class is as follows:

package org.apache.lucene.search;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.util.SmallFloat;

import java.io.IOException;
import java.io.Serializable;
import java.util.Collection;
import java.util.Iterator;


public abstract class Similarity implements Serializable {

  // DefaultSimilarity is the subclass used by default
  private static Similarity defaultImpl = new DefaultSimilarity();

  public static void setDefault(Similarity similarity) {
    Similarity.defaultImpl = similarity;
  }

  public static Similarity getDefault() {
    return Similarity.defaultImpl;
  }

  // Cache of decoded normalization factors, one entry per possible byte
  private static final float[] NORM_TABLE = new float[256];

  static {    // populated when the class is loaded
    for (int i = 0; i < 256; i++)
      NORM_TABLE[i] = SmallFloat.byte315ToFloat((byte)i);  // each byte decoded to its float value
  }

  // Decodes a normalization factor stored in the index (byte to float)
  public static float decodeNorm(byte b) {
    return NORM_TABLE[b & 0xFF];    // & 0xFF maps negative bytes to positive above 127
  }

  // Returns the table used to decode normalization factors
  public static float[] getNormDecoder() {
    return NORM_TABLE;
  }

  // Computes the length normalization for the Field named fieldName,
  // given that the field contains numTokens tokens
  public abstract float lengthNorm(String fieldName, int numTokens);

  // Computes a Query's normalization factor, given the sum of the squared
  // weights of the query's terms
  public abstract float queryNorm(float sumOfSquaredWeights);

  // Encodes a normalization factor for storage in the index (float to byte)
  public static byte encodeNorm(float f) {
    return SmallFloat.floatToByte315(f);
  }

  // Computes a score factor for a term based on its frequency in a Document
  public float tf(int freq) {
    return tf((float)freq);
  }

  /** Computes the amount of a sloppy phrase match, based on an edit distance.
   * This value is summed for each sloppy phrase match in a document to form
   * the frequency that is passed to {@link #tf(float)}.
   *
   * <p>A phrase match with a small edit distance to a document passage more
   * closely matches the document, so implementations of this method usually
   * return larger values when the edit distance is small and smaller values
   * when it is large.
   *
   * @see PhraseQuery#setSlop(int)
   * @param distance the edit distance of this sloppy phrase match
   * @return the frequency increment for this match
   */
  public abstract float sloppyFreq(int distance);

  /** Computes a score factor based on a term or phrase's frequency in a
   * document. This value is multiplied by the {@link #idf(Term, Searcher)}
   * factor for each term in the query and these products are then summed to
   * form the initial score for a document.
   *
   * <p>Terms and phrases repeated in a document indicate the topic of the
   * document, so implementations of this method usually return larger values
   * when <code>freq</code> is large, and smaller values when <code>freq</code>
   * is small.
   *
   * @param freq the frequency of a term within a document
   * @return a score factor based on a term's within-document frequency
   */
  public abstract float tf(float freq);

  /** Computes a score factor for a simple term.
   *
   * <p>The default implementation is:<pre>
   *   return idf(searcher.docFreq(term), searcher.maxDoc());
   * </pre>
   *
   * Note that {@link Searcher#maxDoc()} is used instead of
   * {@link org.apache.lucene.index.IndexReader#numDocs()} because it is
   * proportional to {@link Searcher#docFreq(Term)}, i.e., when one is
   * inaccurate, so is the other, and in the same direction.
   *
   * @param term the term in question
   * @param searcher the document collection being searched
   * @return a score factor for the term
   */
  public float idf(Term term, Searcher searcher) throws IOException {
    return idf(searcher.docFreq(term), searcher.maxDoc());
  }

  // Computes a score factor for a phrase by summing the idf of each of its terms
  public float idf(Collection terms, Searcher searcher) throws IOException {
    float idf = 0.0f;
    Iterator i = terms.iterator();
    while (i.hasNext()) {
      idf += idf((Term)i.next(), searcher);
    }
    return idf;
  }

  /** Computes a score factor based on a term's document frequency (the number
   * of documents which contain the term). This value is multiplied by the
   * {@link #tf(int)} factor for each term in the query and these products are
   * then summed to form the initial score for a document.
   */
  public abstract float idf(int docFreq, int numDocs);

  /** Computes a score factor based on the fraction of all query terms that a
   * document contains. This value is multiplied into scores.
   */
  public abstract float coord(int overlap, int maxOverlap);

  /**
   * Calculate a scoring factor based on the data in the payload. Overriding implementations
   * are responsible for interpreting what is in the payload. Lucene makes no assumptions about
   * what is in the byte array.
   */
  public float scorePayload(byte[] payload, int offset, int length) {
    // Do nothing
    return 1;
  }

}
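The lossy norm round-trip that the Javadoc warns about can be demonstrated directly. The code below is a from-scratch re-implementation of the byte315 scheme (3 mantissa bits, 5 exponent bits, zero-exponent point 15) used by org.apache.lucene.util.SmallFloat, written here only for illustration; the exact decoded value for an input like 0.89 depends on the truncation details, so the demo asserts only that the round trip is exact for 1.0 and lossy for 0.89.

```java
// A sketch of SmallFloat's byte315 encoding: a float squeezed into one byte
// with 3 mantissa bits and 5 exponent bits (zero-exponent point 15).
public class NormEncodingDemo {

  static byte floatToByte315(float f) {
    int bits = Float.floatToRawIntBits(f);
    int smallfloat = bits >> (24 - 3);            // keep sign+exponent+3 mantissa bits
    if (smallfloat < (63 - 15) << 3) {            // underflow
      return (bits <= 0) ? (byte)0 : (byte)1;
    }
    if (smallfloat >= ((63 - 15) << 3) + 0x100) { // overflow
      return -1;
    }
    return (byte)(smallfloat - ((63 - 15) << 3));
  }

  static float byte315ToFloat(byte b) {
    if (b == 0) return 0.0f;
    int bits = (b & 0xff) << (24 - 3);            // restore mantissa position
    bits += (63 - 15) << 24;                      // restore the exponent bias
    return Float.intBitsToFloat(bits);
  }

  public static void main(String[] args) {
    // Powers of two with small mantissas survive the round trip exactly.
    System.out.println(byte315ToFloat(floatToByte315(1.0f)));  // 1.0
    // 0.89 cannot be represented in 8 bits: the round trip is lossy.
    System.out.println(byte315ToFloat(floatToByte315(0.89f)));
  }
}
```

This is why a norm read back at search time is only an approximation of the value computed at indexing time, and why the Javadoc stresses that decode(encode(x)) = x is not guaranteed.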


