程序员小站

J2EE | Spring | JVM | Scala

 
 
 

Lucene 3.5: a getting-started example

2012-02-07 23:36:23 | Category: lucene

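Lucene 3.5 brings a large number of optimizations, improvements, and bug fixes, including:
  • A substantial (3-5x) reduction in the RAM needed to hold the terms index of an open IndexReader.
  • New IndexSearcher.searchAfter, which returns the results that follow a given ScoreDoc (for example, the last document of the previous page), to support deep-paging use cases.
  • New SearcherManager, which manages sharing and reopening IndexSearchers across multiple search threads. The underlying IndexReader instances are closed safely once nothing references them any more.
  • New SearcherLifetimeManager, which safely provides a consistent view of the index across multiple requests (for example, paging or drill-down).
  • IndexWriter.optimize has been renamed to forceMerge to discourage its use: the operation is expensive and rarely needed.
  • New NGramPhraseQuery, which speeds up phrase queries by 30%-50% when n-gram analysis is used.
  • A new reopen API (IndexReader.openIfChanged), which returns null instead of the old reader when the index has not changed.
  • Vector highlighter (FastVectorHighlighter) improvements: support for more query types such as wildcards, and boundary analysis for the generated snippets.
  • A number of bug fixes.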
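Three of the items above (searchAfter, openIfChanged, and forceMerge) can be tried against a small in-memory index. What follows is only a minimal sketch, assuming lucene-core-3.5.0.jar is on the classpath; the class name Lucene35FeaturesSketch, its helper methods, and the "tag" field are made up for illustration and do not come from the Lucene distribution.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Hypothetical demo class, not part of the original post.
class Lucene35FeaturesSketch {

    /** Query that matches every document written by addDocs. */
    static Query allDocsQuery() {
        return new TermQuery(new Term("tag", "demo"));
    }

    /** Appends count documents to the index, merges it down to one segment, and commits. */
    static void addDocs(Directory dir, int count) throws IOException {
        IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(
                Version.LUCENE_35, new StandardAnalyzer(Version.LUCENE_35)));
        try {
            for (int i = 0; i < count; i++) {
                Document doc = new Document();
                doc.add(new Field("tag", "demo", Field.Store.YES, Field.Index.NOT_ANALYZED));
                writer.addDocument(doc);
            }
            // forceMerge replaces optimize; shown only for illustration, it is rarely needed.
            writer.forceMerge(1);
        } finally {
            writer.close(); // close() also commits the pending changes
        }
    }

    /** Walks through all hits one page at a time with searchAfter; returns the number of non-empty pages. */
    static int countPages(IndexSearcher searcher, Query query, int pageSize) throws IOException {
        int pages = 0;
        TopDocs page = searcher.search(query, pageSize);
        while (page.scoreDocs.length > 0) {
            pages++;
            ScoreDoc last = page.scoreDocs[page.scoreDocs.length - 1];
            // Ask only for hits that rank after the last hit of the previous page.
            page = searcher.searchAfter(last, query, pageSize);
        }
        return pages;
    }

    public static void main(String[] args) throws IOException {
        Directory dir = new RAMDirectory();
        addDocs(dir, 5);

        IndexReader reader = IndexReader.open(dir);
        System.out.println("pages = " + countPages(new IndexSearcher(reader), allDocsQuery(), 2));

        // Nothing has changed since the reader was opened, so openIfChanged returns null.
        System.out.println("reopen without changes = " + IndexReader.openIfChanged(reader));

        addDocs(dir, 1);
        IndexReader newer = IndexReader.openIfChanged(reader);
        if (newer != null) {
            reader.close(); // the caller is responsible for closing the old reader
            reader = newer;
        }
        System.out.println("docs after reopen = " + reader.numDocs());
        reader.close();
    }
}
```

With five documents and a page size of two, countPages fetches three non-empty pages; the fourth searchAfter call comes back empty and ends the loop.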
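2. Create a new Java project: lucene3.5
3. Create a folder named lib, put the required jar (lucene-core-3.5.0.jar) in it, and add the jar to the build path.
   Create a folder named luceneDatasource to hold the files to be indexed, and put a txt file in it.
Create the class HelloWorld: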

package com.lucene;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

import com.lucene.utils.File2DocumentUtils;

public class HelloWorld {
    // Adjust these two paths to your own environment.
    String filePath = "D:\\f\\lucene3.5\\luceneDatasource\\I HAVE A DREAM.txt";
    String indexPath = "D:\\f\\lucene3.5\\luceneIndex";
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);

    public static void main(String[] args) throws Exception {
        HelloWorld helloWorld = new HelloWorld();
        helloWorld.createIndex();
        helloWorld.search();
    }

    /**
     * Creates the index.
     *
     * @throws IOException
     * @throws CorruptIndexException
     */
    public void createIndex() throws CorruptIndexException, IOException {
        Document doc = File2DocumentUtils.file2document(filePath);
        Directory dir = FSDirectory.open(new File(indexPath));
        // Deprecated: new IndexWriter(dir, analyzer, IndexWriter.MaxFieldLength.LIMITED).
        // Use IndexWriterConfig instead.
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_35, analyzer);
        // Rebuild the index on every run so that repeated runs do not add duplicate documents.
        indexWriterConfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
        IndexWriter indexWriter = new IndexWriter(dir, indexWriterConfig);
        try {
            indexWriter.addDocument(doc);
        } finally {
            indexWriter.close(); // also commits the added document
        }
    }

    public void search() throws IOException, ParseException {
        String queryString = "have";
        // 1. Parse the search text into a Query
        String[] fields = {"name", "content"};
        QueryParser queryParser = new MultiFieldQueryParser(Version.LUCENE_35, fields, analyzer);
        Query query = queryParser.parse(queryString);
        // 2. Run the query
        Directory dir = FSDirectory.open(new File(indexPath));
        IndexReader ir = IndexReader.open(dir);
        IndexSearcher indexSearch = new IndexSearcher(ir);
        try {
            Filter filter = null;
            TopDocs topDocs = indexSearch.search(query, filter, 10000);
            // 3. Print the results
            System.out.println("Found " + topDocs.totalHits + " result(s) in total");
            for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
                int docId = scoreDoc.doc; // internal document number
                Document doc = indexSearch.doc(docId); // load the document by its number
                File2DocumentUtils.printDocument(doc); // print the document's fields
            }
        } finally {
            indexSearch.close();
            ir.close(); // a searcher built from a reader does not close that reader
        }
    }
}
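The lowercase query "have" matches the upper-case file name because both the indexed text (fields added with Index.ANALYZED) and the query string pass through the same StandardAnalyzer, which lower-cases tokens and drops common English stop words. To see what the analyzer actually produces for a piece of text, the token stream can be printed directly. This is a minimal sketch, assuming lucene-core-3.5.0.jar is on the classpath; the class name AnalyzerTokensSketch and its tokens method are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// Hypothetical demo class, not part of the original post.
class AnalyzerTokensSketch {

    /** Runs the analyzer over the text and collects the terms it emits for the given field. */
    static List<String> tokens(Analyzer analyzer, String field, String text) throws IOException {
        List<String> result = new ArrayList<String>();
        TokenStream stream = analyzer.tokenStream(field, new StringReader(text));
        CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
        try {
            stream.reset();
            while (stream.incrementToken()) {
                result.add(term.toString());
            }
            stream.end();
        } finally {
            stream.close();
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
        // Prints the terms that would be indexed for the "name" field of the sample file.
        System.out.println(tokens(analyzer, "name", "I HAVE A DREAM.txt"));
    }
}
```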


Create the helper class:

package com.lucene.utils;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;

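public class File2DocumentUtils {
    /**
     * Converts a file into a Document with the fields name, content, size, and path.
     *
     * @param path path of the file to index
     * @return the Document built from the file
     */
    public static Document file2document(String path) {
        File file = new File(path);
        Document doc = new Document();
        // Store.YES / Store.NO: whether the original value is stored in the index
        //   so that it can be read back from a search hit.
        // Index.ANALYZED: tokenize the value, then index the tokens.
        // Index.NOT_ANALYZED: index the whole value as a single term.
        // Index.NO: do not index the value (it cannot be searched, only retrieved).
        doc.add(new Field("name", file.getName(), Store.YES, Index.ANALYZED));
        doc.add(new Field("content", readFileContent(file), Store.YES, Index.ANALYZED));
        doc.add(new Field("size", String.valueOf(file.length()), Store.YES, Index.NOT_ANALYZED));
        doc.add(new Field("path", file.getAbsolutePath(), Store.YES, Index.NO));
        return doc;
    }

    /**
     * Reads the whole file as text, using the platform default charset.
     *
     * @param file the file to read
     * @return the file content, with every line terminated by "\n"
     */
    private static String readFileContent(File file) {
        StringBuilder content = new StringBuilder();
        BufferedReader reader = null;
        try {
            reader = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
            for (String line = null; (line = reader.readLine()) != null;) {
                content.append(line).append("\n");
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Close the reader so the file handle is not leaked.
            if (reader != null) {
                try {
                    reader.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
        return content.toString();
    }

    /** Prints the stored fields of a document. */
    public static void printDocument(Document doc) {
        System.out.println("name = " + doc.get("name"));
        System.out.println("content = " + doc.get("content"));
        System.out.println("size = " + doc.get("size"));
        System.out.println("path = " + doc.get("path"));
    }
}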
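readFileContent decodes the file with the platform default charset, so the same text file can be indexed differently on machines with different defaults. One possible variant takes the charset as a parameter. This is a sketch only, assuming Java 7 or later (try-with-resources and StandardCharsets); the class name FileTextReader and the method readAll are hypothetical and not part of the original helper.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical helper, not part of the original post.
class FileTextReader {

    /** Reads the whole file with the given charset; every line ends with '\n' in the result. */
    static String readAll(File file, Charset charset) throws IOException {
        StringBuilder content = new StringBuilder();
        // try-with-resources closes the reader (and the underlying stream) automatically.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                content.append(line).append('\n');
            }
        }
        return content.toString();
    }

    public static void main(String[] args) throws IOException {
        // Write a small temporary file, then read it back as UTF-8.
        File tmp = File.createTempFile("lucene-demo", ".txt");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), "I have a dream\ntoday".getBytes(StandardCharsets.UTF_8));
        System.out.print(readAll(tmp, StandardCharsets.UTF_8));
    }
}
```

Unlike readFileContent, this variant lets the IOException propagate instead of printing the stack trace, so a caller can decide whether a file that cannot be read should be skipped or should abort indexing.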


Test output:

Found 1 result(s) in total
name = I HAVE A DREAM.txt
content = I say to you, my friends, so even though we must face the difficulties of today and tomorrow, I still have a dream. It is a dream deeply rooted in the American dream.
I have a dream that one day this nation will rise up and live out the true meaning of its creed - we hold these truths to be self-evident, that all men are created equal.
I have a dream that one day on the red hills of Georgia, sons of former slaves and sons of former slave-owners will be able to sit down together at the table of brotherhood.
I have a dream that one day, even the state of Mississippi, a state sweltering with the heat of injustice, sweltering with the heat of oppression, will be transformed into an oasis of freedom and justice.
I have a dream my four little children will one day live in a nation where they will not be judged by the color of their skin but by the content of their character. I have a dream today!
size = 915
path = D:\f\lucene3.5\luceneDatasource\I HAVE A DREAM.txt

