KeyLimeTie Blog

Lucene.NET - Advanced Search Engine example

By Brian Pautsch – 6/12/2005 9:19:50 PM. Posted to Applications.

A recent project at my current contract required us to research and obtain a third-party advanced search engine to fulfill some of the project requirements. I was not involved in the research, but the company decided to go with Verity. It turned out that the Verity K2 Enterprise software had everything we needed to get the project developed and deployed quickly. Working with the Verity software was very interesting, so I decided to see what else was out there (open source only). Almost immediately, I came across the Apache Lucene project: "Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform." Getting the Java examples to work was very easy, so I started to look for more online. That's when I came across the .NET version at SearchBlackBox.com: "SearchBlackBox Lucene Edition is a 100% C# based native .NET assembly that is fully optimized for the .NET Framework." The website doesn't offer many examples, but it's easy to get working. Also, the software product Lookout (acquired by Microsoft in June 2004) runs on the Lucene.NET code.

This application walks you through a simple implementation of the Lucene.NET DLL. We're basically going to build our own desktop search engine.

Download code

1. LuceneEngine.IndexFiles.cs
Lines 12-59: Input public properties (FilesLocation, IndexLocation, StopProcessing) and output public properties (Error, NumDocsIndexed, NumDocsSkipped, NumDocsErrored, TotalTime).
Lines 63-68: Custom events that are fired when the percentage of files completed changes and when errors occur.
Lines 69-135: StartIndexing is the entry method to get the indexing process going. After validating the input properties, I iterate through the directory chosen to index and all of its subdirectories. After I have all of the directories stored in mstrSubDirectories, I iterate through each one and call IndexDocs.
Lines 136-182: IndexDocs is the real worker in this application. This method loops through each file and, if the extension is supported, adds the document to the index.
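
For readers who want the shape of that flow without opening the download, here is a minimal sketch in Python using only the standard library. It is not the Lucene.NET API and not the code from the download: a plain dictionary stands in for the Lucene index, and the names collect_directories, index_docs, start_indexing and the SUPPORTED_EXTENSIONS list are assumptions made for illustration (the post does not say which extensions the real code supports).

```python
import os
import re

# Hypothetical whitelist; the real project's supported extensions are not listed in the post.
SUPPORTED_EXTENSIONS = {".txt", ".htm", ".html"}


def collect_directories(root):
    """Return the root directory plus every subdirectory beneath it."""
    return [dirpath for dirpath, _dirs, _files in os.walk(root)]


def index_docs(directory, index, stats):
    """Add each supported file in one directory to a toy inverted index."""
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        if os.path.splitext(name)[1].lower() not in SUPPORTED_EXTENSIONS:
            stats["skipped"] += 1
            continue
        try:
            with open(path, encoding="utf-8", errors="replace") as handle:
                text = handle.read()
        except OSError:
            stats["errored"] += 1
            continue
        # Map every distinct lowercase word to the set of files containing it.
        for term in set(re.findall(r"\w+", text.lower())):
            index.setdefault(term, set()).add(path)
        stats["indexed"] += 1


def start_indexing(files_location):
    """Validate the input, gather all directories, then index each one."""
    if not os.path.isdir(files_location):
        raise ValueError("files_location must be an existing directory")
    index = {}
    stats = {"indexed": 0, "skipped": 0, "errored": 0}
    for directory in collect_directories(files_location):
        index_docs(directory, index, stats)
    return index, stats
```

The three counters loosely mirror NumDocsIndexed, NumDocsSkipped and NumDocsErrored. Gathering the directory list first, as the post describes, also makes it easy to know the total amount of work before indexing starts.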

2. LuceneEngine.MakeFileDoc.cs
Lines 12-46: The only method here is Document(), which returns a Lucene.Net.Documents.Document object. A Lucene Document contains multiple properties and the code populates some of them: filename, path, name, length, contents, creation_time, last_write_time and last_access_time.
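
As a rough illustration of what such a document carries, the following Python sketch builds a plain dictionary with the same field names. It is not the Lucene.Net.Documents.Document API. The function name make_file_doc, the split between "filename" (with extension) and "name" (without), and the timestamp format are all assumptions; the post does not spell them out.

```python
import os
from datetime import datetime, timezone


def make_file_doc(path):
    """Return a dict of string fields describing one file (a stand-in for a Lucene Document)."""
    info = os.stat(path)

    def stamp(seconds):
        # Assumed format: a fixed-width UTC string, so values sort in date order.
        return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y%m%d%H%M%S")

    with open(path, encoding="utf-8", errors="replace") as handle:
        contents = handle.read()

    base = os.path.basename(path)
    return {
        "filename": base,                       # assumed: file name with extension
        "path": os.path.abspath(path),
        "name": os.path.splitext(base)[0],      # assumed: file name without extension
        "length": str(info.st_size),
        "contents": contents,
        # st_ctime is the creation time on Windows, but the metadata-change time on Unix.
        "creation_time": stamp(info.st_ctime),
        "last_write_time": stamp(info.st_mtime),
        "last_access_time": stamp(info.st_atime),
    }
```

Storing the dates as fixed-width strings is a choice made for this sketch: a range filter on last_write_time can then use plain string comparison.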

3. LuceneEngine.SearchFiles.cs
Lines 16-72: Input public properties (IndexLocation, SearchFor, LastWriteFrom, LastWriteTo, NumHitsRequested) and output public properties (NumHitsFound, ResultsXML, ResultsDataView, TotalTime, Error).
Lines 73-158: The only method here is StartSearch(). This method creates an instance of the IndexSearcher and StandardAnalyzer (other Analyzers are available, but the StandardAnalyzer is basic enough for this project). It then creates a Lucene.Net.Search.Query object and selects the "contents" field to be searched. Then the filtering criteria are loaded into a Lucene.Net.Search.DateFilter object. Finally, we call the IndexSearcher's Search method, which returns a Lucene.Net.Search.Hits object. From that, I get the number of hits and can iterate through the results collection. In this example, I create an XML result set and a DataView (which gives more flexibility to the consumer, e.g. a WinForms app, website or web service). StartSearch() returns true or false to indicate success.
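
The same sequence (query the contents, filter on a last-write date range, count the hits, emit XML) can be sketched in a few lines of Python. This is a toy linear scan, not Lucene: there is no analyzer, no ranking, and every search term must be present, which is a simplification chosen for this sketch. The function name start_search and the XML element names are assumptions.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime


def start_search(docs, search_for, last_write_from, last_write_to, num_hits_requested):
    """Return (number of hits found, XML string of the first num_hits_requested hits)."""
    terms = set(re.findall(r"\w+", search_for.lower()))
    hits = []
    for doc in docs:
        words = set(re.findall(r"\w+", doc["contents"].lower()))
        # Date filter: keep only documents written inside the requested range.
        in_range = last_write_from <= doc["last_write_time"] <= last_write_to
        if terms and terms <= words and in_range:
            hits.append(doc)

    # Build an XML result set limited to the requested number of hits.
    root = ET.Element("results")
    for doc in hits[:num_hits_requested]:
        item = ET.SubElement(root, "result")
        ET.SubElement(item, "path").text = doc["path"]
        ET.SubElement(item, "last_write_time").text = doc["last_write_time"].isoformat()
    return len(hits), ET.tostring(root, encoding="unicode")


# Usage with three in-memory documents.
sample_docs = [
    {"path": "a.txt", "contents": "Lucene search engine", "last_write_time": datetime(2005, 6, 1)},
    {"path": "b.txt", "contents": "lucene notes", "last_write_time": datetime(2004, 1, 1)},
    {"path": "c.txt", "contents": "nothing relevant", "last_write_time": datetime(2005, 6, 2)},
]
found, results_xml = start_search(
    sample_docs, "lucene", datetime(2005, 1, 1), datetime(2005, 12, 31), 10
)
```

Reporting the total hit count separately from the capped result list matches the split between NumHitsFound and NumHitsRequested described above.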

4. LuceneUI.frmMain
The top section consists of the indexing criteria. The "Files Location" is the root folder for the files you want to index. The "Index Location" is the location where the index should be stored.

 

The middle section consists of the search criteria and results. Enter the search phrase and date range, and the results come back immediately. To launch any item, double-click its row.

 

5. LuceneUI.frmMain.cs
Lines 383-478: The cmdStart_Click event handles all of the processing to index the files. After validating the input data, it creates an instance of LuceneEngine.Index.IndexFiles, sets up the events, loads the properties and calls the StartIndexing method. Upon return, it displays the results. While indexing is running, events fire constantly and the IndexFiles_OnPercentCompleteChangedHandler method (Lines 486-497) processes them.
Lines 506-618: The cmdSearch_Click event handles all of the processing to search the index. After validating the input data, it creates an instance of LuceneEngine.Search.SearchFiles, loads the properties and calls the StartSearch method. Upon return, it displays the statistical results in a label, binds the results DataView to the DataGrid and automatically resizes the columns so they can be viewed (SizeColumnsToContent).
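
The progress wiring described for cmdStart_Click can be sketched as a simple callback list. This Python version is only an illustration of the pattern, not the .NET event code from the download; the class layout, the percent_changed_handlers list and the stand-in for the indexing work are assumptions.

```python
class IndexFiles:
    """Toy indexer that notifies subscribers when the completed percentage changes."""

    def __init__(self):
        self.percent_changed_handlers = []  # callables taking one int (0-100)
        self.stop_processing = False        # a handler may set this to cancel the run
        self.num_docs_indexed = 0

    def _fire_percent_changed(self, percent):
        for handler in self.percent_changed_handlers:
            handler(percent)

    def start_indexing(self, files):
        """Process each file; return True if the run finished without being stopped."""
        last_percent = -1
        total = len(files)
        for position, _path in enumerate(files, start=1):
            if self.stop_processing:
                break
            self.num_docs_indexed += 1  # stand-in for the real indexing work
            percent = position * 100 // total
            # Fire only when the whole-number percentage changes, not once per file.
            if percent != last_percent:
                last_percent = percent
                self._fire_percent_changed(percent)
        return not self.stop_processing


# Usage: the "UI" subscribes before starting, much like the form sets up its events.
progress = []
indexer = IndexFiles()
indexer.percent_changed_handlers.append(progress.append)
finished = indexer.start_indexing(["a.txt", "b.txt", "c.txt", "d.txt"])
```

Firing only on a change in the whole-number percentage keeps the number of UI updates bounded even when the folder holds many thousands of files.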

Comments

On 1/8/2010 seraj said:
Thank you..

On 12/29/2008 Dabra Cic said:
Thanks for really interesting information.

On 10/9/2008 Miguel Alho said:
Thanks, this was really helpfull.

On 7/25/2008 Diab said:
Thank you for sharing this information.
