SQL Server 2005 Full Text Search

We had a requirement where the user had to do a search on Overview/description field which had nvarchar (max) data type. Full text search can be used for searching against the LOB data types like nvarchar (max), text and BLOB data types like varbinary (max) and image data type. Please note that in image and text data type will be deprecated in the future and hence avoid using them.

Some of us might wonder as to why we cannot use a normal non-clustered index against these data types. Assume that you have a non-clustered index against nvarchar (max) in table with millions of record and you can use LIKE predicate in the WHERE clause (example. WHERE description LIKE %search text%). The query would run for hours to return the result set depending on the search text and also the non-clustered index wouldnt be used since we are having a leading wild card in the WHERE clause. Whereas the same search using full text index would be completed within seconds.

In any version of windows there is a component called Microsoft Search Service. This service provides indexing and searching capabilities to SQL Server. SQL 2005 now has the side-by-side installation of the full-text engine. For each instance of SQL 2005, one instance of Microsoft Full-Text Engine for SQL Server (MSFTESQL) service is installed.

Installing Full Text Search:

I already have other services like database engine, integration services and all have SP2 applied. Now when I ran the setup to install I faced issue. It seems that we shouldnt run the setup directly and instead go to Control panel > Add/Remove program > Select SQL Server 2005 > Click Change and select the instance in which you wish to install full text search service and proceed to install it. After that since all other components have SP2 except full text we need to apply SP2 for full text as well in order to make it work properly. Without SP2 installed for full text service you wouldnt be able to create full text catalog. Basically we need to have the same version for full text service as the db engine.

Steps for installing Full text index:

1. The table should have a unique not null column like primary key or a unique indexed column
2. Enable full text index for the database
3. Create the full text catalog
4. Create the full text index
5. Create a schedule to repopulate the index on regular basis (optional)

We can enable the full text index for the database using the below command, execute it in the respective database:

EXEC sp_fulltext_database enable

To create the full text catalog we can make use of GUI or through query,

CREATE FULLTEXT CATALOG [Test]
IN PATH N'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQLFTData'
WITH ACCENT_SENSITIVITY = ON
AUTHORIZATION [dbo]

If you wish to use GUI expand the database, expand storage right click on Full text catalogs and create the catalog as shown in the below screenshot

sql-2005-fts-1

Mention the name of catalog and catalog location in the disk as shown below:

sql-2005-fts-2

Now we need to create the Full text index using the below command,

CREATE FULLTEXT INDEX ON [dbo]. [CurriculumUnit]
(Overview,Title)
KEY INDEX PK_CurriculumUnit ON [Test]
WITH CHANGE_TRACKING AUTO
GO
CREATE FULLTEXT INDEX ON [dbo].[LearningActivity]
(Overview,Title)
KEY INDEX PK_LearningActivity ON [Test]
WITH CHANGE_TRACKING AUTO
GO

We can also use the GUI by right clicking on the full text catalog named Test, click on Tables/views in select a page option and add the tables you wish to index using full text and then choose the respective columns.

sql-2005-fts-3

 

Populating full text index:

Creating and maintaining a full-text index involves populating the index by using a process called a population (also known as a crawl). SQL Server supports the following types of population: full population, change tracking-based automatic or manual population, and incremental timestamp-based population.

Full population:

Firstly, whenever we create a full text index there will be a full population of the index for all the rows. To avoid it we need to use the option CHANGE_TRACKING OFF, NO POPULATION. Those options will turn off change tracking and will prevent populating the full text index while FTX creation. Since the Full population consumes a lot of resources we need to do them at offproduction hours. During a full population, index entries are built for all the rows of a table or indexed view. A full population of a full-text index builds index entries for all the rows of the base table or indexed view.

Change tracking based population: There are 2 types

a.) Using Automatic whenever an insert/update or delete happens to the full text index there will be a population of those rows alone (automatically)

b.) Using Manual whenever DML operations happen it will not get propagated automatically to the FTX. Hence we need to use automatic population so that whenever a change happens in the text field it will immediately be available in the full text index i.e. it will be populated instantaneously. If we use manual population we need to run the command using a job to populate the changes alone manually.

The overhead involved here is the SQL Server will maintain a table to track the list of tables and rows modified.

Incremental population: We need to have a timestamp column in the table in order to make use of Incremental population. This will populate only the rows that have modified since the previous incremental population. This method is highly suitable for tables which change frequently.

The time delay for the full text index data to be available while searching after getting populated is close to 1 to 2 minutes maximum. Without getting populated it will not be available.

There are four T-SQL predicates used in Full text searching namely,

1. FREETEXT

2. FREETEXTTABLE

3. CONTAINS

4. CONTAINSTABLE

Freetext Its a predicate used in the where clause to search for words in full text index. It will fetch all the words similar in meaning to the search text and not only the exact match.

SELECT * FROM Curriculumunit WHERE FREETEXT(Overview,'Test')

Output will be as follows: It not only brings the exact match of the word but also the words similar in meaning.

Deepak is testing

6th Maths test

Freetexttable – Returns a table of zero, one, or more rows for those columns containing character-based data types for values that match the meaning, but not the exact wording, of the text in the specified free text_string. FREETEXTTABLE can only be referenced in the FROM clause of a SELECT statement like a regular table name.

Queries using FREETEXTTABLE specify freetext-type full-text queries that return a relevance ranking value (RANK) and full-text key (KEY) for each row.

SELECT * FROM Curriculumunit Cu
INNER JOIN FREETEXTTABLE(Curriculumunit,Overview,'test') AS Key_tbl
ON Cu.Curriculumunitid=Key_tbl.[key]
ORDER BY RANK DESC

Contains Its used to fetch the records that produce an exact match to a single word or phrase and also the proximity of words within a certain distance of one another.

The following will bring the exact matches as the word test in the overview field.

SELECT * FROM Curriculumunit Cu WHERE CONTAINS(Overview,'test')

The following will bring the words beginning with test (example the result will also contain the word testing). We need to include enclose a prefix term in double quotation marks (“”) and add an asterisk (*) before the ending quotation mark, so that all text starting with the simple term specified before the asterisk is matched.

SELECT * FROM Curriculumunit Cu WHERE CONTAINS(Overview,'"test*"')

ContainsTable Returns a table of zero, one, or more rows for those columns containing precise matches to single words and phrases, the proximity of words within a certain distance of one another, or weighted matches. CONTAINSTABLE can only be referenced in the FROM clause of a SELECT statement as if it were a regular table name. It is similar to freetexttable but it does pull both the words similar to the search text and those similar in meaning.


Posted

in

by

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *