SharePoint search: Inside the Index book ‘review’ at ia play

Inside the Index and Search Engines is 624 pages of lovely SharePoint search info. It is the sort of book that sets me apart from my colleagues: I was delighted when it arrived; everyone else was sympathetic.

The audience is “administrators” and “developers”. I’m never sure how technical an audience they have in mind when they say “administrators”, so I waded in anyway. The book defines the administrator topics as: managing the index file; configuring the end-user experience; managing metadata; search usage reports; configuring BDC applications; monitoring performance; and administering protocol handlers and iFilters. I skimmed through the content for developers and found some useful nuggets there too.

Contents:
1. Introducing Enterprise Search in SharePoint 2007
2. The End-User Search Experience
3. Customizing the Search User Interface
4. Search Usage Reports
5. Search Administration
6. Indexing and Searching Business Data
7. Search Deployment Considerations
8. Search APIs
9. Advanced Search Engine Topics
10. Searching with Windows SharePoint Services 3.0

The book begins by setting the scene, with lots of fluff about why search matters and some slightly awkward praise for Microsoft’s efforts. It gets much more interesting later, so you can probably skip most of the introduction.

Content I found useful:

Chapter 1: Introducing Enterprise Search in SharePoint 2007

Pages 28-33 include a feature comparison that gives a quick overview of Search Server, Search Server Express and SharePoint Server.

“Queries that are submitted first go through layers of word breakers and stemmers before they are executed against the content index file. Word breaking is a technique for isolating the important words out of the content, and stemmers store the variations on a word” p.32
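To make the word breaking and stemming idea concrete, here's a toy Python sketch. It is not how SharePoint does it: the regex word breaker and the tiny suffix-stripping stemmer (including its suffix list) are my own simplifications, assumed purely for illustration. The point is that a query word and a document word match when they reduce to the same stem.

```python
import re

# Toy word breaker: keep runs of letters, digits and apostrophes, lower-cased.
# (SharePoint uses language-specific word breakers; this regex is an assumption.)
def break_words(text):
    return [w.lower() for w in re.findall(r"[A-Za-z0-9']+", text)]


# Toy stemmer: strip one common English suffix (hypothetical suffix list),
# but never leave a stem shorter than three letters.
SUFFIXES = ("ing", "ed", "s")


def stem(word):
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def matches(query, document):
    """True if every query stem appears among the document's stems (AND semantics)."""
    doc_stems = {stem(w) for w in break_words(document)}
    return all(stem(w) in doc_stems for w in break_words(query))


print(break_words("Search-driven sites, indexed nightly"))
# ['search', 'driven', 'sites', 'indexed', 'nightly']
print(matches("indexing sites", "The site was indexed overnight"))
# True
```

Real stemmers are language-specific and far more careful; this one would mangle plenty of English words.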

Keyword query syntax p.44

maximum query length 1024 characters
by default is not case sensitive
defaults to AND queries
phrase searches can be run with quote marks
wildcard searching is not supported in keyword syntax queries; developers could build this functionality using CONTAINS in the SQL query syntax
exclude words by prefixing them with a minus sign (-)
you can search for properties, e.g. rnib author:loasby
property searches can include prefix searches, e.g. author:loas
different properties are ANDed, unless the same property is repeated (which runs as an OR search)
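The keyword syntax rules above can be sketched as a small Python parser and matcher. This is my own illustration of the rules as the book describes them, not SharePoint's query engine: the document shape (a dict with a "text" field plus property values), the plain substring matching and the leading "-" for exclusions are assumptions.

```python
import re
from collections import defaultdict

MAX_QUERY_LENGTH = 1024  # maximum keyword query length quoted in the book

# An optional leading "-", then either a "quoted phrase" or a bare word.
TOKEN = re.compile(r'(-?)(?:"([^"]*)"|(\S+))')


def parse_keyword_query(query):
    """Split a query into required terms, excluded terms and property restrictions."""
    if len(query) > MAX_QUERY_LENGTH:
        raise ValueError("keyword queries are limited to 1024 characters")
    required, excluded = [], []
    properties = defaultdict(list)  # property name -> values, ORed together
    # Lower-casing everything makes the sketch case-insensitive, like the default.
    for minus, phrase, word in TOKEN.findall(query.lower()):
        if word and ":" in word and not minus:
            name, value = word.split(":", 1)
            properties[name].append(value)
        elif minus:
            excluded.append(phrase or word)
        else:
            required.append(phrase or word)  # free-text terms are ANDed by default
    return required, excluded, dict(properties)


def matches(parsed, doc):
    """doc is a hypothetical dict: {"text": ..., "<property>": value, ...}."""
    required, excluded, properties = parsed
    text = doc["text"].lower()  # plain substring tests keep the sketch short
    if not all(term in text for term in required):
        return False
    if any(term in text for term in excluded):
        return False
    # Different properties are ANDed; repeated values for one property are ORed.
    # Each value is a prefix match, so author:loas matches "Loasby".
    return all(
        any(str(doc.get(name, "")).lower().startswith(value) for value in values)
        for name, values in properties.items()
    )


query = parse_keyword_query('RNIB "accessible design" -draft author:loas author:smith')
print(query)
# (['rnib', 'accessible design'], ['draft'], {'author': ['loas', 'smith']})
print(matches(query, {"text": "RNIB guide to accessible design", "author": "Loasby"}))
# True
```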

Search URL parameters p.50

k = keyword query
s = the scope
v = sort order, e.g. “&v=date”
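Putting those parameters together, here's a small Python sketch that builds a results URL. The results page address and the "All Sites" scope name are hypothetical placeholders; substitute your own search centre's results page and scope names.

```python
from urllib.parse import quote, urlencode


def search_url(results_page, keywords, scope=None, sort=None):
    """Build a search results URL from the k (keywords), s (scope) and v (sort) parameters."""
    params = {"k": keywords}
    if scope:
        params["s"] = scope
    if sort:
        params["v"] = sort
    # quote_via=quote encodes spaces as %20 rather than "+".
    return results_page + "?" + urlencode(params, quote_via=quote)


# Hypothetical results page and scope name.
url = search_url("http://intranet/SearchCenter/Pages/results.aspx",
                 'author:loasby "search usage"', scope="All Sites", sort="date")
print(url)
# http://intranet/SearchCenter/Pages/results.aspx?k=author%3Aloasby%20%22search%20usage%22&s=All%20Sites&v=date
```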

Chapter 4: Search Usage Reports

Search queries report contains:

number of queries
query origin site collections
number of queries per scope
query terms

Search results report contains:

search result destination pages (which URL was clicked by users)
queries with zero results
most clicked best bets
search results with zero best bets
queries with low clickthrough

Data can be exported to Excel (useful if I need to share the data in an accessible format).

You cannot view data beyond the 30-day data window. The suggested solution is to export every report!

Chapter 5: Search Administration

You can manage the crawl by:

create content sources
define crawl rules: exclude content (wildcard patterns allowed), follow/noindex, crawl URLs with query strings
define crawl schedules
remove unwanted items with immediate effect
troubleshoot crawls
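Here's a rough Python sketch of how ordered crawl rules with wildcard patterns might be evaluated. The rule list, the first-match-wins ordering and the default for URLs no rule matches (crawl them unless they contain a query string) are assumptions for illustration, so check the book's crawl rule section for SharePoint's actual behaviour.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class CrawlRule:
    pattern: str                       # wildcard pattern; "*" matches any run of characters
    include: bool                      # True = include rule, False = exclude rule
    crawl_query_strings: bool = False  # also crawl matching URLs that contain "?"


# Hypothetical rules, evaluated top to bottom.
RULES = [
    CrawlRule("http://intranet/archive/*", include=False),
    CrawlRule("http://intranet/catalog/*", include=True, crawl_query_strings=True),
]


def should_crawl(url, rules=RULES):
    """Return True if the crawler should fetch url under the given rules."""
    for rule in rules:
        if fnmatchcase(url.lower(), rule.pattern.lower()):
            if not rule.include:
                return False
            return "?" not in url or rule.crawl_query_strings
    # No rule matched: crawl plain URLs, skip ones with query strings (assumed default).
    return "?" not in url


for url in ("http://intranet/archive/2008/minutes.doc",
            "http://intranet/catalog/item.aspx?id=7",
            "http://intranet/news/story.aspx?id=7"):
    print(url, should_crawl(url))
```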

There’s a useful but off-topic box about file shares vs. SharePoint on p.225.

The crawler can discover metadata from:

file properties, e.g. name, extension, date and size
additional Microsoft Office properties
SharePoint list columns
meta tags in HTML
email Subject and To fields
user profile properties

You can view the list of crawled properties via the Metadata Property Mappings link on the Configure Search Settings page. The Included In Index setting indicates whether the property is searchable.

Managed properties can be:

exposed in advanced search and in query syntax
displayed in search results
used in search scope rules
used in custom relevancy ranking
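As I understand it, a managed property gets its values by mapping one or more crawled properties onto it, either taking the first mapped property that has a value or combining values from all of them. Here's a minimal Python sketch of that idea. The class, the crawled property names and the two mapping modes are my own illustration, not SharePoint's object model.

```python
from dataclasses import dataclass, field


@dataclass
class ManagedProperty:
    name: str
    # Crawled property names mapped to this managed property, in priority order.
    mappings: list = field(default_factory=list)
    # True: use the first mapped crawled property that has a value.
    # False: include values from every mapped crawled property.
    first_match_only: bool = True

    def values_for(self, crawled):
        """crawled: dict of crawled property name -> value for one item."""
        found = [crawled[name] for name in self.mappings if crawled.get(name)]
        return found[:1] if self.first_match_only else found


# Hypothetical crawled property names.
author = ManagedProperty("Author", ["DocAuthor", "ListCreatedBy", "MailFrom"])
item = {"ListCreatedBy": "Karen", "MailFrom": "karen@example.com"}
print(author.values_for(item))  # ['Karen']
author.first_match_only = False
print(author.values_for(item))  # ['Karen', 'karen@example.com']
```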

Adjusting the weight of properties in ranking is not an admin interface task and can only be done via the programming interface.

High Confidence Results: a different (more detailed?) display for results that the search engine believes are an exact match for the query.

Authoritative Pages

sites central to high-priority business processes should be authoritative
sites that encourage collaboration and actions should be authoritative
external sites should not be authoritative
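Authoritative pages matter for ranking because, as I understand it, SharePoint measures click distance (one of the ranking parameters listed under Chapter 8) from them: pages a few links away from an authoritative page rank higher than pages many links away. A breadth-first search is the standard way to compute that kind of distance; here's a minimal Python sketch over a made-up link graph, not SharePoint's implementation.

```python
from collections import deque


def click_distances(links, authoritative):
    """Breadth-first search: fewest clicks from any authoritative page to each page.

    links maps a page to the pages it links to. Pages that cannot be reached
    from an authoritative page are left out of the result.
    """
    distance = {page: 0 for page in authoritative}
    queue = deque(authoritative)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distance:  # first visit is the shortest path in BFS
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance


# Hypothetical intranet link graph.
links = {
    "/hr/policies": ["/hr/leave", "/team-site"],
    "/team-site": ["/team-site/notes"],
    "/hr/leave": ["/team-site/notes"],
}
print(click_distances(links, ["/hr/policies"]))
# {'/hr/policies': 0, '/hr/leave': 1, '/team-site': 1, '/team-site/notes': 2}
```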

Thesaurus p.291

an XML file on the server with no admin interface
no need to include stemming variations
different language thesauri exist; the one used depends on the language specified by the client application sending the request
e.g. tseng.xml (UK English) and tsenu.xml (US English)
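The thesaurus files use the Microsoft Search thesaurus layout: expansion sets, whose terms are all searched for together, and replacement entries, where a pat (pattern) is swapped for a sub (substitute). The Python sketch below parses a small sample and applies it to a query term. The element names are as I know them from Microsoft's full-text search thesaurus documentation, so check p.291 for the exact format SharePoint expects; the entries themselves are illustrative, and the sample's diacritics_sensitive flag is ignored by the sketch.

```python
import xml.etree.ElementTree as ET

# Illustrative sample in the Microsoft Search thesaurus layout.
SAMPLE = """<XML ID="Microsoft Search Thesaurus">
  <thesaurus xmlns="x-schema:tsSchema.xml">
    <diacritics_sensitive>0</diacritics_sensitive>
    <expansion>
      <sub>Internet Explorer</sub>
      <sub>IE</sub>
      <sub>IE5</sub>
    </expansion>
    <replacement>
      <pat>W2K</pat>
      <sub>Windows 2000</sub>
    </replacement>
  </thesaurus>
</XML>"""

NS = {"ts": "x-schema:tsSchema.xml"}  # the thesaurus element's default namespace


def load_thesaurus(xml_text):
    """Return (expansion sets, replacements keyed by lower-cased pattern)."""
    root = ET.fromstring(xml_text)
    expansions = [
        [sub.text for sub in group.findall("ts:sub", NS)]
        for group in root.findall(".//ts:expansion", NS)
    ]
    replacements = {}
    for group in root.findall(".//ts:replacement", NS):
        subs = [sub.text for sub in group.findall("ts:sub", NS)]
        for pat in group.findall("ts:pat", NS):
            replacements[pat.text.lower()] = subs
    return expansions, replacements


def rewrite_term(term, expansions, replacements):
    """A replacement swaps the term out; an expansion searches its whole set."""
    key = term.lower()
    if key in replacements:
        return replacements[key]
    for group in expansions:
        if key in (s.lower() for s in group):
            return group
    return [term]


expansions, replacements = load_thesaurus(SAMPLE)
print(rewrite_term("IE", expansions, replacements))   # ['Internet Explorer', 'IE', 'IE5']
print(rewrite_term("w2k", expansions, replacements))  # ['Windows 2000']
```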

Noise words p.294

language-specific plain text files, in the same directory as the thesaurus
for US English the file name is noiseenu.txt
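Noise words are common words the search engine ignores. A minimal Python sketch of stripping them from a query, assuming the file is just whitespace-separated words; the word list here is a hypothetical excerpt, not the real contents of noiseenu.txt.

```python
# Hypothetical excerpt from a noise word file; the real list is longer.
NOISE_FILE_TEXT = """a about after all an and are as at be but by
for from in is of on or the to with"""


def load_noise_words(text):
    """Read whitespace-separated noise words into a lower-case set."""
    return {word.lower() for word in text.split()}


def strip_noise(query, noise_words):
    """Drop noise words from a query, keeping the remaining terms in order."""
    return [term for term in query.split() if term.lower() not in noise_words]


noise = load_noise_words(NOISE_FILE_TEXT)
print(strip_noise("All about the SharePoint index", noise))  # ['SharePoint', 'index']
```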

Diacritic-sensitive search

off by default
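Diacritic-insensitive matching (the default) means a search for resume also finds résumé. One common way to get that behaviour is to fold accented characters to their base letters before comparing; here's a Python sketch using Unicode decomposition. It illustrates the idea rather than SharePoint's internals.

```python
import unicodedata


def fold_diacritics(text):
    """Decompose characters (NFD) and drop the combining marks, so 'é' becomes 'e'.

    Letters such as 'ø' have no decomposition and would need a separate mapping.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def term_matches(query_term, indexed_term, diacritic_sensitive=False):
    """Case-insensitive comparison that ignores accents unless told otherwise."""
    if diacritic_sensitive:
        return query_term.lower() == indexed_term.lower()
    return fold_diacritics(query_term.lower()) == fold_diacritics(indexed_term.lower())


print(fold_diacritics("Résumé café"))  # Resume cafe
print(term_matches("resume", "résumé"))                            # True
print(term_matches("resume", "résumé", diacritic_sensitive=True))  # False
```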

Chapter 8: Search APIs

Most of this is too technical for me, but buried in the middle of Chapter 8 are the ranking parameters:

saturation constant for term frequency
saturation constant for click distance
weight of click distance for calculating relevance
saturation constant for URL depth
weight of URL depth for calculating relevance
weight for ranking applied to non-default language
weight of HTML, XML and TXT content type
weight of document content types (Word, PowerPoint, Excel and Outlook)
weight of list items content types

They’ll come in handy when I’m baffled by some of the seemingly random ranking decisions SharePoint makes.
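The parameter names suggest a BM25-style model, where a saturation constant controls how quickly a feature's contribution levels off and a weight scales it. The Python sketch below shows the two shapes that usually implies: x/(k + x) for features where more is better (term frequency) and weight * k/(k + d) for distances where less is better (click distance, URL depth). These formulas are my assumption, not taken from the book, so treat Chapter 8 as the authority on SharePoint's actual calculation.

```python
def saturate(x, k):
    """More is better, with diminishing returns: x/(k + x) climbs towards 1.

    A larger saturation constant k means it takes more occurrences to get there.
    """
    return x / (k + x)


def distance_score(distance, k, weight):
    """Less is better: weight * k/(k + distance) equals weight at distance 0
    and halves once distance reaches k."""
    return weight * k / (k + distance)


# Term frequency: the tenth occurrence adds far less than the second.
for tf in (1, 2, 5, 10, 50):
    print(tf, round(saturate(tf, k=2), 3))

# Click distance: pages further from an authoritative page get a smaller boost.
for clicks in (0, 1, 2, 4):
    print(clicks, round(distance_score(clicks, k=2, weight=1.0), 3))
```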

Chapter 9: Advanced Search Engine Topics
I skipped through most of this, but it does cover the Faceted Search project from CodePlex on p.574-585.

Overall

A good percentage of the book was valuable to a non-developer, particularly one who is happy to skip over chunks of code. I’ve seen and heard a lot of waffle about what SharePoint search does and doesn’t do, so it was great to get some solid answers.
Inside the Index and Search Engines: Microsoft® Office SharePoint® Server 2007

Written by Karen

July 22nd, 2009 at 6:33 am

Posted in books,search,sharepoint
