Archive for the ‘information architecture’ Category
“Best Bets” functionality for search systems
I was working on a Best Bets system this week, which is essentially what I did 8 years ago on my first BBC project. It is nice to be working on something straightforward but I’ve had to do a lot of explaining of the concept. What follows is my advice if you are thinking about adding Best Bets to your search.
What are Best Bets?
Best Bets are essentially editorial picks that appear at the top of the search results. They are a manual intervention for use when the search engine isn’t delivering the best results for users. Some sites use them to fix just a couple of problematic queries but others have built up extensive databases of thousands of best bets.
You can see examples in Peter Morville’s Best Bets collection on Flickr:
http://www.flickr.com/photos/morville/collections/72157603790587909/
Some search systems have Best Bets functionality as standard (surprisingly, SharePoint is one of these) or you can have something bespoke added. The first system I ever worked with was just a basic text file that I edited and uploaded to the server – you should be able to get something better than that!
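If you do end up commissioning something bespoke, the core concept really is that simple. Here’s a minimal sketch in Python – the queries, titles and URLs are invented, and a real system would load the mapping from a file or database that the editorial team maintains:

    # Minimal best-bets lookup: editorial picks pinned above the
    # engine's own results. The mapping below is illustrative only.
    BEST_BETS = {
        "annual report": [("Annual report", "/about/annual-report")],
        "jobs": [("Work for us", "/jobs")],
    }

    def results_with_best_bets(query, engine_results):
        """Return any best bets for the query, then the engine's results."""
        picks = BEST_BETS.get(query.strip().lower(), [])
        pick_urls = {url for title, url in picks}
        # Drop engine results that duplicate a best bet
        return picks + [(t, u) for (t, u) in engine_results if u not in pick_urls]

The point is that the lookup sits in front of the engine, so the engine itself doesn’t have to change at all.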
A Bad Idea?
Kas Thomas thinks that we shouldn’t do best bets:
“In point of fact, the search software should do all this for you. After all, that’s its job: to return relevant results (automatically) in response to queries. Why would you sink tens (or hundreds) of thousands of dollars into an enterprise search system only to override it with a manually assembled collection of point-hacks? Sure, search is a hard problem. But if your search system is so poor at delivering relevant results that it can’t figure out what your users need without someone in IT explicitly telling it the answer, maybe you should search for a new search vendor.”
http://www.cmswatch.com/Trends/1286-Best-bets:-a-worst-practice
This is the sort of language I expect from the vendors but it is a bit surprising from industry analysts. Yes, the search systems should be good enough. But they’re not. They’re certainly not good enough without a lot of work. A lot of expensive work. If your supplier says “the search is really good, you don’t need to worry about it” then you definitely need to worry about it.
As James Robertson says “No amount of tweaking of metadata or search configuration will… ensure that the most relevant results always appear at the beginning of the list.”
http://www.steptwo.com.au/papers/cmb_bestbets
Oh and IT shouldn’t be managing the Best Bets anyway. In the teams I’ve worked with it has always been an editorial or product management role. After all, why would you build a simple tool to allow editorial intervention and then ask IT to put the content in?
A simple best bets solution that can be maintained by editorial/product teams, rather than scarce technical experts (or worse, expensive consultants), is often a better business solution than battling with the search algorithm to try and get it right for every scenario. Particularly on a tight budget.
Other pros for Best Bets:
- Just fixes that problem. It doesn’t change any other results. There’s no mysterious black box that has you banging your head against the desk, wondering why changing Property X to fix the results for Query Y made the results for Query Z change like that.
- Fixes the problem straight away. You don’t have to wait for the next crawl or even for an emergency crawl to finish. Sometimes it really is that important. Other times someone else thinks it really is that important and you want them to leave you alone now.
- Buys you time whilst you improve the algorithm.
Managing Best Bets
The critics are however correct that Best Bets have some drawbacks. You have to create and maintain them. If you let the links break then you’ve created a worse user experience than the one you set out to fix.
- Don’t go overboard. Only create them where there are clear problems.
- Plan for maintenance time. Who is going to add Best Bets and when? Do they have time to check existing Best Bets?
- Make sure you have access to search logs so you can see what terms users might be having difficulties with
- If possible, set up a broken and redirected link checker to run over the Best Bets (a minimal sketch follows below)
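The checker doesn’t need to be anything grand either. A minimal sketch in Python, assuming a hypothetical file with one Best Bet URL per line, that flags anything broken or redirected for an editor to review:

    import urllib.request
    import urllib.error

    def check(url):
        """Report whether a URL is OK, redirected, broken or unreachable."""
        req = urllib.request.Request(url, method="HEAD")
        try:
            with urllib.request.urlopen(req, timeout=10) as resp:
                # urlopen follows redirects, so compare the final URL
                if resp.geturl() != url:
                    return "REDIRECTED to " + resp.geturl()
                return "OK"
        except urllib.error.HTTPError as e:
            return "BROKEN (%d)" % e.code
        except urllib.error.URLError as e:
            return "UNREACHABLE (%s)" % e.reason

    # best_bets_urls.txt is a made-up name: one URL per line
    with open("best_bets_urls.txt") as f:
        for line in f:
            url = line.strip()
            if url:
                print(url, "->", check(url))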
And yes, do look at what your Best Bets tell you about the weakness of your search system. If you have the permissions and the skills you may be able to put that knowledge to use in improving the algorithm. But even if you can’t make the changes yourself and there’s no budget for incremental changes (which there often isn’t) then you can at least start building a business case for a search improvement project.
Designing the display
It is tempting to strongly highlight the Best Bets to draw attention to them but this is one area where usability testing tells us a different story.
Users demonstrate a very strong preference for the first ‘ordinary’ looking search result, which is presumably a behaviour they have learnt from web search engines. With web search engines, any result that is styled slightly differently is probably an ad. Some users didn’t even notice the best bets existed, even when we had tried to draw attention to them. This may be a similar situation to banner blindness.
So don’t make a song and dance about it. We might feel the need to tell the user all the effort we’ve put into helping them but ultimately they just want the right result for their query. And they don’t care how it gets to the top of the results, so long as it is at the top of the results.
(Think about it. You’d never highlight a set of the results with a label saying “Brought to you by the IA tweaking the algorithm to weight page title more heavily”)
3 steps to happy Best Bets
In summary:
- If the system you are buying doesn’t come with a built-in Best Bets system, see if you can get a simple one added on. Think of it as a safety net for once all the developers and project managers have packed up and left you to your own devices.
- Put them at the top of the search results. If you feel the need to style them differently then keep the styling as minimal as possible
- Don’t get carried away and make sure you maintain those links!
SharePoint search: Inside the Index book ‘review’
Inside the Index and Search Engines is 624 pages of lovely SharePoint search info. It is the sort of book that sets me apart from my colleagues. I was delighted when it arrived; everyone else was sympathetic.
The audience is “administrators” and “developers”. I’m never sure how technical they are imagining when they say “administrators” so I waded in anyway. The book defines topics for administrators as: managing the index file; configuring the end-user experience; managing metadata; search usage reports; configuring BDC applications; monitoring performance; administering protocol handlers and iFilters. I skimmed through the content for developers and found some useful nuggets in there too.
Contents:
1. Introducing Enterprise Search in SharePoint 2007
2. The End-User Search Experience
3. Customizing the Search User Interface
4. Search Usage Reports
5. Search Administration
6. Indexing and Searching Business Data
7. Search Deployment Considerations
8. Search APIs
9. Advanced Search Engine Topics
10. Searching with Windows SharePoint Services 3.0
The book begins by setting the scene, with lots of fluff about why search matters and some slightly awkward praise for Microsoft’s efforts. It gets much more interesting later, so you can probably skip most of the introduction.
Content I found useful:
Chapter 1. Introducing Enterprise Search in SharePoint 2007
p.28-33 includes a comparison of features for a quick overview of Search Server, Search Server Express and SharePoint Server.
“Queries that are submitted first go through layers of word breakers and stemmers before they are executed against the content index file. Word breaking is a technique for isolating the important words out of the content, and stemmers store the variations on a word” p.32
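Roughly what that means, with a made-up query:

    query: "running shoes"
    word breaking -> [running] [shoes]
    stemming -> running also matches run, runs, ran; shoes also matches shoe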
Keyword query syntax p.44
- maximum query length 1024 characters
- by default is not case sensitive
- defaults to AND queries
- phrase searches can be run with quote marks
- wildcard searching is not supported at the level of keyword syntax search queries. Developers could build this functionality using CONTAINS in the SQL query syntax
- exclude words with a leading minus sign e.g. -braille
- you can search for properties e.g. rnib author:loasby
- property searches can include prefix searches e.g. author:loas
- properties are ANDed unless it is the same property repeated (which runs as an OR search); some example queries below
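Some example queries to make those rules concrete (rnib, author:loasby and author:loas come from the book’s own examples; the rest are invented):

    talking books                -> talking AND books (implicit AND)
    "talking books"              -> exact phrase
    magazines -braille           -> magazines, excluding items containing braille
    rnib author:loasby           -> keyword plus a property restriction
    author:loas                  -> property prefix search
    author:loasby author:smith   -> same property repeated, so ORed
    author:loasby title:shop     -> different properties, so ANDed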
Search URL parameters p.50
- k = keyword query
- s = the scope
- v = sort e.g. “&v=date” (example URL below)
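Putting those together, a hypothetical search centre URL (the host and scope name are made up):

    http://intranet/SearchCenter/Pages/results.aspx?k=talking%20books&s=Publications&v=date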
Chapter 4: Search Usage Reports
Search queries report contains:
- number of queries
- query origin site collections
- number of queries per scope
- query terms
Search results report contains:
- search result destination pages (which URL was clicked by users)
- queries with zero results
- most clicked best bets
- search results with zero best bets
- queries with low clickthrough
Data can be exported to Excel (useful if I need to share the data in an accessible format).
You cannot view data beyond the 30-day data window. The suggested solution is to export every report!
Chapter 5: Search Administration
You can manage the crawl by:
- creating content sources
- defining crawl rules: exclude content (can use wildcard patterns; example below), follow/noindex, crawl URLs with query strings
- defining crawl schedules
- removing unwanted items with immediate effect
- troubleshooting crawls
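On the crawl rules: as I understand it they are evaluated in order and the first match wins, so exclusions sit above the catch-all include. A conceptual sketch with invented paths (the real rules are configured through the admin UI rather than a file):

    http://intranet/archive/*   exclude
    http://intranet/*?*         exclude (skip URLs with query strings)
    http://intranet/*           include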
There’s a useful but off-topic box about file shares vs. SharePoint on p.225
Crawler can discover metadata from:
- file properties e.g. name, extension, date and size
- additional Microsoft Office properties
- SharePoint list columns
- meta tags in HTML
- email subject and To fields
- user profile properties
You can view the list of crawled properties via the Metadata Property Mappings link in the Configure Search Settings page. The Included In Index column indicates whether the property is searchable.
Managed properties can be:
- exposed in advanced search and in query syntax
- displayed in search results
- used in search scope rules
- used in custom relevancy ranking
Adjusting the weight of properties in ranking is not an admin interface task and can only be done via the programming interface.
High Confidence Results: a different (more detailed?) display for results that the search engine believes are an exact match for the query.
Authoritative Pages
- sites central to high-priority business processes should be authoritative
- sites that encourage collaboration and actions should be authoritative
- external sites should not be authoritative
Thesaurus p.291
- an XML file on the server with no admin interface
- no need to include stemming variations
- different language thesauri exist. The one used depends on the language specified by client apps sending requests
- the English files are tseng.xml and tsenu.xml (a sketch of the format follows below)
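From memory the format looks roughly like this – check the sample files on the server before editing, and note the RNIB expansion is my own invented example. An expansion makes all its terms interchangeable at query time; a replacement substitutes the pattern with the listed terms:

    <XML ID="Microsoft Search Thesaurus">
      <thesaurus xmlns="x-schema:tsSchema.xml">
        <expansion>
          <sub>rnib</sub>
          <sub>royal national institute of blind people</sub>
        </expansion>
        <replacement>
          <pat>colour</pat>
          <sub>color</sub>
        </replacement>
      </thesaurus>
    </XML>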
Noise words p.294
- language specific plain text files, in the same directory as the thesaurus
- for US English the file name is noiseenu.txt (extract below)
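The format couldn’t be simpler: one word per line, and listed words are ignored at both index and query time. An invented extract:

    a
    an
    and
    the
    of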
Diacritic-sensitive search
- off by default
Chapter 8 – Search APIs
Mostly too technical but buried in the middle of chapter 8 are the ranking parameters:
- saturation constant for term frequency
- saturation constant for click distance
- weight of click distance for calculating relevance
- saturation constant for URL depth
- weight of URL depth for calculating relevance
- weight for ranking applied to non-default language
- weight of HTML, XML and TXT content type
- weight of document content types (Word, PP, Excel and Outlook)
- weight of list items content types
They’ll come in handy when I’m baffled by some random ranking decision that SP has made.
Chapter 9 – Advanced Search Engine Topics
Skipped through most of this but it does cover the Codeplex Faceted Search on p.574-585
A good percentage of the book was valuable to a non-developer, particularly one who is happy to skip over chunks of code. I’ve seen and heard a lot of waffle about what SharePoint search does and doesn’t do, so it was great to get some solid answers.
Inside the Index and Search Engines: Microsoft® Office SharePoint® Server 2007
Related posts
SharePoint search: some ranking factors
SharePoint search: good or bad?
use of Google Analytics
Where search analytics is concerned it appears the RNIB is actually doing what everyone else is doing i.e. using Google Analytics:
“The use of Google Analytics is very much on the increase. Just under a quarter of responding organisations (23%) now use Google Analytics exclusively compared to only 14% a year ago.
A further 57% of respondents are using Google Analytics in conjunction with another tool (up from 52% in 2008), which means that 80% of companies are now using Google for analytics compared to 66% last year…The majority of responding companies believe that they have set up Google Analytics properly.
There is more doubt among those who do not use Google exclusively, with 23% of these respondents saying they don’t know if it has been properly configured”
And these days I’m firmly in the 46% camp mentioned below:
“since 2008 there has been an increase from 8% to 15% of companies who have two dedicated web analysts and a decrease in the proportion of companies who have one analyst (from 32% to 26%).
But while this is a positive development, it can also be seen that exactly the same proportion of companies (46%) report that they do not have any web analysts.”
redesigning NTEN.org
The Nonprofit Technology Network have been sharing lots of info about their ongoing site redesign:
We’re going to make sure our site architecture is sound before we worry about making it purty.
The story so far:
* We started with a card sort. Rebecca Sherrill, our Information Architect at Beaconfire, has written a terrific synopsis of that process, with definitions, a walk-through of the process, and an overview of the findings. You should read it.
* Building on the results of the card sort and an Audience Matrix (Excel) we had filled out earlier, Beaconfire produced a draft site map. Holly and I worked with them in a conference call to revise the map (PDF), then brought the entire staff into the process during our weekly staff call.
Beaconfire now has our feedback, which they’ll use to refine the site map, then produce a wireframe version of the site.
via Redesigning NTEN.org: of Card Sorts and Site Maps | NTEN: The Nonprofit Technology Network.
conferences to learn IA at
I was asked for some advice about conferences to attend if you are just learning about IA.
IA Summit
The IA Summit is still the main event of the IA year. There are usually 3 days of multi-tracked presentations preceded by 2 days of workshops. It is certainly a great place to meet IAs and to get a feel for what is currently capturing IAs’ imaginations. The pre-conference workshops usually include some good ones for people starting out. That said, the conference presentations themselves are more and more about general UX and web design. There’s a lot of philosophy and strategy talk, and many presentations are focused on what highly experienced IAs should do next. If you are new to IA you might struggle to find more than a couple of presentations about the details of the craft.
The conference is very good value for money, especially when you consider how well fed you will be. If you are US-based then definitely go and make sure you sign up for a pre-conference. For everyone else, go if you can get your company to pay, otherwise consider some of the more local options (assuming you have them!).
EuroIA
EuroIA is the younger sibling of the IA Summit. Still very good value for money but slightly smaller and with fewer of the IA big names. It can actually be a better place to get to grips with the basics as the European market is a bit less developed and there are still plenty of people wanting to talk about tackling typical IA projects.
Oz IA, IA Konferenz, Italian IA Summit
There’s a growing number of country specific IA conferences. They’ve got a good track record of attracting well known speakers for the main presentations. If your country runs one of these, I’d definitely suggest attending your local conference first. Just make sure you can speak the language!
Usability Week
An expensive option, especially if you go for the full five days. In spite of the title you can do two full days of IA tuition and you’ll get taken through the basics in a structured way. Just don’t expect small tutor groups. The tutorial audiences are huge. A good intro if your company has deep pockets but I’d be wary of shelling out for this myself.
UX London
In spite of the name, this was actually a good alternative to the IA Summit for Londoners, with many of the same regular speakers. For learners, Donna Maurer’s workshops would have been a great start and the rest of the event a good chance to see the usual suspects speak. Hopefully this will happen again next year.
IDEA
Oddly the IA Institute’s own conference isn’t really about the craft of IA, more the philosophical and creative landscape it sits within. Fascinating stuff but if you are new to IA you should go for the pre-conference workshop which tends to be more practical.
UX Intensive
Not so much a conference, but actually my best recommendation to people looking to learn about IA. Adaptive Path run great training events and UX Intensive is a nice balance of detailed IA craft and the broader UX context. Not cheap but well worth the money. You can also choose to just attend the IA day.
None of these options are cheap. The cheaper conferences really need you to pay out for pre-conferences to get value for money. And most people will need to shell out for travel and accommodation too. In my new non-profit mindset I’ve been thinking about cheaper alternatives and that’s a topic I’ll come back to later.
uncategorized
I’ve been looking at lots of alternative format bookstores, as part of the e-commerce project. One of these was the Large Print Bookshop which has a category of ‘uncategorized’.
I’m trying to imagine the scenario when the user would think “I know…it’ll be in uncategorized”? Particularly given that the choices above are ‘fiction’ and ‘non-fiction’, surely one of the better examples of exhaustive options?
If Guy is still reading, I’d love to know the thinking…
e-commerce project: current state analysis
This article is part of a series about our e-commerce redesign.
I had some quiet time over Xmas and did some current state analysis of the online shop. I’m so glad I did this. As usual, as soon as the project actually kicks off there is limited time to do this sort of thorough research.
One of our business analysts has done a formal “as-is” review of the back-end processes but I’ve been concentrating on the front end user experience, particularly browsing the catalogues.
For my current state analysis I identified all the existing features. To do this I:
- took lots of screenshots, covering all the screen variations I found
- made a sitemap
- annotated the documents, identifying each separate element
Just because we have all these features now, it doesn’t mean we want to keep them. That said, during the website redesign we missed things that were working really well on the existing site. The site looks clunky and old-fashioned but there are some nice features in there. So I wanted to make sure I genuinely knew the site inside out.
The functionality basically breaks down into:
- arriving on site (including via search engines)
- finding and choosing items
- information about purchasing
- registering
- adding to basket and purchasing
- tracking/cancelling
I’m going to concentrate on the first two areas for now.
Within the main shop (i.e. not the book shop) there are:
- a store homepage
- category pages (including sub-categories)
- product pages
There’s also a sitemap, terms and conditions, product news, pricing information, contact forms, and help information, but those three are the main page types.
The project already has a product backlog from an earlier attempt to kick it off. After I had annotated all my screenshots, I compiled a list of features and then compared that to the product backlog.
The backlog was missing the following elements:
- link from product page to product instructions
- link from product page to other product guide/pages
- link from category page to product category guide e.g. choosing a mobile phone
- information about product size
- offer product variations e.g. colour and size
- product image
- product image enlargement
- seasonal offers and selections e.g. Xmas
- alternative ordering information e.g. call this number
- VAT price + non-VAT price
- login as different types of shopper
- links to T&Cs
- communicate different delivery prices (free, special + Xmas)
This flagged up for me a problem with the way the backlogs were generated. Stakeholders contributed ideas for features they wanted to see but tended to assume they would automatically get all the functionality they already have. Even with this process, I almost left search off the list, as it is part of the main website navigation and I was ignoring the standard page “furniture”.
Some of these gaps would indeed be obvious as we built the site but a number are not standard e-commerce functionality and it is entirely possible that the project team wouldn’t have thought of them independently. So for me the current state analysis catches functionality that might otherwise have slipped through the net.
Next: business requirements
charity e-commerce project
This article is part of a series about our e-commerce redesign. The series includes Current state analysis and Business requirements.
When I tell my friends that I’m working on an e-commerce project they look a bit baffled. It isn’t something that people immediately think of in relation to charities.
But we make/publish and sell a lot of stuff: books (braille, large print, audio etc), magazines, watches, telephones, kitchen equipment, mobility aids, remote controls, headphones, clocks, calendars, software, board games, playing cards, lamps, and batteries.
Our resource centres are also shops, and we have a moderately sized warehouse in Peterborough.
I’ve mentioned the bump-ons before, and there are plenty of other favourite products in there too.
The first thing you notice when you go to the RNIB shop is that this page talks about two separate “stores”.
“At present our Online Shop and Book site are separate. You will need to register in each store to buy a product or listen to book.”
Obviously less than ideal.
Once you get into the stores it becomes obvious that the shop doesn’t feel like a normal online shop. There are some basic patterns and conventions about how online shops look that the site isn’t consistent with. That makes the site a bit confusing: you have to actually read everything properly… you have to think about what you are doing. The product pages themselves are ok but the lack of images in the browse pages means the site doesn’t scream shop at you.
We’re just starting the project to relaunch the shop now, so I’m going to be digging a bit deeper. The goals are roughly to re-brand, improve the user experience and improve the back-end processes. At the moment it is just fun to be designing a shop.
Next: Current state analysis
opportunities in search logs: the geographical element
This article is part of a series about search log analysis which includes what people are searching for, bounce rates and spotting real opportunities.
Following on from yesterday’s post Spotting real opportunities in search logs I’ve been looking at what geography can add to the picture.
Anecdotally I’d heard that a lot of our Helen Keller referrals were “just American school children”. Google Analytics seems to validate this. From within the keywords report you can select a keyword and then set a dimension of continent or country. That gives you the data about where in the world Google thinks those keywords are coming from.
Helen Keller: mostly North American
Excel shortcuts: mostly Asia and North America
Fundraising ideas: largely European
Triathalons: more European
This changes some of my initial reactions to the opportunities each term represents. The non-UK traffic is still valuable to us, but this information could affect what other kinds of content we try to promote to these users, e.g. the volunteering opportunities are all UK-based so we’re unlikely to be able to cross-promote those to non-UK users.
(I’ve been trying to work out how to set up a custom report for keywords by continent but can’t quite crack it. Any suggestions?)
spotting real opportunities in search logs
This article is part of a series about search log analysis which includes what people are searching for, bounce rates, and the geographical element.
Some users’ attention is worth more to you than others. Most of us are not in the raw attention business. We want traffic, we want referrals, we want pageviews, but all as a means to an end. E-commerce sites want those users to buy something. Charities want them to donate or campaign or take up a service. Bloggers want them to read their ideas (for all sorts of further reasons). Lots of sites want you to look at/click on their adverts. The BBC? That one’s a bit trickier. But in general you get the idea.
But that fact sometimes seems to get a bit lost.
Lots of people have got the idea that Google is important. Some are still struggling with it or missing it entirely but mostly people in the industry have got that Google matters. For some reason.
And lots of people are looking at their analytics and recognising that there is gold dust in there. But as with so many things a little knowledge is a dangerous thing.
I’ve been in lots of conversations and seen lots of reports that jump straight from “metric A is low” to “we must, at all costs, improve metric A”. If you ask why then they tell you about the importance of Google. With search logs these conversations mostly seem to revolve around poor bounce rates and low referrals for particular terms.
Google brings you that all important attention. But some attention is more important. Some attention represents a good business opportunity and some is a dead-end. Where the cost is minimal then sure, why not maximise attention. But if there is cost involved (and there nearly always is) then you need to be making business decisions about what you are trying to achieve.
I’m trying to think about this in 4 stages:
- Attention is only a step. Do you know what you are trying to achieve? If not, put down the metrics and go back to the strategy whiteboard
- Look for strong opportunities. Don’t try and succeed with everyone. What can the metrics tell you about these users and how likely they are to help you meet your goals? Not masses, admittedly, but more thoughts on this below…
- Your users are on a mission. Don’t try and persuade them to help you until you’ve helped them. This is classic seducible moment stuff. You might be unhappy with the bounce rate for a particular page but sticking promotions for other content above the content the user came looking for is only going to increase your bounce rate.
- Identify the hook. Given what you know about the users from the metrics (again you don’t have a lot to go on here) you need to think about what actually has a chance of holding their attention. If they are searching for homework help then they are unlikely to be captivated by content about creating a will. All things are possible but this one is unlikely.
So thinking about strong opportunities, I’ve been re-examining our search referral logs.
If the referring keyword explicitly refers to an RNIB service (Soccer Sight, See it Right, Talking Books) then we know we should be meeting these users needs. These are the obvious wins. If the metrics are bad then we probably need to sort this asap.
If the keywords explicitly relate to issues around sight loss then those users represent a good opportunity. We know they care or have some level of motivation to investigate the same issues that the RNIB is trying to promote.
But a lot of referring terms are neither RNIB- nor sight-loss-specific. Fundraising ideas, excel shortcuts, flash, triathalons could all be from users with no interest in the RNIB’s cause. They might be, but we don’t have any evidence. Each of these terms offers a different strength of opportunity.
Fundraising ideas: we can be reasonably sure that the users want ideas about how to fundraise. We can guess that these are people who are inclined to raise money for charities. Seemingly a good opportunity. But why are they searching Google for fundraising ideas? Probably because they have a cause they are trying to raise money for. That probably isn’t us. So these users may be an opportunity but they’re unlikely to be a quick win.
Excel shortcuts: for some reason these users want information about Excel shortcuts and it may have nothing to do with sight loss. Could be RSI or just improved efficiency. They might want other shortcuts and they might have empathy with the difficulties keyboard-only users experience. Possible opportunity.
Flash: very hard to decode this one. It is unlikely to be Flash developers (physicists don’t usually search for physics). The bounce rate is high and the bounces are fast, so we know they didn’t want the content they ended up with. So we’d have to work out what they wanted, then provide that, and then engage them further. Doesn’t seem such a great opportunity.
Triathalons: probably just users thinking about taking part in a triathlon, rather than the money raising potential of a triathlon. But they will probably need or be able to choose to raise money as part of their sporting endeavour. And they may well not have a strong charitable allegiance already. Good opportunity.
And what about Helen Keller? This represents a huge amount of attention for us but does it help us meet any goals? We think (but don’t know) that this traffic is teachers and schoolchildren, probably primary age. They will be thinking about sight loss and the impact on individuals so it should represent a good opportunity. But they are also thinking about cutting and pasting and getting homework done. Kids can be great fundraisers. We want to start life-long relationships. This could be a great opportunity but also a huge challenge. We don’t understand this space enough. And the logs won’t answer these questions, they can only take you so far. We’ll have to talk to real people.
Next: the geographical element