Information Diet

Information foraging theory gives us fruitful insights into how users interact with information, and we have looked at the concepts of information scent and information patches. Another concept from information foraging theory is that of information diet, which is today’s subject.

The popular version of the concept is that people looking for information have a variety of options for meeting their information needs, and that they choose which ones to use based on some notion of cost/benefit (profitability), guided by information scent. Again, this concept has biological origins: animals choose food based on its nutritional, energetic and even medicinal value, trying to minimize cost and maximize value.

The authors of the concept take this further, arguing that there is a strategic component: the user builds a diet from the most profitable resources available, then adds progressively less profitable ones, but stops adding a type of resource once its profitability becomes quantifiably too low, that is, lower than the rate of return the diet already delivers.
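
The strategic rule can be made concrete with the classic diet model from optimal foraging theory, which Pirolli and Card adapt to information. The Python sketch below is a minimal illustration under stated assumptions, not their implementation: each resource class has an average value per item (g), an average handling time per item (t), and an encounter rate while searching (lambda), and all the numbers are invented. Classes are ranked by profitability g/t and added while the next class is at least as profitable as the overall rate of gain R = sum(lambda*g) / (1 + sum(lambda*t)) of the diet built so far.

```python
from dataclasses import dataclass


@dataclass
class ResourceClass:
    """One class of information resource, e.g. blog posts or papers."""
    name: str
    value: float      # g: average gain per item (arbitrary units)
    handling: float   # t: average time to process one item
    encounter: float  # lambda: items encountered per unit of search time

    @property
    def profitability(self) -> float:
        """Gain per unit of handling time, g / t."""
        return self.value / self.handling


def rate_of_gain(diet: list[ResourceClass]) -> float:
    """Overall gain per unit time: sum(lambda * g) / (1 + sum(lambda * t))."""
    gain = sum(c.encounter * c.value for c in diet)
    time = 1.0 + sum(c.encounter * c.handling for c in diet)
    return gain / time


def choose_diet(classes: list[ResourceClass]) -> list[ResourceClass]:
    """Add classes in decreasing order of profitability, stopping at the first
    class that is less profitable than the rate the diet already achieves."""
    diet: list[ResourceClass] = []
    for rc in sorted(classes, key=lambda c: c.profitability, reverse=True):
        if rc.profitability < rate_of_gain(diet):
            break
        diet.append(rc)
    return diet


# Made-up numbers, for illustration only.
options = [
    ResourceClass("practitioner blogs", value=2.0, handling=0.5, encounter=1.0),
    ResourceClass("original paper", value=12.0, handling=4.0, encounter=0.2),
    ResourceClass("video", value=3.0, handling=5.0, encounter=0.5),
]
diet = choose_diet(options)
print([c.name for c in diet])        # ['practitioner blogs', 'original paper']
print(round(rate_of_gain(diet), 3))  # 1.913
```

With these invented numbers, video is dropped: its profitability (0.6) is below the roughly 1.9 per unit time that blogs and the paper already deliver. A different user or situation means different numbers, and therefore a different diet.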

The user’s notion of profitability is highly situational and unknowable to an individual web page or document, but the diet model encourages us to think about the whole set of resources we could offer to meet a user’s information needs, with various characteristics, rather than just focussing on one resource at a time.

Let’s take a real-world example, namely my own learning about information foraging. There was an abundance of resources and no way that I could read them all. I had the options of reading blog posts, original papers, or books by leading researchers, watching a YouTube video, or attending international conferences, to name a few.

Which did I choose, and why? The following extended example might seem a bit conversational, but please regard it as an example of user research that has already been done for you.

Initially, I was looking for quality information or resources, a high-level understanding of the information space, and an early assessment of whether I could use these in my work. I also wanted to pursue my investigation right away, and subsequently at times convenient to me, so I chose to start with web resources.

I trust my ability to skim, and didn’t want to spend too much time, so I ruled out videos (can’t skim).

I ruled out SlideShare presentations (at this time), which often lack the presenter’s commentary and make me work too hard.

I searched for “information foraging” and the search results had a good information scent, specifically the presence of some of the leading authorities in the user experience world. Good, this is not just an academic discipline. I looked for practitioner blogs talking about applications of the theory, and found several. Their presence indicated I had found a likely patch, and I decided to press on.

My focus was now quite different. As a practising consultant, I was looking for two things:

  • are there examples or case studies I can utilize when working with clients, and
  • are there general principles that I can apply in my analysis and design work, and teach to others.

I was prepared to spend more time, effort, and money to find out.

I bought and read parts of Designing The Search Experience: The Information Architecture of Discovery by Tony Russell-Rose & Tyler Tate. This gave me lots of concepts, examples, and stimulation. It didn’t provide guiding principles quite the way that I like them. Great, there’s room for me to make a contribution.

I read several practitioner blogs, and very soon ran into diminishing returns. There was a recurring pattern of naming the ingredients of the theory, providing a biological example, and giving valid but largely repetitive examples, such as the importance of good names for links. A few blogs drilled a bit deeper and I bookmarked them, but I essentially ruled blogs out as a resource class.

SlideShare presentations did become an eligible and productive resource class. Based on this work, I felt there was a lot more to be learned and applied, and was prepared to work harder and longer to find out.

My focus became different again. This time, I was looking for a deep understanding of information foraging, with a view to developing some guiding principles for my solutions design. I selected the 1999 paper on Information Foraging by Pirolli and Card, and read chunks of it, some many times. This paper has a very high density of useful concepts, and I spent a lot of time making sense of what I was reading, primarily from the perspective of how I could apply it in my work.

Now, I’m at the point where I’m integrating these concepts into my practice, and am not searching for additional resources. I have been left with strong go-forward scents. Pirolli has written a book on Information Foraging Theory, and has published a paper on social information foraging (search for “pirolli social information foraging”). The latter is especially high-scent for me, as I do a lot of work on collaboration.

The key things that this example points out are:

  • users think about resources in classes
  • users utilize these classes based on notions of profitability
  • there are some classes that will be highly used
  • there are some classes that might never be used
  • these notions of profitability are individual and situational
  • the user’s notions of profitability are heuristic and will sometimes be incorrect.

These concepts apply not just to online activities. As a segue into an upcoming discussion of cross-channel information foraging, you might want to look at the following exercise.

Exercise for the reader: describe the following story from the point of view of information foraging.

“Information Architect John is on a phone call with a colleague, Sally, who wants help redesigning their part of the intranet. Halfway through the call, when they want to walk through the current site, they realize that John doesn’t have access. They decide to use their corporate web conferencing solution. Neither John nor Sally knows how to set up the call. John does a search on the intranet. A lot of the search results are past meeting notes with conferencing details. There is some self-help material, but it is a ten-minute video. John rushes down the corridor and bursts in on a colleague, Nancy, asking if she can help. Nancy can and does, in real time, saving the day.”

Think about the information foraging perspective, and any thoughts that arise for solutions design.

Stay tuned.

The Information Artichoke Home Page | Search Table of Contents

 

Information Patches

Last time we explored the notion of information scent, and in this post we will talk about information patches. Both are thought-provoking ingredients of the Information Foraging model of search behaviour, which says that the way we look for information in complex situations has characteristics in common with the way an animal forages for food.

In the biological foraging model, a bear foraging for food would search for a berry patch, eat there till the point of diminishing returns, and then look for another patch, but not too soon.  By diminishing returns, we mean that it involves progressively more work to get a mouthful of berries once the patch has been picked over.

Is it plausible that we forage for information this way? Sometimes, yes. When I was learning what’s new in SharePoint 2013 information architecture, distinct topic areas revealed themselves, some examples being content types, hover panels, cross-site publishing, etc. I foraged in each of these until content started to look familiar, which was my point of diminishing returns.  No need to read the same thing a third time if there is nothing new.

Anyone reading this may challenge the comparison between biological and information foraging, as I did, but I am looking for ideas that are useful. One big idea for me was the notion that information can be patchy, with implications for defining patches, making them easy to find, and making it easy to search within them. That’s what we’ll explore in this post.

Let’s start with my paperwork. I have a smallish amount on my desk that I use constantly, a bookcase at hand with frequently used books, and lots of books on shelves in the family room, with groups for science fiction and travel, the remainder being a jumble.  This organization is designed to reduce the time to find material.  It doesn’t reduce the time to zero.  I still have to shuffle papers on my desk, or scan the travel books for the one on Istanbul.

So there are two aspects to searching when patches are involved:

  • finding a suitable patch
  • searching within a patch.

If we had woken up from a twenty-year sleep and hit the web for the first time, searching for a patch on the internet would be hit-and-miss. After a while, though, we learn authoritative sources, and these become patches, like BBC Food or Microsoft’s TechNet.

Of course, this naming of patches is nothing new. We have the Periodicals section in libraries, Chinatown and other named neighbourhoods in cities, and the spice, carpet, and gold sections of bazaars. It is simply under-exploited in web, and especially intranet, settings. In an intranet, for example, we typically identify patches through broad navigational labels such as HR, Social Club, etc., but there are many specialized patches we could create, such as microsites and dashboards.

Ecommerce sites have highly structured patches.  A carefully designed faceted structure gets us to the patch of all DVD players made by a certain manufacturer and in a certain price range.  Then we have to browse within the patch.

There is no rigorous definition of an information patch; rather, a patch is a collection with some level of thematic cohesion. So the following would be considered patches:

  • BBC Food
  • an At-A-Glance page of links for New Employees on an intranet
  • the sales and marketing portal in a corporate intranet.

Patches can contain sub-patches but at some level we reach pages with no information structure implied, for example Calgary weather.

What about a search results page?  Search results can be considered a dynamically generated patch. Those from the big search engines have thousands of hits for even the most unlikely queries, and do not usually indicate their patch structure.  In an intranet setting, where we can control metadata, we can do better and introduce search refiners to filter down to useful (by design) patches.

Pushing this to the extreme, we can omit the search results themselves and just list the patches and how many results there are in each patch.  So if an executive queried “SharePoint” in such a design, the search could identify that the company had done fifty SharePoint projects, there were eighteen employees skilled in SharePoint, we had five documents on corporate licencing, and eighty pieces of content to do with training.  In other words, the search concentrates on the structure of the information rather than detailed instances.
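
A patches-only results page amounts to grouping hits by a metadata field and reporting counts. Here is a minimal Python sketch, assuming each item carries a hypothetical `patch` tag; the naive substring search and the sample items are invented stand-ins, not how any particular search product works. `patch_summary` reports the structure of the results, and `refine` acts as a search refiner that narrows to one patch, which is where browsing within the patch would begin.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Item:
    title: str
    text: str
    patch: str  # hypothetical metadata field, assigned when content is tagged


def search(items: list[Item], query: str) -> list[Item]:
    """Naive case-insensitive substring match, standing in for a real engine."""
    q = query.lower()
    return [i for i in items if q in i.title.lower() or q in i.text.lower()]


def patch_summary(hits: list[Item]) -> list[tuple[str, int]]:
    """Describe the structure of the results: each patch with its hit count."""
    return Counter(i.patch for i in hits).most_common()


def refine(hits: list[Item], patch: str) -> list[Item]:
    """A search refiner: keep only the hits that fall in one patch."""
    return [i for i in hits if i.patch == patch]


# Invented sample content.
items = [
    Item("Intranet migration", "SharePoint project for Finance", "Projects"),
    Item("Portal rebuild", "SharePoint project for Sales", "Projects"),
    Item("Jane Doe", "Skills: SharePoint, taxonomy", "People"),
    Item("Licensing FAQ", "Corporate SharePoint licencing", "Licensing"),
    Item("Expense policy", "How to claim expenses", "Policies"),
]
hits = search(items, "SharePoint")
print(patch_summary(hits))  # [('Projects', 2), ('People', 1), ('Licensing', 1)]
print([i.title for i in refine(hits, "Projects")])  # ['Intranet migration', 'Portal rebuild']
```

The design choice worth noting is that the summary depends entirely on tagging: the counts are only as meaningful as the patch values the organization assigns.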

We can see that our main job as designers so far is creating a rich patch structure and helping our clients find the most useful ones.

Once we have found a patch, how do we search within it?  There are several approaches, depending on the content:

  • scan it manually, just like I do with my travel books
  • utilize any internal pathfinding that the designer has provided
  • look for significant scent identified and implemented by the designer.  In the eCommerce world, this might be product images, ratings, reviews, or specifications.

Our main job as designers here is providing scent and/or pathfinding appropriate to searching within a patch rather than finding a patch. Two distinct design exercises!

These concepts from Information Foraging don’t exhaust the range of concepts introduced. The biggest area remaining is how we as information foragers actually operate: how we decide how long to stay in a patch, and what information diet to subscribe to. My approach to the subject when I first encountered it was very much foraging, looking for aspects of the theory that could inform my analysis and design, and it is these aspects that I have shared.

Key names in this field are Peter Pirolli and Stuart K. Card.  They have written many papers; one that I like is called Information Foraging, available at http://act-r.psy.cmu.edu/papers/280/uir-1999-05-pirolli.pdf.

This is no mere blog post, but a seventy-odd-page technical report. The density of new concepts is staggering, with some fruitful ones mentioned only in passing. Be warned: different parts of the document have different information scents, with some narrative, some mathematical modelling, and some simulation. But the first twenty pages are accessible to everyone, and the key point of the modelling section is Figure 5 (Charnov’s model).
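
Charnov’s marginal value theorem, the idea behind that figure, can be checked numerically. The Python sketch below assumes a hypothetical diminishing-returns gain curve g(t) = G(1 - e^(-t/tau)) with made-up values for G and tau, and a between-patch time T_B; it picks the time in the patch that maximises the overall rate g(t) / (T_B + t). The theorem predicts that at that time the marginal gain g'(t) equals the overall rate, so the script prints the two side by side.

```python
import math

TOTAL = 10.0  # G: hypothetical total value available in a patch
TAU = 3.0     # tau: hypothetical time constant, how fast the patch gets picked over


def gain(t: float) -> float:
    """Cumulative gain after time t in a patch (diminishing returns)."""
    return TOTAL * (1.0 - math.exp(-t / TAU))


def marginal_gain(t: float) -> float:
    """Derivative of gain(t): the instantaneous rate of gain inside the patch."""
    return (TOTAL / TAU) * math.exp(-t / TAU)


def overall_rate(t: float, between: float) -> float:
    """Average gain per unit time if each patch takes `between` time to reach."""
    return gain(t) / (between + t)


def best_residence_time(between: float, t_max: float = 50.0, step: float = 0.01) -> float:
    """Grid search for the time in a patch that maximises the overall rate."""
    times = (k * step for k in range(1, int(t_max / step) + 1))
    return max(times, key=lambda t: overall_rate(t, between))


for between in (1.0, 5.0, 20.0):
    t_star = best_residence_time(between)
    # At the optimum, marginal gain and overall rate should roughly coincide.
    print(f"between-patch time {between:5.1f}: leave after {t_star:.2f}, "
          f"marginal {marginal_gain(t_star):.3f}, overall {overall_rate(t_star, between):.3f}")
```

The qualitative prediction is the useful one: the longer it takes to reach the next patch, the longer it pays to stay and pick over the current one, which is the bear leaving “but not too soon”.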

Stay tuned.


 

Information Scent

From the examples in the previous post in the Search series, we have seen that searching is a more constructive activity than punching in search terms and consuming the result. One aspect of this is the ability of the user to quickly scan information looking for characteristics which they can employ to assess relevance. These characteristics are collectively known as “information scent”.

The definition “those characteristics of information which the user can employ to determine relevance” might sound a bit too vague to be useful, but in reality it invites us to think broadly about what those characteristics might be, and how as designers we might apply them to the information or interfaces we are involved with. It also invites us to consider how the user interacts with these characteristics.

Here’s how information scent might apply to a single search result. When I search for my recent blog posts using the query “information artichoke search”, the big search engines will include a result like this

Upcoming series on search | The Information Artichoke
theinformationartichoke.com/?p=390
Sep 27, 2013 – I’ve had an interesting few months advancing my understanding of search from combined information architecture and user experience …

Deconstructing this, I make the following observations:

  • The title “Upcoming series on search” – not bad scent, but I’m looking for the series itself, not an announcement. A user might, however, follow the link to see whether the page links to the series.
  • This thought reminds me that I need a table of contents page for the Search series. My other series on Modern Information Architecture has a table of contents, and I wonder how it shows up. Not well: its table of contents has the title “Table of Contents”. That is not quite what I intended; perhaps “Modern IA Course Table of Contents” would look better here. This is a good example of content that works well in a specific context (i.e. a richly linked blog) not working so well when some context is removed (i.e. in a set of search results)
  • My blog name “The Information Artichoke” appears in the title, although I didn’t ask the search provider to do this.  Depending on the reader, this might be considered noise or the name of a trusted authority and hence positive scent
  • I suspect the search provider knows that this is a blog post
  • The date appears in this search result, but not in all
  • The description “I’ve had an interesting few months … ” is nicely conversational within a blog, but not high scent here; a short description “We will be exploring such topics as information foraging, information scent, and sensemaking” would be better for a quick assessment.

So do these observations help anyone?  Personally, I suspect they could be applied to corporate settings where I am trying to improve enterprise search through the application of metadata and smarter formatting of search results.  I decide to give it a try.

I start by abstracting what I have found and make notes to myself, along these lines

  • A blog is an example of a type of content.  Others might be News, or Reference Material.  Coming up with content types to the finest level of granularity is hard work, but some high level typology might be helpful to categorize or refine search results. We might find that not all content needs to get tagged with a content type.
  • The name of my blog, The Information Artichoke, might be considered a source, publisher, or origin. Other content types might be tagged as coming from a particular department, or person, or from outside the organization altogether
  • The date might be especially valuable for blog postings, news articles, job postings
  • The meaning of the date value could be spelled out.  For a blog, it could be the date posted; for job postings it could be the date that the job was posted or the date the posting expires; I have an opinion, certainly, but user research will resolve this
  • Is there any significance to the date value?  We often see a  “New” indicator on fresh content, and could imagine other treatments such as red flags on action items
  • The description should be constructed to have high information scent. Scraping the first elements of the content does not give good results. For example, the search result descriptions for some of my blog posts include the names of the blog navigation elements; spreadsheets are even worse, showing the contents of a run of cells, such as 3.14, 1.41, 999.

We can desk-check our ideas by constructing some sample search results:

BLOG POST Upcoming series on search | The Information Artichoke
Posted Sep 27th, 2013
We will be exploring such topics as information foraging, information scent, and sensemaking.

JOB POSTING Senior Information Architect | HR
Closes Nov 5th, 2013 **NEW**
Required to formulate content strategy for sales and marketing

Employee Benefits Overview | HR
Last Updated August 27, 2013
Information on health, insurance, personal development allowance, and …
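
The rules behind these desk-checked results could be prototyped before involving developers. The Python sketch below is a minimal illustration, not any search product’s API: the `Result` fields mirror the notes above (an optional content type, a source, a labelled date, a written description), while the 14-day freshness window for the NEW flag, the publication dates, and today’s date are invented assumptions for user research to settle.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

FRESH_DAYS = 14  # hypothetical window for the **NEW** flag


@dataclass
class Result:
    title: str
    source: str                          # publisher or origin, e.g. a blog or department
    description: str                     # written for scent, not scraped from the page
    content_type: Optional[str] = None   # optional: not all content needs a type
    date_label: Optional[str] = None     # what the date means: "Posted", "Closes", ...
    shown_date: Optional[date] = None    # the date displayed next to the label
    published: Optional[date] = None     # drives the freshness flag


def render(r: Result, today: date) -> str:
    """Format one result as a title line, an optional date line, and a description."""
    prefix = f"{r.content_type} " if r.content_type else ""
    lines = [f"{prefix}{r.title} | {r.source}"]
    if r.date_label and r.shown_date:
        date_line = f"{r.date_label} {r.shown_date.strftime('%b %d, %Y')}"
        # Flag fresh content on the date line, based on its publication date.
        if r.published and 0 <= (today - r.published).days <= FRESH_DAYS:
            date_line += " **NEW**"
        lines.append(date_line)
    lines.append(r.description)
    return "\n".join(lines)


job = Result(
    title="Senior Information Architect", source="HR",
    description="Required to formulate content strategy for sales and marketing",
    content_type="JOB POSTING", date_label="Closes",
    shown_date=date(2013, 11, 5), published=date(2013, 10, 28),
)
benefits = Result(
    title="Employee Benefits Overview", source="HR",
    description="Information on health, insurance, personal development allowance, and ...",
    date_label="Last Updated", shown_date=date(2013, 8, 27), published=date(2013, 8, 27),
)
today = date(2013, 11, 1)
print(render(job, today))
print(render(benefits, today))
```

Keeping the meaning of the date in its own field (`date_label`) is what lets a job posting show its closing date while freshness is still judged from when it was posted.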

To proceed, we can stress test the concept on paper, explore business rules for any metadata needed, and talk to our UX colleagues to do usability testing on some of these ideas.  They can also suggest some visual treatments and iconography.

Information scent does not apply just to search results, but can apply to documents, pages, links and navigation, among others.

When it comes to links, the simplest implementation is a text link.  If the name does not provide high enough scent, then we can add a short description and/or a picture to increase the scent.  Another approach is to give the user a chance to lean in for a closer sniff, by providing alt text, or revealing more information when the cursor hovers over the link.

Here’s an example involving documents. I enjoy reading lecture notes in physics and have noticed distinctive aspects of scent that affect my assessment and seem reliable.   If the document is all text or all mathematics, it has a low information scent for me.  If the math is too hard for me, I rule out the document immediately.  Diagrams provide a positive scent.  Interestingly, I can make this assessment almost as soon as the document loads in the browser.

I have also noticed characteristics of lecture notes that reflect a distinct bias on my part.  If the format is PowerPoint slides, I notice myself downgrading the information scent, but am prepared to persist if the authority is high.  If the format is scanned handwritten lecture notes, I rule the document out without further investigation.

This last case is interesting. Most literature on information scent seems to focus on the user selecting stuff to pursue, but this example suggests that information scent is also useful for letting the user select stuff to rule out. Likewise, in our previous examples, somebody not interested in blogs could easily bypass them in a search results page, if they were easily identifiable. This is quite consistent with the biological origins of the concept, namely how animals use scent to identify good foraging opportunities. It also challenges the notion that we want to make content sticky. Perhaps “traversable” is a better goal.

To summarize, information scent consists of characteristics of information that we can use to make decisions about the value of that information, without having to read the information itself.

I’m not sure about the boundaries of the concept.  People are influenced by branding, layout, tone, and a host of other things that we can manipulate.  Some people are influenced by US vs. British spellings. Some people discount content with incorrect semi-colon usage.  Does this make all of these characteristics part of information scent or something else?

Looking back to the biological origins, we know that bees are attracted to flowers by their shape and colour as well as scent.  So maybe we could factor out visual elements. You probably didn’t know that bees can also detect the shape of cells comprising a leaf, and hence whether the leaf will afford their feet a good grip. http://www.botanic.cam.ac.uk/Botanic/Page.aspx?p=27&ix=2847&pi..

Personally, I’m not going to attempt to define the boundaries of information scent.  I will just adopt it as a useful concept to have in my toolkit, and apply where it might add value.

Stay tuned.


 

Deconstructing Some Searches

In the last post in this series, we speculated that search goals might be a useful characteristic to explore.  Let’s consider the interplay between search goals and search results in the four reference search tasks we introduced last time, namely:

  • what will the weather be in Calgary this weekend?
  • how can I fix the <model> laser printer in my home office, which is blinking with a certain sequence of lights?
  • what do I have to do to make my concrete path less slippery in winter?
  • how can I get up to speed on SharePoint 2013?

These were real searches I performed.  Here’s what happened.

Calgary Weather

The weather example is classical information retrieval.  My goal is specific, and I know through experience that I can fulfil it with the simple query “Calgary weather”.  If this recurring information is important to me, I can increase its prominence by delivering it through a persistent widget or app.

Laser Printer

In this case, the goal is explicit (“diagnose the problem, and fix it myself if possible”). I don’t have a guaranteed search query, but have fixed hardware before and have some ideas. I start with “troubleshoot <model> laser printer”. By the way, if I had previously looked unsuccessfully in my office for the printer manual, I might have started with “download manual <model> laser printer”. I do a quick scan of the search results to see if I am on the right track. I am not. [The ability to do a quick scan introduces the idea of information scent, which we will talk about in an upcoming post.] The high-frequency terms “laser” and “printer” overwhelm “troubleshoot”, stuffing the results with printer sales and printer reviews. I try “blinking lights <model>”, and get results that are recognizably useful, or in other words, have a high information scent. One of them helps me solve my problem.

Exercise for the reader: How would things be different if I ran a help desk and frequently had to help fix a variety of printers?

Reflecting on this as an information architect, a couple of things cross my mind.

  • I start to see that the world of laser printers has a little ontology (specifications, sales, use, problems, ratings, support), and wonder how I could exploit this.
  • And I wonder how I can learn how to manipulate information scent so that page viewers can assess the value of the page quickly.

Concrete Path

The slippery concrete path example was more problematical.  I am clueless when it comes to DIY, so my goal is not very explicit (“get something to make the concrete path less slippery”), and I expect a bit of fumbling around.  I am not disappointed.  I make a big misstep in the query by asking for “concrete surfacing”.  Adding “exterior” doesn’t help. I still get lots of product sites, talking about things I don’t understand.

I have an ah-ha moment, and enter “concrete path slippery”.  Good call, I find lots of question-answering forums answering my problem.  I still have work to do, but I know I’m in the right sort of place. Now, I have to drill into a variety of options in true evaluation and decision making mode.  In doing so, I learn about evaluation criteria such as appearance, permanence, cost, etc.

Reflecting on this as an information architect, a couple of things cross my mind.

  • I see a distinction between product sites and question-answering sites, and wonder if there are broad categories of site that I can exploit, perhaps as metadata in an enterprise search solution.
  • I also notice that I started to make notes on a piece of paper.

SharePoint 2013

The SharePoint 2013 example raised some other points.  I understand SharePoint 2010 very well when it comes to designing knowledge rich solutions, have formerly done a lot of development, and avoid infrastructure. Nevertheless, I chose an initial query “what’s new sharepoint 2013” to get an overview of the new stuff.

SharePoint 2013 is huge, and I realise that my goal is ill-formed. Over a period of time, I refine my goal, first to “how can I get up to speed on SharePoint 2013 search and information architecture?”, and finally to “what do I need to become a consulting information architect in SharePoint 2013”. This refinement of goal came about as a result of interacting with the whole information space, and turned out to be instrumental in screening content in or out.

This latest goal did not perform well as a query term. The query “how can I become a consulting information architect in SharePoint 2013” pulled in SharePoint consulting groups as well as SharePoint 2013 information architecture.  I needed a strategy for formulating my query.  My survey of the What’s New information had identified a list of focus areas that I felt I needed to drill into;  I used the names of these areas as the basis for a deep dive.

Reflecting on this as an information architect, I noticed

  • I worked in both survey mode and deep dive mode. I wondered how content providers support these different modes through different IA or UX patterns
  • There was quite an ecosystem of information sources
    • large, highly structured, richly interconnected sites from Microsoft
    • blogs providing practitioner experience, tips and tricks
    • training providers.
  • Some sites had curated overview content.  This takes work and categorization. It works well if the overview gets the audience right.  Microsoft had categories for the reader to select their role as Manager, Developer, etc.  I’ve done this sort of curation for New Employee Quick Reference microsites, and wonder whether there are general principles for when and how to do this
  • My “deep dives” were not infinitely deep.  Once I understood an area to a certain level, I stopped looking at that area and moved to a new one [information foraging]
  • I built my own information structures, ranging from annotated lists of links to scrappy diagrams to documents; some of these became permanent summaries, others were intermediate stepping stones to understanding, and got trashed when they had served their purpose.

All in all, I found this deconstruction of the four searches useful.  It started to cement my understanding of the intricacies of search, and it gave me enough thoughts for solutions design that I am encouraged to continue.

Stay tuned


 

Designing The Search Experience [Book]

Designing The Search Experience: The Information Architecture of Discovery
Tony Russell-Rose & Tyler Tate

This is an excellent and stimulating book!  I recommend it to anyone who wants a deeper understanding of how people search, and who strives to exploit this understanding in their solutions design.

The first part of the book, a Framework for Search and Discovery, is a well-referenced presentation of some behavioural attributes of information seekers, and different ways that they interact with information. It introduces concepts such as information scent, information foraging, and sensemaking, and follows this with sections on context and search modes.

This section has been consciousness-raising.  I have kept the framework in mind as I observed myself interacting with a variety of search tools; it has proven valuable in articulating my own behaviours and identifying how well (or not) my search tools support (or could support) these behaviors.

The second part of the book, Design Solutions, provides a wealth of attractively presented examples of user interfaces showing how the insights from the first part have been applied.  Some of the examples expose design decisions that we see every day in our experience with the large search engines. Others describe search interfaces that push(ed) the envelope in different directions.  Some of these no longer exist in the form presented.  Some no longer exist. But the ongoing struggle for improved search experiences is well represented.

So is this book for you?  If you’re looking for a paint-by-numbers book, afraid not. The struggle for improved search experiences is the theme of the entire book, and the authors are thoughtful practitioners and part of this ongoing struggle. If you share some of those characteristics, need to contribute in this space, and are looking for a quick journey to the leading edge, you will benefit greatly from this book.

By the way, from a coverage point of view, most of the examples come from search engines or consumer facing sites.  I personally work a lot in creating solutions for knowledge workers within an organization. The ideas presented in this book still apply.  In fact, with the ability to access our users, and the opportunity to define information structures tailored to their goals, I suspect we can meet their needs very convincingly.

Nice job!

There is a third part to the book on Cross-Channel Information Architecture which I haven’t read yet.

 

 

 

 

Different Faces of Search

I am confident that, over the last few weeks, we have all made numerous on-line searches.  For most of us, this is an unexamined activity. But as an information architect with a keen interest in user experience, I have become aware that I use several different search modes, and hope that by identifying useful distinctions, I can help build more useful search solutions.

Here are some of the search tasks that I have performed:

  • what will the weather be in Calgary this weekend?
  • how can I fix the laser printer in my home office, which is blinking with a certain sequence of lights?
  • what do I have to do to make my concrete path less slippery in winter?
  • how can I get up to speed on SharePoint 2013 Information Architecture?

From these, it is clear that there are different types of search task. But how do we describe the differences?  Some commentators talk about search modes, or seek to provide taxonomies of search activity.  While these recognize the variety inherent in the search task, and our roles as active participants as our query activities move us closer to our goals (or not), I find them unfulfilling in some ways.

Let me give a couple of examples. One popular framework has top levels Lookup, Learn, Investigate, which include activities such as Verification, Comparison, Analysis, Discovery, Synthesis. Another talks about Moves, Stratagems, Tactics, and Strategies.

Why didn’t I warm to these? When working as a solutions consultant, I look for practical distinctions that I can use when talking to clients and thinking about solutions. I’ve nailed this in a number of domains.  In document management, I can differentiate between static reference materials (such as policies), and operational documents (such as meeting minutes), and use these to focus discussion and shape solutions.  But the frameworks mentioned above didn’t do the same thing for search.  First, I would find it very awkward to talk to a client in such abstract terms.  Second, the frameworks are too broad to help me shape solutions.  And finally, I simply cannot apply them.  When I tried to apply them to my sample searches, it was a struggle, inconclusive and not insightful.

The problem is perhaps a mismatch between what I need and what the researchers were trying to provide. I was looking for entry points for solutions design; they were trying to provide a cognitive-behavioural framework, in an information-free context.

In my world, user research provides definite context.  Stimulated by other reading, and deconstructing my search examples, I came up with the following dimensions of the search task that I felt might be helpful.

  • how well is the goal defined
  • are the information resources and authorities well known
  • do we know what success looks like
  • can we tell when we are making progress toward our goal
  • are we time limited
  • does the goal result in the creation of an information artefact, in doing something in the real world, or in retrieving a data value
  • do we expect the goal will be accomplished easily and directly, or will there be fumbling around.

This is starting to feel better for me for a number of reasons.

  • I was able to create facets that I can shape and refine and explore, rather than being forced into wholesale design of a taxonomy or framework (which of course are two different ways that IAs approach information)
  • I can look at my sample searches, and easily see how they line up along these dimensions
  • I can glimpse some ways in which I can help end users and information providers improve their worlds.

In the next few posts, I will explore some of these dimensions.  Comments welcome.  Stay tuned.


Sources

For the Lookup, Learn, Investigate framework and critiques, search online for “Marchionini exploratory search”, especially http://www.inf.unibz.it/~ricci/ISR/papers/p41-marchionini.pdf

For the Moves, Stratagems, Tactics, and Strategies, search online for “Marcia Bates Moves, Stratagems, Tactics, and Strategies”, especially http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.16.362&rep=rep1&type=pdf

For descriptions of these and others, see the book “Designing The Search Experience” by Tony Russell-Rose and Tyler Tate.  See the book review for Designing The Search Experience in this blog.


Upcoming series on search

I’ve had an interesting few months advancing my understanding of search from combined information architecture and user experience perspectives.

It has been strangely self-referential, watching myself search about search.
Specifically, I created a battery of searches of different types, and checked my actual behaviours against the reference behaviours I encountered in the literature.

How did it turn out?  Well, I agreed and disagreed to various degrees, and came up with my own distillation that would be useful to me as a consultant and solutions designer.  Of course, this is exactly what the literature said I would do, and even gave it the name “sensemaking”.

So stay tuned as we explore such topics as information foraging, information scent, and sensemaking. I personally found these to be very thought-provoking, and hope you will too.

Handling Unstructured Data

A colleague of mine asked me what unstructured data is, why it is a concern, and what we can do about it. I gave him a quick explanation. Here’s a longer one.

First of all, what is unstructured data?  It is data that is not broken down into individual named pieces.  Examples might be a Word document, a web page, the body of an email, a text message, and a tweet.  Contrast this with a name and address record stored in a database, with well defined fields for name, address, email, etc.

Unstructured information is a problem because a huge amount of what an organization could know about itself is stored in formats that don’t lend themselves to finding things.  By the way, it might also be stored in locations that don’t lend themselves to finding things, such as personal folders, email, dropboxes, etc., but that’s another story.

As always, before solutioning, we should have scenarios in mind.  Let’s consider a couple.

  • our Helpdesk wants to be able to manage the large number of emails between it and its clients, in order to (a) improve responsiveness, (b) find a particular email, and (c) mine the set of emails for patterns
  • the client wants an enterprise search where its users can find intellectual assets as confidently, reliably and easily as they search a large on-line catalog for physical assets (think cameras, DIY products, clothing).

There are two main strategies for handling unstructured data:

  • adding structure
  • adding description.

We’ll talk about these in turn.

Adding Structure

Although we use the term “unstructured data”, the documents and emails we deal with are not random strings of characters but already have structures that we might be able to exploit.

For example, a news article has a predictable structure; by creating a template in a web content management system with separate fields for title, author, and body content, we can achieve benefits at local and enterprise levels:

  • locally, we can provide more powerful searches and filters for news articles; for example, the query “find articles authored by John Doe” is more specific than the basic full-text search “find articles containing the phrase John Doe”
  • enterprise wide, we want the article to show up in a comprehensive set of information about staff; so when we query John Doe, we get:
      Person: John Doe
        Skills
        Projects
        Publications
          News Articles
            “Handling Unstructured Data”
          White Papers
Sometimes a technology change will make it easier to add structure. For example, if we replace emails to the helpdesk with web forms, we are more likely to get a well-formed request, which improves cycle times and makes analytics easier to generate. To see the difference, contrast an unstructured email:

Hi, helpdesk!!! I just finished putting in a bunch of entries and got an error and now I can’t do anything.

with a structured form:

Reason for Call: Problem with a Program
Program Name: Tran120
What Were You Doing: Posting some invoices
Last Thing You Did: Posted the batch, and tried to create a batch template so I could use it again
Error Condition: Screen froze with error 22,B4.
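A small sketch of why the form helps with analytics, assuming each form submission is stored as a hypothetical `HelpdeskTicket` record with one attribute per field. Only the first ticket comes from the example above; the other two are invented sample data:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class HelpdeskTicket:
    """One submission of a hypothetical helpdesk web form."""
    reason_for_call: str
    program_name: str
    what_were_you_doing: str
    last_thing_you_did: str
    error_condition: str


tickets = [
    HelpdeskTicket("Problem with a Program", "Tran120", "Posting some invoices",
                   "Posted the batch, and tried to create a batch template",
                   "Screen froze with error 22,B4."),
    # The next two tickets are invented sample data.
    HelpdeskTicket("Problem with a Program", "Tran120", "Entering receipts",
                   "Saved the entry", "Error 22,B4."),
    HelpdeskTicket("Password reset", "", "Logging in",
                   "Entered my password", "Account locked"),
]

# Because each answer lives in a named field, counting needs no text parsing.
tickets_per_program = Counter(t.program_name for t in tickets if t.program_name)
print(tickets_per_program.most_common())  # [('Tran120', 2)]
```

With the free-form email, the same count would require someone (or something) to dig the program name out of the message text first.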

As a final example, we can add structure to short text messages using agreed-upon codes; for example, to vote for John Doe, text LUVJD to this number. This approach works fine for text messages. Unfortunately, it is also often used to structure document names, so we often see names like

RFP Response for Big Enterprise – final v 10 – Joe’s comments.

When we have a folder full of names like this, we realise we have a problem of clarity.  Full text search will not help, as all versions of this document will likely show up in the same search.  That’s why we need the other approach to handling unstructured data, namely “Adding Description” and that’s what we’ll talk about next.

Adding Description

Whether or not we can add structure to unstructured content, we often want to add description. This is done by adding metadata: information external to the content that describes some useful aspects of it. Much of the time, this is done to help us search or file the content, but different metadata can be used for other aspects of information management, for example describing how long the content is to be retained, or whether it is to be archived. We will focus on metadata for categorizing and finding content, but I want to make it clear that this is not the only possible purpose for metadata.

One part of the information architect’s job is to help the client define suitable metadata. We explore with the client how they might want to search, filter, and categorize their information. If we were dealing with a library of sales collateral, for example, we might consider searching it by product or business sector, and filtering on whether it was overview material or detailed, or whether it was aimed at a business or a technical audience.

Coming up with initial metadata ideas is not too hard, but additional work is needed to turn them into a practical tool. The sales collateral examples illustrate various situations that might be encountered:

  • Product
    • there might already be a product catalog that we can leverage
    • products might have both numbers and names
    • products can often be arranged in a hierarchical structure (look at a complex parts catalog or a consumer electronics site)
    • a document might refer to several products or product categories
  • Business Sector
    • this might already exist in the organization
    • if it doesn’t, we should consider looking for a scheme that could apply enterprise-wide, not just to sales collateral
    • we can help the user develop this scheme using card sorting and other familiar techniques, so the final result is likely to be comprehensible to users tagging and viewing content
  • Overview or Detail
    • this might not already exist
    • it might be difficult to get a definition of what we mean by Overview and Detail
    • even if we did, the final result might not be reliably comprehensible to users tagging or viewing content
    • we might have an ah-ha moment and realise that the sales staff already talk about their documentation in terms of Two-Pager, Briefing Notes, etc., and that Collateral Type might be a more useful piece of metadata, especially for internal audiences
    • going with Collateral Type, this metadata is likely to be applicable just to the Sales Department rather than being enterprise-wide.
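One way to make the sales-collateral metadata concrete is a set of controlled vocabularies that tags are checked against. In this sketch the vocabulary values are invented placeholders, not a real catalog, and the rule that only Product may take several values per document is an assumption based on the bullets above:

```python
# Hypothetical controlled vocabularies for the sales-collateral example;
# the values are placeholders, not a real product catalog.
VOCABULARIES = {
    "product": {"Camera X100", "Camera X200", "Tripod T1"},
    "business_sector": {"Retail", "Education", "Healthcare"},
    "collateral_type": {"Two-Pager", "Briefing Note"},
}

# A document might refer to several products; other fields take one value.
MULTI_VALUED = {"product"}


def validate(tags):
    """Return a list of problems with a document's tags (empty if valid)."""
    problems = []
    for field_name, values in tags.items():
        if field_name not in VOCABULARIES:
            problems.append(f"unknown field: {field_name}")
            continue
        if isinstance(values, str):
            values = [values]
        if field_name not in MULTI_VALUED and len(values) > 1:
            problems.append(f"{field_name} takes a single value")
        for value in values:
            if value not in VOCABULARIES[field_name]:
                problems.append(f"{field_name}: {value!r} is not in the vocabulary")
    return problems


print(validate({"product": ["Camera X100", "Tripod T1"],
                "collateral_type": "Two-Pager"}))  # []
print(validate({"collateral_type": "Overview"}))
# ["collateral_type: 'Overview' is not in the vocabulary"]
```

Checking tags against a fixed list is one simple way to keep "Overview" from creeping back in once the team has settled on Collateral Type.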

The next part of the information architect’s job, now that we have got the client excited, is to raise the question of how the metadata will get assigned to content.  There are two options: adding it by hand or adding it by program.

In a few lucky cases, adding metadata by hand is feasible.  This is the case where we have professionals whose job is corporate communications or corporate librarianship, who believe in the importance of tagging content, who understand the domain they are working in, where the volume of publications is small, and where the metadata structure is simple.  A feasible example might be corporate communications tagging internal news releases with category, any applicable departments, and any applicable projects.

Otherwise, there are a lot of “ifs” that might not hold. Some employees might not care about tagging the content they create. Some metadata might be so complex that it would not be feasible to reliably tag a big document with all applicable instances of metadata.

In these cases, we will need autotagging software to do the tagging. We still need to define the metadata that we will use. The autotagging software scans content and applies metadata tags based on rules that are set up and maintained by the organization. For example, an HR department might have the rules:

If the document contains the words “Human Resources”, tag it with “Human Resources”
If the document contains “HR”, tag it with “Human Resources”.
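The two rules can be sketched in a few lines of Python. This is not the syntax of any particular autotagging product: the rule format and the `autotag` function are hypothetical, and matching is assumed to be case-insensitive on whole words, which is how a page about the HTML hr tag can end up tagged as Human Resources:

```python
import re

# The two example rules as (pattern, tag) pairs. The rule format is
# hypothetical, not the syntax of any particular autotagging product.
RULES = [
    (r"\bHuman Resources\b", "Human Resources"),
    (r"\bHR\b", "Human Resources"),
]


def autotag(text, rules=RULES):
    """Return the set of tags whose pattern appears in the text.

    Matching is case-insensitive and on whole words (an assumption).
    """
    return {tag for pattern, tag in rules
            if re.search(pattern, text, flags=re.IGNORECASE)}


print(autotag("Please contact HR about your benefits."))  # {'Human Resources'}
print(autotag("Use the hr tag to draw a horizontal rule."))  # {'Human Resources'}
print(autotag("Quarterly sales figures"))  # set()
```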

The software makes it easy to test the applicability of rules and tweak them based on what we find.  For example, when we run the second rule on a set of documents, it will show which documents match the rule, like this

… contact HR at [email address] …
… the HR Department provides services …
… to get a horizontal rule in your web page, use the hr tag …

Here, the third match shows that we have learned something, and we can modify the rule:

If the document contains “HR” and not “tag” and not “HTML”, tag it with “Human Resources”.

Retesting the rule will now exclude the document talking about horizontal rules.
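Here is a sketch of the refined rule together with a tiny harness that prints which documents match, in the spirit of the retesting described above. The function names and sample sentences are hypothetical, and whole-word, case-insensitive matching is again an assumption:

```python
import re


def contains_word(text, word):
    """True if the word appears as a whole word, ignoring case (an assumption)."""
    return re.search(rf"\b{re.escape(word)}\b", text, flags=re.IGNORECASE) is not None


def hr_rule(text):
    """Refined rule: contains "HR" and not "tag" and not "HTML"."""
    return (contains_word(text, "HR")
            and not contains_word(text, "tag")
            and not contains_word(text, "HTML"))


def preview_rule(rule, documents):
    """Print each document with a marker showing whether the rule matched."""
    for doc in documents:
        print("MATCH" if rule(doc) else "-----", doc)


docs = [
    "If you have questions, contact HR by email.",
    "The HR Department provides services to all staff.",
    "To get a horizontal rule in your web page, use the hr tag.",
]
preview_rule(hr_rule, docs)
```

Exclusions like this are blunt instruments: an HR memo that mentions a "name tag" would now be missed, which is exactly why rules need retesting after every change.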

There can be quite an elaborate syntax for building tagging rules. Features we can use include Boolean operators, words that sound alike but are spelled differently, and pattern matching.

By the way, pattern matching has some interesting uses. For example, we can use it to tag documents that contain email addresses or phone numbers. This is useful if we want to scrub a set of documentation to make sure that it does not contain any personally identifiable information. And of course, using the rules, we could also test for the anti-bot form of email, name[at]provider[dot]extension.
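A minimal sketch of pattern-matching rules for a personal-information scrub, using Python regular expressions. The patterns are illustrative assumptions, not complete validators (real email and phone formats vary far more than this), and `pii_tags` is a hypothetical helper:

```python
import re

# Illustrative patterns only; real-world email and phone formats vary widely.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    # The anti-bot form, e.g. name[at]provider[dot]extension
    "obfuscated email": re.compile(
        r"\b[\w.+-]+\s*\[at\]\s*[\w-]+(?:\s*\[dot\]\s*[\w-]+)+", re.IGNORECASE),
    # North American style numbers such as (403) 555-0199 or 403-555-0199
    "phone": re.compile(r"\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]\d{4}\b"),
}


def pii_tags(text):
    """Return the sorted names of all patterns found in the text."""
    return sorted(name for name, pattern in PATTERNS.items() if pattern.search(text))


print(pii_tags("Contact jane.doe@example.com or call (403) 555-0199."))
# ['email', 'phone']
print(pii_tags("Write to name[at]provider[dot]extension"))
# ['obfuscated email']
```

A document that comes back with any tag at all is a candidate for review before the set is published.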


New Lesson in IA Course

Just posted: Separating This From That (Part 1).

A commonly heard maxim in the software and content publishing world is the importance of separating content from presentation, or separating structure from presentation, or presentation from behaviour, etc. There are many variations on this theme, but they all share similar objectives:

  • separating responsibilities of one kind or another
  • making certain types of change easier
  • allowing reusability.

We take a look at a few examples and see where information architects get involved.
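As a minimal illustration of separating content from presentation (a sketch, not an example taken from the lesson itself), the same content record can be rendered by two independent presentation functions. The record and function names are hypothetical:

```python
from html import escape

# Content: what is said, with no formatting decisions (hypothetical record).
article = {
    "title": "Separating This From That",
    "summary": "Why we keep content, structure and presentation apart.",
}


def render_html(content):
    """One presentation of the content, for a web page."""
    return (f"<h1>{escape(content['title'])}</h1>\n"
            f"<p>{escape(content['summary'])}</p>")


def render_text(content):
    """Another presentation of the same content, for plain-text email."""
    title = content["title"]
    return f"{title}\n{'=' * len(title)}\n{content['summary']}"


print(render_html(article))
print(render_text(article))
```

Changing the look of the web page means editing only `render_html`; the content, and the plain-text version that reuses it, are untouched.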

Future “books” and future “reading”

I heard an excellent radio program about the future of the book a few days ago. Some of the points raised in the context of digital books align with the future of the content web and other channels. Here’s my mashup.

Future “books” will redefine the “reading” experience by:

  1. letting the author augment their content with structure and linkages
  2. letting the reader augment the content with their own annotations
  3. letting external sources augment the content with their own commentary

Let’s talk about these, taking as examples “The Lord of the Rings” for fiction, or a biography of Henry Kissinger for non-fiction.

First, content augmentation by the author. Currently the author provides ample structure through plot lines as characters and events unfold. Some of these represent relationships in physical space-time, others reflect unfolding in social and psychological spaces. Regardless, in a complex book, we sometimes have to work hard at keeping all the events, players, and locations straight. When reading a physical book, we might keep a thumb in the map, chronology, and dramatis personae, but soon run out of thumbs. In digital books, the same artefacts, suitably architected, could be integrated into the content; they could even provide alternative visualizations of the entire book’s content. Different readers would have different preferences for using these tools. Some would trust the author’s aesthetic and follow along for the ride, agreeing to work hard when the author wanted them to; others might use the tools as a cross reference and crutch; yet others might rely on the alternative visualizations as their primary access to the material.

Second, content augmentation by the reader. Making annotations and margin notes is straightforward enough, but this is taking a book-centric point of view. A more user centric point of view might ask why the user is reading a book, and allow the annotations to be semantically tagged with categories meaningful to the user, for example “Going On A Quest”. And of course, the reader might be reading several books to support their goals, so would want the annotations to be structured consistently across all books, but aggregated outside of any particular book.
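Here is a sketch of what annotations "structured consistently across all books, but aggregated outside of any particular book" might look like as a data model. The `Annotation` class, the categories, and the sample notes (including the second book) are illustrative assumptions:

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """A reader's note, stored outside any particular book (hypothetical model)."""
    book: str
    note: str
    categories: set[str] = field(default_factory=set)


def by_category(annotations):
    """Group annotations from all books under the reader's own categories."""
    index = defaultdict(list)
    for annotation in annotations:
        for category in sorted(annotation.categories):
            index[category].append(annotation)
    return dict(index)


notes = [
    Annotation("The Lord of the Rings", "Frodo leaves the Shire", {"Going On A Quest"}),
    Annotation("The Hobbit", "Bilbo sets out with the dwarves",
               {"Going On A Quest", "Friendship"}),
    Annotation("The Lord of the Rings", "Sam refuses to leave Frodo", {"Friendship"}),
]

for category, items in sorted(by_category(notes).items()):
    print(category, [a.book for a in items])
# Friendship ['The Hobbit', 'The Lord of the Rings']
# Going On A Quest ['The Lord of the Rings', 'The Hobbit']
```

The categories belong to the reader rather than to either book, so a new book simply adds more entries under the same headings.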

Third, content augmentation by external sources. Right now I often read a book with Google at the ready, but this is a hit-or-miss proposition. I know there are resources out there, and I would appreciate it, even pay for it, if someone had curated some good resources for me. We could imagine an ebook offered with options for plugins of curated study notes, literary criticism or historical context, written by authorities and cross-linked to the book.

So those are the three takeaways from the radio program. By the time we have done all these things, the book will have acquired additional structure, content and interaction. The book business will have new players and institutions.

I really have to sympathize with one of the speakers in the radio program. They were asked why they kept talking about the “Future of the Book” when it would feel and behave nothing like a book. The somewhat plaintive answer was, “we don’t have a name for it yet”. But we will one day. Ten years from now, we will have new vocabularies and patterns for this sort of thing. I look forward to helping us get there.