Understanding Taxonomy and Metadata: What They Are, and What You Need to Know

If you’re deploying a proposal management system, you may be wondering what you need to know about taxonomy and metadata. If you’re new to content management, you may even want to know: What is taxonomy? And, what is metadata? After all, these terms sound very scientific, and you’re a content person, right?

The topics themselves are somewhat scientific, but they both apply to the content stored in your proposal library. This article covers the basics of each and explains how your proposal management system uses them to manage your proposal repository.

Let’s start with taxonomy

According to dictionary.com, taxonomy is “the science or technique of classification.” It turns out that this is a very modern definition — the word “taxonomy” was first used to describe the system of scientific classification of plants and animals beginning in 1828. Since then, the word has evolved to describe the practice of classifying any set of related topics or concepts. Its scientific application has expanded to include taxonomies for everything from viruses to types of soil.

In the business world, the types of taxonomies have become vast as well. There are global taxonomies, like the International Standard Industrial Classification (ISIC) system created by the United Nations to classify economic data by type of economic activity. Many countries and regions maintain their own related classifications. The U.S. version, the Standard Industrial Classification (SIC), originally created in 1937, assigns a 4-digit code to businesses to denote their industry. The primary use of both is the reporting of economic data, where consistency is essential.

But many businesses have their own internal taxonomies — systems to classify everything from accounting transactions to organizational structures. These taxonomies are usually hierarchical, denoting the structural relationship between parts of the business or how financials will be reported by business unit or expense category, then consolidated into a total picture for the company.

When it comes to content, a company often needs to establish its own taxonomy: the representational structure against which unstructured content is classified in an organization. Whoa, that’s a mouthful! In plain language, it’s a classification system that sorts content into logical buckets, much like a nested folder structure, a format that almost everyone is familiar with.
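To make the nested-folder idea concrete, here is a minimal sketch in Python of a taxonomy stored as nested categories. The category names and both helper functions are made up for illustration and are not taken from any particular proposal system.

```python
# Hypothetical taxonomy for a proposal library, modeled as nested folders.
# All category names are illustrative assumptions.
TAXONOMY = {
    "Company": {
        "History": {},
        "Leadership": {},
    },
    "Services": {
        "Consulting": {},
        "Support": {},
    },
    "Compliance": {
        "Security": {},
        "Privacy": {},
    },
}


def list_paths(tree, prefix=()):
    """Return every category path in the taxonomy, parents before children."""
    paths = []
    for name, children in tree.items():
        path = prefix + (name,)
        paths.append("/".join(path))
        paths.extend(list_paths(children, path))
    return paths


def is_valid_path(tree, path):
    """Check that a slash-separated path names a real node in the taxonomy."""
    node = tree
    for part in path.split("/"):
        if part not in node:
            return False
        node = node[part]
    return True


if __name__ == "__main__":
    for p in list_paths(TAXONOMY):
        print(p)
```

A check like is_valid_path(TAXONOMY, "Compliance/Security") is one way a system could refuse to file content under a bucket that doesn't exist, which is what keeps a taxonomy consistent over time.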

Such taxonomies are created, for example, to manage records that must be tracked for compliance, or to organize information and know-how in knowledge management systems. They may also be needed to organize the content in your proposal database and make it easier to retrieve.

So where does metadata fit in?

Metadata is literally data about data. It’s an essential part of associating an organization’s taxonomy with the content produced.

There are several types of metadata, too. The three main types are descriptive, structural, and administrative metadata, and you should be familiar with each, at least at a cursory level.

Descriptive metadata is something we are all aware of, even if we don’t know that’s what it’s called: it’s basic information about documents or files, like the filename, the author, and the date it was created or last modified. It’s likely that you’ve used this type of metadata recently to find a file on your own hard drive or in a shared repository like SharePoint or Google Drive. You know it’s essential and helps you find information. But you probably also understand that it isn’t sufficient by itself to enable you to find things as quickly or easily as you’d like.

Descriptive metadata also includes keywords, which are typically assigned to files by the file creator. Keywords can add specificity that makes locating a particular file more intuitive. But unless those keywords are drawn from some type of taxonomy, they can also add confusion, as each author or contributor typically assigns their own keywords to their content. It’s also impractical to assign large numbers of keywords to each file or content element, as that may lead to bloat in the system and additional confusion for users. (More on this later.)
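One common way to keep author-assigned keywords from drifting is a controlled vocabulary, which is an approved list of terms plus the variants that map to each one. The Python sketch below shows the idea. The vocabulary entries and the normalize_keywords function are hypothetical examples, not part of any specific product.

```python
# Hypothetical controlled vocabulary: each approved keyword maps to the
# free-form variants that different authors might type.
CONTROLLED_VOCABULARY = {
    "information security": {"infosec", "cybersecurity", "cyber security", "it security"},
    "pricing": {"cost", "fees", "rates"},
    "case study": {"case studies", "customer story", "success story"},
}


def normalize_keywords(raw_keywords):
    """Split author keywords into approved terms and unrecognized leftovers."""
    approved, unrecognized = set(), set()
    for raw in raw_keywords:
        cleaned = " ".join(raw.lower().split())  # lowercase, collapse whitespace
        for term, variants in CONTROLLED_VOCABULARY.items():
            if cleaned == term or cleaned in variants:
                approved.add(term)
                break
        else:
            # No approved term matched; flag the keyword for human review.
            unrecognized.add(cleaned)
    return sorted(approved), sorted(unrecognized)
```

Here "InfoSec" and "Cyber Security" from two different authors would both resolve to the single approved term "information security", while a keyword nobody has approved is set aside for review instead of silently entering the system.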

Structural metadata denotes the relationships between content blocks, such as the chapter numbers assigned to elements within a book or long report. Administrative metadata provides information that facilitates the management of content, such as access controls (e.g., who has the right to use or edit content), limitations on reuse (such as licensing requirements), and expiration or review dates.
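As a rough illustration, the three types of metadata could sit side by side on a single content record like this. It is only a sketch: every class and field name below is an assumption chosen for readability, not the schema of a real proposal management system.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class DescriptiveMetadata:
    filename: str
    author: str
    created: date
    keywords: list = field(default_factory=list)


@dataclass
class StructuralMetadata:
    parent_document: str  # the book or report this block belongs to
    chapter_number: int   # position of the block within that document


@dataclass
class AdministrativeMetadata:
    editors: list                       # who may edit the content
    license_note: Optional[str] = None  # limitation on reuse, if any
    review_by: Optional[date] = None    # date the content should be re-checked


@dataclass
class ContentRecord:
    descriptive: DescriptiveMetadata
    structural: StructuralMetadata
    administrative: AdministrativeMetadata

    def needs_review(self, today):
        """True when a review date is set and has already passed."""
        due = self.administrative.review_by
        return due is not None and due < today


# Illustrative record; every value here is made up.
record = ContentRecord(
    DescriptiveMetadata("security-overview.docx", "J. Doe", date(2023, 5, 1),
                        ["information security"]),
    StructuralMetadata("Standard RFP Response", 4),
    AdministrativeMetadata(["proposal-team"], review_by=date(2024, 5, 1)),
)
```

Keeping the three groups separate makes it clearer which fields help people find content (descriptive), which place it in a larger document (structural), and which govern how it may be used (administrative).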

Why is this so important to understand?

Many content management systems, including proposal systems, need both metadata and a governing taxonomy to operate effectively. The goal is enabling users to easily and efficiently find and reuse all types of content, from the essential to the obscure. Organizations deploying these systems can go far toward meeting that goal with a clear understanding of these concepts.

So, if you’re starting from scratch — or overhauling your current system — where do you begin? As you might guess from its definition, a great deal of metadata can be pulled from the original content or source document. As such, it’s a fairly straightforward task to record, enter, and use metadata throughout your system.
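For a sense of how little effort this can take, here is a minimal Python sketch that reads the metadata a file system already records for a file. The function name and the choice of fields are assumptions for illustration. Details such as the author usually live inside the document itself and would need a format-specific reader, which is not shown here.

```python
from datetime import datetime, timezone
from pathlib import Path


def extract_file_metadata(path):
    """Collect basic metadata the file system already records for a file."""
    p = Path(path)
    info = p.stat()
    return {
        "filename": p.name,
        "extension": p.suffix.lower(),
        "size_bytes": info.st_size,
        # st_mtime is the last-modified time in seconds since the epoch
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }
```

Running this over every file in a folder would give you a starting inventory of filenames, types, sizes, and modification dates without anyone typing a thing.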

The trickier part is developing your taxonomy and the appropriate keywords to support it. Developing a robust taxonomy does not happen overnight. The process should include a range of stakeholders, including system users as well as subject matter experts and business managers (as explained in this great article on taxonomy design for content management systems). A taxonomy also should be revisited regularly to account for evolutionary changes in the business and the lexicon of the marketplace.

What if this all sounds daunting?

This level of effort can be off-putting for smaller businesses or even for larger companies that have streamlined their staffing levels. (At DraftSpark, we do see some larger companies with dedicated roles, or at least a set of responsibilities within a proposal content management role, for creating and managing the organizational taxonomy and the keywords or tags to be deployed.) So how does a business compensate?

This is where NLP-based AI solutions can do the heavy lifting for you. An AI-based solution can not only ingest your content but also record its associated metadata automatically for reuse within the system. Even more importantly, it doesn’t require you to create a taxonomy or a hierarchical keyword or naming structure to find your best content going forward. A technique that we’ve referred to in this blog as semantic search, or more accurately “Concept Search,” lets users retrieve content blocks with queries written in natural language.
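The sketch below shows only the general shape of querying in plain language and getting ranked content blocks back, with no tags or folders involved. It is not how a real concept search works: a production system would rely on a language model to match meaning, while this toy swaps in simple bag-of-words cosine similarity, which only matches shared words. The content blocks and function names are invented for the example.

```python
import math
import re
from collections import Counter

# Hypothetical content blocks from a proposal library.
BLOCKS = {
    "encryption": "We encrypt customer data at rest and in transit.",
    "onboarding": "New customers complete onboarding within two weeks.",
    "support": "Our support team answers tickets around the clock.",
}


def vectorize(text):
    """Turn text into a bag-of-words count vector (lowercased word tokens)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, blocks):
    """Rank content blocks by similarity to a natural-language query."""
    q = vectorize(query)
    scored = [(cosine_similarity(q, vectorize(text)), name)
              for name, text in blocks.items()]
    # Highest score first; blocks sharing no words with the query are dropped.
    return [name for score, name in sorted(scored, reverse=True) if score > 0]
```

Calling search("How do you encrypt data in transit?", BLOCKS) returns the encryption block even though nobody tagged it. The gap between this toy and a true concept search is that the toy would miss a block that says "secured" instead of "encrypt", which is exactly the kind of match a meaning-based system is meant to catch.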

This also helps protect organizations from edge cases in which content has been tagged appropriately yet includes uncommon material that its keywords don’t cover, so it can’t be found when a unique requirement comes up in the business. For companies with diverse business capabilities and an array of engagements with customers, this content can be critical to success. Making it easily accessible across multiple departments could make a significant difference in everything from operational efficiency and regulatory compliance to workforce management and winning new business.

Because of their capacity to absorb and categorize every type of content across an entire organization, NLP-based AI systems offer the chance to break free from the traditional structure of taxonomy and keywords. The result is a tool that automates content management without requiring your team to create and deploy its own internal metadata and taxonomy strategies. The difficult work of organizing and archiving is done on your behalf, while you get the benefits that effective content automation is designed to provide.