Bill Vorhies
Business Foundation Series, #3

Summary: Business executives need to understand the new opportunities available in Big Data from unstructured and semi-structured data, and how to blend these newly available data types into their data-driven competitive strategies.


The Big Deal about Big Data for business users is hiding in plain sight.  Yes, there is a lot more data that can be captured, stored, and analyzed (Volume), but the real payoff is likely to be in Variety: types of data that in the past we couldn’t easily capture or analyze.  Big data technologies let us cost-effectively capture and store many new types of data in their raw format, then analyze these new forms later in our analytic systems.

There are three types of data we need to consider: structured, unstructured, and semi-structured.  Of these, the last two are new in Big Data.

Structured Data: Your current data warehouse contains structured data and only structured data.  It’s structured because when you placed it in your relational database system a structure was enforced on it, so we know where it is, what it means, and how it relates to other pieces of data in there.  It may be text (a person’s name) or numerical (their age) but we know that the age value goes with a specific person, hence structured.

Unstructured Data:  Essentially everything that has not been specifically structured is considered unstructured.  The list of truly unstructured data includes free text such as documents produced in your company, images and videos, audio files, and some types of social media.  If the object to be stored carries no tags (metadata about the data) and has no established schema, ontology, glossary, or consistent organization, it is unstructured.  However, many types of data that are usually lumped in with unstructured data do have at least some organization.

Semi-Structured Data:  The line between unstructured and semi-structured data is a little fuzzy.  If the data has any organizational structure (a known schema) or carries tags (like XML, the extensible markup language used for documents on the web), then it is somewhat easier to organize and analyze, and because it is more accessible for analysis it may be more valuable.  Some types of data that appear to be unstructured but are actually semi-structured include:

  • Text: XML, email, or electronic data interchange (EDI) messages.  These lack formal structure but do contain tags or a known structure that separates semantic elements.  Most social media sources, a hot topic for analysis today, fall in this category.  Facebook, Twitter, and others offer data access through an application programming interface (API).
  • Web Server Logs and Search Patterns:  An individual’s journey through a web site, whether searching, consuming content, or shopping is recorded in detail in electronic web server logs.
  • Sensor Data:  There is a huge explosion in the number of sensors producing streams of data all around us.  Once we thought of sensors as being found only in industrial control systems or major transportation systems.  Now this includes RFID tags, infrared and wireless technology, and GPS location signals, among others.  In addition to monitoring mechanical systems, sensors increasingly monitor consumer behavior.  Your cell phone puts out a constant stream of signals that are being captured for location-based marketing.  In-store sensors are monitoring consumer shopping behavior.  Your car monitors its systems and constantly records data that can be used to evaluate mechanical failures or accidents.  There is huge growth in the popularity of ‘the quantified self’, in which we voluntarily wear devices like the Fitbit or a Nike FuelBand that record our activity and in some cases even heart rate, velocity, location, and calorie burn.  While a great deal of attention is being paid to new types of analysis for social media, in the next two or three years at most we will reach a crossover point where the volume of data available from sensors will exceed new social media postings, and sensor data volumes are likely to grow 10 or 20 times faster than social media sources.
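
To make the idea of tags concrete, here is a minimal sketch in Python of how a tagged message can be turned into structured fields.  The message, its tag names, and the extract_fields helper are all invented for illustration; a real XML or EDI feed would have its own layout.  Note that the free-text note is still unstructured even though the tags tell us where to find it.

```python
import xml.etree.ElementTree as ET

# Hypothetical order message; the tag and attribute names are made up.
RAW_MESSAGE = """
<order id="A-1001">
  <customer>Jane Doe</customer>
  <item sku="X42" qty="2">Widget</item>
  <note>Please deliver after 5pm, the gate code has changed.</note>
</order>
"""


def extract_fields(xml_text):
    """Use the tags to pull named fields out of one semi-structured message."""
    root = ET.fromstring(xml_text.strip())
    item = root.find("item")
    return {
        "order_id": root.get("id"),
        "customer": root.findtext("customer"),
        "sku": item.get("sku"),
        "quantity": int(item.get("qty")),
        # The note is free text: the tag locates it, but its content
        # would still need text analysis to interpret.
        "note": root.findtext("note"),
    }


record = extract_fields(RAW_MESSAGE)
print(record)
```

Once the fields are extracted this way, they can be loaded alongside the structured data already in a warehouse.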

We have been refining our use of structured data for the past 10 or 20 years.  Opportunity lies in understanding how adding unstructured and semi-structured data to the mix creates competitive advantage.  Here are just a few thought starters for your consideration:

Marketing and Sales Campaigns:  Consumers now actively share their likes and dislikes about companies, campaigns, and products through social media.  Through text-based sentiment analysis of social media messages companies are learning quickly what pleases and displeases their customers and prospects.
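
For readers curious what text-based sentiment analysis looks like at its very simplest, the sketch below counts positive and negative words in each message.  The word lists and sample posts are invented for illustration, and commercial tools rely on far richer language models than this; it is only meant to show the basic idea of turning free text into a number.

```python
import re

# Tiny hand-made word lists, purely illustrative.
POSITIVE = {"love", "great", "fast", "happy"}
NEGATIVE = {"hate", "slow", "broken", "terrible"}


def sentiment_score(message):
    """Return (positive word count) minus (negative word count)."""
    words = re.findall(r"[a-z']+", message.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


# Hypothetical social media posts.
posts = [
    "Love the new app, checkout is so fast!",
    "Terrible service and the site was slow again.",
    "Order arrived on Tuesday.",
]

scores = [sentiment_score(post) for post in posts]
for post, score in zip(posts, scores):
    print(score, post)
```

A score above zero suggests a pleased customer, below zero a displeased one, and zero a neutral or unrecognized message.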

Ecommerce:  Web server logs and search engine summaries are being analyzed in detail to discover how to make the customer’s journey through your web site easier for them and more profitable for you.
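
As a rough illustration of how a customer’s journey can be recovered from web server logs, the Python sketch below groups page requests by visitor.  It assumes the widely used Common Log Format, entries that are already in time order, and that an IP address stands in for one visitor, which is a simplification.  The sample entries and the journeys helper are invented for illustration.

```python
import re
from collections import defaultdict

# Pattern for one Common Log Format entry:
# host ident authuser [timestamp] "METHOD path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Hypothetical log entries using documentation-only IP addresses.
SAMPLE_LOG = """\
203.0.113.5 - - [10/Oct/2013:13:55:36 -0700] "GET /home HTTP/1.1" 200 2326
203.0.113.5 - - [10/Oct/2013:13:56:02 -0700] "GET /search?q=boots HTTP/1.1" 200 5120
198.51.100.7 - - [10/Oct/2013:13:56:10 -0700] "GET /home HTTP/1.1" 200 2326
203.0.113.5 - - [10/Oct/2013:13:57:45 -0700] "GET /cart HTTP/1.1" 200 1804
"""


def journeys(log_text):
    """Map each visitor (by IP address) to the ordered list of pages requested."""
    paths = defaultdict(list)
    for line in log_text.splitlines():
        match = LOG_PATTERN.match(line)
        if match:  # skip lines that do not fit the expected format
            paths[match.group("host")].append(match.group("path"))
    return dict(paths)


visits = journeys(SAMPLE_LOG)
for visitor, pages in visits.items():
    print(visitor, " -> ".join(pages))
```

Here the first visitor goes from the home page to a search and then to the cart, the kind of path an ecommerce team would study to find where shoppers drop off.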

Brick and Mortar Retail:  Retailers using electronic, RFID, video, and infrared technologies can now track customers as groups and as individuals through their physical stores to enhance the shopping experience.  Some grocery chains are now using video technology to count the number of shoppers and predict the number of checkout lanes needed to keep wait times at acceptable levels.  Customer reward cards can gather even more information matching customer detail to specific product purchases.

Supply Chain:  Both the consumers and providers of global logistical services have combined data from traditional internal ERP systems with semi-structured data from GPS location trackers, EDI messages, RFID and barcode scans of shipped and in-transit merchandise, and even social media sources to speed goods along at lower cost.

Finance:  Financial institutions of all types, including banks and credit card companies, as well as the internal finance functions of other companies, are rapidly embracing new data types to reduce fraud, reduce revenue leakage (underbilling), and ensure compliance with the multitude of financial laws and regulations.

Healthcare:  The government’s initiative to require electronic health records is making new and vast semi-structured data sources available to enhance treatment outcomes and contain cost.

Business executives need to understand the new opportunities available in Big Data from unstructured and semi-structured data, and how to blend these newly available data types into their data-driven competitive strategies.


Bill Vorhies, President & COO – Data-Magnum – © 2013, all rights reserved.


