Big Data, Small Data, and Everything In Between

I’m definitely not the first person to say this, but the term “Big Data” is significantly overused.  In fact, Big Data has become a catch-all phrase for anything related to data analysis (regardless of whether it actually involves data sources commonly associated with Big Data).  As a result, a lot of people probably think they are working with Big Data when in fact they are not.

So, what exactly is Big Data, how is it different from more conventional data sources, what is the benefit of using it, and how does it relate to organizational culture?

Over the past 7-10 years our ability to collect and store digital information has increased exponentially.  In fact, it is estimated that the world’s per capita data capacity has doubled every 40 months since 1980.  Currently we generate about 2.5 exabytes of data every day (that’s about 500 million hard drives or 1 billion digital movie downloads).

This data comes from social media, GPS sensors, digital pictures, video, and purchase transactions, among other sources.  The term “Big Data” comes from our attempt to collect, compile, and analyze this information in meaningful ways.

More simply put, Big Data is information that is too large and complex to handle with traditional data management practices.  Since Big Data comes in a variety of formats, it needs to be cleaned and structured before it can be analyzed.  Due to its size, this takes time and can be a very involved process.  In terms of ease of use, it is definitely on the more challenging end.  If you think of data management as a continuum, datasets (e.g. surveys) are on one end, databases (e.g. sales records, employee information) are in the middle, and Big Data is on the other end.  The larger the data source, the harder it is to manage.

The hype surrounding Big Data is usually focused on two things: (1) its sheer size, which potentially gives us a more representative sample of the population, and (2) its content, which allows us to examine questions that would not have been feasible to study 10 years ago.  Because of this, Big Data is touted as the key to unlocking our ability to understand large social, medical, biological, and emergent trends.  The ultimate goal is for businesses, nonprofits, consumers, and policy makers to leverage this data to make more informed choices and better understand the potential causes and effects of those choices.

While I’ll be the first to push for using data to better understand our clients’ needs, bigger isn’t always better.  Ultimately the type of data we use depends on the scope of the questions we want to answer.  For instance, if you are trying to figure out what genes are linked to diabetes, then leveraging a large amount of complex data (i.e. Big Data) will probably help you reach your goal.  However, if you want to understand what contributes to employee satisfaction and performance, then a traditional dataset or database is more appropriate.  Most analyses of organizational culture leverage smaller data sources such as surveys, interviews, and metrics reports (see my post on using text mining to understand culture).  As of now, Big Data is too complex and resource intensive for most organizations to utilize; however, companies such as Google, Apple, and Facebook (which have both a sizable workforce and processing power) may become pioneers in using Big Data to understand the inner workings of their organizations and fine-tune the customer-business relationship.

One last remark on Big Data.  Big Data is still in its infancy.  There are a lot of interesting things people are doing with complex data sources, but there are a number of kinks that still need to be worked out.  Researchers have a very good grasp of using Big Data to predict trends, but they are far behind in their ability to make inferences.  For instance, Amazon can recommend products based on your search and purchase history (prediction), but I am not sure whether they are able to determine how likely you are to actually purchase the product.  In other words, the models are robust, but our confidence in the models still needs some improvement.

At the end of the day, regardless of its format, data is becoming an integral part of the way we do business and make choices as customers.  While there will likely be only a handful of people doing the actual analysis, it will be essential for people to be critical consumers of data so that they are able to assess the quality of the analysis, understand its strengths and weaknesses, and discuss their thoughts with others.

Now that we’ve touched on the basics, you may be wondering about how businesses acquire and use this data.  I’ll discuss the ethics of data in my next post (stay tuned).

Coupons, Analytics, and 4 Fun Ways to Understand Your Customers

Ever wonder why grocery stores push customers to use rewards cards?  I became fascinated with this question recently when I received a booklet of coupons in the mail from my local grocery store.  The coupons were tailored to my purchasing habits and even offered decent savings on items that I purchased only once or twice.  Seemed pretty amazing to me, but I wondered how and why they did it.

After doing some research, I realized that rewards cards serve two purposes: (1) to drive customer loyalty by offering discounts and (2) to build customer profiles for the store and region.  It’s the second purpose that is the most intriguing.  Essentially, grocery stores use purchase information to understand the preferences of their customers; this helps them estimate demand and ensure key products are stocked at each store.  In a way, they can use the data to tailor their services to unique demographic groups.

Grocery stores offer an exchange: in return for using your purchase information, they provide in-store discounts and personalized coupons.  Overall, it seems like a pretty fair trade and an excellent way for businesses to understand their customers.  Most grocery stores are part of national chains, which can afford large business intelligence departments.  Are there similar ways for small businesses (which often have constrained resources) to capture and analyze this information?

Here are four simple ways small businesses can use data to understand (and better serve) their customers.

  1. Leverage Web Analytics.  Website traffic is a great way to understand where viewers come from, how they found the website, and what content they are viewing.  This can help identify if there are key areas of interest (possibly a great article or blog post) and how to optimize content for your audience.
  2. Reach Out with Social Media.  Pretty much everyone has a social media strategy, but how much analysis is actually going on?  There are a number of platforms to look at trends, popular posts, “likes”, and “shares”.  While these are simple measures, they allow businesses to see what content is most popular.  Businesses can take this a step further with text and network mining to analyze the content, sentiment, influencers, and relationships between the viewers.
  3. Dig Deep into Current Clientele.  Examining sales for current clients is the best way to understand demographic trends, product preference, and customer “loyalty.”  For a small business like gothamCulture, this can be as easy as looking at where clients come from, what market sector they are a part of, and what type of services they purchase.  By identifying links and connections, businesses can target marketing and project trends.
  4. Understand the Competition.  Understanding where the competition’s customers come from and how it uses social media can be a mirror to allow small businesses to differentiate their products and target new customer groups.

By looking at current sales and potential sales (website traffic, social media, and the competition), and by identifying trends and similarities between the two, small businesses can start piecing together profiles for different customer populations, identify the products and services they may be interested in, and anticipate future market trends.

Data + Culture = A New Approach for Safety

In my previous two posts (here and here), I talked about how the use of data enhances our ability to understand culture. In this post, I’d like to expand on that a bit further and provide some real world context.

gothamCulture is passionate about safety; in fact, we recently worked with several clients to address safety concerns within their organizations. There is evidence to suggest that workplace safety is not only essential to maintaining the health and wellbeing of employees but also can improve a business’s bottom line. For organizations with safety concerns, addressing these challenges often necessitates a change to the underlying culture.

In our work with clients, data often takes the form of text-based inputs from interviews, focus groups, and site observations. While text-based data provides a wealth of information, it can be challenging to extract the most important pieces. One widely used method is text mining, which can be used to identify major themes among the interviews. In the example text cloud above we used text mining to look at overall morale. A couple of key words jump out, such as “antagonistic”, “complacent”, “change”, and “unsafe”. This is supported by key ngrams such as “staff extremely difficult”, “tough change culture”, and “question unsafe bad”. These data points seem to suggest that while change is needed to improve overall safety, there are underlying tensions within the organization that make it difficult to discuss and implement improved safety measures.

This data is useful in understanding broad issues and challenges in organizations; however, it does not show connections and correlations, which are helpful in determining the strategies best suited to address the issue. Correlations are a product of quantitative (numeric) data, so to identify correlations we transform our text-based data into quantitative data. While there are a number of methods being pioneered, a simple method we have leveraged is using text clouds to identify themes and then determining which interviews, focus groups, and site observations include those themes. Interestingly, this method produces fairly reliable results.
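To make the idea concrete, here is a minimal sketch in Python of one way such a transformation could work. It is not the exact procedure we used with clients: the theme names, keywords, and interview snippets are made up for illustration, and simple keyword matching stands in for a real coding process. Each source gets a 1 or 0 for each theme, and the resulting columns can then be correlated.

```python
from itertools import combinations
from math import sqrt

# Hypothetical themes and keywords; in practice these would come from the text cloud.
THEME_KEYWORDS = {
    "unsafe": ["unsafe", "hazard"],
    "poor_morale": ["antagonistic", "complacent"],
    "training": ["training"],
}

# Made-up interview snippets, for illustration only.
DOCUMENTS = [
    "Staff are antagonistic and the floor feels unsafe.",
    "Training is fine, nothing to report.",
    "People are complacent about every hazard.",
    "We need more training on new equipment.",
]


def theme_matrix(documents, theme_keywords):
    """Return {theme: [0 or 1 per document]} marking whether the theme appears."""
    return {
        theme: [
            int(any(keyword in doc.lower() for keyword in keywords))
            for doc in documents
        ]
        for theme, keywords in theme_keywords.items()
    }


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def theme_correlations(matrix):
    """Correlate every pair of themes across the documents."""
    return {
        (a, b): pearson(matrix[a], matrix[b])
        for a, b in combinations(matrix, 2)
    }


matrix = theme_matrix(DOCUMENTS, THEME_KEYWORDS)
correlations = theme_correlations(matrix)
for pair, r in correlations.items():
    print(pair, round(r, 2))
```

With only four toy snippets the correlations come out as perfect, which real data would never do; the point is only to show how text can be turned into numbers that can be compared.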

The network diagram above shows a number of correlations that exist across the data. The size of the circle relates to the number of correlations the “theme” has, the size of the line relates to the strength of the correlation, and the color relates to different categories of themes (blue=training, green=morale/culture, yellow=safety). Here we get a better idea of the different dynamics within the organization. For instance, while there is a connection between training and safety, the elements connecting those two themes are a hierarchical culture and poor morale. In this case, it is not enough to update policies or develop new training opportunities; the organization must also address its hierarchical elements, which seem to be linked to poor morale, inadequate communications, and a sense that the organization is uncaring.
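For readers curious how a diagram like this could be assembled, the sketch below turns a set of pairwise correlations into the two quantities the diagram encodes: line strength and circle size. The correlation values and the 0.3 cutoff are assumptions chosen for illustration, not figures from the client data.

```python
from collections import Counter

# Hypothetical correlation values between themes; not taken from the client data.
CORRELATIONS = {
    ("training", "hierarchical_culture"): 0.62,
    ("hierarchical_culture", "poor_morale"): 0.71,
    ("poor_morale", "safety"): 0.55,
    ("training", "safety"): 0.10,
}


def build_network(correlations, threshold=0.3):
    """Keep pairs whose absolute correlation meets the (assumed) threshold.

    Returns (edges, degree): edges maps each kept pair to its strength
    (line width), and degree counts kept edges per theme (circle size).
    """
    edges = {
        pair: abs(r) for pair, r in correlations.items() if abs(r) >= threshold
    }
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return edges, degree


edges, degree = build_network(CORRELATIONS)
for theme, count in degree.most_common():
    print(theme, count)
```

In this toy data the direct link between training and safety falls below the cutoff, so the two themes are connected only through hierarchical culture and poor morale, which mirrors the pattern described above.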

Organizations are a lot more like ecosystems than they are machines. Addressing challenges (whether safety, mergers, or customer relations) requires a lot more than turning a wrench or drawing a schematic; it involves understanding relationships between the values, personalities, and perspectives that exist across the organization. Traditionally, most people have felt that data analysis is a little out of place when looking at culture, but, as we’ve shown, it is an effective tool that can save time and reveal compelling insights.

Mining Text to Understand Your Culture Puzzle

It is estimated that 80% of data in the world exists in the form of documents, reviews, blog posts, emails, and articles. Unlike numeric data, text-based data is unstructured, making it more difficult to identify themes and trends across different media. However, in our work we have found that text-based data conveys a number of attributes (such as values, beliefs, perceptions, and needs) that cannot be expressed in traditional data sources. For organizations that want to understand their culture, text-based analysis is an often overlooked but critical piece of the puzzle.

While traditional statistical methods are not as effective with text-based data, there are a number of methods to help us sift through this information. Furthermore, when we couple traditional and text-based analytical approaches, we have a bigger lens through which to understand our clients’ organizations.

To give an example, I used Glassdoor.com to understand employee attitudes at a large multinational airline. There were a total of 30 reviews, which were posted from 2008 to 2014. The reviews were from employees at 14 locations in 12 different countries. The reviews included numeric ratings of the overall organization, its culture, career opportunities, work-life balance, senior management, and compensation and benefits. In addition, the reviewers included a summary of the pros and cons of working at the organization.

Using text mining methods, we can sift through the unstructured data to understand overarching themes and attitudes. In the text cloud above, size corresponds to how frequently the word appeared across the reviews. Color corresponds to whether the word was positive (green) or negative (red). While this is one of the more basic approaches, a couple of key concepts emerge. We find words such as “flexible”, “friendly”, “security”, and “interesting” which may indicate that the airline has a friendly working environment and provides a certain degree of job security. On the other hand, there are other words such as “bureaucratic”, “cheap”, “worsening”, and “slow” which may indicate that there are some cumbersome processes within the organization.
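As a rough illustration of what sits behind a text cloud like this, the Python sketch below counts how often each word appears and tags it as positive, negative, or neutral. The review text, stop word list, and sentiment word lists are all made up for the example; a real analysis would use the actual reviews and a fuller sentiment lexicon.

```python
import re
from collections import Counter

# Tiny hand-made word lists, for illustration only.
POSITIVE = {"flexible", "friendly", "interesting"}
NEGATIVE = {"bureaucratic", "slow", "cheap"}
STOPWORDS = {"a", "and", "are", "but", "is", "the", "very", "with"}

# Made-up review text, not the actual Glassdoor reviews.
REVIEWS = [
    "Friendly colleagues and flexible hours.",
    "Management is slow and bureaucratic but the work is interesting.",
    "Friendly team, slow processes.",
]


def word_cloud_data(reviews):
    """Return {word: (count, sentiment)} for every non-stop word."""
    words = []
    for review in reviews:
        tokens = re.findall(r"[a-z']+", review.lower())
        words.extend(token for token in tokens if token not in STOPWORDS)

    cloud = {}
    for word, count in Counter(words).items():
        if word in POSITIVE:
            sentiment = "positive"   # would be drawn in green
        elif word in NEGATIVE:
            sentiment = "negative"   # would be drawn in red
        else:
            sentiment = "neutral"
        cloud[word] = (count, sentiment)
    return cloud


cloud = word_cloud_data(REVIEWS)
for word, (count, sentiment) in sorted(cloud.items(), key=lambda item: -item[1][0]):
    print(word, count, sentiment)
```

The count would drive the size of each word in the cloud and the sentiment label would drive its color.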

This particular approach looks at words in isolation based on their frequency and sentiment; however, we can also look at clusters of words (ngrams) to clarify these themes. The most frequent word clusters include: “low salary”, “quality service”, “management good”, “learning opportunity”, and “great experience”. This may indicate that management overall is well received, but that salary and compensation are low. This latter point might be why a number of reviewers indicated their experience was great but was more of a “learning opportunity,” suggesting that they may have moved on to other opportunities. This text mining process saves us and our clients time, provides an overview of the themes and attitudes, and points our culture overview toward key areas for further exploration.
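Counting word clusters works much the same way as counting single words. The sketch below is one simple way to do it, again using made-up review text. It assumes common stop words are dropped before pairing, which would explain clusters such as “management good”, though that is a guess about the tooling rather than a description of it.

```python
import re
from collections import Counter

# Small assumed stop word list; a real analysis would use a fuller one.
STOPWORDS = {"a", "and", "but", "is", "the"}

# Made-up review text, not the actual Glassdoor reviews.
REVIEWS = [
    "Low salary but a great experience.",
    "Great experience, and the management is good.",
    "Low salary overall.",
]


def ngrams(text, n=2):
    """Return the n-word clusters in a text after dropping stop words."""
    tokens = [
        token
        for token in re.findall(r"[a-z']+", text.lower())
        if token not in STOPWORDS
    ]
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


counts = Counter()
for review in REVIEWS:
    counts.update(ngrams(review))

for cluster, count in counts.most_common(3):
    print(cluster, count)
```

Setting `n=3` in the same function would produce three-word clusters instead of pairs.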

On the other side of the spectrum, we can look at the scores and ratings to find patterns and themes. The map above shows ratings by location. Color corresponds to the airline’s overall rating, while the size of the dots corresponds to the employees’ attitudes toward the organization’s culture. One interesting pattern is how the ratings were higher in Eastern Europe and South Asia versus Germany, Switzerland, and New York. From one perspective this can be an important indicator of locations where issues might exist; however, for multinational companies it is also important to consider how values, attitudes, and assumptions about the work experience may change from country to country. Employees in Switzerland may have very different expectations than employees in Greece, Russia, or the Philippines. These are important factors to consider when bringing together different people (whether two offices down or two thousand miles away) to address common challenges.

Understanding culture is a lot like a puzzle. There are a lot of pieces. No two pieces are exactly the same, and they all fit together in a unique way. Leveraging a variety of data types and data sources points leaders to key pieces that can make the picture come to life. In today’s competitive market, companies need to leverage the full range of tools at their disposal to orient their organizations for long-term success.

Color Your Culture Picture with Data

We live in a world of data. Every day we are inundated with more and more information. In fact, the internet alone is estimated to comprise about 1.2 Zettabytes of information (that’s about 2.6 billion times the size of the average computer hard drive). We use data to help us make decisions in many parts of life, from where to go to dinner to which schools to send our kids to and where to invest. The use of data in business planning and operations is just beginning to take off and is expected to increase exponentially as data storage costs continue to decline.

So what exactly does this have to do with culture? Surprisingly, a lot. Organizations regularly collect large amounts of data regarding their workforce and operations. Some common types of information include retention and recruitment numbers, workforce size, sales figures, and customer and supplier orders.

Each of these data points tells a story about what is happening in the organization. The key is to make meaning of this information by identifying connections and correlations between data points. For example, “Big Box” Inc. discovered the following connections following an analysis of its culture and operations:

  • Sales are driven by customer satisfaction, overall safety compliance, and employee retention.
  • Retention is driven by employee satisfaction, which in turn is closely associated with safe work environments and the availability of opportunities to develop skills.
  • Safety compliance is closely linked to the maturity of the processes that govern the company.

By understanding these connections we have a more colorful picture of how the moving pieces are interrelated. Using the example above, our individual data points are now connected in a network of relationships where each individual part impacts the whole. For instance, improving employee retention not only requires us to improve professional development opportunities but also to closely examine the safety of the work environment. That in turn compels us to look closer at our processes and how we use them to manage the organization. To address a specific problem, we have to understand the system and how it functions.
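One way to picture this network is as a simple map from each outcome to the factors that drive it. The Python sketch below encodes the three connections listed above (the factor names are my own shorthand) and walks the map to find everything that sits upstream of a given outcome. It is an illustration of the idea, not the analysis that was actually run.

```python
# Each outcome maps to the factors listed above as driving it.
# Factor names are shorthand chosen for this example.
DRIVERS = {
    "sales": ["customer satisfaction", "safety compliance", "employee retention"],
    "employee retention": ["employee satisfaction"],
    "employee satisfaction": [
        "safe work environment",
        "skill development opportunities",
    ],
    "safety compliance": ["process maturity"],
}


def upstream_factors(outcome, drivers):
    """Return every factor that directly or indirectly drives the outcome."""
    found = []
    seen = set()
    stack = list(drivers.get(outcome, []))
    while stack:
        factor = stack.pop()
        if factor in seen:
            continue
        seen.add(factor)
        found.append(factor)
        # Follow the chain: whatever drives this factor also drives the outcome.
        stack.extend(drivers.get(factor, []))
    return found


print(sorted(upstream_factors("employee retention", DRIVERS)))
print(sorted(upstream_factors("sales", DRIVERS)))
```

Asking the map about sales returns not only its three direct drivers but also the factors behind them, such as process maturity, which is the sense in which each individual part impacts the whole.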

Data isn’t just for business intelligence departments. The wealth of data (both quantitative and qualitative) we can access today makes our understanding of our culture much richer and more nuanced. If we can use data to peel through the layers of our culture, leaders will be able to address core issues earlier, employees will be more satisfied with their work, and all stakeholders will have the necessary information to tell better stories about where they work and why it matters.