It’s pretty amazing how many research tools (social media monitoring in particular) attempt to charm users with meaningless graphical overlays.

When I say meaningless, I’m referring to:

Share of Voice Metrics Based on Porous Datasets
Share of voice is an empty gesture if the underlying dataset is missing a substantial share of the relevant authors and sources that cover a given topic. If a search for “virtualization” turns up a result set that’s missing scores of relevant blogs and publications, whatever pretty share of voice chart ensues is useless.
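To make that concrete, here is a minimal sketch in Python. The sources, vendors, and mention counts are all made up for illustration, and `share_of_voice` is a hypothetical helper, not any monitoring tool's actual method. It computes each vendor's share of mentions twice: once over full coverage, and once over a dataset that only indexes some of the sources.

```python
from collections import Counter


def share_of_voice(mentions):
    """Return each vendor's fraction of total mentions."""
    counts = Counter(vendor for _source, vendor in mentions)
    total = sum(counts.values())
    return {vendor: n / total for vendor, n in counts.items()}


# Hypothetical coverage: (source, vendor mentioned in one article).
full_coverage = [
    ("blog_a", "VendorX"), ("blog_a", "VendorY"),
    ("blog_b", "VendorY"), ("blog_c", "VendorY"),
    ("trade_pub", "VendorX"), ("trade_pub", "VendorY"),
]

# Assumption: the tool only indexes two of the four sources.
indexed_sources = {"blog_a", "trade_pub"}
porous = [m for m in full_coverage if m[0] in indexed_sources]

print("full coverage:", share_of_voice(full_coverage))
print("porous dataset:", share_of_voice(porous))
```

With full coverage VendorY holds two thirds of the mentions; with the porous dataset the chart would show an even split. Same math, different inputs, and nothing in the chart itself tells you which one you are looking at.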

Sentiment Analysis
This is the one that I find particularly objectionable. Natural language processing still has some extreme challenges to conquer before the margins of error for sentiment analysis of raw text and tech news become even remotely acceptable. Even processing a large set of unstructured data and determining its theme can be extremely difficult, given how many words in the English language have different meanings in different contexts. The idea that you can slap some semantic foo on top of a huge volume of clips and determine which of them are positive or negative in tone is outrageous. Every single representation of sentiment analysis that I’ve seen applied to a substantive dataset of tech articles (to determine whether a specific vendor or product was mentioned in a favorable or unfavorable light) has fallen short when drilled down.
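As a rough illustration of the context problem, here is a deliberately naive keyword scorer in Python. The word lists and headlines are invented for this example, and real products are presumably more sophisticated than this; the point is only to show how words that shift meaning with context trip up simple tone scoring.

```python
import re

# Hypothetical word lists, chosen only for illustration.
POSITIVE = {"wins", "secure", "fast"}
NEGATIVE = {"kills", "crash", "fails", "attack"}


def naive_sentiment(text):
    """Label text by counting positive words minus negative words."""
    words = re.findall(r"[a-z]+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Invented headlines paired with the tone a human reader would assign.
headlines = [
    ("VendorX patch kills long-standing crash bug", "positive"),
    ("VendorY fails to disappoint with new hypervisor", "positive"),
    ("VendorZ wins award for least secure product of the year", "negative"),
]

for text, human_label in headlines:
    print(f"{naive_sentiment(text):>8} (human: {human_label})  {text}")
```

All three headlines come out with the opposite label from the one a human would give, because "kills", "fails" and "secure" mean something different in each of these contexts.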

Who buys that crap? I’m guessing the same type of companies that leave real research work to interns … and that consider a keen insight into the landscape too “low level” and “not strategic.” IMHO (and in the opinion of the folks that sign up for ITDatabase), the killer app for interpreting news is still the human brain, which turns out to be incredibly efficient when it’s fed truly comprehensive datasets on whatever category of tech news it needs to digest.

Can either Share of Voice or Sentiment Analysis be pulled off effectively? I’m sure. I just have yet to see a solution offering either one that #1 has a comprehensive, accurate dataset and #2, if there is semantic foo or taxonomies under the hood, produces results that aren’t extremely skewed by false positives.
