Search is the Internet OS. Before folks started talking about Web 2.0, the Semantic Web was all the rage. Sentiment Analysis emerges from a cloud of intersecting disciplines including search, the proliferation of user-authored content, the wisdom of the crowd, information retrieval techniques, academic research, data mining and progress in machine learning, emerging computational linguistics applications such as semantic categorization, content from the tail, and of course statistics.

For brand managers in the private sector (pharmaceutical, consumer, …), politicians, sociologists, Wall Street and others interested in buzz monitoring, sentiment analysis unlocks actionable information that was out of reach until now and creates real capitalistic value: real time brand, public image and reputation monitoring, product-related early warning systems, detection of unfavorable rumors for risk management, and customer satisfaction indices.

I have been doing search for a while: Microsoft before MS knew how important search was, Enfish desktop search that curiously never made it, Altavista before the Yahoo! acquisition, Yaga distributed P2P digital content search and eCommerce platform (right on but too early), local search at Infospace while local was hot, and now AOL Search with the FullView turnaround. Even so, it is sometimes still difficult to put things in perspective and really walk in our searchers’ shoes, in spite of all the focus groups, usability tests and eye tracking studies. So I decided to go on a quest for a generic definition of “sentiment analysis” and good synonyms, simply because I care about it and wanted to experience the search first hand. The query is [sentiment analysis], a simple “informational” query as Andrei Broder would put it, as opposed to “navigational” or “transactional”, and the right answer should be a collection of good links, maybe even a Onebox, FullView, Short cut, or SmartAnswer … answers. My information need revolves around Definitions, Companies, Technology, Academic coverage, Articles and Papers, and I am really not sure what to expect.

I started with ODP/DMOZ because it came up at work the other day. The Open Directory Project, with 5 million entries, is not particularly rich in the subject. Only 1 node, [Top: Business: Investing: Derivatives: Options: Research and Analysis], is related, and only if you match the content to the words of the query rather than to the intent behind it. What I am looking for is generic information about Internet “sentiment analysis”, not investment indices. I tried to navigate the taxonomy from Top >> Computers >> Internet, but found nothing under “buzz” nor “sentiment”. I’ll have to wait for ODP to be back up and become an Editor to create a “sentiment analysis” node. Can’t wait. I hear ODP is still pretty important to Webmasters, Publishers and Content owners in terms of ranking high on Google, and Editors still care passionately.

The Google search was pretty good. I first typed “sentiment analyses”, and got the “Did you mean” spellcheck suggesting “sentiment analysis”; nice. Interestingly enough, I believe I got one more sponsored link after I signed in, including 2 in the top premium placement. The top 10 results are pretty good and cover IBM research, PDF papers, the Wikipedia entry, and some blogs. Out of 4 Sponsored Links, 3 were investment-related. Not what I am looking for, although I am sure there are parallels between Internet sentiment analysis and investment trends. In the end, the most interesting results were i) organic: Data Mining: Text Mining, Visualization and Social Media, ii) the IBM definition, iii) the paid Sponsored link for Nstein. Overall, the Sponsored links are definitely not as relevant as the organic matches, because their coverage depends on economics rather than on content. Go figure: there are about 10 billion pages on the net and only maybe about 700,000 advertisers, which gives the organic index an unfair advantage, on the order of 3,000 times bigger if you assume advertisers have 5 pages each (a number I pulled out of nowhere).
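For the curious, here is that back-of-the-envelope math as a quick Python sketch. The three input numbers are the rough figures quoted above (the 5 pages per advertiser is my own guess), and the variable names are made up for illustration.

```python
# Rough figures from the post; none of these are measured values.
indexed_pages = 10_000_000_000   # about 10 billion pages on the net
advertisers = 700_000            # maybe about 700,000 advertisers
pages_per_advertiser = 5         # assumed, pulled out of nowhere

# Estimated number of pages reachable through sponsored links.
sponsored_pages = advertisers * pages_per_advertiser

# How many times bigger the organic index is than the sponsored one.
ratio = indexed_pages / sponsored_pages

print(f"sponsored pages: {sponsored_pages:,}")
print(f"organic index is about {ratio:,.0f} times bigger")
```

That works out to roughly 3.5 million sponsored pages, so the organic index comes out a little under 3,000 times bigger.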


Wikipedia barely had a “stub” about “sentiment analysis”, so I contributed a couple more definitions I ran into. I’ll clean that up as I learn more while searching next for “sentiment analysis” in Yahoo!, Ask, and the others.

Know of some good links about “Internet sentiment analysis”? What is it? Who is doing what? Who is paying for it? Any cool emerging linguistic technology like semantic categorization?