There are plenty of techno pundits out there who are on a mission and rarely dip into the data to check whether reality is cooperating with them. From the outside, one has to be somewhat creative to get at data that reflects consumer trends, but not that creative.
Below is the graph from Google Trends (Google Insights; a great tool) that shows search behaviour (let's call that consumer interest) around some key Microsoft product terms.
What does it suggest? If we use xbox (green) as a baseline (xbox is the best-selling gaming console on the planet and the companion of the kinect, the fastest-selling piece of consumer electronics ever), we might expect searches around windows 8 (blue) to exceed interest in xbox. Interest in lumia is approaching that in bing, and microsoft surface is clearly showing an uptick with no obvious downturn.
"Marketing director Rami Nuseir says he uses it to analyze movie tweets when he’s bored. In fact, one blogger recently did just that, to get some deeper insight into the new James Bond movie, Skyfall, with an analysis of 3,000 tweets. Another use case: a university in Canada has 40,000 transcripts of people’s nightmares as part of a research project trying to understand whether there are common entities or themes appearing in nightmares among different demographics that correlate in time with real-life events.
“This kind of use case was not available before there was an out-of-the-box, easy-to-use analytics tool that is cheap,” says Rogynskyy. The Excel plug-in was originally built as demonstration technology to give people a way to try out the main Semantria API solution, a RESTful API that processes text in real time and can handle any amount of traffic. Users who want direct access to the API tend to be resellers, such as social media monitoring companies or survey providers, who build a solution on the Semantria technology for their own customers."
"We’ve all become used to the clichéd ‘data is the new oil’. In digital media at least, data is what drives performance, data is what helps us make business cases, helps us to prove or disprove hypotheses and what ultimately justifies our fees. Having data at our fingertips is not new, it’s just that we have more of it now than we did before. If you’re in the data business, you’ll know the skill is not in the data collection or aggregation, anyone can do that, the skill lies in the interpretation. Clients today want ‘actionable insight’ (another over-used cliché in digital media circles), they don’t simply want reports. Clients want data to drive them forward, to feed their planning processes and to optimise performance. In doing a piece of data analysis I always liken the process to the peeling of an onion; there’s always another layer of information required to complete the picture and provide a more rounded answer to what’s going on. The ‘peeling’ can be the most challenging and yet rewarding practice for a digital marketer. Below I have described two examples of many, where my initial findings using a specific data source changed completely after further interrogation."
In a joint effort with Wegener Media we are developing a semantic editor and dashboard for journalists. SemBoard analyses the article a journalist is writing and, with Natural Language Processing technology, extracts (named) entities from the content. That information is used to provide the journalist with suggestions for related news, background information, and more! SemBoard also gives the journalist powerful faceting, search, and filter options to find additional information in our continuously growing database of semantically enriched content.
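As a toy illustration of the kind of extraction step described above (this is only a crude capitalization heuristic, nothing like SemBoard's actual NLP technology), an entity spotter might look like:

```python
import re

# Toy named-entity spotter: picks out runs of capitalized words that do
# not start a sentence. A stand-in heuristic only; real NLP pipelines use
# trained models, gazetteers, and context.
def crude_entities(text: str) -> list[str]:
    # Lookbehind requires a lowercase letter (or comma/semicolon) plus a
    # space before the capitalized run, so sentence-initial words are skipped.
    candidates = re.findall(r"(?<=[a-z,;] )((?:[A-Z][a-z]+ ?)+)", text)
    return [c.strip() for c in candidates]

# Picks out 'Amsterdam', 'Angela Merkel' (and, naively, 'Tuesday').
print(crude_entities("The mayor of Amsterdam met Angela Merkel on Tuesday."))
```

The false positive ("Tuesday") shows exactly why real systems go well beyond capitalization.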
Because images sometimes do say more than words, this is what the editor looks like:
"Eventster is a new application that aggregates events and then further differentiates them using semantic technologies. The result is additional information to help you identify events you want to attend, and those you don’t. This interesting information was found on TechCrunch in their article, “Eventster Adds Social Signals To Help You Find Events You’ll Like, Know Which To Avoid.” This new and interesting use of social media may seem more like a party search, but if you apply it to educational conferences, trade shows, and continuing education offerings, it is easier to see the benefit of this avenue of social search. It is like getting user reviews for events – before you go."
"I recently did a text-mining analysis of all the article titles for Cell Stem Cell for 2012. The results, expressed in a beautiful word cloud, were very insightful about the journal and the top areas of focus of authors publishing there. I just did the same analysis for the journal Stem Cells and the results are striking. See below.
The #1 word (other than “stem”, “cell” and “cells”, which I removed) is the same as Cell Stem Cell: human.
What does this mean?
Either (A) articles on human stem cells predominate, or (B) authors choose to emphasize the human nature of their stem cells in the titles of their articles because they believe that reviewers and editors may view it more favorably."
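The counting behind such a word cloud can be sketched in a few lines; the titles below are invented stand-ins, not the journal's actual 2012 titles:

```python
import re
from collections import Counter

# Hypothetical stand-in titles; the post ran this over the journal's
# actual 2012 article titles.
titles = [
    "Human induced pluripotent stem cells model early development",
    "Reprogramming human fibroblasts to pluripotency",
    "Stem cell niches in the adult human brain",
]

# As in the post, "stem", "cell" and "cells" are removed before counting;
# a handful of common function words are dropped too.
stopwords = {"stem", "cell", "cells", "the", "in", "to", "of", "a"}

words = re.findall(r"[a-z]+", " ".join(titles).lower())
counts = Counter(w for w in words if w not in stopwords)

# 'human' comes out on top, mirroring the word-cloud result.
print(counts.most_common(3))
```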
"Sentiment analysis is nothing new but is taking on urgency with the rise of mobile computing and the event enabled enterprise.
What is sentiment analysis?
Sentiment analysis is all about figuring out the attitude of someone speaking or writing. With so much being expressed through various social media across the Web, knowing the actual attitude behind the words is often more important than the literal words being transmitted.
We do this with software applications that allow us to use automation to track sentiments about products, brands and individuals and to understand whether they’re viewed positively or negatively. We analyze blogs, reviews, tweets and comments as broadly as they’re available."
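A minimal lexicon-based scorer illustrates the idea; the word lists here are illustrative, and real tools also handle negation, sarcasm, emphasis, and emoji:

```python
import re

# Toy sentiment lexicons -- illustrative words only, not a real resource.
POSITIVE = {"great", "love", "excellent", "good", "happy"}
NEGATIVE = {"bad", "hate", "terrible", "awful", "broken"}

def sentiment(text: str) -> str:
    """Classify text by counting positive vs. negative lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this phone, the camera is great"))   # positive
print(sentiment("terrible battery, broken screen"))          # negative
```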
"I can’t speak for all soccer analytics experts, but I think it can be broadly said that the eventual hope in the field is for a) a set of models and metrics with reliable predictive quality, over an entire season or the course of single game, that can be isolated for regional tactical preferences and player skill (i.e. what a team should do to win more games across all possible worlds); and b) a set of metrics for scouts to help reasonably evaluate the long-term utility of a position player. But while those goals would truly set the field apart and give it a strong dollar value, it’s likely they are chimerical. In reality, analytics experts just want to make better cases about how individual players should behave together in a football match against a particular opposition in order to win. And they want to make accurate qualitative statements about why a player played well on the day."
"A grateful tip of the hat to Zodiac Killer Cipher Meister Dave Oranchak, for passing me a lovely little story in the Daily Mail (and now many other newspapers) about a plucky Second World War carrier pigeon found dead in a Surrey chimney. So far so mundane (there were 250,000 in the Royal Pigeon Service, and many failed to reach home): but what sets this particular one well apart was that it was carrying an enciphered message.
My transcription of the message looks like this (where there’s ambiguity, I’ve included the possibilities in square brackets, though I’d recommend sticking with the first of each set):
The first and last group (both “AOAKN”) almost certainly denote a key reference, a feature common to many cipher systems. The “27 1525/6” bit is probably not part of the code but a military reference of some sort – I’d predict day of current month (27th) & time of day (3.25pm). It also seems vaguely possible that the dots delimit sentences, but they could just as easily be bits of dirt."
"Empathy can trick the smartest of brains into being fooled by a swindler, says a new US study.
The network of neurons (nerve cells) that allows us to empathise also dumbs down the brain's analytical network, says the study by the Case Western Reserve University.
The findings show for the first time that we have a built-in neural constraint on our ability to be both empathetic and analytic simultaneously.
When the analytic network is engaged, our ability to appreciate the human cost of our action is repressed, the journal NeuroImage reports.
At rest, our brains cycle between the social and analytical networks. But when presented with a task, healthy adults engage the appropriate neural pathway, the researchers found, according to a Case Western statement.
"This is the cognitive structure we've evolved," said Anthony Jack, assistant professor of cognitive science at Case Western who led the new study.
"Empathetic and analytic thinking are, at least to some extent, mutually exclusive in the brain," Jack said.
The work suggests that established theories about two competing networks within the brain must be revised."
"Many of my fellow foreigners arrive at my blog while searching for the most commonly used English words, and there’s a good chance that you may be one of them!
‘The top 100 most commonly used English words’, ‘top 500 English words’, ‘English word frequency lists’ – such and similar keywords are used by thousands of foreign English speakers eager to improve their English fluency.
But are these English word lists any good? Do they offer good value in terms of improving one’s ability to speak fluently?
Frankly speaking, such frequency lists don’t provide a lot of practical value – if any!
Fair enough – give me a few moments and I’ll show you exactly why!"
"All those involved with text analytics agree that the need for tools to analyze unstructured text is only going to grow. Agencies are using new tools and powerful processing to analyze terabytes of data in multiple formats to improve airline safety, hunt for terrorists, detect bio threats and more. Find out how.
“The government is struggling in all organizations with how to harness big data,” said Fiona McNeill, text analytics product marketing manager at SAS. “You don’t have to boil the ocean when you have text analytics. You can extract just what is relevant to begin with and then investigate that for the value.” Chris Biow, federal CTO at MarkLogic, agrees. “Any agency in the government that deals in any respect with the public should be using text analytics now,” he told GCN. “It’s maybe only being used now in 20 percent of the cases where it should. It’s as broad as treaty compliance versus watching public sentiment toward the United States overseas to predict a riot. All of that is out there.” Unfortunately, the reluctance of public sector organizations to talk about their implementations of text analytics means there are few case studies to guide those interested in possibly implementing it."
For years, companies have been collecting and analyzing feedback from their employees to curb attrition, boost morale and gauge employee satisfaction. But gathering and parsing this unstructured data is a time-consuming and labor intensive undertaking.
For one thing, the sources of employee feedback—namely performance appraisals and employee surveys—can generate lots of data. What’s more, unstructured employee feedback is notoriously nuanced, requiring HR professionals to carefully read between thousands of lines of data for subtle hints and clues to an evaluator’s meaning or workforce sentiment.
Text analytics is a meaningful way to address this issue. Instead of relying on spreadsheets, HR professionals can turn to text analytics tools that promise to provide a more complete picture of employee satisfaction and workforce trends. By listening to employees’ needs, companies are better able to engage them, and thereby drive productivity, boost profitability and increase customer satisfaction. That is, if they ask the right questions and take the time to put new insights into action.
"Without a well-defined plan that puts social media analytics into a broader enterprise context, and a set of technologies that can effectively support that process, organizations can miss the mark -- badly -- in trying to parlay isolated insights gleaned from social networking data into strategic business intelligence.
In response to the surging popularity of social networking sites like Facebook and Twitter, companies are frantically setting up social media "listening posts" and empowering departments to take action based on what they hear. But treating social media data as an island unto itself is a big mistake, said Katie Paine, chairman of consultancy KDPaine & Partners in Berlin, N.H., and chief marketing officer at its Dubai-based parent company, News Group International.
"The reality is that social media is just a piece of the broader mix," Paine said. "You can't just think in terms of social media -- you have to think in terms of the business."
That alone won't guarantee social media data analysis success, though. Paine and other analysts cited a variety of challenges that organizations need to address as they plan and push forward with social media monitoring and analytics programs, including these:"
"ArchiveGrid connects you with primary source material held in archives, special collections, and manuscript collections around the world. You will find historical documents, personal papers, family histories, and more. ArchiveGrid also helps researchers contact archives to request information, arrange a visit, and order copies.
ArchiveGrid includes collection descriptions from WorldCat bibliographic records and from finding aids harvested from ArchiveGrid contributors' websites. If you have questions about your collection descriptions in ArchiveGrid, please get in touch with us. Interested in contributing? Please let us know that as well.
OCLC is phasing out its offering of ArchiveGrid as a subscription-based service in WorldCat Local, FirstSearch, and at http://archivegrid.org, and will replace it with this OCLC Research ArchiveGrid system according to the following timetable:
- Beginning in November 2012, we will no longer require authentication (by IP address or logon account) to use the stand-alone ArchiveGrid subscription service at http://archivegrid.org. This will provide easier access to and better syndication of ArchiveGrid and its collections to search engines like Google.
- By December 2012, the OCLC Research version of ArchiveGrid will transition from its current beta status to a production service.
- In January 2013, the OCLC Research version of ArchiveGrid will replace the http://archivegrid.org interface."
"Currently, as I've mentioned in previous posts, beaches are a strangely under-served segment of the local search space. Searches on Google and Bing for beaches are fielded by entities such as resorts and restaurants that happen to be matches for certain beach related terms. If you search for 'beaches in kauai' you will get hits for beach resorts, etc.
There is plenty of content about beaches, from the many dedicated locale sites to general travel related community sites (like Trip Advisor) and editorial sites (like Fodor's). In addition, there are a number of resources that aggregate structural data about beaches. These include open data resources like GeoNames and GNIS but also proprietary resources like Foursquare.
Unfortunately, there is nothing that brings all these things together. There is no product that provides an aggregate view of the set of beaches or the collection of things said or otherwise reported about them.
With an upcoming trip to Hawai'i at the end of the year, I wanted to make sure I was getting the best value for my travel dollars. I've built a prototype beach search engine that provides the following:
a partly curated set of beach data covering approximately 12,000 international beaches
aggregation of beach related content
search functionality (so you can search for kid-friendly beaches that offer good snorkeling)
summarization of Flickr images, so you can form an impression of what it's like to be at the beach
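The kid-friendly-plus-snorkeling search could be served by a simple faceted filter over beach records; the records and field names below are hypothetical, not the prototype's actual schema:

```python
# Hypothetical beach records; the real prototype aggregates roughly
# 12,000 beaches from sources such as GeoNames, GNIS, and Foursquare.
beaches = [
    {"name": "Poipu Beach", "island": "Kauai", "kid_friendly": True,  "snorkeling": True},
    {"name": "Polihale",    "island": "Kauai", "kid_friendly": False, "snorkeling": False},
    {"name": "Hanauma Bay", "island": "Oahu",  "kid_friendly": True,  "snorkeling": True},
]

def search(records, **facets):
    """Return records matching every requested facet value."""
    return [r for r in records if all(r.get(k) == v for k, v in facets.items())]

hits = search(beaches, island="Kauai", kid_friendly=True, snorkeling=True)
print([r["name"] for r in hits])  # ['Poipu Beach']
```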
I believe there is plenty of potential for such a system. I've already found some hidden beaches that I wasn't aware of at our destination that I'm excited to check out when we get there. My goal is to make the system public in the next few weeks (my trip will be a forcing function for this!)."
"Might a presidential campaign have another use for tens of thousands of mini-memoirs? That’s the central thrust of a project under way in Chicago known by the code name Dreamcatcher and led by Rayid Ghani, the man who has been named Obama’s “chief scientist.” Veterans of the 2008 campaign snicker at the new set of job titles, like Ghani’s, which have been conjured to describe roles on the re-election staff, suggesting that they sound better suited to corporate life than a political operation priding itself on a grassroots sensibility. Indeed, Ghani last held the chief-scientist title at Accenture Technology Labs, just across the Chicago River from Obama’s headquarters. It was there that he developed the expertise Obama’s campaign hopes can help them turn feel-good projects like “share your story” into a source of valuable data for sorting through the electorate."
"I'm processing hundreds of thousands of files, and potentially millions later on down the road. A bad file will contain a text version of an Excel spreadsheet or other text that isn't binary but also isn't sentences. Such files cause CoreNLP to blow up (technically, they just take a very long time to process, around 15 seconds per kilobyte of text). I'd love to detect these files and discard them in sub-second time.
What I am considering is taking a few thousand files at random, examining the first, say, 200 characters, and looking at the distribution of characters to determine what is legit and what is an outlier: for example, files with no punctuation marks, or too many of them. Does this seem like a good approach? Is there a better one that has been proven? I think this will work well enough, though it may occasionally throw out potentially good files.
Another idea is to simply run with the tokenize and ssplit annotators and do word and sentence counts. That seems to do a good job as well and returns quickly, though I can think of cases where this might fail too."
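The character-distribution idea can be sketched in a few lines; the thresholds below are illustrative guesses, not tuned or proven values:

```python
# Sketch of character-distribution triage: inspect the first 200 characters
# and flag files whose character mix does not look like sentences.
# Thresholds are illustrative assumptions, not tuned values.
def looks_like_prose(sample: str) -> bool:
    sample = sample[:200]
    if not sample:
        return False
    letters = sum(c.isalpha() for c in sample)
    spaces = sample.count(" ")
    punct = sum(c in ".,;:!?'\"" for c in sample)
    # Prose is mostly letters, has spaces between words, and has some
    # (but not overwhelming) sentence punctuation.
    return (letters / len(sample) > 0.6
            and spaces / len(sample) > 0.1
            and punct / len(sample) < 0.1)

print(looks_like_prose("The quick brown fox jumps over the lazy dog."))  # True
print(looks_like_prose("12.3\t45.6\t78.9\t0.1\t2.3\t4.5"))              # False
```

A tab-delimited numeric dump (the "text version of an Excel spreadsheet" case) fails the letter-ratio check immediately, while ordinary sentences pass.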
"Imagine an expressive object. A book, a painting, or a website will do. This object, in literary terms, constitutes a text, with a meaning to be interpreted by readers.
Assume that a given audience would like to read this text (that is, the text is not already subject to some form of internalized, repressive foreclosure). However, you, for any number of reasons, wish to intercede and prevent them from reading it. How might you go about doing this?
One way to do it would be to act on object itself: that is, to subject the object to some form of overt cultural regulation. If it is a painting, you might ask a museum to remove it from its walls. If it is a book, you might demand a library remove it from its shelves, or even organize a burning in the town common. If it is an embarrassing or dangerous government secret, you might classify it and keep it under lock and key, available to only those with the proper clearance."
"When it comes to self-improvement, few people consider their reasoning skills. Most of us simply assume — and take for granted — that under most circumstances, we formulate perfectly rational opinions. But according to an emerging subculture of rationality gurus, there’s still plenty of room for improvement. They believe there are ways we can train ourselves to make better decisions, as well as increase personal control over our lives, health, and happiness. Here are a few of their ideas about how you can become more rational. To better understand rationality and how we can improve upon it, we spoke to Julia Galef, co-host of the Rationally Speaking Podcast, and the president and co-founder of the Center for Applied Rationality, a nonprofit think tank that teaches math- and cognitive science-based techniques for effective decision-making.
After speaking to Julia, it became clear that rationality is coming to be seen as a kind of cognitive enhancement — a likely explanation as to why so many lifehackers and futurists have started to take interest. And as we also learned, becoming more rational is not as difficult as it may sound. When it comes to clearer thinking, all we often need to do is make a minor adjustment."
"When a group of young sociolinguists started an annual conference called New Ways of Analyzing Variation four decades ago, they focused on variation of the spoken kind, looking at how speech patterns relate to group identity. But at the 41st gathering of NWAV a week ago, at Indiana University Bloomington, papers on traditional ways of speaking shared the limelight with something the founders couldn’t have predicted: the 21st-century terrain of computer-mediated language.
Twitter, in particular, merited a whole panel, with papers on the medium’s changing slang (are your Twitter followers “tweeps,” “tweeple,” or “tweeties”?) and on the way that the Spanish verb “gustar” (meaning “to like”) gets used in different parts of the Spanish-speaking Twittersphere. A third paper crunched through millions of tweets to detect gender differences in language use, not just in dictionary words but in such electronic shorthand as “xoxo” (for hugs and kisses) and emoticons."
"I’m a frequentist. But lots of very smart people aren’t. This post isn’t an argument for or against either philosophy. It’s just to alert you that this philosophical conflict exists, that it is very deep, and that you, as a working scientist, need to be familiar with it in order to make an informed choice of statistical approach. One thing frequentists and Bayesians agree on is that it’s a bad idea to do “cookbook statistics”, where you just mindlessly choose and follow some statistical “recipe” without worrying about why the recipe works–or even about what it’s trying to cook! I agree with Ellison and Dennis (2010) that ecologists should be “statistically fluent”, although I disagree with them that taking calculus-based technical courses in statistics is the only way to achieve fluency. Note that “fluency” is not at all the same thing as “technical proficiency”. If anything, I think one unfortunate side effect of the increasing popularity of technically-sophisticated, computationally-intensive statistical approaches in ecology has been to make ecologists even more reluctant to engage with philosophical issues–i.e. less fluent, or else less likely to care about fluency. It seems like there’s a “shut up and calculate the numbers” ethos developing, as if technical proficiency with programming could substitute for thinking about what the numbers mean. Lee Smolin noted a similar trend in fundamental physics."
"Not everything that counts can be counted, and not everything that can be counted counts" – sign in Einstein’s Princeton office
This quote is from one of my favorite survey reminder postcards of all time, along with an image from the Emilio Segre visual archives. The postcard layout was an easy and pleasant decision, made in association with a straightforward survey we have conducted for nearly a quarter century. …If only social media analysis could be so easy, pleasant or straightforward!
The main point I want to make here is about the nature of variables in social media research. In a survey, you ask a question determined in advance and have a fixed set of answers to work with in your analysis; in social media research, by contrast, you are free to choose your own variables. Each choice brings with it a set of constraints and advantages, and some fit your data better than others. But that path to analysis can be more difficult to take, and more justification of the choices you make is important. A quantitative analysis, which can sometimes include arbitrary or less-clear choices, is therefore best supplemented with a qualitative analysis that delves into the answers themselves and why they fit the coding structure you have imposed."