Researchers at Clarity Lab at the University of Michigan recently launched Sirius, which they refer to as “an open end-to-end standalone speech and vision based intelligent personal assistant (IPA)….” It was introduced, primarily to the academic community, in the proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 2015, where you can read about its design in detail. Even though I associate the abbreviation “IPA” with “India Pale Ale,” I have to give props to Clarity Lab for taking on the challenge of providing human-like responses to utterances, text input and images across knowledge domains and at scale.
The paper’s authors observe that intelligent assistants are “emerging as one of the fastest growing Internet services” and attribute that growth to the proliferation of IPAs on a variety of platforms – iOS, Android and Windows Phone. They expect even greater growth as IPAs are incorporated into wearable technology, like smart watches. They also observe that the ability to respond in real time to spoken words relies on a series of computationally intensive activities, most notably natural language understanding, speech recognition and (increasingly) video or image processing. Such activities require the sort of massive datacenters that only a small number of very large companies – think Google, Apple, Microsoft, IBM, Amazon – can afford to operate.
To derive their “best answers” to queries and gestures, Google, Microsoft, Apple, Nuance and IBM take a brute force approach to capturing and analyzing queries, utterances, product manuals and all other sorts of Big Data. This means that the current approach to Intelligent Assistance militates toward a marketplace that is dominated by a handful of very large service providers. On the one hand, an “open source” approach to IPA suggests a more democratized marketplace. On the other, Sirius leverages existing “well-known” resources. The speech recognition had been Carnegie Mellon’s open source Sphinx but now is Google Voice. Support of “question answering” is built on the lesser known OpenEphyra framework, but now may be the product of collaboration with IBM’s Watson business unit. An added capability is “image recognition” or image matching, enabling Sirius to answer questions about a captured image.
In this video, University of Michigan’s researchers explain the objectives of Sirius in more detail:
On one level, as explained in depth in the published paper, they are exposing an anticipated performance gap between the demands of intelligent assistance and the ability of generic data centers to support such activities at scale. On a more subversive level, they are empowering developers to build their own solutions and, perhaps, show more creativity and capability than we’ve seen from the companies that “have had a monopoly on this infrastructure.” That said, it is interesting to note that some representatives of those monopolies are backers of the initiative. The project is supported by Google, the Defense Advanced Research Projects Agency (DARPA) and the National Science Foundation.
But don’t be fooled. Sirius is not a direct competitor of Siri, Cortana or Google Now, which have grown far beyond Q&A platforms and are growing into more advisory roles. Cortana, for instance, includes command and control of Windows-based applications and communications utilities, as well as a machine learning component that tracks an individual’s location, Web surfing and search activities in order to tailor responses, engage in conversations and learn over time. Ditto for Google Now, which also has the capability to deliver observations (“you’re 26 minutes from your office”) and suggested actions in the form of cards on the Google mobile app as well as elsewhere. Siri has device control and message dictation baked in but, you must remember, was initially conceived as a “do engine” with tight links from utterance to action (e.g., “Book me a table at Slanted Door at 6:30”). As described in the ASPLOS publication and illustrated on YouTube, Sirius has the more modest goal of enabling individuals to ask questions in their own words or by taking a picture and then benefiting from the collective wisdom of IBM Watson.
Sirius should have great appeal to a broad community of developers. It answers the recognized need for a simple way for a large group of developers to get better acquainted with state-of-the-art technologies for NLU and automated Q&A. As the use cases and code base grow, it will also help researchers address the scalability gap that their model anticipates as the number of users, applications and service providers explodes. Democratizing intelligent assistance has to start somewhere.