Odds and Ends 

Microsoft Maluuba teaches management 101 to machines in its first paper since being acquired

In mid-January, the ongoing race for AI put Montreal-based Maluuba on our radar. Microsoft acquired the startup and its team of researchers to build better machine intelligence tools for analyzing unstructured text and enabling more natural human-computer interaction: think bots that can actually respond with reasonable intelligence to a text you send. The team has now released its first paper since being acquired, and it sheds light on the group's priorities.

The paper outlines a method for multi-advisor reinforcement learning that breaks problems down to be simpler and more easily computable. In oversimplified terms, Maluuba is effectively trying to teach leadership to groups of machines working to solve problems.


Existing conversational interfaces are rigid and easily broken. Siri, Alexa and Cortana are miles ahead of old-fashioned dialog trees, but they are still a far cry from generalized intelligence. From a computational standpoint, a complete model of the world would be infeasible to create, so engineers instead build specialized machine intelligence tools that perform well on a smaller number of tasks. This is why you can ask Siri to make a phone call but can't ask it to organize a large dinner event.

A lot of attention is being given to reinforcement learning, a specialized branch of machine learning. As I have explained previously, reinforcement learning steals the idea of utility from economists in an effort to quantify and iteratively evaluate decision making. Instead of explicitly telling an autonomous car every rule of the road, it can be more effective to gamify the problem and assign figurative points that the intelligent system can optimize. The system could hypothetically lose points for driving over a double yellow line and gain them for maintaining the speed limit.
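To make the points idea concrete, here is a minimal sketch of what such a scoring rule might look like. The function name, the parameters and the point values are all assumptions chosen for illustration; they do not come from Maluuba or from any real driving system.

```python
def driving_reward(crossed_double_yellow: bool, speed: float, speed_limit: float) -> float:
    """Hypothetical per-step reward for a driving agent (illustrative values only)."""
    reward = 0.0
    if crossed_double_yellow:
        reward -= 10.0  # assumed penalty for crossing a double yellow line
    if speed <= speed_limit:
        reward += 1.0   # assumed bonus for staying at or under the speed limit
    return reward
```

A reinforcement learning system would try to pick actions that maximize the sum of these rewards over time, rather than follow a hand-written rulebook.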

This allows for a much more adaptable system, but it unfortunately is still a rather complex problem requiring a lot of compute. This is where multi-advisor reinforcement learning comes in.

Photos: The Maluuba team working at their office.


The Maluuba team is trying to solve the complexity problems that face reinforcement learning. Their approach is to use multiple advisors to break the problem down into smaller, more digestible chunks. Traditionally, a single virtual agent is used for reinforcement learning, but in recent years multi-agent approaches have become more common.

In a conversation, the group presented the example of an intelligent scheduling assistant. Rather than have a single agent learn to schedule every kind of optimal meeting, it could someday make sense to assign a different agent to different classes of meetings. The challenge is getting all these agents to work together in consonance.

Intuitively, it's easy to imagine these agents as humans splitting up a task. Getting people to work together efficiently is no small feat, even though a divide-and-conquer strategy can outperform the lone-wolf mentality.

The solution is to have an aggregator sit on top of all the advisors to make a decision. Each advisor in Maluuba's paper has a different focus with respect to the grand problem being solved. Each advisor gets a different reward for the action it specializes in. If advisors take different positions, the aggregator steps in and arbitrates.
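As a rough illustration of an aggregator arbitrating between advisors, the sketch below simply adds up each advisor's score for every action and picks the action with the highest total. Summation is only one possible aggregation rule and is an assumption here, not necessarily the method used in Maluuba's paper. The advisor names, actions and scores are hypothetical.

```python
from typing import Callable, Dict, List

Action = str
# An advisor maps a state to a score for each action it has an opinion on.
Advisor = Callable[[dict], Dict[Action, float]]


def aggregate(advisors: List[Advisor], state: dict, actions: List[Action]) -> Action:
    """Pick the action with the highest summed score across all advisors."""
    totals = {action: 0.0 for action in actions}
    for advisor in advisors:
        scores = advisor(state)
        for action in actions:
            # An advisor with no opinion on an action contributes nothing.
            totals[action] += scores.get(action, 0.0)
    return max(actions, key=lambda action: totals[action])


# Two hypothetical advisors with different concerns and conflicting advice.
def reward_seeker(state: dict) -> Dict[Action, float]:
    return {"left": 1.0, "right": 0.2}


def danger_avoider(state: dict) -> Dict[Action, float]:
    return {"left": -5.0, "right": 0.0}


choice = aggregate([reward_seeker, danger_avoider], state={}, actions=["left", "right"])
```

Here the first advisor prefers "left", but the second advisor's strong objection outweighs it, so the aggregator settles on "right".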

Maluuba used a simplified version of Ms. Pac-Man, called Pac-Boy, to test different aggregation methods for its multi-advisor reinforcement learning framework. The team wants to study the process of breaking down problems. Ideally, there is some universality in how problems can be organized around a number of optimal aggregators. This is another place where it's interesting to think about how humans decompose problems, often inefficiently: think leadership 101 for machines.

Why you should care

Multi-advisor reinforcement learning can save CPU and GPU power. Breaking down a problem also makes it easier to distribute across different servers for parallelized processing. Reduced complexity is universally helpful for all reinforcement learning problems.

The research team explained to me that it's still early days for working alongside Microsoft. They're transitioning to Azure and building out communication channels between existing machine learning teams. But when that process is complete, it strikes me that Maluuba will play a huge role in analyzing text and the language held within it.

While reinforcement learning itself isn't novel, Maluuba is pouring a lot of resources into it. We have already seen the potential of reinforcement learning in DeepMind's AlphaGo. Future joint research projects could bring more efficient and adaptable reinforcement learning into new consumer- and enterprise-facing dialog products for Microsoft.

Read more: https://techcrunch.com/2017/04/06/maluubarl/
