On using machines to help humans to understand natural language faster, cheaper and better and on feeling comfortable with your life choices
Vijaykant Nadadur is the Co-Founder & CEO of Stride.AI. His expertise spans the areas of Artificial Intelligence and Natural Language Understanding. He is also a mentor at Techstars Paris and Bangalore. He has lived in 6 countries and speaks 7 languages.
On rethinking how audio is captured, represented and retrieved in this new world of AI
Ishwarya Ananthabotla completed her BS and MS in Electrical Engineering and Computer Science from MIT. She is pursuing a PhD in the MIT Media Lab’s Responsive Environments group, exploring ways to capitalize on our knowledge of human perception, cognition, memory, and attention, to re-think traditional paradigms for audio capture, representation, and retrieval.
Well, the short answer is “Yes”; the longer answer is “not right now, but in a few years”. I know that is very provocative and most people would disagree, especially radiologists. Stanford researchers have reported machine learning models that can help detect brain aneurysms more effectively than radiologists working unaided.
What is a brain aneurysm?
A brain aneurysm is a bulge in a blood vessel in the brain that could leak or burst, causing a brain hemorrhage, damage or even death.
So the question is: if AI can do a better job of identifying aneurysms, can it be used in place of radiologists performing that function? If it can, we will be limited not by the number of radiologists available but by the number of servers we can add. Moore’s law applies to machines, not to humans, so over time it would become cheaper to deploy AI radiologists than human ones.
This new AI tool is apparently built on an algorithm called HeadXNet. The researchers note, however, that the results depend on “scanner hardware and imaging protocols”, which are not standardized. Providers (hospitals or labs) may have different hardware and use different imaging techniques, which will influence whether the AI tool makes or misses an accurate diagnosis.
“A.I. could play a big role in supporting prevention, diagnosis, treatment plans, medication management, precision medicine and drug creation” __Bruce Liang, Chief Information Officer of Singapore’s Ministry of Health
In software development, versioning is one of the key tenets of good programming. With a version control system such as git, svn or cvs, one can go back in history to troubleshoot bugs or reverse a deployment. Wouldn’t it be cool if a similar system existed in medical imaging, one that helped radiologists quickly “see” whether a treatment is positively or negatively affecting the patient? Computer vision can process images and highlight differences between two or more of them in real time. That means a radiologist need not spend hours retrieving and interpreting a patient’s images and identifying the differences; with the click of a button on their phone, they can see highlights of what has changed between images.
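To make the “diff between two scans” idea a little more concrete, here is a minimal sketch in plain Python. It assumes two grayscale images of the same size that are already aligned with each other (in practice, aligning scans is the hard part, and it is not shown here). The function names `changed_pixels` and `changed_fraction`, the threshold of 30 and the tiny sample “scans” are all made up for illustration; this is not how any real medical imaging product works.

```python
# Hypothetical sketch: flag pixels that changed between two aligned
# grayscale scans. Images are plain lists of rows with values 0-255.

def changed_pixels(before, after, threshold=30):
    """Return (row, col) positions where intensity changed by more than threshold."""
    if len(before) != len(after) or any(
        len(r1) != len(r2) for r1, r2 in zip(before, after)
    ):
        raise ValueError("scans must have the same dimensions")
    return [
        (r, c)
        for r, (row_b, row_a) in enumerate(zip(before, after))
        for c, (pb, pa) in enumerate(zip(row_b, row_a))
        if abs(pa - pb) > threshold
    ]


def changed_fraction(before, after, threshold=30):
    """Fraction of pixels flagged as changed: a crude 'how much is different' score."""
    total = sum(len(row) for row in before)
    return len(changed_pixels(before, after, threshold)) / total if total else 0.0


# Two made-up 3x3 "scans": the bright spot in the middle fades over time.
week_1 = [[10, 10, 10], [10, 200, 10], [10, 10, 10]]
week_20 = [[10, 10, 10], [10, 90, 10], [10, 10, 12]]

print(changed_pixels(week_1, week_20))  # [(1, 1)]
```

A real system would work on full-resolution scans and would need to cope with differences in scanner hardware and imaging protocol, but the core operation of comparing two versions of an image is the same as a diff in version control.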
If there were a hypothetical medical imaging versioning system, how would it work? How would it be implemented and deployed in hospital systems, who would primarily use it, and how would it enhance treatment effectiveness?
“Medical imaging guides the course of much of patient care and is an essential element of biomedical research. From x-rays and ultrasound to computerized tomography (CT), functional magnetic resonance imaging (fMRI), and positron emission tomography (PET), medical imaging helps clinicians diagnose, treat, and understand a range of diseases and conditions, including cancer, cardiovascular disease, and neurodegenerative disorders.”
The Interagency Working Group on Medical Imaging (IWGMI) defined the above as the four key pieces for better healthcare through effective medical imaging. I’m particularly interested in the “Advanced Computation & Machine Learning” aspect of the roadmap.
Here is a set of breast cancer images for a patient taken at regular intervals. I’m not going to pretend to know what’s going on in the following image but anyone with half a brain can guess that it shows images of different stages of breast cancer and it’s trying to help the physician understand treatment’s effectiveness over time (week 1 through week 20).
For a radiologist to pull such a report, my sense is, it’s not straightforward. Retrieving images from disparate systems, putting them next to each other for quick and easy comparison and looking at the treatments (dosage etc) alongside the images and viewing that over time to get a sense of disease progression probably takes hours if not days.
This can be streamlined and automated using better image storage, retrieval and computer vision. If we can reduce the amount of time to generate this report from days/hours to minutes/seconds, it would save precious time for physicians and might be a life saver for the patient.
Next in this series, we will look at the current state of the art on this issue and then at the possibilities of using the latest Computer Vision (CV) techniques to save time for radiologists and pathologists.
“If the whole truth is told, oral tradition stands out as the single most dominant communicative technology of our species as both a historical fact and, in many areas still, a contemporary reality.” __John Foley, Signs of Orality
Johannes Gutenberg ushered in the era of the printing press and movable type around the year 1439. Thanks to him I’m able to write this blog, but what intrigues me is why we went from an aural tradition to a written one in the first place. What problems with aural methods did the printed word address?
How did people transfer knowledge, news, gossip before printing was a thing? Handwritten manuscript was the prevalent method for writing and sharing ideas. However, the most popular method of transferring, sharing ideas with one another was through oral communication.
In ancient India, scriptures, folklore and stories were mainly transmitted orally. It is widely believed that the srutis of Hinduism (the Vedas) were never written down but were passed from generation to generation solely orally. Signs of that can be seen even today in the way Hindustani music, the music of North India, is taught; music is the main focus of one of the four Vedas, the Samaveda. The notation, the structure of the composition (Raaga) and the Chalan (movements and interconnections between various notes) are some of the aspects still passed orally from Guru to Sishya in the so-called Guru-Sishya-Parampara.
“The Vedic texts were orally composed and transmitted, without the use of script, in an unbroken line of transmission from teacher to student that was formalized early on. This ensured an impeccable textual transmission superior to the classical texts of other cultures; it is, in fact, something like a tape-recording… Not just the actual words, but even the long-lost musical (tonal) accent (as in old Greek or in Japanese) has been preserved up to the present.”
— Michael Witzel
In Greece, it’s believed that Homer’s epic poetry (the Iliad and the Odyssey) was primarily composed, performed and transmitted orally.
Are we getting back to an oral/aural mode of transmitting ideas, more so than written script? This brings me to the point I’m trying to make with this writeup: if we can speak and machines can understand and converse with us just like humans, if not better, would oral communication become the predominant way we transmit ideas, commands and conversations?
If we can talk to the TV, the garage door and the thermostat, listen to books and magazines, and make annotations using audio markers rather than visual ones (you get the point), if I can talk to my phone and it can talk back to me (as Turing dreamed of), would we still want to type or read? Or would we rather talk and listen? What do you think?
“Alexa and Google Duplex are not perfect, but neither are humans. The only difference is that Alexa and Duplex are making great strides forward”
From 2008 to 2011 I worked at a healthcare IT company that used Automatic Speech Recognition (ASR), Speech-to-Text (STT) and Text-to-Speech (TTS) software to automate collecting insurance information from members on behalf of US health insurance companies, e.g. Blue Cross and United.
Speaking to technology as if you were chatting with a human has been the holy grail of digital user interface design, and I believe we are in the golden age of speech recognition.
Speech recognition will become so ubiquitous that we won’t have to type to chat with friends and family, in any language, or put up with an automated voice assistant that doesn’t understand what we are saying even after we repeat it 5 times. We will never have to fumble through many similar-looking buttons on the TV remote to watch the show we want, from any streaming app. A simple voice command will do it, as if summoning a human assistant to find and play the show from Netflix for us.
Alexa And Duplex Are Not Perfect But Neither Are Humans
In 2017, when I spoke to my Alexa at home in Telugu (my mother tongue), it missed what I said 9 out of 10 times. Now, in 2019, it’s already faring a lot better: the magic of NLP in the cloud. I’d say it’s still far from perfect, but it no longer gets just 1 out of 10 right, maybe 5!
Typing in a non-English language in iMessage or WhatsApp requires installing a language keyboard and finding several key combinations to type a single word; it’s not easy. Imagine simply speaking to a chatbot that types the text in a language of your choice effortlessly. Imagine being able to speak to a tourist in English while she instantly sees the text on her phone in her own language. It’s already possible to a large extent, and with assistants like Google Duplex we are well on our way to this “utopian” world where language is no longer a barrier to communication.
Here is a sample of Google Duplex making a call to reserve a spot at a hair salon. Not bad, huh?
In contrast, AI research has been in the works for many decades, starting in the 1950s, yet as many libraries, if not more, have been created in the past 10 years as programming languages were created in the last 50 years.
Here is a chart of AI libraries and how many people “follow” them on GitHub. Interestingly, the newest of the libraries, TensorFlow, seems to be a few orders of magnitude more popular than the others. That doesn’t mean it’s the best AI library; in fact, there is no such thing as a general-purpose AI library. My sense is that there are libraries that assist in developing AI applications, and some are better suited than others depending on the problem being solved, e.g. computer vision or natural language processing.
Let’s take a quick look at what each of these libraries is suited for
scikit-learn – a Python machine learning library with algorithms for data analysis tasks such as classification, regression and clustering
BVLC/Caffe – Berkeley Vision and Learning Center’s Caffe is a deep learning library for processing images. Caffe can process over 60M images per day with a single NVIDIA K40 GPU
Keras – a deep learning Python library that runs on top of TensorFlow, Theano or CNTK. Primarily an experimentation framework that supports fast experimentation with models.
CNTK – Microsoft Cognitive Toolkit is a deep learning library that can be included in Python, C# or C++ code. It describes neural networks as a series of computational steps in a directed graph.
mxnet – A flexible and efficient library for deep learning.
“Deep learning denotes the modern incarnation of neural networks, and it’s the technology behind recent breakthroughs in self-driving cars, machine translation, speech recognition and more. While widespread interest in deep learning took off in 2012, deep learning has become an indispensable tool for countless industries.”
PyTorch – is an open-source machine learning library for Python, based on Torch, used for applications such as natural language processing.
PyTorch is a Python package that provides two high-level features:
Tensor computation (like NumPy) with strong GPU acceleration
Deep neural networks built on a tape-based autograd system
Theano – is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.
Caffe2 – aims to provide an easy and straightforward way for you to experiment with deep learning and leverage community contributions of new models and algorithms.
Torch7 – is a scientific computing framework with wide support for machine learning algorithms that puts GPUs first. It is easy to use and efficient, thanks to an easy and fast scripting language, LuaJIT
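The phrase “tape-based autograd” in the PyTorch entry above can sound mysterious, so here is a toy sketch of the idea in plain Python. To be clear, this is not PyTorch’s implementation or API: the `Var` class, the `TAPE` list and the `backward` function are hypothetical names I made up for illustration. The idea is that every operation is recorded on a “tape” as it runs, and gradients are computed afterwards by replaying the tape backwards and applying the chain rule.

```python
# Toy reverse-mode automatic differentiation with an explicit tape.
# Illustrative only; this is NOT how PyTorch is implemented internally.

TAPE = []  # records (output, [(input, local_gradient), ...]) in execution order


class Var:
    """A number that remembers how it was computed."""

    def __init__(self, value):
        self.value = value
        self.grad = 0.0

    def __add__(self, other):
        out = Var(self.value + other.value)
        # d(a + b)/da = 1 and d(a + b)/db = 1
        TAPE.append((out, [(self, 1.0), (other, 1.0)]))
        return out

    def __mul__(self, other):
        out = Var(self.value * other.value)
        # d(a * b)/da = b and d(a * b)/db = a
        TAPE.append((out, [(self, other.value), (other, self.value)]))
        return out


def backward(result):
    """Replay the tape in reverse, applying the chain rule at each step."""
    result.grad = 1.0
    for out, inputs in reversed(TAPE):
        for var, local_grad in inputs:
            var.grad += local_grad * out.grad


x, y = Var(3.0), Var(4.0)
z = x * y + x          # dz/dx = y + 1 = 5, dz/dy = x = 3
backward(z)
print(x.grad, y.grad)  # 5.0 3.0
```

Real libraries do the same bookkeeping over whole tensors and on GPUs, which is what lets them train deep neural networks without anyone writing the derivatives by hand.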
OK, great, this is a list of a few of the hundreds of AI libraries. So what? I can google them myself, so what’s the point of this blog? I’m as confused as I was before reading it: if I am just starting out in AI, which library should I pick, and where should I start? The short answer is: pick any library. You are better off picking one, running with it and building something than not picking any. You have to do it eventually, so you might as well start now.
Of course, a better answer might be: what do you seek to solve? Are you looking to programmatically recognize people’s faces or cars in a photograph of a busy street? You might want to start with BVLC/Caffe. Here is a good presentation to get you started on Caffe
“To self-drive or not to self-drive is not the question” __Madhav SBSS
Sensors play a key role in self-driving cars. Radars are widely used because they work well in different weather and lighting conditions and are cheap. Lidars are high-density sensors but are not cheap, and they struggle in poor weather such as rain, fog and snow. Cameras are cheap and high-density but don’t perform well in poor lighting or weather. Our eyes work in a similar way to a camera, so the camera might be the right device for mimicking human visual perception.
Here is a visualization of where these sensors are being used today in developing self-driving car technology –
Here is a visual spider web representation of the effectiveness of each type of sensor –
Is driver state detection important in self-driving cars? Isn’t the driver supposed to be free to do whatever they want, as if sitting on a sofa at home? Here is a chart from Lex Fridman’s talk on what to track, with tracking difficulty increasing from left to right.
As you can see in the above picture, there are still many areas in red (hard to solve), which means there are plenty of opportunities to make a difference in the field of self-driving cars. Be part of the dream that brings safety and access to humans trying to get from point A to point B.
Here are some of the companies that are worth following if you are interested in self-driving cars, technologies and most importantly the people driving the technology forward.
Companies working on fully autonomous self-driving tech –
“To drive better than humans, autonomous vehicles must first see better than humans.” __Nvidia
How does the technology we briefly reviewed in part 1 of this blog actually work to make self-driving possible? Before jumping into how the tech comes together, we should understand the levels a car climbs through to become fully autonomous –
Levels of autonomy
Different cars are capable of different levels of self-driving, and are often described by researchers on a scale of 0-5.
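For readers who think in code, the 0-5 scale can be written down as a small enum. The level names below follow the commonly cited SAE J3016 wording, which this post does not spell out, so treat them (and the one-line summaries in the comments) as my shorthand rather than the official definitions. The `AutonomyLevel` and `driver_must_supervise` names are made up for this sketch.

```python
from enum import IntEnum


class AutonomyLevel(IntEnum):
    """Levels of driving automation on the 0-5 scale (names assumed from SAE J3016)."""

    NO_AUTOMATION = 0           # human does all the driving
    DRIVER_ASSISTANCE = 1       # car assists with steering OR speed
    PARTIAL_AUTOMATION = 2      # car assists with steering AND speed, human supervises
    CONDITIONAL_AUTOMATION = 3  # car drives in limited conditions, human takes over on request
    HIGH_AUTOMATION = 4         # car drives itself within a limited domain
    FULL_AUTOMATION = 5         # car drives itself everywhere


def driver_must_supervise(level):
    """At levels 0-2 the human has to watch the road the whole time."""
    return AutonomyLevel(level) <= AutonomyLevel.PARTIAL_AUTOMATION


print(driver_must_supervise(2))  # True
print(driver_must_supervise(4))  # False
```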
What are the challenges, and how are they being addressed with AI and tech? For example, the car needs to be able to “see” what is around it, how fast it is moving and how far away it is, and to do so extremely well in terrible weather or poor lighting, not just in good light. Here is a picture from the rearview wide-angle camera that shows 3 moving objects: first the software on the car needs to see them, then figure out how near to the car they are, while also detecting whether they are moving or stationary.
The car uses radar to detect the distance of objects. Radar works by sending out pulses of radio waves and receiving them back as they bounce off the surface of these objects. From the time a pulse takes to return and the speed at which radio waves travel, one can calculate the distance (distance = speed × time / 2, halved because the pulse travels out and back). The good thing about radar is that it works in low or zero light. A radar can tell distances and speeds but not what the objects are, whether humans, animals, vehicles, lamp posts and so on. Lidars (Light Detection and Ranging devices, often scanning 360 degrees) are being used to get a sense of what the object might be.
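Here is the radar arithmetic as a small worked sketch. It assumes the simple pulse-and-echo picture described above; production automotive radars typically use more sophisticated signal processing, so take this as the textbook version only. The function names and the 200-nanosecond echo delay are illustrative values I picked, not figures from any real sensor.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # radio waves travel at the speed of light


def radar_range_m(round_trip_time_s):
    """Distance to the target. The pulse travels out and back, so halve the path."""
    if round_trip_time_s < 0:
        raise ValueError("round-trip time cannot be negative")
    return SPEED_OF_LIGHT_M_PER_S * round_trip_time_s / 2


def closing_speed_m_per_s(range_then_m, range_now_m, elapsed_s):
    """Positive when the object got closer between two range measurements."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return (range_then_m - range_now_m) / elapsed_s


echo_delay_s = 200e-9  # 200 nanoseconds, an illustrative value
print(round(radar_range_m(echo_delay_s), 2))   # 29.98 (metres)
print(closing_speed_m_per_s(30.0, 28.0, 0.5))  # 4.0 (metres per second)
```

Note how short the times are: an object 30 metres away returns its echo in about a fifth of a microsecond, which is why this is a job for dedicated hardware rather than a stopwatch.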
There is little research going on in terms of sound and smell. First, focus has been on sight (being able to “see”). Imagine if cars can sense sounds and smells in and around them, wouldn’t that be interesting? I digress….
So, cars see by using sensors (radars, lidars, cameras). These sensors capture images, distances and 3D mappings of the surroundings, and that data is fed into the CPU either in the car or in the cloud. Software (image recognition, data processing and decision logic) interprets the data and sends commands to the accelerator, brake and steering controls to navigate: to slow down, brake, turn or speed up.
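As a very rough sketch of the “decision logic” step, here is what turning sensor readings into a command could look like. This is a toy rule based on time-to-collision, not how any real self-driving stack decides; the function names, the command strings and the 2-second and 4-second thresholds are all assumptions I made for illustration.

```python
def time_to_collision_s(distance_m, closing_speed_m_per_s):
    """Seconds until impact if nothing changes; infinity if the gap is not shrinking."""
    if closing_speed_m_per_s <= 0:
        return float("inf")
    return distance_m / closing_speed_m_per_s


def choose_action(distance_m, closing_speed_m_per_s,
                  brake_below_s=2.0, slow_below_s=4.0):
    """Map one obstacle reading to a command for the car's controls."""
    ttc = time_to_collision_s(distance_m, closing_speed_m_per_s)
    if ttc < brake_below_s:
        return "brake"
    if ttc < slow_below_s:
        return "slow_down"
    return "maintain_speed"


print(choose_action(10.0, 8.0))   # brake (1.25 s to impact)
print(choose_action(30.0, 10.0))  # slow_down (3 s to impact)
print(choose_action(30.0, -2.0))  # maintain_speed (object is moving away)
```

A real car has to do this for many objects at once, many times per second, while also deciding where to steer, which is where the image recognition and data processing pieces come in.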
“Google is working on self-driving cars and they seem to work. People are so bad at driving cars that computers don’t have to be that good to be much better” __Marc Andreessen
What’s the big deal with self-driving cars anyway? Why do we need them? Can’t humans do any work anymore? Are we so lazy that we just want to be transported from place to place without lifting a finger? If the car drives itself, what are we going to be doing? Watching videos? Chatting? Taking selfies? Reading a book? Getting more work done? Well, these are a few questions that popped into my mind as I visualized a world where people are not driving their cars but cars are driving people to their destinations.
I was impressed by this image Lex Fridman presented in his MIT Self-driving lecture series on what’s the BFD with self-driving cars
That’s not the point of this note, though. I’d like to explore how Artificial Intelligence is helping cars drive themselves, what some of the open challenges are, and how A.I. might help solve these problems in the future. How might autonomous cars reduce accidents and give people who can’t drive an option to “drive”?
Three key things make self-driving possible: sensors, software and connectivity. The pieces involved are –
LIDAR (LIght Detection And Ranging)
Radar (Radio waves to detect objects, angles, distance etc)
Ultrasonic & Others (Odometer and other “close to vehicle” sensing)
Cameras (To “see” signal lights)
Software (To process all the sensor data)
Internet (to communicate with cloud or other vehicles)
GPS (positioning system so the car knows where it is to the centimeter, which today’s GPS cannot support)
Hear an interesting podcast on the shift to self-driving cars
Sensors collect millions of data points including objects on the sides, front and back, other moving vehicles nearby.
Software and algorithms process the data collected through sensors and make decisions on acceleration, braking, turns, speed and so on.
Connectivity helps the car “know” road conditions, weather, construction and traffic (is that still going to be an issue? Maybe not, as long as there are no disruptions like construction, a drunk person walking across the road or a sleeping cow).
Will self-driving cars look like the cars of today? Perhaps not. There is no need for a steering wheel, windows, wipers, mirrors, lights or foot pedals. On the flip side, not everything on a car is there only for functionality; some of it is also there for esthetic reasons.
So anyway, how does self-driving technology actually work? We will see in Part 2 of this writeup.