With Love, A.I: Self Driving Cars (Part 1 of 3)

“Google is working on self-driving cars and they seem to work. People are so bad at driving cars that computers don’t have to be that good to be much better” __Marc Andreessen

What’s the big deal with self-driving cars anyway? Why do we need them? Can’t humans do any work anymore? Are we so lazy that we just want to be transported from place to place without lifting a finger? If the car drives itself, what are we going to be doing? Watch videos? Chat? Take selfies? Read a book? Get more work done? Well, these are a few questions that popped up in my mind as I visualized a world where people are not driving their cars but cars are driving people to their destinations.

I was impressed by this image, which Lex Fridman presented in his MIT self-driving car lecture series, on why self-driving cars are such a BFD.

source: MIT Self-Driving Cars Lecture

That’s not the point of this note though. I’d like to explore how Artificial Intelligence is helping cars drive themselves, what some of the open challenges are, and how A.I might help solve these problems in the future. How might autonomous cars reduce accidents and give people who can’t drive an option to “drive”?

How AV technology works

There are 3 key things that make a self-driving car possible –

  1. Sensors
    1. LIDAR (LIght Detection And Ranging)
    2. Radar (Radio waves to detect objects, angles, distance etc)
    3. Ultrasonic & Others (Odometer and other “close to vehicle” sensing)
    4. Cameras (To “see” signal lights)
  2. Software (To process all the sensor data)
  3. Connectivity
    1. Internet (to communicate with cloud or other vehicles)
    2. GPS (positioning system so the car knows where it is to the centimeter, which today’s GPS cannot support)

Hear an interesting podcast on the shift to self-driving cars

source: https://www.ucsusa.org/clean-vehicles/how-self-driving-cars-work

Sensors collect millions of data points, including objects to the sides, front and back of the vehicle and other moving vehicles nearby.

Software and algorithms process the data collected through the sensors and make decisions about acceleration, braking, turns, speed and so on.

Connectivity helps the car “know” about road conditions, weather, construction and traffic (is that still going to be an issue? Maybe not, as long as there are no man-made disruptions like construction, a drunk person walking across the road, or a sleeping cow).
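To tie those three pieces together, here is a heavily simplified sense-plan-act sketch in Python. Everything in it (the perception fields, the values, the thresholds, the rules) is a made-up illustration of the data flow described above; a real autonomous driving stack fuses far richer sensor data and relies on learned models rather than hand-written rules.

from dataclasses import dataclass

# Toy snapshot of what the sensor and connectivity layers might report each cycle.
# All fields, values and thresholds are illustrative assumptions, not a real AV API.
@dataclass
class Perception:
    obstacle_ahead_m: float   # distance to nearest object in the lane (LIDAR/radar)
    traffic_light: str        # "red", "yellow" or "green" (camera)
    speed_limit_kmh: float    # from maps/connectivity

def plan(p: Perception, current_speed_kmh: float) -> str:
    """Turn one perception snapshot into a crude driving decision."""
    if p.traffic_light == "red" or p.obstacle_ahead_m < 10:
        return "brake"
    if current_speed_kmh < p.speed_limit_kmh and p.obstacle_ahead_m > 50:
        return "accelerate"
    return "hold_speed"

# One cycle of the sense -> plan -> act loop
snapshot = Perception(obstacle_ahead_m=80.0, traffic_light="green", speed_limit_kmh=60.0)
print(plan(snapshot, current_speed_kmh=45.0))   # prints "accelerate"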

Will self-driving cars look like the cars of today? Perhaps not: there is no need for a steering wheel, windows, wipers, mirrors, lights or foot pedals. On the flip side, not everything on a car is there only for functionality; some of it is also there for aesthetic reasons.

So anyway, how does self-driving technology actually work? We will see in Part 2 of this writeup.

With Love, A.I: Radiology

“If AI can recognize disease progression early, then treatments and outcomes will improve.”

Isn’t it fascinating how little we understand about the brain? There is a really good case for applying deep learning to recognize subtle patterns and changes in neuron activity, which can help in the early diagnosis of Alzheimer’s disease. Using Positron Emission Tomography (PET) scans, researchers are able to measure the amount of glucose a brain cell consumes.

A healthy brain cell consumes glucose to function; the more active a cell is, the more glucose it consumes. As the cell deteriorates with disease, the amount of glucose it uses drops and eventually goes to zero. If doctors can detect these patterns of declining glucose consumption sooner, they can administer drugs to help patients recover cells that would otherwise die and cause Alzheimer’s.

“One of the difficulties with Alzheimer’s disease is that by the time all the clinical symptoms manifest and we can make a definitive diagnosis, too many neurons have died, making it essentially irreversible.”

JAE HO SOHN, MD, MS
The brain of a person with Alzheimer’s (left) compared with the brain of a person without the disease. Source: https://www.ucsf.edu/news/2019/01/412946/artificial-intelligence-can-detect-alzheimers-disease-brain-scans-six-years

Human radiologists are really good at detecting a focal tumor, but subtle global changes over time are harder to spot with the naked eye. AI is good at analyzing time series data and identifying micro patterns.
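As a toy illustration of that last point, here is a sketch in Python that fits a straight line to a made-up series of glucose-uptake readings and flags a slow decline. The numbers, the threshold and the ordinary least-squares fit are all stand-ins for illustration; the actual research applies deep learning to full PET image volumes.

import numpy as np

# Hypothetical yearly glucose-uptake readings for one brain region (arbitrary units).
# The slow, steady drop is invented to mimic a subtle trend the eye could miss.
years = np.arange(2010, 2019)
uptake = np.array([1.00, 0.99, 0.99, 0.98, 0.97, 0.96, 0.96, 0.95, 0.94])

# Ordinary least-squares fit: slope is the change in uptake per year
slope, intercept = np.polyfit(years, uptake, 1)

if slope < -0.005:   # arbitrary threshold for "subtle but sustained decline"
    print(f"Flag for review: uptake falling {abs(slope):.3f} units/year")
else:
    print("No sustained decline detected")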

Other areas of research where AI is being applied to improve diagnosis include osteoporosis detection and progression tracking through bone imaging and the comparison of subtle changes across a time series of images.

Stroke management is another area where machine learning has started to assist radiologists and neurologists. For example, here is a picture of how computers are trained with stroke imaging and how that model is then used to predict whether a “new image” has infarctions or not (a yes or no answer).

Does this new image have an infarction, yes or no? The machine says yes and color codes the affected area of the brain in red. Source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5647643

Furthermore, the ML model can identify the exact location of the stroke and highlight it for physicians, saving precious time and helping expedite treatment; in stroke care, seconds shaved off can mean the difference between life and death.

The areas in which deep learning can be useful in Radiology are lesion or disease detection, classification, quantification, and segmentation. 

“Deep learning is a class of machine learning methods that are gaining success and attracting interest in many domains, including computer vision, speech recognition, natural language processing, and playing games. Deep learning methods produce a mapping from raw inputs to desired outputs (eg, image classes)”. __RSNA

Figure 1.

Convolutional Neural Network (CNN) algorithms have become popular for identifying patterns in data automatically, without hand-crafted feature engineering, especially in image processing. CNNs are developed on the basis of biological neuron structures. Here is an example of how biological neurons detect edges from visual stimuli, i.e. seeing.

Figure 5a.
Source: RSNA.org

and here is how a similar structure can be developed using CNNs

Figure 5b.
Source: RSNA.org
CNN representation of biological neurons

The “deep” in deep learning comes from the fact that there are multiple layers between inputs and outputs, as represented in the simplified diagram below.

Figure 6.

If we apply the above CNN structure to radiology images as inputs to detect disease or segment the image, we can get an output that highlights the areas of possible disease and/or an output that says what the image might represent.

Figure 7.

“Many software frameworks are now available for constructing and training multilayer neural networks (including convolutional networks). Frameworks such as Theano, Torch, TensorFlow, CNTK, Caffe, and Keras implement efficient low-level functions from which developers can describe neural network architectures with very few lines of code, allowing them to focus on higher-level architectural issues (36–40).”
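To give a feel for that “very few lines of code” claim, here is a minimal, hypothetical sketch using tf.keras (one of the frameworks named above). The input size, layer widths and the binary “disease present / absent” output are assumptions made purely for illustration, not a model from the RSNA article.

import tensorflow as tf
from tensorflow.keras import layers, models

# Toy CNN classifier: a single-channel 256x256 image in, a probability of
# "disease present" out. All sizes here are illustrative assumptions.
model = models.Sequential([
    layers.Conv2D(16, 3, activation="relu", input_shape=(256, 256, 1)),  # low-level edge/texture filters
    layers.MaxPooling2D(),                                               # downsample, keep strongest responses
    layers.Conv2D(32, 3, activation="relu"),                             # higher-level patterns
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),                               # probability of disease
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Training would then be one call on a labeled dataset, e.g.:
# model.fit(train_images, train_labels, epochs=10, validation_split=0.2)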

“Compared with traditional computer vision and machine learning algorithms, deep learning algorithms are data hungry. One of the main challenges faced by the community is the scarcity of labeled medical imaging datasets. While millions of natural images can be tagged using crowd-sourcing (27), acquiring accurately labeled medical images is complex and expensive. Further, assembling balanced and representative training datasets can be daunting given the wide spectrum of pathologic conditions encountered in clinical practice.”

“The creation of these large databases of labeled medical images and many associated challenges (54) will be fundamental to foster future research in deep learning applied to medical images.” __RSNA

300 applications have been identified for deep learning in radiology; check out the survey linked in the sources below.

Sources:

  • https://pubs.rsna.org/doi/10.1148/rg.2017170077
  • https://www.ucsf.edu/news/2019/01/412946/artificial-intelligence-can-detect-alzheimers-disease-brain-scans-six-years
  • https://www.rheumatoidarthritis.org/ra/diagnosis/imaging/
  • https://pubs.rsna.org/doi/10.1148/radiol.2019181568
  • https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5647643/
  • https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5789692/

With Love, A.I: Transcription (2 of 2)

Can a developer enhance Google’s Speech-to-Text system (GS2T)? The short answer is “Yes”. Let’s take a look at how to go about it.

Google supports what are called “phrase hints”. By adding hints to the vocabulary, Google Speech-to-Text is able to “understand” the audio and transcribe it better.

As I had shared in part 1 of this blog, I tried transcribing the following

The lines in English and Telugu –

Nee Pada Sevaku Velaye Swami  
Aapada Baapava Aananda Nilaya 
Daari Tennu Teliyaga Leni  
Daasula Brovaga Vegame Raava

నీ పద సేవకు వేళాయె స్వామి  
ఆపద బాపవా ఆనంద నిలయ 
దారి తెన్నూ తెలియగ లేని  
దాసుల బ్రోవగ వేగమే రావా

Google transcribed that to –

Pada Sevaku Velaye Swami, Nee Pada Seva Vela 
Teliyadu 
Teliyagaane Leni Naa Manasulo

పద సేవకు వేళాయె స్వామి, నీ పాద సేవ వేళ 
తెలియదు 
తెలియగానే లేని నా మనసులో

I’d like to help Google’s S2T to transcribe better by providing the words (Aapada, Baapava, Aananda, Nilaya, Daari, Tennu) as “phrase hints”. Once I do that, I will transcribe again and hope that something better comes out the other end this time.

In order to get my phrase hints across to GS2T, I need to enroll as a Google Cloud Platform developer, create an account, enable the Cloud Speech-to-Text API and write a bit of JSON to get going. There are good examples in the Google Cloud documentation.

Here is how I fed GS2T my phrase hints

# Note: in the v1 REST API, speechContexts is a JSON array of SpeechContext objects.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  --data '{
    "config": {
      "languageCode": "te-IN",
      "speechContexts": [{
        "phrases": ["దారి", "తెన్నూ", "తెలియక", "లేని", "ఆపద", "బాపవా"]
      }]
    },
    "audio": {
      "uri": "gs://ammas2t/12465.flac"
    }
  }' "https://speech.googleapis.com/v1/speech:longrunningrecognize"
$ ./resultcurlcmd
{
  "name": "4980738099736676025",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeMetadata",
    "progressPercent": 100,
    "startTime": "2019-04-22T20:05:40.740461Z",
    "lastUpdateTime": "2019-04-22T20:06:41.100358Z"
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeResponse",
    "results": [
      {
        "alternatives": [
          {
            "transcript": "పద సేవకు వేళాయె నీ పదసేవ వేళాయె",
            "confidence": 0.6822279
          }
        ]
      },
      {
        "alternatives": [
          {
            "transcript": "తెలియక లేని నా",
            "confidence": 0.6823187
          }
        ]
      },
      {
        "alternatives": [
          {
            "transcript": " మమ్మేలుకో చరణములే నమ్మితి నీ పదసేవ వేళలో స్వామి",
            "confidence": 0.5497838
          }
        ]
      },
      {
        "alternatives": [
          {
            "transcript": " ఆనంద నిలయం",
            "confidence": 0.63640434
          }
        ]
      },
      {
        "alternatives": [
          {
            "transcript": " ఆశల జీవితం",
            "confidence": 0.3930311
          }
        ]
      },
      {
        "alternatives": [
          {
            "transcript": " లేని జీవితం",
            "confidence": 0.613313
          }
        ]
      },
      {
        "alternatives": [
          {
            "transcript": " నేను",
            "confidence": 0.41449854
          }
        ]
      },
      {
        "alternatives": [
          {
            "transcript": " హాయ్ బ్రదర్",
            "confidence": 0.59204257
          }
        ]
      }
    ]
  }
}

The transcription seems to have gone south for some reason. I need to investigate further why my phrase hints not only failed to make the result better but actually made it worse.
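For anyone experimenting along with me, the same request can also be issued through Google’s Python client library, where the speech contexts are passed explicitly as a list of phrase sets. Here is a minimal sketch; it assumes the google-cloud-speech package is installed, credentials are configured, and the audio sits at a Cloud Storage URI (the one below is a placeholder, not my actual file).

from google.cloud import speech

# Minimal sketch using the google-cloud-speech Python client.
# The gs:// URI is a placeholder; swap in your own bucket and file.
client = speech.SpeechClient()

config = speech.RecognitionConfig(
    language_code="te-IN",
    speech_contexts=[
        speech.SpeechContext(phrases=["ఆపద", "బాపవా", "ఆనంద", "నిలయ", "దారి", "తెన్నూ"])
    ],
)
audio = speech.RecognitionAudio(uri="gs://your-bucket/your-audio.flac")

operation = client.long_running_recognize(config=config, audio=audio)
response = operation.result(timeout=300)

for result in response.results:
    best = result.alternatives[0]
    print(best.transcript, best.confidence)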

If you want to follow along with how I set up the Google Cloud Speech-to-Text API, here are the screenshots, which are mostly self-evident.

Then download and install the Google Cloud SDK and tools.


With Love, A.I: Transcription (1 of 2)

tran·scrip·tion /ˌtran(t)ˈskripSH(ə)n/
a written or printed representation of something.

A written representation of audio is what we normally call a transcription. How does one go from audio to the written word? We could listen to the audio word by word and note down the written representation of each one. This is a manual process; sometimes we need to pause the audio to catch up, and some words might be transcribed incorrectly.

Can AI help speed up the process and reduce errors in transcription? That’s a rhetorical question because AI already does this to some extent; we have seen it in products from Apple, Amazon, Google and others.

What would it take for a machine to listen and convert what it hears into the written word? In its simplest sense, assuming the machine knows the entire vocabulary of the language the audio is in (e.g. English), it can compare each spoken word with its vast library of phonemes to figure out which word the audio maps to and “type” that word into a text editor. Repeating this process for every uttered word produces a text document that is (hopefully) an exact representation of the audio.

For example, the spoken word “Potato” could be recognized as such by software that compares each phoneme in the word with a library of phonemes, deconstructs the word into its basic phonemes, then matches the candidate word against a library of words, takes context into consideration and figures out whether the textual representation of the spoken audio is really “Pohtahtoh” or “Pahtayto” or something else.

Apparently, most speech recognition systems use something called Hidden Markov Models.

Specific example of a Markov model for the word POTATO
The more general representation of Markov model. source: wikipedia
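To make the Markov-model idea a little more concrete, here is a toy sketch of Viterbi decoding over a hypothetical three-state model. The states, observation symbols and probabilities are all invented for illustration and are far simpler than the phoneme-level HMMs real recognizers use.

# Toy Viterbi decoding over a hypothetical HMM: given a sequence of observed
# acoustic "symbols", find the most likely sequence of hidden phoneme states.
# States, observations and probabilities are all invented for illustration.
states = ["P", "T", "O"]                      # toy "phoneme" states
observations = ["p_sound", "o_sound", "t_sound", "o_sound"]

start_prob = {"P": 0.6, "T": 0.2, "O": 0.2}
trans_prob = {
    "P": {"P": 0.1, "T": 0.3, "O": 0.6},
    "T": {"P": 0.1, "T": 0.1, "O": 0.8},
    "O": {"P": 0.2, "T": 0.6, "O": 0.2},
}
emit_prob = {
    "P": {"p_sound": 0.8, "t_sound": 0.1, "o_sound": 0.1},
    "T": {"p_sound": 0.1, "t_sound": 0.8, "o_sound": 0.1},
    "O": {"p_sound": 0.1, "t_sound": 0.1, "o_sound": 0.8},
}

def viterbi(obs):
    """Return the most probable hidden-state path for the observation sequence."""
    # best[t][s] = (probability of the best path ending in state s at time t, previous state)
    best = [{s: (start_prob[s] * emit_prob[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        best.append({})
        for s in states:
            prob, prev = max(
                (best[t - 1][p][0] * trans_prob[p][s] * emit_prob[s][obs[t]], p)
                for p in states
            )
            best[t][s] = (prob, prev)
    # Backtrack from the most probable final state
    path = [max(best[-1], key=lambda s: best[-1][s][0])]
    for t in range(len(obs) - 1, 0, -1):
        path.insert(0, best[t][path[0]][1])
    return path

print(viterbi(observations))   # -> ['P', 'O', 'T', 'O']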

Can you implement a speech recognition and transcription system for the Telugu language using off-the-shelf libraries? This is a question I don’t know the answer to, but let’s find out.

I set out looking for speech recognition libraries already available that I could leverage and found a few. I don’t know which one is best suited for my purpose, so I’ll start with the Google Cloud Speech-to-Text API, as it claims to support 120 languages and Telugu is one of them.

I uploaded a Telugu song clip and Google STT produced the following –

The lines go –

Nee Pada Sevaku Velaye Swami 
Aapada Baapava Aananda Nilaya
Daari Tennu Teliyaga Leni 
Daasula Brovaga Vegame Raava

Google transcribed that to –

Pada Sevaku Velaye Swami, Nee Pada Seva Vela
Teliyadu
Teliyagaane Leni Naa Manasulo

What just happened? Why did Google’s transcription not work? In fact, it is so far off that the transcribed text reads like gibberish.

It’s possible the audio was not of great quality. It’s also possible that the Telugu vocabulary universe of the Google Speech-to-Text system (GSTT) is limited. Perhaps the words Aapada, Baapava, Aananda, Nilaya, Daari, Tennu and others are not transcribed properly because the related phonemes are missing from GSTT.

Can one add phonemes and new words to GSTT to improve its accuracy? The funny thing is, it’s possible to add vocabulary to GSTT; it’s simple but not easy. It requires you to know programming and how to use Google’s STT Application Programming Interface (API). We will look at how to improve Google’s Speech-to-Text system by adding to its vocabulary in Part 2!
