With Love, A.I: Self-Driving Cars (Part 3 of 3)

“To self-drive or not to self-drive is not the question” __Madhav SBSS

Sensors play a key role in self-driving cars. Radars are widely used because they are cheap and work well across different weather and lighting conditions. Lidars are high-density sensors, but they are expensive and degrade in poor weather. Cameras are cheap and high density, but they perform poorly in bad lighting or weather. Since our eyes work in much the same way as a camera, the camera may be the right device for mimicking human visual perception.

Here is a visualization of where these sensors are being used today in developing self-driving car technology –

Source: selfdrivingcars.mit.edu

Here is a visual spider web representation of the effectiveness of each type of sensor –

Source: selfdrivingcars.mit.edu

Is driver state detection important in self-driving cars? Isn’t the driver supposed to be free to do whatever they want, as if sitting on a sofa at home? Here is a chart from Lex Fridman’s talk on what to track, with tracking difficulty increasing from left to right.

Driver State Detection, source: selfdrivingcars.mit.edu
Challenges at the fork of Full vs Human-centered autonomy
Source: selfdrivingcars.mit.edu

As you can see in the picture above, there are still many areas in red (hard to solve), which means there are plenty of opportunities to make a difference in the field of self-driving cars. Be part of the dream that brings safety and access to people trying to get from point A to point B.

Here are some of the companies worth following if you are interested in self-driving cars, the technologies involved, and most importantly the people driving the technology forward.

Companies working on fully autonomous self-driving tech –

Companies working on Human-centered autonomous self-driving tech –

Here are a few resources to further explore self-driving technologies –

Sources:

With Love, A.I: Self-Driving Cars (Part 2 of 3)

“To drive better than humans, autonomous vehicles must first see better than humans.” __Nvidia

How does the technology we briefly reviewed in Part 1 of this blog actually come together to make self-driving possible? Before jumping in, we should understand the levels a car must climb to become fully autonomous –

Levels of autonomy

Different cars are capable of different levels of self-driving, and are often described by researchers on a scale of 0-5.

6 Levels of Autonomous Driving source: http://www.ukspa.org.uk
Adoption Forecast source: http://www.ukspa.org.uk

What are the challenges, and how are they being addressed with AI and tech? The car needs to be able to “see” what is around it, how fast each object is moving, and how far away it is, and it must do this extremely well in terrible weather or poor lighting, not just in good light. For example, here is a picture from the rear-view wide-angle camera showing 3 moving objects: the software on the car first needs to see them, then figure out how near they are to the car, while also detecting whether they are moving or stationary.

The car uses radar to detect the distance of objects. Radar works by sending out pulses of radio waves and receiving them back as they bounce off the surfaces of those objects. Using the time it took for a pulse to bounce back and the speed at which radio waves travel, one can calculate the distance (distance = speed × time / 2, since the pulse makes a round trip to the object and back). The good thing about radar is that it works in low or zero light. But while radar can tell distances and speeds, it cannot tell what the objects are: humans, animals, vehicles, lamp posts, etc. Lidars (360-degree light detection and ranging devices) are being used to get a sense of what an object might be.
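The round-trip distance calculation described above can be sketched in a few lines of Python (a toy illustration, not production radar code):

```python
# Radio waves travel at the speed of light (~3.0e8 m/s).
SPEED_OF_LIGHT = 3.0e8  # meters per second

def radar_distance(echo_time_s: float) -> float:
    """Distance to an object from the round-trip time of a radar pulse.

    The pulse travels to the object and back, so we halve the product.
    """
    return SPEED_OF_LIGHT * echo_time_s / 2

# A pulse that returns after 2 microseconds bounced off an object about 300 m away.
print(radar_distance(2e-6))
```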

There is little research going on in terms of sound and smell; so far, the focus has been on sight (being able to “see”). Imagine if cars could sense sounds and smells in and around them, wouldn’t that be interesting? I digress….

So, the way cars see is by using sensors (radars, lidars, cameras). These sensors capture images, distances, and 3D mappings of the surroundings, and that data is fed into the CPU either in the car or in the cloud. Software (image recognition, data processing, and decision logic) interprets the data and sends commands to the accelerator, brake, and steering controls to navigate: to slow down, brake, turn, or speed up.
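The sense-process-act loop described above can be sketched at a very high level; the class names, labels, and thresholds below are all made up for illustration:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str                # what the object appears to be (from camera/lidar)
    distance_m: float         # how far away it is (from radar/lidar)
    closing_speed_mps: float  # positive when the object is approaching

def decide(detections: list) -> str:
    """Toy decision logic: brake for anything close and approaching,
    slow down for anything merely near, otherwise maintain speed."""
    for d in detections:
        if d.distance_m < 10 and d.closing_speed_mps > 0:
            return "brake"
        if d.distance_m < 30:
            return "slow_down"
    return "maintain_speed"

print(decide([Detection("pedestrian", 8.0, 1.2)]))  # → brake
```

A real planner fuses many sensors, tracks objects over time, and outputs continuous control values rather than discrete commands, but the loop shape is the same.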

Sources:

With Love, A.I: Self Driving Cars (Part 1 of 3)

“Google is working on self-driving cars and they seem to work. People are so bad at driving cars that computers don’t have to be that good to be much better” __Marc Andreessen

What’s the big deal with self-driving cars anyway? Why do we need them? Can’t humans do any work anymore? Are we so lazy that we just want to be transported from place to place without lifting a finger? If the car drives itself, what are we going to be doing? Watch videos? Chat? Selfies? Read a book? Get more work done? Well, these are a few questions that popped up in my mind as I visualized a world where people are not driving their cars but cars are driving people to their destinations.

I was impressed by this image Lex Fridman presented in his MIT Self-driving lecture series on what’s the BFD with self-driving cars

source: MIT Self-Driving Cars Lecture

That’s not the point of this note though. I’d like to explore how Artificial Intelligence is helping cars drive themselves, what some of the open challenges are, and how A.I might help solve these problems in the future. How might autonomous cars reduce accidents and give people who can’t drive an option to “drive”?

How AV technology works

There are 3 key things that make a self-driving car possible –

  1. Sensors
    1. LIDAR (LIght Detection And Ranging)
    2. Radar (Radio waves to detect objects, angles, distance etc)
    3. Ultrasonic & Others (Odometer and other “close to vehicle” sensing)
    4. Cameras (To “see” signal lights)
  2. Software (To process all the sensor data)
  3. Connectivity
    1. Internet (to communicate with cloud or other vehicles)
    2. GPS (positioning system so the car knows where it is to the centimeter, which today’s GPS cannot support)

Hear an interesting podcast on the shift to self-driving cars

source: https://www.ucsusa.org/clean-vehicles/how-self-driving-cars-work

Sensors collect millions of data points, including objects to the sides, front, and back, and other moving vehicles nearby.

Software and algorithms process the data collected through the sensors and make decisions on acceleration, braking, turns, speed, and so on.

Connectivity helps the car “know” road conditions, weather, construction, and traffic (is traffic still going to be an issue? Maybe not, as long as there are no man-made disruptions like construction, a drunk person walking across the road, or a sleeping cow).

Will self-driving cars look like the cars of today? Perhaps not: there is no need for a steering wheel, windows, wipers, mirrors, lights, or foot pedals. On the flip side, though, not everything on a car is there only for functionality; some of it is also there for aesthetic reasons.

So anyway, how does self-driving technology actually work? We will see in Part 2 of this writeup.

With Love, A.I: Transcription (2 of 2)

Can a developer enhance Google’s Speech-to-Text system (GS2T)? The short answer is “Yes”. Let’s take a look at how to go about it.

Google has what are called language “hints”. By adding hints to the vocabulary, Google Speech-to-Text is able to “understand” the audio and transcribe it better.

As I had shared in part 1 of this blog, I tried transcribing the following

The lines in English and Telugu –

Nee Pada Sevaku Velaye Swami  
Aapada Baapava Aananda Nilaya 
Daari Tennu Teliyaga Leni  
Daasula Brovaga Vegame Raava

నీ పద సేవకు వేళాయె స్వామి  
ఆపద బాపవా ఆనంద నిలయ 
దారి తెన్నూ తెలియగ లేని  
దాసుల బ్రోవగ వేగమే రావా

Google transcribed that to –

Pada Sevaku Velaye Swami, Nee Pada Seva Vela 
Teliyadu 
Teliyagaane Leni Naa Manasulo

పద సేవకు వేళాయె స్వామి, నీ పాద సేవ వేళ 
తెలియదు 
తెలియగానే లేని నా మనసులో

I’d like to help Google’s S2T transcribe better by providing the words (Aapada, Baapava, Aananda, Nilaya, Daari, Tennu) as “phrase hints”. Once I do that, I will transcribe again and hope that something better comes out the other end this time.

In order to get my phrase hints across to GS2T, I need to enroll as a Google Cloud Platform developer, create an account, enable the Cloud Speech-To-Text API, and write a bit of JSON to get going. There are good examples in the Google Cloud documentation.

Here is how I fed GS2T my phrase hints

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  --data '{
    "config": {
      "languageCode": "te-IN",
      "speechContexts": [{
        "phrases": ["దారి", "తెన్నూ", "తెలియక", "లేని", "ఆపద", "బాపవా"]
      }]
    },
    "audio": {
      "uri": "gs://ammas2t/12465.flac"
    }
  }' "https://speech.googleapis.com/v1/speech:longrunningrecognize"
$ ./resultcurlcmd
{
  "name": "4980738099736676025",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeMetadata",
    "progressPercent": 100,
    "startTime": "2019-04-22T20:05:40.740461Z",
    "lastUpdateTime": "2019-04-22T20:06:41.100358Z"
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeResponse",
    "results": [
      { "alternatives": [ { "transcript": "పద సేవకు వేళాయె నీ పదసేవ వేళాయె", "confidence": 0.6822279 } ] },
      { "alternatives": [ { "transcript": "తెలియక లేని నా", "confidence": 0.6823187 } ] },
      { "alternatives": [ { "transcript": " మమ్మేలుకో చరణములే నమ్మితి నీ పదసేవ వేళలో స్వామి", "confidence": 0.5497838 } ] },
      { "alternatives": [ { "transcript": " ఆనంద నిలయం", "confidence": 0.63640434 } ] },
      { "alternatives": [ { "transcript": " ఆశల జీవితం", "confidence": 0.3930311 } ] },
      { "alternatives": [ { "transcript": " లేని జీవితం", "confidence": 0.613313 } ] },
      { "alternatives": [ { "transcript": " నేను", "confidence": 0.41449854 } ] },
      { "alternatives": [ { "transcript": " హాయ్ బ్రదర్", "confidence": 0.59204257 } ] }
    ]
  }
}

The transcription seems to have gone south for some reason. I need to investigate further why my phrase hints not only failed to make the result better but actually made it worse.
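For reference, the same request can also be made from Python with the google-cloud-speech client library instead of curl. This is only a sketch, assuming a GCP project with the API enabled and application-default credentials configured; the bucket URI and phrases are the ones from the curl example above:

```python
from google.cloud import speech

client = speech.SpeechClient()

config = speech.RecognitionConfig(
    language_code="te-IN",
    # Phrase hints bias recognition toward these words.
    speech_contexts=[
        speech.SpeechContext(
            phrases=["దారి", "తెన్నూ", "తెలియక", "లేని", "ఆపద", "బాపవా"]
        )
    ],
)
audio = speech.RecognitionAudio(uri="gs://ammas2t/12465.flac")

# Long-running recognition, as in the curl example, for audio over ~1 minute.
operation = client.long_running_recognize(config=config, audio=audio)
response = operation.result(timeout=300)

for result in response.results:
    best = result.alternatives[0]
    print(best.transcript, best.confidence)
```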

If you want to follow along how I setup Google Cloud Speech-To-Text API here are the screenshots, mostly self-evident.

Then download and install GCloud SDK and tools


With Love, A.I: Transcription (1 of 2)

tran·scrip·tion /tran(t)ˈskripSH(ə)n/
a written or printed representation of something.

A written representation of audio is what we normally call a transcription. How does one go from audio to the written word? We could listen to the audio word by word and note down the written representation of each word. This is a manual process, and sometimes we may need to pause the audio to catch up. Some words might be transcribed incorrectly.

Can AI help speed up the process and reduce errors in transcription? That’s a rhetorical question because AI already does this to some extent, we have seen it in products from Apple, Amazon, Google and others.

What would it take for a machine to listen and convert what it hears into the written word? In its simplest sense, assuming the machine knows the entire vocabulary of the language the audio is in (e.g. English), it can compare each spoken word with its vast library of phonemes to figure out what word the audio maps to and “type” that word into a text editor. Repeating this process for every uttered word in sequence will produce a text document that is (hopefully) an exact representation of the audio.

For example, the spoken word “Potato” could be recognized by software that processes each phoneme in the word against a library of phonemes, deconstructs the word into its basic phonemes, matches the candidate word against a library of words, takes context into consideration, and figures out whether the textual representation of the spoken audio is really “Pohtahtoh” or “Pahtayto” or something else.
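A crude sketch of that lookup step, with an invented phoneme library (real recognizers score many candidate words probabilistically and use context, rather than doing exact matches):

```python
# Hypothetical library mapping phoneme sequences to written words.
WORD_LIBRARY = {
    ("p", "ah", "t", "ey", "t", "ow"): "potato",  # "Pahtayto"
    ("p", "oh", "t", "ah", "t", "oh"): "potato",  # "Pohtahtoh"
    ("t", "ah", "m", "ey", "t", "ow"): "tomato",
}

def recognize(phonemes: tuple) -> str:
    """Map a phoneme sequence to a word by exact lookup."""
    return WORD_LIBRARY.get(phonemes, "<unknown>")

print(recognize(("p", "ah", "t", "ey", "t", "ow")))  # → potato
```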

Apparently, most speech recognition systems use something called Hidden Markov Models.

Specific example of a Markov model for the word POTATO
The more general representation of Markov model. source: wikipedia
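A Hidden Markov Model picks the most likely sequence of hidden states (e.g. phonemes) behind a sequence of observations (e.g. acoustic features). A toy Viterbi decoder makes this concrete; the two-state model, probabilities, and observations below are all invented for illustration:

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for the observations."""
    # V[t][s] = probability of the best path ending in state s at time t.
    V = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    path = {s: [s] for s in states}
    for obs in observations[1:]:
        V.append({})
        new_path = {}
        for s in states:
            prob, prev = max(
                (V[-2][p] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
            V[-1][s] = prob
            new_path[s] = path[prev] + [s]
        path = new_path
    best = max(states, key=lambda s: V[-1][s])
    return path[best]

# Made-up two-state model: which phoneme produced each acoustic frame?
states = ("P", "T")
start_p = {"P": 0.6, "T": 0.4}
trans_p = {"P": {"P": 0.3, "T": 0.7}, "T": {"P": 0.4, "T": 0.6}}
emit_p = {"P": {"pop": 0.9, "tick": 0.1}, "T": {"pop": 0.2, "tick": 0.8}}

print(viterbi(("pop", "tick"), states, start_p, trans_p, emit_p))  # → ['P', 'T']
```

Real speech systems chain such models over thousands of states and use log probabilities to avoid underflow, but the dynamic-programming core is the same.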

Can you implement a speech recognition and transcription system for the Telugu language using off-the-shelf libraries? This is a question I don’t know the answer to, but let’s find out.

I set out looking for speech recognition libraries I could leverage and found a few. I don’t know which one is best suited for my purpose, so I’ll start with the Google Cloud Speech-to-Text API, as it claims to support 120 languages and Telugu is one of them.

I uploaded a Telugu song clip and Google STT produced the following –

The lines go –

Nee Pada Sevaku Velaye Swami 
Aapada Baapava Aananda Nilaya
Daari Tennu Teliyaga Leni 
Daasula Brovaga Vegame Raava

Google transcribed that to –

Pada Sevaku Velaye Swami, Nee Pada Seva Vela
Teliyadu
Teliyagaane Leni Naa Manasulo

What just happened? Why did Google’s transcription not work? In fact, it is so far off that the transcribed text reads like gibberish.

It’s possible the audio was not of great quality. It’s also possible that the Telugu vocabulary universe of Google Speech-to-Text System (GSTT) is limited. Perhaps the words Aapada, Baapava, Aananda, Nilaya, Daari, Tennu and others are not transcribed properly because related phonemes are missing from the GSTT.

Can one add phonemes and new words to GSTT to improve its accuracy? The funny thing is, it is possible to add vocabulary to GSTT; it’s simple but not easy. It requires you to know programming and to use Google’s STT Application Programming Interface (API). We will look at how to improve Google’s Speech-to-Text system by adding to its vocabulary in Part 2!
