With Love, A.I: Can A.I Replace Radiologists?

Well, the short answer is “Yes”; the longer answer is “not right now, but in a few years”. I know that is provocative, and most people, especially radiologists, would disagree. Stanford researchers have created a machine learning model that helps radiologists detect brain aneurysms more accurately than they do unaided.

What is a brain aneurysm?

A brain aneurysm is a bulge in a blood vessel in the brain. It could leak or burst, causing a brain hemorrhage, brain damage or even death.

So the question is: if AI can do a better job of identifying aneurysms, can it replace radiologists who perform a similar function? If it can, we will be limited not by the number of radiologists available but by the number of servers we can add. Moore’s law applies to machines, not to humans, so over time it would become cheaper to deploy AI radiologists than human radiologists.

This new AI tool is built on an algorithm called HeadXNet. Researchers note, however, that the results depend on “scanner hardware and imaging protocols”, which are not standardized. Providers (hospitals or labs) may use different hardware and imaging techniques, and these will influence whether the AI tool detects or misses an aneurysm.

In this brain scan, the location of an aneurysm is indicated by HeadXNet using a transparent red highlight. (Image credit: Allison Park)

“A lot of patients are not getting treatment fast enough.”

Eric Schmidt

Augmented reading on the topic from wired.com

The Algorithm Will See You Now

With Love, A.I: Medical Image Versioning To Manage Disease Progression

“A.I. could play a big role in supporting prevention, diagnosis, treatment plans, medication management, precision medicine and drug creation” __Bruce Liang, Chief Information Officer of Singapore’s Ministry of Health

In software development, versioning is one of the key tenets of good programming. With a version control system such as Git, SVN or CVS, one can go back in history to troubleshoot bugs or roll back a deployment. Wouldn’t it be cool if a similar system existed in medical imaging? It could help radiologists quickly “see” whether a treatment is affecting the patient positively or negatively. Computer vision can process images and highlight the differences between two or more images in real time. That means a radiologist need not spend hours retrieving and interpreting a patient’s images and identifying the differences. With the click of a button on their phone, they could see highlights of what has changed between images.
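Here is a minimal sketch of the idea, under two assumptions. First, each scan is already registered (aligned to the same grid) and stored as a small 2D list of grayscale intensities. Second, `ScanHistory`, `commit` and `diff` are hypothetical names for illustration, not part of any real imaging product. As in Git, each version is stored under a hash of its contents. `diff` returns the pixels whose intensity changed by more than a threshold between two versions; the threshold of 30 is an arbitrary choice. A viewer could then highlight those pixels.

```python
import hashlib
from dataclasses import dataclass, field

Image = list[list[int]]  # hypothetical: a registered grayscale scan, values 0-255


@dataclass
class ScanHistory:
    """Hypothetical Git-like version history for one patient's scans."""
    versions: list[tuple[str, str]] = field(default_factory=list)  # (label, hash)
    store: dict[str, Image] = field(default_factory=dict)          # hash -> image

    def commit(self, label: str, image: Image) -> str:
        # Content-address the scan, much as Git does with file contents.
        digest = hashlib.sha256(repr(image).encode()).hexdigest()[:12]
        self.store[digest] = image
        self.versions.append((label, digest))
        return digest

    def diff(self, old: str, new: str, threshold: int = 30) -> list[tuple[int, int]]:
        """Return (row, col) pixels whose intensity changed by more than threshold."""
        a, b = self.store[old], self.store[new]
        if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
            raise ValueError("scans must be registered to the same grid")
        return [
            (r, c)
            for r, (row_a, row_b) in enumerate(zip(a, b))
            for c, (pa, pb) in enumerate(zip(row_a, row_b))
            if abs(pb - pa) > threshold
        ]


history = ScanHistory()
week1 = history.commit("week 1", [[10, 10, 10], [10, 200, 10], [10, 10, 10]])
week20 = history.commit("week 20", [[10, 10, 10], [10, 90, 10], [10, 10, 12]])
print(history.diff(week1, week20))  # [(1, 1)]
```

Real scans would first have to be registered, so that the same pixel refers to the same anatomy in every version. The sketch assumes that step has already happened.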

If there were a hypothetical medical imaging versioning system, how would it work? How would it be implemented and deployed in hospital systems? Who would primarily use it, and how would it enhance treatment effectiveness?

“Medical imaging guides the course of much of patient care and is an essential element of biomedical research. From x-rays and ultrasound to computerized tomography (CT), functional magnetic resonance imaging (fMRI), and positron emission tomography (PET), medical imaging helps clinicians diagnose, treat, and understand a range of diseases and conditions, including cancer, cardiovascular disease, and neurodegenerative disorders.”

Roadmap for Medical Imaging Whitehouse.gov

The Interagency Working Group on Medical Imaging (IWGMI) identified four key areas for better healthcare through effective medical imaging. I’m particularly interested in the “Advanced Computation & Machine Learning” aspect of the roadmap.

Here is a set of breast cancer images for a patient, taken at regular intervals. I’m not going to pretend to know what’s going on in the following image. Still, anyone with half a brain can guess that it shows different stages of breast cancer. It is meant to help the physician understand the treatment’s effectiveness over time (week 1 through week 20).

My sense is that pulling such a report is not straightforward for a radiologist. It involves retrieving images from disparate systems and putting them side by side for quick and easy comparison. It also involves looking at the treatments (dosage etc.) alongside the images and viewing all of that over time to get a sense of disease progression. The whole process probably takes hours, if not days.

This can be streamlined and automated using better image storage, retrieval and computer vision. If we can reduce the time it takes to generate this report from days or hours to minutes or seconds, it would save precious time for physicians and might be a lifesaver for the patient.

Next in this series, we will look at the current state of the art on this issue. After that, we will explore the possibilities of using the latest Computer Vision (CV) techniques to save time for radiologists and pathologists.

Aural Culture vs Typed Word

“If the whole truth is told, oral tradition stands out as the single most dominant communicative technology of our species as both a historical fact and, in many areas still, a contemporary reality.” __John Foley, Signs of Orality

Johannes Gutenberg ushered in the era of the printing press and movable type around 1439. Thanks to him I’m able to write this blog. What intrigues me, though, is why we went from an aural tradition to a written one in the first place. What problems with aural methods did the printed word address?

Johannes Gutenberg
Source: wikipedia.org

How did people transfer knowledge, news and gossip before printing was a thing? Handwritten manuscripts were the prevalent method for writing and sharing ideas. However, the most popular way of transferring and sharing ideas was oral communication.

In ancient India, scriptures, folklore and stories were mainly transmitted orally. It is widely believed that the Srutis of Hinduism (the Vedas) were composed and passed down orally, generation after generation, for centuries before they were ever written down. Signs of that can be seen even today in the way Hindustani music, the classical music of North India, is taught. Its roots are traced to the Samaveda, one of the four Vedas. Several aspects of the music are still transmitted orally from Guru to Sishya in the so-called Guru-Sishya-Parampara. These include the notations, the structure of a composition (Raaga) and the Chalan (the movements and interconnections between notes).

“The Vedic texts were orally composed and transmitted, without the use of script, in an unbroken line of transmission from teacher to student that was formalized early on. This ensured an impeccable textual transmission superior to the classical texts of other cultures; it is, in fact, something like a tape-recording… Not just the actual words, but even the long-lost musical (tonal) accent (as in old Greek or in Japanese) has been preserved up to the present.”

— Michael Witzel

In Greece, it’s believed that Homer’s epic poems (the Iliad and the Odyssey) were primarily composed, performed and transmitted orally.

Are we returning to an oral/aural mode of transmitting ideas, more so than the written script? This brings me to the point I’m trying to make with this write-up. Suppose we can speak, and machines can understand and converse with us just like humans, if not better. Would oral communication then become the predominant way we transmit ideas, commands and conversations?

Imagine we can talk to the TV, the garage door and the thermostat, listen to books and magazines, and make annotations with audio markers rather than visual ones. You get the point. If I can talk to my phone and it can talk back to me (as Turing dreamed), would we still want to type or read? Or would we rather talk and listen? What do you think?


With Love, A.I: Speaking Sense

“Alexa and Google Duplex are not perfect, but neither are humans; the only difference is that Alexa and Duplex are making great strides forward”

From 2008 to 2011, I worked at a healthcare IT company that used Automated Speech Recognition (ASR), Speech-to-Text (STT) and Text-to-Speech (TTS) software. We used it to automate collecting insurance information from members on behalf of US health insurance companies, e.g. Blue Cross and United.

Speaking to technology as if you were chatting with a human has been the holy grail of digital user interface design, and I believe we are in the golden age of speech recognition.

Speech recognition will become so ubiquitous that we won’t have to type to chat with friends and family, in any language. We also won’t have to speak to an automated voice assistant that doesn’t understand us even after we repeat ourselves 5 times. We will never have to fumble through the many similar-looking buttons on the TV remote to watch the show we want, from any streaming app. A simple voice command will do it, as if we were asking a human assistant to find the show on Netflix and play it for us.

source: wired.com

Alexa And Duplex Are Not Perfect But Neither Are Humans

In 2017, when I spoke to Alexa at home in Telugu (my mother tongue), it missed what I said 9 out of 10 times. Now, in 2019, it’s faring a lot better, thanks to the magic of NLP in the cloud. I’d say it’s still far from perfect, but instead of 1 out of 10, maybe a 5!

Typing in a non-English language in iMessage or WhatsApp is not easy. It requires installing a language keyboard and finding several key combinations to type a single word. Imagine simply speaking to a chatbot and having it type the text in the language of your choice, effortlessly. Imagine communicating with a tourist in English while she instantly sees the text on her phone in her own language. This is already possible to a large extent. With assistants like Google Duplex, we are well on our way to a “utopian” world where language is no longer a barrier to communication.

Here is a sample of Google Duplex calling a hair salon to book an appointment. Not bad, huh?

Google Duplex


With Love, A.I: TensorFlow, Keras, PyTorch And A Hodgepodge Of Other Libraries

Hodgepodge of AI Libraries

In the beginning there was FORTRAN, one of the first widely used high-level programming languages. Then came ALGOL, Lisp, COBOL, BASIC, PL/I, Pascal, C and others. These were followed by C++, Perl, Python, Java, JavaScript, PHP and Ruby, and the object-oriented languages C++ and Java became widely adopted. It took nearly 40 years to go from FORTRAN (1957) to Java (1995).

AI research, by contrast, has been in the works for many decades, starting in the 1950s. Yet as many AI libraries have been created in the past 10 years as there were programming languages created in the last 50 years, if not more.

Here is a chart of AI libraries and how many people “follow” them on GitHub. Interestingly, TensorFlow, one of the newest libraries, seems to be many times more popular than the others. That doesn’t mean it’s the best AI library. In fact, there is no such thing as a general-purpose AI library. My sense is that there are libraries that assist in developing AI applications. Some are better suited to a given application than others, depending on the problem being solved, e.g. computer vision or natural language processing.

Let’s take a quick look at what each of these libraries is suited for:

TensorFlow – provides a collection of workflows to develop and train models using Python, JavaScript, or Swift, and to easily deploy in the cloud, on-prem, in the browser, or on-device no matter what language you use.

scikit-learn – a Python machine learning library with algorithms for data analysis, including classification, regression and clustering

BVLC/Caffe – the Berkeley Vision and Learning Center’s Caffe is a deep learning library for processing images. Caffe can process over 60M images per day with a single NVIDIA K40 GPU.

Keras – a deep learning Python library that runs on top of TensorFlow, Theano or CNTK. It is designed primarily to enable fast experimentation with models.

CNTK – Microsoft Cognitive Toolkit is a deep learning library that can be included in Python, C# or C++ code. It describes neural networks as a series of computational steps in a directed graph.

MXNet – a flexible and efficient library for deep learning.

“Deep learning denotes the modern incarnation of neural networks, and it’s the technology behind recent breakthroughs in self-driving cars, machine translation, speech recognition and more. While widespread interest in deep learning took off in 2012, deep learning has become an indispensable tool for countless industries.”

source: https://mxnet.apache.org/versions/master/faq/why_mxnet.html

PyTorch – an open-source machine learning library for Python, based on Torch, used for applications such as natural language processing.

PyTorch is a Python package that provides two high-level features:

  • Tensor computation (like NumPy) with strong GPU acceleration
  • Deep neural networks built on a tape-based autograd system
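To give a feel for what “tape-based autograd” means, here is a toy sketch in plain Python. It is not PyTorch’s actual implementation, and `Var` and `backward` are hypothetical names. Every operation on a `Var` records itself on a tape during the forward pass. `backward` then replays the tape in reverse, applying the chain rule to accumulate gradients.

```python
class Var:
    """A scalar that records its operations on a shared tape (toy reverse-mode autodiff)."""

    tape: list = []  # entries: (output Var, [(input Var, local gradient), ...])

    def __init__(self, value: float):
        self.value = value
        self.grad = 0.0

    @staticmethod
    def _record(value: float, parents: list) -> "Var":
        out = Var(value)
        Var.tape.append((out, parents))
        return out

    def __add__(self, other: "Var") -> "Var":
        # d(a + b)/da = 1, d(a + b)/db = 1
        return Var._record(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other: "Var") -> "Var":
        # d(a * b)/da = b, d(a * b)/db = a
        return Var._record(self.value * other.value,
                           [(self, other.value), (other, self.value)])


def backward(output: Var) -> None:
    """Replay the tape newest-first, pushing each node's gradient to its inputs."""
    output.grad = 1.0
    for out, parents in reversed(Var.tape):
        for parent, local_grad in parents:
            parent.grad += out.grad * local_grad
    Var.tape.clear()


x, y = Var(3.0), Var(4.0)
z = x * y + x           # z = xy + x, so dz/dx = y + 1 and dz/dy = x
backward(z)
print(z.value, x.grad, y.grad)  # 15.0 5.0 3.0
```

The tape is built while the code runs, so ordinary Python control flow (if statements, loops) can change the graph from one call to the next. This "define-by-run" flexibility is what PyTorch is known for.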

Theano – a Python library that allows you to define, optimize and evaluate mathematical expressions involving multi-dimensional arrays efficiently.

Caffe2 – aims to provide an easy and straightforward way for you to experiment with deep learning and leverage community contributions of new models and algorithms. 

Torch7 – a scientific computing framework with wide support for machine learning algorithms that puts GPUs first. It is easy to use and efficient, thanks to an easy and fast scripting language, LuaJIT.

OK, great, this is a list of a few of the hundreds of AI libraries, so what? I can google them myself; what’s the point of this blog? I’m as confused as I was before reading it. If I am just starting out in AI, which library should I pick, and where should I start? The short answer is: pick any library. You will be better off picking one, running with it and developing something than not picking any. You have to do it eventually, so you might as well start now rather than later.

Of course, a better answer might be another question: what do you seek to solve? Are you looking to programmatically recognize people’s faces or cars in a photograph of a busy street? You might want to start with BVLC/Caffe. Here is a good presentation to get you started on Caffe.

If you seek to parse and understand written or spoken words, maybe PyTorch is the library to start with. Here is an interesting tutorial for creating a chatbot using PyTorch.

With Love, A.I: Self-Driving Cars (Part 3 of 3)

“To self-drive or not to self-drive is not the question” __Madhav SBSS

Sensors play a key role in self-driving cars. Radars are widely used because they are cheap and work well across different weather and lighting conditions. Lidars are high-density sensors, but they are expensive and their performance degrades in poor weather (rain, fog, snow). Because they emit their own light, though, they work in the dark. Cameras are cheap and high-density but don’t perform well in poor lighting or weather. Our eyes work much like a camera, so the camera might be the right device for mimicking human visual perception.

Here is a visualization of where these sensors are being used today in developing self-driving car technology –

Source: selfdrivingcars.mit.edu

Here is a spider-web chart of the effectiveness of each type of sensor –

Source: selfdrivingcars.mit.edu

Is driver state detection important in self-driving cars? Isn’t the driver supposed to be free to do whatever they want, as if sitting on a sofa at home? Here is a chart from Lex Fridman’s talk on what to track, with tracking difficulty increasing from left to right.

Driver State Detection, source: selfdrivingcars.mit.edu
Challenges at the fork of Full vs Human-centered autonomy
Source: selfdrivingcars.mit.edu

As you can see in the above picture, there are still many areas in red (hard to solve), which means there are plenty of opportunities to make a difference in the field of self-driving cars. Be part of the dream that brings safety and access to humans trying to get from point A to point B.

Here are some of the companies that are worth following if you are interested in self-driving cars, technologies and most importantly the people driving the technology forward.

Companies working on fully autonomous self-driving tech –

Companies working on Human-centered autonomous self-driving tech –

Here are a few resources to further explore self-driving technologies –


With Love, A.I: Self-Driving Cars (Part 2 of 3)

“To drive better than humans, autonomous vehicles must first see better than humans.” __Nvidia

How does the technology we briefly reviewed in part 1 of this blog actually work to make self-driving possible? Before jumping into how the tech comes together, we should understand the levels a car climbs on the way to becoming fully autonomous –

Levels of autonomy

Different cars are capable of different levels of self-driving, and are often described by researchers on a scale of 0-5.

6 Levels of Autonomous Driving source: http://www.ukspa.org.uk
Adoption Forecast source: http://www.ukspa.org.uk
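The 0-5 scale is commonly taken from the SAE J3016 standard. As a small illustration, the sketch below models the levels as an enum, assuming the SAE level names (the chart above may label them slightly differently). Two hypothetical helpers answer the practical question of how much a human is still needed. `human_must_supervise` covers levels 0-2, where the human watches the road constantly. `human_is_fallback` covers levels 0-3, where the human must be ready to take over.

```python
from enum import IntEnum


class AutonomyLevel(IntEnum):
    """Driving automation levels, using the SAE J3016 names."""
    NO_AUTOMATION = 0
    DRIVER_ASSISTANCE = 1
    PARTIAL_AUTOMATION = 2
    CONDITIONAL_AUTOMATION = 3
    HIGH_AUTOMATION = 4
    FULL_AUTOMATION = 5


def human_must_supervise(level: AutonomyLevel) -> bool:
    """Levels 0-2: the human driver monitors the road at all times."""
    return level <= AutonomyLevel.PARTIAL_AUTOMATION


def human_is_fallback(level: AutonomyLevel) -> bool:
    """Levels 0-3: a human must be ready to take over when the system asks."""
    return level <= AutonomyLevel.CONDITIONAL_AUTOMATION


for level in AutonomyLevel:
    print(level.value, level.name, human_must_supervise(level), human_is_fallback(level))
```

Using `IntEnum` keeps the levels ordered, so comparisons like `level <= PARTIAL_AUTOMATION` read the same way as the 0-5 scale in the chart.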

What are the challenges, and how are they being addressed with AI and tech? For example, the car needs to “see” what is around it, how fast each object is moving and how far away it is. It must do all of this extremely well in terrible weather or poor lighting, not just in good light. Here is a picture from the rear-view wide-angle camera showing 3 moving objects. The software on the car first needs to see them. Then it must figure out how near to the car they are, while also detecting whether they are moving or stationary.

The car uses radar to detect the distance to objects. Radar works by sending out pulses of radio waves and receiving them back as they bounce off the surfaces of these objects. From the time a pulse takes to come back and the speed at which it travels, one can calculate the distance. Speed = distance ÷ time, so distance = speed × time; because the pulse travels to the object and back, the object is at half that distance. The good thing about radar is that it works in low or zero light conditions. A radar can tell distances and speeds, but it cannot tell what the objects are: humans, animals, vehicles, lamp posts etc. Lidars (360-degree LIght Detection And Ranging devices) are used to get a sense of what an object might be.
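As a small worked example of that arithmetic, the sketch below computes range from the round-trip echo time. It also gives a rough closing-speed estimate from two successive echoes. That second part is a simplification: production radars typically measure relative speed directly from the Doppler shift. The function names and echo times are hypothetical, chosen only for illustration.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s; radio waves travel at (approximately) this speed in air


def radar_range(echo_time_s: float) -> float:
    """Distance to the target: the pulse covers the distance twice (out and back)."""
    return SPEED_OF_LIGHT * echo_time_s / 2


def closing_speed(t1_s: float, t2_s: float, interval_s: float) -> float:
    """Rough relative speed from two echoes taken interval_s apart.

    Positive means the object is getting closer.
    """
    return (radar_range(t1_s) - radar_range(t2_s)) / interval_s


# Hypothetical echoes: 200 ns, then 198 ns measured 0.1 s later.
print(f"{radar_range(200e-9):.2f} m")                   # 29.98 m
print(f"{closing_speed(200e-9, 198e-9, 0.1):.2f} m/s")  # 3.00 m/s
```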

There is little research going on in terms of sound and smell; so far, the focus has been on sight (being able to “see”). Imagine if cars could sense sounds and smells in and around them. Wouldn’t that be interesting? I digress….

So the way cars see is by using sensors (radars, lidars, cameras). These sensors capture images, distances and 3D maps of the surroundings, and that data is fed to the CPU, either in the car or in the cloud. Software (image recognition, data processing and decision logic) interprets the data. It then sends commands to the accelerator, brake and steering controls to slow down, brake, turn or speed up.
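The sense, interpret and act loop described above can be sketched in a few lines. This is a toy illustration only. The `Detection` fields, the time-to-collision thresholds (2 s and 4 s) and the command strings are all hypothetical. Real planners fuse many sensors and predict trajectories rather than applying a single rule.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One fused sensor detection (hypothetical fields)."""
    label: str                # what the camera/lidar thinks the object is
    distance_m: float         # from radar/lidar
    closing_speed_mps: float  # positive = getting closer


def time_to_collision(d: Detection) -> float:
    """Seconds until impact if nothing changes; infinity if the object is not closing."""
    if d.closing_speed_mps <= 0:
        return float("inf")
    return d.distance_m / d.closing_speed_mps


def decide(detections: list[Detection]) -> str:
    """Pick an actuator command based on the most urgent detection."""
    ttc = min((time_to_collision(d) for d in detections), default=float("inf"))
    if ttc < 2.0:   # hypothetical emergency threshold
        return "BRAKE"
    if ttc < 4.0:   # hypothetical caution threshold
        return "SLOW_DOWN"
    return "MAINTAIN_SPEED"


frame = [
    Detection("pedestrian", distance_m=30.0, closing_speed_mps=10.0),  # 3 s away
    Detection("cyclist", distance_m=12.0, closing_speed_mps=-1.5),     # moving away
]
print(decide(frame))  # SLOW_DOWN
```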


With Love, A.I: Self Driving Cars (Part 1 of 3)

“Google is working on self-driving cars and they seem to work. People are so bad at driving cars that computers don’t have to be that good to be much better” __Marc Andreessen

What’s the big deal with self-driving cars anyway? Why do we need them? Can’t humans do any work anymore? Are we so lazy that we just want to be transported from place to place without lifting a finger? If the car drives itself, what are we going to be doing? Watch videos? Chat? Take selfies? Read a book? Get more work done? These are a few questions that popped up in my mind as I visualized a world where people are not driving their cars but cars are driving people to their destinations.

I was impressed by this image Lex Fridman presented in his MIT self-driving cars lecture series on what the BFD with self-driving cars is:

source: MIT Self-Driving Cars Lecture

That’s not the point of this note, though. I’d like to explore how Artificial Intelligence is helping cars drive themselves, what some of the open challenges are, and how A.I. might help solve these problems in the future. How might autonomous cars reduce accidents and give people who can’t drive an option to “drive”?

How AV technology works

There are 3 key things that make a self-driving car possible –

  1. Sensors
    1. LIDAR (LIght Detection And Ranging)
    2. Radar (Radio waves to detect objects, angles, distance etc)
    3. Ultrasonic & Others (Odometer and other “close to vehicle” sensing)
    4. Cameras (To “see” signal lights)
  2. Software (To process all the sensor data)
  3. Connectivity
    1. Internet (to communicate with cloud or other vehicles)
    2. GPS (positioning system so the car knows where it is to the centimeter, which today’s GPS cannot support)

Listen to an interesting podcast on the shift to self-driving cars

source: https://www.ucsusa.org/clean-vehicles/how-self-driving-cars-work

Sensors collect millions of data points, including objects to the sides, front and back, and other moving vehicles nearby.

Software and algorithms process the data collected by the sensors and make decisions on acceleration, braking, turns, speed and so on.

Connectivity helps the car “know” road conditions, weather, construction and traffic. (Is traffic still going to be an issue? Maybe not, as long as there are no disruptions like construction, a drunk person walking across the road or a sleeping cow.)

Will self-driving cars look like the cars of today? Perhaps not: there is no need for a steering wheel, windows, wipers, mirrors, lights or foot pedals. On the flip side, not everything on a car is there only for functionality; some of it is also there for aesthetic reasons.

So anyway, how does self-driving technology actually work? We will see in Part 2 of this writeup.

With Love, A.I: Radiology

“If AI can recognize disease progression early, then treatments and outcomes will improve.”

Isn’t it fascinating how little we understand about the brain? Early diagnosis of Alzheimer’s disease is a really good case for applying deep learning to recognize subtle patterns and changes in neuron activity. Using Positron Emission Tomography (PET) scans, researchers are able to measure the amount of glucose brain cells consume.

A healthy brain cell consumes glucose to function: the more active a cell is, the more glucose it consumes. As the cell deteriorates with disease, the amount of glucose it uses drops and eventually goes to zero. Doctors may be able to spot the pattern of declining glucose consumption sooner. If so, they can start treatment earlier, before so many of these cells die that the damage becomes irreversible.

“One of the difficulties with Alzheimer’s disease is that by the time all the clinical symptoms manifest and we can make a definitive diagnosis, too many neurons have died, making it essentially irreversible.”

The brain of a person with Alzheimer’s (left) compared with the brain of a person without the disease. Source: https://www.ucsf.edu/news/2019/01/412946/artificial-intelligence-can-detect-alzheimers-disease-brain-scans-six-years

Human radiologists are really good at detecting a focal tumor, but subtle global changes over time are harder to spot with the naked eye. AI is good at analyzing time series data and identifying micro patterns.
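Here is a simplified illustration of spotting a subtle global trend rather than a focal change. The sketch fits a least-squares line through a patient’s mean regional glucose uptake across several PET scans and flags a steady decline. The uptake values and the 5%-per-year threshold are invented for illustration. The deep learning models in the research cited here learn far richer patterns directly from the images.

```python
def slope(xs: list[float], ys: list[float]) -> float:
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    if var == 0:
        raise ValueError("need at least two distinct time points")
    return cov / var


def flag_decline(years: list[float], uptake: list[float],
                 max_drop_per_year: float = 0.05) -> bool:
    """Flag if uptake falls faster than max_drop_per_year, relative to the first scan.

    The 5% default is a hypothetical threshold, not a clinical cut-off.
    """
    relative_rate = slope(years, uptake) / uptake[0]
    return relative_rate < -max_drop_per_year


# Hypothetical mean uptake (arbitrary units) from yearly scans.
years = [0, 1, 2, 3]
uptake = [100.0, 95.0, 89.0, 82.0]  # each step looks small; the trend is not
print(slope(years, uptake), flag_decline(years, uptake))  # -6.0 True
```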

Another area of research where AI is being applied to improve diagnosis is osteoporosis. Here, bone imaging is used to detect the disease and track its progression by comparing subtle changes across a time series of images.

Stroke management is another area where machine learning has started to assist radiologists and neurologists. For example, here is a picture of how computers are trained on stroke imaging. The resulting model is then used to predict whether a “new image” shows infarctions or not (a yes or no answer).

Does this new image have infarction Yes/No? Machine says Yes and color codes the area of the brain in red. Source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5647643

Furthermore, the ML model can identify the location of the stroke and highlight it for physicians. This saves precious time and helps expedite treatment; in stroke treatment, seconds shaved off can mean the difference between life and death.
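Highlighting the location usually comes down to turning the model’s per-pixel output into a mask. Here is a minimal sketch, assuming a segmentation model has already produced a probability map (the values here are made up). It thresholds the map at a hypothetical 0.5 and returns the bounding box of the flagged pixels, which a viewer could then color red. The function names are hypothetical.

```python
from typing import Optional

Grid = list[list[float]]


def lesion_mask(prob_map: Grid, threshold: float = 0.5) -> list[list[bool]]:
    """Mark pixels the model considers likely infarction."""
    return [[p >= threshold for p in row] for row in prob_map]


def bounding_box(mask: list[list[bool]]) -> Optional[tuple[int, int, int, int]]:
    """(top, left, bottom, right) of flagged pixels, or None if nothing is flagged."""
    coords = [(r, c) for r, row in enumerate(mask) for c, hit in enumerate(row) if hit]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)


# Hypothetical model output for a 4x5 slice.
probs = [
    [0.01, 0.02, 0.03, 0.02, 0.01],
    [0.02, 0.70, 0.85, 0.10, 0.01],
    [0.01, 0.60, 0.92, 0.55, 0.02],
    [0.01, 0.03, 0.04, 0.02, 0.01],
]
box = bounding_box(lesion_mask(probs))
print("infarction: yes" if box else "infarction: no", box)  # infarction: yes (1, 1, 2, 3)
```

The same mask answers both questions in the caption: whether anything is flagged gives the yes/no, and the flagged pixels give the region to color.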

The areas in which deep learning can be useful in Radiology are lesion or disease detection, classification, quantification, and segmentation. 

“Deep learning is a class of machine learning methods that are gaining success and attracting interest in many domains, including computer vision, speech recognition, natural language processing, and playing games. Deep learning methods produce a mapping from raw inputs to desired outputs (eg, image classes)”. __RSNA

Figure 1.

Convolutional Neural Network (CNN) algorithms have become popular for identifying patterns in data automatically, without hand-engineered features, especially in image processing. CNNs are inspired by the structure of biological neurons. Here is an example of how biological neurons detect edges from visual stimuli, i.e. seeing.

Figure 5a.
Source: RSNA.org

and here is how a similar structure can be developed using CNNs

Figure 5b.
Source: RSNA.org
CNN representation of biological neurons
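The core operation of a CNN is the convolution. A small kernel slides over the image and responds strongly wherever the pattern it encodes appears. Strictly speaking, it is a cross-correlation, which is what deep learning libraries compute. In a trained CNN the kernel weights are learned. Here, purely for illustration, a hand-picked vertical-edge kernel (a Sobel-style filter) is applied to a tiny made-up image.

```python
Image = list[list[float]]


def conv2d(image: Image, kernel: Image) -> Image:
    """'Valid' 2D convolution as used in CNNs (no kernel flip, i.e. cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(kernel[i][j] * image[r + i][c + j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Sobel-style kernel that responds to dark-to-bright vertical edges.
VERTICAL_EDGE = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

# Tiny image: dark on the left, bright on the right.
image = [[0, 0, 0, 9, 9, 9] for _ in range(4)]

for row in conv2d(image, VERTICAL_EDGE):
    print(row)  # [0, 36, 36, 0] on every row
```

The output is zero over flat regions and large only where the brightness changes, which is the edge response the figures describe.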

The term “deep” in deep learning comes from the fact that there are multiple layers between the inputs and outputs, as represented in the simplified diagram below

Figure 6.
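In code, “deep” simply means that the output of one layer becomes the input of the next. Here is a minimal forward-pass sketch with three stacked fully connected layers. The weights are hypothetical, hard-coded numbers rather than trained values; a real network would learn thousands to millions of them from labeled images.

```python
import math

Matrix = list[list[float]]


def dense(inputs: list[float], weights: Matrix, biases: list[float]) -> list[float]:
    """One fully connected layer: each output is a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def relu(v: list[float]) -> list[float]:
    return [max(0.0, x) for x in v]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical (untrained) weights for a 3 -> 4 -> 2 -> 1 network.
LAYERS = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5], [-0.6, 0.1, 0.9], [0.2, 0.2, 0.2]],
     [0.0, 0.1, 0.0, -0.1]),
    ([[1.0, -1.0, 0.5, 0.3], [-0.4, 0.6, 0.2, 0.9]], [0.0, 0.0]),
    ([[1.2, -0.7]], [0.1]),
]


def forward(features: list[float]) -> float:
    """Pass the input through every layer in turn; depth = number of stacked layers."""
    x = features
    for i, (w, b) in enumerate(LAYERS):
        x = dense(x, w, b)
        if i < len(LAYERS) - 1:
            x = relu(x)  # non-linearity between hidden layers
    return sigmoid(x[0])  # probability-like score, e.g. "disease present"


score = forward([0.9, 0.2, 0.4])
print(f"{score:.3f}")
```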

Suppose we apply the above CNN structure to radiology images, using them as inputs to detect disease or segment the image. We can then get an output that highlights the areas of possible disease and/or an output that says what the image might represent.

Figure 7.

“Many software frameworks are now available for constructing and training multilayer neural networks (including convolutional networks). Frameworks such as Theano, Torch, TensorFlow, CNTK, Caffe, and Keras implement efficient low-level functions from which developers can describe neural network architectures with very few lines of code, allowing them to focus on higher-level architectural issues (36–40).”

“Compared with traditional computer vision and machine learning algorithms, deep learning algorithms are data hungry. One of the main challenges faced by the community is the scarcity of labeled medical imaging datasets. While millions of natural images can be tagged using crowd-sourcing (27), acquiring accurately labeled medical images is complex and expensive. Further, assembling balanced and representative training datasets can be daunting given the wide spectrum of pathologic conditions encountered in clinical practice.”

“The creation of these large databases of labeled medical images and many associated challenges (54) will be fundamental to foster future research in deep learning applied to medical images.” __RSNA

300 applications of deep learning in radiology have been identified; check out the survey here


  • https://pubs.rsna.org/doi/10.1148/rg.2017170077
  • https://www.ucsf.edu/news/2019/01/412946/artificial-intelligence-can-detect-alzheimers-disease-brain-scans-six-years
  • https://www.rheumatoidarthritis.org/ra/diagnosis/imaging/
  • https://pubs.rsna.org/doi/10.1148/radiol.2019181568
  • https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5647643/
  • https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5789692/

With Love, A.I: Transcription (2 of 2)

Can a developer enhance Google’s Speech-to-Text system (GS2T)? The short answer is “Yes”. Let’s take a look at how to go about it.

Google supports what it calls phrase “hints”. By adding hints to the vocabulary, Google Speech-to-Text is better able to “understand” the audio and transcribe it.

As I shared in part 1 of this blog, I tried transcribing the following

The lines, transliterated into English letters and in Telugu script –

Nee Pada Sevaku Velaye Swami  
Aapada Baapava Aananda Nilaya 
Daari Tennu Teliyaga Leni  
Daasula Brovaga Vegame Raava

నీ పద సేవకు వేళాయె స్వామి  
ఆపద బాపవా ఆనంద నిలయ 
దారి తెన్నూ తెలియగ లేని  
దాసుల బ్రోవగ వేగమే రావా

Google transcribed that to –

Pada Sevaku Velaye Swami, Nee Pada Seva Vela 
Teliyagaane Leni Naa Manasulo

పద సేవకు వేళాయె స్వామి, నీ పాద సేవ వేళ 
తెలియగానే లేని నా మనసులో

I’d like to help GS2T transcribe better by providing the words (Aapada, Baapava, Aananda, Nilaya, Daari, Tennu) as “phrase hints”. Once I do that, I will transcribe again and hope that something better comes out the other end this time.

To get my phrase hints across to GS2T, I need to enroll as a Google Cloud Platform developer, create an account, enable the Cloud Speech-to-Text API and write a bit of JSON to get going. There are good examples in the Google Cloud documentation.

Here is how I fed GS2T my phrase hints

# The body must be valid JSON (double quotes); speechContexts is a list of objects, each with a "phrases" array.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  --data '{
    "config": {
      "languageCode": "te-IN",
      "speechContexts": [
        { "phrases": ["దారి", "తెన్నూ", "తెలియక", "లేని", "ఆపద", "బాపవా"] }
      ]
    },
    "audio": { "uri": "gs://ammas2t/12465.flac" }
  }' \
  "https://speech.googleapis.com/v1/speech:longrunningrecognize"
$ ./resultcurlcmd
{
  "name": "4980738099736676025",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeMetadata",
    "progressPercent": 100,
    "startTime": "2019-04-22T20:05:40.740461Z",
    "lastUpdateTime": "2019-04-22T20:06:41.100358Z"
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeResponse",
    "results": [
      { "alternatives": [ { "transcript": "పద సేవకు వేళాయె నీ పదసేవ వేళాయె", "confidence": 0.6822279 } ] },
      { "alternatives": [ { "transcript": "తెలియక లేని నా", "confidence": 0.6823187 } ] },
      { "alternatives": [ { "transcript": " మమ్మేలుకో చరణములే నమ్మితి నీ పదసేవ వేళలో స్వామి", "confidence": 0.5497838 } ] },
      { "alternatives": [ { "transcript": " ఆనంద నిలయం", "confidence": 0.63640434 } ] },
      { "alternatives": [ { "transcript": " ఆశల జీవితం", "confidence": 0.3930311 } ] },
      { "alternatives": [ { "transcript": " లేని జీవితం", "confidence": 0.613313 } ] },
      { "alternatives": [ { "transcript": " నేను", "confidence": 0.41449854 } ] },
      { "alternatives": [ { "transcript": " హాయ్ బ్రదర్", "confidence": 0.59204257 } ] }
    ]
  }
}

The transcription seems to have gone south for some reason. I need to investigate further why my phrase hints not only didn’t improve the result but actually made it worse.
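Before digging into why, it helps to measure “worse” instead of eyeballing it. Word error rate (WER) is the standard metric for speech recognition. It counts the minimum number of word substitutions, deletions and insertions needed to turn the transcript into the reference, then divides by the number of reference words. Here is a minimal sketch; `word_error_rate` and `normalize` are hypothetical helper names. To keep it short, it compares only the first lyric line with the first segment of each run shown above. Scoring the full lyric against the full transcripts, aligned to the same stretch of audio, would be the fairer test.

```python
import string


def normalize(text: str) -> list[str]:
    """Lower-case, drop ASCII punctuation, split on whitespace."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i reference words and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


reference = "నీ పద సేవకు వేళాయె స్వామి"
without_hints = "పద సేవకు వేళాయె స్వామి, నీ పాద సేవ వేళ"
with_hints = "పద సేవకు వేళాయె నీ పదసేవ వేళాయె"
print(f"WER without hints: {word_error_rate(reference, without_hints):.2f}")
print(f"WER with hints:    {word_error_rate(reference, with_hints):.2f}")
```

A split such as “పదసేవ” versus “పద సేవ” counts as an error at the word level. Character error rate, which runs the same computation over characters instead of words, can therefore be a fairer measure for Telugu.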

If you want to follow along with how I set up the Google Cloud Speech-to-Text API, here are the screenshots; they are mostly self-explanatory.

Then download and install the Google Cloud SDK (gcloud) and its tools.