Using the API for Automated Transcription


How We Came to Know the Rev API

We were approached by a transcription service company in Los Angeles to integrate with speech-to-text software from Rev. Rev provides two major offerings: a service arm that supplies human transcribers, and an AI-based API for automated transcription. In this article, we refer to the latter, the API.

Our client, the transcription service company, was seeking a way to upload mp3 files and have them sent to Rev’s speech-to-text engine. During several phone calls it became clear that the software would evolve to be more complex. Together Endertech and our client developed architectural solutions that would serve as the blueprints for the entire build. 

The optimal software to build was a web-based database application with secure user roles and defined workflows. Users log in to the application and upload mp3 files. The uploaded files are stored securely in Amazon S3 and sent to Rev’s speech-to-text engine. The mp3 files are large, often upwards of 2 hours of voice recordings. Rev typically takes 5-10 minutes to process a request and send back the formatted text.

Rev API Use Cases

Rev provides several use cases for their software. These can be found directly on their website.

  • Phone conversations can be recorded and output as text

  • Sales calls can be analyzed to provide valuable feedback and increase sales

  • Meetings can be recorded to recount key points that would otherwise be missed

  • Doctors’ voice-dictated charts can be converted to formatted medical charts

  • Legal dictations can be converted to formatted text

  • Support agents’ recorded conversations can be output into business CRMs

  • Voice dictations can be used to quickly draft emails on the fly

After Rev sends us the formatted text, we display it in the secure web-based database application. Users can log in, define speakers, and edit the text with formatting tools, macros, and other hotkeys. The formatted text can then be saved and output as a .docx report in a specific format. These reports can run upwards of 100 pages and contain several inserted variables. Cover letters, footers, page numbers, speaker names, signatures, and other custom variables are dynamically inserted.

Our client was using Rev for legal dictations that would be converted to .docx files in digital pleading format. The system worked well because each recording could have several speakers. The API does an excellent job of identifying the speakers and returning them to us in an organized fashion.

API Integration Basics

Rev’s API documentation was outstanding, and its instructions were very simple to follow. We built the integration on Symfony 4.3, using Guzzle as the HTTP client. Guzzle is a PHP library for sending HTTP requests, and configuring it in one place made it much easier to integrate with the Rev API.

    eight_points_guzzle:
        clients:
            rev_ai:
                base_url: ""
                options:
                    timeout: 30
                    http_errors: true
                    headers:
                        User-Agent: "EightPointsGuzzleBundle/v7"
                        Accept: 'application/json'
                        Authorization: "Bearer %env(REV_AI)%"
                        Content-Type: "application/json"
                plugin: null

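A few key points about this configuration: the base_url is Rev’s base API endpoint. The Authorization header carries the access token generated in the API account, read here from the REV_AI environment variable. The Accept header declares the response format we expect and Content-Type declares the format of the request body; both are JSON.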

    $response = $this->client->post('/speechtotext/v1/jobs/', [
        'json' => [
            'media_url' => $url,
            'callback_url' => $this->systemDomain . 'rev-ai/callback',
        ],
    ]);

Response Returned:

    {
        "id": "Umx5c6F7pH7r",
        "created_on": "2018-09-15T05:14:38.13",
        "name": "sample.mp3",
        "metadata": "This is a sample submit jobs option for multipart",
        "status": "in_progress"
    }

We use a Symfony service whose constructor receives the Client, which is the Guzzle client built from the configuration values above. To submit an audio file for transcription, we send a POST request. The media_url is set to the URL of the file being sent for transcription. The second option, callback_url, is essentially a webhook: once the file is transcribed, Rev sends a notification to our callback URL, where we can do anything we like with the returned data. Once a job has been created, a response is returned with valuable information: the ID of the job, its status, and other metadata. The ID is important because it is needed to retrieve the job later.
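The request body and the handling of the job response can be sketched in plain PHP, independent of Guzzle. This is a minimal sketch, not the project's actual code: `buildJobPayload()` and `parseJobResponse()` are hypothetical helper names, the trailing-slash handling is our own addition, and `JSON_THROW_ON_ERROR` assumes PHP 7.3 or later. The sample response is the one shown above, with the metadata string shortened.

```php
<?php

/**
 * Build the JSON body for a job submission.
 * Hypothetical helper; not part of Rev's API or of Guzzle.
 */
function buildJobPayload(string $mediaUrl, string $systemDomain): array
{
    return [
        'media_url' => $mediaUrl,
        // Normalise the trailing slash so the callback path is always well formed.
        'callback_url' => rtrim($systemDomain, '/') . '/rev-ai/callback',
    ];
}

/**
 * Extract the fields worth storing from the job-creation response body.
 * Throws JsonException on malformed JSON and RuntimeException if no ID is present.
 */
function parseJobResponse(string $body): array
{
    $job = json_decode($body, true, 512, JSON_THROW_ON_ERROR);

    if (!is_array($job) || empty($job['id'])) {
        throw new RuntimeException('Job response did not contain an id.');
    }

    return [
        'id' => (string) $job['id'],
        'status' => (string) ($job['status'] ?? 'unknown'),
    ];
}

// Sample response from the article (metadata shortened).
$sample = '{"id":"Umx5c6F7pH7r","created_on":"2018-09-15T05:14:38.13",'
    . '"name":"sample.mp3","metadata":"sample","status":"in_progress"}';

$job = parseJobResponse($sample);
echo $job['id'], ' ', $job['status'], PHP_EOL;
```

Storing the returned ID alongside the uploaded file is what makes it possible to match the later callback to the right recording.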

    // JSON format
    $response = $this->client->get('/speechtotext/v1/jobs/' . $revId . '/transcript', [
        'headers' => [
            'Accept' => 'application/vnd.rev.transcript.v1.0+json',
        ],
    ]);

OR

    // Plain text format
    $response = $this->client->get('/speechtotext/v1/jobs/' . $revId . '/transcript', [
        'headers' => [
            'Accept' => 'text/plain',
        ],
    ]);

Transcription Response JSON Format:

    {
        "monologues": [
            {"speaker":0,"elements":[{"type":"text","value":"On","ts":0.57,"end_ts":0.78,"confidence":1.0},.....},
            {}
        ]
    }

In our POST request we chose to use the callback method (webhook), so we fetch the transcript only after Rev notifies us that the job is done. The GET request is very simple to use: all you need to do is pass the job ID in the URL and set the Accept header to the format you would like to receive. Rev lets you retrieve the transcript as JSON or as plain text. The JSON version returns a list of “monologues”, one per stretch of speech by a single speaker, each made up of timed elements. You can then parse the JSON to extract the specific data you need.
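As an illustration of that parsing step, the sketch below flattens a decoded transcript into one line of text per monologue. It is a hedged example rather than the project's code: `monologuesToLines()` is a hypothetical helper, the sample data beyond the first element is invented for the demonstration, and the code assumes that concatenating every element's `value` in order reproduces the spoken text, including punctuation and spacing.

```php
<?php

/**
 * Turn a decoded transcript into one line of text per monologue.
 * Hypothetical helper for illustration only.
 *
 * @return string[]
 */
function monologuesToLines(array $transcript): array
{
    $lines = [];

    foreach ($transcript['monologues'] ?? [] as $monologue) {
        // Skip empty placeholders such as {}.
        if (empty($monologue['elements'])) {
            continue;
        }

        // Assumption: joining the "value" fields in order rebuilds the text.
        $text = '';
        foreach ($monologue['elements'] as $element) {
            $text .= $element['value'] ?? '';
        }

        $lines[] = sprintf('Speaker %d: %s', $monologue['speaker'] ?? 0, trim($text));
    }

    return $lines;
}

// Invented sample data in the shape of the response shown above.
$json = '{"monologues":[{"speaker":0,"elements":['
    . '{"type":"text","value":"On","ts":0.57,"end_ts":0.78,"confidence":1.0},'
    . '{"type":"punct","value":" "},'
    . '{"type":"text","value":"time","ts":0.80,"end_ts":1.02,"confidence":0.98},'
    . '{"type":"punct","value":"."}]},{}]}';

$transcript = json_decode($json, true);
print_r(monologuesToLines($transcript));
```

Keeping the speaker index on every line is what later allows users to rename "Speaker 0" to a real name in the editor.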

    if ($json['job']['status'] == 'transcribed') {
        $transcription = new Transcription();

        foreach ($transcript['monologues'] as $index => $monologue) {
            ……
        }

        $em->persist($transcription);
    }
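The status check in this handler can be isolated into a small function that is easy to test without a web request. The sketch below is an assumption-laden illustration: `resolveCallback()` and the array it returns are hypothetical, the presence of an `id` field inside `job` is assumed, and any status other than `transcribed` is simply treated as a failure.

```php
<?php

/**
 * Decide what to do with a raw callback body.
 * Hypothetical helper; the returned array shape is our own convention.
 */
function resolveCallback(string $rawBody): array
{
    $json = json_decode($rawBody, true);
    $job = is_array($json) ? ($json['job'] ?? null) : null;

    // Ignore payloads that are not valid JSON or lack a job ID (assumed field).
    if (!is_array($job) || empty($job['id'])) {
        return ['action' => 'ignore', 'jobId' => null];
    }

    // 'transcribed' is the status checked in the handler above;
    // every other status is treated here as a failure to record.
    $action = ($job['status'] ?? '') === 'transcribed'
        ? 'fetch_transcript'
        : 'mark_failed';

    return ['action' => $action, 'jobId' => (string) $job['id']];
}

$ok = resolveCallback('{"job":{"id":"Umx5c6F7pH7r","status":"transcribed"}}');
$bad = resolveCallback('{"job":{"id":"Umx5c6F7pH7r","status":"failed"}}');

echo $ok['action'], PHP_EOL;
echo $bad['action'], PHP_EOL;
```

Separating the decision from the persistence code keeps the controller thin and makes malformed callbacks harmless.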

Once the transcription is finished, the webhook sends either a success or a failure response. On success, we loop through all the monologues and save each element to a transcription.

API Integration Project Outcome

The primary goal of this project was to cut down the manual labor on time-consuming tasks. The secondary goals were to organize all files, output reports, and create a streamlined operational workflow. Based on the review and quote from our client, I would say we succeeded with these goals. With the help of Rev and their thorough documentation, we could easily integrate the systems and accomplish the goals we had in mind. 

Endertech’s solution has automated our client’s transcription process in its entirety. The amount of time they have saved is enormous: tasks that previously took 13-14 hours to complete now take about two hours. If you are interested in integrating with the Rev API or other API systems, please read our client’s review and feel free to contact us.