How I Use AI to take perfect notes...without typing
Building a Bot that Transcribes Voice Notes to Text and Sends Them to Notion
In this video, the speaker demonstrates how to build a bot that transcribes voice notes into text and sends them to a notes database in Notion. The process is automated using two AI tools, OpenAI's Whisper model for audio-to-text transcription and ChatGPT for generating summaries and action items.
Setting Up the Workflow
- The workflow is easy to set up and completely hands-off once built.
- Four tools are needed for the workflow:
- A Notion account with a Notes database
- An OpenAI account for Whisper API (audio-to-text transcription) and ChatGPT API (summary generation)
- A cloud storage provider such as Google Drive or Dropbox
- A Pipe Dream account for automation building
- Whimsical is recommended as a tool for creating flowcharts of the automation process.
Using the Workflow
- The workflow widens the pipeline between one's actual brain and their second brain inside of Notion.
- All AI-transcribed notes can be stored in one section of a customized note-taking system in Notion.
- Pipe Dream allows automation every time a new audio file is uploaded to Google Drive or other cloud storage providers.
Conclusion
- Writing consistently better prompts can give better output from ChatGPT.
- Ultimate Brain template for Notion provides robust task management, project management dashboard, goal tracking, and a full note-taking system.
- The speaker provides a discount code for Ultimate Brain in the video description.
Overview of the Workflow
In this section, the speaker provides an overview of what will happen in the workflow and what will be built.
Building the Workflow
- The audio files will be uploaded to cloud storage.
- When a new file is added to a specific folder in Google Drive, Pipe Dream will see it and trigger automation.
- The audio file will be downloaded into temp storage.
- Whisper will transcribe the audio file.
- Chat GPT API will summarize the transcription text.
- Formatting will be done to bundle all information together nicely.
- A brand new page in Notion will be created with all information.
Setting up PipeDream Account
In this section, the speaker explains how to set up a PipeDream account and start building a workflow.
Creating a Workflow
- Sign up for a PipeDream account if you don't have one already.
- Click on "New Workflow" button on your dashboard or create a new workflow from scratch.
- Name your workflow "Speech to Text to Notion".
- The first step of any workflow is always the trigger. It cannot be renamed and we want our workflow to trigger every time an audio file is uploaded to a specific folder in Google Drive.
- Search for Google Drive as your trigger action
- Choose "New Files" action which emits an event anytime a new file is added in your linked Google Drive
- Connect your Google Drive account
- Select only files that go into a specific folder called "audio upload test"
- Hit create source
- Generate test event for this trigger by uploading an audio file into that specific folder
- One new event has been detected by the trigger
- Open it up and select this new file that it has detected
- Every time you get a successful result in a step in Pipedream, you're gonna get this green success message.
Triggering the Workflow
In this section, the speaker explains how to trigger the workflow and access properties of the exported object.
Accessing Properties of Exported Object
- Throughout the entire process, we will be accessing properties of this object and referencing them in additional steps.
- There is both a copy path option and a copy value option.
- Copy value copies the exact value from that particular run of automation.
Downloading and Transcribing Audio Files with Pipedream and OpenAI
In this tutorial, the speaker demonstrates how to download an audio file from Google Drive into Pipe Dream's temp directory and then transcribe it using OpenAI's Whisper.
Downloading Audio File from Google Drive
- To download the file from Google Drive, search for "drive" in the step.
- Find the "download a file" action and reference the dynamic value from the previous step under the file property.
- Provide a destination file path by typing /tmp/recording.mp3 or dynamically setting it based on the audio file type using steps trigger event full file extension.
Transcribing Audio File with OpenAI's Whisper
- Use the OpenAI Chat GPT app to create a transcription.
- Connect to your OpenAI account if you haven't already done so.
- Select "create transcription" action to send your audio file to Whisper for transcription.
Setting up Pipe Dream Workflow for AI Transcription
In this section, the speaker explains how to set up a Pipe Dream workflow for AI transcription. The steps include signing up for a free trial, upgrading to a paid account, creating an API key, defining the file path, and testing the transcription.
Signing Up and Creating an API Key
- To sign up for a free trial of Pipe Dream workflow, go to the website and create an account.
- After signing up, access your dashboard by clicking on "Personal" in the top right corner.
- Go to "Manage Account" and upgrade to a paid account if desired.
- Create a new secret key under "User API Keys".
- Copy the API key and paste it into your Pipe Dream workflow.
Defining File Path and Testing Transcription
- Select audio upload type as "file".
- Define file path using "/tmp/recording" and get file extension from previous step's object.
- Test transcription by hitting "test".
- Optional fields can be set in next step when working with ChatGPT.
Troubleshooting Potential Errors
- If you receive an error saying that your recording no longer exists during testing, upload another file to Google Drive and test again.
- Set timeout value from 30 seconds to 180 seconds in execution control settings if needed.
Additional Resources
For more details about Pipe Dream settings, pricing, workflows etc., refer to the written version of this tutorial linked in the description.
Configuration Options
In this section, the speaker discusses the different configuration options available for using ChatGPT.
Selecting a Model
- The speaker recommends using the GPT 3.5 Turbo model as it is not currently frozen.
- If you have beta access to GPT-4 via the API, you can select that option.
Creating a Prompt
- A prompt consists of three parts: the user message (query), context, and system instructions.
- The user message is what you would type into ChatGPT to generate a response.
- The context is the transcript or any other text that provides background information for ChatGPT to analyze.
- The system instructions provide formatting instructions for ChatGPT's response and can include examples.
- When working with the API, the system instructions are a separate parameter from the user message.
Writing a User Message
- The speaker provides a pre-written prompt for summarizing a transcript with ChatGPT.
- The prompt includes specific instructions for writing a title and summary, as well as delimiters to separate different sections of output.
Requesting Lists
- You can ask ChatGPT to generate lists of main points, action items, follow-up questions, potential arguments against the transcript, or any other kind of list you want.
Conclusion
In this section, the speaker concludes by summarizing how to use ChatGPT effectively.
Using Delimiters
- Delimiters are useful for parsing through output from ChatGPT and separating different sections of data.
Dynamic Linking
- When working with multiple steps in an automation workflow, you can use dynamic linking to reference the output of a previous step.
Customizing Prompts
- You can customize prompts to ask ChatGPT for specific kinds of information or responses.
Summary
- The speaker provides an overview of the different configuration options and prompt components available when using ChatGPT.
- Using delimiters and dynamic linking can help parse through output and automate workflows effectively.
Setting System Instructions for Consistent Output
In this section, the speaker explains how to set system instructions for consistent output using Markdown formatting language.
Setting System Instructions
- To get consistent output, the entire transcript from the previous step is used as context.
- The speaker sets system instructions by telling it to only speak in Markdown and not write text that isn't formatted as Markdown.
- Example formatting can be given to be extra thorough.
- Temperature setting is a value from zero to one that influences how creative the output will be. A value of 0.2 works well for well-formatted markdown.
Testing and Demonstrating Output
In this section, the speaker tests and demonstrates the output of their system instructions using PipeDream and Notion.
Testing Output
- After setting system instructions, hit test to check if it works.
- Success message indicates that it worked.
- The choices property shows an array with content formatted in Markdown as requested.
Demonstrating Output
- Speaker copies the value of their output and pastes it into a new note in Notion workspace.
- The copied Markdown appears with headings and bullet lists as intended.
Adding Code Step for Elegant Results
In this section, the speaker adds a small code step to make results more elegant and save calls to open API.
Adding Code Step
- Another step is added using Node and Run Node Code actions.
- JavaScript or Python can be written if desired.
- Pre-written code block is pasted into window provided by PipeDream.
Copying and Pasting Code
In this section, the speaker explains how to copy and paste code into a blog post. They also discuss the importance of naming steps descriptively.
Naming Steps Descriptively
- The speaker renames a step from "Node" to "Formatter" for clarity.
- They explain that the name of each step is used in future exports, so it's important to choose descriptive names.
Copying and Pasting Code
- The speaker copies a block of code and pastes it into their blog post.
- They test the code and show the resulting object.
- The code splits text into paragraphs with no more than three sentences each, which will be sent as separate text blocks to Notion.
Sending Data to Notion
In this section, the speaker explains how to send data from Pipe Dream to Notion.
Adding a Final Step
- The speaker adds a final step to send data to Notion.
- They search for Notion as an app and select "Create Page from Database."
Connecting a Notion Account
- The speaker connects their College Info Geek workspace account.
- They select the notes database they want to use.
Setting Parent Database ID
- The speaker sets the parent database ID by selecting their desired database from a list or manually entering its ID.
- If your desired database isn't listed, you may need to grant access manually.
Specifying Page Content
- The speaker shows how Markdown syntax can be used dynamically for page layout but opts for simplicity in this tutorial.
- They specify what content will be sent as part of the page to Notion.
Setting up Notion API for AI Transcription and Summarization
In this section, the speaker explains how to set up a Notion API for AI transcription and summarization. They cover setting page titles, enabling property types, and choosing meta types.
Setting Up Page Title and Property Types
- The speaker explains that there are optional fields available when setting up a Notion API for AI transcription and summarization.
- One of these fields is called Meta Types, which is used to set an icon or page cover.
- The speaker recommends enabling property types before setting your meta type specifically.
- To set the title property, you can dynamically reference an object by copying its path from the formatter success message.
- For type, you can choose from different select options. The speaker chooses AI transcription.
Testing the Workflow
- The speaker shows a filtered view of notes where the type is AI transcription.
- After hitting test, the transcript and summary should show up in that exact view.
- Once deployed, anytime you upload a brand new audio file to that folder in Google Drive, this will kick off and you'll get that same transcript and summary inside of Notion.
Code Heavy Version for Longer Audio Files
In this section, the speaker discusses limitations with the no code version of their workflow. They recommend checking out the code heavy version if you want to transcribe longer podcast episodes or talk for an hour.
Limitations with No Code Version
- Whisper has a 25 megabyte file limit right now.
- The code heavy version of this tutorial is just as comprehensive, if not more than the no code version.
Using the Code Heavy Version
- If you're comfortable with at least copying and pasting code into Pipe Dream or chat GPT, then check out that section of the tutorial.
- The speaker notes that they are using the code heavy version themselves in their own workflow.
Ultimate Brain for Note-Taking System
In this section, the speaker promotes their all-in-one productivity template for Notion called Ultimate Brain.
About Ultimate Brain
- Ultimate Brain includes task management, GTD workflows, project management dashboard, goal tracking, recipe tracker, and an all-in-one notes dashboard.
Ultimate Brain Notion Template
In this section, the speaker introduces the Ultimate Brain Notion template and explains how it can be used to turn Notion into a second brain.
Features of Ultimate Brain Notion Template
- The Ultimate Brain Notion template is a customizable tool that allows users to organize their thoughts and ideas in one place.
- Customers can get a discount on the template by visiting thomasjfrank.com/brain.
- A full Notion Fundamentals course is available for free on the website, which covers everything from page basics to databases.
- Users can sign up for the Notion Tips email newsletter or ask questions in the comment section or on Twitter (@TomFrankly).
Conclusion
In this section, the speaker concludes by encouraging viewers to implement the workflow and share their feedback.
Final Thoughts
- Viewers are encouraged to try out the workflow and share their thoughts with the speaker.
- The video ends with an upbeat music outro.