How I Use AI to take perfect notes...without typing
Introduction and Overview
In this section, the speaker introduces a bot that converts voice notes into text and sends them to a notes database in Notion. The speaker explains that the bot provides not only a transcript but also a summary, main points, and action items.
Building a Voice-to-Text Bot
- The speaker built a bot that converts voice notes into text and sends them to a notes database in Notion.
- The bot uses OpenAI's Whisper model to transcribe audio files into text.
- ChatGPT is used to generate summaries, main points, and action items based on the transcript.
- ChatGPT's output can be customized by changing the prompt given to it.
Benefits of the Workflow
In this section, the speaker discusses the benefits of using AI tools for note-taking. They explain how this workflow improves efficiency and allows for customization.
Customization and Efficiency
- The workflow allows for customization based on individual note-taking needs.
- Using AI tools widens the pipeline between one's actual brain and their second brain (in this case, Notion).
- Previously, typing notes on a phone was slow and cumbersome, but now voice-to-text transcription makes it faster and easier.
- The speaker shares their own note-taking system in Notion, which includes AI-transcribed voice notes.
Tools Required for Building the Workflow
This section outlines the four essential tools needed to set up the voice-to-text bot workflow.
Required Tools
- Notion account: A database is needed within Notion to store the converted voice notes.
- OpenAI account: Access to OpenAI's Whisper API for audio transcription and the ChatGPT API for generating summaries.
- Cloud storage provider: A platform like Google Drive or Dropbox to upload audio files for automation.
- Pipedream account: An automation builder that connects all the apps and triggers the workflow when a new audio file is uploaded.
Overview of Automation Workflow
The speaker provides an overview of how the automation workflow functions, using Whimsical to visualize the process.
High-Level Workflow
- Voice notes are taken on a phone and uploaded to cloud storage (e.g., Dropbox or Google Drive).
- Pipedream automates the process by querying OpenAI's Whisper API and ChatGPT API.
- The results from OpenAI are sent to a new page in Notion, creating a comprehensive note-taking system.
Overview of the Workflow
The speaker explains the workflow they will be building, which involves sending a transcription to ChatGPT for summarization, formatting the results, and sending them to Notion to create a new page.
Setting Up the Workflow
- Sign up for a Pipedream account.
- Access the dashboard or start building a new workflow.
- Name the workflow "Speech to Text to Notion".
- The first step is always the trigger, which cannot be renamed.
Triggering the Workflow with Google Drive
- Search for Google Drive in Pipedream's trigger options.
- Select the "New Files" trigger, which fires when a new file is added in your linked Google Drive.
- Connect your Google Drive account and choose "My Drive" as the option.
- Specify a particular folder (e.g., "audio upload test") within your Google Drive account to watch for new files.
Generating Test Event for Trigger
- To generate a test event, upload an audio file to the specified folder in Google Drive.
- Open the event generated by the trigger and select the newly uploaded file.
Downloading File from Google Drive
- Since OpenAI's Whisper cannot directly access Google Drive, download the file from Google Drive into Pipedream's /tmp directory.
- Search for "download file" in Pipedream's actions and select the corresponding action.
- Use a custom expression or the copied path from the previous step to reference the file's dynamic value.
Downloading File from Google Drive
The speaker explains why it is necessary to download files from Google Drive before uploading them to OpenAI's Whisper model.
Uploading File to OpenAI's Whisper Model
- Downloading files from Google Drive makes them accessible to OpenAI's Whisper model through Pipedream.
- Search for "drive" in Pipedream's actions and select Google Drive.
- Choose the "Download a File" action to download the file from Google Drive.
- Connect your Google Drive account.
- Use a custom expression or the copied path to reference the file's dynamic value.
Uploading File to OpenAI's Whisper Model
The speaker continues explaining how to upload the downloaded file from Google Drive to OpenAI's Whisper model.
Accessing Properties of Exported Object
- Successful results in Pipedream steps generate a green success message and an exported object in the exports tab.
- Copying the path lets you dynamically reference values that come in with each automation run.
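Conceptually, a copied path is just a dotted lookup into the exported step data. A minimal sketch, where the `getByPath` helper and the event shape are hypothetical illustrations (Pipedream resolves these references itself; this is not its API):

```javascript
// Illustrative sketch of how a copied path like
// "steps.trigger.event.name" resolves against exported step data.
// getByPath and the object shape below are hypothetical, not Pipedream's API.
function getByPath(root, path) {
  return path
    .split(".")
    .reduce((obj, key) => (obj == null ? undefined : obj[key]), root);
}

// Assumed shape of the data exported by a trigger run:
const steps = {
  trigger: { event: { id: "1AbC", name: "memo.m4a" } },
};

getByPath({ steps }, "steps.trigger.event.name"); // "memo.m4a"
```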
Downloading Files from Google Drive
In this section, the speaker explains how to download files from Google Drive using their ID and specifying a destination file path. They also discuss how to handle different file types.
Downloading Files
- To download a file from Google Drive, you need to provide the ID of the file and specify a destination file path.
- The destination path can be set to the /tmp directory with a specific name like recording.mp3.
- However, if you have different audio file types, such as m4a or mp3, it's important to dynamically set the extension based on the uploaded file type.
- By accessing the properties of the event object, such as full_file_extension, you can determine the correct extension for your destination file path.
- This allows for flexibility in handling various audio file types supported by Whisper.
Sending Audio File for Transcription
In this section, the speaker explains how to send an audio file for transcription using OpenAI's ChatGPT app. They also provide instructions on creating an OpenAI account and obtaining API keys.
Using the OpenAI ChatGPT App
- To send an audio file for transcription, use the "create transcription" action in the OpenAI ChatGPT app.
- Before using this app, make sure to connect it with your OpenAI account.
- If you don't have an OpenAI account yet, you can sign up at platform.openai.com and receive $5 worth of free tokens as a trial offer.
- After signing up and logging into your account, go to "manage account" under "personal" in the top right corner.
- Create a new secret key under "user API Keys" and copy it.
- Paste this API key into the workflow and save it.
- Select the audio upload type as "file," since the file is already stored in temp storage.
- Define the file path by accessing the previous step's object and retrieving the file extension from full_file_extension.
- Test the workflow to ensure successful transcription of the uploaded audio file.
Obtaining Transcription Results
In this section, the speaker discusses how to obtain transcription results from the OpenAI ChatGPT app and access them in a Pipedream workflow.
Obtaining Transcription
- After testing the workflow, you will receive a full transcript of the audio file in the return value property called transcription.
- This transcript can be accessed and used for further processing or analysis within your Pipedream workflow.
- The obtained transcription is an accurate representation of the audio content that was uploaded.
Uploading Files to Google Drive
The speaker discusses potential issues that may arise when running the automation and uploading files to Google Drive. If building the automation takes a while, the download file step may run into problems. To resolve this, they suggest uploading another file to Google Drive and testing the download file step again.
Timeout Settings for Pipedream Workflows
The speaker explains that Pipedream workflows have a default timeout of 30 seconds. However, if using Whisper and uploading large files, it may take longer than 30 seconds. To address this issue, they recommend adjusting the timeout value in the execution control settings from 30 seconds to 180 seconds.
Additional Details and Code Heavy Method
The speaker mentions that more detailed information about Pipedream settings, pricing, and workflow can be found in the written version of the tutorial. They also mention an alternative method called the "code heavy" method which involves using code blocks. However, for this video tutorial, they will focus on the no-code method.
Adding a ChatGPT Step for Transcript Summarization
The speaker adds another ChatGPT step to work with OpenAI's ChatGPT API for summarizing transcripts. They explain that authentication is already set up and recommend selecting the GPT-3.5 Turbo model if beta access is not available. They also discuss the importance of creating a well-crafted prompt for better output quality.
Components of a Prompt
The speaker explains that a prompt consists of three parts: the user message, context, and system instructions. The user message is what you would type into ChatGPT, the context is the transcript or any relevant information for analysis, and the system instructions guide ChatGPT on how to respond.
Filling Out User Message Field
The speaker retrieves a pre-written prompt from the written version of the tutorial and pastes it into the user message field in Pipedream's ChatGPT step. The prompt includes specific instructions for writing a title for the transcript under 15 words and adding a delimiter for parsing the output.
Summary Prompt Structure
The speaker explains that the prompt structure includes specific instructions for writing a title and adding a delimiter. They emphasize that a well-crafted prompt is crucial for obtaining the desired results from ChatGPT.
Splitting the Data into Separate Pieces
The speaker discusses the need to split the title, summary, and lists into separate pieces of data for individual processing. They explain that they will use delimiters and formatting instructions to achieve this.
Using Delimiters for Separation
- The speaker plans to use delimiters such as "summary" and "additional info" to split the data.
- They ask ChatGPT to write the summary under a heading 1 and to create lists of main points, action items, follow-up questions, potential arguments against the transcript, or any other desired list.
- By providing these delimiters and instructions, they aim to obtain separate pieces of data.
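A rough sketch of the delimiter-based split, assuming ChatGPT was prompted to emit the title first, then "Summary" and "Additional Info" headings. The exact delimiter strings are whatever the prompt specifies; this is not the tutorial's code:

```javascript
// Split ChatGPT's Markdown output into separate pieces of data,
// using assumed heading delimiters "# Summary" and "# Additional Info".
function splitSections(markdown) {
  const [titlePart, rest] = markdown.split("# Summary");
  const [summary, additionalInfo] = rest.split("# Additional Info");
  return {
    title: titlePart.trim().replace(/^#\s*/, ""), // strip the leading "# "
    summary: summary.trim(),
    additionalInfo: additionalInfo.trim(),
  };
}
```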
Dynamically Linking Transcription
The speaker demonstrates how to dynamically link the transcription from a previous step in Pipedream.
Setting Up Dynamic Reference
- In Pipedream, they find the transcription property in the results of creating a transcription.
- They copy the path of this property to dynamically reference it in subsequent steps.
- This dynamic reference allows them to use the transcription as context for further processing.
System Instructions for Consistent Output
The speaker explains how system instructions can be used to ensure consistent output formatting in Markdown.
Formatting with Markdown
- To achieve well-formatted Markdown output consistently, they instruct ChatGPT that it is an assistant that only speaks in Markdown.
- They emphasize not writing text that isn't formatted as Markdown.
- With these system instructions, they expect ChatGPT's responses to always be well-formatted Markdown.
Testing with Temperature Setting
The speaker discusses the use of temperature settings to control the creativity of ChatGPT's output.
Setting Temperature for Output
- The speaker sets a temperature value between 0 and 1 to influence the creativity of ChatGPT's output.
- A higher temperature value allows for more creative responses, while a lower value results in more straightforward, summary-style output.
- They choose a temperature value of 0.2 to obtain well-formatted Markdown and concise summaries.
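Put together, the request body the ChatGPT step ultimately sends looks roughly like this. The user prompt text is illustrative; only the system instruction, the model choice, and the 0.2 temperature come from the tutorial:

```javascript
// Sketch of the Chat Completions request body assembled by the step.
function buildChatRequest(transcript) {
  return {
    model: "gpt-3.5-turbo",
    temperature: 0.2, // low temperature -> straightforward, consistent output
    messages: [
      {
        role: "system",
        content:
          "You are an assistant that only speaks in Markdown. " +
          "Do not write text that isn't formatted as Markdown.",
      },
      {
        // Illustrative user message; the real prompt also asks for a
        // title under 15 words and delimiters for parsing the output.
        role: "user",
        content: `Summarize the following transcript:\n\n${transcript}`,
      },
    ],
  };
}
```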
Obtaining Well-Formatted Markdown Output
The speaker examines the success message and verifies that ChatGPT provides well-formatted Markdown output.
Checking Output Format
- They examine the choices property, which contains an array with content properties.
- By drilling down into these properties, they confirm that the title, summary delimiter, summary, additional info, and main points are all formatted as requested in Markdown.
- They demonstrate copying this Markdown output and pasting it into Notion to show how it retains headings and bullet lists.
Adding Code Step for Title Extraction
The speaker explains the need for an additional code step to extract the title from the transcript.
Extracting Title from Transcript
- While reviewing previous examples, they notice that the transcript is not included in the result from Pipedream.
- To address this issue, they add a small code step to extract the title separately using JavaScript.
- This step ensures that only relevant information is included in each section of their workflow.
Formatting the Title, Summary, and Transcript
In this section, the speaker discusses formatting the title, summary, and transcript. They provide a code block from the blog post that can be copied and pasted to achieve this formatting.
Changing Step Name
- The speaker suggests giving the step a more descriptive name instead of just "node" to avoid potential issues with future references.
- By changing the step name to "formatter," it becomes easier to reference in future steps.
Copying Code Block
- The speaker instructs to copy the provided code block from the blog post.
- This code block splits the title, summary, and lists into separate properties in an object.
- It also formats the transcript into paragraphs with no more than three sentences each.
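The paragraph-splitting part of the formatter can be approximated like this. The sentence-boundary regex is a simplification, not the tutorial's exact code:

```javascript
// Group sentences into paragraphs of at most three sentences each,
// using a naive end-of-sentence regex (., !, ?).
function toParagraphs(transcript, maxSentences = 3) {
  const sentences = transcript.match(/[^.!?]+[.!?]+/g) || [transcript];
  const paragraphs = [];
  for (let i = 0; i < sentences.length; i += maxSentences) {
    paragraphs.push(
      sentences
        .slice(i, i + maxSentences)
        .map((s) => s.trim())
        .join(" ")
    );
  }
  return paragraphs;
}
```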
Sending Data to Notion
- To send the formatted data to Notion, a final step is added using the "create page from database" action.
- A Notion account needs to be connected, and access to the desired database should be granted.
- The parent database ID is set, specifying where the page will be created.
- Markdown syntax can be used to specify the page content layout dynamically.
Sending Data to Notion
In this section, the speaker explains how to send data from Pipedream to Notion using a specific action called "create page from database."
Connecting Notion Account
- The speaker authenticates their Notion account within Pipedream by selecting their workspace.
- It is important to grant Pipedream access rights to create pages in Notion.
Selecting Parent Database
- The parent database ID field needs to be set by choosing an appropriate database from the available options or manually entering a Notion database ID.
- If Pipedream does not have access to the desired database, manual access can be granted by adding Pipedream as a connected app in Notion.
Specifying Page Content
- The page content is specified using Markdown syntax.
- A code block from the written version is copied and pasted into the page content field.
- Dynamic referencing of values is possible within the page content, such as summary, transcript, and additional info.
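Conceptually, the page content field ends up as one Markdown string with the formatter's values interpolated where the dynamic references sit. A sketch with assumed heading names:

```javascript
// Assemble the Notion page content as a single Markdown string.
// The heading names are assumptions; in Pipedream the values would be
// dynamic references to the formatter step's exports.
function buildPageContent({ summary, transcript, additionalInfo }) {
  return [
    "# Summary",
    summary,
    "# Transcript",
    transcript,
    "# Additional Info",
    additionalInfo,
  ].join("\n\n");
}
```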
Optional Fields
- Optional fields like meta types (for icon or page cover) and additional page properties can be set if needed.
Setting Property Types
In this section, the speaker discusses setting property types for a specific task.
Choosing Property Types
- The speaker wants to set the title property, which is by default the name property in the database.
- They also want to set a property called type to a value called AI transcription.
Enabling Configuration Options
This section focuses on enabling configuration options for the task.
Enabling Options
- The speaker chooses the icon option.
- Emoji can be searched and added as an option.
- The speaker searches for the robot emoji and selects it for their configuration.
Dynamically Referencing Objects
Here, the speaker demonstrates how to dynamically reference objects in the configuration.
Referencing Title Object
- The speaker pins the configuration option for title.
- They go back to the formatter success message and copy the path of the title property.
- Returning to title in the configuration, they paste the copied path.
Selecting Type Option
- For type, there are multiple select options available.
- The speaker clicks on AI transcription, which was previously set in their Notion database.
Testing Workflow
This section covers testing the workflow and checking if new transcripts appear in a filtered view.
Preparing Notes Area
- The speaker navigates to their voice notes area, which is a filtered view showing notes with AI transcription as their type.
Clicking Test Button
- After configuring all options, they click on test.
- Upon completion, they expect to see their brand new transcript and summary appear in their filtered view of notes.
Successful Page Creation
This section confirms the successful creation of a page and the appearance of the transcript and summary in Notion.
Page Creation
- The speaker confirms that the page was created successfully.
- All exported return values from the Notion API are displayed.
Transcript and Summary in Notion
- They check their Notion workspace and find the brand new transcript and summary.
- The content is located under main points, action items, follow-up questions, etc.
Live Workflow Deployment
Here, the speaker deploys the workflow to make it live for automatic transcription and summarization.
Deploying Workflow
- They click on deploy to make the workflow live.
- The workflow will now wait for events triggered by uploading audio files to a specific folder in Google Drive.
Limitations of No Code Version
This section discusses limitations of the no code version of this tutorial.
File Size Limitation
- The no code version has a 25 megabyte file limit for transcription using Whisper.
Code Heavy Version Solution
- To transcribe longer audio files or overcome these limits, they recommend using the code-heavy version explained in the written tutorial.
- It involves copying and pasting code into Pipedream, or into ChatGPT for customization options.
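The arithmetic behind the limit is simple: a recording larger than 25 MB has to be split before it can be sent to Whisper. A sketch of the chunk count (the actual splitting is what the code-heavy version handles):

```javascript
// How many pieces a file must be split into to stay under Whisper's
// 25 MB per-request limit.
function chunkCount(fileSizeBytes, limitBytes = 25 * 1024 * 1024) {
  return Math.ceil(fileSizeBytes / limitBytes);
}

chunkCount(60 * 1024 * 1024); // a 60 MB recording needs 3 chunks
```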
Choosing Between Versions
The speaker explains why they chose to focus on the no code version for simplicity but also highlights benefits of using the code heavy version.
Simplified Workflow Purpose
- The no code version is suitable for simple voice notes and quick setup without extensive coding knowledge.
- It provides an easy way to get started with AI transcription and summarization.
Code Heavy Version Benefits
- The code heavy version allows transcribing long podcast episodes or lengthy recordings.
- It offers more customization options and flexibility for advanced users.
- The article/tutorial provides comprehensive guidance for implementing the code heavy version.
Ultimate Brain Note-Taking System
This section introduces the Ultimate Brain note-taking system, which is recommended for the AI transcription and summarization workflow.
Ultimate Brain Features
- Ultimate Brain is an all-in-one productivity template for Notion.
- It includes task management, GTD workflows, project management dashboard, goal tracking, recipe tracker, and an all-in-one notes dashboard.
- Each feature has its own dedicated page within the template.
Active Support and Community
- Ultimate Brain offers active support from a dedicated team that answers questions promptly.
- There is also a community of customers who share their customizations and changes to the template.
Conclusion and Additional Resources
The speaker concludes the video by providing additional resources and ways to stay updated with new tutorials and templates.
Discounted Access to Ultimate Brain
- Viewers can get discounted access to Ultimate Brain at thomasjfrank.com/brain using a discount code provided in the video description.
Notion Tips Newsletter
- To receive notifications about new notion tutorials or released templates, viewers can sign up for the Notion Tips newsletter mentioned in the video description.
Free Notion Fundamentals Course
- The speaker offers a free Notion fundamentals course on their website.
- The course covers various aspects of using Notion, including page basics, writing system, databases, etc.
Final Thoughts
In this section, the speaker concludes the video and invites viewers to try out the information shared and provide feedback.
Closing Remarks
- The speaker encourages viewers to try out the information presented.
- Viewers are invited to share their thoughts and feedback.
- The speaker expresses gratitude for watching the video.
- The video ends with a promise to see viewers in the next one.