Step 2: Integrating Gemini AI to Understand Requests and Extract Data

In the previous section, we successfully set up the trigger—the digital tripwire that fires every time a relevant email lands in our inbox. We now have a reliable starting point for our automation. But this victory immediately presents us with a much more complex challenge: the email itself is just a block of unstructured, human-written text. How do we teach our script to actually read and understand it?

This is the critical gap between simple automation and true AI-powered workflows. A basic script can find keywords, but it can't comprehend context. It wouldn't know that "Invoice for our last project" and "Here's my bill #4815" are requests of the same type, nor could it differentiate between "Let's meet next Tuesday at 3pm" and a casual mention of a meeting that happened last Tuesday. To bridge this gap, we need to integrate a cognitive engine. This is where Google's Gemini AI comes in.

Think of Gemini as the brain of our operation. While Google Apps Script provides the hands and feet to interact with our Workspace apps, Gemini provides the ability to reason about the information it receives. By sending the raw text of an email to the Gemini API, we can ask it to perform tasks that were once the exclusive domain of human assistants: identify the sender's intent, find specific pieces of data, and reformat that data into a predictable, machine-readable structure.

Our primary tool for communicating with Gemini is the "prompt." A prompt is simply a set of instructions you give the AI. The art and science of crafting effective instructions is known as prompt engineering, a core skill in modern AI development. For our workflow, the goal isn't to have a chat with the AI; it's to give it a precise task: act as a data-extraction expert. A great prompt tells the AI its role, the context of the data it's analyzing, and, most importantly, the exact format for its response.
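To make the three ingredients of a good prompt concrete (role, context, and response format), here is a minimal sketch of a helper that assembles them into one string. The function name `buildExtractionPrompt` and the shape of the `spec` object are illustrative assumptions, not part of any Google API.

```javascript
/**
 * Assembles a prompt from a role, a task description, and the exact
 * output format we expect. All names here are hypothetical helpers.
 */
function buildExtractionPrompt(spec, emailBody) {
  // One "key: description" line per field we want back.
  const fieldLines = Object.entries(spec.fields)
    .map(([key, description]) => `${key}: ${description}`)
    .join('\n');

  return [
    `You are ${spec.role}.`,
    `Your task is to ${spec.task}.`,
    'Provide the output only in a valid JSON format with the following keys:',
    fieldLines,
    'If any information is not present in the email, use null for its value.',
    `Here is the email content: ${emailBody}`
  ].join('\n\n');
}

// Example spec with two fields; a real one would list every field needed.
const invoiceSpec = {
  role: 'an automated data extraction assistant',
  task: 'analyze the following email content and extract the invoice details',
  fields: {
    invoiceNumber: 'The invoice ID or number as a string.',
    amount: 'The total amount due as a number.'
  }
};

const examplePrompt = buildExtractionPrompt(invoiceSpec, 'Here is my bill #4815 for $250.');
```

Keeping the field list in a plain object means the same helper can serve different tasks by passing a different `spec`.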

For automation, the single most important prompting technique is to demand a specific output structure. Requesting the answer in JSON (JavaScript Object Notation) format turns the AI's potentially variable, conversational answer into an organized data object that our Google Apps Script can read with JSON.parse(). Because a model can still occasionally return malformed or incomplete JSON, the script should check the result before acting on it. This is how we ensure our workflow is not just intelligent, but also reliable.
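Even when asked for JSON only, a model may wrap its answer in a Markdown code fence or add a sentence around it. The sketch below shows one defensive way to parse such a reply: take the text between the first `{` and the last `}` and return null if it still does not parse. The helper name `parseModelJson` is hypothetical, and the approach assumes the expected answer is a single JSON object.

```javascript
/**
 * Tries to read a JSON object out of a model reply.
 * Returns the parsed object, or null when no valid JSON object is found.
 */
function parseModelJson(replyText) {
  const text = String(replyText);
  const start = text.indexOf('{');
  const end = text.lastIndexOf('}');
  if (start === -1 || end <= start) {
    return null; // No object-shaped content at all
  }
  try {
    return JSON.parse(text.slice(start, end + 1));
  } catch (err) {
    return null; // Caller decides whether to retry, log, or skip the email
  }
}

// Demo: a reply wrapped in a Markdown code fence.
const fence = '`'.repeat(3);
const wrappedReply = fence + 'json\n{"requestType": "invoice", "amount": 250}\n' + fence;
const parsedReply = parseModelJson(wrappedReply);
```

Returning null instead of throwing keeps the decision about what to do with an unreadable reply in the calling code.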

Let's see how this looks in practice within our Google Apps Script. We'll create a function that takes the email body as an argument, constructs a prompt, and calls the Gemini API to get our structured data.

function extractDataWithGemini(emailBody) {
  const API_KEY = 'YOUR_GEMINI_API_KEY'; // Replace with your actual API key
  // Model name as used in this course; available model names change over time
  const API_URL = 'https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent?key=' + API_KEY;

  // Placeholder prompt; we will define a more detailed one in the next step
  const prompt = `Analyze the following email and extract the key information in JSON format. Email content: "${emailBody}"`;

  const requestBody = {
    "contents": [{
      "parts": [{
        "text": prompt
      }]
    }]
  };

  const options = {
    'method': 'post',
    'contentType': 'application/json',
    'payload': JSON.stringify(requestBody),
    'muteHttpExceptions': true // Lets us inspect error responses ourselves
  };

  const response = UrlFetchApp.fetch(API_URL, options);
  const responseText = response.getContentText();
  if (response.getResponseCode() !== 200) {
    throw new Error('Gemini API request failed: ' + responseText);
  }
  const jsonResponse = JSON.parse(responseText);

  // Extract the text content from the AI's response
  const extractedText = jsonResponse.candidates[0].content.parts[0].text;
  try {
    return JSON.parse(extractedText); // This should be our structured JSON data
  } catch (e) {
    // The model may add extra text around the JSON despite the prompt
    throw new Error('Gemini did not return valid JSON: ' + extractedText);
  }
}
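The expression `candidates[0].content.parts[0].text` assumes the response always contains at least one candidate with text. As a hedged sketch, the lookup can be wrapped in a small guard that returns null when any level is missing. The helper name `getCandidateText` is hypothetical, and the `error.message` field is an assumption about how the API reports failures in the response body.

```javascript
/**
 * Safely reads the first candidate's text from a parsed API response.
 * Returns the text, or null when the expected structure is absent.
 */
function getCandidateText(apiResponse) {
  // Assumed error shape: { error: { message: '...' } }
  if (apiResponse && apiResponse.error) {
    throw new Error('Gemini API error: ' + apiResponse.error.message);
  }
  const candidate = apiResponse && apiResponse.candidates && apiResponse.candidates[0];
  const parts = candidate && candidate.content && candidate.content.parts;
  if (!parts || parts.length === 0 || typeof parts[0].text !== 'string') {
    return null; // No usable text in the response
  }
  return parts[0].text;
}

// Demo with a response object shaped like the one the function above reads.
const sampleResponse = {
  candidates: [{ content: { parts: [{ text: '{"requestType": "invoice"}' }] } }]
};
const sampleText = getCandidateText(sampleResponse);
```

A null result can then be logged and the email skipped, rather than letting a TypeError stop the whole trigger run.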

The code handles the technical connection, but the magic happens in the prompt variable. Let's design a powerful prompt for our invoice-processing task. Notice how we give the AI a role, specify the task, define the fields we want, and provide instructions for cases where data is missing.

Example Prompt for Invoice Extraction:

"You are an automated data extraction assistant. Your task is to analyze the following email content and extract the invoice details. Provide the output only in a valid JSON format with the following keys:

requestType: Should be "invoice".
invoiceNumber: The invoice ID or number as a string.
amount: The total amount due as a number.
dueDate: The due date in 'YYYY-MM-DD' format.
senderName: The name of the person or company sending the invoice.

If any information is not present in the email, use null for its value. Here is the email content: [Email Body Text Here]"
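Once a reply to a prompt like this has been parsed, it can be checked against the field descriptions before anything is done with it. The following is a minimal sketch of such a check; the function name `validateInvoice` and the wording of the messages are illustrative assumptions.

```javascript
/**
 * Checks a parsed invoice object against the fields requested in the prompt.
 * Returns a list of problems; an empty list means the object looks usable.
 */
function validateInvoice(data) {
  if (data === null || typeof data !== 'object') {
    return ['response is not a JSON object'];
  }
  const problems = [];
  // The prompt asks for null when a value is missing, so null is always allowed.
  const isNullOr = (value, check) => value === null || check(value);

  if (data.requestType !== 'invoice') {
    problems.push('requestType should be "invoice"');
  }
  if (!isNullOr(data.invoiceNumber, v => typeof v === 'string')) {
    problems.push('invoiceNumber should be a string or null');
  }
  if (!isNullOr(data.amount, v => typeof v === 'number' && Number.isFinite(v))) {
    problems.push('amount should be a number or null');
  }
  if (!isNullOr(data.dueDate, v => typeof v === 'string' && /^\d{4}-\d{2}-\d{2}$/.test(v))) {
    problems.push('dueDate should be in YYYY-MM-DD format or null');
  }
  if (!isNullOr(data.senderName, v => typeof v === 'string')) {
    problems.push('senderName should be a string or null');
  }
  return problems;
}

const goodInvoice = {
  requestType: 'invoice',
  invoiceNumber: '4815',
  amount: 250,
  dueDate: '2025-03-31',
  senderName: null
};
const invoiceProblems = validateInvoice(goodInvoice);
```

Note that the date check only verifies the pattern, not that the date exists on the calendar.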

This same principle applies beautifully to our meeting scheduling task. We simply swap out the prompt to ask for different information, demonstrating the flexibility of this approach.

Example Prompt for Meeting Scheduling:

"You are an automated scheduling assistant. Your task is to analyze the following email content to identify a meeting request. Provide the output only in a valid JSON format with the following keys:

requestType: Should be "meeting".
topic: A brief summary of the meeting's purpose.
attendees: An array of strings containing the email addresses or names of proposed attendees.
proposedDateTime: The first suggested date and time in 'YYYY-MM-DDTHH:MM:SS' ISO 8601 format.

If any information is not present, use null for its value. Here is the email content: [Email Body Text Here]"
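Before a value such as proposedDateTime is used to create a calendar event, it helps to confirm that it really matches the requested 'YYYY-MM-DDTHH:MM:SS' pattern. Here is a minimal sketch of such a check. The helper name `parseProposedDateTime` is hypothetical, and the sketch assumes the time should be interpreted in the script's local time zone, since the requested format carries no offset.

```javascript
/**
 * Converts a 'YYYY-MM-DDTHH:MM:SS' string into a Date in local time.
 * Returns null for null, non-strings, wrong formats, or impossible dates.
 */
function parseProposedDateTime(value) {
  if (typeof value !== 'string') return null;
  const match = /^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})$/.exec(value);
  if (!match) return null;

  const [year, month, day, hour, minute, second] = match.slice(1).map(Number);
  const date = new Date(year, month - 1, day, hour, minute, second);

  // Date silently rolls over values such as month 13 or hour 24,
  // so compare the result with the parsed components and reject mismatches.
  const valid =
    date.getFullYear() === year &&
    date.getMonth() === month - 1 &&
    date.getDate() === day &&
    date.getHours() === hour &&
    date.getMinutes() === minute &&
    date.getSeconds() === second;
  return valid ? date : null;
}

const meetingStart = parseProposedDateTime('2025-03-04T15:00:00');
const notADate = parseProposedDateTime('next Tuesday at 3pm');
```

A null result signals that the model did not find a usable date, in which case the workflow can skip event creation for that email.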

graph TD
    A[Email Arrives with 'Process' Label] --> B{Apps Script Trigger Fires};
    B --> C[Function Extracts Email Body];
    C --> D[Sends Body to Gemini API with a Formatted Prompt];
    D --> E{Gemini Analyzes Text};
    E --> F[Returns Structured JSON Data];

With this step complete, we've built the most crucial part of our intelligent system. We have successfully taught our script how to read and interpret human requests, turning ambiguous emails into the clean, structured data that computers love. We've moved from a simple trigger to a sophisticated understanding engine.

Of course, this structured data is only useful if we do something with it. Right now, it's just a variable in our script. In the next section, we will take this JSON object and use it to perform concrete actions: logging the invoice details in a Google Sheet and drafting a new event in Google Calendar. This is where our automated assistant truly comes to life.

