Using a Code Snippet to Perform OCR via OpenAI API

This guide will help you understand how to use a SurveyToGo code snippet to perform Optical Character Recognition (OCR) using the OpenAI API. We'll walk through the code, explain how it works, and show you how to adapt it to your needs.


An example image, attached to Q_1, that was processed by the OCR method.

Placeholder for Image: example input image and OCR result


Overview

In this guide, we will cover how to create a survey that uses the OpenAI API to process user-attached images:

1. Setting up a survey
2. Copy code to Advanced Scripts
3. Setting Up the OpenAI API Key
4. Using the script with Prompt and Image
5. Customizing the prompt
6. Code overview and explanation
Final Notes

Step-by-Step Guide

1. Setting up a survey

Before implementing the OCR functionality, you need to set up your survey in SurveyToGo with the following three questions:

  1. First Question: Multimedia Question

    • Type: Multimedia Question
    • Purpose: To capture or upload the image that will be processed by the OCR.
    • Identifier: Assign a unique identifier (e.g., Q_1) to reference this question in your code.
  2. Second Question: Waiting screen - Empty Question

    • Type: Empty Question (Will show a waiting message)
    • Purpose: To run the OCR logic and wait for the result from the OpenAI API.
    • Identifier: Assign a unique identifier (e.g., Q_2) to reference this question in your code.
  3. Third Question: Display Result

    • Type: Standard Question (e.g., Single Choice, Open-Ended)
    • Purpose: To display the OCR result or use it within the survey flow.
    • Identifier: Assign a unique identifier (e.g., Q_3) if you need to manipulate it further in your code.

 


Placeholder for Image: Survey structure


 

2. Copy code to Advanced Scripts

This code should be placed in the Advanced Scripts. 

The Advanced Scripts section will hold the function definition and the callback functions needed for the implementation.

 
// Replace with your actual OpenAI API key
var secret = 'your_openai_api_key';
var imagePrompt = '';

// Main function to initiate OCR with the image and prompt
function PromptImageGPT(prompt, QuestionID) {
    imagePrompt = prompt; // Save the prompt for use in the callback
    var filename = GetAttachedFiles(QuestionID)[0]; // Get the image file attached to the question
    var fileBytes = ReadFileBytes(filename); // Read the image file bytes (asynchronous)
}

// Callback function invoked after the image bytes have been read
function OnReadFileBytesResult(inFileName, inSuccess, fileBytes) {
    var attachment64 = BytesToBase64(fileBytes); // Convert image bytes to a Base64 string

    var options = CreateWSOptions(); // Create web service options
    options.ContentType = "application/json";
    options.Headers["Authorization"] = "Bearer " + secret; // Set the authorization header with the API key

    // Construct the request body with the prompt and image data
    var body = '{ \
        "model": "gpt-4o-mini", \
        "messages": [ \
            { \
                "role": "user", \
                "content": [ \
                    { \
                        "type": "text", \
                        "text": "' + imagePrompt + '" \
                    }, \
                    { \
                        "type": "image_url", \
                        "image_url": { \
                            "url": "data:image/jpeg;base64,' + attachment64 + '" \
                        } \
                    } \
                ] \
            } \
        ] \
    }';

    // Send the request to the OpenAI API
    WebServicePost("https://api.openai.com/v1/chat/completions", body, options);
}

// Callback function to handle the API response
function OnWebServiceResult(inTicket, inOriginalUrl, inXMLResult, inIsError, inHttpStatus, inContentType) {
    if (!inIsError) {
        var result = ParseJson(inXMLResult); // Parse the JSON response
        Vars["response"] = result.choices[0].message.content; // Extract the text from the response
        // SetAnswer(Q_2, Vars["response"]); // Optionally save the response to a question or variable
        Prompt(Vars["response"]); // Display the response
        ExecutionMgr.GotoNext(); // Proceed to the next question
    } else {
        Prompt("Failed: " + inXMLResult); // Display the error message
    }
}

 

3. Setting Up the OpenAI API Key

First, you need to set your OpenAI API key to authenticate your requests with OpenAI.
Instructions for creating an API key are available in OpenAI's documentation.
Then insert the API key you've created into the "secret" variable in the Advanced Scripts:

// Replace 'your_openai_api_key' with your actual API key
var secret = 'your_openai_api_key';

Note: Keep your API key secure. Do not share it publicly or include it in code repositories.


 

4. Using the script with Prompt and Image

The prompt instructs the AI on what you want it to do with the image.
Add this script to the second question's Start Script.

// Example usage of the PromptImageGPT function
PromptImageGPT("The image contains text. Please perform OCR on the attached image and provide only the extracted text.", Q_1);
  • "The image contains text. Please perform OCR on the attached image and provide only the extracted text.": This is the prompt sent to the AI.
  • Q_1: This is the identifier for the image question or input in your system.



 

5. Customizing the prompt

You can modify the code to suit your needs by changing the prompt:

  • OCR:
    PromptImageGPT("The image contains text. Please perform OCR on the attached image and provide only the extracted text.", Q_1);
  • Describe image contents: 
    PromptImageGPT("Describe what you see in the image", Q_1);
  • Count chosen Items:
    var chosenItemName = Answer(Q_ItemTypes);
    var prompt = stringFormat("The image contains items. Count the number of '{0}' in the image, and return only the number", chosenItemName);
    PromptImageGPT(prompt, Q_1);
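The "Count chosen Items" example relies on stringFormat, a SurveyToGo scripting helper that substitutes '{0}'-style placeholders. As a reference for how that substitution behaves, here is a plain-JavaScript stand-in (a sketch for illustration only; inside SurveyToGo you would use the built-in helper):

```javascript
// Plain-JavaScript sketch of '{0}'-style placeholder substitution,
// mimicking the behavior of SurveyToGo's stringFormat helper.
function stringFormat(template) {
    var args = Array.prototype.slice.call(arguments, 1);
    return template.replace(/\{(\d+)\}/g, function (match, index) {
        // Replace {n} with the n-th extra argument; leave unmatched placeholders as-is
        return typeof args[index] !== 'undefined' ? args[index] : match;
    });
}

var chosenItemName = 'bottles';
var prompt = stringFormat(
    "The image contains items. Count the number of '{0}' in the image, and return only the number",
    chosenItemName
);
console.log(prompt);
```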

 

6. Code overview and explanation

6.1 The PromptImageGPT Function

This function handles the image and initiates the OCR process.

function PromptImageGPT(prompt, QuestionID) {
    imagePrompt = prompt;
    var filename = GetAttachedFiles(QuestionID)[0];
    var fileBytes = ReadFileBytes(filename);
}
  • Explanation:
    • Retrieves the image file associated with QuestionID.
    • Asynchronously reads the image file bytes.
    • Saves the prompt for the image.

6.2 Reading and Encoding the Image

After reading the image file bytes, we convert them to a Base64 string.

function OnReadFileBytesResult(inFileName, inSuccess, fileBytes) {
    var attachment64 = BytesToBase64(fileBytes);
    // ...
}
  • Explanation:
    • BytesToBase64(fileBytes): Converts the binary image data to a Base64-encoded string.
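BytesToBase64 is only available inside SurveyToGo scripts. For reference, the same bytes-to-Base64 conversion can be sketched in Node.js using Buffer (an assumption about your test environment, not part of the SurveyToGo runtime):

```javascript
// Sketch (Node.js): convert raw bytes to a Base64 string,
// equivalent to what BytesToBase64 does in SurveyToGo.
function bytesToBase64(fileBytes) {
    // fileBytes may be a Buffer or any byte array
    return Buffer.from(fileBytes).toString('base64');
}

// In-memory bytes stand in for an image file here:
const fakeImageBytes = Buffer.from([0xFF, 0xD8, 0xFF]); // JPEG magic bytes
const attachment64 = bytesToBase64(fakeImageBytes);
console.log(attachment64); // "/9j/"
```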

6.3 Constructing the API Request

We create the request body that includes the prompt and the Base64-encoded image.

var body = '{ \
    "model": "gpt-4o-mini", \
    "messages": [ \
        { \
            "role": "user", \
            "content": [ \
                { \
                    "type": "text", \
                    "text": "' + imagePrompt + '" \
                }, \
                { \
                    "type": "image_url", \
                    "image_url": { \
                        "url": "data:image/jpeg;base64,' + attachment64 + '" \
                    } \
                } \
            ] \
        } \
    ] \
}';
  • Explanation:
    • Model: Specifies the AI model to use.
    • Messages: Contains the prompt and image data.
      • Content: The instruction for the AI.
      • Image URL: The Base64-encoded image embedded as a data URL.
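One caveat with the string-concatenated body above: it produces invalid JSON if the prompt contains quotes, backslashes, or newlines. Where JSON.stringify is available in your script engine (an assumption worth verifying for your SurveyToGo version), a safer sketch builds the body from an object and lets the serializer handle escaping:

```javascript
// Sketch: build the same request body from an object so that special
// characters in the prompt are escaped automatically by JSON.stringify.
function buildRequestBody(imagePrompt, attachment64) {
    return JSON.stringify({
        model: "gpt-4o-mini",
        messages: [
            {
                role: "user",
                content: [
                    { type: "text", text: imagePrompt },
                    {
                        type: "image_url",
                        image_url: { url: "data:image/jpeg;base64," + attachment64 }
                    }
                ]
            }
        ]
    });
}

// Even a prompt containing quotes is serialized correctly:
var body = buildRequestBody('Extract the "total" field', 'aGVsbG8=');
console.log(body);
```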

6.4 Sending the Request

We send the composed message to the OpenAI API.

WebServicePost("https://api.openai.com/v1/chat/completions", body, options);
  • Explanation:
    • Sends a POST request to the API endpoint with the request body and headers.

6.5 Processing the Response

When the API responds, we process the result to extract the text.

 
function OnWebServiceResult(inTicket, inOriginalUrl, inXMLResult, inIsError, inHttpStatus, inContentType) {
    if (!inIsError) {
        var result = ParseJson(inXMLResult);
        Vars["response"] = result.choices[0].message.content;
        Prompt(Vars["response"]);
        // SetAnswer(Q_2, Vars["response"]); // Optionally save the response to a question or variable
        ExecutionMgr.GotoNext();
    } else {
        Prompt("Failed: " + inXMLResult);
    }
}
  • Explanation:
    • ParseJson: Parses the JSON response from the API.
    • Vars["response"]: Stores the extracted text.
    • SetAnswer (optional): Saves the response to a question or variable in your system.
    • ExecutionMgr.GotoNext(): Proceeds to the next question.
    • Prompt: Displays the response or error message.
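The extraction relies on the shape of an OpenAI chat-completion response, where the generated text sits at choices[0].message.content. The sketch below parses a trimmed sample payload in plain JavaScript (JSON.parse stands in for SurveyToGo's ParseJson; the field values are illustrative):

```javascript
// Sketch: extract the generated text from an OpenAI chat-completion
// response, using a trimmed sample payload.
var sampleResponse = JSON.stringify({
    id: "chatcmpl-123", // illustrative value
    choices: [
        {
            index: 0,
            message: { role: "assistant", content: "EXTRACTED TEXT" },
            finish_reason: "stop"
        }
    ]
});

var result = JSON.parse(sampleResponse); // ParseJson plays this role in SurveyToGo
var text = result.choices[0].message.content;
console.log(text); // "EXTRACTED TEXT"
```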

 

Final Notes

  • Testing: Try processing different images to see how the AI responds to various prompts.
  • Error Handling: Enhance the code with additional error checks as needed.

By following this guide, you can adapt the code snippet to perform OCR tasks tailored to your specific requirements. Customize the prompt, handle different image types, and integrate the code into your applications seamlessly.


 

Additional Resources

  • OpenAI API Documentation: OpenAI Vision API Reference
  • Best Practices:
    • Securely manage your API keys.
    • Implement proper error handling.
    • Optimize code for asynchronous operations.