Using a Code Snippet to Perform OCR via OpenAI API

This guide will help you understand how to use a SurveyToGo code snippet to perform Optical Character Recognition (OCR) using the OpenAI API. We'll walk through the code, explain how it works, and show you how to adapt it to your needs.


An example image that was processed by the OCR method (populated to Q_1).


Overview

In this guide we will discuss how to create a survey that uses OpenAI integration with user attachments.

1. Setting up a survey
2. Copy code to Advanced Scripts
3. Setting Up the OpenAI API Key
4. Using the script with Prompt and Image
5. Customizing the prompt
6. Code overview and explanation
Final Notes

Step-by-Step Guide

1. Setting up a survey

Before implementing the OCR functionality, set up your survey in SurveyToGo with the following two questions:

  1. First Question: Multimedia Question

    • Type: Multimedia Question
    • Purpose: To capture or upload the image that will be processed by the OCR.
    • Identifier: Assign a unique identifier (e.g., Q_1) to reference this question in your code.

  2. Second Question: Display Result

    • Type: Standard Question (e.g., Single Choice, Open-Ended)
    • Purpose: To display the OCR result or use it within the survey flow.
    • Identifier: Assign a unique identifier (e.g., Q_2) if you need to manipulate it further in your code.

 


Survey structure


 

2. Copy code to Advanced Scripts

Place the following code in the survey's Advanced Scripts.

The Advanced Scripts hold the function definition needed for the implementation.


// Replace with your actual OpenAI API key
var secret = 'your_openai_api_key';

// Main function to initiate OCR with the image and prompt
function PromptImageGPT(prompt, QuestionID) {
  // Read the first file attached to the question and encode it as Base64
  var filename = GetAttachedFiles(QuestionID)[0];
  var fileBytes = ReadFileBytesSync(filename);
  var attachment64 = BytesToBase64(fileBytes);

  // Send JSON and authenticate with the API key
  var options = CreateWSOptions();
  options.ContentType = "application/json";
  options.Headers["Authorization"] = "Bearer " + secret;

  // Request body: the prompt plus the image as a Base64 data URL.
  // The prompt is inserted as-is, so it must not contain double quotes, backslashes, or line breaks.
  var body = '{ \
    "model": "gpt-4o-mini", \
    "messages": [ \
      { \
        "role": "user", \
        "content": [ \
          { \
            "type": "text", \
            "text": "' + prompt + '" \
          }, \
          { \
            "type": "image_url", \
            "image_url": { \
              "url": "data:image/jpeg;base64,' + attachment64 + '" \
            } \
          } \
        ] \
      } \
    ] \
  }';

  var res = WebServicePostSync("https://api.openai.com/v1/chat/completions", body, options);

  if (!res.IsError && res.HttpStatusCode == 200)
  {
    // Success: return the text of the first reply message
    var jsonResult = ParseJson(res.Result);
    return jsonResult.choices[0].message.content;
  }
  else
  {
    // Failure: show the raw response and return it
    Prompt("failed " + res.Result);
    return res.Result;
  }
}

 

3. Setting Up the OpenAI API Key

First, create an OpenAI API key to authenticate your requests with OpenAI.
Instructions are available here.
Then insert the API key you created into the "secret" variable in the Advanced Scripts.

// Replace 'your_openai_api_key' with your actual API key
var secret = 'your_openai_api_key';

Note: Keep your API key secure. Do not share it publicly or include it in code repositories.


 

4. Using the script with Prompt and Image

The prompt instructs the AI on what you want it to do with the image.
Add this script to the second question's Start Script.

// Example usage of the PromptImageGPT function
var res = PromptImageGPT("The image contains text. Please perform OCR on the attached image and provide only the extracted text.", Q_1);
  • "The image contains text. Please perform OCR on the attached image and provide only the extracted text.": This is the prompt sent to the AI.
  • Q_1: This is the identifier for the image question or input in your system.
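
Even when asked to return only the extracted text, a model may add surrounding whitespace or wrap the reply in a Markdown code fence. The following is a minimal sketch of a cleanup step. cleanOcrText is a hypothetical helper name, not a SurveyToGo function, and the sketch assumes the script engine supports standard JavaScript string methods and regular expressions.

```javascript
// Hypothetical helper (not part of SurveyToGo): tidies the text returned by PromptImageGPT.
function cleanOcrText(reply) {
  if (reply === null || reply === undefined) {
    return "";
  }
  var text = String(reply);
  // Trim first so the fence checks below see the real first and last lines.
  text = text.replace(/^\s+|\s+$/g, "");
  // Remove an opening triple-backtick fence, with an optional language tag, on the first line.
  text = text.replace(/^`{3}[a-zA-Z]*[ \t]*\r?\n/, "");
  // Remove a closing triple-backtick fence on the last line.
  text = text.replace(/\r?\n`{3}$/, "");
  // Trim again in case the fence removal exposed blank space.
  return text.replace(/^\s+|\s+$/g, "");
}
```

If this helper were added to the Advanced Scripts, the Start Script could wrap the call, for example cleanOcrText(PromptImageGPT(prompt, Q_1)).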



 

5. Customizing the prompt

You can modify the code to suit your needs by changing the prompt:

  • OCR:
    var res = PromptImageGPT("The image contains text. Please perform OCR on the attached image and provide only the extracted text.", Q_1);
  • Describe image contents:
    var res = PromptImageGPT("Describe what you see in the image", Q_1);
  • Count chosen items:
    var chosenItemName = Answer(Q_ItemTypes);
    var prompt = stringFormat("The image contains items. Count the number of '{0}' in the image, and return only the number", chosenItemName);
    var res = PromptImageGPT(prompt, Q_1);
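
For the counting prompt, two small details are worth sketching: the item name ends up inside a JSON string that PromptImageGPT builds without escaping, and the reply arrives as text rather than as a number. The helpers below are hypothetical (buildCountPrompt and parseCount are illustration names, not SurveyToGo functions) and use plain concatenation in place of stringFormat.

```javascript
// Hypothetical helpers (not SurveyToGo functions) for the "count items" prompt.

// Builds the prompt by plain concatenation. Double quotes and backslashes are removed
// from the item name because the prompt is later placed inside a JSON string as-is.
function buildCountPrompt(itemName) {
  var safeName = String(itemName).replace(/["\\]/g, "");
  return "The image contains items. Count the number of '" + safeName +
         "' in the image, and return only the number";
}

// Converts the model's reply to a whole number; returns null when no digits are found.
function parseCount(reply) {
  var match = /\d+/.exec(String(reply));
  return match ? parseInt(match[0], 10) : null;
}
```

With these in the Advanced Scripts, a Start Script could call parseCount(PromptImageGPT(buildCountPrompt(chosenItemName), Q_1)) and treat a null result as "no usable answer".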

 

6. Code overview and explanation

6.1 The PromptImageGPT Function

This function handles the image and initiates the OCR process.

function PromptImageGPT(prompt, QuestionID) {
  var filename = GetAttachedFiles(QuestionID)[0];
  var fileBytes = ReadFileBytesSync(filename);
  var attachment64 = BytesToBase64(fileBytes);
  • Explanation:
    • GetAttachedFiles(QuestionID)[0]: Retrieves the first image file attached to the question identified by QuestionID.
    • ReadFileBytesSync(filename): Reads the image file bytes.
    • BytesToBase64(fileBytes): Converts the binary image data to a Base64-encoded string.
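
The snippet always labels the attachment as image/jpeg. If respondents may attach other formats, the MIME type could be derived from the file name instead. This is a sketch only: guessImageMimeType and buildImageDataUrl are hypothetical names, and it assumes the value returned by GetAttachedFiles is a file name or path that ends with the file extension.

```javascript
// Hypothetical helper (not a SurveyToGo function): picks a MIME type from the file extension.
function guessImageMimeType(filename) {
  var name = String(filename).toLowerCase();
  var dot = name.lastIndexOf(".");
  var ext = dot >= 0 ? name.substring(dot + 1) : "";
  if (ext === "png") { return "image/png"; }
  if (ext === "gif") { return "image/gif"; }
  if (ext === "webp") { return "image/webp"; }
  // .jpg, .jpeg and anything unrecognized fall back to the type the original snippet uses.
  return "image/jpeg";
}

// Builds the data URL that goes into the "image_url" field of the request body.
function buildImageDataUrl(filename, base64Data) {
  return "data:" + guessImageMimeType(filename) + ";base64," + base64Data;
}
```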

6.2 Constructing the API Request

We create the request body that includes the prompt and the Base64-encoded image.

var body = '{ \
    "model": "gpt-4o-mini", \
    "messages": [ \
        { \
            "role": "user", \
            "content": [ \
                { \
                    "type": "text", \
                    "text": "' + prompt + '" \
                }, \
                { \
                    "type": "image_url", \
                    "image_url": { \
                        "url": "data:image/jpeg;base64,' + attachment64 + '" \
                    } \
                } \
            ] \
        } \
    ] \
}';
  • Explanation:
    • Model: Specifies the AI model to use.
    • Messages: Contains the prompt and image data.
      • Content: The instruction for the AI.
      • Image URL: The Base64-encoded image embedded as a data URL.
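
Because the body is assembled by string concatenation, a prompt containing a double quote, a backslash, or a line break would produce invalid JSON. One possible alternative, sketched below, builds the same structure as an object and serializes it. buildRequestBody is a hypothetical name, and the sketch assumes the script engine provides the standard JSON.stringify function, which is not confirmed by this guide.

```javascript
// Hypothetical alternative to the hand-written JSON string.
// Assumes JSON.stringify exists in the script engine (standard since ECMAScript 5).
function buildRequestBody(prompt, attachment64) {
  var request = {
    model: "gpt-4o-mini",
    messages: [
      {
        role: "user",
        content: [
          // The text instruction; JSON.stringify escapes quotes and line breaks.
          { type: "text", text: prompt },
          // The image, embedded as a Base64 data URL.
          {
            type: "image_url",
            image_url: { url: "data:image/jpeg;base64," + attachment64 }
          }
        ]
      }
    ]
  };
  return JSON.stringify(request);
}
```

The returned string has the same fields as the hand-written body, so it could be passed to WebServicePostSync in its place.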

6.3 Sending the Request

We send the composed message to the OpenAI API.

var res = WebServicePostSync("https://api.openai.com/v1/chat/completions", body, options);
  • Explanation:
    • Sends a POST request to the API endpoint with the request body and the options (content type and Authorization header), and stores the response in res.

6.4 Processing the Response

When the API responds, we process the result to extract the text.

 
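if (!res.IsError && res.HttpStatusCode == 200)
{
  var jsonResult = ParseJson(res.Result);
  return jsonResult.choices[0].message.content;
}
else
{
  Prompt("failed " + res.Result);
  return res.Result;
}
  • Explanation:
    • res.IsError / res.HttpStatusCode: The reply is treated as successful only when no error occurred and the HTTP status code is 200.
    • ParseJson: Parses the JSON response from the API.
    • jsonResult.choices[0].message.content: Reads the content of the first reply message, which the function returns.
    • Prompt: On failure, displays the raw response so the problem is visible; the same raw response is then returned.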
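
The response handling can be made more defensive by checking the shape of the reply before reading it. The sketch below is an illustration under assumptions: readChatResponse is a hypothetical name, and it uses the standard JSON.parse so that it can run outside SurveyToGo, where ParseJson plays the same role. It relies on error replies from the API carrying an "error" object with a "message" field.

```javascript
// Hypothetical helper: turns the raw response text into { ok, text }.
function readChatResponse(responseText) {
  var json;
  try {
    json = JSON.parse(responseText);
  } catch (e) {
    return { ok: false, text: "Response was not valid JSON" };
  }
  if (json === null || typeof json !== "object") {
    return { ok: false, text: "Response was not a JSON object" };
  }
  // Error replies carry an "error" object with a human-readable "message".
  if (json.error && json.error.message) {
    return { ok: false, text: json.error.message };
  }
  // Successful replies carry the text in choices[0].message.content.
  var choices = json.choices;
  if (choices && choices.length > 0 && choices[0].message &&
      typeof choices[0].message.content === "string") {
    return { ok: true, text: choices[0].message.content };
  }
  return { ok: false, text: "Response did not contain a message" };
}
```

Returning an object with an ok flag lets the calling script decide whether to store the text in the survey or show an error, instead of receiving an error body where OCR text was expected.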

 

Final Notes

  • Testing: Try processing different images to see how the AI responds to various prompts.
  • Error Handling: Enhance the code with additional error checks as needed.
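
Two error checks that may be worth adding are sketched below: one for a question with no attachment, and one that turns common HTTP status codes into a readable hint. Both function names are hypothetical. The first assumes the value returned by GetAttachedFiles behaves like a JavaScript array with a length property, which this guide does not confirm.

```javascript
// Hypothetical guard (not a SurveyToGo function): returns the first attached file name,
// or null when the list is missing or empty.
function firstAttachmentOrNull(attachedFiles) {
  if (!attachedFiles || !attachedFiles.length) {
    return null;
  }
  var first = attachedFiles[0];
  return first ? first : null;
}

// Maps common HTTP status codes returned by the API to a short hint.
function describeHttpStatus(statusCode) {
  if (statusCode === 401) {
    return "The API key was rejected. Check the secret variable.";
  }
  if (statusCode === 429) {
    return "Rate limit or quota reached. Try again later.";
  }
  if (statusCode >= 500) {
    return "The service reported a server error. Try again later.";
  }
  return "The request failed with status " + statusCode + ".";
}
```

Inside PromptImageGPT, the guard could replace the direct GetAttachedFiles(QuestionID)[0] access: when it returns null, the function can show a message with Prompt and return an empty string rather than calling the API.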

By following this guide, you can adapt the code snippet to perform OCR tasks tailored to your specific requirements. Customize the prompt, handle different image types, and integrate the code into your applications seamlessly.


 

Additional Resources

  • OpenAI API Documentation: OpenAI Vision API Reference
  • Best Practices:
    • Securely manage your API keys.
    • Implement proper error handling.
    • Optimize code for asynchronous operations.
  • Minimum supported version:
    • Studio 1.32.673
    • Playstore Android 2.0.614