This guide will help you understand how to use a SurveyToGo code snippet to perform Optical Character Recognition (OCR) using the OpenAI API. We'll walk through the code, explain how it works, and show you how to adapt it to your needs.
An example image that was processed by the OCR method (populated to Q_1).
Overview
In this guide we will discuss how to create a survey that uses OpenAI integration with user attachments.
1. Setting up a survey
2. Copy code to Advanced Scripts
3. Setting Up the OpenAI API Key
4. Using the script with Prompt and Image
5. Customizing the prompt
6. Code overview and explanation
Final Notes
Step-by-Step Guide
1. Setting up a survey
Before implementing the OCR functionality, you need to set up your survey in SurveyToGo with the following three questions:
- First Question: Multimedia Question
  - Type: Multimedia Question
  - Purpose: To capture or upload the image that will be processed by the OCR.
  - Identifier: Assign a unique identifier (e.g., Q_1) to reference this question in your code.
- Second Question: Waiting screen - Empty Question
  - Type: Empty Question (shows a waiting message)
  - Purpose: To run the OCR logic and wait for the result from the OpenAI API.
  - Identifier: Assign a unique identifier (e.g., Q_2) to reference this question in your code.
- Third Question: Display Result
  - Type: Standard Question (e.g., Single Choice, Open-Ended)
  - Purpose: To display the OCR result or use it within the survey flow.
  - Identifier: Assign a unique identifier (e.g., Q_3) if you need to manipulate it further in your code.
Placeholder for Image: Survey structure
2. Copy code to Advanced Scripts
This code should be placed in the Advanced Scripts.
The Advanced Scripts hold the function definition and callback functions needed for the implementation.
3. Setting Up the OpenAI API Key
First, you need to set your OpenAI API key to authenticate your requests with OpenAI.
Instructions are available here.
Then insert the API key you created into the "secret" variable in the Advanced Scripts.
Note: Keep your API key secure. Do not share it publicly or include it in code repositories.
4. Using the script with Prompt and Image
The prompt instructs the AI on what you want it to do with the image.
Add this script to the second question's Start Script.
- "The image contains text. Please perform OCR on the attached image and provide only the extracted text.": This is the prompt sent to the AI.
- Q_1: This is the identifier for the image question or input in your system.
5. Customizing the prompt
You can modify the code to suit your needs by changing the prompt:
- OCR:
  PromptImageGPT("The image contains text. Please perform OCR on the attached image and provide only the extracted text.", Q_1);
- Describe image contents:
  PromptImageGPT("Describe what you see in the image", Q_1);
- Count chosen items:
  var chosenItemName = Answer(Q_ItemTypes);
  var prompt = stringFormat("The image contains items. Count the number of '{0}' in the image, and return only the number", chosenItemName);
  PromptImageGPT(prompt, Q_1);
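The placeholder substitution used in the "count chosen items" prompt can be tried outside SurveyToGo. The following is a minimal plain JavaScript sketch; `formatTemplate` and `buildCountPrompt` are hypothetical helpers written for this illustration and are not part of the SurveyToGo script.

```javascript
// Hypothetical helper: replaces {0}, {1}, ... with the matching argument.
// Placeholders without a matching argument are left unchanged.
function formatTemplate(template, ...args) {
  return template.replace(/\{(\d+)\}/g, function (match, digits) {
    const index = Number(digits);
    return index < args.length ? String(args[index]) : match;
  });
}

// Hypothetical helper: builds the counting prompt for one item name.
function buildCountPrompt(itemName) {
  return formatTemplate(
    "The image contains items. Count the number of '{0}' in the image, and return only the number",
    itemName
  );
}

console.log(buildCountPrompt("bottles"));
```

Asking the model to return only the number keeps the answer easy to store in a numeric question later in the survey.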
6. Code overview and explanation
6.1 The PromptImageGPT Function
This function handles the image and initiates the OCR process.
- Explanation:
  - Retrieves the image file associated with QuestionID.
  - Asynchronously reads the image file bytes.
  - Saves the prompt for the image.
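The pattern described here (save the prompt, start the file read, continue in a callback once the bytes arrive) can be sketched in plain JavaScript. Every name below is a hypothetical stand-in chosen for illustration; none of them is a SurveyToGo function, and the fake reader only imitates the file read.

```javascript
// Hypothetical stand-ins for illustration; none of these are SurveyToGo functions.
const saved = { prompt: null };

// Fake reader: hands back fixed bytes through a callback.
// It calls back immediately to keep the example simple; a real read finishes later.
function fakeReadImageBytes(questionId, onDone) {
  onDone([72, 105]);
}

// Saves the prompt, starts the read, and passes both to the next step.
function promptImage(prompt, questionId, readImageBytes, onBytesReady) {
  saved.prompt = prompt; // keep the prompt so the callback can use it
  readImageBytes(questionId, function (bytes) {
    onBytesReady(saved.prompt, bytes);
  });
}

let received = null;
promptImage("Describe what you see in the image", "Q_1", fakeReadImageBytes,
  function (prompt, bytes) {
    received = { prompt: prompt, byteCount: bytes.length };
  });
console.log(received);
```

Saving the prompt before the read starts matters because the callback runs later and needs a place to find it.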
6.2 Reading and Encoding the Image
After reading the image file bytes, we convert them to a Base64 string.
- Explanation:
  - BytesToBase64(fileBytes): Converts the binary image data to a Base64-encoded string.
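To see what this step produces, here is a small Node.js sketch. It uses Node's built-in `Buffer` in place of the script's BytesToBase64 helper, and `toDataUrl` is a hypothetical function; the `image/jpeg` MIME type is an assumption and should match the actual image format.

```javascript
// Node.js sketch: Buffer stands in for the script's BytesToBase64 helper.
function bytesToBase64(bytes) {
  return Buffer.from(bytes).toString("base64");
}

// Hypothetical helper: wraps the Base64 text in a data URL.
// mimeType is an assumption; use the type that matches the captured image.
function toDataUrl(bytes, mimeType) {
  return "data:" + mimeType + ";base64," + bytesToBase64(bytes);
}

const sampleBytes = [72, 105]; // the ASCII bytes for "Hi"
console.log(toDataUrl(sampleBytes, "image/jpeg")); // data:image/jpeg;base64,SGk=
```

Base64 output is roughly one third larger than the original bytes, so large photos produce large request bodies.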
6.3 Constructing the API Request
We create the request body that includes the prompt and the Base64-encoded image.
- Explanation:
  - Model: Specifies the AI model to use.
  - Messages: Contains the prompt and image data.
    - Content: The instruction for the AI.
    - Image URL: The Base64-encoded image embedded as a data URL.
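As a rough illustration of that structure, the sketch below builds a request body in the shape the OpenAI Chat Completions API accepts for image input. `buildRequestBody` is a hypothetical function, and the model name `gpt-4o-mini` is an assumption; the model used by the article's script may differ.

```javascript
// Hypothetical helper: builds a Chat Completions body with one text part
// and one image part in a single user message.
function buildRequestBody(prompt, imageDataUrl, model) {
  return {
    model: model,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          { type: "image_url", image_url: { url: imageDataUrl } }
        ]
      }
    ]
  };
}

const body = buildRequestBody(
  "Describe what you see in the image",
  "data:image/jpeg;base64,SGk=", // placeholder data URL, not a real image
  "gpt-4o-mini"                  // assumed model name
);
console.log(JSON.stringify(body, null, 2));
```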
6.4 Sending the Request
We send the composed message to the OpenAI API.
- Explanation:
  - Sends a POST request to the API endpoint with the request body and headers.
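The pieces of such a request can be sketched in plain JavaScript as follows. This assumes the Chat Completions endpoint and Bearer-token authentication; `buildRequest` and `sendRequest` are hypothetical names, and `sendRequest` relies on the global `fetch` available in Node.js 18 or later rather than on SurveyToGo's own web request functions.

```javascript
// Assumed endpoint: the OpenAI Chat Completions API.
const OPENAI_CHAT_URL = "https://api.openai.com/v1/chat/completions";

// Hypothetical helper: collects the URL, method, headers, and serialized body.
function buildRequest(apiKey, body) {
  return {
    url: OPENAI_CHAT_URL,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": "Bearer " + apiKey
    },
    body: JSON.stringify(body)
  };
}

// Hypothetical helper: sends the request and resolves with the raw response text.
// It is defined for reference and is not called in this example.
async function sendRequest(request) {
  const response = await fetch(request.url, {
    method: request.method,
    headers: request.headers,
    body: request.body
  });
  return response.text();
}

// "sk-your-key" is a placeholder, not a real key.
const request = buildRequest("sk-your-key", { model: "gpt-4o-mini", messages: [] });
console.log(request.method, request.url);
```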
6.5 Processing the Response
When the API responds, we process the result to extract the text.
- Explanation:
  - ParseJson: Parses the JSON response from the API.
  - Vars["response"]: Stores the extracted text.
  - SetAnswer (optional): Saves the response to the next question or variable in your system.
  - ExecutionMgr.GotoNext(): Proceeds to the next question.
  - Prompt: Displays the response or error message.
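The parsing step can be sketched in plain JavaScript. The example assumes the Chat Completions response shape, where the text is in `choices[0].message.content` and failures carry an `error.message` field; `extractText` is a hypothetical helper and `JSON.parse` stands in for the script's ParseJson.

```javascript
// Hypothetical helper: returns { ok, text } for a raw JSON response string.
function extractText(responseJson) {
  let parsed;
  try {
    parsed = JSON.parse(responseJson);
  } catch (e) {
    return { ok: false, text: "Could not parse the response: " + e.message };
  }
  // The API reports failures in an "error" object.
  if (parsed.error) {
    return { ok: false, text: "API error: " + parsed.error.message };
  }
  const choice = parsed.choices && parsed.choices[0];
  if (!choice || !choice.message || typeof choice.message.content !== "string") {
    return { ok: false, text: "The response did not contain any text." };
  }
  return { ok: true, text: choice.message.content.trim() };
}

// Sample payloads written for this example, not captured from the API.
const success = extractText('{"choices":[{"message":{"content":" INVOICE 1234 "}}]}');
const failure = extractText('{"error":{"message":"Incorrect API key provided"}}');
console.log(success, failure);
```

Returning a flag alongside the text lets the calling script decide whether to store the answer and move on, or to show the error message and stay on the waiting screen.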
Final Notes
- Testing: Try processing different images to see how the AI responds to various prompts.
- Error Handling: Enhance the code with additional error checks as needed.
By following this guide, you can adapt the code snippet to perform OCR tasks tailored to your specific requirements. Customize the prompt, handle different image types, and integrate the code into your applications seamlessly.
Additional Resources
- OpenAI API Documentation: OpenAI Vision API Reference
- Best Practices:
  - Securely manage your API keys.
  - Implement proper error handling.
  - Optimize code for asynchronous operations.