Image Recognition in Ruby
A few steps to help you get started with image recognition in Ruby
This blog post documents my journey through the Google Cloud Vision API documentation, which you can view here
Google Cloud Vision is an API that allows developers to understand the content of an image by applying Google's machine learning and artificial intelligence models. Here are some interesting use cases of image recognition.
Competitive Landscape
Vision API comparison chart, courtesy of Kairos
Google, Amazon, and Microsoft offer the most popular vision APIs. Although all three deliver similar end results, I chose to work with Google because it connects to the entire Google Cloud: once you have it set up, it gives you access to many different artificial intelligence capabilities. I also found their documentation easy to follow, so if you haven't read it yet, take a look and then come back to this article for my own personal experience.
You can get started for free with up to $300 in credit, and paid plans add tech support.
Many smaller companies offer compelling use cases as well.
Use Cases
Authentication and Emotion
Microsoft Project Oxford AI
Microsoft's Project Oxford offers facial recognition that can identify users in photos and let them log into their phones using authentication. It is also used by marketing companies as a tool to gauge success metrics.
A Self Driving Future
Self-driving car image classification
Self-driving cars perform something known as image classification: a model is trained to recognize various objects, such as other cars and pedestrians. Detected and classified images are passed to a neural network and localized, where the artificial intelligence connects the input with a valid suggestion or result.
Augmented Reality
Marker Based Augmented Reality
Marker-based augmented reality works off of triggers in your local environment. These markers, which rely on image recognition, can provide users with entire experiences based on their location and in-situ environment.
Getting Started
1. Create a new Google Cloud Project
a) Sign up quickly using your Gmail account.
b) On the dashboard tap the “My First Project” dropdown and select “New Project”
c) Enter your project name and hit create
d) You should see an empty dashboard
Google Cloud Platform Dashboard
2. Start Cloud Shell
To run the cloud environment you need a server. Instead of downloading one, for the purposes of this article you can use Google Cloud Shell, a command-line environment that runs in the cloud.
Run the following command to confirm that you are authenticated
gcloud auth list
Command Output
Credentialed accounts:
- <myaccount>@<mydomain>.com (active)
Run this code next
gcloud config list project
Command Output
project = <PROJECT_ID>
3. Enable the vision API
Run this in the cloud shell
gcloud services enable vision.googleapis.com
Command Output
4. Create a service account
In order to use the API, you need to create a service account to make requests. There are a few different ways to go about this step, and it is the most complex, but here’s what worked best for me.
- Go to the Service Account Key Page
- From the Service account list, select New service account.
- In the Service account name field, enter a name.
- From the Role list, select Project > Owner.
- Click Create. A JSON file that contains your key downloads to your computer.
- Remember the save location as you will need to access it in the next step
After creating and downloading your service account key as JSON, locate the file path in your terminal and copy it to your clipboard.
After confirming the file opens properly, set it to the proper keys by entering the following in your terminal:
export GOOGLE_APPLICATION_CREDENTIALS="/Users/AndyXcode/Downloads/Mod-3-Blogpost-fff77bdf1b10.json"
This sets your application credentials
export GOOGLE_CLOUD_PROJECT="<PROJECT_ID>"
This sets your cloud project. Note that GOOGLE_APPLICATION_CREDENTIALS points at the JSON key file, while GOOGLE_CLOUD_PROJECT should be set to your project ID, the same value printed earlier by gcloud config list project. Also note that if your key file name has any spaces, edit the file name before setting the credentials. Once you have this set up, the rest is pretty straightforward.
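Before moving on, it can save some debugging time to confirm the setup from Ruby itself. This is a small illustrative helper of my own (it is not part of Google's docs or the gem) that just checks whether the credentials variable points at a real file:

```ruby
# Illustrative sanity check: true only when GOOGLE_APPLICATION_CREDENTIALS
# is set and points at an existing key file.
def credentials_ready?
  path = ENV["GOOGLE_APPLICATION_CREDENTIALS"]
  !path.nil? && !path.empty? && File.exist?(path)
end

puts credentials_ready? ? "Credentials configured" : "Credentials missing"
```

If this prints "Credentials missing", re-run the export commands above in the same terminal session before starting IRB.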
5. Install the Cloud Vision Gem
gem install google-cloud-vision -v 0.27.0
6. Download the sample ruby repository
git clone https://github.com/GoogleCloudPlatform/ruby-docs-samples.git
cd ruby-docs-samples
git checkout "a902f30dd449ce82469cc315610c8a3d4888ff5a"
change directory into ruby-docs-samples/vision
cd vision
Run bundle install in your vision folder
7. Start IRB
irb --noecho
8. Sample an Image
Run the following code in your IRB
require "google/cloud/vision"
vision = Google::Cloud::Vision.new
Select an image
image = vision.image "images/otter_crossing.jpg"
9. Explore API
The Vision API offers many different ways to analyze an image, and Google also provides Ruby files with information on how to use each of its different features.
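Whichever feature you call, annotations come back with confidence scores, so a common post-processing step is filtering out low-confidence results. Here's an illustrative sketch; the hashes below are stand-ins I made up for the gem's annotation objects, not real API output:

```ruby
# Keep only confident annotations, highest score first.
def confident_descriptions(labels, threshold = 0.5)
  labels.select { |l| l[:score] >= threshold }
        .sort_by { |l| -l[:score] }
        .map { |l| l[:description] }
end

# Hypothetical label annotations for an otter photo.
labels = [
  { description: "Otter",  score: 0.97 },
  { description: "Mammal", score: 0.92 },
  { description: "Sign",   score: 0.41 }
]

puts confident_descriptions(labels).inspect  # => ["Otter", "Mammal"]
```

Tuning the threshold depends on your use case: a self-driving car wants high recall, while a photo-tagging app may prefer high precision.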
10. Perform image analysis
Here's the code Google provides for us to identify landmarks. It iterates over the landmarks detected in the image and puts each landmark's description, then puts the latitude and longitude of each of the landmark's locations.
image.landmarks.each do |landmark|
  puts landmark.description
  landmark.locations.each do |location|
    puts "#{location.latitude}, #{location.longitude}"
  end
end
Let’s run this image
A caption will be provided by Google… shortly
image = vision.image "images/imagine.png"
image.landmarks
=> [#<Google::Cloud::Vision::Annotation::Entity mid: "/m/013397", locale: "", description: "Strawberry Fields (memorial)", score: 0.416331022977829, confidence: 0.0, topicality: 0.0, bounds: 4, locations: 1, properties: {}>]
It successfully pulled the image description, and its location property:
Strawberry Fields (memorial)
40.775744, -73.975281
Let's paste that location into Google Maps
Strawberry Fields (memorial) 40.775744, -73.975281
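If you'd rather not paste coordinates by hand, you can build a Google Maps search link straight from the landmark's latitude and longitude, using the standard maps.google.com query-string format:

```ruby
# Build a Google Maps search URL from a landmark location's coordinates.
def maps_url(latitude, longitude)
  "https://www.google.com/maps/search/?api=1&query=#{latitude},#{longitude}"
end

puts maps_url(40.775744, -73.975281)
```

Opening the printed URL in a browser drops a pin on Strawberry Fields in Central Park.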
Let's try to perform text detection on that image as well and see if the API can pick up on the text "imagine".
puts image.text
close…
IMAG NE
=> nil
It's missing the 'i' in imagine.
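OCR output like this often mixes formatting noise (stray whitespace) with genuinely missed characters. A small normalization step of my own (illustrative, not part of the gem) shows the difference: stripping the space cleans up the string, but it can't recover the dropped 'i':

```ruby
# Uppercase and strip everything except letters and digits before
# comparing OCR output against the expected text.
def normalize_ocr(text)
  text.to_s.upcase.gsub(/[^A-Z0-9]/, "")
end

puts normalize_ocr("IMAG NE")               # "IMAGNE" -- space removed
puts normalize_ocr("IMAG NE") == "IMAGINE"  # false -- the 'i' is truly missing
```

For a more forgiving comparison you could go further with fuzzy matching, but normalization alone handles most spacing artifacts.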
Now for the final test
Take a screenshot of a student in the class and run emotion detection
image coming soon
JOY:
ANGER:
SORROW:
SURPRISE:
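The API reports each emotion as a likelihood enum ranging from VERY_UNLIKELY to VERY_LIKELY rather than a raw score. Here's an illustrative formatter that turns those values into the report above; the hash is a hypothetical stand-in for a face annotation's likelihood data, not the gem's actual object:

```ruby
# Format a face's emotion likelihoods into a simple report.
def emotion_report(face)
  %w[joy anger sorrow surprise].map do |emotion|
    "#{emotion.upcase}: #{face[emotion]}"
  end.join("\n")
end

# Hypothetical likelihoods for one detected face.
face = {
  "joy"      => "VERY_LIKELY",
  "anger"    => "VERY_UNLIKELY",
  "sorrow"   => "VERY_UNLIKELY",
  "surprise" => "UNLIKELY"
}

puts emotion_report(face)
```

From here you could aggregate reports across a whole classroom of faces to gauge overall engagement.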
As you can tell, once you have the available data from the image input, there are a lot of different directions you can take this app. The API offers unlimited possibilities. The next step for me would be to take the API out of IRB and bring it into a functional JavaScript application that I can work with. All of these possibilities make the Google Vision API one of the most exciting developments in the industry.