How to integrate Google cloud vision MCP with Codex

Codex is one of the most popular coding harnesses out there, and MCP makes the experience even better. With the Google cloud vision MCP integration, you can label images, run OCR, manage Product Search catalogs, and much more, all without leaving the terminal or the app, whichever you prefer.

Google cloud vision
API Key

Google Cloud Vision API adds advanced image analysis—like labeling, OCR, and detection—to apps. It helps you extract structured data and insights from images at scale.

29 Tools

Why use Composio?

Apart from a managed and hosted MCP server, you will get:

  • CodeAct: A dedicated workbench that lets the model write its own code to handle complex tool chaining, reducing the back-and-forth with the LLM for frequent tool calls.
  • Large tool response handling: Large responses are managed so they don't cause context rot.
  • Dynamic just-in-time access to 20,000+ tools across 1,000+ other apps for cross-app workflows. Only the tools a task needs are loaded, so the model isn't overwhelmed by tools it doesn't need.

How to install Google cloud vision MCP in Codex

Run the setup command

Run this command in your terminal to add the Composio MCP server to Codex.

Terminal

This starts authentication in a browser window, where you authorize Codex to access your Composio account.

(Optional) Authenticate with OAuth

To authenticate manually, run the login command to open a browser window and authorize Codex to access your Composio account.

bash
codex mcp login composio

Verify the connection

Run codex mcp list to confirm Composio appears as a registered MCP server.

bash
codex mcp list
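
For scripted setups, the same check can be automated. The following is a minimal sketch, assuming `codex` is on your PATH and that `codex mcp list` prints each registered server's name somewhere in its plain-text output. The exact output format is not described here, so the helper only searches for the name as a whole word; `server_listed` and `composio_registered` are hypothetical names used for illustration.

```python
import re
import subprocess


def server_listed(list_output: str, name: str = "composio") -> bool:
    """Return True if `name` appears as a whole word in the CLI output."""
    return re.search(rf"\b{re.escape(name)}\b", list_output) is not None


def composio_registered() -> bool:
    """Run `codex mcp list` and look for the Composio entry.

    Assumes the codex CLI is installed; raises CalledProcessError if it fails.
    """
    result = subprocess.run(
        ["codex", "mcp", "list"], capture_output=True, text=True, check=True
    )
    return server_listed(result.stdout)
```

Keeping the text check separate from the subprocess call makes it easy to adjust if the CLI output differs from this assumption.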

Codex App

Codex App follows the same approach as VS Code.

  1. Click ⚙️ on the bottom left → MCP Servers → + Add servers → Streamable HTTP.
  2. Set the header name to x-consumer-api-key and the Key field to your Composio API key (ck_*******).
  3. You can find the Composio API key at dashboard.composio.dev.
  4. Click Authenticate and authorize Codex to access your Composio account.
  5. Restart Codex and verify that the server appears in .codex/config.toml:
toml
[mcp_servers.composio]
url = "https://connect.composio.dev/mcp"
http_headers = { "x-consumer-api-key" = "ck_*******" }

What is the Google cloud vision MCP server, and what's possible with it?

The Google cloud vision MCP server is an implementation of the Model Context Protocol that connects AI agents and assistants such as Claude and Cursor directly to your Google Cloud Vision account. It provides structured and secure access to your image analysis resources, so your agent can perform actions like registering products, managing reference images, listing endpoints, and automating large-scale image operations on your behalf.

  • Product and reference image management: Easily create new products and add reference images for visual search, enabling your agent to organize and expand your vision datasets effortlessly.
  • Bulk import and product set operations: Let your agent import large numbers of reference images into product sets from Cloud Storage CSV files, streamlining dataset curation at scale.
  • Automated product cleanup and deletion: Direct your agent to purge unused or orphan products from your project, keeping your cloud resources tidy without manual effort.
  • Location and endpoint discovery: Quickly list available Vision AI service locations and existing IndexEndpoints, making it easy for your agent to select optimal regions and manage deployment targets.
  • Vision API operation tracking: Retrieve and review ongoing or past Vision API operations, so your agent can monitor processing jobs and ensure workflow transparency.

Conclusion

You've successfully integrated Google cloud vision with Codex using Composio's MCP server. Now you can interact with Google cloud vision directly from your terminal, VS Code, or the Codex App using natural language commands.

Key benefits of this setup:

  • Seamless integration across CLI, VS Code, and standalone app
  • Natural language commands for Google cloud vision operations
  • Managed authentication through Composio
  • Access to 20,000+ tools across 1000+ apps for cross-app workflows
  • CodeAct workbench for complex tool chaining

Next steps:

  • Try asking Codex to perform various Google cloud vision operations
  • Explore cross-app workflows by connecting more toolkits
  • Build automation scripts that leverage Codex's AI capabilities

Supported Tools

Every Google cloud vision action and event your agent gets out of the box.

Annotate Files with Vision API

Tool to perform image detection and annotation for batch files in Google Cloud Vision.

Async Batch Annotate Files

Tool to run asynchronous image detection and annotation for a list of generic files (PDF, TIFF, GIF).

Annotate Images

Run image detection and annotation for a batch of images using Google Cloud Vision API.
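
For orientation, here is roughly what a batch annotation request looks like at the REST level. The input schema of the MCP tool itself is not shown on this page, so this sketch builds the request body of the public Cloud Vision `images:annotate` method instead; treat the mapping between the tool and this body as an assumption. `build_annotate_request` is a hypothetical helper.

```python
import base64


def build_annotate_request(
    image_bytes: bytes, feature_types: list[str], max_results: int = 10
) -> dict:
    """Build an images:annotate body for one inline (base64-encoded) image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "requests": [
            {
                "image": {"content": encoded},
                # One feature entry per detection type, e.g. LABEL_DETECTION.
                "features": [
                    {"type": feature, "maxResults": max_results}
                    for feature in feature_types
                ],
            }
        ]
    }
```

Passing `["LABEL_DETECTION", "TEXT_DETECTION"]` requests labeling and OCR for the same image in a single call.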

Annotate Images Async Batch

Tool to run asynchronous image detection and annotation for a batch of images.

Annotate Location Images

Tool to run image detection and annotation for a batch of images scoped to a specific project and location.

Create Vision Product

Creates a new Product resource in Google Cloud Vision Product Search.

Create Product Set

Creates a new ProductSet resource in Google Cloud Vision Product Search.

Create ReferenceImage

Tool to create a ReferenceImage under a product.

Delete Product

Permanently deletes a Product and its associated reference images from Google Cloud Vision API.

Get Product

Tool to get information associated with a Product.

Get Product Set

Tool to get a ProductSet.

Import Product Sets

Asynchronously imports product sets and reference images from a CSV file stored in Google Cloud Storage.
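
The import CSV can be generated with a short script before uploading it to Cloud Storage. The sketch below assumes the Product Search bulk-import column order (image URI, image ID, product set ID, product ID, product category, display name, labels, bounding poly) with no header row; verify this against Google's current documentation before relying on it. The bucket, IDs, and the `product_search_csv` helper are hypothetical.

```python
import csv
import io


def product_search_csv(rows: list[dict]) -> str:
    """Render rows in the assumed Product Search bulk-import column order."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        # Labels are written as "key=value" pairs joined by commas in one field.
        labels = ",".join(f"{k}={v}" for k, v in row.get("labels", {}).items())
        writer.writerow([
            row["image_uri"],
            row.get("image_id", ""),
            row["product_set_id"],
            row["product_id"],
            row["product_category"],
            row.get("display_name", ""),
            labels,
            "",  # bounding poly left empty in this sketch
        ])
    return buffer.getvalue()
```

The `csv` module quotes the labels field automatically when it contains commas, so multi-label rows stay aligned to eight columns.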

List Vision AI IndexEndpoints

Lists IndexEndpoints in Vertex AI Vision for a given project and location.

List Locations

Tool to list available Vision AI service locations for a project.

List Vision API Operations

Tool to list operations that match the specified filter.

Purge Products

Tool to asynchronously delete products in a ProductSet or orphan products.

Update Product

Tool to update a Product's mutable fields: displayName, description, and productLabels.

Update Product Set

Tool to update a ProductSet resource.

Add Product to ProductSet

Add a Product to a ProductSet in Google Cloud Vision Product Search.

Cancel Vision Operation

Starts asynchronous cancellation of a long-running Vision API operation.

Delete Vision API Operation

Tool to delete a long-running Vision API operation.

Delete Product Set

Tool to permanently delete a ProductSet.

Delete Reference Image

Permanently removes a reference image from a product in Google Cloud Vision Product Search.

Get Vision API Operation

Retrieves the latest state of a long-running Vision API operation.
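
Long-running operations are usually polled until they report completion. The sketch below shows one way to do that, assuming the operation is returned as a dictionary with the standard `done`, `error`, and `response` fields of a Google long-running operation. `get_operation` is a hypothetical callable standing in for whatever fetches the latest state (for example, a call to this tool).

```python
import time
from typing import Callable


def wait_for_operation(
    get_operation: Callable[[str], dict],
    name: str,
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
) -> dict:
    """Poll until the operation is done; return its response or raise."""
    deadline = time.monotonic() + timeout_s
    while True:
        operation = get_operation(name)
        if operation.get("done"):
            if "error" in operation:
                raise RuntimeError(f"Operation {name} failed: {operation['error']}")
            return operation.get("response", {})
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Operation {name} did not finish in {timeout_s}s")
        time.sleep(interval_s)
```

A fixed interval keeps the example short; exponential backoff is a common refinement for jobs that run for many minutes.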

Get Reference Image

Tool to get information associated with a ReferenceImage.

List Products in ProductSet

Tool to list Products in a specified ProductSet.

List Projects

List Google Cloud projects accessible to the authenticated user via Cloud Resource Manager API.

List Reference Images

Tool to list reference images for a product.

Remove Product from ProductSet

Removes a Product from a specified ProductSet in Google Cloud Vision API.

FAQ

Frequently asked questions

With a standalone Google cloud vision MCP server, the agents and LLMs can only access a fixed set of Google cloud vision tools tied to that server. However, with the Composio Tool Router, agents can dynamically load tools from Google cloud vision and many other apps based on the task at hand, all through a single MCP endpoint.

Yes, you can. Codex fully supports MCP integration. You get structured tool calling, message history handling, and model orchestration while Tool Router takes care of discovering and serving the right Google cloud vision tools.

Yes, absolutely. You can configure which Google cloud vision scopes and actions are allowed when connecting your account to Composio. You can also bring your own OAuth credentials or API configuration so you keep full control over what the agent can do.

All sensitive data such as tokens, keys, and configuration is fully encrypted at rest and in transit. Composio is SOC 2 Type 2 compliant and follows strict security practices so your Google cloud vision data and credentials are handled as safely as possible.
