Course Outline

Introduction to Multimodal LLMs in Vertex AI

  • Overview of multimodal capabilities within Vertex AI.
  • Gemini models and supported modalities.
  • Enterprise and research use cases.

Setting Up the Development Environment

  • Configuring Vertex AI for multimodal workflows.
  • Managing datasets across different modalities.
  • Hands-on lab: Environment setup and dataset preparation.
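The dataset-preparation lab above can be previewed with a minimal sketch: grouping staged files by modality so each can be routed to the right preprocessing step. The extension-to-modality map and the bucket paths are illustrative assumptions, not part of any Vertex AI API.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical helper: map file extensions to modalities so a mixed dataset
# can be split into per-modality groups before preprocessing.
MODALITY_BY_EXT = {
    ".txt": "text", ".md": "text",
    ".wav": "audio", ".mp3": "audio",
    ".png": "image", ".jpg": "image",
}

def build_manifest(uris):
    """Return a {modality: [uri, ...]} manifest for a list of file URIs."""
    manifest = defaultdict(list)
    for uri in uris:
        ext = PurePosixPath(uri).suffix.lower()
        manifest[MODALITY_BY_EXT.get(ext, "unknown")].append(uri)
    return dict(manifest)

# Example: mixed-modality files staged in a (hypothetical) Cloud Storage bucket.
files = [
    "gs://demo-bucket/notes/report.txt",
    "gs://demo-bucket/audio/call.wav",
    "gs://demo-bucket/images/chart.png",
]
print(build_manifest(files))
```

In the lab, such a manifest would then drive per-modality upload and preprocessing steps.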

Long Context Windows and Advanced Reasoning

  • Understanding long-context workflows.
  • Applications in planning and decision-making.
  • Hands-on lab: Implementing long-context analysis.
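One recurring long-context task is deciding how much material fits in a single request. The sketch below packs documents greedily against a token budget; the 4-characters-per-token heuristic and the 1M-token window are rough assumptions, not exact Gemini tokenizer behavior.

```python
CONTEXT_WINDOW_TOKENS = 1_000_000  # illustrative long-context budget

def estimate_tokens(text: str) -> int:
    """Crude heuristic: roughly 4 characters per token."""
    return max(1, len(text) // 4)

def pack_documents(docs, budget=CONTEXT_WINDOW_TOKENS):
    """Greedily keep documents, in order, until the token budget is spent."""
    selected, used = [], 0
    for doc in docs:
        cost = estimate_tokens(doc)
        if used + cost > budget:
            break
        selected.append(doc)
        used += cost
    return selected, used
```

A production workflow would replace the heuristic with the model's own token-counting endpoint before submitting the packed context.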

Cross-Modal Workflow Design

  • Combining text, audio, and image analysis.
  • Chaining multimodal steps within pipelines.
  • Hands-on lab: Designing a multimodal pipeline.
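The chaining idea above can be sketched as a shared context dict passed through a list of step functions, so audio and image results feed later text steps. The step bodies here are stubs standing in for real model calls; all names are hypothetical.

```python
# Each step takes and returns a shared context dict, so outputs from one
# modality become inputs to later steps.
def transcribe_audio(ctx):
    ctx["transcript"] = f"transcript of {ctx['audio_uri']}"  # stub model call
    return ctx

def describe_image(ctx):
    ctx["caption"] = f"caption of {ctx['image_uri']}"  # stub model call
    return ctx

def summarize(ctx):
    ctx["summary"] = f"{ctx['transcript']} + {ctx['caption']}"  # stub model call
    return ctx

def run_pipeline(steps, ctx):
    for step in steps:
        ctx = step(ctx)
    return ctx

result = run_pipeline(
    [transcribe_audio, describe_image, summarize],
    {"audio_uri": "gs://demo/call.wav", "image_uri": "gs://demo/chart.png"},
)
```

Keeping every step to the same dict-in, dict-out signature makes it easy to reorder, insert, or skip modalities in the lab.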

Working with Gemini API Parameters

  • Configuring multimodal inputs and outputs.
  • Optimizing inference and efficiency.
  • Hands-on lab: Tuning Gemini API parameters.
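As a preview of the tuning lab, the sketch below builds a plain dict of generation parameters. The names follow common Gemini generation settings (temperature, top_p, max_output_tokens); treat the exact values as starting points to experiment with, not recommendations.

```python
def build_generation_config(creative: bool = False) -> dict:
    """Return generation parameters, trading determinism for diversity."""
    return {
        "temperature": 0.9 if creative else 0.2,  # higher = more varied output
        "top_p": 0.95,                            # nucleus-sampling cutoff
        "max_output_tokens": 1024,                # cap on response length
    }
```

In the lab, a config like this would be passed alongside the multimodal inputs when issuing a generation request, and the parameters varied to observe their effect on output quality and cost.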

Advanced Applications and Integrations

  • Interactive multimodal agents and assistants.
  • Integrating external APIs and tools.
  • Hands-on lab: Building a multimodal application.
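Tool integration can be previewed with a toy dispatcher: the "model decision" is stubbed as a keyword router, where a real application would use the model's function-calling response to pick the tool. The tool names and bodies are hypothetical stubs for external APIs.

```python
def get_weather(city: str) -> str:
    return f"sunny in {city}"  # stub for an external weather API

def search_docs(query: str) -> str:
    return f"top result for '{query}'"  # stub for a retrieval service

TOOLS = {"get_weather": get_weather, "search_docs": search_docs}

def route(user_message: str):
    """Stand-in for a model tool-choice step: pick a tool and its argument."""
    if "weather" in user_message.lower():
        return "get_weather", user_message.split()[-1]
    return "search_docs", user_message

def handle(user_message: str) -> str:
    tool_name, arg = route(user_message)
    return TOOLS[tool_name](arg)
```

Registering tools in a name-to-function map mirrors how function-calling results (a tool name plus arguments) are typically dispatched in an assistant loop.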

Evaluation and Iteration

  • Testing multimodal performance.
  • Metrics for accuracy, alignment, and drift.
  • Hands-on lab: Evaluating multimodal workflows.
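The evaluation lab can be sketched with two simple measures: exact-match accuracy over labeled examples, and a naive drift flag comparing accuracy between two runs. Real multimodal evaluation would add alignment and modality-specific metrics; the tolerance value is an illustrative assumption.

```python
def accuracy(predictions, references):
    """Exact-match accuracy over paired predictions and references."""
    assert len(predictions) == len(references)
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)

def drifted(old_accuracy: float, new_accuracy: float, tolerance: float = 0.05):
    """Flag drift when accuracy drops by more than the tolerance."""
    return (old_accuracy - new_accuracy) > tolerance
```

Tracking a metric like this across evaluation runs is what turns a one-off test into an iteration loop.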

Summary and Next Steps

Requirements

  • Proficiency in Python programming.
  • Experience with machine learning model development.
  • Familiarity with multimodal data types (text, audio, image).

Target Audience

  • AI researchers.
  • Advanced developers.
  • ML scientists.
Duration

14 Hours
