Job Description

AI Data Architect
Posting Start Date: 19/05/2026
Country/Region: Singapore
Work Location: Singapore Great World City
Business/Function: IT

Job Summary

We are seeking a passionate AI Data Architect to design and build the data foundation that makes AI work well across Kuok Group. Specifically, this means the hybrid vector and knowledge graph layer (the enterprise semantic memory) that underpins RAG-based use cases across every business unit, as well as the embedding pipelines, ingestion workflows, and data schemas that keep it accurate and current.

This is an architecture-first role. The successful candidate will make structural decisions about how the Group’s unstructured data is organised, retrieved, and made useful to AI systems — working closely with the Head, AI Platform on technical direction and with the Applied AI Engineers who depend on the data layer to build reliable, high-quality solutions.

The role sits at an exciting intersection of data engineering and AI infrastructure. Those who bring a strong data engineering background and a genuine curiosity about vector databases, knowledge graphs, and retrieval design will find a lot to build here.

Key Responsibilities

AI Data Foundation Architecture

  • Design and own the hybrid vector and knowledge graph layer that underpins RAG across all Kuok Group BUs — the enterprise semantic memory that AI use cases draw on
  • Make structural decisions on how unstructured data is organised for retrieval: chunking strategies, embedding approaches, metadata schemas, and knowledge graph ontologies
  • Work with the Head, AI Platform to align data foundation design with the broader AI platform architecture and the requirements of active use cases
  • Document architectural decisions clearly — capturing both the reasoning and the outcomes — so the wider team can work with confidence

Embedding Pipelines & Vector Infrastructure

  • Build and maintain embedding pipelines: document ingestion, chunking, embedding model selection, and vector DB write workflows
  • Own the vector database layer (Pinecone, Weaviate, or equivalent) — index management, refresh cadence, performance tuning, and cost management
  • Design retrieval patterns that serve the needs of applied use cases: similarity search, hybrid search, re-ranking, and metadata filtering
  • Ensure embedding pipelines are monitored, versioned, and recoverable — data foundation reliability is as important as application reliability

Knowledge Graph Design & Ingestion

  • Design the knowledge graph layer (Neo4j or equivalent) — ontology modelling, entity and relationship schema, and ingestion workflows from source systems
  • Work with domain experts across BUs to ensure the knowledge graph accurately reflects the entities, relationships, and terminology that matter in each business context
  • Build and maintain ETL pipelines that keep the knowledge graph current as source data changes
  • Shape the Group's knowledge graph capability, which is being built from the ground up, and set the direction for how it scales

Data Quality & Governance

  • Establish data quality standards for AI-ingested content — source freshness, deduplication, completeness checks, and validation pipelines
  • Work with BU Domain Data Stewards to validate that domain-specific data is accurate before it enters the AI data layer
  • Maintain clear data lineage across the AI data foundation — what source data feeds which index or graph, and when it was last refreshed
  • Partner with the AI Governance & Compliance Lead on data privacy requirements for AI-ingested content, particularly across BUs with sensitive operational data

Collaboration & Standards

  • Partner with Applied AI Engineers to understand the retrieval requirements of each use case and ensure the data foundation is designed to support them well
  • Work with the Lead Data Engineer (supporting functions) on the handoff boundary between structured data / BI pipelines and the AI data layer
  • Maintain documentation of the AI data foundation — schemas, pipeline specs, refresh schedules, and known limitations — so the team can work with the data layer confidently
  • Contribute to the broader AI Platform cluster's engineering standards and participate in code and design reviews

Key Requirements

  • Solid data engineering foundations — you have designed and built ETL / ELT pipelines at production scale, managed data quality, and worked with structured and semi-structured data in cloud environments
  • Hands-on experience with vector databases — you have built embedding pipelines, managed indexes, and designed retrieval patterns for RAG or semantic search applications
  • Understanding of RAG architecture from the data side: chunking strategies, embedding model selection, retrieval optimisation, and the effect of data quality on AI output quality
  • Experience designing schemas and data models for AI systems — with a strong appreciation for how data structure shapes retrieval quality and downstream AI output
  • Strong Python skills and comfort with the data engineering tooling ecosystem: pipeline orchestration, data validation, and working with cloud storage and databases
  • Clear, structured communication skills — you can explain data architecture decisions to both technical peers and non-technical stakeholders

Strong Advantage

  • Experience with knowledge graph design and ontology modelling — Neo4j or equivalent, including schema design, Cypher querying, and ETL into graph structures
  • Familiarity with enterprise data environments: federated data sources, multiple business domains, and working across teams with different data ownership models
  • Experience working closely with applied engineering teams as a data infrastructure provider — you understand how the data layer choices you make affect what engineers can build
  • Exposure to data governance and compliance requirements in an AI context: data lineage, PII handling, retention policies, and working with compliance stakeholders
  • Background in unstructured data processing — document parsing, OCR, text extraction, or working with content repositories as AI data sources

Education

Bachelor's degree in Computer Engineering or Computer Science

Certifications