
Job Summary
We are seeking a passionate AI Data Architect to design and build the data foundation that makes AI work well across Kuok Group. Specifically, this means the hybrid vector and knowledge graph layer (the enterprise semantic memory) that underpins RAG-based use cases across every business unit (BU), together with the embedding pipelines, ingestion workflows, and data schemas that keep it accurate and current.
This is an architecture-first role. The successful candidate will make structural decisions about how the Group’s unstructured data is organised, retrieved, and made useful to AI systems — working closely with the Head, AI Platform on technical direction and with the Applied AI Engineers who depend on the data layer to build reliable, high-quality solutions.
The role sits at the intersection of data engineering and AI infrastructure. Candidates who bring a strong data engineering background and a genuine curiosity about vector databases, knowledge graphs, and retrieval design will find a lot to build here.
Key Responsibilities
AI Data Foundation Architecture
- Design and own the hybrid vector and knowledge graph layer that underpins RAG across all Kuok Group BUs — the enterprise semantic memory that AI use cases draw on
- Make structural decisions on how unstructured data is organised for retrieval: chunking strategies, embedding approaches, metadata schemas, and knowledge graph ontologies
- Work with the Head, AI Platform to align data foundation design with the broader AI platform architecture and the requirements of active use cases
- Document architectural decisions clearly — capturing both the reasoning and the outcomes — so the wider team can work with confidence
Embedding Pipelines & Vector Infrastructure
- Build and maintain embedding pipelines: document ingestion, chunking, embedding model selection, and vector DB write workflows
- Own the vector database layer (Pinecone, Weaviate, or equivalent) — index management, refresh cadence, performance tuning, and cost management
- Design retrieval patterns that serve the needs of applied use cases: similarity search, hybrid search, re-ranking, and metadata filtering
- Ensure embedding pipelines are monitored, versioned, and recoverable — data foundation reliability is as important as application reliability
Knowledge Graph Design & Ingestion
- Design the knowledge graph layer (Neo4j or equivalent) — ontology modelling, entity and relationship schema, and ingestion workflows from source systems
- Work with domain experts across BUs to ensure the knowledge graph accurately reflects the entities, relationships, and terminology that matter in each business context
- Build and maintain ETL pipelines that keep the knowledge graph current as source data changes
- Shape how the Group’s knowledge graph capability develops and set the direction for how it scales, as this capability is being built from the ground up
Data Quality & Governance
- Establish data quality standards for AI-ingested content — source freshness, deduplication, completeness checks, and validation pipelines
- Work with BU Domain Data Stewards to validate that domain-specific data is accurate before it enters the AI data layer
- Maintain clear data lineage across the AI data foundation — what source data feeds which index or graph, and when it was last refreshed
- Partner with the AI Governance & Compliance Lead on data privacy requirements for AI-ingested content, particularly across BUs with sensitive operational data
Collaboration & Standards
- Partner with Applied AI Engineers to understand the retrieval requirements of each use case and ensure the data foundation is designed to support them well
- Work with the Lead Data Engineer (supporting functions) on the handoff boundary between structured data / BI pipelines and the AI data layer
- Maintain documentation of the AI data foundation — schemas, pipeline specs, refresh schedules, and known limitations — so the team can work with the data layer confidently
- Contribute to the broader AI Platform cluster's engineering standards and participate in code and design reviews
Key Requirements
- Solid data engineering foundations — you have designed and built ETL / ELT pipelines at production scale, managed data quality, and worked with structured and semi-structured data in cloud environments
- Hands-on experience with vector databases — you have built embedding pipelines, managed indexes, and designed retrieval patterns for RAG or semantic search applications
- Understanding of RAG architecture from the data side: chunking strategies, embedding model selection, retrieval optimisation, and the effect of data quality on AI output quality
- Experience designing schemas and data models for AI systems — with a strong appreciation for how data structure shapes retrieval quality and downstream AI output
- Strong Python skills and comfort with the data engineering tooling ecosystem: pipeline orchestration, data validation, and working with cloud storage and databases
- Clear, structured communication skills — you can explain data architecture decisions to both technical peers and non-technical stakeholders
Strong Advantage
- Experience with knowledge graph design and ontology modelling — Neo4j or equivalent, including schema design, Cypher querying, and ETL into graph structures
- Familiarity with enterprise data environments: federated data sources, multiple business domains, and working across teams with different data ownership models
- Experience working closely with applied engineering teams as a data infrastructure provider — you understand how the data layer choices you make affect what engineers can build
- Exposure to data governance and compliance requirements in an AI context: data lineage, PII handling, retention policies, and working with compliance stakeholders
- Background in unstructured data processing — document parsing, OCR, text extraction, or working with content repositories as AI data sources