MAIA Tables Reference

A technical explanation of how MAIA handles tables in uploaded documents and in documents from connected integrations.

Overview

MAIA processes tables differently based on your document analysis settings and file format. This reference explains how table detection, storage, and retrieval work.

File Format Support

MAIA supports tables in all available file formats:

  • PDF files: Full table detection including scanned/image-based tables

  • DOCX, PPTX files: Native table structure recognition

  • TXT files: Markdown table format supported

  • CSV files: Available on request (contact your Account Executive)
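
For TXT files, "Markdown table format" means a pipe-delimited header row followed by a delimiter row of dashes. The sketch below shows one way to spot such tables in plain text. It is an illustration only: the function name find_markdown_tables and the three-dash delimiter rule are assumptions, not MAIA's actual parser.

```python
import re

# Matches a delimiter row such as "| --- | :---: |".
# Assumption: at least three dashes per column (a simplification).
DELIMITER_ROW = re.compile(
    r"^\s*\|?\s*:?-{3,}:?\s*(\|\s*:?-{3,}:?\s*)*\|?\s*$"
)


def find_markdown_tables(text: str) -> list[list[str]]:
    """Return each Markdown table found in `text` as a list of its raw lines."""
    lines = text.splitlines()
    tables = []
    i = 0
    while i < len(lines) - 1:
        # A table starts with a header row containing "|" and a delimiter row.
        if "|" in lines[i] and DELIMITER_ROW.match(lines[i + 1]):
            start = i
            i += 2
            # Body rows continue until the first line without a pipe.
            while i < len(lines) and "|" in lines[i]:
                i += 1
            tables.append(lines[start:i])
        else:
            i += 1
    return tables


sample = """Quarterly results

| Region | Revenue |
| ------ | ------- |
| EMEA   | 120     |
| APAC   | 95      |

Closing remarks.
"""

tables = find_markdown_tables(sample)
print(len(tables), "table(s) found")
```

Text that is laid out in columns with spaces or tabs but has no delimiter row would not be picked up by a check like this one.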

Analysis Modes

Premium Mode (Advanced Analytics)

  • Table Detection: Uses document AI that "sees" page layout

  • Capabilities: Detects tables in scanned PDFs, preserves row/column structure, captures on-page position

  • Storage: Each table saved as structured item with Markdown formatting

Standard Mode

  • Table Detection: Text-only processing

  • Limitations: Tables not recognized as structured data, scanned tables missed entirely

How Tables Are Processed

Detection & Storage

  1. Each detected table becomes a single table item with metadata (row/column counts, page position), so MAIA keeps a view of the table as a whole.

  2. Content is stored as a Markdown table string for rendering and search. Everything is flattened to plain text, so multi-dimensional tables cannot be represented.

  3. Individual cells are stored separately, but search runs against the Markdown text.

  4. Text inside tables not duplicated as regular paragraphs.

  5. Table captions (e.g., "Table 4.1") are stored separately from table content. The same applies to other objects, text, and paragraphs surrounding the table.
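
The storage model described in the list above can be pictured with a small data structure. This is a minimal sketch under stated assumptions: the class names, field names, and the build_table_item helper are hypothetical and do not reflect MAIA's internal schema.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    row: int
    col: int
    text: str


@dataclass
class TableItem:
    # All field names are hypothetical; they mirror the list above.
    table_id: str
    page: int
    n_rows: int      # counts the header row as well
    n_cols: int
    markdown: str    # flattened text used for rendering and search
    cells: list[Cell] = field(default_factory=list)  # kept alongside the Markdown


def to_markdown(rows: list[list[str]]) -> str:
    """Flatten a header row plus body rows into one Markdown table string."""
    header, *body = rows
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(r) + " |" for r in body]
    return "\n".join(lines)


def build_table_item(table_id: str, page: int, rows: list[list[str]]) -> TableItem:
    cells = [
        Cell(r, c, text)
        for r, row in enumerate(rows)
        for c, text in enumerate(row)
    ]
    return TableItem(table_id, page, len(rows), len(rows[0]), to_markdown(rows), cells)


rows = [["Region", "Revenue"], ["EMEA", "120"], ["APAC", "95"]]
item = build_table_item("tbl-1", page=4, rows=rows)

# The caption is its own item and is not part of the table's Markdown.
caption = {"text": "Table 4.1", "page": 4}

print(item.markdown)
```

Because the caption lives outside the table item in this model, a search for "Table 4.1" would match the caption rather than the table's Markdown text.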

Embedding Behavior

  • Tables are split into token-sized pieces so that search scales. This lets MAIA retrieve only the relevant portion of a table when appropriate.

  • All pieces reference the same source table item.

  • Multi-page tables may appear as separate tables per page segment.
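
The splitting behavior above can be sketched as row-wise chunking under a token budget. Several details here are assumptions made for illustration: tokens are approximated by whitespace-separated words, the header is repeated in every piece, and the names TableChunk, chunk_table, and max_tokens are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TableChunk:
    table_id: str  # every chunk points back to the same source table item
    text: str


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: whitespace-separated words.
    return len(text.split())


def chunk_table(table_id: str, markdown: str, max_tokens: int) -> list[TableChunk]:
    """Split a Markdown table into row groups that fit within `max_tokens`."""
    header, delimiter, *body = markdown.splitlines()
    prefix = [header, delimiter]  # repeated in each chunk (an assumption)
    chunks, current = [], []
    for row in body:
        candidate = "\n".join(prefix + current + [row])
        # Close the current chunk when adding this row would exceed the budget.
        if current and count_tokens(candidate) > max_tokens:
            chunks.append(TableChunk(table_id, "\n".join(prefix + current)))
            current = []
        current.append(row)
    if current:
        chunks.append(TableChunk(table_id, "\n".join(prefix + current)))
    return chunks


markdown = "\n".join([
    "| Region | Revenue |",
    "| --- | --- |",
    "| EMEA | 120 |",
    "| APAC | 95 |",
    "| AMER | 140 |",
])

chunks = chunk_table("tbl-1", markdown, max_tokens=20)
for chunk in chunks:
    print(chunk.table_id)
    print(chunk.text)
```

Repeating the header keeps each piece readable on its own, at the cost of a few extra tokens per piece. Whether MAIA does this is not stated in this reference.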