Index Management

Index management deserves particular care in SAP HANA because of its architecture as an in-memory, columnar database. "Indexes" in HANA work differently from those in traditional disk-based row-store databases.

Index Management in SAP HANA

In traditional relational databases, indexes are separate physical structures used to speed up data retrieval by providing fast lookup paths to data rows on disk. In SAP HANA, which primarily stores data in-memory and in a columnar format, the "index" concept is intrinsically tied to the column store itself.

Fundamental Principles in HANA

Column Store's Intrinsic Indexing:
- Value IDs and Row Positions: Every column in a column-store table is dictionary-encoded. Distinct values are stored once in a dictionary, and each row holds a compact integer "value ID" in an attribute vector. An optional inverted index additionally maps each value ID to the row positions where it occurs. This structure is the primary form of indexing in HANA.
- When you query a column, HANA doesn't scan the raw values. It looks up the value in the dictionary and scans the highly compressed value IDs, which is extremely fast. Once the matching rows are found, it uses their row positions to locate the corresponding entries in other columns.
- This intrinsic indexing means that filtering on a column in a WHERE clause is often very efficient without a separate secondary index.
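The dictionary-plus-value-ID idea described above can be sketched in a few lines of Python. This is a conceptual illustration only, not HANA's internal implementation; the function names and the sample `city`/`amount` columns are hypothetical.

```python
# Conceptual sketch of dictionary encoding; not HANA's actual internals.
from bisect import bisect_left


def encode_column(values):
    """Return (dictionary, attribute_vector) for a list of column values."""
    dictionary = sorted(set(values))                  # each distinct value stored once
    value_id = {v: i for i, v in enumerate(dictionary)}
    attribute_vector = [value_id[v] for v in values]  # one integer value ID per row
    return dictionary, attribute_vector


def scan_equals(dictionary, attribute_vector, needle):
    """Return the row positions where the column equals `needle`."""
    i = bisect_left(dictionary, needle)               # binary search in the sorted dictionary
    if i == len(dictionary) or dictionary[i] != needle:
        return []                                     # value absent: no scan needed
    # Scan compact integers instead of comparing the raw values row by row.
    return [pos for pos, vid in enumerate(attribute_vector) if vid == i]


# Hypothetical table stored column-wise.
city = ["Berlin", "Paris", "Berlin", "Rome", "Paris", "Berlin"]
amount = [10, 20, 30, 40, 50, 60]

city_dict, city_vec = encode_column(city)
rows = scan_equals(city_dict, city_vec, "Berlin")
berlin_amounts = [amount[r] for r in rows]            # row positions address other columns
```

The filter touches only small integers, and the resulting row positions are enough to fetch matching values from any other column of the same table.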
Specialized Indexes (B-tree, Text, Spatial): While the column store is self-indexing, HANA also supports explicit secondary indexes for specific use cases, primarily for:
- Row-store tables: These behave more like traditional databases and often benefit from B-tree indexes for WHERE clause filtering.
- Column-store tables:
  - Unique Constraints/Primary Keys: These automatically create an index in the background to ensure uniqueness and accelerate lookups. On column-store tables this is an inverted index rather than a B-tree.
  - Non-unique Secondary Indexes: Can be explicitly created on column-store tables (again as inverted indexes), typically when queries filter heavily on one or more columns that are not part of the primary key and the column store's intrinsic indexing isn't sufficient (e.g., highly selective filters, or joins that need a boost). They are rarely needed and should be created with caution, as they consume additional memory and can slightly increase write times.
  - Text Indexes: For fast full-text search capabilities on text data (e.g., CLOBs, VARCHARs).
  - Spatial Indexes: For geospatial data to accelerate spatial queries.

Types of Indexes in SAP HANA

Internal/Implicit Indexes (Column Store):
- Dictionary Encoding: For each column, a dictionary (mapping of actual values to compact integer IDs) and an attribute vector (list of these IDs for each row) are created. This is the primary internal "index."
- Inverted Index: For each unique value (or value ID), a list of all row positions where that value appears. This enables very fast lookups. HANA builds it where an index is defined (for example, on primary-key columns) rather than on every column.
- Min/Max Indexes: Automatically maintained for each column, storing minimum and maximum values within blocks. Used for pruning data during scans.
- Sorted Attributes: Columns flagged for sorting or those that are part of the primary key may have their data physically sorted within the main store, which further accelerates range queries.
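As a rough illustration of two of the structures listed above, the sketch below builds an inverted index over value IDs and uses per-block minimum/maximum values to skip blocks during a range scan. It is a simplified model under assumed names and a made-up block size, not a description of how HANA lays out its data.

```python
# Conceptual sketch of an inverted index and min/max block pruning.
from collections import defaultdict


def build_inverted_index(attribute_vector):
    """Map each value ID to the list of row positions that hold it."""
    index = defaultdict(list)
    for pos, vid in enumerate(attribute_vector):
        index[vid].append(pos)
    return dict(index)


def block_min_max(values, block_size):
    """Return one (start, minimum, maximum) tuple per fixed-size block of rows."""
    return [
        (start,
         min(values[start:start + block_size]),
         max(values[start:start + block_size]))
        for start in range(0, len(values), block_size)
    ]


def range_scan(values, blocks, block_size, low, high):
    """Return positions with low <= value <= high, skipping blocks that cannot match."""
    hits = []
    for start, block_min, block_max in blocks:
        if block_max < low or block_min > high:
            continue                                  # whole block pruned without reading it
        for pos in range(start, min(start + block_size, len(values))):
            if low <= values[pos] <= high:
                hits.append(pos)
    return hits


attribute_vector = [0, 1, 0, 2, 1, 0]                 # hypothetical value IDs per row
inverted = build_inverted_index(attribute_vector)

amount = [10, 20, 30, 40, 50, 60]                     # hypothetical numeric column
blocks = block_min_max(amount, block_size=2)
hits = range_scan(amount, blocks, 2, 35, 55)
```

With the inverted index, an equality lookup becomes a single dictionary access instead of a scan. With the block summaries, the range scan here never reads the first block because its maximum is below the lower bound.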
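Explicit/External Indexes (B-tree and Inverted Indexes):
- Syntax: CREATE [UNIQUE] [<index_type>] INDEX <index_name> ON <table_name> (column1, column2, ...);
- Index type: BTREE or CPBTREE on row-store tables; INVERTED VALUE, INVERTED HASH, or INVERTED INDIVIDUAL on column-store tables. If omitted, HANA chooses a default for the table type.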
- Purpose:
  - Enforce uniqueness (UNIQUE INDEX).
  - Speed up lookups for specific WHERE clauses, especially on row-store tables.
  - Improve performance of joins.
- Considerations:
  - Consume additional memory and disk space (for persistence).
  - Can slightly increase write times (inserts, updates, deletes) because the index needs to be updated.
  - Use Sparingly on Column Store: Only create one after thorough analysis (e.g., with the SQL trace in transaction ST05 or workload analysis in SAP HANA cockpit) shows a bottleneck that cannot be resolved by data modeling or other means. A poorly chosen secondary index on a column-store table can cost more in memory and write overhead than it gains in read performance.
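The trade-off in the considerations above (faster lookups and uniqueness enforcement versus extra work on every write) can be made concrete with a toy in-memory table. The `Table` class and its `index_updates` counter are hypothetical and exist only for this illustration; they say nothing about HANA's real cost model.

```python
# Toy model of explicit secondary indexes: each index adds work to every insert.
class Table:
    def __init__(self):
        self.rows = []
        self.indexes = {}        # column name -> {value: [row positions]}
        self.unique = set()      # columns covered by a unique index
        self.index_updates = 0   # counts index maintenance operations on writes

    def create_index(self, column, unique=False):
        """Build an index over existing rows; fail if `unique` is violated."""
        index = {}
        for pos, row in enumerate(self.rows):
            positions = index.setdefault(row[column], [])
            if unique and positions:
                raise ValueError(f"duplicate value in column {column!r}")
            positions.append(pos)
        self.indexes[column] = index
        if unique:
            self.unique.add(column)

    def insert(self, row):
        # Check every unique index before changing anything.
        for column in self.unique:
            if row[column] in self.indexes[column]:
                raise ValueError(f"duplicate value for unique index on {column!r}")
        pos = len(self.rows)
        self.rows.append(row)
        for column, index in self.indexes.items():
            index.setdefault(row[column], []).append(pos)
            self.index_updates += 1                   # one extra update per index per write

    def lookup(self, column, value):
        if column in self.indexes:                    # indexed: direct access
            return [self.rows[p] for p in self.indexes[column].get(value, [])]
        return [r for r in self.rows if r[column] == value]   # unindexed: full scan


orders = Table()
orders.create_index("order_id", unique=True)
orders.create_index("customer")
orders.insert({"order_id": 1, "customer": "ACME"})
orders.insert({"order_id": 2, "customer": "ACME"})
try:
    orders.insert({"order_id": 2, "customer": "Initech"})
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

Two inserts into a table with two indexes cause four index updates, which is the write overhead the list warns about; the rejected third insert shows the uniqueness check.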
Full-Text Indexes (Text Search):
- Syntax: CREATE FULLTEXT INDEX <index_name> ON <table_name>(<column_name>) ASYNCHRONOUS;
- Purpose: Enables fast, sophisticated text searches (e.g., fuzzy search, linguistic analysis) on text-heavy columns.
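- Considerations: Significant memory consumption and index build time for large text fields. The synchronization mode affects write performance: SYNCHRONOUS updates the index as part of the write, while ASYNCHRONOUS defers the update, so writes return sooner but the index can briefly lag behind the data.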
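To show what a text index buys over a plain substring scan, here is a minimal sketch of a term-to-document index with tolerant (fuzzy) term matching. It uses Python's standard `difflib` similarity as a stand-in; HANA's fuzzy search and linguistic analysis use their own algorithms, and the documents and cutoff value are made up for the example.

```python
# Conceptual sketch of a full-text index with fuzzy term matching.
import re
from difflib import get_close_matches


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_text_index(documents):
    """Map each term to the set of document IDs that contain it."""
    index = {}
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, word, cutoff=0.8):
    """Return IDs of documents containing a term similar to `word`."""
    terms = get_close_matches(word.lower(), list(index), n=5, cutoff=cutoff)
    hits = set()
    for term in terms:
        hits |= index[term]
    return sorted(hits)


# Hypothetical documents keyed by ID.
documents = {
    1: "In-memory columnar database",
    2: "Row store tables",
    3: "Database indexes speed lookups",
}
text_index = build_text_index(documents)
typo_hits = search(text_index, "databse")             # misspelled on purpose
exact_hits = search(text_index, "store")
```

The misspelled query still finds the documents that mention "database", which is the kind of tolerance a fuzzy text search provides. Building and storing the term index is also where the extra memory and build time come from.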
Spatial Indexes:
- Syntax: CREATE SPATIAL INDEX <index_name> ON <table_name>(<geometry_column>);
- Purpose: Optimize queries involving geographical or spatial data types (e.g., finding points within a polygon, nearest neighbors).
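The general idea behind a spatial index can be sketched with a uniform grid: points are bucketed by cell, and a bounding-box query inspects only the cells that overlap the box. The grid approach, the `CELL` size, and the sample points are assumptions chosen for brevity; the source does not state which structure HANA uses internally.

```python
# Conceptual sketch of a grid-based spatial index for point data.
from collections import defaultdict

CELL = 10.0  # hypothetical cell width and height


def cell_of(x, y):
    """Return the grid cell that contains the point (x, y)."""
    return (int(x // CELL), int(y // CELL))


def build_grid(points):
    """Bucket point IDs by grid cell."""
    grid = defaultdict(list)
    for pid, (x, y) in points.items():
        grid[cell_of(x, y)].append(pid)
    return grid


def query_box(grid, points, xmin, ymin, xmax, ymax):
    """Return IDs of points inside the box, visiting only overlapping cells."""
    cx0, cy0 = cell_of(xmin, ymin)
    cx1, cy1 = cell_of(xmax, ymax)
    hits = []
    for cx in range(cx0, cx1 + 1):
        for cy in range(cy0, cy1 + 1):
            for pid in grid.get((cx, cy), []):
                x, y = points[pid]
                if xmin <= x <= xmax and ymin <= y <= ymax:   # exact check per candidate
                    hits.append(pid)
    return sorted(hits)


# Hypothetical points keyed by ID.
points = {"a": (1.0, 1.0), "b": (12.0, 3.0), "c": (25.0, 25.0), "d": (14.0, 8.0)}
grid = build_grid(points)
inside = query_box(grid, points, 10.0, 0.0, 15.0, 9.0)
```

Only one grid cell is examined for this query, so points "a" and "c" are never compared against the box at all.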