This is an automated email from the ASF dual-hosted git repository.

pinal pushed a commit to branch ATLAS-5021_u
in repository https://gitbox.apache.org/repos/asf/atlas.git


The following commit(s) were added to refs/heads/ATLAS-5021_u by this push:
     new f4e585931 ATLAS-5021: added documentation
f4e585931 is described below

commit f4e58593169f144e3350dcc81b0bf6b68c438b22
Author: Pinal Shah <pinal.s...@freestoneinfotech.com>
AuthorDate: Tue Sep 9 10:21:49 2025 +0530

    ATLAS-5021: added documentation
---
 docs/src/documents/Tools/TrinoExtractor.md | 149 +++++++++++++++++++++++++++++
 1 file changed, 149 insertions(+)

diff --git a/docs/src/documents/Tools/TrinoExtractor.md 
b/docs/src/documents/Tools/TrinoExtractor.md
new file mode 100644
index 000000000..cf660f643
--- /dev/null
+++ b/docs/src/documents/Tools/TrinoExtractor.md
@@ -0,0 +1,149 @@
+---
+name: Trino Extractor
+route: /TrinoExtractor
+menu: Documentation
+submenu: Tools
+---
+
+import  themen  from 'theme/styles/styled-colors';
+import  * as theme  from 'react-syntax-highlighter/dist/esm/styles/hljs';
+import SyntaxHighlighter from 'react-syntax-highlighter';
+
+# Trino Extractor
+
+## Overview
+
+The Trino Extractor is a comprehensive metadata extraction utility designed 
for Apache Atlas integration with Trino. It provides discovery, extraction, and 
synchronization of Trino metadata including catalogs, schemas, tables, and 
columns into Apache Atlas for enhanced data governance and metadata management.
+
+## Key Features
+
+### Metadata Extraction
+* **Comprehensive Discovery**: Automatically discovers and extracts metadata 
from Trino catalogs, schemas, tables, and columns
+* **JDBC-Based Connection**: Uses standard Trino JDBC driver for reliable 
connectivity
+* **Selective Extraction**: Supports extraction for specific catalog, schema, 
or table names
+
+### Atlas Integration
+* **Entity Management**: Creates and updates Atlas entities for Trino metadata 
objects
+* **Relationship Mapping**: Establishes proper hierarchical relationships 
between catalogs, schemas, tables, and columns
+* **Synchronization**: Maintains consistency by removing Atlas entities that 
no longer exist in Trino
+* **Connector Support**: Specialized handling for Trino connectors for which 
Atlas captures the metadata through individual Hook like Hive, Iceberg
+
+### Scheduling & Automation
+* **Cron-based Scheduling**: Supports automated periodic extraction using cron 
expressions
+* **One-time Execution**: Can be run as a single extraction job
+* **Error Handling**: Robust error handling with detailed logging
+
+## Architecture
+
+<SyntaxHighlighter language="text" style={theme.github}>
+{`
+┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
+│   Trino Cluster │    │ Trino Extractor │    │  Apache Atlas   │
+│                 │    │                 │    │                 │
+│ ┌─────────────┐ │    │ ┌─────────────┐ │    │ ┌─────────────┐ │
+│ │  Catalogs   │◄┼────┼►│ JDBC Client │ │    │ │  Entities   │ │
+│ └─────────────┘ │    │ └─────────────┘ │    │ └─────────────┘ │
+│ ┌─────────────┐ │    │ ┌─────────────┐ │    │ ┌─────────────┐ │
+│ │   Schemas   │ │    │ │ Extraction  │◄┼────┼►│Relationships│ │
+│ └─────────────┘ │    │ │   Service   │ │    │ └─────────────┘ │
+│ ┌─────────────┐ │    │ └─────────────┘ │    │ ┌─────────────┐ │
+│ │   Tables    │ │    │ ┌─────────────┐ │    │ │  Lineage    │ │
+│ └─────────────┘ │    │ │Atlas Client │◄┼────┼►│    Data     │ │
+│ ┌─────────────┐ │    │ └─────────────┘ │    │ └─────────────┘ │
+│ │   Columns   │ │    │                 │    │                 │
+│ └─────────────┘ │    └─────────────────┘    └─────────────────┘
+└─────────────────┘                           
+`}
+</SyntaxHighlighter>
+
+## Quick Start
+
+### 1. Configuration Setup
+
+Configure the `atlas-trino-extractor.properties` file:
+
+<SyntaxHighlighter language="properties" style={theme.github}>
+{`
+# Atlas connection
+atlas.rest.address=http://localhost:21000/
+# Trino connection
+atlas.trino.jdbc.address=jdbc:trino://localhost:8080/
+atlas.trino.jdbc.user=your-username
+# Catalogs to extract
+atlas.trino.catalogs.registered=hive_catalog,iceberg_catalog
+`}
+</SyntaxHighlighter>
+
+### 2. Basic Execution
+
+<SyntaxHighlighter language="bash" style={theme.github}>
+{`
+# Extract all registered catalogs
+./bin/run-trino-extractor.sh
+# Extract specific catalog
+./bin/run-trino-extractor.sh -c my_catalog
+# Schedule periodic extraction (every 6 hours)
+./bin/run-trino-extractor.sh -cx "0 0 */6 * * ?"
+`}
+</SyntaxHighlighter>
+
+## Configuration Properties
+
+| Property | Description | Default | Example |
+|----------|-------------|---------|---------|
+| `atlas.rest.address` | Atlas REST API endpoint | `http://localhost:21000/` | 
`https://atlas.company.com:21443/` |
+| `atlas.trino.jdbc.address` | Trino JDBC URL | - | 
`jdbc:trino://trino-server:8080/` |
+| `atlas.trino.jdbc.user` | Trino username | - | `admin` |
+| `atlas.trino.jdbc.password` | Trino password | `""` | `password123` |
+| `atlas.trino.namespace` | Trino instance namespace | `cm` | 
`production-cluster` |
+| `atlas.trino.catalogs.registered` | Catalogs to extract | - | 
`hive,iceberg,mysql` |
+| `atlas.trino.catalog.hook.enabled.<catalog-name>` | Hook enabled under atlas 
for this catalog? | `false` | `true` |
+| `atlas.trino.catalog.hook.enabled.<catalog-name>.namespace` | Namespace 
under Atlas for this Hook | `cm` | `cm` |
+| `atlas.trino.extractor.schedule` | Cron expression | - | `0 0 2 * * ?` |
+
+## Command Line Usage
+
+### Available Options
+
+| Option | Long Form | Description | Example |
+|--------|-----------|-------------|---------|
+| `-c` | `--catalog` | Extract specific catalog | `-c hive_catalog` |
+| `-s` | `--schema` | Extract specific schema | `-s sales_data` |
+| `-t` | `--table` | Extract specific table | `-t customer_orders` |
+| `-cx` | `--cronExpression` | Schedule with cron expression | `-cx "0 0 2 * * 
?"` |
+| `-h` | `--help` | Display help information | `-h` |
+
+## Connector-Specific Processing
+
+#### For Example: Hive Connector Integration
+<SyntaxHighlighter language="properties" style={theme.github}>
+{`
+# Enable Hive hook integration
+atlas.trino.catalog.hook.enabled.hive_catalog=true
+atlas.trino.catalog.hook.enabled.hive_catalog.namespace=cm
+`}
+</SyntaxHighlighter>
+
+#### Benefits:
+- Links Trino entities with existing Hive entities
+- Maintains consistency between Hive and Trino metadata
+- Supports environments with Atlas Hive hooks
+
+## FAQ
+
+### Troubleshooting Questions
+
+**Q: Why are some entities not appearing in Atlas?**
+
+A: Check catalog registration, permissions, and network connectivity. Review 
logs for specific errors.
+
+**Q: How do I handle large clusters with thousands of tables?**
+
+A: Use selective extraction, increase memory allocation, schedule during 
off-peak hours, and process catalogs individually.
+
+### Documentation
+- [Apache Atlas Documentation](https://atlas.apache.org/#/)
+- [Trino Documentation](https://trino.io/docs/)
+- [Atlas REST API Reference](https://atlas.apache.org/api/v2/)
+
+

Reply via email to