Document Intelligence
by Stealth Labs LTD
SharePoint content extractor for integration with Purview
SharePoint Content Scraper with AI Chat Interface is an enterprise-grade Azure solution that transforms your SharePoint environment into a searchable, AI-powered knowledge base. Deployed as a fully managed Azure Container App, it combines intelligent document extraction with a RAG-powered chat interface, allowing your team to instantly find and understand content across your entire SharePoint estate.
Unlike simple document crawlers, this solution provides a complete document intelligence platform:
- Ask Questions, Get Answers: Chat with your SharePoint documents using natural language. The AI retrieves relevant documents and synthesizes accurate answers with source citations.
- Multi-Site Management: Configure and monitor multiple SharePoint sites from a single dashboard. Add, enable, disable, or trigger scans for individual sites without redeployment.
- Real-Time Visibility: Watch scanning progress live with detailed statistics, error tracking, and site-by-site status updates.
Key Features
AI-Powered Chat Interface
- RAG (Retrieval-Augmented Generation): Ask questions about your documents in plain English
- Source Citations: Every answer includes links to the source documents
- Smart Suggestions: AI-generated questions based on your indexed content
- Context-Aware Responses: Uses multiple documents to provide comprehensive answers
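As a rough illustration of the retrieve-then-answer pattern described above, the following sketch ranks documents by simple word overlap and builds a prompt with numbered source citations. It is not the product's implementation: the `Doc` type, the `retrieve` and `build_prompt` helpers, and the overlap scoring are all assumptions made for the example, and a real deployment would more likely use embeddings and send the prompt to a chat model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical indexed document: title, source link, extracted text."""
    title: str
    url: str
    text: str


def tokenize(text: str) -> set[str]:
    # Lowercase words with surrounding punctuation stripped.
    words = (w.strip(".,?!:;\"'()").lower() for w in text.split())
    return {w for w in words if w}


def retrieve(question: str, docs: list[Doc], k: int = 2) -> list[Doc]:
    # Score each document by how many question words it shares (assumed metric).
    q = tokenize(question)
    scored = [(len(q & tokenize(d.text)), d) for d in docs]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]


def build_prompt(question: str, sources: list[Doc]) -> str:
    # Number the sources so the model can cite them as [1], [2], ...
    lines = ["Answer using only the numbered sources and cite them like [1]."]
    for i, d in enumerate(sources, 1):
        lines.append(f"[{i}] {d.title} ({d.url}): {d.text}")
    lines.append(f"Question: {question}")
    return "\n".join(lines)
```

The prompt returned by `build_prompt` would then be passed to the chat model, and the numbered sources let the answer link back to the originating documents.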
Multi-Site Management Dashboard
- Centralized Configuration: Manage all SharePoint sites from one interface
- Per-Site Controls: Enable/disable scanning, trigger immediate scans, or delete sites
- Status Monitoring: Track scanning status (pending/scanning/active/error) per site
- Document Counts: See how many documents each site contributes
Intelligent Document Processing
- 10+ File Formats: PDF, Word, Excel, PowerPoint, text files, JSON, CSV, and code files
- Full Content Extraction: Text, tables, embedded content, and metadata
- Smart Incremental Updates: Only reprocesses changed files using delta detection
- High Performance: Processes 30+ files per second with concurrent workers
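One way to picture the incremental updates mentioned above is a comparison between the fingerprints stored at the last scan and those seen in the current listing. The sketch below assumes each document has a path and a change tag (for example an ETag or content hash); the function name and the dictionary layout are hypothetical, and the listing does not state which delta mechanism the product actually uses.

```python
def plan_incremental_scan(previous: dict, current: dict) -> tuple:
    """Compare {path: change_tag} snapshots from two scans.

    Returns (to_process, to_remove): paths that are new or whose tag changed,
    and paths that disappeared since the previous scan.
    """
    to_process = sorted(
        path for path, tag in current.items() if previous.get(path) != tag
    )
    to_remove = sorted(path for path in previous if path not in current)
    return to_process, to_remove
```

Unchanged files appear in neither list, so only the new or modified documents need to be extracted again.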
AI-Powered Analysis (Optional)
- Document Summaries: Auto-generate concise summaries using Azure OpenAI
- Security Risk Detection: Flag exposed credentials, PII, and compliance issues
- Stale Content Identification: Highlight documents not updated in 5+ years
- Compliance Support: GDPR, HIPAA, SOC 2, and PCI DSS awareness
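The stale-content and risk-detection checks listed above could, in their simplest form, look like the following sketch. The five-year threshold comes from the listing; everything else (the helper names and the two regular expressions) is an illustrative assumption, and the listing suggests the product relies on Azure OpenAI rather than fixed patterns for this analysis.

```python
import re
from datetime import datetime, timezone

# Illustrative patterns only; real detection would need far broader coverage.
RISK_PATTERNS = {
    "password assignment": re.compile(r"(?i)\b(password|passwd|pwd)\s*[:=]\s*\S+"),
    "email address": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
}


def years_ago(now: datetime, years: int) -> datetime:
    try:
        return now.replace(year=now.year - years)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap year; fall back to Feb 28.
        return now.replace(year=now.year - years, day=28)


def is_stale(last_modified: datetime, now: datetime, years: int = 5) -> bool:
    # A document is stale if it was last modified at or before the cutoff.
    return last_modified <= years_ago(now, years)


def find_risks(text: str) -> list:
    # Return the label of every pattern that matches somewhere in the text.
    return [label for label, rx in RISK_PATTERNS.items() if rx.search(text)]
```

For example, `find_risks("password = hunter2")` reports a password assignment, and a document last modified in 2018 counts as stale when checked in mid-2024.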
Real-Time Progress Dashboard
- Live Statistics: Documents processed, failed, sites scanned
- Current Activity: See exactly which site and library is being scanned
- Error Tracking: View recent errors with affected documents and timestamps
- Document Analytics: Breakdown by file type and site distribution
Use Cases
Knowledge Discovery
- "What is our company policy on remote work?"
- "Find all contracts mentioning ABC Corporation"
- "What security procedures do we have documented?"
Content Auditing
- Inventory all documents across SharePoint sites
- Identify stale content needing review
- Track document growth and distribution
Compliance & Security
- Discover documents with exposed credentials or PII
- Identify outdated security policies
- Support GDPR data subject access requests
Migration Planning
- Understand content distribution before migrations
- Identify document types and volumes per site
- Plan storage and bandwidth requirements
Prerequisites
Required:
1. Entra ID (Azure AD) App Registration
- Grant permission (application type)
- For tenant-wide scanning, grant admin consent
- Create a client secret
Optional (for AI features):
2. Azure OpenAI Service
- Deployed GPT-4o or GPT-3.5-turbo model
- API endpoint and key
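To show how the app registration and client secret from the required prerequisites are typically used, here is a minimal sketch of an OAuth 2.0 client credentials token request against the Microsoft identity platform. The function name is hypothetical, and the Microsoft Graph `.default` scope is an assumption, since the listing does not say which API or permission the solution calls. The function only builds the request; it sends nothing.

```python
from urllib.parse import urlencode


def build_token_request(tenant_id, client_id, client_secret,
                        scope="https://graph.microsoft.com/.default"):
    """Build the URL and form body for a client credentials token request.

    The default scope is an assumption (Microsoft Graph); adjust it to the
    API that the app registration has actually been granted access to.
    """
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    })
    return url, form.encode("ascii")
```

The returned body would be sent as an HTTP POST with content type `application/x-www-form-urlencoded`, and the JSON response carries the `access_token` used for subsequent API calls. Keep the client secret in a secret store rather than in source code.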