https://store-images.s-microsoft.com/image/apps.10812.f404cba0-1a74-4010-b4f5-353c38277924.5f6ab57b-afe7-4855-9be1-03c22d8c0618.b2f9a6fa-e48d-4d97-9fd2-5052065f23fa
Reader-LM 0.5b
oleh Jina AI
Just a moment, logging you in...
Small Language Models for Cleaning and Converting HTML to Markdown
Jina Reader-LM 0.5 b is a small language model that converts HTML content to Markdown content, which is useful for content conversion tasks. The model is trained on a curated collection of HTML content and its corresponding Markdown content.
Highlights:
- Jina Reader-LM 0.5b is designed to efficiently convert noisy HTML into clean markdown, showcasing a novel approach to web content extraction that is both cost-effective and scalable.
- Jina Reader-LM 0.5 has been optimized for long context support, handling up to 256K tokens, which is crucial for dealing with the intricacies of modern HTML, including inline CSS and scripts.
Jina Reader-LM 0.5b outperforms larger language models in the HTML-to-markdown conversion task, despite being significantly smaller in size, which is a testament to their specialized training and design for this specific task.
Aplikasi lainnya dari Jina AI
Jina Embeddings v3Jina AINew State-of-the-Art Multilingual Embeddings With Task LoRA
+1
Applicable to:
Virtual Machines
NaN out of 5
Jina Reranker v2 Base - MultilingualJina AINew state-of-the-art neural text reranking model for 100+ languages.
+1
Applicable to:
Virtual Machines
NaN out of 5
Jina Reranker v1 Base - enJina AIA state-of-the-art neural text reranking model supporting 8192 sequence length.
+1
Applicable to:
Azure Applications
NaN out of 5
Jina Embeddings v2 Base - esJina AIText embedding model (base) for English and Spanish input of size up to 8192 tokens.
+1
Applicable to:
Azure Applications
NaN out of 5
Jina Embeddings v2 Base - deJina AIText embedding model (base) for English and German input of size up to 8192 tokens.
+1
Applicable to:
Azure Applications
NaN out of 5