Posts

Showing posts from May, 2024

LlamaParse: Incredibly good at parsing PDFs

  What is LlamaParse? LlamaParse is a proprietary parsing service that is incredibly good at parsing PDFs with complex tables into well-structured markdown. It integrates directly with LlamaIndex ingestion and retrieval, letting you build retrieval over complex, semi-structured documents, and it promises to answer complex questions that weren’t possible previously. The service is in public preview: it is available to everyone, with a usage limit of 1k pages per day and 7,000 free pages per week; beyond that, it costs $0.003 per page ($3 per 1,000 pages). It operates as a standalone service that can also be plugged into the managed ingestion and retrieval API. Currently, LlamaParse primarily supports PDFs with tables, but the team is also building out better support for figures and an expanded set of the most popular document types (.docx, .pptx, .html) as part of the next enhancements. Code Implementation: Install required dependencies: a) Create requirements.txt in t...
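To make the preview pricing quoted above concrete, here is a minimal Python sketch that estimates the cost of one week of parsing. It assumes the 7,000 free pages reset weekly and that every page beyond them is billed at a flat $0.003; the function name `weekly_cost_usd` and the constants are illustrative and not part of any LlamaParse API.

```python
# Rough weekly cost estimate for the preview pricing described in the post.
# Assumptions (not confirmed by the service): the free allowance resets each
# week and every page beyond it is billed at the flat per-page rate.

FREE_PAGES_PER_WEEK = 7_000   # free allowance quoted in the post
PRICE_MILLS_PER_PAGE = 3      # $0.003 per page, in thousandths of a dollar


def weekly_cost_usd(pages_parsed: int) -> float:
    """Return the estimated charge in USD for pages parsed in one week."""
    if pages_parsed < 0:
        raise ValueError("pages_parsed must be non-negative")
    billable_pages = max(0, pages_parsed - FREE_PAGES_PER_WEEK)
    # Work in integer mills to avoid floating-point drift, then convert.
    return billable_pages * PRICE_MILLS_PER_PAGE / 1000


if __name__ == "__main__":
    for pages in (5_000, 7_000, 10_000):
        print(f"{pages} pages -> ${weekly_cost_usd(pages):.2f}")
```

Under these assumptions, a week of 10,000 pages has 3,000 billable pages, which matches the quoted rate of $3 per 1,000 pages.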