Question Answering System — NLP Project (Intermediate)

Oct 22, 2022

Build an end-to-end QA system using Haystack transformers & Streamlit 🔥

1. Introduction

Natural Language Processing (NLP) is one of the most important and exciting fields in AI and data science. NLP applications are already used in many places: chatbots, sentiment analyzers, recommender systems, translators, search engines, etc.

In this article, we will develop an end-to-end Question Answering application.

But what is Question Answering? It is the task of searching through a large collection of documents for a piece of text that answers a question. Put simply, it means answering questions using a set of documents as reference. QA systems are used for information retrieval, document search, real-time FAQs, etc.

Haystack is an open-source framework for building search systems that work intelligently over large document collections.
Streamlit is an open-source framework for building Machine Learning and Data Science web apps.

2. QA system — Overview

Documents: The source of information, such as Word documents, plain text documents, PDFs, etc.
File Converter: Converts files on your computer into documents that can be processed by the Haystack pipeline.
Preprocessor: Cleans and splits the text into sensible units.
Document Store: The component in Haystack that stores the text documents and their metadata in a way that optimises retrieval time.
Retriever: A lightweight filter that selects only the most relevant documents for the Reader to further process.
Reader: A trained Question Answering model that does the closest reading of a document to extract the exact text which answers a question.

3. Project !!

We will develop our QA system on a book (a PDF of Think and Grow Rich by Napoleon Hill), using its pages as our documents. The system will answer our questions about the book.

Setting up the Python environment

# create a virtual environment
python3 -m venv env

# activate the env
# for ubuntu:
source env/bin/activate
# for windows:
env\Scripts\activate

# install required packages
pip install --upgrade pip
pip install git+https://github.com/deepset-ai/haystack.git
pip install requests streamlit pdftotext

Data

We download a PDF of the book from the internet. You can also use your own local documents.
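The download step can be sketched roughly as follows. This is a minimal illustration, not the article's original script: it uses Python's standard urllib instead of the requests package installed above so that it has no dependencies, and the function name download_file is made up for this example.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url: str, dest: str) -> Path:
    """Stream the resource at `url` into the file `dest` and return its path."""
    dest_path = Path(dest)
    # Create the target folder (for example "data/") if it does not exist yet.
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    # Copy in chunks so a large PDF is never held fully in memory.
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path
```

You would call it as download_file(book_url, "data/book.pdf"), where book_url is the address of the PDF you want to index. Both the URL and the target path are placeholders you choose yourself.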

Indexing pipeline

We will convert the document into the format Haystack supports, then apply the Preprocessor to clean the text and split it into sensible units. We will store these preprocessed texts in a SQL document store.
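To make the clean, split and store idea concrete, here is a small standalone sketch. It is not Haystack's API: it assumes the page texts have already been extracted from the PDF, uses Python's built-in sqlite3 as a stand-in for the SQL document store, and the names clean_text, split_into_chunks, index_pages and the documents table are invented for illustration.

```python
import re
import sqlite3


def clean_text(text: str) -> str:
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def split_into_chunks(text: str, words_per_chunk: int = 100) -> list[str]:
    """Split cleaned text into consecutive chunks of at most `words_per_chunk` words."""
    words = clean_text(text).split()
    return [
        " ".join(words[i : i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


def index_pages(
    pages: list[str], db_path: str = ":memory:", words_per_chunk: int = 100
) -> sqlite3.Connection:
    """Clean and split every page, then store the chunks in a SQLite table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "id INTEGER PRIMARY KEY, page INTEGER, content TEXT)"
    )
    for page_number, page in enumerate(pages, start=1):
        # Empty pages produce no chunks and are therefore skipped.
        for chunk in split_into_chunks(page, words_per_chunk):
            conn.execute(
                "INSERT INTO documents (page, content) VALUES (?, ?)",
                (page_number, chunk),
            )
    conn.commit()
    return conn
```

Keeping the page number next to each chunk is a design choice of this sketch. It lets the app later show where in the book an answer came from.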

Search pipeline

We will download our reader (a transformer model pre-trained on the QA task) and initialize our retriever to search for the top k relevant documents in the document store.

For a given question, the retriever will search for the top ‘k’ documents relevant to the question, and the reader will predict answers using only those ‘k’ documents instead of searching the whole document store.
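The top ‘k’ filtering step can be illustrated with a toy retriever. This sketch assumes a simple TF-IDF style score, which the article does not specify, and the class name TinyTfidfRetriever is hypothetical. It only shows how a lightweight scorer narrows the store down to a few candidates before the expensive reader runs.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep only alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyTfidfRetriever:
    """Toy retriever: ranks documents by a TF-IDF style overlap with the question."""

    def __init__(self, documents: list[str]):
        self.documents = documents
        self.term_counts = [Counter(tokenize(doc)) for doc in documents]
        doc_freq = Counter()
        for counts in self.term_counts:
            doc_freq.update(counts.keys())
        n_docs = len(documents)
        # Smoothed inverse document frequency: rarer terms get larger weights.
        self.idf = {
            term: math.log((1 + n_docs) / (1 + df)) + 1.0
            for term, df in doc_freq.items()
        }

    def retrieve(self, question: str, top_k: int = 3) -> list[tuple[float, str]]:
        """Return up to `top_k` (score, document) pairs, best match first."""
        query_terms = tokenize(question)
        scored = []
        for doc, counts in zip(self.documents, self.term_counts):
            length = sum(counts.values()) or 1
            score = sum(
                (counts[term] / length) * self.idf.get(term, 0.0)
                for term in query_terms
            )
            if score > 0:  # drop documents that share no term with the question
                scored.append((score, doc))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]
```

In the real pipeline, the reader model would then be run only on the documents returned by retrieve, which is what keeps answering fast even when the document store is large.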

Web app

We will build our web app with the Streamlit framework, which works well with Haystack and is easy to use.

Run the application with streamlit run followed by the name of your app script, and the web app will open at http://localhost:8501/ in your browser.