Document summarizer using OpenAI with LangChain

As a use case, this example summarizes a resume. I used Google Colab for this experiment, but you can use your own IDE or environment; just make sure the necessary prerequisites are installed.

  1. Since I am using Google Colab, I will upload the sample input file to the “Files” store. You can use local disk storage instead if you are on a laptop/PC.
  2. While any file format would work, I am using a PDF file as input, so the PDF has to be converted to readable text. I will use the pdfx library to read and extract the text.
  3. Define a meaningful prompt and set the context.
  4. Call the OpenAI API.
  5. Receive the response and show the summarized text.

Below are the instructions and code:

1. Install Prerequisites

I am using the pdfx library to read the PDF document. You can use any PDF library here.

pip install pdfx

We access OpenAI through LangChain, so install the required dependencies:

pip install --upgrade langchain langchain-openai tiktoken

2. Load the Resume

import pdfx
pdf = pdfx.PDFx('sample_data/Sample Resume.pdf')

resume_content = pdf.get_text()
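
Text extracted from a PDF often carries stray spaces and extra blank lines. The following is a minimal sketch of a cleanup step that could be applied to resume_content before it goes to the model. The helper name clean_pdf_text and the sample string are my own illustrations, not part of pdfx.

```python
import re


def clean_pdf_text(raw: str) -> str:
    """Tidy whitespace in text extracted from a PDF (hypothetical helper)."""
    # Drop spaces and tabs that sit at the end of a line.
    text = re.sub(r"[ \t]+\n", "\n", raw)
    # Collapse any run of spaces or tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Collapse three or more newlines into one blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


sample = "John  Doe   \n\n\n\nSkills:\tPython,   SQL  \n"
print(clean_pdf_text(sample))
```

In the flow above, this would be used as resume_content = clean_pdf_text(pdf.get_text()). Fewer junk characters also means fewer tokens sent to the API.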

3. Convert the resume content into documents for the LLM chain

from langchain.docstore.document import Document
from langchain.text_splitter import CharacterTextSplitter
model_name = "gpt-3.5-turbo"
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
    model_name=model_name
)

# Caution: all chunks are later sent to the model in a single prompt, so handling
# documents larger than the model's context window is out of scope for this example
texts = text_splitter.split_text(resume_content)

docs = [Document(page_content=t) for t in texts]
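
To give a feel for what a separator-based splitter does, here is a simplified, dependency-free sketch: it splits on blank lines and greedily merges the pieces into chunks up to a size limit. This is not LangChain's implementation. The function name split_into_chunks is hypothetical, length is measured in characters instead of tiktoken tokens, and there is no overlap between chunks.

```python
def split_into_chunks(text: str, max_chars: int = 1000, separator: str = "\n\n") -> list[str]:
    """Greedily merge separator-delimited pieces into chunks of at most max_chars.

    Simplified illustration: sizes are in characters (not tokens) and chunks
    do not overlap. A single piece longer than max_chars is kept whole.
    """
    chunks: list[str] = []
    current = ""
    for piece in text.split(separator):
        piece = piece.strip()
        if not piece:
            continue  # skip empty pieces produced by repeated separators
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= max_chars or not current:
            # The piece fits, or it is an oversized piece starting a new chunk.
            current = candidate
        else:
            # Adding the piece would overflow: close this chunk, start another.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


resume = "Summary\n\nExperience at Company A\n\nEducation\n\nSkills"
for chunk in split_into_chunks(resume, max_chars=35):
    print(repr(chunk))
```

With a limit of 35 characters, the four sections end up in two chunks. A typical resume stays well below the splitter's chunk size, so it usually comes out as a single document.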

4. Initialize OpenAI

from langchain_openai import ChatOpenAI
from google.colab import userdata


# OpenAI API key is stored in the Secrets vault in Google Colab
OPENAI_API_KEY = userdata.get('openai_api_key')

llm = ChatOpenAI(
    temperature=0,
    openai_api_key=OPENAI_API_KEY,
    model_name=model_name)
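
If you are not on Google Colab, the userdata import above will not be available. Below is a minimal sketch of a fallback that tries Colab secrets first and then an environment variable. The helper name load_openai_api_key is hypothetical, and the variable name OPENAI_API_KEY is an assumption about how you have configured your machine.

```python
import os


def load_openai_api_key(secret_name: str = "openai_api_key",
                        env_var: str = "OPENAI_API_KEY") -> str:
    """Return the key from Colab secrets if possible, else from the environment."""
    try:
        # This import only succeeds inside Google Colab.
        from google.colab import userdata
        key = userdata.get(secret_name)
    except Exception:
        # Not running on Colab, or the secret is missing or not accessible.
        key = None
    key = key or os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"No API key found in Colab secrets or in {env_var}.")
    return key


# Usage (assumes the key is configured in one of the two places):
# OPENAI_API_KEY = load_openai_api_key()
```

Keeping the key out of the notebook itself avoids leaking it when the notebook is shared.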

5. Define summarization prompt

from langchain.prompts import PromptTemplate

# To list only the skills, use the prompt "List the skills mentioned in below resume:" instead

prompt_template = """Summarize below resume:

{text}

"""

prompt = PromptTemplate(template=prompt_template, input_variables=["text"])

6. Summarization

from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.chains import LLMChain

llm_chain = LLMChain(llm=llm, prompt=prompt)

# StuffDocumentsChain places all documents into a single prompt, so chunking is effectively ignored
chain = StuffDocumentsChain(llm_chain=llm_chain, document_variable_name="text")
summary = chain.run(docs)
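
Conceptually, the "stuff" approach joins the text of every document and drops the result into the {text} slot of the prompt before one call to the model. The sketch below shows only that idea in plain Python. The function stuff_documents and the sample strings are hypothetical, and the blank-line separator is my assumption about the chain's default behavior.

```python
def stuff_documents(page_contents: list[str], template: str, separator: str = "\n\n") -> str:
    """Join all document texts and place them into the template's {text} slot."""
    combined = separator.join(page_contents)
    return template.format(text=combined)


prompt_template = """Summarize below resume:

{text}

"""

final_prompt = stuff_documents(
    ["Experience: 2 years in ML.", "Skills: Python, OpenCV."],
    prompt_template,
)
print(final_prompt)
```

Because everything goes into a single prompt, this approach works only while the combined text fits in the model's context window, which is usually the case for a resume.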

7. Print Summary

import textwrap
print(textwrap.fill(summary, width=100))

Example output: <name-removed> is a passionate researcher with a focus on cutting-edge technology such as Machine Learning, Computer Vision, and Deep Learning. He has experience as an Associate Data Scientist-Trainee at Lincode Labs and as an AI/ML Intern. His roles included data collection and cleaning, extending code modules, experimenting and deploying machine/deep learning models, and handling end-to-end processes. He has worked on various projects related to object detection, OCR detection, and classification in the manufacturing domain. <name-removed> has a Bachelor’s degree in Computer Science and skills in Python, machine learning platforms, frameworks, libraries, and tools. He has also worked on academic and personal projects related to border security systems and house price prediction.

Webinar: Securing the Skies- Navigating Cloud Security Challenges and Beyond

I will be presenting a webinar on “Securing the Skies- Navigating Cloud Security Challenges and Beyond” for FDPPI on July 26, 2023, at 7 PM IST.

In this talk, I will explore the major topics surrounding cloud security, covering various scenarios, risk challenges, multi-cloud security, and mitigation strategies. Delving into cloud security patterns and best practices, attendees will gain a deep understanding of how to safeguard digital assets in the cloud. The discussion will also extend to API security and the latest developments in the realm of cloud security, equipping participants with valuable insights and practical knowledge to protect their data in a connected world. Don’t miss this opportunity to discover effective ways to defend against cloud-related threats and embrace the immense potential of cloud computing securely. Join me, and let us make this session an interactive one.
