Project Name: AI-faq

Contributor Name: Peter Atef

Hello there, I’m Peter Atef, a senior student in the Computer Department of the Faculty of Engineering at Cairo University.

I expect to graduate in fall 2024. My current GPA is 3.7 out of 4. My website has the details of my career, including skills, courses, and past experience: Peter Atef (engpeteratef.github.io)

Contacts:

Email: peter.atef2000@gmail.com
Phone number: +201212773495

I am very interested in this program because I know how much experience anyone in the tech industry can gain from contributing to open-source projects. Regardless of technical level, contributors gain and develop many soft skills, which matter a great deal to a senior student who is graduating in a couple of months. I was also excited to learn about the program and AI-FAQ because I am very interested in NLP and in building apps that give users the power of AI.

Through my previous projects, I have gained experience with almost all the technologies recommended for building this project.

For example, I completed a three-month internship at VNCR, where I worked on a Blog Writer website. The site takes input from the user and builds a prompt for an LLM, which then generates a blog post based on the user's preferences, such as topic, word count, and the sources to draw information from.

Project Summary and Proposed Plan

We can break down the problem into multiple stages:

Load the data
1. Load the data from websites and GitHub repositories, or possibly search for answers on Google and use the results as additional source material.
Data processing
1. Text tokenization
Create embeddings
1. Use an embedding model provided by Hugging Face or OpenAI. The choice of model is critical for response time, because creating embeddings is a slow process.
Create a vector database
1. Use a cloud vector database to keep the chat history and any data retrieved in previous searches. We could use Qdrant for this, or deploy another vector store such as ChromaDB, pgvector, or Faiss.
2. I have solid experience with vector databases: I took part in implementing a vector database indexer project, so I know how they work and which algorithms are used to implement them.
LLM
1. I want to discuss this point specifically because it is critical: we want an LLM that is cheap, efficient, and fast.
2. There are several strong question-answering models on Hugging Face with an Apache 2.0 license, such as Intel/dynamic_tinybert, distilbert/distilbert-base-cased-distilled-squad, and FlagAlpha/Llama2-Chinese-13b-Chat.
Create back-end APIs
1. I recommend building the back end and its endpoints with a Python framework such as Django. The code that calls the LLM and handles its response will be written in Python, so a Python back end keeps the interface between the two simple.
Create front-end
1. React is a strong choice for the front end and performs well.
2. I think the website should have the following features:
  1. Create an account: sign in/sign up
  2. Chat with LLM
  3. Show search history
  4. Clear History
  5. Continue as a guest (without memory)
Integrate the front end with the back end
Deploy the website
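
To make the first stages concrete, here is a minimal sketch of the load, embed, store, and retrieve flow, written with only the Python standard library. Everything in it is an assumption for illustration: the names chunk, embed, InMemoryVectorStore, and build_prompt are hypothetical, the bag-of-words embed function is only a stand-in for a real Hugging Face or OpenAI embedding model, and the in-memory store stands in for Qdrant or another vector database.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk(text, max_words=50):
    """Split a long document into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding: a sparse bag-of-words count vector.

    Assumption: a real system would call an embedding model here.
    """
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database such as Qdrant."""

    def __init__(self):
        self.items = []  # list of (text, vector) pairs

    def add(self, text):
        self.items.append((text, embed(text)))

    def search(self, query, k=1):
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items,
                        key=lambda item: cosine(q, item[1]),
                        reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(question, store, k=2):
    """Assemble the prompt that would be sent to the LLM."""
    context = "\n".join(store.search(question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

In the real project the prompt returned by build_prompt would be passed to the chosen LLM by the back end. Swapping embed and InMemoryVectorStore for a real model and database should not change the overall flow.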

Suggestions

Instead of React, we could build the client with Flutter. A single code base would then run on mobile phones (Android or iOS), the web, and desktops.

Learning Progress

Natural Language Processing (NLP) course.
Studying more about the LangChain framework.
Studying React.js and Nest.js.
Creating UI/UX for the website.