What is generative AI and how does it work? – The Turing Lectures with Mirella Lapata – YouTube
- Generative – create new content – audio, code, images, text, video
- AI – a computer program that performs tasks automatically
- Gen AI – examples – Google Translate (since 2006), Siri (since 2011), phone/search autocomplete that helps complete your sentence.
- 2023 – GPT-4 – ~90% on the SAT exam. GPT – Generative Pre-trained Transformer.
- Write text – essay writing.
- act as a developer and create code.
- create about me page for the website. input – likes, dislikes
- ChatGPT – 2 months to reach 100m users. Google translate took 78 months.
- core technology –
- predict a sequence of words given the context so far. example – "I want to" -> shovel/play/swim/eat -> select "play" -> tennis/video/…
- Principle – language modelling. Build your own language model:
- 1. large corpus- wiki, stackoverflow, quora, social media, github, reddit
- 2. ask the LM to predict the next word – randomly truncate the last part of an input sentence, have the model calculate probabilities for the missing words, then adjust and feed corrections back so the model matches the ground truth.
- 3. repeat over the whole corpus over months and years.
- old ways – count word frequencies to predict the next word (n-gram models; toy sketch below).
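A toy sketch of the count-based approach (plain Python; the tiny corpus is invented for illustration):

```python
# Toy count-based bigram language model: predict the next word from counts.
from collections import Counter, defaultdict

corpus = "i want to play tennis . i want to eat . i want to play video games .".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_word_probs(prev):
    total = sum(counts[prev].values())
    return {w: c / total for w, c in counts[prev].items()}

print(next_word_probs("to"))    # {'play': 0.67, 'eat': 0.33}
print(next_word_probs("play"))  # {'tennis': 0.5, 'video': 0.5}
```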
- new ways – Neural networks
- 1. example network: 5 inputs -> 8 -> 4 -> 3 outputs. Per-layer parameters: 5*8 + 8, 8*4 + 4, 4*3 + 3 (the +8, +4, +3 are bias terms for corrections). 99 trainable parameters in total across all layers including biases (see the check after this list). Add more layers to grow the model.
- size, cost,… based on number of parameters.
- 2. Middle layers generalize the inputs and find patterns that are not explicit in the data.
- 3. Vectors, series of numbers. Edges are the weights / numbers.
- 4. big or small NN. Number of parameters.
- based on context, predict next.
- Neural language model. likelihood methods.
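A quick check of that parameter arithmetic (plain Python; layer sizes from the example above):

```python
# Count trainable parameters of a 5 -> 8 -> 4 -> 3 fully connected network:
# each layer has (inputs x outputs) weights plus one bias per output.
layers = [5, 8, 4, 3]
total = sum(n_in * n_out + n_out for n_in, n_out in zip(layers, layers[1:]))
print(total)  # 48 + 36 + 15 = 99 trainable parameters
```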
- Real-life NNs -> monsters made of blocks.
- input -> token and positional embeddings (vectors) -> multiple decoder blocks (each block = masked self-attention + feed-forward NN) -> output/prediction. This is the decoder-only Transformer architecture. Each block has mini neural networks inside. Introduced in 2017; no fundamentally new architecture since (sketch of one block below).
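A minimal sketch of one such decoder block, assuming PyTorch (dimensions are toy values; real GPT-style models add dropout, embeddings, an output head, and stack many blocks):

```python
# One decoder block: masked (causal) self-attention + feed-forward, with
# residual connections and layer norms.
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                nn.Linear(4 * d_model, d_model))
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # causal mask: each position may only attend to earlier positions
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        a, _ = self.attn(x, x, x, attn_mask=mask)
        x = self.ln1(x + a)                   # residual + norm
        return self.ln2(x + self.ff(x))       # feed-forward + residual + norm

x = torch.randn(1, 10, 64)                    # (batch, sequence, embedding)
print(DecoderBlock()(x).shape)                # torch.Size([1, 10, 64])
```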
- Generic data is used to create a pre-trained model -> transfer learning -> create a fine-tuned model using domain- or task-specific data. use it for specialized tasks like creating a diagnosis report.
- example – input – the chicken walked. output – across the road <EOS>
- 2018 onwards extreme increase in model size.
- number of parameters – GPT-4 -> ~1 trillion. human brain – ~100 trillion. rat brain – on the order of 10^11.
- number of words processed by LLMs – GPT-4 -> ~100 billion words; a human reads far fewer in a lifetime.
- Cost – GPT-4 100 Million USD.
- Tasks –
- 8 billion parameters – language understanding, arithmetic, question answering
- about 24 billion parameters+ – summarization
- about 56 billion parameters+ – translation, code completion, common sense reasoning
- about 200 billion parameters+ – logical inference chains, semantic parsing, proverbs, general knowledge, reading comprehension, physics QA, joke explanations, dialogue
- fine-tune the language models on 2,000+ problems/tasks.
- Framework – HHH – helpful, honest, harmless. do fine tuning to achieve it.
- Helpful – follow instructions, perform tasks, answer and ask relevant questions for clarification.
- honest – give accurate information; don't make things up.
- harmless – avoid toxic, biased or offensive responses.
- examples to check. chat.openai.com
- is the uk a monarchy?
- who is rishi sunak? didn't identify him as PM (knowledge cutoff).
- write me a poem about a cat and a squirrel
- can you try a shorter poem
- can you try to give me a haiku
- what school did Alan Turing go to.
- tell me a joke about Alan turing
- why is this a funny joke
- write a short song about relativity
- risk – jobs (automation).
- pollution – CO2 emissions. a ChatGPT query uses ~100x more energy than a Google search query. training Llama 2 produced 539 metric tons of CO2; even more energy is used during deployment.
What are the risks of generative AI? – The Turing Lectures with Mhairi Aitken – YouTube
- Real, concrete risks:
- based on probabilities and calculations.
- Impacts on different communities
- students – AI tools used to monitor students for AI-written work; the models currently have low accuracy.
- detection relies on text perplexity and creates confusion; erosion of trust.
- creative professionals – create scripts,
- their data used in training models.
- use of chatgpt to draft responses for online dates.
- demo of date with chatgpt.
- models are people pleasers. companion apps – one AI companion encouraged a user to attempt an assassination.
- democratic societies – fake image of Trump running with police behind.
- trustworthy, accurate information. political and ideological views.
- exploitative labor practices – text-labelling work outsourced cheaply to workers in developing countries.
- environmental impacts – no transparency, significant impact.
- University of Copenhagen estimate – GPT-3's training phase alone emitted as much CO2 as driving a car to the moon and back.
- huge amounts of water are used to cool the hardware that runs and trains the models.
- roughly 500 millilitres of water – a glass – per user session (e.g., the date demo above).
- 16 million daily users, for just one model – and there are many other models.
- Children – smart toys, smart bikes, relationship, social media, conversational assistance.
- psychological impact.
- Meta released 28 AI chatbot characters in Sept 2023 (Metaverse push).
- Address the risks:
- governance, regulations, international regulatory frameworks, EU AI Act.
- UK has different approach.
- existential risks – sensationalism takes attention away from addressing the real risks.
- NZ – a supermarket's AI-generated sandwich-recipe bot.
- inputs like an ant-poison sandwich, …
- who is responsible – the supermarket, the user, or the model used?
What’s the future for generative AI? – The Turing Lectures with Mike Wooldridge – YouTube
- AI has existed as a scientific discipline since around the Second World War; the fast progress is recent.
- a broad discipline; ML took off around 2005.
- how ML works:
- facial recognition. input – a picture of Alan Turing; the system identifies him.
- input – supervised learning – training data – use neural nets and deep learning – output.
- labelled data – training data. image with name/tag.
- classification. emerged around 2005; by 2012 it got supercharged.
- tumour detection in scans; Tesla self-driving mode – identify bikes, signs, …
- ML – input layer, hidden layer, output layer.
- human brain – 86 billion neurons; each performs a simple pattern-recognition task and sends a signal to its connections.
- an image of Alan Turing – about 12 million colour dots (pixels).
- each neuron looks for specific information and, when it finds it, gets excited and signals the next connected neurons.
- complex. research from the 1940s used electrical circuits; software implementations came in the 1960s.
- big data. scientific advances. computer power. all got available in this century.
- GPU – graphics processors availability made it easy.
- speculative bets from 2012 onwards, with billions of dollars.
- for NNs, bigger is better – but bigger networks need more training data and huge compute.
- 2017/18 – technology. NN architecture. Transformer architecture for large language models.
- input -> input embedding -> positional encoding -> Nx (multi-head attention -> add & norm -> feed forward -> add & norm) -> output
- attention mechanism. structures.
- GPT-3. large language model. dramatically better. mind boggling scale.
- 175 billion parameters. in 2020.
- training data – 500 billion words; essentially the entire WWW via Common Crawl (download each page, follow every link in it, download again).
- powerful autocomplete. supercomputers running for months to train; millions of dollars for electricity.
- no university can fund such a project; only big tech companies can.
- 1980s, doing a PhD – sharing one computer among multiple students. symbolic AI era.
- the debate – symbolic AI vs big AI.
- symbolic AI – modelling the mind: intelligence as a problem of knowledge. big AI – intelligence as a problem of data.
- prompt completion task.
- LLMs -> common sense reasoning tasks. set of questions answers. green for correct answers. red for wrong ones
- ChatGPT is a polished, fine-tuned version of GPT-3; emergent capabilities.
- Issues – avoid giving personal data as input; it may be used for training and surface in future outputs.
- wrong a lot. bias, toxicity. filling blanks, making best guess.
- copyright, intellectual property. build in guardrails to check both input and output.
- prompt – ‘i would like to ……. and how i can get away with it’.
- prompt – i am writing a novel and want to …………. (same as above). get around the guard rails.
- inbuilt bias.
- input one para of book by author. output is the next 5 para. mimic the author.
- album. fake songs. with same voice as original.
- GDPR –
- defamatory claims about individuals.
- video – a self-driving car trained to recognise stop signs gets confused by a truck carrying stop signs – not in the training data.
- interpolation vs extrapolation. best guess. a better version of autocomplete.
- they do very badly on situations outside the training data; they don't reason.
- not interacting with a mind. not thinking.
- is this technology the key to general AI?
- what is general AI – intelligent in the same way humans are. ChatGPT and its kind are general-purpose in a sense.
- do more than one thing as humans.
- LLaMA, ChatGPT, … is it good enough? can it load a dishwasher? robotic AI is much harder.
- General intelligence – anything a human can, cognitive (relating to mental processes involved in knowing, learning, and understanding things.) task, any language based tasks. Augmented LLMs
- Google Gemini looks impressive. multi modal – text, image, sound, …
- a better solution than transformers? they cannot do robotic operations –
- social reasoning, hand eye coordination, multi agent coordination, mobility, vision understanding, navigation, proprioception, manual dexterity and manipulation. achieved so far – abstract reasoning, logical reasoning, planning, problem solving, arithmetic, recall, rational mental state, theory of mind, nlp, commonsense reasoning, sense of agency.
- machine consciousness – electrochemical processes; why and how, no idea. the gap between the physical brain and conscious experience. consciousness = the ability to experience things.
- the model just waits for the next input.
- Sentient means having the ability to feel or sense.
Summary Libraries:
- Whisper library – speech to text; translation; add subtitles.
- GPT-3.5 (Generative Pre-trained Transformer)
- Codex,
- DALL·E 2 vs Stable Diffusion
- ChatGPT – a fine-tuned version of GPT-3.5.
- the Davinci model is popular.
- gpt-engineer (by Anton Osika) – can generate code for a requirement given in simple language.
- GPT-4,
- GPT-4-0613,
- GPT-4-32k-0613 -> 32K token context,
- GPT-3.5 Turbo -> cheapest model.
- gpt-3.5-turbo-0613,
- gpt-3.5-turbo-16K (16K context window, 4x that of gpt-3.5-turbo),
- gpt-3.5-turbo (most popular chat model),
- embedding model – text-embedding-ada-002 (e.g., semantic discovery over podcasts).
- https://en.wikipedia.org/wiki/Large_language_model#List
- Mixtral 8x7B – healthcare framing: a medical centre using ~7B-parameter experts to test and evaluate patients.
- LLaMA hospital framing – the same evaluation using 70 billion parameters. Mixtral outperforms it despite ~5x fewer active parameters. A Sparsely activated Mixture-of-Experts (SMoE) combines the outputs of two expert "consultations" with a weighted sum for the comprehensive final diagnosis.
Summary Generative AI (refer – wikipedia)
- Transformer-based deep neural networks. earlier, prior to transformers – variational autoencoders, generative adversarial networks, long short-term memory (LSTM) models.
- Learn from patterns and structures of the input training data.
- Generate data with similar characteristics.
- accept natural language prompts as inputs.
- usage – art, writing, script writing, software development, product design, healthcare, finance, gaming, marketing, and fashion.
- misuse – cybercrime, fake news, deepfakes.
- the journey from discriminative models to generative models.
- text, code, images, audio, video, molecules, robotics, planning, data, BI
- Hardware – smartphones, embedded devices, PCs -> support smaller models of a few billion parameters.
- laptop, desktop – larger models, tens of billions of parameters; a 65-billion-parameter LLaMA can be configured on a desktop.
- needs accelerators – GPUs, consumer-grade gaming graphics cards – and compression techniques.
- data center, arrays of GPU, AI accelerator chips like Google TPU. Accessed as cloud services.
- advantage of running locally – privacy, IP, rate limit, censorship.
- Use cases –
ETL to ELT to EVT (refer to the blog from Rishi Yadav)
- https://www.spiceworks.com/tech/artificial-intelligence/articles/what-is-semantic-analysis/
- https://www.lexalytics.com/blog/context-analysis-nlp/
- extract data and convert it to vectors, preserving semantic and contextual information along with word counts.
- Purpose/task – fill missing data, reduce noise, detect anomalies, recognize patterns, generate new data points
- for efficient data analysis. insightful decision making. efficient and advanced data handling practice.
- vectors –
- bag of words – tokenization (list of words), vocabulary creation (unique words in alphabetical order), vector creation (sparse matrix).
- use bigrams; calculate TF and IDF; rank words by TF-IDF importance and list them in that order (sketch below).
- word2vec – uses the power of neural networks to generate word embeddings.
- skip-gram – input a word and predict its context (one or more words, e.g., the two preceding and two succeeding words). NN – input layer, hidden layer, output layer.
- CBOW / continuous bag of words – input the context words and predict the current word.
- GloVe – global vectors for word representation – captures both global and local statistics to come up with the word embeddings.
- FastText – generalizes to unknown words: the building blocks are characters instead of words, which also means less data is needed for training.
- https://neptune.ai/blog/vectorization-techniques-in-nlp-guide
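A minimal sketch of the bag-of-words -> TF-IDF pipeline described above, using scikit-learn (an assumption; the notes don't name a library):

```python
# Bag-of-words counts, the alphabetical vocabulary, and TF-IDF with bigrams.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["the cat sat on the mat", "the dog sat on the log"]

bow = CountVectorizer()                      # tokenize + build vocabulary
print(bow.fit_transform(docs).toarray())     # sparse count matrix (dense here)
print(bow.get_feature_names_out())           # unique words, alphabetical order

tfidf = TfidfVectorizer(ngram_range=(1, 2))  # unigrams + bigrams, TF-IDF weighted
X = tfidf.fit_transform(docs)
print(X.shape)                               # (2 docs, vocabulary size)
```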
Conversational AI
- app, domain
- knowledge base – language KB (lexicon, language grammar), Domain KB, Preference – user/app.
- Meta KB – spec and definition of the structure
- discourse – interaction of user with the system. intent -> goal -> action -> result.
- Anaphora – a pronoun referring back to another linguistic unit.
- Ellipsis – omitting a word or phrase that the syntactic construction implies.
- ambiguity – a sentence that creates doubt/uncertainty.
- social role – ketchup vs sauce.
- goal – the objective of the user's discourse with the system; one or more sub-goals.
- goal is switchable, cancellable. domain switching, domain retention, multitasking, cancelling.
- example – chatbot for flight company – book ticket, know flight details, FAQ – luggage allowed,…
- sub goal – book ticket, modify ticket, cancel ticket,…
- script – the slot-fill requests associated with a sub-goal. ticket – destination, date, time, members travelling, …
- plug and play – minimum change to core system. add domain knowledge to the KB. should work for any domain.
- Knowledge acquisition phase – fill in the domain knowledge.
- Modular design – KB in files. no need to change the core engine. app independent core engine. Generalized meta knowledge. flow information without interruption from one module to another.
- Domain – one or more. app should be able to identify it. load KB for each domain.
- Language base rules should be configurable. switch language during the discourse.
- KB – knowledge acquisition tool with GUI. KB representation format. List of actions.
- Json, xml,… to store goal and sub goal tasks/slots. Parser to load it into DB.
- Modules – handle spelling mistakes. handle ellipses. handle ambiguity. initiate dialog if in doubt on semantics of a dialog. temporal (time and events) resolution. synonyms/antonyms module. pronoun/anaphora resolution. social role that affects the dialogue.
- module for communication between human and machine. - take input from user and give back the output. text input, speech to text, interpret visual actions, generate spoken text, text to speech, virtual characters, etc.
- input – text or speech. the module should accept the next input after passing the earlier one (input + timestamp) on for processing and returning output. use Unicode/multibyte instead of ASCII to handle multiple languages. speech-to-text tools – SAPI, L&H's Voice Xpress, … parallel input from both channels can be queued with timestamps. silence is treated as the end of a sentence.
- output – queue of objects (text, time stamp, flag to process or send directly). embed tags for facial expression if sent to virtual face reader. methods to add/insert, append text to the output queue. method to use language. error handling. methods to speak, control volume, flush queue,… interface with different media like telephone,… based on the need of the solution.
- core engine – do the discourse. understand the dialog. lexicon, syntactical analysis, semantic analysis, resolve ambiguity, interface with KB to understand the dialogue.
- keep track of error globally, create and initialize queue. methods to read the sentence from the queue.
- NLP Parser – produce an unambiguous, system-understandable representation of the sentence; handle one or more languages based on the needs of the solution.
- tag each word using lexicon/grammar rules (grammatical tags depend on context and position in the sentence; syntactic & semantic).
- do a spell check if required (typos, speech-to-text errors, valid and invalid language words, referring to the list of words used earlier), then restart the tagging.
- use WordNet or a similar dictionary to get word information (conceptual information – synonyms, antonyms and direct negatives).
- find the domain (filter each word's senses by domain, rearrange the senses for all words, take care of abbreviations, domain and user preferences).
- the domain rule generator (which raises gap requests to be filled from the current statement or future dialogue) uses the word info (modifying word tags) to conclude/infer (producing requests that seek slots for the goal/sub-goal).
- Decision maker / Inference Engine –
- Tagging – deterministic – based on nlp grammar rules (save in db, json, xml,…). efficient. better learning.
- non deterministic tagging – random probability. use markov model. generic. language independent. does not capture linguistic information.
- module to complete goal and sub goal action. integrate with backend black box application and apis. handle errors from the backend apps and pass on to human via module to communicate.
- user input -> input handler -> tagger -> no matching tag? -> spell check -> tagger again. matching tag -> word info generator (input from the WordNet API) -> domain filter -> rule generator (input from enriched word info) -> inference engine.
- spell checker – categories – non valid language words, valid language words.
- error-driven tagging (toy sketch below). examples –
- "Book my tickets for New Yorkee." -> "Book my tickets for New York."
- "Please give me to eggs." -> candidates: "Please give me two eggs." / "Please give me too eggs." -> the tagger identifies "too" as incorrect, so "two" wins.
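A toy sketch of that correction loop (the lexicon, grammar rule and candidate lists are all invented for illustration; a real system would use a full lexicon and grammar):

```python
# Error-driven tagging: a token whose tag breaks a grammar rule triggers the
# spell checker, and the candidate the tagger accepts wins.
LEXICON = {"please": "INTJ", "give": "VERB", "me": "PRON",
           "two": "NUM", "too": "ADV", "to": "PREP", "eggs": "NOUN"}
CANDIDATES = {"to": ["two", "too"]}  # homophone spell-check candidates

def tag_ok(words, i):
    # invented rule: a plural NOUN must be preceded by a NUM
    if LEXICON[words[i]] == "NOUN":
        return i > 0 and LEXICON[words[i - 1]] == "NUM"
    return True

def correct(sentence):
    words = sentence.lower().split()
    for i in range(len(words)):
        if not tag_ok(words, i):
            # re-tag with each candidate for the offending previous word
            for cand in CANDIDATES.get(words[i - 1], []):
                words[i - 1] = cand
                if tag_ok(words, i):
                    break
    return " ".join(words)

print(correct("please give me to eggs"))  # please give me two eggs
```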
- (notes incomplete.)
Vector DB
- https://www.ibm.com/topics/vector-database
- store, manage and index massive quantities of high-dimensional vector data efficiently
- data points in a vector database are represented by vectors with a fixed number of dimensions, clustered based on similarity.
- This design enables low latency queries, making them ideal for AI-driven applications.
- EVT – extract, vectorize, transform.
- simplest case – a vector of word counts; use cosine distance to measure how far one doc is from another.
- vector databases are best suited for unstructured datasets through high-dimensional vector embeddings.
- represent complex objects like words, images, videos and audio, generated by a machine learning (ML) model.
- text – chatbots. image – pixels. audio – waves. broken into numerical data.
- Vector embeddings – handle millions of vectors. convert your vector data into an embedding. vector embeddings are the backbone of recommendations, chatbots and generative apps like ChatGPT.
- store and index the output of an embedding model. Vector embeddings are a numerical representation of data, grouping sets of data based on semantic meaning or similar features. car, vehicles,… will be grouped together.
- graph databases are preferred for processing complex relationships while vector databases are better for handling different forms of data such as images or videos. example – neo4j on relationship, mindmap of infrastructure having servers, routers, switches, applications and its instances installed.
- Enterprise vector data can be fed into langchain, hugging face, watson.ai,..
- convert data into an embedding by transforming complex, high-dimensional vector data into numerical forms.
- embeddings represent the attributes of your data used in AI tasks such as classification and anomaly detection (flagging data that deviates).
- vector db – vector embeddings + vector metadata + fast retrieval via similarity search. the search query is itself a document, compared against the existing documents in the corpus; find matches using cosine distance.
- vector indexing – vectors are indexed to accelerate search, using ML algorithms and data structures that enable faster similarity/distance searches. querying vectors – algorithms such as nearest-neighbor search; cosine similarity measures how close or distant vectors are (sketch below). use cases – recommendation systems, semantic search, image recognition and other natural language processing tasks.
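A small numpy sketch of the core query operation (brute-force nearest-neighbor by cosine similarity; real vector DBs replace the full scan with approximate indexes such as HNSW):

```python
# Brute-force nearest-neighbor search with cosine similarity.
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 64))   # 1000 stored embeddings, 64-dim
query = rng.normal(size=64)

# cosine similarity = dot product of L2-normalized vectors
norm_v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
norm_q = query / np.linalg.norm(query)
sims = norm_v @ norm_q

top_k = np.argsort(-sims)[:5]           # indices of the 5 most similar vectors
print(top_k, sims[top_k])
```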
- retrieval augmented generation / RAG – to improve the response of LLMs. ensures that the model has access to the most current, reliable facts, and that users have access to the model’s sources
- types of vector db –
- standalone – ex. Pinecone
- Open-source solutions such as weaviate or milvus. Rest APIs
- PostgreSQL’s open source pgvector extension,
- List index – asking about a company’s legal terms last year or extracting specific information from complex documents
- https://youtu.be/dN0lsF2cvm4?si=4IxeuI9PMAFUEYos
- data – vector db – llm.
- 80% of data is unstructured – social, image, video, … it cannot fit into an RDBMS.
- image – animal, color, tags, … cannot search based on pixel values.
- vector embeddings – vector db indexes and stores vector embeddings.
- algorithm / model create vector embeddings from unstructured data.
- input – king, man, woman, sentence, image -> model -> numerical data.
- find similar vectors, calculating distances. nearest neighbor search.
- index the vector. facilitate the search process. different ways to calculate index.
- use cases – long term memory for llms. semantic search: search based on the meaning or context. similarity search for text, images, audio or video data. recommendation engine.
- DB’s available in the market – pinecone, weaviate, chroma, redis. qdrant, milvus, vespa.
- Learn Vector Database in 10 Mins – Hottest AI Apps DB! (youtube.com)
- data lake -> ML Operations -> vector DB -> application.
- LLM / large language model – a very big AI model that predicts the next word; you ask it questions. its knowledge is frozen at a given time; ChatGPT was frozen at 2021.
- What – high dimensional vectors. types of no sql db – key-value, documents, graph, vector DB
- N-dimensional / high dimensional.
- Text generation -> text -> text
- text representation -> text -> embeddings.
- input – text, audio, video, images.
- embedding function – ml model, word embeddings, feature extraction algorithm.
- Why – store and retrieve similar data. overcome hallucinations. factual gaps.
- long term memory retrieval, continue the chat from it was left earlier, yesterday or week,…
- Advantage – allows for fast and accurate similarity search. retrieval of data based on their vector distance.
- consider two points: the closer they are, the more similar; the farther apart, the more different.
- based on semantic and contextual meaning. retrieve data by vector distance / similarity.
- image similarity; document similarity based on meaning and …; product similarity based on attributes.
- query vector to find similar documents. derived from same type of data or different data. or User query
- calculate similarity measure of the query with the existing data.
- distance calculations – cosine similarity, Euclidean, Jaccard, …
- ingestion (api’s, raw files) -> llama hub (llama index) -> DBs/vector stores. (compose graphs, decompose queries, interface with unstructured, semi structured, structured data). -> LLM.
- examples – Pinecone, weaviate, chroma db, Zilliz, Qdrant,
- HIGHLY Scalable Vector Search Tutorial in 12 Mins!!! (youtube.com)
- pick right vector DB.
- Astra DB (Cassandra backend), CassIO, LangChain -> build a scalable Q&A system.
- Google Colab ->
- pip install -q --progress-bar off langchain cassio google-cloud-aiplatform jupyter openai python-dotenv tensorflow-cpu tiktoken transformers
- the install may crash the runtime; Colab restarts it automatically.
- ASTRA_DB_KEYSPACE, ASTRA_DB_APPLICATION_TOKEN
- create vector db – name, keyspace name, provider. plan – free or paid.
- upload the secure connect bundle (a zip file) to Google Colab.
- colab specific override of helper functions.
- create Cassandra cluster. use vector similarity search capability of Cassandra.
- define llm provider. setup api key for it.
- text file for the input. put it in the separate folder.
- start vector search. import and use langchain libraries. create db connection. specify llm resources.
- index creator. create store. fill it with data. to query on need basis.
- chunk the text, create embedding vectors. chunking size – default 400 kb.
- sql query – row id, vector, body blob, metadata.
- query – first ask a question related to a document that has not been vectorized (no good answer); then ask a question about the vectorized document.
- upload and vectorize another document. use TextLoader(). run sql query again.
- ask the same question again. this time it will answer. no hallucinations. answers from documents vectorized.
- reranking algorithm – you can return k responses, not only one, and apply weightings to rerank them with a custom algorithm (toy sketch below).
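A toy illustration of such a weighted rerank (the score fields and weights are invented, not Astra DB's API):

```python
# Rerank k retrieved chunks by blending vector similarity with a custom
# signal (here, recency). Weights are arbitrary illustration values.
hits = [  # pretend these came back from the vector similarity search
    {"text": "chunk A", "similarity": 0.91, "recency": 0.2},
    {"text": "chunk B", "similarity": 0.88, "recency": 0.9},
    {"text": "chunk C", "similarity": 0.84, "recency": 0.7},
]

def rerank(hits, w_sim=0.7, w_rec=0.3):
    score = lambda h: w_sim * h["similarity"] + w_rec * h["recency"]
    return sorted(hits, key=score, reverse=True)

for h in rerank(hits):
    print(h["text"])   # order becomes B (0.886), C (0.798), A (0.697)
```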
- Register for Astra DB (with Free Credits) – https://astra.datastax.com/register Vector Search Q&A Colab – https://colab.research.google.com/dri… Astra DB Docs – https://docs.datastax.com/en/astra-se…
Langchain (https://www.python-engineer.com/posts/langchain-crash-course/)
- LangChain provides a generic interface for many different LLMs. Most of them work via their API but you can also run local models
- Installation –
- pip install langchain
- LLMs
- pip install openai
- from langchain.llms import OpenAI
- llm = OpenAI(temperature=0.9)  # model_name="text-davinci-003"
- text = "What would be a good company name for a company that makes colorful socks?"
- print(llm(text))
- pip install huggingface_hub
- os.environ["HUGGINGFACEHUB_API_TOKEN"] = "YOUR_HF_TOKEN"
- from langchain import HuggingFaceHub
- # https://huggingface.co/google/flan-t5-xl
- llm = HuggingFaceHub(repo_id="google/flan-t5-xl", model_kwargs={"temperature": 0, "max_length": 64})
- llm("translate English to German: How old are you?")
- Prompt Templates
- LangChain facilitates prompt management and optimization.
- you take the user input, construct a prompt from it, and only then send that to the LLM.
- prompt = """Question: Can Barack Obama have a conversation with George Washington?
- Let's think step by step.
- Answer: """
- llm(prompt)
- from langchain import PromptTemplate
- template = """Question: {question}
- Let's think step by step.
- Answer: """
- prompt = PromptTemplate(template=template, input_variables=["question"])
- prompt.format(question="Can Barack Obama have a conversation with George Washington?")
- Chains
- Combine LLMs and Prompts in multi-step workflows.
- from langchain import LLMChain
- llm_chain = LLMChain(prompt=prompt, llm=llm)
- question = "Can Barack Obama have a conversation with George Washington?"
- print(llm_chain.run(question))
- Agents and Tools
- Agents involve an LLM making decisions about which actions to take, taking that action, seeing an observation, and repeating that until done.
- tool, llm, agent
- from langchain.agents import load_tools
- from langchain.agents import initialize_agent
- pip install wikipedia
- from langchain.llms import OpenAI
- llm = OpenAI(temperature=0)
- tools = load_tools(["wikipedia", "llm-math"], llm=llm)
- agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)
- agent.run("In what year was the film Departed with Leonardo DiCaprio released? What is this year raised to the 0.43 power?")
- Memory
- Add state to Chains and Agents.
- Memory is the concept of persisting state between calls of a chain/agent.
- LangChain provides a standard interface for memory, a collection of memory implementations, and examples of chains/agents that use memory.
- from langchain import OpenAI, ConversationChain
- llm = OpenAI(temperature=0)
- conversation = ConversationChain(llm=llm, verbose=True)
- conversation.predict(input="Hi there!")
- conversation.predict(input="Can we talk about AI?")
- conversation.predict(input="I'm interested in Reinforcement Learning.")
- Document Loaders
- Combining language models with your own text data is a powerful way to differentiate them. The first step in doing this is to load the data into documents
- from langchain.document_loaders import NotionDirectoryLoader
- loader = NotionDirectoryLoader("Notion_DB")
- docs = loader.load()
- Indexes
- Indexes refer to ways to structure documents so that LLMs can best interact with them. This module contains utility functions for working with documents
- embeddings, text splitters, vector stores.
- import requests
- url = "https://raw.githubusercontent.com/hwchase17/langchain/master/docs/modules/state_of_the_union.txt"
- res = requests.get(url)
- with open("state_of_the_union.txt", "w") as f:
- f.write(res.text)
- # Document Loader
- from langchain.document_loaders import TextLoader
- loader = TextLoader("./state_of_the_union.txt")
- documents = loader.load()
- # Text Splitter
- from langchain.text_splitter import CharacterTextSplitter
- text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
- docs = text_splitter.split_documents(documents)
- pip install sentence_transformers
- # Embeddings
- from langchain.embeddings import HuggingFaceEmbeddings
- embeddings = HuggingFaceEmbeddings()
- # text = "This is a test document."
- # query_result = embeddings.embed_query(text)
- # doc_result = embeddings.embed_documents([text])
- pip install faiss-cpu
- from langchain.vectorstores import FAISS
- db = FAISS.from_documents(docs, embeddings)
- query = "What did the president say about Ketanji Brown Jackson"
- docs = db.similarity_search(query)
- print(docs[0].page_content)
- # Save and load:
- db.save_local("faiss_index")
- new_db = FAISS.load_local("faiss_index", embeddings)
- docs = new_db.similarity_search(query)
- print(docs[0].page_content)
- LangChain explained – The hottest new Python framework (youtube.com)
- Python framework
- your data, user input, prompt, history, another model, google search, wikipedia <-> llm
- build apps through composability (allows systems to be assembled from smaller, independent components).
- models, prompts, chains, memory, indexes, agents and tools.
- interface for many llms – OpenAI, HuggingFaceHub, Cohere.
- from langchain.llms import OpenAI, Cohere
- from langchain import HuggingFaceHub,
- access models from many providers.
- prompt management, optimization and serialization.
- from langchain import PromptTemplate
- template = """Question: {question}
- let's think step by step
- Answer: """
- prompt = PromptTemplate(template=template, input_variables=["question"])
- user_input = input("what's your question? ")
- prompt.format(question=user_input)
- chains – sequences of call
- llm = OpenAI(temperature=0.9)
- template = "what is a good name for a company that makes {product}?"
- prompt = PromptTemplate(input_variables=["product"], template=template)
- from langchain.chains import LLMChain
- chain = LLMChain(llm=llm, prompt=prompt)
- print(chain.run("colorful socks"))
- Memory – interface for memory and memory implementations
- from langchain.memory import ChatMessageHistory
- history = ChatMessageHistory()
- history.add_user_message("hi!")
- history.add_ai_message("whats up?")
- Indexes – utility functions to load your own data.
- from langchain.document_loaders import NotionDirectoryLoader
- from langchain.document_loaders import PyPDFLoader
- from langchain.document_loaders import UnstructuredEmailLoader
- loader = NotionDirectoryLoader("Notion_DB")
- loader = PyPDFLoader("your_file.pdf")
- loader = UnstructuredEmailLoader("example-email.eml")
- data = loader.load()
- from langchain.vectorstores import Pinecone, Weaviate, FAISS, ElasticVectorSearch, OpenSearchVectorSearch, Redis, AtlasDB, Milvus
- agents and tools
- from langchain.agents import load_tools
- from langchain.agents import initialize_agent
- from langchain.llms import OpenAI
- llm = OpenAI(temperature=0)
- tools = load_tools(["google-search", "wikipedia", "llm-math"], llm=llm)
- agent = initialize_agent(tools, llm, agent="zero-shot-react-description")
- agent.run("who is leo dicaprio's girlfriend? what is her current age raised to the 0.43 power?")
- Implementing Agents in LangChain – Comet – Agents in LangChain are systems that use a language model to interact with other tools. They can be used for tasks such as grounded question/answering, interacting with APIs, or taking action. LangChain provides: A standard interface for agents.
Prompt Engineering:
- https://en.wikipedia.org/wiki/Prompt_engineering
- the process of structuring text so that it can be interpreted and understood by gen AI.
- describes the task that the AI should perform.
- text to text:
- a short query – chatbot style.
- a longer sentence carries the context – e.g., "answer like Elon Musk".
- text to image, text to audio.
- a prompt is a description of the desired output. ex – "high-quality photo of an astronaut riding a horse".
- in-context learning – the ability to learn from prompts.
- context and bias do not carry from one conversation to another, beyond what is baked into the trained model.
- https://aws.amazon.com/what-is/prompt-engineering/
- guide gen ai to generate desired output.
- instruction to create high quality and desired output.
- use of formats, words, phrases and symbols.
- trial and error; maintain a collection of input texts.
- LLMs are open-ended.
- continuously refine the prompt until you get the desired outcome.
- advantage – greater developer control, improved user experience, increased flexibility.
- use cases –
- subject matter expert – medical field – enter symptoms and patient details; the model suggests possible associated diseases and narrows them down based on further input.
- critical thinking – solve complex problems. prompt a model to list all possible options, evaluate each option, and recommend the best solution.
- creativity – generating new ideas, concepts, or solutions.
- Techniques –
- Chain-of-thought prompting – breaks down a complex question into smaller, logical parts that mimic a train of thought.
- Tree-of-thought prompting – It prompts the model to generate one or more possible next steps. Then it runs the model on each possible next step using a tree search method.
- Maieutic prompting – the model is prompted to answer a question with an explanation, then prompted to explain parts of that explanation.
- Complexity-based prompting – perform several chain-of-thought rollouts, choose the rollouts with the longest chains of thought, then choose the most commonly reached conclusion.
- Generated knowledge prompting – first generate relevant facts needed to complete the prompt, then complete it; this often results in higher completion quality because the model is conditioned on relevant facts.
- Least-to-most prompting – the model is prompted first to list the subproblems of a problem, and then to solve them in sequence.
- Self-refine prompting – the model is prompted to solve the problem, critique its solution, and then re-solve considering the problem, solution, and critique; the loop repeats until it reaches a predetermined stopping condition.
- Directional-stimulus prompting – hint or cue, such as desired keywords, to guide the language model toward the desired output.
- best practices –
- unambiguous prompts, adequate context, balance targeted information and desired output, experiment and refine.
- https://youtu.be/aOm75o2Z5-o?si=rlo-1FsBcBFnxb2R
- elements of the prompt –
- input/context – here is the transcript of the podcast about gen ai. question – what do they say about LLMs?
- instructions – translate from English to German
- question – what is the meaning of life
- examples –
- output format –
- a prompt can omit any of the above, or mix and match them.
- examples
- one shot learning –
- few-shot learning – Question: The capital of France is? Answer: Paris. Question: The capital of Germany is? Answer:
- output format – output: Yes, No
- output: Provide a short answer and then explain your reasoning. (assembly sketch below)
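Those elements can be assembled mechanically; a plain-Python sketch (all strings illustrative):

```python
# Build a prompt from the elements above: instruction, context, few-shot
# examples, output-format constraint, and the actual question.
def build_prompt(instruction="", context="", examples=(), output_format="", question=""):
    parts = []
    if instruction:
        parts.append(instruction)
    if context:
        parts.append(f"Context:\n{context}")
    for q, a in examples:                       # few-shot examples
        parts.append(f"Question: {q}\nAnswer: {a}")
    if output_format:
        parts.append(f"Output format: {output_format}")
    if question:
        parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)

print(build_prompt(
    examples=[("The capital of France is?", "Paris")],   # one-shot
    output_format="Provide a short answer, then explain your reasoning.",
    question="The capital of Germany is?",
))
```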
- Use cases:
- summarization – summarize the following text
- classification – classify the following text into one of the classes – sports, finance, education,
- translation – translate from english to german
- text generation / completion – AI is
- question / answering – what is the meaning of life
- coaching – how would you improve the following script for a YouTube video about generative AI?
- image generation – generate an image of a cute puppy.
- General tips:
- direct instructions, clear question, concise and unambiguous language. provide context – relevant information, data. give examples. provide the desired output.
- encourage the model to be factual through other means (to avoid hallucinations). – example – are mRNA vaccines safe? Answer only using reliable sources and cite those sources.
- align prompt instructions with the tasks and goal – this is a conversation between a customer and a polite, helpful customer support agent. customer: can you help me? assistant: of course! what is your question?
- User personas to get more specific voices: – you are a kind customer support service agent. …
- Prompting techniques to control the output –
- Length control – write a 150 word summary on ….
- tone control – write a polite response.
- style control – give me the summary as bullet points.
- Audience control – explain this topic to a 5 year old kid.
- context control –
- scenario based guiding – you are a helpful customer support expert.
- chain-of-thought prompting – give worked examples or say 'let's think step by step'. example –
- i went to the market and bought 10 apples. i gave 2 apples to the neighbor and 2 to the repairman. i then went and bought 5 more apples and ate 1. how many apples did i remain with? Answer – first, you start with 10 apples. you give away 2 apples to the neighbor and 2 to the repairman, so you had 6 apples. then you bought 5 more apples, so now you had 11 apples. Finally, you ate 1 apple, so you would remain with 10 apples.
- i went to the market and bought 50 apples. I gave 3 apples to the neighbor and 7 to the repairman. i then went and bought 15 more apples and ate 3. how many apples did i remain with? Answer:
- zero shot chain of thought / COT – provide the question and then mention – let’s think step by step.
- avoid hallucination – "don't make anything up. select one or two relevant quotations from the text to back up your claim."
- let the model say "I don't know" to avoid hallucinating.
- Instruction – when you reply, first find exact quotes in the FAQ relevant to the user’s question. this is a space for you to write down relevant content and will not be shown to the user. Once you are done extracting relevant quotes, answer the question.
- break complex tasks into subtasks – "please follow these steps: 1. write three topic sentences arguing for {{statement}} 2. write three topic sentences arguing against {{statement}} 3. write an essay by expanding each topic sentence from steps 1 and 2, and adding … Assistant:"
- check the model's comprehension – give the context and question. "Human: I am going to give you a sentence and you need to tell me how many times it … for example, if i say 'i would like an apple' then the answer is '1' because the word 'apple' is in the sentence … you can reason through or explain anything you'd like before responding, but make sure at the very end, you end your … Do you understand the instructions? Assistant:"
- iterating tips – try different prompts to find what works best. when attempting few-shot learning, also try including direct instructions. rephrase a direct instruction set to be more or less concise (e.g., expand "translate" to "translate from english to spanish"). try different persona keywords to see how they affect the response style. use fewer or more examples in your few-shot learning.
model evaluation and optimization
- reference – linkedin posts.
- quantitative methods – numerical scores. metrics – Inception Score (IS), Fréchet Inception Distance (FID), Precision and Recall for Distributions (PRD), diversity score, coverage, mode collapse.
- qualitative methods – inspect the generated data visually or do the audit. visual inspection, pairwise comparison, preference ranking, interpolation, latent space exploration, conditional generation.
- Hybrid methods – human in the loop evaluation, adversarial evaluation, Turing test, perceptual quality assessment, structural similarity index, word error rate.
- challenges – choose right method. balance realism, diversity, and consistency. high-dimensional, multimodal, and complex nature of the data.
- BLEU (BiLingual Evaluation Understudy) is a metric for automatically evaluating machine-translated text.
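For example, BLEU can be computed with NLTK (toy sentences; smoothing avoids zero scores on short texts):

```python
# Sentence-level BLEU between a reference and a candidate translation.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["the", "cat", "is", "on", "the", "mat"]]   # list of references
candidate = ["the", "cat", "sat", "on", "the", "mat"]
score = sentence_bleu(reference, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")
```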
- https://learn.microsoft.com/en-us/azure/ai-studio/concepts/evaluation-approach-gen-ai
- classification – accuracy, precision, confusion matrix.
- domino model monitor
Response Quality
- https://www.linkedin.com/pulse/hmw-measure-quality-gen-ai-product-yue-claire-xiao
- Helpfulness:
- Language understanding and generation
- Relevance
- Diversity and Creativity
- Harmlessness
- Bias and Fairness
- User trust, safety, privacy
- Handling of Ambiguity and Edge Cases
- Latency
- Time to the first word (token)
- Avg time for generating each subsequent word (token).
- cycle – data source – data collection, cleaning, storage, model training, prompt engineering, gen ai output review, fine tune models and prompts, employee training.
- https://medium.com/slalom-data-ai/with-generative-ai-its-quality-in-quality-out-feb29dbbd919
Retrieval Augmented Generation / https://blogs.nvidia.com/blog/what-is-retrieval-augmented-generation/
- Retrieval-augmented generation (RAG) is a technique for enhancing the accuracy and reliability of generative AI models with facts fetched from external sources.
- It fills a gap in how LLMs work. LLMs are neural networks, typically measured by how many parameters they contain; an LLM's parameters essentially represent the general patterns of how humans use words to form sentences.
- link generative AI services to external resources, especially ones rich in the latest technical details.
- help models clear up ambiguity in a user query. reduces the possibility a model will make a wrong guess, a phenomenon sometimes called hallucination.
- have conversations with data repositories.
- generative AI model supplemented with a medical index could be a great assistant for a doctor or nurse
- RAG doesn’t require a data center. local llm, local data. user data – sentence transformer – vector library. interacts with llm.
- augmentation – the action or process of making or becoming greater in size or amount.
- enterprise knowledge base – retrieve documents – embedding model – vector db. doc retrieval and ingestion. user query and response generation: user – enterprise app – user query – embedding model – query and embedded query – vector db – prompt / query / enhanced context – llm – respond to user.
- user asks llm. ai model sends query to another model. convert to numeric format / vector / embedded model. compare to vectors in a machine readable index from KB. retrieve related data and return it back.
- important – keep the sources current. continuously update machine readable indices. llm reads question and chat history. do similarity search in vector store. output of matching vectors given to llm. answer passed on to user. (apply filters for legality).
What is Retrieval-Augmented Generation (RAG)? (youtube.com)
- Generation – LLMs generating text in response to a user query/prompt.
- Undesirable behavior –
- which planet has the most moons in the solar system? a person answers from past knowledge – whatever is top of mind, with no source to back it, and possibly outdated.
- the same challenge applies to LLMs. instead, check a trusted source like NASA – no hallucination.
- LLM responds to user. LLM <-> User. To make it reliable add a content store <-> LLM <-> User.
- Content store – open internet, closed – data, document, policies,…
- user – prompt – question – response.
- RAG – add an instruction to retrieve relevant content, combine it with the user's question, and then generate the response.
- instead of retraining the model, just update the information/data. the model should be able to say "I do not know."
- negative effect – if retrieval is not good and correct, the response suffers.
What is Retrieval Augmented Generation (RAG) – Augmenting LLMs with a memory (youtube.com)
- Documents/Chunked Texts -> generate embeddings -> Vector DB <- prompt embedding
- Vector DB -> context. prompt + context -> llm -> result
- hallucination – the model returns random things that seem true but aren't; it does not know the answer.
- it predicts the next likely word statistically, trained on essentially the entire internet.
- it does not understand what it is talking about; it predicts one probable word at a time.
- reason – unable to find relevant data. don’t know which data to refer to.
- user -> question -> retrieval query -> KB -> retrieved query -> Prompt (Question + sources found) -> llm -> response – user. safe and aligned.
- disadvantage of RAG -> limits the answers to the KB, which is finite and not as big as the internet.
- Jerry Liu, creator of LlamaIndex.
- accuracy, relevancy
- AI tutor -> validate the question -> find sources in the DB -> digest the sources with ChatGPT.
- RAG based chatbot, medical assistant, lawyer, …
- input factual and accurate information; ingest the data into memory as chunks of text of about 500 characters.
- use the OpenAI ada model (embedding model) -> create vector embeddings -> save them in memory / a vector DB.
- additional things to consider – how to determine when to answer a question or not; relevance; whether it is in the documentation; understanding new terms/acronyms; finding relevant information efficiently and accurately; etc.
- Techniques to improve these concerns – better chunking methods, re rankers, query expansion, agents, etc. -> Tutorials -> Advanced RAG with langchain and llamaindex, Training and Fine tuning llms for production, langchain and vector DBs in production
- gigantic web scale data set -> pre train -> base llm + private KB -> supervised fine tuning -> fine tuned llm
- how to build on RAG techniques
- learn – advanced rag techniques with llamaindex, build rag agents, build rag evaluation systems
- combination of prompting, rag, llm, fine tuning
- reducing hallucinations by limiting the llm to answer based on existing documentation
- helping with explainability, error checking and copyright issues by clearly referencing its sources for each comment.
- giving private/specific or more up to date data to the llm
- not relying on black-box LLM training / fine-tuning for what the model knows and has memorised.
- develop apps with advanced techniques, build RAG agents, evaluate RAG systems.
- RAG tools – loading, indexing, storing, querying
- langchain vs llamaindex libraries.
- query expansion, transformation, reranking, recursive retrieval, optimisation and production tips and techniques with llamaindex
- activeloop’s deep memory used to improve accuracy
- financial analysis, biomedical, legal, ecommerce, – code projects.
- chat with outfit recommender, medical pill recognizer, weather in your area.
- investor presentation analyzer
- deep memory boosts retrieval accuracy by up to 22%.
- main platforms – activeloop’s deep lake, open ai, llamaindex, langchain, langchains langsmith
- Coding environment – code editor / visual studio, python virtual environment, google colab notebook
- access to deeplake tensor database. create org, create api token. community free trial.
Chatbots with RAG: LangChain Full Walkthrough – YouTube
- typical scenario – question -> llm -> result.
- to avoid hallucinations, misinformation
- RAG – question -> embedding model -> query vector -> vector db -> relevant contexts + question -> retrieval augmented query -> llm
- attributes to set – mode, model, temperature, max length, top P, frequency penalty, presence penalty.
- langchain – a framework for building applications around LLMs; LLMChain is its basic unit, combining a prompt template with an LLM.
- examples/learn/generation/langchain/rag-chatbot.ipynb at master · pinecone-io/examples · GitHub
- Knowledge learned during training -> llm. basic solution without RAG
- import / install langchain, openai, datasets, pinecone client, tiktoken
- import os, import ChatOpenAI, create environment, create model object by passing environment and model used.
- prompt the model by creating JSON message objects for the different roles – system, user, assistant.
- in langchain import schema for systemmessage, humanmessage, aimessage and create message object similar to the json one.
- system – you are a helpful assistant, human – hi ai, how are you today?, aimessage – i’m great thank you. how can i help you?, human – i’d like to understand string theory
- pass the messages to the chat object created earlier to get the response.
- print the content of the response.
- append the response to the message and create a new prompt.
- append the new prompt to the message.
- repeat the ask/print/append steps above in a loop.
- solution with RAG bypassing vector query and hard coding the query context.
- Knowledge learned during training -> llm<–> RAG / sql search <–> subset of data.
- training data -> llm parametric knowledge. – frozen in time.
- training data -> llm parametric knowledge + vector db / source knowledge (knowledge we insert into the prompt) – can add/delete/update.
- prompt input – Instructions + contexts (external info) + Question
- with this augmented prompt, repeat the same ask/print/append loop.
- solution with RAG
- import load dataset, load the chunked data with the path and the split option.
- it will extract metadata, column attributes, total number of rows, etc.
- initialize pinecone to create the knowledge base (open a pinecone account, note the environment). initialize the index – name, dimensions (length of the embedding vectors), metric – cosine.
- connect to the index. get index statistics.
- import the OpenAIEmbeddings library; create its object.
- create embeddings. give text input to embed documents.
- loop over the dataset, embed each batch, gather metadata to store in pinecone (text, source, title), and upsert the vectors to the pinecone DB.
- describe index statistics – dimension, index fullness, namespaces, total vector count – and compare with the statistics from before the embedding step.
- import pinecone and initialize the vector store object.
- initialize query string. do similarity search in vector db passing the query.
- repeat the ask/print/append chat loop, now grounded in the retrieved context (condensed sketch below).
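A condensed sketch of that notebook flow (the older pinecone-client 2.x and langchain 0.0.x APIs used in the video; both libraries have since changed, and the index/field names here are illustrative):

```python
# RAG query flow: embed the question, retrieve similar chunks, augment prompt.
import pinecone
from langchain.chat_models import ChatOpenAI
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.schema import HumanMessage, SystemMessage
from langchain.vectorstores import Pinecone

pinecone.init(api_key="YOUR_PINECONE_KEY", environment="YOUR_ENV")
index = pinecone.Index("rag-demo")            # created with metric="cosine"

embed = OpenAIEmbeddings(model="text-embedding-ada-002")
vectorstore = Pinecone(index, embed.embed_query, "text")  # "text" = metadata key

query = "What is so special about Llama 2?"
docs = vectorstore.similarity_search(query, k=3)          # retrieval step
context = "\n---\n".join(d.page_content for d in docs)

chat = ChatOpenAI(model_name="gpt-3.5-turbo")
messages = [
    SystemMessage(content="Answer using only the contexts provided."),
    HumanMessage(content=f"Contexts:\n{context}\n\nQuestion: {query}"),
]
print(chat(messages).content)
```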
LangChain & Vector Databases in Production (activeloop.ai)
- https://platform.openai.com/ create account. use google account.
- https://platform.openai.com/account/api-keys section, key, secret key.
- Deep Lake API token, Activeloop’s website Create API token
- cost of OpenAI usage – under $3; alternatively see Large Language Models and LangChain -> Using the Open-Source GPT4All Model Locally.
- Welcome To Colaboratory – Colaboratory (google.com) or Python virtual env or code editor/Visual studio code
- common packages – langchain, deeplake, openai, tiktoken, selenium. check the version on Course Intro (activeloop.ai)
- create a .env file in Google Drive and load it using the dotenv library, or create a virtual python environment.
Retrieval Augmented Generation for Production with LangChain & LlamaIndex – Activeloop
- Langchain: Basic concepts recap:
- Preprocessing the data :
- structuring documents,
- document loaders simplify the process of loading data into documents
- text splitters break down lengthy pieces of text into smaller chunks for better processing
- indexing – creating a structured db of information that the language model can query to enhance its understanding and responses.
- Document loaders
- load documents into structured data.
- input – pdf, s3, public websites, …
- convert into data type that can be processed by the other langchain functions.
- Create document objects. more than 100 document loaders.
- CSVLoader, TextLoader, DirectoryLoader, UnstructuredMarkdownLoader, PyPDFLoader, WikipediaLoader, UnstructuredURLLoader, GoogleDriveLoader, MongodbLoader,
- Document transformers (chunking methods)
- fetch relevant details of documents. several transformation steps.
- splitting/chunking
- Several transformation algorithms, optimized logic
- GPT-4 – 8000 tokens initially.
- embedding model ada-002 – 8000 tokens. (about 16 pages)
- fixed size chunks – sufficient for semantically meaningful paragraphs,
- overlapping for continuity, context preservation
- improve coherence and accuracy of the created chunks.
- CharacterTextSplitter
- variable size chunks – partition the data based on content characteristics
- end of sentence, punctuation marks, end of line, NLP features.
- preserve coherent and contextually intact content in all chunks.
- RecursiveCharacterTextSplitter (see the sketch after this list)
- customized chunking –
- append document title to chunks to prevent context loss.
- MarkdownHeaderTextSplitter
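A short sketch of the two main splitters named above, using langchain's text_splitter module (chunk sizes are arbitrary):

```python
# Fixed-size chunks with overlap vs recursive splitting on natural boundaries.
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter

text = "First paragraph about one topic.\n\nSecond paragraph about another.\n\n" * 50

fixed = CharacterTextSplitter(separator="\n\n", chunk_size=500, chunk_overlap=50)
recursive = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)

print(len(fixed.split_text(text)), len(recursive.split_text(text)))
print(recursive.split_text(text)[0][:80])  # chunks break on paragraph boundaries
```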
- Indexing
- store and organize data from different sources into vector store.
- storing the chunk along with an embedding representation of it
- OpenAIEmbeddings models.
- Models – LLMs, Embedding models, The role of vector stores, retrievers,
- LLMs – LLM class to interact with various language model providers.
- examples – OpenAI, Cohere, Hugging Face Hub, Llama-cpp, Azure OpenAI
- install langchain, openai, tiktoken, cohere
- load environment.
- import ChatOpenAI models, import langchain schema for human and system message
- create messages array object with values for systemMessage and HumanMessage
- start the chat by passing message array object.
- run the code to view the output.
- three types of message – system, human and ai
- SystemMessage – set the behavior and objectives of the chat model. examples – act as a marketing manager; reply in JSON; include explanation text.
- HumanMessage – input the user prompt
- AIMessage – response from the model.
- Embedding models –
- standard interface for embedding model providers like openai, cohere, huggingface.
- transform text into vector representations enable semantic search in vector space.
- embed_documents method is used to embed multiple texts, providing a list of vector representations.
- import OpenAIEmbeddings, initialize model by calling OpenAIEmbeddings()
- embed_documents() to embed the documents.
- len(embeddings) to get the number of documents.
- len(embeddings[0]) to get dimension of each embedding.
- consistent output dimensionality irrespective of the input's length, while capturing the meaning of the sequence.
- enables measuring sentence similarity using similarity metrics, e.g., cosine similarity (sketch below).
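A short sketch of those embed_documents calls plus the cosine-similarity check (assumes an OPENAI_API_KEY is set; numpy used for the math):

```python
# Embed two documents and measure their cosine similarity.
import numpy as np
from langchain.embeddings import OpenAIEmbeddings

embeddings_model = OpenAIEmbeddings()                  # ada-002 by default
embeddings = embeddings_model.embed_documents(["Hi there!", "Oh, hello!"])
print(len(embeddings), len(embeddings[0]))             # 2 vectors, 1536 dims each

a, b = np.array(embeddings[0]), np.array(embeddings[1])
cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(cosine)                                          # closer to 1 = more similar
```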
- Vector Stores
- effectively store and search vector embeddings. manage vector data.
- Embeddings – high dimensional vectors to capture the semantics of textual data.
- traditional db are not optimized for high dimensional data.
- advantages – speed (quick data retrieval), scalability (handle the growth efficiently), precision (specialized algo for nearest neighbor search) – most relevant results.
- Retrievers – Interfaces in langchain to return documents in response to the query.
- example – compare the angle between query and the documents using cosine similarity.
- Semantically aligned responses.
- Advanced retrieval approaches –
- Parent document retriever – creates multiple embeddings: search over smaller chunks but return the larger context. related content is discovered via the small chunks, then the parent document is used.
- self query retriever – logic for metadata filters; gets the most out of user prompts by using the document and its metadata to retrieve the most relevant content.
- Chains – LLMChain, Sequential, Memory
- powerful reusable components – perform complex tasks.
- integrate prompt templates with llm using chains.
- take the output of one llm and use it as input for the next. connect multiple prompts sequentially.
- LLMChain, SequentialChain
- LLMChain –
- simplest form of chain. transform user input using prompt template.
- receive the user input and pass it to a prompt-template class to create the prompt.
- StrOutputParser – ensure that we receive a string containing the responses from llm
- LCEL / LangChain Expression Language – pipe syntax that is easier to interpret (sketch below).
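- a minimal LCEL sketch of the simplest chain (prompt -> llm -> string); the prompt is illustrative:

```python
# Hedged sketch: LLMChain-style pipeline written in LCEL pipe syntax.
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.schema.output_parser import StrOutputParser

prompt = PromptTemplate.from_template(
    "Suggest one name for a company that makes {product}.")
llm = ChatOpenAI(temperature=0)

chain = prompt | llm | StrOutputParser()  # StrOutputParser -> plain string out
print(chain.invoke({"product": "eco-friendly water bottles"}))
```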
- SequentialChain
- make a series of subsequent calls to llm
- output of one call as input to another.
- example – create two distinct chains: 1. generate a social media post based on a theme; 2. a social media expert reviews the generated post (sketch below).
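- a sketch of that two-chain example using SimpleSequentialChain (the single-input variant of SequentialChain); the prompts are illustrative:

```python
# Hedged sketch: output of chain 1 (the post) becomes input of chain 2 (the review).
from langchain.chains import LLMChain, SimpleSequentialChain
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate

llm = ChatOpenAI(temperature=0.7)

# 1. generate a social media post based on a theme
generate = LLMChain(llm=llm, prompt=PromptTemplate.from_template(
    "Write a short social media post about {theme}."))

# 2. a social media expert reviews the generated post
review = LLMChain(llm=llm, prompt=PromptTemplate.from_template(
    "You are a social media expert. Improve this post:\n{post}"))

overall = SimpleSequentialChain(chains=[generate, review], verbose=True)
print(overall.run("sustainable travel"))
```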
- Memory:
- backbone for maintaining context in the ongoing dialogue.
- coherent and contextually relevant response.
- context preservation
- store input and output in structured manner.
- personalized and relevant responses; remember and refer to past interactions.
- conversational applications (memory sketch below).
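- a minimal memory sketch with ConversationBufferMemory; the dialogue is illustrative:

```python
# Hedged sketch: stored history lets the model refer to past interactions.
from langchain.chains import ConversationChain
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory

conversation = ConversationChain(
    llm=ChatOpenAI(temperature=0),
    memory=ConversationBufferMemory(),  # stores inputs/outputs in a structured way
)

conversation.predict(input="Hi, my name is Sam.")
print(conversation.predict(input="What is my name?"))  # answers from stored context
```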
- llamaindex:
- data framework to connect your data to LLMs and get the results into production.
- build llm app on private data.
- data ingestion (take from sources: api, pdf, docs, sql, …) -> data structure (index, process, add value to data) -> retrieval and query interface (processed data, advanced query interfaces: QA, summarization, agents and more)
- structured db, vector db, graph db, kv db
- RAG + llamaindex, challenges with RAG, evaluation, optimizing RAG
- Use cases ->
- Document processing, tagging and extraction –
- document -> topic, summary, author
- conversational agent –
- KB + Answer sources -> KB and QA -> Agent,
- workflow automation –
- inbox -> read -> workflow ( read message, send email) -> write -> email.
- Inserting knowledge -> retrieval augmentation -> fix the model, put context in the prompt.
- KB (document) -> input prompt (Context; given the context, answer the question – query_str) -> llm (pre-determined model – gemini, cohere, openai, …). creating a pipeline from source data into the llm
- fine tuning -> baking knowledge into the weights of the network.
- llamaindex -> data framework for llm applications. -> data management, query engine, components for ingestion, indexing, query.
- how rag works -> doc -> chunks -> vector db -> chunk -> llm
- ingestion – input doc -> split the raw text into (e.g. even-sized) chunks, generate an embedding per chunk (e.g. with sentence_transformers) -> vector db (store each chunk in the vector db).
- Process – find top k most similar chunks from vector DB collection, plug into llm response synthesis module. vector db -> chunk -> llm (retrieval and synthesis)
- llamaindex -> build rag systems for llm based applications. combine fetching of relevant information from a vast db with the generative capabilities of llms. provides supplementary info to the llm for a posed question to ensure that the llm does not generate inaccurate responses.
- Vector Stores – store large, high dimensional data. tools to retrieve relevant documents semantically.
- Analyze the embedding vectors that encapsulate the entire document's meaning.
- primary function is the similarity search, aiming to locate vectors closely resembling a specific query vector.
- Use case – Recommendation engines, image retrieval platforms, pinpoint contextually relevant data.
- Data Connectors – Readers parse and convert the data into a simplified document representation consisting of text and basic metadata. streamline data ingestion, automate the task of fetching data from various sources (api, pdf, sql), and format it.
- llamahub has data connectors for all possible data formats.
- install packages, set the openai api key for llamaindex. install llama-index, openai, cohere
- download_loader method – to access integrations from llamahub and activate them by passing the integration name.
- WikipediaReader class – input – one or more page titles. return object – Document (sketch below)
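- a sketch of the download_loader flow (older llama-index API; the wikipedia package must be installed, and the page title is illustrative):

```python
# Hedged sketch: activate the WikipediaReader integration from LlamaHub.
from llama_index import download_loader

WikipediaReader = download_loader("WikipediaReader")
loader = WikipediaReader()
documents = loader.load_data(pages=["Natural language processing"])
print(len(documents))  # one Document object per page title
```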
- Nodes – llamaindex transforms documents into node objects.
- contains metadata and contextual information.
- NodeParser Class – convert the content of documents into structured nodes.
- SimpleNodeParser – convert a list of document objects into nodes (sketch below).
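- a sketch of SimpleNodeParser; the chunk settings are illustrative:

```python
# Hedged sketch: convert Document objects into structured nodes.
from llama_index.node_parser import SimpleNodeParser

parser = SimpleNodeParser.from_defaults(chunk_size=512, chunk_overlap=20)
nodes = parser.get_nodes_from_documents(documents)  # documents from the reader above
print(len(nodes))
```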
- Indices – Index and search data from formats like documents, pdfs, db queries, …
- initial step – transform unstructured data into embeddings that capture semantic meaning and optimize the data format for easy access and querying.
- Summary index – extract a summary from each document and store it with all the nodes.
- VectorStoreIndex – generates embeddings during index construction, identifies the top-k most similar nodes. suitable for small-scale applications and easily scalable to accommodate larger datasets using a high-performance vector db.
- Query -> query embedding -> vector store (node 1, 2, 3) -> embedding 1, 2, 3 -> similarity top k = 2
- response synthesis
- DeepLakeVectorStore – create a dataset in Activeloop and append documents. first set the openai and activeloop api keys in the environment. provide the dataset path as an argument. cloud-based vector store.
- StorageContext – to create a storage context. pass it to VectorStoreIndex to generate embeddings and store the results on the defined dataset (sketch below).
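- a sketch, assuming OPENAI_API_KEY and ACTIVELOOP_TOKEN are already set; the dataset path is a placeholder:

```python
# Hedged sketch: store embeddings in a cloud Deep Lake dataset via StorageContext.
from llama_index import StorageContext, VectorStoreIndex
from llama_index.vector_stores import DeepLakeVectorStore

vector_store = DeepLakeVectorStore(dataset_path="hub://<org>/<dataset>")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# embeddings are generated at construction time and stored in the dataset
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
```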
- Query Engines – wrapper that combines a retriever and a response synthesizer into a pipeline. the query string is given as input to fetch nodes, which are sent to the llm to generate a response. as_query_engine() to create a query engine.
- GPTVectorStoreIndex – class to construct a vector store index. from_documents() builds indexes on the processed documents. a query_engine generated from the created index allows asking questions about the documents using the query() method (sketch below).
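- a sketch of index -> query engine -> answer; the question is illustrative:

```python
# Hedged sketch: build the index, wrap it in a query engine, ask a question.
from llama_index import GPTVectorStoreIndex

index = GPTVectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()  # retriever + response synthesizer
response = query_engine.query("What does the document say about transformers?")
print(response)
```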
- Routers – determine the most appropriate retrievers for extracting context from the KB. a routing function selects the optimal query engine for each task, improving performance and accuracy; the router can determine which data source is most applicable to the given query.
- Saving and loading indexes locally – required for rapid testing. stores nodes and the associated embeddings on disk via the persist() method of the storage_context. minimizes repetitive processing: if the index already exists in storage, load it instead of recreating it (sketch below).
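- a sketch of persisting and reloading; the storage directory is illustrative:

```python
# Hedged sketch: persist the index to disk, then load it instead of rebuilding.
from llama_index import StorageContext, load_index_from_storage

index.storage_context.persist(persist_dir="./storage")

# later / on restart: load without recreating nodes and embeddings
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)
```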
- LangChain vs LlamaIndex –
- LlamaIndex – process, structure, and access private or domain-specific data. links LLMs to the data source. a data framework. LlamaHub – dedicated data loaders. efficient indexing and retrieving, easily add new data points, improved chunking strategy, supports multimodality, uses the llm to manipulate data during indexing or querying. llm fine-tuning, embedding fine-tuning; sub-questions and routing enable the use of multiple data sources. free.
- LangChain – dynamic, suited for context-rich interactions and effective for applications like chatbots and virtual assistants. interact with llm, vector stores, prompt templates, chains, prompt strategy, model and output. Retriever function to query. LangSmith for agent. free.
- OpenAI Assistants – SaaS, 20 files up to 512 MB each, wide range of file types accepted, GPT + any fine-tuned model. threads and messages to keep track of users' conversations. code interpreter, knowledge retriever, custom function calls. paid.
- Naive RAG –
- bad retrieval – low precision (not all retrieved chunks are relevant), low recall (not all relevant chunks are retrieved), outdated information (redundant or out-of-date data).
- Bad response generation – hallucination, irrelevance, toxicity/bias –
- data – store additional info along with raw text chunks.
- embeddings – optimize embedding representations.
- retrieval – do better than top-k embedding lookup
- synthesis – use llms for more than generation.
- doc -> chunk -> vector db / deep memory / embeddings -> retrieval -> chunk -> synthesis -> llm
- Evaluation – evaluate in isolation (retrieval, synthesis), evaluate end to end
- evaluate in isolation – user query -> retriever -> retrieved ids.
- steps – create a deeplake dataset, run the retriever over the dataset, measure ranking metrics: retrieved IDs vs expected IDs -> retriever evaluator (sketch below)
- e2e evaluation – evaluate the final generated response. steps – create a deep lake dataset, run the full RAG pipeline, collect evaluation metrics. generated response (optional context) -> label-free evaluator (faithfulness, relevancy, toxic-free, adheres to guidelines). generated response + actual response -> with-label evaluator -> correctness, etc.
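- a hedged sketch of retrieval evaluation in isolation using llama-index's RetrieverEvaluator; the query and expected node IDs are illustrative placeholders:

```python
# Hedged sketch: measure ranking metrics (retrieved IDs vs expected IDs).
from llama_index.evaluation import RetrieverEvaluator

retriever = index.as_retriever(similarity_top_k=2)
evaluator = RetrieverEvaluator.from_metric_names(
    ["mrr", "hit_rate"], retriever=retriever)

result = evaluator.evaluate(
    query="What is a transformer?",
    expected_ids=["node-id-1", "node-id-2"],  # ground-truth chunk IDs
)
print(result.metric_vals_dict)
```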
- Answer questions with RAG.
- llamaindex – bridge between data and llms. ingests data (apis, sql db, pdf, …) and structures it into a format easily consumable by llms. provides data connectors for various data sources. indexing for quick retrieval. nlp query engine to make data interactive. LlamaHub – a platform to aggregate custom plugins for all data types.
- Activeloop Deep Lake – storage layer; it stores the github repositories indexed by llamaindex. optimized storage; data type support – images, videos, and complex data structures.
- OpenAI Python Package – interface for gpt models and other services from openai. make api calls. api integration and text generation. used in llamaindex.
- python-dotenv – allows specifying env variables in a .env file. environment variable management – store configuration variables in a .env file. easy import – automatically import variables from .env into the python environment.
- LlamaIndex workflow – load documents, parse the documents into nodes, construct an index from nodes or documents, query the index.
- load documents – load raw data into the system, manually or with a data loader. specialized data loaders transform the data into document objects.
- parse the documents into nodes – parse loaded documents into nodes, structured data units, node has chunks of documents along with metadata and relationship information. raw data to structured format.
- construct an index from nodes or documents – index is constructed to make the data searchable and queryable. VectorStoreIndex.
- query the index – allows to make nlp queries against the indexed data. Conversationally ask the system questions, sift through the indexed data to provide accurate and relevant answers.
- code – .env file, main.py file (sketch below)
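- a minimal main.py sketch of the four-step workflow above; the data directory and question are illustrative:

```python
# main.py – hedged sketch: load -> parse -> index -> query (llama-index).
from dotenv import load_dotenv
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.node_parser import SimpleNodeParser

load_dotenv()  # reads OPENAI_API_KEY from the .env file

# 1. load documents
documents = SimpleDirectoryReader("./data").load_data()

# 2. parse the documents into nodes
nodes = SimpleNodeParser.from_defaults().get_nodes_from_documents(documents)

# 3. construct an index from the nodes
index = VectorStoreIndex(nodes)

# 4. query the index
query_engine = index.as_query_engine()
print(query_engine.query("Summarize the loaded documents."))
```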
Is AUTOGEN Microsoft’s Langchain alternative or much BIGGER??? – YouTube
- Agent framework. allows to create multi agent systems.
- langchain or llama index. for documents.
- framework to simplify orchestration, optimization, and automation of llm workflows.
- define agents. agents talk to each other.
- entities – define a set of agents; an agent can partner with a human being or play a role (e.g. a chess player).
- define interaction behavior between agents. or with human being.
- User – Question – commander – question – Writer – code – commander – safeguard – clearance – commander – log – writer – answer – commander – final answer – user.
- three agents – commander, writer, safeguard
- Multi agent conversations.
- flexible conversation patterns – Joint chat. Hierarchical chat.
- User proxy / human -> assistant agent
- example – chart to compare stock price change year till date. ytd. ….
- Code
- pip install pyautogen
- import autogen
- assistant = autogen.AssistantAgent("assistant")
- user_proxy = autogen.UserProxyAgent("user_proxy")
- user_proxy.initiate_chat(assistant, message="show me the ytd gain of 10 largest technology companies as of today.")
- # this triggers an automated chat to solve the task
- example – Chessboard -> Human / AI Chess player A -> Human / AI Chess player B. 3 agents.
- AutoGen is a combination of –
- communicate with agents.
- define agents to communicate with each other.
- license – Attribution 4.0; allows commercial use, modification, distribution, and private use.
- install dependencies
- open collab
- pip install -qU \
- pinecone-client==3.0.0 \
- pandas==2.0.3
- from pinecone import Pinecone
- import os
- use_serverless = True
- api_key = os.getenv("PINECONE_API_KEY") or "USE_YOUR_API_KEY"
- pinecone.init(api_key=api_key, environment='us-east1-gcp')  # legacy (pre-3.x) client style
- #or
- pc = Pinecone(api_key=api_key)
- #check version compatibility of client and server
- import pinecone.info
- version_info = pinecone.info.version()
- server_version = ".".join(version_info.server.split(".")[:2])
- client_version = ".".join(version_info.client.split(".")[:2])
- assert client_version == server_version, "please upgrade pinecone-client."
- #create vector index
- from pinecone import ServerlessSpec, PodSpec
- if use_serverless:
- spec = ServerlessSpec(cloud='aws', region='us-west-2')
- else:
- spec = PodSpec(environment=environment)
- index_name = 'hello-pinecone'
- if index_name in pc.list_indexes().names():
- pc.delete_index(index_name)
- import time
- dimensions = 3
- pc.create_index(
- name=index_name,
- dimension=dimensions,
- metric='cosine',
- spec=spec
- )
- while not pc.describe_index(index_name).status['ready']:
- time.sleep(1)
- index = pc.Index(index_name)
- import pandas as pd
- df = pd.DataFrame(
- data={
- 'id': ['A', 'B'],
- 'vector': [[1., 1., 1.], [1., 2., 3.]]
- })
- df
- index.upsert(vectors=zip(df.id, df.vector))  # insert vectors
- {'upserted_count': 2}
- index.describe_index_stats()
- {'dimension': 3,
- 'index_fullness': 0.0,
- 'namespaces': {'': {'vector_count': 2}},
- 'total_vector_count': 2}
- index.query(
- vector=[2., 2., 2.],
- top_k=5,
- include_values=True)  # returns top_k matches
- # delete the index
- pc.delete_index(index_name)
- ////////////
- Managing Indexes
- list of index, create, describe, delete index.
- to store vectors, metadata, search, query.
- !pip install pinecone-client
- import pinecone
- pinecone.init("<<YOUR_API_KEY>>", environment='us-west1-gcp')
- pinecone console – get the api key and replace
- pinecone.list_indexes()
- pinecone.create_index('example-index', dimension=128, metric='euclidean', shards=2)
- pinecone.describe_index('example-index')
- # pinecone.delete_index('example-index')
- Inserting data
- import random
- ids = ['a', 'b', 'c', 'd', 'e']
- vecs = [[random.random() for _ in range(128)] for _ in range(5)]
- index = pinecone.Index('example-index')
- index.upsert(vectors=zip(ids, vecs))
- #upsert batches
- import itertools
- vector_dim = 128
- vector_count = 10000
- #example generator that generates many (id, vector) pairs
- example_data_generator = map(
- lambda i:
- (f'id-{i}', [random.random() for _ in range(vector_dim)]),
- range(vector_count))
- # function to handle chunking of pairs
- def chunks(iterable, batch_size=100):
- """A helper function to break an iterable into chunks of size batch_size."""
- it = iter(iterable)
- chunk = tuple(itertools.islice(it, batch_size))
- while chunk:
- yield chunk
- chunk = tuple(itertools.islice(it, batch_size))
- for chunk in chunks(example_data_generator):
- index.upsert(vectors=chunk)
- # upserts in parallel
- # upsert data with 100 vectors per upsert request, asynchronously:
- # – create pinecone.Index with pool_threads=30
- # – Pass async_req=True to index.upsert()
- with pinecone.Index('example-index', pool_threads=30) as index:
- #send requests in parallel
- async_results = [
- index.upsert(vectors=ids_vectors_chunk, async_req=True)
- for ids_vectors_chunk in chunks(example_data_generator, batch_size=100)
- ]
- #wait for and retrieve responses (this raises in case of error)
- [async_result.get() for async_result in async_results]
- Managing Data
- index.fetch(ids=['id-0', 'id-1'])
- index.upsert(vectors=[('id-0', [0.0] * 128)])
- index.fetch(ids=['id-0'])
- index.delete(ids=['id-1'])
- index.fetch(ids=['id-1'])
- index.delete(ids=['id-1'], namespace='example-namespace')
- index.delete(delete_all=True, namespace='example-namespace')
- Querying Data
- import random
- queries = [[random.random() for _ in range(128)] for _ in range(2)]
- index.query(
- queries=queries,
- top_k=3,
- include_values=True
- )
- Metadata filters
- metadata = [
- {'genre': 'comedy', 'year': 2018},
- {'genre': 'drama', 'year': 2021}
- ]
- index.query(
- queries=queries,
- top_k=3,
- filter={'genre': {'$ne': 'documentary'},
- 'year': {'$gte': 2020}},
- include_metadata=True
- )
How to Choose a Vector Database (youtube.com)
- Vector Search – search through vector representations of data to find similar records. semantic search – embedding model – vectors/embeddings. kNN or ANN algo.
- keyword search vs vector search
- keyword – match search terms to text in an inverted index. difficult to find items with similar meaning but containing different keywords. not suitable for multimodal or multilingual search
- vector – utilizes NN models to represent objects (text, images), queries as high dimensional vectors. ranking based on vector similarity. allows finding items with similar meaning or of different modality.
- Text search – TF-IDF / bag of words (does not account for semantic context, does not respect word order). vector search also covers images, audio, video – e.g. a text query to find an image.
- KNN/ANN – Vector DB – Neural frameworks – Encoders – App business logic – UI.
- What is a vector DB – vector data types, geometric filters, updates/deletes/traditional metadata filters, freshness/low latency, model queries, low selectivity, cpu-bound workloads. stores vector embeddings for fast retrieval and similarity search; horizontal and vertical scaling, update/delete operations, metadata storage, metadata filtering.
- Use cases – image similarity (knn search), multilingual search, Q&A, Recommenders, Google Talk to Books, car image search, e-commerce – multimodal search. metric learning, semantic search, anomaly detection, classification, multi stage ranking