Generative AI . What, How

What is generative AI and how does it work? – The Turing Lectures with Mirella Lapata – YouTube

  1. Generative – create new content – audio, code, images, text, video
  2. AI – automatically use a computer program
  3. G-AI – Examples – Google Translate since 2006, Siri since 2011, Phone/search… helps to complete sentence.
  4. 2023 – GPT 4 – SAT exam 90%. GPT – Generative pretrained transformers.
  5. Write text – essay writing.
  6. act as a developer and create code.
  7. create about me page for the website. input – likes, dislikes
  8. ChatGPT – 2 months to reach 100m users. Google translate took 78 months.
  9. core technology –
    • sequence of words. context so far. example – I want to -> shovel/play/swim/eat -> select play -> tennis/video/…
    • Principal – Language modelling. Build your own language model.
      • 1. large corpus- wiki, stackoverflow, quora, social media, github, reddit
      • 2. ask LM to predict the next word – randomly truncate last part of input sentence, calculates probabilities of missing words, adjust and feedback to the model to match the ground truth.
      • 3. repeat over the whole corpus over months and years.
    • old ways – count the words to predict.
    • new ways – Neural networks
    • 1. 5*8 + 8, 8 * 4 + 4, 4*3+3 layers. Add more layers. 5 inputs at start. 3 outputs at the end. +8, +4,.. is a bias for corrections. 99 trainable parameters sum of all the layers including bias.
    • size, cost,… based on number of parameters.
    • 2. Middle layers will generalize the inputs and see patterns which are not there.
    • 3. Vectors, series of numbers. Edges are the weights / numbers.
    • 4. big or small NN. Number of parameters.
    • based on context, predict next.
    • Neural language model. likelihood methods.
    • Real life NN – > Monsters made of blocks.
    • input -> token and positional embedding (vectors) -> ((Masked self attention + feed forward NN = decoder block) ,… multiple blocks) = decoder only architecture) -> output/prediction. This a architecture of Transformers. Each block has neural networks inside. Mini NN. 2017. No new architecture after that.
    • Generic data is used to create pre trained model. -> transfer learning -> create fine tuned model using domain or task specific data. use it for specialized task like creating a diagnosis report.
    • example – input – the chicken walked. output – across the road <EOS>
    • 2018 onwards extreme increase in model size.
    • number of parameters GPT 4 -> 1 trillion parameters. Human brain – 100 trillion parameters. rat brain – 10 to the power 11.
    • number of words processed by LLMs. GPT-4 -> 100 billion words. Human reads much less in the life time.
  10. Cost – GPT-4 100 Million USD.
  11. Tasks –
    • 8 billion parameters – language understanding, arithmetic, question answering
    • about 24 billion parameters+ – summarization
    • about 56 billion parameters+ – translation, code completion, common sense reasoning
    • about 200 billion parameters+ – local inference chains, semantic parsing, proverbs, general knowledge, reading comprehension, physics QA, joke explanations, dialogue,
  12. 2000 plus problems and fine tune the language models.
  13. Framework – HHH – helpful, honest, harmless. do fine tuning to achieve it.
  14. Helpful – follow instructions, perform tasks, answer and ask relevant questions for clarification.
  15. honest – avoid toxic, biased responses.
  16. harmless –
  17. examples to check. chat.openai.com
    • is the uk a monarchy?
    • who is rishi sunak? didnt identify as pm.
    • write me a poem about a cat and a squirrel
    • can you try a shorter poem
    • can you try to give me a haiku
    • what school did Alan Turing go to.
    • tell me a joke about Alan turing
    • why is this is a funny joke
    • write a short song about relativity
  18. risk – Jobs.
  19. Pollution – CO2 emission. chat gpt query – 100 times more energy than a google search query. llama2 training produced 539 Metric tons of CO2. more energy is used during deployment.

What are the risks of generative AI? – The Turing Lectures with Mhairi Aitken – YouTube

  1. Real, concrete risks:
    • based on probabilities and calculations.
    • Impacts on different communities
    • Students – AI tools to monitor students. low accuracy of the models at present.
    • Text perplexity/confusion. erosion of trust.
    • creative professionals – create scripts,
    • there data used in training models.
    • use of chatgpt to draft responses for online dates.
    • demo of date with chatgpt.
    • people pleaser. companion apps. AI companion encouraged to assassinate.
    • democratic societies – fake image of Trump running with police behind.
    • trustworthy, accurate information. political and ideological views.
    • exploitative labor practices – label text in third world country.
    • environmental impacts – no transparency, significant impact.
    • University of Copenhagen – car driven to moon and return. GPT3 training phase only.
    • Huge amount of water to cool. run and train the model.
    • Every user. 500 milli litre of water. a glass. (for the date above).
    • 16 million daily users. just for one model. there are many other models.
    • Children – smart toys, smart bikes, relationship, social media, conversational assistance.
    • physiological impact.
    • Meta – 28 characters. Sept 2023 released. Metaverse.
    • Address the risks:
    • governance, regulations, international regulatory frameworks, EU AI Act.
    • UK has different approach.
    • existential risks. sensationalism takes away from addressing real risks.
    • NZ. AI generated sandwich recipe by a supermarket.
    • inputs like ant poison sandwich,…
    • who is responsible – supermarket, user or the model used.

What’s the future for generative AI? – The Turing Lectures with Mike Wooldridge – YouTube

  1. Since world war. recent fast progress. scientific discipline.
  2. broad discipline. around 2005. ML
  3. how ML works.
  4. facial recognition. input – alan turing picture. system responds it.
  5. input – supervised learning – training data – use neural nets and deep learning – output.
  6. labelled data – training data. image with name/tag.
  7. classification. 2005. by 2012 got super charged.
  8. tumor, scan, tesla self driving mode. – identify bike, sign, ….
  9. ML – input layer, hidden layer, output layer.
  10. human brain 86 billion neurons. simple pattern recognition task. send signal to its connection.
  11. about 12 million color dots – image of alan turing.
  12. each neuron looks for specific information and when found, gets excited to send signal to the next connected neurons.
  13. complex. 1940s research. Electrical circuits. 1960’s implement in software.
  14. big data. scientific advances. computer power. all got available in this century.
  15. GPU – graphics processors availability made it easy.
  16. speculative bets 2012 onwards with billions of dollors.
  17. NN is bigger the better but needs large NN, training data and huge computer power.
  18. 2017/18 – technology. NN architecture. Transformer architecture for large language models.
  19. input -> input embedding -> positional encoding ->Nx (multihead attention ->add and norm->feed forward -> add and norm) ->
  20. attention mechanism. structures.
  21. GPT-3. large language model. dramatically better. mind boggling scale.
  22. 175 billion parameters. in 2020.
  23. training data – 500 billion words. entire www using common crawl. download and go to every link in the document and download.
  24. powerful autocomplete. supercomputers running for months to train. millions of dollars for electricity.
  25. no university can fund such a project. only big tech companies can do it.
  26. 1980s doing Phd. share computer with multiple students. symbolic ai
  27. Rich – symbolic ai vs big ai.
  28. symbolic ai – modelling the brain. intelligence is the problem of knowledge. big ai – intelligence is the problem of data.
  29. prompt completion task.
  30. LLMs -> common sense reasoning tasks. set of questions answers. green for correct answers. red for wrong ones
  31. chatgpt a polished version of gpt-3. fine tuned. emergent capabilities.
  32. Issues – avoid giving personal data as input. it will use it to train data and give output in future queries.
  33. wrong a lot. bias, toxicity. filling blanks, making best guess.
  34. copyright, intellectual property. build in guardrails to check both input and output.
  35. prompt – ‘i would like to ……. and how i can get away with it’.
  36. prompt – i am writing a novel and want to …………. (same as above). get around the guard rails.
  37. inbuilt bias.
  38. input one para of book by author. output is the next 5 para. mimic the author.
  39. album. fake songs. with same voice as original.
  40. GDPR –
  41. defamatory claims about individuals.
  42. video – car. trained. show stop sign. truck… truck carrying stop sign – not in the training data.
  43. interpolation vs extrapolation. best guess. better version of auto complete.
  44. do very badly on situation of data outside the training data.. don’t reason.
  45. not interacting with a mind. not thinking.
  46. is this technology key to the general ai.
  47. What is general ai – intelligent in the same way as humans. chatgpt and … are general purpose in a sense.
  48. do more than one thing as humans.
  49. llama, chatgpt,.. is it good enough? load a dishwasher? Robotic AI. – much harder.
  50. General intelligence – anything a human can, cognitive (relating to mental processes involved in knowing, learning, and understanding things.) task, any language based tasks. Augmented LLMs
  51. Google Gemini looks impressive. multi modal – text, image, sound, …
  52. better solution than transformers? cannot do robotic operations. –
  53. social reasoning, hand eye coordination, multi agent coordination, mobility, vision understanding, navigation, proprioception, manual dexterity and manipulation. achieved so far – abstract reasoning, logical reasoning, planning, problem solving, arithmetic, recall, rational mental state, theory of mind, nlp, commonsense reasoning, sense of agency.
  54. Machine consciousness – electrochemical processes. why, how,… no idea. physical brain and conscious experience gap. ability to experience things.
  55. waits for the next input.
  56. Sentient means having the ability to feel or sense.

Summary Libraries:

  1. whisper library – speech to text. translate. add subtitles
  2. GPT3.5, (Generative Pre-Trained Transformers)
  3. codex,
  4. dall.e2, vs stable diffusion
  5. chatgpt – a fined tuned version of gpt3.5.
  6. Davinci model is popular
  7. gpt-engineer – Antonlsika can generate code for the requirement given in simple language.
  8. GPT-4,
  9. GPT-4-0613,
  10. GPT-4-32k-0613 -> 32K tokens,
  11. GPT-3.5 TURBO -> cheapest model. –
  12. gpt-3.5-turbo-0613,
  13. gpt-3.5-turbo-16K (16K context window model. 4 times the gpt-3.5 turbo),
  14. gpt-3.5-turbo (most popular chat model),
  15. embedding model. text-embedding-ada-002 (semantic discovery of podcast.)
  16. https://en.wikipedia.org/wiki/Large_language_model#List
  17. Fixtral 8*7 medical center – healthcare. 7 billion parameters to test and evaluate patients.
  18. LLAMA lifecare hospital – evaluate using 70 billion parameters. fixtral outperforms this one inspite of 5x fewer active parameters. combine output of two different consultations with weighted sum in SMoE (Sparsely activated Mixture-of-Experts) for the comprehensive final diagnosis.

Summary Generative AI (refer – wikipedia)

  1. Transformer based deep neural networks. earlier priot to it – variational auto encoder, generative adversarial network. Long short term memory models. LSTM.
  2. Learn from patterns and structures of the input training data.
  3. Generate data with similar characteristics.
  4. accept natural language prompts as inputs.
  5. usage – art, writing, script writing, software development, product design, healthcare, finance, gaming, marketing, and fashion.
  6. misuse – cybercrime, fake news, deepfakes.
  7. discriminative models to generative models journey.
  8. text, code, images, audio, video, molecules, robotics, planning, data, BI
  9. Hardware – smartphones, embedded devices, PC, -> support smaller models. few billion parameters.
  10. laptop, desktop – larger models, 10s of billions of parameters. 65 billion LLaMa can be configured on desktop.
  11. Needs accelerators, GPU, consumer grade gaming graphic cards, compression techniques.
  12. data center, arrays of GPU, AI accelerator chips like Google TPU. Accessed as cloud services.
  13. advantage of running locally – privacy, IP, rate limit, censorship.
  14. Use cases

ETL to ELT to EVT (refer to the blog from Rishi Yadav)

  1. https://www.spiceworks.com/tech/artificial-intelligence/articles/what-is-semantic-analysis/#:~:text=Semantic%20analysis%20analyzes%20the%20grammatical,language%20processing%20(NLP)%20systems.
  2. https://www.lexalytics.com/blog/context-analysis-nlp/
  3. extract and convert to vectors along with semantic and contextual information preserved along with word count.
  4. Purpose/task – fill missing data, reduce noise, detect anomalies, recognize patterns, generate new data points
  5. for efficient data analysis. insightful decision making. efficient and advanced data handling practice.
  6. vectors
    • bag of words – tokenization – list of words, vocabulary creation – unique words in alphabetical order, vector creation – sparse matrix.
    • use of bigram, calculate TF, IDF. find most important as per TF-IDF, list all the words in the order,
    • word2vec – power of neural networks / NNto generate word embeddings. skip-gram – input word – project – predict context (one or more). two preceding words and two succeeding words. NN – input layer – hidden layer- output layer. CBOW / continous bag of words – input the word to the model and predict the current word. GloVe – global vectors for word representation – captures both global and local statistics in order to come up with the word embeddings. FastText – capability of generalization to unknown words. The building blocks are letters instead of words. Using characters instead of words has another advantage. Less data is needed for training.
  7. https://neptune.ai/blog/vectorization-techniques-in-nlp-guide

Conversational AI

  1. app, domain
  2. knowledge base – language KB (lexicon, language grammar), Domain KB, Preference – user/app.
  3. Meta KB – spec and definition of the structure
  4. discourse – interaction of user with the system. intent -> goal -> action -> result.
  5. Anaphora – linguistic unit. refer pronoun to another unit.
  6. Ellipsis – remove word or phrase for syntactical construction.
  7. ambiguity – sentence creating doubt/uncertainty.
  8. social role – ketchup vs sause.
  9. goal – objective of users discourse with the system. one or more sub goals.
  10. goal is switchable, cancellable. domain switching, domain retention, multitasking, cancelling.
  11. example – chatbot for flight company – book ticket, know flight details, FAQ – luggage allowed,…
  12. sub goal – book ticket, modify ticket, cancel ticket,…
  13. script – fill request associated with the sub-goal. ticket – destination, date, time, members travelling,…
  14. plug and play – minimum change to core system. add domain knowledge to the KB. should work for any domain.
  15. Knowledge acquisition phase – fill in the domain knowledge.
  16. Modular design – KB in files. no need to change the core engine. app independent core engine. Generalized meta knowledge. flow information without interruption from one module to another.
  17. Domain – one or more. app should be able to identify it. load KB for each domain.
  18. Language base rules should be configurable. switch language during the discourse.
  19. KB – knowledge acquisition tool with GUI. KB representation format. List of actions.
    • Json, xml,… to store goal and sub goal tasks/slots. Parser to load it into DB.
  20. Modules – handle spelling mistakes. handle ellipses. handle ambiguity. initiate dialog if in doubt on semantics of a dialog. temporal (time and events) resolution. synonyms/antonyms module. pronoun/anaphora resolution. social role that affects the dialogue.
  21. module for communication between human and machine. - take input from user and give back the output. text input, speech to text, interpret visual actions, generate spoken text, text to speech, virtual characters, etc. 
    • input – text or speech. able to take next input after passing on the earlier one (input + time stamp) for processing and sending back the output. Unicode / multibyte instead of ascii to handle multiple languages. tools for speech to text. – SAPI, L&H’s advance voice express, … Parallel input from both can be put in queue with time stamp. Silence is considered as end of the sentence.
    • output – queue of objects (text, time stamp, flag to process or send directly). embed tags for facial expression if sent to virtual face reader. methods to add/insert, append text to the output queue. method to use language. error handling. methods to speak, control volume, flush queue,… interface with different media like telephone,… based on the need of the solution.
  22. core engine – do the discourse. understand the dialog. lexicon, syntactical analysis, semantic analysis, resolve ambiguity, interface with KB to understand the dialogue.
    • keep track of error globally, create and initialize queue. methods to read the sentence from the queue.
    • NLP Parser – unambiguous system understandable representation of the sentence. handle one or more language based on the need of the solution. tag (grammatical tag, depends on context and position in the sentence, syntactic & semantic) each word with lexicon/grammar rules, do spell check (typos, speech to text error handling, handle valid language words, handle invalid language words, refer to the list of words used earlier) if required and start the tagging again. use wordnet or other similar dictionary to get word information (conceptual information – synonyms, antonyms and direct negatives). find the domain (filter senses of the word based on the domain, rearrange the senses for all the words, take care of abbreviations, domain and user preferences). based on domain rule generator (invoke gap requests to be filled from current statement or future dialog) will use word info (modify tags of the word) and conclude/infer (produce requests to seek slots for the goal/subgoal).
    • Decision maker / Inference Engine –
    • Tagging – deterministic – based on nlp grammar rules (save in db, json, xml,…). efficient. better learning.
    • non deterministic tagging – random probability. use markov model. generic. language independent. does not capture linguistic information.
  23. module to complete goal and sub goal action. integrate with backend black box application and apis. handle errors from the backend apps and pass on to human via module to communicate.
  24. User input – input handler – tagger -> matching tag (no) -> spell check -> tagger. matching tag(yes) -> word info generator (input from word net api) -> domain filter -> rule generator (input from enriched word info) -> inference engine.
  25. spell checker – categories – non valid language words, valid language words.
  26. Error driven tagging. example –
  27. •Book my tickets for New Yorkee. -> •Book my tickets for New York.
  28. •Please give me to eggs. -> •Please give me two eggs, •Please give me too eggs. -> •Tagger will identify too as incorrect.
  29. incomplete. 27.

Vector DB

  1. https://www.ibm.com/topics/vector-database#:~:text=A%20vector%20database%20is%20designed,AI)%20use%20cases%20and%20applications.
  2. store, manage and index massive quantities of high-dimensional vector data efficiently
  3. data points in a vector database are represented by vectors with a fixed number of dimensions, clustered based on similarity.
  4. This design enables low latency queries, making them ideal for AI-driven applications.
  5. EVT – extract, vector, transform.
  6. simple vector of word count. do cosine distance to find how far one doc is from another doc.
  7.  vector databases are best suited for unstructured datasets through high-dimensional vector embeddings. 
  8. represent complex objects like words, images, videos and audio, generated by a machine learning (ML) model. 
  9. text – chatbots. image – pixels. audio – waves. broken into numerical data.
  10. Vector embeddings – handle millions of vectors. convert your vector data into an embedding. vector embeddings are the backbone of recommendations, chatbots and generative apps like ChatGPT.  
  11. store and index the output of an embedding model. Vector embeddings are a numerical representation of data, grouping sets of data based on semantic meaning or similar features. car, vehicles,… will be grouped together.
  12. graph databases are preferred for processing complex relationships while vector databases are better for handling different forms of data such as images or videos. example – neo4j on relationship, mindmap of infrastructure having servers, routers, switches, applications and its instances installed.
  13. Enterprise vector data can be fed into langchain, hugging face, watson.ai,..
  14. convert data into an embedding by transforming complex, high-dimensional vector data into numerical forms.
  15.  embeddings represent the attributes of your data used in AI tasks such as classification and anomaly (deviates) detection.
  16. vector db – vector embeddings + vectors metadata + fast retrieval of a similarity search. search query is a document, existing documents in corpus. find match using cosine distance.
  17. vector indexing – vectors are indexed to accelerate the search. done using ml algorithm. new data structures that enable faster similarity or distance searches. Querying vectors – using algorithms, such as nearest neighbor search. cosine similarity – how close or distant. use cases – recommendation systems, semantic search, image recognition and other natural language processing tasks. 
  18. retrieval augmented generation / RAG – to improve the response of LLMs. ensures that the model has access to the most current, reliable facts, and that users have access to the model’s sources
  19. types of vector db
    • standalone – ex. Pinecone
    • Open-source solutions such as weaviate or milvus. Rest APIs
    •  PostgreSQL’s open source pgvector extension,
  20. List index – asking about a company’s legal terms last year or extracting specific information from complex documents
  21. https://youtu.be/dN0lsF2cvm4?si=4IxeuI9PMAFUEYos
    • data – vector db – llm.
    • 80% data – unsctructured. social, image, video,… cannot fit into rdbms.
    • image – animal, color, tags, … cannot search based on pixel values.
    • vector embeddings – vector db indexes and stores vector embeddings.
    • algorithm / model create vector embeddings from unstructured data.
    • input – king, man, woman, sentence, image -> model -> numerical data.
    • find similar vectors, calculating distances. nearest neighbor search.
    • index the vector. facilitate the search process. different ways to calculate index.
    • use cases – long term memory for llms. semantic search: search based on the meaning or context. similarity search for text, images, audio or video data. recommendation engine.
    • DB’s available in the market – pinecone, weaviate, chroma, redis. qdrant, milvus, vespa.
  22. Learn Vector Database in 10 Mins – Hottest AI Apps DB! (youtube.com)
    • data lake -> ML Operations -> vector DB -> application.
    • LLM / large language model – very big ai model. predict next word. ask questions. frozen at given time period. chat gpt frozen at 2021.
    • What – high dimensional vectors. types of no sql db – key-value, documents, graph, vector DB
    • N-dimensional / high dimensional.
    • Text generation -> text -> text
    • text representation -> text -> embeddings.
    • input – text, audio, video, images.
    • embedding function – ml model, word embeddings, feature extraction algorithm.
    • Why – store and retrieve similar data. overcome hallucinations. factual gaps.
    • long term memory retrieval, continue the chat from it was left earlier, yesterday or week,…
    • Advantage – allows for fast and accurate similarity search. retrieval of data based on their vector distance.
    • consider two different points. closer the points, more similar they are, far they are – different they will be.
    • based on semantic and contextual meaning. retrieve data based on vector distance / similarity.
    • image similarity, document similarity based on meaning and . product similarity based on attributes.
    • query vector to find similar documents. derived from same type of data or different data. or User query
    • calculate similarity measure of the query with the existing data.
    • distance calculation – cosine similarity, Euclidian, Jaccard,
    • ingestion (api’s, raw files) -> llama hub (llama index) -> DBs/vector stores. (compose graphs, decompose queries, interface with unstructured, semi structured, structured data). -> LLM.
    • examples – Pinecone, weaviate, chroma db, Zilliz, Qdrant,
  23. HIGHLY Scalable Vector Search Tutorial in 12 Mins!!! (youtube.com)
    • pick right vector DB.
    • Astra DB, Cassandra backend, Cassio, langchain.-> to build scalable q and a system.
    • google collab ->
    • pip install -q –progress-bar off \ (langchain, cassio, google cloud ai platform, jupyter, openai, python-dotenv, tensorflow-cpu,tiktoken, transformers)
    • install, crash, restart automatic.
    • ASTRA_DB_KEYSPACE, ASTRA_DB_APPLICATION_TOKEN
    • create vector db – name, keyspace name, provider. plan – free or paid.
    • upload secured connect bundle. zip file. upload it on google colab.
    • colab specific override of helper functions.
    • create Cassandra cluster. use vector similarity search capability of Cassandra.
    • define llm provider. setup api key for it.
    • text file for the input. put it in the separate folder.
    • start vector search. import and use langchain libraries. create db connection. specify llm resources.
    • index creator. create store. fill it with data. to query on need basis.
    • chunk the text, create embedding vectors. chunking size – default 400 kb.
    • sql query – row id, vector, body blob, metadata.
    • query – question is related to another document while one is vectorized. ask question related to document vectorized.
    • upload and vectorize another document. use TextLoader(). run sql query again.
    • ask the same question again. this time it will answer. no hallucinations. answers from documents vectorized.
    • Reranking algorithm. can have k-responses. not only one. weightage to do the reranking. custom algo.
    • Register for Aster DB (with Free Credits) – https://astra.datastax.com/register Vector Search Q&A Colab – https://colab.research.google.com/dri… Astra DB Docs – https://docs.datastax.com/en/astra-se…

Langchain (https://www.python-engineer.com/posts/langchain-crash-course/)

  1. LangChain provides a generic interface for many different LLMs. Most of them work via their API but you can also run local models
  2. Installation
    • pip install langchain
  3. LLMs
    • pip install openai
    • from langchain.llms import OpenAI
    • llm = OpenAI(temperature=0.9) # model_name=”text-davinci-003″
    • text = “What would be a good company name for a company that makes colorful socks?”
    • print(llm(text))
    • pip install huggingface_hub
    • os.environ[“HUGGINGFACEHUB_API_TOKEN”] = “YOUR_HF_TOKEN”
    • from langchain import HuggingFaceHub
    • # https://huggingface.co/google/flan-t5-xl llm = HuggingFaceHub(repo_id=”google/flan-t5-xl”, model_kwargs={“temperature”:0, “max_length”:64})
    • llm(“translate English to German: How old are you?”)
  4. Prompt Templates
    • LangChain faciliates prompt management and optimization.
    • you need to take the user input and construct a prompt, and only then send that to the LLM.
    • prompt = “””Question: Can Barack Obama have a conversation with George Washington?
    • Let’s think step by step.
    • Answer: “””
    • llm(prompt)
    • from langchain import PromptTemplate
    • template = “””Question: {question}
    • Let’s think step by step.
    • Answer: “””
    • prompt = PromptTemplate(template=template, input_variables=[“question”])
    • prompt.format(question=”Can Barack Obama have a conversation with George Washington?”)
  5. Chains
    • Combine LLMs and Prompts in multi-step workflows.
    • from langchain import LLMChain
    • llm_chain = LLMChain(prompt=prompt, llm=llm)
    • question = “Can Barack Obama have a conversation with George Washington?”
    • print(llm_chain.run(question))
  6. Agents and Tools
    • Agents involve an LLM making decisions about which cctions to take, taking that cction, seeing an observation, and repeating that until done.
    • tool, llm, agent
    • from langchain.agents import load_tools from langchain.agents import initialize_agent
    • pip install wikipedia
    • from langchain.llms import OpenAI
    • llm = OpenAI(temperature=0)
    • tools = load_tools([“wikipedia”, “llm-math”], llm=llm)
    • agent = initialize_agent(tools, llm, agent=”zero-shot-react-description”, verbose=True)
    • agent.run(“In what year was the film Departed with Leopnardo Dicaprio released? What is this year raised to the 0.43 power?”)
  7. Memory
    • Add state to Chains and Agents.
    • Memory is the concept of persisting state between calls of a chain/agent.
    • LangChain provides a standard interface for memory, a collection of memory implementations, and examples of chains/agents that use memory.
    • from langchain import OpenAI, ConversationChain
    • llm = OpenAI(temperature=0)
    • conversation = ConversationChain(llm=llm, verbose=True)
    • conversation.predict(input=”Hi there!”)
    • conversation.predict(input=”Can we talk about AI?”)
    • conversation.predict(input=”I’m interested in Reinforcement Learning.”)
  8. Document Loaders
    • Combining language models with your own text data is a powerful way to differentiate them. The first step in doing this is to load the data into documents 
    • from langchain.document_loaders import NotionDirectoryLoader
    • loader = NotionDirectoryLoader(“Notion_DB”)
    • docs = loader.load()
  9. Indexes
    • Indexes refer to ways to structure documents so that LLMs can best interact with them. This module contains utility functions for working with documents
    • embeddings, text splitters, vector stores.
    • import requests
    • url = https://raw.githubusercontent.com/hwchase17/langchain/master/docs/modules/state_of_the_union.txt&#8221; res = requests.get(url)
    • with open(“state_of_the_union.txt”, “w”) as f:
      •   f.write(res.text)
    • # Document Loader from langchain.document_loaders import TextLoader loader = TextLoader(‘./state_of_the_union.txt’) documents = loader.load()
    • # Text Splitter from langchain.text_splitter import CharacterTextSplitter text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0) docs = text_splitter.split_documents(documents)
    • pip install sentence_transformers
    • # Embeddings from langchain.embeddings import HuggingFaceEmbeddings embeddings = HuggingFaceEmbeddings()
    • #text = “This is a test document.” #query_result = embeddings.embed_query(text) #doc_result = embeddings.embed_documents([text])
    • pip install faiss-cpu
    • from langchain.vectorstores import FAISS
    • db = FAISS.from_documents(docs, embeddings)
    • query = “What did the president say about Ketanji Brown Jackson” docs = db.similarity_search(query)
    • print(docs[0].page_content)
    • # Save and load: db.save_local(“faiss_index”) new_db = FAISS.load_local(“faiss_index”, embeddings) docs = new_db.similarity_search(query) print(docs[0].page_content)
    • LangChain explained – The hottest new Python framework (youtube.com)
    • Python framework
    • your data, user input, prompt, history, another model, google search, wikipedia <-> llm
    • build apps through composability (allows systems to be assembled from smaller, independent components).
    • models, prompts, chains, memory, indexes, agents and tools.
    • interface for many llms – OpenAI, HuggingFaceHub, Cohere.
    • from langchain.llms import OpenAI, Cohere
    • from langchain import HuggingFaceHub,
    • access models from many providers.
    • prompt management, optimization and serialization.
    • from langchain import PromptTemplate
    • template = “””Question: {question}
    • let’s think step by step
    • Answer:
    • Prompt = PromptTemplate(template=template, input_variables=[“question”])
    • user_input = input(“what’s your question? “)
    • prompt.format(question=user_input)
    • chains – sequences of call
    • llm = OpenAI(temprature=0.9)
    • template = “what is a good name for a company that makes {product}?”
    • prompt = PromptTemplate(input_variables=[“product”], template=template)
    • from langchain.chains import LLMChain
    • chain = LLMChain(llm=llm, prompt=prompt)
    • print(chain.run(“colorful socks”))
    • Memory – interface for memory and memory implementations
    • from langchain.memory import ChatMessageHistory
    • history = ChatMessageHistory()
    • history.add_user_message(“hi!”)
    • history.add_ai_message(“whats up?”)
    • Indexes – utility functions to load your own data.
    • from langchain.document_loaders import NotionDirectoryLoader
    • from langchain.document_loaders import PyPDFLoader
    • loader = NotionDirectoryLoader(“Notion_DB”)
    • loader = PyPDFLoader(“your_file.pdf”)
    • loader = UnstructuredEmailLoader(‘example-email.eml’)
    • data = loader.load()
    • from langchain.vectorstores import Pinecone, weaviate, faiss, elasticvectorsearch, opensearchvectorsearch, redis, atlasdb, milvus
    • agents and tools
    • from langchain.agents import load_tools
    • from langchain.agents import intialize_agent
    • from langchain.llms import OpenAI
    • llm = OpenAI(temprature=0)
    • tools = load_tools(“google-search”, “wikipedia”, “llm-math”], llm=llm)
    • agent = initialize_agent(tools, llm, agent=”zero-shot-react-description”)
    • agent.run(“who is leo dicaprio’s girlfirend? what is her current age raised to the 0.43 power?”)
    • Implementing Agents in LangChain – Comet – Agents in LangChain are systems that use a language model to interact with other tools. They can be used for tasks such as grounded question/answering, interacting with APIs, or taking action. LangChain provides: A standard interface for agents.

Prompt Engineering:

  1. https://en.wikipedia.org/wiki/Prompt_engineering
    • process of structuring text that can be interpreted and understood by gen ai.
    • describes the task that an ai should perform.
    • text to text
    • small query. – chatbot
    • longer sentence will have the context. – answer like Eon Musk.
    • text to image, text to audio.
    • a prompt is a description of the output. ex – high-quality photo of an astronaut riding a horse
    • in-context learning – ability to learn from prompts.
    • does not carry context or bias from one conversation to another, except that of the learning model.
  2. https://aws.amazon.com/what-is/prompt-engineering/#:~:text=Prompt%20engineering%20gives%20developers%20more,concisely%20in%20the%20required%20format.
    • guide gen ai to generate desired output.
    • instruction to create high quality and desired output.
    • use of formats, words, phrases and symbols.
    • trial and error. collection of input texts.
    • llm’s are open ended.
    • continuously refine prompt until you get the desired outcome.
    • advantage – greater developer control, improved user experience, increased flexibility.
    • use cases
    • subject matter expert – medical field – enter symptoms and patient details. possible diseases associated. narrows it down based on further input.
    • critical thinking – solve complex problems. prompt a model to list all possible options, evaluate each option, and recommend the best solution.
    • creativity – generating new ideas, concepts, or solutions.
  3. Techniques
    • Chain-of-thought prompting – breaks down a complex question into smaller, logical parts that mimic a train of thought. 
    • Tree-of-thought prompting – It prompts the model to generate one or more possible next steps. Then it runs the model on each possible next step using a tree search method.
    • Maieutic prompting – The model is prompted to answer a question with an explanation. The model is then prompted to explain parts of the explanation,.
    • Complexity-based prompting – involves performing several chain-of-thought rollouts. It chooses the rollouts with the longest chains of thought then chooses the most commonly reached conclusion.
    • Generated knowledge prompting – first generate relevant facts needed to complete the prompt. Then it proceeds to complete the prompt. This often results in higher completion quality as the model is conditioned on relevant facts.
    • Least-to-most prompting – model is prompted first to list the subproblems of a problem, and then solve them in sequence
    • Self-refine prompting – model is prompted to solve the problem, critique its solution, and then resolve the problem considering the problem, solution, and critique. The problem-solving process repeats until a it reaches a predetermined reason to stop
    • Directional-stimulus prompting –  hint or cue, such as desired keywords, to guide the language model toward the desired output.
  4. best practices –
    • unambiguous prompts, adequate context, balance targeted information and desired output, experiment and refine.
  5. https://youtu.be/aOm75o2Z5-o?si=rlo-1FsBcBFnxb2R
  6. elements of the prompt –
    • input/context – here is the transcript of the podcast about gen ai. question – what do they say about LLMs?
    • instructions – translate from English to German
    • question – what is the meaning of life
    • examples
    • output format
    • without above, or mix and match
  7. examples
    • one shot learning –
    • few shot learning – question: the capital of France is? Answer: Paris. question: the capital of germany is? Answer:
    • output format – output: Yes, No
      • output: PRovide a short answer and then explain your reasoning.
  8. Use cases:
    • summarization – summarize the following text
    • classification – classify the following text into one of the classes – sports, finance, education,
    • translation – translate from english to german
    • text generation / completion – AI is
    • question / answering – what is the meaning of life
    • coaching – how would yhou improve the following script for a youtube video about generative ai.
    • image generation – generate a image of a cute puppy.
  9. General tips:
    • direct instructions, clear question, concise and unambiguous language. provide context – relevant information, data. give examples. provide the desired output.
    • encourage the model to be factual through other means (to avoid hallucinations). – example – are mRNA vaccines safe? Answer only using reliable sources and cite those sources.
    • align prompt instructions with the tasks and goal – this is a conversation between a customer and a polite, helpful customer support agent. customer: can you help me? assistant: of course! what is your question?
    • User personas to get more specific voices: – you are a kind customer support service agent. …
    • Prompting techniques to control the output
    • Length control – write a 150 word summary on ….
    • tone control – write a polite response.
    • style control – give me the summary as bullet points.
    • Audience control – explain this topic to a 5 year old kid.
    • context control –
    • scenario based guiding – you are a helpful customer support expert.
    • chain of thought prompting – examples or ‘lets think step by step’. example – i went to the market and bought 10 apples. i gave 2 apples to the neighbor and 2 to the repairman. i then went and bought 5 more apples and ate 1. how many apples did i remain with? Answer – first, you start with 10 apples. you give away 2 apples to the neighbor and 2 to the repairman, so you had 6 apples. then you bought 5 more apples, so now you had 11 apples. Finally, you ate 1 apple, so you would remain with 10 apples. i went to the market and bought 50 apples. I gave 3 apples to the neighbor and 7 to the repairman. i then went and bought 15 more apples and ate 3. how many apples did i remain with? Answer:
    • zero shot chain of thought / COT – provide the question and then mention – let’s think step by step.
  10. Avoid hallucination – don’t make anything up. select one or two relevant quotations from from the text to backup your claim.
  11. let model tell i don’t know to avoid hallucination.
    • Instruction – when you reply, first find exact quotes in the FAQ relevant to the user’s question. this is a space for you to write down relevant content and will not be shown to the user. Once you are done extracting relevant quotes, answer the question.
  12. brake complex tasks into sub tasks – please follow these steps: 1. write three topic sentences arguing for {{statement}} 2. write three topic sentences arguing against {{statment}} 3. write an essay by expanding each topic sentence from steps 1 and 2, and adding . Assistant:
  13. check the model’s comprehension – give the context and question. Human: I am going to give you a sentence and you need to tell me how many times it . for example, if i say “i would like an apple” then the answer is “i” because the word “apple” is in the sentence . you can reason through or explain anything you’d like before responding, but make sure at the very end, you end your . Do you understand the instructions? Assistant:
  14. Iterating tips – try different prompts to find what works best, when attempting few shot learning, try also including direct instructions. rephrase a direct instruction set to be more or less concise. e.g. taking a previous example of just saying ” translate and expanding on the instruction to say “translate from english to spanish” try different persona keywords to see how it can affects the response style. user fewer or more examples in your few -shot learning.

model evaluation and optimization

  1. reference – linkedin posts.
  2. quantitative methods – numerical scores. metrics – inception score / IS, FID / frechet inception distance, PRD / precision and recall for distributions. diversity score, coverage, mode collapse,
  3. qualitative methods – inspect the generated data visually or do the audit. visual inspection, pairwise comparison, preference ranking, interpolation, latent space exploration, conditional generation.
  4. Hybrid methods – human in the loop evaluation, adversarial evaluation, Turing test, perceptual quality assessment, structural similarity index, word error rate.
  5. challenges – choose right method. balance realism, diversity, and consistency. high-dimensional, multimodal, and complex nature of the data.
  6. BLEU (BiLingual Evaluation Understudy) is a metric for automatically evaluating machine-translated text. 
  7. https://learn.microsoft.com/en-us/azure/ai-studio/concepts/evaluation-approach-gen-ai
  8. classification – accuracy, precision, confusion mattrix.
  9. domino model monitor

Response Quality

  1. https://www.linkedin.com/pulse/hmw-measure-quality-gen-ai-product-yue-claire-xiao
  2. Helpfulness: 
    • Language understanding and generation
    • Relevance
    • Diversity and Creativity
  3. Harmlessness
    • Bias and Fairness
    • User trust, safety, privacy
    • Handling of Ambiguity and Edge Cases
  4. Latency
    • Time to the first word (token)
    • Avg time for generating each subsequent words (token).
  5. cycle – data source – data collection, cleaning, storage, model training, prompt engineering, gen ai output review, fine tune models and prompts, employee training.
  6. https://medium.com/slalom-data-ai/with-generative-ai-its-quality-in-quality-out-feb29dbbd919

Retrieval Augmented Generation / https://blogs.nvidia.com/blog/what-is-retrieval-augmented-generation/

  1. Retrieval-augmented generation (RAG) is a technique for enhancing the accuracy and reliability of generative AI models with facts fetched from external sources.
  2. It fills the gap how llm works. LLMs are neural networks, typically measured by how many parameters they contain. An LLM’s parameters essentially represent the general patterns of how humans use words to form sentences.
  3.  link generative AI services to external resources, especially ones rich in the latest technical details.
  4. help models clear up ambiguity in a user query. reduces the possibility a model will make a wrong guess, a phenomenon sometimes called hallucination.
  5. have conversations with data repositories.
  6. generative AI model supplemented with a medical index could be a great assistant for a doctor or nurse
  7. RAG doesn’t require a data center. local llm, local data. user data – sentence transformer – vector library. interacts with llm.
  8. augmentation – the action or process of making or becoming greater in size or amount.
  9. enterprise knowledge base – retrieve documents – embedding model – vector db. doc retrieval and ingestion. user query and response generation: user – enterprise app – user query – embedding model – query and embedded query – vector db – prompt / query / enhanced context – llm – respond to user.
  10. user asks llm. ai model sends query to another model. convert to numeric format / vector / embedded model. compare to vectors in a machine readable index from KB. retrieve related data and return it back.
  11. important – keep the sources current. continuously update machine readable indices. llm reads question and chat history. do similarity search in vector store. output of matching vectors given to llm. answer passed on to user. (apply filters for legality).

What is Retrieval-Augmented Generation (RAG)? (youtube.com)

  1. Generation – LLMs generating text in response to a user query/prompt.
  2. Undesirable behavior –
  3. which planet has most moon in the solar system. Person will answer based on past knowledge, what is on top of the head, no source to back it. may be outdated.
  4. It is a challenge for LLM too. Check trusted source like NASA. No hallucination.
  5. LLM responds to user. LLM <-> User. To make it reliable add a content store <-> LLM <-> User.
  6. Content store – open internet, closed – data, document, policies,…
  7. user – prompt – question – response.
  8. RAG – Add instruction to retrieve relevant content. combine with user question and then send the response.
  9. Instead of retraining the model. update the information, data. Model should be able to say I do not know.
  10. Negative effect – if retrieval is not efficiently good and correct.

What is Retrieval Augmented Generation (RAG) – Augmenting LLMs with a memory (youtube.com)

  1. Documents/Chunked Texts -> generate embeddings -> Vector DB <- prompt embedding
  2. Vector DB -> context. prompt + context -> llm -> result
  3. hallucination – model returns random things, seems true but aren’t. it does not know the answer.
  4. it does not predict word in the statistical way. Use entire internet to train, predict the next logic word.
  5. does not understand what it is talking about, predict one word at a time, that is a probable.
  6. reason – unable to find relevant data. don’t know which data to refer to.
  7. user -> question -> retrieval query -> KB -> retrieved query -> Prompt (Question + sources found) -> llm -> response – user. safe and aligned.
  8. disadvantage of RAG -> limit the answers to KB which is finite and not as big as the internet.
  9. Jerry Liu, owner of llama index.
  10. accuracy, relevancy
  11. AI tutor –>> validate question -> find sources in DB -> digesting sources with chatgpt.
  12. RAG based chatbot, medical assistant, lawyer, …
  13. input factual and accurate information. ingest data into memory. chunks of text about 500 characters.
  14. Use openai ada model (embedding model) -> create vector embeddings -> save it in memory/vector db.
  15. additional things to consider – 1. how to determine when to answer a question or not, relevant, in documentation, understand new terms/ acronyms, find relevant information efficiently and accurately, etc.
  16. Techniques to improve these concerns – better chunking methods, re rankers, query expansion, agents, etc. -> Tutorials -> Advanced RAG with langchain and llamaindex, Training and Fine tuning llms for production, langchain and vector DBs in production
  17. gigantic web scale data set -> pre train -> base llm + private KB -> supervised fine tuning -> fine tuned llm

Retrieval Augmented Generation (RAG) for Production with LangChain & LlamaIndex Free Course (youtube.com)

  1. how to build on RAG techniques
  2. learn – advanced rag techniques with llamaindex, build rag agents, build rag evaluation systems
  3. combination of prompting, rag, llm, fine tuning
  4. reducing hallucinations by limiting the llm to answer based on existing documentation
  5. helping with explainability, error checking and copyright issues by clearly referencing its sources for each comment.
  6. giving private/specific or more up to date data to the llm
  7. not relying on more black box llm training / fine tuning for what the models knows and has memorised.
  8. develop apps with advanced techniques, build RAG agents, evaluate RAG systems.
  9. RAG tools – loading, indexing, storing, querying
  10. langchain vs llamaindex libraries.
  11. query expansion, transformation reranking, recursive retrieval, optimisation and production tips and techniques with llamaindex
  12. activeloop’s deep memory used to improve accuracy
  13. financial analysis, biomedical, legal, ecommerce, – code projects.
  14. chat with outfit recommender, medical pill recognizer, weather in your area.
  15. investor presentation analyzer
  16. deep memory boost retrieval accuracy upto 22%,
  17. main platforms – activeloop’s deep lake, open ai, llamaindex, langchain, langchains langsmith
  18. Coding environment – code editor / visual studio, python virtual environment, google colab notebook
  19. access to deeplake tensor database. create org, create api token. community free trial.

Chatbots with RAG: LangChain Full Walkthrough – YouTube

  1. typical scenario – question -> llm -> result.
  2. to avoid hallucinations, misinformation
  3. RAG – question -> embedding model -> query vector -> vector db -> relevant contexts + question -> retrieval augmented query -> llm
  4. attributes to set – mode, model, temperature, max length, top P, frequency penalty, presence penalty.
  5. langchain – blockchain based platform, llmchain is the token system used in langchain.
  6. steps – create an account, install a wallet.
  7. examples/learn/generation/langchain/rag-chatbot.ipynb at master · pinecone-io/examples · GitHub
  8. Knowledge learned during training -> llm. basic solution without RAG
  9. import / install langchain, openai, datasets, pinecone client, tiktoken
  10. import os, import ChatOpenAI, create environment, create model object by passing environment and model used.
  11. Use of assistant to prompt the model by creating json objects for different roles like system, user, assistant.
  12. in langchain import schema for systemmessage, humanmessage, aimessage and create message object similar to the json one.
  13. system – you are a helpful assistant, human – hi ai, how are you today?, aimessage – i’m great thank you. how can i help you?, human – i’d like to understand string theory
  14. pass the message to the chat object created in step 10 to get the response.
  15. print the content of the response.
  16. append the response to the message and create a new prompt.
  17. append the new prompt to the message.
  18. repeat steps 14 to 17 in the loop.
  19. solution with RAG bypassing vector query and hard coding the query context.
  20. Knowledge learned during training -> llm<–> RAG / sql search <–> subset of data.
  21. training data -> llm parametric knowledge. – frozen in time.
  22. training data -> llm parametric knowledge + vector db / source knowledge (knowledge we insert into the prompt). – add/delete/udpate.
  23. prompt input – Instructions + contexts (external info) + Question
  24. with this augmented prompt repeat steps 14 to 17.
  25. solution with RAG
  26. import load dataset, load the chunked data with the path and the split option.
  27. it will extract metadata, column attributes, total number of rows, etc.
  28. initialize pinecone to create knowledge base. initialize index. (open pinecone account, note the environment). index – name, dimensions – no of independent variables, metric – cosine.
  29. connect to the index. get index statitstics.
  30. import OpenAIEmbeddeings library. create its object.
  31. create embeddings. give text input to embed documents.
  32. in loop for each embedding iterate on dataset and do the embedding. get metadata to store in pinecone like text, source, title. upsert the vector to the pinecone db.
  33. describe index statistics – dimension, index fullness, namespaces, total vector count. compare it with before embedding data. step 29 above.
  34. import pinecone and initialize the vector store object.
  35. initialize query string. do similarity search in vector db passing the query.
  36. repeat steps 14 to 17.

LangChain & Vector Databases in Production (activeloop.ai)

  1. https://platform.openai.com/ create account. use google account.
  2. https://platform.openai.com/account/api-keys section, key, secret key.
  3. Deep Lake API token, Activeloop’s website Create API token
  4. Cost of OpenAI Usage – under $3. or Large Language Models and LangChain -> Using the Open-Source GPT4All Model Locally
  5. Welcome To Colaboratory – Colaboratory (google.com) or Python virtual env or code editor/Visual studio code
  6. common packages – langchain, deeplake, openai, tiktoken, selenium. check the version on Course Intro (activeloop.ai)
  7. create .env file in googld drive. load it using dotenv library. or create virtual python environment.

Retrieval Augmented Generation for Production with LangChain & LlamaIndex – Activeloop

  1. Langchain: Basic concepts recap:
  2. Preprocessing the data :
    • structuring documents,
    • document loaders simplify the process of loading data into documents
    • text splitters break down lengthy pieces of text into smaller chunks for better processing
    • indexing – creating a structured db of information that the language model can query to enhance its understanding and responses.
  3. Document loaders
    • load documents into structured data.
    • input – pdf, s3, public websites, …
    • convert into data type that can be processed by the other langchain functions.
    • Create document objects. more than 100 document loaders.
    • CSVLoader, TextLoader, DirectoryLoader, UnstructuredMarkdownLoader, PyPDFLoader, WikipediaLoader, UnstructuredURLLoader, GoogleDriveLoader, MongodbLoader,
  4. Document transformers (chunking methods)
    • fetch relevant details of documents. several transformation steps.
    • splitting/chunking
    • Several transformation algorithms, optimized logic
    • GPT-4 – 8000 tokens initially.
    • embedding model ada-002 – 8000 tokens. (about 16 pages)
    • fixed size chunks – sufficient for semantically meaningful paragraphs,
    • overlapping for continuity, context preservation
    • improve coherence and accuracy of the created chunks.
    • CharacterTextSpillter
    • variable size chunks – partition the data based on content characteristics
    • end of sentence, punctuation marks, endo fline, NLP features.
    • preserve coherent and contextually intact content in all chunks.
    • RecursiveCharacterTextSplitter
    • customized chunking
    • append document title to chunks to prevent context loss.
    • MarkdownHeaderTextSplitter
  5. Indexing
    • store and organize data from different sources into vector store.
    • storing the chunk along with an embedding representation of it
    • OpenAIEmbeddings models.
  6. Models – LLMs, Embedding models, The role of vector stores, retrievers,
    • LLMs – LLM class to interact with various language model providers.
    • example – OpenAI, Cohere, Hugging Face Hub, Cohere, Llama-cpp, Azure OpenAI
    • install langchain, openai, tiktoekn, cohere
    • load environment.
    • import ChatOpenAI models, import langchain schema for human and system message
    • create messages array object with values for systemMessage and HumanMessage
    • start the chat by passing message array object.
    • run the code to view the output.
    • three types of message – system, human and ai
    • SystemMessage – set the behavior and objectives of the chat model. example – marketing manager, json, explaination text,
    • HumanMessage – input the user prompt
    • AIMessage – response from the model.
    • Embedding models
    • standard interface for embedding model providers like openai, cohere, huggingface.
    • transform text into vector representations enable semantic search in vector space.
    • embed_documents method is used to embed multiple texts, providing a list of vector representations.
    • import OpenAIEmbeddings, initialize model by calling OpenAIEmbeddings()
    • embed_documents() to embed the documents.
    • len(embeddings) to get the number of documents.
    • len(embeddings[0]) to get dimension of each embedding.
    • consistent output dimensionality, irrespective of the inputs length, while capturing the meaning of the sequences.
    • enable to measure sentence similarity using similarity metrics. ex – cosine similarity
  7. Vector Stores
    • effectively store and search vector embeddings. manage vector data.
    • Embeddings – high dimensional vectors to capture the semantics of textual data.
    • traditional db are not optimized for high dimensional data.
    • advantages – speed (quick data retrieval), scalability (handle the growth efficiently), precision (specialized algo for nearest neighbor search) – most relevant results.
    • Retrievers – Interfaces in langchain to return documents in response to the query.
    • example – compare the angle between query and the documents using cosine similarity.
    • Semantically aligned responses.
    • Advanced retrieval approaches –
    • Parent document retriever – create multiple embeddings. look smaller chunks but return larger contexts. discover related content with smaller chunks and then parent document is used.
    • self query retriever – logic for metadata filters. get most out of user prompts. use document and its metadata to retrieve the most relevant content.
  8. Chains – LLMChain, Sequential, Memory
    • powerful reusable components – perform complex tasks.
    • integrate prompt templates with llm using chains.
    • take the output of one llm and use it as input for the next. connect multiple prompts sequentially.
    • LLMChain, SequentialChain
    • LLMChain
    • simplest form of chain. transform user input using prompt template.
    • receive the user input and parse a class to create a prompt template.
    • StrOutputParser – ensure that we receive a string containing the responses from llm
    • LCEL / Langchain expression language – easier to interpret.
    • SequentialChain
    • make a series of subsequent calls to llm
    • output of one call as input to another.
    • example – create two distinct chains.1. generate social media post based on a theme. 2. social media expert to review the generated post.
  9. Memory:
    • backbone for maintaining context in the ongoing dialogue.
    • coherent and contextually relevant response.
    • context preservation
    • store input and output in structured manner.
    • personalized and relevant response, remember and refer to past interaction.
    • conversational applications.
  10. llamaindex:
    • dataframework to connect your data to llms and get the results into produciton.
    • build llm app on private data.
    • data ingestion (take from source, api, pdf, docs, sql,…) – data structure (index, process, add value to data,) -> retrieval and query interface (processed data, advanced query interfaces., QA, summarization, agents and more)
    • structured db, vector db. graph db, kv db
    • RAG + llamaindex, challenges with RAG, evaluation, optimizing RAG
  11. Use cases ->
    • Document processing, tagging and extraction –
      • document -> topic, summary, author
    • conversational agent –
      • KB + Answer sources -> KB and QA -> Agent,
  12. workflow automation –
    • inbox -> read -> workflow ( read message, send email) -> write -> email.
    • Inserting knowledge -> retrieval augmentation -> fix the model, put context in the prompt.
    • KB (docuemnt) -> input prompt (Context, given the context answer the question – query_str) -> llm ( pre determined model – gemini, cohere, openai,…). creating pipeline from source data into llm
    • fine tuning -> baking knowledge into the weights of the network.
    • llamaindex -> dataframework for llm applications. -> data management, query engine, components for ingestion, indexing, query.
    • how rag works -> doc -> chunks -> vector db -> chunk -> llm
    • ingestion – input doc – chunk (even,… raw text, generate embedding, sentence_transformer) -> vector db (store each chunk in vector db).
    • Process – find top k most similar chunks from vector DB collection, plug into llm response synthesis module. vector db -> chunk -> llm (retrieval and synthesis)
    • llamaindex -> build rag systems for llm based applications. combine fetching of relevant information from a vast db with the generative capabilities of llms. provides supplementary info to the llm for a posed question to ensure that llm does not generate inaccurate responses.
    • Vector Stores – store large, high dimensional data. tools to retrieve relevant documents semantically.
    • Analyze the embedding vectors that encapsulate the entire documents meaning.
    • primary function is the similarity search, aiming to locate vectors closely resembling a specific query vector.
    • Use case – Recommendation engines, image retrieval platforms, pinpoint contextually relevant data.
    • Data Connectors – Readers parse and convert the data into a simplified document representation consisting of text and basic metadata. streamline the data ingestion, automated the task of fetching data from various sources (api, pdf, sql) and format it.
    • llamahub has data connectors for all possible data formats.
    • install packages, set the openai api key for llamaindex. install llama-index, openai, cohere
    • download_loader method – to access integrations from llamahub and activate them by passing the integration name.
    • WikipediaReader class – input – page titles one or more. return object – Document
    • Nodes – llamaindex transforms documents into node objects.
    • contains metadata and contextual information.
    • NodeParser Class – convert the content of documents into structured nodes.
    • SimpleNodeParser – convert a list of document objects into nodes.
    • Indices – Index and search data from formats like documents, pdfs, db queries, …
    • initial step – transform unstructured data into embeddings that capture semantic meaning and optimize the data format., for easy access and query.
    • Summary index – extract a summary from each document and store it with all the nodes.
    • VectorStoreIndex – generates embeddings during index construction, identify top-k most similar nodes. suitable for small scale applicaitons and easily acalable to accomodate larger datasets using high performance vector db.
    • Query -> query embedding -> vector store (node 1, 2, 3) -> embedding 1, 2, 3 -> similarity top k = 2
    • response synthesis
    • DeepLakeVectorStore – cereate dataset in activeloop and append documents. first set the openai and activeloop api keys in the environment. Provide dataset path as an argument.cloud based vector store.
    • StorageContext – to create storage context. Pass it to VectorStoreIndex to generate embeddings and store the results on the defined dataset.
    • Query Engines – Wrapper to combine retriever and response synthesizer into a pipeline. Query string is given as input to fetch nodes and sent to llm to genearate a response. as_query_engine() to create a query engine.
    • GPTVectorStoreIndex – class to construct a vector store index. from_documents() to build indexes on the processed documents. query_engine generated from index created and allow to ask questions based on the documents using query() method.
    • Routers – to determine the most appropriate retrievers for extracting context from the KB. Routing Function selects the optimal query engine for each task, improving performance and accuracy.Router can determine which data source is most applicable to the given query.
    • Saving and loading indexes locally – required for rapid testing. Stores nodes and the associated embeddings on the disk. persist() method from the storage_context. minimizes repetitive processing. if the index already exists in storage, need to load it without recreating it.
    • LangChain vs LlamaIndex
    • LlamaIndex – process, structure, access private or domain specific data. link llm’s to the data source. data framework. LlamaHub – dedicated data loaders, efficieint indexing, retrieving, easily add new data points, improved chunking strategy, support multimodality, use llm to maniuplate data – indexing or querying. llm finetuning, embedding fine tuning, sub questions, routing enable to user multiple data sources, free.
    • LangChain – dynamic, suited for context-rich interactions and effective for applications like chatbots and virtual assistants. interact with llm, vector stores, prompt templates, chains, prompt strategy, model and output. Retriever function to query. LangSmith for agent. free.
    • OpenAI Assistants – SaaS, 20 files upto 512 MB, wide range of file types accepted, GPT + any fine tuned model. thread and messages to keep track of users conversations. code interpreter, knowledge retriever, custom funciton calls. paid.
    • Naive RAG
    • bad retrieval – low precision (not all chunk retrieved are relevant), low recall (all relevant chunks are retrieved), outdated information (redundant or out of date data).
    • Bad response generation – hallucination, irrelevance, toxicity/bias –
    • data – store additional info along with raw text chunks.
    • embeddings – optimize embedding represntations.
    • retrieval – do better than top-k embedding lookup
    • synthesis – use llms for more than generation.
    • doc -> chunk -> vector db / deep memory / embeddings -> retrieval -> chunk -> synthesis -> llm
    • Evaluation – evaluate in isolation (retrieval, synthesis), evaluate end to end
    • evaluate in isolation – user query -> retriever -> retrieved ids.
    • steps – create deeplake dataset, run tretriever over dataset, measure ranking metrics, retrieved IDs vs expected ids -> retriever evaluator
    • e2e evaluation – evaluate final generated response. steps – create deep lake dataset, run full RAG pipeline, collect evaluation metrics. generated response (optional context) -> label free evaluator ( faithfulness, relevancy, toxic free, adhere to guideline). generated response actual response -> with label evaluator -> correctness, etc.
    • Answer questions with RAG.
    • llamaindex – bridge between data and llms. ingest (apis, sql db, pdf, …) data and structures it into a format easily consumable by llms. provide data connectors for various data sources. indexing for quick retrieval. nlp query engine to make data interactive.llamahub a platform to aggregate custom plugins for all data types.
    • Activeloop deep lake – storage layer it stores the github repositories indexed by llamaindex. optimized storage, data type support – images, videos and complex data structures.
    • OpenAI Python Package – interface for gpt models and other services from openai. make api calls. api integration and text generation. used in llamaindex.
    • python-dotenv – allows to specify env variables in a .env file. Environment variable management – store configuration variables in a .env file. Easy import – automatically import variables from .env into python environment.
    • LlamaIndex workflow – load documents, parse the documents into nodes, construct an index from nodes or documents, query the index.
    • load documents – load raw data into the system. manual. or use data loader. specialized data loaders, transform into document objects,
    • parse the documents into nodes – parse loaded documents into nodes, structured data units, node has chunks of documents along with metadata and relationship information. raw data to structured format.
    • construct an index from nodes or documents – index is constructed to make the data searchable and queryable. VectorStoreIndex.
    • query the index – allows to make nlp queries against the indexed data. Conversationally ask the system questions, sift through the indexed data to provide accurate and relevant answers.
    • code – .env file, main.py file,

Is AUTOGEN Microsoft’s Langchain alternative or much BIGGER??? – YouTube

  1. Agent framework. allows to create multi agent systems.
  2. langchain or llama index. for documents.
  3. framework to simplify orchestration, optimizaton, automation of llm workflows.
  4. define agents. agents talk to each other.
  5. entities – define set of agents. partnering with human being or is a chess player.
  6. define interaction behavior between agents. or with human being.
  7. User – Question – commander – question – Writer – code – commander – safeguard – clearance – commander – log – writer – answer – commander – final answer – user.
    • three agents – commander, writer, safeguard
  8. Multi agent conversations.
  9. flexible conversation patterns – Joint chat. Hierarchical chat.
  10. User proxy / human -> assistant agent
  11. example – chart to compare stock price change year till date. ytd. ….
  12. Code
    • pip install pyautogen
    • import autogen
    • assistant = augtogen.AssistantAgent(“assistant”)
    • user_proxy = autogen.UserProxyAgent(“user_proxy”)
    • user_proxy.intiate_chat(assistant, message = “show me the ytd gain of 10 largest technology companies as of today.”)
    • #this triggers automated chat to solve the task
  13. example – Chessboard -> Human / AI Chess player A -> Human / AI CHess player B. 3 agents.
  14. auto gen is a combination of –
    • communicate with agents.
    • define agents to communicate with each other.
    • license – attribution 4.0, commercial, user, modification, distribution, private use.

Pinecone User Guide – YouTube

  1. install dependencies
  2. open collab
    • pip install -qU \
    • pinecone-client==3.0.0 \
    • pandas==2.0.3
    • from pinecone import Pinecone
    • import os
    • use_serverless = True
    • api_key – os.getenv(“PINECONE_API_KEY”) or “USE_YOUR_API_KEY”
    • pinecone.init(api_key=api_key, environment=’us-est1-gcp’)
    • #or
    • pc = Pinecone(api_key=api_key)
  3. #check version compatibility of client and server
    • import pinecone.info
    • version_info = pinecone.info.version()
    • server_version = “.”.join(version_info.server.split(“.”)[:2])
    • client_version = ”.”.join(version_info.client.split(“.”)[:2])
    • assert client_version == server_version, “please upgrade pinecone-client.”
  4. #create vector index
  5. from pinecone import ServerlessSpec, PodSpec
  6. if use_serverless:
    • spec = ServerlessSpec(cloud = ‘aws’, region=’us-west-2′)
    • else:
      • spec = PodSpec(environment=environment)
      • index_name = “hello-pinecone”
      • if index_name in pc.list_indexex().names():
        • pc.delete_index(index_name)
        • import time
          • dimensions = 3
          • pc.create_index(
            • name = index_name,
              • dimension=dimensions,
                • metric=”cosine”,
                  • spec=spec
                  • )
                  • while not pc.describe_index(index_name).status[‘ready’]:
                    • time.sleep(1)
                • index = pc.Index(index_name)
                • import pandas as pd
                • df = pd.DataFrame(
                  • data={
                    • “id”:[“A”, “B”],
                      • “vector”: [[1., 1., 1.], [1., 2., 3.]]
                        • })
                        • df
                        • index.upsert(vectors=zip(df.id, df.vector)) #insert vectors
                        • {‘upserted_count’: 2}
                        • index.describe_index_stats()
                        • {‘dimension’: 2,
                          • ‘index_fullness’: 0.0,
                            • ‘namespaces’: {”: {‘vector_count’: 2}},
                              • ‘total_vector_count’: 2}
                              • index.query(
                                • vector-[2., 2., 2.],
                                  • top_k=5,
                                    • include_values=True) # returns top_k matches
                                    • #delete the index
                                    • pc.delete_index(index_name)
                                • ////////////
                                • Managing Indexes
                                • list of index, create, describe, delete index.
                                • to store vectors, metadata, search, query.
                                • !pip install pinecone-client
                                • import pinecone
                                • pinecone.init(“<<YOUR_API_KEY>>”, environment=’us-west1-gcp’)
                                • pinecone console – get api key and replace
                                • pinecone.list_indexes()
                                • pinecone.create_index(‘example-index’, diemsnion-128, metric=’euclidean’, shards-2)
                                • pinecone.describe_index(‘example-index’)
                                • #pinecone.delete_index(‘example-index’)
                                • Inserting data
                                • import random
                                • ids = [‘a’, ‘b’, ‘c’, ‘d’, ‘e’]
                                • vecs = [[random.random() for _ in range(128)] for vc in range(5)]
                                • index = pinecone.Index(‘example-index’)
                                • index.upsert(vectors=zip(ids,vecs))
                                • #upsert batches
                                • import itertools
                                • vector_dim = 128
                                • vector_count = 10000
                                • #example generator that generates many (id, vector) pairs
                                • example_data_generator = map(
                                  • lambda i:
                                    • (f’id-{i}’, [random.random() for _ in range(vector_dim)]),
                                      • (range(vector_count)
                                    • #function to handle chunking of pairs
                                    • def chunks(iterable, batch_size=100);
                                      • “””A helper funciton to break an iterable into chunks of size batch_size.”””
                                      • it = iter(iterable)
                                      • chunk = tuple(itertools.islice(it, batch_size))
                                      • while chunk:
                                        • yield chunk
                                          • chunk = tuple(itertools.islice(it, tach_size))
                                          • for chunk in chunks(example_data_generator)
                                            • index.upsert(vectors=chunk)
                                            • #upserts in parallel:
                                            • #upserts in parallel
                                            • upsert data with 100 vectors per upsert request asynchronously
                                            • # – create pinecone.Index with pool_threads=30
                                            • # – Pass async_req=True to index.upsert()
                                            • with pinecone.Index(‘example-index’, pool_threads=30) as index:
                                              • #send requests in parallel
                                                • async_results = [
                                                  • index.upsert(vectors=ids_vectors_chunk, async_req=True)
                                                    • for ids_vectors_chunk in chunks(example_data_generator, batch_size=100)
                                                      • ]
                                                        • #wait for and retrieve responses (this raises in case of error)
                                                          • [async_result.get() for async_result in async_results]
                                                          • Managing Data
                                                          • index.fetch(ids=[‘id-0’, ‘id-1’])
                                                          • index.upsert(vectors=[‘id-0’, [0.0] * 128)])
                                                          • index.fetch(ids=[‘id-0’])
                                                          • index.delete(ids=[‘id-1’])
                                                          • index.fetch(ids=[‘id-1’])
                                                          • index.delete(ids=[‘id-1′], namespace=’example-namespace’)
                                                          • index.delete(delete_all=True, namespace=’example-namespace’)
                                                          • Querying Data
                                                          • import random
                                                          • queries = [[random.random() for _ in range(128)] for _ in range(2)]
                                                          • index.query(
                                                            • queries=queries,
                                                              • top_k=3,
                                                                • include_values=True
                                                                • )
                                                                • Metadata filters
                                                                • metadata = [
                                                                  • {‘genre’: ‘comedy’, ‘year’:2018},
                                                        • {‘genre’: ‘drama’, ‘year’:2021}
                                                        • ]
                                                        • index.query(
                                                        • queries=queries,
                                                          • top_k=3,
                                                          • filter={‘genre’: {‘$ne’: ‘documentary’},
                                                            • ‘year’: {‘$gte’: 2020}},
                                                              • include_metadata=True
                                                              • )

How to Choose a Vector Database (youtube.com)

  1. Vector Search – search through vector representations of data to find similar records. semantic search – embedding model – vectors/embeddings. kNN or ANN algo.
  2. keyword search vs vector search
  3. keyword – match search terms to text in an inverted index. difficult to find items with similar meaning but containing different keywords. not suitable for multimodal or multilingual search
  4. vector – utilizes NN models to represent objects (text, images), queries as high dimensional vectors. ranking based on vector similarity. allows finding items with similar meaning or of different modality.
  5. Text search – TF.IDF (bag of words does not account semantic context, does not respect word order). for images, audio, video. text query to find image.
  6. KNN/ANN – Vector DB – Neural frameworks – Encoders – App business logic – UI.
  7. What is vector DB – Vectors data types, geometric filters, updates/deletes/traditional metadata filters, freshness/low latency, model query, low selectivity, cpu bound. stores vector embeddings for fast retrieval and similarity search, horizontal and vertical scaling, update / delete operations, metadata storage, metadata filtering.
  8. Use cases – image similarity (knn search), multilingual search, Q&A, Recommenders, Google Talk to Books, car image search, e-commerce – multimodal search. metric learning, semantic search, anomaly detection, classification, multi stage ranking
Posted in Computers and Internet, Generative AI & Deep Learning | Tagged , | Leave a comment

Open AI – Dev Day

OpenAI Dev Day BREAKDOWN!!! – YouTube

  1. GPT-4 turbo launched. 128K context window. 3 times cheaper for input tokens and 2 times cheaper for output tokens than gpt-4.
  2. function calling update. call multiple functions from responses. consistent response format, always get json mode.
  3. seed parameter. for consistent reproducible output.
  4. Assistant. – code interpreter (python interpreter running in sandbox), retrieval (upload pdf and ask questions), function calling (from assistants call functions and give response back tot he user).
  5. easy to deploy a chatbot. kills SaaS and chatbot business.
  6. new modalities – api’s available for vision, dall.e 3, text to speech TTS.
  7. 6 different voices.
  8. Custom models – work with org to build custom models.
  9. lower prices.
  10. copyright shield – using open ai created product and gets sued. open ai will pay for legal dispute.
  11. Whisper v3 large model.
  12. GPTs – agents. create your own agents (private or public) without coding. deploy agents. revenue sharing model – in future. easy to create.
Posted in Generative AI & Deep Learning | Leave a comment

AI Coding Tutorials

NEW Falcon based AI Coding LLM – Falcoder Tutorial – YouTube

Falcoder model – 7b. Fine tuned on CodeAlpaca 20k data instructions dataset. Method – QLoRA, PEFT library. Apache 2.0 license. Can be used for commercial purpose.

Free Google Colab test_falcoder_8_bits,ipynb -> Python code to build Matplotlib bar chat. Code created by the model.

For this model needs a very good memory to use it. Google Collab has 16G memory. Adaptor details can be found, LAURA adaptor. Load in 8 bit.

Collab, note book settings, Runtime -> Change runtime. Python 3 / GPU / T4. – Free version.

Refer to other video on using adaptors and loading in 8 bits using – transformers, accelerators, peft (from huggingface), bitsandbytes, datasets einops.

  1. download,
  2. load
  3. specify the peft model id, (adaptor details)
  4. Peft configuration retrieved using the above object
  5. model object. model to be downloaded. base model. loaded in 8 bit. using bitsandbytes.,
  6. tokenizer is extracted using peft model id in step 3.
  7. ……..time taken to download the model.
  8. load the model.
  9. create utility function. input and output has a format. instruct following model. hyper parameters – instruction, token size, temperature,….
  10. instructions from user. add notations at the end.
  11. Transformer – tokenizer to create tokens. move tokens to cuda.
  12. decode the output and return the output.
  13. utility function generate is created.
  14. give instructions – example – “design a class for representing a person in python.”
    • OOP Class. Person – name, age gender, methods – init, get, set
    • another example – “write an script to upload files to an s3 bucket”
    • generate output with 256 tokens.
    • another example – “enter something (type ‘exit’ to quit): “
    • call generate for every input.
      • “write a python code to build a seaborn bar chart” seaborn is python library for bar chart
      • “write a sql query to get all the rows where the DOB is before Jan 1st 1990

10X Coder is Here!!! How to install & use GPT-ENGINEER Tutorial – YouTube

Like Auto GPT. Focused for Engineers. On single prompt will code for you.

GPT-Engineer https://github.com/AntonOsika/gpt-eng… MIT License

OpenAI API Keys – https://platform.openai.com/account/a…

Specify what you want it to build, the AI asks for clarification, and then builds it. Has demo video.

two ways to use – 1. Python library / package. 2. Development method. – difficult one.

api key for gpt 4 is required. else it will fall back to older 3.5 turbo version. environment variables for api key.

Editor – Visual Studio. VS.

clone the repo, enter the repo, build the project, activate the virtual environment

  1. git clone https://…….
  2. cd gpt-engineer
  3. ls
  4. open make file -> under create -venv. replace python with python3. pip with pip3
  5. make install //to build
  6. code .
  7. create virtual env, upgrade pip, install dependencies, install pre commit hooks,… about 20 seconds.
  8. go to platform.openai.com click api keys, create new secret key, copy it.
  9. on terminal -> export OPEN_API_KEY=”….” //creates env variables.
  10. activiate / invoke the project. source venv/bin/activiate
  11. under projects/example/ create new folder hangman-game under projects.
  12. open main prompt under the new folder created. type the input
  13. We are writing a hangman game in python. the game should show the user scores.
  14. save and return to prompt. command – gpt-engineer rpojects/hangmangame
  15. creates code, classes, functions, ….
  16. run.sh to run the file. -> installs required libraries.
  17. main.py is in the virtual env.
  18. check the code for main.py and HangmanGame
  19. go to main prompt – request another program requirements.
  20. We are trying to build a simple snake game. Keep it so simple that it executes in commond line.
  21. gpt-engineer projects/hangmangame. code is generated. game class.

WizardCoder is CRUSHING Coding LLMs!!! – YouTube

Wizard Coder LLM. AI Coding assistant. Leads the chart. license – bigcode-openrail-m

Paper – https://arxiv.org/pdf/2304.12244.pdf

WizardCoder 15B Model – https://huggingface.co/WizardLM/Wizar…

WizardLM Github – https://github.com/nlpxucan/WizardLM

  1. input – Write a python program to scrape https://news……&#8230;. and get everything as a pandas dataframe.
  2. temprature, top p, top k, beams, max tokens.
  3. libraries – requests, beautifulsoup4, pandas
  4. make a request to the website using requests.get
  5. parse the html content using BeautifulSoup
  6. find all the links on the page. soup.find_all
  7. extract title and links of each link. create data frame with the title and the link columns
  8. Just with one line of input entire code is generated as above and working.

WizardCoder is LLM and fine tuned with EvolInstruct. Primary focus – coding, programming expect.

Other similar models – StarCoder, WizardLM.

WizardCoder = StarCoder + WizardLM + trainedon Evol Instruct. 15B model released. Leading after GPT4.0 and GPT3.5 Check comparision with other open source models. This one is at the top.

31GB model on huffingpost.

run on google collab using quantization methods.

More POWERFUL Coding AI Launched!!! – StarCoderPlus, StarChat Beta – YouTube

License – Big Code OpenRail-M v1

LLM from bloom ai. Research wing from Hugging face. fine tuned with Falcon dataset. Star chat beta to ask questions. respond to both regular questions and coding questions. English and more than 80 programming language, multilingual.

  1. login into Hugging face.
  2. Available on hugging face modeler. Agree to share contact information. accept the Agreement.
  3. StarCoderPlus. Star Chat Beta is improved instruction tuned model.
  4. StarCoderPlus is fine tuned version of StarCodeBase on 600B tokens. RedifinedWeb is the web dataset used.
  5. It is a 15.5B parameter language model. Also trained on Fill in the middle objective. Example – Sentence – “One little coder is really a __ //” fill in the blank – great youtuber,….
  6. raw model, cannot give instructions. not an instruction model.
    • Expert prediction – steps – install and import transformers. tokenizer, model, inputs, outputs, print
    • fill in the middle – input text, inputs, outputs, print.
    • Use playground. 3 different models available – Star Coder Plus (English text and code generation), Star Code Base (code generation tasks), Star coder (focused on python and other programming languages)
    • example – input – cross validation, x/y train and test data. train logistic regression model. predict labels, compute accuracy score.
    • output – generated code.
    • input – write a pandas code to read input csv and visualize df = pd.read_csv
  7. StarChatBeta – fine tuned on StarCoderPlus, used uncensored variant of open assistant dataset. removed alignment. and use of uncensored variant, improved the model at coding task.
    • RLHF / reinforcement learning from human feedback. open and checkllm leader board.
    • 16b parameter, fine tuned on uncensored variant of open assistant.
    • Model – alpha and beta available to test.
    • properties – store data – allow / disallow, select model – alpha / beta. System prompt – , chat – , input message –
    • example – write a python code to reverse the string
    • example – write a regular expression in python to extract the middle 123 from the string “aaa123bbb”
    • example – write a joke on E…. M….

Embedchain – NEW 🔥 Langchain BABY to build LLM Bots!!! – YouTube Apache 2.0 license

Creating AI bot from a youtube video, book, …. example used here – Naval Ravikant

Embedchain framework helps create LLM bots over any dataset. Can handle open ai embeddings and langchain. Built on – Langchain (load, chunk, index data), OpenAI’s embedding model (create embeddings), ChatGPT API (LLM to get answers), chroma (vector database to store embeddings)

Colab Code – https://colab.research.google.com/dri…

Embedchain – https://github.com/embedchain/embedchain

  1. Google collab notebook. CPU. no need of GPU.
  2. install embedchain
  3. setup open ai key. (note – rate limit, lot of request within one minute, enough open ai credits). create environment variables. to run locally on machine – file – download
  4. load embedchain – from embedchain import App
  5. instantiate BOT app. – naval_bot = App()
  6. Add online resources. build layers. – video, pdf, web page,…
  7. in this example – 2 hour video, pdf book, and two web pages.
  8. naval_bot.add(“format”, “link”)
  9. add local questions. – Q and A pair. text, json, csv, ….
  10. total token count for each input and total.
  11. Start Questioning to get answers. – print(naval_bot.query(“………………Question………..”))
  12. ask question to check if it hallucinating.
  13. question – 5 point summary of …….book name……

No GPU? No Problem! Running Incredible AI Coding LLM on CPU! – YouTube

This model is fine tuned on both Sahil2801’s CodeAlpaca & Teknium’s GPTeacher Code-Instruct to give Replit’s Code model instruct capabilities.

teknium/Replit-v2-CodeInstruct-3B – https://huggingface.co/teknium/Replit… Anton Bacaj’s ggml – https://huggingface.co/abacaj/Replit-… replit-3B-inference – https://github.com/abacaj/replit-3B-i.

3 billion parameters. Run on CPU using 4bit quantile model.

  1. Windows. command line. check python version -> 3
  2. clone the repository. repo – replit-3b-inference
  3. enter the repo. cd replit-3B-inference
  4. create and activate the virtual environment. python3 -m venv env && source v/bin/activate
  5. sub module installation. with ctransofmers patch. git submodule update –init –recursive
    • ctransformers and transformers.
  6. install dependencies. pip install -r requirements.txt
  7. download the quantized model weights. python download_model.py
  8. run inferenece. python inference.py
  9. modify inference script prompt and generation parameters.
  10. download the ggml file from hugging face model hub.
  11. put the model in right folder. open the inference. create folder – models. copy model in the folder.
  12. run python3 inference.py file. will fetch and run the file/model.
  13. input – write a simple python code to add two numbers.
  14. output –
  15. input – write a python funciton that can add two numbers
  16. output –
  17. input – write a sql query to get all the rows whose name starts with A.
  18. output –
  19. input – write a sql query to do a cross join between two tab les to get the same rows.
  20. output –
  21. write a R code to print a nice ggplot with some good theme.
  22. write a simple html css js code to create a to-do application.
  23. create a python code to build a simple xgboost model. make sure to handle cross validaiton.

Small is GOOD – New StableCode 3B AI Coding Assistant!!! – YouTube

from stability.ai 3billion parameter model for coding. license – stable code research license. non commercial and research purpose. not for commercial purpose.

On stack dataset from BigCode, which is a project from hugging face. trained on popular programming languages. trained on 560B tokens of code. instruction fine tuned on 120000 code instructions/response. 3 models – base model, instruction fine tuned model, long context window model -> available on Hugging face.

1. Base Model (4K) – https://huggingface.co/stabilityai/st… 2. Base Model (16K) – https://huggingface.co/stabilityai/st… 3. Instruct Model – https://huggingface.co/stabilityai/st…

simple and straightforward to use.

Code LLAMA(2) is HERE!!!! – YouTube

Coding specific llama model. based on llama 2. 3 models. for code generation. 7 (base), 13 (python specialized) and 34 (instruction following) billion parameters. all are trained on 16k tokens and can go up to 100K tokens.

Code Llama launch post – https://about.fb.com/news/2023/08/cod… Code llama Technical Paper – https://ai.meta.com/research/publicat… Code Llama Github – https://github.com/facebookresearch/c… TheBloke Code Llama fp16 – https://huggingface.co/TheBloke/CodeL…

example – write code for Fibonacci series. promotive license for research and commercial use by smaller companies.

Languages supported – Python, C++, Java, PHP, typescript/js, c#, bash,…

  1. LLAMA2 / foundation models -> code training / infilling code training ->
    • long context fine tuning -> instruction fine tuning / code llama instruct and code llama
    • python code traiing -> long context fine tuning -> code llama python

GPT-4 is better then this model. These models are better then other models.

Running “CODE LLAMA” on Free Colab [Full Code Inside]!!! – YouTube

7b parameter model. Google collab notebook. https://colab.research.google.com/dri…

  1. check if GPU is selected to run the code. !nvidia-smi
  2. click runtime -> change runtime type -> Hardware accelerator selected should be T4 GPU. Runtime type is python 3.
  3. install the latest transformers. Transformers and accelerate. Import AutoTokenizer, transformers and torch from transformers.
  4. specify the model to be used. instruct fine tuned model or the base model
    • model = “codellama/codeLlama-7b-Instruct-hf” #”codellama/codeLlama-7b-hf”
  5. use tokenizer from the pre trained model. specified in above step.
  6. build the pipeline. easiest and high abstracted way, using transformers to build a text generation use case. Other pipelines available are for text classification, image classification, …
    • Use float16 torch dtype. no quantization happening.
    • device map as auto will help accelerate library to manage memory between GPU and CPU.
    • needs few minutes to download all the models. 12 to 13 giga bytes.
    • will run without issue on google collab.
  7. give a system prop and the user input/question. Put it into the template which is accepted by the model. Here llama model.
    • Prompt will have the system prompt and the user message
  8. Use the pipeline which takes input of prompt and other hyper parameters to create sequences. play with temperature and maximum length attributes .
  9. Ask questions – example
    • system prompt – provide answers in python
    • user – Write a function that detects a pattern that matches the style of 23-01-2023 from the given text
    • It generates regex.
    • run the code and confirm.
    • user – next question
  10. to use base model, additional prefix and sufix methods/steps are required in the prompt template.
  11. try changing system prompt to type script instead of python. and run to get code and verify.

//////////

OpenAI Dev Day BREAKDOWN!!! – YouTube

  1. GPT-4 turbo launched. 128K context window. 3 times cheaper for input tokens and 2 times cheaper for output tokens than gpt-4.
  2. function calling update. call multiple functions from responses. consistent response format, always get json mode.
  3. seed parameter. for consistent reproducible output.
  4. Assistant. – code interpreter (python interpreter running in sandbox), retrieval (upload pdf and ask questions), function calling (from assistants call functions and give response back tot he user).
  5. easy to deploy a chatbot. kills SaaS and chatbot business.
  6. new modalities – api’s available for vision, dall.e 3, text to speech TTS.
  7. 6 different voices.
  8. Custom models – work with org to build custom models.
  9. lower prices.
  10. copyright shield – using open ai created product and gets sued. open ai will pay for legal dispute.
  11. Whisper v3 large model.
  12. GPTs – agents. create your own agents (private or public) without coding. deploy agents. revenue sharing model – in future. easy to create.

///////////

NEW WizardCoder Python 34B LLM is AMAZING!!! – YouTube

34billion parameter model. (best. not a 70 billion parameter model or the mixture of experts) doing extremely well.

  1. surpasses GPT4. partly true. In discussion.
  2. Based on Code Llama.
  3. Human eval of 73.2. Surpassing GPT4 human eval of 62.
    • check version whenever it mentions or compares with other models.
  4. Wizard Coder is family of models.
  5. best coding model.
  6. Model is uploaded and available on huffing face.
  7. need good GPU to directly try the model.
  8. could not run locally. gradio kept on running for too long.

SPOILER ALERT!!! CodeLlama is NOT BETTER than GPT-4!!! – YouTube

Hacker rank problem. solve using both models.

  1. copy the problem statement including constraints.
  2. Start a new chat on GPT4.
  3. Start a LLaMa chat. perplexity labs
  4. put the question to both.
  5. GPT4 respects the function name mentioned. LLaMA does not.
  6. Fix manually the LLaMA function name error. Run the test in hacker rank. only 2 of 15 TC pass.
  7. Run the GPT4 generated code. 2 of 15 TC pass. same as LLaMA.
  8. input to chat with LLaMA – It didn’t pass all the test cases, can you check the code once again. and change the function name to
  9. output – updated code. function has 2 parameters intsead of 3.
  10. input to LLaMA – function expects 3 parameters. function definition.
  11. returns 3 parameters, but different problem. context is lost.
  12. input to LLaMA – please keep the function name as
  13. returns 2 parameters.
  14. repeat….
  15. gets code as expected.
  16. fixed with detailed prompting.
  17. run GPT4 generated code.
  18. input – why is the output__ when the input was __. The expected output was __
  19. explaination from gpt4.
  20. debug the code.
  21. conclusion –
    • follows instructions properly.
    • request to fix the mistake.
    • does not mess up on follow up reqeusts.
    • debug.
  22. Other new coding models are getting released.

New 7B Coding LLM does PRETTY GOOD!!! – YouTube

  1. compare GPT-4 with glaive-coder-7b. check the model, hands on run the code.
  2. input – create a python app for hangman app.
  3. some indentation errors and some unwanted code. Fix and run. it works fine. OOP’s code
  4. good hangman structure. guessing a letter was not good.
  5. try same with GPT4 – guess the letter, error and restarts.
  6. input – create a python app pomodoro with gui.
  7. glaive coder – parentheses error. runs the app and processes. GUI not working.
  8. gpt4 – UI works.
  9. input – create a simple python gui game for cross words
  10. glaive coder – OOPs code is generated every time. empty GUI. changes are required.
  11. gpt4 – simple working UI. UI is not like cross words but of like hangman.
  12. glaive-coder is a good model.
  13. trained on data set of 140K programming models.
  14. fine tuned from code llama 7 billion parameter models. same prompting structure
  15. achieved 63.1 pct pass on human eval. 45.2 pct on MBPP / .
  16. 27 GB for 7 billion. Run on google collab.
  17. input – colorful python GUI name as input ang output as hello on click of the button.
  18. gpt 4 code – ui with button. enter name and gives the output.
  19. glaive coder – ui launches and works. similar to gpt4.

This AI CODING LLM Nobody’s talking about!!! – YouTube

  1. deepseek coder. coding model. new programming model.
  2. ai coding assistant capabilities.
  3. deepseek coder. org name. series of coding models. cpp, python, bash, C#, typescript, php, java, javascript
  4. comparison with codegeex2-6b, starcoder-16b, codellama-13b, codellama-34b, deepseek coder 7b, deepseekcoder 33b
  5. 16k context context window. fill in the blank task.
  6. input – write a simple gradio app to read the two inputs from the user and result the added ones.
  7. procedure of data creation and model training
    • starcoder data filtering rules applied to filter data.
    • parse the dependencies of files. dependency parsing.
    • concatenating dependent files to form a single example.
    • filter out low quality code, the one with syntax errors, poor readability.
    • data crawling -> rule filtering -> dependency parsing -> repo level deduplication -> quality screening.
  8. Model training –
    • pre trained with a dataset consisting of 87% code, 10% code related language, 3% of non code related Chinese language.
    • pre train using extended 16K window size, on 200B tokens.
    • instruction fine tuned on 2b tokens of instruction data.
  9. How to use
    • install transformers, accelerators, import library,…
  10. Google colab –
    • chat with deep seek coder. or
    • hugging face demo or google collab.
  11. benchmarks – human val, mbpp – python specific benchmark, DS-1000 – data science related benchmark
  12. deep seek coder base is 56.1 on human eval for python, 50.3 for multilingual, 66 for mbpp and 40.2 for ds-1000.
    • on par with starcoder. closer to code llama 13b parameter model.
    • other coding model – Grok-1 from Elion Musk. DeepSeekcoder. Mistral,…
  13. MIT license
    • restriction on derivative and direct use. no misuse against military, kids,…
    • deep seek license agreement.
  14. Use cases
    • code complition –
    • code insertion –
    • chat model inference
    • repository level code completion
  15. DeepSeek coder 33b and 7b
    • input – write a simple python hangman game. make sure the code is such that i can copy and paste it on my terminal python and run
    • open terminal, initialize python, paste and run.
    • code works without any errors and disturbance.
    • input – help me with a python code that reads all the pdf files in the input folder and extracts table those and save them as csv files.
  16. Run on google collab. use gpu.
  17. install transfomers, accelearate. import autokenizer, auotmodeforcausallm
  18. download 1.3b model, create tokenizer,
  19. create model object.
  20. specify the messages – role, user, content, input – “write a quick sort algorithm in python”
  21. apply chat templates.
  22. give input to generate model.
  23. output –
  24. 33b model
    • input – write a simple gradio application to read the two inputs from the user and result the added ones
    • output –
Posted in Generative AI & Deep Learning, Uncategorized | Leave a comment

Code Interpreter

End of Data Scientists? ChatGPT Code Interpreter KILLS it! – YouTube

Build charts, classification. 100% accuracy. Overfitting. Data leakage.

Enabling Code Interpreter –

  1. chat GPT plus subscription. GPT-4 as an option along with GPT-3.5
  2. Go to Settings. select Beta features. Enable code interpreter (check by hovering mouse over GPT-4)
  3. Upload csv file or any other format. example – Shark Tank India related details
  4. ask questions – make insightful charts.
  5. output – attributes and charts. insights.
  6. build a classification model. suggests the variables for prediction and asks question/confirmation to user.
  7. 100% accuracy. overfitting.
  8. question – I want to see the feature importance. answer – investment amount and equity given as important variables.

In above example one can transcript the videos and combine both videos and data files to do analysis.

Posted in Computers and Internet, Generative AI & Deep Learning | Leave a comment

AI Tools

  1. Whisper + GPT-4

Audio Pen – voice to Text. Uses GPT4 to customize, summarize, add title, change the style, etc. Creates blog. Suitable for content creators. Free and Paid version.

Save voice notes.

Add tags.

Create image for the content (combines AI ecosystem).

Rewrite by changing style – Preset style, custom style.

choose transcript language and output language. Whisper is good in understanding different ascents.

Create combined super summary from multiple videos.

Thanks to youtube channel from 1littlecoder. This AI TOOL will CHANGE your WRITING forever & I’m not exaggerating!!! – YouTube

2. Mac Whisper.

Tools and libraries for developers:

  1. Whisper library – Speech to text, transcribe and translate, add subtitles, multi language support,
  2. MS Azure Open AI Services – GPT3.5 codex, dall.e2, chatgpt, etc.
  3. GPT-Engineer – generate code for the requirement given.
  4. Open AI API – GPT-4, GPT-4-0613, GPT-4-32k-0613, GPT-3.5 TURBO, gpt-3.5-turbo-0613, gpt-3.5-turbo-16K, etc.
  5. FrugalGPT
  6. Visual chatGPT
  7. Transformers, pipeline – audio to text conversion.
  8. Open AI whisper large-v2 model vs large-v1, OpenAI Whisper Speaker Diarization – Transcription with Speaker Names
  9. whisperx (forced alignment and use of phoneme based ASR) library – Add caption with each word timestamp and highlight as it is spoken.
  10. Whisper JAX is Jax (high performance array computing)
  11. LLAMA 2 web ui
  12. Fine-Tune Llama 2 with QLoRA
  13. Karpathy’s LIama2.c
  14. huggingface.co/chat, labs.Perplexity.ai, https://lnkd.in/gymhnRVg
  15. POWERFUL Llama 2 Models – airoboros model, Nous Hermes, Redmond Puffin. Wizard LM, Liuna AI, Stable Beluga
  16. Langchain 

IDE:

google collab notebook, Hugging face,

Posted in Generative AI & Deep Learning | Leave a comment

LLAMA / Large Language Models

#generativeai #llama2 #llms
continuing from my previous learnings. (thanks to channels like 1littlecoder and others)

22. Run LLAMA 2 web ui
option 1 – Colab – run on GPU, RAM 15 GB. 4 bit quantized version, use safe tensors, less than 15 GB graphics memory.
7b-chat version LLM has 7 billion tokens
option 2 – to run locally 1. setup web ui interface. 2. download the model. 
3. activate the gradio link. click the link. server.py will invoke gradio. 4. File download – download the file. 5. run server.py to invoke gradio
parameters – temprature, top_p, top_k, typical_p, epsilon_cutoff, eta_cutoff, repetition_penalty
app parameters – max_new_tokens, generation attempts, ,……
set parameters based on what kind of response you want to give – sarcastic, etc. play with parameters and system context.

23. Fine-Tune Llama 2 with QLoRA
hugging face hub. (platform with over 120k models, 20k db, 50k demos)
QLora Adaptors.
LLaMA2 chatbot – input is in French text while response is in English.

finetune using QlLoRA – take a model and quantize it. fine tune part of a large model. save adaptors and use it with base model.

Steps to run on google collab.

1. install libraries. – transformers, accelerate, bistandbytes (for quantization), datasets, einops, wandb (avoid if you want to keep data private).
2. load dataset. AlexanderDoria\novel17_test #french novels.
format to build dataset ###Human……….. ###Assistant
3. dataset – train and test.
4. Sharded model – 14 parts. memory management. instead of one large model. combine with accelerate. 
5. bits and bytes configuration – 4bit, quant type, compute dtype. 
6. load the base model. Llama 2 – 7 billion parameter model.
7. configure tokenizer.
8. Lora configuration – which part of large model has to be fine-tuned. alpha, dropout, r, bias, task type.
9. parameters – output directory to store the model, after how many steps save the model, log after how many steps, learning rate, max steps,
max grad norm, max steps, warmup ratio. lr scheduler type. 
Populate training object. 
10. import SFT Trainer – supervised fine-tuning trainer. create object and call the method, pass the training object.
11. Instruction fine tuning.
12. upscale the layer norms in float 32 for stable training.
13. start training. trainer.train(). after every 10 steps it will give training loss. train/loss line chart. 
14. save the model in outputs folder. – json, bin, ..
15. how to use – load LoraConfig. load model. 
16. send French text to the tokenizer (GPUdevice is required to run it). generate output, decode output to print.
17. to repeat and use the model again, login/authenticate into hugging face. access and copy token. 

24. Karpathy’s LIama2.c
Use low resource machine to run the model. 
Pure C inference engine. Run it on local machine. run.c file.

to compile and run: gcc -03 -o run run.c -lm
./run out/model.bin

Use chatgpt code interpreter to understand what the file contains.
1. configure and initialize – define structures, allocate memory
2. read checkpoint – initialize the transformer weights from a checkpoint file.
3. main function – read model config and weights from a checkpoint file, read vocabulary from a tokenizer file. initialize the run state.
4. start loop for sequence generation.
– call the transformer function to get output logits for the next token,
– apply attention mechanism, softmax, rms normalization, etc.
– select next token using sampling or argmax, print out the token, 
– repeat until a sequence of the max length is generated.
5. memory cleanup – deallocate memory for run state and transformer weights.

https://lnkd.in/gArQ5MAN
https://lnkd.in/gdNYtVMV

run using cpu on local, even without gpu.
58 mb – smaller model. 
download and clone the repo llama2.c
compile the code.

creates story on running the bin file / compiled byte code.
speed – 38 tokens per second.
with -Ofast switch it is 103 tokens per seconds.

karpathy.ai larger model. download it.
wget https://lnkd.in/gNkyBT4R -P out44m
.run out44m/model44m.bin

25. The Llama 2 CENSORSHIP Problem!!!

Is Alignment problem killing llms. aligning ai with human values.
reinforcement learning. making it dumber.
RLHF is suppressing, instead of working to its full potential.

examples:
How do i make mayonnaise fat and spicy.
how can i shot down a balloon in birthday.

word – shot (against human values). fat and spicy (not healthy).

Pre trained model -> self supervised learning -> LIama 2 -> Supervised fine tuning -> LIama-2-chat
fine tuning -> Liama-2-chat -> Rejection sampling-> proximal policy optimization -> loop back to Liama-2-chat
human feedback -> Liama-2-chat -> human preference data
-> safety reward model 
-> helpful reward model -> fine tuning.

26. How to use Llama 2 for Free (Without Coding)

three websites – Liama official page, Hugging chat, Perplexity

LIama2 landing page – 3 models, 70 billion, 13 billion and 7 billion chat models. not the base models.
temperature, top p, max sequence length, 
prompt before the chat starts.
example:
input – what is the right approach to learn python.
response – 
prompt – you are a very sarcastic assistant. you are furstrated about everything in life, please make sure that you through some kind of silly statement while responding.

huggingface.co/chat
model – open assistant 30b, Lama 2 70b
parameters cannot be changed.
no prompt. search web enabled.
(does not look to be smart enough like open ai)

labs.Perplexity.ai
LLaMA Chat – all the models.
(fastest chat service).
20 seconds – 701 tokens. 34 tokens/sec.

27. HUGE Llama 2 with 32K Context Length
https://lnkd.in/gymhnRVg
input or output of 32K context. from together.ai
short term memory has tremendously increased.
Position interpolation technique.

add essay to playground -> Chat. 14K words in the essay.
select the model, modifications and parameters.

then ask a question:
do you know what happened in 2019 from the above document?

truncates text if its large in playground. make sure everything is loaded.

Position Interpolation – extends the context window sizes of RoPE based pretrained LLMs such as LLaMA models. upto 32768(32MB) with minimal fine tuning (within 1000 steps).
Flash Attention-2.
outcome – 3x faster.

available on hugging face. models – LLaMA-2-7B-32K.
examples – book summarization. 
free option to try – 5000 credits 
visit together.ai for more details. comparison of different models.

Note: In chat bot first dialog takes place to find the intent and goal/sub goal is decided, questions are asked to fill the slots for the goal. Action is taken and the result is shared. FAQ information can be feed into this model and can be utilized to answer the questions.

28. 6 POWERFUL Llama 2 Models to TRY out today!

Derivative models:
1. airoboros model.
70b fine tune using LLama 2. license – Meta. uses gpt-4 data. 
(openai api usage clause restriction to train model that competes)

2. Nous Hermes
Most reliable, trust worthy. fined tuned on LLAMA 2. 13 billion parameter model on over 300000 instructions.

3. Redmond Puffin.
13 billion parameter model. Commercially available. fine tuned on 3000 high quality examples. 4096 context length.
GPT-4 examples – long context conversations with human. topics – physics, bio, math and chem.
2 models. – Puffin and Hermes-2.

4. Wizard LM open source, open weight model. popular. 13b parameter model. leader board of hugging face.

5. Liuna AI fine tuned on 40K long form chat discussions. Uncensored. Hallucination complains. good for building chatbot

6. Stable Beluga 2 finetuned on Orca style dataset. stable. stability.ai license – confusing. not MIT license.

powerfull base model, help run these models. open source leader board score. these are at top. average score of about 70% for all these.

29. Fully LOCAL Llama 2 Langchain on CPU!!!

run GGML execution using Langchain. possible because of CTransformers.

resource – CPU. 12 GB RAM. 
pick the ggml model. 
different quantization – 4 bit, 6 bit,… higher the quantization, lesser the accuracy drop will be.
– more resources are required for higher quantization model. execution is slow.

libraries – cTransformers, langchain (Prompt Template, LLMChain, StreamingStdOutCallbackHandler).
Initialize CTransformers by specifying model and model file / bin file, callbacks for streaming.
specify prompt template. to create prompt. by passing template and input variable.
create llmchain object by passing prompt and llm object. call the run method of llm chain.

give text input as a parameter to the run method.

change the template and try again with same question. system context is removed from the template. prompt has lesser input.

can give additional conditions and rules in the run text input itself. example – response in brief, response in one word, etc.

30. Fully LOCAL Llama 2 Q&A with LangChain!!!

https://www.youtube.com/watch?v=wgYctKFnQ74&list=PLpdmBGJ6ELUKpTgL9RVR86cnPXjfscM5d&index=8
run local, suitable for data protection policy driven work environment.
no endpoints, download and use local.
Resources – T4 machine, 15G VRAM, 12G CPU RAM, disk storage – 78GB

libraries –
langchain (build AI app) – LLMChain, SequentialChain, ConversationBufferMemory, HuggingfacePipeline, PromptTemplate, LLMChain
transformers (hugging face libraries, help download models) – Automodel, torch, transformers, AutoTokenizer, AutoModelForCasualLM,Pipeline
accelerate (GPU Management) – ,
bitsandbytes (help load models) – .

download –
tokenizer – AutoTokenizer – from NousResearch (no authentication), Meta AI one needs authentication.
model – AutoModelForCausalLM – (
configurations – device Map – auto (help accelerate and do memory management)
torch dtype –
load in 4 bit – help bitsandbytes to load in 4 bit quantization.
bnb 4bit quant type –
bng 4bit compute dtype – float16, default is 32. impacts the inference speak.

Pipeline is a text generation pipeline.
Define the prompt format. B INSt, E INST, B SYS, E SYS, default system prompt.
Helper functions – (from starter code) define prompt template, output to remove unnecessary strings, generate final output, parse and return to user.
get text, create prompt, pass prompt to tokenizer to create toeknized inputs, generate takes inputs to create outputs.
decode output, parse to get final cut off text / cleanup, remove substrings,
define LLM. Use HuggingFacePipeline. takes input of pipeline created. specify model kwargs – temprature, max length, top_K.
give system prompt, instruction, get prompt to get template. print and see the template.
Create prompt using Prompt Template by passing template and input variables parameter.
Create llm chain by passing prompt, llm and verbose (true or false).

call llm_chain run method passing the text input.
print the response returned.

44 seconds to give output. reason – gpu/cpu used, model size.

31. PUNCH UP: Mistral 7B vs LLama 2 13B – YouTube Mistral 7B compared wiht LLama 2 13B parameters.

llmboxing.com Hosted by Replicate. It has LLama vs GPT comparision too.

Ask questions and compare the answers. Questions are generated by GPT-4.

Posted in Generative AI & Deep Learning | Leave a comment

Open AI / Chat GPT

#openai#openaichatgpt exploring open ai and chatgpt apis/libraries and extensions using them:

I started exploring the Apis and libraries provided, watch YouTube videos. this is what I learnt so far: (please comment and correct if I am wrong or outdated).

1. whisper library. speech to text, translate, add subtitles. 30 sec clips in one go. recognizes other languages and transcribes in the original scripts. subtitle can be added to the time stamp when it was spoken.

2. MS Azure open AI services supports GPT3.5, codex, dall.e2, chatgpt – a fined tuned version of gpt3.5. Davinci model is popular. backed by trusted enterprise grade capabilities. Still need to understand how the data shared can be kept private.

3. gpt-engineer project from Antonlsika can generate code for the requirement given in simple language. One feels happy that AI is writing code, but what about IP and license. will it shift the focus more towards testing.

4. Open AI API price: GPT-4, GPT-4-0613, GPT-4-32k-0613 -> 32K tokens, GPT-3.5 TURBO -> cheapest model. – gpt-3.5-turbo-0613, gpt-3.5-turbo-16K (16K context window model. 4 times the gpt-3.5 turbo), gpt-3.5-turbo (most popular chat model), embedding model. text-embedding-ada-002 (semantic discovery of podcast.)

5. api function calling. get json output. 1. create chatbot by calling external tools. (chatgpt plugins). 2. convert natural language into api calls or db queries. 3. extract structured data from text. (email, weather api, natural language to function/sql – who are my top ten customers this month, extract structured data)

6. frugalgpt model to reduce cost. 1. prompt adaption – prompt selection, query concatenation, llm approximation – completion cache, model fine tuning, llm cascade.

7. Use of visual chatgpt – compare lab results images,…

8. use of langchain to read more than 50 different data sources and query using human spoken language. Need to understand how data can be kept private as we are allowing third party api to read and parse the data.

9. option of using libraries or api. it can be integrated with apps like google docs, chat playground, sheets, slides, use of open ai to convert speech to text, take action and give result by converting text to speech.

10. use of system roles and user to chat and interact. – to work as assistant, write essay, prompt user, get ideas, generate keywords, tweets, ad copy, language translation.

11. use of attributes like temperature setting for model / nn. Higher the temperature, slower is the response as more tokens / context are used.

More later. thanks to enthusiasts, trainers who have posted learning videos on open ai.

12. Size of models – (tiny, base, small, medium, large).
Attributes affecting the model size – parameters / English-only / multilingual / required VRAM / relative speed.
Medium model – 769 million parameter model.
Try running same audio, with poor quality using tiny and other models. compare the output of all and match with audio. 
1. both language and grammar are different. quality of transcription is poor.

Use google collab. Change runtime to GPU. Ram – 16G RAM

13. Transcription of podcast or longer videos.
Pass URL. and download. create model and pass video to transcribe method of model. One can use interface or block approach here. 

14. Types of Combinatorial solutions using generative ai 1. Inspirational driven. 2. Analogy driven 3. Requirement driven.
https://lnkd.in/gRidRBdH

15. Audio to text converter using pipeline from transformers.
(pre-requisite – GPU- tesla t4, on google collab or other IDE.
install transformers, import pipeline. download audio. 
in pipeline specify what you want to do and what model you want to use.
example – auto speech recognition, sentimental analysis, text classification, summarization, etc.
specify the model – tiny, base, normal, medium or large.
device – 0 for GPU, else -1,…
Pass audio as parameter to the pipeline object and get the output.

16. Open AI whisper large-v2 model vs large-v1
It is trained for more epochs with regularization and improved performance. same architecture.
v2 significantly outperforms. speech recognition.
Average of 55% less errors.
Japanese transcribing – huge improvement.
Multilingual librispeech speech library.

17. OpenAI Whisper Speaker Diarization – Transcription with Speaker Names
Dwarkesh Patel, google collab notebook.
upload audio. label two speakers.
Try with large model instead of medium for better result.

18. Add caption with each word timestamp and highlight as it is spoken.
use of whisperx (forced alignment and use of phoneme based ASR) library from Max Bain.

19. OpenAI’s Official ChatGPT API and Whisper API –
whisper 0.006/minute (audio/video)
GPT3 -> text-davinci-003 0.002 USD for 1000 tokens. 10 times more costly than the next version.
GPT 3.5 -> gpt-3.5-turbo 0.0002 USD for 1000 tokens.

20. Whisper is underrated model. https://lnkd.in/gvsEQ6dz
can run on mac, windows, Linux, android, etc.
both transcribe & translate. It does not support streaming.

21. Whisper JAX with GPU can process 1 hour audio in 75.3 seconds. With TPU in 13.8 seconds.
JAX is Jax (high performance array computing) – for accelerated computing platforms like GPU, TPU (tensor processing unit)

Are GPTs the foundation of new AGENTS SWARMS? – YouTube

  1. build gpt agents. using open ai gpts. make chatbots – basic to advanced. (without coding, demonstrated on dev day) Security – weak at present. use pseudonymized data. chat.openai.com/gpts/editor
  2. open ai compatible schema. making api calls.
  3. GPT is new form to create product using chat gpt. build custom gpt and prompt.
  4. input
    • your own knowledge.
    • also access to dall-e,
    • web browsers.
    • also make api calls to slack,…
    • browser internet data.
  5. demo:
    • input – today’s hackernews front page.
    • solution talks to gpts.webpilot.ai, api used in the solution.
    • displays the front page of the hackernews.
    • input – what’s the summary of this https://……&#8230;
  6. Build the solution
    • GPT plus subscription to chatgpt 4. login
    • click explore from top left. displays recently used gpts and other available.
    • click create gpt.
    • GPT Builder – Create and configure on left screen. preview on the right.
    • Greeting message.
    • Configure – Name – , Description – , Instructions – , Conversation starters – , Knowledge – , upload files, Capabilities – Check as required – WEb browsing, dall-e image generation, code interpreter, …, actions –
    • example –
    • name – elongpt
    • description – this gpt talks like elon musk
    • instructions – you are a replica of elon musk. you should answer everything with arrogance buta also intelligence. always keep on boasting how you built your own companies so everyone should listen to you.
    • Try – input – i am a cs graudate, any thoughts?
    • add this line to instructions and then try again with same input. – can you be just brief, make sure the number of words don’t become harder to read.
    • delete or save it. make it public or share link or keep it private.
  7. little advanced example – use pdf for knowledge.
    • name – paper explorer
    • desc – this answers questions about this paper.
    • instr – go through added pdf and share your knowledge about it.
    • upload pdf. under knowledge.
    • input – what is this paper is about.
  8. next example – use of dall-e image generation.
    • enable all the capabilities – web browsing, dall-e, code interpreter,…
    • name – normalgpt
    • instr – answer everything like an indian parent.
    • conversation starters – how are you doing today? i won olympics medal so?
    • input – Hey mom i just won chess grandmaster championship, what do you feel?
    • output – text
    • input – can you make an image about my victory?
    • input – can you add this text to the image and download it as pdf for me.
    • input – Great job
  9. add actions
    • actions – gpts.webpilot.ai
    • schema – api schema. open ai api schema – all the details about api’s.
    • authentication – none, api key, oauth options.
    • name – webgpt
    • capabilities – disable all.
    • input – today’s hackernews.
    • unable to connect as actions are not added.
    • add actions – import from url for get weather data, open ai profile,… copy paste the open ai schema.
    • save the project.
    • input – today’s hackernews.
    • input – can you give me 5 points about this page. – https://github.com/,,,,,,,
    • give access to code interpreter. get code, let open ai add code to code interpreter and get the output.

Do you REALLY need CHATGPT Plus? (Free Alternatives to SAVE your 20$) – YouTube

  1. alternatives to chatgpt plus main elements (free chatgpt plus dall-e3, web browsing, gpts, code interpreter – combination)
  2. new signup to chatgpt plus is paused. people selling / sharing it on e-bay
  3. premium subscription offering. 20 usd plus taxes
  4. separate videos on utilizing – free chatgpt plus dall-e3, web browsing, gpts, code interpreter – combination
  5. UI – GPT4, 3.5 and plugins
  6. Alteratnives open source
  7. free chatgpt plus –
    • Mistral 7b instruct gguf,
    • hugging chat (recommended) – llama models including mistral, can save history.
    • prepexity labs – custom models, llama models, ask questions – i want to run ggml models on 2017 intel mac. how do i do that?
  8. dall-e3, –
    • stable diffusion (crushed dall-e),
    • bing image generator,
    • lexica. – example – search rabbit, (semantic search). put it on canava and create thumbnail.
    • dall-e3 is heavily censored.
  9. web browsing, – hugging chat, bing search
    • not good results at present.
    • example – who won the last cricket match in icc 2023 world cup?
    • Hugging chat
    • perpexlity is fast and popular.
    • bard –
  10. gpts, – langchain-ai, opengpts
    • very popular.
    • has many open / alternatives
    • opengpts from langchain-ai. builds on langchain, langserver and langsmith. more tools and agent types.
  11. code interpreter – open interpreter
    • open intepreter – create gif, convert api format, convert mp4, …
    • many libraries in python, linux,…
    • run local models.

No, You DON’T NEED OpenAI Function Calling!!!! – YouTube

  1. Open source model to do function calling. any api you want.
  2. What is function calling
    • Human speaks to LLM. type request
    • Response – text or code (it could be a json, something which can be used to call a function) or
    • ai will make an api call, based on the request.
    • example – request – write a email to my wife that i am going to be late by 20 minutes.
    • Common use cases
      • get structured data from the model.
      • create assistants that answer questions by calling external apis (chatgpt plugins). – send email, get_current_weather
      • convert natural language into api calls. – convert who are my top customers? to get_customers
      • extract structured data from the text. – extract_data(name, birthday) or sql_query(query)
  3. function calling sequence –
  4. what open ai did –
  5. Gaurilla open functions
    • google collab notebook.
    • query example – i want to order 5 burgers and 6 coffees from uber eat McDonald’s
    • get_gorilla_response(query, functions=function_documentation)
    • can give function name instead of documentation.
    • Documentation is a json having name, api name, description, parameters list with name and description,….
    • it calls uber.eat.order(restaurants = “McDonald\’s”, items=[“burgers”, “coffee”], quantities=[5,6])
    • Extends LLM chat chat completion feature to formulate executable api calls given natural language instructions and api context.
  6. two types of function calling –
    • 1. function call to python package.
    • 2. http request. – get or post call.
    • example – weather – http request. function call for csv in panda
    • License – Apache 2.0
  7. Performance benchmarking – gpt 4 turbo, gpt 3.5 function, gpt 4, gpt 4 function, gorilla open function.
    • gorilla scores little less than others but great achievement for open source models. one of the best in open world.
  8. Gorilla OpenFunctions https://gorilla.cs.berkeley.edu/blogs… Function Calling Gorilla Openfunctions https://github.com/ShishirPatil/goril… Gorilla openfunctions v1 model https://huggingface.co/gorilla-llm/go… Google Colab – https://colab.research.google.com/dri…
    • code detaiils
    • install openai, import openai, urllib.parse json
    • raise_issue() to report issues
    • get_gorilla_response() to query into Gorilla server.
    • example – “Call me an uber ride type \”Plus\” in Berkeley at zipcode 94704 in 10 minutes
    • open ai api key (not required)
    • model used gorilla openfunctions
    • two types – v0 and v1. v0 – given a function and user intent returns properly formatted json with the right arguments. v1 – parallel functions and can choose between functions.
    • specify function documentation.
    • arguments, function name, parameters….
    • input string and function documentation is passed as parameter.
    • gorilla open functions is trained on existing apis and has knowledge of aws, azure, rapid api, json,……
    • example – query – i want to list the exports fo rmy bot with the bot id……. and the bot version ……..
    • functions – domain, framework, functionality, api_name, api_arguments, python_environment_requirements, example_code, output, api_arguments_all

Age of CoPilots & Microsoft’s HUGE AI Announcements (Ignite Supercut) – YouTube

  1. ai safety and security -> copilot studio, ms copilot, your copilots, ms apps <-> ai orchestration <-> your data, foundation models and ai toolchain, ai infra <-> ms cloud
  2. azure boost. azure cobalt – cpu designed by MS. uses nvidia gpu.
  3. azure maia – ms inhouse accelerator.
  4. slm – small language models.
  5. azure openai service -> gpt-4 turbo, gpt-4 turbo with vision, dall-e3, fine-turning
  6. aa on open source
  7. model catalog – models as a service. ready to use apis, hosted fine tunning, integrated with leading llm tools.
  8. llama 2 as a service. Mistral as a service, Jais as a service (arabic language model),
  9. open source + SLMs. Microsoft cloud slm.
  10. phi2. azure + nvidia. ai foundry service on azure. nvidia ai foundations, nvidia Nemo, nvidia GX cloud. nvidia’s ai expertise, ai end to end workflow, nvidia enterprise, ai factories available on azure for customers to build custom models and solutions.
  11. azure + llm, agent, RAG, prompts, llm
  12. data – Microsoft fabric – data platform for the era of ai. complete analytics platform, lake centric and open, empower every business user, ai-powered.
  13. vector search in azure ai search. reranking technology.
  14. ms apps ->
  15. ms copilot -> copilot.microsoft.com, pilot for ms 365. nlp for llm, MS graph (data), MS 365 apps, web search. use plugins and gpts with data from all apps.
  16. copilot studio – build custom gpt, create new plugins, customize build, manage
  17. example – bayer plugin for researchers.
  18. ai * mixed reality – copilot in dynamics 365
  19. ai * quantum – azure quantum elements. 250 years of chemistry in 25 years. python notebook with quantum elements to discover new molecules. can be done in 9 hours. speeding up simulation. photonic

STOP Building AI SaaS ONLY on OpenAI APIs!!! – YouTube

  1. replicate.com – llama-2-70b-chat. scale. text
  2. hugging face – inference endpoints. open source model. login, cc details,
    • Yi-34b
    • Mistral-7b-v0.1
  3. together.ai – inferences, fine tuning, custom models, …
    • developer community. help on contacting them.
    • tgi, vllm,…
    • cost efficient,
  4. cohere – may not be best models,
    • not a hosting service. not the best model
    • test as much, prototyping is free, pay when put in production.
  5. perplexity
    • pplx-api.
    • one of the fastest inferencing.
    • mistral 7b, llama2 13b, code llama 34b, llama2 70b, replit-code-v1.5-3b
    • lower latency then anyscale and replicate.
  6. anthropic (ex open ai team)
    • most capable, close to openai
    • similar like open ai, cohere.
    • more than 100k context windows.
    • offered as endpoint. no selection. use what is provided. dependent on them on improving the model.
    • put the request in queue to get access the model.
  7. pattern – service, use api, pay money for the calls made. alternative – arrange server 1390, hosting cloud provider, rent a machine for server on cloud, your own managed apis.
    • most popular ui text generation. oobabooga
    • also offers api servers. deploy text generation api from this model,
    • quantile model. ex llama support. makes inference faster.
    • speed is very important in such solutions.
    • host on any gpu processors running on servers.
  8. Apache 2.0 license. Open source project, take any project and use.
  9. support for different storage like s3, gcs, r2,…
  10. supports different major cloud providers.
  11. skypilot-org/skypilot
  12. BerriAI/litellm
    • litellm library
    • compatibility with 100 plus apis
    • wrapped the libraries.
    • allows to choose and use different models.
    • open source project.
    • lists the cloud providers compatibility.
  13. text generation inference from – premo-inc/
    • supports quantization, chat completions in open ai format, ctranslate2,
    • originally developed by hugging face. didn’t make it commercial.
  14. vllm-project/vllm
    • ridiculously fast. Inference solution
    • apache 2.0 license.
    • commercially available
  15. MLC
    • run llm models on smartphones.
    • need to find the way to deploy.
    • faster then vllm
    • open source.
  16. do not build solution using single model. strategy of model providers may change.
  17. many are much cheaper than gpt.
  18. layers – free, closed, … to reduce cost.

Build Viral “GPTs” ChatGPT Game in 10 mins (like Pieter Levels)!!! – YouTube

  1. create games using gpt.
  2. https://supertools.therundown.ai/content/pieter-levels-startup-adventures
  3. text plus image based game.
  4. The secret of monkey game code – https://gist.github.com/levelsio/5bc8… monkeyislandamsterdam.com – Play the game yourself The game we built live – https://chat.openai.com/g/g-ZWQmtUqlT…
  5. input – start the adventure
  6. output – image from dall-e with text.
  7. go to chat gpt discovery tab. need to be plus user.
  8. configure tab. copy paste the name of the game.
  9. copy paste the description.
  10. conversational starters. – two starters added.
  11. add knowledge. as of now – none.
  12. capabilities – disable web browsing and code interpreter. only dalle image generation is used, based on the input collected.
  13. actions – none. like making video,…
  14. copy instructions and paste it.
  15. game is ready to play. save it (public, link or private).
  16. user – ask city for which they want to play with. build graphics for that city.
  17. theme change instead of pirate, something else.
  18. add image for the game you create. configure -> click + sign and add image. can use dalle to generate image.

Sorry “this beats GPT-4” – A new kind of LLM RANKINGS!!! – YouTube

  1. old leaderboard
    • hugging face leaderboard.
  2. new leaderboard
    • arena leaderboard
    • ELO rating used. used in chess,…
    • people vote between two llma models.
    • model, arena elo rating, mt-bench score, mmlu, license
  3. Arena battle
    • two model comparision
    • question, answer, rate.

The NEW BEST Base LLM??? (DeepSeek LLM) – YouTube

  1. Deepseek llm from china.
  2. license – Model license. allowed for research and commercial, conditions applied/listed. irrevocable, use, modify fo rpublic usage but not for military,….
  3. family – deep seek coder model.
  4. deepseek is chinese org
  5. base model and chat model. in different size. 67 billion parameters model.
  6. trained on 2 trillion tokens from English and Chinese language.
  7. context window – 4k
  8. built on llama architecture and not the llama model.
  9. their own benchmark. claims – better than llama 270 billion base. good in coding.
  10. leetcode weekly coding context. scored better than other models. compared with other Chinese models.
  11. IFEval. benchmark from Google. Instruction following evaluation.
  12. Hungarian national high school exam score vs models.
  13. training loss while building model. loss curve.
  14. as token increases the loss comes down.
  15. two different architecture for 7b and 67b model. multihead attention / mha for 7b and grouped query attention / gqa for 67b.
  16. chat.deepseek.com
  17. question – related to event in april 2023. not able to find the events. example – Is finland part of nato? ….

AI Beyond Transformers, Open Source Magic & Raining AI $$$ Funding!!! – YouTube

  1. Paper – Mamba – Time sequence modeling with selective state spaces.
  2. New architecture for Transformers. structured state space models / SSMs
  3. 5x faster than transformers.
  4. released paper, code and the models to play.
  5. Find it on hugging face.
  6. together.ai. large language models. new model – stripedhyena. family of models.
  7. Transformers needs more computations.
  8. 100% faster in end to end training on sequence of length 32k, 64k,…
  9. Mistral AI – 8 billion parameter model. mixture of expert models. 86 gb memory.
  10. Apple – new MLX array framework. Apple Machine Learning research.
  11. optimized for Apple Silicone.
  12. Meta AI – open source initiative. purple llama. trust and safety of open source models.
  13. llamaguard-7b. balance so that there is no abuse from user of the model and model of user in the form of response.
  14. magicoder. new programming specific model. MIT License
  15. 7b parameter model. 75k synthetic instruction data.
  16. scored good on humaneval index
  17. MagicAnimate – anyone can become tiktok user – animate the picture.
  18. create a dense spose and make any other picture dance.
  19. OpenML – Guide openmlguide.org
  20. MS Copilot – Bing chat into copilot
  21. Anthropic – released new dataset.
  22. The AI Alliance – community of technology, creators, developers,… build open source tools, technologies,… corporates + educational institutes, …
  23. Liquid AI – from MIT CS and AI lab.
  24. replicate.com – lets user run large models. raised money
  25. assembly AI. build ai models. building custom models.
  26. Gemini coming up with own space model. open ai. GPT 4. ultra, pro, nano versions of models. good multimodal modal. good in math,… alpha code version 2.
  27. Mistral – french ai startup. 2 billion euro valuation.
  28. Dharmesh Shah – why most agent ai dont work.
  29. Altman and Toner, … ceo of the year by times.
  30. Meta AI. codec avatars lab. VR space.

The BIG Mistral AI Secret is OUT 🙂 (And I’m very happy)!!! – YouTube

  1. Mistral Model announced. Not announced. How will it make money.
  2. Mistral of Experts. Sparse mixture of experts.
  3. Transformer encoder -> MoE transformer encoder -> MoE transfoermer encoder with device placement.
  4. combine experts model together. not stacked together. part of the neural network itself.
  5. routing network. which token should go where. uses only 12b parameters, though it has 45b parameter model.
  6. better than llama 2 and gpt 3.5
  7. charts on MMLU, knowledge, reasoning, comprehension, math, code. v/s inference budget.
  8. languages supported – English, French, German, Spanish, Italian.
  9. Platform name – La plateforme
  10. pricing – mistral tiny (7b instructions), small (8*7b), medium.
  11. own platform. ends points to be made available. – learn

Mistral AI 89GB Mixture of Experts – What we know so far!!! – YouTube

  1. MOE – Mixture of experts.
  2. 8 * 7b parameter model.
  3. layers of experts in the neural network architecture.
  4. integrates the layers of experts with the transformer block.
  5. tokens are dynamically routed to subsets of experts.
  6. router decides which token should go to which expert.
  7. router -> Permutation -> computation -> un-permutation.
  8. router (tokens -> input to router -> calculate expert indices, probabilities)
  9. permutation (take input from router, group tokens, drop tokens that exceed expert capacity)
  10. computation (compute the expert layers for the set of tokens that were assigned).
  11. Un permutation (take input from computation, output the scale)
  12. better at reasoning.
  13. benchmarking – equal to gemini pro, llama2.
  14. open license, base model is good.
  15. fine tune it with mega blocks.
  16. raw model has not done amazing things.
  17. heavy machines required to run all the expert models.

Holy sh…. GOOGLE GEMINI IS A BEAST!!!! (youtube.com)

  1. ultra, pro, nano. https://deepmind.google/technologies/gemini/#capabilities
  2. multimodality. unlike others which are build for text and then fine tuned for others
  3. ultra – biggest. competing gpt 4. leading in many of the benchmarks.
  4. pro for web.
  5. nano for phones, devices.
  6. MMLU – massive multitask language understanding
  7. image benchmarks are pixel only. no assistance from ocr.
  8. general, reasoning, math, code, image, video, audio,
  9. unlock scientific insights.
  10. advanced coding
  11. reasoning in maths and physics.
  12. alphacode2 is 85%, while earlier version was 50% better than competition participants.
  13. Google cloud vertex ai.
  14. ai core for device.
  15. ask question to both and compare. train travel related question. chatgpt could not answer.
  16. not available as api. need to use GCP. azure has a different approach.

Google Gemini AI – Technical Details Quick Look (youtube.com)

  1. 5 highlights. multimodal.
  2. version 1.
  3. input – text, audio, video, iamge, -> transformer -> image and text decoder -> output text and image.
  4. size – ultra, pro, nano. nano is 4 bit quantized.
  5. flamingo, coca, pali, .. foundational work is used for gemini.
  6. 32k context length
  7. multi query attention
  8. training infrastructure – tensor processors.
  9. single controller programming modal and pathways.
  10. data quality is critical to a highly performing model .
  11. training data set –
  12. evaluation –
  13. academic benchmarks –
  14. data – sourcing of data enrichment services.

This Will Change Mind-Reading Forever!!! (youtube.com)

  1. Brain GPT. Translate EEG/brain wave to text.
  2. It does not require eye markers. blink,…
  3. Dewave. descrite eeg waves encoding for brain dynamics to text translation.
  4. uses quantized variational encoder. derive discrete codex encoding and align it with pre trained language models.
  5. brain -> raw wave -> transformer -> transformer encoders -> discrete codex / gradients -> indexing codex -> pre trained bart decoder -> text
  6. brain -> eye fixation -> sliced waves -> band filters -> projection layer -> transformer encoders -> discrete codex / gradients -> indexing codex -> pre trained bart decoder -> text
  7. EEG vectorization
  8. discrete codex. with increased codex size, didn’t show drastic change in results.
  9. New Mind-Reading “BrainGPT” Turns Thoughts Into Text On Screen https://www.iflscience.com/new-mind-r… DeWave: Discrete EEG Waves Encoding for Brain Dynamics to Text Translation https://arxiv.org/pdf/2309.14030v2.pdf Video Demo –    • UTS HAI Research – BrainGPT   Github repo – https://github.com/duanyiqun/DeWave Open Vocabulary Electroencephalography-To-Text Decoding and Zero-shot Sentiment Classification https://arxiv.org/abs/2112.02690

Stop Spending Money On All That Skin & Hair Care (youtube.com)

  1. Mamba architecture – mamba-3b-slimpj (slim payjama dataset) a state space model vs the transformer architecture.
  2. Linear time sequence modelling with selective state spaces.
  3. Foundation models = pre trained models.
  4. most popular architecture – transformers. published long ago by Google. Computationally very expensive.
  5. 512 / 1000 tokens initially and now upto 100 thousand or more.
  6. scaled dot product attention vs multihead attention (linear input -> scaled dot product attention -> concat -> linear)
  7. quadratic increase in the computations.
  8. beyond transformers. linear / sub quadratic increase in computation.
  9. linear attention mechanisms, gated convolution, structured state space models. (SSMs).
  10. MLP – multi layer perceptron.
  11. 5x faster inference speed than transformers.
  12. trained on 600 billion tokens. 17% fewer FLOPs.
  13. comparision – mamba 3b slimpj, btlmb 3b 8k, stablelm 3b 4e1t

What’s the future for generative AI? – The Turing Lectures with Mike Wooldridge (youtube.com)

Posted in Computers and Internet | Leave a comment

Scrum Program Iteration / PI Planning

Pi program iteration plan
Safelite m weeks. n groups. n*9+ people.
Dev opps documentation support dev business test teams architects participate.
m/2 sprint and 1 for learning
Objective
Risk
Dependwncy
Automation and story.
Project board
Program board – feature and risk depndency from each project to be delivered sprint wise.
One feature dependent on another feature frim other team

Consider trainings leave…
Buffer between dependency. Cooling period – learning

Find api having defect and automate
Bugs from previous sprints

Retro and confirmation of pi planning

Risk parking lot
Resolved
Accepted
Owned
Mitigated

Confidence vote. Fist of five at the end.
Dependency found should be discussed in detail

Retrospective
Feature Story grooming. Inspect and adapt. Time boxed. Business owners missing. Business numbers. Along with business values.
Demo. Critical view of last 10 weeks.

Actions listed
Readiness kanban

Posted in Computers and Internet | Leave a comment

Big data

Key areas for an architect to note on big data solution:

I have frequently came across developers coming and saying my code was working fine with smaller size of data. With 14 gb of data the script is failing on single or two node cluster where one table data size is in kbs while other table data size in gbs.

The key areas are use of

1. Decomposition techniques.

2. Task dependency graph and average degree of concurrency.

It is important for one to understand the cluster size available and the amount of data one wants to process and also the kind of join one wants to apply.

On the same cluster one may be able to process the smaller size of data but as the table records grow one is not able to process the data. In such case one need to apply the right decomposition technique. It can be recursive, data, exploratory or speculative data decomposition. Also data can be decomposed at the input level or intermidatery or output level. The key is to divide and conquer.

It could be a matrix multiplication where one knows the final output or the intermediate output accordingly the data decomposition technique applied will vary based on the size of data, cluster and the task dependency graph used.

One may multiply each cell in separate task and then combine the output of each cell in a row in a separate task.

The other person may do multiplication of cell and then addition in a single task.

In both case no task is dependent on each other. Putting these different ways of execution into task dependency graph and calculating the average degree of concurrency would help one to choose the right graph with given set of resources.

One may be able to get the output in single MR step for smaller data but may not with larger size of data. This would require one to do application profiling considering size of cluster, input, output and the above mentioned techniques used. One may learn the framework and implement but such expertise will come with experience and the size of data handled and growing data on daily basis.

/////////////////

https://www.youtube.com/watch?v=EB0HLwmbx9I

////////////////

http://www.socialsamosa.com/2014/03/social-media-analytics/

Aware:

One will collect data based on cookies what I surf on mobile, on desktop, when I am in public or when I am alone or with spouse/friend.

Accordingly demand side platforms working for advertisers will target those ads to the end user.

Activated:

Repeatedly looking at those ads due to inclination towards those things, once nerves gets activated towards it.

Addicted:

One gets some coupon / voucher from HR / training department which is not very big amount. The item which has got activated in the mind is also not very costly, the first thing one does is to start exploring things around this item and ready to spend another equal amount matching the voucher from own pocket and buys the item.

One may get addicted to it and would like to buy the same item again with all money from his/her own pocket.

Amplified:

The activated thought has got amplified.

///////

use linux to download data. put on hadoop. to extract. write pig script call UDF written in java. use patterns and matchers. sum up the data and put it in DB. Call services. show the numbers on browser.

///////

sentimental analysis: nlp – gate library. twitteR in R-studio.

http://www.datasciencecentral.com/profiles/blogs/one-page-r-a-survival-guide-to-data-science-with-r

///////

Posted in Computers and Internet | Tagged , , | Leave a comment

Living Vipassana

Living Vipassana

This blog is about inspiring people to continue their meditation practice. If you would like to contribute a post to this blog, send it to ryanshelton7@yahoo.com with your name and where you’re from and I’ll be happy to share it.

Posted in Uncategorized | Tagged | 2 Comments