# How to guides ## API Guide ### Basic Completions with user query It shows how a user might interact with the API by providing a prompt and receiving a completion. The response from the API is an example of what the system returns after processing the input prompt. This allows users to see the actual output generated by the API based on the given input. Following request shows an example of a particular use-case. `HTTP Request` ```bash POST /v1.1/skills/completion/query HTTP/1.1 Content-Type: application/json Accept: text/event-stream, application/json Authorization: Bearer `` Host: api.lab45.ai Content-Length: 148 ``` ```json { "messages": [ {"content": "tell me about nature", "role": "user"} ], "skill_parameters" : { "model_name" : "gpt-4", "emb_type": "openai", "max_output_tokens": 4096 }, "stream_response": false } ``` `Response` ```json { "data": { "content": "Nature encompasses all the living and non-living things that exist naturally on Earth or some part of Earth. It can refer to the phenomena of the physical world, and also to life in general. The study of nature is a large, if not the only, part of science. Nature is often considered in terms of the natural world, or the universe beyond human life, but it has also been conceptualized as a representation of the untouched, those parts of the world's existence that have not been altered significantly by human intervention.\n\nThe concept of nature as a whole, the physical universe, is one of several expansions of the original notion; it began with certain core applications of the word by pre-Socratic philosophers, and has steadily gained currency ever since. This usage continued during the advent of modern scientific method in the last several centuries.\n\nHere are some key aspects of nature:\n\n1. **Biodiversity**: Nature is characterized by the incredible diversity of life on Earth, from microscopic bacteria to giant redwood trees and blue whales. Biodiversity refers to the variety of life forms within a given area, from the genetic level to species and ecosystems.\n\n2. **Ecosystems**: Nature consists of various ecosystems, which are communities of organisms (biotic components) interacting with their non-living (abiotic) environments. Examples include forests, coral reefs, grasslands, deserts, and tundra.\n\n3. **Geological Features**: The physical landscape of nature is shaped by the Earth's geological processes, including volcanism, plate tectonics, erosion, and glaciation, resulting in mountains, valleys, coastlines, and other landforms.\n\n4. **Water Systems**: Water is a fundamental aspect of nature, covering oceans, lakes, rivers, and underground aquifers. It supports life, shapes landscapes, and is part of the Earth's hydrological cycle.\n\n5. **Weather and Climate**: The state of the atmosphere at a place and time with regards to heat, dryness, sunshine, wind, rain, etc. constitutes the weather. Climate refers to weather patterns observed over a long term and can dictate the characteristics and behaviors of ecosystems.\n\n6. **Natural Resources**: Nature provides various resources essential for human survival and economic activity, including water, timber, minerals, and fossil fuels. It also offers services such as pollination, decomposition, and natural hazard regulation.\n\n7. **Conservation and Preservation**: The protection of natural landscapes and wildlife is vital for maintaining biodiversity, ecological balance, and the overall health of the planet. 
Efforts include establishing wildlife reserves, conservation practices, and combating pollution and climate change.\n\nNature operates through fundamental physical laws that describe the behavior of the natural world. Humans have developed diverse fields of sciences, like biology, geology, meteorology, ecology, and physics, to understand and describe these laws and their effects on the living and non-living components of nature." } } ``` `Python Request` ```python import requests # Import the requests module to send HTTP requests # Define the API endpoint for querying the document completion skill completion_endpoint = f"https://api.lab45.ai/v1.1/skills/completion/query" # Set the headers for the request, including content type, accepted response format, and authorization token headers = { 'Content-Type': "application/json", # The content type of the request is JSON, meaning the request body will be in JSON format 'Accept': "text/event-stream, application/json", # The server is expected to respond with either event-stream or JSON 'Authorization': "Bearer " # Replace with your actual API key for authentication } # Define the payload (request body) for the API call, which contains the user's query and skill parameters payload = { "messages": [ {"content": "tell me about nature", "role": "user"} ], "skill_parameters" : { "model_name" : "gpt-4", "emb_type": "openai", "max_output_tokens": 4096 }, "stream_response": False } response = requests.post(completion_endpoint, headers=headers, json=payload) # Print the response from the API call to inspect the result (status code, content, etc.) print(response) ``` More details on the parameters and API, you can refer to [Completion API](https://docs.lab45.ai/quickstart.html#completion-api). To download the Python notebook, refer to the [Completion API Python Notebook](./_static/completion.ipynb) for a step-by-step implementation of the Completion API use case. This notebook includes examples of generating completions, handling API requests, and customizing prompts for different tasks. ### Index larger number of files via Document Completion API Document Completion is a sophisticated skill that leverages advanced language models combined with information retrieval techniques to respond accurately to user queries by dynamically extracting relevant information from a document. The core concept behind this approach is Retrieval-Augmented Generation (RAG), which enhances the model’s capability by enabling it to search for relevant context from external knowledge (such as documents or databases) before generating a response. This helps in producing more accurate and contextually relevant answers to user queries. The endpoint uses various skill parameters which can be user configured to generate the response accordingly. The API operates in a 4-step sequence that ensures efficient processing and querying of document-based data to generate contextual responses. The four steps include Dataset Creation, Ingestion, Preparation, and Querying. Each step involves an API endpoint that is essential for moving through the pipeline and getting the final response. Here's an in-depth look at each of these four steps with a sample `use-case` for `indexing larger number of files`: #### Create Dataset This endpoint is used to create a new dataset in the system. A dataset is a logical grouping that contains all the documents uploaded by the user. To get the ``, please refer to the [Authentication Page](https://docs.lab45.ai/quickstart.html#authentication). 
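The request and response below show how the dataset is created. In practice you will usually capture the returned `_id` right away, since every later step (ingest, prepare, query) refers to it as the `dataset_id`. A minimal sketch, assuming the request payload and response shape shown in the example that follows:

```python
import requests

headers = {
    "Content-Type": "application/json",
    "Accept": "application/json",
    "Authorization": "Bearer "  # replace with your actual API key
}

# Create the dataset and keep its ID for the later ingest/prepare/query calls
resp = requests.post(
    "https://api.lab45.ai/v1.1/datasets",
    headers=headers,
    json={"name": "Test Dataset1", "description": "Some test Dataset1"},
)
dataset_id = resp.json()["_id"]  # assumed to match the `_id` field in the response shown below
print(dataset_id)
```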
`HTTP Request` ```bash POST /v1.1/datasets HTTP/1.1 Content-Type: application/json Accept: text/event-stream, application/json Authorization: Bearer `` Host: api.lab45.ai Content-Length: 148 ``` ```json { "name": "Test Dataset1", "description":"Some test Dataset1" } ``` `Response` ```json { "_id": "1d21b16a-bd3e-4a2d-8e94-fd65fe79feb5", "desc": "Some test Dataset1", "files": [], "name": "Test Dataset1", "owners": [ "00000000-0000-0000-0000-0000000000a0" ], "tenant_id": "00000000-0000-0000-0000-000000000000" } ``` `Python Request` ```python import requests # Import the requests module to send HTTP requests # Define the API endpoint for the datasets datasets_endpoint = f"https://api.lab45.ai/v1.1/datasets" # Set the headers for the request, including the content type, accepted response format, and authorization token headers = { 'Content-Type': "application/json", # The content type is JSON, so the body of the request will be in JSON format 'Accept': "text/event-stream, application/json", # The server is expected to respond with event-stream or JSON data 'Authorization': "Bearer " # Authorization header containing the API key (replace with your actual API key) } # Payload data to be sent with the request, typically in JSON format. This payload defines a new dataset. payload = { "name": "Test Dataset1", # Name of the dataset "description": "Some test Dataset1" # Description of the dataset } # Make the POST request to the datasets API endpoint with the provided headers and payload response = requests.post(datasets_endpoint, headers=headers, json=payload) # Print the response from the API call (this could be the status code, or content, depending on the API response) print(response) ``` #### Ingest Dataset This endpoint is used to upload the document data into the previously created dataset. The files are stored in the blob storage for future purposes. `Http Request` ```bash POST /v1.1/datasets/{dataset_id}/ingest Content-Type: MultipartFormData Accept: text/event-stream, application/json Authorization: Bearer `` Host: api.lab45.ai Content-Length: 148 file1: file, /file.txt ``` - In the above request, `{dataset_id}` will be the `_id` value generated in create dataset api response. `Note`: it should be put as value and not string. - When working with `large datasets`, particularly when dealing with thousands of files, uploading large numbers of files in a single request can result in long processing times, server timeouts, and potential failures. To address this issue, batch processing is used to break down the files into smaller `batches`, which can then be ingested and processed more efficiently. This allows the user to send files in smaller groups (e.g., 100 files per batch) instead of sending all the files in a single API call. This ensures smoother, faster, and more reliable ingestion of large document sets. `Response` ```json { "_id": "1d21b16a-bd3e-4a2d-8e94-fd65fe79feb5", "desc": "Some test Dataset1", "files": [ { "name": "file.txt", "ts": 1736414777.6179998 } ], "name": "Test Dataset1", "owners": [ "00000000-0000-0000-0000-0000000000a0" ], "tenant_id": "00000000-0000-0000-0000-000000000000" } ``` The `_id` denotes `dataset_id` which is used in place of `{dataset_id}` in endpoints which are explained below. 
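The `Python Request` below shows a single multipart call. For the large-dataset scenario described above, the same call can simply be repeated over smaller batches of files. This is a sketch, not a prescribed implementation: `file_paths`, the batch size of 100, and the form-field name `file` are illustrative assumptions; it reuses the same `/ingest` endpoint shown in the example that follows.

```python
import requests

dataset_id = "1d21b16a-bd3e-4a2d-8e94-fd65fe79feb5"  # _id from the create dataset response
ingest_endpoint = f"https://api.lab45.ai/v1.1/datasets/{dataset_id}/ingest"
headers = {"Accept": "application/json", "Authorization": "Bearer "}  # replace with your API key

file_paths = ["./docs/file1.txt", "./docs/file2.txt"]  # your full list of files to index (hypothetical)
BATCH_SIZE = 100  # e.g. 100 files per batch, as suggested above

for start in range(0, len(file_paths), BATCH_SIZE):
    batch = file_paths[start:start + BATCH_SIZE]
    # Build the multipart form parts for this batch only
    files = [("file", (path.split("/")[-1], open(path, "rb"), "text/plain")) for path in batch]
    response = requests.post(ingest_endpoint, headers=headers, files=files)
    print(f"Batch {start // BATCH_SIZE + 1}: {response.status_code}")
    # Close the file handles opened for this batch
    for _, (_, fh, _) in files:
        fh.close()
```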
`Python Request`

```python
import requests  # Import the requests module to send HTTP requests

# The dataset ID is the `_id` value returned by the create dataset API
dataset_id = "1d21b16a-bd3e-4a2d-8e94-fd65fe79feb5"

# Define the API endpoint for the ingest process, replacing `{dataset_id}` with the actual dataset ID
ingest_endpoint = f"https://api.lab45.ai/v1.1/datasets/{dataset_id}/ingest"  # The URL where the ingestion API is available

# Set the headers for the request. Do not set Content-Type here: requests builds the
# multipart/form-data header (including the boundary) automatically when `files` is passed.
headers = {
    'Accept': "text/event-stream, application/json",  # The client expects the server's response to be either event-stream or JSON
    'Authorization': "Bearer "  # Authorization token for the API (replace with the actual key)
}

# Files to upload as multipart/form-data parts
files = [
    ('test_sample', ('test_data.txt', open('./test_data.txt', 'rb'), 'text/plain')),
    ('test_sample2', ('test_data2.txt', open('./test_data2.txt', 'rb'), 'text/plain'))
]

# Make the POST request, passing the files as multipart form data
response = requests.post(ingest_endpoint, headers=headers, files=files)

# Print the response from the API call
print(response)
```

#### Prepare Dataset

The Prepare step converts the ingested documents into embeddings. These embeddings capture the semantic meaning of the content in each document. In this step, the system transforms each document into a vector representation and stores it, together with its indexes, in a vector database. This prepares the dataset for fast retrieval during the query phase.

`HTTP Request`

```bash
POST /v1.1/skills/doc_completion/prepare
Content-Type: application/json
Accept: text/event-stream, application/json
Authorization: Bearer ``
Host: api.lab45.ai
Content-Length: 148
```

```json
{
  "dataset_id": "1d21b16a-bd3e-4a2d-8e94-fd65fe79feb5"
}
```

`Response`

```json
{
  "_id": "94797865-4ae4-4b62-b343-072cb05fae07",
  "emb_type": "openai",
  "resource_group_id": "1d21b16a-bd3e-4a2d-8e94-fd65fe79feb5",
  "status": "Started"
}
```

`Python Request`

```python
import requests  # Import the requests module to send HTTP requests

# Define the API endpoint for preparing the skill, specifically for document completion
prepare_endpoint = "https://api.lab45.ai/v1.1/skills/doc_completion/prepare"

# Set the headers for the request, including the content type, accepted response format, and authorization token
headers = {
    'Content-Type': "application/json",  # The content type of the request is JSON (request body will be JSON)
    'Accept': "text/event-stream, application/json",  # The server is expected to respond with event-stream or JSON data
    'Authorization': "Bearer "  # Replace with your actual API key for authentication
}

# Define the payload (request body) for the API call, which includes a dataset ID
payload = {
    "dataset_id": "1d21b16a-bd3e-4a2d-8e94-fd65fe79feb5"  # The ID of the dataset to be used for the document completion task
}

# Make the POST request to the "prepare" API endpoint with the provided headers and payload
response = requests.post(prepare_endpoint, headers=headers, json=payload)

# Print the response from the API call to see the status or data returned
print(response)
```

#### Query

This endpoint allows the user to ask a query; the system uses the embeddings and indexed document content to retrieve and generate a relevant response. The query is first converted into an embedding and then compared with the embeddings of the documents in the dataset. The most similar document sections are retrieved, and the response is generated based on that context.
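For repeated use, the query call can be wrapped in a small helper that returns just the generated text. This is a sketch under the assumption that the request and response shapes match the example below (the answer is read from the `data.content` field); the `ask` helper and its parameters are illustrative, not part of the API.

```python
import requests

QUERY_ENDPOINT = "https://api.lab45.ai/v1.1/skills/doc_completion/query"
HEADERS = {
    "Content-Type": "application/json",
    "Accept": "application/json",
    "Authorization": "Bearer ",  # replace with your actual API key
}

def ask(dataset_id: str, question: str, model_name: str = "gpt-4") -> str:
    """Send a single question against a prepared dataset and return the answer text."""
    payload = {
        "dataset_id": dataset_id,
        "skill_parameters": {"model_name": model_name, "emb_type": "openai", "max_output_tokens": 1024},
        "stream_response": False,
        "messages": [{"content": question, "role": "user"}],
    }
    response = requests.post(QUERY_ENDPOINT, headers=HEADERS, json=payload)
    response.raise_for_status()
    return response.json()["data"]["content"]  # assumes the response shape shown in the example below

# Example usage:
# print(ask("1d21b16a-bd3e-4a2d-8e94-fd65fe79feb5", "give summary of the uploaded document"))
```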
`HTTP Request` ```bash POST /v1.1/skills/doc_completion/query HTTP/1.1 Content-Type: application/json Accept: text/event-stream, application/json Authorization: Bearer `` Host: api.lab45.ai Content-Length: 148 ``` ```json { "dataset_id" : "1d21b16a-bd3e-4a2d-8e94-fd65fe79feb5", "skill_parameters": { "model_name": "gpt-4", "retrieval_chain": "custom", "emb_type": "openai", "temperature": 0, "max_output_tokens": 100, "return_sources": false }, "stream_response":false, "messages": [ {"content": "Hi", "role": "user"}, {"content": "give summary of the uploaded document", "role": "user"} ] } ``` The messages section has `role` and `content`. The `role` represents the various roles used in interactions with the model. `Roles`: It can be any one as defined below - - SYSTEM: Defines the model's behavior. System messages are not accepted here, as agent instructions default to system messages for agents. - ASSISTANT: Represents the model's responses based on user messages. - USER: Equivalent to the queries made by the user. - AI: Interchangeably used with `ASSISTANT`, representing the model's responses. - FUNCTION: Represents all function/tool call activity within the interaction. `Content`: Content is the user prompt/query. Detailed prompt helps in getting the relevant response. As per above content (`give summary of the uploaded document`), LLM will give response relevant to the dataset id provided. `Response` ```json { "data": { "content": "The document tells the story of a little fish named Fin who tries to befriend a crab. The crab initially declines to play because it feels cold and unwell. However, Fin comes up with a plan to help the crab feel better by asking the sun for warmth. The sun responds to Fin's request, and the crab starts to feel better. As a result, the crab thanks Fin and they end up playing together and becoming good friends." 
} } ``` `Python Request` ```python import requests # Import the requests module to send HTTP requests # Define the API endpoint for querying the document completion skill query_endpoint = f"https://api.lab45.ai/v1.1/skills/doc_completion/query" # Set the headers for the request, including content type, accepted response format, and authorization token headers = { 'Content-Type': "application/json", # The content type is set to JSON (request body will be in JSON format) 'Accept': "text/event-stream, application/json", # The client expects either an event-stream (for real-time updates) or JSON as the response 'Authorization': "Bearer " # Replace with your actual API key for authentication } # Define the payload (request body) for the API call, which contains the dataset and skill parameters payload = { "dataset_id" : "1d21b16a-bd3e-4a2d-8e94-fd65fe79feb5", "skill_parameters": { "model_name": "gpt-4", "retrieval_chain": "custom", "emb_type": "openai", "temperature": 0, "max_output_tokens": 100, "return_sources": False }, "stream_response": False, "messages": [ {"content": "Hi", "role": "user"}, {"content": "give summary of the uploaded document", "role": "user"} ] } # Make the POST request to the query API endpoint with the provided headers and payload response = requests.post(query_endpoint, headers=headers, json=payload) # Print the response from the API call to inspect the status or data returned print(response) ``` To download the Python notebook, refer to the [Doc Completion API Python Notebook](./_static/doc_completion.ipynb), which provides an in-depth example of how to use the Document Completion API for processing and completing documents with an example. ### Generate images using DALLE To generate images using DALL·E, an agent must first be created. This agent is assigned a unique agent_id, which is used to interact with it. Users send text-based prompts to the agent, and the agent calls the DALL·E tool to process the input and generate an image. The agent then responds by providing a URL to the generated image, allowing the user to access the final result. This process automates image creation and enables users to easily generate and retrieve images from textual descriptions. To use the `agent_chat_session` endpoint which helps in chatting with our [Agent Chat Session API](https://docs.lab45.ai/openapi_elements.html#/paths/v1.1-agent_chat_session-query/post) which will help in generating the response, first step is to [Create Agent](https://docs.lab45.ai/openapi_elements.html#/paths/v1.1-agents/post). The agent created, will take `tools` as the input. Tools can be Dalle, Stable Diffusion. The requests given below, shows an example of how to create agents and use it for a particular use-case of generating an image of a football. The response includes the image url in `links` field. #### Creation of Agent To generate images using DALL·E, an agent must first be created with specific tools like DALL·E or Stable Diffusion. The agent is assigned a unique `agent_id`, which is later used for interactions. Once created, users can send prompts to the agent to process image generation requests. 
`HTTP Request`

```bash
POST /v1.1/agents HTTP/1.1
Content-Type: application/json
Accept: text/event-stream, application/json
Authorization: Bearer ``
Host: api.lab45.ai
Content-Length: 148
```

```json
{
  "name": "agent_test",
  "description": "Test agent description",
  "instructions": "You are a helpful assistant that follows instructions exactly as given",
  "model_name": "gpt-4",
  "type": "Toolset",
  "temperature": 0,
  "dataset_id": null,
  "tools": ["DalleText2ImageTool"]
}
```

`Response`

```json
{
  "_id": "2d7d03e9-0d02-47fd-b9f4-959ca7c6cded",
  "dataset_id": null,
  "description": "Test agent description",
  "instructions": "You are a helpful assistant that follows instructions exactly as given",
  "max_output_tokens": 256,
  "model_name": "gpt-4",
  "name": "agent_test",
  "owners": [
    "00000000-0000-0000-0000-0000000000a0"
  ],
  "temperature": 0,
  "tenant_id": "00000000-0000-0000-0000-000000000000",
  "tools": [
    "DalleText2ImageTool"
  ],
  "type": "Toolset"
}
```

In the response, `_id` denotes the `agent_id`, which is used as `party_id` in the [Agent Chat Session API Usage](#agent-chat-session-api-usage) request.

`Python Request`

```python
import requests  # Import the requests module to send HTTP requests

# Define the API endpoint for interacting with the agents API
agents_endpoint = "https://api.lab45.ai/v1.1/agents"  # The endpoint where agent-related operations are available

# Set the headers for the request, including content type, accepted response format, and authorization token
headers = {
    'Content-Type': "application/json",  # The content type is set to JSON, meaning the body will be JSON-encoded
    'Accept': "text/event-stream, application/json",  # The client expects event-stream or JSON as the response format
    'Authorization': "Bearer "  # Replace with your actual API key for authentication
}

# Define the payload (request body) for the API call to create an agent, which includes agent details and configuration
payload = {
    "name": "agent_test",
    "description": "Test agent description",
    "instructions": "You are a helpful assistant that follows instructions exactly as given",
    "model_name": "gpt-4",
    "type": "Toolset",
    "temperature": 0,
    "dataset_id": None,  # Python's None is serialized to JSON null
    "tools": ["DalleText2ImageTool"]
}

response = requests.post(agents_endpoint, headers=headers, json=payload)

# Print the response from the API call to inspect the result (status code, content, etc.)
print(response)
```

#### Agent Chat Session API Usage

The Agent Chat Session API allows users to interact with the created agent by sending text-based prompts. The agent processes the request using the assigned tool and responds with a generated image URL. This API enables efficient, automated image generation from textual descriptions.
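The request and response below show a full exchange. Once the call returns, the generated image URL is carried in the `links` field of `data`. A minimal sketch of pulling it out, assuming the response shape shown in the example that follows:

```python
import requests

chat_endpoint = "https://api.lab45.ai/v1.1/agent_chat_session/query"
headers = {
    "Content-Type": "application/json",
    "Accept": "application/json",
    "Authorization": "Bearer ",  # replace with your actual API key
}
payload = {
    "conversation_id": "",
    "messages": [{"content": "Hi ImageAssistant, generate an image of a football", "name": "user_name", "role": "user"}],
    "party_id": "2d7d03e9-0d02-47fd-b9f4-959ca7c6cded",  # agent_id from the create agent response
    "party_type": "Agent",
    "save_conversation": False,
    "stream_response": False,
}

# Send the prompt and read the image URL from the `links` field of the response
data = requests.post(chat_endpoint, headers=headers, json=payload).json()["data"]
image_url = data.get("links")  # URL of the generated image, per the example response below
print(data.get("content"), image_url)
```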
`HTTP Request`

```bash
POST /v1.1/agent_chat_session/query HTTP/1.1
Content-Type: application/json
Accept: text/event-stream, application/json
Authorization: Bearer ``
Host: api.lab45.ai
Content-Length: 148
```

```json
{
  "conversation_id": "",
  "messages": [
    {
      "content": "Hi ImageAssistant, generate an image of a football",
      "name": "user_name",
      "role": "user"
    }
  ],
  "party_id": "2d7d03e9-0d02-47fd-b9f4-959ca7c6cded",
  "party_type": "Agent",
  "save_conversation": false,
  "stream_response": false
}
```

`Response`

```json
{
  "data": {
    "name": "agent_test",
    "role": "function",
    "content": "Sure, check this link out!",
    "links": "https://ai360tenantstorage.blob.core.windows.net/tenant-a919164d-8b7c-43fb-8119-f1997d45ca4f/images/stablediffusion/r4oRXlG6oo.jpg?se=2025-01-09T07%3A21%3A27Z&sp=r&sv=2021-08-06&sr=b&sig=kDMrzJfGrn5Q4SRIa1f7wa2XvVX3yp2bWi7jVaM%2Bg2M%3D",
    "response_status": "Completed"
  }
}
```

`Python Request`

```python
import requests  # Import the requests module to send HTTP requests

# Define the API endpoint for the agent chat session
chat_endpoint = "https://api.lab45.ai/v1.1/agent_chat_session/query"

# Set the headers for the request, including content type, accepted response format, and authorization token
headers = {
    'Content-Type': "application/json",  # The content type is set to JSON, meaning the body will be JSON-encoded
    'Accept': "text/event-stream, application/json",  # The client expects either event-stream or JSON as the response format
    'Authorization': "Bearer "  # Replace with your actual API key for authentication
}

# Define the payload (request body) for the API call, which contains the conversation details and instructions for the agent
payload = {
    "conversation_id": "",
    "messages": [
        {
            "content": "Hi ImageAssistant, generate an image of a football",
            "name": "user_name",
            "role": "user"
        }
    ],
    "party_id": "2d7d03e9-0d02-47fd-b9f4-959ca7c6cded",  # The unique ID of the party (here the agent_id returned by the agents API)
    "party_type": "Agent",  # The party is a single agent, not a team
    "save_conversation": False,
    "stream_response": False
}

response = requests.post(chat_endpoint, headers=headers, json=payload)

# Print the response from the API call to inspect the result (status code, content, etc.)
print(response)
```

To download the Python notebook, refer to the [Agent Chat Usage Python Notebook](./_static/agent.ipynb), which provides an example of using the Agent Chat Session API to generate images based on text prompts.

### Use Case for Sentiment Analysis

The Sentiment Analysis workflow is a powerful tool designed to evaluate textual responses, classify them into predefined sentiment categories, and assign a polarity score ranging from -1 (very negative) to 1 (very positive). It uses the API endpoint (https://api.lab45.ai/v1.1/skills/completion/query) to process input data and generate actionable insights based on the emotional tone of the responses. This solution is particularly beneficial for businesses and organizations seeking to analyze customer feedback, survey results, product reviews, or conversational data, enabling them to make informed, data-driven decisions.

#### **Key Features**

**1. Sentiment Classification:** Each response is thoroughly analyzed and categorized into one of the following sentiment types:
- Very Positive: Highly favorable, enthusiastic, or delighted responses.
- Positive: Indicates general satisfaction or approval.
- Neutral: Balanced or indifferent statements with no strong emotional inclination.
- Negative: Dissatisfied or critical responses. - Very Negative: Strongly unfavorable or highly critical feedback. **2. Polarity Score:** In addition to classifying sentiment, the system assigns a polarity score that reflects the intensity of the sentiment: - Polarity Score -1: Represents extreme negativity. - Polarity Score 0: Represents neutrality or balanced sentiment. - Polarity Score 1: Represents extreme positivity. Polarity scores offer a quantitative measure of sentiment intensity, making it easier to analyze trends and prioritize actions. Here’s an example of how the sentiment analysis works and how the input can be formatted. In this case, the system takes a set of questions and their corresponding answers (in a Q&A format). user can send multiple questions at once, and each question can have one or more answers. The API will analyze each answer and classify the sentiment and polarity score accordingly. **Example 1:** Single Question, Multiple Answers Question: "How would you rate the quality of the food at our restaurant?" Answers: "The food was cold and lacked flavor." "It was okay, but the portions were too small." "Absolutely amazing! The taste and presentation were perfect." Output: Sentiment: "Negative," "Negative," "Positive" Polarity: -0.8, -0.6, 0.9 **Example 2:** Multiple Questions, Each With Multiple Answers Question 1: "What did you think of the service provided by our staff?" Answers: "The staff was slow and unhelpful." "The waiter was polite, but I had to wait a long time for my food." Question 2: "How would you rate the ambiance of the restaurant?" Answers: "The ambiance was delightful, with great music and lighting." "It was too noisy, and the seating was uncomfortable." Output: Sentiment: "Negative," "Neutral," "Positive," "Negative" Polarity: -0.7, -0.3, 0.9, -0.6 `Http Request` ```bash POST /v1.1/skills/completion/query HTTP/1.1 Host: api.lab45.ai Authorization: Bearer `` Content-Type: application/json Content-Length: 256 ``` `Python Request` ```python import requests # Import the requests module to send HTTP requests # Define the API endpoint for querying sentiment analysis completion_endpoint = "https://api.lab45.ai/v1.1/skills/completion/query" # Set the headers for the request headers = { "Authorization": f"Bearer ", # Replace with your actual API token "Content-Type": "application/json" } # Define the prompt for sentiment analysis prompt = """ Analyze the sentiment of a question and its corresponding answer. For each pair: Categorize sentiment as one of the following: 'Very Positive' 'Positive' 'Neutral' 'Negative' 'Very Negative' Provide the sentiment polarity score on a scale of -1 to 1, -1 being very negative and +1 being very positive. For improvement-related questions, assume responses are negative unless explicitly stated positively (e.g., "It's already better"). The analysis must consider the relationship between the question and its answer for sentiment analysis. Ensure that sentiment category aligns with the polarity score. Example Input: Q1: A1 Q2: A2 Example Output: A list of dictionaries for each question's analysis, containing the following keys: Sentiment: A string denoting the sentiment (e.g., Positive, Negative, Neutral). Polarity: A numeric score indicating sentiment strength (-1.0 to 1.0). The output should strictly be in JSON format. 
Each question should map to a dictionary with the following structure: [{ "Question": Q1, "Sentiment": ["Positive", "Negative"], "Polarity": [0.3, -0.8], }, { "Question": Q2, "Sentiment": ["Positive", "Negative"], "Polarity": [0.3, -0.8], }] """ question = "How would you rate the ambiance of the restaurant?: The ambiance was delightful, with great music and lighting." # Define the payload (request body) for the API call payload = { "messages": [ {"content": prompt, "role": "system"}, {"content": question, "role": "user"}, ], "stream_response": False, "skill_parameters": {"max_output_tokens": 256, "model_name": "gpt-4"} } # Send the request to the API response = requests.post(completion_endpoint, headers=headers, json=payload) # Print the response from the API print(response) ``` To download the Python notebook, refer to the [Sentiment Analysis Python Notebook](./_static/sentiment_analysis.ipynb), implementing this Sentiment Analysis workflow. This notebook demonstrates how to interact with the API, analyze sentiment, and extract polarity scores for textual responses. ### Usecase for Topic Mapping This Topic Mapping Workflow leverages the API endpoint (https://api.lab45.ai/v1.1/skills/completion/query) to categorize feedback into predefined themes, sub-themes, and topics. The system uses advanced natural language processing (NLP) to analyze the context of each response, ensuring precise classification of the feedback based on its content. By doing so, it provides valuable insights into key areas of interest and concern, enabling organizations to make data-driven decisions. The workflow allows user to send multiple questions and answers in a single request. The API processes each response individually and maps it to the relevant theme, sub-theme, and topic based on its context. #### **Example:** Question: "How would you describe the quality of the product you purchased?" Answer 1: "The product is poorly made and doesn't work as advertised." Answer 2: "It's somewhat functional, but the durability could be improved." Answer 3: "I'm very impressed with the quality and features of this product." Output: Topic: "Product Quality," "Durability," "Satisfaction" Theme: "Customer Experience," "Customer Experience," "Customer Experience" Sub-theme: "Defects," "Improvement Areas," "High Quality" Question: "How would you rate your overall shopping experience?" Answer 1: "The website was easy to use, and I found what I needed quickly." Answer 2: "The checkout process was slow, and I faced issues with payment." Output: Topic: "Website Usability," "Payment Issues" Theme: "User Experience," "User Experience" Sub-theme: "Ease of Navigation," "Transaction Challenges" `Http Request` ```bash POST /v1.1/skills/completion/query HTTP/1.1 Host: api.lab45.ai Authorization: Bearer `` Content-Type: application/json Content-Length: 256 ``` `Python Request` ```python import requests # Import the requests module to send HTTP requests # Define the API endpoint for querying sentiment analysis completion_endpoint = "https://api.lab45.ai/v1.1/skills/completion/query" # Set the headers for the request headers = { "Authorization": f"Bearer ", # Replace with your actual API token "Content-Type": "application/json" } # Define the prompt for sentiment analysis prompt = """ Using the provided themes, sub-themes, and keywords as reference, categorize each feedback response into the appropriate theme and sub-theme. Identify the sub-theme based on the keywords provided, ensuring the responses align with the context of the feedback. 
If the feedback does not align with any theme or sub-theme, classify it as Unknown. The keywords serve as an additional reference to help categorize the response, but if it doesn’t fit into any theme or sub-theme, mark it as Unknown. 1. Food & Beverage Quality -Taste & Freshness(Keywords: fresh ingredients, flavorful, quality, seasoning, authenticity, overcooked, undercooked, stale, homemade, organic, locally sourced) -Menu Variety(Keywords: diverse menu, vegan, vegetarian, gluten-free, healthy options, seasonal dishes, kids' menu, specialty dishes, limited options, unique offerings) 2. Service Experience -Customer Service(Keywords: friendly staff, attentive, polite, rude, slow service, proactive, professionalism, responsiveness, knowledgeable staff) -Speed of Service(Keywords: wait time, promptness, delayed orders, efficiency, timely delivery, fast, queue management) -Staff Knowledge(Keywords: recommendations, dietary restrictions, menu knowledge, allergies, pairings, indecisive) The analysis must consider the relationship between the question and its answer for sentiment analysis. Ensure that sentiment category aligns with the polarity score. Example Input: Q1: A1 /n Q2: A2 Example Output: A list of dictionaries for each question's analysis, containing the following keys: Topic: A string representing the primary subject. Theme: A string categorizing the general theme (e.g., Wellbeing, Growth). Sub-theme: A string specifying the sub-theme under the main theme The output should strictly be in JSON format. Each question should map to a dictionary with the following structure: [{ "Question": Q1, "Topic":["Topic1","Topic2"], "Theme": [Food & Beverage Quality, Food & Beverage Quality], "Sub-theme": [undercooked, fresh ingredients] }, { "Question": Q2, "Topic":["Topic1","Topic2"], "Theme": [Food & Beverage Quality, Service Experience], "Sub-theme": [quality, friendly staff] }] Note: Topic Extraction: Extract only the relevant topics from the answers. Order of Lists: Ensure the order of elements in each list (Topic, Theme, etc.) corresponds to the order of topics in the analysis. Exclude Other Details: Do not include explanatory text or irrelevant fields in the output. """ question = "How would you rate your overall shopping experience?: The website was easy to use, and I found what I needed quickly." # Define the payload (request body) for the API call payload = { "messages": [ {"content": prompt, "role": "system"}, {"content": question, "role": "user"}, ], "stream_response": False, "skill_parameters": {"max_output_tokens": 256, "model_name": "gpt-4"} } # Send the request to the API response = requests.post(completion_endpoint, headers=headers, json=payload) # Print the response from the API print(response) ``` To download the Python notebook, refer to the [Topic Mapping Python Notebook](./_static/topic_mapping.ipynb) implementing this Topic Mapping workflow. This notebook demonstrates how to interact with the API, categorize responses into themes, sub-themes, and topics, and extract structured insights from textual feedback. ### Dynamic Data Attachment This document provides a guide on utilizing a Lab45 AI Platform API implementation to augment the contextual data used by a Retrieval-Augmented Generation (RAG) agent. This approach dynamically attaches extra document data alongside the pre-existing, vectored data within a Lab45 AI platform dataset. 
This technique enables you to query and receive precise, combined responses that leverage information from both the dataset and an external PDF document, using a direct API call.

#### Extract Text from PDF Document:

This section reads a PDF file and extracts its textual content.

```python
from pypdf import PdfReader  # pypdf is used to read the PDF pages

pdf_file_path = "path/to/your/document.pdf"
pdf_text = ""

# Read the PDF and concatenate the text of every page
reader = PdfReader(pdf_file_path)
for page in reader.pages:
    pdf_text += page.extract_text() or ""
```

#### Preparing payload and headers for the API Call:

Prepare the URL, payload, and headers to call the Lab45 AI Platform API.

```python
token = "User Token or API Key"
url = "<>/v1.1/skills/doc_completion/query"

payload = {
    "dataset_id": "fc487d2d-8b81-4cb7-9243-1e0bb6694df1",  # Replace with your actual dataset ID
    "skill_parameters": {
        "model_name": "gpt-4o",
        "temperature": 0,
        "max_output_tokens": 4000,
        "return_sources": False
    },
    "stream_response": False,
    "messages": [
        {
            "content": pdf_text,  # The extracted PDF text is attached as additional context
            "role": "user"
        },
        {
            "content": "Based on the context provided, tell me about 'The Impact of Football on Society' and 'The FIFA World Cup'",
            "role": "user"
        }
    ]
}

headers = {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer ' + token
}
```

#### Querying on the context present in the Dataset and PDF file:

Call the Lab45 AI Platform API with the dataset details and PDF content to answer queries.

```python
import requests
import json

response = requests.post(url, headers=headers, data=json.dumps(payload))
response_json = response.json()
```

### Custom Agent Transitions

This document explains how to create and manage a workflow with custom transitions between AI agents using the Lab45 AI platform APIs. Custom transitions allow agents to communicate with each other in a structured way, following a pre-defined workflow that directs the conversation between specialized agents.

#### Overview

The Lab45 platform supports creating teams of specialized agents that can work together to solve complex problems. By defining custom transitions, you can control how messages flow between agents, allowing for sophisticated multi-agent collaboration patterns. This implementation demonstrates:

1. Creating specialized agents with specific roles
2. Defining a transition graph that determines the allowed communication paths
3. Creating a team with a custom transition workflow
4. Starting and managing chat sessions with the team

#### Specialized Agents

Our example creates a team of five specialized agents that work together on software development projects:

1. `Project Manager`: Creates detailed project plans with task breakdown and timeline estimates
2. `System Architect`: Designs robust system architectures with technical specifications
3. `Developer`: Implements code based on specifications with realistic time estimates
4. `Reviewer`: Evaluates code quality using structured assessment frameworks
5. `Tester`: Verifies functionality with comprehensive testing methodologies

##### Agent Creation

The following requests show how each agent is configured with its specialized instructions, starting with the Project Manager; a scripted variant that creates all five agents in one loop is sketched just below.
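If you prefer to script this step, the per-agent payloads shown in the requests that follow can be collected in a list and created in a single loop, recording each returned `_id` for the team definition later. This is a sketch only: the `AGENTS_ENDPOINT`, `HEADERS`, and `agent_payloads` names are illustrative, and the payloads are abbreviated; use the full instructions shown below.

```python
import requests

AGENTS_ENDPOINT = "https://api.lab45.ai/v1.1/agents"
HEADERS = {
    "Content-Type": "application/json",
    "Accept": "application/json",
    "Authorization": "Bearer ",  # replace with your actual API key
}

# Shortened payloads for illustration; copy the full instructions from the requests below
agent_payloads = [
    {"name": "project_manager_test", "description": "You are the Project manager for a Team",
     "instructions": "...", "model_name": "gpt-4o", "temperature": 0,
     "max_output_tokens": 1500, "tools": [], "type": "Basic",
     "dataset_id": None, "allow_all_access": False},
    # ... repeat for system_architecture_test, developer_test, reviewer_test, tester_test
]

agent_ids = {}
for payload in agent_payloads:
    resp = requests.post(AGENTS_ENDPOINT, headers=HEADERS, json=payload)
    resp.raise_for_status()
    agent_ids[payload["name"]] = resp.json()["_id"]  # record each agent_id for the team creation step

print(agent_ids)
```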
Create the Project Manager agent:

`HTTP Request`

```bash
POST /v1.1/agents HTTP/1.1
Content-Type: application/json
Accept: text/event-stream, application/json
Authorization: Bearer ``
Host: api.lab45.ai
Content-Length: 148
```

```json
{
  "allow_all_access": false,
  "dataset_id": null,
  "description": "You are the Project manager for a Team",
  "instructions": "You are an experienced project manager who excels at creating detailed project plans.\\nYour responsibilities include:\\n1. Gathering and analyzing requirements\\n2. Breaking down work into specific tasks with time estimates (in days)\\n3. Creating a timeline with clear dependencies between tasks\\n4. Assigning resource requirements for each task\\n5. Identifying risks and mitigation strategies\\n6. Defining project milestones and deliverables\\n7. Providing regular status updates and timeline adjustments\\n\\nAlways create detailed plans with specific day estimates for each task.",
  "max_output_tokens": 1500,
  "model_name": "gpt-4o",
  "name": "project_manager_test",
  "temperature": 0,
  "tools": [],
  "type": "Basic"
}
```

`Response`

```json
{
  "_id": "fe7d7fb0-1190-441e-b890-26da7a563c22",
  "allow_all_access": false,
  "dataset_id": null,
  "description": "You are the Project manager for a Team",
  "instructions": "same instructions that were sent when creating the agent",
  "max_output_tokens": 1500,
  "model_name": "gpt-4o",
  "name": "project_manager_test",
  "owners": [
    "XXXXXXX-92af-4379-8bb8-XXXXXXX"
  ],
  "temperature": 0,
  "tenant_id": "XXXXXXX-8b7c-43fb-8119-XXXXXXXXX",
  "tools": []
}
```

#### Create the other agents with the instructions provided below:

**System Architect**:

```json
{
  "name": "system_architecture_test",
  "description": "You are a senior software architect designing robust systems.",
  "instructions": "You are a senior software architect who designs robust system architectures.\\nFor each component you design, provide:\\n1. Technical specifications with implementation complexity (Low/Medium/High)\\n2. Estimated development time in days\\n3. Required technical skills\\n4. Dependencies and integration points\\n5. Scalability and performance considerations\\n6. Technology recommendations with justifications"
}
```

**Developer**:

```json
{
  "name": "developer_test",
  "description": "You are a skilled developer who writes code based on provided specifications.",
  "instructions": "You are a skilled developer who implements code based on specifications.\\nFor each implementation task, provide:\\n1. Time estimates for development in days (be realistic)\\n2. Technical challenges you anticipate\\n3. Required libraries and dependencies\\n4. Testing approach for your implementation\\n5. Resource needs (computing, specialized knowledge, etc.)"
}
```

**Reviewer**:

```json
{
  "name": "reviewer_test",
  "description": "You are a meticulous code reviewer who evaluates code quality and ensures adherence to standards and best practices.",
  "instructions": "You are a meticulous code reviewer who evaluates code quality.\\nFor each review, provide:\\n1. Time needed for thorough review (in days)\\n2. Quality assessment framework you'll use\\n3. Critical aspects you'll focus on during review\\n4. Standards and best practices you'll check against"
}
```

**Tester**:

```json
{
  "name": "tester_test",
  "description": "You are a comprehensive software tester who verifies functionality and ensures software quality through various testing methodologies",
  "instructions": "You are a comprehensive software tester who verifies functionality.\\nFor each testing phase, provide:\\n1. Time required for complete test coverage (in days)\\n2. Test strategy (manual, automated, types of testing)\\n3. Test environment requirements\\n4. Test data needs\\n5. Success criteria for passing tests\\n6. Known edge cases you'll specifically test"
}
```

`Python Request`

```python
import requests  # Import the requests module to send HTTP requests

# Define the API endpoint for creating agents
agent_endpoint = "https://api.lab45.ai/v1.1/agents"

# Set the headers for the request, including content type, accepted response format, and authorization token
headers = {
    'Content-Type': "application/json",  # The content type of the request is JSON, meaning the request body will be in JSON format
    'Accept': "text/event-stream, application/json",  # The server is expected to respond with either event-stream or JSON
    'Authorization': "Bearer "  # Replace with your actual API key for authentication
}

# Define the payload (request body) for the API call, which contains the agent configuration
payload = {
    "allow_all_access": False,
    "dataset_id": None,
    "description": "You are the Project manager for a Team",
    "instructions": "You are an experienced project manager who excels at creating detailed project plans.\\nYour responsibilities include:\\n1. Gathering and analyzing requirements\\n2. Breaking down work into specific tasks with time estimates (in days)\\n3. Creating a timeline with clear dependencies between tasks\\n4. Assigning resource requirements for each task\\n5. Identifying risks and mitigation strategies\\n6. Defining project milestones and deliverables\\n7. Providing regular status updates and timeline adjustments\\n\\nAlways create detailed plans with specific day estimates for each task.",
    "max_output_tokens": 1500,
    "model_name": "gpt-4o",
    "name": "project_manager_test",
    "temperature": 0,
    "tools": [],
    "type": "Basic"
}

response = requests.post(agent_endpoint, headers=headers, json=payload)

# Print the response from the API call to inspect the result (status code, content, etc.)
print(response)
```

Note: Record the `_id` field from each response; this is the agent_id, and it is required when creating the team.

#### Agent Transition Workflow Graph Structure

Transition: Each agent can only pass control to the next agent, or choose from its list of allowed target agents in the sequence, enforcing a structured workflow where each specialist provides their input at the appropriate stage of the process.

The transition graph is defined as a dictionary where each key represents a source node (agent) and the value is a list of target nodes to which that agent can transition. In our implementation, we've defined a linear workflow:

```python
# initial_state and each *_test variable refer to the corresponding agent defined above
graph_dict = {}
graph_dict[initial_state] = [project_manager_test]
graph_dict[project_manager_test] = [architect_test]
graph_dict[architect_test] = [developer_test]
graph_dict[developer_test] = [reviewer_test]
graph_dict[reviewer_test] = [tester_test]
```

#### Visualizing the Transition Graph

Initial State → Project Manager → System Architect → Developer → Reviewer → Tester

This linear workflow ensures that:

- The Project Manager first creates the project plan
- The System Architect then designs the technical architecture
- The Developer provides implementation details
- The Reviewer evaluates the proposed implementation
- The Tester plans testing strategies and validation approaches

More complex workflows could include branching paths, feedback loops, or parallel processes by adding additional transitions in the graph dictionary, as shown in the sketch below.
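For instance, a review feedback loop can be expressed simply by listing more than one target per source node. This is an illustrative sketch using the same variable names as above (each variable is assumed to hold the corresponding agent's identifier):

```python
# A non-linear variant: the Reviewer can either send work back to the
# Developer (rework) or forward it to the Tester (approved).
graph_dict = {}
graph_dict[initial_state] = [project_manager_test]
graph_dict[project_manager_test] = [architect_test]
graph_dict[architect_test] = [developer_test]
graph_dict[developer_test] = [reviewer_test]
graph_dict[reviewer_test] = [developer_test, tester_test]  # feedback loop or continue to testing
```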
#### Creation of Team:

`Prerequisite`: the agent names and agent `_id` values returned by the create agent API responses.

`Workflow configuration`: When creating a team with custom transitions, you need to specify the workflow pattern using agent UUIDs rather than their names. This is done through a transition graph in the team creation payload.

#### Transition Graph with UUIDs

The transition graph is a JSON structure that defines which agents can pass control to which others. The initial_state represents the transition from "User Query" to "Project Manager." Similarly, the UUIDs define subsequent transitions: "Project Manager" to "System Architect," "System Architect" to "Developer," "Developer" to "Reviewer," and finally, "Reviewer" to "Tester."

```json
"workflow": {
  "transition_graph": {
    "initial_state": ["c2c5929e-f51b-4e71-a370-cc6279bea38b"],
    "c2c5929e-f51b-4e71-a370-cc6279bea38b": ["dc9d5954-be19-4aaa-8b89-aec4db87e75b"],
    "dc9d5954-be19-4aaa-8b89-aec4db87e75b": ["64b31edd-61c2-4046-84c9-ce8cfec1835e"],
    "64b31edd-61c2-4046-84c9-ce8cfec1835e": ["3da96cd0-d1ad-42e5-9ea9-8e6852e48c67"],
    "3da96cd0-d1ad-42e5-9ea9-8e6852e48c67": ["0fbe0d2e-d7ab-4704-8df9-40b8d7ac3e1c"]
  },
  "transition_type": "allowed"
}
```

#### Request sample to create the team with agents

`HTTP Request`

```bash
POST /v1.1/teams HTTP/1.1
Content-Type: application/json
Accept: text/event-stream, application/json
Authorization: Bearer ``
Host: api.lab45.ai
Content-Length: 148
```

```json
{
  "allow_all_access": false,
  "assistant_ids": ["c2c5929e-f51b-4e71-a370-cc6279bea38b", "dc9d5954-be19-4aaa-8b89-aec4db87e75b", "64b31edd-61c2-4046-84c9-ce8cfec1835e", "3da96cd0-d1ad-42e5-9ea9-8e6852e48c67", "0fbe0d2e-d7ab-4704-8df9-40b8d7ac3e1c"],
  "description": "Project planner Team",
  "instructions": "You have to plan the project based on the query",
  "name": "project_plan_team_test",
  "workflow": {
    "transition_graph": {
      "initial_state": ["c2c5929e-f51b-4e71-a370-cc6279bea38b"],
      "c2c5929e-f51b-4e71-a370-cc6279bea38b": ["dc9d5954-be19-4aaa-8b89-aec4db87e75b"],
      "dc9d5954-be19-4aaa-8b89-aec4db87e75b": ["64b31edd-61c2-4046-84c9-ce8cfec1835e"],
      "64b31edd-61c2-4046-84c9-ce8cfec1835e": ["3da96cd0-d1ad-42e5-9ea9-8e6852e48c67"],
      "3da96cd0-d1ad-42e5-9ea9-8e6852e48c67": ["0fbe0d2e-d7ab-4704-8df9-40b8d7ac3e1c"]
    },
    "transition_type": "allowed"
  }
}
```

`Response`

```json
{
  "_id": "fe7d7fb0-1190-441e-b890-26da7a563c22"
}
```

`Python Request`

```python
import requests  # Import the requests module to send HTTP requests

# Define the API endpoint for creating teams
team_endpoint = "https://api.lab45.ai/v1.1/teams"

# Set the headers for the request, including content type, accepted response format, and authorization token
headers = {
    'Content-Type': "application/json",  # The content type of the request is JSON, meaning the request body will be in JSON format
    'Accept': "text/event-stream, application/json",  # The server is expected to respond with either event-stream or JSON
    'Authorization': "Bearer "  # Replace with your actual API key for authentication
}

# Define the payload (request body) for the API call, which contains the team configuration
payload = {
    "allow_all_access": False,
    "assistant_ids": ["c2c5929e-f51b-4e71-a370-cc6279bea38b", "dc9d5954-be19-4aaa-8b89-aec4db87e75b", "64b31edd-61c2-4046-84c9-ce8cfec1835e", "3da96cd0-d1ad-42e5-9ea9-8e6852e48c67", "0fbe0d2e-d7ab-4704-8df9-40b8d7ac3e1c"],
    "description": "Project planner Team",
    "instructions": "You have to plan the project based on the query",
    "name": "project_plan_team_test",
"workflow": { "transition_graph": { "initial_state": ["c2c5929e-f51b-4e71-a370-cc6279bea38b"], "c2c5929e-f51b-4e71-a370-cc6279bea38b": ["dc9d5954-be19-4aaa-8b89-aec4db87e75b"], "dc9d5954-be19-4aaa-8b89-aec4db87e75b": ["64b31edd-61c2-4046-84c9-ce8cfec1835e"], "64b31edd-61c2-4046-84c9-ce8cfec1835e": ["3da96cd0-d1ad-42e5-9ea9-8e6852e48c67"], "3da96cd0-d1ad-42e5-9ea9-8e6852e48c67": ["0fbe0d2e-d7ab-4704-8df9-40b8d7ac3e1c"], }, "transition_type" : "allowed" } } response = requests.post(team_endpoint, headers=headers, json=payload) # Print the response from the API call to inspect the result (status code, content, etc.) print(response) ``` Note: Record the `_id` from the response, which is used as team id for `teams` chat query. #### Team Chat Now user can initiate the chat with the team created. `HTTP Request` ```bash POST /v1.1/agent_chat_session/query HTTP/1.1 Content-Type: application/json Accept: text/event-stream, application/json Authorization: Bearer `` Host: api.lab45.ai Content-Length: 148 ``` ```json { "conversation_id":"67d2913af0a3e3549a044e0b", "party_id": "75e38237-b8f6-4510-808c-8c44a54e2cd1", "party_type": "Team", "stream_response": false, "max_rounds" :25, "messages": [ { "role": "user", "name": "user", "content": " I need a customer relationship management (CRM) system with the following requirements:\n \n 1. User management with role-based access control (admin, manager, sales rep)\n 2. Customer database with contact information, communication history, and purchase records\n 3. Sales pipeline management with deal tracking and forecasting\n 4. Email integration for tracking client communications\n 5. Reporting dashboard with customizable KPIs\n 6. Mobile app for on-the-go access\n 7. API for integration with our existing accounting software\n \n Our target launch is in 4 months and we have a team of 3 developers available.\n Please create a detailed project plan with timeline estimates for each phase.\n " } ], "save_conversation": true } ``` `Response` ```json { { "data": { "content": "To develop a CRM system with the specified requirements, we will break down the project into phases and tasks, estimate the time required for each task, and create a timeline. 
Given the 4-month target launch and a team of 3 developers, we will need to prioritize tasks and manage resources efficiently.\n\n### Phase 1: Requirements Gathering and Analysis (5 days)\n- **Task 1.1:** Conduct stakeholder interviews and workshops (2 days)\n- **Task 1.2:** Document detailed functional and non-functional requirements (3 days)\n\n### Phase 2: System Design (10 days)\n- **Task 2.1:** Design system architecture and database schema (4 days)\n- **Task 2.2:** Create wireframes and UI/UX design for web and mobile (3 days)\n- **Task 2.3:** Define API specifications for integration (3 days)\n\n### Phase 3: Development (50 days)\n- **Task 3.1:** Develop user management module with role-based access (8 days)\n- **Task 3.2:** Implement customer database and contact management (10 days)\n- **Task 3.3:** Develop sales pipeline management features (10 days)\n- **Task 3.4:** Integrate email tracking functionality (7 days)\n- **Task 3.5:** Create reporting dashboard with customizable KPIs (8 days)\n- **Task 3.6:** Develop mobile app for on-the-go access (7 days)\n\n### Phase 4: Integration and Testing (15 days)\n- **Task 4.1:** Integrate CRM with existing accounting software via API (5 days)\n- **Task 4.2:** Conduct unit and integration testing (5 days)\n- **Task 4.3:** Perform user acceptance testing (5 days)\n\n### Phase 5: Deployment and Launch (5 days)\n- **Task 5.1:** Deploy CRM system to production environment (2 days)\n- **Task 5.2:** Conduct training sessions for end-users (3 days)\n\n### Phase 6: Post-Launch Support and Optimization (5 days)\n- **Task 6.1:** Monitor system performance and address any issues (3 days)\n- **Task 6.2:** Gather user feedback and implement minor improvements (2 days)\n\n### Timeline and Dependencies\n- **Month 1:** Complete Phases 1 and 2\n- **Month 2 and 3:** Focus on Phase 3 (Development)\n- **Month 4:** Complete Phases 4, 5, and 6\n\n### Resource Allocation\n- Each developer will be assigned tasks based on their expertise, ensuring parallel development where possible to meet the timeline.\n\n### Risks and Mitigation Strategies\n- **Risk:** Delays in requirements gathering\n - **Mitigation:** Schedule regular check-ins with stakeholders\n- **Risk:** Integration challenges with existing software\n - **Mitigation:** Allocate additional buffer time for integration tasks\n\n### Milestones and Deliverables\n- **Milestone 1:** Completion of system design (End of Month 1)\n- **Milestone 2:** Completion of core development (End of Month 3)\n- **Milestone 3:** Successful deployment and user training (End of Month 4)\n\nPlease review the plan and let me know if there are any adjustments or additional requirements.", "conversation_id": "67d2913af0a3e3549a044e0b", "name": "project_manager_test", "role": "assistant", "response_status": "Pending" } } } ``` `Python Request` ```python import requests # Import the requests module to send HTTP requests # Define the API endpoint for querying the document completion skill chat_endpoint = f"https://api.lab45.ai/v1.1/agent_chat_session/query" # Set the headers for the request, including content type, accepted response format, and authorization token headers = { 'Content-Type': "application/json", # The content type of the request is JSON, meaning the request body will be in JSON format 'Accept': "text/event-stream, application/json", # The server is expected to respond with either event-stream or JSON 'Authorization': "Bearer " # Replace with your actual API key for authentication } # Define the payload (request body) for 
# the API call, which contains the conversation details and the user's request for the team
payload = {
    "conversation_id": "67d2913af0a3e3549a044e0b",
    "party_id": "75e38237-b8f6-4510-808c-8c44a54e2cd1",
    "party_type": "Team",
    "stream_response": False,
    "max_rounds": 25,
    "messages": [
        {
            "role": "user",
            "name": "user",
            "content": " I need a customer relationship management (CRM) system with the following requirements:\n \n 1. User management with role-based access control (admin, manager, sales rep)\n 2. Customer database with contact information, communication history, and purchase records\n 3. Sales pipeline management with deal tracking and forecasting\n 4. Email integration for tracking client communications\n 5. Reporting dashboard with customizable KPIs\n 6. Mobile app for on-the-go access\n 7. API for integration with our existing accounting software\n \n Our target launch is in 4 months and we have a team of 3 developers available.\n Please create a detailed project plan with timeline estimates for each phase.\n "
        }
    ],
    "save_conversation": True
}

response = requests.post(chat_endpoint, headers=headers, json=payload)

# Print the response from the API call to inspect the result (status code, content, etc.)
print(response)
```

#### Chat Session Polling Mechanism

The Lab45 platform uses a polling mechanism for agent chat sessions. When you initiate a chat session with a team, the process works as follows:

##### Initial Request

Your first API call starts the conversation and returns a response with:

- A `conversation_id` to track this specific chat session
- A `response_status` typically set to "Pending"

##### Polling for Completion

Since agent responses may take time (especially in multi-agent scenarios), you need to poll the same endpoint repeatedly until the process completes:

- Take the `conversation_id` from the initial response
- Include it in subsequent requests to the same endpoint
- Continue polling until `response_status` changes to "Completed"

```python
import time  # used to pause between polling attempts

# Polling request (includes conversation_id from initial response)
chat_payload = {
    "conversation_id": "67d1960af0a3e3549a044dc6",  # From initial response
    "party_id": team_id,
    "party_type": "Team",
    "stream_response": False,
    "max_rounds": 25,
    "messages": [...],  # Same messages as before
    "save_conversation": True
}

# Implementation of the polling loop; chat_data holds the parsed JSON of the initial response
while chat_data['data']['response_status'] != "Completed":
    print("Response still pending. Polling in 2 seconds...")
    time.sleep(2)  # Wait before polling again
    response = requests.post(url, headers=headers, json=chat_payload)
    chat_data = response.json()
    print(f"Status: {chat_data['data']['response_status']}")
```

#### Observing the Transition Pattern in Action

When running a chat session with your multi-agent team, you'll notice that the responses follow the exact transition pattern defined in your graph. The conversation progresses through each agent in the specified sequence:

1. Your initial user query goes to the `Project Manager`
2. The Project Manager's response is followed by the `System Architect`
3. The System Architect is followed by the `Developer`
4. The Developer is followed by the `Reviewer`
5. Finally, the Reviewer is followed by the `Tester`

This structured flow ensures that each specialist contributes their expertise at the appropriate stage of the process, creating a comprehensive solution that benefits from multiple perspectives. The Lab45 platform handles all the transitions behind the scenes, making the experience seamless for the end user.
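One practical way to follow the workflow is to collect each polled response together with the agent that produced it. The sketch below extends the polling loop above; it assumes the `url`, `headers`, and `chat_payload` variables from that snippet and the response shape shown in the team chat example, and is illustrative rather than prescriptive.

```python
import time
import requests

transcript = []  # (agent name, content) pairs in the order they arrive

# Initial poll to obtain the current state of the conversation
chat_data = requests.post(url, headers=headers, json=chat_payload).json()

while chat_data['data']['response_status'] != "Completed":
    agent_name = chat_data['data'].get('name')        # which agent answered this round
    content = chat_data['data'].get('content', '')
    transcript.append((agent_name, content))
    print(f"{agent_name}: still working, polling again in 2 seconds...")
    time.sleep(2)
    chat_data = requests.post(url, headers=headers, json=chat_payload).json()

# Record the final, completed response as well
transcript.append((chat_data['data'].get('name'), chat_data['data'].get('content', '')))

for name, _ in transcript:
    print(name)  # e.g. project_manager_test, then the architect, developer, reviewer, tester
```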
> **Key Insight**: By monitoring the `name` field in each response during polling, you can track which agent is currently active in the workflow.

### Using SharePoint Scan Feature

SharePoint Completion is a skill that lets the user ingest and retrieve file contents directly from a provided SharePoint site URL, responding accurately to user queries by dynamically extracting relevant information from the documents. The core concept behind this approach is retrieving the required document context from the SharePoint site with Microsoft Power Automate for secure and automated data retrieval before generating a response. This helps in producing more accurate and contextually relevant answers to user queries. The endpoint uses various skill parameters which can be user configured to generate the response accordingly.

To get the required setup for generating the Power Automate workflow needed for data retrieval, please refer to the [Reference Page](https://docs.lab45.ai/reference.html#lab45-ai-platform-sharepoint-scan-support).

The API operates in a 3-step sequence that ensures efficient processing and querying of document-based data to generate contextual responses. The three steps are Dataset Creation, Preparation, and Querying. Each step involves an API endpoint that is essential for moving through the pipeline and getting the final response. Here's an in-depth look at each of these three steps with a sample `use-case` for `indexing files from sharepoint`:

#### Create Dataset

This endpoint is used to create a new dataset in the system. A dataset is a logical grouping that contains all the documents uploaded by the user. Here, the documents are ingested directly from the SharePoint URL provided. To get the ``, please refer to the [Authentication Page](https://docs.lab45.ai/quickstart.html#authentication).
`HTTP Request`

```bash
POST /v1.1/datasets HTTP/1.1
Content-Type: application/json
Accept: text/event-stream, application/json
Authorization: Bearer ``
Host: api.lab45.ai
Content-Length: 148
```

```json
{
  "name": "Sharepoint_demo",
  "description": "Some test Dataset1",
  "skill_id": "sharepoint_completion",
  "sharepoint_attributes": {
    "site_url": "https://wipro365.sharepoint.com/sites/Lab45-shareScan",
    "power_automate_workflow_url": "https://prod-06.centralindia.logic.azure.com:443/workflows/1782bffadef048869e209601f3f57bf1/triggers/manual/paths/invoke?api-version=2016-06-01"
  }
}
```

`Response`

```json
{
  "_id": "30149f9c-2858-4f8b-b591-50ef8a7d835b",
  "allow_all_access": false,
  "desc": "Some test Dataset1",
  "files": [
    {
      "name": "/Shared Documents/Test_files/AI_IMPACT.pdf",
      "ts": 1736144953.0
    },
    {
      "name": "/Shared Documents/Covid_Impact_Survey.pdf",
      "ts": 1740738848.0
    }
  ],
  "name": "Sharepoint_demo",
  "owners": [
    "8650208b-f4a0-470d-aa74-e2ff7cb21dea"
  ],
  "sharepoint_attributes": {
    "power_automate_workflow_url": "https://prod-06.centralindia.logic.azure.com:443/workflows/1782bffadef048869e209601f3f57bf1/triggers/manual/paths/invoke?api-version=2016-06-01",
    "site_url": "https://wipro365.sharepoint.com/sites/Lab45-shareScan"
  },
  "skill_id": "sharepoint_completion",
  "tenant_id": "a919164d-8b7c-43fb-8119-f1997d45ca4f"
}
```

`Python Request`

```python
import requests  # Import the requests module to send HTTP requests

# Define the API endpoint for the datasets
datasets_endpoint = f"https://api.lab45.ai/v1.1/datasets"

# Set the headers for the request, including the content type, accepted response format, and authorization token
headers = {
    'Content-Type': "application/json",  # The content type is JSON, so the body of the request will be in JSON format
    'Accept': "text/event-stream, application/json",  # The server is expected to respond with event-stream or JSON data
    'Authorization': "Bearer "  # Authorization header containing the API key (replace with your actual API key)
}

# Payload data to be sent with the request, typically in JSON format. This payload defines a new dataset.
payload = {
    "name": "Sharepoint_demo",
    "skill_id": "sharepoint_completion",
    "sharepoint_attributes": {
        "site_url": "https://wipro365.sharepoint.com/sites/Lab45-shareScan",
        "power_automate_workflow_url": "https://prod-06.centralindia.logic.azure.com:443/workflows/1782bffadef048869e209601f3f57bf1/triggers/manual/paths/invoke?api-version=2016-06-01"
    }
}

# Make the POST request to the datasets API endpoint with the provided headers and payload
response = requests.post(datasets_endpoint, headers=headers, json=payload)

# Print the response from the API call (this could be the status code, or content, depending on the API response)
print(response)
```

#### Prepare Dataset

The Prepare step prepares the dataset by converting the ingested documents into embeddings. These embeddings capture the semantic meaning of the content in each document. In this step, the system transforms each document into a vector representation and stores it in the Vector Database with the indexes. This prepares the dataset for fast retrieval during the query phase.
`HTTP Request`

```bash
POST /v1.1/skills/doc_completion/prepare HTTP/1.1
Content-Type: application/json
Accept: text/event-stream, application/json
Authorization: Bearer ``
Host: api.lab45.ai
Content-Length: 148
```

```json
{
  "dataset_id": "30149f9c-2858-4f8b-b591-50ef8a7d835b"
}
```

`Response`

```json
{
  "_id": "afa49e43-62bd-4f7e-b964-e51c1f8211e8",
  "emb_type": "openai",
  "resource_group_id": "30149f9c-2858-4f8b-b591-50ef8a7d835b",
  "status": "Started"
}
```

`Python Request`

```python
import requests  # Import the requests module to send HTTP requests

# Define the API endpoint for preparing the skill, specifically for document completion
prepare_endpoint = f"https://api.lab45.ai/v1.1/skills/doc_completion/prepare"

# Set the headers for the request, including the content type, accepted response format, and authorization token
headers = {
    'Content-Type': "application/json",  # The content type of the request is JSON (request body will be JSON)
    'Accept': "text/event-stream, application/json",  # The server is expected to respond with event-stream or JSON data
    'Authorization': "Bearer "  # Replace with your actual API key for authentication
}

# Define the payload (request body) for the API call, which includes a dataset ID
payload = {
    "dataset_id": "30149f9c-2858-4f8b-b591-50ef8a7d835b"  # The ID of the dataset to be used for the document completion task
}

# Make the POST request to the "prepare" API endpoint with the provided headers and payload
response = requests.post(prepare_endpoint, headers=headers, json=payload)

# Print the response from the API call to see the status or data returned
print(response)
```

#### Query

This endpoint allows the user to ask a query, and the system will use the embeddings and indexed document content to retrieve and generate a relevant response. The query is first converted into an embedding and then compared with the embeddings of the documents in the dataset. The most similar document sections are retrieved, and the response is generated based on that context.

`HTTP Request`

```bash
POST /v1.1/skills/doc_completion/query HTTP/1.1
Content-Type: application/json
Accept: text/event-stream, application/json
Authorization: Bearer ``
Host: api.lab45.ai
Content-Length: 148
```

```json
{
  "dataset_id": "30149f9c-2858-4f8b-b591-50ef8a7d835b",
  "skill_parameters": {
    "model_name": "gpt-4",
    "retrieval_chain": "custom",
    "emb_type": "openai",
    "temperature": 0,
    "max_output_tokens": 100,
    "return_sources": false
  },
  "stream_response": false,
  "messages": [
    {"content": "Hi", "role": "user"},
    {"content": "What is the impact of AI?", "role": "user"}
  ]
}
```

The messages section has `role` and `content`. The `role` represents the various roles used in interactions with the model.

`Roles`: It can be any one of the following:

- SYSTEM: Defines the model's behavior. System messages are not accepted here, as agent instructions default to system messages for agents.
- ASSISTANT: Represents the model's responses based on user messages.
- USER: Equivalent to the queries made by the user.
- AI: Interchangeably used with `ASSISTANT`, representing the model's responses.
- FUNCTION: Represents all function/tool call activity within the interaction.

`Content`: Content is the user prompt/query. A detailed prompt helps in getting a relevant response. For the query above (`What is the impact of AI?`), the LLM will generate a response grounded in the documents of the dataset ID provided.
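To make the role semantics concrete, here is a small illustrative sketch (not part of the original example) of a multi-turn messages array in which the `assistant` role carries a previous answer back as context; the content strings are placeholders:

```python
# Illustrative only: a multi-turn messages array using the roles described above.
# The assistant message is a placeholder for the answer returned by the previous query.
messages = [
    {"role": "user", "content": "What is the impact of AI?"},
    {"role": "assistant", "content": "<previous answer returned by the query endpoint>"},
    {"role": "user", "content": "Summarize that impact in three bullet points."}
]
```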
`Response`

```json
{
  "data": {
    "content": "The impact of artificial intelligence (AI) goes beyond increasing the efficiency of the existing economy; it also serves as a new general-purpose \"method of invention\" that can reshape the nature of the innovation process and the organization of research and development (R&D). AI, particularly recent developments in \"deep learning,\" has the potential to lead to a significant substitution away from more labor-intensive research towards research that leverages large datasets and enhanced prediction algorithms. This shift in research is likely to bring about significant changes in how companies approach innovation, with a focus on acquiring and controlling critical large datasets and application-specific algorithms to drive research productivity and competition. Policymakers are encouraged to promote transparency and sharing of core datasets across public and private sectors to stimulate research productivity and innovation-oriented competition in the future."
  }
}
```

`Python Request`

```python
import requests  # Import the requests module to send HTTP requests

# Define the API endpoint for querying the document completion skill
query_endpoint = f"https://api.lab45.ai/v1.1/skills/doc_completion/query"

# Set the headers for the request, including content type, accepted response format, and authorization token
headers = {
    'Content-Type': "application/json",  # The content type is set to JSON (request body will be in JSON format)
    'Accept': "text/event-stream, application/json",  # The client expects either an event-stream (for real-time updates) or JSON as the response
    'Authorization': "Bearer "  # Replace with your actual API key for authentication
}

# Define the payload (request body) for the API call, which contains the dataset and skill parameters
payload = {
    "dataset_id": "30149f9c-2858-4f8b-b591-50ef8a7d835b",
    "skill_parameters": {
        "model_name": "gpt-4",
        "retrieval_chain": "custom",
        "emb_type": "openai",
        "temperature": 0,
        "max_output_tokens": 100,
        "return_sources": False
    },
    "stream_response": False,
    "messages": [
        {"content": "Hi", "role": "user"},
        {"content": "What is the impact of AI?", "role": "user"}
    ]
}

# Make the POST request to the query API endpoint with the provided headers and payload
response = requests.post(query_endpoint, headers=headers, json=payload)

# Print the response from the API call to inspect the status or data returned
print(response)
```

## Autogen Extension Guide

### CSV Data Analysis with Lab45 AI autogen extended library and LangChain

This document explains how to use the `lab45_autogen_extension` library with LangChain tools to analyze CSV data through natural language queries.

#### Overview

The `assistant_with_csv.py` script demonstrates how to create an AI assistant that can:

1. Load and analyze CSV data files
2. Respond to natural language queries about the data
3. Execute Python code through LangChain's REPL tool
4. Generate insightful responses using the Lab45 AI Platform

This integration enables users to interact with data through conversation rather than writing code directly.

#### How It Works

The script combines several powerful components:

* `Lab45 AI Platform`: Provides the language model capabilities
* `LangChain Tools`: Offers Python code execution abilities
* `Pandas`: Handles the data manipulation and analysis
* `AutoGen`: Manages the agent conversation flow

##### The Process Flow

1. Load a CSV file into a pandas DataFrame
2. Create a Python REPL tool that has access to this DataFrame
3. Connect this tool to a Lab45 AI assistant agent
4. Send queries to the agent about the data
5. The agent uses the tool to analyze the data and responds with insights

#### Prerequisites

Note: To run this code, make sure Python 3.11 is set up in your environment. Follow the [Python 3.11 Environment Setup Guide](https://docs.lab45.ai/reference.html#installation).

This script requires the following Python libraries:

- `autogen-agentchat==0.4.7`
- `autogen-core==0.4.7`
- `autogen-ext==0.4.7`
- `langchain_experimental`

You can install each of the libraries listed above with pip, for example:

```bash
pip install langchain_experimental
```

#### Lab45 AI Platform Client

```python
client = Lab45AIPlatformCompletionClient(
    model_name='gpt-4o'
)
```

The client connects to the Lab45 AI Platform API using the provided API key and model configuration.

#### CSV Data Loading

```python
df = pd.read_csv(
    "https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv")
```

The script loads the famous Titanic dataset, but you can replace this with any CSV file.

#### LangChain Python Tool

```python
tool = LangChainToolAdapter(PythonAstREPLTool(locals={"df": df}))
```

#### Tool Description:

This tool allows the agent to:

- Execute Python code
- Access the DataFrame through the variable `df`
- Run pandas operations and analyses
- Return results to be incorporated in responses

#### Assistant Agent Configuration

```python
agent = AssistantAgent(
    name="assistant",
    model_client=client,
    system_message="""You are a data analysis assistant.
    You have access to a Python tool that can execute code.
    The data is available in a pandas DataFrame named 'df'.
    When you need to analyze data, write Python code and use the tool to execute it.
    After execution, explain the results in plain language.
    """,
    tools=[tool]
)
```

#### Running the CSV Analysis Assistant

The script is designed to be run directly, with a simple entry point that leverages Python's asynchronous capabilities:

```python
if __name__ == "__main__":
    asyncio.run(assistant_chat_with_langchain_csv_tool())
```

#### Example Usage and Responses

When interacting with the CSV data analysis assistant, you'll observe how it processes queries and returns responses with tool call results.

##### Query Processing Flow

1. `User Query`: You ask a question about the data
2. `Tool Execution`: The assistant invokes the Python REPL tool to analyze the data
3. `Results Processing`: The assistant interprets the code execution results
4. `Response Generation`: A human-friendly explanation is provided

`response: [FunctionExecutionResult(source='assistant', models_usage=None, content='29.69911764705882', type='ToolCallSummaryMessage')]`

##### Response Format

The response above includes metadata about the tool usage and results. This tells us:

- `source='assistant'`: The message came from the assistant agent
- `models_usage=None`: Model usage tracking information (if available)
- `content='29.69911764705882'`: The actual result from the code execution
- `type='ToolCallSummaryMessage'`: This is a summary of a tool call result

##### Example Interaction

`User Query:` "What is the average age of passengers on the Titanic?"

`Assistant's Process:`

1. The assistant recognizes this requires calculating the mean of the 'Age' column
2. It executes Python code using the tool:

```python
df['Age'].mean()
```

##### Full code

```python
import asyncio
import os
import pandas as pd
from langchain_experimental.tools.python.tool import PythonAstREPLTool
from autogen_ext.tools.langchain import LangChainToolAdapter
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.messages import TextMessage
from autogen_core import CancellationToken

# Ensure the import paths are correct
import sys
sys.path.insert(0, os.path.abspath(os.getcwd()))
from lab45_autogen_extension.lab45aiplatform_autogen_extension import Lab45AIPlatformCompletionClient

#os.environ["LAB45AIPLATFORM_URL"] = "http://localhost:8000/v1.1/"
os.environ["LAB45AIPLATFORM_URL"] = "https://api.lab45.ai/v1.1/"
os.environ["LAB45AIPLATFORM_API_KEY"] = "User Jwt token or developer key"

"""
please install the following packages:
pip install langchain_experimental
"""

async def assistant_chat_with_langchain_csv_tool():
    """
    Assistant agent with a LangChain tool
    """
    client = Lab45AIPlatformCompletionClient(
        model_name='gpt-4o'
    )
    df = pd.read_csv(
        "https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv")
    tool = LangChainToolAdapter(PythonAstREPLTool(locals={"df": df}))
    agent = AssistantAgent(
        name="assistant",
        model_client=client,
        model_client_stream=False,
        tools=[tool],
        system_message="Use the `df` variable to access the dataset.",
    )

    # Non-Streaming
    response = await agent.on_messages(
        [TextMessage(content="What's the average age of the passengers?", source="user")],
        cancellation_token=CancellationToken(),
    )
    print(response.chat_message)
    print("Response:", response.chat_message.content)

if __name__ == "__main__":
    asyncio.run(assistant_chat_with_langchain_csv_tool())
```

##### Benefits

- No-Code Data Analysis: Users can get insights without writing Python code
- Natural Language Interface: Ask questions in plain English
- Powerful Analysis: Access to the full capabilities of pandas and Python
- Contextual Understanding: The agent understands follow-up questions
- Explanatory Responses: Results are explained in easy-to-understand language

##### Advanced Capabilities

This system can handle complex analytical tasks, including:

- Statistical analysis and hypothesis testing
- Data visualization recommendations
- Identifying trends and patterns
- Data cleaning suggestions
- Feature engineering for machine learning

### Dynamic Data Attachment Using Lab45 AI Platform Extended Autogen Library

This document provides a comprehensive guide on utilizing the `lab45_autogen_extension` library to augment the contextual data used by a Retrieval-Augmented Generation (RAG) agent by dynamically attaching extra document data alongside the pre-existing, vectored data within a Lab45 AI platform dataset. This technique enables you to query and receive precise, combined responses that leverage information from both the dataset and an external PDF document.

#### Prerequisites

Note: To run this code, make sure Python 3.11 is set up in your environment. Follow the [Python 3.11 Environment Setup Guide](https://docs.lab45.ai/reference.html#installation).

#### Creating client with Lab45 AI extension:

This client handles interactions with the specified model on the Lab45 AI platform.

```python
from lab45_autogen_extension.lab45aiplatform_autogen_extension import Lab45AIPlatformCompletionClient

client = Lab45AIPlatformCompletionClient(
    model_name='gpt-4o'
)
```

#### Extract Text from PDF Document:

This section reads a PDF file and extracts its textual content.
```python
# Assumes a PyPDF-style reader; install with `pip install pypdf` (adjust the import if you use PyPDF2)
from pypdf import PdfReader

pdf_file_path = "path/to/your/document.pdf"
pdf_text = ""
with open(pdf_file_path, 'rb') as pdf_file:
    reader = PdfReader(pdf_file)
    for page in reader.pages:
        pdf_text += page.extract_text() or ""
```

#### Initialize RAG Tool with Dataset and Message History

This creates a Retrieval-Augmented Generation (RAG) tool that uses a dataset and message history for context.

```python
from lab45_autogen_extension.custom_tools.ragtool import Lab45AIPlatformRAGTool
from autogen_ext.tools.langchain import LangChainToolAdapter

dataset_id = "fc487d2d-8b81-4cb7-9243-1e0bb6694df1"  # Replace with your actual dataset ID

rag_tool = LangChainToolAdapter(Lab45AIPlatformRAGTool(dataset_id=dataset_id,
                                                       model_name="gpt-4o",
                                                       top_k=100,
                                                       message_history=[{'content': pdf_text, 'name': 'user', 'role': 'user'}]))
```

#### Create Agent using Platform's Model Client and RAG Tool:

This agent will use the RAG tool and the Lab45 AI platform client to answer queries.

```python
agent = AssistantAgent(
    name="assistant",
    model_client=client,
    model_client_stream=False,
    tools=[rag_tool],
    system_message="You are an agent which has been given a set of documents which will be your context and knowledge base. Use this to answer all queries asked.",
)
```

#### Querying the context present in the Dataset and PDF file:

This sends a query to the agent, which will use the RAG tool to retrieve relevant information from the data present in the dataset and the message history.

```python
query_with_context = f"Question: Based on the context provided, tell me about 'The Impact of Football on Society' and 'The FIFA World Cup'"

response = await agent.on_messages(
    [TextMessage(content=query_with_context, source="user")],
    cancellation_token=CancellationToken(),
)
```

### Specialized Data and Tool Agents Using Lab45 AI Platform Extended Autogen Library:

#### An Academic Research Paper Search and Literature Review Usecase:

This document provides a comprehensive guide on utilizing the `lab45_autogen_extension` library to create a research team of multiple agents capable of searching for academic papers, indexing them, and generating a literature review. It serves as a reference for effectively using Autogen agents with custom tools and platform RAG to achieve the desired outcomes. The guide includes code snippets and detailed explanations to facilitate understanding and implementation.

#### Prerequisites

Note: To run this code, make sure Python 3.11 is set up in your environment. Follow the [Python 3.11 Environment Setup Guide](https://docs.lab45.ai/reference.html#installation).

Before you begin, ensure you have the following dependencies installed; use `pip` to install each with the exact versions mentioned below:

- `arxiv`
- `asyncio`
- `autogen-agentchat==0.4.7`
- `autogen-core==0.4.7`
- `autogen-ext==0.4.7`

You also need to set the following environment variables, which are necessary for connecting to the platform endpoints for all LLM, RAG and tool related calls:

- `LAB45AIPLATFORM_URL` - Platform's API URL (https://api.lab45.ai)
- `LAB45AIPLATFORM_API_KEY` - Platform API key or Bearer token

#### Arxiv search tool for academic papers

Define a function to search for academic papers on Arxiv. This function will then further be used as a custom tool to be attached to an agent.

```python
def arxiv_search(query: str, max_results: int = 1) -> str:
    """
    An academic paper search tool to search the Arxiv library for papers and return the results, including abstracts.
""" client = arxiv.Client() search = arxiv.Search(query=query, max_results=max_results, sort_by=arxiv.SortCriterion.Relevance) results = [ { 'title': paper.title, 'authors': [author.name for author in paper.authors], 'published': paper.published.strftime("%Y-%m-%d"), 'abstract': paper.summary, 'pdf_url': paper.pdf_url, } for paper in client.results(search) ] return json.dumps(results) ``` #### Research team class Create a `ResearchTeam` class that initializes the necessary tools and agents for this use case. The class uses `Lab45AIPlatformCompletionClient`, a Lab45 AI platform extension for `model_client` in Autogen 0.4. This client routes all calls requiring LLMs, tool/function calls, and tool executions to the platform backend. The research team consists of three agents: - `Arxiv_Search_Agent`: Uses the Arxiv tool to search for papers related to a given topic. - `Data_Agent`: Utilizes the downloaded and indexed research papers as its knowledge base to answer queries. - `Report_Agent`: Generates a literature review report based on a search topic. The agents use the following tools: - `Arxiv_Search_Tool`: Searches Arxiv for papers related to a given topic, including abstracts. - `RAG_Tool`: Interacts with a pre-indexed document set using a Retrieval-Augmented Generation (RAG) pipeline. - `Bing_Search_Tool`: Searches the web using the Bing Search API. ```python class ResearchTeam: """ Research team: A Team of agents to search for academic papers on research topics and generate a literature review based on it. """ def __init__(self): # initialize platform's model client client = Lab45AIPlatformCompletionClient(model_name='gpt-4o') # wrap custom function as a tool arxiv_search_tool = FunctionTool(arxiv_search, description="Search Arxiv for papers related to a given topic, including abstracts") # initialize platform's in-built RAG tool rag_tool = LangChainToolAdapter(Lab45AIPlatformRAGTool(dataset_id="37728abd-5862-4884-a019-a1a00a3331ff", model_name="gpt-4o", top_k=30)) # initialize platform's in-built web search tool bing_search_tool = LangChainToolAdapter(Lab45AIPlatformBingSearchTool()) # create agents using platform's model client self.arxiv_search_agent = AssistantAgent( name="Arxiv_Search_Agent", tools=[arxiv_search_tool], model_client=client, description="An agent that can search Arxiv for papers related to a given topic, including abstracts", system_message="You are a helpful AI assistant. Solve tasks using your tools. Specifically, you can take into consideration the user's request and craft a search query that is most likely to return relevant academic papers.", ) self.data_agent = AssistantAgent( name="Data_Agent", model_client=client, model_client_stream=False, tools=[rag_tool, bing_search_tool], system_message="You are an agent which has been given a set of documents which will be your context and knowledge base. Use this to answer all queries asked.", ) self.report_agent = AssistantAgent( name="Report_Agent", model_client=client, description="Generate a report based on a given topic", system_message="You are a helpful reporting assistant. Your task is to synthesize data extracted into a high-quality literature review including CORRECT references. You MUST write a final report that is formatted as a literature review with CORRECT references. 
        )
```

#### Basic team demonstration: Using RAG and custom tools within a single agent

This example demonstrates a basic search team that uses already indexed documents or performs web searches to generate a literature report based on the provided query. The `data_agent` is equipped with both RAG and Bing search tools, allowing it to leverage either tool depending on the query. The `report_agent` then takes in the response from `data_agent` as reference and generates a report.

For example:

- Query: "What is Llama adapter?" -> uses `rag_tool` in `data_agent`
- Query: "What is the latest update on Llama adapter?" -> uses `bing_search_tool` in `data_agent`

The team utilizes `RoundRobinGroupChat`, an Autogen feature that executes a team of agents in a round-robin format. In this case, two agents are passed to it.

```python
async def search_and_report_with_platform_client(self):
    termination = TextMentionTermination("TERMINATE")
    team = RoundRobinGroupChat(
        participants=[self.data_agent, self.report_agent],
        termination_condition=termination
    )
    response = await team.run(task="What is Llama adapter?")
    print(response)
```

#### Academic paper search and literature review report generation using multi-agent teams

An advanced research team that searches for academic papers on a given topic, downloads the papers, indexes them, and generates a literature review based on the indexed papers. This function takes a search topic as input:

- First, the `arxiv_search_agent` fetches relevant academic papers on the given search topic.
- Then, the papers are downloaded and sent for indexing using platform APIs via `_index_files()`. This step requires the user to pass in a `dataset_id`, which can be generated using the create_dataset endpoint of the platform.
- Once the data has been indexed, the `data_agent` can utilize the newly indexed knowledge base to answer queries.
- A `RoundRobinGroupChat` based team is created to initiate a process where the `data_agent` extracts relevant information from the indexed documents, and the `report_agent` uses that context to generate a literature review for the given topic.
```python
async def arxiv_search_index_and_report_with_platform_client(self, search_topic: str):
    # Use arxiv_search_agent to fetch relevant academic papers on the given search_topic
    response = await self.arxiv_search_agent.on_messages(
        [TextMessage(content=f"Find relevant academic papers on the topic: {search_topic}", source="user")],
        cancellation_token=CancellationToken(),
    )
    print(response.chat_message)
    results = json.loads(response.chat_message.content)
    if not results:
        raise ValueError("No results found")

    # Create a local folder 'academic_files' if it doesn't exist
    local_folder = 'academic_files'
    os.makedirs(local_folder, exist_ok=True)

    # Download files into the local folder and prepare the list for indexing
    files = []
    for content in results:
        file_name = f"{content['title'][:10]}.pdf"
        file_path = os.path.join(local_folder, file_name)
        with open(file_path, 'wb') as file:
            file.write(requests.get(content['pdf_url']).content)
        files.append((file_name, (file_name, open(file_path, 'rb'), 'application/pdf')))

    # Index the papers
    index_status = self._index_files(files=files, dataset_id="e4142591-9297-4b79-9c71-4e5aedabefd0")
    if index_status:
        termination = TextMentionTermination("TERMINATE")
        team = RoundRobinGroupChat(
            participants=[self.data_agent, self.report_agent],
            termination_condition=termination
        )
        response = await team.run(task=f"Write a literature review on {search_topic} for AI systems using the indexed papers")
        print(response)
```

#### Helper methods

Add the following helper methods to manage headers, check index status, and index files. These methods utilize Lab45 AI platform's dataset and doc_completion APIs to ingest and index the downloaded academic papers into a vector database.

```python
@staticmethod
def _get_headers():
    return {
        'Content-Type': 'application/json',
        'Authorization': f"Bearer {os.getenv('LAB45AIPLATFORM_API_KEY')}"
    }

def check_index_status(self, dataset_id: str, workflow_id: str) -> dict:
    response = requests.get(
        urljoin(os.getenv("LAB45AIPLATFORM_URL"), f"datasets/{dataset_id}/workflow/{workflow_id}"),
        headers=self._get_headers()
    )
    if response.status_code != 200:
        raise Exception(f"Failed to get the status of the dataset. Status code: {response.status_code}, Response: {response.text}")
    return response.json()

def _index_files(self, files: list = [], dataset_id: str = None):
    ingest_url = urljoin(os.getenv("LAB45AIPLATFORM_URL"), f"datasets/{dataset_id}/ingest")
    headers = self._get_headers()
    headers.pop('Content-Type')
    ingest_response = requests.request("POST", ingest_url, headers=headers, data={}, files=files)
    if ingest_response.status_code != 200:
        raise Exception(f"Failed to upload files to the dataset. Status code: {ingest_response.status_code}, Response: {ingest_response.text}")

    payload = json.dumps({"dataset_id": dataset_id})
    prepare_response = requests.post(
        urljoin(os.getenv("LAB45AIPLATFORM_URL"), "skills/doc_completion/prepare"),
        headers=self._get_headers(),
        data=payload
    )
    if prepare_response.status_code != 200:
        raise Exception(f"Failed to prepare the dataset. Status code: {prepare_response.status_code}, Response: {prepare_response.text}")

    response_data = prepare_response.json()
    status = response_data["status"]
    workflow_id = response_data["_id"]
    retry = 0
    while status in ['Started', 'Waiting', 'Indexing']:
        if retry > 5:
            raise Exception(f"Failed to index the dataset. Status: {status}")
Status: {status}") time.sleep(10) retry += 1 status_response = self.check_index_status(dataset_id, workflow_id) status = status_response["status"] return True ``` #### Running the script Finally, add the main function to run the script asynchronously: ```python if __name__ == "__main__": asyncio.run(ResearchTeam().arxiv_search_index_and_report_with_platform_client(search_topic="Llama adapter")) ```