by Grayson Adkins, updated March 1, 2024
This notebook analyzes the effect that different embedding models and variables, such as chunk size, have on RAG performance.
You want to learn:
If you'd like to reproduce this notebook, you'll need:
Retrieval Augmented Generation (RAG) is a technique for improving language model performance in question answering by providing additional context in the question prompt. The technique involves comparing a given question against a related body of text, retrieving the most similar passages, and asking a language model to answer the question with the help of this additional information. Unlike a simple keyword search, however, we create embeddings of the question and the dataset, then perform a semantic search to retrieve the snippets that best match the question's meaning, regardless of whether any exact keyword matches exist.
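The retrieval step described above can be sketched with a few lines of plain Python. This is only a toy illustration: the three-dimensional vectors are made up for the example (real embeddings have hundreds or thousands of dimensions), and `cosine` and `top_k` are hypothetical helper names, not functions used elsewhere in this notebook.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(question_vec, chunk_vecs, k):
    """Return the indices of the k chunks most similar to the question."""
    scored = [(cosine(question_vec, vec), i) for i, vec in enumerate(chunk_vecs)]
    scored.sort(reverse=True)  # highest similarity first
    return [i for _, i in scored[:k]]

# Made-up "embeddings" for three chunks and one question (assumption for illustration)
chunk_vecs = [
    [0.9, 0.1, 0.0],  # chunk 0: points almost the same way as the question
    [0.0, 1.0, 0.2],  # chunk 1: mostly unrelated direction
    [0.7, 0.3, 0.1],  # chunk 2: fairly close to the question
]
question_vec = [1.0, 0.2, 0.0]

best = top_k(question_vec, chunk_vecs, k=2)
print(best)
```

The indices returned by `top_k` are what a RAG pipeline would use to look up the chunk texts that get pasted into the prompt.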
RAG is often a good option when you don't have enough data to fine-tune, but you still want to teach the model about a new topic.
… text-embedding-ada-002, create embeddings for the chunks and a given question. With the additional context, hopefully the language model can better answer the question.
There are several variables in the above steps that can significantly affect how well the language model performs in answering questions.
The point of RAG is to augment a language model that might be unfamiliar with a given topic, for example, something obscure, proprietary, or that has occurred since the model was trained. It's this last scenario that we'll focus on in this notebook.
In these experiments, I use Llama 2 as the language model, which was trained starting in February 2023. To ensure that the model does not already know the answers to the questions I'll be asking, I ask questions about COVID-19 statistics reported between December 2023 and January 2024.
PDF of the dataset: World Health Organization COVID-19 Epidemiological Update - Edition 163 (pdf)
The chunked dataset is available on my Hugging Face in CSV format.
In this notebook, I do not attempt to perform an exhaustive testing of each permutation of variables listed below. Instead, I select a few illustrative configurations and leave it up to the reader to experiment when reproducing my results.
Chunk Size (tokens) | Overlap | Embedding Model | Similarity method | Top-k | Language Model | Quantization |
---|---|---|---|---|---|---|
200 | 10% | Llama 2 | Cosine | 1 | Llama 2 7B | None (full-precision) |
500 | 25% | OpenAI | Dot Product | 3 | Llama 2 13B | GPTQ |
1000 | 50% | MS MARCO | - | 7 | Llama 2 70B | bitsandbytes (NF4) |
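To make the chunk size and overlap columns concrete, here is a minimal sketch of overlapping chunking. It assumes the text has already been tokenized into a flat list and that overlap is expressed as a fraction of the chunk size; `chunk_tokens` is a hypothetical helper, not the code that produced the published dataset.

```python
def chunk_tokens(tokens, chunk_size, overlap_fraction):
    """Split a token list into chunks of chunk_size that overlap by a fraction."""
    overlap = int(chunk_size * overlap_fraction)
    step = chunk_size - overlap  # how far the window advances each time
    if step <= 0:
        raise ValueError("overlap_fraction must be less than 1")
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks

# Toy "document" of 25 integer tokens, chunked at size 10 with 20% overlap
tokens = list(range(25))
chunks = chunk_tokens(tokens, chunk_size=10, overlap_fraction=0.2)
for chunk in chunks:
    print(chunk[0], "...", chunk[-1], f"({len(chunk)} tokens)")
```

Larger overlap reduces the chance that a fact is split across a chunk boundary, at the cost of more chunks to embed and more repeated text in the retrieved context.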
… gpt-3.5-turbo helped the chunking process and, thus, Llama in answering questions.
text-embedding-ada-002 just barely beat out MS MARCO. Llama 2's embedding model performed noticeably worse. If open source is important to you, MS MARCO is a good alternative to OpenAI.
… bitsandbytes, although not by much.
# Authenticate to Hugging Face to pull and push models
!pip install huggingface_hub
from huggingface_hub import notebook_login
notebook_login()
Mounting Google Drive is optional, but it saves time by caching the model and allows training data to be saved on Drive.
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
import os
cache_dir = "/content/drive/My Drive/huggingface_cache"
os.makedirs(cache_dir, exist_ok=True) # Ensure the directory exists
# https://stackoverflow.com/questions/56081324/why-are-google-colab-shell-commands-not-working
import locale
def getpreferredencoding(do_setlocale = True):
    return "UTF-8"
locale.getpreferredencoding = getpreferredencoding
This quantization method is faster than bitsandbytes, although probably a little less accurate.
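For a rough sense of why quantization matters here, the weight memory can be estimated as parameters times bits per weight divided by eight. The sketch below is back-of-the-envelope arithmetic only: it assumes unquantized weights are stored at 16 bits, and it ignores quantization metadata (scales and zero points), activations, and the KV cache, so real usage will be higher. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(n_params_billion, bits_per_weight):
    """Approximate memory for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

for size in (7, 13, 70):
    fp16 = weight_memory_gb(size, 16)  # assumed 16-bit unquantized weights
    int4 = weight_memory_gb(size, 4)   # 4-bit quantized weights (GPTQ or NF4)
    print(f"Llama 2 {size}B: ~{fp16:.1f} GB at 16-bit, ~{int4:.1f} GB at 4-bit")
```

By this estimate, the 13B model drops from roughly 26 GB to roughly 6.5 GB of weights at 4 bits, which is what makes it practical on a single Colab GPU.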
!pip install -q -U transformers peft accelerate optimum
!pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/
!pip install -q datasets
from transformers import AutoTokenizer, AutoModelForCausalLM
# model_id = "TheBloke/Llama-2-7B-chat-GPTQ"
model_id = "TheBloke/Llama-2-13B-chat-GPTQ"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    cache_dir=cache_dir
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
bitsandbytes NF4
This is the same quantization method described in my QLoRA notebook.
# !pip install -q -U bitsandbytes
# !pip install -q -U git+https://github.com/huggingface/transformers.git
# !pip install -q -U git+https://github.com/huggingface/peft.git
# !pip install -q -U git+https://github.com/huggingface/accelerate.git
# !pip install -q datasets
# import torch
# from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
# # model_id = "meta-llama/Llama-2-7b-chat-hf"
# model_id = "meta-llama/Llama-2-13b-chat-hf"
# bnb_config = BitsAndBytesConfig(
# load_in_4bit=True,
# bnb_4bit_use_double_quant=True,
# bnb_4bit_quant_type="nf4",
# bnb_4bit_compute_dtype=torch.bfloat16
# )
# tokenizer = AutoTokenizer.from_pretrained(model_id)
# model = AutoModelForCausalLM.from_pretrained(
# model_id,
# quantization_config=bnb_config,
# device_map={"":0},
# cache_dir=cache_dir)
Note: I've noticed that cleaning up the dataset using an LLM can improve embedding retrieval performance slightly. In this example, I used a cleaned COVID-19 statistics dataset.
from datasets import load_dataset
data = load_dataset("gadkins/who-covid-19-epidemiological-update-edition-163")
# Print first row of 'train' and 'test'
print("First row of train:", data['train'][0])
print("First row of test:", data['test'][0])
data = data.map(lambda samples: tokenizer(samples["Text"]), batched=True)
First row of train: {'Text': '1 \n COVID -19 Epidemiological Update \nEdition 163 published 19 January 2024 \nIn this edition: \n• Key highlights \n• Global overview \n• Hospitalizations and ICU admissions \n• SARS -CoV-2 variants of interest and variants under monitoring \n• WHO regional overviews \n \n \nKey highlights \n \n• Globally, during the 28 -day period from 11 December 2023 to 7 January 2024 , 106 countries report ed COVID -\n19 cases and 51 countries report ed COVID -19 deaths. Note that this does not reflect the actual number of \ncountries where cases or deaths are occurring , as many countries have stopped or changed frequency of \nreporting . \n• From the available data , the number of reported cases has increased while deaths have decreased during \nthe 28 -day period , with over 1.1 million new cases and 8700 new deaths , an increase of 4% and a decrease \nof 26%, respectively, compared to the previous 28 days ( 13 November to 10 December 2023 ). Trends in the \nnumber of reported new cases and deaths should be interpreted with caution due to decreased testing and \nsequencing, alongside reporting delays in many countries. According to estimates obtained from wastewater \nsurveillance, clinical detection of cases underestimates the real burden from 2 to 19 -fold. \n• SARS -CoV-2 PCR percent positivity , as detected in integrated sentinel surveillance as part of the Global \nInfluenza Surveillance and Response System ( GISRS ) and reported to FluNet was around 8 % as of 7 January \n2024. \n• During the 28 -day period from 11 December 2023 to 7 January , 53 and 42 countries provided data at least \nonce on COVID -19'} First row of test: {'Text': '1. The Americas reported a decrease in new COVID-19 cases, with the highest numbers reported in Canada, Chile, and Peru.\n2. The number of deaths in the Americas also decreased, with the highest numbers reported in the United States, Canada, and Chile.\n3. 
The Eastern Mediterranean Region reported a decrease in new cases, with Kuwait showing the highest proportional increase.\n4. Afghanistan, Iran, and Morocco had the highest numbers of new cases in the Eastern Mediterranean Region.\n5. The number of deaths in the Eastern Mediterranean Region decreased, with Iran, Afghanistan, and Morocco reporting the highest numbers.\n6. The European Region reported a decrease in new cases, with Malta, Moldova, and Ireland showing the highest proportional increases.\n7. Russia, Italy, and Greece had the highest numbers of new cases in the European Region.\n8. The number of deaths in the European Region decreased, with Italy, Russia, and Poland reporting the highest numbers.\n9. The South-East Asia Region reported a significant increase in new cases, with India, Myanmar, and Indonesia showing the highest proportional increases.\n10. India, Indonesia, and Thailand had the highest numbers of new cases in the South-East Asia Region.\n\n'}
Here, we're calculating the embeddings for each row of training data, by averaging across the token embeddings for each token in each row.
We're also using a test question to pick out and print the three most relevant snippets.
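As a toy illustration of the averaging step described above, the sketch below mean-pools a handful of made-up token vectors into one row embedding using plain Python lists. The two-dimensional vectors and the `mean_pool` helper are assumptions for illustration; the notebook itself does the same thing with `torch.mean` over the model's input embeddings.

```python
def mean_pool(token_embeddings):
    """Average a list of token vectors, dimension by dimension, into one vector."""
    n_tokens = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[d] for tok in token_embeddings) / n_tokens for d in range(dim)]

# Three made-up 2-dimensional token embeddings for a single row of text
token_embeddings = [
    [1.0, 0.0],
    [0.0, 2.0],
    [2.0, 4.0],
]
row_embedding = mean_pool(token_embeddings)
print(row_embedding)  # [1.0, 2.0]
```

Note that mean pooling discards token order, which is one plausible reason a raw language-model embedding layer retrieves less precisely than a model trained specifically for retrieval.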
import torch
from torch.nn.functional import cosine_similarity
import numpy as np
n_samples=3
# Function to get average embeddings
def get_avg_embedding(input_ids):
    with torch.no_grad():
        embeddings = model.get_input_embeddings()(input_ids)
        avg_embedding = torch.mean(embeddings, dim=1)
    return avg_embedding
# Calculate average embeddings for each row in 'train' dataset
train_embeddings = []
for example in data['train']:
    input_ids = torch.tensor(example['input_ids']).unsqueeze(0).to("cuda") # Moved to CUDA
    avg_embedding = get_avg_embedding(input_ids)
    train_embeddings.append(avg_embedding)
train_embeddings = torch.stack(train_embeddings).squeeze()
# Tokenize and get embedding for the question
question = "For the 28-day period from 11 December 2023 to 7 January 2024, how " + \
"many new cases of COVID-19 were reported globally?"
question_input_ids = tokenizer(question, return_tensors="pt", truncation=True, \
max_length=500)["input_ids"].to("cuda") # Moved to CUDA
question_embedding = get_avg_embedding(question_input_ids).squeeze()
# Calculate cosine similarity between question and each row in 'train'
# Convert to float32 and move to CPU
cosine_similarities = cosine_similarity(question_embedding.unsqueeze(0), \
train_embeddings, dim=1).cpu().float()
# Sort and find top n_samples most similar rows
top_indices = torch.topk(cosine_similarities, n_samples).indices.tolist()
for idx in top_indices:
    print("Similarity Score:", cosine_similarities[idx].item())
    print("Text:", data['train'][idx]['Text'])
    print("---")
Similarity Score: 0.8447265625 Text: V-2 sequence data and metadata from GISAID, from 3 July to 31 December 2023. A B 22 Additional resources • Tracking SARS -CoV-2 Variants • WHO statement on updated tracking system on SARS -CoV-2 variants of concern and variants of interest • SARS -CoV-2 variant risk evaluation framework, 30 August 2023 • WHO JN.1 Initial Risk Evaluation, 1 3 December 2023 • WHO BA.2.86 Initial Risk Evaluation, 21 November 2023 • WHO EG.5 Updated Risk Evaluation, 21 November 2023 • WHO XBB.1.5 Updated Risk Assessment, 20 June 2023 • WHO XBB.1.16 Updated Risk Assessment, 5 June 2023 22 WHO regional overviews Data for 11 December 2023 to 7 January 2024 African Region The African Region reported over 3354 new cases, a 63% decrease as compared to the previous 28 -day period. Five (10%) of the 50 countries for which data are available reported increases in new cases of 20% or greater, with the highest proportional increases observed in Mauritania ( 30 vs six new cases; +400%), Senegal (17 vs six new cases ; +183%), Mali (two vs one new case; +100%), Burundi (78 vs 51 new cases; +53%), and Angola (417 vs 316 new cases; +32%). The highest numbers of new cases were reported from Mauritius (3228 new cases; 253.8 new cases per 100 000; -57% --- Similarity Score: 0.8330078125 Text: %), and the Western Pacific Region (five countries; 14%). No country in the Eastern Mediterranean Region shared1. The number of countries that consistently reported new hospitalizations for the period was 9% (22 countries) (Table 2). 
Among the 22 countries consistently reporting new hospitalizations, 8 (36%) countries registered an increase of 20% or greater in hospitalizations during the past 28 days compared to the previous 28 -day period: Indonesia (1337 vs 149; +797%), Malta (79 vs 21; +276%), Brunei Darussalam (588 vs 161; +265%), Malaysia (9312 vs 4137; +125%), Greece (6366 vs 3792; +68%), Singapore (2619 vs 1719; +52%), United Sates of America (12 8 073 vs 84 98 1; +51%), and Ireland (1353 vs 967; +40%). The highest numbers of new hospital admissions were reported from the United States of America (12 8 073 vs 84 981; +51%), Malaysia (9312 vs 4137; +125%), and Italy (8845 vs 13 857; -36%). 11 Table 2. Number of new hospitalization admissions reported by WHO regions, 11 December 2023 to 7 January 2024 compared to 16 October to 13 November to 10 December 2023 Region Countries reported at least once in the past 28 days Countries reported consistently in the past and previous 28 days* Number --- Similarity Score: 0.83251953125 Text: ), and Jamaica (32 vs 26 new cases; +23%). The highest numbers of new cases were reported from Canada (18 947 new cases; 50.2 new cases per 100 000; -18%), Chile (9693 new cases; 50.7 new cases per 100 000; -25%), and Peru (4657 new cases; 14.1 new cases per 100 000; -38%). The number of new 28 -day deaths in the Region decreased by 41% as compared to the previous 28 -day period, with 3869 new deaths reported. The highest numbers of new deaths were reported from the United States of America (3552 new deaths; 1.1 new deaths per 100 000; -40%), Canada (128 new deaths; <1 new death per 100 000; -60%), and Chile (114 new deaths; <1 new death per 100 000; -46%). Updates from the Region of the Americas 23 Eastern Mediterranean Region The Eastern Mediterranean Region reported over 2464 new cases, a 13% decrease as compared to the previous 28 -day period. 
One (5%) of the 22 countries for which data are available reported increases in new cases of 20% or greater, with the highest proportional increases observed in Kuwait (102 vs 39 new cases; +162%). The highest numbers of new cases were reported from Afghanistan (1256 new cas es; 3.2 new cases per 100 000; +14%), the Islamic Republic of Iran (663 new cases; <1 new case per 100 000; -48%), and Morocco (443 new cases; 1.2 new cases per ---
import os
from getpass import getpass
# User inputs the API key here; getpass keeps the key out of the notebook output
api_key = getpass("Please enter your OpenAI API key:")
os.environ["OPENAI_API_KEY"] = api_key
!pip install openai
import os
from openai import OpenAI
# Get the API key from environment variable
client = OpenAI(
    api_key=os.environ['OPENAI_API_KEY'],
)
!pip install tqdm
# Reload embeddings if previously saved to drive
import csv
import os
import numpy as np
import torch
def reload_embeddings(cache_dir, embeddings_filename):
    csv_path = os.path.join(cache_dir, embeddings_filename)
    # Initialize an empty list to hold the read data
    train_embeddings_list = []
    # Read from CSV (one embedding per row)
    with open(csv_path, 'r', newline='') as csvfile:
        csvreader = csv.reader(csvfile)
        for row in csvreader:
            train_embeddings_list.append(list(map(float, row)))
    # Convert list of lists to a numpy array and then to a tensor
    train_embeddings = torch.tensor(np.array(train_embeddings_list))
    # Convert tensor to float32
    train_embeddings = train_embeddings.to(torch.float32)
    print("Reloaded train_embeddings:", train_embeddings)
    return train_embeddings
def get_openai_embedding(text):
    embedding = client.embeddings.create(
        input = [text],
        model="text-embedding-ada-002").data[0].embedding
    return torch.tensor(embedding)
def embeddings_to_csv(train_embeddings, cache_dir, embeddings_filename):
    # Check if train_embeddings is already a tensor
    if not isinstance(train_embeddings, torch.Tensor):
        # Convert list of tensors to a single tensor
        train_embeddings = torch.stack(train_embeddings)
    # Save train_embeddings to a CSV file
    import csv
    # Convert tensor to numpy array and then to list
    train_embeddings_list = train_embeddings.cpu().numpy().tolist()
    csv_path = os.path.join(cache_dir, embeddings_filename)
    with open(csv_path, 'w', newline='') as csvfile:
        csvwriter = csv.writer(csvfile)
        csvwriter.writerows(train_embeddings_list)
    print(f"Saved train_embeddings to {csv_path}")
def create_embeddings():
    from tqdm import tqdm
    import time
    # Initialize train_embeddings as an empty list
    train_embeddings = []
    # Fetch embeddings for training data
    for i in tqdm(range(0, len(data['train']), 1)): # Assume each API call handles one item
        example = data['train'][i]
        text = example['Text']
        embedding = get_openai_embedding(text)
        train_embeddings.append(embedding)
        time.sleep(2) # Sleep for 2 seconds to avoid rate limiting
    return train_embeddings
# Check if embeddings file already exists in cache, else create new embeddings
embeddings_filename="train_embeddings.csv"
if os.path.exists(os.path.join(cache_dir, embeddings_filename)):
    train_embeddings = reload_embeddings(cache_dir, embeddings_filename)
    print("Loaded train_embeddings from cache")
else:
    train_embeddings = create_embeddings()
    embeddings_to_csv(train_embeddings, cache_dir, embeddings_filename)
    print(f"Created new train_embeddings. Saved embeddings to {os.path.join(cache_dir, embeddings_filename)}")
# Fetch embedding for the question
question = "For the 28-day period from 11 December 2023 to 7 January 2024, how " + \
"many new cases of COVID-19 were reported globally?"
question_embedding = get_openai_embedding(question)
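# Calculate cosine similarity between question and each row in 'train'
# Freshly created embeddings are a list of tensors; stack them into one (n_rows, dim) tensor
if not isinstance(train_embeddings, torch.Tensor):
    train_embeddings = torch.stack(train_embeddings)
cosine_similarities = cosine_similarity(question_embedding, train_embeddings, dim=-1)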
# Sort and find top n_samples most similar rows
top_indices = torch.topk(cosine_similarities, n_samples).indices.tolist()
# Output similar texts
for idx in top_indices:
    print("Similarity Score:", cosine_similarities[idx].item())
    print("Text:", data['train'][idx]['Text'])
    print("---")
Similarity Score: 0.8974971175193787 Text: cases remained stable during the 28-day period of 11 December 2023 to 7 January 2024 as compared to the previous 28-day period, with over 1.1 million new cases reported. The number of new weekly deaths decreased by 26% as compared to the previous 28-day period, with 8700 new fatalities reported. As of 7 January 2024, over 774 million confirmed cases and over 7 million deaths have been reported globally. According to estimates obtained from viral loads in wastewater surveillance, clinical detection of cases underestimated the real burden 2 to 19-fold. Reported cases do not accurately represent infection rates due to the reduction in testing and reporting globally. During this 28-day period, only 45% of countries reported at --- Similarity Score: 0.8882509469985962 Text: COVID-19 Epidemiological Update Edition 163 published 19 January 2024 In this edition: • Key highlights • Global overview • Hospitalizations and ICU admissions • SARS-CoV-2 variants of interest and variants under monitoring • WHO regional overviews Key highlights • Globally, during the 28-day period from 11 December 2023 to 7 January 2024, 106 countries reported COVID-19 cases and 51 countries reported COVID-19 deaths. Note that this does not reflect the actual number of countries where cases or deaths are occurring, as many countries have stopped or changed frequency of reporting. • From the available data, the number of reported cases has increased while deaths have decreased during the 28-day period, with over 1 --- Similarity Score: 0.876165509223938 Text: , especially data on morbidity and impact on health systems. COVID-19 remains a major threat, and WHO urges Member States to maintain, not dismantle, their established COVID-19 infrastructure. 
It is crucial to sustain, inter alia, early warning, surveillance and reporting, variant tracking, early clinical care provision, administration of vaccine to high-risk groups, improvements in ventilation, and regular communication. Hospitalizations and ICU admissions At the global level, during the past 28 days (11 December 2023 to 7 January 2024), a total of 173,547 new hospitalizations and 1,966 new intensive care unit (ICU) admissions were reported from 53 and 42 countries, respectively. This represents a 32% increase and a 3% decrease ---
Saved train_embeddings to /content/drive/My Drive/huggingface_cache/train_embeddings.csv
The MS MARCO models are embedding models from the SentenceTransformers framework that were trained on MS MARCO, a dataset of real user search queries from the Bing search engine. Each model is tuned for either cosine similarity or dot product. In this section, I use the msmarco-distilbert-base-tas-b model, which is tuned for dot product and gives better results than Llama embeddings.
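The choice between cosine similarity and dot product matters because the two can rank the same chunks differently: dot product rewards vector length as well as direction, while cosine looks at direction only. The sketch below shows this with made-up two-dimensional vectors; the chunk names and values are assumptions chosen purely to expose the difference.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity: the dot product of the length-normalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

question = [1.0, 1.0]
chunks = {
    "short_but_aligned": [1.0, 1.0],   # same direction as the question, small norm
    "long_but_off_axis": [5.0, 0.0],   # different direction, large norm
}

best_by_dot = max(chunks, key=lambda name: dot(question, chunks[name]))
best_by_cosine = max(chunks, key=lambda name: cosine(question, chunks[name]))
print("dot product picks:", best_by_dot)
print("cosine picks:", best_by_cosine)
```

This is why it is worth scoring with the similarity measure a model was tuned for, rather than assuming the two are interchangeable.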
!pip install sentence_transformers
from sentence_transformers import SentenceTransformer
import torch
n_samples = 3
# Initialize the model
marco_model = SentenceTransformer('msmarco-distilbert-base-tas-b').to("cuda") # Moved to CUDA
# Function to get embeddings
def get_marco_embedding(text):
    return torch.tensor(marco_model.encode(text)).to("cuda") # Moved to CUDA
# Calculate embeddings for each row in 'train' dataset
train_embeddings = []
for example in data['train']:
    train_embeddings.append(get_marco_embedding(example['Text']))
train_embeddings = torch.stack(train_embeddings)
# Get embedding for the question
question = "For the 28-day period from 11 December 2023 to 7 January 2024, how " + \
"many new cases of COVID-19 were reported globally?"
question_embedding = get_marco_embedding(question)
# Calculate dot product between question and each row in 'train'
dot_products = torch.matmul(question_embedding, train_embeddings.T)
# Sort and find top n_samples most similar rows
top_indices = torch.topk(dot_products, n_samples).indices.tolist()
for idx in top_indices:
    print("Similarity Score:", dot_products[idx].item())
    print("Text:", data['train'][idx]['Text'])
    print("---")
Similarity Score: 107.57888793945312 Text: 1 COVID -19 Epidemiological Update Edition 163 published 19 January 2024 In this edition: • Key highlights • Global overview • Hospitalizations and ICU admissions • SARS -CoV-2 variants of interest and variants under monitoring • WHO regional overviews Key highlights • Globally, during the 28 -day period from 11 December 2023 to 7 January 2024 , 106 countries report ed COVID - 19 cases and 51 countries report ed COVID -19 deaths. Note that this does not reflect the actual number of countries where cases or deaths are occurring , as many countries have stopped or changed frequency of reporting . • From the available data , the number of reported cases has increased while deaths have decreased during the 28 -day period , with over 1.1 million new cases and 8700 new deaths , an increase of 4% and a decrease of 26%, respectively, compared to the previous 28 days ( 13 November to 10 December 2023 ). Trends in the number of reported new cases and deaths should be interpreted with caution due to decreased testing and sequencing, alongside reporting delays in many countries. According to estimates obtained from wastewater surveillance, clinical detection of cases underestimates the real burden from 2 to 19 -fold. • SARS -CoV-2 PCR percent positivity , as detected in integrated sentinel surveillance as part of the Global Influenza Surveillance and Response System ( GISRS ) and reported to FluNet was around 8 % as of 7 January 2024. • During the 28 -day period from 11 December 2023 to 7 January , 53 and 42 countries provided data at least once on COVID -19 --- Similarity Score: 106.12256622314453 Text: 3 COVID -19 2023 -2025 Updated Strategic Preparedness and Response Plan . The new dashboard will progressively incorporate more components throughout 2024. The previous link of the Global WHO Coronavirus (COVID -19) Dashboard will still be active and redirect users to the new one from 22 December onward. 
Please note that start time of the redirection can differ around the world by up to 24 hours. 2 For t he latest data and other updates on COVID -19, please see: • WHO Monthly Operational Update and past editions of the Weekly Epidemiological Update on COVID -19 • WHO COVID -19 detailed surveillance data dashboard • WHO COVID -19 policy briefs • COVID -19 surveillance reporting requirements update for Member States • Summary Tables of COVID -19 vaccine effectiveness (VE) studies and results (last updated 11 January 2024) • Forest Plots displaying results of COVID -19 VE studies (last updated 15 January 2024) • Special focus WEU on interpreting relative VE (29 June 2022, pages 6 -8) • Neutralizat ion p lots (last updated 15 January 2024) • WHO COVID -19 VE Resources Global overview Data as of 7 January 2024 Globally, the number of new weekly cases remained stable during the 28 -day period of 11 December 2023 to 7 January 2024 as compared to the previous 28 -day period, with over 1.1 million new cases reported (Figure 1, Table 1). The number of new weekly deaths decreased by 26% as compared to the previous 28 -day period, with 8700 new fatalities --- Similarity Score: 104.67813110351562 Text: -19 cases over the last 28 days relative to the previous 28 days, as of 7 January 2024** **Se e Annex 1: Data, table, and figure notes 8 Figure 5. Number of COVID -19 deaths reported over the last 28 days per 100 000 population , as of 7 January 2024 ** **Se e Annex 1: Data, table, and figure notes 9 Figure 6. Percentage change in confirmed COVID -19 deaths over the last 28 days relative to the previous 28 days, as of 7 January 2024** **Se e Annex 1: Data, table, and figure notes 10 Hospitalizations and ICU admissions At the global level, during the past 28 days (11 December 2023 to 7 January 2024), a total of 173 547 new hospitalizations and 1 966 new intensive care unit (ICU) admissions were reported from 5 3 and 4 2 countries, respectively (Figure 7). 
This represents 32% increase and 3% decrease, respectively, compared to the previous 28 days (13 November to 10 December 2023). Note that the absence of reported data from some countries to WHO does not imply that there are no COVID -19-related hospitalizations in those countries. The prese nted hospitalization data are preliminary and might change as new data become available. Furthermore, hospitalization data are subject to reporting delays. These data also likely include both hospitalizations with incidental cases of SARS -CoV-2 infection and those due to COVID -19 disease. New hospitalizations During the past 28 days, 5 3 (23%) countries reported data to WHO on new hospitalizations at least once (Figure 7). ---
from transformers import TextStreamer
from torch.nn.functional import cosine_similarity
import torch
# Assume that train_embeddings has been computed and is available
# Define a stream
def stream(user_prompt):
    model.config.use_cache = True
    system_prompt = "You are an expert on COVID-19 statistics. You provide very " + \
        "succinct answers no longer than ten words."
    B_INST, E_INST = "[INST]", "[/INST]"
    B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
    prompt = f"{B_INST}{B_SYS}{system_prompt.strip()}{E_SYS}{user_prompt.strip()} {E_INST}\n\n"
    inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
    streamer = TextStreamer(tokenizer)
    _ = model.generate(**inputs, streamer=streamer, max_new_tokens=100)
def evaluation(questions, answers, use_embeddings=False, n_samples=3, embedding_type='marco'):
    for i, (question, answer) in enumerate(zip(questions, answers)):
        if use_embeddings:
            print(f'Using the {embedding_type} embedding model')
            # Tokenize and get embedding for the question
            if embedding_type == 'llama':
                question_input_ids = tokenizer(question, return_tensors="pt", truncation=True, max_length=512)["input_ids"].to("cuda")
                question_embedding = get_avg_embedding(question_input_ids).squeeze()
                question_embedding = question_embedding.unsqueeze(0) # Restore the batch dimension: shape (1, dim)
            elif embedding_type == 'openai':
                question_embedding = get_openai_embedding(question)
            elif embedding_type == 'marco':
                question_embedding = get_marco_embedding(question)
            else:
                # Stop here; otherwise question_embedding would be undefined below
                raise ValueError("Invalid embedding_type. It should be either 'llama', 'openai', or 'marco'.")
            # # Cosine Similarity
            # # Calculate cosine similarity between question and each row in 'train'
            # cosine_similarities = cosine_similarity(question_embedding, train_embeddings, dim=1).cpu().float()
            # # Sort and find top n_samples most similar rows
            # top_indices = torch.topk(cosine_similarities, n_samples).indices.tolist()
            # print(f'Top indices are {top_indices}')
            # Dot Product Similarity
            # Calculate dot product similarity between question and each row in 'train'
            dot_product_similarities = torch.matmul(question_embedding, train_embeddings.T).cpu().float()
            # Sort and find top n_samples most similar rows
            top_indices = torch.topk(dot_product_similarities, n_samples).indices.tolist()
            # print(f'Top indices are {top_indices}')
            similar_rows = []
            for idx in top_indices:
                text = data['train'][idx]['Text']
                if isinstance(text, list):
                    # If text is a list of strings, join the strings into a single string with a newline in between
                    text = "\n".join(text)
                # Append the string to similar_rows
                similar_rows.append(text)
            # print(similar_rows)
            similar_rows_text = "\n".join(similar_rows)
            user_prompt = "\n\nHere are some snippets from the World Health Organization " + \
                "Epidemiological Update - Edition 163 regarding COVID-19 statistics from " + \
                f"11 December 2023 to 7 January 2024:\n\n{similar_rows_text}\n\nAnswer the " + \
                f"following question succinctly, solely using the above snippets of text.\n\n{question}"
        else:
            user_prompt = question
        print(f"Question {i + 1}:") # Printing the question number
        stream(user_prompt)
        print("Correct Answer:", answer)
        print('\n\n')
questions = [
    "During the 28-day period from 11 December 2023 to 7 January, how many " + \
        "countries reported new COVID-19 cases?",
    "To date, how many confirmed cases of COVID-19 have been reported globally?",
    "During the 28-day period from 11 December 2023 to 7 January, by what percent " + \
        "did new hospitalization increase globally due to COVID-19?",
    "During the 28-day period from 11 December 2023 to 7 January, which countries " + \
        "in the Americas reported the highest number of new COVID-19 cases?",
    "During the 28-day period from 11 December 2023 to 7 January, which Eastern " + \
        "Mediterranean country reported the highest proportional increase in new COVID-19 cases?",
    "During the 28-day period from 11 December 2023 to 7 January, which World Health " + \
        "Organization Region reported the largest decrease in new COVID-19 cases?",
    "During the 28-day period from 11 December 2023 to 7 January, what was the percent " + \
        "increase in new deaths in the South-East Asia Region?",
    "During the 28-day period from 11 December 2023 to 7 January, which countries " + \
        "reported no new hospitalizations?",
    "When did the ICU-to-hospitalizations ratio peak during the COVID-19 pandemic?",
    "During the 28-day period from 11 December 2023 to 7 January, which SARS-CoV-2 " + \
        "variants of interest (VOIs) were the WHO tracking?"
]

answers = [
    "106 countries.",
    "774 million.",
    "40%.",
    "Canada, Chile, and Peru.",
    "Kuwait.",
    "The African Region.",
    "564%",
    "Mauritania, Mali, Turks and Caicos Islands, Saint Lucia, and Honduras.",
    "July 2021",
    "XBB.1.5, XBB.1.16, EG.5, BA.2.86 and JN.1."
]
print("Question embedding shape:", question_embedding.shape)
print("Train embeddings shape:", train_embeddings.shape)
Question embedding shape: torch.Size([768])
Train embeddings shape: torch.Size([29, 768])
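The evaluation function ranks chunks by raw dot product and leaves the cosine similarity variant commented out. Similarity scores above 1 in the retrieval output (for example 104.68) are consistent with this, since cosine similarity is bounded by 1. The two measures can rank chunks differently when the embeddings are not normalized. Below is a minimal sketch of that difference in plain Python, standing in for `torch.matmul` and `cosine_similarity`. The two-dimensional vectors and the helper names `dot`, `cosine`, and `top_k` are made up for illustration and are not part of the notebook.

```python
import math

def dot(a, b):
    # Plain dot product of two equal-length vectors
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # Dot product divided by the product of the vector lengths
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def top_k(scores, k):
    # Indices of the k highest scores, best first
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

# Toy data (hypothetical): one question vector and three chunk vectors
question = [1.0, 0.0]
chunks = [
    [0.9, 0.1],  # short vector, nearly the same direction as the question
    [3.0, 3.0],  # long vector, 45 degrees away from the question
    [0.0, 1.0],  # orthogonal to the question
]

dot_scores = [dot(question, c) for c in chunks]
cos_scores = [cosine(question, c) for c in chunks]

print("dot product ranking:", top_k(dot_scores, 2))
print("cosine ranking:     ", top_k(cos_scores, 2))
```

With the dot product, the long vector wins because of its magnitude. With cosine similarity, the vector that points in nearly the same direction wins. If the embedding model already returns unit-length vectors, the two rankings coincide.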
Because this COVID-19 information is more recent than the model's training data, Llama 2 13B answers all 10 questions incorrectly when it is given no retrieved context.
# Call evaluation without embeddings
evaluation(questions, answers, use_embeddings=False)
Question 1:
<s> [INST]<<SYS>> You are an expert on COVID-19 statistics. You provide very succinct answers no longer than ten words. <</SYS>> During the 28-day period from 11 December 2023 to 7 January, how many countries reported new COVID-19 cases? [/INST] 47 countries reported new COVID-19 cases.</s>
Correct Answer: 106 countries.

Question 2:
<s> [INST]<<SYS>> You are an expert on COVID-19 statistics. You provide very succinct answers no longer than ten words. <</SYS>> To date, how many confirmed cases of COVID-19 have been reported globally? [/INST] Over 25 million confirmed cases worldwide.</s>
Correct Answer: 774 million.

Question 3:
<s> [INST]<<SYS>> You are an expert on COVID-19 statistics. You provide very succinct answers no longer than ten words. <</SYS>> During the 28-day period from 11 December 2023 to 7 January, by what percent did new hospitalization increase globally due to COVID-19? [/INST] 12.3% increase.</s>
Correct Answer: 40%.

Question 4:
<s> [INST]<<SYS>> You are an expert on COVID-19 statistics. You provide very succinct answers no longer than ten words. <</SYS>> During the 28-day period from 11 December 2023 to 7 January, which countries in the Americas reported the highest number of new COVID-19 cases? [/INST] Top 3 countries: 1. United States (352,831 new cases) 2. Brazil (231,305 new cases) 3. Mexico (123,421 new cases)</s>
Correct Answer: Canada, Chile, and Peru.

Question 5:
<s> [INST]<<SYS>> You are an expert on COVID-19 statistics. You provide very succinct answers no longer than ten words. <</SYS>> During the 28-day period from 11 December 2023 to 7 January, which Eastern Mediterranean country reported the highest proportional increase in new COVID-19 cases? [/INST] Cyprus reported highest proportional increase.</s>
Correct Answer: Kuwait.

Question 6:
<s> [INST]<<SYS>> You are an expert on COVID-19 statistics. You provide very succinct answers no longer than ten words. <</SYS>> During the 28-day period from 11 December 2023 to 7 January, which World Health Organization Region reported the largest decrease in new COVID-19 cases? [/INST] Europe.</s>
Correct Answer: The African Region.

Question 7:
<s> [INST]<<SYS>> You are an expert on COVID-19 statistics. You provide very succinct answers no longer than ten words. <</SYS>> During the 28-day period from 11 December 2023 to 7 January, what was the percent increase in new deaths in the South-East Asia Region? [/INST] 125% increase in new deaths.</s>
Correct Answer: 564%

Question 8:
<s> [INST]<<SYS>> You are an expert on COVID-19 statistics. You provide very succinct answers no longer than ten words. <</SYS>> During the 28-day period from 11 December 2023 to 7 January, which countries reported no new hospitalizations? [/INST] None.</s>
Correct Answer: Mauritania, Mali, Turks and Caicos Islands, Saint Lucia, and Honduras.

Question 9:
<s> [INST]<<SYS>> You are an expert on COVID-19 statistics. You provide very succinct answers no longer than ten words. <</SYS>> When did the ICU-to-hospitalizations ratio peak during the COVID-19 pandemic? [/INST] Peak ICU-to-hospitalizations ratio: March 2020.</s>
Correct Answer: July 2021

Question 10:
<s> [INST]<<SYS>> You are an expert on COVID-19 statistics. You provide very succinct answers no longer than ten words. <</SYS>> During the 28-day period from 11 December 2023 to 7 January, which SARS-CoV-2 variants of interest (VOIs) were the WHO tracking? [/INST] WHO tracked Omicron, Delta, and Beta variants.</s>
Correct Answer: XBB.1.5, XBB.1.16, EG.5, BA.2.86 and JN.1.
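Each transcript shows the Llama 2 chat template that `stream()` assembles: a system block wrapped in `<<SYS>>` tags inside an `[INST]` instruction. To inspect the exact string without loading a model, the assembly can be pulled into pure functions. This is a minimal sketch that reuses the delimiters and wording from the notebook code. The function names `build_prompt` and `build_rag_user_prompt` are hypothetical, and the snippets in the usage lines are placeholders rather than real retrieved chunks.

```python
# Delimiters copied from the notebook's stream() function
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

def build_prompt(system_prompt, user_prompt):
    # Same f-string as in stream(): system block first, then the user turn
    return f"{B_INST}{B_SYS}{system_prompt.strip()}{E_SYS}{user_prompt.strip()} {E_INST}\n\n"

def build_rag_user_prompt(snippets, question):
    # Mirrors the retrieval branch of evaluation(): join the retrieved
    # snippets with newlines and put the question last
    similar_rows_text = "\n".join(snippets)
    return (
        "\n\nHere are some snippets from the World Health Organization "
        "Epidemiological Update - Edition 163 regarding COVID-19 statistics from "
        f"11 December 2023 to 7 January 2024:\n\n{similar_rows_text}\n\nAnswer the "
        f"following question succinctly, solely using the above snippets of text.\n\n{question}"
    )

system_prompt = ("You are an expert on COVID-19 statistics. You provide very "
                 "succinct answers no longer than ten words.")

# Placeholder snippets, for illustration only
user_prompt = build_rag_user_prompt(["snippet A", "snippet B"], "How many countries reported cases?")
prompt = build_prompt(system_prompt, user_prompt)
print(prompt)
```

Because `build_prompt` strips the user prompt, the leading blank lines in the retrieval prompt disappear from the final string, which matches the transcripts. The `<s>` token seen in the transcripts is not part of this string; it is added by the tokenizer.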
# Call evaluation with embeddings
# evaluation(questions, answers, use_embeddings=True, n_samples=3, embedding_type='llama')
# evaluation(questions, answers, use_embeddings=True, n_samples=3, embedding_type='openai')
evaluation(questions, answers, use_embeddings=True, n_samples=3, embedding_type='marco')
Using the marco embedding model Question 1: <s> [INST]<<SYS>> You are an expert on COVID-19 statistics. You provide very succinct answers no longer than ten words. <</SYS>> Here are some snippets from the World Health Organization Epidemiological Update - Edition 163 regarding COVID-19 statistics from 11 December 2023 to 7 January 2024: 1 COVID -19 Epidemiological Update Edition 163 published 19 January 2024 In this edition: • Key highlights • Global overview • Hospitalizations and ICU admissions • SARS -CoV-2 variants of interest and variants under monitoring • WHO regional overviews Key highlights • Globally, during the 28 -day period from 11 December 2023 to 7 January 2024 , 106 countries report ed COVID - 19 cases and 51 countries report ed COVID -19 deaths. Note that this does not reflect the actual number of countries where cases or deaths are occurring , as many countries have stopped or changed frequency of reporting . • From the available data , the number of reported cases has increased while deaths have decreased during the 28 -day period , with over 1.1 million new cases and 8700 new deaths , an increase of 4% and a decrease of 26%, respectively, compared to the previous 28 days ( 13 November to 10 December 2023 ). Trends in the number of reported new cases and deaths should be interpreted with caution due to decreased testing and sequencing, alongside reporting delays in many countries. According to estimates obtained from wastewater surveillance, clinical detection of cases underestimates the real burden from 2 to 19 -fold. • SARS -CoV-2 PCR percent positivity , as detected in integrated sentinel surveillance as part of the Global Influenza Surveillance and Response System ( GISRS ) and reported to FluNet was around 8 % as of 7 January 2024. 
• During the 28 -day period from 11 December 2023 to 7 January , 53 and 42 countries provided data at least once on COVID -19 -19 cases over the last 28 days relative to the previous 28 days, as of 7 January 2024** **Se e Annex 1: Data, table, and figure notes 8 Figure 5. Number of COVID -19 deaths reported over the last 28 days per 100 000 population , as of 7 January 2024 ** **Se e Annex 1: Data, table, and figure notes 9 Figure 6. Percentage change in confirmed COVID -19 deaths over the last 28 days relative to the previous 28 days, as of 7 January 2024** **Se e Annex 1: Data, table, and figure notes 10 Hospitalizations and ICU admissions At the global level, during the past 28 days (11 December 2023 to 7 January 2024), a total of 173 547 new hospitalizations and 1 966 new intensive care unit (ICU) admissions were reported from 5 3 and 4 2 countries, respectively (Figure 7). This represents 32% increase and 3% decrease, respectively, compared to the previous 28 days (13 November to 10 December 2023). Note that the absence of reported data from some countries to WHO does not imply that there are no COVID -19-related hospitalizations in those countries. The prese nted hospitalization data are preliminary and might change as new data become available. Furthermore, hospitalization data are subject to reporting delays. These data also likely include both hospitalizations with incidental cases of SARS -CoV-2 infection and those due to COVID -19 disease. New hospitalizations During the past 28 days, 5 3 (23%) countries reported data to WHO on new hospitalizations at least once (Figure 7). 3 COVID -19 2023 -2025 Updated Strategic Preparedness and Response Plan . The new dashboard will progressively incorporate more components throughout 2024. The previous link of the Global WHO Coronavirus (COVID -19) Dashboard will still be active and redirect users to the new one from 22 December onward. 
Please note that start time of the redirection can differ around the world by up to 24 hours. 2 For t he latest data and other updates on COVID -19, please see: • WHO Monthly Operational Update and past editions of the Weekly Epidemiological Update on COVID -19 • WHO COVID -19 detailed surveillance data dashboard • WHO COVID -19 policy briefs • COVID -19 surveillance reporting requirements update for Member States • Summary Tables of COVID -19 vaccine effectiveness (VE) studies and results (last updated 11 January 2024) • Forest Plots displaying results of COVID -19 VE studies (last updated 15 January 2024) • Special focus WEU on interpreting relative VE (29 June 2022, pages 6 -8) • Neutralizat ion p lots (last updated 15 January 2024) • WHO COVID -19 VE Resources Global overview Data as of 7 January 2024 Globally, the number of new weekly cases remained stable during the 28 -day period of 11 December 2023 to 7 January 2024 as compared to the previous 28 -day period, with over 1.1 million new cases reported (Figure 1, Table 1). The number of new weekly deaths decreased by 26% as compared to the previous 28 -day period, with 8700 new fatalities Answer the following question succinctly, solely using the above snippets of text. During the 28-day period from 11 December 2023 to 7 January, how many countries reported new COVID-19 cases? [/INST] 106 countries reported new COVID-19 cases during the 28-day period from 11 December 2023 to 7 January.</s> Correct Answer: 106 countries. Using the marco embedding model Question 2: <s> [INST]<<SYS>> You are an expert on COVID-19 statistics. You provide very succinct answers no longer than ten words. 
<</SYS>> Here are some snippets from the World Health Organization Epidemiological Update - Edition 163 regarding COVID-19 statistics from 11 December 2023 to 7 January 2024: 1 COVID -19 Epidemiological Update Edition 163 published 19 January 2024 In this edition: • Key highlights • Global overview • Hospitalizations and ICU admissions • SARS -CoV-2 variants of interest and variants under monitoring • WHO regional overviews Key highlights • Globally, during the 28 -day period from 11 December 2023 to 7 January 2024 , 106 countries report ed COVID - 19 cases and 51 countries report ed COVID -19 deaths. Note that this does not reflect the actual number of countries where cases or deaths are occurring , as many countries have stopped or changed frequency of reporting . • From the available data , the number of reported cases has increased while deaths have decreased during the 28 -day period , with over 1.1 million new cases and 8700 new deaths , an increase of 4% and a decrease of 26%, respectively, compared to the previous 28 days ( 13 November to 10 December 2023 ). Trends in the number of reported new cases and deaths should be interpreted with caution due to decreased testing and sequencing, alongside reporting delays in many countries. According to estimates obtained from wastewater surveillance, clinical detection of cases underestimates the real burden from 2 to 19 -fold. • SARS -CoV-2 PCR percent positivity , as detected in integrated sentinel surveillance as part of the Global Influenza Surveillance and Response System ( GISRS ) and reported to FluNet was around 8 % as of 7 January 2024. • During the 28 -day period from 11 December 2023 to 7 January , 53 and 42 countries provided data at least once on COVID -19 8700 new fatalities reported. As of 7 January 2024, over 774 million confirmed cases and over 7 million deaths have been reported globally. 
According to estimates obtained from viral loads in wastewater surveillance, clinical detection of cases underestimated the real burden 2 to 19 -fold1,2,3 Reported cases do not accurately represent infection rates due to the r eduction in testing and reporting globally . During this 28 -day period, only 45% (10 6 of 234) of countries reported at least one case to WHO. It is important to note that this statistic does not reflect the actual number of countries where cases exist. Additionally, data from the previous 28 -day period are continuously being updated to inc orporate retrospective changes made by countries regarding reported COVID -19 cases and deaths. Data presented in this report are therefore incomplete and should be interpreted considering these limitations. Some countries continue to r eport high burdens of COVID -19, including increases in newly reported cases and, more importantly, increases in hospitalizations and deaths – the latter of which are considered more reliable indicators given reductions in testing. Global and national data on SARS -CoV-2 PCR percent positivity are available on WHO’s integrated influenza and other respiratory viruses surveillance dashboard . Recent data (epidemiological week one, 1 to 7 January 2024) from sentinel sites show that the SARS -CoV-2 PCR percent positivity from reporting countries averages approximately 8% (Figure 2). As many countries discontinue COVID -19-specific reporting and integrate it into respiratory disease surveillance, WHO will use all available sources to continue monitoring the COVID -19 epidemiological situation, especially data on morbidity and impact on h ealth systems. COVID -19 remains a major threat, and WHO urges Member States to maintain, not dism at least once on COVID -19 hospitalizations and admissions to an intensive care unit (ICU) , respectively . From the available data, over 173 000 new hospitalizations and over 1900 new ICU admissions were reported during the 28 -day period . 
Amongst the countries reporting these data consistently over the current and past reporting period, there was an overall increase of 40% and 13% in new hospitalizations and new ICU admissions , respectively . • Globally, JN.1 is the most reported VOI (now reported by 71 countries), accounting for 65.5% of sequences in week 52 compared to 24.8% in week 48 (Figure 10, Table 6). Its parent lineage, BA.2.86, is stable and accounted for 7.8% of sequences in week 52 c ompared to 7.0% in week 48 (Figure 10, Table 6). The initial risk evaluation for JN.1 was published on 19 December 2023, with an overall evaluation of low public health risk at the global level based on available evidence. WHO is currently tracking several SARS -CoV-2 variants : five VOIs – XBB.1.5, XBB.1.16 , EG.5 BA.2.86 and JN.1 ; and five VUMs : DV.7, XBB, XBB.1.9.1, XBB .1.9.2 and XBB.2.3 • The Global WHO Coronavirus (COVID -19) Dashboard has been updated and adapted with a new interface on 22 December 2023 to support WHO and Member States’ work to transition from COVID -19 as an emergency to longer -term disease management, as outlined in WHO’s 3 May 2023 COVID -19 2023 Answer the following question succinctly, solely using the above snippets of text. To date, how many confirmed cases of COVID-19 have been reported globally? [/INST] Over 774 million confirmed cases of COVID-19 have been reported globally, as of 7 January 2024.</s> Correct Answer: 774 million. Using the marco embedding model Question 3: <s> [INST]<<SYS>> You are an expert on COVID-19 statistics. You provide very succinct answers no longer than ten words. <</SYS>> Here are some snippets from the World Health Organization Epidemiological Update - Edition 163 regarding COVID-19 statistics from 11 December 2023 to 7 January 2024: -19 cases over the last 28 days relative to the previous 28 days, as of 7 January 2024** **Se e Annex 1: Data, table, and figure notes 8 Figure 5. 
Number of COVID -19 deaths reported over the last 28 days per 100 000 population , as of 7 January 2024 ** **Se e Annex 1: Data, table, and figure notes 9 Figure 6. Percentage change in confirmed COVID -19 deaths over the last 28 days relative to the previous 28 days, as of 7 January 2024** **Se e Annex 1: Data, table, and figure notes 10 Hospitalizations and ICU admissions At the global level, during the past 28 days (11 December 2023 to 7 January 2024), a total of 173 547 new hospitalizations and 1 966 new intensive care unit (ICU) admissions were reported from 5 3 and 4 2 countries, respectively (Figure 7). This represents 32% increase and 3% decrease, respectively, compared to the previous 28 days (13 November to 10 December 2023). Note that the absence of reported data from some countries to WHO does not imply that there are no COVID -19-related hospitalizations in those countries. The prese nted hospitalization data are preliminary and might change as new data become available. Furthermore, hospitalization data are subject to reporting delays. These data also likely include both hospitalizations with incidental cases of SARS -CoV-2 infection and those due to COVID -19 disease. New hospitalizations During the past 28 days, 5 3 (23%) countries reported data to WHO on new hospitalizations at least once (Figure 7). 1 COVID -19 Epidemiological Update Edition 163 published 19 January 2024 In this edition: • Key highlights • Global overview • Hospitalizations and ICU admissions • SARS -CoV-2 variants of interest and variants under monitoring • WHO regional overviews Key highlights • Globally, during the 28 -day period from 11 December 2023 to 7 January 2024 , 106 countries report ed COVID - 19 cases and 51 countries report ed COVID -19 deaths. Note that this does not reflect the actual number of countries where cases or deaths are occurring , as many countries have stopped or changed frequency of reporting . 
• From the available data , the number of reported cases has increased while deaths have decreased during the 28 -day period , with over 1.1 million new cases and 8700 new deaths , an increase of 4% and a decrease of 26%, respectively, compared to the previous 28 days ( 13 November to 10 December 2023 ). Trends in the number of reported new cases and deaths should be interpreted with caution due to decreased testing and sequencing, alongside reporting delays in many countries. According to estimates obtained from wastewater surveillance, clinical detection of cases underestimates the real burden from 2 to 19 -fold. • SARS -CoV-2 PCR percent positivity , as detected in integrated sentinel surveillance as part of the Global Influenza Surveillance and Response System ( GISRS ) and reported to FluNet was around 8 % as of 7 January 2024. • During the 28 -day period from 11 December 2023 to 7 January , 53 and 42 countries provided data at least once on COVID -19 at least once on COVID -19 hospitalizations and admissions to an intensive care unit (ICU) , respectively . From the available data, over 173 000 new hospitalizations and over 1900 new ICU admissions were reported during the 28 -day period . Amongst the countries reporting these data consistently over the current and past reporting period, there was an overall increase of 40% and 13% in new hospitalizations and new ICU admissions , respectively . • Globally, JN.1 is the most reported VOI (now reported by 71 countries), accounting for 65.5% of sequences in week 52 compared to 24.8% in week 48 (Figure 10, Table 6). Its parent lineage, BA.2.86, is stable and accounted for 7.8% of sequences in week 52 c ompared to 7.0% in week 48 (Figure 10, Table 6). The initial risk evaluation for JN.1 was published on 19 December 2023, with an overall evaluation of low public health risk at the global level based on available evidence. 
WHO is currently tracking several SARS -CoV-2 variants : five VOIs – XBB.1.5, XBB.1.16 , EG.5 BA.2.86 and JN.1 ; and five VUMs : DV.7, XBB, XBB.1.9.1, XBB .1.9.2 and XBB.2.3 • The Global WHO Coronavirus (COVID -19) Dashboard has been updated and adapted with a new interface on 22 December 2023 to support WHO and Member States’ work to transition from COVID -19 as an emergency to longer -term disease management, as outlined in WHO’s 3 May 2023 COVID -19 2023 Answer the following question succinctly, solely using the above snippets of text. During the 28-day period from 11 December 2023 to 7 January, by what percent did new hospitalization increase globally due to COVID-19? [/INST] 40%</s> Correct Answer: 40%. Using the marco embedding model Question 4: <s> [INST]<<SYS>> You are an expert on COVID-19 statistics. You provide very succinct answers no longer than ten words. <</SYS>> Here are some snippets from the World Health Organization Epidemiological Update - Edition 163 regarding COVID-19 statistics from 11 December 2023 to 7 January 2024: 1 COVID -19 Epidemiological Update Edition 163 published 19 January 2024 In this edition: • Key highlights • Global overview • Hospitalizations and ICU admissions • SARS -CoV-2 variants of interest and variants under monitoring • WHO regional overviews Key highlights • Globally, during the 28 -day period from 11 December 2023 to 7 January 2024 , 106 countries report ed COVID - 19 cases and 51 countries report ed COVID -19 deaths. Note that this does not reflect the actual number of countries where cases or deaths are occurring , as many countries have stopped or changed frequency of reporting . • From the available data , the number of reported cases has increased while deaths have decreased during the 28 -day period , with over 1.1 million new cases and 8700 new deaths , an increase of 4% and a decrease of 26%, respectively, compared to the previous 28 days ( 13 November to 10 December 2023 ). 
Trends in the number of reported new cases and deaths should be interpreted with caution due to decreased testing and sequencing, alongside reporting delays in many countries. According to estimates obtained from wastewater surveillance, clinical detection of cases underestimates the real burden from 2 to 19 -fold. • SARS -CoV-2 PCR percent positivity , as detected in integrated sentinel surveillance as part of the Global Influenza Surveillance and Response System ( GISRS ) and reported to FluNet was around 8 % as of 7 January 2024. • During the 28 -day period from 11 December 2023 to 7 January , 53 and 42 countries provided data at least once on COVID -19 -19 cases over the last 28 days relative to the previous 28 days, as of 7 January 2024** **Se e Annex 1: Data, table, and figure notes 8 Figure 5. Number of COVID -19 deaths reported over the last 28 days per 100 000 population , as of 7 January 2024 ** **Se e Annex 1: Data, table, and figure notes 9 Figure 6. Percentage change in confirmed COVID -19 deaths over the last 28 days relative to the previous 28 days, as of 7 January 2024** **Se e Annex 1: Data, table, and figure notes 10 Hospitalizations and ICU admissions At the global level, during the past 28 days (11 December 2023 to 7 January 2024), a total of 173 547 new hospitalizations and 1 966 new intensive care unit (ICU) admissions were reported from 5 3 and 4 2 countries, respectively (Figure 7). This represents 32% increase and 3% decrease, respectively, compared to the previous 28 days (13 November to 10 December 2023). Note that the absence of reported data from some countries to WHO does not imply that there are no COVID -19-related hospitalizations in those countries. The prese nted hospitalization data are preliminary and might change as new data become available. Furthermore, hospitalization data are subject to reporting delays. 
These data also likely include both hospitalizations with incidental cases of SARS -CoV-2 infection and those due to COVID -19 disease. New hospitalizations During the past 28 days, 5 3 (23%) countries reported data to WHO on new hospitalizations at least once (Figure 7). 8700 new fatalities reported. As of 7 January 2024, over 774 million confirmed cases and over 7 million deaths have been reported globally. According to estimates obtained from viral loads in wastewater surveillance, clinical detection of cases underestimated the real burden 2 to 19 -fold1,2,3 Reported cases do not accurately represent infection rates due to the r eduction in testing and reporting globally . During this 28 -day period, only 45% (10 6 of 234) of countries reported at least one case to WHO. It is important to note that this statistic does not reflect the actual number of countries where cases exist. Additionally, data from the previous 28 -day period are continuously being updated to inc orporate retrospective changes made by countries regarding reported COVID -19 cases and deaths. Data presented in this report are therefore incomplete and should be interpreted considering these limitations. Some countries continue to r eport high burdens of COVID -19, including increases in newly reported cases and, more importantly, increases in hospitalizations and deaths – the latter of which are considered more reliable indicators given reductions in testing. Global and national data on SARS -CoV-2 PCR percent positivity are available on WHO’s integrated influenza and other respiratory viruses surveillance dashboard . Recent data (epidemiological week one, 1 to 7 January 2024) from sentinel sites show that the SARS -CoV-2 PCR percent positivity from reporting countries averages approximately 8% (Figure 2). 
As many countries discontinue COVID -19-specific reporting and integrate it into respiratory disease surveillance, WHO will use all available sources to continue monitoring the COVID -19 epidemiological situation, especially data on morbidity and impact on h ealth systems. COVID -19 remains a major threat, and WHO urges Member States to maintain, not dism Answer the following question succinctly, solely using the above snippets of text. During the 28-day period from 11 December 2023 to 7 January, which countries in the Americas reported the highest number of new COVID-19 cases? [/INST] According to the World Health Organization Epidemiological Update - Edition 163, as of 7 January 2024, the countries in the Americas that reported the highest number of new COVID-19 cases during the 28-day period from 11 December 2023 to 7 January were: 1. United States (106,177 new cases) 2. Brazil (28,37 Correct Answer: Canada, Chile, and Peru. Using the marco embedding model Question 5: <s> [INST]<<SYS>> You are an expert on COVID-19 statistics. You provide very succinct answers no longer than ten words. <</SYS>> Here are some snippets from the World Health Organization Epidemiological Update - Edition 163 regarding COVID-19 statistics from 11 December 2023 to 7 January 2024: 1 COVID -19 Epidemiological Update Edition 163 published 19 January 2024 In this edition: • Key highlights • Global overview • Hospitalizations and ICU admissions • SARS -CoV-2 variants of interest and variants under monitoring • WHO regional overviews Key highlights • Globally, during the 28 -day period from 11 December 2023 to 7 January 2024 , 106 countries report ed COVID - 19 cases and 51 countries report ed COVID -19 deaths. Note that this does not reflect the actual number of countries where cases or deaths are occurring , as many countries have stopped or changed frequency of reporting . 
• From the available data , the number of reported cases has increased while deaths have decreased during the 28 -day period , with over 1.1 million new cases and 8700 new deaths , an increase of 4% and a decrease of 26%, respectively, compared to the previous 28 days ( 13 November to 10 December 2023 ). Trends in the number of reported new cases and deaths should be interpreted with caution due to decreased testing and sequencing, alongside reporting delays in many countries. According to estimates obtained from wastewater surveillance, clinical detection of cases underestimates the real burden from 2 to 19 -fold. • SARS -CoV-2 PCR percent positivity , as detected in integrated sentinel surveillance as part of the Global Influenza Surveillance and Response System ( GISRS ) and reported to FluNet was around 8 % as of 7 January 2024. • During the 28 -day period from 11 December 2023 to 7 January , 53 and 42 countries provided data at least once on COVID -19 4/50 (8%) Eastern Mediter ranean 2 464 (0%) -13% 23 409 749 (3%) 37 (0%) -53% 351 870 (5%) 4/22 (18%) 3/22 (14%) Global 1 113 726 (100%) 4% 774 075 198 (100%) 8 700 (100%) -26% 7 012 984 (100%) 106/234 (45%) 51/234 (22%) *Percent change in the number of newly confirmed cases/deaths in the past 28 days, compared to 28 days prior. Data from previous weeks are updated continuously with adjustments received from countries. **See Annex 1: Data, table, and figure notes 5 Figure 2. SARS -CoV-2 specimens tested and test positivity rates reported to FluNet from sent inel sites ; 5 January 2020 to 7 January 2024 Source: Influenza and SARS -CoV-2 surveillance data from GISRS reported to FluNet ; WHO Global Influenza Programme 6 Figure 3. Number of confirmed COVID -19 cases reported over the last 28 days per 100 000 population , as of 7 January 2024** **See Annex 1: Data, table, and figure notes 7 Figure 4. 
Percentage change in confirmed COVID -19 cases over the last 2 -19 cases over the last 28 days relative to the previous 28 days, as of 7 January 2024** **Se e Annex 1: Data, table, and figure notes 8 Figure 5. Number of COVID -19 deaths reported over the last 28 days per 100 000 population , as of 7 January 2024 ** **Se e Annex 1: Data, table, and figure notes 9 Figure 6. Percentage change in confirmed COVID -19 deaths over the last 28 days relative to the previous 28 days, as of 7 January 2024** **Se e Annex 1: Data, table, and figure notes 10 Hospitalizations and ICU admissions At the global level, during the past 28 days (11 December 2023 to 7 January 2024), a total of 173 547 new hospitalizations and 1 966 new intensive care unit (ICU) admissions were reported from 5 3 and 4 2 countries, respectively (Figure 7). This represents 32% increase and 3% decrease, respectively, compared to the previous 28 days (13 November to 10 December 2023). Note that the absence of reported data from some countries to WHO does not imply that there are no COVID -19-related hospitalizations in those countries. The prese nted hospitalization data are preliminary and might change as new data become available. Furthermore, hospitalization data are subject to reporting delays. These data also likely include both hospitalizations with incidental cases of SARS -CoV-2 infection and those due to COVID -19 disease. New hospitalizations During the past 28 days, 5 3 (23%) countries reported data to WHO on new hospitalizations at least once (Figure 7). Answer the following question succinctly, solely using the above snippets of text. During the 28-day period from 11 December 2023 to 7 January, which Eastern Mediterranean country reported the highest proportional increase in new COVID-19 cases? [/INST] According to the data provided, the Eastern Mediterranean country with the highest proportional increase in new COVID-19 cases during the 28-day period from 11 December 2023 to 7 January was... 
(drumroll please)... Egypt, with a 33% increase!</s> Correct Answer: Kuwait. Using the marco embedding model Question 6: <s> [INST]<<SYS>> You are an expert on COVID-19 statistics. You provide very succinct answers no longer than ten words. <</SYS>> Here are some snippets from the World Health Organization Epidemiological Update - Edition 163 regarding COVID-19 statistics from 11 December 2023 to 7 January 2024: 1 COVID -19 Epidemiological Update Edition 163 published 19 January 2024 In this edition: • Key highlights • Global overview • Hospitalizations and ICU admissions • SARS -CoV-2 variants of interest and variants under monitoring • WHO regional overviews Key highlights • Globally, during the 28 -day period from 11 December 2023 to 7 January 2024 , 106 countries report ed COVID - 19 cases and 51 countries report ed COVID -19 deaths. Note that this does not reflect the actual number of countries where cases or deaths are occurring , as many countries have stopped or changed frequency of reporting . • From the available data , the number of reported cases has increased while deaths have decreased during the 28 -day period , with over 1.1 million new cases and 8700 new deaths , an increase of 4% and a decrease of 26%, respectively, compared to the previous 28 days ( 13 November to 10 December 2023 ). Trends in the number of reported new cases and deaths should be interpreted with caution due to decreased testing and sequencing, alongside reporting delays in many countries. According to estimates obtained from wastewater surveillance, clinical detection of cases underestimates the real burden from 2 to 19 -fold. • SARS -CoV-2 PCR percent positivity , as detected in integrated sentinel surveillance as part of the Global Influenza Surveillance and Response System ( GISRS ) and reported to FluNet was around 8 % as of 7 January 2024. 
• During the 28 -day period from 11 December 2023 to 7 January , 53 and 42 countries provided data at least once on COVID -19 -19 cases over the last 28 days relative to the previous 28 days, as of 7 January 2024** **Se e Annex 1: Data, table, and figure notes 8 Figure 5. Number of COVID -19 deaths reported over the last 28 days per 100 000 population , as of 7 January 2024 ** **Se e Annex 1: Data, table, and figure notes 9 Figure 6. Percentage change in confirmed COVID -19 deaths over the last 28 days relative to the previous 28 days, as of 7 January 2024** **Se e Annex 1: Data, table, and figure notes 10 Hospitalizations and ICU admissions At the global level, during the past 28 days (11 December 2023 to 7 January 2024), a total of 173 547 new hospitalizations and 1 966 new intensive care unit (ICU) admissions were reported from 5 3 and 4 2 countries, respectively (Figure 7). This represents 32% increase and 3% decrease, respectively, compared to the previous 28 days (13 November to 10 December 2023). Note that the absence of reported data from some countries to WHO does not imply that there are no COVID -19-related hospitalizations in those countries. The prese nted hospitalization data are preliminary and might change as new data become available. Furthermore, hospitalization data are subject to reporting delays. These data also likely include both hospitalizations with incidental cases of SARS -CoV-2 infection and those due to COVID -19 disease. New hospitalizations During the past 28 days, 5 3 (23%) countries reported data to WHO on new hospitalizations at least once (Figure 7). 3 COVID -19 2023 -2025 Updated Strategic Preparedness and Response Plan . The new dashboard will progressively incorporate more components throughout 2024. The previous link of the Global WHO Coronavirus (COVID -19) Dashboard will still be active and redirect users to the new one from 22 December onward. 
Please note that start time of the redirection can differ around the world by up to 24 hours. 2 For t he latest data and other updates on COVID -19, please see: • WHO Monthly Operational Update and past editions of the Weekly Epidemiological Update on COVID -19 • WHO COVID -19 detailed surveillance data dashboard • WHO COVID -19 policy briefs • COVID -19 surveillance reporting requirements update for Member States • Summary Tables of COVID -19 vaccine effectiveness (VE) studies and results (last updated 11 January 2024) • Forest Plots displaying results of COVID -19 VE studies (last updated 15 January 2024) • Special focus WEU on interpreting relative VE (29 June 2022, pages 6 -8) • Neutralizat ion p lots (last updated 15 January 2024) • WHO COVID -19 VE Resources Global overview Data as of 7 January 2024 Globally, the number of new weekly cases remained stable during the 28 -day period of 11 December 2023 to 7 January 2024 as compared to the previous 28 -day period, with over 1.1 million new cases reported (Figure 1, Table 1). The number of new weekly deaths decreased by 26% as compared to the previous 28 -day period, with 8700 new fatalities Answer the following question succinctly, solely using the above snippets of text. During the 28-day period from 11 December 2023 to 7 January, which World Health Organization Region reported the largest decrease in new COVID-19 cases? [/INST] According to the data provided in the World Health Organization Epidemiological Update - Edition 163, the Region of the Americas reported the largest decrease in new COVID-19 cases during the 28-day period from 11 December 2023 to 7 January, with a 26% decrease compared to the previous 28-day period.</s> Correct Answer: The African Region. Using the marco embedding model Question 7: <s> [INST]<<SYS>> You are an expert on COVID-19 statistics. You provide very succinct answers no longer than ten words. 
<</SYS>> Here are some snippets from the World Health Organization Epidemiological Update - Edition 163 regarding COVID-19 statistics from 11 December 2023 to 7 January 2024: 77%), and the South -East Asia Region (+379%). The number of newly reported 28 -day deaths decreased or remained stable across five regions: the African Region ( -113%), the Eastern Mediterra nean Region ( -53%), the Western Pacific Region ( -45%), the Region of the Americas ( -41%), and the European Region ( -3%); while death numbers increased in the South -East Asia Region (+564%). At the country level, the highest numbers of new 28 -day cases were reported from the Russian Federation (235 198 new cases; -30%), Singapore (174 643 new cases; +117%), Italy (163 599 new cases; -18%), Greece (68 590 new cases; +81%), and Malaysia (67 206 n ew cases; +164%). The highest numbers of new 28 -day deaths were reported from Italy (1016 new deaths; -21%), the Russian Federation (679 new deaths; +45%), Poland (543 new deaths; +229%), Sweden (446 new deaths; -47%), and Greece (322 new deaths; +79%). Table 1. Newly reported and cumulative COVID -19 confirmed cases and deaths, by WHO Region, as of 7 January 2024** WHO Region New cases in last 28 days (%) Change in new cases in last 28 days * Cumulative cases (%) New deaths in last 28 days (%) Change in new deaths in last 28 days * Cumulative deaths (%) Countries reporting cases in the last 28 days Countries reporting deaths in the last 28 vs 42 new cases; +60%). The highest numbers of new cases were reported from India (15 079 new cases; 1.1 new cases per 100 000; +843%), Indonesia (8610 new cases; 3.1 new cas es per 100 000; +399%), and Thailand (2327 new cases; 3.3 new cases per 100 000; +17%). The number of new 28 -day deaths in the Region increased by 564% as compared to the previous 28 -day period, with 186 new deaths reported. 
The highest numbers of new deaths were reported from India (86 new deaths; <1 new death per 100 000; +682%), Indonesia (72 new deaths; <1 new death per 100 000; +1340%), and Thailand (21 new deaths; <1 new death per 100 000; +91%). Updates from the South -East Asia Region 26 Western Pacific Region The Western Pacific Region reported over 338 000 new cases, a 77% increase as compared to the previous 28 -day period. Ten (29%) of the 35 countries for which data are available reported increases in new cases of 20% or greater, with the highest proportional increases observed in Niu e (98 vs six new cases; +1533%), Palau (16 vs two new cases; +700%), Tonga (62 vs 12 new cases ; +417%), Cook Islands (29 vs six new cases; +383%), Cambodia (76 vs 16 new cases; +375%), Brunei Darussalam (16 648 vs 4 207 new cases; +296%), Malaysia (67 ; +20%). The highest numbers of new cases were reported from the Russian Federation (235 198 new cases; 161.2 new cases per 100 000; -30%), Italy (163 599 new cases; 274.3 new cases per 100 000; -18%), and Greece (68 590 new cases; 639.9 new c ases per 100 000; +81%). The number of new 28 -day deaths in the Region decreased by 3% as compared to the previous 28 -day period, with 4194 new deaths reported. The highest numbers of new deaths were reported from Italy (1016 new deaths; 1.7 new deaths per 100 000; -21%), the Russ ian Federation (679 new deaths; <1 new death per 100 000; +45%), and Poland (543 new deaths; 1.4 new deaths per 100 000; +229%). Updates from the European Region 25 South -East Asia Region The South -East Asia Region reported over 26 000 new cases, a 379% increase as compared to the previous 28 -day period. 
Five (45%) of the 11 countries for which data are available reported increases in new cases of 20% or greater, with the highest proportion al increases observed in India (15 079 vs 1 599 new cases; +843%), Myanmar (110 vs 20 new cases; +450%), Indonesia (8 610 vs 1 727 new cases; +399%), Bangladesh (276 vs 141 new cases; +96%), and Sri Lanka (67 vs 42 new cases; + Answer the following question succinctly, solely using the above snippets of text. During the 28-day period from 11 December 2023 to 7 January, what was the percent increase in new deaths in the South-East Asia Region? [/INST] 564%.</s> Correct Answer: 564% Using the marco embedding model Question 8: <s> [INST]<<SYS>> You are an expert on COVID-19 statistics. You provide very succinct answers no longer than ten words. <</SYS>> Here are some snippets from the World Health Organization Epidemiological Update - Edition 163 regarding COVID-19 statistics from 11 December 2023 to 7 January 2024: 2. Number of new hospitalization admissions reported by WHO regions, 11 December 2023 to 7 January 2024 compared to 16 October to 13 November to 10 December 2023 Region Countries reported at least once in the past 28 days Countries reported consistently in the past and previous 28 days* Number of countries (percentage)** Number of new hospitalizations Number of countries (percentage)** Number of new hospitalizations Percent change in new hospitalizations Africa 12/50 (24%) 61 3/50 ( 6%) 6 -45% Americas 19/56 (3 4%) 134 789 5/56 ( 9%) 129 408 +48% Eastern Mediterranean 0/22 (<1%) N/A*** 0/22 (<1%) N/A N/A Europe 15/61 (25%) 23 039 9/61 (15%) 22 574 -9% South -East Asia 2/10 (20%) 1358 2/10 (20%) 1358 +690% Western Pacific 5/35 (14%) 14 300 5/35 (14%) 14 300 +88% Global 53/234 (2 3%) 173 547 24/234 ( 10%) 167 646 +40% *Percent change is calculated for countries reporting consistently both in the past 28 days and the previous 28 days (compari son period). 
**Number of countries reported / total number of countries in the region ( reported / total number of countries in the region (percentage of reporting). *** N/A represents not available or applicable . 12 Table 3. Countries that consistently reported new hospitalizations by WHO region, 11 December 2023 to 7 January 2024 compared to 13 November to 10 December 2023. WHO Region Country New Hospitalization in past 28 days New Hospitalization in previous 28 -day period % Change from previous 28 -day period Africa Mauritania 0 0 N/A Africa Mali 0 0 N/A Africa Angola 6 11 -45% Americas Turks and Caicos Islands 0 0 N/A Americas United States of America 128073 84981 51% Americas Canada 1335 2685 -50% Americas Saint Lucia 0 7 -100% Americas Honduras 0 5 -100% Europe Malta 79 21 276% Europe Greece 6366 3792 68% Europe Ireland 1353 967 40% Europe Netherlands 2474 2196 13% Europe Slovakia 925 1015 -9% Europe Czechia 2083 2317 -10% Europe Estonia 420 474 -11% Europe Italy 8845 13857 -36% Europe Portugal 29 57 -49% South -East Asia Indonesia 133 izations at least once (Figure 7). The Region of the Americas had the highest proportion of countries reporting data on new hospitalizations (1 9 countries; 3 4%), followed by the European Region (15 countries; 25%), the African Region (12 countries; 24%), the South -East Asia Region (two countries; 20%), and the Western Pacific Region (five countries; 14%). No country in the Eastern Mediterranean Region shared1. The number of countries that consistently reported new hospitalizations for the period was 9% (22 countries) (Table 2). 
Among the 22 countries consistently reporting new hospitalizations, 8 (36%) countries registered an increase of 20% or greater in hospitalizations during the past 28 days compared to the previous 28 -day period: Indonesia (1337 vs 149; +797%), Malta (79 vs 21; +276%), Brunei Darussalam (588 vs 161; +265%), Malaysia (9312 vs 4137; +125%), Greece (6366 vs 3792; +68%), Singapore (2619 vs 1719; +52%), United Sates of America (12 8 073 vs 84 98 1; +51%), and Ireland (1353 vs 967; +40%). The highest numbers of new hospital admissions were reported from the United States of America (12 8 073 vs 84 981; +51%), Malaysia (9312 vs 4137; +125%), and Italy (8845 vs 13 857; -36%). 11 Table 2. Number of new hospitalization admissions Answer the following question succinctly, solely using the above snippets of text. During the 28-day period from 11 December 2023 to 7 January, which countries reported no new hospitalizations? [/INST] Mauritania, Mali, and Honduras reported no new hospitalizations during the 28-day period from 11 December 2023 to 7 January.</s> Correct Answer: Mauritania, Mali, Turks and Caicos Islands, Saint Lucia, and Honduras. Using the marco embedding model Question 9: <s> [INST]<<SYS>> You are an expert on COVID-19 statistics. You provide very succinct answers no longer than ten words. <</SYS>> Here are some snippets from the World Health Organization Epidemiological Update - Edition 163 regarding COVID-19 statistics from 11 December 2023 to 7 January 2024: newer SARS -CoV-2 variants from these data. 17 Figure 8. COVID -19 ICU -to-hospitalization ratio and death -to-hospitalization ratio, from 27 April 2020 to 7 January 2023 (A), and 22 May to 7 January 2024 (B) A B Note: Recent weeks are subject to reporting delays and should not be interpreted as a declining trend. The ICU ratio figure is created from the data of the countries reported both new hospitalizations and new ICU admissions. 
The d eath ratio figure is created from the data of the countries that reported both new hospitalization and new deaths. Source : WHO COVID -19 Detailed Surveillance Dashboard 18 SARS -CoV-2 variants of interest and variants under monitoring Geographic spread and prevalence Globally, during the 28 -day period from 11 December 2023 to 7 January 2024, 33 659 SARS -CoV-2 sequences were shared through GISAID. In comparison, in the two previous 28 -day periods, there were 112 909 and 192 222 sequences shared, respectively. The data are periodically retrospectively updated to include sequences w ith earlier collection dates, so the number of submissions in a given time period may change. WHO is currently tracking several SARS -CoV-2 variants, including: • Five variants of interest (VOIs): XBB.1.5, XBB.1.16, EG.5, BA.2.86 and JN.1 • Five variants under monitoring (V at least once on COVID -19 hospitalizations and admissions to an intensive care unit (ICU) , respectively . From the available data, over 173 000 new hospitalizations and over 1900 new ICU admissions were reported during the 28 -day period . Amongst the countries reporting these data consistently over the current and past reporting period, there was an overall increase of 40% and 13% in new hospitalizations and new ICU admissions , respectively . • Globally, JN.1 is the most reported VOI (now reported by 71 countries), accounting for 65.5% of sequences in week 52 compared to 24.8% in week 48 (Figure 10, Table 6). Its parent lineage, BA.2.86, is stable and accounted for 7.8% of sequences in week 52 c ompared to 7.0% in week 48 (Figure 10, Table 6). The initial risk evaluation for JN.1 was published on 19 December 2023, with an overall evaluation of low public health risk at the global level based on available evidence. 
WHO is currently tracking several SARS -CoV-2 variants : five VOIs – XBB.1.5, XBB.1.16 , EG.5 BA.2.86 and JN.1 ; and five VUMs : DV.7, XBB, XBB.1.9.1, XBB .1.9.2 and XBB.2.3 • The Global WHO Coronavirus (COVID -19) Dashboard has been updated and adapted with a new interface on 22 December 2023 to support WHO and Member States’ work to transition from COVID -19 as an emergency to longer -term disease management, as outlined in WHO’s 3 May 2023 COVID -19 2023 -19 cases over the last 28 days relative to the previous 28 days, as of 7 January 2024** **Se e Annex 1: Data, table, and figure notes 8 Figure 5. Number of COVID -19 deaths reported over the last 28 days per 100 000 population , as of 7 January 2024 ** **Se e Annex 1: Data, table, and figure notes 9 Figure 6. Percentage change in confirmed COVID -19 deaths over the last 28 days relative to the previous 28 days, as of 7 January 2024** **Se e Annex 1: Data, table, and figure notes 10 Hospitalizations and ICU admissions At the global level, during the past 28 days (11 December 2023 to 7 January 2024), a total of 173 547 new hospitalizations and 1 966 new intensive care unit (ICU) admissions were reported from 5 3 and 4 2 countries, respectively (Figure 7). This represents 32% increase and 3% decrease, respectively, compared to the previous 28 days (13 November to 10 December 2023). Note that the absence of reported data from some countries to WHO does not imply that there are no COVID -19-related hospitalizations in those countries. The prese nted hospitalization data are preliminary and might change as new data become available. Furthermore, hospitalization data are subject to reporting delays. These data also likely include both hospitalizations with incidental cases of SARS -CoV-2 infection and those due to COVID -19 disease. New hospitalizations During the past 28 days, 5 3 (23%) countries reported data to WHO on new hospitalizations at least once (Figure 7). 
Answer the following question succinctly, solely using the above snippets of text. When did the ICU-to-hospitalizations ratio peak during the COVID-19 pandemic? [/INST] The ICU-to-hospitalizations ratio peaked during the COVID-19 pandemic in the period from 27 April 2020 to 7 January 2023, as shown in Figure 8.</s> Correct Answer: July 2021 Using the marco embedding model Question 10: <s> [INST]<<SYS>> You are an expert on COVID-19 statistics. You provide very succinct answers no longer than ten words. <</SYS>> Here are some snippets from the World Health Organization Epidemiological Update - Edition 163 regarding COVID-19 statistics from 11 December 2023 to 7 January 2024: newer SARS -CoV-2 variants from these data. 17 Figure 8. COVID -19 ICU -to-hospitalization ratio and death -to-hospitalization ratio, from 27 April 2020 to 7 January 2023 (A), and 22 May to 7 January 2024 (B) A B Note: Recent weeks are subject to reporting delays and should not be interpreted as a declining trend. The ICU ratio figure is created from the data of the countries reported both new hospitalizations and new ICU admissions. The d eath ratio figure is created from the data of the countries that reported both new hospitalization and new deaths. Source : WHO COVID -19 Detailed Surveillance Dashboard 18 SARS -CoV-2 variants of interest and variants under monitoring Geographic spread and prevalence Globally, during the 28 -day period from 11 December 2023 to 7 January 2024, 33 659 SARS -CoV-2 sequences were shared through GISAID. In comparison, in the two previous 28 -day periods, there were 112 909 and 192 222 sequences shared, respectively. The data are periodically retrospectively updated to include sequences w ith earlier collection dates, so the number of submissions in a given time period may change. WHO is currently tracking several SARS -CoV-2 variants, including: • Five variants of interest (VOIs): XBB.1.5, XBB.1.16, EG.5, BA.2.86 and JN.1 • Five variants under monitoring (V .2.3. 
Table 7. Weekly prevalence of SARS -CoV-2 VOIs and VUMs by WHO regions, week 48 to week 52 of 2023 * Includes descendant lineages, except those individually specified elsewhere in the table. For example, XBB* does not includ e XBB.1.5, XBB.1.16, EG.5, XBB.1.9.1, XBB.1.9.2, and XBB.2.3. ¥ due to the small numbers of sequences submitted in these regions, it has not been possible to determine trends for the VOIs and VUMs in these regions; this is also represented by the shaded cells in the table . 20 Figure 9. Global 28 -day prevalence of EG.5, XBB.1.5, XBB.1.16, BA.2.86, and JN.1, from 11 December 2023 to 7 January 2024 *+ * Reporting period to account for delay in sequence submission to GISAID. + Historical presence indicates countries previously reporting sequences of VOIs but have not reported within the period from 4 November to 3 December 2023 21 Figure 10. The (A) number and (B) percentage of SARS -CoV-2 sequences, from 3 July to 31 December 2023 Figure 10. Panel A shows the number, and Panel B the percentage, of all circulating variants since July to December 2023. The variants shown here include descendent lineages, except for the descendent lineage(s) except for the descendent lineage(s) listed here. The Unassigned category includes lineages pending for a PANGO lineage name designation, whereas the Other category includes lineages that are assigned but not listed here. Source: SARS -CoV-2 sequence data and metadata from GISAID, from 3 July to 31 December 2023. 
A B 22 Additional resources • Tracking SARS -CoV-2 Variants • WHO statement on updated tracking system on SARS -CoV-2 variants of concern and variants of interest • SARS -CoV-2 variant risk evaluation framework, 30 August 2023 • WHO JN.1 Initial Risk Evaluation, 1 3 December 2023 • WHO BA.2.86 Initial Risk Evaluation, 21 November 2023 • WHO EG.5 Updated Risk Evaluation, 21 November 2023 • WHO XBB.1.5 Updated Risk Assessment, 20 June 2023 • WHO XBB.1.16 Updated Risk Assessment, 5 June 2023 22 WHO regional overviews Data for 11 December 2023 to 7 January 2024 African Region The African Region reported over 3354 new cases, a 63% decrease as compared to the previous 28 -day period. Five (10%) of the 50 countries for which data are available reported increases in new cases of 20% or greater, with the highest proportional increases observed in Mauritania ( 30 vs six new cases; +400%), Senegal (17 vs six new cases ; +183%), Mali (two vs one new case; +100%), Burundi (78 vs 51 new cases; +53%), and Angola ( Answer the following question succinctly, solely using the above snippets of text. During the 28-day period from 11 December 2023 to 7 January, which SARS-CoV-2 variants of interest (VOIs) were the WHO tracking? [/INST] The WHO was tracking the following SARS-CoV-2 variants of interest (VOIs) during the 28-day period from 11 December 2023 to 7 January 2024: • XBB.1.5 • XBB.1.16 • EG.5 • BA.2.86 • JN.1</s> Correct Answer: XBB.1.5, XBB.1.16, EG.5, BA.2.86 and JN.1.
Note that a decimal score indicates that the correct answer was a list of several items and the model named only some of them correctly.
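As a rough illustration of how such a decimal score could be computed, here is a minimal sketch. The notebook does not state the exact scoring formula, so this assumes the score is the fraction of expected list items that appear in the model's answer. The function name `partial_credit` and the case-insensitive substring matching are assumptions made for illustration only.

```python
def partial_credit(expected_items, model_answer):
    """Return the fraction of expected items mentioned in the model's answer.

    Assumption: an item counts as answered if its text appears anywhere in
    the answer (case-insensitive). This is a simplification, not necessarily
    how the scores in this notebook were assigned.
    """
    if not expected_items:
        raise ValueError("expected_items must not be empty")
    answer = model_answer.lower()
    hits = sum(1 for item in expected_items if item.lower() in answer)
    return hits / len(expected_items)


# Question 8 from the log above: the correct answer lists five countries,
# and the model named three of them.
expected = [
    "Mauritania",
    "Mali",
    "Turks and Caicos Islands",
    "Saint Lucia",
    "Honduras",
]
answer = (
    "Mauritania, Mali, and Honduras reported no new hospitalizations "
    "during the 28-day period from 11 December 2023 to 7 January."
)
score = partial_credit(expected, answer)
print(score)
```

Under this assumption, Question 8 would receive three fifths of a point, while a fully correct single-item answer such as Question 7 would receive a full point.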
Llama 2, 7B - RAW NF4:
Llama 2, 7B NF4 - with embeddings, 200-token chunks w/ 50% overlap:
Llama 2, 13B - RAW NF4:
Llama 2, 13B NF4 - with Llama embeddings, dot product similarity, 500-token chunks w/ 50% overlap, 3 samples:
Llama 2, 13B NF4 - with OpenAI embeddings, cosine similarity, 500-token chunks w/ 50% overlap, 3 samples:
Llama 2, 13B NF4 - with OpenAI embeddings, dot product similarity, 500-token chunks w/ 50% overlap, 3 samples:
Llama 2, 13B NF4 - with marco embeddings, dot product similarity, 500-token chunks w/ 50% overlap, 3 samples:
Llama 2, 7B - RAW GPTQ:
Llama 2, 7B GPTQ - with Llama embeddings, 200-token chunks w/ 50% overlap, top-3 similar embeddings:
Llama 2, 7B GPTQ - with OpenAI embeddings, 200-token chunks w/ 50% overlap, top-3 similar embeddings:
Llama 2, 7B GPTQ - with marco embeddings, 200-token chunks w/ 50% overlap, top-3 similar embeddings:
Llama 2, 13B - RAW GPTQ:
Llama 2, 13B GPTQ - with Llama embeddings, 200-token chunks w/ 50% overlap, top-3 similar embeddings:
Llama 2, 13B GPTQ - with OpenAI embeddings, 200-token chunks w/ 50% overlap, top-3 similar embeddings:
Llama 2, 13B GPTQ - with marco embeddings, 200-token chunks w/ 50% overlap, top-3 similar embeddings:
Llama 2, 13B GPTQ - with marco embeddings, 500-token chunks w/ 50% overlap, top-3 similar embeddings:
Llama 2, 13B GPTQ - with marco embeddings, 500-token chunks w/ 10% overlap, top-3 similar embeddings:
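The configurations listed above vary three retrieval settings: chunk size, chunk overlap, and the similarity measure used to pick the top-3 chunks. The following is a minimal sketch of what those settings mean, using only the standard library and toy vectors rather than real embeddings. The helper names (`chunk_tokens`, `top_k`) are hypothetical, and the windowing logic is an assumption about how overlap is applied, not necessarily the exact chunker used in these experiments.

```python
import math


def chunk_tokens(tokens, chunk_size, overlap_fraction):
    """Split a token list into windows of chunk_size tokens.

    Each window starts chunk_size * (1 - overlap_fraction) tokens after the
    previous one, so consecutive windows share overlap_fraction of their tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap_fraction < 1:
        raise ValueError("overlap_fraction must be in [0, 1)")
    step = max(1, round(chunk_size * (1 - overlap_fraction)))
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks


def dot(a, b):
    """Dot product similarity: sensitive to vector length."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity: dot product of the length-normalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def top_k(query_vec, chunk_vecs, k=3, similarity=dot):
    """Return the indices of the k chunks most similar to the query."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: similarity(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# 200-token chunks with 50% overlap over a stand-in 1000-token document.
tokens = list(range(1000))
chunks = chunk_tokens(tokens, chunk_size=200, overlap_fraction=0.5)
print(len(chunks), chunks[1][0])

# Toy vectors (not real embeddings): the first chunk vector is long but
# points away from the query, so the two measures rank it differently.
query = [1.0, 0.0]
vectors = [[10.0, 10.0], [0.9, 0.1], [0.0, 1.0]]
print(top_k(query, vectors, k=2, similarity=dot))
print(top_k(query, vectors, k=2, similarity=cosine))
```

When an embedding model returns unit-length vectors, dot product and cosine similarity produce the same ranking, so a difference between the two settings is only expected for models whose embeddings are not normalized.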