Document 1: comp4349_comp5349_assignment2_2026.pdf

Status: ready

S3 bucket: comp4349-a2-yawu8371

S3 key: uploads/1780492903_ff8f4926cbe44d718e28aa73c3741ca3_comp4349_comp5349_assignment2_2026.pdf

Uploaded: 2026-06-03 13:21:43.815181+00:00

Processing Runs

Strategy                  Status     Chunks  Average length (chars)  Processing time  Error
Fixed-size chunking       completed  20      967.8                   0.353 sec        -
Paragraph-aware chunking  completed  15      1035.3                  0.403 sec        -

Sample Chunks

Fixed-size chunking

Chunk 0 - 1000 characters

# Page 1

School of Computer Science
Dr. Ying Zhou
COMP4349/COMP5349: Cloud Computing Sem. 1/2026
Assignment 2: AWS Project
Individual Work: 20% 08.05.2026
Tasks
In this assignment, you will deploy an event-driven Python web application on AWS. The
application allows users to upload PDF documents and compare two different text chunk-
ing strategies for retrieval. The retrieval uses a simple local TF-IDF embedding approach
to avoid long latency caused by calls to LLM services and to avoid potential API rate-limit
issues. You are required to submit a report describing your deployment and to attend a
demonstration to verify the deployment.
Application Description
The application supports the following high-level workflow:
1. A user accesses the web application through a public web endpoint.
2. The user uploads a PDF document through the web application.
3. The uploaded PDF is stored in dura...

Chunk 1 - 1000 characters

. The user uploads a PDF document through the web application.
3. The uploaded PDF is stored in durable object storage.
4. The application records metadata about the uploaded document in a database.
5. The system processes the uploaded PDF using two different chunking strategies:
•fixed-size chunking;
•paragraph-aware chunking.
6. The system generates chunks and processing statistics for each strategy .
7. The web application retrieves processing status and generated results from the database.
8. Once both processing strategies have completed, the web application displays a side-
by-side comparison of their results.
9. The user may enter a retrieval query to compare which chunks are retrieved by each
strategy .
1

# Page 2

Required Architectural Properties
Your deployed system must satisfy the following architectural properties:
•The web application must be accessed through an Applicati...
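
The fixed-size samples above are 1000 characters each, and Chunk 1 begins with text that also appears near the end of Chunk 0, which suggests a sliding window with some overlap. The following is a minimal sketch of that idea, not the application's actual code: the function name `fixed_size_chunks` and the `overlap` default of 100 are assumptions, since the page does not state the overlap used.

```python
def fixed_size_chunks(text: str, size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into windows of at most `size` characters.

    Consecutive windows share `overlap` characters. Both defaults are
    assumptions for illustration; only the 1000-character size is
    visible in the samples above.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk consists only of already-covered overlap.
        if start + size >= len(text):
            break
    return chunks
```

Because the last window is usually shorter than `size`, the average chunk length falls slightly below the window size, which is consistent with an average under 1000 in the table above.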

Paragraph-aware chunking

Chunk 0 - 8 characters

# Page 1

Chunk 1 - 1512 characters

School of Computer Science
Dr. Ying Zhou
COMP4349/COMP5349: Cloud Computing Sem. 1/2026
Assignment 2: AWS Project
Individual Work: 20% 08.05.2026
Tasks
In this assignment, you will deploy an event-driven Python web application on AWS. The
application allows users to upload PDF documents and compare two different text chunk-
ing strategies for retrieval. The retrieval uses a simple local TF-IDF embedding approach
to avoid long latency caused by calls to LLM services and to avoid potential API rate-limit
issues. You are required to submit a report describing your deployment and to attend a
demonstration to verify the deployment.
Application Description
The application supports the following high-level workflow:
1. A user accesses the web application through a public web endpoint.
2. The user uploads a PDF document through the web application.
3. The uploaded PDF is stored in durable object...
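
The paragraph-aware samples above show an 8-character chunk ("# Page 1") followed by a 1512-character chunk, so chunk boundaries appear to follow blank lines rather than a fixed width. One way to get that behaviour is sketched below, assuming paragraphs are separated by blank lines and packed together up to a soft limit. The name `paragraph_chunks` and the `max_chars` value are hypothetical; the page does not document the actual rule.

```python
import re


def paragraph_chunks(text: str, max_chars: int = 1000) -> list[str]:
    """Group blank-line-separated paragraphs into chunks.

    Paragraphs are never split, so a single paragraph longer than
    `max_chars` becomes its own oversized chunk. `max_chars` is an
    assumed soft limit, not a value taken from the application.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

    chunks = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would exceed the limit: close the
            # current chunk and start a new one with this paragraph.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Under these assumptions, a short heading followed by one long paragraph yields a tiny chunk and then an oversized one, matching the 8 and 1512 character lengths shown above.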

Query Comparison
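
The assignment text quoted in the samples describes retrieval as "a simple local TF-IDF embedding approach". As a rough illustration of how a query could be compared against each strategy's chunks, the sketch below ranks chunks by cosine similarity of TF-IDF vectors using only the standard library. The tokenizer, the smoothed IDF formula, and the names `tokenize` and `rank_chunks` are all assumptions; the application's real scoring may differ.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_chunks(query: str, chunks: list[str], top_k: int = 3) -> list[tuple[int, float]]:
    """Return (chunk_index, score) pairs, best match first."""
    docs = [Counter(tokenize(c)) for c in chunks]
    n = len(docs)
    # Document frequency: number of chunks containing each term.
    df = Counter(term for d in docs for term in d)
    # Smoothed IDF (an assumption; other variants are equally common).
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}

    def vectorize(counts: Counter) -> dict[str, float]:
        # Terms never seen in any chunk are dropped.
        return {t: tf * idf[t] for t, tf in counts.items() if t in idf}

    def cosine(a: dict[str, float], b: dict[str, float]) -> float:
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    q = vectorize(Counter(tokenize(query)))
    scored = [(cosine(q, vectorize(d)), i) for i, d in enumerate(docs)]
    # Highest score first; ties broken by original chunk order.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(i, score) for score, i in scored[:top_k]]
```

Running the same query through `rank_chunks` once per strategy's chunk list would give two ranked lists that can be shown side by side.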