Automation and Cost Estimation of Query Executions on Document-Based Databases

Akpinar, Mert (2025) Automation and Cost Estimation of Query Executions on Document-Based Databases. [Master's thesis (Laurea magistrale)], Università di Bologna, degree programme in Ingegneria e scienze informatiche [LM-DM270] - Cesena. Full-text document not available.
The full text is not available by the author's choice.

Abstract

Cloud-native NoSQL services promise elastic scale and global distribution, but their pay-per-use pricing makes it difficult to anticipate how data modeling and query design translate into monetary cost. This thesis addresses that challenge for document databases by automating the estimation of query execution cost, with a focus on Azure Cosmos DB. I study a realistic e-commerce dataset under two alternative document schemas (alpha and beta) to expose the trade-offs between redundancy, traversal depth, and price. Starting from metadata (entity cardinalities and relationship multiplicities), I develop an estimator in Python that predicts the number of document accesses for multi-hop queries, and I implement executable plans against MongoDB and Cosmos DB (SQL API) to measure Request Units (RUs) and observed charges. The experimental workload consists of 17 read-query templates and their plans, defined in the project repository. Each plan is run repeatedly to obtain stable RU and latency statistics. The estimated number of document accesses is converted to dollars to obtain an estimated price, which is then compared with the actual price obtained by converting the measured RUs to dollars. At the workload level, the estimated and measured totals for one million executions differ only slightly, while individual plans show variance driven by document size, index hits, and join fan-out. The model preserves the relative ranking of plans and thus supports cost-aware design and plan selection. I discuss where the estimator should be refined, most notably by making RU conversion sensitive to payload size and index probes, and outline how these extensions would close the per-plan gaps observed in practice. Overall, the results indicate that metadata-driven estimation, coupled with light empirical calibration, can deliver actionable cost predictions before deployment, improving budget predictability and guiding schema and query optimization.
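
The estimator itself is not reproduced in this record, so the following is only a minimal sketch of how a metadata-driven estimate of this kind could be computed. It assumes that each hop of a multi-hop query is described by the target collection's cardinality and an average fan-out, that repeated references to the same document are discounted with Cardenas' formula (suggested by the "cardenas" keyword, not confirmed by the abstract), and that cost is linear in Request Units. The names `Hop`, `estimate_document_accesses`, `ru_per_doc`, and `usd_per_million_ru`, and all numeric values in the usage example, are hypothetical placeholders rather than figures from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    """One traversal step of a query plan (hypothetical metadata fields)."""
    target_cardinality: int  # documents in the collection being reached
    fan_out: float           # average referenced documents per source document


def cardenas(m: int, k: float) -> float:
    """Cardenas' formula: expected distinct items touched by k draws over m items."""
    if m <= 0:
        return 0.0
    return m * (1.0 - (1.0 - 1.0 / m) ** k)


def estimate_document_accesses(start_docs: float, hops: list[Hop]) -> float:
    """Sum the expected documents read at the start and at every hop."""
    total = start_docs
    current = start_docs
    for hop in hops:
        references = current * hop.fan_out            # raw references followed
        current = cardenas(hop.target_cardinality, references)
        total += current
    return total


def rus_to_dollars(rus: float, usd_per_million_ru: float) -> float:
    """Convert Request Units to dollars at an assumed flat rate."""
    return rus * usd_per_million_ru / 1_000_000


def documents_to_dollars(docs: float, ru_per_doc: float,
                         usd_per_million_ru: float) -> float:
    """Convert estimated document accesses to dollars via an assumed RU per document."""
    return rus_to_dollars(docs * ru_per_doc, usd_per_million_ru)


if __name__ == "__main__":
    # Assumed example: one customer -> orders -> products (placeholder numbers).
    plan = [Hop(target_cardinality=100_000, fan_out=10.0),
            Hop(target_cardinality=5_000, fan_out=3.0)]
    docs = estimate_document_accesses(start_docs=1, hops=plan)
    executions = 1_000_000
    price = documents_to_dollars(docs * executions, ru_per_doc=1.0,
                                 usd_per_million_ru=0.25)
    print(f"estimated documents per execution: {docs:.2f}")
    print(f"estimated price for {executions} executions: ${price:.2f}")
```

A fixed `ru_per_doc` ignores payload size and index probes, which is exactly the simplification the abstract identifies as the source of per-plan gaps; a refined version would make that factor depend on document size and index usage.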

Document type
Degree thesis (Laurea magistrale)
Thesis author
Akpinar, Mert
Thesis supervisor
School
Degree programme
Curriculum
CURRICULUM INTELLIGENT EMBEDDED SYSTEMS
Degree programme regulations
DM270
Keywords
nosql,database,query,resource,unit,cost,schema,estimation,document-based,cosmosdb,mongodb,filtering,join,serverless,python,yaml,cardinality,one-to-many,many-to-one,cardenas,cost-model
Thesis defence date
2 October 2025
URI
