Project: Points Of Interest (POI) search engine

This project is optional. Submitting a complete and good project in time brings up to four award points!

This project comes in 2 versions: The Simple Version, with the deadline 01.04.2023, brings 1 award point. The Complete Version has the deadline extended to 18.04.2023 and brings 4 award points. Students that submit the complete version do not have to submit also a simple version, but they may do so in case that their development work is incremental and they want to make sure that they get at least the intermediate points in case that they do not finish the complete version. Submitting 2 different programs (the simple and the complete version) will however not add their points (you can obtain a maximum of 4 points).

The project is individual and personal work. Plagiarism from any source is considered a serious offence. If a project presented by a student is detected as not being the result of his own work, this will lead to failing to pass the ADA exam: projects are optional, you do not have to do them - but if you do, then submit only your own work!

The project has a hard deadline. You must submit your projects using the corresponding assignments on the Virtual Campus. A submission must contain the source code and a readme file explaining your work.

Students will also do a public presentation of their projects in class with discussions and questions.

If you have questions related to this project, pleas address them by VirtualCampus messages or by email to ioana.sora@cs.upt.ro

Application Domain Overview

One of the most commonly encountered problems in Computer Science today is search. Search for countless types of data; search that has to provide results as fast as possible. One of the aspects that make search interesting but challenging at the same time is the large variety of data types computers know how to handle.

One of data types that is used on a daily basis is geospatial data i.e., numerical representation of physical objects in a geographic coordinate system. Each one of us has encountered representations of this type of data whenever we used a GPS or accessed services like Google Maps to find information about a location. Given the vast amounts of geospatial data and the critical systems that it is being used by, various data structures and algorithms have been created to handle storage and search for this type of data.

Project Requirements - Simple Version

The goal of this project is to implement a program that can store (in-memory) geospatial data and can answer queries efficiently. Given the scope of this project, only the two operations, namely insert and search are required.

The program will provide a command line interface (CLI) which accepts the following commands:

Geospatial data file provided for testing: locations.txt
The file contains POIs from the following cities: Timisoara, Cluj-Napoca, Bucharest, Iasi, Satu Mare.

Implementation Requirements - Simple Version

The program must be implemented by using an in-memory R-tree. No other data structures are accepted for this assignment.

An R-tree is a height-balanced tree, derived from B-tree, used to index multi-dimensional information such as geospatial or audio/video data.

To represent data in an R-tree, the nearby objects are grouped in a Miminum Bounding Box (MBB) or Minimum Bounding Rectangle (MBR), which is the smallest rectagle the contains all the grouped objects.

The structure of the data contained in the two types of possible nodes is:

R-tree example (source):

For details on how to implement the required operations (insert, search) you can use these great resources:

Project Requirements - Complete Version

In real life applications, geospatial data are huge and cannot be represented by in-memory datastructures.

Implement a program with the same command line interface as described above for the simple version, but use persistent R-trees instead of the in-memory R-trees. Similar as with persistent B-trees, the R-trees will be stored as files on disk, containing a number of records that correspond to the tree nodes.

Your program in the complete version must be able to use real data from OpenStreetMap. Data from OpenStreetMap can be freely downloaded in various formats from http://download.geofabrik.de.