This project is optional. Submitting a complete and good project in time brings up to four award points!
This project comes in 2 versions: The Simple Version, with the deadline 01.04.2023, brings 1 award point. The Complete Version has the deadline extended to 18.04.2023 and brings 4 award points. Students that submit the complete version do not have to submit also a simple version, but they may do so in case that their development work is incremental and they want to make sure that they get at least the intermediate points in case that they do not finish the complete version. Submitting 2 different programs (the simple and the complete version) will however not add their points (you can obtain a maximum of 4 points).
The project is individual and personal work. Plagiarism from any source is considered a serious offence. If a project presented by a student is detected as not being the result of his own work, this will lead to failing to pass the ADA exam: projects are optional, you do not have to do them - but if you do, then submit only your own work!
The project has a hard deadline. You must submit your projects using the corresponding assignments on the Virtual Campus. A submission must contain the source code and a readme file explaining your work.
Students will also do a public presentation of their projects in class with discussions and questions.
If you have questions related to this project, pleas address them by VirtualCampus messages or by email to ioana.sora@cs.upt.ro
One of the most commonly encountered problems in Computer Science today is search. Search for countless types of data; search that has to provide results as fast as possible. One of the aspects that make search interesting but challenging at the same time is the large variety of data types computers know how to handle.
One of data types that is used on a daily basis is geospatial data i.e., numerical representation of physical objects in a geographic coordinate system. Each one of us has encountered representations of this type of data whenever we used a GPS or accessed services like Google Maps to find information about a location. Given the vast amounts of geospatial data and the critical systems that it is being used by, various data structures and algorithms have been created to handle storage and search for this type of data.
The goal of this project is to implement a program that can store (in-memory) geospatial data and can answer queries efficiently. Given the scope of this project, only the two operations, namely insert and search are required.
The program will provide a command line interface (CLI) which accepts the following commands:
READ <geospatial-data-file>
-- reads the file located at the
geospatial-data-file
path -- e.g., read /home/user/locations.txt
.
Each line in the file will have the following format: <name>, <tag>, <lat>, <lng>
FIND <tag> WITHIN <range> OF <lat>, <lng>
-- returns all the
locations having the tag tag
that are in a range of range
from the
point given by the lat, lng
coordinates -- e.g., FIND accomodation WITHIN 500m OF 45.747826, 21.226216
INSERT <name> TAGGED WITH <tag> LOCATED AT <lat>, <lng>
-- inserts the record
identified by name, tag, lat
and lng
-- e.g.,
INSERT "Muzeul Banatului" TAGGED WITH museum LOCATED AT 45.757143, 21.232651
SHOW TAGS
-- lists all the tags have have at least one POI associated to them
The program must be implemented by using an in-memory R-tree. No other data structures are accepted for this assignment.
An R-tree is a height-balanced tree, derived from B-tree, used to index multi-dimensional information such as geospatial or audio/video data.
To represent data in an R-tree, the nearby objects are grouped in a Miminum Bounding Box (MBB) or Minimum Bounding Rectangle (MBR), which is the smallest rectagle the contains all the grouped objects.
The structure of the data contained in the two types of possible nodes is:
(MBB, n-dim Object)
, where MBB
is the MBB which contains the n-dimensional object
(MBB, child-ptr)
, where MBB
is the MBB which contains
all the regions stored in the child node pointed to by child-ptr
R-tree example (source):
Implement a program with the same command line interface as described above for the simple version, but use persistent R-trees instead of the in-memory R-trees. Similar as with persistent B-trees, the R-trees will be stored as files on disk, containing a number of records that correspond to the tree nodes.
Your program in the complete version must be able to use real data from OpenStreetMap. Data from OpenStreetMap can be freely downloaded in various formats from http://download.geofabrik.de.