TU Delft
2014/2015 Electrical Engineering, Mathematics and Computer Science Master Computer Science
Scalable Data Management for Data Science
Responsible Instructor
Name E-mail
Prof.dr.ir. A.P. de Vries    A.P.deVries@tudelft.nl
Contact Hours / Week x/x/x/x
Education Period
Start Education
Exam Period
Course Language
Course Contents
"Dataspaces" is a new abstraction for data management (first proposed in Dec 2005) that addresses the limitations of relational database systems for managing a mixture of both structured and unstructured data. Future dataspace support platforms will take advantage of structure in the data that makes up the dataspace, but should not restrict the scope of the dataspace to just that subset of structured data. This is especially challenging for dataspaces that include multimedia objects such as photos and videos.

The course investigates recent literature on three aspects of dataspaces and the data management issues involved: data models for dataspaces; adding structure interactively in a pay-as-you-go fashion; and data structures and query processing techniques that ensure scalability.

We start by reviewing the basic ideas underlying relational database technology: data abstraction and data independence, query processing, and query optimization. We then study query processing strategies in more detail, with an emphasis on the role of access structures at the physical layer of the database management system (DBMS).
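As a rough illustration of why access structures matter, the sketch below answers the same lookup twice: once by a full scan and once through a sorted index searched with binary search. The rows, the function names, and the use of a sorted list as a stand-in for a B-tree are all assumptions made for this example; they are not taken from the course material.

```python
import bisect

# Hypothetical (id, kind) rows; a stand-in for a stored relation.
rows = [(17, "photo"), (3, "video"), (42, "text"), (8, "photo")]

def scan_lookup(rows, key):
    """Full scan: inspect every row, O(n) in the number of rows."""
    return [row for row in rows if row[0] == key]

def build_index(rows):
    """Sorted (key, row position) pairs, standing in for a B-tree index."""
    return sorted((row[0], pos) for pos, row in enumerate(rows))

def index_lookup(rows, index, key):
    """Binary search on the index, then follow positions back to the rows."""
    i = bisect.bisect_left(index, (key, -1))  # first entry with this key
    matches = []
    while i < len(index) and index[i][0] == key:
        matches.append(rows[index[i][1]])
        i += 1
    return matches

if __name__ == "__main__":
    index = build_index(rows)
    print(scan_lookup(rows, 42))
    print(index_lookup(rows, index, 42))
```

Both functions return the same rows, which is the point of data independence: the logical query stays the same while the physical access path changes.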
The second part of the course explains the differences between data retrieval and information retrieval (IR), giving a crash course in IR and content-based multimedia search. Limitations of similarity search are discussed, especially in high-dimensional spaces.
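One commonly cited limitation of similarity search in high-dimensional spaces is that distances concentrate: the nearest and farthest neighbours of a query end up at almost the same distance. The sketch below measures this on uniformly random points. The function name, the sample size, and the choice of uniform data with Euclidean distance are assumptions for illustration only.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for one random query against random points.

    Points are drawn uniformly from the unit hypercube (an assumption);
    distances are Euclidean.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)

if __name__ == "__main__":
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(dim), 3))
```

As the dimension grows, the printed contrast shrinks, so ranking by distance separates "near" from "far" objects less and less clearly.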
The final part of the course presents design alternatives for systems that integrate data retrieval and information retrieval. The central focus is on the implications for the system architecture of dataspace support platforms. Techniques for scalable data management, such as MapReduce, are also discussed.
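For readers unfamiliar with MapReduce, the following single-process sketch shows its three steps (map, shuffle, reduce) on a word count. It runs in memory and does not distribute any work; the function names and the sample documents are hypothetical and only illustrate the programming model.

```python
from collections import defaultdict

def map_phase(doc_id, text):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single count."""
    return key, sum(values)

def map_reduce(documents):
    """Run the three steps over a dict of {doc_id: text}."""
    pairs = [pair
             for doc_id, text in documents.items()
             for pair in map_phase(doc_id, text)]
    return dict(reduce_phase(key, values)
                for key, values in shuffle(pairs).items())

if __name__ == "__main__":
    docs = {"a": "photo video photo", "b": "video"}  # hypothetical input
    print(map_reduce(docs))
```

In a real framework the map and reduce calls run on different machines and the shuffle moves data over the network; only the two user-supplied functions look like the ones above.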
Study Goals
The course has three goals:
1. Theory: Database technology has been successful for administrative data because it offers a balance between flexibility and efficiency. The course investigates how the fundamentals of database technology can also be applied to create support platforms for heterogeneous multimedia dataspaces.
2. Learn to work with research papers (and recognize their limitations).
3. Gain practical "engineering" skills when turning research ideas into a prototype implementation.
Education Method
Lectures, lab work.
Literature and Study Materials
Reader (made available online).
Assessment
Paper presentation, participation in class, project assignment.
The course is organized as follows. Each student organizes one of the classes, in which a research paper on a topic related to that class is discussed. During the course period, students develop a prototype that illustrates one or more aspects of multimedia search and its implications for data management. Assessment is based on class participation, the quality of the research paper presentation, and a short report and demonstration of the prototype.