Project
Semester 6 Project: Comparison of NoSQL and SQL Databases in a Distributed Environment.
Implementation:
This project compares the performance of NoSQL and SQL databases in a distributed (specifically, sharded) environment.
The reason? I read a lot of papers, such as this one, which compares the two databases but only in a single-database environment. I could not find a comparison on the basis of scalability, i.e., when your application's traffic grows, which database scales better and which is simpler to scale.
RESULTS!
MongoDB fared better when I ran performance tests on my local machine. The results are not entirely accurate, since they measure wall time, i.e. the total elapsed time of the process, which includes time spent while other system processes are running. Furthermore, the test data is only 11 entries, and for the first few tests I am relying on MongoDB's autosharding. It does fare well though: the only time MongoDB was outperformed was when there was a high startup and ending time for read operations. I presume this is because mongoid takes longer to cache the records. I am going to run a full-fledged test on DigitalOcean droplets and try to scale it to larger amounts of data, because 11 records doesn't cut it.
But for now, here are the results:
- rails-4.2.5 using ruby-2.3.1.112 on x86_64-linux
- master and slave shards
- Operations:
- Create: simple creation of Event objects with object presence and format validations. 3 objects go to slave shard.
- Read: Bulk read of all objects from all shards.
- Update: Modification of a non-keyed attribute.
- Delete: Deletion of all records, effectively emptying the database.
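Because the wall-time caveat affects how these numbers should be read, here is a minimal sketch of how the four operations could be timed so that wall-clock time and CPU time are reported separately. It is not the benchmark code used for this project: `EventStore`, `time_operation` and `run_crud_benchmark` are hypothetical names, and the store is a plain in-memory hash standing in for the Event model so the snippet runs without Rails, MongoDB or PostgreSQL. Only `Benchmark.measure` from Ruby's standard library is assumed.

```ruby
require 'benchmark'

# Hypothetical in-memory stand-in for the Event model (an assumption for
# illustration only; the real project uses Rails models backed by a database).
class EventStore
  def initialize
    @rows = {}
    @next_id = 0
  end

  # Stores a copy of the attributes and returns the new record id.
  def create(attrs)
    @next_id += 1
    @rows[@next_id] = attrs.dup
    @next_id
  end

  # Bulk read of every stored record.
  def all
    @rows.values
  end

  # Applies the same attribute changes to every record.
  def update_all(changes)
    @rows.each_value { |row| row.merge!(changes) }
  end

  # Removes every record, emptying the store.
  def delete_all
    @rows.clear
  end

  def count
    @rows.size
  end
end

# Times one block and returns wall-clock and CPU seconds separately.
# Wall time includes waiting on other processes; CPU time (user + system)
# counts only the time this process spent executing.
def time_operation
  tms = Benchmark.measure { yield }
  { wall: tms.real, cpu: tms.utime + tms.stime }
end

# Runs create, read, update and delete in order and collects the timings.
def run_crud_benchmark(store, records)
  results = {}
  results[:create] = time_operation do
    records.times { |i| store.create(name: "Event #{i}", venue: 'TBD') }
  end
  results[:read]   = time_operation { store.all }
  results[:update] = time_operation { store.update_all(venue: 'Main Hall') }
  results[:delete] = time_operation { store.delete_all }
  results
end

results = run_crud_benchmark(EventStore.new, 11)
results.each do |operation, t|
  puts format('%-6s wall=%.6fs cpu=%.6fs', operation, t[:wall], t[:cpu])
end
```

With only 11 records both figures will be tiny and noisy. Note also that in a real run against a database server, this process's CPU time excludes the work done by the server, so it complements the wall time rather than replacing it.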
For the implementation I have used the following:
- Ruby on Rails: The best part of Rails is the MVC architecture, so changing the database only meant changing the model and the database.yml file.
- MongoDB (mongoid) vs PostgreSQL: MongoDB because of the support, and PostgreSQL because it's an ORDBMS and is supported by Heroku, where this application is deployed.
The site is a crowdfunding and event management site. Users can create, donate to, and/or attend events.
I am following a Spiral Process Model as I roll in different features to make the site more complex.