Perspectives on a Big Data Application What Databa
Engineering, Technology & Applied Science Research Vol. 5, No. 5, 2015, 850-853 851 www.etasr.com Erturk and Jyoti: Perspectives on a Big Data Application: What Database Engineers and IT Students…

structured query language (SQL) statements of traditional
relational databases. There are many NoSQL document
oriented databases, for example, CouchDB and MongoDB.
Document oriented databases store and retrieve data according
to the meta-data definitions and tags found in their documents.
MongoDB was first released in 2009. It has versions
that are compatible with different editions of Windows, Linux,
Solaris, and Mac operating systems. MongoDB is an open
source and free database application, under the GNU Affero
General Public License. It is currently the most popular
application in the category of document oriented databases [5].
MongoDB can store semi-structured and polymorphic data
as well as structured data. Much of the current big data
exchanged on the internet does not follow clear cut structures,
rigid schemas, or restrictions in terms of data type and length.
Such data includes emails, forum posts, complex large
objects, and multimedia files [6]. MongoDB allows users to
flexibly define arrays within documents, and perform various
operations on those array fields. Furthermore, MongoDB offers
database querying functionality and supports high performance
indexing. These are useful, for example, for text searches and
manipulation of geospatial data. Instead of SQL JOIN
statements, MongoDB users may reference a document from
another or embed a document inside another document. It also
features a powerful data aggregation and data analysis
framework, for performing calculations on large data sets.
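The difference between embedding and referencing can be illustrated with plain Python dictionaries standing in for documents. This is a minimal sketch that does not use a MongoDB driver; the collection contents, field names, and helper functions (`resolve_customer`, `items_per_city`) are hypothetical and chosen only for illustration.

```python
# Two ways to relate an order to a customer without an SQL JOIN.
# All documents and field names below are hypothetical examples.

# Option 1: embed the customer document inside the order document.
order_embedded = {
    "_id": 1,
    "items": ["keyboard", "mouse"],  # an array field inside the document
    "customer": {"name": "Ada", "city": "Auckland"},
}

# Option 2: reference the customer document by its _id.
customers = {101: {"_id": 101, "name": "Ada", "city": "Auckland"}}
order_referenced = {"_id": 2, "items": ["monitor"], "customer_id": 101}


def resolve_customer(order, customers_by_id):
    """Follow the reference with a second lookup, in place of a JOIN."""
    return customers_by_id[order["customer_id"]]


def items_per_city(orders, customers_by_id):
    """Aggregation-style calculation: count ordered items per customer city."""
    totals = {}
    for order in orders:
        # Use the embedded customer if present, otherwise follow the reference.
        customer = order.get("customer") or resolve_customer(order, customers_by_id)
        city = customer["city"]
        totals[city] = totals.get(city, 0) + len(order["items"])
    return totals
```

Embedding keeps related data in one read, whereas referencing avoids duplicating the customer in every order at the cost of a second lookup.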
MongoDB aims to provide high performance, availability,
and scalability, for cloud based information systems and big
data environments. In order to ensure availability, sophisticated
database systems create replicas of database files in different
locations so that, if one location is down, then the user can
access the information from another location. As shown in
Figure 1, when the primary server fails, another machine becomes the primary server, while the
‘heartbeat’ diagnostic program continues to check whether the various
replica servers are online and working properly. Additionally, it
is possible to automate the failover (switching from a server
that is down to an available one) and data recovery processes.
Fig. 1. Database Replication and Failover.
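The failover idea can be sketched in a few lines of Python. This is a deliberately simplified illustration, not MongoDB's actual election protocol: it assumes the heartbeat results are already available as a dictionary, and that the first healthy server in a fixed priority list takes over. The function name `elect_primary` and the server names are hypothetical.

```python
def elect_primary(servers, current_primary, is_alive):
    """Pick the server that should act as primary.

    servers:         server names in priority order (an assumption of this sketch)
    current_primary: name of the server currently acting as primary
    is_alive:        dict mapping server name -> bool, as reported by heartbeats
    """
    # Keep the current primary as long as its heartbeat succeeds.
    if is_alive.get(current_primary, False):
        return current_primary
    # Otherwise fail over to the first replica that is still responding.
    for name in servers:
        if is_alive.get(name, False):
            return name
    # No server is reachable, so there is no primary.
    return None
```

For example, with servers `["s1", "s2", "s3"]` and a failed heartbeat for `s1`, the function returns `"s2"`, which mirrors the automatic switch from a server that is down to an available one.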
Scalability is the ability of databases to grow tremendously
in size while maintaining usability or performance. There are
very large databases that constantly accumulate data on the
internet. In MongoDB, more servers can be allocated to a
database to balance the workload and increase performance;
this is called horizontal scalability. Sharding is the process in
which a single database is divided into multiple components.
These database components are stored on different servers.
These servers may also be virtual machines or cloud based.
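As a rough illustration of sharding, the sketch below assigns each record key to one of several servers by hashing the key. It is a simplified stand-in for hash-based distribution in general and does not reproduce how MongoDB itself splits and balances data; the function names and shard names are hypothetical.

```python
import hashlib


def shard_for(key, shard_names):
    """Map a record key to one shard using a stable hash of the key."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return shard_names[int(digest, 16) % len(shard_names)]


def distribute(keys, shard_names):
    """Group keys by the shard (server) that would store them."""
    placement = {name: [] for name in shard_names}
    for key in keys:
        placement[shard_for(key, shard_names)].append(key)
    return placement
```

Because the hash is deterministic, the same key always maps to the same shard, so a query for that key only needs to contact one server. Note that in this naive scheme, adding a shard changes the mapping for most keys, which production systems avoid with more elaborate placement strategies.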
IV. LEARNING ACTIVITIES USING MONGODB
One of the first skills that the database engineer needs is to
be able to quickly install the MongoDB server software on a
given computer. A brief example in this paper is installing it on
the Microsoft Windows operating system. After navigating to
https://www.mongodb.org/downloads, the user needs to select
the appropriate version from the menu and click on
‘Download MSI.’ Once the setup file is saved, the user can
run it, follow the instructions, and complete the steps.
After this installation, the MongoDB server can be run from the
Windows command prompt. However, the setup process does
not complete all the requirements; before running MongoDB, the user needs to manually
create a subfolder named ‘db’ in the ‘data’ folder at the root of the hard
drive (for example, C:\data\db) to avoid potential error messages.
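The manual folder creation step can also be scripted. The following is a minimal sketch, assuming the data directory lives at `data/db` under a drive root that the caller supplies; the helper name `ensure_data_dir` is hypothetical.

```python
from pathlib import Path


def ensure_data_dir(drive_root):
    """Create <drive_root>/data/db if it does not exist and return its path."""
    data_dir = Path(drive_root) / "data" / "db"
    # parents=True also creates the intermediate 'data' folder;
    # exist_ok=True makes the call safe to repeat.
    data_dir.mkdir(parents=True, exist_ok=True)
    return data_dir
```

On Windows, a call such as `ensure_data_dir("C:\\")` would produce the `C:\data\db` folder, after which the server can be started from the command prompt.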
Although a database engineer can interact with this basic
installation using the command line interface, the next
recommended and useful task is to install a front-end graphical
user interface application to manage the databases. Figure 2
shows the configuration of the front-end application called
NoSQL Manager. The default port number for the MongoDB
server is 27017.
Fig. 2. Connecting to MongoDB using a front-end application.
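Front-end tools and drivers typically need the same two settings, a host and a port. The sketch below builds a connection string in the standard `mongodb://host:port` form, using the default port 27017; the helper name `connection_uri` and its range check are illustrative assumptions rather than part of any MongoDB tool.

```python
DEFAULT_PORT = 27017  # default port of the MongoDB server


def connection_uri(host="localhost", port=DEFAULT_PORT):
    """Build a basic MongoDB connection string for a single server."""
    # TCP port numbers must fall between 1 and 65535.
    if not 0 < port < 65536:
        raise ValueError(f"invalid TCP port: {port}")
    return f"mongodb://{host}:{port}"
```

Calling `connection_uri()` with no arguments yields `mongodb://localhost:27017`, which matches a server running locally with its default settings.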
Another optional tool for database students and trainees to
be aware of is called MongoDB Management Service (MMS).