1.1 PURPOSE
This Software Requirement Specification (SRS) document states in precise and explicit language those functions and capabilities that the software system must provide, as well as states any required constraints by which the system must abide. The application whose requirements have been specified in this document is: CNIC Data Mart and Fraud Detection by the use of Business Intelligence.
The document decomposes the problem into component parts. This act of writing down software requirements in a well-designed format organizes information, places borders around the problem, solidifies ideas, and helps break down the problem into its component parts in an orderly fashion.
1.2 INTENDED AUDIENCE AND READING SUGGESTIONS
This document is basically the understanding of the development team about NADRA’s requirements for the project prior to any actual design or development work. It’s a two-way insurance policy that assures that both the organization and the development team understand the other’s requirements at this given point in time.
Thus, the intended audiences of this document are both the developers of the project and the users.
1.3 REFERENCES
Sam Anahory, Dennis Murray; Data Warehousing in the Real World: A Practical Guide for Building Decision Support Systems; Pearson Education (Singapore) Pte. Ltd, Indian branch; Delhi (2004)
Paulraj Ponniah; Data Warehousing Fundamentals: A Comprehensive Guide for IT Professionals; John Wiley & Sons, Inc.; New York (2001)
2 SCOPE OF THE PROJECT
2.1 SCOPE OF WORK
The project encompasses the development of a data mart for the Computerized National Identity Cards (CNIC) issued to the citizens of Pakistan. The project involves gathering data from multiple sources and bringing them to one interface for extraction, transformation and loading into the CNIC Data Mart. The data mart will be created on functional lines and will be based on a natural break of the data in the National Data Warehouse. The data mart will be designed with bilingual support (Urdu and English); therefore, Unicode handling will be involved throughout the project life cycle.
Mentioned below are the tasks and components included in the scope of the project:
- Data from multiple Online Transaction Processing Systems (OLTPs) will be extracted. In this process a set of data will be identified and retrieved from the operational systems to be brought to one interface. Only the data relevant to analysis will be extracted to the staging area.
- Transformation will be performed on the data extracted into the staging area. In this process data will be converted from application-specific data into enterprise data, and new data will be created using formulas and calculations. The data will be validated, cleansed and integrated from multiple tables and source systems by applying business rules, so that it conforms to the logical and subject-oriented structure of the data mart.
- The formatted (transformed) data will then be loaded into the dimensions and cubes of the target data mart schema, in the form of batches. The loading of data in the form of batches will ensure efficient loading of the data in the data mart.
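The three tasks above can be pictured as a small pipeline. The following is only a minimal sketch under stated assumptions, not the project's implementation: the names `batches` and `run_etl`, the dictionary record shape, and the batch size are all hypothetical and chosen for illustration.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]  # hypothetical record shape: one source row as a dict


def batches(records: Iterable, size: int) -> Iterator[List]:
    """Yield successive lists holding at most `size` items each."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def run_etl(
    sources: Iterable[Iterable[Record]],
    transform: Callable[[Record], Record],
    load: Callable[[List[Record]], None],
    batch_size: int = 1000,
) -> int:
    """Extract rows from every source, transform each one, load them in batches."""
    extracted = (row for source in sources for row in source)
    transformed = (transform(row) for row in extracted)
    loaded = 0
    for batch in batches(transformed, batch_size):
        load(batch)  # one call per batch instead of one call per record
        loaded += len(batch)
    return loaded
```

Because the extraction and transformation steps are generators, only one batch needs to be held in memory at a time, which is one reason batch loading suits large data volumes.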
2.2 WORK OUT OF SCOPE
Mentioned below are the tasks and components not included in the scope of the project:
- Data from existing databases of citizens will be used for populating the CNIC data mart; hence, the project does not include the process of acquiring data from NIC applicants.
- The project deals with business rules only relevant to the process of duplicate detection. Other business rules identified and implemented on the National Data Warehouse fall outside the scope of this project.
- The project involves the development of a data mart for the Computerized National Identity Cards (CNIC) only. Hence, any other national data kept at NADRA will not be included in the scope of the project.
- The data separated out from the data mart as prospective duplicates can be dealt with in various ways. One way is to use image processing tools to match the fingerprints and photos of the records singled out as probable duplicates. Such activities will not be a part of the project.
- A data mining algorithm will be employed for the purpose of finding fraud-specific trends and patterns only. Thus, the data mining algorithm that is eventually employed will not cater to any other aspects of data mining.
2.3 GOALS AND OBJECTIVES
The proposed system is required to meet the following objectives:
- To improve the existing system design by introducing the ‘Dimensional’ data modeling technique, for designing the National Data Warehouse, as an alternative to the ‘ER’ data modeling technique; thus, aiming to ensure efficient information storage, processing and retrieval and better user understandability. According to Ralph Kimball, “The dimensional model is the only viable technique for achieving both user understandability and high query performance”.
- To provide an efficient utility for data extraction, transformation and loading for the National Data Warehouse.
3 OVERALL DESCRIPTION
3.1 PRODUCT PERSPECTIVE
Among the various projects of NADRA, one is the issuance of state-of-the-art National Identity Cards (NICs) to all adult citizens of Pakistan. These NICs are duly backed by the computerized database and data warehouse respectively called the Citizens’ Database and National Data Warehouse (NDWH).
NADRA has created the National Data Warehouse, which is integrated and interfaced with the citizen databases for optimum utilization by all users, ensuring economy of effort and resources.
The project that is going to be developed is a component of a larger system i.e. the functional National Data Warehouse. The functionality of the project will complement the existing functional data warehouse.
3.2 PRODUCT FUNCTIONS
The list of the major functions which will be performed in the data warehouse application is given below:
- The system will identify the data to be extracted from the heterogeneous Online Transaction Processing (OLTP) systems and will extract the required data. This data will be homogenized and brought to one interface.
- The system will transform the extracted data into strategic information before loading it into the data warehouse. Data will be cleansed and validated for accuracy, ensuring that all values conform to a standard definition.
- The system will load the extracted and cleaned data into the data mart. The major set of functions consists of taking the prepared data, applying it to the data warehouse, and storing it in the database there. The tool built for loading data will load the data into the data mart in the form of batches.
3.3 USER CLASSES AND CHARACTERISTICS
The intended users of this software are the Database Administrators or DBAs. Thus, the Database Administrator class or the DBA class will emerge as the only user class that will interact with the end product.
The DBA class will have the privileges of initiating and stopping the ETL functions (that include Extraction, Transformation and Loading of data).
After deployment, the extent of user interaction with the end product will usually be confined only to report viewing.
3.4 OPERATING ENVIRONMENT
For an efficient implementation it is important to employ those tools and the technologies that best support the features of the system under consideration. For this system the tools and technologies adopted are described below:
ORACLE DATABASE SERVER 9i
The database management system employed for the creation of the data mart will be Oracle Database Server 9i. Oracle9i Enterprise Edition supports a large number of users and large databases, with advanced features for extensibility, performance, and management. Oracle9i also adds performance enhancements that specifically apply to data warehousing applications.
Oracle9i features full Unicode 3.0 support. National Language Support (NLS) provides character sets and associated functionality, such as date and numeric formats, for a variety of languages. All data may be stored as Unicode, or select columns may be incrementally stored as Unicode.
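To show why Unicode handling matters for bilingual data, here is a small, hedged sketch in Python rather than Oracle. It assumes that names are normalized to one canonical form (NFC) before they are compared or stored; the function names `normalize_name` and `utf8_length` are hypothetical. It also shows that the byte length of Urdu text in UTF-8 differs from its character count, which matters when sizing columns.

```python
import unicodedata


def normalize_name(text: str) -> str:
    """Return the NFC form of the text with surrounding whitespace removed."""
    return unicodedata.normalize("NFC", text.strip())


def utf8_length(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


# The letter ALEF WITH MADDA ABOVE can arrive as one code point (U+0622) or as
# ALEF (U+0627) followed by a combining MADDAH (U+0653). Both look the same on
# screen but compare unequal until they are normalized.
composed = "\u0622"
decomposed = "\u0627\u0653"
same_after_normalizing = normalize_name(composed) == normalize_name(decomposed)
```

Without a normalization step like this, two records holding the same Urdu name could fail an equality check during duplicate detection.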
All editions of the Oracle database include languages that allow programmers to access and manipulate the data in the database, among them SQL and PL/SQL. The ANSI standard Structured Query Language (SQL) provides basic functions for data manipulation, transaction control, and record retrieval from the database. Oracle’s PL/SQL, a procedural language extension to SQL, is commonly used to implement program logic modules for applications. PL/SQL can be used to build stored procedures and triggers, and it provides looping controls, conditional statements, and error handling.
MICROSOFT VISUAL STUDIO .NET 2003
The data mart front-end application will be developed using Visual C# in the Microsoft Visual Studio .NET 2003 environment. With an extensive set of visual designers, a range of programming languages, and integrated Visual Database Tools, Visual Studio .NET 2003 enables users to build powerful software quickly.
Visual Studio .NET 2003 is a comprehensive tool for rapidly building Microsoft .NET applications for Microsoft Windows (and the Web), dramatically increasing developer productivity, and enabling new business and enterprise opportunities.
SQL SERVER 2000
The data homogenization area for the data mart application will be developed in the native DBMS of the source systems i.e. Microsoft SQL Server 2000. Microsoft SQL Server 2000 also includes powerful features to support multilingual operations and environments. Extensive multilingual features make SQL Server 2000 a compelling database product and applications platform.
MICROSOFT ACCESS 2000
Microsoft Access is the database component of Microsoft Office. It runs on Windows, is easy to use, and serves as the database for many applications. In this system, Microsoft Access 2000 will be employed for developing the database for application user accounts.
MICROSOFT WINDOWS 2000 PROFESSIONAL
The citizen data mart application will be developed to run on the Microsoft Windows 2000 Professional platform.
3.5 DESIGN AND IMPLEMENTATION CONSTRAINTS
This section depicts the issues that might limit the options available to the development team. Following are some of the design and implementation constraints:
§ Hardware Limitations
A massive quantity of CNIC data will be required for the operations of the software. Because of this bulk of data, machines with large storage capacity and high processing power will be required.
§ Security Constraints
The CNIC data at NADRA is highly confidential and will require a considerable amount of security measures to be applied during the design, development and the usage of the software.
§ Maintenance Issues
Once handed over to the organization, the software, its documentation, its handling and its maintenance will completely be the responsibility of the organization.
3.6 USER DOCUMENTATION
User manuals will be delivered along with the end product and the other deliverables. The user manuals will familiarize the users with the working of the end product.
4 EXTERNAL INTERFACE REQUIREMENTS
4.1 USER INTERFACES
Due to the limited user interaction, the number of user interfaces will be limited. The subsequent paragraphs will provide a brief description of each of the user interfaces.
The first interface will be the “User Login” interface. A number of DBAs will use the system and each of them will have their own logins and access rights for using the data and the software. The users will log on to the system, by providing their login names and passwords, to commence any operation.
The next interface, after logging in to the system, will provide the users with access to the system functions (according to their access rights). A user having complete rights will be able to manipulate the Data Warehousing functions. The Data Warehousing functions include the sequential tasks of data extraction, data transformation (according to the required format) and data loading (into the data mart). The user will be allowed to initiate or to stop the sequential execution of the ETL functions. The user will be able to view the progress of these operations.
4.2 SOFTWARE INTERFACES
The data mart developed as a part of the project will have an interface with multiple heterogeneous OLTP systems at NADRA for data collection. After data collection, ETL functions will be performed to load this data into the data mart.
5 SYSTEM FEATURES
The following section provides a list of the features present in the system developed for this project. These features have been elaborated by describing the distinct requirements associated with each feature, so as to provide a better understanding of what each feature constitutes.
5.1 DATA EXTRACTION
Data that will be used in a data warehouse must be extracted from the operational systems that contain the source data. Data is initially extracted during the data warehouse creation process, and on-going periodic extractions occur during updates of the data warehouse. Data extraction can be a simple operation, if the source data resides in a single relational database, or a very complex operation, if the source data resides in multiple heterogeneous operational systems.
Given below are the functional requirements of the data extraction module:
REQ 1: The extraction module should enable the user to extract from the multiple operational systems currently in use.
REQ 2: The data extracted from the diverse data sources e.g. ‘Online Data Acquisition Database’ and ‘Form Based Registration Database’ must be homogenized. The data must be consolidated from these different operational systems on a common platform.
REQ 3: The data extraction process should bring all the source data into a common, consistent format.
REQ 4: All irregularities or inconsistencies that might exist between the disparate sources should be removed during the homogenization process. All inconsistencies should be resolved for common data elements coming from multiple sources.
REQ 5: Preliminary data cleansing should be performed on the data extracted from the multiple data sources.
- It should be ensured that there are no validation errors in the data extracted from the source systems.
- The data with missing mandatory fields or insufficient information must be rejected.
- Default values should be provided for missing fields if applicable.
- The data should be checked for correctness and accuracy as well.
- Information about all rejected records must be recorded for future reference. The reasons for rejection should also be maintained so that they can be communicated to the relevant person.
REQ 6: After the data has been extracted from the data sources, the basic validation checks have been performed and incorrect or incomplete records have been rejected, the extraction module should export the data to the staging area so that a sequence of transformations can be applied on the data, to make it ready to be loaded into the data mart.
REQ 7: In case the extraction process is cancelled by the user or in case of an error, the module should roll back the performed activities so that the homogenization and staging areas are ready for the ensuing extraction.
REQ 8: A log of all tasks performed should be written in a log file so as to maintain an audit trail of all the activities carried out.
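The preliminary cleansing described in REQ 5 could look roughly like the sketch below. It is an illustration under stated assumptions only: the field names (`cnic`, `name`, `gender`), the default value, and the rule that a CNIC number consists of exactly 13 digits are assumptions made for the example, and `rejection_reason` and `cleanse` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical field names and defaults, chosen only for illustration.
MANDATORY_FIELDS = ("cnic", "name")
DEFAULTS = {"gender": "UNKNOWN"}

Record = Dict[str, str]


def rejection_reason(record: Record) -> Optional[str]:
    """Return why a record must be rejected, or None if it passes the checks."""
    for field in MANDATORY_FIELDS:
        if not (record.get(field) or "").strip():
            return f"missing mandatory field: {field}"
    digits = record["cnic"].replace("-", "")
    # Assumed validity rule: a CNIC number has exactly 13 ASCII digits.
    if not (digits.isascii() and digits.isdigit() and len(digits) == 13):
        return "invalid CNIC format"
    return None


def cleanse(records: List[Record]) -> Tuple[List[Record], List[Tuple[Record, str]]]:
    """Split records into accepted ones (defaults filled in) and rejected ones."""
    accepted: List[Record] = []
    rejected: List[Tuple[Record, str]] = []
    for record in records:
        reason = rejection_reason(record)
        if reason is not None:
            rejected.append((record, reason))  # keep the reason for follow-up
            continue
        filled = dict(record)
        for field, default in DEFAULTS.items():
            if not (filled.get(field) or "").strip():
                filled[field] = default
        accepted.append(filled)
    return accepted, rejected
```

The rejected list pairs each record with its reason, so it can be written to a rejection table or log file and passed on to the person responsible for correcting the source data.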
5.2 DATA TRANSFORMATION
After extraction, the data needs to be transformed into strategic information before it is loaded into the data warehouse. This feature takes as input the data extracted from multiple data sources, so various kinds of data transformation inevitably have to be performed before the data moves from the source systems into the data warehouse. Data transformation is the cleansing and validation of data for accuracy, ensuring that all values conform to a standard definition. The data has to be transformed according to standards because it comes from many dissimilar source systems. It has to be ensured that, after all the data is put together, the combined data does not violate any business rules.
One major effort in data transformation is the improvement of data quality. This includes filling the missing values for attributes in the extracted data. Data quality is of paramount importance in a data warehouse because the effect of strategic decisions based on incorrect information can be devastating.
Described below are the functional requirements for the data transformation module:
REQ 1: The transformation module should be initiated only if the newly extracted data is present in the staging area.
REQ 2: All the extracted data should be brought into a standardized valid format during the transformation process.
REQ 3: Duplicate records should be removed during transformation when the same data is brought from multiple systems.
REQ 4: Default values should be provided for the missing data elements of the extracted data during the data transformation process.
REQ 5: Format revisions of the extracted data should be provided by the transformation module, to standardize the data types and field lengths of the same data elements retrieved from the various sources.
REQ 6: The transformation module should decode various fields, i.e. replace data codes with meaningful values, to standardize the extracted data.
REQ 7: The user should be able to merge information, i.e. to combine information from separate source records into a new combination or a single entity.
REQ 8: The transformation module should combine data from a single source record or related data elements from various source records.
REQ 9: The extracted data should be sorted and data arrangements should be re-sequenced according to the analysis requirements during transformation.
REQ 10: The transformation module should be able to purge any useless source data that has been extracted.
REQ 11: Date / time conversions should be carried out by the transformation module for representing date and time in standard formats.
REQ 12: Derived fields and calculated values should be incorporated into the extracted data during transformation.
REQ 13: Synonyms and homonyms should be resolved by the module.
REQ 14: Appropriate summarization, i.e. creation of summaries to be loaded into the data warehouse, should be carried out by the transformation module instead of loading the most granular level of data.
REQ 15: Surrogate keys, derived from the source system primary keys, should be assigned since a data warehouse cannot have primary keys with built-in meanings.
REQ 16: Business rules should be applied for validity checking during the transformation process.
REQ 17: After the data has been cleansed and all validation checks have been performed, the data transformation module should export the transformed data from the staging area to the data mart image schema, to make it ready to be loaded into the data mart.
REQ 18: The date and time of the beginning and completion of the transformation process should be recorded in a log file.
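A few of these requirements (duplicate removal, decoding of fields, date conversion and surrogate key assignment) can be sketched together as follows. This is a simplified illustration, not the project's transformation module: the code table, the two accepted date formats, the field names, and the function names `to_iso_date` and `transform` are all assumptions. The surrogate key here is a plain sequence number mapped from the source key, which is one common way to keep warehouse keys free of built-in meaning.

```python
from datetime import datetime
from typing import Dict, List

GENDER_CODES = {"M": "Male", "F": "Female"}  # hypothetical code table
DATE_FORMATS = ("%d/%m/%Y", "%Y-%m-%d")      # assumed source date formats


def to_iso_date(value: str) -> str:
    """Convert a date written in any assumed source format to YYYY-MM-DD."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {value!r}")


def transform(records: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Standardize records, drop duplicates and assign surrogate keys."""
    surrogate_keys: Dict[str, int] = {}  # source key -> surrogate key
    result: List[Dict[str, object]] = []
    for record in records:
        source_key = record["cnic"].replace("-", "")  # format revision
        if source_key in surrogate_keys:
            continue  # same citizen already received from another source
        surrogate_keys[source_key] = len(surrogate_keys) + 1
        result.append({
            "citizen_key": surrogate_keys[source_key],
            "cnic": source_key,
            "gender": GENDER_CODES.get(record.get("gender", ""), "Unknown"),
            "birth_date": to_iso_date(record["birth_date"]),
        })
    return result
```

Note that duplicates are matched only after the key has been brought into a standard format; otherwise the same CNIC written with and without hyphens would be treated as two different citizens.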
5.3 DATA LOADING
This feature incorporates the tasks that have to be performed to load the data that has been extracted and cleansed into the data warehouse. The major set of functions consists of taking the prepared data, applying it to the data warehouse, and storing it in the database there.  Load images are created to correspond to the target files to be loaded in the data warehouse database.
Described below are the functional requirements for the data loading module:
REQ 1: The data loading module should be initiated only when the data has been completely cleansed and transformed.
REQ 2: The data should be loaded sequentially in batches to reduce the loading time, since loading the data warehouse may otherwise take an inordinate amount of time.
REQ 3: ‘Initial Load’ should be used to load the data into the data mart for the very first time. ‘Load’ mode should be used for initial loading. All the further runs should be applied using ‘Append’ mode or ‘constructive merge’.
REQ 4: ‘Incremental load’ should be used for applying ongoing changes as necessary in a periodic manner. The ‘constructive merge’ mode should be used for incremental loading, as the historical perspective of data is important.
REQ 5: A record of all information about the data load should be written in a log file so as to maintain an audit trail of all the activities carried out.
REQ 6: In case the loading process is cancelled by the user or in case of an error, the module should roll back the performed activities.
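The batch loading, logging and rollback requirements can be illustrated with a short sketch. It uses Python's built-in `sqlite3` module purely as a stand-in for the target database, and the table `dim_citizen`, its columns, and the function `load_in_batches` are hypothetical. Rows are appended batch by batch inside one transaction, so a failure in any batch undoes the whole load.

```python
import logging
import sqlite3
from typing import Sequence, Tuple

log = logging.getLogger("loader")


def load_in_batches(
    conn: sqlite3.Connection,
    rows: Sequence[Tuple[int, str]],
    batch_size: int = 2,
) -> int:
    """Append rows batch by batch; roll back the whole load if a batch fails."""
    try:
        for start in range(0, len(rows), batch_size):
            batch = rows[start:start + batch_size]
            conn.executemany(
                "INSERT INTO dim_citizen (citizen_key, cnic) VALUES (?, ?)",
                batch,
            )
            log.info("loaded batch starting at row %d (%d rows)", start, len(batch))
        conn.commit()
    except sqlite3.Error:
        conn.rollback()  # undo every batch of this load
        log.exception("load failed and was rolled back")
        raise
    return len(rows)
```

A usage example: create the table with `conn.execute("CREATE TABLE dim_citizen (citizen_key INTEGER PRIMARY KEY, cnic TEXT NOT NULL)")` on a connection from `sqlite3.connect(":memory:")`, then call `load_in_batches(conn, rows)`. If a later batch violates the primary key, the rows from earlier batches of the same call are removed again.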
6 NON FUNCTIONAL REQUIREMENTS
6.1 EASY-TO-USE GRAPHICAL USER INTERFACE (GUI)
An effective and friendly Graphical User Interface is critical for effective system performance. The users’ view of a system is conditioned chiefly by experience with its interface. If the user interface is unsatisfactory, the users’ view of the system will be negative regardless of any niceties of internal computer processing. The system may be described as hard to learn, or clumsy, tiring and slow to use.
Keeping in mind the significance of a good GUI, all the interfaces for the different workflows of the system processes must follow a good standard format, and consistency must be maintained throughout. Every GUI attribute, however minute, must be given due significance, and end-user satisfaction must be borne in mind while placing, arranging, assigning and relating icons, buttons and menus.
6.2 EFFICIENCY
Efficiency of a data warehouse system is concerned with the minimum query processing time as well as optimal use of the system resources. In designing the proposed system, the efficiency factor must be taken well into consideration and various mechanisms such as indexing should be used.
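As a small illustration of what indexing changes, the sketch below asks a database for its query plan before and after an index is created. It uses Python's built-in `sqlite3` module as a stand-in, so the plan text is SQLite's and not Oracle's; the table `citizen`, the index `idx_citizen_cnic`, and the helper `query_plan` are hypothetical names.

```python
import sqlite3
from typing import List, Sequence


def query_plan(conn: sqlite3.Connection, sql: str, params: Sequence = ()) -> List[str]:
    """Return SQLite's plan descriptions for a query, one string per step."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE citizen (citizen_key INTEGER PRIMARY KEY, cnic TEXT, name TEXT)")

lookup = "SELECT name FROM citizen WHERE cnic = ?"
plan_before = query_plan(conn, lookup, ("1234512345671",))  # full table scan

conn.execute("CREATE INDEX idx_citizen_cnic ON citizen (cnic)")
plan_after = query_plan(conn, lookup, ("1234512345671",))   # index lookup
```

Before the index exists the plan reports a scan of the whole table; afterwards it names `idx_citizen_cnic`, meaning the lookup no longer has to read every row.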
6.3 SECURITY REQUIREMENTS
The data that is eventually to be loaded into the data mart is confidential and its security is of paramount importance. To assure the confidentiality, integrity and availability of data, security measures which ensure that different categories of corporate data are protected to the degree necessary must be employed. Effective and efficient access control restrictions will have to be enforced so that the end-users can access only the data or programs for which they have legitimate privileges.
6.4 DATA INTEGRITY REQUIREMENTS
To guarantee data integrity, i.e. assurance that data or information has not been altered or destroyed in an unauthorized manner, a control mechanism will have to be used to prevent all users from updating and deleting the data in the data mart. It should also be ensured that the various components of the system are accessible only through grant of rights by the administrator.
6.5 FLEXIBLE ARCHITECTURE
Flexibility is the effort needed to modify an operational program. In the design and development of a data warehouse or data mart, not all of the requirements are known up front. Missing parts of the requirements usually show up after users begin to use the data warehouse. Thus, one of the requirements of the data mart architecture is that it should be flexible, so that it can accommodate additional user needs as and when they surface.
6.6 PERFORMANCE REQUIREMENTS
The performance of a Data Warehouse is largely a function of the quantity and type of data stored within a database and the query/data loading workload placed upon the system.
When designing and managing the data warehouse there are numerous decisions that need to be made that can have a huge impact on the final performance of the system. Following are some of the requirements that will have to be fulfilled by proper design of the data mart to boost performance of the system:
- Ensuring the consistency of data from disparate data sources.
- Selecting a proper data modeling technique for the data warehouse design.
- Ensuring the proper amount of data partitioning, indexing, aggregation and summarization.
- Ensuring proper management of data storage.
- Periodic updates and purging of data warehouse data.
Besides these software performance requirements, the hardware will also have an impact on performance, as described below:
- A Data Warehouse is required to run queries on a large table that involves full table scans. The response times for these queries are very critical. Therefore the performance will be affected by the choice of machines employed to run the various data matching algorithms. A powerful machine with a good processing speed will influence the time required to perform functions on massive amounts of data.
6.7 SOFTWARE QUALITY ATTRIBUTES
The following table depicts the software quality attributes for the end product on a scale of one to ten:
|S.No||Software Quality Attributes||1||2||3||4||5||6||7||8||9||10|
Table 1 Software Quality Attributes (on a scale of 1-10)
Following is a brief description of each quality attribute:
§ Correctness is the extent to which a program satisfies its specifications and fulfills the user’s mission objectives.
§ Efficiency is the amount of computing resources and code required to perform a function.
§ Flexibility is the effort needed to modify an operational program.
§ Interoperability is the effort needed to couple one system with another.
§ Reliability is the extent to which a program performs with the required precision.
§ Integrity is the property that data or information has not been altered or destroyed in an unauthorized manner.
§ Reusability is the extent to which a program can be reused in another application.
§ Testability is the effort needed to test a program to ensure that it performs as intended.
§ Usability is the effort required to learn, operate, prepare input, and interpret output.
§ Robustness is the resilience of the system, especially when under stress or when confronted with invalid input.
Appendix A: Glossary
Acronyms and Abbreviations
|SRS||Software Requirements Specification|
|NADRA||National Database and Registration Authority|
|CNIC||Computerized National Identity Card|
Table 2 Table of Acronyms and Abbreviations