CoSIT 2014

Accepted Papers

Scalable Distributed First Story Detection Using STORM
Mahesh Huddar¹, Manjula Ramannavar² and Nandini Sidnal³, ¹Hirasugar Institute of Technology, India, ²Gogte Institute of Technology, India and ³KLES College of Engineering and Technology, India
ABSTRACT
Twitter is an online service that enables users to read and post tweets, thereby providing a wealth of information regarding breaking news stories. The problem of First Story Detection (FSD) is to identify first stories about events from a continuous stream of documents. Locality-sensitive hashing (LSH) is the traditional approach used for FSD. A major challenge in FSD is the high degree of lexical variation in documents, which makes it very difficult to detect stories that talk about the same event but are expressed using different words. We modify the locality-sensitive hashing algorithm to overcome this limitation while maintaining reasonable accuracy with improved performance. This work uses Twitter as the data source to address the problem of real-time FSD. As the input data rate is high, we use the Storm distributed platform so that the system benefits from the robustness, scalability and efficiency that this framework offers.
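As a rough illustration of how locality-sensitive hashing can support first story detection, the sketch below hashes document vectors with random hyperplanes and flags a document as a first story when no sufficiently similar earlier document shares its bucket. It is a minimal single-machine sketch, not the modified algorithm or the Storm topology of the paper; the function names, the bag-of-words style vectors and the 0.5 distance threshold are all assumptions made for illustration.

```python
import math
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw `num_bits` random hyperplanes (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_bits)]


def signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return tuple(
        1 if sum(v * h for v, h in zip(vector, plane)) >= 0 else 0
        for plane in hyperplanes
    )


def cosine_distance(a, b):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def is_first_story(vector, buckets, hyperplanes, threshold=0.5):
    """Compare only against earlier documents in the same bucket."""
    candidates = buckets.setdefault(signature(vector, hyperplanes), [])
    nearest = min((cosine_distance(vector, c) for c in candidates), default=1.0)
    candidates.append(vector)
    return nearest > threshold


planes = make_hyperplanes(num_bits=8, dim=3)
buckets = {}
first = is_first_story([1.0, 0.0, 2.0], buckets, planes)      # nothing seen yet
repeat = is_first_story([2.0, 0.0, 4.0], buckets, planes)     # same direction
unrelated = is_first_story([-1.0, 0.0, -2.0], buckets, planes)  # opposite direction
```

Because only documents in the same bucket are compared, the cost per incoming document stays small, at the price of missing neighbours that fall into a different bucket.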
Case Based Study to Analyze the Applicability of Linear & Non-Linear Models
Gaurav Singh Thakur¹, Anubhav Gupta² and Ankur Bharadwaj³, ¹Cisco Systems, India, ²Commonfloor, India and ³NITK Surathkal, India
ABSTRACT
This paper uses a case-based study, “product sales estimation” on real-time data, to understand the applicability of linear and non-linear models. We use a systematic approach to address the problem of sales estimation for a given product by applying both linear and non-linear techniques to a data set of features selected from the original data set. Feature selection is a process that reduces the dimensionality of the data set by eliminating those features which contribute minimally to the prediction of the dependent variable. The next step is training the model, which is done using two techniques, one from the linear and one from the non-linear domain, each among the best in its respective area. Data re-modelling is then performed to extract new features by changing the structure of the data set, and the performance of the models is checked again. Data re-modelling often plays a crucial role in boosting classifier accuracy by changing the properties of the data set. We then analyze the reasons why one model proves to be better than the other, and hence try to develop an understanding of the applicability of linear and non-linear models. While this is our primary goal, we also aim to find the classifier with the best possible accuracy for product sales estimation in the given scenario.
A Secure Naive Bayes Classifier for Horizontally Partitioned Data
Sumana M¹ and Hareesha K S², ¹M S Ramaiah Institute of Technology, India and ²Manipal Institute of Technology, India
ABSTRACT
In order to extract interesting patterns, a model has to be trained on data available at multiple sites. Distributed data mining enables sites to mine patterns based on the knowledge available at different sites. When sites collaborate to develop a model, it is extremely important to protect the privacy of data and intermediate results. The features of the data maintained at each site are often similar in nature. In this paper, we design an improved privacy-preserving distributed naive Bayesian classifier to train on horizontally partitioned data. The trained model is propagated to the sites involved in the computation. We further analyze the security and complexity of the algorithm.
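With horizontally partitioned data every site holds different records over the same features, so the counts a naive Bayes model needs are simply sums of per-site counts. The sketch below shows one generic way to add such counts without any site revealing its own value, using additive secret sharing. This is a substituted textbook technique, not the protocol proposed in the paper; `make_shares`, `secure_sum` and the modulus are hypothetical choices, and the sketch assumes honest sites that do not collude.

```python
import random

MODULUS = 2**61 - 1  # assumed to be larger than any count being summed


def make_shares(value, n_parties, rng):
    """Split `value` into n random shares that add up to it modulo MODULUS."""
    shares = [rng.randrange(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(local_counts, rng=None):
    """Total of the per-site counts, computed from shares only."""
    rng = rng or random.SystemRandom()
    n = len(local_counts)
    # Site i splits its count and sends share j to site j.
    all_shares = [make_shares(count, n, rng) for count in local_counts]
    # Site j publishes only the sum of the shares it received.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    return sum(partial_sums) % MODULUS


# Hypothetical example: three sites count records with class = "yes".
total_yes = secure_sum([12, 7, 30])
```

The same routine could be applied to each class count and each (feature value, class) count, after which every site can derive the same probability estimates from the totals.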
Configuration as a Service in Multi-Tenant Enterprise Resource Planning System
Mona Misfer AlShardan and Djamal Ziani, King Saud University, Saudi Arabia
ABSTRACT
Enterprise resource planning (ERP) systems are organizations' tickets to the global market. By implementing ERP, organizations can manage and coordinate all functions, processes, resources and data from different departments with a single software system. However, many organizations consider traditional ERP too expensive and look for affordable alternatives within their budget. One such alternative is providing ERP over a software as a service (SaaS) model, which can be considered a cost-effective solution compared to the traditional ERP system. A key feature of any SaaS system is the multi-tenancy architecture, where multiple customers (tenants) share the system software. However, different organizations have different requirements. Thus, SaaS developers accommodate each tenant’s unique requirements by allowing tenant-level customization or configuration. While customization requires source code changes and in most cases programming experience, the configuration process allows users to change many features within a pre-defined scope in an easy and controlled manner. The literature provides many techniques to accomplish the configuration process in different SaaS systems. However, the nature and complexity of SaaS ERP call for more attention to the details of the configuration process, which is only briefly described in previous research. Thus, this research builds on existing knowledge of configuration in SaaS to define specifically the configuration boundaries in SaaS ERP and to design a configuration service that takes the different configuration aspects into consideration. The proposed architecture eases the configuration process by using wizard technology. Privacy and performance are ensured by adopting the database isolation technique.
Multitasking with Monolithic MiniOS, a miniature operating system for Embedded Systems
Sourav Maji and Shuva Jyoti Kar, Ericsson India Global Services Pvt Limited, India
ABSTRACT
Embedded microcontrollers are often programmed in plain C and lack support for multithreading and real-time scheduling. This can make it very cumbersome to implement multitasking applications that require little computation per task. The need for apparent parallelism in the operation of more than one independent task arises in applications involving control systems and robotics, where waiting for an input from one application must not hinder the processing of others. We have developed a monolithic operating system named “MiniOS” for the Atmel ATmega16L AVR to show that it is feasible to use priority-based round-robin scheduling even on a tiny 8-bit processor with 1 KB of RAM [11]. There is no sharp demarcation between the internal kernel data structures and the data used by the scheduler. This is deliberate, to get the best performance out of a low-speed CPU coupled with a very small amount of memory. Its usage is demonstrated in three applications, each shedding light on features the operating system provides. In the first, we spawn three tasks with different priorities, which print mutually exclusive characters on the console via the USART [8]. This demonstrates multitasking, the saving of state when a task is scheduled in and out by a context switch, and the working of the priority scheduler. The second, a slight modification of the first, demonstrates the delay subroutine: three tasks of the same priority are spawned and a delay of a certain interval is injected into one. The third demonstrates interprocess communication and a method of prioritizing tasks, using three tasks: one is an interface to the console for input and output, and the other two are counters implemented with LEDs.
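The scheduling policy named in the abstract, priority-based round robin, can be pictured with a small simulation: the highest-priority ready tasks share the processor in turn, and lower-priority tasks run only once those have finished. The sketch below is a host-side model in Python, not MiniOS code for the ATmega16L; the task tuples, the "larger number means higher priority" convention and the fixed number of time slices per task are assumptions made for illustration.

```python
from collections import deque


def priority_round_robin(tasks):
    """Simulate the order in which tasks receive time slices.

    tasks: list of (name, priority, slices_needed); a larger priority
    number is assumed to mean a more urgent task.
    """
    # One FIFO ready queue per priority level.
    queues = {}
    for name, priority, slices in tasks:
        queues.setdefault(priority, deque()).append([name, slices])

    order = []
    while queues:
        top = max(queues)            # highest priority that still has work
        queue = queues[top]
        task = queue.popleft()       # round robin inside that level
        order.append(task[0])
        task[1] -= 1                 # the task used one time slice
        if task[1] > 0:
            queue.append(task)       # not finished: back of its own queue
        if not queue:
            del queues[top]          # level exhausted, fall to the next one
    return order


schedule = priority_round_robin([("A", 2, 2), ("B", 2, 1), ("C", 1, 2)])
```

Here tasks A and B alternate because they share the higher priority, and C runs only after both have completed. A real kernel would additionally save and restore register state on each switch and let blocked or delayed tasks leave the ready queue.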
Ontology based Data Mining Methodology for Discrimination Prevention
Nandana Nagabhushana and Natarajan S, P.E.S. Institute of Technology, India
ABSTRACT
Data mining is increasingly used to automate decision-making processes, which involves extracting and discovering information hidden in large volumes of collected data. Nonetheless, there are negative perceptions of it, including privacy invasion and potential discrimination. These perceptions hinder the use of data mining methodologies in software systems employing automated decision making. Loan granting or denial, employment and insurance premium calculation are a few of the many such systems that can make use of data mining to effectively prevent human biases pertaining to attributes like gender, nationality and race in critical decision making. The proposed methodology prevents discriminatory rules arising from the presence of information about sensitive discriminatory attributes in the data itself. There are two aspects of novelty in the method proposed in this paper: the first is ontology-based rule mining by identifying discrimination-related data properties, and the second concerns the transformation of mined rules that are quantified as discriminatory into non-discriminatory ones. Certain metrics are used to measure the amount of discrimination removed.
A New Scheme for RSA Cryptosystem
Nikita Somani and Dharmendra Mangal, Student, India
ABSTRACT
In this paper, we introduce the RSA cryptosystem and its security aspects. RSA is a public-key algorithm that is widely applied in the information security of Internet banking and e-commerce applications. We propose a new scheme for the RSA cryptosystem that uses three prime numbers and overcomes several attacks possible on RSA. The new scheme speeds up RSA decryption by using the Chinese Remainder Theorem (CRT) and is also semantically secure.
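To make the idea of three-prime RSA with CRT decryption concrete, here is a toy sketch. It shows only the textbook arithmetic: the modulus is a product of three primes, and decryption works modulo each prime separately before the results are recombined. It is not the scheme proposed in the paper; in particular it has no padding or randomization, so it is not semantically secure, and the tiny primes, `keygen` and `decrypt_crt` are assumptions chosen for readability (Python 3.8 or later is assumed for `math.prod` and modular inverses via `pow`).

```python
from math import prod


def keygen(p, q, r, e=65537):
    """Toy key generation for n = p*q*r; e must be coprime to phi."""
    n = p * q * r
    phi = (p - 1) * (q - 1) * (r - 1)
    d = pow(e, -1, phi)  # modular inverse of e
    return n, e, d


def decrypt_crt(c, d, primes):
    """Decrypt modulo each prime, then recombine with the CRT."""
    n = prod(primes)
    m = 0
    for p in primes:
        # Work with a reduced exponent and a small modulus.
        m_p = pow(c % p, d % (p - 1), p)
        n_p = n // p
        m += m_p * n_p * pow(n_p, -1, p)
    return m % n


primes = (1009, 1013, 1019)      # illustrative only, far too small for real use
n, e, d = keygen(*primes)
message = 123456
ciphertext = pow(message, e, n)
recovered = decrypt_crt(ciphertext, d, primes)
```

Each modular exponentiation in `decrypt_crt` uses an exponent and modulus about one third the size of the full ones, which is where the decryption speed-up of CRT-based variants comes from.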
Forecasting of Sporadic Demand Patterns with Spare Parts
B.Vasumathi S.Baskar¹ and A.Saradha², ¹PGP College of Arts & Science, India and ²Institute of Road Transport and Technology, India
ABSTRACT
Items with irregular and sporadic demand profiles are frequently handled by companies, given the need to offer an ever wider product mix, along with the characteristics of specific market fields (e.g., when spare parts are manufactured and sold). Furthermore, a new company entering the market typically faces irregular customer orders. Hence, considerable effort is spent on correctly forecasting and managing irregular and sporadic product demand. In this paper, the problem of correctly forecasting customers' orders is analyzed using a new method. Specifically, the proposed forecasting method (i.e., the CUM modCr method) is empirically analyzed and tested on intermittent data from the industrial field. The results show that the new method produces better results than the existing method.
Plasticity of a Guidance System for Software Process Modeling
Hamid Khemissa¹, Mourad Oussalah² and Mohamed Ahmed-Nacer¹, ¹USTHB University, Algeria and ²Nantes University, France
ABSTRACT
The need for adaptive guidance systems is now recognized for all software development processes. The new needs generated by the mobility context of software development require these guidance systems to be adapted accordingly. This paper deals with the plasticity of guidance systems, that is, their ability to be adapted to specific development contexts. We propose a Y description for adaptive guidance. This description focuses on three dimensions defined by the material platform, the adaptation form and the provided guidance service. Each dimension considers several factors to automatically deduce the guidance service appropriate to the current development context.
TriBASim: a novel TriBA On Chip Network Simulator based on SystemC
Daniel Gakwaya, Jean Claude Gombaniro and Jean Pierre Niyigena, Beijing Institute of Technology, China
ABSTRACT
In this paper, we develop a simulator for the Triplet Based (TriBA) Network On Chip processor architecture. TriBA (Triplet-Based Architecture) is a multiprocessor architecture whose basic idea is to bundle together the basic philosophy of object programming and hardware multicore systems [9]. In TriBA, nodes are connected in recursive triplets. Performance analyses of the TriBA network topology have been carried out from different perspectives [1] and routing algorithms have been developed [6][7], but the architecture still lacks a simulator that researchers can use to run simple and fast behavioral analyses of the architecture based on parameters common in the Network On Chip arena. We present TriBASim, a simulator for TriBA based on SystemC [16][17]. TriBASim will lessen the burden on TriBA researchers by letting them simply plug in the desired parameters and have nodes and topology set up and ready for analysis.
Automating the Document Review Process
Sandeep Tukkoji and K. M. M. Rajashekharaiah, B.V.B. College of Engineering & Technology, India
ABSTRACT
The typical process for reviewing documents under development involves printing, binding, and distributing hardcopy drafts to the review team. This hardcopy review process has certain inefficiencies. Reviewing document drafts online can improve the review process, saving paper, labor, time, and money. However, simply going online doesn’t necessarily result in an efficient process. The right online tool must be used. To determine which tool is best, one must first determine the criteria that must be met to improve the process. Then, the available tools must be analyzed to see if one meets all the criteria. Here we have created a web-based online review tool which meets all the criteria.
Year 2038 Problem: Y2K38
Rawoof Ahamed and Saran Raj, Dhanalakshmi College of Engineering, India
ABSTRACT
Our world has faced many problems, but only a few have seemed especially dangerous. The most famous bug was Y2K, followed by Y2K10; both were somehow resolved. We now face the Y2K38 bug. This bug will affect most embedded systems, and other systems as well, that use a signed 32-bit format to represent the date and time. Time is stored as the number of seconds elapsed since 1 January 1970, represented as a signed 32-bit value. The Y2K38 problem occurs on 19 January 2038 at 03:14:07 UTC (Coordinated Universal Time). After this time the counter overflows and wraps around to its most negative value, so the date jumps back into the past (to 13 December 1901 when the value is interpreted relative to 1 January 1970) instead of advancing. There is no proper solution for this problem.
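The rollover can be reproduced with a few lines of arithmetic. The sketch below emulates a signed 32-bit seconds counter in Python (whose own integers do not overflow) and converts the values just before and just after the limit into calendar dates; `to_int32` and `from_time_t` are hypothetical helper names, and real systems may behave differently depending on how they handle the overflow.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit counter can hold


def to_int32(value):
    """Emulate two's-complement wrap-around of a signed 32-bit integer."""
    return (value + 2**31) % 2**32 - 2**31


def from_time_t(seconds):
    """Interpret a seconds count relative to 1 January 1970 UTC."""
    return EPOCH + timedelta(seconds=seconds)


last_valid = from_time_t(INT32_MAX)
after_overflow = from_time_t(to_int32(INT32_MAX + 1))

print(last_valid.isoformat())      # 2038-01-19T03:14:07+00:00
print(after_overflow.isoformat())  # 1901-12-13T20:45:52+00:00
```

One second after the last representable moment, the wrapped counter holds the most negative 32-bit value, which a system reading it as seconds since 1970 would display as a date in December 1901.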
The Internet of Things: Challenges & Security Issues
Gurpreet Matharu, Priyanka Upadhyay and Lalita Chaudhary, Amity University, India
ABSTRACT
Propelled by large-scale advances in wireless, sensing and communication technologies, the transformation of the Internet into an integrated network of things, termed the Internet of Things (IoT), is rapidly unfolding. The Internet of Things, enabled by Wireless Sensor Networks (WSN) and RFID sensors, finds a plethora of applications in almost all fields, such as health, education, transportation and agriculture. This paper briefly introduces the idea of IoT and discusses the challenges to its future growth. Further, this paper describes the general layered architecture of IoT, which is extended to provide for a secure construction of the IoT architecture by tackling security issues at each layer. The paper also mentions potential applications of IoT technologies in fields ranging from intelligent transportation to smart homes, e-health care and green agriculture.
Content Based Image Retrieval: A Review
Shereena V B and Julie M. David, MES College, India
ABSTRACT
In a content-based image retrieval (CBIR) system, the main issue is to extract the image features that effectively represent the image contents in a database. Such an extraction requires a detailed evaluation of the retrieval performance of image features. This paper presents a review of fundamental aspects of content-based image retrieval, including the extraction of color and texture features. Commonly used color features, including color moments, the color histogram and the color correlogram, are compared along with Gabor texture features. The paper reviews the increase in the efficiency of image retrieval when color and texture features are combined. The similarity measures on which matches are made and images are retrieved are also discussed. The paper discusses effective indexing and fast searching of images based on visual features.
Hadoop Distributed File System-Review and Measures for Optimum Performance
Dipayan Dev and Ripon Patgiri, NIT Silchar, India
ABSTRACT
The size of the data used in today’s enterprises has been growing at exponential rates over the last few years. Simultaneously, the need to process and analyze large volumes of data has also increased. Hadoop is a popular open-source implementation of MapReduce for the analysis of large datasets. To manage storage resources across the cluster, Hadoop uses a distributed user-level filesystem. This filesystem, HDFS, is written in Java and designed to store very large data sets reliably and to stream those data sets at high bandwidth to user applications. This paper first reviews HDFS in detail. It then reports experimental work with Hadoop on big data and suggests the various factors under which a Hadoop cluster shows optimal performance. The paper concludes by presenting the real-world challenges Hadoop currently faces.
Study of Defects, Testcases and Testing Challenges in Website Projects using Manual and Automated Techniques
Bharti Bhattad and Abhay Kothari, Acropolis Institute Of Technology and Research, India
ABSTRACT
Testing is one of the important components of any software engineering process. Among software applications, web applications are currently the fastest growing, so web applications and web sites must be tested accurately and correctly. Web testing covers aspects such as configuration control, navigation control, state and the database. Web site testing ensures that there are no broken links, missing images, spelling mistakes, or errors or bugs in the software, and that the download time does not exceed the specified limit. Timeliness, structural quality, content, accuracy and consistency, response time and latency, and performance are the major quality factors of a web site. Functional, browser, performance, security, usability and database testing, among others, are performed on a website to make it defect free. Every project also needs to maintain a database. Since the database plays a very important role in every organization, database testing is required for better results. It is necessary not only for the project or web application itself but also for the organization, to avoid future problems in the application, as a minute fault in the database can cause data loss that may be unrecoverable. Many tools and frameworks are available for testing databases or generating test cases to check applications. When a website or web application is tested and there is a difference between the expected and the actual results, there is a defect. Defects can be classified into three categories: wrong, missing and extra. They can also be classified according to priority or severity, and fixed accordingly before the product is delivered to the client. In this paper we present which tests can be applied to a database and how database testing can be performed. We have also computed the coverage of the test case design to maintain the quality of testing. In this way, the time, memory and cost of a project can be decreased to some extent, making it easier for testers to manage their testing phases.
A Survey on: Privacy Preserving Mining Protocol and Techniques
Poonam Dhakad, Rakesh Salam, Shiv Kumar and Amit Kumar Mishra, Technocrates Institute of Technology, India
ABSTRACT
With the growth of the digital world, data has become easier to share and transfer, but this increases the risk of privacy attacks, as much data contains private information about ordinary people. Data should therefore be shared only after securing the sensitive data whose disclosure may cause harm or leak an individual's private information. The major goal of privacy preservation is thus to find the important data and then modify the data set in order to protect that information from others. In this paper, corporate information is protected from any kind of information mining attack. Rules are generated by an association algorithm, and the sensitive rules are then protected using different techniques.
Ad Sharing in Social Networks: Role of User Defined Policies
Venkata Narasimha Inukollu, Divya Keshamoni, Sailaja Arsi and Manikanta Inukollu, ¹Texas Tech University, United States and ²Bhaskar Engineering College, India
ABSTRACT
Security policies describe the behavior of a system through specific rules and are becoming an increasingly popular approach for static and dynamic environment applications. Online social networks have become a de facto portal for Internet access for millions of users. Users share a variety of content on social media, which sometimes includes personal information. In doing so, users entrust the social network providers with such personal information. Although social networking sites offer privacy controls, these controls are insufficient to restrict data sharing and to let users restrict how their data is handled and viewed by other users. To match the privacy demands of an online social network user, we suggest a new security policy and have tested the policy successfully on various levels.
Performance Analysis of Digital Image Steganographic Algorithms
N. D. Jambhekar¹ and C. A. Dhawale², ¹S.S.S.K.R. Innani Mahavidyalaya, Karanja (Lad), India and ²P.R. Pote College of Engineering & Management, India
ABSTRACT
Steganography is the technique by which confidential data is hidden in a cover medium such as an image, without leaving any clue in the cover image. Many algorithms are designed to provide security for the communication of data over the Internet. A good steganographic algorithm is identified by its performance, measured with the help of parameters such as PSNR, MSE, robustness and capacity to hide information in the cover image. This paper analyzes digital image steganographic algorithms in the spatial and frequency domains.
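Two of the quality measures named above have standard definitions: the mean squared error (MSE) between the cover and stego images, and the peak signal-to-noise ratio, PSNR = 10 log10(MAX² / MSE). The sketch below computes both for small grayscale images stored as nested lists. The 8-bit peak value of 255, the function names and the tiny sample image are assumptions for illustration; the abstract does not say how the paper computes these measures.

```python
import math


def mse(cover, stego):
    """Mean squared error between two equally sized grayscale images."""
    total = 0
    count = 0
    for cover_row, stego_row in zip(cover, stego):
        for c, s in zip(cover_row, stego_row):
            total += (c - s) ** 2
            count += 1
    return total / count


def psnr(cover, stego, max_value=255):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    error = mse(cover, stego)
    if error == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / error)


# Hypothetical 2x2 cover image; the stego image flips one least significant bit.
cover_image = [[10, 20], [30, 40]]
stego_image = [[11, 20], [30, 40]]

distortion = mse(cover_image, stego_image)   # 0.25
quality_db = psnr(cover_image, stego_image)  # roughly 54 dB
```

A lower MSE and a higher PSNR indicate that the embedding changed the cover image less, which is why the two measures are usually reported together.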
Past, Present and Future of Camouflage Image Detection Methods
Sujit Kumar¹ and Chitra A. Dhawale², ¹IMRD, India and ²P.R. Pote College of Engineering & Management, India
ABSTRACT
Camouflaging means blending a foreground texture into the background texture of an image. Camouflage has been used since ancient times in the animal kingdom, where animals hide themselves from their enemies; the concept was later widely adopted by the military and in many other application areas, such as detecting defects in manufactured products, namely carpet wear, tiles, wood, etc. Many researchers have developed techniques to detect the camouflaged portion of an image. However, applying any technique involves a cost, which tends to be proportional to the amount of identification provided; that is why innovation and enhancement are required to get proper output. Camouflage-related work is divided into two categories: the first is the assessment and design of camouflaged textures, and the second is camouflage breaking. In this paper, the authors discuss past work on camouflage and the future use of camouflaged image detection, i.e., camouflage breaking.
Spectral Relevance Coding in Mining for Genomic Application
S.J.Saritha¹ and P.Govindarajulu², ¹JNTUA, India and ²S.V.University, India
ABSTRACT
Most current gene detection systems are bioinformatics-based methods. Despite the number of bioinformatics-based gene detection algorithms applied to the CEGMA (Core Eukaryotic Genes Mapping Approach) dataset, none of them has introduced a pre-model to increase accuracy and reduce processing time across the different CEGMA datasets. The proposed method enables us to significantly reduce the time consumed by gene detection and increases the accuracy on the different datasets without loss of information. The method is based on feature-based Principal Component Analysis (FPCA). It works by projecting data elements onto a feature space, which is a vector space that spans the significant variations among known data elements.
