Nuwakot, Nepal

Sunday, July 28, 2013

Protection dog training


Guard dogs can be a good option for homeowners who wish to protect their property, but they need training so that they respond correctly in real situations. In most cases, a dog that barks at strangers and alerts you to potentially dangerous situations is enough for home security. Some dogs, however, are much more aggressive when protecting their territory and owners, and for these dogs proper guard-dog training is particularly important.

https://www.airportparkingconnection.com/mco-orlando-airport-parking.aspx

Orlando international airport parking

Orlando airport parking is an airport-parking service that provides parking and transportation facilities for both domestic and international travel. It consists of different offers such as Economy Airport Parking, Park Orlando, International Airport Parking, Action Car Rental, Premier Car Venture, Airport Quick Parking, et cetera. It also offers many competitive features that other airport parking does not.

Friday, October 19, 2012

Rebecca Leeb

Visit the site: http://www.fashonhair.net/blonde-hair-colors/

Wednesday, September 14, 2011

Proposal Defense at CDCSIT

Clustering text with the classical vector space model cannot capture the semantic relatedness of words. Because the traditional model lacks this feature, an enhanced vector space method is proposed. No such research has yet been carried out on opinion mining in Nepal, so this is a leading problem for Nepali researchers who want to work on text clustering in the Nepali language. Algorithms that work for English may not work for other languages. Clustering lets an analyst focus on the clusters that contain the most documents, which saves time when analyzing opinions.

None of the papers reviewed relies on the classical vector space model alone: although it is simple and cheap to compute, it cannot find semantic similarity.

If the classical vector space model is used for clustering, only syntactic structure is taken into account. It cannot recognize, for example, that two sentences have the same meaning when they use different words, because it considers only the surface form of the text. In this thesis, semantically related texts are placed in the same group using fuzzy set theory.

The enhanced vector space model must be tested to check whether it works properly. The combination of fuzzy sets and the classical vector space model is a further subject of study in this thesis.

Sunday, June 12, 2011

My Proposal

Introduction
Text clustering is the process of grouping opinions, reviews, or comments on a particular topic. Clustering can be done with different methods such as k-means, vector space models, or other machine learning algorithms used in language processing. Opinion clustering draws on linear algebra and on the vector space model, a well-known method in text processing.

Clustering has been studied by several researchers [1], [2], [3]. [1] used a modified vector space model in which the inverse document frequency is replaced by the plain document frequency.

Similarly, [2] used the vector space model for blog analysis. They combined different clustering algorithms with the vector space model and found that the fuzzy-based model performed best.

[3] used the vector space model for clustering, together with the kNN method.

Thus far, scoring has hinged on whether or not a query term is present in a zone within a document. We take the next logical step: a document or zone that mentions a query term more often has more to do with that query and therefore should receive a higher score. [4]

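Denoting as usual the total number of documents in a collection by N, define the inverse document frequency (idf) of a term t as follows:
$\mathrm{idf}_t = \log(N / \mathrm{df}_t)$
where $\mathrm{df}_t$ is the number of documents in the collection that contain t.
The tf-idf weighting scheme assigns to term t a weight in document d given by
$\text{tf-idf}_{t,d} = \mathrm{tf}_{t,d} \times \mathrm{idf}_t$ [5]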
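The weighting above can be tried out on a tiny corpus. The following is a minimal sketch, not the proposal's implementation: the function name tfidfWeights, the input layout (document id mapped to a list of tokens), and the choice of the natural logarithm are all assumptions made for illustration.

```php
<?php
// Sketch only: names and input layout are illustrative assumptions.

// Compute tf-idf weights for a corpus given as [docId => [token, ...]].
function tfidfWeights(array $docs): array
{
    $n = count($docs);

    // df_t: the number of documents that contain term t.
    $df = array();
    foreach ($docs as $tokens) {
        foreach (array_unique($tokens) as $term) {
            $df[$term] = ($df[$term] ?? 0) + 1;
        }
    }

    // tf-idf_{t,d} = tf_{t,d} * log(N / df_t), natural logarithm assumed.
    $weights = array();
    foreach ($docs as $id => $tokens) {
        $weights[$id] = array();
        foreach (array_count_values($tokens) as $term => $tf) {
            $weights[$id][$term] = $tf * log($n / $df[$term]);
        }
    }
    return $weights;
}

// Two toy documents, already tokenized.
$docs = array(
    1 => array('this', 'house', 'is', 'nice'),
    2 => array('i', 'eat', 'rice'),
);
print_r(tfidfWeights($docs));
```

A term that occurs in every document gets weight 0 under this formula, because log(N / N) = 0.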

The tf-idf value is then updated by adding a membership value obtained from a Gaussian membership function, and clustering is applied using a threshold value.
For example: TFIDF(new) = TFIDF(traditional) + semantic score
The semantic score is obtained from the Gaussian membership function and is added to a term's weight only if the term belongs to the fuzzy set.
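The update rule can be sketched as follows. The proposal does not say what value is fed into the Gaussian membership function or how its centre and width are chosen, so everything here is an assumption for illustration: the function names, the per-term value x stored in the fuzzy set, and the parameters $center and $sigma are hypothetical.

```php
<?php
// Sketch only: parameter choices and names are assumptions, not from the proposal.

// Standard Gaussian membership function: exp(-(x - c)^2 / (2 * sigma^2)).
function gaussianMembership(float $x, float $center, float $sigma): float
{
    $diff = $x - $center;
    return exp(-($diff * $diff) / (2.0 * $sigma * $sigma));
}

// TFIDF(new) = TFIDF(traditional) + semantic score, applied only to terms
// that belong to the fuzzy set. $fuzzySet maps term => hypothetical value x.
function enhancedWeight(float $tfidf, string $term, array $fuzzySet, float $center, float $sigma): float
{
    if (!array_key_exists($term, $fuzzySet)) {
        return $tfidf; // term is outside the fuzzy set: weight unchanged
    }
    return $tfidf + gaussianMembership($fuzzySet[$term], $center, $sigma);
}

// Hypothetical fuzzy set of near-synonyms with made-up x values.
$fuzzySet = array('nice' => 0.0, 'beautiful' => 0.5, 'charming' => 1.0);
echo enhancedWeight(0.3, 'beautiful', $fuzzySet, 0.0, 1.0), "\n";
echo enhancedWeight(0.3, 'rice', $fuzzySet, 0.0, 1.0), "\n";
```

A term at the centre of the set receives the maximum score of 1, and the score falls off smoothly as x moves away from the centre.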

In our method, suppose we have the following test documents to be clustered:

Documents    Content    English gloss
1    यो घर राम्रो छ    This house is nice
2    यो घर सुन्दर छ    This house is beautiful
3    यो घर मनमोहक छ    This house is charming
4    म भात खान्छु    I eat rice
Figure 1 sample table for clustering

Figure 2 Resultant clustering

Figure 3 Flow chart how the operation works

    The process above is divided into five steps:
1)    Calculate the TF of each term in each document
2)    Add the semantic score to each tf and perform the tf*idf operation
3)    Compute the cosine similarity between the query vector and each document
4)    Apply the grouping rule
5)    Test the condition of the grouping rule to check whether the documents fall in the same cluster.
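Steps 3 to 5 can be sketched as below. The proposal does not spell out the grouping rule, so this sketch assumes the simplest reading: a document joins the cluster when its cosine similarity with the query vector reaches a threshold. The function names, the sparse term => weight vector layout, and the example weights are hypothetical.

```php
<?php
// Sketch only: the threshold-based grouping rule is an assumption.

// Euclidean length of a sparse vector [term => weight].
function vectorNorm(array $v): float
{
    $sum = 0.0;
    foreach ($v as $w) {
        $sum += $w * $w;
    }
    return sqrt($sum);
}

// Cosine similarity of two sparse vectors; 0.0 if either vector is all zeros.
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    foreach ($a as $term => $w) {
        if (isset($b[$term])) {
            $dot += $w * $b[$term];
        }
    }
    $norms = vectorNorm($a) * vectorNorm($b);
    return $norms > 0.0 ? $dot / $norms : 0.0;
}

// Split document ids into those at or above the threshold and the rest.
function groupByThreshold(array $docVectors, array $query, float $threshold): array
{
    $groups = array('similar' => array(), 'other' => array());
    foreach ($docVectors as $id => $vector) {
        $key = cosineSimilarity($vector, $query) >= $threshold ? 'similar' : 'other';
        $groups[$key][] = $id;
    }
    return $groups;
}

// Made-up weights for two documents and a query.
$docVectors = array(
    1 => array('house' => 0.3, 'nice' => 1.4),
    2 => array('rice' => 1.4, 'eat' => 1.4),
);
print_r(groupByThreshold($docVectors, array('house' => 1.0, 'nice' => 1.0), 0.5));
```

The threshold controls how strict the grouping is: a higher value yields smaller, tighter clusters.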

Problem Definition
Clustering text with the classical vector space model cannot capture the semantic relatedness of words. Because the traditional model lacks this feature, an enhanced vector space method is proposed. No such research has yet been carried out on opinion mining in Nepal, so this is a leading problem for Nepali researchers who want to work on opinion clustering in the Nepali language. Algorithms that work for English may not work for other languages. Clustering lets an analyst focus on the clusters that contain the most documents, which saves time when analyzing opinions.

Objective
    The main objectives of this research work are:

To cluster Nepali texts
To find the semantic and syntactic relevancy of the texts
To observe the relation between the semantic score and the number of clusters.

Research Methodology
Data preparation
For this research, the test data are written in Nepali using Unicode input software and are tested using PHP and MySQL.

Performance evaluation
For performance evaluation, measures such as the Rand index, precision, and recall are taken into consideration.
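As an illustration of these measures, the sketch below scores a clustering against reference labels by counting document pairs. It assumes the pair-counting definitions of precision and recall, which are one common choice for clustering and are not stated in the proposal; the function name and the label arrays are hypothetical.

```php
<?php
// Sketch only: pair-counting definitions are an assumption.

// $truth and $predicted map the same document ids to a label.
function pairCountingScores(array $truth, array $predicted): array
{
    $ids = array_keys($truth);
    $n = count($ids);
    $tp = $fp = $fn = $tn = 0;

    // Classify every unordered pair of documents.
    for ($i = 0; $i < $n; $i++) {
        for ($j = $i + 1; $j < $n; $j++) {
            $sameTruth = $truth[$ids[$i]] === $truth[$ids[$j]];
            $samePred = $predicted[$ids[$i]] === $predicted[$ids[$j]];
            if ($sameTruth && $samePred) {
                $tp++; // together in both
            } elseif ($samePred) {
                $fp++; // together only in the prediction
            } elseif ($sameTruth) {
                $fn++; // together only in the reference
            } else {
                $tn++; // apart in both
            }
        }
    }

    $pairs = $tp + $fp + $fn + $tn;
    return array(
        'rand' => $pairs > 0 ? ($tp + $tn) / $pairs : 0.0,
        'precision' => ($tp + $fp) > 0 ? $tp / ($tp + $fp) : 0.0,
        'recall' => ($tp + $fn) > 0 ? $tp / ($tp + $fn) : 0.0,
    );
}

// Reference: documents 1-3 belong together; the prediction splits them.
$truth = array(1 => 'a', 2 => 'a', 3 => 'a', 4 => 'b');
$predicted = array(1 => 'x', 2 => 'x', 3 => 'y', 4 => 'y');
print_r(pairCountingScores($truth, $predicted));
```

For this example the six pairs break down into 1 true positive, 1 false positive, 2 false negatives and 2 true negatives, giving a Rand index of 0.5, precision of 0.5 and recall of 1/3.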

Expected output

The expected output will look like this.
Suppose we are given documents such as:

Documents    Words    English gloss
1    विशाल    huge
2    ठुलो    big
3    नराम्रो    bad
4    खराव    bad

    Figure 4 sample table for clustering

The enhanced vector space model is expected to group these documents into at most two clusters by meaning.

Figure 5 resultant clustering

Working Schedule

Activities (weeks)
    Study and analysis
    Data collection
    Implementation
    Testing
    Documentation
    Review
    Presentation

Figure 6 Working Schedule

References
[1] Abdul-Rub, Mohammed Said, "A modified vector space model for protein retrieval", IJCSNS, Vol. 7, No. 9, 2007.
[2] Ho, Chi-Shu, "Blog analysis with Fuzzy TFIDF", Master's Project, San Jose State University, 2007.
[3] Jaiswal, Mayank Prakash, "Clustering Blog Information", Master's Project, Paper 36, 2007.
[4] Shin Kwangcheol, Abraham Ajith, Han Sang Yong, "Improving kNN Text Categorization by Removing Outliers from Training Set", 2006.
[5] Emre Esin Yunus, "Improvement of corpus-based word similarity using vector space model", Master's Thesis, Middle East University, 2009.

Tuesday, May 24, 2011

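Counting the number of documents in PHP

<?php
// Count the entries in a directory, skipping ".", ".." and ".svn".
function countFiles($dir){
    $files = array();
    $directory = opendir($dir);
    if ($directory === false) {
        return 0; // the directory could not be opened
    }
    // Compare with false explicitly so an entry named "0" does not end the loop
    while (($item = readdir($directory)) !== false) {
        // We filter the elements that we don't want to appear ".", ".." and ".svn"
        if (($item != ".") && ($item != "..") && ($item != ".svn")) {
            $files[] = $item;
        }
    }
    closedir($directory);
    return count($files);
}

echo countFiles('/usr/local/'); // outputs the number of entries, e.g. 27
?>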

Information retrieval and tokenization in php

Information retrieval (IR) is the area of study concerned with searching for documents, for information within documents, and for metadata about documents, as well as that of searching relational databases and the World Wide Web. There is overlap in the usage of the terms data retrieval, document retrieval, information retrieval, and text retrieval, but each also has its own body of literature, theory, praxis, and technologies. IR is interdisciplinary, based on computer science, mathematics, library science, information science, information architecture, cognitive psychology, linguistics, and statistics.[Wiki]

Tokenization is the first step of preprocessing in Information Retrieval. Tokenization is the process of breaking a stream of text up into words, phrases, symbols, or other meaningful elements called tokens. The list of tokens becomes input for further processing such as parsing or text mining. Tokenization is useful both in linguistics (where it is a form of text segmentation), and in computer science, where it forms part of lexical analysis.[Wiki]

The following code shows the simple steps of tokenization in PHP. It is easy to understand and can be run directly, but it must be placed on a web server, e.g. in htdocs (XAMPP or LAMPP).
OK, this is the code.

<?php
// Tokenization function: returns an array of token => frequency
 function tokenization($text){
  // Remove punctuation from the text
  $text = preg_replace('/[?!.,()*]|[-]|\'/','', $text);

  // Convert text to lower case
  $text = strtolower(trim($text));

  // Tokenization: split on runs of whitespace and drop empty tokens
  $word = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

  // Count how many times each token occurs
  $freq = array_count_values($word);

  // Sort the results of tokenization based on the largest frequency
  arsort($freq);

  // Returns the result of Tokenization
  return $freq;

 }

 $news = "Information retrieval (IR) is the area of study concerned with searching for documents, for information within documents, and for metadata about documents, as well as that of searching relational databases and the World Wide Web. There is overlap in the usage of the terms data retrieval, document retrieval, information retrieval, and text retrieval, but each also has its own body of literature, theory, praxis, and technologies. IR is interdisciplinary, based on computer science, mathematics, library science, information science, information architecture, cognitive psychology, linguistics, and statistics.";

 $result = tokenization($news);

 // Print the result, one token per line, followed by the original text
 echo "Result News<br>\n";
 foreach($result as $key => $val) {
  echo "$key = $val<br>\n";
 }
 echo "$news<br>\n";
?>