Zhoucen's Blog

Wednesday, 5 November 2014

Chinese word segmentation

In course assignment 2, we practiced text classification, which was introduced in lecture two. The assignment helped me review and better understand the content of that lecture. Before we classify text, we first have to split sentences into words. Word segmentation is a basic step in natural language processing (NLP), and I think it underlies most NLP tasks in social media analysis. In the assignment, we only dealt with English text. In English sentences, words are separated by spaces, which can be regarded as a natural delimiter.

Figure 1 English word segmentation

However, Chinese sentences have no explicit delimiter between words, so splitting them into words is more difficult than splitting English sentences. I looked into this and found three kinds of segmentation algorithms that can be used to split Chinese sentences.

Figure 2 Chinese word segmentation

1. Character matching method

This algorithm matches the Chinese string against a ‘sufficiently large’ machine dictionary according to a certain strategy. If a substring is found in the dictionary, it is identified as a word. By scan direction, matching can be divided into forward match and reverse match. By the priority given to different lengths, it can be divided into longest match and shortest match.
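To make the scan direction and the longest-match priority concrete, here is a minimal Python sketch of forward and reverse longest matching. The toy dictionary, the `max_len` limit and the function names are made up for illustration; a real system would use a far larger dictionary.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Scan left to right, taking the longest dictionary word at each position."""
    words = []
    start = 0
    while start < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - start), 0, -1):
            candidate = text[start:start + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                start += size
                break
    return words


def backward_max_match(text, dictionary, max_len=4):
    """Scan right to left, taking the longest dictionary word ending at each position."""
    words = []
    end = len(text)
    while end > 0:
        for size in range(min(max_len, end), 0, -1):
            candidate = text[end - size:end]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                end -= size
                break
    words.reverse()  # words were collected from the end of the sentence
    return words


# Hypothetical toy dictionary, for illustration only.
toy_dictionary = {"研究", "研究生", "生命", "起源"}

print(forward_max_match("研究生命的起源", toy_dictionary))
# ['研究生', '命', '的', '起源']
print(backward_max_match("研究生命的起源", toy_dictionary))
# ['研究', '生命', '的', '起源']
```

The two scan directions give different results on the same sentence, which shows why the matching strategy matters and why a dictionary alone cannot resolve every ambiguity.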

2. Understanding-based method

This segmentation method lets the computer simulate the way human beings understand a sentence in order to identify words. However, because the Chinese language is so general and complex, it is difficult to organize the various kinds of language information into a form that a machine can read directly. As a result, understanding-based word segmentation systems are still in the experimental stage.

3. Statistical method

As we know, a Chinese word is a combination of Chinese characters. Therefore, the more often a sequence of consecutive characters occurs together in a text, the more likely it is to form a word. Counting how frequently adjacent characters co-occur is thus one way to split sentences into words. However, this method also has limitations: some character combinations with a high co-occurrence frequency are extracted even though they do not form a word.
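The counting idea can be sketched in a few lines of Python. This is only a minimal illustration on a made-up four-sentence corpus; the function name and the corpus are assumptions, and practical systems use much larger corpora and more refined statistics than raw counts.

```python
from collections import Counter


def adjacent_pair_counts(sentences):
    """Count how often each pair of neighbouring characters occurs in the corpus."""
    counts = Counter()
    for sentence in sentences:
        # zip pairs every character with the one that follows it.
        for first, second in zip(sentence, sentence[1:]):
            counts[first + second] += 1
    return counts


# Hypothetical toy corpus, for illustration only.
toy_corpus = ["我喜欢北京", "北京的天气很好", "我的朋友在北京", "我的书"]
counts = adjacent_pair_counts(toy_corpus)

print(counts["北京"])  # 3: a frequent pair that really is a word
print(counts["我的"])  # 2: also frequent, but usually segmented as 我 + 的
print(counts["欢北"])  # 1: a rare pair that merely spans a word boundary
```

The pair 我的 shows the limitation mentioned above: it occurs often, yet it is normally treated as two separate words.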

In summary, Chinese word segmentation is still very difficult. When we actually split Chinese sentences, we may need to combine the three methods above.

Thursday, 16 October 2014

Social network analysis

Social network analysis was introduced to us in the 6th lecture. This analysis method is becoming more and more popular and valuable with the rapid development of social networks. Moreover, it involves multiple disciplines and research areas, such as data mining, knowledge management, statistical analysis, social capital, dissemination of information and so on. I remember an aphorism that says: “Success does not depend on what you know, but on who you know.” I think this maxim explains how important relationships among people are and why we do social network analysis.

Social network analysis mainly focuses on studying the relationships among a group of actors. The actors may be people, communities, organizations and even countries. In order to analyze the social network, we learnt how to draw graphs. In my opinion, drawing a graph is a direct way to find the actors in a network and the relationships among them. We can see intuitively who is at the centre of the relationship network and where potential links between actors lie. Furthermore, we can eventually build a complete relationship network that reflects part of the social structure. The graphs may also reveal deeper relationships between actors, for instance:

1. Kinship: parent-child relationship;

2. Emotion: like, hate, respect;

3. Occupation: official, doctor, programmer;

4. Common property: hobbies, organizations.

Figure 1 Social network analysis graph
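As a rough illustration of how the centre of a relationship network can be found, the sketch below builds a small undirected graph and computes degree centrality, that is, the share of the other actors each actor is directly linked to. The names, the links and the helper functions are all hypothetical, and degree centrality is only one of several possible measures of being central.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected graph as a mapping from each actor to its neighbours."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Fraction of the other actors that each actor is directly connected to."""
    others = len(graph) - 1
    if others < 1:
        # With fewer than two actors there is nobody to be connected to.
        return {actor: 0.0 for actor in graph}
    return {actor: len(neighbours) / others for actor, neighbours in graph.items()}


# Hypothetical relationships, for illustration only.
toy_edges = [("Ann", "Bob"), ("Ann", "Cat"), ("Ann", "Dan"),
             ("Bob", "Cat"), ("Dan", "Eve")]

graph = build_graph(toy_edges)
centrality = degree_centrality(graph)
centre = max(centrality, key=centrality.get)

print(centre, centrality[centre])  # Ann 0.75
```

Here Ann is linked to three of the four other actors, so she comes out as the centre of this toy network.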

We can see that by analyzing a social network, we can learn very detailed information about a person. I felt frightened when I realized this. We easily expose our personal information while we enjoy using social networking services. Therefore, it is necessary to raise our awareness of personal information protection. To protect our personal information, we can do the following things: disable cookies, regularly clean the browser cache, set restricted sites, encrypt important messages and avoid revealing too much personal information on the internet.

Figure 2 Personal information protection

Social network analysis can help us to understand the relationships between people better. If you are also interested in this topic, please share your opinion with me.

Thursday, 2 October 2014

Thoughts from lecture 4

In the 4th lecture, we learnt about sentiment analysis and opinion mining. In my opinion, this is quite an important topic, because it has great value for both individuals and organizations.

On the one hand, it could help individuals to understand what their friends are thinking and feeling, so that they understand their friends more deeply and maintain their relationships better. In addition, it could make us more likely to make the right decision when we are under the influence of others’ points of view. As we know, it is very difficult to avoid being swayed by others’ opinions and evaluations. In the 4th class, Prof. Rosanna showed us how strong the influence of others’ opinions is. She asked the students who arrived early to mislead the students who arrived late into giving the wrong answer to a question. In the end, only a few students stuck to the correct answer. We are so easily influenced by others that we really need sentiment analysis and opinion mining to judge others’ views.

Figure 1 Opinion Survey

On the other hand, for organizations, understanding people’s opinions helps them to improve their products and services and to develop their strategies. One specific application is public opinion analysis. Public opinion is the public’s social and political attitude towards social events. It is the combined expression of the beliefs, attitudes, opinions and emotions that people hold about various social phenomena. When social media become the carrier of public opinion, public opinion has the following characteristics: greater freedom, larger information scale, faster propagation, more participants and more real-time interactivity. These features bring some convenience; however, they also have negative effects. Rumors and incitement can cause extreme comments, irrational diatribes and personal attacks. Consequently, improving sentiment analysis and opinion mining will probably allow trends in public opinion to be grasped and guided better. I think this is one of the issues and challenges faced by social media.

Figure 2 Public opinion

These are my thoughts on sentiment analysis and opinion mining. Please feel free to leave comments and share your opinions with me.

Sunday, 21 September 2014

Some thoughts towards social media

Thanks to the rapid development of information technology, in recent years we have been able to enjoy social networking services anytime and anywhere. Nowadays, social networking sites such as Facebook and Twitter are so popular that they play an irreplaceable role in our daily life. Though I now use social networking services every day, I still cannot forget how excited I was when I first encountered them. I think this is one reason why I enrolled in the social media analytics course as soon as I found it in the course list.

Figure 1. Social media icons

I was excited the first time I used SNS because I found that it gave us a simple and efficient way to keep in touch with our friends and make new friends at the same time. We could easily find old friends whom we had not seen for years. Moreover, we could also make friends with people who share our interests and hobbies, even though we did not know each other before. As a result, we build our relationships almost without noticing, in this magical way. In this process, we may notice something interesting: a person we have just got to know turns out to be a friend of one of our acquaintances. Every time this happens, I think how small the world seems, so I wanted to find an explanation for this amazing phenomenon.

After attending the first 3 lectures, I realized that this is an interdisciplinary course which not only teaches me the relevant theories and knowledge, but also lets me review the social media industry in a broad way. Through this course, I have also learned some basic concepts of social media and the content analysis method. However, I still could not explain the phenomenon I mentioned with my current knowledge, so I looked for other theories related to social media. Finally, I found the Six Degrees of Separation. According to this theory, any two people in the world can be connected in a maximum of six steps through a chain of friend-of-a-friend introductions [1]. SNS can be seen as an application of this idea on the internet. This theory seems very magical and interesting, so I hope that the Six Degrees of Separation will be introduced to us later in the course.

Figure 2. Six degrees of separation
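The number of steps between two people in a friendship network can be measured with a breadth-first search. The sketch below is only an illustration of that idea: the friendship data and the function name are made up, and it says nothing about whether six steps are enough in the real world.

```python
from collections import deque


def degrees_of_separation(friends, start, target):
    """Return the fewest friend-to-friend steps from start to target, or None."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, steps = queue.popleft()
        for friend in friends.get(person, ()):
            if friend == target:
                return steps + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, steps + 1))
    return None  # the two people are not connected at all


# Hypothetical friendship lists, for illustration only.
toy_friends = {
    "Ann": ["Bob"],
    "Bob": ["Ann", "Cat"],
    "Cat": ["Bob", "Dan"],
    "Dan": ["Cat"],
    "Eve": [],
}

print(degrees_of_separation(toy_friends, "Ann", "Dan"))  # 3
print(degrees_of_separation(toy_friends, "Ann", "Eve"))  # None
```

Breadth-first search visits people in order of distance, so the first time it reaches the target it has found the shortest chain of introductions.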

Social media is quite an attractive area that is worth studying and analyzing. I hope that I will have a deeper understanding of this field when I finish the social media analytics course!

Reference:

[1] Wikipedia, Six degrees of separation, [Online],
Available: http://en.wikipedia.org/wiki/Six_degrees_of_separation [21 Sept 2014].