Debjyoti Mitra's blog

Saturday, 30 March 2013

ITBA Lab Session 10

Assignment 1

Create 3 vectors x, y and z filled with random values, making sure they are of equal length.
Combine them into a matrix: T <- cbind(x, y, z)
Create a 3-dimensional plot of the result.

> sample<-rnorm(50,25,6)

> sample

[1] 30.785023 31.702170 23.528853 18.208267 32.110218 35.820121 32.404731

[8] 24.507976 14.959855 29.919671 27.677203 17.108632 27.514712 20.260337

[15] 26.557483 30.048945 23.540832 15.833124 29.411549 27.037098 29.744451

[22] 28.901576 31.999236 32.641413 24.628705 27.263692 32.895669 27.046758

[29] 20.699581 32.417177 20.637992 20.448817 29.045200 9.706208 19.479191

[36] 19.214362 30.487007 41.029803 26.190709 24.989519 28.134211 25.319421

[43] 22.595737 27.045515 20.529657 36.455755 31.249895 19.290580 24.701767

[50] 24.621257

> x<-sample(sample,10)

> y<-sample(sample,10)

> z<-sample(sample,10)

> x

[1] 30.45576 20.63799 23.52885 20.69958 41.02980 29.74445 31.24990 30.48701

[9] 32.64141 15.83312

> y

[1] 20.69958 22.59574 36.45576 30.48701 30.78502 32.64141 32.40473 24.50798

[9] 24.98952 26.55748

> z

[1] 27.03710 32.40473 27.04676 24.98952 30.04895 24.50798 36.45576 29.04520

[9] 19.29058 30.78502

> T<-cbind(x,y,z)

> T

             x        y        z
 [1,] 30.45576 20.69958 27.03710
 [2,] 20.63799 22.59574 32.40473
 [3,] 23.52885 36.45576 27.04676
 [4,] 20.69958 30.48701 24.98952
 [5,] 41.02980 30.78502 30.04895
 [6,] 29.74445 32.64141 24.50798
 [7,] 31.24990 32.40473 36.45576
 [8,] 30.48701 24.50798 29.04520
 [9,] 32.64141 24.98952 19.29058
[10,] 15.83312 26.55748 30.78502

> library(rgl)  # plot3d() comes from the rgl package

> plot3d(T)

> plot3d(T,col=rainbow(1000))

> plot3d(T,col=rainbow(1000),type='s')
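The session above can be condensed into a short reproducible script. This is only a sketch: the seed and the names `pool_values` and `xyz` are my own choices (they avoid masking `sample()` and `T`, the shorthand for `TRUE`), and the plot is drawn only in an interactive session where rgl is installed.

```r
set.seed(1)  # arbitrary seed, only so that the draws can be repeated

# 50 normal draws with mean 25 and sd 6, as in the session above
pool_values <- rnorm(50, mean = 25, sd = 6)

# Three equal-length vectors, each drawn without replacement from the pool
x <- sample(pool_values, 10)
y <- sample(pool_values, 10)
z <- sample(pool_values, 10)

# Bind them column-wise into a 10 x 3 numeric matrix
xyz <- cbind(x, y, z)
dim(xyz)

# Draw the 3D scatter plot as spheres, one colour per point
if (interactive() && requireNamespace("rgl", quietly = TRUE)) {
  rgl::plot3d(xyz, col = rainbow(nrow(xyz)), type = "s")
}
```

Using `rainbow(nrow(xyz))` gives exactly one colour per point, whereas `rainbow(1000)` generates far more colours than the 10 points need.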

Assignment 2

Read the documentation of rnorm and pnorm,
Create 2 random variables
Create the following plots:
1. X-Y
2. X-Y|Z (introduce a variable z with 5 different categories and cbind it to x and y)
3. Color code and draw the graph
4. Smooth and best fit line for the curve
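As a quick illustration of how rnorm and pnorm relate: rnorm draws random values from a normal distribution, while pnorm gives the probability that such a value falls at or below a cutoff. The sketch below compares the two. The cutoff of 110 and the seed are arbitrary choices of mine, not part of the assignment.

```r
set.seed(42)  # arbitrary seed for repeatability

# 1500 draws from a normal distribution with mean 100 and sd 10
x <- rnorm(1500, mean = 100, sd = 10)

# Theoretical probability that a draw is at most 110 (one sd above the mean)
theoretical <- pnorm(110, mean = 100, sd = 10)

# Share of the simulated draws that are at most 110
empirical <- mean(x <= 110)

round(theoretical, 4)  # 0.8413
empirical              # should be close to the theoretical value
```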

> library(ggplot2)  # qplot() comes from the ggplot2 package
> x<-rnorm(1500,100,10)
> y<-rnorm(1500,85,5)
> z1<-sample(letters,5)
> z2<-sample(z1,1500,replace=TRUE)
> z<-as.factor(z2)
> t<-cbind(x,y,z)
> qplot(x,y)

> qplot(x,z)

> qplot(x,z,alpha=I(1/10))

> qplot(x,y,geom=c("point","smooth"))

> qplot(x,y,colour=z)

> qplot(log(x),log(y),colour=z)
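One detail worth knowing about the commands above: cbind() returns a numeric matrix, so the factor z is stored as its integer codes, whereas a data frame keeps z as a factor. The sketch below shows the difference and then builds the same kinds of plots with ggplot(), since qplot() is deprecated in recent ggplot2 releases. The seed and the object names (`m`, `d`, `p_xy` and so on) are my own, and the plotting part runs only if ggplot2 is installed.

```r
set.seed(7)  # arbitrary seed for repeatability

x <- rnorm(1500, mean = 100, sd = 10)
y <- rnorm(1500, mean = 85, sd = 5)
z <- factor(sample(sample(letters, 5), 1500, replace = TRUE))  # 5 categories

m <- cbind(x, y, z)       # numeric matrix: z is reduced to integer codes
d <- data.frame(x, y, z)  # data frame: z stays a factor

is.numeric(m)   # TRUE
is.factor(d$z)  # TRUE

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # 1. X-Y scatter plot, with transparency to reduce overplotting
  p_xy <- ggplot(d, aes(x, y)) + geom_point(alpha = 0.1)
  # 2. X-Y conditioned on Z: one panel per category
  p_facets <- p_xy + facet_wrap(~ z)
  # 3. Colour-coded by category
  p_colour <- ggplot(d, aes(x, y, colour = z)) + geom_point()
  # 4. Scatter plot with a straight best-fit line
  p_fit <- p_xy + geom_smooth(method = "lm")
  if (interactive()) print(p_fit)
}
```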

Friday, 22 March 2013

Mapping your connections (ITBA Lab session 9)

For this assignment, I used the Facebook application Netvizz and the open-source software Gephi.

Netvizz lets you extract data from different sections of the Facebook platform for research purposes. It creates network files in the gdf format (a simple text format that specifies a graph) as well as statistical files in a tab-separated format.
These files can then be analyzed and visualized using graph visualization software such as Gephi, which is powerful and very easy to use, or statistical tools such as the interactive visualization software Mondrian.
Big networks may take some time to process.
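To give an idea of what a gdf file looks like, here is a rough sketch that parses a tiny one in R. The file content is made up for illustration (it is not from my Netvizz export), and `read_block` is a helper name of my own. It assumes the usual gdf layout of a `nodedef>` header followed by node rows, then an `edgedef>` header followed by edge rows, and it does not handle quoted fields that contain commas.

```r
# A tiny, made-up gdf file held as a character vector (hypothetical data)
gdf <- c(
  "nodedef>name VARCHAR,label VARCHAR,sex VARCHAR",
  "1,Asha,female",
  "2,Bikram,male",
  "3,Chitra,female",
  "edgedef>node1 VARCHAR,node2 VARCHAR",
  "1,2",
  "2,3"
)

# The line starting with "edgedef>" separates the node table from the edge table
edge_start <- grep("^edgedef>", gdf)

# Helper: turn a header line plus its rows into a data frame
read_block <- function(lines) {
  header <- sub("^(nodedef|edgedef)>", "", lines[1])
  # Each header field is "name TYPE"; keep only the name
  col_names <- sub(" .*$", "", strsplit(header, ",")[[1]])
  read.csv(text = lines[-1], header = FALSE, col.names = col_names,
           stringsAsFactors = FALSE)
}

nodes <- read_block(gdf[1:(edge_start - 1)])
edges <- read_block(gdf[edge_start:length(gdf)])

nrow(nodes)  # number of nodes
nrow(edges)  # number of edges
```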
Gephi is an interactive visualization and exploration platform for all kinds of networks and complex systems, dynamic and hierarchical graphs.

Gephi is a tool for people who have to explore and understand graphs. Like Photoshop but for data, it lets the user interact with the representation and manipulate the structures, shapes and colors to reveal hidden properties. The goal is to help data analysts form hypotheses, intuitively discover patterns, and isolate structural singularities or faults during data sourcing. It is a complementary tool to traditional statistics, as visual thinking with interactive interfaces is now recognized to facilitate reasoning. It is software for Exploratory Data Analysis, a paradigm that emerged from the Visual Analytics field of research.

1) I graduated from NIT Durgapur in 2010 (Electrical Engineering). The data that I extracted (using Netvizz) and analyzed (using Gephi) comes from a group called "Tricalites 2006-2010 Batch (NIT DGP)".


2) After extracting the data, I loaded it into Gephi.

The figure below tabulates the data according to Nodes, Sex, Id and Group Id.

3) The next figure shows the connections between the members of the group. From the figure we can see that almost all of them are connected to each other, i.e. almost all of them are mutual friends.

Number of nodes = 51

Number of edges = 995
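These two numbers are enough to check how closely connected the group is. The sketch below computes the graph density and the average number of connections per member, assuming the network is undirected with no self-loops or duplicate edges (which is how I read a friendship network). The variable names are my own.

```r
n_nodes <- 51
n_edges <- 995

# Maximum number of edges in an undirected graph on 51 nodes: 51 * 50 / 2
max_edges <- choose(n_nodes, 2)        # 1275

# Density: share of all possible connections that actually exist
density <- n_edges / max_edges         # about 0.78

# Average degree: each edge contributes to the degree of two nodes
mean_degree <- 2 * n_edges / n_nodes   # about 39 of the 50 other members

round(c(density = density, mean_degree = mean_degree), 2)
```

A density of about 0.78 means that roughly four out of every five possible friendships in the group exist.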

4) The next figure shows the same connections in the form of a cloud.

5) The next figure shows the nodes after the "Fruchterman Reingold" layout was applied.

The figure below has been deliberately zoomed to bring out a better view of the nodes and the connections.

Features of Gephi:
1) Gephi is an interactive visualization and exploration platform for all kinds of networks and complex systems, dynamic and hierarchical graphs.

2) Runs on Windows, Linux and Mac OS X.

3) Gephi is open-source and free.

Applications :

Exploratory Data Analysis: intuition-oriented analysis by network manipulation in real time.

Link Analysis: revealing the underlying structures of associations between objects, in particular in scale-free networks.

Social Network Analysis: easy creation of social data connectors to map community organizations and small-world networks.

Biological Network analysis: representing patterns of biological data.

Poster creation: scientific work promotion with high-quality printable maps.

Friday, 15 March 2013

ITBA Lab session 8

Panel Data Analysis: Analyze the "Produc" data set using three models: Pooled, Fixed Effects and Random Effects.
Then choose the best model using the following tests:
pFtest : between fixed and pooled
plmtest: between pooled and random
phtest : between random and fixed

To load the data
Commands:

> library(plm)  # provides plm() and the tests used below
> data("Produc" , package ="plm")
> head(Produc)

Pooled Model

Commands:

> pool <- plm(log(pcap)~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp) , data =Produc, model=("pooling"), index = c("state","year"))
> summary(pool)

Fixed Effects Model
Commands:

> fixed <- plm(log(pcap)~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp) , data =Produc, model=("within"), index = c("state","year"))
> summary(fixed)

Random Effects Model
Commands:

> random <- plm(log(pcap)~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp) , data =Produc, model=("random"), index = c("state","year"))
> summary(random)

Tests:
Pooled vs Fixed:
H0: Pooled Model
H1: Fixed Effects Model

As the p-value is very small, we reject the null hypothesis in favour of the alternative hypothesis.
=> Fixed Effects Model is accepted.

Pooled vs Random:
H0: Pooled Model
H1: Random Effects Model

As the p-value is very small, we reject the null hypothesis in favour of the alternative hypothesis.
=> Random Effects Model is accepted.
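For reference, the three tests named at the start of this session could be run roughly as follows. This is a sketch that refits the three models with the same formula as above. The object names `form`, `f_test`, `lm_test` and `h_test` are my own, and choosing the Breusch-Pagan variant of plmtest (`type = "bp"`) is an assumption on my part.

```r
library(plm)
data("Produc", package = "plm")

# Same model formula as in the commands above
form <- log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) +
  log(gsp) + log(emp) + log(unemp)

pool   <- plm(form, data = Produc, model = "pooling", index = c("state", "year"))
fixed  <- plm(form, data = Produc, model = "within",  index = c("state", "year"))
random <- plm(form, data = Produc, model = "random",  index = c("state", "year"))

# Pooled vs fixed: F test for individual effects
f_test <- pFtest(fixed, pool)

# Pooled vs random: Lagrange multiplier test (Breusch-Pagan variant)
lm_test <- plmtest(pool, type = "bp")

# Random vs fixed: Hausman test
h_test <- phtest(fixed, random)

# Collect the p-values side by side
c(pFtest = f_test$p.value, plmtest = lm_test$p.value, phtest = h_test$p.value)
```

In each case a small p-value speaks against the simpler model: the pooled model for the first two tests, and the random effects model for the Hausman test.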

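Result:
Both tests above reject the pooled model. The choice between the fixed and random effects models rests on phtest (the Hausman test). Having conducted all the tests, we conclude that the Fixed Effects Model is the best one for the panel data analysis of "Produc".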