Just wrote the most complicated SQL query I’ve ever encountered. Since the table “clicks” has more than 5 million records, I’m going to wait till tomorrow to get the result.
The reason I have to write this is that I was trying to use a PERL program to do the same, and so far, after two days, it is still running with very slow progress.
create table sqlhistory as
SELECT a.inc, a.clicktime, (SELECT MAX(clicktime) FROM clicks b
where a.keyid=b.keyid and a.adid=b.adid and b.clicktime<a.clicktime
) as bctime, a.keyid, a.adid, a.bid, a.price, a.rank, a.reserve_price, a.reserve_price_new
FROM clicks a
where a.bid<>(select c.bid from clicks c
where a.keyid=c.keyid and a.adid=c.adid and c.clicktime=(SELECT MAX(clicktime) FROM clicks b
where a.keyid=b.keyid and a.adid=b.adid and b.clicktime<a.clicktime
)
limit 1
)
======Update====
Sun Wei suggested a new way to do it, and the productivity increased almost 1000 fold
1.
create table clickorder as
select * from clicks
order by keyid, adid, clicktime
2.
create an auto_increment field called ‘inc’, and a new field called lastbid
3.
update clickorder a, clickorder b
set a.lastbid=b.bid
where b.inc=a.inc-1 and a.keyid=b.keyid and a.adid=b.adid
UPDATE: A web-based version is available. You don’t need to upload the whole file to do the conversion now.
Check out the new version at: http://blog.mikezhang.com/dcolumn/processtable.cgi.
Fortunately, there is a package called dcolumn that helps with this problem.
This dcolumn package “defines a system for defining columns of entries in an array or tabular which are to be aligned on a ‘decimal point’.”
Unfortunately, it is very hard to make the modifications manually on the tables to align the decimal points of numbers. I searched online and could not find a solution. So I created a web service that does this:
Basically, just visit the webpage: http://blog.mikezhang.com/dcolumn/.
There is only one field for you to browse a latex file. Make sure it has extension .tex, otherwise, the system would not accept the file.
Click Submit, you will obtain the converted file with correctly aligned numbers in your tables.
See the screen shot of the finished page:
Now clicking “FILE” will bring you the new file, in txt format. You need to change the extension to .tex to try it out. This intended trouble helps you to avoid overwriting your original file.
Here is how it worked on a paper:
Before:
and After:
It’s that simple:
use DBI;
my $conn = DBI->connect
("DBI:mysql:db","owner","ownerpass") or die("Cannot connect: $DBI::errstr");
$sql=qq(select field from table where id=$id);
@array=@{$conn->selectcol_arrayref($sql)};
Google blog posted an article on “Strengthening the Study of Computer Science”.
It started with:
At a time when more and more digital technologies are becoming indispensable to millions of people, the field of computer science (CS) is in trouble. Enrollment and retention of CS students, particularly those historically underrepresented in the field (women, African-Americans, Native-Americans, and Hispanics) has declined sharply.
I think the sharp decline in enrollment is really an issue, but the inner-economist of mine keeps saying that there is something wrong with the argument that we are able to address the issue by promoting this major to underrepresented people.

I believe in the invisible hand theory of resource allocation. There was definitely an excess in supply of CS students during the bubble years, but then the students shifted attention to other desciplines. We only have certain amount of brain power in the society, and the flow of these brains to different industries is definitely a good thing for the society as a whole. If computer science proves to be playing an ever increasing role in the society, I’m sure the brains will flow back. Just reward these brains accordingly.
Overall, the marginal decision maker should be indifferent to choose between any major if the market is effecient and can reward the students through the market system.
# load the data
> nes96 <- read.table(”http://www.stat.washington.edu/quinn/classes/536/data/nes96r.dat”, header=TRUE)
# load the nnet library which has the multinom function
> library(nnet)
Loading required package: MASS
# let’s change the PID variable into a factor
> nes96$PID <- factor(nes96$PID, labels=c(”Strong Democrat”, “Weak
Democrat”, “Independent-Democrat”, “Independent-Independent”,
“Independent-Republican”, “Weak Republican”, “Strong Republican”))
# fit a multinomial logit model where PID is the response variable,
# and the predictors are: log(popul+.1), selfLR, age, educ, and income
> multinom.out <- multinom(PID~log(popul+.1)+selfLR+age+educ+income,
data=nes96)
# weights: 49 (36 variable)
initial value 1836.939181
iter 10 value 1691.507857
iter 20 value 1603.709441
iter 30 value 1523.540117
iter 40 value 1461.935703
final value 1461.922748
converged
# let’s take a look at the results
> summary(multinom.out, corr=FALSE)
Re-fitting to get Hessian
Call:
multinom(formula = PID ~ log(popul + 0.1) + selfLR + age + educ +
income, data = nes96)
Coefficients:
(Intercept) log(popul + 0.1) selfLR age
Weak Democrat -0.3733563 -0.01153736 0.2976980 -0.024944529
Independent-Democrat -2.2509348 -0.08875096 0.3916628 -0.022897526
Independent-Independent -3.6659051 -0.10596768 0.5735134 -0.014851243
Independent-Republican -7.6136944 -0.09155519 1.2787425 -0.008680754
Weak Republican -7.0604314 -0.09328575 1.3469400 -0.017903442
Strong Republican -12.1051935 -0.14087942 2.0699883 -0.009432601
educ income
Weak Democrat 0.082487696 0.005195818
Independent-Democrat 0.181044184 0.047874118
Independent-Independent -0.007131611 0.057577321
Independent-Republican 0.199828063 0.084495215
Weak Republican 0.216938699 0.080958623
Strong Republican 0.321923127 0.108890412
Std. Errors:
(Intercept) log(popul + 0.1) selfLR age
Weak Democrat 0.6298384 0.03428246 0.09362654 0.006524873
Independent-Democrat 0.7631917 0.03916169 0.10823837 0.007914493
Independent-Independent 1.1565170 0.05703689 0.15854307 0.011331040
Independent-Republican 0.9575695 0.04379006 0.12889466 0.008418690
Weak Republican 0.8443601 0.03935158 0.11718480 0.007611003
Strong Republican 1.0599179 0.04213748 0.14340364 0.008133748
educ income
Weak Democrat 0.07358680 0.01763372
Independent-Democrat 0.08528965 0.02228102
Independent-Independent 0.12628792 0.03361350
Independent-Republican 0.09412459 0.02619610
Weak Republican 0.08500687 0.02297606
Strong Republican 0.09109678 0.02530048
Residual Deviance: 2923.845
AIC: 2995.845
# the results above aren’t too surprising– for instance we see that
# more conservative respondents are more likely to be Republican identifiers
# than Democratic identifiers or independent identifiers. Income has a
# similar effect. Republican identifiers also tend to be better educated
# than other identifiers.
#
# let’s look at some fitted probabilities. To do this, we’ll set all of
# the covariates (except selfLR) equal to their median values and vary
# selfLR from its low value to it’s high value.
> beta <- coef(multinom.out)
> X <- cbind(1, 3.096, 1:7, 44, 4, 17)
> Xb1 <- X %*% beta[1,]
> Xb2 <- X %*% beta[2,]
> Xb3 <- X %*% beta[3,]
> Xb4 <- X %*% beta[4,]
> Xb5 <- X %*% beta[5,]
> Xb6 <- X %*% beta[6,]
> denomsum <- exp(Xb1) + exp(Xb2) + exp(Xb3) + exp(Xb4) + exp(Xb5) +
exp(Xb6)
> p0 <- 1/(1+denomsum)
> p1 <- exp(Xb1)/(1+denomsum)
> p2 <- exp(Xb2)/(1+denomsum)
> p3 <- exp(Xb3)/(1+denomsum)
> p4 <- exp(Xb4)/(1+denomsum)
> p5 <- exp(Xb5)/(1+denomsum)
> p6 <- exp(Xb6)/(1+denomsum)
> plot(0:6, p0, xlab=”Self LR Placement”, ylab=”", ylim=c(0,1), type=”l”,
col=1, pch=16)
> lines(0:6, p1, col=2)
> lines(0:6, p2, col=3)
> lines(0:6, p3, col=4)
> lines(0:6, p4, col=5)
> lines(0:6, p5, col=6)
> lines(0:6, p6, col=7)
> legend(0, 1, legend=c(”Strong Dem.”, “Weak Dem.”, “Ind. Dem.”, “Ind.
Ind.”, “Ind. Rep.”, “Weak Rep.”, “Strong Rep.”), col=1:7, lty=1)
A pleasant surprise:
Past winners:
2007 Award for Best Paper in 2006
Bundling, Unbundling and Pricing of Multiform Products: The Case of Magazine Content by R. Venkatesh and Rabikar Chatterjee (20/2)
Runners-up
A Field Experiment to Assess the Interruption Effect of Pop-Up Promotions by Wendy Moe (20/1)
Designing Marketplaces of the Artificial with Consumers in Mind: Four Approaches to Understanding Consumer Behavior in Electronic Environments by Steven Bellman, Eric Johnson, Gerald Lohse & Naomi Mandel (20/1)
2006 Award for Best Paper in 2005
Can We Predict Customer Lifetime Value by Edward Malthouse and Robert Blattberg (19/1)
Runners-up
Who Are the Multichannel Shoppers and How Do They Perform? Correlates of Multichannel Shopping Behavior by V. Kumar and Rajkumar Venkatesan (19/2)
Collaborating to Create: The Internet as Platform for Customer Engagement in Product Innovation by Mohanbir Sawhney, Gianmario Verona and Emanuela Prandelli (19/4)
Consumers in a Multichannel Environment: Product Utility, Process Utility and Channel Choice by Sridhar Balasubramanian, Rajagopal Raghunathan and Vijay Mahajan (19/2)
2005 Award for Best Paper in 2004
Capturing Evolving Visit Behavior in Clickstream Data by Wendy Moe and Peter Fader (18/1)
2004 Award for Best Paper in 2003
Customers as Assets by Sunil Gupta and Donald Lehmann (17/1)
2003 Award for Best Paper in 2002
Intentional Social Action in Virtual Communities by Richard Bagozzi and Utpal Dholakia (16/2)

Copyright Xiaoquan (Michael) Zhang, 2004-2010. All rights reserved.
All trademarks property of their owners.