(web design) Clients from hell

January 5, 2010 Leave a comment

Lisa just sent me a blog with a very funny collection of anonymously contributed client horror stories from designers:


The really funny part about those stories is that anyone who has worked with non-technical clients has heard more or less the same kinds of silly comments:

  • Client: We like the design, but could you make the blues all the same?
    Me: It’s the same blue throughout the design.
    Client: It looks like different blues.
    Me: That’s because colors are perceived differently depending on neighboring colors.
    Client: That’s stupid.
  • Me: A basic web page will cost around $1000.
    Client: Oh my, that is more than what we want to pay. My nephew is in Vo-Tech and I can get him to do it for $100.
  • Client: Can you make each page on our website fade in and fade out when you click on a link? You know…like PowerPoint?

And if you are working as a software engineer/architect, the stories get even funnier: I once worked as a freelancer for a large organization. Even though they had all the money in the world, they refused to install Oracle, so we had to develop their OLAP solution on top of {a free DBMS with no support for anything that resembles a data warehouse}. The agreed payment was more than satisfactory, so I had no problem implementing everything from scratch. A few months later, our system could perform roll-ups and drill-downs and produce complicated aggregation reports over numerous dimensions. The problem? A month after deployment, they complained because the (super heavy) yearly report (by week, product type, department, and some other parameters) took a couple of minutes to generate! How do you explain to them that typical OLAP operations take more than 10 seconds to finish even on the best-tuned DBMS?
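
At bottom, a roll-up is just a GROUP BY over fewer dimensions, and a drill-down is a GROUP BY over more of them. Here is a minimal sketch using Python’s built-in sqlite3 as a stand-in for that free DBMS; the table, columns, and numbers are invented for illustration, not from the actual project:

```python
import sqlite3

# Toy fact table standing in for the organization's data.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE sales (
    week INTEGER, product_type TEXT, department TEXT, amount REAL)""")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [(1, "food", "north", 100.0), (1, "food", "south", 50.0),
     (1, "tools", "north", 75.0), (2, "food", "north", 120.0)])

# "Drill down": aggregate at the finest grain, over every dimension.
detail = conn.execute(
    """SELECT week, product_type, department, SUM(amount)
       FROM sales GROUP BY week, product_type, department
       ORDER BY week, product_type, department""").fetchall()

# "Roll up": drop a dimension (department) to climb one level up the
# hierarchy. Without native CUBE/ROLLUP support, every level of every
# report is a separate full aggregation pass, which is exactly why a
# multi-dimensional yearly report crawls on an untuned vanilla DBMS.
rollup = conn.execute(
    """SELECT week, product_type, SUM(amount)
       FROM sales GROUP BY week, product_type
       ORDER BY week, product_type""").fetchall()
print(rollup)  # [(1, 'food', 150.0), (1, 'tools', 75.0), (2, 'food', 120.0)]
```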

If our daily encounters were similar to what we face in the “software development business realm”, the result would be hilarious:

The “NoSQL” dispute: A performance argument

December 10, 2009 1 comment

NoSQL is a database movement that began in early-to-mid 2009 and promotes non-relational data stores that do not need a fixed schema and usually avoid join operations. From [1]:

Next Generation Databases mostly address some of the points: being non-relational, distributed, open-source and horizontal scalable. The movement began early 2009 and is growing rapidly. Often more characteristics apply as: schema-free, replication support, easy API, eventually consistency, and more. So the misleading term “nosql” (some call it “not only sql”) should be seen as an alias to something like the definition above …

I just read a very insightful article from Michael Stonebraker [2], one of the pioneers of modern database theory and Relational Database Management Systems (RDBMS), and “father” of systems like Ingres and Postgres [3, 4]: ‘The “NoSQL” Discussion has Nothing to Do With SQL‘ [5]. In this article, published in the blog of the Communications of the ACM, Dr. Stonebraker responds to some of the arguments from the supporters of the NoSQL movement. There are two possible reasons to move away from structured Relational Database Management Systems and adopt an alternative DBMS technology: performance and flexibility:

The performance argument goes something like the following. I started with MySQL for my data storage needs and over time found performance to be inadequate. My options were: … 2. Abandon MySQL and pay big licensing fees for an enterprise SQL DBMS or move to something other than a SQL DBMS.

The flexibility argument goes something like the following. My data does not conform to a rigid relational schema. Hence, I can’t be bound by the structure of a RDBMS and need something more flexible. ….

Focusing on the performance argument, he explains what is more or less common knowledge in the database community: the major performance burden of modern RDBMS comes from all the extra features like transaction processing (especially ensuring the ACID properties), logging, etc., and not from the core engine that executes an SQL query:

… However, the net-net is that the single-node performance of a NoSQL, disk-based, non-ACID, multithreaded system is limited to be a modest factor faster than a well-designed stored-procedure SQL OLTP engine. In essence, ACID transactions are jettisoned for a modest performance boost, and this performance boost has nothing to do with SQL.

In summary, blinding performance depends on removing overhead. Such overhead has nothing to do with SQL, but instead revolves around traditional implementations of ACID transactions, multi-threading, and disk management. To go wildly faster, one must remove all four sources of overhead, discussed above. This is possible in either a SQL context or some other context. …

I believe that anyone interested in the inner workings of modern RDBMS should read this short post from Dr. Stonebraker and continue with a very interesting paper from Harizopoulos et al., “OLTP through the looking glass, and what we found there” [6]. I am getting tired of arguing with people outside the database community who, after writing a couple of SQL queries, think they know everything about modern RDBMS and insist that MySQL’s MyISAM engine is the best engine out there just because it is fast(er), or who question my motives when I advise them that a “Lite SQL” (no reference to any real product) solution is just not enough.

PS. Dr. Stonebraker is not just an exceptional database researcher, but a visionary whose ideas have shaped the modern database landscape. A short excerpt from an article in SIGMOD Record when he received the IEEE John von Neumann Medal [7]:

… The relational data model and its associated benefits of “data independence” and “non-procedural access” were first invented by Ted Codd. However, more than any other individual, Mike is responsible for making Codd’s vision of independence a reality through the architectures and algorithms embodied in the series of open-source prototypes and commercial systems that he has initiated and led. While many others have certainly made important contributions to the field, no one else comes close to his continuous sequence of landmark innovations over a period of almost 30 years.

… Mike has been the primary driving force behind major shifts in the research agenda of the database community, including two occasions where he launched entire new directions for the field to follow (the implementation of relational systems and object-relational systems).

Categories: Databases

Preacquired Account Marketing

December 3, 2009 Leave a comment

Tens of millions of consumers have fallen prey to “free” trial offers and membership clubs offered by preacquired account marketers. These companies insert themselves into your everyday transactions, hoping to trick you into letting them charge your account. Put simply, preacquired account marketing can be defined as [1]:

… The practice at issue is called post-transaction marketing, which involves the presentation of offers during the online checkout process. When done to deceive, these offers typically appear to be a required part of the checkout process, in order to trick consumers into accepting charges for unwanted products or services.

A particularly pernicious form of post-transaction marketing is known as “preacquired account marketing,” a process by which the third-party marketer acquires a customer’s credit card information from the online merchant where the customer is making a purchase.

Using this tactic, the third-party marketer only needs an e-mail address or a click as purchase authorization. The retailer, in effect, sells the customers’ credit card information because the retailer, as a partner, will get a cut of whatever extra charge the customer can be duped or pushed into accepting. …

As Michael Arrington explains in TechCrunch [2]:

… Background: hundreds of well known ecommerce companies add post transaction marketing offers to consumers immediately after something is purchased on the site. Consumers are usually offered cash back if they just hit a confirmation button. But when they do, their credit card information is automatically passed through to a marketing company that signs them up for a credit card subscription to a package of useless services. The “rebate” is rarely paid. …

What shocked me was the report from the U.S. Senate hearing a few weeks ago, which focused on the controversial marketing companies that allegedly dupe consumers into paying monthly fees to join online loyalty programs [3]:

… Vertrue, Webloyalty, and Affinion generated more than $1.4 billion by “misleading” Web shoppers, said members of the U.S. Senate Committee on Commerce, Science and Transportation, which called the hearing. Lawmakers saved their harshest rebuke for Web retailers that accepted big money–a combined sum of $792 million–to share their customers’ credit-card information with the marketers. …

… Many of those who complained say they don’t fear the ad because they aren’t being asked to turn over credit-card information, according to the Senate report. But buried in the ad’s fine print is notification that by entering their e-mail address, the shopper is agreeing to join a loyalty program and allowing the store to authorize marketers to charge their card each month, between $9 and $12. …

Fraud, even of this magnitude, is commonplace on the internet. These kinds of scams rely on the fact that most consumers don’t notice $10-20 charges on their monthly credit card statements, so why bother checking? If you look at the list of companies working with Affinion, Vertrue, and Webloyalty, you’ll find some highly reputable websites that I would never have suspected could be part of such a scam: Expedia, Hotels.com, US Airways, Classmates.com, MovieTickets.com, etc…

… Retailers doing business with the companies are also aware that customers are likely to be angered once they notice the charges but do it because they are paid big bucks. Classmates.com has pocketed $70 million from partnering with all three companies, according to the report. The government says that 19 retailers have made more than $10 million through the partnerships with e-loyalty programs, while 88 retailers have made more than $1 million ….

I have used Expedia and Hotels.com in the past and, even if they turn out not to be guilty, I’ll think twice before using them again. This is a huge reputation hit for such businesses (after all, they are not just a fast-growing gaming company [4], so they should be more careful), but a good lesson for web 2.0 startups. Unfortunately, many companies still approach electronic transactions with a ‘lighter’ moral viewpoint. I am sure that a traditional hotel booking company, for example, would be less willing to risk its reputation through deals of that kind. I believe that the transfer of credit card data in post-transaction offers must be restricted by law, and that improved disclosure requirements and easier charge reversals must be regulated.

Categories: Business, web

Ranking of paid and non-paid advertisements

November 26, 2009 Leave a comment

Ranking paid advertisements

The problem of ranking paid advertisements is more or less solved nowadays (after all the work from Google et al. in this field). I am not saying that it is easy, but you can make a good start with a fairly simple modeling scheme. Let’s look, for example, at a simplified version of the AdWords algorithm:

Ad Rank = {MAX CPC bid OR CPM bid OR just content bid} × Quality Score

OK, so if you pay more per click or per thousand impressions (M is the Roman numeral for “thousand”; CPM is a metric we inherited from “old-fashioned” media like TV and radio) and you have a good QS (Quality Score), then your ad will be ranked higher than other ads and (eventually) presented to users. As you can imagine, the tricky part of this algorithm is the calculation of the QS and, as with search algorithms, it gets better the more users you have giving you indirect (and sometimes direct) feedback [1 – check end of post]. Of course, Google gives a rough sketch of how this QS works [2]:
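
The formula above is easy to play with. The following sketch uses invented bids and Quality Scores (these are illustrative numbers, not real AdWords data) to show the key property: a high QS lets a cheaper bid outrank a more expensive one:

```python
def ad_rank(bid, quality_score):
    """Simplified AdWords-style rank: bid (CPC or CPM) times Quality Score."""
    return bid * quality_score

# Hypothetical ads: (name, max bid in $, quality score).
ads = [("ad_a", 2.00, 4), ("ad_b", 1.00, 9), ("ad_c", 3.00, 2)]

# Ranks: ad_a = 8.0, ad_b = 9.0, ad_c = 6.0. The cheapest bid wins
# because its Quality Score is highest.
ranked = sorted(ads, key=lambda a: ad_rank(a[1], a[2]), reverse=True)
print([name for name, *_ in ranked])  # ['ad_b', 'ad_a', 'ad_c']
```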

… Quality Score helps ensure that only the most relevant ads appear to users on Google and the Google Network. …

Quality Score for Google and the Search Network:

  • The historical clickthrough rate (CTR) of the keyword and the matched ad on Google; note that CTR on the Google Network only ever impacts Quality Score on the Google Network — not on Google
  • Your account history, which is measured by the CTR of all the ads and keywords in your account
  • The historical CTR of the display URLs in the ad group
  • The quality of your landing page
  • The relevance of the keyword to the ads in its ad group
  • The relevance of the keyword and the matched ad to the search query
  • Your account’s performance in the geographical region where the ad will be shown
  • Other relevance factors

Quality Score for the Content Network

  • The ad’s past performance on this and similar sites
  • The relevance of the ads and keywords in the ad group to the site
  • The quality of your landing page
  • Other relevance factors

So, the more users click your ads (historical CTR, overall CTR, etc.) and the better the quality of your landing page, the better your QS, and more users will follow… Recursion [3] in its finest form 🙂 . Again, the tricky part here is the “quality of your landing page”, and it is how Google monetizes its powerful ranking algorithm and, of course, its position as the highest-traffic site on the web [again, check 1].

Also, for completeness, from a post on Google Blog [4]:

… You might also wonder: “But image ads usually take up multiple ad slots — does this limit the amount of money that I can make?” Good question, but no — for an image ad to appear on your site, it has to produce an effective CPM greater than the sum of the individual text ads. …
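
The rule in that quote boils down to comparing effective CPMs. A small sketch (all CTRs, bids, and the image ad’s CPM are invented numbers, not real AdSense figures):

```python
def ecpm(ctr, cpc):
    """Effective CPM: expected revenue per 1000 impressions of a CPC ad."""
    return ctr * cpc * 1000

# Hypothetical slot: one image ad competes for the space that would
# otherwise show three text ads.
text_ad_ecpms = [ecpm(0.010, 0.50), ecpm(0.008, 0.40), ecpm(0.005, 0.30)]
image_ad_ecpm = 10.00  # image ads are typically priced directly in CPM

# Per the rule quoted above: the image ad appears only if it beats the
# combined eCPM of all the text ads it would displace.
show_image = image_ad_ecpm > sum(text_ad_ecpms)  # 10.00 vs 9.70
print(show_image)  # True
```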

Ranking NON-paid advertisements

OK, that was easy, but what about non-paid advertisements? First of all, I must define what I am talking about: why would anybody want to support/offer non-paid ads? The core definition of advertisement has {payment, paid} in it! Non-paid advertisements are to a {enter a commodity/type of business/job here} listing or a local classifieds website (like, for example, Yelp) what organic search results are to a search engine like Google, Yahoo, or Bing. I know you cannot really call Yelp’s non-paid results advertisements, but that’s what they are! And most of the time they are more valuable than any form of advertisement and generate a higher clickthrough rate. I bet that if, for example, my restaurant ranked 1st in the Fast Food category for San Francisco (I have no affiliation with “In-N-Out Burger” 😉 ), I would never need to advertise it anywhere else on the web…

Definition of the problem: Assume a website like Yelp: people add their location and {business/job/offered commodity} (call it whatever you like; I’ll stick with the term non-paid advertisement), and users of the site can search for XXX “commodity” in YYY location. The most important factor for the success of such a website is that businesses cannot buy their ranking. They may be able to buy {sponsorships/ads/referred posts}, but they must not be able to buy a good ranking in the search results.

How can such a site rank results? Think about the problem for a moment: once the user has decided what he is searching for (let’s say fast-food restaurants) and the location(s) he is interested in (“near Union Square, San Francisco”), a successful website must decide how to order the hundreds of results that satisfy this simple categorical query. There are no keywords with which to find the best matches (using well-known information retrieval algorithms) and no CPC/CPM, as all “posts”/ads are free. Only the QS (Quality Score) is left from the AdWords algorithm. So, what are the options?

Results can be ranked according to their reviews, which generate some kind of rating. This is the best solution for businesses/restaurants/etc.: in the end, a restaurant must be good if thousands of people say it is good. And this is the approach most big players, like Yelp, choose to take. Note that in the basic scenario (i.e., if the implementation is reasonably efficient), it is not that different from Slashdot’s sophisticated algorithm for ranking news.
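
One common way to turn raw reviews into a ranking (a textbook technique, not necessarily what Yelp actually uses) is a Bayesian average: pull sparse rating sets toward a global prior so that a single 5-star review cannot outrank a hundred solid ones. The prior values below are illustrative guesses:

```python
def bayesian_average(ratings, prior_mean=3.0, prior_weight=5):
    """Blend a business's ratings with a global prior; with few reviews
    the prior dominates, with many reviews the true average wins out."""
    n = len(ratings)
    return (prior_weight * prior_mean + sum(ratings)) / (prior_weight + n)

one_rave = [5.0]          # a single enthusiastic review -> score ~3.33
many_good = [4.5] * 100   # a hundred solid reviews      -> score ~4.43

print(bayesian_average(one_rave) < bayesian_average(many_good))  # True
```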

But not all listings/classifieds can be ranked like restaurants, bars, or news stories, either because most people do not have the expertise to rate the posts (what do you really know about your doctor’s skill?) or because not enough people have an opinion about each post (how many apartments or doctors do you check each month?). How should a service presenting {doctors / lawyers / etc} or apartments in San Francisco rank its results?

The easy solution is by distance: you give your exact location, and results are ordered by how far away the doctor’s office is. It is a nice visual representation, and one could think it must be the clear winner, especially if the implementation uses a decent map view (most sites I have seen don’t; they just use a tiny Google map). I must say I don’t like this solution at all, as I would rather travel a couple of extra kilometers to find a doctor who will not accidentally kill me (I live in a “not so reputable” area)… So we can keep the map as an interesting component of a hybrid solution, but the problem of defining some kind of Quality Score remains…
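
Distance-only ranking itself is a one-liner once you can compute distances; a minimal sketch using the standard haversine formula (the doctors and coordinates below are invented, roughly placed around Union Square):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(a))  # 6371 km: mean Earth radius

# Hypothetical doctors near Union Square, San Francisco.
user = (37.788, -122.407)
doctors = [("dr_far", 37.760, -122.430), ("dr_near", 37.790, -122.405)]

# "Order by distance ascending" is the entire ranking function.
by_distance = sorted(doctors, key=lambda d: haversine_km(*user, d[1], d[2]))
print([name for name, *_ in by_distance])  # ['dr_near', 'dr_far']
```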

Another solution that old-fashioned media and most web listings/classifieds (like craigslist, “the one to rule them all”) offer is reverse chronological order. “ORDER BY date DESC” is the perfect solution for most classifieds (jobs, selling a kitty, searching for a brunette now, etc.), but what about doctors, plumbers, or apartments? Is an apartment worse than others just because it was posted two weeks ago? I believe this ordering is useful only for limited-time offers and marketplaces.

The solution? I don’t have a clear one right now, but in our new project saini.gr (still in alpha; nothing to see there, plus it is in Greek), which can be summarized as a Yelp for tutors, language teachers, etc., we are experimenting with some new ranking algorithms that will not depend solely on human ratings/reviews or on spatial parameters. I hope we’ll soon end up with an interesting, non-trivial algorithm worth sharing 🙂

Below, I quote some interesting ideas from TeachStreet [5], one of the leading sites for finding tutors, classes, etc.:

  • Claim your profile — teachers who are registered with TeachStreet automatically rank higher!
  • Add more content to your Teacher profile and class pages
  • Include photos, location, time, and skill level for your classes — the more details you provide, the better the search filters work for you.
  • Tell us why you are an expert by adding an extra sentence on your expert list.
  • Write articles (including photos or videos) to show up on our Home Page or on the articles page for your subject.
  • Be active on the site — accept enrollments, update listings, respond to students quickly, and just be an all-around good TeachStreet neighbor — remember that it’s in our best interest to promote our most active teachers, because they’ll provide a great experience for our student visitors.

There are also actions that a teacher can take on a regular basis to give their listings little, short-term boosts in the results. These actions include:

  • Accepting enrollments
  • Replying to messages
  • Creating and sending invoices
  • Adding/updating sessions
  • Requesting reviews
  • Writing articles
  • Adding photos and other media

[1] That’s why it is so difficult for other companies to overthrow Google in this not-so-fair race.

[2] What is ‘Quality Score’ and how is it calculated?

[3] Since we are talking about Google: have you ever searched for “Recursion” on Google? A suggestion appears at the top of the results: “Did you mean: Recursion”. Click it and you can keep requesting the same results and getting the same suggestion over and over again. I love geeky engineer humor!

[4] Ad Rank explained!

[5] TeachStreet: http://www.teachstreet.com/
All About TeachStreet Search blog post
SEO & TeachStreet Search Rank

What is that?

November 26, 2009 Leave a comment

A very sweet short film from Constantin Pilavios:


Categories: Life

Mobile Web 2.0: Opportunities and problems

November 8, 2009 Leave a comment

Today’s smartphones are promoted by their manufacturers as lifestyle tools for sharing experiences and social networking via Web 2.0 sites and mobile-friendly media portals. So, what is Mobile Web 2.0? Just like Web 2.0, it can be defined as the network as a platform, spanning all connected devices. Effectively, the definition of Mobile Web 2.0 boils down to three key verbs: ‘share’, ‘collaborate’ and ‘exploit’.

The emerging Mobile Web 2.0 opportunities can be summarized as:

  • (Context-aware) advertising. In web environments, websites have only your IP address with which to identify you, but on the mobile web your location, unique ID, and profile (don’t forget: most users own just one phone) are also always available.
  • Anywhere, anytime accessibility that will allow real-time social-networking and sharing of user-generated content (information, videos or photos).
  • Voice over IP and instant messaging. Why call your friends or send them a really expensive SMS (if you think about the $$ per byte) when you can make a Skype conference call or message them through MSN, GTalk, or any other available platform?
  • Off-portal services.
  • Location based services and context aware (mobile) search.

Numerous start-ups have entered the field, but even established social-networking companies are getting involved. “People want to take Facebook with them,” said Michael Sharon, a product manager with the company. “So we work with [device makers] to create applications for their phones.” As George Lawton writes in his very interesting article for the IEEE’s Computing Now Exclusive Content [1]:

eMarketer estimates that in the US alone, the number of mobile Internet users will rise from 59.5 million in 2008 to 134.3 million in 2013.

Juniper Research predicts revenue generated globally by Mobile Web 2.0 will grow from $5.5 billion in 2008 to $22.4 billion in 2013.

Social networking is the fastest growing type of mobile Web application, with US usage increasing 187 percent between July 2008 and July 2009, according to Elkin.

But there are also numerous challenges:

  • Bandwidth and device limitations
  • Platform compatibility
  • Viable business models

I believe that Mobile Web 2.0’s future lies in location-based services, supported by context-aware advertising.

What is missing? A (working) mobile platform à la Google AdWords/AdSense that will exploit not only keywords and user profiles, but also the rich context information available about users. I am talking about context-aware data management, presentation of information, and (of course) advertising. What we need is a framework for modeling that kind of volatile information and the thousands of different views/viewpoints we can use to present it. As this topic needs some preparatory discussion, I’ll return with a follow-up post describing the state of the art in context-aware computing and my thoughts on context-aware advertising.
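
As a toy sketch of what such a context-aware ad scoring function might look like (every field name, weight, and value below is an invented assumption, not any real platform’s API), the idea is simply to blend keyword relevance with location and profile match:

```python
def context_score(ad, user_ctx, w_keyword=0.4, w_location=0.4, w_profile=0.2):
    """Hypothetical context-aware ad score: keyword overlap plus
    location and profile match, with illustrative weights."""
    keyword_match = (len(ad["keywords"] & user_ctx["interests"])
                     / max(len(ad["keywords"]), 1))
    location_match = 1.0 if ad["city"] == user_ctx["city"] else 0.0
    profile_match = 1.0 if user_ctx["age"] in ad["target_ages"] else 0.0
    return (w_keyword * keyword_match
            + w_location * location_match
            + w_profile * profile_match)

# Invented ad and user context.
ad = {"keywords": {"coffee", "espresso"}, "city": "Athens",
      "target_ages": range(18, 35)}
user = {"interests": {"coffee", "running"}, "city": "Athens", "age": 28}

# 0.4*0.5 (one of two keywords) + 0.4*1.0 (same city) + 0.2*1.0 (age fits)
print(round(context_score(ad, user), 2))  # 0.8
```

The hard part, of course, is not this arithmetic but modeling the volatile context itself, which is exactly the framework gap described above.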

Categories: web

The Status of the P Versus NP Problem

October 17, 2009 Leave a comment

I just read a very insightful article on the status of the P versus NP problem, which was published in the Communications of the ACM:


The author not only explains the problem in detail and the implications of solving it for modern computer science, but also presents some of the research attempts to solve it over the past decades, along with a compact survey of what is happening right now in the field.