July 09, 2009

Attention Web 3.0 Start-Ups: Redirect 25% of your R&D Budget to Market Understanding and Development

In my observations on the Semantic Technology conference last month, I mentioned I would be blogging separately about the dire lack of investments by Web 3.0 start-ups in market understanding and marketing.

Before I start, let me signal some conflict of interest here: in the course of my job, I lead efforts on market understanding and positioning, brand awareness and lead generation. The idea is that developing my market is in the interest of start-ups with a stake in Web 3.0 and the semantic web. I get more business when they do :) But please do discount my words for this bias, and decide for yourself, when all is said and done, whether we have a solid case here or not.

My main claim with this post is that companies in the Web 3.0 and semantic web space are downright bad at filling unmet needs, packaging their offer effectively and cutting through the market noise to secure users and revenues. It even appears this incapacity to grow revenues has a direct impact on their ability to make money (ok, it's sunny outside, after five days of clouds there is no harm in having a little fun, is there? ;)

The large and growing number of players at SemTech told me that we are already in a crowded market. And yet: few real products, and very few sales. Most of the people there live on R&D funding from public institutions and overtrusting angels, not client receivables.

Why?

As a VC said during his talk, "this is a very early-stage market".

Allow me to translate.

That's code for "I don't get it, what are you guys selling here? Better fix that quick if you want me to come back next year"

This has been an early-stage market for over 8 years, which by my estimate is 1,302 web years. And it is going to remain so, as long as players prefer to hire one more Java "guru" as opposed to investing properly in market research, a business model and some marketing.

It was obvious, just going around, attending presentations and talking to people, that the business side of those companies is generally in an advanced state of starvation.

So many companies that presented couldn't enunciate any reason to use their products in an hour, let alone an elevator pitch. Too many had inexpressive and even customer-hostile names (ask me for examples by email...) and geeky logos that don't talk to their audience; too many, to present their product, sent technology experts apparently missing the "communication" function; too many didn't have a clue about their market or how to penetrate it, and had given no thought as to why, when and how often anybody would use their "solution". Please, let's call it a technology.

If Web 3.0 companies are eager to become mainstream, it didn't show in San Jose.

I brought up the problem of underinvestment in market understanding and marketing at my talk as well as with a few start-up executives. The most common response was that resources at start-ups are limited. I know they are, but I don't buy it when I see the same companies already employing a dozen folks working on the product development side and hiring extra R&D FTEs. What's the ROI of that as opposed to taking one dedicated business person on board (and it doesn't even have to be full-time to start paying off)? I can recognize a case of feature and scope creep when I see one, and this is the supersized version of it. "Just one more feature!"

The reasons for that are diverse, and have a lot to do with the academic and R&D origins of this stuff. I'll expand on this in a broader post on my other blog To Revenue at some point, and explain how I think even fundamental R&D is moving towards a more market-driven approach. Suffice it to say, most if not all of the investment for the semantic web went into R&D and technology development to build the tools (standards and applications), and nothing, or very little, to test and refine our initial understanding of the market problems these technologies would solve, and evaluate the best way to solve them.

Luckily, the solution is simple: as a good rule of thumb, I’d like to see every start-up in the web 3.0 and semantic web space redirect at least 25% of its R&D expense to consumer research, brand awareness, partnerships and marketing. All good things that will make a start-up stronger if they don't kill it (better that than a slow, painful agony).

As I see it, the development of any market solution is an exercise in bridging the gap between two things: technological possibilities and market needs. Focus on one at the expense of the other, and you'll quickly find yourself in la-la land.

I can't keep count of the number of start-ups I come across that have gone too far along the product development curve and find themselves with a technology they don't know how to sell, or to whom. Invariably, they haven't tested the waters prior to funding development or continued to tune that development to meet real needs.

Why risk a reality check? It was just all too exciting at the time.

So curb your enthusiasm and get yourself a suit, with that free business budget you just found.

 

PS. Business folks need to explore ways to deliver more value, too. It takes a special kind to be effective at a tech venture, and the rapid changes in marketing also make it challenging to deliver business results if you come from a traditional marketing background. I talked about the difficulties in balancing the tech and biz needs and cultures here, and I will tackle the marketing revolution in a future post.


July 08, 2009

Video of the Semantic Web Gang at Semantic Tech 2009

Here is the video of the live podcast we recorded at the Semantic Tech conference, courtesy of Semantic Universe and Paul Miller, who arranged the recording.

The sound is a little lacking, so plug in your earphones to hear better. An MP3, which Paul will post soon, will also be available on the gang's page.

Disclaimer: I am not responsible for putting my own pic on the video screenshot!

Lastly, my company name was misspelled "Growthrate" in the introduction, which is funny. Maybe I'll use that if I launch some economic analysis subsidiary :)

PS. you can also find an MP3 for the panel I moderated on Business Models here

The Semantic Web Gang looks back at SemTech 2009 from Semantic Universe on Vimeo.


July 07, 2009

Will We Soon See a Rally on Web 3.0 Start-Ups? My Take on Semantic Tech

About 3 weeks ago, I left a warm Canadian sun to fly south to San Jose's chilly weather, and attend the Semantic Technology conference. Or, let's be fair, 1.5 days out of the five the event lasted. The rest, mostly, I followed through Twitter, which by all accounts tends to be a little like watching the Olympics on TV, as opposed to being in the stadium: you don't get the popcorn and hot dog breeze, but in the end you know more about the games than people who were there (and not on Twitter too... it quickly gets confusing.)

Of course, the big objective at "Sem Tech" is networking. Thanks to the nice folks of Semantic Universe (thank you Eric and Steve), I managed to make it, last minute, onto the exclusive list of VC Connect, a cocktail reception which gathered both fundraising CEOs and investors (of whom more attended than I thought there would be. I guess the free hors d'oeuvres did the trick). I struck up most of my great conversations there (apologies to those I met outside of VC Connect!)

So, my impressions of the event. I know those are late but I'm not competing with news organizations, just with analysts, so that's ok!

Overall, it was sort of "hot and cold": there were lots of very positive signs and, by all admissions, there is still a lot to do before Web 3.0 and the Semantic Web truly happen.

Key mental notes I made to myself:

  • Fully endorsed: all the godfathers are now at the table. This event marked the first participation of Google, confirming the company's mood reversal after the tough words Eric Schmidt had for the field a while back. Anyway, what matters is that the 800-pound gorilla now sits next to Yahoo's SearchMonkey and King Kong Microsoft, putting their seal of approval on the effort, and all the credibility that comes with it.
     
  • Market traction (which some call hype): attendance was up by 16%, which, in these macro-economic conditions, is no small feat. Lots more companies are asking what they should do to get ready for web 3.0 and the semantic web, and visiting the conference to understand. I don't know whether they got the answers they hoped for, but they were there and that's a first step!
     
  • Real products in B2B: there were some real, commercialized semantic products, especially in the B2B space, such as those of Cambridge Semantics, and products in that space are starting to use Linked Data and integrate information into the Semantic Web.
     
  • B2C disappointment: on the consumer side, of course there was Wolfram Alpha, which is certainly interesting but ultimately not earth-shattering. But with the exception of search engines, I found that little was new in B2C. SIRI was presented as a "game-changer", but with all due respect to Tom Gruber, I believe it is way too early for "Virtual Personal Assistants" to take off. Technology is not ready to support such a value proposition, all the hype will only help create expectations that can't be fulfilled, and because of this it is likely to fall flat in the market.
     
    I saw nothing that took advantage of Linked Data (URIs) and the stack (RDF etc), besides those few apps we already know about, i.e. Zemanta, Glue... Even then support is limited (yet to come for Glue, I believe). Clearly the consumer space is still quite a bit in R&D mode (yawn...), and what I came across felt somewhat unfocused and confusing too. Hopefully more will happen on that side in the coming months, with the technologies maturing.
     

Sem Tech also showed me there were still challenges, at 4 levels:

  • Incomplete technology: linked data now seems mature enough to create commercial apps, but as I described in previous posts there is a huge risk of GIGO (garbage in garbage out), as text analysis is lagging behind. Also, the general focus on aggregation is diverting resources from creating smarter applications from a user standpoint. Nothing I saw at SemTech made me feel much better about those issues, although I believe there are efforts taking place in text analysis that may yield good results going forward.
     
  • Usability deficit: way too many apps are still trying too hard to have machines do all the work. A majority of CTOs have a hard time internalizing that machines are still stupid. They are. AI won’t be here for another few decades. In the meantime, companies should focus on higher usability.

    I was delighted to see this theme emerge strongly at SemTech, when the search engine panel advocated turning the search experience into a "conversation" with the user. By that they meant asking the user for clarification when there is ambiguity, for instance, or too many answers to be practical. Besides improving the user's experience, dialoguing with her and collecting the results intelligently would really enable the creation of rich semantic engines over time.
     
  • Underinvestment in market understanding and marketing: this is worth its own post... Stay tuned, I'll publish it in two days. Let's just say this is a huge issue in the space, and while I witnessed some progress at Sem Tech this year, we are faaaar from the levels required to make this R&D haven a business.
     
  • Forget VCs, unless...: I was also hoping to see significant growth in the number of VCs and strategic investors at Sem Tech this year, and yet there seemed to be very few of them around. I could count maybe a dozen on the participant list, and met only about five of them. Interestingly, all said they would not invest in anything “semantic web”, but in solutions to problems, and they kept repeating that they saw the main problem with companies at Sem Tech as being a lack of focus on a consumer or company problem (see, I'm not the only one!)
     
    I am not exactly sure what to think of that, though, coming from VCs. On the one hand, I obviously agree with them about the prevalent lack of a market-centric approach in the semantic web space. Yet, on the other, there are a number of companies with great semantic technologies just waiting to be properly positioned and packaged by a clever business team. What do VCs do if not help with that?
     
    After all, if you ask them, VCs don't invest in "Web 2.0" either. Yet in the meantime, they are still funding the next social network...
     
    As was pointed out to me by a Gold sponsor of Sem Tech, even the best of VCs, such as my friends (before this post, that is!) at Intel Capital, end up sending confused signals to entrepreneurs: case in point, while professing on the Sem Tech podium to be only investing rationally in companies benefiting Intel, Intel Capital put money in companies with very fuzzy problem-solving potential and a very, very diffuse contribution to Intel sales. Such as Bragster, which I can't call anything other than a stupid social network, very unlikely to return any money to its investors. If Bragster is helping sell chips, what isn't?!
     
    So, when VCs don't follow their own advice, how do you blame entrepreneurs for not doing it either?
     
    But there is "hope".
     
    Since VCs pretty much invented irrational exuberance and benefit from it (believe it or not, I am fond of the VC model - I just wish there was a little more self-discipline in the field), we could very well see a rally on the Semantic Web and Web 3.0 this year or the next, that would of course temporarily put aside question marks around "market problems" etc... VCs need focal points like that, and since cleantech investing seems to be taking a breather, the web 3.0 is the next best candidate.
     
    Having said that, I am not encouraging any company to forget about tackling a real market need that they can get paid for. In any case, the best path for many of the semantic technology start-ups is not to bet on a VC rally on Web 3.0 but rather to assume there is little investment money in the field. In other words, forget fundraising, and focus on revenue and profitability from day 1. If you do that properly, VCs may soon come knocking at your door (assuming they recover from their financial crisis wounds), and you will be ready for any rally.
     

Such minor considerations apart, it looks like things are moving in the right direction, although still a little too slowly on the business side. At Sem Tech, I was a bit let down by the limited range of new, innovative solutions solving a widespread or even a narrow problem, but I think that the stars are slowly aligning and we may see an acceleration wave. Especially if rapid improvements are made in technologies to structure unstructured data, if an increased focus is placed on revenue model, usability, and market awareness, and if you follow all the advice in this post and refer it to 10 other people...

During the live Semantic Web Gang podcast we recorded at Sem Tech, which will be available here when it is published, I said that business people should come in droves to Sem Tech next year and take things over from tech folks. While this was deliberately provocative, I think there is a huge imbalance in the field at the moment and it is incumbent upon CTOs and other tech CEOs to rebalance their efforts, open the gates and welcome more business types in. Opening up the semantic web is, in my mind, the key to success. Rendez-vous next year to see how well start-ups have fared.


July 03, 2009

Tech vs. Biz Type Divide: Tackling the Number One Product Failure Driver Headfirst

I am back after quite some time, my ability to blog being constrained by business requirements. I am indeed happy to report that Growthroute's activity is growing rapidly, with clients in the web 3.0 space in particular. I know, I know, I have not even published my thoughts about semtech yet. I was there for less than 2 days, but there were some interesting things I took away that I have not seen expressed anywhere yet, so I will blog briefly about that very soon.

In the meantime, I have just published the slides of a beta presentation I gave at Communitech, the innovation and entrepreneurship hub in the IT mecca that is Waterloo, Canada -- and I believe it should be of particular interest to ventures in the semantic web and web 3.0, since it tackles the gap between techies and business people, discusses how that affects commercial success, and explores potential solutions. The post and slides can be found here. I welcome all constructive feedback, with few limitations on what constructive means.


June 15, 2009

Excellent Video on Web 3.0 and the Semantic Web

The following video was pointed out to me recently and I find it to be a terrific overview of the current evolution of the web:

I got this from here. I don't know about the European Union claiming to spearhead the next web but, that detail apart, it is good material.

June 12, 2009

Will 'Common Tag' Help a Publishing Assistant Alter the Web Forever?

This could be a giant leap forward for the web... toward a useful giant global graph.

Common Tag, which was released yesterday, is a logical extension of Linked Data. In a nutshell, it offers an accessible and open standard to incorporate semantic tags in web content. It is supported by a range of up-and-coming semantic players, the most notable among them being Yahoo SearchMonkey.
That, in itself, is a solid step forward. Its simple set of standards adds the necessary muscle to the RDFa skeleton. Common Tag ties tagged concepts to URIs, defined addresses centralized in a handful of trusted repositories, yet even advocates straightforward cut-and-paste to tag HTML. Simple and powerful.

This is no technological breakthrough, all the pieces were already present before, but just like coins without a banking system, RDFa and Linked Data needed an easy-to-understand and easy-to-communicate mechanism to add semantics to content. Now we have it.

But wait, there is more.

And in my opinion, that could be web-changing.

The common user is about to be empowered with an easy tool to add semantics to their content, because one of the companies behind Common Tags, by outputting Linked Data in a standardized format, is solving the last problem standing between us and a more semantic web: the user interface.

Building on top of its friendly integration with mainstream web publishing tools, Zemanta, the "publishing assistant" application, is now making it possible for users to integrate semantic tags quickly and easily into their content. And, if I understood that correctly, it is the only company behind Common Tag to do so. If not the only one, at least the user-friendliest. Smart.

To do that, Zemanta reads your content, and makes recommendations of common tags for different concepts. In other words, it transforms unstructured data into structured semantic data that's quite precisely defined and linked out at the concept level. Quite precisely, because it doesn't rely just on algorithms, or just on humans, but on both. Don't be fooled: quite precisely is a huge improvement over existing solutions... As a human, I can now afford to take some time to add precision, because a lot of the groundwork to identify relevant tags and their addresses has been done for me.

Today, it works for blogs (except Wordpress as it strips out RDFa, but they'll have to change that...) and emails (I'm just guessing about that last one since Zemanta does email too). Tomorrow... mmm, let me guess... it will work on webpages and other content too.

No surprise Andraz Tori of Zemanta is the driving force behind this initiative...

Coupling its easy-to-use interface with Common Tags, Zemanta makes semantic data finally accessible to non-programming folks, i.e. most of the web publishing world. Which ultimately means, if it continues to execute as nicely as it has in the past, a lot of web 2.0 users. World domination is near. Think "Enhancing the world's knowledge".

Two things stand in the way. First, competition. OpenCalais so far has proved to be much less mainstream-friendly (sorry to my friends there, but I think they know that - they also focus much more on the B2B market as their recent deal with CNET shows), but they could come up with a more accessible interface for tagging webpages with what I'm just conjecturing might be more precise tag recommendation algorithms. Joe the web user would adopt that to enhance his web content. Many other application developers will target this new market.

Second, the system gives humans an incentive to tie their content to the right tags: being found. But that's a two-sided carrot. Remember meta tags? What makes Common Tags any different? The URI linking is no quality control.

Let me ask this: in your opinion, will Common Tags survive the test of system-gaming by users using the wrong tags for their content? Could it work if a third-party application was the one tagging your content, using some CSS-type tags instead of embedded tags (nothing prevents Common Tag CSS)? It could do it as well through a mix of both algorithms and human inputs.

Thanks to a tag I added to an obscure post I made some time ago (I know, that sounds as credible as a Bill Gates ad for open source), I can make the bold claim of having long been calling for a "semantization" engine to help us easily add meaning to existing and new content (I even bought the domain semantized.com in case I was to launch that myself, you know, one day...) Turns out I was wrong. But not by much: what we got today, which can change the web as we know it, ladies and gentlemen, is a zemantization engine.


May 26, 2009

Search: Statistics vs. Semantics. And so the Battle Begins...

The Semantic Web gang gathered this month to discuss the recent launch of Wolfram Alpha and the endorsement of RDFa by Google.

My impression of Wolfram, to talk about it for a second, is that it fills a clear white space in the search engine arena, a space I would divide up into 2 sub-fields:

  • FIND: when you seek a specific, well-defined piece of information, you're in FIND mode. IMHO, that's a task in which Google's supremacy is fast eroding. If I seek a precise answer to a question, say the names of the different provinces in India or all the movies in which Sharon Stone played (not that I'd ever look for that), I tend to rely less and less on the search engine gorilla. I either go directly to Wikipedia (although it's a little like Google in that it's often serving me 'too much information'), use vertical databases (such as IMDb for movies), or land directly on more targeted search engines such as Powerset or, now, Wolfram, which impressed me.
    Granted, I sometimes still use Google to access Wikipedia. But the point is, Google is not my exclusive entry point to the web in that scenario. So Wolfram may well have found a key weakness to exploit, as the statistical approach *may* not be ideally suited to this task. Will Wolfram steal a significant volume of clicks from Google? I don't know, a lot of that comes down to execution, but there is no denying it found a crack in the shiny armor.
    On top of that, the trick of using incoming links as a key determinant for search result relevancy is destined to be supplanted by approaches letting the machines interpret the content of those pages itself (read: semantics), and ultimately using that as the primary relevancy driver. Remember, incoming link data is just a proxy for popularity: and like any intermediary, it is destined to be cut out of the web search food chain sooner or later. Popularity is useful as a driver when I want to see what others read. That's not very useful when I seek some specific answers.
    Once a machine is able to figure out with high accuracy what a page is about, and furthermore, what each piece of data in that page is about (a problem services like OpenCalais and Twine are working hard to crack), it becomes much more precise at serving me what I am looking for, and only that. Given that currently, I still have to go through 95% of information I don't need to find the 5% I need, I say such improvement is more than welcome, and Google better watch out for better mousetraps.
     
  • DISCOVER: the other use of search engines is general surfing on a more or less well-defined subject, to DISCOVER interesting content. Google still dominates that activity in my day-to-day use (alongside Wikipedia, I'd say). It's pretty good at showing me things I didn't think of, from a variety of sources, and letting me explore as I wish.
    Say I'm looking to learn about ocean navigation tools under the Roman empire: Google would be my first stop. There are many competing services for that task, but nothing yet replaces the one-stop-shop that Google constitutes for that activity. Whatever I want to learn more about, I pretty much know I'll get interesting pages from Google, ranked with an algorithm I am ready to trust.
    Over time, I could see this Google activity being threatened too. But the popularity-based approach strikes me as something that's likely to endure, and statistics are pretty good at determining popularity. Better than semantics, that is.

Ultimately Statistics and Semantics won't be used in isolation from one another; and the two activities above are not black and white either, there is a lot of grey area, with a mix of FIND and DISCOVER. But overall, I bet we could divide a good chunk of searches between those two activities. So the logical follow-up question is: how do our searches divide up between them, volume-wise? I don't know. If anyone has information on that, please share. In the meantime, I'll be watching my own searches...

All this to say I'll be listening with interest to the review of Wolfram Alpha and Google’s adoption of RDFa by the Semantic Web Gang, which I wasn't a part of this month due to a prior client commitment following the exciting Web 3.0 conference (which I'll discuss later). The Semantic Web Gang's recording can be found here.


May 24, 2009

Launch of the Toronto Semantic Group and First Meeting

I am glad to announce the launch of the Toronto Semantic Meetup Group, at last! My friend William Mougayar, CEO of Eqentia, has taken the initiative to put that together and asked me to join him as a co-organizer, a role I am glad to take on. I had been envisioning the launch of this group for a while but the logistical challenge of running a group in TO while living quite far from the city, coupled with work commitments, had kept me from acting. So I am especially thankful to William, and will support him in any way I can. After the successes of the Palo Alto and New York semantic groups, it became clear that Toronto was in dire need of a forum for our small-but-fast-growing community of interest.

Our first meetup will take place this Wednesday at 6PM at Xtreme Labs, 67 Yonge Street 16th floor, Toronto ON M3H 6A7. After an introduction to the group by William, I will be giving a presentation, "Semantic Web 101", and discussing ways start-ups can succeed in the space. William and I will also report on the recent Web 3.0 conference we both went to. There will be an extensive Q&A and networking opportunity. Please visit the Toronto Semantic Group page on Meetup to register with the group and tell us whether you can make it. Note that space is limited and there are only 15 seats left.

I hope you can make it.


May 15, 2009

Great Contributions from Leaders of Companies Using RDF and URIs

Check out the comment section of part 3 of my blog post series. Very interesting contributions by Sean Martin of Cambridge Semantics and Bill Roberts of Swirrl.


May 14, 2009

New Review by ReadWriteWeb of our Series Of Posts on Web 3.0, the Semantic Web, Linked Data and Data Structuring Technologies

Richard MacManus, Founder and Editor of ReadWriteWeb, has just published a review of my last series of posts: it is called Understanding the New Web Era: Web 3.0, Linked Data, Semantic Web, also available through the NY Times, and nicely summarizes and complements my analysis with additional thoughts and examples. I greatly recommend the read, and thank Richard for taking the time. I hope to see interesting discussions emerge from it on RWW's site, on this blog, and at the imminent Web 3.0 and Semantic Technologies conferences.


May 13, 2009

Tying Web 3.0, the Semantic Web and Linked Data Together - Part 3/3: Structuring Chaos

In my previous post, I argued that two things can help bring to life a truly semantic web: the first one is the Linked Data medium. One person commented that Linked Data is not just a medium, but creates meaning. I see the point, if you assert that meaning is created through data transformation, as one can process RDF triples (the Linked Data format) through SPARQL (the query language, like SQL for databases) and also create new associations of triples linked through common URIs (universal concept addresses - I described them in the previous post, and you can double-click on the word for a definition). Depending on how you define meaning, you could characterize that as meaning-creating. Or not.
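To make the commenter's point concrete, here is a minimal sketch in plain Python of what "new associations of triples linked through common URIs" can look like. It is not real RDF or SPARQL: the URIs, the two sources and the helper functions are all invented for illustration, and a `None` in a pattern simply plays the role of a query variable.

```python
# Hypothetical URIs, invented for illustration only.
EIFFEL = "http://example.org/concept/Eiffel_Tower"
PARIS = "http://example.org/concept/Paris"
FRANCE = "http://example.org/concept/France"
LOCATED_IN = "http://example.org/relation/locatedIn"
CAPITAL_OF = "http://example.org/relation/capitalOf"

# Two independently published sets of (subject, predicate, object) triples.
source_a = [(EIFFEL, LOCATED_IN, PARIS)]
source_b = [(PARIS, CAPITAL_OF, FRANCE)]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def countries_of_landmark(triples, landmark):
    """Follow landmark -> city -> country through the shared city URI."""
    results = []
    for _, _, city in match(triples, s=landmark, p=LOCATED_IN):
        for _, _, country in match(triples, s=city, p=CAPITAL_OF):
            results.append(country)
    return results


# Neither source alone connects the landmark to a country.
alone = countries_of_landmark(source_a, EIFFEL)

# Merged, the shared PARIS URI lets the two triples be chained.
merged = source_a + source_b
derived = countries_of_landmark(merged, EIFFEL)
```

Whether you call the chained result "new meaning" or just a convenient join is exactly the definitional question above. Note that someone still had to write `countries_of_landmark`.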

Let me specify my thoughts further. As I see it, the biggest hurdle in enabling the semantic web right now is in creating “clean” triples, and the right links to the right URIs, from unstructured data. That type of data transformation from unstructured to structured really is where 80% of the meaning (to pick a number that sounds right – let me bet that too will get me interesting tweets) is added. That’s why, in most cases, it’s still best done by humans. Because it’s tough. And it’s tough because it adds lots of value to the original data.

I see how further data transformation of the type enabled by Linked Data can add extra value, by allowing the processing and linking of data across the web, but technically, that is (1) not adding "as much" meaning, in the sense that most of the meaning created comes from having the right triples and linkages in the first place (if the data is poorly structured or poorly linked, Linked Data will just turn garbage into more garbage), and then (2) most of the meaning added on top of that is derived from creating the right filters using SPARQL for instance, and SPARQL still needs to be programmed, which requires humans or other extraction algorithms, something that by any definition is not Linked Data. Linked Data just gives us better tools. Like a hammer does not by itself assemble a bookcase, Linked Data does not create meaning, it just makes it easier for the same technologies we already use to create it: usually, human programming and inputs, and text analysis algorithms, based on taxonomies, natural language processing, statistical methods, and other approaches.
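The garbage-in, garbage-out argument can be sketched in a few lines of plain Python. Everything here is a made-up assumption for illustration (the URIs, the `inCountry` relation, the blog post being tagged); the point is only that the downstream query is equally faithful to a good link and to a bad one.

```python
# Hypothetical URIs and facts, invented for illustration only.
PARIS_FR = "http://example.org/concept/Paris_France"
PARIS_TX = "http://example.org/concept/Paris_Texas"
FRANCE = "http://example.org/concept/France"
USA = "http://example.org/concept/USA"
POST = "http://example.org/post/louvre-visit"

# A small, correct reference dataset of (subject, predicate, object) triples.
facts = [
    (PARIS_FR, "inCountry", FRANCE),
    (PARIS_TX, "inCountry", USA),
]


def country_of(post_uri, tags, triples):
    """Look up the country of the place a post was tagged with."""
    place = tags[post_uri]
    return [o for s, p, o in triples if s == place and p == "inCountry"]


# The structuring step: someone (human or algorithm) ties the word
# "Paris" in the post to a concept URI. This is where meaning is added.
clean_tags = {POST: PARIS_FR}
sloppy_tags = {POST: PARIS_TX}  # wrong link, same downstream machinery

good = country_of(POST, clean_tags, facts)
bad = country_of(POST, sloppy_tags, facts)
```

With the clean link the lookup returns France; with the sloppy one it returns the USA, with the same confidence. The linked dataset and the query did their job both times, which is why the quality of the original structuring matters so much.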

In other words, most of the meaning is created by structuring unstructured data, and the rest is created by programming the right algorithms to process and filter data. None of that is Linked Data. If you still think Linked Data does help create some meaning, I won’t disagree, since it’s probably related to your definition of meaning creation differing from mine. But I maintain that the main contribution of Linked Data lies in encouraging us and making it easier to add meaning by opening up the data and linking it across, and then enabling processing on all those granular bits. That’s why I tagged it with “Open” in my graphic representation, and I tagged Data Structuring with “Smart”.

One more thing on Linked Data. I reserve my judgment as to whether it will and ought to become the dominant medium to carry information going forward. While it has made great progress in the past year, I have not seen it being adopted by new start-ups in the semantic space, while at this stage I would expect it to have been. Tom Tague agreed during our last podcast. Ian Davis of Talis also pointed that out in his article Where Are All The RDF-based Semantic Web Apps? I would like to see counterexamples of this, so please fire away if you know of successful start-ups that are leveraging RDF, OWL and URIs. And I mean beyond Linked Data hosting platform plays (such as Talis, for instance), since those are not application plays.

Trying to answer my own question, I came across Ivan Herman’s Use Case and Case Study collection page, which is referenced by some readers, but I couldn’t find any live application, which makes it hard to assess the performance of Linked Data. His presentation on Applications is also interesting, but the examples are mostly not accessible, and those which are, are not always compelling, such as Twine, which actually shied away from using RDF to store data some time ago, I believe. Having said that, there seem to be some concrete examples in the latest issue of Nodalities, e.g. O’Reilly’s use of Linked Data, which I will take a closer look at... soon!

I suspect this relative lack of start-up adoption is due to RDF being quite bulky as an information format, as I discussed in part 2, and thus requiring extensive processing. As such, I am not yet convinced it is destined to become the universal way to model data in a semantic web world. But so far, just like Democracy, it’s also the “least worst” for opening up your data, and many people are working on improving it. There was a seminar recently organized by Franz, a leading player in the space, on Solving Scale and Reasoning in Large RDF Datasets, for instance. As we well know by now, formats can win thanks to network effects, even without being initially the best technological option (at least in the short term, till something ten times better comes along, the rule-of-thumb says). No wonder some of the Linked Data supporters are so adamant about pushing Linked Data as the universal format for the Semantic Web… Did Darwin ever consider network effects as an advantage of one species over another?

So while there is no way to state conclusively whether Linked Data is worth its weight in R&D funding (that's the temporary conclusion of my Cost-Benefit Analysis of Linked Data, for those keeping track), it clearly hinges on its ability to deliver a more granular, more flexible experience, that so far has proved a little elusive due to performance question marks, on the one hand, which I have little doubt can be solved, and lack of data sources, on the other, which is the tough part of the Semantic Web, and one that Data Structuring technologies will help resolve.


The Surprise Guest: Text Analysis and Other Data Structuring Methods

Let’s say we have the medium, what do we feed it now? This is the real problem behind the Semantic Web and Linked Data. It’s good to have a better way to carry water from a river and distribute it to all the other villages, but if that water is polluted, it doesn’t help that much. As I alluded to earlier: Garbage In, Garbage Out.


“Garbage” cover

Image via Wikipedia


By that I do not mean to diminish the huge achievement that Linked Data represents in any way. Certainly, where the data is clean and structured, and it is in many places, it makes sense to have a better technology to link it out. And where it’s not, the availability of such a channel is a new incentive to decontaminate that "water". In fact, it’s the best incentive we’ve ever had.

But what I am pointing out is that the Linked Data format does not contain that Brita filter to select and clean the “water” it’s carrying: data. Neither does it have a mechanism to automatically sort the water into the right bottle sizes depending on village sizes, weather and other conditions (it offers us parts to do it, we still need to put those together in a clever fashion). It does not create smart data, it only enables it. And when noise-to-signal ratio is undeniably the biggest problem we face today on the web, I think it would be wise to invest more in moving data from unstructured and dumb to structured and smart, in a coordinated fashion, just like the coordinated effort that has been deployed for Linked Data. Without it, the ROI for Linked Data will remain invariably negative, because it will have to rely on existing sources of clean, granular, structured data, which are only a portion of all the information we exchange and create daily. It will be polluted by the other data streams, and likely add noise rather than reduce it.

So, technologies to turn unstructured data into structured data are really where we ought to invest, and focus our efforts. The good thing about Linked Data is that, if it manages to impose itself as a key medium for the semantic web, it will increasingly expose the limitations of our data analysis technologies.
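As a toy illustration of what "turning unstructured data into structured data" means, here is a minimal sketch in Python. It assumes a single hand-written pattern and an invented `capitalOf` predicate. Real text analysis leans on natural language processing, taxonomies and statistical methods rather than one regular expression, so take this as the smallest possible example of the idea and not as how it is actually done at scale.

```python
import re

# One hand-written pattern for one kind of sentence (an assumption made
# for this example only).
PATTERN = re.compile(
    r"(?P<subject>[A-Z]\w+) is the capital of (?P<object>[A-Z]\w+)"
)

def extract_triples(text):
    """Turn sentences of the form 'X is the capital of Y' into triples."""
    return [
        (m.group("subject"), "capitalOf", m.group("object"))
        for m in PATTERN.finditer(text)
    ]

text = (
    "Ottawa is the capital of Canada. Many visitors love the canal. "
    "Paris is the capital of France, and it is lovely in spring."
)

print(extract_triples(text))
# [('Ottawa', 'capitalOf', 'Canada'), ('Paris', 'capitalOf', 'France')]
```

Note what gets dropped: the sentence about the canal does not fit the pattern, so it produces nothing. Everything the extractor does not understand stays unstructured, which is exactly why this step is the hard and valuable one.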

What’s the endgame for the Semantic Web? I’d propose that it is a web where any information you input is immediately cleaned up, pre-structured and pre-connected to the rest. There is a variant of this vision that would see any information input remaining in its raw format until one needs it, at which point it is structured and connected on the fly, using the perspective of the person who queried to shape the structure and the connections. The problem with this vision obviously is that unless you have scouting agents that can query the whole web instantaneously for every query, and structure and link data on the fly -- and I think we can safely say that’s not going to happen anytime soon -- you need some pre-defined structure and connections so you know what the information is about and where it is. We need to meet those agents half-way. How much you process data ahead of time depends on the information’s intended use, type, importance, and other factors. But ultimately, it needs some level of structuring to be smarter. Linked Data is the medium, a medium that will be fed through Data Structuring, and a medium that will motivate us to invest further in those technologies, and fulfill much more of their potential at last.

If you’re interested in text analysis, I invite you to listen to the Interviews with Innovators podcast by Jon Udell with Seth Grimes, the Founder of a company called Alta Plana, on Business Analytics. Absolutely mind-opening.

The need for more effective data structuring technologies is nothing new, but this time, whoever gets it right and ties it together in a nice application that leverages the open channel created by Linked Data (if it can be made to scale to web proportions), could well be on its way to dominating a new sort of web. The key is to turn unstructured data into triples that make sense. By itself, Linked Data is unlikely to be a source of blockbuster applications, but the newly-created ability to link data in more flexible ways will act as an echo chamber to other technologies, giving them a much larger market and amplifying their success. If you want to be successful, consider mashing up Linked Data with other technologies.

For now, the best is yet to come for Web 3.0, the Semantic Web, Linked Data and Data Structuring technologies: once again, Tim Berners-Lee was ahead of the curve when he said the Semantic Web was open for business. Let’s just say the Semantic Web is open, and business is welcome.

May 11, 2009

Tying Web 3.0, the Semantic Web and Linked Data Together - Part 2/3: Linked Data is a Medium

If you’ve liked my last post, you should take a look at Kevin Kelly’s video at TED, about the first and the next 5,000 days of the Web, which a friend pointed out to me. “Smarter” is Kevin’s first tag to describe the next web, and given the timescale he chooses, it is a safe bet. In the short term, ubiquitous (Kevin’s second or third tag) is probably more likely to describe the Web.

Below, I am pursuing my exploration of the interaction between Web 3.0, the Semantic Web and Linked Data. I shared my thoughts on Web 3.0 in the previous post, so now let's tackle the Semantic Web, and what it will take for it to really happen.

Semantic Web

What is the semantic web? Here again I’ll refer to a post I wrote not too long ago, in which I described it as “a web in which machines get the meaning of information and use that understanding to transform/organize/synthesize data intelligently on our behalf.” Definitions vary, but overall I think we all agree that the Semantic Web is an attempt at enabling machines to better understand and transform data. This is the overarching Objective, with a big O, of the semantic web.

In a world with a working Semantic Web, I should not only be able to know without launching a full web expedition, which Chinese restaurant in a 5-mile/km radius carries Peking Duck, but also to aggregate and filter information from various subprime real estate lenders by region and map that against mortgage default rates and lenders' pools of debt by risk level in a snap. That type of easy data transformation could help avoid a financial crisis of gigantic proportions, which, some would argue, is a handy benefit worth its weight in trillions of dollars.

Because I think the next step, the How, is usually where we get lost and diverge, now I’d like to decompose things about the Semantic Web a little further, while hopefully keeping it simple. So I’ll propose that the Semantic Web will really be enabled by two very different things:

  • Linked Data (or other formats embedding links at the data level)
  • Text Analysis and other technologies to structure data

I explain why I think that below.

But first, a bird’s eye view of the whole Web 3.0 landscape, which should help summarize my perspective on this space. Double-click it for a larger version.

[Image: bird’s eye view of the Web 3.0 landscape]

Linked Data

At the beginning there was unstructured data. And then men (women too, but mostly men, in their usual thirst for an edge over the competing tribes) decided that structuring it made it easier to find it, read it, and exchange it. So they structured data, created formats, lists, tables, agreed on standards and ultimately stumbled upon a key discovery: the relational database, or RDBMS for short, based on the relational model. That great approach to structuring things opened the door to a whole new world: a database-powered world.

The problem with RDBMSs is that, for all their power and flexibility, they require you to create your tables and decide how they are interlinked before you have populated them, and often before you actually know all you’ll do with them. It’s like setting the walls of a house prior to inhabiting it: it makes complete sense only until you learn that your in-laws are moving in with you. At that point, aside from that urge to run away, you would love to be capable of reconfiguring the house (with a big wall in the middle, preferably). And that’s where it gets tricky, because you need to move the content out, hire contractors or do it yourself if you’re that kind, and then put it all back into the new walls. And that filthy yellow sofa you’ve had since grade 8 doesn’t quite fit in the new place anymore...

Another problem with RDBMS is that everyone can define their tables one way or the other and, in fact, that’s what they do. In the absence of a meta-language to tell the machines what is contained in those RDBMS tables and how it all ties together, it’s virtually impossible for them to make two different databases speak with each other. Not to mention billions of them.
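Here is a small sketch of that problem, using Python's built-in `sqlite3` module. The two tables are hypothetical stand-ins for two separate databases, and the table names, column names and prices are all invented for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Two hypothetical shops modelling the same thing with different names
# and different units.
db.execute("CREATE TABLE shop_a_menu (pizza TEXT, price REAL)")
db.execute("CREATE TABLE shop_b_items (item_label TEXT, cost_cents INTEGER)")
db.execute("INSERT INTO shop_a_menu VALUES ('Margherita', 9.50)")
db.execute("INSERT INTO shop_b_items VALUES ('Margherita', 1050)")

# Nothing in either schema says that `pizza` and `item_label` mean the same
# thing, or that one price is in dollars and the other in cents. A person
# has to know that and write the mapping by hand.
rows = db.execute(
    """
    SELECT a.pizza, a.price, b.cost_cents / 100.0
    FROM shop_a_menu AS a
    JOIN shop_b_items AS b ON a.pizza = b.item_label
    """
).fetchall()

print(rows)  # [('Margherita', 9.5, 10.5)]
```

The join works, but only because a human supplied the knowledge that is missing from the tables themselves. Multiply that hand-written mapping by every pair of databases out there and you get a sense of the scale of the problem.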

So in a way, the RDBMS model is too constraining and too structured for many applications. That is one of the reasons why most of the data out there remains unstructured (I read somewhere that unstructured data represented over 95% of the data exchanged daily – which sounds about right, in a wrong kind of way of course). It’s not modular, not elastic enough. Now think of something that would be.

Enter Linked Data. What Linked Data really does is break down the walls of the RDBMS and offer a semi-structured way to create structured information. In some ways, it bridges the gap between unstructured data and structured data (RDBMS and others). It does that by using RDF, which embeds the linking directly into the data. With RDF, each concept becomes an association of 3 tags, each with a role: subject-predicate (verb)-object. The subject and object are two entities and the predicate is how they relate to each other.
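For readers who like to see it written down, here is a minimal sketch of that subject-predicate-object structure in plain Python. The `ex:` names, the `Triple` type and the `objects_of` helper are made up for illustration. Real RDF uses full URIs and dedicated tooling rather than tuples.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str    # the entity being described
    predicate: str  # how subject and object relate
    object: str     # the other entity, or a literal value

# Hypothetical data: every fact is one self-contained statement.
triples = [
    Triple("ex:Margherita", "ex:isA", "ex:Pizza"),
    Triple("ex:Margherita", "ex:hasTopping", "ex:Basil"),
    Triple("ex:Margherita", "ex:hasPrice", "9.50"),
]

def objects_of(data, subject, predicate):
    """Return every object linked to `subject` through `predicate`."""
    return [
        t.object
        for t in data
        if t.subject == subject and t.predicate == predicate
    ]

# A new kind of fact needs no schema change: just append another triple.
triples.append(Triple("ex:Margherita", "ex:servedAt", "ex:LuigisPlace"))

print(objects_of(triples, "ex:Margherita", "ex:hasTopping"))  # ['ex:Basil']
```

The last step is the in-laws moving in: a brand new relationship is added without tearing down any walls first.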

For geekier readers, let me add this thought (other readers can jump 2 paragraphs down): in a sense, what RDBMS was doing was compressing RDF by removing duplicates of predicates. In a single RDBMS table, all entries in each column are linked to the other columns through the same predicates. If I have a type of pizza in the first column and its price in the second column, we know those two characteristics are all linked by a unique predicate such as “price of”. And if price is stored in another table, those two tables can be linked by the same predicate once and for all. No need to repeat that predicate a thousand times in my data store. Yet, that’s what RDF does, I believe (someone at some point will propose mechanisms to compress it out if they haven’t yet, but then we get back to a less machine-readable design!). This way, if you’re linking from outside to a specific pizza type, the information from just that pizza type comes embedded with a way to access its price too. Obviously, RDF comes at a price, since you now have all this duplicated information to store and process every time. That’s why we are seeing a lot of focus on scalability and processing cost in the industry.
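A rough way to see that duplication, sketched in Python with an invented pizza menu: the table declares its column names once, while the equivalent triples carry the predicate in every single statement. The `table_to_triples` helper is hypothetical and only meant to make the counting visible.

```python
# Hypothetical pizza menu stored the relational way: the column names are
# declared once, and every row relies on them implicitly.
columns = ("name", "price", "size")
rows = [
    ("Margherita", 9.50, "medium"),
    ("Pepperoni", 11.00, "medium"),
    ("Hawaiian", 10.50, "large"),
]

def table_to_triples(columns, rows):
    """Expand each cell into its own (subject, predicate, object) statement."""
    _key, *attributes = columns
    triples = []
    for row in rows:
        subject, *values = row
        for predicate, value in zip(attributes, values):
            triples.append((subject, predicate, value))
    return triples

triples = table_to_triples(columns, rows)

# The table spells out each predicate once; the triples repeat it per row.
predicate_mentions_in_table = len(columns) - 1
predicate_mentions_in_triples = len(triples)

print(predicate_mentions_in_table)    # 2
print(predicate_mentions_in_triples)  # 6

# Each triple stands on its own, so one pizza's price can be picked out
# without knowing anything about the table it came from.
print([t for t in triples if t[0] == "Pepperoni" and t[1] == "price"])
```

With three pizzas the overhead is trivial. With a few billion statements it becomes the scalability question the industry is busy with.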

In the Nov/Dec 2008 issue of Nodalities magazine (this is a link to the text version, but the magazine contains illustrations I highly recommend you take a look at), Bill Roberts of Swirrl provides a great example of the same information structured in an RDBMS and in RDF. I owe a grateful word to Bill and Talis, as this greatly clarified the relationship between those two relational models for me. End of the geekiest part of this post.

In sum, Linked Data offers a new way to establish linkages at the data level, as opposed to the document level we are mainly used to, and it does that in a more flexible manner than relational databases, which already tied data together at the data level, but in a pre-defined manner.

To complete the picture, Linked Data also introduces universal pointers in the form of URIs. Those are public addresses that all instances of a similar concept on the web can point to, so that machines can infer the meaning in your document based on what they already know about that address. It also enables indexes to tie together all those related instances, since they all point to the same node. In theory. In practice there is no such thing as really-universal URIs yet, although DBpedia is probably the best-known repository of URIs. Building on my previous post about Information Overload, I suspect we are also going to need much better filters to reduce the linking noise once URIs become more mainstream. But that's another story.
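Here is a small sketch of how a shared URI lets an index tie independent sources together. The two "sites" and their predicates are invented, and the DBpedia-style address is only assumed to follow DBpedia's naming pattern. The point is the mechanism, not the specific address.

```python
from collections import defaultdict

# Shared identifier. This DBpedia-style address is used purely as an
# example of a public URI that two publishers could both point to.
PEKING_DUCK = "http://dbpedia.org/resource/Peking_duck"

# Two hypothetical sources that never coordinated their vocabularies.
restaurant_site = [
    ("http://example.org/restaurants/golden-dragon", "serves", PEKING_DUCK),
]
recipe_site = [
    ("http://example.com/recipes/42", "describes", PEKING_DUCK),
]

def index_by_object(*sources):
    """Group statements from all sources by the node they point to."""
    index = defaultdict(list)
    for source in sources:
        for subject, predicate, obj in source:
            index[obj].append((subject, predicate))
    return index

index = index_by_object(restaurant_site, recipe_site)

# Everything that points at the same URI ends up under the same key.
for subject, predicate in index[PEKING_DUCK]:
    print(subject, predicate)
```

If the two sites had each used their own private label for the dish, the index would have two unrelated entries and no machine could tell they are about the same thing.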

The key idea of this post is that Linked Data offers a new medium to link structured data that is then more machine-readable. It does not by itself add any semantic meaning to the information, but it better carries that semantic information once you have it. So, while Linked Data is not semantic, creating links at the data level paves the way to a true Semantic Web.

(this post continues in part 3, to be published on Wednesday 13)

May 07, 2009

Tying Web 3.0, the Semantic Web and Linked Data Together --- Part 1/3: Web 3.0 Will Not Solve Information Overload

Over the past few weeks I have tried to dig deeper into the different concepts of semantic web, linked data, and web 3.0, to develop a better understanding of whether it matters, why, and what it all means from a web user angle. That led me to review recent articles in Nodalities magazine, attend discussions on Taxonomies, and talk to new startups. The Web 3.0 and Semantic Conferences are coming up and I thought that would also help me not to look too idiotic at the "Idiots’ Guide to the Semantic Web and Web 3.0" panel I’ll be participating in...

One of the focuses of my quest was to try and assess whether, as Tim Berners-Lee put it over a year ago in an interview by Paul Miller, the Semantic Web is “open for business”. Another related goal was to try and compare the cost and benefits of Linked Data, an important component of the Semantic Web as most would agree (where people differ is whether it is a requisite or not). I’ll tackle that across this series of posts.

Now, I know there have been many attempts to define the terms Semantic Web, Linked Data, and Web 3.0, a few of which I have been a part of. I think all of those are worthwhile, especially when they get more people interested in the subject (thereby increasing our collective ability to shape and define those terms in a virtuous loop!). In my eyes, as a technology pollinator for businesses and an evangelist for the smarter web we all aspire to, the best definitions are the simplest ones. And it’s always worth clarifying what we mean by those terms prior to talking extensively about them. So here is yet another crack at it, one that is intended to point out new paths for reflection and discussion.

Broadly speaking, some think Web 3.0 = Semantic Web = Linked Data, and others think those concepts are more like Russian Dolls. Ah, those Russians… they always have to mess things up…

Well, I guess I must be a little Russian, so here it goes, from general to more particular, attempts at defining Web 3.0, Semantic Web, and Linked Data, how they all fit together and what they are about:

Web 3.0

I took a crack at defining this last year prior to the web 3.0 conference in this post (obviously, the ironical title was intended to point out that this would likely be work-in-progress for a long time to come!), which I invite you to read. My take on it tends to be broader than web 3.0 as almost a synonym for the semantic web. Although I agree about the game-changing aspect of a more atomic web in which machines can interpret and link information at a more granular level, it’s doubtful that this alone will drive the web 3.0 transformation. High-speed mobile and localization technologies such as 3G and HSDPA, cheap GPS sensors and GPRS, data recognition and conversion technologies such as voice or image pattern recognition, and cloud computing, are all game-changing as well, and together they will help define the web for the 3-5 years to come.

I hope that “Smarter” is going to be a key tag for the Web 3.0, and yet I think “More Open, More Ubiquitous, with even More Information (Overload) and a little Smarter” is what it’s really going to be. We’ll have to wait till “Web 4.0” for a web that really is stepwise more intelligent, one that could really be called semantic and hold the hidden promises of a “Semantic Web”. And the reason I believe this is that the community is focused on linking more stuff together in new ways and breaking down data siloes, much more than it is focused on creating new, smarter filters for all the data that’s going to be made accessible that way.

This is partially due to the sequence of things to be done to enable a smart web – or perceived as needed to be done (as I suggested previously, the debate is open on Linked Data): first, create and disseminate a working standard to model data at a more granular level. That’s what Linked Data is trying to do. It has gathered a lot of traction and is moving fast towards increasing adoption, but I am convinced it requires a few more years to become really mature and perform at the level required for true mainstream adoption.

I am only guessing, but it looks like it may also be partially due to the IT community being more attracted to aggregation and integration than to filtering, analyzing and transforming data. For one thing, aggregating different sources is easier than developing advanced algorithms to, say, process unstructured data. We are already seeing the result of that everywhere, in what is hard to describe without using the word ‘messy’: Facebook status on Twitter and vice-versa, Wordpress blog updates on LinkedIn, Twitter-on-Twine and Twine-on-Twitter, Plaxo-on-that, RSS-on-this, and it’s all becoming a big soup as each app shows what each other app is doing and differentiation lines are blurring. This looks like a lose-lose competitive war in which every app tries to establish itself as the unique platform and does nothing well anymore as a result.


[Image: Don't overload your trailer]

Is that really how the Web experience is supposed to feel? (Image via Wikipedia)


Following fewer than 150 people on Twitter, I already have a hard time keeping up. The most successful tool? TweetDeck. Why? Because it gives you a dashboard that allows you to filter things. And yet by all accounts, TweetDeck lags behind our filtering needs. Twitter is in the best position to improve its interface and allow more powerful filtering soon, but if it doesn’t, I predict that an app like an improved TweetDeck, i.e. a better filter, will steal its limelight. Just like Google became the doorman to the web because it filtered things better than others. Filtering, not aggregating, is where the money is. Not more, just smarter.

I’m sure many will disagree with that view, but I persist and sign: unfortunately, Web 3.0 is not going to solve information overload. At least not until it graduates into its next avatar. Things are going to get worse before they get better… But better they will get. And since the Web x.0 series is starting to sound a little tired, if it wasn’t to start with, I’d suggest calling that smarter, post-web 3.0 web, “Da Web”. Much better sounding already, don't you think?

(this post continues in part 2, to be published on Monday 11, and part 3, on Wednesday 13)

April 26, 2009

Discount Code for Semantic Technologies Conference

I just realized that, in addition to the discount code for the Web 3.0 Conference in New York next month, which I already shared in my previous post, I also have a discount code for the Semantic Technology Conference taking place in San Jose, California, June 14-18, 2009. As a speaker, I have been authorized to share a coupon with you for up to $200 off your registration fees. Details on the conference and registration are at www.semantic-conference.com. To receive this discount, register by May 29, 2009 and use the COUPON CODE: ST9SPKR.

April 25, 2009

Don't Miss: Semantic Web Gang, Web 3.0 Conference, Semantic Technology Conference

I am finally back "on the air" after a long period of intensive work for clients of my consulting practice. My blog didn't miss this chance to give me the usual guilt trip in return. Those things take a life of their own...

Things happened nonetheless in the semantic web space:

  • On the business side of the web 3.0 and the semantic web, I had a number of discussions with a few start-ups in the field. A generation of companies is growing and coming up with mind-blowing technologies focused on solving real problems, ranging from detecting the broader interests of an advertising audience, to enabling intelligent platforms to manage complex insurance businesses, and to tagging online content and enabling more intelligent sharing and product recommendations. [Plug alert!] I look forward to helping them fine-tune their market approach, scale their operations and revenue, and raise funding where needed. A current client of mine received an excellent term sheet from a VC, so there is still money around for start-ups with a good business story. I helped them secure it by running a full market program for them (now in phase 2) involving a number of components (see my services at www.growthroute.com) and helping them present a coherent and well-supported story on the revenue side [end of plug!]
  • We recorded two podcasts with Paul Miller and the rest of the Semantic Web Gang. The March edition didn't make it due to recording quality issues (being scattered on two continents and often using VoIP doesn't help...). To make up for it, the April recording is available here, and it rocks! 
    We had almost the whole gang present, and I can honestly say this has been one of the best discussions I've had with it. We raised a couple of points I have long wondered about:
    • Why do we create a semantic web that relies so much on human-generated ontologies (you can double-click that word for a definition)? Those ontologies are destined to become rapidly obsolete, and obtaining agreement on them is a huge investment with an uncertain ROI.
      My suggestion to entrepreneurs: work on a way to facilitate the creation of ontologies, using both folksonomies and text analysis to build hierarchies of concepts. Start small, with baby steps, and progressively increase the level of automation. Given the wide cottage industry in the human creation of ontologies, such tools should be attractive to this immediate market. And over time, they will get to replace the industry altogether. I may be optimistic, but it doesn't strike me as an impossible problem to solve.
    • Which successful start-up in the semantic web is actually built on Linked Data, or uses it extensively? Tom Tague mentioned he doesn't know of any, and neither do I. That raises doubts in my mind as to the commercial potential of this technology. It may be that it's just an enabler, with no direct financial benefit, but then where are the start-ups taking advantage of the Linked Data and even semantic stack technologies?
      During the podcast, I promised a Cost-Benefit Overview of Linked Data and the Semantic Stack on this blog. Patience. I am putting the finishing touch to it and that will be my next post. Feel free to already send me your thoughts on the subject, I'd like to integrate your views in my post.
  • The second volume of the Web 3.0 conference is on, and it will take place in New York next month. I will participate in the opening panel on Tuesday, May 19, 2009 at 10am, nicely entitled the "Idiots' Guide to the Semantic Web and Web 3.0". We need more introductory sessions like this, and we need them to focus on what we are trying to achieve with the semantic web, as opposed to the nuts and bolts, which is often what this community is obsessed with, because that's where it spends its time. And vice-versa.
    I have a promotional code for anyone looking to attend the conference: SPKRW3GB will give you a 15% discount on the registration. And if you sign up before April 29 you'll receive that on top of the early bird rate.
  • A month after that, I will be at the Semantic Technology Conference in San Jose, to attend what is possibly the largest conference in the space. On June 17, I will discuss Business Models and Market Opportunities for Semantic Start-Ups with Fraser Kelton of AdaptiveBlue. Our panel unfortunately takes place at the same time as the respective sessions of Paul Miller and Peter Mika, which should be terrific (too!). I'd encourage everyone to go back and forth between those!

That's it for now, but I'll be back soon with my thoughts on the cost and benefits of linked data and the semantic web. And I hope to see you in NY and San Jose soon. Feel free to contact me at gregboutin [at] growthroute.com to arrange discussions.


    About me

    • I am Greg Boutin, founder of Growthroute Ventures. Acting as an outsourced executive, I help tech ventures develop solutions, go to market, sell, scale and raise their investor appeal and valuation. Managing information is a top interest for me: I am featured monthly on the semantic web gang podcasts, speak at events like the web 3.0 conference, write articles, and always work on a start-up concept or two.
    Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported
