In this episode we interviewed Dixon Jones of MajesticSEO for OMReport.com during SES New York City in March 2011.
This podcast is also available on iTunes and on YouTube. You can find the transcription in the post details.
Alpar: Okay, so we are here with Dixon Jones. Dixon, what are you doing?
Jones: Hi I am the marketing director of Majestic SEO, which is a large link database. So we have about two point, three point two trillion links at the moment. We crawl the internet, about half a billion links every single day and we track the link information, the back link information into people’s websites.
Alpar: Okay, why would people need a paid tool if they have Yahoo?
Jones: Well, Yahoo own, well, apart from the fact that Yahoo may be on its way out, what with the Microsoft deal, they also only give you a thousand back links in their data set, maximum, even if they have more, but they don’t have very many actual links anyway. Majestic…..
Alpar: How will you know, I mean you didn’t…
Jones: Well, you can compare, if you compare back links for sites that have less than a thousand links for example, then we would generally pick up four or five times as many links as Yahoo, and of course, we don’t make the decision for you as to which links are good and which ones are bad but we provide all the information.
Alpar: Do you think Yahoo does that?
Jones: Well, I think Yahoo, I don’t know what Yahoo is doing now, I can’t speak for Yahoo, but I do know that Yahoo’s Site Explorer is not the tool it used to be and so now you really, you really should be looking at using our tool or Linkscape’s tool and on both counts, you know, we have a much larger database than anybody else.
Alpar: Linkscape and the SEOMoz stuff.
Jones: SEOMoz yeah, they have some cool tools too.
Alpar: And do you know how big their database is compared to yours?
Jones: Yes. It’s, they kind of use slightly different measurements, but they have, we think, two hundred billion URLs or something like that to our three trillion links, so I mean there is a big difference. We think there is a big difference, yeah, where we are similar…
Alpar: They also seem to have quite…. Their database … but it is obviously not open, right?
Jones: It is still small in comparison so, size is one thing and freshness is the second thing, so we have a smaller database, round about the same sort of size as Linkscape’s, and Linkscape’s is quite a lot bigger than Blackhole’s from what we can establish. We think that Linkscape has much more than Blackhole. We have one now that is a similar sort of size to Linkscape which we call our Fresh Index, and we update that information every day, which is kind of cool for people that are really sort of wanting to find the new links to a competitor’s website or even to their own site, but also it could be used for media mentions and that sort of thing. If somebody suddenly talks about your website in the back end of Uganda or somewhere like that, because you have had a power plant failure or something there, then you could use it for that sort of information.
Alpar: But your tool is really focused on the Backlinks, right? I mean it’s not like you could do anything with the tools, they kind of, you said you made a decision and focused it on this area?
Jones: Oh yeah, absolutely, we are not trying to be everything to all people. We really have an industrial crawl and a really really high level crawl and we just look at the back link data this time. Not just Backlink data, also we do pick up the titles of those pages and also the anchor tags which is all part of the link anyway. What we don’t do is any kind of rankings checking or decision-making as to where you should get your next link from.
Alpar: This will always mean that people need something additional to your tool?
Alpar: So, what do they mostly use? Do you have an insight …
Jones: We have, absolutely, so we would then, we have an industrial strength API which then goes into those SEO suites. So, for example, Raven tools or Wordtracker or Analytics SEO.com, all of those have access to our data on an industrial level, so that they can then take the information and augment it with other bits of information. So they will have …
Alpar: So you kind of focus on a niche and then others incorporate with generated …
Jones: Because it would be madness for all those companies to all try and collect the same amount of data that we are collecting.
Alpar: I assume that this is because you crawl and you don’t crawl Google, but you crawl the websites themselves, you don’t have the problems that probably people who monitor a lot have then because …
Jones: It’s a huge number of people, yeah, if you are making like a fifty dollar…
Alpar: You need a lot of IPs and so on.
Jones: Yeah, if you are making a little bit of software that is trying to sort of rip it off of Yahoo itself or ripping out of Google or some other search engine somewhere. You are trying to take short cuts. We are really crawling every single web page on the internet. Well not every single one but you know the vast majority.
Alpar: So, how did you become that SEO niche tool builder?
Jones: Well, we started out by looking at being a distributed crawler, which is, so what we did was we found hundreds of people round the world that were interested….
Alpar: So, were you scraping websites and creating like a spammy content?
Jones: No, no no no.
Alpar: Or why were you crawling the web?
Jones: No, no no. So, we were trying to create a search engine without building all the infrastructure that is needed for a search engine. So what we got was hundreds of people around the world who were doing the crawling on their computers, and on their small ISPs and that sort of stuff. So they were using the spare bandwidth and that still happens to this day although we now have our own crawlers.
Alpar: Why would they let you crawl from there?
Jones: So they would crawl from there because they have an incentive. They have a whole game going on, who can crawl the most pages out of their team mates. That’s a different side of the equation.
Alpar: It looks like a historic thing.
Jones: Yeah, so, but it meant that we were getting so much data we found that we couldn’t store it all, so we couldn’t make a search engine out of Alex’s front room when it started, you know (laughs). But we had created the crawl, so now we can crawl as aggressively as Google, and if that was a paying point we could do more. So we decided to take what we thought was the most valuable part, which was the link information, because that’s so big: you just can’t find all the links into a website unless you crawl the whole of the internet to find those links. So it’s the hardest piece for anybody to build.
Alpar: Can you discriminate between, I mean sometimes there is like some scrapers sites that take bits of your content and put a link and there is like millions of those?
Alpar: Can you separate those, like algorithmically from the rest that has like a little bit more legitimate content, a little bit more legitimate links, is it (really?) hard to do?
Jones: At this particular moment we use a metric called AC rank which is a very poor man’s page rank. All it is…..
Alpar: Developed by yourself?
Jones: Yeah. We call it a citation rank and it is just a representation of how many web pages link into the page that links, so…
Alpar: Like the raw number of links, not something like domain …
Jones: It’s not, it’s not, yeah, sorry, it’s the number of referring domains that link into a page.
Alpar: Okay, so it’s domain popularity?
Jones: Yeah, domain popularity and it’s an algorithm.
Alpar: Is it like linear or like algo …
Jones: It’s a log scale, so it’s a log scale going, for no apparent reason, from nought to fifteen. I don’t know why from nought to fifteen.
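Editor's illustration: the metric described here, a log-scaled count of referring domains clamped to the range nought to fifteen, can be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not Majestic's actual formula: the logarithm base (2), the rule that one referring domain scores 1, and the use of the URL hostname as the "domain" are all hypothetical choices made for the example.

```python
import math
from urllib.parse import urlparse

MAX_SCORE = 15  # the interview only states that the scale runs from 0 to 15


def count_referring_domains(linking_urls):
    """Count distinct hostnames among the pages that link to a target page.

    Assumption: the hostname stands in for the "referring domain"; a real
    system would likely normalise subdomains in some way.
    """
    hosts = {urlparse(url).hostname for url in linking_urls}
    hosts.discard(None)  # ignore strings that carry no hostname
    return len(hosts)


def log_scale_score(referring_domains):
    """Map a referring-domain count onto a 0..MAX_SCORE log scale.

    Assumption: base-2 logarithm, with a single domain scoring 1.
    """
    if referring_domains <= 0:
        return 0
    score = math.floor(math.log2(referring_domains)) + 1
    return min(score, MAX_SCORE)


links = [
    "http://a.example/page1",
    "http://a.example/page2",  # same host as above, counted once
    "https://b.example/",
]
print(log_scale_score(count_referring_domains(links)))
```

Because the scale is logarithmic, each extra point needs roughly twice as many referring domains under these assumptions, which is also why a raw domain count like this says little about spam.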
Alpar: But you set it up so you must know?
Jones: Hey, I didn’t set it up, Alex set it up, and I don’t think it was his basically (laughs). He didn’t want to make it, he didn’t want people to get confused with page rank, so it’s not the same measurement. But we should have done a percentage or something like that. Anyway, we have now this, this….
Alpar: Don’t worry, it’s historic, you just invent the next….
Alpar: The next Dixon rank or something then…
Jones: Yeah we just have a little…
Alpar: It will go from zero to ten.
Jones: Absolutely, yeah, but it doesn’t give a good representation of spam on a page, it just measures domain links so it can be manipulated and you can find things that are poor, but less so in the fresh index, because all of those kind of tactics were set up years ago, and now our crawl is much more intelligent. We used to crawl just every single page and now every day we crawl the important pages and so the BBC are not going to link to a spammy site.
Alpar: So you decided more on a trust-rank, like from a root source where you will crawl with priority or do you indeed try to find like the … where it starts and ends and then stop there.
Jones: We do crawl pretty deep but now we are more intelligent. So what we will do is, we will go back to the important pages every hour and start again. Because those pages are continually generating content and then less important pages every three hours and then every day, every week and every month, whatever. So, it just means that we have got some logic in there of which sites we think are going to be more important. And what we have done then is, we are actually crawling slightly less pages, but we are finding more links as a result, and better links, so better quality.
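Editor's illustration: the tiered revisit schedule described above (important pages every hour, less important ones every three hours, then daily, weekly, monthly) can be sketched with a priority queue. This is a minimal sketch, not Majestic's crawler: the tier numbers, the class name `RecrawlScheduler`, the 30-day month and the example URLs are all assumptions made for the example, and how a page earns its tier is left out entirely.

```python
import heapq

HOUR = 3600  # seconds

# Assumed mapping of importance tier -> revisit interval, following the
# hour / three hours / day / week / month cadence mentioned in the interview.
RECRAWL_INTERVALS = {
    0: HOUR,
    1: 3 * HOUR,
    2: 24 * HOUR,
    3: 7 * 24 * HOUR,
    4: 30 * 24 * HOUR,
}


class RecrawlScheduler:
    """Hypothetical scheduler: pages in lower tiers come due more often."""

    def __init__(self):
        self._heap = []  # entries are (next_due_time, url, tier)

    def add(self, url, tier, now=0):
        """Register a page so that it is first due at `now`."""
        heapq.heappush(self._heap, (now, url, tier))

    def pop_due(self, now):
        """Return every URL due at `now` and reschedule each by its tier."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            _, url, tier = heapq.heappop(self._heap)
            due.append(url)
            next_due = now + RECRAWL_INTERVALS[tier]
            heapq.heappush(self._heap, (next_due, url, tier))
        return due


scheduler = RecrawlScheduler()
scheduler.add("https://news.example/", tier=0)  # revisited hourly
scheduler.add("https://blog.example/", tier=2)  # revisited daily
first_pass = scheduler.pop_due(0)
after_one_hour = scheduler.pop_due(HOUR)
```

With this layout the crawler fetches fewer pages per day overall, but spends its visits where new links are most likely to appear, which matches the trade-off Jones describes.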
Alpar: And who are mostly your clients? Are they in the US? Are they in the UK or are they in continental Europe?
Jones: Ah, very world-wide. The UK, we are a UK-based company so we have a large UK following, which is great. Obviously here in the US we are getting a bit of a name for ourselves but we are picking up clients….
Alpar: It’s a first time for you in the conference so …
Jones: No, no, I have talked….
Alpar: with a booth.
Jones: Yeah with a booth, I have talked in the conference way before…
Alpar: So, how is the experience having a booth in the conference?
Jones: It’s hard work. I thought having the booth was the easy way to do it now….
Alpar: Is it more people approaching, are they asking more?
Jones: No, it’s more that you just have to stand up all day (laughs)
Alpar: So you never had a booth at all before?
Jones: No, no no no.
Alpar: Oh, okay, so it is the first time with a booth and then there is the SES in New York?
Jones: Yeah I mean it’s not quite true, we took a, had a booth earlier on this year in Israel and also at Think Visibility in London so we got a couple of little practise ones there before having the major one here. But, I mean it definitely works you have got to be in the right audience, but talking for me has been a very very powerful way of getting out to the market, and speaking at events, that was way before Majestic came along, so, yeah.
Alpar: So, how do you think that your tool is mostly used, I mean, do people really use it as an end user, like as an SEO or are they mostly aggregating data into different tools or are they mostly agencies that they have their own tools? Do you have insight into that?
Jones: We have got three or four different types of people and what we haven’t got really is the Mom and Pop business that has a restaurant and they need to get links and stuff. They would be either pretty dedicated or pretty mad to use our tools. They would go to one of our data suppliers or one of our reseller suppliers, so like Raven Tools or Linkdex, or one of those kind of people, and that would be a better fit for them because it …
Alpar: Linkdex is what?
Jones: Linkdex is a UK company that also…
Alpar: Also DIYSEO I guess.
Jones: Like that, like that sort of thing, yeah, and so our, we have the reseller API people that are building tools, we then have, I’d say the next big thing is hard core SEOs. The kind of people who have conference pass ticket, you know and that whether they are in-house or whether they are agency, they would use our data, and you know, more and more we are just trying to convince them that we are the best data set. We know we are, we just got to just let everybody else know that (laughs) it’s my marketing that is coming out.
Alpar: Do you think there are probably other big datasets that you are not aware of?
Jones: Well there is, I mean there is, there is Sistrix, there’s Linkscape; there is also Eric Ward who is doing some stuff. I am not quite sure what he is doing though, but he hasn’t got very large…
Alpar: You would probably know if somebody pops up, I’d say, ahh, they have got a trillion back links too, out of nothing.
Jones: Out of the blue, I’d be highly suspicious, I would be highly suspicious. Because they would have, because they would have, in order to get that without us noticing, without the world noticing, they would have had to have done it without revealing their crawler agent.
Alpar: Can you crawl and be blocked?
Jones: Yeah, absolutely, we obey rules so, absolutely yes.
Alpar: You do?
Jones: So absolutely yes, so…
Alpar: Do you think that is an advantage if you are a link database?
Jones: I think, I think …
Alpar: Or is it just out of moral reasons …
Jones: I think it is a moral reason, you want to obey … We want to do this job properly, we don’t want anybody to have arguments, in fact we have the best identification of crawler of any of the crawlers out there. Because we use distributed crawlers people can come to our site, well, our bots can come from any IP number, we don’t know what IP number they are coming from, so in order to…
Alpar: (So they’re) from a cloud or something?
Jones: Which means that people can fake our bot and they can come from an IP and pretend to be our bot, so what we have is, if you go onto the majestic site and get the free account and use the API key we can give you a handshake basically to prove whether or not it’s the right bot…
Alpar: But then again, Just as a play of thoughts I would never of course do such things but imagine I was somebody, you know earning out of getting people links, then I can hide away all my sites, and then I am not endangered by competitors of my clients, of them finding my links, is that correct?
Jones: Kind of but of course just because you have blocked all your sites, doesn’t mean to say we don’t know all the links into your sites. We can see every single link into your websites, so…
Alpar: But if I blocked, let’s say site A is getting the links, and site A’s competitor wants to analyze site A’s links, I am site B and I have blocked your bot via robots.txt, you cannot see the links from site B to site A, right? So a part of the competitor links that are blocked by robots.txt, those cannot be analyzed.
Jones: Yeah, that’s true
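Editor's illustration: the blocking scenario discussed here can be sketched with Python's standard `urllib.robotparser`. The user-agent name `ExampleLinkBot` and the `site-b.example` URL are hypothetical placeholders, not the real crawler name or a real site; the sketch only shows how a crawler that obeys robots.txt would decide to skip site B, and therefore never record its outgoing links to site A.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt served by "site B": one named crawler is shut out,
# every other crawler is allowed everywhere.
ROBOTS_TXT = """\
User-agent: ExampleLinkBot
Disallow: /

User-agent: *
Disallow:
"""


def may_crawl(robots_txt, user_agent, url):
    """Return True if `user_agent` is allowed to fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


page_on_site_b = "https://site-b.example/links-to-site-a.html"
blocked_bot_allowed = may_crawl(ROBOTS_TXT, "ExampleLinkBot", page_on_site_b)
other_bot_allowed = may_crawl(ROBOTS_TXT, "SomeOtherBot", page_on_site_b)
print(blocked_bot_allowed, other_bot_allowed)
```

A compliant crawler never fetches the blocked page, so the links on it stay invisible to that link database, while crawlers that are not named in the file (or that ignore robots.txt) can still see them.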
Alpar: Do you know how many people block you; do you have some kind of experience? Does anybody care at all? Is it just theoretical?
Jones: We do obviously make a note of the robots. There are a number of people using robots.txt for …but the people stopping us is very small at the moment.
Alpar: Okay. So, did you try to analyse those cases, that you know, why do they hate me, why do they block Majestic SEO, what have I done to them?
Jones: It’s usually a lone person out there that’s selling links and stuff, so there are not many people out there that would object to the bot coming in. If it became an industrial problem then we would have to look at it, but …
Alpar: And if you think, because if the value of the bot is diminished because everybody blocks Majestic, or is it a chain of thoughts which we should not continue right now? (Laughs) Sorry.
Jones: Thanks. (Laughs)
Alpar: Forget about it. So, do you do anything else besides Majestic and hanging around conferences?
Jones: Yeah, no I do.
Alpar: Do you have a life?
Jones: (laughs) Yeah, I have a life. I have a wife and kids, yeah, absolutely.
Alpar: Do you have business? Business besides Majestic?
Jones: Yeah, I have got a consultancy in the UK which I don’t get as involved in as I used to, but we have … Yeah, we have got fifteen or twenty employees at Receptional, which is a consultancy in the UK. It’s been there for ten years.
Alpar: What’s it called again, I am sorry?
Jones: Receptional, which has been going for ten, eleven years now, so that’s pretty active.
Alpar: So you jump between Majestic and Receptional all the time?
Alpar: Or do they work in the same area?
Jones: No I work mostly from my office in Receptional but three, say I’d say sixty seventy percent of my time is spent doing Majestic these days, so the vast majority of my time, and then probably one of the other days is just doing administration for Majestic.
Alpar: So what’s a good thing about the agency, and what’s a good thing about the tools which you, why do you prefer which, I mean what’s, let’s focus on the good aspects
Jones: Yeah, of Majestic?
Jones: Yeah well, I think…
Alpar: Compared to the work today
Jones: The amount of crawl data that we have managed to do is just technologically an incredible feat.
Alpar: So who was the brains behind the technology?
Jones: A guy called Alex Chudnovsky and he is a bright guy, I have to say, he is really really clever. Pain in the neck sometimes, geniuses always are, but he’s done a really really good job with the technology and what we have done is manage to create scale that is really hard to copy. It’s funny, you know, you think you start with a big database system, but as soon as you get to (two/?) three hundred billion records, all of a sudden it becomes very difficult to maintain and manage on industrial systems, so we had to build all the database structures from scratch so we could scale.
Alpar: Let’s just finish off with one last question because I know you have to rush off to your next appointment. What do you think of SEOs in Germany? Or do you…
Jones: I think they are quite clever. (laughs) Yeah, they are far too clever. And I think Sistrix is a great tool as well, so I think that’s good. I know that they are going international but I think they should just stick to the German market (laughs)
Alpar: Great. Thanks a lot for the interview.
Jones: Thanks so much for your time.