Matt Cutts Video Transcription – Session 15: Data Center Comments

Filed Under video transcriptions · Tagged:  

Session 15: Data Center Comments
**************************
————————————————————————————-
http://www.mattcutts.com/blog/video-datacenter-comments/
http://video.google.com/videoplay?docid=-4814548594071648913
————————————————————————————-

Hey Everybody! Good to see you again!

This time I thought I’d talk about data center updates, what to expect from Google over the next few weeks, and stuff like that.

But before I do: I didn’t get to talk about the fun schwag from the Search Engine Strategies conference. One of my favorites, check it out (holds up a hat): it’s a white hat. Oooh! It’s got SEO in hidden text. Don’t say SEOs don’t have a sense of humor. (Puts the cap away.)

I thought this one was kind of fun (holds up a picture): a picture of Jake Bailey(?) with a fake autograph there, and here I’ve got a real autograph. In fact, I got several of them. Oh yes. What can I do with lots of pictures of Jake Bailey? Maybe I can sell them and do some arbitrage or something like that.

Anyway!

Also, there was at least one British SEO who evidently wants to keep me from doing anything productive for a long, long time. Check that out (holds up a stack of three voluminous books). That’s three thousand five hundred plus pages of science fiction. Huh. Yes. The funny thing is, in Britain these three books are published as three books, and in the United States they take these three books and publish them as nine books. What does that say about British readers versus American readers? Yes, that’s what I thought. So I probably don’t need these; maybe whoever on the webspam(?) team needs some hard SEO, hard sci-fi I should say.

OK! Data Center Updates.

So, there are always updates going on. Practically daily, if not daily, part of our index is updated; not a small portion but a pretty large fraction of our MTindex(??) is updated every day as we crawl the web. We also have algorithms and data pushes that go out on a less frequent basis. For example, there was a data push on June 27th, July 27th and then on August 17th. And again, it’s an algorithm that’s been running for over 1.5 years. If you seem to be caught in that, you are more likely to be someone who reads an SEO board. So you might want to think about ways you could back your site off: think less about what the SEOs on the board are saying, and more about how you can avoid optimizing quite as much on your site. That’s about as much advice as I can give, I’m afraid!

BigDaddy was a software infrastructure upgrade, and that upgrade was finished around February. It was pretty much a refresh of how we crawled the web and partly of how we index the web. That’s been done for several months, and things have been working quite smoothly.

There was also a complete refresh, or update, of our supplemental results index infrastructure. That happened a couple of months after BigDaddy, so it’s been done for a month or two, and it was a complete rewrite. That indexing infrastructure is different from our main indexing infrastructure, so you’d expect to see a few more issues whenever we roll it out. We saw more small, off-the-beaten-path stuff, like minus or exclusion terms (where you use the minus sign), the noindex meta tag, stuff like that. And because of the way the supplemental results worked with the main index, you would often see site: results estimates that were too high. There was at least one incident where some people thought a spammer had 5 billion pages, and when I looked into it, their biggest domain had under 50,000 pages. So people had been adding up these site: estimates and ending up with a really big number that was just way, way off.

One nice thing is that we have another software infrastructure update, which mainly improves quality but also improves our site: results estimates. That’s just a side benefit. As far as I know, it’s not at all data centers yet: it can run in some experimental modes, but it’s not fully on at every data center. They were shooting for the end of the summer to have it live everywhere, but again, that’s a hope and not a promise. If things need more testing, they’ll work longer to make sure everything goes smoothly, and if everything goes great, they might roll it out faster. But it’s a really nice infrastructure, and it’s just a side benefit that site: results estimates get more accurate.

It’s kind of interesting, so let me talk about it for a minute, because I saw at least one guy who had asked, “What happened with site: results estimates on Google?” He was comparing two completely different data center IP addresses, the estimates were different, and he was worried about that. And yet he had exactly one page in Yahoo and no pages in Ask. If you looked at his links page, there were a ton of links to pharmacy sites, not just one pharmacy site, but a lot of pharmacy sites.

And so I would say your time and your focus are better spent looking at your server logs and asking how to improve the quality of your own site, not worrying about something like site: results estimates.

So let me drill down into some reasons why that’s true.

Number one, they are estimates. We don’t claim that they are exact; in fact, if you look at them, they are only exact to three significant digits. We show them to give people an idea of how many results there are for a ’site:’ query, but we don’t claim they’re 100% precise. And truthfully, I didn’t consider it a very high priority. Recently a change was pushed out that made the plain old results estimates much more accurate for unigram, or single-word, queries. I spent about half an hour with the guy who made the change, and he even asked me, “Do you think it’s worth working on making the results estimates for site: more accurate?”
This was five or six months ago, maybe even more. At the time I said, “No! Pretty much nobody pays attention to those. They look at their server logs; it’s not really a high priority.” It’s gotten to the point where more people are asking about these things, and I’m sure we’ll pay more attention to it.
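As a rough illustration of what “exact to three significant digits” means, here is a minimal Python sketch that rounds a raw count to three significant figures. It is an assumption-laden example only: the helper name round_to_sig_digits is hypothetical, and the video says nothing about how Google actually computes or rounds its estimates.

```python
from math import floor, log10


def round_to_sig_digits(count: int, digits: int = 3) -> int:
    """Round a positive count to `digits` significant figures.

    Hypothetical helper for illustration only; not Google's method.
    """
    if count <= 0:
        return 0
    # Position of the most significant digit (0 for 1-9, 1 for 10-99, ...).
    magnitude = floor(log10(count))
    # A negative ndigits rounds to tens, hundreds, thousands, and so on.
    # Note: Python's round() breaks exact ties toward the even digit.
    return round(count, digits - 1 - magnitude)


for raw in (5, 999, 48213, 1234567):
    print(raw, "->", round_to_sig_digits(raw))
```

Under this scheme, 48,213 and 48,249 would both display as 48,200, which is the sense in which a three-significant-digit figure is an estimate rather than a count.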

But in general, I would spend more time worrying about good content on your site and looking at your server logs to find niches where you can make new pages and make things that are more relevant.

And you know, the whole notion of watching data centers is going to get harder and harder for individuals going forward. Number one, we have so much stuff launching in various ways. I’ve seen single weeks where a double-digit number of things launched, and these are things under the hood, strictly quality. They’re not changing the UI or anything like that, so unless you’re doing a specific search in Russian or Chinese, you might not notice the difference. But it goes to show that we are always going to be rolling out different things, and at different data centers you might see slightly different data.
The other reason it’s not worth watching data centers is that there’s an entire set of IP addresses, and if you’re a super-duper gung-ho SEO, you’ll know, you know, oh, 72.2.14.whatever. That IP address will typically go to one data center, but that’s not a guarantee. That data center might come out of rotation because we’re going to do something else to it, like actually change the hardware infrastructure (everything I’ve been talking about so far is software infrastructure). If we take that data center out of rotation for some reason, that IP address will then point to a completely different data center. So the currency, the ability to really compare changes and talk to a fellow data center watcher and say, “What do you see at 72.2.14.whatever?”, is really pretty limited. So I would definitely encourage you to spend more time worrying about, you know, the results you ?? for, increasing the quality of your content, looking for high-quality people who you think should be linking to you and may not even know about your site, and stuff like that.

I just wanted to give people a little bit of an update on where we are on various infrastructure. The fact of the matter is that we are always going to be working on improving our infrastructure, so you can never guarantee a ranking, or a number 1 spot, for any given term. If we think we can improve quality by changing our algorithms or data or infrastructure or anything else, we are going to make that change. So the best SEOs, in my experience, are the ones who can adapt. They say, OK, if this is the way the algorithms look to me right now, and I want to make a good site that will do well in search engines, this is the direction I want to head in next. And if you work on those sorts of skills, then you don’t have to worry about being number three again and asking on a forum, “What does this data center look like to you? Did they change a whole lot?” and stuff like that. So that’s the approach I recommend.

Transcribing Matt Cutts’ Videos: One through Fourteen

I’ve started publishing the transcriptions of Matt Cutts’ videos and have put the first fourteen of them up tonight. I hope they’ll be useful to a lot of people. If you’re going to republish the transcriptions, at least give me a link back; it was a bunch of work to get this all done. Also, if you see errors, please take a moment to point them out so I can correct them. Thanks!

Matt Cutts Video Transcription – Session 1: Including qualities of a good site
Matt Cutts Video Transcription – Session 2: Some SEO Myths
Matt Cutts Video Transcription – Session 3: Should you Optimize for Search Engines or for Users?
Matt Cutts Video Transcription – Session 4: Static vs. Dynamic urls
Matt Cutts Video Transcription – Session 5: How to structure a site?
Matt Cutts Video Transcription – Session 6: All about Supplemental Results
Matt Cutts Video Transcription – Session 7: Does Google Analytics play a part in SERPs
Matt Cutts Video Transcription – Session 8: Google Terminology
Matt Cutts Video Transcription – Session 9: All about datacenters
Matt Cutts Video Transcription – Session 10: Lightning Round!
Matt Cutts Video Transcription – Session 11: Reinclusion requests
Matt Cutts Video Transcription – Session 12: Tips for Search Engine Strategies (SES) San Jose 2006
Matt Cutts Video Transcription – Session 13: Google Webmaster Tools
Matt Cutts Video Transcription – Session 14: Recap of SES San Jose 2006

Matt Cutts Video Transcription – Session 14: Recap of SES San Jose 2006

Session 14: Recap of SES San Jose 2006

***********************

——————————————————-
http://www.mattcutts.com/blog/video-recap-of-ses-san-jose-2006/
http://video.google.com/videoplay?docid=-7246927612831078230
Duration 9 min 0 sec.
————————————————————————————-

OK Everybody! I am back. I am mostly over my cold and my wife is somewhere else tonight. So I get to make a video. Wow!

So, I thought I’d give you a recap, from my point of view, of Search Engine Strategies, and sort of cover some of the high-order bits and stuff that I thought was pretty neat.

A lot of people are curious about the industry news: what did the search engines announce, or what happened during the week? So, Yahoo announced sitebuilder, which lets you make a free custom search engine for your own site. Google has something sort of related to that, but we rolled it out several years ago, so for now Yahoo looks like they have a nicer custom site search that’s free. They also rolled out authentication in Site Explorer. One thing you’ll notice is that you can now prove you own a site in Site Explorer, and presumably you’ll be able to do more stuff down the road.
They also turned off the ability to do a site: search for a domain on Yahoo, which a lot of people missed during the conference. It’s now a forced redirect to Yahoo’s siteexplorer.search.yahoo.com, so you’ll have to log in if you want to do a site: search on Yahoo now. You might be able to add a ‘-a’ or ‘-the’ to get around that. But it’s pretty clear that they want to shunt most of the people doing SEO-style research to that one site and leave the main site for regular searchers.

So, what did Google announce? Well, we rebranded and renamed sitemaps to Google Webmaster Tools, and there’s a new Google Webmaster blog; the sitemaps blog has been renamed. A lot of stuff has been reorganised so it’s all in one spot, one place you can go to. There’s also the refresh of the supplemental results, which is kind of nice. People who were complaining about supplemental results being from around 2005 will, I believe, have the new, fresher supplemental results everywhere by the end of the month. The refreshed supplemental results are mostly from the April, May, June, July time frame; the earliest I know of is from February. So I know a lot of people are happy with the refresh of the supplemental results.

We also released a click fraud report. Kind of interesting. The auditing paid clicks session was kind of ?? on the burner. Guess what, you had to be there. Lots of fun. If you don’t want to read the 17-page report, I would just read the appendix, where they talk about mathematically impossible things and give some concrete examples. But it’s a pretty interesting report if you want to read it.

Microsoft didn’t really announce much, and I actually support that. I don’t think search engines should try to roll stuff out on a conference schedule, because then all the announcements get squashed into one week and you sort of get lost in the noise. So I think it’s not a bad idea to roll stuff out when it’s ready and not worry so much about launching during a conference to get a big press boost. So, to all the search engines, including Google: don’t launch anything during the conferences. Make life more mellow for everybody.

Probably the biggest industry news was inadvertent: AOL accidentally, well, semi-accidentally, leaked queries from hundreds of thousands of users, millions of queries. It was done in good faith; the researchers wanted to provide data so people could learn more about how people use search engines. But it took about a day before people realized that the queries could be tied to individual searchers. People have probably heard the fallout from that over the last couple of weeks, so I don’t need to talk about it.

It was an interesting conference because I got to meet a few people for the first time: ?? Becker, Jason Dowdle, Shawn Hogan, Jim Hedger, and Steve Bryant from eWeek. I enjoyed meeting everybody there and talking to a lot of people, from the lady from netshops to the guy I shared lunch with. It was a lot of fun talking to a lot of webmasters. People there whom I didn’t get to talk to but would have really liked to: Lisa Browney, I don’t know how to say it,… (there are a few other names that were difficult to spell, so they are omitted).

Other things: it was actually a conference where a lot of changes happened. It sounds like Andy Beal is moving to a different spot, and Mike Grehan is moving to a different spot. This is one of my favorites, and I don’t think anybody else noticed it: Jeffrey McManus, who was a Yahoo search developer or something like that, left Yahoo. If you’re not familiar with the name, he’s the guy who said the Google Maps API smelled like wet burnt dog hair or something like that. So he’s no longer at Yahoo; I think he’s consulting now. So, if you want good consulting, I’m sure you can talk to Jeffrey McManus.

Kanoodle: something happened with them. They moved to Seevast or something like that. At first I thought it was just a name change, but evidently they have something to do with Moniker, or Moniker’s naming page stuff. I didn’t get to talk to Monte or ..(name not clearly audible) of Moniker and find out what Kanoodle is up to. But that’s kind of interesting.

Probably the biggest change, and the one I thought was most entertaining, was Niall Kennedy leaving Microsoft. That made for a kind of funny moment, because we were going to have a search engine blogger roundtable, and I think Robert Scoble was scheduled to be on the panel, and he left Microsoft. So Niall Kennedy was scheduled to take his spot. And then Niall announced that he was leaving Microsoft too, about three days after this panel. So at one point I was looking at Niall talking with somebody from Microsoft. I couldn’t hear what they were saying, but I was imagining the Microsoft PR guy going, “You are going to be cool, all right?” and Niall going, “Yes, yes, I am going to be cool.” And he was. He did a great job. He told a really funny story about international soccer and how you can avoid incidents by thinking about the impact of your words. It was a lot of fun being on a panel with him, along with Gary Price and Jeremy Zawodny.

Other fun moments: I missed, I can’t believe this, Danny Sullivan in lederhosen. He lost a bet with Thomas Bindl, and there are pictures all over the web. Just do a Google search or some other image search and you should be able to find Danny Sullivan in lederhosen.

I got to talk to a lot of metrics companies and grill them about various things. I still got a few ?? to talk about metrics. I picked webmasters’ brains, and of course they picked my brain a little bit. It’s always good to talk to webmasters; I enjoyed that a lot.

It was fun to meet some Cuttlettes: Jessica and Audry from an SEO down in LA, it was really nice to meet you. Lyndsay, it was nice to meet you as well. Didn’t make my wife jealous at all. No sir. No marital problems there, I’ll tell you. But it was a lot of fun meeting a ton of people, including a couple of Cuttlettes.

I got a killer cold, which I’m now over, so that’s pretty good. And there was one heart-stopping moment when Danny was talking to Eric Schmidt, the CEO of Google, who did a Q&A on the third day of the conference. Some background: Sergey showed up at SES back in ’99 or 2000 and said something like, there is no such thing as search engine spam. Back then that was basically true, because Google was using PageRank and links and anchor text in ways nobody had thought of before; it was very hard to spam Google, and nobody worked on it because Google was really small. But that quote haunted Google, or at least webspam, for a while: “There is no such thing as spam,” said Sergey. So there was one moment when Danny Sullivan asked Eric Schmidt something like, “Oh, all this link stuff, people are always going to be trying to abuse it. Do you want to just go ahead and say now that everything is OK, there is no such thing as spam, you can do whatever you want?” He didn’t say it exactly like that, but I still had this heart-stopping moment. I was like, “Eric, say the right thing, say the right thing…” And he did a fantastic job; heart attack avoided. It was really neat, and he talked about the importance of webmasters and communication and stuff like that.

So it was a lot of fun; it was a good conference. I’m going to be out of conferences until maybe WebmasterWorld in Vegas in November, so I’m looking forward to some quiet time at home, just working on spam and stuff like that. But it was a lot of fun, and if I got to meet you at the conference, I’m glad I did. If I didn’t, I hope I meet you at a future conference.

Matt Cutts Video Transcription – Session 13: Google Webmaster Tools

Session 13: Google Webmaster Tools

******************************

********
————————————————————————–
http://www.mattcutts.com/blog/type/movies/
http://video.google.com/videoplay?docid=4526554928294588907
Duration 6min 34 sec.
————————————————————————–
Hey everybody! This is Matt Cutts. It’s Monday, August 7th, the first day of Search Engine Strategies.
I was picking SEOs’ brains on Saturday, so I’ve already started to lose my voice a little bit. But I wanted to alert you to some stuff that people might have missed that just happened this past Friday. I think it might have gotten missed a little, partly because it happened at 9 o’clock on a Friday and partly because a large fraction of the A-list, B-list and C-list search bloggers are on their way to, or arriving at, Search Engine Strategies San Jose.
So, Google has actually done quite a bit more lately to revamp the amount of information we provide to general users and to webmasters. For one thing, google.com/support has been beefed up a whole lot. Across all the different support areas, there are a lot more answers with a lot more fresh information. It’s pretty cool. If you go to google.com/support, that’s sort of the one-stop shop for all sorts of general user support needs.
However, if you found your way to this video, you’re probably not just a regular user; you’re probably also a webmaster. And if you’re a webmaster, there’s a tool you need to know about, which was called sitemaps until Friday. It all started sometime last year, when this tool called sitemaps let people submit all the URLs on their sites. They could even say things like when the URLs had last changed, which URLs are more important, all sorts of stuff. A lot of people made tools to create those sitemaps files, and that was fantastic. After that, the sitemaps team decided to build a more general console, something that could help webmasters with all sorts of other problems, and it was still called sitemaps. But Adam Lasnik came back from Search Engine Strategies and said that when he talked about sitemaps, everybody thought, oh, XML files or stuff like that. So just this last week, sitemaps changed its name.
There’s now an official area called Google Webmaster Central, and if you go to it (it’s just google.com/webmaster or webmasters; I’ll make sure they both work), you’ll get a set of lots of different tools. There’s now an official Google Webmaster Blog, which is going to be mostly maintained by Vanessa Fox, and I’m sure I’ll stop by from time to time to weigh in on various things. That used to be the sitemaps blog, and its scope is broadening to include anything related to webmasters, which I think is fantastic. The other thing is that the sitemaps tool has now become Google Webmaster Tools, and it’s got all sorts of stuff. It’s not just a place where you can say, here are all the URLs I’ve got, Google, please come crawl those URLs. Just off the top of my head, it’s got a robots.txt checker, and it’s got things that show you what errors it has seen in URLs. Earlier today, in fact, I found a place where I had made a link without the http, and that doesn’t work so well in WordPress. I had gotten four errors whenever Google tried to crawl, so I was able to fix a broken link by looking at that table.
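The broken link described above is easy to reproduce. A link written without “http://” is treated as a relative reference and resolved against the page it appears on. The minimal Python sketch below uses the standard library’s urllib.parse.urljoin; the page and link URLs are hypothetical examples, not the actual link from the video.

```python
from urllib.parse import urljoin

# Hypothetical page that contains the link.
page_url = "http://www.mattcutts.com/blog/some-post/"

# A link typed without the scheme is a relative reference, so it is
# resolved against the current page's directory...
broken = urljoin(page_url, "www.example.com/page.html")

# ...while the same link with "http://" is an absolute URL.
fixed = urljoin(page_url, "http://www.example.com/page.html")

print(broken)
print(fixed)
```

The first result points at a nonexistent path under the blog post (http://www.mattcutts.com/blog/some-post/www.example.com/page.html), which is exactly the kind of URL that shows up as a crawl error.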
In some cases we can tell you whether you have spam penalties or not. So, if you have hidden text or something like that, we can actually show you that you have a penalty and let you file a reinclusion request, which we can give a little more weight to, because we know it’s you: you verified and proved that you really own that site.
They also just did a new release on Friday, along with the change in the name, and they introduced a lot of pretty neat little things, like showing all the query words that show up in each subdirectory, or the crawl errors in each subdirectory. However, the biggest thing I’m really happy about is something called preferred domain. Sometimes people’s links aren’t uniform; maybe they don’t have all their ducks in a row. So some of the links point to www.mattcutts.com and some point to just mattcutts.com, without the www. And if some people outside your site, like the ODP or whatever, link to one, and other people link to the other, Google tries to disambiguate that. It tries to figure out, oh, www and non-www are actually the same page and are always going to be the same site. But we can’t always get that 100% correct. So this new feature in sitemaps (Google Webmaster Console, Google Webmaster Tools, whatever you want to call it) now lets you say: OK, I’ve verified I own this domain, and I’ve verified I own it with the www as well; now treat those as the same. Bear in mind, first, that it might take several weeks to go into effect. Second, it’s a preference, so we don’t one hundred percent guarantee that if you say you want www, we will always go that way. But in the normal, typical situation, within a few weeks you should see your URLs change from being split between www and non-www, if you have this issue, to all being on whichever one you prefer.
I volunteered my domain to be used as the guinea pig by the crawl guys, so they were whipping it back and forth from www to non-www, and things look like they’re working pretty well. Propstoday of UK?? asked for this feature, and a bunch of other people have asked for it too. I’m glad we’re getting around to it. I’m sure we’ll keep looking for ways to take requests from webmasters and turn them into useful information they can get.
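To make the www versus non-www split concrete, here is a minimal Python sketch of collapsing both host variants of a URL onto one preferred form. It is a site-side illustration under stated assumptions, not how Google’s preferred domain setting works internally: the function to_preferred_host is hypothetical, it assumes plain http/https URLs, and it drops any user:password part of a URL.

```python
from urllib.parse import urlsplit, urlunsplit


def to_preferred_host(url: str, bare_domain: str, prefer_www: bool) -> str:
    """Rewrite www/non-www variants of bare_domain to the preferred host.

    Hypothetical helper for illustration; URLs on other hosts are returned
    unchanged.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # .hostname is already lowercased
    if host not in (bare_domain, "www." + bare_domain):
        return url
    preferred = "www." + bare_domain if prefer_www else bare_domain
    # Keep an explicit port if the original URL had one.
    netloc = preferred if parts.port is None else f"{preferred}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


links = [
    "http://www.mattcutts.com/blog/",
    "http://mattcutts.com/blog/",
    "http://MattCutts.com/blog/?p=1",
]
# Before: two distinct hosts for what is really one site.
print(sorted({urlsplit(u).hostname for u in links}))
# After: every link uses the preferred www form.
print(sorted({to_preferred_host(u, "mattcutts.com", prefer_www=True) for u in links}))
```

Collapsing variants this way on your own pages only cleans up your internal links; links from other sites still point wherever they point, which is why a search-engine-side preference is useful.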
So if you haven’t taken a fresh look at Google Webmaster Tools, I’d highly recommend that you do. It’s worth your time: you can find all kinds of errors, you can test your robots.txt, and you can sometimes see penalties. There are the words you rank for, and the words you rank for that got clicked on a lot. Most importantly, there’s this www versus non-www setting, so if you’ve been affected by that, you can now tell Google which way you want it to be.
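For a feel of what a robots.txt check does, Python’s standard library ships a simple parser, urllib.robotparser.RobotFileParser. The sketch below parses a hypothetical robots.txt and asks whether Googlebot may fetch two example URLs. It is only an approximation of a real checker: this parser follows the original robots.txt conventions and may not match Google’s own tool on every edge case (wildcard patterns, for instance).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents for www.example.com.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

for url in ("http://www.example.com/private/notes.html",
            "http://www.example.com/blog/"):
    allowed = parser.can_fetch("Googlebot", url)
    print(url, "allowed" if allowed else "blocked")
```

Testing rules like this before publishing them is the point of a robots.txt checker: a single overly broad Disallow line can block far more of a site than intended.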

The sitemaps team has been doing a great job. I’m sure I’ll keep calling them the sitemaps team for a while, since I’m not used to the name change yet, but I’ll get used to it eventually. I hope you’ll give it a try. I think it can be useful for anybody who’s got a site.

Matt Cutts Video Transcription – Session 12: Tips for Search Engine Strategies (SES) San Jose 2006

Session 12: Tips for Search Engine Strategies (SES) San Jose 2006

************

——————————

—————————————————–
http://www.mattcutts.com/blog/type/movies/
http://video.google.com/videoplay?docid=4648779681675044189
Duration 5 min 40 sec.
———————————————————————————–

OK! This is Matt Cutts. It’s Monday, about 1:00 AM, which means Search Engine Strategies starts in about nine hours. Fictional reader Todd Smith(??) writes in and says, “How would you recommend doing Search Engine Strategies? What tips or tricks can you give us? I’m going to the conference for the first time and I want to get the most out of it.”
That’s a great question, fictional reader Todd Smith.
First off, I would say go ahead and get checked in. You’re going to get a bag with probably like 14 pounds of stuff in it. I would go through it and pick out basically just the little sheet of paper, about four pages, that lists the sessions, and I would pretty much take the rest of the stuff to the hotel. You’ve probably checked into a hotel right near the convention center, so just drop everything off in your room.
Here’s what I do. I take a backpack (shows a backpack). I also take my little pad of paper to write down feedback. It’s a jamesport?? backpack. You’ll notice that it’s the exact same kind of backpack Sawyer uses on Lost, and if I were going to be trapped on a desert island, this is what I would want too, because there are actually two completely separate pockets. You can put food and water in one, and your laptop and charger kind of stuff in the other, so if your water leaks, you’re not going to destroy your laptop. It’s waterproof. Works very well. Bring your laptop, throw that sucker in there along with the schedule, and then if you go to the expo and pick up some brochures, throw them in there and you’re in good shape.

I would probably sit down and circle the sessions that are of interest to you. For example, for me, the Search Landscape Answers and Advice talk is going to be there, and I’d like to be a fly on the wall and ask them some questions, like do you use paths in your metrics, what do you do with AJAX, and stuff like that. Also on Monday, I think the lunch with the sitemaps team is going to be pretty interesting. The sitemaps team just rolled out a lot of new changes on Friday, and in fact sitemaps has been renamed to Google Webmaster Central, so it’s now a general webmaster console. Major props to them for doing that; maybe I’ll talk about it a little more in the future. And then on Monday I have the focus group back at the Googleplex, so I have to leave and go home for that. On Tuesday, I think the auditing paid clicks session should be really interesting. On Wednesday, I wouldn’t miss the Q&A with Eric Schmidt. And I’m biased because I’m on the panel, but I think the search engine bloggers one should be pretty interesting too. It’s nothing but Q&A, so you don’t have to worry about PowerPoint or anything like that.

There are a lot of parties, and it’s always fun to do the parties. The one thing I would definitely try to get to is our Google Dance; that’s Tuesday night, and I’ll go ahead and tell you a little secret that not everybody realizes. It’s the fifth Google Dance, so we’ll have music, a DJ, lots of food and all sorts of fun stuff. The part most people don’t know (we’ll try to get signs up, but I think this was a little too late to be on the main Search Engine Strategies program) is that we’re going to have another “meet the Googlers” session during the party. It’s mostly engineers, but we’ll also have a couple of product managers: people from all over the company, you know, quality, the ads side of things, webspam, people with expertise in AdSense, click fraud, all sorts of stuff. That will be going on during the middle part of the Google Dance. If you’re looking at the cafeteria, where there will probably be loud music, it’s up on the second floor, all the way to the right, in a room called ‘university’; it’s like a little mini-theatre. We’ll probably have 10 or 12 Googlers, mostly engineers, answering questions. So if you want to take a break from the loud music and the dance scene and talk search for a while, please stop by and say hello. That’s probably where I’ll be. We’ll hopefully have signs, but I’d love to see a lot of people come and ask questions.

You know, those are the sessions that are really interesting to me, but if you’re not a search engineer who’s been doing this for a few years, you might find other sessions more interesting, like search engine algorithm research, or if you’re a marketer, you might want to go to completely different sessions. The one tip I would give is to go ahead and sit in the back. That way, if for whatever reason somebody starts going into a sales pitch or something like that, you can just duck out. The amazing thing about Search Engine Strategies is that you have four different tracks going on at one time, so if one track isn’t interesting, or one particular speaker isn’t your cup of tea, you just duck out and go look at another one. And if nothing is good at that moment, you just sit down and do some wifi or something like that. Overall, have fun. The more people you talk to, the better. If you see my ugly mug around and I’m not walking into a panel or getting ready for the next presentation, please come up, say hello, and introduce yourself. I’m poor with names and faces, so you may have to remind me, “Hey, I am incredible, we met in Vegas,” or something like that. But usually after a couple of times, I get it down. I’d love for as many people as possible to come up and introduce themselves. So, if you’re going to be at Search Engine Strategies San Jose, I hope you have a good time and I hope we’ll see you there.

Matt Cutts Video Transcription – Session 11: Reinclusion requests


Session 11: Reinclusion requests

*****************

——————————————-
http://www.mattcutts.com/blog/type/movies/
http://video.google.com/videoplay?docid=-51167291563232332
Duration 2 min 24 sec.
————————————————————————-

Hey! This is Matt and Emmy (Matt appears with a cute cat on his shoulder), coming to you on Thursday after hockey at the Googleplex.
Let’s talk about, I don’t know, reinclusion requests.
So, I did a blog post about reinclusion requests a while ago, but the procedure has changed a little bit. Imagine you spammed, or someone you hired as a webmaster spammed, and now you are no longer in Google. What do you do now? The best thing I recommend is to register in Sitemaps, or the webmaster console, or Webmaster Central, whatever you want to call it. It’s basically the place where you can get all kinds of information. Sometimes you can even find out if you have penalties on your site. We can’t show all the penalties that we have, because that would clue in malicious spammers as well. But if there are real, legitimate sites that have valid content, we want them to be able to be found. So we can show penalties for some sites.

So, if you do have a penalty, or if you suspect that you might have a penalty, go ahead and register at Sitemaps and then fill out a reinclusion request. I think it’s at the bottom left or something like that. The more information you can give, the better. For example, if you were using an SEO, or your webhost got hacked, or whatever, give us as many specifics as you can. You also want to try to give some sort of timeline: here is what was going on, here is the mistake we made. The most important thing is that Google needs to know it’s not going to happen again. Usually you have a pretty clear idea what the problem was, something like hidden text, doorway pages, sneaky redirects using JavaScript, anything like that. Whatever it was, find some way of letting us know, or convincing us, that those pages, those violations of our quality guidelines, are not going to come back. So that’s the procedure I would go with. Include as much detail as possible about how it might have happened and what you are going to do to make sure that it does not happen again. That then goes into a queue, which we check, and we try to find out, OK, has the hidden text been removed, stuff like that. So reinclusion requests definitely get looked at by people, and that’s the procedure I would recommend.

Matt Cutts Video Transcription – Session 10: Lightning Round!


Session 10: Lightning Round!

***************

——————————

———————————————-
http://www.mattcutts.com/blog/type/movies/
http://video.google.com/videoplay?docid=-1756437348670651505
Duration: 5min 2 sec.
—————————————————————————-

Alright. This is Matt Cutts, coming to you on Monday, July 31st. This is probably the last one I will do tonight, so let’s try to do a lightning round.

Alright! Peter writes in and says, “Is it possible to search for just home pages? I tried doing -inurl:html, -inurl:htm, blah, blah, blah, php, asp, but that doesn’t filter out enough.”
That’s a really good suggestion, Peter. I hadn’t thought about that.
FAST used to offer something like that, but I think all they did was look for a ~ in the URL. I will file that as a feature request and see if people are willing to prioritize it so that we might be able to offer it. My guess is it would be relatively low on the priority list, because the syntax you mentioned, subtracting off a bunch of extensions, would probably work pretty well.
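Peter’s workaround happens inside the search query itself. To illustrate what a “home page” filter has to decide, here is a minimal Python sketch of a heuristic that classifies result URLs on the client side. It is purely hypothetical. The function `looks_like_home_page` and the `DEFAULT_DOCUMENTS` list are assumptions made for illustration, not how Google or FAST did it. The optional `~` rule only mirrors the FAST behavior described above.

```python
from urllib.parse import urlparse

# Hypothetical list of common default documents; real sites vary widely.
DEFAULT_DOCUMENTS = {"index.html", "index.htm", "index.php", "default.asp"}

def looks_like_home_page(url: str, treat_tilde_dirs_as_home: bool = True) -> bool:
    """Heuristic only: treat a URL as a home page if it points at the site
    root (or a default document there) and carries no query string."""
    parts = urlparse(url)
    if parts.query:
        return False
    segments = [s for s in parts.path.split("/") if s]
    # Personal pages such as /~alice/ count as home pages (the "~" signal).
    if treat_tilde_dirs_as_home and segments and segments[0].startswith("~"):
        segments = segments[1:]
    if not segments:
        return True
    return len(segments) == 1 and segments[0].lower() in DEFAULT_DOCUMENTS
```

For example, `looks_like_home_page("http://example.edu/~alice/")` is treated as a home page, while `http://example.com/blog/post.html` is not.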

Ah. I get to clarify something about strong versus bold, and emphasis versus italic. In a previous question, somebody asked whether it was better to use bold or strong. Bold is what everybody used in the olden days, when the dinosaurs roamed the earth, and strong is what the W3C recommends. Last night I said I thought we just barely, barely, barely preferred bold over strong, by an epsilon. I also said that, for the most part, you shouldn’t worry about it. The nice thing is that an engineer actually took me into the code so I could see it for myself, and Google does treat bold and strong with exactly the same weight. So thank you for that, Paul. I really, really appreciate it. I also checked the code, and it shows that ‘em’ and italic are treated exactly the same as well. So there you have it. Go forth and mark up the way the W3C would like you to, do it semantically well, and don’t worry so much about the old tags, because Google will score it the same either way.

Alright. Next in the lightning round, Goodman??? asks, “Will we see more kitty posts in the future?”
I think we will. In fact, I tried to get my cats in on this show, but they are a little scared of the lights. Let’s see if I can get them used to it.

TomHTML asks, “What are Google SSD, Google GAS, Google RS2, Google Global Marketplace, Google Weaver and the other services discovered by Tony Ruscoe?”
I think it was very clever of Tony to do a dictionary attack against our services check-in, but I am not going to talk about what those services are.

What else have we got here?

Josef Humpkins?? asks for “a preview of what many of the topics might be in the duplicate content session at SES.”
I gave a little bit of a preview in one of the other video sessions. Sherry?? will be there, and a lot of people will be there. I think what we will basically talk about is shingling. What we’ll essentially say is that Google does a lot of duplicate detection, from the crawl all the way down to the very last millisecond, practically when the user sees things. We do exact duplicate detection and we do near-duplicate detection, so we do a pretty good job all the way along the line of trying to weed out duplicates. The best advice I can give is this: if you have pages which might have nearly the same content but really are different content, make sure they look as different as possible. A lot of people worry about printable versions, and somebody else asked about a .doc or Word file compared to an HTML file. Typically you don’t need to worry about that. If you have similar content on different domains, maybe one version in French and another in English, you really don’t need to worry about that either. If you do have the exact same content, maybe for a Canadian site and a .com site, it’s probably the sort of thing where we will detect whichever one looks better to us and just show that. It wouldn’t necessarily trigger any sort of penalty or anything like that. If you want to avoid it, you can try to make sure the templates are very, very different. But in general, if the content is quite similar, it’s better to just let us show whichever representation we think is best anyway.
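Shingling is a well-known near-duplicate technique. You break each page into overlapping runs of words (shingles) and then compare the resulting sets. The Python sketch below is the textbook version, using Jaccard resemblance. The 4-word shingle size and the 0.9 threshold are arbitrary assumptions, and the function names are hypothetical. None of it describes Google’s actual duplicate-detection pipeline.

```python
import re

def shingles(text: str, w: int = 4) -> set:
    """Return the set of contiguous w-word shingles in text (case-insensitive)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < w:
        # Very short texts become a single (short) shingle, or none at all.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}

def jaccard(a: set, b: set) -> float:
    """Jaccard resemblance |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def near_duplicates(doc_a: str, doc_b: str, w: int = 4, threshold: float = 0.9) -> bool:
    """Flag two documents as near duplicates if their shingle sets mostly overlap."""
    return jaccard(shingles(doc_a, w), shingles(doc_b, w)) >= threshold
```

Take the nine-word sentence “the quick brown fox jumps over the lazy dog” and change only the last word. The resemblance drops to 5/7 (about 0.71), because one of its six shingles changes. That is why the choice of shingle size and threshold matters so much for short pages.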

And Thomas writes in and says, “Does Google index or rank blog sites differently than regular websites?” That’s a good question.
Not really. Somebody else asked about links from .gov and .edu sites, and whether links from two-level-deep government and education domains, like gov.pl, count the same as .gov. The fact is, we don’t really have much in the way of saying, oh, this is a link from the ODP or from .gov or .edu, so give it some sort of special boost. It’s just that those sites tend to have higher PageRank, because more people link to them, and reputable people link to them. So for blog sites, there is not really any distinction, unless you go off to Blog Search, of course, where it’s all constrained to blogs. In theory we could rank them differently, but for the most part, in general search, the way it shakes out, things are working out OK.

Alright! Thanks.

Matt Cutts Video Transcription – Session 9: All about datacenters


Session 9: All about datacenters

*****************
——————————

———————————-
http://www.mattcutts.com/blog/type/movies/
http://video.google.com/videoplay?docid=8726665066825965913
Duration: 4min 36 sec.
—————————————————————-

OK! This is Matt Cutts, coming to you live from the Mattplex. It’s Monday, July 31st, and I am wearing a different shirt, so it’s not all one big take. In fact, it’s my werewolf versus unicorn shirt. That’s right, you’ve got the unicorn (pointing to it) and the werewolf. Mortal enemies since the beginning of time, and “It’s On Now” (pointing to the text below the figures).

Alright! So, this had better be a special session. Let’s take a fun question from g1smd. They ask, “For all the datacenter watchers out there: should all results across one Class C IP address block be the same most of the time, except when you are pushing data? Or are they supposed to be different because you are trying different things on them? And would it make more sense to use the direct IP addresses when reporting issues or problems, or the 41gfe datacenter names?”

Alright. Well! Let’s talk about datacenters.
Back in the days of the dinosaurs, you know (makes sounds and gestures imitating dinosaurs), when the dinosaurs were on the earth, you could actually run a search engine off of one computer. Those days are long gone, unless you have a really, really powerful computer, or something very, very small to search over, or you have a Google Search Appliance, I guess. So these days you pretty much have to have a datacenter. In the early days of datacenters, you could just do some sort of round-robin trick with DNS, so that you would hit different datacenters. Google does some very smart stuff in load balancing, with some very interesting techniques to make sure that different datacenters are able to perform well.
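The round-robin DNS trick mentioned above is simple to model. Each lookup hands out the next address in a fixed list, so successive clients get spread across datacenters. Below is a minimal Python sketch of that idea. The class name `RoundRobinResolver` is hypothetical, and the addresses come from the reserved documentation ranges, not from real Google datacenters. Google’s own load balancing is described only as “very smart stuff” and is not modeled here.

```python
from itertools import cycle

class RoundRobinResolver:
    """Toy model of round-robin DNS: each lookup returns the next address
    in a fixed rotation, spreading clients across datacenters."""

    def __init__(self, addresses):
        addresses = list(addresses)
        if not addresses:
            raise ValueError("need at least one address")
        self._rotation = cycle(addresses)

    def resolve(self, hostname: str) -> str:
        # The hostname is ignored in this toy; a real DNS server would look it up.
        return next(self._rotation)

resolver = RoundRobinResolver(["192.0.2.10", "198.51.100.10", "203.0.113.10"])
answers = [resolver.resolve("www.example.com") for _ in range(4)]
# answers == ["192.0.2.10", "198.51.100.10", "203.0.113.10", "192.0.2.10"]
```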

So your basic question was this: should everything on the same Class C IP block be roughly the same? And yes, it should be roughly the same, in that it is typically the same datacenter. But not always. Let me give you a couple of examples. If one datacenter has to fail over, or if one datacenter is out of rotation, you can get bounced over to a different datacenter even if you are going to one IP address. So even though it will look like you are consistently hitting the same datacenter, you could be hitting a completely different one behind the scenes, underneath Google’s load balancing. Those situations are somewhat rare, but not that rare. That’s why people debating online at WebmasterWorld or datacenterwatcher can actually be seeing different things, even if they hit the same IP address.
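A “Class C block” here just means a /24, that is, addresses sharing their first three octets. The Python sketch below checks that with the standard `ipaddress` module. It also adds a toy failover model to show Matt’s point that the same front-end IP can be answered by a different datacenter. The mapping tables `PRIMARY_DATACENTER` and `FAILOVER`, and the two-letter codes in them, are invented for illustration and do not reflect Google’s real datacenters.

```python
import ipaddress

def same_class_c_block(ip_a: str, ip_b: str) -> bool:
    """True if two IPv4 addresses fall in the same /24 ("Class C") block."""
    block = ipaddress.ip_network(f"{ip_a}/24", strict=False)
    return ipaddress.ip_address(ip_b) in block

# Hypothetical front-end IP -> datacenter code, and datacenter -> backup.
PRIMARY_DATACENTER = {"192.0.2.10": "aa", "192.0.2.11": "aa"}
FAILOVER = {"aa": "bb"}

def datacenter_serving(ip: str, out_of_rotation: set) -> str:
    """Toy model: if the primary datacenter behind an IP is out of rotation,
    the same IP is silently answered by its failover datacenter."""
    dc = PRIMARY_DATACENTER[ip]
    return FAILOVER[dc] if dc in out_of_rotation else dc
```

In this model, `datacenter_serving("192.0.2.10", set())` gives `"aa"`. Once `"aa"` is taken out of rotation, the same IP returns `"bb"`, so two watchers hitting one address can see different results.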

The other point I wanted to make, which I also made at PubCon Boston, is that the datacenters often have a lot of different things going on. Whenever there is a new algorithm update or some other feature that we are trying out, we often try it out on one datacenter first. That lets us make sure the quality is what we expected it to be based on evaluation, stuff like that. So the datacenters do differ, according to some very complex, intricate plans, so that we can try out different things at different datacenters. On one Class C IP address you will usually hit the same datacenter, but that’s not guaranteed. Also, at PubCon Boston, I showed an example list of the sorts of different things that are going on at different datacenters. It shows how things are a lot more intricate now than they used to be. Google does a lot more smart scheduling, and it’s a lot harder for a random person to look at a datacenter and reverse engineer, or guess, which way things are going.

As far as the IP address versus the gfe name goes, I think only g1smd and I really know about that, and no one else bothers to talk about it, except maybe on WebmasterWorld. You can use either the IP address or the two-letter code of a datacenter, because we are able to map them both back. If you tell us one, we can tell what the other one is. In general, though, there are probably better ways to spend your time than watching datacenters. I think it’s a good use of your time to work on your content. It can also be worth a look when something major is going on, like a PageRank update, if you really want to. But in general, there is enough going on at different datacenters that it’s probably not worth checking every single datacenter, every single day. You won’t learn much that way about how you are going to do, or how you have been doing. It’s probably better to spend a little more time paying attention to your logs and working backwards from that.

Matt Cutts Video Transcription – Session 8: Google Terminology


Session 8: Google Terminology

***************

——————————

——————————————
http://www.mattcutts.com/blog/type/movies/
http://video.google.com/videoplay?docid=8475081922887713591
Duration 4 min 40 sec.
————————————————————————

OK. We’re back. I want to start off with a really interesting question.
DazzlinDonna wrote in all the way from Louisiana. She says, “Matt! I mentioned before that I’d love to see a define-type post, defining terms that you Googlers use that we non-Googlers might get confused about. Things like data refresh, orthogonal, etc. You may have defined them in various places, but one cheat-sheet kind of list would be great.” A very good question!

So, at some point I’ll have to do a blog post about host versus domain and a bunch of stuff like that. But several people have been asking questions about June 27th and July 27th, so let me talk about those a little bit. I’ll do it in the context of a data refresh versus an algorithm update versus an index update.

So, I’ll use the metaphor of a car. Back in 2003, we would crawl the web and index the web about once every month. When we did that, it was called an index update. Algorithms could change, the data would change, everything could change all in one shot, so that was a pretty big deal. WebmasterWorld would give those index updates names. Now we pretty much crawl and refresh some of our index every single day, so it’s an everflux, always-going-on sort of process.

The biggest changes that people tend to see are algorithm updates. You don’t see many index updates anymore, because we moved away from that monthly update cycle. The only time you might see one is if you are computing an index which is incompatible with the old index. For example, if you change how you do segmentation of CJK (Chinese, Japanese, Korean) or something like that, you might have to completely change your index and build another index in parallel. So index updates are relatively rare. Algorithm updates are basically when you change your algorithm. Maybe that’s changing how you score a particular page: you say to yourself, oh, PageRank matters this much more or this much less, and things like that. Those can happen pretty much at any time, so we call that asynchronous. Whenever we have an algorithm update that evaluates positively, improving quality and relevance, we go ahead and push it out.

And then the smallest change is called a data refresh. That’s essentially changing the input to the algorithm, changing the data the algorithm works on.
So, with the car metaphor, an index update would be like changing a large section of the car, or even changing the car entirely. An algorithm update would be like changing a part in the car, maybe swapping out the engine for a different engine or some other large part of the car. A data refresh is more like changing the gas in your car. Every one or two weeks, or three weeks if you are driving a hybrid, you change what actually goes in, which is the data the algorithm operates on.

So for the most part, data refreshes are a very common thing, and we try to be very careful about how we safety-check them. Some data refreshes happen all the time. For example, we compute PageRank continually and continuously, so there is always a bank of machines refining PageRank based on incoming data. PageRank goes out any time there is an update to our index, which happens pretty much every day.
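Matt doesn’t describe how PageRank is computed internally. For readers who want a concrete picture of “refining PageRank based on incoming data”, here is the textbook power-iteration formulation in Python. It uses the commonly cited damping factor of 0.85 and a tiny made-up link graph. Treat it as a classroom sketch, not as Google’s production system. The function name `pagerank` and its parameters are illustrative choices.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to. Pages that only
    appear as link targets are treated as having no outgoing links.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    if not pages:
        return {}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page first receives the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly across its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank

ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
# "c" has two inbound links, so it ends up with the highest score.
```

Each iteration conserves the total rank (it stays at 1). New link data can simply be fed into another round of iterations, which is one way to picture a computation that never really “finishes”.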

By contrast, some algorithms are updated every week or every couple of weeks, and those are data refreshes that happen at a slower pace. People are interested in one particular algorithm around June 27th and July 27th, and it has actually been live for over a year and a half now. So it’s data refreshes that you are seeing that are changing the way people’s sites rank.

In general, if your site has been affected, go back, take a fresh look and ask: is there anything that might be exceedingly over-optimized? Or maybe I’ve been hanging out on SEO forums for so long that I need a regular person to take a look at the site and see if it looks OK to them. If you’ve tried all the regular stuff and it still looks OK to you, then I would just keep building good content and try to make the site very useful. If the site is useful, then Google should fight hard to make sure it ranks where it should. That’s about the most advice I can give about the June 27th and July 27th data refreshes, because it does go into our secrets a little bit. But that hopefully gives you an idea of the scale, the magnitude, of the different changes. Algorithm changes happen a little more rarely, but data refreshes are always happening: sometimes from day to day, and sometimes from week to week and month to month.

Matt Cutts Video Transcription – Session 7: Does Google Analytics play a part in SERPs


Session 7: Does Google Analytics play a part in SERPs (Search Engine Result Pages)?
When does Google detect duplicate content, and how wide is the range?
I want to mark my page as porn in SafeSearch–what do you recommend?
Is it okay to make hyperlinks in option elements?

*****************************
——————————

——————————————-
http://www.mattcutts.com/blog/more-seo-answers-on-video/
http://video.google.com/videoplay?docid=-9028425054136856586
Duration 5 min 11 sec.
————————————————————————-
(Matt appears sipping some kind of cola from a glass. Setting it aside, he says:)
Ah. Well, Hello There!
I was just enjoying some delicious ‘Diet Sprite Zero’ (shows the can, puts it down and picks up a magazine) while reading my new issue of ‘Wired’ magazine. Oh, they really captured the asymmetry in Stephen Colbert’s ears, didn’t they? (Puts the magazine aside.)

I don’t know, I thought it would be really fun to do fake commercials. Diet Sprite has not paid me anything for endorsing them.

Alright. Shawn Stinez (??) writes in.
“Does Google Analytics play a part in SERPs?” SERPs meaning Search Engine Results Pages.
To the best of my knowledge, it does not. I am not going to categorically say we don’t use it anywhere in Google. But I was asked this question at WebmasterWorld in Las Vegas last year, and I pledged that the webspam team would not use Google Analytics data at all. Now, webspam is just a part of quality, and quality is just a part of Google. But webspam definitely has not used Analytics data, and to the best of my knowledge other places in Google don’t either. We want people to just feel comfortable using it and (pause) use it.

Alright. Gwen writes in. She or he says, “Dear Mr. Cutts, it’s going to be a long weekend. You get a lot of questions asked.” Thank you, ma’am, very sympathetic of you! “But I have to. When does Google detect duplicate content, and within which range will duplicate be duplicate?” Good question.
So, there isn’t a simple answer. The short answer is that we do a lot of duplicate content detection. It’s not like there is one stage where we say, right here is where we detect the duplicates. Rather, it happens all the way from the crawl, through the indexing and the scoring, until finally just milliseconds before we return the answer. And there are different types of duplicate content. There is certainly exact duplicate detection, which is quite helpful when one page looks exactly the same as another page. But pages are not always exactly the same, so we also detect near duplicates, and we use a lot of sophisticated logic to do that. In general, if you think you might be having problems, your best bet is probably to make sure that your pages are quite different from each other. We do a lot of duplicate detection, both to crawl less and to provide better results and more diversity.
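Exact duplicate detection is commonly done by hashing a normalized copy of each page and grouping pages whose fingerprints match. Here is a minimal Python sketch of that idea using the standard `hashlib` module. The normalization (collapsing whitespace and lower-casing) and the helper names `content_fingerprint` and `group_exact_duplicates` are assumptions for illustration. Matt gives no details of what Google actually hashes or normalizes, and near-duplicate detection needs something fuzzier, such as shingling.

```python
import hashlib
from collections import defaultdict

def content_fingerprint(text: str) -> str:
    """Fingerprint for exact-duplicate detection: collapse whitespace,
    lower-case, then hash. The normalization choices are assumptions."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def group_exact_duplicates(pages: dict) -> list:
    """Group URLs (keys) whose page text (values) has identical fingerprints."""
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[content_fingerprint(text)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]

pages = {
    "http://example.com/a": "Hello   World",
    "http://example.ca/a": "hello world",
    "http://example.com/b": "Something else",
}
# group_exact_duplicates(pages) == [["http://example.ca/a", "http://example.com/a"]]
```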

OK. Jeff Jones (??) writes in. This is my favorite question. Well, there have been a lot of good questions, but I really like this one.
“I would like to explicitly exclude a few of my sites from the default moderate SafeSearch filtering. Google seems to be less of a prude than I would prefer. Is there any hope of a tag, attribute or other snippet to limit a page to unfiltered results, or should I just start putting a few nasty words in the alt tags of blank images?” Well, don’t put them in blank images. You know, put them in meta tags. When I was writing the very first version of SafeSearch, I noticed that a lot of pages did not tag their sites or their pages as adult content at all. There are a lot of industry groups and a lot of industry standards, but at that time the vast majority of porn pages just sort of ignored these tags. So it’s not that big a deal; go ahead and include that.
So the short answer to your question is that, to the best of my knowledge, there is no tag that can just say, I am porn, please exclude me from your SafeSearch. It’s wonderful that you are asking about that.
For your best bet, I would go with meta tags. Unlike a lot of different stuff, SafeSearch actually does look at the raw content of a page, or at least the version I last saw does. So if you put it in your meta tags, or even in comments, which usually are not indexed by Google at all, we should be able to detect that it is porn that way. Don’t use blank images, though. Don’t use images that people can’t see.

And then let’s finish off with a question from Andre Shogan (??).
He says, “Sometimes I make a box spiderable by just putting links in the option elements. Normal browsers ignore them, and spiders ignore the option. But since Google is using the Mozilla bot, and the bot renders the page before it crawls it, I know that if the Mozilla engine renders the element ?? from the document object model tree.”
So in essence he is asking: can I put links in an option box? You can, but I wouldn’t recommend it. It is pretty non-standard behavior, and it’s very rare. It would definitely make my eyebrows go up if I were to see it. It’s better for your users, and better for search engines, if you just take those links out and put them somewhere at the bottom of the page or in a sitemap. That way, we will be able to crawl right through, and you don’t need hyperlinks inside option elements or anything like that.

Alright! That’s enough questions for now. It’s getting toward eleven o’clock, so I am going to call it a night.
It’s Sunday, July 30th. We will see if we can knock a few of these out next week. Thanks a lot.
