Would you like to put something “public” on the internet, but keep it out of Google? That might be the case sometimes. The simplest way is to block access to your resources with basic authentication, but sometimes that is not an option. Today I will show you how to hide your site from popular search engines like Google and Bing.

The first simple and easy thing to do is add the X-Robots-Tag header to your server responses, but that alone is not enough, because not all engines support that header. The second helpful thing is to serve an appropriate robots.txt file (for example, the Alexa web crawler and Yandex recognize it). Would robots.txt alone be good enough? Mmhmm, maybe, but let’s say we have some ‘double protection’ here :see_no_evil:

An example configuration may look like this one:

server {
       listen 443 ssl;
       listen [::]:443 ssl;

       server_name example.com;

       root /var/www/example.com;

       location / {
         add_header  X-Robots-Tag "noindex, nofollow, nosnippet, noarchive";
         add_header X-Frame-Options SAMEORIGIN;
         add_header X-Content-Type-Options nosniff;
         add_header X-XSS-Protection "1; mode=block";
         index index.html;
       }
       
       # serve robots.txt straight from the config; \n escapes become real newlines
       location = /robots.txt {
         return 200 "User-agent: Yandex\nDisallow: /\nUser-agent: *\nDisallow: /\n";
       }

       ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
       ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
}
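
For the record, the return directive above produces a response body equivalent to this robots.txt, disallowing everything for Yandex explicitly and for every other crawler via the wildcard:

User-agent: Yandex
Disallow: /
User-agent: *
Disallow: /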

Be careful when adding the add_header directive inside a location block: if a location declares any add_header directives of its own, headers added at higher levels (for example at the server directive level) are not inherited there and will be dropped. To keep them, repeat them inside the location or pull them in from a shared file, as in the sketch below.
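
One way to keep the shared headers is to move the common add_header lines into a separate file and include it in every location that declares its own headers. This is only a sketch; the path /etc/nginx/snippets/common-headers.conf is an assumed name, not something nginx requires:

# /etc/nginx/snippets/common-headers.conf (assumed path, pick your own)
add_header X-Frame-Options SAMEORIGIN;
add_header X-Content-Type-Options nosniff;
add_header X-XSS-Protection "1; mode=block";

# inside the server block from the example above
location / {
  include /etc/nginx/snippets/common-headers.conf;
  add_header X-Robots-Tag "noindex, nofollow, nosnippet, noarchive";
  index index.html;
}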

Some reference links to learn more about the X-Robots-Tag: