Thursday, July 03, 2025

Robots.txt and Search Engines

It is not sexy, but it is useful. The robots.txt file is supposed to tell robots/bots/crawlers where they can crawl on a web site. The robots.txt file must be placed in the root folder of the site. The Robot Exclusion Standard (RES) started as an informal proposal that for many years was never formally agreed on (it was eventually published as RFC 9309 in 2022). However, the respectable search engines on the web voluntarily follow it, with their own extensions. There are, however, reported cases where they allegedly failed to comply with the instructions in the robots.txt file.


Key points

  • # - indicates that anything following it on the line is a comment
  • * - wildcard meaning everything
  • User-agent: - the name of the robot/bot/spider/crawler the rules apply to
  • Disallow: (with no value) - allow access to all of the site
  • Disallow: / - disallow access to all of the site
  • Disallow: /images/ - do not crawl the images folder
  • Disallow: /folder/filename.html - do not crawl filename.html in folder
Sample File
----------------------------------------------------
# prevent access by all user agents to the following folders
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/

# Specific instructions for specific user-agents
User-agent: msnbot
Disallow: /cgi-bin/
Disallow: /images/
Crawl-delay: 10

User-agent: Teoma
Disallow: /cgi-bin/
Disallow: /images/
Crawl-delay: 10

User-agent: Slurp
Disallow: /cgi-bin/
Disallow: /images/
Crawl-delay: 10


------------------------------------------------------------

Allow: and Crawl-delay: are not part of the original standard, but they are supported by MSN and Yahoo.
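
As a rough illustration of how a well-behaved crawler might read these rules, the sketch below feeds a shortened copy of the sample file to Python's standard urllib.robotparser module. The user-agent name ExampleBot and the www.example.com URLs are made up for the example, and real crawlers may interpret extensions such as Crawl-delay differently.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the sample file above.
SAMPLE = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/

User-agent: msnbot
Disallow: /cgi-bin/
Disallow: /images/
Crawl-delay: 10
"""


def build_parser(text):
    """Parse robots.txt content that is already in memory."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(SAMPLE)

# ExampleBot is a hypothetical crawler, so the "*" rules apply to it.
print(parser.can_fetch("ExampleBot", "http://www.example.com/images/logo.png"))
print(parser.can_fetch("ExampleBot", "http://www.example.com/index.html"))

# Crawl-delay is only set in the msnbot section of the sample.
print(parser.crawl_delay("msnbot"))
```

To check a live site instead, RobotFileParser also offers set_url() and read(), which fetch the file over HTTP.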

For information on what, why and how, see
http://www.robotstxt.org/wc/faq.html

What the Big Three plus one say

ASK
http://about.ask.com/en/docs/about/webmasters.shtml

Yahoo
http://help.yahoo.com/help/us/ysearch/slurp/slurp-02.html

Google - Using the robots.txt analysis tool
http://www.google.com/support/webmasters/bin/topic.py?topic=8475

MSN
http://search.msn.com.sg/docs/siteowner.aspx?t=SEARCH_WEBMASTER_REF_RestrictAccessToSite.htm

Tougher action

see:

Using .htaccess for site control
http://linuxreviews.org/webdesign/htaccess/

How to keep bad robots, spiders and web crawlers away
http://www.fleiner.com/bots/



robots.txt generator
http://www.mcanerin.com/EN/search-engine/robots-txt.asp


http://www.outfront.net/tutorials_02/adv_tech/robots.htm

Tuesday, October 08, 2024

Generate PFX file using OPENSSL on Windows

Had a situation where a client needed a PFX with a password for a particular setup. This is something I had not done before, so here are the steps.

Generate the key and the CSR 

Enter the password when requested. Please ensure you remember this.

Replace domain with your domain name for simplicity.

openssl req -newkey rsa:2048 -nodes -keyout domain.key -out domain.csr

Once the request has been validated, you should have access to these files in the package.

  • the root cert files
  • the intermediate cert files
  • the domain crt from the TLS/SSL certificate provider

Run the command to import the files: root, intermediate and domain.

Note that in some cases you will have 2 or more intermediate files. Please combine them into one file and call it intermediate.crt. Rename the "root" cert to rootca.crt.
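
Combining several intermediate files is just a matter of joining the PEM blocks one after another. A minimal sketch of that step, assuming the files are PEM-encoded text; the function name and the example file names in the usage note are hypothetical:

```python
from pathlib import Path

PEM_HEADER = "-----BEGIN CERTIFICATE-----"


def combine_pem_files(sources, destination):
    """Concatenate PEM certificate files into one bundle, in the order given.

    Returns the number of files written into the bundle.
    """
    blocks = []
    for source in sources:
        text = Path(source).read_text(encoding="ascii").strip()
        if PEM_HEADER not in text:
            raise ValueError(f"{source} does not look like a PEM certificate")
        blocks.append(text)
    # Each block ends with a newline so the END/BEGIN markers never run together.
    Path(destination).write_text("\n".join(blocks) + "\n", encoding="ascii")
    return len(blocks)
```

For example, combine_pem_files(["intermediate1.crt", "intermediate2.crt"], "intermediate.crt") would write both certificates into intermediate.crt.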

Generate the PFX

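openssl pkcs12 reads only one -in file, so pass the domain cert with -in and the rest of the chain with -certfile. First combine the intermediate and root certs into one file (chain.crt is just an example name), then run the export:

copy /b intermediate.crt + rootca.crt chain.crt

openssl pkcs12 -export -out domain.pfx -inkey domain.key -in domain.crt -certfile chain.crt

Enter an export password when prompted; this is the password for the PFX.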

You will now have the pfx file for use along with the password.

Friday, June 28, 2024

Possible Fix for BAD_SYSTEM_CONFIG_INFO 75736e6a726e6c2e 500

Today I woke up to this error on a Dell Precision 3541 laptop. For some reason it was asking for the BitLocker key.

After entering the key, I received the following error on boot up:

BAD_SYSTEM_CONFIG_INFO

After searching around I was advised to start here 

chkdsk C: /f /r /x

It showed 1 bad file cluster, which was fixed.

However, we ran into another error:

"unspecified error 75736e6a726e6c2e 500"

A user in reference #3 advised to simply run chkdsk C: /r again.

That I did, and this time it ran without error and the issue was resolved.

The PC booted without any issue.


References

  1. windows-10-bad-system-config-info
  2. bad-system-config-info-windows-10-on-boot-up
  3. An unspecified error occurred (75736e6a726e6c2e 500)




Saturday, February 17, 2024

Audio Noise Removal


Many times, recorded clips end up with unwanted background noise. Recently faced with this situation, I needed to clean up some audio, and as usual I went to Google looking for tools.

These are the tools I found and my thoughts on them.

Free

Paid with Free Trials

  • Adobe Enhance - https://podcast.adobe.com/enhance
    Good with Limitations
  • Media AI - Noise Reducer - https://noisereducer.media.io

  • MyEdit Audio Tools - https://myedit.online/en/audio-editor
    Has some interesting tools.

Results 

Overall, I found Audio Denoise to be the best overall and in the free category. In the paid category Adobe 

Friday, February 03, 2023

MYSQL Error: Lock wait timeout exceeded; try restarting transaction

I have not seen this error in a while with MySQL replication: "Lock wait timeout exceeded; try restarting transaction".

Checked the slave status with

show slave status \G

and noticed the following.

Slave_IO_Running: Yes
Slave_SQL_Running: No
Last_SQL_Errno: 1205
Last_SQL_Error: Lock wait timeout exceeded; try restarting transaction
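
If you want to spot this state from a monitoring script, the check can be sketched roughly as below. It assumes you have already captured the vertical-format status output as text; the function names are hypothetical, and the sketch does not connect to MySQL itself.

```python
def parse_slave_status(text):
    """Turn 'Key: value' lines from vertical-format status output into a dict."""
    status = {}
    for line in text.splitlines():
        # Split on the first colon only, so error messages keep their own colons.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            status[key.strip()] = value.strip()
    return status


def sql_thread_hit_lock_timeout(status):
    """True when the I/O thread runs but the SQL thread stopped on error 1205."""
    return (
        status.get("Slave_IO_Running") == "Yes"
        and status.get("Slave_SQL_Running") == "No"
        and status.get("Last_SQL_Errno") == "1205"
    )
```

A script could call sql_thread_hit_lock_timeout(parse_slave_status(output)) and raise an alert when it returns True.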


Solution

Stopped the Slave

stop slave;

Restarted the Slave

start slave;

Checked the status again and all was well.




Monday, January 21, 2019

MYSQL and New IP for Master in Replication

There comes a time you might need to change the IP of a Master in a MySQL Replication setup. The steps are fairly simple.

You will need to know the new IP of the Master.

Important!
When you’re using CHANGE MASTER TO to set start position for the slave you’re specifying the position for SQL thread and so you should use Relay_Master_Log_File:Exec_Master_Log_Pos. 
Otherwise, you’re going to ruin your replication.

SSH into the MySQL slave and run SHOW SLAVE STATUS \G
Slave_IO_State: Reconnecting after a failed master event read
Master_Host: 10.0.0.1
Master_User: replicate
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: binlog.002933
Read_Master_Log_Pos: 832187423
Relay_Log_File: mysql-relay-bin.000230
Relay_Log_Pos: 832187707
Relay_Master_Log_File: binlog.002933
Slave_IO_Running: Connecting
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 832187423
Relay_Log_Space: 832188044
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 2003
Last_IO_Error: error reconnecting to master 'replicate@10.0.0.1:3306' - retry-time: 60  maximum-retries: 86400  message: Can't connect to MySQL server on '10.0.0.1' (110 "Connection timed out")
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: No
Gtid_IO_Pos:

Look for the values

  • Read_Master_Log_Pos:
  • Exec_Master_Log_Pos:

These should be the same, meaning the slave has applied everything it has read from the master. Note the Exec_Master_Log_Pos value along with Relay_Master_Log_File:

  • Exec_Master_Log_Pos: 832187423
  • Relay_Master_Log_File: binlog.002933

Still on the slave, run the following, replacing xxx.xxx.xxx.xxx with the new IP of the Master:

STOP SLAVE;
CHANGE MASTER TO MASTER_HOST='xxx.xxx.xxx.xxx', MASTER_LOG_FILE='binlog.002933', MASTER_LOG_POS=832187423;
START SLAVE;
SHOW SLAVE STATUS \G

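
To avoid copying the wrong position by hand, the statement can be built from the status output. The following is only a sketch under the assumption that the status text has been saved from the slave; the function name and the 10.0.0.2 address in the usage note are hypothetical.

```python
import re


def build_change_master(status_text, new_host):
    """Build a CHANGE MASTER TO statement from saved slave status text.

    Uses Relay_Master_Log_File and Exec_Master_Log_Pos, the SQL thread
    position, as the note above requires.
    """
    status = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()

    log_file = status["Relay_Master_Log_File"]
    log_pos = int(status["Exec_Master_Log_Pos"])

    # Simple guards so quotes or spaces can never end up inside the statement.
    if not re.fullmatch(r"[A-Za-z0-9.:\-]+", new_host):
        raise ValueError(f"unexpected characters in host: {new_host!r}")
    if not re.fullmatch(r"[\w.\-]+", log_file):
        raise ValueError(f"unexpected characters in log file: {log_file!r}")

    return (
        f"CHANGE MASTER TO MASTER_HOST='{new_host}', "
        f"MASTER_LOG_FILE='{log_file}', MASTER_LOG_POS={log_pos};"
    )
```

With the status output shown earlier and a new master at 10.0.0.2, this produces the same statement as the manual step, with binlog.002933 and 832187423 filled in.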

Thursday, April 19, 2018

PayPal




  1. Log in to your PayPal account at https://www.paypal.com. The My Account Overview page appears.
  2. Click the Profile subtab.
  3. The Profile Summary page appears. Click the My Selling Tools link in the left column.
  4. Under the Selling Online section, click the Update link in the row for Website Preferences. The Website Payment Preferences page appears.
  5. Under Auto Return for Website Payments, click the On radio button to enable Auto Return.
  6. In the Return URL field, enter the URL to which you want your payers redirected after they complete their payments.


NOTE: PayPal checks the Return URL that you enter. If the URL is not properly formatted or cannot be validated, PayPal will not activate Auto Return.

Scroll to the bottom of the page, and click the Save button.



