URL encoding

  • RFC 3986, section 2.2, reserved characters (January 2005)

    ! * ' ( ) ; : @ & = + $ , / ? # [ ]
  • RFC 3986, section 2.3, unreserved characters (January 2005)

    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
    a b c d e f g h i j k l m n o p q r s t u v w x y z
    0 1 2 3 4 5 6 7 8 9 - _ . ~
  • RFC 2396 (URI Generic Syntax), reserved characters (August 1998)

    ; / ? : @ & = + $ ,
  • RFC 2396 (URI Generic Syntax), unreserved characters (August 1998)

    unreserved = alphanum | mark
    mark = - _ . ! ~ * ' ( )
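The RFC 3986 unreserved set listed above can be turned into a small percent-encoder. The following is only a minimal sketch: the class name `Rfc3986Encoder` and the method `encode` are hypothetical, and it assumes the input should be encoded as UTF-8 bytes, with every byte outside the unreserved set escaped.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the JDK.
final class Rfc3986Encoder {
    // RFC 3986 section 2.3: ALPHA / DIGIT / "-" / "." / "_" / "~"
    private static final String UNRESERVED =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~";
    private static final String HEX = "0123456789ABCDEF";

    // Percent-encode every UTF-8 byte that is not an unreserved character.
    static String encode(String input) {
        StringBuilder out = new StringBuilder();
        for (byte b : input.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF; // treat the byte as unsigned
            if (UNRESERVED.indexOf(v) >= 0) {
                out.append((char) v);
            } else {
                out.append('%')
                   .append(HEX.charAt(v >> 4))
                   .append(HEX.charAt(v & 0x0F));
            }
        }
        return out.toString();
    }
}
```

Under this sketch the characters `! * ' ( )`, which RFC 2396 counted as unreserved marks, are escaped, because RFC 3986 moved them to the reserved set.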

Java follows the older specification, RFC 2396.
For compatibility, Java leaves unescaped the same set of characters that all browsers do: the RFC 3986 unreserved set, without ‘~’ and with ‘*’ added.

/*
* Unreserved characters can be escaped without changing the
* semantics of the URI, but this should not be done unless the
* URI is being used in a context that does not allow the
* unescaped character to appear.
*/
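The difference in the unreserved set can be observed with `java.net.URLEncoder`, which leaves `*` alone and escapes `~`. The sketch below assumes that `URLEncoder` is the encoder in question; the class `UrlEncoderDemo` and its two methods are hypothetical names, and `toRfc3986` is one common way to post-process the result, not the only one.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class.
final class UrlEncoderDemo {
    // Form-style encoding as done by the JDK: space becomes "+",
    // "~" is escaped, and "*" is left as-is.
    static String formEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    // Adjust the JDK output to the RFC 3986 unreserved set.
    // A literal "+" or "*" in the encoded string can only come from
    // a space or an unescaped star, so plain replacement is safe here.
    static String toRfc3986(String s) {
        return formEncode(s)
            .replace("+", "%20")
            .replace("*", "%2A")
            .replace("%7E", "~");
    }
}
```

For example, `formEncode("a b~*")` gives `a+b%7E*`, while `toRfc3986("a b~*")` gives `a%20b~%2A`.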

regular expression

/^((ht|f)tps?):\/\/[\w\-]+(\.[\w\-]+)+([\w\-\.,@?^=%&:\/~\+#]*[\w\-\@?^=%&\/~\+#])?$/
  • the URL must start with ‘http’, ‘https’, ‘ftp’ or ‘ftps’
  • it cannot contain double-byte (non-ASCII) characters or any other character outside the set the pattern allows
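The pattern above is written as a JavaScript-style regex literal. As a rough sketch, it can be carried over to Java as follows; the class `UrlPatternDemo` and the method `isValid` are hypothetical names, and the character classes are kept as in the original, with only the redundant escapes dropped.

```java
import java.util.regex.Pattern;

// Hypothetical demo class.
final class UrlPatternDemo {
    // Scheme, then a host with at least one dot, then an optional tail
    // whose last character may not be ".", "," or ":".
    static final Pattern URL = Pattern.compile(
        "^((ht|f)tps?)://[\\w\\-]+(\\.[\\w\\-]+)+"
        + "([\\w\\-.,@?^=%&:/~+#]*[\\w\\-@?^=%&/~+#])?$");

    static boolean isValid(String s) {
        return URL.matcher(s).matches();
    }
}
```

In Java, `\w` matches only ASCII letters, digits and `_` by default, so `isValid("http://example.com/\u4e2d\u6587")` is rejected, as is a URL containing a space. Note that the host part requires at least one dot, so `http://localhost` does not match either.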