Function: url-generic-parse-url

url-generic-parse-url is an autoloaded and byte-compiled function defined in url-parse.el.gz.

Signature

(url-generic-parse-url URL)

Documentation

Return a URL-struct of the parts of URL.

The CL-style struct contains the following fields:

TYPE     is the URI scheme (string or nil).
USER     is the user name (string or nil).
PASSWORD is the password (string [deprecated] or nil).
HOST     is the host (a registered name, IP literal in square
         brackets, or IPv4 address in dotted-decimal form).
PORTSPEC is the specified port (a number), or nil.
FILENAME is the path AND the query component of the URI.
TARGET   is the fragment identifier component (used to refer to a
         subordinate resource, e.g. a part of a webpage).
ATTRIBUTES is nil; this slot originally stored the attribute and
         value alists for IMAP URIs, but this feature was removed
         since it conflicts with RFC 3986.
FULLNESS is non-nil if the hierarchical sequence component of
         the URL starts with two slashes, "//".
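
Each field is a slot of the url structure that url-parse.el defines with cl-defstruct, so it can be read with an accessor named after it: url-type, url-user, url-password, url-host, url-portspec, url-filename, url-target, url-attributes and url-fullness. The sketch below assumes those accessor names. my-url-fields is a hypothetical helper, not part of the url package; it only collects the slots into an alist so they are easy to inspect.

```elisp
(require 'url-parse)

(defun my-url-fields (url-string)
  "Return an alist of the documented fields of URL-STRING.
Hypothetical helper: it only calls the url struct accessors."
  (let ((u (url-generic-parse-url url-string)))
    `((type       . ,(url-type u))
      (user       . ,(url-user u))
      (password   . ,(url-password u))
      (host       . ,(url-host u))
      (portspec   . ,(url-portspec u))
      (filename   . ,(url-filename u))
      (target     . ,(url-target u))
      (attributes . ,(url-attributes u))
      (fullness   . ,(url-fullness u)))))

;; The accessors are generalized variables, so setf can update a slot.
(let ((u (url-generic-parse-url "https://example.com/index.html")))
  (setf (url-host u) "example.org")
  (url-host u))
;; → "example.org"
```

For example, (my-url-fields "http://Example.com:8080/x?y#z") reports the host as "example.com", the port as the number 8080, the filename as "/x?y" and the target as "z".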

The parser follows RFC 3986, except that it also tries to handle URIs that are not fully specified (e.g. lacking TYPE), and it does not check for or perform %-encoding.
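
The following inputs illustrate both exceptions. This is a sketch. The expected values in the comments were traced by hand through the source code reproduced on this page. url-unhex-string from url-util.el appears only to show that decoding is left to the caller.

```elisp
(require 'url-parse)
(require 'url-util)                     ; for url-unhex-string

;; No scheme and no "//": the whole string ends up in FILENAME.
(let ((u (url-generic-parse-url "example.com/foo")))
  (list (url-type u) (url-host u) (url-filename u) (url-fullness u)))
;; → (nil nil "example.com/foo" nil)

;; No scheme, but a leading "//" still introduces an authority.
(let ((u (url-generic-parse-url "//example.com/path")))
  (list (url-type u) (url-host u) (url-filename u) (url-fullness u)))
;; → (nil "example.com" "/path" t)

;; Percent-escapes in the path are returned verbatim...
(url-filename (url-generic-parse-url "http://example.com/a%20b"))
;; → "/a%20b"
;; ...so decode them explicitly when the literal characters are needed.
(url-unhex-string (url-filename (url-generic-parse-url "http://example.com/a%20b")))
;; → "/a b"

;; Pitfall: a bare "host:port/..." matches the scheme pattern first.
(let ((u (url-generic-parse-url "localhost:8080/foo")))
  (list (url-type u) (url-host u) (url-filename u)))
;; → ("localhost" nil "8080/foo")
```

The last case follows from the scheme regexp in section 3.1 of the source. Any leading run of letters, digits, "+", "-" and "." that is followed by ":" is taken as TYPE, so add an explicit scheme or "//" to input like this before parsing it.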

Here is an example. The URL

  foo://bob:pass@example.com:42/a/b/c.dtb?type=animal&name=narwhal#nose

parses to

  TYPE     = "foo"
  USER     = "bob"
  PASSWORD = "pass"
  HOST     = "example.com"
  PORTSPEC = 42
  FILENAME = "/a/b/c.dtb?type=animal&name=narwhal"
  TARGET   = "nose"
  ATTRIBUTES = nil
  FULLNESS = t
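
The same example can be reproduced in code. This sketch assumes the accessors named after each field. my-example-url is a hypothetical variable name. The code also assumes that url-path-and-query is available. That function is defined in url-parse.el in recent Emacs versions and splits FILENAME back into its path and query parts.

```elisp
(require 'url-parse)

;; my-example-url is a hypothetical variable holding the parsed example.
(defvar my-example-url
  (url-generic-parse-url
   "foo://bob:pass@example.com:42/a/b/c.dtb?type=animal&name=narwhal#nose")
  "The example URL from the docstring, parsed into a url struct.")

(url-type my-example-url)        ; → "foo"
(url-user my-example-url)        ; → "bob"
(url-password my-example-url)    ; → "pass"
(url-host my-example-url)        ; → "example.com"
(url-portspec my-example-url)    ; → 42, a number rather than "42"
(url-filename my-example-url)    ; → "/a/b/c.dtb?type=animal&name=narwhal"
(url-target my-example-url)      ; → "nose"
(url-attributes my-example-url)  ; → nil
(url-fullness my-example-url)    ; → t

;; Assumed available: split FILENAME at the first "?" into (PATH . QUERY).
(url-path-and-query my-example-url)
;; → ("/a/b/c.dtb" . "type=animal&name=narwhal")
```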

Probably introduced at or before Emacs version 24.3.

Source Code

;; Defined in /usr/src/emacs/lisp/url/url-parse.el.gz
;;;###autoload
(defun url-generic-parse-url (url)
  "Return an URL-struct of the parts of URL.
The CL-style struct contains the following fields:

TYPE     is the URI scheme (string or nil).
USER     is the user name (string or nil).
PASSWORD is the password (string [deprecated] or nil).
HOST     is the host (a registered name, IP literal in square
         brackets, or IPv4 address in dotted-decimal form).
PORTSPEC is the specified port (a number), or nil.
FILENAME is the path AND the query component of the URI.
TARGET   is the fragment identifier component (used to refer to a
         subordinate resource, e.g. a part of a webpage).
ATTRIBUTES is nil; this slot originally stored the attribute and
         value alists for IMAP URIs, but this feature was removed
         since it conflicts with RFC 3986.
FULLNESS is non-nil if the hierarchical sequence component of
         the URL starts with two slashes, \"//\".

The parser follows RFC 3986, except that it also tries to handle
URIs that are not fully specified (e.g. lacking TYPE), and it
does not check for or perform %-encoding.

Here is an example.  The URL

  foo://bob:pass@example.com:42/a/b/c.dtb?type=animal&name=narwhal#nose

parses to

  TYPE     = \"foo\"
  USER     = \"bob\"
  PASSWORD = \"pass\"
  HOST     = \"example.com\"
  PORTSPEC = 42
  FILENAME = \"/a/b/c.dtb?type=animal&name=narwhal\"
  TARGET   = \"nose\"
  ATTRIBUTES = nil
  FULLNESS = t"
  (if (null url)
      (url-parse-make-urlobj)
    (with-temp-buffer
      ;; Don't let those temp-buffer modifications accidentally
      ;; deactivate the mark of the current-buffer.
      (let ((deactivate-mark nil))
        (set-syntax-table url-parse-syntax-table)
	(erase-buffer)
	(insert url)
	(goto-char (point-min))
        (let ((save-pos (point))
              scheme user pass host port file fragment full
              (inhibit-read-only t))

          ;; 3.1. Scheme
	  ;; This is nil for a URI that is not fully specified.
          (when (looking-at "\\([a-zA-Z][-a-zA-Z0-9+.]*\\):")
	    (goto-char (match-end 0))
            (setq save-pos (point))
	    (setq scheme (downcase (match-string 1))))

          ;; 3.2. Authority
          (when (looking-at "//")
            (setq full t)
            (forward-char 2)
            (setq save-pos (point))
            (skip-chars-forward "^/?#")
            (setq host (buffer-substring save-pos (point)))
	    ;; 3.2.1 User Information
            (if (string-match "^\\([^@]+\\)@" host)
                (setq user (match-string 1 host)
                      host (substring host (match-end 0))))
            (if (and user (string-match "\\`\\([^:]*\\):\\(.*\\)" user))
                (setq pass (match-string 2 user)
                      user (match-string 1 user)))
            (cond
	     ;; IPv6 literal address.
	     ((string-match "^\\(\\[[^]]+\\]\\)\\(?::\\([0-9]*\\)\\)?$" host)
	      (setq port (match-string 2 host)
		    host (match-string 1 host)))
	     ;; Registered name or IPv4 address.
	     ((string-match ":\\([0-9]*\\)$" host)
	      (setq port (match-string 1 host)
		    host (substring host 0 (match-beginning 0)))))
	    (cond ((equal port "")
		   (setq port nil))
		  (port
		   (setq port (string-to-number port))))
            (setq host (downcase host)))

	  ;; Now point is on the / ? or # which terminates the
	  ;; authority, or at the end of the URI, or (if there is no
	  ;; authority) at the beginning of the absolute path.

          (setq save-pos (point))
          (if (string= "data" scheme)
	      ;; For the "data" URI scheme, all the rest is the FILE.
	      (setq file (buffer-substring save-pos (point-max)))
	    ;; For hysterical raisins, our data structure returns the
	    ;; path and query components together in one slot.
	    ;; 3.3. Path
	    (skip-chars-forward "^?#")
	    ;; 3.4. Query
	    (when (looking-at "\\?")
	      (skip-chars-forward "^#"))
	    (setq file (buffer-substring save-pos (point)))
	    ;; 3.5 Fragment
	    (when (looking-at "#")
	      (let ((opoint (point)))
		(forward-char 1)
                (setq fragment (buffer-substring (point) (point-max)))
		(delete-region opoint (point-max)))))

          (if (and host (string-match "%[0-9][0-9]" host))
              (setq host (url-unhex-string host)))
          (url-parse-make-urlobj scheme user pass host port file
				 fragment nil full))))))
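
The following inputs exercise the less obvious branches of the code above. They cover the IPv6 bracket case, an empty port, downcasing of scheme and host, the special handling of "data" URIs, and a "?" that appears only after the "#". The expected values in the comments were traced by hand through this source. Treat them as a sketch, not a specification.

```elisp
(require 'url-parse)

;; IPv6 literal: the brackets stay in HOST and the port becomes a number.
(let ((u (url-generic-parse-url "http://[::1]:8080/index.html")))
  (list (url-host u) (url-portspec u)))
;; → ("[::1]" 8080)

;; An empty port after "host:" is normalized to nil, not 0.
(url-portspec (url-generic-parse-url "http://example.com:/"))
;; → nil

;; Scheme and host are downcased; user name and path keep their case.
;; Without a ":" in the user info, PASSWORD stays nil.
(let ((u (url-generic-parse-url "HTTP://Alice@Example.COM/Path")))
  (list (url-type u) (url-user u) (url-password u)
        (url-host u) (url-filename u)))
;; → ("http" "Alice" nil "example.com" "/Path")

;; "data" URIs: everything after the scheme, "#" included, is FILENAME.
(let ((u (url-generic-parse-url "data:text/plain,hi#x")))
  (list (url-filename u) (url-target u)))
;; → ("text/plain,hi#x" nil)

;; A "?" after the "#" belongs to the fragment, not to the query.
(let ((u (url-generic-parse-url "http://example.com/p#frag?x")))
  (list (url-filename u) (url-target u)))
;; → ("/p" "frag?x")
```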