a ª‡Ù`ðCã@s¸dZddlZddlZddlmZdgZe d¡Ze d¡Ze d¡Z e d¡Z e d ¡Z e d ¡Z e d ¡Z e d ¡Ze d ¡Ze dej¡Ze d ¡Ze d¡ZGdd„dejƒZdS)zA parser for HTML and XHTML.éN)ÚunescapeÚ HTMLParserz[&<]z &[a-zA-Z#]z%&([a-zA-Z][-.a-zA-Z0-9]*)[^a-zA-Z0-9]z)&#(?:[0-9]+|[xX][0-9a-fA-F]+)[^0-9a-fA-F]z <[a-zA-Z]ú>z--\s*>z+([a-zA-Z][^\t\n\r\f />\x00]*)(?:\s|/(?!>))*z]((?<=[\'"\s/])[^\s/>][^\s/=>]*)(\s*=+\s*(\'[^\']*\'|"[^"]*"|(?![\'"])[^>\s]*))?(?:\s|/(?!>))*aF <[a-zA-Z][^\t\n\r\f />\x00]* # tag name (?:[\s/]* # optional whitespace before attribute name (?:(?<=['"\s/])[^\s/>][^\s/=>]* # attribute name (?:\s*=+\s* # value indicator (?:'[^']*' # LITA-enclosed value |"[^"]*" # LIT-enclosed value |(?!['"])[^>\s]* # bare value ) \s* # possibly followed by a space )?(?:\s|/(?!>))* )* )? \s* # trailing whitespace z#c@sàeZdZdZdZddœdd„Zdd„Zd d „Zd d „Zd Z dd„Z dd„Z dd„Z dd„Z dd„Zd7dd„Zdd„Zdd„Zdd „Zd!d"„Zd#d$„Zd%d&„Zd'd(„Zd)d*„Zd+d,„Zd-d.„Zd/d0„Zd1d2„Zd3d4„Zd5d6„Zd S)8raEFind tags and other markup and call handler functions. Usage: p = HTMLParser() p.feed(data) ... p.close() Start tags are handled by calling self.handle_starttag() or self.handle_startendtag(); end tags by self.handle_endtag(). The data between tags is passed from the parser to the derived class by calling self.handle_data() with the data as argument (the data may be split up in arbitrary chunks). If convert_charrefs is True the character references are converted automatically to the corresponding Unicode character (and self.handle_data() is no longer split in chunks), otherwise they are passed by calling self.handle_entityref() or self.handle_charref() with the string containing respectively the named or numeric reference as the argument. )ÚscriptÚstyleT)Úconvert_charrefscCs||_| ¡dS)zÆInitialize and reset this instance. If convert_charrefs is True (the default), all character references are automatically converted to the corresponding Unicode characters. N)rÚreset)Úselfr©r ú./usr/local/src/Python-3.9.6/Lib/html/parser.pyÚ__init__VszHTMLParser.__init__cCs(d|_d|_t|_d|_tj |¡dS)z1Reset this instance. Loses all unprocessed data.Úz???N)ÚrawdataÚlasttagÚinteresting_normalÚ interestingÚ cdata_elemÚ _markupbaseÚ ParserBaser©r r r r r_s zHTMLParser.resetcCs|j||_| d¡dS)z‘Feed data to the parser. Call this as often as you want, with as little or as much text as you want (may include '\n'). rN)rÚgoahead©r Údatar r r Úfeedgs zHTMLParser.feedcCs| d¡dS)zHandle any buffered data.éN)rrr r r ÚclosepszHTMLParser.closeNcCs|jS)z)Return full source of start tag: '<...>'.)Ú_HTMLParser__starttag_textrr r r Úget_starttag_textvszHTMLParser.get_starttag_textcCs$| ¡|_t d|jtj¡|_dS)Nz )ÚlowerrÚreÚcompileÚIr)r Úelemr r r Úset_cdata_modezs zHTMLParser.set_cdata_modecCst|_d|_dS©N)rrrrr r r Úclear_cdata_mode~szHTMLParser.clear_cdata_modec CsX|j}d}t|ƒ}||krè|jrv|jsv| d|¡}|dkr | dt||dƒ¡}|dkrpt d¡  ||¡spqè|}n*|j   ||¡}|r’|  ¡}n|jrœqè|}||krÞ|jrÌ|jsÌ|  t |||…ƒ¡n|  |||…¡| ||¡}||kröqè|j}|d|ƒrJt ||¡r"| |¡} n†|d|ƒr:| |¡} nn|d|ƒrR| |¡} nV|d|ƒrj| |¡} n>|d |ƒr‚| |¡} n&|d |krè|  d¡|d } nqè| dkr<|s¼qè| d |d ¡} | dkrú| d|d ¡} | dkr|d } n| d 7} |jr*|js*|  t ||| …ƒ¡n|  ||| …¡| || ¡}q|d |ƒrðt ||¡}|r²| ¡d d…} | | ¡| ¡} |d| d ƒs¢| d } | || ¡}qnÿs  z!HTMLParser.parse_html_declarationrcCs`|j}|||d…dvs"Jdƒ‚| d|d¡}|dkr>dS|rX| ||d|…¡|dS)Nr-)r,r)z"unexpected call to parse_comment()rr.r)rr1Úhandle_comment)r rFÚreportrÚposr r r rOszHTMLParser.parse_bogus_commentcCsd|j}|||d…dks"Jdƒ‚t ||d¡}|s:dS| ¡}| ||d|…¡| ¡}|S)Nr-r+zunexpected call to parse_pi()r.)rÚpicloser4r5Ú handle_pirB)r rFrr9rHr r r r= szHTMLParser.parse_picCsìd|_| |¡}|dkr|S|j}|||…|_g}t ||d¡}|sPJdƒ‚| ¡}| d¡ ¡|_}||kr.t  ||¡}|sŠq.| ddd¡\} } } | s¨d} n\| dd…dkrÌ| dd…ksøn| dd…dkrô| dd…krnn | dd…} | rt | ƒ} |  |  ¡| f¡| ¡}ql|||…  ¡} | d vr¬|  ¡\} }d |jvrˆ| |j d ¡} t|jƒ|j d ¡}n|t|jƒ}| |||…¡|S|  d ¡rÆ| ||¡n"| ||¡||jvrè| |¡|S) Nrrz#unexpected call to parse_starttag()r-rLú'r.ú")rú/>Ú rX)rÚcheck_for_whole_start_tagrÚtagfind_tolerantr9rBr@rrÚattrfind_tolerantrÚappendÚstripZgetposÚcountr0r2r6ÚendswithÚhandle_startendtagÚhandle_starttagÚCDATA_CONTENT_ELEMENTSr#)r rFÚendposrÚattrsr9rIÚtagÚmÚattrnameÚrestZ attrvaluerBÚlinenoÚoffsetr r r r:,sZ    & ÿ ÿ       ÿ    zHTMLParser.parse_starttagcCs¶|j}t ||¡}|rª| ¡}|||d…}|dkr>|dS|dkr~| d|¡rZ|dS| d|¡rjdS||krv|S|dS|dkrŠdS|dvr–dS||kr¢|S|dStd ƒ‚dS) Nrrú/rXr-r.r z6abcdefghijklmnopqrstuvwxyz=/ABCDEFGHIJKLMNOPQRSTUVWXYZzwe should not get here!)rÚlocatestarttagend_tolerantr9rBr7ÚAssertionError)r rFrrgrHÚnextr r r rZ_s.   z$HTMLParser.check_for_whole_start_tagcCs.|j}|||d…dks"Jdƒ‚t ||d¡}|s:dS| ¡}t ||¡}|sÜ|jdurr| |||…¡|St ||d¡}|s¬|||d…dkr¢|dS|  |¡S|  d¡  ¡}|  d| ¡¡}|  |¡|dS|  d¡  ¡}|jdur||jkr| |||…¡|S|  |¡| ¡|S) Nr-r)zunexpected call to parse_endtagrr.rLzr)rÚ endendtagr4rBÚ endtagfindr9rr6r[rOr@rr1Ú handle_endtagr%)r rFrr9rPZ namematchZtagnamer"r r r r;s8       zHTMLParser.parse_endtagcCs| ||¡| |¡dSr$)rbrr©r rfrer r r ra©s zHTMLParser.handle_startendtagcCsdSr$r rsr r r rb®szHTMLParser.handle_starttagcCsdSr$r )r rfr r r rr²szHTMLParser.handle_endtagcCsdSr$r ©r rJr r r rA¶szHTMLParser.handle_charrefcCsdSr$r rtr r r rDºszHTMLParser.handle_entityrefcCsdSr$r rr r r r6¾szHTMLParser.handle_datacCsdSr$r rr r r rQÂszHTMLParser.handle_commentcCsdSr$r )r Zdeclr r r rNÆszHTMLParser.handle_declcCsdSr$r rr r r rUÊszHTMLParser.handle_picCsdSr$r rr r r Ú unknown_declÍszHTMLParser.unknown_decl)r)Ú__name__Ú __module__Ú __qualname__Ú__doc__rcr rrrrrr#r%rr>rOr=r:rZr;rarbrrrArDr6rQrNrUrur r r r r>s6  z  3"()ryrrÚhtmlrÚ__all__r rrErCr?r8rTZ commentcloser[r\ÚVERBOSErmrprqrrr r r r Ús*          ÿò