Discussion:
[t2t] How to recognise [description uri] ?
rch
2015-07-25 08:23:21 UTC
I am using t2t to pretty-print the entries in my
Firefox list of bookmarks.

Perl reads the *.jsonlz4 file and writes a t2t file
with a long list of all the entries;
A typical entry would be e.g.
+ [description http://foo.bar/foobar.html].

It mostly works very well, but txt2tags completely
fails to encode a few entries.

Question
How do I get txt2tags to recognise the following
problem entries as [description uri] ?

I have tried encoding the uris with Perl URI::Escape;
and with URI::Encode; but these did not help.

Richard H

PROBLEMS -
t2t FAILS ON THE FOLLOWING:-

+ [(11) NVRS-HRI-gg - Google Groups
https://groups.google.com/forum/#!forum/nvrs-hri-gg]
+ [INRA 383 pp Vie microbienne du sol et production végétale
http://prodinra.inra.fr/?locale=fr#!ConsultNotice:122538]
+ [Invasive alien species European Parliament
http://www.europarl.europa.eu/RegData/etudes/workshop/join/2014/518746/IPOL-ENVI_AT(2014)518746_EN.pdf]


+ [Statistique sur la migration des amphibiens
http://karch.ch/karch/page-35576.html;jsessionid=508B95B836F099F2DA38FC7ACEFA00A1.corvus2]

+ [Let's Liberate Diversity! Annual Forum / News / Agrobiodiversity
@knowledged / Themes / Hivos Knowledge Programme / Home -
Ontwikkelingsorganisatie Hivos
http://www.hivos.net/Hivos-Knowledge-Programme/Themes/Agrobiodiversity-knowledged/News2/Let-s-Liberate-Diversity!-Annual-Forum]

+ [For pressure cookers - excellent - Physical Chemistry: Understanding
our Chemical World - Paul M. S. Monk - Google Books
http://books.google.be/books?id=9qVpoDH00pEC&pg=PA199&lpg=PA199&dq=Clausius+Clapeyron+pressure+cooker&source=bl&ots=1wJ9fTtxHC&sig=L4rJ1GTBgC-IuDRlO75SZHF_HWY&hl=en&sa=X&ei=elkHVObrKdK4ogTdqIDwDw&ved=0CGQQ6AEwBw#v=onepage&q=Clausius%20Clapeyron%20pressure%20cooker&f=false]


+ [Nice pdf on bees by Inst roy Sci Nat BE
http://www.vivelesabeilles.be/uploads/Mediatheek/IRSCNB%20abeilles%20FR%20(page%20a%20page)%20LR.pdf]

+ [Atlas Mammiferes donnees sur docs google com
https://docs.google.com/spreadsheets/d/17_ufDWx7sb6_iivlmbv6g-UER1arFLsWpkH5B0j8PnA/edit#gid=0]


WHAT I WAS HOPING FOR:-

<LI> <a href="https://groups.google.com/forum/#!forum/nvrs-hri-gg">(11)
NVRS-HRI-gg - Google Groups</a></li>
<LI> <a
href="http://prodinra.inra.fr/?locale=fr#!ConsultNotice:122538">INRA 383
pp Vie microbienne du sol et production végétale</a></li>
<LI> <a
href="http://www.europarl.europa.eu/RegData/etudes/workshop/join/2014/518746/IPOL-ENVI_AT(2014)518746_EN.pdf">Invasive
alien species European Parliament</a></li>

<LI> <a
href="http://karch.ch/karch/page-35576.html;jsessionid=508B95B836F099F2DA38FC7ACEFA00A1.corvus2">Statistique
sur la migration des amphibiens</a></li>
<LI> <a
href="http://www.hivos.net/Hivos-Knowledge-Programme/Themes/Agrobiodiversity-knowledged/News2/Let-s-Liberate-Diversity!-Annual-Forum">Let's
Liberate Diversity! Annual Forum / News / Agrobiodiversity @knowledged /
Themes / Hivos Knowledge Programme / Home - Ontwikkelingsorganisatie
Hivos</a></li>
<LI> <a
href="http://books.google.be/books?id=9qVpoDH00pEC&pg=PA199&lpg=PA199&dq=Clausius+Clapeyron+pressure+cooker&source=bl&ots=1wJ9fTtxHC&sig=L4rJ1GTBgC-IuDRlO75SZHF_HWY&hl=en&sa=X&ei=elkHVObrKdK4ogTdqIDwDw&ved=0CGQQ6AEwBw#v=onepage&q=Clausius%20Clapeyron%20pressure%20cooker&f=false">For
pressure cookers - excellent - Physical Chemistry: Understanding our
Chemical World - Paul M. S. Monk - Google Books</a></li>

<LI> <a
href="http://www.vivelesabeilles.be/uploads/Mediatheek/IRSCNB%20abeilles%20FR%20(page%20a%20page)%20LR.pdf">Nice
pdf on bees by Inst roy Sci Nat BE</a></li>
<LI> <a
href="https://docs.google.com/spreadsheets/d/17_ufDWx7sb6_iivlmbv6g-UER1arFLsWpkH5B0j8PnA/edit#gid=0">Atlas
Mammiferes donnees sur docs google com</a></li>

/ends



------------------------------------------------------------------------------
Forgeot Eric
2015-07-25 09:34:27 UTC
Yahoo mail doesn't recognise some of those links either... :( Obviously characters such as ( ; # ! and probably more are causing the problems. Why do webmasters add this kind of thing to their code?

I guess some of them are legitimate, while others are less so, or not at all. Should or could txt2tags support all of them? I don't know. Meanwhile, you can probably find some scripts to convert those characters to a proper encoding. Maybe these?

Text::Unidecode (search.cpan.org)
HTML URL Encoding Reference (www.w3schools.com)
------------------------------------------------------------------------------
rch
2015-07-27 07:40:21 UTC
------------------------------------------------------------------------------
Aurelio Jargas
2015-07-30 10:43:57 UTC
Hi rch,

The txt2tags URL parser is conservative, to avoid matching false positives,
such as normal punctuation right after a URL.

The solution you found works by escaping punctuation before the URL parsing
(with preproc) and restoring it after conversion (with postproc).
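
The round trip described above can be sketched in Python. This is only an illustration under stated assumptions: it does not run txt2tags, the function names `escape` and `restore` are hypothetical, and the placeholder tokens are simply the ones quoted in this thread. It assumes the tokens never occur in the real text.

```python
import re

# Placeholder table mirroring the preproc/postproc pairs quoted in this
# thread. The tokens are arbitrary; the sketch assumes none of them
# appears in the text being converted.
PLACEHOLDERS = {
    "!": "AAAAA",
    "(": "BBBBB",
    ")": "CCCCC",
    ";": "DDDDD",
    ":": "EEEEE",
    "#": "FFFFF",
    "?": "GGGGG",
}

def escape(text):
    """Preproc stage (hypothetical helper): swap each character for its token."""
    for char, token in PLACEHOLDERS.items():
        text = re.sub(re.escape(char), token, text)
    return text

def restore(text):
    """Postproc stage (hypothetical helper): swap each token back."""
    for char, token in PLACEHOLDERS.items():
        text = re.sub(token, lambda _match, c=char: c, text)
    return text

url = "http://prodinra.inra.fr/?locale=fr#!ConsultNotice:122538"
masked = escape(url)
print(masked)           # the form the parser would see between the two stages
print(restore(masked))  # the original URL again
```

The weak point of this approach is a collision: if a token such as AAAAA already appears in a description or URL, the restore step would turn it into punctuation, so longer or more unusual tokens may be safer.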

Another alternative to avoid funky URLs is to save them in a "URL bank"
made of a series of postprocs. Example:

%!postproc: URL01 http://example.com/foo
%!postproc: URL02 http://example.com/bar
%!postproc: URL03 http://example.com/baz
...

And in the text body, you make links using those names instead of URLs:

- [My homepage URL01]
- [Work stuff URL02]
- [Weekend pictures URL03]
- ...
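
Since the bookmark list is generated by a script anyway, the URL bank could be generated too. The following is a minimal sketch, assuming the entries are already available as (description, URL) pairs; the function name `build_url_bank` is hypothetical. It only prints lines in the shape shown above and says nothing about how txt2tags treats unusual characters inside the postproc replacement.

```python
def build_url_bank(entries):
    """Return (config_lines, body_lines) for a list of (description, url) pairs.

    Hypothetical helper: names are zero-padded to a fixed width so that a
    short name such as URL01 can never match inside a longer one like URL010.
    """
    width = max(2, len(str(len(entries))))
    config_lines = []
    body_lines = []
    for index, (description, url) in enumerate(entries, start=1):
        name = f"URL{index:0{width}d}"
        config_lines.append(f"%!postproc: {name} {url}")
        body_lines.append(f"- [{description} {name}]")
    return config_lines, body_lines

entries = [
    ("My homepage", "http://example.com/foo"),
    ("Work stuff", "http://example.com/bar"),
]
config, body = build_url_bank(entries)
print("\n".join(config + [""] + body))
```

Because the postproc pattern is a regular expression applied to the whole output, the names should be strings that cannot occur anywhere else in the document.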

A similar technique is used in the txt2tags website, on the Children page.
The sources are here:

http://txt2tags.org/children.t2t
Post by rch
Question
How do I get text2tags to recognise the following
problems as [description uri] ?

 
 

Off this list, a correspondent suggested trying the following:
%!preproc : '!' 'AAAAA'
%!preproc : '\(' 'BBBBB'
%!preproc : '\)' 'CCCCC'
%!preproc : ';' 'DDDDD'
%!preproc : ':' 'EEEEE'
%!preproc : '#' 'FFFFF'
%!preproc : '\?' 'GGGGG'
%!postproc : 'AAAAA' '!'
%!postproc : 'BBBBB' '('
%!postproc : 'CCCCC' ')'
%!postproc : 'DDDDD' ';'
%!postproc : 'EEEEE' ':'
%!postproc : 'FFFFF' '#'
%!postproc : 'GGGGG' '?'
And that works!
All the *«The text2tags problem entries, format t2t»* at
http://users.skynet.be/watermael/miscellaneous/In_asciidoc_they_all_work.html
are now solved.
With many thanks to my correspondent
and to txt2tags.
Richard H
------------------------------------------------------------------------------
_______________________________________________
txt2tags-list mailing list
https://lists.sourceforge.net/lists/listinfo/txt2tags-list
--
Aurelio | www.aurelio.net | @oreio
rch
2015-07-30 13:06:10 UTC
Post by Aurelio Jargas
Another alternative to avoid funky URLs is to save them in an "URL bank"
%!postproc: URL01 http://example.com/foo
%!postproc: URL02 http://example.com/bar
%!postproc: URL03 http://example.com/baz
...
- [My homepage URL01]
- [Work stuff URL02]
- [Weekend pictures URL03]
- ...
A similar technique is used in the txt2tags website, on the Children page.
http://txt2tags.org/children.t2t
Thank you!

The line
%!preproc: '^ \| ((SOFT|TECH|HOME|MISC)..)$' ' | \1_SRC | [\1_TXT \1_URL] | \1_LNG '
in http://txt2tags.org/children.t2t

is really clever

Richard H

