Hebrew Support in Hyperref – Situation Review

It’s been a bit more than three years since I wrote about a workaround for getting hyperref to play (almost) nicely with Hebrew. Over the past few weeks I’ve seen rising interest in this, and a few people contacted me about the issue. So I thought it was a good opportunity to better document the current situation, along with some directions worth investigating that I believe could lead to better solutions.

Basically, the situation is this: PDF has some sort of “stack” to keep track of the various properties applied to the text, and a link is delimited by a “begin link” (issued via \hyper@linkstart) and an “end link” (\hyper@linkend). Because of LaTeX’s awful RTL support, in RTL mode every symbol is inserted into the PDF in reverse order, so the “end link” appears before the “begin link”, which is obviously wrong.

My solution was to override the \hyper@link macro so that, in RTL mode, it puts \hyper@linkstart after \hyper@linkend (reverse order). This works fine most of the time. But if we delve deeper into how the page is laid out, we find that it is done one line at a time (roughly, LaTeX builds each line as LTR and then reverses the RTL parts, or some twisted thing like that). I’ll try to illustrate why this is problematic below:

Logical data (assume it’s Hebrew)

abc def ghk start_link link text end_link bla

Normally, this is transformed into:

alb end_link txet knil start_link khg fed cba

The fix swaps the order, so “in memory” it looks like this:

abc def ghk end_link link text start_link bla

But when the RTL text gets reversed then:

alb start_link txet knil end_link khg fed cba

Which is good. However, when a line break falls inside the link text, things go astray:

abc def ghk end_link link 
text start_link bla

which turns into

knil end_link khg fed cba
alb start_link txet

Which is messed up, as we again end the link before starting it.
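To make the ordering argument concrete, here is a toy Python model of it. It is only a sketch under the simplified assumption described above: each line is laid out LTR and then emitted reversed (letters included), with start_link/end_link standing in for \hyper@linkstart/\hyper@linkend. The function names are made up for illustration, and the model only checks whether the emitted stream opens a link before closing it.

```python
# Toy model of the RTL problem described above (not real TeX internals):
# each line of tokens is emitted with its token order and the letters of
# every word reversed, and the PDF needs start_link before end_link.
START, END = "start_link", "end_link"
MARKERS = (START, END)

def emit_rtl(lines):
    """Return the tokens of each line in the order they reach the PDF."""
    return [[t if t in MARKERS else t[::-1] for t in reversed(line)] for line in lines]

def well_nested(lines):
    """Stack check: no end_link may come before its start_link."""
    depth = 0
    for line in lines:
        for tok in line:
            if tok == START:
                depth += 1
            elif tok == END:
                depth -= 1
                if depth < 0:
                    return False
    return depth == 0

def swap_hack(tokens):
    """The workaround: emit end_link where start_link was, and vice versa."""
    return [END if t == START else START if t == END else t for t in tokens]

logical = ["abc", "def", "ghk", START, "link", "text", END, "bla"]
patched = swap_hack(logical)

print(emit_rtl([logical]))
# [['alb', 'end_link', 'txet', 'knil', 'start_link', 'khg', 'fed', 'cba']]
print(well_nested(emit_rtl([logical])))                   # False: ends before it starts
print(well_nested(emit_rtl([patched])))                   # True: hack fixes one-line links
print(well_nested(emit_rtl([patched[:5], patched[5:]])))  # False: a line break undoes it
```

The last call breaks the patched text after “link”, reproducing the two-line example above.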

How can this be solved? We could put the link inside an hbox to prevent it from breaking. The downside is that this creates overfull (or underfull) hboxes. Alternatively, if we could detect when a link is going to be broken across lines and disable the hack only for that link, it would work properly (note that while the hack fixes regular links, it actually breaks those that span a line break). I’ve tried to find a way to hook into line breaking, but I couldn’t figure out how to do so.
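The “disable the hack only for broken links” idea can be modelled the same way. This is a hypothetical sketch: it simply assumes the line breaks are already known, which is exactly the missing piece in practice, since TeX only chooses line breaks when it builds the whole paragraph, after the link macros have already been expanded. It also assumes at most one link touches any given line, and it tracks token order only.

```python
# Toy model (hypothetical): pretend we already know where the lines broke,
# and apply the swap hack only to links that stayed on a single line.
START, END = "start_link", "end_link"

def emit_rtl(lines):
    """RTL output: each line's tokens reach the PDF in reverse order."""
    return [list(reversed(line)) for line in lines]

def well_nested(lines):
    """Stack check: no end_link may come before its start_link."""
    depth = 0
    for line in lines:
        for tok in line:
            if tok == START:
                depth += 1
            elif tok == END:
                depth -= 1
                if depth < 0:
                    return False
    return depth == 0

def swap(line):
    return [END if t == START else START if t == END else t for t in line]

def conditional_hack(lines):
    """Swap only on a line that holds both markers, i.e. an unbroken link."""
    return [swap(line) if START in line and END in line else line for line in lines]

one_line = [["abc", "def", "ghk", START, "link", "text", END, "bla"]]
two_lines = [["abc", "def", "ghk", START, "link"], ["text", END, "bla"]]

print(well_nested(emit_rtl(conditional_hack(one_line))))    # True
print(well_nested(emit_rtl(conditional_hack(two_lines))))   # True
print(well_nested(emit_rtl([swap(l) for l in two_lines])))  # False (unconditional hack)
```

In this model a link that spans a line break comes out correctly ordered precisely because it is left unpatched, which matches the observation that the hack only hurts the broken links.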

Another method that could prove useful is to provide functionality like breakurl. breakurl allows URLs to be broken across lines when using dvips/ps2pdf. I haven’t looked into this thoroughly, but it seems that dvips/ps2pdf don’t support links across line boundaries. The solution breakurl seems to employ is to define possible break points inside the URL and make each “block” a separate link. That way, a line break can fall between the “blocks” while the link still works. I guess that by hacking breakurl we could make Hebrew links non-breakable by default and let breakurl-like functionality handle the breaks. It wouldn’t be pretty, but it should allow working Hebrew links across line boundaries.
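Here is a toy model of that breakurl-like idea, with the same simplifications as the models above. It is not breakurl’s actual code: split_link and typeset are hypothetical helpers, and the “line breaking” just puts a fixed number of boxes on each line. What it illustrates is that if every piece of the link text is its own small, unbreakable, swapped link (all pointing to the same target, which is not modelled here), each output line contains balanced start/end pairs no matter where the breaks fall.

```python
# Toy model of a breakurl-like workaround (hypothetical, not breakurl's code).
# Token order only; start_link/end_link stand in for the hyperref primitives.
START, END = "start_link", "end_link"

def emit_rtl(lines):
    """RTL output: each line's tokens reach the PDF in reverse order."""
    return [list(reversed(line)) for line in lines]

def well_nested(lines):
    """Stack check: no end_link may come before its start_link."""
    depth = 0
    for line in lines:
        for tok in line:
            if tok == START:
                depth += 1
            elif tok == END:
                depth -= 1
                if depth < 0:
                    return False
    return depth == 0

def split_link(pieces):
    """Make every piece of link text its own link, with the swap hack
    applied per piece (end_link logically first, start_link last)."""
    return [[END, *piece, START] for piece in pieces]

def typeset(before, pieces, after, width):
    """Crude line breaking: `width` boxes per line. Each piece-link is one
    unbreakable box, so breaks only ever fall between pieces."""
    boxes = [[w] for w in before] + split_link(pieces) + [[w] for w in after]
    return [sum(boxes[i:i + width], []) for i in range(0, len(boxes), width)]

before, pieces, after = ["abc", "def", "ghk"], [["link"], ["text"]], ["bla"]
print(typeset(before, pieces, after, 4))
# [['abc', 'def', 'ghk', 'end_link', 'link', 'start_link'], ['end_link', 'text', 'start_link', 'bla']]
print(all(well_nested(emit_rtl(typeset(before, pieces, after, w))) for w in range(1, 7)))  # True
```

Treating the whole link as one box instead would correspond to the plain hbox approach: always correct, but never breakable.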

The other issue, links in the table of contents, was solved by Or Dagmi, and his solution works with ps2pdf.

So overall, I think that with a bit more effort we’ll be able to overcome the remaining major hurdles for proper Hebrew support in hyperref. Of course, if you’re interested in these issues, please comment and share your insights.
