I'm currently working on my own PHP script to create epubs. One of the things my script does is create an Atom feed for the "Online Catalog" section of Stanza. I've noticed that for one of the items, the title displays correctly when viewed via "Online Catalog", but when the epub is downloaded to Stanza's library, the right single quotation mark (UTF-8 hex e28099) in the title gets displayed as â.
I'm guessing other multi-byte characters also suffer a similar fate. Unfortunately, I'm unable to test this further at the moment (have a friend coming over tomorrow and I need to do some major house cleaning) so I have no idea if it's only the library that's messed up or if actual epub rendering is affected. I've done some reading on the IDPF. From what I understand, UTF-8 is the encoding used for XML and XHTML content.
Does Stanza have its own quirks when displaying UTF-8 encoded content? Also, is there a tag that we're supposed to use to signal UTF-8 character encoding?
Just to make sure I understand:
1. UTF-8 characters in the book title show up correctly in the Online Catalog interface
2. UTF-8 characters in the book's ePub contents show up correctly when read in Stanza
3. UTF-8 characters in the book's title do not show up correctly once the book has been imported into the Stanza library
If that is the case, then I wonder if the problem might be that the title in the .opf file isn't being read correctly. Can you let us know if the .opf file declares that it uses UTF-8 encoding? Would it be possible to post a snippet of the file?
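If it helps with checking, here is a rough Python sketch of how one could pull the XML declaration and the title out of the .opf inside an ePub. It assumes the usual ePub layout, where META-INF/container.xml points at the .opf file; the function names and the sample book are made up for illustration, and this is not how Stanza itself reads the file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
DC_NS = "http://purl.org/dc/elements/1.1/"


def read_opf_title(epub_bytes):
    """Return (opf_path, first_line, title) for an ePub held in memory."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        # container.xml names the package (.opf) file via full-path.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find(f".//{{{CONTAINER_NS}}}rootfile")
        opf_path = rootfile.get("full-path")
        opf_bytes = zf.read(opf_path)
    # The XML declaration, if present, is the first line of the file.
    first_line = opf_bytes.split(b"\n", 1)[0].decode("ascii", "replace")
    # Parsing from bytes lets the parser honor the declared encoding.
    title = ET.fromstring(opf_bytes).findtext(f".//{{{DC_NS}}}title")
    return opf_path, first_line, title


def make_sample_epub(title):
    """Build a tiny, hypothetical ePub-like zip, only to exercise the reader."""
    container = (
        '<?xml version="1.0"?>\n'
        f'<container version="1.0" xmlns="{CONTAINER_NS}">\n'
        "  <rootfiles>\n"
        '    <rootfile full-path="content.opf"'
        ' media-type="application/oebps-package+xml"/>\n'
        "  </rootfiles>\n"
        "</container>\n"
    )
    opf = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<package xmlns="http://www.idpf.org/2007/opf" version="2.0">\n'
        f'  <metadata xmlns:dc="{DC_NS}">\n'
        f"    <dc:title>{title}</dc:title>\n"
        "  </metadata>\n"
        "</package>\n"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("META-INF/container.xml", container.encode("utf-8"))
        zf.writestr("content.opf", opf.encode("utf-8"))
    return buf.getvalue()


if __name__ == "__main__":
    sample = make_sample_epub("Reader\u2019s Guide")
    print(read_opf_title(sample))
```

If the title that comes back already contains stray characters such as â, the damage is in the .opf itself rather than in how the reader displays it.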
Yep, that's the exact behavior I'm seeing.
However, per your inquiry, I looked into the OPF file and it seems the issue is with Calibre (html2epub). It treated the title as ASCII and re-encoded each byte into UTF-8. I guess I'll be asking Kovid for help on this one. Thank you very much for all the help.
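For anyone who runs into the same thing, here is a small Python sketch of that kind of double encoding. It is only an illustration, not what html2epub actually does: it assumes the UTF-8 bytes were misread one byte per character as Latin-1 (strict ASCII has no bytes above 0x7F, so Latin-1 is my guess at the effective behavior), and the function names are made up.

```python
def double_encode(text):
    """Simulate the suspected bug: UTF-8 bytes misread as Latin-1, then re-encoded."""
    utf8_bytes = text.encode("utf-8")        # U+2019 becomes e2 80 99
    misread = utf8_bytes.decode("latin-1")   # each byte becomes its own character
    return misread.encode("utf-8")           # each of those is encoded again


def repair(garbled_bytes):
    """Undo one round of the double encoding sketched above."""
    return garbled_bytes.decode("utf-8").encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    garbled = double_encode("\u2019")
    print(garbled.hex())                     # c3a2c280c299
    print(repair(garbled) == "\u2019")       # True
```

Under that assumption, the garbled title starts with â (U+00E2) followed by two control characters that normally don't render, which would match seeing just â in the library.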
Stanza definitely has good support for UTF-8 (see some of the foreign language catalogs in the Project Gutenberg catalog for examples). If you open the file in Adobe Digital Editions, does the quotation mark look correct?
One possibility is that you haven't declared the encoding in the XHTML file. E.g., you should start it with:
<?xml version="1.0" encoding="UTF-8"?>
Had a bit more time to play with it. To reiterate, the title is displayed correctly when viewed via the "Online Catalog" menu. It's only when the files are downloaded and viewed via the local library that the multi-byte characters in the title aren't displayed properly. UTF-8 encoded characters within the ePub files themselves are displayed correctly.